
Publications by Vítor Santos Costa

2003

Toward automatic management of embarrassingly parallel applications

Authors
Dutra, I; Page, D; Costa, VS; Shavlik, J; Waddell, M;

Publication
EURO-PAR 2003 PARALLEL PROCESSING, PROCEEDINGS

Abstract
Large-scale applications that require executing very large numbers of tasks are only feasible through parallelism. In this work we present a system that automatically handles large numbers of experiments and data in the context of machine learning. Our system controls all experiments, including re-submission of failed jobs, and relies on available resource managers to spawn jobs through pools of machines. Our results show that we can manage a very large number of experiments, using a reasonable amount of idle CPU cycles, with very little user intervention.
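
The re-submission behaviour described in this abstract can be illustrated with a minimal sketch. It is not the authors' system: the function name `run_with_resubmission`, the `max_attempts` limit and the in-process, sequential execution are all assumptions made for illustration, whereas the paper hands jobs to external resource managers.

```python
from typing import Callable, Dict, List


def run_with_resubmission(
    jobs: Dict[str, Callable[[], object]],
    max_attempts: int = 3,  # hypothetical limit, not taken from the paper
) -> Dict[str, object]:
    """Run every job, resubmitting failed ones up to max_attempts times."""
    results: Dict[str, object] = {}
    attempts = {name: 0 for name in jobs}
    pending: List[str] = list(jobs)

    while pending:
        still_failing: List[str] = []
        for name in pending:
            attempts[name] += 1
            try:
                results[name] = jobs[name]()
            except Exception:
                if attempts[name] < max_attempts:
                    still_failing.append(name)  # resubmit in the next round
                else:
                    results[name] = None  # give up and record the failure
        pending = still_failing
    return results
```

A job that raises on its first attempts is retried automatically, so the user only has to inspect entries whose result is still `None` after the last round.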

2003

An empirical evaluation of bagging in inductive logic programming

Authors
De Dutra, IC; Page, D; Costa, VS; Shavlik, J;

Publication
Lecture Notes in Artificial Intelligence (Subseries of Lecture Notes in Computer Science)

Abstract
Ensembles have proven useful for a variety of applications, with a variety of machine learning approaches. While Quinlan has applied boosting to FOIL, the widely-used approach of bagging has never been employed in ILP. Bagging has the advantage over boosting that the different members of the ensemble can be learned and used in parallel. This advantage is especially important for ILP where run-times often are high. We evaluate bagging on three different application domains using the complete-search ILP system, Aleph. We contrast bagging with an approach where we take advantage of the non-determinism in ILP search, by simply allowing Aleph to run multiple times, each time choosing "seed" examples at random.
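
As a rough illustration of bagging as described in this abstract, the sketch below trains each ensemble member on a bootstrap sample and combines the members by majority vote. The threshold "stump" learner is a toy stand-in chosen for brevity; Aleph learns first-order clauses and is not modelled here. All names (`bootstrap`, `learn_stump`, `bagging`) are hypothetical.

```python
import random
from collections import Counter
from typing import Callable, List, Sequence, Tuple

Example = Tuple[float, int]           # (feature, label): a toy representation
Classifier = Callable[[float], int]


def bootstrap(data: Sequence[Example], rng: random.Random) -> List[Example]:
    """Draw len(data) examples uniformly with replacement."""
    return [rng.choice(data) for _ in data]


def learn_stump(sample: Sequence[Example]) -> Classifier:
    """Toy learner: threshold halfway between the two class means."""
    pos = [x for x, y in sample if y == 1]
    neg = [x for x, y in sample if y == 0]
    if not pos or not neg:            # sample contains a single class
        only = 1 if pos else 0
        return lambda x: only
    threshold = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
    return lambda x: 1 if x > threshold else 0


def bagging(data: Sequence[Example],
            learn: Callable[[Sequence[Example]], Classifier],
            n_members: int,
            seed: int = 0) -> Classifier:
    """Learn n_members classifiers on bootstrap samples; predict by majority vote."""
    rng = random.Random(seed)
    # Members are independent of one another, so this loop could run in parallel.
    members = [learn(bootstrap(data, rng)) for _ in range(n_members)]

    def predict(x: float) -> int:
        votes = Counter(member(x) for member in members)
        return votes.most_common(1)[0][0]

    return predict
```

Because no member depends on another, the learning loop is the part that parallelises, which is the advantage over boosting that the abstract points out.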

2011

Trebuchet: Exploring TLP with dataflow virtualisation

Authors
Alves, TAO; Marzulo, LAJ; Franca, FMG; Costa, VS;

Publication
International Journal of High Performance Systems Architecture

Abstract
Parallel programming has become mandatory to fully exploit the potential of multi-core CPUs. The dataflow model provides a natural way to exploit parallelism. However, specifying dependences and control using fine-grained instructions in dataflow programs can be complex and present unwanted overheads. To address this issue, we have designed TALM: a coarse-grained dataflow execution model to be used on top of widespread architectures. We implemented TALM as the Trebuchet virtual machine for multi-cores. The programmer identifies code blocks that can run in parallel and connects them to form a dataflow graph, which allows one to have the benefits of parallel dataflow execution in a Von Neumann machine, with small programming effort. We parallelised a set of seven applications using our approach and compared with OpenMP implementations. Results show that Trebuchet can be competitive with state-of-the-art technology, while providing the benefits of dataflow execution. Copyright © 2011 Inderscience Enterprises Ltd.
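
The idea of connecting coarse-grained code blocks into a dataflow graph can be sketched as follows. This is not TALM or Trebuchet: the graph encoding, the `run_dataflow` name and the thread pool are assumptions for illustration, and the sketch handles only acyclic graphs. Python threads also do not provide CPU parallelism for pure-Python code, so the example shows the firing rule rather than any speed-up.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from typing import Callable, Dict, List, Tuple

# node name -> (code block, names of the nodes whose outputs it consumes)
Graph = Dict[str, Tuple[Callable[..., object], List[str]]]


def run_dataflow(graph: Graph, workers: int = 4) -> Dict[str, object]:
    """Fire each block as soon as all of its operands are available."""
    results: Dict[str, object] = {}
    waiting = dict(graph)
    running = {}  # future -> node name

    with ThreadPoolExecutor(max_workers=workers) as pool:
        while waiting or running:
            ready = [name for name, (_, deps) in waiting.items()
                     if all(dep in results for dep in deps)]
            for name in ready:
                block, deps = waiting.pop(name)
                future = pool.submit(block, *(results[dep] for dep in deps))
                running[future] = name
            if not running:
                raise ValueError("some blocks can never receive their operands")
            done, _ = wait(list(running), return_when=FIRST_COMPLETED)
            for future in done:
                results[running.pop(future)] = future.result()
    return results


# Hypothetical example graph: "total" and "largest" both depend only on "load",
# so they become ready at the same time and may run concurrently.
example_graph: Graph = {
    "load": (lambda: [1, 2, 3, 4], []),
    "total": (lambda xs: sum(xs), ["load"]),
    "largest": (lambda xs: max(xs), ["load"]),
    "report": (lambda total, largest: (total, largest), ["total", "largest"]),
}
```

Calling `run_dataflow(example_graph)` returns a dictionary with one entry per block; no explicit ordering or synchronisation is written by the programmer, only the dependences.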

2007

Improving model construction of profile HMMs for remote homology detection through structural alignment

Authors
Bernardes, JS; Davila, AM; Costa, VS; Zaverucha, G;

Publication
BMC BIOINFORMATICS

Abstract
Background: Remote homology detection is a challenging problem in Bioinformatics. Arguably, profile Hidden Markov Models (pHMMs) are one of the most successful approaches in addressing this important problem. pHMM packages present a relatively small computational cost, and perform particularly well at recognizing remote homologies. This raises the question of whether structural alignments could impact the performance of pHMMs trained from proteins in the Twilight Zone, as structural alignments are often more accurate than sequence alignments at identifying motifs and functional residues. Next, we assess the impact of using structural alignments on pHMM performance. Results: We used the SCOP database to perform our experiments. Structural alignments were obtained using the 3DCOFFEE and MAMMOTH-mult tools; sequence alignments were obtained using CLUSTALW, TCOFFEE, MAFFT and PROBCONS. We performed leave-one-family-out cross-validation over super-families. Performance was evaluated through ROC curves and paired two-tailed t-tests. Conclusion: We observed that pHMMs derived from structural alignments performed significantly better than pHMMs derived from sequence alignment in low-identity regions, mainly below 20%. We believe this is because structural alignment tools are better at focusing on the important patterns that are more often conserved through evolution, resulting in higher quality pHMMs. On the other hand, sensitivity of these tools is still quite low for these low-identity regions. Our results suggest a number of possible directions for improvements in this area.

2007

An integrated approach to feature invention and model construction for drug activity prediction

Authors
Davis, J; Costa, VS; Ray, S; Page, D;

Publication
ACM International Conference Proceeding Series

Abstract
We present a new machine learning approach for 3D-QSAR, the task of predicting binding affinities of molecules to target proteins based on 3D structure. Our approach predicts binding affinity by using regression on substructures discovered by relational learning. We make two contributions to the state-of-the-art. First, we use multiple-instance (MI) regression, which represents a molecule as a set of 3D conformations, to model activity. Second, the relational learning component employs the "Score As You Use" (SAYU) method to select substructures for their ability to improve the regression model. This is the first application of SAYU to multiple-instance, real-valued prediction. We evaluate our approach on three tasks and demonstrate that (i) SAYU outperforms standard coverage measures when selecting features for regression, (ii) the MI representation improves accuracy over standard single feature-vector encodings and (iii) combining SAYU with MI regression is more accurate for 3D-QSAR than either approach by itself.

2012

Scheduling cyclic task graphs with SCC-map

Authors
Sardinha, A; Alves, TAO; Marzulo, LAJ; Franca, FMG; Barbosa, VC; Costa, VS;

Publication
Proceedings - 3rd Workshop on Applications for Multi-Core Architecture, WAMCA 2012

Abstract
The Dataflow execution model has been shown to be a good way of exploiting TLP, making parallel programming easier. In this model, tasks must be mapped to processing elements (PEs) considering the trade-off between communication and parallelism. Previous work on scheduling dependency graphs has mostly focused on directed acyclic graphs, which are not suitable for dataflow (loops in the code become cycles in the graph). Thus, we present the SCC-Map: a novel static mapping algorithm that considers the importance of cycles during the mapping process. To validate our approach, we ran a set of benchmarks on our dataflow simulator varying the communication latency, the number of PEs in the system and the placement algorithm. Our results show that the benchmark programs run significantly faster when mapped with SCC-Map. Moreover, we observed that SCC-Map is more effective than the other mapping algorithms when communication latency is higher. © 2012 IEEE.
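
The abstract does not spell out SCC-Map itself, so the sketch below only illustrates the building block its name suggests: finding the strongly connected components (the cycles) of a task graph, here with Kosaraju's algorithm. The placement function that follows is a naive greedy heuristic added purely as an assumption (keep each cycle on one PE, largest component first); it is not the published algorithm.

```python
from typing import Dict, Hashable, List, Set

Graph = Dict[Hashable, List[Hashable]]   # task -> tasks that consume its output


def strongly_connected_components(graph: Graph) -> List[Set[Hashable]]:
    """Kosaraju's algorithm with iterative depth-first searches."""
    # Pass 1: record tasks in order of DFS completion.
    order: List[Hashable] = []
    seen: Set[Hashable] = set()
    for root in graph:
        if root in seen:
            continue
        seen.add(root)
        stack = [(root, iter(graph[root]))]
        while stack:
            node, successors = stack[-1]
            for nxt in successors:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append((nxt, iter(graph.get(nxt, []))))
                    break
            else:  # every successor explored: the node is finished
                order.append(node)
                stack.pop()

    # Pass 2: search the reversed graph in reverse completion order.
    reverse: Dict[Hashable, List[Hashable]] = {node: [] for node in order}
    for src, dsts in graph.items():
        for dst in dsts:
            reverse[dst].append(src)

    components: List[Set[Hashable]] = []
    assigned: Set[Hashable] = set()
    for root in reversed(order):
        if root in assigned:
            continue
        assigned.add(root)
        component, todo = set(), [root]
        while todo:
            node = todo.pop()
            component.add(node)
            for prev in reverse[node]:
                if prev not in assigned:
                    assigned.add(prev)
                    todo.append(prev)
        components.append(component)
    return components


def place_components(components: List[Set[Hashable]], n_pes: int) -> Dict[Hashable, int]:
    """Hypothetical placement: each component goes whole to the least-loaded PE."""
    loads = [0] * n_pes
    placement: Dict[Hashable, int] = {}
    for component in sorted(components, key=len, reverse=True):
        pe = loads.index(min(loads))
        for task in component:
            placement[task] = pe
        loads[pe] += len(component)
    return placement
```

For a graph such as `{"a": ["b"], "b": ["c"], "c": ["b", "d"], "d": []}`, the loop between `b` and `c` forms one component, so both tasks land on the same PE and the loop's communication stays local under this heuristic.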
