Publications

Publications by Vítor Santos Costa

2011

Trebuchet: Exploring TLP with dataflow virtualisation

Authors
Alves, TAO; Marzulo, LAJ; Franca, FMG; Costa, VS;

Publication
International Journal of High Performance Systems Architecture

Abstract
Parallel programming has become mandatory to fully exploit the potential of multi-core CPUs. The dataflow model provides a natural way to exploit parallelism. However, specifying dependences and control using fine-grained instructions in dataflow programs can be complex and present unwanted overheads. To address this issue, we have designed TALM: a coarse-grained dataflow execution model to be used on top of widespread architectures. We implemented TALM as the Trebuchet virtual machine for multi-cores. The programmer identifies code blocks that can run in parallel and connects them to form a dataflow graph, which allows one to have the benefits of parallel dataflow execution in a Von Neumann machine, with small programming effort. We parallelised a set of seven applications using our approach and compared with OpenMP implementations. Results show that Trebuchet can be competitive with state-of-the-art technology, while providing the benefits of dataflow execution. Copyright © 2011 Inderscience Enterprises Ltd.

CloseRead Abstract

2007

Improving model construction of profile HMMs for remote homology detection through structural alignment

Authors
Bernardes, JS; Davila, AM; Costa, VS; Zaverucha, G;

Publication
BMC BIOINFORMATICS

Abstract
Background: Remote homology detection is a challenging problem in Bioinformatics. Arguably, profile Hidden Markov Models (pHMMs) are one of the most successful approaches in addressing this important problem. pHMM packages present a relatively small computational cost, and perform particularly well at recognizing remote homologies. This raises the question of whether structural alignments could impact the performance of pHMMs trained from proteins in the Twilight Zone, as structural alignments are often more accurate than sequence alignments at identifying motifs and functional residues. Next, we assess the impact of using structural alignments in pHMM performance. Results: We used the SCOP database to perform our experiments. Structural alignments were obtained using the 3DCOFFEE and MAMMOTH-mult tools; sequence alignments were obtained using CLUSTALW, TCOFFEE, MAFFT and PROBCONS. We performed leave-one-family-out cross-validation over super-families. Performance was evaluated through ROC curves and paired two tailed t-test. Conclusion: We observed that pHMMs derived from structural alignments performed significantly better than pHMMs derived from sequence alignment in low-identity regions, mainly below 20%. We believe this is because structural alignment tools are better at focusing on the important patterns that are more often conserved through evolution, resulting in higher quality pHMMs. On the other hand, sensitivity of these tools is still quite low for these low-identity regions. Our results suggest a number of possible directions for improvements in this area.

CloseRead Abstract

2007

An integrated approach to feature invention and model construction for drug activity prediction

Authors
Davis, J; Costa, VS; Ray, S; Page, D;

Publication
ACM International Conference Proceeding Series

Abstract
We present a new machine learning approach for 3D-QSAR, the task of predicting binding affinities of molecules to target proteins based on 3D structure. Our approach predicts binding affinity by using regression on substructures discovered by relational learning. We make two contributions to the state-of-the-art. First, we use multiple-instance (MI) regression, which represents a molecule as a set of 3D conformations, to model activity. Second, the relational learning component employs the "Score As You Use" (SAYU) method to select substructures for their ability to improve the regression model. This is the first application of SAYU to multiple-instance, real-valued prediction. We evaluate our approach on three tasks and demonstrate that (i) SAYU outperforms standard coverage measures when selecting features for regression, (ii) the MI representation improves accuracy over standard single feature-vector encodings and (iii) combining SAYU with MI regression is more accurate for 3D-QSAR than either approach by itself.

CloseRead Abstract

2012

Scheduling cyclic task graphs with SCC-map

Authors
Sardinha, A; Alves, TAO; Marzulo, LAJ; Franca, FMG; Barbosa, VC; Costa, VS;

Publication
Proceedings - 3rd Workshop on Applications for Multi-Core Architecture, WAMCA 2012

Abstract
The Dataflow execution model has been shown to be a good way of exploiting TLP, making parallel programming easier. In this model, tasks must be mapped to processing elements (PEs) considering the trade-off between communication and parallelism. Previous work on scheduling dependency graphs have mostly focused on directed a cyclic graphs, which are not suitable for dataflow (loops in the code become cycles in the graph). Thus, we present the SCC-Map: a novel static mapping algorithm that considers the importance of cycles during the mapping process. To validate our approach, we ran a set of benchmarks in on our dataflow simulator varying the communication latency, the number of PEs in the system and the placement algorithm. Our results show that the benchmark programs run significantly faster when mapped with SCC-Map. Moreover, we observed that SCC-Map is more effective than the other mapping algorithms when communication latency is higher. © 2012 IEEE.

CloseRead Abstract

2010

TALM: A hybrid execution model with distributed speculation support

Authors
Marzulo, LAJ; Alves, TAO; Franc, FMG; Costa, VS;

Publication
Proceedings - 22nd International Symposium on Computer Architecture and High Performance Computing Workshops, SBAC-PADW 2010, 1st Workshop on Applications for Multi and Many Core Architectures, WAMMCA

Abstract
Parallel programming has become mandatory to fully exploit the potential of modern CPUs. The data-flow model provides a natural way to exploit parallelism. However, traditional data-flow programming is not trivial: specifying dependencies and control using fine-grained tasks (such as instructions) can be complex and present unwanted overheads. To address this issue we have built a coarse-grained data-flow model with speculative execution support to be used on top of widespread architectures, implemented as a hybrid Von Neumanm/data-flow execution system. We argue that speculative execution fits naturally with the data-flow model. Using speculative execution liberates the programmer to consider only the main dependencies, and still allows correct data-flow execution of coarse-grained tasks. More- over, our speculation mechanism does not demand centralised control, which is a key feature for upcoming many-core systems, where scalability has become an important concern. An initial study on a artificial bank server application suggests that there is a wide range of scenarios where speculation can be very effective. © 2010 IEEE,.

CloseRead Abstract

2005

ReGS: User-level reliability in a grid environment

Authors
Sanches, JAL; Vargas, PK; De Dutra, IC; Costa, VS; Geyer, CFR;

Publication
2005 IEEE International Symposium on Cluster Computing and the Grid, CCGrid 2005

Abstract
Grid environments are ideal for executing applications that require a huge amount of computational work, both due to the big number of tasks to execute and to the large amount of data to be analysed. Unfortunately, current tools may require that users deal themselves with corrupted outputs or early termination of tasks. This becomes incovenient as the number of parallel runs grows to easily exceed the thousands. ReCS is a user-level software designed to provide automatic detection and restart of corrupted or early terminated tasks. ReGS uses a web interface to allow the setup and control of grid execution, and provides automatic input data setup. ReGS allows the automatic detection of job dependencies, through the GRID-ADL task management language. Our results show that besides automatically and effectively managing a huge number of tasks in grid environments, ReGS is also a good monitoring tool to spot grid nodes pitfalls. © 2005 IEEE.

CloseRead Abstract