Publications - INESC TEC

Publications by CRACS

2007

A study of structural properties on profiles HMMs

Authors
Bernardes, Juliana S.; Dávila, Alberto M. R.; Costa, Vitor Santos; Zaverucha, Gerson;

Publication
CoRR

Abstract

2007

Prolog performance on larger datasets

Authors
Costa, VS;

Publication
PRACTICAL ASPECTS OF DECLARATIVE LANGUAGES

Abstract
Declarative systems, such as logic programming, should be ideal for processing large data sets efficiently. Unfortunately, the high-level nature of logic-based representations can cause inefficiencies, and may lead in some cases to unacceptable performance. We discuss how logic programming systems can accommodate large amounts of data in main memory. We use a number of real datasets to evaluate performance and discuss how a number of techniques can be used to improve memory scalability for such datasets.

2007

Design, implementation, and evaluation of a dynamic compilation framework for the YAP system

Authors
da Silva, AF; Costa, VS;

Publication
Logic Programming, Proceedings

Abstract
We propose dynamic compilation for Prolog, in the style of Just-In-Time compilers. Our approach adapts to the actual characteristics of the target program by (i) compiling only the parts of the program that are executed frequently, and (ii) adapting to actual call patterns. This allows aggressive optimization of the parts of the program that are really executed, and better informed heuristics to drive these optimizations. Our compiler does not need to support all features in the language, only what is deemed important to performance. Complex execution patterns, such as the ones caused by error handling, may be left to the interpreter. On the other hand, compilation is now part of the run-time, and thus incurs run-time overheads. We have implemented dynamic compilation for the YAP system. Our initial results suggest that dynamic compilation achieves very substantial performance improvements over the original interpreter, and that it can approach and even out-perform state-of-the-art native code systems. We believe that we have shown that dynamic compilation is worthwhile and fits naturally with Prolog execution.
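
The idea of compiling only frequently executed code can be illustrated with a minimal sketch. This is not the YAP implementation: the toy instruction set, the `HotPathRunner` class, and the call-count threshold are all assumptions made for illustration. A program is interpreted until it has been called often enough, and only then translated once into a faster form.

```python
from typing import Callable, Dict, List, Tuple

Op = Tuple[str, int]  # hypothetical toy instruction, e.g. ("add", 2) or ("mul", 3)


def interpret(ops: List[Op], x: int) -> int:
    """Slow path: walk the instruction list on every call."""
    for name, arg in ops:
        if name == "add":
            x += arg
        elif name == "mul":
            x *= arg
        else:
            raise ValueError(f"unsupported op: {name}")
    return x


def compile_ops(ops: List[Op]) -> Callable[[int], int]:
    """Fast path: translate the instruction list into one Python lambda."""
    expr = "x"
    for name, arg in ops:
        if name == "add":
            expr = f"({expr} + {arg})"
        elif name == "mul":
            expr = f"({expr} * {arg})"
        else:
            raise ValueError(f"unsupported op: {name}")
    return eval(f"lambda x: {expr}")


class HotPathRunner:
    """Interprets programs until they become hot, then compiles them once."""

    def __init__(self, programs: Dict[str, List[Op]], threshold: int = 3) -> None:
        self.programs = programs
        self.threshold = threshold  # assumed hotness cutoff, not from the paper
        self.counts: Dict[str, int] = {}
        self.compiled: Dict[str, Callable[[int], int]] = {}

    def run(self, name: str, x: int) -> int:
        fast = self.compiled.get(name)
        if fast is not None:
            return fast(x)
        # Count interpreted calls; compile once the threshold is reached.
        self.counts[name] = self.counts.get(name, 0) + 1
        if self.counts[name] >= self.threshold:
            self.compiled[name] = compile_ops(self.programs[name])
        return interpret(self.programs[name], x)
```

In this sketch a program that is called only once or twice never pays the compilation cost, which mirrors the trade-off the abstract describes between run-time compilation overhead and the gains on frequently executed code.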

2007

Improving model construction of profile HMMs for remote homology detection through structural alignment

Authors
Bernardes, JS; Davila, AM; Costa, VS; Zaverucha, G;

Publication
BMC BIOINFORMATICS

Abstract
Background: Remote homology detection is a challenging problem in Bioinformatics. Arguably, profile Hidden Markov Models (pHMMs) are one of the most successful approaches in addressing this important problem. pHMM packages present a relatively small computational cost, and perform particularly well at recognizing remote homologies. This raises the question of whether structural alignments could impact the performance of pHMMs trained from proteins in the Twilight Zone, as structural alignments are often more accurate than sequence alignments at identifying motifs and functional residues. Next, we assess the impact of using structural alignments in pHMM performance. Results: We used the SCOP database to perform our experiments. Structural alignments were obtained using the 3DCOFFEE and MAMMOTH-mult tools; sequence alignments were obtained using CLUSTALW, TCOFFEE, MAFFT and PROBCONS. We performed leave-one-family-out cross-validation over super-families. Performance was evaluated through ROC curves and paired two-tailed t-tests. Conclusion: We observed that pHMMs derived from structural alignments performed significantly better than pHMMs derived from sequence alignments in low-identity regions, mainly below 20%. We believe this is because structural alignment tools are better at focusing on the important patterns that are more often conserved through evolution, resulting in higher quality pHMMs. On the other hand, sensitivity of these tools is still quite low for these low-identity regions. Our results suggest a number of possible directions for improvements in this area.
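
Leave-one-family-out cross-validation over super-families can be sketched as follows. This is an illustrative reading of the protocol, not the authors' code: the `(protein_id, superfamily, family)` tuple layout and the function name are assumptions. For each super-family, one family is held out for testing and the remaining families of that super-family form the training set.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Tuple

Protein = Tuple[str, str, str]  # assumed layout: (protein_id, superfamily, family)
Split = Tuple[str, str, List[str], List[str]]


def leave_one_family_out(proteins: List[Protein]) -> Iterator[Split]:
    """Yield (superfamily, held_out_family, train_ids, test_ids) splits."""
    by_superfamily: Dict[str, Dict[str, List[str]]] = defaultdict(
        lambda: defaultdict(list)
    )
    for protein_id, superfamily, family in proteins:
        by_superfamily[superfamily][family].append(protein_id)

    for superfamily, families in by_superfamily.items():
        if len(families) < 2:
            continue  # a single-family super-family leaves nothing to train on
        for held_out, test_ids in families.items():
            train_ids = [
                protein_id
                for family, ids in families.items()
                if family != held_out
                for protein_id in ids
            ]
            yield superfamily, held_out, train_ids, list(test_ids)
```

Holding out a whole family, rather than individual proteins, means the model is always tested on sequences that are only remotely related to its training data, which is the setting the abstract targets.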

2007

An integrated approach to feature invention and model construction for drug activity prediction

Authors
Davis, J; Costa, VS; Ray, S; Page, D;

Publication
ACM International Conference Proceeding Series

Abstract
We present a new machine learning approach for 3D-QSAR, the task of predicting binding affinities of molecules to target proteins based on 3D structure. Our approach predicts binding affinity by using regression on substructures discovered by relational learning. We make two contributions to the state-of-the-art. First, we use multiple-instance (MI) regression, which represents a molecule as a set of 3D conformations, to model activity. Second, the relational learning component employs the "Score As You Use" (SAYU) method to select substructures for their ability to improve the regression model. This is the first application of SAYU to multiple-instance, real-valued prediction. We evaluate our approach on three tasks and demonstrate that (i) SAYU outperforms standard coverage measures when selecting features for regression, (ii) the MI representation improves accuracy over standard single feature-vector encodings and (iii) combining SAYU with MI regression is more accurate for 3D-QSAR than either approach by itself.

2007

Change of Representation for Statistical Relational Learning

Authors
Davis, J; Ong, I; Struyf, J; Page, EBD; Costa, VS;

Publication
20TH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE

Abstract
Statistical relational learning (SRL) algorithms learn statistical models from relational data, such as that stored in a relational database. We previously introduced view learning for SRL, in which the view of a relational database can be automatically modified, yielding more accurate statistical models. This paper presents SAYU-VISTA, an algorithm which advances beyond the initial view learning approach in three ways. First, it learns views that introduce new relational tables, rather than merely new fields for an existing table of the database. Second, new tables or new fields are not limited to being approximations to some target concept; instead, the new approach performs a type of predicate invention. The new approach avoids the classical problem with predicate invention, of learning many useless predicates, by keeping only new fields or tables (i.e., new predicates) that immediately improve the performance of the statistical model. Third, retained fields or tables can then be used in the definitions of further new fields or tables. We evaluate the new view learning approach on three relational classification tasks.
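
The rule of keeping only candidates that immediately improve the model can be sketched as a generic greedy loop. This is not SAYU-VISTA itself: the function name, the `score` callback, and the `min_gain` parameter are assumptions for illustration, and the sketch omits how candidates are generated and how retained ones feed into later definitions.

```python
from typing import Callable, List, Sequence, TypeVar

C = TypeVar("C")


def greedy_keep_if_improves(
    candidates: Sequence[C],
    score: Callable[[List[C]], float],
    min_gain: float = 0.0,
) -> List[C]:
    """Keep a candidate only if adding it raises the model score."""
    kept: List[C] = []
    best = score(kept)
    for candidate in candidates:
        trial = kept + [candidate]
        trial_score = score(trial)
        if trial_score > best + min_gain:
            kept, best = trial, trial_score  # retained for all later trials
    return kept


# Toy usage: each hypothetical feature covers some example ids, and the
# "model" predicts positive for any example covered by a kept feature.
labels = {1: True, 2: True, 3: False, 4: False}
coverage = {"a": {1}, "b": {1, 3}, "c": {2}, "d": {3, 4}}


def accuracy(kept_features: List[str]) -> float:
    covered = set()
    for name in kept_features:
        covered |= coverage[name]
    correct = sum((ex in covered) == is_pos for ex, is_pos in labels.items())
    return correct / len(labels)


selected = greedy_keep_if_improves(["a", "b", "c", "d"], accuracy)
```

In the toy run, features that add false positives without new true positives are discarded as soon as they are scored, so useless candidates never accumulate.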
