Publications

Publications by Rui Camacho

2009

Partitional Clustering of Protein Sequences - An Inductive Logic Programming Approach

Authors
Fonseca, NA; Costa, VS; Camacho, R; Vieira, C; Vieira, J;

Publication
DISTRIBUTED COMPUTING, ARTIFICIAL INTELLIGENCE, BIOINFORMATICS, SOFT COMPUTING, AND AMBIENT ASSISTED LIVING, PT II, PROCEEDINGS

Abstract
We present a novel approach to cluster sets of protein sequences, based on Inductive Logic Programming (ILP). Preliminary results show that the proposed method produces understandable descriptions/explanations of the clusters. Furthermore, it can be used as a knowledge elicitation tool to explain clusters proposed by other clustering approaches, such as standard phylogenetic programs.


2008

k-RNN: k-Relational Nearest Neighbour Algorithm

Authors
Fonseca, NA; Costa, VS; Rocha, R; Camacho, R;

Publication
APPLIED COMPUTING 2008, VOLS 1-3

Abstract
The amount of data collected and stored in databases is growing considerably in almost all areas of human activity. In complex applications, the data involves several relations and propositionalization is not a suitable approach. Multi-Relational Data Mining algorithms can analyze data from multiple relations, with no need to transform the data into a single table, but are computationally more expensive. In this paper a novel relational classification algorithm based on the k-nearest neighbour algorithm is presented and evaluated.


2008

ILP - Just Trie it

Authors
Camacho, R; Fonseca, NA; Rocha, R; Costa, VS;

Publication
INDUCTIVE LOGIC PROGRAMMING

Abstract
Despite the considerable success of Inductive Logic Programming (ILP), deployed ILP systems still have efficiency problems when applied to complex problems. Several techniques have been proposed to address the efficiency issue. Such proposals include query transformations, query packs, lazy evaluation and parallel execution of ILP systems, to mention just a few. We propose a novel technique that avoids the procedure of deducing each example to evaluate each constructed clause. The technique takes advantage of the two-stage procedure of Mode Directed Inverse Entailment (MDIE) systems. In the first stage of an MDIE system, where the bottom clause is constructed, we store not only the bottom clause but also valuable additional information. The information stored is sufficient to evaluate the clauses constructed in the second stage without the need for a theorem prover. We used a data structure called a Trie to efficiently store all bottom clauses produced using all examples (positive and negative) as seeds. The technique was implemented and evaluated using two well-known data sets from the ILP literature. The results are promising both in terms of execution time and accuracy.


2008

Compile the Hypothesis Space: Do it Once, Use it Often

Authors
Fonseca, NA; Camacho, R; Rocha, R; Costa, VS;

Publication
FUNDAMENTA INFORMATICAE

Abstract
Inductive Logic Programming (ILP) is a powerful and well-developed abstraction for multi-relational data mining techniques. Despite the considerable success of ILP, deployed ILP systems still have efficiency problems when applied to complex problems. In this paper we propose a novel technique that avoids the procedure of deducing each example to evaluate each constructed clause. The technique is based on the Mode Directed Inverse Entailment approach to ILP, where a bottom clause is generated for each example and the generated clauses are subsets of the literals of that bottom clause. We propose to store in a prefix-tree all clauses that can be generated from all bottom clauses, together with some extra information. We show that this information is sufficient to estimate the number of examples that can be deduced from a clause, and we present an ILP algorithm that exploits this representation. We also present an extension of the algorithm where each prefix-tree is computed only once (compiled) per example. The evaluation of hypotheses requires only basic and efficient operations on trees. This proposal avoids re-computation of a hypothesis' value in theory-level search, in cross-validation evaluation procedures and in parameter tuning. Both proposals are empirically evaluated on real applications and considerable speedups were observed.
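
As a rough illustration of the prefix-tree idea in this abstract, the sketch below stores literal subsets of each example's bottom clause in a trie and reads a coverage estimate off the node reached by a clause. It is not the authors' implementation. Several simplifications are assumptions made only for the example: literals are opaque strings with no variable bindings, clauses are kept in sorted literal order, subsets are capped at a hypothetical `max_len`, and the names `TrieNode`, `build_trie` and `coverage` are invented here.

```python
from itertools import combinations


class TrieNode:
    """One node of the prefix tree; the path from the root spells a clause."""

    def __init__(self):
        self.children = {}     # literal -> TrieNode
        self.examples = set()  # ids of examples whose bottom clause contains this clause


def insert_clause(root, literals, example_id):
    """Insert one clause (a sorted tuple of literals) and tag it with an example id."""
    node = root
    for lit in literals:
        node = node.children.setdefault(lit, TrieNode())
    node.examples.add(example_id)


def build_trie(bottom_clauses, max_len):
    """Store every literal subset (up to max_len literals) of every bottom clause.

    bottom_clauses maps an example id to the set of literals in its bottom
    clause. Treating literals as plain strings is an assumption of this sketch.
    """
    root = TrieNode()
    for example_id, bottom in bottom_clauses.items():
        ordered = sorted(bottom)
        for n in range(1, max_len + 1):
            for subset in combinations(ordered, n):
                insert_clause(root, subset, example_id)
    return root


def coverage(root, clause):
    """Estimate how many examples a clause covers using only a trie lookup."""
    node = root
    for lit in sorted(clause):
        node = node.children.get(lit)
        if node is None:
            return 0
    return len(node.examples)


# Hypothetical bottom clauses for three examples.
bottom_clauses = {
    "e1": {"a", "b", "c"},
    "e2": {"a", "c"},
    "e3": {"b"},
}
trie = build_trie(bottom_clauses, max_len=2)
print(coverage(trie, {"a", "c"}))
```

Because the trie is built once, later clause evaluations reduce to a path walk rather than a call to a theorem prover, which is the saving the abstract describes. Enumerating subsets grows quickly with bottom-clause size, so the length cap here stands in for whatever pruning a real system would apply.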


2006

A pipelined data-parallel algorithm for ILP

Authors
Fonseca, NA; Silva, F; Costa, VS; Camacho, R;

Publication
2005 IEEE INTERNATIONAL CONFERENCE ON CLUSTER COMPUTING (CLUSTER)

Abstract
The amount of data collected and stored in databases is growing considerably in almost all areas of human activity. Processing this amount of data is very expensive, both in human effort and computationally. This justifies the increased interest both in the automatic discovery of useful knowledge from databases and in using parallel processing for this task. Multi-Relational Data Mining (MRDM) techniques, such as Inductive Logic Programming (ILP), can learn rules from relational databases consisting of multiple tables. However, current ILP systems are designed to run in main memory and can have long running times. We propose a pipelined data-parallel algorithm for ILP. The algorithm was implemented and evaluated on a commodity PC cluster with 8 processors. The results show that our algorithm yields excellent speedups, while preserving the quality of learning.


2006

April - An inductive logic programming system

Authors
Fonseca, NA; Silva, F; Camacho, R;

Publication
LOGICS IN ARTIFICIAL INTELLIGENCE, PROCEEDINGS

Abstract
Inductive Logic Programming (ILP) is a Machine Learning research field that has been quite successful in knowledge discovery in relational domains. ILP systems use a set of pre-classified examples (positive and negative) and prior knowledge to learn a theory in which positive examples succeed and the negative examples fail. In this paper we present a novel ILP system called April, capable of exploring several parallel strategies in distributed and shared memory machines.
