Publications

Publications by Rui Camacho

2009

Improving the efficiency of inductive logic programming systems

Authors
Fonseca, NA; Costa, VS; Rocha, R; Camacho, R; Silva, F;

Publication
SOFTWARE-PRACTICE & EXPERIENCE

Abstract
Inductive logic programming (ILP) is a sub-field of machine learning that provides an excellent framework for multi-relational data mining applications. The advantages of ILP have been successfully demonstrated in complex and relevant industrial and scientific problems. However, to produce valuable models, ILP systems often require long running times and large amounts of memory. In this paper we address fundamental issues that have direct impact on the efficiency of ILP systems. Namely, we discuss how improvements in the indexing mechanisms of an underlying logic programming system benefit ILP performance. Furthermore, we propose novel data structures to reduce memory requirements and we suggest a new lazy evaluation technique to search the hypothesis space more efficiently. These proposals have been implemented in the April ILP system and evaluated using several well-known data sets. The results observed show significant improvements in running time without compromising the accuracy of the models generated. Indeed, the combined techniques achieve several order of magnitudes speedup in some data sets. Moreover, memory requirements are reduced in nearly half of the data sets. Copyright (C) 2008 John Wiley & Sons, Ltd.

CloseRead Abstract

2009

Parallel ILP for distributed-memory architectures

Authors
Fonseca, NA; Srinivasan, A; Silva, F; Camacho, R;

Publication
MACHINE LEARNING

Abstract
The growth of machine-generated relational databases, both in the sciences and in industry, is rapidly outpacing our ability to extract useful information from them by manual means. This has brought into focus machine learning techniques like Inductive Logic Programming (ILP) that are able to extract human-comprehensible models for complex relational data. The price to pay is that ILP techniques are not efficient: they can be seen as performing a form of discrete optimisation, which is known to be computationally hard; and the complexity is usually some super-linear function of the number of examples. While little can be done to alter the theoretical bounds on the worst-case complexity of ILP systems, some practical gains may follow from the use of multiple processors. In this paper we survey the state-of-the-art on parallel ILP. We implement several parallel algorithms and study their performance using some standard benchmarks. The principal findings of interest are these: (1) of the techniques investigated, one that simply constructs models in parallel on each processor using a subset of data and then combines the models into a single one, yields the best results; and (2) sequential (approximate) ILP algorithms based on randomized searches have lower execution times than (exact) parallel algorithms, without sacrificing the quality of the solutions found.

CloseRead Abstract

2003

Experimental evaluation of a caching technique for ILP

Authors
Fonseca, N; Costa, VS; Silva, F; Camacho, R;

Publication
PROGRESS IN ARTIFICIAL INTELLIGENCE

Abstract

2003

Efficient data structures for inductive logic programming

Authors
Fonseca, N; Rocha, R; Camacho, R; Silva, F;

Publication
INDUCTIVE LOGIC PROGRAMMING, PROCEEDINGS

Abstract
This work aims at improving the scalability of memory usage in Inductive Logic Programming systems. In this context, we propose two efficient data structures: the Trie, used to represent lists and clauses; and the RL-Tree, a novel data structure used to represent the clauses coverage. We evaluate their performance in the April system using well known datasets. Initial results show a substantial reduction in memory usage without incurring extra execution time overheads. Our proposal is applicable in any ILP system.

CloseRead Abstract

2004

On avoiding redundancy in inductive logic programming

Authors
Fonseca, N; Costa, VS; Silva, F; Camacho, R;

Publication
INDUCTIVE LOGIC PROGRAMMING, PROCEEDINGS

Abstract
ILP systems induce first-order clausal theories performing a search through very large hypotheses spaces containing redundant hypotheses. The generation of redundant hypotheses may prevent the systems from finding good models and increases the time to induce them. In this paper we propose a classification of hypotheses redundancy and show how expert knowledge can be provided to an ILP system to avoid it. Experimental results show that the number of hypotheses generated and execution time are reduced when expert knowledge is used to avoid redundancy.

CloseRead Abstract

2005

Strategies to parallelize ILP systems

Authors
Fonseca, NA; Silva, F; Camacho, R;

Publication
INDUCTIVE LOGIC PROGRAMMING, PROCEEDINGS

Abstract
It is well known by Inductive Logic Programming (ILP) practioners that ILP systems usually take a long time to find valuable models (theories). The problem is specially critical for large datasets, preventing ILP systems to scale up to larger applications. One approach to reduce the execution time has been the parallelization of ILP systems. In this paper we overview the state-of-the-art on parallel ILP implementations and present work on the evaluation of some major parallelization strategies for ILP. Conclusions about the applicability of each strategy are presented.

CloseRead Abstract