Publications - INESC TEC

Publications by Rui Camacho

2011

From Sequences to Papers: An Information Retrieval Exercise

Authors
Gonçalves, CT; Camacho, R; Oliveira, EC;

Publication
2011 IEEE 11th International Conference on Data Mining Workshops (ICDMW), Vancouver, BC, Canada, December 11, 2011

Abstract
Whenever new DNA or protein sequences are decoded, it is almost compulsory to look at similar sequences and at the papers describing them, both to collect relevant information about the function and activity of the new sequences and to learn what is already known about similar sequences that might help explain the function or activity of the newly discovered ones. In current sequence websites and databases, each sequence is usually linked to a set of paper references. These links are very useful because the papers describe relevant information about the sequences, so they are a good starting point when looking for information related to a set of sequences. One way to implement such an approach is to run BLAST with the newly decoded sequences, collect the similar sequences, and then look at the papers linked to them. Most often, however, the number of papers retrieved this way is small, and one has to search large databases for further relevant papers. In this paper we propose a process that builds a classifier from the initial set of relevant papers directly linked to the retrieved similar sequences, and then uses that classifier to automatically enlarge the set of relevant papers by searching MEDLINE. We have empirically evaluated our proposal and report very promising results. © 2011 IEEE.
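
The abstract does not name the classifier that is used. As an illustration only, the Python sketch below uses a multinomial naive Bayes text classifier trained on the seed papers linked to the similar sequences against a set of background papers. This is an assumption: the abstract does not say where negative examples come from. A plain list of candidate abstracts stands in for MEDLINE search results, and all function names are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase alphanumeric tokens; a real system would also drop stop words and stem.
    return re.findall(r"[a-z0-9]+", text.lower())


def train_nb(docs_by_class):
    """Hypothetical helper: train a multinomial naive Bayes model.

    docs_by_class maps a class label to a list of raw texts.
    Returns (log_priors, log_likelihoods, vocabulary).
    """
    vocab = set()
    counts = {}
    for label, docs in docs_by_class.items():
        c = Counter()
        for d in docs:
            c.update(tokenize(d))
        counts[label] = c
        vocab.update(c)
    total_docs = sum(len(docs) for docs in docs_by_class.values())
    log_priors = {lbl: math.log(len(docs) / total_docs)
                  for lbl, docs in docs_by_class.items()}
    log_like = {}
    for label, c in counts.items():
        denom = sum(c.values()) + len(vocab)  # Laplace (add-one) smoothing
        log_like[label] = {w: math.log((c[w] + 1) / denom) for w in vocab}
    return log_priors, log_like, vocab


def classify(model, text):
    log_priors, log_like, vocab = model
    scores = {}
    for label, prior in log_priors.items():
        s = prior
        for w in tokenize(text):
            if w in vocab:  # words never seen in training are ignored
                s += log_like[label][w]
        scores[label] = s
    return max(scores, key=scores.get)


def expand_relevant_set(seed_papers, background_papers, candidates):
    """Hypothetical helper: keep the candidate abstracts classified as relevant."""
    model = train_nb({"relevant": seed_papers, "other": background_papers})
    return [doc for doc in candidates if classify(model, doc) == "relevant"]
```

For instance, with seed texts about protease active sites and background texts about hospital surveys, the candidate "novel protease active site inhibitor" is kept and "survey of hospital patients" is discarded.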

2007

Distributed generative data mining

Authors
Ramos, R; Camacho, R;

Publication
Advances in Data Mining: Theoretical Aspects and Applications, Proceedings

Abstract
A Knowledge Discovery in Databases (KDD) process involving large amounts of data requires considerable computational power. The process may be run on dedicated, expensive machinery or, for some tasks, with distributed computing techniques on a network of affordable machines. In either approach, the user usually specifies the workflow of the sub-tasks that make up the whole KDD process before execution starts. In this paper we propose a technique that we call Distributed Generative Data Mining. The technique is generative because it can generate new sub-tasks of the Data Mining (DM) analysis process at execution time; the workflow of DM sub-tasks is, therefore, dynamic. To deploy the proposed technique we extended the Distributed Data Mining system HARVARD and adapted an Inductive Logic Programming system (IndLog) used in a Relational Data Mining task. As a proof of concept, the extended system was used to analyse an artificial credit-scoring dataset with eighty million records.
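
A minimal sketch of the generative idea, in which sub-tasks are created while the workflow runs rather than being specified beforehand. It uses Python's `concurrent.futures` thread pool on a single machine instead of a network of machines, and it does not reflect HARVARD's actual interface; `run_generative` and `make_task` are hypothetical names. Each task returns a result together with any new tasks it decides to spawn. Here a task over too many records splits itself into two sub-tasks, and small tasks count hypothetical `"bad"` labels in a toy credit-scoring record list.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def run_generative(initial_tasks, max_workers=4):
    """Hypothetical scheduler (not the HARVARD API).

    Each task is a zero-argument callable returning (result, new_tasks),
    where new_tasks is a list of further callables. The workflow is not
    fixed in advance: it grows as finished tasks report new work.
    """
    results = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        pending = {pool.submit(t) for t in initial_tasks}
        while pending:
            done, pending = wait(pending, return_when=FIRST_COMPLETED)
            for fut in done:
                result, new_tasks = fut.result()
                if result is not None:
                    results.append(result)
                # Sub-tasks generated at execution time are scheduled immediately.
                for t in new_tasks:
                    pending.add(pool.submit(t))
    return results


def make_task(records, lo, hi, chunk=4):
    """Hypothetical task over records[lo:hi] that may split itself."""
    def task():
        if hi - lo > chunk:
            mid = (lo + hi) // 2
            # Too much data for one task: generate two sub-tasks instead.
            return None, [make_task(records, lo, mid, chunk),
                          make_task(records, mid, hi, chunk)]
        # Small enough: do the analysis step (here, count "bad" credit records).
        return sum(1 for r in records[lo:hi] if r == "bad"), []
    return task
```

With `records = ["good", "bad"] * 10`, `sum(run_generative([make_task(records, 0, len(records))]))` returns 10, regardless of the order in which the sub-tasks finish.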

2005

Topic 5 - Parallel and Distributed Databases, Data Mining and Knowledge Discovery

Authors
Talia, D; Kargupta, H; Valduriez, P; Camacho, R;

Publication
Euro-Par 2005, Parallel Processing, 11th International Euro-Par Conference, Lisbon, Portugal, August 30 - September 2, 2005, Proceedings

2003

Improving the efficiency of ILP systems

Authors
Camacho, R;

Publication
Progress in Artificial Intelligence

Abstract
Inductive Logic Programming (ILP) is a promising technology for knowledge extraction applications. ILP has produced intelligible solutions in a wide variety of domains where it has been applied. Its lack of efficiency, however, is a major impediment to scaling it up to applications requiring large amounts of data. In this paper we propose a set of techniques that improve the efficiency of ILP systems and make them more likely to scale up to knowledge extraction from large datasets. We propose and evaluate the lazy evaluation of examples to improve the efficiency of ILP systems. Lazy evaluation is essentially a way to avoid or postpone the evaluation of the generated hypotheses (coverage tests). The techniques were evaluated using the IndLog system on ILP datasets referenced in the literature. The proposals lead to substantial efficiency improvements and are generally applicable to any ILP system.
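
The abstract describes lazy evaluation only as avoiding or postponing coverage tests. The Python sketch below illustrates one simple instance of that idea under stated assumptions; it is not IndLog's implementation. A hypothesis is modeled as a Python predicate standing in for a Prolog coverage test, and the acceptance thresholds `min_pos` and `max_neg` are hypothetical parameters. Negative examples are tested only once a hypothesis is known to cover enough positives, and each loop stops as soon as the remaining tests cannot change the outcome.

```python
def evaluate_lazily(hypothesis, positives, negatives, min_pos, max_neg):
    """Hypothetical evaluator (not IndLog's code).

    hypothesis: predicate example -> bool, standing in for a coverage test.
    Accept iff it covers at least min_pos positives and at most max_neg negatives.
    Returns (accepted, number_of_coverage_tests_run).
    """
    tests = 0
    pos_covered = 0
    for i, ex in enumerate(positives):
        tests += 1
        if hypothesis(ex):
            pos_covered += 1
        remaining = len(positives) - i - 1
        # Even if every remaining positive were covered, min_pos is out of reach.
        if pos_covered + remaining < min_pos:
            return False, tests
    if pos_covered < min_pos:  # also handles an empty positive set
        return False, tests
    # Negative coverage tests are postponed until the positive threshold is met.
    neg_covered = 0
    for ex in negatives:
        tests += 1
        if hypothesis(ex):
            neg_covered += 1
            # Too many negatives already covered: the remaining tests are skipped.
            if neg_covered > max_neg:
                return False, tests
    return True, tests
```

With the even numbers 0 to 18 as positives, the odd numbers 1 to 19 as negatives, `min_pos=5` and `max_neg=1`, the hypothesis `lambda x: x > 15` is rejected after 6 of the 20 possible coverage tests.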

2004

Preface

Authors
Camacho, R; King, R; Srinivasan, A;

Publication
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

1994

Building symbolic representations of intuitive real-time skills from performance data

Authors
Michie, D; Camacho, R;

Publication
Machine Intelligence 13