Publications

Publications by Carlos Manuel Soares

2003

Is the UCI repository useful for data mining?

Authors
Soares, C;

Publication
PROGRESS IN ARTIFICIAL INTELLIGENCE

Abstract
We propose a methodology to investigate the relevance for the real world of repositories of benchmark problems like the one commonly known as the UCI repository. It compares the distribution of relative performance of algorithms in data sets from a given repository and from the "real world". If the distributions are different, the knowledge about the relative performance of algorithms obtained from the repository in question is mostly useless. In the case of the UCI repository, this would mean that a significant proportion of published results would be of little practical use. However, this is not what our results indicate. We also propose an adaptation of this method to test whether tool developers are "overfitting" repositories, which also yields negative results in the UCI repository.


2001

Sampling-based relative landmarks: Systematically test-driving algorithms before choosing

Authors
Soares, C; Petrak, J; Brazdil, P;

Publication
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

Abstract
When facing the need to select the most appropriate algorithm to apply to a new data set, data analysts often follow an approach which can be related to test-driving cars to decide which one to buy: apply the algorithms on a sample of the data to quickly obtain rough estimates of their performance. These estimates are used to select one or a few of those algorithms to be tried out on the full data set. We describe sampling-based landmarks (SL), a systematization of this approach, building on earlier work on landmarking and sampling. SL are estimates of the performance of algorithms on a small sample of the data that are used as predictors of the performance of those algorithms on the full set. We also describe relative landmarks (RL), which address the inability of earlier landmarks to assess relative performance of algorithms. RL aggregate landmarks to obtain predictors of relative performance. Our experiments indicate that the combination of these two improvements, which we call Sampling-based Relative Landmarks, is better for ranking than traditional data characterization measures. © Springer-Verlag Berlin Heidelberg 2001.
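The test-driving idea in this abstract can be sketched roughly as follows. This is only an illustration under stated assumptions: the two toy classifiers, the synthetic data set, the two-thirds train/test split, and the use of simple ranks to express relative performance are all hypothetical choices made here, not the aggregation used in the paper.

```python
import random


def majority_class(train, test):
    """Accuracy of always predicting the most frequent training label."""
    labels = [y for _, y in train]
    guess = max(set(labels), key=labels.count)
    return sum(y == guess for _, y in test) / len(test)


def one_nearest_neighbour(train, test):
    """Accuracy of a 1-nearest-neighbour classifier on a single numeric feature."""
    def predict(x):
        return min(train, key=lambda pair: abs(pair[0] - x))[1]
    return sum(predict(x) == y for x, y in test) / len(test)


def sampling_landmarks(data, algorithms, sample_size, seed=0):
    """Estimate each algorithm's accuracy on a small random sample of the data."""
    sample = random.Random(seed).sample(data, sample_size)
    cut = (2 * sample_size) // 3  # two thirds for training, the rest for testing
    train, test = sample[:cut], sample[cut:]
    return {name: algo(train, test) for name, algo in algorithms.items()}


def relative_landmarks(landmarks):
    """Turn the per-algorithm estimates into ranks (1 = best estimate).

    Ranking is an assumption of this sketch; ties are broken by dict order.
    """
    ordered = sorted(landmarks, key=landmarks.get, reverse=True)
    return {name: position for position, name in enumerate(ordered, start=1)}


# Hypothetical data set: one feature in [0, 1), label 1 when the feature is >= 0.5.
data = [(i / 200, int(i >= 100)) for i in range(200)]
algorithms = {"majority": majority_class, "1-nn": one_nearest_neighbour}

landmarks = sampling_landmarks(data, algorithms, sample_size=30)
ranks = relative_landmarks(landmarks)
print(landmarks, ranks)
```

The ranks obtained on the 30-example sample would then serve as a cheap predictor of which algorithm to try first on all 200 examples.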


2002

A comparative study of some issues concerning algorithm recommendation using ranking methods

Authors
Soares, C; Brazdil, P;

Publication
ADVANCES IN ARTIFICIAL INTELLIGENCE - IBERAMIA 2002, PROCEEDINGS

Abstract
Cross-validation (CV) is the most accurate method available for algorithm recommendation but it is rather slow. We show that information about the past performance of algorithms can be used for the same purpose with a small loss in accuracy and significant savings in experimentation time. We use a meta-learning framework that combines a simple IBL algorithm with a ranking method. We show that results improve significantly by using a set of selected measures that represent data characteristics and make it possible to predict algorithm performance. Our results also indicate that the choice of ranking method has a smaller effect on the quality of recommendations. Finally, we present situations that illustrate the advantage of providing recommendation as a ranking of the candidate algorithms, rather than as the single algorithm which is expected to perform best.
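A minimal sketch of how an instance-based learner can be combined with a ranking method is given below. It assumes Euclidean distance over the data characteristics and average ranks as the ranking method; both are illustrative choices, and the function name, algorithm names, and meta-data values are hypothetical.

```python
import math


def recommend_ranking(new_features, meta_data, k=3):
    """Rank algorithms for a new data set from its k most similar past data sets.

    meta_data is a list of (features, ranks) pairs, where features is a list
    of data characteristics and ranks maps algorithm name -> rank (1 = best)
    observed on that past data set.
    """
    # Instance-based step: find the k past data sets closest to the new one.
    neighbours = sorted(
        meta_data, key=lambda entry: math.dist(new_features, entry[0])
    )[:k]

    # Ranking step: average each algorithm's rank over those neighbours.
    algorithm_names = list(neighbours[0][1])
    mean_rank = {
        name: sum(ranks[name] for _, ranks in neighbours) / len(neighbours)
        for name in algorithm_names
    }
    # A lower mean rank means the algorithm is recommended earlier.
    return sorted(algorithm_names, key=lambda name: mean_rank[name])


# Hypothetical meta-data: two characteristics per data set, three algorithms.
past = [
    ([0.10, 0.90], {"tree": 1, "knn": 2, "bayes": 3}),
    ([0.20, 0.80], {"tree": 1, "knn": 3, "bayes": 2}),
    ([0.15, 0.70], {"tree": 2, "knn": 1, "bayes": 3}),
    ([0.90, 0.10], {"tree": 3, "knn": 2, "bayes": 1}),
]
recommendation = recommend_ranking([0.15, 0.85], past, k=3)
print(recommendation)
```

Returning the whole ordering rather than only its first element lets a user fall back to the second-ranked algorithm when the first one is unavailable or too costly to run.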


2005

A weighted rank measure of correlation

Authors
Da Costa, JP; Soares, C;

Publication
AUSTRALIAN & NEW ZEALAND JOURNAL OF STATISTICS

Abstract
Spearman's rank correlation coefficient is not entirely suitable for measuring the correlation between two rankings in some applications because it treats all ranks equally. In 2000, Blest proposed an alternative measure of correlation that gives more importance to higher ranks but has some drawbacks. This paper proposes a weighted rank measure of correlation that weights the distance between two ranks using a linear function of those ranks, giving more importance to higher ranks than lower ones. It analyses its distribution and provides a table of critical values to test whether a given value of the coefficient is significantly different from zero. The paper also summarizes a number of applications for which the new measure is more suitable than Spearman's.
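To illustrate the idea in this abstract, the sketch below computes a rank correlation in which each squared rank difference is weighted by a linear function of the two ranks, so that disagreements near the top count more. The function name and the normalizing constant n^4 + n^3 - n^2 - n are assumptions of this sketch rather than details stated in the abstract; the constant was chosen so that identical rankings give 1 and exactly reversed rankings give -1. Rank 1 is taken to be the highest rank.

```python
def weighted_rank_correlation(r, q):
    """Rank correlation that penalizes disagreements at the top more heavily.

    r and q are two rankings of the same n items, each a permutation of
    1..n, with rank 1 treated as the most important position.
    """
    n = len(r)
    if n != len(q) or n < 2:
        raise ValueError("rankings must have the same length (at least 2)")
    expected = list(range(1, n + 1))
    if sorted(r) != expected or sorted(q) != expected:
        raise ValueError("each ranking must be a permutation of 1..n")

    # Squared rank distance, weighted linearly: items ranked near the top
    # (small rank numbers) in either ranking receive the larger weights.
    weighted_sum = sum(
        (ri - qi) ** 2 * ((n - ri + 1) + (n - qi + 1))
        for ri, qi in zip(r, q)
    )
    # Assumed normalization: identical rankings give 1, reversed ones give -1.
    return 1 - 6 * weighted_sum / (n**4 + n**3 - n**2 - n)


reference = [1, 2, 3, 4, 5]
top_swap = weighted_rank_correlation(reference, [2, 1, 3, 4, 5])
bottom_swap = weighted_rank_correlation(reference, [1, 2, 3, 5, 4])
print(top_swap, bottom_swap)
```

Swapping the top two items lowers the coefficient more than swapping the bottom two, whereas Spearman's coefficient assigns the same value to both swaps.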


2012

Meta-learning for periodic algorithm selection in time-changing data

Authors
Rossi, ALD; Carvalho, ACPLF; Soares, C;

Publication
Proceedings - Brazilian Symposium on Neural Networks, SBRN

Abstract
When users have to choose a learning algorithm to induce a model for a given dataset, a common practice is to select an algorithm whose bias suits the data distribution. In real-world applications that produce data continuously, this distribution may change over time. Thus, a learning algorithm with an adequate bias for a dataset may become unsuitable for new data following a different distribution. In this paper we present a meta-learning approach for periodic algorithm selection when the data distribution may change over time. This approach exploits the knowledge obtained from the induction of models for different data chunks to improve the general predictive performance. It periodically applies a meta-classifier to predict the most appropriate learning algorithm for new unlabeled data. Characteristics extracted from past and incoming data, together with the predictive performance of different models, constitute the meta-data, which is used to induce this meta-classifier. Experimental results using data from a travel time prediction problem show its ability to improve the general performance of the learning system. The proposed approach can be applied to other time-changing tasks, since it is domain independent. © 2012 IEEE.


2012

Combining meta-learning with multi-objective particle swarm algorithms for svm parameter selection: An experimental analysis

Authors
Miranda, PBC; Prudencio, RBC; Carvalho, ACPLF; Soares, C;

Publication
Proceedings - Brazilian Symposium on Neural Networks, SBRN

Abstract
Support Vector Machines (SVMs) have become a successful technique owing to the good performance they achieve on different learning problems. However, SVM performance depends on the adjustment of its parameter values. Automatic SVM parameter selection is treated by many authors as an optimization problem whose goal is to find a suitable configuration of parameters for a given learning problem. This work performs a comparative study of combining Meta-Learning (ML) and Multi-Objective Particle Swarm Optimization (MOPSO) techniques for the SVM parameter selection problem. In this combination, configurations of parameters provided by ML are adopted as initial search points of the MOPSO techniques. Our hypothesis is that starting the search with reasonable solutions will speed up the process performed by the MOPSO techniques. In our work, we implemented three MOPSO techniques applied to select two SVM parameters for classification. Our aim is to optimize the SVMs by seeking configurations of parameters which maximize the success rate and minimize the number of support vectors (i.e., two objective functions). In the experiments, the performance of the search algorithms using a traditional random initialization was compared to the performance achieved by initializing the search process using the ML suggestions. We verified that the combination of the techniques with ML obtained higher-quality solutions on a set of 40 classification problems. © 2012 IEEE.
