Publications - INESC TEC

Publications by Carlos Manuel Soares

2001

Sampling-based relative landmarks: Systematically test-driving algorithms before choosing

Authors
Soares, C; Petrak, J; Brazdil, P

Publication
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

Abstract
When facing the need to select the most appropriate algorithm to apply on a new data set, data analysts often follow an approach which can be related to test-driving cars to decide which one to buy: apply the algorithms on a sample of the data to quickly obtain rough estimates of their performance. These estimates are used to select one or a few of those algorithms to be tried out on the full data set. We describe sampling-based landmarks (SL), a systematization of this approach, building on earlier work on landmarking and sampling. SL are estimates of the performance of algorithms on a small sample of the data that are used as predictors of the performance of those algorithms on the full set. We also describe relative landmarks (RL), which address the inability of earlier landmarks to assess the relative performance of algorithms. RL aggregate landmarks to obtain predictors of relative performance. Our experiments indicate that the combination of these two improvements, which we call Sampling-based Relative Landmarks, is better for ranking than traditional data characterization measures. © Springer-Verlag Berlin Heidelberg 2001.
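The "test-drive" idea in this abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the two toy classifiers, the synthetic one-dimensional data set, the 50/50 split of the sample, and the use of plain ranks as the relative landmark. The abstract does not specify how the paper aggregates landmarks, so this is not the authors' procedure.

```python
import random

def majority_classifier(train):
    """Toy baseline: always predict the most frequent training label."""
    labels = [y for _, y in train]
    top = max(set(labels), key=labels.count)
    return lambda x: top

def nearest_neighbour_classifier(train):
    """Toy 1-NN on a single numeric feature."""
    return lambda x: min(train, key=lambda p: abs(p[0] - x))[1]

def accuracy(build, train, test):
    predict = build(train)
    return sum(predict(x) == y for x, y in test) / len(test)

def sampling_relative_landmarks(data, algorithms, sample_size, seed=0):
    """Estimate each algorithm on a small sample, then rank the estimates."""
    rng = random.Random(seed)
    sample = rng.sample(data, sample_size)
    half = sample_size // 2
    train, test = sample[:half], sample[half:]
    # Sampling-based landmarks: performance measured on the sample only.
    landmarks = {name: accuracy(build, train, test)
                 for name, build in algorithms.items()}
    # Relative landmarks (assumed form): rank 1 is the best estimate.
    ordered = sorted(landmarks, key=landmarks.get, reverse=True)
    ranks = {name: position + 1 for position, name in enumerate(ordered)}
    return landmarks, ranks

# Hypothetical data set: one feature, three classes in contiguous blocks.
data = [(x, x // 100) for x in range(300)]
algorithms = {"majority": majority_classifier,
              "1nn": nearest_neighbour_classifier}
landmarks, ranks = sampling_relative_landmarks(data, algorithms, sample_size=60)
print(landmarks, ranks)
```

The ranks, not the raw accuracies, would then serve as the input for deciding which algorithms to run on the full data set.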

2002

A comparative study of some issues concerning algorithm recommendation using ranking methods

Authors
Soares, C; Brazdil, P

Publication
Advances in Artificial Intelligence - IBERAMIA 2002, Proceedings

Abstract
Cross-validation (CV) is the most accurate method available for algorithm recommendation but it is rather slow. We show that information about the past performance of algorithms can be used for the same purpose with small loss in accuracy and significant savings in experimentation time. We use a meta-learning framework that combines a simple IBL algorithm with a ranking method. We show that results improve significantly by using a set of selected measures that represent data characteristics that make it possible to predict algorithm performance. Our results also indicate that the choice of ranking method has a smaller effect on the quality of recommendations. Finally, we present situations that illustrate the advantage of providing recommendation as a ranking of the candidate algorithms, rather than as the single algorithm which is expected to perform best.
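A minimal sketch of how an instance-based learner can be combined with a ranking method is given below. It assumes Euclidean distance over data set characteristics and a simple average of ranks over the nearest neighbours. The meta-features, data set entries, and algorithm names "A", "B", "C" are hypothetical, and the abstract does not say which ranking method or distance the paper uses.

```python
import math

def recommend_ranking(new_features, meta_data, k=3):
    """Rank algorithms by their mean rank on the k most similar data sets.

    meta_data is a list of (meta_features, {algorithm: rank}) pairs
    recorded from past experiments.
    """
    neighbours = sorted(
        meta_data, key=lambda entry: math.dist(new_features, entry[0])
    )[:k]
    algorithms = list(neighbours[0][1])
    mean_rank = {
        name: sum(ranks[name] for _, ranks in neighbours) / len(neighbours)
        for name in algorithms
    }
    # Lower mean rank means a better expected position.
    return sorted(algorithms, key=mean_rank.get)

# Hypothetical meta-data: two characteristics per data set.
meta_data = [
    ((0.1, 0.2), {"A": 1, "B": 2, "C": 3}),
    ((0.2, 0.1), {"A": 2, "B": 1, "C": 3}),
    ((0.2, 0.3), {"A": 1, "B": 3, "C": 2}),
    ((0.9, 0.8), {"A": 3, "B": 2, "C": 1}),
]
recommended = recommend_ranking((0.15, 0.2), meta_data, k=3)
print(recommended)
```

Returning the whole ordering rather than only its first element lets the user fall back to the second-ranked algorithm when the first is too costly to run.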

2005

A weighted rank measure of correlation

Authors
Da Costa, JP; Soares, C

Publication
Australian & New Zealand Journal of Statistics

Abstract
Spearman's rank correlation coefficient is not entirely suitable for measuring the correlation between two rankings in some applications because it treats all ranks equally. In 2000, Blest proposed an alternative measure of correlation that gives more importance to higher ranks but has some drawbacks. This paper proposes a weighted rank measure of correlation that weights the distance between two ranks using a linear function of those ranks, giving more importance to higher ranks than lower ones. It analyses its distribution and provides a table of critical values to test whether a given value of the coefficient is significantly different from zero. The paper also summarizes a number of applications for which the new measure is more suitable than Spearman's.
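The abstract does not state the formula. One form consistent with its description weights each squared rank difference by a linear function of the two ranks, so that disagreements near the top of the rankings cost more. The weights and the normalising constant in the sketch below are an assumption that should be checked against the paper. They are chosen so that identical rankings give 1 and reversed rankings give -1.

```python
def weighted_rank_correlation(r, q):
    """Weighted rank correlation between two rankings of the same n items.

    Assumed form:
        1 - 6 * sum((r_i - q_i)^2 * ((n - r_i + 1) + (n - q_i + 1)))
              / (n^4 + n^3 - n^2 - n)
    Ranks run from 1 (most important) to n.
    """
    n = len(r)
    if n != len(q) or n < 2:
        raise ValueError("rankings must have the same length, at least 2")
    total = sum(
        (ri - qi) ** 2 * ((n - ri + 1) + (n - qi + 1))
        for ri, qi in zip(r, q)
    )
    return 1 - 6 * total / (n ** 4 + n ** 3 - n ** 2 - n)

reference = [1, 2, 3, 4]
top_swap = weighted_rank_correlation(reference, [2, 1, 3, 4])
bottom_swap = weighted_rank_correlation(reference, [1, 2, 4, 3])
print(top_swap, bottom_swap)
```

Swapping the two top-ranked items lowers the coefficient more than swapping the two bottom-ranked items, whereas Spearman's coefficient would score both swaps identically.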

2012

Meta-learning for periodic algorithm selection in time-changing data

Authors
Rossi, ALD; Carvalho, ACPLF; Soares, C

Publication
Proceedings - Brazilian Symposium on Neural Networks, SBRN

Abstract
When users have to choose a learning algorithm to induce a model for a given dataset, a common practice is to select an algorithm whose bias suits the data distribution. In real-world applications that produce data continuously, this distribution may change over time. Thus, a learning algorithm with an adequate bias for a dataset may become unsuitable for new data following a different distribution. In this paper we present a meta-learning approach for periodic algorithm selection when the data distribution may change over time. This approach exploits the knowledge obtained from the induction of models for different data chunks to improve the general predictive performance. It periodically applies a meta-classifier to predict the most appropriate learning algorithm for new unlabeled data. Characteristics extracted from past and incoming data, together with the predictive performance from different models, constitute the meta-data, which is used to induce this meta-classifier. Experimental results using data from a travel time prediction problem show its ability to improve the general performance of the learning system. The proposed approach can be applied to other time-changing tasks, since it is domain independent. © 2012 IEEE.

2012

Combining meta-learning with multi-objective particle swarm algorithms for SVM parameter selection: An experimental analysis

Authors
Miranda, PBC; Prudencio, RBC; Carvalho, ACPLF; Soares, C

Publication
Proceedings - Brazilian Symposium on Neural Networks, SBRN

Abstract
Support Vector Machines (SVMs) have become a successful technique owing to the good performance they achieve on different learning problems. However, SVM performance depends on the adjustment of its parameter values. Automatic SVM parameter selection is treated by many authors as an optimization problem whose goal is to find a suitable configuration of parameters for a given learning problem. This work performs a comparative study of combining Meta-Learning (ML) and Multi-Objective Particle Swarm Optimization (MOPSO) techniques for the SVM parameter selection problem. In this combination, configurations of parameters provided by ML are adopted as initial search points of the MOPSO techniques. Our hypothesis is that starting the search with reasonable solutions will speed up the process performed by the MOPSO techniques. In our work, we implemented three MOPSO techniques applied to select two SVM parameters for classification. Our aim is to optimize the SVMs by seeking configurations of parameters which maximize the success rate and minimize the number of support vectors (i.e., two objective functions). In the experiments, the performance of the search algorithms using a traditional random initialization was compared to the performance achieved by initializing the search process using the ML suggestions. We verified that the combination of the techniques with ML obtained solutions with higher quality on a set of 40 classification problems. © 2012 IEEE.
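With two objectives there is usually no single best configuration, only a set of non-dominated trade-offs. The sketch below filters such a set for the two objectives named in the abstract: success rate, to be maximized, and number of support vectors, to be minimized. The candidate values are hypothetical, and the snippet shows only the dominance test, not any of the three MOPSO techniques studied in the paper.

```python
def dominates(a, b):
    """True if candidate a is at least as good as b on both objectives
    and strictly better on one. Candidates are
    (success_rate, n_support_vectors) pairs."""
    no_worse = a[0] >= b[0] and a[1] <= b[1]
    strictly_better = a[0] > b[0] or a[1] < b[1]
    return no_worse and strictly_better

def pareto_front(candidates):
    """Keep the candidates that no other candidate dominates."""
    return [
        c for c in candidates
        if not any(dominates(other, c) for other in candidates)
    ]

# Hypothetical evaluations of four SVM parameter configurations.
candidates = [(0.90, 120), (0.85, 60), (0.80, 80), (0.90, 150)]
front = pareto_front(candidates)
print(front)
```

Here the third configuration is dominated by the second, and the fourth by the first, so two trade-offs remain.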

2010

Combining meta-learning and search techniques to SVM parameter selection

Authors
Gomes, TAF; Prudencio, RBC; Soares, C; Rossi, ALD; Carvalho, A

Publication
Proceedings - 2010 11th Brazilian Symposium on Neural Networks, SBRN 2010

Abstract
Support Vector Machines (SVMs) have achieved very good performance on different learning problems. However, the success of SVMs depends on the adequate choice of a number of parameters, including for instance the kernel and the regularization parameters. In the current work, we propose the combination of Meta-Learning and search techniques for the problem of SVM parameter selection. Given an input problem, Meta-Learning is used to recommend SVM parameters based on parameters that were successful in previous similar problems. The parameters returned by Meta-Learning are then used as initial search points for a search technique which will perform a further exploration of the parameter space. In this combination, we envisioned that the initial solutions provided by Meta-Learning are located in good regions of the search space (i.e. they are closer to the optimum solutions). Hence, the search technique would need to evaluate a lower number of candidate search points in order to find an adequate solution. In our work, we implemented a prototype in which Particle Swarm Optimization (PSO) was used to select the values of two SVM parameters for regression problems. In the performed experiments, the proposed solution was compared to a PSO with random initialization, obtaining better average results on a set of 40 regression problems. © 2010 IEEE.
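The seeding idea can be sketched as a plain PSO loop whose initial particles are supplied by the caller instead of being drawn at random. This is a generic textbook PSO, not the authors' prototype. The quadratic `toy_error` function stands in for the validation error of an SVM over two parameters, and the `suggested` points stand in for Meta-Learning recommendations. Both are hypothetical, as are the inertia and acceleration constants.

```python
import random

def pso(objective, initial_positions, bounds, iterations=50, seed=0,
        inertia=0.7, cognitive=1.5, social=1.5):
    """Minimize objective, starting the swarm at initial_positions."""
    rng = random.Random(seed)
    dim = len(bounds)
    positions = [list(p) for p in initial_positions]
    velocities = [[0.0] * dim for _ in positions]
    personal_best = [list(p) for p in positions]
    personal_score = [objective(p) for p in positions]
    best_index = min(range(len(positions)), key=personal_score.__getitem__)
    global_best = list(personal_best[best_index])
    global_score = personal_score[best_index]
    for _ in range(iterations):
        for i, pos in enumerate(positions):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                velocities[i][d] = (
                    inertia * velocities[i][d]
                    + cognitive * r1 * (personal_best[i][d] - pos[d])
                    + social * r2 * (global_best[d] - pos[d])
                )
                low, high = bounds[d]
                # Move the particle and clamp it to the search box.
                pos[d] = min(max(pos[d] + velocities[i][d], low), high)
            score = objective(pos)
            if score < personal_score[i]:
                personal_best[i], personal_score[i] = list(pos), score
                if score < global_score:
                    global_best, global_score = list(pos), score
    return global_best, global_score

def toy_error(point):
    """Hypothetical stand-in for SVM validation error over two parameters."""
    first, second = point
    return (first - 1.0) ** 2 + (second + 2.0) ** 2

bounds = [(-5.0, 5.0), (-5.0, 5.0)]
# Hypothetical starting points, as if recommended from similar past problems.
suggested = [(0.5, -1.5), (1.5, -2.5), (0.0, -2.0)]
best_position, best_score = pso(toy_error, suggested, bounds)
print(best_position, best_score)
```

Replacing `suggested` with uniformly random points inside `bounds` gives the random-initialization baseline that the abstract compares against.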