2000
Authors
Soares, C; Brazdil, PB;
Publication
LECTURE NOTES IN COMPUTER SCIENCE
Abstract
Given the wide variety of available classification algorithms and the volume of data today's organizations need to analyze, selecting the right algorithm for a new problem is an important issue. In this paper we present a combination of techniques to address this problem. The first one, zooming, analyzes a given dataset and selects relevant (similar) datasets that were processed by the candidate algorithms in the past. This process is based on the concept of distance, calculated on the basis of several dataset characteristics. The information about the performance of the candidate algorithms on the selected datasets is then processed by a second technique, a ranking method. Such a method uses performance information to generate advice in the form of a ranking, indicating which algorithms should be applied and in which order. Here we propose the adjusted ratio of ratios ranking method. This method takes into account not only the accuracy but also the time performance of the candidate algorithms. The generalization power of this ranking method is analyzed. For this purpose, an appropriate methodology is defined. The experimental results indicate that, on average, better results are obtained with zooming than without it.
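The two steps described in this abstract can be illustrated with a minimal Python sketch. The abstract does not state the distance function or the ranking formula, so everything concrete here is an assumption: `zoom` uses a range-normalised L1 distance over dataset characteristics, and `arr` divides the ratio of success rates by a term that grows with the log ratio of run times, weighted by a hypothetical trade-off parameter `acc_d`. The function names and the data layout are illustrative only.

```python
import math

def zoom(target, datasets, k):
    """Return the names of the k stored datasets closest to `target`.

    target:   list of dataset characteristics (floats)
    datasets: dict mapping dataset name -> list of characteristics
    Distance is a range-normalised L1 distance (an assumption).
    """
    names = list(datasets)
    ranges = []
    for j in range(len(target)):
        vals = [datasets[n][j] for n in names] + [target[j]]
        spread = max(vals) - min(vals)
        ranges.append(spread if spread > 0 else 1.0)  # avoid division by zero

    def dist(name):
        return sum(abs(datasets[name][j] - target[j]) / ranges[j]
                   for j in range(len(target)))

    return sorted(names, key=dist)[:k]

def arr(sr_p, sr_q, t_p, t_q, acc_d=0.1):
    """Assumed ratio-of-ratios score of algorithm p against q on one dataset.

    sr_*: success rates, t_*: run times. acc_d is a hypothetical knob:
    how much accuracy one would trade for a tenfold speed-up.
    """
    return (sr_p / sr_q) / (1.0 + acc_d * math.log10(t_p / t_q))

def rank_algorithms(perf, selected, acc_d=0.1):
    """Rank algorithms (best first) on the selected datasets.

    perf: dict dataset -> dict algorithm -> (success_rate, time).
    Needs at least two algorithms. Pairwise scores are aggregated with a
    geometric mean over datasets, then an arithmetic mean over opponents.
    """
    algos = sorted(perf[selected[0]])
    scores = {}
    for p in algos:
        pairwise = []
        for q in algos:
            if q == p:
                continue
            logs = [math.log(arr(perf[d][p][0], perf[d][q][0],
                                 perf[d][p][1], perf[d][q][1], acc_d))
                    for d in selected]
            pairwise.append(math.exp(sum(logs) / len(logs)))
        scores[p] = sum(pairwise) / len(pairwise)
    return sorted(algos, key=lambda a: -scores[a])
```

In use, `zoom` would first pick the neighbours of the new dataset, and `rank_algorithms` would then be called with only those neighbours as `selected`. Setting `acc_d=0` ignores time and ranks on accuracy alone.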
2012
Authors
Gomes, T; Miranda, P; Prudencio, R; Soares, C; Carvalho, A;
Publication
CEUR Workshop Proceedings
Abstract
In this article we investigate the combination of meta-learning and optimization algorithms for parameter selection. We discuss our general proposal and present recent developments and experiments performed using Support Vector Machines (SVMs). Meta-learning was combined with single- and multi-objective optimization techniques to select SVM parameters. The hybrid methods derived from the proposal achieved better predictive accuracy than traditional optimization techniques.
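One way such a hybrid could work is sketched below, as an assumption rather than the authors' procedure: meta-learning proposes starting parameter settings taken from the most similar past problems, and a simple search then refines them. The abstract does not say how the two parts are coupled. The function names, the hill-climbing search, and the `objective` callback (standing in for a cross-validated SVM error over, say, log2 C and log2 gamma) are all hypothetical.

```python
import random

def suggest_starts(target_meta, history, k=2):
    """Return the best-known parameters of the k most similar past problems.

    history: list of (meta_features, best_params) pairs from earlier runs.
    Similarity is squared Euclidean distance on meta-features (assumption).
    """
    def dist(entry):
        return sum((a - b) ** 2 for a, b in zip(entry[0], target_meta))
    return [params for _, params in sorted(history, key=dist)[:k]]

def local_search(objective, starts, steps=50, scale=0.5, seed=0):
    """Hill-climb from the best suggested start; lower objective is better.

    A stand-in for the optimization component: any single-objective
    optimizer seeded with `starts` could take its place.
    """
    rng = random.Random(seed)
    best = min(starts, key=objective)
    best_val = objective(best)
    for _ in range(steps):
        cand = tuple(x + rng.gauss(0, scale) for x in best)
        val = objective(cand)
        if val < best_val:  # keep only improvements
            best, best_val = cand, val
    return best, best_val
```

The point of the seeding step is that the search begins near settings that worked on related problems instead of at a random or default point.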
2012
Authors
Pinto, F; Soares, C;
Publication
CEUR Workshop Proceedings
Abstract
Companies are moving from developing a single model for a problem (e.g., a regression model to predict general sales) to developing several models for sub-problems of the original problem (e.g., regression models to predict sales of each of its product categories). Given the similarity between the sub-problems, the process of model development should not be independent. Information should be shared between processes. Different approaches can be used for that purpose, including metalearning (MtL) and transfer learning. In this work, we use MtL to predict the performance of a model based on the performance of models that were previously developed. Given that the sub-problems are related (e.g., the schemas of the data are the same), domain knowledge is used to develop the metafeatures that characterize them. The approach is applied to the development of models to predict sales of different product categories in a retail company from Portugal.
2010
Authors
Soares, C; Ghani, R;
Publication
Data Mining for Business Applications
Abstract
This chapter introduces the volume on Data Mining (DM) for Business Applications. The chapters in this book provide an overview of some of the major advances in the field, namely in terms of methodology and applications, both traditional and emerging. In this introductory paper, we provide a context for the rest of the book. The framework for discussing the contents of the book is the DM methodology, which is suitable both to organize and relate the diverse contributions of the chapters selected. The chapter closes with an overview of the chapters in the book to guide the reader.
2010
Authors
Torgo, L; Soares, C;
Publication
Data Mining for Business Applications
Abstract
This paper describes a methodology for the application of hierarchical clustering methods to the task of outlier detection. The methodology is tested on the problem of cleaning Official Statistics data. The goal is to detect erroneous foreign trade transactions in data collected by the Portuguese Institute of Statistics (INE). These transactions are a minority, but they still have an important impact on the statistics produced by the institute. The detection of these rare errors is a manual, time-consuming task. This type of task is usually constrained by a limited amount of available resources. Our proposal addresses this issue by producing a ranking of outlyingness that allows better management of the available resources by allocating them to the cases that are most different from the others and, thus, have a higher probability of being errors. Our method is based on the output of standard agglomerative hierarchical clustering algorithms, resulting in no significant additional computational costs. Our results show that it enables large savings by selecting a small subset of suspicious transactions for manual inspection, which nevertheless includes most of the erroneous transactions. In this study we compare our proposal to a state-of-the-art outlier ranking method (LOF) and show that our method achieves better results on this particular application. The results of our experiments are also competitive with previous results on the same data. Finally, the outcome of our experiments raises important questions about the method currently followed at INE for items with a small number of transactions.
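The idea of reading an outlier ranking off the merge history of agglomerative clustering can be sketched as follows. The abstract does not define the outlyingness score, so the one used here is an assumption: whenever two groups are merged, each member of the smaller group receives a score based on the size imbalance between the two groups, and a point keeps the highest score it ever receives. The sketch uses a naive single-linkage implementation in pure Python for self-containment. It is only practical for small inputs, and `outlier_ranking` is an illustrative name.

```python
def outlier_ranking(points):
    """Rank points by an outlyingness score from single-linkage merges.

    points: list of equal-length tuples of floats.
    Returns (ranking, scores): indices sorted from most to least
    outlying, and the per-point scores in [0, 1).
    """
    n = len(points)
    clusters = {i: [i] for i in range(n)}  # cluster id -> member indices
    scores = [0.0] * n

    def d(a, b):
        return sum((x - y) ** 2 for x, y in zip(points[a], points[b])) ** 0.5

    def linkage(ca, cb):
        # single linkage: distance between the closest pair of members
        return min(d(a, b) for a in clusters[ca] for b in clusters[cb])

    next_id = n
    while len(clusters) > 1:
        ids = list(clusters)
        ca, cb = min(((a, b) for i, a in enumerate(ids) for b in ids[i + 1:]),
                     key=lambda pair: linkage(*pair))
        small, large = sorted((clusters[ca], clusters[cb]), key=len)
        # Assumed score: a small group absorbed by a large one looks outlying.
        s = (len(large) - len(small)) / (len(large) + len(small))
        for i in small:
            scores[i] = max(scores[i], s)
        clusters[next_id] = clusters.pop(ca) + clusters.pop(cb)
        next_id += 1

    ranking = sorted(range(n), key=lambda i: -scores[i])
    return ranking, scores
```

Because the score is computed during the merges that the clustering performs anyway, it adds little work on top of the clustering itself, which matches the abstract's remark about computational cost. Inspecting only the top of `ranking` corresponds to sending a small subset of suspicious cases for manual review.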
2010
Authors
Soares, C; Ghani, R;
Publication
Abstract