2015
Authors
Abdulrahman, SM; Brazdil, P; Van Rijn, JN; Vanschoren, J;
Publication
CEUR Workshop Proceedings
Abstract
Identifying the best machine learning algorithm for a given problem continues to be an active area of research. In this paper we present a new method which exploits both meta-level information acquired in past experiments and active testing, an algorithm selection strategy. Active testing attempts to iteratively identify an algorithm whose performance will most likely exceed the performance of previously tried algorithms. The novel method described in this paper uses tests on smaller data samples to rank the most promising candidates, thus optimizing the schedule of experiments to be carried out. The experimental results show that this approach leads to considerably faster algorithm selection.
2015
Authors
Pinto, F; Soares, C; Brazdil, P;
Publication
INTELLIGENT DATA ANALYSIS
Abstract
Data Mining (DM) researchers often focus on the development and testing of models for a single decision (e.g., direct mailing, churn detection, etc.). In practice, however, multiple decisions that are not independent often have to be made simultaneously, and the best global solution is often not the combination of the best individual solutions. This problem can be addressed by searching for the overall best solution using optimization methods based on the predictions made by the DM models. We describe one case study where this approach was used to optimize the layout of a retail store in order to maximize predicted sales. A metaheuristic is used to search different hypotheses of space allocation for multiple product categories, guided by the predictions made by regression models that estimate the sales for each category based on the assigned space. We test three metaheuristics and three regression algorithms on this task. Results show that the Particle Swarm Optimization method, guided by models obtained with Random Forests and Support Vector Machines, obtains good results. We also provide insights about the relationship between the correctness of the regression models and the performance of the metaheuristics.
2015
Authors
van Rijn, JN; Abdulrahman, SM; Brazdil, P; Vanschoren, J;
Publication
Advances in Intelligent Data Analysis XIV
Abstract
One of the challenges in Machine Learning is to find a classifier and parameter settings that work well on a given dataset. Evaluating all possible combinations typically takes too much time, hence many solutions have been proposed that attempt to predict which classifiers are most promising to try. As the first recommended classifier is not always the correct choice, multiple recommendations should be made, making this a ranking problem rather than a classification problem. Even though this is a well-studied problem, there is currently no good way of evaluating such rankings. We advocate the use of Loss Time Curves, as used in the optimization literature. These visualize the amount of budget (time) needed to converge to an acceptable solution. We also investigate a method that utilizes the measured performances of classifiers on small samples of data to make such recommendations, and adapt it so that it works well in Loss Time space. Experimental results show that this method converges extremely fast to an acceptable solution.
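The idea of a loss-time curve can be illustrated with a minimal sketch. It assumes that loss is the gap between the best known accuracy and the best accuracy found so far, and that classifiers are run one after another in the recommended order; the function name `loss_time_curve` and the sample numbers are hypothetical, and the paper's exact definition may differ.

```python
from typing import List, Tuple


def loss_time_curve(
    runs: List[Tuple[float, float]], best_accuracy: float
) -> List[Tuple[float, float]]:
    """Return (elapsed_time, loss) points for a ranking of classifiers.

    runs: (runtime_seconds, accuracy) pairs, in the order the ranking
          recommends trying them (illustrative data, not from the paper).
    best_accuracy: accuracy of the best classifier known for the dataset.
    """
    curve = []
    elapsed = 0.0
    best_so_far = 0.0
    for runtime, accuracy in runs:
        elapsed += runtime                      # budget spent so far
        best_so_far = max(best_so_far, accuracy)
        curve.append((elapsed, best_accuracy - best_so_far))
    return curve


# Hypothetical ranking: three classifiers with runtimes and accuracies.
ranking = [(2.0, 0.80), (10.0, 0.75), (5.0, 0.90)]
curve = loss_time_curve(ranking, best_accuracy=0.90)
for elapsed, loss in curve:
    print(f"t={elapsed:5.1f}s  loss={loss:.2f}")
```

A ranking that places fast, accurate classifiers early drives the loss toward zero with less elapsed time, which is the property such a curve is meant to expose.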
2016
Authors
Kanda, J; de Carvalho, A; Hruschka, E; Soares, C; Brazdil, P;
Publication
NEUROCOMPUTING
Abstract
The Traveling Salesman Problem (TSP) is one of the most studied optimization problems. Various metaheuristics (MHs) have been proposed and investigated on many instances of this problem. It is widely accepted that the best MH varies for different instances. Ideally, one should be able to recommend the best MHs for a new TSP instance without having to execute them. However, this is a very difficult task. We address this task by using a meta-learning approach based on label ranking algorithms. These algorithms build a mapping that relates the characteristics of those instances (i.e., the meta-features) with the relative performance (i.e., the ranking) of MHs, based on (meta-)data extracted from TSP instances that have already been solved by those MHs. The success of this approach depends on the quality of the meta-features that describe the instances. In this work, we investigate four different sets of meta-features based on different measurements of the properties of TSP instances: edge and vertex measures, complex network measures, properties from the MHs, and subsampling landmarker properties. The models are investigated in four different TSP scenarios presenting symmetry and connection strength variations. The experimental results indicate that meta-learning models can accurately predict rankings of MHs for different TSP scenarios. Good solutions for the investigated TSP instances can be obtained from the prediction of rankings of MHs, regardless of the learning algorithm used at the meta level. The experimental results also show that the definition of the set of meta-features has an important impact on the quality of the solutions obtained.
2015
Authors
Trigo, L; Víta, M; Sarmento, R; Brazdil, P;
Publication
IC3K 2015 - Proceedings of the 7th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management
Abstract
We present an Information Retrieval tool that facilitates the task of a user searching for particular information of interest. Our system processes a given set of documents to produce a graph, where nodes represent documents and links represent the similarities between them. The aim is to offer the user a tool to navigate this space easily. It is possible to collapse and expand nodes. Our case study shows affinity groups based on the similarities of the text production of researchers. This goes beyond the already established communities revealed by co-authorship. The system characterizes the activity of each author by a set of automatically generated keywords and by membership of a particular affinity group. The importance of each author is highlighted visually by the size of the node, corresponding to the number of publications, and by different measures of centrality. Regarding the validation of the method, we analyse the impact of using different combinations of titles, abstracts and keywords on capturing the similarity between researchers.
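A document similarity graph of the kind described above can be sketched as follows. This is an illustration only: it assumes bag-of-words vectors, cosine similarity and a fixed threshold for creating links, none of which is confirmed by the abstract, and the names `similarity_graph`, `cosine` and the sample texts are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations
from typing import Dict, Tuple


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def similarity_graph(
    docs: Dict[str, str], threshold: float = 0.3
) -> Dict[Tuple[str, str], float]:
    """Link every pair of documents whose similarity reaches the threshold."""
    vectors = {name: Counter(text.lower().split()) for name, text in docs.items()}
    edges = {}
    for u, v in combinations(sorted(vectors), 2):
        score = cosine(vectors[u], vectors[v])
        if score >= threshold:  # assumed cut-off, chosen for illustration
            edges[(u, v)] = score
    return edges


# Hypothetical documents standing in for researchers' texts.
docs = {
    "a": "algorithm selection with metalearning",
    "b": "metalearning for algorithm ranking",
    "c": "retail store layout optimization",
}
edges = similarity_graph(docs)
print(edges)
```

Nodes that end up connected form candidate affinity groups; a real system would likely use weighted terms and a clustering or community-detection step on top of such a graph.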
2017
Authors
Vilalta, R; Giraud Carrier, CG; Brazdil, P; Soares, C;
Publication
Encyclopedia of Machine Learning and Data Mining