
Publications by Pavel Brazdil

2004

Improving progressive sampling via meta-learning on learning curves

Authors
Leite, R; Brazdil, P;

Publication
MACHINE LEARNING: ECML 2004, PROCEEDINGS

Abstract
This paper describes a method that can be seen as an improvement of the standard progressive sampling. The standard method uses samples of data of increasing size until the accuracy of the learned concept cannot be improved further. The issue we have addressed here is how to avoid using some of the samples in this progression. The paper presents a method for predicting the stopping point using a meta-learning approach. The method requires just four iterations of progressive sampling. The information gathered is used to identify the nearest learning curves, for which the sampling procedure was carried out fully. This in turn makes it possible to generate a prediction of the stopping point. Experimental evaluation shows that the method can lead to significant savings of time without significant losses of accuracy.
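The abstract's idea can be illustrated with a minimal sketch: run four iterations of progressive sampling, find the stored (fully sampled) learning curve closest to the partial one, and reuse that curve's observed stopping point. The function name, the squared-error distance, and all numbers below are illustrative assumptions, not the authors' actual procedure.

```python
def nearest_curve_stop(partial, full_curves):
    """Predict a stopping point from the nearest past learning curve.

    partial:     accuracies observed at the first four sample sizes.
    full_curves: list of (accuracies, stopping_index) pairs recorded on
                 past datasets where sampling was carried out fully.
    """
    def dist(curve):
        # Squared distance over the first len(partial) points (assumed metric).
        return sum((a - b) ** 2 for a, b in zip(partial, curve))

    best_curve, best_stop = min(full_curves, key=lambda fc: dist(fc[0]))
    return best_stop  # predicted stopping point (sample-size index)

# Fully sampled curves from past datasets, with their observed stopping points.
past = [
    ([0.60, 0.70, 0.75, 0.78, 0.80, 0.80], 4),
    ([0.55, 0.58, 0.60, 0.61, 0.61, 0.61], 3),
]

# Four iterations of progressive sampling on the new dataset.
print(nearest_curve_stop([0.61, 0.69, 0.74, 0.77], past))  # → 4
```

The first stored curve is nearest to the partial one, so sampling would stop at its stopping index instead of continuing until accuracy plateaus.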

2003

Improving progressive sampling via meta-learning

Authors
Leite, R; Brazdil, P;

Publication
PROGRESS IN ARTIFICIAL INTELLIGENCE

Abstract
We present a method that can be seen as an improvement of the standard progressive sampling method. The method exploits information concerning the performance of a given algorithm on past datasets, which is used to generate predictions of the stopping point. Experimental evaluation shows that the method can lead to significant time savings without significant losses in accuracy.

2010

Meta-Learning - Concepts and Techniques

Authors
Vilalta, R; Carrier, CGG; Brazdil, P;

Publication
Data Mining and Knowledge Discovery Handbook, 2nd ed.

Abstract

2006

On the behavior of SVM and some older algorithms in binary text classification tasks

Authors
Colas, F; Brazdil, P;

Publication
TEXT, SPEECH AND DIALOGUE, PROCEEDINGS

Abstract
Document classification has already been widely studied. In fact, some studies compared feature selection techniques or feature space transformations, whereas others compared the performance of different algorithms. Recently, following the rising interest in the Support Vector Machine, various studies have shown that the SVM outperforms other classification algorithms. So should we simply disregard other classification algorithms and always opt for SVM? We decided to investigate this issue and compared SVM to kNN and naive Bayes on binary classification tasks. An important issue is to compare optimized versions of these algorithms, which is what we have done. Our results show that all the classifiers achieved comparable performance on most problems. One surprising result is that SVM was not a clear winner, despite quite good overall performance. If suitable preprocessing is used with kNN, this algorithm continues to achieve very good results and scales up well with the number of documents, which is not the case for SVM. As for naive Bayes, it also achieved good performance.

2009

Learning cost-sensitive decision trees to support medical diagnosis

Authors
Freitas, A; Costa Pereira, A; Brazdil, P;

Publication
Complex Data Warehousing and Knowledge Discovery for Advanced Retrieval Development: Innovative Methods and Applications

Abstract
Classification plays an important role in medicine, especially for medical diagnosis. Real-world medical applications often require classifiers that minimize the total cost, including costs for wrong diagnoses (misclassification costs) and diagnostic test costs (attribute costs). There are indeed many reasons for considering costs in medicine, as diagnostic tests are not free and health budgets are limited. In this chapter, the authors have defined strategies for cost-sensitive learning. They have developed an algorithm for decision tree induction that considers various types of costs, including test costs, delayed costs, and costs associated with risk. They have then applied their strategy to train and evaluate cost-sensitive decision trees on medical data. The generated trees can be tested following several strategies, including group costs, common costs, and individual costs. Using the factor of "risk" it is possible to penalize invasive or delayed tests and obtain patient-friendly decision trees. © 2010, IGI Global.
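The trade-off described in the abstract, weighing a test's cost (penalized by a risk factor for invasive tests) against the misclassification cost it is expected to avoid, can be sketched as follows. This is an illustrative toy decision rule under assumed names and numbers, not the authors' induction algorithm.

```python
def worth_testing(test_cost, risk_factor, error_reduction, misclass_cost):
    """Decide whether ordering a diagnostic test pays off.

    test_cost:       monetary cost of performing the test (attribute cost).
    risk_factor:     multiplier >= 1 penalizing invasive or delayed tests.
    error_reduction: fraction of misclassifications the test is expected
                     to eliminate.
    misclass_cost:   cost of one wrong diagnosis.
    """
    adjusted_cost = test_cost * risk_factor
    expected_saving = error_reduction * misclass_cost
    return adjusted_cost < expected_saving

# A cheap, non-invasive test removing 10% of errors that cost 500 each:
print(worth_testing(test_cost=20, risk_factor=1.0,
                    error_reduction=0.10, misclass_cost=500))  # → True

# The same test, but invasive: the risk factor inflates its cost past
# the expected saving, yielding a more patient-friendly "no test".
print(worth_testing(test_cost=20, risk_factor=5.0,
                    error_reduction=0.10, misclass_cost=500))  # → False
```

A tree-induction algorithm of the kind the chapter describes would apply a comparison like this at each candidate split, so that invasive or delayed tests are only selected when their diagnostic value clearly outweighs their risk-adjusted cost.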

2009

Cost-sensitive learning in medicine

Authors
Freitas, A; Brazdil, P; Costa Pereira, A;

Publication
Data Mining and Medical Knowledge Management: Cases and Applications

Abstract
This chapter introduces cost-sensitive learning and its importance in medicine. Health managers and clinicians often need models that try to minimize several types of costs associated with healthcare, including attribute costs (e.g. the cost of a specific diagnostic test) and misclassification costs (e.g. the cost of a false negative test). In fact, as in other professional areas, both diagnostic tests and their associated misclassification errors can have significant financial or human costs, including the use of unnecessary resources and patient safety issues. This chapter presents some concepts related to cost-sensitive learning and cost-sensitive classification and their application to medicine. Different types of costs are also presented, with an emphasis on diagnostic test and misclassification costs. In addition, an overview of research in the area of cost-sensitive learning is given, including current methodological approaches. Finally, current methods for the cost-sensitive evaluation of classifiers are discussed. © 2009, IGI Global.
