Publications - INESC TEC

Publications by Rita Paula Ribeiro

2007

Utility-based regression

Authors
Torgo, L; Ribeiro, R

Publication
Knowledge Discovery in Databases: PKDD 2007, Proceedings

Abstract
Cost-sensitive learning is a key technique for addressing many real-world data mining applications. Most existing research has focused on classification problems. In this paper we propose a framework for evaluating regression models in applications with non-uniform costs and benefits across the domain of the continuous target variable. Namely, we describe two metrics for assessing the costs and benefits of the predictions of any model given a set of test cases. We illustrate the use of our metrics in the context of a specific type of application where non-uniform costs are required: the prediction of rare extreme values of a continuous target variable. Our experiments provide clear evidence of the utility of the proposed framework for evaluating the merits of any model in this class of regression domains.


2003

Predicting outliers

Authors
Torgo, L; Ribeiro, R

Publication
Knowledge Discovery in Databases: PKDD 2003, Proceedings

Abstract
This paper describes a method designed for data mining applications where the main goal is to predict extreme and rare values of a continuous target variable, as well as to understand under which conditions these values occur. Our objective is to induce models that are accurate at predicting these outliers but are also interpretable from the user's perspective. We describe a new splitting criterion for regression trees that enables the induction of trees achieving these goals. We evaluate our proposal on several real-world problems and contrast the obtained models with standard regression trees. The results of this evaluation show the clear advantage of our proposal in terms of the evaluation statistics that are relevant for these applications.


2003

Predicting harmful algae blooms

Authors
Ribeiro, R; Torgo, L

Publication
Progress in Artificial Intelligence

Abstract
In several applications the main interest resides in predicting rare and extreme values. This is the case for the prediction of harmful algae blooms. Though rare, these blooms have a strong impact on river life forms and water quality, making them a serious ecological problem. In this paper, we describe a data mining method whose main goal is to accurately predict this kind of rare extreme value. We propose a new splitting criterion for regression trees that enables the induction of trees achieving these goals. We carry out an analysis of the results obtained with our method on this application domain and compare them to those obtained with standard regression trees. We conclude that this new method achieves better results in terms of the evaluation statistics that are relevant for this kind of application.


2006

Rule-based prediction of rare extreme values

Authors
Ribeiro, R; Torgo, L

Publication
Discovery Science, Proceedings

Abstract
This paper describes a rule learning method that obtains models biased towards a particular class of regression tasks. The main distinguishing feature of these tasks is that the goal is to be accurate at predicting rare extreme values of the continuous target variable. Many real-world applications from scientific areas like ecology, meteorology, finance, etc., share this objective. Most existing approaches to regression problems search for the model parameters that optimize a given average error estimator (e.g. mean squared error). This means that they are biased towards achieving a good performance on the most common cases. The motivation for our work is the claim that being accurate at a small set of rare cases requires different error metrics. Moreover, given the nature and relevance of this type of application, an interpretable model is usually of key importance to domain experts, as predicting these rare events is normally associated with costly decisions. Our proposed system (R-PREV) obtains a set of interpretable regression rules derived from a set of bagged regression trees using evaluation metrics that bias the resulting models to predict accurately rare extreme values. We provide an experimental evaluation of our method confirming the advantages of our proposal in terms of accuracy in predicting rare extreme values.
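
The abstract's remark that average error estimators favour the most common cases can be illustrated with a small sketch. The data, the two models, and the helper `mse_on_extremes` are hypothetical and chosen only for illustration; they are not taken from the paper and do not reproduce R-PREV or its evaluation metrics.

```python
def mean_squared_error(y_true, y_pred):
    """Average of the squared errors over all cases."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mse_on_extremes(y_true, y_pred, threshold):
    """Hypothetical helper: MSE restricted to cases whose true value is >= threshold."""
    pairs = [(t, p) for t, p in zip(y_true, y_pred) if t >= threshold]
    return sum((t - p) ** 2 for t, p in pairs) / len(pairs)


# Made-up data: 98 common cases at 10 and 2 rare extreme cases at 100.
y_true = [10.0] * 98 + [100.0] * 2

# Model A is exact on the common cases but predicts 10 for both extremes.
pred_a = [10.0] * 100
# Model B is off by 15 on every common case but exact on both extremes.
pred_b = [25.0] * 98 + [100.0] * 2

mse_a = mean_squared_error(y_true, pred_a)
mse_b = mean_squared_error(y_true, pred_b)
extreme_a = mse_on_extremes(y_true, pred_a, threshold=50.0)
extreme_b = mse_on_extremes(y_true, pred_b, threshold=50.0)

print(f"overall MSE:   A={mse_a}, B={mse_b}")
print(f"extremes only: A={extreme_a}, B={extreme_b}")
```

On this toy data, model A has the lower overall MSE even though it misses both extreme values entirely, while model B is exact on them. The threshold of 50 is an arbitrary assumption used only to single out the rare cases.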


2009

Precision and Recall for Regression

Authors
Torgo, L; Ribeiro, R

Publication
Discovery Science, Proceedings

Abstract
Cost-sensitive prediction is a key task in many real-world applications. Most existing research in this area deals with classification problems. This paper addresses a related regression problem: the prediction of rare extreme values of a continuous variable. These values are often regarded as outliers and removed from subsequent analysis. However, for many applications (e.g. in finance, meteorology, biology, etc.) these are the key values that we want to accurately predict. Any learning method obtains models by optimizing some preference criteria. In this paper we propose new evaluation criteria that are more adequate for these applications. We describe a generalization for regression of the concepts of precision and recall often used in classification. Using these new evaluation metrics we are able to focus the evaluation of predictive models on the cases that really matter for these applications. Our experiments indicate the advantages of the use of these new measures when comparing predictive models in the context of our target applications.
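
To give a rough intuition for carrying precision and recall over to a continuous target, the sketch below uses a deliberately crude simplification: a case counts as "extreme" when its value reaches a fixed threshold. The abstract does not specify the paper's formulation, so this is not it; the function name, threshold, and data are all assumptions made for illustration.

```python
def extreme_precision_recall(y_true, y_pred, threshold):
    """Threshold-based precision and recall for extreme values.

    Simplifying assumption: a value is "extreme" when it is >= threshold.
    """
    true_ext = [t >= threshold for t in y_true]
    pred_ext = [p >= threshold for p in y_pred]

    # Cases that are extreme and are also predicted as extreme.
    hits = sum(1 for t, p in zip(true_ext, pred_ext) if t and p)
    n_pred = sum(pred_ext)
    n_true = sum(true_ext)

    # Guard against division by zero when no extremes are predicted or present.
    precision = hits / n_pred if n_pred else 0.0
    recall = hits / n_true if n_true else 0.0
    return precision, recall


# Made-up values: two true extremes (50 and 60) among six cases.
y_true = [3.0, 4.0, 50.0, 5.0, 60.0, 4.5]
y_pred = [3.5, 55.0, 52.0, 4.0, 20.0, 5.0]

precision, recall = extreme_precision_recall(y_true, y_pred, threshold=40.0)
print(f"precision={precision}, recall={recall}")
```

Here the model flags two cases as extreme, one of them correctly, and finds one of the two true extremes, so both scores are 0.5. A hard threshold ignores how far off a numeric prediction is, which is one reason a proper generalization for regression needs more than this sketch.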


2006

Predicting rare extreme values

Authors
Torgo, L; Ribeiro, R

Publication
Advances in Knowledge Discovery and Data Mining, Proceedings

Abstract
Modelling extreme data is very important in several application domains, such as finance, meteorology, ecology, etc. This paper addresses the problem of predicting extreme values of a continuous variable. The main distinguishing feature of our target applications resides in the fact that these values are rare. Any prediction model is obtained by some sort of search process guided by a pre-specified evaluation criterion. In this work we argue against the use of standard criteria for evaluating regression models in the context of our target applications. We propose a new predictive performance metric for this class of problems that our experiments show to perform better in distinguishing models that are more accurate at rare extreme values. This new evaluation metric could be used as the basis for developing better models in terms of rare extreme values prediction.
