Publications

Publications by LIAAD

2012

Towards Utility Maximization in Regression

Authors
Ribeiro, RP;

Publication
12TH IEEE INTERNATIONAL CONFERENCE ON DATA MINING WORKSHOPS (ICDMW 2012)

Abstract
Utility-based learning is a key technique for addressing many real-world data mining applications, where the costs/benefits are not uniform across the domain of the target variable. Still, most existing research has focused on classification problems. In this paper we address a related problem. There are many relevant domains (e.g., ecological, meteorological, finance) where decisions are based on the forecast of a numeric quantity (i.e., the result of a regression model). The goal of this paper is to present an evaluation framework for applications where the numeric outcome of a regression model may lead to different costs/benefits as a consequence of the actions it entails. The new metric provides a more informed estimate of the utility of any regression model, given the application-specific preference biases, and hence makes the comparison and selection of alternative regression models more reliable. We illustrate the objective of our evaluation methodology on a real-life application and also carry out a set of experiments on a subset of our target regression tasks: the prediction of rare and extreme values. Results show the effectiveness of the proposed utility metric for identifying the models that perform better on this type of application.
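The abstract does not define the metric itself. As a rough illustration of why non-uniform costs can change which model looks best, the sketch below weights absolute errors by a hypothetical relevance function that is high for extreme target values. The functions relevance and relevance_weighted_mae, the thresholds, and the toy data are all assumptions made for illustration; this is not the utility metric proposed in the paper.

```python
def relevance(y: float, low: float = 10.0, high: float = 90.0) -> float:
    """Hypothetical relevance: 1.0 for extreme values outside [low, high], 0.1 otherwise."""
    return 1.0 if y < low or y > high else 0.1


def mae(y_true, y_pred) -> float:
    """Plain mean absolute error: every case counts the same."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def relevance_weighted_mae(y_true, y_pred) -> float:
    """Absolute errors weighted by the (hypothetical) relevance of the true target value."""
    weights = [relevance(t) for t in y_true]
    weighted = sum(w * abs(t - p) for w, t, p in zip(weights, y_true, y_pred))
    return weighted / sum(weights)


y_true = [50.0, 55.0, 45.0, 95.0]   # the last case is a rare, extreme value
model_a = [50.0, 55.0, 45.0, 60.0]  # exact on common values, misses the extreme one
model_b = [62.0, 67.0, 57.0, 95.0]  # errs on common values, exact on the extreme one

print(mae(y_true, model_a), mae(y_true, model_b))  # prints 8.75 9.0
print(relevance_weighted_mae(y_true, model_a), relevance_weighted_mae(y_true, model_b))
```

Plain MAE prefers model_a, while the relevance-weighted error prefers model_b, because model_b gets the rare extreme case right. An application-specific utility measure is meant to capture this kind of preference.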

2012

Conceptual clustering with generalization by intervals [Classification Conceptuelle avec Généralisation par Intervalles]

Authors
Brito, P; Polaillon, G;

Publication
Revue des Nouvelles Technologies de l'Information

Abstract
This paper deals with hierarchical or pyramidal conceptual clustering methods, where each formed cluster corresponds to a concept, i.e., a pair (extent, intent). We consider data presenting real or interval-valued numerical values, ordered values and/or probability/frequency distributions on a set of categories. Concepts are obtained by a Galois connection with generalisation by intervals, which allows dealing with different variable types in a common framework. In the case of distribution data, the obtained concepts are more homogeneous and more easily interpretable than those obtained by using the maximum and minimum operators previously proposed. A measure of generality of a concept is defined similarly for all these variable types. An example illustrates the proposed method.
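A minimal sketch of the (extent, intent) pairing for distribution data, assuming that "generalisation by intervals" means describing a set of objects by the smallest interval [min, max] of each category's probability. It also assumes every object's distribution is defined on the same categories. The functions intent_of and extent_of and the example data are hypothetical, and the paper's generality measure is not reproduced.

```python
from typing import Dict, List, Tuple

Distribution = Dict[str, float]          # category -> probability/frequency
Intent = Dict[str, Tuple[float, float]]  # category -> (min, max) interval


def intent_of(objects: List[Distribution]) -> Intent:
    """Generalize a set of distributions by the smallest covering interval per category."""
    categories = sorted({c for obj in objects for c in obj})
    return {
        c: (min(obj.get(c, 0.0) for obj in objects),
            max(obj.get(c, 0.0) for obj in objects))
        for c in categories
    }


def extent_of(intent: Intent, data: Dict[str, Distribution]) -> List[str]:
    """Return the objects whose distribution lies inside every interval of the intent."""
    return sorted(
        name for name, dist in data.items()
        if all(lo <= dist.get(c, 0.0) <= hi for c, (lo, hi) in intent.items())
    )


data = {
    "a": {"low": 0.2, "mid": 0.5, "high": 0.3},
    "b": {"low": 0.3, "mid": 0.4, "high": 0.3},
    "c": {"low": 0.25, "mid": 0.45, "high": 0.3},
    "d": {"low": 0.6, "mid": 0.2, "high": 0.2},
}

intent = intent_of([data["a"], data["b"]])
print(intent)                   # {'high': (0.3, 0.3), 'low': (0.2, 0.3), 'mid': (0.4, 0.5)}
print(extent_of(intent, data))  # ['a', 'b', 'c']
```

Composing the two maps closes the starting set {a, b}: object c also fits the intervals, so the concept is (['a', 'b', 'c'], intent), and recomputing the intent from that extent gives the same intervals.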

2012

Divisive monothetic clustering for interval and histogram-valued data

Authors
Brito, P; Chavent, M;

Publication
ICPRAM 2012 - Proceedings of the 1st International Conference on Pattern Recognition Applications and Methods

Abstract
In this paper we propose a divisive top-down clustering method designed for interval and histogram-valued data. The method provides a hierarchy on a set of objects together with a monothetic characterization of each formed cluster. At each step, a cluster is split so as to minimize intra-cluster dispersion, which is measured using a distance suitable for the considered variable types. The criterion is minimized across the bipartitions induced by a set of binary questions. Since interval-valued variables may be considered a special case of histogram-valued variables, the method applies to data described by either kind of variable, or by variables of both types. An example illustrates the proposed approach.
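One split step can be sketched for interval-valued data only, under several assumptions. The binary questions are assumed to take the form "Is the midpoint of variable j <= c?", with c drawn from the midpoints observed in the cluster. Dispersion is assumed to be the sum of squared distances between each object's bounds and the cluster's mean bounds. Both choices, and the names dispersion and best_split, are hypothetical stand-ins; the paper uses a distance suited to histogram-valued variables, which this sketch does not reproduce.

```python
from typing import List, Optional, Tuple

Interval = Tuple[float, float]  # (lower, upper)
Obj = List[Interval]            # one interval per variable


def dispersion(cluster: List[Obj]) -> float:
    """Sum of squared distances of each object's bounds to the cluster's mean bounds."""
    if not cluster:
        return 0.0
    total = 0.0
    for j in range(len(cluster[0])):
        mean_lo = sum(obj[j][0] for obj in cluster) / len(cluster)
        mean_hi = sum(obj[j][1] for obj in cluster) / len(cluster)
        total += sum((obj[j][0] - mean_lo) ** 2 + (obj[j][1] - mean_hi) ** 2
                     for obj in cluster)
    return total


def best_split(cluster: List[Obj]) -> Optional[Tuple[int, float, List[Obj], List[Obj]]]:
    """Try binary questions 'midpoint of variable j <= c?' and return the one whose
    bipartition has the smallest total intra-cluster dispersion (None if no split exists)."""
    best, best_score = None, float("inf")
    for j in range(len(cluster[0])):
        mids = sorted({(obj[j][0] + obj[j][1]) / 2 for obj in cluster})
        for c in mids[:-1]:  # the largest midpoint would leave the right part empty
            left = [o for o in cluster if (o[j][0] + o[j][1]) / 2 <= c]
            right = [o for o in cluster if (o[j][0] + o[j][1]) / 2 > c]
            score = dispersion(left) + dispersion(right)
            if score < best_score:
                best_score, best = score, (j, c, left, right)
    return best


data = [[(1, 3)], [(2, 4)], [(10, 12)], [(11, 13)]]
j, c, left, right = best_split(data)
print(j, c)  # prints 0 3.0
```

The chosen question ("midpoint of variable 0 <= 3.0?") is itself the monothetic description of the two new clusters. Applying best_split recursively to each part, until some stopping rule is met, yields the hierarchy.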

2012

Modelling interval data with Normal and Skew-Normal distributions

Authors
Brito, P; Duarte Silva, AP;

Publication
JOURNAL OF APPLIED STATISTICS

Abstract
A parametric model for interval data is proposed, assuming a multivariate Normal or Skew-Normal distribution for the midpoints and log-ranges of the interval variables. The intrinsic nature of the interval variables leads to special structures of the variance-covariance matrix, which is represented by five different possible configurations. Maximum likelihood estimation for both models under all considered configurations is studied. The proposed modelling is then considered in the context of analysis of variance and multivariate analysis of variance testing. To assess the behaviour of the proposed methodology, a simulation study is performed. The results show that, for medium or large sample sizes, tests have good power and their true significance level approaches nominal levels when the constraints assumed for the model are respected; however, for small samples, significance levels close to nominal levels cannot be guaranteed. Applications to Chinese meteorological data in three different regions and to credit card usage variables for different card designations illustrate the proposed methodology.
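A minimal sketch of the representation step for a single interval variable. Each interval [l, u] with u > l is mapped to its midpoint and the logarithm of its range. Taking the log turns the strictly positive range into an unbounded quantity, which suits a Normal assumption. The sketch then computes the maximum likelihood estimates of an unrestricted bivariate Normal: the sample mean, and the covariance with divisor n. The five constrained covariance configurations, multiple interval variables, and the Skew-Normal case are not covered. The function names and data are hypothetical.

```python
import math
from typing import List, Tuple


def to_mid_logrange(intervals: List[Tuple[float, float]]) -> List[Tuple[float, float]]:
    """Map each interval [l, u] (requires u > l) to (midpoint, log-range)."""
    return [((l + u) / 2, math.log(u - l)) for l, u in intervals]


def normal_mle(points: List[Tuple[float, float]]):
    """MLE of a bivariate Normal: mean vector and covariance matrix (divisor n, not n - 1)."""
    n = len(points)
    mean = [sum(p[k] for p in points) / n for k in range(2)]
    cov = [[sum((p[a] - mean[a]) * (p[b] - mean[b]) for p in points) / n
            for b in range(2)] for a in range(2)]
    return mean, cov


intervals = [(0.0, 2.0), (1.0, 5.0), (2.0, 4.0), (3.0, 11.0)]
points = to_mid_logrange(intervals)  # midpoints 1, 3, 3, 7; ranges 2, 4, 2, 8
mean, cov = normal_mle(points)
print(mean)  # mean midpoint is 3.5; mean log-range is (7/4) * log(2)
print(cov)
```

A positive off-diagonal entry in cov would indicate that wider intervals tend to sit at larger midpoints. The constrained configurations studied in the paper restrict such midpoint/log-range covariances across variables.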

2012

Combining meta-learning and optimization algorithms for parameter selection

Authors
Gomes, T; Miranda, P; Prudencio, R; Soares, C; Carvalho, A;

Publication
CEUR Workshop Proceedings

Abstract
In this article we investigate the combination of meta-learning and optimization algorithms for parameter selection. We discuss our general proposal and present recent developments and experiments performed using Support Vector Machines (SVMs). Meta-learning was combined with single- and multi-objective optimization techniques to select SVM parameters. The hybrid methods derived from the proposal achieved better predictive accuracy than traditional optimization techniques used on their own.
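The abstract does not detail the algorithms. The sketch below illustrates the general idea of the hybrid under several assumptions. Meta-learning supplies starting points: the best parameters recorded for the past datasets whose meta-features are closest in Euclidean distance. A simple single-objective hill-climbing search then refines those points on an integer grid of (log2 C, log2 gamma). The meta-features, history, and evaluate function are hypothetical; in practice, evaluate might be the cross-validated accuracy of an SVM trained with those parameters. The paper's own single- and multi-objective optimizers are not reproduced.

```python
import math
from typing import Callable, List, Tuple

Params = Tuple[int, int]  # (log2 C, log2 gamma) on an integer grid


def suggest_starts(meta: List[float], history: List[Tuple[List[float], Params]],
                   k: int = 2) -> List[Params]:
    """Meta-learning step: best parameters of the k most similar past datasets."""
    ranked = sorted(history, key=lambda h: math.dist(meta, h[0]))
    return [params for _, params in ranked[:k]]


def hill_climb(start: Params, evaluate: Callable[[Params], float],
               max_steps: int = 50) -> Params:
    """Optimization step: move to the best neighbouring grid point until nothing improves."""
    current, best = start, evaluate(start)
    for _ in range(max_steps):
        neighbours = [(current[0] + dc, current[1] + dg)
                      for dc in (-1, 0, 1) for dg in (-1, 0, 1) if (dc, dg) != (0, 0)]
        score, cand = max((evaluate(p), p) for p in neighbours)
        if score <= best:
            break
        current, best = cand, score
    return current


def select_parameters(meta, history, evaluate, k: int = 2) -> Params:
    """Hybrid: refine each meta-learned starting point and keep the best result."""
    results = [hill_climb(s, evaluate) for s in suggest_starts(meta, history, k)]
    return max(results, key=evaluate)


# Hypothetical toy objective peaking at log2 C = 3, log2 gamma = -5.
evaluate = lambda p: -((p[0] - 3) ** 2 + (p[1] + 5) ** 2)
history = [([0.1, 0.9], (2, -4)), ([0.8, 0.2], (-3, 2)), ([0.15, 0.85], (4, -6))]
print(select_parameters([0.12, 0.88], history, evaluate))  # prints (3, -5)
```

Starting the search from parameters that worked on similar datasets is meant to reduce the number of costly evaluations compared with starting from arbitrary points.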

2012

Predicting the accuracy of regression models in the retail industry

Authors
Pinto, F; Soares, C;

Publication
CEUR Workshop Proceedings

Abstract
Companies are moving from developing a single model for a problem (e.g., a regression model to predict general sales) to developing several models for sub-problems of the original problem (e.g., regression models to predict sales of each of its product categories). Given the similarity between the sub-problems, the model development processes should not be independent; information should be shared between them. Different approaches can be used for that purpose, including metalearning (MtL) and transfer learning. In this work, we use MtL to predict the performance of a model based on the performance of models that were previously developed. Given that the sub-problems are related (e.g., the schemas of the data are the same), domain knowledge is used to develop the metafeatures that characterize them. The approach is applied to the development of models to predict sales of different product categories in a retail company from Portugal.
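A minimal sketch of the MtL idea, assuming a k-nearest-neighbours meta-model. The predicted performance of a model for a new product category is taken as the mean performance observed on the k past categories whose metafeatures are closest in Euclidean distance. The metafeature values, error figures, and the name predict_performance are hypothetical; the abstract does not specify which meta-learner or metafeatures the authors used.

```python
import math
from typing import List, Tuple


def predict_performance(new_meta: List[float],
                        past: List[Tuple[List[float], float]],
                        k: int = 3) -> float:
    """Mean performance of the k past sub-problems with the closest metafeatures."""
    nearest = sorted(past, key=lambda item: math.dist(new_meta, item[0]))[:k]
    return sum(perf for _, perf in nearest) / len(nearest)


# Hypothetical metafeatures per product category (already scaled to [0, 1]),
# paired with the error of the sales model previously developed for it.
past = [
    ([0.20, 0.50, 0.10], 0.12),
    ([0.25, 0.45, 0.15], 0.10),
    ([0.90, 0.10, 0.80], 0.40),
    ([0.22, 0.52, 0.12], 0.14),
]
print(predict_performance([0.21, 0.50, 0.11], past))  # close to 0.12
```

The dissimilar category (error 0.40) is not among the three nearest neighbours, so it does not influence the estimate for the new category.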
