Publications - INESC TEC

Publications

Publications by LIAAD

2017

Learning Through Utility Optimization in Regression Tasks

Authors
Branco, P; Torgo, L; Ribeiro, RP; Frank, E; Pfahringer, B; Rau, MM;

Publication
2017 IEEE INTERNATIONAL CONFERENCE ON DATA SCIENCE AND ADVANCED ANALYTICS (DSAA)

Abstract
Accounting for misclassification costs is important in many practical applications of machine learning, and cost-sensitive techniques for classification have been studied extensively. Utility-based learning provides a generalization of purely cost-based approaches that considers both costs and benefits, enabling application to domains with complex cost-benefit settings. However, there is little work on utility- or cost-based learning for regression. In this paper, we formally define the problem of utility-based regression and propose a strategy for maximizing the utility of regression models. We verify our findings in a large set of experiments that show the advantage of our proposal across a diverse set of domains, learning algorithms and cost/benefit settings.


2017

Preface

Authors
Sayed Mouchaweh, M; Bifet, A; Bouchachia, H; Gama, J; Ribeiro, RP;

Publication
CEUR Workshop Proceedings


2017

Preface

Authors
Sayed Mouchaweh, M; Bouchachia, H; Gama, J; Ribeiro, RP;

Publication
CEUR Workshop Proceedings


2017

Off the beaten track: A new linear model for interval data

Authors
Dias, S; Brito, P;

Publication
EUROPEAN JOURNAL OF OPERATIONAL RESEARCH

Abstract
We propose a new linear regression model for interval-valued variables. The model uses quantile functions to represent the intervals, thereby considering the distributions within them. In this paper we study the special case where the Uniform distribution is assumed in each observed interval, and we analyze the extension to the Symmetric Triangular distribution. The parameters of the model are obtained by solving a constrained quadratic optimization problem that uses the Mallows distance between quantile functions. As in the classical case, a goodness-of-fit measure is deduced. Two applications in topical fields are presented: one predicting the duration of unemployment and the other forecasting the area burned by forest fires.
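
As a minimal illustration of the distance named in the abstract, and not the authors' implementation: under the Uniform assumption, an interval [a, b] with centre c and half-range r has quantile function Q(t) = c + r(2t - 1) for t in [0, 1], so the squared Mallows distance between two intervals reduces to (c1 - c2)^2 + (r1 - r2)^2 / 3. The function names below are hypothetical, and the numeric version is included only to check the closed form.

```python
def quantile_uniform(a, b, t):
    """Quantile function of the Uniform distribution on [a, b] at level t in [0, 1]."""
    return a + (b - a) * t


def mallows_sq_uniform(x, y):
    """Closed-form squared Mallows distance between intervals x=(a1, b1) and y=(a2, b2).

    Assumes a Uniform distribution within each interval.
    """
    c1, r1 = (x[0] + x[1]) / 2, (x[1] - x[0]) / 2  # centre and half-range of x
    c2, r2 = (y[0] + y[1]) / 2, (y[1] - y[0]) / 2  # centre and half-range of y
    return (c1 - c2) ** 2 + (r1 - r2) ** 2 / 3


def mallows_sq_numeric(x, y, n=100_000):
    """Midpoint-rule approximation of the integral of the squared quantile difference."""
    total = 0.0
    for i in range(n):
        t = (i + 0.5) / n
        diff = quantile_uniform(x[0], x[1], t) - quantile_uniform(y[0], y[1], t)
        total += diff * diff
    return total / n


if __name__ == "__main__":
    print(mallows_sq_uniform((0, 2), (1, 5)))
    print(mallows_sq_numeric((0, 2), (1, 5)))
```

The regression model itself (a constrained quadratic optimization over this distance) is not reproduced here; the sketch only shows the quantity being minimized for a single pair of intervals.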


2017

Exploratory data analysis for interval compositional data

Authors
Hron, K; Brito, P; Filzmoser, P;

Publication
ADVANCES IN DATA ANALYSIS AND CLASSIFICATION

Abstract
Compositional data are data in which the relative contributions of parts to a whole, conveyed by (log-)ratios between them, are essential for the analysis. In Symbolic Data Analysis (SDA), we are in the framework of interval data when elements are characterized by variables whose values are intervals representing inherent variability. In this paper, we address the special problem of the analysis of interval compositions, i.e., when the interval data are obtained by the aggregation of compositions. It is assumed that the interval information is represented by the respective midpoints and ranges, and both sources of information are considered as compositions. In this context, we introduce the representation of interval data as three-way data. In the framework of the log-ratio approach from compositional data analysis, it is outlined how interval compositions can be treated in an exploratory context. The goal of the analysis is to represent the compositions by coordinates which are interpretable in terms of the original compositional parts. This is achieved by summarizing all relative information (log-ratios) about each part into one coordinate of the coordinate system. Based on an example from the European Union Statistics on Income and Living Conditions (EU-SILC), several possibilities for an exploratory data analysis approach for interval compositions are outlined and investigated.
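
The idea of collecting all log-ratios of one part into a single coordinate can be sketched as follows. This is an assumption-laden illustration rather than the paper's procedure: it uses one common construction from the log-ratio literature, in which the coordinate for the first part is sqrt((D - 1) / D) times the log of that part divided by the geometric mean of the remaining D - 1 parts. The function name is hypothetical, and the abstract does not confirm that this exact coordinate is the one used.

```python
import math


def first_pivot_coordinate(parts):
    """Summarize all log-ratios of parts[0] to the remaining parts in one value.

    Assumed construction: sqrt((D-1)/D) * ln(x1 / geometric_mean(x2..xD)).
    All parts must be strictly positive.
    """
    d = len(parts)
    if d < 2 or any(p <= 0 for p in parts):
        raise ValueError("need at least two strictly positive parts")
    mean_log_rest = sum(math.log(p) for p in parts[1:]) / (d - 1)
    return math.sqrt((d - 1) / d) * (math.log(parts[0]) - mean_log_rest)


if __name__ == "__main__":
    # Rescaling a composition leaves the coordinate unchanged,
    # because only ratios between parts enter the formula.
    print(first_pivot_coordinate([1.0, 2.0, 4.0]))
    print(first_pivot_coordinate([10.0, 20.0, 40.0]))
```

For interval compositions, the abstract indicates that midpoints and ranges are each treated as compositions, so under the same assumption a coordinate like this would be computed separately for each of the two.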


2017

Dissimilar Symmetric Word Pairs in the Human Genome

Authors
Tavares, AH; Raymaekers, J; Rousseeuw, PJ; Silva, RM; Bastos, CAC; Pinho, AJ; Brito, P; Afreixo, V;

Publication
11th International Conference on Practical Applications of Computational Biology & Bioinformatics, PACBB 2017, Porto, Portugal, 21-23 June, 2017

Abstract
In this work we explore the dissimilarity between symmetric word pairs, by comparing the inter-word distance distribution of a word to that of its reversed complement. We propose a new measure of dissimilarity between such distributions. Since symmetric pairs with different patterns could point to evolutionary features, we search for the pairs with the most dissimilar behaviour. We focus our study on the complete human genome and its repeat-masked version. © Springer International Publishing AG 2017.
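
The two ingredients compared in the abstract can be sketched as below. This is a minimal sketch under stated assumptions, not the authors' code: an inter-word distance is taken here to be the difference between the start positions of consecutive (possibly overlapping) occurrences of a word, and all function names are hypothetical. The dissimilarity measure proposed in the paper is not described in the abstract and is not reproduced.

```python
from collections import Counter

# Translation table mapping each nucleotide to its complement.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(word):
    """Return the reversed complement of a DNA word over the alphabet ACGT."""
    return word.translate(_COMPLEMENT)[::-1]


def occurrences(seq, word):
    """Start positions of all occurrences of word in seq, overlaps included."""
    positions = []
    start = seq.find(word)
    while start != -1:
        positions.append(start)
        start = seq.find(word, start + 1)
    return positions


def inter_word_distances(seq, word):
    """Counts of distances between consecutive occurrences of word in seq."""
    pos = occurrences(seq, word)
    return Counter(b - a for a, b in zip(pos, pos[1:]))


if __name__ == "__main__":
    seq = "ACGTACGTTACG"  # toy sequence, not genomic data
    word = "ACG"
    print(inter_word_distances(seq, word))
    print(inter_word_distances(seq, reverse_complement(word)))
```

Any dissimilarity between the two resulting distributions would be computed on top of these counts; which measure to use is the subject of the paper.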
