Publications - INESC TEC

Publications

Publications by Carlos Manuel Soares

2022

On Usefulness of Outlier Elimination in Classification Tasks

Authors
Hetlerovic, D; Popelínský, L; Brazdil, P; Soares, C; Freitas, F;

Publication
Advances in Intelligent Data Analysis XX - 20th International Symposium on Intelligent Data Analysis, IDA 2022, Rennes, France, April 20-22, 2022, Proceedings

2022

Density Estimation in High-Dimensional Spaces: A Multivariate Histogram Approach

Authors
Strecht, P; Mendes Moreira, J; Soares, C;

Publication
ADVANCED DATA MINING AND APPLICATIONS, ADMA 2022, PT II

Abstract
Density estimation is an important tool for data analysis. Non-parametric approaches have a reputation for offering state-of-the-art density estimates limited to few dimensions. Despite providing less accurate density estimates, histogram-based approaches remain the only alternative for datasets in high-dimensional spaces. In this paper, we present a multivariate histogram approach to estimate the density of a dataset without restrictions on the number of dimensions, containing both numerical and categorical variables (without numerical encoding) and allowing missing data (without the need to preprocess them). Results from the empirical evaluation show that it is possible to estimate the density of datasets without restrictions on dimensionality, and the method is robust to missing values and categorical variables.


2021

Inmplode: A framework to interpret multiple related rule-based models

Authors
Strecht, P; Mendes Moreira, J; Soares, C;

Publication
EXPERT SYSTEMS

Abstract
There is a growing trend to split problems into separate subproblems and develop separate models for each (e.g., different churn models for separate customer segments; different failure prediction models for separate university courses, etc.). While it may lead to better predictive models, the use of multiple models makes interpretability more challenging. In this paper, we address the problem of synthesizing the knowledge contained in a set of models without a significant loss of prediction performance. We focus on decision tree models because their interpretability makes them suitable for problems involving knowledge extraction. We detail the process, identifying alternative methods to address the different phases involved. An extensive set of experiments is carried out on the problem of predicting the failure of students in courses at the University of Porto. We assess the effect of using different methods for the operations of the methodology, both in terms of the knowledge extracted as well as the accuracy of the combined models.


2022

A case study comparing machine learning with statistical methods for time series forecasting: size matters

Authors
Cerqueira, V; Torgo, L; Soares, C;

Publication
JOURNAL OF INTELLIGENT INFORMATION SYSTEMS

Abstract
Time series forecasting is one of the most active research topics. Machine learning methods have been increasingly adopted to solve these predictive tasks. However, in a recent work, evidence was shown that these approaches systematically present a lower predictive performance relative to simple statistical methods. In this work, we counter these results. We show that these are only valid under an extremely low sample size. Using a learning curve method, our results suggest that machine learning methods improve their relative predictive performance as the sample size grows. The R code to reproduce all of our experiments is available at https://github.com/vcerqueira/MLforForecasting.


2024

VEST: automatic feature engineering for forecasting

Authors
Cerqueira, V; Moniz, N; Soares, C;

Publication
MACHINE LEARNING

Abstract
Time series forecasting is a challenging task with applications in a wide range of domains. Auto-regression is one of the most common approaches to address these problems. Accordingly, observations are modelled by multiple regression using their past lags as predictor variables. We investigate the extension of auto-regressive processes using statistics which summarise the recent past dynamics of time series. The result of our research is a novel framework called VEST, designed to perform feature engineering using univariate and numeric time series automatically. The proposed approach works in three main steps. First, recent observations are mapped onto different representations. Second, each representation is summarised by statistical functions. Finally, a filter is applied for feature selection. We discovered that combining the features generated by VEST with auto-regression significantly improves forecasting performance in a database composed of 90 time series with high sampling frequency. However, we also found that there are no improvements when the framework is applied for multi-step forecasting or in time series with low sample size. VEST is publicly available online.
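
The three steps described in this abstract can be illustrated with a minimal sketch. This is not the VEST implementation: the function names, the two representations (raw values and first differences), the two summary statistics (mean and standard deviation) and the constant-feature filter are all assumptions chosen only to show the shape of the pipeline on top of a lag embedding.

```python
from statistics import mean, pstdev


def lag_embed(series, n_lags):
    """Auto-regressive embedding: each row holds n_lags past values, y is the next value."""
    n_rows = len(series) - n_lags
    X = [list(series[i:i + n_lags]) for i in range(n_rows)]
    y = [series[i + n_lags] for i in range(n_rows)]
    return X, y


# Step 1 (assumed representations): raw window and first differences.
REPRESENTATIONS = {
    "raw": lambda w: list(w),
    "diff": lambda w: [b - a for a, b in zip(w, w[1:])],
}

# Step 2 (assumed summary statistics).
STATISTICS = {"mean": mean, "std": pstdev}


def summarise(window):
    """Apply every statistic to every representation of one window."""
    return {
        f"{rep_name}_{stat_name}": stat(rep(window))
        for rep_name, rep in REPRESENTATIONS.items()
        for stat_name, stat in STATISTICS.items()
    }


def constant_filter(rows, tol=1e-12):
    """Step 3 (assumed filter): drop features that do not vary across rows."""
    names = list(rows[0].keys())
    kept = [n for n in names if pstdev([r[n] for r in rows]) > tol]
    return [{n: r[n] for n in kept} for r in rows], kept


def build_features(series, n_lags):
    """Return lag features extended with the surviving summary features."""
    X, y = lag_embed(series, n_lags)
    summaries, kept = constant_filter([summarise(w) for w in X])
    features = [lags + [s[n] for n in kept] for lags, s in zip(X, summaries)]
    return features, y, kept
```

Under these assumptions, `build_features(series, n_lags)` returns the lag matrix with the retained summary columns appended, ready to be passed to any regression learner for one-step-ahead forecasting.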


2022

Metalearning

Authors
Brazdil, P; van Rijn, JN; Soares, C; Vanschoren, J;

Publication
Cognitive Technologies

Abstract