2022
Autores
Strecht, P; Mendes Moreira, J; Soares, C;
Publicação
ADVANCED DATA MINING AND APPLICATIONS, ADMA 2022, PT II
Abstract
Density estimation is an important tool for data analysis. Non-parametric approaches have a reputation for offering state-of-the-art density estimates limited to few dimensions. Despite providing less accurate density estimates, histogram-based approaches remain the only alternative for datasets in high-dimensional spaces. In this paper, we present a multivariate histogram approach to estimate the density of a dataset without restrictions on the number of dimensions, containing both numerical and categorical variables (without numerical encoding) and allowing missing data (without the need to preprocess them). Results from the empirical evaluation show that it is possible to estimate the density of datasets without restrictions on dimensionality, and the method is robust to missing values and categorical variables.
2021
Autores
Strecht, P; Mendes Moreira, J; Soares, C;
Publicação
EXPERT SYSTEMS
Abstract
There is a growing trend to split problems into separate subproblems and develop separate models for each (e.g., different churn models for separate customer segments; different failure prediction models for separate university courses, etc.). While it may lead to better predictive models, the use of multiple models makes interpretability more challenging. In this paper, we address the problem of synthesizing the knowledge contained in a set of models without a significant loss of prediction performance. We focus on decision tree models because their interpretability makes them suitable for problems involving knowledge extraction. We detail the process, identifying alternative methods to address the different phases involved. An extensive set of experiments is carried out on the problem of predicting the failure of students in courses at the University of Porto. We assess the effect of using different methods for the operations of the methodology, both in terms of the knowledge extracted as well as the accuracy of the combined models.
2022
Autores
Cerqueira, V; Torgo, L; Soares, C;
Publicação
JOURNAL OF INTELLIGENT INFORMATION SYSTEMS
Abstract
Time series forecasting is one of the most active research topics. Machine learning methods have been increasingly adopted to solve these predictive tasks. However, in a recent work, evidence was shown that these approaches systematically present a lower predictive performance relative to simple statistical methods. In this work, we counter these results. We show that these are only valid under an extremely low sample size. Using a learning curve method, our results suggest that machine learning methods improve their relative predictive performance as the sample size grows. The R code to reproduce all of our experiments is available at https://github.com/vcerqueira/MLforForecasting.
2024
Autores
Cerqueira, V; Moniz, N; Soares, C;
Publicação
MACHINE LEARNING
Abstract
Time series forecasting is a challenging task with applications in a wide range of domains. Auto-regression is one of the most common approaches to address these problems. Accordingly, observations are modelled by multiple regression using their past lags as predictor variables. We investigate the extension of auto-regressive processes using statistics which summarise the recent past dynamics of time series. The result of our research is a novel framework called VEST, designed to perform feature engineering using univariate and numeric time series automatically. The proposed approach works in three main steps. First, recent observations are mapped onto different representations. Second, each representation is summarised by statistical functions. Finally, a filter is applied for feature selection. We discovered that combining the features generated by VEST with auto-regression significantly improves forecasting performance in a database composed by 90 time series with high sampling frequency. However, we also found that there are no improvements when the framework is applied for multi-step forecasting or in time series with low sample size. VEST is publicly available online.
2022
Autores
Brazdil, P; van Rijn, JN; Soares, C; Vanschoren, J;
Publicação
Cognitive Technologies
Abstract
2022
Autores
Hetlerovic, D; Popelinsky, L; Brazdil, P; Soares, C; Freitas, F;
Publicação
ADVANCES IN INTELLIGENT DATA ANALYSIS XX, IDA 2022
Abstract
Although outlier detection/elimination has been studied before, few comprehensive studies exist on when exactly this technique would be useful as preprocessing in classification tasks. The objective of our study is to fill in this gap. We have performed experiments with 12 various outlier elimination methods and 10 classification algorithms on 50 different datasets. The results were then processed by the proposed reduction method, whose aim is identify the most useful workflows for a given set of tasks (datasets). The reduction method has identified that just three OEMs that are generally useful for the given set of tasks. We have shown that the inclusion of these OEMs is indeed useful, as it leads to lower loss in accuracy and the difference is quite significant (0.5%) on average.
The access to the final selection minute is only available to applicants.
Please check the confirmation e-mail of your application to obtain the access code.