Details

  • Name

    Pedro Strecht
  • Position

    External Collaborating Researcher
  • Since

    1 April 2014
Publications

2024

Symbolic Data Analysis to Improve Completeness of Model Combination Methods

Authors
Strecht, P; Mendes Moreira, J; Soares, C;

Publication
ADVANCES IN ARTIFICIAL INTELLIGENCE, AI 2023, PT II

Abstract
A growing number of organizations are adopting a strategy of breaking down large data analysis problems into specific sub-problems, tailoring models for each. However, handling a large number of individual models can make it difficult to understand organization-wide phenomena. Recent studies focus on using decision trees to create a consensus model by aggregating local decision trees into sets of rules. Despite these efforts, the resulting models may still be incomplete, i.e., unable to cover the entire decision space. This paper explores methodologies to tackle this issue by generating complete consensus models from incomplete rule sets, relying on rough estimates of the distribution of the independent variables. Two approaches are introduced: (1) creating a synthetic dataset and then training a decision tree on it, and (2) a specialized algorithm that builds a decision tree directly from symbolic data. The feasibility of generating complete decision trees is demonstrated, along with an empirical evaluation on several datasets.
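To illustrate the first approach, here is a minimal Python sketch, not the paper's algorithm: rules are encoded as hypothetical dictionaries of closed intervals, the "rough estimates" of each variable's distribution are assumed to be uniform over a known range, and each sampled point is labelled by the first rule that covers it. Points that no rule covers expose the incompleteness the abstract describes; how they would be labelled, and the decision tree training that follows, are left out.

```python
import random
from typing import Dict, List, Optional, Tuple

# Hypothetical encoding: a rule is ({variable: (low, high)}, class_label),
# with closed intervals low <= x <= high.
Rule = Tuple[Dict[str, Tuple[float, float]], str]


def covers(conditions: Dict[str, Tuple[float, float]], point: Dict[str, float]) -> bool:
    """True if every interval condition of a rule holds for the point."""
    return all(lo <= point[var] <= hi for var, (lo, hi) in conditions.items())


def synthesize(rules: List[Rule], ranges: Dict[str, Tuple[float, float]],
               n: int, seed: int = 0) -> Tuple[List[Tuple[Dict[str, float], Optional[str]]], float]:
    """Sample n points from assumed uniform distributions over `ranges` and label
    each with the first matching rule (None if no rule matches).

    Returns the labelled sample and the fraction of points covered by some rule;
    a coverage below 1.0 signals an incomplete rule set."""
    rng = random.Random(seed)
    sample = []
    covered = 0
    for _ in range(n):
        point = {var: rng.uniform(lo, hi) for var, (lo, hi) in ranges.items()}
        label = next((cls for cond, cls in rules if covers(cond, point)), None)
        if label is not None:
            covered += 1
        sample.append((point, label))
    return sample, covered / n
```

For instance, rules covering only `x` in [0, 4] and [6, 10] over a range of [0, 10] leave roughly a fifth of the sampled points unlabelled, which is the gap a complete consensus model would have to fill.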

2023

Curbing Dropout: Predictive Analytics at the University of Porto

Authors
Blanquet, L; Grilo, J; Strecht, P; Camanho, A;

Publication
Atas da Conferencia da Associacao Portuguesa de Sistemas de Informacao

Abstract
This study explores data mining techniques for predicting student dropout in higher education. The research compares different methodological approaches, including alternative algorithms and variations in model specifications. Additionally, we examine the impact of employing either a single model for all university programs or separate models per program. We also tested the performance of models with students grouped according to their position in the program study plan. Training datasets covering 2, 4, 6, and 8 years were explored, using academic data from the University of Porto spanning the academic years from 2012 to 2022. The algorithm that yielded the best results was XGBoost. The best predictions were obtained with models trained on two years of data, both with separate models for each program and with a single model. The findings highlight the potential of data mining approaches in predicting student dropout, offering valuable insights for higher education institutions aiming to improve student retention and success. © 2023 Associacao Portuguesa de Sistemas de Informacao. All rights reserved.
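The abstract compares training windows of 2, 4, 6, and 8 years and a single model versus one model per program. A minimal Python sketch of how such training/test splits might be assembled, assuming each training window is the block of years immediately before the test year and that records carry a hypothetical `program` field; the paper's exact evaluation protocol and the XGBoost training step are not shown.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def rolling_splits(years: List[int], window: int) -> List[Tuple[List[int], int]]:
    """Pair each test year with the `window` years that precede it in the sorted list.

    Assumes the years are contiguous, so the preceding entries are the
    immediately preceding academic years."""
    ordered = sorted(years)
    return [(ordered[i - window:i], ordered[i]) for i in range(window, len(ordered))]


def group_by_program(records: List[dict]) -> Dict[str, List[dict]]:
    """Partition records by the hypothetical 'program' field, one group per
    program-specific model; the single-model alternative simply uses all records."""
    groups: Dict[str, List[dict]] = defaultdict(list)
    for record in records:
        groups[record["program"]].append(record)
    return dict(groups)
```

With academic years 2012 to 2022, a 2-year window yields nine test years (2014 onwards), while an 8-year window leaves only 2020, 2021, and 2022 for testing, so longer windows also mean fewer evaluation points.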

2022

Density Estimation in High-Dimensional Spaces: A Multivariate Histogram Approach

Authors
Strecht, P; Mendes Moreira, J; Soares, C;

Publication
ADVANCED DATA MINING AND APPLICATIONS, ADMA 2022, PT II

Abstract
Density estimation is an important tool for data analysis. Non-parametric approaches have a reputation for offering state-of-the-art density estimates, but only in low-dimensional spaces. Despite providing less accurate density estimates, histogram-based approaches remain the only alternative for datasets in high-dimensional spaces. In this paper, we present a multivariate histogram approach to estimate the density of a dataset without restrictions on the number of dimensions, containing both numerical and categorical variables (without numerical encoding) and allowing missing data (without the need to preprocess them). Results from the empirical evaluation show that it is possible to estimate the density of datasets without restrictions on dimensionality, and that the method is robust to missing values and categorical variables.
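A minimal sketch of a sparse multivariate histogram in Python, under assumptions of its own rather than the paper's exact method: numerical columns use equal-width bins over a given range, each category of a categorical column is its own bin (no numerical encoding), and a missing value (`None`) falls into a dedicated bin. Only non-empty cells are stored, so memory grows with the number of rows instead of exponentially with the number of dimensions. The class name `SparseHistogram` and its parameters are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Optional, Sequence, Tuple, Union

Value = Optional[Union[float, str]]


class SparseHistogram:
    """Multivariate histogram that stores counts only for non-empty cells."""

    def __init__(self, numeric_bins: Dict[int, Tuple[float, float, int]]):
        # numeric_bins maps a column index to (min, max, number_of_bins);
        # columns not listed are treated as categorical.
        self.numeric_bins = numeric_bins
        self.counts: Counter = Counter()
        self.n = 0

    def _cell(self, row: Sequence[Value]) -> tuple:
        key = []
        for j, v in enumerate(row):
            if v is None:
                key.append(None)  # missing values get their own bin
            elif j in self.numeric_bins:
                lo, hi, k = self.numeric_bins[j]
                idx = int((v - lo) // ((hi - lo) / k))
                key.append(min(max(idx, 0), k - 1))  # clamp into the outer bins
            else:
                key.append(v)  # categorical: the category itself is the bin
        return tuple(key)

    def fit(self, rows: Iterable[Sequence[Value]]) -> "SparseHistogram":
        for row in rows:
            self.counts[self._cell(row)] += 1
            self.n += 1
        return self

    def probability(self, row: Sequence[Value]) -> float:
        """Empirical probability mass of the cell containing `row`."""
        return self.counts[self._cell(row)] / self.n if self.n else 0.0

    def density(self, row: Sequence[Value]) -> float:
        """Cell mass divided by the widths of its non-missing numerical bins."""
        volume = 1.0
        for j, v in enumerate(row):
            if v is not None and j in self.numeric_bins:
                lo, hi, k = self.numeric_bins[j]
                volume *= (hi - lo) / k
        return self.probability(row) / volume
```

Categorical and missing coordinates contribute no width to the cell volume, so `density` is a density over the numerical part of a row and a probability over the rest.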

2021

Inmplode: A framework to interpret multiple related rule-based models

Authors
Strecht, P; Mendes Moreira, J; Soares, C;

Publication
EXPERT SYSTEMS

Abstract
There is a growing trend to split problems into separate subproblems and develop separate models for each (e.g., different churn models for separate customer segments, or different failure prediction models for separate university courses). While this may lead to better predictive models, the use of multiple models makes interpretability more challenging. In this paper, we address the problem of synthesizing the knowledge contained in a set of models without a significant loss of prediction performance. We focus on decision tree models because their interpretability makes them suitable for problems involving knowledge extraction. We detail the process, identifying alternative methods to address the different phases involved. An extensive set of experiments is carried out on the problem of predicting the failure of students in courses at the University of Porto. We assess the effect of using different methods for the operations of the methodology, both in terms of the knowledge extracted and the accuracy of the combined models.
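Rule-based combination of decision trees usually starts by turning each tree into rules, one per root-to-leaf path. A minimal Python sketch, assuming a hypothetical tuple encoding of binary trees with `variable <= threshold` splits; the paper may represent and extract rules differently.

```python
import math
from typing import Dict, List, Tuple, Union

# Hypothetical encoding: a leaf is a class label (str); an internal node is
# (variable, threshold, left_subtree, right_subtree), where left means variable <= threshold.
Tree = Union[str, Tuple[str, float, "Tree", "Tree"]]
Interval = Tuple[float, float]  # half-open (low, high]: low < x <= high


def tree_to_rules(tree: Tree) -> List[Tuple[Dict[str, Interval], str]]:
    """Return one (conditions, class) rule per root-to-leaf path."""
    rules: List[Tuple[Dict[str, Interval], str]] = []

    def walk(node: Tree, conditions: Dict[str, Interval]) -> None:
        if isinstance(node, str):
            rules.append((dict(conditions), node))
            return
        var, thr, left, right = node
        lo, hi = conditions.get(var, (-math.inf, math.inf))
        # Tighten the variable's interval on each branch instead of storing
        # repeated tests on the same variable.
        walk(left, {**conditions, var: (lo, min(hi, thr))})
        walk(right, {**conditions, var: (max(lo, thr), hi)})

    walk(tree, {})
    return rules
```

Merging repeated tests on one variable into a single interval keeps each rule a compact hyper-rectangle, which makes later operations such as intersecting rules from different models straightforward.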

2018

A Framework for Analytical Approaches to Combine Interpretable Models

Authors
Strecht, P; Moreira, JM; Soares, C;

Publication
Information Management and Big Data, 5th International Conference, SIMBig 2018, Lima, Peru, September 3-5, 2018, Proceedings.

Abstract
Analytic approaches to combine interpretable models, although presented in different contexts, can be generalized to highlight the components that can be specialized. We propose a framework that structures the combination process, formalizes the problems that can be solved in alternative ways and evaluates the combined models based on their predictive ability to replace the base ones, without loss of interpretability. The framework is illustrated with a case study using data from the University of Porto, Portugal, where experiments were carried out. The results show that grouping base models by scientific areas, ordering by the number of variables and intersecting their underlying rules creates conditions for the combined models to outperform them. © 2019, Springer Nature Switzerland AG.
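Intersecting the rules of two models means intersecting, variable by variable, the intervals of every pair of rules and keeping the non-empty regions. A minimal Python sketch, assuming rules are dictionaries of half-open intervals `(low, high]`; labelling regions where the two models disagree as `"conflict"` is a placeholder choice of this sketch, not the framework's conflict-resolution method.

```python
from typing import Dict, List, Optional, Tuple

Interval = Tuple[float, float]  # half-open (low, high]
Rule = Tuple[Dict[str, Interval], str]


def intersect(a: Dict[str, Interval], b: Dict[str, Interval]) -> Optional[Dict[str, Interval]]:
    """Intersect two conjunctions of interval conditions; None if the region is empty."""
    out = dict(a)
    for var, (lo, hi) in b.items():
        if var in out:
            lo2, hi2 = out[var]
            lo, hi = max(lo, lo2), min(hi, hi2)
        if lo >= hi:  # (lo, hi] is empty when lo >= hi
            return None
        out[var] = (lo, hi)
    return out


def intersect_models(m1: List[Rule], m2: List[Rule]) -> List[Rule]:
    """Pairwise intersection of the rules of two models, keeping non-empty regions.

    Regions where the models predict different classes are marked "conflict"
    (a placeholder of this sketch)."""
    merged: List[Rule] = []
    for c1, y1 in m1:
        for c2, y2 in m2:
            region = intersect(c1, c2)
            if region is not None:
                merged.append((region, y1 if y1 == y2 else "conflict"))
    return merged
```

For two single-variable models that split at different thresholds, the result has one region where both agree on each class plus a band between the thresholds where they disagree; how such conflicts are resolved is where alternative combination methods would differ.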