Carlos Manuel Soares

Details

Name
Carlos Manuel Soares
Position
External Research Collaborator
Since
1 January 2008

Nationality
Portugal
Centre
Laboratório de Inteligência Artificial e Apoio à Decisão (Artificial Intelligence and Decision Support Laboratory)
Contacts
+351222094398
carlos.m.soares@inesctec.pt

Publications


2024

Multidimensional subgroup discovery on event logs

Authors
Ribeiro, J; Fontes, T; Soares, C; Borges, JL

Publication
EXPERT SYSTEMS WITH APPLICATIONS

Abstract
Subgroup discovery (SD) aims at finding significant subgroups of a given population of individuals characterized by statistically unusual properties of interest. SD on event logs provides insight into particular behaviors of processes, which may be a valuable complement to the traditional process analysis techniques, especially for low-structured processes. This paper proposes a scalable and efficient method to search for significant SD rules on frequent sequences of events, exploiting their multidimensional nature. The method is intended to identify significant subsequences of events for which the distribution of values of some target aspect differs significantly from the same distribution over the entire event log. A publicly available real-life event log of a Dutch hospital is used as a running example to demonstrate the applicability of our method. The proposed approach was applied to a real-life case study based on the public transport of a medium-sized European city (Porto, Portugal), for which the event data consists of 133 million smartcard travel validations from buses, trams and trains. The results include a characterization of mobility flows over multiple aspects, as well as the identification of unexpected behaviors in the flow of commuters (public transport). The generated knowledge provided useful insight into the behavior of travelers, which can be applied at operational, tactical and strategic business levels, enhancing the current view of the transport services for transport authorities and operators.


2024

VEST: automatic feature engineering for forecasting

Authors
Cerqueira, V; Moniz, N; Soares, C

Publication
MACHINE LEARNING

Abstract
Time series forecasting is a challenging task with applications in a wide range of domains. Auto-regression is one of the most common approaches to address these problems. Accordingly, observations are modelled by multiple regression using their past lags as predictor variables. We investigate the extension of auto-regressive processes using statistics which summarise the recent past dynamics of time series. The result of our research is a novel framework called VEST, designed to perform feature engineering using univariate and numeric time series automatically. The proposed approach works in three main steps. First, recent observations are mapped onto different representations. Second, each representation is summarised by statistical functions. Finally, a filter is applied for feature selection. We discovered that combining the features generated by VEST with auto-regression significantly improves forecasting performance in a database composed of 90 time series with high sampling frequency. However, we also found that there are no improvements when the framework is applied for multi-step forecasting or in time series with low sample size. VEST is publicly available online.
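
The three steps described in this abstract can be illustrated with a minimal Python sketch. This is not the VEST implementation or its API: the function names, the two representations (raw values and first differences), the two summary statistics (mean and standard deviation) and the correlation-based filter with its threshold are all assumptions chosen only to make the idea concrete.

```python
from statistics import mean, pstdev


def embed(series, n_lags):
    """Auto-regressive embedding: pair each value with its n_lags predecessors."""
    return [(series[i - n_lags:i], series[i]) for i in range(n_lags, len(series))]


def representations(window):
    """Step 1 (illustrative): map recent observations onto representations.
    Only raw values and first differences are used here; needs len(window) >= 2."""
    diffs = [b - a for a, b in zip(window, window[1:])]
    return {"raw": list(window), "diff": diffs}


def summarise(window):
    """Step 2 (illustrative): summarise each representation with statistics."""
    feats = {}
    for name, values in representations(window).items():
        feats[f"{name}_mean"] = mean(values)
        feats[f"{name}_std"] = pstdev(values)
    return feats


def correlation(xs, ys):
    """Pearson correlation; returns 0.0 when either input is constant."""
    mx, my = mean(xs), mean(ys)
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / (vx * vy) ** 0.5


def select(feature_rows, targets, threshold):
    """Step 3 (assumed filter): keep features correlated with the target."""
    names = list(feature_rows[0])
    return [
        n for n in names
        if abs(correlation([row[n] for row in feature_rows], targets)) >= threshold
    ]


def build_design(series, n_lags=4, threshold=0.1):
    """Combine the plain lags with the selected summary features."""
    pairs = embed(series, n_lags)
    targets = [target for _, target in pairs]
    summaries = [summarise(window) for window, _ in pairs]
    kept = select(summaries, targets, threshold)
    rows = []
    for (window, _), summary in zip(pairs, summaries):
        row = {f"lag_{k}": window[-k] for k in range(1, n_lags + 1)}
        row.update({name: summary[name] for name in kept})
        rows.append(row)
    return rows, targets, kept
```

Calling `build_design(series)` returns one row per forecastable point, holding the lags plus whichever summary features pass the filter; the rows could then be passed to any regression model. On a purely linear series only the window mean would be kept, since the other three summaries are constant.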


2024

Systematic Analysis of the Impact of Label Noise Correction on ML Fairness

Authors
Silva, IOE; Soares, C; Sousa, I; Ghani, R

Publication
ADVANCES IN ARTIFICIAL INTELLIGENCE, AI 2023, PT II

Abstract
Arbitrary, inconsistent, or faulty decision-making raises serious concerns, and preventing unfair models is an increasingly important challenge in Machine Learning. Data often reflect past discriminatory behavior, and models trained on such data may reflect bias on sensitive attributes, such as gender, race, or age. One approach to developing fair models is to preprocess the training data to remove the underlying biases while preserving the relevant information, for example, by correcting biased labels. While multiple label noise correction methods are available, the information about their behavior in identifying discrimination is very limited. In this work, we develop an empirical methodology to systematically evaluate the effectiveness of label noise correction techniques in ensuring the fairness of models trained on biased datasets. Our methodology involves manipulating the amount of label noise and can be used not only with fairness benchmarks but also with standard ML datasets. We apply the methodology to analyze six label noise correction methods according to several fairness metrics on standard OpenML datasets. Our results suggest that the Hybrid Label Noise Correction [20] method achieves the best trade-off between predictive performance and fairness. Clustering-Based Correction [14] can reduce discrimination the most, although at the cost of lower predictive performance.


2024

Symbolic Data Analysis to Improve Completeness of Model Combination Methods

Authors
Strecht, P; Mendes Moreira, J; Soares, C

Publication
ADVANCES IN ARTIFICIAL INTELLIGENCE, AI 2023, PT II

Abstract
A growing number of organizations are adopting a strategy of breaking down large data analysis problems into specific sub-problems, tailoring models for each. However, handling a large number of individual models can pose challenges in understanding organization-wide phenomena. Recent studies focus on using decision trees to create a consensus model by aggregating local decision trees into sets of rules. Despite these efforts, the resulting models may still be incomplete, i.e., not able to cover the entire decision space. This paper explores methodologies to tackle this issue by generating complete consensus models from incomplete rule sets, relying on rough estimates of the distribution of independent variables. Two approaches are introduced: synthetic dataset creation followed by decision tree training, and a specialized algorithm for creating a decision tree from symbolic data. The feasibility of generating complete decision trees is demonstrated, along with an empirical evaluation on a number of datasets.


2024

Detection of Covid-19 in Chest X-Ray Images Using Percolation Features and Hermite Polynomial Classification

Authors
Roberto, GF; Pereira, DC; Martins, AS; Tosta, TAA; Soares, C; Lumini, A; Rozendo, GB; Neves, LA; Nascimento, MZ

Publication
PROGRESS IN PATTERN RECOGNITION, IMAGE ANALYSIS, COMPUTER VISION, AND APPLICATIONS, CIARP 2023, PT I

Abstract
Covid-19 is a serious disease caused by the SARS-CoV-2 virus that was first reported in China in late 2019 and has rapidly spread around the world. As the virus affects mostly the lungs, chest X-rays are one of the safest and most accessible ways of diagnosing the infection. In this paper, we propose an approach for detecting Covid-19 in chest X-ray images through the extraction and classification of local and global percolation-based features. The method was applied to two datasets: one containing 2,002 segmented samples split into two classes (Covid-19 and Healthy); and another containing 1,125 non-segmented samples split into three classes (Covid-19, Healthy and Pneumonia). The 48 obtained percolation features were given as input to six different classifiers, and then AUC and accuracy values were evaluated. We employed the 10-fold cross-validation method and evaluated the lesion sub-types with binary and multiclass classification using the Hermite Polynomial classifier, which had never been employed in this context. This classifier provided the best overall results when compared to five other machine learning algorithms. These results, based on the association of percolation features and the Hermite polynomial classifier, can contribute to the detection of the lesions by supporting specialists in clinical practice.


Supervised theses

2024

A Framework to Interpret Multiple Related Rule-based Models

Author
Pedro Rodrigo Caetano Strecht Ribeiro

Institution
UP-FEUP

2024

Enhancing Forecasting using Read & Write Recurrent Neural Networks

Author
Yassine Baghoussi

Institution
UP-FEUP

2019

Prescriptive Analytics for Staff Scheduling Optimization in Retail

Author
Catarina Alexandra Teixeira Ramos

Institution
UP-FEUP

2019

Ensembles for Time Series Forecasting

Author
Vítor Manuel Araújo Cerqueira

Institution
UP-FEUP

BI4UP

CMLDM

Chatbot_Intelligence

opti-MOVES

SSPM

PFAI4_3ed
