Publications - INESC TEC

Publications

Publications by LIAAD

2022

Semi-causal decision trees

Authors
Nogueira, AR; Ferreira, CA; Gama, J;

Publication
PROGRESS IN ARTIFICIAL INTELLIGENCE

Abstract
Typically, classification algorithms use correlation analysis to make decisions. However, these decisions and the models they learn are not easily understandable for the typical user. Causal discovery is the field that studies the means to find causal relationships in observational data. Although highly interpretable, causal discovery algorithms tend not to perform as well in classification problems. This paper proposes a hybrid decision tree approach (SC tree) that mixes causal discovery with correlation analysis through a custom metric for splitting the data during the tree's construction (semi-causal gain ratio). In the results, the proposed methodology obtained a significant performance improvement (11.26% mean error rate) when compared to the causal baselines CDT-PS (23.67%) and CDT-SPS (25.14%), closely matching the performance of J48 (10.20%), used as a correlation baseline, on ten binary data sets. In addition, when compared with PC on discrete data sets, the proposed approach obtained a substantial improvement (16.17% against 28.07% in terms of mean error rate).
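
The abstract does not define the semi-causal gain ratio, so it is not reproduced here. For context only, the sketch below shows the standard gain ratio used as a split criterion by C4.5 (J48 is the Weka implementation of C4.5, the correlation baseline mentioned above). The function names are illustrative assumptions, and the code is not the paper's metric.

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    total = len(values)
    return -sum((n / total) * math.log2(n / total) for n in Counter(values).values())

def gain_ratio(feature_values, labels):
    """Standard C4.5-style gain ratio of a discrete feature (not the semi-causal variant)."""
    total = len(labels)
    # Group the class labels by the value the feature takes.
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    # Information gain: entropy before the split minus weighted entropy after it.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - remainder
    # Split information penalises features with many distinct values.
    split_info = entropy(feature_values)
    return info_gain / split_info if split_info > 0 else 0.0
```

A tree learner of this kind would evaluate `gain_ratio` for every candidate feature at a node and split on the highest-scoring one.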

2022

An Exploratory Diagnosis of Artificial Intelligence Risks for a Responsible Governance

Authors
Teixeira, S; Rodrigues, J; Veloso, B; Gama, J;

Publication
15th International Conference on Theory and Practice of Electronic Governance, ICEGOV 2022, Guimarães, Portugal, October 4-7, 2022

Abstract
Our lives have been increasingly filled with technologies that use Artificial Intelligence (AI), whether at home, in public spaces, in social organizations, or in services. Like other technologies, adopting this emerging technology also requires society's attention to the challenges that may arise from it. The media brought to the public some unexpected results from using these technologies, for example, the unfairness case in the COMPAS system. It became more evident that these technologies can have unintended consequences. In particular, in the public interest domain, these unintended consequences and their origin are a challenge for public policies, governance, and responsible AI. This work aims to identify the technological and ethical risks in data-driven decision systems based on AI and to conduct a diagnosis of these risks and their perception. To do that, we use a triangulation of methods. In the first stage, a search on Web of Science was performed, and we considered all 412 resulting papers. The second stage corresponds to an analysis by experts, who classified the papers according to their relevance to the topic. In the third stage, we used the survey method and included risk insights from stage two in our questions. We found 24 concerns that arise from the ethical and technological risk perspectives. The participants' own level of concern about the risks of a data-driven system based on AI is higher than their perception of society's concern. Fairness is considered the risk whose perception is most severe. Fairness, Bias, Accountability, Interpretability, and Explainability are considered the most relevant concepts for responsible AI and, consequently, for responsible governance of AI. © 2022 ACM.

2022

An Algorithm Adaptation Method for Multi-Label Stream Classification using Self-Organizing Maps

Authors
Cerri, R; Faria, ER; Gama, J;

Publication
2022 21ST IEEE INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND APPLICATIONS, ICMLA

Abstract
Multi-label stream classification is the task of classifying instances into two or more classes simultaneously, with instances flowing continuously at high speed. This task imposes difficult challenges, such as the detection of concept drifts, where the distributions of the instances in the stream change with time, and infinitely delayed labels, where the ground truth labels of the instances are never available to help update the classifiers. To solve this task, the methods from the literature use the problem transformation approach, which divides the multi-label problem into different sub-problems, associating one classification model with each class. In this paper, we propose a method based on self-organizing maps that, unlike the literature, uses only one model to deal with all classes simultaneously. By using the algorithm adaptation approach, our proposal better considers label dependencies, improving the results over its counterparts. Experiments using different synthetic and real-world datasets showed that our proposal obtained the overall best performance when compared to different methods from the literature.
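
As a rough illustration of the problem transformation approach described above (one independent model per class), the following batch sketch trains a tiny nearest-centroid classifier for each label column. The classifier, the function names, and the toy data are all assumptions made for illustration; the sketch ignores streaming, concept drift, and delayed labels, and it is not the self-organizing-map method proposed in the paper.

```python
def train_centroid(X, y):
    """Tiny binary classifier: store the mean feature vector of each class (0 and 1)."""
    centroids = {}
    for cls in (0, 1):
        rows = [x for x, t in zip(X, y) if t == cls]
        if rows:
            centroids[cls] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids

def predict_centroid(centroids, x):
    """Predict the class whose centroid is closest to x (squared Euclidean distance)."""
    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda cls: dist(centroids[cls]))

def train_binary_relevance(X, Y):
    """Problem transformation: one independent binary model per label column."""
    n_labels = len(Y[0])
    return [train_centroid(X, [row[j] for row in Y]) for j in range(n_labels)]

def predict_binary_relevance(models, x):
    """Concatenate the per-label predictions into one multi-label prediction."""
    return [predict_centroid(m, x) for m in models]

# Toy data: four instances, two features, two labels (hypothetical values).
X = [[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]]
Y = [[1, 0], [1, 0], [0, 1], [1, 1]]
models = train_binary_relevance(X, Y)
prediction = predict_binary_relevance(models, [0.05, 0.1])
```

Because each label gets its own model, dependencies between labels are never seen by any single classifier, which is the limitation the abstract attributes to this approach.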

2022

Contextualization for the Organization of Text Documents Streams

Authors
Sarmento, RP; Cardoso, DdO; Gama, J; Brazdil, P;

Publication
CoRR

Abstract

2022

Federated Anomaly Detection over Distributed Data Streams

Authors
Silva, PR; Vinagre, J; Gama, J;

Publication
CoRR

Abstract

2022

Open challenges for Machine Learning based Early Decision-Making research

Authors
Bondu, A; Achenchabe, Y; Bifet, A; Clérot, F; Cornuéjols, A; Gama, J; Hébrail, G; Lemaire, V; Marteau, PF;

Publication
SIGKDD Explor.

Abstract