Publications

Publications by João Gama

2022

Semi-causal decision trees

Authors
Nogueira, AR; Ferreira, CA; Gama, J;

Publication
PROGRESS IN ARTIFICIAL INTELLIGENCE

Abstract
Typically, classification algorithms use correlation analysis to make decisions. However, these decisions and the models they learn are not easily understandable for the typical user. Causal discovery is the field that studies the means to find causal relationships in observational data. Although highly interpretable, causal discovery algorithms tend to not perform so well in classification problems. This paper aims to propose a hybrid decision tree approach (SC tree) that mixes causal discovery with correlation analysis through the implementation of a custom metric to split the data in the tree's construction (Semi-causal gain ratio). In the results, the proposed methodology obtained a significant performance improvement (11.26% mean error rate) when compared to several causal baselines CDT-PS (23.67% ) and CDT-SPS (25.14%), matching closely the performance of J48 (10.20%), used as a correlation baseline, in ten binary data sets. Besides, when compared with PC in discrete data sets, the proposed approach obtained substantial improvement (16.17% against 28.07% in terms of mean error rate).

CloseRead Abstract

2021

Tensor decomposition for analysing time-evolving social networks: an overview

Authors
Fernandes, S; Fanaee T, H; Gama, J;

Publication
ARTIFICIAL INTELLIGENCE REVIEW

Abstract
Social networks are becoming larger and more complex as new ways of collecting social interaction data arise (namely from online social networks, mobile devices sensors, ...). These networks are often large-scale and of high dimensionality. Therefore, dealing with such networks became a challenging task. An intuitive way to deal with this complexity is to resort to tensors. In this context, the application of tensor decomposition has proven its usefulness in modelling and mining these networks: it has not only been applied for exploratory analysis (thus allowing the discovery of interaction patterns), but also for more demanding and elaborated tasks such as community detection and link prediction. In this work, we provide an overview of the methods based on tensor decomposition for the purpose of analysing time-evolving social networks from various perspectives: from community detection, link prediction and anomaly/event detection to network summarization and visualization. In more detail, we discuss the ideas exploited to carry out each social network analysis task as well as its limitations in order to give a complete coverage of the topic.

CloseRead Abstract

2021

Using network features for credit scoring in microfinance

Authors
Paraiso, P; Ruiz, S; Gomes, P; Rodrigues, L; Gama, J;

Publication
INTERNATIONAL JOURNAL OF DATA SCIENCE AND ANALYTICS

Abstract
The usage of non-traditional data in credit scoring, from microfinance institutions, is very useful when trying to address the problem, very common in emerging markets, of the lack of a verifiable customers' credit history. In this context, this paper relies on data acquired from smartphones in a loan classification problem. We conduct a set of experiments concerning feature selection, strategies to deal with imbalanced datasets and algorithm choice, to define a baseline model. This model is, then, compared to others adding network features to the original ones. For that comparison, we generate a network that links a given user to its phone book contacts which are users of a given mobile application, taking into account the ethics and privacy concerns involved, and use some feature extraction techniques, such as the introduction of centrality measures and the definition of node embeddings, in order to capture certain aspects of the network's topology. Several node embedding algorithms are tested, but only Node2Vec proves to be significantly better than the baseline model, applying Friedman's post hoc tests. This node embedding algorithm outperforms all the other, representing a relative improvement, in comparison with the baseline model, of 5.74% on the mean accuracy, 7.13% on the area under the Receiver Operating Characteristic curve and 30.83% on the Kolmogorov-Smirnov statistic scores. This method, therefore, proves to be very promising when trying to discriminate between "good" and "bad" customers, in credit scoring classification problems.

CloseRead Abstract

2022

An Exploratory Diagnosis of Artificial Intelligence Risks for a Responsible Governance

Authors
Teixeira, S; Rodrigues, J; Veloso, B; Gama, J;

Publication
15th International Conference on Theory and Practice of Electronic Governance, ICEGOV 2022, Guimarães, Portugal, October 4-7, 2022

Abstract
Our lives have been increasingly filled with technologies that use Artificial Intelligence (AI), whether at home, in public spaces, in social organizations, or in services. Like other technologies, adopting this emerging technology also requires society's attention to the challenges that may arise from it. The media brought to the public some unexpected results from using these technologies, for example, the unfairness case in the COMPAS system. It became more evident that these technologies can have unintended consequences. In particular, in the public interest domain, these unintended consequences and their origin are a challenge for public policies, governance, and responsible AI. This work aims to identify the technological and ethical risks in data-driven decision systems based on AI and conduct a diagnosis of these risks and their perception. To do that, we use a triangulation of methods. In the first stage, a search on Web of Science has been performed. We consider all the 412 papers. The second stage corresponds to a analysis of experts. The papers have been classified according to the relevance to the topic by the experts. In the third stage, we use the survey method and include risk insights from stage two in our questions. We found 24 concerns which arise from the perspective of the ethical and technological risk perspective. The perception of participants regarding the level of concern they have with the risks of a data-driven system based on AI is high than their perception of society's concern. Fairness is considered the risk whose perception is more severe. Fairness, Bias, Accountability, Interpretability, and Explainability are considered the most relevant concepts for a responsible AI. Consequently, also the most relevant for responsible governance of AI. © 2022 ACM.

CloseRead Abstract

2021

Artificial intelligence, cyber-threats and Industry 4.0: challenges and opportunities

Authors
Becue, A; Praca, I; Gama, J;

Publication
ARTIFICIAL INTELLIGENCE REVIEW

Abstract
This survey paper discusses opportunities and threats of using artificial intelligence (AI) technology in the manufacturing sector with consideration for offensive and defensive uses of such technology. It starts with an introduction of Industry 4.0 concept and an understanding of AI use in this context. Then provides elements of security principles and detection techniques applied to operational technology (OT) which forms the main attack surface of manufacturing systems. As some intrusion detection systems (IDS) already involve some AI-based techniques, we focus on existing machine-learning and data-mining based techniques in use for intrusion detection. This article presents the major strengths and weaknesses of the main techniques in use. We also discuss an assessment of their relevance for application to OT, from the manufacturer point of view. Another part of the paper introduces the essential drivers and principles of Industry 4.0, providing insights on the advent of AI in manufacturing systems as well as an understanding of the new set of challenges it implies. AI-based techniques for production monitoring, optimisation and control are proposed with insights on several application cases. The related technical, operational and security challenges are discussed and an understanding of the impact of such transition on current security practices is then provided in more details. The final part of the report further develops a vision of security challenges for Industry 4.0. It addresses aspects of orchestration of distributed detection techniques, introduces an approach to adversarial/robust AI development and concludes with human-machine behaviour monitoring requirements.

CloseRead Abstract

2021

Data stream analysis: Foundations, major tasks and tools

Authors
Bahri, M; Bifet, A; Gama, J; Gomes, HM; Maniu, S;

Publication
WILEY INTERDISCIPLINARY REVIEWS-DATA MINING AND KNOWLEDGE DISCOVERY

Abstract
The significant growth of interconnected Internet-of-Things (IoT) devices, the use of social networks, along with the evolution of technology in different domains, lead to a rise in the volume of data generated continuously from multiple systems. Valuable information can be derived from these evolving data streams by applying machine learning. In practice, several critical issues emerge when extracting useful knowledge from these potentially infinite data, mainly because of their evolving nature and high arrival rate which implies an inability to store them entirely. In this work, we provide a comprehensive survey that discusses the research constraints and the current state-of-the-art in this vibrant framework. Moreover, we present an updated overview of the latest contributions proposed in different stream mining tasks, particularly classification, regression, clustering, and frequent patterns. This article is categorized under: Fundamental Concepts of Data and Knowledge > Key Design Issues in Data Mining Fundamental Concepts of Data and Knowledge > Motivation and Emergence of Data Mining

CloseRead Abstract