Publications

Publications by João Gama

2021

Tensor decomposition for analysing time-evolving social networks: an overview

Authors
Fernandes, S; Fanaee T, H; Gama, J;

Publication
ARTIFICIAL INTELLIGENCE REVIEW

Abstract
Social networks are becoming larger and more complex as new ways of collecting social interaction data arise (namely from online social networks, mobile device sensors, ...). These networks are often large-scale and of high dimensionality. Therefore, dealing with such networks has become a challenging task. An intuitive way to deal with this complexity is to resort to tensors. In this context, the application of tensor decomposition has proven its usefulness in modelling and mining these networks: it has not only been applied for exploratory analysis (thus allowing the discovery of interaction patterns), but also for more demanding and elaborate tasks such as community detection and link prediction. In this work, we provide an overview of the methods based on tensor decomposition for the purpose of analysing time-evolving social networks from various perspectives: from community detection, link prediction and anomaly/event detection to network summarization and visualization. In more detail, we discuss the ideas exploited to carry out each social network analysis task as well as its limitations in order to give a complete coverage of the topic.

2021

Using network features for credit scoring in microfinance

Authors
Paraiso, P; Ruiz, S; Gomes, P; Rodrigues, L; Gama, J;

Publication
INTERNATIONAL JOURNAL OF DATA SCIENCE AND ANALYTICS

Abstract
The usage of non-traditional data in credit scoring, from microfinance institutions, is very useful when trying to address the problem, very common in emerging markets, of the lack of a verifiable customer credit history. In this context, this paper relies on data acquired from smartphones in a loan classification problem. We conduct a set of experiments concerning feature selection, strategies to deal with imbalanced datasets and algorithm choice, to define a baseline model. This model is then compared to others that add network features to the original ones. For that comparison, we generate a network that links a given user to their phone book contacts who are users of a given mobile application, taking into account the ethics and privacy concerns involved, and use some feature extraction techniques, such as the introduction of centrality measures and the definition of node embeddings, in order to capture certain aspects of the network's topology. Several node embedding algorithms are tested, but only Node2Vec proves to be significantly better than the baseline model, applying Friedman's post hoc tests. This node embedding algorithm outperforms all the others, representing a relative improvement, in comparison with the baseline model, of 5.74% on the mean accuracy, 7.13% on the area under the Receiver Operating Characteristic curve and 30.83% on the Kolmogorov-Smirnov statistic scores. This method, therefore, proves to be very promising when trying to discriminate between "good" and "bad" customers, in credit scoring classification problems.

2022

An Exploratory Diagnosis of Artificial Intelligence Risks for a Responsible Governance

Authors
Teixeira, S; Rodrigues, J; Veloso, B; Gama, J;

Publication
15th International Conference on Theory and Practice of Electronic Governance, ICEGOV 2022, Guimarães, Portugal, October 4-7, 2022

Abstract
Our lives have been increasingly filled with technologies that use Artificial Intelligence (AI), whether at home, in public spaces, in social organizations, or in services. Like other technologies, adopting this emerging technology also requires society's attention to the challenges that may arise from it. The media brought to the public some unexpected results from using these technologies, for example, the unfairness case in the COMPAS system. It became more evident that these technologies can have unintended consequences. In particular, in the public interest domain, these unintended consequences and their origin are a challenge for public policies, governance, and responsible AI. This work aims to identify the technological and ethical risks in data-driven decision systems based on AI and conduct a diagnosis of these risks and their perception. To do that, we use a triangulation of methods. In the first stage, a search on Web of Science has been performed. We consider all the 412 papers. The second stage corresponds to an analysis by experts. The papers have been classified according to their relevance to the topic by the experts. In the third stage, we use the survey method and include risk insights from stage two in our questions. We found 24 concerns that arise from the ethical and technological risk perspectives. The perception of participants regarding the level of concern they have with the risks of a data-driven system based on AI is higher than their perception of society's concern. Fairness is considered the risk whose perception is most severe. Fairness, Bias, Accountability, Interpretability, and Explainability are considered the most relevant concepts for a responsible AI. Consequently, they are also the most relevant for responsible governance of AI. © 2022 ACM.

2021

Artificial intelligence, cyber-threats and Industry 4.0: challenges and opportunities

Authors
Becue, A; Praca, I; Gama, J;

Publication
ARTIFICIAL INTELLIGENCE REVIEW

Abstract
This survey paper discusses opportunities and threats of using artificial intelligence (AI) technology in the manufacturing sector with consideration for offensive and defensive uses of such technology. It starts with an introduction to the Industry 4.0 concept and an understanding of AI use in this context. It then provides elements of security principles and detection techniques applied to operational technology (OT), which forms the main attack surface of manufacturing systems. As some intrusion detection systems (IDS) already involve some AI-based techniques, we focus on existing machine-learning and data-mining based techniques in use for intrusion detection. This article presents the major strengths and weaknesses of the main techniques in use. We also discuss an assessment of their relevance for application to OT, from the manufacturer's point of view. Another part of the paper introduces the essential drivers and principles of Industry 4.0, providing insights on the advent of AI in manufacturing systems as well as an understanding of the new set of challenges it implies. AI-based techniques for production monitoring, optimisation and control are proposed with insights on several application cases. The related technical, operational and security challenges are discussed and an understanding of the impact of such transition on current security practices is then provided in more detail. The final part of the report further develops a vision of security challenges for Industry 4.0. It addresses aspects of orchestration of distributed detection techniques, introduces an approach to adversarial/robust AI development and concludes with human-machine behaviour monitoring requirements.

2021

Data stream analysis: Foundations, major tasks and tools

Authors
Bahri, M; Bifet, A; Gama, J; Gomes, HM; Maniu, S;

Publication
WILEY INTERDISCIPLINARY REVIEWS-DATA MINING AND KNOWLEDGE DISCOVERY

Abstract
The significant growth of interconnected Internet-of-Things (IoT) devices, the use of social networks, along with the evolution of technology in different domains, lead to a rise in the volume of data generated continuously from multiple systems. Valuable information can be derived from these evolving data streams by applying machine learning. In practice, several critical issues emerge when extracting useful knowledge from these potentially infinite data, mainly because of their evolving nature and high arrival rate which implies an inability to store them entirely. In this work, we provide a comprehensive survey that discusses the research constraints and the current state-of-the-art in this vibrant framework. Moreover, we present an updated overview of the latest contributions proposed in different stream mining tasks, particularly classification, regression, clustering, and frequent patterns. This article is categorized under: Fundamental Concepts of Data and Knowledge > Key Design Issues in Data Mining; Fundamental Concepts of Data and Knowledge > Motivation and Emergence of Data Mining.

2020

Detecting Geographical Competitive Structure for POI Visit Dynamics

Authors
Fujii, T; Kumano, M; Gama, J; Kimura, M;

Publication
Complex Networks & Their Applications IX - Volume 2, Proceedings of the Ninth International Conference on Complex Networks and Their Applications, COMPLEX NETWORKS 2020, 1-3 December 2020, Madrid, Spain.

Abstract
We provide a framework for analyzing geographical influence networks that have impacts on visit event sequences for a set of points of interest (POIs) in a city. Since mutually-exciting Hawkes processes can naturally model temporal event data and capture interactions between those events, previous work presented a probabilistic model based on Hawkes processes, called the CHP model, for finding cooperative structure among online items from their share event sequences. In this paper, based on Hawkes processes, we propose a novel probabilistic model, called the RH model, for detecting geographical competitive structure in the set of POIs, and present a method of inferring it from the POI visit event history. We mathematically derive an analytical approximation formula for predicting the popularity of each of the POIs for the RH model, and also extend the CHP model so as to extract geographical cooperative structure. Using synthetic data, we first confirm the effectiveness of the inference method and the validity of the approximation formula. Using real data from Location-Based Social Networks (LBSNs), we demonstrate the significance of the RH model in terms of predicting future events, and uncover the latent geographical influence networks from the perspective of geographical competitive and cooperative structures. © 2021, The Author(s), under exclusive license to Springer Nature Switzerland AG.
