2021
Authors
Paraiso, P; Ruiz, S; Gomes, P; Rodrigues, L; Gama, J;
Publication
INTERNATIONAL JOURNAL OF DATA SCIENCE AND ANALYTICS
Abstract
The use of non-traditional data in credit scoring by microfinance institutions is very useful for addressing a problem that is very common in emerging markets: the lack of a verifiable credit history for customers. In this context, this paper relies on data acquired from smartphones in a loan classification problem. We conduct a set of experiments on feature selection, strategies for dealing with imbalanced datasets, and algorithm choice in order to define a baseline model. This model is then compared with others that add network features to the original ones. For that comparison, we generate a network that links a given user to those phone book contacts who are also users of a given mobile application, taking into account the ethics and privacy concerns involved, and we apply feature extraction techniques, such as centrality measures and node embeddings, to capture certain aspects of the network's topology. Several node embedding algorithms are tested, but only Node2Vec proves to be significantly better than the baseline model according to Friedman's post hoc tests. It outperforms all the other algorithms, with relative improvements over the baseline model of 5.74% in mean accuracy, 7.13% in the area under the Receiver Operating Characteristic curve, and 30.83% in the Kolmogorov-Smirnov statistic. This method therefore proves to be very promising for discriminating between "good" and "bad" customers in credit scoring classification problems.
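As a rough illustration of the metrics named in this abstract, the following Python sketch computes the area under the ROC curve, the Kolmogorov-Smirnov statistic between the score distributions of the two classes, and a relative improvement over a baseline. It is not the authors' code: the function names are hypothetical, labels are assumed to be encoded as 1 for "bad" and 0 for "good", and the brute-force loops are written for clarity rather than speed.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def ks_statistic(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Largest gap between the empirical score CDFs of the two classes."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    best = 0.0
    for t in sorted(set(scores)):
        cdf_pos = sum(1 for s in pos if s <= t) / len(pos)
        cdf_neg = sum(1 for s in neg if s <= t) / len(neg)
        best = max(best, abs(cdf_pos - cdf_neg))
    return best


def relative_improvement(new: float, baseline: float) -> float:
    """Relative improvement of `new` over `baseline`, in percent."""
    return 100.0 * (new - baseline) / baseline
```

A relative improvement of 30.83% in the Kolmogorov-Smirnov statistic therefore means that the new score is about 1.31 times the baseline score, not that it rose by 30.83 percentage points.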
2021
Authors
Becue, A; Praca, I; Gama, J;
Publication
ARTIFICIAL INTELLIGENCE REVIEW
Abstract
This survey paper discusses the opportunities and threats of using artificial intelligence (AI) technology in the manufacturing sector, considering both offensive and defensive uses of such technology. It starts with an introduction to the Industry 4.0 concept and an account of how AI is used in this context. It then presents security principles and detection techniques applied to operational technology (OT), which forms the main attack surface of manufacturing systems. As some intrusion detection systems (IDS) already involve AI-based techniques, we focus on existing machine-learning and data-mining techniques used for intrusion detection. This article presents the major strengths and weaknesses of the main techniques in use. We also assess their relevance for application to OT from the manufacturer's point of view. Another part of the paper introduces the essential drivers and principles of Industry 4.0, providing insights into the advent of AI in manufacturing systems and the new set of challenges it implies. AI-based techniques for production monitoring, optimisation and control are presented, with insights from several application cases. The related technical, operational and security challenges are discussed, and the impact of this transition on current security practices is then examined in more detail. The final part of the report further develops a vision of security challenges for Industry 4.0. It addresses the orchestration of distributed detection techniques, introduces an approach to adversarial/robust AI development, and concludes with human-machine behaviour monitoring requirements.
2021
Authors
Bahri, M; Bifet, A; Gama, J; Gomes, HM; Maniu, S;
Publication
WILEY INTERDISCIPLINARY REVIEWS-DATA MINING AND KNOWLEDGE DISCOVERY
Abstract
The significant growth of interconnected Internet-of-Things (IoT) devices, the use of social networks, and the evolution of technology in different domains lead to a rise in the volume of data generated continuously from multiple systems. Valuable information can be derived from these evolving data streams by applying machine learning. In practice, several critical issues emerge when extracting useful knowledge from these potentially infinite data, mainly because of their evolving nature and high arrival rate, which make it impossible to store them entirely. In this work, we provide a comprehensive survey that discusses the research constraints and the current state of the art in this vibrant framework. Moreover, we present an updated overview of the latest contributions proposed in different stream mining tasks, particularly classification, regression, clustering, and frequent patterns. This article is categorized under: Fundamental Concepts of Data and Knowledge > Key Design Issues in Data Mining; Fundamental Concepts of Data and Knowledge > Motivation and Emergence of Data Mining
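The storage constraint described in this abstract is commonly handled by keeping small summaries that are updated once per arriving item. The sketch below is an illustration only and is not taken from the survey: it uses Welford's online algorithm to maintain the mean and sample variance of a stream in constant memory, and the class name `RunningStats` is hypothetical.

```python
class RunningStats:
    """Single-pass mean and sample variance (Welford's online algorithm)."""

    def __init__(self) -> None:
        self.n = 0          # number of items seen so far
        self.mean = 0.0     # running mean
        self._m2 = 0.0      # running sum of squared deviations from the mean

    def update(self, x: float) -> None:
        """Fold one stream item into the summary; the item itself is not stored."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self) -> float:
        """Sample variance of the items seen so far (0.0 with fewer than two items)."""
        return self._m2 / (self.n - 1) if self.n > 1 else 0.0
```

Calling `update` once per item keeps memory use independent of the stream length, at the cost of only being able to answer the questions the summary was designed for.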
2021
Authors
Jesus, SM; Belém, C; Balayan, V; Bento, J; Saleiro, P; Bizarro, P; Gama, J;
Publication
CoRR
Abstract
2021
Authors
Cavadas, B; Leite, M; Pedro, N; Magalhaes, AC; Melo, J; Correia, M; Maximo, V; Camacho, R; Fonseca, NA; Figueiredo, C; Pereira, L;
Publication
MICROORGANISMS
Abstract
The continuous characterization of genome-wide diversity in population and case-cohort samples, allied to the development of new algorithms, is shedding light on the impact of host ancestry and selection events on various infectious diseases. Especially interesting are the long-standing associations between humans and certain bacteria, such as Helicobacter pylori, which could have been strong drivers of adaptation leading to coevolution. Some evidence from admixed gastric cancer cohorts has been suggested as supporting Homo-Helicobacter coevolution, but reliable experimental data that control for both the bacterium and the host ancestries are lacking. Here, we conducted the first in vitro coinfection assays with dual human- and bacterium-matched and -mismatched ancestries, in African and European backgrounds, to evaluate the genome-wide gene expression host response to H. pylori. Our results showed that: (1) the host response to H. pylori infection was greatly shaped by human ancestry, with variability in the innate immune system and metabolism; (2) African human ancestry showed signs of coevolution with H. pylori, while European ancestry appeared to be maladapted; and (3) mismatched ancestry did not seem to be an important differentiator of gene expression at the initial stages of infection as assayed here.
2021
Authors
Egeter, B; Veríssimo, J; Lopes-Lima, M; Chaves, C; Pinto, J; Riccardi, N; Beja, P; Fonseca, NA;
Publication
ARPHA Conference Abstracts
Abstract