2023
Authors
Silva, PR; Vinagre, J; Gama, J;
Publication
WILEY INTERDISCIPLINARY REVIEWS-DATA MINING AND KNOWLEDGE DISCOVERY
Abstract
Federated learning (FL) is a collaborative, decentralized, privacy-preserving method to address the challenges of storing data and data privacy. Artificial intelligence, machine learning, smart devices, and deep learning have strongly marked the last years. Two challenges arose in data science as a result. First, regulators moved to protect data through the General Data Protection Regulation, under which organizations are not allowed to keep or transfer data without the owner's authorization. Another challenge is the large volume of data generated in the era of big data, and keeping that data on a single server becomes increasingly tricky. Therefore, the data is allocated to different locations or generated by devices, creating the need to build models or perform calculations without transferring data to a single location. The new term FL emerged as a sub-area of machine learning that aims to solve the challenge of making distributed models with privacy considerations. This survey starts by describing relevant concepts, definitions, and methods, followed by an in-depth investigation of federated model evaluation. Finally, we discuss three promising applications for further research: anomaly detection, distributed data streams, and graph representation. This article is categorized under: Technologies > Machine Learning; Technologies > Artificial Intelligence
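The idea of building a model without moving data to a single location can be illustrated with a minimal sketch of weighted parameter averaging, in the style commonly called federated averaging. This is not taken from the survey itself: the function names `local_update` and `federated_average` are hypothetical, and the "model" is deliberately reduced to a single parameter (the local mean) so that the result can be checked by hand.

```python
from typing import List, Sequence, Tuple


def local_update(data: Sequence[float]) -> Tuple[List[float], int]:
    """Hypothetical client step: fit a one-parameter model (the mean) locally.

    Only the fitted parameter and the sample count leave the client,
    never the raw data.
    """
    n = len(data)
    return [sum(data) / n], n


def federated_average(updates: Sequence[Tuple[List[float], int]]) -> List[float]:
    """Server step: average client parameters weighted by local sample counts."""
    total = sum(n for _, n in updates)
    dim = len(updates[0][0])
    return [
        sum(params[i] * n for params, n in updates) / total
        for i in range(dim)
    ]


# Three clients holding private data of different sizes (illustrative values).
clients = [[1.0, 2.0, 3.0], [10.0], [4.0, 6.0]]
updates = [local_update(d) for d in clients]
global_params = federated_average(updates)
```

In this toy setting the weighted average equals the mean that a single server would compute over the pooled data, which is why sample-count weighting is used rather than a plain average of client parameters.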
2023
Authors
Fernandes, S; Fanaee T, H; Gama, J; Tisljaric, L; Smuc, T;
Publication
MACHINE LEARNING
Abstract
Densification events in time-evolving networks refer to instants in which the network density, that is, the number of edges, is substantially larger than at the remaining instants. These events can occur at a global level, involving the majority of the nodes in the network, or at a local level, involving only a subset of nodes. While global densification events affect the overall structure of the network, the same does not hold for local densification events, which may remain undetectable by existing detection methods. To address this issue, we propose WINdowed TENsor decomposition for Densification Event Detection (WINTENDED) for the detection and characterization of both global and local densification events. Our method combines a sliding window decomposition with statistical tools to capture the local dynamics of the network and automatically find irregular behaviours. According to our experimental evaluation, WINTENDED is able to spot global densification events at least as accurately as its competitors, while also being able to find local densification events, unlike its competitors.
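To make the notion of a global densification event concrete, the following is a minimal baseline sketch, not WINTENDED: it skips the tensor decomposition entirely and flags timestamps whose edge count is unusually high relative to a sliding window of preceding timestamps. The function name, the window length, and the z-score threshold are all assumptions chosen for illustration.

```python
from statistics import mean, stdev
from typing import List


def densification_events(
    edge_counts: List[int], window: int = 5, threshold: float = 3.0
) -> List[int]:
    """Return the timestamps whose edge count exceeds the preceding window's
    mean by more than `threshold` sample standard deviations."""
    events = []
    for t in range(window, len(edge_counts)):
        history = edge_counts[t - window:t]
        mu = mean(history)
        # Floor the deviation so a perfectly flat history does not divide by zero.
        sigma = max(stdev(history), 1e-9)
        if (edge_counts[t] - mu) / sigma > threshold:
            events.append(t)
    return events


# Number of edges observed at each timestamp (illustrative values).
counts = [10, 11, 9, 10, 12, 10, 40, 11, 10]
events = densification_events(counts)
```

A baseline like this only sees the total number of edges, so a burst confined to a small subset of nodes can be lost in the global count; that is the limitation the abstract attributes to existing methods for local events.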
2023
Authors
Tabassum, S; Gama, J; Azevedo, PJ; Cordeiro, M; Martins, C; Martins, A;
Publication
EXPERT SYSTEMS
Abstract
Influence Analysis is one of the well-known areas of Social Network Analysis. However, discovering influencers from micro-blog networks based on topics has gained recent popularity due to its specificity. Besides, these data networks are massive, continuous and evolving. Therefore, to address the above challenges we propose a dynamic framework for topic modelling and identifying influencers in the same process. It incorporates dynamic sampling, community detection and network statistics over a graph data stream from a social media activity management application. Further, we compare the graph measures against each other empirically and observe that there is no evidence of correlation between the set of users having a large number of friends and the set of users whose posts achieve high acceptance (i.e., highly liked, commented and shared posts). Therefore, we propose a novel approach that incorporates a user's reachability and also acceptability by other users. Consequently, we improve on graph metrics by including a dynamic acceptance score (integrating content quality with network structure) for ranking influencers in micro-blogs. Additionally, we analysed the topic clusters' structure and quality with empirical experiments and visualization.
2023
Authors
Costa Júnior, JD; Faria, ER; Silva, JA; Gama, J; Cerri, R;
Publication
Appl. Soft Comput.
Abstract
Multi-Label Stream Classification (MLSC) is the classification of streaming examples into multiple classes simultaneously. Since new classes may emerge during the streaming process (concept evolution) and known classes may change over time (concept drift), it is a challenging task. In real situations, concept drift and concept evolution occur in scenarios where the actual labels of arriving examples are never available; hence it is impractical to update decision models in a supervised fashion. This is known as Extreme Verification Latency, a topic that has not been well investigated in the MLSC literature. This paper proposes a new method called MultI-label learNing Algorithm for Data Streams with Binary Relevance transformation (MINAS-BR), integrated with a Novelty Detection (ND) procedure for detecting concept evolution and concept drift, updating the model in an unsupervised fashion. Furthermore, since the label space is not static, we propose a new evaluation methodology for MLSC under extreme verification latency. Experiments over synthetic and real-world data sets with different concept drift and concept evolution scenarios confirmed the strategies employed in MINAS-BR and presented relevant advances for handling streaming multi-label data. © 2023 Elsevier B.V.
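The Binary Relevance transformation named in the abstract is a standard way of reducing a multi-label problem to one binary problem per label. The sketch below shows only that generic transformation on a static batch; it is not the MINAS-BR algorithm, it does not handle novelty detection or streaming, and the function name and toy data are hypothetical.

```python
from typing import Dict, List, Sequence, Set, Tuple


def binary_relevance(
    examples: Sequence[str], label_sets: Sequence[Set[str]]
) -> Dict[str, List[Tuple[str, int]]]:
    """Build one binary data set per label: target 1 if the example
    carries that label, 0 otherwise."""
    labels = sorted(set().union(*label_sets))
    return {
        label: [(x, int(label in ys)) for x, ys in zip(examples, label_sets)]
        for label in labels
    }


# Toy multi-label batch: each example may carry zero, one, or several labels.
examples = ["a", "b", "c"]
label_sets = [{"sports"}, {"sports", "news"}, set()]
per_label = binary_relevance(examples, label_sets)
```

Each entry of `per_label` can then be handed to an independent binary model. In a stream where new labels may emerge, the set of binary problems would itself have to grow over time, which is one reason a fixed transformation like this one is not sufficient on its own.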
2023
Authors
Martins, I; Resende, JS; Gama, J;
Publication
ADVANCES IN INTELLIGENT DATA ANALYSIS XXI, IDA 2023
Abstract
As the digital world grows, data is being collected at high speed on a continuous and real-time scale. Hence, the imbalanced and evolving scenario imposed by learning from streaming data remains a challenge. As the research field is still open to consistent strategies that assess continuous and evolving data properties, this paper proposes an unsupervised, online, and incremental anomaly detection ensemble of influence trees that implement adaptive mechanisms to deal with inactive or saturated leaves. This proposal features the fourth standardized moment, also known as kurtosis, as the splitting criterion, and the isolation score, Shannon's information content, and the influence function of an instance as the anomaly score. In addition to improving interpretability, this proposal is also evaluated on publicly available datasets, providing a detailed discussion of the results.
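Two of the quantities named in the abstract have standard textbook definitions, sketched below: kurtosis as the fourth standardized moment, and Shannon's information content of an event with probability p as -log2(p). How the paper combines them inside its influence trees is not stated in the abstract, so this sketch only computes the quantities themselves; the function names and sample values are assumptions.

```python
import math
from typing import Sequence


def kurtosis(values: Sequence[float]) -> float:
    """Fourth standardized moment (population form, no excess correction)."""
    n = len(values)
    mu = sum(values) / n
    m2 = sum((v - mu) ** 2 for v in values) / n  # second central moment
    m4 = sum((v - mu) ** 4 for v in values) / n  # fourth central moment
    if m2 == 0:
        return 0.0  # constant feature: no spread to characterize
    return m4 / (m2 ** 2)


def information_content(p: float) -> float:
    """Shannon information content, in bits, of an event with probability p."""
    return -math.log2(p)


evenly_spread = kurtosis([1.0, 2.0, 3.0, 4.0, 5.0])
with_outlier = kurtosis([1.0, 2.0, 3.0, 4.0, 100.0])
rare_event_bits = information_content(0.25)
```

A feature containing an extreme value yields a higher kurtosis than an evenly spread one, and a rarer event carries more bits of information, which gives some intuition for why both quantities are natural ingredients in an anomaly detector.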
2023
Authors
Silva, MEP; Veloso, B; Gama, J;
Publication
MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES: APPLIED DATA SCIENCE AND DEMO TRACK, ECML PKDD 2023, PT VII
Abstract
The transition to Industry 4.0 provoked a transformation of industrial manufacturing with a significant leap in automation and intelligent systems. This paradigm shift has brought about a mindset that emphasizes predictive maintenance: detecting future failures while the current behaviour of industrial processes and machines is thought to be normal. The constant monitoring of industrial equipment produces massive quantities of data that enable the application of machine learning approaches to this task. This study uses deep learning-based models to build a data-driven predictive maintenance framework for the air production unit (APU), a crucial system for the proper functioning of a Metro do Porto train. This public transport system moves thousands of people every day, and train failures lead to delays and loss of trust by clients. Therefore, it is essential not only to detect APU failures before they occur to minimize negative impacts, but also to provide explanations for the failure warnings that can aid in decision-making processes. We propose an autoencoder architecture trained with an adversarial loss, known as the Wasserstein Autoencoder with Generative Adversarial Network (WAE-GAN), designed to detect sensor failures in systems connected to the APU. Our model can detect APU failures up to two hours before they occur, allowing timely intervention of the maintenance teams. We further augment our model with an explainability layer, by providing explanations generated by a rule-based model that focuses on rare events. Results show that our model is able to detect APU failures without any false alarms, fulfilling the requisites of Metro do Porto for early detection of the failures.