Publications - INESC TEC

Publications by João Gama

2018

Dynamic Laplace: Efficient Centrality Measure for Weighted or Unweighted Evolving Networks

Authors
Cordeiro, M; Sarmento, RP; Brazdil, P; Gama, J;

Publication
CoRR

2016

SimTensor: A synthetic tensor data generator

Authors
Fanaee-T, H; Gama, J;

Publication
CoRR

2022

Turning the Tables: Biased, Imbalanced, Dynamic Tabular Datasets for ML Evaluation

Authors
Jesus, S; Pombal, J; Alves, D; Cruz, AF; Saleiro, P; Ribeiro, RP; Gama, J; Bizarro, P;

Publication
Advances in Neural Information Processing Systems 35 (NeurIPS 2022)

2022

The MetroPT dataset for predictive maintenance

Authors
Veloso, B; Gama, J; Ribeiro, RP; Pereira, PM;

Publication
Scientific Data

Abstract
The paper describes the MetroPT data set, an outcome of a predictive maintenance project with an urban metro public transportation service in Porto, Portugal. The data was collected in 2022 to develop machine learning methods for online anomaly detection and failure prediction. Several analog sensor signals (pressure, temperature, current consumption), digital signals (control signals, discrete signals), and GPS information (latitude, longitude, and speed) provide a framework that is easy to use and supports the development of new machine learning methods. The data set has some interesting characteristics and can serve as a benchmark for predictive maintenance models.

2023

Novelty detection for multi-label stream classification under extreme verification latency

Authors
Costa Júnior, JD; Faria, ER; Silva, JA; Gama, J; Cerri, R;

Publication
Appl. Soft Comput.

Abstract
Multi-Label Stream Classification (MLSC) is the classification of streaming examples into multiple classes simultaneously. Since new classes may emerge during the streaming process (concept evolution) and known classes may change over time (concept drift), it is a challenging task. In real situations, concept drift and concept evolution occur in scenarios where the actual labels of arriving examples are never available; hence it is impractical to update decision models in a supervised fashion. This is known as Extreme Verification Latency, a topic that has not been well investigated in the MLSC literature. This paper proposes a new method called MultI-label learNing Algorithm for Data Streams with Binary Relevance transformation (MINAS-BR), integrated with a Novelty Detection (ND) procedure for detecting concept evolution and concept drift, updating the model in an unsupervised fashion. Furthermore, since the label space is not static, we propose a new evaluation methodology for MLSC under extreme verification latency. Experiments over synthetic and real-world data sets with different concept drift and concept evolution scenarios confirmed the strategies employed in MINAS-BR and presented relevant advances for handling streaming multi-label data. © 2023 Elsevier B.V.

2023

Online Influence Forest for Streaming Anomaly Detection

Authors
Martins, I; Resende, JS; Gama, J;

Publication
Advances in Intelligent Data Analysis XXI (IDA 2023)

Abstract
As the digital world grows, data is being collected at high speed on a continuous and real-time scale. Hence, learning from streaming data, with the imbalanced and evolving scenario it imposes, remains a challenge. As the research field is still open to consistent strategies that assess continuous and evolving data properties, this paper proposes an unsupervised, online, and incremental anomaly detection ensemble of influence trees that implement adaptive mechanisms to deal with inactive or saturated leaves. This proposal uses the fourth standardized moment, also known as kurtosis, as the splitting criterion, and the isolation score, Shannon's information content, and the influence function of an instance as the anomaly score. In addition to improving interpretability, this proposal is also evaluated on publicly available datasets, providing a detailed discussion of the results.
