Publications

Publications by João Gama

2021

Non-Intrusive Load Monitoring for Household Disaggregated Energy Sensing

Authors
Paulos, JP; Fidalgo, JN; Gama, J;

Publication
2021 IEEE MADRID POWERTECH

Abstract
The present work compares several load disaggregation methods. While the supervised alternative was found to be the most competent, the semi-supervised alternative proved close in terms of potential, whereas the unsupervised alternative seems insufficient. Likewise, the tests with long-lasting data were useful for confirming long-term performance, since no significant loss of performance was observed as the time horizon was extended. Finally, the combination of new parametrization and methodology fine-tuning also proved useful for improving overall performance in several methods.

2021

Generalised Partial Association in Causal Rules Discovery

Authors
Nogueira, AR; Ferreira, C; Gama, J; Pinto, A;

Publication
PROGRESS IN ARTIFICIAL INTELLIGENCE (EPIA 2021)

Abstract
One of the most significant challenges for machine learning today is the discovery of causal relationships from data. Causal discovery is commonly performed using Bayesian-like algorithms. More recently, however, a growing number of causal discovery algorithms that do not fall into this category have appeared. In this paper, we present a new algorithm that explores global causal association rules with the Uncertainty Coefficient. Our algorithm, CRPA-UC, is a global structure discovery approach that combines the advantages of association mining with causal discovery and can be applied to both binary and non-binary discrete data. This approach was compared to the PC algorithm on several well-known data sets using several metrics.
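The Uncertainty Coefficient (Theil's U) that CRPA-UC builds on can be written in terms of entropies as U(X | Y) = (H(X) - H(X | Y)) / H(X), the fraction of the uncertainty in X that is explained by Y. The sketch below is a minimal, standalone illustration of that standard definition for discrete data; it is not the CRPA-UC algorithm itself, the function names are hypothetical, and returning 1.0 when X is constant is an assumed convention.

```python
from collections import Counter, defaultdict
from math import log
from typing import Hashable, Sequence


def entropy(values: Sequence[Hashable]) -> float:
    """Shannon entropy (in nats) of a sequence of discrete values."""
    n = len(values)
    counts = Counter(values)
    return -sum((c / n) * log(c / n) for c in counts.values())


def conditional_entropy(x: Sequence[Hashable], y: Sequence[Hashable]) -> float:
    """H(X | Y): entropy of X within each group of equal Y, weighted by P(Y)."""
    groups = defaultdict(list)
    for xi, yi in zip(x, y):
        groups[yi].append(xi)
    n = len(x)
    return sum((len(g) / n) * entropy(g) for g in groups.values())


def uncertainty_coefficient(x: Sequence[Hashable], y: Sequence[Hashable]) -> float:
    """Theil's U(X | Y) in [0, 1]; 0 means Y says nothing about X, 1 means Y determines X."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    h_x = entropy(x)
    if h_x == 0.0:
        return 1.0  # assumed convention: a constant X has no uncertainty left to explain
    return (h_x - conditional_entropy(x, y)) / h_x
```

The coefficient is asymmetric, unlike mutual information: for x = ["a", "a", "b", "c"] and y = [0, 0, 1, 1], U(X | Y) = 2/3 while U(Y | X) = 1, because X determines Y but not the other way round.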

2021

Modelling Voting Behaviour During a General Election Campaign Using Dynamic Bayesian Networks

Authors
Costa, P; Nogueira, AR; Gama, J;

Publication
PROGRESS IN ARTIFICIAL INTELLIGENCE (EPIA 2021)

Abstract
This work aims to develop a Machine Learning framework to predict voting behaviour. The data come from variables collected longitudinally during the Portuguese 2019 general election campaign. Naive Bayes (NB), Tree Augmented Naive Bayes (TAN), and three different expert models using Dynamic Bayesian Networks (DBN) predict voting behaviour systematically for each moment in time considered, using past information. Even though some of the differences found in the performance comparisons are not statistically significant, TAN and NB outperformed the DBN expert models. The learned models outperformed one of the expert models when predicting abstention and two when predicting votes for right-wing parties. Specifically, for right-wing party votes, TAN and NB achieved satisfactory accuracy, while the expert models were below 50% at the third evaluation moment.

2021

A Survey on Data-Driven Predictive Maintenance for the Railway Industry

Authors
Davari, N; Veloso, B; Costa, GD; Pereira, PM; Ribeiro, RP; Gama, J;

Publication
SENSORS

Abstract
In the last few years, many works have addressed Predictive Maintenance (PdM) using Machine Learning (ML) and Deep Learning (DL) solutions, especially the latter. The monitoring and logging of industrial equipment events, such as temporal behavior and fault events (anomaly detection in time series), can be obtained from records generated by sensors installed in different parts of an industrial plant. However, such progress is still incipient because many challenges remain, and the performance of applications depends on the appropriate choice of method. This article presents a survey of existing ML and DL techniques for handling PdM in the railway industry. The survey discusses the main approaches for this specific application within a taxonomy defined by the type of task, the methods employed, the evaluation metrics, the specific equipment or process, and the datasets. Lastly, we conclude and outline some suggestions for future research.

2021

An Analysis of Performance Metrics for Imbalanced Classification

Authors
Gaudreault, JG; Branco, P; Gama, J;

Publication
DISCOVERY SCIENCE (DS 2021)

Abstract
Numerous machine learning applications involve dealing with imbalanced domains, where the learning focus is on the least frequent classes. This imbalance introduces new challenges for both the performance assessment of these models and their predictive modeling. While several performance metrics have been established as baselines in balanced domains, some cannot be applied to the imbalanced case, since the use of the majority class in the metric could lead to a misleading evaluation of performance. Other metrics, such as the area under the precision-recall curve, have been shown to be more appropriate for imbalanced domains due to their focus on class-specific performance. There are, however, many proposed implementations of this particular metric, which could lead to different conclusions depending on which one is used. In this research, we carry out an experimental study to better understand these issues and aim to provide a set of recommendations by studying the impact of using different metrics, and different implementations of the same metric, under multiple imbalance settings.
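To illustrate how two implementations of the area under the precision-recall curve can disagree on identical predictions, the sketch below computes two common estimators: step-wise average precision, the sum over thresholds of (R_k - R_{k-1}) * P_k (the formula documented for scikit-learn's `average_precision_score`), and a trapezoidal rule that linearly interpolates between PR points anchored at recall 0, precision 1. These are generic examples chosen for illustration, not necessarily the implementations compared in the paper, and all function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def pr_points(y_true: Sequence[int], scores: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Recall and precision at each distinct threshold, scanning scores from high to low.

    Tied scores are grouped so that one threshold admits all tied items at once.
    """
    n_pos = sum(1 for label in y_true if label == 1)
    if n_pos == 0:
        raise ValueError("need at least one positive example")
    pairs = sorted(zip(scores, y_true), key=lambda p: -p[0])
    tp = fp = 0
    recalls, precisions = [], []
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        recalls.append(tp / n_pos)
        precisions.append(tp / (tp + fp))
    return recalls, precisions


def average_precision(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Step-wise estimator: sum of (R_k - R_{k-1}) * P_k, no interpolation."""
    recalls, precisions = pr_points(y_true, scores)
    ap, prev_r = 0.0, 0.0
    for r, p in zip(recalls, precisions):
        ap += (r - prev_r) * p
        prev_r = r
    return ap


def trapezoidal_pr_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Trapezoidal estimator: linear interpolation between PR points, starting at (R=0, P=1)."""
    recalls, precisions = pr_points(y_true, scores)
    area, prev_r, prev_p = 0.0, 0.0, 1.0
    for r, p in zip(recalls, precisions):
        area += (r - prev_r) * (p + prev_p) / 2
        prev_r, prev_p = r, p
    return area
```

For y_true = [1, 0, 1, 0] and scores = [0.9, 0.8, 0.7, 0.6], the step-wise estimator gives 5/6 (about 0.833) while the trapezoidal one gives 19/24 (about 0.792): the same ranking yields different numbers depending only on the estimator.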

2021

A sketch for the KS test for Big Data

Authors
Galeno, TD; Gama, J; Cardoso, DO;

Publication
Anais do IX Symposium on Knowledge Discovery, Mining and Learning (KDMiLe 2021)

Abstract
Motivated by the challenges of Big Data, this paper presents an approximate algorithm for the Kolmogorov-Smirnov test. This goodness-of-fit statistical test is widely used because it is non-parametric. This work focuses on the one-sample test, which considers the hypothesis that a given univariate sample follows some reference distribution. The method evaluates the departure of an input stream from such a distribution while being space- and time-efficient. We show the accuracy of our algorithm through several experiments in different scenarios, varying the reference distribution and its parameters, the sample size, and the available memory. Our algorithm was compared with rival methods, some of which are considered state of the art, and was shown to be superior in most cases in terms of the absolute error of the test statistic.
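For reference, the exact one-sample statistic that such an approximate algorithm targets is D_n = sup_x |F_n(x) - F(x)|, where F_n is the empirical CDF of the sample and F is the reference CDF. The minimal sketch below computes it exactly by sorting the whole sample; it is not the paper's sketch-based algorithm, and the function names and the normal reference CDF are illustrative assumptions.

```python
import math
from typing import Callable, Sequence


def normal_cdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """CDF of a normal distribution, used here only as an example reference F."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def ks_statistic(sample: Sequence[float], cdf: Callable[[float], float]) -> float:
    """Exact one-sample KS statistic D_n = sup_x |F_n(x) - F(x)|.

    F_n jumps at each sorted point, so the supremum is reached either at the top
    of a jump, (i + 1) / n - F(x), or just below it, F(x) - i / n.
    """
    xs = sorted(sample)
    n = len(xs)
    if n == 0:
        raise ValueError("sample must be non-empty")
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        d = max(d, (i + 1) / n - f, f - i / n)
    return d
```

For example, `ks_statistic([0.1, 0.4, 0.7], lambda x: min(max(x, 0.0), 1.0))` against the uniform distribution on [0, 1] gives about 0.3. Sorting requires holding all n observations in memory, which is precisely the cost that a space-efficient method over an input stream is meant to avoid.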
