Publications

Publications by João Gama

2021

Forecasting conditional extreme quantiles for wind energy

Authors
Goncalves, C; Cavalcante, L; Brito, M; Bessa, RJ; Gama, J;

Publication
ELECTRIC POWER SYSTEMS RESEARCH

Abstract
Probabilistic forecasting of distribution tails (i.e., quantiles below 0.05 and above 0.95) is challenging for nonparametric approaches since data for extreme events are scarce. A poor forecast of extreme quantiles can have a high impact on various power system decision-aid problems. An alternative approach that is more robust to data sparsity is extreme value theory (EVT), which uses parametric functions to model the distribution's tails. In this work, we apply conditional EVT estimators to historical data by directly combining gradient boosting trees with a truncated generalized Pareto distribution. The parameters of the parametric function are conditioned on covariates such as wind speed or direction from a numerical weather prediction grid. The results for a wind power plant located in Galicia, Spain, show that the proposed method outperforms state-of-the-art methods in terms of quantile score.

2020

AutoML for Stream k-Nearest Neighbors Classification

Authors
Bahri, M; Veloso, B; Bifet, A; Gama, J;

Publication
2020 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA)

Abstract
The last few decades have witnessed a significant evolution of technology in different domains, changing the way the world operates and leading to an overwhelming amount of data generated in an open-ended way as streams. Over the past years, we have observed the development of several machine learning algorithms to process big data streams. However, the accuracy of these algorithms is very sensitive to their hyper-parameters, which require expertise and extensive trials to tune. Another relevant aspect is the high dimensionality of data, which can degrade computational performance. To cope with these issues, this paper proposes a stream k-nearest neighbors (kNN) algorithm that applies an internal dimension reduction to the stream in order to reduce resource usage, and uses an automatic monitoring system that dynamically tunes the configuration of the kNN algorithm and the output dimension size with big data streams. Experiments over a wide range of datasets show that the predictive and computational performances of the kNN algorithm are improved.

2021

Causal Discovery in Machine Learning: Theories and Applications

Authors
Nogueira, AR; Gama, J; Ferreira, CA;

Publication
JOURNAL OF DYNAMICS AND GAMES

Abstract
Determining the cause of a particular event has been a subject of study for several researchers over the years. Finding out why an event happens (its cause) means that, for example, if we remove the cause from the equation, we can stop the effect from happening, or if we replicate it, we can create the subsequent effect. Causality can be seen as a means of predicting the future based on information about past events and, with that, preventing or altering future outcomes. This temporal notion of past and future is often one of the critical points in discovering the causes of a given event. The purpose of this survey is to present a cross-sectional view of the causal discovery domain, with an emphasis on the machine learning/data mining area.

2021

Hyperparameter self-tuning for data streams

Authors
Veloso, B; Gama, J; Malheiro, B; Vinagre, J;

Publication
INFORMATION FUSION

Abstract
The number of Internet of Things devices generating data streams is expected to grow exponentially with the support of emergent technologies such as 5G networks. Therefore, the online processing of these data streams requires the design and development of suitable machine learning algorithms, able to learn online, as data is generated. Like their batch-learning counterparts, stream-based learning algorithms require careful hyperparameter settings. However, this problem is exacerbated in online learning settings, especially with the occurrence of concept drifts, which frequently require the reconfiguration of hyperparameters. In this article, we present SSPT, an extension of the Self Parameter Tuning (SPT) optimisation algorithm for data streams. We apply the Nelder-Mead algorithm to dynamically-sized samples, converging to optimal settings in a single pass over data while using a relatively small number of hyperparameter configurations. In addition, our proposal automatically readjusts hyperparameters when concept drift occurs. To assess the effectiveness of SSPT, the algorithm is evaluated with three different machine learning problems: recommendation, regression, and classification. Experiments with well-known data sets show that the proposed algorithm can outperform previous hyperparameter tuning efforts by human experts. Results also show that SSPT converges significantly faster and presents at least similar accuracy when compared with the previous double-pass version of the SPT algorithm.

2021

Advances in Intelligent Data Analysis XIX - 19th International Symposium on Intelligent Data Analysis, IDA 2021, Porto, Portugal, April 26-28, 2021, Proceedings

Authors
Abreu, PH; Rodrigues, PP; Fernández, A; Gama, J;

Publication
IDA

Abstract

2020

Using Network Features for Credit Scoring in MicroFinance: Extended Abstract

Authors
Paraiso, P; Ruiz, S; Gomes, P; Rodrigues, L; Gama, J;

Publication
2020 IEEE 7TH INTERNATIONAL CONFERENCE ON DATA SCIENCE AND ADVANCED ANALYTICS (DSAA 2020)

Abstract
This paper uses non-traditional data from a microfinance institution (MFI) in a credit scoring loan classification problem and addresses a problem common in emerging markets: the lack of a verifiable customer credit history. We perform a set of experiments to define a baseline model and prove the relevance of node embedding features in credit scoring models, using a real-world dataset.
