Publicacoes - INESC TEC

Publicações

Publicações por Vítor Manuel Cerqueira

2018

On Evaluating Floating Car Data Quality for Knowledge Discovery

Autores
Cerqueira, V; Moreira Matias, L; Khiari, J; van Lint, H;

Publicação
IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS

Abstract
Floating car data (FCD) denotes the type of data (location, speed, and destination) produced and broadcasted periodically by running vehicles. Increasingly, intelligent transportation systems take advantage of such data for prediction purposes as input to road and transit control and to discover useful mobility patterns with applications to transport service design and planning, to name just a few applications. However, there are considerable quality issues that affect the usefulness and efficacy of FCD in these many applications. In this paper, we propose a methodology to compute such quality indicators automatically for large FCD sets. It leverages on a set of statistical indicators (named Yuki-san) covering multiple dimensions of FCD such as spatio-temporal coverage, accuracy, and reliability. As such, the Yuki-san indicators provide a quick and intuitive means to assess the potential "value" and "veracity" characteristics of the data. Experimental results with two mobility-related data mining and supervised learning tasks on the basis of two real-world FCD sources show that the Yuki-san indicators are indeed consistent with how well the applications perform using the data. With a wider variety of FCD (e.g., from navigation systems and CAN buses) becoming available, further research and validation into the dimensions covered and the efficacy of the Yuki-San indicators is needed.

FecharLer Abstract

2019

Arbitrage of forecasting experts

Autores
Cerqueira, V; Torgo, L; Pinto, F; Soares, C;

Publicação
MACHINE LEARNING

Abstract
Forecasting is an important task across several domains. Its generalised interest is related to the uncertainty and complex evolving structure of time series. Forecasting methods are typically designed to cope with temporal dependencies among observations, but it is widely accepted that none is universally applicable. Therefore, a common solution to these tasks is to combine the opinion of a diverse set of forecasts. In this paper we present an approach based on arbitrating, in which several forecasting models are dynamically combined to obtain predictions. Arbitrating is a metalearning approach that combines the output of experts according to predictions of the loss that they will incur. We present an approach for retrieving out-of-bag predictions that significantly improves its data efficiency. Finally, since diversity is a fundamental component in ensemble methods, we propose a method for explicitly handling the inter-dependence between experts when aggregating their predictions. Results from extensive empirical experiments provide evidence of the method's competitiveness relative to state of the art approaches. The proposed method is publicly available in a software package.

FecharLer Abstract

2019

Layered Learning for Early Anomaly Detection: Predicting Critical Health Episodes

Autores
Cerqueira, V; Torgo, L; Soares, C;

Publicação
Discovery Science - 22nd International Conference, DS 2019, Split, Croatia, October 28-30, 2019, Proceedings

Abstract
Critical health events represent a relevant cause of mortality in intensive care units of hospitals, and their timely prediction has been gaining increasing attention. This problem is an instance of the more general predictive task of early anomaly detection in time series data. One of the most common approaches to solve this problem is to use standard classification methods. In this paper we propose a novel method that uses a layered learning architecture to solve early anomaly detection problems. One key contribution of our work is the idea of pre-conditional events, which denote arbitrary but computable relaxed versions of the event of interest. We leverage this idea to break the original problem into two layers, which we hypothesize are easier to solve. Focusing on critical health episodes, the results suggest that the proposed approach is advantageous relative to state of the art approaches for early anomaly detection. Although we focus on a particular case study, the proposed method is generalizable to other domains. © Springer Nature Switzerland AG 2019.

FecharLer Abstract

2021

Empirical Study on the Impact of Different Sets of Parameters of Gradient Boosting Algorithms for Time-Series Forecasting with LightGBM

Autores
Barros, F; Cerqueira, V; Soares, C;

Publicação
PRICAI 2021: Trends in Artificial Intelligence - 18th Pacific Rim International Conference on Artificial Intelligence, PRICAI 2021, Hanoi, Vietnam, November 8-12, 2021, Proceedings, Part I

Abstract
LightGBM has proven to be an effective forecasting algorithm by winning the M5 forecasting competition. However, given the sensitivity of LightGBM to hyperparameters, it is likely that their default values are not optimal. This work aims to answer whether it is essential to tune the hyperparameters of LightGBM to obtain better accuracy in time series forecasting and whether it can be done efficiently. Our experiments consisted of the collection and processing of data as well as hyperparameters generation and finally testing. We observed that on the 58 time series tested, the mean squared error is reduced by a maximum of 17.45% when using randomly generated configurations in contrast to using the default one. Additionally, the study of the individual hyperparameters’ performance was done. Based on the results obtained, we propose an alternative set of default LightGBM hyperparameter values to be used whilst using time series data for forecasting. © 2021, Springer Nature Switzerland AG.

FecharLer Abstract

2020

Evaluating time series forecasting models: an empirical study on performance estimation methods

Autores
Cerqueira, V; Torgo, L; Mozetic, I;

Publicação
MACHINE LEARNING

Abstract
Performance estimation aims at estimating the loss that a predictive model will incur on unseen data. This process is a fundamental stage in any machine learning project. In this paper we study the application of these methods to time series forecasting tasks. For independent and identically distributed data the most common approach is cross-validation. However, the dependency among observations in time series raises some caveats about the most appropriate way to estimate performance in this type of data. Currently, there is no consensual approach. We contribute to the literature by presenting an extensive empirical study which compares different performance estimation methods for time series forecasting tasks. These methods include variants of cross-validation, out-of-sample (holdout), and prequential approaches. Two case studies are analysed: One with 174 real-world time series and another with three synthetic time series. Results show noticeable differences in the performance estimation methods in the two scenarios. In particular, empirical experiments suggest that blocked cross-validation can be applied to stationary time series. However, when the time series are non-stationary, the most accurate estimates are produced by out-of-sample methods, particularly the holdout approach repeated in multiple testing periods.

FecharLer Abstract

2022

A case study comparing machine learning with statistical methods for time series forecasting: size matters

Autores
Cerqueira, V; Torgo, L; Soares, C;

Publicação
JOURNAL OF INTELLIGENT INFORMATION SYSTEMS

Abstract
Time series forecasting is one of the most active research topics. Machine learning methods have been increasingly adopted to solve these predictive tasks. However, in a recent work, evidence was shown that these approaches systematically present a lower predictive performance relative to simple statistical methods. In this work, we counter these results. We show that these are only valid under an extremely low sample size. Using a learning curve method, our results suggest that machine learning methods improve their relative predictive performance as the sample size grows. The R code to reproduce all of our experiments is available at https://github.com/vcerqueira/MLforForecasting.

FecharLer Abstract