Publications

Publications by Vítor Manuel Cerqueira

2015

A framework for analysing dynamic communities in large-scale social networks

Authors
Cerqueira, V; Oliveira, M; Gama, J;

Publication
ICEIS 2015 - 17th International Conference on Enterprise Information Systems, Proceedings

Abstract
Telecommunications companies must process large-scale social networks that reveal the communication patterns among their customers. These networks are dynamic in nature as new customers appear, old customers leave, and the interaction among customers changes over time. One way to uncover the evolution patterns of such entities is by monitoring the evolution of the communities they belong to. Large-scale networks typically comprise thousands, or hundreds of thousands, of communities and not all of them are worth monitoring, or interesting from the business perspective. Several methods have been proposed for tracking the evolution of groups of entities in dynamic networks but these methods lack strategies to effectively extract knowledge and insight from the analysis. In this paper we tackle this problem by proposing an integrated business-oriented framework to track and interpret the evolution of communities in very large networks. The framework encompasses several steps such as network sampling, community detection, community selection, monitoring of dynamic communities and rule-based interpretation of community evolutionary profiles. The usefulness of the proposed framework is illustrated using a real-world large-scale social network from a major telecommunications company.

2016

Combining Boosted Trees with Metafeature Engineering for Predictive Maintenance

Authors
Cerqueira, V; Pinto, F; Sa, C; Soares, C;

Publication
ADVANCES IN INTELLIGENT DATA ANALYSIS XV

Abstract
We describe a data mining workflow for predictive maintenance of the Air Pressure System in heavy trucks. Our approach is composed of four steps: (i) a filter that excludes a subset of features and examples based on the number of missing values; (ii) a metafeature engineering procedure used to create a meta-level feature set with the goal of increasing the information in the original data; (iii) a biased sampling method to deal with the class imbalance problem; and (iv) boosted trees to learn the target concept. Results show that the metafeature engineering and the biased sampling method are critical for improving the performance of the classifier.
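
As an illustration of step (i) only, the sketch below drops features and then examples whose fraction of missing values exceeds a threshold. The function name `filter_by_missing`, the use of `None` to mark a missing value, and both threshold defaults are assumptions made for this example; the abstract does not state the thresholds or the exact filtering rule used in the paper.

```python
from typing import List, Optional, Tuple

Row = List[Optional[float]]


def filter_by_missing(
    rows: List[Row],
    max_col_missing: float = 0.5,  # hypothetical threshold, not from the paper
    max_row_missing: float = 0.5,  # hypothetical threshold, not from the paper
) -> Tuple[List[Row], List[int]]:
    """Drop columns, then rows, whose share of missing (None) values is too high.

    Returns the filtered rows and the indices of the columns that were kept.
    """
    if not rows:
        return [], []
    n_rows, n_cols = len(rows), len(rows[0])

    # Keep a feature only if its fraction of missing values is within the limit.
    keep_cols = [
        j for j in range(n_cols)
        if sum(r[j] is None for r in rows) / n_rows <= max_col_missing
    ]
    if not keep_cols:
        return [], []

    reduced = [[r[j] for j in keep_cols] for r in rows]

    # Keep an example only if, over the surviving features, it is complete enough.
    kept_rows = [
        r for r in reduced
        if sum(v is None for v in r) / len(keep_cols) <= max_row_missing
    ]
    return kept_rows, keep_cols
```

Filtering features before examples is a design choice of this sketch: a feature that is almost always missing would otherwise inflate every example's missing-value rate.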

2017

Arbitrated Ensemble for Solar Radiation Forecasting

Authors
Cerqueira, V; Torgo, L; Soares, C;

Publication
ADVANCES IN COMPUTATIONAL INTELLIGENCE, IWANN 2017, PT I

Abstract
Utility companies rely on solar radiation forecasting models to control the supply and demand of energy as well as the operability of the grid. They use these predictive models to schedule power plant operations, negotiate prices in the electricity market and improve the performance of solar technologies in general. This paper proposes a novel method for global horizontal irradiance forecasting. The method is based on an ensemble approach, in which individual competing models are arbitrated by a metalearning layer. The goal of arbitrating individual forecasters is to dynamically combine them according to their aptitude in the input data. We validate our proposed model for solar radiation forecasting using data collected by a real-world provider. The results from empirical experiments show that the proposed method is competitive with other methods, including current state-of-the-art methods used for time series forecasting tasks.

2017

autoBagging: Learning to Rank Bagging Workflows with Metalearning

Authors
Pinto, F; Cerqueira, V; Soares, C; Moreira, JM;

Publication
Proceedings of the International Workshop on Automatic Selection, Configuration and Composition of Machine Learning Algorithms co-located with the European Conference on Machine Learning & Principles and Practice of Knowledge Discovery in Databases, AutoML@PKDD/ECML 2017, Skopje, Macedonia, September 22, 2017.

Abstract
Machine Learning (ML) has been successfully applied to a wide range of domains and applications. One of the techniques behind most of these successful applications is Ensemble Learning (EL), the field of ML that gave birth to methods such as Random Forests or Boosting. The complexity of applying these techniques, together with the market scarcity of ML experts, has created the need for systems that enable a fast and easy drop-in replacement for ML libraries. Automated machine learning (autoML) is the field of ML that attempts to answer these needs. We propose autoBagging, an autoML system that automatically ranks 63 bagging workflows by exploiting past performance and metalearning. Results on 140 classification datasets from the OpenML platform show that autoBagging can yield better performance than the Average Rank method and achieve results that are not statistically different from an ideal model that systematically selects the best workflow for each dataset. For the purpose of reproducibility and generalizability, autoBagging is publicly available as an R package on CRAN.
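
For readers unfamiliar with the Average Rank baseline mentioned above, a minimal sketch follows. It assumes a matrix of past scores in which higher is better, and it breaks ties by workflow index. The function name `average_rank` and the toy scores are illustrative; this is not the autoBagging implementation, which is an R package.

```python
from typing import List, Tuple


def average_rank(scores: List[List[float]]) -> Tuple[List[int], List[float]]:
    """Rank workflows by their mean rank across datasets.

    scores[d][w] is the (assumed) score of workflow w on dataset d, higher is better.
    Returns the workflow indices ordered from best to worst, and each workflow's mean rank.
    """
    n_workflows = len(scores[0])
    totals = [0.0] * n_workflows

    for row in scores:
        # Rank 1 goes to the best-scoring workflow on this dataset.
        order = sorted(range(n_workflows), key=lambda w: -row[w])
        for rank, w in enumerate(order, start=1):
            totals[w] += rank

    mean_ranks = [t / len(scores) for t in totals]
    # A lower mean rank is better; sorted() is stable, so ties keep index order.
    ordering = sorted(range(n_workflows), key=lambda w: mean_ranks[w])
    return ordering, mean_ranks
```

The baseline recommends the same ordering for every new dataset, which is what a metalearning system that adapts its ranking to dataset characteristics is compared against.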

2017

Arbitrated Ensemble for Time Series Forecasting

Authors
Cerqueira, V; Torgo, L; Pinto, F; Soares, C;

Publication
MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES, ECML PKDD 2017, PT II

Abstract
This paper proposes an ensemble method for time series forecasting tasks. Combining different forecasting models is a common approach to tackle these problems. State-of-the-art methods track the loss of the available models and adapt their weights accordingly. Metalearning strategies such as stacking are also used in these tasks. We propose a metalearning approach for adaptively combining forecasting models that specializes them across the time series. Our assumption is that different forecasting models have different areas of expertise and a varying relative performance. Moreover, many time series show recurring structures due to factors such as seasonality. Therefore, the ability of a method to deal with changes in relative performance of models as well as recurrent changes in the data distribution can be very useful in dynamic environments. Our approach is based on an ensemble of heterogeneous forecasters, arbitrated by a metalearning model. This strategy is designed to cope with the different dynamics of time series and quickly adapt the ensemble to regime changes. We validate our proposal using time series from several real world domains. Empirical results show the competitiveness of the method in comparison to state-of-the-art approaches for combining forecasters.

2017

Dynamic and Heterogeneous Ensembles for Time Series Forecasting

Authors
Cerqueira, V; Torgo, L; Oliveira, M; Pfahringer, B;

Publication
2017 IEEE INTERNATIONAL CONFERENCE ON DATA SCIENCE AND ADVANCED ANALYTICS (DSAA)

Abstract
This paper addresses the issue of learning time series forecasting models in changing environments by leveraging the predictive power of ensemble methods. Concept drift adaptation is performed in an active manner, by dynamically combining base learners according to their recent performance using a non-linear function. Diversity in the ensembles is encouraged with several strategies that include heterogeneity among learners, sampling techniques and computation of summary statistics as extra predictors. Heterogeneity is used with the goal of better coping with different dynamic regimes of the time series. The driving hypotheses of this work are that (i) heterogeneous ensembles should better fit different dynamic regimes and (ii) dynamic aggregation should allow for fast detection and adaptation to regime changes. We extend some strategies typically used in classification tasks to time series forecasting. The proposed methods are validated using Monte Carlo simulations on 16 real-world univariate time series with numerical outcome as well as an artificial series with clear regime shifts. The results provide strong empirical evidence for our hypotheses. To encourage reproducibility the proposed method is publicly available as a software package.
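
The idea of combining base learners according to their recent performance through a non-linear function can be sketched as follows. The sketch assumes a softmax over the negative mean absolute error in a sliding window; the window size, the choice of softmax, and the function names are assumptions for illustration and may differ from the function used in the paper.

```python
import math
from typing import List, Sequence


def recent_error_weights(errors: Sequence[Sequence[float]], window: int = 3) -> List[float]:
    """Turn each model's recent absolute errors into combination weights.

    errors[m] holds the absolute errors of model m, oldest first.
    Weights are a softmax of the negative mean error over the last `window` points.
    """
    recent = []
    for model_errors in errors:
        tail = list(model_errors)[-window:]
        recent.append(sum(tail) / len(tail))

    # Subtract the smallest error before exponentiating for numerical stability.
    lowest = min(recent)
    scores = [math.exp(-(r - lowest)) for r in recent]
    total = sum(scores)
    return [s / total for s in scores]


def combine(predictions: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted average of the base learners' forecasts for one time step."""
    return sum(p * w for p, w in zip(predictions, weights))
```

Because the weights are recomputed at every step from a short window, a model that starts performing well after a regime change gains influence quickly, at the cost of noisier weights when the window is small.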
