Publications

Publications by LIAAD

2016

Hierarchical time series forecast in electrical grids

Authors
Almeida, V; Ribeiro, R; Gama, J;

Publication
Lecture Notes in Electrical Engineering

Abstract
Hierarchical time series forecasting is a topic of first-order importance. There are several applications where time series can be naturally disaggregated in a hierarchical structure using attributes such as geographical location or product type. Power networks face interesting problems related to their transition to computer-aided grids. Their data can be naturally disaggregated in a hierarchical structure, and it is possible to look at both single and aggregated points along the grid. In this work, we applied different hierarchical forecasting methods to such data. Three approaches are compared: two common ones, bottom-up and top-down, and a third based on the hierarchical structure of the data, the optimal regression combination. The evaluation considers short-term forecasting (24 h ahead). Additionally, we discuss the importance of the degree of correlation among series for improving forecasting accuracy. Our results demonstrate that the hierarchical approach outperforms the bottom-up approach at intermediate and high levels. At lower levels, it performs better in less homogeneous substations, i.e., substations linked to different types of customers. Additionally, its performance is comparable to the top-down approach at top levels. This approach proved to be an interesting tool for hierarchical data analysis: it achieves good performance at top levels, like the top-down approach, and at the same time captures series dynamics at bottom levels, like the bottom-up approach. © Springer Science+Business Media Singapore 2016.
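
The three approaches named in the abstract can be illustrated on a toy two-level hierarchy (one total, several bottom series). This is a minimal sketch, not the authors' implementation: the function names, the use of historical shares for top-down disaggregation, and the sample numbers are all assumptions. The optimal regression combination is shown only in its unweighted least-squares form, which for a single total over n bottom series reduces to spreading the gap between the top forecast and the sum of bottom forecasts equally over n + 1 terms.

```python
def bottom_up(bottom_forecasts):
    """Forecast the total as the sum of the bottom-level forecasts."""
    return sum(bottom_forecasts)


def top_down(top_forecast, historical_bottom_totals):
    """Split a top-level forecast using historical shares (an assumed rule)."""
    total = sum(historical_bottom_totals)
    return [top_forecast * h / total for h in historical_bottom_totals]


def ols_reconcile(top_forecast, bottom_forecasts):
    """Unweighted least-squares combination for a two-level hierarchy.

    Closed form of S (S'S)^-1 S' y_hat when S stacks a row of ones
    on top of an identity matrix.
    """
    n = len(bottom_forecasts)
    gap = top_forecast - sum(bottom_forecasts)
    adjusted = [b + gap / (n + 1) for b in bottom_forecasts]
    return sum(adjusted), adjusted


# Hypothetical independent forecasts for one hour at three substations.
bottom_hat = [40.0, 25.0, 35.0]
top_hat = 104.0

print(bottom_up(bottom_hat))
print(top_down(top_hat, [400.0, 250.0, 350.0]))
print(ols_reconcile(top_hat, bottom_hat))
```

The reconciled forecasts add up across levels by construction, while still using information from both the top-level and the bottom-level models.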

2016

Sequential anomalies: a study in the Railway Industry

Authors
Ribeiro, RP; Pereira, P; Gama, J;

Publication
MACHINE LEARNING

Abstract
Concerned with predicting equipment failures, predictive maintenance has a high impact at both a technical and a financial level. Most modern equipment has logging systems that allow us to collect a diversity of data regarding its operation and health. Using data mining models for anomaly and novelty detection enables us to explore those datasets, building predictive systems that can detect and issue an alert when a failure starts evolving, avoiding its unnoticed development up to breakdown. In the present case, we use a failure detection system to predict train door breakdowns before they happen, using data from their logging system. We use sensor data from pneumatic valves that control the open and close cycles of a door. Still, the failure of a cycle does not necessarily indicate a breakdown: a cycle might fail due to user interaction. The goal of this study is to detect structural failures in the automatic train door system, not when there is a single cycle failure, but when there are sequences of cycle failures. We study three methods for such structural failure detection: outlier detection, anomaly detection and novelty detection, using different windowing strategies. We propose a two-stage approach, where the output of a point-anomaly algorithm is post-processed by a low-pass filter to obtain subsequence-anomaly detection. The main result of the two-level architecture is a strong impact on the false alarm rate.
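
The two-stage idea can be sketched as follows. This is an illustrative assumption, not the paper's configuration: the point detector here is a simple range check on a hypothetical cycle-duration signal, the low-pass filter is an exponential moving average, and the values of `alpha` and `threshold` are arbitrary.

```python
def point_anomalies(values, lower, upper):
    """Stage 1: flag each cycle whose value falls outside [lower, upper]."""
    return [0 if lower <= v <= upper else 1 for v in values]


def low_pass(flags, alpha=0.3):
    """Stage 2: exponential moving average over the 0/1 point flags."""
    smoothed, level = [], 0.0
    for f in flags:
        level = alpha * f + (1 - alpha) * level
        smoothed.append(level)
    return smoothed


def structural_alarms(values, lower, upper, alpha=0.3, threshold=0.5):
    """Raise an alarm only where the smoothed flag level exceeds threshold."""
    flags = point_anomalies(values, lower, upper)
    return [s > threshold for s in low_pass(flags, alpha)]


# Hypothetical door-cycle durations: one isolated failure, then a run of three.
durations = [5, 5, 9, 5, 5, 5, 9, 9, 9, 5]
alarms = structural_alarms(durations, lower=4, upper=6)
print([i for i, a in enumerate(alarms) if a])
```

With these settings the isolated failed cycle does not raise an alarm, while the run of consecutive failures does, which is how smoothing the point-level output lowers the false alarm rate.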

2016

A Survey of Predictive Modeling on Imbalanced Domains

Authors
Branco, P; Torgo, L; Ribeiro, RP;

Publication
ACM COMPUTING SURVEYS

Abstract
Many real-world data-mining applications involve obtaining predictive models using datasets with strongly imbalanced distributions of the target variable. Frequently, the least-common values of this target variable are associated with events that are highly relevant for end users (e.g., fraud detection, unusual returns on stock markets, anticipation of catastrophes, etc.). Moreover, the events may have different costs and benefits, which, when associated with the rarity of some of them on the available training data, creates serious problems to predictive modeling techniques. This article presents a survey of existing techniques for handling these important applications of predictive analytics. Although most of the existing work addresses classification tasks (nominal target variables), we also describe methods designed to handle similar problems within regression tasks (numeric target variables). In this survey, we discuss the main challenges raised by imbalanced domains, propose a definition of the problem, describe the main approaches to these tasks, propose a taxonomy of the methods, summarize the conclusions of existing comparative studies as well as some theoretical analyses of some methods, and refer to some related problems within predictive modeling.

2016

UBL: an R package for Utility-based Learning

Authors
Branco, P; Ribeiro, RP; Torgo, L;

Publication
CoRR

Abstract

2016

Collaborative Data Analysis in Hyperconnected Transportation Systems

Authors
Zarmehri, MN; Soares, C;

Publication
COLLABORATION IN A HYPERCONNECTED WORLD

Abstract
Taxi trip duration affects the efficiency of operation, the satisfaction of drivers, and, mainly, the satisfaction of customers; it is therefore an important metric for taxi companies. In particular, knowing the predicted trip duration beforehand is very useful for allocating taxis to taxi stands and for finding the best route for different trips. A hyperconnected network can help to collect data from connected taxis in the city environment and to use it collaboratively between taxis for better prediction. Given the high volume of data, several models can be generated for each individual taxi. Moreover, taking into account the differences between the data collected by taxis, this data can be organized into different levels of a hierarchy. However, finding the level of granularity that leads to the best model for an individual taxi can be computationally expensive. This paper proposes the use of metalearning to select, for each taxi, the right level of the hierarchy and the right algorithm, i.e., those that generate the model with the best performance. The proposed approach is evaluated on data collected in the Drive-In project. The results show that metalearning helps to select the algorithm with the best performance.

2016

Combining Boosted Trees with Metafeature Engineering for Predictive Maintenance

Authors
Cerqueira, V; Pinto, F; Sa, C; Soares, C;

Publication
ADVANCES IN INTELLIGENT DATA ANALYSIS XV

Abstract
We describe a data mining workflow for predictive maintenance of the Air Pressure System in heavy trucks. Our approach is composed of four steps: (i) a filter that excludes a subset of features and examples based on the number of missing values; (ii) a metafeature engineering procedure used to create a meta-level feature set with the goal of increasing the information in the original data; (iii) a biased sampling method to deal with the class imbalance problem; and (iv) boosted trees to learn the target concept. Results show that the metafeature engineering and the biased sampling method are critical for improving the performance of the classifier.
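
Steps (i) and (iii) of the workflow above can be sketched in plain Python. The abstract does not give thresholds or the exact sampling scheme, so everything here is an assumption made for illustration: missing values are represented as `None`, the cut-off fractions are arbitrary, and "biased sampling" is approximated by random undersampling of the majority class. Steps (ii) and (iv) are not shown.

```python
import random


def filter_missing(rows, max_col_missing=0.5, max_row_missing=0.5):
    """Step (i): return indices of columns, then rows, with few missing values."""
    n_rows, n_cols = len(rows), len(rows[0])
    keep_cols = [j for j in range(n_cols)
                 if sum(r[j] is None for r in rows) / n_rows <= max_col_missing]
    if not keep_cols:
        return [], []
    # Row filtering is done on the surviving columns only.
    reduced = [[r[j] for j in keep_cols] for r in rows]
    keep_rows = [i for i, r in enumerate(reduced)
                 if sum(v is None for v in r) / len(r) <= max_row_missing]
    return keep_cols, keep_rows


def undersample(labels, ratio=1.0, seed=0):
    """Step (iii), assumed variant: keep all positives (label 1) and a
    random subset of negatives (label 0) of size ratio * n_positives."""
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    n_keep = min(len(neg), int(ratio * len(pos)))
    return sorted(pos + rng.sample(neg, n_keep))


# Hypothetical sensor table: 4 examples, 3 features.
rows = [[1.0, None, 3.0],
        [2.0, None, None],
        [None, None, 1.0],
        [4.0, 2.0, 5.0]]
print(filter_missing(rows, max_col_missing=0.5, max_row_missing=0.4))
print(undersample([0, 0, 0, 0, 1, 0, 1, 0]))
```

The index lists returned by both functions can then be used to subset the data before training a classifier.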
