Publications - INESC TEC

Publications

Publications by Rita Paula Ribeiro

2015

An Experimental Study on Predictive Models Using Hierarchical Time Series

Authors
Silva, AM; Ribeiro, RP; Gama, J;

Publication
PROGRESS IN ARTIFICIAL INTELLIGENCE

Abstract
Planning strategies play an important role in companies' management. In the decision-making process, one of the most important goals is sales forecasting. Forecasts are important for stock planning, shop space maintenance, promotions, etc. Sales forecasting uses historical data to make reliable projections for the future. In the retail sector, data has a hierarchical structure: products are organized in hierarchical groups that reflect the business structure. In this work we present a case study using real data from a leading Portuguese retail company. We experimentally evaluate standard approaches for sales forecasting and compare them against models that exploit the hierarchical structure of the products. Moreover, we evaluate different methods to combine predictions for the different hierarchical levels. The results show that exploiting the hierarchical structure present in the data systematically reduces the error of the forecasts.


2016

Detection of Fraud Symptoms in the Retail Industry

Authors
Ribeiro, RP; Oliveira, R; Gama, J;

Publication
ADVANCES IN ARTIFICIAL INTELLIGENCE - IBERAMIA 2016

Abstract
Data mining is one of the most effective methods for fraud detection. This is highlighted by 25% of organizations that have suffered from economic crimes [1]. This paper presents a case study using real-world data from a large retail company. We identify symptoms of fraud by looking for outliers. To identify the outliers and the context in which they appear, we learn a regression tree. For a given node, we identify the outliers using the set of examples covered at that node, and the context as the conjunction of the conditions on the path from the root to the node. Surprisingly, at different nodes of the tree, we observe that some outliers disappear and new ones appear. From the business point of view, the outliers detected near the leaves of the tree are the most suspicious ones. These cases are difficult to detect, being observable only in a given context, defined by the set of rules associated with the node.
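The idea of context-dependent outliers can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which outlier test is used, so the 1.5 × IQR boxplot rule is an assumption here. The split on a hypothetical `store_type` attribute is hand-written rather than learned by a regression tree, and all values are made up.

```python
from statistics import quantiles

def iqr_outliers(values):
    """Return the values outside the 1.5 * IQR fences (assumed outlier rule)."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    spread = q3 - q1
    low, high = q1 - 1.5 * spread, q3 + 1.5 * spread
    return [v for v in values if v < low or v > high]

# Hypothetical records: (store_type, target value). Made-up numbers.
records = [
    ("small", 9), ("small", 10), ("small", 10), ("small", 11),
    ("small", 11), ("small", 12), ("small", 30),
    ("large", 95), ("large", 98), ("large", 100), ("large", 102),
    ("large", 105), ("large", 107), ("large", 110),
]

# Root node: every example is covered, so the context is empty.
root_outliers = iqr_outliers([value for _, value in records])

# Child nodes: the context is the condition "store_type == ctx".
node_outliers = {
    ctx: iqr_outliers([value for c, value in records if c == ctx])
    for ctx in ("small", "large")
}

print(root_outliers)   # no outliers when all stores are pooled
print(node_outliers)   # 30 stands out only among the small stores
```

In this toy data the value 30 is unremarkable at the root but becomes an outlier once the examples are restricted to the "small" context, which mirrors the observation that new outliers appear deeper in the tree.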


2016

Hierarchical time series forecast in electrical grids

Authors
Almeida, V; Ribeiro, R; Gama, J;

Publication
Lecture Notes in Electrical Engineering

Abstract
Hierarchical time series are a topic of first-order importance. There are several applications where time series can be naturally disaggregated in a hierarchical structure using attributes such as geographical location, product type, etc. Power networks face interesting problems related to their transition to computer-aided grids. Their data can be naturally disaggregated in a hierarchical structure, and it is possible to look at both single and aggregated points along the grid. In this work, we apply different hierarchical forecasting methods to these data. Three approaches are compared: two common ones, the bottom-up and top-down approaches, and another based on the hierarchical structure of the data, the optimal regression combination. The evaluation considers short-term forecasting (24 h ahead). Additionally, we discuss the importance of the degree of correlation among series for improving forecasting accuracy. Our results demonstrate that the hierarchical approach outperforms the bottom-up approach at intermediate and high levels. At lower levels, it performs better in less homogeneous substations, i.e., substations linked to different types of customers. Additionally, its performance is comparable to the top-down approach at top levels. This approach proved to be an interesting tool for hierarchical data analysis: it achieves good performance at top levels, like the top-down approach, and at the same time captures series dynamics at bottom levels, like the bottom-up approach. © Springer Science+Business Media Singapore 2016.
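The two common baselines named in this abstract can be sketched for a two-level hierarchy as follows. This is an illustration under stated assumptions, not the paper's code: the substation names and all numbers are hypothetical, the top-down split uses historical shares of the total (one of several possible proportion schemes), and the optimal regression combination is not shown.

```python
def bottom_up(bottom_forecasts):
    """Forecast the total as the sum of the bottom-level forecasts."""
    return sum(bottom_forecasts.values())

def top_down(total_forecast, history):
    """Split a total forecast by each series' historical share of the total."""
    totals = {name: sum(values) for name, values in history.items()}
    grand_total = sum(totals.values())
    return {name: total_forecast * t / grand_total for name, t in totals.items()}

# Hypothetical past load per substation (made-up values).
history = {
    "substation_a": [10.0, 12.0, 14.0],
    "substation_b": [30.0, 28.0, 26.0],
}

# Hypothetical forecasts produced independently at each level.
bottom_forecasts = {"substation_a": 16.0, "substation_b": 24.0}
total_forecast = 40.0

total_bu = bottom_up(bottom_forecasts)        # aggregate upwards
shares = top_down(total_forecast, history)    # disaggregate downwards

print(total_bu)
print(shares)
```

Bottom-up keeps the dynamics of each bottom series (substation_a is rising) but inherits their noise, while top-down reuses a stable total and ignores recent shifts in the shares. That trade-off is the one the abstract describes between the two approaches.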


2014

Failure Prediction - An Application in the Railway Industry

Authors
Pereira, P; Ribeiro, RP; Gama, J;

Publication
DISCOVERY SCIENCE, DS 2014

Abstract
Machine or system failures have a high impact at both technical and economic levels. Most modern equipment has logging systems that allow us to collect a diversity of data regarding its operation and health. Using data mining models for novelty detection enables us to explore those datasets, building classification systems that can detect and issue an alert when a failure starts evolving, instead of letting it develop unnoticed until breakdown. In the present case we use a failure detection system to predict train door breakdowns before they happen, using data from their logging system. We study three methods for failure detection: outlier detection, novelty detection and a supervised SVM. Given the problem's features, namely the possibility of a passenger interrupting the movement of a door, the three predictors are prone to false alarms. The main contribution of this work is the use of a low-pass filter to process the output of the predictors, leading to a strong reduction in the false alarm rate.
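A minimal sketch of how low-pass filtering a predictor's output can suppress isolated false alarms is given below. The abstract does not specify the filter, so the exponential moving average, the smoothing factor `alpha`, the 0.5 threshold and the example signal are all assumptions made for illustration.

```python
def low_pass(signal, alpha=0.3):
    """First-order low-pass filter (exponential moving average), starting at 0."""
    smoothed = []
    state = 0.0
    for x in signal:
        state = alpha * x + (1 - alpha) * state
        smoothed.append(state)
    return smoothed

# Hypothetical raw predictor output per door cycle: 1 = "failure suspected".
# The isolated 1 at index 1 stands for a passenger blocking the door once;
# the run of 1s from index 5 onwards stands for a developing failure.
raw = [0, 1, 0, 0, 0, 1, 1, 1, 1, 1]

smoothed = low_pass(raw)
alarms = [value > 0.5 for value in smoothed]  # assumed alarm threshold

print([round(value, 3) for value in smoothed])
print(alarms)
```

With these settings the single spike never pushes the smoothed signal above the threshold, whereas the sustained run raises an alarm after a short delay. The delay is the price paid for the lower false alarm rate.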


2017

Relevance-Based Evaluation Metrics for Multi-class Imbalanced Domains

Authors
Branco, P; Torgo, L; Ribeiro, RP;

Publication
ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, PAKDD 2017, PT I

Abstract
The class imbalance problem is a key issue that has received much attention. This attention has been mostly focused on two-class problems. Fewer solutions exist for the multi-class imbalance problem. From an evaluation point of view, the class imbalance problem is challenging because a non-uniform importance is assigned to the classes. In this paper, we propose a relevance-based evaluation framework that incorporates user preferences by allowing the assignment of differentiated importance values to each class. The presented solution is able to overcome difficulties detected in existing measures and increases discrimination capability. The proposed framework requires the assignment of a relevance score to the problem classes. To deal with cases where the user is not able to specify the relevance of each class, we describe three mechanisms to incorporate the existing domain knowledge into the relevance framework. These mechanisms differ in the amount of information available and the assumptions made regarding the domain. They also allow the use of our framework in common settings of multi-class imbalanced problems with different levels of information available. © 2017, Springer International Publishing AG.


2017

Exploring Resampling with Neighborhood Bias on Imbalanced Regression Problems

Authors
Branco, P; Torgo, L; Ribeiro, RP;

Publication
PROGRESS IN ARTIFICIAL INTELLIGENCE (EPIA 2017)

Abstract
Imbalanced domains are an important problem that arises in predictive tasks, causing a loss in performance on the cases that are most relevant to the user. This problem has been intensively studied for classification problems. Recently it was recognized that imbalanced domains occur in several other contexts and for a diversity of types of tasks. This paper focuses on imbalanced regression tasks. Resampling strategies are among the most successful approaches to imbalanced domains. In this work we propose variants of existing resampling strategies that are able to take into account information regarding the neighborhood of the examples. Instead of performing sampling uniformly, our proposals bias the strategies towards reinforcing some regions of the data sets. In an extensive set of experiments we provide evidence of the advantage of introducing a neighborhood bias in the resampling strategies.
