Publications by LIAAD

2019

Anomaly Detection in Sequential Data: Principles and Case Studies

Authors
Andrade, T; Gama, J; Ribeiro, RP; Sousa, W; Carvalho, A;

Publication
Wiley Encyclopedia of Electrical and Electronics Engineering

Abstract

2019

Clustering of interval time series

Authors
Maharaj, EA; Teles, P; Brito, P;

Publication
STATISTICS AND COMPUTING

Abstract
Interval time series occur when real intervals of some variable of interest are registered as an ordered sequence along time. We address the problem of clustering interval time series (ITS), for which different approaches are proposed. First, clustering is performed based on point-to-point comparisons. Time-domain and wavelet features also serve as clustering variables in alternative approaches. Furthermore, autocorrelation matrix functions, gathering the autocorrelation and cross-correlation functions of the ITS upper and lower bounds, may be compared using adequate distances (e.g. the Frobenius distance) and used for clustering ITS. An improved procedure to determine the autocorrelation function of ITS is proposed, which also serves as a basis for clustering. The different alternative approaches are explored and their performances compared for ITS simulated under different setups. An application to sea level daily ranges, observed at different locations in Australia, illustrates the proposed methods.
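The autocorrelation-matrix approach can be sketched as follows. The sketch assumes each ITS is given as two equal-length lists (lower and upper bounds). It uses the ordinary sample auto- and cross-correlation, not the improved ITS autocorrelation procedure the paper proposes. The function names, the toy series and the choice of single-linkage agglomerative clustering are illustrative assumptions, not the authors' code.

```python
import math

def cross_corr(x, y, k):
    """Sample cross-correlation between x_t and y_{t+k} (biased, divides by n)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sx = math.sqrt(sum((a - mx) ** 2 for a in x) / n)
    sy = math.sqrt(sum((b - my) ** 2 for b in y) / n)
    cov = sum((x[t] - mx) * (y[t + k] - my) for t in range(n - k)) / n
    return cov / (sx * sy)

def acf_matrices(lower, upper, max_lag):
    """For lags 0..max_lag, the 2x2 matrix of auto- (diagonal) and
    cross-correlations (off-diagonal) of the lower and upper bounds."""
    series = (lower, upper)
    return [[[cross_corr(series[i], series[j], k) for j in range(2)]
             for i in range(2)]
            for k in range(max_lag + 1)]

def frobenius_distance(ma, mb):
    """Frobenius norm of the difference between two stacked matrix sequences."""
    return math.sqrt(sum((ma[k][i][j] - mb[k][i][j]) ** 2
                         for k in range(len(ma))
                         for i in range(2) for j in range(2)))

def single_linkage(dist, n_clusters):
    """Naive agglomerative clustering on a precomputed distance matrix."""
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(dist[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a].extend(clusters.pop(b))  # b > a, so index a stays valid
    return clusters

# Toy data: three ITS of length 40 (hypothetical, not the sea-level data).
lo1 = [math.sin(t / 3) for t in range(40)]
up1 = [v + 1 for v in lo1]
lo2 = [2 * v + 3 for v in lo1]          # positive affine copy of series 1
up2 = [2 * v + 3 for v in up1]
lo3 = [(-1) ** t for t in range(40)]    # alternating series
up3 = [v + 2 + 0.1 * (t % 3) for t, v in enumerate(lo3)]

its = [(lo1, up1), (lo2, up2), (lo3, up3)]
mats = [acf_matrices(lo, up, max_lag=3) for lo, up in its]
dist = [[frobenius_distance(a, b) for b in mats] for a in mats]
clusters = single_linkage(dist, n_clusters=2)
print(clusters)  # [[0, 1], [2]]
```

Sample correlations are unchanged by a positive affine rescaling. This is why the second series clusters with the first even though its levels differ.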

2019

Constructive Aggregation and Its Application to Forecasting with Dynamic Ensembles

Authors
Cerqueira, V; Pinto, F; Torgo, L; Soares, C; Moniz, N;

Publication
MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES, ECML PKDD 2018, PT I

Abstract
While the predictive advantage of ensemble methods is nowadays widely accepted, the most appropriate way of estimating the weights of each individual model remains an open research question. Meanwhile, several studies report that combining different ensemble approaches leads to improvements in performance, due to a better trade-off between the diversity and the error of the individual models in the ensemble. We contribute to this research line by proposing an aggregation framework for a set of independently created forecasting models, i.e. heterogeneous ensembles. Instead of directly aggregating these models, the general idea is to first rearrange them into different subsets, creating a new set of combined models which is then aggregated into a final decision. We present this idea as constructive aggregation, and apply it to time series forecasting problems. Results from empirical experiments show that applying constructive aggregation to state-of-the-art dynamic aggregation methods provides a consistent advantage. Constructive aggregation is publicly available in a software package. Data related to this paper are available at: https://github.com/vcerqueira/timeseriesdata. Code related to this paper is available at: https://github.com/vcerqueira/tsensembler.
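Constructive aggregation can be sketched minimally under three assumptions: the subsets are all pairs of base models, each combined model is the plain average of its members, and the final aggregation weights combined models inversely to their mean absolute error on a recent window. These choices and all names are hypothetical simplifications. The paper applies the idea to state-of-the-art dynamic aggregation methods, and its actual implementation is the package linked above.

```python
from itertools import combinations

def build_combined_models(preds, subset_size):
    """Rearrange base models into subsets; each subset's forecast is the mean
    of its members' forecasts. preds maps model name -> list of forecasts."""
    combined = {}
    for subset in combinations(sorted(preds), subset_size):
        horizon = len(preds[subset[0]])
        combined["+".join(subset)] = [
            sum(preds[m][t] for m in subset) / len(subset) for t in range(horizon)
        ]
    return combined

def inverse_error_weights(past_preds, past_truth, eps=1e-12):
    """Weights proportional to 1 / MAE over the recent window, normalised to sum to 1."""
    inv = {}
    for name, ps in past_preds.items():
        mae = sum(abs(p - y) for p, y in zip(ps, past_truth)) / len(past_truth)
        inv[name] = 1.0 / (mae + eps)
    total = sum(inv.values())
    return {name: v / total for name, v in inv.items()}

# Hypothetical base models: A is exact, B overshoots by 1, C undershoots by 1.
truth = [1.0, 2.0, 3.0, 4.0]
past = {"A": [1.0, 2.0, 3.0, 4.0], "B": [2.0, 3.0, 4.0, 5.0], "C": [0.0, 1.0, 2.0, 3.0]}
next_step = {"A": [5.0], "B": [6.0], "C": [4.0]}

# Step 1: build combined models from subsets of the base models.
past_combined = build_combined_models(past, subset_size=2)
next_combined = build_combined_models(next_step, subset_size=2)

# Step 2: aggregate the combined models (not the base models) into one forecast.
weights = inverse_error_weights(past_combined, truth)
forecast = sum(weights[name] * next_combined[name][0] for name in weights)
print(round(forecast, 6))  # 5.0
```

In this toy case neither B nor C is accurate on its own, but their average is. The combined model B+C therefore takes almost all of the weight, which illustrates how rearranging models into subsets can create a combined model better than its individual members.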

2019

Data mining based framework to assess solution quality for the rectangular 2D strip-packing problem

Authors
Neuenfeldt Junior, A; Silva, E; Gomes, M; Soares, C; Oliveira, JF;

Publication
EXPERT SYSTEMS WITH APPLICATIONS

Abstract
In this paper, we explore the use of reference values (predictors) for the optimal objective function value of hard combinatorial optimization problems, obtained by data mining techniques, instead of bounds; such reference values may be used to assess the quality of heuristic solutions for the problem. For this purpose, we use the rectangular two-dimensional strip-packing problem (2D-SPP), which arises in many industrial contexts. This problem is mostly solved by heuristic methods, which provide good solutions. However, heuristic approaches do not guarantee optimality, and lower bounds, in particular the area lower bound, are generally used to give information on solution quality. Yet this bound has a severe accuracy problem. Therefore, we propose a data mining-based framework capable of assessing the quality of heuristic solutions for the 2D-SPP. A regression model was fitted relating the strip heights obtained with the bottom-left-fill heuristic to 19 predictors derived from problem characteristics. Random forest was selected as the data mining technique with the best level of generalisation for the problem, and 30,000 problem instances were generated to represent different 2D-SPP variations found in real-world applications. The fitted regression model can then predict the strip height of new problem instances. In the computational experiments, we demonstrate that the proposed data mining-based framework is consistent, opening the door to its application to finding predictions for other combinatorial optimisation problems, in particular other cutting and packing problems. However, how to use a reference value instead of a bound still leaves considerable room for discussion and innovative ideas. Some directions for the use of reference values as a stopping criterion in search algorithms are also provided.
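The contrast between a bound and a reference value can be made concrete with a short computation. The sketch computes the area lower bound for a 2D-SPP instance (total item area divided by strip width). It then computes the relative gap of a heuristic strip height against that bound and against a reference value. The item sizes, the heuristic height and the reference value are invented for illustration. In the paper's framework, the reference value would come from the fitted random forest, which is not reproduced here.

```python
def area_lower_bound(items, strip_width):
    """Area lower bound for the strip-packing problem: sum(w * h) / W.
    items is a list of (width, height) rectangles; rotation is not considered."""
    if any(w > strip_width for w, _ in items):
        raise ValueError("an item is wider than the strip")
    return sum(w * h for w, h in items) / strip_width

def relative_gap(height, reference):
    """Relative deviation of a heuristic strip height from a bound or reference value."""
    return (height - reference) / reference

items = [(4, 2), (3, 3), (2, 5)]   # hypothetical instance
W = 5
lb = area_lower_bound(items, W)     # (8 + 9 + 10) / 5 = 5.4
heuristic_height = 7                # made-up height returned by some heuristic
predicted_height = 6.5              # hypothetical reference value from a regression model

print(lb)                                                          # 5.4
print(round(relative_gap(heuristic_height, lb), 4))                # 0.2963
print(round(relative_gap(heuristic_height, predicted_height), 4))  # 0.0769
```

A gap of almost 30% against the area bound may mostly reflect the looseness of the bound rather than a poor heuristic solution. This is the accuracy problem that motivates comparing against a predicted reference value instead.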

2019

Arbitrage of forecasting experts

Authors
Cerqueira, V; Torgo, L; Pinto, F; Soares, C;

Publication
MACHINE LEARNING

Abstract
Forecasting is an important task across several domains. Its widespread interest is related to the uncertainty and complex, evolving structure of time series. Forecasting methods are typically designed to cope with temporal dependencies among observations, but it is widely accepted that none is universally applicable. Therefore, a common solution to these tasks is to combine the opinions of a diverse set of forecasts. In this paper we present an approach based on arbitrating, in which several forecasting models are dynamically combined to obtain predictions. Arbitrating is a metalearning approach that combines the output of experts according to predictions of the loss that they will incur. We present an approach for retrieving out-of-bag predictions that significantly improves its data efficiency. Finally, since diversity is a fundamental component in ensemble methods, we propose a method for explicitly handling the inter-dependence between experts when aggregating their predictions. Results from extensive empirical experiments provide evidence of the method's competitiveness relative to state-of-the-art approaches. The proposed method is publicly available in a software package.
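The core arbitrating idea can be sketched minimally. For each expert, a meta-learner predicts the loss the expert will incur on the next observation. The experts' forecasts are then combined with weights that decrease as predicted loss grows. Here the meta-learner is assumed to be a k-nearest-neighbour regressor over hypothetical features, and the weighting is assumed to be a softmax over negative predicted losses. Neither the out-of-bag retrieval scheme nor the handling of inter-dependence between experts described above is included.

```python
import math

def knn_predict_loss(history, x, k=3):
    """Meta-learner: predict an expert's absolute error at input x as the mean
    error of its k nearest past inputs. history is a list of (features, abs_error)."""
    nearest = sorted(history, key=lambda h: math.dist(h[0], x))[:k]
    return sum(err for _, err in nearest) / len(nearest)

def softmax_weights(losses, temperature=1.0):
    """Lower predicted loss -> higher weight; weights sum to 1."""
    scores = {name: math.exp(-loss / temperature) for name, loss in losses.items()}
    total = sum(scores.values())
    return {name: s / total for name, s in scores.items()}

def arbitrate(forecasts, meta_histories, x, k=3, temperature=1.0):
    """Combine expert forecasts using weights derived from predicted losses."""
    predicted = {name: knn_predict_loss(meta_histories[name], x, k) for name in forecasts}
    weights = softmax_weights(predicted, temperature)
    combined = sum(weights[name] * forecasts[name] for name in forecasts)
    return combined, weights

# Hypothetical meta-data: "trend" errs little when the feature is high,
# "mean" errs little when it is low.
feats = [(0.1,), (0.2,), (0.8,), (0.9,), (1.0,)]
histories = {
    "trend": list(zip(feats, [2.0, 2.0, 0.1, 0.1, 0.1])),
    "mean": list(zip(feats, [0.1, 0.1, 2.0, 2.0, 2.0])),
}
forecast, weights = arbitrate({"trend": 10.0, "mean": 6.0}, histories, x=(0.95,), k=2)
print(round(weights["trend"], 3), round(forecast, 2))
```

For an input near the high end of the feature range, the meta-learner expects the "trend" expert to be more accurate, so the combined forecast leans towards its value. The softmax temperature controls how sharply the weights concentrate on the expert with the lowest predicted loss.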

2019

KnowBots: Discovering Relevant Patterns in Chatbot Dialogues

Authors
Rivolli, A; Amaral, C; Guardão, L; de Sá, CR; Soares, C;

Publication
Discovery Science - 22nd International Conference, DS 2019, Split, Croatia, October 28-30, 2019, Proceedings

Abstract
Chatbots have been used in business contexts as a new way of communicating with customers. They use natural language to interact with customers, whether offering products and services or supporting a specific task. In this context, an important and challenging task is to assess the effectiveness of the machine-to-human interaction according to business goals. Although several analytic tools have been proposed to analyze user interactions with chatbot systems, to the best of our knowledge they do not consider user-defined criteria, focusing instead on metrics of engagement and retention of the system as a whole. For this reason, we propose the KnowBots tool, which can be used to discover relevant patterns in the dialogues of chatbots by considering specific business goals. Given the non-trivial structure of dialogues and the possibly large number of conversational records, we combined sequential pattern mining and subgroup discovery techniques to identify patterns of usage. Moreover, a user-friendly interface was developed to present the results and to allow their detailed analysis. Thus, it may serve as an alternative decision support tool for businesses or any entity that makes use of this type of interaction with their clients. © Springer Nature Switzerland AG 2019.
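Sequential pattern mining and subgroup discovery can be combined as sketched below. The sketch assumes each dialogue is a sequence of intent labels with a boolean business-goal flag (for example, whether the conversation ended in a purchase). Candidate patterns are order-preserving, not necessarily contiguous, subsequences. Each is scored with weighted relative accuracy (WRAcc), a common subgroup discovery quality measure. The intents, the data and the choice of WRAcc are illustrative assumptions, not details of KnowBots.

```python
from itertools import combinations

def contains_subsequence(seq, pattern):
    """True if pattern occurs in seq in order, gaps allowed."""
    it = iter(seq)
    # each `any` consumes the iterator up to the next match, preserving order
    return all(any(item == p for item in it) for p in pattern)

def candidate_patterns(dialogues, length):
    """All order-preserving subsequences of the given length seen in the data."""
    patterns = set()
    for seq, _ in dialogues:
        for idx in combinations(range(len(seq)), length):
            patterns.add(tuple(seq[i] for i in idx))
    return patterns

def wracc(dialogues, pattern):
    """Weighted relative accuracy: coverage * (goal rate in subgroup - overall goal rate)."""
    n = len(dialogues)
    covered = [goal for seq, goal in dialogues if contains_subsequence(seq, pattern)]
    if not covered:
        return 0.0
    overall = sum(goal for _, goal in dialogues) / n
    return len(covered) / n * (sum(covered) / len(covered) - overall)

def top_patterns(dialogues, length=2, top=3):
    """Rank candidate patterns by WRAcc, best first; ties broken alphabetically."""
    scored = [(wracc(dialogues, p), p) for p in candidate_patterns(dialogues, length)]
    scored.sort(key=lambda s: (-s[0], s[1]))
    return scored[:top]

# Hypothetical dialogues: intent sequence plus a goal flag (e.g. purchase made).
dialogues = [
    (["greet", "ask_price", "buy"], True),
    (["greet", "ask_price", "thanks", "buy"], True),
    (["greet", "complain", "bye"], False),
    (["greet", "ask_price", "bye"], False),
    (["greet", "complain", "bye"], False),
]

for score, pattern in top_patterns(dialogues):
    print(f"{score:+.2f}  {' -> '.join(pattern)}")
```

In this toy data, the pattern "ask_price followed by buy" covers both successful dialogues and none of the failed ones, so it ranks first.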
