
Publications by LIAAD

2024

From fault detection to anomaly explanation: A case study on predictive maintenance

Authors
Gama, J; Ribeiro, P; Mastelini, S; Davari, N; Veloso, B;

Publication
Journal of Web Semantics

Abstract
Predictive Maintenance applications are increasingly complex, with interactions between many components. Black-box models based on deep-learning techniques are popular approaches due to their predictive accuracy. This paper proposes a neural-symbolic architecture that uses an online rule-learning algorithm to explain when the black-box model predicts failures. The proposed system solves two problems in parallel: (i) anomaly detection and (ii) explanation of the anomaly. For the first problem, we use an unsupervised state-of-the-art autoencoder. For the second problem, we train a rule-learning system that learns a mapping from the input features to the autoencoder's reconstruction error. Both systems run online and in parallel. The autoencoder signals an alarm for examples whose reconstruction error exceeds a threshold. The causes of the alarm are hard for humans to understand because they result from a non-linear combination of sensor data. The rule that fires for such an example describes the relationship between the input features and the autoencoder's reconstruction error. The rule explains the failure signal by indicating which sensors contribute to the alarm, allowing the identification of the component involved in the failure. The system can present global explanations for the black-box model and local explanations for why the black-box model predicts a failure. We evaluate the proposed system in a real-world case study of Metro do Porto and provide explanations that illustrate its benefits.
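A minimal sketch of the two parallel components described in the abstract, under stated assumptions: the sensor stream is synthetic, the autoencoder is a tiny tanh network trained by online SGD, and the rule learner is river's AMRules (the paper's actual models, features and alarm threshold are not reproduced here). The autoencoder raises an alarm when the reconstruction error exceeds a threshold, while the rule learner is trained in parallel to map the same inputs to that error.

```python
import numpy as np
from river import rules   # assumption: river's AMRules as the online regression-rule learner

rng = np.random.default_rng(0)
n_features, n_hidden = 8, 3
W1 = rng.normal(scale=0.1, size=(n_features, n_hidden))
W2 = rng.normal(scale=0.1, size=(n_hidden, n_features))
lr, threshold = 0.01, 1.5                             # illustrative values, not the paper's

def autoencoder_step(x):
    """One online SGD step on a tiny autoencoder; returns the reconstruction error."""
    global W1, W2
    h = np.tanh(x @ W1)
    x_hat = h @ W2
    err = x_hat - x
    gW2 = np.outer(h, err)                            # gradients of 0.5 * ||x_hat - x||^2
    gW1 = np.outer(x, (err @ W2.T) * (1 - h ** 2))
    W1 -= lr * gW1
    W2 -= lr * gW2
    return float(np.mean(err ** 2))

explainer = rules.AMRules()                           # learns rules mapping sensors -> error

stream = rng.normal(size=(1000, n_features))          # stand-in for the online sensor stream
for x in stream:
    error = autoencoder_step(x)                       # problem (i): anomaly detection
    x_dict = {f"sensor_{i}": float(v) for i, v in enumerate(x)}
    explainer.learn_one(x_dict, error)                # problem (ii): explanation model
    if error > threshold:                             # alarm: example poorly reconstructed
        print("alarm", round(error, 3), "rule-predicted error:", explainer.predict_one(x_dict))
```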

2024

Multilayer quantile graph for multivariate time series analysis and dimensionality reduction

Authors
Silva, VF; Silva, ME; Ribeiro, P; Silva, F;

Publication
INTERNATIONAL JOURNAL OF DATA SCIENCE AND ANALYTICS

Abstract
In recent years, there has been a surge in the prevalence of high- and multidimensional temporal data across various scientific disciplines. These datasets are characterized by their vast size and the challenges they pose for analysis. Such data typically exhibit serial and cross-dependency and possess high dimensionality, thereby introducing additional complexities to conventional time series analysis methods. To address these challenges, a recent and complementary approach has emerged, known as network-based analysis methods for multivariate time series. In univariate settings, quantile graphs have been employed to capture temporal transition properties and reduce data dimensionality by mapping observations to a smaller set of sample quantiles. To confront the increasingly prominent issue of high dimensionality, we propose an extension of quantile graphs into a multivariate variant, which we term Multilayer Quantile Graphs. In this innovative mapping, each time series is transformed into a quantile graph, and inter-layer connections are established to link contemporaneous quantiles of pairwise series. This enables the analysis of dynamic transitions across multiple dimensions. In this study, we demonstrate the effectiveness of this new mapping using synthetic and benchmark multivariate time series datasets. We delve into the resulting network's topological structures, extract network features, and employ these features for original dataset analysis. Furthermore, we compare our results with a recent method from the literature. The resulting multilayer network offers a significant reduction in the dimensionality of the original data while capturing serial and cross-dimensional transitions. This approach facilitates the characterization and analysis of large multivariate time series datasets through network analysis techniques.
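A small sketch of the mapping described above, assuming one plausible construction (function names, edge-weight conventions and the choice of directed inter-layer links are ours, not necessarily the paper's): each series becomes a layer whose nodes are its sample-quantile bins, intra-layer edges count temporal transitions between consecutive quantiles, and inter-layer edges count contemporaneous co-occurrences of quantiles across pairs of series.

```python
import numpy as np
import networkx as nx

def quantile_bins(x, q):
    """Map each observation of a univariate series to one of q sample-quantile bins."""
    inner_edges = np.quantile(x, np.linspace(0, 1, q + 1)[1:-1])
    return np.digitize(x, inner_edges)                # values in 0 .. q-1

def multilayer_quantile_graph(series, q=4):
    """series: array of shape (n_series, T); node (i, b) is quantile bin b of series i."""
    n, T = series.shape
    bins = np.array([quantile_bins(s, q) for s in series])
    G = nx.DiGraph()

    def bump(u, v):                                   # add 1 to an edge weight, creating it if needed
        G.add_edge(u, v, weight=G[u][v]["weight"] + 1 if G.has_edge(u, v) else 1)

    for i in range(n):                                # intra-layer: temporal quantile transitions
        for t in range(T - 1):
            bump((i, int(bins[i, t])), (i, int(bins[i, t + 1])))
    for i in range(n):                                # inter-layer: contemporaneous quantiles
        for j in range(i + 1, n):
            for t in range(T):
                bump((i, int(bins[i, t])), (j, int(bins[j, t])))
                bump((j, int(bins[j, t])), (i, int(bins[i, t])))
    return G

rng = np.random.default_rng(1)
X = rng.normal(size=(3, 500))                         # three synthetic series, 500 time points
G = multilayer_quantile_graph(X, q=4)
print(G.number_of_nodes(), "nodes,", G.number_of_edges(), "edges")   # far fewer nodes than observations
```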

2024

Predicting macroeconomic indicators from online activity data: A review

Authors
Costa, EA; Silva, ME;

Publication
Statistical Journal of the IAOS

Abstract
Predictions of macroeconomic indicators rely primarily on traditional data sourced from National Statistical Offices. However, new data sources made available by recent technological advancements, namely data from online activities, have the potential to bring fresh perspectives to the monitoring of economic activity and enhance forecasting accuracy. This paper reviews the literature on predicting macroeconomic indicators, such as the gross domestic product, unemployment rate, consumer price index or private consumption, based on online activity data sourced from Google Trends, Twitter (rebranded to X) and mobile devices. Based on a systematic search of publications indexed in the Web of Science and Scopus databases, the analysis of a final set of 56 publications covers the publication history of the data sources, the methods used to model the data and the predictive accuracy of information from such data sources. The paper also discusses the limitations and challenges of using online activity data for macroeconomic predictions. The review concludes that online activity data can be a valuable source of information for predicting macroeconomic indicators; however, certain limitations and challenges must be considered to improve the models' accuracy and reliability.

2024

Real-time nowcasting the monthly unemployment rates with daily Google Trends data

Authors
Costa, EA; Silva, ME; Gbylik-Sikorska, M;

Publication
SOCIO-ECONOMIC PLANNING SCIENCES

Abstract
Policymakers often have to make decisions based on incomplete economic data because of the usual delay in publishing official statistics. To circumvent this issue, researchers use data from Google Trends (GT) as an early indicator of economic performance. Such data have emerged in the literature as alternative and complementary predictors of macroeconomic outcomes, such as the unemployment rate, featuring readiness, public availability and no cost. This study uses extensive daily GT data to develop a framework for nowcasting monthly unemployment rates, tailored to real-time data availability and resorting to Mixed Data Sampling (MIDAS) regressions. Portugal is chosen as a use case for the methodology since extracting GT data requires the selection of culturally dependent keywords. The nowcasting period spans 2019 to 2021, encompassing the onset of the coronavirus pandemic. The findings indicate that using daily GT data with MIDAS provides timely and accurate insights into the unemployment rate, especially during the COVID-19 pandemic, showing accuracy gains even when compared with nowcasts obtained from typical monthly GT data via traditional ARMAX models.
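A compact illustration of the kind of MIDAS regression mentioned above, on synthetic data and with exponential Almon lag weights (a common MIDAS parameterisation; the paper's real-time design, keyword selection and exact specification are not reproduced): the monthly rate is regressed on a weighted sum of the month's daily indicator values, and the weight shape is estimated jointly with the regression coefficients.

```python
import numpy as np
from scipy.optimize import least_squares

def almon_weights(theta1, theta2, n_lags):
    """Exponential Almon lag weights over the daily lags of a month; they sum to one."""
    j = np.arange(n_lags)
    w = np.exp(theta1 * j + theta2 * j ** 2)
    return w / w.sum()

def midas_residuals(params, X_daily, y_monthly):
    """X_daily: (n_months, n_lags) daily indicator values per month; y_monthly: (n_months,)."""
    beta0, beta1, theta1, theta2 = params
    w = almon_weights(theta1, theta2, X_daily.shape[1])
    return y_monthly - (beta0 + beta1 * (X_daily @ w))

# synthetic illustration: 36 months of a daily indicator, 30 observations per month
rng = np.random.default_rng(2)
X = rng.normal(size=(36, 30))
y = 5.0 + 0.8 * (X @ almon_weights(0.1, -0.02, 30)) + rng.normal(scale=0.1, size=36)

fit = least_squares(midas_residuals, x0=[0.0, 0.1, 0.0, -0.01], args=(X, y),
                    bounds=([-np.inf, -np.inf, -1.0, -0.5], [np.inf, np.inf, 1.0, 0.0]))
b0, b1, t1, t2 = fit.x
nowcast = b0 + b1 * (X[-1] @ almon_weights(t1, t2, 30))   # nowcast for the latest month
print(round(float(nowcast), 2), "vs. observed", round(float(y[-1]), 2))
```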

2024

Learning phonology with data in the classroom: Engaging students in the creolistic research process

Authors
Trigo, L; Silva, C; de Almeida, VM;

Publication
INTERNATIONAL JOURNAL OF HUMANITIES AND ARTS COMPUTING-A JOURNAL OF DIGITAL HUMANITIES

Abstract
Phonology is a linguistic discipline that is naturally computational. However, as many researchers are not familiar with the use of digital methods, most of the computation required is still performed by humans. This article presents a training experiment of master's students of the phonology seminar at the University of Porto, bringing the research process directly to the classroom. The experiment was designed to raise students' awareness of the potentialities of combining human and machine computation in phonology. The Centre for Digital Culture and Innovation (CODA) readily embraced this project to showcase the application of digital humanities as humanities in both research and training activities. During this experiment, students were trained to collect and process phonological data using various open-source and free web-based resources. By combining a strict protocol with some individual research freedom, the students were able to make valuable contributions towards Creolistic Studies, while enriching their individual skills. Finally, the interdisciplinary nature of the approach has demonstrated its potential within and beyond the humanities and social sciences fields (e.g., linguistics, archaeology, history, geography, ethnology, sociology, and genetics), by also introducing the students to basic concepts and practices of Open Science and FAIR principles, including Linked Open Data.

2023

Estimating the Likelihood of Financial Behaviours Using Nearest Neighbors: A case study on market sensitivities

Authors
Mendes-Neves, T; Seca, D; Sousa, R; Ribeiro, C; Mendes-Moreira, J;

Publication
COMPUTATIONAL ECONOMICS

Abstract
As many automated algorithms find their way into the IT systems of the banking sector, having a way to validate and interpret the results from these algorithms can lead to a substantial reduction in the risks associated with automation. Usually, validating these pricing mechanisms requires human resources to manually analyze and validate large quantities of data. There is a lack of effective methods that analyze the time series and understand if what is currently happening is plausible based on previous data, without information about the variables used to calculate the price of the asset. This paper describes an implementation of a process that allows us to validate many data points automatically. We explore the K-Nearest Neighbors algorithm to find coincident patterns in financial time series, allowing us to detect anomalies, outliers, and data points that do not follow normal behavior. This system allows quicker detection of defective calculations that would otherwise result in the incorrect pricing of financial assets. Furthermore, our method does not require knowledge about the variables used to calculate the time series being analyzed. Our proposal uses pattern matching and can validate more than 58% of instances, substantially improving human risk analysts' efficiency. The proposal is completely transparent, allowing analysts to understand how the algorithm made its decision, increasing the trustworthiness of the method.
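A rough sketch of the window-matching idea described above, with illustrative parameters (window length, k, thresholds and the synthetic series are assumptions, not the paper's settings): each recent window of the series is z-normalised and compared with its k nearest historical windows; points whose windows have no close precedent are flagged for a human analyst instead of being auto-validated.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def knn_validation(series, window=20, k=5, quantile=0.99, train_frac=0.8):
    """Compare each recent window of the series with its k nearest historical windows.

    Returns (flags, score) for the windows after the training cut; True means the
    pattern has no close precedent and should be routed to a human analyst."""
    W = np.lib.stride_tricks.sliding_window_view(series, window)
    W = (W - W.mean(axis=1, keepdims=True)) / (W.std(axis=1, keepdims=True) + 1e-9)
    cut = int(train_frac * len(W))
    nn = NearestNeighbors().fit(W[:cut])                                       # historical patterns only
    hist = nn.kneighbors(W[:cut], n_neighbors=k + 1)[0][:, 1:].mean(axis=1)    # skip self-match
    score = nn.kneighbors(W[cut:], n_neighbors=k)[0].mean(axis=1)              # new windows vs. history
    return score > np.quantile(hist, quantile), score

rng = np.random.default_rng(3)
prices = np.cumsum(rng.normal(size=2000))            # synthetic price-like series
prices[1800:1820] += 15.0                            # injected defective calculation
flags, score = knn_validation(prices)
print("share auto-validated:", round(1 - flags.mean(), 3))
```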
