2025
Authors
da Silva, JMPP; Duarte Nunes, G; Ferreira, A;
Publication
Abstract
2025
Authors
Silva, VF; Silva, ME; Ribeiro, P; Silva, F;
Publication
DATA MINING AND KNOWLEDGE DISCOVERY
Abstract
Multivariate time series analysis is a vital but challenging task, with multidisciplinary applicability, tackling the characterization of multiple interconnected variables over time and their dependencies. Traditional methodologies often adapt univariate approaches or rely on assumptions specific to certain domains or problems, presenting limitations. A recent promising alternative is to map multivariate time series into high-level network structures such as multiplex networks, with past work relying on connecting successive time series components with interconnections between contemporary timestamps. In this work, we first define a novel cross-horizontal visibility mapping between lagged timestamps of different time series and then introduce the concept of multilayer horizontal visibility graphs. This allows describing cross-dimension dependencies via inter-layer edges, leveraging the entire structure of multilayer networks. To this end, a novel parameter-free topological measure is proposed and common measures are extended for the multilayer setting. Our approach is general and applicable to any kind of multivariate time series data. We provide an extensive experimental evaluation with both synthetic and real-world datasets. We first explore the proposed methodology and the data properties highlighted by each measure, showing that inter-layer edges based on cross-horizontal visibility preserve more information than previous mappings, while also complementing the information captured by commonly used intra-layer edges. We then illustrate the applicability and validity of our approach in multivariate time series mining tasks, showcasing its potential for enhanced data analysis and insights.
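As context for the visibility mappings described above, the sketch below implements only the classical single-series horizontal visibility rule (two timestamps are linked when every value between them lies strictly below both endpoints); the paper's cross-horizontal, inter-layer extension is not reproduced here, and the function name and toy series are illustrative assumptions.

```python
import numpy as np

def horizontal_visibility_edges(series):
    """Edge list of the standard horizontal visibility graph (HVG):
    timestamps i < j are linked when every intermediate value lies
    strictly below both endpoint values."""
    series = np.asarray(series, dtype=float)
    n = len(series)
    edges = []
    for i in range(n - 1):
        for j in range(i + 1, n):
            between = series[i + 1:j]
            if between.size == 0 or between.max() < min(series[i], series[j]):
                edges.append((i, j))
    return edges

# toy series: adjacent timestamps are always mutually visible
print(horizontal_visibility_edges([3.0, 1.0, 2.0, 4.0, 1.5]))
```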
2025
Authors
Silva, I; Silva, ME; Pereira, I;
Publication
Springer Proceedings in Mathematics and Statistics
Abstract
The presence of missing data poses a common challenge for time series analysis in general, since most methods require equally spaced observations and therefore rely on imputation. For time series of counts, the usual imputation methods, which typically produce real-valued observations, are not adequate. This work employs Bayesian principles for handling missing data within time series of counts, based on first-order integer-valued autoregressive (INAR) models, namely the Approximate Bayesian Computation (ABC) and Gibbs sampler with Data Augmentation (GDA) algorithms. The methodologies are illustrated with synthetic and real data, and the results indicate that the estimates are consistent and present less bias when the percentage of missing observations decreases, as expected.
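To make the INAR(1) setting concrete, a minimal simulation sketch is given below: it generates a count series via binomial thinning with Poisson innovations and hides a fraction of the observations. Parameter values and the missing-data mask are illustrative assumptions, and the ABC and GDA imputation schemes themselves are not implemented here.

```python
import numpy as np

def simulate_inar1(n, alpha, lam, rng):
    """INAR(1) process: X_t = alpha ∘ X_{t-1} + e_t, where ∘ is binomial
    thinning (each of the X_{t-1} counts survives with probability alpha)
    and the innovations e_t are Poisson(lam)."""
    x = np.empty(n, dtype=int)
    x[0] = rng.poisson(lam / (1.0 - alpha))        # start near the stationary mean
    for t in range(1, n):
        survivors = rng.binomial(x[t - 1], alpha)  # binomial thinning of the previous count
        x[t] = survivors + rng.poisson(lam)        # plus new Poisson arrivals
    return x

rng = np.random.default_rng(0)
counts = simulate_inar1(200, alpha=0.5, lam=2.0, rng=rng)

# hide roughly 10% of the observations to mimic missing data in a count series
missing = rng.choice(len(counts), size=len(counts) // 10, replace=False)
observed = counts.astype(float)
observed[missing] = np.nan
print(int(np.isnan(observed).sum()), "missing out of", len(observed))
```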
2024
Authors
Mendes Neves, T; Seca, D; Sousa, R; Ribeiro, C; Mendes Moreira, J;
Publication
COMPUTATIONAL ECONOMICS
Abstract
As many automated algorithms find their way into the IT systems of the banking sector, having a way to validate and interpret the results of these algorithms can lead to a substantial reduction in the risks associated with automation. Usually, validating these pricing mechanisms requires human resources to manually analyze and validate large quantities of data. There is a lack of effective methods that analyze a time series and assess whether what is currently happening is plausible given previous data, without information about the variables used to calculate the price of the asset. This paper describes an implementation of a process that allows us to validate many data points automatically. We explore the K-Nearest Neighbors algorithm to find coincident patterns in financial time series, allowing us to detect anomalies, outliers, and data points that do not follow normal behavior. This system allows quicker detection of defective calculations that would otherwise result in the incorrect pricing of financial assets. Furthermore, our method does not require knowledge of the variables used to calculate the time series being analyzed. Our proposal uses pattern matching and can validate more than 58% of instances, substantially improving human risk analysts' efficiency. The proposal is completely transparent, allowing analysts to understand how the algorithm made its decision, increasing the trustworthiness of the method.
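A minimal sketch of the general idea follows: score each recent window by its distance to the k nearest historical windows and flag windows with no close precedent. The window length, the z-normalisation, and the scoring details are assumptions for illustration and do not reproduce the paper's exact pipeline.

```python
import numpy as np

def knn_window_anomaly_scores(series, window=20, k=5):
    """Score each window by its mean distance to the k most similar
    *past* windows; a large score means the recent pattern has no close
    precedent in the history and may warrant an analyst's review."""
    x = np.asarray(series, dtype=float)
    windows = np.lib.stride_tricks.sliding_window_view(x, window)
    # z-normalise each window so the comparison focuses on shape, not price level
    mu = windows.mean(axis=1, keepdims=True)
    sd = windows.std(axis=1, keepdims=True) + 1e-12
    z = (windows - mu) / sd
    scores = np.full(len(z), np.nan)
    for i in range(k, len(z)):
        d = np.linalg.norm(z[:i] - z[i], axis=1)   # distances to earlier windows only
        scores[i] = np.sort(d)[:k].mean()          # mean distance to the k nearest
    return scores

prices = 100.0 + np.cumsum(np.random.default_rng(1).normal(size=500))
scores = knn_window_anomaly_scores(prices)
print(np.nanmax(scores))
```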
2024
Authors
Pinto, J; Esteves, V; Tavares, S; Sousa, R;
Publication
PROGRESS IN ARTIFICIAL INTELLIGENCE
Abstract
The power transformer is one of the key components of any electrical grid and, as such, modern-day industrialization activities require constant usage of the asset. This increases the possibility of failures and can potentially diminish the lifespan of a power transformer. Dissolved gas analysis (DGA) is a technique developed to quantify the hydrocarbon gases dissolved in the power transformer oil, which in turn can indicate the presence of faults. Since this process requires a different chemical analysis for each type of gas, the overall cost of the operation increases with the number of gases. Therefore, a machine learning methodology was defined to meet two simultaneous objectives: identify gas subsets and predict the remaining gases, thus restoring them. Two subsets of equal or smaller size than those used by traditional methods (Duval's triangle, Rogers ratio, IEC table) were identified, while showing potentially superior performance. The models restored the discarded gases, and the restored set was compared with the original set in a variety of validation tasks.
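A rough sketch of the restoration step, under stated assumptions: a multi-output regressor is trained to predict the discarded gases from a hypothetical measured subset, using random placeholder data rather than real DGA measurements. The gas subset and model choice are illustrative and are not those selected in the paper.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Placeholder DGA table: rows are oil samples, columns are gas concentrations (ppm).
# Random numbers stand in for real measurements purely to keep the sketch runnable.
rng = np.random.default_rng(0)
gases = ["H2", "CH4", "C2H6", "C2H4", "C2H2"]
samples = rng.lognormal(mean=2.0, sigma=1.0, size=(500, len(gases)))

measured = ["H2", "CH4", "C2H4"]                      # hypothetical cheaper-to-measure subset
restored = [g for g in gases if g not in measured]    # gases to be predicted back

col = {g: i for i, g in enumerate(gases)}
X = samples[:, [col[g] for g in measured]]
y = samples[:, [col[g] for g in restored]]

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_tr, y_tr)
print("R^2 of the restored gases:", r2_score(y_te, model.predict(X_te)))
```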
2024
Authors
Guimaraes, N; Campos, R; Jorge, A;
Publication
WILEY INTERDISCIPLINARY REVIEWS-DATA MINING AND KNOWLEDGE DISCOVERY
Abstract
Large language models (LLMs) have substantially pushed artificial intelligence (AI) research and applications in the last few years. They are currently able to achieve high effectiveness in different natural language processing (NLP) tasks, such as machine translation, named entity recognition, text classification, question answering, or text summarization. Recently, significant attention has been drawn to OpenAI's GPT models' capabilities and extremely accessible interface. LLMs are nowadays routinely used and studied for downstream tasks and specific applications with great success, pushing forward the state of the art in almost all of them. However, they also exhibit impressive inference capabilities when used off the shelf without further training. In this paper, we aim to study the behavior of pre-trained language models (PLMs) in some inference tasks they were not initially trained for. Therefore, we focus our attention on very recent research works related to the inference capabilities of PLMs in some selected tasks such as factual probing and common-sense reasoning. We highlight relevant achievements made by these models, as well as some of their current limitations that open opportunities for further research.
This article is categorized under: Fundamental Concepts of Data and Knowledge > Key Design Issues in Data Mining; Technologies > Artificial Intelligence
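As an illustration of cloze-style factual probing with an off-the-shelf PLM, a minimal sketch follows; the model choice and prompt are assumptions and are not taken from the surveyed works.

```python
from transformers import pipeline

# Cloze-style factual probing: ask a masked language model to complete a fact
# and inspect the probabilities it assigns to the candidate completions.
unmasker = pipeline("fill-mask", model="bert-base-uncased")

for result in unmasker("The capital city of Portugal is [MASK]."):
    print(f"{result['token_str']:>10}  {result['score']:.3f}")
```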