Publications - INESC TEC

Publications by LIAAD

2025

Histopathological Imaging Dataset for Oral Cancer Analysis: A Study with a Data Leakage Warning

Authors
Nogueira, M; Gomes, E

Publication
Proceedings of the 18th International Joint Conference on Biomedical Engineering Systems and Technologies

2025

A Review of Voicing Decision in Whispered Speech: From Rules to Machine Learning

Authors
da Silva, JMPP; Duarte Nunes, G; Ferreira, A

Publication

2025

Multilayer horizontal visibility graphs for multivariate time series analysis

Authors
Silva, VF; Silva, ME; Ribeiro, P; Silva, F

Publication
Data Mining and Knowledge Discovery

Abstract
Multivariate time series analysis is a vital but challenging task, with multidisciplinary applicability, tackling the characterization of multiple interconnected variables over time and their dependencies. Traditional methodologies often adapt univariate approaches or rely on assumptions specific to certain domains or problems, presenting limitations. A recent promising alternative is to map multivariate time series into high-level network structures such as multiplex networks, with past work relying on connecting successive time series components with interconnections between contemporary timestamps. In this work, we first define a novel cross-horizontal visibility mapping between lagged timestamps of different time series and then introduce the concept of multilayer horizontal visibility graphs. This allows describing cross-dimension dependencies via inter-layer edges, leveraging the entire structure of multilayer networks. To this end, a novel parameter-free topological measure is proposed and common measures are extended for the multilayer setting. Our approach is general and applicable to any kind of multivariate time series data. We provide an extensive experimental evaluation with both synthetic and real-world datasets. We first explore the proposed methodology and the data properties highlighted by each measure, showing that inter-layer edges based on cross-horizontal visibility preserve more information than previous mappings, while also complementing the information captured by commonly used intra-layer edges. We then illustrate the applicability and validity of our approach in multivariate time series mining tasks, showcasing its potential for enhanced data analysis and insights.
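As background for the abstract above, the following is a minimal sketch of the standard univariate horizontal visibility graph, in which two timestamps are linked when every value strictly between them is lower than both. It does not implement the cross-horizontal visibility mapping or the multilayer construction proposed in the paper; the function name `horizontal_visibility_edges` is a hypothetical one chosen for illustration.

```python
def horizontal_visibility_edges(series):
    """Return the edges (i, j), i < j, of the horizontal visibility graph.

    Timestamps i and j are linked when every value strictly between them
    is lower than min(series[i], series[j]).
    """
    edges = []
    n = len(series)
    for i in range(n - 1):
        # Highest value seen strictly between i and the current j.
        highest_between = float("-inf")
        for j in range(i + 1, n):
            if highest_between < min(series[i], series[j]):
                edges.append((i, j))
            highest_between = max(highest_between, series[j])
            # Once a value at least as high as series[i] appears, nothing
            # further to the right is visible from i.
            if series[j] >= series[i]:
                break
    return edges


if __name__ == "__main__":
    print(horizontal_visibility_edges([3, 1, 2, 4]))
```

For the series `[3, 1, 2, 4]`, timestamp 1 cannot see timestamp 3 because the intermediate value 2 is higher than 1, so the pair (1, 3) is absent from the result.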

2025

Bayesian Modelling of Time Series of Counts with Missing Data

Authors
Silva, I; Silva, ME; Pereira, I

Publication
Springer Proceedings in Mathematics and Statistics

Abstract
Missing data pose a common challenge for time series analysis, since most methods require observations that are equally spaced in time, so imputation is needed. For time series of counts, the usual imputation methods are not adequate because they generally produce real-valued observations. This work takes a Bayesian approach to handling missing data in time series of counts, based on first-order integer-valued autoregressive (INAR) models, using Approximate Bayesian Computation (ABC) and Gibbs sampler with Data Augmentation (GDA) algorithms. The methodologies are illustrated with synthetic and real data, and the results indicate that, as expected, the estimates are consistent and show less bias as the percentage of missing observations decreases. © The Author(s), under exclusive license to Springer Nature Switzerland AG 2025.
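To make the setting concrete, here is a small sketch of how a first-order INAR series with missing observations might be simulated. It assumes binomial thinning and Poisson innovations, which the abstract does not state, and it does not implement the ABC or GDA estimation algorithms. All function names and parameter values are hypothetical.

```python
import math
import random


def poisson_draw(rng, lam):
    """Draw from Poisson(lam) with Knuth's method (adequate for small lam)."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def simulate_inar1(n, alpha, lam, seed=0):
    """Simulate X_t = alpha o X_{t-1} + e_t, with e_t ~ Poisson(lam) (assumed).

    'o' is binomial thinning: each of the X_{t-1} units survives
    independently with probability alpha (0 <= alpha < 1).
    """
    rng = random.Random(seed)
    # Start from the stationary marginal, Poisson(lam / (1 - alpha)).
    x = [poisson_draw(rng, lam / (1 - alpha))]
    for _ in range(n - 1):
        survivors = sum(rng.random() < alpha for _ in range(x[-1]))
        x.append(survivors + poisson_draw(rng, lam))
    return x


def mask_at_random(series, fraction, seed=0):
    """Replace a random fraction of the observations with None (missing)."""
    rng = random.Random(seed)
    k = round(fraction * len(series))
    missing = set(rng.sample(range(len(series)), k))
    return [None if t in missing else v for t, v in enumerate(series)]


if __name__ == "__main__":
    counts = simulate_inar1(n=200, alpha=0.5, lam=2.0)
    observed = mask_at_random(counts, fraction=0.2)
    print(observed[:20])
```

Because every value is a non-negative integer, filling the `None` gaps with a real-valued interpolation would break the count structure, which is the difficulty the abstract points to.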

2024

Estimating the Likelihood of Financial Behaviours Using Nearest Neighbors: A case study on market sensitivities

Authors
Mendes Neves, T; Seca, D; Sousa, R; Ribeiro, C; Mendes Moreira, J

Publication
Computational Economics

Abstract
As many automated algorithms find their way into the IT systems of the banking sector, having a way to validate and interpret the results of these algorithms can substantially reduce the risks associated with automation. Usually, validating these pricing mechanisms requires human resources to manually analyze and validate large quantities of data. There is a lack of effective methods that analyze a time series and assess whether what is currently happening is plausible based on previous data, without information about the variables used to calculate the price of the asset. This paper describes an implementation of a process that allows us to validate many data points automatically. We explore the K-Nearest Neighbors algorithm to find coincident patterns in financial time series, allowing us to detect anomalies, outliers, and data points that do not follow normal behavior. This system allows quicker detection of defective calculations that would otherwise result in the incorrect pricing of financial assets. Furthermore, our method does not require knowledge about the variables used to calculate the time series being analyzed. Our proposal uses pattern matching and can validate more than 58% of instances, substantially improving human risk analysts' efficiency. The proposal is completely transparent, allowing analysts to understand how the algorithm made its decision, which increases the trustworthiness of the method.
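The general idea of nearest-neighbor pattern matching on a time series can be sketched as follows. This is an illustration under assumptions, not the authors' system: the fixed window width, the Euclidean distance, the mean-of-k score, and the function names are all hypothetical choices.

```python
import math


def sliding_windows(series, width):
    """Split a series into all contiguous windows of the given width."""
    return [series[i:i + width] for i in range(len(series) - width + 1)]


def knn_distance(query, history, k):
    """Mean Euclidean distance from `query` to its k nearest past windows.

    A small score means the recent pattern has close precedents in the
    historical data; a large score marks it as a candidate for review.
    """
    distances = sorted(math.dist(query, window) for window in history)
    nearest = distances[:k]
    return sum(nearest) / len(nearest)


if __name__ == "__main__":
    past = [1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3]
    history = sliding_windows(past, width=3)
    print(knn_distance([1, 2, 3], history, k=3))  # pattern seen before
    print(knn_distance([1, 2, 9], history, k=3))  # unusual last value
```

Since the score is built only from the listed neighbors, an analyst can inspect exactly which historical windows supported a decision, which fits the transparency the abstract emphasizes.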

2024

Optimal gas subset selection for dissolved gas analysis in power transformers

Authors
Pinto, J; Esteves, V; Tavares, S; Sousa, R

Publication
Progress in Artificial Intelligence

Abstract
The power transformer is one of the key components of any electrical grid and, as such, modern-day industrial activity requires constant use of the asset. This increases the possibility of failures and can shorten the lifespan of a power transformer. Dissolved gas analysis (DGA) is a technique developed to quantify the hydrocarbon gases present in the power transformer oil, which in turn can indicate the presence of faults. Since this process requires a different chemical analysis for each type of gas, the overall cost of the operation increases with the number of gases. Accordingly, a machine learning methodology was defined to meet two simultaneous objectives: identifying gas subsets and predicting the remaining gases, thereby restoring them. Two subsets of equal or smaller size than those used by traditional methods (Duval's triangle, Roger's ratio, IEC table) were identified, while showing potentially superior performance. The models restored the discarded gases, and the restored set was compared with the original set in a variety of validation tasks.