Publications

Publications by Vítor Santos Costa

2021

Predictive Maintenance for Sensor Enhancement in Industry 4.0

Authors
Silva, C; da Silva, MF; Rodrigues, A; Silva, J; Costa, VS; Jorge, A; Dutra, I;

Publication
Recent Challenges in Intelligent Information and Database Systems - 13th Asian Conference, ACIIDS 2021, Phuket, Thailand, April 7-10, 2021, Proceedings

Abstract
This paper presents an effort to handle 400+ GBytes of sensor data in a timely manner in order to produce Predictive Maintenance (PdM) models. We follow a data-driven methodology, using state-of-the-art Python libraries, such as Dask and Modin, which can handle big data. We use Dynamic Time Warping to describe sensor behavior, an anomaly detection method (Matrix Profile) and forecasting methods (AutoRegressive Integrated Moving Average - ARIMA, Holt-Winters and Long Short-Term Memory - LSTM). The data was collected by various sensors in an industrial context and is composed of attributes that define their activity and characterize the environment where they are inserted, e.g. optical, temperature, pollution and working hours. We successfully managed to highlight aspects of all sensors' behaviors, and to produce forecast models for distinct series of sensors, despite the data dimension. © 2021, Springer Nature Singapore Pte Ltd.


2021

Biased resampling strategies for imbalanced spatio-temporal forecasting

Authors
Oliveira, M; Moniz, N; Torgo, L; Costa, VS;

Publication
INTERNATIONAL JOURNAL OF DATA SCIENCE AND ANALYTICS

Abstract
Extreme and rare events, such as spikes in air pollution or abnormal weather conditions, can have serious repercussions. Many of these sorts of events develop through spatio-temporal processes. Timely and accurate predictions are a most valuable tool in addressing their impact. We propose a new set of resampling strategies for imbalanced spatio-temporal forecasting tasks, which introduce bias into formerly random processes. This bias is a combination of a spatial and a temporal weight, which can be either static or relevance-aware, and includes a hyper-parameter that regulates the relative importance of the temporal and spatial dimensions in the selection of observations during under- or over-sampling. We test and compare our proposals against standard versions of the strategies on 10 different geo-referenced numeric time series, using 3 distinct off-the-shelf learning algorithms. Experimental results show that our proposals provide an advantage over random resampling strategies in imbalanced numerical spatio-temporal forecasting tasks.


2021

SicknessMiner: a deep-learning-driven text-mining tool to abridge disease-disease associations

Authors
Rosario Ferreira, N; Guimaraes, V; Costa, VS; Moreira, IS;

Publication
BMC BIOINFORMATICS

Abstract
Background: Blood cancers (BCs) are responsible for over 720 K yearly deaths worldwide. Their prevalence and mortality rate uphold the relevance of research related to BCs. Despite the availability of different resources establishing Disease-Disease Associations (DDAs), the knowledge is scattered and not accessible in a straightforward way to the scientific community. Here, we propose SicknessMiner, a biomedical Text-Mining (TM) approach towards the centralization of DDAs. Our methodology encompasses Named Entity Recognition (NER) and Named Entity Normalization (NEN) steps, and the DDAs retrieved were compared to the DisGeNET resource for qualitative and quantitative comparison. Results: We obtained the DDAs via co-mention using our SicknessMiner or gene- or variant-disease similarity on DisGeNET. SicknessMiner was able to retrieve around 92% of the DisGeNET results and nearly 15% of the SicknessMiner results were specific to our pipeline. Conclusions: SicknessMiner is a valuable tool to extract disease-disease relationships from a raw input corpus.


2021

The Treasury Chest of Text Mining: Piling Available Resources for Powerful Biomedical Text Mining

Authors
Rosário-Ferreira, N; Marques-Pereira, C; Pires, M; Ramalhão, D; Pereira, N; Guimarães, V; Santos Costa, V; Moreira, IS;

Publication
BioChem

Abstract
Text mining (TM) is a semi-automated, multi-step process able to turn unstructured data into structured data. The relevance of TM has increased with the application of machine learning (ML) and deep learning (DL) algorithms in its various steps. When applied to biomedical literature, text mining is named biomedical text mining, and its specificity lies in both the type of documents analyzed and the language and concepts retrieved. The array of documents that can be used ranges from scientific literature to patents or clinical data, and the biomedical concepts often include, but are not limited to, genes, proteins, drugs, and diseases. This review aims to gather the leading tools for biomedical TM, summarily describing and systematizing them. We also surveyed several resources to compile the most valuable ones for each category.


2021

Online Learning of Logic Based Neural Network Structures

Authors
Guimarães, V; Costa, VS;

Publication
Inductive Logic Programming - 30th International Conference, ILP 2021, Virtual Event, October 25-27, 2021, Proceedings

Abstract

2022

Online Learning of Logic Based Neural Network Structures

Authors
Guimaraes, V; Costa, VS;

Publication
INDUCTIVE LOGIC PROGRAMMING (ILP 2021)

Abstract
In this paper, we present two online structure learning algorithms for NeuralLog: NeuralLog+OSLR and NeuralLog+OMIL. NeuralLog is a system that compiles first-order logic programs into neural networks. Both learning algorithms are based on Online Structure Learner by Revision (OSLR). NeuralLog+OSLR is a port of OSLR that uses NeuralLog as the inference engine, while NeuralLog+OMIL uses the underlying mechanism from OSLR, but with a revision operator based on Meta-Interpretive Learning. We compared both systems with OSLR and RDN-Boost on link prediction in three different datasets: Cora, UMLS and UWCSE. Our experiments showed that NeuralLog+OMIL outperforms both compared systems on three of the four target relations from the Cora dataset and on the UMLS dataset, while both NeuralLog+OSLR and NeuralLog+OMIL outperform OSLR and RDN-Boost on the UWCSE dataset, assuming a good initial theory is provided.
