Publications by CRACS

2021

Pruning strategies for the efficient traversal of the search space in PILP environments

Authors
Corte Real, J; Dutra, I; Rocha, R;

Publication
KNOWLEDGE AND INFORMATION SYSTEMS

Abstract
Probabilistic inductive logic programming (PILP) is a statistical relational learning technique which extends inductive logic programming by considering probabilistic data. The ability to use probabilities to represent uncertainty comes at the cost of an exponential evaluation time when composing theories to model the given problem. For this reason, PILP systems rely on various pruning strategies in order to reduce the search space. However, to the best of the authors' knowledge, there has been no systematic analysis of the different pruning strategies, how they impact the search space and how they interact with one another. This work presents a unified representation for PILP pruning strategies which enables end-users to understand how these strategies work both individually and combined and to make an informed decision on which pruning strategies to select so as to best achieve their goals. The performance of pruning strategies is evaluated in terms of both time and quality in two state-of-the-art PILP systems with datasets from three different domains. Besides analysing the performance of the pruning strategies, we also illustrate the utility of PILP in one of the application domains, which is a real-world application.

2021

Evaluation Procedures for Forecasting with Spatiotemporal Data

Authors
Oliveira, M; Torgo, L; Costa, VS;

Publication
MATHEMATICS

Abstract
The increasing use of sensor networks has led to an ever larger number of available spatiotemporal datasets. Forecasting applications using this type of data are frequently motivated by important domains such as environmental monitoring. Being able to properly assess the performance of different forecasting approaches is fundamental to achieving progress. However, traditional performance estimation procedures, such as cross-validation, face challenges due to the implicit dependence between observations in spatiotemporal datasets. In this paper, we empirically compare several variants of cross-validation (CV) and out-of-sample (OOS) performance estimation procedures, using both artificially generated and real-world spatiotemporal datasets. Our results show that both CV and OOS report useful estimates, but they suggest that blocking data in space and/or in time may be useful in mitigating CV's bias to underestimate error. Overall, our study shows the importance of considering data dependencies when estimating the performance of spatiotemporal forecasting models.

2021

Predictive Maintenance for Sensor Enhancement in Industry 4.0

Authors
Silva, C; da Silva, MF; Rodrigues, A; Silva, J; Costa, VS; Jorge, A; Dutra, I;

Publication
Recent Challenges in Intelligent Information and Database Systems - 13th Asian Conference, ACIIDS 2021, Phuket, Thailand, April 7-10, 2021, Proceedings

Abstract
This paper presents an effort to handle 400+ GBytes of sensor data in a timely manner in order to produce Predictive Maintenance (PdM) models. We follow a data-driven methodology, using state-of-the-art Python libraries, such as Dask and Modin, which can handle big data. We use Dynamic Time Warping to describe sensor behavior, an anomaly detection method (Matrix Profile) and forecasting methods (AutoRegressive Integrated Moving Average - ARIMA, Holt-Winters and Long Short-Term Memory - LSTM). The data was collected by various sensors in an industrial context and is composed of attributes that define their activity and characterize the environment where they are inserted, e.g. optical, temperature, pollution and working hours. We successfully managed to highlight aspects of the behavior of all sensors, and to produce forecast models for distinct series of sensors, despite the data dimension. © 2021, Springer Nature Singapore Pte Ltd.

2021

Biased resampling strategies for imbalanced spatio-temporal forecasting

Authors
Oliveira, M; Moniz, N; Torgo, L; Costa, VS;

Publication
INTERNATIONAL JOURNAL OF DATA SCIENCE AND ANALYTICS

Abstract
Extreme and rare events, such as spikes in air pollution or abnormal weather conditions, can have serious repercussions. Many of these sorts of events develop through spatio-temporal processes. Timely and accurate predictions are a most valuable tool in addressing their impact. We propose a new set of resampling strategies for imbalanced spatio-temporal forecasting tasks, which introduce bias into formerly random processes. This bias is a combination of a spatial and a temporal weight, which can be either static or relevance-aware, and includes a hyper-parameter that regulates the relative importance of the temporal and spatial dimensions in the selection of observations during under- or over-sampling. We test and compare our proposals against standard versions of the strategies on 10 different geo-referenced numeric time series, using 3 distinct off-the-shelf learning algorithms. Experimental results show that our proposals provide an advantage over random resampling strategies in imbalanced numerical spatio-temporal forecasting tasks.

2021

SicknessMiner: a deep-learning-driven text-mining tool to abridge disease-disease associations

Authors
Rosario Ferreira, N; Guimaraes, V; Costa, VS; Moreira, IS;

Publication
BMC BIOINFORMATICS

Abstract
Background: Blood cancers (BCs) are responsible for over 720 K yearly deaths worldwide. Their prevalence and mortality rate uphold the relevance of research related to BCs. Despite the availability of different resources establishing Disease-Disease Associations (DDAs), the knowledge is scattered and not accessible in a straightforward way to the scientific community. Here, we propose SicknessMiner, a biomedical Text-Mining (TM) approach towards the centralization of DDAs. Our methodology encompasses Named Entity Recognition (NER) and Named Entity Normalization (NEN) steps, and the DDAs retrieved were compared to the DisGeNET resource for qualitative and quantitative comparison. Results: We obtained the DDAs via co-mention using our SicknessMiner or gene- or variant-disease similarity on DisGeNET. SicknessMiner was able to retrieve around 92% of the DisGeNET results and nearly 15% of the SicknessMiner results were specific to our pipeline. Conclusions: SicknessMiner is a valuable tool to extract disease-disease relationships from a raw input corpus.

2021

The Treasury Chest of Text Mining: Piling Available Resources for Powerful Biomedical Text Mining

Authors
Rosário-Ferreira, N; Marques-Pereira, C; Pires, M; Ramalhão, D; Pereira, N; Guimarães, V; Santos Costa, V; Moreira, IS;

Publication
BioChem

Abstract
Text mining (TM) is a semi-automatized, multi-step process, able to turn unstructured data into structured data. The relevance of TM has increased with the application of machine learning (ML) and deep learning (DL) algorithms in its various steps. When applied to biomedical literature, text mining is named biomedical text mining and its specificity lies in both the type of analyzed documents and the language and concepts retrieved. The array of documents that can be used ranges from scientific literature to patents or clinical data, and the biomedical concepts often include, but are not limited to, genes, proteins, drugs, and diseases. This review aims to gather the leading tools for biomedical TM, summarily describing and systematizing them. We also surveyed several resources to compile the most valuable ones for each category.
