Publications

Publications by LIAAD

2024

text2story: A Python Toolkit to Extract and Visualize Story Components of Narrative Text

Authors
Amorim, E; Campos, R; Jorge, AM; Mota, P; Almeida, R;

Publication
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation, LREC/COLING 2024, 20-25 May, 2024, Torino, Italy.

Abstract
Story components, namely events, time, participants, and their relations, are present in narrative texts from different domains such as journalism, medicine, finance, and law. The automatic extraction of narrative elements encompasses several NLP tasks such as Named Entity Recognition, Semantic Role Labeling, Event Extraction, and Temporal Inference. The text2story Python package, an easy-to-use modular library, supports the narrative extraction and visualization pipeline. The package contains an array of narrative extraction tools that can be used separately or in sequence. With this toolkit, end users can process free text in English or Portuguese and obtain formal representations, like standard annotation files or a formal logical representation. The toolkit also enables narrative visualization as Message Sequence Charts (MSC), Knowledge Graphs, and Bubble Diagrams, making it useful to visualize and transform human-annotated narratives. The package combines the use of off-the-shelf and custom tools and is easily patched (replacing existing components) and extended (e.g. with new visualizations). It includes an experimental module for narrative element effectiveness assessment and is therefore also a valuable asset for researchers developing solutions for narrative extraction. To evaluate the baseline components, we present some results of the main annotators embedded in our package for datasets in English and Portuguese. We also compare the results with the extraction of narrative elements by GPT-3, a robust LLM.


2024

Proceedings of Text2Story - Seventh Workshop on Narrative Extraction From Texts held in conjunction with the 46th European Conference on Information Retrieval (ECIR 2024), Glasgow, Scotland, UK, March 24, 2024

Authors
Campos, R; Jorge, AM; Jatowt, A; Bhatia, S; Litvak, M;

Publication
Text2Story@ECIR

Abstract

2024

Detecting and Explaining Anomalies in the Air Production Unit of a Train

Authors
Davari, N; Veloso, B; Ribeiro, RP; Gama, J;

Publication
39TH ANNUAL ACM SYMPOSIUM ON APPLIED COMPUTING, SAC 2024

Abstract
Predictive maintenance methods play a crucial role in the early detection of failures and errors in machinery, preventing them from reaching critical stages. This paper presents a comprehensive study on a real-world dataset called MetroPT3, with data from a Metro do Porto train's air production unit (APU) system. The dataset comprises data collected from various analogue and digital sensors installed on the APU system, enabling the analysis of behavioural changes and deviations from normal patterns. We propose a data-driven predictive maintenance framework based on a Long Short-Term Memory Autoencoder (LSTM-AE) network. The LSTM-AE efficiently identifies abnormal data instances, leading to a reduction in false alarm rates. We also implement a Sparse Autoencoder (SAE) approach for comparative analysis. The experimental results demonstrate that the LSTM-AE outperforms the SAE regarding F1 Score, Recall, and Precision. Furthermore, to gain insights into the reasons for anomaly detection, we apply the SHAP method to determine the importance of features in the predictive maintenance model. This approach enhances the interpretability of the model to better support the decision-making process.


2024

Super-Resolution Analysis for Landfill Waste Classification

Authors
Molina, M; Ribeiro, RP; Veloso, B; Carna, J;

Publication
ADVANCES IN INTELLIGENT DATA ANALYSIS XXII, PT I, IDA 2024

Abstract
Illegal landfills are a critical issue due to their environmental, economic, and public health impacts. This study leverages aerial imagery for environmental crime monitoring. While advances in artificial intelligence and computer vision hold promise, the challenge lies in training models with high-resolution literature datasets and adapting them to open-access low-resolution images. Considering the substantial quality differences and limited annotation, this research explores the adaptability of models across these domains. Motivated by the necessity for a comprehensive evaluation of waste detection algorithms, it advocates cross-domain classification and super-resolution enhancement to analyze the impact of different image resolutions on waste classification as an evaluation to combat the proliferation of illegal landfills. We observed performance improvements by enhancing image quality but noted an influence on model sensitivity, necessitating careful threshold fine-tuning.


2024

From fault detection to anomaly explanation: A case study on predictive maintenance

Authors
Gama, J; Ribeiro, RP; Mastelini, S; Davari, N; Veloso, B;

Publication
JOURNAL OF WEB SEMANTICS

Abstract
Predictive Maintenance applications are increasingly complex, with interactions between many components. Black-box models are popular approaches based on deep-learning techniques due to their predictive accuracy. This paper proposes a neural-symbolic architecture that uses an online rule-learning algorithm to explain when the black-box model predicts failures. The proposed system solves two problems in parallel: (i) anomaly detection and (ii) explanation of the anomaly. For the first problem, we use an unsupervised state-of-the-art autoencoder. For the second problem, we train a rule learning system that learns a mapping from the input features to the autoencoder's reconstruction error. Both systems run online and in parallel. The autoencoder signals an alarm for the examples with a reconstruction error that exceeds a threshold. The causes of the signal alarm are hard for humans to understand because they result from a non-linear combination of sensor data. The rule that triggers that example describes the relationship between the input features and the autoencoder's reconstruction error. The rule explains the failure signal by indicating which sensors contribute to the alarm and allowing the identification of the component involved in the failure. The system can present global explanations for the black-box model and local explanations for why the black-box model predicts a failure. We evaluate the proposed system in a real-world case study of Metro do Porto and provide explanations that illustrate its benefits.


2024

Community-Based Topic Modeling with Contextual Outlier Handling

Authors
Andrade, C; Ribeiro, RP; Gama, J;

Publication
ADVANCES IN ARTIFICIAL INTELLIGENCE, CAEPIA 2024

Abstract
E-commerce has become an essential aspect of modern life, providing consumers globally with convenience and accessibility. However, the high volume of short and noisy product descriptions in text streams of massive e-commerce platforms translates into an increased number of clusters, presenting challenges for standard model-based stream clustering algorithms. Standard LDA-based methods often lead to clusters dominated by single elements, effectively failing to manage datasets with varied cluster sizes. Our proposed Community-Based Topic Modeling with Contextual Outlier Handling (CB-TMCOH) algorithm introduces an approach to outlier detection in text data using transformer models for similarity calculations and graph-based clustering. This method efficiently separates outliers and improves clustering in large text datasets, demonstrating its utility not only in e-commerce applications but also proving effective for news and tweets datasets.
