Publications

Publications by LIAAD

2025

CapyMOA: Efficient Machine Learning for Data Streams in Python

Authors
Gomes, HM; Lee, A; Gunasekara, N; Sun, Y; Cassales, GW; Liu, J; Heyden, M; Cerqueira, V; Bahri, M; Koh, YS; Pfahringer, B; Bifet, A;

Publication
CoRR

Abstract

2025

A Multidimensional Approach to Ethical AI Auditing

Authors
Teixeira, S; Cortés, A; Thilakarathne, D; Gori, G; Minici, M; Bhuyan, M; Khairova, N; Adewumi, T; Bhuyan, D; O'Keefe, J; Comito, C; Gama, J; Dignum, V;

Publication
Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society

Abstract
The increasing integration of Artificial Intelligence (AI) across various sectors of society raises complex ethical challenges requiring systematic and scalable oversight mechanisms. While tools such as AIF360 and Aequitas address specific dimensions, namely fairness, there remains a lack of comprehensive frameworks capable of auditing multiple ethical principles simultaneously. This paper introduces a multidimensional AI auditing tool designed to evaluate systems across key dimensions: fairness, explainability, robustness, transparency, bias, sustainability, and legal compliance. Unlike existing tools, our framework enables simultaneous assessment of these dimensions, supporting more holistic and accountable AI deployment. We demonstrate the tool’s applicability through use cases and discuss its implications for building trust and aligning AI development with fundamental ethical standards.

2025

Strategic Alliances in NetLogo: A Flocking Algorithm with Reinforcement Learning

Authors
Teixeira, S; Campos, P;

Publication
Machine Learning Perspectives of Agent-Based Models

Abstract

2025

Automating Data Extraction from PDF Sleep Reports Using Data Mining Techniques

Authors
Teixeira, F; Costa, J; Amorim, P; Guimarães, N; Ferreira Santos, D;

Publication
Studies in Health Technology and Informatics

Abstract
This work introduces a web application for extracting, processing, and visualizing data from sleep study reports. Using Optical Character Recognition (OCR) and Natural Language Processing (NLP), the pipeline extracts over 75 key data points from four types of sleep reports. The web application offers an intuitive interface to view individual reports' details and aggregate data from multiple reports. The pipeline demonstrated 100% accuracy in extracting targeted information from a test set of 40 reports, even in cases with missing data or formatting inconsistencies. The developed tool streamlines the analysis of obstructive sleep apnea (OSA) reports, reducing the need for technical expertise and enabling healthcare providers and researchers to utilize sleep study data efficiently. Future work aims to expand the dataset for more complex analyses and imputation techniques.

2025

PolyNarrative: A Multilingual, Multilabel, Multi-domain Dataset for Narrative Extraction from News Articles

Authors
Nikolaidis, N; Stefanovitch, N; Silvano, P; Dimitrov, DI; Yangarber, R; Guimarães, N; Sartori, E; Androutsopoulos, I; Nakov, P; San Martino, GD; Piskorski, J;

Publication
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), ACL 2025, Vienna, Austria, July 27 - August 1, 2025

Abstract

2025

Using LLMs to Generate Patient Journeys in Portuguese: an Experiment

Authors
Munna, TA; Fernandes, AL; Silvano, P; Guimarães, N; Jorge, A;

Publication
Proceedings of Text2Story - Eighth Workshop on Narrative Extraction From Texts held in conjunction with the 47th European Conference on Information Retrieval (ECIR 2025), Lucca, Italy, April 10, 2025.

Abstract
The relationship of a patient with a hospital from admission to discharge is often kept in a series of textual documents that describe the patient’s journey. These documents are important for analyzing the different steps of the clinical process and for conducting aggregated studies of the paths of patients in the hospital. In this paper, we explore the potential of Large Language Models (LLMs) to generate realistic and comprehensive patient journeys in European Portuguese, addressing the scarcity of medical data in this specific context. We employed Google’s Gemini 1.5 Flash model and used a dataset of 285 case reports in European Portuguese, published on the website of the Portuguese Society of Internal Medicine (SPMI), as references for generating synthetic medical reports. Our methodology involves a sequential approach to generating a synthetic patient journey. Initially, we generate an admission report, followed by a discharge report. Subsequently, we generate a comprehensive patient journey that integrates the admission, multiple daily progress reports, and the discharge into a cohesive narrative. This end-to-end process ensures a realistic and detailed representation of the patient’s clinical pathway. The generated reports were rigorously evaluated by medical and linguistic professionals, as well as by automatic metrics that measure the inclusion of key medical entities, similarity to the case report, and use of the correct Portuguese variant. Both qualitative and quantitative evaluations confirmed that the generated synthetic reports are predominantly written in European Portuguese without the loss of important medical information from the case reports. This work contributes to developing high-quality synthetic medical data for training LLMs and advancing AI-driven healthcare applications in under-resourced language settings.
