Publications - INESC TEC

Publications by Nuno Ricardo Guimarães

2022

A WebApp for Reliability Detection in Social Media

Authors
David, F; Guimarães, N; Figueira, A;

Publication
CENTERIS 2022 - International Conference on ENTERprise Information Systems / ProjMAN - International Conference on Project MANagement / HCist - International Conference on Health and Social Care Information Systems and Technologies 2022, Hybrid Event / Lisbon, Portugal, November 9-11, 2022.

Abstract

2023

Exploring Climate Change Data with R

Authors
Guimarães, N; Vehkalahti, K; Campos, P; Engel, J;

Publication
Statistics for Empowerment and Social Engagement: Teaching Civic Statistics to Develop Informed Citizens

Abstract
Climate change is an existential threat facing humanity and the future of our planet. The signs of global warming are everywhere, and they are more complex than just the climbing temperatures. Climate data on a massive scale has been collected by various scientific groups around the globe. Exploring and extracting useful knowledge from large quantities of data requires powerful software. In this chapter we present some possibilities for exploring and visualising climate change data in connection with statistics education using the freely accessible statistical programming language R together with the computing environment RStudio. In addition to the visualisations, we provide annotated references to climate data repositories and extracts of our openly published R scripts for encouraging teachers and students to reproduce and enhance the visualisations. © Springer Nature Switzerland AG 2022.

2021

An organized review of key factors for fake news detection

Authors
Guimarães, N; Figueira, A; Torgo, L;

Publication
CoRR

Abstract

2024

Pre-trained language models: What do they know?

Authors
Guimarães, N; Campos, R; Jorge, A;

Publication
WILEY INTERDISCIPLINARY REVIEWS-DATA MINING AND KNOWLEDGE DISCOVERY

Abstract
Large language models (LLMs) have substantially pushed artificial intelligence (AI) research and applications in the last few years. They are currently able to achieve high effectiveness in different natural language processing (NLP) tasks, such as machine translation, named entity recognition, text classification, question answering, or text summarization. Recently, significant attention has been drawn to OpenAI's GPT models' capabilities and extremely accessible interface. LLMs are nowadays routinely used and studied for downstream tasks and specific applications with great success, pushing forward the state of the art in almost all of them. However, they also exhibit impressive inference capabilities when used off the shelf without further training. In this paper, we aim to study the behavior of pre-trained language models (PLMs) in some inference tasks they were not initially trained for. Therefore, we focus our attention on very recent research works related to the inference capabilities of PLMs in some selected tasks such as factual probing and common-sense reasoning. We highlight relevant achievements made by these models, as well as some of their current limitations that open opportunities for further research. This article is categorized under: Fundamental Concepts of Data and Knowledge > Key Design Issues in Data Mining; Technologies > Artificial Intelligence

2023

GPT Struct Me: Probing GPT Models on Narrative Entity Extraction

Authors
Sousa, H; Guimarães, N; Jorge, A; Campos, R;

Publication
2023 IEEE INTERNATIONAL CONFERENCE ON WEB INTELLIGENCE AND INTELLIGENT AGENT TECHNOLOGY, WI-IAT

Abstract
The importance of systems that can extract structured information from textual data becomes increasingly pronounced given the ever-increasing volume of text produced on a daily basis. Having a system that can effectively extract such information in an interoperable manner would be an asset for several domains, be it finance, health, or legal. Recent developments in natural language processing led to the production of powerful language models that can, to some degree, mimic human intelligence. Such effectiveness raises a pertinent question: Can these models be leveraged for the extraction of structured information? In this work, we address this question by evaluating the capabilities of two state-of-the-art language models - GPT-3 and GPT-3.5, commonly known as ChatGPT - in the extraction of narrative entities, namely events, participants, and temporal expressions. This study is conducted on the Text2Story Lusa dataset, a collection of 119 Portuguese news articles whose annotation framework includes a set of entity structures along with several tags and attribute values. We first select the best prompt template through an ablation study over prompt components that provide varying degrees of information on a subset of documents of the dataset. Subsequently, we use the best templates to evaluate the effectiveness of the models on the remaining documents. The results obtained indicate that GPT models are competitive with out-of-the-box baseline systems, presenting an all-in-one alternative for practitioners with limited resources. By studying the strengths and limitations of these models in the context of information extraction, we offer insights that can guide future improvements and avenues to explore in this field.

2024

Physio: An LLM-Based Physiotherapy Advisor

Authors
Almeida, R; Sousa, H; Cunha, LF; Guimarães, N; Campos, R; Jorge, A;

Publication
ADVANCES IN INFORMATION RETRIEVAL, ECIR 2024, PT V

Abstract
The capabilities of the most recent language models have increased the interest in integrating them into real-world applications. However, the fact that these models generate plausible, yet incorrect text poses a constraint when considering their use in several domains. Healthcare is a prime example of a domain where text-generative trustworthiness is a hard requirement to safeguard patient well-being. In this paper, we present Physio, a chat-based application for physical rehabilitation. Physio is capable of making an initial diagnosis while citing reliable health sources to support the information provided. Furthermore, drawing upon external knowledge databases, Physio can recommend rehabilitation exercises and over-the-counter medication for symptom relief. By combining these features, Physio can leverage the power of generative models for language processing while also conditioning its response on dependable and verifiable sources. A live demo of Physio is available at https://physio.inesctec.pt.
