Publications

Publications by Ricardo Campos

2023

GPT Struct Me: Probing GPT Models on Narrative Entity Extraction

Authors
Sousa, H; Guimaraes, N; Jorge, A; Campos, R;

Publication
2023 IEEE International Conference on Web Intelligence and Intelligent Agent Technology, WI-IAT

Abstract
The importance of systems that can extract structured information from textual data becomes increasingly pronounced given the ever-increasing volume of text produced every day. A system that can effectively extract such information in an interoperable manner would be an asset in several domains, such as finance, health, or law. Recent developments in natural language processing have led to powerful language models that can, to some degree, mimic human intelligence. Such effectiveness raises a pertinent question: can these models be leveraged to extract structured information? In this work, we address this question by evaluating the capabilities of two state-of-the-art language models (GPT-3 and GPT-3.5, the latter commonly known as ChatGPT) in the extraction of narrative entities, namely events, participants, and temporal expressions. This study is conducted on the Text2Story Lusa dataset, a collection of 119 Portuguese news articles whose annotation framework includes a set of entity structures along with several tags and attribute values. We first select the best prompt template through an ablation study, on a subset of the dataset's documents, over prompt components that provide varying degrees of information. Subsequently, we use the best templates to evaluate the effectiveness of the models on the remaining documents. The results indicate that GPT models are competitive with out-of-the-box baseline systems, presenting an all-in-one alternative for practitioners with limited resources. By studying the strengths and limitations of these models in the context of information extraction, we offer insights that can guide future improvements and avenues to explore in this field.

2024

Indexing Portuguese NLP Resources with PT-Pump-Up

Authors
Almeida, R; Campos, R; Jorge, A; Nunes, S;

Publication
CoRR

Abstract

2024

Physio: An LLM-Based Physiotherapy Advisor

Authors
Almeida, R; Sousa, H; Cunha, LF; Guimaraes, N; Campos, R; Jorge, A;

Publication
Advances in Information Retrieval, ECIR 2024, Part V

Abstract
The capabilities of the most recent language models have increased interest in integrating them into real-world applications. However, the fact that these models can generate plausible yet incorrect text constrains their use in several domains. Healthcare is a prime example of a domain where the trustworthiness of generated text is a hard requirement for safeguarding patient well-being. In this paper, we present Physio, a chat-based application for physical rehabilitation. Physio is capable of making an initial diagnosis while citing reliable health sources to support the information provided. Furthermore, drawing upon external knowledge databases, Physio can recommend rehabilitation exercises and over-the-counter medication for symptom relief. By combining these features, Physio can leverage the power of generative models for language processing while also conditioning its responses on dependable and verifiable sources. A live demo of Physio is available at https://physio.inesctec.pt.

2022

Diachronic Analysis of Time References in News Articles

Authors
Jatowt, A; Doucet, A; Campos, R;

Publication
Companion of The Web Conference 2022, Virtual Event / Lyon, France, April 25 - 29, 2022

Abstract
Time expressions embedded in text are important for many downstream tasks in NLP and IR. They have been utilized, for example, for timeline summarization, named entity recognition, temporal information retrieval, and question answering. In this paper, we introduce a novel approach to analyzing the characteristics of time expressions in diachronic text collections. Based on a collection of news articles published over a 33-year time span, we investigate several aspects of time expressions, with a focus on their interplay with the publication dates of the documents containing them. We use a graph-based representation of temporal expressions to represent them through their co-occurring named entities. The proposed approach yields several observations that could be utilized in automatic systems that rely on processing temporal signals embedded in text. It could also be of importance for professionals (e.g., historians) who wish to understand fluctuations in collective memories and collective expectations based on large-scale, diachronic document collections. © 2022 ACM.

2016

Report on the 1st International Workshop on Recent Trends in News Information Retrieval (NewsIR16)

Authors
Alvarez, MM; Kruschwitz, U; Kazai, G; Hopfgartner, F; Corney, DPA; Campos, R; Albakour, D;

Publication
SIGIR Forum

Abstract

2023

Report on the 1st Workshop on Implicit Author Characterization from Texts for Search and Retrieval (IACT 2023) at SIGIR 2023

Authors
Litvak, M; Rabaev, I; Campos, R; Jorge, AM; Jatowt, A;

Publication
SIGIR Forum

Abstract