Cookies
O website necessita de alguns cookies e outros recursos semelhantes para funcionar. Caso o permita, o INESC TEC irá utilizar cookies para recolher dados sobre as suas visitas, contribuindo, assim, para estatísticas agregadas que permitem melhorar o nosso serviço. Ver mais
Aceitar Rejeitar
  • Menu
Publicações

Publicações por Alípio Jorge

2023

Report on the 6th International Workshop on Narrative Extraction from Texts (Text2Story 2023) at ECIR 2023

Autores
Campos, R; Jorge, AM; Jatowt, A; Bhatia, S; Litvak, M; Cordeiro, JP; Rocha, C; Sousa, H; Mansouri, B;

Publicação
SIGIR Forum

Abstract

2023

GPT Struct Me: Probing GPT Models on Narrative Entity Extraction

Autores
Sousa, H; Guimaraes, N; Jorge, A; Campos, R;

Publicação
2023 IEEE INTERNATIONAL CONFERENCE ON WEB INTELLIGENCE AND INTELLIGENT AGENT TECHNOLOGY, WI-IAT

Abstract
The importance of systems that can extract structured information from textual data becomes increasingly pronounced given the ever-increasing volume of text produced on a daily basis. Having a system that can effectively extract such information in an interoperable manner would be an asset for several domains, be it finance, health, or legal. Recent developments in natural language processing led to the production of powerful language models that can, to some degree, mimic human intelligence. Such effectiveness raises a pertinent question: Can these models be leveraged for the extraction of structured information? In this work, we address this question by evaluating the capabilities of two state-of-the-art language models - GPT-3 and GPT-3.5, commonly known as ChatGPT - in the extraction of narrative entities, namely events, participants, and temporal expressions. This study is conducted on the Text2Story Lusa dataset, a collection of 119 Portuguese news articles whose annotation framework includes a set of entity structures along with several tags and attribute values. We first select the best prompt template through an ablation study over prompt components that provide varying degrees of information on a subset of documents of the dataset. Subsequently, we use the best templates to evaluate the effectiveness of the models on the remaining documents. The results obtained indicate that GPT models are competitive with out-of-the-box baseline systems, presenting an all-in-one alternative for practitioners with limited resources. By studying the strengths and limitations of these models in the context of information extraction, we offer insights that can guide future improvements and avenues to explore in this field.

2023

Proceedings of the 6th Workshop on Online Recommender Systems and User Modeling co-located with the 17th ACM Conference on Recommender Systems (RecSys 2023), Singapore, September 19th, 2023

Autores
Vinagre, J; Ghossein, MA; Peska, L; Jorge, AM; Bifet, A;

Publicação
ORSUM@RecSys

Abstract

2023

AfriSenti: A Twitter Sentiment Analysis Benchmark for African Languages

Autores
Muhammad, SH; Abdulmumin, I; Ayele, AA; Ousidhoum, N; Adelani, DI; Yimam, SM; Ahmad, IS; Beloucif, M; Mohammad, SM; Ruder, S; Hourrane, O; Jorge, A; Brazdil, P; António Ali, FDM; David, D; Osei, S; Bello, BS; Lawan, FI; Gwadabe, T; Rutunda, S; Belay, TD; Messelle, WB; Balcha, HB; Chala, SA; Gebremichael, HT; Opoku, B; Arthur, S;

Publicação
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, EMNLP 2023, Singapore, December 6-10, 2023

Abstract

2024

<i>Physio</i>: An LLM-Based Physiotherapy Advisor

Autores
Almeida, R; Sousa, H; Cunha, LF; Guimaraes, N; Campos, R; Jorge, A;

Publicação
ADVANCES IN INFORMATION RETRIEVAL, ECIR 2024, PT V

Abstract
The capabilities of the most recent language models have increased the interest in integrating them into real-world applications. However, the fact that these models generate plausible, yet incorrect text poses a constraint when considering their use in several domains. Healthcare is a prime example of a domain where text-generative trustworthiness is a hard requirement to safeguard patient well-being. In this paper, we present Physio, a chat-based application for physical rehabilitation. Physio is capable of making an initial diagnosis while citing reliable health sources to support the information provided. Furthermore, drawing upon external knowledge databases, Physio can recommend rehabilitation exercises and over-the-counter medication for symptom relief. By combining these features, Physio can leverage the power of generative models for language processing while also conditioning its response on dependable and verifiable sources. A live demo of Physio is available at https://physio.inesctec.pt.

2024

The 7th International Workshop on Narrative Extraction from Texts: Text2Story 2024

Autores
Campos, R; Jorge, A; Jatowt, A; Bhatia, S; Litvak, M;

Publicação
ADVANCES IN INFORMATION RETRIEVAL, ECIR 2024, PT V

Abstract
The Text2Story Workshop series, dedicated to Narrative Extraction from Texts, has been running successfully since 2018. Over the past six years, significant progress, largely propelled by Transformers and Large Language Models, has advanced our understanding of natural language text. Nevertheless, the representation, analysis, generation, and comprehensive identification of the different elements that compose a narrative structure remains a challenging objective. In its seventh edition, the workshop strives to consolidate a common platform and a multidisciplinary community for discussing and addressing various issues related to narrative extraction tasks. In particular, we aim to bring to the forefront the challenges involved in understanding narrative structures and integrating their representation into established frameworks, as well as in modern architectures (e.g., transformers) and AI-powered language models (e.g., chatGPT) which are now common and form the backbone of almost every IR and NLP application. Text2Story encompasses sessions covering full research papers, work-in-progress, demos, resources, position and dissemination papers, along with keynote talks. Moreover, there is dedicated space for informal discussions on methods, challenges, and the future of research in this dynamic field.

  • 39
  • 40