Publications by LIAAD

2023

The 1st International Workshop on Implicit Author Characterization from Texts for Search and Retrieval (IACT'23)

Authors
Litvak, M; Rabaev, I; Campos, R; Jorge, AM; Jatowt, A;

Publication
PROCEEDINGS OF THE 46TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, SIGIR 2023

Abstract
The first edition of the Implicit Author Characterization from Texts for Search and Retrieval workshop (IACT'23) aims to bring to the forefront the challenges involved in identifying and extracting implicit information about authors (e.g., whether they are human or AI) from texts, and in using that information in IR tasks. The IACT workshop provides a common forum to consolidate multidisciplinary efforts and foster discussions that identify the wide-ranging issues related to extracting implicit author-related information from textual content, including novel tasks and datasets. We will also discuss the ethical implications of implicit information extraction. In addition, we announce a shared task focused on automatically determining the literary epochs of written books.

2023

Clinical model for Hereditary Transthyretin Amyloidosis age of onset prediction

Authors
Pedroto, M; Coelho, T; Jorge, A; Mendes Moreira, J;

Publication
FRONTIERS IN NEUROLOGY

Abstract
Introduction: Hereditary transthyretin amyloidosis (ATTRv amyloidosis) is a rare hereditary neurological disease, clinically characterized as severe, progressive, and life-threatening; its age of onset is the moment when the first symptoms are felt. In this study, we present and discuss the development and evaluation of an approach that enables time-to-event prediction of the age of onset, with a focus on genealogical feature construction.
Materials and methods: This research was motivated by the need to answer a medical question: when will an asymptomatic ATTRv patient show symptoms of the disease? To do so, we defined and studied the impact of 77 features (ranging from demographic and genealogical features to familial disease history); we studied and compared a pool of prediction algorithms, namely linear regression (LR), elastic net (EN), lasso (LA), ridge (RI), support vector machines (SV), decision tree (DT), random forest (RF), and XGBoost (XG), in both a classification and a regression setting; we assembled a baseline (BL) that corresponds to current medical knowledge of the disease; we studied the problem of predicting the age of onset of ATTRv patients; we assessed the viability of predicting the age of onset over short-term horizons, framed as classification, on localized sets of patients (currently symptomatic patients and asymptomatic carriers, with and without genealogical information); and we compared the results on an out-of-bag evaluation set, assembled in a different time frame from the original data, in order to account for data leakage.
Results: We observe that our approach outperforms the BL model, which follows a set of clinical heuristics and represents current medical practice. Overall, our results show the superiority of SV and XG for both prediction tasks, although their performance is affected by data characteristics, namely missing values, complex data, and small available inputs.
Discussion: With this study, we defined a predictive modeling approach that medical professionals can readily understand, compared it with current practice (the baseline approach, BL), and showed the improvement it achieves over current medical knowledge.

2023

A Biomedical Entity Extraction Pipeline for Oncology Health Records in Portuguese

Authors
Sousa, H; Pasquali, A; Jorge, A; Santos, CS; Lopes, MA;

Publication
38TH ANNUAL ACM SYMPOSIUM ON APPLIED COMPUTING, SAC 2023

Abstract
Textual health records of cancer patients are usually lengthy and highly unstructured, making it very time-consuming for health professionals to get a complete overview of a patient's therapeutic course. As such limitations can lead to suboptimal and/or inefficient treatment procedures, healthcare providers would greatly benefit from a system that effectively summarizes the information in those records. With the advent of deep neural models, this objective has been partially attained for English clinical texts; however, the research community still lacks an effective solution for languages with limited resources. In this paper, we present the approach we developed to extract procedures, drugs, and diseases from oncology health records written in European Portuguese. This project was conducted in collaboration with the Portuguese Institute for Oncology which, besides holding over 10 years of duly protected medical records, also provided oncologist expertise throughout the development of the project. Since there is no annotated corpus for biomedical entity extraction in Portuguese, we also present the strategy we followed to annotate the corpus used to develop the models. The final models, which combine a neural architecture with entity linking, achieved F1 scores of 88.6%, 95.0%, and 55.8% in the mention extraction of procedures, drugs, and diseases, respectively.

2023

tieval: An Evaluation Framework for Temporal Information Extraction Systems

Authors
Sousa, H; Jorge, A; Campos, R;

Publication
PROCEEDINGS OF THE 46TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, SIGIR 2023

Abstract
Temporal information extraction (TIE) has attracted a great deal of interest over the last two decades. Such endeavors have led to the development of a significant number of datasets. Despite its benefits, the availability of many corpora makes it difficult to benchmark TIE systems. On the one hand, different datasets have different annotation schemes, which hinders the comparison of competing systems across corpora. On the other hand, because each corpus is distributed in a different format, a researcher or practitioner must invest considerable engineering effort to develop parsers for all of them. These constraints force researchers to select a limited number of datasets to evaluate their systems, which in turn limits the comparability of those systems. Yet another obstacle to the comparability of TIE systems is the evaluation metric employed. While most research works adopt traditional metrics such as precision, recall, and F1, a few others prefer temporal awareness, a metric designed to be more comprehensive in the evaluation of temporal systems. Although the reason temporal awareness is absent from the evaluation of most systems is not clear, one factor that certainly weighs on this decision is the need to implement the temporal closure algorithm, which is neither straightforward to implement nor easily available. All in all, these problems have limited the fair comparison of approaches and, consequently, the development of TIE systems. To mitigate these problems, we have developed tieval, a Python library that provides a concise interface for importing different corpora and is equipped with domain-specific operations that facilitate system evaluation. In this paper, we present the first public release of tieval and highlight its most relevant features. The library is available as open source, under the MIT License, on PyPI and GitHub.
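To illustrate why closure matters for evaluation, here is a deliberately simplified sketch. It assumes that only a single transitive BEFORE relation between events exists; real temporal closure over TimeML-style relations reasons over many relation types and is considerably more involved. The functions `before_closure` and `closure_precision` are hypothetical helpers written for this example. They are not part of tieval's API, and `closure_precision` is a simplified check rather than the full temporal awareness metric.

```python
from itertools import product


def before_closure(pairs: set[tuple[str, str]]) -> set[tuple[str, str]]:
    """Transitive closure of a BEFORE relation: (a, b) and (b, c) imply (a, c).

    Simplified sketch: handles a single transitive relation type only.
    """
    closure = set(pairs)
    changed = True
    while changed:  # repeat until no new pair can be inferred (fixed point)
        changed = False
        # Iterate over a snapshot so new pairs can be added safely.
        for (a, b), (c, d) in product(list(closure), repeat=2):
            if b == c and (a, d) not in closure:
                closure.add((a, d))
                changed = True
    return closure


def closure_precision(system: set[tuple[str, str]], gold: set[tuple[str, str]]) -> float:
    """Fraction of system relations that are entailed by the closed gold graph."""
    if not system:
        return 0.0
    closed_gold = before_closure(gold)
    return len(system & closed_gold) / len(system)


if __name__ == "__main__":
    gold = {("e1", "e2"), ("e2", "e3")}  # e1 < e2 < e3
    system = {("e1", "e3")}               # correct, but not annotated explicitly
    print(len(system & gold) / len(system))  # naive exact matching: 0.0
    print(closure_precision(system, gold))   # matching against the closure: 1.0
```

The usage lines show the point made in the abstract: a system that predicts a relation only implied by the gold annotation is penalized by naive matching but credited once the gold graph is closed.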

2023

A survey on narrative extraction from textual data

Authors
Santana, B; Campos, R; Amorim, E; Jorge, A; Silvano, P; Nunes, S;

Publication
ARTIFICIAL INTELLIGENCE REVIEW

Abstract
Narratives are present in many forms of human expression and can be understood as a fundamental way of communication between people. Computational understanding of the underlying story of a narrative, however, may be a rather complex task for both linguists and computational linguists. Such a task can be approached using natural language processing techniques to automatically extract narratives from texts. In this paper, we present an in-depth survey of narrative extraction from text, establishing a basis/framework and a roadmap for the study of this area as a whole, as a means to consolidate a view of this line of research. We aim to fill the current gap by identifying important research efforts at the crossroads between linguists and computer scientists. In particular, we highlight the importance and complexity of the annotation process as a crucial step for the training stage. Next, we detail methods and approaches for identifying and extracting narrative components, linking them, and understanding their likely inherent relationships, before detailing formal narrative representation structures as an intermediate step for visualization and data exploration purposes. We then move on to aspects of the narrative evaluation task, and conclude this survey by highlighting important open issues in narrative extraction from texts that are yet to be explored.

2023

Combining Symbolic and Deep Learning Approaches for Sentiment Analysis

Authors
Muhammad, SH; Brazdil, P; Jorge, A;

Publication
Compendium of Neurosymbolic Artificial Intelligence

Abstract
Deep learning approaches have become popular in sentiment analysis because of their competitive performance. The downside of these approaches is that they do not provide understandable explanations of how sentiment values are calculated. Earlier approaches that used sentiment lexicons for sentiment analysis can provide such explanations, but their performance is lower than that of deep learning approaches. It is therefore natural to ask whether the two approaches can be combined to exploit the advantages of both. In this chapter, we present a neuro-symbolic approach that combines symbolic and deep learning approaches for sentiment analysis tasks. The symbolic approach exploits a sentiment lexicon and shifter patterns, which cover the operations of inversion/reversal, intensification, and attenuation/downtoning. The deep learning approach uses a pre-trained language model (PLM) to construct the sentiment lexicon. Our experimental results show that the proposed approach leads to promising results, substantially better than those of a pure lexicon-based approach. Although the results did not reach the level of the deep learning approach, a great advantage is that sentiment predictions can be accompanied by understandable explanations. For some users, it is very important to see how sentiment is derived, even if performance is a little lower. © 2023 The authors and IOS Press. All rights reserved.
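As a rough illustration of how shifter patterns act on lexicon scores, the toy scorer below applies each pending shifter to the next sentiment-bearing word: inversion flips the sign, intensification scales it up, and attenuation scales it down. This is a minimal sketch under stated assumptions: the lexicon entries, shifter words, and multipliers are invented for this example (the chapter builds its lexicon with a PLM), tokenization is plain whitespace splitting, and the function `score` is hypothetical rather than the chapter's method. Returning a per-word trace shows how a lexicon-based prediction can carry its own explanation.

```python
# Hypothetical toy lexicon (word -> prior polarity); not the chapter's lexicon.
LEXICON = {"good": 1.0, "great": 2.0, "bad": -1.0, "terrible": -2.0}

# Hypothetical shifters (word -> multiplier applied to the next sentiment word).
SHIFTERS = {
    "not": -1.0,      # inversion/reversal: flips polarity
    "very": 1.5,      # intensification: amplifies polarity
    "slightly": 0.5,  # attenuation/downtoning: weakens polarity
}


def score(text: str) -> tuple[float, list[str]]:
    """Score a sentence and return a trace explaining each contribution."""
    total = 0.0
    explanation: list[str] = []
    pending: list[str] = []  # shifters seen since the last sentiment word
    for raw in text.lower().split():
        token = raw.strip(".,;:!?")  # drop surrounding punctuation
        if token in SHIFTERS:
            pending.append(token)
        elif token in LEXICON:
            value = LEXICON[token]
            for shifter in pending:
                value *= SHIFTERS[shifter]
            explanation.append(f"{' '.join(pending + [token])}: {value:+.2f}")
            total += value
            pending = []  # shifters apply only to the next sentiment word
    return total, explanation


if __name__ == "__main__":
    print(score("The food was not very good."))
    # → (-1.5, ['not very good: -1.50'])
```

Keeping the shifters as multipliers makes each operation easy to read off the trace; a realistic system would also need a scope rule for how far a shifter reaches and a far larger, learned lexicon.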
