Publications

Publications by LIAAD

2022

Predicting Argument Density from Multiple Annotations

Authors
Rocha, G; Leite, B; Trigo, L; Cardoso, HL; Sousa-Silva, R; Carvalho, P; Martins, B; Won, M;

Publication
NATURAL LANGUAGE PROCESSING AND INFORMATION SYSTEMS (NLDB 2022)

Abstract
Annotating a corpus with argument structures is a complex task, and it is even more challenging when addressing text genres where argumentative discourse markers do not abound. We explore a corpus of opinion articles annotated by multiple annotators, providing diverse perspectives of the argumentative content therein. New annotation aggregation methods are explored, diverging from the traditional ones that try to minimize presumed errors from annotator disagreement. The impact of our methods is assessed for the task of argument density prediction, seen as an initial step in the argument mining pipeline. We evaluate and compare models trained for this regression task in different generated datasets, considering their prediction error and also from a ranking perspective. Results confirm the expectation that addressing argument density from a ranking perspective is more promising than looking at the problem as a mere regression task. We also show that probabilistic aggregation, which weighs tokens by considering all annotators, is a more interesting approach, achieving encouraging results as it accommodates different annotator perspectives. The code and models are publicly available at https://github.com/DARGMINTS/argument density.
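As an illustration only, the idea of weighing tokens by considering all annotators can be sketched as follows. This is one plausible reading of the abstract, not the authors' implementation: the function names, the binary token labels, and the definition of density as the mean token weight are all assumptions made for this example.

```python
def token_weights(annotations):
    """Weight each token by the fraction of annotators who marked it.

    `annotations` holds one list of 0/1 token labels per annotator
    (hypothetical input format; 1 = token belongs to an argument).
    """
    n_annotators = len(annotations)
    n_tokens = len(annotations[0])
    if any(len(labels) != n_tokens for labels in annotations):
        raise ValueError("all annotators must label the same tokens")
    return [
        sum(labels[t] for labels in annotations) / n_annotators
        for t in range(n_tokens)
    ]


def argument_density(weights):
    """Assumed definition: mean token weight over the text span."""
    return sum(weights) / len(weights)


# Two annotators, four tokens: they agree only on the first token.
weights = token_weights([[1, 1, 0, 0], [1, 0, 0, 0]])
density = argument_density(weights)
```

Unlike majority voting, this keeps partial agreement (a token marked by half of the annotators contributes 0.5) instead of discarding it as annotator error.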

2022

Annotating Arguments in a Corpus of Opinion Articles

Authors
Rocha, G; Trigo, L; Cardoso, HL; Sousa-Silva, R; Carvalho, P; Martins, B; Won, M;

Publication
LREC 2022: THIRTEENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION

Abstract
Interest in argument mining has resulted in an increasing number of argument annotated corpora. However, most focus on English texts with explicit argumentative discourse markers, such as persuasive essays or legal documents. Conversely, we report on the first extensive and consolidated Portuguese argument annotation project focused on opinion articles. We briefly describe the annotation guidelines based on a multi-layered process and analyze the manual annotations produced, highlighting the main challenges of this textual genre. We then conduct a comprehensive inter-annotator agreement analysis, including argumentative discourse units, their classes and relations, and resulting graphs. This analysis reveals that each of these aspects tackles very different kinds of challenges. We observe differences in annotator profiles, motivating our aim of producing a non-aggregated corpus containing the insights of every annotator. We note that the interpretation and identification of token-level arguments is challenging; nevertheless, tasks that focus on higher-level components of the argument structure can obtain considerable agreement. We lay down perspectives on corpus usage, exploiting its multi-faceted nature.

2022

Visualização da relevância relativa de investigadores a partir da sua produção textual

Authors
Trigo, L; Brazdil, P;

Publication
Linguística: Revista de Estudos Linguísticos da Universidade do Porto

Abstract
Building a researchers' affinity network through the automatic processing of their publications allows us to gain a perspective that goes beyond the networks established through co-authorship. The importance of each researcher is defined by their bibliographic production volume, i.e., number of publications, and by their centrality in the general network of researchers. The centrality of a researcher in a network reveals their importance in communication flows with other researchers, on the assumption that communication between researchers is itself a relevant factor in organizational life and production. Both network and centrality concepts are best interpreted graphically. In this study, we explore the workflow that provides these visualizations and focus on the empirical selection of the most appropriate centrality measure. We also propose a centrality visualization method that facilitates the interpretation of the selected measures.

2022

The impact of heterogeneous distance functions on missing data imputation and classification performance

Authors
Santos, MS; Abreu, PH; Fernandez, A; Luengo, J; Santos, J;

Publication
ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE

Abstract
This work performs an in-depth study of the impact of distance functions on K-Nearest Neighbours imputation of heterogeneous datasets. Missing data is generated at several percentages, on a large benchmark of 150 datasets (50 continuous, 50 categorical and 50 heterogeneous datasets) and data imputation is performed using different distance functions (HEOM, HEOM-R, HVDM, HVDM-R, HVDM-S, MDE and SIMDIST) and k values (1, 3, 5 and 7). The impact of distance functions on kNN imputation is then evaluated in terms of classification performance, through the analysis of a classifier learned from the imputed data, and in terms of imputation quality, where the quality of the reconstruction of the original values is assessed. By analysing the properties of heterogeneous distance functions over continuous and categorical datasets individually, we then study their behaviour over heterogeneous data. We discuss whether datasets with different natures may benefit from different distance functions and to what extent the component of a distance function that deals with missing values influences such choice. Our experiments show that missing data has a significant impact on distance computation and the obtained results provide guidelines on how to choose appropriate distance functions depending on data characteristics (continuous, categorical or heterogeneous datasets) and the objective of the study (classification or imputation tasks).
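For readers unfamiliar with the setup, here is a minimal sketch of kNN imputation with HEOM (Heterogeneous Euclidean-Overlap Metric), one of the distance functions named in the abstract. It is not the authors' code: the function names, the use of None for missing values, and the mean/mode aggregation of neighbours are assumptions made for this example.

```python
import math
from collections import Counter


def heom(x, y, is_categorical, ranges):
    """HEOM distance between two records; None marks a missing value."""
    total = 0.0
    for a, (xa, ya) in enumerate(zip(x, y)):
        if xa is None or ya is None:
            d = 1.0  # a missing value yields the maximal per-attribute distance
        elif is_categorical[a]:
            d = 0.0 if xa == ya else 1.0  # overlap metric
        else:
            # range-normalised absolute difference for continuous attributes
            d = abs(xa - ya) / ranges[a] if ranges[a] else 0.0
        total += d * d
    return math.sqrt(total)


def knn_impute(data, row, col, is_categorical, ranges, k=3):
    """Impute data[row][col] from the k nearest rows where col is observed.

    Assumes at least one other row has an observed value in `col`.
    """
    donors = [r for i, r in enumerate(data) if i != row and r[col] is not None]
    donors.sort(key=lambda r: heom(data[row], r, is_categorical, ranges))
    values = [r[col] for r in donors[:k]]
    if is_categorical[col]:
        return Counter(values).most_common(1)[0][0]  # mode of the neighbours
    return sum(values) / len(values)  # mean of the neighbours


data = [[1.0, "a"], [2.0, "a"], [10.0, "b"], [None, "a"]]
is_categorical = [False, True]
ranges = [9.0, None]  # max - min per continuous attribute
imputed = knn_impute(data, row=3, col=0, is_categorical=is_categorical,
                     ranges=ranges, k=2)
```

The branch that handles missing values is the component whose influence on the choice of distance function the abstract discusses; here it simply assigns the maximal distance of 1.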

2022

The identification of cancer lesions in mammography images with missing pixels: analysis of morphology

Authors
Santos, JC; Abreu, PH; Santos, MS;

Publication
2022 IEEE 9TH INTERNATIONAL CONFERENCE ON DATA SCIENCE AND ADVANCED ANALYTICS (DSAA)

Abstract
The quality of mammography images is essential for the diagnosis of breast cancer, and image imputation has become a popular technique to overcome noise, artifacts, and missing data to aid in the diagnosis of diseases. In this paper, we assess the performance of six imputation methodologies for the reconstruction of missing pixels in different morphologies in mammography images. The images included in this study are collected from four public datasets (CBIS-DDSM, Mini-MIAS, INbreast, and CSAW) and the imputation results are evaluated through the mean absolute error (MAE) and structural similarity index measure (SSIM). This study goes beyond the traditional evaluation of imputation algorithms, analyzing imputation quality, morphology preservation and classification performance. The effects of imputation on the morphology of cancer lesions are of utmost importance since they lay the foundation for physicians to interpret and analyze the imputation results. The results show that DIP is the most promising methodology for higher missing pixel rates, morphology preservation, and classifying malignant and benign images.
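As a small illustration of the first evaluation metric mentioned above, the mean absolute error between an original and an imputed image can be sketched as follows. The function name, the nested-list image representation, and the optional mask restricting the error to pixels that were missing are assumptions for this example; the abstract does not state how the error was aggregated.

```python
def mean_absolute_error(original, imputed, mask=None):
    """MAE between two equally sized greyscale images (lists of rows).

    If `mask` is given, only pixels where mask[i][j] is truthy are
    compared (e.g. the pixels that were missing before imputation).
    """
    total, count = 0.0, 0
    for i, row in enumerate(original):
        for j, value in enumerate(row):
            if mask is None or mask[i][j]:
                total += abs(value - imputed[i][j])
                count += 1
    if count == 0:
        raise ValueError("no pixels selected for comparison")
    return total / count


original = [[0, 10], [20, 30]]
reconstructed = [[0, 12], [20, 26]]
missing = [[False, True], [False, True]]

mae_all = mean_absolute_error(original, reconstructed)
mae_missing = mean_absolute_error(original, reconstructed, missing)
```

Restricting the error to the masked pixels avoids diluting it with pixels that were never missing and are therefore reproduced exactly.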

2022

Brown-Sequard syndrome in a patient with spondyloarthritis after COVID-19 vaccine: a challenging differential diagnosis

Authors
Costa, R; Soares, C; Vaz, C; Bernardes, M; Tavares, M; Abreu, P;

Publication
ARP RHEUMATOLOGY

Abstract
