2022
Authors
Rocha, G; Leite, B; Trigo, L; Cardoso, HL; Sousa-Silva, R; Carvalho, P; Martins, B; Won, M;
Publication
NATURAL LANGUAGE PROCESSING AND INFORMATION SYSTEMS (NLDB 2022)
Abstract
Annotating a corpus with argument structures is a complex task, and it is even more challenging when addressing text genres where argumentative discourse markers do not abound. We explore a corpus of opinion articles annotated by multiple annotators, providing diverse perspectives of the argumentative content therein. New annotation aggregation methods are explored, diverging from the traditional ones that try to minimize presumed errors from annotator disagreement. The impact of our methods is assessed for the task of argument density prediction, seen as an initial step in the argument mining pipeline. We evaluate and compare models trained for this regression task in different generated datasets, considering their prediction error and also from a ranking perspective. Results confirm the expectation that addressing argument density from a ranking perspective is more promising than looking at the problem as a mere regression task. We also show that probabilistic aggregation, which weighs tokens by considering all annotators, is a more interesting approach, achieving encouraging results as it accommodates different annotator perspectives. The code and models are publicly available at https://github.com/DARGMINTS/argument density.
2022
Authors
Rocha, G; Trigo, L; Cardoso, HL; Sousa-Silva, R; Carvalho, P; Martins, B; Won, M;
Publication
LREC 2022: THIRTEEN INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION
Abstract
Interest in argument mining has resulted in an increasing number of argument annotated corpora. However, most focus on English texts with explicit argumentative discourse markers, such as persuasive essays or legal documents. Conversely, we report on the first extensive and consolidated Portuguese argument annotation project focused on opinion articles. We briefly describe the annotation guidelines based on a multi-layered process and analyze the manual annotations produced, highlighting the main challenges of this textual genre. We then conduct a comprehensive inter-annotator agreement analysis, including argumentative discourse units, their classes and relations, and resulting graphs. This analysis reveals that each of these aspects tackles very different kinds of challenges. We observe differences in annotator profiles, motivating our aim of producing a non-aggregated corpus containing the insights of every annotator. We note that the interpretation and identification of token-level arguments is challenging; nevertheless, tasks that focus on higher-level components of the argument structure can obtain considerable agreement. We lay down perspectives on corpus usage, exploiting its multi-faceted nature.
2023
Authors
Correia, A; Guimaraes, D; Paredes, H; Fonseca, B; Paulino, D; Trigo, L; Brazdil, P; Schneider, D; Grover, A; Jameel, S;
Publication
IEEE TRANSACTIONS ON HUMAN-MACHINE SYSTEMS
Abstract
Visualizing and examining the intellectual landscape and evolution of scientific communities to support collaboration is crucial for multiple research purposes. In some cases, measuring similarities and matching patterns between research publication document sets can help to identify people with similar interests for building research collaboration networks and university-industry linkages. The premise of this work is assessing feasibility for resolving ambiguous cases in similarity detection to determine authorship with natural language processing (NLP) techniques so that crowdsourcing is applied only in instances that require human judgment. Using an NLP-crowdsourcing convergence strategy, we can reduce the costs of microtask crowdsourcing while saving time and maintaining disambiguation accuracy over large datasets. This article contributes a next-gen crowd-artificial intelligence framework that used an ensemble of term frequency-inverse document frequency and bidirectional encoder representation from transformers to obtain similarity rankings for pairs of scientific documents. A sequence of content-based similarity tasks was created using a crowd-powered interface for solving disambiguation problems. Our experimental results suggest that an adaptive NLP-crowdsourcing hybrid framework has advantages for inter-researcher similarity detection tasks where fully automatic algorithms provide unsatisfactory results, with the goal of helping researchers discover potential collaborators using data-driven approaches.
2023
Authors
Silva, CRSe; Pimentel Trigo, LM;
Publication
Annual International Conference of the Alliance of Digital Humanities Organizations, DH 2022, Graz, Austria, July 10-14, 2023, Conference Abstracts
Abstract
2015
Authors
Brazdil, P; Trigo, L; Cordeiro, J; Sarmento, R; Valizadeh, M;
Publication
Oslo Studies in Language
Abstract
2018
Authors
Sarmento, R; Trigo, L; Fonseca, L;
Publication
Intelligent Systems
Abstract
The access to the final selection minute is only available to applicants.
Please check the confirmation e-mail of your application to obtain the access code.