Publications by Luís Pimentel Trigo

2022

Predicting Argument Density from Multiple Annotations

Authors
Rocha, G; Leite, B; Trigo, L; Cardoso, HL; Sousa-Silva, R; Carvalho, P; Martins, B; Won, M;

Publication
NATURAL LANGUAGE PROCESSING AND INFORMATION SYSTEMS (NLDB 2022)

Abstract
Annotating a corpus with argument structures is a complex task, and it is even more challenging when addressing text genres where argumentative discourse markers do not abound. We explore a corpus of opinion articles annotated by multiple annotators, providing diverse perspectives of the argumentative content therein. New annotation aggregation methods are explored, diverging from the traditional ones that try to minimize presumed errors from annotator disagreement. The impact of our methods is assessed for the task of argument density prediction, seen as an initial step in the argument mining pipeline. We evaluate and compare models trained for this regression task on different generated datasets, considering both their prediction error and a ranking perspective. Results confirm the expectation that addressing argument density from a ranking perspective is more promising than looking at the problem as a mere regression task. We also show that probabilistic aggregation, which weighs tokens by considering all annotators, is a more interesting approach, achieving encouraging results as it accommodates different annotator perspectives. The code and models are publicly available at https://github.com/DARGMINTS/argument density.

2022

Annotating Arguments in a Corpus of Opinion Articles

Authors
Rocha, G; Trigo, L; Cardoso, HL; Sousa-Silva, R; Carvalho, P; Martins, B; Won, M;

Publication
LREC 2022: THIRTEENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION

Abstract
Interest in argument mining has resulted in an increasing number of argument-annotated corpora. However, most focus on English texts with explicit argumentative discourse markers, such as persuasive essays or legal documents. Conversely, we report on the first extensive and consolidated Portuguese argument annotation project focused on opinion articles. We briefly describe the annotation guidelines based on a multi-layered process and analyze the manual annotations produced, highlighting the main challenges of this textual genre. We then conduct a comprehensive inter-annotator agreement analysis, including argumentative discourse units, their classes and relations, and resulting graphs. This analysis reveals that each of these aspects poses very different kinds of challenges. We observe differences in annotator profiles, motivating our aim of producing a non-aggregated corpus containing the insights of every annotator. We note that the interpretation and identification of token-level arguments is challenging; nevertheless, tasks that focus on higher-level components of the argument structure can obtain considerable agreement. We outline perspectives on corpus usage, exploiting its multi-faceted nature.

2023

NLP-Crowdsourcing Hybrid Framework for Inter-Researcher Similarity Detection

Authors
Correia, A; Guimaraes, D; Paredes, H; Fonseca, B; Paulino, D; Trigo, L; Brazdil, P; Schneider, D; Grover, A; Jameel, S;

Publication
IEEE TRANSACTIONS ON HUMAN-MACHINE SYSTEMS

Abstract
Visualizing and examining the intellectual landscape and evolution of scientific communities to support collaboration is crucial for multiple research purposes. In some cases, measuring similarities and matching patterns between sets of research publications can help identify people with similar interests, supporting the construction of research collaboration networks and university-industry linkages. The premise of this work is to assess the feasibility of resolving ambiguous cases in similarity detection to determine authorship with natural language processing (NLP) techniques, so that crowdsourcing is applied only to instances that require human judgment. Using an NLP-crowdsourcing convergence strategy, we can reduce the costs of microtask crowdsourcing while saving time and maintaining disambiguation accuracy over large datasets. This article contributes a next-generation crowd-artificial intelligence framework that uses an ensemble of term frequency-inverse document frequency (TF-IDF) and bidirectional encoder representations from transformers (BERT) to obtain similarity rankings for pairs of scientific documents. A sequence of content-based similarity tasks was created using a crowd-powered interface for solving disambiguation problems. Our experimental results suggest that an adaptive NLP-crowdsourcing hybrid framework has advantages for inter-researcher similarity detection tasks where fully automatic algorithms provide unsatisfactory results, with the goal of helping researchers discover potential collaborators using data-driven approaches.

2023

CreoPhonPt: a collaborative database saving Portuguese creoles from digital obliteration

Authors
Silva, CRSe; Pimentel Trigo, LM;

Publication
Annual International Conference of the Alliance of Digital Humanities Organizations, DH 2023, Graz, Austria, July 10-14, 2023, Conference Abstracts

Abstract

2015

Affinity Mining of Documents Sets via Network Analysis, Keywords and Summaries

Authors
Brazdil, P; Trigo, L; Cordeiro, J; Sarmento, R; Valizadeh, M;

Publication
Oslo Studies in Language

Abstract
Finding people with similar interests within a domain can provide important support in the management of research centres. As academic output is easily obtained from bibliographic and academic databases, these can be used to discover affinities between researchers that are not already evident from co-authorship. This discovery process relies on text analysis techniques, based on the terms used in the respective documents. Affinity can be represented as a network, in which the nodes represent each researcher's articles and the links represent similarity between the different researchers. Each node can be characterized by various network centrality measures, and community detection algorithms make it possible to identify groups with similar interests. Each node is further characterized by a set of keywords and summaries discovered automatically with the help of advanced techniques. This article provides more details on the methods adopted and/or developed, some of which have been implemented in our prototype. The methods described are general and applicable to many different domains, including documents describing R&D projects, documents associated with legislation, court cases, or medical procedures. We therefore believe that this work may be useful to a relatively broad audience.

2018

A Comprehensive Workflow for Enhancing Business Bankruptcy Prediction

Authors
Sarmento, R; Trigo, L; Fonseca, L;

Publication
Intelligent Systems

Abstract
Forecasting enterprise bankruptcy is a critical area for Business Intelligence. It is a major concern for investors and credit institutions in risk analysis. It may also enable the sustainability assessment of critical suppliers and clients, as well as of competitors and the business environment. Data Mining may deliver faster and more precise insight into this issue. Widespread software tools offer a broad spectrum of Artificial Intelligence algorithms, and the most difficult task may be deciding which algorithm to select. Searching the relatively large body of literature in this area for an answer, with its many options, advantages, and pitfalls, may be as distracting as it is informative. In this chapter, the authors present an empirical study with a comprehensive Knowledge Discovery and Data Mining (KDD) workflow. The proposed automated classifier selection chooses an algorithm with better prediction performance than the algorithms most widely documented in the literature.