Publications

Publications by Ricardo Campos

2024

ACE-2005-PT: Corpus for Event Extraction in Portuguese

Authors
Cunha, LF; Silvano, P; Campos, R; Jorge, A;

Publication
Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2024

Abstract
Event extraction is an NLP task that commonly involves identifying the central word (trigger) for an event and its associated arguments in text. ACE-2005 is widely recognised as the standard corpus in this field. While other corpora, like PropBank, primarily focus on annotating predicate-argument structure, ACE-2005 provides comprehensive information about the overall event structure and semantics. However, its limited language coverage restricts its usability. This paper introduces ACE-2005-PT, a corpus created by translating ACE-2005 into Portuguese, with European and Brazilian variants. To speed up the process of obtaining ACE-2005-PT, we rely on automatic translators. This, however, poses some challenges related to automatically identifying the correct alignments between multi-word annotations in the original text and in the corresponding translated sentence. To achieve this, we developed an alignment pipeline that incorporates several alignment techniques: lemmatization, fuzzy matching, synonym matching, multiple translations and a BERT-based word aligner. To measure the alignment effectiveness, a subset of annotations from the ACE-2005-PT corpus was manually aligned by a linguist expert. This subset was then compared against our pipeline results which achieved exact and relaxed match scores of 70.55% and 87.55% respectively. As a result, we successfully generated a Portuguese version of the ACE-2005 corpus, which has been accepted for publication by LDC.
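As an illustration of the fuzzy-matching step and of the exact versus relaxed scoring mentioned in the abstract above, here is a minimal sketch. It is not the authors' pipeline: the function names, the 0.8 threshold, the use of Python's standard difflib, and the definition of a relaxed match as any token-span overlap are all assumptions made for this example.

```python
from difflib import SequenceMatcher


def fuzzy_align(annotation, sentence, threshold=0.8):
    """Find the token span of `sentence` most similar to `annotation`.

    Hypothetical helper: returns (start, end, score) with `end` exclusive,
    or None when the best candidate scores below `threshold`.
    """
    tokens = sentence.split()
    n = len(annotation.split())
    best = None
    # Try windows one token shorter, equal, and one token longer than the
    # annotation, since translation can change the number of words.
    for size in range(max(1, n - 1), n + 2):
        for start in range(0, len(tokens) - size + 1):
            candidate = " ".join(tokens[start:start + size])
            score = SequenceMatcher(
                None, annotation.lower(), candidate.lower()
            ).ratio()
            if best is None or score > best[2]:
                best = (start, start + size, score)
    if best is not None and best[2] >= threshold:
        return best
    return None


def exact_match(pred, gold):
    """Exact match: predicted and gold spans share both boundaries."""
    return pred == gold


def relaxed_match(pred, gold):
    """Relaxed match (assumed definition): the two spans overlap."""
    return pred[0] < gold[1] and gold[0] < pred[1]


if __name__ == "__main__":
    # The translated annotation differs slightly from the sentence wording.
    span = fuzzy_align("foi eleita", "O presidente foi eleito ontem em Lisboa")
    print(span)
    print(exact_match(span[:2], (2, 4)), relaxed_match(span[:2], (3, 5)))
```

Character-level similarity tolerates small inflectional differences between a separately translated annotation and the translated sentence; the other techniques the abstract lists (lemmatization, synonym matching, multiple translations, a BERT-based word aligner) are not covered by this sketch.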

2024

Contrastive text summarization: a survey

Authors
Ströhle, T; Campos, R; Jatowt, A;

Publication
Int. J. Data Sci. Anal.

2018

Overview of IUI2018 workshop: User interfaces for spatial and temporal data analysis (UISTDA2018)

Authors
Wakamiya, S; Jatowt, A; Kawai, Y; Akiyama, T; Campos, R; Yonezawa, T;

Publication
CEUR Workshop Proceedings

Abstract
Nowadays, humanity generates and contributes to large and complex datasets, ranging from documents published by media outlets to posts on social media and location-based information. The generated information tends to be complex and heterogeneous (texts, images, videos, etc.) and is growing at an incredible pace, with much of this data having a strong spatial and temporal focus. This steady increase in the volume of available information calls for more effective user interfaces that assist users in efficiently visualizing, analyzing and exploring the data. This half-day workshop on User Interfaces for Spatial and Temporal Data Analysis (UISTDA), held in conjunction with the IUI2018 conference on March 11th, aimed at sharing the latest progress and developments, current challenges and potential applications for exploiting large amounts of spatial and temporal data. In this paper we provide an overview of the workshop goals together with its main contributions. © 2018 Copyright for the individual papers remains with the authors.

2022

The place of ISO-Space in Text2Story multilayer annotation scheme

Authors
Leal, A; Silvano, P; Amorim, E; Cantante, I; Jorge, FSA; Campos, R;

Publication
Proceedings of the 18th Joint ACL - ISO Workshop on Interoperable Semantic Annotation, ISA 2022 at LREC 2022 Workshop - Language Resources and Evaluation Conference

Abstract
Reasoning about spatial information is fundamental in natural language to fully understand relationships between entities and/or between events. However, the complexity underlying such reasoning makes it hard to represent spatial information formally. Despite the growing interest in this topic, and the development of some frameworks, many problems persist regarding, for instance, the coverage of a wide variety of linguistic constructions and of languages. In this paper, we present a proposal for integrating ISO-Space into an ISO-based multilayer annotation scheme designed to annotate news in European Portuguese. This scheme already enables annotation at three levels, temporal, referential and thematic, by combining postulates from ISO 24617-1, 4 and 9. Since the corpus comprises news articles, and spatial information is relevant within this kind of text, a more detailed account of space was required. The main objective of this paper is to discuss the process of integrating ISO-Space with the existing layers of our annotation scheme, assessing the compatibility of the aforementioned parts of ISO 24617, and the problems posed by the harmonization of the four layers and by some specifications of ISO-Space. © European Language Resources Association (ELRA).

2016

Preface

Authors
Martinez, M; Kruschwitz, U; Kazai, G; Hopfgartner, F; Corney, D; Campos, R; Albakour, D;

Publication
CEUR Workshop Proceedings

2024

Report on the 7th International Workshop on Narrative Extraction from Texts (Text2Story 2024) at ECIR 2024

Authors
Campos, R; Jorge, AM; Jatowt, A; Bhatia, S; Litvak, M; Cordeiro, JP; Rocha, C; Sousa, HO; Mansouri, B;

Publication
SIGIR Forum
