2024
Authors
Cunha, LF; Silvano, P; Campos, R; Jorge, A;
Publication
PROCEEDINGS OF THE 47TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, SIGIR 2024
Abstract
Event extraction is an NLP task that commonly involves identifying the central word (trigger) for an event and its associated arguments in text. ACE-2005 is widely recognised as the standard corpus in this field. While other corpora, like PropBank, primarily focus on annotating predicate-argument structure, ACE-2005 provides comprehensive information about the overall event structure and semantics. However, its limited language coverage restricts its usability. This paper introduces ACE-2005-PT, a corpus created by translating ACE-2005 into Portuguese, with European and Brazilian variants. To speed up the process of obtaining ACE-2005-PT, we rely on automatic translators. This, however, poses some challenges related to automatically identifying the correct alignments between multi-word annotations in the original text and in the corresponding translated sentence. To address these challenges, we developed an alignment pipeline that incorporates several alignment techniques: lemmatization, fuzzy matching, synonym matching, multiple translations and a BERT-based word aligner. To measure the alignment effectiveness, a subset of annotations from the ACE-2005-PT corpus was manually aligned by an expert linguist. This subset was then compared against our pipeline results, which achieved exact and relaxed match scores of 70.55% and 87.55%, respectively. As a result, we successfully generated a Portuguese version of the ACE-2005 corpus, which has been accepted for publication by LDC.
2024
Authors
Ströhle, T; Campos, R; Jatowt, A;
Publication
Int. J. Data Sci. Anal.
Abstract
2018
Authors
Wakamiya, S; Jatowt, A; Kawai, Y; Akiyama, T; Campos, R; Yonezawa, T;
Publication
CEUR Workshop Proceedings
Abstract
Nowadays, humanity generates and contributes to large and complex datasets, ranging from documents published by media outlets to posts on social media and location-based information. The generated information tends to be complex and heterogeneous (texts, images, videos, etc.) and is growing at an incredible pace, with much of this data having a strong spatial and temporal focus. This steady increase in the volume of available information calls for more effective user interfaces that assist users in efficiently visualizing, analyzing and exploring the data. This half-day workshop on User Interfaces for Spatial and Temporal Data Analysis (UISTDA), held in conjunction with the IUI2018 conference on March 11th, aimed at sharing the latest progress and developments, current challenges and potential applications for exploiting large amounts of spatial and temporal data. In this paper, we provide an overview of the workshop goals together with its main contributions. © 2018 Copyright for the individual papers remains with the authors.
2022
Authors
Leal, A; Silvano, P; Amorim, E; Cantante, I; Jorge, FSA; Campos, R;
Publication
Proceedings of the 18th Joint ACL - ISO Workshop on Interoperable Semantic Annotation, ISA 2022 at LREC 2022 Workshop - Language Resources and Evaluation Conference
Abstract
Reasoning about spatial information is fundamental in natural language to fully understand relationships between entities and/or between events. However, the complexity underlying such reasoning makes it hard to represent spatial information formally. Despite the growing interest in this topic and the development of some frameworks, many problems persist regarding, for instance, the coverage of a wide variety of linguistic constructions and of languages. In this paper, we present a proposal for integrating ISO-Space into an ISO-based multilayer annotation scheme designed to annotate news in European Portuguese. This scheme already enables annotation at three levels (temporal, referential and thematic) by combining postulates from ISO 24617-1, 4 and 9. Since the corpus comprises news articles, and spatial information is relevant within this kind of text, a more detailed account of space was required. The main objective of this paper is to discuss the process of integrating ISO-Space with the existing layers of our annotation scheme, assessing the compatibility of the aforementioned parts of ISO 24617, and the problems posed by the harmonization of the four layers and by some specifications of ISO-Space. © European Language Resources Association (ELRA).
2016
Authors
Martinez, M; Kruschwitz, U; Kazai, G; Hopfgartner, F; Corney, D; Campos, R; Albakour, D;
Publication
CEUR Workshop Proceedings
Abstract
2024
Authors
Campos, R; Jorge, AM; Jatowt, A; Bhatia, S; Litvak, M; Cordeiro, JP; Rocha, C; Sousa, HO; Mansouri, B;
Publication
SIGIR Forum
Abstract