Publications by Sérgio Nunes

2019

Characterizing the Hypergraph-of-Entity Representation Model

Authors
Devezas, JL; Nunes, S;

Publication
Complex Networks and Their Applications VIII - Volume 2: Proceedings of the Eighth International Conference on Complex Networks and Their Applications, COMPLEX NETWORKS 2019, Lisbon, Portugal, December 10-12, 2019.

Abstract
The hypergraph-of-entity is a joint representation model for terms, entities and their relations, used as an indexing approach in entity-oriented search. In this work, we characterize the structure of the hypergraph, from a microscopic and macroscopic scale, as well as over time with an increasing number of documents. We use a random walk based approach to estimate shortest distances and node sampling to estimate clustering coefficients. We also propose the calculation of a general mixed hypergraph density based on the corresponding bipartite mixed graph. We analyze these statistics for the hypergraph-of-entity, finding that hyperedge-based node degrees are distributed as a power law, while node-based node degrees and hyperedge cardinalities are log-normally distributed. We also find that most statistics tend to converge after an initial period of accentuated growth in the number of documents. © 2020, Springer Nature Switzerland AG.

2019

A Hierarchically-Labeled Portuguese Hate Speech Dataset

Authors
Fortuna, P; Rocha da Silva, JR; Soler Company, J; Wanner, L; Nunes, S;

Publication
THIRD WORKSHOP ON ABUSIVE LANGUAGE ONLINE

Abstract
Over the past years, the amount of online offensive speech has been growing steadily. To successfully cope with it, machine learning is applied. However, ML-based techniques require sufficiently large annotated datasets. In recent years, different datasets were published, mainly for English. In this paper, we present a new dataset for Portuguese, which has not been in focus so far. The dataset is composed of 5,668 tweets. For its annotation, we defined two different schemes used by annotators with different levels of expertise. First, non-experts annotated the tweets with binary labels ('hate' vs. 'no-hate'). Then, expert annotators classified the tweets following a fine-grained hierarchical multiple label scheme with 81 hate speech categories in total. The inter-annotator agreement varied from category to category, which reflects the insight that some types of hate speech are more subtle than others and that their detection depends on personal perception. The hierarchical annotation scheme is the main contribution of the presented work, as it facilitates the identification of different types of hate speech and their intersections. To demonstrate the usefulness of our dataset, we carried out a baseline classification experiment with pre-trained word embeddings and LSTM on the binary classified data, with a state-of-the-art outcome.

2020

Characterizing the hypergraph-of-entity and the structural impact of its extensions

Authors
Devezas, J; Nunes, S;

Publication
APPLIED NETWORK SCIENCE

Abstract
The hypergraph-of-entity is a joint representation model for terms, entities and their relations, used as an indexing approach in entity-oriented search. In this work, we characterize the structure of the hypergraph, from a microscopic and macroscopic scale, as well as over time with an increasing number of documents. We use a random walk based approach to estimate shortest distances and node sampling to estimate clustering coefficients. We also propose the calculation of a general mixed hypergraph density measure based on the corresponding bipartite mixed graph. We analyze these statistics for the hypergraph-of-entity, finding that hyperedge-based node degrees are distributed as a power law, while node-based node degrees and hyperedge cardinalities are log-normally distributed. We also find that most statistics tend to converge after an initial period of accentuated growth in the number of documents. We then repeat the analysis over three extensions (materialized through synonym, context, and tf_bin hyperedges) in order to assess their structural impact in the hypergraph. Finally, we focus on the application-specific aspects of the hypergraph-of-entity, in the domain of information retrieval. We analyze the correlation between the retrieval effectiveness and the structural features of the representation model, proposing ranking and anomaly indicators, as useful guides for modifying or extending the hypergraph-of-entity.

2021

Managing research the wiki way

Authors
Devezas, JL; Nunes, S;

Publication
XRDS

Abstract

2021

Brat2Viz: a Tool and Pipeline for Visualizing Narratives from Annotated Texts

Authors
Amorim, E; Ribeiro, A; Santana, BS; Cantante, I; Jorge, A; Nunes, S; Silvano, P; Leal, A; Campos, R;

Publication
Proceedings of Text2Story - Fourth Workshop on Narrative Extraction From Texts held in conjunction with the 43rd European Conference on Information Retrieval (ECIR 2021), Lucca, Italy, April 1, 2021 (online event due to Covid-19 outbreak).

Abstract
Narrative Extraction from text is a complex task that starts by identifying a set of narrative elements (actors, events, times), and the semantic links between them (temporal, referential, semantic roles). The outcome is a structure or set of structures which can then be represented graphically, thus opening room for further and alternative exploration of the plot. Such visualization can also be useful during the on-going annotation process. Manual annotation of narratives can be a complex effort and the possibility offered by the Brat annotation tool of annotating directly on the text does not seem sufficiently helpful. In this paper, we propose Brat2Viz, a tool and a pipeline that displays visualizations of narrative information annotated in Brat. Brat2Viz reads the annotation file of Brat, produces an intermediate representation in the declarative language DRS (Discourse Representation Structure), and from this obtains the visualization. Currently, we make available two visualization schemes: MSC (Message Sequence Chart) and Knowledge Graphs. The modularity of the pipeline enables the future extension to new annotation sources, different annotation schemes, and alternative visualizations or representations. We illustrate the pipeline using examples from a European Portuguese news corpus. Copyright © by the paper's authors.

2020

ECIR 2020 workshops: assessing the impact of going online

Authors
Nunes, S; Little, S; Bhatia, S; Boratto, L; Cabanac, G; Campos, R; Couto, FM; Faralli, S; Frommholz, I; Jatowt, A; Jorge, A; Marras, M; Mayr, P; Stilo, G;

Publication
SIGIR Forum

Abstract
