Publications

Publications by Sérgio Nunes

2019

Characterizing the Hypergraph-of-Entity Representation Model

Authors
Devezas, JL; Nunes, S;

Publication
Complex Networks and Their Applications VIII - Volume 2: Proceedings of the Eighth International Conference on Complex Networks and Their Applications, COMPLEX NETWORKS 2019, Lisbon, Portugal, December 10-12, 2019.

Abstract
The hypergraph-of-entity is a joint representation model for terms, entities and their relations, used as an indexing approach in entity-oriented search. In this work, we characterize the structure of the hypergraph at the microscopic and macroscopic scales, as well as over time with an increasing number of documents. We use a random-walk-based approach to estimate shortest distances and node sampling to estimate clustering coefficients. We also propose the calculation of a general mixed hypergraph density based on the corresponding bipartite mixed graph. We analyze these statistics for the hypergraph-of-entity, finding that hyperedge-based node degrees are distributed as a power law, while node-based node degrees and hyperedge cardinalities are log-normally distributed. We also find that most statistics tend to converge after an initial period of accentuated growth in the number of documents. © 2020, Springer Nature Switzerland AG.
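As a rough illustration of the node-sampling idea mentioned in this abstract, the sketch below estimates an average local clustering coefficient from a random sample of nodes. It assumes an ordinary undirected graph stored as an adjacency dictionary, not the hypergraph-of-entity itself, and the names `local_clustering`, `sampled_clustering`, and the toy graph are hypothetical; it is not the authors' implementation.

```python
import random
from itertools import combinations


def local_clustering(adj, node):
    """Fraction of a node's neighbor pairs that are themselves connected."""
    neighbors = adj[node]
    k = len(neighbors)
    if k < 2:
        # Nodes with fewer than two neighbors have no neighbor pairs.
        return 0.0
    links = sum(1 for u, v in combinations(neighbors, 2) if v in adj[u])
    return 2.0 * links / (k * (k - 1))


def sampled_clustering(adj, sample_size, seed=0):
    """Average local clustering over a uniform random sample of nodes."""
    rng = random.Random(seed)
    nodes = sorted(adj)
    sample = rng.sample(nodes, min(sample_size, len(nodes)))
    return sum(local_clustering(adj, n) for n in sample) / len(sample)


# Toy graph (an assumption for illustration): a triangle a-b-c
# with a pendant node d attached to c.
adj = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}

# Sampling every node gives the exact average; smaller samples approximate it.
exact = sampled_clustering(adj, sample_size=len(adj))
estimate = sampled_clustering(adj, sample_size=2, seed=1)
```

On a large graph, the sample size trades accuracy for running time, since only the sampled nodes' neighborhoods are inspected.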

2019

A Hierarchically-Labeled Portuguese Hate Speech Dataset

Authors
Fortuna, P; Rocha da Silva, JR; Soler Company, J; Wanner, L; Nunes, S;

Publication
Third Workshop on Abusive Language Online

Abstract
Over the past few years, the amount of online offensive speech has been growing steadily. To successfully cope with it, machine learning is applied. However, ML-based techniques require sufficiently large annotated datasets. In recent years, different datasets were published, mainly for English. In this paper, we present a new dataset for Portuguese, which has not been in focus so far. The dataset is composed of 5,668 tweets. For its annotation, we defined two different schemes used by annotators with different levels of expertise. First, non-experts annotated the tweets with binary labels ('hate' vs. 'no-hate'). Then, expert annotators classified the tweets following a fine-grained hierarchical multiple-label scheme with 81 hate speech categories in total. The inter-annotator agreement varied from category to category, which reflects the insight that some types of hate speech are more subtle than others and that their detection depends on personal perception. The hierarchical annotation scheme is the main contribution of the presented work, as it facilitates the identification of different types of hate speech and their intersections. To demonstrate the usefulness of our dataset, we carried out a baseline classification experiment with pre-trained word embeddings and LSTM on the binary-classified data, with a state-of-the-art outcome.

2020

Characterizing the hypergraph-of-entity and the structural impact of its extensions

Authors
Devezas, J; Nunes, S;

Publication
Applied Network Science

Abstract
The hypergraph-of-entity is a joint representation model for terms, entities and their relations, used as an indexing approach in entity-oriented search. In this work, we characterize the structure of the hypergraph at the microscopic and macroscopic scales, as well as over time with an increasing number of documents. We use a random-walk-based approach to estimate shortest distances and node sampling to estimate clustering coefficients. We also propose the calculation of a general mixed hypergraph density measure based on the corresponding bipartite mixed graph. We analyze these statistics for the hypergraph-of-entity, finding that hyperedge-based node degrees are distributed as a power law, while node-based node degrees and hyperedge cardinalities are log-normally distributed. We also find that most statistics tend to converge after an initial period of accentuated growth in the number of documents. We then repeat the analysis over three extensions (materialized through synonym, context, and tf_bin hyperedges) in order to assess their structural impact on the hypergraph. Finally, we focus on the application-specific aspects of the hypergraph-of-entity, in the domain of information retrieval. We analyze the correlation between retrieval effectiveness and the structural features of the representation model, proposing ranking and anomaly indicators as useful guides for modifying or extending the hypergraph-of-entity.

2021

Managing research the wiki way

Authors
Devezas, JL; Nunes, S;

Publication
XRDS

Abstract

2021

Brat2Viz: a Tool and Pipeline for Visualizing Narratives from Annotated Texts

Authors
Amorim, E; Ribeiro, A; Santana, BS; Cantante, I; Jorge, A; Nunes, S; Silvano, P; Leal, A; Campos, R;

Publication
Proceedings of Text2Story - Fourth Workshop on Narrative Extraction From Texts held in conjunction with the 43rd European Conference on Information Retrieval (ECIR 2021), Lucca, Italy, April 1, 2021 (online event due to Covid-19 outbreak).

Abstract
Narrative extraction from text is a complex task that starts by identifying a set of narrative elements (actors, events, times) and the semantic links between them (temporal, referential, semantic roles). The outcome is a structure or set of structures which can then be represented graphically, thus opening room for further and alternative exploration of the plot. Such visualization can also be useful during the ongoing annotation process. Manual annotation of narratives can be a complex effort, and the possibility offered by the Brat annotation tool of annotating directly on the text does not seem sufficiently helpful. In this paper, we propose Brat2Viz, a tool and a pipeline that displays visualizations of narrative information annotated in Brat. Brat2Viz reads the annotation file of Brat, produces an intermediate representation in the declarative language DRS (Discourse Representation Structure), and from this obtains the visualization. Currently, we make available two visualization schemes: MSC (Message Sequence Chart) and Knowledge Graphs. The modularity of the pipeline enables future extension to new annotation sources, different annotation schemes, and alternative visualizations or representations. We illustrate the pipeline using examples from a European Portuguese news corpus. Copyright © by the paper's authors.

2020

ECIR 2020 workshops: assessing the impact of going online

Authors
Nunes, S; Little, S; Bhatia, S; Boratto, L; Cabanac, G; Campos, R; Couto, FM; Faralli, S; Frommholz, I; Jatowt, A; Jorge, A; Marras, M; Mayr, P; Stilo, G;

Publication
SIGIR Forum

Abstract
