About


Sérgio Nunes is an Associate Professor at the Department of Informatics Engineering at FEUP, University of Porto, and a Senior Researcher at INESC TEC. He holds a PhD in Information Retrieval (2010) focused on using temporal features for relevance estimation, and an MSc in Information Management (2004).


His main research interests are in information retrieval and web information systems. He teaches databases, web technologies and information retrieval in different programs, namely the Informatics Engineering Doctoral Program, the Informatics Engineering Bachelor and Masters, and the Multimedia Masters.


He was the Director of the U.Porto Media Innovation Labs (MIL), an Excellence Center of the University of Porto, whose mission is to develop the university's capacity in the field of Media in teaching, research and innovation activities by promoting collaboration between existing university structures and coordination with external partners.

Details


  • Name

    Sérgio Nunes
  • Role

    Area Manager
  • Since

    20th December 2010
Publications

2024

Indexing Portuguese NLP Resources with PT-Pump-Up

Authors
Almeida, R; Campos, R; Jorge, A; Nunes, S;

Publication
CoRR

Abstract

2024

A Community-Driven Data-to-Text Platform for Football Match Summaries

Authors
Fernandes, P; Nunes, S; Santos, L;

Publication
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation, LREC/COLING 2024, 20-25 May, 2024, Torino, Italy.

Abstract
Data-to-text systems offer a transformative approach to generating textual content in data-rich environments. This paper describes the architecture and deployment of Prosebot, a community-driven data-to-text platform tailored for generating textual summaries of football matches derived from match statistics. The system enhances the visibility of lower-tier matches, traditionally accessible only through data tables. Prosebot uses a template-based Natural Language Generation (NLG) module to generate initial drafts, which are subsequently refined by the reading community. Comprehensive evaluations, encompassing both human-mediated and automated assessments, were conducted to assess the system's efficacy. Analysis of the community-edited texts reveals that significant segments of the initial automated drafts are retained, suggesting their high quality and acceptance by the collaborators. Preliminary surveys conducted among platform users highlight a predominantly positive reception within the community.

2024

Text2Story Lusa: A Dataset for Narrative Analysis in European Portuguese News Articles

Authors
Nunes, S; Jorge, AM; Amorim, E; Sousa, HO; Leal, A; Silvano, PM; Cantante, I; Campos, R;

Publication
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation, LREC/COLING 2024, 20-25 May, 2024, Torino, Italy.

Abstract
Narratives have been the subject of extensive research across various scientific fields such as linguistics and computer science. However, the scarcity of freely available datasets, essential for studying this genre, remains a significant obstacle. Furthermore, datasets annotated with narrative components and their morphosyntactic and semantic information are even scarcer. To address this gap, we developed the Text2Story Lusa datasets, which consist of a collection of news articles in European Portuguese. The first dataset consists of 357 news articles and the second dataset comprises a subset of 117 manually densely annotated articles, totaling over 50 thousand individual annotations. By focusing on texts with substantial narrative elements, we aim to provide a valuable resource for studying narrative structures in European Portuguese news articles. On the one hand, the first dataset provides researchers with data to study narratives from various perspectives. On the other hand, the annotated dataset facilitates research in information extraction and related tasks, particularly in the context of narrative extraction pipelines. Both datasets are made available adhering to FAIR principles, thereby enhancing their utility within the research community.

2024

Data Collection Pipeline for Low-Resource Languages: A Case Study on Constructing a Tetun Text Corpus

Authors
de Jesus, G; Nunes, S;

Publication
2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation, LREC-COLING 2024 - Main Conference Proceedings

Abstract
This paper proposes Labadain Crawler, a data collection pipeline tailored to automate and optimize the process of constructing textual corpora from the web, specifically targeting low-resource languages. The system is built on top of Nutch, an open-source web crawler and data extraction framework, and incorporates language processing components such as a tokenizer and a language identification model. The pipeline's efficacy is demonstrated through successful testing with Tetun, one of Timor-Leste's official languages, resulting in the construction of a high-quality Tetun text corpus comprising 321.7k sentences extracted from over 22k web pages. The contributions of this paper include the development of a Tetun tokenizer, a Tetun language identification model, and a Tetun text corpus, marking an important milestone in Tetun text information retrieval.

2023

Annotation and Visualisation of Reporting Events in Textual Narratives

Authors
Silvano, P; Amorim, E; Leal, A; Cantante, I; Silva, F; Jorge, A; Campos, R; Nunes, S;

Publication
Proceedings of Text2Story - Sixth Workshop on Narrative Extraction From Texts held in conjunction with the 45th European Conference on Information Retrieval (ECIR 2023), Dublin, Ireland, April 2, 2023.

Abstract
News articles typically include reporting events to inform on what happened. These reporting events are not part of the story being told but are nonetheless a relevant part of the news and can pose a challenge to the computational processing of news narratives. They compose a reporting narrative, which is the present study's focus. This paper aims to demonstrate through selected use cases how a comprehensive annotation scheme with suitable tags and links can properly represent the reporting events and the way they relate to the events that make the story. In addition, we put forward a proposal for their visual representation that enables a systematic and detailed analysis of the importance of reporting events in the news structure. Finally, we describe some lexico-grammatical features of reporting events, which can contribute to their automatic detection. © 2023 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Supervised theses

2023

Access Control in Linked Data Archives

Author
Tiago Gonçalves da Silva

Institution
UP-FEUP

2023

Visualizing News Stories from Annotated Text

Author
Catarina Justo dos Santos Fernandes

Institution
UP-FEUP

2023

Federation Solutions for Linked Data Applications

Author
Tiago Gonçalves Gomes

Institution
UP-FEUP

2023

Information Retrieval over Linked Data Archives

Author
Cláudia Inês da Costa Martins

Institution
UP-FEUP

2023

Connect-the-Dots: Artificial Intelligence and Automation in Investigative Journalism

Author
Joana Rodrigues da Silva

Institution
UP-FEUP