About

Sérgio Nunes is an Associate Professor in the Department of Informatics Engineering at FEUP, University of Porto, and a Senior Researcher at INESC TEC. He holds a PhD in Informatics Engineering (2010) in the area of Information Retrieval, with work focused on using temporal features to estimate the relevance of information. He also holds a Master's degree in Information Management (2004), with work on interoperability between academic information systems.


His main research interests are information retrieval, information interaction and visualization, and web-based information systems. His teaching focuses on databases, web technologies, and information retrieval, and he coordinates several course units across different programmes, namely the Doctoral Programme in Informatics Engineering, the Bachelor's and Master's in Informatics Engineering, and the Master's in Multimedia.


He was Director of the U.Porto Media Innovation Labs (MIL), the University of Porto's Centre of Competence that aims to develop the university's capacity in the Media field across teaching, research, and innovation, promoting collaboration among existing structures and coordination with external partners.

Details

  • Name

    Sérgio Nunes
  • Role

    Area Manager
  • Since

    20 December 2010
Publications

2024

Indexing Portuguese NLP Resources with PT-Pump-Up

Authors
Almeida, R; Campos, R; Jorge, A; Nunes, S;

Publication
CoRR

2024

A Community-Driven Data-to-Text Platform for Football Match Summaries

Authors
Fernandes, P; Nunes, S; Santos, L;

Publication
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation, LREC/COLING 2024, 20-25 May, 2024, Torino, Italy.

Abstract
Data-to-text systems offer a transformative approach to generating textual content in data-rich environments. This paper describes the architecture and deployment of Prosebot, a community-driven data-to-text platform tailored for generating textual summaries of football matches derived from match statistics. The system enhances the visibility of lower-tier matches, traditionally accessible only through data tables. Prosebot uses a template-based Natural Language Generation (NLG) module to generate initial drafts, which are subsequently refined by the reading community. Comprehensive evaluations, encompassing both human-mediated and automated assessments, were conducted to assess the system's efficacy. Analysis of the community-edited texts reveals that significant segments of the initial automated drafts are retained, suggesting their high quality and acceptance by the collaborators. Preliminary surveys conducted among platform users highlight a predominantly positive reception within the community.

2024

Text2Story Lusa: A Dataset for Narrative Analysis in European Portuguese News Articles

Authors
Nunes, S; Jorge, AM; Amorim, E; Sousa, HO; Leal, A; Silvano, PM; Cantante, I; Campos, R;

Publication
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation, LREC/COLING 2024, 20-25 May, 2024, Torino, Italy.

Abstract
Narratives have been the subject of extensive research across various scientific fields such as linguistics and computer science. However, the scarcity of freely available datasets, essential for studying this genre, remains a significant obstacle. Furthermore, datasets annotated with narrative components and their morphosyntactic and semantic information are even scarcer. To address this gap, we developed the Text2Story Lusa datasets, which consist of a collection of news articles in European Portuguese. The first dataset consists of 357 news articles and the second dataset comprises a subset of 117 manually densely annotated articles, totaling over 50 thousand individual annotations. By focusing on texts with substantial narrative elements, we aim to provide a valuable resource for studying narrative structures in European Portuguese news articles. On the one hand, the first dataset provides researchers with data to study narratives from various perspectives. On the other hand, the annotated dataset facilitates research in information extraction and related tasks, particularly in the context of narrative extraction pipelines. Both datasets are made available adhering to FAIR principles, thereby enhancing their utility within the research community.

2024

Data Collection Pipeline for Low-Resource Languages: A Case Study on Constructing a Tetun Text Corpus

Authors
de Jesus, G; Nunes, S;

Publication
2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation, LREC-COLING 2024 - Main Conference Proceedings

Abstract
This paper proposes Labadain Crawler, a data collection pipeline tailored to automate and optimize the process of constructing textual corpora from the web, specifically targeting low-resource languages. The system is built on top of Nutch, an open-source web crawler and data extraction framework, and incorporates language processing components such as a tokenizer and a language identification model. The pipeline's efficacy is demonstrated through successful testing with Tetun, one of Timor-Leste's official languages, resulting in the construction of a high-quality Tetun text corpus comprising 321.7k sentences extracted from over 22k web pages. The contributions of this paper include the development of a Tetun tokenizer, a Tetun language identification model, and a Tetun text corpus, marking an important milestone in Tetun text information retrieval.

2023

Annotation and Visualisation of Reporting Events in Textual Narratives

Authors
Silvano, P; Amorim, E; Leal, A; Cantante, I; Silva, F; Jorge, A; Campos, R; Nunes, S;

Publication
Proceedings of Text2Story - Sixth Workshop on Narrative Extraction From Texts held in conjunction with the 45th European Conference on Information Retrieval (ECIR 2023), Dublin, Ireland, April 2, 2023.

Abstract
News articles typically include reporting events to inform on what happened. These reporting events are not part of the story being told but are nonetheless a relevant part of the news and can pose a challenge to the computational processing of news narratives. They compose a reporting narrative, which is the present study's focus. This paper aims to demonstrate through selected use cases how a comprehensive annotation scheme with suitable tags and links can properly represent the reporting events and the way they relate to the events that make the story. In addition, we put forward a proposal for their visual representation that enables a systematic and detailed analysis of the importance of reporting events in the news structure. Finally, we describe some lexico-grammatical features of reporting events, which can contribute to their automatic detection. © 2023 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Supervised theses

2023

Connect-the-Dots: Artificial Intelligence and Automation in Investigative Journalism

Author
Joana Rodrigues da Silva

Institution
UP-FEUP

2023

Evaluation of Text Diversity over time for Automatically Generated Texts in Sports Journalism

Author
José David Souto Rocha

Institution
UP-FEUP

2023

Text Information Retrieval in Tetun

Author
Gabriel de Jesus

Institution
UP-FEUP

2023

Visual narratives supported by dynamic infographics: a case study in the sports domain

Author
Pedro Manuel Santos Queirós

Institution
UP-FEUP

2023

Guidelines to introduce Internet voting in Portuguese elections based on the Estonian case study

Author
Marlon Vinícius Andrade de Luna Freire

Institution
UP-FEUP