2025
Authors
Campos, R; Jorge, AM; Jatowt, A; Bhatia, S; Litvak, M;
Publication
Text2Story@ECIR
Abstract
2025
Authors
Ermakova, L; Bosser, AG; Miller, T; Campos, R;
Publication
Advances in Information Retrieval - 47th European Conference on Information Retrieval, ECIR 2025, Lucca, Italy, April 6-10, 2025, Proceedings, Part V
Abstract
Over the last three years, the JOKER Lab series at CLEF has gathered an active community of researchers in natural language processing and information retrieval to collaborate on non-literal use of language in text. Such language can be a challenge for AI systems, but also sometimes for humans, as it requires understanding implicit cultural references and unorthodox interactions between form and meaning. In this paper, we discuss the lessons learned from the previous iterations of the Lab and describe how its upcoming edition will build upon those to address new challenges. In 2025, JOKER will provide novel tasks and update some previous ones with new data and new languages. This year we provide sandbox environments for experimenting with humour-aware information retrieval (Task 1), a previously featured task now enhanced with an all-new Portuguese corpus; wordplay translation in text (Task 2), another historical task for which we provide new corpora; onomastic wordplay (Task 3), a new task focussed on humorous proper names in fiction; and controlled creativity (Task 4), another novel task that aims at identifying and avoiding hallucinations. © The Author(s), under exclusive license to Springer Nature Switzerland AG 2025.
2025
Authors
Silva, R; Campos, R;
Publication
Advances in Information Retrieval - 47th European Conference on Information Retrieval, ECIR 2025, Lucca, Italy, April 6-10, 2025, Proceedings, Part V
Abstract
Around 80% of websites change significantly or disappear altogether after the first year, resulting in the loss of invaluable information. In this volatile scenario, preserving online content is increasingly essential. This is especially critical for local news outlets, which produce a wealth of information within the unique context of their communities but often lack sufficient archiving resources. In this paper, we take a significant step forward by leveraging the information preserved by the Portuguese Web Archive, Arquivo.pt, to recreate the website of a local news outlet. This online demo grants users direct access to previously lost news articles, images, and front covers, thus contributing to preserving local digital heritage. An IR system was also implemented to ensure easy access, along with a recommendation system based on BERT embeddings to suggest related news articles and enhance user engagement. As a final contribution, we also provide a Python package, enabling others to replicate the process of collecting, processing, retrieving, and recreating websites for local news outlets in Portugal. © The Author(s), under exclusive license to Springer Nature Switzerland AG 2025.
2025
Authors
Nogueira, DM; Gomes, EF;
Publication
Proceedings of the 18th International Joint Conference on Biomedical Engineering Systems and Technologies, BIOSTEC 2025 - Volume 1, Porto, Portugal, February 20-22, 2025.
Abstract
2025
Authors
Alvarez, ML; Bahillo, A; Arjona, L; Nogueira, DM; Gomes, EF; Jorge, AM;
Publication
IEEE ACCESS
Abstract
Sound-based uroflowmetry (SU) is a non-invasive technique emerging as an alternative to traditional uroflowmetry (UF) to calculate the voiding flow rate based on the sound generated by the urine impacting the water in a toilet, enabling remote monitoring and reducing the patient burden and clinical costs. This study trains four different machine learning (ML) models (random forest, gradient boosting, support vector machine and convolutional neural network) using both regression and classification approaches to predict and categorize the voiding flow rate from sound events. The models were trained with a dataset that contains sounds from synthetic void events generated with a high-precision peristaltic pump and a traditional toilet. Sound was simultaneously recorded with three devices: Ultramic384k, Mi A1 smartphone and Oppo Smartwatch. For audio feature extraction, our analysis showed that segmenting the audio signals into 1000 ms segments with frequencies up to 16 kHz provided the best results. Results show that random forest achieved the best performance in both regression and classification tasks, with a mean absolute error (MAE) of 0.9, 0.7 and 0.9 ml/s and quadratic weighted kappa (QWK) of 0.99, 1.0 and 1.0 for the three devices. To evaluate the models in a real environment and assess the effectiveness of training with synthetic data, the best-performing models were retrained and validated using a real voiding sounds dataset. The results reported an MAE below 2.5 ml/s and a QWK above 0.86 for regression and classification tasks, respectively.
2025
Authors
Nogueira, DM; Simões, M; Ferreira, C; Ribeiro, RP; Martínez-Rego, D; Cai, A; Gama, J;
Publication
Abstract