2025
Authors
de Arriba-Pérez, F; García-Méndez, S; Leal, F; Malheiro, B; Burguillo, JC;
Publication
INTEGRATED COMPUTER-AIDED ENGINEERING
Abstract
Social media platforms, increasingly used as news sources for varied data analytics, have transformed how information is generated and disseminated. However, the unverified nature of this content raises concerns about trustworthiness and accuracy, potentially negatively impacting readers' critical judgment due to disinformation. This work aims to contribute to the automatic data quality validation field, addressing the rapid growth of online content on wiki pages. Our scalable solution includes stream-based data processing with feature engineering, feature analysis and selection, stream-based classification, and real-time explanation of prediction outcomes. The explainability dashboard is designed for the general public, who may need more specialized knowledge to interpret the model's prediction. Experimental results on two datasets attain approximately 90% values across all evaluation metrics, demonstrating robust and competitive performance compared to works in the literature. In summary, the system assists editors by reducing their effort and time in detecting disinformation.
2025
Authors
Pataca, B; Barroso, J; Santos, V;
Publication
Communications in Computer and Information Science - Technology and Innovation in Learning, Teaching and Education
Abstract
2025
Authors
Nascimento, R; Gonzalez, DG; Solteiro Pires, EJ; Filipe, V; Silva, MF; Rocha, L;
Publication
IEEE Access
Abstract
2025
Authors
Avraam, D; Wilson, RC; Chan, NA; Banerjee, S; Bishop, TRP; Butters, O; Cadman, T; Cederkvist, L; Duijts, L; Montagut, XE; Garner, H; Gonçalves, G; González, JR; Haakma, S; Hartlev, M; Hasenauer, J; Huth, M; Hyde, E; Jaddoe, VWV; Marcon, Y; Mayrhofer, MT; Molnar-Gabor, F; Morgan, AS; Murtagh, M; Nestor, M; Andersen, AMN; Parker, S; de Moira, AP; Schwarz, F; Strandberg-Larsen, K; Swertz, MA; Welten, M; Wheater, S; Burton, P;
Publication
BIOINFORMATICS ADVANCES
Abstract
Motivation: The validity of epidemiologic findings can be increased using triangulation, i.e. comparison of findings across contexts, and by having sufficiently large amounts of relevant data to analyse. However, access to data is often constrained by practical considerations and by ethico-legal and data governance restrictions. Gaining access to such data can be time-consuming due to the governance requirements associated with data access requests to institutions in different jurisdictions. Results: DataSHIELD is a software solution that enables remote analysis without the need for data transfer (federated analysis). DataSHIELD is a scientifically mature, open-source data access and analysis platform aligned with the 'Five Safes' framework, the international framework governing safe research access to data. It allows real-time analysis while mitigating disclosure risk through an active multi-layer system of disclosure-preventing mechanisms. This combination of real-time remote statistical analysis, disclosure prevention mechanisms, and federation capabilities makes DataSHIELD a solution for addressing many of the technical and regulatory challenges in performing the large-scale statistical analysis of health and biomedical data. This paper describes the key components that comprise the disclosure protection system of DataSHIELD. These broadly fall into three classes: (i) system protection elements, (ii) analysis protection elements, and (iii) governance protection elements. Availability and implementation: Information about the DataSHIELD software is available in https://datashield.org/ and https://github.com/datashield.
2025
Authors
Ferreira, J; Darabi, R; Sousa, A; Brueckner, F; Reis, LP; Reis, A; Tavares, RS; Sousa, J;
Publication
Journal of Intelligent Manufacturing
Abstract
This work introduces Gen-JEMA, a generative approach based on joint embedding with multimodal alignment (JEMA), to enhance feature extraction in the embedding space and improve the explainability of its predictions. Gen-JEMA addresses these challenges by leveraging multimodal data, including multi-view images and metadata such as process parameters, to learn transferable semantic representations. Gen-JEMA enables more explainable and enriched predictions by learning a decoder from the embedding. This novel co-learning framework, tailored for directed energy deposition (DED), integrates multiple data sources to learn a unified data representation and predict melt pool images from the primary sensor. The proposed approach enables real-time process monitoring using only the primary modality, simplifying hardware requirements and reducing computational overhead. The effectiveness of Gen-JEMA for DED process monitoring was evaluated, focusing on its generalization to downstream tasks such as melt pool geometry prediction and the generation of external melt pool representations using off-axis sensor data. To generate these external representations, autoencoder (AE) and variational autoencoder (VAE) architectures were optimized using Bayesian optimization. The AE outperformed other approaches achieving a 38% improvement in melt pool geometry prediction compared to the baseline and 88% in data generation compared with the VAE. The proposed framework establishes the foundation for integrating multisensor data with metadata through a generative approach, enabling various downstream tasks within the DED domain and achieving a small embedding, allowing efficient process control based on model predictions and embeddings.
2025
Authors
Rodrigues, JF; Cardoso, HL; Lopes, CT;
Publication
COMPANION PROCEEDINGS OF THE ACM WEB CONFERENCE 2025, WWW COMPANION 2025
Abstract
Text simplification converts complex text into simpler language, improving readability and comprehension. This study evaluates the effectiveness of open-source large language models for text simplification across various categories. We created a dataset of 66,620 lead section pairs from English and Simple English Wikipedia, spanning nine categories, and tested Llama 3 for text simplification. We assessed its output for readability, simplicity, and meaning preservation. Results show improved readability, with simplification varying by category. Texts on Time were the most shortened, while Leisure-related texts had the greatest reduction of words/characters and syllables per sentence. Meaning preservation was most effective for the Objects and Education categories.