2026
Authors
Guo, JL; Ng, BK; Lam, CT; Abreu, PH;
Publication
INFORMATION FUSION
Abstract
Solar photovoltaic (PV) power generation has become one of the most widely adopted forms of clean energy worldwide. In large-scale PV farm operation and maintenance, unmanned aerial vehicles equipped with thermal infrared (TIR) cameras are increasingly used to enable automated fault detection and classification. However, the long imaging distance and the inherently low resolution of TIR images often lead to fault patterns appearing with low contrast, making subtle discriminative features difficult to extract and posing significant challenges to achieving highly accurate fault identification and classification. To address these challenges, we propose GEPFNet, a network that exploits Group Equivariant Convolutions to explicitly model the geometric structures of faults, incorporates multi-scale processing with unified local-global contextual representations, and adopts a parallel feature fusion strategy to integrate multi-level features and enhance contextual utilization effectively. The design of feature extraction and fusion mechanisms ensures the proposed GEPFNet achieves strong robustness and generalization under complex operational conditions. The effectiveness of GEPFNet was validated on two public datasets with distinct resolutions, class distributions, and feature characteristics: PVF-10 and the Infrared Solar Module (ISM) dataset. Extensive experiments and statistical analyses demonstrate that the proposed GEPFNet achieves state-of-the-art performance on the PVF-10 dataset, obtaining an accuracy of 96.05% ± 0.42 for the 2-Class task and 94.64% ± 0.35 for the 10-Class task. On the ISM dataset, GEPFNet achieves an improvement of approximately 5% over the baseline models. Moreover, under highly imbalanced data distributions, the proposed GEPFNet achieves average accuracy improvements of 5.83% and 3.82% on PVF-10 and ISM, respectively, further demonstrating its capability to enhance class-wise performance. With only 9.51 GFLOPs, GEPFNet also exhibits notable computational efficiency, making it well suited for PV fault classification in TIR imagery.
2026
Authors
Santini, L; Coelho, LCC; Floridia, C;
Publication
IEEE Sensors Journal
Abstract
2026
Authors
Moás, PM; Lopes, CT;
Publication
LINKING THEORY AND PRACTICE OF DIGITAL LIBRARIES, TPDL 2025
Abstract
Wikipedia is the largest and best-known online encyclopedia, but its collaborative nature leads to a significant disparity in article quality. In this work, we explore real-time and automatic quality assessment within Wikipedia through machine learning. We first constructed a dataset of 36,000 English articles and 145 features, then compared the performance of multiple classification and regression algorithms and studied how the number of classes and features affects the model's performance. The six-class experiments achieved a classifier accuracy of 64% and a mean absolute error of 0.09 in regression methods, which matches or beats most state-of-the-art approaches. Our model produces similar results on some non-English Wikipedias, but the error is slightly higher on other versions. We have also determined that the features measuring the article's content and revision history bring the largest performance boost.
2026
Authors
Duarte, P; Coelho, A; Ribeiro, FM; Teixeira, FB; Pessoa, LM; Ricardo, M;
Publication
CoRR
Abstract
2026
Authors
Andrade, JG; Sampaio, AdO; Garcia, JE; Fonseca, MJ;
Publication
Dispositiva
Abstract
2026
Authors
Victoriano, M; Pavlovic, M; Sandve, GK; Oliveira, HP; Rocha, A; Greiff, V;
Publication
NATURE MACHINE INTELLIGENCE
Abstract
Synthetic datasets are essential for the development and benchmarking of machine learning methods in biomedicine, as they help overcome the pervasive data scarcity in biomedical research. In fields such as immunomics, genomics and proteomics, they enable the development of prediction algorithms, including methods for immune receptor-antigen binding prediction. When generated with transparent and fully specified parameters, synthetic datasets serve as rule-based systems for reproducible and interpretable model testing, an essential step towards digital twins that emulate biological systems for diagnosis and therapy design. A key obstacle, however, is the 'simulation to reality' (sim2real) gap, which describes the uncertainty about whether performance on synthetic data is predictive of performance on experimental data. Divergent statistical and biological properties may erode generalizability and clinical relevance. The lack of standardized sim2real benchmarks impedes validation and widespread adoption. We argue that multilayered validation frameworks, incorporating techniques such as domain adaptation and hybrid validation, and grounded in biological realism, are essential to ensuring that synthetic datasets faithfully capture biological complexity. Closing the sim2real gap will unlock the full translational potential of synthetic data, accelerating diagnostic and therapeutic discovery, guiding clinical decision-making, and advancing the development of predictive digital twins.