Publicacoes - INESC TEC

Publicações

Publicações por CRACS

2025

Distance-based feature selection using Benford's law for malware detection

Autores
Fernandes, P; Ciardhuáin, SO; Antunes, M;

Publicação
COMPUTERS & SECURITY

Abstract
Detecting malware in computer networks and data streams from Android devices remains a critical challenge for cybersecurity researchers. While machine learning and deep learning techniques have shown promising results, these approaches often require large volumes of labelled data, offer limited interpretability, and struggle to adapt to sophisticated threats such as zero-day attacks. Moreover, their high computational requirements restrict their applicability in resource-constrained environments. This research proposes an innovative approach that advances the state of the art by offering practical solutions for dynamic and data-limited security scenarios. By integrating natural statistical laws, particularly Benford's law, with dissimilarity functions, a lightweight, fast, and scalable model is developed that eliminates the need for extensive training and large labelled datasets while improving resilience to data imbalance and scalability for large-scale cybersecurity applications. Although Benford's law has demonstrated potential in anomaly detection, its effectiveness is limited by the difficulty of selecting relevant features. To overcome this, the study combines Benford's law with several distance functions, including Median Absolute Deviation, Kullback-Leibler divergence, Euclidean distance, and Pearson correlation, enabling statistically grounded feature selection. Additional metrics, such as the Kolmogorov test, Jensen-Shannon divergence, and Z statistics, were used for model validation. This approach quantifies discrepancies between expected and observed distributions, addressing classic feature selection challenges like redundancy and imbalance. Validated on both balanced and unbalanced datasets, the model achieved strong results: 88.30% accuracy and 85.08% F1-score in the balanced set, 92.75% accuracy and 95.29% F1-score in the unbalanced set. The integration of Benford's law with distance functions significantly reduced false positives and negatives. Compared to traditional Machine Learning methods, which typically require extensive training and large datasets to achieve F1 scores between 92% and 99%, the proposed approach delivers competitive performance while enhancing computational efficiency, robustness, and interpretability. This balance makes it a practical and scalable alternative for real-time or resource-constrained cybersecurity environments.

FecharLer Abstract

2025

Enhancing IoMT Security by Using Benford's Law and Distance Functions

Autores
Fernandes, P; Ciardhuáin, SO; Antunes, M;

Publicação
Pattern Recognition and Image Analysis - 12th Iberian Conference, IbPRIA 2025, Coimbra, Portugal, June 30 - July 3, 2025, Proceedings, Part I

Abstract
The increasing connectivity of Internet of Medical Things (IoMT) devices has accentuated their susceptibility to cyberattacks. The sensitive data they handle makes them prime targets for information theft and extortion, while outdated and insecure communication protocols further elevate security risks. This paper presents a lightweight and innovative approach that combines Benford’s law with statistical distance functions to detect attacks in IoMT devices. The methodology uses Benford’s law to analyze digit frequency and classify IoMT devices traffic as benign or malicious, regardless of attack type. It employs distance-based statistical functions like Jensen-Shannon divergence, Kullback-Leibler divergence, Pearson correlation, and the Kolmogorov test to detect anomalies. Experimental validation was conducted on the CIC-IoMT-2024 benchmark dataset, comprising 45 features and multiple attack types. The best performance was achieved with the Kolmogorov test (a=0.01), particularly in DoS ICMP attacks, yielding a precision of 99.24%, a recall of 98.73%, an F1 score of 98.97%, and an accuracy of 97.81%. Jensen-Shannon divergence also performed robustly in detecting SYN-based attacks, demonstrating strong detection with minimal computational cost. These findings confirm that Benford’s law, when combined with well-chosen statistical distances, offers a viable and efficient alternative to machine learning models for anomaly detection in constrained environments like IoMT. © 2025 Elsevier B.V., All rights reserved.

FecharLer Abstract

2025

An Optimized Multi-class Classification for Industrial Control Systems

Autores
Palma, A; Antunes, M; Alves, A;

Publicação
Pattern Recognition and Image Analysis - 12th Iberian Conference, IbPRIA 2025, Coimbra, Portugal, June 30 - July 3, 2025, Proceedings, Part I

Abstract
Ensuring the security of Industrial Control Systems (ICS) is increasingly critical due to increasing connectivity and cyber threats. Traditional security measures often fail to detect evolving attacks, necessitating more effective solutions. This paper evaluates machine learning (ML) methods for ICS cybersecurity, using the ICS-Flow dataset and Optuna for hyperparameter tuning. The selected models, namely Random Forest (RF), AdaBoost, XGBoost, Deep Neural Networks, Artificial Neural Networks, ExtraTrees (ET), and Logistic Regression, are assessed using macro-averaged F1-score to handle class imbalance. Experimental results demonstrate that ensemble-based methods (RF, XGBoost, and ET) offer the highest overall detection performance, particularly in identifying commonly occurring attack types. However, minority classes, such as IP-Scan, remain difficult to detect accurately, indicating that hyperparameter tuning alone is insufficient to fully deal with imbalanced ICS data. These findings highlight the importance of complementary measures, such as focused feature selection, to enhance classification capabilities and protect industrial networks against a wider array of threats. © 2025 Elsevier B.V., All rights reserved.

FecharLer Abstract

2025

Anomaly Detection and Root Cause Analysis in Cloud-Native Environments Using Large Language Models and Bayesian Networks

Autores
Pedroso, DF; Almeida, L; Pulcinelli, LEG; Aisawa, WAA; Dutra, I; Bruschi, SM;

Publicação
IEEE ACCESS

Abstract
Cloud computing technologies offer significant advantages in scalability and performance, enabling rapid deployment of applications. The adoption of microservices-oriented architectures has introduced an ecosystem characterized by an increased number of applications, frameworks, abstraction layers, orchestrators, and hypervisors, all operating within distributed systems. This complexity results in the generation of vast quantities of logs from diverse sources, making the analysis of these events an inherently challenging task, particularly in the absence of automation. To address this issue, Machine Learning techniques leveraging Large Language Models (LLMs) offer a promising approach for dynamically identifying patterns within these events. In this study, we propose a novel anomaly detection framework utilizing a microservices architecture deployed on Kubernetes and Istio, enhanced by an LLM model. The model was trained on various error scenarios, with Chaos Mesh employed as an error injection tool to simulate faults of different natures, and Locust used as a load generator to create workload stress conditions. After an anomaly is detected by the LLM model, we employ a dynamic Bayesian network to provide probabilistic inferences about the incident, proving the relationships between components and assessing the degree of impact among them. Additionally, a ChatBot powered by the same LLM model allows users to interact with the AI, ask questions about the detected incident, and gain deeper insights. The experimental results demonstrated the model's effectiveness, reliably identifying all error events across various test scenarios. While it successfully avoided missing any anomalies, it did produce some false positives, which remain within acceptable limits.

FecharLer Abstract

2025

EVSOAR: Security Orchestration, Automation and Response via EV Charging Stations

Autores
Freitas, T; Silva, E; Yasmin, R; Shoker, A; Correia, ME; Martins, R; Esteves Veríssimo, PJ;

Publicação
101st IEEE Vehicular Technology Conference, VTC Spring 2025, Oslo, Norway, June 17-20, 2025

Abstract

2025

A Risk Manager for Intrusion Tolerant Systems: Enhancing HAL 9000 With New Scoring and Data Sources

Autores
Freitas, T; Novo, C; Dutra, I; Soares, J; Correia, ME; Shariati, B; Martins, R;

Publicação
SOFTWARE-PRACTICE & EXPERIENCE

Abstract
Background Intrusion Tolerant Systems (ITS) aim to maintain system security despite adversarial presence by limiting the impact of successful attacks. Current ITS risk managers rely heavily on public databases like NVD and Exploit DB, which suffer from long delays in vulnerability evaluation, reducing system responsiveness.Objective This work extends the HAL 9000 Risk Manager to integrate additional real-time threat intelligence sources and employ machine learning techniques to automatically predict and reassess vulnerability risk scores, addressing limitations of existing solutions.Methods A custom-built scraper collects diverse cybersecurity data from multiple Open Source Intelligence (OSINT) platforms, such as NVD, CVE, AlienVault OTX, and OSV. HAL 9000 uses machine learning models for CVE score prediction, vulnerability clustering through scalable algorithms, and reassessment incorporating exploit likelihood and patch availability to dynamically evaluate system configurations.Results Integration of newly scraped data significantly enhances the risk management capabilities, enabling faster detection and mitigation of emerging vulnerabilities with improved resilience and security. Experiments show HAL 9000 provides lower risk and more resilient configurations compared to prior methods while maintaining scalability and automation.Conclusions The proposed enhancements position HAL 9000 as a next-generation autonomous Risk Manager capable of effectively incorporating diverse intelligence sources and machine learning to improve ITS security posture in dynamic threat environments. Future work includes expanding data sources, addressing misinformation risks, and real-world deployments.

FecharLer Abstract