2019
Authors
Frazao, I; Abreu, PH; Cruz, T; Araújo, H; Simoes, P;
Publication
CRITICAL INFORMATION INFRASTRUCTURES SECURITY (CRITIS 2018)
Abstract
Denial of Service attacks, which have become commonplace in the Information and Communications Technologies domain, constitute a class of threats whose main objective is to degrade or disable a service or functionality on a target. The increasing reliance of Cyber-Physical Systems upon these technologies, together with their progressive interconnection with other infrastructure and/or organizational domains, has contributed to increasing their exposure to these attacks, with potentially catastrophic consequences. Despite the potential impact of such attacks, the lack of generality in related work on attack prevention and detection has prevented their application in real-world scenarios. This paper aims to reduce that effect by analyzing the behavior of classification algorithms across datasets with different characteristics. © 2019, Springer Nature Switzerland AG.
2014
Authors
Machado, P; Martins, T; Amaro, H; Abreu, PH;
Publication
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Abstract
Fitness assignment is one of the biggest challenges in evolutionary art. Interactive evolutionary computation approaches put a significant burden on the user, leading to human fatigue. On the other hand, autonomous evolutionary art systems usually fail to give the users the opportunity to express and convey their artistic goals and preferences. Our approach empowers the users by allowing them to express their intentions through the design of fitness functions. We present a novel responsive interface for designing fitness functions in the scope of evolutionary ant paintings. Once the evolutionary runs are concluded, further control is given to the users by allowing them to specify the rendering details of selected pieces. The analysis of the experimental results highlights how fitness function design influences the outcomes of the evolutionary runs, conveying the intentions of the user and enabling the evolution of a wide variety of images. © 2014 Springer-Verlag.
2014
Authors
Abreu, PH; Amaro, H; Silva, DC; Machado, P; Abreu, MH; Afonso, N; Dourado, A;
Publication
IFMBE Proceedings
Abstract
Breast Cancer is the most common type of cancer in women worldwide. In spite of this fact, there are insufficient studies that, using data mining techniques, are capable of helping medical doctors in their daily practice. This paper presents a comparative study of three ensemble methods (TreeBagger, LPBoost and Subspace) using a clinical dataset with 25% missing values to predict the overall survival of women with breast cancer. To complete the absent values, the k-nearest neighbor (k-NN) algorithm was used with four distinct neighbor values, trying to determine the best one for this particular scenario. Tests were performed for each of the three ensemble methods and each k-NN configuration, and their performance was compared using a Friedman test. Despite the complexity of this challenge, the produced results are promising and the best algorithm configuration (TreeBagger using 3 neighbors) presents a prediction accuracy of 73%. © Springer International Publishing Switzerland 2014.
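The k-NN imputation step described above can be sketched in plain Python (a minimal illustration assuming numeric features, Euclidean distance over the observed features, and mean aggregation over the neighbours; the paper's clinical dataset and exact tooling are not reproduced here):

```python
import math

def knn_impute(rows, k=3):
    """Impute missing values (None) using the mean of the k nearest
    complete rows, with Euclidean distance over the observed features."""
    complete = [r for r in rows if None not in r]
    imputed = []
    for row in rows:
        if None not in row:
            imputed.append(list(row))
            continue
        observed = [i for i, v in enumerate(row) if v is not None]
        # distance is computed only on the features present in this row
        def dist(other):
            return math.sqrt(sum((row[i] - other[i]) ** 2 for i in observed))
        neighbours = sorted(complete, key=dist)[:k]
        filled = list(row)
        for i, v in enumerate(row):
            if v is None:
                filled[i] = sum(n[i] for n in neighbours) / len(neighbours)
        imputed.append(filled)
    return imputed

data = [
    [1.0, 2.0], [1.1, 2.1], [0.9, 1.9],   # cluster around (1, 2)
    [5.0, 6.0], [5.2, 6.1],               # cluster around (5, 6)
    [1.0, None],                          # missing second feature
]
result = knn_impute(data, k=3)
print(result[-1])  # second feature imputed from the (1, 2) cluster
```

With k=3, the incomplete row's neighbours all come from the first cluster, so the imputed value is the mean of their second features.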
2015
Authors
Santos, MS; Abreu, PH; Garcia Laencina, PJ; Simao, A; Carvalho, A;
Publication
JOURNAL OF BIOMEDICAL INFORMATICS
Abstract
Liver cancer is the sixth most frequently diagnosed cancer and, particularly, Hepatocellular Carcinoma (HCC) represents more than 90% of primary liver cancers. Clinicians assess each patient's treatment on the basis of evidence-based medicine, which may not always apply to a specific patient, given the biological variability among individuals. Over the years, and for the particular case of Hepatocellular Carcinoma, some research studies have been developing strategies for assisting clinicians in decision making, using computational methods (e.g. machine learning techniques) to extract knowledge from the clinical data. However, these studies have some limitations that have not yet been addressed: some do not focus entirely on Hepatocellular Carcinoma patients, others have strict application boundaries, and none considers the heterogeneity between patients nor the presence of missing data, a common drawback in healthcare contexts. In this work, a real complex Hepatocellular Carcinoma database composed of heterogeneous clinical features is studied. We propose a new cluster-based oversampling approach robust to small and imbalanced datasets, which accounts for the heterogeneity of patients with Hepatocellular Carcinoma. The preprocessing procedures of this work are based on data imputation considering appropriate distance metrics for both heterogeneous and missing data (HEOM) and clustering studies to assess the underlying patient groups in the studied dataset (K-means). The final approach is applied in order to diminish the impact of underlying patient profiles with reduced sizes on survival prediction. It is based on K-means clustering and the SMOTE algorithm to build a representative dataset and use it as training example for different machine learning procedures (logistic regression and neural networks). 
The results are evaluated in terms of survival prediction and compared across baseline approaches that do not consider clustering and/or oversampling using the Friedman rank test. Our proposed methodology coupled with neural networks outperformed all others, suggesting an improvement over the classical approaches currently used in Hepatocellular Carcinoma prediction models.
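The SMOTE interpolation step at the core of the cluster-based oversampling above can be illustrated as follows (a simplified numeric sketch: the paper draws neighbours within each K-means cluster and uses the HEOM distance for heterogeneous data, both of which this toy version omits; `smote_like` and its parameters are illustrative names, not the paper's implementation):

```python
import random

def smote_like(samples, n_new, k=2, seed=0):
    """Generate synthetic minority samples by interpolating between a
    sample and one of its k nearest neighbours (SMOTE-style)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(samples)
        # k nearest neighbours of x (excluding x itself), by squared distance
        neighbours = sorted(
            (s for s in samples if s is not x),
            key=lambda s: sum((a - b) ** 2 for a, b in zip(x, s)),
        )[:k]
        n = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(x, n)])
    return synthetic

minority = [[1.0, 1.0], [1.2, 0.8], [0.9, 1.1]]
new_points = smote_like(minority, n_new=5)
```

Each synthetic point lies on the segment between a real sample and one of its neighbours, so the oversampled set stays inside the region occupied by the original cluster.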
2024
Authors
Pereira, RC; Abreu, PH; Rodrigues, PP;
Publication
JOURNAL OF COMPUTATIONAL SCIENCE
Abstract
Missing data is an issue that can negatively impact any task performed with the available data and it is often found in real-world domains such as healthcare. One of the most common strategies to address this issue is to perform imputation, where the missing values are replaced by estimates. Several approaches based on statistics and machine learning techniques have been proposed for this purpose, including deep learning architectures such as generative adversarial networks and autoencoders. In this work, we propose a novel siamese neural network suitable for missing data imputation, which we call Siamese Autoencoder-based Approach for Imputation (SAEI). Besides having a deep autoencoder architecture, SAEI also has a custom loss function and triplet mining strategy that are tailored for the missing data issue. The proposed SAEI approach is compared to seven state-of-the-art imputation methods in an experimental setup that comprises 14 heterogeneous datasets of the healthcare domain injected with Missing Not At Random values at a rate between 10% and 60%. The results show that SAEI significantly outperforms all the remaining imputation methods for all experimented settings, achieving an average improvement of 35%. This work is an extension of the article Siamese Autoencoder-Based Approach for Missing Data Imputation [1] presented at the International Conference on Computational Science 2023. It includes new experiments focused on runtime, generalization capabilities, and the impact of the imputation in classification tasks, where the results show that SAEI is the imputation method that induces the best classification results, improving the F1 scores for 50% of the used datasets.
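SAEI's custom loss and triplet mining strategy are tailored to missing data and are not detailed in the abstract; the generic triplet margin loss that siamese training of this kind builds on can, however, be sketched directly (a minimal illustration with hypothetical function names, not the paper's loss):

```python
def euclidean(u, v):
    """Euclidean distance between two embedding vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet margin loss: pull the positive towards the anchor
    and push the negative at least `margin` further away than the positive."""
    return max(0.0, euclidean(anchor, positive)
                    - euclidean(anchor, negative) + margin)

# anchor close to the positive and far from the negative: loss hits zero
print(triplet_loss([0, 0], [0.1, 0.0], [3.0, 4.0]))
```

During training, triplet mining selects which (anchor, positive, negative) combinations to feed this loss; SAEI's contribution is a mining strategy and loss variant adapted to incomplete inputs.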
2024
Authors
Pereira, RC; Abreu, PH; Rodrigues, PP; Figueiredo, MAT;
Publication
EXPERT SYSTEMS WITH APPLICATIONS
Abstract
Experimental assessments of different missing data imputation methods often compute error rates between the original values and the estimated ones. This experimental setup relies on complete datasets that are injected with missing values. The injection process is straightforward for the Missing Completely At Random and Missing At Random mechanisms; however, the Missing Not At Random mechanism poses a major challenge, since the available artificial generation strategies are limited. Furthermore, the studies focused on this latter mechanism tend to disregard a comprehensive baseline of state-of-the-art imputation methods. In this work, both challenges are addressed: four new Missing Not At Random generation strategies are introduced and a benchmark study is conducted to compare six imputation methods in an experimental setup that covers 10 datasets and five missingness levels (10% to 80%). The overall findings are that, for most missing rates and datasets, the best imputation method to deal with Missing Not At Random values is Multiple Imputation by Chained Equations, whereas for higher missingness rates autoencoders show promising results.
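One common way to inject value-dependent (MNAR) missingness is to hide the lowest values of a feature, so that the probability of a value being missing depends on the value itself. The sketch below illustrates this idea only; the four generation strategies introduced in the paper are not detailed in the abstract, and this function and its name are hypothetical:

```python
def inject_mnar_lowest(values, rate):
    """Illustrative MNAR injection: mask the lowest `rate` fraction of a
    feature's values, so missingness depends on the (unobserved) value
    itself -- the defining property of Missing Not At Random."""
    n_missing = int(len(values) * rate)
    # threshold below which values are hidden (ties may mask a few extra)
    threshold = sorted(values)[n_missing - 1] if n_missing else None
    return [None if n_missing and v <= threshold else v for v in values]

obs = inject_mnar_lowest([5, 1, 4, 2, 3, 8, 7, 6, 10, 9], rate=0.3)
```

Because the masked entries are exactly the smallest values, no MCAR- or MAR-based analysis of the observed data can recover the missingness pattern, which is what makes MNAR evaluation setups hard to construct.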