
Publications by Pedro Henriques Abreu

2015

Missing data imputation on the 5-year survival prediction of breast cancer patients with unknown discrete values

Authors
Garcia Laencina, PJ; Abreu, PH; Abreu, MH; Afonso, N;

Publication
COMPUTERS IN BIOLOGY AND MEDICINE

Abstract
Breast cancer is the most frequently diagnosed cancer in women. Using historical patient information stored in clinical datasets, data mining and machine learning approaches can be applied to predict the survival of breast cancer patients. A common drawback is the absence of information, i.e., missing data, in certain clinical trials. However, most standard prediction methods cannot handle incomplete samples, so missing data imputation is a widely applied approach to this problem. Therefore, taking into account the characteristics of each breast cancer dataset, a detailed analysis is required to determine the most appropriate imputation and prediction methods for each clinical environment. This work analyzes a real breast cancer dataset from the Portuguese Institute of Oncology of Porto with a high percentage of unknown categorical information (most clinical data of the patients are incomplete), which is a challenge in terms of complexity. Four scenarios are evaluated: (I) 5-year survival prediction without imputation, and 5-year survival prediction from the cleaned dataset with (II) Mode imputation, (III) Expectation-Maximization imputation and (IV) K-Nearest Neighbors imputation. Prediction models for breast cancer survivability are constructed using four different methods: K-Nearest Neighbors, Classification Trees, Logistic Regression and Support Vector Machines. Experiments are performed in a nested ten-fold cross-validation procedure and, according to the obtained results, the best performance is achieved by the K-Nearest Neighbors algorithm: more than 81% accuracy and more than 0.78 area under the Receiver Operating Characteristic curve, which constitutes a very good result in this complex scenario.
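The impute-then-classify pipeline of scenario (IV) can be sketched with scikit-learn. The clinical dataset is not public, so synthetic data with roughly 25% missing values stands in for it; all variable names and parameter choices below are illustrative assumptions, not the paper's actual configuration.

```python
import numpy as np
from sklearn.impute import KNNImputer
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# punch random holes (~25% missing) to mimic an incomplete clinical dataset
mask = rng.random(X.shape) < 0.25
X_missing = X.copy()
X_missing[mask] = np.nan

# scenario (IV): K-Nearest Neighbors imputation, then KNN classification
X_imputed = KNNImputer(n_neighbors=5).fit_transform(X_missing)
scores = cross_val_score(KNeighborsClassifier(n_neighbors=5), X_imputed, y, cv=10)
print(round(scores.mean(), 3))
```

The paper's nested cross-validation would additionally tune hyperparameters inside each fold; this sketch uses a single ten-fold loop for brevity.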

2019

A Data Visualization Approach for Intersection Analysis using AIS Data

Authors
Pereira, R; Abreu, P; Polisciuc, E; Machado, P;

Publication
PROCEEDINGS OF THE 14TH INTERNATIONAL JOINT CONFERENCE ON COMPUTER VISION, IMAGING AND COMPUTER GRAPHICS THEORY AND APPLICATIONS - VOL 3: IVAPP

Abstract
Automatic Identification System (AIS) data has been used in several studies with different aims, such as traffic forecasting, pollution control, and the detection of anomalous behavior in vessel trajectories. Regarding this last topic, intersections between vessels are often related to abnormal behavior, but this subject has not yet been explored. In this paper, an approach based on data processing and visualization is introduced to assist domain experts in the task of analyzing these intersections. The work was evaluated on a proprietary dataset covering the Portuguese maritime zone, containing an average of 6460 intersections per day. The results show that several intersections were only noticeable with the visualization strategies proposed here.
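The data-processing step of detecting where two vessel tracks cross reduces to a segment-intersection test. A minimal sketch of the standard orientation-based test follows; the function names and the treatment of tracks as straight segments between AIS fixes are assumptions for illustration, not the paper's actual pipeline.

```python
def ccw(a, b, c):
    # signed area of triangle (a, b, c): >0 counter-clockwise, <0 clockwise
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])

def segments_intersect(p1, p2, q1, q2):
    # proper crossing of track segments p1-p2 and q1-q2:
    # each segment's endpoints must lie on opposite sides of the other
    d1, d2 = ccw(q1, q2, p1), ccw(q1, q2, p2)
    d3, d4 = ccw(p1, p2, q1), ccw(p1, p2, q2)
    return d1 * d2 < 0 and d3 * d4 < 0

print(segments_intersect((0, 0), (2, 2), (0, 2), (2, 0)))  # crossing tracks -> True
```

At thousands of intersections per day, a real implementation would first prune candidate pairs with a spatial index rather than testing all segment pairs.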

2016

Types of assessing student-programming knowledge

Authors
Gomes, A; Correia, FB; Abreu, PH;

Publication
2016 IEEE Frontiers in Education Conference, FIE 2016, Erie, PA, USA, October 12-15, 2016

Abstract
High failure and dropout rates are common in higher-education institutions with introductory programming courses. Some researchers argue that teachers sometimes do not use adequate assessment methods and that many students pass programming courses without knowing how to program. In this paper, the authors describe the assessment methodology applied to a first-year, first-semester Biomedical Engineering programming course (2015/2016). Students' programming skills were tested by playing a game in the first class; they were then assessed with three tests and a final exam, each covering topics the authors considered fundamental for students to master. A correlation analysis between the different types of test and exam questions is carried out to evaluate which are most suitable for assessing programming knowledge, showing that different question types can be used as a pedagogical strategy to assess student difficulty levels and programming skills, helping students acquire abstraction, reasoning and algorithmic thinking at an acceptable level. It is also shown that different forms of questions are equivalent for assessing the same knowledge and that it is possible to predict a student's ability to program at an early stage.
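A correlation analysis between two question types can be illustrated in a few lines with SciPy. The scores below are synthetic stand-ins (the student data is not public), and the 0-20 scale is an assumption for illustration.

```python
import numpy as np
from scipy.stats import pearsonr

# hypothetical per-student scores on two question types (0-20 scale)
rng = np.random.default_rng(4)
test_scores = rng.uniform(0, 20, 40)
exam_scores = 0.8 * test_scores + rng.normal(0, 2, 40)

# Pearson correlation between the two assessment forms
r, p = pearsonr(test_scores, exam_scores)
print(round(r, 2), p < 0.05)
```

A strong, significant correlation between question types is what would support the paper's claim that different forms of questions assess equivalent knowledge.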

2020

Interpretability vs. Complexity: The Friction in Deep Neural Networks

Authors
Amorim, JP; Abreu, PH; Reyes, M; Santos, J;

Publication
Proceedings of the International Joint Conference on Neural Networks

Abstract
Saliency maps have been used as one way to interpret deep neural networks. This method estimates the relevance of each pixel to the image classification, with higher values representing pixels that contribute positively to the classification. The goal of this study is to understand how the complexity of the network affects the interpretability of the saliency maps in classification tasks. To achieve that, we investigate how changes in regularization affect the saliency maps produced and their fidelity to the overall classification process of the network. The experimental setup consists of calculating the fidelity of five saliency map methods, which were compared by applying them to models trained on the CIFAR-10 dataset using different levels of weight decay on some or all of the layers. The results show that models with lower regularization are statistically more interpretable (at a 5% significance level) than the other models. Also, regularization applied only to the higher convolutional layers or to the fully connected layers produces saliency maps with more fidelity.
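The core idea of gradient-based saliency is that the derivative of the predicted class score with respect to each input pixel measures that pixel's relevance. As a minimal stand-in for the deep models in the paper, the sketch below uses a linear "network" (where that gradient is simply the predicted class's weight row); the sizes and names are illustrative assumptions, not the paper's setup.

```python
import numpy as np

# a tiny linear "network": scores = W @ x
rng = np.random.default_rng(1)
W = rng.normal(size=(3, 8))   # 3 classes, 8 input "pixels"
x = rng.normal(size=8)

scores = W @ x
pred = int(np.argmax(scores))

# gradient saliency: |d score_pred / d x_i|, here just the weight row
saliency = np.abs(W[pred])
print(pred, int(saliency.argmax()))
```

For an actual CNN, the same quantity is obtained by backpropagating the class score to the input image; the paper's five saliency methods are refinements of this idea.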

2014

Personalizing breast cancer patients with heterogeneous data

Authors
Abreu, PH; Amaro, H; Silva, DC; Machado, P; Abreu, MH;

Publication
IFMBE Proceedings

Abstract
The prediction of overall survival plays an important role, especially in diseases with a high mortality rate. Within this reality, patients with oncological diseases, particularly frequent ones such as female breast cancer, can benefit from good treatment customization, which in some cases may even lead to a disease-free life. To achieve this customization, this work compares three clustering algorithms (evolutionary, hierarchical and k-medoids). After constructing a database of more than 800 breast cancer patients from a single oncology center, with 15 clinical variables (heterogeneous data) and 25% of the data missing, which illustrates a real clinical scenario, the algorithms were used to group similar patients into clusters. Using Tukey's HSD (Honestly Significant Difference) test at a 95% significance level, statistical differences were detected both between k-medoids and the other two approaches (evolutionary and hierarchical clustering), with p-value < 0.0000001, and between evolutionary and hierarchical clustering, with p-value = 0.0061354. Future work will consist primarily of dealing with the missing data, in order to achieve better results in future prediction.
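The Tukey HSD comparison between three algorithms can be reproduced in outline with `scipy.stats.tukey_hsd`. The per-run quality scores below are synthetic stand-ins (the patient data and the paper's actual evaluation metric are not public), chosen only so the three groups differ.

```python
import numpy as np
from scipy.stats import tukey_hsd

# hypothetical per-run quality scores for the three clustering approaches
rng = np.random.default_rng(2)
evolutionary = rng.normal(0.60, 0.02, 30)
hierarchical = rng.normal(0.63, 0.02, 30)
k_medoids = rng.normal(0.75, 0.02, 30)

res = tukey_hsd(evolutionary, hierarchical, k_medoids)
# res.pvalue[i, j] compares group i with group j, family-wise corrected
print(res.pvalue[0, 2] < 0.05, res.pvalue[1, 2] < 0.05)
```

Tukey's HSD is preferred over repeated pairwise t-tests here because it controls the family-wise error rate across all three comparisons at once.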

2018

Improving the Classifier Performance in Motor Imagery Task Classification: What are the steps in the classification process that we should worry about?

Authors
Santos, MS; Abreu, PH; Rodriguez Bermudez, G; Garcia Laencina, PJ;

Publication
INTERNATIONAL JOURNAL OF COMPUTATIONAL INTELLIGENCE SYSTEMS

Abstract
Brain-Computer Interface systems based on motor imagery are able to identify an individual's intent to initiate control through the classification of electroencephalography patterns. Correctly classifying such patterns is instrumental and strongly depends on a robust machine learning block that can properly process the features extracted from a subject's electroencephalograms. The main objective of this work is to provide an overall view of the machine learning stages, aiming to answer the following question: "What are the steps in the classification process that we should worry about?". The obtained results suggest that future research in the field should focus on two main aspects: exploring techniques for dimensionality reduction, in particular supervised linear approaches, and evaluating adequate validation schemes to allow a more precise interpretation of results.
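Both recommendations, supervised linear dimensionality reduction and a careful validation scheme, can be combined in one scikit-learn pipeline: putting the reduction inside the cross-validated pipeline keeps it from ever seeing the validation fold. The synthetic features below stand in for EEG-derived features; LDA is one common supervised linear reduction, used here as an assumed example rather than the paper's specific choice.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(3)
X = rng.normal(size=(120, 20))            # stand-in for EEG feature vectors
y = (X[:, 0] - X[:, 1] > 0).astype(int)   # stand-in for motor imagery labels

# LDA (supervised linear reduction) fitted inside each CV fold, so the
# reduction never leaks information from the validation data
pipe = make_pipeline(LinearDiscriminantAnalysis(n_components=1),
                     LogisticRegression())
scores = cross_val_score(pipe, X, y, cv=10)
print(round(scores.mean(), 3))
```

Fitting the reduction on the full dataset before splitting, by contrast, would inflate the estimated accuracy, which is exactly the kind of validation pitfall the paper warns about.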
