
Publications by Pedro Henriques Abreu

2016

Predicting Breast Cancer Recurrence Using Machine Learning Techniques: A Systematic Review

Authors
Abreu, PH; Santos, MS; Abreu, MH; Andrade, B; Silva, DC

Publication
ACM Computing Surveys

Abstract
Background: Recurrence is an important cornerstone of breast cancer behavior and is intrinsically related to mortality. Despite its relevance, it is rarely recorded in breast cancer datasets, which makes research on its prediction more difficult. Objectives: To evaluate the performance of machine learning techniques applied to the prediction of breast cancer recurrence. Material and Methods: Review of works published between 1997 and 2014 that applied machine learning techniques to local and open-source databases. Results: The review showed that it is difficult to obtain a representative dataset for breast cancer recurrence and that there is no consensus on the best set of predictors for this disease. High accuracy is often achieved, but at the expense of sensitivity. The missing data and class imbalance problems are rarely addressed, and the chosen performance metrics are frequently inappropriate for the context. Discussion and Conclusions: Although different techniques have been used, the prediction of breast cancer recurrence is still an open problem. Combining different machine learning techniques and defining standard predictors for breast cancer recurrence appear to be the main future directions for obtaining better results.
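The observation that high accuracy can hide poor sensitivity under class imbalance can be illustrated with a minimal Python sketch. The confusion-matrix counts below are hypothetical (100 patients, 10 of whom recur) and are not taken from any of the reviewed studies; the function name `confusion_metrics` is likewise illustrative.

```python
def confusion_metrics(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Return accuracy, sensitivity and specificity from confusion-matrix counts."""
    total = tp + fn + tn + fp
    return {
        "accuracy": (tp + tn) / total,
        # Share of actual recurrences that the model flags.
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        # Share of non-recurrences correctly labelled as such.
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
    }


# Hypothetical cohort: 100 patients, only 10 recurrences (class imbalance).
# A trivial model that always predicts "no recurrence":
baseline = confusion_metrics(tp=0, fn=10, tn=90, fp=0)
# A hypothetical model that detects 8 of the 10 recurrences:
model = confusion_metrics(tp=8, fn=2, tn=72, fp=18)

print(baseline)  # {'accuracy': 0.9, 'sensitivity': 0.0, 'specificity': 1.0}
print(model)     # {'accuracy': 0.8, 'sensitivity': 0.8, 'specificity': 0.8}
```

The trivial baseline reaches higher accuracy while detecting no recurrences at all, which is why sensitivity, or a metric that weighs both classes, should be reported alongside accuracy.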

2015

Missing data imputation on the 5-year survival prediction of breast cancer patients with unknown discrete values

Authors
Garcia Laencina, PJ; Abreu, PH; Abreu, MH; Afonso, N

Publication
Computers in Biology and Medicine

Abstract
Breast cancer is the most frequently diagnosed cancer in women. Using historical patient information stored in clinical datasets, data mining and machine learning approaches can be applied to predict the survival of breast cancer patients. A common drawback is the absence of information, i.e., missing data, in certain clinical trials. However, most standard prediction methods cannot handle incomplete samples, so missing data imputation is a widely applied approach to this problem. Therefore, taking into account the characteristics of each breast cancer dataset, a detailed analysis is required to determine the most appropriate imputation and prediction methods in each clinical environment. This research work analyzes a real breast cancer dataset from the Portuguese Institute of Oncology of Porto with a high percentage of unknown categorical information (most of the patients' clinical data are incomplete), which is a challenge in terms of complexity. Four scenarios are evaluated: (I) 5-year survival prediction without imputation, and 5-year survival prediction from the cleaned dataset with (II) Mode imputation, (III) Expectation-Maximization imputation and (IV) K-Nearest Neighbors imputation. Prediction models for breast cancer survivability are constructed using four different methods: K-Nearest Neighbors, Classification Trees, Logistic Regression and Support Vector Machines. Experiments are performed in a nested ten-fold cross-validation procedure and, according to the obtained results, the K-Nearest Neighbors algorithm provides the best results: more than 81% accuracy and an area under the Receiver Operating Characteristic curve above 0.78, which are very good results in this complex scenario.
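Two of the imputation strategies named above, mode imputation and K-Nearest Neighbors imputation, can be sketched in plain Python for categorical records. This is a minimal illustration, not the paper's implementation: missing values are assumed to be encoded as `None`, the distance between two records is assumed to be the fraction of mismatching categories over the features observed in both, and the toy records and function names are hypothetical. Expectation-Maximization imputation is not shown.

```python
from collections import Counter
from typing import List, Optional

Record = List[Optional[str]]  # None marks a missing (unknown) category


def mode_impute(data: List[Record]) -> List[Record]:
    """Replace each missing value with the most frequent observed value of its column."""
    modes = []
    for j in range(len(data[0])):
        observed = [row[j] for row in data if row[j] is not None]
        modes.append(Counter(observed).most_common(1)[0][0] if observed else None)
    return [[modes[j] if v is None else v for j, v in enumerate(row)] for row in data]


def mismatch_distance(a: Record, b: Record) -> float:
    """Fraction of mismatching categories over features observed in both records."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return float("inf")  # nothing to compare: treat as infinitely far
    return sum(x != y for x, y in shared) / len(shared)


def knn_impute(data: List[Record], k: int = 3) -> List[Record]:
    """Fill each missing value by majority vote among the k nearest records observing it."""
    result = [row[:] for row in data]
    for i, row in enumerate(data):
        for j, value in enumerate(row):
            if value is not None:
                continue
            donors = [other for idx, other in enumerate(data)
                      if idx != i and other[j] is not None]
            donors.sort(key=lambda other: mismatch_distance(row, other))
            votes = Counter(other[j] for other in donors[:k])
            if votes:
                result[i][j] = votes.most_common(1)[0][0]
    return result


# Hypothetical categorical records: [T stage, N stage, grade].
data = [
    ["T1", "N0", "G1"],
    ["T1", "N0", None],
    ["T2", "N1", "G3"],
    ["T2", "N1", "G3"],
    ["T2", "N1", "G3"],
    [None, "N0", "G1"],
]
print(mode_impute(data)[1], mode_impute(data)[5])
print(knn_impute(data, k=3)[1], knn_impute(data, k=3)[5])
```

On these toy records, mode imputation fills the gaps with the globally most frequent categories (`G3`, `T2`), whereas KNN imputation borrows values from the most similar records (`G1`, `T1`), so the two strategies can disagree on the same dataset.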

2019

A data visualization approach for intersection analysis using AIS data

Authors
Pereira, RC; Abreu, PH; Polisciuc, E; Machado, P

Publication
VISIGRAPP 2019 - Proceedings of the 14th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications

Abstract
Automatic Identification System (AIS) data has been used in several studies with different goals, such as traffic forecasting, pollution control or the detection of anomalous behavior in vessel trajectories. Regarding this last subject, intersections between vessels are often related to abnormal behavior, but this topic has not yet been explored. This paper introduces an approach, based on data processing and visualization, to assist domain experts in analyzing these intersections. The approach was tested on a proprietary dataset that covers the Portuguese maritime zone and contains an average of 6460 intersections per day. The results show that several intersections were only noticeable with the visualization strategies proposed here.
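The abstract does not define how an intersection is detected. One simple geometric reading, assumed here for illustration only, is that two vessel tracks intersect when any leg between consecutive AIS positions of one vessel crosses a leg of the other. The sketch below uses the standard orientation test for segment intersection, treats positions as planar (x, y) coordinates and ignores timestamps (both simplifying assumptions); the track data and function names are hypothetical.

```python
from itertools import product
from typing import List, Tuple

Point = Tuple[float, float]  # planar (x, y); a simplifying assumption


def orientation(p: Point, q: Point, r: Point) -> int:
    """Sign of the cross product (q - p) x (r - p): 1 left turn, -1 right turn, 0 collinear."""
    cross = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    return (cross > 0) - (cross < 0)


def within_box(p: Point, q: Point, r: Point) -> bool:
    """True if r lies in the bounding box of segment p-q (used for collinear points)."""
    return (min(p[0], q[0]) <= r[0] <= max(p[0], q[0])
            and min(p[1], q[1]) <= r[1] <= max(p[1], q[1]))


def segments_intersect(a1: Point, a2: Point, b1: Point, b2: Point) -> bool:
    o1, o2 = orientation(a1, a2, b1), orientation(a1, a2, b2)
    o3, o4 = orientation(b1, b2, a1), orientation(b1, b2, a2)
    if o1 != o2 and o3 != o4:
        return True  # general case: each segment straddles the other's line
    # Collinear cases: an endpoint of one segment lies on the other segment.
    return ((o1 == 0 and within_box(a1, a2, b1)) or (o2 == 0 and within_box(a1, a2, b2))
            or (o3 == 0 and within_box(b1, b2, a1)) or (o4 == 0 and within_box(b1, b2, a2)))


def tracks_intersect(track_a: List[Point], track_b: List[Point]) -> bool:
    """True if any leg of track_a crosses any leg of track_b (timestamps ignored)."""
    legs_a = list(zip(track_a, track_a[1:]))
    legs_b = list(zip(track_b, track_b[1:]))
    return any(segments_intersect(p1, p2, q1, q2)
               for (p1, p2), (q1, q2) in product(legs_a, legs_b))


# Hypothetical tracks, already projected to planar coordinates.
vessel_a = [(0.0, 0.0), (2.0, 2.0), (4.0, 2.0)]
vessel_b = [(0.0, 2.0), (2.0, 0.0)]
vessel_c = [(0.0, 5.0), (4.0, 5.0)]
print(tracks_intersect(vessel_a, vessel_b))  # True
print(tracks_intersect(vessel_a, vessel_c))  # False
```

A real analysis would at least project longitude and latitude to a planar coordinate system and check whether both vessels were near the crossing point at similar times; the brute-force pairwise check above is also only practical for short tracks.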

2016

Types of assessing student-programming knowledge

Authors
Gomes, A; Correia, F; Abreu, P

Publication
Proceedings - Frontiers in Education Conference, FIE

Abstract
High failure and dropout rates are common in introductory programming courses in higher education institutions. Some researchers argue that teachers sometimes do not use appropriate assessment methods and that many students pass programming courses without knowing how to program. In this paper, the authors describe the assessment methodology applied to a first-year, first-semester Biomedical Engineering programming course (2015/2016). Students' programming skills were tested by playing a game in the first class; they were then assessed with three tests and a final exam, each covering topics the authors considered fundamental for students to master. A correlation analysis between the different types of test and exam questions is performed to determine which are most suitable for assessing programming knowledge. It shows that different question types can be used as a pedagogical strategy to assess students' difficulty levels and programming skills, helping students acquire abstraction, reasoning and algorithmic thinking at an acceptable level. It is also shown that different forms of questions are equivalent for assessing the same knowledge, and that a student's ability to program can be predicted at an early stage.
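The abstract does not state which correlation coefficient was used. As an assumption, the sketch below computes Pearson's r between two hypothetical score lists (for example, students' marks on one test and on an exam question); it illustrates the kind of correlation analysis described, not the authors' exact procedure, and the function name `pearson_r` is illustrative.

```python
from math import sqrt
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally long samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length (at least 2 values)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Covariance and standard deviations share the same 1/n factor, so it cancels out.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / (sd_x * sd_y)


# Hypothetical marks (0-20) of five students on a test and on an exam question.
test_marks = [12, 15, 9, 18, 14]
exam_marks = [11, 16, 8, 17, 15]
print(round(pearson_r(test_marks, exam_marks), 3))
```

A value close to 1 would suggest that the two question types rank students similarly; with real course data, a significance test and a larger sample would be needed before drawing conclusions.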

2020

Interpretability vs. Complexity: The Friction in Deep Neural Networks

Authors
Amorim, JP; Abreu, PH; Reyes, M; Santos, J

Publication
Proceedings of the International Joint Conference on Neural Networks

Abstract
Saliency maps have been used as one way to interpret deep neural networks. This method estimates the relevance of each pixel to the image classification, with higher values representing pixels that contribute positively to the classification. The goal of this study is to understand how the complexity of the network affects the interpretability of saliency maps in classification tasks. To achieve that, we investigate how changes in regularization affect the saliency maps produced and their fidelity to the network's overall classification process. The experimental setup consists of calculating and comparing the fidelity of five saliency map methods, applied to models trained on the CIFAR-10 dataset with different levels of weight decay on some or all of the layers. The results show that models with lower regularization are statistically (significance level of 5%) more interpretable than the other models. Also, regularization applied only to the higher convolutional layers or to the fully connected layers produces saliency maps with higher fidelity. © 2020 IEEE.

2014

Personalizing breast cancer patients with heterogeneous data

Authors
Abreu, PH; Amaro, H; Silva, DC; Machado, P; Abreu, MH

Publication
IFMBE Proceedings

Abstract
Predicting overall survival plays an important role, especially in diseases with a high mortality rate. In this context, patients with oncological diseases, particularly the most frequent ones such as breast cancer in women, can benefit greatly from customized care, which in some cases may even lead to a disease-free life. To achieve this customization, this work compares three clustering algorithms (evolutionary, hierarchical and k-medoids). A database of more than 800 breast cancer patients from a single oncology center was built, with 15 clinical variables (heterogeneous data) and 25% of the data missing, which reflects a real clinical scenario; the algorithms were then used to group similar patients into clusters. Using Tukey's HSD (Honestly Significant Difference) test, statistically significant differences were detected between k-medoids and each of the other two approaches (evolutionary and hierarchical clustering) (p-value < 0.0000001), as well as between evolutionary and hierarchical clustering (p-value = 0.0061354), at a 95% confidence level. Future work will focus primarily on dealing with the missing data in order to achieve better prediction results. © 2014, Springer International Publishing Switzerland.
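The k-medoids approach mentioned above can be sketched for heterogeneous records with missing values. This is a minimal illustration under stated assumptions, not the paper's method: the distance is a Gower-style average (range-scaled absolute difference for numeric features, 0/1 mismatch for categorical ones) that simply skips features missing in either record, initialization naively takes the first k records, and the patient records and function names are hypothetical. The evolutionary and hierarchical approaches are not shown.

```python
from typing import List, Sequence, Tuple, Union

Value = Union[float, str, None]  # None marks a missing value
Record = Sequence[Value]


def gower_like_distance(a: Record, b: Record, numeric: Sequence[bool],
                        ranges: Sequence[float]) -> float:
    """Average per-feature dissimilarity over features observed in both records."""
    total, used = 0.0, 0
    for j, (x, y) in enumerate(zip(a, b)):
        if x is None or y is None:
            continue  # skip features missing in either record
        if numeric[j]:
            total += abs(x - y) / ranges[j] if ranges[j] else 0.0
        else:
            total += 0.0 if x == y else 1.0
        used += 1
    return total / used if used else 1.0  # no shared features: maximally distant


def distance_matrix(records: List[Record], numeric: Sequence[bool]) -> List[List[float]]:
    """Pairwise distances, scaling numeric features by their observed range."""
    ranges = []
    for j, is_num in enumerate(numeric):
        observed = [r[j] for r in records if r[j] is not None]
        ranges.append(max(observed) - min(observed) if is_num and observed else 0.0)
    return [[gower_like_distance(a, b, numeric, ranges) for b in records] for a in records]


def assign(dist: List[List[float]], medoids: List[int]) -> List[int]:
    """Label each record with the index of its closest medoid."""
    return [min(range(len(medoids)), key=lambda c: dist[i][medoids[c]])
            for i in range(len(dist))]


def k_medoids(dist: List[List[float]], k: int,
              max_iter: int = 100) -> Tuple[List[int], List[int]]:
    """Alternate between assigning records and re-picking each cluster's medoid."""
    medoids = list(range(k))  # naive initialization: the first k records
    for _ in range(max_iter):
        labels = assign(dist, medoids)
        new_medoids = []
        for c in range(k):
            members = [i for i, lab in enumerate(labels) if lab == c]
            if not members:  # keep the old medoid if its cluster is empty
                new_medoids.append(medoids[c])
                continue
            # The new medoid minimizes the total distance to the cluster members.
            new_medoids.append(min(members, key=lambda m: sum(dist[m][i] for i in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, assign(dist, medoids)


# Hypothetical patients: [age, tumour grade, HER2 status]; None = missing.
patients = [
    [45.0, "G1", "neg"],
    [47.0, "G1", None],
    [50.0, None, "neg"],
    [70.0, "G3", "pos"],
    [72.0, "G3", "pos"],
    [None, "G3", "pos"],
]
dist = distance_matrix(patients, numeric=[True, False, False])
medoids, labels = k_medoids(dist, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

Here the second medoid turns out to be the record with a missing age: because the distance ignores missing features, incomplete records can look artificially close to others, which is one reason why handling missing data matters for this kind of clustering.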
