Publications - INESC TEC

Publications by Tânia Pereira

2022

Differential Gene Expression Analysis of the Most Relevant Genes for Lung Cancer Prediction and Sub-type Classification

Authors
Ramos, B; Pereira, T; Silva, F; Costa, JL; Oliveira, HP

Publication
PATTERN RECOGNITION AND IMAGE ANALYSIS (IBPRIA 2022)

Abstract
An early diagnosis of cancer is essential for a good prognosis, and the identification of differentially expressed genes can enable better personalization of the treatment plan by targeting those genes in therapy. This work proposes a pipeline that predicts the presence of lung cancer and its subtype, allowing the identification of differentially expressed genes for the lung adenocarcinoma and squamous cell carcinoma subtypes. A gradient boosted tree model is used for the classification tasks based on RNA-seq data. The main focus of the present work is the analysis of the gene expressions that best differentiate cancerous from normal tissue and of the features that distinguish between lung cancer subtypes. Differentially expressed genes are analyzed by performing hierarchical clustering in order to identify gene signatures that are commonly regulated and biological signatures associated with a specific subtype. This analysis highlighted patterns of commonly regulated genes already known in the literature as cancer or subtype-specific genes, and others that are not yet documented in the literature.

2022

The Influence of a Coherent Annotation and Synthetic Addition of Lung Nodules for Lung Segmentation in CT Scans

Authors
Sousa, J; Pereira, T; Neves, I; Silva, F; Oliveira, HP

Publication
SENSORS

Abstract
Lung cancer is a highly prevalent pathology and a leading cause of cancer-related deaths. Most patients are diagnosed only when the disease has manifested itself, which is usually a sign of lung cancer in an advanced stage and, as a consequence, the 5-year survival rates are low. To increase the chances of survival, improving the capacity for early cancer detection is crucial, and computed tomography (CT) scans play a key role in it. The manual evaluation of CTs is a time-consuming task, and computer-aided diagnosis (CAD) systems can help relieve that burden. The segmentation of the lung is one of the first steps in these systems, yet it is very challenging given the heterogeneity of lung diseases usually present and associated with cancer development. In our previous work, a segmentation model based on a ResNet34 and U-Net combination was developed on a cross-cohort dataset; it yielded good segmentation masks for multiple pathological conditions but misclassified some of the lung nodules. The multiple datasets used for the model development originated from different annotation protocols, which generated inconsistencies for the learning process, and the annotations are usually not adequate for lung cancer studies since they did not comprise lung nodules. In addition, the initial datasets used for training presented a reduced number of nodules, which was shown not to be enough to allow the segmentation model to learn to include them as part of the lung. In this work, an objective protocol for the segmentation of the lung mask was defined, and the previous annotations were carefully reviewed and corrected to create consistent and adequate ground-truth masks for the development of the segmentation model. Data augmentation with domain knowledge was used to create lung nodules in the cases used to train the model. The model developed achieved a Dice similarity coefficient (DSC) above 0.9350 for all test datasets and showed an ability to cope not only with a variety of lung patterns but also with the presence of lung nodules. This study shows the importance of using consistent annotations for the supervised learning process; producing them is a very time-consuming task, but one of great importance to healthcare applications. Due to the lack of massive datasets in the medical field, which consequently limits representativeness, data augmentation with domain knowledge could be a promising way to overcome this limitation in the development of learning models.

2022

Learning Models for Traumatic Brain Injury Mortality Prediction on Pediatric Electronic Health Records

Authors
Fonseca, J; Liu, XY; Oliveira, HP; Pereira, T

Publication
FRONTIERS IN NEUROLOGY

Abstract
Background: Traumatic Brain Injury (TBI) is one of the leading causes of injury-related mortality in the world, with severe cases reaching mortality rates of 30-40%. It is highly heterogeneous both in causes and consequences, complicating medical interpretation and prognosis. Gathering clinical, demographic, and laboratory data to perform a prognosis requires time and skill in several clinical specialties. Machine learning (ML) methods can take advantage of the data and guide physicians toward a better prognosis and, consequently, better healthcare. The objective of this study was to develop and test a wide range of machine learning models and evaluate their capability of predicting mortality of TBI, at hospital discharge, while assessing the similarity between the predictive value of the data and clinical significance. Methods: The dataset used is the Hackathon Pediatric Traumatic Brain Injury (HPTBI) dataset, composed of electronic health records containing clinical annotations and demographic data of 300 patients. Four different classification models were tested, either with or without feature selection. For each combination of classification model and feature selection method, the area under the receiver operating characteristic curve (ROC-AUC), balanced accuracy, precision, and recall were calculated. Results: Methods based on decision trees perform better when using all features (Random Forest, AUC = 0.86 and XGBoost, AUC = 0.91), but other models require prior feature selection to obtain the best results (k-Nearest Neighbors, AUC = 0.90 and Artificial Neural Networks, AUC = 0.84). Additionally, Random Forest and XGBoost allow feature importance to be assessed, which could give insights for future strategies in clinical routine. Conclusion: Predictive capability depends greatly on the combination of model and feature selection method used but, overall, ML models showed a very good performance in mortality prediction for TBI. The feature importance results indicate that predictive value is not directly related to clinical significance.

2022

On the way for the best imaging features from CT images to predict EGFR Mutation Status in Lung Cancer

Authors
Silva, P; Pereira, T; Teixeira, M; Silva, F; Oliveira, HP

Publication
44th Annual International Conference of the IEEE Engineering in Medicine & Biology Society, EMBC 2022, Glasgow, Scotland, United Kingdom, July 11-15, 2022

Abstract
Artificial Intelligence-based tools have shown promising results in helping clinicians with diagnosis tasks. Radiogenomics would aid genotype characterization using information from radiologic images. Predicting the mutation status of the main oncogenes associated with lung cancer will help clinicians reach a more accurate diagnosis and a personalized treatment plan, decreasing the need for biopsy. In this work, novel and objective features were extracted from the lung that contained the nodule, and several machine learning methods were combined with feature selection techniques to select the best approach to predict the EGFR mutation status in lung cancer CT images. An AUC of 0.756 ± 0.055 was obtained using logistic regression with independent component analysis as the feature selector, supporting the hypothesis that CT images can capture pathophysiological information of great value for clinical assessment and personalized medicine in lung cancer. Clinical relevance: Radiogenomic approaches could be a useful aid for lung cancer characterization. This work represents a preliminary study for the development of computer-aided decision systems to provide a more accurate and faster characterization of lung cancer, which is fundamental for an adequate treatment plan for lung cancer patients.

2022

A Random Forest-based Classifier for MYCN Status Prediction in Neuroblastoma using CT Images

Authors
Pereira, T; Silva, F; Claro, P; Carvalho, DC; Dias, SC; Torrão, H; Oliveira, HP

Publication
44th Annual International Conference of the IEEE Engineering in Medicine & Biology Society, EMBC 2022, Glasgow, Scotland, United Kingdom, July 11-15, 2022

Abstract
Neuroblastoma (NB) is the most common extracranial solid tumor in childhood. Genomic amplification of MYCN is associated with poor outcomes and is detected in 16% of all NB cases. CT scans and MRI are the imaging techniques recommended for diagnosis and disease staging. The assessment of imaging features such as tumor volume, shape, and local extension provides relevant prognostic information. Radiogenomics has shown powerful results in the assessment of the genotype based on imaging findings automatically extracted from medical images. In this work, random forest was used to classify the MYCN amplification using radiomic features extracted from CT slices in a population of 46 NB patients. The learning model showed an area under the curve (AUC) of 0.85 ± 0.13, suggesting that radiomic-based methodologies might be helpful in extracting information that is not accessible to the naked eye but could aid clinicians in diagnosis and in defining the treatment plan. Clinical relevance: This approach represents a random forest-based model to predict the MYCN amplification in NB patients that could give a faster, earlier, and repeatable analysis of the tumor over time.

2022

Robustness Analysis of Deep Learning-Based Lung Cancer Classification Using Explainable Methods

Authors
Malafaia, M; Silva, F; Neves, I; Pereira, T; Oliveira, HP

Publication
IEEE ACCESS

Abstract
Deep Learning (DL) based classification algorithms have been shown to achieve top results in clinical diagnosis, namely with lung cancer datasets. However, the complexity and opaqueness of the models, together with the still scant training datasets, call for the development of explainable modeling methods that enable the interpretation of the results. To this end, in this paper we propose a novel interpretability approach and demonstrate how it can be used on a lung cancer malignancy DL classifier to assess its stability and congruence even when fed a small number of image samples. Additionally, by disclosing the regions of the medical images most relevant to the resulting classification, the approach provides important insights into the corresponding clinical meaning captured by the algorithm. Explanations of the results provided by ten different models for the same test sample are compared. These comparisons attest to the stability of the approach and to the algorithm's focus on the same image regions.