Publications

Publications by LIAAD

2015

Predicting malignancy from mammography findings and image-guided core biopsies

Authors
Ferreira, P; Fonseca, NA; Dutra, I; Woods, R; Burnside, E

Publication
INTERNATIONAL JOURNAL OF DATA MINING AND BIOINFORMATICS

Abstract
The main goal of this work is to produce machine learning models that predict the outcome of a mammography from a reduced set of annotated mammography findings. In the study we used a dataset consisting of 348 consecutive breast masses that underwent image-guided core biopsy between October 2005 and December 2007 in 328 female subjects. We applied various algorithms with parameter variation to learn from the data. The tasks were to predict mass density and to predict malignancy. The best classifier for mass density is based on a support vector machine and has an accuracy of 81.3%. The expert correctly annotated 70% of the mass densities. The best classifier for malignancy is also based on a support vector machine and has an accuracy of 85.6%, with a positive predictive value of 85%. One important contribution of this work is that our model can predict malignancy in the absence of the mass density attribute, since we can fill in this attribute using our mass density predictor.

2015

Comparison of GENCODE and RefSeq gene annotation and the impact of reference geneset on variant effect prediction

Authors
Frankish, A; Uszczynska, B; Ritchie, GRS; Gonzalez, JM; Pervouchine, D; Petryszak, R; Mudge, JM; Fonseca, N; Brazma, A; Guigo, R; Harrow, J

Publication
BMC GENOMICS

Abstract
Background: A vast amount of DNA variation is being identified by increasingly large-scale exome and genome sequencing projects. To be useful, variants require accurate functional annotation and a wide range of tools are available to this end. McCarthy et al. recently demonstrated the large differences in prediction of loss-of-function (LoF) variation when RefSeq and Ensembl transcripts are used for annotation, highlighting the importance of the reference transcripts on which variant functional annotation is based. Results: We describe a detailed analysis of the similarities and differences between the gene and transcript annotation in the GENCODE and RefSeq genesets. We demonstrate that the GENCODE Comprehensive set is richer in alternative splicing, novel CDSs, novel exons and has higher genomic coverage than RefSeq, while the GENCODE Basic set is very similar to RefSeq. Using RNAseq data we show that exons and introns unique to one geneset are expressed at a similar level to those common to both. We present evidence that the differences in gene annotation lead to large differences in variant annotation where GENCODE and RefSeq are used as reference transcripts, although this is predominantly confined to non-coding transcripts and UTR sequence, with at most ~30% of LoF variants annotated discordantly. We also describe an investigation of dominant transcript expression, showing that it both supports the utility of the GENCODE Basic set in providing a smaller set of more highly expressed transcripts and provides a useful, biologically-relevant filter for further reducing the complexity of the transcriptome. Conclusions: The reference transcripts selected for variant functional annotation do have a large effect on the outcome. The GENCODE Comprehensive transcripts contain more exons, have greater genomic coverage and capture many more variants than RefSeq in both genome and exome datasets, while the GENCODE Basic set shows a higher degree of concordance with RefSeq and has fewer unique features. We propose that the GENCODE Comprehensive set has great utility for the discovery of new variants with functional potential, while the GENCODE Basic set is more suitable for applications demanding less complex interpretation of functional variants.

2015

Convergent Evolution at the Gametophytic Self-Incompatibility System in Malus and Prunus

Authors
Aguiar, B; Vieira, J; Cunha, AE; Fonseca, NA; Iezzoni, A; van Nocker, S; Vieira, CP

Publication
PLOS ONE

Abstract
S-RNase-based gametophytic self-incompatibility (GSI) has evolved once before the split of the Asteridae and Rosidae. This conclusion is based on the phylogenetic history of the S-RNase that determines pistil specificity. In Rosaceae, molecular characterizations of Prunus species, and species from the tribe Pyreae (i.e., Malus, Pyrus, Sorbus) revealed different numbers of genes determining S-pollen specificity. In Prunus only one pistil and one pollen gene determine GSI, while in Pyreae there is one pistil gene but multiple pollen genes, implying different specificity recognition mechanisms. It is thus conceivable that within Rosaceae the genes involved in GSI in the two lineages are not orthologous but possibly paralogous. To address this hypothesis we characterised the S-RNase lineage and S-pollen lineage genes present in the genomes of five Rosaceae species from three genera: M. x domestica (apple, self-incompatible (SI); tribe Pyreae), P. persica (peach, self-compatible (SC); Amygdaleae), P. mume (mei, SI; Amygdaleae), Fragaria vesca (strawberry, SC; Potentilleae), and F. nipponica (mori-ichigo, SI; Potentilleae). Phylogenetic analyses revealed that the Malus and Prunus S-RNase and S-pollen genes belong to distinct gene lineages, and that only Prunus S-RNase and SFB-lineage genes are present in Fragaria. Thus, the S-RNase-based GSI system of Malus evolved independently from the ancestral system of Rosaceae. Using expression patterns based on RNA-seq data, the ancestral S-RNase lineage gene is inferred to be expressed in pistils only, while the ancestral S-pollen lineage gene is inferred to be expressed in tissues other than pollen.

2015

Obstructive Sleep Apnea diagnosis: the Bayesian network model revisited

Authors
Rodrigues, PP; Santos, DF; Leite, L

Publication
2015 IEEE 28TH INTERNATIONAL SYMPOSIUM ON COMPUTER-BASED MEDICAL SYSTEMS (CBMS)

Abstract
Obstructive Sleep Apnea (OSA) is a disease that affects approximately 4% of men and 2% of women worldwide but is still underestimated and underdiagnosed. The standard method for assessing this index, and therefore defining the OSA diagnosis, is polysomnography (PSG). Previous work developed relevant Bayesian network models, but those were based only on variables univariately associated with the outcome, which biases the knowledge the models can represent. The aim of this work was to develop and validate new Bayesian network decision support models that could be used during sleep consultations to assess the need for PSG. Bayesian models were developed using a) expert opinion, b) hill-climbing, c) naive Bayes and d) TAN structures. The validity of the resulting models was assessed with in-sample AUC and stratified cross-validation, and compared with a previously published model. Overall, models achieved good discriminative power (AUC>70%) and validity (measures consistently above 70%). The main conclusions are a) the need to integrate a wider range of variables in the final models and b) support for using Bayesian networks in the diagnosis of obstructive sleep apnea.

2015

Predicting Within-24h Visualisation of Hospital Clinical Reports Using Bayesian Networks

Authors
Rodrigues, PP; Lemes, CI; Dias, CC; Cruz Correia, R

Publication
PROGRESS IN ARTIFICIAL INTELLIGENCE

Abstract
Clinical record integration and visualisation is one of the most important abilities of modern health information systems (HIS). Its use in clinical encounters plays a relevant role in the efficacy and efficiency of health care. One solution is to consider a virtual patient record (VPR), created by integrating all clinical records, which must collect documents from distributed departmental HIS. However, the amount of data currently being produced, stored and used in these settings is stressing information technology infrastructure: the integrated VPR of a central hospital may gather millions of clinical documents, so accessing data becomes an issue. Our vision is that storing clinical reports in either primary (fast) or secondary (slower) storage devices, according to their likelihood of visualisation, can help manage the workload of these systems. The aim of this work was to develop a model that predicts the probability of visualisation, within 24h after production, of each clinical report in the VPR, so that reports less likely to be visualised in the following 24 hours can be stored in secondary devices. We studied log data from an existing virtual patient record (n=4975 reports) with information on report creation and report first-time visualisation dates, along with contextual information. Bayesian network classifiers were built and compared with logistic regression, revealing high discriminating power (AUC around 90%) and accuracy in predicting whether a report is going to be accessed in the 24 hours after creation.

2015

Preliminary study for a Bayesian network prognostic model for Crohn's disease

Authors
Dias, CC; Magro, F; Rodrigues, PP

Publication
2015 IEEE 28TH INTERNATIONAL SYMPOSIUM ON COMPUTER-BASED MEDICAL SYSTEMS (CBMS)

Abstract
Crohn's disease is one type of inflammatory bowel disease whose incidence is currently increasing, and may affect any part of both the small and large intestine, possibly irritating deeper layers of the organs. Being a chronic disease, neither treatment nor surgery actually heals the patients. Thus, focus has been given to identifying good prognostic models based on clinical factors since they are more easily included in daily practice. The aim of this work is to provide an initial study on the adequacy of a Bayesian network model to enhance the prognosis prediction for patients with Crohn's disease. Multicentric study data of patients with surgery or immunosuppression in the six months after diagnosis were used to derive a Bayesian network, focusing on the prognosis and the analysis of factor interactions, including clinical features, disease course, treatment, follow-up plan, and adverse events. Two models were evaluated (naive Bayes and Tree-Augmented Naive Bayes) and also compared with logistic regression, using cross-validation and ROC curve analysis. Preliminary results showed competitive accuracy (above 75%) and discriminative power (above 70%). The generated models presented interesting insights on factor interaction and predictive ability for the prognosis, supporting their use in future clinical decision support systems.
