2017
Authors
Cavadas, B; Ferreira, J; Camacho, R; Fonseca, NA; Pereira, L;
Publication
11th International Conference on Practical Applications of Computational Biology & Bioinformatics, PACBB 2017, Porto, Portugal, 21-23 June, 2017
Abstract
The huge amount of genomic and transcriptomic data obtained to characterize human diversity can also be exploited to gather information on the human microbiome indirectly. Here we present QmihR, a pipeline designed to identify and quantify the abundance of known microbiome communities and to search for new/rare pathogenic species in RNA-seq datasets. We applied QmihR to 36 RNA-seq tumor tissue samples from Ukrainian gastric carcinoma patients available in TCGA, in order to characterize their microbiome and assess the efficiency of the pipeline. The microbes present in the samples were in accordance with published data on other European datasets, and an independent BLAST evaluation of the microbiome-aligned reads confirmed that the assigned species had the highest BLAST match hits. QmihR is available on GitHub (https://github.com/Pereira-lab/QmihR).
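The abstract mentions quantifying the abundance of microbial communities from aligned reads. As a minimal sketch of that quantification step (this is an illustration, not QmihR's actual code; the function name and input format are assumptions), relative abundance can be computed as each species' share of the total aligned reads:

```python
from collections import Counter

def relative_abundance(read_assignments):
    """Given one species label per microbiome-aligned read, return
    each species' fraction of the total aligned reads."""
    counts = Counter(read_assignments)
    total = sum(counts.values())
    return {species: n / total for species, n in counts.items()}

# Toy example: three reads assigned to H. pylori, one to E. coli,
# giving shares of 0.75 and 0.25 respectively.
reads = ["H. pylori", "H. pylori", "E. coli", "H. pylori"]
abundances = relative_abundance(reads)
```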
2014
Authors
Camacho, R; Ramos, R; Fonseca, NA;
Publication
INDUCTIVE LOGIC PROGRAMMING: 23RD INTERNATIONAL CONFERENCE
Abstract
Inductive Logic Programming (ILP) is a well-known approach to Multi-Relational Data Mining. ILP systems may take a long time to analyze data, mainly because the (hypothesis) search spaces are often very large and the evaluation of each hypothesis, which involves theorem proving, may be quite time-consuming in some domains. To address these efficiency issues of ILP systems we propose the APIS (And ParallelISm for ILP) system, which uses results from Logic Programming AND-parallelism. The approach partitions the search space into sub-spaces of two kinds: sub-spaces where clause evaluation requires theorem proving, and sub-spaces where clause evaluation is performed quite efficiently without resorting to a theorem prover. We have also defined a new type of redundancy (coverage-equivalent redundancy) that enables the pruning of significant parts of the search space. The new type of pruning, together with the partition of the hypothesis space, considerably improved the performance of the APIS system. An empirical evaluation of the APIS system on standard ILP data sets shows considerable speedups without a loss of accuracy in the models constructed.
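The coverage-equivalent redundancy described above can be sketched as follows (a simplified illustration under assumptions, not APIS's implementation; the function names are hypothetical): two clauses are coverage-equivalent when they cover exactly the same set of examples, so only one representative per coverage set needs to be kept in the search space.

```python
def prune_coverage_equivalent(clauses, covers):
    """Keep one clause per distinct coverage set: a candidate whose
    covered-example set equals that of an already kept clause is
    coverage-equivalent and is pruned from the search space."""
    seen_coverages = set()
    kept = []
    for clause in clauses:
        coverage = frozenset(covers(clause))  # examples this clause covers
        if coverage not in seen_coverages:
            seen_coverages.add(coverage)
            kept.append(clause)
    return kept

# Toy example: c1 and c2 cover the same examples, so c2 is pruned.
toy_coverage = {"c1": {1, 2}, "c2": {1, 2}, "c3": {3}}
kept = prune_coverage_equivalent(["c1", "c2", "c3"], toy_coverage.get)
```

In a real ILP system `covers` would invoke the (possibly expensive) theorem prover, which is why avoiding redundant evaluations pays off.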
2017
Authors
Teixeira, V; Camacho, R; Ferreira, PG;
Publication
2017 IEEE INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICINE (BIBM)
Abstract
Cancer genome projects are characterizing the genome, epigenome and transcriptome of a large number of samples using the latest high-throughput sequencing assays. The generated data sets pose several challenges for traditional statistical and machine learning methods. In this work we are interested in the task of deriving the most informative genes from a cancer gene expression data set. To that end we built denoising autoencoders (DAE) and stacked denoising autoencoders, and we studied the influence of the input nodes on the final representation of the DAE. We also compared these deep learning approaches with other existing approaches. Our study is divided into two main tasks. First, we built and compared the performance of several feature extraction methods, as well as data sampling methods, using classifiers able to distinguish samples of thyroid cancer patients from samples of healthy persons. In the second task, we investigated the possibility of building comprehensible descriptions of gene expression data by using denoising autoencoders and stacked denoising autoencoders as feature extraction methods. After extracting information related to the description built by the network, namely the connection weights, we devised post-processing techniques to extract comprehensible and biologically meaningful descriptions from the constructed models. We were able to build high-accuracy models to discriminate thyroid cancer patients from healthy ones, but the extraction of comprehensible models is still very limited.
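The core technique in this abstract, a denoising autoencoder trained on expression data whose learned weights are then inspected, can be sketched minimally as below. This is a generic illustration under assumptions (function name, layer size, masking-noise corruption, tanh activation, plain gradient descent), not the authors' architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

def train_dae(X, hidden, noise=0.3, lr=0.1, epochs=100):
    """Minimal denoising autoencoder: corrupt inputs with masking
    noise, encode to a hidden layer, reconstruct the clean input,
    and update weights by gradient descent on squared error."""
    n, d = X.shape
    W = rng.normal(0, 0.1, (d, hidden))   # encoder weights
    b = np.zeros(hidden)
    W2 = rng.normal(0, 0.1, (hidden, d))  # decoder weights
    b2 = np.zeros(d)
    for _ in range(epochs):
        mask = rng.random(X.shape) > noise  # masking corruption
        Xc = X * mask
        H = np.tanh(Xc @ W + b)             # encode corrupted input
        R = H @ W2 + b2                     # reconstruct
        err = R - X                         # error vs. clean input
        # backpropagate the squared-error gradient
        gW2 = H.T @ err / n
        gb2 = err.mean(axis=0)
        gH = (err @ W2.T) * (1 - H ** 2)    # tanh derivative
        gW = Xc.T @ gH / n
        gb = gH.mean(axis=0)
        W -= lr * gW;   b -= lr * gb
        W2 -= lr * gW2; b2 -= lr * gb2
    # input-to-hidden weights: one row per input gene, inspected to
    # judge how much each input contributes to the representation
    return W
```

In the spirit of the abstract, the magnitude of each row of the returned weight matrix gives a rough signal of how strongly the corresponding input gene influences the learned representation.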
2017
Authors
Dutra, I; Camacho, R; Barbosa, JG; Marques, O;
Publication
VECPAR
Abstract
2015
Authors
Cavadas, B; Soares, P; Camacho, R; Brandao, A; Costa, MD; Fernandes, V; Pereira, JB; Rito, T; Samuels, DC; Pereira, L;
Publication
HUMAN MUTATION
Abstract
A high-resolution mtDNA phylogenetic tree allowed us to look backward in time to investigate purifying selection. Purifying selection was very strong in the last 2,500 years, continuously eliminating pathogenic mutations back until the end of the Younger Dryas (approximately 11,000 years ago), when a large population expansion likely relaxed selection pressure. This was preceded by a phase of stable selection until another relaxation occurred in the out-of-Africa migration. Demography and selection are closely related: expansions led to relaxation of selection, and higher-pathogenicity mutations significantly decreased the growth of descendants. The only detectable positive selection was the recurrence of highly pathogenic nonsynonymous mutations (m.3394T>C-m.3397A>G-m.3398T>C) at interior branches of the tree, preventing the formation of a dinucleotide STR (TATATA) in the MT-ND1 gene. At the most recent time scale, in 124 mother-children transmissions, purifying selection was detectable through the loss of mtDNA variants with high predicted pathogenicity. A few haplogroup-defining sites were also heteroplasmic, agreeing with a significant propensity of 349 positions in the phylogenetic tree to revert to the ancestral variant. This nonrandom mutation property explains the observation of heteroplasmic mutations at some haplogroup-defining sites in sequencing datasets, which may not indicate poor quality as has been claimed.
2014
Authors
Abreu, P; Soares, C; Camacho, R;
Publication
2014 14TH INTERNATIONAL CONFERENCE ON COMPUTATIONAL SCIENCE AND ITS APPLICATIONS (ICCSA)
Abstract
Optimization studies often require very large computational resources to execute experiments. Furthermore, most of the time, the experiments are repetitions (same problem instances and same algorithm with the same parameters) of runs carried out in past studies. In this work, we propose a framework for the execution of optimization experiments in a distributed environment and for the storage of the results as well as of the experimental conditions. The framework supports not only the organized execution of experiments but also the reuse of the results in future studies.
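The reuse idea above amounts to keying stored results by the full experimental conditions, so a repeated run returns the stored result instead of re-executing. A minimal sketch (an illustration under assumptions; the class and method names are hypothetical, not the framework's API):

```python
import hashlib
import json

class ExperimentStore:
    """Toy result store keyed by experimental conditions: problem
    instance, algorithm name, and parameter settings. A repeated
    run with identical conditions reuses the stored result."""

    def __init__(self):
        self._results = {}

    def _key(self, instance, algorithm, params):
        # Canonical JSON of the conditions makes the key order-independent
        blob = json.dumps([instance, algorithm, params], sort_keys=True)
        return hashlib.sha256(blob.encode()).hexdigest()

    def run(self, instance, algorithm, params, execute):
        key = self._key(instance, algorithm, params)
        if key not in self._results:  # only execute unseen conditions
            self._results[key] = execute(instance, params)
        return self._results[key]

# Usage: the second call hits the store and skips execution.
store = ExperimentStore()
result = store.run("instance-1", "tabu-search", {"iters": 100},
                   lambda inst, p: len(inst) * p["iters"])
again = store.run("instance-1", "tabu-search", {"iters": 100},
                  lambda inst, p: len(inst) * p["iters"])
```

A distributed version would back the dictionary with a shared database, but the keying discipline is the same.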