Publications

Publications by Nuno Fonseca

2020

Butler enables rapid cloud-based analysis of thousands of human genomes

Authors
Yakneen, S; Waszak, SM; Gertz, M; Korbel, JO; Aminou, B; Bartolome, J; Boroevich, KA; Boyce, R; Brooks, AN; Buchanan, A; Buchhalter, I; Butler, AP; Byrne, NJ; Cafferkey, A; Campbell, PJ; Chen, ZH; Cho, S; Choi, W; Clapham, P; Davis Dusenbery, BN; De La Vega, FM; Demeulemeester, J; Dow, MT; Dursi, LJ; Eils, J; Eils, R; Ellrott, K; Farcas, C; Favero, F; Fayzullaev, N; Ferretti, V; Flicek, P; Fonseca, NA; Gelpi, JL; Getz, G; Gibson, B; Grossman, RL; Harismendy, O; Heath, AP; Heinold, MC; Hess, JM; Hofmann, O; Hong, JH; Hudson, TJ; Hutter, B; Hutter, CM; Hubschmann, D; Imoto, S; Ivkovic, S; Jeon, SH; Jiao, W; Jung, J; Kabbe, R; Kahles, A; Kerssemakers, JNA; Kim, HL; Kim, H; Kim, J; Kim, Y; Kleinheinz, K; Koscher, M; Koures, A; Kovacevic, M; Lawerenz, C; Leshchiner, I; Liu, J; Livitz, D; Mihaiescu, GL; Mijalkovic, S; Lazic, AM; Miyano, S; Miyoshi, N; Nahal Bose, HK; Nakagawa, H; Nastic, M; Newhouse, SJ; Nicholson, J; O'Connor, BD; Ocana, D; Ohi, K; Ohno Machado, L; Omberg, L; Ouellette, BFF; Paramasivam, N; Perry, MD; Pihl, TD; Prinz, M; Puiggros, M; Radovic, P; Raine, KM; Rheinbay, E; Rosenberg, M; Royo, R; Ratsch, G; Saksena, G; Schlesner, M; Shorser, SI; Short, C; Sofia, HJ; Spring, J; Stein, LD; Struck, AJ; Tiao, G; Tijanic, N; Torrents, D; Van Loo, P; Vazquez, M; Vicente, D; Wala, JA; Wang, ZN; Weischenfeldt, J; Werner, J; Williams, A; Woo, Y; Wright, AJ; Xiang, Q; Yang, LM; Yuen, D; Yung, CK; Zhang, JJ;

Publication
Nature Biotechnology

Abstract
Efficient, large-scale genomic analysis is facilitated on the cloud by a computational tool with error-diagnosing and self-healing capabilities. We present Butler, a computational tool that facilitates large-scale genomic analyses on public and academic clouds. Butler includes innovative anomaly detection and self-healing functions that improve the efficiency of data processing and analysis by 43% compared with current approaches. Butler enabled processing of a 725-terabyte cancer genome dataset from the Pan-Cancer Analysis of Whole Genomes (PCAWG) project in a time-efficient and uniform manner.

2020

Cancer LncRNA Census reveals evidence for deep functional conservation of long noncoding RNAs in tumorigenesis

Authors
Carlevaro Fita, J; Lanzós, A; Feuerbach, L; Hong, C; Mas Ponte, D; Pedersen, JS; Abascal, F; Amin, SB; Bader, GD; Barenboim, J; Beroukhim, R; Bertl, J; Boroevich, KA; Brunak, S; Campbell, PJ; Carlevaro Fita, J; Chakravarty, D; Chan, CWY; Chen, K; Choi, JK; Deu Pons, J; Dhingra, P; Diamanti, K; Feuerbach, L; Fink, JL; Fonseca, NA; Frigola, J; Gambacorti Passerini, C; Garsed, DW; Gerstein, M; Getz, G; Gonzalez Perez, A; Guo, Q; Gut, IG; Haan, D; Hamilton, MP; Haradhvala, NJ; Harmanci, AO; Helmy, M; Herrmann, C; Hess, JM; Hobolth, A; Hodzic, E; Hong, C; Hornshøj, H; Isaev, K; Izarzugaza, JMG; Johnson, R; Johnson, TA; Juul, M; Juul, RI; Kahles, A; Kahraman, A; Kellis, M; Khurana, E; Kim, J; Kim, JK; Kim, Y; Komorowski, J; Korbel, JO; Kumar, S; Lanzós, A; Larsson, E; Lawrence, MS; Lee, D; Lehmann, KV; Li, S; Li, X; Lin, Z; Liu, EM; Lochovsky, L; Lou, S; Madsen, T; Marchal, K; Martincorena, I; Martinez Fundichely, A; Maruvka, YE; McGillivray, PD; Meyerson, W; Muiños, F; Mularoni, L; Nakagawa, H; Nielsen, MM; Paczkowska, M; Park, K; Park, K; Pedersen, JS; Pich, O; Pons, T; Pulido Tamayo, S; Raphael, BJ; Reimand, J; Reyes Salazar, I; Reyna, MA; Rheinbay, E; Rubin, MA; Rubio Perez, C; Sabarinathan, R; Sahinalp, SC; Saksena, G; Salichos, L; Sander, C; Schumacher, SE; Shackleton, M; Shapira, O; Shen, C; Shrestha, R; Shuai, S; Sidiropoulos, N; Sieverling, L; Sinnott Armstrong, N; Stein, LD; Stuart, JM; Tamborero, D; Tiao, G; Tsunoda, T; Umer, HM; Uusküla Reimand, L; Valencia, A; Vazquez, M; Verbeke, LPC; Wadelius, C; Wadi, L; Wang, J; Warrell, J; Waszak, SM; Weischenfeldt, J; Wheeler, DA; Wu, G; Yu, J; Zhang, J; Zhang, X; Zhang, Y; Zhao, Z; Zou, L; von Mering, C; Johnson, R;

Publication
Communications Biology

Abstract
Joana Carlevaro-Fita, Andres Lanzos et al. present the Cancer LncRNA Census (CLC), a manually curated dataset of 122 long noncoding RNAs (lncRNAs) with experimentally-validated functions in cancer based on data from the ICGC/TCGA Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium. CLC lncRNAs have unique gene features, and a number display evidence for cancer-driving functions that are conserved from humans to mice. Long non-coding RNAs (lncRNAs) are a growing focus of cancer genomics studies, creating the need for a resource of lncRNAs with validated cancer roles. Furthermore, it remains debated whether mutated lncRNAs can drive tumorigenesis, and whether such functions could be conserved during evolution. Here, as part of the ICGC/TCGA Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium, we introduce the Cancer LncRNA Census (CLC), a compilation of 122 GENCODE lncRNAs with causal roles in cancer phenotypes. In contrast to existing databases, CLC requires strong functional or genetic evidence. CLC genes are enriched amongst driver genes predicted from somatic mutations, and display characteristic genomic features. Strikingly, CLC genes are enriched for driver mutations from unbiased, genome-wide transposon-mutagenesis screens in mice. We identified 10 tumour-causing mutations in orthologues of 8 lncRNAs, including LINC-PINT and NEAT1, but not MALAT1. Thus CLC represents a dataset of high-confidence cancer lncRNAs. Mutagenesis maps are a novel means for identifying deeply-conserved roles of lncRNAs in tumorigenesis.

2020

Combined burden and functional impact tests for cancer driver discovery using DriverPower

Authors
Shuai, S; Abascal, F; Amin, SB; Bader, GD; Bandopadhayay, P; Barenboim, J; Beroukhim, R; Bertl, J; Boroevich, KA; Brunak, S; Campbell, PJ; Carlevaro Fita, J; Chakravarty, D; Chan, CWY; Chen, K; Choi, JK; Deu Pons, J; Dhingra, P; Diamanti, K; Feuerbach, L; Fink, JL; Fonseca, NA; Frigola, J; Gambacorti Passerini, C; Garsed, DW; Gerstein, M; Getz, G; Guo, Q; Gut, IG; Haan, D; Hamilton, MP; Haradhvala, NJ; Harmanci, AO; Helmy, M; Herrmann, C; Hess, JM; Hobolth, A; Hodzic, E; Hong, C; Hornshøj, H; Isaev, K; Izarzugaza, JMG; Johnson, R; Johnson, TA; Juul, M; Juul, RI; Kahles, A; Kahraman, A; Kellis, M; Khurana, E; Kim, J; Kim, JK; Kim, Y; Komorowski, J; Korbel, JO; Kumar, S; Lanzós, A; Larsson, E; Lawrence, MS; Lee, D; Lehmann, KV; Li, S; Li, X; Lin, Z; Liu, EM; Lochovsky, L; Lou, S; Madsen, T; Marchal, K; Martincorena, I; Martinez Fundichely, A; Maruvka, YE; McGillivray, PD; Meyerson, W; Muiños, F; Mularoni, L; Nakagawa, H; Nielsen, MM; Paczkowska, M; Park, K; Park, K; Pedersen, JS; Pons, T; Pulido Tamayo, S; Raphael, BJ; Reimand, J; Reyes Salazar, I; Reyna, MA; Rheinbay, E; Rubin, MA; Rubio Perez, C; Sahinalp, SC; Saksena, G; Salichos, L; Sander, C; Schumacher, SE; Shackleton, M; Shapira, O; Shen, C; Shrestha, R; Shuai, S; Sidiropoulos, N; Sieverling, L; Sinnott Armstrong, N; Stein, LD; Stuart, JM; Tamborero, D; Tiao, G; Tsunoda, T; Umer, HM; Uusküla Reimand, L; Valencia, A; Vazquez, M; Verbeke, LPC; Wadelius, C; Wadi, L; Wang, J; Warrell, J; Waszak, SM; Weischenfeldt, J; Wheeler, DA; Wu, G; Yu, J; Zhang, J; Zhang, X; Zhang, Y; Zhao, Z; Zou, L; von Mering, C; Gallinger, S; Stein, L;

Publication
Nature Communications

Abstract
The discovery of driver mutations is one of the key motivations for cancer genome sequencing. Here, as part of the ICGC/TCGA Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium, which aggregated whole genome sequencing data from 2658 cancers across 38 tumour types, we describe DriverPower, a software package that uses mutational burden and functional impact evidence to identify driver mutations in coding and non-coding sites within cancer whole genomes. Using a total of 1373 genomic features derived from public sources, DriverPower’s background mutation model explains up to 93% of the regional variance in the mutation rate across multiple tumour types. By incorporating functional impact scores, we are able to further increase the accuracy of driver discovery. Testing across a collection of 2583 cancer genomes from the PCAWG project, DriverPower identifies 217 coding and 95 non-coding driver candidates. Comparing to six published methods used by the PCAWG Drivers and Functional Interpretation Working Group, DriverPower has the highest F1 score for both coding and non-coding driver discovery. This demonstrates that DriverPower is an effective framework for computational driver discovery.

2021

Comparative Genomics of Xanthomonas euroxanthea and Xanthomonas arboricola pv. juglandis Strains Isolated from a Single Walnut Host Tree

Authors
Fernandes, C; Martins, L; Teixeira, M; Blom, J; Pothier, JE; Fonseca, NA; Tavares, F;

Publication
Microorganisms

Abstract
The recent report of distinct Xanthomonas lineages of Xanthomonas arboricola pv. juglandis and Xanthomonas euroxanthea within the same walnut tree revealed that this consortium of walnut-associated Xanthomonas includes both pathogenic and nonpathogenic strains. As the implications of this co-colonization are still poorly understood, in order to unveil niche-specific adaptations, the genomes of three X. euroxanthea strains (CPBF 367, CPBF 424(T), and CPBF 426) and of an X. arboricola pv. juglandis strain (CPBF 427) isolated from a single walnut tree in Loures (Portugal) were sequenced with two different technologies, Illumina and Nanopore, to provide consistent single scaffold chromosomal sequences. General genomic features showed that CPBF 427 has a genome similar to other X. arboricola pv. juglandis strains, regarding its size, number, and content of CDSs, while X. euroxanthea strains show a reduction regarding these features comparatively to X. arboricola pv. juglandis strains. Whole genome comparisons revealed remarkable genomic differences between X. arboricola pv. juglandis and X. euroxanthea strains, which translates into different pathogenicity and virulence features, namely regarding type 3 secretion system and its effectors and other secretory systems, chemotaxis-related proteins, and extracellular enzymes. Altogether, the distinct genomic repertoire of X. euroxanthea may be particularly useful to address pathogenicity emergence and evolution in walnut-associated Xanthomonas.

2021

Complete Genome Sequence Obtained by Nanopore and Illumina Hybrid Assembly of Xanthomonas arboricola pv. juglandis CPBF 427, Isolated from Buds of a Walnut Tree

Authors
Teixeira, M; Fernandes, C; Chaves, C; Pinto, J; Tavares, F; Fonseca, NA;

Publication
Microbiology Resource Announcements

Abstract
We report the genome sequence of Xanthomonas arboricola pv. juglandis strain CPBF 427, which was isolated from early-season buds of a diseased walnut tree, suggesting overwinter potential. This study provides a consistent genomic reference for this pathovar and may contribute to addressing the overwinter survival of these walnut pathogens.

2021

Evaluating the impact of sampling strategies and bioinformatics on ethanol-based DNA metabarcoding

Authors
Martins, FM; Fonseca, NA; Egeter, B; Pinto, J; Assunção, T; Chaves, C; Sousa, P; Jesus, J; Beja, P;

Publication
ARPHA Conference Abstracts

Abstract
Recent developments on ethanol-based DNA (etDNA) metabarcoding have shown that it is possible to extract meaningful information about macroinvertebrate community diversity and composition from the ethanol used to preserve bulk samples. The major advantages of this molecular approach are the reduced processing time and costs, and the possibility to keep specimens intact for other experiments. Yet, organisms with highly sclerotised exoskeleton or that are rare in the sample have been found to release a lower amount of DNA into solution and tend to be consistently missed by etDNA metabarcoding, thereby compromising the viability of the method. Few studies have shown that the first steps of the metabarcoding workflow are crucial for the good performance of etDNA-based assays, such as the decision on storage time before sampling and the ethanol phase to be analysed, the inclusion of pre-treatment strategies (i.e., freezing), and the choice of the DNA extraction protocol. In this study, we aimed to evaluate the combined effect of various technical choices on the performance of etDNA metabarcoding, considering factors such as sample volume, ethanol phase of sorted and unsorted samples, pre-capture treatments (evaporation vs filtration) and bioinformatic pipelines. Through the application of decision-tree models, our preliminary data revealed that the increase of volume (by itself) is enough to improve PCR amplification yields and proportion of families matching the morphological identifications, with great impact on the detection of hard-bodied and cased taxa. Also, no major differences among phases with or without a sorting step nor among bioinformatic pipelines were detected, particularly at higher volumes. Our results suggest that the higher performance (with lower observed variation) in taxonomic detection at higher volumes is likely a consequence of a higher availability of longer fragments of DNA in solution. 
This study highlights the importance of understanding the impact of technical choices to improve the efficiency of a DNA-based method, and reinstates etDNA metabarcoding as a potential method in the context of biomonitoring.
