Publications - INESC TEC

Publications by Pedro Gabriel Ferreira

2020

A vast resource of allelic expression data spanning human tissues

Authors
Castel S.E.; Aguet F.; Aguet F.; Aguet F.; Mohammadi P.; Mohammadi P.; Anand S.; Anand S.; Ardlie K.G.; Ardlie K.G.; Gabriel S.; Getz G.A.; Graubert A.; Graubert A.; Hadley K.; Hadley K.; Handsaker R.E.; Handsaker R.E.; Huang K.H.; Kashin S.; Kashin S.; Li X.; MacArthur D.G.; Meier S.R.; Meier S.R.; Nedzel J.L.; Nedzel J.L.; Nguyen D.T.; Segrè A.V.; Todres E.; Todres E.; Balliu B.; Barbeira A.N.; Battle A.; Bonazzola R.; Brown A.; Brown C.D.; Castel S.E.; Conrad D.F.; Cotter D.J.; Cox N.; Das S.; De Goede O.M.; Dermitzakis E.T.; Einson J.; Engelhardt B.E.; Eskin E.; Eulalio T.Y.; Ferraro N.M.; Flynn E.D.; Fresard L.; Gamazon E.R.; Garrido-Martín D.; Gay N.R.; Gloudemans M.J.; Guigó R.; Hame A.R.; He Y.; Hoffman P.J.; Hormozdiari F.; Hou L.; Huang K.H.; Im H.K.; Jo B.; Kasela S.; Kellis M.; Kim-Hellmuth S.; Kwong A.; Lappalainen T.; Li X.; Li X.; Liang Y.; Mangul S.; Montgomery S.B.; Muñoz-Aguirre M.; Nachun D.C.; Nguyen D.T.; Nobel A.B.; Oliva M.; Park Y.S.; Park Y.; Parsana P.; Rao A.S.; Reverter F.; Rouhana J.M.; Sabatti C.; Saha A.; Segrè A.V.; Skol A.D.; Stephens M.; Stranger B.E.; Strober B.J.; Teran N.A.; Viñuela A.; Wang G.; Wen X.; Wright F.; Wucher V.; Zou Y.; Ferreira P.G.;

Publication
Genome Biology

Abstract
Allele expression (AE) analysis robustly measures cis-regulatory effects. Here, we present and demonstrate the utility of a vast AE resource generated from the GTEx v8 release, containing 15,253 samples spanning 54 human tissues for a total of 431 million measurements of AE at the SNP level and 153 million measurements at the haplotype level. In addition, we develop an extension of our tool phASER that allows effect sizes of cis-regulatory variants to be estimated using haplotype-level AE data. This AE resource is the largest to date, and we are able to make haplotype-level data publicly available. We anticipate that the availability of this resource will enable future studies of regulatory variation across human tissues.


2018

Bioinformatics Algorithms: Design and Implementation in Python

Authors
Rocha, M; Ferreira, PG;

Publication
Bioinformatics Algorithms: Design and Implementation in Python

Abstract
Bioinformatics Algorithms: Design and Implementation in Python is a comprehensive book covering many of the most important bioinformatics problems, putting forward the best algorithms and showing how to implement them. The book focuses on algorithms implemented in the Python programming language, which is quickly becoming the most popular language in the bioinformatics field. Readers will find the tools they need to improve their knowledge and skills with regard to algorithm development and implementation, and will also find prototypes of bioinformatics applications that demonstrate the main principles underlying real-world applications.


2021

Correction: Solving unsolved rare neurological diseases—a Solve-RD viewpoint (European Journal of Human Genetics, (2021), 10.1038/s41431-021-00901-1)

Authors
Schüle, R; Timmann, D; Erasmus, CE; Reichbauer, J; Wayand, M; Baets, J; Balicza, P; Chinnery, P; Dürr, A; Haack, T; Hengel, H; Horvath, R; Houlden, H; Kamsteeg, EJ; Kamsteeg, C; Lohmann, K; Macaya, A; Marcé Grau, A; Maver, A; Molnar, J; Münchau, A; Peterlin, B; Riess, O; Schöls, L; Schüle, R; Stevanin, G; Synofzik, M; Timmerman, V; van de Warrenburg, B; van Os, N; Vandrovcova, J; Wayand, M; Wilke, C; van de Warrenburg, B; Schöls, L; Wilke, C; Bevot, A; Zuchner, S; Beltran, S; Laurie, S; Matalonga, L; Graessner, H; Synofzik, M; Graessner, H; Zurek, B; Ellwanger, K; Ossowski, S; Demidov, G; Sturm, M; Schulze Hentrich, JM; Heutink, P; Brunner, H; Scheffer, H; Hoogerbrugge, N; Hoischen, A; ’t Hoen, PAC; Vissers, LELM; Gilissen, C; Steyaert, W; Sablauskas, K; de Voer, RM; Janssen, E; de Boer, E; Steehouwer, M; Yaldiz, B; Kleefstra, T; Brookes, AJ; Veal, C; Gibson, S; Wadsley, M; Mehtarizadeh, M; Riaz, U; Warren, G; Dizjikan, FY; Shorter, T; Töpf, A; Straub, V; Bettolo, CM; Specht, S; Clayton Smith, J; Banka, S; Alexander, E; Jackson, A; Faivre, L; Thauvin, C; Vitobello, A; Denommé Pichon, AS; Duffourd, Y; Tisserant, E; Bruel, AL; Peyron, C; Pélissier, A; Beltran, S; Gut, IG; Laurie, S; Piscia, D; Matalonga, L; Papakonstantinou, A; Bullich, G; Corvo, A; Garcia, C; Fernandez Callejo, M; Hernández, C; Picó, D; Paramonov, I; Lochmüller, H; Gumus, G; Bros Facer, V; Rath, A; Hanauer, M; Olry, A; Lagorce, D; Havrylenko, S; Izem, K; Rigour, F; Durr, A; Davoine, CS; Guillot Noel, L; Heinzmann, A; Coarelli, G; Bonne, G; Evangelista, T; Allamand, V; Nelson, I; Yaou, RB; Metay, C; Eymard, B; Cohen, E; Atalaia, A; Stojkovic, T; Macek, M; Turnovec, M; Thomasová, D; Kremliková, RP; Franková, V; Havlovicová, M; Kremlik, V; Parkinson, H; Keane, T; Spalding, D; Senf, A; Robinson, P; Danis, D; Robert, G; Costa, A; Patch, C; Hanna, M; Houlden, H; Reilly, M; Vandrovcova, J; Muntoni, F; Zaharieva, I; Sarkozy, A; de Jonghe, P; Nigro, V; Banfi, S; Torella, A; Musacchia, F; Piluso, G; Ferlini, A; Selvatici, R; Rossi, R; Neri, M; Aretz, S; Spier, I; Sommer, AK; Peters, S; Oliveira, C; Pelaez, JG; Matos, AR; José, CS; Ferreira, M; Gullo, I; Fernandes, S; Garrido, L; Ferreira, P; Carneiro, F; Swertz, MA; Johansson, L; van der Velde, JK; van der Vries, G; Neerincx, PB; Roelofs Prins, D; Köhler, S; Metcalfe, A; Verloes, A; Drunat, S; Rooryck, C; Trimouille, A; Castello, R; Morleo, M; Pinelli, M; Varavallo, A; De la Paz, MP; Sánchez, EB; Martín, EL; Delgado, BM; de la Rosa, FJAG; Ciolfi, A; Dallapiccola, B; Pizzi, S; Radio, FC; Tartaglia, M; Renieri, A; Benetti, E; Balicza, P; Molnar, MJ; Maver, A; Peterlin, B; Münchau, A; Lohmann, K; Herzog, R; Pauly, M; Macaya, A; Marcé Grau, A; Osorio, AN; de Benito, DN; Lochmüller, H; Thompson, R; Polavarapu, K; Beeson, D; Cossins, J; Cruz, PMR; Hackman, P; Johari, M; Savarese, M; Udd, B; Horvath, R; Capella, G; Valle, L; Holinski Feder, E; Laner, A; Steinke Lange, V; Schröck, E; Rump, A;

Publication
European Journal of Human Genetics

Abstract
In the original publication of the article, the consortium author lists were missing. © 2021, The Author(s).


2018

Exploring the phenotypic consequences of tissue specific gene expression variation inferred from GWAS summary statistics

Authors
Barbeira, AN; Dickinson, SP; Bonazzola, R; Zheng, J; Wheeler, HE; Torres, JM; Torstenson, ES; Shah, KP; Garcia, T; Edwards, TL; Stahl, EA; Huckins, LM; Aguet, F; Ardlie, KG; Cummings, BB; Gelfand, ET; Getz, G; Hadley, K; Handsaker, RE; Huang, KH; Kashin, S; Karczewski, KJ; Lek, M; Li, X; MacArthur, DG; Nedzel, JL; Nguyen, DT; Noble, MS; Segrè, AV; Trowbridge, CA; Tukiainen, T; Abell, NS; Balliu, B; Barshir, R; Basha, O; Battle, A; Bogu, GK; Brown, A; Brown, CD; Castel, SE; Chen, LS; Chiang, C; Conrad, DF; Damani, FN; Davis, JR; Delaneau, O; Dermitzakis, ET; Engelhardt, BE; Eskin, E; Ferreira, PG; Frésard, L; Gamazon, ER; Garrido Martín, D; Gewirtz, ADH; Gliner, G; Gloudemans, MJ; Guigo, R; Hall, IM; Han, B; He, Y; Hormozdiari, F; Howald, C; Jo, B; Kang, EY; Kim, Y; Kim Hellmuth, S; Lappalainen, T; Li, G; Li, X; Liu, B; Mangul, S; McCarthy, MI; McDowell, IC; Mohammadi, P; Monlong, J; Montgomery, SB; Muñoz Aguirre, M; Ndungu, AW; Nobel, AB; Oliva, M; Ongen, H; Palowitch, JJ; Panousis, N; Papasaikas, P; Park, Y; Parsana, P; Payne, AJ; Peterson, CB; Quan, J; Reverter, F; Sabatti, C; Saha, A; Sammeth, M; Scott, AJ; Shabalin, AA; Sodaei, R; Stephens, M; Stranger, BE; Strober, BJ; Sul, JH; Tsang, EK; Urbut, S; Van De Bunt, M; Wang, G; Wen, X; Wright, FA; Xi, HS; Yeger Lotem, E; Zappala, Z; Zaugg, JB; Zhou, YH; Akey, JM; Bates, D; Chan, J; Claussnitzer, M; Demanelis, K; Diegel, M; Doherty, JA; Feinberg, AP; Fernando, MS; Halow, J; Hansen, KD; Haugen, E; Hickey, PF; Hou, L; Jasmine, F; Jian, R; Jiang, L; Johnson, A; Kaul, R; Kellis, M; Kibriya, MG; Lee, K; Li, JB; Li, Q; Lin, J; Lin, S; Linder, S; Linke, C; Liu, Y; Maurano, MT; Molinie, B; Nelson, J; Neri, FJ; Park, Y; Pierce, BL; Rinaldi, NJ; Rizzardi, LF; Sandstrom, R; Skol, A; Smith, KS; Snyder, MP; Stamatoyannopoulos, J; Tang, H; Wang, L; Wang, M; Van Wittenberghe, N; Wu, F; Zhang, R; Nierras, CR; Branton, PA; Carithers, LJ; Guan, P; Moore, HM; Rao, A; Vaught, JB; Gould, SE; Lockart, NC; Martin, C; Struewing, JP; Volpi, S; Addington, AM; Koester, SE; Little, AR; Brigham, LE; Hasz, R; Hunter, M; Johns, C; Johnson, M; Kopen, G; Leinweber, WF; Lonsdale, JT; McDonald, A; Mestichelli, B; Myer, K; Roe, B; Salvatore, M; Shad, S; Thomas, JA; Walters, G; Washington, M; Wheeler, J; Bridge, J; Foster, BA; Gillard, BM; Karasik, E; Kumar, R; Miklos, M; Moser, MT; Jewell, SD; Montroy, RG; Rohrer, DC; Valley, DR; Davis, DA; Mash, DC; Undale, AH; Smith, AM; Tabor, DE; Roche, NV; McLean, JA; Vatanian, N; Robinson, KL; Sobin, L; Barcus, ME; Valentino, KM; Qi, L; Hunter, S; Hariharan, P; Singh, S; Um, KS; Matose, T; Tomaszewski, MM; Barker, LK; Mosavel, M; Siminoff, LA; Traino, HM; Flicek, P; Juettemann, T; Ruffier, M; Sheppard, D; Taylor, K; Trevanion, SJ; Zerbino, DR; Craft, B; Goldman, M; Haeussler, M; Kent, WJ; Lee, CM; Paten, B; Rosenbloom, KR; Vivian, J; Zhu, J; Nicolae, DL; Cox, NJ; Im, HK;

Publication
Nature Communications

Abstract
Scalable, integrative methods to understand mechanisms that link genetic variants with phenotypes are needed. Here we derive a mathematical expression to compute PrediXcan (a gene mapping approach) results using summary data (S-PrediXcan) and show its accuracy and general robustness to misspecified reference sets. We apply this framework to 44 GTEx tissues and 100+ phenotypes from GWAS and meta-analysis studies, creating a growing public catalog of associations that seeks to capture the effects of gene expression variation on human phenotypes. Replication in an independent cohort is shown. Most of the associations are tissue specific, suggesting context specificity of the trait etiology. Colocalized significant associations in unexpected tissues underscore the need for an agnostic scanning of multiple contexts to improve our ability to detect causal regulatory mechanisms. Monogenic disease genes are enriched among significant associations for related traits, suggesting that smaller alterations of these genes may cause a spectrum of milder phenotypes. © 2018 The Author(s).


2022

Scalable transcriptomics analysis with Dask: applications in data science and machine learning

Authors
Moreno, M; Vilaca, R; Ferreira, PG;

Publication
BMC Bioinformatics

Abstract
Background: Gene expression studies are an important tool in biological and biomedical research. The signal carried in expression profiles helps derive signatures for the prediction, diagnosis and prognosis of different diseases. Data science and specifically machine learning have many applications in gene expression analysis. However, as the dimensionality of genomics datasets grows, scalable solutions become necessary. Methods: In this paper we review the main steps and bottlenecks in machine learning pipelines, as well as the main concepts behind scalable data science including those of concurrent and parallel programming. We discuss the benefits of the Dask framework and how it can be integrated with the Python scientific environment to perform data analysis in computational biology and bioinformatics. Results: This review illustrates the role of Dask for boosting data science applications in different case studies. Detailed documentation and code on these procedures is made available at https://github.com/martaccmoreno/gexp-ml-dask. Conclusion: By showing when and how Dask can be used in transcriptomics analysis, this review will serve as an entry point to help genomic data scientists develop more scalable data analysis procedures.


2023

A systematic evaluation of deep learning methods for the prediction of drug synergy in cancer

Authors
Baptista, D; Ferreira, PG; Rocha, M;

Publication
PLOS Computational Biology

Abstract
Author summary: Cancer therapies often fail because tumor cells become resistant to treatment. One way to overcome resistance is by treating patients with a combination of two or more drugs. Some combinations may be more effective than expected from the individual drug effects, a phenomenon called drug synergy. Computational drug synergy prediction methods can help to identify new, clinically relevant drug combinations. In this study, we developed several deep learning models for drug synergy prediction. We examined the effect of using different types of deep learning architectures, and different ways of representing drugs and cancer cell lines. We explored the use of biological prior knowledge to select relevant cell line features, and also tested data-driven feature reduction methods. We tested both precomputed drug features and deep learning methods that can directly learn features from raw representations of molecules. We also evaluated whether including genomic features, in addition to gene expression data, improves the predictive performance of the models. Through these experiments, we were able to identify strategies that will help guide the development of new deep learning models for drug synergy prediction in the future.

One of the main obstacles to the successful treatment of cancer is the phenomenon of drug resistance. A common strategy to overcome resistance is the use of combination therapies. However, the space of possibilities is huge and efficient search strategies are required. Machine Learning (ML) can be a useful tool for the discovery of novel, clinically relevant anti-cancer drug combinations. In particular, deep learning (DL) has become a popular choice for modeling drug combination effects. Here, we set out to examine the impact of different methodological choices on the performance of multimodal DL-based drug synergy prediction methods, including the use of different input data types, preprocessing steps and model architectures. Focusing on the NCI ALMANAC dataset, we found that feature selection based on prior biological knowledge has a positive impact: limiting gene expression data to cancer or drug response-specific genes improved performance. Drug features appeared to be more predictive of drug response, with a 41% increase in coefficient of determination (R²) and 26% increase in Spearman correlation relative to a baseline model that used only cell line and drug identifiers. Molecular fingerprint-based drug representations performed slightly better than learned representations: ECFP4 fingerprints increased R² by 5.3% and Spearman correlation by 2.8% relative to the best learned representations. In general, fully connected feature-encoding subnetworks outperformed other architectures. DL outperformed other ML methods by more than 35% (R²) and 14% (Spearman). Additionally, an ensemble combining the top DL and ML models improved performance by about 6.5% (R²) and 4% (Spearman). Using a state-of-the-art interpretability method, we showed that DL models can learn to associate drug and cell line features with drug response in a biologically meaningful way. The strategies explored in this study will help to improve the development of computational methods for the rational design of effective drug combinations for cancer therapy.
