Publications

Publications by Pedro Gabriel Ferreira

2013

Transcriptome and genome sequencing uncovers functional variation in humans

Authors
Lappalainen, T; Sammeth, M; Friedländer, MR; 't Hoen, PAC; Monlong, J; Rivas, MA; Gonzàlez-Porta, M; Kurbatova, N; Griebel, T; Ferreira, PG; Barann, M; Wieland, T; Greger, L; van Iterson, M; Almlöf, J; Ribeca, P; Pulyakhina, I; Esser, D; Giger, T; Tikhonov, A; Sultan, M; Bertier, G; MacArthur, DG; Lek, M; Lizano, E; Buermans, HPJ; Padioleau, I; Schwarzmayr, T; Karlberg, O; Ongen, H; Kilpinen, H; Beltran, S; Gut, M; Kahlem, K; Amstislavskiy, V; Stegle, O; Pirinen, M; Montgomery, SB; Donnelly, P; McCarthy, MI; Flicek, P; Strom, TM; The Geuvadis Consortium; Lehrach, H; Schreiber, S; Sudbrak, R; Carracedo; Antonarakis, SE; Häsler, R; Syvänen, A; van Ommen, G; Brazma, A; Meitinger, T; Rosenstiel, P; Guigó, R; Gut, IG; Estivill, X; Dermitzakis, ET;

Publication
NATURE

Abstract
Genome sequencing projects are discovering millions of genetic variants in humans, and interpretation of their functional effects is essential for understanding the genetic basis of variation in human traits. Here we report sequencing and deep analysis of messenger RNA and microRNA from lymphoblastoid cell lines of 462 individuals from the 1000 Genomes Project, the first uniformly processed high-throughput RNA-sequencing data from multiple human populations with high-quality genome sequences. We discover extremely widespread genetic variation affecting the regulation of most genes, with transcript structure and expression level variation being equally common but genetically largely independent. Our characterization of causal regulatory variation sheds light on the cellular mechanisms of regulatory and loss-of-function variation, and allows us to infer putative causal variants for dozens of disease-associated loci. Altogether, this study provides a deep understanding of the cellular mechanisms of transcriptome variation and of the landscape of functional variants in the human genome.

2013

Immune response is a personal matter

Authors
Ferreira, PG; Dermitzakis, ET;

Publication
eLife

Abstract

2013

CPEB1 coordinates alternative 3'-UTR formation with translational regulation

Authors
Bava, FA; Eliscovich, C; Ferreira, PG; Minana, B; Ben Dov, C; Guigo, R; Valcarcel, J; Mendez, R;

Publication
NATURE

Abstract
More than half of mammalian genes generate multiple messenger RNA isoforms that differ in their 3' untranslated regions (3' UTRs) and therefore in regulatory sequences(1), often associated with cell proliferation and cancer(2,3); however, the mechanisms coordinating alternative 3'-UTR processing for specific mRNA populations remain poorly defined. Here we report that the cytoplasmic-polyadenylation element binding protein 1 (CPEB1), an RNA-binding protein that regulates mRNA translation(4), also controls alternative 3'-UTR processing. CPEB1 shuttles to the nucleus(5,6), where it co-localizes with splicing factors and mediates shortening of hundreds of mRNA 3' UTRs, thereby modulating their translation efficiency in the cytoplasm. CPEB1-mediated 3'-UTR shortening correlates with cell proliferation and tumorigenesis. CPEB1 binding to pre-mRNAs not only directs the use of alternative polyadenylation sites, but also changes alternative splicing by preventing U2AF65 recruitment. Our results reveal a novel function of CPEB1 in mediating alternative 3'-UTR processing, which is coordinated with regulation of mRNA translation, through its dual nuclear and cytoplasmic functions.

2023

Privacy-Preserving Machine Learning on Apache Spark

Authors
Brito, CV; Ferreira, PG; Portela, BL; Oliveira, RC; Paulo, JT;

Publication
IEEE ACCESS

Abstract
The adoption of third-party machine learning (ML) cloud services is highly dependent on the security guarantees and the performance penalty they incur on workloads for model training and inference. This paper explores security/performance trade-offs for the distributed Apache Spark framework and its ML library. Concretely, we build upon a key insight: in specific deployment settings, one can reveal carefully chosen non-sensitive operations (e.g. statistical calculations). This allows us to considerably improve the performance of privacy-preserving solutions without exposing the protocol to pervasive ML attacks. In more detail, we propose Soteria, a system for distributed privacy-preserving ML that leverages Trusted Execution Environments (e.g. Intel SGX) to run computations over sensitive information in isolated containers (enclaves). Unlike previous work, where all ML-related computation is performed at trusted enclaves, we introduce a hybrid scheme, combining computation done inside and outside these enclaves. The experimental evaluation validates that our approach reduces the runtime of ML algorithms by up to 41% when compared to previous related work. Our protocol is accompanied by a security proof and a discussion regarding resilience against a wide spectrum of ML attacks.

2022

A systematic evaluation of deep learning methods for the prediction of drug synergy in cancer

Authors
Baptista, D; Ferreira, PG; Rocha, M;

Publication

Abstract
One of the main obstacles to the successful treatment of cancer is the phenomenon of drug resistance. A common strategy to overcome resistance is the use of combination therapies. However, the space of possibilities is huge and efficient search strategies are required. Machine Learning (ML) can be a useful tool for the discovery of novel, clinically relevant anti-cancer drug combinations. In particular, deep learning (DL) has become a popular choice for modeling drug combination effects. Here, we set out to examine the impact of different methodological choices on the performance of multimodal DL-based drug synergy prediction methods, including the use of different input data types, preprocessing steps and model architectures. Focusing on the NCI ALMANAC dataset, we found that feature selection based on prior biological knowledge has a positive impact on performance. Drug features appeared to be more predictive of drug response. Molecular fingerprint-based drug representations performed slightly better than learned representations, and gene expression data of cancer or drug response-specific genes also improved performance. In general, fully connected feature-encoding subnetworks outperformed other architectures, with DL outperforming other ML methods. Using a state-of-the-art interpretability method, we showed that DL models can learn to associate drug and cell line features with drug response in a biologically meaningful way. The strategies explored in this study will help to improve the development of computational methods for the rational design of effective drug combinations for cancer therapy.

Author summary
Cancer therapies often fail because tumor cells become resistant to treatment. One way to overcome resistance is by treating patients with a combination of two or more drugs. Some combinations may be more effective than when considering individual drug effects, a phenomenon called drug synergy. Computational drug synergy prediction methods can help to identify new, clinically relevant drug combinations. In this study, we developed several deep learning models for drug synergy prediction. We examined the effect of using different types of deep learning architectures, and different ways of representing drugs and cancer cell lines. We explored the use of biological prior knowledge to select relevant cell line features, and also tested data-driven feature reduction methods. We tested both precomputed drug features and deep learning methods that can directly learn features from raw representations of molecules. We also evaluated whether including genomic features, in addition to gene expression data, improves the predictive performance of the models. Through these experiments, we were able to identify strategies that will help guide the development of new deep learning models for drug synergy prediction in the future.

2023

Predicting Age from Human Lung Tissue Through Multi-modal Data Integration

Authors
Moraes, A; Moreno, M; Ribeiro, R; Ferreira, G;

Publication
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

Abstract
The accurate prediction of biological age can bring important benefits in promoting therapeutic and behavioural strategies for healthy aging. We propose the development of age prediction models using multi-modal datasets, including transcriptomics, methylation and histological images from lung tissue samples of 793 human donors. From a technical point of view this is a challenging problem since not all donors are covered by the same data modalities and the datasets have a very high feature dimensionality with a relatively smaller number of samples. To fairly compare performance across different data types, we’ve created a test set including donors represented in each modality. Given the unique characteristics of the data distribution, we developed gradient boosting tree and convolutional neural network models for each dataset. The performance of the models can be affected by several covariates, including smoking history, and, most importantly, by a skewed distribution of age. Data-centric approaches, including feature engineering, feature selection, data stratification and resampling, proved fundamental in building models that were optimally adapted for each data modality, resulting in significant improvements in model performance for imbalanced regression. The models were then applied to the test set independently, and later combined into a multi-modal ensemble through a voting strategy, predicting age with a median absolute error of 4 years. Even if prediction accuracy remains a challenge, in this work we provide insights to address the difficulties of multi-modal data integration and imbalanced data prediction. © 2023, The Author(s), under exclusive license to Springer Nature Switzerland AG.
