About

Pedro G. Ferreira graduated in Systems and Informatics Engineering (2002) and completed a PhD in Artificial Intelligence at the University of Minho (2007). He was a Postdoctoral Fellow at the Center for Genomic Regulation, Barcelona (2008-2012) and at the University of Geneva (2012-2014). He has participated in several major international consortia, including ICGC-CLL, ENCODE, GEUVADIS and GTEx. He is currently an Assistant Professor at the Department of Computer Science, Faculty of Sciences of the University of Porto, and a researcher at INESCTEC-LIADD and i3s/Ipatimup. His main research focus is genomic data science; in particular, he is interested in unraveling the role of genomics in human health and disease. He has been involved in several bioinformatics start-ups.

Details

  • Name

    Pedro Gabriel Ferreira
  • Role

    Senior Researcher
  • Since

    20th September 2018
Publications

2024

A Distributed Computing Solution for Privacy-Preserving Genome-Wide Association Studies

Authors
Brito, C; Ferreira, P; Paulo, J;

Publication

Abstract
Breakthroughs in sequencing technologies led to an exponential growth of genomic data, providing unprecedented biological insights and new therapeutic applications. However, analyzing such large amounts of sensitive data raises key concerns regarding data privacy, specifically when the information is outsourced to third-party infrastructures for data storage and processing (e.g., cloud computing). Current solutions for data privacy protection resort to centralized designs or cryptographic primitives that impose considerable computational overheads, limiting their applicability to large-scale genomic analysis. We introduce Gyosa, a secure and privacy-preserving distributed genomic analysis solution. Unlike previous work, Gyosa follows a distributed processing design that enables handling larger amounts of genomic data in a scalable and efficient fashion. Further, by leveraging trusted execution environments (TEEs), namely Intel SGX, Gyosa allows users to confidentially delegate their GWAS analysis to untrusted third-party infrastructures. To overcome the memory limitations of SGX, we implement a computation partitioning scheme within Gyosa. This scheme reduces the number of operations done inside the TEEs while safeguarding the users’ genomic data privacy. By integrating this security scheme in Glow, Gyosa provides a secure and distributed environment that facilitates diverse GWAS studies. The experimental evaluation validates the applicability and scalability of Gyosa, reinforcing its ability to provide enhanced security guarantees. Further, the results show that, by distributing GWAS computations, one can achieve a practical and usable privacy-preserving solution.

2024

Integration of multi-modal datasets to estimate human aging

Authors
Ribeiro, R; Moraes, A; Moreno, M; Ferreira, PG;

Publication
MACHINE LEARNING

Abstract
Aging involves complex biological processes leading to the decline of living organisms. As population lifespan increases worldwide, the importance of identifying factors underlying healthy aging has become critical. Integration of multi-modal datasets is a powerful approach for the analysis of complex biological systems, with the potential to uncover novel aging biomarkers. In this study, we leveraged publicly available epigenomic, transcriptomic and telomere length data along with histological images from the Genotype-Tissue Expression project to build tissue-specific regression models for age prediction. Using data from two tissues, lung and ovary, we aimed to compare model performance across data modalities, as well as to assess the improvement resulting from integrating multiple data types. Our results demonstrate that methylation outperformed the other data modalities, with a mean absolute error of 3.36 and 4.36 in the test sets for lung and ovary, respectively. These models achieved lower error rates when compared with established state-of-the-art tissue-agnostic methylation models, emphasizing the importance of a tissue-specific approach. Additionally, this work has shown how the application of Hierarchical Image Pyramid Transformers for feature extraction significantly enhances age modeling using histological images. Finally, we evaluated the benefits of integrating multiple data modalities into a single model. Combining methylation data with other data modalities only marginally improved performance, likely due to the limited number of available samples. Combining gene expression with histological features yielded more accurate age predictions compared with the individual performance of these data types. Given these results, this study shows how machine learning applications can be extended to multi-modal aging research. Code used is available at https://github.com/zroger49/multi_modal_age_prediction.

2024

The molecular impact of cigarette smoking resembles aging across tissues

Authors
Ramirez, JM; Ribeiro, R; Soldatkina, O; Moraes, A; García-Pérez, R; Ferreira, PG; Melé, M;

Publication

Abstract
Tobacco smoke is the main cause of preventable mortality worldwide. Smoking increases the risk of developing many diseases and has been proposed as an aging accelerator. Yet, the molecular mechanisms driving smoking-related health decline and aging acceleration in most tissues remain unexplored. Here, we characterize gene expression, alternative splicing, DNA methylation and histological alterations induced by cigarette smoking across human tissues. We show that smoking impacts tissue architecture and triggers systemic inflammation. We find that in many tissues, the effects of smoking significantly overlap those of aging in the same direction. Specifically, both age and smoking upregulate inflammatory genes and drive hypomethylation at enhancers. In addition, we observe widespread smoking-driven hypermethylation at target regions of the Polycomb repressive complex, which is a well-known aging effect. Smoking-induced epigenetic changes overlap causal aging CpGs, suggesting that these methylation changes may directly mediate aging acceleration observed in smokers. Finally, we find that smoking effects that are shared with aging are more persistent over time. Overall, our multi-tissue and multi-omic analysis of the effects of cigarette smoking provides an extensive characterization of the impact of tobacco smoke across tissues and unravels the molecular mechanisms driving smoking-induced tissue homeostasis decline and aging acceleration.

2023

A systematic evaluation of deep learning methods for the prediction of drug synergy in cancer

Authors
Baptista, D; Ferreira, PG; Rocha, M;

Publication
PLOS COMPUTATIONAL BIOLOGY

Abstract
Author summary: Cancer therapies often fail because tumor cells become resistant to treatment. One way to overcome resistance is by treating patients with a combination of two or more drugs. Some combinations may be more effective than when considering individual drug effects, a phenomenon called drug synergy. Computational drug synergy prediction methods can help to identify new, clinically relevant drug combinations. In this study, we developed several deep learning models for drug synergy prediction. We examined the effect of using different types of deep learning architectures, and different ways of representing drugs and cancer cell lines. We explored the use of biological prior knowledge to select relevant cell line features, and also tested data-driven feature reduction methods. We tested both precomputed drug features and deep learning methods that can directly learn features from raw representations of molecules. We also evaluated whether including genomic features, in addition to gene expression data, improves the predictive performance of the models. Through these experiments, we were able to identify strategies that will help guide the development of new deep learning models for drug synergy prediction in the future.
One of the main obstacles to the successful treatment of cancer is the phenomenon of drug resistance. A common strategy to overcome resistance is the use of combination therapies. However, the space of possibilities is huge and efficient search strategies are required. Machine Learning (ML) can be a useful tool for the discovery of novel, clinically relevant anti-cancer drug combinations. In particular, deep learning (DL) has become a popular choice for modeling drug combination effects. Here, we set out to examine the impact of different methodological choices on the performance of multimodal DL-based drug synergy prediction methods, including the use of different input data types, preprocessing steps and model architectures. Focusing on the NCI ALMANAC dataset, we found that feature selection based on prior biological knowledge has a positive impact: limiting gene expression data to cancer or drug response-specific genes improved performance. Drug features appeared to be more predictive of drug response, with a 41% increase in coefficient of determination (R²) and 26% increase in Spearman correlation relative to a baseline model that used only cell line and drug identifiers. Molecular fingerprint-based drug representations performed slightly better than learned representations: ECFP4 fingerprints increased R² by 5.3% and Spearman correlation by 2.8% relative to the best learned representations. In general, fully connected feature-encoding subnetworks outperformed other architectures. DL outperformed other ML methods by more than 35% (R²) and 14% (Spearman). Additionally, an ensemble combining the top DL and ML models improved performance by about 6.5% (R²) and 4% (Spearman). Using a state-of-the-art interpretability method, we showed that DL models can learn to associate drug and cell line features with drug response in a biologically meaningful way. The strategies explored in this study will help to improve the development of computational methods for the rational design of effective drug combinations for cancer therapy.

2023

Soteria: Preserving Privacy in Distributed Machine Learning

Authors
Brito, C; Ferreira, P; Portela, B; Oliveira, R; Paulo, J;

Publication
38TH ANNUAL ACM SYMPOSIUM ON APPLIED COMPUTING, SAC 2023

Abstract
We propose Soteria, a system for distributed privacy-preserving Machine Learning (ML) that leverages Trusted Execution Environments (e.g. Intel SGX) to run code in isolated containers (enclaves). Unlike previous work, where all ML-related computation is performed at trusted enclaves, we introduce a hybrid scheme, combining computation done inside and outside these enclaves. The conducted experimental evaluation validates that our approach reduces the runtime of ML algorithms by up to 41%, when compared to previous related work. Our protocol is accompanied by a security proof, as well as a discussion regarding resilience against a wide spectrum of ML attacks.

Supervised thesis

2023

Unravelling the Complexity of Human Disease: Transcriptomic Networks of Phenotype - Gene Expression Data

Author
Darmit Manish Kumar

Institution
UP-FCUP

2023

A Multi-Caller Pipeline to maximize the output of Somatic Exome Sequencing Analysis

Author
Inês Sofia Pinheiro Marques

Institution
UP-FCUP

2023

Identification of mRNA signatures by bioinformatic analysis in cancers related to tobacco smoking

Author
Maria de Sá Bessa

Institution
UP-FCUP

2023

New antidotes for Bothrops asper venom: a study of PLA2 protein

Author
Roberto Miguel Pais Pinto

Institution
UP-FCUP

2023

Omics-based prediction of human phenotypes using scalable machine learning approaches

Author
Marta Carolina Cabral Moreno

Institution
UP-FCUP