Cookies Policy
The website need some cookies and similar means to function. If you permit us, we will use those means to collect data on your visits for aggregated statistics to improve our service. Find out More
Accept Reject
  • Menu
Publications

Publications by LIAAD

2024

Process mining embeddings: Learning vector representations for Petri nets

Authors
Colonna, JG; Fares, AA; Duarte, M; Sousa, R;

Publication
INTELLIGENT SYSTEMS WITH APPLICATIONS

Abstract
Process Mining offers a powerful framework for uncovering, analyzing, and optimizing real-world business processes. Petri nets provide a versatile means of modeling process behavior. However, traditional methods often struggle to effectively compare complex Petri nets, hindering their potential for process enhancement. To address this challenge, we introduce PetriNet2Vec, an unsupervised methodology inspired by Doc2Vec. This approach converts Petri nets into embedding vectors, facilitating the comparison, clustering, and classification of process models. We validated our approach using the PDC Dataset, comprising 96 diverse Petri net models. The results demonstrate that PetriNet2Vec effectively captures the structural properties of process models, enabling accurate process classification and efficient process retrieval. Specifically, our findings highlight the utility of the learned embeddings in two key downstream tasks: process classification and process retrieval. In process classification, the embeddings allowed for accurate categorization of process models based on their structural properties. In process retrieval, the embeddings enabled efficient retrieval of similar process models using cosine distance. These results demonstrate the potential of PetriNet2Vec to significantly enhance process mining capabilities.

2024

Integrating machine learning techniques for predicting ground vibration in pile driving activities

Authors
Abouelmaty, AM; Colaço, A; Fares, AA; Ramos, A; Costa, PA;

Publication
COMPUTERS AND GEOTECHNICS

Abstract
This study focuses on the assessment of ground vibrations due to pile driving activities. Given the likelihood of excessive vibration due to the driving process, it is imperative to predict vibration levels during the design phase. The primary goal of this work is to integrate machine learning techniques, specifically Extreme Gradient Boosting (XGBoost) and Artificial Neural Networks (ANNs) for real-time vibration prediction. The training dataset was generated using a validated numerical model and the trained models were validated based on experimental results. This validation process highlights the efficiency and accuracy of Extreme Gradient Boosting in predicting the-free-field response of the ground.

2024

A Distributed Computing Solution for Privacy-Preserving Genome-Wide Association Studies

Authors
Brito, C; Ferreira, P; Paulo, J;

Publication

Abstract
AbstractBreakthroughs in sequencing technologies led to an exponential growth of genomic data, providing unprecedented biological in-sights and new therapeutic applications. However, analyzing such large amounts of sensitive data raises key concerns regarding data privacy, specifically when the information is outsourced to third-party infrastructures for data storage and processing (e.g., cloud computing). Current solutions for data privacy protection resort to centralized designs or cryptographic primitives that impose considerable computational overheads, limiting their applicability to large-scale genomic analysis.We introduce Gyosa, a secure and privacy-preserving distributed genomic analysis solution. Unlike in previous work, Gyosafollows a distributed processing design that enables handling larger amounts of genomic data in a scalable and efficient fashion. Further, by leveraging trusted execution environments (TEEs), namely Intel SGX, Gyosaallows users to confidentially delegate their GWAS analysis to untrusted third-party infrastructures. To overcome the memory limitations of SGX, we implement a computation partitioning scheme within Gyosa. This scheme reduces the number of operations done inside the TEEs while safeguarding the users’ genomic data privacy. By integrating this security scheme inGlow, Gyosaprovides a secure and distributed environment that facilitates diverse GWAS studies. The experimental evaluation validates the applicability and scalability of Gyosa, reinforcing its ability to provide enhanced security guarantees. Further, the results show that, by distributing GWASes computations, one can achieve a practical and usable privacy-preserving solution.

2024

Integration of multi-modal datasets to estimate human aging

Authors
Ribeiro, R; Moraes, A; Moreno, M; Ferreira, PG;

Publication
MACHINE LEARNING

Abstract
Aging involves complex biological processes leading to the decline of living organisms. As population lifespan increases worldwide, the importance of identifying factors underlying healthy aging has become critical. Integration of multi-modal datasets is a powerful approach for the analysis of complex biological systems, with the potential to uncover novel aging biomarkers. In this study, we leveraged publicly available epigenomic, transcriptomic and telomere length data along with histological images from the Genotype-Tissue Expression project to build tissue-specific regression models for age prediction. Using data from two tissues, lung and ovary, we aimed to compare model performance across data modalities, as well as to assess the improvement resulting from integrating multiple data types. Our results demostrate that methylation outperformed the other data modalities, with a mean absolute error of 3.36 and 4.36 in the test sets for lung and ovary, respectively. These models achieved lower error rates when compared with established state-of-the-art tissue-agnostic methylation models, emphasizing the importance of a tissue-specific approach. Additionally, this work has shown how the application of Hierarchical Image Pyramid Transformers for feature extraction significantly enhances age modeling using histological images. Finally, we evaluated the benefits of integrating multiple data modalities into a single model. Combining methylation data with other data modalities only marginally improved performance likely due to the limited number of available samples. Combining gene expression with histological features yielded more accurate age predictions compared with the individual performance of these data types. Given these results, this study shows how machine learning applications can be extended to/in multi-modal aging research. Code used is available at https://github.com/zroger49/multi_modal_age_prediction.

2024

The molecular impact of cigarette smoking resembles aging across tissues

Authors
Ramirez, JM; Ribeiro, R; Soldatkina, O; Moraes, A; García-Pérez, R; Ferreira, PG; Melé, M;

Publication

Abstract
AbstractTobacco smoke is the main cause of preventable mortality worldwide. Smoking increases the risk of developing many diseases and has been proposed as an aging accelerator. Yet, the molecular mechanisms driving smoking-related health decline and aging acceleration in most tissues remain unexplored. Here, we characterize gene expression, alternative splicing, DNA methylation and histological alterations induced by cigarette smoking across human tissues. We show that smoking impacts tissue architecture and triggers systemic inflammation. We find that in many tissues, the effects of smoking significantly overlap those of aging in the same direction. Specifically, both age and smoking upregulate inflammatory genes and drive hypomethylation at enhancers. In addition, we observe widespread smoking-driven hypermethylation at target regions of the Polycomb repressive complex, which is a well-known aging effect. Smoking-induced epigenetic changes overlap causal aging CpGs, suggesting that these methylation changes may directly mediate aging acceleration observed in smokers. Finally, we find that smoking effects that are shared with aging are more persistent over time. Overall, our multi-tissue and multi-omic analysis of the effects of cigarette smoking provides an extensive characterization of the impact of tobacco smoke across tissues and unravels the molecular mechanisms driving smoking-induced tissue homeostasis decline and aging acceleration.

2024

Identification of Participants of Narratives Using Knowledge Bases

Authors
Machado, J; Amorim, E;

Publication
Anais do XXXIX Simpósio Brasileiro de Banco de Dados (SBBD 2024)

Abstract

  • 18
  • 440