Publications

Publications by HASLab

2022

Relating Kleene Algebras with Pseudo Uninorms

Authors
Bedregal, BRC; Santiago, RHN; Madeira, A; Martins, MA;

Publication
Dynamic Logic. New Trends and Applications - 4th International Workshop, DaLí 2022, Haifa, Israel, July 31 - August 1, 2022, Revised Selected Papers

Abstract
This paper explores a strict relation between two core notions of the semantics of programs and of fuzzy logics: Kleene Algebras and (pseudo) uninorms. It shows that every Kleene algebra induces a pseudo uninorm, and that some pseudo uninorms induce Kleene algebras. This connection establishes a new perspective on the theory of Kleene algebras and provides a way to build (new) Kleene algebras. The latter aspect is potentially useful as a source of formalism to capture and model programs acting with fuzzy behaviours and domains. © 2023, The Author(s), under exclusive license to Springer Nature Switzerland AG.
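
For context, here is a hedged recap of the two standard definitions the abstract relies on; this is background only, and the paper's specific construction relating them is in the full text. A Kleene algebra is a structure $(K, +, \cdot, {}^{*}, 0, 1)$ with natural order $a \le b \iff a + b = b$, and a pseudo uninorm on a bounded lattice is an associative, monotone operation with a neutral element that need not be commutative.

% Kleene algebra (K, +, \cdot, *, 0, 1), ordered by a <= b iff a + b = b:
\begin{align*}
  & a + (b + c) = (a + b) + c, \quad a + b = b + a, \quad a + a = a, \quad a + 0 = a,\\
  & a \cdot (b \cdot c) = (a \cdot b) \cdot c, \quad 1 \cdot a = a \cdot 1 = a, \quad 0 \cdot a = a \cdot 0 = 0,\\
  & a \cdot (b + c) = a \cdot b + a \cdot c, \quad (a + b) \cdot c = a \cdot c + b \cdot c,\\
  & 1 + a \cdot a^{*} \le a^{*}, \quad 1 + a^{*} \cdot a \le a^{*},\\
  & b + a \cdot x \le x \;\Rightarrow\; a^{*} \cdot b \le x, \qquad b + x \cdot a \le x \;\Rightarrow\; b \cdot a^{*} \le x.
\end{align*}
% Pseudo uninorm U on a bounded lattice (L, \le, 0, 1): associativity, a neutral
% element e, and monotonicity in both arguments (commutativity is not required):
\begin{align*}
  & U(x, U(y, z)) = U(U(x, y), z), \qquad U(e, x) = U(x, e) = x,\\
  & x \le y \;\Rightarrow\; U(x, z) \le U(y, z) \ \text{ and } \ U(z, x) \le U(z, y).
\end{align*}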

2022

Accelerating Deep Learning Training Through Transparent Storage Tiering

Authors
Dantas, M; Leitao, D; Cui, P; Macedo, R; Liu, XL; Xu, WJ; Paulo, J;

Publication
2022 22ND IEEE/ACM INTERNATIONAL SYMPOSIUM ON CLUSTER, CLOUD AND INTERNET COMPUTING (CCGRID 2022)

Abstract
We present MONARCH, a framework-agnostic storage middleware that transparently employs storage tiering to accelerate Deep Learning (DL) training. It leverages existing storage tiers of modern supercomputers (i.e., compute node's local storage and shared parallel file system (PFS)), while considering the I/O patterns of DL frameworks to improve data placement across tiers. MONARCH aims at accelerating DL training and decreasing the I/O pressure imposed over the PFS. We apply MONARCH to TensorFlow and PyTorch, while validating its performance and applicability under different models and dataset sizes. Results show that, even when the training dataset can only be partially stored at local storage, MONARCH reduces TensorFlow's and PyTorch's training time by up to 28% and 37% for I/O-intensive models, respectively. Furthermore, MONARCH decreases the number of I/O operations submitted to the PFS by up to 56%.
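
The general tiering idea the abstract describes can be sketched as follows. The class, paths, and promotion policy below are illustrative assumptions, not MONARCH's actual interface or implementation.

import os

class TieredReader:
    """Illustrative sketch of transparent storage tiering for training data:
    serve reads from fast node-local storage when a copy exists, otherwise
    read from the shared parallel file system (PFS) and promote the file to
    the local tier. Names and the simple promotion policy are assumptions,
    not MONARCH's actual implementation."""

    def __init__(self, local_dir, pfs_dir, local_capacity_bytes):
        self.local_dir = local_dir
        self.pfs_dir = pfs_dir
        self.capacity = local_capacity_bytes
        # Account for whatever already sits on the local tier.
        self.used = 0
        for root, _dirs, files in os.walk(local_dir):
            for name in files:
                self.used += os.path.getsize(os.path.join(root, name))

    def read(self, relative_path):
        local_path = os.path.join(self.local_dir, relative_path)
        if os.path.exists(local_path):                 # hit on the fast tier
            with open(local_path, "rb") as f:
                return f.read()

        pfs_path = os.path.join(self.pfs_dir, relative_path)
        with open(pfs_path, "rb") as f:                # miss: fall back to the PFS
            data = f.read()

        if self.used + len(data) <= self.capacity:     # promote while space allows
            os.makedirs(os.path.dirname(local_path), exist_ok=True)
            with open(local_path, "wb") as f:
                f.write(data)
            self.used += len(data)
        return data

In the setting the abstract describes, a framework's input pipeline would call read() in place of direct file opens, which is roughly what makes the tiering transparent to TensorFlow and PyTorch.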

2022

Protecting Metadata Servers From Harm Through Application-level I/O Control

Authors
Macedo, R; Miranda, M; Tanimura, Y; Haga, J; Ruhela, A; Harrell, SL; Evans, RT; Paulo, J;

Publication
2022 IEEE INTERNATIONAL CONFERENCE ON CLUSTER COMPUTING (CLUSTER 2022)

Abstract
Modern large-scale I/O applications that run on HPC infrastructures are increasingly becoming metadata-intensive. Unfortunately, having multiple concurrent applications submitting massive amounts of metadata operations can easily saturate the shared parallel file system's metadata resources, leading to unresponsiveness of the storage backend and overall performance degradation. To address these challenges, we present PADLL, a storage middleware that enables system administrators to proactively control and ensure QoS over metadata workflows in HPC storage systems. We demonstrate its performance and feasibility by controlling the rate of both synthetic and realistic I/O workloads. Results show that PADLL can dynamically control metadata-aggressive workloads, prevent I/O burstiness, and ensure I/O fairness and prioritization.
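
The rate control the abstract alludes to can be illustrated with a simple token bucket applied at the application level before each metadata call. This is an illustrative sketch of the general mechanism only, not PADLL's actual design.

import os
import time
import threading

class TokenBucket:
    """Illustrative token bucket for capping the rate of metadata operations
    (open/stat/unlink/...) issued by one application. It sketches the idea of
    application-level I/O control only; it is not PADLL's implementation."""

    def __init__(self, rate_ops_per_sec, burst):
        self.rate = rate_ops_per_sec
        self.capacity = burst
        self.tokens = float(burst)
        self.timestamp = time.monotonic()
        self.lock = threading.Lock()

    def acquire(self):
        """Block until one metadata operation may proceed."""
        while True:
            with self.lock:
                now = time.monotonic()
                self.tokens = min(self.capacity,
                                  self.tokens + (now - self.timestamp) * self.rate)
                self.timestamp = now
                if self.tokens >= 1:
                    self.tokens -= 1
                    return
                wait = (1 - self.tokens) / self.rate
            time.sleep(wait)

# Example: cap one workload at 500 metadata ops/s with a burst allowance of 50.
limiter = TokenBucket(rate_ops_per_sec=500, burst=50)

def guarded_stat(path):
    limiter.acquire()          # throttle before hitting the metadata server
    return os.stat(path)

A middleware in this spirit interposes between the application and the shared file system, so per-application limits can be tuned by administrators without saturating the metadata servers.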

2022

Which Technologies are Most Frequently Used by Data Scientists?

Authors
Pereira, P; Fernandes, JP; Cunha, J;

Publication
2022 IEEE Symposium on Visual Languages and Human-Centric Computing, VL/HCC 2022, Rome, Italy, September 12-16, 2022

Abstract
Data collection is pervasively bound to our digital lifestyle. A recent study reports that the growth of the data created and replicated in 2020 was even higher than in previous years, reaching an astonishing global amount of 64.2 zettabytes of data. There are numerous companies whose services/products rely heavily on data analysis, and mining the produced data has already revealed great value for businesses in different sectors. In order to be able to support the professionals who do this job, typically known as data scientists, we first need to characterize them. To contribute towards this characterization, we conducted a public survey, and in this work we present the results about a particular aspect of their life: the tools they use and need. © 2022 IEEE Computer Society. All rights reserved.

2022

Scalable transcriptomics analysis with Dask: applications in data science and machine learning

Authors
Moreno, M; Vilaca, R; Ferreira, PG;

Publication
BMC BIOINFORMATICS

Abstract
Background: Gene expression studies are an important tool in biological and biomedical research. The signal carried in expression profiles helps derive signatures for the prediction, diagnosis and prognosis of different diseases. Data science and specifically machine learning have many applications in gene expression analysis. However, as the dimensionality of genomics datasets grows, scalable solutions become necessary. Methods: In this paper we review the main steps and bottlenecks in machine learning pipelines, as well as the main concepts behind scalable data science, including those of concurrent and parallel programming. We discuss the benefits of the Dask framework and how it can be integrated with the Python scientific environment to perform data analysis in computational biology and bioinformatics. Results: This review illustrates the role of Dask for boosting data science applications in different case studies. Detailed documentation and code on these procedures are made available at https://github.com/martaccmoreno/gexp-ml-dask. Conclusion: By showing when and how Dask can be used in transcriptomics analysis, this review will serve as an entry point to help genomic data scientists develop more scalable data analysis procedures.
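
As a flavour of the kind of workflow the review covers, the sketch below loads a hypothetical expression matrix with dask.dataframe and performs lazy, partition-wise summaries followed by a simple variance-based gene selection. The file name and column layout (samples as rows, genes as columns, plus a 'label' column) are assumptions for illustration, not taken from the paper's case studies.

import dask.dataframe as dd

# Out-of-core loading: the CSV is split into ~64 MB partitions.
expr = dd.read_csv("expression_matrix.csv", blocksize="64MB")

# Lazy, partition-wise computations; nothing executes until .compute().
gene_cols = [c for c in expr.columns if c != "label"]
gene_means = expr[gene_cols].mean()
gene_vars = expr[gene_cols].var()

# Simple feature selection: keep the 500 most variable genes, then
# materialise the reduced table, which now fits comfortably in memory.
top_genes = gene_vars.compute().nlargest(500).index.tolist()
reduced = expr[top_genes + ["label"]].compute()

print(gene_means.compute().head())
print(reduced.shape)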
