Publications

Publications by João Tiago Paulo

2021

Pods-as-Volumes: Effortlessly Integrating Storage Systems and Middleware into Kubernetes

Authors
Faria, A; Macedo, R; Paulo, J;

Publication
WOC '21: Proceedings of the Seventh International Workshop on Container Technologies and Container Clouds, Virtual Event, Canada, 6 December 2021

Abstract

2022

Accelerating Deep Learning Training Through Transparent Storage Tiering

Authors
Dantas, M; Leitao, D; Cui, P; Macedo, R; Liu, XL; Xu, WJ; Paulo, J;

Publication
2022 22ND IEEE/ACM INTERNATIONAL SYMPOSIUM ON CLUSTER, CLOUD AND INTERNET COMPUTING (CCGRID 2022)

Abstract
We present MONARCH, a framework-agnostic storage middleware that transparently employs storage tiering to accelerate Deep Learning (DL) training. It leverages existing storage tiers of modern supercomputers (i.e., compute node's local storage and shared parallel file system (PFS)), while considering the I/O patterns of DL frameworks to improve data placement across tiers. MONARCH aims at accelerating DL training and decreasing the I/O pressure imposed over the PFS. We apply MONARCH to TensorFlow and PyTorch, while validating its performance and applicability under different models and dataset sizes. Results show that, even when the training dataset can only be partially stored at local storage, MONARCH reduces TensorFlow's and PyTorch's training time by up to 28% and 37% for I/O-intensive models, respectively. Furthermore, MONARCH decreases the number of I/O operations submitted to the PFS by up to 56%.

2022

PAIO: General, Portable I/O Optimizations With Minor Application Modifications

Authors
Macedo, R; Tanimura, Y; Haga, J; Chidambaram, V; Pereira, J; Paulo, J;

Publication
PROCEEDINGS OF THE 20TH USENIX CONFERENCE ON FILE AND STORAGE TECHNOLOGIES, FAST 2022

Abstract
We present PAIO, a framework that allows developers to implement portable I/O policies and optimizations for different applications with minor modifications to their original code base. The chief insight behind PAIO is that if we are able to intercept and differentiate requests as they flow through different layers of the I/O stack, we can enforce complex storage policies without significantly changing the layers themselves. PAIO adopts ideas from the Software-Defined Storage community, building data plane stages that mediate and optimize I/O requests across layers and a control plane that coordinates and fine-tunes stages according to different storage policies. We demonstrate the performance and applicability of PAIO with two use cases. The first improves 99th percentile latency by 4× in industry-standard LSM-based key-value stores. The second ensures dynamic per-application bandwidth guarantees under shared storage environments.

2022

Protecting Metadata Servers From Harm Through Application-level I/O Control

Authors
Macedo, R; Miranda, M; Tanimura, Y; Haga, J; Ruhela, A; Harrell, SL; Evans, RT; Paulo, J;

Publication
2022 IEEE INTERNATIONAL CONFERENCE ON CLUSTER COMPUTING (CLUSTER 2022)

Abstract
Modern large-scale I/O applications that run on HPC infrastructures are increasingly becoming metadata-intensive. Unfortunately, having multiple concurrent applications submitting massive amounts of metadata operations can easily saturate the shared parallel file system's metadata resources, leading to unresponsiveness of the storage backend and overall performance degradation. To address these challenges, we present PADLL, a storage middleware that enables system administrators to proactively control and ensure QoS over metadata workflows in HPC storage systems. We demonstrate its performance and feasibility by controlling the rate of both synthetic and realistic I/O workloads. Results show that PADLL can dynamically control metadata-aggressive workloads, prevent I/O burstiness, and ensure I/O fairness and prioritization.

2022

Protecting Metadata Servers From Harm Through Application-level I/O Control

Authors
Macedo, R; Miranda, M; Tanimura, Y; Haga, J; Ruhela, A; Harrell, SL; Evans, RT; Paulo, J;

Publication
Proceedings - IEEE International Conference on Cluster Computing, ICCC

Abstract
Modern large-scale I/O applications that run on HPC infrastructures are increasingly becoming metadata-intensive. Unfortunately, having multiple concurrent applications submitting massive amounts of metadata operations can easily saturate the shared parallel file system's metadata resources, leading to unresponsiveness of the storage backend and overall performance degradation. To address these challenges, we present PADLL, a storage middleware that enables system administrators to proactively control and ensure QoS over metadata workflows in HPC storage systems. We demonstrate its performance and feasibility by controlling the rate of both synthetic and realistic I/O workloads. Results show that PADLL can dynamically control metadata-aggressive workloads, prevent I/O burstiness, and ensure I/O fairness and prioritization.

2023

Distributed Applications and Interoperable Systems - 23rd IFIP WG 6.1 International Conference, DAIS 2023, Held as Part of the 18th International Federated Conference on Distributed Computing Techniques, DisCoTec 2023, Lisbon, Portugal, June 19-23, 2023, Proceedings

Authors
Martínez, MP; Paulo, J;

Publication
DAIS

Abstract
