Publicacoes - INESC TEC

Publicações

Publicações por José Orlando Pereira

2019

Minha: Large-Scale Distributed Systems Testing Made Practical

Autores
Machado, N; Maia, F; Neves, F; Coelho, F; Pereira, J;

Publicação
23rd International Conference on Principles of Distributed Systems, OPODIS 2019, December 17-19, 2019, Neuchâtel, Switzerland.

Abstract
Testing large-scale distributed system software is still far from practical as the sheer scale needed and the inherent non-determinism make it very expensive to deploy and use realistically large environments, even with cloud computing and state-of-the-art automation. Moreover, observing global states without disturbing the system under test is itself difficult. This is particularly troubling as the gap between distributed algorithms and their implementations can easily introduce subtle bugs that are disclosed only with suitably large scale tests. We address this challenge with Minha, a framework that virtualizes multiple JVM instances in a single JVM, thus simulating a distributed environment where each host runs on a separate machine, accessing dedicated network and CPU resources. The key contributions are the ability to run off-the-shelf concurrent and distributed JVM bytecode programs while at the same time scaling up to thousands of virtual nodes; and enabling global observation within standard software testing frameworks. Our experiments with two distributed systems show the usefulness of Minha in disclosing errors, evaluating global properties, and in scaling tests orders of magnitude with the same hardware resources. © Nuno Machado, Francisco Maia, Francisco Neves, Fábio Coelho, and José Pereira; licensed under Creative Commons License CC-BY 23rd International Conference on Principles of Distributed Systems (OPODIS 2019).

FecharLer Abstract

2020

Black-box inter-application traffic monitoring for adaptive container placement

Autores
Neves, F; Vilaca, R; Pereira, J;

Publicação
PROCEEDINGS OF THE 35TH ANNUAL ACM SYMPOSIUM ON APPLIED COMPUTING (SAC'20)

Abstract
A key issue in the performance of modern containerized distributed systems, such as big data storage and processing stacks or micro service based applications, is the placement of each container, or container pod, in virtual and physical servers. Although it has been shown that inter-application traffic is an important factor in placement decisions, as it directly indicates how components interact, it has not been possible to accurately monitor it in an application independent way, thus putting it out of reach of cloud platforms. In this paper we present an efficient black-box monitoring approach for detecting and building a weighted communication graph of collaborating processes in a distributed system that can be queried for various purposes, including adaptive placement. The key to achieving high detail and low overhead without custom application instrumentation is to use a kernel-aided event driven strategy. We evaluate a prototype implementation with micro-benchmarks and demonstrate its usefulness for container placement in a distributed data storage and processing stack (i.e., Cassandra and Spark).

FecharLer Abstract

2020

Experiences on Teaching Alloy with an Automated Assessment Platform

Autores
Macedo, N; Cunha, A; Pereira, J; Carvalho, R; Silva, R; Paiva, ACR; Ramalho, MS; Silva, DC;

Publicação
Rigorous State-Based Methods - 7th International Conference, ABZ 2020, Ulm, Germany, May 27-29, 2020, Proceedings

Abstract
This paper presents Alloy4Fun, a web application that enables online editing and sharing of Alloy models and instances (including dynamic ones developed with the Electrum extension), to be used mainly in an educational context. By introducing secret paragraphs and commands in the models, Alloy4Fun allows the distribution and automated assessment of simple specification challenges, a mechanism that enables students to learn the language at their own pace. Alloy4Fun stores all versions of shared and analyzed models, as well as derivation trees that depict how they evolved over time: this wealth of information can be mined by researchers or tutors to identify, for example, learning breakdowns in the class or typical mistakes made by Alloy users. Alloy4Fun has been used in formal methods graduate courses for two years and for the latest edition we present results regarding its adoption by the students, as well as preliminary insights regarding the most common bottlenecks when learning Alloy (and Electrum). © Springer Nature Switzerland AG 2020.

FecharLer Abstract

2020

A Survey and Classification of Software-Defined Storage Systems

Autores
Macedo, R; Paulo, J; Pereira, J; Bessani, A;

Publicação
ACM COMPUTING SURVEYS

Abstract
The exponential growth of digital information is imposing increasing scale and efficiency demands on modern storage infrastructures. As infrastructure complexity increases, so does the difficulty in ensuring quality of service, maintainability, and resource fairness, raising unprecedented performance, scalability, and programmability challenges. Software-Defined Storage (SDS) addresses these challenges by cleanly disentangling control and data flows, easing management, and improving control functionality of conventional storage systems. Despite its momentum in the research community, many aspects of the paradigm are still unclear, undefined, and unexplored, leading to misunderstandings that hamper the research and development of novel SDS technologies. In this article, we present an in-depth study of SDS systems, providing a thorough description and categorization of each plane of functionality. Further, we propose a taxonomy and classification of existing SDS solutions according to different criteria. Finally, we provide key insights about the paradigm and discuss potential future research directions for the field.

FecharLer Abstract

2020

Self-tunable DBMS Replication with Reinforcement Learning

Autores
Ferreira, L; Coelho, F; Pereira, J;

Publicação
Distributed Applications and Interoperable Systems - 20th IFIP WG 6.1 International Conference, DAIS 2020, Held as Part of the 15th International Federated Conference on Distributed Computing Techniques, DisCoTec 2020, Valletta, Malta, June 15-19, 2020, Proceedings

Abstract
Fault-tolerance is a core feature in distributed database systems, particularly the ones deployed in cloud environments. The dependability of these systems often relies in middleware components that abstract the DBMS logic from the replication itself. The highly configurable nature of these systems makes their throughput very dependent on the correct tuning for a given workload. Given the high complexity involved, machine learning techniques are often considered to guide the tuning process and decompose the relations established between tuning variables. This paper presents a machine learning mechanism based on reinforcement learning that attaches to a hybrid replication middleware connected to a DBMS to dynamically live-tune the configuration of the middleware according to the workload being processed. Along with the vision for the system, we present a study conducted over a prototype of the self-tuned replication middleware, showcasing the achieved performance improvements and showing that we were able to achieve an improvement of 370.99% on some of the considered metrics. © IFIP International Federation for Information Processing 2020.

FecharLer Abstract

2020

A Comparison of Message Exchange Patterns in BFT Protocols - (Experience Report)

Autores
Silva, F; Alonso, AN; Pereira, J; Oliveira, R;

Abstract
The performance and scalability of byzantine fault-tolerant (BFT) protocols for state machine replication (SMR) have recently come under scrutiny due to their application in the consensus mechanism of blockchain implementations. This led to a proliferation of proposals that provide different trade-offs that are not easily compared as, even if these are all based on message passing, multiple design and implementation factors besides the message exchange pattern differ between each of them. In this paper we focus on the impact of different combinations of cryptographic primitives and the message exchange pattern used to collect and disseminate votes, a key aspect for performance and scalability. By measuring this aspect in isolation and in a common framework, we characterise the design space and point out research directions for adaptive protocols that provide the best trade-off for each environment and workload combination. © IFIP International Federation for Information Processing 2020.

FecharLer Abstract