Cookies Policy
The website need some cookies and similar means to function. If you permit us, we will use those means to collect data on your visits for aggregated statistics to improve our service. Find out More
Accept Reject
  • Menu
Publications

Publications by Nuno Almeida Machado

2018

Falcon: A Practical Log-based Analysis Tool for Distributed Systems

Authors
Neves, F; Machado, N; Pereira, J;

Publication
2018 48TH ANNUAL IEEE/IFIP INTERNATIONAL CONFERENCE ON DEPENDABLE SYSTEMS AND NETWORKS (DSN)

Abstract
Programmers and support engineers typically rely on log data to narrow down the root cause of unexpected behaviors in dependable distributed systems. Unfortunately, the inherently distributed nature and complexity of such distributed executions often leads to multiple independent logs, scattered across different physical machines, with thousands or millions entries poorly correlated in terms of event causality. This renders log-based debugging a tedious, time-consuming, and potentially inconclusive task. We present Falcon, a tool aimed at making log-based analysis of distributed systems practical and effective. Falcon's modular architecture, designed as an extensible pipeline, allows it to seamlessly combine several distinct logging sources and generate a coherent space-time diagram of distributed executions. To preserve event causality, even in the presence of logs collected from independent unsynchronized machines, Falcon introduces a novel happens-before symbolic formulation and relies on an off-the-shelf constraint solver to obtain a coherent event schedule. Our case study with the popular distributed coordination service Apache Zookeeper shows that Falcon eases the log-based analysis of complex distributed protocols and is helpful in bridging the gap between protocol design and implementation.

2018

Totally Ordered Replication for Massive Scale Key-Value Stores

Authors
Ribeiro, J; Machado, N; Maia, F; Matos, M;

Publication
Distributed Applications and Interoperable Systems - 18th IFIP WG 6.1 International Conference, DAIS 2018, Held as Part of the 13th International Federated Conference on Distributed Computing Techniques, DisCoTec 2018, Madrid, Spain, June 18-21, 2018, Proceedings

Abstract
Scalability is one of the most relevant features of today’s data management systems. In order to achieve high scalability and availability, recent distributed key-value stores refrain from costly replica coordination when processing requests. However, these systems typically do not perform well under churn. In this paper, we propose DataFlagons, a large-scale key-value store that integrates epidemic dissemination with a probabilistic total order broadcast algorithm. By ensuring that all replicas process requests in the same order, DataFlagons provides probabilistic strong data consistency while achieving high scalability and robustness under churn. © 2018, IFIP International Federation for Information Processing.

2019

Minha: Large-Scale Distributed Systems Testing Made Practical

Authors
Machado, N; Maia, F; Neves, F; Coelho, F; Pereira, J;

Publication
23rd International Conference on Principles of Distributed Systems, OPODIS 2019, December 17-19, 2019, Neuchâtel, Switzerland.

Abstract
Testing large-scale distributed system software is still far from practical as the sheer scale needed and the inherent non-determinism make it very expensive to deploy and use realistically large environments, even with cloud computing and state-of-the-art automation. Moreover, observing global states without disturbing the system under test is itself difficult. This is particularly troubling as the gap between distributed algorithms and their implementations can easily introduce subtle bugs that are disclosed only with suitably large scale tests. We address this challenge with Minha, a framework that virtualizes multiple JVM instances in a single JVM, thus simulating a distributed environment where each host runs on a separate machine, accessing dedicated network and CPU resources. The key contributions are the ability to run off-the-shelf concurrent and distributed JVM bytecode programs while at the same time scaling up to thousands of virtual nodes; and enabling global observation within standard software testing frameworks. Our experiments with two distributed systems show the usefulness of Minha in disclosing errors, evaluating global properties, and in scaling tests orders of magnitude with the same hardware resources. © Nuno Machado, Francisco Maia, Francisco Neves, Fábio Coelho, and José Pereira; licensed under Creative Commons License CC-BY 23rd International Conference on Principles of Distributed Systems (OPODIS 2019).

2018

MODELLING BASED TEACHING WITH SPREADSHEET. A STUDY IN A HEALTH CARE COURSE

Authors
Machado, N; Baptista, M;

Publication
12TH INTERNATIONAL TECHNOLOGY, EDUCATION AND DEVELOPMENT CONFERENCE (INTED)

Abstract
Computer-based modelling tools allow students to express their theories in models that can be simulated. In this way, students can use their theories operationally, confronting themselves with the consequences of their ideas. The ability of students to form and express a mental model will be expanded if they are given an opportunity to become aware of their own mental model by expressing this same model and comparing it to other models, like a consensus model. The building of numerical models of biophysical phenomena, such as the mechanics of breathing, or blood circulation, has the potential for student motivation as well as long-term learning. Our theory is that by re-building well known numerical models of physiological phenomena students will have the opportunity to change their perceptions about the relevance of the contents addressed, simultaneously improving their learning in the topics covered and increasing their motivation in the basic science disciplines in their curricula. For the implementation of computer numerical models historically it was necessary to use some programming language, such as MATLAB, BASIC, C++, JAVA. With the development of computer science, it is now possible to these students "construct" models of physical phenomena expressed through dedicated computer tools without necessarily having to do so in a programming language. As for example we have STELLA, MODELLUS, or STARLOGO. There is also the possibility of using a spreadsheet such as Microsoft EXCEL, Open Office CALC, Google SHEETS, or others, as tools that allow students to express physical models. The current spreadsheets, even those available for free, are very powerful, having many integrated tools, in terms of calculation, and we can count on several other features, such as graphs of various types, buttons and other tools that allow interaction with the model, and databases that can be integrated into the spreadsheet. There are several advantages of using spreadsheets in science education due to its general access, through smartphones, tablet's and computers, ease of implementation for the basic operations, and ease of the "debug" process, relative to other types of software. Also, it does not require prior knowledge of programming languages, or about complex mathematical software, which would an obstacle to the learning in itself. There is also the positive side effect of learning how to use a spreadsheet that is a plus in itself for the future professional's. This paper will have a review of the state of the art of using spreadsheets in Modelling Based Learning. Also, it will be presented a study with first year undergraduate students of a health care course, using Biophysical models historically very important in the physiology and medicine development.

2019

Concurrency Debugging with MaxSMT

Authors
Terra Neves, M; Machado, N; Lynce, I; Manquinho, V;

Publication
THIRTY-THIRD AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FIRST INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE / NINTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE

Abstract
Current Maximum Satisfiability (MaxSAT) algorithms based on successive calls to a powerful Satisfiability (SAT) solver are now able to solve real-world instances in many application domains. Moreover, replacing the SAT solver with a Satisfiability Modulo Theories (SMT) solver enables effective MaxSMT algorithms. However, MaxSMT has seldom been used in debugging multi-threaded software. Multi-threaded programs are usually non-deterministic due to the huge number of possible thread operation schedules, which makes them much harder to debug than sequential programs. A recent approach to isolate the root cause of concurrency bugs in multi-threaded software is to produce a report that shows the differences between a failing and a non-failing execution. However, since they rely solely on heuristics, these reports can be unnecessarily large. Hence, reports may contain operations that are not relevant to the bug's occurrence. This paper proposes the use of MaxSMT for the generation of minimal reports for multi-threaded software with concurrency bugs. The proposed techniques report situations that the existing techniques are not able to identify. Experimental results show that using MaxSMT can significantly improve the accuracy of the generated reports and, consequently, their usefulness in debugging the root cause of concurrency bugs.

2021

Horus: Non-Intrusive Causal Analysis of Distributed Systems Logs

Authors
Neves, F; Machado, N; Vilaca, R; Pereira, J;

Publication
51ST ANNUAL IEEE/IFIP INTERNATIONAL CONFERENCE ON DEPENDABLE SYSTEMS AND NETWORKS (DSN 2021)

Abstract
Logs are still the primary resource for debugging distributed systems executions. Complexity and heterogeneity of modern distributed systems, however, make log analysis extremely challenging. First, due to the sheer amount of messages, in which the execution paths of distinct system components appear interleaved. Second, due to unsynchronized physical clocks, simply ordering the log messages by timestamp does not suffice to obtain a causal trace of the execution. To address these issues, we present Horus, a system that enables the refinement of distributed system logs in a causally-consistent and scalable fashion. Horus leverages kernel-level probing to capture events for tracking causality between application-level logs from multiple sources. The events are then encoded as a directed acyclic graph and stored in a graph database, thus allowing the use of rich query languages to reason about runtime behavior. Our case study with TrainTicket, a ticket booking application with 40+ microservices, shows that Horus surpasses current widely-adopted log analysis systems in pinpointing the root cause of anomalies in distributed executions. Also, we show that Horus builds a causally-consistent log of a distributed execution with much higher performance (up to 3 orders of magnitude) and scalability than prior state-of-the-art solutions. Finally, we show that Horus' approach to query causality is up to 30 times faster than graph database built-in traversal algorithms.

  • 2
  • 3