Publications - INESC TEC


Publications by Carlos Baquero

2007

Implementing range queries with a decentralized balanced tree over distributed hash tables

Authors
Lopes, N; Baquero, C;

Publication
NETWORK-BASED INFORMATION SYSTEMS, PROCEEDINGS

Abstract
Range queries, which retrieve all keys within a given range, are an important add-on for Distributed Hash Tables (DHTs), which natively support only exact-match key lookup. In this paper we support range queries through a balanced tree algorithm, Decentralized Balanced Tree, that runs over any DHT system. Our algorithm is based on the B+-tree design, which efficiently stores clustered data while maintaining a balanced load on hosts. The internal structure of the balanced tree is suited to range query operations over many data distributions, since it easily handles clustered data without losing performance. We analyzed and evaluated our algorithm in a simulated environment to show its operational scalability for both insertions and queries. We show that the system design imposes a fixed penalty over the DHT access cost, and thus inherits the scalability properties of the chosen underlying DHT.


2007

Scalable Bloom Filters

Authors
Almeida, PS; Baquero, C; Preguica, N; Hutchison, D;

Publication
INFORMATION PROCESSING LETTERS

Abstract
Bloom filters provide space-efficient storage of sets at the cost of a probability of false positives on membership queries. The size of the filter must be defined a priori based on the number of elements to store and the desired false positive probability, and it is impossible to store extra elements without increasing the false positive probability. This typically leads to a conservative assumption regarding maximum set size, possibly by orders of magnitude, and a consequent waste of space. This paper proposes Scalable Bloom Filters, a variant of Bloom filters that can adapt dynamically to the number of elements stored, while assuring a maximum false positive probability.
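
The idea of a filter that grows with the stored set while keeping a bounded false positive probability can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes a chain of ordinary Bloom filters in which each new stage is larger and has a tighter error rate, so the per-stage error rates form a geometric series whose sum stays below the requested maximum. The class names, the growth factor of 2, the tightening ratio of 0.5, and the SHA-256 double-hashing scheme are all illustrative choices that do not come from the abstract.

```python
import hashlib
import math


class BloomFilter:
    """Fixed-capacity Bloom filter sized for `capacity` items at rate `error`."""

    def __init__(self, capacity, error):
        self.capacity = capacity
        # Standard sizing: m = -n ln(p) / (ln 2)^2 bits, k = (m / n) ln 2 hashes.
        self.num_bits = max(8, math.ceil(-capacity * math.log(error) / math.log(2) ** 2))
        self.num_hashes = max(1, round(self.num_bits / capacity * math.log(2)))
        self.bits = bytearray((self.num_bits + 7) // 8)
        self.count = 0

    def _positions(self, item):
        # Double hashing: derive k bit indexes from two halves of one digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)
        self.count += 1

    def __contains__(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


class ScalableBloomFilter:
    """Chain of Bloom filters; a new, larger stage is added when the last fills."""

    def __init__(self, initial_capacity=64, max_error=0.01, growth=2, tightening=0.5):
        self.growth = growth
        self.tightening = tightening
        # Stage i uses error e0 * tightening**i, so the total over all stages
        # is at most e0 / (1 - tightening), which equals max_error.
        self.next_error = max_error * (1 - tightening)
        self.next_capacity = initial_capacity
        self.stages = []
        self._add_stage()

    def _add_stage(self):
        self.stages.append(BloomFilter(self.next_capacity, self.next_error))
        self.next_capacity *= self.growth
        self.next_error *= self.tightening

    def add(self, item):
        if item in self:
            return  # already reported as present; no need to store it again
        if self.stages[-1].count >= self.stages[-1].capacity:
            self._add_stage()
        self.stages[-1].add(item)

    def __contains__(self, item):
        # An item is a member if any stage reports it.
        return any(item in stage for stage in self.stages)
```

Inserting more items than `initial_capacity` simply appends stages, so the caller never has to fix the set size in advance; the cost is that a membership query may have to probe every stage.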


2008

Interval Tree Clocks: A Logical Clock for Dynamic Systems

Authors
Almeida, PS; Baquero, C; Fonte, V;

Publication
PRINCIPLES OF DISTRIBUTED SYSTEMS, 12TH INTERNATIONAL CONFERENCE, OPODIS 2008

Abstract
Causality tracking mechanisms, such as vector clocks and version vectors, rely on mappings from globally unique identifiers to integer counters. In a system with a well-known set of entities these ids can be preconfigured and given distinct positions in a vector or distinct names in a mapping. Id management is more problematic in dynamic systems with a large and highly variable number of entities, and it becomes worse when network partitions occur. Present solutions for causality tracking are not appropriate for these increasingly common scenarios. In this paper we introduce Interval Tree Clocks, a novel causality tracking mechanism that can be used in scenarios with a dynamic number of entities, allowing a completely decentralized creation of processes/replicas without need for global identifiers or global coordination. The mechanism has a variable size representation that adapts automatically to the number of existing entities, growing or shrinking appropriately. The representation is so compact that the mechanism can even be considered for scenarios with a fixed number of entities, which makes it a general substitute for vector clocks and version vectors.


2009

Forby: Providing Groupware Features Relying on Distributed File System Event Dissemination

Authors
Sousa, P; Preguica, N; Baquero, C;

Publication
GROUPWARE: DESIGN, IMPLEMENTATION, AND USE, PROCEEDINGS

Abstract
Intensive research and development has been conducted in the design and creation of groupware systems for distributed users. While these groupware tools are widely used for some activities, for other activities the impact in the groupware community has been smaller and can be improved. One reason for this is that the most commonly used applications do not support collaborative features and users are reluctant to change to a different application. In this paper we discuss how available file system mechanisms can help to address this problem. In this context, we present Forby, a system that provides groupware features to distributed users by combining file system monitoring and distributed event dissemination. To demonstrate our solution, we present three systems that rely on Forby to provide groupware features to users running unmodified applications.


2009

Fault-Tolerant Aggregation by Flow Updating

Authors
Jesus, P; Baquero, C; Almeida, PS;

Publication
DISTRIBUTED APPLICATIONS AND INTEROPERABLE SYSTEMS, PROCEEDINGS

Abstract
Data aggregation plays an important role in the design of scalable systems, allowing the determination of meaningful system-wide properties to direct the execution of distributed applications. In the particular case of wireless sensor networks, data collection is often only practicable if aggregation is performed. Several aggregation algorithms have been proposed in the last few years, exhibiting different properties in terms of accuracy, speed and communication tradeoffs. Nonetheless, existing approaches are found lacking in terms of fault tolerance. In this paper, we introduce a novel fault-tolerant, averaging-based data aggregation algorithm. It tolerates substantial message loss (link failures), while competing algorithms in the same class can be affected by a single lost message. The algorithm is based on manipulating flows (in the graph-theoretical sense) that are updated using idempotent messages, providing it with unique robustness capabilities. Furthermore, evaluation results obtained by comparing it with other averaging approaches have revealed that it outperforms them in terms of time and message complexity.
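
The role of flows and idempotent messages can be illustrated with a small simulation. The sketch below is a simplified reading of the technique, not the authors' code: it assumes synchronous rounds, a static topology, and a simulated network that drops each message independently with probability `loss_rate`. The function name `flow_updating`, its parameters, and the dictionary layout are hypothetical. Each message carries the sender's current flow and estimate rather than an increment, so a lost or repeated message only delays the receiver's view and never destroys any of the value being averaged.

```python
import random


def flow_updating(values, neighbors, rounds, loss_rate=0.0, seed=0):
    """Estimate the average of `values` at every node of an undirected graph.

    values:    dict mapping node -> local input value
    neighbors: dict mapping node -> list of adjacent nodes (symmetric)
    Returns a dict mapping node -> local estimate of the global average.
    """
    rng = random.Random(seed)
    nodes = list(values)
    # flows[i][j]: net amount node i believes it has shifted towards j.
    flows = {i: {j: 0.0 for j in neighbors[i]} for i in nodes}
    # estimates[i][j]: last estimate node i heard from (or assigned to) j.
    estimates = {i: {j: 0.0 for j in neighbors[i]} for i in nodes}

    for _ in range(rounds):
        outbox = []
        for i in nodes:
            # The local estimate is the input value minus all outgoing flows.
            own = values[i] - sum(flows[i].values())
            avg = (own + sum(estimates[i].values())) / (len(neighbors[i]) + 1)
            for j in neighbors[i]:
                # Adjust the flow so that neighbor j would also reach `avg`.
                flows[i][j] += avg - estimates[i][j]
                estimates[i][j] = avg
                outbox.append((i, j, flows[i][j], avg))

        for src, dst, flow, est in outbox:
            if rng.random() < loss_rate:
                continue  # simulated message loss; receiver keeps old state
            # Idempotent update: overwrite state with the symmetric flow.
            flows[dst][src] = -flow
            estimates[dst][src] = est

    return {i: values[i] - sum(flows[i].values()) for i in nodes}
```

For example, calling `flow_updating({"a": 0.0, "b": 10.0, "c": 20.0, "d": 30.0}, {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}, rounds=500, loss_rate=0.3)` runs the simulation on a four-node path with 30% message loss; every node's estimate should approach the true average of 15.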


2011

Fault-Tolerant Aggregation: Flow-Updating Meets Mass-Distribution

Authors
Almeida, PS; Baquero, C; Farach-Colton, M; Jesus, P; Mosteiro, MA;

Publication
PRINCIPLES OF DISTRIBUTED SYSTEMS

Abstract
Flow-Updating (FU) is a fault-tolerant technique that has proved to be efficient in practice for the distributed computation of aggregate functions in communication networks where individual processors do not have access to global information. Previous distributed aggregation protocols, based on repeated sharing of input values (or mass) among processors, sometimes called Mass-Distribution (MD) protocols, are not resilient to communication failures (or message loss) because such failures yield a loss of mass. In this paper, we present a protocol which we call Mass-Distribution with Flow-Updating (MDFU). We obtain MDFU by applying FU techniques to classic MD. We analyze the convergence time of MDFU, showing that stochastic message loss produces low overhead. This is the first convergence proof of an FU-based algorithm. We evaluate MDFU experimentally, comparing it with previous MD and FU protocols, and verifying the behavior predicted by the analysis. Finally, given that MDFU incurs a fixed deviation proportional to the message-loss rate, we adjust the accuracy of MDFU heuristically in a new protocol called MDFU with Linear Prediction (MDFU-LP). The evaluation shows that both MDFU and MDFU-LP behave very well in practice, even under high rates of message loss and even when the input values change dynamically.
