Cookies Policy
The website need some cookies and similar means to function. If you permit us, we will use those means to collect data on your visits for aggregated statistics to improve our service. Find out More
Accept Reject
  • Menu
Publications

Publications by Carlos Baquero

2009

Search Optimizations in Structured Peer-to-peer Systems

Authors
Lopes, N; Baquero, C;

Publication
2009 18TH IEEE INTERNATIONAL WORKSHOP ON ENABLING TECHNOLOGIES: INFRASTRUCTURES FOR COLLABORATIVE ENTERPRISES

Abstract
DHT systems are structured overlay networks capable of using P2P resources as a scalable platform for very large data storage applications. However, their efficiency expects a level of uniformity in the association of data to index keys that is often not present in inverted indexes. Index data tends to follow non-uniform distributions, often power law distributions, creating intense local storage hotspots and network bottlenecks on specific hosts. Current techniques like caching cannot, alone, cope with this issue. We propose a distributed data structure based on a decentralized balanced tree to balance storage data and network load more uniformly across hosts. The results show that the data structure is capable of balancing resources, in particular when performing multiple keyword searches.

2009

Probabilistic Estimation of Network Size and Diameter

Authors
Cardoso, JCS; Baquero, C; Almeida, PS;

Publication
LADC: 2009 4TH LATIN-AMERICAN SYMPOSIUM ON DEPENDABLE COMPUTING

Abstract
Determining the size of a network and its diameter are important functions in distributed systems, as there are a number of algorithms which rely on such parameters, or at least on estimates of those values. The Extrema Propagation technique allows the estimation of the size of a network in a fast, distributed and fault tolerant manner. The technique was previously studied in a simulation setting where rounds advance synchronously and where there is no message loss. This work presents two main contributions. The first, is the study of the Extrema Propagation technique under asynchronous rounds and integrated in the Network Friendly Epidemic Multicast (NeEM) framework. The second, is the evaluation of a diameter estimation technique associated with the Extrema Propagation. This study also presents a small enhancement to the Extrema Propagation in terms of communication cost and points out some other possible enhancements. Results show that there is a clear trade-off between time and communication that must be considered when configuring the protocol-a faster convergence time implies a higher communication cost Results also show that its possible to reduce the total communication cost by more than 18% using a simple approach. The diameter estimation technique is shown to have a relative error of less than 10% even when using a small sample of nodes.

2009

Fast Estimation of Aggregates in Unstructured Networks

Authors
Baquero, C; Almeida, PS; Menezes, R;

Publication
ICAS: 2009 FIFTH INTERNATIONAL CONFERENCE ON AUTONOMIC AND AUTONOMOUS SYSTEMS

Abstract
Aggregation of data values plays an important role on distributed computations, in particular over peer-to-peer and sensor networks, as it can provide a summary of some global system property and direct the actions of self-adaptive distributed algorithms. Examples include using estimates of the network Size to dimension distributed hash tables or estimates of the average system load to direct load-balancing. Distributed aggregation using non-idempotent functions, like sums, is not trivial as it is not easy to prevent a given value from being accounted for multiple times; this is especially the case if no centralized algorithms or global identifiers can be used. This paper introduces Extrema Propagation, a probabilistic technique for distributed estimation of the sum of positive real numbers. The technique relies on the exchange of duplicate insensitive messages and can be applied in flood and/or epidemic settings, where multi-path routing occurs; it is tolerant of message loss; it is fast, as the number of message exchange steps equals the diameter; and it is fully, distributed, with no single point of failure and the result produced at every node.

2010

Fault-Tolerant Aggregation for Dynamic Networks

Authors
Jesus, P; Baquero, C; Almeida, PS;

Publication
2010 29TH IEEE INTERNATIONAL SYMPOSIUM ON RELIABLE DISTRIBUTED SYSTEMS SRDS 2010

Abstract
Data aggregation is a fundamental building block of modern distributed systems. Averaging based approaches, commonly designated gossip-based, are an important class of aggregation algorithms as they allow all nodes to produce a result, converge to any required accuracy, and work independently from the network topology. However, existing approaches exhibit many dependability issues when used in faulty and dynamic environments. This paper extends our own technique, Flow Updating, which is immune to message loss, to operate in dynamic networks, improving its fault tolerance characteristics. Experimental results show that the novel version of Flow Updating vastly outperforms previous averaging algorithms; it self adapts to churn without requiring any periodic restart, supporting node crashes and high levels of message loss.

2011

A Markov random walk under constraint for discovering overlapping communities in complex networks

Authors
Jin, D; Yang, B; Baquero, C; Liu, DY; He, DX; Liu, J;

Publication
JOURNAL OF STATISTICAL MECHANICS-THEORY AND EXPERIMENT

Abstract
The detection of overlapping communities in complex networks has motivated recent research in relevant fields. Aiming to address this problem, we propose a Markov-dynamics-based algorithm, called UEOC, which means 'unfold and extract overlapping communities'. In UEOC, when identifying each natural community that overlaps, a Markov random walk method combined with a constraint strategy, which is based on the corresponding annealed network (degree conserving random network), is performed to unfold the community. Then, a cutoff criterion with the aid of a local community function, called conductance, which can be thought of as the ratio between the number of edges inside the community and those leaving it, is presented to extract this emerged community from the entire network. The UEOC algorithm depends on only one parameter whose value can be easily set, and it requires no prior knowledge of the hidden community structures. The proposed UEOC has been evaluated both on synthetic benchmarks and on some real-world networks, and has been compared with a set of competing algorithms. The experimental result has shown that UEOC is highly effective and efficient for discovering overlapping communities.

2012

Extrema Propagation: Fast Distributed Estimation of Sums and Network Sizes

Authors
Baquero, C; Almeida, PS; Menezes, R; Jesus, P;

Publication
IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS

Abstract
Aggregation of data values plays an important role on distributed computations, in particular, over peer-to-peer and sensor networks, as it can provide a summary of some global system property and direct the actions of self-adaptive distributed algorithms. Examples include using estimates of the network size to dimension distributed hash tables or estimates of the average system load to direct load balancing. Distributed aggregation using nonidempotent functions, like sums, is not trivial as it is not easy to prevent a given value from being accounted for multiple times; this is especially the case if no centralized algorithms or global identifiers can be used. This paper introduces Extrema Propagation, a probabilistic technique for distributed estimation of the sum of positive real numbers. The technique relies on the exchange of duplicate insensitive messages and can be applied in flood and/or epidemic settings, where multipath routing occurs; it is tolerant of message loss; it is fast, as the number of message exchange steps can be made just slightly above the theoretical minimum; and it is fully distributed, with no single point of failure and the result produced at every node.

  • 11
  • 19