Publications - INESC TEC

Publications

Publications by José Orlando Pereira

2011

Experimental evaluation of distributed middleware with a virtualized Java environment

Authors
Carvalho, NA; Bordalo, J; Campos, F; Pereira, J;

Publication
Proceedings of the 6th Workshop on Middleware for Service Oriented Computing, MW4SOC 2011 - Co-located with the ACM/IFIP/USENIX 12th International Middleware Conference, Middleware 2011

Abstract
The correctness and performance of large-scale service-oriented systems depend on distributed middleware components performing various communication and coordination functions. It is, however, very difficult to experimentally assess such middleware components, as interesting behavior often arises exclusively in large-scale settings, but such deployments are costly and time-consuming. We address this challenge with Minha, a system that virtualizes multiple JVM instances within a single JVM while simulating key environment components, thus reproducing the concurrency, distribution, and performance characteristics of the actual system. The usefulness of Minha is demonstrated by applying it to the WS4D Java stack, a popular implementation of the Devices Profile for Web Services (DPWS) specification. © 2011 ACM.


2007

HyParView: a membership protocol for reliable gossip-based broadcast

Authors
Leitao, J; Pereira, J; Rodrigues, L;

Publication
37TH ANNUAL IEEE/IFIP INTERNATIONAL CONFERENCE ON DEPENDABLE SYSTEMS AND NETWORKS, PROCEEDINGS

Abstract
Gossip, or epidemic, protocols have emerged as a powerful strategy to implement highly scalable and resilient reliable broadcast primitives. For scalability reasons, each participant in a gossip protocol maintains a partial view of the system. The reliability of the gossip protocol depends upon some critical properties of these views, such as degree distribution and clustering coefficient. Several algorithms have been proposed to maintain partial views for gossip protocols. In this paper we show that, under a high number of faults, these algorithms take a long time to restore the desirable view properties. To address this problem, we present HyParView, a new membership protocol to support gossip-based broadcast that ensures high levels of reliability even in the presence of high rates of node failure. The HyParView protocol is based on a novel approach that relies on the use of two distinct partial views, which are maintained with different goals by different strategies.


2003

Adaptive gossip-based broadcast

Authors
Rodrigues, L; Pereira, J; Handurukande, S; Guerraoui, R; Kermarrec, AM;

Publication
2003 INTERNATIONAL CONFERENCE ON DEPENDABLE SYSTEMS AND NETWORKS, PROCEEDINGS

Abstract
This paper presents a novel adaptation mechanism that allows every node of a gossip-based broadcast algorithm to adjust the rate of message emission 1) to the amount of resources available to the nodes within the same broadcast group and 2) to the global level of congestion in the system. The adaptation mechanism can be applied to all gossip-based broadcast algorithms we know of and, without decreasing reliability, makes their use more realistic in practical situations where nodes have limited resources whose quantity changes dynamically over time.


2011

Improving the Scalability of Cloud-Based Resilient Database Servers

Authors
Soares, L; Pereira, J;

Publication
DISTRIBUTED APPLICATIONS AND INTEROPERABLE SYSTEMS

Abstract
Many now rely on public cloud infrastructure-as-a-service for database servers, mainly by pushing the limits of existing pooling and replication software to operate large shared-nothing virtual server clusters. Yet it is unclear whether this is still the best architectural choice, particularly when cloud infrastructure provides seamless virtual shared storage and bills clients on actual disk usage. This paper addresses this challenge with Resilient Asynchronous Commit (RAsC), an improvement to a well-known shared-nothing design based on the assumption that a much larger number of servers is required for scale than for resilience. We then compare this proposal to other database server architectures using an analytical model focused on peak throughput and conclude that it provides the best performance/cost trade-off while at the same time addressing a wide range of fault scenarios.


2010

Measuring Software Systems Scalability for Proactive Data Center Management

Authors
Carvalho, NA; Pereira, J;

Publication
ON THE MOVE TO MEANINGFUL INTERNET SYSTEMS: OTM 2010, PT II

Abstract
The current trend of increasingly larger Web-based applications makes scalability the key challenge when developing, deploying, and maintaining data centers. At the same time, the migration to the cloud computing paradigm means that each data center hosts an increasingly complex mix of applications, from multiple owners and in constant evolution. Unfortunately, managing such data centers in a cost-effective manner requires that the scalability properties of the hosted workloads be accurately known, namely, to proactively provision adequate resources and to plan the most economical placement of applications. Obviously, stopping each of them and running a custom benchmark to assess its scalability properties is not an option. In this paper we address this challenge with a tool to measure software scalability with regard to CPU availability, to predict system behavior in the face of varying resources and an increasing workload. This tool does not depend on a particular application and relies only on Linux's SystemTap probing infrastructure. We validate the approach first using simulation and then in an actual system. The resulting better prediction of scalability properties should allow improved (self-)management practices.


2009

Evaluating Throughput Stability of Protocols for Distributed Middleware

Authors
Carvalho, NA; Oliveira, JP; Pereira, J;

Publication
ON THE MOVE TO MEANINGFUL INTERNET SYSTEMS: OTM 2009, PT 1

Abstract
Communication of large data volumes is a core functionality of distributed systems middleware, namely, for interconnecting components, for distributed computation, and for fault tolerance. This common functionality is, however, achieved in different middleware platforms with various combinations of operating system and application-level protocols, both standardized and ad hoc, and including implementations on managed runtime environments such as Java. In this paper, in contrast with most previous work, which focuses on performance, we point out that architectural and implementation decisions have an impact on throughput stability when the system is heavily loaded, precisely when such stability is most important. In detail, we present an experimental evaluation of several communication protocol components under stress conditions and draw conclusions on the relative merits of several architectural options.
