Publications - INESC TEC

Publications by Ricardo Pereira Vilaça

2010

On the Expressiveness and Trade-Offs of Large Scale Tuple Stores

Authors
Vilaca, R; Cruz, F; Oliveira, R

Publication
ON THE MOVE TO MEANINGFUL INTERNET SYSTEMS: OTM 2010, PT II

Abstract
Massive-scale distributed computing is a challenge at our doorstep. The current exponential growth of data calls for massive-scale capabilities of storage and processing. This is being acknowledged by several major Internet players embracing the cloud computing model and offering first-generation distributed tuple stores. Having all started from similar requirements, these systems ended up providing a similar service: a simple tuple store interface that allows applications to insert, query, and remove individual elements. Furthermore, while availability is commonly assumed to be sustained by the massive scale itself, data consistency and freshness are usually severely hindered. By doing so, these services focus on a specific narrow trade-off between consistency, availability, performance, scale, and migration cost that is much less attractive to common business needs. In this paper we introduce Data Droplets, a novel tuple store that shifts the current trade-off towards the needs of common business users, providing additional consistency guarantees and higher-level data processing primitives that smooth the migration path for existing applications. We present a detailed comparison between Data Droplets and existing systems regarding their data model, architecture and trade-offs. Preliminary results of the system's performance under a realistic workload are also presented.

2011

A Correlation-Aware Data Placement Strategy for Key-Value Stores

Authors
Vilaca, R; Oliveira, R; Pereira, J

Publication
DISTRIBUTED APPLICATIONS AND INTEROPERABLE SYSTEMS

Abstract
Key-value stores hold the unprecedented bulk of the data produced by applications such as social networks. Their scalability and availability requirements often justify sacrificing richer data and processing models, and even elementary data consistency. Moreover, existing key-value stores have only random or order-based placement strategies. In this paper we exploit arbitrary data relations easily expressed by the application to foster data locality and improve the performance of complex queries common in social network read-intensive workloads. We present a novel data placement strategy, supporting dynamic tags, based on multidimensional locality-preserving mappings. We compare our data placement strategy with the ones used in existing key-value stores under the workload of a typical social network application and show that the proposed correlation-aware data placement strategy offers a major improvement in the system's overall response time and network requirements.

2007

GORDA: An open architecture for database replication

Authors
Correia, A; Pereira, J; Rodrigues, L; Carvalho, N; Vilaca, R; Oliveira, R; Guedes, S

Publication
Sixth IEEE International Symposium on Network Computing and Applications, Proceedings

2012

Automatic elasticity in OpenStack

Authors
Beernaert, L; Matos, M; Vilaca, R; Oliveira, R

Publication
Proceedings of the Workshop on Secure and Dependable Middleware for Cloud Monitoring and Management, SDMCMM 2012

Abstract
Cloud computing infrastructures are the most recent approach to the development and conception of computational systems. Cloud infrastructures are complex environments with various subsystems, each with its own challenges. Cloud systems should be able to provide the following fundamental property: elasticity. Elasticity is the ability to automatically add and remove instances according to the needs of the system. This is a requirement for pay-per-use billing models. Various open source software solutions allow companies and institutions to build their own Cloud infrastructure. However, in most of these, the elasticity feature is quite immature. Monitoring and timely adapting the active resources of a Cloud computing infrastructure is key to providing the elasticity required by diverse, multi-tenant and pay-per-use business models. In this paper, we propose Elastack, an automated monitoring and adaptive system, generic enough to be applied to existing IaaS frameworks, and intended to enable the elasticity they currently lack. Our approach offers any Cloud infrastructure the mechanisms to implement automated monitoring and adaptation as well as the flexibility to go beyond these. We evaluate Elastack by integrating it with OpenStack, showing how easy it is to add these important features with a minimal, almost imperceptible, amount of modifications to the default installation. © 2012 ACM.

2009

On the Cost of Database Clusters Reconfiguration

Authors
Vilaca, R; Pereira, J; Oliveira, R; Armendariz Inigo, JE; Gonzalez de Mendivi, JRG

Publication
2009 28TH IEEE INTERNATIONAL SYMPOSIUM ON RELIABLE DISTRIBUTED SYSTEMS, PROCEEDINGS

Abstract
Database clusters based on shared-nothing replication techniques are currently widely accepted as a practical solution to scalability and availability of the data tier. A key issue when planning such systems is the ability to meet service level agreements when load spikes occur or cluster nodes fail. This translates into the ability to provision and deploy additional nodes. Many current research efforts focus on designing autonomic controllers to perform such reconfiguration, tuned to quickly react to system changes and spawn new replicas based on resource usage and performance measurements. In contrast, we are concerned about the inherent impact of deploying an additional node to an online cluster, considering both the time required to finish such an action and the impact on resource usage and performance of the cluster as a whole. If noticeable, such impact hinders the practicability of self-management techniques, since it adds a dimension that has to be accounted for. Our approach is to systematically benchmark a number of different reconfiguration scenarios to assess the cost of bringing a new replica online. We consider factors such as workload characteristics, incremental and parallel recovery, flow control and outdatedness of the recovering replica. As a result, we show that research should be refocused from optimizing the capture and transmission of changes to applying them, which in a realistic setting dominates the cost of the recovery operation.

2011

An epidemic approach to dependable key-value substrates

Authors
Matos, M; Vilaca, R; Pereira, J; Oliveira, R

Publication
Proceedings of the International Conference on Dependable Systems and Networks

Abstract
The sheer volumes of data handled by today's Internet services demand uncompromising scalability from the persistence substrates. Such demands have been successfully addressed by highly decentralized key-value stores invariably governed by a distributed hash table. The availability of these structured overlays rests on the assumption of a moderately stable environment. However, as scale grows with unprecedented numbers of nodes, the occurrence of faults and churn becomes the norm rather than the exception, precluding the adoption of rigid control over the network's organization. In this position paper we outline the major ideas of a novel architecture designed to handle today's very-large-scale demand and its inherent dynamism. The approach rests on the well-known reliability and scalability properties of epidemic protocols to minimize the impact of churn. We identify several challenges that such an approach implies and speculate on possible solutions to ensure data availability and adequate access performance. © 2011 IEEE.
