Publicacoes - INESC TEC

Publicações

Publicações por Ricardo Pereira Vilaça

2014

DATAFLASKS: epidemic store for massive scale systems

Autores
Maia, F; Matos, M; Vilaca, R; Pereira, J; Oliveira, R; Riviere, E;

Publicação
2014 IEEE 33RD INTERNATIONAL SYMPOSIUM ON RELIABLE DISTRIBUTED SYSTEMS (SRDS)

Abstract
Very large scale distributed systems provide some of the most interesting research challenges while at the same time being increasingly required by nowadays applications. The escalation in the amount of connected devices and data being produced and exchanged, demands new data management systems. Although new data stores are continuously being proposed, they are not suitable for very large scale environments. The high levels of churn and constant dynamics found in very large scale systems demand robust, proactive and unstructured approaches to data management. In this paper we propose a novel data store solely based on epidemic (or gossip-based) protocols. It leverages the capacity of these protocols to provide data persistence guarantees even in highly dynamic, massive scale systems. We provide an open source prototype of the data store and correspondent evaluation.

FecharLer Abstract

2013

Evaluating Cassandra as a manager of large file sets

Autores
Beernaert, L; Gomes, P; Matos, M; Vilaça, R; Oliveira, R;

Publicação
Proceedings of the 3rd International Workshop on Cloud Data and Platforms, CloudDP@EuroSys 2013, Prague, Czech Republic, April 14-17, 2013

Abstract
All companies developing their business on the Web, not only giants like Google or Facebook but also small companies focused on niche markets, face scalability issues in data management. The case study of this paper is the content management systems for classified or commercial advertisements on the Web. The data involved has a very significant growth rate and a read-intensive access pattern with a reduced update rate. Typically, data is stored in traditional file systems hosted on dedicated servers or Storage Area Network devices due to the generalization and ease of use of file systems. However, this ease in implementation and usage has a disadvantage: the centralized nature of these systems leads to availability, elasticity and scalability problems. The scenario under study, undemanding in terms of the system's consistency and with a simple interaction model, is suitable to a distributed database, such as Cassandra, conceived precisely to dynamically handle large volumes of data. In this paper, we analyze the suitability of Cassandra as a substitute for file systems in content management systems. The evaluation, conducted using real data from a production system, shows that when using Cassandra, one can easily get horizontal scalability of storage, redundancy across multiple independent nodes and load distribution imposed by the periodic activities of safeguarding data, while ensuring a comparable performance to that of a file system. Copyright © 2013 ACM.

FecharLer Abstract

2013

MeT: workload aware elasticity for NoSQL

Autores
Cruz, F; Maia, F; Matos, M; Oliveira, R; Paulo, J; Pereira, J; Vilaça, R;

Publicação
Eighth Eurosys Conference 2013, EuroSys '13, Prague, Czech Republic, April 14-17, 2013

Abstract
NoSQL databases manage the bulk of data produced by modern Web applications such as social networks. This stems from their ability to partition and spread data to all available nodes, allowing NoSQL systems to scale. Unfortunately, current solutions' scale out is oblivious to the underlying data access patterns, resulting in both highly skewed load across nodes and suboptimal node configurations. In this paper, we first show that judicious placement of HBase partitions taking into account data access patterns can improve overall throughput by 35%. Next, we go beyond current state of the art elastic systems limited to uninformed replica addition and removal by: i) reconfiguring existing replicas according to access patterns and ii) adding replicas specifically configured to the expected access pattern. MeT is a prototype for a Cloud-enabled framework that can be used alone or in conjunction with OpenStack for the automatic and heterogeneous reconfiguration of a HBase deployment. Our evaluation, conducted using the YCSB workload generator and a TPC-C workload, shows that MeT is able to i) autonomously achieve the performance of a manual configured cluster and ii) quickly reconfigure the cluster according to unpredicted workload changes. © 2013 ACM.

FecharLer Abstract

2014

On the Support of Versioning in Distributed Key-Value Stores

Autores
Felber, P; Pasin, M; Riviere, E; Schiavoni, V; Sutra, P; Coelho, F; Oliveira, R; Matos, M; Vilaca, R;

Publicação
2014 IEEE 33RD INTERNATIONAL SYMPOSIUM ON RELIABLE DISTRIBUTED SYSTEMS (SRDS)

Abstract
The ability to access and query data stored in multiple versions is an important asset for many applications, such as Web graph analysis, collaborative editing platforms, data forensics, or correlation mining. The storage and retrieval of versioned data requires a specific API and support from the storage layer. The choice of the data structures used to maintain versioned data has a fundamental impact on the performance of insertions and queries. The appropriate data structure also depends on the nature of the versioned data and the nature of the access patterns. In this paper we study the design and implementation space for providing versioning support on top of a distributed key-value store (KVS). We define an API for versioned data access supporting multiple writers and show that a plain KVS does not offer the necessary synchronization power for implementing this API. We leverage the support for listeners at the KVS level and propose a general construction for implementing arbitrary types of data structures for storing and querying versioned data. We explore the design space of versioned data storage ranging from a flat data structure to a distributed sharded index. The resulting system, ALEPH, is implemented on top of an industrial-grade open-source KVS, Infinispan. Our evaluation, based on real-world Wikipedia access logs, studies the performance of each versioning mechanisms in terms of load balancing, latency and storage overhead in the context of different access scenarios.

FecharLer Abstract

2013

An effective scalable SQL engine for NoSQL databases

Autores
Vilaca, R; Cruz, F; Pereira, J; Oliveira, R;

Publicação
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

Abstract
NoSQL databases were initially devised to support a few concrete extreme scale applications. Since the specificity and scale of the target systems justified the investment of manually crafting application code their limited query and indexing capabilities were not a major impediment. However, with a considerable number of mature alternatives now available there is an increasing willingness to use NoSQL databases in a wider and more diverse spectrum of applications and, to most of them, hand-crafted query code is not an enticing trade-off. In this paper we address this shortcoming of current NoSQL databases with an effective approach for executing SQL queries while preserving their scalability and schema flexibility. We show how a full-fledged SQL engine can be integrated atop of HBase leading to an ANSI SQL compliant database. Under a standard TPC-C workload our prototype scales linearly with the number of nodes in the system and outperforms a NoSQL TPC-C implementation optimized for HBase. © 2013 IFIP International Federation for Information Processing.

FecharLer Abstract

2017

Performance trade-offs on a secure multi-party relational database

Autores
Pontes, R; Pinto, M; Barbosa, M; Vilaça, R; Matos, M; Oliveira, R;

Publicação
Proceedings of the Symposium on Applied Computing, SAC 2017, Marrakech, Morocco, April 3-7, 2017

Abstract
The privacy of information is an increasing concern of software applications users. This concern was caused by attacks to cloud services over the last few years, that have leaked confidential information such as passwords, emails and even private pictures. Once the information is leaked, the users and software applications are powerless to contain the spread of information and its misuse. With databases as a central component of applications that store almost all of their data, they are one of the most common targets of attacks. However, typical deployments of databases do not leverage security mechanisms to stop attacks and do not apply cryptographic schemes to protect data. This issue has been tackled by multiple secure databases that provide trade-offs between security, query capabilities and performance. Despite providing stronger security guarantees, the proposed solutions still entrust their data to a single entity that can be corrupted or hacked. Secret sharing can solve this problem by dividing data in multiple secrets and storing each secret at a different location. The division is done in such a way that if one location is hacked, no information can be leaked. Depending on the protocols used to divide data, functions can be computed over this data through secure protocols that do not disclose information or actually know which values are being calculated. We propose a SQL database prototype capable of offering a trade-off between security and query latency by using a different secure protocol. An evaluation of the protocols is also performed, showing that our most relaxed protocol has an improvement of 5% on the query latency time over the original protocol. © 2017 ACM.

FecharLer Abstract