Publications

Publications by Rui Carlos Oliveira

2014

pH1: A Transactional Middleware for NoSQL

Authors
Coelho, F; Cruz, F; Vilaca, R; Pereira, J; Oliveira, R;

Publication
2014 IEEE 33RD INTERNATIONAL SYMPOSIUM ON RELIABLE DISTRIBUTED SYSTEMS (SRDS)

Abstract
NoSQL databases opt not to offer important abstractions traditionally found in relational databases in order to achieve high levels of scalability and availability: transactional guarantees and strong data consistency. In this work we propose pH1, a generic middleware layer over NoSQL databases that offers transactional guarantees with Snapshot Isolation. This is achieved in a non-intrusive manner, requiring no modifications to servers and no native support for multiple versions. Instead, the transactional context is achieved by means of a multiversion distributed cache and an external transaction certifier, exposed by extending the client's interface with transaction bracketing primitives. We validate and evaluate pH1 with Apache Cassandra and Hyperdex. First, using the YCSB benchmark, we show that the cost of providing ACID guarantees to these NoSQL databases amounts to 11% decrease in throughput. Moreover, using the transaction intensive TPC-C workload, pH1 presented an impact of 22% decrease in throughput. This contrasts with OMID, a previous proposal that takes advantage of HBase's support for multiple versions, with a throughput penalty of 76% in the same conditions

CloseRead Abstract

2017

A Practical Framework for Privacy-Preserving NoSQL Databases

Authors
Macedo, R; Paulo, J; Pontes, R; Portela, B; Oliveira, T; Matos, M; Oliveira, R;

Publication
2017 IEEE 36TH INTERNATIONAL SYMPOSIUM ON RELIABLE DISTRIBUTED SYSTEMS (SRDS)

Abstract
Cloud infrastructures provide database services as cost-efficient and scalable solutions for storing and processing large amounts of data. To maximize performance, these services require users to trust sensitive information to the cloud provider, which raises privacy and legal concerns. This represents a major obstacle to the adoption of the cloud computing paradigm. Recent work addressed this issue by extending databases to compute over encrypted data. However, these approaches usually support a single and strict combination of cryptographic techniques invariably making them application specific. To assess and broaden the applicability of cryptographic techniques in secure cloud storage and processing, these techniques need to be thoroughly evaluated in a modular and configurable database environment. This is even more noticeable for NoSQL data stores where data privacy is still mostly overlooked. In this paper, we present a generic NoSQL framework and a set of libraries supporting data processing cryptographic techniques that can be used with existing NoSQL engines and composed to meet the privacy and performance requirements of different applications. This is achieved through a modular and extensible design that enables data processing over multiple cryptographic techniques applied on the same database. For each technique, we provide an overview of its security model, along with an extensive set of experiments. The framework is evaluated with the YCSB benchmark, where we assess the practicality and performance tradeoffs for different combinations of cryptographic techniques. The results for a set of macro experiments show that the average overhead in NoSQL operations performance is below 15%, when comparing our system with a baseline database without privacy guarantees.

CloseRead Abstract

2017

Prepared scan: efficient retrieval of structured data from HBase

Authors
Neves, F; Vilaça, R; Pereira, JO; Oliveira, R;

Publication
Proceedings of the Symposium on Applied Computing, SAC 2017, Marrakech, Morocco, April 3-7, 2017

Abstract
The ability of NoSQL systems to scale better than traditional relational databases motivates a large set of applications to migrate their data to NoSQL systems, even without aiming to exploit the provided schema exibility. However, accessing structured data is costly due to such exibility, incurring in a lot of bandwidth and processing unit usage. In this paper, we analyse this cost in Apache HBase and propose a new scan operation, named Prepared Scan, that optimizes the access to data structured in a regular manner by taking advantage of a well-known schema by application. Using an industry standard benchmark, we show that Prepared Scan improves throughput up to 29% and decreases network bandwidth consumption up to 20%. © 2017 ACM.

CloseRead Abstract

2017

SAFETHINGS: Data Security by Design in the IoT

Authors
Barbosa, M; Ben Mokhtar, S; Felber, P; Maia, F; Matos, M; Oliveira, R; Riviere, E; Schiavoni, V; Voulgaris, S;

Publication
2017 13TH EUROPEAN DEPENDABLE COMPUTING CONFERENCE (EDCC 2017)

Abstract
Despite years of research and the long-lasting promise of pervasiveness of an "Internet of Things", it is only recently that a truly convincing number of connected things have been deployed in the wild. New services are now being built on top of these things and allow to realize the IoT vision. However, integration of things in complex and interconnected systems is still only in the hands of their manufacturers and of Cloud providers supporting IoT integration platforms. Several issues associated with data privacy arise from this situation. Not only do users need to trust manufacturers and IoT platforms for handling their data, but integration between heterogeneous platforms is still only incipient. In this position paper, we chart a new IoT architecture, SAFETHINGS, that aims at enabling data privacy by design, and that we believe can serve as the foundation for a more comprehensive IoT integration. The SAFETHINGS architecture is based on two simple but powerful conceptual component families, the cleansers and blenders, that allow data owners to get back the control of IoT data and its processing.

CloseRead Abstract

2013

Slicing as a Distributed Systems Primitive

Authors
Maia, F; Matos, M; Oliveira, R; Riviere, E;

Publication
2013 SIXTH LATIN-AMERICAN SYMPOSIUM ON DEPENDABLE COMPUTING (LADC)

Abstract
Large-scale distributed systems appear as the major infrastructures for supporting planet-scale services. These systems call for appropriate management mechanisms and protocols. Slicing is an example of an autonomous, fully decentralized protocol suitable for large-scale environments. It aims at organizing the system into groups of nodes, called slices, according to an application-specific criteria where the size of each slice is relative to the size of the full system. This allows assigning a certain fraction of nodes to different task, according to their capabilities. Although useful, current slicing techniques lack some features of considerable practical importance. This paper proposes a slicing protocol, that builds on existing solutions, and addresses some of their frailties. We present novel solutions to deal with non-uniform slices and to perform online and dynamic slices schema reconfiguration. Moreover, we describe how to provision a slice-local Peer Sampling Service for upper protocol layers and how to enhance slicing protocols with the capability of slicing over more than one attribute. Slicing is presented as a complete, dependable and integrated distributed systems primitive for large-scale systems.

CloseRead Abstract

2014

Workload-aware table splitting for NoSQL

Authors
Cruz, F; Maia, F; Oliveira, R; Vilaça, R;

Publication
Symposium on Applied Computing, SAC 2014, Gyeongju, Republic of Korea - March 24 - 28, 2014

Abstract
Massive scale data stores, which exhibit highly desirable scalability and availability properties are becoming pivotal systems in nowadays infrastructures. Scalability achieved by these data stores is anchored on data independence; there is no clear relationship between data, and atomic inter-node operations are not a concern. Such assumption over data allows aggressive data partitioning. In particular, data tables are horizontally partitioned and spread across nodes for load balancing. However, in current versions of these data stores, partitioning is either a manual process or automated but simply based on table size. We argue that size based partitioning does not lead to acceptable load balancing as it ignores data access patterns, namely data hotspots. Moreover, manual data partitioning is cumbersome and typically infeasible in large scale scenarios. In this paper we propose an automated table splitting mechanism that takes into account the system workload. We evaluate such mechanism showing that it simple, non-intrusive and effective. Copyright 2014 ACM.

CloseRead Abstract