Publications - INESC TEC

Publications

Publications by Ricardo Pereira Vilaça

2014

pH1: A Transactional Middleware for NoSQL

Authors
Coelho, F; Cruz, F; Vilaca, R; Pereira, J; Oliveira, R;

Publication
2014 IEEE 33rd International Symposium on Reliable Distributed Systems (SRDS)

Abstract
To achieve high levels of scalability and availability, NoSQL databases opt not to offer important abstractions traditionally found in relational databases: transactional guarantees and strong data consistency. In this work we propose pH1, a generic middleware layer over NoSQL databases that offers transactional guarantees with Snapshot Isolation. This is achieved in a non-intrusive manner, requiring no modifications to servers and no native support for multiple versions. Instead, the transactional context is achieved by means of a multiversion distributed cache and an external transaction certifier, exposed by extending the client's interface with transaction bracketing primitives. We validate and evaluate pH1 with Apache Cassandra and HyperDex. First, using the YCSB benchmark, we show that the cost of providing ACID guarantees to these NoSQL databases amounts to an 11% decrease in throughput. Moreover, using the transaction-intensive TPC-C workload, pH1 incurs a 22% decrease in throughput. This contrasts with OMID, a previous proposal that takes advantage of HBase's support for multiple versions, with a throughput penalty of 76% under the same conditions.
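As an illustration of the external transaction certifier mentioned in the abstract, the following is a minimal sketch of snapshot-isolation certification using the standard first-committer-wins rule. It is not pH1's implementation: the class name `Certifier`, its methods, and the logical-clock timestamps are all hypothetical, chosen only to show how write-write conflicts between concurrent transactions can be detected outside the database servers.

```python
class Certifier:
    """Toy snapshot-isolation certifier (hypothetical, not pH1's code)."""

    def __init__(self):
        self.clock = 0          # logical commit timestamp
        self.last_commit = {}   # key -> timestamp of its last committed write

    def begin(self):
        # A transaction reads from the snapshot as of its start timestamp.
        return self.clock

    def commit(self, start_ts, write_set):
        # First-committer-wins: abort if any key in the write set was
        # committed by another transaction after this snapshot was taken.
        for key in write_set:
            if self.last_commit.get(key, 0) > start_ts:
                return None  # conflict, transaction must abort
        self.clock += 1
        for key in write_set:
            self.last_commit[key] = self.clock
        return self.clock    # commit timestamp


certifier = Certifier()
t1 = certifier.begin()
t2 = certifier.begin()
first = certifier.commit(t1, {"x"})    # commits
second = certifier.commit(t2, {"x"})   # aborts: concurrent write to "x"
third = certifier.commit(t2, {"y"})    # commits: disjoint write set
```

In this sketch only write sets are checked, which is what Snapshot Isolation requires; reads are served from the snapshot and never cause an abort.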


2017

Prepared scan: efficient retrieval of structured data from HBase

Authors
Neves, F; Vilaça, R; Pereira, JO; Oliveira, R;

Publication
Proceedings of the Symposium on Applied Computing, SAC 2017, Marrakech, Morocco, April 3-7, 2017

Abstract
The ability of NoSQL systems to scale better than traditional relational databases motivates a large set of applications to migrate their data to NoSQL systems, even without aiming to exploit the provided schema flexibility. However, accessing structured data is costly due to such flexibility, incurring substantial bandwidth and processing unit usage. In this paper, we analyse this cost in Apache HBase and propose a new scan operation, named Prepared Scan, that optimizes access to regularly structured data by taking advantage of a schema well known to the application. Using an industry-standard benchmark, we show that Prepared Scan improves throughput by up to 29% and decreases network bandwidth consumption by up to 20%. © 2017 ACM.


2014

Workload-aware table splitting for NoSQL

Authors
Cruz, F; Maia, F; Oliveira, R; Vilaça, R;

Publication
Symposium on Applied Computing, SAC 2014, Gyeongju, Republic of Korea, March 24-28, 2014

Abstract
Massive-scale data stores, which exhibit highly desirable scalability and availability properties, are becoming pivotal systems in today's infrastructures. The scalability achieved by these data stores is anchored on data independence: there is no clear relationship between data items, and atomic inter-node operations are not a concern. This assumption about the data allows aggressive data partitioning. In particular, data tables are horizontally partitioned and spread across nodes for load balancing. However, in current versions of these data stores, partitioning is either a manual process or automated but simply based on table size. We argue that size-based partitioning does not lead to acceptable load balancing as it ignores data access patterns, namely data hotspots. Moreover, manual data partitioning is cumbersome and typically infeasible in large-scale scenarios. In this paper we propose an automated table splitting mechanism that takes into account the system workload. We evaluate this mechanism, showing that it is simple, non-intrusive and effective. Copyright 2014 ACM.
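To make the contrast between size-based and workload-aware splitting concrete, here is a small sketch. It does not reproduce the paper's mechanism: the function `split_point` and the per-key access counters are hypothetical, and the rule used (split where the cumulative request count reaches half of the total) is only one simple way to take access patterns into account.

```python
def split_point(access_counts):
    """Pick the last key of the left partition so that each side of the
    split receives roughly half of the observed requests.

    access_counts: list of (key, request_count) pairs sorted by key.
    Returns None for an empty list. Hypothetical helper for illustration.
    """
    total = sum(count for _, count in access_counts)
    running = 0
    for key, count in access_counts:
        running += count
        if running * 2 >= total:
            return key
    return None


# Keys "a" and "b" are hotspots; the remaining keys are rarely accessed.
counts = [("a", 30), ("b", 20), ("c", 10), ("d", 10),
          ("e", 10), ("f", 10), ("g", 5), ("h", 5)]

workload_split = split_point(counts)            # balances requests
size_split = counts[len(counts) // 2 - 1][0]    # balances number of keys
```

With these counters the workload-aware split ends the left partition at "b", giving each side 50 requests, whereas the size-based split ends it at "d" and leaves 70 requests on one side and 30 on the other.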


2014

The 2nd Workshop on Planetary-Scale Distributed Systems (W-PSDS 2014)

Authors
Antunes Leitao, JC; Pereira Vilaça, RM;

Publication
33rd IEEE International Symposium on Reliable Distributed Systems Workshops, SRDS Workshops 2014, Nara, Japan, October 6-9, 2014

Abstract

2017

HTAPBench: Hybrid Transactional and Analytical Processing Benchmark

Authors
Coelho, F; Paulo, J; Vilaça, R; Pereira, JO; Oliveira, R;

Publication
Proceedings of the 8th ACM/SPEC on International Conference on Performance Engineering, ICPE 2017, L'Aquila, Italy, April 22-26, 2017

Abstract
The increasing demand for real-time analytics requires the fusion of Transactional (OLTP) and Analytical (OLAP) systems, eschewing ETL processes and introducing a plethora of proposals for the so-called Hybrid Analytical and Transactional Processing (HTAP) systems. Unfortunately, current benchmarking approaches are not able to comprehensively produce a unified metric from the assessment of an HTAP system. The evaluation of both engine types is done separately, leading to the use of disjoint sets of benchmarks such as TPC-C or TPC-H. In this paper we propose a new benchmark, HTAPBench, providing a unified metric for HTAP systems geared toward the execution of constantly increasing OLAP requests limited by an admissible impact on OLTP performance. To achieve this, a load balancer within HTAPBench regulates the coexistence of OLTP and OLAP workloads, proposing a method for the generation of both new data and requests, so that OLAP requests over freshly modified data are comparable across runs. We demonstrate the merit of our approach by validating it with different types of systems: OLTP, OLAP and HTAP; showing that the benchmark is able to highlight the differences between them, while producing queries with comparable complexity across experiments with negligible variability. © 2017 ACM.


2019

d'Artagnan: A Trusted NoSQL Database on Untrusted Clouds

Authors
Pontes, R; Maia, F; Vilaça, R; Machado, N;

Publication
38th Symposium on Reliable Distributed Systems, SRDS 2019, Lyon, France, October 1-4, 2019

Abstract
Privacy-sensitive applications that store confidential information such as personally identifiable data or medical records have strict security concerns. These concerns hinder the adoption of the cloud. With cloud providers under the constant threat of malicious attacks, a single successful breach is sufficient to exploit any valuable information and disclose sensitive data. Existing privacy-aware databases mitigate some of these concerns, but still leak critical information that can potentially compromise the entire system's security. This paper proposes d'Artagnan, the first privacy-aware multi-cloud NoSQL database framework that renders database leaks worthless. The framework stores data as encrypted secrets in multiple clouds such that i) a single data breach cannot break the database's confidentiality and ii) queries are processed on the server-side without leaking any sensitive information. d'Artagnan is evaluated with an industry-standard benchmark on market-leading cloud providers. © 2019 IEEE.
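The abstract does not name the scheme behind "encrypted secrets in multiple clouds". As one possible illustration of the idea, and purely as an assumption, the sketch below uses additive secret sharing: a value is split into random shares, one per cloud, so that any single share reveals nothing, while all shares together reconstruct the value. The names `share`, `reconstruct`, and `MODULUS` are hypothetical and are not taken from d'Artagnan.

```python
import secrets

MODULUS = 2**64  # hypothetical share domain, assumed for illustration


def share(value, n_clouds=3):
    """Split value into n_clouds additive shares modulo MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_clouds - 1)]
    # The last share is chosen so that all shares sum to the value.
    shares.append((value - sum(shares)) % MODULUS)
    return shares


def reconstruct(shares):
    """Recombine the shares held by all clouds."""
    return sum(shares) % MODULUS


# Each cloud stores one share of each value.
a = share(10)
b = share(32)

# Each cloud adds its two local shares without seeing the plaintexts;
# the client reconstructs the sum from the per-cloud results.
per_cloud_sums = [(x + y) % MODULUS for x, y in zip(a, b)]
result = reconstruct(per_cloud_sums)
```

Under this assumed scheme, a breach of one cloud exposes only uniformly random numbers, and simple aggregates such as sums can be computed on the server side over shares.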
