Publications

Publications by José Orlando Pereira

2016

Benchmarking Polystores: the CloudMdsQL Experience

Authors
Kolev, B; Pau, R; Levchenko, O; Valduriez, P; Jimenez Peris, R; Pereira, J;

Publication
2016 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA)

Abstract
The CloudMdsQL polystore provides integrated access to multiple heterogeneous data stores, such as RDBMS, NoSQL or even HDFS through a big data analytics framework such as MapReduce or Spark. The CloudMdsQL language is a functional SQL-like query language with a flexible nested data model. A major capability is to exploit the full power of each of the underlying data stores by allowing native queries to be expressed as functions and involved in SQL statements. The CloudMdsQL polystore has been validated with a good number of different data stores: HDFS, key-value, document, graph, RDBMS and OLAP engine. In this paper, we introduce the benchmarking of the CloudMdsQL polystore and evaluate the performance benefits of important features enabled by the query language and engine.


2015

Implementing a Linear Algebra Approach to Data Processing

Authors
Pontes, R; Matos, M; Oliveira, JN; Pereira, JO;

Publication
Grand Timely Topics in Software Engineering - International Summer School GTTSE 2015, Braga, Portugal, August 23-29, 2015, Tutorial Lectures

Abstract
Data analysis is among the main strategies of our time for enterprises to take advantage of the vast amounts of data their systems generate and store every day. Thus the standard relational database model is challenged every day to cope with quantitative operations over a traditionally qualitative, relational model. A novel approach to the semantics of data is based on (typed) linear algebra (LA), rather than relational algebra, bridging the gap between data dimensions and data measures in a unified way. Also, this bears the promise of increased parallelism, as most operations in LA admit a ‘divide & conquer’ implementation. This paper presents a first experiment in implementing such a typed linear algebra approach and testing its performance on a data distributed system. It presents solutions to some theoretical limitations and evaluates the overall performance. © Springer International Publishing AG 2017.


2013

Proceedings of the 8th Workshop on Middleware for Next Generation Internet Computing, MW4NextGen 2013, Beijing, China, December 9-13, 2013

Authors
Göschka, KM; Pereira, JO; Hung, PCK;

Publication
MW4NextGen@Middleware

Abstract
[No abstract available]


2016

CloudMdsQL: querying heterogeneous cloud data stores with a common language

Authors
Kolev, B; Valduriez, P; Bondiombouy, C; Jimenez Peris, R; Pau, R; Pereira, J;

Publication
DISTRIBUTED AND PARALLEL DATABASES

Abstract
The blooming of different cloud data management infrastructures, specialized for different kinds of data and tasks, has led to a wide diversification of DBMS interfaces and the loss of a common programming paradigm. In this paper, we present the design of a cloud multidatastore query language (CloudMdsQL), and its query engine. CloudMdsQL is a functional SQL-like language, capable of querying multiple heterogeneous data stores (relational and NoSQL) within a single query that may contain embedded invocations to each data store's native query interface. The query engine has a fully distributed architecture, which provides important opportunities for optimization. The major innovation is that a CloudMdsQL query can exploit the full power of local data stores, by simply allowing some local data store native queries (e.g. a breadth-first search query against a graph database) to be called as functions, and at the same time be optimized, e.g. by pushing down select predicates, using bind join, performing join ordering, or planning intermediate data shipping. Our experimental validation, with three data stores (graph, document and relational) and representative queries, shows that CloudMdsQL satisfies the five important requirements for a cloud multidatastore query language.


2013

Experience with a middleware infrastructure for service oriented financial applications

Authors
Oliveira, JP; Pereira, J;

Publication
Proceedings of the ACM Symposium on Applied Computing

Abstract
Financial institutions, acting as financial intermediaries, need to handle numerous information sources and feed them to multiple processing, storage, and display services. This requires filtering and routing, but these feeds are usually provided in custom formats and protocols that are not the best fit for further processing. Moreover, the sheer volume of information and stringent timeliness and reliability requirements make this a substantial task. In this paper, i) we characterize one of these information feeds (the Exchange Data Publisher feed from the NYSE Euronext European Cash Markets) and ii) we present and evaluate a dissemination system for this particular feeder based on commodity hardware and open-source message-oriented middleware (Apache Qpid). This allows us to assess the feasibility of this approach and to point out the main challenges to be overcome. Copyright 2013 ACM.


2017

HTAPBench: Hybrid Transactional and Analytical Processing Benchmark

Authors
Coelho, F; Paulo, J; Vilaça, R; Pereira, JO; Oliveira, R;

Publication
Proceedings of the 8th ACM/SPEC on International Conference on Performance Engineering, ICPE 2017, L'Aquila, Italy, April 22-26, 2017

Abstract
The increasing demand for real-time analytics requires the fusion of Transactional (OLTP) and Analytical (OLAP) systems, eschewing ETL processes and introducing a plethora of proposals for the so-called Hybrid Analytical and Transactional Processing (HTAP) systems. Unfortunately, current benchmarking approaches are not able to comprehensively produce a unified metric from the assessment of an HTAP system. The evaluation of both engine types is done separately, leading to the use of disjoint sets of benchmarks such as TPC-C or TPC-H. In this paper we propose a new benchmark, HTAPBench, providing a unified metric for HTAP systems geared toward the execution of constantly increasing OLAP requests limited by an admissible impact on OLTP performance. To achieve this, a load balancer within HTAPBench regulates the coexistence of OLTP and OLAP workloads, proposing a method for the generation of both new data and requests, so that OLAP requests over freshly modified data are comparable across runs. We demonstrate the merit of our approach by validating it with different types of systems: OLTP, OLAP and HTAP; showing that the benchmark is able to highlight the differences between them, while producing queries with comparable complexity across experiments with negligible variability. © 2017 ACM.
