Publications - INESC TEC

Publications

Publications by José Orlando Pereira

2014

A Survey and Classification of Storage Deduplication Systems

Authors
Paulo, J; Pereira, J;

Publication
ACM Computing Surveys

Abstract
The automatic elimination of duplicate data in a storage system, commonly known as deduplication, is increasingly accepted as an effective technique to reduce storage costs. Thus, it has been applied to different storage types, including archives and backups, primary storage, within solid-state drives, and even to random access memory. Although the general approach to deduplication is shared by all storage types, each poses specific challenges and leads to different trade-offs and solutions. This diversity is often misunderstood, thus underestimating the relevance of new research and development. The first contribution of this article is a classification of deduplication systems according to six criteria that correspond to key design decisions: granularity, locality, timing, indexing, technique, and scope. This classification identifies and describes the different approaches used for each of them. As a second contribution, we describe which combinations of these design decisions have been proposed and found more useful for challenges in each storage type. Finally, outstanding research challenges and unexplored design points are identified and discussed.

2015

X-Ray: Monitoring and analysis of distributed database queries

Authors
Guimaraes, P; Pereira, J;

Publication
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

Abstract
The integration of multiple database technologies, including both SQL and NoSQL, allows using the best tool for each aspect of a complex problem and is increasingly sought in practice. Unfortunately, this makes it difficult for database developers and administrators to obtain a clear view of the resulting composite data processing paths, as they combine operations chosen by different query optimisers, implemented by different software packages, and partitioned across distributed systems. This work addresses this challenge with the X-Ray framework, which allows monitoring code to be added to a Java-based distributed system by manipulating its bytecode at runtime. The resulting information is collected in a NoSQL database and then processed to visualise data processing paths as required for optimising integrated database systems. This proposal is demonstrated with a distributed query over a federation of Apache Derby database servers, and its performance is evaluated with the standard TPC-C benchmark workload. © IFIP International Federation for Information Processing 2015.

2016

The CloudMdsQL Multistore System

Authors
Kolev, B; Bondiombouy, C; Valduriez, P; Peris, RJ; Pau, R; Pereira, J;

Publication
Proceedings of the 2016 International Conference on Management of Data, SIGMOD Conference 2016, San Francisco, CA, USA, June 26 - July 01, 2016

Abstract
The blooming of different cloud data management infrastructures has turned multistore systems into a major topic in today's cloud landscape. In this demonstration, we present a Cloud Multidatastore Query Language (CloudMdsQL) and its query engine. CloudMdsQL is a functional SQL-like language, capable of querying multiple heterogeneous data stores (relational and NoSQL) within a single query that may contain embedded invocations to each data store's native query interface. The major innovation is that a CloudMdsQL query can exploit the full power of local data stores, by simply allowing some local data store native queries (e.g., a breadth-first search query against a graph database) to be called as functions, and at the same time be optimized. Within our demonstration, we focus on two use cases, each involving four diverse data stores (graph, document, relational, and key-value) with their corresponding CloudMdsQL queries. The query execution flows are visualized by an embedded real-time monitoring subsystem. Users can also try out different ad-hoc queries, not necessarily in the context of the use cases. Copyright is held by the owner/author(s).

2016

Design and Implementation of the CloudMdsQL Multistore System

Authors
Kolev, B; Bondiombouy, C; Levchenko, O; Valduriez, P; Jimenez, R; Pau, R; Pereira, J;

Publication
Proceedings of the 6th International Conference on Cloud Computing and Services Science, Vol. 1 (CLOSER)

Abstract
The blooming of different cloud data management infrastructures has turned multistore systems into a major topic in today's cloud landscape. In this paper, we give an overview of the design of a Cloud Multidatastore Query Language (CloudMdsQL) and the implementation of its query engine. CloudMdsQL is a functional SQL-like language, capable of querying multiple heterogeneous data stores (relational, NoSQL, HDFS) within a single query that can contain embedded invocations to each data store's native query interface. The major innovation is that a CloudMdsQL query can exploit the full power of local data stores, by simply allowing some local data store native queries (e.g., a breadth-first search query against a graph database) to be called as functions, and at the same time be optimized.

2014

Improving the Scalability of DPWS-Based Networked Infrastructures

Authors
Campos, Filipe; Pereira, Jose;

Publication
CoRR

Abstract

2015

An Experimental Evaluation of Machine-to-Machine Coordination Middleware

Authors
Campos, F; Pereira, J;

Publication
30th Annual ACM Symposium on Applied Computing, Vols. I and II

Abstract
The vision of the Internet-of-Things (IoT) embodies the seamless discovery, configuration, and interoperability of networked devices in various settings, ranging from home automation and multimedia to autonomous vehicles and manufacturing equipment. As these applications become increasingly critical, the middleware coping with Machine-to-Machine (M2M) communication and coordination has to deal with fault tolerance and increasing complexity, while still abiding by the resource constraints of target devices. In this paper, we focus on configuration management and coordination of services in an M2M scenario. On the one hand, we consider ZooKeeper, originally developed for cloud data centers, offering a simple file-system abstraction, and embodying replication for fault-tolerance and scalability based on a consensus protocol. On the other hand, we consider the Devices Profile for Web Services (DPWS) stack with replicated services based on our implementation of the Raft consensus protocol. We show that the latter offers adequate performance for the targeted applications while providing increasing flexibility.
