Publications

Publications by CRACS

2021

Refactoring the Whitby Intelligent Tutoring System for Clean Architecture

Authors
Brown, PS; Dimitrova, V; Hart, G; Cohn, AG; Moura, P;

Publication
Theory Pract. Log. Program.

Abstract

2021

On the correctness and efficiency of a novel lock-free hash trie map design

Authors
Areias, M; Rocha, R;

Publication
JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING

Abstract
Hash tries are a trie-based data structure with nearly ideal characteristics for the implementation of hash maps. In this paper, we present a novel, simple and scalable hash trie map design that fully supports the concurrent search, insert and remove operations on hash maps. To the best of our knowledge, our proposal is the first that puts together the following characteristics: (i) be lock free; (ii) use fixed size data structures; and (iii) maintain the access to all internal data structures as persistent memory references. Our design is modular enough to allow different types of configurations aimed for different performances in memory usage and execution time and can be easily implemented in any type of language, library or within other complex data structures. We discuss in detail the key algorithms required to easily reproduce our implementation by others and we present a proof of correctness showing that our proposal is linearizable and lock-free for the search, insert and remove operations. Experimental results show that our proposal is quite competitive when compared against other state-of-the-art proposals implemented in Java.

CloseRead Abstract

2021

On the implementation of memory reclamation methods in a lock-free hash trie design

Authors
Moreno, P; Areias, M; Rocha, R;

Publication
JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING

Abstract
Hash tries are a trie-based data structure with nearly ideal characteristics for the implementation of hash maps. Starting from a particular lock-free hash map data structure, named Lock-Free Hash Tries, we focus on solving the problem of memory reclamation without losing the lock-freedom property. To the best of our knowledge, outside garbage collected environments, there is no current implementation of hash maps that is able to reclaim memory in a lock-free manner. To achieve this goal, we propose an approach for memory reclamation specific to Lock-Free Hash Tries that explores the characteristics of its structure in order to achieve efficient memory reclamation with low and well-defined memory bounds. We present and discuss in detail the key algorithms required to easily reproduce our implementation by others. Experimental results show that our approach obtains better results when compared with other state-of-the-art memory reclamation methods and provides a competitive and scalable hash map implementation, if compared to lock-based implementations.

CloseRead Abstract

2021

Towards an Elastic Lock-Free Hash Trie Design

Authors
Areias, M; Rocha, R;

Publication
2021 20TH INTERNATIONAL SYMPOSIUM ON PARALLEL AND DISTRIBUTED COMPUTING (ISPDC)

Abstract
A key aspect of any hash map design is the problem of dynamically resizing it in order to deal with hash collisions. In this context, elasticity refers to the ability to automatically resize the internal data structures that support the hash map operations in order to meet varying workloads, thus optimizing the overall memory consumption of the hash map. This work extends a previous lock-free hash trie design to support elastic hashing, i.e., expand saturated hash levels and compress unused hash levels, such that, at each point in time, the number of levels in a path matches the current demand as closely as possible. Experimental results show that elasticity effectively improves the search operation and, in doing so, our design becomes very competitive when compared to other state-of-the-art designs implemented in Java.

CloseRead Abstract

2021

Pruning strategies for the efficient traversal of the search space in PILP environments

Authors
Corte Real, J; Dutra, I; Rocha, R;

Publication
KNOWLEDGE AND INFORMATION SYSTEMS

Abstract
Probabilistic inductive logic programming (PILP) is a statistical relational learning technique which extends inductive logic programming by considering probabilistic data. The ability to use probabilities to represent uncertainty comes at the cost of an exponential evaluation time when composing theories to model the given problem. For this reason, PILP systems rely on various pruning strategies in order to reduce the search space. However, to the best of the authors' knowledge, there has been no systematic analysis of the different pruning strategies, how they impact the search space and how they interact with one another. This work presents a unified representation for PILP pruning strategies which enables end-users to understand how these strategies work both individually and combined and to make an informed decision on which pruning strategies to select so as to best achieve their goals. The performance of pruning strategies is evaluated both time and quality-wise in two state-of-the-art PILP systems with datasets from three different domains. Besides analysing the performance of the pruning strategies, we also illustrate the utility of PILP in one of the application domains, which is a real-world application.

CloseRead Abstract

2021

Evaluation Procedures for Forecasting with Spatiotemporal Data

Authors
Oliveira, M; Torgo, L; Costa, VS;

Publication
MATHEMATICS

Abstract
The increasing use of sensor networks has led to an ever larger number of available spatiotemporal datasets. Forecasting applications using this type of data are frequently motivated by important domains such as environmental monitoring. Being able to properly assess the performance of different forecasting approaches is fundamental to achieve progress. However, traditional performance estimation procedures, such as cross-validation, face challenges due to the implicit dependence between observations in spatiotemporal datasets. In this paper, we empirically compare several variants of cross-validation (CV) and out-of-sample (OOS) performance estimation procedures, using both artificially generated and real-world spatiotemporal datasets. Our results show both CV and OOS reporting useful estimates, but they suggest that blocking data in space and/or in time may be useful in mitigating CV's bias to underestimate error. Overall, our study shows the importance of considering data dependencies when estimating the performance of spatiotemporal forecasting models.

CloseRead Abstract