Publications

Publications by Pedro Diniz

2013

A note from the program chairs

Authors
Morrow, K; Diniz, PC;

Publication
2013 23rd International Conference on Field Programmable Logic and Applications, FPL 2013 - Proceedings

2014

Addressing failures in exascale computing

Authors
Snir, M; Wisniewski, RW; Abraham, JA; Adve, SV; Bagchi, S; Balaji, P; Belak, J; Bose, P; Cappello, F; Carlson, B; Chien, AA; Coteus, P; Debardeleben, NA; Diniz, PC; Engelmann, C; Erez, M; Fazzari, S; Geist, A; Gupta, R; Johnson, F; Krishnamoorthy, S; Leyffer, S; Liberty, D; Mitra, S; Munson, T; Schreiber, R; Stearley, J; Hensbergen, EV;

Publication
International Journal of High Performance Computing Applications

Abstract
We present here a report produced by a workshop on 'Addressing failures in exascale computing' held in Park City, Utah, 4-11 August 2012. The charter of this workshop was to establish a common taxonomy about resilience across all the levels in a computing system, discuss existing knowledge on resilience across the various hardware and software layers of an exascale system, and build on those results, examining potential solutions from both a hardware and software perspective and focusing on a combined approach. The workshop brought together participants with expertise in applications, system software, and hardware; they came from industry, government, and academia, and their interests ranged from theory to implementation. The combination allowed broad and comprehensive discussions and led to this document, which summarizes and builds on those discussions. © The Author(s) 2014.

2014

An evaluation of lazy fault detection based on Adaptive Redundant Multithreading

Authors
Hukerikar, S; Teranishi, K; Diniz, PC; Lucas, RF;

Publication
2014 IEEE High Performance Extreme Computing Conference, HPEC 2014

Abstract
The challenge of resilience for High Performance Computing (HPC) applications is significant for future extreme-scale systems. These systems will experience unprecedented rates of faults and errors, as they will be constructed from massive numbers of components that are inherently less reliable than those available today. While redundant computing can provide detection and possible correction of errors, its system-wide use in future extreme-scale HPC systems will incur considerable overheads to application performance. In this paper, we present a framework that provides application-level fault detection based on redundant multithreading (RMT). In previous work, we demonstrated an adaptive approach based on a language-level directive: the computation contained in the programmer directive is executed by duplicate threads and, in concert with a runtime system, redundant multithreading is enabled opportunistically to provide fault detection at more reasonable overheads to application performance. The lazy fault detection approach presented in this work seeks to further optimize the use of redundancy by prioritizing the application's primary computation over fault detection. Our approach relaxes the requirement that the redundant threads synchronize and compare results immediately. We show that lazy error detection is feasible and yields lower time to solution than adaptive RMT for a range of scientific computational kernels. We also explore a thread-to-core assignment strategy that seeks to reduce the interference between the redundant threads. © 2014 IEEE.
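The difference between immediate and lazy comparison can be illustrated with a small Python sketch. It assumes a single worker thread stands in for the redundant thread and a plain list of pending results stands in for the runtime system; the class `LazyRedundantRunner` and its methods are hypothetical names, not the framework or directive described in the paper, and the `!=` comparison assumes deterministic kernels.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Any, Callable, List, Tuple


class LazyRedundantRunner:
    """Hypothetical sketch of lazy redundant execution: the primary result is
    returned at once, while a duplicate run on a worker thread is compared later."""

    def __init__(self) -> None:
        # One worker thread hosts the redundant copies; the name prefix identifies it.
        self._pool = ThreadPoolExecutor(max_workers=1, thread_name_prefix="redundant")
        self._pending: List[Tuple[Any, Future]] = []

    def run(self, kernel: Callable[..., Any], *args: Any) -> Any:
        redundant = self._pool.submit(kernel, *args)  # duplicate execution, off the critical path
        primary = kernel(*args)                       # primary computation is not delayed
        self._pending.append((primary, redundant))    # comparison is deferred, not done here
        return primary

    def check(self) -> int:
        """Wait for outstanding redundant runs and return the number of mismatches."""
        mismatches = sum(1 for primary, fut in self._pending if fut.result() != primary)
        self._pending.clear()
        return mismatches

    def close(self) -> None:
        self._pool.shutdown(wait=True)
```

Calling `run` several times and `check` only once, for example at the end of a solver iteration, is what "lazy" means here: the primary path never waits on the redundant thread. Because of Python's global interpreter lock, this models only the control flow, not the performance or the thread-to-core placement studied in the paper.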

2014

A case for adaptive redundancy for HPC resilience

Authors
Hukerikar, S; Diniz, PC; Lucas, RF;

Publication
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

Abstract
Redundancy in both space and time has been widely used to detect, and in some cases correct, errors in High Performance Computing (HPC) systems. With the HPC community seeking exascale-class supercomputers by the end of the decade, unrealistic expectations for correct system behavior will result in exorbitant costs in terms of performance lost and energy expended. Resilience strategies will need to find a balance between fault coverage and the overheads incurred. In this work, we propose an adaptive approach that factors in application-level knowledge together with runtime inference about the fault tolerance state of the system to dynamically enable redundant multithreading (RMT). Our approach is based on simple programming language extensions, tightly integrated with a compiler infrastructure and a runtime framework that enables managing the performance overheads of redundant computation. © 2014 Springer-Verlag Berlin Heidelberg.

2018

Applied Reconfigurable Computing. Architectures, Tools, and Applications - 14th International Symposium, ARC 2018, Santorini, Greece, May 2-4, 2018, Proceedings

Authors
Voros, NS; Hübner, M; Keramidas, G; Goehringer, D; Antonopoulos, CP; Diniz, PC;

Publication
ARC

2018

A Faddeev Systolic Array for EKF-SLAM and its Arithmetic Data Representation Impact on FPGA

Authors
de Souza Rosa, L; Dasu, A; Diniz, PC; Bonato, V;

Publication
Journal of Signal Processing Systems

Abstract
The Extended Kalman Filter (EKF) computation is a core task in the simultaneous localization and mapping (SLAM) problem for autonomous mobile robots. SLAM involves operations over high-dimensional data sets and requires high throughput and performance, given the real-time nature of the robot control and decision algorithms it is part of. The lightweight, power-restricted computing environments of mobile robotics call for customized processing systems such as Field-Programmable Gate Arrays (FPGAs). This work presents an arithmetic precision analysis and a hardware architecture that uses a Systolic Array (SA) to implement the Faddeev algorithm for computing the Schur complement in EKF-SLAM. While it is widely believed that fixed-point implementations of arithmetic operations lead to area and performance benefits on FPGAs, the results in this article reveal that each Processing Element (PE) in the SA consumes 25% more logic and about 30% more register resources with the 13.23 fixed-point representation than with the IEEE-754 single-precision floating-point format. In addition, on FPGA devices with hardware support for key components of floating-point computation, a single-PE floating-point implementation can reach a maximum frequency up to 50% higher than a corresponding fixed-point implementation for the same relative numeric error. © 2017, Springer Science+Business Media New York.
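The Faddeev algorithm evaluates the Schur complement D - C A^{-1} B without explicitly inverting A: Gaussian elimination on the compound matrix [[A, B], [C, D]] that zeroes the C block leaves the result in the lower-right block. The following is a minimal floating-point software model, assuming A is square and needs no pivoting; `faddeev_schur` is a hypothetical name, and this is not the systolic-array design or number formats evaluated in the paper.

```python
from typing import List

Matrix = List[List[float]]


def faddeev_schur(A: Matrix, B: Matrix, C: Matrix, D: Matrix) -> Matrix:
    """Return D - C * inv(A) * B via elimination on [[A, B], [C, D]] (no pivoting)."""
    n = len(A)  # A: n x n, B: n x p, C: m x n, D: m x p
    top = [[float(v) for v in A[i]] + [float(v) for v in B[i]] for i in range(n)]
    bottom = [[float(v) for v in C[i]] + [float(v) for v in D[i]] for i in range(len(C))]
    for k in range(n):
        pivot = top[k][k]
        if pivot == 0.0:
            raise ZeroDivisionError("zero pivot; this sketch does not pivot")
        # Triangularize A: clear column k in the remaining top rows.
        for i in range(k + 1, n):
            f = top[i][k] / pivot
            top[i] = [a - f * b for a, b in zip(top[i], top[k])]
        # Annihilate column k of the C block using the same pivot row.
        for i in range(len(bottom)):
            f = bottom[i][k] / pivot
            bottom[i] = [a - f * b for a, b in zip(bottom[i], top[k])]
    # The C block is now zero, so the right-hand part holds D - C * inv(A) * B.
    return [row[n:] for row in bottom]
```

For example, `faddeev_schur([[2, 0], [0, 4]], [[1], [2]], [[1, 1]], [[3]])` returns `[[2.0]]`, since C A^{-1} B = 1 and D = 3. Every step is the same multiply-subtract applied row by row, which is the regular pattern that makes the method a good fit for a systolic array of identical processing elements.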
