Publications

Publications by Pedro Diniz

2017

Embedded Computing for High Performance: Efficient Mapping of Computations Using Customization, Code Transformations and Compilation

Authors
Cardoso, JMP; Coutinho, JGF; Diniz, PC;

Publication
Embedded Computing for High Performance: Efficient Mapping of Computations Using Customization, Code Transformations and Compilation

Abstract
Embedded Computing for High Performance: Design Exploration and Customization Using High-level Compilation and Synthesis Tools provides a set of real-life example implementations that migrate traditional desktop systems to embedded systems. Working with popular hardware, including Xilinx and ARM, the book offers a comprehensive description of techniques for mapping computations expressed in programming languages such as C or MATLAB to high-performance embedded architectures consisting of multiple CPUs, GPUs, and reconfigurable hardware (FPGAs). The authors demonstrate a domain-specific language (LARA) that facilitates retargeting to multiple computing systems using the same source code. In this way, users can decouple original application code from transformed code and enhance productivity and program portability. After reading this book, engineers will understand the processes, methodologies, and best practices needed for the development of applications for high-performance embedded computing systems. Key features: focuses on maximizing performance while managing energy consumption in embedded systems; explains how to retarget code for heterogeneous systems with GPUs and FPGAs; demonstrates a domain-specific language that facilitates migrating and retargeting existing applications to modern systems; includes downloadable slides, tools, and tutorials.

2014

Evaluating high-level program invariants using reconfigurable hardware

Authors
Park, J; Diniz, PC;

Publication
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

Abstract
There is an increasing concern about transient errors in deep sub-micron processor architectures. Software-only error detection approaches that exploit program invariants for silent error detection incur large execution overheads and are unreliable, as state can be corrupted after invariant checkpoints. In this paper we explore the use of configurable hardware structures for the continuous evaluation of high-level program invariants at the assembly level. We evaluate the resource requirements and performance of the proposed hardware structures on a contemporary reconfigurable hardware device. The results, for a small set of kernel codes, reveal that these hardware structures require a very small number of resources and are fairly insensitive to the complexity of the invariants, thus making the proposed hardware approach an attractive alternative to software-only invariant checking by integrating them in traditional processor architectures. © 2014 Springer International Publishing Switzerland.

2017

Message from ANDARE'17 general and program chairs

Authors
Bartolini, A; Cardoso, JMP; Silvano, C; Palermo, G; Barbosa, J; Marongiu, A; Mustafa, D; Rohou, E; Mantovani, F; Agosta, G; Martinovic, J; Pingali, K; Slaninová, K; Benini, L; Cytowski, M; Palkovic, M; Gerndt, M; Sanna, N; Diniz, P; Rusitoru, R; Eigenmann, R; Patki, T; Fahringer, T; Rosendard, T;

Publication
ACM International Conference Proceeding Series

2019

Preface

Authors
Hochberger, C; Nelson, B; Koch, A; Woods, R; Diniz, P;

Publication
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

2007

Partial data reuse for windowing computations: Performance modeling for FPGA implementations

Authors
Park, JS; Diniz, PC;

Publication
Reconfigurable Computing: Architectures, Tools and Applications

Abstract
The mapping of applications to FPGAs involves the exploration of a potentially large space of possible design choices with long and error-prone design cycles. Automated compiler analysis and transformation techniques aim at improving the design productivity of this mapping process by reducing the design cycles while still leading to good designs. Scalar replacement, also known as register promotion, leads to designs that reduce the number of external memory accesses, and thus the execution time, through the use of storage resources. In this paper we present the combination of loop transformation techniques, namely loop unrolling, loop splitting and loop interchange, with scalar replacement to enable partial data reuse on computations expressed by tightly nested loops pervasive in image processing algorithms. We describe an accurate performance model in the presence of partial data reuse. Our experimental results reveal that our model accurately captures the non-trivial execution effects of pipelined implementations in the presence of partial data reuse due to the need to fill up data buffers. The model thus allows a compiler to explore a large design space with high accuracy, ultimately allowing compiler tools to find better designs than brute-force approaches.

2007

A combined hardware/software optimization framework for signal representation and recognition

Authors
Demertzi, M; Diniz, P; Hall, MW; Gilbert, AC; Wang, Y;

Publication
Computational Science - ICCS 2007, Part 1, Proceedings

Abstract
This paper describes a signal recognition system that is jointly optimized across mathematical representation, algorithm design and final implementation. The goal is to exploit signal properties to jointly optimize a computation, beginning with first principles (mathematical representation) and ending with implementation. We use a BestBasis algorithm to search a large collection of orthogonal transforms derived from the Walsh-Hadamard transform to find a series of transforms that best discriminate among signal classes. The implementation exploits the structure of these matrices to compress the matrix representation and, in the process of multiplying the signal by the transform, to reuse the results of prior computation and parallelize the implementation in hardware. Through this joint optimization, this dynamic, data-driven system is able to yield much more highly optimized results than if the optimizations were performed statically and in isolation. We provide results from applying this system to real input signals of spoken digits, and perform initial analyses to demonstrate that the properties of the transform matrices lead to optimized solutions.
