2010
Authors
Diniz, PC; Danelutto, M; Barthou, D; Gonzales, M; Hübner, M;
Publication
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Abstract
This topic deals with architecture design and compilation for high performance systems. The areas of interest range from microprocessors to large-scale parallel machines; from general-purpose platforms to specialized hardware (e.g., graphic coprocessors, low-power embedded systems); and from hardware design to compiler technology. On the compilation side, topics of interest include programmer productivity issues, concurrent and/or sequential language aspects, program analysis, transformation, automatic discovery and/or management of parallelism at all levels, and the interaction between the compiler and the rest of the system. On the architecture side, the scope spans system architectures, processor micro-architecture, memory hierarchy, and multi-threading, and the impact of emerging trends. All the papers submitted to this track highlight the growing significance of Chip Multi-Processors (CMP) and Simultaneous Multi-Threaded (SMT) processors in contemporary high-performance architectures. © 2010 Springer-Verlag.
2011
Authors
Park, J; Diniz, PC;
Publication
IEEE Design and Test of Computers
Abstract
The applicability of FPGAs as memory engines capable of sophisticated and customizable memory traversal, selection, and relocation operations common in scientific computations involving large, pointer-based data structures is explored. The experimental work illustrates that, despite their clock speed handicap, FPGAs can be successfully integrated with traditional architectures. The nonleaf nodes, or internal nodes, can have up to four children nodes, either internal nodes or simple leaf nodes. A hardware design supporting the execution of computations that traverse data structures must also support relocation. It is found that for string-pattern matching, the design aggressively exploits parallelism by concurrently testing the various patterns. For the sparse-mesh data structure, the implementation is completely memory bound despite a pipelining execution strategy.
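The pointer-based structures referred to here are trees whose internal nodes hold up to four children. The following minimal C sketch shows such a structure and a software traversal; the node layout, field names, and the traverse routine are illustrative assumptions, not the paper's hardware interface, which would stream the equivalent pointer dereferences through the FPGA memory engine.

/* Minimal C sketch of a pointer-based structure with internal nodes that
 * hold up to four children and leaves that carry payload data. Layout and
 * names are assumptions for illustration only. */
#include <stddef.h>

enum node_kind { INTERNAL, LEAF };

struct node {
    enum node_kind kind;
    union {
        struct node *child[4];   /* internal node: up to four children */
        double       value;      /* leaf node: payload */
    } u;
};

/* Software analogue of a traversal-and-select operation: visit every
 * reachable leaf. A hardware memory engine would perform the same chain
 * of pointer dereferences without involving the host processor. */
static void traverse(struct node *n, void (*visit)(double))
{
    if (n == NULL)
        return;
    if (n->kind == LEAF) {
        visit(n->u.value);
        return;
    }
    for (int i = 0; i < 4; ++i)
        traverse(n->u.child[i], visit);
}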
2011
Authors
Becker, J; Benoit, P; Cumplido, R; Prasanna, VK; Vaidyanathan, R; Hartenstein, R; Areibi, S; Bampi, S; Bergmann, N; Brebner, G; Buechner, T; Cadenas, O; Campi, F; Carro, L; Chen, N; Cheung, PYK; Dandekar, O; Diessel, O; Jean Philippe,; Diniz, P; Donlin, A; Elgindy, H; Fahmy, S; Glesner, M; Gogniat, G; Gu, Y; Guccione, S; Hariyama, M; Heinkel, U; Herkersdorf, A; Hochberger, C; Hollstein, T; Heubner, M; Jones, A; Katkoori, S; Koch, A; Kress, R; Krupnova, H; Lagadec, L; Lauwereins, R; Leong, P; Lysaght, P; Marnane, L; Mesquita, D; Moraes, F; Moreno, M; Morra, C; Morris, J; Mukherjee, A; Nakano, K; Nunez Yanez, J; Ors, B; Ou, J; Pardo, F; Parthasarathi, R; Patterson, C; Paulsson, K; Pionteck, T; Platzner, M; Pottier, B; Reis, R; Santambrogio, M; Sass, R; Sassatelli, G; Schaumont, P; Schmeck, H; Sezer, S; Smit, G; So, H; Sutter, G; Tanougast, C; Teich, J; Tessier, R; Thomas, D; Torres, L; Trahan, J; Torresen, J; Valderrama, C; Vanderbauwhede, W; Vasilko, M; Veale, B; Vorbach, M; Waldschmidt, K; Wehn, N;
Publication
IEEE International Symposium on Parallel and Distributed Processing Workshops and Phd Forum
Abstract
2011
Authors
Sato, M; Barthou, D; Diniz, PC; Saddayapan, P;
Publication
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Abstract
This topic deals with architecture design and compilation for high performance systems. The areas of interest range from microprocessors to large-scale parallel machines; from general-purpose platforms to specialized hardware; and from hardware design to compiler technology. On the compilation side, topics of interest include programmer productivity issues, concurrent and/or sequential language aspects, program analysis, program transformation, automatic discovery and/or management of parallelism at all levels, and the interaction between the compiler and the rest of the system. On the architecture side, the scope spans system architectures, processor micro-architecture, memory hierarchy, and multi-threading, and the impact of emerging trends. © 2011 Springer-Verlag.
2005
Authors
Baradaran, N; Diniz, PC;
Publication
Proceedings -Design, Automation and Test in Europe, DATE '05
Abstract
The aggressive application of scalar replacement to array references substantially reduces the number of memory operations at the expense of a possibly very large number of registers. In this paper we describe a register allocation algorithm that assigns registers to scalar replaced array references along the critical paths of a computation, in many cases exploiting the opportunity for concurrent memory accesses. Experimental results, for a set of image/signal processing code kernels, reveal that the proposed algorithm leads to a substantial reduction of the number of execution cycles for the corresponding hardware implementation on a contemporary Field-Programmable-Gate-Array (FPGA) when compared to other greedy allocation algorithms, in some cases using even fewer registers.
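For reference, the C fragment below is a generic illustration of scalar replacement, the transformation this allocation algorithm operates on: repeated array references in a loop are promoted to scalar temporaries that map to registers, trading memory operations for register pressure. The kernel and variable names are hypothetical and do not reproduce the paper's critical-path allocation algorithm.

/* Illustrative scalar replacement on a 3-tap filter: keeping a sliding
 * window of x[] in scalars removes two memory loads per iteration. */
#define N 1024

void fir3(const float x[N], const float c[3], float y[N])
{
    float x0 = x[0], x1 = x[1];          /* scalars standing in for registers */
    for (int i = 0; i + 2 < N; ++i) {
        float x2 = x[i + 2];             /* single new load per iteration */
        y[i] = c[0] * x0 + c[1] * x1 + c[2] * x2;
        x0 = x1;                         /* rotate the register window */
        x1 = x2;
    }
}

On an FPGA, the promoted scalars become independent hardware registers, which is one way the concurrent memory accesses mentioned in the abstract can be exposed.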
2005
Authors
Baradaran, N; Diniz, PC; Park, J;
Publication
Lecture Notes in Computer Science
Abstract
Scalar replacement or register promotion uses scalar variables to save data that can be reused across loop iterations, leading to a reduction of the number of memory operations at the expense of a possibly large number of registers. In this paper we present a compiler data reuse analysis capable of uncovering and exploiting reuse opportunities for array references that exhibit Multiple-Induction-Variable (MIV) subscripts, beyond the reach of current data reuse analysis techniques. We present experimental results of the application of scalar replacement to a sample set of kernel codes targeting a programmable hardware computing device - a Field-Programmable-Gate-Array (FPGA). The results show that, for memory bound designs, scalar replacement alone leads to speedups that range from 2x to 6x at the expense of an increase in the FPGA design area in the range of 6x to 20x. © Springer-Verlag Berlin Heidelberg 2005.
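A Multiple-Induction-Variable subscript is one that depends on more than one loop counter, e.g. a[i + j] in a doubly nested loop. The C sketch below shows such a reference and a hand-written buffered rewrite of the cross-iteration reuse this kind of analysis uncovers; the rewrite is an assumption for illustration, not the compiler's actual output.

/* MIV subscript: a[i + j] depends on both loop counters, so outer
 * iteration i+1 re-reads N-1 of the N values touched at iteration i. */
#define N 64

float diag_sum(const float a[2 * N])
{
    float s = 0.0f;
    for (int i = 0; i < N; ++i)
        for (int j = 0; j < N; ++j)
            s += a[i + j];               /* MIV subscript: i + j */
    return s;
}

/* Buffered rewrite: buf[] stands in for the registers an FPGA design would
 * allocate (hence the area increase reported above). After priming, only
 * one new element is loaded per outer iteration. Illustrative only. */
float diag_sum_replaced(const float a[2 * N])
{
    float buf[N];
    for (int j = 0; j < N; ++j)
        buf[j] = a[j];                   /* prime the reuse buffer */

    float s = 0.0f;
    for (int i = 0; i < N; ++i) {
        for (int j = 0; j < N; ++j)
            s += buf[j];
        for (int j = 0; j + 1 < N; ++j)  /* shift window: drop a[i] ...  */
            buf[j] = buf[j + 1];
        buf[N - 1] = a[i + N];           /* ... and bring in a[i + N]    */
    }
    return s;
}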