2010
Authors
Diniz, PC; Danelutto, M; Barthou, D; Gonzales, M; Hübner, M;
Publication
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Abstract
This topic deals with architecture design and compilation for high performance systems. The areas of interest range from microprocessors to large-scale parallel machines; from general-purpose platforms to specialized hardware (e.g., graphic coprocessors, low-power embedded systems); and from hardware design to compiler technology. On the compilation side, topics of interest include programmer productivity issues, concurrent and/or sequential language aspects, program analysis, transformation, automatic discovery and/or management of parallelism at all levels, and the interaction between the compiler and the rest of the system. On the architecture side, the scope spans system architectures, processor micro-architecture, memory hierarchy, and multi-threading, and the impact of emerging trends. All the papers submitted to this track highlight the growing significance of Chip Multi-Processors (CMP) and Simultaneous Multi-Threaded (SMT) processors in contemporary high-performance architectures. © 2010 Springer-Verlag.
2011
Authors
Park, J; Diniz, PC;
Publication
IEEE Design and Test of Computers
Abstract
The applicability of FPGAs as memory engines capable of sophisticated and customizable memory traversal, selection, and relocation operations common in scientific computations involving large, pointer-based data structures is explored. The experimental work illustrates that, despite their clock speed handicap, FPGAs can be successfully integrated with traditional architectures. The nonleaf nodes, or internal nodes, can have up to four children nodes, either internal nodes or simple leaf nodes. A hardware design supporting the execution of computations that traverse data structures must also support relocation. It is found that for string-pattern matching, the design aggressively exploits parallelism by concurrently testing the various patterns. For the sparse-mesh data structure, the implementation is completely memory bound despite a pipelining execution strategy.
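The pointer-based structures referred to here are trees whose internal nodes hold up to four children. The following minimal C sketch shows such a structure and a software traversal; the node layout, field names, and the traverse routine are illustrative assumptions, not the paper's hardware interface, which would stream the equivalent pointer dereferences through the FPGA memory engine.

/* Minimal C sketch of a pointer-based structure with internal nodes that
 * hold up to four children and leaves that carry payload data. Layout and
 * names are assumptions for illustration only. */
#include <stddef.h>

enum node_kind { INTERNAL, LEAF };

struct node {
    enum node_kind kind;
    union {
        struct node *child[4];   /* internal node: up to four children */
        double       value;      /* leaf node: payload */
    } u;
};

/* Software analogue of a traversal-and-select operation: visit every
 * reachable leaf. A hardware memory engine would perform the same chain
 * of pointer dereferences without involving the host processor. */
static void traverse(struct node *n, void (*visit)(double))
{
    if (n == NULL)
        return;
    if (n->kind == LEAF) {
        visit(n->u.value);
        return;
    }
    for (int i = 0; i < 4; ++i)
        traverse(n->u.child[i], visit);
}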
2011
Authors
Becker, J; Benoit, P; Cumplido, R; Prasanna, VK; Vaidyanathan, R; Hartenstein, R; Areibi, S; Bampi, S; Bergmann, N; Brebner, G; Buechner, T; Cadenas, O; Campi, F; Carro, L; Chen, N; Cheung, PYK; Dandekar, O; Diessel, O; Jean Philippe,; Diniz, P; Donlin, A; Elgindy, H; Fahmy, S; Glesner, M; Gogniat, G; Gu, Y; Guccione, S; Hariyama, M; Heinkel, U; Herkersdorf, A; Hochberger, C; Hollstein, T; Heubner, M; Jones, A; Katkoori, S; Koch, A; Kress, R; Krupnova, H; Lagadec, L; Lauwereins, R; Leong, P; Lysaght, P; Marnane, L; Mesquita, D; Moraes, F; Moreno, M; Morra, C; Morris, J; Mukherjee, A; Nakano, K; Nunez Yanez, J; Ors, B; Ou, J; Pardo, F; Parthasarathi, R; Patterson, C; Paulsson, K; Pionteck, T; Platzner, M; Pottier, B; Reis, R; Santambrogio, M; Sass, R; Sassatelli, G; Schaumont, P; Schmeck, H; Sezer, S; Smit, G; So, H; Sutter, G; Tanougast, C; Teich, J; Tessier, R; Thomas, D; Torres, L; Trahan, J; Torresen, J; Valderrama, C; Vanderbauwhede, W; Vasilko, M; Veale, B; Vorbach, M; Waldschmidt, K; Wehn, N;
Publication
IEEE International Symposium on Parallel and Distributed Processing Workshops and Phd Forum
Abstract
2011
Authors
Sato, M; Barthou, D; Diniz, PC; Saddayapan, P;
Publication
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Abstract
This topic deals with architecture design and compilation for high performance systems. The areas of interest range from microprocessors to large-scale parallel machines; from general-purpose platforms to specialized hardware; and from hardware design to compiler technology. On the compilation side, topics of interest include programmer productivity issues, concurrent and/or sequential language aspects, program analysis, program transformation, automatic discovery and/or management of parallelism at all levels, and the interaction between the compiler and the rest of the system. On the architecture side, the scope spans system architectures, processor micro-architecture, memory hierarchy, and multi-threading, and the impact of emerging trends. © 2011 Springer-Verlag.
2005
Authors
Baradaran, N; Diniz, PC;
Publication
Proceedings -Design, Automation and Test in Europe, DATE '05
Abstract
The aggressive application of scalar replacement to array references substantially reduces the number of memory operations at the expense of a possibly very large number of registers. In this paper we describe a register allocation algorithm that assigns registers to scalar replaced array references along the critical paths of a computation, in many cases exploiting the opportunity for concurrent memory accesses. Experimental results, for a set of image/signal processing code kernels, reveal that the proposed algorithm leads to a substantial reduction of the number of execution cycles for the corresponding hardware implementation on a contemporary Field-Programmable-Gate-Array (FPGA) when compared to other greedy allocation algorithms, in some cases using even fewer registers.
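For reference, the C fragment below is a generic illustration of scalar replacement, the transformation this allocation algorithm operates on: repeated array references in a loop are promoted to scalar temporaries that map to registers, trading memory operations for register pressure. The kernel and variable names are hypothetical and do not reproduce the paper's critical-path allocation algorithm.

/* Illustrative scalar replacement on a 3-tap filter: keeping a sliding
 * window of x[] in scalars removes two memory loads per iteration. */
#define N 1024

void fir3(const float x[N], const float c[3], float y[N])
{
    float x0 = x[0], x1 = x[1];          /* scalars standing in for registers */
    for (int i = 0; i + 2 < N; ++i) {
        float x2 = x[i + 2];             /* single new load per iteration */
        y[i] = c[0] * x0 + c[1] * x1 + c[2] * x2;
        x0 = x1;                         /* rotate the register window */
        x1 = x2;
    }
}

On an FPGA, the promoted scalars become independent hardware registers, which is one way the concurrent memory accesses mentioned in the abstract can be exposed.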
2005
Authors
Baradaran, N; Diniz, PC; Park, J;
Publication
Lecture Notes in Computer Science
Abstract
Scalar replacement or register promotion uses scalar variables to save data that can be reused across loop iterations, leading to a reduction of the number of memory operations at the expense of a possibly large number of registers. In this paper we present a compiler data reuse analysis capable of uncovering and exploiting reuse opportunities for array references that exhibit Multiple-Induction-Variable (MIV) subscripts, beyond the reach of current data reuse analysis techniques. We present experimental results of the application of scalar replacement to a sample set of kernel codes targeting a programmable hardware computing device - a Field-Programmable-Gate-Array (FPGA). The results show that, for memory bound designs, scalar replacement alone leads to speedups that range from 2x to 6x at the expense of an increase in the FPGA design area in the range of 6x to 20x. © Springer-Verlag Berlin Heidelberg 2005.
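A Multiple-Induction-Variable subscript is one that depends on more than one loop counter, e.g. a[i + j] in a doubly nested loop. The C sketch below shows such a reference and a hand-written buffered rewrite of the cross-iteration reuse this kind of analysis uncovers; the rewrite is an assumption for illustration, not the compiler's actual output.

/* MIV subscript: a[i + j] depends on both loop counters, so outer
 * iteration i+1 re-reads N-1 of the N values touched at iteration i. */
#define N 64

float diag_sum(const float a[2 * N])
{
    float s = 0.0f;
    for (int i = 0; i < N; ++i)
        for (int j = 0; j < N; ++j)
            s += a[i + j];               /* MIV subscript: i + j */
    return s;
}

/* Buffered rewrite: buf[] stands in for the registers an FPGA design would
 * allocate (hence the area increase reported above). After priming, only
 * one new element is loaded per outer iteration. Illustrative only. */
float diag_sum_replaced(const float a[2 * N])
{
    float buf[N];
    for (int j = 0; j < N; ++j)
        buf[j] = a[j];                   /* prime the reuse buffer */

    float s = 0.0f;
    for (int i = 0; i < N; ++i) {
        for (int j = 0; j < N; ++j)
            s += buf[j];
        for (int j = 0; j + 1 < N; ++j)  /* shift window: drop a[i] ...  */
            buf[j] = buf[j + 1];
        buf[N - 1] = a[i + N];           /* ... and bring in a[i + N]    */
    }
    return s;
}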