2003
Authors
Diniz, PC; Park, J;
Publication
IEEE Symposium on FPGAs for Custom Computing Machines, Proceedings
Abstract
FPGAs (field programmable gate arrays) have appealing features such as customizable internal and external bandwidth and the ability to exploit vast amounts of fine-grain parallelism. In this paper, we explore the applicability of these features in using FPGAs as smart memory engines for search and reorganization computations over spatial pointer-based data structures. The experimental results in this paper suggests that reconfigurable logic, when combined with the data reorganization, can lead to dramatic performance improvements of up to 20x over traditional computer architectures for pointer-based computations, traditionally not viewed as a good match for reconfigurable technologies. © 2003 IEEE.
2000
Authors
Diniz, P; Park, J;
Publication
IEEE Symposium on FPGAs for Custom Computing Machines, Proceedings
Abstract
Mapping computations written in high-level programming languages to FPGA-based computing engines requires programmers to create the datapath responsible for the core of the computation as well as the control structures to generate the appropriate signals to orchestrate its execution. This paper addresses the issue of automatic generation of data storage and control structures for FPGA-based reconfigurable computing engines using existing compiler data dependence analysis techniques. We describe a set of parameterizable data storage and control structures used as the target of our prototype compiler. We present a compiler analysis algorithm to derive the parameters of the data storage structures to minimize the required memory bandwidth of the implementation. We also describe a complete compilation scheme for mapping loops that manipulate multi-dimensional array variables to hardware. We present preliminary simulation results for complete designs generated manually using the results of the compiler analysis. These preliminary, results show that is possible to successfully integrate compiler data dependence analysis with existing commercial synthesis tools. © 2000 IEEE.
2003
Authors
Diniz, P; Hall, M; Park, J; So, B; Ziegler, H;
Publication
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Abstract
The DEFACTO project - a Design Environment For Adaptive Computing Technology - is a system that maps computations, expressed in high-level languages such as C, directly onto FPGA-based computing platforms. Major challenges are the inherent flexibility of FPGA hardware, capacity and timing constraints of the target FPGA devices, and accompanying speed-area trade-offs. To address these, DE-FACTO combines parallelizing compiler technology with behavioral VHDL synthesis tools, obtaining the complementary advantages of the compiler's high-level analyses and transformations and synthesis' binding, allocation and scheduling of low-level hardware resources. To guide the compiler in the search of a good solution, we introduce the notion of balance between the rates at which data is fetched from memory and accessed by the computation, combined with estimation from behavioral synthesis. Since FPGA-based designs offer the potential for optimizing memory-related operations, we have also incorporated the ability to exploit parallel memory accesses and customize memory access protocols into the compiler analysis. © Springer-Verlag Berlin Heidelberg 2003.
2003
Authors
Shesha Shayee, KR; Park, J; Diniz, PC;
Publication
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Abstract
Selecting which program transformations to apply when mapping computations to FPGA-based architectures leads to prohibitively long design exploration cycles. An alternative is to develop fast, yet accurate, performance and area models to understand the impact and interaction of the transformations. In this paper we present a combined analytical performance and area modeling for complete FPGA designs in the presence of loop transformations. Our approach takes into account the impact of input/output memory bandwidth and memory interface resources, often the limiting factor in the effective implementation of these computations. Our preliminary results reveal that our modeling is very accurate allowing a compiler tool to quickly explore a very large design space resulting in the selection of a feasible high-performance design. © Springer-Verlag Berlin Heidelberg 2003.
2003
Authors
Diniz, PC;
Publication
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Abstract
Performance understanding and prediction are extremely important goals for guiding the application of program optimizations or in helping programmers focus their efforts when tuning their applications. In this paper we survey current approaches in performance understanding and modeling for high-performance scientific applications. We also describe a performance modeling and prediction approach that relies on the synergistic collaboration of compiler analysis, compiler-generated instrumentation (to observe relevant run-time input values) and multi-model performance modeling. A compiler analyzes the source code to derive a discrete set of parameterizable performance models. The models use run-time data to define the values of their parameters. This approach, we believe, will allow for higher performance modeling accuracy and more importantly to more precise identification of what the causes of performance problems are. © Springer-Verlag Berlin Heidelberg 2003.
2003
Authors
Park, J; Diniz, PC;
Publication
IEEE Symposium on FPGAs for Custom Computing Machines, Proceedings
Abstract
As the densities of current FPGA continue to grow it is now possible to generate System-On-a-Chip (SoC) designs where multiple computing cores are connected to various memory modules with customized topology with application specific memory access patterns. For example, Xilinx has recently introduced devices to which a paired down version of a PowerPC core can be mapped and connected to a set of internal memories. In this paper we address the problem of synthesizing and estimating the area and speed of memory interfacing for Static RAM (SRAM) and Synchronous Dynamic RAM (SDRAM) with various latency parameters and access modes. We describe a set of synthesizable and programmable memory interfaces a compiler can use to automatically generate the appropriate designs for mapping computations to FPGA-based architectures. Our preliminary results reveal that it is possible to accurately model the area and timing requirements using a linear estimation function. We have successfully integrated the proposed memory interface designs with simple image processing kernels generated using commercially available behavioral synthesis tools. © 2003 IEEE.
The access to the final selection minute is only available to applicants.
Please check the confirmation e-mail of your application to obtain the access code.