2001
Authors
Park, J; Diniz, PC;
Publication
Proceedings of the International Symposium on System Synthesis
Abstract
Commercially available behavioral synthesis tools do not adequately support FPGA vendor-specific external memory interfaces, making it extremely difficult to exploit pipelined memory access modes and the application-specific scheduling of memory operations that is critical for high-performance solutions. This lack of support substantially increases the complexity and the burden on designers when mapping applications to FPGA-based computing engines. In this paper we address the problem of external memory interfacing and aggressive scheduling of memory operations by proposing a decoupled architecture with two components: one component captures the timing of the specific target architecture, while the other uses application-specific information about memory access patterns. Our results support the claim that it is possible to exploit application-specific information and integrate that knowledge into custom schedulers that mix pipelined and non-pipelined access modes to reduce the overhead associated with external memory accesses. The results also reveal that the additional design complexity of the scheduler, and its impact on the overall design, is minimal.
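A minimal sketch, not the paper's scheduler, of the kind of mixed-mode scheduling the abstract describes: runs of consecutive addresses are issued as pipelined bursts, everything else as stand-alone accesses. All names and latency figures are assumptions for illustration.

PIPELINE_SETUP = 4    # cycles to start a pipelined burst (assumed)
PIPELINED_BEAT = 1    # cycles per access once the pipeline is primed (assumed)
SINGLE_ACCESS = 5     # cycles for a stand-alone, non-pipelined access (assumed)

def schedule_accesses(addresses, min_burst=4):
    """Return (plan, total_cycles) mixing pipelined and non-pipelined accesses."""
    plan, cycles, i = [], 0, 0
    while i < len(addresses):
        j = i + 1
        while j < len(addresses) and addresses[j] == addresses[j - 1] + 1:
            j += 1                        # extend a run of consecutive addresses
        run = addresses[i:j]
        if len(run) >= min_burst:         # long enough to amortize pipeline setup
            plan.append(("pipelined", run))
            cycles += PIPELINE_SETUP + PIPELINED_BEAT * len(run)
        else:
            plan.extend(("single", [a]) for a in run)
            cycles += SINGLE_ACCESS * len(run)
        i = j
    return plan, cycles

print(schedule_accesses([0, 1, 2, 3, 4, 5, 40, 41, 100]))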
2002
Authors
Diniz, PC; Park, J;
Publication
ACM/SIGDA International Symposium on Field Programmable Gate Arrays - FPGA
Abstract
Field-Programmable Core Arrays (FPCAs) will include various computing cores for a wide variety of applications ranging from DSP to general-purpose computing. With the increasing gap between core computing speeds and memory access latency, managing and orchestrating the movement of data across multiple cores will become increasingly important. In this paper we propose data reorganization engines that allow a wide variety of data reorganizations, both within and across memory modules, for future FPCAs. We have experimented with a suite of data reorganizations pervasive in DSP applications. Our limited set of experiments reveals that the proposed designs for these engines are flexible and use little design area in current FPGA fabrics, making them easy to integrate into future FPCAs as either soft or hard macros.
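For illustration only, here are software forms of two reorganizations of the kind such engines would perform in hardware: an intra-module stride permutation (corner turn) and an inter-module round-robin distribution. Function names are hypothetical.

def stride_permute(data, stride):
    """Gather elements that are 'stride' apart so they become contiguous."""
    n = len(data)
    assert n % stride == 0
    return [data[c + r * stride] for c in range(stride) for r in range(n // stride)]

def interleave_across_modules(data, n_modules=2):
    """Distribute a buffer round-robin across n_modules memory modules."""
    return [data[m::n_modules] for m in range(n_modules)]

print(stride_permute(list(range(8)), 2))          # [0, 2, 4, 6, 1, 3, 5, 7]
print(interleave_across_modules(list(range(8))))  # [[0, 2, 4, 6], [1, 3, 5, 7]]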
2002
Authors
So, B; Hall, MW; Diniz, PC;
Publication
Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI)
Abstract
This paper describes an automated approach to hardware design space exploration, through a collaboration between parallelizing compiler technology and high-level synthesis tools. We present a compiler algorithm that automatically explores the large design spaces resulting from the application of several program transformations commonly used in application-specific hardware designs. Our approach uses synthesis estimation techniques to quantitatively evaluate alternate designs for a loop nest computation. We have implemented this design space exploration algorithm in the context of a compilation and synthesis system called DEFACTO, and present results of this implementation on five multimedia kernels. Our algorithm derives an implementation that closely matches the performance of the fastest design in the design space, and among implementations with comparable performance, selects the smallest design. We search on average only 0.3% of the design space. This technology thus significantly raises the level of abstraction for hardware design and explores a design space much larger than is feasible for a human designer.
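A minimal sketch of this style of design space exploration, not the DEFACTO algorithm itself: a walk over (unroll_i, unroll_j) design points for a doubly nested loop, with hypothetical area and latency estimators standing in for behavioral-synthesis estimation. The selection rule mirrors the abstract: best performance first, then smallest area among comparable designs.

import itertools

def estimate(ui, uj, trip_i=64, trip_j=64):
    area = 300 + 120 * ui * uj                    # assumed area model (slices)
    iterations = (trip_i // ui) * (trip_j // uj)
    latency = iterations * (8 + 2.0 / (ui * uj))  # assumed cycles per unrolled body
    return area, latency

def explore(factors=(1, 2, 4, 8), area_budget=10000):
    best = None
    for ui, uj in itertools.product(factors, repeat=2):
        area, latency = estimate(ui, uj)
        if area > area_budget:
            continue                              # prune infeasible design points
        key = (latency, area)                     # fastest first, then smallest
        if best is None or key < best[1]:
            best = ((ui, uj), key)
    return best

print(explore())  # winning design point and its (estimated latency, area)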
2007
Authors
Baradaran, N; Diniz, PC;
Publication
IET COMPUTERS AND DIGITAL TECHNIQUES
Abstract
Configurable architectures offer the unique opportunity of customising the storage allocation to meet specific applications' needs. A compiler approach to map the arrays of a loop-based computation to internal memories of a configurable architecture with the objective of minimising the overall execution time is described. An algorithm that considers the data access patterns of the arrays along the critical path of the computation as well as the available storage and memory bandwidth is presented. Experimental results demonstrate the application of this approach to a set of kernel codes when targeting a field-programmable gate array. The results reveal that the proposed algorithm outperforms the naive and custom data layout techniques by an average of 33% and 15% in terms of execution time, while taking into account the available hardware resources.
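As a rough illustration of the allocation problem (not the paper's algorithm), the sketch below greedily maps arrays to limited on-chip memory by ranking them by critical-path accesses per byte of storage; unmapped arrays stay in external memory. All names and numbers are hypothetical.

def map_arrays(arrays, capacity_bytes):
    """arrays: list of (name, size_bytes, critical_path_accesses) tuples."""
    ranked = sorted(arrays, key=lambda a: a[2] / a[1], reverse=True)
    on_chip, used = [], 0
    for name, size, accesses in ranked:
        if used + size <= capacity_bytes:   # fits in the remaining on-chip storage
            on_chip.append(name)
            used += size
    return on_chip                          # the rest remain in external memory

print(map_arrays([("A", 4096, 10000), ("B", 16384, 12000), ("C", 2048, 9000)], 8192))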
2008
Authors
Park, J; Diniz, PC;
Publication
INTERNATIONAL JOURNAL OF ELECTRONICS
Abstract
Automated compiler analyses and transformation techniques aim at improving the productivity of mapping applications expressed in high-level programming languages to FPGAs. These transformations allow a compiler tool to reduce the number of design cycles and eliminate tedious and error-prone low-level transformations required in this mapping process, while still leading to good designs. Scalar replacement, also known as register promotion, is a very important data-oriented transformation that leads to designs with fewer external memory accesses, and thus shorter execution times, at the expense of storage resources. In this article we present a combination of loop transformation techniques, namely loop unrolling, loop splitting, and loop interchange, with scalar replacement to enable partial data reuse on computations expressed by the tightly nested loops pervasive in image processing algorithms. We describe a performance model for designs in the presence of partial data reuse. Our experimental results reveal that our model accurately captures the non-trivial execution effects of pipelined implementations in the presence of partial data reuse due to the need to fill up data buffers. The model thus allows a compiler to explore a large design space with high accuracy, ultimately allowing it to quickly find better designs than limited manual search or brute-force approaches would.
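A minimal sketch of scalar replacement on a 3-point stencil, purely to illustrate the transformation the abstract builds on (this is not the article's implementation). The rotating temporaries play the role of on-chip registers; the counters model external memory reads.

def stencil_naive(a):
    reads, out = 0, []
    for i in range(1, len(a) - 1):
        reads += 3                        # a[i-1], a[i], a[i+1] fetched every iteration
        out.append(a[i - 1] + a[i] + a[i + 1])
    return out, reads

def stencil_scalar_replaced(a):
    reads = 2
    r0, r1 = a[0], a[1]                   # preload the reuse window
    out = []
    for i in range(1, len(a) - 1):
        r2 = a[i + 1]; reads += 1         # only one new element per iteration
        out.append(r0 + r1 + r2)
        r0, r1 = r1, r2                   # rotate the window instead of re-reading
    return out, reads

a = list(range(10))
assert stencil_naive(a)[0] == stencil_scalar_replaced(a)[0]
print(stencil_naive(a)[1], stencil_scalar_replaced(a)[1])  # 24 vs. 10 external reads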
2008
Authors
Diniz, PC; Ferreira, DR;
Publication
BUSINESS PROCESS MANAGEMENT
Abstract
Many end users will expect the output of process mining to be a model they can easily understand. On the other hand, knowing which objects were accessed in each operation can be a valuable input for process discovery. From these two trends it is possible to establish an analogy between process mining and the discovery of program structure. In this paper we present an approach for extracting process control-flow from a trace of read and write operations over a set of objects. The approach is divided into two independent phases. In the first phase, Fourier analysis is used to identify periodic behavior that can be represented with loop constructs. In the second phase, a match-and-merge technique is used to produce a control-flow graph capable of generating the input trace and thus representing the process that generated it. The combination of these techniques provides a structured and compact representation of the unknown process, with very good results in terms of conformance metrics.
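A minimal sketch of the idea behind the first phase, assuming a symbolic trace of read/write events (this is an illustration, not the paper's algorithm): the discrete Fourier transform exposes the dominant period in the trace, which is a candidate loop body length.

import numpy as np

def dominant_period(trace):
    """trace: sequence of hashable events, e.g. ('read', 'A'), ('write', 'B'), ..."""
    symbols = sorted(set(trace))
    signal = np.array([symbols.index(e) for e in trace], dtype=float)
    signal -= signal.mean()               # remove the DC component
    spectrum = np.abs(np.fft.rfft(signal))
    k = int(np.argmax(spectrum[1:])) + 1  # strongest non-zero frequency bin
    return round(len(trace) / k)          # its period, in trace positions

trace = [("read", "A"), ("write", "B"), ("read", "C")] * 6
print(dominant_period(trace))             # 3: the length of the candidate loop body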