Cookies Policy
The website need some cookies and similar means to function. If you permit us, we will use those means to collect data on your visits for aggregated statistics to improve our service. Find out More
Accept Reject
  • Menu
Publications

Publications by Pedro Diniz

2008

A Compiler Approach to Managing Storage and Memory Bandwidth in Configurable Architectures

Authors
Baradaran, N; Diniz, PC;

Publication
ACM TRANSACTIONS ON DESIGN AUTOMATION OF ELECTRONIC SYSTEMS

Abstract
Configurable architectures offer the unique opportunity of realizing hardware designs tailored to the specific data and computational patterns of an application code. Customizing the storage structures is becoming increasingly important in mitigating the continuing gap between memory latencies and internal computing speeds. In this article we describe and evaluate a compiler algorithm that maps the arrays of a loop-based computation to internal storage structures, either RAM blocks or discrete registers. Our objective is to minimize the overall execution time while considering the capacity and bandwidth constraints of the storage resources. The novelty of our approach lies in creating a single framework that combines high-level compiler techniques with lower-level scheduling information for mapping the data. We illustrate the benefits of our approach for a set of image/signal processing kernels using a Xilinx VirtexTM Field-Programmable Gate Array (FPGA). Our algorithm leads to faster designs compared to the state-of-the-art custom data layout mapping technique, in some instances using less storage. When compared to hand-coded designs, our results are comparable in terms of execution time and resources, but are derived in a minute fraction of the design time.

2009

Guest Editorial

Authors
Diniz, PC;

Publication
Concurrency and Computation: Practice and Experience - Concurrency Computat.: Pract. Exper.

Abstract

2009

Special Issue Compilers for Parallel Computers 2007 Workshop (CPC 2007)

Authors
Diniz, PC;

Publication
CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE

Abstract

2011

Domain-Specific Optimization of Signal Recognition Targeting FPGAs

Authors
Demertzi, M; Diniz, PC; Hall, MW; Gilbert, AC; Wang, Y;

Publication
ACM TRANSACTIONS ON RECONFIGURABLE TECHNOLOGY AND SYSTEMS

Abstract
Domain-specific optimizations on matrix computations exploiting specific arithmetic and matrix representation formats have achieved significant performance/area gains in Field-Programmable Gate Array (FPGA) hardware designs. In this article, we explore the application of data-driven optimizations to reduce both storage and computation requirements to the problem of signal recognition from a known dictionary. By starting with a high-level mathematical representation of a signal recognition problem, we perform optimizations across the layers of the system, exploiting mathematical structure to improve implementation efficiency. Specifically, we use Walsh wavelet packets in conjunction with a BestBasis algorithm to distinguish between spoken digits. The resulting transform matrices are quite sparse, and exhibit a rich algebraic structure that contains significant overlap across rows. As a consequence, dot-product computations of the transform matrix and signal vectors exhibit significant computation reuse, or repeated identical computations. We present an algorithm for identifying this computation reuse and scheduling of the row computations. We exploit this reuse to derive FPGA hardware implementations that reduce the amount of computation for an individual matrix by as much as 6.35x and an average of 2x for a single dot-product unit. The implementation that exploits reuse achieves a 2x computation reduction compared to three concurrently-executing simpler accumulator units with the same aggregate design area and outperforms software implementations on high-end desktop personal computers.

2012

Resource-Efficient Designs using an Aspect-Oriented Approach

Authors
Coutinho, JGF; Bhattacharya, S; Luk, W; Constantinides, GA; Cardoso, JMP; Carvalho, T; Diniz, PC; Petrov, Z;

Publication
15TH IEEE INTERNATIONAL CONFERENCE ON COMPUTATIONAL SCIENCE AND ENGINEERING (CSE 2012) / 10TH IEEE/IFIP INTERNATIONAL CONFERENCE ON EMBEDDED AND UBIQUITOUS COMPUTING (EUC 2012)

Abstract
The increasing capability and flexibility of reconfigurable hardware, such as Field-Programmable Gate Arrays (FPGAs), give developers a wide range of architectural choices that can satisfy various non-functional requirements, such as those involving performance, resource and energy efficiency. This paper describes a novel approach, based on an aspect-oriented language called LARA, that enables systematic coding and reuse of optimisation strategies that address such non-functional requirements. Our approach will be presented in three steps. First, this approach is shown to support design space exploration (DSE) which makes use of various compilation and optimisation tools, through the deployment of a master weaver and multiple slave weavers. Second, we present three compilation and synthesis strategies for word-length optimisation based on this approach, which involve three tools: the WLOT word-length optimiser deploying a combination of analytical methods; the AutoESL tool compiling C-based descriptions into hardware; and the ISE tool targeting Xilinx devices. Third, the effectiveness of the approach is evaluated. In addition to promoting design re-use, our approach can be used to automatically produce a range of designs with different trade-offs in resource usage and numerical accuracy according to a given LARA-based strategy. For example, one implementation for a subband filter in an MPEG encoder provides 31% savings in area using non-uniform quantizers when compared to a floating-point description with a similar error specification at the output. Another fixed-point implementation for the gridIterate kernel used by a 3D path planning application consumed 25% less resources when the error specification is increased from 1e-6 to 1e-4.

2012

Controlling Hardware Synthesis with Aspects

Authors
Cardoso, JMP; Carvalho, T; Coutinho, JGF; Diniz, PC; Petrov, Z; Luk, W;

Publication
15th Euromicro Conference on Digital System Design, DSD 2012, Cesme, Izmir, Turkey, September 5-8, 2012

Abstract
The synthesis and mapping of applications to configurable embedded systems is a notoriously hard process. Tools have a wide range of parameters, which interact in very unpredictable ways, thus creating a large and complex design space. When exploring this space, designers must understand the interfaces to the various tools and apply, often manually, a sequence of tool-specific transformations making this an extremely cumbersome and error-prone process. This paper describes the use of aspect-oriented techniques for capturing synthesis strategies for tuning the performance of applications' kernels. We illustrate the use of this approach when designing application-specific architectures generated by a high-level synthesis tool. The results highlight the impact of the various strategies when targeting custom hardware and expose the difficulties in devising these strategies. © 2012 IEEE.

  • 20
  • 24