Publications

Publications by João Bispo

2012

Hardware pipelining of runtime-detected loops

Authors
Bispo, J; Cardoso, JMP; Monteiro, J;

Publication
25th Symposium on Integrated Circuits and Systems Design, SBCCI 2012, Brasilia, Brazil, August 30 - September 2, 2012

Abstract
Dynamic partitioning is a promising technique where computations are transparently moved from a General Purpose Processor (GPP) to a coprocessor during application execution. To be effective, the mapping of computations to the coprocessor needs to consider aggressive optimizations. One of the mapping optimizations is loop pipelining, a technique extensively studied and known to allow substantial performance improvements. This paper describes a technique for pipelining Megablocks, a type of runtime loop developed for dynamic partitioning. The technique transforms the body of Megablocks into an acyclic dataflow graph which can be fully pipelined and is based on the atomic execution of loop iterations. For a set of 9 benchmarks without memory operations, we generated pipelined hardware versions of the loops and estimate that the presented loop pipelining technique increases the average speedup of non-pipelined coprocessor accelerated designs from 1.6x to 2.2x. For a larger set of 61 benchmarks which include memory operations, the technique achieves a speedup increase from 2.5x to 5.6x. © 2012 IEEE.

2011

Techniques for Dynamically Mapping Computations to Coprocessors

Authors
Bispo, J; Cardoso, JMP;

Publication
2011 International Conference on Reconfigurable Computing and FPGAs, ReConFig 2011, Cancun, Mexico, November 30 - December 2, 2011

Abstract
In embedded reconfigurable computing systems, general purpose processors (GPPs) are typically extended with coprocessors to meet specific goals, such as higher performance and/or energy savings. Coprocessors can range from specialized modules which execute a specific task to reconfigurable arrays of ALUs. This paper presents our ongoing work on techniques to dynamically offload computations being executed by a GPP to a coprocessor. We present our method for identifying repetitive instruction traces, named Megablocks, as well as transformations which can be applied to those Megablocks. We also present a proof-of-concept implementation of a system which transparently moves computations from a GPP to a Specialized Reconfigurable Array (SRA). Finally, we present our current and planned work. © 2011 IEEE.

2010

On Identifying Segments of Traces for Dynamic Compilation

Authors
Bispo, J; Cardoso, JMP;

Publication
International Conference on Field Programmable Logic and Applications, FPL 2010, August 31 - September 2, 2010, Milano, Italy

Abstract
Typical computing systems based on general purpose processors (GPPs) are extended with coarse-grained reconfigurable arrays (CGRAs) to provide higher performance and/or energy savings. In order for applications to take advantage of these computing systems, efficient dynamic mapping techniques are required. Those dynamic mapping techniques will be responsible for automatically moving computations originally running in the GPP to the CGRA. The concept of dynamic compilation, widespread in the context of JIT compilation to GPPs, is receiving more attention from the reconfigurable computing community. This paper presents our approach to dynamically map computations to CGRAs coupled to a GPP. Specifically, we present the identification of large sequences of instructions, MegaBlocks, being executed in a GPP. These MegaBlocks are then mapped to the target CGRA. We evaluate the potential of the MegaBlocks over Basic Blocks and SuperBlocks to increase the IPC when targeting a CGRA and considering the execution of a number of representative benchmarks. © 2010 IEEE.
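
Illustrative note: the abstract above does not say how MegaBlocks are detected, so the following is only a minimal sketch of the general idea of finding a repeating sequence in an execution trace. The function name `find_repeating_pattern`, its parameters, and the representation of the trace as a list of instruction addresses are all assumptions made for illustration; this is not the authors' algorithm.

```python
def find_repeating_pattern(trace, max_len=32, min_repeats=3):
    """Find the first back-to-back repeating sequence in a trace.

    `trace` is a list of instruction addresses (a hypothetical format).
    Returns (start_index, pattern, repeats), or None if nothing repeats
    at least `min_repeats` times.
    """
    n = len(trace)
    for start in range(n):
        for length in range(1, max_len + 1):
            # Not enough trace left to hold min_repeats copies.
            if start + length * min_repeats > n:
                break
            pattern = trace[start:start + length]
            repeats = 1
            pos = start + length
            # Count consecutive copies of the candidate pattern.
            while trace[pos:pos + length] == pattern:
                repeats += 1
                pos += length
            if repeats >= min_repeats:
                return start, pattern, repeats
    return None


# A short prologue, a 3-instruction loop body executed 3 times, then an exit.
trace = [0x00, 0x04, 0x10, 0x14, 0x18, 0x10, 0x14, 0x18, 0x10, 0x14, 0x18, 0x20]
result = find_repeating_pattern(trace)
```

This brute-force scan works offline on a recorded trace; a runtime detector would have to process instructions as they execute, which this sketch does not attempt.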

2010

On identifying and optimizing instruction sequences for dynamic compilation

Authors
Bispo, J; Cardoso, JMP;

Publication
Proceedings of the International Conference on Field-Programmable Technology, FPT 2010, 8-10 December 2010, Tsinghua University, Beijing, China

Abstract
Typical computing systems based on general purpose processors (GPPs) can be extended with coarse-grained reconfigurable arrays (CGRAs) to provide higher performance and/or energy savings. In order for applications to take advantage of these computing systems, possibly including CGRAs varying in size, efficient dynamic compilation/mapping techniques are required. Dynamic mapping will be responsible for automatically moving computations originally running in the GPP to the CGRA. This paper presents our approach to dynamically map computations to CGRAs coupled to a GPP. Specifically, we evaluate the potential of the MegaBlock to accelerate the execution of a number of representative benchmarks when targeting an architecture based on a GPP and a CGRA. In addition, we show the impact on performance when using constant folding and propagation optimizations. © 2010 IEEE.
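
Illustrative note: the abstract above mentions constant folding and propagation without describing an implementation. The sketch below shows the textbook form of both optimizations on a straight-line instruction sequence. The instruction format `(dest, op, a, b)`, the register names, and the function `fold_constants` are hypothetical and are not taken from the paper.

```python
import operator

# Supported operations in this hypothetical three-address format.
OPS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}


def fold_constants(instructions):
    """Fold and propagate constants through a straight-line sequence.

    Each instruction is (dest, op, a, b); operands are register names
    (str) or integer literals. Returns (remaining, known), where
    `remaining` holds the instructions that still need to execute and
    `known` maps registers to constants computed ahead of time.
    """
    known = {}
    remaining = []
    for dest, op, a, b in instructions:
        # Propagation: replace registers whose value is already known.
        if isinstance(a, str):
            a = known.get(a, a)
        if isinstance(b, str):
            b = known.get(b, b)
        if isinstance(a, int) and isinstance(b, int):
            # Folding: both operands are constants, so compute the result now.
            known[dest] = OPS[op](a, b)
        else:
            # dest is overwritten with a value that is not known ahead of time.
            known.pop(dest, None)
            remaining.append((dest, op, a, b))
    return remaining, known


program = [
    ("r1", "add", 2, 3),
    ("r2", "mul", "r1", 4),
    ("r3", "add", "r0", "r2"),
    ("r1", "sub", "r3", 1),
]
remaining, known = fold_constants(program)
```

Registers left in `known` would still have to be written back if they are live after the sequence; the sketch leaves that step to the caller.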

2008

Combining Rewriting-Logic, Architecture Generation, and Simulation to Exploit Coarse-Grained Reconfigurable Architectures

Authors
Morra, C; Bispo, J; Cardoso, JMP; Becker, J;

Publication
Proceedings of the Sixteenth IEEE Symposium on Field-Programmable Custom Computing Machines

Abstract

2008

Retargeting, evaluating, and generating reconfigurable array-based architectures

Authors
Morra, C; Cardoso, JMP; Bispo, J; Becker, J;

Publication
2008 Symposium on Application Specific Processors

Abstract
Coarse-grained reconfigurable architectures have proven their value as programmable accelerators for general purpose processors. For early evaluation of those architectures, we need an approach able to exploit and retarget different Processing Elements (PEs) while maintaining the same compilation flow. Bearing in mind those aspects, this paper describes an approach able to map, evaluate and generate reconfigurable architectures based on an array of PEs. We use Rewriting Logic to map computations described by imperative programming languages to the PEs of the target architecture, a VHDL generation step to prototype the architectures being evaluated, and a clock cycle-based simulator to obtain first assessments of the performance of those architectures. In order to show the potential of our approach, we present results of 1-D coarse-grained reconfigurable arrays as accelerator softcores implemented in an FPGA, and the effects of different PE structures and complexities.
