Publications

Publications by HumanISE

2013

An FPGA-based multi-core approach for pipelining computing stages

Authors
Azarian, A; Cardoso, JMP; Werner, S; Becker, J;

Publication
Proceedings of the ACM Symposium on Applied Computing

Abstract
In recent years, there has been increasing interest on using task-level pipelining to accelerate the overall execution of applications mainly consisting of Producer-Consumer tasks. This paper proposes an approach to achieve pipelining execution of Producer-Consumer pairs of tasks in FPGA-based multi-core architectures. Our approach is able to speedup the overall execution of successive, data-dependent tasks, by using multiple cores and specific customization features provided by FPGAs. An important component of our approach is the use of customized inter-stage buffer schemes to communicate data and to synchronize the cores associated to the Producer-Consumer tasks. In order to improve performance, we propose a technique to optimize out-of-order Producer-Consumer pairs where the consumer uses more than once each data element produced, a behavior present in many applications (e.g., in image processing). All the schemes and optimizations proposed in this paper were evaluated with FPGA implementations. The experimental results show the feasibility of the approach in both in-order and out-of-order Producer-Consumer tasks. Furthermore, the results using our approach to task-level pipelining and a multi-core architecture reveal noticeable performance improvements for a number of benchmarks over a single core implementation without using task-level pipelining. Copyright 2013 ACM.

CloseRead Abstract

2013

The MATISSE MATLAB Compiler

Authors
Bispo, J; Pinto, P; Nobre, R; Carvalho, T; Cardoso, JMP; Diniz, PC;

Publication
2013 11TH IEEE INTERNATIONAL CONFERENCE ON INDUSTRIAL INFORMATICS (INDIN)

Abstract
This paper describes MATISSE, a MATLAB to C compiler targeting embedded systems that is based on Strategic and Aspect-Oriented Programming concepts. MATISSE takes as input: (1) MATLAB code and (2) LARA aspects related to types and shapes, code insertion/ removal, and specialization based directives defining default variable values. In this paper we also illustrate the use of MATISSE in leveraging data types and shapes to generate customized C code suitable for high-level hardware synthesis tools. The preliminary experimental results presented here reveal the described approach to yield performance results for the resulting hardware and software references implementations that are comparable in terms of performance with hand-crafted solutions but derived automatically at a fraction of the cost.

CloseRead Abstract

2013

Controlling a complete hardware synthesis toolchain with LARA aspects

Authors
Cardoso, JMP; Carvalho, T; Coutinho, JGF; Nobre, R; Nane, R; Diniz, PC; Petrov, Z; Luk, W; Bertels, K;

Publication
MICROPROCESSORS AND MICROSYSTEMS

Abstract
The synthesis and mapping of applications to configurable embedded systems is a notoriously complex process. Design-flows typically include tools that have a wide range of parameters which interact in very unpredictable ways, thus creating a large and complex design space. When exploring this space, designers must manage the interfaces between different tools and apply, often manually, a sequence of tool-specific transformations making design exploration extremely cumbersome and error-prone. This paper describes the use of techniques inspired by aspect-oriented technology and scripting languages for defining and exploring hardware compilation strategies. In particular, our approach allows developers to control all stages of a hardware/software compilation and synthesis toolchain: from code transformations and compiler optimizations to placement and routing for tuning the performance of application kernels. Our approach takes advantage of an integrated framework which provides a transparent and unified view over toolchains, their data output and the control of their execution. We illustrate the use of our approach when designing application-specific hardware architectures generated by a toolchain composed of high-level source-code transformation and synthesis tools. The results show the impact of various strategies when targeting custom hardware and expose the complexities in devising these strategies, hence highlighting the productivity benefits of this approach.

CloseRead Abstract

2013

Transparent runtime migration of loop-based traces of processor instructions to reconfigurable processing units

Authors
Bispo, J; Paulino, N; Cardoso, JMP; Ferreira, JC;

Publication
International Journal of Reconfigurable Computing

Abstract
The ability to map instructions running in a microprocessor to a reconfigurable processing unit (RPU), acting as a coprocessor, enables the runtime acceleration of applications and ensures code and possibly performance portability. In this work, we focus on the mapping of loop-based instruction traces (called Megablocks) to RPUs. The proposed approach considers offline partitioning and mapping stages without ignoring their future runtime applicability. We present a toolchain that automatically extracts specific trace-based loops, called Megablocks, from MicroBlaze instruction traces and generates an RPU for executing those loops. Our hardware infrastructure is able to move loop execution from the microprocessor to the RPU transparently, at runtime, and without changing the executable binaries. The toolchain and the system are fully operational. Three FPGA implementations of the system, differing in the hardware interfaces used, were tested and evaluated with a set of 15 application kernels. Speedups ranging from 1.26 × to 3.69 × were achieved for the best alternative using a MicroBlaze processor with local memory. © 2013 João Bispo et al.

CloseRead Abstract

2013

Transparent Trace-Based Binary Acceleration for Reconfigurable HW/SW Systems

Authors
Bispo, J; Paulino, N; Cardoso, JMP; Ferreira, JC;

Publication
IEEE TRANSACTIONS ON INDUSTRIAL INFORMATICS

Abstract
This paper presents a novel approach to accelerate program execution by mapping repetitive traces of executed instructions, called Megablocks, to a runtime reconfigurable array of functional units. An offline tool suite extracts Megablocks from microprocessor instruction traces and generates a Reconfigurable Processing Unit (RPU) tailored for the execution of those Megablocks. The system is able to transparently movebcomputations from the microprocessor to the RPU at runtime. A prototype implementation of the system using a cacheless MicroBlaze microprocessor running code located in external memory reaches speedups from 2.2x to 18.2x for a set of 14 benchmark kernels. For a system setup which maximizes microprocessor performance by having the application code located in internal block RAMs, speedups from 1.4x to 2.8x were estimated.

CloseRead Abstract

2013

The REFLECT design-flow

Authors
Cardoso, JMP; De F. Coutinho, JG; Nane, R; Sima, VM; Olivier, B; Carvalho, T; Nobre, R; Diniz, PC; Petrov, Z; Bertels, K; Gonçalves, F; Van Someren, H; Hübner, M; Constantinides, G; Luk, W; Becker, J; Krátký, K; Bhattacharya, S; Alves, JC; Ferreira, JC;

Publication
Compilation and Synthesis for Embedded Reconfigurable Systems: An Aspect-Oriented Approach

Abstract
This chapter describes the design-flow approach developed in the REFLECT project as presented originally in [1]. Over the course of the project, this design-flow has evolved and has been extended into a fully operational toolchain. We begin by presenting an overview of the underlying aspect-oriented compilation flow followed by an extended description of the design-flow and its toolchain. © Springer Science+Business Media New York 2013. All rights are reserved.

CloseRead Abstract