Publications

Publications by João Paiva Cardoso

2011

Reconfigurable Computing

Authors
Cardoso, JMP; Hübner, M;

Publication

Abstract

2001

Compilation increasing the scheduling scope for multi-memory-FPGA-based custom computing machines

Authors
Cardoso, JMP; Neto, HC;

Publication
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

Abstract
This paper presents new achievements on the automatic mapping of abstract algorithms, written in imperative software programming languages, to custom computing machines. The reconfigurable hardware element of the target architecture consists of one field-programmable gate array coupled with one or more memories. The compilation flow exposes operation- and functional-level parallelism, and speculative execution. Such expositions are efficiently represented in a hierarchical model. In order to take full advantage of such representation, the scheduling scope is significantly improved by merging basic blocks at loop boundaries and by considering the parallel execution of exposed concurrent loops. The paper describes the scheduling technique, shows a study on the impact of the merge operation, and reveals the improvements achieved when the exposed parallelism is fully satisfied. © Springer-Verlag Berlin Heidelberg 2001.

CloseRead Abstract

2009

On Simplifying Placement and Routing by Extending Coarse-Grained Reconfigurable Arrays with Omega Networks

Authors
Ferreira, R; Damiany, A; Vendramini, J; Teixeira, T; Cardoso, JMP;

Publication
RECONFIGURABLE COMPUTING: ARCHITECTURES, TOOLS AND APPLICATIONS

Abstract
Most reconfigurable computing architectures suffer from computationally demanding Placement and Routing (P&R) steps which might hamper their use in contexts requiring dynamic compilation (e.g., to guarantee application portability in embedded systems). Bearing in mind the simplification of P&R steps, this paper presents and analyzes a coarse-grained reconfigurable array extended with global Omega Networks. We show that integrating one or two Omega Networks in a coarse-grained array simplifies the P&R stage with both low hardware resource overhead and low performance degradation (18% for an 8 x 8 array). The experimental results included permit to compare the coarse-grained array with one or two Omega Networks with a coarse-grained array based on a grid of processing elements with neighbor connections. When comparing the execution time to perform the P&R stage needed for the two arrays, we show that the array using two Omega Networks needs a far simple P&R which for the benchmarks used completed on average in about 20x less time.

CloseRead Abstract

2011

Fast placement and routing by extending coarse-grained reconfigurable arrays with Omega Networks

Authors
Ferreira, RS; Cardoso, JMP; Damiany, A; Vendramini, J; Teixeira, T;

Publication
JOURNAL OF SYSTEMS ARCHITECTURE

Abstract
Reconfigurable computing architectures are commonly used for accelerating applications and/or for achieving energy savings. However, most reconfigurable computing architectures suffer from computationally demanding placement and routing (P&R) steps. This problem may disable their use in systems requiring dynamic compilation (e.g., to guarantee application portability in embedded systems). Bearing in mind the simplification of P&R steps, this paper presents and analyzes a coarse-grained reconfigurable array (CGRA) extended with global multistage interconnect networks, specifically Omega Networks. We show that integrating one or two Omega Networks in a CGRA permits to simplify the P&R stage resulting in both low hardware resource overhead and low performance degradation (18% for an 8 x 8 array). We compare the proposed CGRA, which integrates one or two Omega Networks, with a CGRA based on a grid of processing elements with reach neighbor interconnections and with a torus topology. The execution time needed to perform the P&R stage for the two array architectures shows that the array using two Omega Networks needs a far simpler and faster P&R. The P&R stage in our approach completed on average in about 16 x less time for the 17 benchmarks used. Similar fast approaches needed CGRAs with more complex interconnect resources in order to allow most of the benchmarks used to be successfully placed and routed.

CloseRead Abstract

2011

Selected papers from the 17th reconfigurable architectures workshop (RAW2010)

Authors
Dasu, A; Cardoso, JMP; Bozorgzadeh, E; Becker, J;

Publication
International Journal of Reconfigurable Computing

Abstract

2012

Experiments with the LARA aspect-oriented approach

Authors
Coutinho, JGF; Carvalho, T; Durand, S; Cardoso, JMP; Nobre, R; Diniz, PC; Luk, W;

Publication
AOSD'12 Companion - Proceedings of the 11th Annual International Conference on Aspect Oriented Software Development

Abstract
This demonstration presents a novel design-flow and aspect-oriented language called LARA [1], which is currently used to guide the mapping of high-level C application codes to heterogeneous high-performance embedded systems. In particular, LARA is capable of capturing complex strategies and schemes involving: hardware/software partitioning, code specialization, source code transformations and code instrumentation. A key element of LARA, and a distinguishing feature from existing approaches, is its ability to support the specification of non-functional requirements and user knowledge in a non-invasive way in the exploration of suitable implementations. The design-flow incorporates several tools, such as a LARA frontend, a hardware/software partitioning tool, an aspect weaver, cost estimator, and a source-level transformation engine. All these components can be coordinated as part of an elaborate application mapping strategy using LARA. In this demonstration, we illustrate how non-functional cross-cutting concerns such as runtime monitorization and performance are codified and described in LARA and how the weaving process affects selected applications. Furthermore, we also explain how third-party tools, such as gprof, can be incorporated into the design-flow and aspect description, for instance, to affect the hardware/software partitioning process. We demonstrate how LARA can be used to extract run-time information, such as range values of variables, and can control code transformations and compiler optimizations addressing customized implementations of the corresponding computations on FPGAs. © 2012 ACM.

CloseRead Abstract