Publications

Publications by João Paiva Cardoso

2006

A benchmark approach for compilers in reconfigurable hardware

Authors
Lopes, JJ; Silva, JLE; Marques, E; Cardoso, JMP;

Publication
6TH INTERNATIONAL WORKSHOP ON SYSTEM-ON-CHIP FOR REAL-TIME APPLICATIONS, PROCEEDINGS

Abstract
High-performance FPGA accelerating software applications are a growing demand in fields as communications, image processing, and scientific computing among others. Moreover, as the cost per gate of FPGAs declines, embedded and high-performance systems designers are being presented with new opportunities for creating accelerated software applications using FPGA-based programmable hardware platforms. Powerful high-level language to RTL generators are now emerging. One of the promises of these tools is to allow software and systems engineers to implement algorithms quickly in a familiar language and target the design to a programmable device. The generators available today support syntaxes with different degrees of fidelity to the original language. This paper focuses on the efficient use of C to RTL generators that have a high degree of fidelity to the original C language. The objective of this project is to study some tools that starting from languages of high level as ANSI-C, and generate FPGA accelerating software applications automatically. In this paper are presented tools and partial results of the hardware generated by them.

CloseRead Abstract

2007

A polynomial placement algorithm for data driven coarse-grained reconfigurable architectures

Authors
Ferreira, R; Garcia, A; Teixeira, T; Cardoso, JMP;

Publication
IEEE COMPUTER SOCIETY ANNUAL SYMPOSIUM ON VLSI, PROCEEDINGS: EMERGING VLSI TECHNOLOGIES AND ARCHITECTURES

Abstract
Coarse-grained reconfigurable computing architectures vary widely in the number and characteristics of the processing elements (cells) and routing topologies used. In order to exploit several different topologies, a place and route framework, able to deal with such vast design exploration space, is of paramount importance. Bearing this in mind, this paper proposes a placement scheme able to target different topologies when considering data-driven reconfigurable architectures. Our approach uses graph models for the target architecture and for the dataflow representation of the application being mapped. Our placement algorithm is guided by a Depth-First Traversal in both the architecture and the application graphs. Two versions of the placement algorithm with respectively O(e) and O(e + n(3)) computational complexities are presented, where e is the number of edges in the dataflow representation of the application and n is the number of cells in the graph model of the architecture. The achieved experimental results show that our approach can be useful to exploit different interconnect topologies as far as coarse-grained reconfigurable computing architectures are concerned.

CloseRead Abstract

2007

Using Rewriting Logic to Match Patterns of Instructions from a Compiler Intermediate Form to Coarse-Grained Processing Elements

Authors
Morra, C; Cardoso, JMP; Becker, J;

Publication
21th International Parallel and Distributed Processing Symposium (IPDPS 2007), Proceedings, 26-30 March 2007, Long Beach, California, USA

Abstract
This paper presents a new and retargetable method to identify patterns of instructions with direct support in coarsegrained processing elements (PEs). The method uses a three-address code SSA (static single assignment) representation of the kernel being mapped and Rewriting Logic for template matching and algebraic optimizations. This approach is able to identify sets of SSA instructions that can be mapped to different PE complexities available in coarsegrained reconfigurable computing architectures. As a proof of concept, results of the approach with a number of benchmark kernels, as far as coverage of template instructions is concerned, are included. © 2007 IEEE.

CloseRead Abstract

2006

A methodology to design FPGA-based PID controllers

Authors
Lima, J; Menotti, R; Cardoso, JMP; Marques, E;

Publication
2006 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN, AND CYBERNETICS, VOLS 1-6, PROCEEDINGS

Abstract
This paper presents a methodology to implement PID (Proportional, Integral, Derivative) controllers in FPGAs (Field-Programmable Gate Arrays) using fixed-point numerical representation. The Matlab/Simulink environment is used for modeling, simulation and evaluation the performance provided by different fixed-point representations using a given control process. A static bit-width analyzer is used to give a specialized fixed-point representation for each operand/operator in the controller system. After bit-width analysis, a VHDL representation of the system is generated. Results show that the proposed methodology leads to shorten design cycles achieving important resource savings by employing specialized fixed-point representations.

CloseRead Abstract

2006

Regular expression matching for reconfigurable packet inspection

Authors
Bispo, J; Sourdis, L; Cardoso, JMP; Vassiliadis, S;

Publication
2006 IEEE INTERNATIONAL CONFERENCE ON FIELD PROGRAMMABLE TECHNOLOGY, PROCEEDINGS

Abstract
Recent intrusion detection systems (IDS) use regular expressions instead of static patterns as a more efficient way to represent hazardous packet payload contents. This paper focuses on regular expressions pattern matching engines implemented in reconfigurable hardware. We present a Nondeterministic Finite Automata (NFA) based implementation, which takes advantage of new basic building blocks to support more complex regular expressions than the previous approaches. Our methodology is supported by a tool that automatically generates the circuitry for the given regular expressions, outputting VHDL representations ready for logic synthesis. Furthermore, we include techniques to reduce the area cost of our designs and maximize performance when targeting FPGAs. Experimental results show that our tool is able to generate a regular expression engine to match more than 500 IDS regular expressions (from the Snort ruleset) using only 25K logic cells and achieving 2 Gbps throughput on a Virtex2 and 2.9 on a Virtex4 device. Concerning the throughput per area required per matching non-Meta character, our design is 3.4 and 10x more efficient than previous ASIC and FPGA approaches, respectively.

CloseRead Abstract

2007

A data-driven approach for pipelining sequences of data-dependent loops

Authors
Rodrigues, R; Cardoso, JMP; Diniz, PC;

Publication
FCCM 2007: 15TH ANNUAL IEEE SYMPOSIUM ON FIELD-PROGRAMMABLE CUSTOM COMPUTING MACHINES, PROCEEDINGS

Abstract
Many video and image/signal processing applications can be structured as sequences of data-dependent tasks using a consumer/producer communication paradigm and are therefore amenable to pipelined execution. This paper presents an execution technique to speed-up the overall execution of successive, data-dependent tasks on a reconfigurahle architecture. The technique pipelines sequences of data-dependent tasks by overlapping their execution subject to data-dependences. It decouples the concurrent data-path and control units and uses a custom, application data-driven, fine-grained synchronization and buffering scheme. In addition, the execution scheme allows for out of-order, but data-dependent producer-consumer pairs not allowed by previous data-driven pipelining approaches. The approach has been exploited in the context of a high-level compiler targeting FPGAs. The preliminary experimental results reveal noticeable performance improvements and buffer size reductions for a number of benchmarks over traditional approaches.

CloseRead Abstract