Publications by João Paiva Cardoso

2016

A Pipelined Multi-softcore Approach for the HOG Algorithm

Authors
Mascagni de Holanda, JAM; Paiva Cardoso, JMP; Marques, E;

Publication
PROCEEDINGS OF THE 2016 CONFERENCE ON DESIGN AND ARCHITECTURES FOR SIGNAL & IMAGE PROCESSING

Abstract
This paper describes the mapping and the acceleration of an object detection algorithm on a multiprocessor system based on an FPGA. We use HOG (Histogram of Oriented Gradients), one of the most popular algorithms for detection of different classes of objects and currently being used in smart embedded systems. The use of HOG on such systems requires efficient implementations in order to provide high performance, possibly with low energy/power consumption budgets. Also, as variations and adaptations of this algorithm are needed to deal with different scenarios and classes of objects, programmability is required to allow greater development flexibility. In this paper we show our approach towards implementing the HOG algorithm on a multi-softcore Nios II-based system, bearing in mind high-performance and programmability issues. By applying source-to-source transformations we obtain speedups of 19x, and by using pipelined processing we reduce the algorithm's execution time by 49x. We also show that improving the hardware with acceleration units can result in speedups of 72.4x compared to the embedded baseline application.

2013

An Aspect-Oriented Approach for Designing Safety-Critical Systems

Authors
Petrov, Z; Zaykov, PG; Cardoso, JMP; Coutinho, JGF; Diniz, PC; Luk, W;

Publication
2013 IEEE AEROSPACE CONFERENCE

Abstract
The development of avionics systems is typically a tedious and cumbersome process. In addition to the required functions, developers must consider various and often conflicting non-functional requirements such as safety, performance, and energy efficiency. Certainly, an integrated approach with a seamless design flow that is capable of requirements modelling and supporting refinement down to an actual implementation in a traceable way, may lead to a significant acceleration of development cycles. This paper presents an aspect-oriented approach supported by a toolchain that deals with functional and non-functional requirements in an integrated manner. It also discusses how the approach can be applied to development of safety-critical systems and provides experimental results.

2015

Guest Editorial FPL 2013

Authors
Cardoso, JMP; Diniz, PC; Morrow, K;

Publication
ACM TRANSACTIONS ON RECONFIGURABLE TECHNOLOGY AND SYSTEMS

2015

A Special-Purpose Language for Implementing Pipelined FPGA-based Accelerators

Authors
de Oliveira, CB; Menotti, R; Cardoso, JMP; Marques, E;

Publication
2015 18th Forum on Specification and Design Languages (FDL)

Abstract
A common use for Field-Programmable Gate Arrays (FPGAs) is the implementation of hardware accelerators. A way of doing so is to specify the internal logic of such accelerators by using Hardware Description Languages (HDLs). However, HDLs rely on the expertise of developers and their knowledge about hardware development with FPGAs. In response, efforts have been focused on developing High-level Synthesis (HLS) tools in an attempt to increase the overall abstraction level required for using FPGAs. However, the solutions presented by such tools are commonly considered inefficient in comparison to the ones achieved by a specialized hardware designer. An alternative solution to program FPGAs is the use of Domain-Specific Languages (DSLs), as they can provide higher abstraction levels than HDLs while still allowing developers to deal with specific issues that lead to more efficient designs and are not always covered by HLS tools. In this paper we present our recent work on a DSL named LALP (Language for Aggressive Loop Pipelining), which has been developed focusing on the development of FPGA-based, aggressively pipelined hardware accelerators. We present the recent LALP extensions and the challenges we are facing regarding the compilation of LALP to FPGAs.

2013

Conclusions

Authors
Diniz, PC; Cardoso, JMP; De F. Coutinho, JG; Petrov, Z;

Publication
Compilation and Synthesis for Embedded Reconfigurable Systems: An Aspect-Oriented Approach

Abstract
The REFLECT project aimed at developing, validating, and evaluating a novel compilation and synthesis approach for heterogeneous multi-core computing systems that relies on aspect-oriented specifications to convey critical domain knowledge to all design/development stages of an integrated toolchain. To reach these goals, we have devised a new compilation and synthesis foundation combining distinct but synergistic areas of research, namely, aspect-oriented programming, hardware compilation, design patterns, and hardware templates. © Springer Science+Business Media New York 2013. All rights are reserved.

2013

An automatic tool flow for the combined implementation of multi-mode circuits

Authors
Al Farisi, B; Bruneel, K; Cardoso, JMP; Stroobandt, D;

Publication
Proceedings - Design, Automation and Test in Europe, DATE

Abstract
A multi-mode circuit implements the functionality of a limited number of circuits, called modes, of which at any given time only one needs to be realised. Using run-time reconfiguration of an FPGA, all the modes can be implemented on the same reconfigurable region, requiring only an area that can contain the biggest mode. Typically, conventional run-time reconfiguration techniques generate a configuration for every mode separately. To switch between modes the complete reconfigurable region is rewritten, which often leads to very long reconfiguration times. In this paper we present a novel, fully automated tool flow that exploits similarities between the modes and uses Dynamic Circuit Specialization to drastically reduce reconfiguration time. Experimental results show that the number of bits rewritten in the configuration memory is reduced by a factor of 4.6× to 5.1× without significant performance penalties. © 2013 EDAA.
