Cookies Policy
The website need some cookies and similar means to function. If you permit us, we will use those means to collect data on your visits for aggregated statistics to improve our service. Find out More
Accept Reject
  • Menu
Publications

Publications by João Paiva Cardoso

2009

A Comparison of Three Representative Hardware Sorting Units

Authors
Marcelino, R; Neto, HC; Cardoso, JMP;

Publication
IECON: 2009 35TH ANNUAL CONFERENCE OF IEEE INDUSTRIAL ELECTRONICS, VOLS 1-6

Abstract
Sorting is an important operation for many embedded computing systems. Since sorting large datasets may slowdown the overall execution, schemes to speedup sorting operations are needed. Bearing in mind the hardware acceleration of sorting, we show in this paper an analysis and comparison among three hardware sorting units: Sorting Network, Insertion Sorting, and FIFO-based Merge Sorting. We focus on embedded computing systems implemented with FPGAs, which give us the flexibility to accommodate customized hardware sorting units. We also present a hardware/software solution for sorting data sets with size larger than the size of the sorting unit. This hardware/software solution achieves 20x overall speedup over a pure software implementation of the well-known quicksort algorithm.

2012

Programming Strategies for Runtime Adaptability

Authors
Cardoso, JMP;

Publication
2012 7TH INTERNATIONAL WORKSHOP ON RECONFIGURABLE AND COMMUNICATION-CENTRIC SYSTEMS-ON-CHIP (RECOSOC)

Abstract
Future advanced embedded computing systems are expected to dynamically adapt applications' behavior and runtime system according to, e.g., usage contexts, operating environments, resources' availability, and battery energy levels. Besides application's functionalities provided by high-level and/or executable binary codes, code for specifying strategies/policies to extend typical functionalities with adaptability behavior is required. A domain-specific language, able to program this adaptability behavior, will allow developers to specify strategies for adaptation, will improve portability, and will help tools to map those strategies to the target system. This paper presents our recent ideas for programming strategies focused on runtime adaptability. The ideas are exposed using extensions to LARA, an aspect-oriented programming language, agnostic to the target language and system. We show examples of using LARA to specify strategies and we comment on the possible implementations to make viable those strategies.

2012

Hardware/Software Specialization Through Aspects

Authors
Cardoso, JMP; Carvalho, T; Teixeira, J; Diniz, PC; Goncalves, F; Petrov, Z;

Publication
2012 INTERNATIONAL CONFERENCE ON EMBEDDED COMPUTER SYSTEMS (SAMOS): ARCHITECTURES, MODELING AND SIMULATION

Abstract
LARA is a programming language being developed to complement application code in a host programming language with instrumentation code, for monitoring, logging, and debugging, user's knowledge about specific characteristics of the application, non-functional requirements, and compiler, mapping and synthesis strategies to guide/control design-flows, especially the ones used to map computations to FPGA-based systems. This paper shows how the aspect-oriented approach provided by LARA allows developers to specify complementary program information that can be used by LARA aware design-flows to promote customized FPGA-based software/hardware implementations. Program and compiler/mapping specialization take advantage of specific properties of applications to optimize and customize specific application modules and software/hardware implementations, e. g., according to usage contexts. We illustrate the concept using a hotspot function from a real-life, industrial, application. The results show the importance of program specialization in deriving hardware/software implementations with higher-performance.

2003

From C programs to the configure-execute model

Authors
Cardoso, JMP; Weinhardt, M;

Publication
DESIGN, AUTOMATION AND TEST IN EUROPE CONFERENCE AND EXHIBITION, PROCEEDINGS

Abstract
The emergence of run-time reconfigurable architectures makes feasible the configure-execute paradigm. Compilation of behavioral descriptions (in, e.g., C, Java, etc.), apart from mapping the computational structures onto the available resources on the device, must split the program in temporal sections if it needs more resources than physically available. In addition, since the execution of the computational structures in a configuration needs at least two stages (i.e., configuring and computing), it is important to split the program such that the reconfiguration overheads are minimized, taking advantage of the overlapping of the execution stages on different configurations. This paper presents mapping techniques to cope with those features. The techniques are being researched in the context of a C compiler for the eXtreme Processing Platform (XPP). Temporal partitioning is applied to furnish a set of configurations that reduces the reconfiguration overhead and thus may lead to performance gains. We also show that when applications include a sequence of loops, the use of several configurations may be more beneficial than the mapping of the entire application onto a single configuration. Preliminary results for a number of benchmarks strongly confirm the approach.

2003

Compilation for FPGA-based reconfigurable hardware

Authors
Cardoso, JMP; Neto, HC;

Publication
IEEE DESIGN & TEST OF COMPUTERS

Abstract
These techniques for compiling software programs into reconfigurable hardware offer faster and more efficient performance than the complex resource-sharing approaches typical of high-level synthesis systems. The Java-based compiler presented in this article uses intermediate graph representations to embody parallelism at various levels.

2003

On combining temporal partitioning and sharing of functional units in compilation for reconfigurable architectures

Authors
Cardoso, JMP;

Publication
IEEE TRANSACTIONS ON COMPUTERS

Abstract
Resource virtualization on FPGA devices, achievable due to its dynamic reconfiguration capabilities, provides an attractive solution to save silicon area. Architectural synthesis for dynamically reconfigurable FPGA-based digital systems needs to consider the case of reducing the number of temporal partitions (reconfigurations) by enabling sharing of some functional units in the same temporal partition. This paper proposes a novel algorithm for automated datapath design from behavioral input descriptions (represented by an acyclic dataflow graph), which simultaneously performs temporal partitioning and sharing of functional units. The proposed algorithm attempts to minimize both the number of temporal partitions and the execution latency of the generated solution. Temporal partitioning, resource sharing, scheduling, and a simple form of allocation and binding are all integrated in a single task. The algorithm is based on heuristics and on a new concept of construction by gradually enlarging timing slots. Results show the efficiency and effectiveness of the algorithm when compared to existent approaches.

  • 29
  • 44