Publications

Publications by João Paiva Cardoso

2020

Compilation of MATLAB computations to CPU/GPU via C/OpenCL generation

Authors
Reis, L; Bispo, J; Cardoso, JMP;

Publication
CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE

Abstract
In order to take advantage of the processing power of current computing platforms, programmers typically need to develop software versions for different target devices. This task is time-consuming and requires significant programming and computer architecture expertise. A possible and more convenient alternative is to start with a single high-level description of a program with minimum implementation details, and generate custom implementations according to the target platform. In this paper, we use MATLAB as a high-level programming language and propose a compiler that targets CPU/GPU computing platforms by generating customized implementations in C and OpenCL. We propose a number of compiler techniques to automatically generate efficient C and OpenCL code from MATLAB programs. One such compiler technique relies on heuristics to decide when and how to use Shared Virtual Memory (SVM). The experimental results show that our approach is able to generate code that provides significant speedups (e.g., a geometric mean speedup of 11x for a set of simple benchmarks) using a discrete GPU over equivalent sequential C code executing on a CPU. With more complex benchmarks, for which only some code regions can be parallelized and are thus offloaded, the generated code achieved speedups of up to 2.2x. We also show the impact of using SVM, specifically fine-grained buffers, and the results show that the compiler is able to achieve significant speedups, both over the versions without SVM and over those with naive, aggressive SVM use, across three CPU/GPU platforms.

2021

On the Performance Effect of Loop Trace Window Size on Scheduling for Configurable Coarse Grain Loop Accelerators

Authors
Santos, T; Paulino, N; Bispo, J; Cardoso, JMP; Ferreira, JC;

Publication
2021 INTERNATIONAL CONFERENCE ON FIELD-PROGRAMMABLE TECHNOLOGY (ICFPT)

Abstract
By using Dynamic Binary Translation, instruction traces from pre-compiled applications can be offloaded, at runtime, to FPGA-based accelerators, such as Coarse-Grained Loop Accelerators, in a transparent way. However, scheduling onto coarse-grain accelerators is challenging, with two of the currently known issues being the density of computations that can be mapped and the effects of memory accesses on performance. Using an in-house framework for analysis of instruction traces, we explore the effect of different window sizes when applying list scheduling to map the window operations to a coarse-grain loop accelerator model that has been previously experimentally validated. For all window sizes, we vary the number of ALUs and memory ports available in the model, and comment on how these parameters affect the resulting latency. For a set of benchmarks taken from the PolyBench suite, compiled for the 32-bit MicroBlaze softcore, we have achieved an average iteration speedup of 5.10x for a basic block repeated 5 times and scheduled with 8 ALUs and memory ports, and an average speedup of 5.46x when not considering resource constraints. We also identify which benchmarks contribute to the difference between these two speedups, and break down their limiting factors. Finally, we reflect on the impact memory dependencies have on scheduling.

2021

Guest Editorial: IEEE TC Special Section on Compiler Optimizations for FPGA-Based Systems

Authors
Cardoso, JMP; DeHon, A; Pozzi, L;

Publication
IEEE TRANSACTIONS ON COMPUTERS

Abstract
The papers in this special section focus on compiler optimization for FPGA-based systems. Reconfigurable computing (RC) is growing in importance in many computing domains and systems, from embedded and mobile to cloud and high-performance computing. We have witnessed important advancements regarding the programming of RC-based systems, but further improvements are needed, especially regarding efficient techniques for automatic mapping of computations described in high-level languages to the RC resources. The resources of high-end FPGAs allow these devices to implement complex Systems-on-a-Chip (SoCs) and substantial computational components of software applications, e.g., when used as hardware accelerators and/or as more energy-efficient computing platforms. This, however, increases the continuous need for efficient compilers targeting FPGAs, and other RC platforms, from high-level programming languages.

1999

Architectures and compilers to support reconfigurable computing

Authors
Cardoso, JMP; Véstias, MP;

Publication
XRDS

1998

Towards an automatic path from Java™ bytecodes to hardware through high-level synthesis

Authors
Cardoso, JMP; Neto, HC;

Publication
5th IEEE International Conference on Electronics, Circuits and Systems, ICECS 1998, Surfing the Waves of Science and Technology, Lisbon, Portugal, September 7-10, 1998

2007

Guest editorial: Special issue on reconfigurable hardware systems

Authors
Cardoso, JMP; Bertels, K; Constantinides, GA; Vassiliadis, S;

Publication
INTERNATIONAL JOURNAL OF ELECTRONICS

Abstract