2017
Authors
Plessl, C; Cong, GJ; Cardoso, JMP;
Publication
CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE
Abstract
2017
Authors
Hannig, F; Cardoso, JMP; Fey, D;
Publication
JOURNAL OF SYSTEMS ARCHITECTURE
Abstract
2017
Authors
Bartolini, A; Cardoso, JMP; Silvano, C;
Publication
ANDARE@PACT
Abstract
2017
Authors
Cardoso, JMP; Huebner, M; Agosta, G; Silvano, C;
Publication
ACM International Conference Proceeding Series
Abstract
2017
Authors
Nobre, R; Reis, L; Cardoso, JMP;
Publication
Euro-Par 2017: Parallel Processing Workshops - Euro-Par 2017 International Workshops, Santiago de Compostela, Spain, August 28-29, 2017, Revised Selected Papers
Abstract
Research in compiler pass phase ordering (i.e., selection of compiler analysis/transformation passes and their order of execution) has been mostly performed in the context of CPUs and, in a small number of cases, FPGAs. In this paper we present experiments regarding compiler pass phase ordering specialization of OpenCL kernels targeting NVIDIA GPUs using Clang/LLVM 3.9 and the libclc OpenCL library. More specifically, we analyze the impact of using specialized compiler phase orders on the performance of 15 PolyBench/GPU OpenCL benchmarks. In addition, we analyze the final NVIDIA PTX assembly code generated by the different compilation flows in order to identify the main reasons for the cases with significant performance improvements. Using specialized compiler phase orders, we were able to achieve performance improvements over the CUDA version and OpenCL compiled with the NVIDIA driver. Compared to CUDA, we were able to achieve geometric mean improvements of 1.54× (up to 5.48×). Compared to the OpenCL driver version, we were able to achieve geometric mean improvements of 1.65× (up to 5.70×). © Springer International Publishing AG, part of Springer Nature 2018.
2017
Authors
Paulino, N; Reis, L; Cardoso, JMP;
Publication
Parallel Computing is Everywhere, Proceedings of the International Conference on Parallel Computing, ParCo 2017, 12-15 September 2017, Bologna, Italy
Abstract
Software developers have always found it difficult to adopt Field-Programmable Gate Arrays (FPGAs) as computing platforms. Recent advances in HLS tools aim to ease the mapping of computations to FPGAs by abstracting the hardware design effort via a standard OpenCL interface and execution model. However, OpenCL is a low-level programming language and requires that developers master the target architecture in order to achieve efficient results. Thus, efforts addressing the generation of OpenCL from high-level languages are of paramount importance to increase design productivity and to help software developers. Existing approaches bridge this by translating MATLAB/Octave code into C, or similar languages, in order to improve performance by efficiently compiling for the target hardware. One example is the MATISSE source-to-source compiler, which translates MATLAB code into standard-compliant C and/or OpenCL code. In this paper, we analyse the viability of combining both flows so that sections of MATLAB code can be translated to specialized hardware with a small amount of effort, and test a few code optimizations and their effect on performance. We present preliminary results relative to execution times, and resource and power consumption, for two OpenCL kernels generated by MATISSE, and manual optimizations of each kernel based on different coding techniques. © 2018 The authors and IOS Press.
The access to the final selection minute is only available to applicants.
Please check the confirmation e-mail of your application to obtain the access code.