Publications

Publications by João Bispo

2016

The ANTAREX approach to autotuning and adaptivity for energy efficient HPC systems

Authors
Silvano, C; Agosta, G; Cherubin, S; Gadioli, D; Palermo, G; Bartolini, A; Benini, L; Martinovic, J; Palkovic, M; Slaninová, K; Bispo, J; Cardoso, JMP; Abreu, R; Pinto, P; Cavazzoni, C; Sanna, N; Beccari, AR; Cmar, R; Rohou, E;

Publication
Proceedings of the ACM International Conference on Computing Frontiers, CF'16, Como, Italy, May 16-19, 2016

Abstract
The ANTAREX 1 project aims at expressing the application selfadaptivity through a Domain Specific Language (DSL) and to runtime manage and autotune applications for green and heterogeneous High Performance Computing (HPC) systems up to Exascale. The DSL approach allows the definition of energy-efficiency, performance, and adaptivity strategies as well as their enforcement at runtime through application autotuning and resource and power management. We show through a mini-App extracted from one of the project application use cases some initial exploration of application precision tuning by means enabled by the DSL. © 2016 Copyright held by the owner/author(s).

CloseRead Abstract

2014

Multi-target c code generation from MATLAB

Authors
Bispo, J; Reis, L; Cardoso, JMP;

Publication
Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI)

Abstract
This paper describes our recent work on MATISSE, a framework for MATLAB to C compilation. We focus on the new optimizations and transformations, as well as on OpenCL generation. MATISSE is controlled with LARA, an aspect-oriented language, able to specify transformations to the input MATLAB code (e.g., insertion of code for variable initialization and for monitoring) and to express information concerning types and shapes of variables. We evaluate the compiler with a set of benchmarks when targeting both an embedded system and a desktop system. The results show that we were able to achieve a speedup up to 1.8× by employing information provided by LARA aspects. We also compare the execution time of the generated C code with the original code running on MATLAB, and we achieve a geometric mean speedup of 19×. The geometric mean speedup reduces to 12× when optimizing the MATLAB code with LARA aspects. Finally, we present a preliminary version of a fully-functioning pragma-based OpenCL generator, built over the MATISSE framework..

CloseRead Abstract

2017

LARA as a language-independent aspect-oriented programming approach

Authors
Pinto, P; Carvalho, T; Bispo, J; Cardoso, JMP;

Publication
Proceedings of the Symposium on Applied Computing, SAC 2017, Marrakech, Morocco, April 3-7, 2017

Abstract
Usually, Aspect-Oriented Programming (AOP) languages are an extension of a specific target language (e.g., AspectJ for Java and AspectC++ for C++). This coupling can impose drawbacks such as arbitrary limitations to the aspect language. LARA is a DSL for source-to-source transformations inspired by AOP concepts, and has been designed to be independent of the target language. In this paper we propose techniques to overcome some of the challenges presented by a language-independent approach to source code transformations, and present and discuss possible solutions and their impact. Additionally, we present some of the benefits and opportunities of this approach. We present an evaluation of our approach, show that we can significantly reduce the effort to develop weavers for new target languages and that the proposed techniques contribute to more concise LARA aspects and safer semantics. Copyright 2017 ACM.

CloseRead Abstract

2018

Aspect composition for multiple target languages using LARA

Authors
Pinto, P; Carvalho, T; Bispo, J; Ramalho, MA; Cardoso, JMP;

Publication
COMPUTER LANGUAGES SYSTEMS & STRUCTURES

Abstract
Usually, Aspect-Oriented Programming (AOP) languages are an extension of a specific target programming language (e.g., Aspect J for JAVA and Aspect C++ for C++). Although providing AOP support with target language extensions may ease the adoption of an approach, it may impose constraints related with constructs and semantics. Furthermore, by tightly coupling the AOP language to the target language the reuse potential of many aspects, especially the ones regarding non-functional requirements, is lost. LARA is a domain-specific language inspired by AOP concepts, having the specification of source-to-source transformations as one of its main goals. LARA has been designed to be, as much as possible, independent of the target language and to provide constructs and semantics that ease the definition of concerns, especially related to non-functional requirements. In this paper, we propose techniques to overcome some of the challenges presented by a multilanguage approach to AOP of cross-cutting concerns focused on non-functional requirements and applied through the use of a weaving process. The techniques mainly focus on providing well-defined library interfaces that can have concrete implementations for each supported target language. The developer uses an agnostic interface and the weaver provides a specific implementation for the target language. We evaluate our approach using 8 concerns with varying levels of language agnosticism that support 4 target languages (C, C++, JAVA and MATLAB) and show that the proposed techniques contribute to more concise LARA aspects, high reuse of aspects, and to significant effort reductions when developing weavers for new imperative, object-oriented programming languages.

CloseRead Abstract

2018

Aspect-Driven Mixed-Precision Tuning Targeting GPUs

Authors
Nobre, R; Reis, L; Bispo, J; Carvalho, T; Cardoso, JMP; Cherubin, S; Agosta, G;

Publication
PARMA-DITAM 2018: 9TH WORKSHOP ON PARALLEL PROGRAMMING AND RUNTIME MANAGEMENT TECHNIQUES FOR MANY-CORE ARCHITECTURES AND 7TH WORKSHOP ON DESIGN TOOLS AND ARCHITECTURES FOR MULTICORE EMBEDDED COMPUTING PLATFORMS

Abstract
Writing mixed-precision kernels allows to achieve higher throughput together with outputs whose precision remain within given limits. The recent introduction of native half-precision arithmetic capabilities in several GPUs, such as NVIDIA P100 and AMD Vega 10, contributes to make precision-tuning even more relevant as of late. However, it is not trivial to manually find which variables are to be represented as half-precision instead of single- or double-precision. Although the use of half-precision arithmetic can speed up kernel execution considerably, it can also result in providing non-usable kernel outputs, whenever the wrong variables are declared using the half-precision data-type. In this paper we present an automatic approach for precision tuning. Given an OpenCL kernel with a set of inputs declared by a user (i.e., the person responsible for programming and/or tuning the kernel), our approach is capable of deriving the mixed-precision versions of the kernel that are better improve upon the original with respect to a given metric (e.g., time-to-solution, energy-to-solution). We allow the user to declare and/or select a metric to measure and to filter solutions based on the quality of the output. We implement a proof-of-concept of our approach using an aspect-oriented programming language called LARA. It is capable of generating mixed-precision kernels that result in considerably higher performance when compared with the original single-precision floating-point versions, while generating outputs that can be acceptable in some scenarios.

CloseRead Abstract

2018

AutoPar-Clava: An Automatic Parallelization source-to-source tool for C code applications

Authors
Arabnejad, H; Bispo, J; Barbosa, JG; Cardoso, JMP;

Abstract
Automatic parallelization of sequential code has become increasingly relevant in multicore programming. In particular, loop parallelization continues to be a promising optimization technique for scienti.c applications, and can provide considerable speedups for program execution. Furthermore, if we can verify that there are no true data dependencies between loop iterations, they can be easily parallelized. This paper describes Clava AutoPar, a library for the Clava weaver that performs automatic and symbolic parallelization of C code. The library is composed of two main parts, parallel loop detection and source-to-source code parallelization. The system is entirely automatic and attempts to statically detect parallel loops for a given input program, without any user intervention or profiling information. We obtained a geometric mean speedup of 1.5 for a set of programs from the C version of the NAS benchmark, and experimental results suggest that the performance obtained with Clava AutoPar is comparable or better than other similar research and commercial tools.

CloseRead Abstract