Publications

Publications by Tiago Diogo Carvalho

2013

Aspect-based source to source transformations

Authors
De F. Coutinho, JG; Cardoso, JMP; Carvalho, T; Bhattacharya, S; Luk, W; Constantinides, G; Diniz, PC; Petrov, Z;

Publication
Compilation and Synthesis for Embedded Reconfigurable Systems: An Aspect-Oriented Approach

Abstract
Source-to-source weaving is a key mechanism in the REFLECT design-flow since it allows the inclusion of application-specific information in the transformed program. In particular, LARA [1, 2] aspects are used to control the design-flow, and to trigger source-to-source code transformations and compilation/synthesis optimizations on a given application. Hence, user knowledge about an application and/or target architecture can be codified as aspects, allowing the original application code to be automatically extended to satisfy non-functional concerns, such as arithmetic precision and performance. © Springer Science+Business Media New York 2013. All rights are reserved.


2018

Aspect-Driven Mixed-Precision Tuning Targeting GPUs

Authors
Nobre, R; Reis, L; Bispo, J; Carvalho, T; Cardoso, JMP; Cherubin, S; Agosta, G;

Publication
PARMA-DITAM 2018: 9TH WORKSHOP ON PARALLEL PROGRAMMING AND RUNTIME MANAGEMENT TECHNIQUES FOR MANY-CORE ARCHITECTURES AND 7TH WORKSHOP ON DESIGN TOOLS AND ARCHITECTURES FOR MULTICORE EMBEDDED COMPUTING PLATFORMS

Abstract
Writing mixed-precision kernels makes it possible to achieve higher throughput while keeping output precision within given limits. The recent introduction of native half-precision arithmetic capabilities in several GPUs, such as NVIDIA P100 and AMD Vega 10, makes precision tuning even more relevant. However, it is not trivial to manually find which variables should be represented in half precision instead of single or double precision. Although half-precision arithmetic can speed up kernel execution considerably, it can also produce unusable kernel outputs whenever the wrong variables are declared with the half-precision data type. In this paper we present an automatic approach for precision tuning. Given an OpenCL kernel with a set of inputs declared by a user (i.e., the person responsible for programming and/or tuning the kernel), our approach is capable of deriving the mixed-precision versions of the kernel that improve upon the original with respect to a given metric (e.g., time-to-solution, energy-to-solution). We allow the user to declare and/or select a metric to measure and to filter solutions based on the quality of the output. We implement a proof-of-concept of our approach using an aspect-oriented programming language called LARA. It is capable of generating mixed-precision kernels that result in considerably higher performance when compared with the original single-precision floating-point versions, while generating outputs that can be acceptable in some scenarios.


2018

An Approach Based on a DSL plus API for Programming Runtime Adaptivity and Autotuning Concerns

Authors
Carvalho, T; Cardoso, JMP;

Publication
33RD ANNUAL ACM SYMPOSIUM ON APPLIED COMPUTING

Abstract
In the context of compiler optimizations, tuning of parameters and selection of algorithms, runtime adaptivity and autotuning are becoming increasingly important, especially due to the complexity of applications, workloads, computing devices and execution environments. For identifying and specifying adaptivity, different phases are required: analysis of program hotspots and adaptivity opportunities, code restructuring, and programming of adaptivity strategies. These phases usually require different tools and modifications to the source code that may result in difficult-to-maintain and error-prone code. This paper presents a flexible approach to support the different phases when developing adaptive applications. The approach is based on a single domain-specific language (DSL), able to specify and evaluate multiple strategies and to maintain a separation of concerns. We describe the requirements and the design of the DSL, an accompanying API, and a Java-to-Java compiler that implements the approach. In addition, we present and evaluate the use of the approach to specify runtime adaptivity strategies in the context of Java programs, especially when considering runtime autotuning of optimization parameters and runtime selection of algorithms. Although simple, the case studies shown demonstrate the main advantages of the approach in terms of the programming model and of the performance impact.
