2013
Autores
De F. Coutinho, JG; Cardoso, JMP; Carvalho, T; Bhattacharya, S; Luk, W; Constantinides, G; Diniz, PC; Petrov, Z;
Publicação
Compilation and Synthesis for Embedded Reconfigurable Systems: An Aspect-Oriented Approach
Abstract
Source-to-source weaving is a key mechanism in the REFLECT design-flow since it allows the inclusion of application-specific information in the transformed program. In particular, LARA [1, 2] aspects are used to control the design-flow, and to trigger source-to-source code transformations and compilation/synthesis optimizations on a given application. Hence, user knowledge about an application and/or target architecture can be codified as aspects, allowing the original application code to be automatically extended to satisfy non-functional concerns, such as arithmetic precision and performance. © Springer Science+Business Media New York 2013. All rights are reserved.
2018
Autores
Nobre, R; Reis, L; Bispo, J; Carvalho, T; Cardoso, JMP; Cherubin, S; Agosta, G;
Publicação
PARMA-DITAM 2018: 9TH WORKSHOP ON PARALLEL PROGRAMMING AND RUNTIME MANAGEMENT TECHNIQUES FOR MANY-CORE ARCHITECTURES AND 7TH WORKSHOP ON DESIGN TOOLS AND ARCHITECTURES FOR MULTICORE EMBEDDED COMPUTING PLATFORMS
Abstract
Writing mixed-precision kernels allows to achieve higher throughput together with outputs whose precision remain within given limits. The recent introduction of native half-precision arithmetic capabilities in several GPUs, such as NVIDIA P100 and AMD Vega 10, contributes to make precision-tuning even more relevant as of late. However, it is not trivial to manually find which variables are to be represented as half-precision instead of single- or double-precision. Although the use of half-precision arithmetic can speed up kernel execution considerably, it can also result in providing non-usable kernel outputs, whenever the wrong variables are declared using the half-precision data-type. In this paper we present an automatic approach for precision tuning. Given an OpenCL kernel with a set of inputs declared by a user (i.e., the person responsible for programming and/or tuning the kernel), our approach is capable of deriving the mixed-precision versions of the kernel that are better improve upon the original with respect to a given metric (e.g., time-to-solution, energy-to-solution). We allow the user to declare and/or select a metric to measure and to filter solutions based on the quality of the output. We implement a proof-of-concept of our approach using an aspect-oriented programming language called LARA. It is capable of generating mixed-precision kernels that result in considerably higher performance when compared with the original single-precision floating-point versions, while generating outputs that can be acceptable in some scenarios.
2018
Autores
Carvalho, T; Cardoso, JMP;
Publicação
33RD ANNUAL ACM SYMPOSIUM ON APPLIED COMPUTING
Abstract
In the context of compiler optimizations, tuning of parameters and selection of algorithms, runtime adaptivity and autotuning are becoming increasingly important, especially due to the complexity of applications, workloads, computing devices and execution environments. For identifying and specifying adaptivity, different phases are required: analysis of program hotspots and adaptivity opportunities, code restructuring, and programming of adaptivity strategies. These phases usually require different tools and modications to the source code that may result in difficult to maintain and error prone code. This paper presents a flexible approach to support the different phases when developing adaptive applications. The approach is based on a single domain-specific language (DSL), able to specify and evaluate multiple strategies and to maintain a separation of concerns. We describe the requirements and the design of the DSL, an accompanying API, and of a Java-to-Java compiler that implements the approach. In addition, we present and evaluate the use of the approach to specify runtime adaptivity strategies in the context of Java programs, especially when considering runtime autotuning of optimization parameters and runtime selection of algorithms. Although simple, the case studies shown truly demonstrate the main advantages of the approach in terms of the programming model and of the performance impact.
2019
Autores
Nobre, R; Bispo, J; Carvalho, T; Cardoso, JMP;
Publicação
SOFTWAREX
Abstract
This article presents Nonio, a modular, easy-to-use, design space exploration framework focused on exploring custom combinations of compiler flags and compiler sequences. We describe the framework and discuss its use with two of the most popular compiler toolchains, GCC and Clang+LLVM. Particularly, we discuss implementation details in the context of flag selection, when using GCC, and phase selection and ordering, when using Clang+LLVM. The framework software organization allows to easily add new components as plug-ins (e.g., an exploration algorithm, an objective metric, integration with another compiler toolchain). The software architecture provides well-defined interfaces, in order to enable seamless composition and interaction between different components. We present, as an example, a use case where we rely on Nonio to obtain custom compiler flags for reducing the execution time and the energy consumption of a C program, in relation to the best predetermined optimization settings provided by the compiler (e.g., -O3). (C) 2019 The Authors. Published by Elsevier B.V.
2020
Autores
Crarcia, KD; Carvalho, T; Mendes Moreira, J; Cardoso, JMP; de Carvalho, ACPLF;
Publicação
14TH INTERNATIONAL CONFERENCE ON SOFT COMPUTING MODELS IN INDUSTRIAL AND ENVIRONMENTAL APPLICATIONS (SOCO 2019)
Abstract
Human Activity Recognition is a machine learning task for the classification of human physical activities. Applications for that task have been extensively researched in recent literature, specially due to the benefits of improving quality of life. Since wearable technologies and smartphones have become more ubiquitous, a large amount of information about a person's life has become available. However, since each person has a unique way of performing physical activities, a Human Activity Recognition system needs to be adapted to the characteristics of a person in order to maintain or improve accuracy. Additionally, when smartphones devices are used to collect data, it is necessary to manage its limited resources, so the system can efficiently work for long periods of time. In this paper, we present a semi-supervised ensemble algorithm and an extensive study of the influence of hyperparameter configuration in classification accuracy. We also investigate how the classification accuracy is affected by the person and the activities performed. Experimental results show that it is possible to maintain classification accuracy by adjusting hyperparameters, like window size and window overlap, depending on the person and activity performed. These results motivate the development of a system able to automatically adapt hyperparameter settings for the activity performed by each person.
2021
Autores
Garcia, KD; de Sa, CR; Poel, M; Carvalho, T; Mendes Moreira, J; Cardoso, JMP; de Carvalho, ACPLF; Kok, JN;
Publicação
NEUROCOMPUTING
Abstract
Human Activity Recognition is focused on the use of sensing technology to classify human activities and to infer human behavior. While traditional machine learning approaches use hand-crafted features to train their models, recent advancements in neural networks allow for automatic feature extraction. Auto-encoders are a type of neural network that can learn complex representations of the data and are commonly used for anomaly detection. In this work we propose a novel multi-class algorithm which consists of an ensemble of auto-encoders where each auto-encoder is associated with a unique class. We compared the proposed approach with other state-of-the-art approaches in the context of human activity recognition. Experimental results show that ensembles of auto-encoders can be efficient, robust and competitive. Moreover, this modular classifier structure allows for more flexible models. For example, the extension of the number of classes, by the inclusion of new auto-encoders, without the necessity to retrain the whole model. (c) 2021 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY license (http:// creativecommons.org/licenses/by/4.0/).
The access to the final selection minute is only available to applicants.
Please check the confirmation e-mail of your application to obtain the access code.