Publications - INESC TEC

Publications

Publications by HumanISE

2021

A Binary Translation Framework for Automated Hardware Generation

Authors
Paulino, N; Bispo, J; Ferreira, JC; Cardoso, JMP;

Publication
IEEE MICRO

Abstract
As applications move to the edge, efficiency in computing power and power/energy consumption is required. Heterogeneous computing promises to meet these requirements through application-specific hardware accelerators. Runtime adaptivity might be of paramount importance to realize the potential of hardware specialization, but further study is required on workload retargeting and offloading to reconfigurable hardware. This article presents our framework for the exploration of both offloading and hardware generation techniques. The framework is currently able to process instruction sequences from MicroBlaze, ARMv8, and riscv32imaf binaries, and to represent them as Control and Dataflow Graphs for transformation to implementations of hardware modules. We illustrate the framework's capabilities for identifying binary sequences for hardware translation with a set of 13 benchmarks.


2021

On Data Parallelism Code Restructuring for HLS Targeting FPGAs

Authors
Campos, R; Cardoso, JMP;

Publication
2021 IEEE INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM WORKSHOPS (IPDPSW)

Abstract
FPGAs have emerged as hardware accelerators, and in the last decade researchers have proposed new languages and frameworks to improve efficiency when mapping computations to FPGAs. One of the main tasks when mapping software code to FPGAs is code restructuring. Code restructuring is of paramount importance to achieve efficient FPGA-based accelerators, and its automation continues to be a challenge. This paper describes our recent work on techniques to automatically restructure C code and annotate it with directives optimized for HLS targeting FPGAs. The input of our approach is an unfolded dataflow graph (DFG), currently obtained from a trace of the program's execution, and the output is restructured C code with HLS directives. Specifically, in this paper we propose algorithms to optimize the input DFGs and use isomorphic graph detection to expose data-level parallelism. The experimental results show that our approach is able to generate efficient FPGA implementations, with significant speedups over the unmodified input source code, and that these implementations are very competitive with those obtained by manual optimization and by previous approaches. Furthermore, the experiments show that, using our approach, it is possible to extract data parallelism in linear to quadratic time with respect to the number of nodes of the input DFG.


2021

On the Performance Effect of Loop Trace Window Size on Scheduling for Configurable Coarse Grain Loop Accelerators

Authors
Santos, T; Paulino, N; Bispo, J; Cardoso, JMP; Ferreira, JC;

Publication
2021 INTERNATIONAL CONFERENCE ON FIELD-PROGRAMMABLE TECHNOLOGY (ICFPT)

Abstract
By using Dynamic Binary Translation, instruction traces from pre-compiled applications can be offloaded, at runtime, to FPGA-based accelerators, such as Coarse-Grained Loop Accelerators, in a transparent way. However, scheduling onto coarse-grain accelerators is challenging, with two of the currently known issues being the density of computations that can be mapped and the effects of memory accesses on performance. Using an in-house framework for analysis of instruction traces, we explore the effect of different window sizes when applying list scheduling to map the window operations to a coarse-grain loop accelerator model that has been previously experimentally validated. For all window sizes, we vary the number of ALUs and memory ports available in the model, and comment on how these parameters affect the resulting latency. For a set of benchmarks taken from the PolyBench suite, compiled for the 32-bit MicroBlaze softcore, we have achieved an average iteration speedup of 5.10x for a basic block repeated 5 times and scheduled with 8 ALUs and memory ports, and an average speedup of 5.46x when not considering resource constraints. We also identify which benchmarks contribute to the difference between these two speedups, and break down their limiting factors. Finally, we reflect on the impact memory dependencies have on scheduling.
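
As a rough illustration of the kind of experiment the abstract describes, the following is a minimal sketch of resource-constrained list scheduling over a window built by repeating a loop body. It is not the authors' framework: the function names (`unroll`, `list_schedule`, `latency`), the single-cycle operation latency, the program-order priority, and the toy multiply-accumulate loop body are all assumptions made for this example.

```python
def unroll(body, body_deps, carried, times):
    """Build a window by repeating a loop body `times` times.

    body:      dict op name -> "alu" or "mem" (hypothetical op kinds)
    body_deps: dict op name -> set of ops in the same iteration it depends on
    carried:   dict op name -> op of the previous iteration it depends on
    """
    ops, deps = {}, {}
    for i in range(times):
        for name, kind in body.items():
            ops[f"{name}{i}"] = kind
            d = {f"{p}{i}" for p in body_deps.get(name, ())}
            if i > 0 and name in carried:
                d.add(f"{carried[name]}{i - 1}")
            deps[f"{name}{i}"] = d
    return ops, deps


def list_schedule(ops, deps, n_alus, n_mem_ports):
    """Assign each op to the earliest cycle where its predecessors are done
    and a unit of its kind is free. Assumes every op takes one cycle and
    uses program order as the priority function."""
    if n_alus < 1 or n_mem_ports < 1:
        raise ValueError("need at least one ALU and one memory port")
    cycle_of = {}
    remaining = list(ops)
    cycle = 0
    while remaining:
        free = {"alu": n_alus, "mem": n_mem_ports}
        scheduled_now = []
        for op in remaining:
            # Predecessors count as done only if placed in an earlier cycle.
            ready = all(p in cycle_of for p in deps.get(op, ()))
            if ready and free[ops[op]] > 0:
                free[ops[op]] -= 1
                scheduled_now.append(op)
        if not scheduled_now:
            raise ValueError("dependency cycle: no operation can be scheduled")
        for op in scheduled_now:
            cycle_of[op] = cycle
        remaining = [op for op in remaining if op not in cycle_of]
        cycle += 1
    return cycle_of


def latency(schedule):
    """Number of cycles used by a schedule."""
    return max(schedule.values()) + 1


# Toy loop body: acc += a[i] * b[i], with a loop-carried dependency on acc.
body = {"ld_a": "mem", "ld_b": "mem", "mul": "alu", "acc": "alu"}
body_deps = {"mul": {"ld_a", "ld_b"}, "acc": {"mul"}}
carried = {"acc": "acc"}

for window in (1, 2, 4):
    ops, deps = unroll(body, body_deps, carried, window)
    for units in (1, 8):
        cycles = latency(list_schedule(ops, deps, units, units))
        print(f"window={window} units={units} cycles/iteration={cycles / window:.2f}")
```

In this toy model, larger windows let loads and multiplies from different iterations overlap, while the loop-carried accumulation still serializes one operation per iteration; this is the sort of dependency effect the abstract refers to.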


2021

Guest Editorial: IEEE TC Special Section on Compiler Optimizations for FPGA-Based Systems

Authors
Cardoso, JMP; DeHon, A; Pozzi, L;

Publication
IEEE TRANSACTIONS ON COMPUTERS

Abstract
The papers in this special section focus on compiler optimization for FPGA-based systems. Reconfigurable computing (RC) is growing in importance in many computing domains and systems, from embedded and mobile to cloud and high-performance computing. We have witnessed important advancements regarding the programming of RC-based systems, but further improvements are needed, especially regarding efficient techniques for automatic mapping of computations described in high-level languages to the RC resources. The resources of high-end FPGAs allow these devices to implement complex Systems-on-a-Chip (SoCs) and substantial computational components of software applications, e.g., when used as hardware accelerators and/or as more energy-efficient computing platforms. This, however, increases the continuous need for efficient compilers targeting FPGAs, and other RC platforms, from high-level programming languages.


2021

An Efficient Monte Carlo-Based Probabilistic Time-Dependent Routing Calculation Targeting a Server-Side Car Navigation System

Authors
Vitali, E; Gadioli, D; Palermo, G; Golasowski, M; Bispo, J; Pinto, P; Martinovic, J; Slaninova, K; Cardoso, JMP; Silvano, C;

Publication
IEEE TRANSACTIONS ON EMERGING TOPICS IN COMPUTING

Abstract
Incorporating speed probability distributions into the route-planning computation of car navigation systems guarantees more accurate and precise responses. In this paper, we propose a novel approach for dynamically selecting the number of samples used for the Monte Carlo simulation to solve the Probabilistic Time-Dependent Routing (PTDR) problem, thus improving the computation efficiency. The proposed method is used to determine in a proactive manner the number of simulations to be done to extract the travel-time estimation for each specific request, while respecting an error threshold as output quality level. The methodology requires a reduced effort on the application development side. We adopted an aspect-oriented programming language (LARA) together with a flexible dynamic autotuning library (mARGOt), respectively, to instrument the code and to make decisions on tuning the number of samples to improve the execution efficiency. Experimental results demonstrate that the proposed adaptive approach saves a large fraction of simulations (between 36 and 81 percent) with respect to a static approach, while considering different traffic situations, paths and error requirements. Given the negligible runtime overhead of the proposed approach, the execution-time speedup is between 1.5x and 5.1x. This speedup is reflected at the infrastructure level in terms of a reduction of 36 percent of the computing resources needed to support the whole navigation pipeline.
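
To make the idea of choosing the sample count per request concrete, here is a minimal sketch that sizes a Monte Carlo run from a small pilot run, using the standard normal-approximation formula for a target relative error. This is a textbook stand-in, not the method of the paper, and it does not use LARA or mARGOt; the function names, the pilot size, the 95 percent confidence factor, and the toy per-segment travel-time model are assumptions made for this example.

```python
import math
import random
import statistics


def required_samples(pilot, rel_error, z=1.96):
    """Estimate how many samples are needed so that the confidence-interval
    half-width is at most `rel_error` times the mean (normal approximation).
    `pilot` is a small list of samples drawn in advance."""
    mean = statistics.fmean(pilot)
    if mean == 0:
        raise ValueError("relative error is undefined for a zero mean")
    sd = statistics.stdev(pilot)
    n = math.ceil((z * sd / (rel_error * abs(mean))) ** 2)
    return max(n, len(pilot))


def make_route_sampler(segments, rng):
    """Hypothetical travel-time model: each segment is (mean, std) in minutes
    and one sample of the route is the sum of one draw per segment."""
    def sample():
        return sum(max(0.0, rng.gauss(mu, sigma)) for mu, sigma in segments)
    return sample


def estimate_travel_time(sample, rel_error, pilot_size=50):
    """Run a pilot, size the full run from it, and return (mean, samples used)."""
    draws = [sample() for _ in range(pilot_size)]
    n = required_samples(draws, rel_error)
    draws.extend(sample() for _ in range(n - len(draws)))
    return statistics.fmean(draws), len(draws)


rng = random.Random(0)
calm = make_route_sampler([(10, 0.5), (15, 1.0), (5, 0.2)], rng)
congested = make_route_sampler([(10, 4.0), (15, 8.0), (5, 2.0)], rng)

calm_mean, calm_n = estimate_travel_time(calm, rel_error=0.01)
busy_mean, busy_n = estimate_travel_time(congested, rel_error=0.01)
print(f"calm traffic:      {calm_n} samples")
print(f"congested traffic: {busy_n} samples")
```

Requests with low travel-time variance need far fewer samples than highly variable ones for the same error threshold, which is why a fixed, worst-case sample count wastes simulations.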


2021

Immersive Multimodal and Procedurally-Assisted Creation of VR Environments

Authors
Ferreira, J; Mendes, D; Nobrega, R; Rodrigues, R;

Publication
2021 IEEE CONFERENCE ON VIRTUAL REALITY AND 3D USER INTERFACES ABSTRACTS AND WORKSHOPS (VRW 2021)

Abstract
We present VR Designer, a tool for expediting the creation of 3D scenes inside VR. It uses controllers and voice commands to create and manipulate primitives and objects imported from openly available repositories. We use modifiers to accelerate repetitive tasks, resorting to procedural content creation techniques to automate the workflow. The tool allows non-expert users to quickly create scenes for contexts such as training or education. We also conducted a user study to validate VR Designer.
