Publicacoes - INESC TEC

Publicações

Publicações por João Canas Ferreira

2020

Hardware architecture for integrate-and-fire signal reconstruction on FPGA

Autores
Carvalho, G; Ferreira, JC; Tavares, VG;

Publicação
2020 XXXV CONFERENCE ON DESIGN OF CIRCUITS AND INTEGRATED SYSTEMS (DCIS)

Abstract
Typical analogue-to-digital conversion (ADC) architectures, at Nyquist rate, tend to occupy a big portion of the integrated circuit die area and to consume more power than desired. Recently, with the rise of Interet-of-Things (IoT), there is a high demand for architectures that can have both reduced area and power consumption. Time encoding machines (TEM) might be a promising alternative. These types of encoders result in very simple and low-power analogue circuits, shifting most of its complexity to the decoding stage, typically stationed in a place with access to more resources. This paper focuses on a particular TEM, the integrate-and-fire neuron (IFN). The IFN modulation is based on a simplified first-order model of neural operation and it encodes the signal in a very power efficient manner. In the end, a novel hardware architecture for the reconstruction of the IFN encoded signal based on a spiking model will be presented. The method is demonstrated and implemented on FPGA, reaching an ENOB as high as 8.23.

FecharLer Abstract

2021

Transparent Control Flow Transfer between CPU and Accelerators for HPC

Autores
Granhao, D; Ferreira, JC;

Publicação
ELECTRONICS

Abstract
Heterogeneous platforms with FPGAs have started to be employed in the High-Performance Computing (HPC) field to improve performance and overall efficiency. These platforms allow the use of specialized hardware to accelerate software applications, but require the software to be adapted in what can be a prolonged and complex process. The main goal of this work is to describe and evaluate mechanisms that can transparently transfer the control flow between CPU and FPGA within the scope of HPC. Combining such a mechanism with transparent software profiling and accelerator configuration could lead to an automatic way of accelerating regular applications. In this work, a mechanism based on the ptrace system call is proposed, and its performance on the Intel Xeon+FPGA platform is evaluated. The feasibility of the proposed approach is demonstrated by a working prototype that performs the transparent control flow transfer of any function call to a matching hardware accelerator. This approach is more general than shared library interposition at the cost of a small time overhead in each accelerator use (about 1.3 ms in the prototype implementation).

FecharLer Abstract

2018

A Reconfigurable Custom Machine for Accelerating Cellular Genetic Algorithms

Autores
Santos, PV; Alves, JC; Ferreira, JC;

Publicação
U.Porto Journal of Engineering

Abstract
In this work we present a reconfigurable and scalable custom processor array for solving optimization problems using cellular genetic algorithms (cGAs), based on a regular fabric of processing nodes and local memories. Cellular genetic algorithms are a variant of the well-known genetic algorithm that can conveniently exploit the coarse-grain parallelism afforded by this architecture. To ease the design of the proposed computing engine for solving different optimization problems, a high-level synthesis design flow is proposed, where the problem-dependent operations of the algorithm are specified in C++ and synthesized to custom hardware. A spectrum allocation problem was used as a case study and successfully implemented in a Virtex-6 FPGA device, showing relevant figures for the computing acceleration.

FecharLer Abstract

2021

A Binary Translation Framework for Automated Hardware Generation

Autores
Paulino, N; Bispo, J; Ferreira, JC; Cardoso, JMP;

Publicação
IEEE MICRO

Abstract
As applications move to the edge, efficiency in computing power and power/energy consumption is required. Heterogeneous computing promises to meet these requirements through application-specific hardware accelerators. Runtime adaptivity might be of paramount importance to realize the potential of hardware specialization, but further study is required on workload retargeting and offloading to reconfigurable hardware. This article presents our framework for the exploration of both offloading and hardware generation techniques. The framework is currently able to process instruction sequences from MicroBlaze, ARMv8, and riscv32imaf binaries, and to represent them as Control and Dataflow Graphs for transformation to implementations of hardware modules. We illustrate the framework's capabilities for identifying binary sequences for hardware translation with a set of 13 benchmarks.

FecharLer Abstract

2021

Pedagogical Innovation in Pandemic Times: The Experience of a Microprocessor Programming Course

Autores
Lima, B; Granhao, D; Araujo, AJ; Ferreira, JC;

Publicação
2021 4TH INTERNATIONAL CONFERENCE OF THE PORTUGUESE SOCIETY FOR ENGINEERING EDUCATION (CISPEE)

Abstract
The 2019/2020 school year will always be remembered for the impact of the COVID-19 pandemic. For the first time in recent history, countries closed schools and forced instructors and students to quickly adjust to online classes. This sudden and forced shift to a method of teaching that was completely different from what we were used to presented several challenges and opportunities on a pedagogical level. In this paper we describe our experience as instructors in a course on microprocessor programming in the Master's Degree in Computer Science and Computing Engineering at the Faculty of Engineering of the University of Porto. Our approach included changes to the assessment plan, which became more distributed, and improvements in communication between students and instructors through the use of Slack. We found that the changes introduced were not only very well received by students, but also resulted in the best exam attendance and average final grade in the last 10 years of the course's history.

FecharLer Abstract

2021

On the Performance Effect of Loop Trace Window Size on Scheduling for Configurable Coarse Grain Loop Accelerators

Autores
Santos, T; Paulino, N; Bispo, J; Cardoso, JMP; Ferreira, JC;

Publicação
2021 INTERNATIONAL CONFERENCE ON FIELD-PROGRAMMABLE TECHNOLOGY (ICFPT)

Abstract
By using Dynamic Binary Translation, instruction traces from pre-compiled applications can be offloaded, at runtime, to FPGA-based accelerators, such as Coarse-Grained Loop Accelerators, in a transparent way. However, scheduling onto coarse-grain accelerators is challenging, with two of current known issues being the density of computations that can be mapped, and the effects of memory accesses on performance. Using an in-house framework for analysis of instruction traces, we explore the effect of different window sizes when applying list scheduling, to map the window operations to a coarse-grain loop accelerator model that has been previously experimentally validated. For all window sizes, we vary the number of ALUs and memory ports available in the model, and comment how these parameters affect the resulting latency. For a set of benchmarks taken from the PolyBench suite, compiled for the 32-bit MicroBlaze softcore, we have achieved an average iteration speedup of 5.10x for a basic block repeated 5 times and scheduled with 8 ALUs and memory ports, and an average speedup of 5.46x when not considering resource constraints. We also identify which benchmarks contribute to the difference between these two speedups, and breakdown their limiting factors. Finally, we reflect on the impact memory dependencies have on scheduling.

FecharLer Abstract