Publications

Publications by CTM

2017

Generation of Customized Accelerators for Loop Pipelining of Binary Instruction Traces

Authors
Paulino, NMC; Ferreira, JC; Cardoso, JMP;

Publication
IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS

Abstract
Many embedded applications process large amounts of data using regular computational kernels, amenable to acceleration by specialized hardware coprocessors. To reduce the significant design effort, the dedicated hardware may be automatically generated, usually starting from the application's source or binary code. This paper presents a modulo-scheduled loop accelerator capable of executing multiple loops and a supporting toolchain. A generation/scheduling procedure, which fully relies on MicroBlaze instruction traces, produces accelerator instances, customized in terms of functional units and interconnections. The accelerators support integer and single-precision floating-point arithmetic, and exploit instruction-level parallelism, loop pipelining, and memory access parallelism via two read/write ports. A complete implementation of the proposed architecture is evaluated in a Virtex-7 device. Augmenting a MicroBlaze processor with a tailored accelerator achieves a geometric mean speedup, over software-only execution, of 6.61x for 13 floating-point kernels from the Livermore Loops set, and of 4.08x for 11 integer kernels from Texas Instruments' IMGLIB. The proposed customized accelerators are compared with ALU-based ones. The average specialized accelerator requires only 0.47x the number of field-programmable gate array slices of an accelerator with four ALUs. A geometric mean speedup of 1.78x over a four-issue very long instruction word (without floating-point support) was obtained for the integer kernels.

2017

Evaluation of CGRA architecture for real-time processing of biological signals on wearable devices

Authors
Lopes, J; Sousa, D; Ferreira, JC;

Publication
2017 INTERNATIONAL CONFERENCE ON RECONFIGURABLE COMPUTING AND FPGAS (RECONFIG)

Abstract
This paper describes the design and implementation of a coarse-grained reconfigurable array (CGRA) for low-power biological signal processing. It uses a use-case-driven approach which explores the application domain and gathers common requirements. The selected CGRA core architecture is implemented using a standard-cell flow (in a generic 90 nm CMOS process), so that the CGRA can be totally or partially turned off by power gating. The selected CGRA design is evaluated for two use cases using layout information and accurate node activity information. The resulting accelerator is capable of performing various signal processing tasks very efficiently, achieving an average energy consumption of 19.9 pJ/cycle (or 1.99 mW at 100 MHz). Static power consumption for less intensive tasks can be reduced by using only some sections of the CGRA while powering off others.

2017

MICPRO DSD 2015 special issue

Authors
Ferreira, JC; Kitsos, P;

Publication
MICROPROCESSORS AND MICROSYSTEMS

Abstract

2017

FPGA-based Implementation of a Frequency Spreading FBMC-OQAM Baseband Modulator

Authors
Carvalho, M; Ferreira, ML; Ferreira, JC;

Publication
2017 24TH IEEE INTERNATIONAL CONFERENCE ON ELECTRONICS, CIRCUITS AND SYSTEMS (ICECS)

Abstract
Filter-bank Multicarrier (FBMC) modulation has been proposed as a 5G waveform candidate due to its better spectral efficiency and lower out-of-band emissions compared to OFDM. This paper presents an FPGA-based implementation of a Frequency Spreading FBMC-OQAM baseband modulator and evaluates it in terms of performance, resource utilization and power consumption. The proposed system is then compared with published Polyphase Network (PPN) FBMC-OQAM designs, focusing on resource utilization. The results suggest that the higher computational complexity of FS-FBMC systems does not directly result in higher resource utilization, which makes FS-FBMC a convenient scheme for implementing FBMC designs on FPGA.

2017

Towards a Type 0 Hypervisor for Dynamic Reconfigurable Systems

Authors
Janssen, B; Korkmaz, F; Derya, H; Huebner, M; Ferreira, ML; Ferreira, JC;

Publication
2017 INTERNATIONAL CONFERENCE ON RECONFIGURABLE COMPUTING AND FPGAS (RECONFIG)

Abstract
The use of application-specific hardware based on Field-Programmable Gate Arrays (FPGAs) has proven its benefits. Current systems-on-chip that contain FPGA fabric supporting dynamic partial reconfiguration enable dynamic hardware acceleration for hardware/software co-designs. With the trend to consolidate multiple computing systems into a single system, applications with mixed criticalities can come into conflict. With our approach, we explore the possibility of using dedicated hardware for system management and of benefiting from possible parallelization of applications and system management tasks.

2017

Towards a type 0 hypervisor for dynamic reconfigurable systems

Authors
Janßen, B; Korkmaz, F; Derya, H; Hübner, M; Ferreira, ML; Ferreira, JC;

Publication
International Conference on ReConFigurable Computing and FPGAs, ReConFig 2017, Cancun, Mexico, December 4-6, 2017

Abstract
