About

I received my Master's Degree from FEUP (Faculdade de Engenharia da Universidade do Porto), in Electrical and Computer Engineering. My thesis was titled Generation of Reconfigurable Circuits from Machine Code, a work which continued throughout my PhD in Electrical and Computer Engineering, also at FEUP, and in association with INESC-TEC.

Having completed my PhD thesis, Generation of Custom Run-time Reconfigurable Hardware for Transparent Binary Acceleration, I am now a post-doc researcher with INESC-TEC on the topic of special compilers for hardware, and also an Auxiliary Assistant Professor with the Department of Informatics at FEUP.

Details

  • Name

    Nuno Miguel Paulino
  • Role

    Assistant Researcher
  • Since

    1st July 2012
Publications

2024

A DSL and MLIR Dialect for Streaming and Vectorisation

Authors
da Silva, MC; Sousa, L; Paulino, N; Bispo, J;

Publication
APPLIED RECONFIGURABLE COMPUTING. ARCHITECTURES, TOOLS, AND APPLICATIONS, ARC 2024

Abstract
This work addresses the contemporary challenges in computing, caused by the stagnation of Moore's Law and Dennard scaling. The shift towards heterogeneous architectures necessitates innovative compilation strategies, prompting initiatives like the Multi-Level Intermediate Representation (MLIR) project, where progressive code lowering can be achieved through the use of dialects. Our work focuses on developing an MLIR dialect capable of representing streaming data accesses to memory, and Single Instruction Multiple Data (SIMD) vector operations. We also propose our own Structured Representation Language (SRL), a Design Specific Language (DSL) to serve as a precursor into the MLIR layer and subsequent inter-operation between new and existing dialects. The SRL exposes the streaming and vector computational concepts at a higher level, and serves as an intermediate step to supporting code generation containing our proposed dialect from arbitrary input code, which we leave as future work. This paper presents the syntaxes of the SRL DSL and of the dialect, and illustrates how we aim to employ them to target both General-Purpose Processors (GPPs) with SIMD co-processors and custom hardware options such as Field-Programmable Gate Arrays (FPGAs) and Coarse-Grained Re-configurable Arrays (CGRAs).

2024

Vision-Radio Experimental Infrastructure Architecture Towards 6G

Authors
Teixeira, FB; Ricardo, M; Coelho, A; Oliveira, HP; Viana, P; Paulino, N; Fontes, H; Marques, P; Campos, R; Pessoa, LM;

Publication
CoRR

Abstract

2024

Using Source-to-Source to Target RISC-V Custom Extensions: UVE Case-Study

Authors
Henriques, M; Bispo, J; Paulino, N;

Publication
PROCEEDINGS OF THE RAPIDO 2024 WORKSHOP, HIPEAC 2024

Abstract
Hardware specialization is seen as a promising avenue for improving computing efficiency, with reconfigurable devices as excellent deployment platforms for application-specific architectures. One approach to hardware specialization is via the popular RISC-V, where Instruction Set Architecture (ISA) extensions for domains such as Edge Artificial Intelligence (AI) are already appearing. However, to use the custom instructions while maintaining a high (e.g., C/C++) abstraction level, the assembler and compiler must be modified. Alternatively, inline assembly can be manually introduced by a software developer with expert knowledge of the hardware modifications in the RISC-V core. In this paper, we consider a RISC-V core with a vectorization and streaming engine to support the Unlimited Vector Extension (UVE), and propose an approach to automatically transform annotated C loops into UVE-compatible code, via automatic insertion of inline assembly. We rely on a source-to-source transformation tool, Clava, to perform sophisticated code analysis and transformations via scripts. We use pragmas to identify code sections amenable to vectorization and/or streaming, and use Clava to automatically insert inline UVE instructions, avoiding extensive modifications of existing compiler projects. We produce UVE binaries which are functionally correct, when compared to handwritten versions with inline assembly, and achieve an equal and sometimes improved number of executed instructions, for a set of six benchmarks from the Polybench suite. These initial results are evidence that this kind of translation is feasible, and we consider that it is possible in future work to target more complex transformations or other ISA extensions, accelerating the adoption of hardware/software co-design flows for generic application cases.

2023

Self-Localization via Circular Bluetooth 5.1 Antenna Array Receiver

Authors
Paulino, N; Pessoa, LM;

Publication
IEEE ACCESS

Abstract
Future telecommunications aim to be ubiquitous and efficient, as widely deployed connectivity will allow for a variety of edge/fog based services. Challenges are numerous, e.g., spectrum overuse, energy efficiency, latency and bandwidth, battery life and computing power of edge devices. Addressing these challenges is key to composing the backbone for the future Internet-of-Things (IoT). Among IoT applications are Indoor Positioning Systems and indoor Real-Time Location Systems, which are needed where GPS is unviable. The Bluetooth Low Energy (BLE) 5.1 specification introduced Direction Finding to the protocol, allowing for BLE devices with antenna arrays to derive the Angle-of-Arrival (AoA) of transmissions. Well-known algorithms for AoA calculation are computationally demanding, so recent works have addressed this, since the low cost of BLE devices may provide efficient solutions for indoor localization. In this paper, we present a system topology and algorithms for self-localization where a receiver with an antenna array utilizes the AoAs from fixed battery powered beacons to self-localize, without a centralized system or wall-power infrastructure. We conduct two main experiments using a BLE receiver of our own design. Firstly, we validate the expected behaviour in an anechoic chamber, computing the AoA with an RMSE of 10.7 degrees. Secondly, we conduct a test in an outdoor area of 12 by 12 meters using four beacons, and present pre-processing steps prior to computing the AoAs, followed by position estimations achieving a mean absolute error of 3.6 m for 21 map positions, with a minimum as low as 1.1 m.

2023

Challenges and Opportunities in C/C++ Source-To-Source Compilation (Invited Paper)

Authors
Bispo, J; Paulino, N; Sousa, LM;

Publication
14th Workshop on Parallel Programming and Run-Time Management Techniques for Many-Core Architectures and 12th Workshop on Design Tools and Architectures for Multicore Embedded Computing Platforms, PARMA-DITAM 2023, January 17, 2023, Toulouse, France.

Abstract
The C/C++ compilation stack (Intermediate Representations (IRs), compilation passes and backends) is encumbered by a steep learning curve, which we believe can be lowered by complementing it with approaches such as source-to-source compilation. Source-to-source compilation is a technology that is widely used and quite mature in certain programming environments, such as JavaScript, but that faces a low adoption rate in others. In the particular case of C and C++ some of the identified factors include the high complexity of the languages, increased difficulty in building and maintaining C/C++ parsers, or limitations on using source code as an intermediate representation. Additionally, new technologies such as Multi-Level Intermediate Representation (MLIR) have appeared as potential competitors to source-to-source compilers at this level. In this paper, we present what we have identified as current challenges of source-to-source compilation of C and C++, as well as what we consider to be opportunities and possible directions forward. We also present several examples, implemented on top of the Clava source-to-source compiler, that use some of these ideas and techniques to raise the abstraction level of compiler research on complex compiled languages such as C or C++. The examples include automatic parallelization of for loops, high-level synthesis optimisation, hardware/software partitioning with run-time decisions, and automatic insertion of inline assembly for fast prototyping of custom instructions. © João Bispo, Nuno Paulino, and Luís Miguel Sousa.

Supervised theses

2021

Aplicação de tecnologias digitais e ludificação ao acolhimento e integração de novos colaboradores.

Author
Pedro Miguel de Serpa Pinto Pereira Gomes

Institution
UP-FEUP

2021

Supply Chain tracking and management with Distributed Ledger

Author
João Malheiro de Sousa

Institution
UP-FEUP

2020

Controlo avançado de sistemas híbridos de armazenamento de energia

Author
Hélder João Loureiro Pereira

Institution
UP-FEUP

2020

Automotive Interior Sensing - Temporal Consistent Human Body Pose Estimation

Author
José Martinho Oliveira Peres

Institution
UP-FEUP

2018

A holistic order allocation strategy for profitability maximization: a simulation study

Author
João Marinho Alves

Institution
UP-FEUP