
Publications by HumanISE

2021

An ensemble of autonomous auto-encoders for human activity recognition

Authors
Garcia, KD; de Sa, CR; Poel, M; Carvalho, T; Mendes Moreira, J; Cardoso, JMP; de Carvalho, ACPLF; Kok, JN;

Publication
NEUROCOMPUTING

Abstract
Human Activity Recognition is focused on the use of sensing technology to classify human activities and to infer human behavior. While traditional machine learning approaches use hand-crafted features to train their models, recent advancements in neural networks allow for automatic feature extraction. Auto-encoders are a type of neural network that can learn complex representations of the data and are commonly used for anomaly detection. In this work, we propose a novel multi-class algorithm consisting of an ensemble of auto-encoders, where each auto-encoder is associated with a unique class. We compared the proposed approach with other state-of-the-art approaches in the context of human activity recognition. Experimental results show that ensembles of auto-encoders can be efficient, robust, and competitive. Moreover, this modular classifier structure allows for more flexible models: for example, the number of classes can be extended by adding new auto-encoders, without retraining the whole model. (c) 2021 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
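The classification scheme in the abstract can be illustrated with a minimal sketch. This is not the paper's architecture: the `LinearAutoEncoder` below is just a PCA-style linear encoder/decoder, and the feature dimensions and class subspaces are invented for the example. It only shows the ensemble idea: train one auto-encoder per class, predict the class whose auto-encoder reconstructs the sample with the lowest error, and add new classes without retraining the rest.

```python
import numpy as np

class LinearAutoEncoder:
    """Toy linear auto-encoder: encode/decode via the top-k principal components."""
    def __init__(self, k=2):
        self.k = k

    def fit(self, X):
        self.mean = X.mean(axis=0)
        # SVD of the centred data gives the optimal k-dimensional linear code
        _, _, vt = np.linalg.svd(X - self.mean, full_matrices=False)
        self.components = vt[:self.k]
        return self

    def reconstruction_error(self, x):
        z = (x - self.mean) @ self.components.T      # encode
        x_hat = z @ self.components + self.mean      # decode
        return float(np.sum((x - x_hat) ** 2))

class AutoEncoderEnsemble:
    """One auto-encoder per class; predict the class whose model reconstructs best."""
    def __init__(self, k=2):
        self.k = k
        self.models = {}

    def fit(self, X, y):
        for label in np.unique(y):
            self.models[label] = LinearAutoEncoder(self.k).fit(X[y == label])
        return self

    def add_class(self, label, X):
        # A new class only needs its own auto-encoder; existing ones are untouched
        self.models[label] = LinearAutoEncoder(self.k).fit(X)

    def predict(self, x):
        return min(self.models, key=lambda c: self.models[c].reconstruction_error(x))
```

The `add_class` method mirrors the modularity claim: extending the label set is a local operation, not a global retraining pass.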

2021

Formal verification of Matrix based MATLAB models using interactive theorem proving

Authors
Gauhar, A; Rashid, A; Hasan, O; Bispo, J; Cardoso, JMP;

Publication
PEERJ COMPUTER SCIENCE

Abstract
MATLAB is a software based analysis environment that supports a high-level programing language and is widely used to model and analyze systems in various domains of engineering and sciences. Traditionally, the analysis of MATLAB models is done using simulation and debugging/testing frameworks. These methods provide limited coverage due to their inherent incompleteness. Formal verification can overcome these limitations, but developing the formal models of the underlying MATLAB models is a very challenging and time-consuming task, especially in the case of higher-order-logic models. To facilitate this process, we present a library of higher-order-logic functions corresponding to the commonly used matrix functions of MATLAB as well as a translator that allows automatic conversion of MATLAB models to higher-order logic. The formal models can then be formally verified in an interactive theorem prover. For illustrating the usefulness of the proposed library and approach, we present the formal analysis of a Finite Impulse Response (FIR) filter, which is quite commonly used in digital signal processing applications, within the sound core of the HOL Light theorem prover.
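The FIR case study verifies that an implementation meets its mathematical specification. The hypothetical Python sketch below only illustrates the kind of correspondence being established: a direct-form implementation with a delay line versus the convolution-sum definition. Testing them against each other, as done here, checks finitely many inputs; the point of the HOL Light development in the paper is to prove the equivalence for all inputs.

```python
def fir_direct(coeffs, x):
    """Direct-form FIR filter using a shifting delay line of past samples."""
    taps = [0] * len(coeffs)
    out = []
    for sample in x:
        taps = [sample] + taps[:-1]                 # shift in the newest sample
        out.append(sum(c * t for c, t in zip(coeffs, taps)))
    return out

def fir_spec(coeffs, x):
    """Specification, written directly from the definition:
    y[n] = sum_k coeffs[k] * x[n - k] (terms outside the input range are zero)."""
    return [sum(coeffs[k] * x[n - k]
                for k in range(len(coeffs)) if 0 <= n - k < len(x))
            for n in range(len(x))]
```

In the theorem-proving setting, `fir_spec` would be the higher-order-logic definition and the proof obligation would be that the implementation equals it universally, not just on sampled inputs.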

2021

A methodology and framework for software memoization of functions

Authors
Pinto, P; Cardoso, JMP;

Publication
CF '21: Computing Frontiers Conference, Virtual Event, Italy, May 11-13, 2021

Abstract
Enhancing performance is crucial when developing applications for high-performance and embedded computing. It requires sophisticated techniques and in-depth knowledge of the application domain and target architecture. Typically, developers prioritize the application's functional requirements over extra-functional requirements. Thus, a large part of the optimization effort is shifted to performance engineers, who rely on manual effort, alongside many analysis and optimization tools that need integration. This paper focuses on memoization, which caches results of pure computations and retrieves them if a function is called with repeating arguments. We propose a methodology for allowing developers and performance engineers to apply memoization straightforwardly by automating code analysis, code transformations, and memoization-specific profiling. It helps developers with no optimization expertise to quickly set up memoization and, simultaneously, it provides performance engineers with highly customizable analysis and memoization. We provide a concrete implementation supported by a DSL, a source-to-source compiler, and a memoization framework. We evaluate the methodology and framework with publicly available benchmarks. We show how one can analyze applications to select functions with performance improvement potential, which the experiments reveal might be challenging to find, and improve some applications with minimal effort. © 2021 ACM.
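The paper's framework targets C/C++ through a DSL and a source-to-source compiler, but the core mechanism it automates can be sketched in a few lines of Python. This is a generic memoization decorator, not the paper's implementation; the hit/miss counter is a stand-in for the memoization-specific profiling the abstract mentions, which helps decide whether a function's arguments repeat often enough to be worth caching.

```python
import functools

def memoize(func):
    """Cache results of a pure function, keyed by its arguments."""
    cache = {}
    stats = {"hits": 0, "misses": 0}

    @functools.wraps(func)
    def wrapper(*args):
        if args in cache:
            stats["hits"] += 1        # repeating arguments: reuse the stored result
        else:
            stats["misses"] += 1
            cache[args] = func(*args)
        return cache[args]

    wrapper.stats = stats             # profiling data: is this function worth memoizing?
    return wrapper

@memoize
def slow_square(x):
    return x * x
```

A high miss ratio in `stats` would indicate a poor memoization candidate, which matches the abstract's observation that profitable functions can be challenging to find.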

2021

A Binary Translation Framework for Automated Hardware Generation

Authors
Paulino, N; Bispo, J; Ferreira, JC; Cardoso, JMP;

Publication
IEEE MICRO

Abstract
As applications move to the edge, efficiency in computing power and power/energy consumption is required. Heterogeneous computing promises to meet these requirements through application-specific hardware accelerators. Runtime adaptivity might be of paramount importance to realize the potential of hardware specialization, but further study is required on workload retargeting and offloading to reconfigurable hardware. This article presents our framework for the exploration of both offloading and hardware generation techniques. The framework is currently able to process instruction sequences from MicroBlaze, ARMv8, and riscv32imaf binaries, and to represent them as Control and Dataflow Graphs for transformation to implementations of hardware modules. We illustrate the framework's capabilities for identifying binary sequences for hardware translation with a set of 13 benchmarks.
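The first step the framework performs, turning an instruction sequence into a dataflow graph, can be sketched with a toy example. The three-address pseudo-ISA below is invented for illustration (the framework itself processes MicroBlaze, ARMv8, and riscv32imaf binaries); the sketch only shows the def-use analysis that produces the graph edges.

```python
def build_dfg(instructions):
    """Build dataflow edges from a toy three-address instruction sequence.

    Each instruction is (dest, op, src1, src2). An edge (i, j) means
    instruction j consumes the value produced by instruction i.
    """
    last_def = {}       # register -> index of the instruction that last defined it
    edges = set()
    for i, (dest, _op, *srcs) in enumerate(instructions):
        for src in srcs:
            if src in last_def:
                edges.add((last_def[src], i))
            # registers with no prior definition are live-in values of the sequence
        last_def[dest] = i
    return edges

# r2 = r0 + r1 ; r3 = r2 * r0 ; r4 = r3 - r2
program = [("r2", "add", "r0", "r1"),
           ("r3", "mul", "r2", "r0"),
           ("r4", "sub", "r3", "r2")]
```

The resulting graph is the starting point for the transformations the abstract describes; control-flow edges, which the framework's Control and Dataflow Graphs also carry, are omitted here for brevity.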

2021

On Data Parallelism Code Restructuring for HLS Targeting FPGAs

Authors
Campos, R; Cardoso, JMP;

Publication
2021 IEEE INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM WORKSHOPS (IPDPSW)

Abstract
FPGAs have emerged as hardware accelerators, and in the last decade, researchers have proposed new languages and frameworks to improve the efficiency of mapping computations to FPGAs. One of the main tasks when mapping software code to FPGAs is code restructuring. Code restructuring is of paramount importance to achieve efficient FPGA-based accelerators, and its automation continues to be a challenge. This paper describes our recent work on techniques to automatically restructure and annotate C code with directives optimized for HLS targeting FPGAs. Our approach takes as input an unfolded dataflow graph (DFG), currently obtained from a trace of the program's execution, and produces restructured C code with HLS directives as output. Specifically, in this paper we propose algorithms to optimize the input DFGs and use isomorphic graph detection to expose data-level parallelism. The experimental results show that our approach is able to generate efficient FPGA implementations, with significant speedups over the unmodified input source codes, and very competitive with implementations obtained by manual optimizations and by previous approaches. Furthermore, the experiments show that, using our approach, it is possible to extract data-parallelism in linear to quadratic time with respect to the number of nodes of the input DFG.
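The isomorphic graph detection mentioned in the abstract can be illustrated with a canonical-signature sketch. This is not the paper's algorithm: the toy below handles only tree-shaped DFG regions, treats operands as unordered (safe only for commutative operations), and uses hypothetical node names. It shows the underlying idea: subgraphs with identical canonical signatures compute the same function on different data, so they are candidates for data-parallel replication in hardware.

```python
def signature(node, graph, ops):
    """Canonical signature of the subgraph rooted at `node`: the node's
    operation plus the sorted signatures of its operand subgraphs."""
    children = sorted(signature(c, graph, ops) for c in graph.get(node, []))
    return (ops[node], tuple(children))

def parallel_groups(graph, ops, roots):
    """Group roots whose subgraphs are isomorphic (equal canonical signatures):
    candidates for data-level parallelism."""
    groups = {}
    for r in roots:
        groups.setdefault(signature(r, graph, ops), []).append(r)
    return [g for g in groups.values() if len(g) > 1]

# Two structurally identical multiplications over different loads
graph = {"m1": ["a", "b"], "m2": ["c", "d"]}
ops = {"a": "load", "b": "load", "c": "load", "d": "load",
       "m1": "mul", "m2": "mul"}
```

Grouping by precomputed signatures is what keeps detection cheap; a single pass over the nodes is consistent with the linear-to-quadratic complexity the abstract reports, though the paper's actual algorithm operates on general unfolded DFGs.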

2021

On the Performance Effect of Loop Trace Window Size on Scheduling for Configurable Coarse Grain Loop Accelerators

Authors
Santos, T; Paulino, N; Bispo, J; Cardoso, JMP; Ferreira, JC;

Publication
2021 INTERNATIONAL CONFERENCE ON FIELD-PROGRAMMABLE TECHNOLOGY (ICFPT)

Abstract
By using Dynamic Binary Translation, instruction traces from pre-compiled applications can be offloaded, at runtime, to FPGA-based accelerators, such as Coarse-Grained Loop Accelerators, in a transparent way. However, scheduling onto coarse-grain accelerators is challenging, with two of the currently known issues being the density of computations that can be mapped, and the effects of memory accesses on performance. Using an in-house framework for analysis of instruction traces, we explore the effect of different window sizes when applying list scheduling, to map the window operations to a coarse-grain loop accelerator model that has been previously experimentally validated. For all window sizes, we vary the number of ALUs and memory ports available in the model, and comment on how these parameters affect the resulting latency. For a set of benchmarks taken from the PolyBench suite, compiled for the 32-bit MicroBlaze softcore, we achieved an average iteration speedup of 5.10x for a basic block repeated 5 times and scheduled with 8 ALUs and memory ports, and an average speedup of 5.46x when not considering resource constraints. We also identify which benchmarks contribute to the difference between these two speedups, and break down their limiting factors. Finally, we reflect on the impact memory dependencies have on scheduling.
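The resource-constrained list scheduling explored in the paper can be sketched generically. This is not the validated accelerator model from the paper: the sketch below assumes single-cycle latency for every operation and an invented two-kind resource model (ALUs and memory ports), but it shows how varying those two parameters changes the schedule length, which is the experiment the abstract describes.

```python
def list_schedule(nodes, deps, n_alus, n_mem_ports):
    """Cycle-by-cycle list scheduling under resource constraints.

    nodes: {name: "alu" | "mem"}; deps: {name: set of predecessor names}.
    Each cycle issues at most n_alus ALU ops and n_mem_ports memory ops
    whose predecessors have all completed (single-cycle latency assumed).
    Returns {name: issue cycle}.
    """
    cycle, done, sched = 0, set(), {}
    while len(done) < len(nodes):
        ready = [n for n in nodes if n not in done and deps.get(n, set()) <= done]
        budget = {"alu": n_alus, "mem": n_mem_ports}
        issued = []
        for n in ready:
            if budget[nodes[n]] > 0:      # a free unit of the right kind exists
                budget[nodes[n]] -= 1
                sched[n] = cycle
                issued.append(n)
        if not issued:
            raise ValueError("dependency cycle: no schedulable operation")
        done.update(issued)
        cycle += 1
    return sched

# load a ; load b ; add = a + b ; store add
nodes = {"ld1": "mem", "ld2": "mem", "add": "alu", "st": "mem"}
deps = {"add": {"ld1", "ld2"}, "st": {"add"}}
```

With one memory port the two loads serialize; with two ports they issue in the same cycle and the schedule shortens by one cycle, a miniature of the ALU/memory-port sweep reported in the paper.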
