2016
Authors
Silva, B; Delbem, A; Bonato, V; Diniz, PC;
Publication
2015 International Conference on ReConFigurable Computing and FPGAs, ReConFig 2015
Abstract
Heterogeneous Multi-core Processors (HMPs) have the potential to outperform homogeneous multi-core processors in terms of raw performance and/or energy, as both their Instruction-Set Architectures (ISAs) and cache configurations (size and set associativity) can be tailored to the specific needs of a computation. In this paper we describe an algorithm for the generation of custom HMPs and a dynamic run-time core-mapping and scheduling policy that exploits the heterogeneity of such systems, customized with respect to the number of cores and L1 cache memory sizes. Experimental results targeting custom HMPs implemented on a contemporary Field-Programmable Gate Array (FPGA) with LEON3 cores as their base, for a set of 50 application codes, reveal that the proposed core-mapping and scheduling algorithms achieve an average execution-time improvement of 18.3% and a 12% energy reduction over static core-mapping and list-scheduling algorithms. © 2015 IEEE.
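The core-mapping idea can be illustrated with a minimal greedy sketch: each task goes to the core with the lowest predicted finish time, where the prediction reflects each core's relative speed. The function names, the per-core speed factors, and the earliest-finish cost model are illustrative assumptions, not the paper's actual algorithm.

```python
def map_tasks(tasks, cores):
    """tasks: list of (name, work) pairs; cores: dict core_id -> speed factor.

    Returns a mapping task name -> core_id using a greedy earliest-finish rule.
    """
    finish = {c: 0.0 for c in cores}          # current finish time per core
    mapping = {}
    for name, work in sorted(tasks, key=lambda t: -t[1]):  # longest task first
        # pick the core that would finish this task earliest
        best = min(cores, key=lambda c: finish[c] + work / cores[c])
        finish[best] += work / cores[best]
        mapping[name] = best
    return mapping

tasks = [("fft", 8.0), ("sort", 4.0), ("blur", 2.0)]
cores = {"big": 2.0, "little": 1.0}           # "big" core modeled as 2x faster
print(map_tasks(tasks, cores))                # heterogeneity-aware assignment
```

A real policy would also account for per-task ISA and cache affinity, which this sketch collapses into a single speed factor.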
2014
Authors
Hukerikar, S; Diniz, PC; Lucas, RF; Teranishi, K;
Publication
Proceedings of the 2014 International Conference on High Performance Computing and Simulation, HPCS 2014
Abstract
As the scale and complexity of future High Performance Computing systems continue to grow, the rising frequency of faults and errors and their impact on HPC applications will make it increasingly difficult to accomplish useful computation. Traditional means of fault detection and correction are either hardware based or use software-based redundancy. Redundancy-based approaches usually entail complete replication of the program state or the computation and therefore incur substantial overhead to application performance. Therefore, the wide-scale use of full redundancy in future exascale-class systems is not a viable solution for error detection and correction. In this paper we present an application-level fault detection approach that is based on adaptive redundant multithreading. Through a language-level directive, the programmer can define structured code blocks. When these blocks are executed by multiple threads and their outputs compared, we can detect errors in specific parts of the program state that will ultimately determine the correctness of the application outcome. The compiler outlines such code blocks, and a runtime system reasons whether their execution by redundant threads should be enabled or disabled by continuously observing and learning about the fault tolerance state of the system. By providing flexible building blocks for application-specific fault detection, our approach makes possible more reasonable performance overheads than full redundancy. Our results show that the overheads to application performance are in the range of 4% to 70%, because the runtime system is continuously aware of the rate and source of system faults, rather than the usual overhead in excess of 100% that is incurred by complete replication. © 2014 IEEE.
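The core mechanism, stripped of the compiler and runtime machinery, can be sketched as follows: the same code block runs in several threads and the outputs are compared to detect a silent error in any one execution. This is a minimal hedged sketch of the technique, not the paper's actual interface or runtime.

```python
import threading

def run_redundant(block, n_threads=2):
    """Execute `block` in n_threads threads and compare their results."""
    results = [None] * n_threads

    def worker(i):
        results[i] = block()

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # a mismatch between any two replicas signals a possible soft error
    if any(r != results[0] for r in results[1:]):
        raise RuntimeError("output mismatch: possible soft error detected")
    return results[0]

# a deterministic block executed redundantly; matching outputs pass the check
print(run_redundant(lambda: sum(range(100)), 2))  # prints 4950
```

In the paper's setting the replicated block is delimited by a directive and outlined by the compiler rather than passed explicitly as a callable.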
2017
Authors
Diniz, PC; Liao, C; Quinlan, DJ; Lucas, RF;
Publication
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Abstract
The most widely used resiliency approach today, based on Checkpoint and Restart (C/R) recovery, is not expected to remain viable in the presence of the accelerated fault and error rates in future Exascale-class systems. In this paper, we introduce a series of pragma directives and the corresponding source-to-source transformations that are designed to convey to a compiler, and ultimately to a fault-aware run-time system, key information about the tolerance to memory errors in selected sections of an application. These directives, implemented in the ROSE compiler infrastructure, convey information not only about storage mapping and error tolerance but also about amelioration and recovery using externally provided functions and multi-threading. We present preliminary results of the use of a subset of these directives for a simple implementation of the conjugate-gradient numerical solver in the presence of uncorrected memory errors, showing that it is possible to implement simple recovery strategies with very low programmer effort and execution time overhead. © Springer International Publishing AG 2017.
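One simple amelioration strategy of the kind such directives could express can be sketched as follows: if an uncorrected memory error corrupts an entry of a solver's working vector (modeled here as a NaN), substitute a programmer-supplied recovery value instead of aborting. This is an assumed illustration; the function names and the NaN corruption model are not from the paper.

```python
import math

def ameliorate(vec, recover=lambda i: 0.0):
    """Scan a working vector; replace corrupted (NaN) entries via `recover`.

    Returns the number of entries repaired.
    """
    repaired = 0
    for i, x in enumerate(vec):
        if math.isnan(x):
            vec[i] = recover(i)   # amelioration: substitute a safe value
            repaired += 1
    return repaired

x = [1.0, float("nan"), 3.0]      # one entry hit by an uncorrected memory error
print(ameliorate(x), x)           # 1 repaired entry; vector becomes [1.0, 0.0, 3.0]
```

For an iterative solver like conjugate gradient, a substituted value merely perturbs the current iterate, and subsequent iterations can drive the residual back down, which is what makes such cheap recovery viable.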
2013
Authors
Hukerikar, S; Diniz, PC; Lucas, RF;
Publication
2013 IEEE High Performance Extreme Computing Conference, HPEC 2013
Abstract
Emerging large-scale, data-intensive applications that use the graph abstraction to represent problems in a broad spectrum of scientific and analytics applications have radically different features from floating-point-intensive scientific applications. These complex graph applications, besides having large working datasets, exhibit very low spatial and temporal locality, which makes designing algorithmic fault tolerance for them quite challenging. They will run on future exascale-class High Performance Computing (HPC) systems that will contain a massive number of components and will be built from devices far less reliable than those used today. In this paper we propose software-based approaches that increase robustness for these data-intensive, graph-based applications by managing the resiliency in terms of the data flow progress and validation of pointer computations. Our experimental results show that such a simple approach incurs fairly low execution time overheads while allowing these computations to survive a large number of faults that would otherwise always result in the termination of the computation. © 2013 IEEE.
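The pointer-validation idea can be sketched as follows (an assumed illustration, not the paper's code): in a graph traversal, an edge target is dereferenced only if it is a valid vertex index, so a fault-corrupted edge is skipped rather than terminating the computation.

```python
def robust_bfs(adj, start):
    """BFS over adjacency lists, skipping out-of-range (corrupted) targets.

    Returns the visit order and the number of corrupted edges skipped.
    """
    n = len(adj)
    seen, frontier, skipped = {start}, [start], 0
    order = []
    while frontier:
        v = frontier.pop(0)
        order.append(v)
        for u in adj[v]:
            if not (0 <= u < n):        # pointer/index validation
                skipped += 1            # corrupted edge: note it and continue
                continue
            if u not in seen:
                seen.add(u)
                frontier.append(u)
    return order, skipped

adj = [[1, 999999], [2], []]            # edge 0 -> 999999 models a corrupted pointer
print(robust_bfs(adj, 0))               # traversal survives: ([0, 1, 2], 1)
```

The same guard generalizes from list indices to raw pointers by checking that a computed address falls within the graph's allocated memory regions before dereferencing it.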
2018
Authors
Hukerikar, S; Teranishi, K; Diniz, PC; Lucas, RF;
Publication
International Journal of Parallel Programming
Abstract
In the presence of accelerated fault rates, which are projected to be the norm on future exascale systems, it will become increasingly difficult for high-performance computing (HPC) applications to accomplish useful computation. Due to the fault-oblivious nature of current HPC programming paradigms and execution environments, HPC applications are insufficiently equipped to deal with errors. We believe that HPC applications should be enabled with capabilities to actively search for and correct errors in their computations. The redundant multithreading (RMT) approach offers lightweight replicated execution streams of program instructions within the context of a single application process. However, the use of complete redundancy incurs significant overhead to the application performance. In this paper we present RedThreads, an interface that provides application-level fault detection and correction based on RMT, but applies the thread-level redundancy adaptively. We describe the RedThreads syntax and semantics, and the supporting compiler infrastructure and runtime system. Our approach enables application programmers to scope the extent of redundant computation. Additionally, the runtime system permits the use of RMT to be dynamically enabled, or disabled, based on the resiliency needs of the application and the state of the system. Our experimental results demonstrate how adaptive RMT exploits programmer insight and runtime inference to dynamically navigate the trade-off space between an application’s resilience coverage and the associated performance overhead of redundant computation. © 2017, Springer Science+Business Media New York.
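The adaptive aspect can be sketched separately from the RedThreads API itself (which this code does not reproduce): a runtime maintains an estimate of the system fault rate and enables redundant execution of a scoped block only when that estimate crosses a threshold. The threshold, the moving-average update rule, and all names here are assumptions for illustration.

```python
class AdaptiveRMT:
    """Toy runtime that turns redundant execution on/off from observed faults."""

    def __init__(self, threshold=0.1, decay=0.9):
        self.rate = 0.0               # smoothed estimate of the fault rate
        self.threshold = threshold
        self.decay = decay

    def observe(self, fault_seen):
        # exponential moving average over fault observations
        self.rate = self.decay * self.rate + (1 - self.decay) * float(fault_seen)

    def execute(self, block):
        if self.rate < self.threshold:
            return block()                   # redundancy disabled: run once
        a, b = block(), block()              # redundancy enabled: run twice
        if a != b:
            raise RuntimeError("mismatch: fault detected in scoped block")
        return a

rmt = AdaptiveRMT()
print(rmt.execute(lambda: 2 + 2))  # low fault rate: single execution, prints 4
for _ in range(30):
    rmt.observe(True)              # faults observed: estimate rises above threshold
print(rmt.rate > rmt.threshold)    # prints True; later blocks run redundantly
```

This captures the trade-off the paper navigates: when the system looks healthy the block pays no redundancy cost, and coverage is bought back only when the inferred fault rate justifies it.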
2013
Authors
Hukerikar, S; Diniz, PC; Lucas, RF;
Publication
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Abstract
The challenge of resilience is becoming increasingly important on the path to exascale capability in High Performance Computing (HPC) systems. With clock frequencies unlikely to increase as aggressively as they have in the past, future large-scale HPC systems aspiring to exaflop capability will need an exponential increase in the count of the ALUs and memory modules deployed in their design [Kogge 2008]. The Mean Time to Failure (MTTF) of the system, however, scales inversely with the number of components in the system. Furthermore, these systems will be constructed using devices that are far less reliable than those used today, as transistor geometries shrink and failures due to chip-manufacturing variability, transistor aging, and transient soft errors become more prevalent. Therefore, the sheer scale of future exascale supercomputers, together with the shrinking VLSI geometries, will conspire to make faults and failures increasingly the norm rather than the exception. © 2013 Springer-Verlag.