2007
Authors
Chen, C; Chame, J; Nelson, YL; Diniz, P; Hall, M; Lucas, R;
Publication
Journal of Physics: Conference Series
Abstract
The enormous and growing complexity of today's high-end systems has increased the already significant challenge of maximizing performance on equally complex scientific applications. In this paper, we discuss the role of compiler technology in supporting application developers in a systematic approach to performance tuning of key application components. Based on scenarios taken from manual optimization of scientific codes, we describe how compiler support can enable the programmer to achieve the same or better performance in a much more productive way. We also present examples derived automatically through compiler optimization that achieve results comparable to hand-tuned performance. © 2007 IOP Publishing Ltd.
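As a concrete illustration of the kind of manual optimization such compiler support can automate, consider loop tiling. The sketch below is my example, not taken from the paper; the function name and tile-size parameter are illustrative. The tile size bs is exactly the sort of parameter a compiler-assisted tuning tool could sweep automatically:

    #include <algorithm>

    constexpr int N = 1024;

    // Tiled matrix multiply: operate on bs x bs blocks to improve cache
    // reuse. Assumes C has been zero-initialized by the caller.
    void matmul_tiled(const double* A, const double* B, double* C, int bs) {
        for (int ii = 0; ii < N; ii += bs)
            for (int kk = 0; kk < N; kk += bs)
                for (int jj = 0; jj < N; jj += bs)
                    for (int i = ii; i < std::min(ii + bs, N); ++i)
                        for (int k = kk; k < std::min(kk + bs, N); ++k)
                            for (int j = jj; j < std::min(jj + bs, N); ++j)
                                C[i * N + j] += A[i * N + k] * B[k * N + j];
    }

Hand-tuners typically try several tile sizes and keep the fastest; compiler support of the kind the paper describes can run that search systematically.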
1996
Authors
Rinard, MC; Diniz, PC;
Publication
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Abstract
This paper presents the semantic foundations of commutativity analysis, an analysis technique for automatically parallelizing programs written in a sequential, imperative programming language. Commutativity analysis views the computation as composed of operations on objects. It then analyzes the program at this granularity to discover when operations commute (i.e. generate the same result regardless of the order in which they execute). If all of the operations required to perform a given computation commute, the compiler can automatically generate parallel code. This paper shows that the basic analysis technique is sound. We have implemented a parallelizing compiler that uses commutativity analysis as its basic analysis technique; this paper also presents performance results from two automatically parallelized applications. © Springer-Verlag Berlin Heidelberg 1996.
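A minimal sketch of the property the analysis detects (my example, not one from the paper): the two deposit() invocations below leave the object in the same final state in either execution order, so a compiler using commutativity analysis may execute them in parallel.

    // Invocations of deposit() commute: += yields the same final balance
    // regardless of the order in which the calls execute.
    class Account {
        double balance = 0.0;
    public:
        void deposit(double amount) { balance += amount; }
        double get() const { return balance; }
    };
    // deposit(5) then deposit(7), or deposit(7) then deposit(5):
    // either order ends with balance == 12.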
1996
Authors
Ibarra, O; Diniz, P; Rinard, M;
Publication
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Abstract
Two operations commute if they generate the same result regardless of the order in which they execute. Commutativity is an important property — commuting operations enable significant optimizations in the fields of parallel computing, optimizing compilers, parallelizing compilers and database concurrency control. Algorithms that statically decide if operations commute can be an important component of systems in these fields because they enable the automatic application of these optimizations. In this paper we define the commutativity decision problem and establish its complexity for a variety of basic instructions and control constructs. Although deciding commutativity is, in general, undecidable or computationally intractable, we believe that efficient algorithms exist that can solve many of the cases that arise in practice. © Springer-Verlag Berlin Heidelberg 1996.
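A small worked example (mine, not the paper's) of why commutativity hinges on the algebraic structure of the instructions involved: two additive updates commute, while an additive and a multiplicative update do not.

    #include <cassert>

    int main() {
        // Two additive updates commute: both orders yield the same state.
        int x = 1; x += 2; x += 3;   // 1 + 2 + 3 == 6
        int y = 1; y += 3; y += 2;   // 1 + 3 + 2 == 6
        assert(x == y);

        // An additive and a multiplicative update do not commute in general.
        int p = 1; p *= 2; p += 3;   // (1 * 2) + 3 == 5
        int q = 1; q += 3; q *= 2;   // (1 + 3) * 2 == 8
        assert(p != q);
        return 0;
    }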
1996
Authors
Rinard, M; Diniz, P;
Publication
IEEE Symposium on Parallel and Distributed Processing - Proceedings
Abstract
This paper introduces an analysis technique, commutativity analysis, for automatically parallelizing computations that manipulate dynamic, pointer-based data structures. Commutativity analysis views computations as composed of operations on objects. It then analyzes the program to discover when operations commute, i.e. leave the objects in the same state regardless of the order in which they execute. If all of the operations required to perform a given computation commute, the compiler can automatically generate parallel code. Commutativity analysis eliminates many of the limitations that have prevented existing compilers, which use data dependence analysis, from successfully parallelizing pointer-based applications. It enables compilers to parallelize computations that manipulate graphs and eliminates the need to analyze the data structure construction code to extract global properties of the data structure topology. This paper shows how to use symbolic execution and expression manipulation to statically determine that operations commute and how to exploit the extracted commutativity information to generate parallel code. It also presents performance results that demonstrate that commutativity analysis can be used to successfully parallelize the Barnes-Hut hierarchical N-body solver, an important scientific application that manipulates a complex pointer-based data structure.
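To make the idea concrete, here is a hypothetical sketch in the spirit of the Barnes-Hut example (the struct and field names are mine): inserting bodies into a tree node commutes because the accumulated mass and mass-weighted center are running sums, and sums are order-independent up to floating-point rounding. The paper's compiler establishes such facts statically; the mutex below merely makes concurrent updates atomic and does not affect the final state.

    #include <mutex>

    struct Node {
        double mass = 0.0;
        double cx = 0.0, cy = 0.0;   // mass-weighted position accumulators
        std::mutex m;

        // Insertions commute: each field is a running sum.
        void addBody(double bm, double bx, double by) {
            std::lock_guard<std::mutex> guard(m);
            mass += bm;
            cx   += bm * bx;
            cy   += bm * by;
        }
    };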
1996
Authors
Rinard, M; Diniz, P;
Publication
Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI)
Abstract
This paper presents a new analysis technique, commutativity analysis, for automatically parallelizing computations that manipulate dynamic, pointer-based data structures. Commutativity analysis views the computation as composed of operations on objects. It then analyzes the program at this granularity to discover when operations commute (i.e. generate the same final result regardless of the order in which they execute). If all of the operations required to perform a given computation commute, the compiler can automatically generate parallel code. We have implemented a prototype compilation system that uses commutativity analysis as its primary analysis framework. We have used this system to automatically parallelize two complete scientific computations: the Barnes-Hut N-body solver and the Water code. This paper presents performance results for the generated parallel code running on the Stanford DASH machine. These results provide encouraging evidence that commutativity analysis can serve as the basis for a successful parallelizing compiler.
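The generated parallel code might take the following shape (an assumption on my part; the paper does not reproduce its code generator's output): once every operation in a phase is known to commute, the operations can be handed to worker threads and executed in any interleaving, provided each operation synchronizes its own object updates.

    #include <atomic>
    #include <cstddef>
    #include <functional>
    #include <thread>
    #include <vector>

    // Execute a batch of commuting operations on nthreads workers.
    // Correctness does not depend on the order in which ops run.
    void parallel_phase(std::vector<std::function<void()>>& ops, int nthreads) {
        std::atomic<std::size_t> next{0};
        std::vector<std::thread> workers;
        for (int t = 0; t < nthreads; ++t)
            workers.emplace_back([&] {
                for (std::size_t i = next++; i < ops.size(); i = next++)
                    ops[i]();   // any interleaving yields the same final state
            });
        for (auto& w : workers) w.join();
    }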
2004
Authors
Park, J; Diniz, PC; Shesha Shayee, KR;
Publication
IEEE Transactions on Computers
Abstract
Selecting which program transformations to apply when mapping computations to FPGA-based computing architectures can lead to prohibitively long design space exploration cycles. An alternative is to develop fast, yet accurate performance and area models to quickly understand the impact and interaction of the transformations. In this paper, we present a combined analytical performance and area modeling approach for complete FPGA designs in the presence of loop transformations. Our approach takes into account the impact of input/output memory bandwidth and memory interface resources, often the limiting factor in the effective implementation of computations. Our preliminary results reveal that our modeling is very accurate and is therefore amenable for use in a compiler tool to quickly explore very large design spaces. © 2004 IEEE.
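As a rough illustration of this style of analytical modeling (my sketch; the parameter names are illustrative, not the paper's), the estimated execution time of an unrolled loop can be taken as the maximum of its compute time and its memory-transfer time, so unrolling pays off only until the memory interface becomes the bottleneck:

    #include <algorithm>

    // Estimate cycles for a loop nest unrolled by a factor of `unroll`.
    // Unrolled copies execute concurrently on replicated functional units,
    // but memory traffic is fixed, so bandwidth eventually dominates.
    double estimate_cycles(double iters, double cycles_per_iter,
                           double words_per_iter, int unroll,
                           double words_per_cycle) {
        double compute = (iters / unroll) * cycles_per_iter;
        double memory  = iters * words_per_iter / words_per_cycle;
        return std::max(compute, memory);   // slower resource dominates
    }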