
Publications by Pedro Diniz

1997

Lock coarsening: Eliminating lock overhead in automatically parallelized object-based programs

Authors
Diniz, P; Rinard, M;

Publication
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

Abstract
Atomic operations are a key primitive in parallel computing systems. The standard implementation mechanism for atomic operations uses mutual exclusion locks. In an object-based programming system the natural granularity is to give each object its own lock. Each operation can then make its execution atomic by acquiring and releasing the lock for the object that it accesses. But this fine lock granularity may have high synchronization overhead. To achieve good performance it may be necessary to reduce the overhead by coarsening the granularity at which the computation locks objects. In this paper we describe a static analysis technique — lock coarsening — designed to automatically increase the lock granularity in object-based programs with atomic operations. We have implemented this technique in the context of a parallelizing compiler for irregular, object-based programs. Experiments show these algorithms to be effective in reducing the lock overhead to negligible levels. © Springer-Verlag Berlin Heidelberg 1997.
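
As a rough source-level illustration of the idea (not the paper's compiler algorithm), the sketch below uses hypothetical Particle and ParticleGroup types: instead of giving every object its own mutex, one coarsened lock guards the whole group, so a traversal pays a single acquire/release.

```cpp
// Minimal sketch of the lock coarsening idea; Particle and ParticleGroup are
// invented examples, not types from the paper.
#include <mutex>
#include <vector>

struct Particle {
    double force = 0.0;
    // Fine granularity: each object would own its own std::mutex, and every
    // atomic operation on it would acquire and release that mutex.
};

struct ParticleGroup {
    std::mutex coarse_lock;          // one lock coarsened over all members
    std::vector<Particle> members;

    // The whole traversal becomes a single critical region instead of
    // one lock acquisition per object visited.
    void add_force(double f) {
        std::lock_guard<std::mutex> guard(coarse_lock);
        for (Particle& p : members) {
            p.force += f;
        }
    }
};

int main() {
    ParticleGroup group;
    group.members.resize(4);
    group.add_force(1.0);
    return 0;
}
```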

2002

Coarse-grain pipelining on multiple FPGA architectures

Authors
Ziegler, H; So, B; Hall, M; Diniz, PC;

Publication
IEEE Symposium on FPGAs for Custom Computing Machines, Proceedings

Abstract
Reconfigurable systems, and in particular, FPGA-based custom computing machines, offer a unique opportunity to define application-specific architectures. These architectures offer performance advantages for application domains such as image processing, where the use of customized pipelines exploits the inherent coarse-grain parallelism. In this paper we describe a set of program analyses and an implementation that map a sequential and un-annotated C program into a pipelined implementation running on a set of FPGAs, each with multiple external memories. Based on well-known parallel computing analysis techniques, our algorithms perform unrolling for operator parallelization, reuse and data layout for memory parallelization and precise communication analysis. We extend these techniques for FPGA-based systems to automatically partition the application data and computation into custom pipeline stages, taking into account the available FPGA and interconnect resources. We illustrate the analysis components by way of an example, a machine vision program. We present the algorithm results, derived with minimal manual intervention, which demonstrate the potential of this approach for automatically deriving pipelined designs from high-level sequential specifications. © 2002 IEEE.
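
As a loose software analogy for the stage structure the paper derives on FPGAs (the Channel and Row types and the two stages are invented for illustration), the sketch below splits a sequential row-processing loop into two concurrent stages that exchange whole rows over a channel, much as the partitioned computation communicates over the inter-FPGA interconnect.

```cpp
// Software-only sketch of coarse-grain pipelining; the paper targets FPGAs,
// so the threads and queue here are only an analogy for stages and links.
#include <condition_variable>
#include <iostream>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

using Row = std::vector<int>;

// Single-producer / single-consumer channel standing in for the inter-FPGA link.
class Channel {
    std::queue<Row> q_;
    std::mutex m_;
    std::condition_variable cv_;
    bool closed_ = false;
public:
    void push(Row r) {
        { std::lock_guard<std::mutex> g(m_); q_.push(std::move(r)); }
        cv_.notify_one();
    }
    void close() {
        { std::lock_guard<std::mutex> g(m_); closed_ = true; }
        cv_.notify_one();
    }
    bool pop(Row& out) {
        std::unique_lock<std::mutex> g(m_);
        cv_.wait(g, [&] { return !q_.empty() || closed_; });
        if (q_.empty()) return false;
        out = std::move(q_.front());
        q_.pop();
        return true;
    }
};

int main() {
    Channel link;

    // Stage 1: "filter" each row (analogous to the first FPGA).
    std::thread stage1([&] {
        for (int i = 0; i < 4; ++i) {
            Row r(8, i);
            for (int& x : r) x *= 2;
            link.push(std::move(r));
        }
        link.close();
    });

    // Stage 2: reduce each row (analogous to the second FPGA).
    std::thread stage2([&] {
        Row r;
        while (link.pop(r)) {
            int sum = 0;
            for (int x : r) sum += x;
            std::cout << "row sum: " << sum << "\n";
        }
    });

    stage1.join();
    stage2.join();
    return 0;
}
```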

1996

Commutativity analysis: A new analysis framework for parallelizing compilers

Authors
Rinard, MC; Diniz, PC;

Publication
SIGPLAN Notices (ACM Special Interest Group on Programming Languages)

Abstract
This paper presents a new analysis technique, commutativity analysis, for automatically parallelizing computations that manipulate dynamic, pointer-based data structures. Commutativity analysis views the computation as composed of operations on objects. It then analyzes the program at this granularity to discover when operations commute (i.e. generate the same final result regardless of the order in which they execute). If all of the operations required to perform a given computation commute, the compiler can automatically generate parallel code. We have implemented a prototype compilation system that uses commutativity analysis as its primary analysis framework. We have used this system to automatically parallelize two complete scientific computations: the Barnes-Hut N-body solver and the Water code. This paper presents performance results for the generated parallel code running on the Stanford DASH machine. These results provide encouraging evidence that commutativity analysis can serve as the basis for a successful parallelizing compiler. © 1996 ACM.
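
As a minimal illustration of the property the analysis tries to prove (the Node type and its add operation are hypothetical), the sketch below shows an operation whose invocations update the receiver only through commutative updates, so either execution order yields the same final object state.

```cpp
// Hypothetical example of an operation that commutes with itself; not code
// from the paper's compiler.
#include <iostream>

struct Node {
    double mass = 0.0;
    int visits = 0;

    // The final values of mass and visits do not depend on the order in
    // which add() calls execute (ignoring floating-point rounding), which is
    // the property commutativity analysis looks for.
    void add(double m) {
        mass += m;
        visits += 1;
    }
};

int main() {
    Node a, b;
    a.add(1.5); a.add(2.5);   // one order
    b.add(2.5); b.add(1.5);   // the other order: same final state
    std::cout << a.mass << " " << a.visits << "\n";
    std::cout << b.mass << " " << b.visits << "\n";
    return 0;
}
```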

1997

Dynamic Feedback: An Effective Technique for Adaptive Computing

Authors
Diniz, P; Rinard, M;

Publication
SIGPLAN Notices (ACM Special Interest Group on Programming Languages)

Abstract
This paper presents dynamic feedback, a technique that enables computations to adapt dynamically to different execution environments. A compiler that uses dynamic feedback produces several different versions of the same source code; each version uses a different optimization policy. The generated code alternately performs sampling phases and production phases. Each sampling phase measures the overhead of each version in the current environment. Each production phase uses the version with the least overhead in the previous sampling phase. The computation periodically resamples to adjust dynamically to changes in the environment. We have implemented dynamic feedback in the context of a parallelizing compiler for object-based programs. The generated code uses dynamic feedback to automatically choose the best synchronization optimization policy. Our experimental results show that the synchronization optimization policy has a significant impact on the overall performance of the computation, that the best policy varies from program to program, that the compiler is unable to statically choose the best policy, and that dynamic feedback enables the generated code to exhibit performance that is comparable to that of code that has been manually tuned to use the best policy. We have also performed a theoretical analysis which proves, under certain assumptions, a guaranteed optimality bound for dynamic feedback relative to a hypothetical (and unrealizable) optimal algorithm that uses the best policy at every point during the execution.
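
The sketch below shows the sampling/production structure in miniature (the kernels and timing harness are invented stand-ins, not the generated code the paper describes): each epoch times every version, then runs the fastest one until the next resampling point.

```cpp
// Sketch of the dynamic feedback loop with hypothetical "versions"; a real
// compiler would generate these versions under different optimization policies.
#include <chrono>
#include <cstddef>
#include <functional>
#include <iostream>
#include <vector>

using Clock = std::chrono::steady_clock;

double timed_run(const std::function<void()>& version) {
    auto t0 = Clock::now();
    version();
    return std::chrono::duration<double>(Clock::now() - t0).count();
}

int main() {
    // Stand-ins for compiler-generated versions of the same source code.
    std::vector<std::function<void()>> versions = {
        [] { volatile long s = 0; for (long i = 0; i < 1000000; ++i) s = s + i; },
        [] { volatile long s = 0; for (long i = 0; i < 2000000; ++i) s = s + i; },
    };

    for (int epoch = 0; epoch < 3; ++epoch) {
        // Sampling phase: measure every version in the current environment.
        std::size_t best = 0;
        double best_time = timed_run(versions[0]);
        for (std::size_t v = 1; v < versions.size(); ++v) {
            double t = timed_run(versions[v]);
            if (t < best_time) { best_time = t; best = v; }
        }
        // Production phase: run the winner, then resample next epoch.
        for (int iter = 0; iter < 10; ++iter) versions[best]();
        std::cout << "epoch " << epoch << ": version " << best << "\n";
    }
    return 0;
}
```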

1997

Commutativity Analysis: A New Analysis Technique for Parallelizing Compilers

Authors
Rinard, MC; Diniz, PC;

Publication
ACM Transactions on Programming Languages and Systems

Abstract
This article presents a new analysis technique, commutativity analysis, for automatically parallelizing computations that manipulate dynamic, pointer-based data structures. Commutativity analysis views the computation as composed of operations on objects. It then analyzes the program at this granularity to discover when operations commute (i.e., generate the same final result regardless of the order in which they execute). If all of the operations required to perform a given computation commute, the compiler can automatically generate parallel code. We have implemented a prototype compilation system that uses commutativity analysis as its primary analysis technique. We have used this system to automatically parallelize three complete scientific computations: the Barnes-Hut N-body solver, the Water liquid simulation code, and the String seismic simulation code. This article presents performance results for the generated parallel code running on the Stanford DASH machine. These results provide encouraging evidence that commutativity analysis can serve as the basis for a successful parallelizing compiler.

1997

Synchronization transformations for parallel computing

Authors
Diniz, P; Rinard, M;

Publication
Conference Record of the Annual ACM Symposium on Principles of Programming Languages

Abstract
As parallel machines become part of the mainstream computing environment, compilers will need to apply synchronization optimizations to deliver efficient parallel software. A new framework for synchronization optimizations and a new set of transformations for programs that implement critical sections using mutual exclusion locks are described. Also introduced is a new synchronization algorithm, lock elimination, for reducing synchronization overhead.
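
As a before/after illustration of the kind of transformation described (hypothetical code, not the paper's algorithm), the sketch below coalesces two adjacent critical sections on the same mutex so the lock is acquired and released once per call.

```cpp
// Before/after sketch of reducing synchronization overhead; the functions and
// globals are invented for illustration.
#include <mutex>

std::mutex m;
int balance = 0;
int count = 0;

// Before: each atomic operation acquires and releases the lock separately.
void update_before(int delta) {
    { std::lock_guard<std::mutex> g(m); balance += delta; }
    { std::lock_guard<std::mutex> g(m); count += 1; }
}

// After: the transformed code holds the lock across both updates, so the
// acquire/release cost is paid once per call instead of twice.
void update_after(int delta) {
    std::lock_guard<std::mutex> g(m);
    balance += delta;
    count += 1;
}

int main() {
    update_before(5);
    update_after(5);
    return 0;
}
```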
