Compiler Optimizations & Architectural Support

Rajiv Gupta
Prof. Vijayanand Nagarajan (2009)
Dr. Sriraman Tallam (2007)
Prof. Xiangyu Zhang (2006)
Dr. Bengu Li (2006)
Dr. Arvind Krishnaswamy (2006)
Prof. Jun Yang (2002)
Prof. Youtao Zhang (2002)
Prof. Ras Bodik (1999)
Prof. Soner Onder (1999)

Funding

National Science Foundation, CISE, ITR Medium Grant Program,
CCR-0324969, 9/2003-8/2007.
National Science Foundation, CISE, Computer Systems Architecture Program,
CCR-0208756, 9/2002-8/2006.
Intel Corp., MRL, Santa Clara,
6/2003-6/2006.
National Science Foundation, CISE, ITR Small Grant Program,
CCR-0220334, 9/2002-8/2005.
National Science Foundation, CISE, Compiler Program,
CCR-0105355, 9/2001-8/2005.
DARPA, Power Aware Computing/Communication Program,
Award no. F29601-00-1-0183, 7/2000-11/2002.
Intel Corporation, MRL, Santa Clara, California,
1/1995-6/2002.
National Science Foundation, CISE, Compiler Program,
CCR-0096122, 9/1998-8/2002.
National Science Foundation, CISE, Experimental Systems Program,
EIA-9806525, 9/1998-8/2002.

Description

This project considered hardware designs, architectural innovations, and compiler techniques for simultaneously optimizing the memory usage, performance, and power consumption of applications. Security issues relevant to distributed and parallel systems were also considered.

Embedded Processors. In the context of embedded processors, we developed techniques that achieve high performance while operating on compacted code and data. We have shown how compact code can be executed efficiently given proper instruction set and microarchitectural support. Compacted, high-performance code results in lower power consumption. We have also developed new compiler algorithms and instruction set support to show how compacted narrow-width data, prevalent in multimedia codes, can be manipulated effectively. A novel register allocation algorithm that allows colocation of multiple narrow-width data items in a single register has been designed.
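The idea of colocating narrow-width data in one register can be illustrated with a minimal sketch. It assumes a 32-bit register modeled as a Python integer and equal-width unsigned fields; the names `pack`, `extract`, and `update_field` are hypothetical and are not taken from the project's compiler or instruction set extensions.

```python
REG_BITS = 32  # assumed register width for this illustration


def pack(fields, width):
    """Pack equal-width unsigned fields into one register, first field lowest."""
    if len(fields) * width > REG_BITS:
        raise ValueError("fields do not fit in one register")
    reg = 0
    for i, value in enumerate(fields):
        if not 0 <= value < (1 << width):
            raise ValueError("value does not fit in the field width")
        reg |= value << (i * width)
    return reg


def extract(reg, index, width):
    """Read the field at position `index` with a shift and a mask."""
    return (reg >> (index * width)) & ((1 << width) - 1)


def update_field(reg, index, width, value):
    """Overwrite one field while leaving the other fields untouched."""
    shift = index * width
    mask = ((1 << width) - 1) << shift
    return (reg & ~mask) | ((value << shift) & mask)
```

For example, `pack([0x1234, 0xABCD], 16)` yields `0xABCD1234`, and `extract` on index 1 of that register recovers `0xABCD`. On hardware without subword instructions, each access costs extra shift and mask operations, which is the overhead that instruction set support for narrow-width data would aim to avoid.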
Low Power Caches and Buses. We developed techniques for lowering the power consumed by on-chip memory and external data buses associated with processors. They are useful for both high-performance and embedded processors, since in both types of processors on-chip memory and external buses consume a significant part of the total power. These techniques are based upon compression/encoding of frequent values. We have also developed compiler support for carrying out data compression to reduce the power consumed by the memory subsystem.
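As a rough illustration of why encoding frequent values can lower bus power, the sketch below assumes one plausible scheme: sender and receiver share a small table of frequent values, a frequent value is sent as a one-hot code on the data wires, and any other value is sent unchanged, with a control bit telling the two cases apart. The function names and the scheme's details are assumptions made for illustration and may differ from the designs in the publications listed below.

```python
def transitions(words):
    """Count bit flips between consecutive bus words, starting from an all-zero bus."""
    total, prev = 0, 0
    for word in words:
        total += bin(prev ^ word).count("1")
        prev = word
    return total


def encode(values, frequent):
    """Return (control_bit, bus_word) pairs; control_bit 1 marks a one-hot code."""
    index = {value: i for i, value in enumerate(frequent)}
    encoded = []
    for value in values:
        if value in index:
            encoded.append((1, 1 << index[value]))  # one wire high
        else:
            encoded.append((0, value))              # sent unencoded
    return encoded


def decode(encoded, frequent):
    """Invert encode() using the same frequent-value table."""
    decoded = []
    for control, word in encoded:
        if control:
            decoded.append(frequent[word.bit_length() - 1])
        else:
            decoded.append(word)
    return decoded
```

Comparing `transitions(values)` with `transitions([word for _, word in encode(values, frequent)])` on a trace dominated by a few values such as 0 and 0xFFFFFFFF shows fewer data-wire transitions for the encoded stream. The sketch ignores the switching activity of the control line itself.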
Superscalar and VLIW Processors. In the context of high-performance superscalar processors, we have developed a low-complexity memory disambiguation mechanism, a path-sensitive value prediction technique, a power-efficient dynamic instruction issue mechanism, and load/store reuse techniques. These techniques have also been implemented as part of the gcc compiler and the FAST simulation system. In the context of VLIW processors, a novel architecture that incorporates value prediction has been developed. In addition, global instruction scheduling algorithms based upon control dependence regions have been developed.
Path-Sensitive Optimizations. Path-sensitive optimizations address situations in which a statement can be optimized with respect to some of the paths along which it lies, while the same opportunity does not exist along other paths through the statement. We have developed demand-driven and profile-guided analyses for aggressive application of path-sensitive optimizations. Examples of optimizations studied include conditional branch elimination, partial redundancy elimination, partial dead code elimination, load redundancy removal, and elimination of array bound checks. Code motion and control flow restructuring are two transformations that have been used to enable path-sensitive optimizations along frequently executed paths. In this research, we also developed techniques that apply optimizations in a resource-sensitive manner and take advantage of machine characteristics such as support for speculation and predication.
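A small hand-worked example may make partial redundancy concrete. In `before`, the expression `a + b` is evaluated again after the branch merges, which is redundant only along the path where `cond` holds. In `after`, code motion inserts the evaluation on the other path so that the later use becomes fully redundant and can reuse a temporary. The functions and the evaluation counter are hypothetical; this is a manual illustration of the transformation's effect, not the demand-driven or profile-guided algorithms developed in the project.

```python
def add(a, b, counter):
    """Evaluate a + b and record that an evaluation happened."""
    counter[0] += 1
    return a + b


def before(a, b, cond, counter):
    # a + b is computed on the `cond` path and again after the merge,
    # so the second evaluation is redundant along that path only.
    if cond:
        x = add(a, b, counter)
    else:
        x = 0
    y = add(a, b, counter)
    return x, y


def after(a, b, cond, counter):
    # The evaluation is inserted on the `else` path, so a + b is
    # available in t on every path reaching the merge point.
    if cond:
        t = add(a, b, counter)
        x = t
    else:
        x = 0
        t = add(a, b, counter)
    y = t
    return x, y
```

Both versions return the same values. When `cond` is true, `before` evaluates the expression twice and `after` once; when `cond` is false, each evaluates it once, so no path becomes slower.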

Publications

Secure and Power-Aware Processing

V. Nagarajan, R. Gupta, and A. Krishnaswamy,
Compiler-Assisted Memory Encryption for Embedded Processors,
International Conference on High Performance Embedded Architectures and Compilers (HiPEAC),
Springer Verlag, LNCS 4367, pages 7-22, Ghent, Belgium, January 2007.
Extended version in invited special issue
Transactions on High Performance Embedded Architectures and Compilers,
LNCS 5470, Springer, Vol. 2, pages 23-44, 2009.
H. Liu and R. Gupta,
Temporal Analysis of Routing Activity for Anomaly Detection in Ad hoc Networks,
Third IEEE International Conference on Mobile Ad-hoc and Sensor Systems (MASS),
pages 505-508, Vancouver, October 2006.
B. Li, G. Venkatesh, B. Calder, and R. Gupta,
Exploiting Computation Reuse Cache to Reduce Energy in Network Processors,
International Conference on High Performance Embedded Architectures and Compilers (HiPEAC),
LNCS 3793, Springer Verlag, pages 251-265, Barcelona, Spain, Nov. 2005.
Y. Zhang, L. Gao, J. Yang, X. Zhang and R. Gupta,
SENSS: Security Enhancement to Symmetric Shared Memory Multiprocessors,
IEEE 11th International Symposium on High Performance Computer Architecture (HPCA),
pages 352-362, San Francisco, California, February 2005.
H. Liu and R. Gupta,
Selective Backbone Construction for Topology Control,
First IEEE International Conference on Mobile Ad-hoc and Sensor Systems (MASS),
pages 41-50, Fort Lauderdale, Florida, October 2004.
S. Tallam and R. Gupta,
Profile-Guided Java Program Partitioning for Power Aware Computing,
Sixth International Workshop on Java for Parallel and Distributed Computing,
Santa Fe, NM, April 2004.
X. Zhang and R. Gupta,
Hiding Program Slices for Software Security,
First Annual IEEE/ACM International Symposium on Code Generation and Optimization (CGO),
pages 325-336, San Francisco, CA, March 2003.

Embedded Processors: Compacted Code and Performance
A. Krishnaswamy and R. Gupta,
Efficient Use of Invisible Registers in Thumb Code,
IEEE/ACM 38th International Symposium on Microarchitecture (MICRO),
pages 30-40, Barcelona, Spain, Nov. 2005.
A. Krishnaswamy and R. Gupta,
Dynamic Coalescing for 16-bit Instructions,
ACM Transactions on Embedded Computing Systems (TECS),
Vol. 4, No. 1, pages 3-37, special issue of selected LCTES'03 papers, Feb. 2005.
A. Krishnaswamy and R. Gupta,
Enhancing the Performance of 16-bit Code Using Augmenting Instructions,
ACM SIGPLAN Conference on Languages, Compilers, and Tools for Embedded Systems (LCTES),
pages 254-264, San Diego, CA, June 2003.
A. Krishnaswamy and R. Gupta,
Mixed Width Instruction Sets,
Communications of the ACM (CACM), invited paper in special section on Program Compaction,
Vol. 46, No. 8, pages 47-52, August 2003.
W-K. Chen, B. Li, and R. Gupta,
Code Compaction of Matching Single-Entry Multiple-Exit Regions,
10th Annual International Static Analysis Symposium (SAS),
pages 401-417, San Diego, CA, June 2003.
A. Krishnaswamy and R. Gupta,
Profile Guided Selection of ARM and Thumb Instructions,
ACM SIGPLAN Joint Conference on Languages Compilers and Tools for Embedded Systems
& Software and Compilers for Embedded Systems (LCTES-SCOPES),
pages 55-63, Berlin, Germany, June 2002.

Embedded Processors: Compacted Data and Performance
B. Li, Y. Zhang, and R. Gupta,
Speculative Subword Register Allocation in Embedded Processors,
The 17th International Workshop on Languages and Compilers for Parallel Computing (LCPC),
LNCS 3602, Springer Verlag, pages 56-71, West Lafayette, Indiana, September 2004.
B. Li and R. Gupta,
Simple Offset Assignment in Presence of Subword Data,
International Conference on Compilers, Architecture, and Synthesis for Embedded Systems (CASES),
pages 12-23, San Jose, CA, October 2003.
S. Tallam and R. Gupta,
Bitwidth Aware Global Register Allocation,
30th Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL),
pages 85-96, New Orleans, LA, January 2003.
B. Li and R. Gupta,
Bit Section Instruction Set Extension of ARM for Embedded Applications,
International Conference on Compilers, Architecture, and Synthesis for Embedded Systems (CASES),
pages 69-78, Grenoble, France, October 2002.
R. Gupta, E. Mehofer, and Y. Zhang,
A Representation for Bit Section based Analysis and Optimization,
International Conference on Compiler Construction (CC),
LNCS 2304, Springer Verlag, pages 62-77, Grenoble, France, April 2002.

On-chip Memory and Buses
Y. Zhang and R. Gupta,
Compressing Heap Data for Improved Memory Performance,
Software - Practice & Experience (SP&E),
Volume 36, Issue 10, pages 1081-1111, August 2006.
J. Yang, R. Gupta, and C. Zhang,
Frequent Value Encoding for Low Power Data Buses,
ACM Transactions on Design Automation of Electronic Systems (TODAES),
Vol. 9, No. 3, pages 354-384, July 2004.
Recipient of ICPP 2003 Most Original Paper Award.
Y. Zhang and R. Gupta,
Enabling Partial Cache Line Prefetching Through Data Compression,
International Conference on Parallel Processing (ICPP),
pages 277-285, Kaohsiung, Taiwan, October 2003.
J. Yang and R. Gupta,
Frequent Value Locality and its Applications,
ACM Transactions on Embedded Computing Systems (ACM TECS),
special inaugural issue on Memory Systems, Vol. 1, No. 1, pages 79-105, Nov. 2002.
J. Yang and R. Gupta,
Energy Efficient Frequent Value Data Cache Design,
IEEE/ACM 35th International Symposium on Microarchitecture (MICRO),
pages 197-207, Istanbul, Turkey, November 2002.
Y. Zhang and R. Gupta,
Data Compression Transformations for Dynamically Allocated Data Structures,
International Conference on Compiler Construction (CC),
LNCS 2304, Springer Verlag, pages 14-28, Grenoble, France, April 2002.
J. Yang and R. Gupta,
FV Encoding for Low-Power Data I/O,
ACM/IEEE International Symposium on Low Power Electronics and Design (ISLPED),
pages 84-87, Huntington, CA, August 2001.
J. Yang, Y. Zhang and R. Gupta,
Frequent Value Compression in Data Caches,
IEEE/ACM 33rd International Symposium on Microarchitecture (MICRO-33),
pages 258-265, Monterey, CA, December 2000.
Y. Zhang, J. Yang, and R. Gupta,
Frequent Value Locality and Value-Centric Data Cache Design,
ACM 9th International Conference on Architectural Support for Programming
Languages and Operating Systems (ASPLOS-IX),
pages 150-159, Cambridge, MA, November 2000.

Superscalar and VLIW Processors
S. Onder and R. Gupta,
Instruction Wake-up in Wide Issue Superscalars,
European Conference on Parallel Computing (Euro-Par),
LNCS 2150, Springer Verlag, pages 418-427, Manchester, UK, August 2001.
S. Rele, S. Pande, S. Onder, and R. Gupta,
Optimizing Static Power Dissipation by Functional Units in Superscalar Processors,
International Conference on Compiler Construction (CC),
LNCS 2304, Springer Verlag, pages 261-275, Grenoble, France, April 2002.
J. Yang and R. Gupta,
Energy-Efficient Load and Store Reuse,
ACM/IEEE International Symposium on Low Power Electronics and Design (ISLPED),
pages 72-75, Huntington, CA, August 2001.
S. Onder and R. Gupta,
Load and Store Reuse Using Register File Contents,
ACM 15th International Conference on Supercomputing (ICS),
pages 289-302, Sorrento, Naples, Italy, June 2001.
J. Yang and R. Gupta,
Load Redundancy Removal through Instruction Reuse,
International Conference on Parallel Processing (ICPP),
pages 61-68, Toronto, Canada, August 2000.
S. Onder and R. Gupta,
Dynamic Memory Disambiguation in the Presence of Out-of-order Store Issuing,
IEEE/ACM 32nd International Symposium on Microarchitecture (MICRO),
pages 170-176, Haifa, Israel, November 1999. (longer version)
S. Onder, J. Xu, and R. Gupta,
Caching and Predicting Branch Sequences for Improved Fetch Effectiveness,
International Conference on Parallel Architectures and Compilation Techniques (PACT),
pages 294-302, Newport Beach, California, October 1999.
T. Nakra, R. Gupta, and M.L. Soffa,
Value Prediction in VLIW Machines,
ACM/IEEE 26th International Symposium on Computer Architecture (ISCA),
pages 258-269, Atlanta, Georgia, May 1999.
T. Nakra, R. Gupta, and M.L. Soffa,
Global Context-based Value Prediction,
IEEE 5th International Symposium on High Performance Computer Architecture (HPCA),
pages 4-12, Orlando, Florida, January 1999.
S. Onder and R. Gupta,
Superscalar Execution with Direct Data Forwarding,
International Conference on Parallel Architectures and Compilation Techniques (PACT),
pages 130-135, Paris, France, October 1998.
R. Gupta and M.L. Soffa,
Region Scheduling: An Approach for Detecting and Redistributing Parallelism,
IEEE Transactions on Software Engineering,
Vol. 16, No. 4, pages 421-431, April 1990.
R. Gupta and M.L. Soffa,
Compile-time Techniques for Efficient Utilization of Parallel Memories,
ACM SIGPLAN Symposium on Parallel Programming: Experience with Applications, Languages and Systems (PPEALS),
pages 235-246, New Haven, July 1988.

Path-Sensitive Optimizations
R. Bodik, R. Gupta, and V. Sarkar,
ABCD: Eliminating Array Bounds Checks on Demand,
ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI),
pages 321-333, Vancouver B.C., Canada, June 2000.
R. Bodik, R. Gupta, and M.L. Soffa,
Load-Reuse Analysis: Design and Evaluation,
ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI),
pages 64-76, Atlanta, Georgia, May 1999.
R. Gupta and R. Bodik,
Register Pressure Sensitive Redundancy Elimination,
International Conference on Compiler Construction (CC),
LNCS 1575, Springer Verlag, pages 107-121, Amsterdam, Netherlands, March 1999.
Selected for 20 Years of PLDI.
R. Bodik, R. Gupta and M.L. Soffa,
Complete Removal of Redundant Expressions,
ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI),
pages 1-14, Montreal, Canada, June 1998.
R. Gupta, D. Berson, and J.Z. Fang,
Path Profile Guided Partial Redundancy Elimination Using Speculation,
IEEE International Conference on Computer Languages (ICCL),
pages 230-239, Chicago, Illinois, May 1998.
R. Gupta, D. Berson, and J.Z. Fang,
Path Profile Guided Partial Dead Code Elimination Using Predication,
International Conference on Parallel Architectures and Compilation Techniques (PACT),
pages 102-115, San Francisco, California, November 1997.
R. Bodik and R. Gupta,
Partial Dead Code Elimination using Slicing Transformations,
ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI),
pages 159-170, Las Vegas, Nevada, June 1997.
R. Bodik, R. Gupta, and M.L. Soffa,
Interprocedural Conditional Branch Elimination,
ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI),
pages 146-158, Las Vegas, Nevada, June 1997.
R. Bodik and R. Gupta,
Array Data-Flow Analysis for Load-Store Optimizations in Superscalar Architectures,
8th Annual Workshop on Languages and Compilers for Parallel Computing (LCPC),
LNCS 1033, Springer Verlag, pages 1-15, Columbus, Ohio, August 1995.
Also published in International Journal of Parallel Programming,
Vol. 24, No. 6, pages 481-512, 1996.
R. Gupta,
A Fresh Look at Optimizing Array Bound Checks,
ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI),
pages 272-282, White Plains, NY, June 1990.
Also published in ACM Letters on Programming Languages and Systems (LOPLAS),
Vol. 2, Nos. 1-4, pages 135-150, March-December 1994.
