Compiler, Architectural, and Hardware Support
Rajiv Gupta
Funded By
-
National Science Foundation, CISE, ITR Medium Grant Program,
CCR-0324969, 9/2003-8/2007.
-
National Science Foundation, CISE, Computer Systems Architecture Program,
CCR-0208756, 9/2002-8/2006.
-
Intel Corp., MRL, Santa Clara,
6/2003-6/2006.
-
National Science Foundation, CISE, ITR Small Grant Program,
CCR-0220334, 9/2002-8/2005.
-
National Science Foundation, CISE, Compiler Program,
CCR-0105355, 9/2001-8/2005.
-
DARPA, Power Aware Computing/Communication Program,
Award no. F29601-00-1-0183, 7/2000-11/2002.
-
Intel Corporation, MRL, Santa Clara, California,
1/1995-6/2002.
-
National Science Foundation, CISE, Compiler Program,
CCR-0096122, 9/1998-8/2002.
-
National Science Foundation, CISE, Experimental Systems Program,
EIA-9806525, 9/1998-8/2002.
Ph.D. Students
-
Min Feng
-
Vijayanand Nagarajan
-
Chen Tian
-
Sriraman Tallam -- graduated 2007
-
Xiangyu Zhang -- graduated 2006
-
Bengu Li -- graduated 2006
-
Arvind Krishnaswamy -- graduated 2006
-
Jun Yang -- graduated 2002
-
Youtao Zhang -- graduated 2002
-
Ras Bodik -- graduated 1999
-
Soner Onder -- graduated 1999
Description
This project is considering hardware designs, architectural innovations, and
compiler techniques for simultaneously optimizing memory usage, performance,
and power consumption of embedded applications. A primary emphasis of this
project is on developing processors with low power consumption with minimal
sacrifice of performance. Security issues relevant to distributed/parallel
embedded systems are being considered. The work carried out so far includes
the following:
-
Embedded Processors.
In the context of embedded processors, we developed techniques
that achieve high performance while operating on compacted code and data.
We have shown how compact code can be executed to deliver performance through proper
instruction set and microarchitectural support. Compacted high-performance code results
in lower power consumption. We have also developed new compiler algorithms and
instruction set support to show how compacted narrow-width data, prevalent in
multimedia codes, can be effectively manipulated. A novel register allocation algorithm
that allows colocation of multiple narrow-width data items in a single register has
been designed.
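As a rough illustration of colocating narrow-width values in one register, the sketch below packs several small fields into a single 32-bit word and reads or updates one field with shifts and masks. The 32-bit width, the field layout, and the helper names (`pack`, `extract`, `insert`) are assumptions made for this example; they are not the project's instruction set extension or its register allocation algorithm.

```python
WORD_BITS = 32                      # assumed register width
WORD_MASK = (1 << WORD_BITS) - 1


def pack(fields):
    """Pack (value, width) pairs into one word, first field in the low bits."""
    word, offset = 0, 0
    for value, width in fields:
        if value < 0 or value >= (1 << width):
            raise ValueError("value does not fit in its declared width")
        if offset + width > WORD_BITS:
            raise ValueError("fields exceed the register width")
        word |= value << offset
        offset += width
    return word


def extract(word, offset, width):
    """Read the field that starts at bit `offset` and is `width` bits wide."""
    return (word >> offset) & ((1 << width) - 1)


def insert(word, offset, width, value):
    """Return `word` with one field overwritten and all other fields untouched."""
    mask = ((1 << width) - 1) << offset
    return ((word & ~mask) | ((value << offset) & mask)) & WORD_MASK


# Three narrow values (4, 8, and 2 bits wide) share one 32-bit word.
reg = pack([(5, 4), (200, 8), (3, 2)])
reg2 = insert(reg, 4, 8, 17)        # update only the 8-bit field
```

Without hardware support, each access to a packed field costs extra shift and mask operations, which is the overhead that dedicated instruction set support for narrow-width data aims to avoid.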
-
Low Power Caches and Buses.
We developed techniques for lowering the power consumed by
on-chip memory and external data buses associated with the processors. They are
useful for both high-performance and embedded processors, since in both types of
processors on-chip memory and external buses consume a significant part of the
total power. These techniques are based upon compression/encoding of frequent
values. We have also developed compiler support for carrying out data compression
for reducing power consumed by the memory subsystem.
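The idea of encoding frequent values can be sketched as follows. This is a generic table-based scheme written for illustration, assuming a small table of frequent values chosen from a profiling trace and a one-bit flag per transferred word; it is not the specific encoding used in the publications listed below, and the function names are hypothetical.

```python
from collections import Counter


def build_table(trace, size):
    """Pick the `size` most frequent values seen in a profiling trace."""
    return [value for value, _ in Counter(trace).most_common(size)]


def encode(values, table):
    """Replace each frequent value by its table index; send other values raw."""
    index = {value: i for i, value in enumerate(table)}
    return [("hit", index[v]) if v in index else ("raw", v) for v in values]


def decode(encoded, table):
    """Invert encode() using the same table."""
    return [table[x] if kind == "hit" else x for kind, x in encoded]


def encoded_bits(encoded, table_size, word_bits=32):
    """Count bits sent: one flag bit plus either a table index or a full word."""
    index_bits = max(1, (table_size - 1).bit_length())
    return sum(1 + (index_bits if kind == "hit" else word_bits)
               for kind, _ in encoded)


trace = [0, 0, 0, 1, 0xFFFFFFFF, 0, 1, 12345, 0, 1]
table = build_table(trace, 2)       # 0 and 1 dominate this trace
sent = encode(trace, table)
bits = encoded_bits(sent, len(table))
```

For this toy trace the encoded stream needs 82 bits instead of the 320 bits required to send ten raw 32-bit words. Fewer bits stored or switched is the intuition behind the power savings, although real savings depend on the hardware design.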
-
Superscalar and VLIW Processors.
In the context of high-performance superscalar processors, we have developed a low-complexity
memory disambiguation mechanism, a path-sensitive value prediction technique, a power-efficient
dynamic instruction issue mechanism, and load/store reuse techniques.
These techniques have also been implemented as part of the gcc compiler and the
FAST simulation system. In the context of VLIW processors, a
novel architecture that incorporates value prediction has been developed. In
addition, global instruction scheduling algorithms based upon control dependence
regions have been developed.
-
Chip Multiprocessors.
The focus of this project has been on developing fine-grained synchronization and
communication support for multiple processors residing on the same chip. A low
overhead hardware fuzzy barrier synchronization mechanism has been designed. To
minimize idling of processors that arrive at the barrier early, a fuzzy barrier is
defined to contain a sequence of instructions that a processor can safely execute
while waiting for other processors to arrive. Synchronizing register channels
have been designed to efficiently communicate data values between processors.
Finally, a collective branching mechanism has been designed to enable multiple
processors to share a condition code computed by only one of the processors.
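The fuzzy barrier idea can be illustrated with a software analogy: arrival and waiting are split into two steps, and the work placed between them plays the role of the barrier region. The sketch below uses Python threads; the class `FuzzyBarrier`, its `arrive` and `wait` methods, and the `run` driver are hypothetical names for this example and do not describe the hardware mechanism itself.

```python
import threading


class FuzzyBarrier:
    """Split-phase barrier: arrive() never blocks, wait() blocks if needed."""

    def __init__(self, parties):
        self._parties = parties
        self._count = 0
        self._generation = 0
        self._cond = threading.Condition()

    def arrive(self):
        """Announce arrival and return a ticket to pass to wait()."""
        with self._cond:
            ticket = self._generation
            self._count += 1
            if self._count == self._parties:
                # Last arrival: open the barrier for this generation.
                self._count = 0
                self._generation += 1
                self._cond.notify_all()
            return ticket

    def wait(self, ticket):
        """Block until every party has arrived for the ticket's generation."""
        with self._cond:
            self._cond.wait_for(lambda: self._generation > ticket)


def run(n=4):
    barrier = FuzzyBarrier(n)
    shared = [None] * n
    seen = [None] * n

    def worker(i):
        shared[i] = i * i               # work that must finish before the barrier
        ticket = barrier.arrive()
        _ = sum(range(1000))            # barrier region: independent local work
        barrier.wait(ticket)
        seen[i] = list(shared)          # safe: every worker has arrived

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return seen
```

A worker that arrives early spends its waiting time on the independent work between `arrive` and `wait` rather than idling. That work must not depend on results other workers produce before the barrier.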
-
Path-Sensitive Optimizations.
Path-sensitive optimizations arise in situations in which it is possible to optimize
a statement with respect to some paths along which it lies while the same optimization
opportunity does not exist along other paths through the statement. We have developed
demand-driven and profile-guided analysis for aggressive application of
path-sensitive optimizations. Examples of optimizations studied include conditional branch
elimination, partial redundancy elimination, partial dead code elimination, load redundancy
removal, and elimination of array bound checks. Code motion and control flow
restructuring are two transformations that have been used to enable path-sensitive
optimizations along frequently executed paths. In this research, we also developed techniques
that apply optimizations in a resource-sensitive manner and take advantage of machine
characteristics such as support for speculation and predication.
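A small hand-worked example may help show what a path-sensitive opportunity looks like. In the sketch below, the product `a * b` is redundant only along the path where `flag` is true. The transformed version uses code motion to compute it once on every path. The functions `mul`, `before`, and `after` are hypothetical and written by hand for illustration; they are not output of the project's analyses.

```python
def mul(a, b, counter):
    """Multiply and record that one multiplication was executed."""
    counter[0] += 1
    return a * b


def before(a, b, flag, counter):
    if flag:
        x = mul(a, b, counter)      # a * b is computed on this path only
    else:
        x = 0
    y = mul(a, b, counter)          # redundant when flag is true
    return x + y


def after(a, b, flag, counter):
    if flag:
        t = mul(a, b, counter)
        x = t
    else:
        x = 0
        t = mul(a, b, counter)      # inserted where a * b was not yet available
    y = t                           # reuse instead of recomputing
    return x + y


count_before, count_after = [0], [0]
same_result = before(6, 7, True, count_before) == after(6, 7, True, count_after)
```

Along the `flag` path the original code executes two multiplications and the transformed code executes one, while the other path still executes exactly one. Profile information matters because such code motion pays off most when the improved path is the frequently executed one.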
Publications
Secure and Power-Aware Processing
-
V. Nagarajan, R. Gupta, and A. Krishnaswamy,
Compiler-Assisted Memory Encryption for Embedded Processors,
International Conference on High Performance Embedded Architectures and Compilers (HiPEAC),
Springer Verlag, LNCS 4367, pages 7-22, Ghent, Belgium, January 2007.
Extended version in invited special issue
Transactions on High Performance Embedded Architectures and Compilers,
Vol. 2, No. 1, pages 21-41, Springer Verlag, 2007.
-
H. Liu and R. Gupta,
Temporal Analysis of Routing Activity for Anomaly Detection in Ad hoc Networks,
Third IEEE International Conference on Mobile Ad-hoc and Sensor Systems (MASS),
pages 505-508, Vancouver, October 2006.
-
B. Li, G. Venkatesh, B. Calder, and R. Gupta,
Exploiting Computation Reuse Cache to Reduce Energy in Network Processors,
International Conference on High Performance Embedded Architectures
and Compilers (HiPEAC),
LNCS 3793, Springer Verlag, pages 251-265, Barcelona, Spain, Nov. 2005.
-
Y. Zhang, L. Gao, J. Yang, X. Zhang and R. Gupta,
SENSS: Security Enhancement to Symmetric Shared Memory Multiprocessors,
IEEE 11th International Symposium on High Performance Computer Architecture (HPCA),
pages 352-362, San Francisco, California, February 2005.
-
H. Liu and R. Gupta,
Selective Backbone Construction for Topology Control,
First IEEE International Conference on Mobile Ad-hoc and Sensor Systems (MASS),
pages 41-50, Fort Lauderdale, Florida, October 2004.
-
S. Tallam and R. Gupta,
Profile-Guided Java Program Partitioning for Power Aware Computing,
Sixth International Workshop on Java for Parallel and Distributed Computing,
Santa Fe, NM, April 2004.
-
X. Zhang and R. Gupta,
Hiding Program Slices for Software Security,
First Annual IEEE/ACM International Symposium on Code Generation and Optimization (CGO),
pages 325-336, San Francisco, CA, March 2003.
Embedded Processors: Compacted Code and Performance
-
A. Krishnaswamy and R. Gupta,
Efficient Use of Invisible Registers in Thumb Code,
IEEE/ACM 38th International Symposium on Microarchitecture (MICRO),
pages 30-40, Barcelona, Spain, Nov. 2005.
-
A. Krishnaswamy and R. Gupta,
Dynamic Coalescing for 16-bit Instructions,
ACM Transactions on Embedded Computing Systems (TECS),
Vol. 4, No. 1, pages 3-37, special issue of selected LCTES'03 papers, Feb. 2005.
-
A. Krishnaswamy and R. Gupta,
Enhancing the Performance of 16-bit Code Using Augmenting Instructions,
ACM SIGPLAN Conference on Languages, Compilers, and Tools for Embedded Systems (LCTES),
pages 254-264, San Diego, CA, June 2003.
-
A. Krishnaswamy and R. Gupta,
Mixed Width Instruction Sets,
Communications of the ACM (CACM),
invited paper in special section on Program Compaction,
Vol. 46, No. 8, pages 47-52, August 2003.
-
W-K. Chen, B. Li, and R. Gupta,
Code Compaction of Matching Single-Entry Multiple-Exit Regions,
10th Annual International Static Analysis Symposium (SAS),
pages 401-417, San Diego, CA, June 2003.
-
A. Krishnaswamy and R. Gupta,
Profile Guided Selection of ARM and Thumb Instructions,
ACM SIGPLAN Joint Conference on Languages, Compilers, and Tools
for Embedded Systems & Software and Compilers for Embedded Systems (LCTES-SCOPES),
pages 55-63, Berlin, Germany, June 2002.
Embedded Processors: Compacted Data and Performance
-
B. Li, Y. Zhang, and R. Gupta,
Speculative Subword Register Allocation in Embedded Processors,
The 17th International Workshop on Languages and Compilers
for Parallel Computing (LCPC),
LNCS 3602, Springer Verlag, pages 56-71,
West Lafayette, Indiana, September 2004.
-
B. Li and R. Gupta,
Simple Offset Assignment in Presence of Subword Data,
International Conference on Compilers, Architecture, and Synthesis for
Embedded Systems (CASES),
pages 12-23, San Jose, CA, October 2003.
-
S. Tallam and R. Gupta,
Bitwidth Aware Global Register Allocation,
30th Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages
(POPL),
pages 85-96, New Orleans, LA, January 2003.
-
B. Li and R. Gupta,
Bit Section Instruction Set Extension of ARM for Embedded Applications,
International Conference on Compilers, Architecture, and Synthesis for Embedded Systems (CASES),
pages 69-78, Grenoble, France, October 2002.
-
R. Gupta, E. Mehofer, and Y. Zhang,
A Representation for Bit Section based Analysis and Optimization,
International Conference on Compiler Construction (CC),
LNCS 2304, Springer Verlag, pages 62-77, Grenoble, France, April 2002.
On-chip Memory and Buses
-
Y. Zhang and R. Gupta,
Compressing Heap Data for Improved Memory Performance,
Software - Practice & Experience (SP&E),
Volume 36, Issue 10, pages 1081-1111, August 2006.
-
J. Yang, R. Gupta, and C. Zhang,
Frequent Value Encoding for Low Power Data Buses,
ACM Transactions on Design Automation of Electronic Systems (TODAES),
Vol. 9, No. 3, pages 354-384, July 2004.
-
Recipient of ICPP 2003 Most Original Paper Award.
Y. Zhang and R. Gupta,
Enabling Partial Cache Line Prefetching Through Data Compression,
International Conference on Parallel Processing (ICPP),
pages 277-285, Kaohsiung, Taiwan, October 2003.
-
J. Yang and R. Gupta,
Frequent Value Locality and its Applications,
ACM Transactions on Embedded Computing Systems (ACM TECS),
special inaugural issue on Memory Systems, Vol. 1, No. 1, pages 79-105, Nov. 2002.
-
J. Yang and R. Gupta,
Energy Efficient Frequent Value Data Cache Design,
IEEE/ACM 35th International Symposium on Microarchitecture (MICRO),
pages 197-207, Istanbul, Turkey, November 2002.
-
Y. Zhang and R. Gupta,
Data Compression Transformations for Dynamically Allocated Data Structures,
International Conference on Compiler Construction (CC),
LNCS 2304, Springer Verlag, pages 14-28, Grenoble, France, April 2002.
-
J. Yang and R. Gupta,
FV Encoding for Low-Power Data I/O,
ACM/IEEE International Symposium on Low Power Electronics
and Design (ISLPED),
pages 84-87, Huntington Beach, CA, August 2001.
-
J. Yang, Y. Zhang and R. Gupta,
Frequent Value Compression in Data Caches,
IEEE/ACM 33rd International Symposium on Microarchitecture (MICRO-33),
pages 258-265, Monterey, CA, December 2000.
-
Y. Zhang, J. Yang, and R. Gupta,
Frequent Value Locality and Value-Centric Data Cache Design,
ACM 9th International Conference on Architectural Support for Programming
Languages and Operating Systems (ASPLOS-IX),
pages 150-159,
Cambridge, MA, November 2000.
Superscalar and VLIW Processors
-
S. Onder and R. Gupta,
Instruction Wake-up in Wide Issue Superscalars,
European Conference on Parallel Computing (Euro-Par),
LNCS 2150, Springer Verlag, pages 418-427, Manchester, UK, August 2001.
-
S. Rele, S. Pande, S. Onder, and R. Gupta,
Optimizing Static Power Dissipation by Functional Units in Superscalar Processors,
International Conference on Compiler Construction (CC),
LNCS 2304, Springer Verlag, pages 261-275, Grenoble, France, April 2002.
-
J. Yang and R. Gupta,
Energy-Efficient Load and Store Reuse,
ACM/IEEE International Symposium on Low Power Electronics
and Design (ISLPED),
pages 72-75, Huntington Beach, CA, August 2001.
-
S. Onder and R. Gupta,
Load and Store Reuse Using Register File Contents,
ACM 15th International Conference on Supercomputing (ICS),
pages 289-302, Sorrento, Naples, Italy, June 2001.
-
J. Yang and R. Gupta,
Load Redundancy Removal through
Instruction Reuse,
International Conference on Parallel Processing (ICPP),
pages 61-68, Toronto, Canada, August 2000.
-
S. Onder and R. Gupta,
Dynamic Memory Disambiguation in the Presence of Out-of-order
Store Issuing,
IEEE/ACM 32nd International Symposium on Microarchitecture (MICRO),
pages 170-176, Haifa, Israel, November 1999.
(longer version)
-
S. Onder, J. Xu, and R. Gupta,
Caching and Predicting Branch Sequences for Improved
Fetch Effectiveness,
International Conference on Parallel Architectures and
Compilation Techniques (PACT),
pages 294-302, Newport Beach, California,
October 1999.
-
T. Nakra, R. Gupta, and M.L. Soffa,
Value Prediction in VLIW Machines,
ACM/IEEE 26th International Symposium on
Computer Architecture (ISCA),
pages 258-269,
Atlanta, Georgia, May 1999.
-
T. Nakra, R. Gupta, and M.L. Soffa,
Global Context-based Value Prediction,
IEEE 5th International Symposium on High Performance Computer
Architecture (HPCA),
pages 4-12, Orlando, Florida, January 1999.
-
S. Onder and R. Gupta,
Superscalar Execution with Direct Data Forwarding,
International Conference on Parallel Architectures and
Compilation Techniques (PACT),
pages 130-135, Paris, France, October 1998.
-
R. Gupta and M.L. Soffa,
Region Scheduling: An Approach for Detecting and Redistributing Parallelism,
IEEE Transactions on Software Engineering,
Vol. 16, No. 4, pages 421-431, April 1990.
-
R. Gupta and M.L. Soffa,
Compile-time Techniques for Efficient Utilization of Parallel Memories,
ACM SIGPLAN Symposium on Parallel Programming: Experience with Applications,
Languages and Systems (PPEALS), pages 235-246, New Haven, July 1988.
Chip Multiprocessors
-
B. Malloy, R. Gupta, and M.L. Soffa,
A Shape Matching Approach for Scheduling Fine-Grained Parallelism,
IEEE/ACM 25th International Symposium on Microarchitecture (MICRO),
pages 264-267, Portland, Oregon, December 1992.
-
S. Lee and R. Gupta,
Executing Loops on a Fine-Grained MIMD Architecture,
IEEE/ACM 24th International Symposium on Microarchitecture (MICRO),
pages 199-205, Albuquerque, New Mexico, November 1991.
-
R. Gupta, M. Epstein, and M. Whelan,
The Design of a RISC based Multiprocessor Chip,
Supercomputing'90 (SC),
pages 920-929, New York, November 1990.
-
R. Gupta,
A Fine-grained MIMD Architecture based upon Register Channels,
IEEE/ACM 23rd Workshop on Microprogramming and Microarchitecture (MICRO),
pages 28-37, Orlando, Florida, December 1990.
-
R. Gupta,
Employing Register Channels for the Exploitation of Instruction
Level Parallelism,
ACM SIGPLAN 2nd Symposium on Principles and Practice of Parallel Programming (PPoPP),
pages 118-127, Seattle, Washington, March 1990.
-
R. Gupta,
The Fuzzy Barrier: A Mechanism for High-Speed
Synchronization of Processors,
ACM 3rd International Conference on Architectural Support
for Programming
Languages and Operating Systems (ASPLOS),
pages 54-64, Boston, April 1989.
Path-Sensitive Optimizations
-
R. Bodik, R. Gupta, and V. Sarkar,
ABCD: Eliminating Array Bounds Checks on Demand,
ACM SIGPLAN Conference on Programming Language Design and
Implementation (PLDI),
pages 321-333, Vancouver B.C., Canada, June 2000.
-
R. Bodik, R. Gupta, and M.L. Soffa,
Load-Reuse Analysis: Design and Evaluation,
ACM SIGPLAN Conference on Programming Language Design and
Implementation (PLDI),
pages 64-76, Atlanta, Georgia, May 1999.
-
R. Gupta and R. Bodik,
Register Pressure Sensitive Redundancy Elimination,
International Conference on Compiler Construction (CC),
LNCS 1575, Springer Verlag, pages 107-121,
Amsterdam, Netherlands, March 1999.
-
Selected for 20 Years of PLDI.
R. Bodik, R. Gupta and M.L. Soffa,
Complete Removal of Redundant Expressions,
ACM SIGPLAN Conference on Programming Language Design and
Implementation (PLDI),
pages 1-14, Montreal, Canada, June 1998.
-
R. Gupta, D. Berson, and J.Z. Fang,
Path Profile Guided Partial Redundancy Elimination Using Speculation,
IEEE International Conference on Computer Languages (ICCL),
pages 230-239, Chicago, Illinois, May 1998.
-
R. Gupta, D. Berson, and J.Z. Fang,
Path Profile Guided Partial Dead Code Elimination Using Predication,
International Conference on Parallel Architectures and Compilation
Techniques (PACT),
pages 102-115, San Francisco, California, November 1997.
-
R. Bodik and R. Gupta,
Partial Dead Code Elimination using Slicing Transformations,
ACM SIGPLAN Conference on Programming Language Design and
Implementation (PLDI),
pages 159-170, Las Vegas, Nevada, June 1997.
-
R. Bodik, R. Gupta, and M.L. Soffa,
Interprocedural Conditional Branch Elimination,
ACM SIGPLAN Conference on Programming Language Design and
Implementation (PLDI),
pages 146-158, Las Vegas, Nevada, June 1997.
-
R. Bodik and R. Gupta,
Array Data-Flow Analysis for Load-Store Optimizations
in Superscalar Architectures,
8th Annual Workshop on Languages and Compilers
for Parallel Computing (LCPC),
LNCS 1033, Springer Verlag, pages 1-15,
Columbus, Ohio, August 1995.
Also published in International Journal
of Parallel Programming,
Vol. 24, No. 6, pages 481-512, 1996.
-
R. Gupta,
A Fresh Look at Optimizing Array Bound Checks,
ACM SIGPLAN Conference on Programming Language Design and
Implementation (PLDI),
pages 272-282, White Plains, NY, June 1990.
Also published in
ACM Letters on Programming Languages and Systems (LOPLAS),
Vol. 2, Nos. 1-4, pages 135-150, March-December 1994.