Professor, Department of Computer Science and Engineering
Chair, Faculty of the College of Engineering
University of California, Riverside
(951) 827-4710, fax: (951) 827-4643, vahid@cs.ucr.edu, http://www.cs.ucr.edu/~vahid
Associate Director, Center for Embedded Computer Systems, UC Irvine.
Embedded systems design, hardware/software codesign, FPGAs, embedded systems applications to health.
Ph.D.
- Ann Gordon-Ross, CS&E Ph.D. 2007. Currently an Assistant Professor at the University of Florida, Gainesville (ECE).
- Greg Stitt, CS&E Ph.D. 2007. Currently an Assistant Professor at the University of Florida, Gainesville (ECE).
- Susan Lysecky (formerly Susan Cotterell), CS&E Ph.D. 2006. Currently an Assistant Professor at the University of Arizona (ECE).
- Roman Lysecky, CS&E M.S. 2000, Ph.D. 2005. Currently an Assistant Professor at the University of Arizona (ECE).
- Chuanjun Zhang, CS&E Ph.D. 2004. Currently an Assistant Professor at the University of Missouri, Kansas City (ECE).
- Tony Givargis, CS&E Ph.D. 2001. Currently an Associate Professor at the University of California, Irvine (ICS, and Center for Embedded Computer Systems).
- Enoch Hwang, CS&E Ph.D. 1998. Currently an Assistant Professor at La Sierra University, Riverside, California, and a lecturer at UCR.
- Ann Gordon-Ross, Greg Stitt, Susan Lysecky, Roman Lysecky, and Tony Givargis each began working with me as undergraduate researchers.
M.S.
- Shawn Nemetebakshi, CS M.S. 2005.
- Kelly Downey, EE M.S. 2004.
- Brian Grattan, EE M.S. 2002.
- Weijun Zhang, EE M.S. 2001 (Silicon Valley).
- Deepa Varghese CS&E M.S. 1998 (Motorola).
- Linus Tauro, CS&E M.S. 1997 (Quick-Logic).
- Thuy Le, CS&E M.S. 1997 (IMA).
- William Kang, CS&E M.S. 1995 (TRW).
- Rosely Ng, CS&E M.S. 1995 (IMA).
- Shawn Nemetebakshi and Kelly Downey began working with me as undergraduate researchers.
B.S. (other students who actively participated in research in my lab as undergraduates)
- Scott Sirowy, CE B.S. in 2005, currently my Ph.D. student
- David Sheldon, CS B.S. in 2003, currently my Ph.D. student
- Chen Huang
- Scott Sirowy and David Sheldon began working with me as undergraduate researchers.
- Bailey Miller, CE B.S. 2008 (expected)
- Andrea Coba, CE B.S. 2008 (expected)
- Robert (Mike) Ballou, CE B.S. 2008 (expected)
- Jonathon Basseri, CE B.S. 2008 (expected)
- Margaret Ukwu, CE B.S. 2009 (expected)
- Casey Czechowski, CE B.S. 2007, now M.S. student at UCR.
- Caleb Leak, CE B.S. 2007.
- Josef Spjut, CE B.S. in 2006, now in graduate school at Utah.
- Cathy Vu, CS B.S. in 2004, currently a graduate student at UCR.
- Korey Sewell, CS B.S. in 2004 (MSRIP program), currently a graduate student at U. Michigan.
- San Nguyen, CS B.S. in 2004 (MSRIP program), currently a graduate student at UCR.
- Ed Garcia, CS B.S. in 2004
- Daniel Tan, EE B.S. in 2004, currently at Boeing
- Ron Feliciano, CS B.S. in 2004, obtained CS M.S. at UCR.
- Rafael Lopez, CS B.S. in 2002, (MSRIP program), Ph.D. student at UC Irvine
- Kris Miller, CS B.S. in 2001, currently UCR CS lecturer
- Puneet Mehra, CS B.S. in 1999 (Ph.D. student at UC Berkeley)
- Shannon Alfaro, CS B.S. in 1999 (M.S. from UC Irvine, now a lecturer at UC Irvine)
- Jason Villarreal, CS B.S. in 1999 (Ph.D. student at UC Riverside)
- Victor Hill, CS B.S. in 1999 (Ph.D. student at UC Riverside)
Published over 120 conference and journal papers, three textbooks, and two language-introduction books. See www.cs.ucr.edu/~vahid/pubs
Books
Digital Design   By Frank Vahid, John Wiley and Sons publishers, 2007. Adopted by several dozen universities worldwide so far, including several top-20 universities.
VHDL for Digital Design   By Frank Vahid and Roman Lysecky, John Wiley and Sons publishers, 2007.
Verilog for Digital Design   By Frank Vahid and Roman Lysecky, John Wiley and Sons publishers, 2007.
Embedded System Design -- A Unified Hardware/Software Introduction   By Frank Vahid and Tony Givargis, published by J. Wiley and Sons, (c) 2002.
Specification and Design of Embedded Systems   By Dan Gajski, Frank Vahid, Sanjiv Narayan, and Jie Gong, published by Prentice Hall, 1994.
Journal and Conference Papers (excerpted from http://www.cs.ucr.edu/~vahid/pubs)
F. Vahid. It's Time to Stop Calling Circuits Hardware. IEEE Computer Magazine, September 2007.
G. Stitt and F. Vahid. Multi-Threaded Warp Processing. Int. Conf. on Hardware/Software Codesign and System Synthesis (CODES/ISSS), 2007.
A. Gordon-Ross and F. Vahid. A Self-Tuning Configurable Cache. Design Automation Conference (DAC), 2007.
K. Schleupen, S. Lekuch, R. Mannion, Z. Guo, W. Najjar, and F. Vahid. Dynamic Partial FPGA Reconfiguration in a Prototype Microprocessor System. Int. Conf. on Field Programmable Logic and Applications (FPL), 2007.
G. Stitt and F. Vahid. Binary Synthesis. ACM Transactions on Design Automation of Electronic Systems (TODAES), 2007 (to appear).
S. Sirowy and F. Vahid. Integrated Coupling and Clock Frequency Assignment. International Embedded Systems Symposium (IESS), 2007.
D. Sheldon, F. Vahid and S. Lonardi. Soft-Core Processor Customization Using the Design of Experiments Paradigm. IEEE/ACM Design Automation and Test in Europe (DATE), 2007, pp. 821-826.
A. Gordon-Ross, P. Viana, F. Vahid, W. Najjar, E. Barros. A One-Shot Configurable-Cache Tuner for Improved Energy and Performance. IEEE/ACM Design Automation and Test in Europe (DATE), 2007, pp. 755-760.
S. Sirowy, Y. Wu, S. Lonardi and F. Vahid. Two Level Microprocessor-Accelerator Partitioning. IEEE/ACM Design Automation and Test in Europe (DATE), 2007, pp. 313-318.
S. Sirowy, Y. Wu, S. Lonardi and F. Vahid. Clock-Frequency Partitioning for Multiple Clock Domains Systems-on-a-Chip. IEEE/ACM Design Automation and Test in Europe (DATE), 2007, pp. 397-402.
Conjoining Soft-Core FPGA Processors
D. Sheldon, R. Kumar, F. Vahid, D.M. Tullsen, R. Lysecky
IEEE/ACM International Conference on Computer-Aided Design (ICCAD), Nov. 2006, to appear.
A Code Refinement Methodology for Performance-Improved Synthesis from C
G. Stitt, F. Vahid, W. Najjar
IEEE/ACM International Conference on Computer-Aided Design (ICCAD), Nov. 2006.
Application-Specific Customization of Parameterized FPGA Soft-Core Processors
D. Sheldon, R. Kumar, R. Lysecky, F. Vahid, D.M. Tullsen
IEEE/ACM International Conference on Computer-Aided Design (ICCAD), Nov. 2006, to appear.
Automated Application-Specific Tuning of Parameterized Sensor-Based Embedded System Building Blocks
S. Lysecky, F. Vahid
Int. Conf. on Ubiquitous Computing (UbiComp), Sep. 2006, to appear.
Automated Generation of Basic Custom Sensor-Based Embedded Computing Systems Guided by End-User Optimization Criteria
S. Lysecky, F. Vahid
Int. Conf. on Ubiquitous Computing (UbiComp), Sep. 2006, to appear.
Warp Processors
R. Lysecky, G. Stitt, F. Vahid
ACM Transactions on Design Automation of Electronic Systems (TODAES), July 2006, pp. 659-681.
Configurable Cache Subsetting for Fast Cache Tuning
P. Viana, A. Gordon-Ross, E. Keogh, E. Barros, F. Vahid
IEEE/ACM Design Automation Conference (DAC), July 2006, to appear.
New Decompilation Techniques for Binary-level Co-processor Generation
G. Stitt, F. Vahid
IEEE/ACM International Conference on Computer-Aided Design (ICCAD), Nov. 2005, pp. 547-554.
Fast Configurable-Cache Tuning with a Unified Second-Level Cache
A. Gordon-Ross, F. Vahid, N. Dutt
International Symposium on Low-Power Electronics and Design (ISLPED), Aug. 2005, pp. 323-326.
Hardware/Software Partitioning of Software Binaries: A Case Study of H.264 Decode
G. Stitt, F. Vahid, G. McGregor, B. Einloth
International Conference on Hardware/Software Codesign and System Synthesis (CODES/ISSS), Sep. 2005, pp. 285-290.
Shows that binary-level partitioning and synthesis of a real, highly optimized H.264 video decoder is competitive with source-level (C) partitioning and synthesis. Also introduces several simple C coding guidelines that greatly improve synthesis results.
Frequent Loop Detection Using Efficient Non-Intrusive On-Chip Hardware
A. Gordon-Ross and F. Vahid
IEEE Transactions on Computers, Special Issue on Embedded Systems, Microarchitecture, and Compilation Techniques in Memory of B. Ramakrishna (Bob) Rau, Oct. 2005, Vol. 54, Issue 10, pp. 1203-1215.
Describes extensive studies resulting in lean profiler hardware that effectively finds the addresses of frequent loops in an executing software binary.
Usability of State Based Boolean eBlocks
S. Cotterell and F. Vahid
11th International Conference on Human-Computer Interaction (HCII), 2005.
Four basic state-based blocks (prolonger, tripper, toggle, and pulse generator) are understandable by novice users and can be connected to define a good range of desired sensor-system behaviors.
A Study of the Scalability of On-Chip Routing for Just-in-Time FPGA Compilation
R. Lysecky, F. Vahid and S. Tan
IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM), 2005, pp. 57-62
Describes an FPGA routing approach that is lean in runtime and memory. Compared with a popular router, it runs three times faster and uses less than one-fifteenth of the memory, yet produces a critical path that is only 30% longer on average, and about equal for very large circuits. Our approach, ROCR (Riverside On-Chip Router), can be useful for methods requiring just-in-time FPGA compilation, such as our warp processing method, and for future methods using a standard FPGA binary.
A Logic Block Enabling Logic Configuration by Non-Experts in Sensor Networks
S. Cotterell and F. Vahid
Conference on Human Factors in Computing (CHI), 2005, pp. 1925-1928.
Describes attempts to build a logic block that non-experts could configure to compute particular sensor conditions (e.g., motion and no light). Shows that a truth-table-based block is too complicated for non-experts. A sentence-based block achieves high success, though it is less general. A color-coded truth table presented in sentence form also achieves reasonable success while being more general.
A Way-Halting Cache for Low-Energy High-Performance Systems
C. Zhang, F. Vahid, J. Yang, and W. Najjar
ACM Transactions on Architecture and Code Optimization (TACO), Vol. 2, No. 1, March 2005, pp. 34-54.
Describes a cache design that stores the four low-order bits of each tag in a separate small, fully-associative memory (a halt-tag array). Concurrently with address decoding, the halt-tag array detects mismatches in the low-order four tag bits of all the tags. A mismatch masks out the corresponding decode line, halting further tag and data access. A way-halting cache yields 55% memory-access energy savings on average, with no performance overhead.
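The halting check described above can be illustrated with a small functional model. This is a minimal Python sketch, not the paper's circuit. The class name `WayHaltingCache`, the 64-set, 4-way geometry, and the 5-bit line offset are illustrative assumptions. The halt-tag array is modeled as one entry per set and way rather than as a CAM, and data arrays are omitted. Only ways whose stored low-order four tag bits match the address go on to a full tag comparison. The counter `full_tag_reads` makes the halted accesses visible.

```python
# Functional model of a way-halting lookup (illustrative sketch, not the published circuit).
HALT_BITS = 4                       # low-order tag bits kept in the halt-tag array
HALT_MASK = (1 << HALT_BITS) - 1


class WayHaltingCache:
    """Set-associative tag store with a separate halt-tag array.

    Assumed geometry: num_sets must be a power of two; offset_bits sets the line size.
    Only tags are modeled; data arrays are omitted.
    """

    def __init__(self, num_sets=64, num_ways=4, offset_bits=5):
        self.num_sets = num_sets
        self.num_ways = num_ways
        self.offset_bits = offset_bits
        self.index_bits = num_sets.bit_length() - 1
        # None marks an invalid (empty) line.
        self.tags = [[None] * num_ways for _ in range(num_sets)]
        self.halt_tags = [[None] * num_ways for _ in range(num_sets)]
        self.full_tag_reads = 0  # full tag comparisons actually performed

    def _split(self, addr):
        """Split an address into (set index, full tag)."""
        index = (addr >> self.offset_bits) & (self.num_sets - 1)
        tag = addr >> (self.offset_bits + self.index_bits)
        return index, tag

    def fill(self, addr, way):
        """Install the line containing addr into the given way."""
        index, tag = self._split(addr)
        self.tags[index][way] = tag
        self.halt_tags[index][way] = tag & HALT_MASK

    def lookup(self, addr):
        """Return the hitting way, or None on a miss."""
        index, tag = self._split(addr)
        low = tag & HALT_MASK
        # In hardware this check overlaps set decoding; ways that fail it are halted.
        active = [w for w in range(self.num_ways) if self.halt_tags[index][w] == low]
        # Surviving ways are compared in parallel in hardware; count each comparison.
        self.full_tag_reads += len(active)
        for way in active:
            if self.tags[index][way] == tag:
                return way
        return None


if __name__ == "__main__":
    cache = WayHaltingCache()
    cache.fill(0x12345, way=2)
    print(cache.lookup(0x12345), cache.full_tag_reads)  # 2 1
```

In this model, a lookup whose low-order tag bits differ from those stored in every way of the set performs no full tag comparison at all. This mirrors the halting behavior the abstract describes.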
A First Look at the Interplay of Code Reordering and Configurable Caches
A. Gordon-Ross, F. Vahid, N. Dutt
Great Lakes Symposium on VLSI (GLSVLSI), April 2005, pp. 416-421.
Shows that a configurable cache dominates compiler-based code reordering for tuning an application to a cache to improve power and performance. Yet combining the two methods does yield a smaller overall cache, by 13% on average and up to 89%.
eBlocks - An Enabling Technology for Basic Sensor Based Systems
S. Cotterell, R. Mannion, F. Vahid, H. Hsieh
IPSN Track on Sensor Platform, Tools and Design Methods for Networked Embedded Systems (SPOTS), April 2005.
Describes how physical eBlock prototypes and a graphical eBlock simulation tool were used by hundreds of users during the development and refinement of eBlock sensor network nodes.
A Highly Configurable Cache for Low Energy Embedded Systems
C. Zhang, F. Vahid and W. Najjar
ACM Transactions on Embedded Computing Systems (TECS), Vol. 4, Issue 2, May 2005, pp. 363-387
Describes a cache whose total size, associativity, and line size can be configured just by setting a few bits in a configuration register. Provides experimental results demonstrating that tuning the configuration to a particular software application's needs reduces memory-access energy by over 40% on average across a large set of benchmarks.
A Study of the Speedups and Competitiveness of FPGA Soft Processor Cores using Dynamic Hardware/Software Partitioning
R. Lysecky and F. Vahid
Design Automation and Test in Europe (DATE), March 2005, pp. 18-23.
Highlights speedup and energy results of warp processing, which dynamically and transparently remaps software kernels to FPGA using on-chip synthesis tools, for software running on a Xilinx MicroBlaze soft-core processor. Results show performance and energy competitive with software on regular "hard-core" embedded microprocessors. This makes soft cores on FPGAs even more attractive, beyond their flexibility of placing different numbers of cores and custom circuitry on a single chip.
A Decompilation Approach to Partitioning Software for Microprocessor/FPGA Platforms
G. Stitt and F. Vahid
Design Automation and Test in Europe (DATE), March 2005, pp. 396-397
Advanced decompilation techniques let hardware synthesis from binaries recover nearly all high-level constructs that existed in the source code, even across different compiler optimization levels.
System Synthesis for Networks of Programmable Blocks
R. Mannion, H. Hsieh, S. Cotterell, F. Vahid
Design Automation and Test in Europe (DATE), March 2005, pp. 888-893
Describes techniques to automatically convert a network of pre-defined eBlocks into a minimal number of programmable eBlocks, while also generating code for those blocks.
Techniques for Synthesizing Binaries to an Advanced Register/Memory Structure
G. Stitt, Z. Guo, F. Vahid, and W. Najjar
ACM/SIGDA Symp. on Field Programmable Gate Arrays (FPGA), Feb. 2005, pp. 118-124
Advanced decompilation methods can make synthesizing FPGA hardware from software binaries competitive with synthesizing directly from C-level source code. This holds even when using an advanced memory structure (smart buffer) that requires knowledge of loops and arrays. Synthesis from binaries offers the advantages of language independence, tool independence, portability, and support for legacy code.
Applications and Experiments with eBlocks -- Electronic Blocks for Basic Sensor-Based Systems
S. Cotterell, K. Downey, and F. Vahid
IEEE Sensor and Ad Hoc Communications and Networks (SECON), Oct. 2004.
Describes common applications that can be built just by connecting eBlocks together, enabling people without programming experience to build useful sensor-based systems. Summarizes experiences with hundreds of users, reporting success rates even for logic and state-based blocks.
A Way-Halting Cache for Low-Energy High-Performance Systems
C. Zhang, F. Vahid, J. Yang and W. Najjar
International Symposium on Low-Power Electronics and Design (ISLPED), Aug 2004, pp. 126-131
Describes a cache whose tag comparison logic includes a small, fast, fully-associative memory that quickly detects a mismatch in a particular cache way. It then halts further tag and data access of that way, thus saving power.
Dynamic FPGA Routing for Just-in-Time FPGA Compilation
R. Lysecky, F. Vahid, and S. Tan
Design Automation Conference (DAC), June 2004, pp. 954-959.
Describes an FPGA routing heuristic suitable for on-chip execution, to support just-in-time compilation for FPGAs.
Tuning caches to applications for low-energy embedded systems
A. Gordon-Ross, C. Zhang, F. Vahid, N. Dutt
Chapter 6 in Ultra Low-Power Electronics and Design, Kluwer Academic Publishers, June 2004.
A Self-Tuning Cache Architecture for Embedded Systems
C. Zhang, F. Vahid and R. Lysecky
ACM Transactions on Embedded Computing Systems (TECS), Vol. 3, Issue 2, May 2004, pp. 407-425.
Describes a configurable cache that monitors its own hit rate and automatically reconfigures its number of ways (associativity), line size, and total size to reduce power and/or improve performance. It uses an efficient heuristic that both prunes the configuration search space and avoids cache flushes during the search.
A Quantitative Analysis of the Speedup Factors of FPGAs over Processors
Z. Guo, W. Najjar, F. Vahid and K. Vissers
ACM/IEEE International Symposium on Field-Programmable Gate Arrays (FPGA), Feb. 2004.
A Configurable Logic Architecture for Dynamic Hardware/Software Partitioning
R. Lysecky and F. Vahid
Design Automation and Test in Europe Conference (DATE), February 2004, pp. 480-485.
Describes a simple configurable logic (FPGA) fabric and surrounding architecture specifically intended to support dynamic hardware/software partitioning, meaning on-chip CAD tools must be able to quickly map a netlist to the fabric.
Automatic Tuning of Two-Level Caches to Embedded Applications
A. Gordon-Ross, F. Vahid and N. Dutt
Design Automation and Test in Europe Conference (DATE), February 2004, pp. 208-213.
Describes efficient heuristics for tuning a two-level cache to a particular application. Such tuning obtains near-optimal memory-access energy savings of 53%-55% while exploring a mere 6% of the total configuration space.
Using a Victim Buffer in an Application-Specific Memory Hierarchy
C. Zhang and F. Vahid
Design Automation and Test in Europe Conference (DATE), February 2004, pp. 220-225.
Adding a configurable victim buffer, which can be turned on or off, to a cache improves the memory-access energy of an application by up to 43%. Such savings occur even if the cache itself is configurable. Making the buffer configurable allows it to be shut off for applications that would otherwise suffer energy and performance penalties of up to 4%.
A Self-Tuning Cache Architecture for Embedded Systems
C. Zhang, F. Vahid and R. Lysecky
Design Automation and Test in Europe Conference (DATE), February 2004, pp. 142-147.
Describes a configurable cache that can tune its total size, associativity, and line size to an executing application. The search heuristic is carefully designed to avoid flushing. The cache transparently reduces memory-access-related energy by 45%-55% on average, and by as much as 97% for particular applications.
Low Static-Power Frequent-Value Data Caches
C. Zhang, J. Yang and F. Vahid
Design Automation and Test in Europe Conference (DATE), February 2004, pp. 214-219.
Improves upon Yang and Gupta's earlier frequent value cache by eliminating its performance overhead and saving static power in addition to dynamic power, using circuit-level design improvements. A frequent value cache encodes commonly occurring data values into just a few bits, shutting down the remaining bit storage cells. Static energy savings of 33% are obtained.
Energy Savings and Speedups from Partitioning Critical Software Loops to Hardware in Embedded Systems
G. Stitt, F. Vahid, S. Nemetebaksh
ACM Transactions on Embedded Computing Systems (TECS), January 2004.
Partitioning a program's kernels to FPGA hardware can reduce overall system energy.
A Way-Halting Cache for Low-Energy High-Performance Systems
C. Zhang, F. Vahid, J. Yang, W. Najjar
IEEE Computer Architecture Letters, Vol. 2, Sep. 2003.
The four low-order bits of each cache tag are stored in a fast, efficient CAM and accessed concurrently with set decoding. If those four bits mismatch for the decoded set, the full tag comparisons and data array accesses are "halted." This saves power with no performance overhead, unlike other power-saving caches.
Frequent Loop Detection Using Efficient Non-Intrusive On-Chip Hardware
A. Gordon-Ross and F. Vahid
ACM/IEEE Conference on Compilers, Architecture and Synthesis for Embedded Systems (CASES), 2003, pp. 117-124.
Describes efficient, non-intrusive hardware for detecting the most frequent loops in an executing binary and the relative frequencies of those loops.
First Results with eBlocks: Embedded Systems Building Blocks
S. Cotterell, F. Vahid, W. Najjar and H. Hsieh
ACM/IEEE ISSS/CODES conference, 2003, pp. 168-175
Describes embedded system building blocks that people with no training can connect together to build simple but useful systems.
A Codesigned On-Chip Logic Minimizer
R. Lysecky and F. Vahid
ACM/IEEE ISSS/CODES conference, 2003, pp. 109-113.
Hardware/software partitioning of an on-chip logic minimizer yields an 8x speedup and 60% energy savings, improving the usefulness of on-chip logic minimization in a variety of applications.
Tiny Instruction Caches For Low Power Embedded Systems
A. Gordon-Ross, S. Cotterell, and F. Vahid
ACM Transactions on Embedded Computing Systems, Vol. 2, Issue 4, Nov. 2003, pp. 449-481.
Putting a very small (e.g., 128-word) loop cache in front of the L1 instruction cache can greatly reduce power, with no performance overhead.
Profiling tools for hardware/software partitioning of embedded applications
D.C. Suresh, W.A. Najjar, F. Vahid, J.R. Villarreal, G. Stitt
Languages, Compilers and Tools for Embedded Systems (LCTES), 2003, pp. 189-198.
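The frequent-loop detection entries in this list (Gordon-Ross and Vahid, CASES 2003 and IEEE Transactions on Computers 2005) find the most frequent loops in an executing binary and their relative frequencies. The Python sketch below illustrates the general idea in software only. It assumes that a loop iteration is signaled by a taken backward branch. The function name `frequent_loops` and the trace format of (branch address, target address) pairs are hypothetical. Unlike the lean, bounded hardware those papers describe, the sketch uses an unbounded counter table.

```python
from collections import Counter


def frequent_loops(branch_trace, top_n=4):
    """Rank loops in a trace of taken branches by iteration count.

    branch_trace: iterable of (branch_addr, target_addr) pairs for taken branches.
    A taken branch whose target is at or before the branch itself is treated as
    closing one iteration of the loop spanning [target_addr, branch_addr].
    Returns up to top_n tuples of ((start, end), count, share_of_all_iterations).
    """
    counts = Counter()
    for branch_addr, target_addr in branch_trace:
        if target_addr <= branch_addr:  # backward branch => one loop iteration
            counts[(target_addr, branch_addr)] += 1
    total = sum(counts.values())
    return [(loop, n, n / total) for loop, n in counts.most_common(top_n)]


if __name__ == "__main__":
    # Hypothetical trace: a 10-iteration loop, a 5-iteration loop, one forward branch.
    trace = [(0x120, 0x100)] * 10 + [(0x220, 0x200)] * 5 + [(0x300, 0x340)]
    for (start, end), count, share in frequent_loops(trace):
        print(hex(start), hex(end), count, round(share, 2))
```

Forward branches are ignored, so the share column reports each loop's fraction of all observed loop iterations. This corresponds to the "relative frequencies" the entries mention.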
A Highly-Configurable Cache Architecture For Embedded Systems
C. Zhang, F. Vahid and W. Najjar
International Symposium on Computer Architecture, 2003, pp. 136-146.
A cache whose number of ways and total size can be tuned to a particular program yields large energy savings with almost no performance overhead.
Cache Configuration Exploration on Prototyping Platforms
C. Zhang and F. Vahid
Rapid System Prototyping, 2003, pp. 164-171
Methods to automatically tune a configurable cache to a particular software application.
Dynamic Hardware/Software Partitioning: A First Approach
G. Stitt, R. Lysecky and F. Vahid
Design Automation Conference, 2003, pp. 250-255.
Dynamically partitioning an executing software application onto an on-chip FPGA is not only possible but quite effective.
On-Chip Logic Minimization
R. Lysecky and F. Vahid
Design Automation Conference, 2003, pp. 334-337.
Executing a lean form of logic minimization on-chip is feasible and has several immediate applications in networking.
Embedded System Design: UCR's Undergraduate Three-Course Sequence
F. Vahid
Microelectronics Systems Engineering (MSE) conference, 2003.
Summarizes UCR's successful three-course sequence on embedded system design, based on the new ESD book (see above), which emphasizes a unified view of hardware and software.
The Softening of Hardware
F. Vahid
IEEE Computer, April 2003, pp. 27-34.
A new perspective on hardware becoming much more like software, due in part to configurable logic and in part to hardware now being created by compiling high-level languages.
Highly Configurable Platforms for Embedded Computing Systems
F. Vahid, R. Lysecky, C. Zhang and G. Stitt
Microelectronics Journal, Elsevier Publishers, Volume 34, Issue 11, November 2003, Pages 1025-1029.
The case for creating platform chips with much configurability, including on-chip FPGA, configurable cache, etc.
Making the Best of those Extra Transistors
F. Vahid
IEEE Design and Test of Computers, Jan/Feb 2003, pg. 96.
An argument for new uses of the abundant transistors on modern chips.
Energy Benefits of a Configurable Line Size Cache for Embedded Systems
C. Zhang, F. Vahid, W. Najjar
IEEE Computer Society Annual Symposium on VLSI, Feb. 2003, pp. 87-91.
Creating a cache whose line size can be configured to 16, 32, or 64 bytes results in surprisingly large energy savings.
Platune: A Tuning Framework for System-on-a-Chip Platforms
T. Givargis and F. Vahid
IEEE Transactions on Computer Aided Design, Vol. 21, No. 11, Nov. 2002, pp. 1317-1327.
Platune tunes an architecture to an application by rapidly exploring the huge configuration space of configurable caches, buses, voltage levels, and many other configurable architectural parameters.
Instruction-based System-level Power Evaluation of System-on-a-chip Peripheral Cores
T. Givargis, F. Vahid and J. Henkel
IEEE Transactions on VLSI, pp. 856-863, Vol 10, No 6, Dec. 2002.
We break peripheral behavior into "instructions" and back-annotate those instructions with low-level power data, yielding fast and accurate system-level power estimates.
Power Estimator Development for Embedded System Memory Tuning
F. Vahid, T. Givargis and S. Cotterell
Journal of Circuits, Systems and Computers, vol. 11, no. 5, pp. 459-476, October 2002.
We describe three increasingly accurate methods for estimating the power of a memory hierarchy.
Partitioning Sequential Programs for CAD using a Three-Step Approach
F. Vahid
ACM Transactions on Design Automation of Electronic Systems, Vol 7, Issue 3, pp 413-429, July 2002.
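The Pareto-optimal exploration entries in this list (Givargis, Vahid and Henkel, ICCAD 2001 and IEEE Transactions on VLSI Systems 2002) seek all configurations that offer meaningful power/performance tradeoffs. As a minimal illustration of what "Pareto-optimal" means here, the Python sketch below filters a set of already-evaluated configurations by brute force. It does not reproduce the dependency-based pruning those papers use to avoid evaluating the whole space. The function name `pareto_front` and the example numbers are hypothetical.

```python
def pareto_front(configs):
    """Return the configurations not dominated in (power, execution time).

    configs maps a configuration name to a (power, exec_time) pair; lower is
    better for both. A configuration is dominated if another one is no worse
    in both objectives and strictly better in at least one.
    """
    front = {}
    for name, (power, exec_time) in configs.items():
        dominated = any(
            p <= power and t <= exec_time and (p < power or t < exec_time)
            for other, (p, t) in configs.items()
            if other != name
        )
        if not dominated:
            front[name] = (power, exec_time)
    return front


if __name__ == "__main__":
    # Hypothetical (power in mW, execution time in ms) pairs for four configurations.
    example = {"A": (1.0, 10.0), "B": (2.0, 5.0), "C": (2.5, 6.0), "D": (1.0, 12.0)}
    print(sorted(pareto_front(example)))  # ['A', 'B']
```

Here C is dominated by B and D by A, leaving A and B as the tradeoff points a designer would choose between. The quadratic pairwise check is fine for a handful of points but is exactly the cost that pruning the configuration space is meant to avoid.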
The Energy Advantages of Microprocessor Platforms with On-Chip Configurable Logic
G. Stitt and F. Vahid
IEEE Design and Test of Computers, November/December 2002, pp. 36-43.
System-level Exploration for Pareto-optimal Configurations in Parameterized System-on-a-chip
T. Givargis, F. Vahid and J. Henkel
IEEE Transactions on VLSI Systems, Vol. 10, Issue 4, Dec. 2002, pp. 416-422.
Improving Software Performance with Configurable Logic
J. Villarreal, D. Suresh, G. Stitt, F. Vahid and W. Najjar
Kluwer Journal on Design Automation of Embedded Systems, November 2002, Volume 7, Issue 4, pp. 325-339.
Hardware/Software Partitioning of Software Binaries
G. Stitt and F. Vahid
IEEE/ACM International Conference on Computer Aided Design, November 2002, pp. 164-170.
Synthesis of Customized Loop Caches for Core-Based Embedded Systems
S. Cotterell and F. Vahid
IEEE/ACM International Conference on Computer Aided Design, November 2002, pp. 655-662.
Tuning of Loop Cache Architectures to Programs in Embedded System Design
S. Cotterell and F. Vahid
IEEE/ACM International Symposium on System Synthesis, October 2002, pp. 8-13.
Dynamic Loop Caching Meets Preloaded Loop Caching -- A Hybrid Approach
A. Gordon-Ross and F. Vahid
International Conference on Computer Design, September 2002, pp. 446-449.
Tuning of Cache Ways and Voltage for Low-Energy Embedded System Platforms
T. Givargis and F. Vahid
Kluwer Journal on Design Automation of Embedded Systems, vol. 7, issue 1-2, pp. 35-51, September 2002.
A Fast On-Chip Profiler Memory
R. Lysecky, S. Cotterell and F. Vahid
IEEE/ACM Design Automation Conference, June 2002, pp. 28-33.
Codesign-Extended Applications
B. Grattan, G. Stitt and F. Vahid
IEEE/ACM International Symposium on Hardware/Software Codesign, Estes Park, May 2002, pp. 1-6.
A Power-Configurable Bus for Embedded Systems
C. Zhang and F. Vahid
IEEE International Symposium on Circuits and Systems, Scottsdale, May 2002, pp. V-809-812.
Using On-Chip Configurable Logic to Reduce Embedded System Software Energy
G. Stitt, B. Grattan, J. Villarreal and F. Vahid
IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM), Napa Valley, April 2002, pp. 143-151.
Propagating Constants Past Software to Hardware Peripherals in Fixed-Application Embedded Systems
G. Stitt and F. Vahid
In the book "Compilers and Operating Systems for Low Power," editors L. Benini, M. Kandemir, J. Ramanujam, Kluwer Academic Publishers, 2003, Chapter 7, pp. 115-136.
Exploiting Fixed Programs in Embedded Systems: A Loop Cache Example
A. Gordon-Ross, S. Cotterell and F. Vahid
IEEE Computer Architecture Letters, Vol. 1, January 2002.
Prefetching for Improved Bus Wrapper Performance in Cores
R. Lysecky and F. Vahid
ACM Transactions on Design Automation of Electronic Systems, Vol. 7, No. 1, pp. 58-90, January 2002.
Propagating Constants Past Software to Hardware Peripherals in Fixed-Application Embedded Systems
F. Vahid, R. Patel and G. Stitt
Special Issue of ACM SIGARCH Newsletter, Dec. 2001. Selected for special issue from earlier version of paper in Compilers and Operating Systems for Low Power (COLP'01).
Describes the size and power advantages of recognizing that software-configurable control registers in peripherals may never change after initialization if the software itself never changes, as is common in embedded systems.
System-level Exploration for Pareto-optimal Configurations in Parameterized Systems-on-a-chip
T. Givargis, F. Vahid and J. Henkel
International Conference on Computer Aided Design, Nov 2001, pp. 25-30.
Provides a technique for efficiently exploring the configuration space of a parameterized system-on-a-chip (SOC) architecture to find all Pareto-optimal configurations. These configurations represent the range of meaningful power and performance tradeoffs obtainable by adjusting parameter values for a fixed application mapped onto the SOC architecture. Our approach extensively prunes the potentially large configuration space by exploiting parameter dependencies. We have successfully incorporated our technique into the parameterized SOC tuning environment (Platune) and applied it to a number of applications.
Evaluating Power Consumption of Parameterized Cache and Bus Architectures in System-on-a-Chip Designs
T. Givargis, F. Vahid and J. Henkel
IEEE Transactions on VLSI, Vol 9, No. 4, pp. 500-508, Aug 2001.
Architectures with parameterizable caches and buses can support large tradeoffs between performance and power. We provide simulation data showing these tradeoffs for several applications and demonstrating that the cache and bus should be configured simultaneously to find the optimal solutions. Furthermore, we describe analytical techniques that speed up cache/bus power and performance evaluation by several orders of magnitude over simulation, while maintaining sufficient accuracy relative to simulation-based approaches.
A Self-Optimizing Embedded Microprocessor using a Loop Table for Low Power
F. Vahid and A. Gordon-Ross
International Symposium on Low Power Electronics and Design, Aug 2001, pp. 219-224.
Describes the architecture and methodology of an embedded microprocessor that can automatically tune itself to the particular application it will run. The tunable component described is a loop table, similar to a loop cache except that its contents never change after the most frequent loops are detected.
Platform Tuning for Embedded Systems Design
F. Vahid and T. Givargis
IEEE Computer, Vol. 34, No. 3, pp. 112-114, March 2001.
Provides an overview of the philosophy of our UCR Dalton Project, in particular the idea of tuning a programmable system-on-a-chip architecture to the one application it will eventually run forever.
Trace-driven System-level Power Evaluation of System-on-a-chip Peripheral Cores
T. Givargis, F. Vahid and J. Henkel
Asia South-Pacific Design Automation Conference (ASP-DAC), pp. 306-311, January 2001.
Our earlier work on fast evaluation of power consumption of general cores in a system-on-a-chip used three steps: isolating high-level instructions of a core, measuring gate-level power consumption per instruction, and annotating a system-level simulation model with the obtained data. In this work, we describe a method for speeding up the evaluation further through the use of instruction traces and trace simulators for every core, not just microprocessor cores. Our method shows noticeable speedups at an acceptable loss of accuracy. We show that reducing trace sizes can speed up the method even further. The speedups allow for more extensive system-level power exploration and hence better optimization.
A First-step Towards an Architecture Tuning Methodology for Low Power
G. Stitt, F. Vahid, T. Givargis, R. Lysecky
Compilers, Architectures, and Synthesis for Embedded Systems (CASES'00), pp. 187-192, November 2000.
We describe an automated environment that assists a system-on-a-chip designer in tuning a microprocessor core to a particular application program that will run on it, and vice versa, with the goal of reducing embedded system power consumption. We limit such tuning to modifications that do not change the microprocessor instruction set, thus avoiding the large costs that would come with such a change. Our tuning environment for the 8051 microcontroller is freely available on the web.
Instruction-based System-level Power Evaluation of System-on-a-chip Peripheral Cores
T. Givargis, F. Vahid and J. Henkel
IEEE/ACM International Symposium on System Synthesis (ISSS), pp. 163-169, September 2000.
We propose a new technique, suitable for a variety of cores like peripheral cores, that is the first to combine gate-level power data with a system-level simulation model written in C++ or Java. For that purpose, we investigated peripheral cores and decomposed their functionality into so-called instructions. Our technique addresses a core-based system design paradigm. We show that our technique is sufficiently accurate for making power-related system-level design decisions, and that its computation time is orders of magnitude smaller than lower-level simulation approaches.
Experiments with the Peripheral Virtual Component Interface
R. Lysecky, F. Vahid, T. Givargis
International Symposium on System Synthesis (ISSS), pp. 221-224, September 2000.
The Peripheral Virtual Component Interface, or PVCI, is a standard intended to simplify the interfacing of peripheral cores to on-chip buses in a system-on-a-chip, by standardizing the interface between a core's internals and its bus wrapper. We provide results of experiments intended to determine the power, performance, and size overhead associated with using a PVCI bus wrapper versus using a non-PVCI bus wrapper, and versus using no bus wrapper at all. The results demonstrate that using a bus wrapper may result in only small performance, power and size overhead versus using no wrapper, though even that performance overhead can be reduced or eliminated using pre-fetching. The results also demonstrate that using a PVCI bus wrapper yields no significant additional power, performance or size overhead compared with a non-PVCI bus wrapper.
Parameterized System Design
T. Givargis and F. Vahid
IEEE/ACM International Workshop on Hardware/Software Codesign (CODES), pp. 98-102, May 2000.
Continued growth in chip capacity has led to new methodologies stressing reuse, not only of pre-designed processing components, but even of entire pre-designed architectures. To be used across a variety of applications, such architectures must be heavily parameterized, so they can adapt to those applications' differing constraints by trading off power, performance and size. We describe several parameterized system design issues, and provide results showing how a single architecture with easily configurable parameters can support a wide range of tradeoffs.
Fast Cache and Bus Power Estimation for Parameterized System-on-a-Chip Design
T. Givargis, F. Vahid and J. Henkel
Design Automation and Test in Europe (DATE) Conference, pp. 334-338, March 2000.
We present a technique for fast estimation of the power consumed by the cache and bus sub-system of a parameterized system-on-a-chip design for a given application. The technique uses a two-step approach of first collecting intermediate data about an application using simulation, and then using equations to rapidly predict the performance and power consumption for each of thousands of possible configurations of system parameters, such as cache size and associativity and bus size and encoding. The estimations display good absolute as well as relative accuracy for various examples, and are obtained in dramatically less time than other techniques, making possible the future use of powerful search heuristics.
Techniques for Reducing Read Latency of Core Bus Wrappers
R. Lysecky, F. Vahid, T. Givargis
Design Automation and Test in Europe (DATE) Conference, pp. 84-91, March 2000. Best Paper Award.
Today's system-on-a-chip designs consist of many cores. To enable cores to be easily integrated into different systems, many propose creating cores with their internal logic separated from their bus wrapper. This separation may introduce extra read latency. Pre-fetching register data into register copies in the bus wrapper can reduce or eliminate this extra latency. In this paper, we introduce a technique for automatically designing a pre-fetch unit that satisfies user-imposed register-access constraints. The technique benefits from mapping the pre-fetching problem to the well-known real-time process scheduling problem. We then extend the technique to allow user-specified register interdependencies, using a Petri Net model, resulting in even more efficient pre-fetch schedules.
A Hybrid Approach for Core-Based System-Level Power Modeling
T. Givargis, F. Vahid and J. Henkel
Asia South-Pacific Design Automation Conference (ASP-DAC), pp. 141-145, January 2000.
Describes a technique for obtaining fast yet accurate power estimations of core-based systems. The main idea is to use an object-oriented language (C++ or Java) to create a system-level model, modeling each core as an object, and extending each object with power-estimation methods based on statistics from low-level power data of a synthesized version of the core. By executing the system-level model, which runs about 1000x faster than gate-level simulation, we obtain very accurate power estimates.
Interface and Cache Power Exploration for Core-Based Embedded System
T. Givargis, J. Henkel and F. Vahid
International Conference on Computer-Aided Design (ICCAD), pp. 270-273, November 1999.
Demonstrates, through experiments on four applications, the large power, performance and size tradeoffs possible just by varying architectural parameters relating to cache and bus for a given reference architecture. Illustrates that these parameters must be tuned to one another for each application, and thus argues for the need for a parameter exploration environment in a configure-and-execute design paradigm.
Pre-fetching for Improved Core Interfacing
R. Lysecky, F. Vahid, T. Givargis, and R. Patel
International Symposium on System Synthesis (ISSS), pp. 51-55, November 1999.
Introduces a method to reduce or eliminate the extra latency that may arise when reading from a core designed with a bus wrapper for ease of retargeting to different system buses. The method involves pre-fetching registers from the core's internals to registers added in the bus wrapper, akin to caching.
The Case for a Configure-and-Execute Paradigm
F. Vahid and T. Givargis
International Workshop on Hardware/Software Codesign (CODES), pp. 59-63, May 1999.
Provides an argument, supported by data obtained by various researchers, in favor of building systems-on-a-chip by configuring a pre-designed reference design already in silicon, rather than building systems by connecting large numbers of cores.
FSMD Functional Partitioning for Low Power
E. Hwang, F. Vahid and Y.C. Hsu
Design Automation and Test in Europe (DATE) Conference, pp. 22-28, March 1999.
Techniques for Minimizing and Balancing I/O during Functional Partitioning
F. Vahid
IEEE Transactions on CAD, Vol. 18, No. 1, pp. 69-75, January 1999.
Procedure Cloning: A Transformation for Improved System-Level Functional Partitioning
F. Vahid
ACM Transactions on Design Automation of Electronic Systems, Volume 4, Number 1, pp. 70-96, 1999.
A Three-Step Approach to the Functional Partitioning of Large Behavioral Processes
F. Vahid
International Symposium on System Synthesis, pp. 152-157, December 1998.
Incorporating Cores into System-Level Specification
F. Vahid and T. Givargis
International Symposium on System Synthesis (ISSS), pp. 43-48, December 1998.
Describes a method for specifying, at the system level, a system built from pre-designed system components (cores) using an object-oriented language, resulting in dramatically faster simulations than approaches based on HDLs.
Interface Exploration for Reduced Power in Core-Based Systems
T. Givargis and F. Vahid
International Symposium on System Synthesis (ISSS), pp. 117-122, December 1998.
Provides equations developed to enable one to explore various bus configurations in a parameterized architecture very rapidly. One simulates an application once, accumulating bus-traffic data that is then fed into a tool that analyzes each bus configuration in constant time using the equations. The power- or performance-optimal bus can thus be quickly selected for a given application.
System-Level Exploration with SpecSyn
D. Gajski, F. Vahid, S. Narayan and J. Gong
Design Automation Conference, pp. 812-817, June 1998.
Functional Partitioning Improvements over Structural Partitioning for Packaging Constraints and Synthesis-tool Performance
F. Vahid, T.D.M. Le and Y.C. Hsu
ACM Transactions on Design Automation of Electronic Systems, Volume 3, Number 2, pp. 181-208, 1998.
SpecSyn: An Environment Supporting the Specify-Explore-Refine Paradigm for Hardware/Software System Design
D.D. Gajski, F. Vahid, S. Narayan and J. Gong
IEEE Transactions on VLSI Systems, Vol. 6, No. 1, pp. 84-100, 1998.
Awarded the IEEE VLSI Transactions Best Paper Award, June 2000.
Guest Editors' Introduction to the Special Issue on ISSS'96
F. Vahid and S. Narayan
ACM Transactions on Design Automation of Electronic Systems, Vol. 2, No. 4, Oct. 1997, pp. 307-311.
Port Calling: A Transformation for Reducing I/O during Multi-Package Functional Partitioning
F. Vahid
International Symposium on System Synthesis, pp. 107-112, September 1997.
Message-Based Hardware/Software Communication in HDL/C Environments
L. Tauro and F. Vahid
Asia-Pacific Conference on Hardware Description Languages (ASP-CHDL), pp. ??, August 1997.
An Object-Oriented Communication Library for Hardware-Software Co-Design
F. Vahid and L. Tauro
International Workshop on Hardware/Software Codesign (CODES), pp. 81-86, March 1997.
Extending the Kernighan/Lin Heuristic for Hardware and Software Functional Partitioning
F. Vahid and T.D.M. Le
Kluwer Journal on Design Automation of Embedded Systems, Vol. 2, No. 2, pp. 237-261, March 1997.
Procedure Cloning: A Transformation for Improved System-Level Functional Partitioning
F. Vahid
European Design and Test Conference, pp. 487-492, March 1997.
Modifying Min-Cut for Hardware and Software Functional Partitioning
F. Vahid
International Workshop on Hardware/Software Codesign, pp. 43-48, March 1997.
I/O and Performance Tradeoffs with the FunctionBus during Multi-FPGA Partitioning
F. Vahid
International Symposium on Field-Programmable Gate Arrays, pp. 27-34, February 1997.
A Comparison of Functional and Structural Partitioning
F. Vahid, T.D.M. Le and Y.C. Hsu
International Symposium on System Synthesis, pp. 121-126, November 1996.
Towards a Model for Hardware and Software Functional Partitioning
F. Vahid and T.D.M. Le
International Workshop on Hardware/Software Codesign, pp. 116-123, March 1996.
System Design Methodologies: Aiming at the 100 h Design Cycle
D. Gajski, S. Narayan, L. Ramachandran, F. Vahid and P. Fung
IEEE Transactions on VLSI Systems, Vol. 4, No. 1, pp. 70-82, 1996.
Closeness Metrics for System-Level Functional Partitioning
F. Vahid and D.D. Gajski
European Design Automation Conference, pp. 328-333, September 1995.
Clustering for Improved System-Level Functional Partitioning
F. Vahid and D.D. Gajski
International Symposium on System Synthesis, pp. 28-33, September 1995.
Procedure Exlining: A Transformation for Improved System and Behavioral Synthesis
F. Vahid
International Symposium on System Synthesis, pp. 84-89, September 1995.
Procedure Exlining: A New System-Level Specification Transformation
F. Vahid
European Design Automation Conference -- EuroVHDL, pp. 508-513, September 1995.
Incremental Hardware Estimation during Hardware/Software Functional Partitioning
F. Vahid and D. Gajski
IEEE Transactions on VLSI Systems, Vol. 3, No. 3, pp. 459-464, September 1995.
SpecCharts: A VHDL Front-End for Embedded Systems
F. Vahid, S. Narayan and D. Gajski
IEEE Transactions on CAD, Vol. 14, No. 6, pp. 694-706, 1995.
SLIF: A Specification-Level Intermediate Format For System Design
F. Vahid and D.D. Gajski
European Design and Test Conference, pp. 185-189, March 1995.
Specification and Design of Embedded Software-Hardware Systems
D. Gajski and F. Vahid
IEEE Design & Test of Computers, Vol. 12, No. 1, Spring 1995, pp. 53-67.
A Binary-Constraint Search Algorithm for Minimizing Hardware during Hardware-Software Partitioning
F. Vahid, J. Gong and D.D. Gajski
European Design Automation Conference -- EuroDAC, pp. 214-219, September 1994.
A Transformation Integrating VHDL Behavioral Specification with Synthesis and Software Generation
F. Vahid, S. Narayan and D.D. Gajski
European Design Automation Conference -- EuroDAC, pp. 552-557, September 1994.
A System-Design Methodology: Executable-Specification Refinement
D.D. Gajski, F. Vahid and S. Narayan
European Conference on Design Automation, pp. 458-463, March 1994.
BOOK: Specification and Design of Embedded Systems
Title page, Contents, and Preface
Online slides
D.D. Gajski, F. Vahid, S. Narayan and J. Gong
Prentice Hall, 1994.
Specification Partitioning for System Design
F. Vahid and D. Gajski
Design Automation Conference, pp. 219-224, June 1992.
Pubs from before 1994 not listed.
Talks (not associated with conference papers above)
For publications supported by NSF awards: Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.
Standard Binaries for FPGAs, & eBlocks -- NSF's Cyber-Physical Workshop, July 2007
SensorBlocks -- UCR's College of Engineering TechHorizons, 2007
Warp Processing -- SRC annual review, Carnegie Mellon Univ., 2007
Soft Core Customization and other UCR FPGA Research -- Xilinx, July 2006
The New Software: FPGAs -- University of Arizona, ECE, April 2006
Warp Processors -- Freescale, April 2006
Warp Processing: Dynamic Transparent Conversion of Binaries to Circuits -- Notre Dame, CS, Mar 2006
Warp Processing -- SRC annual review, Ohio State Univ., 2006
Warp Processor: A Dynamically Reconfigurable Coprocessor -- Talk at Intel's System Design Symposium (San Jose), Nov. 2005
Supercomputing in a Pencil Tip -- Talk at UCR's Engineering Industry Day, Oct 2005
Silicon prototyping issues -- Panel talk at Intel, May 2005
eBlocks -- Talk at UCSD, April 2005
eBlocks -- Talk at Intel, September 2004
Warp Processors -- Talk at ASU, April 2004
Warp Processors -- Talk at IBM Research, Yorktown Heights, Apr 2004
Warp Processors -- SRC annual review talk, March 2004
Self-Improving Configurable IC Platforms -- SRC annual review talk, February 2003
Improving Embedded System Software Speed and Energy using Microprocessor/FPGA Platform ICs -- UCR colloquium talk, October 2002
New Opportunities with Platform-Based Design -- Keynote talk at ESCODES'02
System-on-a-Chip Platform Tuning for Embedded Systems -- given at 2002 Southern California Embedded Systems Seminar
Recent Results at UCR with Configurable Cache and Hw/Sw Partitioning -- given at Triscend Corp., September 2002.