Embedded System Design -- A Unified Hardware/Software Introduction   By Frank Vahid and Tony Givargis, published by J. Wiley and Sons, (c) 2002. Embedded systems are designed very differently than they were 10 years ago. Today's designers must be able to trade off between software and hardware implementations of their systems, requiring a unified view of hardware and software. Yet most existing courses and books on embedded systems emphasize the nitty-gritty details of the architecture and assembly-language programming of a specific microprocessor, a focus whose relevance is fading quickly due to good compilers. This book provides a view in which embedded systems are composed of processors, some of which may be programmable, some customized. It is independent of any particular microprocessor or hardware environment, focusing instead on principles.
Specification and Design of Embedded Systems   By Dan Gajski, Frank Vahid, Sanjiv Narayan, and Jie Gong, published by Prentice Hall, 1994.
Other Books
C. Huang, F. Vahid.
Dynamic Transmuting Coprocessors.
IEEE/ACM Design Automation Conference, to appear, July 2009.
pdf (to appear)  
R. Lysecky, F. Vahid.
Design and Implementation of a MicroBlaze-based Warp Processor.
ACM Transactions on Embedded Computing Systems (TECS), April, 2009,
22 pages.
pdf  
S. Sirowy, D. Sheldon, T. Givargis, and F. Vahid.
Virtual Microcontrollers
ACM SIGBED Review, Vol. 6, Issue 1, 2009.
pdf (not avail.)  
S. Lysecky and F. Vahid.
Enabling Non-Expert Construction of Basic Sensor-Based Systems
ACM Trans. on Computer-Human Interaction (TOCHI), 2008 (to appear).
pdf (to appear)  
A. Gordon-Ross, F. Vahid, and N. Dutt.
Fast Configurable-Cache Tuning with a Unified Second-Level Cache.
IEEE Transactions on VLSI (TVLSI), 2008 (to appear).
pdf (to appear)  
R. Lysecky and F. Vahid.
Design and Implementation of a MicroBlaze-based Warp Processor.
ACM Transactions on Embedded Computing Systems (TECS), 2008 (to appear).
pdf (to appear)  
S. Sirowy, D. Sheldon, T. Givargis, and F. Vahid.
Virtual Microcontrollers
Int. Wkshp. on Embedded Systems Education (WESE), Oct 2008.
pdf (to appear)  
F. Vahid.
Timing is Everything -- Embedded Systems Demand Teaching of Structured Time-Oriented Programming.
Int. Wkshp. on Embedded Systems Education (WESE), Oct 2008.
pdf  
C. Huang, D. Sheldon, and F. Vahid.
Dynamic Tuning of Configurable Architectures: The AWW Online Algorithm.
IEEE/ACM Int. Conf. on Hardware/Software Codesign and System Synthesis,
(CODES/ISSS), Oct 2008.
pdf  
Data and Tools  
D. Sheldon and F. Vahid.
Don't Forget Memories: A Case Study Redesigning a Pattern Counting
ASIC Circuit for FPGAs.
IEEE/ACM Int. Conf. on Hardware/Software Codesign and System Synthesis,
(CODES/ISSS), Oct 2008.
pdf  
F. Vahid and T. Givargis.
Highly-Cited Ideas in System Codesign and Synthesis.
IEEE/ACM Int. Conf. on Hardware/Software Codesign and System Synthesis,
(CODES/ISSS), Oct 2008.
pdf  
ppt  
data (xls) (posted for a limited time) 
C. Huang and F. Vahid.
Dynamic Coprocessor Management for FPGA-Enhanced Compute Platforms.
IEEE/ACM Int. Conf. on Compilers, Architectures, and Synthesis for
Embedded Systems (CASES), Oct 2008.
pdf  
Data and Tools  
F. Vahid, G. Stitt, and R. Lysecky.
Warp Processing: Dynamic Translation of Binaries to FPGA Circuits.
IEEE Computer, Vol. 41, No. 7, July 2008, pp. 40-46.
pdf  
P. Viana, A. Gordon-Ross, E. Barros, F. Vahid.
A Table-Based Method for Single-Pass Cache Optimization.
ACM Great Lakes Symposium on VLSI, May 2008, pp. 71-76.
pdf  
S. Sirowy, G. Stitt, and F. Vahid.
C is for Circuits: Capturing FPGA Circuits as Sequential Code for
Portability.
ACM Int. Symp. on FPGAs, 2008, pp. 117-126.
pdf  
F. Vahid and G. Stitt.
Hardware/Software Partitioning.
Chapter 26 in S. Hauck, A. DeHon (editors), Reconfigurable Computing: The Theory and Practice of FPGA-Based Computation, Morgan Kaufmann/Elsevier, 2008.
pdf not available  
F. Vahid.
It's Time to Stop Calling Circuits Hardware.
IEEE Computer Magazine, September 2007, Vol. 40, Issue 9, pp. 106-108.
pdf  
G. Stitt and F. Vahid.
Thread Warping: A Framework for Dynamic Synthesis of Thread Accelerators.
Int. Conf. on Hardware/Software Codesign and System Synthesis
(CODES/ISSS), 2007, pp. 93-98.
pdf  
ppt  
A. Gordon-Ross and F. Vahid.
A Self-Tuning Configurable Cache.
Design Automation Conference (DAC), 2007, pp. 234-237.
pdf  
ppt  
K. Schleupen, S. Lekuch, R. Mannion, Z. Guo, W. Najjar, and F. Vahid.
Dynamic Partial FPGA Reconfiguration in a Prototype Microprocessor System.
Int. Conf. on Field Programmable Logic and Applications (FPL), 2007, pp. 533-536.
pdf (to appear)  
ppt (to appear)  
G. Stitt and F. Vahid.
Binary Synthesis.
ACM Transactions on Design Automation of Electronic Systems (TODAES), Vol. 12, No. 3, Aug 2007.
pdf  
S. Sirowy and F. Vahid.
Integrated Coupling and Clock Frequency Assignment of Accelerators During Hardware/Software Partitioning.
International Embedded Systems Symposium (IESS), 2007,
pp. 145-154.
pdf  
ppt  
D. Sheldon, F. Vahid and S. Lonardi.
Soft-Core Processor Customization Using the Design of Experiments
Paradigm.
IEEE/ACM Design Automation and Test in Europe (DATE), 2007, pp. 821-826.
pdf  
ppt  
A. Gordon-Ross, P. Viana, F. Vahid, W. Najjar, E. Barros.
A One-Shot Configurable-Cache Tuner for Improved Energy and Performance.
IEEE/ACM Design Automation and Test in Europe (DATE), 2007, pp. 755-760.
pdf  
ppt  
S. Sirowy, Y. Wu, S. Lonardi and F. Vahid.
Two Level Microprocessor-Accelerator Partitioning.
IEEE/ACM Design Automation and Test in Europe (DATE), 2007, pp. 313-318.
pdf  
ppt  
S. Sirowy, Y. Wu, S. Lonardi and F. Vahid.
Clock-Frequency Partitioning for Multiple Clock Domains Systems-on-a-Chip.
IEEE/ACM Design Automation and Test in Europe (DATE), 2007, pp. 397-402.
pdf  
ppt  
Conjoining Soft-Core FPGA Processors
pdf  
ppt  
D. Sheldon, R. Kumar, F. Vahid, D.M. Tullsen, R. Lysecky
IEEE/ACM International Conference on Computer-Aided Design (ICCAD),
Nov. 2006, pp. 694-701.
A Code Refinement Methodology for Performance-Improved Synthesis from C
pdf  
ppt  
G. Stitt, F. Vahid, W. Najjar
IEEE/ACM International Conference on Computer-Aided Design (ICCAD),
Nov. 2006, pp. 716-723.
Application-Specific Customization of Parameterized FPGA Soft-Core
Processors
pdf  
ppt  
D. Sheldon, R. Kumar, R. Lysecky, F. Vahid, D.M. Tullsen
IEEE/ACM International Conference on Computer-Aided Design (ICCAD),
Nov. 2006, pp. 261-268.
Automated Application-Specific Tuning of Parameterized Sensor-Based
Embedded System Building Blocks
pdf  
ppt  
S. Lysecky, F. Vahid
Int. Conf. on Ubiquitous Computing (UbiComp), Sep. 2006,
pp. 507-524.
Automated Generation of Basic Custom Sensor-Based Embedded Computing
Systems Guided by End-User Optimization Criteria
pdf  
ppt  
S. Lysecky, F. Vahid
Int. Conf. on Ubiquitous Computing (UbiComp), Sep. 2006,
pp. 69-86.
Warp Processors
pdf  
R. Lysecky, G. Stitt, F. Vahid
ACM Transactions on Design Automation of Electronic Systems (TODAES),
July 2006, pp. 659-681.
Configurable Cache Subsetting for Fast Cache Tuning
pdf  
P. Viana, A. Gordon-Ross, E. Keogh, E. Barros, F. Vahid
IEEE/ACM Design Automation Conference (DAC), July 2006,
pp. 695-700.
New Decompilation Techniques for Binary-level Co-processor Generation
pdf  
G. Stitt, F. Vahid
IEEE/ACM International Conference on Computer-Aided Design (ICCAD),
Nov. 2005, pp. 547-554.
Fast Configurable-Cache Tuning with a Unified Second-Level Cache
pdf  
A. Gordon-Ross, F. Vahid, N. Dutt
International Symposium on Low-Power Electronics and Design (ISLPED),
Aug. 2005, pp. 323-326.
Hardware/Software Partitioning of Software Binaries: A
Case Study of H.264 Decode
pdf  
G. Stitt, F. Vahid, G. McGregor, B. Einloth
International Conference on Hardware/Software Codesign and System Synthesis
(CODES/ISSS), Sep. 2005, pp. 285-290.
Shows that binary-level partitioning and synthesis of a real
highly-optimized H.264 video decoder application is competitive with
source (C) level partitioning/synthesis. Also introduces several
simple C coding guidelines that greatly improve synthesis results.
Frequent Loop Detection Using Efficient Non-Intrusive On-Chip Hardware
pdf  
A. Gordon-Ross and F. Vahid
IEEE Transactions on Computers, Special Issue on Embedded Systems,
Microarchitecture, and Compilation Techniques in Memory of B.
Ramakrishna (Bob) Rau, Oct. 2005, Vol. 54, Issue 10, pp. 1203-1215.
Describes extensive studies resulting in lean profiler hardware
that effectively finds addresses corresponding to frequent loops
in an executing software binary.
Usability of State Based Boolean eBlocks
pdf  
S. Cotterell and F. Vahid
11th International Conference on Human-Computer Interaction (HCII),
2005, pp.
Four basic state-based blocks -- prolonger, tripper, toggle, and
pulse generator -- are understandable by novice users and can
be connected to define a good range of desired sensor-system behavior.
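To make the idea of a state-based block concrete, here is a minimal Python sketch of two of the four blocks. The entry above does not spell out their exact semantics, so these are assumptions: a toggle flips its output on each rising edge of its input, and a prolonger holds its output high for a fixed number of extra ticks after its input goes low. The class names and the `hold_ticks` parameter are hypothetical, not the actual eBlock interface.

```python
class Toggle:
    """Assumed semantics: the output flips on each rising edge (0 -> 1) of the input."""

    def __init__(self) -> None:
        self.prev_in = False
        self.out = False

    def step(self, x: bool) -> bool:
        if x and not self.prev_in:  # rising edge detected
            self.out = not self.out
        self.prev_in = x
        return self.out


class Prolonger:
    """Assumed semantics: the output follows the input, then stays high for
    hold_ticks extra ticks after the input falls."""

    def __init__(self, hold_ticks: int) -> None:
        self.hold_ticks = hold_ticks
        self.remaining = 0

    def step(self, x: bool) -> bool:
        if x:
            self.remaining = self.hold_ticks  # re-arm the hold timer
            return True
        if self.remaining > 0:
            self.remaining -= 1
            return True
        return False


# A motion sensor's output sampled once per tick drives both blocks.
motion = [False, True, False, False, False, True, False]
toggle, prolong = Toggle(), Prolonger(hold_ticks=2)
print([toggle.step(m) for m in motion])
print([prolong.step(m) for m in motion])
```

Wiring a prolonger after a motion sensor, for example, would keep a light on for a few ticks after motion stops, which is the kind of behavior a novice could build just by connecting blocks.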
A Study of the Scalability of On-Chip Routing for Just-in-Time
FPGA Compilation
pdf  
R. Lysecky, F. Vahid and S. Tan
IEEE Symposium on Field-Programmable Custom Computing
Machines (FCCM), 2005, pp. 57-62
Describes an FPGA routing approach that is lean in terms of runtime
and memory, running three times faster while using less than one-fifteenth
the memory of a popular router, yet creating a critical path that is only
30% longer on average and about equal for very large circuits compared
to that other router. Our approach, ROCR (Riverside On-Chip Router), can
be useful for methods requiring just-in-time FPGA compilation, like our
warp processing method, and future methods using a standard FPGA binary.
A Logic Block Enabling Logic Configuration by Non-Experts in Sensor
Networks
pdf  
S. Cotterell and F. Vahid
Conference on Human Factors in Computing (CHI), 2005, pp. 1925-1928
Describes attempts to build a logic block that non-experts could configure
to compute particular sensor conditions (e.g., motion and no light).
Shows that a truth table based block is too complicated for non-experts,
but a sentence based block exhibits high success, though it is less
general. A truth table using color and presented in a sentence format
also exhibits reasonable success while being more general.
A Way-Halting Cache for Low-Energy High-Performance Systems
pdf  
C. Zhang, F. Vahid, J. Yang, and W. Najjar
ACM Transactions on Architecture and Code Optimization (TACO),
Vol. 2, No. 1, March 2005, pp. 34-54.
Describes a cache design that separates the four low-order tag bits
into their own fully-associative memory (a halt-tag array).
Concurrently with address decoding, the halt-tag array determines
mismatches in the low-order four tag bits (of all the tags). A mismatch
masks out the decode line, halting further tag and data access.
A way-halting cache yields 55% memory access energy savings on average,
with no performance overhead.
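The halting check described above can be sketched in a few lines of Python. This is a minimal functional model, assuming a 4-way set-associative cache; it only tracks which ways proceed to the full tag comparison and does not model timing or energy. The function name and data layout are hypothetical.

```python
HALT_BITS = 4                      # low-order tag bits kept in the halt-tag array
HALT_MASK = (1 << HALT_BITS) - 1


def lookup(tags_in_set: list, halt_tags_in_set: list, addr_tag: int):
    """Return (hit_way, ways_accessed) for one set of a way-halting cache.

    halt_tags_in_set[w] holds the low-order HALT_BITS of tags_in_set[w]
    (None for an invalid way). Ways whose halt tag mismatches are halted:
    their full tag and data arrays are never read.
    """
    low = addr_tag & HALT_MASK
    # Step 1: cheap halt-tag check (done alongside set decoding in hardware).
    active = [w for w, h in enumerate(halt_tags_in_set) if h == low]
    # Step 2: full tag comparison only on the ways that were not halted.
    for w in active:
        if tags_in_set[w] == addr_tag:
            return w, len(active)
    return None, len(active)


tags = [0x1A3, 0x2B7, 0x0C3, 0x3F0]
halt = [t & HALT_MASK for t in tags]   # low nibbles: 0x3, 0x7, 0x3, 0x0
print(lookup(tags, halt, 0x0C3))       # (2, 2): only ways 0 and 2 are accessed
```

In this example two of the four ways are halted on a hit, and a miss whose low nibble matches no way touches none of the full tag or data arrays.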
A First Look at the Interplay of Code Reordering and Configurable Caches
pdf  
A. Gordon-Ross, F. Vahid, N. Dutt
Great Lakes Symposium on VLSI (GLSVLSI), April 2005, pp. 416-421.
Shows that a configurable cache dominates over compiler-based
code reordering with respect to tuning an application to a cache for
power and performance improvements. Yet, combining the two methods does
result in a smaller overall cache size, 13% on average and up to 89%.
eBlocks - An Enabling Technology for Basic Sensor Based Systems
pdf  
S. Cotterell, R. Mannion, F. Vahid, H. Hsieh
IPSN Track on Sensor Platform, Tools and Design Methods for
Networked Embedded Systems (SPOTS), April 2005, pp.
Describes how physical eBlock prototypes and a graphical eBlock
simulation tool were used by hundreds of users during the development
and refinement of eBlock sensor network nodes.
A Highly Configurable Cache for Low Energy Embedded Systems
pdf  
C. Zhang, F. Vahid and W. Najjar
ACM Transactions on Embedded Computing Systems (TECS), Vol. 4, Issue 2,
May 2005, pp. 363-387
Describes a cache whose total size, associativity, and line size
can be configured just by setting a few bits in a configuration
register. Provides experimental results demonstrating that tuning
the configuration to a particular software application's needs
reduces memory access energy by over 40% on average across a large
set of benchmarks.
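As an illustration of configuring a cache "just by setting a few bits," here is a minimal decoder sketch. The bit layout, the 2 KB bank size, and the supported values (2/4/8 KB, 1/2/4 ways, 16/32/64-byte lines) are assumptions chosen for the example, not the paper's register encoding. The check reflects one plausible constraint: if ways are built from equal-size banks, associativity cannot exceed the number of active banks.

```python
from dataclasses import dataclass

# Hypothetical 6-bit register layout (not the paper's actual encoding):
#   bits [1:0] total size, bits [3:2] associativity, bits [5:4] line size.
# Encoding 3 in any field is unused and raises KeyError.
SIZE_KB = {0: 2, 1: 4, 2: 8}
WAYS = {0: 1, 1: 2, 2: 4}
LINE_BYTES = {0: 16, 1: 32, 2: 64}
BANK_KB = 2  # assumed bank size; each way needs at least one bank


@dataclass(frozen=True)
class CacheConfig:
    size_kb: int
    ways: int
    line_bytes: int

    @property
    def num_sets(self) -> int:
        return self.size_kb * 1024 // (self.ways * self.line_bytes)


def decode(reg: int) -> CacheConfig:
    size = SIZE_KB[reg & 0b11]
    ways = WAYS[(reg >> 2) & 0b11]
    line = LINE_BYTES[(reg >> 4) & 0b11]
    if ways > size // BANK_KB:
        raise ValueError(f"{ways}-way not supported at {size} KB")
    return CacheConfig(size, ways, line)


cfg = decode(0b01_01_10)   # 32-byte lines, 2-way, 8 KB
print(cfg, cfg.num_sets)   # 128 sets
```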
A Study of the Speedups and Competitiveness of FPGA
Soft Processor Cores using Dynamic Hardware/Software Partitioning
pdf  
ppt  
R. Lysecky and F. Vahid
Design Automation and Test in Europe (DATE), March 2005, pp. 18-23.
Highlights speedup and energy results of implementing warp
processing, which dynamically and transparently remaps software
kernels to FPGA using on-chip synthesis tools, for software running
on a Xilinx MicroBlaze soft-core processor. Results show
competitive performance and energy compared to software on
regular "hard core" embedded microprocessors, thus making
soft-cores on FPGA even more attractive beyond just their
flexibility of putting different numbers of cores and custom
circuitry on a single chip.
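How much an application speeds up when its kernels are remapped to the FPGA is bounded by the fraction of runtime those kernels account for. A standard Amdahl's-law estimate makes this explicit; the numbers below are illustrative only and are not results from the paper.

```python
def overall_speedup(kernel_fraction: float, kernel_speedup: float) -> float:
    """Amdahl's law: only kernel_fraction of the original runtime is accelerated."""
    if not 0.0 <= kernel_fraction <= 1.0 or kernel_speedup <= 0:
        raise ValueError("fraction must be in [0, 1] and speedup must be positive")
    return 1.0 / ((1.0 - kernel_fraction) + kernel_fraction / kernel_speedup)


# Illustrative values: kernels take 90% of runtime and run 20x faster as circuits.
print(round(overall_speedup(0.9, 20.0), 2))   # 6.9
# Even an infinitely fast circuit is capped at 1 / (1 - 0.9) = 10x here.
```

This is why profiling to find the most frequent loops matters: the remaining software portion, not the circuit, usually limits the overall gain.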
A Decompilation Approach to Partitioning Software for
Microprocessor/FPGA Platforms
pdf  
G. Stitt and F. Vahid
Design Automation and Test in Europe (DATE), March 2005, pp. 396-397
Advanced decompilation techniques recover nearly all high-level constructs
that existed in the source code, even across different compiler
optimization levels, enabling synthesis of hardware from binaries.
System Synthesis for Networks of Programmable Blocks
pdf  
ppt  
R. Mannion, H. Hsieh, S. Cotterell, F. Vahid
Design Automation and Test in Europe (DATE),
March 2005, pp. 888-893
Describes techniques to automatically convert a network of pre-defined
eBlocks into a minimal number of programmable eBlocks, while also
generating code for those blocks.
Techniques for Synthesizing Binaries to an Advanced Register/Memory
Structure
pdf  
G. Stitt, Z. Guo, F. Vahid, and W. Najjar
ACM/SIGDA Symp. on Field Programmable Gate Arrays (FPGA),
Feb. 2005, pp. 118-124
Advanced decompilation methods can make synthesizing FPGA hardware
from software binaries competitive with synthesizing directly from
C-level source code, even when utilizing an advanced memory structure
(smart buffer) requiring knowledge of loops and arrays. Synthesis
from binaries provides numerous advantages of language independence,
tool independence, portability, and support of legacy code.
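The benefit of a smart buffer comes from reusing array elements across loop iterations instead of refetching them. Below is a minimal sketch for an assumed 3-tap window loop, `b[i] = a[i] + a[i+1] + a[i+2]`, that counts memory reads with and without a sliding-window buffer. The loop and function names are hypothetical and only illustrate the reuse pattern, not the paper's hardware structure.

```python
from collections import deque


def naive(a):
    """Each iteration re-reads all three window elements from memory."""
    reads, out = 0, []
    for i in range(len(a) - 2):
        reads += 3
        out.append(a[i] + a[i + 1] + a[i + 2])
    return out, reads


def smart_buffer(a, taps=3):
    """Sliding-window buffer: each array element is read from memory once."""
    window, reads, out = deque(maxlen=taps), 0, []
    for x in a:
        window.append(x)      # one memory read; the oldest element is evicted
        reads += 1
        if len(window) == taps:
            out.append(sum(window))
    return out, reads


data = list(range(10))
print(naive(data))         # 8 outputs, 24 reads
print(smart_buffer(data))  # same 8 outputs, 10 reads
```

Recognizing that a loop has this window structure requires knowing the loop bounds and array access pattern, which is why the entry stresses recovering loops and arrays from the binary.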
Applications and Experiments with eBlocks -- Electronic Blocks for Basic Sensor-Based Systems
pdf  
ppt  
S. Cotterell, K. Downey, and F. Vahid
IEEE Sensor and Ad Hoc Communications and Networks (SECON),
Oct 2004, pp.
Describes common applications that can be built just by connecting
eBlocks together, enabling people without programming experience to
build useful sensor-based systems. Summarizes experiences with
hundreds of users, showing success rates even when utilizing
logic and state based blocks.
A Way-Halting Cache for Low-Energy High-Performance Systems
pdf  
C. Zhang, F. Vahid, J. Yang and W. Najjar
International Symposium on Low-Power Electronics and Design (ISLPED),
Aug 2004, pp. 126-131
Describes a cache whose tag comparison logic includes a small
and fast fully-associative memory that quickly detects a mismatch
in a particular cache way, and then halts further tag and data access
of that way, thus saving power.
Dynamic FPGA Routing for Just-in-Time FPGA Compilation
pdf  
ppt  
R. Lysecky, F. Vahid, and S. Tan
Design Automation Conference (DAC), June 2004, pp. 954-959.
Describes an FPGA routing heuristic suitable for execution on-chip,
to support Just-in-Time compilation for FPGAs.
Tuning caches to applications for low-energy embedded systems
pdf  
A. Gordon-Ross, C. Zhang, F. Vahid, N. Dutt
Chapter 6 in Ultra Low-Power Electronics and Design, Kluwer Academic Publishers, June 2004
A Self-Tuning Cache Architecture for Embedded Systems
pdf  
C. Zhang, F. Vahid and R. Lysecky
ACM Transactions on Embedded Computing Systems (TECS),
Vol. 3., Issue 2, May 2004, pp. 407-425.
Describes a configurable cache that monitors its own hit rate, and
automatically reconfigures the cache's number of ways (associativity),
line size and total size to reduce power and/or improve performance,
using an efficient heuristic that not only prunes the configuration
search space but also avoids cache flushes during the search.
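Below is a minimal sketch of a one-parameter-at-a-time search of the kind described above. It assumes the parameters are tuned in the order total size, line size, then associativity, and that each parameter is increased only while energy keeps dropping. `toy_energy` is a made-up stand-in for the hardware's hit-rate-based energy estimate, and flush avoidance is not modeled.

```python
from typing import Callable, Sequence, Tuple

Config = Tuple[int, int, int]  # (size_kb, line_bytes, ways)


def tune(energy_of: Callable[[Config], float],
         sizes: Sequence[int] = (2, 4, 8),
         lines: Sequence[int] = (16, 32, 64),
         ways: Sequence[int] = (1, 2, 4)) -> Tuple[Config, int]:
    """Greedy search starting from the smallest cache; returns (config, evaluations)."""
    cfg = [sizes[0], lines[0], ways[0]]
    evaluated = 1
    best = energy_of(tuple(cfg))
    for idx, options in enumerate((sizes, lines, ways)):
        for value in options[1:]:
            trial = list(cfg)
            trial[idx] = value
            e = energy_of(tuple(trial))
            evaluated += 1
            if e >= best:          # stop growing this parameter once energy stops improving
                break
            cfg, best = trial, e
    return tuple(cfg), evaluated


def toy_energy(cfg: Config) -> float:
    """Hypothetical energy model, for illustration only."""
    size_kb, line, ways = cfg
    if ways > size_kb // 2:        # assumed: each way needs a 2 KB bank
        return float("inf")
    miss_energy = 100.0 / size_kb + 40.0 / line + 5.0 / ways   # bigger caches miss less...
    access_energy = 2.0 * size_kb + 0.1 * line + 1.5 * ways    # ...but cost more per access
    return miss_energy + access_energy


print(tune(toy_energy))   # ((8, 16, 2), 6): 6 of the 27 configurations evaluated
```

The point of the ordering is that each parameter is fixed before the next is explored, so the search cost grows with the sum, not the product, of the number of options per parameter.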
A Quantitative Analysis of the Speedup Factors of FPGAs over Processors
pdf  
Z. Guo, W. Najjar, F. Vahid and K. Vissers
ACM/IEEE International Symposium on Field-Programmable Gate Arrays,
Feb. 2004, pp. ??
A Configurable Logic Architecture for Dynamic Hardware/Software
Partitioning
pdf  
ppt
R. Lysecky and F. Vahid
Design Automation and Test in Europe Conference (DATE), February 2004,
pp. 480-485.
Describes a simple configurable logic (FPGA) fabric and surrounding
architecture specifically intended to support dynamic hardware/software
partitioning -- meaning on-chip CAD tools must be able to quickly map a
netlist to the fabric.
Automatic Tuning of Two-Level Caches to Embedded Applications
pdf  
ppt
A. Gordon-Ross, F. Vahid and N. Dutt
Design Automation and Test in Europe Conference (DATE), February 2004,
pp. 208-213.
Describes efficient heuristics for tuning a two-level cache to a particular
application, obtaining near-optimal memory-access energy savings of
53%-55% through such tuning, while exploring a mere 6% of the total
configuration space.
Using a Victim Buffer in an Application-Specific Memory Hierarchy
pdf  
ppt
C. Zhang and F. Vahid
Design Automation and Test in Europe Conference (DATE), February 2004,
pp. 220-225.
Adding to a cache a configurable victim buffer, which can be turned on
or off, reduces an application's memory-access energy by up to 43%.
Such savings occur even if the cache itself is configurable. Making the
buffer configurable enables us to shut off the buffer for some applications
that otherwise would suffer increased energy and performance penalties
of up to 4%.
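A victim buffer is a small fully-associative store for lines recently evicted from the cache, so that a conflict miss can be served without going to the next memory level. The sketch below assumes a direct-mapped cache, FIFO replacement in the buffer, and a trace of line addresses; the class and parameter names are hypothetical, and only where each access is served is modeled, not energy. Setting `vb_entries=0` stands in for switching the buffer off.

```python
from collections import OrderedDict


class DirectMappedWithVictimBuffer:
    """Direct-mapped cache plus a small fully-associative victim buffer (FIFO).

    Addresses are line addresses (already divided by the line size).
    """

    def __init__(self, num_sets: int, vb_entries: int) -> None:
        self.num_sets = num_sets
        self.lines = [None] * num_sets         # stored line address per set
        self.vb = OrderedDict()                # victim line addresses, oldest first
        self.vb_entries = vb_entries
        self.stats = {"hit": 0, "vb_hit": 0, "miss": 0}

    def access(self, line_addr: int) -> None:
        s = line_addr % self.num_sets
        if self.lines[s] == line_addr:
            self.stats["hit"] += 1
            return
        evicted = self.lines[s]
        if line_addr in self.vb:
            self.stats["vb_hit"] += 1          # swap: the line moves back into the cache
            del self.vb[line_addr]
        else:
            self.stats["miss"] += 1            # would go to the next memory level
        self.lines[s] = line_addr
        if evicted is not None and self.vb_entries > 0:
            self.vb[evicted] = None
            if len(self.vb) > self.vb_entries:
                self.vb.popitem(last=False)    # drop the oldest victim


trace = [0, 4, 0, 4, 0, 4]   # two lines that conflict in set 0
for entries in (0, 2):
    c = DirectMappedWithVictimBuffer(num_sets=4, vb_entries=entries)
    for a in trace:
        c.access(a)
    print(entries, c.stats)
```

With the buffer off, every access in this ping-pong trace misses; with two entries, all but the first two accesses are served from the buffer.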
A Self-Tuning Cache Architecture for Embedded Systems
pdf  
ppt
C. Zhang, F. Vahid and R. Lysecky
Design Automation and Test in Europe Conference (DATE), February 2004,
pp. 142-147.
Describes a configurable cache that can tune its total size, associativity,
and line size to an executing application. The search heuristic is
carefully designed to avoid flushing. The cache transparently reduces
memory-access related energy by 45%-55% on average, and by as much
as 97% for particular applications.
Low Static-Power Frequent-Value Data Caches
pdf  
ppt
C. Zhang, J. Yang and F. Vahid
Design Automation and Test in Europe Conference (DATE), February 2004,
pp. 214-219.
Improves upon Yang/Gupta's previous frequent value cache, by eliminating
performance overhead, and saving static power in addition to dynamic power,
using circuit level design improvements. A frequent value cache encodes
commonly-occurring data values into just a few bits, shutting down the
remaining bit storage cells. 33% static energy savings are obtained.
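The encoding idea behind a frequent value cache can be sketched as follows. This assumes a 3-bit code: seven codes name entries of a frequent-value table, and the eighth marks a word that must be stored in full. The table contents and function names are hypothetical; in hardware, the unused bit cells of an encoded word are what get shut down.

```python
FV_TABLE = [0, 1, 0xFFFFFFFF, 2, 4, 8, 0x80000000]  # hypothetical frequent values
ESCAPE = len(FV_TABLE)                               # code 7: value stored in full
CODE_BITS = 3


def encode(word: int):
    """Return (code, full_word_or_None); frequent words need only CODE_BITS."""
    if word in FV_TABLE:
        return FV_TABLE.index(word), None   # the 32-bit storage can stay powered down
    return ESCAPE, word


def decode(code: int, full):
    return full if code == ESCAPE else FV_TABLE[code]


words = [0, 0, 1, 12345, 0xFFFFFFFF, 8]
encoded = [encode(w) for w in words]
assert [decode(c, f) for c, f in encoded] == words
bits = sum(CODE_BITS + (32 if f is not None else 0) for _, f in encoded)
print(bits, "bits vs", 32 * len(words))   # 50 bits vs 192
```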
Energy Savings and Speedups from Partitioning Critical Software Loops
to Hardware in Embedded Systems
pdf
G. Stitt, F. Vahid, S. Nemetebaksh
ACM Transactions on Embedded Computing Systems (TECS), January 2004.
Partitioning a program's kernels to FPGA hardware can reduce overall
system energy.
A Way-Halting Cache for Low-Energy High-Performance Systems
pdf
C. Zhang, F. Vahid, J. Yang, W. Najjar
IEEE Computer Architecture Letters, Vol. 2, Sep. 2003.
The four low-order bits of each cache tag are stored in a fast
efficient CAM and accessed concurrently with set decoding -- if
those four bits mismatch for the decoded set, the full tag comparisons
and data array accesses are "halted," thus saving power, with no
performance overhead (unlike other power-saving caches).
Frequent Loop Detection Using Efficient Non-Intrusive On-Chip Hardware
pdf
A. Gordon-Ross and F. Vahid
ACM/IEEE Conference on Compilers, Architecture and Synthesis for
Embedded Systems (CASES), 2003, pp. 117-124.
Describes efficient non-intrusive hardware for detecting the most
frequent loops in an executing binary, and the relative frequencies
of those loops.
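A rough software model of such a profiler is sketched below. The parameters are assumptions for illustration rather than the paper's exact design: a taken branch counts as a loop back-edge when it jumps backward by at most `max_offset` bytes, a small table holds one counter per loop start address, and when any counter saturates all counters are halved so their relative order is roughly preserved without widening the counters.

```python
class LoopProfiler:
    """Counts taken short backward branches (loop back-edges) in a small table.

    Hypothetical parameters: `entries` table slots, `counter_bits`-wide counters,
    and back-edges of at most `max_offset` bytes.
    """

    def __init__(self, entries: int = 8, counter_bits: int = 8, max_offset: int = 256):
        self.entries = entries
        self.max_count = (1 << counter_bits) - 1
        self.max_offset = max_offset
        self.counts = {}  # loop start address -> count

    def observe(self, pc: int, target: int, taken: bool) -> None:
        if not taken or target >= pc or pc - target > self.max_offset:
            return                      # not a taken short backward branch
        if target not in self.counts:
            if len(self.counts) >= self.entries:
                # Table full: replace the least frequent loop.
                victim = min(self.counts, key=self.counts.get)
                del self.counts[victim]
            self.counts[target] = 0
        self.counts[target] += 1
        if self.counts[target] == self.max_count:
            for k in self.counts:
                self.counts[k] >>= 1    # halve all counters on saturation

    def top(self, n: int = 3):
        return sorted(self.counts.items(), key=lambda kv: -kv[1])[:n]


prof = LoopProfiler()
for _ in range(1000):
    prof.observe(pc=0x120, target=0x100, taken=True)
for _ in range(100):
    prof.observe(pc=0x410, target=0x400, taken=True)
prof.observe(pc=0x500, target=0x600, taken=True)   # forward branch: ignored
print([(hex(a), c) for a, c in prof.top()])        # [('0x100', 232), ('0x400', 100)]
```

Note that halving keeps the counters small at the cost of exact counts: the 1000-iteration loop ends at 232, which still ranks it first.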
First Results with eBlocks: Embedded Systems Building Blocks
pdf
S. Cotterell, F. Vahid, W. Najjar and H. Hsieh
ACM/IEEE ISSS/CODES conference, 2003, pp. 168-175
Describes embedded system building blocks that people with no training
can connect together to build simple but useful systems.
A Codesigned On-Chip Logic Minimizer
pdf
R. Lysecky and F. Vahid
ACM/IEEE ISSS/CODES conference, 2003, pp. 109-113.
Hardware/software partitioning of an on-chip logic minimizer results
in 8x speedup and 60% energy savings, improving the usefulness
of on-chip logic minimization in a variety of applications.
Tiny Instruction Caches For Low Power Embedded Systems
pdf
A. Gordon-Ross, S. Cotterell, and F. Vahid
ACM Transactions on Embedded Computing Systems, Vol. 2, Issue 4, Nov. 2003,
pp. 449-481.
Putting a very small (e.g., 128-word) loop cache in front of the L1
instruction cache can greatly reduce power, with no performance overhead.
Profiling tools for hardware/software partitioning of embedded applications
pdf
D.C. Suresh, W.A. Najjar, F. Vahid, J.R. Villarreal, G. Stitt
Languages, Compilers and Tools for Embedded Systems (LCTES), 2003,
pp. 189-198.
A Highly-Configurable Cache Architecture For Embedded Systems
pdf
C. Zhang, F. Vahid and W. Najjar
International Symposium on Computer Architecture, 2003, pp. 136-146.
A cache whose number of ways and total size can be tuned to
a particular program yields big energy savings with almost no
performance overhead.
Cache Configuration Exploration on Prototyping Platforms
pdf
C. Zhang and F. Vahid
Rapid System Prototyping, 2003, pp. 164-171
Methods to automatically tune a configurable cache to a particular
software application.
Dynamic Hardware/Software Partitioning: A First Approach
pdf
G. Stitt, R. Lysecky and F. Vahid
Design Automation Conference, 2003, pp. 250-255.
Dynamically partitioning an executing software application onto
on-chip FPGA is not only possible, but quite effective.
On-Chip Logic Minimization
pdf
R. Lysecky and F. Vahid
Design Automation Conference, 2003, pp. 334-337.
Executing a lean form of logic minimization on-chip is feasible and has
several immediate applications in networking.
Embedded System Design: UCR's Undergraduate Three-Course Sequence
pdf
F. Vahid
Microelectronics Systems Engineering (MSE) conference, 2003.
Summarizes UCR's successful 3-course sequence on embedded system design,
based on the new ESD book (see above) that emphasizes a unified view
of hardware and software.
The Softening of Hardware
pdf
F. Vahid
IEEE Computer, April 2003, pp. 27-34.
A new perspective on hardware becoming much more like software, due
in part to configurable logic, and in part to hardware being created
today by compiling high-level languages.
Highly Configurable Platforms for Embedded Computing Systems
pdf
F. Vahid, R. Lysecky, C. Zhang and G. Stitt
Microelectronics Journal, Elsevier Publishers, Volume 34, Issue 11, November 2003, Pages 1025-1029.
The case for creating platform chips with much configurability,
including on-chip FPGA, configurable cache, etc.
Online version
Making the Best of those Extra Transistors
pdf
F. Vahid
IEEE Design and Test of Computers, Jan/Feb 2003, p. 96.
An argument for new uses of the abundant transistors on modern
chips.
Energy Benefits of a Configurable Line Size Cache for Embedded Systems
pdf
C. Zhang, F. Vahid, W. Najjar
IEEE Computer Society Annual Symposium on VLSI, Feb. 2003, pp. 87-91.
Creating a cache with a line size that can be configured to 16, 32
or 64 bytes results in surprisingly large energy savings.
Platune: A Tuning Framework for System-on-a-Chip Platforms
pdf
T. Givargis and F. Vahid
IEEE Transactions on Computer Aided Design, Vol. 21, No. 11, Nov. 2002,
pp. 1317-1327.
Platune tunes an architecture to an application by rapidly exploring the
huge configuration space of configurable caches, buses, voltage levels,
and many other configurable architectural parameters.
Instruction-based System-level Power Evaluation of System-on-a-chip
Peripheral Cores
pdf
T. Givargis, F. Vahid and J. Henkel
IEEE Transactions on VLSI, pp. 856-863, Vol 10, No 6, Dec. 2002.
We break peripheral behavior into "instructions" and back-annotate
those instructions with low-level power data, yielding fast and accurate
system-level power estimations.
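The back-annotation idea reduces system-level estimation to a table lookup per executed peripheral "instruction." In this minimal sketch the instruction names and per-instruction energies are made up for illustration; in the approach described above they would come from gate-level measurements of the actual core.

```python
# Hypothetical per-instruction energies (nJ) for a UART-like peripheral core,
# standing in for values back-annotated from gate-level simulation.
ENERGY_NJ = {"idle": 0.05, "write_reg": 0.8, "tx_byte": 2.4, "rx_byte": 2.1}


def estimate_energy(trace):
    """System-level estimate: sum the annotated energy of each executed instruction."""
    return sum(ENERGY_NJ[instr] for instr in trace)


trace = ["write_reg"] + ["tx_byte"] * 10 + ["idle"] * 100
print(round(estimate_energy(trace), 2))   # 0.8 + 24.0 + 5.0 = 29.8
```

The speed comes from never simulating gates during exploration: each candidate configuration only needs an instruction trace and a sum.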
Power Estimator Development for Embedded System Memory Tuning
pdf not avail
F. Vahid, T. Givargis and S. Cotterell
Journal of Circuits, Systems and Computers, vol. 11, no. 5, pp. 459-476,
October 2002.
We describe three increasingly accurate methods for estimating power
of a memory hierarchy.
Partitioning Sequential Programs for CAD using a Three-Step Approach
pdf
F. Vahid
ACM Transactions on Design Automation of Electronic Systems,
Vol 7, Issue 3, pp 413-429, July 2002.
The Energy Advantages of Microprocessor Platforms with On-Chip Configurable Logic
   pdf
G. Stitt and F. Vahid
IEEE Design and Test of Computers, November/December 2002, pp. 36-43.
System-level Exploration for Pareto-optimal Configurations in
Parameterized System-on-a-chip
   pdf
T. Givargis, F. Vahid and J. Henkel
IEEE Transactions on VLSI Systems, Vol. 10, Issue 4, Dec. 2002,
pp. 416-422.
Improving Software Performance with Configurable Logic
   pdf
J. Villarreal, D. Suresh, G. Stitt, F. Vahid and W. Najjar
Kluwer Journal on Design Automation of Embedded Systems,
November 2002, Volume 7, Issue 4, pp. 325-339.
Hardware/Software Partitioning of Software Binaries
   pdf
G. Stitt and F. Vahid
IEEE/ACM International Conference on Computer Aided Design,
November 2002, pp. 164-170.
Synthesis of Customized Loop Caches for Core-Based Embedded Systems
   pdf
S. Cotterell and F. Vahid
IEEE/ACM International Conference on Computer Aided Design,
November 2002, pp. 655-662.
Tuning of Loop Cache Architectures to Programs in Embedded System Design
   pdf
   ppt slides
S. Cotterell and F. Vahid
IEEE/ACM International Symposium on System Synthesis,
October 2002, pp. 8-13.
Dynamic Loop Caching Meets Preloaded Loop Caching -- A Hybrid Approach
   pdf
   ppt slides
A. Gordon-Ross and F. Vahid
International Conference on Computer Design,
September 2002, pp. 446-449.
Tuning of Cache Ways and Voltage for Low-Energy Embedded System
Platforms
   pdf
T. Givargis and F. Vahid
Kluwer Journal on Design Automation of Embedded Systems,
vol. 7, issue 1-2, pp. 35-51, September 2002.
A Fast On-Chip Profiler Memory
   pdf
R. Lysecky, S. Cotterell and F. Vahid
IEEE/ACM Design Automation Conference, June 2002, pp. 28-33.
Codesign-Extended Applications
   pdf
   ppt slides
B. Grattan, G. Stitt and F. Vahid
IEEE/ACM International Symposium on Hardware/Software Codesign,
Estes Park, May 2002, pp. 1-6.
A Power-Configurable Bus for Embedded Systems
   pdf
C. Zhang and F. Vahid
IEEE International Symposium on Circuits and Systems, Scottsdale,
May 2002, pp. V-809-812.
Using On-Chip Configurable Logic to Reduce Embedded System
Software Energy
   pdf
G. Stitt, B. Grattan, J. Villarreal and F. Vahid
IEEE Symposium on Field-Programmable Custom Computing
Machines (FCCM), Napa Valley, April 2002, pp. 143-151.
Propagating Constants Past Software to Hardware Peripherals in
Fixed-Application Embedded Systems
   pdf
G. Stitt and F. Vahid
In book "Compilers and operating systems for low power,"
editors L. Benini, M. Kandemir, J. Ramanujam,
Kluwer Academic Publishers, 2003, Chapter 7, pp. 115-136.
Exploiting Fixed Programs in Embedded Systems: A Loop Cache Example
   pdf
A. Gordon-Ross, S. Cotterell and F. Vahid
IEEE Computer Architecture Letters, Vol 1, January 2002.
Prefetching for Improved Bus Wrapper Performance in Cores
   pdf
R. Lysecky and F. Vahid
ACM Transactions on Design Automation of Electronic Systems,
Vol. 7, No. 1, pp. 58-90, January 2002.
Propagating Constants Past Software to Hardware Peripherals in
Fixed-Application Embedded Systems
   pdf
   html (of COLP paper)
   slides
F. Vahid, R. Patel and G. Stitt
Special Issue of ACM SIGARCH Newsletter, Dec. 2001.
Selected for special issue from earlier version of paper in
Compilers and Operating Systems for Low Power (COLP'01).
Describes the size and power advantages of recognizing that
software-configurable control registers in peripherals may never
change after being initialized, if the software itself never changes
(as is common in embedded systems).
System-level Exploration for Pareto-optimal Configurations in
Parameterized Systems-on-a-chip
   pdf
   html
   slides
T. Givargis, F. Vahid and J. Henkel
International Conference on Computer Aided Design, Nov 2001, pp. 25-30.
Provides a technique for efficiently exploring the
configuration space of a parameterized system-on-a-chip (SOC)
architecture to find all Pareto-optimal configurations. These
configurations represent the range of meaningful power and
performance tradeoffs that are obtainable by adjusting parameter
values for a fixed application mapped onto the SOC architecture. Our
approach extensively prunes the potentially large configuration space
by taking advantage of parameter dependencies. We have successfully
incorporated our technique into the parameterized SOC tuning
environment (Platune) and applied it to a number of applications.
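For readers unfamiliar with the term, a configuration is Pareto-optimal when no other configuration is at least as good in both power and execution time and strictly better in one. The sketch below is a brute-force filter over configurations that have already been evaluated, with made-up data; the technique described above is valuable precisely because it avoids evaluating most configurations by exploiting parameter dependencies, which this sketch does not model.

```python
from typing import List, Tuple

Point = Tuple[str, float, float]  # (configuration name, power, execution time)


def pareto_front(points: List[Point]) -> List[Point]:
    """Keep configurations not dominated in both power and time (lower is better)."""
    front = []
    for name, p, t in points:
        dominated = any(
            (p2 <= p and t2 <= t) and (p2 < p or t2 < t)
            for _, p2, t2 in points
        )
        if not dominated:
            front.append((name, p, t))
    return sorted(front, key=lambda x: x[1])


configs = [
    ("A", 1.0, 9.0),
    ("B", 2.0, 5.0),
    ("C", 2.5, 6.0),   # dominated by B
    ("D", 4.0, 3.0),
    ("E", 4.5, 3.0),   # dominated by D
]
print([name for name, _, _ in pareto_front(configs)])   # ['A', 'B', 'D']
```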
Evaluating Power Consumption of Parameterized Cache and Bus
Architectures in System-on-a-Chip Designs
   pdf
T. Givargis, F. Vahid and J. Henkel
IEEE Transactions on VLSI, Vol 9, No. 4, pp. 500-508, Aug 2001.
Architectures with parameterizable cache and bus can support
large tradeoffs between performance and power. We provide
simulation data showing the large tradeoffs by such an
architecture for several applications, and demonstrating that the
cache and bus should be configured simultaneously to find the
optimal solutions. Furthermore, we describe analytical
techniques for speeding up the cache/bus power and
performance evaluation by several orders of magnitude over
simulation, while maintaining sufficient accuracy with respect to
simulation-based approaches.
A Self-Optimizing Embedded Microprocessor using a Loop Table for
Low Power
   pdf
   html
   slides
F. Vahid and A. Gordon-Ross
International Symposium on Low Power Electronics and Design, Aug 2001,
pp. 219-224.
Describes the architecture and methodology of an embedded microprocessor
that can automatically tune itself to the particular application that will
run. The particular tunable component described is a loop table, similar
to a loop cache except that its contents never change after the most
frequent loops are detected.
Platform Tuning for Embedded Systems Design
   pdf
F. Vahid and T. Givargis
IEEE Computer, Vol. 34, No. 3, pp. 112-114, March 2001.
Provides an overview of the philosophy of our UCR Dalton Project,
in particular, the idea of tuning a programmable system-on-a-chip
architecture to the one application that it will eventually run
forever.
Trace-driven System-level Power Evaluation of System-on-a-chip
Peripheral Cores
   pdf
   html
T. Givargis, F. Vahid and J. Henkel
Asia South-Pacific Design Automation Conference (ASP-DAC),
pp. 306-311, January 2001.
Our earlier work for fast evaluation of power consumption of general
cores in a system-on-a-chip described techniques that involved isolating
high-level instructions of a core, measuring gate-level power consumption
per instruction, and then annotating a system-level simulation model with
the obtained data. In this work, we describe a method for speeding up the
evaluation further, through the use of instruction traces and trace
simulators for every core, not just microprocessor cores. Our method
shows noticeable speedups at an acceptable loss of accuracy. We show
that reducing trace sizes can speed up the method even further. The
speedups allow for more extensive system-level power exploration and
hence better optimization.
A First-step Towards an Architecture Tuning Methodology for Low Power
   pdf
   html
   slides
G. Stitt, F. Vahid, T. Givargis, R. Lysecky
Compilers, Architectures, and Synthesis for Embedded Systems (CASES'00),
pp. 187-192, November 2000.
We describe an automated environment that assists a system-on-a-chip
designer in tuning a microprocessor core to a particular application
program that will run on the microprocessor, and vice-versa, with the
goal of reducing embedded system power consumption. We limit such tuning
to modifications that do not change the microprocessor instruction set,
thus avoiding the large costs that would come with such a change. Our
tuning environment for the 8051 microcontroller is freely available on
the web.
Instruction-based System-level Power Evaluation of
System-on-a-chip Peripheral Cores
T. Givargis, F. Vahid and J. Henkel
IEEE/ACM International Symposium on System Synthesis (ISSS),
pp. 163-169, September 2000.
We propose a new technique, suitable for a variety of cores such as
peripheral cores, that is the first to combine gate-level power data with
a system-level simulation model written in C++ or Java. For that purpose,
we investigated peripheral cores and decomposed their functionality into
so-called instructions. Our technique addresses a core-based system design
paradigm. We show that our technique is sufficiently accurate for making
power-related system-level design decisions, and that its computation time
is orders of magnitude smaller than lower-level simulation approaches.
Experiments with the Peripheral Virtual Component Interface
R. Lysecky, F. Vahid, T. Givargis
International Symposium on System Synthesis (ISSS),
pp. 221-224, September 2000.
The Peripheral Virtual Component Interface, or PVCI, is a standard intended to simplify the interfacing of
peripheral cores to on-chip buses in a system-on-a-chip, by standardizing the interface between a core's internals
and its bus wrapper. We provide results of experiments intended to determine the power, performance, and size
overhead associated with using a PVCI bus wrapper versus using a non-PVCI bus wrapper, and versus using no
bus wrapper at all. The results demonstrate that using a bus wrapper may result in only small performance, power
and size overhead versus using no wrapper, though even that performance overhead can be reduced or eliminated
using pre-fetching. The results also demonstrate that using a PVCI bus wrapper yields no significant additional
power, performance or size overhead compared with a non-PVCI bus wrapper.
Parameterized System Design
T. Givargis and F. Vahid
IEEE/ACM International Workshop on Hardware/Software Codesign (CODES),
pp. 98-102, May 2000.
Continued growth in chip capacity has led to new methodologies stressing
reuse, not only of pre-designed processing components, but even of entire
pre-designed architectures. To be used across a variety of applications,
such architectures must be heavily parameterized, so they can adapt to
those applications' differing constraints by trading off power, performance
and size. We describe several parameterized system design issues, and
provide results showing how a single architecture with easily configurable
parameters can support a wide range of tradeoffs.
Fast Cache and Bus Power Estimation for Parameterized System-on-a-Chip
Design
T. Givargis, F. Vahid and J. Henkel
Design Automation and Test in Europe (DATE) Conference,
pp. 334-338, March 2000.
We present a technique for fast estimation of the power consumed by the
cache and bus sub-system of a parameterized system-on-a-chip design for a
given application. The technique uses a two-step approach of first
collecting intermediate data about an application using simulation, and
then using equations to rapidly predict the performance and power
consumption for each of thousands of possible configurations of system
parameters, such as cache size and associativity and bus size and
encoding. The estimations display good absolute as well as relative
accuracy for various examples, and are obtained in dramatically less
time than other techniques, making possible the future use of powerful
search heuristics.
Techniques for Reducing Read Latency of Core Bus Wrappers
R. Lysecky, F. Vahid, T. Givargis
Design Automation and Test in Europe (DATE) Conference,
pp. 84-91, March 2000. Best Paper Award.
Today's system-on-a-chip designs consist of many cores. To enable cores
to be easily integrated into different systems, many propose creating
cores with their internal logic separated from their bus wrapper. This
separation may introduce extra read latency. Pre-fetching register data
into register copies in the bus wrapper can reduce or eliminate this
extra latency. In this paper, we introduce a technique for automatically
designing a pre-fetch unit that satisfies user-imposed register-access
constraints. The technique benefits from mapping the pre-fetching problem
to the well-known real-time process scheduling problem. We then extend
the technique to allow user-specified register interdependencies, using
a Petri Net model, resulting in even more efficient pre-fetch schedules.
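The mapping to real-time scheduling can be pictured with a Python sketch. This is an illustration under assumptions, not the paper's algorithm. Each bus-wrapper register copy is modeled as a periodic task that must be refreshed at least once every `periods[r]` cycles, and each pre-fetch is assumed to take exactly one cycle. The sketch then uses earliest-deadline-first (EDF) scheduling, a standard real-time policy chosen here for illustration. The register names and periods are hypothetical, and `math.lcm` with multiple arguments requires Python 3.9 or later.

```python
from fractions import Fraction
from math import lcm
from typing import Dict, List, Optional


def edf_prefetch_schedule(periods: Dict[str, int]) -> List[Optional[str]]:
    """Hypothetical model: register r's wrapper copy must be refreshed at
    least once in every window of periods[r] cycles, and each pre-fetch
    occupies the internal bus for one cycle. Registers are scheduled as
    periodic tasks, earliest deadline first. Returns one hyperperiod of
    the schedule; None marks an idle cycle."""
    # For unit-time periodic tasks with deadlines equal to periods, EDF
    # meets every deadline if and only if total utilization is <= 1.
    if sum(Fraction(1, p) for p in periods.values()) > 1:
        raise ValueError("utilization exceeds 1: constraints cannot all be met")
    hyperperiod = lcm(*periods.values())
    pending: Dict[str, int] = {}  # register -> absolute deadline of its fetch
    schedule: List[Optional[str]] = []
    for t in range(hyperperiod):
        for reg, period in periods.items():
            if t % period == 0:
                if reg in pending:  # previous fetch did not happen in time
                    raise RuntimeError(f"{reg} missed its deadline")
                pending[reg] = t + period
        if pending:
            # Earliest deadline first; ties broken by register name.
            reg = min(pending, key=lambda r: (pending[r], r))
            del pending[reg]
            schedule.append(reg)
        else:
            schedule.append(None)
    return schedule


print(edf_prefetch_schedule({"A": 2, "B": 4, "C": 4}))
```

With these example periods, the bus is fully utilized, and the schedule refreshes A twice and B and C once in every four cycles. The paper's extension with register interdependencies and Petri Nets is not modeled here.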
A Hybrid Approach for Core-Based System-Level Power Modeling
T. Givargis, F. Vahid and J. Henkel
Asia South-Pacific Design Automation Conference (ASP-DAC),
pp. 141-145, January 2000.
Describes a technique for obtaining fast yet accurate power estimations
of core-based systems. The main idea is to use an object-oriented language
(C++ or Java) to create a system-level model, modeling each core as an
object, and extending each object with power-estimation methods based
on statistics from low-level power data of a synthesized version of
the core. By executing the system-level model, which runs about 1000x
faster than gate-level simulation, we obtain very accurate power
estimates.
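The core-as-object idea can be sketched in Python, which plays the role of C++ or Java here. This is a minimal illustration, not the authors' model. The sketch assumes that each core exposes high-level "instructions" with a fixed energy per instruction obtained from gate-level simulation. The core names, instruction names, and energy values below are all hypothetical.

```python
from typing import Dict, List


class PowerModeledCore:
    """System-level model of one core, extended with power estimation.
    Per-instruction energies are assumed to come from gate-level
    simulation of a synthesized core; the values used below are made up."""

    def __init__(self, name: str, energy_per_instr_nj: Dict[str, float]) -> None:
        self.name = name
        self._energy = energy_per_instr_nj
        self.energy_nj = 0.0

    def execute(self, instr: str) -> None:
        # The core's functional behavior would be modeled here. This sketch
        # only accumulates the energy annotated for each instruction.
        self.energy_nj += self._energy[instr]


def system_energy(cores: List[PowerModeledCore]) -> float:
    """Total energy is the sum over all core objects in the system model."""
    return sum(core.energy_nj for core in cores)


uart = PowerModeledCore("uart", {"write": 2.5, "read": 1.5})
dma = PowerModeledCore("dma", {"transfer": 4.0})
for _ in range(4):
    uart.execute("write")
uart.execute("read")
dma.execute("transfer")
print(system_energy([uart, dma]))  # 15.5
```

Simulation speed comes from executing only this high-level model. Gate-level cost is paid once, when each core's energy table is characterized.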
Interface and Cache Power Exploration for Core-Based Embedded System
T. Givargis, J. Henkel and F. Vahid
International Conference on Computer-Aided Design (ICCAD),
pp. 270-273, November 1999.
Demonstrates, through experiments on four applications, the large power,
performance and size tradeoffs possible just by varying architectural
parameters relating to cache and bus for a given reference architecture.
Illustrates that these parameters must be tuned to one another for each
application, and thus argues for the need for a parameter exploration
environment in a configure-and-execute design paradigm.
Pre-fetching for Improved Core Interfacing
R. Lysecky, F. Vahid, T. Givargis, and R. Patel
International Symposium on System Synthesis (ISSS),
pp. 51-55,
November 1999.
Introduces a method to reduce or eliminate the extra latency that may
arise when reading from a core designed with a bus wrapper for
ease of retargeting to different system buses. The method involves
pre-fetching registers from the core's internals to registers added
in the bus wrapper, akin to caching.
The Case for a Configure-and-Execute Paradigm
F. Vahid and T. Givargis
International Workshop on Hardware/Software Codesign (CODES),
pp. 59-63, May 1999.
Provides an argument, supported by data obtained by various researchers,
in favor of building systems-on-a-chip by configuring a pre-designed
reference design already in silicon, rather than building systems
by connecting large numbers of cores.
FSMD Functional Partitioning for Low Power
E. Hwang, F. Vahid and Y.C. Hsu
Design Automation and Test in Europe (DATE) Conference,
pp. 22-28, March 1999.
Techniques for Minimizing and Balancing I/O during Functional Partitioning
F. Vahid
IEEE Transactions on CAD,
Vol. 18, No. 1, pp. 69-75,
January 1999.
Procedure Cloning: A Transformation for Improved
System-Level Functional Partitioning
F. Vahid
ACM Transactions on Design Automation of Electronic Systems,
Volume 4, Number 1, pp. 70-96,
1999.
A Three-Step Approach to the Functional Partitioning
of Large Behavioral Processes
F. Vahid
International Symposium on System Synthesis,
pp. 152-157,
December 1998.
Incorporating Cores into System-Level Specification
F. Vahid and T. Givargis
International Symposium on System Synthesis (ISSS),
pp. 43-48, December 1998.
Describes a method for specifying, at the system level, a system built
from pre-designed system components (cores) using an object-oriented
language, resulting in dramatically faster simulations than
HDL-based approaches.
Interface Exploration for Reduced Power in Core-Based Systems
T. Givargis and F. Vahid
International Symposium on System Synthesis (ISSS),
pp. 117-122,
December 1998.
Provides equations developed to enable one to explore various
bus configurations in a parameterized architecture very rapidly.
The application is simulated once to accumulate bus traffic data,
which is then fed into a tool that evaluates each bus
configuration in constant time using the equations. The power- or
performance-optimal bus can thus be quickly selected for a given
application.
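The simulate-once, evaluate-many pattern can be illustrated with a Python sketch. The equations below are a deliberately simplified, hypothetical model, not the ones developed in the paper. Traffic is summarized by a word count, a word width, and an average switching activity. Each transfer is assumed to take one cycle and to pay a fixed control overhead. Bus encoding, which the abstracts list as a parameter, is omitted. All field names and energy constants are assumptions.

```python
from dataclasses import dataclass
from math import ceil
from typing import Tuple


@dataclass(frozen=True)
class BusTraffic:
    """Statistics gathered once from a single simulation (hypothetical fields)."""
    words: int        # data words sent over the bus
    word_bits: int    # bits per data word
    activity: float   # average fraction of data lines toggling per transfer


def evaluate(t: BusTraffic, width: int, e_toggle_pj: float = 1.0,
             e_overhead_pj: float = 2.0) -> Tuple[int, float]:
    """Constant-time estimate for one bus width, with no re-simulation.
    Returns (bus cycles, energy in pJ) under the simplified model above."""
    transfers = t.words * ceil(t.word_bits / width)
    energy_pj = transfers * (width * t.activity * e_toggle_pj + e_overhead_pj)
    return transfers, energy_pj


traffic = BusTraffic(words=1000, word_bits=32, activity=0.5)
results = {w: evaluate(traffic, w) for w in (8, 16, 32)}
best_energy = min(results, key=lambda w: results[w][1])
print(results, best_energy)
```

Only the one-time statistics depend on the application. Each candidate configuration then costs a handful of arithmetic operations, which is what makes exhaustive exploration of many configurations feasible.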
System-Level Exploration with SpecSyn
D. Gajski, F. Vahid, S. Narayan and J. Gong
Design Automation Conference,
pp. 812-817, June 1998.
Functional Partitioning Improvements over Structural Partitioning for
Packaging Constraints and Synthesis-tool Performance
F. Vahid, T.D.M. Le and Y.C. Hsu
ACM Transactions on Design Automation of Electronic Systems,
Volume 3, Number 2, pp. 181-208,
1998.
SpecSyn: An Environment Supporting the Specify-Explore-Refine
Paradigm for Hardware/Software System Design
D.D. Gajski, F. Vahid, S. Narayan and J. Gong
IEEE Transactions on VLSI Systems,
Vol. 6, No. 1, pp. 84-100,
1998.
Awarded the IEEE VLSI Transactions Best Paper Award, June 2000.
Guest Editors' Introduction to the Special Issue on ISSS'96
F. Vahid and S. Narayan
ACM Transactions on Design Automation of Electronic Systems,
Vol. 2, No. 4, Oct. 1997, pp. 307-311.
Port Calling: A Transformation for Reducing I/O
during Multi-Package Functional Partitioning
F. Vahid
International Symposium on System Synthesis,
pp. 107-112,
September 1997.
Message-Based Hardware/Software Communication in HDL/C Environments
L. Tauro and F. Vahid
Asia-Pacific Conference on Hardware Description Languages (ASP-CHDL),
pp. ??,
August 1997.
An Object-Oriented Communication Library for Hardware-Software Co-Design
F. Vahid and L. Tauro
International Workshop on Hardware/Software Codesign (CODES),
pp. 81-86,
March 1997.
Extending the Kernighan/Lin Heuristic for Hardware and Software
Functional Partitioning
F. Vahid and T.D.M. Le
Kluwer Journal on Design Automation of Embedded Systems,
Vol. 2, No. 2, pp. 237-261,
March 1997.
Procedure Cloning: A Transformation for Improved System-Level
Functional Partitioning
F. Vahid
European Design and Test Conference,
pp. 487-492,
March 1997.
Modifying Min-Cut for Hardware and Software Functional Partitioning
F. Vahid
International Workshop on Hardware/Software Codesign,
pp. 43-48,
March 1997.
I/O and Performance Tradeoffs with the FunctionBus during
Multi-FPGA Partitioning
F. Vahid
International Symposium on Field-Programmable Gate Arrays,
pp. 27-34,
February 1997.
A Comparison of Functional and Structural Partitioning
F. Vahid, T.D.M. Le and Y.C. Hsu
International Symposium on System Synthesis,
pp. 121-126,
November 1996.
Towards a Model for Hardware and Software Functional Partitioning
F. Vahid and T.D.M. Le
International Workshop on Hardware/Software Codesign,
pp. 116-123,
March 1996.
System Design Methodologies: Aiming at the 100 h Design Cycle
D. Gajski, S. Narayan, L. Ramachandran, F. Vahid and P. Fung
IEEE Transactions on VLSI Systems,
Vol. 4, No. 1, pp. 70-82,
1996.
Closeness Metrics for System-Level Functional Partitioning
F. Vahid and D.D. Gajski
European Design Automation Conference,
pp. 328-333,
September 1995.
Clustering for Improved System-Level Functional Partitioning
F. Vahid and D.D. Gajski
International Symposium on System Synthesis,
pp. 28-33,
September 1995.
Procedure Exlining: A Transformation for Improved System and
Behavioral Synthesis
F. Vahid
International Symposium on System Synthesis,
pp. 84-89,
September 1995.
Procedure Exlining: A New System-Level Specification Transformation
F. Vahid
European Design Automation Conference -- EuroVHDL,
pp. 508-513,
September 1995.
Incremental Hardware Estimation during Hardware/Software Functional
Partitioning
F. Vahid and D. Gajski
IEEE Transactions on VLSI Systems,
Vol. 3, No. 3, pp. 459-464,
September 1995.
SpecCharts: A VHDL Front-End for Embedded Systems
F. Vahid, S. Narayan and D. Gajski
IEEE Transactions on CAD,
Vol. 14, No. 6, pp. 694-706,
1995.
SLIF: A Specification-Level Intermediate Format For System Design
F. Vahid and D.D. Gajski
European Design and Test Conference,
pp. 185-189,
March 1995.
Specification and Design of Embedded Software-Hardware Systems
D. Gajski and F. Vahid
IEEE Design & Test of Computers,
Vol. 12, No. 1, Spring 1995, pp. 53-67.
A Binary-Constraint Search Algorithm for Minimizing
Hardware during Hardware-Software Partitioning
F. Vahid, J. Gong and D.D. Gajski
European Design Automation Conference -- EuroDAC,
pp. 214-219,
September 1994.
A Transformation Integrating VHDL Behavioral Specification with Synthesis
and Software Generation
F. Vahid, S. Narayan and D.D. Gajski
European Design Automation Conference -- EuroDAC,
pp. 552-557,
September 1994.
A System-Design Methodology: Executable-Specification Refinement
D.D. Gajski, F. Vahid and S. Narayan
European Conference on Design Automation,
pp. 458-463,
March 1994.
BOOK: Specification and Design of Embedded Systems
D.D. Gajski, F. Vahid, S. Narayan and J. Gong
Prentice Hall,
1994.
Specification Partitioning for System Design
F. Vahid and D. Gajski
Design Automation Conference,
pp. 219-224, June 1992.
Other publications from before 1994 are not listed.
Patents
F. Vahid, R. Lysecky, G. Stitt. Warp Processor for Dynamic Hardware/Software Partitioning. US Patent 7,356,672, 2008. (UC Case 2004-390).
J. Henkel, T. Givargis, F. Vahid. Method for core-based system-level power modeling using object-oriented techniques. US Patent 6,865,526. (UC Case 2000-261-1).
Talks (not associated with conference papers above)
Portable FPGA Applications: Warp Processing and SystemC Bytecode
-- Keynote talk, Reconfigurable Architectures Workshop (RAW), 2009,
Rome, Italy
Warp Processing
-- Univ. of California, San Diego, Dept. of CS&E, Nov 2008
Warp Processing
-- Univ. of Illinois, Urbana/Champaign, Oct 2008
(also Univ. of Washington, Sep 2008,
and Microsoft Research, Redmond WA, Sep 2008)
eBlocks -- Electronic Building Blocks for Sensor-Based Systems
-- SRC/NSF Virtual Immersion Architecture Workshop, Santa Cruz, July 2008
You Can Do It -- eBlocks Enabling Regular People to Build Useful Customized Sensor-Based Systems
-- Riverside Community College seminar, May 2008
Warp Processing
-- SRC annual review, Dallas, 2008
Self-Improving Computer Chips -- Warp Processing
-- UCR CS&E Colloquium, Oct. 2007
Standard Binaries for FPGAs, & eBlocks
-- NSF's Cyber-Physical Workshop, July 2007
SensorBlocks
-- UCR's College of Engineering TechHorizons, 2007
Warp Processing
-- SRC annual review, Carnegie Mellon Univ., 2007
Soft Core Customization and other UCR FPGA Research -- Xilinx, July 2006
The New Software: FPGAs -- University of Arizona, ECE, April 2006
Warp Processors -- Freescale, April 2006
Warp Processing: Dynamic Transparent Conversion of Binaries to Circuits
-- Notre Dame, CS, Mar 2006
Warp Processing
-- SRC annual review, Ohio State Univ., 2006
Warp Processor: A Dynamically Reconfigurable Coprocessor
-- Talk at Intel's System Design Symposium (San Jose), Nov. 2005
Supercomputing in a Pencil Tip -- Talk at UCR's Engineering
Industry Day, Oct 2005
Silicon prototyping issues -- Panel talk at Intel, May 2005
eBlocks -- Talk at UCSD, April 2005
eBlocks -- Talk at Intel, September 2004
Warp Processors -- Talk at ASU, April 2004
Warp Processors -- Talk at IBM Research, Yorktown Heights, Apr 2004
Warp Processors -- SRC annual review talk, March 2004
Self-Improving Configurable IC Platforms
-- SRC annual review talk, February 2003
Improving Embedded System Software Speed
and Energy using Microprocessor/FPGA Platform ICs
-- UCR colloquium talk, October 2002
New Opportunities with Platform-Based Design
-- Keynote talk at ESCODES'02
System-on-a-Chip Platform Tuning for Embedded Systems
-- given at the 2002 Southern California Embedded Systems Seminar
Recent Results at UCR with Configurable Cache and Hw/Sw Partitioning
-- given at Triscend Corp., September 2002.