UCR CS269: Hardware/Software Engineering of Embedded Systems

Course objective

To study state-of-the-art research focusing on the hardware and software design of embedded computing systems. This year's focus will be on system-on-a-chip, new embedded computing architectures, and low power.

Course information

Instructor: Frank Vahid (vahid@cs.ucr.edu). Office hours: TR 2-3, Bourns A207
Class meeting time: TR 3:10-4:30, GEOL 1429 (course call# 16330)
Prerequisites: None, though knowledge of digital design is helpful
Textbooks: None; all readings are online papers
Grade: Based on presentations, participation, and possibly a few homework assignments

Online readings

Date, presenter
  Online material: brief summary

Tu 11-Jan, F. Vahid
  International Technology Roadmap for Semiconductors 1999 (ITRS): status and trends of ICs
Th 13-Jan, F. Vahid
  ITRS (continued)

Tu 18-Jan, R. Lysecky
  VSIA's Architecture Document: an industry consortium's plan to standardize cores/IP
  VSIA's System Level Model Taxonomy Document: different ways of representing a system
Th 20-Jan, R. Lysecky
  VSIA documents (continued)

Tu 25-Jan, S. Cotterell
  Future of Computing Architectures (Patterson): embedded, not desktop, systems will drive future architecture design
  Philips Silicon Platforms: using pre-designed architectures to build new systems
  Heterogeneous Reconfigurable Processors (Rabaey): a new configurable architecture that adapts to a computation
Th 27-Jan, S. Cotterell
  (Above papers continued)

Tu 1-Feb, T. Givargis
  EXPRESSION Architecture Description Language (UCI): describing parameterized architectures so compilers can optimize for a configuration
  Adapting Cache Line Size to Application Behavior (UCI): optimizing cache parameters for a specific program
  Philips's Retargetable Simulator: simulating different configurations of a parameterized architecture
  Philips approach: an MPEG example
  Programmable interconnect (Rabaey)
Th 3-Feb, T. Givargis

Th 10-Feb, S. Cotterell
  Java Driven Codesign and Prototyping of Networked Embedded Systems
  Description and Simulation of Hardware/Software Systems with Java
  Representation of Function Variants for Embedded System Optimization and Synthesis
  Fast Prototyping: a system design flow applied to a complex System-On-Chip multiprocessor design

Th 17-Feb, L. Tauro
  Code Compression for Embedded Systems (Lekatsas/Wolf)
  Customized Instruction-Sets for Embedded Processors (Fisher)
  Instruction Fetch Energy Reduction Using Loop Caches For Embedded Applications with Small Tight Loops
  Selective Instruction Compression for Memory Energy Reduction in Embedded Systems

Th 24-Feb, R. Lysecky
  Address Bus Encoding Techniques for System-Level Power Optimization
  Motorola's IP Interface Standard
  VSIA's On-Chip Bus

Th 2-Mar, T. Givargis
  Profile-Driven Program Synthesis for Evaluation of System Power Dissipation
  A Power Estimation Framework for Designing Low Power Portable Video Applications
  Cycle-Accurate Simulation of Energy Consumption in Embedded Systems
  Memory Exploration for Low Power Embedded Systems

Th 9-Mar, L. Tauro
  New Chips Move Networking onto Silicon
  Prototyping Networked Embedded Systems
  Designing the Next Step in Internet Appliances
  Information Appliances: From Web Phones to Smart Refrigerators

Th 16-Mar

MISC:
  Motorola's Semiconductor Reuse Standards

Some interesting related links

Interview with Gordon Moore
Intel Museum (introductions to transistors and microprocessors)
History of the transistor
ASICs textbook by Smith
Online VLSI Design Tutorial

Summaries

Title: International Technology Roadmap for Semiconductors 1999 (ITRS)
Summary:

Title: VSIA's Architecture Document
Summary: The VSIA architecture document describes requirements for Virtual Component (VC) designers that aid design reuse. Its main focus is a set of standards for the different levels of VCs and their requirements. VCs are classified as soft (RTL descriptions), firm (ranging from partially mapped RTL descriptions to fully placed netlists), or hard (fully mapped and placed netlists). Each type of VC comes with required documentation and models, intended to reduce the effort of integrating the VC into a design. These requirements cover documentation of the interface, timing, electrical characteristics, and test procedures, as well as models.

Title: VSIA's System Level Model Taxonomy Document
Summary: This document was created by the VSI Alliance to provide a commonly accepted nomenclature and classification for models, so that design information can be transferred between designers and quickly understood. It discusses many classifications and levels of modeling. It first defines System Models, Architectural Models, Hardware Models, and Software Models. Within each category it further defines types of models: behavioral, functional, structural, interface, performance, and dataflow graph models. For each of these classifications it describes the precision of the model; that is, it defines which aspects of the design must be represented in a particular model and which attributes might be included.

Title: Future of Computing Architectures
Summary: This paper evaluates past, present, and predicted computer architectures. It acknowledges that the emphasis in the past was on desktop systems, but that future trends point toward personal mobile computing. Such systems have different requirements, such as small size, portability, power consumption and efficiency, real-time response, and design scalability. The authors argue that present-day architecture research is heavily biased toward past designs, which no longer makes sense given the shift from desktop to embedded systems. Several architectures are examined, and IA-64 and Raw are judged to be among the better ones because their designs were driven by future demands rather than by past designs.

Title: Philips Silicon Platforms
Summary: For well-defined applications, "silicon platforms" (SOCs) must be defined that combine efficient implementations with programmability. The large architecture space opens up possibilities for new architectures, but it also raises the problems of time to market and the design gap. Proposed solutions include reuse, libraries, and dividing the architecture into different levels of granularity. A multi-window TV application was chosen to explore system-level architectures; in the process, the authors also developed an architecture template and heuristics to aid in the design of the system.

Title: Heterogeneous Reconfigurable Processors
Summary: With the paradigm shift in reconfigurable systems, many implementations and techniques are used in design. This paper discusses several of them, along with their tradeoffs and associated issues. Custom designs yield the best solutions, but they fare poorly on time to market, flexibility, and adaptivity. Programmable or configurable architectures were found in the past to be too confining. The paper presents potential extensions to configurable architectures to remedy this, and classifies future systems-on-a-chip into three categories: homogeneous arrays of general-purpose processing elements, application-specific combinations of processing elements, and heterogeneous combinations of processing elements.

Title: EXPRESSION Architecture Description Language (UCI)
Summary: The paper presents an Architecture Description Language (ADL) that captures an architecture's behavior as well as its structure. The behavioral representation describes the processor's operations and operand types, while the structural representation describes the pipeline structure and datapath construction. EXPRESSION can also describe the memory hierarchy of the architecture. Reservation tables for each operation are obtained automatically from an EXPRESSION description, and the described architecture is automatically checked for correctness using the EXPRESSION grammar. The description is also used to generate toolkits, such as simulators and compilers, for the architecture automatically.

Title: Adapting Cache Line Size to Application Behavior (UCI)
Summary: This paper describes a cache that adapts its line size dynamically. It details the hardware for such an adaptive cache and the policies the system requires. The main idea is that, during execution of an application, the cache line that caused a hit is either merged with its neighbor or broken into smaller lines. The proposed cache architecture can be configured in 48 possible ways, each with a slightly different policy for increasing or decreasing the line size.
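The summary does not spell out the merge/split rule, and the paper defines 48 policy variants. As a rough illustration only, the Python sketch below assumes one hypothetical rule: record which 8-byte pieces of a line were referenced before eviction, double the line size if every piece was used, and halve it if at most half were. The function name, the usage bitmask, and the 8 to 64 byte limits are assumptions, not the paper's design.

```python
def next_line_size(size, used_mask, min_size=8, max_size=64, subblock=8):
    """Hypothetical adaptation policy applied when a line is evicted.

    size      -- current line size in bytes
    used_mask -- bit i is set if the i-th `subblock`-byte piece of the line
                 was referenced while the line was resident
    Returns the line size to use the next time this region is fetched.
    """
    pieces = size // subblock
    used = bin(used_mask).count("1")
    if used == pieces and size < max_size:
        return size * 2   # every piece was useful: merge with the neighboring line
    if 2 * used <= pieces and size > min_size:
        return size // 2  # at most half was useful: split into smaller lines
    return size           # otherwise keep the current size


# Usage: a region whose access pattern changes over four evictions.
size, history = 16, []
for mask in (0b11, 0b1111, 0b00000001, 0b01):
    size = next_line_size(size, mask)
    history.append(size)
print(history)  # [32, 64, 32, 16]
```

Keeping the decision local to one line at eviction time is what makes such a policy cheap in hardware; the paper's variants differ precisely in when and how aggressively this kind of decision is taken.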

Title: Philips's Retargetable Simulator
Summary: The paper describes a methodology (the Y-chart) for designing programmable systems in the domain of high-performance video signal processing. The approach describes a class of architectures and a set of applications. From this description an architecture instance (one in which all configurable parameters are fixed) is derived and then turned into an executable architecture instance, which is instrumented to collect performance metrics. The application is then mapped onto the instance and executed on the simulator to obtain performance metrics. These steps are repeated to refine the system. Object-oriented principles are used to capture the architecture instance, and a multithreaded library provides the concurrent execution model. The execution model can perform 10K coarse-grain instructions per second.

Title: Philips approach: an MPEG example
Summary: This paper presents a case study of an architecture for MPEG-2 decoding. The objective is to validate the System Level Performance Analysis and Design space Exploration (SPADE) methodology. With this methodology, the designer models the application as a Kahn process network and, concurrently, specifies a parameterizable architecture. The Kahn process model is then mapped onto this architecture, the system is simulated, and the design space is explored based on performance analysis. The Kahn process representation of the application is executed in a multithreaded, object-oriented environment in which each process runs as a thread and communication uses blocking-read and write primitives. The application execution model is augmented with code that outputs performance metrics and execution traces. After manual mapping onto the architecture, the performance of the overall system is evaluated via trace-driven simulation. SPADE is an instance of the Y-chart methodology.
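The Kahn process execution model described above (one thread per process, blocking reads on FIFO channels) can be sketched in a few lines of Python. The three-stage pipeline, the function names, and the None end-of-stream marker are hypothetical; SPADE's own library and trace format are not reproduced here.

```python
import queue
import threading

# Minimal Kahn process network: processes share no state and communicate only
# through FIFO channels. Reads block until a token arrives; writes to an
# unbounded queue.Queue never block. None marks end of stream (an assumption).

def source(out_ch, n):
    for i in range(n):
        out_ch.put(i)
    out_ch.put(None)

def scale(in_ch, out_ch, k, trace):
    while True:
        tok = in_ch.get()            # blocking read
        if tok is None:
            out_ch.put(None)
            return
        trace.append(("read", tok))  # instrumentation, in the spirit of SPADE's traces
        out_ch.put(tok * k)          # write
        trace.append(("write", tok * k))

def sink(in_ch, results):
    while True:
        tok = in_ch.get()
        if tok is None:
            return
        results.append(tok)

def run_network(n=5, k=3):
    a, b = queue.Queue(), queue.Queue()
    results, trace = [], []
    procs = [
        threading.Thread(target=source, args=(a, n)),
        threading.Thread(target=scale, args=(a, b, k, trace)),
        threading.Thread(target=sink, args=(b, results)),
    ]
    for p in procs:                  # each process runs as its own thread
        p.start()
    for p in procs:
        p.join()
    return results, trace

results, trace = run_network()
print(results)  # [0, 3, 6, 9, 12] regardless of thread scheduling
```

Because every read blocks and every channel is a FIFO, the output and each process's own trace are the same on every run, which is what makes it valid to replay such traces in a separate architecture simulation.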

Title: Programmable interconnect
Summary: This paper discusses a number of interconnect architectures for SOCs, dividing them into "global" and "local" interconnects. Global interconnects have a fixed communication cost regardless of the distance between the communicating components. Local interconnects provide cheap communication among nearby components, at a higher cost for communication between distant components. For global interconnect, the crossbar, multistage (Omega), and multi-bus networks are described; for local interconnect, the mesh, the generalized mesh, and hierarchical networks such as the fat tree. The paper concludes that a generalized hierarchical mesh is best suited for reconfigurable SOCs. This network groups the components into clusters and provides a generalized mesh within each cluster; another generalized mesh handles communication across clusters.

Title: Java Driven Codesign and Prototyping of Networked Embedded Systems
Summary: This paper addresses the challenges of codesign and presents a method and tool set, called JaCOP, that supports co-synthesis and prototyping of networked embedded systems. The tools aid hardware/software development and profiling, and manage the interaction between software and hardware components. The proposed design flow starts from an initial Java specification and profiling data. The data are analyzed and animated to help the designer partition the system. Functions are implemented and synthesized using high-level and logic synthesis tools. Previously designed hardware components kept in a library can be reused; these, together with the software components, are placed in a pool. As the design process continues, portions of the chip can be reconfigured while other sections are being worked on, giving true hardware/software codesign. The paper elaborates on the details of each of these steps.

Title: Description and Simulation of Hardware/Software Systems with Java
Summary: This paper proposes using JavaBeans to model systems, so that the complexity of designing integrated systems can be managed through ever higher levels of abstraction. The system consists of structure and behavior at the system, algorithmic, and register-transfer levels. The authors propose addressing the problem with an object model based on different interpretations of objects, including objects as components, hardware objects, and objects as connections. The paper shows tools, classes, and libraries built on Java, as well as an extension of Java, for designing these systems, and presents a design flow and examples.

Title: Representation of Function Variants for Embedded System Optimization and Synthesis
Summary: This paper proposes a novel approach for the coherent representation and selection of function variants in the different phases of the design process, illustrated with a real example from the video processing domain. The System Property Intervals (SPI) model is used and extended to express function variants. The authors define clusters and interfaces using function variants and explain the selection process.

Title: Fast Prototyping: a system design flow applied to a complex System-On-Chip multiprocessor design
Summary: This paper introduces CoWare N2C, which is intended to reduce time to market, enable concurrent hardware and software development and early verification, and make the reuse of intellectual property productive. The authors describe the challenges of system-on-a-chip design and illustrate how the product addresses them. The tools used in the actual development of their system, and the strategies followed, are also discussed. Features highlighted include refining behavioral C descriptions down to clocked C (similar to VHDL RTL), cross-checking capabilities, functional validation, fully emulated prototypes, and a co-simulation engine that allows hybrid prototyping. The paper describes the prototyping of the megacell and of the overall system. The system was designed in four to five months, which is significantly shorter than would be expected.

Title: Code Compression for Embedded Systems
Summary: Code compression can substantially reduce memory size. The paper proposes two algorithms that compress code space-efficiently and are simple to decompress. (a) SAMC (Semi-Adaptive Markov Compression) uses a binary arithmetic coder driven by a Markov model: instructions are divided into four streams of eight bits each (for a 32-bit word), and a Markov model is built for each stream. (b) SADC (Semi-Adaptive Dictionary Compression) uses a semi-adaptive dictionary method to compress opcodes, opcode-register combinations, and opcode-immediate combinations. The basic idea is that opcodes in an instruction sequence tend to depend on one another; these dependencies are exploited by generating new augmented opcodes that combine several opcodes.

The compression results show significant improvements over previous approaches. SAMC is targeted at RISC instruction sets with fixed-size instructions and can work for any architecture; its compression ratios are comparable to UNIX compress. SADC works on a specific program and instruction set, and as a result can achieve significantly better compression. Because it is a dictionary method, it allows fast hardware implementations.
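The SADC idea of augmented opcodes can be illustrated with a greedy pair-merging pass, similar in spirit to byte-pair encoding. This is a simplified stand-in, not the paper's algorithm: the real SADC dictionary also covers opcode-register and opcode-immediate combinations, and the function names, the `max_new` limit, and the toy MIPS-like opcode stream are assumptions.

```python
from collections import Counter

def build_augmented_opcodes(ops, max_new=2):
    """Greedily fuse the most frequent adjacent opcode pair into a new
    augmented opcode, up to `max_new` times. Opcode names are assumed not
    to contain "+", which is used to name the fused opcodes."""
    ops = list(ops)
    dictionary = {}
    for _ in range(max_new):
        pairs = Counter(zip(ops, ops[1:]))
        if not pairs:
            break
        (a, b), count = pairs.most_common(1)[0]
        if count < 2:
            break                     # a pair seen only once is not worth an entry
        fused = f"{a}+{b}"
        dictionary[fused] = (a, b)
        out, i = [], 0
        while i < len(ops):           # replace non-overlapping occurrences left to right
            if i + 1 < len(ops) and ops[i] == a and ops[i + 1] == b:
                out.append(fused)
                i += 2
            else:
                out.append(ops[i])
                i += 1
        ops = out
    return ops, dictionary

def expand(ops, dictionary):
    """Decompression: recursively replace augmented opcodes by their parts."""
    out = []
    for op in ops:
        if op in dictionary:
            out.extend(expand(dictionary[op], dictionary))
        else:
            out.append(op)
    return out

program = ["lw", "add", "sw", "lw", "add", "sw", "beq"]
packed, table = build_augmented_opcodes(program)
print(packed)  # ['lw+add+sw', 'lw+add+sw', 'beq']
```

Decompression is a table lookup per symbol, which is why dictionary methods map well onto fast hardware decoders.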

Title: Customized Instruction-Sets for Embedded Processors
Summary: The instruction set architecture (ISA) is the most visible part of a processor; it is the contract between hardware and software. The main motivation for breaking the ISA is that doing so can yield gains in performance or in performance/price.
There are five barriers to breaking the ISA:
1. Existing binaries
2. Toolchain development and maintenance costs
3. Lost savings or higher chip cost due to lower volumes
4. Hardware development costs: each variant processor needs a new chip design
5. The product development cycle for embedded products

The paper discusses why each of these is a barrier and offers some solutions. It then outlines the factors that the author believes will lead ISAs to form performance-driven families of what are now incompatible ISAs.

Title: Instruction Fetch Energy Reduction Using Loop Caches For Embedded Applications with Small Tight Loops
Summary: The authors propose using a small instruction buffer, called a loop cache, to reduce instruction fetch energy when executing tight program loops. Instructions are supplied to the execution core either from the loop cache or from the main cache. The technique relies on a special class of branch instructions called short backward branch (sbb) instructions. When an sbb is detected in the instruction stream and found to be taken, the loop cache controller assumes that the second iteration of the program loop is starting and tries to fill the loop cache. From the third iteration onward, the controller directs all instruction requests to the loop cache and shuts off the main cache completely.

The loop cache controller knows precisely, and well ahead of time, when the next instruction request will hit in the loop cache. As a result, the loop cache needs no address tag store and can be implemented as a direct-mapped array with no valid bit per entry. It does not require program loops to be aligned to any particular address boundary. Most importantly, the technique incurs no cycle count penalty and no cycle time degradation.
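A small Python model of the controller described above makes the fill-then-serve behavior concrete. It assumes three controller states (IDLE, FILL, ACTIVE), instruction-granular addresses, and that a taken backward branch counts as an sbb when its loop body fits in a hypothetical 8-entry loop cache; these names, the trace format, and `make_trace` are illustrative, not taken from the paper.

```python
def make_trace(iterations):
    """Fetch trace as (pc, taken_target) pairs; taken_target is None when the
    instruction is not a taken branch. Hypothetical program: two straight-line
    instructions, a loop over pcs 10..13 whose last instruction is a short
    backward branch (sbb) to pc 10, and two instructions after the loop.
    Addresses count instructions, not bytes."""
    trace = [(8, None), (9, None)]
    for it in range(iterations):
        trace += [(pc, None) for pc in range(10, 13)]
        trace.append((13, 10 if it < iterations - 1 else None))  # last iteration falls through
    return trace + [(14, None), (15, None)]

def simulate(trace, lc_entries=8):
    """Count fetches served by the main cache and by the loop cache."""
    state, start, end = "IDLE", None, None
    main = loop = 0
    for pc, target in trace:
        if state != "IDLE" and not start <= pc <= end:
            state = "IDLE"                     # control flow left the loop
        if state == "ACTIVE":
            loop += 1                          # main cache can stay shut off
        else:
            main += 1                          # in FILL, this fetch is also copied into the loop cache
        if state == "IDLE":
            is_sbb = target is not None and 0 < pc - target < lc_entries
            if is_sbb:                         # taken sbb: next fetch starts iteration 2
                state, start, end = "FILL", target, pc
        elif pc == end:                        # reached the sbb again
            state = "ACTIVE" if target == start else "IDLE"
    return main, loop

print(simulate(make_trace(4)))  # (12, 8)
```

With four iterations, the first two come from the main cache (detection, then fill) and the last two from the loop cache; the controller never has to compare tags, because it decides where each fetch goes from the sbb outcome alone.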

Title: Selective Instruction Compression for Memory Energy Reduction in Embedded Systems
Summary: The authors propose a technique for reducing the energy required to execute firmware on embedded systems. The main idea is that firmware running on a given embedded processor uses only a small subset of the instructions the processor supports. By replacing these instructions with binary patterns of limited width (log2 N bits for a table of N instructions), memory bandwidth usage, and thus total energy, can be reduced. Each time an instruction is fetched from memory, it is first decompressed by means of an instruction decompression table and then passed to the processor's decoding logic. The authors propose compressing only a subset of fixed cardinality (256 elements) of the instructions used in the program.

The paper discusses four architectural schemes for performing the compression. Based on experimental results, the authors report that dynamic memory utilization improves substantially under all of the compression schemes. The main advantage of their approach is that it requires no modification of the processor, since the processor always executes full-size instructions.
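The Python sketch below models the basic idea: a decompression table holds the most frequently fetched instruction words, and each fetch carries either a short table index or the full instruction. The one-bit flag that distinguishes the two cases and the bit-count cost model are assumptions standing in for the paper's four architectural schemes, which differ in how compressed and uncompressed code are laid out in memory.

```python
from collections import Counter

def build_table(trace, k=256):
    """Pick the k most frequently fetched instruction words (selection by
    dynamic frequency is an assumption of this sketch)."""
    return [word for word, _ in Counter(trace).most_common(k)]

def compress(trace, table):
    """Encode each fetch as (flag, payload): flag 1 means payload is a table
    index, flag 0 means payload is the full uncompressed instruction."""
    index = {word: i for i, word in enumerate(table)}
    return [(1, index[w]) if w in index else (0, w) for w in trace]

def decompress(stream, table):
    """Table lookup in front of the decoder: the processor always sees full words."""
    return [table[p] if flag else p for flag, p in stream]

def fetched_bits(stream, table, word_bits=32):
    index_bits = max(1, (len(table) - 1).bit_length())  # 256 entries -> 8 bits
    return sum(1 + (index_bits if flag else word_bits) for flag, _ in stream)

trace = [0xA] * 6 + [0xB] * 3 + [0xC]
table = build_table(trace, k=2)
stream = compress(trace, table)
print(fetched_bits(stream, table), "bits vs", 32 * len(trace))  # 51 bits vs 320
```

Because `decompress` always returns full-width words, the processor's decoder is untouched, which mirrors the main advantage claimed for the approach.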

Title: Address Bus Encoding Techniques for System-Level Power Optimization
Summary: The authors analyze different bus encoding schemes with respect to the average number of bus line transitions per clock cycle. They first discuss Bus-Invert and T0 encodings and then describe mixed encodings. Combining Bus-Invert and T0 encoding yields the lowest average number of bus line transitions per clock cycle. They also analyze the impact of these encodings on power consumption. Their results show a break-even point beyond which the mixed encoding schemes reduce power consumption: depending on the capacitance, the extra logic needed to implement the encoding can consume more power than it saves, but as the capacitance increases this overhead becomes relatively smaller, and at some point the mixed encoding results in lower power consumption.
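To make the schemes concrete, here is a Python sketch that counts line transitions for plain binary, Bus-Invert, T0, and one plausible Bus-Invert/T0 combination on a toy 8-bit address stream. The stride of 4, the all-zero initial bus state, and the way the combined encoder freezes both the data and INV lines on sequential addresses are assumptions; the paper's exact mixed schemes may differ. The transition count is only a proxy for power: it ignores the encoder/decoder logic overhead discussed above.

```python
def popcount(x):
    return bin(x).count("1")

def transitions(states):
    """Total line toggles between consecutive bus states; each state packs all
    lines (data plus any INV/INC control lines) into one integer."""
    return sum(popcount(a ^ b) for a, b in zip(states, states[1:]))

def _maybe_invert(addr, prev_bus, n):
    """Bus-Invert rule: if more than half of the n data lines would toggle,
    send the complement instead and assert INV."""
    mask = (1 << n) - 1
    if 2 * popcount((addr ^ prev_bus) & mask) > n:
        return ~addr & mask, 1
    return addr, 0

def binary_enc(addrs, n):
    return [0] + list(addrs)              # every encoder starts with all lines low

def bus_invert_enc(addrs, n):
    states, bus = [0], 0
    for a in addrs:
        bus, inv = _maybe_invert(a, bus, n)
        states.append((inv << n) | bus)   # INV is line n
    return states

def t0_enc(addrs, n, stride=4):
    states, bus, prev = [0], 0, None
    for a in addrs:
        inc = int(prev is not None and a == prev + stride)
        if not inc:
            bus = a                       # when INC is high the data lines stay frozen
        states.append((inc << n) | bus)   # INC is line n
        prev = a
    return states

def t0_bi_enc(addrs, n, stride=4):
    states, bus, inv, prev = [0], 0, 0, None
    for a in addrs:
        inc = int(prev is not None and a == prev + stride)
        if not inc:
            bus, inv = _maybe_invert(a, bus, n)
        states.append((inc << (n + 1)) | (inv << n) | bus)  # INC is line n+1, INV is line n
        prev = a
    return states

addrs = [0x00, 0x04, 0x08, 0xFF]          # an 8-bit address stream with stride 4
for enc in (binary_enc, bus_invert_enc, t0_enc, t0_bi_enc):
    print(enc.__name__, transitions(enc(addrs, 8)))
# binary_enc 10, bus_invert_enc 5, t0_enc 10, t0_bi_enc 3
```

The receiver reverses the encoding: when INC is high it computes the previous address plus the stride, and otherwise it takes the data lines, complemented if INV is high.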

Title: Motorola's IP Interface Standard
Summary: Motorola's IP interface provides a standard similar to the VSIA on-chip bus standard. However, Motorola's interface is divided into a number of bus lines corresponding to different portions of the bus interface, such as the main system bus, the peripheral bus, interrupts, and DMA control. In particular, the blue line describes the peripheral virtual component interface, which is referred to as a gasket. This interface is a two-signal handshaked protocol that provides either a point-to-point connection between a bus gasket and a VC or a one-to-many connection between a single bus gasket and many VCs.

Title: VSIA's On-Chip Bus
Summary: The on-chip bus described in the VSIA document can be classified into two interfaces: the full Virtual Component Interface (VCI) and the Peripheral Virtual Component Interface (PVCI). The PVCI defines the interface between a peripheral core and a bus wrapper, which connects the VC to the on-chip bus. This allows designers to provide multiple bus wrappers for different standard buses, and allows VC integrators to design bus wrappers for proprietary buses. The PVCI is a simple two-signal handshaked protocol that provides a point-to-point connection between a bus wrapper and a VC.

Title: Profile-Driven Program Synthesis for Evaluation of System Power Dissipation
Summary: This paper provides algorithms for synthesizing, from an original program's execution trace, a new execution trace whose power consumption closely matches that of the original, so that the simulation time needed to evaluate power is reduced. The approach uses integer linear programming to find a best-fit basic-block template for the original program. It then selects a set of matching instructions for each block, assigns operands, and allocates memory based on mathematical models. The authors report simulation time reductions of 10 to 10,000 times while keeping the power estimation error below 5%.

Title: A Power Estimation Framework for Designing Low Power Portable Video Applications
Summary: This paper presents a hierarchical, mixed-level simulation environment targeted at data processing applications, using an MPEG encoder example to illustrate the methodology. Each component is modeled both at the functional level and at a low structural level, using C and VHDL. A designer can use the high-level model to check functionality and to capture intermediate data, which are later applied to the low-level structural models to obtain accurate power and performance metrics.

Title: Cycle-Accurate Simulation of Energy Consumption in Embedded Systems
Summary: This paper gives a methodology for cycle-accurate simulation (of both power and function) of embedded systems. The work focuses on a set of board-level components and assumes that only the interface and the manufacturer's datasheet are available to the designer for these discrete components. The processor and components are modeled in a high-level language and simulated, while power consumption is captured by evaluating simple mathematical models for the cache, CPU, memory, and other components.
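The core of such a simulator is a per-cycle loop that looks up each component's power in its current state and integrates it over the clock period. The Python sketch below assumes state-based power tables with made-up numbers in place of real datasheet values, a 100 MHz clock, and a hand-written two-cycle activity trace; in a full simulator the per-cycle states would come from the functional models themselves.

```python
# Hypothetical per-state power figures in mW, standing in for datasheet values.
POWER_MW = {
    "cpu":    {"run": 400.0, "stall": 150.0, "idle": 20.0},
    "cache":  {"access": 50.0, "idle": 5.0},
    "memory": {"read": 300.0, "write": 350.0, "idle": 10.0},
}

def simulate_energy(cycle_states, clock_mhz=100.0):
    """Accumulate energy cycle by cycle.

    cycle_states -- one dict per clock cycle mapping a component name to its
                    state in that cycle; components not listed are idle.
    Returns (total_nj, per_component_nj).
    """
    period_ns = 1000.0 / clock_mhz                    # 100 MHz -> 10 ns per cycle
    energy = {name: 0.0 for name in POWER_MW}
    for states in cycle_states:
        for name, table in POWER_MW.items():
            mw = table[states.get(name, "idle")]
            energy[name] += mw * period_ns / 1000.0   # mW * ns = pJ; /1000 -> nJ
    return sum(energy.values()), energy

cycles = [
    {"cpu": "run", "cache": "access"},                # cache hit
    {"cpu": "stall", "memory": "read"},               # miss: CPU stalls while memory is read
]
total, parts = simulate_energy(cycles)
print(round(total, 3), {k: round(v, 3) for k, v in parts.items()})
```

Keeping the power models as simple per-state tables is what lets this kind of simulation work from datasheet information alone.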

Title: Memory Exploration for Low Power Embedded Systems
Summary: This paper presents a memory exploration strategy based on three performance metrics. The authors study cache size, line size, set associativity, and tiling, and outline an exhaustive exploration algorithm that searches for the set of values for these parameters that minimizes power consumption while meeting timing constraints.
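The exhaustive search itself is simple; the hard part is the cost model. The Python sketch below separates the two: `explore()` enumerates every combination of cache size, line size, associativity, and tiling, and keeps the lowest-energy configuration that meets a cycle budget. The `toy_model()` cost function and all of its constants are invented for illustration; a real flow would derive energy and cycle counts from simulation or from the paper's own models.

```python
from itertools import product

def explore(evaluate, sizes, lines, assocs, tiles, max_cycles):
    """Exhaustively try every combination; keep the lowest-energy configuration
    whose cycle count meets the timing constraint. Returns
    (energy, cycles, config) or None if nothing is feasible."""
    best = None
    for cfg in product(sizes, lines, assocs, tiles):
        energy, cycles = evaluate(*cfg)
        if cycles > max_cycles:
            continue                          # violates the timing constraint
        if best is None or energy < best[0]:
            best = (energy, cycles, cfg)
    return best

def toy_model(size_kb, line, assoc, tile):
    """Purely illustrative cost model (not from the paper): larger, more
    associative caches miss less but cost more energy per access."""
    accesses = 10_000
    misses = accesses // (8 * size_kb * assoc) + (4 * accesses) // (line * tile)
    energy_nj = accesses * 0.05 * size_kb * (1 + 0.2 * assoc) + misses * 2.0 * line / 16
    cycles = accesses + misses * (10 + line // 4)
    return energy_nj, cycles

best = explore(toy_model, sizes=[1, 2, 4, 8], lines=[8, 16, 32],
               assocs=[1, 2, 4], tiles=[4, 8, 16], max_cycles=14_000)
print(best)
```

Exhaustive enumeration is practical here because the space is small (4 x 3 x 3 x 3 = 108 points); the cost grows multiplicatively with each added parameter.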

Title: New Chips Move Networking onto Silicon
Summary: Networking technology today is moving toward faster, less expensive, and more functional networks. The growing trend is to provide many networking functions through a single internetworking chip, rather than through multiple ASICs or software on general-purpose RISC processors as in previous approaches. Internetworking chips are integrated, specialized chipsets optimized to perform high-level networking functions. They take advantage of improvements in processor technology that permit lower chip prices as well as more sophisticated decision making in hardware; as a result, they are customizable like software but fast like ASICs. They help meet the demand for faster, less expensive, and more functional networks by offering better performance than software running on a general-purpose processor, and they offer lower prices, faster time to market, and more flexibility than ASICs. They also provide a standardized approach that could permit more interoperability. On the other hand, generic functions hardwired into commoditized chipsets may not be sufficient for the specialized features that vendors want to incorporate in their products. Also, networking vendors may not want to set aside big investments in in-house ASIC design teams in favor of buying off-the-shelf internetworking chips.

Title: Prototyping Networked Embedded Systems
Summary: The low-cost, consumer-oriented, fast time-to-market mentality that dominates embedded system design today forces design teams to use hardware-software codesign to cope with growing design complexity. New codesign methodologies and tools must support a key characteristic of next-generation embedded systems: the ability to communicate over networks and adapt to different operating environments. The authors developed a co-synthesis method that aims to assign tasks to software or hardware as efficiently as possible. The designer begins with an initial Java specification of the desired functionality. Software profiling by the Java virtual machine then identifies bottlenecks and computation-intensive tasks. Based on the profiling results and a reuse library of available hardware components, the designer selects candidates for hardware implementation and uses a high-level synthesis tool to transform Java methods into VHDL. At the same time, the tool generates an appropriate interface description for each hardware block.

Title: Designing the Next Step in Internet Appliances
Summary: Web-enabled information appliances are appearing in many places, but the challenge is supplying web pages with dynamic content. To support dynamic content, the authors suggest the following steps:
1. Generate web content using the HTML editor of your choice. For dynamic pages, insert proprietary tags where the dynamic portions are to appear.
2. Use a conversion tool to convert the HTML pages, images, and applets into embedded application source code.
3. Implement the generated shell routines that are specific to the overall application running on the embedded device.
4. Compile and link the resulting source code.
The embedded server can support both static and dynamic web content, as well as form processing. By using a conversion tool, designers spend their time implementing only the web portions specific to their device, which greatly reduces application development time.
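Steps 2 and 3 can be pictured with a small Python sketch: a converter splits each page at the proprietary tags, and at request time the server emits static text verbatim while calling an application-supplied routine for each dynamic field. The `<!--DYN:name-->` tag syntax, the function names, and the handler dictionary are all hypothetical; the tool described in the article generates embedded application source code rather than Python data structures.

```python
import re

# Hypothetical proprietary tag marking a dynamic field, e.g. <!--DYN:temp-->.
DYN_TAG = re.compile(r"<!--DYN:(\w+)-->")

def convert(page):
    """Conversion step (sketch): split a page into static text and named
    dynamic fields, the form a tool could then emit as embedded source."""
    parts, pos = [], 0
    for m in DYN_TAG.finditer(page):
        parts.append(("static", page[pos:m.start()]))
        parts.append(("dynamic", m.group(1)))
        pos = m.end()
    parts.append(("static", page[pos:]))
    return parts

def render(parts, handlers):
    """Request time: emit static chunks verbatim and call the device-specific
    routine (the "shell routines" of step 3) for each dynamic field."""
    return "".join(text if kind == "static" else str(handlers[text]())
                   for kind, text in parts)

parts = convert("<p>Temperature: <!--DYN:temp--> C</p>")
print(render(parts, {"temp": lambda: 21}))  # <p>Temperature: 21 C</p>
```

Doing the parsing once at conversion time means the embedded server only concatenates strings and calls handlers per request, which keeps the on-device code small.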

Title: Information Appliances: From Web Phones to Smart Refrigerators
Summary: An information appliance (IA) hides its operating system and computational abilities behind a task-oriented interface. Using LANs, the Internet, and wireless technologies, these embedded systems will provide connectivity to nearly every kind of electronic device manufactured in the coming years. Categories of IAs include thin clients, set-top web browsers, web phones, smart cellular phones, and pagers. The embedded system within each of these devices performs low-level protocol processing and link control invisibly, enabling smart machines to talk to each other. IA technologies make it much easier not only to build intelligence into a product but also to let it communicate with other devices across a variety of infrastructures. The biggest impact of IAs will be in the invisible realm: meter reading, data capture, and industrial automation, which benefit from a steady flow of information.


