UCR CS269: Hardware/Software Engineering of Embedded Systems

Course description

To study state-of-the-art research focusing on the hardware and software design of embedded computing systems. This year's focus will be on new embedded computing architectures with an emphasis on low power architectural techniques.

Course information

Instructor: Frank Vahid (vahid@cs.ucr.edu). Office hours TR 2-3, Bourns A207
Class meeting time: TuTh 9:40am-11am, HMNSS 1404
Prerequisites: Background in programming, digital design, and architecture
Textbooks: None -- all readings will be online papers.
Grade: Based on presentations, participation, and possibly a few homework assignments.

Expectations

Students are expected to read assigned readings before the date they are presented, to attend class and participate actively in discussions, and to present several papers during the quarter, which they should study thoroughly and for which they may need to do additional background reading. Outside reading for this class is estimated at about one hour for every hour of class time.

Online readings

Date Presenter Online material Brief Summary Presenter's outline/summary
Tu 4/3 NO CLASS Int. Technology Roadmap for Semiconductors: Intro. Widely-read document on chip directions: in short, capacity is HUGE. We'll focus on the SOC and design parts only. Presenter's Summary
ITRS'99: SOC's The trend towards putting entire systems-on-a-chip. Don't dwell on the tables. Presenter's Summary
ITRS'99: Design How are we going to make use of such high capacity chips? Presenter's Summary
Th 4/5 F. Vahid Platform Tuning for Embedded System Design. F. Vahid and T. Givargis Recent IEEE Computer article summarizing the UCR Dalton Project's technical basis and philosophy. Presenter's Summary

Tu 4/10 Chuanjun Zhang Power Analysis of Embedded Software: A First Step Towards Software Power Minimization. Tiwari, V.; Malik, S.; Wolfe, A. A widely-referenced early work on software power estimation. Presenter's Summary
Chuanjun Zhang High-Level Power Modeling, Estimation and Optimization E. Macii, M. Pedram, F. Somenzi. A good tutorial on low-power design. Presenter's Summary
Th 4/12 Greg Stitt Energy Dissipation in General Purpose Processors. Gonzalez, R.; Horowitz, M. Presenter's Summary
Greg Stitt The Design of a High Performance Low Power Microprocessor. Dobberpuhl, D. Presenter's Summary
Greg Stitt The Technology Behind Crusoe Processors. A. Klaiber. Presenter's Summary

Tu 4/17 Girish Venkataramani Technical Report: The SimpleScalar 2.0 Toolset. D. Burger, T. Austin. Presenter's Summary
Girish Venkataramani Wattch: A Framework for Architectural-Level Power Analysis and Optimizations D. Brooks, V. Tiwari, M. Martonosi. Presenter's Summary
Th 4/19 Jason Villarreal Cache Designs for Energy Efficiency C.L. Su and A.M. Despain, HICSS 1995. Presenter's Summary
Jason Villarreal Power and performance tradeoffs using various caching strategies. Bahar, R.I.; Albera, G.; Manne, S. Presenter's Summary

Tu 4/24 Ann Gordon-Ross The Filter Cache: An Energy Efficient Memory Structure. Kin, J.; Gupta, M.; Mangione-Smith, W.H. Presenter's Summary
Ann Gordon-Ross Energy and Performance Improvements in Microprocessor Design Using a Loop Cache. Bellas, N.; Hajj, I.; Polychronopoulos, C.; Stamoulis, G. (ICCD '99) International Conference on Computer Design, 1999. Presenter's Summary
Ann Gordon-Ross Instruction Fetch Energy Reduction Using Loop Caches For Embedded Applications with Small Tight Loops. L. H. Lee, B. Moyer, J. Arends, International Symposium On Low Power Electronics and Design (ISLPED), 1999. Presenter's Summary
Th 4/26 Brian Grattan An Energy Conscious Methodology for Early Design Exploration of Heterogeneous DSP's M. Wan, Y. Ichikawa, D. Lidsky and J. Rabaey. Presenter's Summary
Brian Grattan A Low Power Hardware/Software Partitioning Approach for Core-based Embedded Systems. J. Henkel, DAC99. Presenter's Summary
Brian Grattan Energy-Conscious HW/SW Partitioning of Embedded Systems: A Case Study on an MPEG-2 Encoder. J. Henkel and Yanbin Li, CODES98. Presenter's Summary

Tu 5/1 Yu Chang Address Bus Encoding Techniques for System-Level Power Optimization L. Benini, G. De Micheli, E. Macii, D. Sciuto, C. Silvano. Presenter's Summary
Yu Chang A2BC: Adaptive Address Bus Coding for Low Power Deep Sub-Micron Designs J. Henkel, H. Lekatsas, DAC'01. Presenter's Summary
Yu Chang Synthesis of Low-Overhead Interfaces for Power-Efficient Communication over Wide Buses L. Benini, A. Macii, E. Macii, M. Poncino, R. Scarsi, DAC'99. Presenter's Summary
Th 5/3 Troy Smith Architectural Power Optimization by Bus Splitting C.T. Hsieh and M. Pedram, DATE'00. Presenter's Summary
Troy Smith Communication Architecture Tuners: A Methodology for the Design of High-Performance Communication Architectures for System-on-Chips K. Lahiri, A. Raghunathan, G. Lakshminarayana and Sujit Dey, DAC'00 (Awarded Best Paper). Presenter's Summary
Troy Smith Bus-Invert Coding for Low-Power I/O M.R. Stan, W.P. Burleson, IEEE Transactions on VLSI Systems, March 1995. Presenter's Summary

Tu 5/8 Yi Chen Selective Instruction Compression for Memory Energy Reduction in Embedded Systems L. Benini, A. Macii, E. Macii, M. Poncino, ISLPED99. Presenter's Summary
Yi Chen A Power Reduction Technique with Object Code Merging for Application Specific Embedded Processors T. Ishihara and H. Yasuura Presenter's Summary
Yi Chen Code Compression for Low Power Embedded System Design H. Lekatsas, J. Henkel, W. Wolf. Presenter's Summary
Th 5/10 Susan Cotterell A Low Power Unified Cache Architecture Providing Power and Performance Flexibility A. Malik, B. Moyer, D. Cermak. Presenter's Summary
Susan Cotterell A portable fault-tolerant microprocessor based on the SPARC V8 architecture. J. Gaisler. Data Systems in Aerospace, 1999. Presenter's Summary
Susan Cotterell Leon Center Homepage (link no longer active) Presenter's Summary

Tu 5/15 Xiaohu Chen Custom-Fit Processors: Letting Applications Define Architectures J. Fisher, P. Faraboschi and G. Desoli, MICRO'96. Presenter's Summary
Xiaohu Chen Customized Instruction-Sets for Embedded Processors J.A. Fisher, DAC'99. Presenter's Summary
Xiaohu Chen An ASIP Design Methodology for Embedded Systems K. Kucukcakar, CODES'99. Presenter's Summary
Th 5/17 Yumin Wang Media Architecture: General Purpose vs. Multiple Application- Specific Programmable Processor C. Lee, J. Kin, M. Potkonjak, W.H. Mangione-Smith Presenter's Summary
Yumin Wang Automatic Architectural Synthesis of VLIW and EPIC Processors S. Aditya, B.R. Rau, V. Kathail, ISSS'99. Presenter's Summary

Tu 5/22 Rafael Lopez Xtensa: A Configurable and Extensible Processor R. Gonzalez, IEEE Micro'00. Also see www.tensilica.com Presenter's Summary
Rafael Lopez System and architecture-level power reduction of microprocessor-based communication and multi-media applications L. Nachtergaele, V. Tiwari and N. Dutt, ICCAD'00. Presenter's Summary
Th 5/24 Kris Miller Memory Data Organization for Improved Cache Performance in Embedded Processor Applications P. Panda, N. Dutt, A. Nicolau, TODAES'97. Presenter's Summary
Kris Miller Memory Exploration for Low Power Embedded Systems W.T. Shiue, C. Chakrabarti, DAC'99. Presenter's Summary

Tu 5/29 Dinesh Suresh Global Multimedia System Design Exploration using Accurate Memory Organization Feedback. A. Vandecappelle, M. Miranda, E. Brockmeyer, F. Catthoor, D. Verkest, DAC'99. Presenter's Summary
Dinesh Suresh Memory aware compilation through accurate timing extraction. P. Grun, N. Dutt, A. Nicolau, DAC'00. Presenter's Summary
Th 5/31 Yun Zheng Power: A First Class Architectural Design Constraint. T. Mudge, IEEE Computer, 2001. Presenter's Summary

Tu 6/5 Philip Hoang AMULET3: A 100 MIPS Asynchronous Embedded Processor S.B. Furber, D.A. Edwards, and J.D. Garside, ICCD'00. Presenter's Summary
Philip Hoang Power Management in the Amulet Microprocessors. S.B. Furber, A. Efthymiou, J.D. Garside, D.W. Lloyd, M.J.G. Lewis, S. Temple, DT'01. Presenter's Summary
Th 6/7 --- Presenter's Summary
 
MISC:
Motorola's Semiconductor Reuse Standards

Some interesting related links

Interview with Gordon Moore
Intel Museum (intros to transistors and microprocessors)
History of the transistor
ASICs textbook by Smith
Online VLSI Design Tutorial

Summaries

Title: ITRS roadmap, SOC and design
Summary: The International Technology Roadmap for Semiconductors is a widely-read document put together by representatives from leading chip companies. It charts the future of chips and points out key challenges to be overcome. A general understanding of the trends in this field is important to understanding the future of embedded systems -- namely, that chip capacities are going ballistic, and that designing computing systems that take advantage of such capacity is very hard. Such capacity may enable totally new approaches to chip design. Of particular interest to our class are programmable platforms: pre-designed, over-designed chips for particular domains, like networking, digital cameras, cell-phones, set-top boxes, or video games.
Back to online readings table

Title: Platform Tuning for Embedded System Design
Summary: Huge chip capacities are outpacing designer productivity, meaning such chips are underutilized. Companies with chip-design and domain expertise are thus creating programmable platforms: pre-designed, over-designed systems for particular domains, like digital cameras, thus easing the end-product designer's burden. Such platforms typically have parameterized architectures. We want to tune those parameters to give the best performance and power. This paper introduces this new field of platform tuning, and discusses research being done at UCR in this field.

Title: Power Analysis of Embedded Software: A First Step Towards Software Power Minimization
Summary:
1. The goal of this research is to present a methodology for developing and validating an instruction-level power model for any given processor. Such a model is used for:
  1.1 power estimation,
  1.2 power optimization, and
  1.3 verifying that a design meets its specification.
2. Experimental method
  2.1 The aim is to provide a method that makes it possible to talk about the power/energy cost of a given program on a given processor.
  2.2 Hypothesis: By measuring the current drawn by the processor as it repeatedly executes certain instructions or certain short instruction sequences, it is possible to obtain most of the information needed to evaluate the power cost of a program for that processor.
  2.3 Base energy cost: measured by constructing a loop of identical instructions. Several factors affect the base energy: the effect of the loop instruction, the operand type (memory or register) of the instruction, cache misses, etc.
  2.4 Inter-instruction effects
    2.4.1 Circuit state
    2.4.2 Effect of resource constraints
    2.4.3 Cache misses
3. Generation of energy-efficient code
  3.1 Instructions with memory operands consume much more power than instructions with register operands.
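The outline above can be turned into a small estimator: sum a base cost for every executed instruction, then add an overhead for each adjacent instruction pair. The following is a minimal sketch only. The function name, the instruction names, and all cost values are hypothetical and are not measurements from the paper; the `other_effects` term stands in for stalls and cache misses.

```python
def program_energy(trace, base_cost, pair_overhead, other_effects=0.0):
    """Estimate the energy of an instruction trace (arbitrary units).

    trace         -- list of executed instruction names, in order
    base_cost     -- dict: instruction name -> base energy cost
    pair_overhead -- dict: (prev, next) -> inter-instruction overhead
    other_effects -- lump sum for stalls, cache misses, etc. (assumed given)
    """
    # Base cost of every executed instruction.
    total = sum(base_cost[op] for op in trace)
    # Circuit-state overhead for each adjacent pair; unknown pairs cost 0.
    total += sum(pair_overhead.get((a, b), 0.0)
                 for a, b in zip(trace, trace[1:]))
    return total + other_effects


# Hypothetical costs: memory-operand instructions cost more than register ones.
base = {"load": 3.0, "add": 1.0, "store": 3.5}
overhead = {("load", "add"): 0.5, ("add", "store"): 0.25}
print(program_energy(["load", "add", "add", "store"], base, overhead))
```

With a table like this, two candidate code sequences for the same computation can be compared before either is run on hardware.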
Title: High-Level Power Modeling, Estimation, and Optimization
Summary:
1. This paper is a survey of the most successful and innovative ideas in power modeling, estimation, and optimization.
2. Power modeling and estimation
  2.1 Statistical sampling
    2.1.1 Static: relies on probabilistic information about the input signals and their correlations to estimate the internal switching activity of the circuit. Limitation: the parameters used in the model cannot be obtained accurately.
    2.1.2 Dynamic: simulates the circuit under a typical input stream. The problem lies with the simulations.
  2.2 Probabilistic compaction
  2.3 RT-level power estimation: uses capacitance models for circuit modules and activity profiles for data or control signals.
  2.4 Behavioral-level power estimation
3. Synthesis
  3.1 Operation scheduling: shut down resources that are not performing useful computations.
  3.2 Resource allocation
    3.2.1 Three classes of resources: registers, functional units, and interconnections.
  3.3 Multiple supply voltage scheduling
  3.4 Control synthesis: controlling the process or pattern by which the circuit is synthesized.
4. Optimization
  4.1 Bus encoding
  4.2 Control logic: appears similar to control synthesis.
  4.3 Retiming
  4.4 Shutdown techniques

Title: Energy Dissipation in General Purpose Processors
Summary: This paper investigates the energy savings that result from pipelining and superscalar issue. The authors analyze energy dissipation for three ideal machines, where the only energy included is that of reading and writing memories and clocking storage elements. The results show that both pipelined and superscalar machines achieve a much better energy-delay product. Real machines were roughly half as efficient as the ideal machines.
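The energy-delay product used as the metric here is simply energy multiplied by execution time, so a design that spends somewhat more energy can still score better if it finishes sooner. A minimal sketch, with made-up numbers that are not taken from the paper:

```python
def energy_delay_product(energy_joules, delay_seconds):
    """Lower is better: penalizes both wasted energy and slow execution."""
    return energy_joules * delay_seconds


# Hypothetical machines: B uses more energy but is twice as fast.
machine_a = energy_delay_product(2.0, 4.0)
machine_b = energy_delay_product(3.0, 2.0)
print(machine_a, machine_b)  # B has the lower (better) product
```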

Title: The Design of a High Performance Low Power Microprocessor
Summary: This paper discusses the various power optimizations applied to the StrongARM 110 processor in order to achieve low power consumption without sacrificing high performance.

Title: The Technology Behind Crusoe Processors
Summary: This paper discusses the new Crusoe processor from Transmeta. This is an x86-compatible processor that achieves tremendous power savings while delivering good performance. The Crusoe uses a highly power-efficient VLIW processor surrounded by Code Morphing software, which handles the dynamic translation of x86 instructions into the instruction set of the VLIW processor. This software reduces the size of the Crusoe to 1/4 of the size of the Pentium III and allows for many performance optimizations, such as a translation cache.

Title: The SimpleScalar Tool Set, Version 2.0
Summary: This paper presents the SimpleScalar Tool Set, a set of architecture simulation tools. SimpleScalar simulates a MIPS-like architecture (actually, a superset of the MIPS-IV instruction set architecture) at the software level. It provides five different simulators that focus on different aspects of the architecture. While sim-fast is a functional simulator that provides quick results with few statistics (and no timing information), sim-outorder is a detailed, low-level simulator that simulates the microarchitecture cycle by cycle. The paper gives an overview of the entire tool set, how it is structured, and how it can be used.

Title: Wattch: A Framework for Architectural-Level Power Analysis and Optimizations
Summary: This paper presents a system (Wattch) that adapts the SimpleScalar simulator for power analysis. Power analysis is done at the architectural level, and the simulation is built on top of the sim-outorder simulator (of the SimpleScalar tool set). The paper describes the various power optimizations that can be done at the architectural level, and accounts for these optimizations in the simulation. The simulation itself provides a platform to analyze different configurations, optimizations and strategies to save power. It describes how these simulation results can be used by computer architects and compiler writers to deliver architectures that are power-efficient as well as performance-driven.

Title: The Filter Cache: An Energy Efficient Memory Structure
Summary: The motivation for this paper is the high power consumption of microprocessor caches, which often occupy a significant area of the chip. The authors propose trading performance for power consumption by adding a very small "filter" cache between the CPU and the L1 cache. This cache exploits the high locality of reference of embedded software, which tends to execute small blocks of code very frequently, eliminating the need for a large cache that will only be partially used at any given time. Because of its smaller size, the filter cache consumes less power and has a faster access time than an L1 cache. However, inserting the filter cache increases the wait time on a cache miss and degrades performance. They show that the decrease in energy consumption outweighs the decrease in performance and yields good savings.
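The trade-off described above can be seen with a back-of-the-envelope model: every access pays for the filter cache, and only filter-cache misses also pay for the L1 cache. This is a rough sketch under simplifying assumptions (L1 always hits, lookups are serial); the function name and all numbers are hypothetical and do not come from the paper.

```python
def filter_cache_cost(hit_rate, e_filter, e_l1, t_filter, t_l1):
    """Average energy and time per access with a filter cache in front of L1.

    Assumes a serial lookup: the filter cache is always probed first, and
    the L1 cache is accessed only on a filter-cache miss.
    """
    miss_rate = 1.0 - hit_rate
    energy = e_filter + miss_rate * e_l1
    time = t_filter + miss_rate * t_l1
    return energy, time


# Hypothetical values, normalized so a plain L1 access costs 1.0 in
# both energy and time.
energy, time = filter_cache_cost(hit_rate=0.8, e_filter=0.2, e_l1=1.0,
                                 t_filter=1.0, t_l1=1.0)
print(f"energy per access: {energy:.2f} (L1 only: 1.00)")
print(f"time per access:   {time:.2f} (L1 only: 1.00)")
```

In this made-up configuration energy per access drops well below the L1-only baseline while average access time rises, which is the direction of the trade the paper argues for.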

Title: Energy and Performance Improvements in Microprocessor Design using a Loop Cache
Summary: This paper takes the premise of a filter cache and improves upon it with the help of a specially designed compiler that tries to keep the most frequently executed instructions in the small cache. The goal of this design is to reduce power consumption without decreasing performance. The authors place a small L-cache between the L1 cache and the CPU to hold blocks of instructions. A trace run of the code determines instruction frequencies. A compiler then analyzes the code and determines which blocks should be placed in the L-cache. These blocks are marked and are the only ones that will be brought into the L-cache. This reduces cache misses because infrequently executed instructions do not take up space in the L-cache, while the most frequently accessed instructions are stored there. They show that in some cases performance can be improved.

Title: Instruction Fetch Energy Reduction Using Loop Caches For Embedded Applications with Small Tight Loops
Summary: The motivation for this paper is to exploit small loops in program execution. A very small cache is placed on chip, close to the CPU, to store currently executing instructions. A controller knows where the next instruction will be serviced from, eliminating any performance degradation. Because of the small size of the loop cache and its close proximity to the CPU, an access to it consumes much less power than an access to the L1 cache. The authors show that this loop cache can significantly reduce accesses to the main cache, which in turn reduces power consumption.

4/29/01 Summaries

An Energy Conscious Methodology for Early Design Exploration of Heterogeneous DSP's. M. Wan, Y. Ichikawa, D. Lidsky and J. Rabaey. Summary: The authors of this paper present a methodology for hardware/software partitioning that is geared towards conserving energy and emphasizes bottleneck detection at a high level of abstraction. The methodology also tries to give an early prediction of the effects of different architecture choices. They start with a basic architecture, which was proposed in the Pleiades project, and the DSP algorithm (preferably written in a subset of C/C++). They then determine the computational kernels within the given algorithm. For each kernel they determine the cost (power, delay, area) of that function, captured in what they call a "macromodel." Given these macromodels, the designer can make algorithm changes and architecture decisions. The paper presents an example of a voice processor to which they applied this methodology, along with the results.

A Low Power Hardware/Software Partitioning Approach for Core-based Embedded Systems. J. Henkel, DAC99. Summary: This paper concentrates on partitioning software and hardware in a manner that maximizes "utilization." The term "utilization" describes the percentage of gate transitions that are necessary for a given calculation. Dr. Henkel also points out that the methodology introduced in this paper focuses on reducing the power of the whole system, rather than focusing on one particular sub-system while ignoring its effects on other sub-systems. The paper contains several rather complex algorithms and ends with some results obtained from experimentation.

Energy-Conscious HW/SW Partitioning of Embedded Systems: A Case Study on an MPEG-2 Encoder. J. Henkel and Yanbin Li, CODES98. Summary: This paper (which pre-dates Dr. Henkel's paper "A Low Power Hardware/Software Partitioning Approach for Core-based Embedded Systems") is a case study of hardware/software partitioning for an MPEG-2 encoder. The authors introduce a way to estimate and optimize power dissipation. The result is substantial energy savings with little or no performance degradation.

5/1/01 Summaries

Title: A2BC: Adaptive Address Bus Coding for Low Power Deep Sub-Micron Designs. Summary: This paper presents new address bus coding methods that take coupling capacitance into consideration as well as base capacitance, according to a physical bus model for power consumption. The authors assign the most active bit lines to the bit lines expected to have the smallest capacitance. The value of a new measure (ETAM), derived from the model, then decides whether or not to invert the bus. The results show power/energy savings of up to 56% compared to Gray code encoding.
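For context on the baseline mentioned above: Gray code is a common address bus encoding because consecutive values differ in exactly one bit, so a sequential address stream toggles only one line per step. The sketch below compares plain binary and Gray-coded sequential addresses. It illustrates the Gray code baseline only, not the A2BC scheme itself, and the helper names are illustrative.

```python
def to_gray(n):
    """Convert a non-negative integer to its reflected binary Gray code."""
    return n ^ (n >> 1)


def bit_transitions(values):
    """Total number of bit lines that toggle across a sequence of bus values."""
    return sum(bin(a ^ b).count("1") for a, b in zip(values, values[1:]))


addresses = list(range(16))  # a purely sequential address stream
binary_toggles = bit_transitions(addresses)
gray_toggles = bit_transitions([to_gray(a) for a in addresses])
print(binary_toggles, gray_toggles)  # prints "26 15"
```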

Title: Synthesis of Low-Overhead Interfaces for Power-Efficient Communication over Wide Buses. Summary: This paper presents an encoding algorithm that reduces the transition activity on system address buses when statistical information about the data is known. The authors also give two approximations of the exact algorithm, as well as an adaptive architecture that requires no a priori knowledge. Results demonstrate that this approach performs better than earlier low-power encoding schemes.

Title: Address Bus Encoding Techniques for System-Level Power Optimization. Summary: Building on the advantages of several previous bus encoding schemes (Bus Invert, T0, etc.), the authors combine these ideas in order to optimize the system's power budget. Various combinations of schemes are developed for architectures with different kinds of address buses. Experiments on a real system show the effectiveness of such encoding.

5/3/01 Summaries

Title: Architectural Power Optimization by Bus Splitting. Summary: By placing tri-state buffers strategically along the internal communication bus of an SOC with multiple modules, power can be saved by decreasing the effective load of the bus. Because no benchmarks existed for testing SOCs, a numerical result was given instead. The problem was shown to be NP-hard, so partitioning the problem into clusters is necessary for larger problem sets. Overall, bus splitting has the potential to decrease power and to increase performance by letting multiple processes occur concurrently, although the latter was not discussed in the paper.

Title: Communication Architecture Tuners: A Methodology for the Design of High-Performance Communication Architectures for System-on-Chips. Summary: An award-winning paper that introduces the new idea of varying the communication protocol parameters to meet the changing demands of a real-time SOC, using additional control circuitry called a CAT (Communication Architecture Tuner). On several real-world problems of this type, the approach was shown to reduce the number of missed deadlines by varying factors, and even to zero in the TCP/IP example.

Title: Bus-Invert Coding for Low-Power I/O. Summary: A widely referenced paper that covers the problem of minimizing I/O bus switching activity for random data accesses. The paper proposes an extra control line that signals inversion: the next value is sent inverted whenever its Hamming distance from the current bus value exceeds half the bus width. This is shown to be the best possible solution for random data.
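The bus-invert idea is compact enough to sketch directly. The version below is a simplified illustration: it compares only the data lines when deciding whether to invert (it does not count the invert line itself in the Hamming distance), and the function names are illustrative rather than taken from the paper.

```python
def hamming(a, b):
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


def bus_invert_encode(words, width):
    """Return a list of (bus_value, invert_bit) pairs for the given words."""
    mask = (1 << width) - 1
    bus = 0  # assume the bus starts at all zeros
    encoded = []
    for word in words:
        word &= mask
        if 2 * hamming(bus, word) > width:
            bus, invert = ~word & mask, 1  # sending the complement toggles fewer lines
        else:
            bus, invert = word, 0
        encoded.append((bus, invert))
    return encoded


def bus_invert_decode(encoded, width):
    """Recover the original words from (bus_value, invert_bit) pairs."""
    mask = (1 << width) - 1
    return [bus ^ mask if invert else bus for bus, invert in encoded]


def toggles(values):
    """Total line toggles for a sequence of bus values, starting from 0."""
    total, prev = 0, 0
    for value in values:
        total += hamming(prev, value)
        prev = value
    return total


words = [0x00, 0xFF, 0x00, 0xFF]  # worst case for an unencoded 8-bit bus
encoded = bus_invert_encode(words, 8)
raw = toggles(words)
# Count the invert line as one extra bus line when measuring the coded bus.
coded = toggles([(bus << 1) | invert for bus, invert in encoded])
print(raw, coded)  # prints "24 3"
```

With this decision rule, at most half of the data lines (plus the invert line) toggle on any transfer, at the cost of one extra wire.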

5/8/01 Summaries

Title: Selective Instruction Compression for Memory Energy Reduction in Embedded Systems. Summary: This paper proposes a selective instruction compression method for memory energy reduction. The main idea is based on the fact that a given embedded program normally uses only a small subset of the instructions. Those instructions are picked out and placed in a table (the IDT) close to the processor, and only the compressed codes are stored in memory. Decompression is performed on the fly between processor and memory, so no changes to the processor architecture are required. The authors provide four possible architectures for the decompression design. The simulation results show that all of them achieve significant power reduction, and some even improve execution performance. This is new, since previous work always assumed a performance degradation.
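The table-based scheme described above can be mimicked in a few lines: put the most frequent instruction words in a small table, store a short index for those, and leave the rest uncompressed. This is only a sketch of the general idea. The one-bit flag, the 8-bit index, the 32-bit instruction word, and all function names are assumptions for illustration, not the paper's actual encoding.

```python
from collections import Counter


def build_table(program, table_size):
    """Pick the most frequent instruction words for the decompression table."""
    return [word for word, _ in Counter(program).most_common(table_size)]


def compress(program, table):
    """Replace table entries with ("C", index); keep others as ("U", word)."""
    index = {word: i for i, word in enumerate(table)}
    return [("C", index[w]) if w in index else ("U", w) for w in program]


def decompress(stream, table):
    """Expand the compressed stream back into full instruction words."""
    return [table[value] if tag == "C" else value for tag, value in stream]


def stored_bits(stream, index_bits=8, word_bits=32):
    """Memory footprint: one flag bit per entry plus an index or a full word."""
    return sum(1 + (index_bits if tag == "C" else word_bits)
               for tag, _ in stream)


# Hypothetical program: a few instruction words dominate.
program = [0xA] * 6 + [0xB] * 3 + [0xC]
table = build_table(program, table_size=2)
stream = compress(program, table)
assert decompress(stream, table) == program
print(stored_bits(stream), "bits compressed vs", 32 * len(program), "bits raw")
```

Fewer bits fetched from memory per instruction is what reduces memory energy; the table lookup stands in for the on-the-fly decompressor between memory and processor.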

Title: A Power Reduction Technique with Object Code Merging for Application Specific Embedded Processors. Summary: The motivation for this paper is to exploit the frequently executed blocks in programs. By merging those sequences of object code into a set of single instructions, a significant energy reduction can be achieved. The merged object code is stored in a decompressor ROM and decoded before running. The authors give an algorithm to find the basic blocks and optimize the energy cost. They also show that further power optimization can be achieved by compressing sequential blocks into one single instruction.

Title: Code Compression for Low Power Embedded System Design
Summary: In this paper the authors propose instruction code compression as a system-level power optimization method. Unlike previous work, they optimize the power consumption of a complete SOC. Sample instructions are divided into four groups, each with its own compression method. They also discuss the bus compaction strategy and the decompressor architecture. Comprehensive experiments suggest that energy/power savings can be achieved at the same or even higher performance.

5/10/01 Summaries

Title: A Low Power Unified Cache Architecture Providing Power and Performance Flexibility. Summary: The focus of this paper is varying the cache subsystem to maximize performance and power savings for a given application. The cache subsystem parameters explored were write policy, way size, store buffers, and push buffers. The paper shows that the optimal parameters differ depending on the program to be executed. To test the effects of varying the cache subsystem, the Powerstone benchmark suite, which consists of embedded and portable applications, was used. Simulations were run at the gate level as well as on silicon.

Title: A Portable Fault-Tolerant Microprocessor Based on the SPARC V8 Architecture. Summary: This paper is basically an introduction to LEON. It explains why the authors developed LEON, the goals for LEON, and the design decisions made. The European Space Agency needed a processor with better performance and lower cost for future space applications. The processors on the market that met some of their metrics posed problems: they did not provide the desired flexibility or availability as a component, had restrictions on usage, etc. Portability and flexibility were the main focus of each design decision. The paper also discusses the error detection and fault tolerance methods that could be employed.

5/15/01 Summaries

Title: Custom-Fit Processors: Letting Applications Define Architectures. Summary: This paper presents a system that automatically designs realistic VLIW architectures highly optimized for one given application. The authors define a cost function with six parameters: the total numbers of ALUs, multipliers, and registers, the number of parallel accesses to L2 memory, the latency, and the number of clusters. Speedups for different combinations of these parameters are shown for different benchmarks. The results show that large speedups can be achieved on color and image processing codes. The speedups for jammed benchmarks are also studied. It is shown that the average speedup is greater for a large partition of one algorithm willing to back off.

Title: Customized Instruction-Sets for Embedded Processors. Summary: This paper discusses five major barriers that could hinder customization: existing binaries, toolchain development and maintenance costs, lost savings/higher chip cost due to the lower volumes of customized processors, added hardware development costs, and some factors related to the product development cycle for embedded products. Potential solutions to each barrier are also presented.

Title: An ASIP Design Methodology for Embedded Systems. Summary: This paper presents a unique architecture and methodology for designing Application-Specific Instruction-Set Processors (ASIPs) in the embedded controller domain by customizing an existing processor instruction set and architecture. The author shows the design flow, identification of new instructions, processor architecture, ASIP implementation, and firmware modification. Two examples are used. With customized coding, cycle counts are reduced by up to 75% and 71%, respectively.



5/17/01 Summaries

Title: Media Architecture: General Purpose vs. Multiple Application-Specific Programmable Processor. Summary: This paper presents a framework that uses production-quality ILP compilers and simulation tools to synthesize a high-performance machine for an application. This makes it possible for a designer to explore the application-specific programmable processor design space under area constraints. The authors briefly introduce the machine model, the benchmarks, the tools and an example set of results, the approach taken in this work, the search problem, and the search strategy and algorithm. Experimental results show that, for a given compiler technology and set of benchmarks, machines with greater area do not always achieve greater speedup.

Title: Automatic Architectural Synthesis of VLIW and EPIC Processors. Summary: This paper presents a synthesis system, PICO_VLIW, which automatically designs the architecture and micro-architecture of VLIW processors and their generalization, EPIC. The authors decompose the process of designing an application-specific VLIW into three inter-related subsystems: Spacewalker, VLIW architecture synthesis, and Elcor. They describe the PICO_VLIW design flow in detail, showing the various design steps involved as well as the dependence relationships among them. Experimental results show that machines with application-specific function units have a better cost/performance ratio than general-purpose machines.


5/22/01 Summaries

Title: System and architecture-level power reduction of microprocessor-based communication and multi-media applications. Summary: The authors of this paper identify and describe some problems related to memory access, which they call data access bottlenecks. They identify the need for novel solutions to deal with memory access and data transfer problems. The novel solutions described in this paper include:
* Processor architecture optimizations
* Improved compiler technology
* Exploring the interface between the system hardware and software
They describe each of these points in detail.

Title: Xtensa: A Configurable and Extensible Processor. Summary: Xtensa is a fully customizable processor core. Xtensa lets the system designer select and size only the features required for a given application. Customers use Xtensa's interface to describe a design, and they receive a processor core and the tools to go with it.


5/29/01 Summaries

Title: Memory aware compilation through accurate timing extraction. Summary: Memory delay is a major bottleneck in embedded systems. Newer memory modules offer efficient access modes. This paper suggests a memory-aware compiler approach that exploits such efficient access modes by extracting accurate timing information. This allows the compiler to perform more global optimizations on the input. The authors' test cases show a 24% improvement on average over conventional methods.

Title: Global Multimedia System Design Exploration using Accurate Memory Organization Feedback. Summary: This paper outlines an approach in which different memory organization design alternatives can be tested using the feedback given by tools that the authors had developed earlier. The effectiveness of the approach is tested on an industrial application. Using this approach, they could explore a substantial part of the design search space in a short design time, resulting in a very cost-efficient solution that conforms to the design constraints.


5/31/01 Summaries

Title: Power: A First-Class Architectural Design Constraint. Summary: Power is a design constraint not only for portable computers and mobile communication devices but also for high-end systems. Based on the power model for CMOS logic, this paper discusses in detail techniques and ideas for reducing power consumption at the logic, architecture, and operating system levels. It also discusses the future challenges facing logic designers, architects, and system builders in continuing to reduce power consumption.
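The dominant term in the usual CMOS power model is switching power, which is proportional to activity, switched capacitance, the square of the supply voltage, and clock frequency. The sketch below uses only that dynamic term and ignores short-circuit and leakage power. The parameter values are made up for illustration.

```python
def dynamic_power(activity, capacitance, vdd, frequency):
    """Switching power in watts: P = A * C * V^2 * f.

    activity    -- fraction of gates switching per cycle (0..1)
    capacitance -- total switched capacitance in farads
    vdd         -- supply voltage in volts
    frequency   -- clock frequency in hertz
    """
    return activity * capacitance * vdd ** 2 * frequency


# Hypothetical chip: halving the supply voltage at the same clock rate.
nominal = dynamic_power(0.5, 1e-9, 2.0, 100e6)
scaled = dynamic_power(0.5, 1e-9, 1.0, 100e6)
print(nominal, scaled)  # the scaled design uses about a quarter of the power
```

The quadratic dependence on supply voltage is why voltage scaling is usually the first lever considered, even though a lower voltage generally also limits the achievable clock frequency.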


6/5/01 Summaries

Title: AMULET3: A 100 MIPS Asynchronous Embedded Processor. Summary: AMULET3 is an asynchronous (clockless) implementation of the 32-bit ARM processor core. AMULET3 shows that asynchronous technology is commercially viable and competitive with clocked designs in terms of performance, area, and power efficiency. The paper discusses how AMULET3 is implemented commercially in the DRACO chip. It covers asynchronous pipelining techniques, power management, performance, design flow, controller synthesis, high-level synthesis, timing verification, and a production test strategy.

Title: Power Management in the Amulet Microprocessors. Summary: This paper discusses power management techniques for the Amulet microprocessors. Some conventional methods are used, as well as methods that exploit asynchronous design. Much of the discussion focuses on the cache and branch predictor, but the most interesting aspect of this processor is that there is no activity unless useful work is being carried out. Power is consumed only when needed. Although Amulet processors are not the fastest, they are very good at doing nothing.






Other papers of interest:
Influence of Compiler Optimizations on System Power. M. Kandemir, N. Vijaykrishnan, M.J. Irwin and W. Ye, DAC'00.
Designing Systems-on-a-Chip Using Cores. R. Bergamaschi, W. Lee, DAC'00.
The Energy Efficiency of IRAM Architectures. Fromm, R.; Perissakis, S.; Cardwell, N.; Kozyrakis, C.; McGaughy, B.; Patterson, D.; Anderson, T.; Yelick, K.
The Design of a Low Energy FPGA. V. George, H. Zhang and J. Rabaey.