Lab 9: Power Estimation

Introduction

In embedded systems research there are several metrics we focus on: size, performance, and power. Systems such as mobile phones, PDAs, and laptops have a limited power supply, so power consumption is of great concern. To see whether our research efforts have any effect, we must be able to measure the difference in power consumption between the original design and the modified one.

There are several ways to obtain the power consumption of a given design, and a variety of tools are available, such as Synopsys, Wattch, SimplePower, and Spice. These tools range from estimating power based on instruction traces to obtaining power from the layout. As you can imagine, there are tradeoffs among power estimation tools as well. An instruction trace is easier and quicker to obtain than a full layout design. However, power estimation using the layout is more accurate than instruction-level estimation, because at the layout level you know exactly where the transistors are located, how many wires exist, and how long the wires are. On the other hand, it takes more time and requires expensive tools to synthesize your design to the layout level, and it takes much longer to simulate a design at that level.

PowerStone Benchmark Suite

How do you compare the modification you made with the modifications made by another research group? If each research group comes up with its own set of experiments, the results cannot be compared. Instead, there needs to be a common set of tests, so that results can be compared between research groups and so that we can test how a given modification performs in a particular class of applications. PowerStone is a group of programs (also called benchmarks) that represent various embedded applications. By using this suite, research groups can compare their results with each other fairly and more accurately.

Benchmark   Description
bcnt        Bit Manipulation
binary      Binary Insertion
crc         Cyclic Redundancy Check
matmul      Matrix Multiplication
summin      Handwriting Recognition

Obtaining Power Consumption of an i8051 VHDL Core

The 8051 is an 8-bit microprocessor originally designed in the 1980s by Intel that has gained great popularity since its introduction. Its standard form includes several on-chip peripherals, including timers, counters, and UARTs, plus 4 Kbytes of on-chip program memory and 128 bytes (note: bytes, not Kbytes) of data memory, making single-chip implementations possible. Its hundreds of derivatives, manufactured by several different companies (such as Philips), include even more on-chip peripherals, such as analog-to-digital converters, pulse-width modulators, and I2C bus interfaces. Costing only a few dollars per IC, the 8051 is estimated to be used in a large percentage (perhaps half) of all embedded system products.

The 8051 memory architecture includes 128 bytes of data memory that are directly accessible by its instructions. A 16-byte segment of this 128-byte memory block (addresses 20h through 2Fh) is bit addressable by a subset of the 8051 instructions, namely the bit instructions. External memory of up to 64 Kbytes is accessible through a special "movx" instruction. Up to 4 Kbytes of program instructions can be stored in the internal memory of the 8051, or the 8051 can be configured to use up to 64 Kbytes of external program memory. The majority of the 8051's instructions execute within 12 clock cycles.
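
As a quick worked example of the 12-clock figure above, the sketch below converts clock cycles into execution time. The 12 MHz oscillator frequency and the function name are assumptions chosen for round numbers; this lab does not specify a clock frequency.

```python
# Sketch: time taken by an 8051 instruction, assuming 12 clocks per machine cycle.
CLOCKS_PER_MACHINE_CYCLE = 12  # from the text: most instructions take 12 clocks


def instruction_time_us(clock_hz: float, machine_cycles: int = 1) -> float:
    """Return the execution time in microseconds for one instruction."""
    clocks = CLOCKS_PER_MACHINE_CYCLE * machine_cycles
    return clocks * 1e6 / clock_hz


if __name__ == "__main__":
    assumed_clock_hz = 12e6  # hypothetical oscillator frequency
    print("1 machine cycle :", instruction_time_us(assumed_clock_hz), "us")
    print("2 machine cycles:", instruction_time_us(assumed_clock_hz, 2), "us")
```

At the assumed 12 MHz, an instruction that takes one machine cycle completes in one microsecond, which is a convenient baseline when converting cycle counts into time.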

Suppose we want to see whether we can modify this 8051 core to consume less power. The first thing we would do is measure the power the original core consumes while executing a given program. If we were using Synopsys to obtain a gate-level power estimate, we would perform the following steps:

Behavioral Synthesis and Simulation

  1. Analyze each of the source files starting with the innermost entities.
  2. Get your C test program ready to simulate (i.e., convert your C file to a VHDL ROM model).
  3. We use the Synopsys VHDL Debugger to simulate the design, outputting a waveform to verify functionality is correct.





Gate-Level Synthesis and Simulation

  1. Now that we know the code works at the behavioral level, we synthesize the source code to gate level.
  2. We then analyze the gate-level files, starting with the innermost entities.
  3. We once again use the Synopsys VHDL Debugger to simulate the gate-level design, outputting a waveform to verify functionality is correct.

Generate a Toggle File

  1. The toggle information that we collect tells us when each signal and net in our design changes from 1 to 0 or from 0 to 1. Since most power is consumed when a net switches, or toggles, we can use this information to estimate the power of our design.
  2. We create a file called run.scr, which indicates how long the design needs to be simulated.
  3. We can now simulate the design using vhdlsim. The output of the simulation is a file containing the toggle information for every net in the design.
  4. We convert the file using the sim2dp command to obtain a file readable by dc_shell.
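
To make the idea of toggle information concrete, here is a minimal sketch that counts transitions per net from sampled 0/1 values. The net names and sample data are hypothetical, and the in-memory dictionary is only an illustration; it is not the file format produced by vhdlsim or sim2dp.

```python
# Sketch: count how often each net switches between 0 and 1.


def count_toggles(samples):
    """Count 0->1 and 1->0 transitions in a sequence of 0/1 samples."""
    return sum(1 for prev, cur in zip(samples, samples[1:]) if prev != cur)


def toggle_report(trace):
    """Map each net name to its toggle count (trace: net name -> samples)."""
    return {net: count_toggles(samples) for net, samples in trace.items()}


if __name__ == "__main__":
    # Hypothetical nets, one sample per simulation time step.
    trace = {
        "clk":   [0, 1, 0, 1, 0, 1, 0, 1],
        "alu_a": [0, 0, 1, 1, 1, 0, 0, 0],
        "rst":   [1, 0, 0, 0, 0, 0, 0, 0],
    }
    for net, toggles in toggle_report(trace).items():
        print(f"{net}: {toggles} toggles")
```

Nets that switch often, such as the clock, dominate the counts, which is why they tend to dominate switching power as well.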

Power Analysis

  1. The actual power analysis is performed using dc_shell.
  2. After starting dc_shell, we read in our synthesized design.
  3. If there is a clock in our design, we must create a clock in dc_shell for use in power analysis. The testbench simulated a clock with a given frequency, so the clock created within dc_shell needs to match it.
  4. We now use the information gathered from simulation by including the toggle file generated previously.
  5. We are now ready to perform the actual power analysis. This is done with the "report_power -analysis_effort high" command.
  6. At the end of the output from this command, dc_shell reports the power used by our design.
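
The link between toggle counts and power can be sketched with the standard CMOS switching-power relation, in which each transition of a net dissipates roughly half of C * Vdd^2. The code below is a simplified illustration under that assumption only: it ignores leakage and short-circuit power, the function names and net data are hypothetical, and it is not a description of what dc_shell computes internally.

```python
# Sketch: switching power from toggle counts, P = 0.5 * C * Vdd^2 * toggle_rate.


def switching_power_watts(toggles, sim_time_s, cap_farads, vdd_volts):
    """Estimate switching power of one net from its toggle count."""
    toggle_rate = toggles / sim_time_s  # transitions per second
    return 0.5 * cap_farads * vdd_volts ** 2 * toggle_rate


def total_switching_power(nets, sim_time_s, vdd_volts):
    """Sum switching power over nets (nets: name -> (toggles, capacitance))."""
    return sum(
        switching_power_watts(toggles, sim_time_s, cap, vdd_volts)
        for toggles, cap in nets.values()
    )


if __name__ == "__main__":
    # Hypothetical toggle counts and net capacitances, for illustration only.
    nets = {
        "clk":   (2000, 50e-15),
        "alu_a": (350, 20e-15),
    }
    power = total_switching_power(nets, sim_time_s=1e-6, vdd_volts=3.3)
    print(f"Estimated switching power: {power * 1e3:.3f} mW")
```

The clock created for power analysis matters for the same reason the simulated time appears here: the same number of toggles spread over a longer time corresponds to lower power.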

We have gone through the simplified, high-level steps required to obtain the power of a design using Synopsys (for more detailed steps on synthesis and simulation, refer to the dalton page). For the i8051 code, it takes only a couple of minutes to synthesize and simulate a behavioral-level design. However, synthesizing the i8051 source code to gate level can take up to an hour. Simulating a gate-level design can take anywhere from an hour to a week (sometimes more), depending on the code we are trying to simulate. The power analysis can take just as long.

What happens when we want to run 10 benchmarks using anywhere from one to a dozen different i8051 configurations? We would end up waiting weeks or even months to obtain power data! Is there any way we could obtain power data in a fraction of the time? Remember, the lower we go (from behavior to layout), the longer it takes to simulate the design. Therefore, if we could estimate the power consumption at a higher level, we could obtain power data in seconds instead of hours. The problem with estimating power at a higher level is that we lose accuracy.
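
The arithmetic behind that wait can be sketched as follows, using the figures from the text (10 benchmarks, up to 12 configurations, and between one hour and one week per gate-level simulation). The sketch assumes the runs are executed one after another and counts simulation time only; the function name is hypothetical.

```python
# Sketch: total gate-level simulation time for a benchmark sweep.
HOURS_PER_DAY = 24
HOURS_PER_WEEK = 7 * HOURS_PER_DAY


def total_run_hours(benchmarks, configurations, hours_per_run):
    """Total hours when every benchmark runs on every configuration in sequence."""
    return benchmarks * configurations * hours_per_run


if __name__ == "__main__":
    best = total_run_hours(10, 12, 1)                # one hour per run
    worst = total_run_hours(10, 12, HOURS_PER_WEEK)  # one week per run
    print(f"Best case : {best} hours ({best / HOURS_PER_DAY} days)")
    print(f"Worst case: {worst} hours ({worst / HOURS_PER_WEEK} weeks)")
```

Even the best case works out to five days of back-to-back simulation, and the worst case to 120 weeks.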

Power Estimation Tool

In this lab you will write a high-level power estimation tool. Since we have limited resources and time, the estimation tool will determine how much power an ALU would consume as opposed to a microprocessor core such as the i8051. You will be given:

Your job is to implement a high-level methodology for estimating power consumption while trying to preserve accuracy. Then, using your power estimation tool, measure the power the ALU consumes for each of these benchmarks.

In previous labs, we gave you an introduction to SimpleScalar, an 8051 simulator, and Synopsys. You can use these tools, or any others you can find, to assist you in developing your power estimation tool. Since these simulation tools are processor dependent, you may choose whether the ALU will be a component of an 8051 or a MIPS processor.
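
One possible shape for such a tool, shown only as a starting point, is an instruction-level model: count the ALU operations in an instruction trace and weight each by a per-operation energy cost. Everything specific below is an assumption made for illustration: the operation names, the energy values (which you would have to characterize yourself, for example from gate-level runs), the trace format, and the function names.

```python
# Sketch: instruction-level ALU energy and power estimate from a trace.
from collections import Counter

# Hypothetical energy per ALU operation in picojoules (placeholder values).
ENERGY_PJ = {"add": 4.0, "sub": 4.0, "and": 2.0, "or": 2.0, "shift": 3.0, "mul": 20.0}


def count_alu_ops(trace):
    """Count the ALU operations in a trace, ignoring non-ALU instructions."""
    return Counter(op for op in trace if op in ENERGY_PJ)


def estimate_energy_pj(trace):
    """Total ALU energy: sum of (operation count * energy per operation)."""
    return sum(ENERGY_PJ[op] * n for op, n in count_alu_ops(trace).items())


def estimate_power_uw(trace, clock_hz, cycles_per_instruction=1):
    """Average ALU power in microwatts over the whole trace."""
    if not trace:
        raise ValueError("trace must contain at least one instruction")
    energy_j = estimate_energy_pj(trace) * 1e-12
    time_s = len(trace) * cycles_per_instruction / clock_hz
    return energy_j / time_s * 1e6


if __name__ == "__main__":
    # Hypothetical trace; a real one would come from your simulator's output.
    trace = ["add", "add", "mul", "nop"]
    print("Energy:", estimate_energy_pj(trace), "pJ")
    print("Power :", estimate_power_uw(trace, clock_hz=1e6), "uW")
```

A model like this runs in seconds because it never simulates individual nets. Its accuracy depends entirely on how well the per-operation costs reflect the real ALU, and a fixed cost per operation ignores the data-dependent switching that gate-level analysis captures.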

Lab Questions

In your lab report, be sure to discuss the following:

  • Discuss the high level strategy behind your power estimation tool. Explain why your strategy works.
  • Discuss the implementation details. What tools did you use (e.g., SimpleScalar, Keil, an 8051 simulator)? What code did you have to write? What does it do?
  • Is your estimation tool scalable to the processor level? System level? If so, how would you scale it?
  • Be sure to include the power measurements for each of the benchmarks.


CS122B, Winter 2002