

### **Digital Design**

### Chapter 5: Register-Transfer Level (RTL) Design

Slides to accompany the textbook *Digital Design*, First Edition, by Frank Vahid, John Wiley and Sons Publishers, 2007. http://www.ddvahid.com

### Copyright © 2007 Frank Vahid

Instructors of courses requiring Vahid's Digital Design textbook (published by John Wiley and Sons) have permission to modify and use these slides for customary course-related activities, subject to keeping this copyright notice in place and unmodified. These slides may be posted as <u>unanimated pdf</u> versions on publicly-accessible course websites. PowerPoint source (or pdf with animations) may not be posted to publicly-accessible websites, but may be posted for students on internal protected sites or distributed directly to students by other electronic means. Instructors may make printouts of the slides available to students for a reasonable photocopying charge, without incurring royalties. Any other use requires explicit permission. Instructors may obtain PowerPoint source or obtain special use permissions from Wiley; see <a href="http://www.ddvahid.com">http://www.ddvahid.com</a> for information.



### RTL Design: Capture Behavior, Convert to Circuit

### Recall

- Chapter 2: Combinational Logic Design
  - First step: Capture behavior (using equation or truth table)
  - Remaining steps: Convert to circuit
- Chapter 3: Sequential Logic Design
  - First step: Capture behavior (using FSM)
  - Remaining steps: Convert to circuit
- RTL Design (the method for creating custom processors)
  - First step: Capture behavior (using high-level state machine, to be introduced)
  - Remaining steps: Convert to circuit






5.2

## RTL Design Method

|             | Step                                  | Description                                                                                                                                                                                                                                                                            |
|-------------|---------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Step 1      | Capture a high-level<br>state machine | Describe the system's desired behavior as a high-level state machine. The state machine consists of states and transitions. The state machine is "high-level" because the transition conditions and the state actions are more than just Boolean operations on bit inputs and outputs. |
| Step 2      | Create a datapath                     | Create a datapath to carry out the data operations of the high-level state machine.                                                                                                                                                                                                    |
| Step 3 Step | Connect the datapath to a controller  | Connect the datapath to a controller block. Connect external Boolean inputs and outputs to the controller block.                                                                                                                                                                       |
| Step 4      | Derive the controller's FSM           | Convert the high-level state machine to a finite-state machine (FSM) for the controller, by replacing data operations with setting and reading of control signals to and from the datapath.                                                                                            |















### Step 1: Create a High-Level State Machine

- Let's consider each step of the RTL design process in more detail
- Step 1
  - Soda dispenser example
  - Not an FSM because:
    - Multi-bit (data) inputs a and s
    - Local register tot
    - Data operations *tot=0*, *tot<s*, *tot=tot+a*.
  - Useful high-level state machine:
    - Data types beyond just bits
    - Local registers
    - Arithmetic equations/expressions



Inputs: c (bit), a (8 bits), s (8 bits)
Outputs: d (bit)
Local registers: tot (8 bits)
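The soda dispenser behavior can be sketched in Python as a cycle-by-cycle simulation. Only the declarations (c, a, s, d, tot) and the data operations (tot=0, tot<s, tot=tot+a) come from the slide; the state names `Init`, `Wait`, `Add`, `Disp` and the transition structure are assumptions made for illustration, since the state diagram itself is not reproduced here.

```python
def run_soda_dispenser(inputs, s):
    """Simulate the high-level state machine one clock cycle at a time.

    inputs: list of (c, a) pairs, one per cycle (c = coin detected, a = coin value).
    s: cost of a soda. Returns the value of output d in each cycle.
    State names and transitions are assumed, not taken from the slide.
    """
    state, tot = "Init", 0
    d_trace = []
    for c, a in inputs:
        d = 0
        next_state, next_tot = state, tot
        if state == "Init":
            next_tot = 0                      # tot = 0
            next_state = "Wait"
        elif state == "Wait":
            if c:
                next_state = "Add"
            elif tot < s:                     # not enough money yet
                next_state = "Wait"
            else:
                next_state = "Disp"
        elif state == "Add":
            next_tot = (tot + a) & 0xFF       # tot = tot + a, 8-bit register
            next_state = "Wait"
        elif state == "Disp":
            d = 1                             # dispense
            next_state = "Init"
        d_trace.append(d)
        state, tot = next_state, next_tot     # registers update on the clock edge
    return d_trace


# Two 25-cent coins against a 50-cent price; a is held for one extra cycle.
example_inputs = [(0, 0), (1, 25), (0, 25), (1, 25), (0, 25), (0, 0), (0, 0), (0, 0)]
example_trace = run_soda_dispenser(example_inputs, s=50)
```

The multi-bit comparison and addition inside the transitions and actions are exactly what make this a high-level state machine rather than a plain FSM.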




### Step 1 Example: Laser-Based Distance Measurer



- Example of how to create a high-level state machine to describe desired processor behavior
- Laser-based distance measurement: pulse laser, measure time T to sense reflection
  - Laser light travels at speed of light, 3\*10<sup>8</sup> m/sec
  - Distance is thus $D = (T \text{ sec} \times 3 \times 10^8 \text{ m/sec}) / 2$
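As a quick check of the formula, a minimal Python sketch follows. The function name and the sample round-trip time are hypothetical; the speed of light value is the one used on the slide.

```python
SPEED_OF_LIGHT_M_PER_S = 3e8  # approximation used on the slide


def distance_m(round_trip_time_s):
    """Distance to the object: the light covers the distance twice, so divide by 2."""
    return round_trip_time_s * SPEED_OF_LIGHT_M_PER_S / 2


# A reflection sensed 2 microseconds after the pulse means the object is about 300 m away.
example_distance = distance_m(2e-6)
```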



## Step 1 Example: Laser-Based Distance Measurer



- Inputs/outputs
  - B: bit input, from button to begin measurement
  - L: bit output, activates laser
  - S: bit input, senses laser reflection
  - D: 16-bit output, displays computed distance




### Step 1 Example: Laser-Based Distance Measurer



[Figure: laser-based distance measurer block, with input B from the button, an output to the laser, and input S from the sensor]

- Step 1: Create high-level state machine
- Begin by declaring inputs and outputs
- Create initial state, name it S0
  - Initialize laser to off (L=0)
  - Initialize displayed distance to 0 (D=0)



## Step 1 Example: Laser-Based Distance Measurer

Inputs: B, S (1 bit each)
Outputs: L (bit), D (16 bits)





- Add another state, call S1, that waits for a button press
  - B': stay in **S1**, keep waiting
  - B: go to a new state **S2**

Q: What should S2 do? A: Turn on the laser




### Step 1 Example: Laser-Based Distance Measurer


Inputs: B, S (1 bit each)
Outputs: L (bit), D (16 bits)



- Add a state **S2** that turns on the laser (L=1)
- Then turn off laser (L=0) in a state S3

Q: What to do next? A: Start timer, wait to sense reflection



### Step 1 Example: Laser-Based Distance Measurer

Inputs: B, S (1 bit each)
Outputs: L (bit), D (16 bits)
Local registers: Dctr (16 bits)

[State diagram: S0 (L = 0, D = 0); S1 (Dctr = 0, reset cycle count); S2 (L = 1); S3 (L = 0, Dctr = Dctr + 1, count cycles), which loops on S' (no reflection) and exits on S (reflection)]

- Stay in **S3** until sense reflection (S)
- To measure time, count cycles for which we are in **S3**
  - To count, declare local register Dctr
  - Increment Dctr each cycle in **S3**
  - Initialize Dctr to 0 in **S1**. **S2** would have been O.K. too

Digital Design Copyright © 2006 Frank Vahid
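The states described so far (S0 through S3) can be simulated with a short Python sketch. It stops when the reflection is sensed and reports the cycle count in Dctr; what the machine does after S3 is not modeled here. Treating L as 0 in S1 and the function name `run_measurer` are assumptions for illustration.

```python
def run_measurer(inputs):
    """Simulate states S0..S3 one clock cycle at a time.

    inputs: list of (B, S) bit pairs, one per cycle.
    Returns (trace of L per cycle, Dctr when the reflection is sensed or None).
    """
    state, dctr = "S0", 0
    l_trace = []
    for b, s in inputs:
        next_state, next_dctr = state, dctr
        if state == "S0":
            l = 0                                # laser off (D = 0 is also set here)
            next_state = "S1"
        elif state == "S1":
            l = 0                                # assumed: laser stays off while waiting
            next_dctr = 0                        # reset cycle count
            next_state = "S2" if b else "S1"     # wait for button press
        elif state == "S2":
            l = 1                                # laser on for one cycle
            next_state = "S3"
        else:                                    # S3
            l = 0
            next_dctr = (dctr + 1) & 0xFFFF      # count cycles, 16-bit register
            if s:                                # reflection sensed
                l_trace.append(l)
                return l_trace, next_dctr
        l_trace.append(l)
        state, dctr = next_state, next_dctr      # registers update on the clock edge
    return l_trace, None


# Button pressed in the third cycle, reflection sensed in the third cycle spent in S3.
example_l, example_count = run_measurer(
    [(0, 0), (0, 0), (1, 0), (0, 0), (0, 0), (0, 0), (0, 1)]
)
```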



### Step 2: Create a Datapath

- Datapath must
  - Implement data storage
  - Implement data computations
- Look at high-level state machine, do three substeps
  - (a) Make data inputs/outputs be datapath inputs/outputs
  - (b) Instantiate declared registers into the datapath (also instantiate a register for each data output)
  - (c) Examine every state and transition, and instantiate datapath components and connections to implement any data computations

*Instantiate*: to introduce a new component into a design.



















- We'll use several more examples to illustrate RTL design
- Example: Bus interface
  - Master processor can read register from any peripheral
    - Each register has unique 4-bit address
    - Assume 1 register per peripheral
  - Sets rd=1, A=address
  - Appropriate peripheral places register data on 32-bit D lines
    - Periph's address provided on Faddr inputs (maybe from DIP switches, or another register)





5.3

### RTL Example: Bus Interface

Inputs: rd (bit); Q (32 bits); A, Faddr (4 bits)
Outputs: D (32 bits)
Local register: Q1 (32 bits)

[State diagram: WaitMyAddress (D = "Z", Q1 = Q) goes to SendData (D = Q1) when (A = Faddr) and rd]

- Step 1: Create high-level state machine
  - State WaitMyAddress
    - Output "nothing" ("Z") on D, store peripheral's register value Q into local register Q1
    - Wait until this peripheral's address is seen (A=Faddr) and rd=1
  - State SendData
    - Output Q1 onto D, wait for rd=0 (meaning main processor is done reading the D lines)
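The two states above can be sketched in Python as a clocked model. The class name, the use of `None` to stand for the high-impedance value "Z", and the return to WaitMyAddress once rd=0 are assumptions for illustration; the signal names and state actions follow the slide.

```python
Z = None  # stands in for the high-impedance "Z" value


class BusInterface:
    def __init__(self, faddr):
        self.faddr = faddr & 0xF          # this peripheral's 4-bit address
        self.state = "WaitMyAddress"
        self.q1 = 0                       # local 32-bit register Q1

    def clock(self, rd, a, q):
        """Advance one clock cycle and return what is driven on D in this cycle."""
        if self.state == "WaitMyAddress":
            d = Z
            next_q1 = q & 0xFFFFFFFF      # Q1 = Q
            match = bool(rd) and a == self.faddr
            next_state = "SendData" if match else "WaitMyAddress"
        else:                             # SendData
            d = self.q1                   # D = Q1
            next_q1 = self.q1
            # Assumed: go back to waiting once the master drops rd.
            next_state = "SendData" if rd else "WaitMyAddress"
        self.state, self.q1 = next_state, next_q1
        return d


periph = BusInterface(faddr=5)
example_d = [
    periph.clock(rd=0, a=0, q=111),   # idle
    periph.clock(rd=1, a=5, q=222),   # address matches, Q captured
    periph.clock(rd=1, a=5, q=333),   # sending captured value
    periph.clock(rd=0, a=5, q=333),   # master done reading
    periph.clock(rd=0, a=0, q=333),   # back to waiting
]
```

Capturing Q into Q1 before driving D keeps the value on the bus stable even if the peripheral's register changes during the read.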











# RTL Example: Video Compression – Sum of Absolute Differences



Each is a pixel, assume represented as 1 byte (actually, a color picture might have 3 bytes per pixel, for intensity of red, green, and blue components of pixel)

- Need to quickly determine whether two frames are similar enough to just send difference for second frame
  - Compare corresponding 16x16 "blocks"
    - Treat 16x16 block as 256-byte array
  - Compute the absolute value of the difference of each array item
  - Sum those differences: if above a threshold, send complete frame for second frame; if below, can use difference method (using another technique, not described)




# RTL Example: Video Compression – Sum of Absolute Differences



- Want fast sum-of-absolute-differences (SAD) component
  - When go=1, sums the absolute differences of element pairs in arrays A and B, outputs that sum
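For reference, the computation the SAD component performs can be written in a few lines of Python. This is a behavioral sketch only, not the hardware design; the function name `sad` and the sample blocks are illustrative.

```python
def sad(a, b):
    """Sum of absolute differences of corresponding elements of two blocks."""
    if len(a) != len(b):
        raise ValueError("blocks must have the same number of elements")
    return sum(abs(x - y) for x, y in zip(a, b))


# Two 16x16 blocks treated as 256-byte arrays, differing by 2 in every pixel.
block_a = [10] * 256
block_b = [12] * 256
example_sad = sad(block_a, block_b)
```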













- Common pitfall: Assuming register is updated in the state it's written
  - Final value of Q?
  - Final state?
  - Answers may surprise you
    - Value of Q unknown
    - Final state is C, not D
  - Why?
    - State A: R=99 and Q=R happen simultaneously
    - State B: R not updated with R+1 until next clock cycle, simultaneously with state register being updated
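The pitfall can be reproduced with a small Python simulation in which every register write takes effect only at the next clock edge. The transition condition out of state B (go to C if R<100, otherwise D) is an assumption, since the state diagram is not reproduced here; `None` models an unknown register value.

```python
def run_pitfall(cycles=4):
    """Simulate states A and B with registers that update on the clock edge."""
    state, r, q = "A", None, None          # register contents unknown at start
    for _ in range(cycles):
        next_state, next_r, next_q = state, r, q
        if state == "A":
            next_r = 99                    # R = 99
            next_q = r                     # Q = R reads the OLD R, which is unknown
            next_state = "B"
        elif state == "B":
            next_r = r + 1                 # R = R + 1, visible only next cycle
            # Assumed condition: it sees R == 99, not 100.
            next_state = "C" if r < 100 else "D"
        # States C and D simply hold in this sketch.
        state, r, q = next_state, next_r, next_q   # clock edge
    return state, r, q


final_state, final_r, final_q = run_pitfall()
```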






### RTL Design Pitfalls and Good Practice

- Solutions
  - Read register in following state (Q=R)
  - Insert extra state so that conditions use updated value
  - Other solutions are possible, depending on the example






- Common pitfall: Reading outputs
  - Outputs can only be written
  - Solution: Introduce additional register, which can be written and read

Inputs: A, B (8 bits)
Outputs: P (8 bits)

Inputs: A, B (8 bits)
Outputs: P (8 bits)
Local register: R (8 bits)






### RTL Design Pitfalls and Good Practice

- Good practice: Register all data outputs
  - In fig (a), output P would show spurious values as addition computes
    - Furthermore, longest register-to-register path, which determines clock period, is not known until that output is connected to another component
  - In fig (b), spurious outputs reduced, and longest register-to-register path is clear







### Control vs. Data Dominated RTL Design

- Designs often categorized as control-dominated or data-dominated
  - Control-dominated design: controller contains most of the complexity
  - Data-dominated design: datapath contains most of the complexity
  - General, descriptive terms: no hard rule separates the two types of designs
  - Laser-based distance measurer: control-dominated
  - Bus interface, SAD circuit: mix of control and data
  - Now let's do a data-dominated design




## Data Dominated RTL Design Example: FIR Filter

- Filter concept
  - Suppose X is data from a temperature sensor, and particular input sequence is 180, 180, 181, 240, 180, 181 (one per clock cycle)
  - That 240 is probably wrong!
    - Could be electrical noise
  - Filter should remove such noise in its output Y
  - Simple filter: Output average of last N values
    - Small N: less filtering
    - Large N: more filtering, but less sharp output
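The effect of N can be seen with a minimal Python sketch of the averaging filter applied to the sequence above. Averaging over fewer than N values while the window fills is an assumption made to keep the example short.

```python
def moving_average(xs, n):
    """Output the average of the last n inputs (fewer while the window fills)."""
    window = []
    ys = []
    for x in xs:
        window.append(x)
        if len(window) > n:
            window.pop(0)              # drop the oldest value
        ys.append(sum(window) / len(window))
    return ys


readings = [180, 180, 181, 240, 180, 181]
peak_n2 = max(moving_average(readings, 2))   # small N: spike still prominent
peak_n4 = max(moving_average(readings, 4))   # large N: spike spread out and reduced
```

With N=2 the noisy 240 still pushes the output above 210, while N=4 holds it under 196, at the cost of smearing the disturbance over more cycles.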





### Data Dominated RTL Design Example: FIR Filter

- FIR filter
  - "Finite Impulse Response"
  - Simply a configurable weighted sum of past input values
  - y(t) = c0\*x(t) + c1\*x(t-1) + c2\*x(t-2)
    - Above known as "3 tap"
    - Tens of taps more common
    - Very general filter: user sets the constants (c0, c1, c2) to define specific filter
  - RTL design
    - Step 1: Create high-level state machine
      - But there really is none! Data dominated indeed.
    - Go straight to step 2






## Data Dominated RTL Design Example: FIR Filter

- Step 2: Create datapath
  - Begin by creating chain of xt registers to hold past values of X



y(t) = c0\*x(t) + c1\*x(t-1) + c2\*x(t-2)

Suppose sequence is: 180, 181, 240
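A behavioral Python sketch of the 3-tap filter with its register chain is shown below. The register names `xt0`, `xt1`, `xt2` and the assumption that they start at 0 are illustrative choices; the output is computed combinationally from the registers, which is a simplification.

```python
def fir3(xs, c0, c1, c2):
    """3-tap FIR: y(t) = c0*x(t) + c1*x(t-1) + c2*x(t-2)."""
    xt0 = xt1 = xt2 = 0                   # assumed cleared at start
    ys = []
    for x in xs:
        # Clock edge: each value shifts one register down the chain.
        xt0, xt1, xt2 = x, xt0, xt1
        ys.append(c0 * xt0 + c1 * xt1 + c2 * xt2)
    return ys


example_y = fir3([180, 181, 240], c0=1, c1=1, c2=1)
```

The tuple assignment mirrors the hardware: all three registers load on the same clock edge, so each one receives its neighbor's old value.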





### Data Dominated RTL Design Example: FIR Filter

- Step 2: Create datapath (cont.)
  - Instantiate registers for c0, c1, c2
  - Instantiate multipliers to compute c\*x values

y(t) = c0\*x(t) + c1\*x(t-1) + c2\*x(t-2)




# Data Dominated RTL Design Example: FIR Filter

- Step 2: Create datapath (cont.)
  - Add circuitry to allow loading of particular c register

y(t) = c0\*x(t) + c1\*x(t-1) + c2\*x(t-2)



## Data Dominated RTL Design Example: FIR Filter

y(t) = c0\*x(t) + c1\*x(t-1) + c2\*x(t-2)

- Step 3 & 4: Connect to controller, Create FSM
  - No controller needed
  - Extreme data-dominated example
  - (Example of an extreme control-dominated design: an FSM, with no datapath)
- Comparing the FIR circuit to a software implementation
  - Circuit
    - Assume adder has 2-gate delay, multiplier has 20-gate delay
    - Longest path goes through one multiplier and two adders
      - 20 + 2 + 2 = 24-gate delay
    - 100-tap filter, following design on previous slide, would have about a 34-gate delay: 1 multiplier and 7 adders on longest path
  - Software
    - 100-tap filter: 100 multiplications, 100 additions. Say 2 instructions per multiplication, 2 per addition. Say 10-gate delay per instruction.
    - (100\*2 + 100\*2)\*10 = 4000 gate delays
  - Circuit is more than 100 times faster (10,000% faster). Wow.
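The delay figures can be checked with a few lines of Python. The sketch assumes the products are summed by a balanced adder tree, which is one way to arrive at 7 adders on the longest path of a 100-tap filter; the function names and parameter names are illustrative.

```python
import math

ADDER_DELAY = 2        # gate delays per adder (from the slide)
MULTIPLIER_DELAY = 20  # gate delays per multiplier (from the slide)


def circuit_delay(taps):
    """Longest path: one multiplier, then the levels of an assumed adder tree."""
    adder_levels = math.ceil(math.log2(taps))
    return MULTIPLIER_DELAY + adder_levels * ADDER_DELAY


def software_delay(taps, instr_per_mult=2, instr_per_add=2, gate_delays_per_instr=10):
    """Sequential execution: every multiplication and addition costs instructions."""
    return (taps * instr_per_mult + taps * instr_per_add) * gate_delays_per_instr


speedup_100_taps = software_delay(100) / circuit_delay(100)   # roughly 118
```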



## **Determining Clock Frequency**

- Designers of digital circuits often want fastest performance
  - Means want high clock frequency
- Frequency limited by longest register-to-register delay
  - Known as critical path
  - If clock is any faster, incorrect data may be stored into register
  - Longest path on right is 2 ns
    - Ignoring wire delays, and register setup and hold times, for simplicity






5.4

### Critical Path

- Example shows four paths
  - a to c through +: 2 ns
  - a to d through + and \*: 7 ns
  - b to d through + and \*: 7 ns
  - b to d through \*: 5 ns
- Longest path is thus 7 ns
- Fastest frequency
  - 1 / 7 ns ≈ 142 MHz
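The same calculation as a short Python sketch: take the longest register-to-register path and invert it. The dictionary keys simply label the four paths from the example; wire delays and setup/hold times are ignored here, as on the slide.

```python
path_delays_ns = {
    "a to c through +": 2,
    "a to d through + and *": 7,
    "b to d through + and *": 7,
    "b to d through *": 5,
}

critical_path_ns = max(path_delays_ns.values())
# A period of 1 ns corresponds to 1000 MHz, so f (MHz) = 1000 / period (ns).
max_frequency_mhz = 1000 / critical_path_ns
```

The result is about 142.86 MHz, which the slide rounds down to 142 MHz; rounding down is the safe direction, since a faster clock would violate the critical path.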





### Critical Path Considering Wire Delays

- Real wires have delay too
  - Must include in critical path
- Example shows two paths
  - Each is 0.5 + 2 + 0.5 = 3 ns
- Trend
  - 1980s/1990s: Wire delays were tiny compared to logic delays
  - But wire delays not shrinking as fast as logic delays
    - Wire delays may even be greater than logic delays!
- Must also consider register setup and hold times, which also add to the path
- Then add some time to the computed path, just to be safe
  - e.g., if path is 3 ns, say 4 ns instead






# A Circuit May Have Numerous Paths

- Paths can exist
  - In the datapath
  - In the controller
  - Between the controller and datapath
  - May be hundreds or thousands of paths
- Timing analysis tools that evaluate all possible paths automatically are very helpful







# Behavioral-Level Design: Start with C (or Similar Language)

- Replace first step of RTL design method by two steps
  - Capture in C, then convert C to high-level state machine
  - How to convert from C to high-level state machine?

Step 1A: Capture in C

Step 1B: Convert to high-level state machine

|              | Step                                              | Description                                                                                                                                                                                                                                                                            |
|--------------|---------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Step 1       | Capture a high-level<br>state machine             | Describe the system's desired behavior as a high-level state machine. The state machine consists of states and transitions. The state machine is "high-level" because the transition conditions and the state actions are more than just Boolean operations on bit inputs and outputs. |
| Step 2       | Create a datapath                                 | Create a datapath to carry out the data operations of the high-level state machine.                                                                                                                                                                                                    |
| Step 3       | Connect the datapath to a controller              | Connect the datapath to a controller block. Connect external Boolean inputs and outputs to the controller block.                                                                                                                                                                       |
| Step 4       | Derive the controller's FSM                       | Convert the high-level state machine to a finite-state machine (FSM) for the controller, by replacing data operations with setting and reading of control signals to and from the datapath.                                                                                            |

### Converting from C to High-Level State Machine

- Convert each C construct to equivalent states and transitions
- Assignment statement
  - Becomes one state with assignment
- *If-then* statement
  - Becomes state with condition check, transitioning to "then" statements if condition true, otherwise to ending state
    - "then" statements would also be converted to states





### Converting from C to High-Level State Machine

- *If-then-else* statement
  - Becomes state with condition check, transitioning to "then" statements if condition true, or to "else" statements if condition false
  - C form: `if (cond) { /* then stmts */ } else { /* else stmts */ }`, followed by (end)
- *While loop* statement
  - Becomes state with condition check, transitioning to while loop's statements if true, then transitioning back to condition check

[State diagram: a condition-check state branches on cond to (then stmts) or (else stmts), both of which lead to (end)]
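To illustrate the conversion rules, the Python sketch below writes a small loop twice: once directly, and once as an explicit state machine with one state per assignment and one condition-check state for each *if* and *while*. The example program (finding the maximum of an array) and all state names are hypothetical, chosen only to exercise each rule.

```python
def max_direct(a):
    # Equivalent C-like code:
    #   m = a[0]; i = 1;
    #   while (i < n) { if (a[i] > m) { m = a[i]; } i = i + 1; }
    m = a[0]
    i = 1
    while i < len(a):
        if a[i] > m:
            m = a[i]
        i = i + 1
    return m


def max_state_machine(a):
    state, m, i = "AssignM", 0, 0
    while state != "End":
        if state == "AssignM":          # assignment -> one state
            m = a[0]
            state = "AssignI"
        elif state == "AssignI":        # assignment -> one state
            i = 1
            state = "WhileCheck"
        elif state == "WhileCheck":     # while -> condition-check state
            state = "IfCheck" if i < len(a) else "End"
        elif state == "IfCheck":        # if-then -> condition-check state
            state = "Then" if a[i] > m else "Incr"
        elif state == "Then":           # "then" statement
            m = a[i]
            state = "Incr"
        elif state == "Incr":           # last loop statement, back to the check
            i = i + 1
            state = "WhileCheck"
    return m
```

Both functions return the same result for any non-empty list; the second form is what would then go through Steps 2 to 4 of the RTL design method.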





## **Memory Components**

- Register-transfer level design instantiates datapath components to create datapath, controlled by a controller
  - A few more components are often used outside the controller and datapath
- MxN memory
  - M words, N bits wide each
- Several varieties of memory, which we now introduce





5.6

### Random Access Memory (RAM)

- RAM: readable and writable memory
  - "Random access memory"
    - Strange name: created several decades ago to contrast with sequentially-accessed storage like tape drives
  - Logically same as register file: memory with address inputs, data inputs/outputs, and control
    - RAM usually just one port; register file usually two or more
  - RAM vs. register file
    - RAM typically larger than roughly 512 or 1024 words
    - RAM typically stores bits using a bit storage approach that is more efficient than a flip flop
    - RAM typically implemented on a chip in a square rather than rectangular shape – keeps longest wires (hence delay) short



[Figure: 16×32 register file, with ports including W\_data, W\_addr, and R\_data]











## **Comparing Memory Types**

- Register file
  - Fastest
  - But biggest size
- SRAM
  - Fast
  - More compact than register file
- DRAM
  - Slowest
  - And refreshing takes time
  - But very compact
- Use register file for small items, SRAM for large items, and DRAM for huge items
  - Note: DRAM's big capacitor requires a special chip design process, so DRAM is often a separate chip



Size comparison for same number of bits (not to scale)


#### Reading and Writing a RAM

[Timing diagram: clk, addr, data, and rw (1 means write) waveforms showing setup time, hold time, and access time; after the two writes, RAM[9] equals 500 and RAM[13] equals 999]

- Writing
  - Put address on addr lines, data on data lines, set rw=1, en=1
- Reading
  - Set addr and en lines, but put nothing (Z) on data lines, set rw=0
  - Data will appear on data lines
- Don't forget to obey setup and hold times
  - In short: keep inputs stable before and after a clock edge
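The read and write protocol can be modeled behaviorally in Python. The class name, the memory dimensions, and the use of `None` for "nothing (Z)" are assumptions for illustration; setup, hold, and access times are not modeled.

```python
class Ram:
    def __init__(self, words, bits):
        self.mem = [0] * words
        self.mask = (1 << bits) - 1

    def clock(self, addr, rw, en, data=None):
        """One clock edge. rw=1 writes data at addr; rw=0 reads and returns the word.

        Returns None (nothing driven by the RAM) when disabled or when writing.
        """
        if not en:
            return None
        if rw == 1:
            self.mem[addr] = data & self.mask
            return None
        return self.mem[addr]


ram = Ram(words=1024, bits=32)            # hypothetical 1024x32 RAM
ram.clock(addr=9, rw=1, en=1, data=500)   # RAM[9] now equals 500
ram.clock(addr=13, rw=1, en=1, data=999)  # RAM[13] now equals 999
read_back = ram.clock(addr=9, rw=0, en=1)
```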













# **ROM Types**

# Fuse-Based Programmable ROM

- Each cell has a fuse
- A special device, known as a programmer, blows certain fuses (using higher-than-normal voltage)
  - Those cells will be read as 0s (involving some special electronics)
  - Cells with unblown fuses will be read as 1s
  - 2-bit word on right stores "10"
- Also known as One-Time Programmable (OTP) ROM






### **ROM Types**

### Erasable Programmable ROM (EPROM)

- Uses "floating-gate transistor" in each cell
- Special programmer device uses higher-than-normal voltage to cause electrons to tunnel into the gate
  - Electrons become trapped in the gate
  - Only done for cells that should store 0
  - Other cells (without electrons trapped in gate) will be 1
    - 2-bit word on right stores "10"
  - Details beyond our scope; just general idea is necessary here
- To erase, shine ultraviolet light onto chip
  - Gives trapped electrons energy to escape
  - Requires chip package to have window





### **ROM Types**

- Electronically-Erasable Programmable ROM (EEPROM)
  - Similar to EPROM
    - Uses floating-gate transistor, electronic programming to trap electrons in certain cells
  - But erasing done *electronically*, not using UV light
  - Erasing done one word at a time
- Flash memory
  - Like EEPROM, but all words (or large blocks of words) can be erased simultaneously
  - Became common relatively recently (late 1990s)
- Both types are in-system programmable
  - Can be programmed with new stored bits while in the system in which the ROM operates
    - Requires bi-directional data lines, and write control input
    - Also need busy output to indicate that erasing is in progress – erasing takes some time






















### Common Uses of a Queue

- Computer keyboard
  - Pushes pressed keys onto queue, meanwhile pops and sends to computer
- Digital video recorder
  - Pushes captured frames, meanwhile pops frames, compresses them, and stores them
- Computer network routers
  - Pushes incoming packets onto queue, meanwhile pops packets, processes destination information, and forwards each packet out over appropriate port




### **Queue Usage Example**

- Example series of pushes and pops
  - Note how rear and front pointers move
  - Note that popping doesn't really remove the data from the queue, but that data is no longer accessible
  - Note how rear (and front) wraps around from address 7 to 0
- Note: pushing a full queue is an error
  - As is popping an empty queue
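The pointer behavior described above can be sketched in Python as an 8-word circular queue. Tracking fullness with an item count, the class name, and the choice of exceptions are assumptions for illustration; a hardware queue may detect full and empty differently.

```python
class Queue8:
    SIZE = 8  # addresses 0..7

    def __init__(self):
        self.mem = [0] * self.SIZE
        self.front = 0      # address of the next item to pop
        self.rear = 0       # address where the next push is written
        self.count = 0      # assumed bookkeeping to tell full from empty

    def push(self, value):
        if self.count == self.SIZE:
            raise OverflowError("push on a full queue")
        self.mem[self.rear] = value
        self.rear = (self.rear + 1) % self.SIZE   # wraps from 7 to 0
        self.count += 1

    def pop(self):
        if self.count == 0:
            raise IndexError("pop on an empty queue")
        value = self.mem[self.front]              # data stays in memory...
        self.front = (self.front + 1) % self.SIZE # ...but is no longer reachable
        self.count -= 1
        return value
```

After eight pushes `rear` has wrapped back to 0; a pop then returns the first item and moves `front` to 1, leaving the old word in `mem[0]` until a later push overwrites it.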















### **Chapter Summary**

- Modern digital design involves creating processor-level components
- Four-step RTL method can be used
  - 1. High-level state machine 2. Create datapath 3. Connect datapath to controller 4. Derive controller FSM
- Several examples
  - Control dominated, data dominated, and mix
- Determining fastest clock frequency
  - By finding critical path
- Behavioral-level design: C to gates
  - By using method to convert C (subset) to high-level state machine
- Additional RTL components
  - Memory: RAM, ROM
  - Queues
- Hierarchy: A key concept used throughout Chapters 2-5

