CS203A Course Project: A Tomasulo Algorithm Simulation Sketch
Hand out date: Nov 4, 2004
Due date: Dec 14-15, 2004
In this project, you will implement the Tomasulo algorithm for an out-of-order
execution pipeline. C or C++ is preferred.
Find one or two partners to form a group
of two or three. Please document your source code as you develop it. On the due date, you will
demonstrate your project to me on a Linux machine such as eon.cs.ucr.edu.
You will implement the following four stages: IF, Issue, Execute, and
Writeback. The tasks performed in each stage were explained in class. The
detailed bookkeeping actions for each stage are listed in the textbook (Figure 3.5 on page 193).
They will help you greatly in coding the dynamic scheduler.
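
The Issue and Writeback bookkeeping can be sketched roughly as follows. This is only a minimal illustration, not the textbook figure itself: the names Machine, issue, and writeback are hypothetical, a single pool of stations is shared by all operations, and tag 0 is assumed to mean "no pending producer". The fields Vj/Vk and Qj/Qk follow the usual Tomasulo naming.

```cpp
#include <array>
#include <cassert>
#include <vector>

// One reservation station. Qj/Qk hold the tag of the station that will
// produce the operand (0 = value already available in Vj/Vk).
struct ReservationStation {
    bool busy = false;
    char op = ' ';              // '+', '-', '*', '/' (hypothetical encoding)
    double vj = 0.0, vk = 0.0;  // operand values
    int qj = 0, qk = 0;         // producer tags
};

struct Machine {
    std::vector<ReservationStation> rs;  // rs[0] is unused so tags start at 1
    std::array<double, 32> fregs{};      // FP register file
    std::array<int, 32> qi{};            // register result status (0 = ready)
    explicit Machine(int stations) : rs(stations + 1) {}
};

// Issue "fd = fs op ft". Returns the station tag, or 0 when no station is
// free (a structural hazard, so the instruction must stall).
int issue(Machine& m, char op, int fd, int fs, int ft) {
    for (int r = 1; r < static_cast<int>(m.rs.size()); ++r) {
        ReservationStation& s = m.rs[r];
        if (s.busy) continue;
        s = ReservationStation();
        s.busy = true;
        s.op = op;
        // Either capture the value now or remember who will produce it.
        if (m.qi[fs] != 0) s.qj = m.qi[fs]; else s.vj = m.fregs[fs];
        if (m.qi[ft] != 0) s.qk = m.qi[ft]; else s.vk = m.fregs[ft];
        m.qi[fd] = r;  // fd will be written by station r
        return r;
    }
    return 0;
}

// Broadcast the result of station r to registers and waiting stations,
// then free the station.
void writeback(Machine& m, int r, double result) {
    assert(r > 0 && r < static_cast<int>(m.rs.size()) && m.rs[r].busy);
    for (int x = 0; x < 32; ++x)
        if (m.qi[x] == r) { m.fregs[x] = result; m.qi[x] = 0; }
    for (auto& s : m.rs) {
        if (s.qj == r) { s.vj = result; s.qj = 0; }
        if (s.qk == r) { s.vk = result; s.qk = 0; }
    }
    m.rs[r].busy = false;
}
```

A station is ready to execute once it is busy and both qj and qk are 0. A real simulator would keep separate station pools per FU, which this sketch leaves out.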
You will implement only the FP pipeline. We will consider the following FUs and register files:
- FP adder which can perform both FP add and FP sub; pipelined
- FP multiplier which performs FP multiplication; pipelined
- FP divider which performs FP division; not pipelined
- load unit; pipelined
- store unit; pipelined
- 32 general-purpose integer registers and 32 general-purpose
floating-point registers; infinite number of ports for each register file.
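
The difference between a pipelined and a non-pipelined FU can be sketched as below. This is one possible model, not a required design: the class FunctionalUnit and its methods are hypothetical, and the timing convention (an operation started in cycle c with latency L finishes in cycle c + L - 1) is an assumption you may choose differently.

```cpp
#include <utility>
#include <vector>

class FunctionalUnit {
public:
    FunctionalUnit(int latency, bool pipelined)
        : latency_(latency), pipelined_(pipelined) {}

    // Try to start the operation identified by `tag` in `cycle`.
    // A pipelined unit accepts at most one new operation per cycle;
    // a non-pipelined unit accepts one only when it is completely idle.
    bool start(int tag, int cycle) {
        if (!pipelined_ && !in_flight_.empty()) return false;
        if (cycle == last_start_) return false;
        in_flight_.push_back(std::make_pair(tag, cycle + latency_ - 1));
        last_start_ = cycle;
        return true;
    }

    // Remove and return the tags whose execution completes by `cycle`.
    std::vector<int> finish(int cycle) {
        std::vector<int> done;
        for (auto it = in_flight_.begin(); it != in_flight_.end();) {
            if (it->second <= cycle) {
                done.push_back(it->first);
                it = in_flight_.erase(it);
            } else {
                ++it;
            }
        }
        return done;
    }

private:
    int latency_;
    bool pipelined_;
    int last_start_ = -1;  // assumes cycle numbers are non-negative
    std::vector<std::pair<int, int>> in_flight_;  // (tag, finish cycle)
};
```

With this model, FunctionalUnit(2, true) could stand for the FP adder and FunctionalUnit(40, false) for the divider. The latencies here are placeholders, since they are simulator inputs.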
The hardware configuration should be fully parameterizable: the number of reservation
stations for each FU and the number of execution cycles for each FU should all be inputs
to your simulator.
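
One possible way to read these parameters is sketched below. The input format is left to you, so everything here is an assumption: each line is taken to be "name stations latency", and the names FuConfig, read_config, config_from_string, and lookup are hypothetical.

```cpp
#include <cassert>
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Per-FU parameters: number of reservation stations and execution cycles.
struct FuConfig {
    int stations = 0;
    int latency = 0;
};

typedef std::map<std::string, FuConfig> Config;

// Read whitespace-separated triples "name stations latency" until the
// stream ends or a malformed entry is met.
Config read_config(std::istream& in) {
    Config cfg;
    std::string name;
    FuConfig fc;
    while (in >> name >> fc.stations >> fc.latency) cfg[name] = fc;
    return cfg;
}

// Convenience wrapper for tests: parse a configuration held in a string.
Config config_from_string(const std::string& text) {
    std::istringstream in(text);
    return read_config(in);
}

// Fetch the parameters of one FU, checking that it was configured sanely.
const FuConfig& lookup(const Config& cfg, const std::string& name) {
    Config::const_iterator it = cfg.find(name);
    assert(it != cfg.end() && it->second.stations > 0 && it->second.latency > 0);
    return it->second;
}
```

The same read_config works on a std::ifstream, so the parameters can live in a small text file next to the assembly input.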
You will need to simulate a "memory", which could be as simple as a data array. I'll leave
the implementation details to you, as long as you have a way of initializing the
memory and printing its final contents. During the demo, I will ask you to initialize your memory
with addresses and values that I specify. Your simulator should read from and write to this
memory correctly.
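
A data-array memory might look like the sketch below. It is only an illustration under stated assumptions: the class Memory is hypothetical, each address indexes one single-precision cell (no byte addressing or alignment), and dump() prints only the non-zero cells.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

class Memory {
public:
    // Create a memory of `cells` locations, all initialized to 0.
    explicit Memory(std::size_t cells) : cells_(cells, 0.0f) {}

    float load(std::size_t addr) const {
        assert(addr < cells_.size());
        return cells_[addr];
    }

    void store(std::size_t addr, float value) {
        assert(addr < cells_.size());
        cells_[addr] = value;
    }

    // Return a printable listing of every non-zero cell, one per line.
    std::string dump() const {
        std::ostringstream out;
        for (std::size_t a = 0; a < cells_.size(); ++a)
            if (cells_[a] != 0.0f)
                out << "M[" << a << "] = " << cells_[a] << '\n';
        return out.str();
    }

private:
    std::vector<float> cells_;
};
```

Initialization for a demo is then a series of store(address, value) calls, and the final contents come from dump().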
- Instructions
Data Transfer Instructions
| ld Fa, offset(Ra) | load a single precision floating point value to Fa |
| sd Fa, offset(Ra) | store a single precision floating point value to memory |
Control Transfer Instructions
| beq Rs, Rt, offset | if Rs==Rt then branch to PC+offset |
| bne Rs, Rt, offset | if Rs<>Rt then branch to PC+offset |
ALU Instructions
| add Rd, Rs, Rt | Rd = Rs+Rt |
| add.d Fd, Fs, Ft | Fd = Fs+Ft |
| addi Rt, Rs, immediate | Rt=Rs+immediate |
| sub Rd, Rs, Rt | Rd = Rs-Rt |
| sub.d Fd, Fs, Ft | Fd = Fs-Ft |
| mult.d Fd, Fs, Ft | Fd = Fs*Ft, assuming that Fd is wide enough to hold the result |
| div.d Fd, Fs, Ft | Fd = Fs/Ft |
Assume that each integer instruction takes only one cycle to finish (as we
discussed in class) and that integer instructions do not contend for resources with FP instructions.
The input to your simulator is a text file containing assembly instructions in
the above format. Read the input file line by line as the simulation proceeds,
as if you were fetching binary instructions from an instruction memory.
The ID stage is likewise simplified to parsing the assembly instruction rather than decoding a binary encoding.
The other inputs are the FU execution cycles and the number of reservation stations allocated to each FU.
I will leave the input format up to you.
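
Parsing one line of assembly can be sketched as follows. This is a minimal illustration rather than a complete front end: the names Instr, parse_line, and reg_index are hypothetical, and the sketch assumes register names are written as R or F followed by a number, as in the tables above.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// A parsed instruction: the mnemonic plus its operands in source order.
// For "ld F2, 8(R1)" the operands are {"F2", "8", "R1"}.
struct Instr {
    std::string op;
    std::vector<std::string> args;
};

// Split one assembly line into mnemonic and operands. Commas and
// parentheses are treated as separators, so "offset(Ra)" yields two tokens.
Instr parse_line(std::string line) {
    for (char& c : line)
        if (c == ',' || c == '(' || c == ')') c = ' ';
    std::istringstream in(line);
    Instr ins;
    in >> ins.op;
    for (std::string tok; in >> tok;) ins.args.push_back(tok);
    return ins;
}

// Convert a register name such as "F12" or "R3" to its index.
int reg_index(const std::string& name) {
    assert(name.size() >= 2 && (name[0] == 'R' || name[0] == 'F'));
    return std::stoi(name.substr(1));
}
```

Storing the parsed instructions in a vector indexed by PC makes the branch targets of beq and bne easy to follow, since the "instruction memory" is then just that vector.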
The output from your program should include
the final instruction status table, including the cycle information
for each stage. During the demo, I may ask you to print
the reservation station table and the register result status table
at any cycle.
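
An instruction status table could be formatted along these lines. The layout is only a suggestion: InstrStatus and format_status are hypothetical names, each field is assumed to hold the cycle in which the stage completed, and 0 is assumed to mean "not reached yet".

```cpp
#include <iomanip>
#include <sstream>
#include <string>
#include <vector>

// Per-instruction record: the cycle in which each stage completed.
struct InstrStatus {
    std::string text;  // the assembly text, for display
    int fetch = 0, issue = 0, execute = 0, writeback = 0;  // 0 = not yet
};

// Build a fixed-width table with one row per instruction.
std::string format_status(const std::vector<InstrStatus>& table) {
    auto cell = [](int cycle) {
        return cycle > 0 ? std::to_string(cycle) : std::string("-");
    };
    std::ostringstream out;
    out << std::left << std::setw(22) << "Instruction" << std::right
        << std::setw(6) << "IF" << std::setw(7) << "Issue"
        << std::setw(9) << "Execute" << std::setw(11) << "Writeback" << '\n';
    for (const auto& s : table) {
        out << std::left << std::setw(22) << s.text << std::right
            << std::setw(6) << cell(s.fetch) << std::setw(7) << cell(s.issue)
            << std::setw(9) << cell(s.execute)
            << std::setw(11) << cell(s.writeback) << '\n';
    }
    return out.str();
}
```

Because the function returns a string, it can be called after any cycle as well as at the end of the run. The reservation station and register result status tables could be formatted the same way.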
The program execution results should be reflected in the integer register
file, the floating-point register file, and the memory. Provide clear
output for all three.
- Tips:
- Write your own test code. Work out the expected results by stepping through
the algorithm by hand, cycle by cycle. Before you start coding, you should first
understand fully how the algorithm works. Developing your own test cases helps a lot.
- Start from small sample code, one instruction at a time.
If a single instruction doesn't work, neither will a sequence of instructions.
- Vary the input parameters during debugging. For example, change the
number of reservation stations and the FU execution latencies, and see how your
program reacts.
- Write a powerful print_stat() that generates a readable on-screen
display of the reservation station status, instruction status, and
register result status on every cycle. This can be of great help in debugging.
- Also design a clear output display for register and memory contents.
This can improve my impression of your code quality.
- Divide the work among the group members. Distributing the load
efficiently across the team improves both your time management and the quality of your project.
- Plan ahead. Conquer the project step by step. For example, instruction parsing,
the input interface, and the output display can all be done first and separately. Leave at least one
week for final testing and debugging. Don't wait until the last minute.