Below is our best recollection of the questions on the systems
qualifier given January 7th, 1997.  Anyone who took this test is
encouraged to submit additions or corrections to this document.
   Places where we're uncertain of the exact wording are marked with "??"
   Editorial comments are in brackets [].

======================================================================
Architecture - questions 1-10 are 2 points. questions 11-14 are 20 points.
======================================================================

T/F - Just as a left shift instruction can replace an integer multiply by a
      power of 2, a right shift is the same as an integer division by a
      power of 2.
T/F - More powerful instructions mean higher performance.
T/F - If there is space in control store, new instructions are free of cost.
T/F - Instruction set design has little impact on pipelining.
T/F - Increasing the depth of pipelining always increases performance.
T/F - For lower miss rate, copyback reduces memory traffic, while for
      higher miss rate, write-through reduces memory traffic.
T/F - The advantage of set associativity in improving the miss rate gets
      smaller as the cache gets larger.
T/F - To take advantage of spatial locality, a cache must have a block size
      larger that one word.
T/F - An I/O interrupt is asynchronous with respect to the instruction
      excution.
T/F - Amdahl's Law doesn't apply to parallel computers.

11. - Suppose floating-point square root (FPSQRT) is responsible for 20%.
      Proposal is to add FPSQRT hardware that will speedup this operation
      by a factor of 10.  The other alternative is just to try to make
      all FP instructions run faster; FP instructions are responsible
      for a total of 50% of the execution time.  The design team believes
      that they can make all FP instructions run two times faster with the
      same effort as required for the faster square root.  Compare these
      two design alternatives.

12. - You have designed a memory system and must choose a CPU to use with
      it.  The memory system has a separate instruction cache and a
      writeback data cache with no store buffer.  Each cache is direct-
      mapped, holds 16Kbytes, and has 32-byte blocks.  The main memory
      has an access time of 100ns, and can transfer 16 bytes per access.
      Simulations for both CPUs have shown that the instruction miss rate
      is 2% dirty.  10% stores, and 25% ?? register set, which allows
      the compiler to eliminate 40 and 30?? the rest of the instructions.
      Both processors have a CPI (excluding memory stalls) of 1.
      Which CPU will run the workload faster using this memory system,
      and by how much? 

13. - Assume we have a pipeline for a load/store machine without delayed
      branches.  It has the following hazard types and average stall
      length (in cycles):

      Instruction   Hazard type   Av stall length  Inst mix
      Loads         data           0.5              22%
      Branches      control        2                 5%
      FP Mult       data           3                 5%
      FP Add/Sub    data           1                 7%
      FP Div        data          10                 3%

      Assume the CPI without pipeline stalls is 1.5.

 a) - How much faster is the ideal pipelined machine versus the machine
      with these stalls?
 b) - Assume the branches can be converted to delayed branches with the
      success in filling each of the 2 slots as follows:

      No slots filled 40%
      1 slot filled 40%
      2 slots filled 20%

 c) - Suppose the average stall length for FP Mult can be reduced to 1.5
      or the FP Div average stall can be reduced to 5.  How much performance
      will each scheme gain?  Assume delayed branches have not been added.

14.
 a) - What is the remote memory load problem in a multiprocessor?
 b) - What is the synchronizing load problem in a multiprocessor?
 c) - Devise a mechanism for these two issues for multiprocessors?

======================================================================
Compiler -  
10 T/F 3 points each; 6 short questions, 10 points each and 
1 question for 20 points.
======================================================================

T/F - Inharitance attribute is top-down? 
      True
T/F - If a block has several direct predecessors, one of them dominates
      this block.
      False
T/F 

Q - What is a loop invariant? How can you move it to the preheader of the loop?

Q - Shift/reduce conflict, what is the default YACC action to resolve it.

Q - Write the algorithm for parsing a string using the LALR table.

Q - How do you convert from an LR(1) to LALR table?

Q - Write the alg to find which block dominates another block. 

Q - Something about static link for languages such as Pascal and Algol.

Q (20 points; 10 minutes; 2 pages long)
  - we want to design a intelligence compiler which is able to insert prefetch
    instruction into intermediate code.  The purpose of the prefetch is to cause
    a cache miss.  It is like "LD ..., ...", which does everying as load except 
    writing to register.  
 a) How conservative will the compiler be?
 b) What kind of information of the system(implementation) the compiler
    need to know.
 c) Besides the information of data flow analysis, what the compiler should 
    know?
 d) Describe when and where the compiler should insert the prefetch.


======================================================================
OS
======================================================================

T/F - given 8 sites and 2 fail sites, can we reach an agreement?

Q - What scheduling alg gives the shortest preemptive average waiting time?

Q - Banker's algorithm, given MAX, Allocate, Request
  - what is the NEED? is it save?
  - Given a request, can you grant it?
  - What class of alg is Bankers?
  - What class of alg is linear ordering?

Q - what is "orphan" in RPC.  Give 2 reasons for orphans.  How can you 
    handle the situation.

Q - Write an alg for termination in a distributed computation.

Q - Explain why Unix and Sprite use a 30 sec delay for delay write to disk.

Q - What is priority inversion, describe the scenario of several CPU-bound
    and IO-bound process.

Q  given 4 processes with their burst time, what is the average waiting time?
  P1 arrives at time 0 and the other processes arrive every 2 sec after.
  The time of each process is 10, 4, 4, 6.

Q - Give two examples each of how to handle the critical section problem by
 a) hardware
 b) software
 c) distributed system? 
    Give an example from one of the above three. Make sure you declared
    the shared variables.