Below is our best recollection of the questions on the systems qualifier
given June 28th, 1996. Anyone who took this test is encouraged to submit
additions or corrections to this document. Places where we're uncertain of
the exact wording are marked with "??". Editorial comments are in brackets [].

======================================================================
Architecture - 8 questions, variable scoring
======================================================================

Q1 - Using only AND gates, OR gates, and inverters, show the circuit diagram
for a full 4-bit subtraction unit, with inputs A1-A4 and B1-B4 (A1 and B1 are
the MSBs, respectively) and outputs C1-C4 (C1 is the MSB). An additional
input, S, determines whether the input quantities are signed (S=1)
(2's complement) or unsigned (S=0) quantities. Show how the carry & overflow
flags are set.

------------------------------

Q2 - The IEEE 732?? [presumably IEEE 754] FP format uses 32 bits to express
floating-point numbers. The MSB is a sign bit (0 = positive, 1 = negative).
The exponent uses the next 8 bits, expressed as a biased signed number (so
that all 0's is taken to mean zero). The last 23 bits represent the mantissa,
such that 1 <= m < 2. The leading 1 is assumed, so 24 bits are effectively
encoded.

Q - How many unique values can be encoded using this representation?
A - 2^32 - 1 (0 can be expressed in two ways)

Q - What is the smallest [positive??] integer that cannot be represented?
A - 2^24 + 1

Q - Write the bit-pattern for (-??)61.625 represented using this format...

         sign  |--------mantissa--------|  exponent
  A  -     0    11101101000000000000000    00000101
               1
               .<- use this to find the exponent
                \ leading 1 is understood

------------------------------

Q - What is the function of the Transition [sic] lookaside buffer (TLB)?
How does it help processor performance?
A - To cache the most recent (or most frequent) address translations
performed by the page table. This improves overall memory-access time, since
the TLB lookup is performed by very fast associative hardware.

------------------------------

With branch prediction hardware,
Q - At what stage in the pipeline is the branch outcome predicted?
Q - If the branch is not taken, how is the new PC found?
Q - If the branch is taken, how is the new PC found?

------------------------------

Q - What is the difference between superscalar, superpipelined, and VLIW?

------------------------------

In an architecture with a four-deep pipeline, there are certain statements
that cause stalls that cannot be resolved until the fourth stage of the
pipeline. These dependencies occur, relative to the ith statement, with the
following frequencies:

      [i + 1]   22%
      [i + 2]    6%

Q - If the clock speed of the pipelined implementation is 3.5 times faster
than that of a non-pipelined implementation, then what is the ideal speedup
gained by pipelining?
A - I talked to Tyson about this one. Apparently, we all put 3.5 * 4 = 14.
This is wrong. The non-pipelined version still executes one instruction per
cycle; it's just that its cycle is 3.5 times as long. Thus, the effective
speedup of pipelining over sequential execution is 3.5. (Note that ideally
this might be as high as 4, since we're using four pipeline stages, but
there is some overhead in pipelining that keeps the pipeline clock at only
3.5 times as fast.)

Q - What is the effective speedup considering the dependencies? What is the
effective performance increase from pipelining?
A - effective CPI = 1 + .22(1) + .06(1) = 1.28
    speedup = 4/1.28 = 3.125
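[A minimal sanity check of the arithmetic above, not part of the exam. It
assumes a one-cycle stall penalty for each dependency class, as in the
recollected answer, and prints both the 4/CPI figure given above and the
3.5/CPI figure that follows from the previous answer's reasoning about
sequential execution.]

    #include <stdio.h>

    int main(void)
    {
        /* effective CPI with the two dependency classes stalling 1 cycle each */
        double cpi = 1.0 + 0.22 * 1.0 + 0.06 * 1.0;

        printf("effective CPI                 = %.3f\n", cpi);        /* 1.280 */
        printf("speedup vs. ideal 4-stage CPI = %.3f\n", 4.0 / cpi);  /* 3.125 */
        printf("speedup vs. sequential (3.5x) = %.3f\n", 3.5 / cpi);  /* 2.734 */
        return 0;
    }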
------------------------------

Q - In a 2^3 x 2^3 omega network, P4?? is already connected to M6??. Now
P1?? wants to establish a connection with M5??. What should it do?

Q - In a 2^3 x 2^3 delta network, a processor number is described as p2p1p0,
a memory module as m2m1m0. The switching stages use control variables x0
(first switch), x1, and x2 (last switch). Write the logic equations for x0,
x1, and x2 in terms of p and m.

-------------------------

You are consulting for UCR Computing, Inc., and you have to evaluate a
suggested architecture improvement. The improvement implements instruction
prefetch, which improves the cache-hit rate from 72% to 88%. However, the
modification also causes a 20% increase in the number of loads. If the
original architecture executes 25% memory-access instructions, and cache
misses are responsible for 28% of the execution time, then evaluate the
effective speedup of the proposed modification. [I hope I got all the
numbers right.]

======================================================================
Operating Systems - 3 points each for 10 T/F (negative for wrong answers)
                    6 pts each for 12 short-answer questions
======================================================================

T/F - With an infinite time slice, Shortest-Job-First (SJF) is the same as
FIFO scheduling.  FALSE

T/F - It is possible to resolve the Byzantine agreement problem between
three processors, one of which may be faulty.  FALSE

T/F - In a system with multiple resource instances, a cycle in the wait-for
graph implies a deadlock.  FALSE

T/F - Consider a system in which all resources are ranked, although a single
resource type might have several instances which all share the same rank. If
every process requests its resources in order of increasing rank, then
deadlock cannot occur.  FALSE? [I think]  It is TRUE [see page 226]

T/F - [Something about two-phase commit communication where a reply is
dropped. If the link is broken after the client has set READY, the client
will abort.] [This is very hazy, though... Does anyone remember the exact
text?]

-------------------

Q - What is meant by virtualization of a resource?

-------------------

Q - Explain the impact of fair-share scheduling on the overlapped use of the
CPU and the I/O channel(s) to the disks in an environment containing a mix
of CPU-bound and I/O-bound jobs. Also, draw a qualitatively correct graph of
system throughput as a function of the degree of multiprogramming, assuming
that the system has virtual memory.

-------------------

Q - NFS is a stateless server. Describe the difference between stateless and
state-oriented servers. What is good about using a stateless server? Explain
this request behavior & why it is used.
A - Stateless means it has no memory of previous history. It requires that
the clients' file operations be idempotent (meaning that they can be re-sent
without further changes, in case there is no response to an earlier
request). Stateless servers are more robust than stateful servers, since it
is relatively easy for a stateless server to recover from failure.

-------------------

Q - What is Belady's anomaly? How does it relate to the inclusion property
and stack algorithms?
A - Belady's anomaly is a paging phenomenon. If you increase the number of
pages allocated to a process, then it IS POSSIBLE that the resulting
page-fault rate is HIGHER than before. Any replacement scheme that exhibits
the inclusion property (such as the stack algorithms) is not susceptible to
Belady's anomaly.
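[A small FIFO simulation, not from the exam, demonstrating the anomaly on
the classic reference string 1 2 3 4 1 2 5 1 2 3 4 5: with FIFO replacement,
3 frames incur 9 faults but 4 frames incur 10.]

    #include <stdio.h>

    /* Count page faults for FIFO replacement; assumes frames <= 16. */
    static int fifo_faults(const int *refs, int n, int frames)
    {
        int mem[16];
        int next = 0, used = 0, faults = 0;

        for (int i = 0; i < n; i++) {
            int hit = 0;
            for (int j = 0; j < used; j++)
                if (mem[j] == refs[i]) { hit = 1; break; }
            if (hit)
                continue;
            faults++;
            if (used < frames)
                mem[used++] = refs[i];       /* fill a free frame         */
            else {
                mem[next] = refs[i];         /* evict the oldest resident */
                next = (next + 1) % frames;
            }
        }
        return faults;
    }

    int main(void)
    {
        int refs[] = {1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5};
        int n = sizeof refs / sizeof refs[0];

        printf("3 frames: %d faults\n", fifo_faults(refs, n, 3));   /* 9  */
        printf("4 frames: %d faults\n", fifo_faults(refs, n, 4));   /* 10 */
        return 0;
    }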
-------------------

Q - What is an inverted page table? Why are we likely to see inverted page
tables being used more frequently in the future?
A - Blah blah. Used more in the future because large virtual memories (e.g.
a 64-bit address space) require too much memory to store the regular page
table. (64 bits, with 1K pages, would require a page table with 2^54
entries!)

------------------------------

Q - Write a monitor solution to the dining philosophers problem. How does
your solution avoid deadlock?

------------------------------

Q - Describe the elevator (SCAN) algorithm for disk requests and explain why
it is preferred over the nearest-request-first algorithm.

------------------------------

Q - Why does Unix have both a per-process open-file table and a system-wide
open-file table?

------------------------------

Q - What is the purpose of the active-file table (active inode table in
UNIX) in a file system?

------------------------------

Q - If a row of the ACCESS MATRIX model represents a capability list, what
information is represented by a column?
A - Access control lists

------------------------------

Q - Singhal mentions four classes of algorithms to perform distributed
deadlock detection. Name these four classes, and distinguish between them.
A - Centralized control (the others are distributed), diffusion computation
(OR model) vs. edge-chasing (AND model), and path-pushing...

------------------------------

======================================================================
Compilers - 10 T/F questions, 10 "short" answer. All questions worth 5 pts.
======================================================================

T/F - If an LR(1) grammar has no shift/reduce conflicts, then the
corresponding LALR grammar will not, either.  TRUE

T/F - Register allocation for an arbitrary-sized section of straight-line
code is NP-complete.  FALSE?? Sundar says TRUE. I am tempted to believe
him. :-)

T/F - If a basic block B has multiple direct predecessors, then one of these
predecessor blocks dominates B.  FALSE

T/F - When using the Gencode algorithm to generate code for a tree-structured
graph, it is better to begin by generating code for the subtree requiring
fewer registers first.  FALSE

T/F - In attribute synthesis, it is easier to do bottom-up parsing than
top-down parsing.  TRUE

T/F - In available-code analysis, we definitely should include the possible
kills [from pointers] into the kill set.  TRUE
[See the notes. All downward analyses enlarge the KILL set as much as
possible, while upward analyses keep it small. Available-code analysis is a
downward analysis, so we try to include everything possible in the kill set,
so TRUE.] [Upward analyses enlarge the GEN set as much as possible.]

T/F - In an interference graph, there is an edge between two nodes if one
node is live while another is defined.  TRUE
[I disagree. I think the question said "... there is an edge between two
nodes whenever one variable is live while the other is being defined." I
interpreted this to mean "if and only if ...", which is FALSE. It is easy
enough to construct an example of two concurrently live variables (whose
nodes are connected by an edge) such that neither variable is live at the
points where the other is defined. (Use separate predecessors to define the
variables.)]

-----------------

Q - Define what it means for J to be a non-basic induction variable. How can
we remove J from a loop and replace it with a basic induction variable when
dealing with affine functions?
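[A hypothetical before/after sketch, not from the exam, of the usual
strength-reduction transformation, assuming J = 3*I + 2 where I is the
loop's basic induction variable.]

    /* Before: J is a non-basic induction variable, an affine function of I. */
    void before(int *a, int n)
    {
        for (int I = 0; I < n; I++) {
            int J = 3 * I + 2;           /* recomputed from I each iteration */
            a[J] = 0;
        }
    }

    /* After strength reduction: J is initialized once before the loop and
       bumped by the constant stride 3, making it a basic induction variable. */
    void after(int *a, int n)
    {
        for (int I = 0, J = 2; I < n; I++, J += 3)
            a[J] = 0;
    }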
-----------------

Q - In a parser generated by YACC, what happens to the syntactic and semantic
portions of the stack during a shift operation?
A - On a shift, yylval gets pushed on the semantic stack and the current
state is pushed on the syntactic stack.

-----------------

Q - In a parser generated by YACC, what happens to the syntactic and semantic
portions of the stack during a reduce operation?
A - On a reduce, the translation record gets pushed on the semantic stack and
the current state is pushed on the syntactic stack.

-----------------

Q - Show how you would implement monitors and conditions (in the Hoare sense)
using semaphores.

-----------------

Q - Describe Chaitin's algorithm for local register allocation.

-----------------

Q - Write the algorithm for live-variable analysis.

-----------------

Q - Construct the DAG for the following code segment:

        d = b * c
        a = a + b
        b = b * c      [the variables here are wrong]
        e = a - d

If the compiler must execute this code on a machine that has only one
register, in what order should the above code be executed? Why?
A - [Cannot have 2 subtrees at the same level.]

-----------------

Q - In producing a parse table, YACC breaks shift/reduce conflicts on the
basis of each rule's precedence. What determines the default precedence of a
grammar rule?
A - The precedence of the rightmost terminal in that rule (if that terminal
has a precedence defined). Otherwise, the precedence of any terminal in that
rule.

-----------------

Q - We have a very nice, very user-friendly C compiler available, which is
self-compiling. However, it is not at all optimizing. We also have access to
an extremely unfriendly optimizing C compiler. Is there anything we can do
to get a fast, user-friendly compiler?
A (Torsten) - Yes. Compile the user-friendly compiler using the optimizing
compiler. [Note that the fact that the user-friendly compiler is
self-compiling implies that it is written in C.] This creates a
non-optimizing, user-friendly compiler that performs compilations much
faster than before. (Note that the C programs it compiles will be identical;
it's just that the process of compiling them will be faster because the
compiler itself was optimized.)
[To address Jan's comments: I am pretty certain that the question asked for
a "fast, user-friendly" compiler, without requiring that it be optimizing.
Also, there was no mention of whether the fast compiler was self-compiling
... it could be written in a language other than C! Given this analysis, it
seems like we have two options: use the unfriendly fast compiler straight
up, or do what I suggested.]
A (Jan) - [Torsten's answer is not the whole solution, since in his solution
the new compiler might run faster, but it still does not generate optimized
code (it only consists of optimized code!)] Jan's answer was to 1) compile
the unfriendly compiler using the slow one; this gives us the optimizing
capability of the new compiler, but that compiler is still itself
unoptimized; 2) now compile the new compiler using itself, enabling all
optimization options; this gives us an unfriendly compiler, but one that is
both optimized itself and able to generate optimized code.

-----------------

Q19 - Given the following grammar, generate the full LR(1) parse table
(there should be 10 states):

        S -> RD | D     ????
        R -> sR | s
        D -> cD | c

For this particular grammar, would it be advantageous to construct an LALR
parser instead?
Why or why not?
A - In this grammar, since there are no LR(1) states that can be combined by
ignoring the lookahead (in other words, no two states have the same core),
there is no advantage to using LALR over LR(1), since the tables are the
same size.

-----------------

Q20 - Parse the string "ssscc" [was that it??] using the parse table for the
grammar from the above question.
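[Not the LR(1) parse the question asks for, but a quick hypothetical check,
not from the exam, that "ssscc" is in the language of the Q19 grammar, which
generates zero or more s followed by one or more c.]

    #include <stdio.h>

    /* Recognize L(S) for  S -> RD | D,  R -> sR | s,  D -> cD | c. */
    static int in_language(const char *w)
    {
        while (*w == 's') w++;        /* optional R: a run of s        */
        if (*w != 'c') return 0;      /* D requires at least one c     */
        while (*w == 'c') w++;
        return *w == '\0';            /* nothing may follow the c run  */
    }

    int main(void)
    {
        printf("ssscc: %s\n", in_language("ssscc") ? "accepted" : "rejected");
        return 0;
    }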