Below is our best recollection of the questions on the systems qualifier
given June 28th, 1996. Anyone who took this test is encouraged to submit
additions or corrections to this document. Places where we're uncertain of
the exact wording are marked with "??". Editorial comments are in brackets [].

======================================================================
Architecture - 8 questions, variable scoring
======================================================================

Q1 - Using only AND gates, OR gates, and inverters, show the circuit diagram
for a full 4-bit subtraction unit, with inputs A1-A4 and B1-B4 (A1 and B1 are
the MSBs, respectively) and outputs C1-C4 (C1 is the MSB). An additional
input, S, determines whether the input quantities are signed (S=1)
(2's complement) or unsigned (S=0) quantities. Show how the carry & overflow
flags are set.

------------------------------

Q2 - The IEEE 732?? [presumably IEEE 754] FP format uses 32 bits to express
floating-point numbers. The MSB is a sign bit (0 = positive, 1 = negative).
The exponent uses the next 8 bits, expressed as a biased signed number (so
that all 0's is taken to mean zero). The last 23 bits represent the mantissa,
such that 1 <= m < 2. The leading 1 is assumed, so 24 bits are effectively
encoded.

Q - How many unique values can be encoded using this representation?
A - 2^32 - 1 (0 can be expressed in two ways)

Q - What is the smallest [positive??] integer that cannot be represented?
A - 2^24 + 1

Q - Write the bit-pattern for (-??)61.625 represented using this format...

         sign  |--------mantissa--------|  exponent
  A  -     0    11101101000000000000000    00000101
               1
               .<- use this to find the exponent
                \ leading 1 is understood

------------------------------

Q - What is the function of the Transition [sic] lookaside buffer (TLB)?
How does it help processor performance?
A - To cache the most recent (or most frequent) address translations
performed by the page table. This improves overall memory-access time, since
the TLB lookup is performed by very fast associative hardware.

------------------------------

With branch prediction hardware,
Q - At what stage in the pipeline is the branch outcome predicted?
Q - If the branch is not taken, how is the new PC found?
Q - If the branch is taken, how is the new PC found?

------------------------------

Q - What is the difference between superscalar, superpipelined, and VLIW?

------------------------------

In an architecture with a four-deep pipeline, there are certain statements
that cause stalls that cannot be resolved until the fourth stage of the
pipeline. These dependencies occur, relative to the ith statement, with the
following frequencies:

      [i + 1]   22%
      [i + 2]    6%

Q - If the clock speed of the pipelined implementation is 3.5 times faster
than that of a non-pipelined implementation, then what is the ideal speedup
gained by pipelining?
A - I talked to Tyson about this one. Apparently, we all put 3.5 * 4 = 14.
This is wrong. The non-pipelined version still executes one instruction per
cycle; it's just that its cycle is 3.5 times as long. Thus, the effective
speedup of pipelining over sequential execution is 3.5. (Note that ideally
this might be as high as 4, since we're using four pipeline stages, but
there is some overhead in pipelining that keeps the pipeline clock at only
3.5 times as fast.)

Q - What is the effective speedup considering the dependencies? What is the
effective performance increase from pipelining?
A - effective CPI = 1 + .22(1) + .06(1) = 1.28
    speedup = 4/1.28 = 3.125
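[A minimal sanity check of the arithmetic above, not part of the exam. It
assumes a one-cycle stall penalty for each dependency class, as in the
recollected answer, and prints both the 4/CPI figure given above and the
3.5/CPI figure that follows from the previous answer's reasoning about
sequential execution.]

    #include <stdio.h>

    int main(void)
    {
        /* effective CPI with the two dependency classes stalling 1 cycle each */
        double cpi = 1.0 + 0.22 * 1.0 + 0.06 * 1.0;

        printf("effective CPI                 = %.3f\n", cpi);        /* 1.280 */
        printf("speedup vs. ideal 4-stage CPI = %.3f\n", 4.0 / cpi);  /* 3.125 */
        printf("speedup vs. sequential (3.5x) = %.3f\n", 3.5 / cpi);  /* 2.734 */
        return 0;
    }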
------------------------------

Q - In a 2^3 x 2^3 omega network, P4?? is already connected to M6??. Now
P1?? wants to establish a connection with M5??. What should it do?

Q - In a 2^3 x 2^3 delta network, a processor number is described as p2p1p0,
a memory module as m2m1m0. The switching stages use control variables x0
(first switch), x1, and x2 (last switch). Write the logic equations for x0,
x1, and x2 in terms of p and m.

-------------------------

You are consulting for UCR Computing, Inc., and you have to evaluate a
suggested architecture improvement. The improvement implements instruction
prefetch, which improves the cache-hit rate from 72% to 88%. However, the
modification also causes a 20% increase in the number of loads. If the
original architecture executes 25% memory-access instructions, and cache
misses are responsible for 28% of the execution time, then evaluate the
effective speedup of the proposed modification. [I hope I got all the
numbers right.]

======================================================================
Operating Systems - 3 points each for 10 T/F (negative for wrong answers)
                    6 pts each for 12 short-answer questions
======================================================================

T/F - With an infinite time slice, Shortest-Job-First (SJF) is the same as
FIFO scheduling.  FALSE

T/F - It is possible to resolve the Byzantine agreement problem between
three processors, one of which may be faulty.  FALSE

T/F - In a system with multiple resource instances, a cycle in the wait-for
graph implies a deadlock.  FALSE

T/F - Consider a system in which all resources are ranked, although a single
resource type might have several instances which all share the same rank. If
every process requests its resources in order of increasing rank, then
deadlock cannot occur.  FALSE? [I think]  It is TRUE [see page 226]

T/F - [Something about two-phase commit communication where a reply is
dropped. If the link is broken after the client has set READY, the client
will abort.] [This is very hazy, though... Does anyone remember the exact
text?]

-------------------

Q - What is meant by virtualization of a resource?

-------------------

Q - Explain the impact of fair-share scheduling on the overlapped use of the
CPU and the I/O channel(s) to the disks in an environment containing a mix
of CPU-bound and I/O-bound jobs. Also, draw a qualitatively correct graph of
system throughput as a function of the degree of multiprogramming, assuming
that the system has virtual memory.

-------------------

Q - NFS is a stateless server. Describe the difference between stateless and
state-oriented servers. What is good about using a stateless server? Explain
this request behavior & why it is used.
A - Stateless means it has no memory of previous history. It requires that
the clients' file operations be idempotent (meaning that they can be re-sent
without further changes, in case there is no response to an earlier
request). Stateless servers are more robust than stateful servers, since it
is relatively easy for a stateless server to recover from failure.

-------------------

Q - What is Belady's anomaly? How does it relate to the inclusion property
and stack algorithms?
A - Belady's anomaly is a paging phenomenon. If you increase the number of
pages allocated to a process, then it IS POSSIBLE that the resulting
page-fault rate is HIGHER than before. Any replacement scheme that exhibits
the inclusion property (such as the stack algorithms) is not susceptible to
Belady's anomaly.
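[A small FIFO simulation, not from the exam, demonstrating the anomaly on
the classic reference string 1 2 3 4 1 2 5 1 2 3 4 5: with FIFO replacement,
3 frames incur 9 faults but 4 frames incur 10.]

    #include <stdio.h>

    /* Count page faults for FIFO replacement; assumes frames <= 16. */
    static int fifo_faults(const int *refs, int n, int frames)
    {
        int mem[16];
        int next = 0, used = 0, faults = 0;

        for (int i = 0; i < n; i++) {
            int hit = 0;
            for (int j = 0; j < used; j++)
                if (mem[j] == refs[i]) { hit = 1; break; }
            if (hit)
                continue;
            faults++;
            if (used < frames)
                mem[used++] = refs[i];       /* fill a free frame         */
            else {
                mem[next] = refs[i];         /* evict the oldest resident */
                next = (next + 1) % frames;
            }
        }
        return faults;
    }

    int main(void)
    {
        int refs[] = {1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5};
        int n = sizeof refs / sizeof refs[0];

        printf("3 frames: %d faults\n", fifo_faults(refs, n, 3));   /* 9  */
        printf("4 frames: %d faults\n", fifo_faults(refs, n, 4));   /* 10 */
        return 0;
    }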
-------------------

Q - What is an inverted page table? Why are we likely to see inverted page
tables being used more frequently in the future?
A - Blah blah. Used more in the future because large virtual memories (e.g.
a 64-bit address space) require too much memory to store the regular page
table. (64 bits, with 1K pages, would require a page table with 2^54
entries!)

------------------------------

Q - Write a monitor solution to the dining philosophers problem. How does
your solution avoid deadlock?

------------------------------

Q - Describe the elevator (SCAN) algorithm for disk requests and explain why
it is preferred over the nearest-request-first algorithm.

------------------------------

Q - Why does Unix have both a per-process open-file table and a system-wide
open-file table?

------------------------------

Q - What is the purpose of the active-file table (active inode table in
UNIX) in a file system?

------------------------------

Q - If a row of the ACCESS MATRIX model represents a capability list, what
information is represented by a column?
A - Access control lists

------------------------------

Q - Singhal mentions four classes of algorithms to perform distributed
deadlock detection. Name these four classes, and distinguish between them.
A - Centralized control (the others are distributed), diffusion computation
(OR model) vs. edge-chasing (AND model), and path-pushing...

------------------------------

======================================================================
Compilers - 10 T/F questions, 10 "short" answer. All questions worth 5 pts.
======================================================================

T/F - If an LR(1) grammar has no shift/reduce conflicts, then the
corresponding LALR grammar will not, either.  TRUE

T/F - Register allocation for an arbitrary-sized section of straight-line
code is NP-complete.  FALSE?? Sundar says TRUE. I am tempted to believe
him. :-)

T/F - If a basic block B has multiple direct predecessors, then one of these
predecessor blocks dominates B.  FALSE

T/F - When using the Gencode algorithm to generate code for a tree-structured
graph, it is better to begin by generating code for the subtree requiring
fewer registers first.  FALSE

T/F - In attribute synthesis, it is easier to do bottom-up parsing than
top-down parsing.  TRUE

T/F - In available-code analysis, we definitely should include the possible
kills [from pointers] into the kill set.  TRUE
[See the notes. All downward analyses enlarge the KILL set as much as
possible, while upward analyses keep it small. Available-code analysis is a
downward analysis, so we try to include everything possible in the kill set,
so TRUE.] [Upward analyses enlarge the GEN set as much as possible.]

T/F - In an interference graph, there is an edge between two nodes if one
node is live while another is defined.  TRUE
[I disagree. I think the question said "... there is an edge between two
nodes whenever one variable is live while the other is being defined." I
interpreted this to mean "if and only if ...", which is FALSE. It is easy
enough to construct an example of two concurrently live variables (whose
nodes are connected by an edge) such that neither variable is live at the
points where the other is defined. (Use separate predecessors to define the
variables.)]

-----------------

Q - Define what it means for J to be a non-basic induction variable. How can
we remove J from a loop and replace it with a basic induction variable when
dealing with affine functions?
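[A hypothetical before/after sketch, not from the exam, of the usual
strength-reduction transformation, assuming J = 3*I + 2 where I is the
loop's basic induction variable.]

    /* Before: J is a non-basic induction variable, an affine function of I. */
    void before(int *a, int n)
    {
        for (int I = 0; I < n; I++) {
            int J = 3 * I + 2;           /* recomputed from I each iteration */
            a[J] = 0;
        }
    }

    /* After strength reduction: J is initialized once before the loop and
       bumped by the constant stride 3, making it a basic induction variable. */
    void after(int *a, int n)
    {
        for (int I = 0, J = 2; I < n; I++, J += 3)
            a[J] = 0;
    }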
-----------------

Q - In a parser generated by YACC, what happens to the syntactic and semantic
portions of the stack during a shift operation?
A - On a shift, yylval gets pushed on the semantic stack and the current
state is pushed on the syntactic stack.

-----------------

Q - In a parser generated by YACC, what happens to the syntactic and semantic
portions of the stack during a reduce operation?
A - On a reduce, the translation record gets pushed on the semantic stack and
the current state is pushed on the syntactic stack.

-----------------

Q - Show how you would implement monitors and conditions (in the Hoare sense)
using semaphores.

-----------------

Q - Describe Chaitin's algorithm for local register allocation.

-----------------

Q - Write the algorithm for live-variable analysis.

-----------------

Q - Construct the DAG for the following code segment:

        d = b * c
        a = a + b
        b = b * c      [the variables here are wrong]
        e = a - d

If the compiler must execute this code on a machine that has only one
register, in what order should the above code be executed? Why?
A - [Cannot have 2 subtrees at the same level.]

-----------------

Q - In producing a parse table, YACC breaks shift/reduce conflicts on the
basis of each rule's precedence. What determines the default precedence of a
grammar rule?
A - The precedence of the rightmost terminal in that rule (if that terminal
has a precedence defined). Otherwise, the precedence of any terminal in that
rule.

-----------------

Q - We have a very nice, very user-friendly C compiler available, which is
self-compiling. However, it is not at all optimizing. We also have access to
an extremely unfriendly optimizing C compiler. Is there anything we can do
to get a fast, user-friendly compiler?
A (Torsten) - Yes. Compile the user-friendly compiler using the optimizing
compiler. [Note that the fact that the user-friendly compiler is
self-compiling implies that it is written in C.] This creates a
non-optimizing, user-friendly compiler that performs compilations much
faster than before. (Note that the C programs it compiles will be identical;
it's just that the process of compiling them will be faster because the
compiler itself was optimized.)
[To address Jan's comments: I am pretty certain that the question asked for
a "fast, user-friendly" compiler, without requiring that it be optimizing.
Also, there was no mention of whether the fast compiler was self-compiling
... it could be written in a language other than C! Given this analysis, it
seems like we have two options: use the unfriendly fast compiler straight
up, or do what I suggested.]
A (Jan) - [Torsten's answer is not the whole solution, since in his solution
the new compiler might run faster, but it still does not generate optimized
code (it only consists of optimized code!)] Jan's answer was to 1) compile
the unfriendly compiler using the slow one; this gives us the optimizing
capability of the new compiler, but that compiler is still itself
unoptimized; 2) now compile the new compiler using itself, enabling all
optimization options; this gives us an unfriendly compiler, but one that is
both optimized itself and able to generate optimized code.

-----------------

Q19 - Given the following grammar, generate the full LR(1) parse table
(there should be 10 states):

        S -> RD | D     ????
        R -> sR | s
        D -> cD | c

For this particular grammar, would it be advantageous to construct an LALR
parser instead?
Why or why not?
A - In this grammar, since there are no LR(1) states that can be combined by
ignoring the lookahead (in other words, no two states have the same core),
there is no advantage to using LALR over LR(1), since the tables are the
same size.

-----------------

Q20 - Parse the string "ssscc" [was that it??] using the parse table for the
grammar from the above question.
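[Not the LR(1) parse the question asks for, but a quick hypothetical check,
not from the exam, that "ssscc" is in the language of the Q19 grammar, which
generates zero or more s followed by one or more c.]

    #include <stdio.h>

    /* Recognize L(S) for  S -> RD | D,  R -> sR | s,  D -> cD | c. */
    static int in_language(const char *w)
    {
        while (*w == 's') w++;        /* optional R: a run of s        */
        if (*w != 'c') return 0;      /* D requires at least one c     */
        while (*w == 'c') w++;
        return *w == '\0';            /* nothing may follow the c run  */
    }

    int main(void)
    {
        printf("ssscc: %s\n", in_language("ssscc") ? "accepted" : "rejected");
        return 0;
    }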