Matlab version of Pseudocode

Pseudocode for Algorithms 2 and 3 Implemented in Matlab

I used the following group of Matlab m-files as my prototype "proof of concept" implementation to make sure the pseudocode shown in Alg2 and Alg3 in the paper actually worked (without off-by-one errors or other simple
bugs). It's not pretty, and I'm simulating pointers by a structure array, but oh well....

There are two global variables used by the program:

cache - a structure array where each element represents the set of all local variables at one node of the decision tree/cache.
free - an integer representing the first unallocated entry in the cache.

Since Matlab arrays use index origin 1, cache(1) represents a "dummy" root node for the tree, where we have seen the first zero packets of a busy period, and its children will be other cache nodes representing different packet lengths for the packet of a busy period. Thus the mfile

restartCache.m

erases the previous cache (if any), initializes (only) the dummy root node, and assigns free=2 to indicate the first unused index in the cache array. (Note that I don't need to set an upper bound to the size of the cache array in Matlab, because an assignment to the i-th array element will grow the upper bound of the array to i, if necessary, to make sure the assignment does not cause an subscript-out-of-range error.) Every time you execute this mfile, it erases the old variables and recreates them from scratch. Therefore, you need to type the following n the command window:

global cache free

This makes these global variables visible to you when you type interactively into the command window. Otherwise, these global variables are accessible (only) inside the m-files (since each of them includes this global declaration in the beginning).

The next interesting file is

incrementalqi.m

The main loop of this script prompts the user to input a vector of packet lengths [x₁ x₂ ... x_n] that represents the next busy period. The script then goes to the root of the decision tree/cache and attempts to follow a path through the tree where the i-th step follows an edge from the "current node" to its "child" for packet length x_i. As long as the current node already has a child for the required packet length, the script does nothing except move forward by one step. However, if processes the packets one at a time, one at a time to build the cache. The user must also say if this the child does not already exist, it will be created now. At the end of the busy period, script outputs the average delay for packets in this period, using data stored at the current node. The script then offers to display the contents of cache, and then prompts the user to either exit the script or continue processing another busy period.

Note that this script always ADDS TO the existing cache. Before running it the first time you must execute restartCache. Similarly, when the script terminates the contents of the cache is NOT erased, so you can examine it in more detail between runs, and the next execution of the script will build on the existing cache, rather than starting over from scratch.

The body of this script includes calls to three other mfiles:

doesChildExist.m

which naively searches the array of children for the current node for a node whose packet size matches the size of the next packet. This is probably a great place to find some performance improvements since (a) I insert the children into the list in the order in which they appear in the trace file (unrelated to their respective packet sizes), and (b) I do a linear search through the list to look for a match. Since the maximum number of distinct packet sizes is finite (i.e., for Etherent they must be an integer number of bytes between 64 and 1500) then from a complexity theoretic point of view this function is still O(1) with respect to the length of the trace. Nevertheless, it is clearly horrible from a practical point of view and should be replaced by something much faster -- a heap or hash table -- which can do the search and update operations with a time complexity of at most log(#children of this node).

initializeChild.m

which initializes the basic set of local variables for a new node (such as the current packet size x_i, cumulative sum of packet sizes up to this point X_i, etc. It also calls the recursive function

evalGH2.m

which initializes the specific rows from the triangular cache tables for the functions g(.) and h(.) that are stored at the current node. What this function is attempting to do is add a new "top element" to the triangular table -- which corresponds to adding the element (5,5) to the table shown in Fig.1 from the paper. Because each cache row is stored at a different node in the decision tree, the new row only contains a single element (for which k==i) for each function g(.) and h(.). However, in order to generate that element, we must perform the convolution over the two elements in the previous row -- which is stored at the parent of the current node in the decision tree. We know that the g(.) and h(.) vectors at the parent node must already include every element we need except the last one (i.e., at the previous "step" we would have already generated cache element (4,4) if necessary, so element (4,5) is the only one that could be missing from the previous row). If the last element from the previous row is indeed missing, the function makes a recursive call to itself to create it -- which in turn may trigger a recursive call to create the element (3,5) directly below it, and so on.

Finally, the script

drawCache.m

dumps the contents of the tree in the form of an indented outline (i.e., sibling nodes are directly below one another and children are indented with respect to their parent). This script is currently called by the main script incrementalqi at the end of each busy period, but you can call it whenever you want. It has one parameter (the array index of the roots node for the subtree to be displayed). To avoid clutter, it only displays the scalar values stored at each node. There are also three vector fields in the node structure: the row vectors cache(current).g[ ] and cache(current).h[ ] that are part of the "triangular cache" required for each busy period, and cache(current).child[ ] which holds a list of the children of the current node.

To see those vector fields you need to exit the main script and just type in the fields you're interested in -- kind of like working with a debugger. Alternately, I just created the following function

showMyCache.m

which walks through the persistent cache structure and displays (only) the triangular cache tables for functions g(.) and h(.) for specific busy period, using its vector of packet sizes as an input parameter. Note that it assumes that the information for the specified busy period is already in the cache. Bad things will happen if you ask it to display the data for a never-before-seen packet-size vector!

Oh, I almost forgot. I had to write a tiny little helper function called

atZero.m

which has one input parameter (which would be an array index if the language supports index origin 0, like C) and returns its input +1 (which is the adjusted array index -- shifted up by one -- to fit a language that wants index origin 1, like Matlab). The reason for this hack is that I think index origin 0 is perfectly fine for the pseudocode, and makes everything look much cleaner than trying to use index origin one. By introducing the helper function, I don't need to worry about messing up the something in the conversion between the two sets of array bounds.

The reason that index origin 0 works so much better than index origin 1 is that I did some transformations of the subscripts for the triangular cache. Originally, the smallest legal column number in each cache row is the row number, so as we get deeper into the busy period the cache rows get shorter and shorter but move away from the origin. I can prevent this by subtracting the row number from the column number so every cache row has zero as its smallest element. That works great in C but not Matlab, and hence the little helper function.

A Numerical Example

Suppose our trace data includes the following four busy periods:

[1 2 3]
[1 2 4 3]
[2 2 4]
[1 2 4 5 6]

Here is a transcript from the Matlab command window. Data generated by the Matlab interpreter is shown in black, including the command prompt (">>"). User input is shown in red.

>> restartCache
>> global cache free
>> incrementalqi
Enter a vector of packet sizes for the next busy period [1 2 3]
Num Packets =, Average delay =
    3.0000    2.6000

Do you want to see the contenst of the cache? (1=yes, 0=no) 1
   1: x= 0, X=0, D=0, A=0, T=0
    2: x= 1, X=1, D=1, A=5.000000e-01, T=1
     3: x= 2, X=3, D=4, A=2.200000e+00, T=1.750000e+00
      4: x= 3, X=6, D=10, A=5.859375e+00, T=2.600000e+00
Do you want to terminate the program now? (1=yes, 0=no) 0
Enter a vector of packet sizes for the next busy period [1 2 4 3]
Num Packets =, Average delay =
    4.0000    3.6503

Do you want to see the contenst of the cache? (1=yes, 0=no) 0
Do you want to terminate the program now? (1=yes, 0=no) 0
Enter a vector of packet sizes for the next busy period [2 2 4]
Num Packets =, Average delay =
    3.0000    3.5556

Do you want to see the contenst of the cache? (1=yes, 0=no) 1
   1: x= 0, X=0, D=0, A=0, T=0
    2: x= 1, X=1, D=1, A=5.000000e-01, T=1
     3: x= 2, X=3, D=4, A=2.200000e+00, T=1.750000e+00
      4: x= 3, X=6, D=10, A=5.859375e+00, T=2.600000e+00
      4: x= 4, X=7, D=11, A=6.398734e+00, T=2.933333e+00
       5: x= 3, X=10, D=21, A=1.281803e+01, T=3.650316e+00
    2: x= 2, X=2, D=2, A=1, T=2
     3: x= 2, X=4, D=6, A=3.333333e+00, T=2.500000e+00
      4: x= 4, X=8, D=14, A=8.280000e+00, T=3.555556e+00
Do you want to terminate the program now? (1=yes, 0=no) 0
Enter a vector of packet sizes for the next busy period [1 2 4 5 6]
Num Packets =, Average delay =
    5.0000    5.3980

Do you want to see the contenst of the cache? (1=yes, 0=no) 1
   1: x= 0, X=0, D=0, A=0, T=0
    2: x= 1, X=1, D=1, A=5.000000e-01, T=1
     3: x= 2, X=3, D=4, A=2.200000e+00, T=1.750000e+00
      4: x= 3, X=6, D=10, A=5.859375e+00, T=2.600000e+00
      4: x= 4, X=7, D=11, A=6.398734e+00, T=2.933333e+00
       5: x= 3, X=10, D=21, A=1.281803e+01, T=3.650316e+00
       5: x= 5, X=12, D=23, A=1.400988e+01, T=4.150316e+00
        6: x= 6, X=18, D=41, A=2.592458e+01, T=5.398024e+00
    2: x= 2, X=2, D=2, A=1, T=2
     3: x= 2, X=4, D=6, A=3.333333e+00, T=2.500000e+00
      4: x= 4, X=8, D=14, A=8.280000e+00, T=3.555556e+00
Do you want to terminate the program now? (1=yes, 0=no) 1
Goodbye!

>> showMyCache([1 2 4 3])
Triangular table for function G
6:                                        74.88
5:                             13.17      35.38
3:                    2.5      3.167      2.708
2:           1        0.5     0.1667    0.04167
Triangular table for function H
6:                                        959.8
5:                             84.25      371.3
3:                    5.5      12.25      14.92
2:         0.5        0.5       0.25    0.08333
>> showMyCache([1 2 4])
Triangular table for function G
5:                             13.17
3:                    2.5      3.167
2:           1        0.5     0.1667
Triangular table for function H
5:                             84.25
3:                    5.5      12.25
2:         0.5        0.5       0.25
>> showMyCache([1 2 4 5 6])
Triangular table for function G
11:                                                    1013
10:                                        101.2      406.1
5:                             13.17      35.38      64.59
3:                    2.5      3.167      2.708      1.758
2:           1        0.5     0.1667    0.04167   0.008333
Triangular table for function H
11:                                               2.627e+04
10:                                         1418       8653
5:                             84.25      371.3      936.4
3:                    5.5      12.25      14.92      12.52
2:         0.5        0.5       0.25    0.08333    0.02083
>>