Pseudocode for Algorithms 2 and 3 Implemented in Matlab
I used the following group of Matlab m-files as my prototype "proof of
concept" implementation to make sure the pseudocode shown in Alg2 and
Alg3 in the paper actually worked (without off-by-one errors or other
simple
bugs). It's not pretty, and I'm simulating
pointers by a structure array, but oh well....
There are two global variables used by the program:
- cache - a structure
array where each element represents the set of all
local variables at one node of the decision tree/cache.
- free - an integer
representing the first unallocated entry in the cache.
Since Matlab arrays use index origin 1,
cache(1) represents a "dummy"
root node for the tree, where we have seen the first zero packets of a
busy period, and its children will be other cache nodes representing
different packet lengths for the packet of a busy period. Thus the mfile
restartCache.m
erases the previous
cache
(if any), initializes (only) the dummy
root node, and assigns
free=2
to indicate the first unused index in the cache array. (Note that I
don't need to set an upper bound to the size of the cache array in
Matlab, because an assignment to the i-th array element will grow the
upper bound of the array to i, if necessary, to make sure the
assignment does not cause an subscript-out-of-range error.) Every time
you execute this mfile, it erases the old
variables and recreates them from scratch. Therefore, you need to type
the following n the command window:
global cache free
This makes these global variables visible to you when you type
interactively into the
command window. Otherwise, these global variables are accessible (only)
inside the
m-files (since each of them includes this global declaration in the
beginning).
The next interesting file is
incrementalqi.m
The main loop of this script prompts the user to input a vector of
packet lengths [
x1 x2 ...
xn] that represents the
next busy period. The script then goes to the root of the decision
tree/cache and attempts to follow a path through the tree where the
i-th step follows an edge from the
"current node" to its "child" for packet length
xi. As long as the
current node already has a child for the required packet length, the
script does nothing except move forward by one step. However, if
processes the packets one at a time, one at a
time to build the cache. The user must also say if this the child does
not already exist, it will be created now. At the end of the busy
period, script outputs the average delay for packets in this period,
using data stored at the current node. The script then offers to
display the contents of cache, and then prompts the user to either exit
the script or continue processing another busy period.
Note that this script always ADDS TO the existing cache. Before running
it the first time you must execute
restartCache. Similarly, when
the script terminates the contents of the cache is NOT erased, so you
can examine it in more detail between runs, and the next execution of
the script will build on the existing cache, rather than starting over
from scratch.
The body of this script
includes calls to three other mfiles:
doesChildExist.m
which naively searches the array of children for the current node for a
node whose packet size matches the size of the next packet. This is
probably a great place to find some performance improvements since (a)
I insert the children into the list in the order in which they appear
in the trace file (unrelated to their respective packet sizes), and (b)
I do a linear search through the list to look for a match. Since the
maximum number of distinct packet sizes is finite (i.e., for Etherent
they must be an integer number of bytes between 64 and 1500) then from
a complexity theoretic point of view this function is still O(1) with
respect to the length of the trace. Nevertheless, it is clearly
horrible from a practical point of view and should be replaced by
something much faster -- a heap or hash table -- which can do the
search and update operations with a time complexity of at most
log(#children of this node).
initializeChild.m
which initializes the basic set of local variables for a new node (such
as the current packet size
xi, cumulative sum of packet sizes up
to this point
Xi,
etc. It also calls the recursive function
evalGH2.m
which initializes the specific rows from the triangular cache tables
for the functions g(.) and h(.) that are stored at the current node.
What this function is attempting to do is add a new "top element" to
the triangular table -- which corresponds to adding the element (5,5)
to the table shown in Fig.1 from the paper. Because each cache row is
stored at a different node in the decision tree, the new row only
contains a single element (for which
k==i)
for
each
function g(.) and h(.). However, in order to generate that
element, we must perform the convolution over the two elements in the
previous row -- which is stored at the parent of the current node in
the decision tree. We know that the g(.) and h(.) vectors at the parent
node must already include every element we need except the last one
(i.e., at the previous "step" we would have already generated cache
element (4,4) if necessary, so element (4,5) is the only one that could
be missing from the previous row). If the last element from the
previous row is indeed missing, the function makes a recursive call to
itself to create it -- which in turn may trigger a recursive call to
create the element (3,5) directly below it, and so on.
Finally, the script
drawCache.m
dumps the contents of the tree in the form of an indented outline
(i.e., sibling nodes are directly below one another and children are
indented with respect to their parent). This script is currently called
by the main script
incrementalqi
at the end of each busy period, but you can call it whenever you want.
It has one parameter (the array index of the roots node for the subtree
to be displayed). To avoid clutter, it only displays the scalar values
stored at each node. There are also three vector fields in the node
structure: the row vectors
cache(current).g[
] and
cache(current).h[ ]
that are part of the "triangular cache" required for each busy period,
and
cache(current).child[ ]
which holds a list of the children of the current node.
To see those
vector fields you need to exit the main script and just type in the
fields you're interested in -- kind of like working with a debugger.
Alternately, I just created the following function
showMyCache.m
which walks through the persistent cache structure and displays (only)
the triangular cache tables for functions g(.) and h(.) for specific
busy period, using its vector of packet sizes as an input parameter.
Note that it assumes that the information for the specified busy period
is already in the cache. Bad things will happen if you ask it to
display the data for a never-before-seen packet-size vector!
Oh, I almost forgot. I had to write a tiny little helper function called
atZero.m
which has one input parameter (which would be an array index if the
language supports
index origin 0, like C) and returns its input +1 (which is the adjusted
array index --
shifted up by one -- to fit a language that wants index origin 1, like
Matlab). The reason for this hack is that I think index origin 0 is
perfectly fine for the pseudocode, and makes everything look much
cleaner than trying to use index origin one. By introducing the helper
function, I don't need to worry about messing up the something in the
conversion between the two sets of array bounds.
The reason that index origin 0 works so much better than index origin 1
is that I did some transformations of the subscripts for the triangular
cache. Originally, the
smallest legal column number in each cache row is the row number, so as
we get deeper into the busy period the cache rows get shorter and
shorter but move away from the origin. I can prevent this by
subtracting the row number from the column number so every cache row
has zero as its smallest element. That works great in C but not Matlab,
and hence the little helper function.
A Numerical Example
Suppose our trace data includes the following four busy periods:
- [1 2 3]
- [1 2 4 3]
- [2 2 4]
- [1 2 4 5 6]
Here is a transcript from the Matlab command window. Data generated by
the Matlab interpreter is shown in black, including the command prompt
(">>"). User input is shown in
red.
>>
restartCache
>>
global cache free
>>
incrementalqi
Enter a vector of packet sizes for the next busy period
[1 2 3]
Num Packets =, Average delay =
3.0000 2.6000
Do you want to see the contenst of the cache? (1=yes, 0=no)
1
1: x= 0, X=0, D=0, A=0, T=0
2: x= 1, X=1, D=1, A=5.000000e-01, T=1
3: x= 2, X=3, D=4, A=2.200000e+00,
T=1.750000e+00
4: x= 3, X=6, D=10, A=5.859375e+00,
T=2.600000e+00
Do you want to terminate the program now? (1=yes, 0=no)
0
Enter a vector of packet sizes for the next busy period
[1 2 4 3]
Num Packets =, Average delay =
4.0000 3.6503
Do you want to see the contenst of the cache? (1=yes, 0=no)
0
Do you want to terminate the program now? (1=yes, 0=no)
0
Enter a vector of packet sizes for the next busy period
[2 2 4]
Num Packets =, Average delay =
3.0000 3.5556
Do you want to see the contenst of the cache? (1=yes, 0=no)
1
1: x= 0, X=0, D=0, A=0, T=0
2: x= 1, X=1, D=1, A=5.000000e-01, T=1
3: x= 2, X=3, D=4, A=2.200000e+00,
T=1.750000e+00
4: x= 3, X=6, D=10, A=5.859375e+00,
T=2.600000e+00
4: x= 4, X=7, D=11, A=6.398734e+00,
T=2.933333e+00
5: x= 3, X=10, D=21,
A=1.281803e+01, T=3.650316e+00
2: x= 2, X=2, D=2, A=1, T=2
3: x= 2, X=4, D=6, A=3.333333e+00,
T=2.500000e+00
4: x= 4, X=8, D=14, A=8.280000e+00,
T=3.555556e+00
Do you want to terminate the program now? (1=yes, 0=no)
0
Enter a vector of packet sizes for the next busy period
[1 2 4 5 6]
Num Packets =, Average delay =
5.0000 5.3980
Do you want to see the contenst of the cache? (1=yes, 0=no)
1
1: x= 0, X=0, D=0, A=0, T=0
2: x= 1, X=1, D=1, A=5.000000e-01, T=1
3: x= 2, X=3, D=4, A=2.200000e+00,
T=1.750000e+00
4: x= 3, X=6, D=10, A=5.859375e+00,
T=2.600000e+00
4: x= 4, X=7, D=11, A=6.398734e+00,
T=2.933333e+00
5: x= 3, X=10, D=21,
A=1.281803e+01, T=3.650316e+00
5: x= 5, X=12, D=23,
A=1.400988e+01, T=4.150316e+00
6: x= 6, X=18, D=41,
A=2.592458e+01, T=5.398024e+00
2: x= 2, X=2, D=2, A=1, T=2
3: x= 2, X=4, D=6, A=3.333333e+00,
T=2.500000e+00
4: x= 4, X=8, D=14, A=8.280000e+00,
T=3.555556e+00
Do you want to terminate the program now? (1=yes, 0=no)
1
Goodbye!
>>
showMyCache([1 2 4 3])
Triangular table for function G
6:
74.88
5:
13.17
35.38
3:
2.5
3.167
2.708
2:
1 0.5
0.1667 0.04167
Triangular table for function H
6:
959.8
5:
84.25
371.3
3:
5.5
12.25
14.92
2:
0.5
0.5 0.25 0.08333
>>
showMyCache([1 2 4])
Triangular table for function G
5:
13.17
3:
2.5
3.167
2:
1 0.5
0.1667
Triangular table for function H
5:
84.25
3:
5.5
12.25
2:
0.5
0.5 0.25
>>
showMyCache([1 2 4 5 6])
Triangular table for function G
11:
1013
10:
101.2
406.1
5:
13.17
35.38
64.59
3:
2.5
3.167
2.708
1.758
2:
1 0.5
0.1667 0.04167 0.008333
Triangular table for function H
11:
2.627e+04
10:
1418
8653
5:
84.25
371.3
936.4
3:
5.5
12.25
14.92
12.52
2:
0.5
0.5 0.25
0.08333 0.02083
>>