CS204 Fall 2003

Learning to Use Traffic Models

Due: 11:59pm Friday November 7, 2003

I. Summary

The original dataset used by Leland et al. to support their groundbreaking work on self-similarity in Ethernet traffic is publically available on the web through the Internet Traffic Archive. You will be applying some of the network traffic analysis methods we covered in lecture to a small subset of these traces to see first-hand what real data analysis looks like

II. Description of the BC-pAug89.TL Trace file

The measurement was done at the Bellcore Morristown Research and Engineering  facility, on the computing lab's Ethernet, which carried primarily local traffic,  but also all traffic between Bellcore and the Internet. These traces each contain a million packet arrivals seen on the Ethernet.

The traces are in 2-column ASCII format. The first column gives a floating-point  time stamp representing the time in seconds since the start of a trace. The second  column gives the Ethernet data length in bytes. Although the times are expressed to  6 places after the decimal point, giving the appearance of microsecond resolution,  the hardware clock had an actual resolution of 4 microseconds. The length field does  not include the Ethernet preamble, header, or CRC.

The trace BC-pAug89.TL began at 11:25 on August 29, 1989, and ran for about  3142.82 seconds (until 1,000,000 packets had been captured).  

One million packets is too much to deal with for a homework assignment, so I have created two smaller files, each with 50,000 sample points, covering two consecutive time intervals

III. Tasks

  1. Estimate the Network Calculus-style minimal arrival curve that fits the two segments of trace data. Recall that you will need to create a cumulative input function, R(t), then perform the sub-additive deconvolution of  R(t) with itself to find the minimal arrival curve. Since (de-)convolution is an O(n2) operation for a sequence of length n, you should carry out the calculation using various discrete step sizes (eg., look R(t) only when t is a multiple of 25 seconds, or 5 seconds, or 1 second, say, rather than all possible times, say). 
  2. Use at least three of the SELFIS estimators to calculate the Hurst parameter,H, for this data. Recall that the analysis of self-similarity and/or long-range dependence assumes stationarity in the data set, so this time you must work with a time series of incremental, rather than cumulative values. In other words, you need to work with the input during the jth fixed-length interval,  rj  =R(j·c) - R((j-1c), rather than the cumulative input up to time  j·c. This requires you to partition the trace into a sequence of "bins", each with a fixed length of c seconds. This requires some caution, especially when c is small (10 msec or 100 msec, say), because all the bits that belong to large packet do not arrive instantaneously!  Since the data rate for a 10 Mbps Ethernet is 107 bits/sec, the elapsed time between the start and end of transmission for a 1500-byte packet is (12,000 bits)/(107 bits/sec) = 1.2 msec. However, since there is only one timestamp given for each packet, you need to determine whether this represents the time at which its first or last bit was transmitted, and if the transmission interval crosses the boundary between "bins" how many of the bits should be credited to each of them. (HINT: Look in the trace files for closely-spaced packet arrivals and see if you can find places where one interpretation makes sense and the other is impossible. For example, does the start of transmission for the (n+1)st  packet seem to happen before the end of transmission for the nth packet?)

IV. What to turn in

  1. For the minimal arrival curve, give me a graph (PDF preferred) showing the cumulative input function, R(t), together with the minimal arrival curve that you computed with different discrete step sizes. Also, give me a copy of the program you used to generate the minimal arrival curve (C/C++, matlab, MS excel, etc) together with a README file on how to use it.
  2. For the Hurst parameter estimation, give me a README file that explains which estimators you used, the value(s) of H you found using each method, sample input/output files and instructions for generating the results using SELFIS. Be sure to test the estimators on the full 100,000 sample dataset along with several smaller datasets to see how stable the estimate of H is over time. You should also experiment with several choices for the "bin" size, c.