This project is among the first to study the Internet topology and its effect on traffic. The study of the Internet topology has long been neglected by the community. The PIs revived interest in this area with their pioneering work [Faloutsos et al. 1999], showing that despite its apparent randomness, the Internet topology follows surprisingly simple laws: its topological properties can be described by power-laws. As a result, the topology can be characterized concisely by a few numbers, the exponents of these power-laws. An accurate topological model will help us discover how topology affects Internet traffic. This work will enable researchers to: a) use simulation results to estimate the performance of the network, b) design efficient protocols, and c) troubleshoot anomalous behavior.
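To make the power-law characterization concrete, the sketch below fits an exponent to a rank-ordered degree sequence by least squares on log-log scale, which is a standard way such exponents are estimated. This is an illustrative example, not the proposal's own code; the function name and the synthetic data are hypothetical.

```python
import math

def fit_power_law_exponent(degrees):
    """Fit d ∝ rank^R on log-log scale by least squares;
    returns the slope R, i.e. the power-law exponent."""
    ranked = sorted(degrees, reverse=True)
    xs = [math.log(rank) for rank in range(1, len(ranked) + 1)]
    ys = [math.log(d) for d in ranked]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    return cov / var

# Synthetic degrees drawn exactly from d = 100 * rank^(-0.8):
degrees = [100.0 * r ** -0.8 for r in range(1, 201)]
print(round(fit_power_law_exponent(degrees), 3))  # prints -0.8
```

A single fitted exponent like this summarizes an entire degree distribution, which is what makes the power-law description so concise.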

Motivation. Modeling the topology of the Internet is a challenging task due to its size and constant growth. The Internet is a large collection of heterogeneous components interconnected in an arbitrary way. There have been several measurement studies of its topology, but no satisfactory model of the topology exists. This gap is one of the reasons "Why we don't know how to model the Internet" [Paxson and Floyd, 1997]. An important part of the problem is the inability to generate realistic graphs for simulation purposes. Furthermore, the interpretation of results, whether measured or simulated, becomes more difficult when the topology is not taken into account.

The scope of this project is twofold. The first part will analyze the Internet topology and derive simple yet accurate models. This part will provide the models needed for fast and realistic simulations; in addition, it will provide an understanding of the evolution of the network and enable predictions. The second part will quantify the effect of the topology on the traffic and on end-to-end performance. The analysis of traffic data is a challenging data-mining task due to its sheer volume and intricate interdependencies. The work will provide mechanisms and tools to interpret the observed network behavior.

The initial work has laid the groundwork for a systematic understanding of the topology. However, many open questions remain, and this project addresses them.

Topology Analysis and Generation of Realistic Graphs:
This part of the work develops an Internet model, along with methods and tools to generate realistic graphs for simulation purposes. Despite the recent discoveries, a comprehensive model of the Internet topology does not yet exist. We do not have a set of properties that could serve as a necessary and sufficient condition for the realism of a model. We need to develop a comprehensive model and then devise methods to generate realistic graphs. The evolution of the topology can also provide significant intuition about the nature of the network, and it can help estimate and predict the future of the Internet. Note that our study uses the extensive measured topological data collected by NLANR, which is the data repository with the longest time span available.
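As one illustration of graph generation with power-law degree tails, the sketch below grows a graph by preferential attachment (Barabási-Albert style growth). This is a well-known baseline, not the generator this project will develop; the function name and parameters are hypothetical.

```python
import random

def preferential_attachment_graph(n, m, seed=0):
    """Grow an n-node graph: each new node attaches m edges to
    existing nodes with probability proportional to their current
    degree. Such growth produces a power-law degree tail."""
    rng = random.Random(seed)
    # Seed the process with a small clique of m+1 nodes.
    edges = [(i, j) for i in range(m + 1) for j in range(i)]
    # Each endpooint appears in `targets` once per incident edge,
    # so uniform sampling from it is degree-proportional sampling.
    targets = [v for e in edges for v in e]
    for new in range(m + 1, n):
        chosen = set()
        while len(chosen) < m:
            chosen.add(rng.choice(targets))
        for v in chosen:
            edges.append((new, v))
            targets.extend((new, v))
    return edges

edges = preferential_attachment_graph(1000, 2, seed=42)
```

Whether such a generator is "realistic enough" is exactly the question raised above: a generator is only as good as the set of properties used to validate it.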

The Effect of Topology on Traffic:
This project studies the effect of topology on traffic aspects such as congestion and end-to-end delay. There is very limited work on traffic analysis in conjunction with topological properties. The work has two components: a) studying the traffic, and b) identifying the effect of topology. In the first part, we need to collect the appropriate data and develop models for its behavior; here, the work will leverage the efforts of the other research participants. In the second part, we need to identify which traffic properties are the result of topology. For both parts, the size and the type of the data require novel techniques of low complexity. One of the co-PIs is an expert in data mining and brings a large array of novel techniques. He has developed a novel technique to approximate the density of the network (neighborhood size), a quantity that is very expensive to compute exactly.
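To show why approximation is needed, the sketch below computes the neighborhood size exactly: the number of node pairs within h hops, via one BFS per node. The co-PI's approximation technique is not reproduced here; this is only the expensive exact baseline, with hypothetical names, that such techniques avoid on Internet-scale graphs.

```python
from collections import deque

def neighborhood_sizes(adj, max_hops):
    """Exact neighborhood function N(h): ordered node pairs within
    h hops, via one BFS per source. Cost is O(n*(n+m)), which is
    prohibitive on large graphs and motivates approximation."""
    counts = [0] * (max_hops + 1)
    for src in adj:
        dist = {src: 0}
        q = deque([src])
        while q:
            u = q.popleft()
            if dist[u] == max_hops:
                continue
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    q.append(v)
        for d in dist.values():
            if d > 0:
                counts[d] += 1
    # Cumulative count: pairs reachable within h hops.
    return [sum(counts[:h + 1]) for h in range(max_hops + 1)]

# Path graph 0-1-2-3:
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(neighborhood_sizes(adj, 3))  # prints [0, 6, 10, 12]
```

On a graph with millions of nodes, running a full BFS from every node is infeasible, which is precisely the gap that low-complexity approximation techniques fill.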