Improving WWW Traffic Characteristics

As the main Internet application, the WWW strongly influences the quality characteristics of the underlying network infrastructure, such as end-to-end latency, available bandwidth, jitter, and failure rate. In IP networks, the resulting variance of these characteristics is high. As a consequence, it is difficult to adapt the network to fast-changing demands. This high degree of uncertainty can be reduced by enabling network providers to integrate caches into sensitive network areas semi-automatically and on demand. By allowing caches to actively replicate a given set of documents, the variance in network effects can be smoothed out further.

Provided that a sufficiently high hit rate can be achieved, the WWW and the underlying network benefit from an extensive deployment of a cache and replication infrastructure in several ways: (1) reduced network latency can be observed by the client or user, (2) the cumulative usage of the overall network capacity is optimized, and (3) the availability of cached documents is increased on certain paths in the network.
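The dependence of these benefits on the hit rate can be illustrated with a small calculation. The following Python sketch is only a first-order approximation and not the model used in our work: the function names and the latency figures are hypothetical, and it assumes that every request is either served from the cache or forwarded to the primary server.

```python
def expected_latency(hit_rate, hit_latency_ms, miss_latency_ms):
    """Mean per-request latency for a given cache hit rate.

    Assumption: miss_latency_ms already includes the detour via the cache.
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must lie in [0, 1]")
    return hit_rate * hit_latency_ms + (1.0 - hit_rate) * miss_latency_ms


def origin_request_fraction(hit_rate):
    """Fraction of requests that still travel to the primary server."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must lie in [0, 1]")
    return 1.0 - hit_rate


if __name__ == "__main__":
    # Hypothetical figures: 20 ms from the cache, 200 ms from the origin.
    for rate in (0.0, 0.25, 0.5, 0.75):
        print(rate, expected_latency(rate, 20.0, 200.0))
```

Under these assumptions the mean latency falls linearly with the hit rate, which is why a low hit rate leaves most of the potential benefit unused.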

Caching in the context of the WWW is often accomplished by so-called proxy servers. In most cases, the hit rates achieved are very low compared to caching in other areas, e.g. instruction and data caches on a microprocessor. This is because the sheer number of documents, servers, paths, and clients complicates the determination of reference distributions and prohibits the use of simple caching schemes. Cache management becomes even more complex because document sizes differ and a high latency penalty is incurred whenever the consistency of cached data has to be checked. By replicating frequently accessed documents, some of the problems with caches can be alleviated. First of all, the end-to-end latency of checking cached data for staleness is reduced immediately. In many situations, it is also possible to postpone the update of changed documents to time slots with relatively low network traffic (depending on the geographic location of primary servers and replica servers).
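The two complications named above, differing document sizes and the need for consistency checks, can be made concrete with a toy proxy cache. The Python sketch below is purely illustrative and is not one of the schemes developed in our group: the class `ProxyCache`, its byte-bounded least-recently-used eviction, and the fixed freshness lifetime `ttl_seconds` are assumptions chosen for brevity.

```python
from collections import OrderedDict


class ProxyCache:
    """Toy byte-bounded LRU cache with one fixed freshness lifetime."""

    def __init__(self, capacity_bytes, ttl_seconds):
        self.capacity_bytes = capacity_bytes
        self.ttl_seconds = ttl_seconds
        self.used_bytes = 0
        self.entries = OrderedDict()  # url -> (size_bytes, stored_at)

    def lookup(self, url, now):
        """Return 'hit', 'stale', or 'miss' for a request at time `now`."""
        entry = self.entries.get(url)
        if entry is None:
            return "miss"
        self.entries.move_to_end(url)  # mark as most recently used
        _, stored_at = entry
        if now - stored_at > self.ttl_seconds:
            return "stale"  # a consistency check with the origin is needed
        return "hit"

    def store(self, url, size_bytes, now):
        """Insert or refresh a document, evicting least recently used ones."""
        if size_bytes > self.capacity_bytes:
            return False  # the document can never fit into this cache
        if url in self.entries:
            self.used_bytes -= self.entries.pop(url)[0]
        # Large documents may displace several small ones.
        while self.used_bytes + size_bytes > self.capacity_bytes:
            _, (evicted_size, _) = self.entries.popitem(last=False)
            self.used_bytes -= evicted_size
        self.entries[url] = (size_bytes, now)
        self.used_bytes += size_bytes
        return True
```

Every 'stale' result stands for a round trip to the primary server. With a nearby replica server, that check travels a shorter path, which is the latency reduction described above.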

Past research in our group has focused on the development of different caching and replication schemes and on the prototypical implementation of enhanced HTTP servers. A modified Apache server has also been developed in order to conduct further experiments. In the near future, we intend to model the influence of an enhanced caching and replication infrastructure using spatial variants of BMAPs in order to quantify the improvements in quality characteristics. We also intend to derive heuristics to improve caching strategies and to automate the deployment of replica servers in critical areas of the network.