Network Visualization Example

Top Website Routes

Note: Drag nodes to organize the layout; mouse over a node to see its info (name and IP address)

Legend: (rendered interactively on the page)

Description:

This page demonstrates a D3.js force-directed graph. The network is built from a series of IPv4 traceroutes launched from my home to 25 popular websites. Each route starts at my home node (a node is a distinct IP address) and passes through many intermediate nodes before reaching a website. Several nodes are shared between routes; a shared node indicates either a shared ISP, an ISP interconnect location, or a commercial relationship between the sites (e.g. bing.com and live.com are both Microsoft, and twitter.com and t.co are both Twitter). Beyond entertainment, a practical use for this graph is learning which providers the big sites use to host and carry their traffic.

Code Inspiration:

The D3 graph code was borrowed (and modified) from Mike Bostock's website, http://bl.ocks.org/mbostock. The graph's legend was borrowed (and slightly modified) from this tutorial: http://zeroviscosity.com/d3-js-step-by-step/step-3-adding-a-legend

Data Source:

The data comes from traceroutes run against the top 25 websites. The list of websites was pulled from Alexa.com, http://www.alexa.com/topsites. To run traceroute in Python I borrowed heavily from Phillip Calvin's python-traceroute.py, http://gist.github.com/502451.
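For context, a minimal sketch of turning traceroute output into structured hop data. This is not the actual traceroute2.py code; the sample line format follows the standard Unix traceroute output, and non-responding hops (`* * *` lines) are skipped, which is one source of the "missing entries" mentioned below.

```python
import re

# Matches standard traceroute lines like:
#   " 3  core1.example.net (203.0.113.9)  12.345 ms ..."
# Non-responding hops (" 4  * * *") have no "(ip)" group and are skipped.
HOP_RE = re.compile(r"^\s*(\d+)\s+(\S+)\s+\((\d+\.\d+\.\d+\.\d+)\)")

def parse_traceroute(output):
    """Return a list of (hop_number, hostname, ip) tuples from traceroute output."""
    hops = []
    for line in output.splitlines():
        m = HOP_RE.match(line)
        if m:
            hops.append((int(m.group(1)), m.group(2), m.group(3)))
    return hops
```

Each tuple gives the hop distance from the root plus the node's name and IP address, which is exactly the per-node info shown on mouseover in the graph.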

Data Manipulation:

  • Step 1: The top 25 websites were copied from Alexa into a CSV file (see topsites.csv).
  • Step 2: The CSV file was read by a Python script, which ran a traceroute on each site and stored the results as JSON in a CouchDB database (see traceroute2.py).
  • Step 3: Another Python script pulled the data from the database and saved it as a JSON file for further manipulation (see couch_to_json.py).
  • Step 4: The final script (topsite_to_network2.py) formatted the output of the previous step so it could be consumed by the D3 network graph. Because traceroute isn't always complete (some nodes do not respond to pings), and because some nodes are load balancers or sit within private subnets, the raw data often contains missing, duplicate, and localhost-labelled entries that provide no real information. The script therefore removed duplicates and performed other cleaning operations to end up with informative data, then created the network.json file of node and link data. Groups were assigned based on the endpoints being traced, and endpoints were given a larger radius than intermediate points. The script also assigned different values to the links so that link thickness varies with distance (hops) from the root. I considered eliminating duplicate links, but for this exercise chose to leave them in.
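The Step 4 formatting can be sketched as follows. This is an illustrative reconstruction, not the actual topsite_to_network2.py: the function name, radius values, and link-value formula are assumptions, but it shows the key ideas from the steps above: nodes are deduplicated by IP (so shared hops become shared nodes), each traced site defines a group, endpoints get a larger radius, and link values shrink with hop distance from the root.

```python
def build_network(routes, root="home"):
    """Build D3-style node/link lists from {site: [hop_ip, ...]} routes.

    Illustrative sketch: radii (12 vs 5) and the value formula are
    made-up constants, not the originals.
    """
    index = {}            # name/ip -> node index; deduplicates shared hops
    nodes, links = [], []

    def add_node(name, group, radius):
        if name not in index:
            index[name] = len(nodes)
            nodes.append({"name": name, "group": group, "radius": radius})
        return index[name]

    root_idx = add_node(root, 0, 12)
    for group, (site, hops) in enumerate(routes.items(), start=1):
        prev = root_idx
        for dist, ip in enumerate(hops, start=1):
            is_endpoint = dist == len(hops)        # traced site itself
            cur = add_node(ip, group, 12 if is_endpoint else 5)
            links.append({"source": prev, "target": cur,
                          "value": max(1, 10 - dist)})  # thinner farther out
            prev = cur
    return {"nodes": nodes, "links": links}
```

Serializing the returned dict with `json.dump` would yield a network.json in the node/link shape that D3's force layout expects.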