Posted 8/17/16
Traceroute is a utility for discovering the hops between your computer and a destination server. It is commonly used for diagnosing network problems and, together with ping, makes up the majority of ICMP traffic.
It is also a piece of black magic that exploits the darkest turns of networking to succeed.
At its core, traceroute is simple:
1. Send a packet to the destination with a time-to-live of 1.
2. When the packet inevitably fails to deliver, the furthest computer it reached sends back a “time exceeded” packet.
3. Now try again with a time-to-live of 2…
This continues until a packet finally does reach the destination, at which point the traceroute is finished. The time-to-live of the successful packet tells you how many hops away the destination server is, and the source addresses of the “time exceeded” packets give you the address of each router between you and the destination. The result is the familiar output:
% traceroute google.com
traceroute to google.com (216.58.194.174), 64 hops max, 52 byte packets
1 10.0.0.1 (10.0.0.1) 1.274 ms 0.944 ms 0.897 ms
2 96.120.89.65 (96.120.89.65) 19.004 ms 8.942 ms 8.451 ms
3 be-20005-sur04.rohnertpr.ca.sfba.comcast.net (162.151.31.169) 9.279 ms 9.269 ms 8.939 ms
4 hu-0-2-0-0-sur03.rohnertpr.ca.sfba.comcast.net (68.85.155.233) 9.546 ms 9.101 ms 9.935 ms
5 be-206-rar01.rohnertpr.ca.sfba.comcast.net (68.85.57.101) 9.197 ms 9.214 ms 9.443 ms
6 hu-0-18-0-0-ar01.santaclara.ca.sfba.comcast.net (68.85.154.57) 12.564 ms
hu-0-18-0-4-ar01.santaclara.ca.sfba.comcast.net (68.85.154.105) 11.646 ms
hu-0-18-0-1-ar01.santaclara.ca.sfba.comcast.net (68.85.154.61) 13.703 ms
7 be-33651-cr01.sunnyvale.ca.ibone.comcast.net (68.86.90.93) 12.517 ms 12.109 ms 14.443 ms
8 hu-0-14-0-0-pe02.529bryant.ca.ibone.comcast.net (68.86.89.234) 12.600 ms 12.057 ms
hu-0-14-0-1-pe02.529bryant.ca.ibone.comcast.net (68.86.89.230) 12.915 ms
9 66.208.228.70 (66.208.228.70) 12.188 ms 12.207 ms 12.031 ms
10 72.14.232.136 (72.14.232.136) 13.493 ms 13.934 ms 13.812 ms
11 64.233.175.249 (64.233.175.249) 12.241 ms 13.006 ms 12.694 ms
12 sfo07s13-in-f14.1e100.net (216.58.194.174) 12.841 ms 12.563 ms 12.410 ms
Well that doesn’t look so bad! It gets a little messier if there are multiple paths between you and the host, particularly if the paths are of different lengths. Most traceroute implementations deal with this by sending three probes per TTL value, which reveals some (though not necessarily all) of the alternative paths during the scan. You can see this under hops ‘6’ and ‘8’ in the above example, where the probes were answered by different routers.
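The TTL loop described above, with three probes per hop, can be sketched in a few lines of Python. The `send_probe` callable and the fake three-hop path are stand-ins invented for illustration, so the sketch runs without a network or root access; a real version would send packets and wait for ICMP replies.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical probe signature: given a TTL, return
# (responder_address, reached_destination), or None on timeout.
ProbeFn = Callable[[int], Optional[Tuple[str, bool]]]


def trace(send_probe: ProbeFn, max_hops: int = 64,
          probes_per_hop: int = 3) -> List[List[Optional[str]]]:
    """Return, for each TTL, the address that answered each probe."""
    hops: List[List[Optional[str]]] = []
    for ttl in range(1, max_hops + 1):
        answers: List[Optional[str]] = []
        reached = False
        for _ in range(probes_per_hop):
            reply = send_probe(ttl)
            if reply is None:
                answers.append(None)  # real traceroute prints "*" here
                continue
            address, is_destination = reply
            answers.append(address)
            reached = reached or is_destination
        hops.append(answers)
        if reached:
            break  # a probe got all the way through, so we are done
    return hops


# A fake three-hop network standing in for real sockets (illustrative only).
FAKE_PATH = ["10.0.0.1", "96.120.89.65", "216.58.194.174"]


def fake_probe(ttl: int) -> Optional[Tuple[str, bool]]:
    index = min(ttl, len(FAKE_PATH)) - 1
    return FAKE_PATH[index], index == len(FAKE_PATH) - 1


if __name__ == "__main__":
    for ttl, answers in enumerate(trace(fake_probe), start=1):
        print(ttl, answers)
```

Because each probe at a given TTL is recorded separately, a hop that answers from several addresses (like hops 6 and 8 above) simply shows up as a list with more than one distinct entry.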
The real dark side is in how to support multiple traceroutes at once.
When you send a TCP or UDP packet it includes four key pieces of information:
Source IP (so the other side can write back)
Source Port (so you can distinguish between multiple network connections, usually randomly chosen)
Destination IP (so your router knows where to send the packet)
Destination Port (so the server knows what service the packet is for)
TCP also includes a sequence number, allowing packets to be reorganized if they arrive in the wrong order. UDP has no sequence number and makes no ordering guarantee: datagrams are handed to the application in whatever order they arrive, if they arrive at all.
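To make those four pieces concrete, here is a minimal sketch of a UDP probe socket using Python's standard `socket` module. The function name `open_probe_socket` is made up for this example. The source IP and port come from `bind`, the destination IP and port from `sendto`, and the time-to-live is set with the `IP_TTL` socket option.

```python
import socket


def open_probe_socket(ttl: int, source_port: int = 0) -> socket.socket:
    """Create a UDP socket whose outgoing packets carry the given TTL."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_TTL, ttl)
        sock.bind(("0.0.0.0", source_port))  # port 0 lets the OS choose
    except OSError:
        sock.close()
        raise
    return sock


if __name__ == "__main__":
    probe = open_probe_socket(ttl=1)
    try:
        # 192.0.2.1 is a documentation-only placeholder address, and
        # 33434 is the port traceroute conventionally starts from.
        probe.sendto(b"", ("192.0.2.1", 33434))
        print("source port:", probe.getsockname()[1])
    finally:
        probe.close()
```

Sending like this needs no special privileges; it is reading the ICMP replies that gets complicated.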
However, ICMP is a little different. It includes the source and destination addresses, but has no concept of a “port number” for sending or receiving.
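Traceroute can send probes in TCP, UDP, or ICMP format, but the routers along the way always answer with ICMP “TIME EXCEEDED” messages. (Only the final destination replies differently, for example with an ICMP “port unreachable” message for a UDP probe.)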
So if ICMP responses don’t include port numbers, how can your computer distinguish between responses meant for different traceroutes?
The trick is in a minor detail of the ICMP specification. A Time Exceeded message includes a type (11 for time exceeded), a code (0 or 1 depending on the reason the time was exceeded), a checksum, and a copy of the original packet’s IP header followed by the first 8 bytes of its payload.
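As a rough illustration of that layout, the sketch below unpacks the fixed 8-byte ICMP header (type, code, checksum, and 4 unused bytes) and returns whatever follows as the quoted original packet. It assumes the outer IP header has already been stripped from the received bytes; the names `TimeExceeded` and `parse_time_exceeded` are invented for this example.

```python
import struct
from typing import NamedTuple

ICMP_TIME_EXCEEDED = 11


class TimeExceeded(NamedTuple):
    code: int      # 0 = TTL expired in transit, 1 = fragment reassembly timed out
    checksum: int
    quoted: bytes  # copy of the original packet, as described above


def parse_time_exceeded(icmp: bytes) -> TimeExceeded:
    """Parse an ICMP Time Exceeded message (outer IP header already removed)."""
    if len(icmp) < 8:
        raise ValueError("ICMP message too short")
    # Network byte order: type (1 byte), code (1), checksum (2), unused (4).
    icmp_type, code, checksum, _unused = struct.unpack("!BBHI", icmp[:8])
    if icmp_type != ICMP_TIME_EXCEEDED:
        raise ValueError(f"not a Time Exceeded message (type {icmp_type})")
    return TimeExceeded(code, checksum, icmp[8:])
```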
The UDP header is exactly 8 bytes long, so the 8 bytes quoted from our probe contain the entire UDP header, including the source port. Thus if we choose the source port for our probe carefully, we can use that number as an ID and read it back from the ICMP response. This requires creating our own raw packets (as root) so we can select a source port, and then parsing the bytes of the ICMP response ourselves to extract the ID.
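Here is a minimal sketch of that round trip: packing a UDP header with a chosen source port, then recovering the port from the bytes an ICMP error would quote back. Both function names are hypothetical, and the input is assumed to be the quoted original packet only (its IP header followed by 8 bytes), with the ICMP header already removed.

```python
import struct

IPPROTO_UDP = 17  # protocol number for UDP in the IP header


def build_udp_header(src_port: int, dst_port: int, payload_len: int = 0) -> bytes:
    """Pack the 8-byte UDP header: source port, destination port, length, checksum."""
    # A checksum of 0 means "not computed", which is allowed for UDP over IPv4.
    return struct.pack("!HHHH", src_port, dst_port, 8 + payload_len, 0)


def probe_id_from_quote(quoted: bytes) -> int:
    """Recover the UDP source port from the packet quoted in an ICMP error."""
    if len(quoted) < 20:
        raise ValueError("quoted packet too short")
    ihl = (quoted[0] & 0x0F) * 4  # IP header length in bytes (20 without options)
    if quoted[9] != IPPROTO_UDP:  # byte 9 of the IP header is the protocol field
        raise ValueError("quoted packet is not UDP")
    if len(quoted) < ihl + 8:
        raise ValueError("quoted UDP header truncated")
    src_port, _dst_port = struct.unpack("!HH", quoted[ihl:ihl + 4])
    return src_port
```

Reading the header length from the quoted IP header, rather than assuming 20 bytes, keeps the offset correct when the original packet carried IP options.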
On FreeBSD the traceroute program is setuid root for this purpose, and it uses its own process-ID to select an unused source port for its probes. To quote the FreeBSD implementation source code:
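One plausible way to turn a process ID into a port-sized identifier is to keep its low 16 bits and force the top bit on, which keeps the result in the upper half of the port range. The exact expression below is an assumption for illustration, not a quotation of the FreeBSD source, and `ident_from_pid` is a made-up name.

```python
import os


def ident_from_pid(pid: int) -> int:
    """Map a process ID onto a 16-bit value in the range 0x8000-0xFFFF."""
    # Assumed scheme: mask to 16 bits, then set the high bit.
    return (pid & 0xFFFF) | 0x8000


if __name__ == "__main__":
    print(ident_from_pid(os.getpid()))
```

Two processes whose IDs agree in their low 15 bits would get the same value under this scheme, so it relies on such collisions being rare among traceroutes running at the same moment.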
“Don’t use this as a coding example. I was trying to find a routing problem and this code sort-of popped out after 48 hours without sleep. I was amazed it ever compiled, much less ran.”
And yet it does run, and has run this way for more than 20 years.
I implemented traceroute in Python. It was part of a larger project to detect critical hub systems across the Internet, which may be deserving of its own article once I have more conclusive data. The point is I needed to run a lot of traceroutes simultaneously, and doing it myself with multithreading gave me better access to data than trying to parse the output from the traceroute program over and over.
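One way to keep many simultaneous traces apart, sketched here as an illustration rather than as the implementation described above, is a small router that maps each probe ID to a queue owned by one trace. The `ResponseRouter` class and the `(probe ID, address)` response shape are assumptions made for this example. A single reader thread would call `dispatch` for every ICMP response it parses, while each worker thread waits on its own queue.

```python
import queue
import threading
from typing import Dict, Optional, Tuple

Response = Tuple[int, str]  # (probe ID, responder address); hypothetical shape


class ResponseRouter:
    """Hand each incoming response to the trace that owns its probe ID."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._inboxes: Dict[int, "queue.Queue[str]"] = {}

    def register(self, probe_id: int) -> "queue.Queue[str]":
        """Claim a probe ID and return the queue its responses will land in."""
        inbox: "queue.Queue[str]" = queue.Queue()
        with self._lock:
            if probe_id in self._inboxes:
                raise ValueError(f"probe ID {probe_id} already in use")
            self._inboxes[probe_id] = inbox
        return inbox

    def unregister(self, probe_id: int) -> None:
        with self._lock:
            self._inboxes.pop(probe_id, None)

    def dispatch(self, response: Response) -> bool:
        """Route one response; return False if no trace claims it."""
        probe_id, address = response
        with self._lock:
            inbox: Optional["queue.Queue[str]"] = self._inboxes.get(probe_id)
        if inbox is None:
            return False  # stale or foreign response, safe to ignore
        inbox.put(address)
        return True
```

A worker would call `register` with its chosen source port before sending, then `inbox.get(timeout=...)` after each probe, treating `queue.Empty` as a lost probe.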