Traceroute is Black Magic

Posted 8/17/16

Traceroute is a utility for discovering the hops (routers) between your computer and a destination server. It is commonly used to diagnose network problems, and together with ping it accounts for a large share of ICMP traffic.

It is also a piece of black magic that exploits the darkest turns of networking to succeed.

Traceroute Overview

At its core, traceroute is simple:

  • Send a packet to the destination with a time-to-live of 1

  • When the packet inevitably expires, the router where its time-to-live reached zero sends back a “time exceeded” packet

  • Now try again with a time-to-live of 2…

This continues until a packet finally does reach the destination, at which point the traceroute is finished. The time-to-live of the successful packet tells you how many hops away the destination server is, and the source addresses of the “time exceeded” packets give you the address of each router between you and the host. The result is the familiar output:

% traceroute google.com
traceroute to google.com (216.58.194.174), 64 hops max, 52 byte packets
 1  10.0.0.1 (10.0.0.1)  1.274 ms  0.944 ms  0.897 ms
 2  96.120.89.65 (96.120.89.65)  19.004 ms  8.942 ms  8.451 ms
 3  be-20005-sur04.rohnertpr.ca.sfba.comcast.net (162.151.31.169)  9.279 ms  9.269 ms  8.939 ms
 4  hu-0-2-0-0-sur03.rohnertpr.ca.sfba.comcast.net (68.85.155.233)  9.546 ms  9.101 ms  9.935 ms
 5  be-206-rar01.rohnertpr.ca.sfba.comcast.net (68.85.57.101)  9.197 ms  9.214 ms  9.443 ms
 6  hu-0-18-0-0-ar01.santaclara.ca.sfba.comcast.net (68.85.154.57)  12.564 ms
    hu-0-18-0-4-ar01.santaclara.ca.sfba.comcast.net (68.85.154.105)  11.646 ms
    hu-0-18-0-1-ar01.santaclara.ca.sfba.comcast.net (68.85.154.61)  13.703 ms
 7  be-33651-cr01.sunnyvale.ca.ibone.comcast.net (68.86.90.93)  12.517 ms  12.109 ms  14.443 ms
 8  hu-0-14-0-0-pe02.529bryant.ca.ibone.comcast.net (68.86.89.234)  12.600 ms  12.057 ms
    hu-0-14-0-1-pe02.529bryant.ca.ibone.comcast.net (68.86.89.230)  12.915 ms
 9  66.208.228.70 (66.208.228.70)  12.188 ms  12.207 ms  12.031 ms
10  72.14.232.136 (72.14.232.136)  13.493 ms  13.934 ms  13.812 ms
11  64.233.175.249 (64.233.175.249)  12.241 ms  13.006 ms  12.694 ms
12  sfo07s13-in-f14.1e100.net (216.58.194.174)  12.841 ms  12.563 ms  12.410 ms

Well, that doesn’t look so bad! It gets a little messier if there are multiple paths between you and the host, particularly if the paths are of different lengths. Most traceroute implementations send three probes per TTL value by default, so when traffic is load-balanced across several paths, different probes for the same TTL can reach different routers. You can see this at hops 6 and 8 in the example above.
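The whole procedure, including the three probes per TTL, can be tried out without touching the network. What follows is a toy simulation, not a real traceroute. `PATH` is a made-up route that borrows a few addresses from the output above, with the three hop-6 routers standing in for a load-balanced hop. `send_probe` is a hypothetical stand-in for sending a packet and waiting for the reply. Picking a router with `flow % len(candidates)` is a crude substitute for the flow hashing real routers use.

```python
# Toy model of traceroute: no packets are sent. Each entry in PATH lists the
# router(s) that may handle that hop; several entries mean load balancing.
PATH = [
    ["10.0.0.1"],
    ["96.120.89.65"],
    ["68.85.154.57", "68.85.154.105", "68.85.154.61"],
    ["216.58.194.174"],  # the destination
]
PROBES_PER_TTL = 3
BASE_PORT = 33434  # destination port of the first probe; each later probe adds 1


def send_probe(ttl, flow):
    """Simulate one probe with the given TTL.

    Returns (reached_destination, responder). For intermediate hops the
    responder is the router where the TTL drops to zero; at the last hop it is
    the destination itself. `flow % len(candidates)` is a hypothetical
    stand-in for how a router spreads flows across paths.
    """
    candidates = PATH[ttl - 1]
    router = candidates[flow % len(candidates)]
    return ttl == len(PATH), router


def traceroute(max_hops=64):
    hops = []
    seq = 0
    for ttl in range(1, max_hops + 1):
        responders = []
        for _ in range(PROBES_PER_TTL):
            reached, router = send_probe(ttl, BASE_PORT + seq)
            seq += 1
            if router not in responders:
                responders.append(router)
        hops.append((ttl, responders))
        if reached:  # the TTL that got through is the hop count
            break
    return hops


for ttl, responders in traceroute():
    print(ttl, "  ".join(responders))
```

Each probe uses a different destination port, so the simulated load balancer can send probes with the same TTL to different routers. That is one way a single hop ends up listing several addresses.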

The real dark side is in how to support multiple traceroutes at once.

How does the Internet work, anyway?

When you send a TCP or UDP packet, it includes four key pieces of information:

  • Source IP (so the other side can write back)

  • Source Port (usually chosen at random; it lets you distinguish between multiple network connections)

  • Destination IP (so your router knows where to send the packet)

  • Destination Port (so the server knows what service the packet is for)
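Those four fields are what let the operating system hand each incoming packet to the right connection. A simplified sketch of that lookup follows. The addresses, ports, and the `connections` table are invented for illustration, and a real network stack does considerably more.

```python
from typing import NamedTuple


class FlowKey(NamedTuple):
    """The four fields listed above, seen from our side of the connection."""
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int


# Hypothetical table of open connections. Two connections to the same server
# and service differ only in the local (source) port.
connections = {
    FlowKey("10.0.0.2", 51000, "216.58.194.174", 443): "browser tab 1",
    FlowKey("10.0.0.2", 51001, "216.58.194.174", 443): "browser tab 2",
}


def deliver(reply_src_ip, reply_src_port, reply_dst_ip, reply_dst_port):
    """Find the connection a reply belongs to. A reply's source is our
    destination and vice versa, so the fields are swapped for the lookup."""
    key = FlowKey(reply_dst_ip, reply_dst_port, reply_src_ip, reply_src_port)
    return connections.get(key)  # None if no connection matches


print(deliver("216.58.194.174", 443, "10.0.0.2", 51001))  # browser tab 2
```

Take away the ports and the two entries become indistinguishable. That is exactly the problem ICMP poses.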

TCP also includes a sequence number, which lets the receiver put packets back in order if they arrive out of order. UDP makes no such guarantee: datagrams are delivered in whatever order they arrive, and lost ones are simply gone.

However, ICMP is a little different. It includes the source and destination addresses, but has no concept of a “port number” for sending or receiving.

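Traceroute can send probes in TCP, UDP, or ICMP format, but the routers along the way always answer with ICMP “TIME EXCEEDED” messages. Only the destination itself replies differently, for example with an ICMP “Port Unreachable” message in response to a UDP probe.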
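A receiver has to tell these replies apart by their ICMP type and code. The sketch below covers UDP and ICMP probes and uses the numbers defined in RFC 792. It ignores TCP probes, which the destination answers with a TCP segment rather than ICMP. `classify_reply` is an illustrative name.

```python
# ICMP type and code numbers (RFC 792) that a traceroute sending UDP or
# ICMP probes needs to recognize.
ICMP_ECHO_REPLY = 0
ICMP_DEST_UNREACHABLE = 3
CODE_PORT_UNREACHABLE = 3   # a code of ICMP_DEST_UNREACHABLE
ICMP_TIME_EXCEEDED = 11
CODE_TTL_EXCEEDED = 0       # code 1 means fragment reassembly timed out instead


def classify_reply(icmp_type, icmp_code):
    """Say what an incoming ICMP message means for a running traceroute."""
    if icmp_type == ICMP_TIME_EXCEEDED and icmp_code == CODE_TTL_EXCEEDED:
        return "intermediate hop"
    if icmp_type == ICMP_DEST_UNREACHABLE and icmp_code == CODE_PORT_UNREACHABLE:
        return "destination reached (UDP probe)"
    if icmp_type == ICMP_ECHO_REPLY:
        return "destination reached (ICMP probe)"
    return "something else"


print(classify_reply(11, 0))  # intermediate hop
```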

Parallel Traceroute

So if ICMP responses don’t include port numbers, how can your computer distinguish between responses meant for different traceroutes?

The trick is in a minor detail of the ICMP specification. A Time Exceeded message includes a type (11 for time exceeded), a code (0 if the TTL expired in transit, 1 if fragment reassembly timed out), a checksum, four unused bytes, and a copy of the original packet’s IP header followed by the first 8 bytes of its payload.

The UDP header is exactly 8 bytes long, so the quoted 8 bytes of the probe’s payload contain the entire UDP header, including the source port. Thus if we choose the source port for our probe carefully, we can use that same number as an ID to match up the ICMP response. This requires working with raw packets (as root) so we can select the source port, and then parsing the bytes of the ICMP response ourselves to extract the ID.
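With those two details, the receive side only has to peel off a few headers. The sketch below handles IPv4 and UDP probes. It assumes `packet` is the complete IPv4 packet returned by a raw ICMP socket, with the outer IP header included, as on Linux and the BSDs. Checksums are not verified, and the function names are illustrative.

```python
import struct

ICMP_TIME_EXCEEDED = 11
IPPROTO_UDP = 17


def ipv4_header_len(packet):
    """IHL field: low 4 bits of the first byte, counted in 32-bit words."""
    return (packet[0] & 0x0F) * 4


def probe_id_from_reply(packet):
    """Return (code, UDP source port of the original probe) for a Time
    Exceeded reply, or None if the packet is anything else."""
    icmp = packet[ipv4_header_len(packet):]
    if len(icmp) < 8:
        return None
    icmp_type, code, _checksum, _unused = struct.unpack("!BBHI", icmp[:8])
    if icmp_type != ICMP_TIME_EXCEEDED:
        return None
    quoted = icmp[8:]                     # probe's IP header + first 8 payload bytes
    if len(quoted) < 20 or quoted[9] != IPPROTO_UDP:  # byte 9 = protocol
        return None
    udp = quoted[ipv4_header_len(quoted):]  # the probe's header may carry options
    if len(udp) < 4:
        return None
    src_port, _dst_port = struct.unpack("!HH", udp[:4])
    return code, src_port


def fake_ipv4_header(protocol):
    """A 20-byte IPv4 header with only the fields the parser reads filled in."""
    header = bytearray(20)
    header[0] = 0x45                      # version 4, IHL 5 (20 bytes, no options)
    header[9] = protocol
    return bytes(header)


# Fabricated reply: outer IP header, ICMP Time Exceeded (type 11, code 0),
# then the probe's own IP header and its 8-byte UDP header.
probe_udp = struct.pack("!HHHH", 40000, 33434, 8, 0)  # src, dst, length, checksum
reply = (fake_ipv4_header(1)                          # outer packet carries ICMP
         + struct.pack("!BBHI", ICMP_TIME_EXCEEDED, 0, 0, 0)
         + fake_ipv4_header(IPPROTO_UDP) + probe_udp)
print(probe_id_from_reply(reply))  # (0, 40000)
```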

On FreeBSD, the traceroute program is setuid root for this purpose, and it derives the source port for its probes from its own process ID, so that simultaneous traceroutes are unlikely to collide. To quote the FreeBSD implementation’s source code:

Don’t use this as a coding example. I was trying to find a routing problem and this code sort-of popped out after 48 hours without sleep. I was amazed it ever compiled, much less ran.

And yet it does run, and has run this way for more than 20 years.
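The process-ID trick fits in a few lines. The classic BSD traceroute computes its ID as `(getpid() & 0xffff) | 0x8000`. To tell individual probes apart, it adds a sequence number to a base destination port of 33434. The sketch below imitates that scheme, but treat it as an illustration rather than a transcription of FreeBSD’s code. `probe_ports` is a hypothetical name.

```python
import os

BASE_PORT = 33434  # classic traceroute's default base destination port


def probe_ports(seq, pid=None):
    """Return (source_port, destination_port) for the probe numbered `seq`.

    The source port identifies this traceroute process; the destination
    port identifies the individual probe within it.
    """
    if pid is None:
        pid = os.getpid()
    ident = (pid & 0xFFFF) | 0x8000  # 16 bits, forced into 32768-65535
    return ident, BASE_PORT + seq


for seq in range(3):
    print(probe_ports(seq))
```

Two processes whose IDs agree in their low 15 bits would still collide, so this makes clashes unlikely rather than impossible.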

Why do you know this? What’s this arcane knowledge good for?

I implemented traceroute in Python. It was part of a larger project to detect critical hub systems across the Internet, which may deserve its own article once I have more conclusive data. The point is that I needed to run a lot of traceroutes simultaneously, and doing it myself with multithreading gave me better access to the data than repeatedly parsing the output of the traceroute program.
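The post doesn’t include that code, so this is only a guess at one shape such a design could take. A single receiver thread passes each reply to the traceroute whose probe ID it carries, with one queue per traceroute. Everything here (`register`, `dispatch`, `receiver`, the fake replies) is hypothetical, and the raw-socket handling is left out.

```python
import queue
import threading

# Hypothetical dispatcher (not the author's code): one receiver thread reads
# every ICMP reply and hands it to the traceroute whose probe ID it carries.
waiting = {}                     # probe ID (UDP source port) -> that traceroute's queue
waiting_lock = threading.Lock()


def register(probe_id):
    """Called by a traceroute thread before it sends any probes."""
    q = queue.Queue()
    with waiting_lock:
        waiting[probe_id] = q
    return q


def dispatch(probe_id, reply):
    """Called by the receiver thread for every parsed reply."""
    with waiting_lock:
        q = waiting.get(probe_id)
    if q is not None:
        q.put(reply)
    # Replies carrying an unknown ID (e.g. another program's traceroute) are ignored.


def receiver(replies):
    """Stand-in for the loop that reads a raw ICMP socket: here it just
    walks a list of (probe_id, reply) pairs."""
    for probe_id, reply in replies:
        dispatch(probe_id, reply)


q_a = register(40001)
q_b = register(40002)
fake = [(40002, "hop 1 for B"), (40001, "hop 1 for A"), (12345, "someone else's")]
t = threading.Thread(target=receiver, args=(fake,))
t.start()
t.join()
print(q_a.get(timeout=1), "|", q_b.get(timeout=1))  # hop 1 for A | hop 1 for B
```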