Understanding Traceroute (Part 1 of 4)

Traceroute can be used to measure performance

If performance means packet loss and latency to an end host, traceroute cannot measure it accurately, and even less so if performance also includes the actual available bandwidth (throughput). The best and simplest way to measure packet loss and latency reliably, using a large enough sample, is a continuous (repeated) ping. On a Windows system, run ping -t end.host (e.g. ping -t slsdemo.sea2.superb.net) and stop it with Ctrl-C. Let the ping run at least 100 times before stopping it to examine the results, which are shown in a summary like the following:

Ping statistics for 209.160.40.47:
Packets: Sent = 186, Received = 186, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
Minimum = 6ms, Maximum = 22ms, Average = 6ms
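When many such summaries need to be compared, the numbers can be pulled out programmatically. The following is a minimal Python sketch, assuming the Windows summary layout shown above; the function name parse_ping_summary and the dictionary keys are illustrative choices, not part of any ping tool, and other operating systems print a different format.

```python
import re

def parse_ping_summary(text):
    """Extract packet counts and round-trip times from a Windows-style ping summary.

    Assumes the layout shown in the article; other ping implementations differ.
    """
    packets = re.search(r"Sent = (\d+), Received = (\d+), Lost = (\d+)", text)
    if packets is None:
        raise ValueError("text does not look like a Windows ping summary")
    sent, received, lost = (int(g) for g in packets.groups())

    # The round-trip line is absent when no replies were received at all.
    times = re.search(
        r"Minimum = (\d+)ms, Maximum = (\d+)ms, Average = (\d+)ms", text
    )
    if times is None:
        min_ms = max_ms = avg_ms = None
    else:
        min_ms, max_ms, avg_ms = (int(g) for g in times.groups())

    return {
        "sent": sent,
        "received": received,
        "lost": lost,
        "loss_percent": 100.0 * lost / sent if sent else 0.0,
        "min_ms": min_ms,
        "max_ms": max_ms,
        "avg_ms": avg_ms,
    }

# The sample summary from the article.
SAMPLE = """Ping statistics for 209.160.40.47:
Packets: Sent = 186, Received = 186, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
Minimum = 6ms, Maximum = 22ms, Average = 6ms"""

summary = parse_ping_summary(SAMPLE)
print(summary)
```

The loss percentage is recomputed from the raw counts rather than read from the "(0% loss)" text, because the printed figure is rounded.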

If this is what is desired, then, for this particular purpose, some traceroute-like programs, such as Ping Plotter, my personal favourite, will work well (as long as the end host is not denying all ICMP ping requests at a firewall, in which case the tool will simply show 100% packet loss). However, only the results given for the end host can be trusted (more on that later), not those for the hops in between, and then only if the end host has no firewall or similar in place restricting its responses to ICMP ping requests.

When examining the results of a ping (from ping -t, or from a program such as Ping Plotter):

  • The most important item to consider is packet loss: there should be none. The exception is an end server configured to treat ICMP ping requests at a very low priority and/or to rate-limit how many pings it will answer in a given timeframe; in that case ping, traceroute and related tools cannot measure its performance, and it should be measured at the level of the application it serves. Packet loss below 2% will usually go unnoticed for typical applications (such as a web server or mail server), though it may cause occasional lag in online real-time games. Anything above 5% should be a real concern; although still not necessarily noticeable for web, mail or other such servers, it is a sign of real problems. Note that seeing packet loss in such a ping does not necessarily mean that the problem is at the end server. A continuous (repeated) ping should nevertheless be the first step in checking whether there is a problem. If it shows no packet loss and latency is as expected, then all is fine and nothing further (such as running a traceroute) is needed from a performance measurement perspective, although one can still run a traceroute out of plain curiosity to see the route. If it shows packet loss above acceptable norms or abnormally high latency, then traceroute and other tools should be used to try to find the cause (often it is the end user's ISP, e.g. a slow, noisy, or overloaded dial-up, DSL, or cable modem network).
  • Latency is most frequently a concern for online games, where [almost] every millisecond matters. For anything other than online gaming or real-time database servers, latency is not a big issue. (Such database servers are typically on the same local network as the corresponding web servers anyhow; there, what matters is the latency between the database server and the web server, not between the database server and the end user accessing the gateway web server.)
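The packet loss thresholds above can be written down as a small helper. This is only a sketch in Python: the 2% and 5% boundaries come from the text, while the label names and the treatment of the 2% to 5% range (which the text does not describe) are assumptions made for illustration.

```python
def classify_packet_loss(loss_percent):
    """Map a packet loss percentage to a rough severity label.

    Thresholds follow the article (below 2%, above 5%); the labels and the
    "elevated" middle band are illustrative assumptions.
    """
    if not 0 <= loss_percent <= 100:
        raise ValueError("loss_percent must be between 0 and 100")
    if loss_percent == 0:
        return "none"        # the expected result for a healthy path
    if loss_percent < 2:
        return "minor"       # usually unnoticed, may lag real-time games
    if loss_percent <= 5:
        return "elevated"    # assumed band, not characterised in the article
    return "serious"         # a sign of real problems

for loss in (0, 1.5, 3, 8):
    print(loss, classify_packet_loss(loss))
```

A label of "minor" or worse says nothing about where the loss occurs; as noted above, the end server is not necessarily at fault.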

1000 milliseconds = 1 second. So even a web server 500 ms away will take only half a second to return a request, because the latency shown in a traceroute is bi-directional: it is the time it takes for packets to travel from the originating host to the other end and back. Latency is not an indication of the speed of data transmission in any way. Neither traceroute nor ping measures bandwidth (throughput).

While packet loss indicates inconsistent connectivity that will generally have a considerable impact on data transmission speed, latency by itself does not represent the actual throughput. One may very well get a ten times faster download from a server that is 400 ms away on, for example, a non-congested network with multiple GigE links than from a server that is only 40 ms away but sits on a T1, a congested network, or a slow LAN connection.
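A rough back-of-the-envelope calculation illustrates the point. The Python sketch below uses a deliberately simplified model, one round trip plus the time to push the data through the link, and ignores protocol effects such as TCP windowing and slow start, which in practice make latency matter more than this model suggests. The 10 MB file size is an assumed example; the link rates are the nominal 1.544 Mbit/s of a T1 and 1000 Mbit/s for a single GigE link.

```python
def transfer_seconds(size_bytes, link_mbps, rtt_ms):
    """Estimate download time as one round trip plus serialization time.

    Simplified model: ignores TCP windowing, slow start and congestion.
    """
    bits = size_bytes * 8
    return rtt_ms / 1000.0 + bits / (link_mbps * 1_000_000)

FILE_SIZE = 10_000_000  # assumed example: a 10 MB download

far_but_fast = transfer_seconds(FILE_SIZE, link_mbps=1000.0, rtt_ms=400)  # GigE, 400 ms away
near_but_slow = transfer_seconds(FILE_SIZE, link_mbps=1.544, rtt_ms=40)   # T1, 40 ms away

print(f"400 ms away on GigE: {far_but_fast:.2f} s")
print(f" 40 ms away on T1:   {near_but_slow:.2f} s")
```

Under these assumptions the distant server finishes in well under a second, while the nearby T1-connected server needs most of a minute, because link capacity dominates the total once the transfer is more than a few packets long.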

It is important to note that while latency has no direct impact on speed, it does have an indirect one: the farther away the server is from the host trying to access it, the longer the physical route, and the greater the chance that there is a slow (low capacity or congested) link along the way. So, while latency is not directly related to speed, download speeds are more likely to be faster from servers that are closer by (lower latency), as there are fewer routers, fewer networks, and shorter circuits for the data to travel through, and thus, statistically, a lower chance of hitting a congested or otherwise low capacity link on the way. (This is why major content providers often use CDNs to serve content from the closest possible point to the end user: that typically results in a faster and more reliable, consistent experience.)
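The statistical argument can be made concrete with a toy model. The Python sketch below assumes, purely for illustration, that every link on the route is independently slow with the same probability; real networks are not this uniform, and the 5% figure is an invented example value, not a measurement.

```python
def chance_of_slow_link(per_link_probability, link_count):
    """Probability that at least one of link_count links is slow.

    Toy model: assumes each link is independently slow with the same
    probability, which real routes do not strictly satisfy.
    """
    if not 0 <= per_link_probability <= 1:
        raise ValueError("per_link_probability must be between 0 and 1")
    if link_count < 0:
        raise ValueError("link_count must not be negative")
    return 1 - (1 - per_link_probability) ** link_count

# Assumed example: each link has a 5% chance of being congested.
for links in (5, 10, 20):
    print(links, "links:", round(chance_of_slow_link(0.05, links), 3))
```

Whatever per-link value is chosen, the result grows with the number of links, which is the reason a shorter route tends to give a more consistent experience.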

The natural question here may be, “how do I measure the throughput (maximum bandwidth available)?” There are some tools to do that, but they are few and, as a whole, not very reliable. Such tools should also be used with caution, as the only sure way to measure throughput is to saturate the link to its maximum capacity. Doing so, especially on a low bandwidth link (such as a T1), can have a drastic impact on the end host and easily slow down its performance while the measurement runs, because a saturation measurement essentially does the same thing as a high traffic volume DoS attack: it tries to fill the link to find the maximum available throughput. Such a measurement can therefore easily be mistaken for a DoS attack, with the end provider taking steps accordingly, which is why these measurements are best avoided. For this reason measuring throughput is typically not recommended, and it is not really possible for a typical end user anyway, as the lowest throughput point (the bottleneck) will likely be their own DSL or cable modem connection. If one is determined to measure it, there are some tools, such as the “WAN Killer” in the SolarWinds suite of tools, that will try to do just this (saturate the WAN), as well as various other tools that do it in a more tactful, less brute force way, conveniently listed here.

Understanding Traceroute: A Four-part Series

Part 1: Traceroute can be used to measure performance

Part 2: Traceroute/tracert-like applications or commands will provide accurate results

Part 3: * * * s (apparent packet loss) somewhere in the route is always bad and a sign of problems

Part 4: A uni-directional traceroute (run only in one direction) can be used to accurately spot a problem area in a route

Written by Haralds Jass