What does the packet loss metric mean?
Losing packets between sender and receiver over the Internet is generally undesirable and often indicates a less-than-ideal connection. Loss is rarely zero, and a small fraction of a percent is fine: most Internet protocols are robust enough that loss at that level goes unnoticed.
Reliable protocols such as TCP recover from loss by retransmitting missing data when needed, so the data still arrives intact, but there is a time penalty. Applications that use plain UDP, which is connectionless and does no retransmission of its own, are more sensitive to loss.
Many speed tests, such as DSLreports.com/speedtest, report the percentage of packet loss detected by the device running the test. They assign this a grade they call ‘Quality’, but it is really just the device's view of how many packets were dropped during the test.
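As a rough illustration of what such a figure measures, the sketch below computes a loss percentage from numbered probes. It assumes the test tool sends `sent_count` probes tagged 0 to `sent_count - 1` and records which tags come back. The function names are hypothetical, and this is not how DSLreports computes its ‘Quality’ grade.

```python
def packet_loss_percent(sent: int, received: int) -> float:
    """Share of sent packets that never arrived, as a percentage."""
    if sent <= 0:
        raise ValueError("sent must be positive")
    if not 0 <= received <= sent:
        raise ValueError("received must be between 0 and sent")
    lost = sent - received
    return 100.0 * lost / sent


def loss_from_sequence_numbers(sent_count: int, seen: list[int]) -> float:
    """Estimate loss when each probe carries a sequence number 0..sent_count-1.

    Duplicates are counted once and out-of-range numbers are ignored,
    so a duplicated probe cannot hide a missing one.
    """
    unique = {s for s in seen if 0 <= s < sent_count}
    return packet_loss_percent(sent_count, len(unique))


if __name__ == "__main__":
    # 2 of 1000 probes missing: "a small fraction of a percent".
    print(packet_loss_percent(1000, 998))  # 0.2
```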
TCP uses loss as a feedback signal. If the sender transmits too fast, the receiver or a device between sender and receiver will drop packets as a ‘signal’ that the pace is too high. The sender's TCP stack reacts by shrinking its congestion window (CWND), which caps how much unacknowledged data it may have in flight and so lowers its send rate. This type of ‘loss’ is benign, as it is purely a signaling mechanism. A cleaner, more efficient signaling method called ECN is discussed further below.
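A minimal sketch of that reaction, assuming a simplified Reno-style additive-increase/multiplicative-decrease (AIMD) rule, with the window counted in segments and one event per round trip. Real stacks (for example CUBIC, the Linux default) also use slow start and different growth curves, so treat this as an illustration only; `aimd_cwnd` is a hypothetical helper.

```python
def aimd_cwnd(events: list[str], initial_cwnd: float = 10.0,
              min_cwnd: float = 1.0) -> list[float]:
    """Trace a simplified Reno-style congestion window, in segments.

    Each event is one round trip: "ack" means the whole window was
    acknowledged, "loss" means a drop was detected in that round trip.
    """
    cwnd = initial_cwnd
    trace = [cwnd]
    for event in events:
        if event == "ack":
            cwnd += 1.0  # additive increase: one more segment per round trip
        elif event == "loss":
            cwnd = max(min_cwnd, cwnd / 2.0)  # multiplicative decrease on loss
        else:
            raise ValueError(f"unknown event: {event!r}")
        trace.append(cwnd)
    return trace


if __name__ == "__main__":
    print(aimd_cwnd(["ack", "ack", "loss", "ack"]))  # [10.0, 11.0, 12.0, 6.0, 7.0]
```

The halving on "loss" is the "slow down" the text describes: one dropped packet cuts the sender's in-flight data roughly in half, and it then probes upward again one segment per round trip.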
Then there is the bad kind of packet loss: an actual problem with the link between the customer-premises modem and the ISP's infrastructure. Bad connections, damaged wiring, and similar faults cause errors; some of these can be recovered, but many cannot, and those result in genuine packet loss. Fixing this usually requires an ISP truck roll.
There is also loss, really drops, inside the ISP's network and the broader Internet. If an ISP's peering links to a Tier 1 provider become heavily congested, some packets will be dropped at random at that point to bring the load down to something that ‘fits’ the available capacity. This congestion-related loss points to lower-quality service, because the links are not sized for the load. The only fix for this type of loss (and the delay that comes with it) is for the ISP to upgrade its infrastructure and/or pay for more or bigger links to the rest of the Internet.
Finally, there is packet loss caused deliberately by active queue managers, or traffic managers, anywhere in the long chain of hops between you and the target service. As noted above, this loss is really a ‘signal’ telling TCP stacks to slow their send rate and shrink their windows. So it is not really ‘loss’ as such, but it shows up in tests as exactly that.
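Random Early Detection (RED), one of the oldest active queue management schemes, shows the idea clearly: the drop probability rises as the average queue length grows, so senders are told to slow down before the queue overflows. The sketch below is a simplified form of classic RED (no ‘gentle’ mode, no adjustment based on packets since the last drop), with thresholds in packets and all names hypothetical. It is not presented as the algorithm used by any particular router, including the IQrouter.

```python
import random
from typing import Callable


def update_average(avg: float, queue_len: float, weight: float = 0.002) -> float:
    """Exponentially weighted moving average of the queue length."""
    return (1.0 - weight) * avg + weight * queue_len


def red_drop_probability(avg: float, min_th: float, max_th: float,
                         max_p: float) -> float:
    """No drops below min_th, certain drop at or above max_th,
    and a linear ramp from 0 up to max_p in between."""
    if not 0 <= min_th < max_th:
        raise ValueError("need 0 <= min_th < max_th")
    if avg < min_th:
        return 0.0
    if avg >= max_th:
        return 1.0
    return max_p * (avg - min_th) / (max_th - min_th)


def should_drop(avg: float, min_th: float, max_th: float, max_p: float,
                rand: Callable[[], float] = random.random) -> bool:
    """Decide whether to drop (that is, signal on) an arriving packet."""
    return rand() < red_drop_probability(avg, min_th, max_th, max_p)
```

Averaging the queue length, rather than reacting to its instantaneous value, lets short bursts through while still signaling senders once a standing queue (bufferbloat) builds up.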
So while a traffic manager can improve bufferbloat metrics, it may also increase measured packet loss as it forces devices to stay within the speed limit.
The ‘fix’ for using packet loss as a signaling mechanism is for client devices to support and respect a dedicated signal called Explicit Congestion Notification (ECN). Instead of dropping a packet, a traffic manager that uses ECN marks it; the receiver echoes the mark back to the sender, whose TCP stack immediately learns that the line is being overrun and slows down, all without any packet loss. Modern operating systems, including recent releases of Windows 10, Linux, macOS, and iOS (11 and later), support ECN, and several enable it by default.
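At the packet level, ECN lives in the two low-order bits of the IPv4 TOS byte (the Traffic Class byte in IPv6), as defined in RFC 3168: `00` means the sender is not ECN-capable, `01` and `10` mean it is (ECT(1) and ECT(0)), and `11` means Congestion Experienced (CE). The sketch below shows the choice an ECN-aware queue manager faces once it decides to signal congestion. `signal_congestion` is a hypothetical helper working on a bare TOS value, not a real packet-processing API.

```python
from enum import IntEnum
from typing import Optional


class ECN(IntEnum):
    """ECN codepoints: the two low-order bits of the TOS / Traffic Class byte."""
    NOT_ECT = 0b00  # sender did not negotiate ECN
    ECT_1 = 0b01    # ECN-capable transport, codepoint 1
    ECT_0 = 0b10    # ECN-capable transport, codepoint 0
    CE = 0b11       # Congestion Experienced, set by a queue along the path


def ecn_codepoint(tos: int) -> ECN:
    if not 0 <= tos <= 0xFF:
        raise ValueError("TOS byte must be in 0..255")
    return ECN(tos & 0b11)


def signal_congestion(tos: int) -> Optional[int]:
    """Return the TOS byte to forward with, or None if the packet must be dropped.

    ECN-capable packets are marked CE instead of being dropped; packets
    from senders that did not negotiate ECN can only be signaled by
    dropping them. The upper six bits (DSCP) are left untouched.
    """
    if ecn_codepoint(tos) == ECN.NOT_ECT:
        return None
    return (tos & 0xFC) | int(ECN.CE)
```

Once a CE-marked packet reaches the receiver, its TCP stack sets the ECE flag on its ACKs, and the sender reduces its congestion window just as it would after a loss, then sets CWR to confirm it has reacted.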
The IQrouter traffic manager uses ECN to signal clients, so ECN-capable clients see no packet loss from the ‘signaling’ described above.
If your speed test shows high packet loss while running behind an IQrouter, it could be a sign that the client device does not support or use ECN, or of actual problems in the ISP's infrastructure.
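On Linux, whether a device asks for ECN on connections it opens is controlled by the `net.ipv4.tcp_ecn` sysctl, readable at `/proc/sys/net/ipv4/tcp_ecn`. The kernel default of 2 accepts ECN when a peer requests it but does not request it on outgoing connections; speed-test connections are opened by the client, so with that default they are not ECN-capable. The sketch below reads and explains the value. The helper names are hypothetical, other operating systems expose this setting differently, and newer kernels may define additional values.

```python
from pathlib import Path
from typing import Optional

TCP_ECN_PATH = Path("/proc/sys/net/ipv4/tcp_ecn")

_MEANINGS = {
    0: "disabled: neither requests nor accepts ECN",
    1: "enabled: requests ECN on outgoing connections and accepts it on incoming ones",
    2: "passive: accepts ECN on incoming connections that request it, "
       "but does not request it on outgoing connections",
}


def describe_tcp_ecn(value: int) -> str:
    return _MEANINGS.get(
        value, f"unrecognized value {value}; check your kernel's ip-sysctl documentation"
    )


def read_tcp_ecn(path: Path = TCP_ECN_PATH) -> Optional[int]:
    """Return the current setting, or None if it cannot be read (e.g. not Linux)."""
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    value = read_tcp_ecn()
    if value is None:
        print("tcp_ecn setting not available on this system")
    else:
        print(f"net.ipv4.tcp_ecn = {value}: {describe_tcp_ecn(value)}")
```

Setting the value to 1 (for example with `sysctl -w net.ipv4.tcp_ecn=1` as root) makes the device request ECN on the connections it opens.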
In the latter case, we recommend running a utility called PingPlotter (available for free at https://www.pingplotter.com/ ) and examining the per-hop packet loss it reports. We've found that loss most often builds at peering points that get overrun by demand at certain times of day.
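When reading per-hop results, a common rule of thumb is that loss matters only if it starts at some hop and continues through every later hop to the destination. Loss that appears at one intermediate hop and then disappears usually means that router is deprioritizing or rate-limiting replies to probes addressed to itself, not dropping the traffic passing through it. The sketch below applies that heuristic to per-hop loss percentages typed in by hand; the input format, the 1% threshold, and the function name are assumptions, not a PingPlotter export format or API.

```python
from typing import Optional, Sequence


def first_persistent_loss_hop(loss_by_hop: Sequence[float],
                              threshold: float = 1.0) -> Optional[int]:
    """Return the 1-based hop where loss starts and persists to the destination.

    loss_by_hop[i] is the loss percentage reported for hop i + 1; the last
    entry is the destination. Returns None if the destination itself shows
    less than `threshold` percent loss.
    """
    if not loss_by_hop or loss_by_hop[-1] < threshold:
        return None
    first = len(loss_by_hop)
    # Walk back from the destination while every hop still shows loss.
    for index in range(len(loss_by_hop) - 1, -1, -1):
        if loss_by_hop[index] < threshold:
            break
        first = index + 1
    return first


if __name__ == "__main__":
    # Hop 3's 12% does not carry forward, so it is ignored;
    # the loss that reaches the destination begins at hop 5.
    print(first_persistent_loss_hop([0, 0, 12, 0, 3, 4, 5]))  # 5
```

If the hop this returns sits at the boundary between your ISP and another network, that is consistent with the congested peering points described above.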
So, in the end, connection quality is not defined solely by packet loss, because some loss exists for legitimate reasons. But packet loss that occurs in certain places, or in very high amounts, can indicate either a line problem or an ISP capacity problem.