Every Microsecond Counts: Tracking Fine-Grain Latencies with a Lossy Difference Aggregator
Many network applications have stringent end-to-end latency requirements, including VoIP and interactive video conferencing, automated trading, and high-performance computing—where even microsecond variations may be intolerable. The resulting fine-grain measurement demands cannot be met effectively by existing technologies, such as SNMP, NetFlow, or active probing. We propose instrumenting routers with a hash-based primitive that we call a Lossy Difference Aggregator (LDA) to measure latencies down to tens of microseconds and losses as infrequent as one in a million.
Such measurement can be viewed abstractly as what we refer to as a coordinated streaming problem, which is fundamentally harder than standard streaming problems due to the need to coordinate values between nodes. We describe a compact data structure that efficiently computes the average and standard deviation of latency and loss rate in a coordinated streaming environment. Our theoretical results translate to an efficient hardware implementation at 40 Gbps using less than 1% of a typical 65-nm 400-MHz networking ASIC.When compared to Poisson-spaced active probing with similar overheads, our LDA mechanism delivers orders of magnitude smaller relative error; active probing requires 50–60 times as much bandwidth to deliver similar levels of accuracy.
| Attachment | Size |
|---|---|
| p255.pdf | 671.51 KB |
Novel solution to a traditional problem
This paper presents a novel data structure to estimate the average one-way packet delay and standard deviation between two measurement points. The basic idea is to maintain two separate counters at the sender and the receiver that accumulate the packet timestamps seen in a measurement interval. At the end of each interval, the sender transmits its accumulator to the receiver, which can simply compute the average delay as the difference between both counters divided by the number of packets. This simple scheme however is not resilient to packet loss, since any single loss renders the counter at the receiver unusable. To address this problem the paper proposes to use a hash function to split the traffic in separate streams and maintain different counters for each, thus increasing the probability of finding usable counters that have not encountered any loss. Since this implies a significant increase in the number of counters to sustain even low packet loss rates, packet sampling is used to keep the number of counters reasonably low at the cost of reducing the number of samples. The sampling rate is computed according to the expected packet loss but, since this value is unknown a priori, the counters are divided in separate banks each provisioned for different loss rates. Finally, the standard deviation can be deduced from the same data structure using a known mathematical trick.
In general, the paper is very well written and motivated, and presents a novel solution to a traditional problem. The main advantage of the proposed solution is its low space requirements, which makes its implementation feasible within routers. It also significantly reduces the amount of control traffic, compared to previous passive-based techniques (that need to maintain individual timestamps) or solutions based on active probing. The technique relies on two fundamental assumptions. First, it requires clock synchronization between measurement points. Although it could raise some controversy, it is a common assumption among previous one-way delay measurement techniques. Second, packet ordering must be be preserved in order to enforce that the sender and receiver consider the same set of packets within measurement intervals. On the other hand, there are some details that deserve additional discussion, such as the assumptions behind the choice of p or the division of memory in banks of equal size. Another observation is that the sampling rate for increasing packet loss rates decreases rapidly, which can make this solution impractical for scenarios with high packet loss. Finally, even though the solution is largely evaluated using synthetic traffic, it would also be interesting to validate the results using real packet traces and analyze the potential impact of losses of control packets, which may be non-negligible at high loss rates.

Lightweight data structure for embedded fine-grained measurents
The article introduces a lightweight data structure that allows embedded
measurements of loss, delay and jitter between two measurement points. Those two
ends can be either the two ends of a link, or two stages in a router. The paper
clearly exposes the topic and motivations, and presents an original solution
whose performance is evaluated, and its feasibility is proven by an hardware
implementation.
The paper's contribution is to propose a passive technique and datastructure to
allow for precise measurements with negligible overhead on the network. It only
needs to send control packets to synchronize timestlots at both ends,
transmitting at the same time the collected statistics to perform the
computation.
Typical applications cited by the paper are high-performance computing and
automatic trading datacenters infrastructures where it is crucial to test the
hardware and monitor such performance indicators, which have very stringent
requirements. Interactive multimedia solutions could also benefit from the
proposition but their requirements are less prone to require such specialized
implementations. The counterpart of this is to require the vendors to implement
this solution into their hardware. Still, the authors argue that in such
specialized environments, it is usual to acquire third party appliances to
perform these tasks, and that the presented solution, which is easily
implementable, could provide a competitive advantage. Still, we have to keep in
mind that it can also help smaller companies to compete with manufacturers of
highly-specialized products.
The monitoring operates at the packet level, and aggregates the number of
received packets and their timestamp in different buckets of a data structure,
based on hash on the packet content. Each bucket stores packet counts and the
sum of timestamps, which allows for a direct computation of losses and delays.
Jitter can also be extracted from the data structure. A packet loss invalidates
a bucket, and the proposition relies on sampling to lower the number of unusable
buckets. When the loss rate is a priori unknown, several banks can be used, with
different sampling rates, and the packets can be hashed so that they are
accounted in one and only one bucket across every bank. The paper gives insights
on how to compute the sampling rate when a single bank in used, and evaluates 2,
3 and 4 banks before claiming that 2 should be sufficient for moderate loss
rates as encountered in the internet. A more detailed explanation of the choice
of the number of banks and the different sampling rates would be interesting.
The way the system computes the different statistics makes it mandatory to have
coordinated end points and FIFO scheduling. It also argues that several counters
can be embedded inside a router, to receive traffic coming from classifiers.
This raises the issue of coordination between the different couples of data
structures, and at the same time, open new possibilities of measurement, since
it suffices that the filtered streaming is FIFO.
To conclude, this proposition is very attrative for local measurements,
especially when compared to Poisson-spaced active probing. The same precision in
the measurements would require a high strain on the network. Though, in the case
of end-to-end measurements, all the segments constituing the path should be
instrumented.