Safe and Effective Fine-grained TCP Retransmissions for Datacenter Communication

Vijay Vasudevan, Amar Phanishayee, Hiral Shah, Elie Krevat, David G. Andersen, Gregory R. Ganger, Garth A. Gibson, Brian Mueller

This paper presents a practical solution to a problem facing high-fan-in, high-bandwidth synchronized TCP workloads in datacenter Ethernets: the TCP incast problem. In these networks, receivers can experience a drastic reduction in application throughput when simultaneously requesting data from many servers using TCP. Inbound data overfills small switch buffers, leading to TCP timeouts lasting hundreds of milliseconds. For many datacenter workloads that have a barrier synchronization requirement (e.g., filesystem reads and parallel data-intensive queries), throughput is reduced by up to 90%. For latency-sensitive applications, TCP timeouts in the datacenter impose delays of hundreds of milliseconds in networks with round-trip times in microseconds.
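As a rough illustration of why a single timeout is so costly under barrier synchronization, the sketch below computes goodput for one block transfer with and without a timeout stall. The 1 MB block, 1 Gbps link, and 200 ms stall are illustrative assumptions, not measurements from the paper, and `goodput_mbps` is a hypothetical helper.

```python
def goodput_mbps(block_bytes: int, link_gbps: float, stall_ms: float = 0.0) -> float:
    """Goodput of one synchronized block transfer, in Mbps.

    Simplified model (an assumption): the receiver cannot request the next
    block until the whole block has arrived, so any timeout stall adds
    directly to the completion time.
    """
    bits = block_bytes * 8
    transfer_s = bits / (link_gbps * 1e9)   # time on the wire at line rate
    total_s = transfer_s + stall_ms / 1e3   # plus idle time waiting for the RTO
    return bits / total_s / 1e6


# Illustrative numbers only: 1 MB block, 1 Gbps link, one 200 ms timeout.
ideal = goodput_mbps(1_000_000, 1.0)
stalled = goodput_mbps(1_000_000, 1.0, stall_ms=200.0)
print(f"ideal: {ideal:.1f} Mbps, with one timeout: {stalled:.1f} Mbps")
```

Under these assumed numbers the transfer itself takes 8 ms, so a single stall of hundreds of milliseconds dominates the completion time.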
Our practical solution uses high-resolution timers to enable microsecond-granularity TCP timeouts. We demonstrate that this technique is effective in avoiding TCP incast collapse in simulation and in real-world experiments. We show that eliminating the minimum retransmission timeout bound is safe for all environments, including the wide-area.
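To make the role of the minimum timeout bound concrete, here is a minimal sketch of a retransmission timeout estimator in the style of the standard TCP computation (smoothed RTT plus four times the RTT variance, as in RFC 6298). It is not the authors' implementation: the class name `RtoEstimator`, the 200 ms bound, and the RTT samples are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional

ALPHA = 1 / 8   # gain for the smoothed RTT
BETA = 1 / 4    # gain for the RTT variance
K = 4           # variance multiplier


@dataclass
class RtoEstimator:
    """RTO estimator with a configurable lower bound (hypothetical sketch)."""
    min_rto_us: float                  # minimum RTO in microseconds; 0 removes the bound
    granularity_us: float = 1.0        # timer granularity; 1 us models a high-resolution timer
    srtt_us: Optional[float] = None
    rttvar_us: Optional[float] = None

    def observe(self, rtt_us: float) -> float:
        """Fold in one RTT sample and return the resulting RTO in microseconds."""
        if self.srtt_us is None:
            # First measurement initializes both estimates.
            self.srtt_us = rtt_us
            self.rttvar_us = rtt_us / 2
        else:
            # Variance is updated using the previous smoothed RTT.
            self.rttvar_us = (1 - BETA) * self.rttvar_us + BETA * abs(self.srtt_us - rtt_us)
            self.srtt_us = (1 - ALPHA) * self.srtt_us + ALPHA * rtt_us
        return self.rto_us()

    def rto_us(self) -> float:
        if self.srtt_us is None or self.rttvar_us is None:
            raise ValueError("no RTT sample observed yet")
        rto = self.srtt_us + max(self.granularity_us, K * self.rttvar_us)
        return max(rto, self.min_rto_us)


# Assumed values: a 200 ms bound versus no bound, fed 100 us RTT samples.
coarse = RtoEstimator(min_rto_us=200_000.0)
fine = RtoEstimator(min_rto_us=0.0)
for sample_us in (100.0, 120.0, 90.0):
    print(sample_us, coarse.observe(sample_us), fine.observe(sample_us))
```

With microsecond-scale RTT samples, the bounded estimator stays pinned at its minimum, while the unbounded one tracks the measured RTT and yields timeouts of a few hundred microseconds.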

Attachment: p303.pdf (990.48 KB)

Comments on "TCP Retransmissions for Datacenter Communication"

This paper rethinks TCP for the datacenter environment. It studies TCP incast collapse and proposes a practical solution that unlocks the potential of datacenter communication. Incast collapse drastically reduces application throughput when multiple senders communicate with a single receiver over TCP in high-bandwidth, low-delay datacenter networks. By using high-resolution timers to enable microsecond-granularity retransmission timeouts (RTOs), the proposed solution prevents TCP incast collapse in a real cluster and scales to 47 concurrent senders in the evaluation. Most importantly, the authors demonstrate the safety and generality of the solution.

Strengths: The paper's argument flows logically, and the evaluation covers both simulation and a real environment. The idea of reducing the RTO to microsecond granularity is elegant. The demonstration of the solution's safety and effectiveness is meaningful and convincing.

Comments: (i) The datacenter is the main application of this paper. I am curious whether the method can be applied to other settings. In other words, what assumptions does the method rely on? (ii) In the experimental section, the authors do not show how the solution scales with different block and buffer sizes. I look forward to seeing those results.

Tian Song, Zhou Zhou
Beijing Institute of Technology