Sunday, February 13, 2011

scp stalls during large file transfers


Today I ran into an annoying problem: file transfers over scp would sometimes stall.


It can be caused by several things:
- blocked ICMP combined with an MTU mismatch between networks
- split routes
- firewall timeouts
- auto-negotiation problems

Whatever the cause, the result is always the same: a stalled "scp" session, and time and files lost!


You have probably noticed, like I did, that when you "scp" huge files (> 4GB) between hosts, the transfer stalls forever at random moments. It even happens with ftp/rsync. Two things may contribute to this problem:
1. Since scp grabs as much of the network's bandwidth as it can when it transfers files, any delay caused by a network switch or a firewall can easily stall the TCP connection.
The solution in this case is to limit the bandwidth scp may use, as below:


username@localhost> scp -l 200 SOURCE DESTINATION # The option "-l 200" limits the bandwidth to 200 Kbit/s, which is safe and fast enough for a background transfer.


2. It is due to a problem in the Linux SACK implementation, in both 2.4 and 2.6 kernels, when the TCP window is > 20 MB. When there are too many packets in flight and a SACK event occurs, Linux takes so long to locate the SACKed packet that a TCP timeout is easily reached and CWND goes back to the first packet.
For information about SACK:
http://www.ietf.org/rfc/rfc2018.txt
http://www.ietf.org/rfc/rfc1072.txt
Restricting the TCP buffer size to about 12 MB might work, but it limits the total throughput. A better solution may be:


username@localhost> su # Enter the root password
username@localhost> echo "net.ipv4.tcp_sack=0" >> /etc/sysctl.conf
username@localhost> sysctl -p


Or


username@localhost> su # Enter the root password
username@localhost> echo 0 > /proc/sys/net/ipv4/tcp_sack


Or


username@localhost> su # Enter the root password
username@localhost> sysctl -w net.ipv4.tcp_sack=0


With this configuration, an SSH transfer of a huge file will still stall occasionally, but only for short periods of less than 1 second, and then it recovers automatically. That means the simple cumulative acknowledgement scheme of TCP is robust enough.
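
Before and after changing the setting, it can help to confirm what the kernel is actually using. Here is a minimal sketch, assuming a Linux host that exposes the value under /proc. The function name sack_state and its optional path argument are made up for illustration and are not part of any standard tool:

```shell
#!/bin/sh
# Report whether TCP SACK is enabled, reading the same file the commands above write to.
# The optional first argument (a hypothetical convenience) points at a different file.
sack_state() {
    file="${1:-/proc/sys/net/ipv4/tcp_sack}"
    if [ ! -r "$file" ]; then
        # File missing or unreadable (for example, not a Linux host).
        echo "unknown"
        return 1
    fi
    value=$(cat "$file")
    case "$value" in
        0) echo "disabled" ;;
        1) echo "enabled" ;;
        *) echo "unknown" ;;
    esac
}
```

Calling sack_state with no argument reads /proc/sys/net/ipv4/tcp_sack, so it should print "disabled" once any of the three methods above has been applied.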
FYI: There are many other suggestions around the internet, listed below (I have not tried all of them):
- Eliminating all the DROP rules for port 22 inside the iptables.


- Changing the MTU of NIC by:


username@localhost> ifconfig eth0 mtu xxx


- Increasing the queue for transmission by


username@localhost> ifconfig eth0 txqueuelen 2000


- Tuning TCP performance by


net.core.rmem_max=16777216
net.core.wmem_max=16777216
net.core.netdev_max_backlog=2500
net.ipv4.tcp_rmem=4096 87380 16777216
net.ipv4.tcp_wmem=4096 65536 16777216
net.ipv4.tcp_no_metrics_save=1
net.ipv4.tcp_timestamps=0


- Turning off the buggy TCP segmentation offload by


username@localhost> ethtool -K eth0 tso off


- Compressing the files being transferred by


username@localhost> scp -C SOURCE DESTINATION


- Using a pipe and standard I/O to avoid a possible "scp" huge-file limitation by


username@localhost> cat localfile | ssh ravana cat ">" remotefile


Or


username@localhost> tar cf - . | ssh ravana tar xvpf -


- Clamping MSS by


username@localhost> iptables -I FORWARD 1 -p tcp --tcp-flags SYN,RST SYN -j TCPMSS --clamp-mss-to-pmtu
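
The MTU change and the MSS clamping in the list above come down to the same arithmetic. For IPv4 without IP or TCP options, the largest ping payload that fits in one packet is the MTU minus 28 bytes (20 for the IP header, 8 for ICMP), and the TCP MSS is the MTU minus 40 bytes (20 for IP, 20 for TCP). The sketch below only does that arithmetic. The function names are invented here, and the ping probe shown in the comment assumes the Linux iputils version of ping:

```shell
#!/bin/sh
# Largest ICMP echo payload that fits in one IPv4 packet of the given MTU
# (20-byte IP header + 8-byte ICMP header, no IP options).
mtu_to_ping_payload() {
    echo $(( $1 - 28 ))
}

# TCP MSS for the given MTU (20-byte IP header + 20-byte TCP header, no options).
mtu_to_mss() {
    echo $(( $1 - 40 ))
}

# Example probe (Linux iputils ping; "remotehost" is a placeholder):
#   ping -M do -c 3 -s "$(mtu_to_ping_payload 1500)" remotehost
# -M do forbids fragmentation, so the ping fails if the path MTU is below 1500.
```

If the probe fails at 1500 but succeeds with a smaller value, that smaller value is a candidate for the "mtu xxx" setting above.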

For example, you can use the following over a 2M leased-line link:

scp -l 1500 VMware-server-1.0.5-80187.i386.rpm 1.2.3.4:/tmp
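
The 1500 figure is three quarters of the line if you take 2M to mean 2000 Kbit/s (that reading is my assumption). Here is a small sketch of the calculation, assuming you know the link speed in Kbit/s and pick the share yourself. scp_limit_kbit is a name made up for this example and is not part of scp:

```shell
#!/bin/sh
# Print a value for scp's -l option (Kbit/s) as a percentage of the link speed.
#   $1 = link speed in Kbit/s
#   $2 = percentage of the link that scp may use
scp_limit_kbit() {
    echo $(( $1 * $2 / 100 ))
}

# Example: leave a quarter of a 2000 Kbit/s line free for other traffic.
#   scp -l "$(scp_limit_kbit 2000 75)" SOURCE DESTINATION
```

How much headroom to leave is a judgment call; the 75 percent here simply reproduces the value used above.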


Whatever you find useful, please report it here in the comments for future reference.

cheers Alex


http://lxphotostudio.mine.nu

4 comments:

  1. This comment has been removed by a blog administrator.

  2. I have tried all the settings you gave, but the problem is still the same.
    Any more help?

  3. This is the simplest, clearest explanation of this topic I have found. Thanks!
    linux scp
