SSH file transfers hang after ~700KB

6,223

Solution 1

NOTE (TLDR): As salva said, the quick fix is to update to Net::SFTP::Foreign 1.70_08 or newer. There was a bug preventing the library from using the right default queue_size in Windows. Read on for my original solution.

I was able to eliminate stalled transfers by reducing the Perl Net::SFTP::Foreign module's queue_size parameter from the default 32 to 23. This is a network performance-related variable and I don't perceive a speed difference when it's reduced. For production, I'm going to dial it back to 16 for safety.

queue_size => $size

read and write requests are pipelined in order to maximize transfer throughput. This option allows to set the maximum number of requests that can be concurrently waiting for a server response.

(from Net::SFTP::Foreign documentation on CPAN)

The queue_size option applies to Net::SFTP::Foreign's new, get, put, rget, and rput methods.

If you're using Net::SFTP::Foreign, you can tack on the queue_size parameter at the end of your constructor or the individual method calls.

my $sftp = Net::SFTP::Foreign->new( $host, queue_size => 16 );
# -- OR --
$sftp->put( $local_file, $remote_file, queue_size => 16 );

I'm using the Net::SFTP::Foreign::Compat wrapper for a Net::SFTP-like API. The wrapper lets you set default parameters to use with the underlying Net::SFTP::Foreign method calls.

use Net::SFTP::Foreign::Compat;

# Note that the Compat CPAN documentation is INCORRECT
# The DEFAULTS hash is under Compat, not Foreign

$Net::SFTP::Foreign::Compat::DEFAULTS{ new } = [ queue_size => 16 ];
# -- OR --
$Net::SFTP::Foreign::Compat::DEFAULTS{ put } = [ queue_size => 16 ];

my $sftp = Net::SFTP::Foreign::Compat->new( $host, %options );

Related thought: The maximum working queue_size of 23 and the default block_size of 32KB correlated with the number of bytes sent before the transfer stalled. 23 * 32KB is a little over 700KB. I can't explain why I needed to limit the queue_size on this deployment, but it solved my problem.

Solution 2

This feels like an MTU issue to me. I've seen similar problems with scping files when there is a path MTU mismatch. TCP buffers on the host can fill up, which is why you send more than a few packets, but once they fill up your client then stalls - long after the MTU issue itself actually had an impact.

To verify this you could look at a tool like scamper. Or you could try manually clamping your MTU to some lower point - say 1280 bytes - and seeing if that fixes it. The chances are good that there's only a slight mismatch, eg because of a PPPoE link in the path that reduces the MTU to 1492.

Solution 3

perhaps you can play with the block_size parameter? I've experienced some weirdness related to file uploads over SSH/SSL and somehow adjusting the block / chunk size helped. I believe the default is 32 * 1024, so perhaps lowering it to 16, 8, 4 and trying might make some difference. Might be worth a shot.

Solution 4

The default queue size on Windows was set to 4, but this default was not being picked up due to a bug on the module.

It has been corrected (and BTW, the Windows default queue size changed to 16). You can now download Net::SFTP::Foreign 1.70_08.

Next time you find a problem like this, please, report it!

Share:
6,223

Related videos on Youtube

Secure Shel
Author by

Secure Shel

Updated on September 18, 2022

Comments

  • Secure Shel
    Secure Shel over 1 year

    OVERVIEW

    I'm scripting SFTP file transfers on a Windows 7 box in the field. When I transfer zip files larger than a megabyte or so, the transfer (and SSH) stops unexpectedly after about 700KB. The SSH process seems to hang but it doesn't start wildly consuming CPU or memory and Windows doesn't say it's stopped responding. The server eventually gives up and times out, but the client never returns (despite setting ServerAliveCountMax and ServerAliveInterval). The server logs show the transfer start followed by the timeout and disconnect after a long time.

    DETAILS

    Re-running the script back to back to back, the same number of bytes are transferred before the hang each time. Walking away and coming back, then running the script again, the byte count will vary by a few tens of kilobytes.

    The command-line sftp.exe client works properly with default options; those larger zip files transfer successfully. I can't test scp.exe because the SSH user account doesn't have shell access. I believe the problem persisted across an OS wipe and re-image from XP to 7 so I suspect a network problem, but I can't fathom what kind of problem would cause this behavior.

    I'm scripting with Perl and the Net::SFTP::Foreign module. Here are the relevant bits of code extracted from my test case script (SSH output log attached below):

     my %connectOpts = ( 
      user     => $username
      ,timeout => $timeout
      ,warn    => sub {}
      ,more    => [ "-o PreferredAuthentications=publickey",
                   "-o ServerAliveCountMax=6",
                   "-o ServerAliveInterval=" . int($timeout/6),
                   "-vvvv" ]
      );
    
    my $sftp = Net::SFTP::Foreign::Compat->new( $host, %connectOpts );
    
    foreach my $local_file ( @local_files ) {
      $sftp->do_stat( 'upload.tmp' ) && print( "Overwriting upload.tmp\n" );
    
      $sftp->put( $local_file, 'upload.tmp' );
    
      $sftp->do_rename( 'upload.tmp', basename( $local_file ));
    }
    

    I use this code and very similar configuration on dozens of other field PCs for the same purpose. This is the first one I've seen with this problem. I'm looking for a root cause or diagnostic advice, not a workaround outside this ecosystem. There are countless ways to script this, but I need the granularity, feedback, and modularity of the Net::SFTP::Foreign module for logging and error reporting. Perl is the only scripting language available.

    Thanks. This has been incredibly frustrating.

    Configuration:

    • Windows 7
    • ActivePerl 5.8.9
    • Net::SFTP::Foreign v1.69
    • OpenSSH 5.3p1 (Windows client from CopSSH distro)
    • OpenVPN 2.2-rc (Windows connection back to corporate network)
    • Tectia Server 6.2.1.168 (Linux SSH server)

    Client output (-vvvv) until I manually terminated ssh.exe after it went sideways:

    OpenSSH_5.3p1, OpenSSL 0.9.8k 25 Mar 2009
    debug1: Reading configuration data /etc/ssh_config
    debug2: ssh_connect: needpriv 0
    debug1: Connecting to sftp.server.com [xx.xx.xx.xx] port 22.
    debug1: Connection established.
    debug1: identity file /cygdrive/c/program files/copssh/.ssh/identity type -1
    debug3: Not a RSA1 key file /cygdrive/c/program files/copssh/.ssh/id_rsa.
    debug2: key_type_from_name: unknown key type '-----BEGIN'
    debug3: key_read: missing keytype
    debug3: key_read: missing whitespace
    debug3: key_read: missing whitespace
    debug3: key_read: missing whitespace
    debug3: key_read: missing whitespace
    debug3: key_read: missing whitespace
    debug3: key_read: missing whitespace
    debug3: key_read: missing whitespace
    debug3: key_read: missing whitespace
    debug3: key_read: missing whitespace
    debug3: key_read: missing whitespace
    debug3: key_read: missing whitespace
    debug3: key_read: missing whitespace
    debug3: key_read: missing whitespace
    debug3: key_read: missing whitespace
    debug3: key_read: missing whitespace
    debug3: key_read: missing whitespace
    debug3: key_read: missing whitespace
    debug3: key_read: missing whitespace
    debug3: key_read: missing whitespace
    debug3: key_read: missing whitespace
    debug3: key_read: missing whitespace
    debug3: key_read: missing whitespace
    debug3: key_read: missing whitespace
    debug3: key_read: missing whitespace
    debug3: key_read: missing whitespace
    debug2: key_type_from_name: unknown key type '-----END'
    debug3: key_read: missing keytype
    debug1: identity file /cygdrive/c/program files/copssh/.ssh/id_rsa type 1
    debug1: identity file /cygdrive/c/program files/copssh/.ssh/id_dsa type -1
    debug1: Remote protocol version 2.0, remote software version 6.2.1.168 SSH Tectia Server
    debug1: no match: 6.2.1.168 SSH Tectia Server
    debug1: Enabling compatibility mode for protocol 2.0
    debug1: Local version string SSH-2.0-OpenSSH_5.3
    debug2: fd 3 setting O_NONBLOCK
    debug1: SSH2_MSG_KEXINIT sent
    debug3: Wrote 792 bytes for a total of 813
    debug1: SSH2_MSG_KEXINIT received
    debug2: kex_parse_kexinit: diffie-hellman-group-exchange-sha256,diffie-hellman-g
    roup-exchange-sha1,diffie-hellman-group14-sha1,diffie-hellman-group1-sha1
    debug2: kex_parse_kexinit: ssh-rsa,ssh-dss
    debug2: kex_parse_kexinit: aes128-ctr,aes192-ctr,aes256-ctr,arcfour256,arcfour12
    8,aes128-cbc,3des-cbc,blowfish-cbc,cast128-cbc,aes192-cbc,aes256-cbc,arcfour,rij
    [email protected]
    debug2: kex_parse_kexinit: aes128-ctr,aes192-ctr,aes256-ctr,arcfour256,arcfour12
    8,aes128-cbc,3des-cbc,blowfish-cbc,cast128-cbc,aes192-cbc,aes256-cbc,arcfour,rij
    [email protected]
    debug2: kex_parse_kexinit: hmac-md5,hmac-sha1,[email protected],hmac-ripemd160
    ,[email protected],hmac-sha1-96,hmac-md5-96
    debug2: kex_parse_kexinit: hmac-md5,hmac-sha1,[email protected],hmac-ripemd160
    ,[email protected],hmac-sha1-96,hmac-md5-96
    debug2: kex_parse_kexinit: none,[email protected],zlib
    debug2: kex_parse_kexinit: none,[email protected],zlib
    debug2: kex_parse_kexinit:
    debug2: kex_parse_kexinit:
    debug2: kex_parse_kexinit: first_kex_follows 0
    debug2: kex_parse_kexinit: reserved 0
    debug2: kex_parse_kexinit: diffie-hellman-group1-sha1,diffie-hellman-group14-sha
    1,[email protected],diffie-hellman-group-exchange-sha1,diffi
    e-hellman-group-exchange-sha256
    debug2: kex_parse_kexinit: ssh-rsa,[email protected]
    debug2: kex_parse_kexinit: aes256-cbc,aes192-cbc,aes128-cbc,3des-cbc
    debug2: kex_parse_kexinit: aes256-cbc,aes192-cbc,aes128-cbc,3des-cbc
    debug2: kex_parse_kexinit: hmac-sha1
    debug2: kex_parse_kexinit: hmac-sha1
    debug2: kex_parse_kexinit: none,zlib
    debug2: kex_parse_kexinit: none,zlib
    debug2: kex_parse_kexinit:
    debug2: kex_parse_kexinit:
    debug2: kex_parse_kexinit: first_kex_follows 0
    debug2: kex_parse_kexinit: reserved 0
    debug2: mac_setup: found hmac-sha1
    debug1: kex: server->client aes128-cbc hmac-sha1 none
    debug2: mac_setup: found hmac-sha1
    debug1: kex: client->server aes128-cbc hmac-sha1 none
    debug1: SSH2_MSG_KEX_DH_GEX_REQUEST(1024<2048<8192) sent
    debug1: expecting SSH2_MSG_KEX_DH_GEX_GROUP
    debug3: Wrote 24 bytes for a total of 837
    debug2: dh_gen_key: priv key bits set: 160/320
    debug2: bits set: 1016/2048
    debug1: SSH2_MSG_KEX_DH_GEX_INIT sent
    debug1: expecting SSH2_MSG_KEX_DH_GEX_REPLY
    debug3: Wrote 272 bytes for a total of 1109
    debug3: check_host_in_hostfile: filename /cygdrive/c/program files/copssh/.ssh/known_hosts
    debug3: check_host_in_hostfile: match line 2
    debug3: check_host_in_hostfile: filename /cygdrive/c/program files/copssh/.ssh/known_hosts
    debug3: check_host_in_hostfile: match line 1
    debug1: Host 'sftp.server.com' is known and matches the RSA host key.
    debug1: Found key in /cygdrive/c/program files/copssh/.ssh/known_hosts:2
    debug2: bits set: 995/2048
    debug1: ssh_rsa_verify: signature correct
    debug2: kex_derive_keys
    debug2: set_newkeys: mode 1
    debug1: SSH2_MSG_NEWKEYS sent
    debug1: expecting SSH2_MSG_NEWKEYS
    debug3: Wrote 16 bytes for a total of 1125
    debug2: set_newkeys: mode 0
    debug1: SSH2_MSG_NEWKEYS received
    debug1: SSH2_MSG_SERVICE_REQUEST sent
    debug3: Wrote 52 bytes for a total of 1177
    debug3: Received SSH2_MSG_IGNORE
    debug3: Received SSH2_MSG_IGNORE
    debug2: service_accept: ssh-userauth
    debug1: SSH2_MSG_SERVICE_ACCEPT received
    debug2: key: /cygdrive/c/program files/copssh/.ssh/identity (0x0)
    debug2: key: /cygdrive/c/program files/copssh/.ssh/id_rsa (0xd4b720)
    debug2: key: /cygdrive/c/program files/copssh/.ssh/id_dsa (0x0)
    debug3: Wrote 84 bytes for a total of 1261
    debug3: Received SSH2_MSG_IGNORE
    debug1: Authentications that can continue: publickey,keyboard-interactive
    debug3: start over, passed a different list publickey,keyboard-interactive
    debug3: preferred publickey
    debug3: authmethod_lookup publickey
    debug3: remaining preferred:
    debug3: authmethod_is_enabled publickey
    debug1: Next authentication method: publickey
    debug1: Trying private key: /cygdrive/c/program files/copssh/.ssh/identity
    debug3: no such identity: /cygdrive/c/program files/copssh/.ssh/identity
    debug1: Offering public key: /cygdrive/c/program files/copssh/.ssh/id_rsa
    debug3: send_pubkey_test
    debug2: we sent a publickey packet, wait for reply
    debug3: Wrote 372 bytes for a total of 1633
    debug3: Received SSH2_MSG_IGNORE
    debug1: Server accepts key: pkalg ssh-rsa blen 277
    debug2: input_userauth_pk_ok: fp (...)
    debug3: sign_and_send_pubkey
    debug1: read PEM private key done: type RSA
    debug3: Wrote 644 bytes for a total of 2277
    debug3: Received SSH2_MSG_IGNORE
    debug1: Authentication succeeded (publickey).
    debug2: fd 4 setting O_NONBLOCK
    debug2: fd 5 setting O_NONBLOCK
    debug1: channel 0: new [client-session]
    debug3: ssh_session2_open: channel_new: 0
    debug2: channel 0: send open
    debug1: Entering interactive session.
    debug3: Wrote 68 bytes for a total of 2345
    debug3: Received SSH2_MSG_IGNORE
    debug2: callback start
    debug2: client_session2_setup: id 0
    debug1: Sending subsystem: sftp
    debug2: channel 0: request subsystem confirm 1
    debug2: fd 3 setting TCP_NODELAY
    debug2: callback done
    debug2: channel 0: open confirm rwindow 65536 rmax 32768
    debug3: Wrote 68 bytes for a total of 2413
    debug3: Wrote 52 bytes for a total of 2465
    debug3: Received SSH2_MSG_IGNORE
    debug2: channel_input_status_confirm: type 99 id 0
    debug2: subsystem request accepted on channel 0
    debug3: Received SSH2_MSG_IGNORE
    debug3: Wrote 68 bytes for a total of 2533
    debug3: Received SSH2_MSG_IGNORE
    debug3: Wrote 84 bytes for a total of 2617
    debug3: Received SSH2_MSG_IGNORE
    debug3: Wrote 68 bytes for a total of 2685
    debug3: Received SSH2_MSG_IGNORE
    debug3: Wrote 16436 bytes for a total of 19121
    debug3: Wrote 4148 bytes for a total of 23269
    debug3: Wrote 12372 bytes for a total of 35641
    debug3: Wrote 16436 bytes for a total of 52077
    debug3: Wrote 4148 bytes for a total of 56225
    debug3: Wrote 12196 bytes for a total of 68421
    debug3: Received SSH2_MSG_IGNORE
    debug2: channel 0: rcvd adjust 32899
    debug3: Wrote 212 bytes for a total of 68633
    debug3: Received SSH2_MSG_IGNORE
    debug3: Wrote 16436 bytes for a total of 85069
    debug3: Wrote 4148 bytes for a total of 89217
    debug3: Received SSH2_MSG_IGNORE
    debug2: channel 0: rcvd adjust 98337
    debug3: Received SSH2_MSG_IGNORE
    debug3: Wrote 12372 bytes for a total of 101589
    debug3: Wrote 16436 bytes for a total of 118025
    debug3: Wrote 4148 bytes for a total of 122173
    debug3: Wrote 12372 bytes for a total of 134545
    debug3: Wrote 16436 bytes for a total of 150981
    debug3: Received SSH2_MSG_IGNORE
    debug3: Wrote 4148 bytes for a total of 155129
    debug3: Wrote 12372 bytes for a total of 167501
    debug3: Received SSH2_MSG_IGNORE
    debug2: channel 0: rcvd adjust 196674
    debug3: Received SSH2_MSG_IGNORE
    debug3: Wrote 16436 bytes for a total of 183937
    debug3: Received SSH2_MSG_IGNORE
    debug3: Wrote 4148 bytes for a total of 188085
    debug3: Wrote 12372 bytes for a total of 200457
    debug3: Wrote 16436 bytes for a total of 216893
    debug3: Wrote 4148 bytes for a total of 221041
    debug3: Wrote 12372 bytes for a total of 233413
    debug3: Wrote 16436 bytes for a total of 249849
    debug3: Received SSH2_MSG_IGNORE
    debug3: Wrote 4148 bytes for a total of 253997
    debug3: Wrote 12372 bytes for a total of 266369
    debug3: Received SSH2_MSG_IGNORE
    debug3: Wrote 16436 bytes for a total of 282805
    debug3: Wrote 4148 bytes for a total of 286953
    debug3: Wrote 12372 bytes for a total of 299325
    debug3: Received SSH2_MSG_IGNORE
    debug2: channel 0: rcvd adjust 131204
    debug3: Wrote 16436 bytes for a total of 315761
    debug3: Received SSH2_MSG_IGNORE
    debug3: Wrote 4148 bytes for a total of 319909
    debug3: Received SSH2_MSG_IGNORE
    debug3: Wrote 12372 bytes for a total of 332281
    debug3: Wrote 16436 bytes for a total of 348717
    debug3: Wrote 4148 bytes for a total of 352865
    debug3: Wrote 12372 bytes for a total of 365237
    debug3: Wrote 16436 bytes for a total of 381673
    debug3: Wrote 4148 bytes for a total of 385821
    debug3: Received SSH2_MSG_IGNORE
    debug3: Wrote 12372 bytes for a total of 398193
    debug3: Wrote 16436 bytes for a total of 414629
    debug3: Received SSH2_MSG_IGNORE
    debug3: Wrote 4148 bytes for a total of 418777
    debug3: Wrote 12372 bytes for a total of 431149
    debug3: Wrote 16436 bytes for a total of 447585
    debug3: Received SSH2_MSG_IGNORE
    debug2: channel 0: rcvd adjust 393348
    debug3: Received SSH2_MSG_IGNORE
    debug3: Wrote 4148 bytes for a total of 451733
    debug3: Received SSH2_MSG_IGNORE
    debug3: Wrote 12372 bytes for a total of 464105
    debug3: Wrote 16436 bytes for a total of 480541
    debug3: Wrote 4148 bytes for a total of 484689
    debug3: Wrote 12372 bytes for a total of 497061
    debug3: Wrote 16436 bytes for a total of 513497
    debug3: Wrote 4148 bytes for a total of 517645
    debug3: Received SSH2_MSG_IGNORE
    debug3: Wrote 12372 bytes for a total of 530017
    debug3: Wrote 16436 bytes for a total of 546453
    debug3: Received SSH2_MSG_IGNORE
    debug3: Wrote 4148 bytes for a total of 550601
    debug3: Wrote 12372 bytes for a total of 562973
    debug3: Wrote 16436 bytes for a total of 579409
    debug3: Received SSH2_MSG_IGNORE
    debug3: Wrote 4148 bytes for a total of 583557
    debug3: Wrote 12372 bytes for a total of 595929
    debug3: Wrote 16436 bytes for a total of 612365
    debug3: Received SSH2_MSG_IGNORE
    debug3: Wrote 4148 bytes for a total of 616513
    debug3: Wrote 12372 bytes for a total of 628885
    debug3: Wrote 16436 bytes for a total of 645321
    debug3: Received SSH2_MSG_IGNORE
    debug3: Wrote 4148 bytes for a total of 649469
    debug3: Wrote 12372 bytes for a total of 661841
    debug3: Wrote 16436 bytes for a total of 678277
    debug3: Wrote 4148 bytes for a total of 682425
    debug3: Received SSH2_MSG_IGNORE
    debug3: Wrote 12372 bytes for a total of 694797
    debug3: Wrote 16436 bytes for a total of 711233
    debug3: Received SSH2_MSG_IGNORE
    debug2: channel 0: rcvd adjust 738120
    debug3: Received SSH2_MSG_IGNORE
    debug3: Wrote 4148 bytes for a total of 715381
    debug3: Received SSH2_MSG_IGNORE
    

    UPDATE 1
    Per @Daniel Lawson's advice, I tried using netsh to clamp the VPN interface's MTU to 1280 to avoid filling the client's TCP buffers in case of an MTU mismatch along the path to the server. The transfer still stalled at about the same point. I also tried disconnecting the VPN and clamping the physical interface's MTU to 1280 with the same failed result.

    • Gregory MOUSSAT
      Gregory MOUSSAT about 12 years
      You don't say anything about running your transfert manually from the command line, with sftp and the desired options. Is it okay, or do you have the same problem ?
    • Secure Shel
      Secure Shel about 12 years
      Actually, I did note that "the command-line sftp.exe client works properly." I've just now added a bit more explanation; sftp.exe file transfers succeed with default options. I've also tried Net::SFTP::Foreign without the keepalive options and got the same failure as before.
  • Secure Shel
    Secure Shel about 12 years
    Great suggestion. I played with ping -f -l and it looks like the 1500 MTU holds up from client to server. I used the netsh shell to clamp the VPN network interface MTU to 1280, but experienced the same stall. I also tried running the transfer over the physical interface with the MTU clamped to 1280 and the VPN disconnected--no luck there, either.
  • Secure Shel
    Secure Shel about 12 years
    Thanks, this was really close. I varied the Net::SFTP::Foreign::put() block_size parameter (default 32KB), but choosing smaller blocks translated into less data sent before the transfer stalled. There's a related queue_size parameter that controls the "maximum number of requests that can be concurrently waiting for a server response". It starts at 32. By cutting that down to 23, I finally got transfers up to 5MB to succeed. They fail with queue_size 24, so I'll probably back off to 16 for safety. Still not sure why this installation needs this adjustment, but it's working.
  • Yoav Aner
    Yoav Aner about 12 years
    Great to hear. I should have mentioned the queue_size too, but wasn't sure what it did exactly and didn't want to overwhelm with too many options that might or might not work. This upload chunking tweaks are also a mystery to me. I think it's something rather deep within OpenSSL/OpenSSH, but as long as there are workarounds we can live with it.
  • Secure Shel
    Secure Shel about 12 years
    Hey, nifty. Thanks for the pointer! Since we use Net::SFTP::Foreign on so many boxes without issue (though often v0.61), I just assumed the remote network was at fault here. Never suspected a bug in the library.