wget is much faster than scp or rsync


I needed to copy 3TB of data from my old homeserver to my new one. I decided to spend as much time "sharpening my axe" as possible. I spent ages dicking around with ZFS configs, tweaking BIOS settings, flashing firmware, and all the other yak-shaving necessary for convincing yourself you're doing useful work.

Then I started testing large file transfers. Both scp and rsync started well - transferring files at around 112MBps. That pretty much saturated my Gigabit link. Nice! This was going to take no time at all...

And then, after a few GB of a single large file had transferred, the speed slowed to a crawl - eventually dropping to about 16MBps, where it stayed for the majority of the transfer.

I spent ages futzing around with the various options. Disabling encryption, disabling compression, flicking obscure switches. I tried using an SSD as a ZIL. I rebuilt my ZFS pool as an MDADM RAID. I mounted disks individually. Nothing seemed to work. It seemed that something was filling a buffer somewhere when I used scp or rsync.

So I tried a speed test. Using curl I could easily hit my ISP's limit of 70MBps (about 560Mbps).
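If you want to reproduce that sort of test, curl can report the average transfer speed itself - a rough sketch, with a placeholder URL (point it at any large file you can actually reach):

```shell
# Download a large file, throw the data away, and print the average speed.
# http://example.com/large-file.bin is a placeholder, not a real test server.
curl -o /dev/null -w 'Average speed: %{speed_download} bytes/sec\n' \
    http://example.com/large-file.bin
```

The `-o /dev/null` discards the body so you're measuring the network, not your disk.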

It was getting late and I wanted to start the backup before going to bed. Time for radical action!

On the sending server, I opened a new tmux and ran:

cd /my/data/dir/
python3 -m http.server 1234

That starts a webserver which lists all the files and folders in that directory.

On the receiving server, I opened a new tmux and ran:

wget --mirror http://server.ip.address:1234/

That downloads all the files that it sees, follows all the directories and subdirectories, and recreates them on the server.
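You can dry-run the whole approach on a single machine before trusting it with 3TB - a sketch using the loopback address and some throwaway paths (on a real transfer the two halves run on different servers, in separate tmux sessions):

```shell
# Set up a "sending" directory with a nested file.
mkdir -p /tmp/sender/subdir /tmp/receiver
echo "hello" > /tmp/sender/subdir/file.txt

# "Sending server": serve the directory over HTTP in the background.
cd /tmp/sender
python3 -m http.server 1234 &
SERVER_PID=$!
sleep 1

# "Receiving server": mirror everything, recreating the directory tree.
cd /tmp/receiver
wget --quiet --mirror http://127.0.0.1:1234/

# wget stores the mirror under a host:port directory, e.g. 127.0.0.1:1234/
ls 127.0.0.1:1234/subdir/file.txt

# Tidy up the background server.
kill "$SERVER_PID"
```

Note that wget puts the mirrored tree under a directory named after the host and port; `-nH` would suppress that if you don't want it.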

After running overnight, the total transfer speed reported by wget was about 68MBps. Not exactly saturating my link - but better than the puny throughput I experienced earlier.

Downsides

There are a few (minor) problems with this approach.

  • No encryption. As this was a LAN transfer, I didn't really care.
  • No preservation of Linux attributes. I didn't mind losing metadata.
  • No directory timestamps. Although file timestamps are preserved.
  • An index.html file is stored in every directory. I couldn't find an option to turn that off.
    • They can be deleted with find . -newermt '2023-04-01' \! -newermt '2023-04-03' -name 'index.html' -delete
  • It all just feels a bit icky. But, hey, if it's stupid and it works; it isn't stupid.

I'm sure someone in the comments will tell me exactly which obscure setting I needed to turn on to make scp work at the same speed as wget. But this was a quick way to transfer a bunch of large files with the minimum of fuss.



8 thoughts on “wget is much faster than scp or rsync”

  1. I ran into this as well and ended up using croc to transfer the files. It maintained a relatively high speed, but there was a significant loading time when starting (parsing the files, perhaps?).

    Reply
  2. @Edent For this kind of bulk LAN copy I tend to use tar piped to netcat. Something like nc -l 9999 | tar -x -f- on the receiver, and tar -c -f- <dir> | nc <host> 9999 on the sender. Can chuck a gzip in the pipeline if you're sending something compressible. It can keep all the file attributes, links, etc with the right tar options.

    Reply
    1. says:

      Yes, I've done something similar recently also using netcat.

      Bonus points: You only need netcat at one of the computers. On the other computer, you can just use bash special filename /dev/tcp/host/port in a file redirection. So:

      tar c sourcedir | nc -l -N -p 1234
      tar x < /dev/tcp/10.1.2.3/1234
      
      Reply
  3. Alex B says:

    I always find some way of getting old and new storage hosted in the same system when doing this kind of migration. Obviously, a HP microserver isn't optimal for that, though! Maybe use USB-to-SATA adaptors, or migrate from a single device to RAID after the copy, allowing the use of SATA ports for the old storage?

    Reply
  4. says:

    In theory, uftp could help you reach the absolute maximum speed your LAN could handle (no TCP ACKs!), but that would probably need a bit of careful hand-tuning of the transfer speed. (uftp can afford to avoid the slow-start part of the TCP algorithm.)

    In practice, the slowdown could be due to rsync and scp being careful and calling fsync() every now and then, while plain wget/curl have no need to do that and rely on the OS to flush its buffers when it's comfortable. But I'm not sure at all that's the case here.

    Reply
  5. Combining rsync with GNU parallel is usually much faster, but is there any reason why you are not using “zfs send”?

    Reply
    1. @edent says:

      Firstly, because I didn't know about it. Secondly, because the sending system wasn't using ZFS. And, thirdly, it still requires a transfer over ssh - so that doesn't solve the problem.

      Reply

What are your reckons?

