wget is much faster than scp or rsync
I needed to copy 3TB of data from my old homeserver to my new one. I decided to spend as much time "sharpening my axe" as possible. I spent ages dicking around with ZFS configs, tweaking BIOS settings, flashing firmware, and all the other yak-shaving necessary for convincing yourself you're doing useful work.
Then I started testing large file transfers. Both scp
and rsync
started well - transferring files at around 112MBps. That pretty much saturated my Gigabit link. Nice! This was going to take no time at all...
And then, after a few GB of a single large file had transferred, the speed slowed to a crawl. Eventually dropping to about 16MBps where it stayed for the majority of the transfer.
I spent ages futzing around with the various options. Disabling encryption, disabling compression, flicking obscure switches. I tried using an SSD as a ZIL. I rebuilt my ZFS pool as a MDADM RAID. I mounted disks individually. Nothing seemed to work. It seemed that something was filling a buffer somewhere when I used scp
or rsync
.
So I tried a speed test. Using curl
I could easily hit my ISP's limit of 70MBps (about 560Mbps).
It was getting late and I wanted to start the backup before going to bed. Time for radical action!
On the sending server, I opened a new tmux
and ran:
cd /my/data/dir/
python3 -m http.server 1234
That starts a webserver which lists all the files and folders in that directory.
On the receiving server, I opened a new tmux
and ran:
wget --mirror http://server.ip.address:1234/
That downloads all the files that it sees, follows all the directories and subdirectories, and recreates them on the server.
After running overnight, the total transfer speed reported by wget
was about 68MBps. Not exactly saturating my link - but better than the puny throughput I experienced earlier.
Downsides
There are a few (minor) problems with this approach.
- No encryption. As this was a LAN transfer, I didn't really care.
- No preservation of Linux attributes. I didn't mind losing metadata.
- No directory timestamps. Although file timestamps are preserved.
- An
index.html
file is stored every directory. I couldn't find an option to turn that off.- They can be deleted with
find . -newermt '2023-04-01' \! -newermt '2023-04-03' -name 'index.html' -delete
- They can be deleted with
- It all just feels a bit icky. But, hey, if it's stupid and it works; it isn't stupid.
I'm sure someone in the comments will tell me exactly which obscure setting I needed to turn on to make scp
work at the same speed as wget
. But this was a quick way to transfer a bunch of large files with the minimum of fuss.
croc
to transfer the files. It maintained a relatively high speed, but there was a significant loading time when starting (parsing the files, perhaps?).bash
special filename/dev/tcp/host/port
in a file redirection. So:Alex B says:
@edent says:
ssh
- so that doesn't solve the problem.More comments on Mastodon.