Published: 5. 10. 2018   Category: GNU/Linux

File synchronization with rsync and rclone

I have been a keen rsync user for over 15 years. I used rsync for the very first time when the Czech Linux User Group (CZLUG) released a Czech variant of Red Hat Linux (I think version 7.1). Everybody was downloading the ISO from FTP, but the original ISO included a broken package that caused the installation to fail. CZLUG quickly fixed the problem, but the ISO had already been downloaded by quite a lot of people…

Downloading it again on your 128 kbps line would take another 10 hours! You can imagine people's frustration. The fixed ISO was almost the same; only a small part containing the fixed package had changed. A member of CZLUG posted a tutorial on using rsync to update/synchronize your local broken ISO with the fixed one. Rsync compared checksums of the ISO's blocks and downloaded only the changed ones. WOW!

Since then I have used rsync for the majority of my file transfers between computers I access over SSH. I have used it on slow, unstable links, when it was the only way to transfer a file without corruption.

I use this one-liner for the most common rsync file transfers:

rsync -avvHPS --rsh='ssh' src/ user@hostname:/path/dst/

Rsync has a few handy options; the one-liner combines these:

-a           archive mode: recurse into directories and preserve permissions, times, owners, groups and symlinks
-vv          extra verbose output
-H           preserve hard links
-P           keep partially transferred files and show progress (shorthand for --partial --progress)
-S           handle sparse files efficiently
--rsh='ssh'  run the transfer over SSH

I also have a few rsync scripts around; this one is a Makefile for updating a web site:

USERNAME = webuser
SITE     = regnet.cz
DIR      = /home/bruxy/web
EXCLUDE  = --exclude '.htaccess' --exclude 'Makefile'
OPTIONS  = --verbose --delete --stats --progress --recursive -z -e ssh \
           --inplace --partial --times $(EXCLUDE)
INPUT_DIR = $(shell pwd)/

.PHONY: copy test

copy:
    @rsync $(OPTIONS) $(INPUT_DIR) $(USERNAME)@$(SITE):$(DIR)
test:
    rsync --dry-run $(OPTIONS) $(INPUT_DIR) $(USERNAME)@$(SITE):$(DIR)
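
A typical workflow with this Makefile is to preview the transfer first and upload only when the output looks right:

# show what would be copied/deleted, without touching the server
make test

# upload the site for real
make copy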

When rsync is not enough: rclone

The first situation where rsync is not enough is quite simple: you need to transfer files between different file storage systems, and rclone supports a lot of them: S3, Google Drive, Dropbox and many more. The second problem is the way rsync handles directories with too many files. How many is too many? From my observations the exact number is not clear, but it is around several million. Rsync may hang, printing „expand file_list pointer array“ messages, until you give up and press Ctrl-C:
[sender] expand file_list pointer array to 524288 bytes, did move
[sender] expand file_list pointer array to 1048576 bytes, did move
^Crsync error: unexplained error (code 130) at rsync.c(642) [sender=3.1.3]
rsync: [sender] write error: Broken pipe (32)

The only solution I was able to google was the advice to split the transfer into several stages, so that each stage transfers a smaller number of files and rsync can handle it; a sketch follows.
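
As a sketch of that staged approach (the paths here are hypothetical, and it assumes the huge tree is split across top-level subdirectories), you could run one rsync per subdirectory so each invocation builds a much smaller file list:

# Hypothetical layout: /data/huge_tree/<subdir>/... on the sender.
# One rsync per top-level subdirectory keeps each file list small.
for d in /data/huge_tree/*/ ; do
    rsync -aHPS --rsh='ssh' "$d" "user@hostname:/path/dst/$(basename "$d")/"
done

Files sitting directly in the top-level directory would still need one extra pass.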

Note: When you design an application that creates files, do you think about the number of files it will handle? On average? In extreme situations? Will all the files be stored in one directory or in a directory tree? You can end up in a situation where your application generates millions of files and the operating system starts to behave weirdly. Weirdly slowly: file operation utilities will complain about Argument list too long, and in the worst case you can get mysterious messages about a full file system even though you have plenty of free space and inodes. Nobody thinks about it until the problems appear; that's life :)
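
For example (with a hypothetical directory holding millions of *.log files), the Argument list too long error comes from shell globbing, which expands all the file names into one enormous argument list; find avoids it because it never builds that list:

rm /var/tmp/app/*.log                      # fails: Argument list too long
find /var/tmp/app -name '*.log' -delete    # works: no glob expansion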

Because I am quite lazy, I did not want to write a helper script to manage the files, so I tried rclone.

Rclone is used a little bit differently. First you need to create a configuration for your remote system. I use it for SSH/SFTP transfers, so my config, listed below, is very simple. I believe other file storage backends will require API keys, access keys and more.

Use the command rclone config for interactive config creation. On Ubuntu 16.04 the config is stored in ~/.config/rclone/rclone.conf, and once generated it will look similar to this:

[remote_server]
type = sftp
host = 172.102.0.123
user = bruxy
key_file = /home/bruxy/.ssh/id_rsa

The file sync itself then looks like this:

rclone -v sync source/directory/ remote_server:/destination/directory/

There is also the --dry-run option for a test run, to see what would happen during the transfer without performing any action.
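
For instance, the same sync from above can be previewed like this:

rclone --dry-run -v sync source/directory/ remote_server:/destination/directory/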