- To Keep in Mind
- The `sync` command
- The `check` command
- Appendix
Recap
The previous post detailed how Rclone, unlike many other programs, can reliably upload large files to Backblaze along with their checksums. This post outlines the workflow and some gotchas to keep in mind when doing massive data loads over the internet.
Through trial and error, I was able to archive 8 TB of footage from my Synology NAS to Backblaze B2 in about a month.
To Keep in Mind
First, the overall workflow.
Remote to Remote is Possible
Keep in mind that Rclone supports copying between two remotes directly. The computer running Rclone streams the data through RAM as it shuttles it between the two.
In fact, that’s mainly what I did: transferred assets from a personal B2 bucket to the organization’s new B2 bucket. Pretty neat!
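As a sketch, a direct remote-to-remote copy looks like this (the remote and bucket names here are hypothetical):

```shell
# Copy straight from one configured B2 remote to another; data is
# streamed through the local machine's RAM, not written to disk.
rclone copy b2-personal:old-bucket/footage b2-org:new-bucket/footage -v
```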
List Folders Syntax: lsd
After setting up your remote with `rclone config`, use the list directory command `lsd` to double check your source/target folders.
For example, if the B2 remote is named `b2-remote1`, then the command to list the root is `rclone lsd b2-remote1:`. Note the `:` at the end.
If a folder contains spaces, use double quotes rather than backticks, e.g. `rclone lsd "b2-remote1:bucket/My Folder/"`. Also use trailing forward slashes `/` instead of asterisks `*` to indicate the files inside.
Consider `copy` instead of `sync`
From the docs1:
- `rclone copy` - Copy files from source to dest, skipping already copied.
- `rclone sync` - Make source and dest identical, modifying destination only.
Depending on your intention, `copy` may be better.
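Concretely (with placeholder source and remote names), the difference shows up when the destination contains files the source doesn't:

```shell
# copy: uploads new/changed files, never deletes anything at the destination
rclone copy ~/footage b2-remote1:bucket/footage --dry-run -v

# sync: would ALSO delete destination files not present in the source
rclone sync ~/footage b2-remote1:bucket/footage --dry-run -v
```

The `--dry-run` flag here means neither command actually changes anything; it just reports what would happen.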
Expect Errors and Verify
Although Rclone automatically retries upload errors (by default up to 10 times), there are a few reasons why files may never get uploaded. See the appendix for various scenarios.
Therefore, in a nutshell, always verify your transfer afterwards (see below).
Beware Quota Restrictions
Unexpected EOF (end of file) errors can occur when streaming from a remote because of Backblaze quota restrictions, so check your account’s daily caps before starting a large transfer.
Double Check the Source Supports (and has) Checksums
Since Backblaze only supports SHA-1 checksums, the Rclone docs indicate the source must also support SHA-1 checksums.2
For a large file to be uploaded with an SHA1 checksum, the source needs to support SHA1 checksums. The local disk supports SHA1 checksums so large file transfers from local disk will have an SHA1. See the overview for exactly which remotes support SHA1.
So B2 to B2 syncs should always populate checksums, right? Wrong. They will only if the source B2 files had checksums in the first place.
As detailed in the previous post, that means large files will only carry checksums if they were originally uploaded with Rclone (or another tool that writes SHA-1s for large files).
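One way to spot-check this is to have rclone list the SHA-1s it sees on the source (remote and bucket names are hypothetical); large files that were uploaded without a checksum typically show an empty hash:

```shell
# List the SHA-1 checksum stored for each file on the remote.
rclone sha1sum b2-remote1:bucket/footage/
```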
Rclone Browser is Great (but Deprecated) for Local <-> Remote
Rclone Browser is a GUI wrapper that uses the same config as the CLI. It does not support direct remote-to-remote syncs, but it is good for normal use. Unfortunately the program was deprecated in favor of the Web GUI, but the latter doesn’t yet let you upload things. 🤷🏾♂️
On Mac, Rclone Browser can be installed with Homebrew via `brew install --cask rclone-browser` (older Homebrew versions used `brew cask install rclone-browser`).
⬆︎ Reliability by ⬆︎ Chunk Size (using ⬆︎ RAM)
The default settings seem to be optimized for small files, like webpages.
- Single part upload cutoff of 200 MB
- Chunk size of 96 MB
- Four concurrent transfers
For whatever reason, the error rate with these defaults was higher than I expected (see below).
Instead, I found better stability for large video files with:
- Cutoff of 1G
- Chunk size between 1G and 4G
- Two concurrent transfers
Note that all concurrent chunks are buffered into memory, so there is significantly more RAM usage with larger chunk sizes. Hence the downgrade to two transfers.
More specifics in the sync section below.
Measure Twice, Cut Once: --dry-run
Before discussing the `sync` command, it’s imperative to mention the `--dry-run` flag, for the following reasons.
- Backblaze bills by usage/throughput
- B2 doesn’t support renaming files after they are uploaded
Therefore, when running `rclone sync`, always use the `--dry-run` option first.
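For example (with placeholder source and destination names):

```shell
# Preview exactly what would be copied or deleted, without
# transferring anything or incurring Backblaze charges.
rclone sync ~/footage b2-remote1:bucket/footage --dry-run -v
```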
The `sync` command
My go-to `sync` (or `copy`) command is:
```shell
rclone sync <source> <dest> --exclude .DS_Store -vv --b2-upload-cutoff 1G --b2-chunk-size 1G --transfers 2
```
Explanation of Flags
- `--exclude .DS_Store` to exclude Mac-specific files
- `-vv` to enable DEBUG logging for visibility into chunk retries, etc.
- `--b2-upload-cutoff`: files above this size will switch to a multipart chunked transfer
- `--b2-chunk-size`: the size of the chunks, buffered in memory
- `--transfers`: number of simultaneous transfers. `b2-chunk-size` × `transfers` must fit in RAM
Phased Approach with --max-size
Sometimes I found it helpful to transfer all files under a certain size limit first, say 1 GB, and then re-run the command for larger files.
To do so, add `--max-size 1G` to the `rclone sync` command.
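A sketch of the two-phase approach (placeholder source/destination names):

```shell
# Phase 1: everything under 1 GB -- lots of small files, quick wins.
rclone sync <source> <dest> --max-size 1G --exclude .DS_Store -vv

# Phase 2: re-run without the size cap; files already copied in
# phase 1 are skipped, so only the large files remain.
rclone sync <source> <dest> --exclude .DS_Store -vv --b2-upload-cutoff 1G --b2-chunk-size 1G --transfers 2
```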
The `check` command
Always verify after a sync, even if you think you don’t need to. The command is straightforward:
```shell
rclone check <source> <dest> --exclude .DS_Store
```
If there are discrepancies, the output will contain one ERROR line per mismatched or missing file.
Use error output to create a diff file
By massaging the `rclone check` error output into a new file containing just the file names, it is possible to re-sync only those files. This saves Backblaze read transactions on the files already copied.
Assuming the file names are collected in `mydiff.txt`, the sync command is `rclone sync <source> <dest> --files-from mydiff.txt`.
Then, run `rclone check` again on all the files.
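Here is a minimal sketch of that massaging step. The log lines below are illustrative of the shape rclone emits (it writes ERROR lines to stderr, so capture that), and the remote names are placeholders:

```shell
# Capture check errors to a log first (rclone logs go to stderr):
#   rclone check <source> <dest> 2> check.log
# Illustrative log contents:
cat > check.log <<'EOF'
2021/03/01 10:00:00 ERROR : videos/clip1.mp4: file not in destination
2021/03/01 10:00:01 ERROR : videos/clip2.mp4: sizes differ
EOF

# Keep only the file path between "ERROR : " and the next ": ".
sed -n 's/^.*ERROR : \(.*\): .*$/\1/p' check.log > mydiff.txt

# mydiff.txt now holds one path per line; re-sync only those files:
#   rclone sync <source> <dest> --files-from mydiff.txt
```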
The `cleanup` command
If your buckets were created with default settings, the file lifecycle is set to Keep all versions.
To purge old versions, use a similar syntax to the `lsd` command.
Also note that3:
Note that `cleanup` will remove partially uploaded files from the bucket if they are more than a day old.
Appendix
Performance Logs
The first command I ran used the default chunk settings, and it completed roughly 3 days later with a 5% error rate.
By instead using a chunk size of 1G and two max transfers (at most 2G in RAM at a time), transfers were noticeably more stable.
Upload cutoffs of “5G”
During my experiments, I once tried a 5G single-part cutoff: `--b2-chunk-size 2G --b2-upload-cutoff 5G --max-size 5G`. The docs state “This value should be set no larger than 4.657 GiB (5 GB)”, yet in practice it threw an error.
So apparently `5G` is too high. `4G` worked fine though.
500 Internal Server Error
A 500 means something is wrong on Backblaze’s side, usually a transient problem. Rclone will retry, by default up to 10 times, with built-in rate limiting (the pacer), as happened with incident a7691a3d7f71-e47fc872d7ba.
References
Rclone brands themselves as “rsync for cloud storage”, and with its versatility and the number of providers it supports I’m inclined to believe them.
The setup I describe below is one that I use as a component of my backup process. It’s an automated, off-site, encrypted copy of my Documents folder from early in the morning before I start my day. It’s not meant to be a primary recovery source, but will be there if I need it.
The short of it is that I can sync my Documents directory to a Backblaze storage bucket with a single command line which I then pop into a script file that gets executed by cron every morning.
I chose Backblaze based on its reputation and price. They have a solid reputation, B2 is their business storage product, and the pricing comparison can be seen here.
Start by Logging into Backblaze, navigating to Buckets, and creating a new bucket. Give it a unique name and ensure it’s private. I chose to call mine “Desktop-1810-Documents”.
Now click the link to “Show Account ID and Application Key”. Under “Add Application Key” create a new key with access to only the bucket you just created. Copy the Application Key it shows you at the end, you won’t see it again and will have to create a new Application Key if you lose it or need to change it in the future.
Install Rclone
Compare your package version to the latest available from the Rclone website and then either install the package repository version or follow the instructions for the scripted install from their website.
Configure
Open a command prompt and type `rclone config`. We will start by creating a “backblaze” remote pointing to our B2 bucket.
Select ‘n’ for a new remote and name it “backblaze”.
Use the bucket applicationKeyId as the account and the Application Key itself as the key. Do not use your actual Account ID listed at the top of the Buckets page.
Now decide if you’d like to have files permanently deleted when you delete them from your local machine. Skip the advanced config (select ‘n’) and then review and finalize your settings.
Verify it worked with `rclone lsd backblaze:` — the output should be the name of the B2 bucket.
Now we’re going to create an encrypted remote named “encrypted_b2” inside our backblaze remote.
Run rclone config again and select ‘new remote’. This time call it “encrypted_b2” and select the Encrypt/Decrypt a remote option from the list of remotes.
It will now ask for the name of the remote to encrypt. Here type in `backblaze:` followed by the B2 bucket name from above. In my case this would be `backblaze:Desktop-1810-Documents`
Answer the remaining questions about encrypting file names and complete the wizard.
Your remote is now configured and all of your backup destinations should point to encrypted_b2.
We’re now ready to perform the first sync.
```shell
rclone sync -v /home/user/Documents/ encrypted_b2:/Documents/
```
Once completed (and after a few minutes) you can go back to the Backblaze B2 website, click on the Bucket, and you should see one high-level folder possibly with the name encrypted depending on your earlier choice.
Congratulations, your files are now backed-up securely offsite at minimal cost!
Now paste that line into a text file, name it with a .sh extension, make it executable, and schedule it with cron.
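Sketched out, with hypothetical paths (adjust the user and schedule to taste):

```shell
# Wrap the sync command in a small script in the home directory.
cat > "$HOME/b2-backup.sh" <<'EOF'
#!/bin/sh
rclone sync -v /home/user/Documents/ encrypted_b2:/Documents/
EOF
chmod +x "$HOME/b2-backup.sh"

# Then schedule it with `crontab -e`, e.g. every morning at 05:00,
# appending output to a log file:
#   0 5 * * * $HOME/b2-backup.sh >> $HOME/b2-backup.log 2>&1
```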
To restore files you would just initiate a sync in the opposite direction.
```shell
rclone sync -v encrypted_b2:/Documents/ /home/user/Desktop/restoredDocs/
```
As a nice bonus Rclone also gives you the ability to mount a remote B2 bucket as a local drive with the rclone mount command.
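For instance (the mount point is hypothetical, and mounting requires FUSE — e.g. macFUSE on a Mac):

```shell
# Browse the encrypted remote like a local disk.
mkdir -p ~/b2-mount
rclone mount encrypted_b2:/Documents/ ~/b2-mount --daemon

# When finished:
#   umount ~/b2-mount
```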