Backup automation with Paperless-ngx and rclone

Ready for the ultimate backup solution? Learn how to combine Paperless-ngx with rclone to back up your documents to the cloud, fully automated and encrypted.

Last updated: Aug 31, 2024

13 mins read

Why backups matter

Regularly creating backups in your paperless office, especially when using Paperless-ngx, is indispensable for several reasons:

  • Data recovery: All your documents, including important contracts, invoices, and personal records, are stored digitally. Regular backups protect against data loss due to hardware failures, software issues, and accidental deletion. In the event of natural disasters, theft, or cyberattacks, you can quickly restore your data and maintain your business operations.
  • Legal requirements: You may be legally required to retain certain documents for a specific duration. Through backups, you ensure that you have the documents available for legal processes or audits.
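  • Version control: With regular backups, you can create a chronological archive of your documents. This allows you to access older versions of your documents, which is useful for tracking changes. Note that a pure mirror, like the rclone sync used in this guide, only keeps the latest state; older versions require versioned storage or an option such as rclone's --backup-dir.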

The 3-2-1 Backup Rule

You may have already heard of the 3-2-1 backup rule. It's a recommended strategy for creating backups and ensuring high data security. The rule is simple but effective.

The three criteria:

  • 3 copies of your data: Create at least three separate copies of your data. This includes your original data in Paperless and two additional backups. If one backup fails, you have another option to restore your data.
  • 2 different devices: Store these backups on at least two distinct devices. These can be hard drives or cloud storage. Different storage technologies have different vulnerabilities, and diversifying across multiple types reduces device-specific risks.
  • 1 off-site location: At least one of the backups should be stored at a different location. This protects your data in the event of natural disasters or theft at your main office. Cloud storage is often used for off-site backups because you can access it from anywhere.

Implementing the 3-2-1 backup rule helps you effectively protect your data by diversifying across multiple storage types and locations.

Before you start: Where do you want to secure your data?

To follow the 3-2-1 backup rule, first decide where you want to store your backup copies.

I will show you an example of setting up a Paperless backup automation from Linux to Google Drive using the open-source tool rclone.

Most NAS devices and VPSs run a Linux distribution, so the steps should be quite similar regardless of your device.

What is rclone?

Rclone is a command-line program for managing files on cloud storage. It runs on Linux, macOS, and Windows and supports over 70 cloud providers (Google Drive, OneDrive, Dropbox, Amazon S3, etc.).

With its sync command, rclone mirrors a folder on your server to the cloud storage. Only new or changed files are transferred, so not everything needs to be re-uploaded with each change, and files that disappear from the source folder are also removed from the cloud copy. To prevent the cloud provider from reading your data, we will also encrypt the backup with rclone.
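Because a mirror also propagates deletions, an empty or missing source folder would wipe the cloud copy. The following is a minimal sketch of a guard you could put around the sync step. safe_sync is a hypothetical helper name; --dry-run and -v are standard rclone flags that list what would be copied or deleted without changing anything.

```shell
#!/usr/bin/env bash
# safe_sync is a hypothetical helper name, not part of rclone.
# rclone sync deletes files on the destination that are missing from the
# source, so an empty source folder would wipe the cloud copy.
safe_sync() {
  local src="$1" dst="$2" mode="${3:-run}"
  # Refuse to continue if the source folder is missing or empty
  if [ -z "$(ls -A "$src" 2>/dev/null)" ]; then
    echo "refusing to sync: $src is missing or empty" >&2
    return 1
  fi
  if [ "$mode" = "preview" ]; then
    # --dry-run reports planned copies and deletions without changing anything
    rclone sync --dry-run -v "$src" "$dst"
  else
    rclone sync "$src" "$dst"
  fi
}
```

For example, once the remotes from Steps 3 and 4 exist, safe_sync /home/tobias/paperless-ngx/export gdrive_encrypted: preview shows the planned changes, and the same call without preview performs the sync.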

Requirements

The prerequisite for this guide is that you have already successfully installed Paperless-ngx and fed it with some documents. You also need an account with a cloud provider. Here I use Google Drive. The following steps are very similar and sometimes simpler for other cloud providers.

Paperless Document Exporter

There are various ways to back up Paperless-ngx. I recommend using the native “Paperless Document Exporter” & “Importer” instead of simply copying your Paperless folders: if Paperless runs in Docker, part of the data (for example, the database) may live in Docker volumes that a plain folder copy would miss. The Document Exporter runs with just one command, and it exports not only all files but also configurations, user accounts, tags, and everything the algorithm has learned.
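A minimal sketch of a manual export run, assuming the default Docker setup where the Paperless service is called webserver and ./export on the host is mounted as ../export inside the container. run_export is a hypothetical helper; -T (no pseudo-TTY) is a docker compose exec option, and --delete is an optional Document Exporter flag that removes files of deleted documents from the export folder so they don't linger in the backup.

```shell
#!/usr/bin/env bash
# run_export is a hypothetical helper; compose_dir is the folder that
# contains your docker-compose.yml (assumed service name: webserver).
run_export() {
  local compose_dir="$1"
  # Subshell, so the cd does not change the caller's working directory
  (
    cd "$compose_dir" || exit 1
    # -T: no pseudo-TTY, so this also works from cron
    # --delete: drop files of deleted documents from the export folder
    docker compose exec -T webserver document_exporter ../export --delete
  )
}
```

Usage: run_export /home/tobias/paperless-ngx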

Step 1: Install rclone

Log in to your server via SSH and execute the command to install rclone via snap:

sudo snap install rclone

This requires snapd, the snap daemon, which is usually preinstalled on Ubuntu. If the command above fails, install snapd first:

sudo apt update
sudo apt install snapd

If your server has no web browser, you also need rclone on a second device that has one, such as your laptop, to retrieve the token for connecting to the cloud provider (see Step 3).

Since my VPS has no web browser, I also install rclone on my Mac, using the package manager Homebrew:

brew install rclone

You can find downloads for other operating systems on the rclone download page (https://rclone.org/downloads/).
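Once rclone is installed, you can quickly confirm that the tools used in this guide are on your PATH. This is a minimal sketch with hypothetical helper names. With the snap package, rclone lives in /snap/bin, which is why the backup script later calls it by its absolute path.

```shell
#!/usr/bin/env bash
# Hypothetical helpers to check that required commands are available.
require_cmd() {
  # command -v succeeds only if the shell can resolve the name
  if ! command -v "$1" >/dev/null 2>&1; then
    echo "missing command: $1" >&2
    return 1
  fi
}

check_prerequisites() {
  local cmd missing=0
  for cmd in "$@"; do
    # Keep checking so that every missing tool is reported at once
    require_cmd "$cmd" || missing=1
  done
  return "$missing"
}
```

Usage: check_prerequisites rclone docker && rclone version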

Step 2: Create a Google Client ID

Log in to the Google API Console with the same Google account that you want to use for Google Drive.

Select a project or create a new project.

Create a Google project

Under APIs & Services, click on + ENABLE APIS AND SERVICES and search for drive, then activate the Google Drive API.

Select the Google Drive API

Activate the Google Drive API

Click on Credentials in the left sidebar, then click on CONFIGURE CONSENT SCREEN.

Configure the consent screen

Select External and click CREATE. Then enter an app name, the user support email (your own email is okay), and the developer contact information (your own email is okay). Click SAVE AND CONTINUE (all other fields are optional).

Enter the app information

Click on ADD OR REMOVE SCOPES and add the scopes .../auth/docs, .../auth/drive, and .../auth/drive.metadata.readonly so that rclone can create, edit, and delete files. After that, click UPDATE and then SAVE AND CONTINUE.

Add the relevant scopes

Add your own account as a test user. Then click SAVE AND CONTINUE.

Then click on Credentials in the left sidebar again. Click on + CREATE CREDENTIALS and select OAuth client ID.

Choose to create an OAuth client ID

Select Desktop app as the application type and click CREATE.

Create an OAuth client ID

Now you will see the Client ID and Client secret. Copy both! We will need these in Step 3 for rclone.

Copy the client ID and client secret

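Click on OAuth consent screen in the left sidebar, then click on PUBLISH APP and confirm. While the app stays in testing mode, Google lets the granted token expire after a week, so rclone would regularly lose access.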

Step 3: Connect rclone with Google Drive

Go back to your terminal and create a new remote connection with rclone. The command for this is:

rclone config
e) Edit existing remote
n) New remote
d) Delete remote
r) Rename remote
c) Copy remote
s) Set configuration password
q) Quit config
e/n/d/r/c/s/q>

Type n to create a new remote. Then enter a name for the remote (e.g., gdrive).

After that, select the cloud provider by entering its number (17 for Google Drive in my rclone version; the list changes between versions, so check it or simply type drive).

Option Storage.
Type of storage to configure.
Choose a number from below, or type in your own value.
...
17 / Google Drive
   \ (drive)
...
Storage> 17

Next, enter the Client ID from Step 2:

Option client_id.
Google Application Client Id
Setting your own is recommended.
See https://rclone.org/drive/#making-your-own-client-id for how to create your own.
If you leave this blank, it will use an internal key which is low performance.
Enter a value. Press Enter to leave empty.
client_id> my-client-id

Next, enter the Client Secret from Step 2:

Option client_secret.
OAuth Client Secret.
Leave blank normally.
Enter a value. Press Enter to leave empty.
client_secret> my-client-secret

Choose the access permissions for rclone by entering a number. Option 3 (drive.file) is sufficient for the backup, because rclone only needs access to the files it creates itself.

Option scope.
Comma separated list of scopes that rclone should use when requesting access from drive.
Choose a number from below, or type in your own value.
Press Enter to leave empty.
 1 / Full access all files, excluding Application Data Folder.
   \ (drive)
 2 / Read-only access to file metadata and file contents.
   \ (drive.readonly)
   / Access to files created by rclone only.
 3 | These are visible in the drive website.
   | File authorization is revoked when the user deauthorizes the app.
   \ (drive.file)
   / Allows read and write access to the Application Data folder.
 4 | This is not visible in the drive website.
   \ (drive.appfolder)
   / Allows read-only access to file metadata but
 5 | does not allow any access to read or download file content.
   \ (drive.metadata.readonly)
scope> 3

For the next step, you can leave it blank by pressing ENTER.

Option service_account_file.
Service Account Credentials JSON file path.
Leave blank normally.
Needed only if you want use SA instead of interactive login.
Leading `~` will be expanded in the file name as will environment variables such as `${RCLONE_CONFIG_DIR}`.
Enter a value. Press Enter to leave empty.
service_account_file>

After that, select the default by pressing ENTER.

Edit advanced config?
y) Yes
n) No (default)
y/n>

If your server does not have a web browser, enter n.

Use web browser to automatically authenticate rclone with remote?
 * Say Y if the machine running rclone has a web browser you can use
 * Say N if running rclone on a (remote) machine without web browser access
If not sure try Y. If Y failed, try N.
y) Yes (default)
n) No
y/n> n

Copy the entire line that starts with rclone authorize "drive" and run it on the computer with the web browser (for me, in a new terminal tab on my Mac). Your web browser should open, and you can log in to your Google account. Then copy the token that rclone prints and paste it into the terminal on your server.

Option config_token.
For this to work, you will need rclone available on a machine that has
a web browser available.
For more help and alternate methods see: https://rclone.org/remote_setup/
Execute the following on the machine with the web browser (same rclone
version recommended):
	rclone authorize "drive" "very-long-code"
Then paste the result.
Enter a value.
config_token> my-token

Select the default by pressing ENTER:

Configure this as a Shared Drive (Team Drive)?
y) Yes
n) No (default)
y/n> n

You will see a summary of your configuration. Press ENTER.

Keep this "gdrive" remote?
y) Yes this is OK (default)
e) Edit this remote
d) Delete this remote
y/e/d> y

Now the connection to your Google Drive account is established.
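To confirm the connection, you can check that the remote exists and then list its top-level folders with rclone lsd gdrive:. With the drive.file scope, rclone only sees files and folders it created itself, so this list may be empty at first. A minimal sketch of the existence check, using a hypothetical helper built on rclone listremotes:

```shell
#!/usr/bin/env bash
# remote_exists is a hypothetical helper built on "rclone listremotes",
# which prints every configured remote as "name:" on its own line.
remote_exists() {
  # -x matches whole lines and -F treats the name literally,
  # so "gdrive" does not match "gdrive_encrypted:"
  rclone listremotes | grep -qxF -- "$1:"
}
```

Usage: remote_exists gdrive && rclone lsd gdrive: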

Step 4: Create an rclone crypt remote

We will now create another remote that forms an encryption layer over the cloud storage remote we just created. If you do not wish to encrypt the backup, skip this step, proceed to Step 5, and in the backup script use your Google Drive remote with a folder path, for example gdrive:paperless_backup, instead of gdrive_encrypted:. The advantage of an unencrypted backup is that you can use the Google Drive OCR search to quickly find your documents in Google Drive.

Start the configuration again with rclone config and create a new remote (n). You could name it, for example, gdrive_encrypted.

Next, select crypt from the list of providers (number 13 in my rclone version; the numbering can differ, so you can also type crypt).

Then you specify the name of the Google Drive remote along with the desired path where the backup should be stored. My remote is called gdrive and I want to store the backup in a folder named paperless_backup_encrypted. So, I enter gdrive:paperless_backup_encrypted:

Option remote.
Remote to encrypt/decrypt.
Normally should contain a ':' and a path, e.g. "myremote:path/to/dir",
"myremote:bucket" or maybe "myremote:" (not recommended).
Enter a value.
remote> gdrive:paperless_backup_encrypted

Then you select the encryption of the file names. I choose the default by pressing ENTER:

Option filename_encryption.
How to encrypt the filenames.
Choose a number from below, or type in your own string value.
Press Enter for the default (standard).
   / Encrypt the filenames.
 1 | See the docs for the details.
   \ (standard)
 2 / Very simple filename obfuscation.
   \ (obfuscate)
   / Don't encrypt the file names.
 3 | Adds a ".bin", or "suffix" extension only.
   \ (off)
filename_encryption> 1

Then you select the encryption of the folder names. I choose the default again:

Option directory_name_encryption.
Option to either encrypt directory names or leave them intact.
NB If filename_encryption is "off" then this option will do nothing.
Choose a number from below, or type in your own boolean value (true or false).
Press Enter for the default (true).
 1 / Encrypt directory names.
   \ (true)
 2 / Don't encrypt directory names, leave them intact.
   \ (false)
directory_name_encryption> 1

Then you set two passwords: one for the encryption itself and an optional but recommended second one used as a salt. I let rclone generate both with the maximum length.

Option password.
Password or pass phrase for encryption.
Choose an alternative below.
y) Yes, type in my own password
g) Generate random password
y/g> g
Password strength in bits.
64 is just about memorable
128 is secure
1024 is the maximum
Bits> 1024
Your password is: dCGprYdlIOtBGa5ozP7H2VM6uXRp_KlwH5OfEoufl-IOgJKXXSWNVKR92K0vf72u1oj3MM9CoEuEgnKTEyxh2na8LtGn-X6v3EGzNkr-UXrBg38pxrijclE0_jtOz1q0ldJe0X9Z918Fd0ZxDmAwT3IqjyTEXhs9bJMedYhyk9w
Use this password? Please note that an obscured version of this
password (and not the password itself) will be stored under your
configuration file, so keep this generated password in a safe place.
y) Yes (default)
n) No
y/n> y
Option password2.
Password or pass phrase for salt.
Optional but recommended.
Should be different to the previous password.
Choose an alternative below. Press Enter for the default (n).
y) Yes, type in my own password
g) Generate random password
n) No, leave this optional password blank (default)
y/g/n> g
Password strength in bits.
64 is just about memorable
128 is secure
1024 is the maximum
Bits> 1024
Your password is: XuL0KVb8vQWnIodgem7qWTlD6vrZKt18dgUmMAK61v1coMUt7DCc6EMPww4viD7YcQDcE78miAKBg9L9Qm8mx2kXiyquMUXrvND-BC9qcJMp95cJsPrsocVxQ26b1aU7aXa3glZre69phmqICZVb6ijfo_-61KRiOsBNCt-QKkA
Use this password? Please note that an obscured version of this
password (and not the password itself) will be stored under your
configuration file, so keep this generated password in a safe place.
y) Yes (default)
n) No
y/n> y
Edit advanced config?
y) Yes
n) No (default)
y/n> n
Configuration complete.

Copy both passwords and store them in a secure place. This completes the configuration of the crypt remote. Confirm the summary with ENTER and leave the configuration menu with q (or CTRL+C).
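The crypt passwords end up in your rclone configuration file, but only in obscured form, which can be reversed, so anyone with the file can recover them. Keeping a private copy of this file is a simple way to not lose the passwords. A minimal sketch, assuming rclone config file prints the path on its last line as current versions do; backup_rclone_config and the target folder are hypothetical.

```shell
#!/usr/bin/env bash
# backup_rclone_config is a hypothetical helper, not an rclone command.
backup_rclone_config() {
  local dest_dir="$1" conf target
  # Assumption: "rclone config file" prints a header line, then the path
  conf="$(rclone config file | tail -n 1)"
  if [ ! -f "$conf" ]; then
    echo "rclone config not found at: $conf" >&2
    return 1
  fi
  mkdir -p "$dest_dir" || return 1
  target="$dest_dir/rclone.conf.$(date +%F)"
  cp "$conf" "$target" || return 1
  # The passwords inside are only obscured, so keep the copy private
  chmod 600 "$target"
  echo "saved $target"
}
```

For example, backup_rclone_config /media/usb/rclone (a placeholder path). Don't store this copy unencrypted in the same Google Drive as the backup.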

Step 5: Create a backup script

I'll now show you how to create a script that will create a backup for you when executed. Go to the folder where you want to save the script. In my case, I'm going to the paperless-ngx folder:

cd paperless-ngx

Then create a new file. Don't use sudo here; otherwise the file belongs to root and the chmod in Step 6 fails:

nano backup.sh

Then paste the script. It is deliberately kept very simple. You can copy it, but you need to adjust the absolute paths to the folder containing your docker-compose.yml and to your export folder:

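#!/bin/bash
# stop at the first failing command, so a failed export never triggers a sync
set -euo pipefail
# folder that contains your docker-compose.yml
cd /home/tobias/paperless-ngx
# execute the Paperless Document Exporter; with the default compose files,
# ../export inside the container is the export folder (./export) on the host
docker compose exec -T webserver document_exporter ../export
# rclone command to encrypt and sync with the cloud
/snap/bin/rclone sync /home/tobias/paperless-ngx/export gdrive_encrypted: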

Save the file and close nano with CTRL+X, Y, and ENTER.
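The script mirrors the export folder, so the cloud copy always reflects only the latest state. If you want the version history mentioned at the beginning, rclone's --backup-dir option moves files that sync would overwrite or delete into a separate folder instead of discarding them. A minimal sketch, assuming the crypt remote from Step 4; versioned_sync and the current/ and archive/ folder names are hypothetical. The backup directory must be on the same remote as the destination and must not overlap it, which is why the mirror moves into a subfolder.

```shell
#!/usr/bin/env bash
# versioned_sync is a hypothetical helper; "current" and "archive" are
# example folder names on the crypt remote, not rclone conventions.
versioned_sync() {
  local src="$1" remote="$2" stamp
  stamp="$(date +%Y-%m-%d_%H%M)"
  # Files that sync would overwrite or delete in <remote>current are moved
  # to a dated folder under <remote>archive instead of being discarded
  rclone sync "$src" "${remote}current" --backup-dir "${remote}archive/${stamp}"
}
```

You would define the function at the top of backup.sh and call versioned_sync /home/tobias/paperless-ngx/export gdrive_encrypted: in place of the last line. Under cron, add export PATH="/snap/bin:$PATH" near the top so the plain rclone name resolves. The restore command then has to read from gdrive_encrypted:current, and the archive folder keeps growing until you prune it.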

Step 6: Make the backup executable and run a test

To make the script executable, enter this command:

chmod +x backup.sh

Now test whether your script works by running it from the command line:

./backup.sh

Depending on the amount of data, this can take several minutes. The exporter shows a progress bar like the one below, and the script exits by itself once the upload has finished.

100%|██████████| 265/265 [00:00<00:00, 332.70it/s]

If you now open Google Drive in your web browser, you should find a new folder (paperless_backup_encrypted in my case) containing the encrypted data from Paperless.

Google Drive folder with the encrypted backup
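Beyond looking at the folder, rclone can verify the upload: rclone cryptcheck compares the checksums of the local files with those of their encrypted copies on a crypt remote. A minimal sketch with a hypothetical wrapper name:

```shell
#!/usr/bin/env bash
# verify_backup is a hypothetical wrapper around "rclone cryptcheck".
verify_backup() {
  local src="$1" crypt_remote="$2"
  # cryptcheck exits non-zero if files differ or are missing
  if rclone cryptcheck "$src" "$crypt_remote"; then
    echo "backup verified: $crypt_remote matches $src"
  else
    echo "backup check failed for $crypt_remote" >&2
    return 1
  fi
}
```

Usage: verify_backup /home/tobias/paperless-ngx/export gdrive_encrypted: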

Step 7: Automate the backup with a cron job

Now we will create a new cron job that automatically executes the script at a defined time:

crontab -e

If the script, for example, is to be executed daily at 3:00 a.m., add the following line at the end of the file:

0 3 * * * /home/tobias/paperless-ngx/backup.sh >> /home/tobias/paperless-ngx/backup.log 2>&1

Here you need to specify the absolute paths to your backup script and to the log file. Absolute paths matter because cron runs with a minimal PATH that does not include /snap/bin, which is also why the script calls /snap/bin/rclone. Save and close the file. When you run crontab -l, you should see your cron jobs, including the new line.
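Because the cron line appends all output to backup.log without timestamps, it can be hard to tell on which night a run failed. A minimal sketch of a hypothetical run_logged helper that you could define in backup.sh and put in front of each command (for example run_logged docker compose exec -T webserver document_exporter ../export). It records start time, end time, and exit status and passes the exit status on, so set -e still stops the script.

```shell
#!/usr/bin/env bash
# run_logged is a hypothetical helper: it prints timestamped start/end lines
# around a command and returns that command's exit status.
run_logged() {
  local status
  echo "[$(date '+%F %T')] start: $*"
  # Running the command as an if-condition keeps "set -e" from aborting
  # before the end line is written
  if "$@"; then
    status=0
  else
    status=$?
  fi
  echo "[$(date '+%F %T')] end: exit status $status"
  return "$status"
}
```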

How to restore Paperless with a backup

First and foremost, store the passwords for the backup encryption securely. If your server is lost, the rclone configuration holding them is lost too, and a backup you cannot decrypt is useless.

If your server has failed and you don't have a snapshot, you'll need to reinstall rclone and Paperless on your server. Follow the same steps, but when configuring the crypt remote, don't generate new passwords: choose y (type in my own password) and enter the previous ones. Use the same remote path and encryption settings as before so that rclone can decrypt your backup again.

Afterwards, run rclone sync in the opposite direction to download and decrypt the backup into your export folder:

rclone sync gdrive_encrypted: /home/tobias/paperless-ngx/export

Once your Paperless export folder is filled again, restore your data into the freshly installed Paperless instance with the Document Importer:

docker compose exec webserver document_importer ../export
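The restore steps can be combined into one helper that refuses to start the import when the download looks incomplete. A minimal sketch with a hypothetical name, assuming the same folder layout and crypt remote as above and that the export contains manifest.json, which the Document Exporter writes.

```shell
#!/usr/bin/env bash
# restore_paperless is a hypothetical helper; it assumes ./export on the host
# is mounted as ../export inside the webserver container.
restore_paperless() {
  local compose_dir="$1" remote="$2" export_dir="$1/export"
  # Download and decrypt the backup into the local export folder
  rclone sync "$remote" "$export_dir" || return 1
  # Without the exporter's manifest.json the import cannot work
  if [ ! -s "$export_dir/manifest.json" ]; then
    echo "no manifest.json in $export_dir, download incomplete?" >&2
    return 1
  fi
  ( cd "$compose_dir" && docker compose exec -T webserver document_importer ../export )
}
```

Usage: restore_paperless /home/tobias/paperless-ngx gdrive_encrypted: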

Final thoughts

The presented Paperless backup automation can be similarly modeled with other cloud providers. Typically, an API key or client ID with a client secret is sufficient for authentication.

When you run the Document Exporter, the export folder in your Paperless directory contains everything to restore your instance.

If you want to make backups to local hard drives, you can simply copy the contents of the export folder. You can use tools like rsync paired with a cron job for this purpose. Some NAS operating systems already have integrated local backup functions.
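A minimal sketch for a local copy with rsync, assuming an external drive mounted at a placeholder path. copy_export_local is hypothetical; -a preserves permissions and timestamps, and --delete mirrors deletions just like rclone sync.

```shell
#!/usr/bin/env bash
# copy_export_local is a hypothetical helper; dest is a placeholder path on
# an external drive, e.g. /mnt/backup/paperless-export.
copy_export_local() {
  local src="$1" dest="$2"
  # Stop if the target is missing, e.g. because the drive is not mounted
  if [ ! -d "$dest" ]; then
    echo "target $dest not found, is the drive mounted?" >&2
    return 1
  fi
  # -a keeps permissions and timestamps, --delete mirrors deletions;
  # the trailing slash copies the folder's contents, not the folder itself
  rsync -a --delete "${src%/}/" "$dest/"
}
```

Usage: copy_export_local /home/tobias/paperless-ngx/export /mnt/backup/paperless-export, which you could run from cron right after backup.sh.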

🛠️ Paperless-ngx IT Support 🛠️

Need help with the installation or configuration of Paperless-ngx? I'm happy to assist! Just send me an email at: hello@digitizerspace.com
