How I Use Borg for Ransomware-Resilient Backups

If you’re in need of a backup solution for your *nix machines, BorgBackup is a great tool. Borg features encryption, deduplication, append-only data access for ransomware resiliency, and data compression. I’ve been using it for five or six years now, and I’ve developed a strategy for deploying borg that I’ll share with you.

This isn’t a step-by-step tutorial on using borg. If you want that, go check out borg’s Installation Guide and Quick Start Guide, which do a good job of explaining it.

Do you see a problem with anything I’ve written about here? Please contact me and I’ll update the post appropriately.

Also in case it has to be said, this information is PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, and I’m not LIABLE FOR ANY CLAIM, DAMAGES, OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT, OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE BLOG POST. Do your own threat modeling and red-teaming. Don’t take my word for it.

Borg Repositories

Let’s establish what a borg repository is, and what its security properties are.

So, a borg repo stores a collection of backups. When you create the repo, you choose an encryption mode: the key material can live inside the repository itself (repokey) or in a key file on the client (keyfile), protected by a passphrase either way. The repository can be stored locally on disk (or anything that looks like a disk), but borg can also back up over an SSH connection. This creates a natural client-server model, where the data repository is stored on a server and a client connects over SSH to that server to back itself up. SSH is a convenient means of authentication for the client-server model here.

While multiple clients can back up data to the same borg repository, any client that wants to write to the repository has to acquire a write-lock, so only one client can write data to the repository at a time. Consequently, I find that backing up multiple systems into a single repository is logistically challenging. It’s also a security risk, as multiple systems accessing the same repository would have to share a single encryption key. If any one system were compromised, an attacker could decrypt the data for all systems using the repository.

Instead, I host multiple repositories on a single backup server. This also provides an opportunity for additional access controls, which I’ll explain later.

Backup data is encrypted by the client before it is transmitted to the server. As a result, if the backup server is compromised, the attacker can delete or ransom the backups but they cannot decrypt them and recover the data within.

So, about the backups in the repos. Each backup is a complete snapshot in time of the file tree which is backed up, but the file data is deduplicated within a repository. If you back up a system on three separate occasions, and your /usr/bin/gcc file was the same in each snapshot, only one real copy of that data is stored. The specifics are actually a bit more complicated, since under the hood this works by breaking the file into chunks and deduplicating those individual chunks. As a result, you can get deduplication of files within the same snapshot, or even partial deduplication of files that were only appended or partially rewritten.

File data is also compressed within the repository, and you have your choice of algorithms like lz4, gzip, lzma, and zstd. I prefer lz4 on my resource-limited systems. Everywhere else I use zstd level 3 for a nice balance of compression speed and ratio.
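
If you’re curious how much the deduplication and compression actually save, borg info on a repository reports the original, compressed, and deduplicated sizes across all archives. A minimal example, with a placeholder path:

$ borg info /path/to/repository

The “All archives” summary at the bottom of the output is a quick way to sanity-check that your compression choice is pulling its weight.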

How I Use Borg

Client Systems

First things first, I generate a long, random passphrase for the system’s repo. Never re-use passphrases between systems. I store one copy of it in a password vault, and another goes on the client system.

On each of my systems, I create a folder called /backup which is owned by root and unreadable to any other user. Within that folder, I create two files, env.sh and backup.sh. I also generate an SSH keypair.
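
The one-time setup looks roughly like this (the key file name matches the env.sh below and is otherwise arbitrary):

mkdir -p /backup
chown root:root /backup
chmod 700 /backup
ssh-keygen -t ed25519 -N '' -f /backup/system-name-ssh-key

The SSH key has no passphrase because backups run unattended; the permissions on /backup are what protect it.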

env.sh stores environment variable definitions for the repository, including the backup destination and encryption passphrase. The borg command looks at variables prefixed with BORG_ for configuration beyond the command-line flags. Also note the BORG_BASE_DIR variable: it tells borg to use /backup to store its working data and the metadata cache that speeds up future backups.

$ cat env.sh
#!/bin/sh
# REPO is only read by backup.sh; the BORG_* variables are read by borg itself
export REPO='borg@backup.server:/path/to/repository'
export BORG_BASE_DIR=/backup
export BORG_PASSPHRASE="a randomly generated password unique for every system"
export BORG_RSH='ssh -i /backup/system-name-ssh-key'

backup.sh stores the actual command to create the backup. Here’s the simplified version:

$ cat backup.sh
#!/bin/sh
. /backup/env.sh

borg create "$@"                       \
    --stats                            \
    --one-file-system                  \
    --compression auto,zstd,3          \
    --exclude /backup                  \
    --exclude /root/.cache             \
    --exclude '/home/**/.cache'        \
    "$REPO::{hostname}-{now:%Y-%m-%d}" \
    /                                  \
    /mnt/otherfilesystem
    # add more paths as desired

Maybe you don’t want a full system backup and instead just want to capture a few specific directories. Not a problem. Any directory you ask borg to back up will be recursively traversed, but it won’t follow symlinks, so don’t worry about that. You may also want to use application-specific backup processes in your backup script. For example, databases may not like being restored from a snapshot that was taken while the database was running, so you might want to exclude the database’s data folder and create/store a database dump instead, as in the sketch below.
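
Here’s a sketch of what that could look like in backup.sh, using PostgreSQL as the example; the dump path and the excluded data directory are assumptions, so adapt them to your database:

# run before borg create; dump the database somewhere that *is* backed up
mkdir -p /var/backups/dumps
sudo -u postgres pg_dumpall | zstd > /var/backups/dumps/postgres.sql.zst

Then add --exclude /var/lib/postgresql (or wherever your live data directory is) to the borg create flags.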

Be mindful of swapfiles, too. You probably don’t want to back those up, so add an --exclude for /swapfile or wherever yours lives.

I split these two files so that I can interactively source env.sh to more easily work with the repository manually when I need to. Additionally, since backup.sh passes any extra arguments through to borg create, I can manually run /backup/backup.sh -p to watch a backup happen in real time.

With these files in place, I then go over to my backup server to complete the configuration there.

Server System

On the other side of the coin is the backup server. I could have one backup server or several; it doesn’t really matter, because this setup scales out however you like.

On the server I create a user called borg, which all clients connect as. The user has no password, so each client has to use its own SSH key to connect. This allows us to place restrictions on those connections in the borg user’s authorized_keys file.
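
Creating that user can be as simple as this sketch (the home directory and shell are a matter of taste):

useradd --create-home --home-dir /home/borg --shell /bin/sh borg
passwd --lock borg
mkdir -p /home/borg/.ssh && chown borg: /home/borg/.ssh && chmod 700 /home/borg/.ssh

Each entry in the authorized_keys file looks something like this: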

restrict,command="borg serve --append-only --restrict-to-repository /mnt/backups/borg/torchic-repo" ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIIuGYc6VTb21SVzRehdi3Pd+AgVnw3g6JB66LK36IVdI root@torchic

The restrict flag tells OpenSSH to prevent the client from doing anything fun like port forwarding, X11 forwarding, opening extra channels, sftp, and so on. On older systems you have to write out all the restrictions one by one, but restrict was introduced as a forward-compatible way to sandbox the client into interacting with whatever command is being executed and nothing else.

The command argument forces a specific command to run when a client connects with that SSH key. borg serve was made for this purpose, and communicates with the connecting borg client over stdin/stdout. The --restrict-to-repository argument restricts the client to a specific repository, and, crucially, --append-only prevents the client from deleting any existing data in the repository.

With all this done, I borg init from the client system, run a backup manually, and then set up a cronjob or systemd timer to create a new backup daily from then on.
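
The first-run steps on the client look roughly like this (the encryption mode and cron schedule are my own preferences, not requirements):

. /backup/env.sh
borg init --encryption=repokey-blake2 "$REPO"
/backup/backup.sh -p

# then something like this in root's crontab:
30 2 * * * /backup/backup.sh

Whether you reach for cron or a systemd timer is purely a matter of taste.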

Replication

Replicating backups off-site is broadly out of scope for this post, but I do want to say that the repos are structured very regularly, a bit like this:

.
├── config
├── data
│   ├── 0
│   │   ├── 1
│   │   ├── 101
│   │   ├── 103
│   │   ├── 996
│   │   └── 998
│   └── 1
│       ├── 1000
│       ├── 1002
│       └── 1008
├── hints.1168
├── index.1168
├── integrity.1168
├── nonce
└── README

As a result, it’s very easy to mirror this to any media you’d like, such as cloud bucket storage, a tape archive, or just some other hard drives.
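
As one possible sketch, a pull-style mirror with rsync could be as simple as the following, with placeholder host and paths; making the replica itself resistant to deletion (filesystem snapshots, object lock, offline media) is left to you:

rsync -a backup.server:/mnt/backups/borg/torchic-repo/ /mnt/replica/torchic-repo/

Because backups are encrypted client-side, the replica doesn’t need to be trusted with anything more than the ciphertext.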

Restoring data

Restoring a backup is a complicated topic that depends on the nature of the backup. For the simplest full-system backup, all you’ve got to do is borg extract a backup into a freshly formatted filesystem, set up your bootloader, fix up /etc/fstab with any new filesystem UUIDs, and you’re good to go. Your mileage may vary, and you’ll need to develop your own processes here depending on your situation.

I often just borg mount a backup and copy over only the bits I care about. Nothing wrong with that if it suits your needs.
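
As a minimal sketch (the archive name is an example; borg list shows you what actually exists, and mounting needs FUSE support):

. /backup/env.sh
borg list "$REPO"
mkdir -p /mnt/restore
borg mount "$REPO::torchic-2024-05-01" /mnt/restore
# copy out whatever you need, then:
borg umount /mnt/restore

For a bulk restore, borg extract on the same archive name does the job instead.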

Threat Modeling

This is not a comprehensive look at the nitty gritty of the security, just the bits I think I can include in this overview. Borg has really great documentation on this topic, so I urge you to read their security FAQs and security internals pages for more information.

The nonce file

Before getting into specific scenarios, I need to tell you about the nonce file.

Every repo has a nonce file in it. It’s used in the encryption and is updated whenever a client writes data to the repository. If a write ever re-uses a nonce, the repo’s encryption can be broken; see this FAQ entry for more info. When writing, a client uses the greater of the server’s copy of the nonce value and its own cached value. If two clients write to the same repo, an attacker with server access could reset the nonce value after client A writes data, causing client B to re-use the same nonce (it doesn’t know the nonce was advanced). This is another reason to stick to one client per repo.

This raises the question of how to manage the nonce file in an append-only replica of the repository. You could just exclude it from the replica, since it’s not needed for reading data. If your backup client has the latest nonce cached, this is fine. If it doesn’t, then the client will re-use nonce values, and this breaks the encryption.

If you are ever in a situation where a client may re-use a nonce, you should consider the repo’s encryption broken. The simplest solution is to make a new repo and migrate your old data into it.

Now let’s get into how the deployment I described stands up to various attacks.

Scenario 1: An attacker compromises a server

So an attacker compromises the server, and they can access the backup files. They could delete the backups. They could prevent clients from making new backups. They could ransom the backups.

If your server is automatically replicating backups offsite in a push configuration, the attacker may also be able to delete or ransom your off-site backups. Therefore, if you do want automatic push replication, you ideally want to do it in a way that restricts your server to append-only replication, much like the clients are restricted to append-only writes to the server.

Because I use one client per repo, an attacker with server access can’t cause nonce reuse unless the client loses its cached nonce. Still, if an attacker compromises your server, your safest bet is to create a new repository for future backups.

Theoretically, an attacker could also exploit a flaw in the client-server protocol to hack the client when it connects in to the server. To the best of my knowledge, there are no known flaws that could allow this at this time.

Scenario 2: An attacker compromises a client

If the attacker compromises the client, and they get root access, they get the encryption key. This is perhaps not terribly exciting for them, because they can also just read the client’s files right from the disk. Still, this key allows them to decrypt backup history, which may provide data that isn’t currently on the client.

They can also deny service to the other backup clients by uploading lots of data to the server. This can be mitigated with borg serve --storage-quota, and with per-connection bandwidth limits on the server at the network layer.
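
For example, the authorized_keys entry from earlier could grow a quota (the size here is arbitrary):

restrict,command="borg serve --append-only --storage-quota 200G --restrict-to-repository /mnt/backups/borg/torchic-repo" ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIIuGYc6VTb21SVzRehdi3Pd+AgVnw3g6JB66LK36IVdI root@torchic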

If the attacker finds a flaw in borg serve --restrict-to-repository, they may be able to break out of this repository restriction and access other clients’ repositories. This is not immediately concerning, because they can’t decrypt the data in those repositories, but if they find further exploits in borg serve they may be able to delete those repositories.

Additionally, if the attacker can find a flaw in borg serve that allows them to get code execution, they could more completely break out of that sandbox and gain a foothold on the server that way.

Both of these can be mitigated by creating separate users for each client, applying AppArmor or SELinux controls to the borg serve process, and other system-level isolation techniques on the borg server. Use similar logic if you’re running something other than Linux on the borg server. For example, you FreeBSD folk are probably reaching for your FreeBSD jails already.

Scenario 3: An attacker compromises server replicas

This is very dependent on how you’ve decided to replicate your data. In all cases, the attacker won’t be able to decrypt data unless they also get ahold of the encryption keys, but they can certainly try to ransom the replica.

If you use a pull-configuration with a replica server connecting to the primary borg server to download data, the replica may be able to delete that server’s data. It’s up to you to put in the access controls to give it read-only access.

Scenario 4: An attacker compromises your password vault

If an attacker compromises the password vault, they get access to all the keys needed to decrypt the data. At this point, your backups should no longer be considered encrypted. However, the attacker still needs to get access to the backup server or a replica to get the actual backup data.

Rotate your encryption keys immediately, do all your other incident response, etc.

I recommend against storing your SSH keys in the vault. If a client dies and you need to rebuild it, just make a new SSH key. If you store your SSH keys in the vault, the attacker can just use those to connect to the borg server and get the data.

Scenario 5: A repo has a stale write-lock

The borg client will not break an existing write-lock held by a different system. If the write-lock was acquired by the client system, but the process that acquired the lock is dead, the client can prove the lock is stale and automatically break it (perhaps the system was rebooted mid-backup).

You may occasionally end up in situations where the write-lock was acquired by the client, but the client can’t prove it; maybe the system was hard-rebooted and the lock wasn’t synced to disk properly. In this case, the client errs on the side of safety and won’t break the lock, on the assumption that it may be held by another connection.

If this happens, your client won’t back up new data. I recommend setting up some alerting to let you know when your backups fail for any reason, so you can investigate and do whatever you need to do to clear things up (breaking locks, repairing data, etc.).
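
As a minimal sketch, the cron job could run a thin wrapper that emails you on failure; the recipient and a working local mail setup are assumptions, so substitute whatever alerting you already use:

#!/bin/sh
# run the backup and report a non-zero exit status
/backup/backup.sh
status=$?
if [ "$status" -ne 0 ]; then
    echo "borg backup failed on $(hostname) with exit code $status" | mail -s "borg backup failure" root
fi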

Learning more

I’ve only scratched the surface of borg. Borg has some of the best documentation of any tool I’ve had the pleasure of using, and you should really go read it if you want to learn more. Head on over to their readthedocs pages at borgbackup.readthedocs.io.