
Paperless GoBD-compliant backup

Backing up Paperless-ngx in a GoBD-compliant way is a tricky topic. Here is a simple solution with everything you need to know.

Anyone who digitizes their private or business documents will sooner or later come across the powerful open-source tool Paperless-ngx. However, as the volume of sensitive data, invoices, and tax-relevant receipts grows, so does the responsibility. Especially for those subject to German tax law, the sword of Damocles known as GoBD (the German principles for the proper management and storage of books, records, and documents in electronic form, and for data access) hovers directly over the hard drive.

In this guide, we will clear up common and dangerous half-truths, explain the fundamental difference between an export and a true backup, and demonstrate step by step how to set up a highly efficient, generation-based backup system using rsync and hard links within the Windows Subsystem for Linux (WSL).


The Great Misconception: Exporting is Not a Backup!

Users frequently rely on the built-in Paperless-ngx tool document_exporter and lull themselves into a false sense of security, believing they possess a fully valid backup. This is a misconception that can end badly in an emergency or during a tax audit. An export and a backup serve completely different purposes:

The “Export” (via document_exporter)

The exporter extracts all existing documents from the internal Paperless structure and places them together with a file named manifest.json into a classic, accessible folder.

  • The Purpose of the Exporter: It serves data portability and independence. Should the Paperless-ngx project ever be discontinued, your PDFs remain cleanly named and accessible on your hard drive. Furthermore, the export allows you to migrate to a completely different system via the document_importer.
  • The GoBD Problem: The export on its own is not revision-safe (tamper-proof). The exported PDFs and the JSON file sit unprotected on the file system. They can be manipulated, deleted, or altered after the fact without anyone noticing, which makes a bare export an absolute deal-breaker for tax authorities.
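
To make the detection problem concrete, one option is to record checksums of an export right after it is written and compare them later. The following is only a minimal sketch and not a Paperless-ngx feature: the function names write_checksums and verify_checksums are made up for illustration, the checksum file must be given as an absolute path, and it should live outside the export folder. The check only reveals changed or missing files, and only as long as the checksum file itself is kept somewhere it cannot be altered.

```bash
# Hypothetical helpers for illustration; not part of Paperless-ngx.

# Record a SHA-256 checksum for every file in an export folder.
# $1 = export folder
# $2 = absolute path of the checksum file to write (outside the export folder)
write_checksums() {
    ( cd "$1" && find . -type f -print0 | sort -z | xargs -0 -r sha256sum ) > "$2"
}

# Re-check the folder against the recorded checksums.
# Returns non-zero if a listed file was changed or is missing.
# Files added later are NOT detected by this check.
verify_checksums() {
    ( cd "$1" && sha256sum --check --quiet "$2" )
}
```

A call could look like write_checksums /mnt/d/DockerServer/paperless/export /mnt/d/Backups/export.sha256, where the second path is just an example. This does not make the export revision-safe on its own; it merely turns silent changes into visible ones.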

The Advantages of a True System Backup

A true technical backup, on the other hand, secures the entire infrastructure from the “outside.” It combines a consistent database dump (e.g., PostgreSQL) with the exact system directories of the Docker containers.

  • Maximum Data Security: It captures all metadata, user accounts, logs, and original timestamps directly from the database.
  • Disaster Recovery in No Time: If a hard drive crashes, you copy the backed-up directories back, import the database dump, and restart the containers. The system resumes from the state of the last backup.
  • Prerequisite for GoBD: By combining this technical backup (or the annual exports) with an unalterable storage medium (WORM/Snapshot Lock), you achieve a revision-safe long-term archive that withstands any tax audit.
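
Because everything above depends on a usable database dump, a quick plausibility check of the dump file can be worthwhile. The sketch below rests on an assumption: it expects the closing comment line that pg_dumpall normally writes at the very end of a plain SQL dump. The function name dump_looks_complete is hypothetical, and a passing check is no substitute for an actual test restore.

```bash
# Hypothetical helper for illustration.
# $1 = path to the .sql file written by pg_dumpall
# Succeeds only if the file is non-empty and its last lines contain the
# closing marker that pg_dumpall is assumed to write when it finishes.
dump_looks_complete() {
    [ -s "$1" ] && tail -n 5 "$1" | grep -q "PostgreSQL database cluster dump complete"
}
```

An empty or truncated file, for example after a failed docker exec call, fails this check.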

Part 1: Guide for Upgraders (Migrating from the old tar.gz/ZIP backup to the new rsync hard link system)

If you have previously used a classic Windows batch file (.bat) that compressed backups into .tar.gz or .zip files, this old structure will get in the way of the new, highly efficient hard link process. With hard links, unmodified files consume virtually no additional storage space from one backup run to the next, yet every dated folder still looks like a complete full backup.
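
If hard links are new to you, the following throwaway demo may help. It is a sketch that runs entirely in a temporary directory with made-up folder and file names, and it uses plain ln instead of rsync to show the underlying mechanism: two names in two dated folders point to the same data on disk.

```bash
# Throwaway demo in a temporary directory; all names are made up.
demo=$(mktemp -d)
mkdir "$demo/2024-01-01" "$demo/2024-01-02"
printf 'invoice data' > "$demo/2024-01-01/invoice.pdf"

# Create a second directory entry (hard link) for the same file content.
ln "$demo/2024-01-01/invoice.pdf" "$demo/2024-01-02/invoice.pdf"

# Both names report the same inode, and the link count is 2.
inode_old=$(stat -c %i "$demo/2024-01-01/invoice.pdf")
inode_new=$(stat -c %i "$demo/2024-01-02/invoice.pdf")
link_count=$(stat -c %h "$demo/2024-01-02/invoice.pdf")
echo "inode old: $inode_old, inode new: $inode_new, links: $link_count"

# Deleting the older folder does not remove the data;
# the name in the newer folder still works.
rm -r "$demo/2024-01-01"
remaining=$(cat "$demo/2024-01-02/invoice.pdf")
echo "still readable: $remaining"
```

This is also why old generations can be deleted safely later on: the data only disappears once the last name pointing to it is gone.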

Here is the migration path to cleanly convert your system:

Step 1: Install rsync in WSL

Since the new backup script runs natively under Linux, we must ensure that the synchronization tool rsync is available within your Windows Subsystem for Linux (WSL) environment.

  1. Open your WSL terminal (e.g., Ubuntu).
  2. Update the package sources and install rsync using the following command:

Bash

sudo apt-get update && sudo apt-get install -y rsync

Step 2: Clean up the old backup folder

The new rsync script searches the destination directory for the most recent backup folder in order to build the hard links upon it. If old .tar.gz archives from the Windows environment are still present there, rsync will abort with the error message --link-dest arg is not a dir.

  1. Navigate on your Windows drive to your previous backup path (e.g., D:\Backups\Paperless\media).
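  2. Create a temporary folder for the old archives (e.g., alte_backups) and move all old compressed archive files into it. Place this folder directly in D:\Backups\Paperless rather than inside a subfolder such as media, because the script treats every directory inside those subfolders as a backup generation. The subfolders in the main backup directory must be completely free of old archive files.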
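
If you prefer to do this cleanup from the WSL terminal, it could be sketched as follows. The function name move_old_archives is hypothetical, and both paths are passed in explicitly, so nothing here assumes a particular layout. Only .tar.gz and .zip files lying directly in the given folder are moved; dated backup folders are left untouched.

```bash
# Hypothetical helper; adjust both paths to your own layout.
# $1 = backup subfolder to clean (e.g. /mnt/d/Backups/Paperless/media)
# $2 = folder that receives the old archives (created if missing)
move_old_archives() {
    mkdir -p "$2"
    find "$1" -maxdepth 1 -type f \( -name '*.tar.gz' -o -name '*.zip' \) \
        -exec mv -t "$2" {} +
}
```

You would call it once per backup subfolder, for example move_old_archives /mnt/d/Backups/Paperless/media /mnt/d/Backups/Paperless/alte_backups.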

Step 3: Create the new Shell Script

Create a new file named backup.sh inside your Paperless root directory (e.g., /mnt/d/DockerServer/paperless/). Use a Linux-compatible editor or create it directly in the terminal via nano backup.sh. Copy and paste the following code:

Bash | Backup using the eml2pdf add-on, which includes the eml-import directory

#!/bin/bash

# ==========================================
# CONFIGURATION & RETENTION PERIOD
# ==========================================
KEEP_VERSIONS=3
BACKUP_DIR="/mnt/d/Backups/Paperless"

DOCKER_STAMM_DIR="/mnt/d/DockerServer/paperless"
MEDIA_DIR="/mnt/d/DockerServer/paperless/media"
EML_IMPORT_DIR="/mnt/d/DockerServer/paperless/eml-import"
EXPORT_DIR="/mnt/d/DockerServer/paperless/export"
DATA_DIR="/mnt/d/DockerServer/paperless/data"

DATE=$(date +%Y-%m-%d)

echo "Starting Paperless-ngx backup process for $DATE..."

# Structure destination folders
mkdir -p "$BACKUP_DIR/db"
mkdir -p "$BACKUP_DIR/stamm/$DATE"
mkdir -p "$BACKUP_DIR/media/$DATE"
mkdir -p "$BACKUP_DIR/eml-import/$DATE"
mkdir -p "$BACKUP_DIR/export/$DATE"
mkdir -p "$BACKUP_DIR/data/$DATE"

# 1. Database Backup (Guaranteed UTF-8 & Umlaut/Emoji safe)
echo "[$DATE] Starting PostgreSQL backup..."
docker exec paperless-dbpg pg_dumpall -U paperless --encoding=UTF8 > "$BACKUP_DIR/db/paperless_db_$DATE.sql"

# rsync helper function for incremental hard links
run_rsync_backup() {
    local src="$1"
    local dest_subfolder="$2"
    local extra_args="$3"

    local last_backup=$(find "$BACKUP_DIR/$dest_subfolder" -mindepth 1 -maxdepth 1 -type d ! -name "$DATE" | sort | tail -n 1)

    if [ -n "$last_backup" ]; then
        echo "[$DATE] Synchronizing $dest_subfolder (Using hard links to: $(basename "$last_backup"))..."
        rsync -av --delete --link-dest="$last_backup" $extra_args "$src/" "$BACKUP_DIR/$dest_subfolder/$DATE/"
    else
        echo "[$DATE] Synchronizing $dest_subfolder (First full backup)..."
        rsync -av --delete $extra_args "$src/" "$BACKUP_DIR/$dest_subfolder/$DATE/"
    fi
}

# 2. Synchronize directories
# For the root folder, we exclude media, exports, and active live Docker database raw folders
run_rsync_backup "$DOCKER_STAMM_DIR" "stamm" "--exclude=/media --exclude=/eml-import --exclude=/export --exclude=/data --exclude=/pgdata --exclude=/redisdata"
run_rsync_backup "$MEDIA_DIR" "media" ""
run_rsync_backup "$EML_IMPORT_DIR" "eml-import" ""
run_rsync_backup "$EXPORT_DIR" "export" ""
run_rsync_backup "$DATA_DIR" "data" ""

# 3. Clean up old retention states
echo "[$DATE] Removing old backups (Keeping the last $KEEP_VERSIONS versions)..."
ls -1tr "$BACKUP_DIR/db"/paperless_db_*.sql 2>/dev/null | head -n -$KEEP_VERSIONS | xargs -r rm

cleanup_old_folders() {
    local folder="$1"
    find "$BACKUP_DIR/$folder" -mindepth 1 -maxdepth 1 -type d | sort | head -n -$KEEP_VERSIONS | xargs -r rm -rf
}
cleanup_old_folders "stamm"
cleanup_old_folders "media"
cleanup_old_folders "eml-import"
cleanup_old_folders "export"
cleanup_old_folders "data"

echo "Backup completed successfully: $DATE."

Bash | Backup without the eml2pdf add-on; does not include the eml-import directory. Here you can see how to include or exclude directories.

#!/bin/bash

# ==========================================
# CONFIGURATION & RETENTION PERIOD
# ==========================================
KEEP_VERSIONS=3
BACKUP_DIR="/mnt/d/Backups/Paperless"

DOCKER_STAMM_DIR="/mnt/d/DockerServer/paperless"
MEDIA_DIR="/mnt/d/DockerServer/paperless/media"
EXPORT_DIR="/mnt/d/DockerServer/paperless/export"
DATA_DIR="/mnt/d/DockerServer/paperless/data"

# ==========================================
# DATE DETERMINATION
# ==========================================
DATE=$(date +%Y-%m-%d)

echo "Starting Paperless-ngx backup process for $DATE..."

# Ensure the destination folders exist
mkdir -p "$BACKUP_DIR/db"
mkdir -p "$BACKUP_DIR/stamm/$DATE"
mkdir -p "$BACKUP_DIR/media/$DATE"
mkdir -p "$BACKUP_DIR/export/$DATE"
mkdir -p "$BACKUP_DIR/data/$DATE"

# ==========================================
# 1. POSTGRESQL-DATABASE BACKUP
# ==========================================
echo "[$DATE] Starting PostgreSQL backup..."
docker exec paperless-dbpg pg_dumpall -U paperless --encoding=UTF8 > "$BACKUP_DIR/db/paperless_db_$DATE.sql"

# ==========================================
# RSYNC HARD LINK HELPER FUNCTION
# ==========================================
run_rsync_backup() {
    local src="$1"
    local dest_subfolder="$2"
    local extra_args="$3"

    local last_backup=$(find "$BACKUP_DIR/$dest_subfolder" -mindepth 1 -maxdepth 1 -type d ! -name "$DATE" | sort | tail -n 1)

    if [ -n "$last_backup" ]; then
        echo "[$DATE] Synchronizing $dest_subfolder (Using hard links to: $(basename "$last_backup"))..."
        rsync -av --delete --link-dest="$last_backup" $extra_args "$src/" "$BACKUP_DIR/$dest_subfolder/$DATE/"
    else
        echo "[$DATE] Synchronizing $dest_subfolder (First full backup)..."
        rsync -av --delete $extra_args "$src/" "$BACKUP_DIR/$dest_subfolder/$DATE/"
    fi
}

# ==========================================
# RSYNC TRIGGER RUNS
# ==========================================
run_rsync_backup "$DOCKER_STAMM_DIR" "stamm" "--exclude=/media --exclude=/eml-import --exclude=/export --exclude=/data --exclude=/pgdata --exclude=/redisdata"
run_rsync_backup "$MEDIA_DIR" "media" ""
run_rsync_backup "$EXPORT_DIR" "export" ""
run_rsync_backup "$DATA_DIR" "data" ""

# ==========================================
# CLEANUP OF OLD BACKUP GENERATIONS
# ==========================================
echo "[$DATE] Removing old backups (Keeping the last $KEEP_VERSIONS versions)..."

ls -1tr "$BACKUP_DIR/db"/paperless_db_*.sql 2>/dev/null | head -n -$KEEP_VERSIONS | xargs -r rm

cleanup_old_folders() {
    local folder="$1"
    find "$BACKUP_DIR/$folder" -mindepth 1 -maxdepth 1 -type d | sort | head -n -$KEEP_VERSIONS | xargs -r rm -rf
}

cleanup_old_folders "stamm"
cleanup_old_folders "media"
cleanup_old_folders "export"
cleanup_old_folders "data"

echo "Backup completed successfully: $DATE."
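
Before trusting the cleanup step with rm -rf, it can be reassuring to see which folders it would select. The sketch below reuses the same find, sort, and head pipeline as cleanup_old_folders but only prints the candidates instead of deleting them. The function name list_prunable is hypothetical. The selection works because folder names in the form YYYY-MM-DD sort alphabetically in chronological order.

```bash
# Hypothetical dry-run helper: prints what the cleanup would remove.
# $1 = folder containing the dated subfolders (e.g. /mnt/d/Backups/Paperless/media)
# $2 = number of generations to keep
list_prunable() {
    # sort puts the oldest date first; head -n -N drops the newest N from the list,
    # so only the folders that exceed the retention count remain.
    find "$1" -mindepth 1 -maxdepth 1 -type d | sort | head -n -"$2"
}
```

With five dated folders and a retention of 3, this prints the two oldest ones. Any other directory placed in the same folder would be sorted along with the dated ones, so keep these folders free of anything else.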

Step 4: Grant Permissions and Execute Initial Run

To allow the script to execute, you must assign execution permissions under Linux. Because Docker directories like pgdata are often owned by the root user, we execute the script strictly using sudo:

Bash

chmod +x /mnt/d/DockerServer/paperless/backup.sh
sudo /mnt/d/DockerServer/paperless/backup.sh

During the first run, a complete snapshot of all folders is created. From the second execution onward, the hard link logic will automatically engage.
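
To check after the second run that hard links were really created, you can count the files in the newest dated folder that have more than one name. This is a minimal sketch with a hypothetical function name, count_hardlinked_files; it relies only on find and wc.

```bash
# Hypothetical helper for illustration.
# $1 = one dated backup folder (e.g. /mnt/d/Backups/Paperless/media/2024-01-05)
# Prints how many regular files in it have more than one hard link,
# i.e. share their data with at least one other backup generation.
count_hardlinked_files() {
    find "$1" -type f -links +1 | wc -l
}
```

If the number stays at 0 after the second run even though most documents are unchanged, the hard link logic is not taking effect and every run is storing full copies.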

Part 2: Simplified Guide for Fresh Setups

If you are starting completely fresh and want to set everything up correctly from the beginning, follow this streamlined guide.

Step 1: Install rsync in WSL

Open your WSL terminal and ensure your system is up to date and rsync is ready with two quick commands:

Bash

sudo apt-get update
sudo apt-get install -y rsync

Step 2: Create and Customize the Script File

Create the backup routine directly in place:

Bash

nano /mnt/d/DockerServer/paperless/backup.sh

Paste the script code provided in Part 1. If necessary, adjust the paths in the configuration section at the very top if your directory structure is located on a different drive (e.g., /mnt/c/... instead of /mnt/d/...). Save using Ctrl + O and exit via Ctrl + X.
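
After adjusting the paths, a quick sanity check can save you a failed first run. The following sketch uses a hypothetical function name, missing_paths, and simply reports every argument that is not an existing directory.

```bash
# Hypothetical helper for illustration.
# Prints every argument that is not an existing directory.
# Returns 1 if at least one is missing, 0 otherwise.
missing_paths() {
    local status=0
    local dir
    for dir in "$@"; do
        if [ ! -d "$dir" ]; then
            echo "missing: $dir"
            status=1
        fi
    done
    return "$status"
}
```

You would pass it the same values you entered in the configuration section, for example missing_paths /mnt/d/DockerServer/paperless/media /mnt/d/DockerServer/paperless/data.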

Make the script executable:

Bash

chmod +x /mnt/d/DockerServer/paperless/backup.sh

Step 3: Set up Automation (Cronjob)

To ensure the backup runs reliably in the background without manual intervention, we hand the task over to the Linux system service cron. Since the script requires elevated read permissions, we edit the crontab of the root user:

Bash

sudo crontab -e

Add the following line at the very end of the file to execute the backup automatically, for example, every Monday and Friday at 9:30 PM:

Code-Snippet

30 21 * * 1,5 /mnt/d/DockerServer/paperless/backup.sh > /mnt/d/Backups/Paperless/backup_cron.log 2>&1

Note: The suffix > ... backup_cron.log 2>&1 ensures that all outputs and potential error messages during the automated execution are written to a log file. This allows you to verify at any time whether the backup completed successfully.
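
Checking the log can itself be scripted. The sketch below is only a rough heuristic with a hypothetical function name, log_looks_ok: it looks for the final message printed by the script and for the "rsync error" lines that rsync writes when something goes wrong. If your script prints a different closing message, adapt the search string accordingly.

```bash
# Hypothetical helper for illustration.
# $1 = path to the cron log file
# Succeeds only if the closing message is present and no rsync error was logged.
log_looks_ok() {
    grep -q "Backup completed successfully" "$1" && ! grep -qi "rsync error" "$1"
}
```

Keep in mind that the script prints its closing message even if an earlier step failed, which is why the second check matters.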

Conclusion on the GoBD Strategy

With this configuration, the technical side is well covered. For GoBD compliance, the following division of labor is recommended: The backup.sh script established here continuously protects the live system against sudden data loss. Once a year, you additionally run the document_exporter to generate a clean year-end archive folder. If this annual folder is subsequently stored on a write-once, unalterable medium (such as a NAS with WORM folders or an immutable cloud repository) for 10 years, you are well prepared for your next tax audit.




#Paperlessngx #GoBD #DataBackup #Docker #Rsync #Backup #Selfhosting
