Setup - Paperless-ngx (2024)

Installation

You can go multiple routes to set up and run Paperless:

  • Use the easy install docker script
  • Pull the image from Docker Hub
  • Build the Docker image yourself
  • Install Paperless directly on your system manually (bare metal)
  • A user-maintained list of commercial hosting providers can be found in the wiki

The Docker routes are quick and easy, and they are the recommended routes. They set up all the required components automatically so that Paperless just works, and they use sensible defaults for all configuration options. A cheat sheet for Docker beginners is available: CLI Basics.

The bare metal route is more complicated to set up but makes it easier should you want to contribute some code back. You need to configure and run the required components yourself.

Docker using the Installation Script

Paperless provides an interactive installation script. This script will ask you for a couple of configuration options, download and create the necessary configuration files, pull the Docker image, start Paperless and create your user account. This script essentially performs all the steps described in Docker setup automatically.

  1. Make sure that Docker and Docker Compose are installed.

    Tip

    See the Docker installation instructions at https://docs.docker.com/engine/install/

  2. Download and run the installation script:

    $ bash -c "$(curl --location --silent --show-error https://raw.githubusercontent.com/paperless-ngx/paperless-ngx/main/install-paperless-ngx.sh)"

    Note

    macOS users will need to install, for example, gnu-sed with support for running as sed.

From GHCR / Docker Hub

  1. Log in with your user and create a folder in your home directory to have a place for your configuration files and consumption directory.

    $ mkdir -v ~/paperless-ngx
  2. Go to the /docker/compose directory on the project page and download one of the docker-compose.*.yml files, depending on which database backend you want to use. Rename this file to docker-compose.yml. If you want to enable optional support for Office documents, download a file with -tika in the file name. Download the docker-compose.env file and the .env file as well and store them in the same directory.

    Tip

    For new installations, it is recommended to use PostgreSQL as the database backend.

  3. Install Docker and Docker Compose.

    Warning

    If you want to use the included docker-compose.*.yml file, you need to have at least Docker version 17.09.0 and Docker Compose version v2. To check, run docker compose version or docker -v.

    See the Docker installation guide on how to install the current version of Docker for your operating system or Linux distribution of choice. To get the latest version of Docker Compose, follow the Docker Compose installation guide if your package repository doesn't include it.

  4. Modify docker-compose.yml to your preferences. You may want to change the path to the consumption directory. Find the line that specifies where to mount the consumption directory:

    - ./consume:/usr/src/paperless/consume

    Replace the part BEFORE the colon with a local directory of your choice:

    - /home/jonaswinkler/paperless-inbox:/usr/src/paperless/consume

    Don't change the part after the colon or paperless won't find your documents.

    You may also need to change the port that the webserver will use from the default (8000):

    ports:
      - 8000:8000

    Replace the part BEFORE the colon with a port of your choice:

    ports:
      - 8010:8000

    Don't change the part after the colon or edit other lines that refer to port 8000. Modifying the part before the colon will map requests on another port to the webserver running on the default port.

    Rootless

    Warning

    It is currently not possible to run the container rootless if additional languages are specified via PAPERLESS_OCR_LANGUAGES.

    If you want to run Paperless as a rootless container, you will need to do the following in your docker-compose.yml:

    • set the user running the container to map to the paperless user in the container. This value (user_id below) should be the same id that USERMAP_UID and USERMAP_GID are set to in the next step. See USERMAP_UID and USERMAP_GID here.

    Your entry for Paperless should contain something like:

    webserver:
      image: ghcr.io/paperless-ngx/paperless-ngx:latest
      user: <user_id>
  5. Modify docker-compose.env, following the comments in the file. The most important change is to set USERMAP_UID and USERMAP_GID to the uid and gid of your user on the host system. Use id -u and id -g to get these.

    This ensures that both the Docker container and you on the host machine have write access to the consumption directory. If your UID and GID on the host system are 1000 (the default for the first normal user on most systems), it will work out of the box without any modifications. Run id "username" to check.

    Note

    You can copy any setting from the file paperless.conf.example and paste it here. Have a look at configuration to see what's available.

    Note

    You can utilize Docker secrets for configuration settings by appending _FILE to the name of a configuration setting. For example, PAPERLESS_DBUSER can be set using PAPERLESS_DBUSER_FILE=/var/run/secrets/password.txt.

    Warning

    Some file systems such as NFS network shares don't support file system notifications with inotify. When storing the consumption directory on such a file system, paperless will not pick up new files with the default configuration. You will need to use PAPERLESS_CONSUMER_POLLING, which will disable inotify. See here.

  6. Run docker compose pull. This will pull the image.

  7. To be able to log in, you will need a superuser. To create it, execute the following command:

    $ docker compose run --rm webserver createsuperuser

    or using docker exec from within the container:

    $ python3 manage.py createsuperuser

    This will prompt you to set a username, an optional e-mail address and finally a password (at least 8 characters).

  8. Run docker compose up -d. This will create and start the necessary containers.

  9. The default docker-compose.yml exports the webserver on your local port 8000. If you did not change this, you should now be able to visit your Paperless instance at http://127.0.0.1:8000 or at your server's IP address on port 8000. Use the login credentials you created in the previous step.
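The edits from steps 4 and 5 can also be scripted. The following is only a minimal sketch, not part of the official instructions: it works on a scratch copy so it can be tried safely, and the stand-in file contents, the scratch directory and the chosen host port are assumptions made for illustration. It rewrites only the host side of the volume and port mappings and sets USERMAP_UID and USERMAP_GID from id.

```shell
#!/bin/sh
set -eu

# Scratch directory so the sketch never touches a real installation.
workdir="$(mktemp -d)"
compose_file="$workdir/docker-compose.yml"
env_file="$workdir/docker-compose.env"

# Stand-in fragments (assumed); the real files contain much more.
cat > "$compose_file" <<'EOF'
services:
  webserver:
    image: ghcr.io/paperless-ngx/paperless-ngx:latest
    ports:
      - 8000:8000
    volumes:
      - ./consume:/usr/src/paperless/consume
EOF
cat > "$env_file" <<'EOF'
#USERMAP_UID=1000
#USERMAP_GID=1000
EOF

inbox="/home/jonaswinkler/paperless-inbox"  # example host directory from step 4
host_port=8010                              # example host port from step 4

# Change only the part BEFORE the colon; the container side stays untouched.
sed -i.bak \
  -e "s|- \./consume:/usr/src/paperless/consume|- ${inbox}:/usr/src/paperless/consume|" \
  -e "s|- 8000:8000|- ${host_port}:8000|" \
  "$compose_file"

# Set (or uncomment) the user mapping to the current user's uid and gid.
sed -i.bak \
  -e "s|^#\{0,1\}USERMAP_UID=.*|USERMAP_UID=$(id -u)|" \
  -e "s|^#\{0,1\}USERMAP_GID=.*|USERMAP_GID=$(id -g)|" \
  "$env_file"

cat "$compose_file" "$env_file"
```

On a real installation you would point compose_file and env_file at the files in ~/paperless-ngx and review the result before running docker compose up -d.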

Build the Docker image yourself

  1. Clone the entire repository of paperless:

    git clone https://github.com/paperless-ngx/paperless-ngx

    The main branch always reflects the latest stable version.

  2. Copy one of the docker/compose/docker-compose.*.yml files to docker-compose.yml in the root folder, depending on which database backend you want to use. Copy docker-compose.env into the project root as well.

  3. In the docker-compose.yml file, find the line that instructs Docker Compose to pull the prebuilt paperless image:

    webserver:
      image: ghcr.io/paperless-ngx/paperless-ngx:latest

    and replace it with a line that instructs Docker Compose to build the image from the current working directory instead:

    webserver:
      build:
        context: .
  4. Follow steps 3 to 8 of Docker Setup. When asked to run docker compose pull to pull the image, do

    $ docker compose build

    instead to build the image.

Bare Metal Route

Paperless runs on Linux only. The following procedure has been tested on a minimal installation of Debian/Buster, which is the current stable release at the time of writing. Windows is not and will never be supported.

Paperless requires Python 3. At this time, 3.9 - 3.11 are tested versions. Newer versions may work, but some dependencies may not fully support newer versions. Support for older Python versions may be dropped as they reach end of life or as newer versions are released, dependency support is confirmed, etc.
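Before installing anything, you can check whether the system interpreter falls in the tested range. This is a minimal sketch; the helper name is_tested_python is hypothetical and the range is simply the 3.9 - 3.11 window stated above.

```shell
#!/bin/sh
set -eu

# Succeeds when a "major.minor[.patch]" version is within the tested 3.9 - 3.11 range.
# (Hypothetical helper, not part of Paperless-ngx.)
is_tested_python() {
  major="${1%%.*}"
  rest="${1#*.}"
  minor="${rest%%.*}"
  [ "$major" -eq 3 ] && [ "$minor" -ge 9 ] && [ "$minor" -le 11 ]
}

# Ask the interpreter for its version, if python3 is installed at all.
if command -v python3 >/dev/null 2>&1; then
  version="$(python3 -c 'import sys; print("%d.%d" % sys.version_info[:2])')"
  if is_tested_python "$version"; then
    echo "Python $version is in the tested range"
  else
    echo "Python $version is outside the tested range (3.9 - 3.11)"
  fi
else
  echo "python3 not found"
fi
```

A version outside the range is not necessarily broken; as noted above, newer versions may work but are not guaranteed to.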

  1. Install dependencies. Paperless requires the following packages.

    • python3
    • python3-pip
    • python3-dev
    • default-libmysqlclient-dev for MariaDB
    • pkg-config for mysqlclient (python dependency)
    • fonts-liberation for generating thumbnails for plain text files
    • imagemagick >= 6 for PDF conversion
    • gnupg for handling encrypted documents
    • libpq-dev for PostgreSQL
    • libmagic-dev for mime type detection
    • mariadb-client for MariaDB compile time
    • mime-support for mime type detection
    • libzbar0 for barcode detection
    • poppler-utils for barcode detection

    Use this list for your preferred package management:

    python3 python3-pip python3-dev imagemagick fonts-liberation gnupg libpq-dev default-libmysqlclient-dev pkg-config libmagic-dev mime-support libzbar0 poppler-utils

    These dependencies are required for OCRmyPDF, which is used for text recognition.

    • unpaper
    • ghostscript
    • icc-profiles-free
    • qpdf
    • liblept5
    • libxml2
    • pngquant (suggested for certain PDF image optimizations)
    • zlib1g
    • tesseract-ocr >= 4.0.0 for OCR
    • tesseract-ocr language packs (tesseract-ocr-eng, tesseract-ocr-deu, etc.)

    Use this list for your preferred package management:

    unpaper ghostscript icc-profiles-free qpdf liblept5 libxml2 pngquant zlib1g tesseract-ocr

    On Raspberry Pi, these libraries are required as well:

    • libatlas-base-dev
    • libxslt1-dev

    You will also need these for installing some of the python dependencies:

    • build-essential
    • python3-setuptools
    • python3-wheel

    Use this list for your preferred package management:

    build-essential python3-setuptools python3-wheel
  2. Install redis >= 6.0 and configure it to start automatically.

  3. Optional. Install postgresql and configure a database, user and password for paperless. If you do not wish to use PostgreSQL, MariaDB and SQLite are available as well.

    Note

    On bare-metal installations using SQLite, ensure the JSON1 extension is enabled. This is usually the case, but not always.

  4. Create a system user with a new home folder under which you wish to run paperless.

    adduser paperless --system --home /opt/paperless --group
  5. Get the release archive from https://github.com/paperless-ngx/paperless-ngx/releases, for example with

    curl -O -L https://github.com/paperless-ngx/paperless-ngx/releases/download/v1.10.2/paperless-ngx-v1.10.2.tar.xz

    Extract the archive with

    tar -xf paperless-ngx-v1.10.2.tar.xz

    and copy the contents to the home folder of the user you created before (/opt/paperless).

    Optional: If you cloned the git repo, you will have to compile the frontend yourself, see here and use the build step, not serve.

  6. Configure paperless. See configuration for details. Edit the included paperless.conf and adjust the settings to your needs. Required settings for getting paperless running are:

    • PAPERLESS_REDIS should point to your redis server.
    • PAPERLESS_DBENGINE is optional, and should be one of postgres, mariadb, or sqlite.
    • PAPERLESS_DBHOST should be the hostname on which your PostgreSQL server is running. Leave this unset to use SQLite instead. Also configure port, database name, user and password as necessary.
    • PAPERLESS_CONSUMPTION_DIR should point to a folder which paperless should watch for documents. You might want to have this somewhere else. Likewise, PAPERLESS_DATA_DIR and PAPERLESS_MEDIA_ROOT define where paperless stores its data. If you like, you can point both to the same directory.
    • PAPERLESS_SECRET_KEY should be a random sequence of characters. It's used for authentication. Failure to set it allows third parties to forge authentication credentials.
    • PAPERLESS_URL if you are behind a reverse proxy. This should point to your domain. Please see configuration for more information.

    Many more adjustments can be made to paperless, especially the OCR part. The following options are recommended for everyone:

    • Set PAPERLESS_OCR_LANGUAGE to the language most of your documents are written in.
    • Set PAPERLESS_TIME_ZONE to your local time zone.

    Warning

    Ensure your Redis instance is secured.

  7. Create the following directories if they are missing:

    • /opt/paperless/media
    • /opt/paperless/data
    • /opt/paperless/consume

    Adjust as necessary if you configured different folders. Ensure that the paperless user has write permissions for every one of these folders with

    ls -l -d /opt/paperless/media

    If needed, change the owner with

    sudo chown paperless:paperless /opt/paperless/media
    sudo chown paperless:paperless /opt/paperless/data
    sudo chown paperless:paperless /opt/paperless/consume
  8. Install python requirements from the requirements.txt file.

    sudo -Hu paperless pip3 install -r requirements.txt

    This will install all python dependencies in the home directory of the new paperless user.

    Tip

    It is up to you if you wish to use a virtual environment or not for the Python dependencies. This is an alternative to the above and may require adjusting the example scripts to utilize the virtual environment paths.

  9. Go to /opt/paperless/src, and execute the following commands:

    # This creates the database schema.
    sudo -Hu paperless python3 manage.py migrate

    # This creates your first paperless user
    sudo -Hu paperless python3 manage.py createsuperuser
  10. Optional: Test that paperless is working by executing

    # Manually starts the webserver
    sudo -Hu paperless python3 manage.py runserver

    and pointing your browser to http://localhost:8000 if accessing from the same device on which paperless is installed. If accessing from another machine, set up systemd services. You may need to set PAPERLESS_DEBUG=true in order for the development server to work normally in your browser.

    Warning

    This is a development server which should not be used in production. It is not audited for security and performance is inferior to production ready web servers.

    Tip

    This will not start the consumer. Paperless does this in a separate process.

  11. Set up systemd services to run paperless automatically. You may use the service definition files included in the scripts folder as a starting point.

    Paperless needs the webserver script to run the webserver, the consumer script to watch the input folder, taskqueue for the background workers used to handle things like document consumption and the scheduler script to run tasks such as email checking at certain times.

    Note

    The socket script enables gunicorn to run on port 80 without root privileges. For this you need to uncomment the Require=paperless-webserver.socket in the webserver script and configure gunicorn to listen on port 80 (see paperless/gunicorn.conf.py).

    You may need to adjust the path to the gunicorn executable. This will be installed as part of the python dependencies, and is either located in the bin folder of your virtual environment, or in ~/.local/bin/ if no virtual environment is used.

    These services rely on redis and optionally the database server, but don't need to be started in any particular order. The example files depend on redis being started. If you use a database server, you should add additional dependencies.

    Warning

    The included scripts run a gunicorn standalone server, which is fine for running paperless. It does support SSL, however, the documentation of Gunicorn states that you should use a proxy server in front of gunicorn instead.

    For instructions on how to use nginx for that, see the wiki.

    Warning

    If celery won't start (check with sudo systemctl status paperless-task-queue.service for paperless-task-queue.service and paperless-scheduler.service), you need to change the path in the files. Example: ExecStart=/opt/paperless/.local/bin/celery --app paperless worker --loglevel INFO

  12. Optional: Install a samba server and make the consumption folder available as a network share.

  13. Configure ImageMagick to allow processing of PDF documents. Most distributions have this disabled by default, since PDF documents can contain malware. If you don't do this, paperless will fall back to ghostscript for certain steps such as thumbnail generation.

    Edit /etc/ImageMagick-6/policy.xml and adjust

    <policy domain="coder" rights="none" pattern="PDF" />

    to

    <policy domain="coder" rights="read|write" pattern="PDF" />
  14. Optional: Install the jbig2enc encoder. This will reduce the size of generated PDF documents. You'll most likely need to compile this by yourself, because this software has been patented until around 2017 and binary packages are not available for most distributions.

  15. Optional: If using the NLTK machine learning processing (see PAPERLESS_ENABLE_NLTK for details), download the NLTK data for the SnowballStemmer, Stopwords and Punkt tokenizer to your PAPERLESS_DATA_DIR/nltk. Refer to the NLTK instructions for details on how to download the data.
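The ImageMagick policy change from step 13 can be applied with sed instead of a manual edit. The following is only a sketch under assumptions: it runs against a scratch file with stand-in contents rather than /etc/ImageMagick-6/policy.xml, and it expects the PDF line to look exactly as quoted in step 13.

```shell
#!/bin/sh
set -eu

# Scratch copy; on a real system the file is /etc/ImageMagick-6/policy.xml.
policy_file="$(mktemp)"
cat > "$policy_file" <<'EOF'
<policymap>
  <policy domain="coder" rights="none" pattern="PS" />
  <policy domain="coder" rights="none" pattern="PDF" />
</policymap>
EOF

# Grant read|write on the PDF coder line only; other coders stay restricted.
# A .bak copy of the previous content is kept next to the file.
sed -i.bak \
  's/\(<policy domain="coder" rights="\)none\(" pattern="PDF" \/>\)/\1read|write\2/' \
  "$policy_file"

cat "$policy_file"
```

If the line in your distribution's policy file is formatted differently, the pattern will not match and the file is left unchanged, so check the output before relying on it.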

Migration is possible both from Paperless-ng or directly from the 'original' Paperless.

Migrating from Paperless-ng

Paperless-ngx is meant to be a drop-in replacement for Paperless-ng and thus upgrading should be trivial for most users, especially when using Docker. However, as with any major change, it is recommended to take a full backup first. Once you are ready, simply change the Docker image to point to the new source. E.g. if using Docker Compose, edit docker-compose.yml and change:

image: jonaswinkler/paperless-ng:latest

to

image: ghcr.io/paperless-ngx/paperless-ngx:latest

and then run docker compose up -d, which will pull the new image and recreate the container. That's it!
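The image change can be made with a one-line sed substitution. This is a sketch only: it operates on a scratch file whose contents are a stand-in for a real docker-compose.yml, and it assumes the image line reads exactly as shown above.

```shell
#!/bin/sh
set -eu

# Scratch stand-in for an existing Paperless-ng docker-compose.yml.
compose_file="$(mktemp)"
cat > "$compose_file" <<'EOF'
services:
  webserver:
    image: jonaswinkler/paperless-ng:latest
EOF

# Point the image at the Paperless-ngx source; a .bak copy is kept.
sed -i.bak \
  's|image: jonaswinkler/paperless-ng:latest|image: ghcr.io/paperless-ngx/paperless-ngx:latest|' \
  "$compose_file"

cat "$compose_file"
# On a real installation, follow up with: docker compose up -d
```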

Users who installed with the bare-metal route should also update their Git clone to point to https://github.com/paperless-ngx/paperless-ngx, e.g. using the command git remote set-url origin https://github.com/paperless-ngx/paperless-ngx and then pull the latest version.

Migrating from Paperless

At its core, paperless-ngx is still paperless and fully compatible. However, some things have changed under the hood, so you need to adapt your setup depending on how you installed paperless.

This setup describes how to update an existing paperless Docker installation. The important things to keep in mind are as follows:

  • Read the changelog and take note of breaking changes.
  • You should decide if you want to stick with SQLite or want to migrate your database to PostgreSQL. See documentation for details on how to move your data from SQLite to PostgreSQL. Both work fine with paperless. However, if you already have a database server running for other services, you might as well use it for paperless as well.
  • The task scheduler of paperless, which is used to execute periodic tasks such as email checking and maintenance, requires a redis message broker instance. The Docker Compose route takes care of that.
  • The layout of the folder structure for your documents and data remains the same, so you can just plug your old docker volumes into paperless-ngx and expect it to find everything where it should be.

Migration to paperless-ngx is then performed in a few simple steps:

  1. Stop paperless.

    $ cd /path/to/current/paperless
    $ docker compose down
  2. Do a backup for two purposes: First, if something goes wrong, you still have your data. Second, if you don't like paperless-ngx, you can switch back to paperless.

  3. Download the latest release of paperless-ngx. You can either go with the Docker Compose files from here or clone the repository to build the image yourself (see above). You can either replace your current paperless folder or put paperless-ngx in a different location.

    Warning

    Paperless-ngx includes a .env file. This will set the project name for docker compose to paperless, which will also define the name of the volumes used by paperless-ngx. However, if you find that paperless-ngx is not using your old paperless volumes, verify the names of your volumes with

    $ docker volume ls | grep _data

    and adjust the project name in the .env file so that it matches the name of the volumes before the _data part.

  4. Download the docker-compose.sqlite.yml file to docker-compose.yml. If you want to switch to PostgreSQL, do that after you migrated your existing SQLite database.

  5. Adjust docker-compose.yml and docker-compose.env to your needs. See Docker setup for details on which edits are advised.

  6. Update paperless.

  7. In order to find your existing documents with the new search feature, you need to invoke a one-time operation that will create the search index:

    $ docker compose run --rm webserver document_index reindex

    This will migrate your database and create the search index. After that, paperless will take care of maintaining the index by itself.

  8. Start paperless-ngx.

    $ docker compose up -d

    This will run paperless in the background and automatically start it on system boot.

  9. Paperless installed a permanent redirect to admin/ in your browser. This redirect is still in place and prevents access to the new UI. Clear your browsing cache in order to fix this.

  10. Optionally, follow the instructions below to migrate your existing data to PostgreSQL.
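The project-name adjustment described in the warning under step 3 amounts to stripping the _data suffix from an existing volume name. A minimal sketch follows; the volume name is hypothetical, the helper project_from_volume is illustrative, and it is an assumption that the .env file sets the project name through Docker Compose's COMPOSE_PROJECT_NAME variable.

```shell
#!/bin/sh
set -eu

# Derive the compose project name from a volume name ending in "_data".
# (Hypothetical helper, not part of Paperless-ngx.)
project_from_volume() {
  printf '%s\n' "${1%_data}"
}

# Hypothetical name, as it might be printed by: docker volume ls | grep _data
volume_name="myoldpaperless_data"

# Write the result to a scratch .env file instead of the real one.
env_file="$(mktemp)"
printf 'COMPOSE_PROJECT_NAME=%s\n' "$(project_from_volume "$volume_name")" > "$env_file"

cat "$env_file"
```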

Migrating from LinuxServer.io Docker Image

As with any upgrades and large changes, it is highly recommended to create a backup before starting. This assumes the image was running using Docker Compose, but the instructions are translatable to Docker commands as well.

  1. Stop and remove the paperless container
  2. If using an external database, stop the container
  3. Update Redis configuration

    1. If REDIS_URL is already set, change it to PAPERLESS_REDIS and continue to step 4.

    2. Otherwise, in the docker-compose.yml add a new service for Redis, following the example compose files

    3. Set the environment variable PAPERLESS_REDIS so it points to the new Redis container

  4. Update user mapping

    1. If set, change the environment variable PUID to USERMAP_UID

    2. If set, change the environment variable PGID to USERMAP_GID

  5. Update configuration paths

    1. Set the environment variable PAPERLESS_DATA_DIR to /config
  6. Update media paths

    1. Set the environment variable PAPERLESS_MEDIA_ROOT to /data/media
  7. Update timezone

    1. Set the environment variable PAPERLESS_TIME_ZONE to the same value as TZ
  8. Modify the image: to point to ghcr.io/paperless-ngx/paperless-ngx:latest or a specific version if preferred.

  9. Start the containers as before, using docker compose.
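The variable renames in steps 3 to 7 can be sketched as a small script. This is an illustration under assumptions: it treats the environment as a plain KEY=value file, the sample values (user ids, time zone, Redis URL) are made up, and it works on a scratch file rather than a real configuration.

```shell
#!/bin/sh
set -eu

# Scratch stand-in for the environment of a LinuxServer.io based setup.
env_file="$(mktemp)"
cat > "$env_file" <<'EOF'
PUID=1000
PGID=1000
TZ=Europe/Berlin
REDIS_URL=redis://broker:6379
EOF

# Rename the variables that have a direct Paperless-ngx equivalent.
sed -i.bak \
  -e 's/^PUID=/USERMAP_UID=/' \
  -e 's/^PGID=/USERMAP_GID=/' \
  -e 's/^REDIS_URL=/PAPERLESS_REDIS=/' \
  "$env_file"

# Mirror TZ into PAPERLESS_TIME_ZONE and add the data and media paths.
tz_value="$(sed -n 's/^TZ=//p' "$env_file")"
{
  printf 'PAPERLESS_TIME_ZONE=%s\n' "$tz_value"
  printf 'PAPERLESS_DATA_DIR=/config\n'
  printf 'PAPERLESS_MEDIA_ROOT=/data/media\n'
} >> "$env_file"

cat "$env_file"
```

If the variables live under an environment: key in docker-compose.yml instead of a separate file, the same renames apply but the patterns would need to account for the YAML indentation.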

Moving data from SQLite to PostgreSQL or MySQL/MariaDB

The best way to migrate between database types is to perform an export and then import into a clean installation of Paperless-ngx.

Moving back to Paperless

Let's say you migrated to Paperless-ngx and used it for a while, but decided that you don't like it and want to move back (if you do, send me a mail about what part you didn't like!). You can do that with a few simple steps.

Paperless-ngx modified the database schema slightly. However, these changes can be reverted while keeping your current data, so that your current data will be compatible with original Paperless. Thumbnails were also changed from PNG to WEBP format and will need to be re-generated.

Execute this:

$ cd /path/to/paperless
$ docker compose run --rm webserver migrate documents 0023

Or without docker:

$ cd /path/to/paperless/src
$ python3 manage.py migrate documents 0023

After regenerating thumbnails, you'll need to clear your cookies (Paperless-ngx comes with updated dependencies that do cookie-processing differently) and probably your cache as well.

Paperless runs on Raspberry Pi. However, some things are rather slow on the Pi and configuring some options in paperless can help improve performance immensely:

  • Stick with SQLite to save some resources.
  • Consider setting PAPERLESS_OCR_PAGES to 1, so that paperless will only OCR the first page of your documents. In most cases, this page contains enough information to be able to find the document.
  • PAPERLESS_TASK_WORKERS and PAPERLESS_THREADS_PER_WORKER are configured to use all cores. The Raspberry Pi models 3 and up have 4 cores, meaning that paperless will use 2 workers and 2 threads per worker. This may result in sluggish response times during consumption, so you might want to lower these settings (example: 2 workers and 1 thread to always have some computing power left for other tasks).
  • Keep PAPERLESS_OCR_MODE at its default value skip and consider OCR'ing your documents before feeding them into paperless. Some scanners are able to do this!
  • Set PAPERLESS_OCR_SKIP_ARCHIVE_FILE to with_text to skip archive file generation for already OCR'ed documents, or always to skip it for all documents.
  • If you want to perform OCR on the device, consider using PAPERLESS_OCR_CLEAN=none. This will speed up OCR times and use less memory at the expense of slightly worse OCR results.
  • If using docker, consider setting PAPERLESS_WEBSERVER_WORKERS to 1. This will save some memory.
  • Consider setting PAPERLESS_ENABLE_NLTK to false, to disable the more advanced language processing, which can take more memory and processing time.

For details, refer to configuration.
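Taken together, the suggestions above could be collected into an environment file. The sketch below writes them to a scratch file; the values are the examples named in the list (2 workers with 1 thread, with_text rather than always), and whether each one suits your device is a judgment call rather than a recommendation.

```shell
#!/bin/sh
set -eu

# Scratch file; on a real setup these lines would go into
# docker-compose.env or paperless.conf.
env_file="$(mktemp)"

cat > "$env_file" <<'EOF'
PAPERLESS_OCR_PAGES=1
PAPERLESS_TASK_WORKERS=2
PAPERLESS_THREADS_PER_WORKER=1
PAPERLESS_OCR_SKIP_ARCHIVE_FILE=with_text
PAPERLESS_OCR_CLEAN=none
PAPERLESS_WEBSERVER_WORKERS=1
PAPERLESS_ENABLE_NLTK=false
EOF

cat "$env_file"
```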

Note

Updating the automatic matching algorithm takes quite a bit of time. However, the update mechanism checks if your data has changed before doing the heavy lifting. If you experience the algorithm taking too much CPU time, consider changing the schedule in the admin interface to daily. You can also manually invoke the task by changing the date and time of the next run to today/now.

The actual matching of the algorithm is fast and works on Raspberry Pi as well as on any other device.

Please see the wiki for user-maintained documentation of using nginx with Paperless-ngx.

Please see the wiki for user-maintained documentation of how to configure security tools like Fail2ban with Paperless-ngx.
