Anti-Fandom Action! Hosting BreezeWiki with Caching and Wildcard DNS

There’s this website online that’s a bit notorious for being awful, and also for being everywhere: fandom dot com. Fandom hosts a lot of wikis, some of which have existed for over a decade now. They used to be known as Wikia and provided the quite-useful service of a hosted MediaWiki instance. That’s still what they do actually, but over time they’ve become more and more malignant. I don’t know the full story, what happened with management, whatever, but these days when you go on a Fandom page you’re bombarded with ads for media you don’t care about, weird trivia quizzes, and obnoxious animations, all of which slow your browser down and get in the way of the page you were actually trying to read. BreezeWiki is a proxy that fixes that, and you can even run your own! You can point the getindie browser extension at your instance or another person’s instance and it’ll turn that pit of despair into a nice smooth browsing experience, and recommend alternative independently hosted wikis if they exist. If that’s all you want to do, go download that extension, you’re free. But if you want to run your own, that’s what the rest of this post is for.

So before I go into the details on my setup, you should know that there’s Official BreezeWiki Documentation on how to run your own instance and it’s pretty good. It’ll get you from 0 to running most of the time, and those docs will be up to date after this post stops being up to date. But for completeness I’m going to cover the whole thing.

Installing BreezeWiki

You’ll want to use a system with at least 1GB of RAM (that’s what I’m using). You can scrape by with less, but BreezeWiki is going to take a few hundred megabytes on its own.

BreezeWiki is written in Racket. If you’re on x86-64 and you don’t want to set up Racket, you can just download the binary distribution of BreezeWiki as the official docs say. You should be able to unpack and run breezewiki-dist/bin/dist. I instead opted to install Racket and run it from a git clone.

I’m running on Debian Bullseye, which has a version of Racket that’s too out of date to run BreezeWiki, but the version in bullseye-backports is new enough. You need the backports repo in your apt sources, and then you can install it.

echo 'deb http://deb.debian.org/debian bullseye-backports main' | sudo tee /etc/apt/sources.list.d/bullseye-backports.list
sudo apt update
sudo apt install -t bullseye-backports racket

More generally, you need at least Racket version 8.4. If your distribution doesn’t provide that, you can get an up-to-date version of Racket from download.racket-lang.org.
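If you’re not sure whether the Racket you ended up with is new enough, here’s a rough sketch of a check. The version_ge helper is something I made up for illustration, it leans on GNU sort’s -V flag, and the parsing assumes racket --version prints a line containing “Racket v” followed by the version number.

```shell
# Minimum Racket version BreezeWiki needs, per the paragraph above.
required="8.4"

# version_ge A B: succeeds when version A is >= version B.
# Hypothetical helper; relies on GNU sort's -V (version sort).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

if command -v racket >/dev/null 2>&1; then
    # Assumes output shaped like "Welcome to Racket v8.6 [cs]."
    installed="$(racket --version | sed -n 's/.*Racket v\([0-9.]*[0-9]\).*/\1/p')"
    if version_ge "$installed" "$required"; then
        echo "Racket $installed is new enough"
    else
        echo "Racket $installed is older than $required"
    fi
else
    echo "racket is not installed (or not on PATH)"
fi
```

Version sort matters here because a plain string comparison would decide that 8.10 is older than 8.4.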

After installing Racket, I created a breezewiki user to run the code under:

sudo useradd -m breezewiki

Then I cloned the git repository into /opt/breezewiki:

cd /opt
sudo git clone https://gitdab.com/cadence/breezewiki.git
sudo chown -R breezewiki:breezewiki breezewiki

We need to install the dependencies.

sudo -iu breezewiki bash -c 'cd /opt/breezewiki && raco pkg install --auto'

We also need to configure breezewiki,

sudo -iu breezewiki nano /opt/breezewiki/config.ini

and here’s what my config looks like:

canonical_origin = https://yourcoolbreezewiki.com
debug = false
feature_search_suggestions = true
log_outgoing = false
port = 10416
strict_proxy = false

I want to highlight strict_proxy here. If you turn that on then your BreezeWiki instance will download images from fandom and then re-serve them to anyone using your instance. As a user this is pretty nice because it means even less interaction with fandom, but right now there are some edge cases where turning it on breaks the layout of some pages. If you’re ok with that, you can turn it on, but for now I’ve been told it’s best to leave it off. Hopefully I can turn that on later! However, you may want to keep it off forever if you don’t have the bandwidth to support hosting the images yourself.

The last thing is that because we’re reverse proxying breezewiki with Nginx (we’ll get there soon), it doesn’t make sense to have breezewiki listening on a network interface accessible to the broader internet. You could firewall it off, or you can edit the racket code to make it listen on 127.0.0.1 in release mode by copying this command to patch it:

cd /opt/breezewiki
sudo -u breezewiki git apply <<EOF
diff --git a/breezewiki.rkt b/breezewiki.rkt
index 2e2772f..e198783 100644
--- a/breezewiki.rkt
+++ b/breezewiki.rkt
@@ -30,7 +30,7 @@
 (define ch (make-channel))
 (define (start)
   (serve/launch/wait
-   #:listen-ip (if (config-true? 'debug) "127.0.0.1" #f)
+   #:listen-ip "127.0.0.1"
    #:port (string->number (config-get 'port))
    (λ (quit)
      (channel-put ch (lambda () (semaphore-post quit)))
diff --git a/dist.rkt b/dist.rkt
index deb08a8..9d4fdf3 100644
--- a/dist.rkt
+++ b/dist.rkt
@@ -20,7 +20,7 @@
 (require (only-in "src/page-file.rkt" page-file))
 
 (serve/launch/wait
- #:listen-ip (if (config-true? 'debug) "127.0.0.1" #f)
+ #:listen-ip "127.0.0.1"
  #:port (string->number (config-get 'port))
  (λ (quit)
    (dispatcher-tree
EOF

This is technically optional, but I like knowing that all the traffic is going through my nginx server.
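If you want to double check that the patch (or your firewall) did what you expected, something like this sketch can help once BreezeWiki is running. The listens_publicly helper is hypothetical, and it assumes ss -tln prints the state in column 1 and the local address:port in column 4, which is what ss from iproute2 does on my Debian box.

```shell
# listens_publicly PORT: reads `ss -tln` style output on stdin and succeeds
# if any listening socket on PORT is bound to a non-loopback address.
# Hypothetical helper; assumes column 1 is the state, column 4 is addr:port.
listens_publicly() {
    awk -v port="$1" '
        $1 == "LISTEN" && $4 ~ (":" port "$") {
            if ($4 !~ /^127\./ && $4 !~ /^\[::1\]/) found = 1
        }
        END { exit (found ? 0 : 1) }
    '
}

if command -v ss >/dev/null 2>&1; then
    if ss -tln | listens_publicly 10416; then
        echo "port 10416 is reachable from other interfaces"
    else
        echo "port 10416 is loopback-only (or nothing is listening)"
    fi
fi
```

Swap 10416 for whatever port you set in config.ini.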

Finally, we need a service to run BreezeWiki. My installation is using Systemd, so here’s a systemd service for you to use. Adapt this to other systems as necessary. If you’re using systemd, put this in /etc/systemd/system/breezewiki.service:

[Unit]
Description=breezewiki is cool
After=network.target

[Service]
User=breezewiki
Group=breezewiki
WorkingDirectory=/opt/breezewiki
ExecStart=/usr/bin/racket /opt/breezewiki/dist.rkt
Restart=on-failure
# everything after this point is just hardening
InaccessiblePaths=/etc/nginx /etc/letsencrypt /etc/passwd /etc/group
ReadWritePaths=/opt/breezewiki/storage
ReadOnlyPaths=/etc/racket /etc/resolv.conf /etc/hosts /usr/share/racket /usr/lib/racket /usr/include/racket /usr/share/doc/racket
PrivateDevices=true
ProtectControlGroups=true
ProtectHome=read-only
ProtectKernelTunables=true
ProtectSystem=full
PrivateTmp=true
ProtectProc=invisible
CapabilityBoundingSet=CAP_NET_BIND_SERVICE
NoNewPrivileges=true
RestrictNamespaces=true
RestrictAddressFamilies=~AF_UNIX


[Install]
WantedBy=multi-user.target

Then run

sudo systemctl daemon-reload
sudo systemctl enable --now breezewiki

Give it a minute and check that it’s running ok

sudo systemctl status breezewiki

Updating BreezeWiki Later

If you cloned the source from git, then later on when you want to update BreezeWiki you’ll need to do this:

sudo -iu breezewiki bash -c '
    cd /opt/breezewiki \
    && git pull --rebase --autostash \
    && raco pkg install --auto --skip-installed \
    && raco pkg update --auto
'
sudo systemctl restart breezewiki

The --autostash flag (which stashes your local changes before the pull and reapplies them after) is only necessary if you applied my patch to make it listen on 127.0.0.1. Hopefully that’ll just be a config option in the ini later so you don’t need that patch at all. If in doubt, just delete the breezewiki folder, re-clone it, and put your config back.

Setting up Nginx

Now we need to set up Nginx.

sudo apt install nginx
sudo rm /etc/nginx/sites-enabled/default
sudo mkdir -p /var/cache/breezewiki/nginx /var/www/breezewiki
sudo chown -R www-data:www-data /var/cache/breezewiki /var/www/breezewiki
sudo nano /etc/nginx/sites-enabled/breezewiki

Here’s the general config file. You should read through this and take note of the comments that tell you when you need to think about a setting and change it to match your setup.

# this sets up a response cache at /var/cache/breezewiki/nginx.
# Leave levels=1:2 alone, leave keys_zone alone. You should adjust
# max_size= to be however much space you want to use for caching. If you
# aren't caching images, 60gigs is extremely overkill. You won't see a ton
# of benefit beyond a couple gigs. If you are caching images, go ham.
# inactive= specifies how long a file will stay on disk until it gets deleted
# (this is NOT how long nginx will wait before refreshing the cache for that
# file). You can set it to whatever you want as long as it's longer than your
# cache time that you set later down in the file.
proxy_cache_path /var/cache/breezewiki/nginx levels=1:2 keys_zone=breezewiki_cache:50m
                 max_size=60g inactive=7d use_temp_path=off;


server {

    # If you're going to set up wildcard DNS, leave this as an underscore.
    # otherwise you should set this to whatever domain you're hosting
    # breezewiki on. For example
    # server_name yourcoolbreezewiki.com;
    server_name _;

    root /var/www/breezewiki;

    # Used if you go for HTTPS letsencrypt verification strategy, not
    # necessary if you're doing wildcard DNS but it doesn't hurt anything.
    location /.well-known {
        allow all;
    }

    # see https://www.nginx.com/blog/nginx-caching-guide/
    location / {
        proxy_cache breezewiki_cache;
        proxy_cache_use_stale error timeout updating http_500 http_502
                              http_503 http_504;
        proxy_cache_lock on;
        proxy_cache_background_update on;
        proxy_ignore_headers Cache-Control;

        # 72 hour caching is probably ok for a wiki? idk.
        proxy_cache_valid 404 10m;
        proxy_cache_valid 200 301 302 72h;
        proxy_cache_valid any 1m;

        # use the cookie too so we cache themes correctly
        proxy_cache_key $host$proxy_host$request_uri$cookie_theme;

        proxy_pass http://127.0.0.1:10416;
        proxy_set_header Host $host;

    }

    # certbot will change this to 443 for you if you're using certbot with the
    # HTTPS verification strategy. If you want to do wildcard DNS, I'll give you
    # some changes to make to this file later in the post.
    listen 80;
}

Now, reload nginx

sudo systemctl reload nginx

If you want to host breezewiki on a single domain then you’re almost done. You just need to set up Letsencrypt for the HTTPS certificate. Go check out certbot’s homepage if you need help using certbot and just want to host breezewiki on a single domain, and then you’re done! If you want to do something a bit more advanced, keep reading.

Bonus! Wildcard DNS with Letsencrypt

BreezeWiki can take advantage of Wildcard DNS to make using it a bit nicer for anyone trying to use it manually (instead of with a browser extension).

Normally, when you’re on a page and want to use breezewiki you need to go up to the URL (say minecraft DOT fandom DOT com), and then edit it to yourcoolbreezewiki.com/minecraft. Kind of annoying because you need to move the minecraft from the start to the END of the URL. If you set up Wildcard DNS, then you can just change it to minecraft.yourcoolbreezewiki.com which is a bit nicer to do. But, the setup is more complicated because now we need a wildcard DNS entry and a wildcard TLS certificate.
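To make the difference between the two URL styles concrete, here’s a little sketch that rewrites a Fandom URL both ways. The to_breezewiki function and the instance variable are made up for this example (BreezeWiki doesn’t ship them), and the domain is the same placeholder used throughout this post.

```shell
# Placeholder instance domain; replace with your own.
instance="yourcoolbreezewiki.com"

# to_breezewiki STYLE URL: rewrite a Fandom URL for a BreezeWiki instance.
# STYLE is "path" (instance/wikiname/...) or "subdomain" (wikiname.instance/...).
# Hypothetical helper for illustration only.
to_breezewiki() {
    style="$1"
    url="${2#*://}"            # drop the scheme, if any
    host="${url%%/*}"          # e.g. minecraft.fandom.com
    case "$url" in
        */*) path="/${url#*/}" ;;
        *)   path="" ;;
    esac
    wiki="${host%.fandom.com}" # e.g. minecraft
    if [ "$wiki" = "$host" ]; then
        echo "not a fandom.com URL: $2" >&2
        return 1
    fi
    case "$style" in
        path)      echo "https://$instance/$wiki$path" ;;
        subdomain) echo "https://$wiki.$instance$path" ;;
        *)         echo "unknown style: $style" >&2; return 1 ;;
    esac
}

to_breezewiki path      "https://minecraft.fandom.com/wiki/Creeper"
to_breezewiki subdomain "https://minecraft.fandom.com/wiki/Creeper"
```

The path style works on any instance. The subdomain style only works once the wildcard DNS entry and wildcard certificate described below are in place.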

How you set up a wildcard DNS entry depends on your DNS provider. With most user-friendly DNS systems you just create an A (or AAAA) record for * (or *.subdomain if you want to host it on a subdomain) and then set the IP to your server’s IP. That’s pretty simple. The complicated part is getting the TLS certificate, because now you need to use the DNS method for proving you own your domain.

certbot has built-in support for doing this DNS automation for a number of DNS providers. I do not like any of the supported DNS providers, for various reasons. If you’re into hosting your DNS on Cloudflare or AWS then by all means I guess go for it but ehhhhhhhhh no thank you. Instead, I did what’s called ACME Delegation. In short, we’re going to set up a DNS server on our own server, but don’t worry, we don’t need to entirely self-host DNS. Instead, we put a special CNAME record in our normal DNS provider called _acme-challenge. That record will tell Letsencrypt “hey go talk to my DNS server I’m running over here on the side, it has permission to verify that I own this domain”. Pretty neat!

The two pieces of software we’re going to use to do this are joohoi/acme-dns and acme-dns/acme-dns-client. These are both written in Go so we’ll need to install the Go compiler. Once again, we need a new enough version, and bullseye-backports provides one:

sudo apt install -t bullseye-backports golang

Then we need to get the code

cd $HOME
git clone https://github.com/acme-dns/acme-dns-client
git clone https://github.com/joohoi/acme-dns

And build/install them

cd $HOME/acme-dns-client
go build
sudo install --mode 755 -D -t /usr/local/bin acme-dns-client

cd $HOME/acme-dns
go build
sudo install --mode 755 -D -t /usr/local/bin acme-dns
sudo install --mode 644 -D -t /etc/systemd/system acme-dns.service
sudo install --mode 644 -D -t /etc/acme-dns config.cfg

Now to do some configuration

sudo nano /etc/acme-dns/config.cfg

Here’s what my config looks like. I think by default it’s configured to let anyone use your acme-dns service but that seems a bit silly. I didn’t set up authentication for it, but I did limit the registration API to 127.0.0.1.

[general]
listen = "0.0.0.0:53"
protocol = "both"
# domain name to serve the requests off of
domain = "auth.yourcoolbreezewiki.com"
# zone name server
nsname = "auth.yourcoolbreezewiki.com"
# admin email address, where @ is substituted with .
nsadmin = "admin.yourcoolbreezewiki.com"
# predefined records served in addition to the TXT
records = [
    # domain pointing to the public IP of your acme-dns server 
    "auth.yourcoolbreezewiki.com. A 69.69.69.69",
    # specify that auth.yourcoolbreezewiki.com will resolve any *.auth.yourcoolbreezewiki.com records
    "auth.yourcoolbreezewiki.com. NS auth.yourcoolbreezewiki.com.",
]
debug = false

[database]
engine = "sqlite3"
connection = "/var/lib/acme-dns/acme-dns.db"

[api]
ip = "127.0.0.1"
disable_registration = false
port = "2043"
tls = "none"
# optional e-mail address to which Let's Encrypt will send expiration notices for the API's cert
notification_email = ""
# CORS AllowOrigins, wildcards can be used
corsorigins = [
    "*"
]
use_header = false
header_name = "X-Forwarded-For"

[logconfig]
# logging level: "error", "warning", "info" or "debug"
loglevel = "info"
logtype = "stdout"
# format, either "json" or "text"
logformat = "text"

You might run into problems with this if your server is running systemd-resolved. I don’t really know how to help you in that case, because I don’t run systemd-resolved. If you for sure know that you don’t need resolved you can force-disable it with sudo systemctl mask systemd-resolved but seriously go look into the implications of doing this before you do it.

Anyhow, turn on your acme-dns server now.

sudo systemctl daemon-reload
sudo systemctl enable --now acme-dns

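Now you need to add a couple DNS records to your domain’s DNS, however you usually do that. You need an A record for auth.yourcoolbreezewiki.com pointing at your server’s public IP, and an NS record for auth.yourcoolbreezewiki.com pointing at auth.yourcoolbreezewiki.com, matching the records in the acme-dns config above.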

Keep that tab open because we’re going to add the CNAME record I mentioned in a moment.

Now we need to configure acme-dns-client which is the bit that certbot is going to use. We do that with a command like this:

sudo acme-dns-client register -d 'yourcoolbreezewiki.com' -s http://127.0.0.1:2043

As part of this process it will give you something you need to copy-paste into your DNS provider. Create a CNAME record named _acme-challenge whose target is whatever it gives you, which should look a bit like eabe2453-cafe-9999-bad1-80085ace5ff2.auth.yourcoolbreezewiki.com. You’ll need to wait for the CNAME record to propagate.
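Rather than guessing when the record has propagated, you can ask DNS directly. This is just a sketch: acme_challenge_name and check_cname are helper names I invented, and check_cname assumes you have dig installed (on Debian it comes from the dnsutils package).

```shell
# acme_challenge_name DOMAIN: the record name Let's Encrypt looks up for a
# DNS challenge. A wildcard (*.example.com) uses the same name as its base.
# Hypothetical helper for illustration only.
acme_challenge_name() {
    echo "_acme-challenge.${1#\*.}"
}

# check_cname DOMAIN: print the CNAME target for the challenge record.
# Prints nothing until the record is visible to your resolver.
check_cname() {
    dig +short +time=2 +tries=1 CNAME "$(acme_challenge_name "$1")"
}

# Both names in the certificate share one challenge record:
acme_challenge_name 'yourcoolbreezewiki.com'
acme_challenge_name '*.yourcoolbreezewiki.com'
```

Run check_cname yourcoolbreezewiki.com and wait until it prints the auth.yourcoolbreezewiki.com name that acme-dns-client gave you before moving on to certbot.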

Once that’s done, you can get your certificate with certbot

sudo certbot certonly --manual --preferred-challenges dns --manual-auth-hook 'acme-dns-client' \
    -d 'yourcoolbreezewiki.com' -d '*.yourcoolbreezewiki.com'

Make sure you have the domain with and without the wildcard! You need both!

That gets the cert and also sets it up for auto-renewal later. Finally you need to modify the nginx config from earlier. At the very end, you would’ve had something like this

    # HTTPS verification strategy. If you want to do wildcard DNS, I'll give you
    # some changes to make to this file later in the post.
    listen 80;
}

That’s going to change to this

    listen 443 ssl;
    ssl_certificate /etc/letsencrypt/live/yourcoolbreezewiki.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/yourcoolbreezewiki.com/privkey.pem;
    include /etc/letsencrypt/options-ssl-nginx.conf;
    ssl_dhparam /etc/letsencrypt/ssl-dhparams.pem;

}

server {
    return 301 https://$host$request_uri;
    listen 80;
    server_name _;
}

(Double check that that is in fact the right path for your certificate).

Now systemctl reload nginx and you should hopefully be done! Yay!!

If you have any questions feel free to contact me or come join the BreezeWiki Matrix where the dev and server operators like me hang out.