ngx_http_map_module

Sat Jun 22 '19

Overview

I like to serve files. Sometimes, I don’t want their URLs to be as guessable as their filenames. We can use a map in nginx to serve files under paths that differ from their filenames, and the Content-Disposition HTTP header to include their original filenames in the response.

This is roughly the approach I use:

  1. We see a request for /top-secret.

  2. We look up /top-secret in the mapping and find the entry /top-secret "cat-pictures.tar.gz";

  3. We add the header Content-Disposition: inline; filename="cat-pictures.tar.gz"

  4. We serve the file named “cat-pictures.tar.gz” from disk.

This means I can visit /top-secret and, if my browser asks me to save the file because it can’t display it or because I told it to, it can use “cat-pictures.tar.gz” as the default filename instead of “top-secret”.
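The four steps above can be modeled in a few lines of Python. This is a rough sketch, not how nginx implements map internally: it assumes every non-blank line of the mapping file is a plain key "value"; pair (no regular-expression keys, comments only on their own lines), and parse_map and resolve are hypothetical helper names.

```python
from __future__ import annotations

import shlex


def parse_map(text: str) -> dict[str, str]:
    """Parse lines like: /top-secret "cat-pictures.tar.gz";

    Simplified on purpose: no regex keys, no map parameters, no inline comments.
    """
    mapping: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # Drop the trailing ';' and let shlex remove the double quotes.
        key, value = shlex.split(line.rstrip(";"))
        # nginx matches plain string keys ignoring case, so store them lowercased.
        mapping[key.lower()] = value
    return mapping


def resolve(uri: str, mapping: dict[str, str]) -> tuple[str, dict[str, str]] | None:
    """Steps 2-4: find the filename and build the header, or return None for a 404."""
    filename = mapping.get(uri.lower())
    if filename is None:
        return None
    headers = {"Content-Disposition": f'inline; filename="{filename}"'}
    return filename, headers


mapping = parse_map('/top-secret "cat-pictures.tar.gz";\n')
print(resolve("/top-secret", mapping))
```

A request for a path with no entry returns None here, which corresponds to the =404 fallback in the nginx configuration.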

A downside to this is that updating the mapping requires an nginx reload. That’s a bit inconvenient for me because it requires sudo, so I’ve just thrown %wheel ALL=NOPASSWD: /usr/sbin/nginx into sudoers. This is probably bad hygiene. A sudoers rule can also restrict a command to specific arguments, so an entry like %wheel ALL=NOPASSWD: /usr/sbin/nginx -t, /usr/sbin/nginx -s reload would allow only testing the configuration and reloading, which is probably better.
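After editing the mapping, the reload comes down to two commands: nginx -t to check the configuration, then nginx -s reload. A small Python wrapper could run them in order and stop if the check fails. This sketch assumes nginx lives at /usr/sbin/nginx and that sudoers permits exactly these two commands without a password; reload_commands, reload_nginx and the dry_run flag are hypothetical names.

```python
from __future__ import annotations

import subprocess

# Assumption: the binary path used in the sudoers entry; adjust for your system.
NGINX = "/usr/sbin/nginx"


def reload_commands(nginx: str = NGINX) -> list[list[str]]:
    """The commands to run after the map file changes, in order."""
    return [
        ["sudo", nginx, "-t"],            # validate the configuration (and the map)
        ["sudo", nginx, "-s", "reload"],  # ask the master process to reload
    ]


def reload_nginx(dry_run: bool = False) -> list[list[str]]:
    """Run the commands; check=True raises before reloading if the test fails."""
    commands = reload_commands()
    if not dry_run:
        for command in commands:
            subprocess.run(command, check=True)
    return commands
```

Running the test first means a typo in the map file surfaces as an error instead of a failed reload.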

Configuring nginx

A very simple nginx configuration I use for this looks roughly like the following:

map $uri $filename {
  include /srv/files.froghat.ca/map;
}

server {
  listen       443 ssl http2;
  listen       [::]:443 ssl http2;

  server_name  files.froghat.ca;
  root         /srv/files.froghat.ca/files;

  location / {
    # Ignore a trailing slash...
    rewrite    ^/(.*)/$ /$1 last;
    # I don't remember why we need this...
    set        $filename_ $filename;
    add_header Content-Disposition 'inline; filename="$filename_"';
    try_files  /$filename =404;
  }
}

The set $filename_ $filename; line should stand out to you as a clear indication that I have no idea what I’m doing and that there are much better blogs for you to be reading.

I wrote that a long time ago and, from what I remember, it was necessary so that add_header would properly interpolate the filename into the string or something. I checked recently and nothing bad seems to happen without that set line, so either it was fixed at some point or I’m just making stuff up. You can probably omit the line and write filename="$filename" instead.

In addition to the nginx configuration above, there is a directory at /srv/files.froghat.ca/files where I keep the files to be served under their original, unlisted filenames, and a mapping file at /srv/files.froghat.ca/map.

$ cat /srv/files.froghat.ca/files/bar
it works
$ head -n1 /srv/files.froghat.ca/map
/foo "bar";
$ curl -i https://files.froghat.ca/foo
HTTP/2 200
...
content-type: application/octet-stream
content-length: 9
content-disposition: inline; filename="bar"

it works

In the response, we get the appropriate content headers, including the desired Content-Disposition.

Again, keep in mind that, every time the mapping file is updated, nginx needs to be reloaded.

Building the map

This is entirely subject to your desires and use case. If you want anyone who knows a file’s name or content to be able to work out its URL, you could hash one of those directly. Otherwise, you might salt the input to the hash function so that the URL can’t be derived without the salt.

I imagine using a salt and hashing the filename makes the most sense generally. This assumes that if the content of a file changes but its filename doesn’t, you want the associated URL to stay the same. If, for some reason down the line, you want to reproduce a path association, the filename will probably be more stable than the content. So you’ll have a better chance of getting the same paths you had earlier if your URLs depend on filenames rather than content.

But I mean it’s up to you and what makes you happy.
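As a sketch of the salt-plus-filename approach: hash a secret salt concatenated with the filename using SHA-256, keep the first few bytes, and encode them as URL-safe base64 to get the path. The salt value, the 9-byte truncation and the helper names are assumptions for illustration; map_lines also assumes filenames contain no double quotes, since those would break the map syntax.

```python
from __future__ import annotations

import base64
import hashlib
from pathlib import Path

# Hypothetical salt: keep it private and never change it, or every path changes.
SALT = b"replace-me-with-something-secret"


def unlisted_path(filename: str, salt: bytes = SALT, nbytes: int = 9) -> str:
    """Derive a URL path from the salted filename (not the file's content)."""
    digest = hashlib.sha256(salt + filename.encode("utf-8")).digest()
    # 9 bytes encode to exactly 12 base64 characters; rstrip drops any '='
    # padding if you pick a length that is not a multiple of 3.
    token = base64.urlsafe_b64encode(digest[:nbytes]).rstrip(b"=")
    return "/" + token.decode("ascii")


def map_lines(files_dir: str, salt: bytes = SALT) -> list[str]:
    """One '/path "filename";' entry per regular file, ready to include in a map."""
    return [
        f'{unlisted_path(p.name, salt)} "{p.name}";'
        for p in sorted(Path(files_dir).iterdir())
        if p.is_file()
    ]
```

Writing the output of map_lines("/srv/files.froghat.ca/files") to /srv/files.froghat.ca/map would give entries in the same /foo "bar"; shape as above, after which nginx still needs a reload.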

Now, base64-encoded bytes can be URL-safe (the URL-safe alphabet uses - and _ in place of + and /) and aren’t much longer than the bytes they encode. But I don’t find them especially easy for humans to remember or communicate on their own. So when that’s something I care about, I’ll often use mnemonicode.
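To make the base64 trade-off concrete, Python’s standard base64 module has a URL-safe variant, and the unpadded output is about a third longer than its input. b64url is a hypothetical helper name.

```python
import base64


def b64url(data: bytes) -> str:
    """URL-safe base64 with the '=' padding stripped."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


raw = bytes([0xFB, 0xFF])
print(base64.b64encode(raw).decode())  # +/8=  (standard alphabet, not URL-safe)
print(b64url(raw))                     # -_8   ('-' and '_' replace '+' and '/')

# Unpadded length is ceil(4 * n / 3): 6, 22 and 43 characters here.
for n in (4, 16, 32):
    print(n, "bytes ->", len(b64url(bytes(n))), "characters")
```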

Mnemonicode converts 32 bits of data into 3 words from a vocabulary of 1626 words. The words were chosen to be easy to understand over the phone and, as far as possible, recognizable internationally.

For example:

$ echo -n derp | mnencode
 Wordlist ver 0.7
rhino friday texas
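The arithmetic behind the 1626-word vocabulary: three words can take 1626^3 = 4,298,942,376 distinct values, which just covers the 2^32 = 4,294,967,296 possible 32-bit inputs, while 1625 words would fall short. The sketch below turns 4 bytes into three indices below 1626 by base conversion and back. It only illustrates the idea; it is not mnemonicode’s actual bit layout or word list, and to_indices and from_indices are hypothetical names.

```python
from __future__ import annotations

BASE = 1626

# Three base-1626 digits cover every 32-bit value; 1625 would not be enough.
assert BASE**3 >= 2**32 > (BASE - 1) ** 3


def to_indices(chunk: bytes) -> list[int]:
    """Map 4 bytes to 3 word indices, each in range(1626), by base conversion.

    Illustration only: not mnemonicode's real bit layout, and no real word list.
    """
    if len(chunk) != 4:
        raise ValueError("expected exactly 4 bytes")
    x = int.from_bytes(chunk, "little")
    return [x % BASE, (x // BASE) % BASE, x // BASE**2]


def from_indices(indices: list[int]) -> bytes:
    """Invert to_indices."""
    a, b, c = indices
    return (a + b * BASE + c * BASE**2).to_bytes(4, "little")


indices = to_indices(b"derp")
print(indices)
assert from_indices(indices) == b"derp"
```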

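When I use this, I pipe the output through sed to capitalize each word and remove spaces and dots. The wordlist banner goes to stderr, so redirecting stderr to /dev/null drops it.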

$ echo -n derp |\
  mnencode 2>/dev/null |\
  sed 's:\S*:\u\0:g' |\
  sed 's:[ \.]::g'
RhinoFridayTexas

Note that nginx matches plain string keys in a map case-insensitively, so /RhinoFridayTexas and /rhinofridaytexas reach the same file. This is handy because that’s one thing fewer that can go wrong if the URL is being communicated verbally or whatever.
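For reference, a Python equivalent of the sed pipeline, assuming the mnencode words have already been captured as a string. Like the sed version, it uppercases the first character of each whitespace-separated word and deletes spaces and dots; unlike sed, which works line by line, it joins multi-line output into a single token. camel_join is a hypothetical name.

```python
def camel_join(words: str) -> str:
    """Capitalize each word, then drop whitespace and dots, like the sed pipeline."""
    capitalized = (w[:1].upper() + w[1:] for w in words.split())
    return "".join(capitalized).replace(".", "")


print(camel_join("rhino friday texas"))  # RhinoFridayTexas
```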

Finally, hash digests can end up being pretty long when encoded with mnencode. SHA-256 has a 256-bit digest, which is 24 human words (as opposed to computer memory words). I usually truncate them, but I’m not sure that’s an acceptable thing to do, since fewer bits make it more likely that two filenames end up with the same path. So maybe you shouldn’t do that.
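The word counts follow from mnemonicode’s 3 words per 32 bits. This sketch only handles lengths that are whole 4-byte groups, mnemonic_word_count is a hypothetical helper, and the collision note in the comment is the standard birthday-bound estimate rather than a measurement.

```python
import hashlib

WORDS_PER_GROUP = 3  # mnemonicode: 3 words for each 32-bit (4-byte) group


def mnemonic_word_count(nbytes: int) -> int:
    """Number of words for nbytes of input; whole 4-byte groups only."""
    if nbytes % 4:
        raise ValueError("this sketch only handles multiples of 4 bytes")
    return nbytes // 4 * WORDS_PER_GROUP


digest = hashlib.sha256(b"derp").digest()
print(len(digest) * 8, "bits ->", mnemonic_word_count(len(digest)), "words")  # 256 bits -> 24 words

# Truncating to the first 4 bytes gives 3 words, but only 32 bits: by the
# birthday bound, a collision becomes likely after roughly 2**16 inputs.
truncated = digest[:4]
print(len(truncated) * 8, "bits ->", mnemonic_word_count(len(truncated)), "words")  # 32 bits -> 3 words
```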

SHA-3 has a thing called SHAKE (SHAKE128 and SHAKE256) that is designed for variable-length outputs; Python exposes it as hashlib.shake_128 and hashlib.shake_256.

$ python3 -c 'import sys, hashlib
h = hashlib.shake_128(b"derp")
sys.stdout.buffer.write(h.digest(4))' | mnencode
 Wordlist ver 0.7
belgium bombay puzzle
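One property worth knowing when choosing an output length: for the same input, a shorter SHAKE output is a prefix of a longer one, so digest(4) is exactly the first 4 bytes of digest(8). Asking SHAKE for fewer bytes is therefore the same as truncating, with the same loss of collision resistance. The input mirrors the example above; shake_prefix_holds is a hypothetical helper name.

```python
import hashlib


def shake_prefix_holds(data: bytes, n_short: int, n_long: int) -> bool:
    """True if SHAKE128's shorter output equals the start of its longer output."""
    h = hashlib.shake_128(data)
    return h.digest(n_long)[:n_short] == h.digest(n_short)


four = hashlib.shake_128(b"derp").digest(4)   # 32 bits: 3 mnemonicode words
eight = hashlib.shake_128(b"derp").digest(8)  # 64 bits: 6 words
print(four.hex(), eight.hex())
print(shake_prefix_holds(b"derp", 4, 8))  # True
```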

You could also come up with your own encoding that uses a larger vocabulary or something. Three words from a vocabulary of 6,981,463,658,332 is enough to cover 128 bits. That seems doable.
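The 6,981,463,658,332 figure is the smallest vocabulary v with v^3 >= 2^128, that is, the cube root of 2^128 rounded up. A small exact-integer search reproduces it, along with the 1626 figure for 32 bits; min_vocabulary is a hypothetical helper.

```python
def min_vocabulary(bits: int, words: int) -> int:
    """Smallest v such that `words` words from v choices cover 2**bits values."""
    target = 2**bits
    lo, hi = 1, 2 ** -(-bits // words)  # 2**ceil(bits/words) always suffices
    while lo < hi:  # binary search on exact integers, so no float rounding
        mid = (lo + hi) // 2
        if mid**words >= target:
            hi = mid
        else:
            lo = mid + 1
    return lo


print(min_vocabulary(32, 3))    # 1626, mnemonicode's vocabulary size
print(min_vocabulary(128, 3))   # 6981463658332
print(min_vocabulary(128, 12))  # 1626: mnemonicode needs 12 words for 128 bits
```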