ngx_http_map_module
Sat Jun 22 '19
Overview
I like to serve files. Sometimes, I don’t want their URLs to be as guessable as the file names. We can use a map in nginx to serve files under paths that differ from their filenames and the Content-Disposition HTTP header to include their original filename in the response.
This is roughly the approach I use:
We see a request for /top-secret.
We read the mapping and find the entry /top-secret "cat-pictures.tar.gz";
We add the header Content-Disposition: inline; filename="cat-pictures.tar.gz"
We serve the file named “cat-pictures.tar.gz” from disk.
This means I can visit /top-secret
and, if my browser asks me to save the file because it can’t display it
or because I told it to,
it can use “cat-pictures.tar.gz” as the default filename instead of “top-secret”.
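To make the flow concrete, here is a small Python model of it. This is only a sketch of the behaviour, not how nginx implements anything: the names parse_map and respond are made up for illustration, the parser only understands simple /path "filename"; lines, and an unmapped path is simply treated as not found.

```python
import re

# Matches simple map lines such as: /top-secret "cat-pictures.tar.gz";
MAP_LINE = re.compile(r'^\s*(\S+)\s+"([^"]*)"\s*;\s*$')


def parse_map(text):
    """Parse map lines into a dict keyed by lowercased path."""
    mapping = {}
    for line in text.splitlines():
        match = MAP_LINE.match(line)
        if match:
            path, filename = match.groups()
            # nginx compares string keys in a map ignoring case,
            # so the model stores them lowercased.
            mapping[path.lower()] = filename
    return mapping


def respond(mapping, uri):
    """Return (status, headers, filename to read from disk)."""
    filename = mapping.get(uri.lower())
    if filename is None:
        return 404, {}, None
    headers = {"Content-Disposition": f'inline; filename="{filename}"'}
    return 200, headers, filename


mapping = parse_map('/top-secret "cat-pictures.tar.gz";')
print(respond(mapping, "/top-secret"))
print(respond(mapping, "/not-mapped"))
```

The point is just that the URL path and the name on disk are decoupled, and the header carries the real name back to the browser.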
A downside to this is that updating the mapping requires an nginx reload.
This is a bit inconvenient for me, as it requires sudo,
so I’ve just thrown
%wheel ALL=NOPASSWD: /usr/sbin/nginx
into sudoers.
This is probably bad hygiene.
A sudoers entry can include the command's arguments,
in which case only that exact invocation is permitted,
so allowing just nginx -t
and nginx -s reload
would be better.
Configuring nginx
A very simple nginx configuration I use for this looks roughly like the following:
map $uri $filename {
    include /srv/files.froghat.ca/map;
}
server {
    listen 443 ssl http2;
    listen [::]:443 ssl http2;
    server_name files.froghat.ca;
    root /srv/files.froghat.ca/files;
    location / {
        # Ignore a trailing slash...
        rewrite ^/(.*)/$ /$1 last;
        # I don't remember why we need this...
        set $filename_ $filename;
        add_header Content-Disposition 'inline; filename="$filename_"';
        try_files /$filename =404;
    }
}
The set $filename_ $filename;
line should stand out to you as
a clear indication that I have no idea what I’m doing
and that there are much better blogs for you to be reading.
I wrote that a long time ago and,
from what I remember,
it was necessary so that add_header
would properly interpolate the filename into the string or something.
I checked recently
and nothing bad seems to happen without that set
line,
so either it was fixed at some point
or I’m just making stuff up.
You can probably omit the line and write filename="$filename"
instead.
In addition to the nginx configuration above,
there is a directory at
/srv/files.froghat.ca/files
where I keep the files to be served
under their original unlisted filenames,
and a file containing the mapping at /srv/files.froghat.ca/map:
$ cat /srv/files.froghat.ca/files/bar
it works
$ head -n1 /srv/files.froghat.ca/map
/foo "bar";
$ curl -i https://files.froghat.ca/foo
HTTP/2 200
...
content-type: application/octet-stream
content-length: 9
content-disposition: inline; filename="bar"
it works
In the response, we get the appropriate content headers, including the desired Content-Disposition.
Again, keep in mind that, every time the mapping file is updated, nginx needs to be reloaded.
Building the map
This is entirely subject to your desires and use case. If you want anyone who knows the filename or the content to be able to work out the URL, you could use a plain hash of one of those things. Otherwise, you might salt the input to the hash function so that the URL can't be derived without the salt.
I imagine using a salt and hashing the filename makes the most sense generally. This assumes that if the content of a file changes, but not the filename, then you want the associated URL to remain the same. If, for some reason down the line, you want to reproduce a path association, the filename will probably be more stable than the content. So you’ll have a better chance of getting the same paths you had earlier if your URLs are filename-dependent rather than content-dependent.
But I mean it’s up to you and what makes you happy.
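As a rough sketch of the salted, filename-based option, something like the following would produce a map entry. The function names and the salt value are placeholders I made up, and using HMAC-SHA256 with the salt as the key is just one reasonable way to mix a secret into the hash; a plain hash over salt plus filename would serve the same purpose here.

```python
import hashlib
import hmac


def path_for(filename, salt):
    """Derive a URL path from a filename and a secret salt."""
    # The salt is the HMAC key, so the path can't be recomputed
    # from the filename alone.
    digest = hmac.new(salt, filename.encode("utf-8"), hashlib.sha256)
    return "/" + digest.hexdigest()


def map_line(filename, salt):
    """Format one entry for the nginx map file."""
    return f'{path_for(filename, salt)} "{filename}";'


# Placeholder salt; use your own random secret and keep it private.
salt = b"replace-with-a-random-secret"
print(map_line("cat-pictures.tar.gz", salt))
```

Because the path depends only on the salt and the filename, running this again later with the same salt gives the same path even if the file's content has changed.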
Now, base64 encoded bytes can be URL safe and not that much longer than the bytes they encode. But I don’t find them especially easy for humans to remember or communicate on their own. So when that’s something I care about, I’ll often use mnemonicode.
The encoding converts 32 bits of data into 3 words from a vocabulary of 1626 words. The words have been chosen to be easy to understand over the phone and recognizable internationally as much as possible.
For example:
$ echo -n derp | mnencode
Wordlist ver 0.7
rhino friday texas
When I use this, I use sed to capitalize words and remove spaces.
$ echo -n derp |\
mnencode 2>/dev/null |\
sed 's:\S*:\u\0:g' |\
sed 's:[ \.]::g'
RhinoFridayTexas
Conveniently, nginx matches string keys in a map case-insensitively. This is handy because that’s one fewer thing that can go wrong if the URL is being communicated verbally or whatever.
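If you would rather not lean on GNU sed for the capitalizing step, the same transformation is easy to sketch in Python. The function name squash_words is made up, and it assumes the words arrive separated by spaces or dots, as in the mnencode output shown above.

```python
import re


def squash_words(words):
    """Capitalize each word and drop the spaces and dots between them."""
    parts = [p for p in re.split(r"[ .]+", words.strip()) if p]
    # Uppercase only the first character, like sed's \u does.
    return "".join(p[:1].upper() + p[1:] for p in parts)


name = squash_words("rhino friday texas")
print(name)  # RhinoFridayTexas

# A case-insensitive comparison, which is how the map key gets matched.
print(name.lower() == "rhinofridaytexas")  # True
```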
Finally,
hash digests can end up being pretty long when encoded with mnencode.
SHA-256 has a digest size of 256 bits, which is 24 human words
(as opposed to computer memory words).
I typically truncate them. Truncating a hash is a common enough thing to do,
but every bit you drop makes the URL easier to guess, so don’t cut it shorter than you’re comfortable with.
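The word counts are simple arithmetic, sketched below. The helper name words_for is made up, and it only handles digest sizes that are a multiple of 32 bits.

```python
VOCABULARY = 1626
WORDS_PER_GROUP = 3
BITS_PER_GROUP = 32

# Three words can take 1626**3 values, just enough to cover 2**32.
assert VOCABULARY ** WORDS_PER_GROUP >= 2 ** BITS_PER_GROUP


def words_for(bits):
    """Words needed for a digest whose size is a multiple of 32 bits."""
    if bits % BITS_PER_GROUP:
        raise ValueError("this sketch only handles multiples of 32 bits")
    return bits // BITS_PER_GROUP * WORDS_PER_GROUP


for bits in (32, 64, 128, 256):
    print(bits, "bits ->", words_for(bits), "words")
```

So truncating a digest to its first 4 bytes gets you down to three words, at the cost of leaving only 32 bits for someone to guess.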
The SHA-3 standard includes SHAKE, an extendable-output function that can produce a digest of whatever length you ask for.
$ python3 -c 'import sys, hashlib
hash = hashlib.shake_128(b"derp")
sys.stdout.buffer.write(hash.digest(4))' | mnencode
Wordlist ver 0.7
belgium bombay puzzle
You could also come up with your own encoding that uses a larger vocabulary or something. Three words from a vocabulary of 6,981,463,658,332 is enough to cover 128 bits. That seems doable.
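For anyone who wants to check that kind of figure, here is a sketch that finds the smallest vocabulary for a given number of bits and words. The function name min_vocabulary is made up, and it uses integer binary search because floating point runs out of precision at these sizes.

```python
def min_vocabulary(bits, words):
    """Smallest vocabulary size v such that v**words >= 2**bits."""
    target = 2 ** bits
    lo, hi = 1, target
    # Binary search on exact integers.
    while lo < hi:
        mid = (lo + hi) // 2
        if mid ** words >= target:
            hi = mid
        else:
            lo = mid + 1
    return lo


print(min_vocabulary(128, 3))
print(min_vocabulary(32, 3))
```

The second call is the mnemonicode case: three words covering 32 bits, which is where a vocabulary of 1626 comes from.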