Some things I did in 2019

Tue Jan 21 '20

At the end of 2018, I was very interested in learning Rust and using TimescaleDB to build an analytics platform for Twitch.tv (click with caution; loud and obnoxious auto-playing videos).

Comfy Sheep

For the first five-ish months of 2019, I worked on Comfy Sheep. There’s a big writeup in that project’s readme about everything that’s in there. Briefly, there is a program that scrapes the Twitch.tv API for live-streams and another program that logs chat messages/stream events for the top 60k-100k live-streams. The data collected is stored in a PostgreSQL database that uses TimescaleDB.

The idea was to use the data to inform predictions and decision making about live-streaming. But I don’t know how to do that and I don’t know anybody who does. I guess I didn’t think that through.

TimescaleDB was fun to use and it worked well. Except one time it had a bug that would segfault the PostgreSQL process running the query, causing PostgreSQL to restart whenever the chat logger tried to write data. When the database would come back up, the logger would connect to it again, try to write the same message again, and trigger a crash again. Fortunately, the bug had already been fixed and all I needed to do was package the new version[*] and update it on the host.

Partial indexes were very useful. A job would look at viewership over the past week to determine which streams should be logged and how to distribute streams to loggers so that they might receive roughly equal amounts of chat activity. The query used an index that only contained samples with at least some number of viewers, reducing the size of the index by about 80%. This, and using vacuum to update the visibility map, made the difference between the job taking eight seconds and taking eight hours or more.
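To make the partial index idea concrete, here is a minimal sketch. It uses SQLite through Python only because that runs anywhere; the real thing was PostgreSQL with TimescaleDB. The table, the column names, and the 50-viewer threshold are all made up for illustration.

```python
import sqlite3

MIN_VIEWERS = 50  # hypothetical threshold, not the real one

def build_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical samples table."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE samples (stream_id INTEGER, sampled_at INTEGER, viewers INTEGER)"
    )
    # Partial index: rows below the threshold never enter the index,
    # which is what keeps it small.
    db.execute(
        "CREATE INDEX samples_busy ON samples (sampled_at, stream_id, viewers) "
        f"WHERE viewers >= {MIN_VIEWERS}"
    )
    return db

# The query repeats the index predicate so the planner can tell
# that the partial index covers every row it needs.
QUERY = (
    "SELECT stream_id, SUM(viewers) FROM samples "
    f"WHERE viewers >= {MIN_VIEWERS} AND sampled_at >= ? "
    "GROUP BY stream_id ORDER BY stream_id"
)

def busy_streams(db: sqlite3.Connection, since: int):
    """Total viewers per stream since a timestamp, ignoring quiet samples."""
    return db.execute(QUERY, (since,)).fetchall()

def query_plan(db: sqlite3.Connection, since: int) -> str:
    """Return the planner's description of how QUERY would run."""
    rows = db.execute("EXPLAIN QUERY PLAN " + QUERY, (since,)).fetchall()
    return " | ".join(row[3] for row in rows)
```

Printing query_plan(db, 0) is a quick way to check whether the planner picked the partial index instead of scanning the whole table.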

Rust’s borrow checker and its very popular asynchronous runtime named “tokio” gave me a hard time. There is a library called mio (which falls under the tokio project umbrella) that is a very nice wrapper around polling. I wish there was more effort going into figuring out how to write libraries that could be used with event loops generally rather than large, general purpose, all-consuming runtimes. If I want to write something single threaded that just listens and sends on sockets, tokio seems like shooting a fly with a cannon. But minimalism and Rust don’t really seem to go together.

pizza with basil, mozzarella, and sliced farmer sausage

Super Serious Timer Business

I wanted to do something with WebAssembly in Rust. In March I wrote a very simple timer kind of thing named Super Serious Timer Business and put it on GitLab. It uses a Rust library called Yew, which is purportedly inspired by Elm.

I saw Elm later that year while looking for reactive-streaming-functional-pipeline things. I was interested in RxJS, Rambda, Most, Fantasy Land – among others – and was hoping to find a way to use Inferno with these mechanisms painlessly-ish.

At some point I saw something talking about how you can use these sorts of things to create functions that hook up to events and evaluate to DOM elements. And then I think in that same article they mentioned Elm.

So I tried Elm and it was very fun. It has some nice quality-of-life stuff like quick compile times (I believe the Elm compiler is written in Haskell) and a nice code formatter. I, personally, quite like whitespace as syntax and prefer that to marking up my code with squiggly braces and semicolons – but that’s just me (no hate pls).

I’m looking forward to using more of Elm in the future as a gateway drug to functional programming.

SCM_RIGHTS

In May I put up this blag[†] and wrote a program that used a feature of UNIX sockets called SCM_RIGHTS that allows duplicating file descriptors to other processes. I blagged about it.
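For anyone who hasn't seen SCM_RIGHTS in action, here is a small sketch of the mechanism. It is not the program from that post; it is just a demo using Python's socket.send_fds and socket.recv_fds (available on Unix since Python 3.9), and it keeps both ends of the socket in one process so it stays short. The function name is made up.

```python
import os
import socket

def pass_fd_demo(message: bytes) -> bytes:
    """Send a pipe's write end over a UNIX socket, then write through the copy."""
    left, right = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    read_end, write_end = os.pipe()
    try:
        # The descriptor travels as SCM_RIGHTS ancillary data next to
        # at least one byte of ordinary payload.
        socket.send_fds(left, [b"x"], [write_end])
        _data, fds, _flags, _addr = socket.recv_fds(right, 1, 1)
        received = fds[0]  # a new descriptor for the same open pipe
        try:
            os.write(received, message)
        finally:
            os.close(received)
        return os.read(read_end, len(message))
    finally:
        os.close(read_end)
        os.close(write_end)
        left.close()
        right.close()
```

In a real setup the two socket ends would live in different processes, and the receiver would end up with its own descriptor for a file it never opened itself.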

slightly more burnt pizza with basil, mozzarella, and sliced farmer sausage

Containers

After that, I got sad for reasons and I played a lot of video games. But then I stopped being sad and started working on something that would let me manage and deploy game servers. I never finished it. I think because I kept increasing the scope and eventually it got hard and I got distracted.

I put some of the stuff I wrote for it up on GitLab.

The container stuff worked like this.

Runtime

The plan was to use systemd-nspawn (or possibly runc) to run containers. Both of these seemed like very low-drama tools for creating namespaces that supported important things like uid mapping and seccomp. They can also set up a bit of networking if so desired. But I almost prefer to prepare the networking with iproute2 (like ip link ...) through systemd services. Using systemd for the network configuration lets you model some things like dependencies, automatic restarts, and logging or triggers on service failure.

A lot of the services are instanced (the unit name ends in an “@”), so there might only be one unit file for creating network veth pairs (named “container-veth@.service” or something), but we can start multiple instances of the service by putting a string after the “@” (like “container-veth@bob.service”) and the service file can parameterize its behaviour on the instance name.

For more control or variation among instances of a unit, we can place extra configuration in a drop-in directory. For example, if each guest container on some host is an instance of the “container@.service” unit, then we can add extra configuration for the “bob” container by writing it to “container@bob.service.d/50-extra-stuff.conf”.
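The naming scheme is mechanical enough to sketch in a few lines. The helper names below are hypothetical, /etc/systemd/system is just one of the places systemd looks for units, and the sketch assumes instance names that need no escaping (systemd-escape exists for the ones that do).

```python
from pathlib import PurePosixPath

def instance_unit(template: str, instance: str) -> str:
    """Turn a template like 'container@.service' into 'container@bob.service'."""
    prefix, at, suffix = template.partition("@")
    if not at or not suffix.startswith("."):
        raise ValueError(f"not a template unit name: {template!r}")
    return f"{prefix}@{instance}{suffix}"

def dropin_path(unit_dir: str, template: str, instance: str, filename: str) -> PurePosixPath:
    """Path of a drop-in file that applies to a single instance only."""
    return PurePosixPath(unit_dir) / (instance_unit(template, instance) + ".d") / filename
```

Anything written to the resulting path is merged into the settings for that one instance, while the other instances keep using the plain template.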

Volumes

Other than configuration, containers need a root filesystem directory tree (rootfs) to run in. This is where I got bogged down on a bunch of weird edge cases that I tried to model.

To simplify here, we’ll say that containers use a rootfs and the host can get a rootfs by extracting an image to a directory somewhere.

It seemed like a good way to manage images was through a tool called casync. We can give it a target directory tree, like our game & operating system, and it breaks all that up into chunks, stores them, and gives us an index (.caidx) that we can use later to reassemble the directory from the chunks.

The chunk storage is re-used for different indexes too. So, in the future, when I want to make a new image of an updated version of this game, chunks that haven’t changed are not written again.

The deltas operate at the level of chunks rather than files. This is particularly nice for games which might ship a large binary blob with only a few differences in it from the previous version.
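Here is a toy version of the idea, to show why a small change only costs a few new chunks. It is not casync's algorithm or on-disk format: the rolling hash, the chunk sizes, and the in-memory dict standing in for the chunk store are all simplifications I picked for the sketch.

```python
import hashlib
import random

_rng = random.Random(0)
GEAR = [_rng.getrandbits(32) for _ in range(256)]  # one random value per byte value
MIN_CHUNK = 64   # never cut chunks shorter than this
CUT_BITS = 10    # expect a cut roughly every 2**10 bytes

def split(data: bytes) -> list:
    """Cut data where a hash of the most recent bytes hits a pattern."""
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        # Older bytes shift out of the 32-bit hash, so cut points depend
        # on nearby content rather than on absolute offsets.
        h = ((h << 1) + GEAR[byte]) & 0xFFFFFFFF
        if i + 1 - start >= MIN_CHUNK and (h >> (32 - CUT_BITS)) == 0:
            chunks.append(data[start:i + 1])
            start, h = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])
    return chunks

def store_image(data: bytes, store: dict):
    """Add chunks to the store; return (index, number of chunks newly written)."""
    index, written = [], 0
    for chunk in split(data):
        digest = hashlib.sha256(chunk).hexdigest()
        if digest not in store:
            store[digest] = chunk
            written += 1
        index.append(digest)
    return index, written

def extract(index: list, store: dict) -> bytes:
    """Reassemble an image from its index."""
    return b"".join(store[digest] for digest in index)
```

Storing a second image that differs by a few bytes in the middle should only write the chunk or two around the change, because the cut points on either side line up again.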

Chunks themselves are compressed by casync. I found that using zstd for compression was much faster.

The indexes and the chunks that casync creates serve as our images. We can give casync an index of a container filesystem and it will extract it for us. It can also fetch chunks over the network through SSH if they are not stored locally.

also a pizza with basil, mozzarella, and sliced farmer sausage

Isolation

One part of container isolation is mapping each user in a container to an otherwise unused user on the host.

If you segment out the group and user ranges on the host into 16-bit sized ranges (0x10000 IDs each), you can create reservations for your containers that look like this:

0             root on host
1000          user on host
0x10000       root on container0
0x10000+1000  user on container0
0x20000       root on container1
0x20000+1000  user on container1
...

The goal is that no ranges for the host or its guests overlap.
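The arithmetic behind that table is tiny, but spelling it out makes the non-overlap property easy to check. This is a sketch with hypothetical function names; it follows the table above, where the host keeps the first range and container N gets the range after it.

```python
RANGE_SIZE = 0x10000  # one 16-bit range of IDs per reservation

def container_base(n: int) -> int:
    """First host ID reserved for container n (the host itself owns range 0)."""
    if n < 0:
        raise ValueError("container number must not be negative")
    return (n + 1) * RANGE_SIZE

def to_host_id(n: int, inner_id: int) -> int:
    """Map a uid or gid as seen inside container n to the ID used on the host."""
    if not 0 <= inner_id < RANGE_SIZE:
        raise ValueError("ID does not fit in a 16-bit range")
    return container_base(n) + inner_id

def reservation(n: int) -> range:
    """All host IDs that belong to container n."""
    return range(container_base(n), container_base(n) + RANGE_SIZE)
```

Because every reservation starts on a multiple of 0x10000 and is exactly 0x10000 long, two different containers can never share a host ID.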

An issue emerges where if container0 and container1 both use the same image, then the host needs two copies of it with different owners.

A solution to this is to provide something like a bind mount that allows accessing some path with shifted ownerships. I think something called shiftfs has been trying to make its way into Linux for a while. And it looks like Ubuntu might already ship with it since whenever I search for shiftfs I find a bunch of Ubuntu security notices related to it.

There is also a FUSE implementation of overlayfs called fuse-overlayfs that has an owner shifting feature. But, since that’s FUSE, that’s automatically removed from consideration.

The approach I chose was to use a feature of overlayfs (accessible with the metacopy=on option) which allows modifying file attributes in an overlay without copying the file contents up from the lower layer.

The host then keeps only one copy of each image that its guests are using. When a guest uses an image, we mount an overlay for that guest with the image as the lower layer and shift the owner of every file in the overlay to be suitable for the guest.
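As a rough sketch of what such a mount looks like, the function below builds the mount command without running it (mounting needs root). lowerdir, upperdir, workdir, and metacopy=on are real overlayfs options; the function name and the choice to put "upper" and "work" under one scratch directory are my own assumptions for illustration, and paths containing commas or colons would need escaping that this sketch skips.

```python
import os

def overlay_mount_argv(image_dir: str, scratch_dir: str, mountpoint: str) -> list:
    """Build a mount command for an overlay on top of a shared, read-only image."""
    options = ",".join([
        f"lowerdir={image_dir}",                            # the single shared copy of the image
        f"upperdir={os.path.join(scratch_dir, 'upper')}",   # per-guest changes land here
        f"workdir={os.path.join(scratch_dir, 'work')}",     # overlayfs scratch space
        "metacopy=on",                                      # attribute changes copy up metadata only
    ])
    return ["mount", "-t", "overlay", "overlay", "-o", options, mountpoint]
```

With metacopy=on, changing the owner of a file in the overlay records new metadata in the upper layer and leaves the file contents in the image, which is what keeps a per-guest overlay cheap.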

During this escapade with containers and image management, I wrote several tools to help make things work. I want to salvage a couple so I yoinked them out of the project they were in and fixed them up a bit (rewrote them) so I could publish them along with this blag post.

shift-own is a binary that lets you chown a file or directory tree according to some shift and range. So if you have 0x10000 sized reservations and want to make a file or directory (and everything under it) accessible to the reservation starting at 0x30000, then …

shift-own -s 0x30000 -r 0x10000 path/to/whatever

… will do that.

shift-mount will create an overlay with metacopy=on and run the ownership shifting in the overlay.

Here’s an example of both.

# ./shift-mount --oneshot /opt/alpine/ /opt/alpine-int/ /tmp/alpine-shifted/
Mounted /tmp/alpine-shifted/
4484 files under /tmp/alpine-shifted/ shifted with 0x0 using range 0x10000

# ./shift-own -s 0x30000 /tmp/alpine-shifted/ -v
0:0 -> 196608:196608 .. /tmp/alpine-shifted/
0:0 -> 196608:196608 .. /tmp/alpine-shifted/proc
0:0 -> 196608:196608 .. /tmp/alpine-shifted/proc/self
0:0 -> 196608:196608 .. /tmp/alpine-shifted/proc/self/uid_map
0:0 -> 196608:196608 .. /tmp/alpine-shifted/usr
0:0 -> 196608:196608 .. /tmp/alpine-shifted/usr/lib
0:0 -> 196608:196608 .. /tmp/alpine-shifted/usr/lib/libip4tc.so.2
0:0 -> 196608:196608 .. /tmp/alpine-shifted/usr/lib/libreadline.so.8
0:0 -> 196608:196608 .. /tmp/alpine-shifted/usr/lib/engines-1.1
0:0 -> 196608:196608 .. /tmp/alpine-shifted/usr/lib/engines-1.1/afalg.so
...

That’s a bad example because I could have passed -s 0x30000 to shift-mount and it would have done what shift-own did. But you get the idea…
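For the curious, the ownership shift itself can be sketched like this. It is not the shift-own source. In particular, the rule "new ID = shift + (current ID modulo range)" is my guess at the behaviour based on the output above, and the function names are made up. plan_shift only reports what would change; apply_shift does the chown and needs enough privileges to give files away.

```python
import os

def shifted_id(current: int, shift: int, id_range: int) -> int:
    """Move an ID into the reservation starting at shift (assumed rule)."""
    return shift + (current % id_range)

def plan_shift(root: str, shift: int, id_range: int = 0x10000):
    """Yield (path, (uid, gid), (new_uid, new_gid)) for root and everything under it."""
    def entry(path):
        st = os.lstat(path)  # lstat: look at symlinks, not their targets
        new = (shifted_id(st.st_uid, shift, id_range),
               shifted_id(st.st_gid, shift, id_range))
        return path, (st.st_uid, st.st_gid), new

    yield entry(root)
    for dirpath, dirnames, filenames in os.walk(root):  # does not follow symlinks
        for name in dirnames + filenames:
            yield entry(os.path.join(dirpath, name))

def apply_shift(root: str, shift: int, id_range: int = 0x10000) -> int:
    """Chown everything under root to its shifted owner; return the entry count."""
    count = 0
    for path, _old, (uid, gid) in plan_shift(root, shift, id_range):
        os.lchown(path, uid, gid)  # lchown so symlinks themselves get re-owned
        count += 1
    return count
```

Taking the ID modulo the range first means a tree that was already shifted for one reservation can be shifted again for another without drifting outside its 16-bit window.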