Caching with LVM snapshots for ephemeral self hosted GitHub runners

Thu Mar 16 '23


GitHub Actions (Uber for git hooks) is a modern CI/CD offering by GitHub (Uber for broken links and 404 pages). YAML documents in git repositories on GitHub serve as scripts for runners, allowing you to dispense chores for GitHub’s VM when some event, like a commit, happens in your GitHub project. GitHub has their own runners in the cloud that you can use.

GitHub hosted runners do not persist state between jobs. This is cool because you can expect jobs to start from a consistent and predictable state. But it can also make jobs take longer if they can’t re-use work from previous runs, like incremental rebuilds or downloading and compiling dependencies that change infrequently.

GitHub Actions features something they call a cache action which is a way to copy state between jobs by roughly doing tar c job-cache | curl --form "file=@-" .... But the cache action does not work well for building container images with Podman.

I wanted to try sharing state between jobs by binding a volume from the host to the runner. So I wrote a short Python script to do that.

It uses LVM to make snapshots for different jobs and it’s the fun part of this post. But to write that, I had to go through the anguish of running the GitHub runner myself, ensuring it does not persist state between jobs.

This is one of my longer and more meandering blog posts. So here’s an improvised table of contents – though you should read the whole thing because there will be a quiz at the end.

  • self hosted runners
  • hooks
  • --ephemeral
  • caching Podman
  • lvm-cache-friend
  • self hosting GitHub’s runner in systemd-nspawn
  • LVM
  • lvm-cache-friend.service
  • quick summary
  • graph-do-smell

self hosted runners

As an alternative to GitHub hosted runners, you can run workflows on self hosted runners. (Like some sort of private cloud.) But, these self hosted runners don’t work at all like GitHub’s runners. To be fair, they do tell you this. But I don’t think it prepares you for the cursed architecture that awaits.

As far as I can tell, the GitHub runner doesn’t separate application, configuration, and persistent or runtime state. By default, it sets itself up to automatically update itself. And it runs workflows as the service’s user without dropping any privileges. So programs run from the YAML workflows have write access to the runner that runs them. And that seems like pretty not great hygiene to me. Even ignoring security concerns, this seems like an unnecessary risk if the workflow does something stupid by mistake.

It would make much more sense if workflow execution were done by running a separate program that sets up its environment depending on user requirements. Like how sudo, env, strace, or unshare take another command to run as an argument. Those programs allow users to run other programs under different environments by setting those environments up to the user’s liking and then starting the other program in that new environment. If workflows were run by creating a new process, it would be easy to allow users to configure the environment and privileges that a workflow runs in by starting its process under sudo or bubblewrap or even a Docker/Podman container probably.


I looked into a couple ways of cleaning this up. Using hooks to run a script before and after a job. And configuring the runner with --ephemeral.

hooks

Runners can be configured to run scripts before and after a job. So you could possibly change the environment before a job and then restore it after. But this doesn’t work the same as programs doing fork-exec.

Suppose you wanted jobs to run in a network namespace. My understanding is, it’s typical to fork a process, then the child unshares or something to make a new namespace, the parent sets up the namespace from the outside, and finally the child continues in the prepared namespace and execs the job or some program. This is how bubblewrap, and programs like it, allow you to run some program of your choosing in a namespace to your liking.
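
A wrapper like unshare shows the pattern: it sets up the namespaces and then execs the program of your choosing inside them. Something like:

# run a command in fresh user and network namespaces; it only sees a loopback
# interface that's down and can't reach anything
unshare --user --map-root-user --net sh -c 'ip addr'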

In this case, maybe we want to modify the runner so that it runs in an environment to our liking. But we can’t exec the runner; it’s already running and it’s calling our hooks.

One idea, that I haven’t tried, is using setns to change the namespace of the runner’s process. We can’t just call that in the hook, because it will just change the namespace of the hook. We need the runner to make that call from its process. To my knowledge, it’s possible to attach to a running process with a debugger, like gdb, and use the debugger to make calls in the debugged process. So maybe you can change another process’s namespace with something like:

$ gdb -p 123
(gdb) call (int)pidfd_open(456, 0)
(gdb) call (int)setns($, 0)

But this is a silly thing to do and even if it worked it would likely have a bunch of wacky consequences. I just mention it because instead of saying it’s not possible to get the isolation desired, I’d prefer to just say that I can’t come up with any ways that aren’t absurd.

--ephemeral

To install the GitHub runner software, GitHub provides you with a link to a tarbomb you can download and extract all over your current directory to get some scripts used to configure and start the program.

The config.sh script is intended to configure a new runner; requiring a runner name, token, and repository to run workflows for.

There are a few more options like --disableupdate, which prevents the runner from updating itself. And --ephemeral which, as GitHub’s documentation explains, configures the runner to just do one job.

These options let us limit the ambitions of the runner so that we can try starting it in an already limited environment.

caching Podman

Podman’s image storage does not cache well using GitHub’s cache action.

The first problem is that some files, although owned by the current user, have no readable bits set. When the cache action runs tar, it will fail to read those files. Those permission bits are fine for Podman, and you can even read the files under podman unshare, but the cache action doesn’t know this and can’t be configured to do so.

I had a stupidly hard time trying to work around this. By far the best approach I’ve seen sets the suid bit on tar so it runs with elevated permissions when the script for the cache action runs it.
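
That workaround amounts to something like the following, which makes tar setuid root system-wide; about as gross as it sounds, but it gets the cache action’s tar past the unreadable files.

# let tar run as root so the cache action can archive Podman's unreadable files
sudo chmod u+s "$(command -v tar)"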

The second problem is that caching this is slow. I’m not sure if this is because of how Podman makes layers or if it’s because node_modules is lots of tiny files. But creating or extracting with tar is too slow for it to be worth it.

Not that it was a problem with tar; I also tried copying layers using skopeo and that took about as long as building the images without a cache.

I’m starting to think that once a JavaScript project has tens of direct dependencies, it’ll have thousands of implicit dependencies, and hundreds of thousands of files. (This is at least the case with the two recent examples I’ve had experience with.) And doing anything with all of them is not gonna be real super speedy.

lvm-cache-friend

This workflow is running in a system with its own mount namespace, where changes are not persisted between jobs because they’re written to a tmpfs that gets thrown out after the job finishes.[*]

A straightforward way to persist state, when a system is running on a tmpfs, is to bind mount something from the host that will outlive the tmpfs. Like volumes for containers.

The cache action on GitHub has an extra little feature where caches are saved under a key and can be looked up later with a kind of glob match. This way, multiple caches can exist with different keys. Instead of having just one cache shared between each run like with a simple bind mount to one place on the host.

This is an example from the GitHub documentation:

restore-keys: |
  npm-feature-d5ea0750
  npm-feature-
  npm-

The restore key npm-feature- matches any key that starts with the string npm-feature-. For example, both of the keys npm-feature-fd3052de and npm-feature-a9b253ff match the restore key. The cache with the most recent creation date would be used. The keys in this example are searched in the following order:

  1. npm-feature-d5ea0750 matches a specific hash.

  2. npm-feature- matches cache keys prefixed with npm-feature-.

  3. npm- matches any keys prefixed with npm-.

For this off brand cache, I want a similar thing. This “cache” is just a bind mount from the host. But we can mount different paths from the host and even choose the right one using something like those “keys” requested in the workflow.

One detail of the cache action, GitHub explains, is that caches are implicitly scoped to branches.

Workflow runs can restore caches created in either the current branch or the default branch (usually main). If a workflow run is triggered for a pull request, it can also restore caches created in the base branch, including base branches of forked repositories.

I didn’t implement that because it seems complicated. Like the key used in the workflow is not a single source of truth about what cache is used. It sounds useful to look for caches in relevant branches, but I can’t think of a good reason to not make that explicit in the keys.


So we want the runner to be able to annotate caches – with its git branch name or a hash of a .lock file or something – and look for caches with similar annotations so it can use them if they’re there.

And it’d be nice if the usage in the workflow was similar to the cache action. So it’s simple and familiar. The information it mostly needs is:

  • some criteria to look up what cache to use

  • the path to load the cache in the runner

  • and the annotations for the new cache being saved.

The service uses thin LVM snapshots because I wanted to try them and see how they worked and because it seems like a good way to get a kind of de-duplication at the block layer or something.

When the workflow asks for a cache, we look for a logical volume that best matches the annotations that the workflow requested, we snapshot it to make a new logical volume, and mount that volume into the runner.

In LVM, logical volumes can have tags. We can shove our annotations into those tags to use LVM as a kind of database instead of keeping our own state.
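
Adding an annotation is just adding a tag to the logical volume, something like this (the volume name is from the listing below):

# tag a volume so it can be found by this annotation later; --deltag removes it
lvchange --addtag friend:cache:linux banana/friend-Y_v8SbU5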

Here’s an example where annotations from the workflow are namespaced with a “friend:cache:” prefix.

$ lvs \
    --sort -lv_time \
    -o vg_name,lv_name,tags,origin \
    --select lv_tags=friend:snapshot \
    --reportformat=json \
    | jq '.report[].lv | .[].lv_tags |= split(",")'
[
  {
    "vg_name": "banana",
    "lv_name": "friend-Y_v--tIL",
    "lv_tags": [
      "friend:cache:dc5e8c5be193d4952bceae567213b9007f5a9fa5bfd85c1dd9a8a4a6cb759422",
      "friend:cache:linux",
      "friend:snapshot"
    ],
    "origin": "friend-Y_v8SbU5"
  },
  {
    "vg_name": "banana",
    "lv_name": "friend-Y_v8SbU5",
    "lv_tags": [
      "friend:cache:dc5e8c5be193d4952bceae567213b9007f5a9fa5bfd85c1dd9a8a4a6cb759422",
      "friend:cache:linux",
      "friend:snapshot"
    ],
    "origin": "friend-Y_v7HD25"
  },
  {
    "vg_name": "banana",
    "lv_name": "friend-Y_v7HD25",
    "lv_tags": [
      "friend:cache:dc5e8c5be193d4952bceae567213b9007f5a9fa5bfd85c1dd9a8a4a6cb759422",
      "friend:cache:linux",
      "friend:snapshot"
    ],
    "origin": "friend-default"
  }
]

Though, that’s kind of a crummy example because the tags for each volume are the same.

Anyway, because the annotations are stored in a list, it seemed easier to support looking up caches based on exact matches of multiple terms in any order rather than a prefix match that GitHub’s action uses with its “key” thing.

Earlier in this post, there’s an example from the GitHub documentation using the restore key npm-feature-. Instead of looking for snapshots prefixed with npm-feature-, we would instead write npm feature to mean two tags (“npm” and “feature”) that are both required and can occur in any order. I think it’s a bit simpler and maps nicer to the LVM tags; but it’s a difference from the syntax that the cache action uses.

Here’s an example of asking for a new snapshot.

ncat -U /run/lvm-cache-friend/socket <<EOF
mount /home/ghrunner/cache
> linux dc5e8c
> linux
< linux dc5e8c
EOF

The mount line gives the path to mount the cache at inside the runner. The two lines starting with > say to use (and snapshot) an existing volume that has both tags “linux” and “dc5e8c”, or failing that, one with just “linux”. Those criteria are evaluated in order and, once one matches an existing volume, all subsequent criteria are ignored.

The last line just specifies what annotations to give to the new snapshot. Annotations are individual terms delimited with whitespace and order doesn’t matter. Again, this is a bit different from the GitHub action which does the sort of prefix match thing.

Here’s a usage example in a GitHub workflow as a step in a job. This includes the current branch name to replicate some of the implicit branch name stuff that the cache action uses.

- run: |
    ncat -U /run/lvm-cache-friend/socket <<EOF
    mount /home/ghrunner/cache
    > linux ${{ hashFiles('Cargo.lock','Cargo.toml') }}
    > linux ${{ github.ref_name }}
    > linux
    < linux ${{ github.ref_name }} ${{ hashFiles('Cargo.lock','Cargo.toml') }}
    EOF

ncat turns out to be a nice program for this because, by default, it will wait for the server to disconnect before exiting. The server will create a snapshot and mount it before terminating the connection. This way, the job does not continue until the mount has been set up (or at least attempted). Although, a success code is not sent, so the job will try to continue even if the mount was not successful for some reason.


Mounting the snapshot into the runner’s mount namespace turned out to be kind of a thing. Mostly, I just copied how systemd does it for its machinectl bind command.

There’s a silly detail where, in the runner, we want to move the mount to the location of the workflow’s choosing, but we can only move the mount if it’s under a private mount, not a shared mount. If we try, we might see an error saying something like: “moving a mount residing under a shared mount is unsupported”. But the host can’t mount volumes into a private mount in the runner; I guess because it’s private. So the host needs to mount the volume inside a shared mount and then the runner makes the shared mount private and moves the volume.

Also, the host does this each time the workflow gets a cache. But I don’t think it can reshare/unprivate the private mount, so it makes a new shared mount each time. And that new shared mount has to be in an existing pre-arranged bind mount in the runner. The directory containing the unix socket for making requests is already bind mounted into the runner, so it can just use that and make those temporary shared mounts in there next to the socket.

It’s probably worth noting that we only need to do this because workflows can request caches to be mounted at arbitrary locations. And this is kind of a sketchy idea anyway because it lets the runner mount something to anywhere in its machine even if its user doesn’t otherwise have permission. If the caches were mounted to just one pre-arranged location under a shared mount, we wouldn’t need to move them and it’d be a bit simpler and safer. But it’s a good thing I didn’t do that, otherwise you would have missed out on reading these three paragraphs of an obscure mount behaviour that will never matter or be useful to you in the future.
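
If it helps, the dance goes very roughly like this, glossing over propagation details. The directory names are made up, and the real script does this with mount syscalls rather than the mount command.

# on the host: make a fresh shared mount next to the socket and mount the
# snapshot inside it so it propagates into the runner's mount namespace
mkdir -p /run/lvm-cache-friend/incoming.1
mount -t tmpfs tmpfs /run/lvm-cache-friend/incoming.1
mount --make-shared /run/lvm-cache-friend/incoming.1
mkdir /run/lvm-cache-friend/incoming.1/volume
mount -o discard /dev/banana/friend-Y_v--tIL /run/lvm-cache-friend/incoming.1/volume

# in the runner's mount namespace (nsenter --mount with the machine's leader
# pid, say): the volume can't be moved while its parent mount is shared, so
# make the parent private first, then move the volume where the workflow asked
mount --make-private /run/lvm-cache-friend/incoming.1
mount --move /run/lvm-cache-friend/incoming.1/volume /home/ghrunner/cache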

self hosting GitHub’s runner in systemd-nspawn

I ended up using systemd-nspawn to start a machine that runs the GitHub runner as a service. systemd-nspawn lets us have network and user namespaces so that GitHub’s software is relatively isolated. And it can be configured so workflows can build and run container images with Podman.

I tried to just run the runner under Podman (instead of systemd-nspawn) but had a lot of issues with user namespaces and with getting Podman to run in workflows when the GitHub runner itself was running under Podman. There is a container image for Podman that demonstrates how to run Podman under Podman, but it seemed overall way more annoying to troubleshoot than systemd-nspawn.

This is kind of self-inflicted because I wanted to run the GitHub runner in a user namespace where root on the runner is an unprivileged user on the host. This would have been simpler if I hadn’t wanted that but I was stubborn over the thought that having nested user namespaces really shouldn’t be that hard.

The systemd-nspawn machine running the GitHub runner uses some options to prevent persisting state between reboots. By shutting down and restarting the machine after the runner does one job, we can ensure that jobs are run with clean and predictable state.

The service file for the GitHub runner service in the machine is part of how all this works.

[Unit]
...
OnSuccess=poweroff.target

[Service]
...
ExecStartPre=/home/ghrunner/get-token-and-config.sh \
    --unattended \
    --disableupdate \
    --ephemeral \
    --name dingus \
    --replace
ExecStart=/home/ghrunner/bin/runsvc.sh
...

Before running a GitHub runner, you’ll need to run a config.sh script to configure it with a runner token and the repository to get workflows from. We’re doing that in ExecStartPre via a helper script called get-token-and-config.sh which uses GitHub’s API to get a runner token and then calls config.sh, passing along the other arguments given, like --ephemeral in this case, which ensures the runner only does one job before stopping.

The bearer authorization for GitHub’s API (and the repository URL) is hard-coded; written directly into that helper script and readable by workflows. It’s not great. But this proof of concept already ended up being way more effort than I intended, so I’m perfectly content with it as it is and with saying that this part, which needs improvement, is left as an exercise for the reader.
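
For what it’s worth, the interesting part of a script like that might look roughly like this; the paths and variable names here are assumptions, not the actual script.

# ask GitHub for a short-lived runner registration token, then configure the
# runner with it (the bearer token and repository are hard-coded, as admitted above)
API_TOKEN=ghp_...
REPO=OWNER/REPO

RUNNER_TOKEN=$(curl -s -X POST \
    -H "Authorization: Bearer ${API_TOKEN}" \
    -H "Accept: application/vnd.github+json" \
    "https://api.github.com/repos/${REPO}/actions/runners/registration-token" \
    | jq -r .token)

exec /home/ghrunner/config.sh \
    --url "https://github.com/${REPO}" \
    --token "${RUNNER_TOKEN}" \
    "$@"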

ExecStart runs the runner normally using a script from GitHub’s tarbomb. Since we’re using --ephemeral, runsvc.sh quits after one job is run. Then, OnSuccess ensures that when the service succeeds – when runsvc.sh quits after a job is done – the machine shuts down and any temporary state is released.

On the host, we can configure this machine to restart when it shuts down by adding an override for its systemd-nspawn@.service configuration, described below. The reason the machine uses poweroff.target instead of reboot.target for the OnSuccess is that reboot.target seemed to prevent the system from stopping at all. Every time it would go to stop the GitHub runner service, in order to shut down the system, the OnSuccess would be evaluated and it would start to reboot instead. Kinda funny when you think about it. Anyway, doing the restart in the service on the host is probably cleaner in terms of a separation of concerns or whatever.

I used a Dockerfile and podman build to prepare the rootfs for the systemd-nspawn machine. You can view the Dockerfile on GitHub.

  1. Build the image.

    podman build . --tag github-runner
  2. Create a container from the image and pipe an export of the rootfs to machinectl.[§]

    podman export $(podman create github-runner) \
      | machinectl import-tar - github-runner

    machinectl should extract the rootfs to /var/lib/machines/github-runner.

    You ought to be able to create a tar from a Dockerfile without podman create by just doing podman build --output type=tar, but it seems like there’s a bug where suid bits aren’t set and this messes up some things like newuidmap.

  3. The systemd-nspawn machine runs in a user namespace where root in the machine maps to an unprivileged user on the host.

    For this machine, I’ve used --private-users-ownership=chown to modify ownership of the rootfs to use a new range of unprivileged users. This preserves ownership relatively within the machine. Whatever uid N is mapped on the host to root in the machine, the first user in the machine (with uid 1000) will have its files owned by N + 1000 on the host.

    systemd-nspawn -M github-runner \
        --private-users-ownership=chown \
        --private-users=pick \
        /bin/true

    This lets systemd-nspawn choose a uid itself. Or you can put your own value in there explicitly. It should output something like:

    Selected user namespace base 570097664 and range 65536.

    Now, /var/lib/machines/github-runner should be owned by 570097664:570097664 and /var/lib/machines/github-runner/home/ghrunner owned by 570098664:570098664.

    systemd-nspawn apparently also supports doing this map at runtime without modifying the files on disk, but I don’t think it works right with some of the options mentioned in the next step.

  4. The /etc/systemd/nspawn/github-runner.nspawn file lets us set defaults for our machine when systemd-nspawn boots it.

    It’s important this file is created after the chown above. Otherwise, that command will use these options and the [Files] section interferes with the chown. If you need to redo the chown, comment out that section temporarily.

    [Exec]
    PrivateUsers=570097664:131072
    ResolvConf=replace-host
    LinkJournal=try-host
    
    [Files]
    ReadOnly=true
    Volatile=overlay
    BindReadOnly=/run/lvm-cache-friend

    BindReadOnly is the directory containing the socket to lvm-cache-friend, the program that creates LVM snapshots and mounts them into the namespace.

    Volatile will overlay a temporary filesystem over the machine when it runs. Changes are made to that temporary filesystem and not persisted.

    PrivateUsers contains a range, twice as large as the 2^16 default. This is so that the container can allocate an entire 2^16 to Podman as part of running workflows.

    Part of the Dockerfile was to delegate subordinate users and groups to allow the GitHub runner user (in the machine) to map the uid range (65536-131072] from the machine into a user namespace for Podman. On the host, that will be the last half of the 131072 length range. (There’s a sketch of what that delegation might look like after this list.)

    This way, workflows that run Podman can create their own user namespace by using the second 2^16 that we give from the host to the machine. That’s why the host uid range is 131072; for those two 2^16 chunks.

    host:     570097664 ............ 570163200 ............ 570228736
    machine:          0 ............     65536 ............    131072
                                         (this 2^16 is delegated to podman)

  5. There is also an override for the service that runs systemd-nspawn for this machine. (The configuration above was for systemd-nspawn itself. The configuration below appends to the systemd service that runs systemd-nspawn.)

    It’s easiest to create and modify this with systemctl edit systemd-nspawn@github-runner. (It will end up creating / modifying a file probably at /etc/systemd/system/systemd-nspawn@github-runner.service.d/override.conf.)

    [Service]
    Restart=always
    Environment=SYSTEMD_SECCOMP=0

    The Restart setting here is sort of the final piece of the puzzle for stateless self hosted GitHub runners.

    1. The --ephemeral option to config.sh ensures only one job is run before the runner service quits.

    2. OnSuccess=poweroff.target for the service in the machine will shut down the machine when that service quits.

    3. Restart=always in systemd-nspawn@github-runner.service on the host ensures that the machine will boot up after shutting down.

    4. Volatile=overlay makes sure changes in the machine are made to a temporary file system and not persisted between boots.

    Restart=always only restarts the machine when the service stops on its own. If you stop systemd-nspawn by deactivating the service with systemctl stop systemd-nspawn@github-runner.service, then it won’t restart the machine. This is what we want.

    That configuration also sets the SYSTEMD_SECCOMP=0 environment variable. Without it, something complains about keyrings and doesn’t work. I don’t understand any of what’s going on there. There might be a more responsible way to fix that. And I’m sure I’ll get around to figuring it out right after I stop disabling selinux.
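
And, for reference, the subordinate uid/gid delegation mentioned in step 4 might look something like this inside the machine; the exact entries are an assumption based on the ranges above.

# /etc/subuid and /etc/subgid in the machine: let ghrunner map machine uids
# 65536 through 131071 into the user namespace Podman creates for workflows
ghrunner:65536:65536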

Those steps set up the machine. It should be bootable with …

systemctl start systemd-nspawn@github-runner.service

… or start it and set it to start when the host boots with …

systemctl enable --now systemd-nspawn@github-runner.service

There’s just a bit of LVM setup and then also running the service that makes and mounts LVM snapshots on the host; which was kinda the entire point of this thing to begin with.

LVM

For the cache, we need an LVM volume for the service to use as a default to make snapshots from.

This makes a thin pool in an existing volume group called banana.

lvcreate \
    --type thin-pool \
    --size 30G \
    banana/friend-pool

And this creates a thin volume in that pool.

lvcreate \
    --thinpool banana/friend-pool \
    --virtualsize 3G \
    --name friend-default \
    --addtag friend:default

The service identifies this as the default volume for snapshots by the tag friend:default (the volume name and group don’t matter). It will use that volume if no other snapshot exists as a suitable choice.

--virtualsize seems to prevent the volume from growing past that size. But I’m not sure what setting --size on the pool does. It seems you can still over-provision it with thin volumes. So you can create more than ten 3G volumes in a 30G pool, but it seems there is still a 30G limit on the actual usage of all combined volumes. So if some IO to a volume would cause the pool to use more than its 30G limit, then that IO fails with some error. At least in my limited testing.
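
To see how full the pool actually is, something like this seems to do the trick:

# data_percent shows how much of the pool (and of each thin volume) is used
lvs -o lv_name,lv_size,data_percent,metadata_percent banana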

Something else that might be worth looking into is the --discards option to lvcreate. There are some details around thin volumes actually reporting to use less data when files are deleted. Our service for creating and mounting snapshots will mount volumes with the discard option by default. So far, that seems to be good enough. But there might be surprises I’m not aware of.

If it’s not super clear yet; I didn’t end up looking at LVM that much as part of this project. It was mostly struggling with stupid things in GitHub and Podman. There’s a whole lvmthin.7 man page that I haven’t read. And I don’t want to either because then I’ll realize that I did everything wrong and I’ll have to start over.[#]

After creating our default volume, we want to format it so we can mount it; I used xfs.

mkfs.xfs /dev/banana/friend-default

Then mount it so we can set the owner to the ghrunner user in the machine so the runner can use the cache. That user’s uid is 1000. So we’ll want to use 1000 plus whatever was selected for the user namespace of the machine. In the last section, the machine is using 570097664, so the ghrunner user will be 570098664.

mount /dev/banana/friend-default /mnt
chown 570098664:570098664 /mnt
umount /mnt

lvm-cache-friend.service

The program that makes and mounts snapshots is a Python script that depends only on the standard library and some LVM commands being in your PATH.

My service file is really simple.

[Service]
Type=simple
ExecStart=/usr/local/bin/lvm-cache-friend.py
Restart=on-failure

[Install]
WantedBy=multi-user.target

It even has Type=simple. So you know it’s not very complicated.

The runner communicates with the service through a unix socket at /run/lvm-cache-friend/socket. The parent directory /run/lvm-cache-friend is bind mounted into the runner’s machine. It should exist before either this service or the machine starts. Since /run is usually a temporary filesystem, systemd is often configured to set up these paths at boot. I haven’t tested this, but I think systemd will do this for us if we add a file somewhere under /etc/tmpfiles.d containing something like:

d /run/lvm-cache-friend 0755 root root -

The script has some options. By default it will add the tag friend:snapshot to every snapshot it makes. And, for namespacing, prefix any tags that peers add or search for with friend:cache:.

But otherwise, that’s it.

Start the service with:

systemctl enable --now lvm-cache-friend

And if you view the logs with journalctl -efu lvm-cache-friend or something you should hopefully be able to confirm the default volume and socket path being used.

info default lv banana friend-default
info listening on /run/lvm-cache-friend/socket

quick summary

The GitHub workflow contains something like this:

jobs:
  meme:
    runs-on: self-hosted
    env:
      XDG_DATA_HOME: /home/ghrunner/cache/xdg
    steps:
      - uses: actions/checkout@v3
      - run: |
          ncat -U /run/lvm-cache-friend/socket <<EOF
          mount /home/ghrunner/cache
          > linux ${{ hashFiles('Cargo.lock','Cargo.toml') }}
          > linux ${{ github.ref_name }}
          < linux ${{ github.ref_name }} ${{ hashFiles('Cargo.lock','Cargo.toml') }}
          EOF
      - run: podman build .

This connects to the lvm-cache-friend service on the host over a unix socket and asks it to create an LVM snapshot and mount it at /home/ghrunner/cache on the machine.

Setting the XDG_DATA_HOME environment variable tells Podman to use a path somewhere in the mounted volume for container storage instead of ~/.local/share/containers/storage.
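
One way to check that Podman is actually putting container storage in the mounted cache: with XDG_DATA_HOME set as above, I’d expect something like this from inside a job.

$ podman info --format '{{ .Store.GraphRoot }}'
/home/ghrunner/cache/xdg/containers/storage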

Running podman build after making a change to the source code, we see the runner use the cache part of the way; up until copying the changed code at step five.

STEP 1/7: FROM registry.suse.com/bci/rust:1.66 AS build
STEP 2/7: WORKDIR /graph-do-smell
--> Using cache 04e88ce4a178ccb9b25c0c9ccb362e283083fe94e98ae19dcc0ade8896753a02
--> 04e88ce4a17
STEP 3/7: COPY Cargo.toml Cargo.lock .
--> Using cache c150b0fdb80a8537f513da3178453130692ff19e5dbb94dd33765d9cd323effe
--> c150b0fdb80
STEP 4/7: RUN cargo fetch --locked --target x86_64-unknown-linux-gnu
--> Using cache c81843da6bc54b4781629c28a62b8fdc64d1997cb952b6b31275c8d0ee1a211f
--> c81843da6bc
STEP 5/7: COPY . .
--> 5848547a2fc
STEP 6/7: RUN cargo build --locked --release --offline
   Compiling proc-macro2 v1.0.51
   Compiling unicode-ident v1.0.6
   Compiling quote v1.0.23
   Compiling syn v1.0.109
...

Pretty freakin’ neato.


graph-do-smell

I made a repository on GitHub for this at github.com/sqwishy/graph-do-smell. The Dockerfile and the Python script in the misc/ directory show a few more details about building the GitHub runner machine and running the lvm-cache-friend service.

When I started this, I wanted an uncomplicated program to test this with so I picked a weird Rust thing I made a long time ago but never did anything with. All it does is fetch and parse a web page and provide a GraphQL “API” to the HTML document.

For example, getting the list of titles and links from the front page of “hacker” “news”.

cargo run -- \
    '{
       get(url: "https://news.ycombinator.com/")
       {
         select(select: ".athing .titleline > a")
         { text href }
       }
     }' \
     < /dev/null | jq .get.select
 [
   {
     "text": "Copyright Registration Guidance: Works containing material generated by AI",
     "href": "https://www.federalregister.gov/documents/2023/03/16/2023-05321/copyright-registration-guidance-works-containing-material-generated-by-artificial-intelligence"
   },
   {
     "text": "Show HN: GPT Repo Loader – load entire code repos into GPT prompts",
     "href": "https://github.com/mpoon/gpt-repository-loader"
   },
   {
     "text": "Transformers.js",
     "href": "https://xenova.github.io/transformers.js/"
   },
   {
     "text": "A token-smuggling jailbreak for ChatGPT-4",
     "href": "https://twitter.com/alexalbert__/status/1636488551817965568"
   },
   {
     "text": "Show HN: Alpaca.cpp – Run an Instruction-Tuned Chat-Style LLM on a MacBook",
     "href": "https://github.com/antimatter15/alpaca.cpp"
   },
   {
     "text": "Web Stable Diffusion",
     "href": "https://github.com/mlc-ai/web-stable-diffusion"
   },
   ...

Or the items from the index of froghat.ca.

cargo run -- \
   '{
      get(url: "https://froghat.ca")
      {
        select(select: "li:not(.delimiter)")
        {
          title: select(select: "a.title") { text href }
          time: select(select: "time") { datetime: attr(attr: "datetime") }
        }
      }
    }' < /dev/null | jq .get.select
[
  {
    "title": [
      {
        "text": "Why is it four clicks to view GitHub workflow logs?",
        "href": "https://froghat.ca/2023/02/github-clicks"
      }
    ],
    "time": [
      {
        "datetime": "2023-02-21"
      }
    ]
  },
  {
    "title": [
      {
        "text": "Think Helvetica",
        "href": "https://froghat.ca/2023/02/think-helvetica"
      }
    ],
    "time": [
      {
        "datetime": "2023-02-06"
      }
    ]
  },
  {
    "title": [
      {
        "text": "CEO Robrick-Patbert Froghat’s email to froghat.ca employees",
        "href": "https://froghat.ca/2022/11/to-froghat.ca-employees"
      }
    ],
    "time": [
      {
        "datetime": "2022-11-30"
      }
    ]
  },
  ...

So the workflow for graph-do-smell builds this Rust program in Podman on my self-hosted runner using the funny cache.

Anyway, I had imagined this post to be much shorter. Just featuring snapshots of thin volumes with LVM and using it from a workflow. There ended up being some surprises, like troubleshooting Podman and figuring out the options for the GitHub runner and setting it up to re-register itself every single time it starts up after doing one single job.

After all that, I don’t even know how to end this post. I can inventory the things and explain them and how they’re barely held together; but I can’t convey the experience of it working over time – which would be kind of the neat part. And every time I read this, my brain needs a shower. Maybe it’s trying to remind me that computers are for looking at cat pictures. I guess I can pretend like this unsatisfying conclusion is some poetic parallel to the described experience itself.

On the bright side, the GitHub token this thing is using will expire in a few days and the actual literal daemon will stop working and disappear in time like all things.