I used to use this as a place to write long-form thoughts, but with the demise of Twitter I want to also write short-form posts here under the #shorts tag. I also post on Mastodon but that’s a little more ephemeral and less searchable than this blog.

Let’s say you’re creating a directory of cached artifacts to share across several tests. It’s a bit more ephemeral than, say, the Cargo target/ directory, so your first idea might be to put it in the system temporary directory. Also, let’s say that you want to reuse the directory across a span of several days or more (perhaps by hashing a set of inputs and using that hash to name the directory).

If your artifacts span multiple files, do not do this. The reason is that some operating systems and environments have temporary file cleaners, also known as tempfile reapers. macOS ships one by default, and some other environments also have a cleaner like tmpreaper configured. These reapers may end up deleting some of your temporary files but not others, leading to an inconsistent cache with part of it missing.

What can you do instead?

The easiest alternative is to store artifacts in a single file rather than a directory. That way, either the file exists or it got cleaned up—there’s no in-between, inconsistent state.

  • If you can get away with not extracting files and all you need is random access, a zip file works well.
  • For sequential access or if you need to extract files, a (possibly compressed) tarball is a great option.
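Here is a rough sketch of the single-file idea using only the standard library. It assumes the archive bytes were already produced elsewhere (for example by a tar or zip library, which is not shown), and the function names and the ".partial" staging extension are made up for illustration. Writing to a staging name and renaming into place is my own addition: on most platforms a rename within one filesystem replaces the target in a single step, so readers see either the whole archive or nothing.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Writes `archive_bytes` to a sibling staging file, then renames it into
/// place so that a reader never observes a half-written archive.
/// (Hypothetical helper; the ".partial" extension is an arbitrary choice.)
fn publish_archive(final_path: &Path, archive_bytes: &[u8]) -> io::Result<()> {
    let staging = final_path.with_extension("partial");
    fs::write(&staging, archive_bytes)?;
    fs::rename(&staging, final_path)
}

/// Returns the archive contents, or `None` if the file was never created
/// or has since been cleaned up by a tempfile reaper.
fn load_archive(final_path: &Path) -> io::Result<Option<Vec<u8>>> {
    match fs::read(final_path) {
        Ok(bytes) => Ok(Some(bytes)),
        Err(err) if err.kind() == io::ErrorKind::NotFound => Ok(None),
        Err(err) => Err(err),
    }
}
```

A caller would treat `None` as a cache miss, rebuild the artifacts, and call `publish_archive` again.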

Another option is to record the list of files in a manifest, and invalidate the cache if any of those files are missing.
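A minimal sketch of the manifest approach, assuming a plain-text manifest with one relative path per line. The file name `manifest.txt` and both function names are hypothetical. The manifest should be written last, after every artifact is in place, so that its presence implies the cache was fully populated at some point.

```rust
use std::fs;
use std::io;
use std::path::Path;

// Hypothetical manifest file name, chosen for this sketch.
const MANIFEST_NAME: &str = "manifest.txt";

/// Records the relative paths of all artifacts, one per line.
/// Call this only after every artifact has been written.
fn write_manifest(cache_dir: &Path, files: &[&str]) -> io::Result<()> {
    fs::write(cache_dir.join(MANIFEST_NAME), files.join("\n"))
}

/// Returns true only if the manifest and every file it lists are present.
fn cache_is_complete(cache_dir: &Path) -> io::Result<bool> {
    let manifest = match fs::read_to_string(cache_dir.join(MANIFEST_NAME)) {
        Ok(contents) => contents,
        // If the manifest itself was reaped, the cache cannot be trusted.
        Err(err) if err.kind() == io::ErrorKind::NotFound => return Ok(false),
        Err(err) => return Err(err),
    };
    Ok(manifest
        .lines()
        .filter(|line| !line.is_empty())
        .all(|rel| cache_dir.join(rel).is_file()))
}
```

If `cache_is_complete` returns false, the simplest response is to delete the directory and rebuild it from scratch. Note that this only checks presence at one moment; a reaper could still remove a file later, so it is worth running the check each time the cache is used.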

A third option is to store your cache somewhere other than the system temporary directory. But then you need a way to evict old cache entries.
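One possible eviction policy, sketched under the assumption that each cache entry is a top-level file or directory under a cache root and that "old" means its modification time is older than some maximum age. The function name is hypothetical, and the current time is passed in as a parameter mainly to make the function easy to test.

```rust
use std::fs;
use std::io;
use std::path::Path;
use std::time::{Duration, SystemTime};

/// Removes top-level entries of `cache_root` whose modification time is more
/// than `max_age` before `now`. Returns the number of entries removed.
fn evict_old_entries(cache_root: &Path, max_age: Duration, now: SystemTime) -> io::Result<usize> {
    let mut removed = 0;
    for entry in fs::read_dir(cache_root)? {
        let entry = entry?;
        let metadata = entry.metadata()?;
        let modified = metadata.modified()?;
        // A modification time in the future makes duration_since fail;
        // treat such entries as fresh rather than deleting them.
        let age = now.duration_since(modified).unwrap_or(Duration::ZERO);
        if age > max_age {
            if metadata.is_dir() {
                fs::remove_dir_all(entry.path())?;
            } else {
                fs::remove_file(entry.path())?;
            }
            removed += 1;
        }
    }
    Ok(removed)
}
```

Keep in mind that a directory's modification time usually changes only when entries are added or removed inside it, not when its files are read, so this policy evicts by creation age rather than by last use. Whether that is acceptable depends on the cache.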

(Also, be aware that if your temporary directory is in a global location like /tmp, users can overwrite each other’s artifacts. You can avoid that with a umask and/or something like OpenOptionsExt::mode.)
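As a Unix-only sketch of the permissions point, the standard library's `std::os::unix::fs::OpenOptionsExt::mode` and `DirBuilderExt::mode` let you request restrictive modes at creation time. The helper names and the specific modes (0o700 for the directory, 0o600 for files) are my own choices for illustration; the process umask can clear further bits but never adds any.

```rust
use std::fs::{DirBuilder, File, OpenOptions};
use std::io;
use std::os::unix::fs::{DirBuilderExt, OpenOptionsExt};
use std::path::Path;

/// Creates a cache directory that only the current user can enter (0o700).
/// If the directory already exists, its permissions are left unchanged.
fn create_private_dir(path: &Path) -> io::Result<()> {
    DirBuilder::new().recursive(true).mode(0o700).create(path)
}

/// Creates a new file that only the current user can read or write (0o600).
/// `create_new` makes this fail if the path already exists, so a file
/// planted by another user is not silently reused.
fn create_private_file(path: &Path) -> io::Result<File> {
    OpenOptions::new()
        .write(true)
        .create_new(true)
        .mode(0o600)
        .open(path)
}
```

Because an existing directory keeps whatever permissions and owner it already has, it is still worth checking those before trusting a directory you did not just create.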