Dealing with tempfile cleaners
I used to use this as a place to write long-form thoughts, but with the demise of Twitter I want to also write short-form posts here under the #shorts tag. I also post on Mastodon but that’s a little more ephemeral and less searchable than this blog.
Let’s say you’re creating a temporary directory with some cached artifacts to share across several tests. It’s a bit more ephemeral than, say, the Cargo target/ directory, so your initial idea might be to just put your artifacts in the system temporary directory. Also, let’s say that you want to reuse the directory across a span of several days or more (perhaps by hashing a set of inputs and using that to name the directory).
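Here is a minimal sketch of the hashing idea in Rust. It assumes the cache key is a list of strings (for example, a toolchain version and a lockfile hash). The cache_dir_for helper, the prefix, and the choice of 64-bit FNV-1a are illustrative assumptions, not something the approach requires. FNV-1a is used because its output is fixed by its definition, whereas std::collections::hash_map::DefaultHasher is documented as not being stable across Rust releases, which would silently rename the cache after a toolchain upgrade.

```rust
use std::path::PathBuf;

/// 64-bit FNV-1a (illustrative choice): a tiny hash whose output never
/// changes between Rust releases, unlike std's DefaultHasher.
fn fnv1a_64(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf2_9ce4_8422_2325; // FNV-1a 64-bit offset basis
    for &b in bytes {
        hash ^= u64::from(b);
        hash = hash.wrapping_mul(0x0000_0100_0000_01b3); // FNV-1a 64-bit prime
    }
    hash
}

/// Hypothetical helper: derives a cache directory under the system temp dir
/// from a set of inputs, so the same inputs map to the same directory.
fn cache_dir_for(prefix: &str, inputs: &[&str]) -> PathBuf {
    let mut buf = Vec::new();
    for input in inputs {
        // Length-prefix each input so ["ab", "c"] and ["a", "bc"] differ.
        buf.extend_from_slice(&(input.len() as u64).to_le_bytes());
        buf.extend_from_slice(input.as_bytes());
    }
    std::env::temp_dir().join(format!("{prefix}-{:016x}", fnv1a_64(&buf)))
}
```

The hashed name is still useful with any of the storage options discussed in this post; only what you put under that name changes.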
If your artifacts span multiple files, don’t store them as loose files in the temporary directory. Some operating systems and environments run temporary file cleaners, also known as tempfile reapers: macOS ships one by default, and other environments may have a cleaner like tmpreaper configured. Reapers generally decide file by file (for example, based on how long it has been since each file was last accessed), so they may end up deleting some of your temporary files but not others, leaving an inconsistent cache with part of it missing.
What can you do instead?
The easiest alternative is to store artifacts in a single file rather than a directory. That way, either the whole file exists or it has been cleaned up; there’s no inconsistent in-between state.
- If you can get away with not extracting files and all you need is random access, a zip file works well.
- For sequential access or if you need to extract files, a (possibly compressed) tarball is a great option.
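A single file only gives you that all-or-nothing property if it also appears all at once, rather than being observed half-written. One common way to arrange that, sketched here with only the standard library, is to write to a temporary name in the same directory and then rename it into place; on POSIX systems, a rename within one filesystem is atomic. The archive bytes are assumed to come from whatever zip or tar crate you use, and the function names and temporary-name scheme are hypothetical.

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::Path;

/// Hypothetical helper: writes `bytes` (e.g. an already-built archive) to
/// `dest` via a sibling temp file plus rename, so readers see either no
/// file or the complete one (rename is atomic on POSIX within a filesystem).
fn write_cache_file_atomically(dest: &Path, bytes: &[u8]) -> io::Result<()> {
    // The temp file must live next to `dest` so the rename stays on one
    // filesystem; the process ID keeps concurrent writers apart.
    let tmp = dest.with_extension(format!("tmp-{}", std::process::id()));
    let mut file = File::create(&tmp)?;
    file.write_all(bytes)?;
    file.sync_all()?; // flush contents before publishing the name
    fs::rename(&tmp, dest)
}

/// Hypothetical helper: returns the cached archive if present, or None if
/// it was never written or a reaper removed it.
fn read_cache_file(path: &Path) -> io::Result<Option<Vec<u8>>> {
    match fs::read(path) {
        Ok(bytes) => Ok(Some(bytes)),
        Err(e) if e.kind() == io::ErrorKind::NotFound => Ok(None),
        Err(e) => Err(e),
    }
}
```

If two processes race to fill the same entry, each writes its own temporary file and the last rename wins; either way, readers only ever see a complete file.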
Another option is to record the list of files in a manifest, and invalidate the cache if any of those files are missing.
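A minimal sketch of the manifest approach, assuming a hypothetical manifest.txt inside the cache directory with one relative path per line. The manifest is written last, so its presence means the entry was fully populated. A missing manifest is treated the same as a missing artifact, since a reaper can delete it too. All names here are illustrative.

```rust
use std::fs;
use std::io;
use std::path::Path;

// Hypothetical manifest file name; one relative path per line.
const MANIFEST: &str = "manifest.txt";

/// Records the files that make up a complete cache entry.
/// Call this only after every artifact has been written.
fn write_manifest(cache_dir: &Path, files: &[&str]) -> io::Result<()> {
    fs::write(cache_dir.join(MANIFEST), files.join("\n"))
}

/// True only if the manifest and every file it lists still exist.
fn cache_is_complete(cache_dir: &Path) -> bool {
    let Ok(manifest) = fs::read_to_string(cache_dir.join(MANIFEST)) else {
        return false; // no manifest: never finished, or reaped
    };
    manifest
        .lines()
        .filter(|line| !line.is_empty())
        .all(|rel| cache_dir.join(rel).is_file())
}

/// Returns Ok(true) if the cache is usable; otherwise deletes whatever is
/// left so the caller can rebuild from scratch, and returns Ok(false).
fn ensure_valid_or_clear(cache_dir: &Path) -> io::Result<bool> {
    if cache_is_complete(cache_dir) {
        return Ok(true);
    }
    match fs::remove_dir_all(cache_dir) {
        Ok(()) => Ok(false),
        Err(e) if e.kind() == io::ErrorKind::NotFound => Ok(false),
        Err(e) => Err(e),
    }
}
```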
A third option is to store your cache somewhere other than the system temporary directory. But then nothing cleans it up for you, so you need your own way to evict old cache entries.
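One simple eviction policy, shown here as an assumption rather than a recommendation, is to delete top-level cache entries whose modification time is older than some maximum age. The current time is passed in as a parameter so the policy is easy to test. Note that a directory’s mtime only changes when entries inside it are added or removed, so if you reuse entries over several days you may want to refresh their timestamps on each use. The function name and cache layout are hypothetical.

```rust
use std::fs;
use std::io;
use std::path::Path;
use std::time::{Duration, SystemTime};

/// Hypothetical helper: removes entries directly under `cache_root` whose
/// mtime is more than `max_age` before `now`. Returns how many were removed.
fn evict_older_than(cache_root: &Path, max_age: Duration, now: SystemTime) -> io::Result<usize> {
    let mut removed = 0;
    for entry in fs::read_dir(cache_root)? {
        let entry = entry?;
        let meta = entry.metadata()?;
        // An mtime in the future (clock skew) is treated as brand new.
        let age = now.duration_since(meta.modified()?).unwrap_or(Duration::ZERO);
        if age > max_age {
            if meta.is_dir() {
                fs::remove_dir_all(entry.path())?;
            } else {
                fs::remove_file(entry.path())?;
            }
            removed += 1;
        }
    }
    Ok(removed)
}
```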
(Also, be aware that if your temporary directory is in a shared location like /tmp, other users on the same machine may be able to overwrite your artifacts. You can avoid that with a restrictive umask and/or something like OpenOptionsExt::mode.)
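On Unix, the standard library exposes creation modes through DirBuilderExt::mode and OpenOptionsExt::mode. A minimal sketch with hypothetical helper names: the cache directory is created as 0o700 and each artifact as 0o600 using create_new, which refuses to open a path that already exists (including a symlink someone else planted there). The requested mode is still filtered through the process umask, which can only clear bits, never add them. One caveat: with recursive(true), a directory that already exists is accepted as-is, so this sketch does not check who created it.

```rust
use std::fs::{DirBuilder, File, OpenOptions};
use std::io;
use std::os::unix::fs::{DirBuilderExt, OpenOptionsExt};
use std::path::Path;

/// Hypothetical helper (Unix-only): creates a cache directory that only
/// the current user can enter, list, or write into.
fn create_private_dir(path: &Path) -> io::Result<()> {
    DirBuilder::new().recursive(true).mode(0o700).create(path)
}

/// Hypothetical helper (Unix-only): creates a new artifact file readable and
/// writable only by its owner. Fails if anything already exists at `path`.
fn create_private_file(path: &Path) -> io::Result<File> {
    OpenOptions::new()
        .write(true)
        .create_new(true)
        .mode(0o600)
        .open(path)
}
```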