so I’ve been thinking if maybe there was some way to make file systems more declarative.
2024-11-25 systemd exists, NixOS also exists… both enable you to specify what state your system should reach, rather than how it should be done.
well, right. actually, systemd does have an imperative element to it, because you have to tell it what command should be executed to get your process spinning. I’d argue that is still way better than a shell script, because it can restart failing services selectively without any extra work on your part.
my idea was basically to enable the programmer to specify an expected file system structure declaratively, like so:

    root = dir {
        crates = dir "crates",
        readme = file "README.md",
    }

and then you’d be able to access the readme file via `root.readme`, rather than by the usual fopen-fwrite-fclose interface.
a few weeks ago however, I had a different revelation: what if instead of interfacing with the underlying file system, we… build one? like… you know. a virtual file system?
I was familiar with the idea of creating an API exposing a virtual file system through LÖVE, which exposes bindings to PhysicsFS.
I know of at least one game that makes use of PhysicsFS outside of LÖVE, but I had never used it myself in a project.
a fractal of files
I started designing. I knew I at least needed the ability to enumerate and read files, so I needed at least these two functions:

    trait Dir {
        /// List all entries under the given path.
        fn dir(&self, path: &VPath) -> Vec<VPathBuf>;

        /// Return the byte content of the entry at the given path, or `None` if the path does not
        /// contain any content.
        fn content(&self, path: &VPath) -> Option<Vec<u8>>;
    }

this alone already gave me an insane amount of insight!
first of all, from the perspective of the program, do we really need to differentiate between directories and files?
compare this code for walking the file system:

    // `f` is taken as `&mut dyn FnMut` so the same callback can be handed down the recursion.
    fn walk_dir_rec(dir: &dyn Dir, path: &VPath, f: &mut dyn FnMut(&VPath)) {
        for entry in dir.dir(path) {
            f(&entry);
            walk_dir_rec(dir, &entry, f);
        }
    }

    fn process_all_png_files(dir: &dyn Dir) {
        walk_dir_rec(dir, VPath::ROOT, &mut |path| {
            if path.extension() == Some("png") {
                if let Some(content) = dir.content(path) {
                    // do stuff with the file
                }
            }
        });
    }

to this code, which has to differentiate between files and directories, because calling `dir` on a file or `content` on a directory is an error:

    // here `dir` returns entries that carry both a `path` and a `kind`.
    fn walk_dir_rec(dir: &dyn Dir, path: &VPath, f: &mut dyn FnMut(&DirEntry)) {
        for entry in dir.dir(path) {
            f(&entry);
            if entry.kind == DirEntryKind::Dir {
                walk_dir_rec(dir, &entry.path, f);
            }
        }
    }

    fn process_all_png_files(dir: &dyn Dir) {
        walk_dir_rec(dir, VPath::ROOT, &mut |entry| {
            if entry.path.extension() == Some("png") && entry.kind == DirEntryKind::File {
                if let Some(content) = dir.content(&entry.path) {
                    // do stuff with the file
                }
            }
        });
    }

to me, the logic seems a lot simpler in the former case, separating the concerns of walking the directory in `walk_dir_rec` from the concerns of reading the files in `process_all_png_files`!
this does not automatically mean it’s a good idea to design an operating system around this, but it’s interesting to think about the properties that emerge from removing the separation.
it may not even be the greatest idea to interface with the physical file system in this way, if the communication has to be bidirectional. since real world file systems separate files from directories, think about what happens if your program tries to write `content` to an entry which already has a `dir`.
second… this looks a lot like resource forks! so imagine that you can add even more metadata to file system entries, by adding more methods to this trait.
with these two functions, the ability to join paths, and remove their prefixes, this is enough to start building interesting things.
since we’d like our file system to be composable, we’ll need a composition operator first. I’m naming mine `MemDir`, because it represents an in-memory `dir` with entries. I’ll spare you the implementation details, but it acts more or less like a hash map:

    let mut dir = MemDir::new();
    dir.add(VPath::new("README.txt"), readme_txt);
    dir.add(VPath::new("src"), src);

get real
I ended up needing a few more resource forks to implement all the existing functionality.

    pub trait Dir: Debug {
        /// List all entries under the provided path.
        fn dir(&self, path: &VPath) -> Vec<DirEntry>;

        /// Return the byte content of the entry at the given path.
        fn content(&self, path: &VPath) -> Option<Vec<u8>>;

        /// Get a string signifying the current version of the provided path's content.
        /// If the content changes, the version must also change.
        ///
        /// Returns None if there is no content or no version string is available.
        fn content_version(&self, path: &VPath) -> Option<String>;

        /// Returns the size of the image at the given path, or `None` if the entry is not an image
        /// (or its size cannot be known.)
        fn image_size(&self, _path: &VPath) -> Option<ImageSize> {
            None
        }

        /// Returns a path relative to `config.site` indicating where the file will be available
        /// once served.
        ///
        /// May return `None` if the file is not served.
        fn anchor(&self, _path: &VPath) -> Option<VPathBuf> {
            None
        }

        /// If a file can be written persistently, returns an [`EditPath`] representing the file in
        /// persistent storage.
        ///
        /// An edit path can then be made into an [`Edit`].
        fn edit_path(&self, _path: &VPath) -> Option<EditPath> {
            None
        }
    }

`content_version` and `anchor` are both used to assemble URLs out of `Dir` entries. I have a function `url` which, given a root URL, returns a URL with a `?v=` parameter for cache busting.

    pub fn url(site: &str, dir: &dyn Dir, path: &VPath) -> Option<String> {
        let anchor = dir.anchor(path)?;
        if let Some(version) = dir.content_version(path) {
            Some(format!("{}/{anchor}?v={version}", site))
        } else {
            Some(format!("{}/{anchor}", site))
        }
    }

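as a quick illustration of what `url` produces, here is a small runnable sketch. it assumes string paths instead of `VPath`, and `Served` is a hypothetical `Dir` invented for the example; its rule of anchoring everything except dotfiles under `static/` is made up, as is the version string used below.

```rust
/// Cut-down stand-in for the trait: only the two forks that `url` needs.
trait Dir {
    fn content_version(&self, path: &str) -> Option<String>;
    fn anchor(&self, path: &str) -> Option<String>;
}

/// Hypothetical directory whose entries all share one (optional) version string.
struct Served {
    version: Option<&'static str>,
}

impl Dir for Served {
    fn content_version(&self, _path: &str) -> Option<String> {
        self.version.map(str::to_string)
    }

    fn anchor(&self, path: &str) -> Option<String> {
        // Made-up rule: dotfiles are not served, everything else lives under `static/`.
        (!path.starts_with('.')).then(|| format!("static/{path}"))
    }
}

/// Same shape as the function above: no anchor means no URL,
/// and the `?v=` parameter is only added when a version is known.
fn url(site: &str, dir: &dyn Dir, path: &str) -> Option<String> {
    let anchor = dir.anchor(path)?;
    Some(match dir.content_version(path) {
        Some(version) => format!("{site}/{anchor}?v={version}"),
        None => format!("{site}/{anchor}"),
    })
}
```

with a version of `b3-1a2b`, `url("https://example.com", &dir, "logo.png")` yields `https://example.com/static/logo.png?v=b3-1a2b`; without a version the query parameter is simply left off, and an entry without an anchor yields `None`.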
`image_size` is used to automatically determine the size of images at build time. that way I can add `width="" height=""` attributes to all `<img>` tags, preventing layout shift.
one notable piece of functionality that is currently missing is version history. to be honest, I’m still figuring that one out in my head; I have the feeling it’s not exactly going to be simple, but it should end up being a lot more principled than whatever this was.
Radio Edit (radio edit)
`edit_path` returns an `EditPath`, which represents a location somewhere in persistent storage. having an `EditPath`, you can construct an `Edit`.

    /// Represents a pending edit operation that can be written to persistent storage later.
    #[derive(Debug, Clone, PartialEq, Eq)]
    pub enum Edit {
        /// An edit that doesn't do anything.
        NoOp,

        /// Write the given string to a file.
        Write(EditPath, String),

        /// Execute a sequence of edits in order.
        Seq(Vec<Edit>),

        /// Execute the provided edits in parallel.
        All(Vec<Edit>),

        /// Makes an edit dry.
        ///
        /// A dry edit only logs what operations would be performed, does not perform the I/O.
        Dry(Box<Edit>),
    }

`Edit`s take many shapes and forms, but the most important one for us is `Write`: it allows you to write a file to the disk. the other ones are for composing `Edit`s together into larger ones.
`NoOp` can be used when you need to produce an `Edit`, but don’t actually want to perform any operations.
this runs contrary to my opinion on `None` enums, for one reason: would you rather have to handle `Option<Edit>` everywhere, or just assume whatever `Edit` you’re being passed is valid?
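one way such an `Edit` could be interpreted is sketched below. this is an assumption-laden illustration rather than the treehouse's code: `EditPath` is replaced with a plain `PathBuf`, the `apply` method and its `log` parameter are hypothetical, and `All` is run sequentially here to keep the example short (a real implementation could run those edits concurrently).

```rust
use std::{fs, io, path::PathBuf};

/// Same variants as above, with `PathBuf` standing in for `EditPath`.
#[derive(Debug, Clone, PartialEq, Eq)]
enum Edit {
    NoOp,
    Write(PathBuf, String),
    Seq(Vec<Edit>),
    All(Vec<Edit>),
    Dry(Box<Edit>),
}

impl Edit {
    /// Apply the edit. When `dry` is true, writes are only recorded in `log`
    /// and no I/O is performed.
    fn apply(&self, dry: bool, log: &mut Vec<String>) -> io::Result<()> {
        match self {
            // Nothing to do, which is exactly why no `Option<Edit>` is needed.
            Edit::NoOp => Ok(()),
            Edit::Write(path, content) => {
                if dry {
                    log.push(format!(
                        "would write {} bytes to {}",
                        content.len(),
                        path.display()
                    ));
                    Ok(())
                } else {
                    log.push(format!("write {}", path.display()));
                    fs::write(path, content)
                }
            }
            // Simplification: both composites run in order and stop at the first error.
            Edit::Seq(edits) | Edit::All(edits) => {
                edits.iter().try_for_each(|edit| edit.apply(dry, log))
            }
            // Dryness is inherited by everything nested inside.
            Edit::Dry(inner) => inner.apply(true, log),
        }
    }
}
```

wrapping a whole tree of edits in `Dry` turns the run into a preview: the log fills up with "would write" lines and the disk is never touched.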
although I said before that the fork-based virtual file system is a leaky abstraction when you introduce writing to the physical file system, I don’t think this particular API is susceptible to this: since it can expose `EditPath`s only for entries that can actually be written (ones with a `content`), you can disallow writing to directories that way.
also, TOCTOU bugs are a thing, but I disregard those as they don’t really fit into a compiler’s threat model.
improvise, adapt, overcome
thanks to the `Dir`’s inherent composability, it is trivial to build adapters on top of it. I have a few in the treehouse myself.
`Blake3ContentVersionCache` is the sole implementor of `Dir::content_version`. its purpose is to compute `content_version`s and cache them in memory for each path. as the name suggests, versions are computed using a (truncated) BLAKE3 hash.
`Overlay` combines a base directory with an overlay directory, first routing requests to the overlay directory, and if those fail, routing them to the base directory. this allows me to overlay a `MemDir` with the `static` directory and a `robots.txt` on top of a `TreehouseDir`, which together form the compiler’s target directory.
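the overlay-then-base routing can be sketched in a few lines. the following is a simplified illustration, not the treehouse's `Overlay`: it uses string paths, `Files` is a hypothetical flat store that exists only so there is something to overlay, and merging the two listings as a deduplicated union is an assumption about how `dir` should behave.

```rust
use std::collections::BTreeMap;

/// Two-method stand-in for the trait, with `&str` paths and "" as the root.
trait Dir {
    fn dir(&self, path: &str) -> Vec<String>;
    fn content(&self, path: &str) -> Option<Vec<u8>>;
}

/// Hypothetical flat store: every file lives directly at the root.
struct Files(BTreeMap<&'static str, &'static str>);

impl Dir for Files {
    fn dir(&self, path: &str) -> Vec<String> {
        if path.is_empty() {
            self.0.keys().map(|name| name.to_string()).collect()
        } else {
            Vec::new()
        }
    }

    fn content(&self, path: &str) -> Option<Vec<u8>> {
        self.0.get(path).map(|text| text.as_bytes().to_vec())
    }
}

/// Routes requests to `overlay` first and falls back to `base`.
struct Overlay<B, O> {
    base: B,
    overlay: O,
}

impl<B: Dir, O: Dir> Dir for Overlay<B, O> {
    fn dir(&self, path: &str) -> Vec<String> {
        // Assumption: listings are merged, overlay entries first, duplicates dropped.
        let mut entries = self.overlay.dir(path);
        for entry in self.base.dir(path) {
            if !entries.contains(&entry) {
                entries.push(entry);
            }
        }
        entries
    }

    fn content(&self, path: &str) -> Option<Vec<u8>> {
        // The overlay wins; the base is only consulted when the overlay has nothing.
        self.overlay
            .content(path)
            .or_else(|| self.base.content(path))
    }
}
```

since `Overlay` implements `Dir` itself, overlays can be stacked or mounted inside other directories like any other entry.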
for the curious, here’s roughly how the treehouse’s virtual file systems are structured:

    source: ImageSizeCache(Blake3ContentVersionCache(MemDir {
        "treehouse.toml": BufferedFile(..), // content read at startup
        "static": Anchored(PhysicalDir("static"), "static"),
        "template": PhysicalDir("template"),
        "content": PhysicalDir("content"),
    }))

    target: Overlay(
        HtmlCanonicalize(ContentCache(TreehouseDir)),
        MemDir {
            "static": Cd(source, "static"),
            "robots.txt": Cd(source, "static/robots.txt"),
        },
    )

a fatal flaw
one idea I’ve had to fix this was to change the API shape to a single trait method.

    pub trait Dir {
        fn forks(&self, path: &VPath, forks: &mut Forks);
    }

    impl Dir for MyDir {
        fn forks(&self, path: &VPath, forks: &mut Forks) {
            forks.insert(|| MyFork);
        }
    }

    // the adapter needs `T: Dir` to be able to forward to `self.inner`.
    impl<T: Dir> Dir for AdapterDir<T> {
        fn forks(&self, path: &VPath, forks: &mut Forks) {
            self.inner.forks(path, forks);
            forks.insert(|| MyMomsEpicSilverware);
        }
    }

but that hasn’t come to fruition yet, as I have no idea how to make it efficient yet object-safe… I’m yet to add profiling to the treehouse, so I don’t want to make risky performance decisions like this at this point.
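for illustration only, here is one possible way the single-method shape could stay object-safe: `Forks` carries the `TypeId` of the one fork being asked for, and `insert` only runs its closure when the types match. everything here is an assumption made for the sketch (string paths, the `query` helper, the `Content` and `Anchor` fork types, the `Hello` and `Anchored` directories, and "last insert wins"), and it makes no claim about efficiency, since every query still walks the whole adapter chain.

```rust
use std::any::{Any, TypeId};

/// A request for one fork type, plus a slot for the answer.
struct Forks {
    wanted: TypeId,
    found: Option<Box<dyn Any>>,
}

impl Forks {
    /// Runs `make` only if `T` is the fork being asked for.
    /// Assumption: a later insert replaces an earlier one, so adapters can override.
    fn insert<T: Any>(&mut self, make: impl FnOnce() -> T) {
        if self.wanted == TypeId::of::<T>() {
            self.found = Some(Box::new(make()));
        }
    }
}

/// The trait itself has no generic methods, so `dyn Dir` remains usable.
trait Dir {
    fn forks(&self, path: &str, forks: &mut Forks);
}

/// Typed query built on top of the single untyped method.
fn query<T: Any>(dir: &dyn Dir, path: &str) -> Option<T> {
    let mut forks = Forks { wanted: TypeId::of::<T>(), found: None };
    dir.forks(path, &mut forks);
    forks.found?.downcast::<T>().ok().map(|boxed| *boxed)
}

// Hypothetical fork types.
#[derive(Debug, PartialEq)]
struct Content(Vec<u8>);
#[derive(Debug, PartialEq)]
struct Anchor(String);

/// Hypothetical directory with a single entry.
struct Hello;

impl Dir for Hello {
    fn forks(&self, path: &str, forks: &mut Forks) {
        if path == "hello.txt" {
            forks.insert(|| Content(b"hi".to_vec()));
        }
    }
}

/// Hypothetical adapter: forwards every fork of `inner`, then adds its own.
struct Anchored<T> {
    inner: T,
    at: &'static str,
}

impl<T: Dir> Dir for Anchored<T> {
    fn forks(&self, path: &str, forks: &mut Forks) {
        self.inner.forks(path, forks);
        forks.insert(|| Anchor(format!("{}/{path}", self.at)));
    }
}
```

the point of the shape is that `Anchored` never has to know which forks `Hello` provides: `query::<Content>` passes straight through the adapter, while `query::<Anchor>` is answered by the adapter itself.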