Incrementally replacing axle’s initrd (Draft)

25 Feb, 2022 Reading time: 17 minutes

When a computer starts up, before the operating system can display applications, menus, and wallpapers, the operating system needs to first perform a complex initialization sequence. This sequence serves to get both the underlying hardware, and the operating system’s own software components, ready for further use.

While the full details of this sequence are out of the scope of this blog post, here’s a rough rundown of what this might look like:

The computer receives power
The motherboard starts running its boot firmware (either BIOS or UEFI)
The boot firmware performs basic hardware initialization
The boot firmware loads the OS’s bootloader from persistent storage
The bootloader loads the OS image from persistent storage
The bootloader jumps to the OS entry point
The OS performs further hardware initialization
The OS initializes its internal software stack
Ready for use

We’re going to take a closer look at just a few of these steps, and dig into a big change I recently made to how axle boots up.

I mentioned above that, as part of the bootloader’s functioning, it will load the OS image from persistent storage. What, exactly, is this OS image?

A program’s structure

Let’s leave operating systems aside for the moment, and talk more generally about programs. Programs are made up of two things: code and data. While it’s true that it’s bytes all the way down, it’s more colloquially useful to delineate the two¹.

The data of a program refers to all of the static² bytes that the code depends on. Take the following program:

#include <stdio.h>

int main(int argc, char** argv) {
    printf("Hello, world!\n");
    return 0;
}

The data that’ll be bundled in the program file is the bytes making up the string Hello, world!\n. Other common types of program data include statically-initialized arrays, mutable global variables, etc.

The code is the encoded assembly instructions that the compiler toolchain has generated. This should be semantically equivalent to the program’s higher-level source code, but expressed in terms that the target CPU can understand. (PT: Footnote about how it might not be semantically equivalent? i.e. compiler exploits ambiguity as an optimisation)

Hm, okay - code and data. That’s all well and good, but what does this look like? It’s not as though the code and data live in the Platonic ideal realm, free and formless. No, form indeed. Programs exist as structured collections of bytes, where these bytes are the code and data. To use a common metaphor, a stored program is generally represented as a file – at least until it starts running.

This file will, naturally, contain the code and data that constitutes the program, but how? I can open a text file and read it, line-by-line. I can open an image file and view its decoded pixels. Where would I go in a program file to read its stored code or static data?

The answer, as with anything interesting, is that it depends.

With all of the examples we’re thinking of (text files, image files, program files), there needs to be an agreement on the format of the data contained within the file. For example, in a text file, we might all agree that the file should contain ASCII-encoded bytes. For an image file, we might decide that we’re expecting a series of RGB triplets.

File Formats

Let’s think about that a bit further. Say we do indeed want to encode an image file as a series of RGB triplets. That’d mean that a file that contains this pixel data³:

00 ff cc
cc 00 ff
ff 00 cc

… might be rendered into something like this image:

[Image]

But how did we know that the image should be 3 pixels wide by 3 pixels high?

The answer is we don’t, and the example I gave above cheated using a priori[TODO check] knowledge.

Let’s take a look at what this might look like if we guessed the dimensions of the image wrongly.

[Image]

Since we don’t know where each row ends, the pixel data gets wrapped around at incorrect places, resulting in a warped and garbled image.

We’ll actually need some extra meta data that informs us about the structure of the image file. This meta data is typically stored at the beginning, or head of the file data, and is therefore referred to as a header.

The header might contain some data like the following:

struct image_header {
    width_in_px: usize,
    height_in_px: usize,
}

Therefore, our full file from above might now look something like the following:

03 03 
00 ff cc 
cc 00 ff 
ff 00 cc

This is great! We can now unambiguously parse this file data, with a precise understanding of where the splits are between rows of pixel data.

Except… purely from looking at the file data, how did we know this was an image at all? From the perspective of an agent interpreting this series of bytes, we don’t have much indication of how this data should be interpreted. Is it an image with our header that we described above? Is it a series of ASCII-encoded text bytes? Is it code and data for running a program?

The computing world employs a few techniques to handle this sort of thing. One of these that’s readily visible is a file extension. If the file ends in .txt, this is a fair hint that we should try to interpret the bytes as straight ASCII. For a file ending in .png, we might reasonably try to interpret the bytes as structured data described by the [PNG standard].

The obvious drawback here is that users are typically free to change file extensions to whatever they like. If the user changed the extension of a file to one indicating structured data entirely different from the real structure of the data, the program trying to interpret that file format would surely give up in a fit of impotent rage.

So, we’ve got another trick up our sleeve to help identify files. Let’s take another look at our image header from earlier.

struct image_header {
    width_in_px: usize,
    height_in_px: usize,
}

We can plonk more data in here to help solve our problem!

struct image_header {
    file_type: usize,
    width_in_px: usize,
    height_in_px: usize,
}

If every type of file starts off with a file_type field in its header, and every type of file coordinates well enough to store their own unique value there, we can use this to unambiguously identify the type of file this is, and therefore how the succeeding bytes should be interpreted. This file_type value at the start of a file is pretty common, and is referred to as the file’s magic⁴.

Sure - we’ve got files, they’ve got structure, and that structure tells us what kind of data is going to follow.

What kind of structure might we find in a program file?

Program File Formats

Just like there are several image formats whose use depends on platform and context, so too are there multiple ways to represent a program as a file.

For program files, the formats tend to stick pretty close to one of the Big 3 operating systems; macOS uses a nifty format called Mach-O, Linux uses ELF, and Windows uses PE.

Surely, being its own OS, we’ve got full control over program loading in axle. What’ll we go with?

Well, even though axle is its own little island, unmoored from many of the restraining conventions that anchor programming in the typical environments, it still has to play nice with the outside world in several important ways.

Firstly, axle’s source tree gets compiled by existing compiler suites, like GCC and rustc, and these compilers only know how to generate program files in whatever formats those project’s authors have implemented support for. Using these compilers to generate the program files for axle, we’ll have a much easier time if we ask them to generate the output files in formats they already know how to support.

Secondly, while axle has nearly full control over how programs are loaded (and thus its program loader could add support for any format it likes), there is one important axle component is obligated to format its program file in the structure defined by the ELF standard: the bootloader.

Taking a look at the boot sequence outlined at the top of this post, it takes a number of steps before axle is first allowed to run its own code during boot! Prior to that, the 3rd-party firmware which initiates the boot process gets to do its own thing, then loads and jumps to axle’s bootloader. This 3rd-party firmware, as mentioned, will generally be either BIOS or UEFI, but 2022-era axle is UEFI-only. BIOS is pretty dated, so let’s not even go there, but know that the same sort of thing is going on.

UEFI is a standardized boot environment that most modern hardware supports and expects. When the computer boots, the UEFI firmware in the motherboard will perform some initialization, then will load the operating system’s bootloader program file before passing control to it. This step, of the UEFI firmware jumping to the bootloader, demarcates the operating system’s first opportunity to run code. Since UEFI is standardized, the firmware is allowed to dictate what type of program file the bootloader should be, and UEFI will only try to interpret the program file in that way. Spoiler: it’s ELF.

Okay, so we’ve got two constraints pulling us towards ELF:

Compiler toolchains generally support ELF well.
UEFI mandates that the bootloader is ELF.
ELF is well-understood and well-documented.

Off-by-one, it’ll be the death of me.

We could, of course, use ELF only where it’s absolutely required and use something custom elsewhere, but at least for now we’ll stick with ELF for program files throughout the whole OS stack.

What’s an OS distribution?

On that note, what kind of program files are there in the OS stack? Roughly speaking, we’ve got 3 categories:

Boot

Bootloader, kernel image

Kernel drivers

Privileged subsystems launched by the kernel to provide system services or to interact with hardware

Userland

Programs loaded by the kernel and running at user-level

Note how there’s a hierarchy of program-loading here. At the initial level, the bootloader ELF is loaded and ran by the motherboard’s UEFI firmware. The bootloader then loads the kernel ELF. The kernel loads the ELFs corresponding to boot services and device drivers. As of now, we’ve got three distinct ELF loaders within the system:

The ELF loader in the UEFI firmware (to load the bootloader)
The ELF loader in the bootloader (to load the kernel)
The ELF loader in the kernel (to load boot services)

This is quite a lot of duplicated functionality! Any discrepancy in the functionality or validations offered by each of these three loaders could lead to bugs or security vulnerabilities⁵. The ELF loader in the UEFI firmware, at least, is out of our control - it’s flashed into the motherboard. Keep in mind how duplicating parsing code is a good hint that we should take a closer look at our design - we’ll revisit this idea later.

But first: we’re clear that each of these loaders takes in some bytes, structured into a header, then program code and program data. How do each of these loaders ‘get’ the bytes comprising the program to be loaded in the first place? That is, how does each of these loaders read the file containing the program it wants to load?

For the ELF loader in the UEFI firmware, life is simple: the UEFI environment provides us with a filesystem API, backed by persistent storage, that bootloaders can use to access and read the kernel program file. It’s then the bootloader’s responsibility to interpret those bytes as the ELF specification deigns.

Once we’re running in the kernel, however, we can no longer rely on any facilities offered by UEFI: these are no longer available once the bootloader exits.

So where does the kernel load files from?

Well, once the operating system has finished starting up, the system might have a file server daemon that coordinates with a persistent storage driver. These services would fulfill the function of reading an ELF buffer from a file, then passing the buffer to the kernel to load into a new task context. But how would these daemons and drivers themselves be loaded?

We need some kind of trick that’s specific to the boot process, to bootstrap⁶ the system into a more stable state. When the bootloader maps the kernel ELF from persistent storage using UEFI facilities, we can have it also map a special file that contains the programs the kernel needs to be available before the system can continue further. This special file might be represented a conceptual ‘disk’, loaded into RAM, that contains the code and data needed to complete system initialisation. Thus, this special file is referred to as an initialisation ramdisk (or initrd, if you’re cool).

So! The responsibilities granted to our bootloader have expanded⁷: not only is it responsible for loading the kernel ELF program file into memory, it must also load the kernel’s initrd, which contains the necessary programs to continue the boot. The kernel will then know how to read the initrd’s contents, and to launch the boot programs contained therein.

What format is this initrd? It surely isn’t ELF, since ELF is only for structuring a single program file, whereas our initrd will contain multiple program files (as well as possibly other things, such as images).

We’re in luck: since the initrd image is only intended to be consumed by the kernel, we can make its format anything we like! Before we consider the structure we’ll use, it’s important to consider exactly what we’d like to accomplish with our initrd.

Within the axle source tree, we’ve got a multitude of source code files, each of which corresponds to a build product. One of these build products might be the kernel, while another might be a userspace application like a browser. Yet another might be one of these programs that the kernel must launch at boot-time as part of the bootstrapping process. Once we’ve compiled all the source code files into build products, we might generate a directory of program files which we’d like to make available to the kernel as an initrd to facilitate the bootstrap at boot-time:

[Initrd directory image]

When we’re done, we’d like a file something like this to pop out the other side:

[Initrd generated image]

Within this singular initrd file, we’d like to embed all of the input files that we wanted to make available to the kernel at boot-time. This initrd file, along with the kernel file and bootloader file, will be flashed to the persistent media (like a USB or hard-drive) connected to the computer that axle will boot on. axle’s bootloader will then load the kernel and initrd, and tell the kernel about where the initrd is located, so that the kernel can make use of it.

Coming back, we’d like to come up with some format to structure the contents of this initrd, such that we can retrieve the individual files that are contained within it. One structure for an initrd, commonly found floating around the OSDev literature, is a ’table-of-contents’ approach. This approach was popularized by a series of kernel development tutorials, and it’s what we’ll start out with here.

Our file header will begin with something like the following⁸:

struct initrd_header {
    num_files: u32,
}

Directly after that, we’ll have an entry for each file describing its attributes (with num_files entries in total):

struct file_within_initrd_desc {
    name: [char; 64],
    start_offset: usize,
    file_size: usize
}

Already, we’re boxing ourselves into some major limitations - can you spot them? Here’s a few that stick out:

File names can’t be longer than 64 bytes
Directories aren’t supported; everything lives in a top-level folder
The filesystem is read-only and can’t be modified once generated

Some of these are inherent to what we’re trying to do; there’s not much point in the initrd being modifiable, since it’s explicitly intended as a sealed system volume that’s generated when the OS release is built. However, there’s no good reason to disallow ourselves the niceities of structured directories and arbitrary-length file names. For now, though, this’ll have to do.

After the series of file_within_initrd_desc structures, the initrd will contain the raw data of each of the files, glued back-to-back-to-back. To read a given file, the kernel will need to iterate the initrd’s header, scanning for a file_within_initrd_desc structure that matches the file name we’d like to read, then will extract the right bytes from the file sandwiched within the larger initrd image.

Okay! To turn this initrd from concept into reality, we’ll need two pieces of code:

A generator that runs on our development machine (i.e. my Mac) to turn a folder of build products into an initrd image.
A decoder inside the OS that can take the initrd and read file data out of it.

Of course, these two components will need to exactly agree on the format of the initrd file. For now, we’ll use our human-eyes to make sure our structure definitions and logic match up, but once again: that’s relatively brittle, and something better will come on the horizon.

Now, let’s track the flow of initrd programs:

We generate the OS distribution on the development machine
As part of generating the distribution, we transform the directory containing boot-time programs into an initrd file
This initrd file is flashed to the UEFI boot media, alongside the kernel and bootloader
axle boots up on a machine
axle’s bootloader loads the kernel and initrd
axle’s bootloader informs the kernel of the initrd’s location
axle’s kernel initializes
axle’s kernel interprets the initrd and loads the boot services
axle’s file server launches
Later, a userspace program wants to launch a file within the initrd. It sends a launch request to the file server
The file server interprets the initrd, finds the requested file, and sends it to the kernel for launch

Okay, that was a lot! Let’s take a look at a few key points there:

First of all, notice how the bootloader needs to inform the kernel of the memory range where the initrd has been loaded, so that the kernel can interact with it. How is the bootloader going to manage to pass information to the kernel?

Once again, the bootloader and kernel are fully in-house systems: they can coordinate in whatever fashion we like to make things convenient for ourselves. One thing we can do is define another structure that the bootloader will populate with information, then place the memory address of that structure in a register just before jumping to the kernel’s entry point. When the kernel starts running, it’ll know to look in that register for a pointer to this structure containing boot-time information.

struct boot_info {
    // ...
    initrd_start: usize,
    initrd_size: usize,
    // ...
}

Great! The bootloader can now reliably tell the kernel exactly where in memory it’s loaded the initrd image.

Going back to our sequence above, one wrinkle in our approach makes itself apparent: both the kernel and the file server need to understand the format of the initrd! This means that any time we’d like to update the format of the initrd, we need to update three places: the generator, the kernel parser, and the file-server parser. Note that each of these three environments are very different, as well: the generator runs on my host Mac, the kernel runs in a bare-metal environment, and the file-server parser has access to the C standard library. This is not a great state of affairs, and we’ll be doing ourselves a big service if we can find some way to write this code only once.

Something better

struct boot_info2 {
    initrd_start: usize,
    initrd_size: usize,

    new_initrd_start: usize,
    new_initrd_size: usize,
}

While we’d like to move to a better design, it’d be inconvenient to go back to a from-scratch environment where we can’t finish the boot process. Instead, while we’re iterating on our new initrd, let’s continue to launch from the old initrd. The bootloader will be temporarily updated to load them both and tell the kernel about them. Once we’re ready to make the switch, we can drop the original initrd and all its supporting code.

But first, it’s time to unwrinkle something we’ve smoothed over up to this point: the bootloader contains code to load an ELF, retrieved

PT: digression about how zero-init data takes up no size?

What kind of data makes up an operating system?

Well, an operating system isn’t just one thing! Depending on the design, an OS can be made up of many different programs, all working in tandem to provide a computing environment. We’ll need to get a little more specific with our terminology.

While I won’t cover all the intricacies within this blog post

either BIOS or UEFI
UEFI can be quite complex

Tasks:

Set up a virtual address space (required for paging)
Use UEFI facilities to select a framebuffer
Map initrd and kernel
UEFI required for 64-bit OS
much more advanced environment than BIOS

I made a major change to the way axle boots up

Intro
Changing initrd
- Digression about file formats and headers
Types of binaries in axle
Bootloader, kernel, initrd flashed to persistent storage
initrd contains images for kernel to bootstrap to accessing persistent storage
ELF is a sensible choice for binary format
initrd format is controlled
Initial initrd
Loaded by bootloader, then interpreted by kernel
Then file_server
file_server then interprets initrd again
Two initrd interpreters
Rewriting file_server in rust
- Support directories and arbitrary length filenames
- serde
Not rewriting the whole world at once: use old kernel initrd interpreter, old file_server, bootloader loads two initrd’s (old and new)
Once we’re ready, only load the new initrd
But wait… now that the old initrd has been ripped out, how do we launch the file_server?
Kernel doesn’t need to understand initrd anymore: just give the kernel the file_server ELF
file_server will then request info about the initrd image, interpret it, and be responsible for all other launches
Show old list in kernel of boot programs, has turned into file in initrd that file_server is responsible for

It’s more correct to say that programs are made up of two kinds of data: data-as-code, and data-as-data. This is, clearly, not pedagogically helpful. ↩︎
Note that we’re excluding data generated at runtime in our definition. We only care about programs in their stored on-disk state here. ↩︎
Note that I’ve arranged these bytes into a 3x3 grid for ease of reading, but this structure wouldn’t be present in the raw file data. It’d be a straight list of bytes. ↩︎
Also known as its magic number. For example, every PNG file in the world starts off with the same few bytes, 0xcheckme, so 0xcheckme could be referred to as the PNG magic. ↩︎
Indeed, this sort of differential between parsers can lead to game-breaking exploits in production operating systems. ↩︎
The nomenclature of getting around a chicken-and-egg problem like this is ‘bootstrapping’, from the concept of ‘pulling oneself up with their own bootstraps’. This is also where the term for boot itself originates! ↩︎
Note the full responsibilities of the bootloader are larger than even this: it’s also responsible for things like choosing a resolution and initialising a virtual-address space. ↩︎
Note we’re omitting a magic number from this file header. Since this file is generated specifically for the kernel’s boot environment, there’s no need to differentiate it from any other type of file: the initrd is the only file that’ll ever be loaded here. ↩︎

Phillip Tennen