Incrementally replacing axle’s initrd
Reading time: 16 minutes
✱ Note
When a computer starts up, before the operating system can display applications, menus, and wallpapers, the operating system needs to first perform a complex initialization sequence. This sequence serves to get both the underlying hardware, and the operating system’s own software components, ready for further use.
While the full details of this sequence are out of the scope of this blog post, here’s a rough rundown of what this might look like:
-
The computer receives power
-
The motherboard starts running its boot firmware (either BIOS or UEFI)
-
The boot firmware performs basic hardware initialization
-
The boot firmware loads the OS’s bootloader from persistent storage
-
The bootloader loads the OS image from persistent storage
-
The bootloader jumps to the OS entry point
-
The OS performs further hardware initialization
-
The OS initializes its internal software stack
-
Ready for use
We’re going to take a closer look at just a few of these steps, and dig into a big change I recently made to how axle boots up.
I mentioned above that, as part of the bootloader’s functioning, it will load the OS image from persistent storage. What, exactly, is this OS image?
A program’s structure
Let’s leave operating systems aside for the moment, and talk more generally about programs. Programs are made up of two things: code and data. While it’s true that it’s bytes all the way down, it’s more colloquially useful to delineate the two✱.
✱ Note
The data of a program refers to all the static✱ bytes that the code depends on. Take the following program:
✱ Note
#include <stdio.h>
int main(int argc, char** argv) {
printf("Hello, world!\n");
return 0;
}
The data that’ll be bundled in the program file is the bytes making up the string Hello, world!\n
. Other common types of program data include statically-initialized arrays, mutable global variables, etc.
The code is the encoded assembly instructions that the compiler toolchain has generated. This should be semantically equivalent to the program’s higher-level source code, but expressed in terms that the target CPU can understand. (PT: Footnote about how it might not be semantically equivalent? i.e. compiler exploits ambiguity as an optimisation)
Hm, okay - code and data. That’s all well and good, but what does this look like? It’s not as though the code and data live in the Platonic ideal realm, free and formless. No, form indeed. Programs exist as structured collections of bytes, where these bytes are the code and data. To use a common metaphor, a stored program is generally represented as a file – at least until it starts running.
This file will, naturally, contain the code and data that constitutes the program, but how? I can open a text file and read it, line-by-line. I can open an image file and view its decoded pixels. Where would I go in a program file to read its stored code or static data?
The answer, as with anything interesting, is that it depends.
With all the examples we’re thinking of (text files, image files, program files), there needs to be an agreement on the format of the data contained within the file. For example, in a text file, we might all agree that the file should contain ASCII-encoded bytes. For an image file, we might decide that we’re expecting a series of RGB triplets.
File Formats
Let’s think about that a bit further. Say we do indeed want to encode an image file as a series of RGB triplets. That’d mean that a file that contains this pixel data✱:
✱ Note
00 ff cc
cc 00 ff
ff 00 cc
… might be rendered into something like this image:
[Image]
But how did we know that the image should be 3 pixels wide by 3 pixels high?
The answer is we don’t, and the example I gave above cheated using a priori[TODO check] knowledge.
Let’s take a look at what this might look like if we guessed the dimensions of the image wrongly.
[Image]
Since we don’t know where each row ends, the pixel data gets wrapped around at incorrect places, resulting in a warped and garbled image.
We’ll actually need some extra meta data that informs us about the structure of the image file. This meta data is typically stored at the beginning, or head of the file data, and is therefore referred to as a header.
The header might contain some data like the following:
struct image_header {
width_in_px: usize,
height_in_px: usize,
}
Therefore, our full file from above might now look something like the following:
03 03
00 ff cc
cc 00 ff
ff 00 cc
This is great! We can now unambiguously parse this file data, with a precise understanding of where the splits are between rows of pixel data.
Except… purely from looking at the file data, how did we know this was an image at all? From the perspective of an agent interpreting this series of bytes, we don’t have much indication of how this data should be interpreted. Is it an image with our header that we described above? Is it a series of ASCII-encoded text bytes? Is it code and data for running a program?
The computing world employs a few techniques to handle this sort of thing. One of these that’s readily visible is a file extension. If the file ends in .txt
, this is a fair hint that we should try to interpret the bytes as straight ASCII. For a file ending in .png
, we might reasonably try to interpret the bytes as structured data described by the [PNG standard].
The obvious drawback here is that users are typically free to change file extensions to whatever they like. If the user changed the extension of a file to one indicating structured data entirely different from the real structure of the data, the program trying to interpret that file format would surely give up in a fit of impotent rage.
So, we’ve got another trick up our sleeve to help identify files. Let’s take another look at our image header from earlier.
struct image_header {
width_in_px: usize,
height_in_px: usize,
}
We can plonk more data in here to help solve our problem!
struct image_header {
file_type: usize,
width_in_px: usize,
height_in_px: usize,
}
If every type of file starts off with a file_type
field in its header, and every type of file coordinates well enough to store their own unique value there, we can use this to unambiguously identify the type of file this is, and therefore how the succeeding bytes should be interpreted. This file_type
value at the start of a file is pretty common, and is referred to as the file’s magic ✱.
✱ Note
0xcheckme
could be referred to as the PNG magic.Sure - we’ve got files, they’ve got structure, and that structure tells us what kind of data is going to follow.
What kind of structure might we find in a program file?
Program File Formats
Just like there are several image formats whose use depends on platform and context, so too are there multiple ways to represent a program as a file.
For program files, the formats tend to stick pretty close to one of the Big 3 operating systems; macOS uses a nifty format called Mach-O, Linux uses ELF, and Windows uses PE.
Surely, being its own OS, we’ve got full control over program loading in axle. What’ll we go with?
Well, even though axle is its own little island, unmoored from many of the restraining conventions that anchor programming in the typical environments, it still has to play nice with the outside world in several important ways.
Firstly, axle’s source tree gets compiled by existing compiler suites, like GCC and rustc, and these compilers only know how to generate program files in whatever formats those project’s authors have implemented support for. Using these compilers to generate the program files for axle, we’ll have a much easier time if we ask them to generate the output files in formats they already know how to support.
Secondly, while axle has nearly full control over how programs are loaded (and thus its program loader could add support for any format it likes), there is one important axle component is obligated to format its program file in the structure defined by the ELF standard: the bootloader.
Taking a look at the boot sequence outlined at the top of this post, it takes a number of steps before axle is first allowed to run its own code during boot! Prior to that, the 3rd-party firmware which initiates the boot process gets to do its own thing, then loads and jumps to axle’s bootloader. This 3rd-party firmware, as mentioned, will generally be either BIOS or UEFI, but 2022-era axle is UEFI-only. BIOS is pretty dated, so let’s not even go there, but know that the same sort of thing is going on.
UEFI is a standardized boot environment that most modern hardware supports and expects. When the computer boots, the UEFI firmware in the motherboard will perform some initialization, then will load the operating system’s bootloader program file before passing control to it. This step, of the UEFI firmware jumping to the bootloader, demarcates the operating system’s first opportunity to run code. Since UEFI is standardized, the firmware is allowed to dictate what type of program file the bootloader should be, and UEFI will only try to interpret the program file in that way. Spoiler: it’s ELF.
Okay, so we’ve got two constraints pulling us towards ELF:
-
Compiler toolchains generally support ELF well.
-
UEFI mandates that the bootloader is ELF.
-
ELF is well-understood and well-documented.
Off-by-one, it’ll be the death of me.
We could, of course, use ELF only where it’s absolutely required and use something custom elsewhere, but at least for now we’ll stick with ELF for program files throughout the whole OS stack.
What’s an OS distribution?
On that note, what kind of program files are there in the OS stack? Roughly speaking, we’ve got 3 categories:
- Boot
- Bootloader, kernel image
- Kernel drivers
- Privileged subsystems launched by the kernel to provide system services or to interact with hardware
- Userland
- Programs loaded by the kernel and running at user-level
Note how there’s a hierarchy of program-loading here. At the initial level, the bootloader ELF is loaded and ran by the motherboard’s UEFI firmware. The bootloader then loads the kernel ELF. The kernel loads the ELFs corresponding to boot services and device drivers. As of now, we’ve got three distinct ELF loaders within the system:
- The ELF loader in the UEFI firmware (to load the bootloader)
- The ELF loader in the bootloader (to load the kernel)
- The ELF loader in the kernel (to load boot services)
This is quite a lot of duplicated functionality! Any discrepancy in the functionality or validations offered by each of these three loaders could lead to bugs or security vulnerabilities✱. The ELF loader in the UEFI firmware, at least, is out of our control - it’s flashed into the motherboard. Keep in mind how duplicating parsing code is a good hint that we should take a closer look at our design - we’ll revisit this idea later.
✱ Note
But first: we’re clear that each of these loaders takes in some bytes, structured into a header, then program code and program data. How do each of these loaders ‘get’ the bytes comprising the program to be loaded in the first place? That is, how does each of these loaders read the file containing the program it wants to load?
For the ELF loader in the UEFI firmware, life is simple: the UEFI environment provides us with a filesystem API, backed by persistent storage, that bootloaders can use to access and read the kernel program file. It’s then the bootloader’s responsibility to interpret those bytes as the ELF specification deigns.
Once we’re running in the kernel, however, we can no longer rely on any facilities offered by UEFI: these are no longer available once the bootloader exits.
So where does the kernel load files from?
Well, once the operating system has finished starting up, the system might have a file server daemon that coordinates with a persistent storage driver. These services would fulfill the function of reading an ELF buffer from a file, then passing the buffer to the kernel to load into a new task context. But how would these daemons and drivers themselves be loaded?
We need some kind of trick that’s specific to the boot process, to bootstrap✱ the system into a more stable state. When the bootloader maps the kernel ELF from persistent storage using UEFI facilities, we can have it also map a special file that contains the programs the kernel needs to be available before the system can continue further. This special file might be represented a conceptual ‘disk’, loaded into RAM, that contains the code and data needed to complete system initialisation. Thus, this special file is referred to as an initialisation ramdisk (or initrd, if you’re cool).
✱ Note
So! The responsibilities granted to our bootloader have expanded✱: not only is it responsible for loading the kernel ELF program file into memory, it must also load the kernel’s initrd, which contains the necessary programs to continue the boot. The kernel will then know how to read the initrd’s contents, and to launch the boot programs contained therein.
✱ Note
What format is this initrd? It surely isn’t ELF, since ELF is only for structuring a single program file, whereas our initrd will contain multiple program files (as well as possibly other things, such as images).
We’re in luck: since the initrd image is only intended to be consumed by the kernel, we can make its format anything we like! Before we consider the structure we’ll use, it’s important to consider exactly what we’d like to accomplish with our initrd.
Within the axle source tree, we’ve got a multitude of source code files, each of which corresponds to a build product. One of these build products might be the kernel, while another might be a userspace application like a browser. Yet another might be one of these programs that the kernel must launch at boot-time as part of the bootstrapping process. Once we’ve compiled all the source code files into build products, we might generate a directory of program files which we’d like to make available to the kernel as an initrd to facilitate the bootstrap at boot-time:
[Initrd directory image]
When we’re done, we’d like a file something like this to pop out the other side:
[Initrd generated image]
Within this singular initrd file, we’d like to embed all of the input files that we wanted to make available to the kernel at boot-time. This initrd file, along with the kernel file and bootloader file, will be flashed to the persistent media (like a USB or hard-drive) connected to the computer that axle will boot on. axle’s bootloader will then load the kernel and initrd, and tell the kernel about where the initrd is located, so that the kernel can make use of it.
Coming back, we’d like to come up with some format to structure the contents of this initrd, such that we can retrieve the individual files that are contained within it. One structure for an initrd, commonly found floating around the OSDev literature, is a ’table-of-contents’ approach. This approach was popularized by a series of kernel development tutorials, and it’s what we’ll start out with here.
Our file header will begin with something like the following✱:
✱ Note
struct initrd_header {
num_files: u32,
}
Directly after that, we’ll have an entry for each file describing its attributes (with num_files
entries in total):
struct file_within_initrd_desc {
name: [char; 64],
start_offset: usize,
file_size: usize
}
Already, we’re boxing ourselves into some major limitations - can you spot them? Here’s a few that stick out:
- File names can’t be longer than 64 bytes
- Directories aren’t supported; everything lives in a top-level folder
- The filesystem is read-only and can’t be modified once generated
Some of these are inherent to what we’re trying to do; there’s not much point in the initrd being modifiable, since it’s explicitly intended as a sealed system volume that’s generated when the OS release is built. However, there’s no good reason to disallow ourselves the niceities of structured directories and arbitrary-length file names. For now, though, this’ll have to do.
After the series of file_within_initrd_desc
structures, the initrd will contain the raw data of each of the files, glued back-to-back-to-back. To read a given file, the kernel will need to iterate the initrd’s header, scanning for a file_within_initrd_desc
structure that matches the file name we’d like to read, then will extract the right bytes from the file sandwiched within the larger initrd image.
Okay! To turn this initrd from concept into reality, we’ll need two pieces of code:
-
A generator that runs on our development machine (i.e. my Mac) to turn a folder of build products into an initrd image.
-
A decoder inside the OS that can take the initrd and read file data out of it.
Of course, these two components will need to exactly agree on the format of the initrd file. For now, we’ll use our human-eyes to make sure our structure definitions and logic match up, but once again: that’s relatively brittle, and something better will come on the horizon.
Now, let’s track the flow of initrd programs:
-
We generate the OS distribution on the development machine
-
As part of generating the distribution, we transform the directory containing boot-time programs into an initrd file
-
This initrd file is flashed to the UEFI boot media, alongside the kernel and bootloader
-
axle boots up on a machine
-
axle’s bootloader loads the kernel and initrd
-
axle’s bootloader informs the kernel of the initrd’s location
-
axle’s kernel initializes
-
axle’s kernel interprets the initrd and loads the boot services
-
axle’s file server launches
-
Later, a userspace program wants to launch a file within the initrd. It sends a launch request to the file server
-
The file server interprets the initrd, finds the requested file, and sends it to the kernel for launch
Okay, that was a lot! Let’s take a look at a few key points there:
First of all, notice how the bootloader needs to inform the kernel of the memory range where the initrd has been loaded, so that the kernel can interact with it. How is the bootloader going to manage to pass information to the kernel?
Once again, the bootloader and kernel are fully in-house systems: they can coordinate in whatever fashion we like to make things convenient for ourselves. One thing we can do is define another structure that the bootloader will populate with information, then place the memory address of that structure in a register just before jumping to the kernel’s entry point. When the kernel starts running, it’ll know to look in that register for a pointer to this structure containing boot-time information.
struct boot_info {
// ...
initrd_start: usize,
initrd_size: usize,
// ...
}
Great! The bootloader can now reliably tell the kernel exactly where in memory it’s loaded the initrd image.
Going back to our sequence above, one wrinkle in our approach makes itself apparent: both the kernel and the file server need to understand the format of the initrd! This means that any time we’d like to update the format of the initrd, we need to update three places: the generator, the kernel parser, and the file-server parser. Note that each of these three environments are very different, as well: the generator runs on my host Mac, the kernel runs in a bare-metal environment, and the file-server parser has access to the C standard library. This is not a great state of affairs, and we’ll be doing ourselves a big service if we can find some way to write this code only once.
Something better
struct boot_info2 {
initrd_start: usize,
initrd_size: usize,
new_initrd_start: usize,
new_initrd_size: usize,
}
While we’d like to move to a better design, it’d be inconvenient to go back to a from-scratch environment where we can’t finish the boot process. Instead, while we’re iterating on our new initrd, let’s continue to launch from the old initrd. The bootloader will be temporarily updated to load them both and tell the kernel about them. Once we’re ready to make the switch, we can drop the original initrd and all its supporting code.
This post is a draft!