Operating systems

Written by hannes
Published: 2016-04-09 (last updated: 2017-01-24)

Sorry to be late with this entry, but I had to fix some issues.

What is an operating system?

Wikipedia says: "An operating system (OS) is system software that manages computer hardware and software resources and provides common services for computer programs." Great. In other words, it is an abstraction layer. Applications don't need to deal with the low-level bits (device drivers) of the computer.

But if we look at the landscape of deployed operating systems, there is a lot more going on than abstracting devices: usually this includes process management (scheduler), memory management (virtual memory), a C library, user management (including access control), persistent storage (file system), a network stack, etc. Apart from the C library, which lives in user space, these are typically part of the kernel and executed in kernel space. A counterexample is Minix, which consists of a tiny microkernel and executes the above-mentioned services as user-space processes.

We are (or at least I am) interested in robust systems. Development is done by humans, and thus will always be error-prone. Even a proof of a system's functional correctness can be flawed if the proof system is inconsistent or the specification is wrong. We need to have damage control in place by striving for the principle of least authority. The goods to guard are the user data (passwords, personal information, private mails, ...), which live in memory.

A CPU contains protection rings, where the kernel runs in ring 0 and thus has full access to the hardware, including memory. A flaw in the kernel is devastating for the security of the entire system (it is part of the trusted computing base). Every byte of kernel code should be carefully developed and audited. If we can confine code to areas with less authority, we should do so. Obviously, the mechanism to confine code needs to be carefully audited as well, since it will likely need to run in privileged mode.

In a virtualised world, we run a hypervisor in ring -1, on top of which we run an operating system kernel. The hypervisor gives access to memory and hardware to virtual machines, schedules those virtual machines on processors, and should isolate the virtual machines from each other (by using the MMU).

there's no cloud, just other people's computers

This ominous "cloud" uses hypervisors on a huge number of physical machines, and executes off-the-shelf operating systems as virtual machines on top. Accounting is done by resource usage (time, bandwidth, storage).

From scratch

Ok, now we have hypervisors which already deal with memory and scheduling. Why should we have the very same functionality again in the (general purpose) operating system running as a virtual machine?

Additionally, earlier in my life (back in 2005 at the Dutch hacker camp "What the hack") I proposed (together with Andreas Bogk) to phase out UNIX before 2038-01-19 (this is when a signed 32-bit time_t overflows, unless it is promoted to 64 bits), and replace it with Dylan. A random comment about our talk on the Internet is "the proposal that rewriting an entire OS in a language with obscure syntax was somewhat original. However, I now somewhat feel a strange urge to spend some time on Dylan, which is really weird..."
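To see where that date comes from, here is a small sketch that converts the largest value of a signed 32-bit counter of seconds since 1970-01-01 into a UTC timestamp. It assumes a 64-bit OCaml (so that the value fits into a native int); the helper names are made up for this post, and the day-to-date conversion is the well-known "civil from days" arithmetic rather than anything taken from a UNIX libc.

```ocaml
(* Largest value a signed 32-bit time_t can hold: 2^31 - 1 seconds.
   Assumes a 64-bit OCaml, where this fits into a native int. *)
let max_time32 = Int32.to_int Int32.max_int

(* Days since 1970-01-01 -> (year, month, day) in the Gregorian calendar.
   Only intended for non-negative day counts. *)
let civil_of_days days =
  let z = days + 719468 in
  let era = z / 146097 in
  let doe = z - (era * 146097) in
  let yoe = (doe - (doe / 1460) + (doe / 36524) - (doe / 146096)) / 365 in
  let doy = doe - ((365 * yoe) + (yoe / 4) - (yoe / 100)) in
  let mp = ((5 * doy) + 2) / 153 in
  let d = doy - (((153 * mp) + 2) / 5) + 1 in
  let m = if mp < 10 then mp + 3 else mp - 9 in
  let y = yoe + (era * 400) + if m <= 2 then 1 else 0 in
  (y, m, d)

(* Seconds since the epoch -> "YYYY-MM-DD hh:mm:ss" (UTC, no leap seconds). *)
let utc_of_seconds s =
  let y, m, d = civil_of_days (s / 86400) in
  let r = s mod 86400 in
  Printf.sprintf "%04d-%02d-%02d %02d:%02d:%02d" y m d (r / 3600)
    ((r mod 3600) / 60)
    (r mod 60)

(* One second later, a 32-bit counter wraps around to the most negative value. *)
let wrapped = Int32.add Int32.max_int 1l

let () = print_endline (utc_of_seconds max_time32) (* 2038-01-19 03:14:07 *)
```

The last representable second is 03:14:07 UTC on that day; the next increment wraps to Int32.min_int, which a 32-bit time_t would interpret as a date in 1901.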

Being without funding back then, we didn't get far (our biggest success was a TCP/IP stack in Dylan), and as mentioned earlier I went into formal methods and mechanised proofs of full functional correctness properties.

MirageOS

At the end of 2013, David pointed me to MirageOS, an operating system developed from scratch in the functional and statically typed language OCaml. I had not used much OCaml before, but some other functional programming languages. Since then, I have spent nearly every day developing OCaml libraries (with varying success on being happy with my code). In contrast to Dylan, there are more than two people developing MirageOS.

The idea is straightforward: use a hypervisor, and its hardware abstractions (virtualised input/output and network device), and execute the OCaml runtime directly on it. No C library included (since May 2015, see this thread). The virtual machine, based on the OCaml runtime and composed of OCaml libraries, uses a single address space and runs in ring 0.

As mentioned above, all code which runs in ring 0 needs to be carefully developed and checked, since a flaw in it can jeopardise the security properties of the entire system: the TCP/IP library should not have access to the private key used for the TLS handshake. If we trust the OCaml runtime, especially its memory management, there is no way for the TCP/IP library to access the memory of the TLS subsystem: the TLS API does not expose the private key via an API call, and being in a memory safe language, a library cannot read arbitrary memory. There is no real need to isolate each library into a separate address space. In my opinion, using capabilities for memory access would be a great improvement, similar to Barrelfish. OCaml has a C foreign function interface which can be used to read arbitrary memory, so you have to take care that all C bits of the system are not malicious. Fortunately, it is difficult to embed C code into MirageOS, thus only a few bits of MirageOS are written in C (such as the loop- and allocation-free crypto primitives). To further read up on the topic, there is a nice article about the security.
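A minimal sketch of how an abstract type keeps a secret away from other libraries in the same address space. All names here (TLS_KEY, Tls_key, network_send) are invented for illustration and are not the API of the OCaml TLS library, and the "signature" is a toy built on the standard library's Digest module, not real cryptography.

```ocaml
(* The only view other libraries get of the key: an abstract type and the
   operations the key owner chooses to offer. *)
module type TLS_KEY = sig
  type t
  val of_secret : string -> t
  val sign : t -> string -> string
end

module Tls_key : TLS_KEY = struct
  (* The representation is visible only inside this module. *)
  type t = string

  let of_secret s = s

  (* Toy stand-in for a signature: hex digest of key and message. *)
  let sign key msg = Digest.to_hex (Digest.string (key ^ msg))
end

(* A stand-in for code in another library, e.g. the network stack. It can
   ask for a signature, but it cannot look inside the key: writing
   [String.length key] here is rejected by the type checker, because
   [Tls_key.t] is not known to be a string outside of [Tls_key]. *)
let network_send (key : Tls_key.t) (payload : string) : string =
  payload ^ "|" ^ Tls_key.sign key payload
```

The guarantee is only as strong as the assumptions stated above: the runtime has to be trusted, and C stubs (or unsafe escape hatches) sit outside of what the type checker can enforce.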

This website is 12MB in size (and I didn't even bother to strip it yet), which includes the static CSS and JavaScript (bootstrap, jquery, fonts), HTTP, TLS (also X.509, ASN.1, crypto), git (and irmin), TCP/IP libraries. The memory management in MirageOS is straightforward: the hypervisor provides the OCaml runtime with a chunk of memory, and the runtime immediately takes all of it.

This is much simpler to configure and deploy than a UNIX operating system: There is no virtual memory, no process management, no file system (the markdown content is held in memory with irmin!), no user management in the image.

At compile (configuration) time, the TLS keys are baked into the image, in addition to the URL of the remote git repository and the IPv4 address and ports the image should use. The full command line for configuring this website is:

mirage configure --no-opam --xen -i Posts -n "full stack engineer" -r https://github.com/hannesm/hannes.nqsb.io.git --dhcp false --network 0 --ip 198.167.222.205 --netmask 255.255.255.0 --gateways 198.167.222.1 --tls 443 --port 80

It relies on the fact that the TLS certificate chain and private key are in the tls/ subdirectory, which is transformed to code and included in the image (using crunch). An improvement would be to use an ELF section, but there is no code for that yet. After configuring and installing the required dependencies, a make builds the statically linked image.
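To give an idea of what "transformed to code" means, here is a hand-written approximation of a module that embeds file contents as string constants. It is not the actual output of crunch; the module name, the file names, and the placeholder contents are assumptions made up for this sketch.

```ocaml
(* Hypothetical stand-in for a generated module: each file of a directory
   becomes a string constant that is linked into the image. *)
module Static_files = struct
  (* File names and contents are placeholders, not real key material. *)
  let table =
    [ ("server.pem", "-----BEGIN CERTIFICATE-----\n(placeholder)\n");
      ("server.key", "(placeholder key)") ]

  (* Look up a file by name; there is no file system involved at run time. *)
  let read name = List.assoc_opt name table

  (* Size in bytes of an embedded file, if it exists. *)
  let size name = Option.map String.length (read name)
end
```

The consequence is the one described above: whoever can read the image can read the key, and changing the certificate means building a new image.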

Deployment is done via xl create canopy.xl. The file canopy.xl is automatically generated by mirage configure (but might need modifications). It contains the full path to the image, the name of the bridge interface, and how much memory the image can use:

name = 'canopy'
kernel = 'mir-canopy.xen'
builder = 'linux'
memory = 256
on_crash = 'preserve'
vif = [ 'bridge=br0' ]

To rephrase: instead of running on a multi-purpose operating system including processes, file system, etc., this website uses a set of libraries, which are compiled and statically linked into the virtual machine image.

MirageOS uses the module system of OCaml to define how interfaces should be, thus an application developer does not need to care whether they are using the TCP/IP stack written in OCaml, or the sockets API of a UNIX operating system. This also allows you to compile and debug your library on UNIX using off-the-shelf tools before deploying it as a virtual machine (NB: this is a lie, since there is code which is only executed when running on Xen, and this code can be buggy) ;).
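The idea can be sketched with a module signature and a functor. The signature STACK and both implementations below are deliberately tiny and invented for this post; the real MirageOS signatures are much richer, and these stubs only tag a payload instead of sending anything.

```ocaml
(* What the application is allowed to assume about "the network". *)
module type STACK = sig
  val name : string
  val send : string -> string (* returns what would go on the wire *)
end

(* Stub standing in for the sockets API of a UNIX host. *)
module Unix_stack : STACK = struct
  let name = "unix-sockets"
  let send payload = "[" ^ name ^ "] " ^ payload
end

(* Stub standing in for a TCP/IP stack written in OCaml. *)
module Ocaml_stack : STACK = struct
  let name = "ocaml-tcpip"
  let send payload = "[" ^ name ^ "] " ^ payload
end

(* The application is written once, against the signature only. *)
module Server (S : STACK) = struct
  let respond path = S.send ("200 OK " ^ path)
end

(* Which stack is used is decided when the functor is applied. *)
module On_unix = Server (Unix_stack)
module On_xen = Server (Ocaml_stack)
```

Server never mentions a concrete stack, so the same application code can be instantiated for development on UNIX and for deployment as a virtual machine.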

Most of the MirageOS ecosystem is developed under MIT/ISC/BSD licenses, which allow everybody to use it for whichever project they want.

Did I mention that by using less code the attack surface shrinks? In addition to that, using a memory safe programming language, where the developer does not need to care about memory management and bounds checks, immediately removes several classes of security problems (namely spatial and temporal memory issues), provided the runtime is trusted. The OCaml runtime was reviewed by the French Agence nationale de la sécurité des systèmes d’information in 2013, leading to some changes, such as separation of immutable strings (String) from mutable byte vectors (Bytes).
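A short sketch of both points, assuming a reasonably recent compiler where strings are immutable by default and bounds checks have not been disabled with -unsafe. The variable names are only for illustration.

```ocaml
(* Strings are immutable; mutation needs an explicit copy into Bytes. *)
let title = "hello"
let scratch = Bytes.of_string title

(* Mutating the copy leaves the original string untouched. *)
let () = Bytes.set scratch 0 'J'

(* An out-of-bounds access is caught by the runtime and raises an exception
   instead of reading neighbouring memory. *)
let out_of_bounds =
  try
    ignore (Bytes.get scratch 10);
    false
  with Invalid_argument _ -> true
```

After this runs, title is still "hello", scratch holds "Jello", and out_of_bounds is true.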

The attack surface is still big enough: logical issues and resource management remain, and there is no access control. This website does not need access control; publishing of content is protected by relying on GitHub's access control.

I hope I gave some insight into what the purpose of an operating system is, and how MirageOS fits into the picture. I'm interested in feedback, either via twitter or as an issue on the data repository on GitHub.

Other updates in the MirageOS ecosystem

  • this website is based on Canopy, the content is stored as markdown in a git repository
  • it was running in a FreeBSD jail, but when I compiled too much the underlying zfs file system wasn't happy (and is now hanging in kernel space in a read)
  • no remote power switch (lent to a friend 3 weeks ago), nobody was willing to go to the data centre and reboot
  • I wanted to move it anyway to a host where I can deploy Xen guest VMs
  • turns out the Xen compilation and deployment mode needed some love:
  • I was travelling
  • good news: it now works on Xen, and there is an atom feed
  • life of an "eat your own dogfood" full stack engineer ;)