Published: 2016-06-11 (last updated: 2016-06-11)
I was busy writing code, text, and talks, and also spent a week without Internet, during which I ground and brewed 15kg of espresso.
Size of a MirageOS unikernel
There have been lots of claims and myths around the concrete size of MirageOS unikernels. In this article I'll take some measurements, which overapproximate the binary sizes. The tools used for the visualisations are available online, and will hopefully soon be upstreamed into the mirage tool. This article uses mirage-2.9.0 (which might be outdated at the time of reading).
Let us start with a very minimal unikernel, consisting of a single module:

    module Main (C: V1_LWT.CONSOLE) = struct
      let start c = C.log_s c "hello world"
    end

and the following configuration:

    open Mirage

    let () =
      register "console" [
        foreign "Unikernel.Main" (console @-> job) $ default_console
      ]

After mirage configure --unix and mirage build, we end up (at least on a 64bit FreeBSD-11 system with OCaml 4.02.3) with a 2.8MB main.native, dynamically linked against shared libraries (ldd ftw), or a 4.5MB Xen virtual image (built on a 64bit Linux computer).
In the _build directory, we can find some object files and their byte sizes:
     7144 key_gen.o
    14568 main.o
     3552 unikernel.o
These do not sum up to 2.8MB ;)
We did not specify any dependencies ourselves, thus all bits have been injected automatically by the mirage tool. Let us dig a bit deeper into what we actually used. mirage configure generates a Makefile which includes the dependent OCaml libraries, and the packages which are used:
    LIBS = -pkgs functoria.runtime, mirage-clock-unix, mirage-console.unix, mirage-logs, mirage-types.lwt, mirage-unix, mirage.runtime
    PKGS = functoria lwt mirage-clock-unix mirage-console mirage-logs mirage-types mirage-types-lwt mirage-unix
I explained bits of our configuration DSL Functoria earlier. The mirage-clock device is automatically injected by mirage, providing an implementation of the CLOCK device. We use a mirage-console device, where we print the hello world. Since mirage-2.9.0 the logging library (and its reporter, mirage-logs) is automatically injected as well, which actually uses the clock. Also, the mirage type signatures are required. The mirage-unix contains a main, and provides the argument vector argv (all symbols in the …).
Looking into the archive files of those libraries, we end up with ~92KB (NB mirage-types only contains types, and thus no runtime data):
    15268 functoria/functoria-runtime.a
     3194 mirage-clock-unix/mirage-clock.a
    12514 mirage-console/mirage_console_unix.a
    24532 mirage-logs/mirage_logs.a
    14244 mirage-unix/OS.a
    21964 mirage/mirage-runtime.a
This still does not sum up to 2.8MB since we're missing the transitive dependencies.
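To make the gap concrete, the byte sizes from the two listings above can be added up. The snippet below is only a small arithmetic check; the numbers are copied from this article, and the helper name total is made up for illustration.

```ocaml
(* Byte sizes copied from the two listings in this article. *)
let objects = [ "key_gen.o", 7144; "main.o", 14568; "unikernel.o", 3552 ]

let archives = [
  "functoria/functoria-runtime.a", 15268;
  "mirage-clock-unix/mirage-clock.a", 3194;
  "mirage-console/mirage_console_unix.a", 12514;
  "mirage-logs/mirage_logs.a", 24532;
  "mirage-unix/OS.a", 14244;
  "mirage/mirage-runtime.a", 21964;
]

(* Sum the sizes of a list of (name, size) pairs. *)
let total sizes = List.fold_left (fun acc (_, size) -> acc + size) 0 sizes

let () =
  Printf.printf "objects:  %d bytes\n" (total objects);
  Printf.printf "archives: %d bytes\n" (total archives);
  Printf.printf "together: %d bytes\n" (total objects + total archives)
```

Objects and direct library archives together account for roughly 117KB, a small fraction of the 2.8MB binary.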
Visualising recursive dependencies
Let's use a different approach: first recursively find all dependencies. We do this by using ocamlfind to read META files which contain a list of dependent libraries in their requires line. As input we use LIBS from the Makefile snippet above. The code (OCaml script) is available here. The colour scheme is red for pieces of the OCaml distribution, yellow for input packages, and orange for the dependencies.
This is the UNIX version only; the Xen version looks similar (but is worth mentioning).
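The recursive lookup itself is a plain transitive closure over the requires relation. The following is a minimal sketch of that idea, not the script linked above: the requires function is a hypothetical in-memory stand-in for what ocamlfind reads from the META files, and the edges in it are invented for illustration rather than taken from the real packages.

```ocaml
module S = Set.Make (String)

(* Hypothetical stand-in for the requires lines of the META files.
   These edges are invented for illustration only. *)
let requires = function
  | "mirage-console.unix" -> [ "lwt.unix"; "mirage-types.lwt" ]
  | "mirage-logs" -> [ "logs"; "mirage-types.lwt" ]
  | "mirage-types.lwt" -> [ "lwt" ]
  | "lwt.unix" -> [ "lwt"; "unix" ]
  | _ -> []

(* Walk the worklist, skipping packages that were already visited, so that
   shared and cyclic dependencies are handled. *)
let rec closure requires seen = function
  | [] -> seen
  | pkg :: rest when S.mem pkg seen -> closure requires seen rest
  | pkg :: rest -> closure requires (S.add pkg seen) (requires pkg @ rest)

let () =
  closure requires S.empty [ "mirage-console.unix"; "mirage-logs" ]
  |> S.iter print_endline
```

The resulting set contains the input packages as well as everything reachable from them, which is the node set of a dependency graph like the one described above.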
While a dependency graph gives the big picture of which libraries a MirageOS unikernel is composed of, we also want to know how many bytes they contribute to the unikernel. The dependency graph only contains the OCaml-level dependencies, but MirageOS additionally has a pkg-config universe of libraries written in C (such as mini-os, openlibm, ...).
We overapproximate the sizes here by assuming that a linker simply concatenates all required object files. This is not true: empirically, the sum of all objects is about twice the actual size of the unikernel.
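Under this assumption the estimate is nothing more than the sum of the file sizes of all linker inputs. A minimal sketch using only the standard library follows; the function names are made up, and the demo runs on two throw-away files standing in for object files.

```ocaml
(* Size of a file in bytes. *)
let file_size path =
  let ic = open_in_bin path in
  let size = in_channel_length ic in
  close_in ic;
  size

(* Upper bound on the binary size: pretend the linker concatenates its inputs. *)
let overapproximate paths =
  List.fold_left (fun acc path -> acc + file_size path) 0 paths

let () =
  (* Create two temporary files with known contents for the demo. *)
  let make contents =
    let path = Filename.temp_file "obj" ".o" in
    let oc = open_out_bin path in
    output_string oc contents;
    close_out oc;
    path
  in
  let paths = [ make "0123456789"; make "abcdef" ] in
  Printf.printf "%d bytes\n" (overapproximate paths);
  List.iter Sys.remove paths
```

In practice the list of paths would be the object files and archives handed to the linker.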
I developed a pie chart visualisation, but a friend of mine reminded me that the human brain is pretty bad at comparing the slices of such a chart. I spent some more time developing a treemap visualisation to satisfy the brain. The implemented algorithm is based on squarified treemaps, but does not use implicit mutable state. In addition, the provided script parses common linker flags (-o -L -l) and collects arguments to be linked in. It can be passed to ocamlopt as the C linker; more instructions are at the end of treemap.ml (which should be cleaned up and integrated into the mirage tool, as mentioned earlier).
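For readers unfamiliar with squarified treemaps, here is a compact sketch of the greedy layout rule written without mutable state. It is not the code from treemap.ml: all names are made up, and it assumes that the areas are positive, already sorted in decreasing order, and sum up to the area of the enclosing rectangle.

```ocaml
type rect = { x : float; y : float; w : float; h : float }

let sum = List.fold_left ( +. ) 0.

(* Worst aspect ratio of a row of areas laid out along a side of length [side]. *)
let worst row side =
  match row with
  | [] -> infinity
  | first :: _ ->
    let s = sum row in
    let rmax = List.fold_left max first row
    and rmin = List.fold_left min first row in
    let side2 = side *. side and s2 = s *. s in
    max (side2 *. rmax /. s2) (s2 /. (side2 *. rmin))

(* Place [row] as a strip along the shorter side of [r]; return the placed
   rectangles and the part of [r] that is still free. *)
let layout_row r row =
  let s = sum row in
  if r.w >= r.h then begin
    (* vertical strip on the left *)
    let strip = s /. r.h in
    let place (y, acc) a =
      let h = a /. strip in
      (y +. h, { x = r.x; y; w = strip; h } :: acc)
    in
    let _, placed = List.fold_left place (r.y, []) row in
    (List.rev placed, { r with x = r.x +. strip; w = r.w -. strip })
  end else begin
    (* horizontal strip at the top *)
    let strip = s /. r.w in
    let place (x, acc) a =
      let w = a /. strip in
      (x +. w, { x; y = r.y; w; h = strip } :: acc)
    in
    let _, placed = List.fold_left place (r.x, []) row in
    (List.rev placed, { r with y = r.y +. strip; h = r.h -. strip })
  end

(* Extend the current row (kept in reverse order) while that does not make
   its worst aspect ratio larger; otherwise fix the row and start a new one
   in the remaining free rectangle. *)
let rec squarify r row areas =
  let side = min r.w r.h in
  match areas with
  | [] -> if row = [] then [] else fst (layout_row r (List.rev row))
  | a :: rest ->
    if worst (a :: row) side <= worst row side then
      squarify r (a :: row) rest
    else
      let placed, free = layout_row r (List.rev row) in
      placed @ squarify free [] areas

let () =
  squarify { x = 0.; y = 0.; w = 6.; h = 4. } [] [ 6.; 6.; 4.; 3.; 2.; 2.; 1. ]
  |> List.iter (fun r ->
         Printf.printf "x=%.2f y=%.2f w=%.2f h=%.2f\n" r.x r.y r.w r.h)
```

The state that an imperative version would keep in mutable variables (the current row and the free rectangle) is threaded through the recursion as arguments instead. For byte sizes, the inputs would first be scaled so that their sum equals the area of the drawing canvas.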
As mentioned above, this is an overapproximation. The libgcc.a is only needed on Xen (see this comment); I have not yet tracked down why there is a libasmrun.a and a …
More complex examples
Besides the hello world, I used the same tools on our BTC Piñata.
OCaml does not yet do dead code elimination, but there is a PR based on the flambda middle-end which does so. I haven't yet investigated numbers using that branch.
Those counting statistics could go into more detail (e.g. using nm to count the sizes of concrete symbols - which opens the possibility to see which symbols are present in the objects, but not in the final binary). Also, collecting the numbers for each module in a library would be great to have. In the end, it would be great to easily spot the source fragments which are responsible for a huge binary size (and getting rid of them).
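As a starting point for the nm idea, the sketch below parses lines in the four-column layout that GNU nm prints with the -S (--print-size) option: value, size, type, name, with value and size in hexadecimal. This is an assumption about the output format rather than something measured for this article, and the sample lines and symbol names are invented.

```ocaml
(* Parse one line of nm -S output. Symbols without a size column, such as
   undefined ones, yield None. *)
let parse_line line =
  match
    List.filter (fun s -> s <> "") (String.split_on_char ' ' (String.trim line))
  with
  | [ _value; size; _kind; name ] ->
    (match int_of_string_opt ("0x" ^ size) with
     | Some bytes -> Some (name, bytes)
     | None -> None)
  | _ -> None

(* Collect all sized symbols, largest first. *)
let symbols lines =
  List.fold_left
    (fun acc line ->
       match parse_line line with Some s -> s :: acc | None -> acc)
    [] lines
  |> List.sort (fun (_, a) (_, b) -> compare b a)

(* Invented sample input; real input would come from running nm on an object. *)
let sample = [
  "0000000000000000 000000000000002a T camlUnikernel__start";
  "0000000000000030 0000000000000010 D camlUnikernel__data";
  "                 U caml_alloc";
]

let () =
  let syms = symbols sample in
  List.iter (fun (name, bytes) -> Printf.printf "%8d %s\n" bytes name) syms;
  Printf.printf "%8d total\n"
    (List.fold_left (fun acc (_, bytes) -> acc + bytes) 0 syms)
```

Running the same parser over the objects and over the final binary, and comparing the two symbol sets, would show which symbols did not make it into the binary.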