Counting Bytes

Written by hannes
Classified under: mirageos, background
Published: 2016-06-11 (last updated: 2016-06-11)

I was busy writing code, text, and talks, and also spent a week without Internet, during which I ground and brewed 15kg of espresso.

Size of a MirageOS unikernel

There have been lots of claims and myths around the concrete size of MirageOS unikernels. In this article I'll apply some measurements which overapproximate the binary sizes. The tools used for the visualisations are available online, and soon hopefully upstreamed into the mirage tool. This article uses mirage-2.9.0 (which might be outdated at the time of reading).

Let us start with a very minimal unikernel, consisting of a unikernel.ml:

module Main (C: V1_LWT.CONSOLE) = struct
  let start c = C.log_s c "hello world"
end

and the following config.ml:

open Mirage

let () =
  register "console" [
    foreign "Unikernel.Main" (console @-> job) $ default_console
  ]

If we mirage configure --unix and mirage build, we end up (at least on a 64bit FreeBSD-11 system with OCaml 4.02.3) with a 2.8MB main.native, dynamically linked against libthr, libm and libc (ldd ftw), or a 4.5MB Xen virtual image (built on a 64bit Linux computer).

In the _build directory, we can find some object files and their byte sizes:

 7144 key_gen.o
14568 main.o
 3552 unikernel.o

These do not sum up to 2.8MB ;)

We did not specify any dependencies ourselves, thus all bits have been injected automatically by the mirage tool. Let us dig a bit deeper into what we actually used. mirage configure generates a Makefile which includes the dependent OCaml libraries, and the packages which are used:

LIBS   = -pkgs functoria.runtime, mirage-clock-unix, mirage-console.unix, mirage-logs, mirage-types.lwt, mirage-unix, mirage.runtime
PKGS   = functoria lwt mirage-clock-unix mirage-console mirage-logs mirage-types mirage-types-lwt mirage-unix

I explained bits of our configuration DSL Functoria earlier. The mirage-clock device is automatically injected by mirage, providing an implementation of the CLOCK device. We use a mirage-console device, where we print the hello world. Since mirage-2.9.0 the logging library (and its reporter, mirage-logs) is automatically injected as well, which actually uses the clock. Also, the mirage type signatures are required. The mirage-unix package contains a sleep, a main, and provides the argument vector argv (all symbols in the OS module).

Looking into the archive files of those libraries, we end up with ~92KB (NB mirage-types only contains types, and thus no runtime data):

15268 functoria/functoria-runtime.a
 3194 mirage-clock-unix/mirage-clock.a
12514 mirage-console/mirage_console_unix.a
24532 mirage-logs/mirage_logs.a
14244 mirage-unix/OS.a
21964 mirage/mirage-runtime.a

This still does not sum up to 2.8MB since we're missing the transitive dependencies.
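The ~92KB figure can be double-checked by summing the archive sizes from the listing above. This is only a small sketch of the arithmetic; the names `archives` and `total` are made up for this example.

```ocaml
(* Archive sizes in bytes, copied from the listing above. *)
let archives = [
  "functoria/functoria-runtime.a", 15268;
  "mirage-clock-unix/mirage-clock.a", 3194;
  "mirage-console/mirage_console_unix.a", 12514;
  "mirage-logs/mirage_logs.a", 24532;
  "mirage-unix/OS.a", 14244;
  "mirage/mirage-runtime.a", 21964;
]

(* Sum of all archive sizes. *)
let total = List.fold_left (fun acc (_, size) -> acc + size) 0 archives

let () =
  (* Round to the nearest KB, taking 1 KB = 1000 bytes. *)
  Printf.printf "%d bytes (~%dKB)\n" total ((total + 500) / 1000)
```

That is 91716 bytes, a small fraction of the 2.8MB binary.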

Visualising recursive dependencies

Let's use a different approach: first recursively find all dependencies. We do this by using ocamlfind to read META files which contain a list of dependent libraries in their requires line. As input we use LIBS from the Makefile snippet above. The code (OCaml script) is available here. The colour scheme is red for pieces of the OCaml distribution, yellow for input packages, and orange for the dependencies.
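The recursive lookup boils down to a transitive closure over the requires relation. Here is a minimal sketch of that idea, not the script mentioned above: the `requires` function is a hypothetical stand-in for what ocamlfind reads from META files, and the library names `a` to `d` are placeholders rather than real packages.

```ocaml
module S = Set.Make (String)

(* Hypothetical direct dependencies, standing in for the `requires` line
   of each META file. Placeholder names, not real libraries. *)
let requires = function
  | "a" -> [ "b"; "c" ]
  | "b" -> [ "c"; "d" ]
  | "d" -> [ "a" ] (* a cycle, to show that the traversal still terminates *)
  | _ -> []

(* Depth-first traversal; [seen] makes sure each library is visited once. *)
let rec closure seen lib =
  if S.mem lib seen then seen
  else List.fold_left closure (S.add lib seen) (requires lib)

(* All libraries reachable from [roots], including the roots themselves. *)
let all_deps roots = List.fold_left closure S.empty roots

let () =
  all_deps [ "a" ] |> S.elements |> String.concat " " |> print_endline
```

In the real setting, the roots would be the entries of LIBS from the Makefile snippet above.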

This is the UNIX version only; the Xen version looks similar (but is worth mentioning).

You can spot at the right that mirage-bootvar uses re, which provoked me to open a PR, but Jon Ludlam already had a nicer PR which is now merged (and a new release is in preparation).

Counting bytes

While a dependency graph gives a big picture of the libraries composed into a MirageOS unikernel, we also want to know how many bytes they contribute to the unikernel. The dependency graph only contains the OCaml-level dependencies, but MirageOS has in addition to that a pkg-config universe of the libraries written in C (such as mini-os, openlibm, ...).

We overapproximate the sizes here by assuming that a linker simply concatenates all required object files. This is not what happens in practice: empirically, the sum of all objects is about twice the actual size of the unikernel.
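The overapproximation itself is easy to sketch: add up the on-disk size of every file that would be handed to the linker. The following is an illustration under that assumption, using only the OCaml standard library; `file_size` and `total_size` are names invented for this example and are not taken from the actual tooling.

```ocaml
(* Size of a file in bytes, or 0 if it cannot be opened. *)
let file_size path =
  match open_in_bin path with
  | exception Sys_error _ -> 0
  | ic ->
    let n = in_channel_length ic in
    close_in ic;
    n

(* Upper bound on the contribution of [paths]: the plain sum of their sizes,
   as if the linker concatenated them without dropping anything. *)
let total_size paths =
  List.fold_left (fun acc path -> acc + file_size path) 0 paths

let () =
  (* Object and archive files are taken from the command line. *)
  let paths = List.tl (Array.to_list Sys.argv) in
  Printf.printf "%d bytes in %d files\n" (total_size paths) (List.length paths)
```

Compiled to an executable, it could be run on the object and archive files of a build to get the upper bound.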

I developed a pie chart visualisation, but a friend of mine reminded me that such a chart is pretty useless, since the human brain is bad at comparing slices. I spent some more time developing a treemap visualisation to satisfy the brain. The implemented algorithm is based on squarified treemaps, but does not use implicit mutable state. In addition, the provided script parses common linker flags (-o -L -l) and collects arguments to be linked in. It can be passed to ocamlopt as the C linker; more instructions are at the end of treemap.ml (which should be cleaned up and integrated into the mirage tool, as mentioned earlier).
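To give an idea of the flag handling, here is a rough sketch of how -o, -L and -l could be collected from a linker command line. It is not the code from treemap.ml: the record type, the function names, and the choice to silently ignore all other flags are assumptions made for this example.

```ocaml
type link_args = {
  output : string option;    (* argument of -o *)
  search_dirs : string list; (* arguments of -L *)
  libs : string list;        (* arguments of -l *)
  objects : string list;     (* remaining non-flag arguments *)
}

let empty = { output = None; search_dirs = []; libs = []; objects = [] }

(* The rest of [s] after [prefix], if [s] starts with [prefix]
   and is strictly longer than it. *)
let strip_prefix prefix s =
  let lp = String.length prefix and ls = String.length s in
  if ls > lp && String.sub s 0 lp = prefix
  then Some (String.sub s lp (ls - lp))
  else None

(* Accepts both the separated ("-o file") and the joined ("-lfoo") form. *)
let rec parse acc = function
  | [] ->
    { acc with
      search_dirs = List.rev acc.search_dirs;
      libs = List.rev acc.libs;
      objects = List.rev acc.objects }
  | "-o" :: file :: rest -> parse { acc with output = Some file } rest
  | "-L" :: dir :: rest ->
    parse { acc with search_dirs = dir :: acc.search_dirs } rest
  | "-l" :: lib :: rest -> parse { acc with libs = lib :: acc.libs } rest
  | arg :: rest ->
    match strip_prefix "-o" arg, strip_prefix "-L" arg, strip_prefix "-l" arg with
    | Some file, _, _ -> parse { acc with output = Some file } rest
    | _, Some dir, _ ->
      parse { acc with search_dirs = dir :: acc.search_dirs } rest
    | _, _, Some lib -> parse { acc with libs = lib :: acc.libs } rest
    | None, None, None ->
      (* Any other flag is ignored; everything else counts as an object. *)
      if arg <> "" && arg.[0] = '-' then parse acc rest
      else parse { acc with objects = arg :: acc.objects } rest
```

Calling `parse empty` on the argument list gives the output name, the search directories, the library names, and the object files, which is the input needed for counting bytes.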

As mentioned above, this is an overapproximation. The libgcc.a is only needed on Xen (see this comment). I have not yet tracked down why there is both a libasmrun.a and a libxenasmrun.a.

More complex examples

Besides the hello world, I used the same tools on our BTC Piñata.

Conclusion

OCaml does not yet do dead code elimination, but there is a PR based on the flambda middle-end which does so. I haven't yet investigated numbers using that branch.

Those counting statistics could go into more detail (e.g. using nm to count the sizes of concrete symbols, which opens the possibility to see which symbols are present in the objects, but not in the final binary). Also, collecting the numbers for each module in a library would be great to have. In the end, it would be great to easily spot the source fragments which are responsible for a huge binary size (and to get rid of them).
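As a starting point for the nm idea, here is a small sketch which extracts symbol sizes from lines in the shape that GNU nm prints with -S (address, size in hexadecimal, type, name). The sample lines are made up for illustration and are not measurements; the function names are invented, and the code assumes OCaml 4.08 or later for `List.filter_map`.

```ocaml
(* Split on spaces and drop empty fields. *)
let fields line =
  String.split_on_char ' ' line |> List.filter (fun s -> s <> "")

(* (name, size) for lines of the form "<addr> <size> <type> <name>" with a
   hexadecimal <size>; None for anything else, e.g. undefined symbols. *)
let parse_line line =
  match fields line with
  | [ _addr; size; _ty; name ] ->
    (match int_of_string_opt ("0x" ^ size) with
     | Some n -> Some (name, n)
     | None -> None)
  | _ -> None

(* Made-up sample lines, for illustration only. *)
let sample = [
  "0000000000001000 0000000000000020 T example_start";
  "0000000000002000 0000000000000100 D example_table";
  "                 U example_undefined";
]

let symbols = List.filter_map parse_line sample

let () =
  (* Largest symbols first. *)
  List.sort (fun (_, a) (_, b) -> compare b a) symbols
  |> List.iter (fun (name, size) -> Printf.printf "%8d %s\n" size name)
```

Running the same extraction on the object files and on the final binary, and comparing the two sets of names, would show which symbols the linker dropped.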

I'm interested in feedback, either via twitter or as an issue on the data repository on GitHub.