<p><em>Core Internet protocols require operational experiments, even if formally specified</em></p> <p>The <a href="https://en.wikipedia.org/wiki/Transmission_Control_Protocol">Transmission Control Protocol (TCP)</a> is one of the main Internet protocols. Usually spoken on top of the Internet Protocol (legacy version 4 or version 6), it provides a reliable, ordered, and error-checked stream of octets. When an application uses TCP, it gets these properties for free (in contrast to UDP).</p> <p>As is common for Internet protocols, TCP is specified in a series of so-called requests for comments (RFC). The latest revised version from August 2022 is <a href="https://datatracker.ietf.org/doc/html/rfc9293">RFC 9293</a>; the initial one was <a href="https://datatracker.ietf.org/doc/html/rfc793">RFC 793</a> from September 1981.</p> <h1 id="my-brief-personal-tcp-story">My brief personal TCP story</h1> <p>My interest in TCP started back in 2006 when we worked on a <a href="https://github.com/dylan-hackers/network-night-vision">network stack in Dylan</a> (these days abandoned). Ever since, I have wanted to understand the implementation tradeoffs in more detail, including attacks and how to prevent a TCP stack from being vulnerable.</p> <p>In 2012, I attended ICFP in Copenhagen while a PhD student at ITU Copenhagen. There, <a href="https://www.cl.cam.ac.uk/~pes20/">Peter Sewell</a> gave an invited talk &quot;Tales from the jungle&quot; about rigorous methods for real-world infrastructure (C semantics, hardware (concurrency) behaviour of CPUs, TCP/IP, and likely more). 
Working on formal specifications myself (in <a href="https://en.itu.dk/-/media/EN/Research/PhD-Programme/PhD-defences/2013/130731-Hannes-Mehnert-PhD-dissertation-finalpdf.pdf">my dissertation</a>), and having a strong interest in real systems, I was immediately hooked by his perspective.</p> <p>To dive a bit more into <a href="https://www.cl.cam.ac.uk/~pes20/Netsem/">network semantics</a>: the work done on TCP by Peter Sewell et al. is a formal specification (or a model) of TCP/IP and the Unix sockets API, developed in HOL4. It is a labelled transition system with nondeterministic choices, and the model itself is executable. It has been validated against the real world by collecting thousands of traces on Linux, Windows, and FreeBSD, which have been checked by the model for validity. This copes with the fact that implementations interpret the English prose of the RFCs differently. The network semantics research found several issues in existing TCP stacks and reported them upstream to have them fixed (though there still is some special treatment, e.g., for the &quot;BSD listen bug&quot;).</p> <p>In 2014, I joined Peter's research group in Cambridge to continue their work on the model: updating to more recent versions of HOL4 and PolyML, revising the test system to use DTrace, updating to a more recent FreeBSD network stack (from FreeBSD 4.6 to FreeBSD 10), and finally getting the <a href="https://dl.acm.org/doi/10.1145/3243650">journal paper</a> (<a href="http://www.cl.cam.ac.uk/~pes20/Netsem/paper3.pdf">author's copy</a>) published. At the same time, the <a href="https://mirage.io">MirageOS</a> melting pot was happening at the University of Cambridge, where, together with David, I contributed OCaml-TLS and other things.</p> <p>My intention was to understand TCP better and use the specification as a basis for a TCP stack for MirageOS. The <a href="https://github.com/mirage/mirage-tcpip">existing one</a> (which is still used) has technical debt: a high ratio of open issues to lines of code. 
The Lwt monad is ubiquitous, which makes testing and debugging pretty hard, and utilising multiple cores with OCaml Multicore won't be easy either. Plus, it has various resource leaks, and there is no active maintainer. But honestly, it works fine on a local network with well-behaved traffic. It doesn't work that well on the wild Internet, with its variety of broken implementations. Apart from resource leakage, which made me implement things such as restart-on-failure in <a href="https://github.com/robur-coop/albatross">Albatross</a>, there are certain connection states which will never be exited.</p> <h1 id="the-rise-of-µtcp">The rise of <a href="https://github.com/robur-coop/utcp">µTCP</a></h1> <p>Back in Cambridge, I didn't manage to write a TCP stack based on the model, but in 2019, I restarted that work and got µTCP (the formal model manually translated to OCaml) to compile and do TCP session setup and teardown. Since the model uses nondeterminism, it couldn't be translated one-to-one into an executable program; there are places where concrete decisions have to be made. Due to other projects, I worked only briefly in 2021 and 2022 on µTCP, but finally, in the summer of 2023, I motivated myself to push µTCP into a usable state. So far I've spent 25 days in 2023 on µTCP. Thanks to <a href="https://tarides.com">Tarides</a> for supporting my work.</p> <p>Since late August, we have been running some unikernels using µTCP, e.g., the <a href="https://retreat.mirage.io">retreat</a> website. This allows us to observe µTCP and find and solve issues that occur in the real world. It turned out that the model is not always correct (e.g., there is no retransmit timer in the close-wait state, which prevents proper session teardowns). We report statistics about how many TCP connections are in which state to an Influx time series database and view graphs rendered by Grafana. 
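</p>

<p>As an illustration of this bookkeeping, here is a minimal sketch of counting connections per state; this is hypothetical code, not µTCP's actual representation - the state type just mirrors the classic TCP state machine:</p>

```ocaml
(* Hypothetical sketch: counting connections per TCP state, as one would
   before pushing per-state gauges to Influx. Not µTCP's actual types. *)
type tcp_state =
  | Syn_sent | Syn_received | Established
  | Fin_wait_1 | Fin_wait_2 | Close_wait
  | Closing | Last_ack | Time_wait

let state_name = function
  | Syn_sent -> "syn_sent" | Syn_received -> "syn_received"
  | Established -> "established" | Fin_wait_1 -> "fin_wait_1"
  | Fin_wait_2 -> "fin_wait_2" | Close_wait -> "close_wait"
  | Closing -> "closing" | Last_ack -> "last_ack"
  | Time_wait -> "time_wait"

(* fold over all connections, counting how many are in each state *)
let count_states connections =
  List.fold_left (fun acc st ->
      let name = state_name st in
      let count = try List.assoc name acc with Not_found -> 0 in
      (name, count + 1) :: List.remove_assoc name acc)
    [] connections

let () =
  let counts = count_states [ Established; Established; Close_wait ] in
  assert (List.assoc "established" counts = 2);
  assert (List.assoc "close_wait" counts = 1)
```

<p>A connection that stays in the same non-established state across many reporting intervals is exactly the kind of signal such graphs surface.</p>

<p>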
If there are connections that are stuck for multiple hours, this indicates a resource leak that should be addressed. Grafana was tremendously helpful for finding out where to look for resource leaks. Still, there's work to understand the behaviour: look at what the model does, what µTCP does, what the RFC says, and eventually what existing deployed TCP stacks do.</p> <h1 id="the-secondary-nameserver-issue">The secondary nameserver issue</h1> <p>One of our secondary nameservers attempts to receive zones (via AXFR using TCP) from another nameserver that is currently not running. Thus it replies to each SYN packet with a corresponding RST. Below, for our nameserver, I graphed the network utilisation (sent data/packets on the positive y-axis, received on the negative) over time (on the x-axis) on the left, and memory usage (bytes on the y-axis) over time (x-axis) on the right. You can observe that both increase over time, and roughly every 3 hours, the unikernel hits its configured memory limit (64 MB), crashes with <em>out of memory</em>, and is restarted. The graph below is using the <code>mirage-tcpip</code> stack.</p> <p><a href="/static/img/a.ns.mtcp.png"><img src="/static/img/a.ns.mtcp.png" width="750" /></a></p> <p>Now, after switching over to µTCP, graphed below, there's much less network utilisation and the memory limit is only reached after 36 hours, which is a great result. Still, it is not very satisfying that the unikernel leaks memory. On their left side, both graphs contain a few hours of <code>mirage-tcpip</code>; shortly after 20:00 on Nov 23rd, µTCP got deployed.</p> <p><a href="/static/img/a.ns.mtcp-utcp.png"><img src="/static/img/a.ns.mtcp-utcp.png" width="750" /></a></p> <p>Investigating the involved parts showed that an unestablished TCP connection was registered at the MirageOS layer, but on the received RST the pure core did not expose an event that the connection had been cancelled. 
This means the MirageOS layer piles up all the connection attempts, and it doesn't inform the application that the connection couldn't be established. Note that the MirageOS layer is not code derived from the formal model, but boilerplate for (a) effectful side-effects (IO) and (b) meeting the needs of the <a href="https://github.com/mirage/mirage-tcpip/blob/v8.0.0/src/core/tcp.ml">TCP.S</a> module type (so that µTCP can be used as a drop-in replacement for mirage-tcpip). Once this was well understood, developing the <a href="https://github.com/robur-coop/utcp/commit/67fc49468e6b75b96a481ebe44dd11ce4bb76e6c">required code changes</a> was straightforward. The graph shows that the fix was deployed at 15:25. The memory usage is constant afterwards, but the network utilisation increased enormously.</p> <p><a href="/static/img/a.ns.utcp-ev.png"><img src="/static/img/a.ns.utcp-ev.png" width="750" /></a></p> <p>Now, the network utilisation is unwanted. Previously this was hidden by the application waiting forever for the TCP connection to be established. Our bug fix uncovered another issue -- a tight loop:</p> <ul> <li>The nameserver attempts to connect to the other nameserver (<code>request</code>); </li> <li>This results in a <code>TCP.create_connection</code>, which errors after one roundtrip; </li> <li>This leads to a <code>close</code>, which attempts a <code>request</code> again. </li> </ul> <p>This is unnecessary since the DNS server code has a timer to attempt to connect to the remote nameserver periodically (but takes a break between attempts). After understanding this behaviour, we worked on <a href="https://github.com/mirage/ocaml-dns/pull/347">the fix</a> and redeployed the nameserver again. On the left edge, the graph still shows the tight loop (so you have a baseline for comparison), and at 16:05, we deployed the fix. 
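</p>

<p>The loop and its fix can be condensed into a small sketch; the names below are illustrative stand-ins, not the actual ocaml-dns or µTCP API:</p>

```ocaml
(* Illustrative sketch of the tight reconnect loop and the fix.
   [create_connection] stands in for TCP.create_connection against a
   peer that answers every SYN with a RST. *)
let attempts = ref 0

let create_connection () =
  incr attempts;
  Error `Refused  (* the remote replies with a RST: errors immediately *)

(* buggy variant: the close handler immediately requests again, so
   connect/RST/close chase each other in a tight loop *)
let rec buggy_request budget =
  if budget > 0 then
    match create_connection () with
    | Error _ -> buggy_request (budget - 1)
    | Ok () -> ()

(* fixed variant: on error, just give up; the periodic timer already
   present in the DNS server retries later, with a pause in between *)
let fixed_request () =
  match create_connection () with
  | Error _ -> ()
  | Ok () -> ()

let () =
  attempts := 0; buggy_request 10; assert (!attempts = 10);
  attempts := 0; fixed_request (); assert (!attempts = 1)
```

<p>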
Since then, it looks pretty smooth, both in memory usage and in network utilisation.</p> <p><a href="/static/img/a.ns.utcp-fixed.png"><img src="/static/img/a.ns.utcp-fixed.png" width="750" /></a></p> <p>To give you the entire picture, below is the graph where you can spot the <code>mirage-tcpip</code> stack (lots of network, restarting every 3 hours), µTCP without informing the application (run for 3 * ~36 hours), the DNS server with high network utilisation (which only lasted for a brief period, thus it is more a point in the graph), and finally the unikernel with both fixes applied.</p> <p><a href="/static/img/a.ns.all.png"><img src="/static/img/a.ns.all.png" width="750" /></a></p> <h1 id="conclusion">Conclusion</h1> <p>What can we learn from that? Choosing convenient tooling is crucial for effective debugging. Also, fixing one issue may uncover other issues. And of course, the <code>mirage-tcpip</code> stack was also running with the DNS server that had the tight reconnect loop. But, bottom line: should such an application lead to memory leaks? I don't think so. My position is that all core network libraries should work in a non-resource-leaky way with any kind of application on top of them. When one TCP connection returns an error (and thus is destroyed), the TCP stack should no longer hold any resources for that connection.</p> <p>We'll take more time to investigate issues of µTCP in production, plan to write further documentation and blog posts, and hopefully will soon be ready for an initial public release. In the meantime, you can follow our development repository.</p> <p>We at <a href="https://robur.coop">Robur</a> have been working as a collective since 2018, funded by public funding, commercial contracts, and donations. Our mission is to get sustainable, robust, and secure MirageOS unikernels developed and deployed. Running your own digital communication infrastructure should be easy, including trustworthy binaries and smooth upgrades. 
You can help us continue our work by <a href="https://aenderwerk.de/donate/">donating</a> (select Robur from the drop-down or put &quot;donation Robur&quot; in the purpose of the bank transfer).</p> <p>If you have any questions, reach us best via eMail to team AT robur DOT coop.</p> <p><em>&quot;Redeveloping TCP from the ground up&quot;, by hannes, 2023-11-29</em></p> <hr/> <p><em>Fleet management for MirageOS unikernels using a mutually authenticated TLS handshake</em></p> <p>EDIT (2023-05-16): Updated with albatross release version 2.0.0.</p> <h2 id="deploying-mirageos-unikernels">Deploying MirageOS unikernels</h2> <p>More than five years ago, I posted <a href="/Posts/VMM">how to deploy MirageOS unikernels</a>. My motivation to work on this topic is that I'm convinced of the reduced complexity, improved security, and more sustainable resource footprint of MirageOS unikernels, and want to ease deployment thereof. More than one year ago, I described <a href="/Posts/Deploy">how to deploy reproducible unikernels</a>.</p> <h2 id="albatross">Albatross</h2> <p>In recent months we worked hard on the underlying infrastructure: <a href="https://github.com/roburio/albatross">albatross</a>. Albatross is the orchestration system for MirageOS unikernels that use solo5 with the <a href="https://github.com/Solo5/solo5/blob/master/docs/architecture.md">hvt or spt tender</a>. It deals with three tasks:</p> <ul> <li>unikernel creation (destruction, restart) </li> <li>capturing console output </li> <li>collecting metrics in the host system about unikernels </li> </ul> <p>An addition to the above is dealing with multiple tenants on the same machine: remote management of your unikernel fleet via TLS, and resource policies.</p> <h2 id="history">History</h2> <p>The initial commit of albatross was in May 2017. Back then, it replaced the shell scripts and manual <code>scp</code> of unikernel images to the server. Over time it evolved and adapted to new environments. 
Initially a solo5 unikernel would only know of a single network interface; these days there can be multiple, distinguished by name. Initially there was no support for block devices. Only FreeBSD was supported in the early days. Nowadays we build daily packages for Debian, Ubuntu, and FreeBSD, have support for NixOS, and the client side is supported on macOS as well.</p> <h3 id="asn.1">ASN.1</h3> <p>The communication format between the albatross daemons and clients was changed multiple times. I'm glad that albatross uses ASN.1 as its communication format, which makes extension with optional fields easy, and also allows a &quot;choice&quot; (a sum type) to be untagged (the binary representation is the same as without the choice type); thus adding a choice to an existing grammar, while preserving the old encoding in the default (untagged) case, is a decent solution.</p> <p>So, if you care about backward and forward compatibility, as we do -- since we may be in control of which albatross servers are deployed on our machines, but not which albatross versions the clients are using -- it may be wise to look into ASN.1. Recent efforts (JSON with schema, ...) may solve similar issues, but ASN.1 encodings are also very small.</p> <h2 id="what-resources-does-a-unikernel-need">What resources does a unikernel need?</h2> <p>A unikernel is just an operating system for a single service; there can't be much it needs.</p> <h3 id="name">Name</h3> <p>So, first of all, a unikernel has a name, or a handle. This is useful for reporting statistics, but also to specify which console output you're interested in. The name is a string of printable ASCII characters (plus dash '-' and dot '.'), with a length of up to 64 characters - so yes, you can use a UUID if you like.</p> <h3 id="memory">Memory</h3> <p>Another resource is the amount of memory assigned to the unikernel. 
This is specified in megabytes (as solo5 does), with the range being 10 (below that, not even a hello world wants to start) to 1024.</p> <h3 id="arguments">Arguments</h3> <p>Of course, you can pass boot parameters to the unikernel via albatross. Albatross doesn't impose any restrictions here, but the lower levels may.</p> <h3 id="cpu">CPU</h3> <p>Due to multiple tenants and side channel attacks, right from the beginning it looked like a good idea to restrict each unikernel to a specific CPU. This way, one tenant may use CPU 5 and another CPU 9 - and they'll not starve each other (best to make sure that these CPUs are in different packages). So, albatross takes a number as the CPU, and executes the solo5 tender within <code>taskset</code>/<code>cpuset</code>.</p> <h3 id="fail-behaviour">Fail behaviour</h3> <p>In normal operations, exceptional behaviour may occur. I have to admit that I've seen MirageOS unikernels that suffer from not freeing all the memory they have allocated. To avoid having to get up at 4 AM just to start the unikernel that went out of memory, there's the possibility to restart the unikernel when it exited. You can even specify on which exit codes it should be restarted (the exit code is the only piece of information from the outside about what caused the exit). This feature was implemented in October 2019, and has been very precious since then. :)</p> <h3 id="network">Network</h3> <p>This becomes a bit more complex: a MirageOS unikernel can have network interfaces, and solo5 specifies a so-called manifest with a list of these (name and type, where type is so far always basic). Then, on the actual server, there are bridges (virtual switches) configured. Now, these may have the same name, or may need to be mapped. And of course, the unikernel expects a tap interface that is connected to such a bridge, not the bridge itself. Thus, albatross creates tap devices, attaches these to the respective bridges, and takes care of cleaning them up on teardown. 
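</p>

<p>The name-to-bridge mapping can be sketched as a tiny parser for the <code>--net</code> argument (the <code>service:my_bridge</code> syntax the albatross client accepts, as described below); the function itself is illustrative:</p>

```ocaml
(* Sketch: "--net service:my_bridge" maps the manifest interface
   "service" to the host bridge "my_bridge"; plain "--net service"
   means the bridge carries the same name as the interface. *)
let parse_net arg =
  match String.index_opt arg ':' with
  | None -> (arg, arg)
  | Some i ->
    (String.sub arg 0 i,
     String.sub arg (i + 1) (String.length arg - i - 1))

let () =
  assert (parse_net "service" = ("service", "service"));
  assert (parse_net "service:my_bridge" = ("service", "my_bridge"))
```

<p>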
The albatross client verifies that for each network interface in the manifest, there is a command-line argument specified (<code>--net service:my_bridge</code>, or just <code>--net service</code> if the bridge is named service). The tap interface name is not really of interest to the user, and will not be exposed.</p> <h3 id="block-devices">Block devices</h3> <p>On the host system, a block device is just a file that is passed to the unikernel. There's the need to be able to create one, dump it, and ensure that each file is only used by one unikernel. That's all there is.</p> <h2 id="metrics">Metrics</h2> <p>Everyone likes graphs showing, over time, how much traffic or CPU or memory or whatever has been used by your service. Some of these statistics are only available in the host system, and it is also crucial for development purposes to compare whether the bytes sent in the unikernel sum up to the same on the host system's tap interface.</p> <p>The albatross-stats daemon collects metrics from three sources: network interfaces, getrusage (of a child process), and VMM debug counters (to count VM exits etc.). Since the recent 1.5.3, albatross-stats connects at startup to the albatross-daemon, retrieves the information about which unikernels are up and running, and starts periodically collecting data in memory.</p> <p>Other clients, be it a dump on your console window, a write into an rrd file (good old MRTG times), or a push to influx, can use the stats data to correlate and better analyse what is happening on the grand scale of things. 
This helped a lot when running several unikernels with different opam package sets to figure out which opam packages hold on to memory over time.</p> <p>As a side note, if you make the unikernel name also available in the unikernel, it can tag its own metrics with the same identifier, and you can correlate high-level events (such as the number of HTTP requests) with low-level ones such as &quot;allocated more memory&quot; or &quot;consumed a lot of CPU&quot;.</p> <h2 id="console">Console</h2> <p>There's not much to say about the console, just that the albatross-console daemon runs with low privileges, reading from a FIFO that the unikernel writes to. It never writes anything to disk, but keeps the last 1000 lines in memory, available to a client asking for them.</p> <h2 id="the-daemons">The daemons</h2> <p>So, the main albatross-daemon runs with superuser privileges to create virtual machines, and opens a Unix domain socket that the clients and the other daemons connect to. The other daemons are executed with normal user privileges, and never write anything to disk.</p> <p>The albatross-daemon keeps state about the running unikernels, and if it is restarted, the unikernels are started again. Maybe worth mentioning that this sometimes led to headaches (due to state being dumped to disk, and the old format always having to be supported), but it was also a huge relief to not have to care about re-creating all the unikernels just because albatross-daemon was killed.</p> <h2 id="remote-management">Remote management</h2> <p>There's one more daemon program: albatross-tls-endpoint. It accepts clients via a remote TCP connection, and establishes a mutually-authenticated TLS handshake. When done, the command is forwarded to the respective Unix domain socket, and the reply is sent back to the client.</p> <p>The daemon itself has an X.509 certificate to authenticate, but the client is requested to show its certificate chain as well. 
This by now requires TLS 1.3, so the client certificates are sent over the encrypted channel.</p> <p>A step back: an X.509 certificate contains a public key and a signature from one level up. When the server knows the root (or certificate authority (CA)) certificate, it can, by following the chain, verify that the leaf certificate is valid. Additionally, an X.509 certificate is an ASN.1 structure with some fixed fields, but it also contains extensions, a key-value store where the keys are object identifiers and the values are key-dependent data. Also note that this key-value store is cryptographically signed.</p> <p>Albatross uses the object identifier assigned to Camelus Dromedarius (MirageOS - 1.3.6.1.4.1.49836.42) to encode the command to be executed. This means that once the TLS handshake is established, the command to be executed has already been transferred.</p> <p>In the leaf certificate, there may be the &quot;create unikernel&quot; command with the unikernel image, its boot parameters, and other resources. Or a &quot;read the console of my unikernel&quot;. In the intermediate certificates (from root to leaf), resource policies are encoded (this path may only have X unikernels running with a total of Y MB memory and Z MB of block storage, using CPUs A and B, and accessing bridges C and D). From the root downwards, these policies may only decrease. When a unikernel should be created (or other commands are executed), the policies are verified to hold. If they do not, an error is reported.</p> <h2 id="fleet-management">Fleet management</h2> <p>Of course it is fine to deploy your locally compiled unikernel to your albatross server - go for it. 
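</p>

<p>The &quot;policies may only decrease&quot; rule from the section above lends itself to a small sketch; the record fields are illustrative, not albatross's actual policy encoding:</p>

```ocaml
(* Sketch: a resource policy, and a check that policies only shrink
   from the root certificate towards the leaf. *)
type policy = {
  unikernels : int;  (* max number of unikernels on this path *)
  memory_mb  : int;  (* total memory in MB *)
  block_mb   : int;  (* total block storage in MB *)
}

(* [sub] must not exceed [super] in any dimension *)
let fits ~super ~sub =
  sub.unikernels <= super.unikernels
  && sub.memory_mb <= super.memory_mb
  && sub.block_mb <= super.block_mb

(* validate a chain of policies, ordered from root to leaf *)
let rec chain_ok = function
  | a :: (b :: _ as rest) -> fits ~super:a ~sub:b && chain_ok rest
  | _ -> true

let () =
  let root = { unikernels = 10; memory_mb = 1024; block_mb = 2048 }
  and leaf = { unikernels = 2; memory_mb = 256; block_mb = 0 } in
  assert (chain_ok [ root; leaf ]);
  assert (not (chain_ok [ leaf; root ]))
```

<p>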
But in terms of &quot;what is actually running here?&quot; and &quot;does this unikernel need to be updated because some opam package had a security issue?&quot;, this is not optimal.</p> <p>Since we provide <a href="https://builds.robur.coop">daily reproducible builds</a> with the current HEAD of the main opam-repository, and these unikernels have no configuration embedded (but take everything as boot parameters), we just deploy them. They come with the information about which opam packages contributed to the binary, which environment variables were set, and which system packages were installed in which versions.</p> <p>The whole point of reproducible builds for us is: we have a hash of a unikernel image that we can look up in our build infrastructure, and check whether there is a newer image for the same job. And if there is, we provide a diff between the packages that contributed to the currently running unikernel and the new image. That is what the albatross-client update command is all about.</p> <p>Of course, your mileage may vary, and you may want automated deployments where each git commit triggers recompilation and redeployment. The downside would be that sometimes only dependencies are updated and you have to cope with that.</p> <p>There is a client, <code>albatross-client</code>, which depending on its arguments either connects to a local Unix domain socket, or to a remote albatross instance via TCP and TLS, or outputs a certificate signing request for later usage. Data, such as the unikernel ELF image, is compressed and embedded in the certificates.</p> <h2 id="installation">Installation</h2> <p>For Debian and Ubuntu systems, we provide package repositories. 
Browse the dists folder for one matching your distribution, and add it to <code>/etc/apt/sources.list</code>:</p> <pre><code>$ wget -q -O /etc/apt/trusted.gpg.d/apt.robur.coop.gpg https://apt.robur.coop/gpg.pub
$ echo &quot;deb https://apt.robur.coop ubuntu-20.04 main&quot; &gt;&gt; /etc/apt/sources.list
# replace ubuntu-20.04 with e.g. debian-11 on a Debian 11 machine
$ apt update
$ apt install solo5 albatross
</code></pre> <p>On FreeBSD:</p> <pre><code>$ fetch -o /usr/local/etc/pkg/robur.pub https://pkg.robur.coop/repo.pub # download RSA public key
$ echo 'robur: {
  url: &quot;https://pkg.robur.coop/${ABI}&quot;,
  mirror_type: &quot;srv&quot;,
  signature_type: &quot;pubkey&quot;,
  pubkey: &quot;/usr/local/etc/pkg/robur.pub&quot;,
  enabled: yes
}' &gt; /usr/local/etc/pkg/repos/robur.conf
# Check https://pkg.robur.coop for which ABIs are available
$ pkg update
$ pkg install solo5 albatross
</code></pre> <p>Please ensure to have at least version 2.0.0 of albatross installed.</p> <p>For other distributions and systems we do not (yet?) provide binary packages. You can compile and install them using opam (<code>opam install solo5 albatross</code>). Get in touch if you're keen on adding some other distribution to our reproducible build infrastructure.</p> <h2 id="conclusion">Conclusion</h2> <p>After five years of developing and operating albatross, feel free to get it and try it out. Or read the code, and discuss issues and shortcomings with us - either at the issue tracker or via eMail.</p> <p>Please reach out to us (at team AT robur DOT coop) if you have feedback and suggestions. 
We are a non-profit company, and rely on <a href="https://robur.coop/Donate">donations</a> for doing our work - everyone can contribute.</p> <p><em>&quot;Deploying reproducible unikernels with albatross&quot;, by hannes, updated 2023-05-16</em></p> <hr/> <p><em>Re-developing an opam cache from scratch, as a MirageOS unikernel</em></p> <p>We at <a href="https://robur.coop">robur</a> developed <a href="https://git.robur.coop/robur/opam-mirror">opam-mirror</a> in the last month, and run a public opam mirror at https://opam.robur.coop (updated hourly).</p> <h1 id="what-is-opam-and-why-should-i-care">What is opam and why should I care?</h1> <p><a href="https://opam.ocaml.org">Opam</a> is the OCaml package manager (also used by other projects such as <a href="https://coq.inria.fr">coq</a>). It is a source-based system: the so-called repository contains the metadata (url of the source tarball, build dependencies, author, homepage, development repository) of all packages. The main repository is hosted on GitHub as <a href="https://github.com/ocaml/opam-repository">ocaml/opam-repository</a>, where authors of OCaml software can contribute (as pull requests) their latest releases.</p> <p>When a pull request is opened, automated systems attempt to build not only the newly released package on various platforms and OCaml versions, but also all reverse dependencies, and also builds with the lowest allowed versions of the dependencies. 
That's crucial, since semantic versioning has not been adopted across the OCaml ecosystem (which is tricky - for example, due to local opens, any newly introduced binding can lead to a major version bump), nor do many people add upper bounds on dependencies when releasing a package (nobody is keen to state &quot;my package will not work with <a href="https://erratique.ch/software/cmdliner">cmdliner</a> in version 1.2.0&quot;).</p> <p>So, the opam-repository holds the metadata of lots of OCaml packages (around 4000 at the time this article was written) with lots of released versions (25000 in total). It is used by the opam client to figure out which packages to install or upgrade (using a solver that takes the version bounds into consideration).</p> <p>Of course, opam can use other repositories (overlays) or forks thereof. So nothing stops you from using any other opam repository. The url to the source code of each package may point to a tarball, a git repository, or other version control systems.</p> <p>The vast majority of opam packages released to the opam-repository include a link to the source tarball and a cryptographic hash of the tarball. This is crucial for security (under the assumption that the opam-repository has been downloaded from a trustworthy source - check back later this year for updates on <a href="/Posts/Conex">conex</a>). At the moment, there are some weak spots with respect to security: md5 is still allowed, and the hash and the tarball are downloaded from the same server - anyone who is in control of that server can inject arbitrary malicious data. As outlined above, we're working on infrastructure which fixes the latter issue.</p> <h1 id="how-does-the-opam-client-work">How does the opam client work?</h1> <p>Opam, after initialisation, downloads the <code>index.tar.gz</code> from <code>https://opam.ocaml.org/index.tar.gz</code>, and uses this as the local opam universe. 
An <code>opam install cmdliner</code> will resolve the dependencies, and download all required tarballs. The download is first tried from the cache, and if that fails, the URL in the package file is used. The download from the cache uses the base url, appends the archive-mirror path, followed by the hash algorithm, the first two characters of the hash of the tarball, and the hex-encoded hash of the archive, e.g. for cmdliner 1.1.1, which specifies its sha512: <code>https://opam.ocaml.org/cache/sha512/54/5478ad833da254b5587b3746e3a8493e66e867a081ac0f653a901cc8a7d944f66e4387592215ce25d939be76f281c4785702f54d4a74b1700bc8838a62255c9e</code>.</p> <h1 id="how-does-the-opam-repository-work">How does the opam repository work?</h1> <p>According to DNS, opam.ocaml.org is a machine at Amazon. Apart from serving the website, it likely uses <code>opam admin index</code> periodically to create the index tarball and the cache. There's an observable delay between a package merge in the opam-repository and when it shows up at opam.ocaml.org. Recently, there was <a href="https://discuss.ocaml.org/t/opam-ocaml-org-is-currently-down-is-that-where-indices-are-kept-still/">a reported downtime</a>.</p> <p>Apart from being a single point of failure, if you're compiling a lot of opam projects (e.g. in a continuous integration / continuous build system), it makes sense from a network usage (and thus sustainability) perspective to move the cache closer to where you need the source archives. 
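</p>

<p>The cache lookup scheme described above can be written down as a one-liner; this is a sketch following the example URL, not opam's actual implementation:</p>

```ocaml
(* Sketch: build the cache URL from the base url, the hash algorithm,
   and the hex-encoded hash - /cache/<algo>/<first two hex chars>/<hash>,
   as in the cmdliner example above. *)
let cache_url ~base ~algo ~hex_hash =
  Printf.sprintf "%s/cache/%s/%s/%s" base algo
    (String.sub hex_hash 0 2) hex_hash

let () =
  let h = "5478ad833da254b5587b3746e3a8493e66e867a081ac0f653a901cc8a7d944f66e4387592215ce25d939be76f281c4785702f54d4a74b1700bc8838a62255c9e" in
  assert (cache_url ~base:"https://opam.ocaml.org" ~algo:"sha512" ~hex_hash:h
          = "https://opam.ocaml.org/cache/sha512/54/" ^ h)
```

<p>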
We're also organising the MirageOS <a href="http://retreat.mirage.io">hack retreats</a> in a northern African country with poor connectivity - so if you gather two dozen camels, you'd better bring your opam repository cache with you to reduce the bandwidth usage (NB: at the moment this requires the cooperation of all participants to configure their default opam repository accordingly).</p> <h1 id="re-developing-opam-admin-create-as-mirageos-unikernel">Re-developing &quot;opam admin create&quot; as a MirageOS unikernel</h1> <p>Given the need for a local opam cache at our <a href="https://builds.robur.coop">reproducible build infrastructure</a> and at the retreats, we decided to develop <a href="https://git.robur.coop/robur/opam-mirror">opam-mirror</a> as a <a href="https://mirage.io">MirageOS unikernel</a>. Apart from being a useful showcase for persistent storage (that won't fit into memory), and having fun while developing it, our aim was to reduce our time spent on system administration (the <code>opam admin index</code> is only one part of the story; it needs a Unix system and a webserver next to it - plus remote access for doing software updates - which has quite some attack surface).</p> <p>Another reason for re-developing the functionality was that the relevant opam code (what opam admin index actually does) is part of the opam source code, which totals 50,000 lines of code -- looking up whether one or all checksums are verified before a tarball is added to the cache was rather tricky.</p> <p>In earlier years, we avoided persistent storage and block devices in MirageOS (by embedding data into the source code with <a href="https://github.com/mirage/ocaml-crunch">crunch</a>, or by using a remote git repository), but recent development, e.g. of <a href="https://somerandomidiot.com/blog/2022/03/04/chamelon/">chamelon</a>, sparked some interest in actually using file systems and figuring out whether MirageOS is ready in that area. 
A month ago we started the opam-mirror project.</p> <p>Opam-mirror takes a remote repository URL and downloads all referenced archives. It serves as a cache and opam repository - and does periodic updates from the remote repository. The idea is to validate all available checksums, store each tarball only once, and store overlays (as maps) from the other hash algorithms.</p> <h1 id="code-development-and-improvements">Code development and improvements</h1> <p>Initially, our plan was to use <a href="https://github.com/mirage/ocaml-git">ocaml-git</a> for pulling the repository, <a href="https://github.com/yomimono/chamelon">chamelon</a> for persistent storage, and <a href="https://github.com/inhabitedtype/httpaf">httpaf</a> as web server. With <a href="https://github.com/mirage/ocaml-tar">ocaml-tar</a>'s recent support of <a href="https://github.com/mirage/ocaml-tar/pull/88">gzip</a>, we should be all set, and done within a few days.</p> <p>There was already a gap in the above plan: which HTTP client to use - in the best case something similar to our <a href="https://github.com/roburio/http-lwt-client">http-lwt-client</a> - in MirageOS: it should support HTTP 1.1 and HTTP 2, TLS (with certificate validation), and use <a href="https://github.com/roburio/happy-eyeballs">happy-eyeballs</a> to seamlessly support both IPv6 and legacy IPv4. Of course, it should follow redirects; without that we won't get far on the current Internet.</p> <p>Along the way (over the last month), we fixed file descriptor leaks (and memory leaks) in <a href="https://github.com/dinosaure/paf-le-chien">paf</a> -- which is used as a runtime for httpaf and h2.</p> <p>Then we ran into some trouble with chamelon (<a href="https://github.com/yomimono/chamelon/issues/11">out of memory</a>, degraded performance, reports of being out of disk space), and re-thought our demands for opam-mirror. 
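</p> <p>To picture the &quot;store once, map the other hashes&quot; idea from above, here is a toy directory layout in shell - an illustration only, not opam-mirror's actual on-disk format:</p> <pre><code>mkdir -p cache/sha512 maps/md5
touch empty.tbz                            # stand-in tarball
s512=$(sha512sum empty.tbz | cut -d' ' -f1)
m5=$(md5sum empty.tbz | cut -d' ' -f1)
cp empty.tbz "cache/sha512/$s512"          # store the archive exactly once
printf '%s' "$s512" | tee "maps/md5/$m5"   # overlay map: md5 to sha512
target=$(cat "maps/md5/$m5")               # look the archive up via its md5
cmp -s "cache/sha512/$target" empty.tbz    # same bytes, single copy
</code></pre> <p>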
Since the cache is only ever growing (new packages are released), there's no need to ever remove anything: it is append-only. Once we figured that out, we investigated what needs to be done in ocaml-tar (where tar is in fact a tape archive, and was initially designed as a file format to be appended to) to support appending to an archive.</p> <p>We also re-thought our bandwidth usage, and instead of cloning the git remote at startup, we developed <a href="https://git.robur.coop/robur/git-kv">git-kv</a>, which can dump and restore the git state.</p> <p>Also, initially we computed all hashes of all tarballs, but with the size increasing (all archives together are around 7.5GB) this led to a major startup-time issue (around 5 minutes on a laptop), so we wanted to save and restore the maps as well.</p> <p>Neither the git state nor the maps are suitable for tar's append-only semantics, and we didn't want to investigate yet another file system - <a href="https://github.com/mirage/ocaml-fat">fat</a> may just work fine, but the code looks slightly bitrotten, and the reported issues and inactivity don't make this package very trustworthy from our point of view. Instead, we developed <a href="https://github.com/reynir/mirage-block-partition">mirage-block-partition</a> to partition a block device into two. We then store the maps and the git state at the end - the end of a tar archive is 2 blocks of zeroes, so data past that end isn't considered by any tooling. Extending the tar archive is also possible; only the maps and git state need to be moved to the end (or recomputed). As file system, we developed <a href="https://git.robur.coop/reynir/oneffs">oneffs</a>, which stores a single value on the block device.</p> <p>We observed high memory usage, since each requested archive was first read from the block device into memory, and then sent out. 
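</p> <p>As an aside, the tar end-of-archive marker mentioned above (two 512-byte blocks of zeroes) and the append operation are easy to observe from the command line - a sketch assuming GNU tar and coreutils are available:</p> <pre><code>touch file.txt
tar -cf archive.tar file.txt
# count the non-zero bytes among the last 1024 bytes of the archive;
# it is 0, since the archive ends with two 512-byte zero blocks (plus padding)
trailing=$(tail -c 1024 archive.tar | tr -d '\0' | wc -c)
echo "$trailing"
touch second.txt
tar -rf archive.tar second.txt   # append: rewrites the end-of-archive marker
tar -tf archive.tar              # now lists file.txt and second.txt
</code></pre> <p>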
Thanks to Pierre Alain's <a href="https://github.com/mirage/mirage-kv/pull/28">recent enhancements</a> of the mirage-kv API, there is a <code>get_partial</code>, which we use to read the archive chunk-wise and send it via HTTP. Now the memory usage is around 20MB (the git repository and the generated tarball are kept in memory).</p> <p>What is next? Downloading and writing to the tar archive could be done chunk-wise as well; also, dumping and restoring the git state is quite CPU intensive - we would like to improve that. Adding the TLS frontend (currently done on our side by our TLS termination proxy <a href="https://github.com/roburio/tlstunnel">tlstunnel</a>) similar to how <a href="https://github.com/roburio/unipi">unipi</a> does it, including Let's Encrypt provisioning, should be straightforward (drop us a note if you'd be interested in that feature).</p> <h1 id="conclusion">Conclusion</h1> <p>To conclude, we managed within a month to develop this opam-mirror cache from scratch. It has a reasonable footprint (CPU- and memory-wise), and is easy to maintain and easy to update - if you want to use it, we also provide <a href="https://builds.robur.coop/job/opam-mirror">reproducible binaries</a> for solo5-hvt. You can use our opam mirror with <code>opam repository set-url default https://opam.robur.coop</code> (revert to the other with <code>opam repository set-url default https://opam.ocaml.org</code>) or use it as a backup with <code>opam repository add robur --rank 2 https://opam.robur.coop</code>.</p> <p>Please reach out to us (at team AT robur DOT coop) if you have feedback and suggestions. 
We are a non-profit company, and rely on <a href="https://robur.coop/Donate">donations</a> for doing our work - everyone can contribute.</p> urn:uuid:0dbd251f-32c7-57bd-8e8f-7392c0833a09Mirroring the opam repository and all tarballs2023-11-20T16:58:35-00:00hannes<p>How to monitor your MirageOS unikernel with albatross and monitoring-experiments</p> 2022-03-08T11:26:31-00:00<h1 id="introduction-to-monitoring">Introduction to monitoring</h1> <p>At <a href="https://robur.coop">robur</a> we use a range of MirageOS unikernels. Recently, we worked on improving their operations story. One part is shipping binaries using our <a href="https://builds.robur.coop">reproducible builds infrastructure</a>. Another part is that, once deployed, we want to observe what is going on.</p> <p>I first got in touch with monitoring - collecting and graphing metrics - with <a href="https://oss.oetiker.ch/mrtg/">MRTG</a> and <a href="https://munin-monitoring.org/">munin</a> - and the simple network management protocol <a href="https://en.wikipedia.org/wiki/Simple_Network_Management_Protocol">SNMP</a>. From the whole-system perspective, I find it crucial that the monitoring part of a system does not add pressure to the system being observed. This favours a push-based design, where reporting is done at the discretion of the system.</p> <p>The rise of monitoring systems where graphs are generated dynamically (such as <a href="https://grafana.com/">Grafana</a>) and can be programmed (with a query language) by the operator is very neat; it allows one to put metrics in relation after they have been recorded - thus, if there's a thesis about why something went berserk, you can graph the collected data from the past and prove or disprove it.</p> <h1 id="monitoring-a-mirageos-unikernel">Monitoring a MirageOS unikernel</h1> <p>From the operational perspective, taking security into account, either the data should be authenticated and integrity-protected, or it should be transmitted on a private network. 
We chose the latter: there's a private network interface used only for monitoring. Access to that network is only granted to the unikernels and the metrics collector.</p> <p>For MirageOS unikernels, we use the <a href="https://github.com/mirage/metrics">metrics</a> library, whose design shares the idea of <a href="https://erratique.ch/software/logs">logs</a>: work is only performed if a reporter is registered. We use the Influx line protocol via TCP to report via <a href="https://www.influxdata.com/time-series-platform/telegraf/">Telegraf</a> to <a href="https://www.influxdata.com/">InfluxDB</a>. Due to the design of <a href="https://github.com/mirage/metrics">metrics</a>, other reporters can be developed and used -- Prometheus, SNMP, your-other-favourite are all possible.</p> <p>Apart from monitoring metrics, we use the same network interface for logging via syslog. Since the logs library separates log message generation (in the OCaml libraries) from reporting, we developed <a href="https://github.com/hannesm/logs-syslog">logs-syslog</a>, which registers a log reporter sending each log message to a syslog sink.</p> <p>We developed a small library for metrics reporting of a MirageOS unikernel, the <a href="https://github.com/roburio/monitoring-experiments">monitoring-experiments</a> package - which also allows one to dynamically adjust the log level and disable or enable metrics sources.</p> <h2 id="required-components">Required components</h2> <p>Install the packages providing telegraf, influxdb, and grafana from your operating system.</p> <p>Set up telegraf with a socket listener:</p> <pre><code>[[inputs.socket_listener]]
  service_address = &quot;tcp://192.168.42.14:8094&quot;
  keep_alive_period = &quot;5m&quot;
  data_format = &quot;influx&quot;
</code></pre> <p>Use a unikernel that reports to Influx (below the heading &quot;Unikernels (with metrics reported to Influx)&quot; on <a href="https://builds.robur.coop">builds.robur.coop</a>) and provide 
<code>--monitor=192.168.42.14</code> as a boot parameter. Conventionally, these unikernels expect a second network interface (on the &quot;management&quot; bridge) where telegraf (and a syslog sink) are running. You'll need to pass <code>--net=management</code> and <code>--arg='--management-ipv4=192.168.42.x/24'</code> to albatross-client.</p> <p>Albatross provides an <code>albatross-influx</code> daemon that reports information about the unikernels from the host system to influx. Start it with <code>--influx=192.168.42.14</code>.</p> <h2 id="adding-monitoring-to-your-unikernel">Adding monitoring to your unikernel</h2> <p>If you want to extend your own unikernel with metrics, follow along these lines.</p> <p>An example is the <a href="https://github.com/roburio/dns-primary-git">dns-primary-git</a> unikernel, where on the branch <code>future</code> we have a single commit ahead of main that adds monitoring. The difference is in the unikernel configuration and the main entry point. See the <a href="https://builds.robur.coop/job/dns-primary-git-monitoring/build/latest/">binary builds</a> in contrast to the <a href="https://builds.robur.coop/job/dns-primary-git/build/latest/">non-monitoring builds</a>.</p> <p>In config, four new command line arguments are added: <code>--monitor=IP</code>, <code>--monitor-adjust=PORT</code>, <code>--syslog=IP</code>, and <code>--name=STRING</code>. In addition, the package <code>monitoring-experiments</code> is required, and a second network stack <code>management_stack</code> using the prefix <code>management</code> is required and passed to the unikernel. Since the syslog reporter requires a console (to report when logging fails), a console is also passed to the unikernel. 
Each reported metric includes a tag <code>vm=&lt;name&gt;</code> that can be used to distinguish several unikernels reporting to the same InfluxDB.</p> <p>Command line arguments:</p> <pre><code class="language-patch"> let doc = Key.Arg.info ~doc:&quot;The fingerprint of the TLS certificate.&quot; [ &quot;tls-cert-fingerprint&quot; ] in
 Key.(create &quot;tls_cert_fingerprint&quot; Arg.(opt (some string) None doc))

+let monitor =
+  let doc = Key.Arg.info ~doc:&quot;monitor host IP&quot; [&quot;monitor&quot;] in
+  Key.(create &quot;monitor&quot; Arg.(opt (some ip_address) None doc))
+
+let monitor_adjust =
+  let doc = Key.Arg.info ~doc:&quot;adjust monitoring (log level, ..)&quot; [&quot;monitor-adjust&quot;] in
+  Key.(create &quot;monitor_adjust&quot; Arg.(opt (some int) None doc))
+
+let syslog =
+  let doc = Key.Arg.info ~doc:&quot;syslog host IP&quot; [&quot;syslog&quot;] in
+  Key.(create &quot;syslog&quot; Arg.(opt (some ip_address) None doc))
+
+let name =
+  let doc = Key.Arg.info ~doc:&quot;Name of the unikernel&quot; [&quot;name&quot;] in
+  Key.(create &quot;name&quot; Arg.(opt string &quot;ns.nqsb.io&quot; doc))
+
 let mimic_impl random stackv4v6 mclock pclock time =
   let tcpv4v6 = tcpv4v6_of_stackv4v6 $ stackv4v6 in
   let mhappy_eyeballs = mimic_happy_eyeballs $ random $ time $ mclock $ pclock $ stackv4v6 in
</code></pre> <p>Requiring <code>monitoring-experiments</code>, registering command line arguments:</p> <pre><code class="language-patch">   package ~min:&quot;3.7.0&quot; ~max:&quot;3.8.0&quot; &quot;git-mirage&quot;;
   package ~min:&quot;3.7.0&quot; &quot;git-paf&quot;;
   package ~min:&quot;0.0.8&quot; ~sublibs:[&quot;mirage&quot;] &quot;paf&quot;;
+  package &quot;monitoring-experiments&quot;;
+  package ~sublibs:[&quot;mirage&quot;] ~min:&quot;0.3.0&quot; &quot;logs-syslog&quot;;
 ] in
 foreign
-  ~keys:[Key.abstract remote_k ; Key.abstract axfr]
+  ~keys:[
+    Key.abstract remote_k ; Key.abstract axfr ;
+    Key.abstract name ; Key.abstract monitor ;
+    Key.abstract monitor_adjust ; Key.abstract syslog
+  ]
   ~packages
</code></pre> <p>Added console and a second network stack to <code>foreign</code>:</p> <pre><code class="language-patch">   &quot;Unikernel.Main&quot;
-  (random @-&gt; pclock @-&gt; mclock @-&gt; time @-&gt; stackv4v6 @-&gt; mimic @-&gt; job)
+  (console @-&gt; random @-&gt; pclock @-&gt; mclock @-&gt; time @-&gt; stackv4v6 @-&gt; mimic @-&gt; stackv4v6 @-&gt; job)
</code></pre> <p>Passing a console implementation (<code>default_console</code>) and a second network stack (with <code>management</code> prefix) to <code>register</code>:</p> <pre><code class="language-patch">+let management_stack = generic_stackv4v6 ~group:&quot;management&quot; (netif ~group:&quot;management&quot; &quot;management&quot;)

 let () =
   register &quot;primary-git&quot;
-    [dns_handler $ default_random $ default_posix_clock $ default_monotonic_clock $
-     default_time $ net $ mimic_impl]
+    [dns_handler $ default_console $ default_random $ default_posix_clock $ default_monotonic_clock $
+     default_time $ net $ mimic_impl $ management_stack]
</code></pre> <p>Now, in the unikernel module the functor changes (console and second network stack added):</p> <pre><code class="language-patch">@@ -4,17 +4,48 @@
 open Lwt.Infix

-module Main (R : Mirage_random.S) (P : Mirage_clock.PCLOCK) (M : Mirage_clock.MCLOCK) (T : Mirage_time.S) (S : Mirage_stack.V4V6) (_ : sig end) = struct
+module Main (C : Mirage_console.S) (R : Mirage_random.S) (P : Mirage_clock.PCLOCK) (M : Mirage_clock.MCLOCK) (T : Mirage_time.S) (S : Mirage_stack.V4V6) (_ : sig end) (Management : Mirage_stack.V4V6) = struct
   module Store = Irmin_mirage_git.Mem.KV(Irmin.Contents.String)
   module Sync = Irmin.Sync(Store)
</code></pre> <p>And in the <code>start</code> function, the command line arguments are processed and used to set up syslog and metrics monitoring to the specified addresses. 
Also, a TCP listener is waiting for monitoring and logging adjustments if <code>--monitor-adjust</code> was provided:</p> <pre><code class="language-patch">   module D = Dns_server_mirage.Make(P)(M)(T)(S)
+  module Monitoring = Monitoring_experiments.Make(T)(Management)
+  module Syslog = Logs_syslog_mirage.Udp(C)(P)(Management)

-  let start _rng _pclock _mclock _time s ctx =
+  let start c _rng _pclock _mclock _time s ctx management =
+    let hostname = Key_gen.name () in
+    (match Key_gen.syslog () with
+     | None -&gt; Logs.warn (fun m -&gt; m &quot;no syslog specified, dumping on stdout&quot;)
+     | Some ip -&gt; Logs.set_reporter (Syslog.create c management ip ~hostname ()));
+    (match Key_gen.monitor () with
+     | None -&gt; Logs.warn (fun m -&gt; m &quot;no monitor specified, not outputting statistics&quot;)
+     | Some ip -&gt; Monitoring.create ~hostname ?listen_port:(Key_gen.monitor_adjust ()) ip management);
     connect_store ctx &gt;&gt;= fun (store, upstream) -&gt;
     load_git None store upstream &gt;&gt;= function
     | Error (`Msg msg) -&gt;
</code></pre> <p>Once you have compiled the unikernel (or downloaded a binary with monitoring), start it by passing <code>--net:service=tap0</code> and <code>--net:management=tap10</code> (or whichever your <code>tap</code> interfaces are), and as unikernel arguments <code>--ipv4=&lt;my-ip-address&gt;</code> and <code>--management-ipv4=192.168.42.2/24</code> for IPv4 configuration, plus <code>--monitor=192.168.42.14</code>, <code>--syslog=192.168.42.10</code>, <code>--name=my.unikernel</code>, and <code>--monitor-adjust=12345</code>.</p> <p>With this, your unikernel will report metrics using the influx protocol to 192.168.42.14 on port 8094 (every 10 seconds), and syslog messages via UDP to 192.168.42.10 (port 514). 
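</p> <p>For reference, a single data point in the Influx line protocol consists of a measurement name, tags, fields, and an optional timestamp (in nanoseconds). The measurement and field names below are made up for illustration; the <code>vm</code> tag is the one mentioned above:</p> <pre><code>memory,vm=my.unikernel live_words=225344i,free_words=9680i 1646733360000000000
</code></pre> <p>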
You should see your InfluxDB getting filled and your syslog server receiving messages.</p> <p>When you configure <a href="https://grafana.com/docs/grafana/latest/getting-started/getting-started-influxdb/">Grafana to use InfluxDB</a>, you'll be able to see the data in the data sources.</p> <p>Please reach out to us (at team AT robur DOT coop) if you have feedback and suggestions.</p> urn:uuid:b8f1fa5b-d8dd-5a54-a9e4-064b9dcd053eAll your metrics belong to influx2023-05-16T17:21:47-00:00hannes<p>Finally, we provide reproducible binary MirageOS unikernels together with packages to reproduce them and set up your own builder</p> 2021-06-30T13:13:37-00:00<h2 id="introduction">Introduction</h2> <p>MirageOS development has focused a lot on tooling and the developer experience, but to accomplish <a href="https://robur.coop">our</a> goal to &quot;get MirageOS into production&quot;, we need to lower the barrier. For us, this means releasing binary unikernels. As described <a href="/Posts/NGI">earlier</a>, we received a grant for &quot;Deploying MirageOS&quot; from <a href="https://pointer.ngi.eu">NGI Pointer</a> to work on the required infrastructure. This is joint work with <a href="https://reynir.dk/">Reynir</a>.</p> <p>At <a href="https://builds.robur.coop">builds.robur.coop</a> we provide binary unikernel images (and supplementary software). Doing binary releases of MirageOS unikernels is challenging in two aspects: firstly, to be useful for everyone, a binary unikernel should not contain any configuration (such as private keys, certificates, etc.); secondly, the binaries should be <a href="https://reproducible-builds.org">reproducible</a>. This is crucial for security; everyone can reproduce the exact same binary and verify that our build service used only the sources. 
No malware or backdoors included.</p> <p>This post describes how you can deploy MirageOS unikernels without compiling them from source, then dives into the two issues outlined above - configuration and reproducibility - and finally describes how to set up your own reproducible build infrastructure for MirageOS, and how to bootstrap it.</p> <h2 id="deploying-mirageos-unikernels-from-binary">Deploying MirageOS unikernels from binary</h2> <p>To execute a MirageOS unikernel, apart from a hypervisor (Xen/KVM/Muen), a tender (responsible for allocating host system resources and passing these to the unikernel) is needed. Using virtio, this is conventionally done with qemu on Linux, but its code size (and attack surface) is huge. For MirageOS, we develop <a href="https://github.com/solo5/solo5">Solo5</a>, a minimal tender. It supports <em>hvt</em> - hardware virtualization (Linux KVM, FreeBSD bhyve, OpenBSD VMM) - and <em>spt</em> - sandboxed process (a tight seccomp ruleset: only a handful of system calls allowed, no hardware virtualization needed; Linux only). Apart from that, there are <a href="https://muen.sk"><em>muen</em></a> (a hypervisor developed in Ada), <em>virtio</em> (for some cloud deployments), and <em>xen</em> (PVHv2 or Qubes 4.0) - <a href="https://github.com/Solo5/solo5/blob/master/docs/building.md">read more</a>. We deploy our unikernels as hvt with FreeBSD bhyve as hypervisor.</p> <p>On <a href="https://builds.robur.coop">builds.robur.coop</a>, next to the unikernel images, <a href="https://builds.robur.coop/job/solo5-hvt/"><em>solo5-hvt</em> packages</a> are provided - download the binary and install it. A <a href="https://github.com/NixOS/nixpkgs/tree/master/pkgs/os-specific/solo5">NixOS package</a> is already available - please note that <a href="https://github.com/Solo5/solo5/pull/494">soon</a> packaging will be much easier (and we will work on getting packages merged into distributions).</p> <p>When the tender is installed, download a unikernel image (e.g. 
the <a href="https://builds.robur.coop/job/traceroute/build/latest/">traceroute</a> described in <a href="/Posts/Traceroute">an earlier post</a>), and execute it:</p> <pre><code>$ solo5-hvt --net:service=tap0 -- traceroute.hvt --ipv4=10.0.42.2/24 --ipv4-gateway=10.0.42.1
</code></pre> <p>If you plan to orchestrate MirageOS unikernels, you may be interested in <a href="https://github.com/roburio/albatross">albatross</a> - we provide <a href="https://builds.robur.coop/job/albatross/">binary packages for albatross as well</a>. An upcoming post will go into further detail on how to set up albatross.</p> <h2 id="mirageos-configuration">MirageOS configuration</h2> <p>A MirageOS unikernel has a specific purpose and is composed of OCaml libraries selected at compile time, which allows embedding only the required pieces. This reduces the attack surface drastically. At the same time, to be widely useful to multiple organisations, configuration data must not be embedded into the unikernel.</p> <p>Early MirageOS unikernels such as <a href="https://github.com/mirage/mirage-www">mirage-www</a> embed content (blog posts, ..) and TLS certificates and private keys in the binary (using <a href="https://github.com/mirage/ocaml-crunch">crunch</a>). The <a href="https://github.com/mirage/qubes-mirage-firewall">Qubes firewall</a> (read the <a href="http://roscidus.com/blog/blog/2016/01/01/a-unikernel-firewall-for-qubesos/">blog post by Thomas</a> for more information) used to include the firewall rules in the binary until <a href="https://github.com/mirage/qubes-mirage-firewall/releases/tag/v0.6">v0.6</a>; since <a href="https://github.com/mirage/qubes-mirage-firewall/tree/v0.7">v0.7</a> the rules are read dynamically from QubesDB. 
This is a big usability improvement.</p> <p>We have several possibilities to provide configuration information in MirageOS. One is via boot parameters (which can be pre-filled at development time and further refined at configuration time, with those passed at boot time taking precedence). Boot parameters have a length limitation.</p> <p>Another option is to <a href="https://github.com/roburio/tlstunnel/">use a block device</a> - where the TLS reverse proxy stores the configuration, modifiable via a TCP control socket (authenticated using a shared HMAC secret).</p> <p>Several other unikernels, such as <a href="https://github.com/Engil/Canopy">this website</a> and <a href="https://github.com/roburio/caldav">our CalDAV server</a>, store the content in a remote git repository. The git URI and credentials (private key seed, host key fingerprint) are passed via boot parameters.</p> <p>Finally, another option that we take advantage of is to introduce a post-link step that rewrites the binary to embed configuration. The tool <a href="https://github.com/dinosaure/caravan">caravan</a>, developed by Romain, does this rewrite; it is used by our <a href="https://github.com/roburio/openvpn/tree/robur/mirage-router">openvpn router</a> (<a href="https://builds.robur.coop/job/openvpn-router/build/latest/">binary</a>).</p> <p>In the future, some configuration - such as the monitoring system, syslog sink, and IP addresses - may be provided via DHCP on one of the private network interfaces - this would mean that the DHCP server holds some global configuration, and the unikernels no longer require that many boot parameters. 
Another option we want to investigate is where the tender shares a file as a read-only memory-mapped region from the host system to the guest system - but this is tricky considering all the targets above (especially virtio and muen).</p> <h2 id="behind-the-scenes-reproducible-builds">Behind the scenes: reproducible builds</h2> <p>To provide a high level of assurance and trust, if you distribute binaries in 2021, you should have a recipe for how they can be reproduced in a bit-by-bit identical way. This way, different organisations can run builders and rebuilders, and a user can decide to only use a binary if it has been reproduced by multiple organisations in different jurisdictions using different physical machines - to avoid malware being embedded in the binary.</p> <p>For a reproduction to be successful, you need to collect the checksums of all sources that contributed to the build, together with other things (host system packages, environment variables, etc.). Of course, you could record the entire OS and sources as a tarball (or file system snapshot) and distribute that - but this may be suboptimal in terms of bandwidth requirements.</p> <p>With opam, we already have precise tracking of which opam packages are used, and since opam 2.1 the <code>opam switch export</code> includes <a href="https://github.com/ocaml/opam/pull/4040">extra-files (patches)</a> and <a href="https://github.com/ocaml/opam/pull/4055">records the VCS version</a>. Based on this functionality, <a href="https://github.com/roburio/orb">orb</a>, an alternative command line application using the opam-client library, can be used to collect (a) the switch export, (b) host system packages, and (c) the environment variables. Only required environment variables are kept; all others are unset while conducting a build. 
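</p> <p>The effect of unsetting everything but an allow list can be illustrated with <code>env -i</code>, which runs a command with an empty environment plus explicitly passed variables - a simplified stand-in for what orb does, not orb's actual implementation:</p> <pre><code># FOO is set in the parent environment, but is not in the allow list
out=$(FOO=secret env -i PATH=/bin:/usr/bin HOME=/tmp sh -c 'env')
echo "$out"   # contains PATH and HOME, but no FOO
</code></pre> <p>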
The only required environment variables are <code>PATH</code> (sanitized with an allow list: <code>/bin</code>, <code>/sbin</code>, and the <code>/usr</code>, <code>/usr/local</code>, and <code>/opt</code> prefixes) and <code>HOME</code>. To enable Debian's <code>apt</code> to install packages, <code>DEBIAN_FRONTEND</code> is set to <code>noninteractive</code>. The <code>SWITCH_PATH</code> is recorded to allow orb to use the same path during a rebuild. The <code>SOURCE_DATE_EPOCH</code> is set to enable tools that record a timestamp to use a static one. The <code>OS*</code> variables are only used for recording the host OS and version.</p> <p>The goal of reproducible builds can certainly be achieved in several ways, including storing all sources and used executables in a huge tarball (or docker container), which is preserved for rebuilders. The questions of a minimal trusted computing base and of how such a container could be rebuilt from sources in a reproducible way remain open.</p> <p>The opam-repository is a community repository to which packages are released on a daily basis by a lot of OCaml developers. Package dependencies usually only use lower bounds of other packages, and the continuous integration system of the opam repository takes care that upon API changes all reverse dependencies include the right upper bounds. Using the head commit of opam-repository usually leads to a working package universe.</p> <p>For our MirageOS unikernels, we don't want to stay behind with ancient versions of libraries. That's why our automated building is done on a daily basis with the head commit of opam-repository. Since our unikernels are not part of the main opam repository (they include configuration information such as which target to use, e.g. 
<em>hvt</em>), and we occasionally use development versions of opam packages, we use <a href="https://git.robur.coop/robur/unikernel-repo">the unikernel-repo</a> as an overlay.</p> <p>If no dependency got a new release, the resulting binary has the same checksum. If any dependency got a newer release, this is picked up, and eventually the checksum changes.</p> <p>Each unikernel (and non-unikernel) job (e.g. <a href="https://builds.robur.coop/job/dns-primary-git/build/latest/">dns-primary</a>) outputs some artifacts:</p> <ul> <li>the <a href="https://builds.robur.coop/job/dns-primary-git/build/latest/f/bin/primary_git.hvt">binary image</a> (in <code>bin/</code>, unikernel image, OS package) </li> <li>the <a href="https://builds.robur.coop/job/dns-primary-git/build/latest/f/build-environment"><code>build-environment</code></a> containing the environment variables used for this build </li> <li>the <a href="https://builds.robur.coop/job/dns-primary-git/build/latest/f/system-packages"><code>system-packages</code></a> containing all packages installed on the host system </li> <li>the <a href="https://builds.robur.coop/job/dns-primary-git/build/latest/f/opam-switch"><code>opam-switch</code></a> that contains all opam packages, including git commit or tarball with checksum, and potentially extra patches, used for this build </li> <li>a job script and console output </li> </ul> <p>To reproduce such a build, you need to get the same operating system (OS, OS_FAMILY, OS_DISTRIBUTION, OS_VERSION in build-environment) and the same set of system packages; then you can <code>orb rebuild</code>, which sets the environment variables and installs the opam packages from the opam-switch.</p> <p>You can <a href="https://builds.robur.coop/job/dns-primary-git/">browse</a> the different builds, and if there are checksum changes, you can browse a diff between the opam switches to reason about whether the checksum change was intentional (e.g. 
<a href="https://builds.robur.coop/compare/ba9ab091-9400-4e8d-ad37-cf1339114df8/23341f6b-cd26-48ab-9383-e71342455e81/opam-switch">here</a> the checksum of the unikernel changed when the x509 library was updated).</p> <p>The opam reproducible build infrastructure is driven by:</p> <ul> <li><a href="https://github.com/roburio/orb">orb</a> conducting reproducible builds (<a href="https://builds.robur.coop/job/orb/">packages</a>) </li> <li><a href="https://github.com/roburio/builder">builder</a> scheduling builds in contained environments (<a href="https://builds.robur.coop/job/builder/">packages</a>) </li> <li><a href="https://git.robur.coop/robur/builder-web">builder-web</a> storing builds in a database and providing an HTTP interface (<a href="https://builds.robur.coop/job/builder-web/">packages</a>) </li> </ul> <p>These tools are themselves reproducible, and built on a daily basis. The infrastructure executing the build jobs installs the most recent packages of orb and builder before conducting a build. This means that our build infrastructure is reproducible as well, and uses the latest code when it is released.</p> <h2 id="conclusion">Conclusion</h2> <p>Thanks to NGI funding we now have reproducible MirageOS binary builds available at <a href="https://builds.robur.coop">builds.robur.coop</a>. The underlying infrastructure is reproducible, available for multiple platforms (Ubuntu using docker, FreeBSD using jails), and can be easily bootstrapped from source (once you have OCaml and opam working, getting builder and orb should be easy). All components are open source software, mostly with permissive licenses.</p> <p>We also have an index over the SHA-256 checksums of binaries - in case you find a running unikernel image where you forgot which exact packages were used, you can do a reverse lookup.</p> <p>We are aware that the web interface can be improved (PRs welcome). 
We will also work on the rebuilder setup and run some rebuilds.</p> <p>Please reach out to us (at team AT robur DOT coop) if you have feedback and suggestions.</p> urn:uuid:331831d8-6093-5dd7-9164-445afff953cbDeploying binary MirageOS unikernels2023-11-20T16:58:35-00:00hannes<p>Elliptic curves (ECDSA/ECDH) are supported in a maintainable and secure way.</p> 2021-04-23T13:33:06-00:00<h2 id="introduction">Introduction</h2> <p>TL;DR: mirage-crypto-ec, together with x509 0.12.0 and tls 0.13.0, provides fast and secure elliptic curve support in OCaml and MirageOS - using the verified <a href="https://github.com/mit-plv/fiat-crypto/">fiat-crypto</a> stack (Coq to OCaml to an executable which generates C code that is interfaced by OCaml). In x509, a long-standing issue (countryName encoding) was fixed, and the archive (PKCS 12) format is now supported, in addition to EC keys. In tls, ECDH key exchanges are supported, as well as ECDSA and EdDSA certificates.</p> <h2 id="elliptic-curve-cryptography">Elliptic curve cryptography</h2> <p><a href="https://mirage.io/blog/tls-1-3-mirageos">Since May 2020</a>, our <a href="https://usenix15.nqsb.io">OCaml-TLS</a> stack supports TLS 1.3 (since tls version 0.12.0 on opam).</p> <p>TLS 1.3 requires elliptic curve cryptography - which was not available in <a href="https://github.com/mirage/mirage-crypto">mirage-crypto</a> (the maintained fork of <a href="https://github.com/mirleft/ocaml-nocrypto">nocrypto</a>).</p> <p>There are two major uses of elliptic curves: <a href="https://en.wikipedia.org/wiki/Elliptic-curve_Diffie%E2%80%93Hellman">key exchange (ECDH)</a> for establishing a shared secret over an insecure channel, and <a href="https://en.wikipedia.org/wiki/Elliptic_Curve_Digital_Signature_Algorithm">digital signatures (ECDSA)</a> for authentication, integrity, and non-repudiation. 
(Please note that the construction of digital signatures on Edwards curves (Curve25519, Ed448) is called EdDSA instead of ECDSA.)</p> <p>Elliptic curve cryptography is <a href="https://eprint.iacr.org/2020/615">vulnerable</a> <a href="https://raccoon-attack.com/">to</a> <a href="https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2018-5407">various</a> <a href="https://github.com/mimoo/timing_attack_ecdsa_tls">timing</a> <a href="https://minerva.crocs.fi.muni.cz/">attacks</a> - have a read of the <a href="https://blog.trailofbits.com/2020/06/11/ecdsa-handle-with-care/">overview article on ECDSA</a>. When implementing elliptic curve cryptography, it is best to avoid these known attacks. Fortunately, there are some projects which address these issues by construction.</p> <p>In addition, to use the code in MirageOS, it should be boring C code: no heap allocations, using only a very small number of C library functions -- the code needs to be compiled in an environment with <a href="https://github.com/mirage/ocaml-freestanding/tree/v0.6.4/nolibc">nolibc</a>.</p> <p>Two projects, rooted in semantics research, set out to solve the issue from the ground up: <a href="https://github.com/mit-plv/fiat-crypto/">fiat-crypto</a> and <a href="https://github.com/project-everest/hacl-star/">hacl-star</a>. Their approach is to use a proof system (<a href="https://coq.inria.fr">Coq</a> or <a href="https://www.fstar-lang.org/">F*</a>) to verify that the code executes in constant time, independent of the input data. Both projects produce C code as the output of their proof systems.</p> <p>For our initial TLS 1.3 stack, <a href="https://github.com/pascutto/">Clément</a>, <a href="https://github.com/NathanReb/">Nathan</a> and <a href="https://github.com/emillon/">Etienne</a> developed <a href="https://github.com/mirage/fiat">fiat-p256</a> and <a href="https://github.com/mirage/hacl">hacl_x25519</a>. 
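<p>To make the timing-attack concern above concrete, here is a tiny, purely illustrative sketch (it is not code from fiat-crypto, hacl-star, or mirage-crypto): an equality check that exits on the first mismatch leaks, via its running time, how many leading bytes matched, while a constant-time variant always processes every byte and only inspects the accumulated difference at the end.</p>

```ocaml
(* Illustrative sketch only - not code from any of the projects discussed.
   leaky_equal returns early on the first mismatch, so its running time
   reveals how many leading bytes matched. *)
let leaky_equal a b =
  String.length a = String.length b &&
  (try String.iteri (fun i c -> if c <> b.[i] then raise Exit) a; true
   with Exit -> false)

(* constant_time_equal folds the xor of every byte pair into an accumulator
   and only checks it at the end: the work done is independent of where (or
   whether) the strings differ. *)
let constant_time_equal a b =
  String.length a = String.length b &&
  begin
    let diff = ref 0 in
    String.iteri
      (fun i c -> diff := !diff lor (Char.code c lxor Char.code b.[i])) a;
    !diff = 0
  end
```

<p>Verified implementations apply this discipline (and far stronger guarantees) to the whole field arithmetic, not merely to comparisons.</p>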
Both were one-shot interfaces for a narrow use case (ECDH for NIST P-256 and X25519), worked well for their purpose, and allowed us to gather some experience on the development side.</p> <h3 id="changed-requirements">Changed requirements</h3> <p>There were several reasons to revisit our cryptography stack from the elliptic curve perspective: on the one hand, the customer project <a href="https://www.nitrokey.com/products/nethsm">NetHSM</a> asked about the feasibility of ECDSA/EdDSA for various elliptic curves; on the other hand, <a href="https://github.com/mirage/ocaml-dns/pull/251">DNSSec</a> uses elliptic curve cryptography (ECDSA), and <a href="https://www.wireguard.com/">wireguard</a> also relies on elliptic curve cryptography. The number of X.509 certificates using elliptic curves is increasing, and we don't want to leave our TLS stack in a state where it can barely talk to a growing number of services on the Internet.</p> <p>Looking at <a href="https://github.com/project-everest/hacl-star/"><em>hacl-star</em></a>, their <a href="https://hacl-star.github.io/Supported.html">support</a> is limited to P-256 and Curve25519; any new curve requires writing F*. Another issue with hacl-star is C code quality: their C code neither <a href="https://github.com/mirage/hacl/issues/46">compiles with older C compilers (found on Oracle Linux 7 / CentOS 7)</a>, nor compiles cleanly with all warnings enabled (&gt; 150 are generated). We consider the C compiler a useful resource for figuring out undefined behaviour (and other problems), and when shipping C code we ensure that it compiles with <code>-Wall -Wextra -Wpedantic --std=c99 -Werror</code>. The hacl project <a href="https://github.com/mirage/hacl/tree/master/src/kremlin">ships</a> a bunch of header files and helper functions to work on all platforms, which is a clunky <code>ifdef</code> desert. 
The hacl approach is to generate a whole algorithm solution: from arithmetic primitives, group operations, up to the cryptographic protocol - everything included.</p> <p>In contrast, <a href="https://github.com/mit-plv/fiat-crypto/"><em>fiat-crypto</em></a> is a Coq development, which as part of compilation (proof verification) generates executables (via OCaml code extraction from Coq). These executables are used to generate modular arithmetic (as C code) given a curve description. The <a href="https://github.com/mirage/mirage-crypto/tree/main/ec/native">generated C code</a> is highly portable, independent of platform (word size is taken as input) - it only requires a <code>&lt;stdint.h&gt;</code>, and compiles with all warnings enabled (once <a href="https://github.com/mit-plv/fiat-crypto/pull/906">a minor PR</a> got merged). Supporting a new curve is simple: generate the arithmetic code using fiat-crypto with the new curve description. The downside is that the group operations and the protocol need to be implemented elsewhere (and are not part of the proven code) - fortunately this is pretty straightforward to do, especially in high-level languages.</p> <h3 id="working-with-fiat-crypto">Working with fiat-crypto</h3> <p>As mentioned, our initial <a href="https://github.com/mirage/fiat">fiat-p256</a> binding provided ECDH for the NIST P-256 curve. Also, BoringSSL uses fiat-crypto for ECDH, and developed the code for group operations and the cryptographic protocol on top of it.</p> <p>The work needed was (a) ECDSA support and (b) supporting more curves (let's focus on NIST curves). For ECDSA, the algorithm requires modular arithmetic in the field of the group order (in addition to the prime). We generate these primitives with fiat-crypto (named <code>npYYY_AA</code>) - that required <a href="https://github.com/mit-plv/fiat-crypto/commit/e31a36d5f1b20134e67ccc5339d88f0ff3cb0f86">a small fix in decoding hex</a>. 
Fiat-crypto also provides inversion <a href="https://github.com/mit-plv/fiat-crypto/pull/670">since late October 2020</a> (<a href="https://eprint.iacr.org/2021/549">paper</a>), which allowed us to reduce the code base we had taken from BoringSSL. The ECDSA protocol was easy to implement in OCaml using the generated arithmetic.</p> <p>Supporting more curves was also easy to achieve: the C code (group operations) consists of macros that are instantiated for each curve, and the OCaml code consists of functors that are applied to each curve description.</p> <p>Thanks to the test vectors (as structured data) from <a href="https://github.com/google/wycheproof/">wycheproof</a> (and again thanks to Etienne, Nathan, and Clément for their OCaml code decoding them), I feel confident that our elliptic curve code works as desired.</p> <p>What was left was X25519 and Ed25519 - dropping the hacl dependency entirely felt appealing (less C code to maintain from fewer projects). This turned out to require more C code, which we took from BoringSSL. 
It may be desirable to reduce the imported C code, or to wait until a project on top of fiat-crypto which provides proven cryptographic protocols is in a usable state.</p> <p>To avoid performance degradation, I distilled some <a href="https://github.com/mirage/mirage-crypto/pull/107#issuecomment-799701703">X25519 benchmarks</a>; it turns out the fiat-crypto and hacl performance is very similar.</p> <h3 id="achievements">Achievements</h3> <p>The new opam package <a href="https://mirage.github.io/mirage-crypto/doc/mirage-crypto-ec/Mirage_crypto_ec/index.html">mirage-crypto-ec</a> is released, which includes the C code generated by fiat-crypto (including <a href="https://github.com/mit-plv/fiat-crypto/pull/670">inversion</a>), <a href="https://github.com/mirage/mirage-crypto/blob/main/ec/native/point_operations.h">point operations</a> from BoringSSL, and some <a href="https://github.com/mirage/mirage-crypto/blob/main/ec/mirage_crypto_ec.ml">OCaml code</a> for invoking these functions, doing bounds checks, and verifying that points are on the curve. The OCaml code consists of functors that take the curve description (consisting of parameters, C function names, and the byte length of values) and provide Diffie-Hellman (Dh) and digital signature algorithm (Dsa) modules. 
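<p>The shape of those functors can be illustrated with a toy model (a simplification for exposition, not the actual mirage-crypto-ec interface; all names below are made up): a curve description is a small module, and a functor instantiates the generic code for each curve.</p>

```ocaml
(* Toy model of the "functor applied to a curve description" pattern -
   NOT the actual mirage-crypto-ec API; the names below are hypothetical. *)
module type Curve = sig
  val name : string      (* would select the fiat-crypto generated C stubs *)
  val byte_length : int  (* size of a field element / scalar in bytes *)
end

module Make_dh (C : Curve) = struct
  (* the real library would call the curve-specific C primitives here *)
  let secret_size = C.byte_length
  let describe () =
    Printf.sprintf "Dh over %s (%d-byte scalars)" C.name C.byte_length
end

(* the same generic code, instantiated for two curves *)
module P256_dh = Make_dh (struct let name = "p256" let byte_length = 32 end)
module P384_dh = Make_dh (struct let name = "p384" let byte_length = 48 end)
```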
The nonce for ECDSA is computed deterministically, as suggested by <a href="https://tools.ietf.org/html/rfc6979">RFC 6979</a>, to avoid private key leakage.</p> <p>The code has been developed in a series of pull requests: <a href="https://github.com/mirage/mirage-crypto/pull/101">NIST curves</a>, <a href="https://github.com/mirage/mirage-crypto/pull/106">removing blinding</a> (since we use operations that are verified to be constant-time), <a href="https://github.com/mirage/mirage-crypto/pull/108">adding missing length checks</a> (reported by <a href="https://github.com/greg42">Greg</a>), <a href="https://github.com/mirage/mirage-crypto/pull/107">curve25519</a>, <a href="https://github.com/mirage/mirage-crypto/pull/117">a fix for signatures that do not span the entire byte size (discovered while adapting X.509)</a>, and <a href="https://github.com/mirage/mirage-crypto/pull/118">a fix for X25519 when the input has offset &lt;&gt; 0</a>. It works on x86 and arm, both 32 and 64 bit (checked by CI). The development was partially sponsored by Nitrokey.</p> <p>What is left to do, apart from further security reviews, is <a href="https://github.com/mirage/mirage-crypto/issues/109">performance improvements</a>, <a href="https://github.com/mirage/mirage-crypto/issues/112">Ed448/X448 support</a>, and <a href="https://github.com/mirage/mirage-crypto/issues/105">investigating deterministic k for P521</a>. Pull requests are welcome.</p> <p>If you use the code and encounter any issues, please <a href="https://github.com/mirage/mirage-crypto/issues">report them</a>.</p> <h2 id="layer-up---x.509-now-with-ecdsa-eddsa-and-pkcs-12-support-and-a-long-standing-issue-fixed">Layer up - X.509 now with ECDSA / EdDSA and PKCS 12 support, and a long-standing issue fixed</h2> <p>With the sign and verify primitives, the next step is to interoperate with other tools that generate and use these public and private keys. 
This consists of serialisation to and deserialisation from common data formats (ASN.1 DER and PEM encoding), and support for handling X.509 certificates with elliptic curve keys. Since X.509 0.12.0, it supports EC private and public keys, including certificate validation and issuance.</p> <p>Releasing X.509 also included going through the issue tracker and attempting to solve the existing issues. This time, the issue <a href="https://github.com/mirleft/ocaml-x509/issues/69">&quot;country name is encoded as UTF8String, while RFC demands PrintableString&quot;</a>, filed more than 5 years ago by <a href="https://github.com/reynir">Reynir</a>, re-reported by <a href="https://github.com/paurkedal">Petter</a> in early 2017, and again by <a href="https://github.com/NightBlues">Vadim</a> in late 2020, <a href="https://github.com/mirleft/ocaml-x509/pull/140">was fixed by Vadim</a>.</p> <p>Another long-standing pull request was support for <a href="https://tools.ietf.org/html/rfc7292">PKCS 12</a>, the archive format for certificate and private key bundles. This has <a href="https://github.com/mirleft/ocaml-x509/pull/114">been developed and merged</a>. PKCS 12 is a widely used and old format (e.g. when importing / exporting cryptographic material in your browser, used by OpenVPN, ...). Its specification uses RC2 and 3DES (see <a href="https://unmitigatedrisk.com/?p=654">this nice article</a>), which are the default algorithms used by <code>openssl pkcs12</code>.</p> <h2 id="one-more-layer-up---tls">One more layer up - TLS</h2> <p>In TLS we are finally able to use ECDSA (and EdDSA) certificates and private keys; this resulted in a slightly more complex configuration, since the constraints between supported groups, signature algorithms, ciphersuites, and certificates are intricate:</p> <p>The ciphersuite (in TLS before 1.3) specifies which key exchange mechanism to use, but also which signature algorithm to use (RSA/ECDSA). 
The supported groups client hello extension specifies which elliptic curves are supported by the client. The signature algorithm hello extension (TLS 1.2 and above) specifies the signature algorithm. In the end, at load time the TLS configuration is validated and groups, ciphersuites, and signature algorithms are condensed depending on the configured server certificates. At session initiation time, once the client reports what it supports, these parameters are further cut down to eventually find some suitable cryptographic parameters for this session.</p> <p>From the user perspective, earlier the certificate bundle and private key were a pair of <code>X509.Certificate.t list</code> and <code>Mirage_crypto_pk.Rsa.priv</code>; now the second part is a <code>X509.Private_key.t</code> - all provided constructors have been updated (notably <code>X509_lwt.private_of_pems</code> and <code>Tls_mirage.X509.certificate</code>).</p> <h2 id="finally-conduit-and-mirage">Finally, conduit and mirage</h2> <p>Thanks to <a href="https://github.com/dinosaure">Romain</a>, conduit* 4.0.0 was released which supports the modified API of X.509 and TLS. Romain also developed patches and released mirage 3.10.3 which supports the above-mentioned work.</p> <h2 id="conclusion">Conclusion</h2> <p>Elliptic curve cryptography is now available in OCaml using verified cryptographic primitives from the fiat-crypto project - <code>opam install mirage-crypto-ec</code>. X.509 since 0.12.0, TLS since 0.13.0, and MirageOS since 3.10.3 support this new development, which gives rise to smaller EC keys. 
Our old bindings, fiat-p256 and hacl_x25519, have been archived and will no longer be maintained.</p> <p>Thanks to everyone involved on this journey: reporting issues, sponsoring parts of the work, helping with integration, developing initial prototypes, and keeping me motivated to continue until the release was done.</p> <p>In the future, it may be possible to remove zarith and gmp from the dependency chain, and provide EC-only TLS servers and clients for MirageOS. The benefit will be much less C code (libgmp-freestanding.a is 1.5MB in size) in our trusted code base.</p> <p>Another potential project that is very close now is a certificate authority developed in MirageOS - now that EC keys, PKCS 12, revocation lists, ... are implemented.</p> <h2 id="footer">Footer</h2> <p>If you want to support our work on MirageOS unikernels, please <a href="https://robur.coop/Donate">donate to robur</a>. I'm interested in feedback, either via <a href="https://twitter.com/h4nnes">twitter</a>, <a href="https://mastodon.social/@hannesm">hannesm@mastodon.social</a> or via eMail.</p> urn:uuid:16427713-5da1-50cd-b17c-ca5b5cca431dCryptography updates in OCaml and MirageOS2021-11-19T18:04:52-00:00hannes<p>Home office, MirageOS unikernels, 2020 recap, 2021 tbd</p> 2021-01-25T12:45:54-00:00<h2 id="introduction">Introduction</h2> <p>2020 was an intense year. I hope you're healthy and stay healthy. I am privileged (as lots of software engineers and academics are) to be able to work from home during the pandemic. Let's not forget people in less privileged situations, and let’s try to give them as much practical, psychological and financial support as we can these days. And as much joy as possible to everyone around :)</p> <p>I cancelled the autumn MirageOS retreat due to the pandemic. Instead I collected donations for our hosts in Marrakech - they were very happy to receive our financial support, as they had a difficult year; their income is based on tourism. 
I hope that in autumn 2021 we'll have an on-site retreat again.</p> <p>For 2021, we (at <a href="https://robur.coop">robur</a>) got a grant from the EU (via <a href="https://pointer.ngi.eu">NGI pointer</a>) for &quot;Deploying MirageOS&quot; (more details below), and another grant from the <a href="https://ocaml-sf.org">OCaml software foundation</a> for securing the opam supply chain (using <a href="https://github.com/hannesm/conex">conex</a>). Some long-awaited releases for MirageOS libraries, namely a <a href="https://discuss.ocaml.org/t/ann-first-release-of-awa-ssh">ssh implementation</a> and a rewrite of our <a href="https://discuss.ocaml.org/t/ann-release-of-ocaml-git-v3-0-duff-encore-decompress-etc/">git implementation</a>, have already been published.</p> <p>From my MirageOS perspective, 2020 was a pretty successful year, in which we managed to add more features, fix lots of bugs, and pave the road ahead. I want to thank <a href="https://ocamllabs.io/">OCamlLabs</a> for funding work on MirageOS maintenance.</p> <h2 id="recap-2020">Recap 2020</h2> <p>Here is a very subjective random collection of accomplishments in 2020 in which I was involved to some degree.</p> <h3 id="nethsm">NetHSM</h3> <p><a href="https://www.nitrokey.com/products/nethsm">NetHSM</a> is a hardware security module in software. It is a product that uses MirageOS for security, and is based on the <a href="https://muen.sk">muen</a> separation kernel. We at <a href="https://robur.coop">robur</a> were heavily involved in this product. It has already been security audited by an external team. You can pre-order it from Nitrokey.</p> <h3 id="tls-1.3">TLS 1.3</h3> <p>Dating back to 2016, at the <a href="https://www.ndss-symposium.org/ndss2016/tron-workshop-programme/">TRON</a> (TLS 1.3 Ready or NOt) workshop, we developed a first draft of a 1.3 implementation of <a href="https://github.com/mirleft/ocaml-tls">OCaml-TLS</a>. 
Finally in May 2020 we got our act together, including ECC (ECDH P256 from <a href="https://github.com/mit-plv/fiat-crypto/">fiat-crypto</a>, X25519 from <a href="https://project-everest.github.io/">hacl</a>) and testing with <a href="https://github.com/tlsfuzzer/tlsfuzzer">tlsfuzzer</a>, and released tls 0.12.0 with TLS 1.3 support. Later we added <a href="https://github.com/mirleft/ocaml-tls/pull/414">ECC ciphersuites to TLS version 1.2</a>, implemented <a href="https://github.com/mirleft/ocaml-tls/pull/414">ChaCha20/Poly1305</a>, and fixed an <a href="https://github.com/mirleft/ocaml-tls/pull/424">interoperability issue with Go's implementation</a>.</p> <p><a href="https://github.com/mirage/mirage-crypto">Mirage-crypto</a> provides the underlying cryptographic primitives, initially released in March 2020 as a fork of <a href="https://github.com/mirleft/ocaml-nocrypto">nocrypto</a> -- huge thanks to <a href="https://github.com/pqwy">pqwy</a> for his great work. Mirage-crypto detects <a href="https://github.com/mirage/mirage-crypto/pull/53">CPU features at runtime</a> (thanks to <a href="https://github.com/Julow">Julow</a>) (<a href="https://github.com/mirage/mirage-crypto/pull/96">bugfix for bswap</a>), uses constant-time modular exponentiation (powm_sec), hardens against Lenstra's CRT attack, supports <a href="https://github.com/mirage/mirage-crypto/pull/39">compilation on Windows</a> (thanks to <a href="https://github.com/avsm">avsm</a>), <a href="https://github.com/mirage/mirage-crypto/pull/90">async entropy harvesting</a> (thanks to <a href="https://github.com/seliopou">seliopou</a>), <a href="https://github.com/mirage/mirage-crypto/pull/65">32 bit support</a>, <a href="https://github.com/mirage/mirage-crypto/pull/72">chacha20/poly1305</a> (thanks to <a href="https://github.com/abeaumont">abeaumont</a>), <a href="https://github.com/mirage/mirage-crypto/pull/84">cross-compilation</a> (thanks to <a href="https://github.com/EduardoRFS">EduardoRFS</a>) and <a 
href="https://github.com/mirage/mirage-crypto/pull/78">various</a> <a href="https://github.com/mirage/mirage-crypto/pull/81">bug</a> <a href="https://github.com/mirage/mirage-crypto/pull/83">fixes</a>, even a <a href="https://github.com/mirage/mirage-crypto/pull/95">memory leak</a> fix (thanks to <a href="https://github.com/talex5">talex5</a> for reporting several of these issues), and <a href="https://github.com/mirage/mirage-crypto/pull/99">RSA</a> <a href="https://github.com/mirage/mirage-crypto/pull/100">interoperability</a> (thanks to <a href="https://github.com/psafont">psafont</a> for investigation and <a href="https://github.com/mattjbray">mattjbray</a> for reporting). This library feels very mature now - it is used by multiple stakeholders, and lots of issues have been fixed in 2020.</p> <h3 id="qubes-firewall">Qubes Firewall</h3> <p>The <a href="https://github.com/mirage/qubes-mirage-firewall/">MirageOS based Qubes firewall</a> is the most widely used MirageOS unikernel. And it got major updates: in May <a href="https://github.com/linse">Steffi</a> <a href="https://groups.google.com/g/qubes-users/c/Xzplmkjwa5Y">announced</a> her and <a href="https://github.com/yomimono">Mindy's</a> work on improving it for Qubes 4.0 - including <a href="https://www.qubes-os.org/doc/vm-interface/#firewall-rules-in-4x">dynamic firewall rules via QubesDB</a>. Thanks to <a href="https://prototypefund.de/project/portable-firewall-fuer-qubesos/">prototypefund</a> for sponsoring.</p> <p>In October 2020, we released <a href="https://mirage.io/blog/announcing-mirage-39-release">Mirage 3.9</a> with PVH virtualization mode (thanks to <a href="https://github.com/mato">mato</a>). There's still a <a href="https://github.com/mirage/qubes-mirage-firewall/issues/120">memory leak</a> to be investigated and fixed.</p> <h3 id="ipv6">IPv6</h3> <p>In December, with <a href="https://mirage.io/blog/announcing-mirage-310-release">Mirage 3.10</a> we got the IPv6 code up and running. 
Now MirageOS unikernels have a dual stack available, besides IPv4-only and IPv6-only network stacks. Thanks to <a href="https://github.com/nojb">nojb</a> for the initial code and to <a href="https://github.com/MagnusS">MagnusS</a>.</p> <p>It turns out this blog, and also robur services, are now available via IPv6 :)</p> <h3 id="albatross">Albatross</h3> <p>Also in December, I pushed an initial release of <a href="https://github.com/roburio/albatross">albatross</a>, a unikernel orchestration system with remote access. <em>Deploy your unikernel via a TLS handshake -- the unikernel image is embedded in the TLS client certificates.</em></p> <p>Thanks to <a href="https://github.com/reynir">reynir</a> for statistics support on Linux and improvements of the systemd service scripts. Also thanks to <a href="https://github.com/cfcs">cfcs</a> for the initial Linux port.</p> <h3 id="ca-certs">CA certs</h3> <p>For several years I postponed the problem of how to actually use the operating system trust anchors for OCaml-TLS connections. Thanks to <a href="https://github.com/emillon">emillon</a> for initial code, there are now <a href="https://github.com/mirage/ca-certs">ca-certs</a> and <a href="https://github.com/mirage/ca-certs-nss">ca-certs-nss</a> opam packages (see the <a href="https://discuss.ocaml.org/t/ann-ca-certs-and-ca-certs-nss">release announcement</a>) which fill this gap.</p> <h2 id="unikernels">Unikernels</h2> <p>I developed several useful unikernels in 2020, and also pushed <a href="https://mirage.io/wiki/gallery">a unikernel gallery</a> to the Mirage website:</p> <h3 id="traceroute-in-mirageos">Traceroute in MirageOS</h3> <p>I already wrote about <a href="/Posts/Traceroute">traceroute</a>, which traces the routing to a given remote host.</p> <h3 id="unipi---static-website-hosting">Unipi - static website hosting</h3> <p><a href="https://github.com/roburio/unipi">Unipi</a> is a static site webserver which retrieves the content from a remote git repository. 
It supports Let's Encrypt certificate provisioning and dynamic updates via a webhook executed on every push.</p> <h4 id="tlstunnel---tls-demultiplexing">TLSTunnel - TLS demultiplexing</h4> <p>The physical machine this blog and other robur infrastructure runs on was relocated from Sweden to Germany in mid-December. Thanks to UPS! Fewer IPv4 addresses are available in the new data center, which motivated me to develop <a href="https://github.com/roburio/tlstunnel">tlstunnel</a>.</p> <p>The new behaviour is as follows (see the <code>monitoring</code> branch):</p> <ul> <li>listener on TCP port 80 which replies with a permanent redirect to <code>https</code> </li> <li>listener on TCP port 443 which forwards to a backend host if the requested server name is configured </li> <li>its configuration is stored on a block device, and can be dynamically changed (with a custom protocol authenticated with an HMAC) </li> <li>it is set up to hold a wildcard TLS certificate, and in DNS a wildcard entry points to it </li> <li>setting up a new service is very straightforward: only the new name needs to be registered with tlstunnel together with the TCP backend, and everything will just work </li> </ul> <h2 id="section">2021</h2> <p>The year started with a release of <a href="https://discuss.ocaml.org/t/ann-first-release-of-awa-ssh">awa</a>, an SSH implementation in OCaml (thanks to <a href="https://github.com/haesbaert">haesbaert</a> for the initial code). 
This was followed by a <a href="https://discuss.ocaml.org/t/ann-release-of-ocaml-git-v3-0-duff-encore-decompress-etc/">git 3.0 release</a> (thanks to <a href="https://github.com/dinosaure">dinosaure</a>).</p> <h3 id="deploying-mirageos---ngi-pointer">Deploying MirageOS - NGI Pointer</h3> <p>For 2021 we at robur received funding from the EU (via <a href="https://pointer.ngi.eu/">NGI pointer</a>) for &quot;Deploying MirageOS&quot;, which boils down to three parts:</p> <ul> <li>reproducible binary releases of MirageOS unikernels, </li> <li>monitoring (and other devops features: profiling) and integration into existing infrastructure, </li> <li>and further documentation and advertisement. </li> </ul> <p>Of course this will all be available open source. Please get in touch via eMail (team aT robur dot coop) if you're eager to integrate MirageOS unikernels into your infrastructure.</p> <p>We discovered at an initial meeting with an infrastructure provider that a DNS resolver is of interest - even more so now that dnsmasq suffered from <a href="https://www.jsof-tech.com/wp-content/uploads/2021/01/DNSpooq_Technical-Whitepaper.pdf">dnspooq</a>. We are already working on an <a href="https://github.com/mirage/ocaml-dns/pull/251">implementation of DNSSec</a>.</p> <p>MirageOS unikernels are binary reproducible, and <a href="https://github.com/rjbou/orb/pull/1">infrastructure tools are available</a>. We are working hard on a web interface (and REST API - think of it as &quot;Docker Hub for MirageOS unikernels&quot;), and more tooling to verify reproducibility.</p> <h3 id="conex---securing-the-supply-chain">Conex - securing the supply chain</h3> <p>Another grant, from the <a href="http://ocaml-sf.org/">OCSF</a>, is to continue development of and deploy <a href="https://github.com/hannesm/conex">conex</a> - to bring trust into the opam-repository. 
This is a great combination with the reproducible build efforts, and will bring much more trust into retrieving OCaml packages and using MirageOS unikernels.</p> <h3 id="mirageos-4.0">MirageOS 4.0</h3> <p>Mirage so far still uses ocamlbuild and ocamlfind for compiling the virtual machine binary. But the switch to dune is <a href="https://github.com/mirage/mirage/issues/1195">close</a>; a lot of effort has gone into it. This will make the developer experience of MirageOS much smoother, with a per-unikernel monorepo workflow where you can push your changes to the individual libraries.</p> <h2 id="footer">Footer</h2> <p>If you want to support our work on MirageOS unikernels, please <a href="https://robur.coop/Donate">donate to robur</a>. I'm interested in feedback, either via <a href="https://twitter.com/h4nnes">twitter</a>, <a href="https://mastodon.social/@hannesm">hannesm@mastodon.social</a> or via eMail.</p> urn:uuid:bc7675a5-47d0-5ce1-970c-01ed07fdf404The road ahead for MirageOS in 20212021-11-19T18:04:52-00:00hannes<p>A MirageOS unikernel which traces the path between itself and a remote host.</p> 2020-06-24T10:38:10-00:00<h2 id="traceroute">Traceroute</h2> <p>Traceroute is a diagnostic utility which displays the route and measures transit delays of packets across an Internet protocol (IP) network.</p> <pre><code class="language-bash">$ doas solo5-hvt --net:service=tap0 -- traceroute.hvt --ipv4=10.0.42.2/24 --ipv4-gateway=10.0.42.1 --host=198.167.222.207 | ___| __| _ \ | _ \ __ \ \__ \ ( | | ( | ) | ____/\___/ _|\___/____/ Solo5: Bindings version v0.6.5 Solo5: Memory map: 512 MB addressable: Solo5: reserved @ (0x0 - 0xfffff) Solo5: text @ (0x100000 - 0x212fff) Solo5: rodata @ (0x213000 - 0x24bfff) Solo5: data @ (0x24c000 - 0x317fff) Solo5: heap &gt;= 0x318000 &lt; stack &lt; 0x20000000 2020-06-22 15:41:25 -00:00: INF [netif] Plugging into service with mac 76:9b:36:e0:e5:74 mtu 1500 2020-06-22 15:41:25 -00:00: INF [ethernet] Connected Ethernet interface 76:9b:36:e0:e5:74 
2020-06-22 15:41:25 -00:00: INF [ARP] Sending gratuitous ARP for 10.0.42.2 (76:9b:36:e0:e5:74) 2020-06-22 15:41:25 -00:00: INF [udp] UDP interface connected on 10.0.42.2 2020-06-22 15:41:25 -00:00: INF [application] 1 10.0.42.1 351us 2020-06-22 15:41:25 -00:00: INF [application] 2 192.168.42.1 1.417ms 2020-06-22 15:41:25 -00:00: INF [application] 3 192.168.178.1 1.921ms 2020-06-22 15:41:25 -00:00: INF [application] 4 88.72.96.1 16.716ms 2020-06-22 15:41:26 -00:00: INF [application] 5 * 2020-06-22 15:41:27 -00:00: INF [application] 6 92.79.215.112 16.794ms 2020-06-22 15:41:27 -00:00: INF [application] 7 145.254.2.215 21.305ms 2020-06-22 15:41:27 -00:00: INF [application] 8 145.254.2.217 22.05ms 2020-06-22 15:41:27 -00:00: INF [application] 9 195.89.99.1 21.088ms 2020-06-22 15:41:27 -00:00: INF [application] 10 62.115.9.133 20.105ms 2020-06-22 15:41:27 -00:00: INF [application] 11 213.155.135.82 30.861ms 2020-06-22 15:41:27 -00:00: INF [application] 12 80.91.246.200 30.716ms 2020-06-22 15:41:27 -00:00: INF [application] 13 80.91.253.163 28.315ms 2020-06-22 15:41:27 -00:00: INF [application] 14 62.115.145.27 30.436ms 2020-06-22 15:41:27 -00:00: INF [application] 15 80.67.4.239 42.826ms 2020-06-22 15:41:27 -00:00: INF [application] 16 80.67.10.147 47.213ms 2020-06-22 15:41:27 -00:00: INF [application] 17 198.167.222.207 48.598ms Solo5: solo5_exit(0) called </code></pre> <p>This means with a traceroute utility you can investigate which route is taken to a destination host, and what the round trip time(s) on the path are. The sample output above is taken from a virtual machine on my laptop to the remote host 198.167.222.207. You can see there are 17 hops between us, with the first being my laptop with a tiny round trip time of 351us, the second and third are using private IP addresses, and are my home network. The round trip time of the fourth hop is much higher, this is the first hop on the other side of my DSL modem. 
You can see various hops on the public Internet: the packets pass from my Internet provider's backbone across some exchange points to the destination Internet provider somewhere in Sweden.</p> <p>The implementation of traceroute relies mainly on the time-to-live (ttl) field (in IPv6 lingo it is &quot;hop limit&quot;) of IP packets, which is meant to avoid route cycles that would infinitely forward IP packets in circles. Every router, when forwarding an IP packet, first checks that the ttl field is greater than zero, and then forwards the IP packet with the ttl decreased by one. If the ttl field is zero, instead of forwarding, an ICMP time exceeded packet is sent back to the source.</p> <p>Traceroute works by exploiting this mechanism: a series of IP packets with increasing ttls is sent to the destination. Since the length of the path is unknown upfront, it is a reactive system: first send an IP packet with a ttl of one; if an ICMP time exceeded packet is returned, send an IP packet with a ttl of two, etc. -- until an ICMP packet of type destination unreachable is received. Since some hosts do not reply with a time exceeded message, to avoid getting stuck it is crucial to use a timeout for each packet: when the timeout is reached, an IP packet with an increased ttl is sent and an unknown for that ttl is printed (see the fifth hop in the example above).</p> <p>The packets sent out are conventionally UDP packets without payload. From a development perspective, one question is how to correlate a received ICMP packet with the sent UDP packet. Conveniently, ICMP packets contain the IP header and the first eight bytes of the next protocol - the UDP header containing source port, destination port, checksum, and payload length (each field two bytes in size). This means we can record the outgoing ports together with the send timestamp, and correlate a later received ICMP packet with the sent packet. 
Great.</p> <p>But as functional programmers, let's figure out whether we can abolish the (globally shared) state. Since the ICMP packet contains the original IP header and the first eight bytes of the UDP header, this is where we will embed the data. As described above, the data consists of the sent timestamp and the value of the ttl field. For the latter, we can arbitrarily restrict it to 31 (5 bits). For the timestamp, it is mainly a question of precision and maximum expected round trip time. The source and destination ports together are 32 bits; using 5 of them for the ttl leaves 27 bits, i.e. an unsigned value of up to 134217727. At nanosecond precision these 27 bits would only cover 134 milliseconds, too short for long paths; dividing the timestamp by 100 (a precision of 100ns) extends the range to roughly 13.4 seconds, which is sufficient for round trip time measurements.</p> <p>Finally to the code. First we need conversions back and forth between the ports and the ttl and timestamp:</p> <pre><code class="language-OCaml">(* Takes a time-to-live (int) and timestamp (int64, nanoseconds), and encodes
   them into the 16 bit source port and 16 bit destination port:
   - the timestamp precision is 100ns (thus, it is divided by 100)
   - bits 26-11 of the timestamp form the source port
   - bits 10-0 of the timestamp form the upper 11 bits of the destination
     port, the 5 bits of the ttl the lower ones *)
let ports_of_ttl_ts ttl ts =
  let ts = Int64.div ts 100L in
  let src_port = 0xffff land (Int64.(to_int (shift_right ts 11)))
  and dst_port =
    0xffe0 land (Int64.(to_int (shift_left ts 5))) lor (0x001f land ttl)
  in
  src_port, dst_port

(* The inverse operation of ports_of_ttl_ts for the valid range
   (src_port and dst_port are 16 bit values). *)
let ttl_ts_of_ports src_port dst_port =
  let ttl = 0x001f land dst_port in
  let ts =
    let low = Int64.of_int (dst_port lsr 5)
    and high = Int64.(shift_left (of_int src_port) 11)
    in
    Int64.add low high
  in
  let ts = Int64.mul ts 100L in
  ttl, ts
</code></pre> <p>They should be inverses of each other over the range of valid input: ports are 16 bit numbers, the ttl is expected to be at most 31, ts an int64 expressed in nanoseconds.</p>
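<p>A quick sanity check of this round trip property (assuming the two functions above are in scope; the sample values are arbitrary):</p> <pre><code class="language-OCaml">(* Check that ttl_ts_of_ports inverts ports_of_ttl_ts, up to truncation of
   the timestamp to a multiple of 100ns. *)
let () =
  List.iter (fun (ttl, ts) -&gt;
      let src_port, dst_port = ports_of_ttl_ts ttl ts in
      let ttl', ts' = ttl_ts_of_ports src_port dst_port in
      assert (ttl = ttl');
      assert (ts' = Int64.(mul (div ts 100L) 100L)))
    [ (1, 0L) ; (17, 351_000L) ; (31, 13_000_000_000L) ]
</code></pre>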
<p>Related is the function to print one hop and its round trip measurement:</p> <pre><code class="language-OCaml">(* Write a log line of a hop: the number, IP address, and round trip time.
   The current timestamp is truncated to the same 27 bit / 100ns domain as
   the sent timestamp before computing the difference. *)
let log_one now ttl sent ip =
  let now = Int64.(mul (logand (div now 100L) 0x7FFFFFFL) 100L) in
  let duration = Mtime.Span.of_uint64_ns (Int64.sub now sent) in
  Logs.info (fun m -&gt; m &quot;%2d %a %a&quot; ttl Ipaddr.V4.pp ip Mtime.Span.pp duration)
</code></pre> <p>Most of the logic is in the handler for received ICMP packets:</p> <pre><code class="language-OCaml">module Icmp = struct
  type t = {
    send : int -&gt; unit Lwt.t ;
    log : int -&gt; int64 -&gt; Ipaddr.V4.t -&gt; unit ;
    task_done : unit Lwt.u ;
  }

  let connect send log task_done =
    let t = { send ; log ; task_done } in
    Lwt.return t

  (* This is called for each received ICMP packet. *)
  let input t ~src ~dst buf =
    let open Icmpv4_packet in
    (* Decode the received buffer (the IP header has been cut off already). *)
    match Unmarshal.of_cstruct buf with
    | Error s -&gt;
      Lwt.fail_with
        (Fmt.strf &quot;ICMP: error parsing message from %a: %s&quot; Ipaddr.V4.pp src s)
    | Ok (message, payload) -&gt;
      let open Icmpv4_wire in
      (* There are two interesting cases: Time exceeded (-&gt; send next packet),
         and Destination (port) unreachable (-&gt; we reached the final host and
         can exit) *)
      match message.ty with
      | Time_exceeded -&gt;
        (* Decode the payload, which should be an IPv4 header and a protocol
           header *)
        begin match Ipv4_packet.Unmarshal.header_of_cstruct payload with
          | Ok (pkt, off) when
              (* Ensure this packet matches our sent packet: the protocol is
                 UDP and the destination address is the host we're tracing *)
              pkt.Ipv4_packet.proto = Ipv4_packet.Marshal.protocol_to_int `UDP &amp;&amp;
              Ipaddr.V4.compare pkt.Ipv4_packet.dst (Key_gen.host ()) = 0 -&gt;
            let src_port = Cstruct.BE.get_uint16 payload off
            and dst_port = Cstruct.BE.get_uint16 payload (off + 2)
            in
            (* Retrieve ttl and sent timestamp, encoded in the source port and destination port of the UDP packet
we sent, and received back as ICMP payload. *)
            let ttl, sent = ttl_ts_of_ports src_port dst_port in
            (* Log this hop. *)
            t.log ttl sent src;
            (* Send out the next UDP packet with an increased ttl. *)
            let ttl' = succ ttl in
            Logs.debug (fun m -&gt; m &quot;ICMP time exceeded from %a to %a, now sending with ttl %d&quot;
                           Ipaddr.V4.pp src Ipaddr.V4.pp dst ttl');
            t.send ttl'
          | Ok (pkt, _) -&gt;
            (* Some stray ICMP packet. *)
            Logs.debug (fun m -&gt; m &quot;unsolicited time exceeded from %a to %a (proto %X dst %a)&quot;
                           Ipaddr.V4.pp src Ipaddr.V4.pp dst
                           pkt.Ipv4_packet.proto Ipaddr.V4.pp pkt.Ipv4_packet.dst);
            Lwt.return_unit
          | Error e -&gt;
            (* Decoding error. *)
            Logs.warn (fun m -&gt; m &quot;couldn't parse ICMP time exceeded payload (IPv4) (%a -&gt; %a) %s&quot;
                          Ipaddr.V4.pp src Ipaddr.V4.pp dst e);
            Lwt.return_unit
        end
      | Destination_unreachable when Ipaddr.V4.compare src (Key_gen.host ()) = 0 -&gt;
        (* We reached the final host, and the destination port was not listened to *)
        begin match Ipv4_packet.Unmarshal.header_of_cstruct payload with
          | Ok (_, off) -&gt;
            let src_port = Cstruct.BE.get_uint16 payload off
            and dst_port = Cstruct.BE.get_uint16 payload (off + 2)
            in
            (* Retrieve ttl and sent timestamp. *)
            let ttl, sent = ttl_ts_of_ports src_port dst_port in
            (* Log the final hop. *)
            t.log ttl sent src;
            (* Wake up the waiter task to exit the unikernel. *)
            Lwt.wakeup t.task_done ();
            Lwt.return_unit
          | Error e -&gt;
            (* Decoding error.
*)
            Logs.warn (fun m -&gt; m &quot;couldn't parse ICMP unreachable payload (IPv4) (%a -&gt; %a) %s&quot;
                          Ipaddr.V4.pp src Ipaddr.V4.pp dst e);
            Lwt.return_unit
        end
      | ty -&gt;
        Logs.debug (fun m -&gt; m &quot;ICMP unknown ty %s from %a to %a: %a&quot;
                       (ty_to_string ty) Ipaddr.V4.pp src Ipaddr.V4.pp dst
                       Cstruct.hexdump_pp payload);
        Lwt.return_unit
end
</code></pre> <p>What remains is the main unikernel module <code>Main</code>:</p> <pre><code class="language-OCaml">module Main (R : Mirage_random.S) (M : Mirage_clock.MCLOCK) (Time : Mirage_time.S) (N : Mirage_net.S) = struct
  module ETH = Ethernet.Make(N)
  module ARP = Arp.Make(ETH)(Time)
  module IPV4 = Static_ipv4.Make(R)(M)(ETH)(ARP)
  module UDP = Udp.Make(IPV4)(R)

  (* Global mutable state: the timeout task for a sent packet. *)
  let to_cancel = ref None

  (* Send a single packet with the given time to live. *)
  let rec send_udp udp ttl =
    (* This is called by the ICMP handler which successfully received a time
       exceeded, thus we cancel the timeout task. *)
    (match !to_cancel with
     | None -&gt; ()
     | Some t -&gt; Lwt.cancel t ; to_cancel := None);
    (* Our hop limit is 31 (5 bits), which should be sufficient for most
       networks. *)
    if ttl &gt; 31 then
      Lwt.return_unit
    else
      (* Create a timeout task which:
         - sleeps for --timeout interval
         - logs an unknown hop
         - sends another packet with increased ttl *)
      let cancel =
        Lwt.catch (fun () -&gt;
            Time.sleep_ns (Duration.of_ms (Key_gen.timeout ())) &gt;&gt;= fun () -&gt;
            Logs.info (fun m -&gt; m &quot;%2d *&quot; ttl);
            send_udp udp (succ ttl))
          (function Lwt.Canceled -&gt; Lwt.return_unit | exc -&gt; Lwt.fail exc)
      in
      (* Assign this timeout task. *)
      to_cancel := Some cancel;
      (* Figure out which source and destination port to use, based on ttl
         and current timestamp. *)
      let src_port, dst_port = ports_of_ttl_ts ttl (M.elapsed_ns ()) in
      (* Send packet via UDP.
*)
      UDP.write ~ttl ~src_port ~dst:(Key_gen.host ()) ~dst_port udp Cstruct.empty &gt;&gt;= function
      | Ok () -&gt; Lwt.return_unit
      | Error e -&gt; Lwt.fail_with (Fmt.strf &quot;while sending udp frame %a&quot; UDP.pp_error e)

  (* The main unikernel entry point. *)
  let start () () () net =
    let cidr = Key_gen.ipv4 ()
    and gateway = Key_gen.ipv4_gateway ()
    in
    let log_one = fun ttl sent -&gt; log_one (M.elapsed_ns ()) ttl sent
    (* Create a task to wait on and a waiter to wake up. *)
    and t, w = Lwt.task ()
    in
    (* Setup network stack: ethernet, ARP, IPv4, UDP, and ICMP. *)
    ETH.connect net &gt;&gt;= fun eth -&gt;
    ARP.connect eth &gt;&gt;= fun arp -&gt;
    IPV4.connect ~cidr ~gateway eth arp &gt;&gt;= fun ip -&gt;
    UDP.connect ip &gt;&gt;= fun udp -&gt;
    let send = send_udp udp in
    Icmp.connect send log_one w &gt;&gt;= fun icmp -&gt;
    (* The callback cascade for an incoming network packet. *)
    let ethif_listener =
      ETH.input
        ~arpv4:(ARP.input arp)
        ~ipv4:(IPV4.input
                 ~tcp:(fun ~src:_ ~dst:_ _ -&gt; Lwt.return_unit)
                 ~udp:(fun ~src:_ ~dst:_ _ -&gt; Lwt.return_unit)
                 ~default:(fun ~proto ~src ~dst buf -&gt;
                     match proto with
                     | 1 -&gt; Icmp.input icmp ~src ~dst buf
                     | _ -&gt; Lwt.return_unit)
                 ip)
        ~ipv6:(fun _ -&gt; Lwt.return_unit)
        eth
    in
    (* Start the callback in a separate asynchronous task. *)
    Lwt.async (fun () -&gt;
        N.listen net ~header_size:Ethernet_wire.sizeof_ethernet ethif_listener &gt;|= function
        | Ok () -&gt; ()
        | Error e -&gt; Logs.err (fun m -&gt; m &quot;netif error %a&quot; N.pp_error e));
    (* Send the initial UDP packet with a ttl of 1. This sets off the domino
       effect: receive an ICMP time exceeded packet, send out another UDP
       packet with the ttl increased by one, etc. - until a destination
       unreachable is received, or the hop limit is reached.
*)
    send 1 &gt;&gt;= fun () -&gt;
    t
end
</code></pre> <p>The configuration (<code>config.ml</code>) for this unikernel is as follows:</p> <pre><code class="language-OCaml">open Mirage

let host =
  let doc = Key.Arg.info ~doc:&quot;The host to trace.&quot; [&quot;host&quot;] in
  Key.(create &quot;host&quot; Arg.(opt ipv4_address (Ipaddr.V4.of_string_exn &quot;141.1.1.1&quot;) doc))

let timeout =
  let doc = Key.Arg.info ~doc:&quot;Timeout (in milliseconds)&quot; [&quot;timeout&quot;] in
  Key.(create &quot;timeout&quot; Arg.(opt int 1000 doc))

let ipv4 =
  let doc = Key.Arg.info ~doc:&quot;IPv4 address&quot; [&quot;ipv4&quot;] in
  Key.(create &quot;ipv4&quot; Arg.(required ipv4 doc))

let ipv4_gateway =
  let doc = Key.Arg.info ~doc:&quot;IPv4 gateway&quot; [&quot;ipv4-gateway&quot;] in
  Key.(create &quot;ipv4-gateway&quot; Arg.(required ipv4_address doc))

let main =
  let packages = [
    package ~sublibs:[&quot;ipv4&quot;; &quot;udp&quot;; &quot;icmpv4&quot;] &quot;tcpip&quot;;
    package &quot;ethernet&quot;;
    package &quot;arp-mirage&quot;;
    package &quot;mirage-protocols&quot;;
    package &quot;mtime&quot;;
  ] in
  foreign
    ~keys:[Key.abstract ipv4 ; Key.abstract ipv4_gateway ; Key.abstract host ; Key.abstract timeout]
    ~packages
    &quot;Unikernel.Main&quot; (random @-&gt; mclock @-&gt; time @-&gt; network @-&gt; job)

let () =
  register &quot;traceroute&quot;
    [ main $ default_random $ default_monotonic_clock $ default_time $ default_network ]
</code></pre> <p>And voila, that's all the code. If you copy it together (or download the two files from <a href="https://github.com/roburio/traceroute">the GitHub repository</a>), and have OCaml, opam, and <a href="https://mirage.io/wiki/install">mirage (&gt;= 3.8.0)</a> installed, you should be able to:</p> <pre><code class="language-bash">$ mirage configure -t hvt
$ make depend
$ make
$ solo5-hvt --net:service=tap0 -- traceroute.hvt
...
... get the output shown at top ...
</code></pre> <p>Enhancements may be to use a different protocol (TCP?
or any other protocol ID, which may be used to encode more information), encode data into the IPv4 ID field or into the full 8 bytes of the upper protocol, encrypt/authenticate the data transmitted (and verify it has not been tampered with in the ICMP reply), improve error handling and recovery, send multiple packets for improved round trip time measurements, ...</p> <p>If you develop enhancements you'd like to share, please send a pull request to the git repository.</p> <p>The motivation for this traceroute unikernel arose while talking with <a href="https://twitter.com/networkservice">Aaron</a> and <a href="https://github.com/phaer">Paul</a>, who contributed several patches to the IP stack which pass the ttl through.</p> <p>If you want to support our work on MirageOS unikernels, please <a href="https://robur.coop/Donate">donate to robur</a>. I'm interested in feedback, either via <a href="https://twitter.com/h4nnes">twitter</a>, <a href="https://mastodon.social/@hannesm">hannesm@mastodon.social</a> or via eMail.</p> urn:uuid:ed3036f6-83d2-5e80-b3da-4ccbedb5ae9eTraceroute2021-11-19T18:04:52-00:00hannes<p>A tutorial on how to deploy authoritative name servers, let's encrypt, and how to update entries from unix services.</p> 2019-12-23T21:30:53-00:00<h2 id="goal">Goal</h2> <p>Have your domain served by OCaml-DNS authoritative name servers. Data is stored in a git remote, and let's encrypt certificates can be requested via DNS. This software has been deployed for more than two years for several domains such as <code>nqsb.io</code> and <code>robur.coop</code>. This post presents the authoritative server side and the certificate library of the OCaml-DNS implementation formerly known as <a href="/Posts/DNS">µDNS</a>.</p> <h2 id="prerequisites">Prerequisites</h2> <p>You need to own a domain, and be able to delegate the name service to your own servers. You also need two spare public IPv4 addresses (in different /24 networks) for your name servers. A git server or remote repository reachable via git over ssh.
Servers which support <a href="https://github.com/solo5/solo5">solo5</a> guests, and have the corresponding tender installed. And a computer with <a href="https://opam.ocaml.org">opam</a> (&gt;= 2.0.0) installed.</p> <h2 id="data-preparation">Data preparation</h2> <p>Figure out a way to get the DNS entries of your domain in a <a href="https://tools.ietf.org/html/rfc1034">&quot;master file format&quot;</a>, i.e. what bind uses.</p> <p>This is a master file for the <code>mirage</code> domain, defining <code>$ORIGIN</code> to avoid typing the domain name after each hostname (use <code>@</code> if you need the domain name only; if you need to refer to a hostname in a different domain, end it with a dot (<code>.</code>), i.e. <code>ns2.foo.com.</code>). The default time to live <code>$TTL</code> is an hour (3600 seconds). The zone contains a <a href="https://tools.ietf.org/html/rfc1035#section-3.3.13">start of authority (<code>SOA</code>) record</a> containing the nameserver, hostmaster, serial, refresh, retry, expiry, and minimum. Also, a single <a href="https://tools.ietf.org/html/rfc1035#section-3.3.11">name server (<code>NS</code>) record</a> <code>ns1</code> is specified, with an accompanying <a href="https://tools.ietf.org/html/rfc1035#section-3.4.1">address (<code>A</code>) record</a> pointing to its IPv4 address.</p> <pre><code class="language-shell">git-repo&gt; cat mirage
$ORIGIN mirage.
$TTL 3600
@   SOA  ns1  hostmaster  1  86400  7200  1048576  3600
@   NS   ns1
ns1 A    127.0.0.1
www A    1.1.1.1
git-repo&gt; git add mirage &amp;&amp; git commit -m initial &amp;&amp; git push
</code></pre> <h2 id="installation">Installation</h2> <p>On your development machine, you need to install various OCaml packages. You don't need privileged access if common tools (C compiler, make, libgmp) are already installed.
We assume you have <code>opam</code> installed.</p> <p>Let's create a fresh <code>switch</code> for the DNS journey:</p> <pre><code class="language-shell">$ opam init
$ opam update
$ opam switch create udns 4.14.1
# waiting a bit, a fresh OCaml compiler is getting bootstrapped
$ eval `opam env` # sets some environment variables
</code></pre> <p>The last command sets environment variables in your current shell session; please use the same shell for the following commands (or run <code>eval $(opam env)</code> in another shell and proceed in there - the output of <code>opam switch</code> should point to <code>udns</code>).</p> <h3 id="validation-of-our-zonefile">Validation of our zonefile</h3> <p>First let's check that OCaml-DNS can parse our zonefile:</p> <pre><code class="language-shell">$ opam install dns-cli # installs ~/.opam/udns/bin/ozone and other binaries
$ ozone &lt;git-repo&gt;/mirage # see ozone --help
successfully checked zone
</code></pre> <p>Great. The error reporting is not perfect, but line numbers are indicated (<code>ozone: zone parse problem at line 3: syntax error</code>), and <a href="https://github.com/mirage/ocaml-dns/tree/v4.2.0/zone">lexer and parser are lex/yacc style</a> (PRs welcome).</p> <p>FWIW, <code>ozone</code> accepts <code>--old &lt;filename&gt;</code> to check whether an update from the old zone to the new one is fine. This can be used as a <a href="https://git-scm.com/book/en/v2/Customizing-Git-Git-Hooks">pre-commit hook</a> in your git repository to avoid bad parse states in your name servers.</p> <h3 id="getting-the-primary-up">Getting the primary up</h3> <p>The next step is to compile the primary server and run it to serve the domain data.
Since the git-via-ssh client is not yet released, we need to add a custom opam repository to this switch.</p> <pre><code class="language-shell"># get the `mirage` application via opam
$ opam install lwt mirage

# get the source code of the unikernels
$ git clone https://github.com/roburio/dns-primary-git.git
$ cd dns-primary-git

# let's build the server first as a unix application
$ mirage configure # --no-depext if you have all system dependencies
$ make

# run it
$ dist/primary-git
# starts a unix process which clones https://github.com/roburio/udns.git
# attempts to parse the data as zone files, and fails on parse error
$ dist/primary-git --remote=https://my-public-git-repository
# this should fail with ENOACCESS since the DNS server tries to listen on port 53
# which requires a privileged user, i.e. su, sudo or doas
$ sudo dist/primary-git --remote=https://my-public-git-repository
# leave it running, run the following programs in a different shell

# test it
$ host ns1.mirage 127.0.0.1
ns1.mirage has address 127.0.0.1
$ dig any mirage @127.0.0.1
# a DNS packet printout with all records available for mirage
</code></pre> <p>That's exciting: a DNS server serving answers from a remote git repository.</p> <h3 id="securing-the-git-access-with-ssh">Securing the git access with ssh</h3> <p>Let's authenticate the access by using ssh, so we feel ready to push data there as well.
The primary-git unikernel already includes an experimental <a href="https://github.com/haesbaert/awa-ssh">ssh client</a>; all we need to do is set up credentials - in the following, an ed25519 keypair and the server's host key fingerprint.</p> <pre><code class="language-shell"># collect the server host key fingerprint
$ ssh-keyscan &lt;git-server&gt; &gt; /tmp/git-server-public-keys
$ ssh-keygen -l -E sha256 -f /tmp/git-server-public-keys | grep ED25519
256 SHA256:a5kkkuo7MwTBkW+HDt4km0gGPUAX0y1bFcPMXKxBaD0 &lt;git-server&gt; (ED25519)
# we're interested in the SHA256:yyy only

# generate a ssh keypair
$ awa_gen_key --keytype ed25519 # installed by the make step above in ~/.opam/udns/bin
private key: ed25519:nO7ervdJqzPfuvdM/J4qImipwVoI5gl53fpqgjZnv9w=
ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAICyliWbwWWXBc1+DRIzReLQ4UFiVGXJ6Paw1Jts+XQte awa@awa.local
# please run your own awa_gen_key, don't use the keys above
</code></pre> <p>The public key is in standard OpenSSH format and needs to be added to the list of accepted keys on your server - the exact steps depend on your git server; if you're running your own with <a href="https://github.com/tv42/gitosis">gitosis</a>, add it as a new public key file and grant that key access to the data repository. If you use gitlab or github, you may want to create a new user account and associate the generated key with it.</p> <p>The private key is not displayed, only the seed required to re-generate it when using the same random number generator, in our case <a href="https://mirage.github.io/mirage-crypto/doc/mirage-crypto-rng/Mirage_crypto_rng/index.html">fortuna implemented by mirage-crypto</a> - used by both <code>awa_gen_key</code> and <code>primary-git</code>.
The seed is provided as a command-line argument while starting <code>primary-git</code>:</p> <pre><code class="language-shell"># execute with git over ssh, authenticator from ssh-keyscan, seed from awa_gen_key
$ ./primary-git --authenticator=SHA256:a5kkkuo7MwTBkW+HDt4km0gGPUAX0y1bFcPMXKxBaD0 --ssh-key=ed25519:nO7ervdJqzPfuvdM/J4qImipwVoI5gl53fpqgjZnv9w= --remote=git@&lt;git-server&gt;:repo-name.git
# started up, you can try the host and dig commands from above if you like
</code></pre> <p>To wrap up, we now have a primary authoritative name server for our zone running as a Unix process, which clones a remote git repository via ssh on startup and then serves it.</p> <h3 id="authenticated-data-updates">Authenticated data updates</h3> <p>Our remote git repository is the source of truth: if you need to add a DNS entry to the zone, you git pull, edit the zone file, remember to increase the serial in the SOA line, run <code>ozone</code>, git commit, and push to the repository.</p> <p>So, the <code>primary-git</code> needs to be informed of git pushes. This requires a communication channel from the git server (or somewhere else, e.g. your laptop) to the DNS server. I prefer in-protocol solutions over adding yet another protocol stack; no way my DNS server will talk HTTP REST.</p> <p>The DNS protocol has an extension for <a href="https://tools.ietf.org/html/rfc1996">notifications of zone changes</a> (as a DNS packet), usually used between the primary and secondary servers. The <code>primary-git</code> accepts these notify requests (i.e. bends the standard slightly), and upon receipt pulls the remote git repository, and serves the fresh zone files.
Since a git pull may be rather costly in terms of CPU cycles and network bandwidth, only authenticated notifications are accepted.</p> <p>In another extension, the DNS protocol specifies <a href="https://tools.ietf.org/html/rfc2845">authentication (DNS TSIG)</a> with transaction signatures on DNS packets, including a timestamp and fudge to avoid replay attacks. As key material, hmac secrets distributed to both communication endpoints are used.</p> <p>To recap, the primary server is configured with command line parameters (for the remote repository url and ssh credentials), and serves data from a zonefile. If the secrets were provided via the command line, a restart would be necessary for adding and removing keys. If put into the zonefile, they would be publicly served on request. So instead, we'll use another file, still in zone file format, in the top-level domain <code>_keys</code>, i.e. the <code>mirage._keys</code> file contains keys for the <code>mirage</code> zone. All files ending in <code>._keys</code> are parsed with the normal parser, but put into an authentication store instead of the domain data store, which is served publicly.</p> <p>For encoding hmac secrets into DNS zone file format, the <a href="https://tools.ietf.org/html/rfc4034#section-2"><code>DNSKEY</code></a> format is used (designed for DNSsec). The <a href="https://www.isc.org/bind/">bind</a> software comes with <code>dnssec-keygen</code> and <code>tsig-keygen</code> to generate DNSKEY output: flags is 0, protocol is 3, and the algorithm identifier for SHA256 is 163 (SHA384 164, SHA512 165). This is reused by the OCaml DNS library. The key material itself is base64 encoded.</p> <p>Access control and naming of keys follows the DNS domain name hierarchy - a key has the form name._operation.domain, and is granted access to the domain and all its subdomains. Two operations are supported: update and transfer. In the future there may be a dedicated notify operation; for now we'll use update.
The name part is ignored for the update operation.</p> <p>Since we now embed secret information in the git repository, it is a good idea to restrict access to it, i.e. make it private and not publicly cloneable or viewable. Let's generate a first hmac secret and send a notify:</p> <pre><code class="language-shell">$ dd if=/dev/random bs=1 count=32 | b64encode -
begin-base64 644 -
kJJqipaQHQWqZL31Raar6uPnepGFIdtpjkXot9rv2xg=
====
[..]
git-repo&gt; echo &quot;personal._update.mirage. DNSKEY 0 3 163 kJJqipaQHQWqZL31Raar6uPnepGFIdtpjkXot9rv2xg=&quot; &gt; mirage._keys
git-repo&gt; git add mirage._keys &amp;&amp; git commit -m &quot;add hmac secret&quot; &amp;&amp; git push

# now we need to restart the primary-git to get the git repository with the key
$ ./primary-git --ssh-key=... # arguments from above: remote git, host key fingerprint, private key seed

# now test that a notify results in a git pull
$ onotify 127.0.0.1 mirage --key=personal._update.mirage:SHA256:kJJqipaQHQWqZL31Raar6uPnepGFIdtpjkXot9rv2xg=
# onotify was installed by dns-cli in ~/.opam/udns/bin/onotify, see --help for options
# further changes to the hmac secrets don't require a restart anymore, a notify packet is sufficient :D
</code></pre> <p>Ok, this onotify command line could be set up as a git post-commit hook, or run manually after each manual git push.</p> <h3 id="secondary">Secondary</h3> <p>It's time to figure out how to integrate the secondary name server. An already existing bind or something else that accepts notifications and issues zone transfers with hmac-sha256 secrets should work out of the box.
If you encounter interoperability issues, please get in touch with me.</p> <p>The <code>secondary</code> unikernel is available from another git repository:</p> <pre><code class="language-shell"># get the secondary sources
$ git clone https://github.com/roburio/dns-secondary.git
$ cd dns-secondary
</code></pre> <p>Its only command line argument is a list of hmac secrets used for authenticating that the received data originates from the primary server. Data is initially transferred by a <a href="https://tools.ietf.org/html/rfc5936">full zone transfer (AXFR)</a>; later updates (upon refresh timer or notify request sent by the primary) use <a href="https://tools.ietf.org/html/rfc1995">incremental zone transfer (IXFR)</a>. Zone transfer requests and data are authenticated with transaction signatures again.</p> <p>A convenience of OCaml DNS is that transfer key names matter: they are of the form &lt;primary-ip&gt;.&lt;secondary-ip&gt;._transfer.domain, i.e. <code>1.1.1.1.2.2.2.2._transfer.mirage</code> if the primary server is 1.1.1.1, and the secondary 2.2.2.2. Encoding the IP addresses in the name allows both parties to start the communication: the secondary starts by requesting a SOA for all domains for which keys are provided on the command line, and if an authoritative SOA answer is received, the AXFR is triggered. The primary server emits notification requests on startup, and then on every zone change (i.e. via git pull), to all secondary IP addresses of transfer keys present for the specific zone, in addition to the notifications to the NS records in the zone.</p> <pre><code class="language-shell">$ mirage configure
$ make
$ ./dist/secondary
</code></pre> <h3 id="ip-addresses-and-routing">IP addresses and routing</h3> <p>Both primary and secondary serve the data on the DNS port (53) on UDP and TCP.
To run both on the same machine and bind them to different IP addresses, we'll use a layer 2 network (ethernet frames) with a host system software switch (bridge interface <code>service</code>), and the unikernels as virtual machines (or seccomp-sandboxed) via the <a href="https://github.com/solo5/solo5">solo5</a> backend. Using xen is possible as well. As IP address range we'll use 10.0.42.0/24, and the host system uses 10.0.42.1.</p> <p>The primary-git needs connectivity to the remote git repository, thus on a laptop in a private network we need network address translation (NAT) from the bridge where the unikernels speak to the Internet where the git repository resides.</p> <pre><code class="language-shell"># on FreeBSD:
# configure NAT with pf, you need to have forwarding enabled
$ sysctl net.inet.ip.forwarding=1
$ echo 'nat pass on wlan0 inet from 10.0.42.0/24 to any -&gt; (wlan0)' &gt;&gt; /etc/pf.conf
$ service pf restart

# make tap interfaces UP on open()
$ sysctl net.link.tap.up_on_open=1

# bridge creation, naming, and IP setup
$ ifconfig bridge create
bridge0
$ ifconfig bridge0 name service
$ ifconfig bridge0 10.0.42.1/24

# two tap interfaces for our unikernels
$ ifconfig tap create
tap0
$ ifconfig tap create
tap1
# add them to the bridge
$ ifconfig service addm tap0 addm tap1
</code></pre> <h3 id="primary-and-secondary-setup">Primary and secondary setup</h3> <p>Let's update our zone slightly to reflect the IP changes.</p> <pre><code class="language-shell">git-repo&gt; cat mirage
$ORIGIN mirage.
$TTL 3600
@   SOA  ns1  hostmaster  2  86400  7200  1048576  3600
@   NS   ns1
@   NS   ns2
ns1 A    10.0.42.2
ns2 A    10.0.42.3

# we also need an additional transfer key
git-repo&gt; cat mirage._keys
personal._update.mirage. DNSKEY 0 3 163 kJJqipaQHQWqZL31Raar6uPnepGFIdtpjkXot9rv2xg=
10.0.42.2.10.0.42.3._transfer.mirage. DNSKEY 0 3 163 cDK6sKyvlt8UBerZlmxuD84ih2KookJGDagJlLVNo20=
git-repo&gt; git commit -m &quot;updates&quot; .
&amp;&amp; git push
</code></pre> <p>Ok, the git repository is ready; now we need to compile the unikernels for the virtualisation target (see <a href="https://mirage.io/wiki/hello-world#Building-for-Another-Backend">other targets</a> for further information).</p> <pre><code class="language-shell"># back to the primary
$ cd ../dns-primary-git
$ mirage configure -t hvt # or e.g. -t spt (and solo5-spt below)
# installs backend-specific opam packages, recompiles some
$ make
[...]
$ solo5-hvt --net:service=tap0 -- primary_git.hvt --ipv4=10.0.42.2/24 --ipv4-gateway=10.0.42.1 --seed=.. --authenticator=.. --remote=...
# should now run as a virtual machine (kvm, bhyve), and clone the git repository
$ dig any mirage @10.0.42.2
# should reply with the SOA and NS records, and also the name server address records in the additional section

# secondary
$ cd ../dns-secondary
$ mirage configure -t hvt
$ make
$ solo5-hvt --net:service=tap1 -- secondary.hvt --ipv4=10.0.42.3/24 --keys=10.0.42.2.10.0.42.3._transfer.mirage:SHA256:cDK6sKyvlt8UBerZlmxuD84ih2KookJGDagJlLVNo20=
# an ipv4-gateway is not needed in this setup, but in a real deployment later
# it should start up and transfer the mirage zone from the primary
$ dig any mirage @10.0.42.3
# should now output the same information as from 10.0.42.2

# testing an update and propagation
# edit the mirage zone, add a new record and increment the serial number
git-repo&gt; echo &quot;foo A 127.0.0.1&quot; &gt;&gt; mirage
git-repo&gt; vi mirage &lt;- increment serial
git-repo&gt; git commit -m 'add foo' . &amp;&amp; git push
$ onotify 10.0.42.2 mirage --key=personal._update.mirage:SHA256:kJJqipaQHQWqZL31Raar6uPnepGFIdtpjkXot9rv2xg=
# now check that it worked
$ dig foo.mirage @10.0.42.2 # primary
$ dig foo.mirage @10.0.42.3 # secondary got notified and transferred the zone
</code></pre> <p>You can also check the behaviour when restarting either of the VMs: whenever the primary is available, the zone is synchronised.
If the primary is down, the secondary still serves the zone. When the secondary is started while the primary is down, it won't serve any data until the primary is online (the secondary polls periodically; the primary sends notifies on startup).</p> <h3 id="dynamic-data-updates-via-dns-pushed-to-git">Dynamic data updates via DNS, pushed to git</h3> <p>DNS is a rich protocol, and it also has built-in <a href="https://tools.ietf.org/html/rfc2136">updates</a> that are supported by OCaml DNS, again authenticated with hmac-sha256 and shared secrets. Bind provides the command-line utility <code>nsupdate</code> to send these update packets; a simple <code>oupdate</code> unix utility is available as well (i.e. for integration of dynamic DNS clients). You know the drill: add a shared secret to the primary, git push, notify the primary, and voila, we can dynamically update in-protocol. An update received by the primary this way will trigger a git push to the remote git repository, and notifications to the secondary servers as described above.</p> <pre><code class="language-shell"># being lazy, I reuse the key above
$ oupdate 10.0.42.2 personal._update.mirage:SHA256:kJJqipaQHQWqZL31Raar6uPnepGFIdtpjkXot9rv2xg= my-other.mirage 1.2.3.4

# let's observe the remote git
git-repo&gt; git pull
# there should be a new commit generated by the primary
git-repo&gt; git log

# test it, should return 1.2.3.4
$ dig my-other.mirage @10.0.42.2
$ dig my-other.mirage @10.0.42.3
</code></pre> <p>So we can deploy further <code>oupdate</code> (or <code>nsupdate</code>) clients, distribute hmac secrets, and have the DNS zone updated. The source of truth is still the git repository, which the primary-git pushes to. Merge conflicts and timing of pushes are not yet dealt with. They are unlikely to happen since the primary is notified on pushes and should have up-to-date data in storage.
Sorry, I'm unsure about the error semantics; try it yourself.</p> <h3 id="lets-encrypt">Let's encrypt!</h3> <p><a href="https://letsencrypt.org/">Let's encrypt</a> is a certificate authority (CA) whose certificate is shipped as a trust anchor in web browsers. They specified a protocol for an <a href="https://tools.ietf.org/html/draft-ietf-acme-acme-05">automated certificate management environment (ACME)</a>, used to get X.509 certificates for your services. In the protocol, a certificate signing request (public key and hostname) is sent to the let's encrypt servers, which send back a challenge to prove ownership of the hostnames. One widely used way to solve this challenge is running a web server; another is serving it as a TXT record from the authoritative DNS server.</p> <p>Since I avoid persistent storage when possible, and also don't want to integrate an HTTP client stack in the primary server, I developed a third unikernel that acts as a (hidden) secondary server, performs the tedious HTTP communication with the let's encrypt servers, and stores all data in the public DNS zone.</p> <p>For encoding of certificates, the DANE working group specified <a href="https://tools.ietf.org/html/rfc6698.html#section-7.1">TLSA</a> records in DNS. They are quadruples of usage, selector, matching type, and ASN.1 DER-encoded material. We set usage to 3 (domain-issued certificate), matching type to 0 (no hash), and selector to 0 (full certificate) or 255 (private usage) for certificate signing requests.
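The TLSA quadruple maps to a simple wire format: per RFC 6698, three one-octet fields followed by the certificate association data. A Python sketch, where the DER bytes are placeholders rather than real certificate material:

```python
import struct

def tlsa_rdata(usage: int, selector: int, matching_type: int, data: bytes) -> bytes:
    # TLSA RDATA: certificate usage, selector, and matching type are one
    # octet each, followed by the certificate association data (RFC 6698)
    return struct.pack("!BBB", usage, selector, matching_type) + data

# usage 3 (domain-issued certificate), selector 255 (private usage),
# matching type 0 (no hash) for a certificate signing request
csr_record = tlsa_rdata(3, 255, 0, b"\x30\x82\x02\x5c")

# usage 3, selector 0 (full certificate), matching type 0 for the
# issued certificate itself
cert_record = tlsa_rdata(3, 0, 0, b"\x30\x82\x03\x10")
```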
The interaction is as follows:</p> <ol> <li>Primary, secondary, and let's encrypt unikernels are running </li> <li>A service (<code>ocertify</code>, <code>unikernels/certificate</code>, or the <code>dns-certify.mirage</code> library) demands a TLS certificate, and has an hmac secret for the primary DNS </li> <li>The service generates a certificate signing request with the desired hostname(s), and performs an nsupdate with TLSA 255 &lt;DER-encoded signing request&gt; </li> <li>The primary accepts the update, pushes the new zone to git, and sends notifies to the secondary and let's encrypt unikernels, which (incrementally) transfer the zone </li> <li>While transferring the zone, the let's encrypt unikernel notices a signing request without a certificate, and starts the HTTP interaction with let's encrypt </li> <li>The let's encrypt unikernel solves the challenge, and sends the response as an update of a TXT record to the primary nameserver </li> <li>The primary pushes the TXT record to git, and notifies the secondaries (which transfer the zone) </li> <li>The let's encrypt servers request the TXT record from either or both authoritative name servers </li> <li>The let's encrypt unikernel polls for the issued certificate and sends an update to the primary with TLSA 0 &lt;DER-encoded certificate&gt; </li> <li>The primary pushes the certificate to git, and notifies the secondaries (which transfer the zone) </li> <li>The service polls the TLSA records for the hostname, and uses the certificate upon retrieval </li> </ol> <p>Note that neither the signing request nor the certificate contain private key material, thus it is fine to serve them publicly. Please also note that if the service finds a certificate for the hostname in DNS which is still valid (start and end date) and uses the same public key, this certificate is used and steps 3-10 are not executed.</p> <p>The let's encrypt unikernel does not serve anything; it is a reactive system which acts upon notifications from the primary.
Thus, it can be executed in a private address space (behind a NAT). Since the OCaml DNS server stack needs to push notifications to it, it preserves all incoming signed SOA requests as candidates for notifications on update. The let's encrypt unikernel ensures it always has a connection to the primary to receive notifications.</p> <pre><code class="language-shell"># getting let's encrypt up and running $ cd .. $ git clone https://github.com/roburio/dns-letsencrypt-secondary.git $ cd dns-letsencrypt-secondary $ mirage configure -t hvt $ make # run it $ solo5-hvt --net:service=tap2 -- letsencrypt.hvt --keys=... # test it $ ocertify 10.0.42.2 foo.mirage </code></pre> <p>For actual testing with the let's encrypt servers you need to have the primary and secondary deployed on your remote hosts, and your domain needs to be delegated to these servers. Good luck. And ensure you have a backup of your git repository.</p> <p>As fine print, while this tutorial was about the <code>mirage</code> zone, you can stick any number of zones into the git repository. If you use a <code>_keys</code> file (without any domain prefix), you can configure hmac secrets for all zones, i.e. something to use in your let's encrypt unikernel and secondary unikernel. Dynamic addition of zones is supported: just create a new zonefile and notify the primary; the secondary will be notified and pick it up. The primary responds to a signed SOA for the root zone (i.e.
requested by the secondary) with the SOA response (not authoritative), and additionally sends notifications for all zones of the primary.</p> <h3 id="conclusion-and-thanks">Conclusion and thanks</h3> <p>This tutorial presented how to use the OCaml DNS based unikernels to run authoritative name servers for your domain, using a git repository as the source of truth, with dynamic authenticated updates and let's encrypt certificate issuing.</p> <p>There are further steps to take, such as monitoring (<code>mirage configure --monitoring</code>), which uses a second network interface for reporting syslog and metrics to telegraf / influx / grafana. Some DNS features are still missing, most prominently DNSSEC.</p> <p>I'd like to thank all the people involved in this software stack; it wouldn't exist without other key components, including <a href="https://github.com/mirage/ocaml-git">git</a>, <a href="https://github.com/mirage/mirage-crypto">mirage-crypto</a>, <a href="https://github.com/mirage/awa-ssh">awa-ssh</a>, <a href="https://github.com/solo5/solo5">solo5</a>, <a href="https://github.com/mirage/mirage">mirage</a>, <a href="https://github.com/mmaker/ocaml-letsencrypt">ocaml-letsencrypt</a>, and more.</p> <p>If you want to support our work on MirageOS unikernels, please <a href="https://robur.coop/Donate">donate to robur</a>. I'm interested in feedback, either via <a href="https://twitter.com/h4nnes">twitter</a>, <a href="https://mastodon.social/@hannesm">hannesm@mastodon.social</a> or via eMail.</p> <p><em>Deploying authoritative OCaml-DNS servers as MirageOS unikernels (hannes, 2023-03-02)</em></p> <p><em>MirageOS unikernels are reproducible :) (2019-12-16)</em></p> <h2 id="reproducible-builds-summit">Reproducible builds summit</h2> <p>I'm just back from the <a href="https://reproducible-builds.org/events/Marrakesh2019/">Reproducible builds summit 2019</a>.
In 2018, several people developing <a href="https://ocaml.org">OCaml</a>, <a href="https://opam.ocaml.org">opam</a>, and <a href="https://mirage.io">MirageOS</a> attended <a href="https://reproducible-builds.org/events/paris2018/">the Reproducible builds summit in Paris</a>. The notes from last year on <a href="https://reproducible-builds.org/events/paris2018/report/#Toc11410_331763073">opam reproducibility</a> and <a href="https://reproducible-builds.org/events/paris2018/report/#Toc11681_331763073">MirageOS reproducibility</a> are online. After last year's workshop, Raja started developing the opam reproducibility builder <a href="https://github.com/rjbou/orb">orb</a>, which I extended at and after this year's summit. This year, before and after the facilitated summit, there were hacking days, which allowed further interaction with participants, writing some code, and conducting experiments. I again had an exciting time at the summit and hacking days, thanks to our hosts, organisers, and all participants.</p> <h2 id="goal">Goal</h2> <p>Stepping back a bit, let's first look at the <a href="https://reproducible-builds.org/">goal of reproducible builds</a>: when compiling source code multiple times, the produced binaries should be identical. It should be sufficient if the binaries are behaviourally equal, but this is pretty hard to check. It is much easier to check <strong>bit-wise identity of binaries</strong>, which relaxes the burden on the checker -- checking for reproducibility is reduced to computing the hash of the binaries. Let's stick to the bit-wise identical binary definition, which also means software developers have to avoid non-determinism during compilation in their toolchains, dependent libraries, and developed code.</p> <p>A <a href="https://reproducible-builds.org/docs/test-bench/">checklist</a> of potential sources of non-determinism has been written up by the reproducible builds project.
Examples include recording the build timestamp into the binary, and the ordering of code and embedded data. The reproducible builds project also developed <a href="https://packages.debian.org/sid/disorderfs">disorderfs</a> for testing reproducibility and <a href="https://diffoscope.org/">diffoscope</a> for comparing binaries with file-dependent readers, falling back to <code>objdump</code> and <code>hexdump</code>. A giant <a href="https://tests.reproducible-builds.org/">test infrastructure</a> with <a href="https://tests.reproducible-builds.org/debian/index_variations.html">lots of variations</a> between the builds, mostly using Debian, has been set up over the years.</p> <p>Reproducibility is a precondition for trustworthy binaries. See <a href="https://reproducible-builds.org/#why-does-it-matter">why does it matter</a>. If there are no instructions for how to get from the published sources to the exact binary, why should anyone trust and use a binary which claims to be the result of the sources? It may as well contain different code, including a backdoor, bitcoin mining code, or code outputting the wrong results for specific inputs. Reproducibility does not imply the software is free of security issues or backdoors, but instead of an audit of the binary -- which is tedious and rarely done -- the source code can be audited; the toolchain (compiler, linker, ...) used for compilation then needs to be taken into account, i.e. trusted or audited to not be malicious. <strong>I will only ever publish binaries if they are reproducible</strong>.</p> <p>My main interest at the summit was to enhance existing tooling and conduct some experiments on the reproducibility of <a href="https://mirage.io">MirageOS unikernels</a> -- a unikernel is a statically linked ELF binary to be run as a Unix process or <a href="https://github.com/solo5/solo5">virtual machine</a>.
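Checking bit-wise identity, as argued above, reduces to hashing build artifacts. A small Python sketch of the comparison step (the file names and contents are illustrative):

```python
import hashlib
import tempfile
from pathlib import Path

def artifact_hash(path: Path) -> str:
    # hash the binary in chunks, so large unikernel images are fine too
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

# simulate two independent builds with identical output bits
build_dir = Path(tempfile.mkdtemp())
(build_dir / "build-a.hvt").write_bytes(b"\x7fELF unikernel image")
(build_dir / "build-b.hvt").write_bytes(b"\x7fELF unikernel image")

# the builds are reproducible iff the digests are equal
reproducible = artifact_hash(build_dir / "build-a.hvt") == artifact_hash(build_dir / "build-b.hvt")
print("reproducible" if reproducible else "divergent")
```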
MirageOS heavily uses <a href="https://ocaml.org">OCaml</a> and <a href="https://opam.ocaml.org">opam</a>, the OCaml package manager, and is an opam package itself. Thus, <em>checking reproducibility of a MirageOS unikernel is the same problem as checking reproducibility of an opam package</em>.</p> <h2 id="reproducible-builds-with-opam">Reproducible builds with opam</h2> <p>Testing for reproducibility is achieved by taking the sources and compiling them twice independently. Afterwards, the equality of the resulting binaries can be checked. In trivial projects, the sources are just a single file, or originate from a single tarball. In OCaml, opam uses <a href="https://github.com/ocaml/opam-repository">a community repository</a> where OCaml developers publish their package releases to, but it can also use custom repositories, and in addition pin packages to git remotes (a url including branch or commit) or to a directory on the local filesystem. Manually tracking and updating all dependent packages of a MirageOS unikernel is not feasible: our hello-world compiled for hvt (kvm/bhyve) already has 79 opam dependencies, including the OCaml compiler, which is distributed as an opam package. The unikernel serving this website depends on 175 opam packages.</p> <p>Conceptually there should be two tools: the <em>initial builder</em>, which takes the latest opam packages which do not conflict, and exports the exact package versions used during the build, as well as hashes of the binaries. The other tool is a <em>rebuilder</em>, which imports the export, conducts a build, and outputs the hashes of the produced binaries.</p> <p>Opam has the concept of a <code>switch</code>, which is an environment where a package set is installed. Switches are independent of each other, and can already be exported and imported.
Unfortunately the export is incomplete: if a package includes additional patches as part of the repository -- sometimes needed for fixing releases where the actual author or maintainer of a package responds slowly -- these patches do not end up in the export. Also, if a package is pinned to a git branch, the branch appears in the export, but it may change over time by pushing more commits or even force-pushing to that branch. In <a href="https://github.com/ocaml/opam/pull/4040">PR #4040</a> (under discussion and review), also developed during the summit, I propose to embed the additional files as base64-encoded values in the opam file. To solve the latter issue, I modified the export mechanism to <a href="https://github.com/ocaml/opam/pull/4055">embed the git commit hash (PR #4055)</a>, and to reject sources from a local directory or without a checksum.</p> <p>So the opam export contains the information required to gather the exact same sources and build instructions of the opam packages. If the opam repository were self-contained (i.e. did not depend on any other tools), this would be sufficient. But opam does not run in thin air: it requires some system utilities such as <code>/bin/sh</code>, <code>sed</code>, a GNU make, commonly <code>git</code>, a C compiler, a linker, an assembler. Since opam is available on various operating systems, the plugin <code>depext</code> handles host system dependencies, e.g. if your opam package requires <code>gmp</code> to be installed, this requires slightly different package names depending on the host system or distribution; take a look at <a href="https://github.com/ocaml/opam-repository/blob/master/packages/conf-gmp/conf-gmp.1/opam">conf-gmp</a>. This also means opam has rather good information about both the opam dependencies and the host system dependencies for each package. Please note that the host system packages used during compilation are not yet recorded (i.e.
which <code>gmp</code> package was installed and used during the build, only that a <code>gmp</code> package has to be installed). The base utilities mentioned above (C compiler, linker, shell) are also not recorded yet.</p> <p>Operating system information available in opam (such as architecture, distribution, version), which in some cases maps to exact base utilities, is recorded in the build-environment, a separate artifact. The environment variable <a href="https://reproducible-builds.org/specs/source-date-epoch/"><code>SOURCE_DATE_EPOCH</code></a>, used for communicating the same timestamp when software is required to record a timestamp into the resulting binary, is also captured in the build environment.</p> <p>Additional environment variables may be captured or used by opam packages to produce different output. To avoid this, both the initial builder and the rebuilder are run with minimal environment variables: only <code>PATH</code> (normalised to a whitelist of <code>/bin</code>, <code>/usr/bin</code>, <code>/usr/local/bin</code> and <code>/opt/bin</code>) and <code>HOME</code> are defined. Missing information at the moment includes CPU features: some libraries (gmp?, nocrypto) emit different code depending on the CPU feature.</p> <h2 id="tooling">Tooling</h2> <p><em>TL;DR: A <strong>build</strong> builds an opam package, and outputs <code>.opam-switch</code>, <code>.build-hashes.N</code>, and <code>.build-environment.N</code>. A <strong>rebuild</strong> uses these artifacts as input, builds the package and outputs another <code>.build-hashes.M</code> and <code>.build-environment.M</code>.</em></p> <p>The command-line utility <code>orb</code> can be installed and used:</p> <pre><code class="language-sh">$ opam pin add orb git+https://github.com/hannesm/orb.git#active $ orb build --twice --keep-build-dir --diffoscope &lt;your-favourite-opam-package&gt; </code></pre> <p>It provides two subcommands <code>build</code> and <code>rebuild</code>. 
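To illustrate the <code>SOURCE_DATE_EPOCH</code> convention mentioned above: a build step that wants to embed a timestamp should prefer the environment variable over the current time, roughly as in this Python sketch (following the reproducible-builds.org spec; the format string is a free choice):

```python
import os
import time

def build_timestamp() -> str:
    # honour SOURCE_DATE_EPOCH (seconds since the Unix epoch) if set,
    # falling back to the current time otherwise
    sde = os.environ.get("SOURCE_DATE_EPOCH")
    epoch = int(sde) if sde is not None else int(time.time())
    # use gmtime, not localtime, so the result is timezone-independent
    return time.strftime("%Y-%m-%d %H:%M:%SZ", time.gmtime(epoch))

os.environ["SOURCE_DATE_EPOCH"] = "1576521600"
print(build_timestamp())  # identical on every rebuild
```

With the variable pinned in the build environment, every rebuild embeds the same timestamp.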
The <code>build</code> command takes a list of local opam <code>--repos</code> from which to take opam packages (defaults to <code>default</code>), a compiler (either a variant <code>--compiler=4.09.0+flambda</code>, a version <code>--compiler=4.06.0</code>, or a pin to a local development version <code>--compiler-pin=~/ocaml</code>), and optionally an existing switch <code>--use-switch</code>. It creates a switch, builds the packages, and emits the opam export, hashes of all files installed by these packages, and the build environment. The flag <code>--keep-build</code> retains the build products; opam's <code>--keep-build-dir</code> additionally retains temporary build products and generated source code. If <code>--twice</code> is provided, a rebuild (described next) is executed after the initial build.</p> <p>The <code>rebuild</code> command takes a directory with the opam export and build environment to build the opam package. It first compares the build environment with the host system, sets the <code>SOURCE_DATE_EPOCH</code> and switch location accordingly, and executes the import. Once the build is finished, it compares the hashes of the resulting files with the previous run. On divergence, if build directories were kept in the previous build, and if diffoscope is available and <code>--diffoscope</code> was provided, diffoscope is run on the diverging files. If <code>--keep-build-dir</code> was provided as well, <code>diff -ur</code> can be used to compare the temporary build and sources, including build logs.</p> <p>The builds are run in parallel, as opam does; this parallelism did not lead to different binaries in my experiments.</p> <h2 id="results-and-discussion">Results and discussion</h2> <p><strong>All MirageOS unikernels I have deployed are reproducible \o/</strong>.
Also, several binaries such as <code>orb</code> itself, <code>opam</code>, <code>solo5-hvt</code>, and all <code>albatross</code> utilities are reproducible.</p> <p>The unikernels range from hello world to web servers (e.g. this blog, fetching its data on startup via a git clone into memory), authoritative DNS servers, and a CalDAV server. They vary in size between 79 and 200 opam packages, resulting in ELF binaries between 2MB and 16MB (including debug symbols). The <a href="https://github.com/roburio/reproducible-unikernel-repo">unikernel opam repository</a> contains some reproducible unikernels used for testing. Some work-in-progress enhancements were needed to achieve this:</p> <p>At the moment, the opam package of a MirageOS unikernel is automatically generated by <code>mirage configure</code>, but only used for tracking opam dependencies. I worked on <a href="https://github.com/mirage/mirage/pull/1022">mirage PR #1022</a> to extend the generated opam package with build and install instructions.</p> <p>If a locale is set, ocamlgraph needs to be patched to not emit a (locale-dependent) timestamp (see the ocamlgraph issue below).</p> <p>The OCaml program <a href="https://github.com/mirage/ocaml-crunch"><code>crunch</code></a> embeds a subdirectory as OCaml code into a binary, which we use in MirageOS quite regularly for static assets, etc. This plays into reproducibility in several ways: on the one hand, it needs a timestamp for its <code>last_modified</code> functionality (and adheres since <a href="https://github.com/mirage/ocaml-crunch/pull/45">June 2018</a> to the <code>SOURCE_DATE_EPOCH</code> spec, thanks to Xavier Clerc).
On the other hand, before version 3.2.0 (released Dec 14th) it used hashtables for storing the file contents, where iteration order is not deterministic; this was <a href="https://github.com/mirage/ocaml-crunch/pull/51">fixed in PR #51</a> by using a Map instead.</p> <p>Functoria, a tool used to configure MirageOS devices and their dependencies, can emit a list of the opam packages which were required to build the unikernel. This uses <code>opam list --required-by --installed --rec &lt;pkgs&gt;</code>, which uses the cudf graph (<a href="https://github.com/mirage/functoria/pull/189#issuecomment-566696426">thanks to Raja for the explanation</a>) and was dropping some packages during the rebuild. <a href="https://github.com/mirage/functoria/pull/189">PR #189</a> avoids this by not using the <code>--rec</code> argument and manually computing the fixpoint instead.</p> <p>Certainly, the choice of environment variables, and whether to vary them (as <a href="https://tests.reproducible-builds.org/debian/index_variations.html">debian does</a>) or to leave them undefined (or normalised) while building, is debatable. Since MirageOS supports neither time zones nor internationalisation, there is no need to prematurely solve this issue. On a related note, even with different locale settings, MirageOS unikernels are reproducible apart from an <a href="https://github.com/backtracking/ocamlgraph/pull/90">issue in ocamlgraph #90</a> embedding the output of <a href="https://pubs.opengroup.org/onlinepubs/9699919799/utilities/date.html"><code>date</code></a>, which differs depending on <code>LANG</code> and locale (<code>LC_*</code>) settings.</p> <p>Prior art in reproducible MirageOS unikernels is the <a href="https://github.com/mirage/qubes-mirage-firewall/">mirage-qubes-firewall</a>. Since <a href="https://github.com/mirage/qubes-mirage-firewall/commit/07ff3d61477383860216c69869a1ffee59145e45">early 2017</a> it has been reproducible.
Their approach is different: they build in a docker container with the opam repository pinned to an exact git commit.</p> <h2 id="further-work">Further work</h2> <p>I only tested a certain subset of opam packages and MirageOS unikernels, mainly on a single machine (my laptop) running FreeBSD, and would be happy if others tested the reproducibility of their OCaml programs with the tools provided. There could also be CI machines rebuilding opam packages and reporting results to a central repository. I'm pretty sure there are more reproducibility issues in the opam ecosystem. I developed a <a href="https://github.com/roburio/reproducible-testing-repo">reproducible testing opam repository</a> with opam packages that do not depend on OCaml, mainly for further tooling development. Some tests were also conducted on a Debian system with the same result. The variations, apart from build time, were a different user and different locale settings.</p> <p>As mentioned above, more of the environment, such as CPU features and external system packages, should be captured in the build environment.</p> <p>When comparing OCaml libraries, some output files (cmt / cmti / cma / cmxa) are not deterministic, but contain minimal divergences whose root cause I was not able to spot. It would be great to fix this, likely in the OCaml compiler distribution. Since the final result, the binary I'm interested in, is not affected by non-identical intermediate build products, I hope someone (you?) is interested in improving on this side. OCaml bytecode output also seems to be non-deterministic.
There is <a href="https://github.com/coq/coq/issues/11229">a discussion on the coq issue tracker</a> which may be related.</p> <p>In contrast to initial plans, I did not use the <a href="https://reproducible-builds.org/specs/build-path-prefix-map/"><code>BUILD_PATH_PREFIX_MAP</code></a> environment variable, which is implemented in OCaml by <a href="https://github.com/ocaml/ocaml/pull/1515">PR #1515</a> (and follow-ups). The main reasons are that something in the OCaml toolchain (I suspect the bytecode interpreter) needed absolute paths to find libraries, thus I'd need a symlink from the left-hand side to the current build directory, which was tedious. Also, my installed assembler does not respect the build path prefix map, and BUILD_PATH_PREFIX_MAP is not widely supported. See e.g. the Debian <a href="https://tests.reproducible-builds.org/debian/rb-pkg/unstable/amd64/diffoscope-results/ocaml-zarith.html">zarith</a> package with different build paths and its effects on the binary.</p> <p>I'm fine with recording the build path (switch location) in the build environment for now - it turns out to end up only once in MirageOS unikernels, likely from the last linking step, which will <a href="http://blog.llvm.org/2019/11/deterministic-builds-with-clang-and-lld.html">hopefully soon be solved by llvm 9.0</a>.</p> <p>What was fun was comparing the unikernel built on Linux with gcc against a build on FreeBSD with clang and lld - spoiler: they emit debug sections with different dwarf versions, and the diff is pretty big. Other fun differences were between OCaml compiler versions: the difference between minor versions (4.08.0 vs 4.08.1) is pretty small (~100kB as human-readable diff), while the difference between major versions (4.08.1 vs 4.09.0) is rather big (~900kB as human-readable diff).</p> <p>An item on my list for the future is to distribute the opam export, build hashes, and build environment artifacts in an authenticated way.
I want to integrate this, <a href="https://in-toto.io/">in-toto</a> style, into <a href="https://github.com/hannesm/conex">conex</a>, my not-yet-deployed implementation of <a href="https://theupdateframework.github.io/">tuf</a> for opam, which needs further development and a test installation, hopefully in 2020.</p> <p>If you want to support our work on MirageOS unikernels, please <a href="https://robur.coop/Donate">donate to robur</a>. I'm interested in feedback, either via <a href="https://twitter.com/h4nnes">twitter</a>, <a href="https://mastodon.social/@hannesm">hannesm@mastodon.social</a> or via eMail.</p> <p><em>Reproducible MirageOS unikernel builds (hannes, 2021-11-19)</em></p>