<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[Tarides RSS Feed]]></title><description><![CDATA[Tarides RSS Feed]]></description><link>https://tarides.com</link><generator>Dream</generator><lastBuildDate>Wed, 08 Apr 2026 00:00:00 GMT</lastBuildDate><item><title><![CDATA[ VSCode Walkthrough: Installing OCaml in 1 Click]]></title><description><![CDATA[<p>We are making it easier to get started with OCaml! Tarides dedicates a lot of resources towards improving the developer experience in OCaml, and that includes thinking about how we can reduce friction (and frustration!) for newcomers. Recently, this effort has produced a walkthrough in VSCode for setting up OCaml, essentially a ‘one-click installation’ using <code>opam</code>.</p>
<p>To test it out, install VSCode and the OCaml Platform extension. Having them both installed will automatically open the walkthrough. Should you need to manually open the walkthrough after installing the OCaml Platform extension, start by clicking on the help menu in the top right corner, then select ‘open walkthrough’, and select the one titled ‘OCaml: Setup opam dev environment (manual)’.</p>
<h2>Mission</h2>
<p>VSCode is by far the most <a href="https://survey.stackoverflow.co/2024/technology#most-popular-technologies-new-collab-tools-learn">popular editor among beginners</a>, mainly thanks to its helpful UI, which gives users plenty of support: welcome/setup panels, interactive menus, selection tools, and so on. By targeting VSCode first we are making installation easier for newcomers, who are more likely to struggle to set up OCaml. It will also benefit anyone who prefers the VSCode editor and provide maintainers with helpful feedback to improve other workflows.</p>
<p>Our mission at the start of the project was to focus on reducing friction by minimising reliance on the terminal and external setup documentation. One of the earliest pain points when installing OCaml is understanding where to find and install everything you need to get started with writing code. Even if you know where to find everything, it can be challenging to get to it all in the right order. OCaml.org has an <a href="https://ocaml.org/docs/installing-ocaml">installation guide</a>, which is a great resource, but it’s a lot of information to take in and can be hard to parse.</p>
<p>We care about growing the OCaml community, and feedback suggests that installation is a common pain point for newcomers. To improve their experience, we need to ensure that the pathways to installation and setting up projects are smooth.</p>
<h2>The Walkthrough</h2>
<p>The new VSCode walkthrough opens a window when users install the OCaml Platform extension. The page guides the user through five steps, providing clear instructions and a UI that indicates when each step is in progress and when it succeeds. When the user clicks on a step, the walkthrough opens an integrated terminal and completes it for them.</p>
<p>The steps are:</p>
<ul>
<li>Install <code>opam</code>: VSCode opens a terminal and runs the <code>opam</code> install script.</li>
<li>Initialise <code>opam</code>: VSCode opens a terminal and runs <code>opam init</code>. Initialising <code>opam</code> prepares the user’s system to use <code>opam</code> to manage OCaml packages and compilers.</li>
<li>Activate the <code>opam</code> switch: This step prompts the user to select and activate the <code>opam</code> switch. It explains that an <code>opam</code> switch is an isolated OCaml environment where users can install different versions of OCaml and packages.</li>
<li>Install Platform Tools: The next click installs some key OCaml tools: OCaml LSP for editor support, <code>odoc</code> to generate documentation, <code>ocamlformat</code> to format code, and <code>utop</code> as an interactive REPL.</li>
<li>Check Installation: This step verifies the OCaml installation by running <code>utop</code>.</li>
<li>Congratulations: Success! OCaml is installed, and users are asked to fill in an optional feedback form.</li>
</ul>
<p>Here's what the start screen for the walkthrough looks like:
<img sizes="(min-width: 1360px) 1360px, (min-width: 680px) 680px, 100vw" srcset="/blog/images/VSCode-light-170w~szXkGLPztCDiLQxytSj9vw.webp 170w, /blog/images/VSCode-light-340w~bZ4IRhLnRFStUvLZtCke6Q.webp 340w, /blog/images/VSCode-light-680w~3pSd9PbwvthxRJs69M_Zrg.webp 680w, /blog/images/VSCode-light-1360w~cGaZLYvkX4jLZiMvlFU5Qg.webp 1360w" src="/blog/images/VSCode-light-1360w~cGaZLYvkX4jLZiMvlFU5Qg.webp" alt="The first window for the VSCode walkthrough, which lists the steps described above"></p>
<h2>Until Next Time!</h2>
<p>There are still things that the team is hoping to iterate on in the coming months. Using the input from the feedback form, the team wants to improve the walkthrough as well as simplify the documentation on OCaml.org to reflect the new changes. Once Dune Package Management has its first full release, the team also plans to create a similar walkthrough for that workflow to support its users.</p>
<p>You can connect with us on <a href="https://bsky.app/profile/tarides.com">Bluesky</a>, <a href="https://mastodon.social/@tarides">Mastodon</a>, <a href="https://www.threads.net/@taridesltd">Threads</a>, and <a href="https://www.linkedin.com/company/tarides">LinkedIn</a> or sign up for our mailing list to stay updated on our latest projects. We look forward to hearing from you!</p>
]]></description><link>https://tarides.com/blog/2026-04-08-vscode-walkthrough-installing-ocaml-in-1-click</link><guid isPermaLink="false">https://tarides.com/blog/2026-04-08-vscode-walkthrough-installing-ocaml-in-1-click.html</guid><dc:creator><![CDATA[ Isabella Salenius ]]></dc:creator><pubDate>Wed, 08 Apr 2026 00:00:00 GMT</pubDate></item><item><title><![CDATA[Announcing `ciao-lwt`: A Library for Migrating Lwt to Eio]]></title><description><![CDATA[<p>The I/O library <a href="https://github.com/ocaml-multicore/eio">Eio</a>, which uses effects and direct-style concurrency, was released in 2024. Since then, users have seized the opportunity to <a href="/blog/2024-09-19-eio-from-a-user-s-perspective-an-interview-with-simon-grondin/">test it in their own projects</a>, and several OCaml devs have ported applications to Eio.</p>
<p>Now, with a new library called <a href="https://github.com/tarides/ciao-lwt">ciao-lwt</a>, users can automate part of the migration process from Lwt to Eio. One of our engineers, Jules Aguillon, has been developing the library and using it for the <a href="https://ocsigen.org/home/intro.html">Ocsigen</a> project. This post shares his work and will introduce you to <code>ciao-lwt</code>, how to try it, and what limitations you should expect.</p>
<p>The project is made possible thanks to a grant from the <a href="https://nlnet.nl">NLnet Foundation</a>, which funds research and development projects furthering internet technologies and the open internet, and the <a href="https://nlnet.nl/core/">NGI Zero Core fund</a> of the European commission.</p>
<h2>Why Would I Switch to Eio?</h2>
<p>Ultimately, the concurrency library you choose comes down to a matter of taste, but Eio has some nice characteristics that you may find worth the switch. Since it is direct-style, Eio does not require you use a monad for concurrency, which gets rid of the so-called ‘<a href="https://journal.stuffwithstuff.com/2015/02/01/what-color-is-your-function/">function colouring problem</a>’. The resulting code is faster, less complex, and has some nice security capabilities.</p>
<p>You can read <a href="/blog/2024-03-20-eio-1-0-release-introducing-a-new-effects-based-i-o-library-for-ocaml/">our blog post on Eio 1.0</a> for more context.</p>
<h2>Ciao-Lwt</h2>
<p>The library contains a collection of tools for translating code written with Lwt into Eio code. Lwt distinguishes asynchronous code from synchronous code through bind operators (or bindings): a function that performs concurrent work returns a promise, and its result must be threaded through an explicit bind. These are the ‘bind’ and ‘map’ operators, including <code>Lwt.bind</code>, <code>Lwt.map</code>, <code>let*</code>, <code>let+</code>, as well as the infix operators <code>&gt;&gt;=</code> and <code>&gt;|=</code>.</p>
<p>The first step in turning Lwt into Eio code is to get rid of the bind operators. They weave through every part of Lwt code and are time-intensive to remove one by one. <code>Ciao-lwt</code> can automate the process and remove the bindings for you.  However, the library has some limitations, which will be explained in more detail below.</p>
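<p>To make the role of the bind operators more concrete, here is a minimal sketch that compares monadic style with direct style. It deliberately uses a toy promise type rather than Lwt itself, so it runs with the standard library alone (OCaml 4.08 or later). The <code>Toy</code> module and the <code>fetch_*</code> functions are illustrative assumptions and are not part of Lwt, Eio, or <code>ciao-lwt</code>.</p>
<pre><code>(* A toy stand-in for a promise library. This is NOT how Lwt is
   implemented; it only mimics the shape of its bind operators. *)
module Toy = struct
  type 'a t = Resolved of 'a
  let return x = Resolved x
  let bind (Resolved x) f = f x
  let map f (Resolved x) = Resolved (f x)
  let ( let* ) = bind
  let ( let+ ) p f = map f p
  let run (Resolved x) = x
end

(* Hypothetical operations that return promises. *)
let fetch_a () = Toy.return 1
let fetch_b () = Toy.return 2

(* Monadic style: every intermediate result goes through a bind operator. *)
let sum_monadic () =
  let open Toy in
  let* a = fetch_a () in
  let+ b = fetch_b () in
  a + b

(* Direct style: the same logic once the binds are gone. *)
let fetch_a_direct () = 1
let fetch_b_direct () = 2

let sum_direct () =
  let a = fetch_a_direct () in
  let b = fetch_b_direct () in
  a + b

let () =
  Printf.printf "%d %d\n" (Toy.run (sum_monadic ())) (sum_direct ())
</code></pre>
<p>Both functions compute the same value; the difference is that the monadic version changes the type of every function it touches, which is why removing the binds by hand is so time-consuming.</p>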
<p>At the moment, the library contains the following tools:</p>
<ul>
<li><a href="https://github.com/tarides/ciao-lwt?tab=readme-ov-file#remove-usages-of-lwt_ppx"><code>lwt-ppx-to-let-syntax</code></a>: Removes instances of <code>lwt_ppx</code> and replaces them with Lwt library function calls,</li>
<li><a href="https://github.com/tarides/ciao-lwt?tab=readme-ov-file#find-implicit-forks"><code>lwt_lint</code></a>: Finds implicit forks in Lwt code,</li>
<li><a href="https://github.com/tarides/ciao-lwt?tab=readme-ov-file#migrate-from-Lwt_log-to-Logs"><code>lwt-log-to-logs</code></a>: Rewrites files containing <code>Lwt_log</code> and migrates them to use <code>Logs</code>,</li>
<li><a href="https://github.com/tarides/ciao-lwt?tab=readme-ov-file#migrate-from-Lwt-to-Eio"><code>lwt-to-direct-style</code></a>:  Finds and rewrites <code>Lwt</code> and other Lwt modules, turning them into Eio code instead.</li>
</ul>
<p>Lastly, something to bear in mind is that <code>ciao-lwt</code> uses <a href="https://github.com/ocaml/merlin">Merlin</a>'s index to locate every use of <code>Lwt</code>.</p>
<h2>How Do I Try It?</h2>
<p>To get started with <code>ciao-lwt</code>, the first thing to do is <a href="https://github.com/tarides/ciao-lwt">visit the repo</a> and install the tools in the <code>opam</code> switch you’re using to build your projects using the command:</p>
<pre><code>opam install ciao_lwt lwt_lint lwt_ppx_to_let_syntax
</code></pre>
<p>To make reviewing the changes easier, make sure your code is formatted before you start. The tools entirely reformat every file they touch, which can make the actual changes harder to see.</p>
<p>The first step is to remove any use of <code>lwt_ppx</code> (for example the <code>let%lwt</code> syntax):</p>
<pre><code>lwt-ppx-to-let-syntax .
dune fmt # Remove formatting changes created by the tool
</code></pre>
<p>This operation is purely syntactical: the tool simply walks the given directory tree and parses every <code>.ml</code> file it finds, updating the files that contain usages of <code>lwt_ppx</code>.</p>
<p>Before running the next tool, try eliminating common causes of implicit forks:</p>
<pre><code>lwt-lint .
</code></pre>
<p>This operation is also purely syntactical. The tool warns about every occurrence of <code>let _ = ..</code> and <code>ignore</code> that doesn’t have a type annotation. This helps you find cases where an Lwt promise is disregarded. To silence each warning, add a type annotation, for example: <code>let _ : my_t = ..</code> and <code>ignore (.. :my_t)</code>.</p>
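<p>As a small illustration of why the annotations help, the sketch below uses only the standard library. The <code>send_report</code> function and the <code>calls</code> counter are hypothetical; imagine that <code>send_report</code> might one day be changed to return a promise instead of <code>unit</code>.</p>
<pre><code>(* Hypothetical function, used only for illustration. *)
let calls = ref 0
let send_report () = incr calls

(* Unannotated: this compiles whatever send_report returns, so a
   promise could be dropped without anyone noticing. This is the
   pattern the linter is described as warning about. *)
let _ = send_report ()

(* Annotated: the compiler rejects these lines if the result stops
   being unit, for example if it becomes a promise. *)
let _ : unit = send_report ()
let () = ignore (send_report () : unit)

let () = Printf.printf "send_report ran %d times\n" !calls
</code></pre>
<p>The annotation costs a few characters, but it turns a silently discarded promise into a type error.</p>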
<p>If you use <code>Lwt_log</code>, you can migrate to <code>Logs</code> easily with:</p>
<pre><code>dune build @ocaml-index # Build the index (required)
ciao-lwt to-logs --migrate .
dune fmt # Remove formatting changes created by the tool
</code></pre>
<p>This tool works similarly to <code>ciao-lwt to-eio</code>, described below. It is provided as a separate command for two reasons: your program will likely still work as before once logging has been migrated, so you can review this step independently, and it simplifies the next step.</p>
<p>Finally, migrate to Eio:</p>
<pre><code>dune build @ocaml-index # Build the index (required)
ciao-lwt to-eio --migrate .
dune fmt # Remove formatting changes created by the tool
</code></pre>
<p>This operation migrates the common uses of Lwt, but the transition is not yet complete.</p>
<h2>Limitations &amp; Considerations</h2>
<p><code>Ciao-lwt</code> is still considered experimental and a work-in-progress, which you should bear in mind when you try it. Your feedback and input are very welcome and will help the team improve the tools.</p>
<p>It sounds obvious, but as a promise-based concurrency library, Lwt creates a lot of promises. Everything that is concurrent in Lwt is a promise; it specifies actions that will happen at a later time. Some promises are so-called ‘implicit forks’, which do not use the bindings we mentioned earlier.</p>
<p>Let's look at an implicit fork:</p>
<pre><code>let _ =
  let a = operation_1 () in
  let* b = operation_2 () in
  let* a = a in
  Lwt.return (a + b)
</code></pre>
<p>Here, <code>let a = operation_1 () in</code> 'forks', meaning it creates a concurrent thread. Since there are no binding operators or <code>Lwt</code> function calls, <code>ciao-lwt</code> can't detect this fork syntactically or with Merlin's index.</p>
<p>As a result, while Lwt would run <code>operation_1</code> and <code>operation_2</code> concurrently, after <code>ciao-lwt</code> converts the code to Eio they would instead run sequentially:</p>
<pre><code>let _ =
  let a = operation_1 () in
  let b = operation_2 () in
  let a = a in
  a + b
</code></pre>
<p>Users need to be aware of how <code>ciao-lwt</code> handles 'implicit forks' so they can fix bugs introduced in the migration.</p>
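<p>One possible manual fix, sketched here under stated assumptions, is to make the fork explicit with <code>Eio.Fiber.pair</code>, which runs two functions concurrently and waits for both results. This is not something <code>ciao-lwt</code> is described as generating for you. The bodies of <code>operation_1</code> and <code>operation_2</code> below are hypothetical placeholders, and the example assumes the <code>eio_main</code> package is installed.</p>
<pre><code>(* Hypothetical placeholders for the two operations in the example above. *)
let operation_1 () = 1
let operation_2 () = 2

let sum () =
  (* Fiber.pair runs both functions concurrently and returns both results. *)
  let a, b = Eio.Fiber.pair operation_1 operation_2 in
  a + b

(* Fibers need an Eio event loop, which Eio_main.run provides. *)
let main _env = Printf.printf "%d\n" (sum ())

let () = Eio_main.run main
</code></pre>
<p>Whether the two operations really need to run concurrently is a design decision that only someone who knows the original program can make, which is why spotting these forks matters.</p>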
<p>The most helpful tool to verify your new code is the OCaml compiler. Your resulting code will likely not typecheck, and OCaml's typechecker can guide you towards the manual changes you will need to make. It's not foolproof, and you will still need to be on the lookout for concurrency bugs, but <code>ciao-lwt</code>'s tools in combination with the OCaml compiler will give you a nice head start on your journey from Lwt to Eio.</p>
<h2>Until Next Time</h2>
<p>Tarides remains committed to creating new tools that make new and old workflows easier. We hope <code>ciao-lwt</code> proves useful to you, and appreciate any feedback you have to share.</p>
<p>You can connect with us on <a href="https://bsky.app/profile/tarides.com">Bluesky</a>, <a href="https://mastodon.social/@tarides">Mastodon</a>, <a href="https://www.threads.net/@taridesltd">Threads</a>, and <a href="https://www.linkedin.com/company/tarides">LinkedIn</a> or sign up for our mailing list to stay updated on our latest projects. We look forward to hearing from you!</p>
]]></description><link>https://tarides.com/blog/2026-03-05-announcing-ciao-lwt-a-library-for-migrating-lwt-to-eio</link><guid isPermaLink="false">https://tarides.com/blog/2026-03-05-announcing-ciao-lwt-a-library-for-migrating-lwt-to-eio.html</guid><dc:creator><![CDATA[ Isabella Salenius ]]></dc:creator><pubDate>Thu, 05 Mar 2026 00:00:00 GMT</pubDate></item><item><title><![CDATA[Announcing New Wasm_of_ocaml Optimisations]]></title><description><![CDATA[<p>2025 was a good year for WASM support in OCaml! In February 2025, we announced the full release of Wasm_of_ocaml (also known as WSOO), a compiler translating OCaml bytecode to WebAssembly. Since then, our teams have been working on different improvements to the OCaml ecosystem, including boosting the performance of WSOO.</p>
<p>WSOO is already known for its speed. Users switching from Js_of_ocaml (JSOO), which translates OCaml bytecode to JavaScript, can expect significant performance improvements. Back in early 2025, Jane Street reported that they saw significant <a href="/blog/2025-02-19-the-first-wasm-of-ocaml-release-is-out/">improvements</a> using WSOO in comparison to JSOO. With the most recent optimisations WSOO is even faster!</p>
<p>You can try Wasm_of_ocaml for your own projects by visiting the manual, which includes installation instructions, <a href="https://ocsigen.org/js_of_ocaml/latest/manual/wasm_overview">on the Ocsigen website</a>.</p>
<h2>How Have We Made Wasm_of_ocaml Faster?</h2>
<p>Almost as soon as the feature-complete release of Wasm_of_ocaml was out, a team at Tarides began working on optimisations for the library. Jérôme Vouillon has led the way, testing each change and measuring the resulting performance improvements. Let’s take a look at some of the PRs that have come out of this effort and what has changed:</p>
<ul>
<li>
<p>Inlining pass: Jérôme rewrote the inlining pass to adjust its behaviour in certain cases. The change avoids inlining code in a loop that WASM engines can’t optimise because V8 currently has no way to switch to more efficient code while executing a loop. The pass is now also more assertive in inlining functors and functions, including <code>List.fold_left</code>. The PR is <a href="https://github.com/ocsigen/js_of_ocaml/pull/1935">#1935</a>.</p>
</li>
<li>
<p>The OCaml standard library contains a <code>Bigarray</code> module that offers an API to manipulate arrays of numerical values (integers or floating-point values of various sizes). When compiling a program to WebAssembly, <code>Bigarray</code> operations are translated behind the scenes into operations on <a href="https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Typed_arrays">JavaScript “typed arrays”</a>. Up to now, these operations were implemented in terms of direct access to these arrays, but this method went through calls to JavaScript functions that incurred some overhead.</p>
<p>JavaScript offers an alternative API called <code>DataView</code>, whose operations are recognised and compiled as direct memory accesses by Wasm engines such as Google’s V8. We have modified the backend functions supporting <code>Bigarray</code> to use <code>DataView</code> and benchmarked the change with a program performing millions of <code>Bigarray</code> accesses, observing an impressive 3.9x speedup.</p>
<p>Check out <a href="https://github.com/ocsigen/js_of_ocaml/pull/1979">PR #1979 to look more closely at the changes</a>.</p>
</li>
<li>
<p>Function call optimisations: There are a whole host of changes targeting function calls:</p>
<ul>
<li>PR <a href="https://github.com/ocsigen/js_of_ocaml/pull/2041">#2041</a> optimises the representation of closures using more precise types.</li>
<li>PR <a href="https://github.com/ocsigen/js_of_ocaml/pull/2044">#2044</a> optimises calls to a statically known function.</li>
<li>PR <a href="https://github.com/ocsigen/js_of_ocaml/pull/2059">#2059</a> omits the code pointer when it is not used (because of the previous optimisation). This reduces the amount of memory allocated by the program, but also allows <code>Binaryen</code> to perform some global optimisations.</li>
</ul>
</li>
<li>
<p>Integer optimisations: Wasm_of_ocaml uses 31-bit integers to allow for a <a href="https://ocaml.org/docs/memory-representation#distinguishing-integers-and-pointers-at-runtime">uniform representation of integers and references</a>. The integers have to be converted to 32-bit integers to perform numeric operations. PR <a href="https://github.com/ocsigen/js_of_ocaml/pull/2032">#2032</a> optimises this workflow by avoiding unnecessary conversions between 31- and 32-bit integers, which improves performance.</p>
</li>
<li>
<p>Number unboxing: Avoids boxing numbers (both <a href="https://github.com/ocsigen/js_of_ocaml/pull/2069">within functions #2069</a> and <a href="https://github.com/ocsigen/js_of_ocaml/pull/2101">outside of functions #2101</a>) when the boxed value is not used. The change significantly improved the microbenchmarks <code>almabench</code> and <code>fft</code>.</p>
</li>
<li>
<p>Number comparisons and <code>bigarray</code> operations: Specialisation of number comparisons and <code>bigarray</code> operations in PR <a href="https://github.com/ocsigen/js_of_ocaml/pull/1954">#1954</a>, based on a type analysis, optimises the performance of functions. Future work to improve on this optimisation centres around using hints from the OCaml compiler.</p>
</li>
</ul>
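<p>To give a sense of the kind of code that the <code>Bigarray</code> change above affects, here is a minimal sketch of an access-heavy loop. It is not the benchmark program the team used; the function name and the array size are assumptions chosen for illustration.</p>
<pre><code>(* Fill a Bigarray of floats, then read every element back. Each
   Array1.set and Array1.get is one Bigarray access, which is the
   operation the DataView change is described as speeding up. *)
let sum_squares n =
  let open Bigarray in
  let a = Array1.create float64 c_layout n in
  for i = 0 to n - 1 do
    Array1.set a i (float_of_int (i * i))
  done;
  let total = ref 0.0 in
  for i = 0 to n - 1 do
    total := !total +. Array1.get a i
  done;
  !total

let () = Printf.printf "%.0f\n" (sum_squares 1_000)
</code></pre>
<p>The same source compiles natively, to JavaScript, or to WebAssembly; only the way the accesses are implemented behind the scenes differs between backends.</p>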
<p>Hopefully, you now have a good sense of just how much has been tweaked and improved to make WSOO faster! But, you may ask, just <em>how much faster</em> are we talking about?</p>
<h2>Benchmarks</h2>
<p>Several of these changes have contributed to improving WSOO's performance, visualised in the graphs below. The first one is the <a href="https://github.com/linoscope/CAMLBOY">CAMLboy benchmark</a>, showing steady improvement over time until it was one third faster.</p>
<p><img sizes="(min-width: 1360px) 1360px, (min-width: 680px) 680px, 100vw" srcset="/blog/images/CAMLBoy-bench-170w~g9I0gwHa3pnM8DMEcFaiWw.webp 170w, /blog/images/CAMLBoy-bench-340w~m1tiaF4quAJuoCFyc2Nh-Q.webp 340w, /blog/images/CAMLBoy-bench-680w~9UQYzMaokwWfb66CI0AliA.webp 680w, /blog/images/CAMLBoy-bench-1360w~ZKIIldzmpp7-z09HOdethg.webp 1360w" src="/blog/images/CAMLBoy-bench-1360w~ZKIIldzmpp7-z09HOdethg.webp" alt="A graph in blue showing a delta of performance improvement"></p>
<p>The second graph shows the performance improvement for some selected microbenchmarks. Numerical benchmarks, like almabench, fft, nucleic, and raytrace, show nice improvements. The integer optimisation helps for fib, quicksort, and fft. The raytrace microbenchmark shows the most improvement, thanks to better inlining and number unboxing across functions.</p>
<p><img sizes="(min-width: 1360px) 1360px, (min-width: 680px) 680px, 100vw" srcset="/blog/images/microbench-wsoo-170w~SJTc8JzfRvLzCrG0GO9qig.webp 170w, /blog/images/microbench-wsoo-340w~7SsE0DgRHCGO7nDuKUZzxQ.webp 340w, /blog/images/microbench-wsoo-680w~iMGeWnAvt6pkn5_kRmqRMw.webp 680w, /blog/images/microbench-wsoo-1360w~GMMAz3M8DNJMAhG6MjH0yQ.webp 1360w" src="/blog/images/microbench-wsoo-1360w~GMMAz3M8DNJMAhG6MjH0yQ.webp" alt="A graph of benchmark comparisons showing improvements"></p>
<h2>What’s Next for Wasm_of_ocaml?</h2>
<p>Look out for the next milestone for WSOO: adding WASI 0.1 support! The team have implemented an alternative runtime based on <a href="https://wasi.dev/">WASI</a>, a group of API specifications for software compiled to the <a href="https://www.w3.org/TR/wasm-core-2/">WebAssembly standard (W3C)</a>. WASI support will enable users to execute OCaml programs in new environments, from browsers to clouds to embedded devices. The <a href="https://github.com/ocsigen/js_of_ocaml/pull/1831">PR is still open</a> and feedback is always welcome.</p>
<p>Furthermore, another one of the team’s goals is to test an implementation of effect handlers based on the Stack Switching proposal. You can find the details in <a href="https://github.com/ocsigen/js_of_ocaml/pull/1832">#1832</a>.</p>
<p>Lastly, the team has more work planned to improve the performance of both Js_of_ocaml and Wasm_of_ocaml in 2026.</p>
<h2>Until Next Time</h2>
<p>Remember to visit the manual on Ocsigen’s website to learn more about <a href="https://ocsigen.org/js_of_ocaml/latest/manual/wasm_overview">Wasm_of_ocaml</a> and to get started with the compiler if you haven’t already. The WSOO team is always keen to hear feedback and figure out how they can improve the user experience. Please share your thoughts and experiences with the compiler on OCaml’s <a href="https://discuss.ocaml.org/">discussion forum</a>!</p>
<p>You can connect with us on <a href="https://bsky.app/profile/tarides.com">Bluesky</a>, <a href="https://mastodon.social/@tarides">Mastodon</a>, <a href="https://www.threads.net/@taridesltd">Threads</a>, and <a href="https://www.linkedin.com/company/tarides">LinkedIn</a> or sign up for our mailing list to stay updated on our latest projects. We look forward to hearing from you!</p>
]]></description><link>https://tarides.com/blog/2026-02-11-announcing-new-wasm-of-ocaml-optimisations</link><guid isPermaLink="false">https://tarides.com/blog/2026-02-11-announcing-new-wasm-of-ocaml-optimisations.html</guid><dc:creator><![CDATA[ Isabella Salenius ]]></dc:creator><pubDate>Wed, 11 Feb 2026 00:00:00 GMT</pubDate></item><item><title><![CDATA[ OCaml.org Now Uses `odoc` 3: What’s New?]]></title><description><![CDATA[<p>The team behind the documentation tool <code>odoc</code> have been hard at work on the latest update. The recent update from <code>odoc</code> 2.x to <code>odoc</code> 3 (now 3.1!) brings new features that make navigating documentation easier and give authors more customisation options. Fortunately for users of <a href="http://OCaml.org">OCaml.org</a>, its package documentation pages now use <code>odoc</code> 3! This post will give you an overview of <code>odoc</code>, the new features, and how they improve the user experience on <a href="http://OCaml.org">OCaml.org</a>.</p>
<h2>What is <code>odoc</code>?</h2>
<p>Briefly, <code>odoc</code> is a documentation generator and rendering tool. It can read <em>doc comments</em> from source files and transform them into formats such as HTML, LaTeX, or manual (‘man’) pages (as of 3.1, <code>odoc</code> also supports an experimental markdown backend). Furthermore, it enables cross-referencing, based on an extended version of <code>ocamldoc</code> markup, allowing you to create links for functions, types, modules, and documentation pages. Documentation generated by <code>odoc</code> can include links to the source code of functions, making navigating between the documentation and the code easier for the reader. The tool also automatically highlights syntax in code snippets to make documentation more legible.</p>
<p>You can learn more about <code>odoc</code> on <a href="https://ocaml.github.io/odoc/">its website</a> and on its <a href="https://ocaml.org/p/odoc/3.1.0">OCaml.org</a> page.</p>
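<p>For readers who have not written doc comments before, here is a minimal sketch of what <code>odoc</code> reads. The <code>clamp</code> function is a hypothetical example; the markup shown (inline code in square brackets, a <code>{!...}</code> cross-reference, a <code>{[ ... ]}</code> code block, and an <code>@raise</code> tag) comes from the <code>ocamldoc</code>-style syntax that <code>odoc</code> extends.</p>
<pre><code>(** [clamp ~lo ~hi x] restricts [x] to the range from [lo] to [hi].

    It is built from {!Stdlib.min} and {!Stdlib.max}.

    {[
      clamp ~lo:0 ~hi:10 42 = 10
    ]}

    @raise Invalid_argument if [lo] is greater than [hi]. *)
let clamp ~lo ~hi x =
  if lo = min lo hi then min hi (max lo x)
  else invalid_arg "clamp: lo is greater than hi"

let () = Printf.printf "%d\n" (clamp ~lo:0 ~hi:10 42)
</code></pre>
<p>When the documentation is generated, the cross-references become links and the code block is syntax-highlighted.</p>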
<h2>What’s New With <code>odoc</code> 3?</h2>
<p>3.0 was a big release, bringing multiple new features to the documentation generator. The main new features that came with the update are:</p>
<ol>
<li>Search by type: <a href="https://doc.sherlocode.com/">Sherlodoc</a> now lets users complete several specific searches in <code>odoc</code> documentation. These searches include searching by name, searching for constructors of a type, using _ to omit a subtype and search for consumers of a type, plus several more specific search options. This makes it easier for users to find lost values and navigate the docs. Previously, Sherlodoc was a separate project, but now it is integrated into <code>odoc</code>’s code base.</li>
<li>Global sidebar: <code>odoc</code> can now generate <a href="https://ocaml.github.io/odoc/odoc/odoc_for_authors.html#page-tags">global sidebars</a> and breadcrumbs for pages. Previously, pages could ‘hack together’ their sidebars and breadcrumbs (this is what OCaml.org was doing), and now it can all be managed through one tool. The change offers greater choice for authors to organise and present documentation more clearly with minimal fuss.</li>
<li>Support for Manuals: <code>odoc</code> 3 comes with several features that help users create <code>mld</code> pages. These <code>mld</code> pages are written using <code>odoc</code>’s markup language. For example, the update gives users the ability to use hierarchical manuals and to link to the manual pages of other packages or API docs for libraries that aren’t in their direct dependencies.</li>
<li><code>odoc</code> Driver: This driver helps users manage some of the more involved settings and features when running <code>odoc</code>. For example, rendering source code is much easier than when the feature was first introduced in <code>odoc</code> 2, since <code>odoc_driver</code> now manages it for you.</li>
<li>Generated documentation and source code: You can jump straight from items in docs to the source code using a ‘source link’, which navigates readers to a rendered version of the source code, without creating a custom driver.</li>
<li>Media: Images, videos, and audio files can be added to documentation generated by <code>odoc</code>. This will make tutorials and documentation more intuitive and user-friendly, offering more ways to learn and share knowledge. Check out this <a href="https://ocaml.org/p/vg/0.9.5/doc/tutorial.html">OCaml.org tutorial</a> to see images using <code>odoc</code> in action!</li>
<li>Cross-package links: Now, <code>odoc</code> lets you use cross-package links in documentation, meaning you can reference any other package's modules and pages on a doc page. It allows users to generate interconnected pages of documentation to help readers navigate intuitively and smoothly.</li>
<li>Support for incremental build systems: The update makes managing build dependencies easier by supporting incremental build systems and allowing for better shared build caches. In the future, when <code>odoc</code> 3 rules have been implemented in <a href="https://dune.build">Dune</a>, the integration with Dune’s cache will make this feature especially useful.</li>
</ol>
<h2>OCaml.org and <code>odoc</code> 3</h2>
<p>Once <code>odoc</code> 3 had been released, the OCaml.org team started working on upgrading the package docs section of the site to use the new version. Since the tool underpins the entire documentation pipeline of the OCaml community, it was important to get buy-in from them and give them a chance to review the proposed changes ahead of time.</p>
<p>So, the team <a href="https://discuss.ocaml.org/t/help-test-the-new-odoc-3-powered-package-documentation-pages/16795/18">shared updates on OCaml Discuss</a> with links to the ‘staging’ version of OCaml.org, where users could explore the changes and share their feedback. It allowed them to help discover bugs and discuss how they felt about the implementation, which gave the maintainers crucial information to make the transition as smooth as possible. Previewing changes on the staging website also helped ensure the update did not break any existing features.</p>
<p>When the team were updating the docs pages, they needed to consider the significant changes to the CLI and new driver that <code>odoc</code> 3 had made, as well as the new feature of linking to other packages’ docs. They also adjusted the build and caching method employed by the documentation pipeline to improve support for incremental builds. With these background changes done, the website was ready for its makeover! You can check out <a href="https://github.com/ocaml/ocaml.org/pull/3124">PR #3124</a> to look at the patch.</p>
<p>Updating OCaml.org to use <code>odoc</code> 3 was good for users, who could take advantage of the new features, and it was also useful for authors, who could explore the tooling before using it themselves. For example, they could see how global sidebars, breadcrumbs, and media are now implemented through <code>odoc</code> and supported on the website.</p>
<h2>Tarides and <code>odoc</code></h2>
<p>Part of Tarides' mission is to encourage new users to adopt OCaml by making it easier to learn and use the language in their projects. Improving <code>odoc</code> is a quality-of-life upgrade for developers that makes creating documentation easier. In turn, this benefits other users by hopefully encouraging more documentation to be generated.</p>
<p>In the same vein, we are also committed to maintaining and improving <a href="http://ocaml.org/">OCaml.org</a> as a resource for the entire community. This includes upgrading it to use the latest versions of tools like <code>odoc</code> and testing and implementing new features.</p>
<p>OCaml is a language with many strengths, including its <a href="/blog/2023-12-14-ocaml-memory-safety-and-beyond/">secure-by-design features</a>. We want to give as many new people and organisations access to the right resources to make using OCaml easier. By making languages like OCaml mainstream, safer and more efficient code becomes more prevalent.</p>
<h2>Try <code>odoc</code> 3 for Your Documentation!</h2>
<p>You can try <code>odoc</code> 3 for yourself and experiment with the new features. The readme and installation instructions are <a href="https://ocaml.org/p/odoc/3.1.0">on OCaml.org</a>, which is an excellent place to get started. The maintainers of <code>odoc</code> welcome any feedback and input, either on <a href="https://discuss.ocaml.org/">Discuss</a> or directly <a href="https://github.com/ocaml/odoc/tree/master">in the GitHub repo</a>.</p>
<p>You can connect with us on <a href="https://bsky.app/profile/tarides.com">Bluesky</a>, <a href="https://mastodon.social/@tarides">Mastodon</a>, <a href="https://www.threads.net/@taridesltd">Threads</a>, and <a href="https://www.linkedin.com/company/tarides">LinkedIn</a> or sign up to our mailing list to stay updated on our latest projects. We look forward to hearing from you!</p>
]]></description><link>https://tarides.com/blog/2026-01-29-ocaml-org-now-uses-odoc-3-what-s-new</link><guid isPermaLink="false">https://tarides.com/blog/2026-01-29-ocaml-org-now-uses-odoc-3-what-s-new.html</guid><dc:creator><![CDATA[ Isabella Salenius ]]></dc:creator><pubDate>Thu, 29 Jan 2026 00:00:00 GMT</pubDate></item><item><title><![CDATA[Creating `ocaml.nvim` to Bring Neovim Support to OCaml's LSP Server]]></title><description><![CDATA[<p>We are happy to announce the release of the new <code>ocaml.nvim</code> plugin! It provides Neovim users with access to advanced <code>ocaml-lsp</code> features without getting involved in complicated editor-side logic. Think of it as the <a href="https://ocaml.org/backstage/2025-10-14-ocaml-nvim-a-neovim-plugin-for-ocaml">Neovim sibling</a> of the Emacs plugin <code>ocaml-eglot</code> – which we have also <a href="/blog/2025-11-27-bringing-emacs-support-to-ocaml-s-lsp-server-with-ocaml-eglot/">covered on the blog</a> – simplifying maintenance while still providing custom OCaml features. This is part of Tarides' ongoing effort to improve the OCaml user experience by introducing new features and improving existing ones.</p>
<p>In this post, we provide a brief overview of the relationship between Merlin, LSP, and Neovim in OCaml, outline the goals and features of <code>ocaml.nvim</code>, and offer some insight into how the engineers approached this project. We encourage you to install the plugin if you’re a Neovim user (or just curious!) and give it a try. The <a href="https://github.com/tarides/ocaml.nvim">repository is public</a> and comes with documentation about its features and how to install the plugin. To help the team out, you can share your feedback on the <a href="https://discuss.ocaml.org/">OCaml Discuss forum</a>, in the repo, or by emailing <a href="mailto:charlene@tarides.com">Charlène</a> directly.</p>
<h2>Merlin, LSP, &amp; Neovim</h2>
<p>The <a href="https://microsoft.github.io/language-server-protocol/">Language Server Protocol</a> (LSP) was created to address a programming landscape fragmented across a growing number of incompatible editors. The open protocol standardises the interactions between an editor and a server providing IDE services, creating a standard model supporting any compatible editor.</p>
<p>Created before the LSP, OCaml’s editor-agnostic server, Merlin, has offered advanced in-editor functionalities to the language for a long time. It quickly gained popularity within the community and was supported by several editors, including Emacs and Vim. However, supporting and maintaining each individual editor was resource intensive, as each integration required tailored changes whenever the editor, OCaml, or Merlin was updated.</p>
<p>Now, thanks to the fact that most modern editors use LSP to communicate with programming languages, OCaml can use LSP defaults to integrate Merlin with each editor’s LSP plugins. Neovim is one of the most popular editors (alongside Emacs, Vim, and VSCode) in the OCaml community. The new <code>ocaml.nvim</code> extension enables Neovim to support custom commands that are not possible with the standard LSP. The plugin <code>ocaml.nvim</code> works with generic Neovim LSP to provide access to advanced <code>ocaml-lsp</code> features, including all the advanced Merlin commands not supported by generic LSP clients. This approach addresses the so-called ‘editor burnout’ problem, which we have described in greater detail in our <a href="/blog/2025-11-27-bringing-emacs-support-to-ocaml-s-lsp-server-with-ocaml-eglot/">post on <code>ocaml-eglot</code></a>, <code>ocaml.nvim</code>’s ‘sibling’.</p>
<h2>OCaml.nvim: Creating a New Plugin</h2>
<p>The goal of the project was to create a modern and idiomatic Neovim plugin that would implement the server’s custom requests in a way that respected the design of the editor. We wanted to create a good foundation for community and industry users to build on, with a minimalist and well-documented plugin.</p>
<p>The <code>ocaml.nvim</code> plugin, of course, also supports key custom OCaml features, including advanced type-enclosing functionality, syntax-aware navigation, search by type, construct, and switching between the corresponding <code>.ml</code> and <code>.mli</code> files. Check out <a href="https://github.com/tarides/ocaml.nvim">the <code>ocaml.nvim</code> repository</a> to read more about the supported features.</p>
<h3>Code Actions and Custom Requests</h3>
<p>Since this plugin aimed to enhance the standard LSP features, we needed to rely on an additional protocol to link specific information to a particular command.
LSP offers two types of special requests: <code>code actions</code> and <code>custom requests</code>.</p>
<p>The LSP already lists <code>code actions</code>, but when there are numerous options, this special request can be confusing to use.
Moreover, <code>code actions</code> cannot take parameters: they do not, for example, let the user choose between several options.
This is where <code>custom requests</code> come into play.</p>
<p><code>Custom requests</code> allow the language server to converse with the user as much as needed, for example, to collect choices. As a consequence, they require an extra plugin to translate the conversation from the server to the user, as the LSP cannot generalise the infinity of commands we could imagine in all languages.</p>
<p>Every function is available on the Merlin side and accessible via <a href="https://github.com/ocaml/ocaml-lsp"><code>ocaml-lsp</code></a>.
There are two types of calls.</p>
<ul>
<li><code>ocaml-lsp</code> API: This is the language server for OCaml. Beyond the default LSP features, it exposes an API that we can query for information based on the cursor location or the server's knowledge of the current file.</li>
<li><code>MerlinCallCompatible</code>: This is also provided by <code>ocaml-lsp</code>, and allows us to invoke the Merlin commands. It requires additional arguments and returns the result in a character string format; here, we used <a href="https://www.json.org/json-en.html">JSON</a>.</li>
</ul>
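<p>To make the second kind of call more concrete, here is a minimal sketch in Python of what such a request could look like on the wire. The framing (a <code>Content-Length</code> header followed by a JSON-RPC 2.0 body) comes from the LSP base protocol. The method name <code>ocamllsp/merlinCallCompatible</code>, the parameter and response field names, the file URI, and the Merlin command are assumptions made for illustration, and the code is not taken from the plugin's source; check the <code>ocaml-lsp</code> documentation for the exact shape.</p>
<pre><code class="language-python">import json


def frame_request(request_id, method, params):
    """Encode a JSON-RPC 2.0 request with the LSP base-protocol header."""
    body = json.dumps(
        {"jsonrpc": "2.0", "id": request_id, "method": method, "params": params}
    ).encode("utf-8")
    # Content-Length counts the bytes of the body, not its characters.
    header = f"Content-Length: {len(body)}\r\n\r\n".encode("ascii")
    return header + body


# Assumed method and field names, shown for illustration only.
message = frame_request(
    1,
    "ocamllsp/merlinCallCompatible",
    {
        "uri": "file:///tmp/example.ml",  # hypothetical file
        "command": "case-analysis",  # hypothetical Merlin command
        "args": ["-start", "1:4", "-end", "1:4"],
        "resultAsSexp": False,
    },
)


def parse_merlin_result(response):
    """Decode the string payload of a response into a Python value."""
    # The payload is a JSON document serialised as a string,
    # so it needs a second decoding step on the client side.
    return json.loads(response["result"]["result"])
</code></pre>
<p>Because the result arrives as a string, the client has to decode it a second time before it can show anything to the user, which is part of the work an editor-side plugin takes on.</p>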
<h3>UI Elements</h3>
<p>Our objective was to provide a clear interface. To achieve this, we implemented four main UI elements for the entire plugin.</p>
<ul>
<li><strong>Selector</strong>: opens a floating window with the focus initialised on the first line. You can then use the arrow keys to move and Enter to select the line you want.</li>
</ul>
<p><img sizes="(min-width: 1360px) 1360px, (min-width: 680px) 680px, 100vw" srcset="/blog/images/selector-170w~589GpSmVs6YS-VbF_Nhp0Q.webp 170w, /blog/images/selector-340w~EzbLKgR7p83XYF3qDlO29Q.webp 340w, /blog/images/selector-680w~YePRvKGvQbaDCkm_ED8PWg.webp 680w, /blog/images/selector-1360w~tJ_bf9zaVtOkbV95LyD-Ww.webp 1360w" src="/blog/images/selector-1360w~tJ_bf9zaVtOkbV95LyD-Ww.webp" alt="The floating window with the first line ‘sys.command string -> int’ highlighted"></p>
<ul>
<li><strong>New Window</strong>: This is built into <code>nvim</code>, and splits the current window to open a new buffer.</li>
</ul>
<p><img sizes="(min-width: 1360px) 1360px, (min-width: 680px) 680px, 100vw" srcset="/blog/images/new_window-170w~h6LsFJSI9CF7yHpdz6mqUQ.webp 170w, /blog/images/new_window-340w~cyswlvU5RrErWY2Fg6DF3A.webp 340w, /blog/images/new_window-680w~H7BnMN2qTb51Y1U_V0jWdQ.webp 680w, /blog/images/new_window-1360w~0lEM8ZGtonWzGSH1d4hCTg.webp 1360w" src="/blog/images/new_window-1360w~0lEM8ZGtonWzGSH1d4hCTg.webp" alt="The floating window with a split and a new buffer"></p>
<ul>
<li><strong>One-line Display</strong>: This is the simplest UI, just a <code>print</code>. If the output fits on one line, it is displayed where you enter the commands.</li>
</ul>
<p><img sizes="(min-width: 1360px) 1360px, (min-width: 680px) 680px, 100vw" srcset="/blog/images/one_line_display-170w~k8rvKa-yENBmZ4vB7AiuHQ.webp 170w, /blog/images/one_line_display-340w~5sQVUMZrDhiFTVIdwiJqWw.webp 340w, /blog/images/one_line_display-680w~ZiqO-tgu2v3vfRJVBzkMfw.webp 680w, /blog/images/one_line_display-1360w~tReqPqPAiLYzRuopW_Rqhg.webp 1360w" src="/blog/images/one_line_display-1360w~tReqPqPAiLYzRuopW_Rqhg.webp" alt="The floating window displaying the string in a single line"></p>
<ul>
<li><strong>Multi-line Display</strong>: By default, if the string in <code>print</code> exceeds one line, it is displayed across multiple lines. However, this behaviour is not ideal because it requires pressing Enter to erase the text in an unappealing manner. So, the plugin provides an alternative that is similar to the selector UI.</li>
</ul>
<p><img sizes="(min-width: 1360px) 1360px, (min-width: 680px) 680px, 100vw" srcset="/blog/images/multi_line_display-170w~JqOJD_ZncXAc8ncLnVJFDw.webp 170w, /blog/images/multi_line_display-340w~nVOGTpIjwzDKkwjU-mHZIg.webp 340w, /blog/images/multi_line_display-680w~8EKMFXwcThDbOfN6vEsZkg.webp 680w, /blog/images/multi_line_display-1360w~Z0lM3F-VKdOC4JPnSoYwvQ.webp 1360w" src="/blog/images/multi_line_display-1360w~Z0lM3F-VKdOC4JPnSoYwvQ.webp" alt="The floating window with a box containing the information from the string"></p>
<p>Regarding customisation, since everyone has their own keymaps, we provide default ones and let users modify them to suit their needs and preferences.</p>
<h2>Try it Out!</h2>
<p>Visit the <a href="https://github.com/tarides/ocaml.nvim"><code>ocaml.nvim</code> repository on GitHub</a> to get started with the new plugin. The repo includes the installation instructions, a straightforward process using <code>lazy.nvim</code>, as well as a features list and some examples.</p>
<p>Since the team is working towards a 1.0 release, any and all feedback is welcome. Remember to share your thoughts, suggestions, and questions on <a href="https://discuss.ocaml.org">Discuss</a> or in the repo through issues and commits.</p>
<p>You can connect with us on <a href="https://bsky.app/profile/tarides.com">Bluesky</a>, <a href="https://mastodon.social/@tarides">Mastodon</a>, <a href="https://www.threads.net/@taridesltd">Threads</a>, and <a href="https://www.linkedin.com/company/tarides">LinkedIn</a> or sign up for our mailing list to stay updated on our latest projects. We look forward to hearing from you!</p>
]]></description><link>https://tarides.com/blog/2025-12-10-creating-ocaml-nvim-to-bring-neovim-support-to-ocaml-s-lsp-server</link><guid isPermaLink="false">https://tarides.com/blog/2025-12-10-creating-ocaml-nvim-to-bring-neovim-support-to-ocaml-s-lsp-server.html</guid><dc:creator><![CDATA[ Isabella Salenius, Charlène Gros ]]></dc:creator><pubDate>Wed, 10 Dec 2025 00:00:00 GMT</pubDate></item><item><title><![CDATA[ICFP 2025: Looking Back at the Biggest Functional Programming Conference of the Year]]></title><description><![CDATA[<p>The biggest functional programming conference on the calendar took place in Singapore from October 12 to 18 this year. With a jam-packed schedule featuring talks, workshops, and events, ICFP 2025 brought together passionate functional programming developers from around the world. As an active contributor to the OCaml ecosystem, Tarides is committed to supporting the growth of this vibrant community and exploring new applications for OCaml. We look forward to discussing, presenting, and discovering the latest developments in the language at the conference every year.</p>
<p>If you didn’t get the chance to go this year, or if you just want to relive it, this post has you covered! We’re going to take a look at everything OCaml at the conference – from OxCaml to improved editors and establishing a community code of conduct – and hear first-hand from some of our engineers who attended.</p>
<p>Next year’s ICFP is in Indianapolis, and you should keep an eye <a href="https://icfp26.sigplan.org/">on the website</a> to register and submit your papers.</p>
<h2>Experience Reports</h2>
<p>Several of my colleagues attended ICFP, and just like <a href="/blog/2024-10-23-looking-back-on-our-experience-at-icfp/">last year</a>, I asked them a bunch of questions about their experience when they returned! Firstly, I asked them all why they enjoy ICFP and why they choose to attend it year after year. David Allsopp highlighted the social networking opportunities: “For those of us lucky enough to be able to attend them, conferences grease the social gears of collaboration”. As an example, David mentioned how a talk on range analysis in Standard ML inspired him and Ryan Gibb (who incidentally work next door to each other in Cambridge) to start hacking on package-management solving right then and there! Read more about <a href="https://www.dra27.uk/blog/platform/2025/10/18/icfp-2025.html">David’s time at ICFP in his blog post</a>.</p>
<p>Sudha Parimala explained that “ICFP has many different tracks that are highly relevant to Functional Programming in general and also to OCaml. It's a good place to meet folks working on similar things and discuss ideas.” In addition to the various tracks, ICFP also hosts events that provide opportunities to meet new people. “Even though I couldn't make it this time, I'd highly recommend the ‘Women in PL’ dinner hosted at ICFP. It gives you a good platform to connect with other women programmers and researchers.” If you’re attending ICFP next year, keep an eye out for these organised events.</p>
<p>Speaking of next year, the submission deadline for papers is on February 19, so, in the words of Sudha: “If you're working on cool OCaml stuff, consider submitting it to the OCaml workshop next year!” and we look forward to seeing you there!</p>
<h2>The OCaml Workshop and Talks</h2>
<p>Let’s dive into everything OCaml (and some exciting honourable mentions!) that happened at ICFP this year. Where possible, I’ve linked the recordings of the talks so you can recreate the conference experience from home!</p>
<p>The OCaml Workshop was great, as usual, and took place on October 17. The following talks were presented by developers of many different affiliations, including IIT Madras, NIT Trichy, Tarides, the University of Illinois Urbana-Champaign, Jane Street, Bloomberg, and the University of Cambridge.</p>
<h3>OCaml Workshop Talks</h3>
<ul>
<li>
<p><a href="https://www.youtube.com/watch?v=ekbLoe24iXg">A Mechanically Verified Garbage Collector for OCaml</a> by Sheera Shamsu, Dipesh Kafle, Dhruv Maroo, Kartik Nagar, Karthikeyan Bhargavan, and KC Sivaramakrishnan. OCaml’s garbage collector (GC) is crucial to the functioning of its runtime system and overall correctness and safety. This talk shared details about a project to create a correct, proof-oriented GC that can evolve with the language over time.</p>
</li>
<li>
<p><a href="https://www.youtube.com/watch?v=cgpnBdXsW2c">OCaml Package Management with (only!) Dune</a> by Stephen Sherratt, Marek Kubica, and Rudi Grinberg. This talk showcased how developers can now use Dune to download and install packages without relying on any other tool (like <code>opam</code>).</p>
</li>
<li>
<p><a href="https://www.youtube.com/watch?v=CIS_ljgqSQw">How the OCaml Community Established Its Code of Conduct</a> by Sudha Parimala. As Sudha said, “OCaml has had a thriving open source community for almost three decades now, but a code of conduct was only introduced in 2022.” Her talk described the process behind establishing that code of conduct, including creating the team, gathering feedback, and lessons learned.</p>
</li>
<li>
<p><a href="https://www.youtube.com/watch?v=m_fBmuoGwtM">Embedding WebAssembly in OCaml for Safe Program Construction</a> by Hunter DeMeyer. This talk proposed WasML: a library enforcing syntactic and semantic constraints in <code>wasm</code> programs via OCaml types.</p>
</li>
<li>
<p><a href="https://www.youtube.com/watch?v=Kqs5DXwcyVI">smaws: An AWS SDK for OCaml</a> by Chris Armstrong. Introduced <code>smaws</code>, a new Amazon Web Services (AWS) software development kit for OCaml, exploring the challenges and design choices behind the new library.</p>
</li>
<li>
<p><a href="https://www.youtube.com/watch?v=PekeGxGlc3Q">Toward a More Secure OCaml Ecosystem</a> by Maksim Grankin. All about the new OCaml Security Team, launched by the OCaml Software Foundation (OCSF), the motivations behind creating the team, its goals, and the security challenges facing language ecosystems today.</p>
</li>
<li>
<p><a href="https://www.youtube.com/watch?v=Ub8k1BcSRLQ">Three Steps for OCaml to Crest the AI Humps</a> by Sadiq Jaffer, Jonathan Ludlam, Ryan Gibb, Thomas Gazagnaire, and Anil Madhavapeddy. The talk looked at how well-represented OCaml is amongst open-weight models and what could make it stand out, outlining potential changes the ecosystem can make to work better with AI coding assistants.</p>
<p>In his <a href="https://toao.com/blog/ai-existential-ocaml">blog post about the talk</a>, Sadiq described how “The gap between coding agent performance on mainstream versus niche languages could prove fatal to smaller language communities. New developers increasingly judge a programming language not just on its traditional tooling (compilers, debuggers, libraries) but also on how well AI coding agents support it. If agents struggle with OCaml, fewer new developers will choose to learn it, creating a vicious cycle.”</p>
</li>
<li>
<p><a href="https://www.youtube.com/watch?v=q4oEKMTeXk4">A New Era of OCaml Editing: Powered by Merlin, Delivered via LSP</a> by Xavier Van de Woestyne, Sonja Heinze, Ulysse Gérard, and Muluh Godson. Explored how the OCaml ecosystem is managing the maintenance burden resulting from an increasing number of features available for editors by adopting the generic open standard Language Server Protocol (LSP). In his own words, Xavier said that “I presented some work carried out in the name of ‘ecumenism’ for OCaml editor support! How to make Merlin and OCaml-LSP-server interact to reduce the need for (heavy) maintenance between the wide variety of editors. We had to walk the line between minimising maintenance without losing functionality. The entire work is described in the paper: <a href="https://conf.researchr.org/details/icfp-splash-2025/ocaml-2025-papers/7/A-New-Era-of-OCaml-Editing-Powered-by-Merlin-Delivered-via-LSP">A New Era of OCaml Editing Powered by Merlin, Delivered via LSP</a>”.</p>
</li>
<li>
<p><a href="https://www.youtube.com/watch?v=It0i9xFqCtE">Taming the Flat Float Array Optimisation: Tracking Separability in the Type System</a> by Diana Kalinichenko and Richard A. Eisenberg. This talk presented an approach to flat float array optimisation that is used in <a href="https://oxcaml.org/">OxCaml</a>, which enabled previously rejected non-separable types to be used while maintaining compatibility with existing code and unlocking new optimisations for arrays of known non-float types.</p>
</li>
</ul>
<p>Of course, there are more talks at the conference outside of the workshop, and this year, attendees were spoiled for choice with plenty of OCaml topics to explore!</p>
<h3>Miscellaneous OCaml Talks</h3>
<ul>
<li>
<p><a href="https://www.youtube.com/watch?v=qTSpEKZohKY">A Guided Tour Through Oxidised OCaml</a> by Gavin Gray, Anil Madhavapeddy, KC Sivaramakrishnan, Will Crichton, Shriram Krishnamurthi, Chris Casinghino, and Richard A. Eisenberg. This collaborative tutorial was a combined effort by members from Brown University, the University of Cambridge, Jane Street, IIT Madras, and Tarides. It took participants on a tour of the most significant extensions of OxCaml, Jane Street’s production compiler for performance-oriented programming, including fearless concurrency, data layout, and location control. The creators of the workshop encourage you to work through the <a href="https://gavinleroy.com/oxcaml-tutorial-icfp25/">slides online</a>, try the <a href="https://github.com/oxcaml/tutorial-icfp25">activities</a>, and give them feedback by filling in the <a href="https://gavinleroy.com/oxcaml-icfp-activity/">quiz</a>.</p>
<p>In his <a href="https://anil.recoil.org/notes/icfp25-oxcaml">blog post on the tutorial</a>, Anil shared that “the tutorial itself [...] went fantastically! Both sessions were completely full, with participants online as well”.</p>
</li>
<li>
<p><a href="https://www.youtube.com/watch?v=1_Y6hmj-VpY&amp;t=1s">Functional Networking for Millions of Docker Desktops (Experience Report)</a> by Anil Madhavapeddy, David J. Scott, Patrick Ferris, Ryan Gibb, and Thomas Gazagnaire from the University of Cambridge, Docker, and Tarides at the ‘Applications and SRC Talks’. This talk reflected on a decade of functional OCaml code in production as part of Docker’s container architecture across millions of desktops.</p>
</li>
<li>
<p><a href="https://www.youtube.com/watch?v=AxiqBFf4zqg">Implicit Modules, a Middle Step Towards Modular Implicits</a> by Samuel Vivien and Didier Rémy from Inria and PSL in the ML Family Workshop. Described a new proposal to extend OCaml with implicit module arguments, a long-term project with the first step, modular explicits, about to be integrated with mainstream OCaml.</p>
<p>This was one of Xavier’s most memorable talks of the conference, and he described it as “A careful presentation that outlined the status of the project, what has been done, what remains to be done, and which is very promising for the future of OCaml!”</p>
</li>
<li>
<p><a href="https://www.youtube.com/watch?v=abDWZ9D8kEE">The Wild West of Post-POSIX IO Interfaces</a> by Anil Madhavapeddy from the University of Cambridge as a keynote speech in the Language Semantics &amp; Type Systems track. The talk tracked the evolution of asynchronous and shared-memory I/O, from the POSIX programming model to OCaml 5’s Eio library.</p>
<p><a href="https://anil.recoil.org/notes/icfp25-post-posix">Anil distilled the most important part of his talk</a> as follows:  “So I made one key argument to the audience: it's time to accept that standards such as POSIX are now holding back the development of good language runtimes, and we need to embrace the diversity of highly concurrent, shared-memory interfaces.”</p>
</li>
<li>
<p><a href="https://www.youtube.com/watch?v=5pQcX3XMhhE">Generating Hazel Programs From Ill-Typed OCaml Programs</a> by Patrick Ferris and Anil Madhavapeddy from the University of Cambridge at the Type-Driven Development workshop. Highlighted the work on Hazel, a young programming language, and the development of a compiler to Hazel that can generate ill-typed code to help its development and testing.</p>
</li>
<li>
<p><a href="https://www.youtube.com/watch?v=tg-5liTtCe4">OCaml Blockly</a> by Kenichi Asai of Ochanomizu University from the presentations for papers in the Journal of Functional Programming. This talk introduced OCaml Blockly, which is a block-based programming environment based on Google Blockly, but for OCaml. It is useful because it knows the scoping and type rules of OCaml, meaning that for any complete program in OCaml Blockly, its OCaml counterpart will compile successfully.</p>
</li>
<li>
<p><a href="https://www.youtube.com/watch?v=5y-SGiA8ycs">Formal Semantics and Program Logics for a Fragment of OCaml</a> by Remy Seassau, Irene Yoon, Jean-Marie Madiot, and François Pottier from Inria in the ICFP Papers track. Their talk shared work towards a formal definition of OCaml and a foundational program verification environment for the language by presenting a formal definition of a nontrivial sequential fragment of OCaml: OLang.</p>
</li>
<li>
<p><a href="https://www.youtube.com/watch?v=b0SIWDdmRVU">A Tale of Two Lambdas: A Haskeller’s Journey Into OCaml</a> by Richard A. Eisenberg from Jane Street at the Haskell track. Richard shared his experience with both the Haskell and OCaml ecosystems and communities, offering insights on how they are similar, how they are different and, most importantly, what they can learn from each other.</p>
<p>This talk was a highlight for both Anil and David! Anil wrote in his <a href="https://anil.recoil.org/notes/icfp25-what-i-learnt">blog post series</a> that “This year, Richard Eisenberg was my absolute highlight with a superb session on what he's learnt from being the rare breed of someone steeped deeply both in Haskell and OCaml. The room was so packed for this talk that they had to create an overflow room streaming it in the corridors!” and David enjoyed the entire Haskell track, saying “Haskell did very nicely this year, with Richard Eisenberg’s likewise excellent keynote on the Friday”.</p>
</li>
</ul>
<p>In addition to all the amazing OCaml talks, there were, of course, plenty of presentations on other topics and programming languages!</p>
<h3>Honourable Mentions: Non-OCaml Talks</h3>
<ul>
<li>
<p><a href="https://www.youtube.com/live/IIRJeleXeuU?si=F1fY7arkGJpm44mB">Programming for the Planet</a>. This was the second ever PROPL workshop after the first one was held in London last year. It hosted several talks centred around using functional programming methods to address challenges to biodiversity, climate, and the vast amounts of planetary data that the sector yields. <a href="https://anil.recoil.org/notes/icfp25-propl">Anil shared that</a> “By far my favourite aspect of PROPL was the sheer ambition on display when it comes to leveraging the network effects around computer technology to accelerate the pace of environmental action”.</p>
</li>
<li>
<p><a href="https://conf.researchr.org/details/icfp-splash-2025/olivierfest-2025-papers/7/Continuations-in-Music">Continuations in Music</a> by Youyou Cong at ‘Continuations at Work’ explored the usefulness of continuations in computer programming languages and music. David thought it was “thought-provoking, and potentially (even) more to go further back in musical history to late mediaeval and early Tudor, where the pattern work of composition and imitative properties are even more abundant (continuations with combinators for inversion, canon, etc.)”.</p>
</li>
<li>
<p><a href="https://www.youtube.com/watch?v=elhNJyLgjDU">Polynomial-Time Program Equivalence for Machine Knitting</a> by Nathan Hurtig, Jenny Han Lin, Thomas S. Price, Adriana Schulz, James McCann, and Gilbert Bernstein from the University of Washington, University of Utah, and Carnegie Mellon University. Presented their algorithm at the ‘Applications and SRC Talks’ to canonicalise algebraic representations of the topological semantics of machine-knitting programs. David commented that it stood out to him “not because I can follow the category theory, but because it was a beautiful exposition”.</p>
</li>
<li>
<p><a href="https://www.youtube.com/watch?v=9S_VY_PVnZk">Join Points in Practice</a> was a keynote by Simon Peyton Jones all about join points as a powerful optimisation tool. Both Xavier and David especially emphasised how enjoyable Simon’s presentation style is: “One should always attend a Simon P-J talk, and his packed Haskell keynote was no exception”, David said, and Xavier commented: “His presentation, his gift of the gab, and his enthusiasm make him the most impressive and enjoyable speaker I have ever seen. So seeing him live was a must for me, and I wasn't disappointed by the delivery”.</p>
</li>
<li>
<p><a href="https://www.youtube.com/watch?v=dtMOicQfUtI">Freer Arrows and Why You Need Them in Haskell</a> by Grant VanDomelen, Gan Shen, Lindsey Kuper, and Yao Li at the Haskell track. This talk discussed freer arrows, an expressive structure that is amenable to static analysis. Xavier said: “I use arrows extensively for my website, and I thought it was great to see their defunctionalised (freer) representation with very clear examples!”</p>
</li>
<li>
<p><a href="https://www.youtube.com/live/F_7S90vFEsE?si=4q2joS4bcglm_rWd&amp;t=1">Functional Art, Music, Modelling and Design (FARM)</a>. This workshop welcomed submissions across art, crafting, and design, and hosted talks on music, vector graphics drawing, and even coats of arms! My colleague Stephen Sherratt attended the workshop and presented the talk <a href="https://www.youtube.com/watch?v=TNYRrrc2J0Y">Software-defined Declarative Synthesiser Live-Coding in a Jupyter Notebook</a>, where he generated music in real-time by writing Rust code into a Jupyter notebook. He also particularly enjoyed the talks <a href="https://youtu.be/XW9Gc5UGCvM?si=GEi8ZZ0M3bKT2ZYG&amp;t=14254">Weft – Enabling Tidal on the Web</a> by Matthew Kaney and William Payne, and <a href="https://www.youtube.com/watch?v=PTkj7LM5QXI">Generalising Turtle Geometry: An Extensible Language for Vector Graphics Drawing</a> by Alice Rixte.</p>
</li>
</ul>
<h2>ICFP 2026: See You Next Year!</h2>
<p>As mentioned above, next year’s ICFP will take place in Indianapolis, Indiana, from August 24 to 29. It will be another great opportunity to meet developers, discuss everything functional programming and OCaml, and stay up to date with the latest changes and releases. Keep an eye <a href="https://icfp26.sigplan.org/">on the website</a> and <a href="https://bsky.app/profile/icfp-conference.bsky.social">ICFP’s socials</a> to register, and, of course, submit your talks before the February deadline. We hope to see you there!</p>
<p>You can connect with us on <a href="https://bsky.app/profile/tarides.com">Bluesky</a>, <a href="https://mastodon.social/@tarides">Mastodon</a>, <a href="https://www.threads.net/@taridesltd">Threads</a>, and <a href="https://www.linkedin.com/company/tarides">LinkedIn</a> or sign up for our mailing list to stay updated on our latest projects. We look forward to hearing from you!</p>
]]></description><link>https://tarides.com/blog/2025-12-04-icfp-2025-looking-back-at-the-biggest-functional-programming-conference-of-the-year</link><guid isPermaLink="false">https://tarides.com/blog/2025-12-04-icfp-2025-looking-back-at-the-biggest-functional-programming-conference-of-the-year.html</guid><dc:creator><![CDATA[ Isabella Salenius ]]></dc:creator><pubDate>Thu, 04 Dec 2025 00:00:00 GMT</pubDate></item><item><title><![CDATA[Bringing Emacs Support to OCaml's LSP Server with `ocaml-eglot`]]></title><description><![CDATA[<p>The team of people working on editors and editor support at Tarides is excited to announce the release of <code>ocaml-eglot</code>! The project is part of Tarides’ efforts to improve the OCaml developer experience across different platforms and workflows, a high-priority goal continuously evolving with community feedback.</p>
<p>Bringing Emacs integration to OCaml’s LSP server benefits both the user and the maintainer. If you use Emacs and want to start using OCaml, or switch to a more simplified setup, check out the <a href="https://github.com/tarides/ocaml-eglot"><code>ocaml-eglot</code> repository</a> on GitHub to try the new Emacs minor mode.</p>
<p>This post will give you some background to the development of the new tool, as well as the benefits and limitations of LSP, and the features of <code>ocaml-eglot</code>. Let’s dive in!</p>
<h2>The Problem: ‘Editor Burnout’</h2>
<p>The goal of the <code>ocaml-eglot</code> project was to address a problem the engineers had dubbed <em><code>editor burnout</code></em>. Developers rely on editors to simplify their coding workflow, and over the years, the creation of more and more editor features has transformed editors into sophisticated, feature-rich development environments. However, all these features need to be added and maintained in every editor. Maintaining support for so many different features across different editors, including updating the support every time something changes on the language server's end, can quickly become untenable. ‘Editor burnout’ refers to the pressure this puts on maintainers.</p>
<p>In OCaml, the editor-agnostic server <a href="https://ocaml.github.io/merlin/">Merlin</a> is used to provide IDE-like services. By providing contextual information about the code, Merlin lets developers use simple text editors to write OCaml and benefit from features that typically come from a fully integrated IDE. However, Merlin also had a high maintenance cost due to each code editor needing its own integration layer.</p>
<p>So, now that we understand the problem, what is the solution?</p>
<h2>LSP and OCaml</h2>
<p>LSP, or the <em>Language Server Protocol</em>, is a widely documented open protocol that standardises the interactions between an editor and a server providing IDE services. LSP defines a collection of standard features across programming languages, which has contributed to its widespread adoption. This adoption has made LSP a standard protocol across editors, including <a href="https://code.visualstudio.com/">Visual Studio Code</a>, <a href="https://www.vim.org">Vim</a>, <a href="https://www.gnu.org/software/emacs/">Emacs</a>, and many more.</p>
<p>The language server implementation for LSP in OCaml is <code>ocaml-lsp</code>. It uses Merlin as a library. It was originally designed to integrate with <a href="https://code.visualstudio.com">Visual Studio Code</a> when paired with the <code>vscode-ocaml-platform</code> plugin. We can significantly reduce the maintenance burden by relying on LSP's defaults for editor compatibility and only providing support for OCaml-specific features. This benefits not only the maintainers, but also the user by ensuring the plugins remain performant, compatible, maintainable, and up-to-date.</p>
<p>LSP aims to be compatible with as many languages as possible, making some assumptions about how those languages are structured and function. Inevitably, these assumptions cannot cover all the features of every language. This is true of OCaml, where the editing experience relies on custom features outside the scope of the LSP.</p>
<p>The solution to this incompatibility is to create a <em>client-side extension</em> that covers what the editor’s standard LSP support does not. That way, we have both the basic LSP compatibility and an extension that adds support for OCaml-specific features. As we’ve hinted above, this has the added benefit of keeping the maintenance workload on the editor side down by delegating the standard LSP handling to the generic LSP plug-ins.</p>
<p>Some of these OCaml-specific editor features include <a href="https://github.com/tarides/ocaml-eglot?tab=readme-ov-file#type-enclosings">type-enclosing</a>, <a href="https://github.com/tarides/ocaml-eglot?tab=readme-ov-file#construct-expression">construct and navigation between holes</a>, <a href="https://github.com/tarides/ocaml-eglot?tab=readme-ov-file#destruct-or-case-analysis">destruct</a>, and <a href="https://github.com/tarides/ocaml-eglot?tab=readme-ov-file#search-for-values">search by types</a>.</p>
<h2>OCaml-Eglot</h2>
<p>Emacs is popular with the OCaml community, so let’s take a brief look at how the two work together. In Emacs, developers attach a "buffer" (a file) to a major mode that handles language-specific features such as syntax highlighting. A file is always attached to exactly one major mode.</p>
<p>OCaml has four major modes:</p>
<ul>
<li><code>caml-mode</code>: the original,</li>
<li><code>tuareg</code>: a full reimplementation of <code>caml-mode</code> and the most common choice by users,</li>
<li><code>ocaml-ts-mode</code>: an experimental version of <code>caml-mode</code> based on tree-sitter grammar,</li>
<li><code>neocaml</code>: an experimental full reimplementation of <code>tuareg</code> based on tree-sitter grammar.</li>
</ul>
<p>Now, we can also attach one or multiple <code>minor-mode</code>s to a file, and this is where <code>ocaml-eglot</code> comes into play. For example, we can use a major mode (we generally recommend Tuareg) and link <code>ocaml-eglot</code> to it as a minor mode, thereby attaching LSP features to all files in which Tuareg is active.</p>
<p>Eglot is the default LSP client bundled with Emacs, and <code>ocaml-eglot</code> provides full OCaml language support in Emacs as an alternative to Merlin integration. (By the way, thanks to the <code>ocaml-eglot</code> client using LSP’s defaults, its code size is a lot smaller than the traditional OCaml <code>Emacs</code> mode, which also makes it easier to maintain!)</p>
<p>The ideal user of <code>ocaml-eglot</code> is someone who is already an active Emacs user and wants to start using OCaml with minimal start-up hassle. The simplified configuration, automated setup, and consistency across different editors and languages are helpful both to people new to OCaml and to seasoned users with multiple editors, since they improve the workflow. The plugin supports all the features of the integration of Merlin into Emacs, <code>merlin.el</code>, meaning that users don’t lose any functionality with the new system. The <code>ocaml-eglot</code> project is also actively maintained, and users can expect regular future updates and a tool that evolves with the times.</p>
<h3>Creating OCaml-Eglot</h3>
<p>Let's peek behind the curtain at the development of <code>ocaml-eglot</code>. There are two common approaches that developers who implement language servers tend to use to add features outside of the LSP. These are <a href="https://microsoft.github.io/language-server-protocol/specifications/lsp/3.17/specification/#textDocument_codeAction">Code Actions</a> and Custom Requests:</p>
<ul>
<li>Code Action: A contextual action that can be triggered from a document perspective, can perform a file modification, and potentially broadcast a command that can be interpreted by the client. Code Actions are more ‘integrated’, which means that they sometimes even work ‘out of the box’ with the client. However, they are limited in terms of interactions and the command lifecycle.</li>
<li>Custom Request: Not formally part of the protocol, but since LSP is layered on top of a regular server that can handle JSON-RPC messages and responses, developers can still use arbitrary requests to provide extra features. Custom Requests give developers more power to add interactions and experiences, but always need specific editor integration.</li>
</ul>
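<p>To make the Custom Request idea more concrete, here is a minimal sketch of how a client could frame an arbitrary JSON-RPC request using LSP's base protocol: a <code>Content-Length</code> header, a blank line, then the JSON body. The method name <code>example/customFeature</code> and its parameters are hypothetical placeholders for illustration, not requests defined by <code>ocaml-lsp</code>, and the JSON is assembled by hand only to keep the example free of dependencies.</p>
<pre><code>(* Build the JSON body of a JSON-RPC 2.0 request. [params] must already
   be valid JSON text; a real client would use a JSON library instead. *)
let request_body ~id ~meth ~params =
  Printf.sprintf
    {|{"jsonrpc":"2.0","id":%d,"method":"%s","params":%s}|}
    id meth params

(* Frame a body as the LSP base protocol requires: a Content-Length
   header giving the size of the body in bytes, a blank line, the body. *)
let frame body =
  Printf.sprintf "Content-Length: %d\r\n\r\n%s" (String.length body) body

let () =
  (* Hypothetical method name and parameters, for illustration only. *)
  let body =
    request_body ~id:1 ~meth:"example/customFeature"
      ~params:{|{"uri":"file:///tmp/main.ml"}|}
  in
  print_string (frame body)
</code></pre>
<p>Because the server only sees a method name it may or may not recognise, the editor side has to know which custom methods exist and how to present their results, which is why Custom Requests always need dedicated client code.</p>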
<p>The design process behind OCaml-eglot essentially boiled down to identifying all the features offered by <code>merlin.el</code> that were not covered by the LSP, and then adding them using Code Actions or Custom Requests. During this process, the developers asked themselves two questions to help them decide which approach to use:</p>
<ul>
<li>Should the feature be configured by arguments that are <strong>independent of the context</strong>? If yes, they used a Custom Request; if no, they used a Code Action.</li>
<li>Does the feature require <strong>additional interaction</strong>, such as choosing one option from a set of possible results? If yes, they used a Custom Request; if no, they used a Code Action.</li>
</ul>
<p>Of course, things were a little more complicated than this in reality, but it still gives you a good idea of the types of technical decisions the team made during development.</p>
<h2>Try it Out!</h2>
<p>Install <code>ocaml-eglot</code> by checking out its <a href="https://github.com/tarides/ocaml-eglot">GitHub repository</a> and following the instructions. When you have had a chance to test it out in your projects, please share your experience on <a href="https://discuss.ocaml.org">OCaml Discuss</a> to give other users an idea of what to expect and the maintainers an idea of what to improve!</p>
<p>Installing <code>ocaml-eglot</code> is just like installing a regular Emacs package. It is available on <a href="https://melpa.org/#/ocaml-eglot">MELPA</a> and can be installed in many different ways, for example with <code>use-package</code>. More detailed instructions are available <a href="https://github.com/tarides/ocaml-eglot?tab=readme-ov-file#ocaml-eglot">in the repo’s readme</a>, including instructions on recommended configurations for <code>ocaml-eglot</code>.</p>
<h3>Features</h3>
<p>Some of the features that <code>ocaml-eglot</code> comes with are:</p>
<ul>
<li>Error navigation: Quickly jump to the next or previous error(s).</li>
<li>Type information: Display types under cursor with adjustable verbosity and navigate enclosing expressions.</li>
<li>Code generation: Pattern match construction, case completion, and wildcard refinement via the ‘destruct’ feature.</li>
<li>Navigation: Jump between language constructs like let, module, function, match, and navigate phrases and pattern cases.</li>
<li>Search: Find definitions, declarations, and references. The team also recently introduced a new Xref Backend inspired by one used by Jane Street for years.</li>
</ul>
<p>Check out the project's readme to discover the full list of commands offered by <code>ocaml-eglot</code>. The new mode is ‘agile’, meaning that the team can also incubate new features quickly, like the <a href="https://github.com/tarides/ocaml-eglot/pull/65">refactor extract at toplevel</a>.</p>
<h2>Until Next Time</h2>
<p>For some really helpful background that goes into more detail than we did in this post, I recommend that you read the paper <a href="https://conf.researchr.org/details/icfp-splash-2025/ocaml-2025-papers/7/A-New-Era-of-OCaml-Editing-Powered-by-Merlin-Delivered-via-LSP">“A New Era of OCaml Editing Powered by Merlin, Delivered via LSP”</a> by Xavier van de Woestyne, Sonja Heinze, Ulysse Gérard, and Muluh Godson.</p>
<p>You can connect with us on <a href="https://bsky.app/profile/tarides.com">Bluesky</a>, <a href="https://mastodon.social/@tarides">Mastodon</a>, <a href="https://www.threads.net/@taridesltd">Threads</a>, and <a href="https://www.linkedin.com/company/tarides">LinkedIn</a> or sign up to our mailing list to stay updated on our latest projects. We look forward to hearing from you!</p>
]]></description><link>https://tarides.com/blog/2025-11-27-bringing-emacs-support-to-ocaml-s-lsp-server-with-ocaml-eglot</link><guid isPermaLink="false">https://tarides.com/blog/2025-11-27-bringing-emacs-support-to-ocaml-s-lsp-server-with-ocaml-eglot.html</guid><dc:creator><![CDATA[ Isabella Salenius, Xavier Van de Woestyne ]]></dc:creator><pubDate>Thu, 27 Nov 2025 00:00:00 GMT</pubDate></item><item><title><![CDATA[Announcing Unikraft Support for MirageOS Unikernels]]></title><description><![CDATA[<p>We are happy to announce the release of Unikraft backend support for MirageOS unikernels! Our team, consisting of Fabrice Buoro, Virgile Robles, Nicolas Osborne, and me (Samuel Hym), have been working on this feature for the past year. We’re excited to bring it to the community and hope to see many people try it out.</p>
<p>This post will give you an overview of the release, including some background on Unikraft and performance graphs. You will already be familiar with most of this if you read our <a href="https://discuss.ocaml.org/t/mirageos-on-unikraft/16975">post on Discuss</a>.</p>
<h2>What is Unikraft and Why Did We Choose It?</h2>
<p><a href="https://unikraft.org/">Unikraft</a> is a unikernel development kit: a large <a href="https://unikraft.org/docs/internals/architecture">collection of components</a> that can be combined in any configuration that the user wants in the unikernel tradition of modularity. Unikraft's scope is larger than that of <a href="https://github.com/Solo5/solo5/">Solo5</a> (a backend that <a href="/blog/2025-02-06-mirageos-on-ocaml-5/">MirageOS also supports</a>). It aims to make it easy to turn any Unix server into an efficient unikernel.</p>
<p>In fact, the initial motivation behind exploring Unikraft as a MirageOS backend was to experiment and see what performance levels we could reach. We were particularly excited about using their Virtio-based network interface, as <code>virtio</code> is currently only implemented for one specific x86_64-only backend in Solo5. Some of the immediate performance differences we observed are detailed further down, but performance is not all we hope to gain from the Unikraft backend in the long run.</p>
<p>Unikraft is on the road to being multicore compatible (i.e., having one unikernel using multiple cores). While this is not ready yet and significant effort is needed to get there, it means that the MirageOS backend will eventually benefit from these efforts and support the full set of OCaml 5 features.</p>
<p>Furthermore, the Unikraft community (which is quite active) is experimenting with a variety of other targets, such as bare-metal for some platforms or new hypervisors (e.g. seL4). Any new target Unikraft supports can then be supported ‘for free’ by MirageOS too. For example, the Unikraft backend has already resulted in Firecracker being a new supported virtual machine monitor (VMM) for MirageOS.</p>
<p>Lastly, since Unikraft is POSIX-compatible (for a large subset of syscalls), it has the potential to enable MirageOS unikernels to embed OCaml libraries that have not been ported yet. This compatibility could be especially useful for large libraries which are hard to port (<a href="https://ocaml.xyz/">owl</a> comes to mind).</p>
<h2>How Does Unikraft Support Work?</h2>
<p>Adding the new MirageOS backend required that we create or modify a series of components:</p>
<ul>
<li>An <a href="https://github.com/mirage/ocaml-unikraft">OCaml cross compiler</a> that can build the new backend by building its corresponding runtime and providing a way to build unikernel images (instead of normal executables).</li>
<li>New libraries for <a href="https://github.com/mirage/mirage-unikraft">Unikraft system support</a>, including its <a href="https://github.com/mirage/mirage-net-unikraft">network</a> and <a href="https://github.com/mirage/mirage-block-unikraft">block</a> devices.</li>
<li><a href="https://github.com/mirage/mirage/pull/1607">Support for the new backends</a> in the <code>mirage</code> tool.</li>
</ul>
<p>Using Unikraft with a QEMU or a Firecracker backend is as simple as choosing the <code>unikraft-qemu</code> or the <code>unikraft-firecracker</code> target when configuring a unikernel.</p>
<h3>The OCaml/Unikraft Cross Compiler</h3>
<p>To build the <a href="https://github.com/mirage/ocaml-unikraft">OCaml cross compiler</a> with Unikraft, we used the <a href="https://unikraft.org/">Unikraft</a> core, Unikraft <a href="https://github.com/mirage/unikraft-lib-musl">lib-musl</a>, and <a href="https://musl.libc.org/">musl</a>. The <a href="https://musl.libc.org/">musl</a> library is the C library recommended by Unikraft for building programs using the POSIX interface. The combination made it easy to build the OCaml 5 runtime, particularly because it provided an implementation of the <code>pthread</code> API, which is now used in many places in the runtime. It could also potentially make it easier to port some libraries that depend on <code>Unix</code> to work on Unikraft backends.</p>
<p>The OCaml cross compiler builds upon the work that has been upstreamed to ease the <a href="https://discuss.ocaml.org/t/building-an-ocaml-cross-compiler-with-ocaml-5-3/15918">creation of cross compilers</a>, using almost the same series of patches as for <code>ocaml-solo5</code>. So the only versions of the compiler that are currently supported for OCaml/Unikraft are OCaml 5.3 and 5.4. All the patches have been upstreamed to OCaml so there should no longer be any patches required by OCaml 5.5.</p>
<p>Note that we didn’t go with the full standard Unikraft POSIX stack, which includes <a href="https://savannah.nongnu.org/projects/lwip/">lwIP</a> to provide network support. We had a prototype at some point relying on <code>lwIP</code> to validate our progress on other building blocks, but it raised many incompatibility issues with the standard MirageOS network stack, so we dropped support for <code>lwIP</code> for now. Instead, we developed the libraries required to plug the MirageOS stacks into the low-level interfaces provided by the Unikraft core.</p>
<h3>The New MirageOS Libraries for Unikraft Support</h3>
<p>Unikraft support comes with packages using the standard names, <code>mirage-block-unikraft</code> and <code>mirage-net-unikraft</code>, to support the block and network devices. These libraries are implemented directly on top of the low-level Unikraft APIs, and therefore use <code>virtio</code> on both QEMU and Firecracker VMMs.</p>
<p>To evaluate the quality of the implementations for these devices, we ran a couple of small benchmarks using OCaml 5.3 and Unikraft 0.18.0. You can find the benchmarks (including the unikernels along with some scripts to set them up and run them) in the <code>benchmarks</code> directory in <a href="https://github.com/Firobe/mirage-skeleton/tree/benchmarks">@Firobe’s fork of mirage-skeleton, benchmarks branch</a>.</p>
<h4>Network Device</h4>
<p>To measure the performance of the network stack, we tweaked the simple <a href="https://github.com/mirage/mirage-skeleton/tree/main/device-usage/network">network skeleton</a> unikernel to compute statistics and used a variable number of clients all sending 512MB of null bytes. We have run this benchmark on both a couple of <code>x86_64</code> laptops and on an LX2160 <code>aarch64</code> board, all running a GNU/Linux OS.</p>
<p>We have observed a lot of variability in the performance of the <code>solo5-spt</code> unikernel (sometimes better, sometimes worse than <code>unikraft-qemu</code>) depending on the actual computer used, so these measurements should be taken with a grain of salt.</p>
<p>On two different <code>x86_64</code> laptops:
<img sizes="(min-width: 1360px) 1360px, (min-width: 680px) 680px, 100vw" srcset="/blog/images/2025-09-10.unikraftmirageos/mirageos-unikraft1-170w~GO3z452NUPqB6z9mB8lzKQ.webp 170w, /blog/images/2025-09-10.unikraftmirageos/mirageos-unikraft1-340w~xj6FFSSwPSqo3R6udiFuNg.webp 340w, /blog/images/2025-09-10.unikraftmirageos/mirageos-unikraft1-680w~3TTzsVwszUkYqVxQLQsHqg.webp 680w, /blog/images/2025-09-10.unikraftmirageos/mirageos-unikraft1-1360w~f2v_2IkapxtU_o7pASCW0w.webp 1360w" src="/blog/images/2025-09-10.unikraftmirageos/mirageos-unikraft1-1360w~f2v_2IkapxtU_o7pASCW0w.webp" alt="Network benchmark: Unikraft solo5-spt and solo5-hvt, in decreasing performance order">
<img sizes="(min-width: 1360px) 1360px, (min-width: 680px) 680px, 100vw" srcset="/blog/images/2025-09-10.unikraftmirageos/mirageos-unikraft2-170w~HM4BaGTvSQ98hUIvluAzfg.webp 170w, /blog/images/2025-09-10.unikraftmirageos/mirageos-unikraft2-340w~-Gzp8DJtNeMpJpFIyksJFQ.webp 340w, /blog/images/2025-09-10.unikraftmirageos/mirageos-unikraft2-680w~vtrb3v3a8859pd2oSsza4A.webp 680w, /blog/images/2025-09-10.unikraftmirageos/mirageos-unikraft2-1360w~SYj52Vo8QGKy0GSy04VasQ.webp 1360w" src="/blog/images/2025-09-10.unikraftmirageos/mirageos-unikraft2-1360w~SYj52Vo8QGKy0GSy04VasQ.webp" alt="Network benchmark: Unikraft or solo5-spt are fastest, depending on the number of connections, solo5-hvt slower"></p>
<p>On the LX2160 <code>aarch64</code> board:
<img sizes="(min-width: 1360px) 1360px, (min-width: 680px) 680px, 100vw" srcset="/blog/images/2025-09-10.unikraftmirageos/mirageos-unikraft3-170w~G2yNOcNu_XKMT8pMtJq-xQ.webp 170w, /blog/images/2025-09-10.unikraftmirageos/mirageos-unikraft3-340w~igGIhY56MkXIdb4y0-MA5A.webp 340w, /blog/images/2025-09-10.unikraftmirageos/mirageos-unikraft3-680w~n2XEyuib35mVd2MyZOsFIQ.webp 680w, /blog/images/2025-09-10.unikraftmirageos/mirageos-unikraft3-1360w~hfvl6i_jK8lOOTXlV0XHqw.webp 1360w" src="/blog/images/2025-09-10.unikraftmirageos/mirageos-unikraft3-1360w~hfvl6i_jK8lOOTXlV0XHqw.webp" alt="Network benchmark: Unikraft solo5-spt and solo5-hvt, in decreasing performance order"></p>
<h4>Block Device</h4>
<p>To measure the performance of the block devices, we wrote a simple unikernel copying data from one disk to another. We can see that the performance of <code>unikraft-qemu</code> is lower than <code>solo5-hvt</code> for small buffer sizes, but fortunately, the situation improves with larger buffer sizes. We only ran this benchmark on an <code>x86_64</code> laptop, as there is currently an <a href="https://github.com/unikraft/unikraft/issues/1622">issue with two block devices</a> on <code>aarch64</code> on Unikraft.
<img sizes="(min-width: 1360px) 1360px, (min-width: 680px) 680px, 100vw" srcset="/blog/images/2025-09-10.unikraftmirageos/mirageos-unikraft4-170w~kd0STnLNSbxQa4o6q1DpKQ.webp 170w, /blog/images/2025-09-10.unikraftmirageos/mirageos-unikraft4-340w~hRVvEQX35rZKG04bHkaidA.webp 340w, /blog/images/2025-09-10.unikraftmirageos/mirageos-unikraft4-680w~h4RWil_ShQDulcM4llWbiQ.webp 680w, /blog/images/2025-09-10.unikraftmirageos/mirageos-unikraft4-1360w~U5R7JpfSJ1Bis7jRAxouKw.webp 1360w" src="/blog/images/2025-09-10.unikraftmirageos/mirageos-unikraft4-1360w~U5R7JpfSJ1Bis7jRAxouKw.webp" alt="Block device benchmark: solo5-spt fastest on small buffer sizes, Unikraft-QEMU fastest on larger buffer sizes, Unikraft-Firecracker and solo5-hvt slower"></p>
<p>It is worth mentioning that I/Os can be parallelised, which also yields a significant performance boost. Indeed, <code>mirage-block-unikraft</code> can leverage the parallelised <code>virtio</code> backend of QEMU and Firecracker, which solves the problem of limiting I/Os to what the hardware supports in terms of both parallelism and sector size.</p>
<h3>Current Limitations</h3>
<ol>
<li>In our tests, only Linux appeared to be well supported for compiling Unikraft at the moment. As a result, we have restricted our packages to that OS for now.</li>
<li>Unikraft supports various backends. At the moment, we’ve only added support for, and tested, its two major ones: QEMU and Firecracker.</li>
</ol>
<h2>Try it Out</h2>
<p>To try the new Unikraft backend for MirageOS, you need to use an OCaml 5.3 or 5.4 switch, so that you can install <code>mirage</code> and the OCaml/Unikraft cross compiler. The short version could be:</p>
<pre><code>$ opam switch create unikraft-test 5.4.0 # only if needed; 5.3.0 works too
$ opam install mirage ocaml-unikraft-backend-qemu ocaml-unikraft-x86_64
</code></pre>
<p>See below for some explanations about the numerous OCaml/Unikraft packages.
From then on, you can follow the standard procedure (see how to <a href="https://mirage.io/docs/install">install MirageOS</a> and how to <a href="https://mirage.io/docs/hello-world">build a hello-world unikernel</a>) to build your unikernel with the Unikraft backend of your choice, which should boil down to something like:</p>
<pre><code>$ mirage configure -t unikraft-qemu
$ make
</code></pre>
<h3>Details About the Various Packages for the OCaml/Unikraft Cross Compiler</h3>
<p>The <a href="https://github.com/mirage/ocaml-unikraft">OCaml cross compiler</a> to Unikraft is split up into 14 packages (see the <a href="https://github.com/ocaml/opam-repository/pull/27856">PR to opam-repository</a> for more details) so that users can:</p>
<ul>
<li>Choose which of the backends (QEMU or Firecracker) and which of the architectures (<code>x86_64</code> and <code>arm64</code>) they want to install, where all combinations can be installed at the same time.</li>
<li>Choose which architecture is generated when they use the <code>unikraft</code> OCamlfind toolchain by installing one of the two <code>ocaml-unikraft-default-&lt;arch&gt;</code> packages.</li>
<li>Install the <code>ocaml-unikraft-option-debug</code> to enable the (really verbose!) debugging messages.</li>
</ul>
<p>Furthermore, virtual packages can be installed to make sure that one of the architecture-specific packages is indeed installed:</p>
<ul>
<li><code>ocaml-unikraft</code> can be installed to make sure that there is indeed a <code>unikraft</code> OCamlfind toolchain installed.</li>
<li><code>ocaml-unikraft-backend-qemu</code> and <code>ocaml-unikraft-backend-firecracker</code> can be installed to make sure that the <code>unikraft</code> OCamlfind toolchain supports the corresponding backend.</li>
</ul>
<p>Those virtual packages will be used by the <code>mirage</code> tool when the target is <code>unikraft-qemu</code> or <code>unikraft-firecracker</code>.</p>
<p>All those packages use one of two version numbers. The backend packages use the Unikraft version number (0.18.0 and 0.20.0 have been tested and packaged) while the latest OCaml cross-compiler packages use version 1.1.0.</p>
<h2>Conclusion</h2>
<p>We are still experimenting with this new backend. We expect to run it in production in the coming months, but it may need improvements nevertheless. Notably absent from this release is an early attempt to leverage Unikraft’s POSIX compatibility to implement Mirage interfaces instead of hooking directly to Unikraft’s internal components. This early version used Unikraft’s <code>lwIP</code>-based network stack instead of Mirage’s (fooling Mirage into thinking it was running on Unix), and it may be interesting to revisit this kind of deployment, in particular for easy inclusion of Unix-only OCaml libraries in unikernels.</p>
<p>We are eager for reviews, comments, and discussion on the implementation, design, and approach of this new Mirage backend, and hope it will be useful to others.</p>
<p>You can connect with us on <a href="https://bsky.app/profile/tarides.com">Bluesky</a>, <a href="https://mastodon.social/@tarides">Mastodon</a>, <a href="https://www.threads.net/@taridesltd">Threads</a>, and <a href="https://www.linkedin.com/company/tarides">LinkedIn</a> or sign up for our mailing list to stay updated on our latest projects. We look forward to hearing from you!</p>
]]></description><link>https://tarides.com/blog/2025-11-13-announcing-unikraft-support-for-mirageos-unikernels</link><guid isPermaLink="false">https://tarides.com/blog/2025-11-13-announcing-unikraft-support-for-mirageos-unikernels.html</guid><dc:creator><![CDATA[ Samuel Hym ]]></dc:creator><pubDate>Thu, 13 Nov 2025 00:00:00 GMT</pubDate></item><item><title><![CDATA[Supporting OCurrent: FLOSS/Fund Backs Maintenance for OCaml's Native CI Framework]]></title><description><![CDATA[<p>We’re pleased to share that <a href="https://floss.fund">FLOSS/Fund</a> has provided Tarides with a grant to support the ongoing maintenance of <strong>OCurrent</strong>, the OCaml-native CI and workflow framework. This support is part of the <a href="https://floss.fund/blog/second-tranche-2025-anniversary/">second tranche of FLOSS Fund’s 2025 round</a>, and it will help us focus on ensuring OCurrent’s stability, improving its infrastructure, strengthening its documentation, and continuing community support.</p>
<h2>What is OCurrent?</h2>
<p><a href="https://www.ocurrent.org/">OCurrent</a> is both a small embedded domain-specific language (eDSL) for describing workflows and pipelines, and the wider CI infrastructure that powers much of the OCaml ecosystem. The OCurrent library enables developers to express build, test, and deployment logic directly in OCaml, with automatic dependency tracking and selective re-execution of steps when inputs change. This makes it particularly well-suited for long-running, continuously evolving systems where correctness and reproducibility are key. Pipelines written in OCurrent are self-adjusting: when a Git branch, Docker image, or external dependency is updated, the pipeline reacts automatically.</p>
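<p>To give a feel for what selective re-execution means, here is a toy sketch in plain OCaml. It does not use the OCurrent library or its API; <code>cached_step</code>, <code>build</code>, and the commit identifiers are hypothetical names made up for illustration. A step remembers its last input and result, and only runs again when the input changes:</p>
<pre><code>(* Wrap a step so that it re-runs only when its input differs from the
   previous one. This is an illustration, not how OCurrent is implemented. *)
let cached_step f =
  let last = ref None in
  fun input -&gt;
    match !last with
    | Some (prev, result) when prev = input -&gt; result
    | _ -&gt;
      let result = f input in
      last := Some (input, result);
      result

(* Count how many times the (pretend) build step actually executes. *)
let runs = ref 0

let build =
  cached_step (fun commit -&gt; incr runs; "image-for-" ^ commit)

let () =
  ignore (build "abc123"); (* first input: the step runs *)
  ignore (build "abc123"); (* unchanged input: cached result is reused *)
  ignore (build "def456"); (* changed input: the step runs again *)
  Printf.printf "step executed %d times\n" !runs
</code></pre>
<p>A real pipeline tracks dependencies between many such steps, but the principle is the same: work is redone only where an input has actually changed.</p>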
<p>OCurrent underpins the continuous integration and build infrastructure for the OCaml community. The <a href="https://github.com/ocurrent/ocaml-ci">ocaml-ci</a> service, which provides CI for OCaml projects hosted on GitHub, is implemented using OCurrent. Similarly, <a href="https://github.com/ocurrent/opam-repo-ci">opam-repo-ci</a> tests submissions to the <code>opam</code> repository. The same infrastructure is used for building and maintaining <a href="https://hub.docker.com/r/ocaml/opam">Docker base images</a> and services like <em>ocaml-docs-ci</em> and <em>ocaml-multicore-ci</em>. Together, they keep the OCaml ecosystem’s packages, documentation, and base environments current across compiler versions, architectures, and operating systems.</p>
<h2>How Will We Use the FLOSS Fund Grant?</h2>
<p>This grant will enable us to dedicate time to maintaining OCurrent’s core components and the CI infrastructure, enhancing their performance and reliability, and refining the developer experience. The funding will help sustain the infrastructure that powers <em>ocaml-ci</em> and <em>opam-repo-ci</em>, as well as the automated build and deployment pipelines that many OCaml projects rely on.</p>
<p>For an ecosystem like OCaml’s, with its diversity of compiler versions, platforms, and tooling, having a reliable and type-safe pipeline engine is crucial. OCurrent enables reproducible builds and continuous integration, reducing friction for developers and ensuring that the ecosystem remains healthy and up to date. As described in <a href="/blog/2023-07-12-ocaml-ci-renovated/">our earlier post on the renovated ocaml-ci</a>, OCurrent powers the “zero-configuration” CI experience that has become a cornerstone of OCaml’s development workflow.</p>
<h2>A Note of Thanks</h2>
<p>We’re grateful to FLOSS Fund for recognising the importance of this work and for supporting the continued development of the open-source infrastructure that keeps OCaml projects running smoothly. Thanks also to the many contributors, users, and maintainers of OCurrent, <em>ocaml-ci</em>, <em>opam-repo-ci</em>, and related projects. Your participation and feedback continue to shape the future of the OCaml ecosystem.</p>
<p>You can connect with us on <a href="https://bsky.app/profile/tarides.com">Bluesky</a>, <a href="https://mastodon.social/@tarides">Mastodon</a>, <a href="https://www.threads.net/@taridesltd">Threads</a>, and <a href="https://www.linkedin.com/company/tarides">LinkedIn</a> or sign up for our mailing list to stay updated on our latest projects. We look forward to hearing from you!</p>
]]></description><link>https://tarides.com/blog/2025-10-30-supporting-ocurrent-floss-fund-backs-maintenance-for-ocaml-s-native-ci-framework</link><guid isPermaLink="false">https://tarides.com/blog/2025-10-30-supporting-ocurrent-floss-fund-backs-maintenance-for-ocaml-s-native-ci-framework.html</guid><dc:creator><![CDATA[ KC Sivaramakrishnan ]]></dc:creator><pubDate>Thu, 30 Oct 2025 00:00:00 GMT</pubDate></item><item><title><![CDATA[OCaml 5.4 Release: New Features, Fixes, and More!]]></title><description><![CDATA[<p>It's everyone's favourite time of the year – the time for a new OCaml release! 5.4 brings improvements and optimisations as well as new features, some of which may be familiar to long-time members of the OCaml community. Today's post highlights some of the work done to improve the language for everyone. As always, we can't cover everything, and for a more exhaustive list, I recommend checking out the <a href="https://github.com/ocaml/ocaml/blob/5.4/Changes">changelog</a> in the OCaml repo.</p>
<p>Let's dive in!</p>
<h2>Immutable Arrays</h2>
<p>Immutable arrays, as their name suggests, are like regular arrays in OCaml, with the major difference that their contents cannot be modified after they are created. Immutability is often a useful property for data structures and extending this to arrays provides various improvements over regular arrays.</p>
<p>Immutable arrays can improve safety in cases where mutation isn't required and only random-access packed memory is needed. In such a situation, an immutable array clearly communicates the design intention to the developer, improving their use of the code and improving safety as a consequence. Since the immutability property does not allow a function to change the contents, immutable arrays also improve the reasoning properties of the code, which verification tools can use. More concretely, a new predefined type <code>'a iarray</code> and an <code>Iarray</code> module (with corresponding <code>IarrayLabels</code>) have been added to the <code>Stdlib</code>.</p>
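<p>As a small illustration of the new module, the sketch below builds an immutable array and reads from it. It assumes OCaml 5.4 or later, and that <code>Iarray</code> offers the non-mutating counterparts of the usual <code>Array</code> functions (<code>init</code>, <code>get</code>, <code>map</code>, <code>fold_left</code>); the names <code>squares</code>, <code>third</code>, and <code>doubled</code> are ours:</p>
<pre><code>(* Requires OCaml 5.4 or later for the Iarray module. *)
let squares : int iarray = Iarray.init 5 (fun i -&gt; i * i)

(* Reading works as with regular arrays. *)
let third = Iarray.get squares 2

(* There is no setter: "updating" means building a new immutable array. *)
let doubled = Iarray.map (fun x -&gt; 2 * x) squares

let () =
  Printf.printf "third = %d, sum of doubled = %d\n" third
    (Iarray.fold_left ( + ) 0 doubled)
</code></pre>
<p>Any function that receives <code>squares</code> can read it but cannot change it, which is the guarantee described above.</p>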
<p>Lastly, immutable arrays can be safely coerced since there's no risk of inserting incompatible types. Consider this code example:</p>
<pre><code><span class="ocaml-keyword-other">module</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Positive</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">sig</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword">type</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">t</span><span class="ocaml-source"> = </span><span class="ocaml-keyword-other-ocaml">private</span><span class="ocaml-source"> int
</span><span class="ocaml-source">  </span><span class="ocaml-keyword">val</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">make</span><span class="ocaml-source"> : int -&gt; t
</span><span class="ocaml-keyword-other">end</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">struct</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword-other">type</span><span class="ocaml-source"> </span><span class="ocaml-source">t</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-support-type">int</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">make</span><span class="ocaml-source"> </span><span class="ocaml-source">i</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">if</span><span class="ocaml-source"> </span><span class="ocaml-source">i</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">&gt;</span><span class="ocaml-source"> </span><span class="ocaml-constant-numeric-decimal-integer">0</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">then</span><span class="ocaml-source"> </span><span class="ocaml-source">i</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">else</span><span class="ocaml-source"> </span><span class="ocaml-source">invalid_arg</span><span class="ocaml-source"> </span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-string-quoted-double">make</span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-source">
</span><span class="ocaml-keyword-other">end</span><span class="ocaml-source">
</span><span class="ocaml-source">
</span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">nums</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-source">[</span><span class="ocaml-constant-numeric-decimal-integer">1</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source"> </span><span class="ocaml-constant-numeric-decimal-integer">2</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source"> </span><span class="ocaml-constant-numeric-decimal-integer">3</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source"> </span><span class="ocaml-constant-numeric-decimal-integer">4</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source"> </span><span class="ocaml-constant-numeric-decimal-integer">5</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source"> </span><span class="ocaml-constant-numeric-decimal-integer">6</span><span class="ocaml-source">]</span><span class="ocaml-source">
</span><span class="ocaml-source">
</span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">lst</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">List</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">map</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Positive</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">make</span><span class="ocaml-source"> </span><span class="ocaml-source">nums</span><span class="ocaml-source">
</span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">arr</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Array</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">of_list</span><span class="ocaml-source"> </span><span class="ocaml-source">lst</span><span class="ocaml-source">
</span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">iarr</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Iarray</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">of_list</span><span class="ocaml-source"> </span><span class="ocaml-source">lst</span><span class="ocaml-source">
</span></code></pre>
<p>You can sum up <code>lst</code> with <code>List.fold_left (+) 0 (lst :&gt; int list)</code>, but if you try the same with <code>Array.fold_left (+) 0 (arr :&gt; int array)</code>, you will be told <code>Type Positive.t array is not a subtype of int array</code>. By contrast, <code>Iarray.fold_left (+) 0 (iarr :&gt; int iarray)</code> works. This is very useful, because the only workarounds with a mutable array are to coerce the elements individually or, more often, to copy them into a new array, which is unfortunate since the copy exists only to satisfy type safety.</p>
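<p>As a rough, self-contained sketch of the three cases side by side (assuming OCaml 5.4 or later, so that <code>Iarray</code> is available), the snippet below repeats the <code>Positive</code> module in plain text. The names <code>list_total</code>, <code>iarray_total</code>, and <code>array_total</code> are ours and purely illustrative.</p>
<pre><code>module Positive : sig
  type t = private int
  val make : int -&gt; t
end = struct
  type t = int
  let make i = if i &gt; 0 then i else invalid_arg "make"
end

let lst = List.map Positive.make [1; 2; 3; 4; 5; 6]
let arr = Array.of_list lst
let iarr = Iarray.of_list lst

(* Lists are covariant, so the whole list can be coerced in one step. *)
let list_total = List.fold_left (+) 0 (lst :&gt; int list)

(* Immutable arrays are covariant as well: no copy is needed. *)
let iarray_total = Iarray.fold_left (+) 0 (iarr :&gt; int iarray)

(* Mutable arrays are invariant, so one workaround is to copy the
   elements into a fresh array, coercing each one individually. *)
let array_total =
  Array.fold_left (+) 0
    (Array.map (fun (p : Positive.t) -&gt; (p :&gt; int)) arr)
</code></pre>
<p>All three totals agree; the difference is that only the mutable array version has to allocate a second array to get there.</p>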
<p>Pull Request <a href="https://github.com/ocaml/ocaml/pull/13097">#13097</a> introduced immutable arrays to OCaml 5. It is a feature <a href="https://www.janestreet.com/">Jane Street</a> has been using internally in their <a href="https://github.com/tarides/tarides.com/pull/oxcaml.org">OxCaml branch</a>.  The team at Tarides collaborated with Jane Street engineers to upstream this feature. The resulting PR sparked some great discussions in the community, and after considering their feedback, it was merged to form part of the 5.4 release.</p>
<h2>Labelled Tuples</h2>
<p>Labelled tuples allow the developer to label tuple elements, giving useful names to the components of constructed values in the same way that labelled function arguments give names to parameters. Reordered and partial patterns are resolved during type checking, and the labels are erased during translation to lambda. Labelled tuples do not support extracting an element of the tuple by its label.</p>
<p>One example of this being useful is when developers want to compute two values from a list without mixing them up. Labelled tuples can help prevent them from accidentally returning the pair in the wrong order or mixing up the order of the initial values. This code was given as a motivating example by the authors:</p>
<pre><code><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">sum_and_product</span><span class="ocaml-source"> </span><span class="ocaml-source">ints</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">init</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> ~</span><span class="ocaml-source">sum</span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-constant-numeric-decimal-integer">0</span><span class="ocaml-keyword-other-ocaml punctuation-comma punctuation-separator">,</span><span class="ocaml-source"> ~</span><span class="ocaml-source">product</span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-constant-numeric-decimal-integer">1</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">in</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-constant-language-capital-identifier">List</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">fold_left</span><span class="ocaml-source"> </span><span class="ocaml-source">ints</span><span class="ocaml-source"> ~</span><span class="ocaml-source">init</span><span class="ocaml-source">
</span><span class="ocaml-source">    ~</span><span class="ocaml-source">f</span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source">(</span><span class="ocaml-keyword-other">fun</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-source">~</span><span class="ocaml-source">sum</span><span class="ocaml-keyword-other-ocaml punctuation-comma punctuation-separator">,</span><span class="ocaml-source"> ~</span><span class="ocaml-source">product</span><span class="ocaml-source">)</span><span class="ocaml-source"> </span><span class="ocaml-source">elem</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source">
</span><span class="ocaml-source">          </span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">sum</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-source">elem</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">+</span><span class="ocaml-source"> </span><span class="ocaml-source">sum</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">in</span><span class="ocaml-source">
</span><span class="ocaml-source">          </span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">product</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-source">elem</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">*</span><span class="ocaml-source"> </span><span class="ocaml-source">product</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">in</span><span class="ocaml-source">
</span><span class="ocaml-source">          ~</span><span class="ocaml-source">sum</span><span class="ocaml-keyword-other-ocaml punctuation-comma punctuation-separator">,</span><span class="ocaml-source"> ~</span><span class="ocaml-source">product</span><span class="ocaml-source">)</span><span class="ocaml-source">
</span></code></pre>
<p>The function <code>~f</code> has type <code>sum:int * product:int -&gt; int -&gt; sum:int * product:int</code> which enforces the ordering when constructing the return tuple.</p>
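<p>Reordered and partial patterns are perhaps easiest to see in a small example. The sketch below assumes OCaml 5.4 or later; the value <code>point</code> and the functions <code>weighted</code> and <code>only_y</code> are hypothetical names chosen for illustration. The argument is annotated so that the tuple type is known when the patterns are checked.</p>
<pre><code>(* A labelled tuple with three components. *)
let point = ~x:1, ~y:2, ~z:3

(* Reordered pattern: the labels, not the positions, decide what is bound. *)
let weighted (p : (x:int * y:int * z:int)) =
  match p with
  | (~z, ~x, ~y) -&gt; x + (10 * y) + (100 * z)

(* Partial pattern: ".." skips the components that are not named. *)
let only_y (p : (x:int * y:int * z:int)) =
  match p with
  | (~y, ..) -&gt; y
</code></pre>
<p>Here <code>weighted point</code> evaluates to <code>321</code> and <code>only_y point</code> to <code>2</code>, whatever order the labels are written in the pattern.</p>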
<p>Jane Street has been using labelled tuples internally for almost a year as part of their <a href="https://oxcaml.org/documentation/miscellaneous-extensions/labeled-tuples/">OxCaml branch</a>, and reports that it has been a useful and popular feature. A core developer, David Allsopp, also benefited from it when he was working on the Relocatable OCaml Project. He used labelled tuples in the <a href="https://github.com/ocaml/ocaml/pull/14014/files#diff-74b2bd0c4072976502190e79aa388834d77bb7311de8a517ff13d6fc464a0012R159">test harness</a> for the feature.</p>
<p>To find out more, check out Chris's <a href="https://www.youtube.com/live/KLWiEf3x3kc?si=TLHVrsMNz72T63JT&amp;t=25505">talk from the 2024 ML workshop</a>, <a href="https://tyconmismatch.com/papers/ml2024_labeled_tuples.pdf">the ML workshop talk proposal</a>, and <a href="https://github.com/ocaml/ocaml/pull/13498">PR #13498</a>, which documents the changes.</p>
<h2>Frame Pointers</h2>
<p>Frame pointers are used to <em>walk the stack</em> of function calls in a program. Tools like profilers and debuggers use frame pointers to walk the stack and reconstruct the call graph for programs. Having this support means that third-party debugging and performance tools, like perf, eBPF, DTrace, GDB, and LLDB, now work much better with OCaml.</p>
<p>The history of frame pointers in OCaml is a long and somewhat complicated one. The compiler has had an option for frame pointers on AMD64 since OCaml 4.01, released in 2013. After the multicore update in OCaml 5, support returned for Linux/AMD64 (<a href="https://github.com/ocaml/ocaml/pull/11144">#11144</a>) in the 5.1 release and macOS/AMD64 (<a href="https://github.com/ocaml/ocaml/pull/13163">#13163</a>) in the 5.3 release. In 5.4, support was added for ARM64 on both Linux and macOS, along with documentation on using frame pointers to profile OCaml code. The team has started work on supporting RISC-V, s390x, and Power architectures, which we hope to see implemented in the future.</p>
<p>For a fuller view of all the work, you can check out PRs <a href="https://github.com/ocaml/ocaml/issues/13500">#13500</a>, <a href="https://github.com/ocaml/ocaml/issues/13575">#13575</a>, <a href="https://github.com/ocaml/ocaml/pull/13635">#13635</a>, and <a href="https://github.com/ocaml/ocaml/pull/13751">#13751</a>, which have improved frame pointer support in OCaml 5.4. Tim McGilchrist also wrote a <a href="https://lambdafoo.com/posts/2025-02-24-ocaml-frame-pointers.html">blog post on his website</a> about implementing frame pointers in OCaml.</p>
<h2>Atomic Field Accesses</h2>
<p>Previously, OCaml 5 had limited support for atomic operations, only supporting them on the special <code>'a Atomic.t</code> type. After a lot of discussion on the best solutions, consensus landed on introducing records with atomic fields to improve performance when implementing concurrent data structures. Requiring a field to be of type <code>'a Atomic.t</code> introduces an indirection on that record field, so declaring the field itself as atomic is more efficient.</p>
<p>PR <a href="https://github.com/ocaml/ocaml/pull/13404">#13404</a> implements atomic operations, including two features described in the <a href="https://github.com/ocaml/RFCs/pull/39">'atomic record fields' RFC</a>. First, atomic record fields are now just record fields marked with an atomic attribute, and their reads and writes are compiled into atomic operations. Second, the PR implements atomic locations, a compiler-supported way to describe an atomic field within a record in order to perform atomic operations other than plain reads and writes.</p>
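<p>As a hedged sketch of what this can look like in practice, the snippet below assumes OCaml 5.4 or later and uses the <code>[@atomic]</code> attribute, the <code>[%atomic.loc ...]</code> extension, and the <code>Atomic.Loc</code> module as we understand them from the 5.4 changelog. The <code>counter</code> type and its helper functions are hypothetical names for illustration.</p>
<pre><code>(* A record with one atomic field: no separate Atomic.t box is allocated. *)
type counter = {
  name : string;
  mutable hits : int [@atomic];
}

let make name = { name; hits = 0 }

(* Plain reads and writes of an atomic field are atomic operations. *)
let read c = c.hits
let reset c = c.hits &lt;- 0

(* Other operations go through an atomic location for the field. *)
let hit c = Atomic.Loc.incr [%atomic.loc c.hits]
let add c n = Atomic.Loc.fetch_and_add [%atomic.loc c.hits] n

(* Four domains bump the same field concurrently without losing updates. *)
let stress () =
  let c = make "requests" in
  let worker () = for _ = 1 to 1_000 do hit c done in
  let domains = List.init 4 (fun _ -&gt; Domain.spawn worker) in
  List.iter Domain.join domains;
  read c
</code></pre>
<p>Compared with a field of type <code>int Atomic.t</code>, the record above holds the integer directly, which is the indirection the feature is meant to remove.</p>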
<h2>Unloadable Runtime</h2>
<p>OCaml 5.4 reintroduces 'memory cleanup upon exit' mode. The purpose of this mode is to enable something called the 'unloadable runtime'. Suppose you're running a program written in another programming language, and you use an OCaml library as a shared library. When control switches over from the OCaml library to the main program, you want the handover to happen cleanly, with all OCaml runtime resources being 'unloaded', including the stack, heap sections, code fragments, custom operations, buffers, and tables.</p>
<p>In the update from OCaml 4 to 5, support for the unloadable runtime was lost as multiple domains complicated matters. An OCaml program generally stops and exits once the main domain runs to completion, and the same behaviour needs to be adopted for multiple domains. When the main domain stops, the other domains are also terminated. Terminating all running domains was the trickiest aspect to implement since only stopping for garbage collection was originally supported. Now, with 'memory cleanup upon exit' mode, all domains can be terminated, and the runtime can be unloaded before control is handed back, ensuring a clean handover to a host program. Learn more in PR <a href="https://github.com/ocaml/ocaml/pull/12964">#12964</a>, which introduced the feature in 5.4.</p>
<h2>… And Many More!</h2>
<h3>GC Performance Improvements (Stephen Dolan, Nick Barnes, Gabriel Scherer, reviews by François Bobot, Josh Berdine, Damien Doligez, Tim McGilchrist, Guillaume Munch-Maccagnoni, benchmarking by Nicolás Ojeda Bär, and reports  by Emilio Jesús Gallego Arias and Olivier Nicole)</h3>
<p>This project improved the performance of garbage collection in the runtime. Changes to the way ephemerons are treated by the minor GC, allowing values from ephemeron keys to be collected and optimising them so they are not re-marked, have expanded their functionality whilst keeping the minor GC performant. A change to the major GC’s pacing, fixing a bug where the <code>work_counter</code> would get out of sync with the <code>alloc_counter</code>, also boosted performance. Finally, a new <code>Gc.ramp_up</code> callback will be introduced, allowing users to mark ramp-up phases of memory consumption and preventing slowdowns as a result of collection work being done twice. The work is spread across several PRs, including <a href="https://github.com/ocaml/ocaml/pull/13643">#13643</a>, <a href="https://github.com/ocaml/ocaml/pull/13827">#13827</a>, <a href="https://github.com/ocaml/ocaml/pull/13736">#13736</a>, <a href="https://github.com/ocaml/ocaml/issues/13300">#13300</a>, and  <a href="https://github.com/ocaml/ocaml/pull/13861">#13861</a>.</p>
<h3>Software Prefetching Support (Tim McGilchrist, review by Nick Barnes, Antonin Décimo, Stephen Dolan and Miod Vallat)</h3>
<p>5.4 enables software prefetching instructions for several architectures, including ARM64, s390x, PPC64, and RISC-V. This feature advises the processor to 'pre-fetch' data from slower memory and store it in faster memory before it is needed. This can help improve the overall performance of programs, and in fact, the PR <a href="https://github.com/ocaml/ocaml/pull/13582">#13582</a> contains several benchmarks showing speed-ups as a result of the changes.</p>
<h3>Review of Locking in the Multicore Runtime (Guillaume Munch-Maccagnoni, review by Gabriel Scherer, tests by Jan Midtgaard)</h3>
<p>Part of the task between updates is to identify and address problems. PRs <a href="https://github.com/ocaml/ocaml/pull/13227">#13227</a> and <a href="https://github.com/ocaml/ocaml/pull/13714">#13714</a> address cases where mixing <code>caml_plat_lock_non_blocking</code> and <code>caml_plat_lock_blocking</code> caused deadlocks in the runtime. After auditing and testing the code to see what was going wrong, tighter constraints on when and how the two locking functions can be mixed helped solve the deadlocking.</p>
<h2>A Few Bug Fixes</h2>
<h3><a href="https://github.com/ocaml/ocaml/pull/13605">#13605</a> (Samuel Vivien, review by Florian Angeletti, Richard Eisenberg and Jacques Garrigue)</h3>
<p>This PR adds a check to detect and prevent errors occurring as a result of generating typing constraints. When users define a parametrised type and create a constraint by binding that type with an <code>as</code>, there were times when a constraint would not behave as expected. By raising an error, this PR prevents users from writing constraints that introduce unwanted behaviour.</p>
<h3><a href="https://github.com/ocaml/ocaml/pull/13812">#13812</a> (Samuel Vivien, review by Gabriel Scherer)</h3>
<p>This PR adds another check that tests the validity of the type variable name on the right-hand side of <code>as</code>. The added test reduces friction and confusion for the user.</p>
<h3><a href="https://github.com/ocaml/ocaml/pull/13895">#13895</a> and <a href="https://github.com/ocaml/ocaml/pull/13691">#13691</a> (Jan Midtgaard, review by Miod Vallat, Sadiq Jaffer and Antonin Décimo)</h3>
<p>Gc.control had four underlying globals that would cause data races when tested with <a href="https://ocaml.org/manual/5.3/tsan.html">TSan</a>. Jan tested them and rewrote them to be atomic, which resolved the racing and brought them in line with other globals which were already atomic.</p>
<h2>What’s Next?</h2>
<p>Work continues! Performance improvements to the garbage collector that aim to reduce the performance gap between versions 4.14 and 5.x include GC pacing and mark delay work.  Relocatable OCaml is another ongoing project, whose <a href="https://github.com/ocaml/RFCs/pull/53">RFC</a> was accepted in principle in March. It wasn't quite ready for 5.4, but will hopefully <a href="https://icfp22.sigplan.org/details/ocaml-2022-papers/12/Copying-opam-switches-it-should-Just-Work-">finally be arriving</a> in OCaml 5.5!</p>
<p>You can connect with us on <a href="https://bsky.app/profile/tarides.com">Bluesky</a>, <a href="https://mastodon.social/@tarides">Mastodon</a>, <a href="https://www.threads.net/@taridesltd">Threads</a>, and <a href="https://www.linkedin.com/company/tarides">LinkedIn</a> or sign up for our mailing list to stay updated on our latest projects. We look forward to hearing from you!</p>
]]></description><link>https://tarides.com/blog/2025-10-10-ocaml-5-4-release-new-features-fixes-and-more</link><guid isPermaLink="false">https://tarides.com/blog/2025-10-10-ocaml-5-4-release-new-features-fixes-and-more.html</guid><dc:creator><![CDATA[ Isabella Salenius ]]></dc:creator><pubDate>Fri, 10 Oct 2025 00:00:00 GMT</pubDate></item><item><title><![CDATA[Ocsigen: A Full OCaml Framework for Websites and Apps]]></title><description><![CDATA[<p>Are you interested in using a functional programming language like OCaml for web development? This post will give you an overview of a great resource you might want to consider (if you're not using it already!).  <a href="https://ocsigen.org/home/intro.html">Ocsigen</a> is a collection of projects that provide a complete framework for the OCaml developer looking to create websites and mobile apps. It’s got you covered, from simple server-side websites to client-side programs and complex client-server applications.</p>
<h2>The Start of Ocsigen</h2>
<p>So, how did the idea of a web framework for OCaml come about? Ocsigen began as a research project under the <a href="https://www.irif.fr/index">Research Institute for Foundations of Computer Science</a> (IRIF), aiming to introduce support alongside the requisite tools for web development in OCaml. At the time of its inception (2005), there were no mature web frameworks for the language, and the founders Vincent Balat and Jérôme Vouillon wanted to implement a full framework that would support professional-level projects at scale.</p>
<p>The reasoning behind choosing OCaml for this project was to take advantage of the language’s expressive type system. This system can check the properties of programs at compile time, reduce development time, make it easier to refactor code and add new features, and ensure that existing features are working correctly. The research paper titled <a href="https://www.irif.fr/~balat/publications/2006mlworkshop-balat-ocsigen.pdf">Ocsigen: Typing Web Interaction With Objective Caml</a>, written by Vincent and based on the broader Ocsigen project, made two claims on which the design of Ocsigen was founded: firstly, that web languages could (and should) take greater advantage of static typing, and second, that functional programming is a good fit for web programming.</p>
<h3>Continuations, JavaScript, and Multi-Tier Programming</h3>
<p>The implications of their first claim meant that Vincent and Jérôme focussed on ensuring the correctness of generated pages using static typing – being able to statically check that links and forms match dynamic pages operating as typed services.</p>
<p>The second claim was inspired by several articles positing that web programming would benefit from taking advantage of continuations. A continuation is a programming concept that refers to an <a href="https://en.wikipedia.org/wiki/Continuation">abstract representation of the control state of a computer program</a>. Functional languages make it straightforward to capture and manipulate continuations. In web programming, continuations are a very elegant solution to what is known as “<a href="https://dl.acm.org/doi/pdf/10.1145/357766.351243">the back button problem</a>”. Vincent noticed that, at the time, few tools took advantage of this opportunity, and resolved to implement continuations along with other functional programming features in Ocsigen. As a result, the framework uses continuations to add whole new functionalities to traditional web programming, including typed forms, typed links, and advanced session handling using scoped references.</p>
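<p>To make the idea more concrete, here is a toy sketch in plain OCaml. It is not Ocsigen's or Eliom's API: the table <code>continuations</code> and the functions <code>register</code>, <code>submit</code>, and <code>first_page</code> are hypothetical names, and pages are modelled as strings. Each page the "server" sends registers a closure describing how to resume the interaction when that page's form is submitted.</p>
<pre><code>(* Continuations waiting for a form submission, indexed by page id. *)
let continuations : (int, int -&gt; string) Hashtbl.t = Hashtbl.create 16
let next_id = ref 0

(* Store a continuation and return the id of the page that owns it. *)
let register k =
  let id = !next_id in
  incr next_id;
  Hashtbl.add continuations id k;
  id

(* Submitting a form resumes the continuation stored for that page. *)
let submit id value = (Hashtbl.find continuations id) value

(* A two-step interaction: ask for a, then ask for b, then show a + b. *)
let first_page () =
  register (fun a -&gt;
    let second =
      register (fun b -&gt; Printf.sprintf "%d + %d = %d" a b (a + b))
    in
    Printf.sprintf "second page is #%d" second)
</code></pre>
<p>Because the second page's continuation has captured <code>a</code>, a user who presses the back button and resubmits that page with a different value still gets a consistent answer: after <code>submit (first_page ()) 1</code>, both <code>submit 1 2</code> and <code>submit 1 5</code> resume from the same point.</p>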
<p>Furthermore, the Ocsigen team implemented <a href="https://ocsigen.org/js_of_ocaml/latest/manual/overview">Js_of_ocaml</a>, one of the first compilers to JavaScript, enabling developers to use OCaml both on the server and client side.</p>
<p>Finally, the Ocsigen team invented multi-tier programming. In multi-tier programming, both ‘sides’ of an application are written as a single program. This makes communication between the server and client straightforward, allowing the user to generate web pages from the server or the client side (with the application being indifferent to which side you choose). For the first time, this allowed developers to mix app development and traditional website development in the same app! It also opened up the possibility for users to write mobile applications with the exact same code as they wrote web applications.</p>
<h2>What is Ocsigen?</h2>
<p>Ocsigen consists of several independent open-source projects, all available on GitHub. In this post, we’ll introduce you to the most used ones, but if you want to discover more, check out <a href="https://ocsigen.org/home/projects.html">Ocsigen's website</a>.</p>
<ul>
<li><strong><a href="https://ocsigen.org/lwt/latest/manual/manual">Lwt</a>:</strong> is a concurrency library that handles I/O operations using promises. Promises are references that will be filled asynchronously. One of the biggest advantages when calling a function that returns a promise is that it will not require a new stack or process. This ensures a high-speed, efficient call. Ocsigen is currently <a href="/blog/2025-03-13-we-re-moving-ocsigen-from-lwt-to-eio/">moving to Eio</a> as its concurrency library.</li>
<li><strong><a href="https://ocsigen.org/tyxml/latest/manual/intro">TyXML</a>:</strong>  is a library used to build statically correct HTML and SVG documents. It functions largely like HTML with a combinator implementing HTML attributes and elements, so it’s easy to pick up and use for most programmers. However, with TyXML, you can use OCaml to manipulate elements, and invalid markup text will yield a type error, which helps you write cleaner code. There are standalone examples available in the <a href="https://github.com/ocsigen/tyxml/tree/master/examples">TyXML GitHub repo</a> to help you get started.</li>
<li><strong><a href="https://ocsigen.org/js_of_ocaml/latest/manual/overview">Js_of_ocaml</a>:</strong> is a compiler for converting OCaml bytecode to JavaScript, allowing users to run pure OCaml programs in JavaScript environments. Js_of_ocaml is easy to install, works out-of-the-box with a lot of existing bindings to browser APIs, and generates performant programs. In addition, the Js_of_ocaml repo now also contains Wasm_of_ocaml, another compiler (originally a fork of the former) that compiles to WebAssembly targets.</li>
<li><strong><a href="https://ocsigen.org/eliom/latest/manual/overview">Eliom</a>:</strong> is a framework that enables users to implement multi-platform applications on web browsers and mobile devices in a modern programming style. It is not a new language but rather an extension of OCaml. Its main features are that it provides high-level expressive concepts that enable developers to program complex behaviours in very few lines of code; supports higher security by building on the OCaml type system and checking properties of applications at compile time; and makes it possible to write client-server applications as single programs using a multi-tier extension of OCaml.</li>
<li><strong><a href="https://ocsigen.org/ocsigenserver/latest/manual/quickstart">Ocsigen Server</a>:</strong> is a web server, as the name suggests. It is available as an executable and as a library and supports a wide variety of services, including static files, redirection, reverse proxy, CORS, page compression, authentication, and more.</li>
<li><strong><a href="https://ocsigen.org/ocsigen-toolkit/latest/manual/intro">Ocsigen Toolkit</a>:</strong> is a collection of resources like widgets and other utilities that users may want for their web applications. Usefully, most of the widgets are compatible with mobile programming, since they can be produced on either the server side or the client side using the same code.</li>
<li><strong><a href="https://ocsigen.org/ocsigen-start/latest/manual/intro">Ocsigen Start</a>:</strong> is an application template containing many standard features, including user management, notifications, and code examples. It can be used to learn the Ocsigen framework or to quickly create a prototype. There is a <a href="https://ocsigen-1.inria.fr/ocsigen-start/demo/">demo online</a> that you can try before installing the application, and it’s also available on mobile.</li>
</ul>
<p>A quick note on Eio: <a href="https://github.com/ocaml-multicore/eio">Eio</a> is an effects-based direct-style I/O library compatible with OCaml 5 including multicore. Direct-style concurrency enables the user to write code in a natural style without taking into consideration which code is concurrent and which isn’t (thereby eliminating the well-known <a href="https://journal.stuffwithstuff.com/2015/02/01/what-color-is-your-function/">function colouring problem</a>). Eio emphasises performance, making the most of new kernel I/O interfaces for enhanced parallelism efficiency and security, with parts of it being formally verified and using low-level and high-level interfaces. You can read more in <a href="/blog/2024-03-20-eio-1-0-release-introducing-a-new-effects-based-i-o-library-for-ocaml/">one of our blog posts on Eio</a>.</p>
<h2>BeSport</h2>
<p>The most prominent user of Ocsigen is <a href="https://www.besport.com">BeSport</a>. BeSport is a social networking platform created for sports clubs, amateurs, and pros. Teams and clubs can share content with their fans, and users can follow results, news, and statistics for their favourite professional teams or their own amateur leagues.</p>
<p>Choosing Ocsigen enabled the team behind BeSport to quickly build a feature-complete social network, functioning both as a web application and as Android and iOS applications. They achieved this under the constraints of a start-up: limited funding, unclear and rapidly evolving specifications, and a need to build clean and sustainable code. Their experience encapsulates the benefits of the Ocsigen programming model, which fosters fast and reliable development.</p>
<h2>Until Next Time</h2>
<p>You can connect with us on <a href="https://bsky.app/profile/tarides.com">Bluesky</a>, <a href="https://mastodon.social/@tarides">Mastodon</a>, <a href="https://www.threads.net/@taridesltd">Threads</a>, and <a href="https://www.linkedin.com/company/tarides">LinkedIn</a> or sign up for our mailing list to stay updated on our latest projects. We look forward to hearing from you!</p>
]]></description><link>https://tarides.com/blog/2025-10-02-ocsigen-a-full-ocaml-framework-for-websites-and-apps</link><guid isPermaLink="false">https://tarides.com/blog/2025-10-02-ocsigen-a-full-ocaml-framework-for-websites-and-apps.html</guid><dc:creator><![CDATA[ Isabella Salenius ]]></dc:creator><pubDate>Thu, 02 Oct 2025 00:00:00 GMT</pubDate></item><item><title><![CDATA[Parsimoni Joins Techstars' Autumn 2025 Programme!]]></title><description><![CDATA[<p>We are thrilled to announce that our sister company, <a href="https://parsimoni.co/">Parsimoni</a>, is part of <a href="https://www.techstars.com/newsroom/meet-the-startups-joining-techstars-fall-2025-accelerator-programs">Techstars’</a> autumn 2025 space accelerator programme in Los Angeles! They are in great company, with 4 other very synergistic space startups: <a href="https://www.ant61.com/beacon">ANT61</a>, <a href="https://www.azora.space/">Azora</a>, <a href="https://cisgam.com/">CISGAM</a>, and <a href="https://www.translunar-esi.com/">Translunar Exports and Servicing Incorporated</a>.</p>
<p>The kick-off was on the 8th of September. For the next 3 months, Parsimoni will benefit from Techstars’ mentorship, network, and investment to drive their mission of making satellite-based resources accessible to all.</p>
<h2>Parsimoni’s SpaceOS</h2>
<p>Parsimoni is developing SpaceOS, a next-generation operating system tailored to satellites and their payloads. By using unikernel technology (see <a href="https://mirageos.org/">MirageOS</a> for another example), the size of each application is optimised to its minimal viable state, reducing the attack surface and improving performance. The resulting efficiency is valuable for users – reducing costs, extending mission capabilities, and allowing for more advanced processing in orbit – and also better for the planet. More efficient, multi-purpose satellites make better use of limited resources and create new possibilities for the satellite industry.</p>
<p>These advantages, along with its security-by-design approach (including post-quantum cryptography, or PQC) and high adaptability to new features and applications, make SpaceOS the perfect candidate for use cases like AI-powered Earth observation, secure mission operations, and next-generation space marketplaces.</p>
<p>Parsimoni uses <a href="https://ocaml.org">OCaml</a> to build their software, benefitting from the language's <a href="/blog/2023-12-14-ocaml-memory-safety-and-beyond/">security guarantees</a>, and this approach is garnering attention from a world-leading accelerator. It illustrates how Tarides' mission of building mission-critical systems in OCaml is a recipe for success – in the FinTech sector, space sector, and beyond! Visit <a href="https://parsimoni.co/">Parsimoni’s website</a> to learn more about SpaceOS and its future development and deployment.</p>
<h2>Techstars</h2>
<p><a href="https://www.techstars.com/">Techstars</a> is an accelerator with a global network that has been helping founders launch and grow their companies since 2006. Their three-month programme is designed to help startups put together all the pieces they need for success. This includes assigning mentors to each company, fostering business storytelling, providing fundraising opportunities and guidance on strategy, workshops, and networking. Techstars alumni include the very successful <a href="https://www.chainalysis.com/">Chainalysis</a>, <a href="https://www.zipline.com/">Zipline</a>, <a href="https://www.datarobot.com/">DataRobot</a>, and <a href="https://www.alloy.com/">Alloy</a>.</p>
<p>For their autumn 2025 cohort, Techstars have selected startups with a strong focus on innovation and meeting future markets. Parsimoni is in the <em>space accelerator</em> group, and other groups include healthcare, the future of food, and the future of finance. Over 50 of the startups are also using AI in their offers.</p>
<h2>What’s Next</h2>
<p>We’re looking forward to seeing how Parsimoni will rise to the challenge over the next few months. If you want to keep up with them and SpaceOS, follow them on <a href="https://www.linkedin.com/company/parsimoni/posts/?feedView=all">LinkedIn</a> and keep an eye out for updates! Stay tuned to our blog – we will keep you updated about how Parsimoni is enabling the next revolution of space-based innovation.</p>
<p>You can connect with us on <a href="https://bsky.app/profile/tarides.com">Bluesky</a>, <a href="https://mastodon.social/@tarides">Mastodon</a>, <a href="https://www.threads.net/@taridesltd">Threads</a>, and <a href="https://www.linkedin.com/company/tarides">LinkedIn</a> or sign up for our mailing list to stay updated on our latest projects. We look forward to hearing from you!</p>
]]></description><link>https://tarides.com/blog/2025-09-25-parsimoni-joins-techstars-autumn-2025-programme</link><guid isPermaLink="false">https://tarides.com/blog/2025-09-25-parsimoni-joins-techstars-autumn-2025-programme.html</guid><dc:creator><![CDATA[ Isabella Salenius ]]></dc:creator><pubDate>Thu, 25 Sep 2025 00:00:00 GMT</pubDate></item><item><title><![CDATA[Dynamic Formal Verification in OCaml: An Ortac/QCheck-STM Tutorial]]></title><description><![CDATA[<p>You may have read our <a href="/blog/2024-09-03-getting-specific-announcing-the-gospel-and-ortac-projects/">recent post discussing our involvement in the Gospel Project</a>. In today's follow-up post, we will focus on <a href="https://github.com/ocaml-gospel/ortac">Ortac</a>, a tool we have been developing at Tarides as part of the Gospel Project.</p>
<h2>What is Ortac?</h2>
<p>Ortac aims to give OCaml programmers easy access to dynamic formal verification, also known as specification-driven testing. At its core, it translates computable Gospel terms into equivalent OCaml expressions. Different Ortac modes can then use these translations to generate OCaml code. This post is about the <a href="https://ocaml.org/p/ortac-qcheck-stm/latest">Ortac/QCheck-STM</a> mode that generates black-box model-based state-machine tests based on the <a href="https://ocaml.org/p/qcheck-stm/latest">QCheck-STM</a> framework.</p>
<h2>Ortac/QCheck-STM Mode</h2>
<p><a href="https://github.com/c-cube/qcheck">QCheck</a> is a property-based testing framework inspired by <a href="https://en.wikipedia.org/wiki/QuickCheck">QuickCheck</a>. As the name implies, the idea behind property-based testing is to check a property about a function, generally expressed as an equation involving the inputs and outputs of the function, against randomly generated inputs.</p>
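<p>To make the idea concrete, here is a minimal hand-rolled sketch of a property check that uses only the OCaml standard library. It is not QCheck's API: the names <code>gen_int_list</code>, <code>prop_rev_involutive</code>, and <code>check</code> are hypothetical and chosen for illustration only.</p>
<pre><code>(* A hand-rolled property check using only the standard library.
   This is NOT QCheck's API; all names below are hypothetical. *)

(* Generate a random list of small integers. *)
let gen_int_list () =
  List.init (Random.int 20) (fun _ -&gt; Random.int 100)

(* The property: reversing a list twice gives back the original list. *)
let prop_rev_involutive l = List.rev (List.rev l) = l

(* Run the property against [count] random inputs and return the first
   counterexample found, if any. *)
let check ~count gen prop =
  let rec loop n =
    if n = 0 then None
    else
      let input = gen () in
      if prop input then loop (n - 1) else Some input
  in
  loop count

let () =
  match check ~count:1000 gen_int_list prop_rev_involutive with
  | None -&gt; print_endline "property holds on all generated inputs"
  | Some _ -&gt; print_endline "counterexample found"
</code></pre>
<p>A real framework such as QCheck adds a great deal on top of this, for instance richer generators and shrinking of counterexamples, but the core loop is the same: generate inputs, evaluate the property, report a failing input.</p>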
<p>In contrast, QCheck-STM checks the behaviour of randomly generated sequences of function calls against a model. The model is implemented as a state machine (hence the STM). Testing a sequence of function calls helps users discover more bugs, especially when a mutable state is involved. In order to use QCheck-STM on a library, users have to specify which type is the centre of attention, also called the System Under Test, or SUT for short. They also have to provide a functional model for the SUT, equipped with a <code>next_state</code> function computing the new model given a function call. You can read more about QCheck-STM <a href="/blog/2024-04-24-under-the-hood-developing-multicore-property-based-tests-for-ocaml-5/">in a previous post on our blog</a> and <a href="https://janmidtgaard.dk/papers/Midtgaard-Nicole-Osborne:OCaml22.pdf">in a paper on parallel testing libraries for OCaml 5</a>.</p>
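<p>The following sketch illustrates the principle with the standard library only. It is not the QCheck-STM API: the SUT is assumed to be a <code>Stdlib.Stack</code>, the model is an immutable list, and the names <code>cmd</code>, <code>expected</code>, <code>run</code>, and <code>agree_on_random_sequence</code> are hypothetical.</p>
<pre><code>(* Hypothetical sketch of model-based testing; not the QCheck-STM API.
   SUT: a mutable Stdlib.Stack. Model: an immutable list. *)
type cmd = Push of int | Pop

(* Compute the new model after a command. *)
let next_state cmd model =
  match cmd, model with
  | Push x, _ -&gt; x :: model
  | Pop, [] -&gt; []
  | Pop, _ :: rest -&gt; rest

(* What the model predicts the command returns. *)
let expected cmd model =
  match cmd, model with
  | Push _, _ -&gt; None
  | Pop, [] -&gt; None
  | Pop, x :: _ -&gt; Some x

(* Run a command on the real, mutable SUT. *)
let run cmd sut =
  match cmd with
  | Push x -&gt; Stack.push x sut; None
  | Pop -&gt; Stack.pop_opt sut

let gen_cmd () = if Random.bool () then Push (Random.int 10) else Pop

(* Execute a random sequence of commands, comparing the SUT's answer
   with the model's prediction at every step. *)
let agree_on_random_sequence length =
  let sut = Stack.create () in
  let rec loop n model =
    if n = 0 then true
    else
      let cmd = gen_cmd () in
      run cmd sut = expected cmd model
      &amp;&amp; loop (n - 1) (next_state cmd model)
  in
  loop length []
</code></pre>
<p>Calling <code>agree_on_random_sequence 200</code> returns <code>true</code> as long as the stack and its list model never disagree. The model never needs to be efficient; it only needs to be obviously correct.</p>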
<p>One of the positives of using Ortac/QCheck-STM is that with just some Gospel annotations, a Dune rule, and a configuration file, we can benefit from QCheck-STM tests. Another bonus is that in case of failure, the generated tests will provide a bug report containing the piece of Gospel specification that has been violated, a runnable scenario with the actual returned values to reproduce the failure and, if available, the expected returned value of the failing command according to the function specification.</p>
<p>Since the previous post, the Ortac tool has been improved in several ways. Version 0.4.0 brought support for keeping track of multiple Systems Under Test in the generated tests, all thanks to <a href="/blog/2024-09-24-summer-of-internships-projects-from-the-ocaml-compiler-team/">Nikolaus Huber's work</a>. Then, version 0.5.0 brought support for testing higher-order functions, thanks to Jan Midtgaard. Finally, version 0.6.0 improved the computation of the expected returned value based on the specifications.</p>
<p>The reader curious about how Ortac/QCheck-STM works internally can refer to this <a href="https://link.springer.com/chapter/10.1007/978-3-031-90660-2_1">paper</a>. The rest of this post will adopt a more practical perspective, using the <a href="https://en.wikipedia.org/wiki/Priority_queue">priority queue</a> as a running example.</p>
<h2>Project Setup</h2>
<p>Let's first explore a project setup, how to write the Gospel specification of the API, and finally, how to integrate Ortac/QCheck-STM with Dune.</p>
<p>Here is the project's structure:</p>
<pre><code><span class="sh-support-function-builtin">.</span><span class="sh-source">
</span><span class="sh-source">├── dune-project
</span><span class="sh-source">├── priority-queue.opam
</span><span class="sh-source">├── src
</span><span class="sh-source">│&nbsp;&nbsp; ├── dune
</span><span class="sh-source">│&nbsp;&nbsp; ├── priority_queue.ml
</span><span class="sh-source">│&nbsp;&nbsp; └── priority_queue.mli
</span><span class="sh-source">└── </span><span class="sh-support-function-builtin">test</span><span class="sh-source">
</span><span class="sh-source">    ├── dune
</span><span class="sh-source">    ├── dune.inc
</span><span class="sh-source">    ├── priority_queue_config.ml
</span><span class="sh-source">    └── priority_queue_tests.ml
</span><span class="sh-source">
</span><span class="sh-source">3 directories, 9 files
</span></code></pre>
<p>The idea behind Ortac/QCheck-STM is to <strong>not</strong> write the tests. Compared to a more traditional project, <code>priority_queue.mli</code> contains some Gospel specifications describing the expected behaviour of the functions, and the files <code>dune.inc</code> and <code>priority_queue_tests.ml</code> are generated by Ortac.</p>
<p>The <code>priority_queue_tests.ml</code> contains the generated QCheck-STM tests for the <code>Priority_queue</code> module, based on the Gospel specifications contained in the <code>priority_queue.mli</code> file and the <code>priority_queue_config.ml</code> file provided by the user. Furthermore, <code>dune.inc</code> contains the generated Dune rules to generate <code>priority_queue_tests.ml</code> and attach the execution of these tests to the <code>runtest</code> alias.</p>
<p>Generating <code>dune.inc</code> is the job of the Ortac/Dune mode, which is called by a hand-written Dune rule in the <code>dune</code> file.</p>
<p>Due to how Dune's promote mode interacts with the <code>include</code> stanza, we must create an empty <code>dune.inc</code>. Another possibility is to use a <code>dynamic_include</code> rule (see the <a href="https://dune.readthedocs.io/en/stable/howto/rule-generation.html">rule generation chapter in the Dune docs</a> for more details).</p>
<p>Generating <code>priority_queue_tests.ml</code> is the job of the Ortac/QCheck-STM mode, which is called by a generated Dune rule in <code>dune.inc</code>.</p>
<p>Ortac/QCheck-STM is provided by the <a href="https://ocaml.org/p/ortac-qcheck-stm/latest"><code>ortac-qcheck-stm</code></a> package, and Ortac/Dune by the <a href="https://ocaml.org/p/ortac-dune/latest"><code>ortac-dune</code></a> one. We need to declare these two packages as dependencies of our project.</p>
<h2>Writing Some Gospel Specifications</h2>
<p>Let's take a look at the interface file for our priority queue, where the Gospel specifications are stored.</p>
<p>We begin by declaring the OCaml abstract type of a priority queue alongside its logical specification:</p>
<pre><code><span class="ocaml-comment-block">(*</span><span class="ocaml-comment-block">@ open Sequence </span><span class="ocaml-comment-block">*)</span><span class="ocaml-source">
</span><span class="ocaml-source">
</span><span class="ocaml-comment-block">(*</span><span class="ocaml-comment-block">@ type 'a priority = 'a * integer </span><span class="ocaml-comment-block">*)</span><span class="ocaml-source">
</span><span class="ocaml-source">
</span><span class="ocaml-comment-block">(*</span><span class="ocaml-comment-block">@ type 'a priority_queue = 'a priority sequence </span><span class="ocaml-comment-block">*)</span><span class="ocaml-source">
</span><span class="ocaml-source">
</span><span class="ocaml-keyword-other">type</span><span class="ocaml-source"> </span><span class="ocaml-storage-type">'a</span><span class="ocaml-source"> </span><span class="ocaml-source">t</span><span class="ocaml-source">
</span><span class="ocaml-comment-block">(*</span><span class="ocaml-comment-block">@ mutable model contents : 'a priority_queue
</span><span class="ocaml-comment-block">    with t
</span><span class="ocaml-comment-block">    invariant let q = t.contents in
</span><span class="ocaml-comment-block">              forall i.
</span><span class="ocaml-comment-block">              1 &lt;= i &lt; length q
</span><span class="ocaml-comment-block">              -&gt; snd q[i-1] &gt;= snd q[i] </span><span class="ocaml-comment-block">*)</span><span class="ocaml-source">
</span></code></pre>
<p>Gospel annotations are written inside special comments opened with <code>(*@</code>. We can open modules from the logical library that Gospel provides. The <code>Sequence</code> module contains the definition of a mathematical sequence and some operations. Ortac comes with an implementation of the Gospel logical library. We can also declare Gospel types using the same syntax as the OCaml one.</p>
<p>Gospel annotations immediately following an OCaml <code>type</code> or <code>val</code> are attached to it, a bit like documentation comments. So, in the example above, <code>mutable model contents ...</code> is the specification for the <code>type 'a t</code>. Gospel is a specification language based on models. Thus, in the first line of the specification, we give our OCaml type a model named <code>contents</code>. The model is also marked as <code>mutable</code>. That doesn't mean that the model itself is mutable but that the model of an <code>'a t</code> may change when that <code>'a t</code> is mutated. It is a way of specifying that the OCaml type <code>'a t</code> is mutable.</p>
<p>Now, our model <code>contents</code> has a Gospel type: <code>'a priority_queue</code>. If we unfold the definitions of the Gospel types <code>priority_queue</code> and <code>priority</code> given just before in the file, this means that we will see OCaml values of type <code>'a t</code> as mathematical sequences of pairs of an element and an integer representing this element's priority. There are other models we could have reasonably chosen, but let's go with this one. Let's also note that the model we choose is not directly related to the actual implementation. The logical model should make sense for any possible implementation.</p>
<p>We also use the <code>invariant</code> mechanism from Gospel to express that we keep the elements in decreasing order of priority in the model. This is not necessary, but it makes expressing the inspection and the evolution of the logical model a lot easier.</p>
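<p>Read as plain OCaml, the invariant amounts to checking that priorities never increase along the sequence. The function below is a hypothetical hand-written transcription, not the code Ortac generates; it assumes the model is represented as an OCaml list of pairs and uses <code>int</code> where Gospel uses the mathematical <code>integer</code>.</p>
<pre><code>(* Hypothetical OCaml reading of the Gospel invariant: every element
   has a priority greater than or equal to that of its successor.
   Assumption: the model is a list of (element, priority) pairs. *)
let rec invariant (q : ('a * int) list) : bool =
  match q with
  | [] | [ _ ] -&gt; true
  | (_, p1) :: ((_, p2) :: _ as rest) -&gt; p1 &gt;= p2 &amp;&amp; invariant rest
</code></pre>
<p>For instance, <code>invariant [('a', 2); ('b', 2); ('c', 0)]</code> holds, whereas <code>invariant [('a', 0); ('b', 1)]</code> does not.</p>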
<p>Now, we need to be able to express three fundamental operations precisely in our model: inserting a new element with its associated priority, looking at the next element, and deleting the next element.</p>
<pre><code><span class="ocaml-comment-block">(*</span><span class="ocaml-comment-block">@ function insert (q : 'a priority_queue)
</span><span class="ocaml-comment-block">                    (a : 'a)
</span><span class="ocaml-comment-block">                    (i : integer)
</span><span class="ocaml-comment-block">                    : 'a priority_queue =
</span><span class="ocaml-comment-block">      let higher = filter (fun x -&gt; snd x &gt; i) q in
</span><span class="ocaml-comment-block">      let equal = filter (fun x -&gt; snd x = i) q in
</span><span class="ocaml-comment-block">      let lesser = filter (fun x -&gt; snd x &lt; i) q in
</span><span class="ocaml-comment-block">      higher ++ snoc equal (a, i) ++ lesser </span><span class="ocaml-comment-block">*)</span><span class="ocaml-source">
</span><span class="ocaml-source">
</span><span class="ocaml-comment-block">(*</span><span class="ocaml-comment-block">@ function peek (q : 'a priority_queue) : 'a option =
</span><span class="ocaml-comment-block">      if q = empty
</span><span class="ocaml-comment-block">      then None
</span><span class="ocaml-comment-block">      else Some (fst (hd q)) </span><span class="ocaml-comment-block">*)</span><span class="ocaml-source">
</span><span class="ocaml-source">
</span><span class="ocaml-comment-block">(*</span><span class="ocaml-comment-block">@ function delete (q : 'a priority_queue) : 'a priority_queue =
</span><span class="ocaml-comment-block">      if q = empty
</span><span class="ocaml-comment-block">      then q
</span><span class="ocaml-comment-block">      else tl q </span><span class="ocaml-comment-block">*)</span><span class="ocaml-source">
</span></code></pre>
<p>The <code>insert</code> function does all the specification heavy lifting and is responsible for maintaining the invariant we've declared for our logical model. Note that Ortac/QCheck-STM will include invariant verifications in the generated tests!</p>
<p>The <code>insert</code> function is also the place where we choose to store the elements of the same priority in a FIFO manner by using the function <code>snoc</code> from the Gospel logical library.</p>
<p>The way <code>insert</code> is written makes peeking and deleting straightforward. In both cases, it suffices to look at and delete the first element of the sequence.</p>
<p>As we can see, this is similar to a very naive <em>trusted</em> functional implementation of the logical model. This makes a lot of sense if we think about it! As Ortac compiles Gospel specifications into OCaml code, it consumes <em>computable</em> specifications. This process also looks like part of what we would have written if we had been using <code>QCheck-STM</code> directly. These three functions are the ones necessary to implement the functional state-machine that the QCheck-STM tests will rely on, whether hand-written or Ortac-generated. One of the benefits of using Ortac/QCheck-STM is that it will test the Gospel implementation of these functions against the model's invariants for free!</p>
<p>We could argue that using an ordered list as a model for a priority queue is inefficient. But, as we are in a testing situation, performance matters less than the correctness of the model.</p>
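<p>To see how close the Gospel functions are to executable code, here is a hand-written OCaml transcription of the three of them. It is only a sketch under stated assumptions: the sequence is represented as an OCaml list, Gospel's <code>integer</code> becomes <code>int</code>, and this is not the translation Ortac itself produces.</p>
<pre><code>(* Hypothetical transcription of the Gospel model functions.
   Assumptions: sequences are OCaml lists, integers are OCaml ints. *)
type 'a priority_queue = ('a * int) list

(* Insert (a, i) after all elements of priority &gt;= i, which keeps the
   list in decreasing priority order and FIFO among equal priorities. *)
let insert (q : 'a priority_queue) (a : 'a) (i : int) : 'a priority_queue =
  let higher = List.filter (fun x -&gt; snd x &gt; i) q in
  let equal = List.filter (fun x -&gt; snd x = i) q in
  let lesser = List.filter (fun x -&gt; snd x &lt; i) q in
  higher @ (equal @ [ (a, i) ]) @ lesser

(* The next element is the head of the list, if there is one. *)
let peek (q : 'a priority_queue) : 'a option =
  match q with [] -&gt; None | (a, _) :: _ -&gt; Some a

(* Deleting the next element drops the head of the list. *)
let delete (q : 'a priority_queue) : 'a priority_queue =
  match q with [] -&gt; [] | _ :: rest -&gt; rest

(* Two elements of priority 1 and one of priority 2. *)
let q = insert (insert (insert [] 'a' 1) 'b' 2) 'c' 1
(* q = [('b', 2); ('a', 1); ('c', 1)] *)
</code></pre>
<p>Here <code>peek q</code> is <code>Some 'b'</code>, and <code>delete q</code> leaves <code>'a'</code> ahead of <code>'c'</code>, which reflects the FIFO choice for equal priorities.</p>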
<p>We now have all the necessary vocabulary to talk about the behaviour we expect from a priority queue:</p>
<pre><code><span class="ocaml-keyword-other">val</span><span class="ocaml-source"> </span><span class="ocaml-source">empty</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source"> </span><span class="ocaml-support-type">unit</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-storage-type">'a</span><span class="ocaml-source"> </span><span class="ocaml-source">t</span><span class="ocaml-source">
</span><span class="ocaml-comment-block">(*</span><span class="ocaml-comment-block">@ q = empty ()
</span><span class="ocaml-comment-block">    ensures q.contents = Sequence.empty </span><span class="ocaml-comment-block">*)</span><span class="ocaml-source">
</span><span class="ocaml-source">
</span><span class="ocaml-keyword-other">val</span><span class="ocaml-source"> </span><span class="ocaml-source">insert</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source"> </span><span class="ocaml-storage-type">'a</span><span class="ocaml-source"> </span><span class="ocaml-source">t</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-storage-type">'a</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-support-type">int</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-support-type">unit</span><span class="ocaml-source">
</span><span class="ocaml-comment-block">(*</span><span class="ocaml-comment-block">@ insert q a i
</span><span class="ocaml-comment-block">    modifies q.contents
</span><span class="ocaml-comment-block">    ensures q.contents = insert (old q.contents) a i </span><span class="ocaml-comment-block">*)</span><span class="ocaml-source">
</span><span class="ocaml-source">
</span><span class="ocaml-keyword-other">val</span><span class="ocaml-source"> </span><span class="ocaml-source">peek</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source"> </span><span class="ocaml-storage-type">'a</span><span class="ocaml-source"> </span><span class="ocaml-source">t</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-storage-type">'a</span><span class="ocaml-source"> </span><span class="ocaml-source">option</span><span class="ocaml-source">
</span><span class="ocaml-comment-block">(*</span><span class="ocaml-comment-block">@ o = peek q
</span><span class="ocaml-comment-block">    ensures o = peek q.contents </span><span class="ocaml-comment-block">*)</span><span class="ocaml-source">
</span><span class="ocaml-source">
</span><span class="ocaml-keyword-other">val</span><span class="ocaml-source"> </span><span class="ocaml-source">extract</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source"> </span><span class="ocaml-storage-type">'a</span><span class="ocaml-source"> </span><span class="ocaml-source">t</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-storage-type">'a</span><span class="ocaml-source"> </span><span class="ocaml-source">option</span><span class="ocaml-source">
</span><span class="ocaml-comment-block">(*</span><span class="ocaml-comment-block">@ o = extract q
</span><span class="ocaml-comment-block">    modifies q.contents
</span><span class="ocaml-comment-block">    ensures o = peek (old q.contents)
</span><span class="ocaml-comment-block">    ensures q.contents = delete (old q.contents) </span><span class="ocaml-comment-block">*)</span><span class="ocaml-source">
</span></code></pre>
<p>A Gospel function specification takes the form of a contract, the first line of which is a header naming the arguments and the returned value. What follows is a sequence of clauses describing the expected behaviour. Note that in these clauses, the names <code>insert</code>, <code>peek</code>, and <code>delete</code> refer to the Gospel functions defined above.</p>
<p>Here, the functions' contracts are pretty straightforward. It is worth noting that in order to be able to include a function in the tests, Ortac/QCheck-STM asks that whenever a new SUT is created, or an existing one is modified, the contract contains a post-condition (an <code>ensures</code> clause) describing, again, in a computable way, the value of the related model.</p>
<p>When there are other post-conditions, they are added to the tests. Besides, when a post-condition is describing, in a computable way, the output value, Ortac/QCheck-STM will use it to include information about the expected returned value in the bug report in case of test failure.</p>
<h2>Integrating Specification-Driven Tests With a Dune Workflow</h2>
<p>Now, with a bit more effort, specification-driven testing is just a <code>dune runtest</code> away!</p>
<p>The first thing Ortac/QCheck-STM needs is a minimal configuration file in order to know how to build the QCheck-STM test suite we want. The minimal configuration consists of defining the <code>sut</code> type and the <code>init_sut</code> value.</p>
<p>The <code>sut</code> type is the type we want to focus the tests on, and the <code>init_sut</code> value is how to create an initial value to start the tests.</p>
<p>The configuration file also contains some additional information. Here we shadow the QCheck generators for <code>int</code>s and <code>char</code>s so that we only deal with three levels of priorities and readable elements in the tests.</p>
<pre><code><span class="ocaml-keyword-other">type</span><span class="ocaml-source"> </span><span class="ocaml-source">sut</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-support-type">char</span><span class="ocaml-source"> </span><span class="ocaml-source">t</span><span class="ocaml-source">
</span><span class="ocaml-source">
</span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">init_sut</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-source">empty</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-unit">()</span><span class="ocaml-source">
</span><span class="ocaml-source">
</span><span class="ocaml-keyword-other">module</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Gen</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">struct</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">int</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-source">oneofl</span><span class="ocaml-source"> </span><span class="ocaml-source">[</span><span class="ocaml-constant-numeric-decimal-integer">0</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-constant-numeric-decimal-integer">1</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-constant-numeric-decimal-integer">2</span><span class="ocaml-source">]</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">char</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-source">char_range</span><span class="ocaml-source"> </span><span class="ocaml-string-quoted-single">'a'</span><span class="ocaml-source"> </span><span class="ocaml-string-quoted-single">'z'</span><span class="ocaml-source">
</span><span class="ocaml-keyword-other">end</span><span class="ocaml-source">
</span></code></pre>
<p>Finally, we want to be able to generate and launch the tests with a <code>dune runtest</code>. Thanks to the Ortac/Dune plugin, we only need to write one rule:</p>
<pre><code><span class="dune-meta-stanza">(</span><span class="dune-meta-class-stanza">rule</span><span class="dune-meta-stanza">
</span><span class="dune-meta-stanza"> </span><span class="dune-meta-stanza">(</span><span class="dune-meta-class-stanza">alias</span><span class="dune-meta-stanza"> </span><span class="dune-meta-atom">runtest</span><span class="dune-meta-stanza">)</span><span class="dune-meta-stanza">
</span><span class="dune-meta-stanza"> </span><span class="dune-meta-stanza-rule">(</span><span class="dune-keyword-other">mode</span><span class="dune-meta-stanza-rule"> </span><span class="dune-constant-language-rule-mode">promote</span><span class="dune-meta-stanza-rule">)</span><span class="dune-meta-stanza">
</span><span class="dune-meta-stanza"> </span><span class="dune-meta-stanza-rule">(</span><span class="dune-keyword-other">deps</span><span class="dune-meta-stanza-rule">
</span><span class="dune-meta-stanza-rule">  </span><span class="dune-entity-tag-list-parenthesis">(</span><span class="dune-constant-language-flag">:specs</span><span class="dune-meta-list"> %{</span><span class="dune-meta-atom">project_root}/src/priority_queue.mli</span><span class="dune-entity-tag-list-parenthesis">)</span><span class="dune-meta-stanza-rule">)</span><span class="dune-meta-stanza">
</span><span class="dune-meta-stanza"> </span><span class="dune-meta-stanza-rule">(</span><span class="dune-keyword-other">action</span><span class="dune-meta-stanza-rule">
</span><span class="dune-meta-stanza-rule">  </span><span class="dune-meta-stanza-rule-action">(</span><span class="dune-entity-name-function-action">with-stdout-to</span><span class="dune-meta-stanza-rule-action">
</span><span class="dune-meta-stanza-rule-action">   </span><span class="dune-meta-atom">dune.inc</span><span class="dune-meta-stanza-rule-action">
</span><span class="dune-meta-stanza-rule-action">   </span><span class="dune-meta-stanza-rule-action">(</span><span class="dune-entity-name-function-action">run</span><span class="dune-meta-stanza-rule-action"> </span><span class="dune-meta-atom">ortac</span><span class="dune-meta-stanza-rule-action"> </span><span class="dune-meta-atom">dune</span><span class="dune-meta-stanza-rule-action"> </span><span class="dune-meta-atom">qcheck-stm</span><span class="dune-meta-stanza-rule-action"> %{</span><span class="dune-meta-atom">specs</span><span class="dune-meta-stanza-rule-action">}</span><span class="dune-meta-stanza-rule-action">)</span><span class="dune-meta-stanza-rule-action">)</span><span class="dune-meta-stanza-rule">)</span><span class="dune-meta-stanza">)</span><span class="dune-source">
</span><span class="dune-source">
</span><span class="dune-meta-stanza">(</span><span class="dune-meta-class-stanza">include</span><span class="dune-meta-stanza"> </span><span class="dune-meta-atom">dune.inc</span><span class="dune-meta-stanza">)</span><span class="dune-source">
</span></code></pre>
<p>Other than that, you are all set to implement a priority queue against specification-driven generated tests!</p>
<h2>Current and Future Work</h2>
<p>As mentioned <a href="https://github.com/ocaml-gospel/ortac?tab=readme-ov-file#found-issues">in the repo</a>, Ortac/QCheck-STM has already proven itself useful by discovering and helping fix a number of issues.</p>
<p>Thanks to Charlène Gros, the Ortac/Wrapper mode has just been released. Ortac/Wrapper consumes Gospel annotations to generate runtime assertion checking. Given an annotated OCaml interface file, this plugin will generate a new module with the same signature but with an implementation instrumented with assertions taken from the Gospel specifications. Each function is <em>wrapped</em> with an assertion of its pre- and post-conditions.</p>
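<p>As a rough illustration of what wrapping a function with its pre- and post-conditions means, here is a hand-written sketch. It is not the code Ortac/Wrapper generates: the modules <code>Original</code> and <code>Wrapped</code>, the function <code>isqrt</code>, and its informal contract are all hypothetical.</p>
<pre><code>(* Hypothetical, hand-written illustration of runtime assertion
   checking; this is NOT the output of Ortac/Wrapper. *)
module Original = struct
  (* Integer square root, with the informal contract:
       requires n &gt;= 0
       ensures  r * r &lt;= n &lt; (r + 1) * (r + 1) *)
  let isqrt n = int_of_float (sqrt (float_of_int n))
end

module Wrapped = struct
  let isqrt n =
    (* Check the pre-condition before calling the original function. *)
    assert (n &gt;= 0);
    let r = Original.isqrt n in
    (* Check the post-condition on the returned value. *)
    assert (r * r &lt;= n &amp;&amp; n &lt; (r + 1) * (r + 1));
    r
end
</code></pre>
<p>With this sketch, <code>Wrapped.isqrt 17</code> returns <code>4</code>, while <code>Wrapped.isqrt (-1)</code> raises <code>Assert_failure</code> before the original function is ever called.</p>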
<p>Another project is to make Ortac/QCheck-STM also target QCheck-STM/Domains. For now, Ortac/QCheck-STM only generates QCheck-STM tests for testing the library in a sequential context. One of the strengths of QCheck-STM is that it provides a way to test mutable data structures in a parallel context using domains.</p>
<p>The whole project is under active development and should continue to evolve. We are also committed to open-source software, so if you want to take Ortac for a spin (and I encourage you to), please don't hesitate to ask questions, contribute issues, or even PRs!</p>
<h2>Until Next Time</h2>
<p>You can connect with us on <a href="https://bsky.app/profile/tarides.com">Bluesky</a>, <a href="https://mastodon.social/@tarides">Mastodon</a>, <a href="https://www.threads.net/@taridesltd">Threads</a>, and <a href="https://www.linkedin.com/company/tarides">LinkedIn</a> or sign up to our mailing list to stay updated on our latest projects. We look forward to hearing from you!</p>
]]></description><link>https://tarides.com/blog/2025-09-10-dynamic-formal-verification-in-ocaml-an-ortac-qcheck-stm-tutorial</link><guid isPermaLink="false">https://tarides.com/blog/2025-09-10-dynamic-formal-verification-in-ocaml-an-ortac-qcheck-stm-tutorial.html</guid><dc:creator><![CDATA[ Nicolas Osborne ]]></dc:creator><pubDate>Wed, 10 Sep 2025 00:00:00 GMT</pubDate></item><item><title><![CDATA[Internship Report: Refactoring Tools Coming to Merlin]]></title><description><![CDATA[<p>Refactoring features have contributed to the popularity of editors like <a href="https://www.jetbrains.com/idea/">IntelliJ</a>, as well as certain programming languages whose editor support offers interactive mechanisms to manage code — <a href="https://gleam.run/language-server/">Gleam</a> being an excellent example. Even though OCaml has some features related to refactoring (such as <a href="https://discuss.ocaml.org/t/ann-merlin-and-ocaml-lsp-support-experimental-project-wide-renaming/16008">renaming occurrences</a>, <a href="https://ocaml.github.io/merlin/editor/emacs/#expression-construction">substituting typed holes</a> with expressions, and <a href="/blog/2024-05-29-effective-ml-through-merlin-s-destruct-command/">case analysis</a> for pattern matching), the goal of my internship was to kickstart work on a robust set of features to enable the smooth integration of multiple complementary refactoring support commands.</p>
<p>As part of my Tarides internship (on the editor side), I specified several useful commands, inspired by competitors and materialised in the form of RFCs, subject to discussion. There were multiple candidates, but we found that <em>expression extraction to toplevel</em> was the most suitable for a first experiment. Since it touched on several parts of the protocol and required tools that could be reused for other features, it was important to design the system with extensibility and modularity in mind.</p>
<p>In this article, I will present the results of this experiment, including the new command and some interesting use cases.</p>
<h2>Examples</h2>
<p><em>Expression extraction to toplevel</em> will select the largest expression that fits within your selection and propose to extract it. In this case, <code>extract</code> means that the selected expression will be moved into its own freshly generated top-level let binding.</p>
<h3>Extracting Constants</h3>
<p>Here is a first example: let’s try to extract a constant. Let’s assume that the float <code>3.14159</code> is selected in the following code snippet:</p>
<pre><code><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">circle_area</span><span class="ocaml-source"> </span><span class="ocaml-source">radius</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-constant-numeric-decimal-float">3.14159</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">*.</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-source">radius</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">**</span><span class="ocaml-source"> </span><span class="ocaml-constant-numeric-decimal-float">2.</span><span class="ocaml-source">)</span><span class="ocaml-source">
</span><span class="ocaml-source">                      </span><span class="ocaml-comment-block">(*</span><span class="ocaml-comment-block"> ^^^^^^^ </span><span class="ocaml-comment-block">*)</span><span class="ocaml-source">
</span></code></pre>
<p>The <code>extract</code> code action will then be proposed, and if you apply it, the code will look like this:</p>
<pre><code><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">const_name1</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-constant-numeric-decimal-float">3.14159</span><span class="ocaml-source">
</span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">circle_area</span><span class="ocaml-source"> </span><span class="ocaml-source">radius</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-source">const_name1</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">*.</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-source">radius</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">**</span><span class="ocaml-source"> </span><span class="ocaml-constant-numeric-decimal-float">2.</span><span class="ocaml-source">)</span><span class="ocaml-source">
</span></code></pre>
<p>Here is an illustrated example (based on an experimental branch of <a href="https://github.com/tarides/ocaml-eglot">ocaml-eglot</a>):</p>
<p align="center">
<img src="/blog/images/merlin-extract-1~RDOf-0dhfHB2JyJ0n1d4hw.gif" alt="Extract constant">
</p>
<p>We can see that the expression has indeed been extracted and replaced by a reference to the fresh let binding. We can also observe that, in the absence of a specified name, the generated binding is given a generic name that is not already taken in the destination scope. You can also supply the name you want for the extracted binding.</p>
<p>Here is the same example, this time with a name entered by the user:</p>
<p align="center">
<img src="/blog/images/merlin-extract-2~ZfD3Cjk4AeK3amEbq66vuQ.gif" alt="Extract constant with a given name">
</p>
<p>But the refactoring capabilities go much further than constant extraction!</p>
<h3>Extracting an Expression</h3>
<p>In our previous example, we could safely assume that the expression was pure, since we were only extracting a literal value. However, OCaml is an impure language, so extracting an expression into a constant can lead to unintended behavior. For example, let's imagine the following snippet:</p>
<pre><code><span class="ocaml-keyword-other">let</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-unit">()</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> 
</span><span class="ocaml-source">  </span><span class="ocaml-keyword-other">let</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-unit">()</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> 
</span><span class="ocaml-source">    </span><span class="ocaml-source">print_endline</span><span class="ocaml-source"> </span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-string-quoted-double">Hello World!</span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-source">print_endline</span><span class="ocaml-source"> </span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-string-quoted-double">Done</span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword-other">in</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-unit">()</span><span class="ocaml-source">
</span></code></pre>
<p>In this example, extracting into a <em>constant</em> would cause problems! Indeed, we would be changing the semantics of our program by executing both print statements ahead of time. Fortunately, the command detects that the expression is not a constant and delays its execution using a thunk, that is, a function of type <code>unit -&gt; ...</code>.</p>
<p align="center">
<img src="/blog/images/merlin-extract-3~2oFjCOtDbixwhiiQOMoMyg.gif" alt="Extract expression">
</p>
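<p>In text form, the thunk-based result could look roughly like the sketch below. The name <code>fun_name1</code> is a placeholder chosen for illustration; the name that Merlin actually generates may differ.</p>
<pre><code>(* Sketch of a possible result; [fun_name1] is a hypothetical generated name. *)

(* The extracted expression becomes a thunk: nothing is printed
   until the function is applied to (). *)
let fun_name1 () =
  print_endline "Hello World!";
  print_endline "Done"

(* The original site now calls the thunk, so both lines are still
   printed at the same point in the program as before. *)
let () =
  let () = fun_name1 () in
  ()
</code></pre>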
<p>As we can see, our goal was to produce valid code as often as possible by carefully analysing how to perform the extraction. This is all the more challenging in OCaml, which allows expressions to be nested to an arbitrary depth.</p>
<h3>Extracting an Expression That Uses Variables</h3>
<p>The final point we’ll briefly cover is the most fun. Indeed, it’s possible that the expression we want to extract depends on values defined in the current scope. For example:</p>
<pre><code><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">z</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-constant-numeric-decimal-integer">45</span><span class="ocaml-source">
</span><span class="ocaml-source">
</span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">a_complicated_function</span><span class="ocaml-source"> </span><span class="ocaml-source">x</span><span class="ocaml-source"> </span><span class="ocaml-source">y</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> 
</span><span class="ocaml-source">  </span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">a</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-constant-numeric-decimal-integer">10</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">in</span><span class="ocaml-source"> 
</span><span class="ocaml-source">  </span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">b</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-constant-numeric-decimal-integer">11</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">in</span><span class="ocaml-source"> 
</span><span class="ocaml-source">  </span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">c</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-constant-numeric-decimal-integer">12</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">in</span><span class="ocaml-source"> 
</span><span class="ocaml-source">  </span><span class="ocaml-source">a</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">+</span><span class="ocaml-source"> </span><span class="ocaml-source">b</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">+</span><span class="ocaml-source"> </span><span class="ocaml-source">c</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">+</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-source">c</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">*</span><span class="ocaml-source"> </span><span class="ocaml-source">x</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">*</span><span class="ocaml-source"> </span><span class="ocaml-source">y</span><span class="ocaml-source">)</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">+</span><span class="ocaml-source"> </span><span class="ocaml-source">z</span><span class="ocaml-source">
</span></code></pre>
<p>In this example, extracting the expression <code>a + b + c + (c * x * y) + z</code> places the new binding between <code>z</code> and <code>a_complicated_function</code>. As a result, <code>z</code> will still be accessible; however, <code>x</code>, <code>y</code>, <code>a</code>, <code>b</code>, and <code>c</code> will be <a href="https://en.wikipedia.org/wiki/Free_variables_and_bound_variables">free variables</a> in the extracted expression. Therefore, we generate a function that takes these free variables as arguments:</p>
<p align="center">
<img src="/blog/images/merlin-extract-4~THNQL-EZb5Gx7m80iaOg8A.gif" alt="Extract expression with free variables">
</p>
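<p>As a rough sketch, the code after such an extraction could look like the following. The function name <code>fun_name1</code> and the order of its parameters are assumptions made for illustration; the actual output of the command may differ.</p>
<pre><code>let z = 45

(* Hypothetical generated function: [z] is still in scope here, while the
   free variables a, b, c, x and y are received as arguments. *)
let fun_name1 a b c x y = a + b + c + (c * x * y) + z

let a_complicated_function x y =
  let a = 10 in
  let b = 11 in
  let c = 12 in
  fun_name1 a b c x y

(* 10 + 11 + 12 + (12 * 2 * 3) + 45 = 150 *)
let () = assert (a_complicated_function 2 3 = 150)
</code></pre>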
<p>Identifying free variables was one of the motivations for starting with this command. We are fairly certain that this is a function that we will need to reuse in many contexts! Note that the command behaves correctly in the presence of objects and modules.</p>
<h2>A Real World Example</h2>
<p>Let’s try to extract something a little more complicated now. Let’s assume we have the following code and we want to refactor it, for example, by extracting the pretty-printing logic for the <code>markup</code> type out of our <code>show</code> function.</p>
<pre><code><span class="ocaml-keyword-other">type</span><span class="ocaml-source"> </span><span class="ocaml-source">t</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-source">markup</span><span class="ocaml-source"> </span><span class="ocaml-source">list</span><span class="ocaml-source">
</span><span class="ocaml-keyword-other">and</span><span class="ocaml-source"> </span><span class="ocaml-source">markup</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Text</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">of</span><span class="ocaml-source"> </span><span class="ocaml-support-type">string</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">|</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Bold</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">of</span><span class="ocaml-source"> </span><span class="ocaml-support-type">string</span><span class="ocaml-source">
</span><span class="ocaml-source">
</span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">show</span><span class="ocaml-source"> </span><span class="ocaml-source">doc</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">buf</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Buffer</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">create</span><span class="ocaml-source"> </span><span class="ocaml-constant-numeric-decimal-integer">101</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">in</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">bold_tag</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-string-quoted-double">**</span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">in</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-constant-language-capital-identifier">List</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">iter</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-source">(</span><span class="ocaml-keyword-other">fun</span><span class="ocaml-source"> </span><span class="ocaml-source">markup</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source">
</span><span class="ocaml-source">      </span><span class="ocaml-constant-language-capital-identifier">Buffer</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">add_string</span><span class="ocaml-source"> </span><span class="ocaml-source">buf</span><span class="ocaml-source">
</span><span class="ocaml-source">      </span><span class="ocaml-keyword-operator">@@</span><span class="ocaml-source">
</span><span class="ocaml-source">      </span><span class="ocaml-keyword-other">match</span><span class="ocaml-source"> </span><span class="ocaml-source">markup</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">with</span><span class="ocaml-source">
</span><span class="ocaml-source">      </span><span class="ocaml-keyword-other">|</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Text</span><span class="ocaml-source"> </span><span class="ocaml-source">txt</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-source">txt</span><span class="ocaml-source">
</span><span class="ocaml-source">      </span><span class="ocaml-keyword-other">|</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Bold</span><span class="ocaml-source"> </span><span class="ocaml-source">txt</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-source">bold_tag</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">^</span><span class="ocaml-source"> </span><span class="ocaml-source">txt</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">^</span><span class="ocaml-source"> </span><span class="ocaml-source">bold_tag</span><span class="ocaml-source">)</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-source">doc</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-constant-language-capital-identifier">Buffer</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">contents</span><span class="ocaml-source"> </span><span class="ocaml-source">buf</span><span class="ocaml-source">
</span></code></pre>
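<p>After extraction, we can observe that the variables the extracted region takes from its enclosing scope (<code>buf</code> and <code>bold_tag</code>) are now passed as arguments, and the extracted function is properly replaced by a call to the newly generated <code>show_markup</code> function.</p>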
<pre><code><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">show_markup</span><span class="ocaml-source"> </span><span class="ocaml-source">buf</span><span class="ocaml-source"> </span><span class="ocaml-source">bold_tag</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source">
</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">fun</span><span class="ocaml-source"> </span><span class="ocaml-source">markup</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-source">(</span><span class="ocaml-constant-language-capital-identifier">Buffer</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">add_string</span><span class="ocaml-source"> </span><span class="ocaml-source">buf</span><span class="ocaml-source">)</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-source">(</span><span class="ocaml-keyword-other">match</span><span class="ocaml-source"> </span><span class="ocaml-source">markup</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">with</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-keyword-other">|</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Text</span><span class="ocaml-source"> </span><span class="ocaml-source">txt</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-source">txt</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-keyword-other">|</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Bold</span><span class="ocaml-source"> </span><span class="ocaml-source">txt</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-source">bold_tag</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">^</span><span class="ocaml-source"> </span><span class="ocaml-source">txt</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">^</span><span class="ocaml-source"> </span><span class="ocaml-source">bold_tag</span><span class="ocaml-source">)</span><span class="ocaml-source">
</span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">show</span><span class="ocaml-source"> </span><span class="ocaml-source">doc</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">buf</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Buffer</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">create</span><span class="ocaml-source"> </span><span class="ocaml-constant-numeric-decimal-integer">101</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">in</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">bold_tag</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-string-quoted-double">**</span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">in</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-constant-language-capital-identifier">List</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">iter</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-source">show_markup</span><span class="ocaml-source"> </span><span class="ocaml-source">buf</span><span class="ocaml-source"> </span><span class="ocaml-source">bold_tag</span><span class="ocaml-source">)</span><span class="ocaml-source"> </span><span class="ocaml-source">doc</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-constant-language-capital-identifier">Buffer</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">contents</span><span class="ocaml-source"> </span><span class="ocaml-source">buf</span><span class="ocaml-source">
</span></code></pre>
<p>Here is an example of how it is used. <em>Impressive, isn't it</em>?</p>
<p align="center">
<img src="/blog/images/merlin-extract-5~eKmd53L9w5fkYgE7KoBFDA.gif" alt="Extract show_markup">
</p>
<h2>Editor Support</h2>
<p>To understand how this new Merlin command can be properly used in your favourite editor, we have to take a closer look at how the Language Server Protocol (LSP) works. LSP supports two mechanisms to extend the existing protocol with new features. First, there are <code>code actions</code>, which allow us to perform multiple LSP commands sequentially. This kind of request has the merit of working out of the box without requiring any plugin or specific command support on the editor side (which oils the wheels for maintenance). Secondly, there are <code>custom requests</code>, which are more powerful than code actions and enable custom interactivity. So, if you want to prompt the user, a custom request is the way to go. The price of this power is that client-side support must be implemented for each custom request in every editor plugin.</p>
<p>The editor team's current approach is as follows: for each Merlin command that doesn't map directly to a standard LSP request, we provide a code action associated with the command and, if the feature requires custom interactivity, a dedicated custom request. For the ‘extract’ feature, the associated code action does not let you choose the name of the generated let binding, but the custom request does.</p>
<h2>What’s Next?</h2>
<p>I hope this new command helps you get even more productive in OCaml! Don’t hesitate to experiment with it and report any bugs you encounter.</p>
<p>The development of Merlin’s refactoring tools was part of a broader vision to improve OCaml editor support and, perhaps, to offer an editor experience similar to that of JetBrains IDEs in the future!</p>
<p>The work done on the <code>extract</code> command gives us the opportunity to identify various problems pertaining to refactoring (<em>substitution</em>, <em>code generation</em>) and potentially to make the connection to refactoring commands that already exist in Merlin (like <code>open</code> refactoring and project-wide renaming). The next step is to add a small toolbox library in Merlin dedicated to refactoring in order to develop even more refactor actions. I hope this is just the first refactoring feature of a long series.</p>
<p>If you're curious and want to take a look at the feature, it's split into several PRs:</p>
<ul>
<li><a href="https://github.com/ocaml/merlin/pull/1948">ocaml/merlin#1948</a> which implements the extraction logic on the Merlin side and exposes it in the protocol,</li>
<li><a href="https://github.com/ocaml/ocaml-lsp/pull/1545">ocaml/ocaml-lsp#1545</a> which exposes the Custom Request enabling the use of the LSP-side functionality,</li>
<li><a href="https://github.com/ocaml/ocaml-lsp/pull/1546">ocaml/ocaml-lsp#1546</a> which exposes a code action that allows the functionality to be invoked without any additional setup on the editor side,</li>
<li><a href="https://github.com/tarides/ocaml-eglot/pull/65">tarides/ocaml-eglot#65</a> which implements extraction behaviour in OCaml-Eglot, invocable either from a type enclosing or directly as a classic Emacs command.</li>
</ul>
<p>All of these PRs are currently under review, and should be merged soon!</p>
<p>A big thanks to <a href="/blog/author/xavier-van-de-woestyne/">Xavier</a>, <a href="/blog/author/ulysse-gerard/">Ulysse</a>, and everyone who helped me during this internship. It was pretty interesting!</p>
<p>You can connect with Tarides on <a href="https://bsky.app/profile/tarides.com">Bluesky</a>, <a href="https://mastodon.social/@tarides">Mastodon</a>, <a href="https://www.threads.net/@taridesltd">Threads</a>, and <a href="https://www.linkedin.com/company/tarides">LinkedIn</a> or sign up to our mailing list to stay updated on our latest projects. We look forward to hearing from you!</p>
]]></description><link>https://tarides.com/blog/2025-08-20-internship-report-refactoring-tools-coming-to-merlin</link><guid isPermaLink="false">https://tarides.com/blog/2025-08-20-internship-report-refactoring-tools-coming-to-merlin.html</guid><dc:creator><![CDATA[ Timéo Arnouts ]]></dc:creator><pubDate>Wed, 20 Aug 2025 00:00:00 GMT</pubDate></item><item><title><![CDATA[Add Your Own Recipes to the OCaml Cookbook!]]></title><description><![CDATA[<p>Are you looking to learn something new about OCaml? Or do you want to contribute to the community in a new way? <a href="http://OCaml.org">OCaml.org</a> hosts the <a href="https://ocaml.org/cookbook">OCaml Cookbook</a>, a collection of projects that users can try out, as well as contribute new ones for others to enjoy. This post will introduce you to the concept, show you how to add new recipes, and hopefully leave you inspired to check it out for yourself!</p>
<h2>Why Do We Care About <a href="http://OCaml.org">OCaml.org</a>?</h2>
<p>Tarides supports the maintenance and development of OCaml.org, OCaml’s home on the web. Our engineers have spent significant time collaborating with all corners of the OCaml community to update the website, including improving the design, accessibility, documentation, and much more. We continue to fund projects that implement new features that the OCaml community wants and maintain others that they have come to rely on.</p>
<p>As an open-source, collaborative, and shared resource for the ecosystem, OCaml.org is truly a public commons. The OCaml Cookbook is just one example of its many features, which also include tutorials, documentation, news, and an OCaml playground. We encourage more contributors, from sponsors to maintainers, to join the many others in supporting this important resource.</p>
<h2>What is the OCaml Cookbook?</h2>
<p>The <a href="https://ocaml.org/cookbook">OCaml Cookbook</a> is a collection of ‘recipes’: instructions on how to complete tasks as part of projects using open-source libraries and tools. The same task can have multiple recipes, each with its own combination of resources. The result is a varied collection of projects that help users adopt new techniques, try new tools, and gain confidence in OCaml. The cookbook currently has recipes on compression, single-threaded concurrency, cryptography, and more!</p>
<p>OCaml is far from the only language to have a cookbook, and the team was inspired to create one by the popular <a href="https://rust-lang-nursery.github.io/rust-cookbook/">Rust Cookbook</a>, as well as Go’s <a href="https://gobyexample.com/">Go by Example</a> introduction to the language.</p>
<h2>How to Contribute</h2>
<p>The team behind the cookbook are always looking for new contributions, and creating a new recipe is straightforward if you follow the <a href="https://github.com/ocaml/ocaml.org/blob/main/CONTRIBUTING.md#content-cookbook">contributing instructions</a>.</p>
<p>To add a new recipe to the cookbook, you will need to start by finding the <a href="https://github.com/ocaml/ocaml.org/tree/main">OCaml.org</a> repo on GitHub. To add a recipe to an existing task, find the task in the <a href="https://github.com/ocaml/ocaml.org/blob/main/data/cookbook/tasks.yml"><code>data/cookbook/tasks.yml</code></a> section, go to the task’s folder inside <a href="https://github.com/ocaml/ocaml.org/tree/main/data/cookbook"><code>data/cookbook/</code></a> which will have the same name as the task’s slug, and create an <code>.ml</code> file with the recipe and a YAML header with metadata about the recipe.</p>
<p>If the recipe you want to add doesn’t match an existing task, you will need to create a new task first. To add a task you will need to make an entry in the <a href="https://github.com/ocaml/ocaml.org/blob/main/data/cookbook/tasks.yml"><code>data/cookbook/tasks.yml</code></a> file. When adding a new task in this file, the title, description, and slug are mandatory fields to fill in, and the task has to be located under a relevant category. You can even create new categories to organise entire groups of new tasks should you wish to do so.</p>
<p>Submitting a recipe will create a pull request, which the group of cookbook moderators will review and, if approved, merge into the website. When picking a recipe to contribute, you should bear the general guidelines in mind: choose a task that you think is relevant to a wide audience; write correct, clear code that compiles without errors; and check that the packages you’ve chosen and the code are ready for use in production. That’s it, you’re ready to publish!</p>
<h2>What Does a Recipe Look Like?</h2>
<p>Let’s take a quick look at a recipe in action. The <a href="https://ocaml.org/cookbook/salt-and-hash-password-with-argon2/hashargon2">Salt and Hash a Password with Argon2</a> recipe in the Cryptography section shows you how to use the <code>opam</code> package <code>argon2</code> to configure password hashing based on <a href="https://owasp.org/">OWASP</a> recommendations and Argon2 defaults. Be sure to check out the recipe on OCaml.org for the full context and nice formatting!</p>
<p>The recipe includes the code snippets for the configuration:</p>
<pre><code>let t_cost = 2 and
    m_cost = 65536 and
    parallelism = 1 and
    hash_len = 32 and
    salt_len = 10
</code></pre>
<p>The hash output length:</p>
<pre><code>let encoded_len =
  Argon2.encoded_len ~t_cost ~m_cost ~parallelism ~salt_len ~hash_len ~kind:ID
</code></pre>
<p>Generating a salt string:</p>
<pre><code>let gen_salt len =
  let rand_char _ = 65 + (Random.int 26) |&gt; char_of_int in
  String.init len rand_char
</code></pre>
<p>Returning an encoded hash string for the given password:</p>
<pre><code>let hash_password passwd =
  Result.map Argon2.ID.encoded_to_string
    (Argon2.ID.hash_encoded
        ~t_cost ~m_cost ~parallelism ~hash_len ~encoded_len
        ~pwd:passwd ~salt:(gen_salt salt_len))
</code></pre>
<p>And finally, verifying if the encoded hash string matches the given password:</p>
<pre><code>let verify encoded_hash pwd =
  match Argon2.verify ~encoded:encoded_hash ~pwd ~kind:ID with
  | Ok true_or_false -&gt; true_or_false
  | Error VERIFY_MISMATCH -&gt; false
  | Error e -&gt; raise (Failure (Argon2.ErrorCodes.message e))

let () =
  let hashed_pwd = Result.get_ok (hash_password "my insecure password") in
  Printf.printf "Hashed password: %s\n" hashed_pwd;
  let fst_attempt = "my secure password" in
  Printf.printf "'%s' is correct? %B\n" fst_attempt (verify hashed_pwd fst_attempt);
  let snd_attempt = "my insecure password" in
  Printf.printf "'%s' is correct? %B\n" snd_attempt (verify hashed_pwd snd_attempt)
</code></pre>
<h2>Contribute to the Cookbook!</h2>
<p>We invite you to take a look at the <a href="https://ocaml.org/cookbook">existing recipes</a> up on the website and bring your own contributions to the book. If you have questions or want input on a recipe, <a href="https://discuss.ocaml.org/">OCaml’s Discuss forum</a> is a great place to post to get tips and feedback.</p>
<p>Stay in touch with Tarides on <a href="https://bsky.app/profile/tarides.com">Bluesky</a>,
<a href="https://mastodon.social/@tarides">Mastodon</a>,
<a href="https://www.threads.net/@taridesltd">Threads</a>, and
<a href="https://www.linkedin.com/company/tarides">LinkedIn</a>. We look forward to
hearing from you!</p>
]]></description><link>https://tarides.com/blog/2025-07-25-add-your-own-recipes-to-the-ocaml-cookbook</link><guid isPermaLink="false">https://tarides.com/blog/2025-07-25-add-your-own-recipes-to-the-ocaml-cookbook.html</guid><dc:creator><![CDATA[ Isabella Leandersson ]]></dc:creator><pubDate>Fri, 25 Jul 2025 00:00:00 GMT</pubDate></item><item><title><![CDATA[Introducing Jane Street's OxCaml Branch!]]></title><description><![CDATA[<p><a href="https://www.janestreet.com">Jane Street</a> is a well-known OCaml powerhouse. They have a reputation for expertise and a long history of supporting the open-source community. Jane Street have been developing experimental features on a branch of OCaml, using them in production internally, and preparing them to be shared with the rest of the ecosystem. These extensions are now bundled and distributed together under the name <a href="https://oxcaml.org/">OxCaml</a>.</p>
<p>We are always excited about projects that bring new features to OCaml and improve it for its users. This post will give you an overview of the different features being developed under the OxCaml umbrella and how Tarides is collaborating with Jane Street on the project.</p>
<h2>What is OxCaml</h2>
<p>OxCaml is an open-source branch of OCaml that incorporates several extensions designed to help write multicore and high-performance OCaml code.</p>
<p>Jane Street relies on OCaml’s sweet spot. OCaml lets you write code quickly, code that is performant, code that solves complex problems, and code that can be trusted, thanks to its strong focus on <a href="/blog/2023-12-14-ocaml-memory-safety-and-beyond/">correctness-by-construction</a>. However, like any language, OCaml has its limits. When writing low-latency code, you'd like the garbage collector not to kick in at inopportune moments. While this may seem like a niche use case, it is a niche that matters to Jane Street.</p>
<p>With OCaml 5, OCaml code can exploit <a href="/blog/2022-12-19-ocaml-5-with-multicore-support-is-here/">shared-memory parallelism</a>. However, shared-memory parallel programming can also introduce <a href="/blog/2024-01-17-what-are-data-races-and-do-they-threaten-your-business/">data races</a>. Because noticing and debugging data races is difficult, code review alone isn’t sufficient to trust multicore code in mission-critical contexts. OxCaml extensions have three main goals:</p>
<ol>
<li>Enable writing correct-by-construction multicore code;</li>
<li>Enable writing low-latency code in OCaml instead of a GC-free language;</li>
<li>Raise the overall level of performance.</li>
</ol>
<p>Last but not least, all those goals should be achieved without moving OCaml away from its sweet spot.</p>
<p>Furthermore, according to <a href="https://oxcaml.org">its website</a>, OxCaml’s primary design goals are to be “safe, convenient, [and] predictable”. Safety makes developers more productive and ensures they ship correct code. Convenience means using OCaml’s type system and type inference to provide adequate choice and control without adding complexity. Predictability involves retaining the aspects of OCaml that make it easy for developers to understand how their code will perform simply by looking at it, which requires keeping performance details explicit at the type level.</p>
<p>The OxCaml branch is open-source and welcomes new users. However, the extensions are experimental and not guaranteed to be stable or backwards compatible. Your feedback is needed to fine-tune the experience and improve the various new features. You can provide feedback by making issues on the <a href="https://github.com/oxcaml/oxcaml">OxCaml GitHub repo</a> or discuss it in the <code>#oxcaml</code> channel in the <a href="https://discord.gg/GTFyEupD8Q">OCaml community discord server</a>.</p>
<p>Open-source tools and libraries provided by Jane Street now come in two flavours. The default targets classic OCaml. The <code>with-extensions</code> branch targets OxCaml. For instance, Jane Street's successful Base library has a <a href="https://github.com/janestreet/base/tree/with-extensions"><code>with-extensions</code> branch</a>. Jane Street also provides an opam repository with <a href="https://github.com/oxcaml/opam-repository/">OxCaml compatibility patched versions of the packages</a>.</p>
<h2>How Tarides is Helping</h2>
<p>Tarides takes part in the processes around OxCaml in two ways: by providing platform support for working with and distributing OxCaml code, and by helping to upstream some of its features into mainline OCaml.</p>
<p>Tarides has adapted OxCaml features to the official compiler codebase and taken part in design discussions with the OCaml maintainers to ensure that the features integrate well into the existing compiler. Sometimes, the upstream compiler will have features that OxCaml doesn’t have yet, which overlap with OxCaml’s extensions, so care is needed to handle everything as seamlessly as possible and in a backwards-compatible way.</p>
<p>The OCaml 5.4 features <a href="https://github.com/ocaml/ocaml/pull/13498">labelled tuples</a> and <a href="https://github.com/ocaml/ocaml/pull/13097">immutable arrays</a> are two recent examples which have been upstreamed from OxCaml with Tarides’s assistance. For future releases, Tarides will be part of the design discussions for upstreaming new features, like "include functor", polymorphic parameters, module strengthening, and possibly more.</p>
<h2>Experimental Extensions</h2>
<p>Let’s take a look at the different extensions that are part of OxCaml:</p>
<ul>
<li><strong><a href="https://github.com/oxcaml/oxcaml/blob/main/jane/doc/extensions/_04-modes/intro.md">Modes</a>:</strong> Modes are deep properties of values that are tracked by the OxCaml compiler. They are similar to, but distinct from, types. While types describe what a value <em>is</em>, modes describe <em>how</em> it can be used. Each value in OxCaml has a mode in addition to its type. OxCaml permits type signatures to be decorated with modes, restricting how the corresponding values may be used. As an example, the <a href="https://oxcaml.org/documentation/modes/intro/#uniqueness-linearity">linear <em>modality</em></a> on a function says that the function may be invoked at most once. This property is distinct from the type of the function.</li>
<li><strong><a href="https://github.com/ocaml-flambda/flambda-backend/blob/main/jane/doc/extensions/_01-stack-allocation/intro.md">Stack Allocations</a>:</strong> With OxCaml, more values can be allocated on the stack instead of on the heap, which improves performance by reusing cache lines and reducing the cache footprint. The compiler uses a value’s locality to determine whether it is local and, therefore, can be allocated on the stack or global and needs to go on the heap.</li>
<li><strong><a href="https://github.com/ocaml-flambda/flambda-backend/blob/main/jane/doc/extensions/_02-unboxed-types/intro.md">Unboxed Types</a>:</strong> This extension gives users more options on how their data is represented in memory and registers. Unboxed types introduce the concept of <em>layouts</em>. Every type has a layout, with a number of base layouts available to the type system.</li>
<li><strong><a href="https://github.com/oxcaml/oxcaml/blob/main/jane/doc/extensions/_03-parallelism/intro.md">Data-Race-Free Parallelism</a>:</strong> There are a number of new features centred on OCaml’s support for multiple domains, including extending the mode system to track the concurrent use of values to improve safety during concurrency and introducing higher-level parallelism primitives.</li>
<li><strong><a href="https://github.com/ocaml-flambda/flambda-backend/blob/main/jane/doc/extensions/_05-kinds/intro.md">Kinds</a>:</strong> This extension adds a new layer to the type system by giving “types” to types: a kind is a type's “type”. Two words are used to avoid confusion, as kinds don't have allocated values at runtime; only types do. Kinds have several components, among them the layout, which describes the shape of the data at runtime, and modal bounds, which limit the modes that a value of the type may be assigned.</li>
<li><strong><a href="https://github.com/ocaml-flambda/flambda-backend/blob/main/jane/doc/extensions/_06-uniqueness/intro.md">Uniqueness</a>:</strong> A mode that designates values that should only have a single reference pointing to them. By guaranteeing that only one reference will be consumed, the mode prevents bugs like the use-after-free segfault. There’s a <a href="https://blog.janestreet.com/oxidizing-ocaml-ownership/">Jane Street blog post</a>, as well as a <a href="https://kcsrk.info/ocaml/modes/oxcaml/2025/05/29/uniqueness_and_behavioural_types/">post by KC Sivaramakrishnan</a> about using Uniqueness and its features.</li>
<li><strong><a href="https://github.com/oxcaml/oxcaml/blob/main/jane/doc/extensions/_08-comprehensions/01-intro.md">Comprehensions</a>:</strong> This extension introduces ‘comprehensions’, which is a syntactic form that uses mathematical set-builder notation to build lists and arrays.</li>
<li><strong><a href="https://github.com/oxcaml/oxcaml/tree/main/jane/doc/extensions/_11-miscellaneous-extensions">Other Extensions</a>:</strong> There are several more smaller extensions as part of OxCaml. These include immutable arrays, labelled tuples, polymorphic parameters, and more!</li>
</ul>
<h2>Looking to the Future</h2>
<p>In light of the public release, we encourage OCaml users to <a href="https://oxcaml.org/get-oxcaml/">try OxCaml</a> and share their feedback. The best place to discuss OxCaml is the <code>#oxcaml</code> channel on <a href="https://discord.gg/GTFyEupD8Q">the OCaml community discord server</a>. Engineers from Jane Street and Tarides working on OxCaml regularly hang out in the channel and would be delighted to hear feedback.</p>
<p>The explicit goal of OxCaml is to iterate over these features and eventually upstream them to OCaml. Tarides has experience shepherding and successfully upstreaming Multicore OCaml, which was a multi-year effort to bring in native support for concurrency and parallelism to OCaml. Just as with Multicore OCaml, Tarides' goal is to make it easy for the community to work with OxCaml, gather feedback, iterate on the design with Jane Street engineers and help upstream the features to OCaml over the coming years. If we are successful with the upstreaming efforts, we believe that OCaml will fill an important gap in the design space for programming languages. We would like you to be part of this effort!</p>
<h2>Stay in Touch</h2>
<p>You can connect with us on <a href="https://bsky.app/profile/tarides.com">Bluesky</a>, <a href="https://mastodon.social/@tarides">Mastodon</a>, <a href="https://www.threads.net/@taridesltd">Threads</a>, and <a href="https://www.linkedin.com/company/tarides">LinkedIn</a> or sign up for our mailing list to stay updated on our latest projects. We look forward to hearing from you!</p>
]]></description><link>https://tarides.com/blog/2025-07-09-introducing-jane-street-s-oxcaml-branch</link><guid isPermaLink="false">https://tarides.com/blog/2025-07-09-introducing-jane-street-s-oxcaml-branch.html</guid><dc:creator><![CDATA[ KC Sivaramakrishnan, Cuihtlauac Alvarado, Isabella Leandersson ]]></dc:creator><pubDate>Wed, 09 Jul 2025 00:00:00 GMT</pubDate></item><item><title><![CDATA[Improving Memory Profiler Visualisations for OCaml]]></title><description><![CDATA[<p>Each year, Tarides has the pleasure of hosting several interns who work across different areas within the company. This year, we welcomed Kashish, who joined us to work on enhancing the visualisations for OCaml's memory profiling (you can check out our blog post about the return of <a href="/blog/2025-03-06-feature-parity-series-statmemprof-returns/">Statmemprof</a> for some more context). This post will explore Kashish's project and the steps we took to improve the visualisations available with OCaml's memory profiling tools.</p>
<h2>Background: Improving the OCaml 5 Memory Profiler</h2>
<p>First, a little background on the area. In OCaml, we have support for statistical memory profiling built into the runtime, called <strong>statmemprof</strong>.  The basic idea is the OCaml runtime provides an interface for registering callbacks to be called when interesting Garbage Collection events occur, such that we can track memory allocation activity for some statistical sample of allocations in an OCaml program. <a href="https://github.com/ocaml/ocaml/pull/12923">PR #12923</a> has more technical details of the implementation.</p>
<p>Built on top of this is <a href="https://github.com/janestreet/memtrace/">memtrace</a>, a library that uses the <strong>statmemprof</strong> interface to produce trace files formatted in the Common Trace Format (CTF). Memtrace has a <a href="https://github.com/janestreet/memtrace/blob/master/docs/internal.md">detailed technical description</a> of how it works. Finally, there is a web app, <a href="https://github.com/janestreet/memtrace_viewer">memtrace_viewer</a>, that displays information about memory allocations using a 'FlameGraph' format to visualise the allocations. Below is a sample of what such a trace might look like.</p>
<p><img sizes="(min-width: 1360px) 1360px, (min-width: 680px) 680px, 100vw" srcset="/blog/images/2025-05-26.memtrace-internship/memtrace-viewer-170w~4oRYdhhNZp8ZrL8kI79j6g.webp 170w, /blog/images/2025-05-26.memtrace-internship/memtrace-viewer-340w~2Mi7QL9cTT35CpNLJnaXgQ.webp 340w, /blog/images/2025-05-26.memtrace-internship/memtrace-viewer-680w~1Qbmi5ujo2_RCvLlKp1i2Q.webp 680w, /blog/images/2025-05-26.memtrace-internship/memtrace-viewer-1360w~0vRDcnfMCo67E2WX6yBDGg.webp 1360w" src="/blog/images/2025-05-26.memtrace-internship/memtrace-viewer-1360w~0vRDcnfMCo67E2WX6yBDGg.webp" alt=""></p>
<p>For Kashish's internship we thought about how to support other kinds of visualisations for memory profiling data. For example, Go uses a directed graph visualisation in <a href="https://go.dev/blog/pprof">pprof</a> that would be a <a href="https://signalsandthreads.com/performance-engineering-on-hard-mode/">good alternative to FlameGraphs</a>. FlameGraphs are excellent for visualising data. However, they cannot show join points, which are points where stack traces start differently and then reach the same important function. A graph representation highlights these points in a way that FlameGraphs cannot.</p>
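<p>To make the idea of a join point concrete, here is a minimal sketch in plain OCaml. The stacks, function names, and byte counts are invented for illustration and do not come from a real trace. A FlameGraph keys each box by its full call path, so a function reached through two different callers shows up twice, whereas a graph keys each node by function, so the two paths merge into one node.</p>
<pre><code>(* Hypothetical samples: a call stack (outermost frame first) and the
   bytes allocated by its innermost frame. *)
let samples =
  [ ([ "main"; "parse"; "compare" ], 10);
    ([ "main"; "solve"; "compare" ], 30) ]

(* Add [n] to the running total stored under [key]. *)
let bump tbl key n =
  let old = Option.value (Hashtbl.find_opt tbl key) ~default:0 in
  Hashtbl.replace tbl key (old + n)

(* FlameGraph view: one entry per distinct call path. *)
let by_path =
  let tbl = Hashtbl.create 16 in
  List.iter (fun (stack, n) -&gt; bump tbl (String.concat ";" stack) n) samples;
  tbl

(* Graph view: one entry per function, whatever path led to it. *)
let by_function =
  let tbl = Hashtbl.create 16 in
  List.iter
    (fun (stack, n) -&gt;
      match List.rev stack with
      | leaf :: _ -&gt; bump tbl leaf n
      | [] -&gt; ())
    samples;
  tbl

let () =
  Printf.printf "paths ending in compare: %d and %d bytes\n"
    (Hashtbl.find by_path "main;parse;compare")
    (Hashtbl.find by_path "main;solve;compare");
  Printf.printf "compare as a single graph node: %d bytes\n"
    (Hashtbl.find by_function "compare")
</code></pre>
<p>In the path-keyed table, <code>compare</code> is split into a 10-byte and a 30-byte entry; in the function-keyed table it is a single 40-byte node, which is what makes it stand out in a graph view.</p>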
<h2>File Formats</h2>
<p>A picture is worth a thousand words, and this is never more appropriate than when trying to understand a bunch of numbers collected from a complex system like a garbage collector (GC). We had two realisations in approaching this problem: first, that we could reuse the work done by others in visualising this kind of data, and second, that there are common tracing formats already used by other languages that we could reuse to unlock more visualisation options. For example, Memtrace used a CTF file format that looked like it could be converted to the protocol buffers (protobuf) based format used by <code>pprof</code>.</p>
<p>The first step towards converting formats was to understand the protobuf-based format. Looking at both the <a href="https://github.com/google/pprof/tree/main/proto/README.md">README.md</a> and the <a href="https://github.com/google/pprof/blob/main/proto/profile.proto"><code>profile.proto</code></a> file gave an initial idea of the data types we needed. The pprof format is divided into three main parts: a <em>Profile</em> with general information, <em>Samples</em> recording values encountered in the execution of some program, and <em>Locations</em> identifying places within a program where samples are generated. Overall, the format is quite flexible and covers things like time profiling as well as memory profiling, which we're interested in.</p>
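<p>As a rough mental model, the three parts can be pictured with the OCaml records below. This is a hand-written simplification for illustration only, not the code generated by <code>ocaml-protoc</code>: the record and field names are our own, and the real format has more messages and fields (for example, a location refers to a separate function message rather than carrying a name itself). What the sketch does keep from <code>profile.proto</code> is that strings are stored once in a string table and referred to by index, that the first entry of that table is the empty string, and that a sample lists its locations leaf first.</p>
<pre><code>(* Simplified, hand-written model of a pprof profile. *)
type location = {
  id : int;
  name : int; (* index into [string_table] *)
  line : int;
}

type sample = {
  location_ids : int list; (* call stack, leaf frame first *)
  values : int list;       (* one value per entry in [sample_types] *)
}

type profile = {
  sample_types : (int * int) list; (* (type, unit) as string indices *)
  samples : sample list;
  locations : location list;
  string_table : string array;     (* entry 0 is always "" *)
}

(* A made-up profile with one 4096-byte allocation sample. *)
let example =
  { string_table = [| ""; "alloc_space"; "bytes"; "main"; "compare" |];
    sample_types = [ (1, 2) ];
    locations =
      [ { id = 1; name = 3; line = 12 }; { id = 2; name = 4; line = 40 } ];
    samples = [ { location_ids = [ 2; 1 ]; values = [ 4096 ] } ] }

(* Resolve a sample's stack to function names via the string table. *)
let stack_names p s =
  List.map
    (fun id -&gt;
      let loc = List.find (fun l -&gt; l.id = id) p.locations in
      p.string_table.(loc.name))
    s.location_ids

let () =
  List.iter
    (fun s -&gt; print_endline (String.concat " &lt;- " (stack_names example s)))
    example.samples
</code></pre>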
<p>Kashish looked at the pprof files generated from Go and Rust when doing memory profiling to see how they formatted their <em>Samples</em>. With this information, she started writing a tool to convert CTF traces into pprof traces. Starting with the <code>profile.proto</code>, she used <a href="https://ocaml.org/p/pbrt/latest">pbrt</a> and <code>ocaml-protoc</code> to generate the code for reading and writing the protobuf format; she then worked through the details of converting between the two formats. The end result is a CLI tool for converting CTF files.</p>
<p>These files can be visualised using Go's pprof tool by running <code>pprof -http localhost:8080 &lt;proto_file&gt;</code>. For example:</p>
<p><img sizes="(min-width: 1360px) 1360px, (min-width: 680px) 680px, 100vw" srcset="/blog/images/2025-05-26.memtrace-internship/solver-graph-170w~PB5MLAyrip9ssVaupcvnDQ.webp 170w, /blog/images/2025-05-26.memtrace-internship/solver-graph-340w~Ljn2RvoGmKy03BXCIh27YA.webp 340w, /blog/images/2025-05-26.memtrace-internship/solver-graph-680w~5iaiWeCR2dl2alMmlmeLbw.webp 680w, /blog/images/2025-05-26.memtrace-internship/solver-graph-1360w~duyxE4WBxcD0A8A4g5blIA.webp 1360w" src="/blog/images/2025-05-26.memtrace-internship/solver-graph-1360w~duyxE4WBxcD0A8A4g5blIA.webp" alt=""></p>
<p>The picture is zoomed in on the <code>opamVersionCompare</code> function, which represents 33% of the allocations in this solver-service program.</p>
<p>From the top left menu, you can choose <em>Sample</em> to view the graph by the ‘number of samples’, i.e., the number of times a particular stack trace occurs in our profile, or by ‘alloc size’, i.e., the amount of memory allocated by each stack frame.</p>
<p>You can also use <em>Peek</em> to see a breakdown of allocations sorted by space allocated or objects allocated. In GC parlance, an <em>object</em> is a dynamically allocated piece of memory that contains an OCaml value. This visualisation can highlight the top allocating locations in an OCaml program. This first image shows the top allocating functions based on the memory size of their allocations and highlights code in OpamFormat parsing that is worth investigating.</p>
<p><img sizes="(min-width: 1360px) 1360px, (min-width: 680px) 680px, 100vw" srcset="/blog/images/2025-05-26.memtrace-internship/solver-peek-alloc-space-170w~02PHSvfUCNZYAgdjawzCCA.webp 170w, /blog/images/2025-05-26.memtrace-internship/solver-peek-alloc-space-340w~aPgcYTRRdxrq6ZHSHBAr1Q.webp 340w, /blog/images/2025-05-26.memtrace-internship/solver-peek-alloc-space-680w~qIwtjKtypAd5NSpFwYmf7g.webp 680w, /blog/images/2025-05-26.memtrace-internship/solver-peek-alloc-space-1360w~RePruauCqce_0d520F6OpA.webp 1360w" src="/blog/images/2025-05-26.memtrace-internship/solver-peek-alloc-space-1360w~RePruauCqce_0d520F6OpA.webp" alt=""></p>
<p>This second image shows the top object allocating functions unrelated to the amount of memory being allocated, which can highlight places that allocate many small pieces of memory and often go unnoticed. Note this again highlights the <code>opamVersionCompare</code> function.
<img sizes="(min-width: 1360px) 1360px, (min-width: 680px) 680px, 100vw" srcset="/blog/images/2025-05-26.memtrace-internship/solver-peek-alloc-objects-170w~XHJuSKgs3n4TPV3yJqUL8Q.webp 170w, /blog/images/2025-05-26.memtrace-internship/solver-peek-alloc-objects-340w~HvdoWHAVMc5IT66_6A2PKA.webp 340w, /blog/images/2025-05-26.memtrace-internship/solver-peek-alloc-objects-680w~EidLdlx8Hi1TgotMQ6i1jw.webp 680w, /blog/images/2025-05-26.memtrace-internship/solver-peek-alloc-objects-1360w~qmwHt9IkEg3EZT6kVe7kGQ.webp 1360w" src="/blog/images/2025-05-26.memtrace-internship/solver-peek-alloc-objects-1360w~qmwHt9IkEg3EZT6kVe7kGQ.webp" alt=""></p>
<h2>A Library for Mappings: Blind Alleys</h2>
<p>The <code>pprof</code> format contains a field called <em>Mappings</em>, which uses information about the executable binary and the virtual addresses of functions or <em>Locations</em> in our stack traces. We thought we needed to get the virtual memory address for symbols to include as part of the stack traces in the profile. This would allow users to map addresses back to locations in the source code via the addresses in the executable.</p>
<p>However, these virtual addresses are not known until runtime, so we needed to get the virtual memory maps for the running process. On Linux, this information was available in the proc filesystem as <code>/proc/&lt;pid&gt;/maps</code> and simply required parsing out the information into a usable type. Then, we mapped the virtual addresses to a segment and back to locations in the original binary.</p>
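<p>The Linux side of this can be sketched with the standard library alone. The code below is an illustrative reconstruction, not the code from the project: the <code>mapping</code> record, <code>parse_line</code>, and <code>file_offset</code> are hypothetical names, and the sample line is made up. It assumes the usual layout of a line in <code>/proc/&lt;pid&gt;/maps</code> (address range, permissions, offset, device, inode, optional path).</p>
<pre><code>type mapping = {
  start_addr : int64;
  end_addr : int64; (* exclusive *)
  perms : string;
  offset : int64;   (* offset of this segment in the mapped file *)
  path : string option;
}

(* Parse a hexadecimal field that has no "0x" prefix. *)
let hex s = Int64.of_string_opt ("0x" ^ s)

(* Parse one line of /proc/&lt;pid&gt;/maps; runs of spaces in a path are
   collapsed, which is good enough for a sketch. *)
let parse_line line =
  let fields =
    List.filter (fun s -&gt; s &lt;&gt; "") (String.split_on_char ' ' line)
  in
  match fields with
  | range :: perms :: offset :: _dev :: _inode :: rest -&gt; (
      match String.split_on_char '-' range with
      | [ lo; hi ] -&gt; (
          match (hex lo, hex hi, hex offset) with
          | Some start_addr, Some end_addr, Some offset -&gt;
              let path =
                if rest = [] then None else Some (String.concat " " rest)
              in
              Some { start_addr; end_addr; perms; offset; path }
          | _ -&gt; None)
      | _ -&gt; None)
  | _ -&gt; None

(* Translate a runtime address into an offset in the file backing [m]. *)
let file_offset m addr =
  if Int64.unsigned_compare addr m.start_addr &gt;= 0
     &amp;&amp; Int64.unsigned_compare addr m.end_addr &lt; 0
  then Some Int64.(add (sub addr m.start_addr) m.offset)
  else None

let () =
  let line =
    "00400000-00452000 r-xp 00001000 08:02 173521      /usr/bin/example"
  in
  match parse_line line with
  | Some m -&gt; (
      match file_offset m 0x00400010L with
      | Some off -&gt; Printf.printf "file offset: 0x%Lx\n" off
      | None -&gt; print_endline "address not in mapping")
  | None -&gt; print_endline "could not parse line"
</code></pre>
<p>Addresses are compared as unsigned 64-bit values because mappings near the top of the address space do not fit in a signed <code>int64</code>.</p>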
<p>On macOS, things are much more exciting and tricky. On the surface, macOS is a Unix operating system based on a FreeBSD userspace; however, under this facade is another operating system called <a href="https://en.m.wikipedia.org/wiki/Mach_(kernel)">Mach</a>, which is actually responsible for many aspects of the system, including how virtual memory maps for a process are represented. So, what we needed to do was write some low-level OCaml using <code>ctypes</code> to call the right C functions to retrieve the information we needed. How that works deserves its own blog post, but you can read the code at <a href="https://github.com/tmcgilchrist/mach/">tmcgilchrist/mach</a>.</p>
<p>Later on, we realised that pprof traces could work in two ways: with memory addresses for compiled languages like C++ or Go, or with symbolised locations for interpreted or JIT'd languages like Java or Python. The documentation calls these <em>Unsymbolized</em> profiles and <em>Symbolized</em> profiles respectively. OCaml will produce <em>Symbolized</em> profiles by default as Statmemprof supplies the location information for symbols. In future, <em>Unsymbolized</em> profiles could be supported, and even information like demangled names and source file locations could be included.</p>
<h2>Writing <code>pprof</code> Directly</h2>
<p>With the conversion code written and confident in our understanding of the file format, the next task was to write protobuf files directly from memtrace using the callback API provided by <code>Gc.Memprof.start</code> and creating a <code>Gc.Memprof.tracker</code> record. The resulting code was similar to the conversion code; however, there are some interesting points of difference for protobuf traces.</p>
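<p>For readers who have not used this API, the sketch below shows its general shape using only the standard library. It is not the writer from this project: it merely counts samples instead of emitting protobuf, and the sampling rate, callstack size, and workload are arbitrary choices for illustration. It assumes a compiler where <code>Gc.Memprof</code> is available (OCaml 4.14, or OCaml 5 once statmemprof was restored).</p>
<pre><code>(* Running totals of samples seen on each heap. *)
let minor_samples = ref 0
let major_samples = ref 0

(* Start from the no-op tracker and override the allocation callbacks.
   Returning None tells the runtime not to track the block any further. *)
let tracker : (unit, unit) Gc.Memprof.tracker =
  { Gc.Memprof.null_tracker with
    alloc_minor =
      (fun info -&gt;
        minor_samples := !minor_samples + info.Gc.Memprof.n_samples;
        None);
    alloc_major =
      (fun info -&gt;
        major_samples := !major_samples + info.Gc.Memprof.n_samples;
        None) }

let () =
  (* [ignore] keeps this portable: the return type of [start] differs
     between compiler versions. *)
  ignore (Gc.Memprof.start ~sampling_rate:0.01 ~callstack_size:16 tracker);
  (* An arbitrary allocating workload. *)
  let data = List.init 100_000 (fun i -&gt; (i, float_of_int i)) in
  ignore (Sys.opaque_identity data);
  Gc.Memprof.stop ();
  Printf.printf "samples: %d on the minor heap, %d on the major heap\n"
    !minor_samples !major_samples
</code></pre>
<p>A real writer would use the <code>callstack</code> field of each sampled allocation to build the locations and samples of the trace, rather than just incrementing counters.</p>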
<p>Protobuf traces tend to be larger than the equivalent CTF traces. This is because memtrace <a href="https://github.com/janestreet/memtrace/blob/master/docs/internal.md">optimises</a> the data written to a trace. The most important optimisation is the way it stores callstacks, as they are the single biggest piece of information stored. Consecutive backtraces usually only differ by the last few entries, so instead of storing the entire callstack each time, they store a “common prefix”, i.e., the number of entries that are the same as the callstack of the previous sample. Then, the reader can obtain the entire callstack by combining the “common prefix” entries with the new, unique entries.</p>
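<p>The common-prefix idea can be sketched in a few lines of OCaml. This is an illustration of the technique only, not memtrace's actual encoder (which works on a binary format): the function names are ours, frames are plain strings, and stacks are written outermost frame first so that the shared part sits at the front.</p>
<pre><code>(* Number of leading frames shared by two call stacks. *)
let common_prefix_len a b =
  let n = min (Array.length a) (Array.length b) in
  let rec go i = if i &lt; n &amp;&amp; a.(i) = b.(i) then go (i + 1) else i in
  go 0

(* Replace each stack by (frames shared with the previous stack, new frames). *)
let encode stacks =
  let step (prev, acc) stack =
    let k = common_prefix_len prev stack in
    (stack, (k, Array.sub stack k (Array.length stack - k)) :: acc)
  in
  List.rev (snd (List.fold_left step ([||], []) stacks))

(* Rebuild the full stacks from the encoded form. *)
let decode encoded =
  let step (prev, acc) (k, suffix) =
    let stack = Array.append (Array.sub prev 0 k) suffix in
    (stack, stack :: acc)
  in
  List.rev (snd (List.fold_left step ([||], []) encoded))

let () =
  let stacks =
    [ [| "main"; "solve"; "compare" |];
      [| "main"; "solve"; "parse" |];
      [| "main"; "print" |] ]
  in
  let encoded = encode stacks in
  List.iter
    (fun (k, suffix) -&gt;
      Printf.printf "%d shared, new: %s\n" k
        (String.concat ";" (Array.to_list suffix)))
    encoded;
  (* Decoding gives back exactly the original stacks. *)
  assert (decode encoded = stacks)
</code></pre>
<p>Only the first stack is stored in full; the second and third store one new frame each, which is where the space saving comes from on deep, similar stacks.</p>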
<p>With our restriction to use <em>pprof</em> for visualisations, we needed to use the pprof format as defined, and the pprof format does not support common prefixes. This meant that we could not implement common prefixes in our writer. In the future, it could be possible to write our own <code>protobuf</code> decoder that supports this feature.</p>
<p>To produce smaller trace files, <code>pprof</code> compresses its <code>protobuf</code> files using <code>gzip</code>, which significantly reduces their file size, making them much smaller than CTF files. To similarly reduce memory overhead in the <code>protobuf</code> writer, one option is to integrate an OCaml compression library such as <a href="https://github.com/xavierleroy/camlzip">camlzip</a> to compress the output on the fly as data is written. Naturally, this introduces a trade-off: lower memory usage at the cost of increased CPU time. For example, when profiling a sample program, the CTF file is 25 MB versus 421 MB for protobuf, which reduces to 8.3 MB when gzipped.</p>
<p>The Go and OCaml Garbage Collectors differ in important ways that impact the information collected in trace files. Since the premise of this work is reusing the Go tooling and visualisations, it is useful to understand what kind of Garbage Collector Go uses.</p>
<p>Go uses a Tracing Garbage Collector with the following properties:</p>
<ul>
<li>Hybrid stop-the-world/concurrent collector</li>
<li>Stop-the-world limited by a deadline (10 ms)</li>
<li>Concurrent collector running in parallel on CPU cores</li>
<li>Tri-colour mark-and-sweep algorithm</li>
<li>Non-generational</li>
<li>Non-compacting</li>
</ul>
<p>Of these, the most important property is that Go's GC is not generational, whereas OCaml's is. Memtrace tracks deallocation events and promotion events between the generations. Currently, we do not record these events in pprof traces, as pprof was built for Go programs and doesn't handle this information. The next obvious step would be writing out these events and building the visualisations to handle them.</p>
<h2>Collateral Fixes</h2>
<p>In the process of writing tests for the conversion tool, Kashish also discovered some failing tests in the OCaml 5 version of Memtrace caused by the way the <code>Gc.Memprof</code> API interacts with threads and the new domains introduced in OCaml 5. They were fixed <a href="https://github.com/tmcgilchrist/memtrace/pull/2">in this PR</a> and will be included in the OCaml 5.3 support <a href="https://github.com/janestreet/memtrace/pull/22">PR</a>.</p>
<h2>Until Next Time</h2>
<p>The goal of the internship was to improve visualisation options for memory profiling in OCaml by investigating different profiling file formats and then building tooling to generate, or convert to, these formats. Pprof is a protocol buffers-based format used by Go, Rust, and Java to capture profiling information. Kashish built tooling for converting CTF trace files to protobuf format and writing protobuf format directly from memtrace, both of which allow users to visualise memory profiles using a directed graph format originally used by pprof.</p>
<p>In future, it would be interesting for the OCaml community to build on this work by:</p>
<ul>
<li>Extending protobuf format to record all OCaml GC events.</li>
<li>Updating <em>memtrace viewer</em> to consume pprof format directly.</li>
<li>Producing <em>Unsymbolized</em> profiles from stripped binaries (i.e. without symbol information and just addresses), and reconstituting symbolised information afterwards, similar to C++ or Go.</li>
<li>Supporting encoding and decoding common prefix stack traces.</li>
<li>Adding different kinds of visualisations like treemaps, force-directed graphs, or circle-packing layouts.</li>
</ul>
<p>We would welcome contributions in these areas. Get in touch if anything there looks interesting or useful.</p>
<p>You can connect with us on <a href="https://bsky.app/profile/tarides.com">Bluesky</a>, <a href="https://mastodon.social/@tarides">Mastodon</a>, <a href="https://www.threads.net/@taridesltd">Threads</a>, and <a href="https://www.linkedin.com/company/tarides">LinkedIn</a> or sign up for our mailing list to stay updated on our latest projects. We look forward to hearing from you!</p>
]]></description><link>https://tarides.com/blog/2025-07-04-improving-memory-profiler-visualisations-for-ocaml</link><guid isPermaLink="false">https://tarides.com/blog/2025-07-04-improving-memory-profiler-visualisations-for-ocaml.html</guid><dc:creator><![CDATA[ Kashish Raimalani, Tim McGilchrist, Isabella Leandersson ]]></dc:creator><pubDate>Fri, 04 Jul 2025 00:00:00 GMT</pubDate></item><item><title><![CDATA[ Feature Parity Series: Improving Developer Tooling on macOS]]></title><description><![CDATA[<p>When considering which projects to focus on, our highest priority tends to be those that restore support for tools users rely on or introduce new tools that address a compelling problem. One of those tools is the <a href="https://lldb.llvm.org/">LLDB</a> debugger, which needed some attention after the OCaml 5 update.</p>
<p>LLDB is the primary supported debugger for macOS and comes included with <a href="https://developer.apple.com/xcode/">Xcode</a> as part of Apple's developer tools. It supports both the ARM64 and AMD64 platforms, which OCaml also supports. Ensuring a smooth macOS experience is crucial, as it enables development on the Apple hardware used in the community and by Tarides engineers. This post provides an overview of the work done to enhance the macOS debugging experience for OCaml developers.</p>
<h2>LLDB and Our Goals</h2>
<p>Let's begin with some context about the technology we're discussing today. Debuggers are tools used to trace, manipulate, and visualise the state of a target program running on a target system. Developers use debuggers for tasks like tracing a program's control flow, inspecting the values of variables during execution, halting the program at predetermined locations, and executing functions within the running process.</p>
<p>Why focus on LLDB? It is a well-maintained project that supports a wide range of platforms we target, including macOS, iOS, FreeBSD, Windows, and Linux. Most significantly, for OCaml, LLDB is the only supported choice for ARM64 macOS (an important and popular developer platform) and comes included with Xcode. This means that providing a debugging experience on macOS requires LLDB. Unfortunately, GDB, another well-known open-source debugger, is unavailable for ARM64 macOS.</p>
<p>Based on our usage, we recognised that LLDB needed some attention and initially raised issue <a href="https://github.com/ocaml/ocaml/issues/12933">#12933</a>, highlighting that setting breakpoints within LLDB was broken. Further investigation revealed other problems, such as printing backtraces producing incorrect results. Additionally, we saw the opportunity to integrate GDB features, such as printing OCaml values and running debugger tests within OCaml's test suite.</p>
<p>To make an impact with LLDB support, we focused on:</p>
<ul>
<li>Fixing how to create breakpoints in LLDB</li>
<li>Porting GDB's Python-based value printers to LLDB</li>
<li>Improving the debugging information emitted by the OCaml compiler</li>
</ul>
<h2>Breakpoints and Name Mangling</h2>
<p>Breakpoints are a common feature in debuggers; they allow developers to halt program execution when a specific piece of code is executed. LLDB provides several methods for creating breakpoints: you can use a memory address, specify the name of a function, or use a combination of a filename and a line number. Happily, using memory addresses to create breakpoints worked, but the other two ways were broken.</p>
<p>To understand how LLDB can set a breakpoint using a function name, we must explain how the compiler treats source-level names in OCaml and how that impacts LLDB. In the OCaml compiler, there is a process called <code>name mangling</code>, which involves turning the name of a program entity in OCaml into a form that is unique and can be linked against. Often, there are repeated names for particular functions or even modules in an OCaml codebase, and the compiler needs to generate unique names for them before sending them all to the linker, which is responsible for producing the final executable as a <a href="https://developer.apple.com/library/archive/documentation/Performance/Conceptual/CodeFootprint/Articles/MachOOverview.html">Mach-O</a> binary.</p>
<p>Concretely, when setting a breakpoint based on a function name, it is necessary to use the mangled name produced by OCaml. During OCaml 5 development (while fixing a linking bug), the name mangler was changed to generate names like <code>camlModule.function_name</code>. Let's illustrate how this works with an example. Consider this Fibonacci program:</p>
<pre><code><span class="ocaml-comment-block">(*</span><span class="ocaml-comment-block"> fib.ml </span><span class="ocaml-comment-block">*)</span><span class="ocaml-source">
</span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-keyword">rec </span><span class="ocaml-entity-name-function-binding">fib</span><span class="ocaml-source"> </span><span class="ocaml-source">n</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source">
</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">if</span><span class="ocaml-source"> </span><span class="ocaml-source">n</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-constant-numeric-decimal-integer">0</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">then</span><span class="ocaml-source"> </span><span class="ocaml-constant-numeric-decimal-integer">0</span><span class="ocaml-source">
</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">else</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">if</span><span class="ocaml-source"> </span><span class="ocaml-source">n</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-constant-numeric-decimal-integer">1</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">then</span><span class="ocaml-source"> </span><span class="ocaml-constant-numeric-decimal-integer">1</span><span class="ocaml-source">
</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">else</span><span class="ocaml-source"> </span><span class="ocaml-source">fib</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-source">n</span><span class="ocaml-keyword-operator">-</span><span class="ocaml-constant-numeric-decimal-integer">1</span><span class="ocaml-source">)</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">+</span><span class="ocaml-source"> </span><span class="ocaml-source">fib</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-source">n</span><span class="ocaml-keyword-operator">-</span><span class="ocaml-constant-numeric-decimal-integer">2</span><span class="ocaml-source">)</span><span class="ocaml-source">
</span><span class="ocaml-source">
</span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">main</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-unit">()</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source">
</span><span class="ocaml-source"> </span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">r</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-source">fib</span><span class="ocaml-source"> </span><span class="ocaml-constant-numeric-decimal-integer">20</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">in</span><span class="ocaml-source">
</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Printf</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">printf</span><span class="ocaml-source"> </span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-string-quoted-double">fib(20) = </span><span class="ocaml-constant-character-printf">%d</span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-source"> </span><span class="ocaml-source">r</span><span class="ocaml-source">
</span><span class="ocaml-source">
</span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">_</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-source">main</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-unit">()</span><span class="ocaml-source">
</span></code></pre>
<p>The <code>main</code> function would get the name <code>camlFib.main_123</code>, and the <code>fib</code> function would be <code>camlFib.fib_271</code>; note that the trailing number can change between different runs of the compiler. These names are then used to create breakpoints within LLDB. Tab completion helps out, allowing you to specify part of the name and use tab to fill in the rest.</p>
<p>Unfortunately, using <code>.</code> in these names conflicted with LLDB, which used <code>.</code> for other purposes and wouldn't accept names with it present. Rather than modifying LLDB itself, we needed to alter the separator used in mangled names.</p>
<p>A one-character change seems simple, right? The other program that consumes these mangled names is the <a href="https://en.wikipedia.org/wiki/Linker_(computing)">linker</a>, and linkers restrict which characters can be used in a name, often to a printable subset of ASCII characters. This problem already appeared in the <a href="https://github.com/ocaml/ocaml/pull/12640">MSVC</a> porting work, where the linker on that platform wouldn't accept <code>.</code> either and a workaround was introduced to use <code>$</code> instead. That was the approach we decided to take, and we modified name mangling for all platforms to use <code>$</code>. More details about how this change impacts other areas are in <a href="https://github.com/ocaml/ocaml/pull/13050">PR #13050</a>. This change will appear in OCaml 5.4, fixing the problem of setting breakpoints using mangled names and providing consistent names across platforms.</p>
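<p>To make the separator change concrete, here is a minimal sketch of how such a name is assembled. It is a simplified model based only on the name shapes shown above; <code>mangle</code> and its parameters are hypothetical and are not the compiler's actual implementation.</p>
<pre><code>(* Simplified model of a mangled symbol name:
   "caml" ^ module name ^ separator ^ function name ^ "_" ^ stamp.
   [mangle] is a hypothetical helper, not the compiler's real code. *)
let mangle ~separator ~modname ~name ~stamp =
  Printf.sprintf "caml%s%s%s_%d" modname separator name stamp

let () =
  (* The '.' separator that LLDB would not accept. *)
  print_endline (mangle ~separator:"." ~modname:"Fib" ~name:"fib" ~stamp:271);
  (* The '$' separator adopted for all platforms. *)
  print_endline (mangle ~separator:"$" ~modname:"Fib" ~name:"fib" ~stamp:271)
</code></pre>
<p>With the stamp from the example above, this prints <code>camlFib.fib_271</code> and then <code>camlFib$fib_271</code>. As noted earlier, the trailing number can differ between runs of the compiler.</p>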
<p>Setting breakpoints based on filename and line number was a more exciting experience, which we will cover in a later post.</p>
<h2>Printing OCaml Values Using Python</h2>
<p>OCaml uses a uniform memory representation in which all values can be kept in a single machine word, typically 64 bits on modern hardware. An OCaml value is either an immediate integer or a pointer to some other memory representing the value. This representation has implications for debuggers, as understanding how to print an OCaml value requires familiarity with this memory structure.</p>
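<p>The distinction between immediate integers and pointers can be observed from OCaml itself. The sketch below uses the standard library's <code>Obj</code> module, which exists for this kind of low-level inspection and should not be relied on in application code; <code>describe</code> is a hypothetical helper written for this illustration.</p>
<pre><code>(* Classify a value the way a value printer has to: is the machine word an
   immediate integer, or a pointer to a block on the heap? *)
let describe (v : Obj.t) : string =
  if Obj.is_int v then
    (* Immediate: the word itself encodes the integer. Constant constructors
       such as [] and () are represented this way too. *)
    Printf.sprintf "immediate %d" (Obj.obj v : int)
  else
    (* Pointer: the word refers to a block that carries a tag and a size
       measured in words. *)
    Printf.sprintf "block tag=%d size=%d" (Obj.tag v) (Obj.size v)

let () =
  print_endline (describe (Obj.repr 42));
  print_endline (describe (Obj.repr []));
  print_endline (describe (Obj.repr (1, 2)));
  print_endline (describe (Obj.repr "hello"))
</code></pre>
<p>The first two values are reported as immediates (<code>42</code> and <code>0</code>), while the tuple is a block with tag 0 and two fields. The string is also a block, with its own tag and a size that depends on the word size of the machine.</p>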
<p>Both GDB and LLDB can be extended using Python as a scripting language. This capability enables developers to implement custom printing formats for values, add new commands, and perform various other useful functions. There were pre-existing GDB macros for printing OCaml values; rewriting them in Python would allow them to be used with both debuggers.</p>
<p>The resulting core <a href="https://github.com/ocaml/ocaml/blob/trunk/tools/ocaml.py"><code>ocaml.py</code></a> library understands OCaml's uniform memory representation and how to print out values. The GDB-specific file <a href="https://github.com/ocaml/ocaml/blob/trunk/tools/gdb.py"><code>gdb.py</code></a> handles integrating with GDB's value printer, and a similar <a href="https://github.com/ocaml/ocaml/blob/trunk/tools/lldb.py"><code>lldb.py</code></a> exists for LLDB. The previous GDB macro file was retained for backward compatibility, but now it prints a deprecation warning when used. Beyond the core printing functionality, the new system also introduces improved commands <code>ocaml</code> and <code>ocaml find</code>. The former was introduced with <a href="https://github.com/ocaml/ocaml/pull/13136">PR #13136</a> and allows for future sub-commands; the latter is the heap search command, which was changed from <code>gdb-macros</code>.</p>
<p>Check out <a href="https://github.com/ocaml/ocaml/pull/13136">PR #13136</a> for more details, including several examples of what formatting you can expect when working with the debuggers. The end result is a Python-based solution shared between the two debuggers that can be easily extended in future.</p>
<h2>Improving the Debugging Information</h2>
<p>Debugging information is any data that is required by the debugger to perform its task. Often, this is extra information outside of the main executable. For example, a debugger needs to associate machine code in an executable with the source code used to produce it. DWARF is one such debug information format used on macOS and Linux systems.</p>
<p>We identified two issues with the debug information produced by OCaml:</p>
<ol>
<li>Printing backtraces produced incorrect results</li>
<li>LLDB would not display the OCaml source for an executable</li>
</ol>
<p>A backtrace is a visual representation of the current call stack for a program. There are two ways a debugger generates a backtrace (a process called <code>unwinding</code>): Call Frame Information (CFI) or Frame Pointers. CFI is part of the larger DWARF specification and is already used in the OCaml compiler. Clearly, the team needed to start by understanding the CFI information emitted by the compiler and validating the fixes. What followed was a series of PRs improving CFI.</p>
<p>The first PR, <a href="https://github.com/ocaml/ocaml/pull/13079">#13079</a>, focused on fixing the backtraces for macOS on the ARM64 platform. Somewhere in the 5.1.1 update, a change happened to the CFI information OCaml produced that caused LLDB to lose part of the stack trace when moving between C and OCaml frames.</p>
<p>The bug was caused by the difference in handling frame pointers: C frames maintained them while OCaml frames did not. The OCaml code generator reused the <code>x29</code> frame pointer register in the <code>Iextcall</code> fast path when calling from OCaml to C. Once identified, the fix involved correctly saving the <code>x29</code> register, resulting in better backtraces for ARM64 on macOS and Linux. <a href="https://github.com/ocaml/ocaml/pull/13595">#13595</a> is a follow-up bugfix for CFI-based backtraces, where the wrong CFA register was used. This part of the code was rewritten when adding frame pointer support for ARM64 on macOS and Linux in <a href="https://github.com/ocaml/ocaml/pull/13500">#13500</a>. Speaking of frame pointers, <a href="https://github.com/ocaml/ocaml/pull/13163">#13163</a> enabled frame pointers on the other macOS platform, AMD64. Now, printing backtraces can use either CFI or frame pointers to unwind the call stack.</p>
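<p>To see why a clobbered frame pointer loses part of the backtrace, it helps to look at how frame-pointer unwinding walks the stack. The following is a toy model only: the stack is an array of integers, each frame record holds the caller's saved frame pointer followed by a return address, and all addresses are made up. It does not reflect how LLDB or the OCaml runtime are implemented.</p>
<pre><code>(* Toy model of frame-pointer unwinding. [memory] stands in for the stack:
   index [fp] holds the caller's saved frame pointer and index [fp + 1]
   holds the return address. A saved frame pointer of 0 marks the
   outermost frame. *)
let unwind (memory : int array) (fp : int) : int list =
  let rec walk fp acc =
    if fp = 0 then List.rev acc
    else walk memory.(fp) (memory.(fp + 1) :: acc)
  in
  walk fp []

let () =
  (* Three hypothetical frame records at indices 2, 4, and 6. *)
  let memory = [| 0; 0; 0; 0x1000; 2; 0x2000; 4; 0x3000 |] in
  List.iter (Printf.printf "return address 0x%x\n") (unwind memory 6)
</code></pre>
<p>Starting from the innermost frame at index 6, the walk yields <code>0x3000</code>, <code>0x2000</code>, and <code>0x1000</code>. If any frame stored something other than the caller's frame pointer in that slot, the chain would stop or go astray at that point, which matches the symptom of losing frames when moving between C and OCaml.</p>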
<p>The issue of LLDB not displaying OCaml source code for an executable was due to missing DWARF information, which is necessary on macOS but not on other platforms. We understand how to fix it and are working on a solution. Interestingly, the same lack of DWARF information is why setting breakpoints based on filenames and line numbers is broken.</p>
<h2>Extras</h2>
<p>While working on CFI for ARM64, we found and fixed CFI issues on other platforms, in particular on Linux for the ARM64 and RISC-V platforms. The three PRs, <a href="https://github.com/ocaml/ocaml/pull/13241">#13241</a>, <a href="https://github.com/ocaml/ocaml/pull/13261">#13261</a>, and <a href="https://github.com/ocaml/ocaml/pull/13271">#13271</a>, address incorrect Call Frame Information (CFI) for LLDB and GDB. These bugs highlighted the difficulty in ensuring the CFI information is correct and that debuggers work as expected. Although both GDB and LLDB use CFI, their interpretations of the specification sometimes differ, leading to subtle bugs.</p>
<p>To allow users to check their debugger's functionality and to help ensure that future changes don't break the debugging experience, the team enabled both the GDB and LLDB native debuggers to run as part of the OCaml test suite. Since both debuggers provide Python APIs to facilitate interaction between them and the test suite, programmers can write Python-scripted tests for their code. The PR <a href="https://github.com/ocaml/ocaml/pull/13199">#13199</a> has more details. Since merging this change, we have discovered a few bugs, such as <a href="https://github.com/ocaml/ocaml/issues/13509">#13509</a>. The idea of writing a debugger test suite in Python is so good that it is what <a href="https://github.com/llvm/llvm-project/tree/main/lldb/test/API">LLDB does</a>, so we are in good company!</p>
<p>Finally, we gathered all the details we discovered about OCaml debugging and added a chapter to the OCaml manual documenting what we learnt. PR <a href="https://github.com/ocaml/ocaml/pull/13747">#13747</a> does just that, covering more technical details that might be interesting.</p>
<h2>Until Next Time</h2>
<p>Let us know your experience with debugging OCaml 5 programs! You can always share your feedback or questions on <a href="https://discuss.ocaml.org">Discuss</a>. For more information on how to debug OCaml with LLDB, check out <a href="https://lambdafoo.com/posts/2024-08-03-lldb-ocaml.html">Tim McGilchrist's blog post</a>, which provides an excellent overview.</p>
<p>You can connect with us on <a href="https://bsky.app/profile/tarides.com">Bluesky</a>, <a href="https://mastodon.social/@tarides">Mastodon</a>, <a href="https://www.threads.net/@taridesltd">Threads</a>, and <a href="https://www.linkedin.com/company/tarides">LinkedIn</a> or sign up for our mailing list to stay updated on our latest projects. We look forward to hearing from you!</p>
]]></description><link>https://tarides.com/blog/2025-06-18-feature-parity-series-improving-developer-tooling-on-macos</link><guid isPermaLink="false">https://tarides.com/blog/2025-06-18-feature-parity-series-improving-developer-tooling-on-macos.html</guid><dc:creator><![CDATA[ Tim McGilchrist, Isabella Leandersson ]]></dc:creator><pubDate>Wed, 18 Jun 2025 00:00:00 GMT</pubDate></item><item><title><![CDATA[Opam Health Check: or How we Got to 90+% of Packages Building with Dune Package Management]]></title><description><![CDATA[<p>We have <a href="/blog/2025-04-11-expanding-dune-package-management-to-the-rest-of-the-ecosystem/">recently
posted</a>
about the process of enabling Dune to build as many packages as possible. Since then,
we've been hard at work, going through the failures and fixing issues as we go
along. In today's post, I'll give you an overview of what we have achieved so
far, as well as an idea of what is yet to come.</p>
<h2>What Has Improved Since Last Time</h2>
<p>If you check our <a href="https://github.com/ocaml/dune/issues/11601">tracking issue</a>,
you'll notice there are significantly more items there than there were before.</p>
<p>We've made enhancements in how <code>dune pkg</code> and the health check handle
dependencies - including depexts - and aligned them better with how <code>opam</code>
does it. There are, however, some intentional differences between how <code>dune pkg</code> and <code>opam</code> do things. Inspecting the health of the repository has led
to fixes aligning <code>dune pkg</code> more with <code>opam</code> semantics and sometimes improving
the correctness of the metadata on <code>opam-repository</code>. Below, we'll go through some
of these improvements.</p>
<ul>
<li>A lot of packages in <code>opam</code> did not declare <code>ocaml</code> as a dependency. In
theory, <code>opam</code> is OCaml-agnostic and can install packages written in any
language (<a href="https://opam.ocaml.org/packages/topiary/">Topiary</a> is written in
Rust, for example). However, in practice, most packages on <code>opam</code> require
OCaml, so when a package does not declare a dependency on OCaml, and none of
its dependencies capture an OCaml compiler in their dependency cone, then
Dune Package Management locks a solution without a compiler. In many cases,
this will fail, so many packages have had their metadata updated to include
<code>ocaml</code> as a dependency on <code>opam-repository</code> and, where possible, upstream.</li>
<li>When <code>opam</code> encounters undefined variables, it evaluates them to 'false'.
When locking a solution, we translate the build and install instructions into
Dune's own variable format. However, in the Dune semantics, unknown variables
are not evaluated as false by default. We changed the way we translate
variables to wrap the variables with <code>catch_undefined_var</code> in
<a href="https://github.com/ocaml/dune/pull/11512">#11512</a>, thus matching the
semantics of the original expressions.</li>
<li>Some packages depend on <code>ez-conf-lib</code>, which is a package that records the
place of its executable when it is built. Unfortunately, in the case of Dune
package management, that would be a sandbox location, so when other packages
attempted to access it, it would not exist at that location anymore. This
made the package non-relocatable. In
<a href="https://github.com/ocaml/dune/issues/11598">#11598</a>, this was changed to use
<code>opam</code> and Dune-provided variables, which are set to the appropriate location
when building so that users can find them.</li>
<li>When packages build, they often need additional dependencies from the
operating system: these are called <code>depexts</code>. In Opam-Health-Check, we used
<code>opam</code> to install these, but sometimes there was no valid solution, and
<code>opam</code> would fail. Unfortunately, the failure displayed an error message, but
the process still succeeded with exit code 0. We changed our code to detect
the error message in
<a href="https://github.com/ocurrent/opam-health-check/pull/103">#103</a> and ended up
reporting the issue upstream to <code>opam</code> as
<a href="https://github.com/ocaml/opam/issues/6488">#6488</a>.</li>
<li>When users locked a solution with <code>dune pkg</code>, it would also record the
detected <code>depexts</code>. However, differences in how optional packages were
handled between <code>opam</code> and <code>dune</code> could lead to not enough packages being
installed if we used <code>opam</code> to install depexts. In
<a href="https://github.com/ocurrent/opam-health-check/pull/104">#104</a>, we changed
the logic to use Dune to create the list of <code>depexts</code> and install these in a
separate step. This way, there should be no confusion between what <code>opam</code> and
Dune consider a dependency.</li>
<li>While most source archives ship with <code>.opam</code> files, they are technically not
required. When installing, <code>opam</code> never reads them (since it uses the
information from <code>opam-repository</code>), and Dune does not need them as it can
read all the required information from <code>dune-project</code>. However,
Opam-Health-Check used them to determine which package names existed, so when
it encountered packages without <code>.opam</code> files, it assumed there were no
packages to build in the source archive. With
<a href="https://github.com/ocurrent/opam-health-check/pull/97">#97</a>, we read the
package names from <code>.opam</code> files and from <code>dune-project</code> to ensure we capture
all names.</li>
<li>When <code>opam</code> builds packages with Dune, for the most part, it uses <code>dune build -p &lt;pkg-name&gt;</code>. The <code>-p</code> flag is a special flag which is mainly used for
releasing and implies <code>--release --only-packages &lt;pkg-name&gt;</code>. We couldn't use
the same <code>-p</code> flag, as <code>--release</code> itself expanded to a lot of other
configuration options, among these <code>--ignore-lock-dir</code>. It meant that if
<code>dune pkg lock</code> and then <code>dune build</code> were used, <code>--release</code> would ignore the
lock directory. This was implemented so that introducing <code>dune pkg</code> would not
break packages in <code>opam-repository</code> that used lock files. However, there
aren't many packages in <code>opam-repository</code> that use <code>--release</code> and building
packages with Dune package management in <code>release</code> mode is useful. Dune was
patched in <a href="https://github.com/ocaml/dune/pull/11378">#11378</a> to move
<code>--ignore-lock-dir</code> to <code>-p</code>. This allows you to use <code>--release</code> with package
management, and <a href="https://github.com/ocurrent/opam-health-check/pull/96">#96</a>
was merged to take advantage of it. The use of <code>--release</code> better represents
an <code>opam</code> build and enables the building of several key packages, such as
<code>base</code> and <code>core</code>, for which <code>--release</code> disables building
Jane-Street-internal tests.</li>
<li>When we looked for which packages to build, we accidentally used a substring
search instead of an exact name match, so we would sometimes pick packages
that were not meant to be built. This was fixed in
<a href="https://github.com/ocurrent/opam-health-check/pull/99">#99</a>, ensuring that
when determining whether <code>lab</code> should be built, a match on
<code>gitlab</code> no longer gives us false positives.</li>
</ul>
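<p>The last item in the list above is easy to reproduce. The sketch below contrasts a loose "contains" search with an exact name comparison; it illustrates the class of bug and is not the actual Opam-Health-Check code. The helper <code>is_substring</code> is hypothetical and written out here to keep the example self-contained.</p>
<pre><code>(* [is_substring ~sub s] is true when [sub] occurs anywhere inside [s]. *)
let is_substring ~sub s =
  let n = String.length sub and m = String.length s in
  (* Every index at which [sub] could start inside [s]. *)
  let starts = List.init (max 0 (m - n + 1)) Fun.id in
  let matches_at i = String.equal (String.sub s i n) sub in
  List.exists matches_at starts

let packages = [ "lab"; "gitlab" ]

(* Loose matching: also selects "gitlab", which was never requested. *)
let loose = List.filter (is_substring ~sub:"lab") packages

(* Exact matching: selects only the package that was asked for. *)
let exact = List.filter (String.equal "lab") packages

let () =
  Printf.printf "loose: %s\n" (String.concat ", " loose);
  Printf.printf "exact: %s\n" (String.concat ", " exact)
</code></pre>
<p>The loose search returns both <code>lab</code> and <code>gitlab</code>, while the exact comparison returns only <code>lab</code>.</p>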
<h2>Maybe Some Packages Just Don't Build</h2>
<p>It turns out some packages that are on <code>opam-repository</code> just do not build.
This can be due to a lot of reasons. Some packages don't support OCaml 5.3 (the
most recent release at the time of writing and the one we run the checks on),
and others don't support the platform we are running on. Some can't be
downloaded because the server that hosted them disappeared. In such cases,
there is nothing that Dune package management can do besides fail.</p>
<p>Thus, to make a fair comparison, we <a href="https://github.com/ocurrent/opam-health-check/pull/95">patched
Opam-Health-Check</a> and
extended it so it can build the same package with Dune and <code>opam</code> in the same
run. That way, we see that if a package doesn't build on <code>opam</code>, it is
unlikely to magically work when using Dune package management (although that
can happen, e.g. on transient network failures, which would prevent <code>opam</code> from
downloading the source tarball).</p>
<h2>Some Things We Don't Support</h2>
<p>There are some packages that will not work. Often, this is because the packages
fail due to how Opam-Health-Check works, which is not something we expect a
user of Dune package management to encounter.</p>
<h3>Complex Build Commands</h3>
<p>When selecting the packages that we plan to build, we make sure to only pick
Dune packages. However, the definition of a package 'using Dune' is not
clear-cut.</p>
<p>A source might have a <code>dune-project</code> file but never call Dune. A build might
call Dune but also do an arbitrary number of other steps. In <code>opam</code>, this
process is simple because <code>opam</code> will just execute all steps in the <code>build</code> and
<code>install</code> entries, be it launching Dune, calling <code>make</code>, or any other command.</p>
<p>For the health check, we decided to set the limit at <code>dune build</code>. This means
that packages that require extra instructions will most likely fail to build in
the health check.</p>
<p>The reason why we are setting the limit here is twofold:</p>
<ol>
<li>Interpreting which commands to run in the health-check would require us to
implement and evaluate the filters that <code>opam</code> supports for running the
commands. Making sure we evaluate things exactly like <code>opam</code> does would be a
non-trivial undertaking.</li>
<li>Packages that need extra commands to run usually run just fine when these
commands are run manually; thus users of <code>dune pkg</code> can most likely use Dune
package management when using it on their machines.</li>
</ol>
<p>Another bonus reason is that not that many packages are affected by this, so it
didn't seem worth the time investment.</p>
<h2>What Work is There Still Left to Do?</h2>
<p>There are still categories of errors that make it difficult to adopt package
management. The most notorious issue is
<a href="https://github.com/ocaml/dune/issues/10855">#10855</a>, colloquially called the
"in and out of workspace" bug.</p>
<p>It occurs when a project has dependencies, and these dependencies, in turn,
depend on a package that is in the project's workspace. Usually, this is a
circular dependency, but such a configuration can reasonably happen in some
cases, such as when a test dependency uses something from your project. For
example, if Lwt uses a test tool that, in turn, depends on Lwt, it is currently
impossible to build it with Dune package management, as Lwt would be part of
both the build and its own dependencies.</p>
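<p>The situation can be modelled as a reachability check over the dependency graph. The sketch below is only an illustration of the shape of the problem, not how Dune detects or handles it; the dependency table is hypothetical, and <code>testtool</code> stands in for the test tool from the Lwt example.</p>
<pre><code>(* Hypothetical dependency table: package name and its direct dependencies. *)
let deps = [ ("lwt", [ "testtool" ]); ("testtool", [ "lwt" ]); ("fmt", []) ]

let direct pkg = Option.value (List.assoc_opt pkg deps) ~default:[]

(* [reaches ~target pkg] is true when [target] appears among the direct or
   transitive dependencies of [pkg]. [seen] guards against looping forever
   on cyclic graphs. *)
let reaches ~target pkg =
  let rec visit seen todo =
    match todo with
    | [] -&gt; false
    | p :: rest -&gt;
      if String.equal p target then true
      else if List.mem p seen then visit seen rest
      else visit (p :: seen) (direct p @ rest)
  in
  visit [] (direct pkg)

(* A workspace package is both "in" and "out" of the workspace when one of
   its dependencies depends back on it. *)
let in_and_out pkg = reaches ~target:pkg pkg

let () =
  Printf.printf "lwt: %b\n" (in_and_out "lwt");
  Printf.printf "fmt: %b\n" (in_and_out "fmt")
</code></pre>
<p>With this table, <code>lwt</code> is flagged because <code>testtool</code> depends back on it, while <code>fmt</code> is not.</p>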
<p>There are not many packages affected, but those that are include some of the
most used packages in OCaml. Among these are Lwt, Odoc, and, unfortunately,
Dune (due to lots of projects depending on <code>dune-configurator</code>). Thus, at the
moment, Dune package management cannot be used to develop Dune itself.</p>
<p>While addressing these issues was outside the scope of this particular project,
we plan to tackle them through future initiatives. Ultimately, our goal is to
provide a seamless user experience with Dune package management.</p>
<h2>Until Next Time</h2>
<p>If you're using Dune Package Management and have feedback or questions, please
share your thoughts on <a href="https://discuss.ocaml.org">Discuss</a>. Our teams are
always looking for input in order to improve tools and features, and your
feedback can help us make everyone's experience better.</p>
<p>Stay in touch with Tarides on <a href="https://bsky.app/profile/tarides.com">Bluesky</a>,
<a href="https://mastodon.social/@tarides">Mastodon</a>,
<a href="https://www.threads.net/@taridesltd">Threads</a>, and
<a href="https://www.linkedin.com/company/tarides">LinkedIn</a>. We look forward to
hearing from you!</p>
]]></description><link>https://tarides.com/blog/2025-06-05-opam-health-check-or-how-we-got-to-90-of-packages-building-with-dune-package-management</link><guid isPermaLink="false">https://tarides.com/blog/2025-06-05-opam-health-check-or-how-we-got-to-90-of-packages-building-with-dune-package-management.html</guid><dc:creator><![CDATA[ Marek Kubica ]]></dc:creator><pubDate>Thu, 05 Jun 2025 00:00:00 GMT</pubDate></item><item><title><![CDATA[CEOS Project Kick-Off: Using Satellites to Survey the Earth]]></title><description><![CDATA[<p>Tarides collaborates with several partners to implement new cutting-edge technologies based on our experience with OCaml. One of our special interest areas is the space sector and improving the security and versatility of satellite software. This is the driving force behind the <a href="/blog/2023-07-31-ocaml-in-space-welcome-spaceos/">SpaceOS</a> and <a href="/blog/2023-12-29-announcing-the-orchide-project-powering-satellite-innovation/">ORCHIDE</a> projects. With <a href="https://www.bpifrance.fr/nos-appels-a-projets-concours/appel-a-projets-maturation-technologique-et-demonstration-de-solutions-dintelligence-artificielle-embarquee">funding from BPIFrance</a>, a French public investment bank striving to detect and foster the innovative projects of the future, we are proud to announce our part in an exciting new venture.</p>
<p>The CEOS (data Center and Edge cOmputing in Space) project kicked off on March 24, and we're thrilled to be part of it and to be in such good company. CEOS is a collaboration between <a href="https://aerospace.actia.com/">STEEL Electronique</a>, <a href="https://www.elsys-design.com/en/">ELSYS Design</a>, <a href="https://www.ip-maker.com/">IP-Maker</a>, <a href="https://www.thalesaleniaspace.com/en">Thales Alenia Space</a> and Tarides to create a new generation of efficient, multi-purpose satellites. The project's overarching aim is to significantly diversify and expand the onboard capabilities of satellites to survey the environment. Tarides’ role is helping implement an efficient onboard OS and orchestrator that can dynamically deploy multiple earth observation applications on one satellite.</p>
<p>Let's take a closer look at the goals of CEOS and the technology making it all possible!</p>
<h2>How is CEOS Addressing Current Limitations?</h2>
<p>One of the best ways to support scientists, organisations, and governments in protecting the environment is to provide them with reliable, up-to-date data. A great way to source this data is to use Earth observation satellites, which can take images and make a multitude of measurements from orbit. Using two or more images of the same geographical area makes it possible to compare them and highlight the differences. The potential use cases are vast, including surveying for changes to biodiversity or deforestation and alerting users to anything from wildfires to natural disasters.</p>
<p>Usually, these images would need to be transferred to the ground before they’re processed, which can limit response times and saturate the communication link with mostly useless data (we are only interested in the changes and their nature). Alternatively, trying to process them on board would be a slow process on a normal satellite CPU, but the AI hardware and software upgrades the CEOS project brings to satellites will allow them to detect and analyse changes quickly and alert the relevant authorities or stakeholders. Using on-board machine learning to detect changes has big implications for efficiency and performance. If AI programs run in space, they can process and detect changes quickly enough to send near-instantaneous alerts to recipients on Earth. By enabling fast response times, this approach can save precious time in an emergency (for example, in the case of chemical spills or forest fires) and mitigate the damage. Furthermore, having multiple different applications available on-board (potentially running in parallel) allows satellites to have more diverse missions (for example switching from wildfire detection to detecting fishing abuses when the satellite moves from land to ocean) and avoid idle portions of the orbit.</p>
<p>Another major aspect of the CEOS project is the actual fabrication of the demonstrator board of the satellites, with STEEL responsible for its design and manufacturing. The board will be based on a Xilinx Versal SoC, and feature high-performance NVMe storage (with IPs from IP-Maker) to store vast amounts of sensor data quickly and process it later. The satellites will be designed to be usable for multiple missions and have a longer service life, reducing the demand to produce new satellites and put them in orbit.</p>
<h2>What Technologies are Being Developed?</h2>
<p>Let's take a closer look at the technology we're developing alongside our partners. The end goal is to use AI to detect relevant changes on Earth, with multiple AI applications deployed on a single satellite built on an <a href="https://www.arrow.com/en/research-and-events/articles/fpga-basics-architecture-applications-and-uses">FPGA architecture</a>.</p>
<h3>Detecting Changes</h3>
<p>Using a computer to detect changes is not as straightforward as it may seem to us humans, who, after all, perform this type of calculation on a daily basis without thinking much of it. There are three ways to detect changes in computer programming:</p>
<ol>
<li>Binary detection of changes: detects whether a change has occurred, but not where.</li>
<li>'From-to' detection: monitors for a single, specified change between a base level and a later state.</li>
<li>Multi-class changes: detects several changes and classifies them according to their type.</li>
</ol>
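<p>To make the distinction concrete, here is a minimal OCaml sketch of the three modes. It is purely illustrative and not the CEOS or ELSYS implementation: the <code>cover</code> type, the function names, and the idea of comparing two equally sized arrays of pre-classified land-cover labels are all assumptions made for this example, whereas real change detection works on radar or optical imagery.</p>
<pre><code class="language-ocaml">(* Hypothetical land-cover classes; a real pipeline would work on pixels. *)
type cover = Forest | Water | Burnt | Urban

(* 1. Binary detection: reports only whether anything changed, not where. *)
let binary_change before after = not (before = after)

(* 2. From-to detection: counts the cells that went from [src] to [dst]. *)
let from_to ~src ~dst before after =
  let count = ref 0 in
  Array.iteri
    (fun i b -> if b = src then (if after.(i) = dst then incr count))
    before;
  !count

(* 3. Multi-class detection: lists every change as (index, old, new). *)
let multi_class before after =
  let changes = ref [] in
  Array.iteri
    (fun i b ->
      let a = after.(i) in
      if not (a = b) then changes := (i, b, a) :: !changes)
    before;
  List.rev !changes

(* Two observations of the same four cells. *)
let before = [| Forest; Forest; Water; Urban |]
let after = [| Forest; Burnt; Water; Urban |]
</code></pre>
<p>On this sample data, <code>binary_change before after</code> is <code>true</code>, <code>from_to ~src:Forest ~dst:Burnt before after</code> is <code>1</code>, and <code>multi_class before after</code> is <code>[(1, Forest, Burnt)]</code>. Each mode returns strictly more information than the previous one, at the cost of more work per image.</p>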
<p>These algorithms can process several different types of images, including radar and optical images. Furthermore, we can use deep learning to validate the changes detected by the algorithms. For example, validation should ensure that the algorithms haven't flagged a change in tree cover caused by something as simple as a change of angle or a picture taken at a different time of day. By using deep learning, we can make the validation process much more efficient than if we relied on human intervention.</p>
<p>The CEOS demonstrator will embed several change-detection applications designed by ELSYS, the experts on the matter.</p>
<h3>The Operating System</h3>
<p>The designs of the operating system and orchestrator are also cutting-edge. In collaboration with the partners behind the European project <a href="/blog/2023-12-29-announcing-the-orchide-project-powering-satellite-innovation/">ORCHIDE</a> (which include Tarides and, in particular, Thales Alenia Space), a minimalist OS based on Linux and several lightweight orchestration components will be deployed on all boards. These components will allow the orchestration and execution of applications as isolated, lightweight unikernels, mirroring <a href="/blog/2023-07-31-ocaml-in-space-welcome-spaceos/">SpaceOS</a>.</p>
<p>SpaceOS, and the ORCHIDE project in general, takes a modern approach to operating systems by deploying an efficient, specialised, lightweight OS unique to each application. This reduces its dependency on a host architecture, so a simpler host structure such as a hypervisor is enough. It also reduces bloat, with the system using only the resources it strictly needs, which improves energy efficiency. Combined with state-of-the-art technologies for storage, orchestration, metrics collection and workflow design, ORCHIDE aims to provide a complete solution to design and embed complex dynamic payloads, such as the ones CEOS will handle, without compromising on efficiency and cybersecurity (thanks to the <a href="/blog/2023-12-14-ocaml-memory-safety-and-beyond/">strong security guarantees</a> of unikernels).</p>
<p>The challenge (and opportunity!) of this project will be to support a range of hardware acceleration devices like FPGAs and GPUs, in an agnostic way (as much as possible), and through the virtualisation barrier that is inherent to unikernels.</p>
<h2>What's Next?</h2>
<p>This exciting project has only just begun, and a lot of work has yet to be done over the next three years. To stay up-to-date on our progress, keep an eye on our blog and <a href="https://parsimoni.co">Parsimoni's website</a>, where you can also learn more about SpaceOS.</p>
<p>The end goal is to take a multifaceted approach to helping the climate by giving scientists, organisations, and governments access to powerful tools to help them monitor and safeguard the environment. The implementation also promotes sustainable practices by encouraging the use of a single satellite for longer and for more than one purpose.</p>
<p>Stay in touch with Tarides on <a href="https://bsky.app/profile/tarides.com">Bluesky</a>,
<a href="https://mastodon.social/@tarides">Mastodon</a>,
<a href="https://www.threads.net/@taridesltd">Threads</a>, and
<a href="https://www.linkedin.com/company/tarides">LinkedIn</a>. We look forward to
hearing from you!</p>
]]></description><link>https://tarides.com/blog/2025-05-30-ceos-project-kick-off-using-satellites-to-survey-the-earth</link><guid isPermaLink="false">https://tarides.com/blog/2025-05-30-ceos-project-kick-off-using-satellites-to-survey-the-earth.html</guid><dc:creator><![CDATA[ Isabella Leandersson ]]></dc:creator><pubDate>Fri, 30 May 2025 00:00:00 GMT</pubDate></item><item><title><![CDATA[OCaml Web Development: Essential Tools and Libraries in 2025]]></title><description><![CDATA[<p>Should you use OCaml for web projects? Web development trends are a hotly debated topic in the computer programming world and the familiar faces of languages and frameworks are unlikely to change: hypertext markup language or HTML, CSS, and JavaScript are the core technologies (with server-side technologies such as PHP, Python, etc.), and React, Vue, Svelte, and Angular are proving to be as popular as ever. AI and machine learning might be the biggest foreseeable <a href="https://medium.com/theymakedesign/ai-in-web-development-9a1b5f04eee">changes to the industry</a>, driving the emergence of new tools and workflows. But there is something else gaining momentum in the world of web development: functional programming!</p>
<p>Functional programming fits in well with the web world. The transactional nature of HTTP and the convergence towards immutable state management solutions (such as Redux and similar tools) make OCaml a very good candidate for web application development. In this article, we give you an overview of some of the web solutions supported by the OCaml ecosystem. It's important to note that we can't include every single one of the web development libraries available for OCaml, as there are simply too many for one post!</p>
<h2>Functional Programming, OCaml, and the Web</h2>
<p>Why do some developers use functional programming for web development? Functional programs leverage concepts like immutability, higher-order functions, and formal verification to achieve, among other beneficial outcomes, better <a href="https://richardsondx.medium.com/the-role-of-functional-programming-in-web-development-for-ruby-developers-82eee9570cf8">code reusability, parallelism, and fewer bugs</a>. These developer-friendly qualities that simplify programming and increase productivity are behind the rising interest in FP for front- and backend web development.</p>
<p>As a functional programming language, OCaml offers great scalability for user projects by making it easy to build large systems without sacrificing maintainability. Thanks to the expressive type system, it is easy to refactor and adjust projects during development and as requirements change and evolve. The type-checker is reliable and covers 100% of the code, making it easier for developers to verify that data is used correctly in operations. With these features and the general benefits of FP, we're noticing a growing interest among software developers using OCaml to create web applications.</p>
<p>There are several tools and libraries designed for web development in OCaml and a lot is possible with the language. Some workflows are still being developed, and many developers are building new projects and sharing them with the open-source community for feedback.</p>
<p>Before we begin, there are three notable industry success stories, <a href="https://www.besport.com/group/10902">BeSport</a>, <a href="https://ahrefs.com">Ahrefs</a>, and <a href="https://routine.co">Routine</a>. BeSport is a popular French sports social app built using Ocsigen, a full-stack modular web framework for OCaml that empowers developers to build websites, web applications, and even mobile applications. Ahrefs, the world-leading marketing intelligence platform, uses full-stack OCaml for their web stack and has been very successful <a href="https://tech.ahrefs.com/ahrefs-is-now-built-with-melange-b14f5ec56df4">migrating to Melange</a>. Routine, an integrated work platform that organises all your data in one place, uses <a href="https://github.com/aantron/dream">Dream</a> and <a href="https://opam.ocaml.org/packages/wasm_of_ocaml-compiler/">wasm_of_ocaml</a> to run their tools. With these companies showing how to leverage OCaml's potential, let's move on to the tools and libraries that make it all possible!</p>
<h2>Ocsigen: A Complete Framework</h2>
<p>Ocsigen is a collection of projects that provide a complete framework for developing web and mobile apps in OCaml. It is suitable for various uses, from simple server-side websites to client-side programs and complex client-server applications. The projects included under its umbrella are: <a href="https://ocsigen.org/lwt/">Lwt</a>, a general-purpose concurrency library for OCaml; <a href="https://ocsigen.org/tyxml/">TyXML</a>, for generating typed XML; <a href="https://ocsigen.org/js_of_ocaml/">Js_of_ocaml</a>, which compiles OCaml bytecode to JavaScript and Wasm; <a href="https://ocsigen.org/eliom/">Eliom</a>, which is a multi-tier framework for client-server web and mobile apps; <a href="https://ocsigen.org/ocsigenserver/">Ocsigen server</a>, a web server; <a href="https://ocsigen.org/ocsigen-toolkit/">Ocsigen Toolkit</a>, a client-server widget library for Eliom and Js_of_ocaml; and <a href="https://ocsigen.org/ocsigen-start/">Ocsigen Start</a>, a higher-level library providing user management and an application template that can be used as a basis for apps or to learn. The projects are mostly independent of each other, so users can pick and mix what they are interested in.</p>
<p>Ocsigen's design focuses on a couple of main strengths. Firstly, it takes full advantage of OCaml's expressive type system to check many properties of a program at compile time. This approach drastically reduces the development time of complex apps and makes it easier to refactor an application's code to accommodate new features. Secondly, using the Eliom framework enables multi-tier (also known as universal) programming, where an application's client and server sides are written in the same language and as a single codebase, with annotations in the code to indicate on which 'side' the code should run. As a result, both the server and client parts of a web application can be implemented as one program.</p>
<p>This method affords the developer a lot of flexibility to, for example, generate parts of pages either in the client or server code, depending on their needs. It also streamlines communication between server and client since programmers can use server-side variables in client code or call server-side OCaml functions from client code. It also makes it easy to deploy web and mobile multi-platform applications. Android and iOS apps are generated from the same exact code of the web app and run in a web view.</p>
<p>There are loads of resources available online to learn more about Ocsigen. Check out this presentation <a href="https://watch.ocaml.org/w/qQzb94X9WM7zLif7FynPyN">on Watch OCaml</a> for an overview of the framework. For an example of what a mobile app developed using Ocsigen looks like, you can download the BeSport app <a href="https://play.google.com/store/apps/details?id=com.besport.www.mobile">on Google Play</a> or the <a href="https://apps.apple.com/fr/app/be-sport/id1104216922">Apple app store</a>.</p>
<h2>Backends</h2>
<p>Backend web development creates the foundations of the web, being the server-side portion of the website you don't see that implements the functionality of what you do see in your web browser. Common languages include Python, Ruby, and Java, just to name a few. OCaml has several web frameworks that enable you to do backend development, including options for beginners (e.g. Dream) as well as more experienced software engineers (e.g. Ocsigen).</p>
<h3>Dream</h3>
<p><a href="https://github.com/aantron/dream">Dream</a> is a backend for OCaml that is well-liked for its simplicity and minimalist approach. Developer experience is a central tenet of its design, achieved by having a simple API and relying on fundamental OCaml types like <code>string</code> and <code>list</code>, only introducing a few of its own types. For a newcomer, the web framework offers extensive <a href="https://github.com/aantron/dream?tab=readme-ov-file#documentation">documentation</a> and plenty of <a href="https://github.com/aantron/dream?tab=readme-ov-file#example-repositories">examples</a> to get them started. Further examples of its quality-of-life features include unified error handling, a simple logger, a minimalist programming model that lets the developer create web servers just using functions, and cryptography helpers and key rotations to set up security options.</p>
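<p>To illustrate what building a web server 'just using functions' means, here is a small sketch of that programming model using only the OCaml standard library. It is not Dream's API: the <code>request</code> and <code>response</code> records, the <code>router</code> and <code>logger</code> functions, and the in-memory <code>log</code> are hypothetical simplifications, and real Dream handlers return promises and talk to an actual HTTP server.</p>
<pre><code class="language-ocaml">(* Hypothetical, simplified types; these are not the types Dream exposes. *)
type request = { meth : string; path : string }
type response = { status : int; body : string }

(* A handler is just a function from a request to a response. *)
type handler = request -> response

(* A middleware wraps a handler and returns a new handler. *)
type middleware = handler -> handler

let not_found : handler = fun _ -> { status = 404; body = "Not found" }

(* A tiny router: the first route whose method and path both match wins. *)
let router (routes : (string * string * handler) list) : handler =
 fun req ->
  match
    List.find_opt (fun (m, p, _) -> (m, p) = (req.meth, req.path)) routes
  with
  | Some (_, _, h) -> h req
  | None -> not_found req

(* A logging middleware that records each request before passing it on. *)
let log : string list ref = ref []

let logger : middleware =
 fun next req ->
  log := (req.meth ^ " " ^ req.path) :: !log;
  next req

let hello : handler = fun _ -> { status = 200; body = "Good morning, world!" }

(* The whole application is a composition of plain functions. *)
let app : handler = logger (router [ ("GET", "/", hello) ])
</code></pre>
<p>Calling <code>app { meth = "GET"; path = "/" }</code> returns the 200 response from <code>hello</code>, while any other path falls through to <code>not_found</code>. Because handlers and middlewares are ordinary values, they can be composed, tested, and reused without any framework-specific machinery.</p>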
<p>The Dream web framework is composed of several sub-libraries with different dependencies, which allow the user to port their projects to a variety of environments according to their needs. Furthermore, Dream is unopinionated and low-level, letting users pick and choose how they want to use it. They can swap out its built-in templates for other tools (such as <a href="https://github.com/aantron/dream/tree/master/example/w-mlx">mlx</a>). Essentially, the framework aims to be easy to use but highly configurable, giving the user the choice between customisation and simplicity. More concretely, Dream provides HTML templates (for OCaml or Reason); helpers for secure cookies and CSRF-safe forms; easy HTTPS, HTTP/2 and WebSockets support (meaning it supports most modern Web transport protocols); full-stack ML with clients compiled to Melange, ReScript, or Js_of_ocaml – and more!</p>
<p>If you want to try Dream, check out the <a href="https://github.com/aantron/dream/tree/master/example#readme">tutorials</a>. We also recommend this <a href="https://ceramichacker.com/blog/28-2x-backend-webdev-w-dream-and-caqti">blog post on the Ceramic Hacker website</a>, part two in an OCaml web development series that covers using Dream as a backend. It has some concrete code examples and gives a nice overview of a project. Of course <a href="https://aantron.github.io/dream/#types">Dream's homepage</a> is also an incredibly useful resource.</p>
<h2>Interop With JS, TS, and Wasm</h2>
<p>JavaScript is the reigning monarch of programming languages for the web, powering web servers and adding interactivity and dynamic elements to web pages. With a vast array of tools and features in a mature ecosystem, JavaScript cross-compatibility is a must for languages aiming to expand onto the web. Wasm, or WebAssembly, on the other hand, is a portable compilation target that enables deployment on a variety of platforms. It is popular for its security guarantees, speed, and language and platform neutrality.</p>
<h3>Js_of_ocaml</h3>
<p><a href="https://ocsigen.org/js_of_ocaml/latest/manual/overview">Js_of_ocaml</a> compiles OCaml bytecode into JavaScript. This allows you to create dynamic and interactive elements on web pages and use tools like <a href="https://nodejs.org/en">Node.js</a>. You can install <code>js_of_ocaml</code> using <a href="https://opam.ocaml.org"><code>opam</code></a> or <a href="https://dune.build">Dune</a>, with the latter providing native support.</p>
<p>Some of the benefits of <code>js_of_ocaml</code> include its ease of use: it is simple to install and works with an existing installation of OCaml without requiring you to recompile your libraries. It also comes with existing bindings for many browser APIs and is stable and easy to maintain. Plus, <a href="https://ocsigen.org/js_of_ocaml/latest/manual/performances">performance comparisons</a> have indicated that <code>js_of_ocaml</code> typically outperforms the OCaml bytecode interpreter. Furthermore, by generating JavaScript from OCaml bytecode, <code>js_of_ocaml</code> relies on a very stable interface that allows it to easily remain compatible with new compiler releases and most of the OCaml ecosystem.</p>
<p>To try it out yourself, there's a great <a href="https://hackmd.io/@Swerve/HyhrqnFeF">js_of_ocaml tutorial</a> by Jack Strand showing you how to create an interactive animation for a website. There are also some examples of <code>js_of_ocaml</code> in action, like this <a href="https://ocsigen.org/js_of_ocaml/latest/manual/files/planet/index.html">animated 3D view of the earth</a> and a <a href="https://ocsigen.org/js_of_ocaml/latest/manual/files/boulderdash/index.html">Boulder Dash style game</a>.</p>
<h3>Wasm_of_ocaml</h3>
<p>Wasm_of_ocaml takes OCaml bytecode and transforms it into Wasm code. Originally forked from the <code>js_of_ocaml</code> compiler (and now <a href="https://github.com/ocsigen/js_of_ocaml/pull/1724">merged back</a>) it is of a similar, lightweight design. WebAssembly provides a sandboxed environment and enforces memory safety, making it popular among users who develop mission- and security-critical applications. Wasm_of_ocaml uses the WebAssembly garbage collection extension, removing the need to implement a garbage collector through other means and enabling good interoperability with JavaScript.</p>
<p>One of the compiler's biggest benefits is its speed, with some very impressive results from recent benchmarks driving renewed excitement. Compared to <code>js_of_ocaml</code>, programs compiled with <code>wasm_of_ocaml</code> are consistently faster. For example, Jane Street reported that they observed 2x-8x performance improvements using <code>wasm_of_ocaml</code> compared to <code>js_of_ocaml</code>. You can learn more about <code>wasm_of_ocaml</code> from <a href="/blog/2025-02-19-the-first-wasm-of-ocaml-release-is-out/">our blog</a> and in the <a href="https://github.com/ocaml-wasm/wasm_of_ocaml">repo's readme</a>.</p>
<h3>Melange</h3>
<p>Melange is another backend for OCaml, consisting of a set of tools that makes it capable of generating and interacting with JavaScript. The tools include a compiler and compiler libraries, which can generate JavaScript code, and the runtime, which consists of a series of supporting libraries written in JS which output 100% JS. This makes interoperability and incremental adoption much easier for new users, albeit at the small cost of needing to write OCaml or Reason in a slightly different way.</p>
<p>One of Melange's biggest strengths is that it provides support for many different tools such as the editors VSCode, Vim, and Emacs, full integration with the very popular build system Dune, and interoperability with the documentation tool <code>odoc</code> which can generate a diverse range of documentation from the user's files. Overall, Melange's integration with the <a href="https://ocaml.org/platform">OCaml Platform</a> gives the user access to the great resources of the OCaml Ecosystem.</p>
<p>Melange integrates with JavaScript through an expressive bindings language. This opens up the JavaScript ecosystem to the OCaml programmer, allowing them to use existing JavaScript packages and their own JavaScript libraries in OCaml projects and build applications that rely on features from the JavaScript ecosystem. By working well with the syntax extension format <a href="https://ocaml.org/docs/metaprogramming">ppx</a>, Melange benefits from its <a href="https://sancho.dev/blog/whats-possible-with-melange">performance, functionalities, and compatibilities</a>.</p>
<p>You can try Melange in the <a href="https://melange.re/v2.1.0/playground">Melange Playground</a> and learn more about its finer details on <a href="https://melange.re/v4.0.0/">Melange's documentation website</a>.</p>
<p>For those familiar with <a href="https://react.dev">ReactJS</a>, there is an excellent <a href="https://react-book.melange.re">online resource specifically aimed at React developers who want to learn Melange</a>. It provides a hands-on introduction to Melange with several projects and examples. Melange offers excellent support for React codebases, with clear JS outputs and first-class support for many React patterns. <a href="https://sancho.dev">David Sancho</a>, one of Melange's maintainers, regularly produces very good content on <a href="https://sancho.dev/blog">his blog</a> about the use of Melange in real-life applications, for example: <a href="https://sancho.dev/blog/server-side-rendering-react-in-ocaml">Server-side rendering React in OCaml</a>. This elegant interoperability with ReactJS demonstrates Melange's desire to interface easily with the JavaScript ecosystem, making OCaml a serious replacement for TypeScript, while continuing to benefit from the vast ecosystem of the JavaScript world!</p>
<h2>Frontend</h2>
<p>Frontend web development refers to the creation and editing of the graphical user interface of a website, covering everything from HTML editing and rendering to web design and client-side web applications that run entirely in the browser. The underlying languages for frontend web development include HTML, cascading style sheets or CSS, JavaScript and WebAssembly. There are also popular platforms for frontend web development, such as WordPress.</p>
<h3>OCaml-VDom</h3>
<p>The <a href="https://github.com/LexiFi/ocaml-vdom"><code>ocaml-vdom</code></a> library, developed by <a href="https://www.lexifi.com">LexiFi</a>, implements an <a href="https://guide.elm-lang.org/architecture/">Elm-like architecture</a> and VDOM for OCaml. The Elm architecture is a development pattern used to create interactive programs. It is a reformulation of a <a href="https://en.wikipedia.org/wiki/Moore_machine">Moore machine</a>, adapted to the construction of user interfaces. It enables the state of an application to be controlled using three ingredients:</p>
<ul>
<li>A Model: the state of the application,</li>
<li>A View: A function which takes a model and returns the representation of the UI (in the case of Elm and <code>OCaml-vdom</code>, an HTML document); this document allows messages to be propagated, triggering the final ingredient,</li>
<li>An Update: A function that takes a <code>Message</code> (propagated by the <code>View</code>) and the current model as arguments and computes a new model, which will be passed to the view to rebuild the UI.</li>
</ul>
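<p>The three ingredients above can be sketched in a few lines of plain OCaml. This is a toy counter written against the standard library only, not the <code>ocaml-vdom</code> API: the <code>msg</code> constructors, the string-based <code>view</code>, and the <code>run</code> driver are assumptions made for illustration, and a real view would return a virtual DOM whose event handlers emit messages.</p>
<pre><code class="language-ocaml">(* Model: the state of the application (here, a simple counter). *)
type model = int

(* Messages that the view can propagate. *)
type msg = Increment | Decrement | Reset

(* Update: computes a new model from a message and the current model. *)
let update (msg : msg) (model : model) : model =
  match msg with
  | Increment -> model + 1
  | Decrement -> model - 1
  | Reset -> 0

(* View: renders the model. A plain string stands in for the HTML document. *)
let view (model : model) : string = Printf.sprintf "Count: %d" model

(* Driver: feeds a list of messages through update, then renders the result.
   In a browser, the runtime would do this each time an event fires. *)
let run (init : model) (msgs : msg list) : string =
  view (List.fold_left (fun m msg -> update msg m) init msgs)
</code></pre>
<p>For example, <code>run 0 [Increment; Increment; Decrement]</code> evaluates to <code>"Count: 1"</code>. Since <code>update</code> and <code>view</code> are pure functions, the whole interface can be tested without a browser.</p>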
<p>In practice, the Elm architecture also introduces the notion of <code>Command</code> and <code>Subscription</code> to interact with discrete effects and communicate with the outside world. This approach is very expressive for describing reactive interfaces and is the result of several iterations described by the creator of Elm in the following essay: <a href="https://elm-lang.org/assets/papers/concurrent-frp.pdf">Elm: Concurrent FRP for Functional GUIs</a>. After much experimentation, he has drastically <a href="https://elm-lang.org/news/farewell-to-frp">simplified the user experience of Functional Reactive Programming</a> with the Elm architecture!</p>
<p><code>OCaml-vdom</code> is a very faithful implementation of the Elm architecture, giving OCaml the ease and expressiveness to describe rich web interfaces. To explore some examples and test <code>OCaml-VDom</code> yourself, visit <a href="https://github.com/LexiFi/ocaml-vdom">the <code>ocaml-vdom</code> library</a> and get hacking!</p>
<h3>Bonsai</h3>
<p><a href="https://github.com/janestreet/bonsai">Bonsai</a> is a client-side web framework created by <a href="https://www.janestreet.com">Jane Street</a> that lets users build web pages from the client (the browser) by creating components. For those familiar with other web development frameworks, Bonsai fills the same role as, for example, <a href="https://angular.io/">Angular</a> and <a href="https://vuejs.org">Vue</a>, but it is written entirely in OCaml.</p>
<p>Bonsai is structured as a group of components, each representing one part of the final page, available in the <code>bonsai_web_components</code> repo. One of the benefits of the framework's design is that, instead of structuring its components like a tree, it does so in a <a href="https://signalsandthreads.com/building-a-ui-framework/">"directed acyclic graph"</a>. This lets programmers create components that can 'communicate' with each other much more easily than with frameworks that use the tree structure.</p>
<p>To learn more about the framework, we definitely recommend the same article series mentioned above from the 'Ceramic Hacker' blog, especially starting from the <a href="https://ceramichacker.com/blog/30-4x-setting-up-bonsai">Setting up Bonsai</a> post. In it, Alexander Skvortsov illustrates why he found Bonsai excellent for building "safe, massively scalable, performant UIs".</p>
<h2>… And Many More</h2>
<p>There are plenty more libraries available for web development in OCaml, and we have not been able to include them all in this post. For example, <a href="https://erratique.ch/software/brr">Brr</a> provides a toolkit for programming browsers in OCaml using Js_of_ocaml; the <a href="https://github.com/let-def/lwd">Lwd</a> library enables reactive programming in the browser through a simple form of incremental computation; <a href="https://github.com/robur-coop/vif">Vif</a> is an experimental program that runs an OCaml script and launches a web server from it; <a href="https://github.com/dbuenzli/react">React</a> is a functional reactive programming library for OCaml; <a href="https://rescript-lang.org">ReScript</a> is a typed language that grew out of OCaml, now used to write fast and human-readable JavaScript for the web; and <a href="https://github.com/OCamlPro/wasocaml">Wasocaml</a> is another WebAssembly compiler developed by <a href="https://ocamlpro.com">OCamlPro</a> which works differently than Wasm_of_ocaml.</p>
<p>Furthermore, several big OCaml projects were developed in connection with extensive research done at universities, including projects like <a href="https://ocsigen.org/home/intro.html">Ocsigen</a> and <a href="https://mirage.io/">MirageOS</a> (check out <a href="https://github.com/robur-coop/unipi">Unipi</a> for a great example of a static web server built with MirageOS). For example, '<a href="https://www.irif.fr/~balat/publications/2013balat-rethinking.pdf">Rethinking Traditional Web Interaction</a>' is an interesting paper exploring the future of web development far ahead of its time, research which directly impacted Ocsigen's design!</p>
<p>While this post has focussed on application development frameworks, MirageOS offers a robust, mature, and efficient deployment solution similarly grown out of a rich academic history. When deploying a traditional application, it is common to separate it into multiple execution units (i.e., microservices). However, each of these services requires an OS (and ideally containerisation). MirageOS is a framework that allows developers to define an operating system (run on top of a hypervisor) that only runs a limited set of tasks: a unikernel. This enables the creation of very lightweight operating systems designed to perform only a restricted set of tasks, significantly reducing the attack surface and boot time. As a result, <a href="https://mirage.io">MirageOS</a> provides an advanced set of tools for development using microservices.</p>
<p>Since OCaml is an excellent choice for building compilers and build systems, there are many advanced static web site generators to choose from! Without going into too much detail, we'd like to mention <a href="https://soupault.app/">Soupault</a>, a highly flexible HTML processor that offers a great deal of freedom when it comes to building static pages; <a href="https://www.good-eris.net/stog/">Stog</a>, a mature tool with a huge number of plugins; and <a href="https://github.com/xhtmlboi/yocaml">YOCaml</a>, a very generic framework for composing your own static site generator. There are many others, and since a static site generator is a specialised version of a build system,  <a href="https://dune.build">Dune</a> can even be used as one! In any case, OCaml is a useful (and fun) choice for creating a static site, which also works well alongside MirageOS for deploying very small static servers or using Git as a static file system (directly conceivable with <a href="https://ocaml.org/p/yocaml_git/latest">Yocaml_git</a>).</p>
<p>These are just a few examples of the things we didn't cover here, and there are many more great projects out there to discover. If you think we've missed one, please let us know!</p>
<h2>Until Next Time</h2>
<p>Thank you for checking out this article on web development in OCaml. We have tried to capture as much information as possible in one place (as you can probably guess from the length of this post!) and we look forward to hearing your feedback. Have you developed any web applications using OCaml? What was your experience? Connect with us on <a href="https://bsky.app/profile/tarides.com">Bluesky</a>, <a href="https://mastodon.social/@tarides">Mastodon</a>, <a href="https://www.threads.net/@taridesltd">Threads</a>, and <a href="https://www.linkedin.com/company/tarides">LinkedIn</a>. We look forward to hearing from you!</p>
]]></description><link>https://tarides.com/blog/2025-05-15-ocaml-web-development-essential-tools-and-libraries-in-2025</link><guid isPermaLink="false">https://tarides.com/blog/2025-05-15-ocaml-web-development-essential-tools-and-libraries-in-2025.html</guid><dc:creator><![CDATA[ Isabella Leandersson, Xavier Van de Woestyne ]]></dc:creator><pubDate>Thu, 15 May 2025 00:00:00 GMT</pubDate></item><item><title><![CDATA[Tarides at BOB Konferenz 2025]]></title><description><![CDATA[<p><a href="https://bobkonf.de/2025/en/">BOB Konferenz</a> is a 10-year-old
conference whose tagline is: "<strong>The software development
conference for everyone dissatisfied with the status quo</strong>"! Indeed,
BOB is a conference that focusses on a variety of subjects that
strongly converge with the interests of Tarides (and the OCaml world
in general). It aims to cover topics such as functional programming,
"<em>fancy types</em>" (dependent types, gradual typing, linear types, ...),
formal methods for correctness and robustness, abstractions for
concurrency and parallelism, controlled side effects, next-generation
IDEs, and <a href="https://bobkonf.de/2025/en/cfc.html">much more</a>!</p>
<p>The convergence has been so strong that, over the years, some big
names from the OCaml community have shown up — like <a href="https://anil.recoil.org/">Anil
Madhavapeddy</a>, <a href="https://hannes.robur.coop/">Hannes
Mehnert</a>, <a href="https://gallium.inria.fr/~scherer/">Gabriel
Scherer</a>, and even <a href="https://xavierleroy.org/">Xavier
Leroy</a>! This is one of the many reasons why
Tarides decided to sponsor the <a href="https://bobkonf.de/2024/en/">2024
edition</a> and send <a href="https://bsky.app/profile/sabine.sh">Sabine
Schmaltz</a> and <a href="https://bsky.app/profile/leostera.com">Leandro
Ostera</a>.</p>
<p>For the 2025 edition, held on March 14, 2025, in Berlin, <a href="https://xvw.lol">Xavier Van de
Woestyne</a> had the privilege of presenting Tarides’
work on editor support for OCaml during his talk "<a href="https://bobkonf.de/2025/woestyne.html"><em>Beyond the Basics
of LSP: Advanced IDE Services for
OCaml.</em></a>" Accompanied by
<a href="https://github.com/MisterDA">Antonin Décimo</a>, who attended the
conference, Xavier travelled to BOBKonf 2025 to share <em>their insights and
experience</em>.</p>
<h2>A Wide Range of Interesting Talks</h2>
<p>The BOB program is <strong>wonderfully eclectic</strong>, and every talk is an
opportunity to discover something new! For example, after a keynote on
<a href="https://bobkonf.de/2025/bieniusa.html">Local-First software</a>, which
included many fascinating use cases with potential applications for
<a href="https://irmin.org/">Irmin</a>, we had the chance to attend talks on
<a href="https://bobkonf.de/2025/loeh.html">abstraction</a>, speculative
reasoning about functions based on their types (for instance, a
function of type <code>a -&gt; a</code> having only one possible inhabitant), the
application of <a href="https://bobkonf.de/2025/allais.html">separation logic for concurrency in
Idris</a>, and even collaborations
between engineers and mathematicians on <a href="https://bobkonf.de/2025/bailly.html">the specification of formal
methods</a>.</p>
<p>We explored the <a href="https://bobkonf.de/2025/sperber.html">functional programming counterpart to design
patterns</a> — with a strong
emphasis on <strong>the power of robust module systems</strong>, something that
deeply resonates with us as OCaml developers. That was followed by a
deep reflection on <a href="https://bobkonf.de/2025/thoma.html">object-oriented programming from a functional
programmer’s perspective</a>, a clear
explanation of <a href="https://bobkonf.de/2025/breitner.html">how recursive definitions work in
Lean</a>, and, to wrap it all up,
a guide to common pitfalls to avoid when building distributed systems
with <a href="https://bobkonf.de/2025/oerdoeg.html">microservices</a>.</p>
<p>All in all, it was an intense and inspiring day — packed with ideas
that strongly resonated with us. From our perspective, the themes
explored throughout the conference aligned closely with the
ideological and technical choices we’ve made at Tarides, particularly
our commitment to OCaml. But beyond that, many of the talks echoed the
challenges and directions of the projects we actively maintain!</p>
<h2>About our Presentation</h2>
<p>Although the goal of <a href="https://bobkonf.de/2025/slides/woestyne.pdf">our
presentation</a> (you can watch the recording <a href="https://bobkonf.de/2025/woestyne.html">on BOBKonf's website</a>) was to
discuss OCaml editor support (through
<a href="https://github.com/ocaml/merlin">Merlin</a>,
<a href="https://github.com/ocaml/ocaml-lsp">Ocaml-lsp-server</a>, and its
clients, <a href="https://github.com/ocamllabs/vscode-ocaml-platform">Visual Studio
Code</a> and
<a href="https://github.com/tarides/ocaml-eglot">Emacs</a>), we aimed to present
an approach and a set of features that wouldn’t limit our audience to
just OCaml users. Instead, we wanted to spark a conversation with
other IDE users/maintainers to share ideas and implementation
perspectives!</p>
<p>We believe the presentation was well-received, generating some very
interesting questions along with positive conversations about how
some of the ideas we presented could be applied to proof assistants
like <a href="https://isabelle.in.tum.de/">Isabelle</a>,
<a href="https://www.idris-lang.org/">Idris2</a>, and
<a href="https://agda.readthedocs.io/en/latest/getting-started/what-is-agda.html">Agda</a>.</p>
<p>There was a proposal to combine our efforts to improve the <a href="https://microsoft.github.io/language-server-protocol/">Language
Server
Protocol</a> (where the acceptance model for changes is primarily
based on voting), making it even more welcoming for certain functional
languages that leverage interactive features. From our perspective,
these were excellent and <strong>motivating</strong> responses!</p>
<h2>Meet and Greet</h2>
<p>Beyond the technical side, one of the great things about conferences
is the chance to meet people—catch up with familiar faces, make new
connections, and have meaningful conversations around topics we’re all
passionate about! From our perspective, even though the schedule is
<em>quite packed</em>, the talk slots are spaced out just enough to let us
catch our breath — but more importantly, to connect and chat with
members of the community. It really helps to foster a friendly, sociable
atmosphere!</p>
<h2>To Conclude</h2>
<p>Attending conferences is an integral part of our work as engineers—for
several important reasons:</p>
<ul>
<li>Keeping up with the latest in technology and research</li>
<li>Sharing our progress and presenting the work we’ve been doing</li>
<li>Initiating potential collaborations with people driven by similar
goals and motivations</li>
</ul>
<p>So yes, it's important — but at conferences like BOB, it’s also a real
pleasure! The talks are truly fascinating (we’re already looking
forward to the video recordings so we can catch up on what we missed),
and the interactions are incredibly motivating for our work. If, like
us, you’re interested in functional programming, fancy types, formal
methods, and many other exciting topics, don’t hesitate to check out
<a href="https://www.youtube.com/@BOBKonf/videos">BOB’s YouTube channel</a> –
and maybe even consider attending next time!</p>
<hr>
<p>Connect with Tarides online on
<a href="https://bsky.app/profile/tarides.com">Bluesky</a>,
<a href="https://mastodon.social/@tarides">Mastodon</a>,
<a href="https://www.threads.net/@taridesltd">Threads</a>, and
<a href="https://www.linkedin.com/company/tarides">LinkedIn</a> or sign up for
our mailing list to stay updated on our latest projects.</p>
]]></description><link>https://tarides.com/blog/2025-05-08-tarides-at-bob-konferenz-2025</link><guid isPermaLink="false">https://tarides.com/blog/2025-05-08-tarides-at-bob-konferenz-2025.html</guid><dc:creator><![CDATA[ Antonin Décimo, Xavier Van de Woestyne ]]></dc:creator><pubDate>Thu, 08 May 2025 00:00:00 GMT</pubDate></item><item><title><![CDATA[Feature Parity Series: Restoring the MSVC Port]]></title><description><![CDATA[<p>After the release of OCaml 5, restoring any features that were left out of the initial release has been a high priority for our teams and collaborators. We call this effort our 'feature parity' project, and <a href="/blog/2024-09-11-feature-parity-series-compaction-is-back/">compaction</a> is one example of a feature being brought back to OCaml 5 under its banner.</p>
<p>In this post, we look at another returning feature, MSVC support, and the steps along the path to implementation. If you want to skip straight to the code, check out the <a href="https://github.com/ocaml/ocaml/pull/12954">#12954 pull request</a> (and the dozen more linked from it!) in the OCaml repo. Let's dive in!</p>
<h2>MSVC Support for OCaml on Windows</h2>
<p>First, let's explain what 'MSVC support' means. In general, OCaml supports compilation to Windows through three separate toolchains: <a href="https://www.cygwin.com/">Cygwin</a>, <a href="https://www.mingw-w64.org/">mingw-w64</a>, and <a href="https://visualstudio.microsoft.com/vs/features/cplusplus/">MSVC</a>. The <code>mingw-w64</code> toolchain was available for OCaml 5 from the moment the update launched. Cygwin was restored in OCaml 5.1, but MSVC support has lagged behind until now!</p>
<p>The delay stemmed from MSVC's initial incompatibility with C11 atomics, which the OCaml 5 runtime requires. David Allsopp had been exploring possible ways to overcome this incompatibility, testing how C++ atomics worked with older compilers. Eventually, however, Microsoft introduced support for C11 atomics, albeit experimentally.</p>
<p>To restore the port, the team needed to ensure that the C11 atomics support was reliable, port <code>winpthreads</code> onto MSVC, and create a continuous integration (CI) workflow.</p>
<p>Since it is Microsoft's own C/C++ compiler, MSVC is popular and well-known to many developers and Windows users. Bringing compatibility with the compiler to OCaml 5 is an important step towards enabling more users to adopt the latest version of OCaml and explore its new features!</p>
<h3>C11 Atomics and Bug Reports</h3>
<p><a href="https://en.cppreference.com/w/c/11">C11</a> is a version of the C standard. Since the OCaml runtime is written almost entirely in C, relying on a standard lets us specify which features of the C compiler can be used to build OCaml. If we didn't rely on a standard, we would have to list supported C compiler versions individually (GCC from one version onwards, Clang from another, etc.). Instead, developers know that the OCaml runtime can be built with any C compiler that is C11-compliant.</p>
<p>For OCaml 5 and onwards, the C compiler must be C11-compliant and support C11 atomics. All we need to know about atomics for this post is that the C11 atomic spec enables the compiler to help the programmer access data that can be shared between multiple cores. Without it, the developer would need to use other synchronisation mechanisms such as mutexes, which require more code, both to write and to run. So C11 atomics go beyond what most of us associate with atomicity and are key to writing sound and efficient code in a multicore setting.</p>
<p>Once the <a href="https://learn.microsoft.com/en-gb/visualstudio/releases/2022/release-notes">Visual Studio 2022</a> release introduced experimental support for C11 atomics, it provided a much clearer path for the team to work on restoring MSVC support. This team included David Allsopp, Antonin Décimo, and Samuel Hym from Tarides, but of course, the success of the project relied on the collaboration, input, and reviews of many open-source OCaml community members. With C11 atomics support in Visual Studio 2022 being experimental, the team needed to ensure that all the sequential tests passed and identify places where parallel tests failed due to bugs in MSVC. The team created several bug reports against the MSVC compiler as a result of this project.</p>
<p>The bug reports include:</p>
<ul>
<li><strong><a href="https://developercommunity.visualstudio.com/t/C11-atomics-Pointers-to-atomic-values-/10507360">Missing Atomic Stores When Dereferencing Pointer to Atomics</a>:</strong> MSVC version 19.38.33128 lacked support for pointers to atomic values and was not emitting atomic stores when writing to a dereferenced pointer for an atomic variable.</li>
<li><strong><a href="https://developercommunity.visualstudio.com/t/C11-atomics-Compound-assignment-operat/10507357">Compound Assignment Operators are not Atomics on Pointers to Atomic Values</a>:</strong> Again, MSVC 19.38.33128 lacked support for pointers to atomic values, and this bug report highlighted that MSVC did not generate atomic code for compound assignment operators with a pointer to an atomic value.</li>
<li><strong><a href="https://developercommunity.visualstudio.com/t/C11-atomics:-Missing-atomic-stores-whe/10507356">Pointers to Atomic Values Should be Reloaded</a>:</strong> In version 19.38.33128 MSVC failed to generate atomic code, meaning that pointers were not reloaded when they needed to be, causing threads to spin indefinitely.</li>
</ul>
<p>Thanks to great support from Microsoft, these bugs were resolved, and C11 atomics support was satisfactory to enable MSVC support for multicore OCaml. This was the biggest roadblock to the project's success, and with it cleared, the team turned their attention to new challenges.</p>
<h3>Winpthreads and MSVC</h3>
<p>The next hurdle on the road to success was another Windows-specific compatibility issue. OCaml 4.* had limited support for threading in the form of the optional systhread library, but the runtime itself made no use of it. That completely changed with OCaml 5! The abstraction used to enable threading support was Unix's POSIX threads, known as 'pthreads'. At the time, the runtime was prepared in the hope that a Windows version could be implemented in the future.</p>
<p>However, the original multicore PR could use the <code>winpthreads</code> part of the <code>mingw-w64</code> library to provide a <code>pthread</code> implementation for the Windows MinGW port. The intention then was that it would be a temporary workaround allowing all the existing <code>pthreads</code> code to be reused, partly due to the belief that it would only work for <code>mingw-w64</code> and not for MSVC.</p>
<p>Upon further investigation, David discovered a <a href="https://github.com/sgeto/winpthreads-msvc">library</a> demonstrating that <code>winpthreads</code> could be compiled with MSVC without introducing too many dependencies. Samuel and Antonin worked on formalising the process of extracting the <code>winpthreads</code> sources from the <code>mingw-w64</code> project to use them for the MSVC port. Antonin also <a href="https://github.com/MisterDA/mingw-w64">contributed directly</a> to the <code>mingw-w64</code> project to patch its <code>winpthreads</code> component.</p>
<p>Thanks to this work, the initially temporary <code>winpthreads</code> workaround has been implemented as a submodule for MSVC. This lets the new MSVC port use <code>pthread.h</code> via the <code>winpthreads</code> submodule (instead of using winpthreads implicitly as provided by the <code>mingw-w64</code> GCC compiler).</p>
<p>The future of pthreads in OCaml is still up for discussion, with one school of thought being that reimplementing OCaml's use of pthreads in a more abstract way would allow its primitives to function without the full weight of the <a href="https://posix.opengroup.org/">POSIX</a> spec, resulting in better performance. Work has started to <a href="https://github.com/ocaml/ocaml/pull/13416">remove winpthreads and use modern Windows APIs for the MSVC and MinGW-w64 ports</a>.</p>
<h3>CI</h3>
<p>Finally, the team <a href="https://github.com/ocaml/ocaml/pull/12954/commits/23a3209278b3f9d6b1c68a08d735086c250d3c93">added a continuous integration workflow</a> enabling <a href="https://github.com/features/actions">GitHub Actions</a> for MSVC testing.</p>
<p>As most OCaml compiler developers use a port other than MSVC, and since there are many differences between MSVC and the other ports, being able to test MSVC in CI helps developers be confident that their modifications do not break the code. In particular, the CI workflow includes a check to make sure that the assembly used in the MSVC port (written in MASM syntax) is kept consistent with the assembly used in the MinGW port (written in GNU Assembler syntax). This check has already allowed OCaml core developers to catch a few PRs that only updated the assembly in GNU syntax, catching the problem early and preventing it from affecting the program.</p>
<h2>A Note on the Unloadable Runtime</h2>
<p>Before we leave you, let's briefly review another feature parity project, the return of the unloadable runtime. This project was paired with the return of the MSVC port as two features that were important to the community. The 'unloadable runtime' is a feature that cleans up OCaml resources, including the stack, heap sections, code fragments, buffers, tables, and more, when OCaml is used as a shared library. For example, if a host program uses OCaml as a library, when control returns to the host program, the unloadable runtime ensures proper resource cleanup.</p>
<p>The return of this feature was <a href="https://github.com/ocaml/ocaml/issues/10865#issuecomment-1559741460">requested by the community</a>, and our team worked hard to make the restoration a reality. The PR associated with this effort is <a href="https://github.com/ocaml/ocaml/pull/12964">#12964</a>, which you can check out to learn more about the process behind the changes. The PR has been merged and is expected to be released with the 5.4 update.</p>
<h2>Stay in Touch!</h2>
<p>Keep an eye out for future updates on restored features on our blog. For a broader overview of the 5.3 update, you can check out our <a href="/blog/2025-01-09-ocaml-5-3-features-and-fixes/">release blog post</a> covering the changes.</p>
<p>You can connect with us on <a href="https://bsky.app/profile/tarides.com">Bluesky</a>, <a href="https://mastodon.social/@tarides">Mastodon</a>, <a href="https://www.threads.net/@taridesltd">Threads</a>, and <a href="https://www.linkedin.com/company/tarides">LinkedIn</a> or sign up for our mailing list to stay updated on our latest projects. We look forward to hearing from you!</p>
]]></description><link>https://tarides.com/blog/2025-04-23-feature-parity-series-restoring-the-msvc-port</link><guid isPermaLink="false">https://tarides.com/blog/2025-04-23-feature-parity-series-restoring-the-msvc-port.html</guid><dc:creator><![CDATA[ Isabella Leandersson ]]></dc:creator><pubDate>Wed, 23 Apr 2025 00:00:00 GMT</pubDate></item><item><title><![CDATA[Expanding Dune Package Management to the Rest of the Ecosystem]]></title><description><![CDATA[<p>Since we published <a href="https://preview.dune.build">The Dune Developer Preview</a> a lot of things have improved on the package management front. While the developer preview has demonstrated how Dune can manage dependencies in a unified workflow, we have been working on making it practical for more projects to adopt Dune to handle their package dependencies. Our goal is to slowly move from a developer preview to a mature feature that the general public can use and rely on.</p>
<p>What do we mean by maturation? The goal is fuzzy (as with all software, it is never 'done'), but we want to get Dune package management into a shape where we can consistently recommend that people use it for their projects. They should be confident that their workflows will continue to work while unlocking the new features that <a href="https://dune.readthedocs.io/en/stable/tutorials/dune-package-management/dependencies.html">Dune package management</a> brings.</p>
<p>The core points of this are:</p>
<ul>
<li>The OCaml Platform Tools should work at least as well with Dune package management as they work with <code>opam</code>. With the new features in Dune, this interoperability should work even better as users do not have to share dependencies with the project in the local switch since tools can be installed automatically, possibly even from precompiled binaries. Do you want MDX? Declare a dependency, and voila, you have MDX.</li>
<li>Most projects can start using Dune with little to no adjustments. The majority will work out of the box, and the most frequent fix required is to correct the list of project dependencies. No substantial code changes are necessary, and all projects should continue to be compatible with both <code>opam</code> and <code>dune</code>; there is no lock-in to one tool or the other.</li>
</ul>
<p>Our goal is to successfully build as many projects as possible using Dune's package management feature. But to evaluate what we have left to do, we need to know where we stand now. This blog post will give you an overview of the project's scope and biggest challenges.</p>
<h2>Building "All" Packages</h2>
<p>What if we want to try to build all the existing OCaml packages? <code>opam-repository</code> to the rescue! While it might not include proprietary code bases, there are still a significant number of projects we can try to build with it. Fortunately, there has already been prior work done on this subject. <a href="https://github.com/ocurrent/opam-health-check/">Opam-health-check</a> is an existing tool mostly written by <a href="https://github.com/kit-ty-kate">Kate</a> that can determine whether packages can be installed on different historical, current, and future OCaml versions. It continuously monitors the state of the opam ecosystem, which inspired its name.</p>
<p>Tarides is running and maintaining multiple <code>opam-health-check</code> instances for the community. The most well-known are <a href="https://check.ci.ocaml.org">check.ci.ocaml.org</a>, which regularly builds thousands of <code>opam</code> packages on Linux; <a href="https://freebsd.check.ci.dev">freebsd.check.ci.dev</a>, which does the same thing on FreeBSD; and <a href="https://windows.check.ci.dev">windows.check.ci.dev</a>, which, as the name implies, builds packages on Windows to help us with the effort to deliver a better OCaml experience on Windows.</p>
<p>We were wondering whether we could use the tool when building with Dune instead of <code>opam</code>. Fortunately, the software is free, so we could extend the functionality to build Dune projects instead of installing opam packages. This gave rise to the next instance of opam-health-check, <a href="https://dune.check.ci.dev">dune.check.ci.dev</a>, which builds packages using Dune package management instead of <code>opam</code>.</p>
<h3>Which Packages are we Building, Actually?</h3>
<p>Wer misst, misst Mist (roughly, "whoever measures, measures rubbish"). – German proverb</p>
<p>Opam takes its installation instructions from the <code>opam</code> metadata files that are collected in <code>opam</code> repositories like <a href="https://github.com/ocaml/opam-repository"><code>opam-repository</code></a>. This is how the regular <code>opam</code> health check works: it selects (nearly all) packages and attempts to build them.</p>
<p>However, only projects that already use Dune to build can use package management. This happens because, when building a project, you need to know which dependencies to build, where these dependencies get built and installed, and which paths to pass to the compiler so it can find the modules that the dependencies install. Unlike in <code>opam</code>, the packages don't get installed into a location containing all installed libraries (a switch), but into separate directories that will be composed together when building.</p>
<p>That means we need to be a bit more selective about which packages we are going to pick for testing. Picking projects that don't use Dune will fail in 100% of the cases and will not let us draw useful conclusions besides telling us that you need Dune projects to use Dune package management, which we already know.</p>
<p>So, when determining which packages we want to include as our candidates, we need to filter the list of packages to ones that use Dune. The <code>opam-health-check</code> tool expects to call a shell command to generate the list. However, determining which packages count as 'using Dune' is more complicated, since the best way to do so would be to detect whether <code>dune build</code> is used in a package and whether the package depends on the <code>dune</code> package.</p>
<p>It's a bit fuzzy, but we decided to only include packages that depend on the 'dune' package. This leaves us with a few false positives (e.g. packages that don't support the most recent versions of Dune) and also some false negatives (packages that accidentally capture a 'dune' dependency through their own dependencies), so this will probably need a bit of revision in the future, but for now, it should be good enough.</p>
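<p>As a rough illustration of that filter, the sketch below keeps only packages that list <code>dune</code> among their direct dependencies. The package names and the <code>DIRECT_DEPS</code> table are made up for the example, and it is written in Python rather than as the shell command that <code>opam-health-check</code> actually calls; real <code>opam</code> metadata also carries version constraints, which are ignored here.</p>

```python
# Hypothetical, hand-written metadata: package name to direct dependencies.
# Real opam files are richer (version constraints, filters, and so on).
DIRECT_DEPS = {
    "pkg-a": ["ocaml", "dune"],
    "pkg-b": ["ocaml", "ocamlfind"],
    "pkg-c": ["ocaml", "pkg-a"],
}


def candidates(direct_deps):
    """Return the sorted names of packages that list 'dune' as a direct dependency."""
    return sorted(name for name, deps in direct_deps.items() if "dune" in deps)


print(candidates(DIRECT_DEPS))
```

<p>Here <code>pkg-c</code> only reaches <code>dune</code> through <code>pkg-a</code>, so the filter leaves it out even if it is itself built with Dune, which is the kind of false negative mentioned above.</p>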
<h3>What About the Rest?</h3>
<p>A significant number of projects use Dune, but far from all of them do. While we can't build the others directly because every build system works differently, all <code>opam</code> packages can be used as dependencies and should <em>just work</em>.</p>
<p>How do we know this? We ran different kinds of tests beforehand, using an internal tool that is quite similar to, but less sophisticated than, <code>opam-health-check</code>. In a previous run on OCaml 4.14, we tested using an <code>opam</code> package as a dependency, attempting to build a project, and then checking the results. For that test, we selected 2505 <code>opam</code> packages (since they were compatible with 4.14, <code>opam install</code> could find a solution) and ran it over a few days. Ultimately, we only had 36 failures; thus, our success rate was a whopping 98%! This means that users can safely start using Dune for package management in their projects as the overwhelming majority of dependencies are compatible.</p>
<h2>What is Building a Package, Really?</h2>
<p>The biggest challenge is that much of the package metadata in the source archives is incorrect. As a result, <code>dune pkg lock</code> almost certainly picks invalid versions of dependencies. Why is that?</p>
<h3>Dependencies Galore</h3>
<p>Opam installs packages by inspecting the files in its own metadata repository, <code>opam-repository</code>. This repository is created by authors submitting their packages on release, and from there on, it is maintained by the <code>opam-repository</code> maintainers. They will make sure to add dependencies that have been accidentally left out or adjust when new, incompatible versions of dependencies get published. Older package definitions will be updated to include upper version constraints.</p>
<p>However, if we check out a repository via git or download the source archive and try to build it with Dune, we don't have all these updates. Without them, many packages will fail to build (be it with opam or Dune).</p>
<p>These issues can often be fixed very easily by the author of the package, and having Dune fail to build packages due to invalid dependencies is very disappointing. If the dependencies were to be fixed, the project would either work just fine with Dune package management (success, hooray!) or at least fail with a more interesting error. Marking it as a dependency failure does us a disservice by hiding potential errors.</p>
<p>Our hack to test for Dune package management compatibility rather than accurate dependency declarations was to replace the dependencies from the source archive with information from <code>opam-repository</code>. This was a two-step process:</p>
<ol>
<li>Overwriting the <code>opam</code> files with the opam files from <code>opam-repository</code>.</li>
<li>Removing the dependency information from <code>dune-project</code> because Dune prioritises the information in this file by default.</li>
</ol>
<p>Step two had an additional challenge as the <code>dune-project</code> file is in S-expression syntax, but the usual helpful processing tools like <code>jq</code> do not support S-expressions. So, we used Jane Street's <a href="https://github.com/janestreet/sexp">sexp</a> tool to do the processing, along with a generous helping of common Unix shell tools.</p>
<p>This is not to say that users should be migrating their dependency specifications out of <code>dune-project</code> (they shouldn't), but for our automated processing it was easier to take the updated opam files and use them as-is, instead of migrating them back into the <code>dune-project</code> syntax.</p>
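<p>To give an idea of what step two involves, here is a minimal sketch that removes every <code>(depends ...)</code> form from a <code>dune-project</code> string by matching parentheses. It is not the pipeline described above, which relied on the <code>sexp</code> tool and shell utilities; the function name <code>strip_field</code> and the sample project are hypothetical, and the scanner deliberately ignores quoted strings and comments that might themselves contain parentheses.</p>

```python
def strip_field(text, field):
    """Remove every balanced '(field ...)' form from an S-expression string.

    Simplification: parentheses inside quoted strings or comments are not
    treated specially, so this is only suitable for simple inputs.
    """
    marker = "(" + field
    out = []
    i = 0
    while i != len(text):
        end = i + len(marker)
        starts_field = text.startswith(marker, i) and (
            end == len(text) or text[end] in " \t\n()"
        )
        if starts_field:
            # Skip forward until the parenthesis that opened the field is closed.
            depth = 0
            while i != len(text):
                if text[i] == "(":
                    depth += 1
                elif text[i] == ")":
                    depth -= 1
                i += 1
                if depth == 0:
                    break
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


# Hypothetical dune-project contents, for illustration only.
SAMPLE = (
    "(lang dune 3.17)\n"
    "(package\n"
    " (name demo)\n"
    " (depends ocaml dune (alcotest :with-test)))\n"
)

stripped = strip_field(SAMPLE, "depends")
print(stripped)
```

<p>With the <code>depends</code> field gone, the dependency information has to come from the <code>opam</code> file copied over in step one.</p>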
<h3>What is a Package, Actually?</h3>
<p>When <code>opam</code> builds a package that uses Dune, it usually calls <code>dune build -p &lt;package-name&gt;</code>, which makes Dune ignore everything in the source repository that is not attached to the package name. However, it doesn't work for the health check, as you want all projects in the source archive to be built, not just the current one that is to be tested. But you also don't want to build every package from the source archive, as that might introduce additional dependencies and unrelated failures. Likewise, you don't want to build code that is not part of any package (e.g. examples, benchmark, utilities).</p>
<p>In the end, we solve it by determining the internal dependencies of the project to be built and then collecting these dependencies. We start the build by calling <code>dune build --only-packages &lt;packages-discovered&gt;</code> to restrict the build to only these packages.</p>
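<p>The collection step can be sketched as a small graph traversal: starting from the package under test, follow dependencies but only keep those defined in the same source archive. The package names, the <code>DEPS</code> table, and the helper <code>internal_closure</code> are hypothetical, and the sketch assumes that <code>--only-packages</code> accepts a comma-separated list.</p>

```python
def internal_closure(target, deps, local_packages):
    """Collect the target plus every package from the same source archive
    that it depends on, directly or transitively."""
    selected = set()
    todo = [target]
    while todo:
        name = todo.pop()
        if name in selected or name not in local_packages:
            continue  # already handled, or provided from outside the archive
        selected.add(name)
        todo.extend(deps.get(name, []))
    return sorted(selected)


# Hypothetical archive defining three packages; 'fmt' comes from elsewhere.
LOCAL = {"demo", "demo-core", "demo-bench"}
DEPS = {
    "demo": ["demo-core", "fmt"],
    "demo-core": ["fmt"],
    "demo-bench": ["demo"],
}

packages = internal_closure("demo", DEPS, LOCAL)
print("dune build --only-packages " + ",".join(packages))
```

<p>In this example <code>demo-bench</code> is left out because <code>demo</code> does not depend on it, so its extra dependencies and potential failures stay out of the test.</p>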
<h2>Ok, Ok, but Show Me the Results!</h2>
<p><img sizes="(min-width: 1360px) 1360px, (min-width: 680px) 680px, 100vw" srcset="/blog/images/dunepkgscreen-170w~1TZEWWngxwWZsoqI6dxswg.webp 170w, /blog/images/dunepkgscreen-340w~g14Dv1h3bLnE2yJArixSZA.webp 340w, /blog/images/dunepkgscreen-680w~PUjt3ohsCeVZPn7jMJWw0A.webp 680w, /blog/images/dunepkgscreen-1360w~CIZK07raZiNDJPM-1kezpw.webp 1360w" src="/blog/images/dunepkgscreen-1360w~CIZK07raZiNDJPM-1kezpw.webp" alt="Output of a run of opam-health-check building Dune packages, ordered by amount of reverse dependencies"></p>
<p>The output of these runs is published on <a href="https://dune.check.ci.dev/">dune.check.ci.dev</a>, where we build the candidate packages on Linux amd64 using the <a href="https://preview.dune.build/">Dune developer preview binaries</a>. We chose this platform because it will give us the biggest set of candidates since most packages are developed on systems similar to it. On the website, you can see all the packages we selected and the result of the build. At the time of writing, we have selected 2243 packages to build and 1866 have completed the build successfully, which gives us an 83% success rate in building projects directly! For the remaining 377 packages, the failures can be seen when clicking the entries since <code>opam-health-check</code> keeps logs of all the builds. It is our main tool to determine which issues to tackle next. As we go forward, we expect the success rate to rise to match <code>opam</code> as closely as possible!</p>
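<p>For readers who want to check the percentages, the short calculation below recomputes both success rates from the counts quoted in this post; the function and variable names are ours, chosen for the example.</p>

```python
def success_rate(total, succeeded):
    """Percentage of builds that succeeded."""
    return 100 * succeeded / total


# Counts quoted in this post.
dependency_test = success_rate(2505, 2505 - 36)  # opam packages used as dependencies
direct_build = success_rate(2243, 1866)          # packages built directly with Dune
remaining = 2243 - 1866                          # direct builds still failing

print(f"{dependency_test:.1f}% {direct_build:.1f}% {remaining}")
```

<p>Rounded down, these are the 98% and 83% figures given in the text, with 377 packages left to investigate.</p>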
<h2>Where Do We Go From Here?</h2>
<p>Now that we have <code>opam-health-check</code> running and reporting build successes and failures, we can look into the build issues that it has revealed. A lot of them were small stumbling blocks which could have nevertheless been blockers to adoption:</p>
<ul>
<li>Perhaps the simplest issue arose from Dune not supporting packages distributed in ZIP archives. Due to OCaml's strong origins on Unix, most packages are distributed as compressed tarballs (often <code>gzip</code> or <code>bzip2</code> compressed). However, especially on Windows, the ZIP format is more popular and is also supported in <code>opam</code>. In <a href="https://github.com/ocaml/dune/pull/11511">#11511</a>, we added Dune support for uncompressing ZIP files. We usually call programs to decompress the data to avoid shipping implementations of compression algorithms. However, to use these programs, they need to be available, and what is available depends on the platform. The simplest command to call is <code>unzip</code> from the Info-ZIP project. Still, on some platforms, the <code>tar</code> command also supports decompressing ZIP files as if they were tarballs, so we try to use whatever the user might have available.</li>
<li>When pinning a package, we assume it uses Dune. This works most of the time because a significant number of packages use Dune to build, but if a package does not, we will have to build and install it using the commands that it declares in its <code>opam</code> file. <a href="https://github.com/ocaml/dune/pull/11513">#11513</a> does just that. It extracts the commands when pinned and uses them when the pinned package needs to be built.</li>
<li>A somewhat obscure semantic of the way dependencies and conflicts are represented in <code>opam</code> files is that packages which are dependencies are implicitly conjunctions (depending on <code>foo</code>, <code>bar</code> means depending on <code>foo</code> AND <code>bar</code>); however, for conflicts, they are implicitly disjunctions (conflicting with <code>foo</code> and <code>bar</code> means to conflict with <code>foo</code> OR <code>bar</code>). This makes a lot of sense intuitively but is easily forgotten. Dune used to accept a conflict only if all packages were conflicting, and this behaviour flew under the radar for a long time because conflicts are rare. Most of the time, the conflict is only a single package, in which case it doesn't make a difference. This was fixed in <a href="https://github.com/ocaml/dune/pull/11515">#11515</a>, which also simplified the code.</li>
<li>When solving a project's dependencies, the solver has to go through all of them and find a solution that satisfies all constraints, or it will display an error. These constraints are usually declared in your <code>dune-project</code> or <code>.opam</code> files, but when using Dune package management, there is an additional constraint: the solution needs to be buildable with the currently running version of Dune. Unfortunately, when no solution could satisfy that additional constraint, the solver would crash. In <a href="https://github.com/ocaml/dune/pull/11554">#11554</a>, we solved the issue to some degree: instead of crashing, the solver will display an error message, which will hopefully make it clearer why it can't find a solution.</li>
<li>Opam has a little-known but very useful feature when declaring package dependencies. Instead of depending on a specific version, the user can use the current version of the package as a variable. This allows projects that consist of multiple packages to depend on each other without having to update all dependencies on every release (an example of this is <code>ocaml-zmq</code>, which comes with <code>async</code> and <code>lwt</code> variants which depend on a common core). However, these constraints don't matter much when building the packages, so we always set the version to <code>dev</code>. Unfortunately, this can cause subtle issues where no solution can be found, so in <a href="https://github.com/ocaml/dune/pull/11517">#11517</a>, the code was changed to attempt to read the <code>version</code> fields to populate the variable with the value the user declared.</li>
<li>At the moment, Dune handles the compiler in a special way. When attempting to build the compiler, instead of building it in the project, it will build it in a separate location in the user's home directory. This is due to the fact that the compiler can't be moved to a different location at the moment (work is underway to improve the situation - that effort is called "relocatable OCaml"). How OCaml 5.3.0 is packaged in <code>opam-repository</code> changed and introduced a new transitive dependency for the compiler. Thus, the code would not be able to properly detect which <code>opam</code> package is the compiler. This was fixed in <a href="https://github.com/ocaml/dune/pull/11310">#11310</a> by computing the dependency cone of all possible compiler packages that are currently used to detect which package contains the compiler.</li>
<li>Opam has a way to mark a package as 'do not pick this package unless requested explicitly' - <code>avoid-version</code>. This is, for example, used to mark beta versions of packages that can be installed manually but should not be automatically picked. The solver in Dune does not have such a feature. Originally, Dune sorted these packages to the end of the candidate list, but that did not match the semantics of <code>opam</code>, so Dune then treated them as forbidden dependencies instead. However, some older packages failed to build without access to these dependencies, so <a href="https://github.com/ocaml/dune/pull/11494">#11494</a> was implemented where, instead of failing, the solver tries to minimise the number of dependencies picked that have the <code>avoid-version</code> flag.</li>
<li>Findlib, the tool whose package specification format is prevalent in the OCaml ecosystem and is also used by Dune, has a feature where parts of packages are installed in subdirectories. These subdirectories can also be optional when certain package features are enabled or disabled during building. It is a rare feature, but some real-world packages use it. Unfortunately, Dune would always assume that these directories existed if they were declared and try to read their contents. But if the directory does not exist (e.g. the feature is disabled), this would lead to a crash. The fix in <a href="https://github.com/ocaml/dune/pull/11569">#11569</a> is short and shows that all bugs are shallow if enough eyes inspect the code.</li>
</ul>
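<p>To make the difference between dependency and conflict semantics from the list above concrete, here is a minimal sketch. It is not Dune's solver code: the set-of-names model and the function names <code>depends_satisfied</code>, <code>conflict_triggered</code>, and <code>conflict_triggered_old</code> are hypothetical and only illustrate the AND versus OR reading.</p>
<pre><code class="language-ocaml">(* Hypothetical model: a solution is the set of installed package names. *)
module S = Set.Make (String)

(* Dependencies are conjunctive: every listed package must be installed. *)
let depends_satisfied ~installed deps =
  List.for_all (fun p -&gt; S.mem p installed) deps

(* Conflicts are disjunctive: one installed package from the list is enough. *)
let conflict_triggered ~installed conflicts =
  List.exists (fun p -&gt; S.mem p installed) conflicts

(* Model of the old behaviour described above: all of them had to be present. *)
let conflict_triggered_old ~installed conflicts =
  conflicts &lt;&gt; [] &amp;&amp; List.for_all (fun p -&gt; S.mem p installed) conflicts

let () =
  let installed = S.of_list [ "foo"; "baz" ] in
  let pkgs = [ "foo"; "bar" ] in
  Printf.printf "depends satisfied: %b\n" (depends_satisfied ~installed pkgs);
  Printf.printf "conflict (OR reading): %b\n" (conflict_triggered ~installed pkgs);
  Printf.printf "conflict (old AND reading): %b\n"
    (conflict_triggered_old ~installed pkgs)
</code></pre>
<p>With only <code>foo</code> installed out of <code>foo</code> and <code>bar</code>, the two readings of a conflict disagree, which is the kind of case that stays hidden while most conflicts list a single package.</p>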
<p>Fixing these issues has gotten us to an (at the time of writing) 83% success rate in building projects according to <code>opam-health-check</code>. That's a pretty good result and makes us confident that the package management feature is on the right track.</p>
<p>The issues above, as well as future issues related to package coverage and their status, are collected in a <a href="https://github.com/ocaml/dune/issues/11601">tracking issue</a> on the Dune bug tracker.</p>
<h2>How You Can Help</h2>
<p>If you want to take part in improving our OCaml ecosystem to have a simple, one-stop shop for building and installing packages, check out <a href="https://preview.dune.build/">the nightly developer preview</a> and try it with your projects. The team is looking for feedback on how they can improve Dune package management, so please share your thoughts on <a href="https://discuss.ocaml.org">Discuss</a>, and report any issues on <a href="https://github.com/ocaml/dune/issues">GitHub</a>!</p>
<p>Stay in touch with us on <a href="https://bsky.app/profile/tarides.com">Bluesky</a>, <a href="https://mastodon.social/@tarides">Mastodon</a>, <a href="https://www.threads.net/@taridesltd">Threads</a>, and <a href="https://www.linkedin.com/company/tarides">LinkedIn</a>. We look forward to hearing from you!</p>
]]></description><link>https://tarides.com/blog/2025-04-11-expanding-dune-package-management-to-the-rest-of-the-ecosystem</link><guid isPermaLink="false">https://tarides.com/blog/2025-04-11-expanding-dune-package-management-to-the-rest-of-the-ecosystem.html</guid><dc:creator><![CDATA[ Marek Kubica ]]></dc:creator><pubDate>Fri, 11 Apr 2025 00:00:00 GMT</pubDate></item><item><title><![CDATA[OCaml in Space: SpaceOS is on a Satellite!]]></title><description><![CDATA[<p>OCaml is in space! With its impressive performance and security guarantees, OCaml is a great choice for the many interconnected devices that power our world. Satellites are not only crucial to the functioning of these devices, but the <a href="/blog/2023-07-31-ocaml-in-space-welcome-spaceos/">new generation of satellites</a> are beginning to function like Cloud servers, where one device hosts more than one software and performs more than one service.</p>
<p>The natural next step, considering the growing need for agile multi-purpose satellites and the suitability of OCaml-based solutions, is to put OCaml to work in space! Following up on our sister company <a href="https://parsimoni.co/index.html">Parsimoni</a>’s SpaceOS project, there has been an <a href="https://parsimoni.co/blog/2025-02-11-parsimoni-to-demonstrate-its-spaceos-in-orbit-on-clustergate-1.html">exciting development</a> on this front. Parsimoni has partnered with <a href="https://www.dphispace.com/">DPhi Space</a> and put <a href="https://parsimoni.co/blog/2025-03-24-spaceos-is-now-in-orbit-on-clustergate-1.html">SpaceOS software aboard their Clustergate ride-sharing platform for hosted payloads</a>. OCaml launched into space aboard <a href="https://nextspaceflight.com/launches/details/7136">Transporter-13</a> on the 15th of March.</p>
<h2>The Clustergate Platform</h2>
<p>Clustergate is a payload platform developed by DPhi Space and deployed on a host satellite. The goal of this platform is to offer the power and computing capabilities of a larger satellite to cube-sat-sized payloads at a lower cost, where customers only pay for what they need. Making satellite deployment more accessible and agile will change the future of space innovation, incentivising new actors and services.</p>
<p>Parsimoni is changing the way that satellite software is designed, centring on the security, efficiency, and scalability of satellite payload management. By utilising <a href="https://mirage.io/">unikernel technology written in OCaml</a>, SpaceOS can host multiple applications with a reduced attack surface, safe from a multitude of common security vulnerabilities, without the overhead of typical virtual machines.</p>
<h2>What’s on the Satellite?</h2>
<p>DPhi Space has embedded its own Clustergate computer on Transporter-13, and the team behind SpaceOS has onboarded OCaml 5 software on the satellite. As part of a larger ‘rideshare’ mission, DPhi Space’s computer hosts a variety of software and hardware from different partners.</p>
<p>So, what OCaml code is on the satellite? All in all, the team have included a simple version of Petrel to manage unikernels, the necessary ‘glue code’ to give unikernels access to orbital data, the basic functionality needed to manage data transfers and send commands, plus a ‘hello world’ unikernel. Petrel is an experimental unikernel manager and orchestrator (written in OCaml 5 with <a href="https://github.com/ocaml-multicore/eio">Eio</a> concurrency) based on <a href="https://github.com/robur-coop/albatross">Albatross</a> by <a href="https://robur.coop/">Robur</a>. Instead of hardware virtualisation (which is not available on this flight), our team uses the <code>solo5-spt</code> backend of MirageOS as the unikernel runtime, which leverages Linux’s <code>seccomp</code> feature to isolate the software payloads.</p>
<p>During the mission, they will test whether the system works by sending new MirageOS unikernels using the data onboard. Parsimoni expects to start testing the software onboard in May. The goal is to show SpaceOS in action, sending and receiving interesting data and deploying self-contained applications on a limited bandwidth. They will start with the hello world and move on to more complex tasks utilising orbital data.</p>
<h2>Until Next Time</h2>
<p>You can watch <a href="https://www.youtube.com/watch?v=zLxXLmTzHkk">the launch of Transporter-13</a> to see the moment when OCaml travels through the atmosphere! If you want to discuss the opportunities that SpaceOS offers for deploying specialised and secure applications that use limited resources efficiently, you can <a href="/contact/">contact us</a> or <a href="https://parsimoni.co/index.html#contact">Parsimoni</a> to find out more.</p>
<p>Connect with Tarides online on <a href="https://bsky.app/profile/tarides.com">Bluesky</a>, <a href="https://mastodon.social/@tarides">Mastodon</a>, <a href="https://www.threads.net/@taridesltd">Threads</a>, and <a href="https://www.linkedin.com/company/tarides">LinkedIn</a> or sign up for our mailing list to stay updated on our latest projects.</p>
]]></description><link>https://tarides.com/blog/2025-04-03-ocaml-in-space-spaceos-is-on-a-satellite</link><guid isPermaLink="false">https://tarides.com/blog/2025-04-03-ocaml-in-space-spaceos-is-on-a-satellite.html</guid><dc:creator><![CDATA[ Isabella Leandersson ]]></dc:creator><pubDate>Thu, 03 Apr 2025 00:00:00 GMT</pubDate></item><item><title><![CDATA[ FOSDEM 2025: Report from the Friendly Functional Languages BOF Room]]></title><description><![CDATA[<p>On Thursday, January 30, 2025, I spontaneously decided to join three of my colleagues, Jules Aguillon, Xavier Van de Woestyne, and Paul-Elliot Angles d'Auriac, in attending <a href="https://fosdem.org/2025/">FOSDEM 2025</a>.</p>
<p>When I realised that FOSDEM was still taking proposals for Birds-of-a-Feather (BOF) sessions, I submitted a proposal to organise a Functional Languages BOF session around the idea of gathering the Functional Programming community to showcase projects and elegant solutions to real world programming problems.</p>
<p>I was surprised and happy when the proposal was accepted by the FOSDEM organisers late Saturday night, during the conference, leaving just enough time to prepare and call in the Functional Programming community at FOSDEM for our Sunday afternoon session. Thankfully, FOSDEM runs a Matrix chat server for the conference, so it was simple to announce this last-minute addition to the conference schedule.</p>
<h2>Arriving at the Friendly Functional Languages BOF Room</h2>
<p>On Sunday afternoon, almost 50 functional programming enthusiasts filled room H.3242 for our "Friendly Functional Languages Show and Tell" session. The turnout exceeded our expectations by far and represented a diverse cross-section of the functional programming community. We had developers from various language communities including <a href="https://ocaml.org/">OCaml</a>, <a href="https://gleam.run/">Gleam</a>, <a href="https://elm-lang.org/">Elm</a>, <a href="https://elixir-lang.org/">Elixir</a>, <a href="https://www.haskell.org/">Haskell</a>, and several others.</p>
<h2>Show and Tell</h2>
<p>The format was intentionally casual – a space for practitioners to share real-world code they're proud of and discuss practical applications of functional programming principles.</p>
<p>During these open sessions, participants presented programming techniques, API choices, interaction with an IDE, concepts inherent to programming and, of course, their projects! We saw parser combinators (live-coded in Haskell), the use of a monad to implement ‘undo’ functionality over composable operations (which Paul-Elliot presented in the context of his personal OCaml project Slipshow), compile-time SQL query generation in Gleam, effect abstraction in Haskell, and a turn-based videogame with a frontend built on Elm. It was clear that people are active and that they have a lot to say in 5 minutes!</p>
<p>I found it very interesting to see people demonstrate how functional programming can elegantly solve complex problems.</p>
<h2>Looking Forward: Organising an FP Dev Room 2026</h2>
<p>Perhaps the most exciting outcome of our BOF session was the discussion we initiated towards the end of the session about establishing a Functional Programming dev room at FOSDEM 2026. Since only the most mainstream programming languages have a realistic chance at having their application for a dev room at FOSDEM accepted, many attendees expressed interest in creating a space for FP languages next year, and several volunteered to help organise it.</p>
<p>We did a quick brainstorming session on what kinds of sessions an FP dev room at FOSDEM should host in order to achieve more visibility of functional programming in the Open Source scene. A major theme that emerged is that we need to show what can be and is being built with these languages in production environments. For example, OCaml is used by <a href="https://ocaml.org/success-stories/peta-byte-scale-web-crawler">Ahrefs</a> to build their leading SEO platform and by <a href="https://ocaml.org/success-stories/large-scale-trading-system">Jane Street</a> to power their trading operations, which handled $17 trillion worth of securities trades in 2020 using a codebase of 65 million lines of OCaml. <a href="https://www.erlang.org">Erlang</a> is another example, as it is <a href="https://github.com/WhatsApp">used to power WhatsApp</a>, supporting billions of active users.</p>
<h2>Engaging the Open Source Community</h2>
<p>As a Developer Advocate for OCaml at Tarides, I was happy to connect with developers and organisations within the Open Source software ecosystem. Several developers curious about OCaml volunteered to participate in recorded user testing sessions for the OCaml tooling we're developing at Tarides.</p>
<p>These kinds of direct interactions are invaluable for understanding how developers approach OCaml, what challenges they face, and what opportunities exist to make the language more accessible and powerful. People's willingness to contribute time to help improve the ecosystem speaks volumes about the collaborative spirit of the open-source community.</p>
<h2>Final Thoughts</h2>
<p>Organising a BOF session at FOSDEM was a somewhat spontaneous decision, but it proved to be an excellent opportunity to bring together functional programming enthusiasts in an informal setting.</p>
<p>I'm looking forward to seeing how the seeds planted during this session grow into a more established functional programming presence at FOSDEM 2026. If you're interested in joining the organising team for next year's prospective FP dev room at FOSDEM, feel free to reach out to <a href="mailto:sabine@tarides.com">sabine@tarides.com</a>.</p>
<p>Thank you to everyone who attended and made the session such a success!</p>
<p>Connect with Tarides online on <a href="https://bsky.app/profile/tarides.com">Bluesky</a>, <a href="https://mastodon.social/@tarides">Mastodon</a>, <a href="https://www.threads.net/@taridesltd">Threads</a>, and <a href="https://www.linkedin.com/company/tarides">LinkedIn</a> or sign up for our mailing list to stay updated on our latest projects.</p>
<p><em>Sabine Schmaltz is a Developer Advocate at Tarides, focusing on OCaml community engagement and developer experience.</em></p>
]]></description><link>https://tarides.com/blog/2025-03-28-fosdem-2025-report-from-the-friendly-functional-languages-bof-room</link><guid isPermaLink="false">https://tarides.com/blog/2025-03-28-fosdem-2025-report-from-the-friendly-functional-languages-bof-room.html</guid><dc:creator><![CDATA[ Sabine Schmaltz ]]></dc:creator><pubDate>Fri, 28 Mar 2025 00:00:00 GMT</pubDate></item><item><title><![CDATA[We're Moving Ocsigen from Lwt to Eio!]]></title><description><![CDATA[<p>Among the <a href="/blog/2023-03-02-the-journey-to-ocaml-multicore-bringing-big-ideas-to-life/">big changes</a> that came with OCaml 5, concurrency via effect handlers was introduced alongside the I/O library <a href="https://github.com/ocaml-multicore/eio">Eio</a>, letting users take advantage of effects to write <a href="/blog/2024-09-19-eio-from-a-user-s-perspective-an-interview-with-simon-grondin/">more efficient</a> concurrent programs. In an exciting new project, we are transitioning one of the biggest OCaml open source projects, Ocsigen, from <a href="https://ocsigen.org/lwt/latest/manual/manual"><code>Lwt</code></a> concurrency to concurrency using effects.</p>
<p>The most exciting part of this project is that we will develop tools to automate parts of the transition and document how we achieve it, which will be great resources for the wider OCaml community. This work is made possible thanks to a grant from the <a href="https://nlnet.nl/">NLnet Foundation</a>, which funds research and development projects furthering internet technologies and the open internet, and the <a href="https://nlnet.nl/core/">NGI Zero Core fund</a> of the European commission. This post will give you an overview of the tools, the goals of the project, and some of the methods we will use.</p>
<h2>Why Ocsigen and Why Eio?</h2>
<p><a href="https://ocsigen.org">Ocsigen</a> is a web and mobile framework composed of several projects and libraries including <a href="https://ocsigen.org/eliom/latest/manual/overview">Eliom</a>, <a href="https://ocsigen.org/js_of_ocaml/latest/manual/overview">Js_of_ocaml</a>, <a href="https://ocsigen.org/ocsigenserver/latest/manual/quickstart">Ocsigen Server</a>, and <a href="https://ocsigen.org/lwt/latest/manual/manual">Lwt</a>. Ocsigen lets you build a variety of applications, from simple server-side web sites to complex client-server web and mobile apps. It is built using OCaml and benefits from its strong type system to reduce development time, simplify refactoring, and reduce the likelihood of bugs. It is one of the biggest open source projects in OCaml, and is used commercially to run the <a href="https://www.besport.com/group/10902">BeSport</a> app.</p>
<p>Choosing a large and established project like Ocsigen will give the community a well-documented proof-of-concept of what the transition between different concurrency models looks like.</p>
<p>Lwt, a monadic-style concurrent programming library for OCaml, is developed as part of the Ocsigen umbrella. It has served as a way of managing I/O operations using promises for many different OCaml projects, and it is significant that Lwt’s own inventor and biggest user Ocsigen is now making the switch to effects.</p>
<p>The monadic style has several advantages over more traditional concurrency models (like preemptive threading or interaction loops), including fewer data races and straightforward writing, but it also comes with drawbacks in comparison to direct-style effects-based concurrency, namely creating an abundance of heap allocations and introducing the <a href="https://journal.stuffwithstuff.com/2015/02/01/what-color-is-your-function/">function colouring problem</a> to the user’s programs. With direct-style concurrency it is possible to write code in a natural, direct style, as opposed to a callback style that requires keeping track of which code is concurrent and which is not. For more information about concurrency using effect handlers, check out our <a href="/blog/2024-03-20-eio-1-0-release-introducing-a-new-effects-based-i-o-library-for-ocaml/">blog post on Eio</a>.</p>
<p>Switching to effects is also a first step towards enabling the use of multicore features, which is not fully possible with Lwt.</p>
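<p>The contrast between the two styles can be sketched without any external library. In the snippet below, the <code>Promise</code> module is a toy stand-in with the same shape as a monadic concurrency API (it is an assumption for illustration, not Lwt's implementation), and <code>fetch_user_monadic</code> and <code>fetch_user_direct</code> are hypothetical operations.</p>
<pre><code class="language-ocaml">(* Toy promise type: an assumption for illustration, not Lwt itself. *)
module Promise = struct
  type 'a t = Resolved of 'a
  let return x = Resolved x
  let bind (Resolved x) f = f x
  let run (Resolved x) = x
end

let ( &gt;&gt;= ) = Promise.bind

(* Hypothetical I/O operations, one per style. *)
let fetch_user_monadic id = Promise.return ("user" ^ string_of_int id)
let fetch_user_direct id = "user" ^ string_of_int id

(* Monadic style: every step is a callback passed to bind, and the return
   type is "coloured" by the promise, so callers must be monadic too. *)
let greeting_monadic id : string Promise.t =
  fetch_user_monadic id &gt;&gt;= fun name -&gt;
  Promise.return (String.length name) &gt;&gt;= fun len -&gt;
  Promise.return (Printf.sprintf "%s (%d chars)" name len)

(* Direct style: plain let-bindings and a plain return type. *)
let greeting_direct id : string =
  let name = fetch_user_direct id in
  let len = String.length name in
  Printf.sprintf "%s (%d chars)" name len

let () =
  print_endline (Promise.run (greeting_monadic 7));
  print_endline (greeting_direct 7)
</code></pre>
<p>Both functions compute the same string, but only the monadic one forces its type onto every caller, which is the colouring problem in miniature.</p>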
<h2>Goals of the Project</h2>
<p>The aim of the project is to create a straightforward path for developers who want to transition projects to OCaml 5 and its new direct-style concurrency libraries. To this end, the team will develop tools for rewriting monadic syntax into direct style, rewriting interfaces, and automating the use of libraries like <a href="https://ocaml.org/p/picos/0.4.0/doc/Picos_lwt/index.html">picos-lwt</a> and <a href="https://github.com/ocaml-multicore/lwt_eio">eio-lwt</a>. They will also develop heuristics for detecting places in the code where manual intervention is required, simplifying the developer workflow.</p>
<p>Put simply, we are going to:</p>
<ul>
<li>Automate the aspects of transforming monadic style concurrency to direct style concurrency that we can,</li>
<li>Make manual intervention as smooth as possible,</li>
<li>Document the process so that it is easy to replicate, troubleshoot, and adapt for other projects.</li>
</ul>
<p>Making effects-based concurrency easier to adopt means that more OCaml developers can potentially take advantage of its benefits. Speaking from experience, Simon Grondin summed up his experiments with Eio in our <a href="/blog/2024-09-19-eio-from-a-user-s-perspective-an-interview-with-simon-grondin/">interview with him from 2024</a>:</p>
<blockquote>
<p>Eio helped me reason about my code, and I discovered bugs and problems because of how much Eio had cleaned up the code. I uncovered hidden bugs in every program I converted from Lwt to Eio. Every single one also ended up being faster, not because Eio itself was faster (it was as fast as Lwt), but because of the optimisations I could now afford to make, thanks to the reduced complexity.</p>
</blockquote>
<p>This project will enable more people to try Eio and see if their experience matches Simon’s, with the potential to significantly improve their workflow!</p>
<h2>Challenges and Methods</h2>
<p>One of the biggest challenges facing this work is that an effect-based library has an explicit fork operation to create a new thread, whereas forking is implicit with <code>Lwt</code>, which makes it hard to detect. This fact alone is the reason why the team won’t be able to write a fully automated conversion tool.</p>
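<p>As a rough illustration of what an explicit fork looks like with effects, here is a toy cooperative scheduler built on the OCaml 5 standard library's <code>Effect</code> module. It is a sketch, not Eio's implementation: the <code>Fork</code> and <code>Yield</code> effects and the <code>run</code> and <code>demo</code> functions are names made up for this example.</p>
<pre><code class="language-ocaml">open Effect.Deep

(* Hypothetical effects for this sketch. *)
type _ Effect.t +=
  | Fork : (unit -&gt; unit) -&gt; unit Effect.t
  | Yield : unit Effect.t

(* A round-robin scheduler: suspended threads wait in a queue. *)
let run main =
  let ready = Queue.create () in
  let next () = if not (Queue.is_empty ready) then (Queue.pop ready) () in
  let rec spawn f =
    match_with f ()
      { retc = (fun () -&gt; next ());
        exnc = raise;
        effc = (fun (type a) (eff : a Effect.t) -&gt;
          match eff with
          | Fork child -&gt;
            Some (fun (k : (a, unit) continuation) -&gt;
              (* Park the parent and run the child straight away. *)
              Queue.push (fun () -&gt; continue k ()) ready;
              spawn child)
          | Yield -&gt;
            Some (fun (k : (a, unit) continuation) -&gt;
              Queue.push (fun () -&gt; continue k ()) ready;
              next ())
          | _ -&gt; None) }
  in
  spawn main

(* The fork is visible in the source: it is a performed effect. *)
let demo () =
  let trace = ref [] in
  let say s = trace := s :: !trace in
  run (fun () -&gt;
    say "main: start";
    Effect.perform (Fork (fun () -&gt;
      say "child: start";
      Effect.perform Yield;
      say "child: end"));
    say "main: end");
  List.rev !trace

let () = List.iter print_endline (demo ())
</code></pre>
<p>Every new thread here comes from a <code>Fork</code> that can be found by reading the code, whereas in promise-based code a thread starts whenever a promise is created and not immediately awaited, which a rewriting tool cannot always spot.</p>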
<p>Another challenge for the team is to make sure that they stay as neutral as possible in their approach to the effect library's design, in order to be able to make changes later or provide multiple alternatives.</p>
<p>Finally, they will strive to maintain backward compatibility to the greatest extent, by using Lwt-effect bridges to enable intercompatibility for existing applications without forcing them to switch immediately.</p>
<h2>Until Next Time</h2>
<p>Keep your eye on our blog and <a href="https://discuss.ocaml.org">OCaml Discuss</a> for more updates on this project and the tools that emerge from it.</p>
<p>Connect with Tarides online on <a href="https://bsky.app/profile/tarides.com">Bluesky</a>, <a href="https://mastodon.social/@tarides">Mastodon</a>, <a href="https://www.threads.net/@taridesltd">Threads</a>, and <a href="https://www.linkedin.com/company/tarides">LinkedIn</a> or sign up for our mailing list to stay updated on our latest projects.</p>
]]></description><link>https://tarides.com/blog/2025-03-13-we-re-moving-ocsigen-from-lwt-to-eio</link><guid isPermaLink="false">https://tarides.com/blog/2025-03-13-we-re-moving-ocsigen-from-lwt-to-eio.html</guid><dc:creator><![CDATA[ Isabella Leandersson ]]></dc:creator><pubDate>Thu, 13 Mar 2025 00:00:00 GMT</pubDate></item><item><title><![CDATA[Feature Parity Series: Statmemprof Returns!]]></title><description><![CDATA[<p>Welcome to part two of our feature parity series! In it, we present returning features that were originally lost when OCaml gained multicore support.  The addition of multiple domains means that the underpinning design decisions behind certain features have had to change significantly, and work is ongoing to adapt them and return them to OCaml 5.</p>
<p>One of these features is memory profiling, which, after much theoretical consideration, has been <a href="https://github.com/ocaml/ocaml/pull/12923">successfully adapted to OCaml 5</a>. Memory profiling is an important tool for developers who want to optimise their programs, and our post today delves into OCaml 5’s statistical memory profiler, <code>statmemprof</code>, and its now multicore-compatible design. Let’s explore the journey to its return!</p>
<h2>What is a Memory Profiler?</h2>
<p>Developers use memory profilers to understand how their programs use memory. Whether they think it’s using too much, is behaving suspiciously, or want to analyse it for comparison’s sake, attaching a memory profiler lets them see how their program allocates memory and keep track of it when it runs. It sounds straightforward, but this is where the challenges begin!</p>
<p>One of the first hurdles to clear is the sheer volume of allocated memory. Many programs, and in fact, many of the programs that are likely to be interesting from a memory perspective, allocate terabytes of memory over their run time. Running a memory profiler that monitors them all would significantly slow down the entire system. OCaml used to have a memory profiler that monitored all allocations (see <a href="https://blog.janestreet.com/a-brief-trip-through-spacetime/"><code>spacetime</code></a>), but it was removed because it was too resource-expensive.</p>
<p>The solution to this first conundrum is to use a <em>statistical</em> memory profiler (the ‘stat’ in Statmemprof). A statistical memory profiler records a random sample of memory allocations in the program. This method still allows users to find allocations that stand out. Large allocations of memory tend to be more noteworthy, and consequently, if you have a program that allocates small and large pieces of memory, you want the random sampler to sample the bigger ones more often.</p>
<p>Statmemprof was added to OCaml 4.11 while Multicore OCaml was in development. Hence, the original implementation of Statmemprof did not have to worry about how memory profiling should work with multiple domains. With the arrival of OCaml 5 with its multicore support, this question had to be addressed.</p>
<h2>How <code>Statmemprof</code> Works With OCaml 5</h2>
<h3>Memory Allocation in OCaml</h3>
<p>There are a few things one needs to wrap one’s head around to understand how <code>statmemprof</code> does its magic.  This includes the way OCaml allocates memory with an inline pointer-bump allocator. If you are already familiar with memory allocation in the minor and major heap, jump ahead to the next section!</p>
<p>OCaml needs to be able to allocate millions of objects a second and, therefore, needs very efficient memory allocation. Most programming languages call a function in the language library (such as <code>malloc</code>) that determines which memory to allocate. This process is too slow to work well in OCaml and for many other garbage collected languages such as Haskell, which also use bump-pointer allocators.</p>
<p>In OCaml, part of the total memory available is reserved in what is known as the minor heap. In the minor heap, an allocation register points to the lowest address of allocated memory, that is, to the boundary between what is allocated and what is free. Say a new object needs 32 bytes of memory: the system subtracts 32 bytes from where the allocation register is pointing and this space is used for the new object. When the minor heap’s garbage collector (GC) runs, it checks which objects can be reclaimed and which need to be kept. Surviving objects are promoted to the major heap, and the allocation register is reset to its initial position since the minor heap is now empty.</p>
<p>The minor heap has a ‘limit’, most commonly set to where the heap’s space ends, that, when reached, triggers a jump into the runtime system. The runtime can then take one of several actions, including garbage collection. This design makes memory allocation in OCaml very fast. Crucially for our topic today, this limit can be used to trigger a number of important events. Signal handling, for example, is achieved by tripping the limit in the minor heap to get into the runtime, which then runs the signal handlers. The runtime decides what actions to take and where to set the limit in the minor heap, allowing it to perform many different behaviours.</p>
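<p>The allocation scheme described above can be modelled in a few lines. This is a deliberately simplified sketch, not the runtime's code: the <code>minor_heap</code> record, its word-based indexing, and the <code>enter_runtime</code> stand-in (which simply empties the heap) are assumptions made for illustration.</p>
<pre><code class="language-ocaml">(* Toy model of a minor heap, measured in words. *)
type minor_heap = {
  size : int;                 (* total words available *)
  mutable ptr : int;          (* allocation pointer: starts at [size], moves down *)
  mutable limit : int;        (* crossing this enters the "runtime" *)
  mutable collections : int;  (* how many times the runtime emptied the heap *)
}

let create size = { size; ptr = size; limit = 0; collections = 0 }

(* Stand-in for the runtime: pretend a minor GC ran and the heap is empty. *)
let enter_runtime h =
  h.collections &lt;- h.collections + 1;
  h.ptr &lt;- h.size

(* Allocate [words] words: one subtraction and one comparison on the fast path. *)
let alloc h words =
  if h.ptr - words &lt; h.limit then enter_runtime h;
  h.ptr &lt;- h.ptr - words;
  h.ptr

let () =
  let h = create 256 in
  for _ = 1 to 65 do
    ignore (alloc h 4)
  done;
  Printf.printf "pointer at %d after %d collection(s)\n" h.ptr h.collections
</code></pre>
<p>Because the limit is just a number the runtime controls, moving it closer to the pointer is enough to force the next allocation into the runtime, which is the hook used for events such as signal handling.</p>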
<p>OCaml can also bypass the minor heap and allocate objects directly in the major heap. This is useful for very large objects, which tend to live longer and survive the minor heap’s GC anyway. That's a topic for another time.</p>
<p>With this basic overview of how OCaml allocates memory in our back pocket, let’s look at how <code>statmemprof</code> profiles memory in this system.</p>
<h3>Statistical Memory Profiling in OCaml</h3>
<p>The key to how <code>statmemprof</code> profiles memory lies in how the ‘statistical’ aspect is defined. To sample only a subset of memory allocations, we need to define a workflow by which we get a random selection of samples. Since it only profiles a small fraction of allocations (on average one in every <code>n</code>), the user can leave the profiler running in the background without introducing significant overhead.</p>
<p>So how does it work? We need to generate a number for both the minor and major heap to help us select the sample we want to profile. We need this number to be random, chosen in such a way that the selection process does not favour any one word of allocation over another. <code>Statmemprof</code> achieves this through statistical sampling modelled as a series of <a href="https://www.sciencedirect.com/topics/mathematics/bernoulli-trial">Bernoulli trials</a>, meaning that it samples every word of memory allocation with the same probability.</p>
<p>Say the event we’re interested in is the allocation of a single word of memory to the minor heap. We have a parameter called ‘lambda’ for any such event, which represents the likelihood that <code>statmemprof</code> will sample that particular event. The random number we get, called a geometric random variable, stands for how many Bernoulli trials it takes, for some given lambda (or likelihood), until one of them succeeds. You can also think of it as how long we wait (how many events happen) before we sample one event.</p>
<p>This choice of distributions is driven by the sampling mechanics in each heap. For the minor heap, we need to know "when is the next sample due?" which is naturally modeled by a geometric distribution - it tells us how many trials (allocations) until we hit our first success (sample). For the major heap, since we're dealing with larger blocks of memory, we need to know "how many samples should we take in N words?" This is naturally modeled by a binomial distribution, as it represents the number of successes (samples) in a fixed number of trials (N words). The geometric distribution is also computationally efficient for triggering the GC mechanism at the right time, while the binomial distribution provides a more systematic way to sample larger memory blocks.</p>
<p>Now, let's imagine we get a random number, say 137. That number is subtracted from the allocation register in the minor heap, and the limit is set there. When the limit is reached, we go into the runtime, and the action we take is to take a memory profile sample. <code>Statmemprof</code> then generates a new number, and the process repeats. The process is the same for the major heap, but we use a binomial random variable instead of a geometric one.</p>
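<p>To make the idea of ‘drawing a number and waiting that many words’ more concrete, here is a minimal sketch of how a geometric random variable can be drawn for a given lambda. It is an illustration only, not the runtime’s actual code: the function name <code>next_sample_gap</code> and the use of the standard <code>Random</code> module are our own assumptions for the example.</p>
<pre><code class="language-ocaml">(* Sketch: draw how many words to let pass before the next sample.
   [lambda] is the per-word sampling probability, strictly between 0 and 1.
   The name [next_sample_gap] is hypothetical, chosen for this example. *)
let next_sample_gap ~lambda =
  (* Clamp away from 0 so that [log u] stays finite. *)
  let u = Float.max Float.min_float (Random.float 1.0) in
  (* Inverse-transform sampling of a geometric variable: the number of
     Bernoulli trials up to and including the first success. *)
  1 + int_of_float (Float.floor (log u /. Float.log1p (-. lambda)))

let () =
  Random.init 42;
  let lambda = 0.01 in
  let draws = 10_000 in
  let total = ref 0 in
  for _ = 1 to draws do
    total := !total + next_sample_gap ~lambda
  done;
  Printf.printf "mean gap: %.1f words (theory: %.1f)\n"
    (float_of_int !total /. float_of_int draws)
    (1.0 /. lambda)
</code></pre>
<p>On average, the gap between samples comes out close to <code>1 / lambda</code> words, which is why a small lambda keeps the profiling overhead low.</p>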
<p>The benefit of statistical memory profiling is that smaller-sized objects in the minor and major heaps are less likely to be sampled since they don’t take up as much space as larger objects. This is good because the larger objects tend to be more interesting from a memory profiling perspective.</p>
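<p>A quick calculation shows why size matters. If every word is sampled independently with probability lambda, a block of <code>n</code> words is sampled at least once with probability <code>1 - (1 - lambda)^n</code>. The sketch below simply evaluates that formula; the function name <code>prob_sampled</code> and the chosen lambda are illustrative assumptions.</p>
<pre><code class="language-ocaml">(* Probability that a block of [words] words is sampled at least once,
   when each word is sampled independently with probability [lambda]. *)
let prob_sampled ~lambda ~words =
  1.0 -. ((1.0 -. lambda) ** float_of_int words)

let () =
  let lambda = 1e-4 in
  List.iter
    (fun words -&gt;
      Printf.printf "%6d words: %.4f\n" words (prob_sampled ~lambda ~words))
    [ 2; 100; 10_000 ]
</code></pre>
<p>With this lambda, a two-word block is sampled roughly 0.02% of the time, while a 10,000-word block is sampled roughly 63% of the time.</p>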
<h3>What Happens When <code>Statmemprof</code> Samples an Object?</h3>
<p><code>Statmemprof</code> was designed to be a flexible mechanism that gives the programmer a lot of choice. There is no hardwired action set up for when <code>statmemprof</code> samples an allocation. Instead, there are a number of actions to choose from left open for users to configure. They include determining the size of the object, whether it came from the minor or major heap, and what the program was doing at the time of the object’s allocation.</p>
<p>When statmemprof samples an allocation it executes a callback (a construct that essentially works like a function) which is provided with details about the allocation and a backtrace. A backtrace refers to the sequence of functions that called a particular function. Backtraces are used to trace backwards from the function that triggered the allocation to the functions that called it, and so on, until it reaches the entry point of the program. What this means for statmemprof is that the API provides enough details for tools like <code>memtrace</code> to generate visual representations of memory use for the user's programs.</p>
<p>There are five different kinds of events that can trigger the callback:</p>
<ol>
<li><code>alloc_minor</code>: an object is allocated to the minor heap</li>
<li><code>alloc_major</code>: an object is allocated to the major heap</li>
<li><code>promote</code>: an object survives garbage collection and is moved to the major heap</li>
<li><code>dealloc_minor</code>: an object does not survive garbage collection and is freed from the minor heap</li>
<li><code>dealloc_major</code>: an object does not survive garbage collection and is freed from the major heap</li>
</ol>
<p>So, the hypothetical lifecycle of an object could be as follows: it gets stored in the minor heap with <code>alloc_minor</code>. The limit is tripped in the minor heap, and the garbage collector runs. The object survives garbage collection and is moved to the major heap with <code>promote</code>. The garbage collector runs in the major heap, and if the object is not needed anymore, it gets freed with <code>dealloc_major</code>. As an object's lifecycle progresses, statmemprof will execute a callback for each event and a complete picture of it can be built up. <code>Statmemprof</code> is designed to be flexible and configurable, and, for example, users can choose to set the profiler to retain callback information or opt to discard it.</p>
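<p>The lifecycle above can be sketched with the standard library’s <code>Gc.Memprof</code> module. This is a minimal example rather than a recommended setup: it assumes OCaml 5.3 or later (where <code>Gc.Memprof.start</code> returns a profile that can later be discarded), and the counters, sampling rate, and workload are made up for illustration.</p>
<pre><code class="language-ocaml">(* Assumes OCaml 5.3 or later. All counter names are hypothetical. *)
let alloc_minor_count = Atomic.make 0
let alloc_major_count = Atomic.make 0
let promote_count = Atomic.make 0
let dealloc_minor_count = Atomic.make 0
let dealloc_major_count = Atomic.make 0

(* Each callback bumps a counter. Returning [Some size] asks the profiler to
   keep tracking the block, so its later lifecycle events are reported too. *)
let tracker : (int, int) Gc.Memprof.tracker =
  let open Gc.Memprof in
  { alloc_minor = (fun info -&gt; Atomic.incr alloc_minor_count; Some info.size);
    alloc_major = (fun info -&gt; Atomic.incr alloc_major_count; Some info.size);
    promote = (fun size -&gt; Atomic.incr promote_count; Some size);
    dealloc_minor = (fun _size -&gt; Atomic.incr dealloc_minor_count);
    dealloc_major = (fun _size -&gt; Atomic.incr dealloc_major_count) }

let () =
  let profile =
    Gc.Memprof.start ~sampling_rate:0.01 ~callstack_size:8 tracker
  in
  let kept = ref [] in
  for i = 1 to 100_000 do
    (* Keep every tenth string alive so that some blocks get promoted. *)
    let s = Sys.opaque_identity (string_of_int i) in
    if i mod 10 = 0 then kept := s :: !kept
  done;
  Gc.full_major ();
  Gc.Memprof.stop ();
  Printf.printf
    "alloc_minor=%d alloc_major=%d promote=%d dealloc_minor=%d dealloc_major=%d (kept %d)\n"
    (Atomic.get alloc_minor_count) (Atomic.get alloc_major_count)
    (Atomic.get promote_count) (Atomic.get dealloc_minor_count)
    (Atomic.get dealloc_major_count) (List.length !kept);
  Gc.Memprof.discard profile
</code></pre>
<p>The exact counts vary from run to run because sampling is random. Returning <code>None</code> from an allocation callback instead would tell the profiler to stop tracking that block.</p>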
<h3>Memtrace</h3>
<p>For many users, delving into the code to configure <code>statmemprof</code> would add an undesirable level of complexity to their workflow. The solution is to use tools like <a href="https://github.com/janestreet/memtrace">Memtrace</a>, a profiling library that uses the <code>statmemprof</code> interface. By building on the <code>statmemprof</code> functionality, these tools enable users to profile memory in the way they want to without having to worry about the specifics of how <code>statmemprof</code> works. Memtrace can accumulate the allocations and callstacks from the program to get a picture of which code locations are responsible for triggering allocations. (Note that, as of writing, the 5.3 compatible version of Memtrace has yet to be released by <a href="https://www.janestreet.com/">Jane Street</a>, but work is <a href="https://github.com/janestreet/memtrace/pull/22">underway</a>).</p>
<p>Memtrace was created at Jane Street to help them pinpoint memory issues like space leaks. It uses the callback API implemented by <code>statmemprof</code> to record allocation events in the binary format Common Trace Format (CTF). Memtrace also comes with a <a href="https://github.com/janestreet/memtrace_viewer">viewer</a>, a helpful tool that lets developers visualise their programs and see how memory is allocated.</p>
<p>Generating a trace is straightforward, and Luke Maurer from Jane Street outlines the process <a href="https://blog.janestreet.com/finding-memory-leaks-with-memtrace/">in a great blog post on their website</a>, and, if you want to learn more about the design of Memtrace, check out this <a href="https://github.com/janestreet/memtrace/blob/master/docs/internal.md">excellent guide</a>.</p>
<p>This is just one example of how restoring <code>statmemprof</code> support brings powerful options to users of OCaml 5. Its features support the creation and implementation of tools that let users manage and understand how their programs use memory in new and detailed ways.</p>
<h3>Considerations for Multiple Domains</h3>
<p>So how do multiple domains affect the design of a memory profiler? The choices we made reflect our preference after weighing our options, and are not necessarily the only 'right' way to approach the problem. Below are some examples of the design choices we made while bringing memory profiling to multiple domains in OCaml:</p>
<ul>
<li>Let’s say you have two domains running at the same time doing different jobs separately, then one domain starts profiling its memory allocations. Should memory allocated by the other domain be sampled? For us, the answer was no. Behaviour in separate domains should be treated independently of each other.</li>
<li>Say you are in one domain and you start profiling, then, from this domain, you spawn another. Should the allocations in the new domain be profiled? We chose to answer 'yes' to this, since the new domain was created to achieve the work of the original domain.</li>
<li>Should callbacks keep running after <code>stop</code> has been called? In OCaml 4, after <code>stop</code> was called, <code>statmemprof</code> would essentially throw away all of its sampled information. In OCaml 5, the user can choose between asking the profiler to stop sampling, in which case <code>statmemprof</code> stops sampling new allocations but keeps the information it holds, and asking it to stop and discard, in which case the profiler discards all the information held for that profile. This wasn’t a relevant feature for OCaml 4 since a terminated domain meant the program had ended and <code>statmemprof</code> could just disregard that information. With OCaml 5, longer running memory profiling is more likely, and we need to be able to distinguish between the two <code>stop</code> calls.</li>
<li>For <code>statmemprof</code>, one domain can start a ‘profile’ by calling the start function of <code>statmemprof</code>, which sets up all the callbacks and sampling separately from all other domains. In theory, you could apply entirely different profiling tools, like <code>memtrace</code>, in different domains in the same program.</li>
<li>Let’s say you run a program on multiple domains and run a profile on one domain which allocates some objects, samples them, and runs the allocation callbacks. Let’s then suppose that that domain terminates but the profile keeps running (say if another domain is running the same profile) and a sampled object is promoted by the GC and continues its lifecycle. It is generally the rule that callbacks should be run by the domain that allocated the object, but if that original domain has terminated, the callback may be run by a different domain because the object might still be alive on the major heap. When the object is freed and <code>statmemprof</code> would need to run a deallocation callback, it can also run that callback from a different domain if the original domain has been terminated.</li>
<li>Lastly, a lot of work went into synchronisation and ensuring that no domain was ever waiting for <code>statmemprof</code> before being able to continue its jobs. <code>Statmemprof</code> only uses one lock to enforce synchronisation, which occurs when a domain terminates while <code>statmemprof</code> is still running. Its data is put on the orphans list which is protected by a lock. Any other domain can then adopt this data.</li>
</ul>
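<p>Some of the choices above can be sketched in code. The example below assumes OCaml 5.3 or later and uses the standard <code>Gc.Memprof</code> and <code>Domain</code> modules; the helper <code>allocate_some</code>, the counter, and the sampling rate are hypothetical names and values chosen for illustration. It starts a profile, spawns a domain that shares it, then stops sampling and finally discards the profile.</p>
<pre><code class="language-ocaml">(* Assumes OCaml 5.3 or later. Names and numbers are illustrative only. *)
let samples = Atomic.make 0

(* Count sampled allocations; returning [None] means the blocks are not
   tracked any further, so no promotion or deallocation callbacks follow. *)
let tracker : (unit, unit) Gc.Memprof.tracker =
  let open Gc.Memprof in
  { null_tracker with
    alloc_minor = (fun _ -&gt; Atomic.incr samples; None);
    alloc_major = (fun _ -&gt; Atomic.incr samples; None) }

(* A small allocating workload. *)
let allocate_some n =
  let acc = ref [] in
  for i = 1 to n do
    acc := Sys.opaque_identity (i, i) :: !acc
  done;
  List.length !acc

let () =
  let profile = Gc.Memprof.start ~sampling_rate:0.01 tracker in
  (* The child domain is spawned while the profile is running. *)
  let child = Domain.spawn (fun () -&gt; allocate_some 100_000) in
  let here = allocate_some 100_000 in
  let there = Domain.join child in
  (* Stop sampling new allocations; the profile's data is kept... *)
  Gc.Memprof.stop ();
  Printf.printf "allocated %d + %d tuples, %d samples\n"
    here there (Atomic.get samples);
  (* ...until the profile is explicitly discarded. *)
  Gc.Memprof.discard profile
</code></pre>
<p>The callbacks use an <code>Atomic</code> counter because, as described above, they may be run by more than one domain.</p>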
<p>These were just some of the decisions that our team made to ensure the profiler worked well for programs with multiple domains, a technically complex challenge with a lot of variables to consider.</p>
<h2>Until Next Time!</h2>
<p>Multicore <code>statmemprof</code> was developed at Tarides, and we are happy to have brought the tool into the OCaml 5 era. We invite you to use the memory profiler to analyse your own programs. Please provide feedback and raise any issues in the <a href="https://github.com/ocaml/ocaml/issues">OCaml repo</a> and on the <a href="https://discuss.ocaml.org">OCaml Discuss forum</a>. You can also <a href="/contact/">contact us</a> directly for support with your multicore code or to get advice on how to take advantage of multicore OCaml.</p>
<p>Curious about how we maintain and restore features to OCaml 5? Read more of our <a href="/blog/tag/multicore/">multicore</a> and <a href="/blog/tag/compiler/">compiler</a> blog posts, such as <a href="/blog/2024-09-11-feature-parity-series-compaction-is-back/">compaction</a>,  <a href="/blog/2024-06-19-keeping-up-with-the-compiler-how-we-help-maintain-the-ocaml-language/">compiler maintenance</a>, and <a href="/blog/2024-08-21-how-tsan-makes-ocaml-better-data-races-caught-and-fixed/">catching data races</a>.</p>
<p>Connect with Tarides online on <a href="https://bsky.app/profile/tarides.com">Bluesky</a>, <a href="https://mastodon.social/@tarides">Mastodon</a>, <a href="https://www.threads.net/@taridesltd">Threads</a>, and <a href="https://www.linkedin.com/company/tarides">LinkedIn</a> or sign up for our mailing list to stay updated on our latest projects.</p>
<h4>Acknowledgements</h4>
<p>A huge thank you to Nick Barnes and Tim McGilchrist for their invaluable and extensive input on this post.</p>
]]></description><link>https://tarides.com/blog/2025-03-06-feature-parity-series-statmemprof-returns</link><guid isPermaLink="false">https://tarides.com/blog/2025-03-06-feature-parity-series-statmemprof-returns.html</guid><dc:creator><![CDATA[ Isabella Leandersson ]]></dc:creator><pubDate>Thu, 06 Mar 2025 00:00:00 GMT</pubDate></item><item><title><![CDATA[Full blown productivity in VSCode with OCaml]]></title><description><![CDATA[<p>Happy New Year, OCamlers! 🎉
As we usher in another year, we have something special to celebrate — a New Year's gift that promises to make your coding experience even better!
We have been working on exciting new features in VSCode designed to boost productivity, streamline workflows, and make your development journey smoother and more enjoyable.</p>
<p>For users of Emacs, we have a brand new <code>emacs</code> mode for interacting with the <code>lsp</code> server that will make your coding experience as enjoyable as it should be. Check out the Discuss announcement at <a href="https://discuss.ocaml.org/t/ann-release-of-ocaml-eglot-1-0-0/15978">Release of ocaml-eglot 1.0.0</a> and the project repository at <a href="https://github.com/tarides/ocaml-eglot">ocaml-eglot</a>.</p>
<p>Without further ado, let's "unwrap" 🎁 these features for your viewing pleasure.</p>
<h2>1. Type of Selection</h2>
<p>This feature enhances code comprehension by allowing you to grow or shrink the selection to view updated types at different levels of granularity. You can adjust the verbosity of the type information to suit your needs, providing either a concise or detailed view. This information can be accessed conveniently through the default hover pop-up or via a dedicated output panel, making it adaptable to your workflow.</p>
<h4>Command name: <code>Get the Type of the Selection</code></h4>
<ul>
<li>Command shortcut: <kbd>Alt</kbd> + <kbd>T</kbd></li>
<li>Grow Selection: continuously press <kbd>Alt</kbd> + <kbd>T</kbd></li>
<li>Shrink Selection: <kbd>Alt</kbd> + <kbd>Shift</kbd> + <kbd>T</kbd></li>
<li>Add Verbosity: <kbd>Alt</kbd> + <kbd>V</kbd></li>
</ul>
<p><img src="/blog/images/2024-12-20.vscode-client-imp/type_selection_1~nODG6ymRfc2mc4ZQtkJf2Q.gif" alt="Using Alt+T to get the type of Selection"></p>
<h4>Using a dedicated Output Panel</h4>
<p>In the settings/preferences of the OCaml Platform extension, you can toggle an option to display the results of type selection in a dedicated output panel.</p>
<p><img sizes="(min-width: 1360px) 1360px, (min-width: 680px) 680px, 100vw" srcset="/blog/images/2024-12-20.vscode-client-imp/type_selection-170w~0mHCNot-22a78rMDi_48fg.webp 170w, /blog/images/2024-12-20.vscode-client-imp/type_selection-340w~zudMAcXKrK0cc0j-PXWGUw.webp 340w, /blog/images/2024-12-20.vscode-client-imp/type_selection-680w~8WrShq3--GO0vBCffhJEug.webp 680w, /blog/images/2024-12-20.vscode-client-imp/type_selection-1360w~Dcz5VUnRGC2KGPr3U6pfSQ.webp 1360w" src="/blog/images/2024-12-20.vscode-client-imp/type_selection-1360w~Dcz5VUnRGC2KGPr3U6pfSQ.webp" alt="Toggling the settings to use a dedicated output panel"></p>
<p><img src="/blog/images/2024-12-20.vscode-client-imp/type_of_selection_3b~BtFhS-dRVmx4NPFyWjDUaA.gif" alt="Type Selection with results displayed in a dedicated output panel"></p>
<h2>2. Search by Type or Polarity</h2>
<p>Looking for functions or values that match a specific type?
The Search by Type/Polarity feature lets you input a type signature, e.g., <code>int -&gt; string</code>, or a polarity, e.g., <code>-int +string</code>, and then fetches all matching functions and values across your project.</p>
<h4>Command name: <code>Search a value by type or polarity</code></h4>
<h4>Command shortcut: <kbd>Alt</kbd> + <kbd>F</kbd></h4>
<ul>
<li>
<p>Search by Type
<img src="/blog/images/2024-12-20.vscode-client-imp/search_type~s5VxdelITfYqJqahXF6RQQ.gif" alt="Searching a value by its type"></p>
</li>
<li>
<p>Search by Polarity
<img src="/blog/images/2024-12-20.vscode-client-imp/search_polarity~02LYdhV2T5U9cI-9Z9-MmA.gif" alt="Searching a value by its polarity"></p>
</li>
</ul>
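<p>To give an idea of what such a query is looking for, here are two standard library functions whose type is <code>int -&gt; string</code>, so they are the kind of value we would expect a search for that signature (or for the polarity <code>-int +string</code>) to surface. The actual results depend on your project and its dependencies, and the list <code>candidates</code> is just a name chosen for this sketch.</p>
<pre><code class="language-ocaml">(* Two values of type int -&gt; string from the standard library. *)
let candidates : (string * (int -&gt; string)) list =
  [ ("string_of_int", string_of_int);
    ("Int.to_string", Int.to_string) ]

let () =
  List.iter
    (fun (name, f) -&gt; Printf.printf "%s 42 = %S\n" name (f 42))
    candidates
</code></pre>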
<h2>3. Construct Typed Holes</h2>
<p>This feature lets you construct possible values for a given typed hole.</p>
<h4>Command name: <code>List values that can fill the selected typed-hole</code></h4>
<h4>Command shortcut: <kbd>Alt</kbd> + <kbd>C</kbd></h4>
<p><img src="/blog/images/2024-12-20.vscode-client-imp/construct_1~ydYDYFSZlpuoII028UnGbA.gif" alt="Construct functionality to list values that can fill the selected typed-hole"></p>
<p>This feature also comes with a configurable option that allows it to construct values for the next typed hole automatically.</p>
<p><img sizes="(min-width: 1360px) 1360px, (min-width: 680px) 680px, 100vw" srcset="/blog/images/2024-12-20.vscode-client-imp/construct_toggle-170w~0pHX-bkvKZ7qM2Xjy97ijw.webp 170w, /blog/images/2024-12-20.vscode-client-imp/construct_toggle-340w~FdpeQlwrx335I9C08Lm45Q.webp 340w, /blog/images/2024-12-20.vscode-client-imp/construct_toggle-680w~R0X33zINWjQlLt0AHfwwaQ.webp 680w, /blog/images/2024-12-20.vscode-client-imp/construct_toggle-1360w~FGhUKQXnEJ6txwIPDG7h7g.webp 1360w" src="/blog/images/2024-12-20.vscode-client-imp/construct_toggle-1360w~FGhUKQXnEJ6txwIPDG7h7g.webp" alt="Setting to toggle construct to be conducted for the next typed hole automatically"></p>
<p><img src="/blog/images/2024-12-20.vscode-client-imp/construct_2~Zkwc5CmvowlxRa5V5CyhcA.gif" alt="Performing construct with chaining turned on"></p>
<h2>4. Jump to a specific Target</h2>
<p>Traditional navigation works, but it falls short for OCaml code. This feature provides a seamless way to jump to the specific targets closest to your cursor in the source code. For example, in a large match construct you can jump from one case to the next effortlessly.</p>
<h4>Command name: <code>List possible parent targets for jumping</code></h4>
<h4>Command shortcut: <kbd>Alt</kbd> + <kbd>J</kbd></h4>
<p>At this point, we support the following targets:</p>
<ul>
<li>Modules</li>
<li>Functions</li>
<li>Let statements</li>
<li>Match statements</li>
<li>Match cases (previous and next)</li>
</ul>
<p><img src="/blog/images/2024-12-20.vscode-client-imp/jump_highlight_5~OrqDUTlXj0HmBEs1WgOrAQ.gif" alt="Jumping to a specific target"></p>
<h2>5. Navigate Typed Holes</h2>
<p>This feature lets you navigate to typed holes.</p>
<h4>Command name: <code>List typed holes in the file for navigation</code></h4>
<ul>
<li>As you move through the list with your arrow keys, the cursor jumps to the typed hole to give you a preview.
When you make a selection, the cursor stays there.</li>
</ul>
<p><img src="/blog/images/2024-12-20.vscode-client-imp/navigate_hole_1~OuZBRz-KX1UKkuRFpC8rOA.gif" alt="Navigating to different typed holes"></p>
<ul>
<li>If you toggle the setting shown below, construct is performed automatically after you navigate to a typed hole.</li>
</ul>
<p><img sizes="(min-width: 1360px) 1360px, (min-width: 680px) 680px, 100vw" srcset="/blog/images/2024-12-20.vscode-client-imp/construct_after_navigate_toggle-170w~lmUy6HxuVJA0mZeP_PwJ_g.webp 170w, /blog/images/2024-12-20.vscode-client-imp/construct_after_navigate_toggle-340w~J5RRJ0cxVlemPMQ-QXdZTA.webp 340w, /blog/images/2024-12-20.vscode-client-imp/construct_after_navigate_toggle-680w~GajjBOXyazwr4VsZf0w0cg.webp 680w, /blog/images/2024-12-20.vscode-client-imp/construct_after_navigate_toggle-1360w~JBmIXiZygSnYiNEDYnqapg.webp 1360w" src="/blog/images/2024-12-20.vscode-client-imp/construct_after_navigate_toggle-1360w~JBmIXiZygSnYiNEDYnqapg.webp" alt="Setting to automatically perform construct after jumping to a typed hole"></p>
<p><img src="/blog/images/2024-12-20.vscode-client-imp/navigate_hole_2~Nda3owocXy-BCDPrSWwqxQ.gif" alt="Automatically performing construct after navigating to a typed hole"></p>
<ul>
<li>If you don't feel like jumping to a typed hole yet, just hit <kbd>Esc</kbd> and your cursor will portal back to its original position.</li>
</ul>
<p><img src="/blog/images/2024-12-20.vscode-client-imp/navigate_hole_3~kwJYeOUQwp-Mp-leqMEQXA.gif" alt="Pressing the Escape key to stop operations and return back to the origin cursor position"></p>
<p>We hope you are excited to try out these new features. It is our wish that you have a much better and smoother experience while coding OCaml in VSCode.</p>
<p>Please feel free to open issues if you discover a problematic behaviour:</p>
<ul>
<li><a href="https://github.com/ocamllabs/vscode-ocaml-platform/issues">Issues for VSCode OCaml Platform Extension</a></li>
<li><a href="https://github.com/ocaml/ocaml-lsp/issues">Issues for OCaml LSP Server</a></li>
<li><a href="https://github.com/ocaml/merlin/issues">Issues for Merlin</a></li>
</ul>
<p>You can connect with Tarides on <a href="https://bsky.app/profile/tarides.com">Bluesky</a>, <a href="https://mastodon.social/@tarides">Mastodon</a>, <a href="https://www.threads.net/@taridesltd">Threads</a>, and <a href="https://www.linkedin.com/company/tarides">LinkedIn</a> or sign up for our mailing list to stay updated on our latest projects. We look forward to hearing from you!</p>
]]></description><link>https://tarides.com/blog/2025-02-28-full-blown-productivity-in-vscode-with-ocaml</link><guid isPermaLink="false">https://tarides.com/blog/2025-02-28-full-blown-productivity-in-vscode-with-ocaml.html</guid><dc:creator><![CDATA[ Pizie Dust ]]></dc:creator><pubDate>Fri, 28 Feb 2025 00:00:00 GMT</pubDate></item><item><title><![CDATA[The First Wasm_of_ocaml Release is Out!]]></title><description><![CDATA[<p>The <a href="https://opam.ocaml.org/packages/wasm_of_ocaml-compiler/">first feature-complete release of Wasm_of_OCaml</a> (also known as WSOO) is out! A low-level virtual machine and portable compilation target, Wasm is popular with many developers thanks to its flexibility and wide compatibility.</p>
<p>We introduced you to Wasm and the benefits of bringing support for it to OCaml in our <a href="/blog/2023-11-01-webassembly-support-for-ocaml-introducing-wasm-of-ocaml/">blog post on it in 2023</a>. Since then, Wasm_of_ocaml has undergone new developments, so let’s take a look at what’s new and give you an overview of the release.</p>
<h2>What is Wasm_of_ocaml?</h2>
<p>Let’s start with a quick recap. Wasm_of_ocaml is a fork of the popular <a href="https://github.com/ocsigen/js_of_ocaml">Js_of_ocaml</a> compiler, translating OCaml bytecode to <a href="https://webassembly.org/">WebAssembly</a>.  It is web-oriented and relies on a JavaScript environment, and is designed to be an alternative to Js_of_ocaml. Since WebAssembly provides a sandboxed environment and enforces memory safety it is well-suited for security-critical applications, such as blockchain applications and programs running in the cloud. We plan to target these environments in the future.</p>
<p>Wasm_of_ocaml builds on the WebAssembly garbage collection extension (WasmGC), which is available by default on Chrome, Safari, and Firefox. This design means we don’t need to reimplement a garbage collector, and - as an added benefit - gives us good interoperability with JavaScript. Js_of_ocaml translates OCaml bytecode to JavaScript and is a well-liked, industrial-strength compiler for running OCaml on the web. The goal of Wasm_of_ocaml’s development is to retain the strengths of Js_of_ocaml and offer feature parity and inter-compatibility between the two compilers. You can compile your programs with Wasm_of_ocaml instead of Js_of_ocaml (you may have to make a few adjustments) and experience overall better performance.</p>
<p>Because of its popularity and versatility, creating an OCaml to Wasm translator has been a big priority for the team, and they continue to improve and optimise Wasm_of_ocaml over time.</p>
<h2>What’s New?</h2>
<p>Over the past year, much work has been done to get Wasm_of_ocaml to release readiness. Some of the changes include:</p>
<ul>
<li><strong>Putting Wasm_of_ocaml into the same development repo as Js_of_ocaml</strong>: This was a natural step due to how much code the two tools share, considering Wasm_of_ocaml is a fork of Js_of_ocaml. However, the two had diverged since the fork, so putting them in the same repo required changes to bring them back in sync. This change was necessary for the first public release of Wasm_of_ocaml. These are just a subset of all the fixes and contributions, and you can check out the work <a href="https://github.com/ocsigen/js_of_ocaml/pull/1724">in the associated PR</a> for a more complete picture.</li>
<li><strong>Support for Wasm_of_ocaml in Dune</strong>: An important milestone on the road to the public release, this change allowed users to compile Wasm in Dune, making it much easier for existing OCaml projects to adopt the new tool. Wasm_of_ocaml support has been <a href="https://github.com/ocaml/dune/releases/tag/3.17.0">released in Dune 3.17.0</a>, which you can upgrade to if you haven’t already.</li>
<li><strong>Separate compilation</strong>: Support for separate compilation enables much faster compilation when building a program. There are two PRs: the <a href="https://github.com/ocaml-wasm/wasm_of_ocaml/pull/36">first PR brings the main update</a>, and the <a href="https://github.com/ocaml-wasm/wasm_of_ocaml/pull/43">second PR makes it more fine-grained</a> and avoids having to load too many modules.</li>
<li><strong>Sourcemap support</strong>: <a href="https://github.com/ocaml-wasm/wasm_of_ocaml/pull/27">PR #27</a> introduces support for source-level debugging of Wasm executables, implementing mapping between source and Wasm locations.</li>
<li><strong>Support the JS String Builtins Extension</strong>: <a href="https://github.com/ocaml-wasm/wasm_of_ocaml/pull/33">PR #33</a> enables the use of JS string builtins when the JS engine provides them, which allows for more efficient operations on strings.</li>
<li><strong>Minimise the use of the unsafe JS command eval</strong>: The JS command <code>eval</code> is known for being unsafe, and <a href="https://github.com/ocaml-wasm/wasm_of_ocaml/pull/24">PR #24</a> creates an alternative workflow that minimises its use. Instead of using <code>eval</code>, strings can be emitted as external JavaScript fragments whenever the value of the string is known at compile time.</li>
<li><strong>Store long-lived top-level values into global variables</strong>: <a href="https://github.com/ocaml-wasm/wasm_of_ocaml/pull/30">PR #30</a> introduces a change where any variable that is used a number of instructions after being defined is stored as a global variable rather than a local variable. This change improves performance and reduces the compilation time of Wasm projects.</li>
<li><strong>Updates to make Wasm_of_ocaml compatible with OCaml 5.2 and 5.3</strong>: Two PRs, <a href="https://github.com/ocaml-wasm/wasm_of_ocaml/pull/54">#54</a> and <a href="https://github.com/ocaml-wasm/wasm_of_ocaml/pull/59">#59</a>, brought changes that made Wasm_of_ocaml compatible with OCaml 5.2. For 5.3, <a href="https://github.com/ocaml-wasm/wasm_of_ocaml/pull/136">PR #136</a> included updates to make Wasm_of_ocaml compatible with what was then the latest OCaml release.</li>
<li><strong>Bugfixes</strong>: Let’s round off with some bug fixes! <a href="https://github.com/ocaml-wasm/wasm_of_ocaml/pull/22">PR #22</a> ensured that locals are always explicitly initialised before being used, <a href="https://github.com/ocaml-wasm/wasm_of_ocaml/pull/31">PR #31</a> fixed the spec-compliance of some emitted tuple instructions, and <a href="https://github.com/ocaml-wasm/wasm_of_ocaml/pull/46">PR #46</a> fixed a stack resizing bug in structural value comparison.</li>
</ul>
<h2>Benchmarks and Performance</h2>
<p>The team has run several benchmarks with exciting results. When comparing the performance of Wasm_of_ocaml to Js_of_ocaml and the native code of OCaml’s compiler <code>ocamlopt</code>, the results consistently show that programs compiled with Wasm_of_ocaml are faster than ones compiled with Js_of_ocaml (but two times slower than native code). This holds true not only for microbenchmarks but on macroscopic benchmarks as well. Even more impressively, Jane Street reports that they have observed 2x-8x performance improvements using Wasm_of_ocaml compared to Js_of_ocaml.</p>
<p>Another aspect of performance lies in casts and bound checks. Wasm_of_ocaml uses a generic representation of values, which means that, at run time, a number of casts might be required to ensure safety. Furthermore, due to the nature of the data representation the team has chosen, a bound check is required whenever a field of a value is accessed. The team found that the Wasm_of_ocaml checks take up around 10% of the execution time on the V8 engine and 20% on the Bonsai benchmark. The goal is to keep improving performance by reducing the number of needless casts.</p>
<p>Regarding file size, Wasm_of_ocaml output code takes up more space than Js_of_ocaml, which is likely due to Wasm being a lower-level language than JavaScript. For example, Wasm_of_ocaml has to generate explicit code to allocate closures and access the environment, both implicit in JavaScript.</p>
<p>If you’re curious to learn more about Wasm_of_ocaml’s benchmarks and performance, Jérôme Vouillon’s <a href="https://www.youtube.com/live/KLWiEf3x3kc?t=26981s">talk from the ML track at ICFP 2024</a> goes more in-depth.</p>
<h2>Release process, Plus a New Version of <code>js_of_ocaml</code></h2>
<p>From now on, wasm_of_ocaml and js_of_ocaml will be released jointly. For this reason, this first public release of wasm_of_ocaml is numbered 6.0.1 since it is synchronised with the release of js_of_ocaml 6.0.1.</p>
<p>A new and important feature of js_of_ocaml 6.0.0 is <em>double translation</em>, a way of making programs that use effect handlers faster. Effect handler support is realised by compiling some functions to JavaScript code in continuation-passing style (CPS), which incurs a performance penalty. By passing <code>--effects=double-translation</code>, some functions are compiled in several versions, and the choice of which version of the function to run is made at run time. This improves performance at the cost of slightly larger JavaScript bundles. More details are available on <a href="https://ocsigen.org/js_of_ocaml/latest/manual/effects">the effect handlers page</a> of the js_of_ocaml manual.</p>
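<p>For readers who have not used effect handlers yet, here is a small, self-contained OCaml 5 program of the kind this feature is concerned with. It is only a sketch to show what ‘a program that uses effect handlers’ looks like: the effect <code>Ask</code> and the function <code>with_answer</code> are hypothetical names, and nothing here is specific to js_of_ocaml.</p>
<pre><code class="language-ocaml">open Effect
open Effect.Deep

(* A hypothetical effect that asks the surrounding handler for an integer. *)
type _ Effect.t += Ask : int Effect.t

(* Run [f], answering every [Ask] it performs with [answer]. *)
let with_answer (answer : int) (f : unit -&gt; 'a) : 'a =
  try_with f ()
    { effc =
        (fun (type b) (eff : b Effect.t) -&gt;
          match eff with
          | Ask -&gt;
              (* Resume the suspended computation with the answer. *)
              Some (fun (k : (b, _) continuation) -&gt; continue k answer)
          | _ -&gt; None) }

let () =
  let total = with_answer 21 (fun () -&gt; perform Ask + perform Ask) in
  Printf.printf "%d\n" total
</code></pre>
<p>Resuming the continuation <code>k</code> is the part that requires capturing ‘the rest of the computation’, which is what the CPS translation described above makes possible in JavaScript.</p>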
<h2>Until Next Time</h2>
<p>If you want to try Wasm_of_ocaml yourself, start by checking out the documentation for it in <a href="https://dune.readthedocs.io/en/stable/wasmoo.html">Dune</a> and in the <a href="https://ocsigen.org/js_of_ocaml/dev/manual/wasm_overview">manual</a>. If you have any feedback or questions, the best way to get in touch is on <a href="https://discuss.ocaml.org">Discuss</a> or <a href="https://github.com/ocsigen/js_of_ocaml">in the repo</a> for Js_of_ocaml and Wasm_of_ocaml.</p>
<p>Connect with Tarides online on <a href="https://bsky.app/profile/tarides.com">Bluesky</a>, <a href="https://mastodon.social/@tarides">Mastodon</a>, <a href="https://www.threads.net/@taridesltd">Threads</a>, and <a href="https://www.linkedin.com/company/tarides">LinkedIn</a> or sign up for our mailing list to stay updated on our latest projects.</p>
]]></description><link>https://tarides.com/blog/2025-02-19-the-first-wasm-of-ocaml-release-is-out</link><guid isPermaLink="false">https://tarides.com/blog/2025-02-19-the-first-wasm-of-ocaml-release-is-out.html</guid><dc:creator><![CDATA[ Olivier Nicole, Isabella Leandersson ]]></dc:creator><pubDate>Wed, 19 Feb 2025 00:00:00 GMT</pubDate></item><item><title><![CDATA[MirageOS on OCaml 5!]]></title><description><![CDATA[<p>OCaml 5 brought significant changes to fundamental parts of the language – notably concurrency using effects and multithreaded parallelism. This has caused some features and tools compatible with OCaml 4.14 to be incompatible with the new update, and several projects at Tarides aim to restore compatibility where that is the case. In today’s post, we will focus on the efforts toward creating a <a href="https://mirage.io/">MirageOS</a> port for OCaml 5.</p>
<p>The main benefit of making MirageOS compatible with OCaml 5 is to make it possible to explore how to best take advantage of features unique to the latest version of the language. An <a href="https://github.com/TheLortex/eio-solo5">early proof-of-concept</a> thus experimented with replacing Lwt with the new concurrency library <a href="https://github.com/ocaml-multicore/eio">Eio</a>. This is just one example, but it illustrates why bringing OCaml 5 support to MirageOS is a high priority for our team.</p>
<p>To port MirageOS to OCaml 5, we have been contributing to the port using <a href="https://github.com/Solo5/solo5">Solo5</a> backends, but we are also exploring the possibility of using <a href="https://unikraft.org/">Unikraft</a> backends. The Solo5 port has been released recently, with versions 1.x of the <code>ocaml-solo5</code> package on opam. As a complementary effort, work has also been ongoing to make cross-compilation easier in general. This will benefit not just the MirageOS project but also any cross-compilation projects for OCaml in the future. Let’s dive into the updates!</p>
<h2>The Solo5 Backend</h2>
<p>The first goal was to restore support for a backend that was available to OCaml 4.14 MirageOS users: <a href="https://github.com/Solo5/solo5">Solo5</a>. Solo5 is popular with developers building MirageOS unikernels, and since it is currently the only fully supported base for building a freestanding Mirage application, getting Solo5 support for OCaml 5 is a big step in the right direction for OCaml users.</p>
<p>The PRs that introduce OCaml 5.2.1 support via the Solo5 API are the fruit of the collective effort of many developers, including, in no particular order: <a href="https://github.com/palainp">Pierre Alain</a>, <a href="https://github.com/fabbing">Fabrice Buoro</a>, <a href="https://github.com/dinosaure">Romain Calascibetta</a>, <a href="https://github.com/haesbaert">Christiano Haesbaert</a>, <a href="https://github.com/shym">Samuel Hym</a>, <a href="https://github.com/kit-ty-kate">@kit-ty-kate</a>, <a href="https://github.com/hannesm">Hannes Mehnert</a>, and many more helpful eyes.</p>
<p>So, what is Solo5? Mirage applications can run in a hypervisor, such as KVM or Xen, without a full OS, and Solo5 provides minimal services (such as telling the time, reading or writing a block on the disk or network, etc.) for running an application there. OCaml-Solo5 adds the extra libraries required to build the OCaml runtime on top of Solo5.</p>
<h3>What has Changed?</h3>
<p>As you can imagine, the significant features introduced in OCaml 5 depend on correspondingly large changes to its underlying design. These include changes to assumptions about how the OS works, how the C compiler works, and how the build system is set up. All of these modifications have a direct impact on what OCaml-Solo5 must provide to build the OCaml runtime for MirageOS unikernels.</p>
<p>If you prefer to dive right into the code changes, I recommend that you <a href="https://github.com/mirage/ocaml-solo5/pull/134">visit the PR</a> directly. Otherwise, let’s take a look at what’s new!</p>
<p><strong><code>Nolibc</code> Extensions</strong>
The way (OS) threads are used has changed, as has the way memory is managed (relying in particular on <code>mmap</code>/<code>munmap</code>), and some C features such as thread-local storage and C11 atomics are now required. Support for all of these must be added in a freestanding setting such as MirageOS, even if Solo5 remains monocore. Therefore, to make <code>OCaml-Solo5</code> compatible with the latest release, developers have amended the <code>nolibc</code> library and the way the C compiler is invoked. The modifications come in the form of extensions to <code>nolibc</code>, most of which are inherited from previous PRs by <a href="https://github.com/kit-ty-kate">@kit-ty-kate</a>, <a href="https://github.com/dinosaure">Romain Calascibetta</a>, and <a href="https://github.com/palainp">Pierre Alain</a>, including changes to <code>pthread</code>, <code>mmap</code> and <code>TLS</code>.</p>
<p><strong>Build System</strong>
To address the build system changes, the PR applies version-specific patches to sources when fetched. This replaces the previous method of modifying the OCaml build system with <code>sed</code>s and <code>echo</code>s. The reasoning behind this change is twofold: Firstly, all bar one of the patches have been designed to improve how the compiler build system supports cross-compilation, simplifying the maintenance of the OCaml and Solo5 compatibility for MirageOS. Secondly, making the modification system into separate patches with full explanation messages makes reviewing them easier and clarifies to the user which modified build system they rely on. A neat bonus of this restructuring is that the <code>.opt</code> versions of the compiler are now also built, which should result in better performance, particularly when building large unikernels.</p>
<p><strong>Toolchain</strong>
The update also means that the <code>ocaml-solo5</code> package now installs a new <code>{aarch64,x86_64}-solo5-ocaml-*</code> toolchain. Creating a toolchain avoids baking the build-time directories containing <code>nolibc</code> and <code>openlibm</code> into the generated OCaml compilers. The package generates two versions of the toolchain: one with build-time directories, which is added to <code>PATH</code> only while the compiler builds, and the other with the final destination directories, which opam installs in the <code>bin</code> directory.</p>
<p>Besides adding compatibility with OCaml&nbsp;5 for Solo5, we are also exploring alternative options. In particular, we are working on adding Unikraft support to test whether it can provide better performance.</p>
<h2>The Unikraft Backend</h2>
<p><a href="https://unikraft.org/">Unikraft</a> is a Unikernel Development Kit that lets users create custom unikernels, with broad support for standard APIs to help port applications. It is an open-source project maintained and supported by over fifty active contributors. The main benefit of adding support for a Unikraft backend to MirageOS would be improved I/O performance in comparison to the Solo5 API.</p>
<p>Because of these potential benefits, adding the Unikraft backend is a high priority. Currently, there are several repositories being worked on, the most mature of which is <a href="https://github.com/shym/ocaml-unikraft"><code>ocaml-unikraft</code></a>. The project is still in an exploratory phase, and more updates will follow when we have more to share.</p>
<h2>Making the Build of OCaml Cross Compilers Easier</h2>
<p>In addition to new backends, part of improving the user experience with OCaml 5 has focused on improving the compiler’s build system, in particular regarding cross compilers, which is helpful for MirageOS users since OCaml-Solo5 is really a cross compiler to the Solo5 target.</p>
<p>The first step to streamlining the compiler’s build system involved reducing the number of <code>Makefile</code>s down to one, the root <code>Makefile</code>. By bringing all the build logic into one place and avoiding duplication and stacking dependencies, the compiler’s build system is more consistent and easier to use. The effort is split into many PRs, both big and small, including <a href="https://github.com/ocaml/ocaml/pull/11243">#11243</a>, <a href="https://github.com/ocaml/ocaml/pull/11248">#11248</a>, <a href="https://github.com/ocaml/ocaml/pull/11268">#11268</a>, <a href="https://github.com/ocaml/ocaml/pull/11420">#11420</a>, <a href="https://github.com/ocaml/ocaml/pull/11675">#11675</a>.</p>
<p>In addition to reducing the number of <code>Makefile</code>s, the build system improvements also involved improving <code>ocamldep</code>. It needed to be able to distinguish between source vs build trees and have support for <code>lex</code> and <code>yacc</code> input files. The effort also included breaking the <code>dynlink</code> library’s dependency on <code>compilerlibs</code> to make the build system simpler and faster (<a href="https://github.com/ocaml/ocaml/pull/11996">#11996</a>).</p>
<p>Continuing this work, several PRs have brought more fixes and improvements to cross-compilation, with more on the way. Currently, the largest upstreamed PRs are <a href="https://github.com/ocaml/ocaml/pull/13281">#13281</a>, <a href="https://github.com/ocaml/ocaml/pull/13282">#13282</a>, <a href="https://github.com/ocaml/ocaml/pull/13526">#13526</a>, and <a href="https://github.com/ocaml/ocaml/pull/13312">#13312</a>. If you use the new OCaml 5 capabilities in Mirage with either the Solo5 or the Unikraft backend, you can take advantage of the simplified build system and cross-compilation.</p>
<h2>Try it Yourself and Stay in Touch</h2>
<p><code>OCaml-Solo5</code> is released, and you can get started simply by building MirageOS in a 5.2.1 switch, with no pinning involved. Samuel wrote a quick guide on how to get started with <code>ocaml-solo5</code> in a <a href="https://discuss.ocaml.org/t/mirageos-on-ocaml-5/15822">Discuss post</a>, and we recommend you give it a try. For Unikraft, keep an eye out for updates as work continues behind the scenes.</p>
<p>You can connect with us on <a href="https://bsky.app/profile/tarides.com">Bluesky</a>, <a href="https://mastodon.social/@tarides">Mastodon</a>, <a href="https://www.threads.net/@taridesltd">Threads</a>, and <a href="https://www.linkedin.com/company/tarides">LinkedIn</a> or sign up for our mailing list to stay updated on our latest projects. We look forward to hearing from you!</p>
]]></description><link>https://tarides.com/blog/2025-02-06-mirageos-on-ocaml-5</link><guid isPermaLink="false">https://tarides.com/blog/2025-02-06-mirageos-on-ocaml-5.html</guid><dc:creator><![CDATA[ Isabella Leandersson ]]></dc:creator><pubDate>Thu, 06 Feb 2025 00:00:00 GMT</pubDate></item><item><title><![CDATA[Tarides: 2024 in Review]]></title><description><![CDATA[<p>At <a href="/">Tarides</a>, we believe in making OCaml a
mainstream programming language by improving its tooling and
integration with other successful ecosystems. In 2024, we focused our
efforts on initiatives to advance this vision by addressing key
technical challenges and engaging with the community to build a
stronger foundation for OCaml’s growth. This report details our work,
the rationale behind our choices, and the impact achieved. We are very
interested in getting your feedback: <a href="/contact/">please get in
touch</a> (or respond to the
<a href="https://discuss.ocaml.org/t/tarides-2024-in-review/15990">Discuss thread</a>)
if you believe we are going in the right direction.</p>
<p><em>TL;DR – In 2024, Tarides focused on removing adoption friction with
better documentation and tools; and on improving adoption via the
integration with three key thriving ecosystems: multicore programming,
web development, and Windows support. Updates to
<a href="http://ocaml.org">ocaml.org</a> improved onboarding and documentation,
while the <a href="https://preview.dune.build/">Dune Developer Preview</a>
simplified workflows with integrated package management. Merlin gained
<a href="/blog/2024-08-28-project-wide-occurrences-a-new-navigation-feature-for-ocaml-5-2-users/">project-wide reference
support</a>,
and <a href="https://discuss.ocaml.org/t/odoc-3-0-planning/14360">odoc 3</a>
is about to be released. OCaml 5.3 marked the first stable
multicore release, and <code>js_of_ocaml</code> achieved up to 8x performance
boosts in real-world commercial applications thanks to added support
for WebAssembly. On Windows, opam 2.2 brought full compatibility and
CI testing to all Tier 1 platforms on <code>opam-repository</code>, slowly moving
community packages towards reliable and better support for
Windows. Tarides’ community efforts included facilitating the creation of the first <a href="https://fun-ocaml.com/">FUN
OCaml conference</a> by both sponsoring and contributing resources to organise it, hosting many local meetups, and two
rounds of Outreachy internships.</em></p>
<h2>Better Tools: Toward a 1-Click Installation of OCaml</h2>
<p>Our primary effort in 2024 was to continue delivering on the <a href="https://ocaml.org/tools/platform-roadmap">OCaml
Platform roadmap</a> published
last year.  We focused on making it easier to get started with OCaml
by removing friction in the installation and onboarding process. Our
priorities were guided by the latest <a href="https://discuss.ocaml.org/t/ann-ocaml-user-survey-2023/13469">OCSF User
Survey</a>,
direct user interviews, and
<a href="https://discuss.ocaml.org/tag/user-feedback">feedback</a> gathered from
the OCaml community. Updates from Tarides and other OCaml Platform
maintainers were regularly shared in the <a href="https://discuss.ocaml.org/tag/platform-newsletter">OCaml Platform
Newsletter</a>.</p>
<h3>OCaml.org</h3>
<p>OCaml.org is the main entry point for new users of OCaml. Tarides
engineers are key members of the OCaml.org team. Using
<a href="https://plausible.ci.dev/ocaml.org">privacy-preserving analytics</a>,
the team tracked visitor behaviour to identify key areas for
improvement. This led to a redesign of the <a href="https://ocaml.org/install">installation
page</a>, simplifying the setup process, and a
revamp of the <a href="https://ocaml.org/docs/tour-of-ocaml">guided tour of
OCaml</a> to better introduce the
language. Both pages saw significant traffic increases compared to
2023, with the installation page recording 69k visits and the tour
reaching 65k visits. Encouragingly, the total number of visits
increased by 33% between Q3 and Q4 2024.</p>
<p><img sizes="(min-width: 1360px) 1360px, (min-width: 680px) 680px, 100vw" srcset="/blog/images/average-monthly-visits-170w~rWkwpgfVt8kpp1wxcxjlSQ.webp 170w, /blog/images/average-monthly-visits-340w~vuPLxIzmE3UZnKWYuRVFpA.webp 340w, /blog/images/average-monthly-visits-680w~VZYDziJj9q4nYy-zZzJkRw.webp 680w, /blog/images/average-monthly-visits-1360w~QEQSqhe66fLVpZwWodcg1Q.webp 1360w" src="/blog/images/average-monthly-visits-1360w~QEQSqhe66fLVpZwWodcg1Q.webp" alt="Average Monthly Visits"></p>
<p>Efforts to improve user experience included a satisfaction survey
where 75% of respondents rated their experience positively, compared
to 17% for the previous version of the site. User testing sessions
with 21 participants provided further actionable insights, and these
findings informed updates to the platform. The redesign of OCaml.org
community sections was completed using this feedback. It introduced
several new features: a new <a href="https://ocaml.org/community">Community landing
page</a>, an <a href="https://ocaml.org/academic-users">academic institutions
page</a> with course listings, and an
<a href="https://ocaml.org/industrial-users">industrial users showcase</a>. The
team also implemented an automated <a href="https://ocaml.org/events">event
announcement</a> system to inform the community
of ongoing activities.</p>
<p>Progress and updates were regularly shared through the <a href="https://discuss.ocaml.org/tag/ocamlorg-newsletter">OCaml.org
newsletters</a>,
keeping the community informed about developments. Looking ahead, the
team will continue refining the platform by addressing feedback,
expanding resources, and monitoring impact through analytics to
support both new and experienced OCaml users. Lastly, the
infrastructure the team built is starting to be used by other communities:
<a href="https://rocq-prover.org/">Rocq</a> just announced their brand new
website, built using the same codebase as ocaml.org!</p>
<h3>Dune as the Default Frontend of the OCaml Platform</h3>
<p>One of the main goals of the OCaml Platform is to make it easier for
users—especially newcomers—to adopt OCaml and build projects with
minimal friction. A critical step toward this goal is having a single
CLI to serve as the frontend for the entire OCaml development
experience (codenamed
<a href="https://speakerdeck.com/avsm/ocaml-platform-2017?slide=34">Bob</a> in
the past). This year, we made significant progress in that direction
with the release of the <a href="https://preview.dune.build/">Dune Developer
Preview</a>.</p>
<p>Setting up an OCaml project currently requires multiple tools: <code>opam</code>
for package management, <code>dune</code> for builds, and additional
installations for tools like OCamlFormat or Odoc. While powerful, this
fragmented workflow can make onboarding daunting for new users. The
Dune Developer Preview consolidates these steps under a single CLI,
making OCaml more approachable. With this preview, setting up and
building a project is as simple as:</p>
<ol>
<li><code>dune pkg lock</code> to lock the dependencies.</li>
<li><code>dune build</code> to fetch the dependencies and compile the project.</li>
</ol>
<p>This effort is also driving broader ecosystem improvements. The
current OCaml compiler relies on fixed installation paths, making it
difficult to cache and reuse across environments, so it cannot be
shared efficiently between projects. To address this, we are working
on making the compiler relocatable (<a href="https://hackmd.io/@dra27/ry56XtKii">ongoing
work</a>). This change will enable
compiler caching, which means faster project startup times and fewer
rebuilds in CI. As part of this effort, we also
<a href="https://github.com/ocaml-dune/opam-overlays/tree/main/packages">maintain</a>
patches to core OCaml projects to make them relocatable, and we
worked with upstream to merge them (as <a href="https://github.com/ocaml/ocamlfind/pull/72">for
ocamlfind</a>). Tarides
engineers also continued to maintain Dune and other key Platform
projects, ensuring stability and progress. This included organising
and participating in regular development meetings (for
<a href="https://discuss.ocaml.org/tag/dev-meetings">Dune</a>,
<a href="https://github.com/ocaml/opam/wiki/2024-Developer-Meetings">opam</a>,
<a href="https://github.com/ocaml/merlin/wiki/Public-dev%E2%80%90meetings">Merlin</a>,
<a href="https://github.com/ocaml-ppx/ppxlib/wiki#dev-meetings">ppxlib</a>, etc.)
to prioritise community needs and align efforts across tools like Dune
and opam to avoid overlapping functionality.</p>
<p>The Dune Developer Preview is an iterative experiment. Early user
feedback has been promising (the Preview’s NPS went from +9 in Q3
2024 to +27 in Q4 2024), and future updates will refine the
experience further. We aim to ensure that experimental features in the
Preview are upstreamed into stable releases once thoroughly
tested. For instance, the package management feature is already in
Dune 3.17. We will announce and document it more widely when we believe
it is mature enough for broader adoption.</p>
<h3>Editors</h3>
<p>In 2024, Tarides focused on improving editor integration to lower
barriers for new OCaml developers and enhance the experience for
existing users. Editors are the primary way developers interact with
programming languages, making seamless integration essential for
adoption. With more than <a href="https://survey.stackoverflow.co/2024/technology#1-integrated-development-environment">73% of developers using Visual Studio Code
(VS
Code)</a>,
VS Code is particularly important to support, especially for new
developers and those transitioning to OCaml. As part of this effort,
Tarides wrote and maintained the <a href="https://marketplace.visualstudio.com/items?itemName=ocamllabs.ocaml-platform">official VS Code plugin for
OCaml</a>,
prioritising feature development for this editor. We also support
other popular editors like Emacs and Vim—used by many Tarides
engineers—on a best-effort basis. Improvements to
<a href="https://github.com/ocaml/ocaml-lsp">OCaml-LSP</a> and
<a href="https://github.com/ocaml/merlin">Merlin</a>, both maintained by Tarides,
benefit all supported editors, ensuring a consistent and productive
development experience.</p>
<p><img sizes="(min-width: 1360px) 1360px, (min-width: 680px) 680px, 100vw" srcset="/blog/images/total-vscode-plugin-installations-170w~Gj2LhiFKGPb93gxsonkSmA.webp 170w, /blog/images/total-vscode-plugin-installations-340w~2yRY5sTJFJ4_cqCg8xzuEg.webp 340w, /blog/images/total-vscode-plugin-installations-680w~z3ZaCVsrZgVBhLjuYefEKA.webp 680w, /blog/images/total-vscode-plugin-installations-1360w~zl9hCa9ruBd7ugqHXK6h3Q.webp 1360w" src="/blog/images/total-vscode-plugin-installations-1360w~zl9hCa9ruBd7ugqHXK6h3Q.webp" alt="Total VSCode Plugin Installation"></p>
<p>While several plugins for OCaml exist (<a href="https://marketplace.visualstudio.com/items?itemName=freebroccolo.reasonml">OCaml and Reason
IDE</a>–128k
installs,
<a href="https://marketplace.visualstudio.com/items?itemName=hackwaly.ocaml">Hackwaly</a>–90k
installs), our <a href="https://marketplace.visualstudio.com/items?itemName=ocamllabs.ocaml-platform">OCaml VS Code
plugin</a>
–now with over 208k downloads– is a key entry point for developers
adopting OCaml in 2024. This year, we added integration with the Dune
Developer Preview, allowing users to leverage Dune's package
management and tooling directly from the editor. Features such as
real-time diagnostics, autocompletion, and the ability to fetch
dependencies and build projects without leaving VS Code simplify
development and make OCaml more accessible for newcomers.</p>
<p>The standout update in 2024 was the addition of <a href="/blog/2024-08-28-project-wide-occurrences-a-new-navigation-feature-for-ocaml-5-2-users/">project-wide
reference
support</a>,
a long-requested feature from the OCaml community and a top priority
for commercial developers. This feature allows users to locate all
occurrences of a term across an entire codebase, making navigation and
refactoring significantly easier—especially in large
projects. Delivering this feature required coordinated updates across
the ecosystem, including changes to the OCaml compiler, Merlin, OCaml
LSP, Dune, and related tools. The impact is clear: faster navigation,
reduced cognitive overhead, and more efficient workflows when working
with complex projects.</p>
<p>Additional improvements included support for new Language Server
Protocol features, such as <code>signature_help</code> and <code>inlay_hint</code>, which
enhance code readability and provide more contextual
information. These updates enabled the introduction of new commands,
such as the "Destruct" command. This <a href="/blog/2024-05-29-effective-ml-through-merlin-s-destruct-command/">little-known but powerful
feature</a>
automatically expands a variable into a pattern-matching expression
corresponding to its inferred type, streamlining tasks that would
otherwise be tedious.</p>
<p align="center">
<img src="/blog/images/2024-05-21.merlin-destruct/merlin-destruct-1~kHA8_iC67tU-2us0hsjbhQ.gif" alt="Destruct on expression">
</p>
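<p>As a rough illustration of what Destruct does, consider the sketch below. The <code>shape</code> type and the <code>area</code> function are hypothetical names chosen for this example, and the exact text Merlin generates may differ. Invoking the command on <code>s</code> produces a <code>match</code> with one case per constructor of the inferred type, leaving the right-hand sides to be filled in by hand:</p>
<pre><code class="language-ocaml">(* Hypothetical type, used only for illustration. *)
type shape =
  | Circle of float
  | Rectangle of float * float
  | Square of float

(* Destruct applied to [s] generates a match with one case per
   constructor, roughly of the shape below, with placeholder holes
   on the right-hand sides. The bodies here were written by hand. *)
let area s =
  match s with
  | Circle r -> Float.pi *. r *. r
  | Rectangle (w, h) -> w *. h
  | Square side -> side *. side

let () = Printf.printf "%.1f\n" (area (Rectangle (2.0, 3.0)))
</code></pre>
<p>Because the cases come from the type definition, adding a constructor to <code>shape</code> later makes the compiler flag this <code>match</code> as non-exhaustive, and Destruct can be used again to add the missing case.</p>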
<h3>Documentation</h3>
<p>Documentation was identified as the number one pain point in the
latest <a href="https://discuss.ocaml.org/t/ann-ocaml-user-survey-2023/13469">OCSF
survey</a>. It
is a critical step in the OCaml developer journey, particularly after
setting up the language and editor. Tarides prioritised improving
<code>odoc</code> to make it easier for developers to find information, learn the
language, and navigate the ecosystem effectively. High-quality
documentation and tools to help developers get "unstuck" are essential
to reducing friction and ensuring a smooth adoption experience.</p>
<p>Tarides is the primary contributor and maintainer of
<a href="https://github.com/ocaml/odoc"><code>odoc</code></a>, OCaml’s main documentation
tool. In preparation for the <a href="https://discuss.ocaml.org/t/odoc-3-0-planning/14360">odoc 3
release</a>, our
team introduced two significant updates. First, the <a href="/blog/2024-02-28-two-major-improvements-in-odoc-introducing-search-engine-integration/"><code>odoc</code> Search
Engine</a>
was integrated, allowing developers to search directly within OCaml
documentation via the <a href="https://ocaml.org/docs">Learn page</a>. Second,
the <a href="/blog/2024-09-17-introducing-the-odoc-cheatsheet-your-handy-guide-to-ocaml-documentation/"><code>odoc</code>
Cheatsheet</a>
provides a concise reference for creating and consuming OCaml
documentation. We would like to believe that these updates, deployed
on ocaml.org, were the main cause of a <strong>45% increase in package
documentation usage</strong> on
<a href="https://ocaml.org/pkg/">https://ocaml.org/pkg/</a> in Q4 2024!</p>
<p><img sizes="(min-width: 1360px) 1360px, (min-width: 680px) 680px, 100vw" srcset="/blog/images/discussions-and-documentation-170w~tuRRmxDfTszCAQgbxrGAEQ.webp 170w, /blog/images/discussions-and-documentation-340w~61eG6BHNSEW-LQKNrjrA1w.webp 340w, /blog/images/discussions-and-documentation-680w~CnFo9MN0V1JT_kcPKd9DEQ.webp 680w, /blog/images/discussions-and-documentation-1360w~VS_UyGh1l5BVh7iKz3avZw.webp 1360w" src="/blog/images/discussions-and-documentation-1360w~VS_UyGh1l5BVh7iKz3avZw.webp" alt="Discussions and Documentations"></p>
<p>Another area where developers often get stuck is debugging programs
that don’t work as expected. Alongside reading documentation, live
debuggers are crucial for understanding program issues. Tarides worked
to improve native debugging for OCaml, focusing on macOS, where LLDB
is the only supported debugger. Key progress included a <a href="https://github.com/ocaml/ocaml/pull/13050">name mangling
fix</a> to improve symbol
resolution, restoring ARM64 backtraces, and introducing Python shims
for code sharing between LLDB and GDB.</p>
<p>OCaml’s error messages remain a common pain point, particularly for
syntax errors. Unlike <a href="https://doc.rust-lang.org/error_codes/error-index.html">Rust’s error
index</a>, OCaml
does not (yet!) have a centralised repository of error
explanations. Instead, we are focused on making error messages more
self-explanatory. This requires developing new tools, such as
<a href="https://github.com/let-def/lrgrep"><code>lrgrep</code></a>, a domain-specific
language for analysing grammars built with Menhir. <code>lrgrep</code> enables
concise definitions of error cases, making it possible to identify and
address specific patterns in the parser more effectively. This
provides a practical way to improve error messages without requiring
changes to the compiler. In December 2024, @let-def successfully
defended his PhD (a collaboration between Inria and Tarides) on this
topic, so expect upstreaming work to start soon.</p>
<h3>OCaml Package Ecosystem</h3>
<p>The last piece of friction we aimed to remove in 2024 was ensuring
that users wouldn’t encounter errors when installing a package from
the community. This required catching issues early—before packages are
accepted into <code>opam-repository</code> and made available to the broader
ecosystem. To achieve this, Tarides has built and maintained extensive
CI infrastructure, developed tools to empower contributors, and guided
package authors to uphold the high quality of the OCaml package
ecosystem.</p>
<p>In 2024, Tarides’ CI infrastructure supported the OCaml community at
scale, handling approximately <strong>20 million jobs on 68 machines
covering 5 hardware architectures</strong>. This infrastructure continuously
tested packages to ensure compatibility across a variety of platforms
and configurations, including OCaml’s Tier 1 platforms: x86, ARM,
RISC-V, s390x, and Power. It played a critical role during major
events, such as new OCaml releases, by validating the ecosystem’s
readiness and catching regressions before they impacted
users. Additionally, this infrastructure supported daily submissions
to <code>opam-repository</code>, enabling contributors to identify and resolve
issues early, reducing downstream problems. To improve transparency
and accessibility, we introduced a CI pipeline that automates
configuration updates, ensuring seamless deployments and allowing
external contributors to propose and apply changes independently.</p>
<p>In addition to maintaining the infrastructure, Tarides developed and
maintained the CI framework running on top of it. A major focus in
2024 was making CI checks available as standalone CLI tools
distributed via <code>opam</code>. These tools enable package authors to run
checks locally, empowering them to catch issues before submitting
their packages to <code>opam-repository</code>. This approach reduces reliance on
central infrastructure and allows developers to work more
efficiently. The CLI tools are also compatible with GitHub Actions,
allowing contributors to integrate tests into their own workflows. To
complement these efforts, we enhanced <code>opam-repo-ci</code>, which remains an
essential safety net for packages entering the repository. Integration
tests for linting and reverse dependencies were introduced, enabling
more robust regression detection and improving the reliability of the
ecosystem.</p>
<p>To uphold the high standards of the OCaml ecosystem, every package
submission to <code>opam-repository</code> is reviewed and validated to ensure it
meets quality criteria. This gatekeeping process minimises errors
users might encounter when installing community packages, enhancing
trust in the ecosystem. In 2024, Tarides continued to be actively
<a href="https://github.com/ocaml/opam-repository/blob/master/governance/README.md#maintenance">involved</a>
in maintaining the repository, ensuring its smooth operation. We also
worked to guide new package authors by updating the <a href="https://github.com/ocaml/opam-repository/blob/master/CONTRIBUTING.md">contributing
guide</a>
and creating a detailed
<a href="https://github.com/ocaml/opam-repository/wiki">wiki</a> with actionable
instructions for adding and maintaining packages. These resources were
<a href="https://discuss.ocaml.org/t/opam-repository-updated-documentation-retirement-and-call-for-maintainers/14325">announced on
Discuss</a>
to reach the community and simplify the process for new contributors,
improving the overall quality of submissions.</p>
<h2>Playing Better with the Larger Ecosystem</h2>
<h3>Concurrent &amp; Parallel Programming in OCaml</h3>
<div class="text-center text-sm">
<em>"Shared-memory multiprocessors have never really 'taken off', at
least in the general public. For large parallel computations, clusters
(distributed-memory systems) are the norm. For desktop use,
monoprocessors are plenty fast."</em></div>
<div class="text-right text-xs mt-2">
  —
<a href="https://sympa.inria.fr/sympa/arc/caml-list/2002-11/msg00274.html">
    Xavier Leroy, November 2002
</a></div>
<p>Twenty+ years after this statement, processors are multicore by
default, and OCaml has adapted to this reality. Thanks to the combined
efforts of the OCaml Labs and Tarides teams, the OCaml 5.x series
introduced multicore support after <a href="/blog/2023-03-02-the-journey-to-ocaml-multicore-bringing-big-ideas-to-life/">a decade of research and
experimentation</a>.
While this was a landmark achievement, the path to making multicore
OCaml stable, performant, and user-friendly has required significant
collaboration and continued work. In 2024, Tarides remained focused on
meeting the needs of the broader community and commercial users.</p>
<p>OCaml 5.3 (released last week) was an important milestone in this
journey. With companies such as <a href="https://routine.co/">Routine</a>,
<a href="https://hyper.systems">Hyper</a>, and
<a href="/blog/2024-09-19-eio-from-a-user-s-perspective-an-interview-with-simon-grondin/">Asemio</a>
adopting OCaml 5.x, and advanced experimentation ongoing at Jane
Street, Tezos, Semgrep, and others, OCaml 5.3 is increasingly seen as
the first “stable” release of the multicore series. While some
<a href="https://github.com/ocaml/ocaml/issues/13733">performance issues</a>
remain in specific parts of the runtime, we are working closely with
the community to address them in OCaml 5.4. Tarides contributed
extensively to the
<a href="/blog/2024-05-15-the-ocaml-5-2-release-features-and-fixes/">5.2</a>
and
<a href="/blog/2025-01-09-ocaml-5-3-features-and-fixes/">5.3</a>
releases by directly contributing to <strong>nearly two-thirds of the merged
pull requests</strong>. Since Multicore OCaml was incorporated upstream in
2022, we have been continuously involved in the compiler and language
evolution in collaboration with Inria and the broader OCaml ecosystem.</p>
<p>Developing correct concurrent and parallel software is inherently
challenging, and this applies as much to the runtime as to
applications built on it. In 2024, we focused on advanced testing
tools to help identify and address subtle issues in OCaml’s runtime
and libraries. The <a href="https://github.com/ocaml-multicore/multicoretests">property-based test
suite</a> reached
maturity this year, uncovering over 40 critical issues, with 28
resolved by Tarides engineers. Trusted to detect subtle bugs, such as
<a href="https://github.com/ocaml/ocaml/pull/13580#issuecomment-2478454501">issues with orphaned
ephemerons</a>,
the suite has become an integral part of OCaml’s development
workflow. Importantly, it is accessible to contributors without deep
expertise in multicore programming, ensuring any changes in the
compiler or the runtime do not introduce subtle concurrency bugs.</p>
<p><img sizes="(min-width: 1360px) 1360px, (min-width: 680px) 680px, 100vw" srcset="/blog/images/false-alarms-plot-errors-only-170w~5YZPBXgUoTrAIc0KQle6iw.webp 170w, /blog/images/false-alarms-plot-errors-only-340w~5HTdqWao21Ru8BWhPQSXJA.webp 340w, /blog/images/false-alarms-plot-errors-only-680w~ulyPw_CnsHR2OYtym2eM_A.webp 680w, /blog/images/false-alarms-plot-errors-only-1360w~wOpCubYg66VDTbHXaZMcZw.webp 1360w" src="/blog/images/false-alarms-plot-errors-only-1360w~wOpCubYg66VDTbHXaZMcZw.webp" alt="A stacked histogram illustrating the outcome of CI workflow runs split, focusing only on the 'ci', 'genuine', and 'other' error categories"></p>
<p>Another critical effort was extending ThreadSanitizer (TSAN) support
to most Tier 1 platforms and <a href="/blog/2024-08-21-how-tsan-makes-ocaml-better-data-races-caught-and-fixed/">applying it extensively to find and fix
data races in the
runtime</a>. This
work has improved the safety and reliability of OCaml’s multicore
features and is now part of the standard testing process, further
ensuring the robustness of the runtime.</p>
<p>Beyond testing, we also worked to enhance library support for
multicore programming. The release of the <a href="/blog/2024-12-11-saturn-1-0-data-structures-for-ocaml-multicore/">Saturn
library</a>
introduced lock-free data structures tailored for OCaml 5.x. To
validate these structures, we developed
<a href="/blog/2024-04-10-multicore-testing-tools-dscheck-pt-2/">DSCheck</a>,
a model checker for verifying lock-free algorithms. These tools,
along with Saturn itself, provide developers with reliable building
blocks for scalable multicore applications.</p>
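<p>To give a flavour of what a lock-free data structure looks like, here is a minimal sketch of a Treiber-style stack built only on the standard library's <code>Atomic</code> and <code>Domain</code> modules. It is an illustration and not Saturn's implementation; the names <code>create</code>, <code>push</code>, and <code>pop</code> are chosen for this example.</p>

```ocaml
(* A minimal lock-free stack sketch (Treiber style), stdlib only.
   Illustrative; this is not how Saturn implements its structures. *)
type 'a t = 'a list Atomic.t

let create () : 'a t = Atomic.make []

(* Retry until the compare-and-set succeeds: no lock is ever taken. *)
let rec push (s : 'a t) x =
  let old = Atomic.get s in
  if not (Atomic.compare_and_set s old (x :: old)) then push s x

let rec pop (s : 'a t) =
  match Atomic.get s with
  | [] -> None
  | x :: rest as old ->
    if Atomic.compare_and_set s old rest then Some x else pop s

let () =
  let stack = create () in
  let worker () = for i = 1 to 1_000 do push stack i done in
  (* Four domains push concurrently onto the same stack. *)
  let domains = List.init 4 (fun _ -> Domain.spawn worker) in
  List.iter Domain.join domains;
  let rec drain n =
    match pop stack with None -> n | Some _ -> drain (n + 1)
  in
  Printf.printf "popped %d items\n" (drain 0)
```

<p>A production-quality structure would typically add backoff under contention and be checked under many interleavings, which is the kind of work tools such as DSCheck are meant to support.</p>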
<p>Another promising development in 2024 was the introduction of the
<a href="https://ocaml-multicore.github.io/picos/doc/picos/index.html">Picos</a>
framework. Picos aims to provide a low-level foundation for
concurrency, simplifying interoperability between libraries like Eio,
Moonpool, Miou, Riot, Affect, etc. Picos offers a simple,
unopinionated, and safe abstraction layer for concurrency. We believe
it can potentially standardise concurrency patterns in OCaml, but we
are not there yet. Discussions are underway to integrate parts of
Picos into higher-level libraries and, eventually, the standard
library. We still have a long way to go, and getting feedback from
people who actively tried it in production settings would be very
helpful!</p>
<h3>Web</h3>
<p>Web development remains one of the most visible and impactful domains
for programming languages; <a href="https://survey.stackoverflow.co/2024/technology#most-popular-technologies-language">JavaScript, HTML, and CSS are the most
popular
technologies</a>
in 2024. For OCaml to grow, it must integrate well with this
ecosystem. Fortunately, the OCaml community has already built a solid
foundation for web development!</p>
<p>On the frontend side, in 2024, Tarides focused on strengthening key
tools like <a href="https://github.com/ocsigen/js_of_ocaml"><code>js_of_ocaml</code></a>
by expanding its support for WebAssembly
(Wasm). <code>js_of_ocaml</code> (JSOO) has long been the backbone of OCaml’s web
ecosystem, enabling developers to compile OCaml bytecode into
JavaScript. This year, we <a href="https://github.com/ocsigen/js_of_ocaml/pull/1494">merged Wasm support back into
JSOO</a>, unifying the
toolchain and simplifying adoption for developers. The performance
gain of Wasm has been very impressive so far: CPU-intensive
applications in commercial settings have seen <strong>2x to 8x speedups</strong>
using Wasm compared to traditional JSOO. We also worked on better
support for effect handlers in <code>js_of_ocaml</code> to ensure applications
built with OCaml 5 can run as fast in the browser as they used to with
OCaml 4.</p>
<p>On the backend side, Tarides maintained and contributed to Dream, a
lightweight and flexible web framework. Dream powers projects like
<a href="/">our own website</a> and the
<a href="https://mirageos.org">MirageOS website</a>, where we maintain a fork to make
Dream and MirageOS work well together. Additionally, in 2024, we
enhanced <code>cohttp</code>, adding <a href="https://github.com/mirage/ocaml-cohttp/pull/847">proxy
support</a> to address
modern HTTP requirements.</p>
<p>While Tarides focused on JSOO, <code>wasm_of_ocaml</code>, Dream, and Cohttp, the
broader community made significant strides elsewhere. Tools like
Melange offer an alternative for compiling OCaml to JavaScript, and
frameworks like Ocsigen, which integrates backend and frontend
programming, continue to push the boundaries of what’s possible with
OCaml on the web. Notably, Tarides will build on this momentum in 2025
through a <a href="https://nlnet.nl/project/OCAML-directstyle/">grant</a> to
improve direct-style programming for Ocsigen.</p>
<h3>Windows</h3>
<p>Windows is the most widely used operating system, making first-class
support for it critical to OCaml’s growth. In 2024, <strong>31% of visitors
to <a href="https://ocaml.org">ocaml.org</a></strong> accessed the site from Windows,
yet the platform’s support historically lagged behind Linux and
macOS. This gap created barriers for both newcomers and commercial
users. We saw these challenges firsthand, with Outreachy interns
struggling to get started due to tooling issues, and commercial users
reporting difficulties with workflow reliability and compilation
speed.</p>
<p>To address these pain points, Tarides, in collaboration with the OCaml
community, launched the <a href="/blog/2024-05-22-launching-the-first-class-windows-project/">Windows Working
Group</a>. A
key milestone that our team contributed to was the release this year
of <strong>opam 2.2</strong>, three years after its predecessor. This release made
the upstream <code>opam-repository</code> fully compatible with Windows for the
first time, removing the need for a separate repository and providing
Windows developers access to the same ecosystem as Linux and macOS
users. The impact has been clear: feedback on the updated installation
workflow has been overwhelmingly positive, with developers reporting
that it "just works." The <a href="https://ocaml.org/install">install page</a>
for Windows is now significantly shorter and simpler!</p>
<p>In the OCaml 5.3 release, Tarides restored the MSVC Windows port,
ensuring native compatibility and improving performance for Windows
users. To further support the ecosystem, Tarides added Windows
machines to the opam infrastructure, enabling automated testing for
Windows compatibility on every new package submitted to opam. This has
already started to improve package support, with ongoing fixes from
Tarides and the community. The results are publicly visible at
<a href="https://windows.check.ci.dev/">windows.check.ci.dev</a>, which we run on
our infrastructure, providing transparency and a way to track progress
on the status of our ecosystem. While package support is not yet on
par with other platforms, we believe that the foundations laid in
2024—simplified installation, improved tooling, and continuous package
testing—represent a significant step forward.</p>
<h2>Community Engagement and Outreach</h2>
<p>In 2024, Tarides contributed to building a stronger OCaml community
through events, internships, and support for foundational
projects. <a href="https://fun-ocaml.com/">FUN OCaml 2024</a> in Berlin was the first dedicated OCaml-only event in a long time (similar to how the OCaml Workshop was separate from ICFP in the past). The conference was a huge success thanks to the considerable efforts of David Sancho and Dmitriy Kovalenko, who took on the majority of the organisational work, Sabine Schmaltz, who ran the conference and web infrastructure, and others inside Tarides, including Claire Vandeberghe, who worked on the website design. Over 75 participants joined for two days of talks, workshops, and hacking, and the event has already reached <a href="https://www.youtube.com/channel/UC3TI-fmhJ_g3_n9fHaXGZKA">5k+ views on
YouTube</a> (for more details, you can check out our dedicated <a href="/blog/2024-11-13-the-new-conference-on-the-block-what-is-fun-ocaml/">blog post on FUN OCaml</a>). Tarides also co-chaired the OCaml Workshop at <a href="https://icfp24.sigplan.org/">ICFP
2024</a> in Milan, bringing together contributors from academia, industry, and open-source communities. Between them, these events attracted two different kinds of OCaml developers (with some overlap), bringing exciting new perspectives, changes, and developments to our community.</p>
<p>To expand local community involvement, Tarides organised OCaml hacking
meetups in
<a href="https://discuss.ocaml.org/t/announcing-ocaml-manila-meetups/14300">Manila</a>
and
<a href="https://discuss.ocaml.org/t/chennai-ocaml-meetup-october-2024/15417">Chennai</a>. To
make it easier for others to host similar events, we curated a list of
interesting hacking issues from past <a href="/blog/2023-03-22-compiler-hacking-in-cambridge-is-back/">Cambridge
sessions</a>,
now available on
<a href="https://github.com/tarides/compiler-hacking/wiki">GitHub</a>.</p>
<p>As part of the Outreachy program, Tarides supported two rounds of
internships in 2024, with results published on
<a href="https://discuss.ocaml.org/tag/outreachy">Discuss</a> and
<a href="https://watch.ocaml.org">watch.ocaml.org</a>. These internships not only
provided great contributions to our ecosystem but also brought fresh
insights into the challenges faced by new users. For example, interns
identified key areas where documentation and tooling could be
improved, directly informing future updates.</p>
<p>Tarides also maintained its commitment to funding critical open-source
projects and maintainers. We continued funding
<a href="https://blog.robur.coop/articles/finances.html">Robur</a> for their
maintenance work on MirageOS (most of those libraries are used by many
–including us– even in non-MirageOS context) and <a href="https://github.com/sponsors/dbuenzli">Daniel
Bünzli</a>, whose libraries like
<code>cmdliner</code> are essential for some of our development.</p>
<p>Finally, Tarides extended sponsorships to non-OCaml-specific events,
including <a href="https://jfla.inria.fr/jfla2024.html">JFLA</a>,
<a href="https://bobkonf.de/2025/en/">BobConf</a>,
<a href="https://www.fsttcs.org.in/">FSTTCS</a>, and <a href="https://www.youtube.com/watch?v=fMy0XhFdLAE">Terminal
Feud</a> (which garnered
over 100k views). These events expanded OCaml’s visibility to new
audiences and contexts, introducing the language to a broader
technical community that –we hope– will discover OCaml and enjoy using
it as much as we do.</p>
<h2>What’s Next?</h2>
<p>As we begin 2025, Tarides remains committed to making OCaml a
mainstream language. Our focus this year is to position OCaml as a
robust choice for mission-critical applications by enhancing developer
experience, ecosystem integration, and readiness for high-assurance
use cases.</p>
<p>We aim to build on the Dune Developer Preview to further improve
usability across all platforms, with a particular emphasis on Windows,
to make OCaml more accessible to a broader range of
developers. Simultaneously, we will ensure OCaml is ready for critical
applications in industries where reliability, performance, and
security are essential. Projects like
<a href="/blog/2023-07-31-ocaml-in-space-welcome-spaceos/">SpaceOS</a>
showcase the potential of memory- and type-safe languages for
safety-critical systems. Built on MirageOS and OCaml’s
unique properties, SpaceOS is part of the EU-funded
<a href="https://orchide.pages.upb.ro/">Orchide</a> project and aims to set a new
standard for edge computing in space. Additionally, SpaceOS is being
launched in the US through our spin-off
<a href="https://parsimoni.co">Parsimoni</a>. However, these needs are not
limited to Space: both the <a href="https://digital-strategy.ec.europa.eu/en/policies/cyber-resilience-act">EU Cyber Resilience
Act</a>
and the <a href="/blog/2024-03-07-a-time-for-change-our-response-to-the-white-house-cybersecurity-press-release/">US cybersecurity
initiatives</a>
highlight the growing demand for type-safe, high-assurance software to
address compliance and security challenges in sensitive
domains. Tarides believes that OCaml has a decisive role to play here
in 2025!</p>
<p>I’d like to personally thank our sponsors and customers, especially
Jane Street, for their unwavering support over the years, and to
<a href="https://github.com/dangdennis">Dennis Dang</a>, our single recurring
GitHub sponsor. Finally, to every member of Tarides who worked so hard
in 2024 to make all of this happen: thank you. I’m truly lucky to be
sailing with you on this journey!</p>
<p><em>We are looking for <a href="https://github.com/sponsors/tarides">sponsors on
GitHub</a>, are happy to
<a href="/innovation/">collaborate on innovative projects</a>
involving OCaml or MirageOS and offer <a href="/services/">commercial
services</a> for open-source projects –
including long-term support, development of new tools, or assistance
with porting projects to OCaml 5 or Windows.</em></p>
]]></description><link>https://tarides.com/blog/2025-01-20-tarides-2024-in-review</link><guid isPermaLink="false">https://tarides.com/blog/2025-01-20-tarides-2024-in-review.html</guid><dc:creator><![CDATA[ Thomas Gazagnaire ]]></dc:creator><pubDate>Mon, 20 Jan 2025 00:00:00 GMT</pubDate></item><item><title><![CDATA[Using `clang-cl` With OCaml 5]]></title><description><![CDATA[<p>Bringing new features to OCaml is not a trivial procedure, and any new contribution is subject to rigorous testing and inspection. The introduction of <a href="/blog/2023-07-07-making-ocaml-5-succeed-for-developers-and-organisations/">Multicore OCaml</a> added a whole new dimension of complexity to the process, and this post takes you behind-the-scenes of a project that sprang from troubleshooting the restoration of MSVC to OCaml 5.</p>
<h2>Motivation</h2>
<p><a href="/blog/2025-01-09-ocaml-5-3-features-and-fixes/">OCaml 5.3 is released</a> and comes with a restored <a href="https://learn.microsoft.com/en-us/cpp/?view=msvc-170">MSVC</a> (Microsoft Visual C/C++ compiler) port. It had been removed with the introduction of the multicore runtime in OCaml 5.0. The <a href="https://github.com/ocaml/ocaml/pull/12954">Restore the MSVC port of OCaml #12954</a> PR lists all the prerequisite changes, and the final streak of changes, required to restore support.</p>
<p>The OCaml 5 runtime requires <a href="https://en.cppreference.com/w/c/atomic">C11 atomic</a> support from the C compiler, which for MSVC was introduced in <a href="https://devblogs.microsoft.com/cppblog/c11-atomics-in-visual-studio-2022-version-17-5-preview-2/">Visual Studio 2022 version 17.5 Preview 2</a>. This early version had a few bugs during code generation and we were wondering if they would impact the OCaml runtime (fortunately, they did not). While the Microsoft team worked hard to fix the bugs, we learned about <a href="https://clang.llvm.org/docs/UsersManual.html#clang-cl"><code>clang-cl</code></a>, a driver program for Clang that attempts to be compatible with MSVC's <code>cl.exe</code>. It is ABI-compatible with MSVC, implements all MSVC compiler extensions, and is also a drop-in command-line replacement for <code>cl.exe</code>.</p>
<p>I wanted to try building OCaml with <code>clang-cl</code>, mostly because it would get us a second compiler's opinion on our code, as Clang has a different set of warnings and hints than MSVC. Seeing that <code>clang-cl</code> was already <a href="https://blog.llvm.org/2018/03/clang-is-now-used-to-build-chrome-for.html">used to build Chrome</a> and Firefox (<a href="https://blog.mozilla.org/nfroyd/2018/05/29/when-implementation-monoculture-right-thing/">1</a>, <a href="https://blog.mozilla.org/nfroyd/2019/04/25/an-unexpected-benefit-of-standardizing-on-clang-cl/">2</a>), I was hoping the needs of the OCaml runtime would have already been covered and bugs fixed, and that we could adopt it seamlessly.</p>
<p>The OCaml 5 runtime uses the POSIX threads (pthreads) library on Unix-like systems for all its concurrency primitives. For the OCaml 5.3 branch we've chosen to use the <code>winpthreads</code> library, part of the MinGW-w64 project, which implements <code>pthreads</code> on Windows. The OCaml 5 MinGW-w64 port uses it, and we found out we could use it with MSVC too. I then submitted two patch series to <code>winpthreads</code> (<a href="https://sourceforge.net/p/mingw-w64/mailman/mingw-w64-public/thread/20231204154806.2076-1-antonin%40tarides.com/#msg58709057">Patches and cleanups towards MSVC support</a>, <a href="https://sourceforge.net/p/mingw-w64/mailman/mingw-w64-public/thread/20240126143246.12930-1-antonin%40tarides.com/#msg58729186">MSVC support without GCC extensions</a>), checking my work with MinGW-w64+GCC, MinGW-w64+clang, MSVC, and <code>clang-cl</code>, foreshadowing their use within the OCaml runtime. What an adventure! And thanks to the MinGW-w64 team for reviewing this work.</p>
<h2>Using <code>clang-cl</code></h2>
<p>Clang on Unix-like systems masquerades as GCC, supports all GNU C extensions and defines the <code>__GNUC__</code> macro. On Windows, it masquerades as MSVC and defines the <code>_MSC_VER</code> macro instead, but <em>still</em> supports the GNU C extensions that we can take advantage of!</p>
<p>The build system only required <a href="https://github.com/ocaml/ocaml/pull/13093">a few changes</a>. For instance, MSVC currently defaults to C99 and needs two flags to switch to C11 and enable experimental C11 atomic support, whereas <code>clang-cl</code> defaults to C17. We also <a href="https://lists.gnu.org/archive/html/autoconf/2024-04/msg00000.html">discussed</a> how to improve the support of MSVC in Autoconf, which led to a few patches. We could then use <code>clang-cl</code> to discover new problems reported by the warnings it raised and fix them. In conjunction with this work, I raised the warning level of MSVC on the OCaml runtime C code from none to <code>-W2</code>.</p>
<p>Fortunately, most of the warnings were quite minor (see <a href="https://github.com/ocaml/ocaml/pull/13081">#13081</a> and <a href="https://github.com/ocaml/ocaml/pull/13243">#13243</a>), mainly consisting of warnings for deprecated functions or implicit truncations when converting integer or floating-point values of different sizes. Switching to newer compilers also allowed us to remove dead code and workarounds for older versions of the compilers.</p>
<p>I found out that most of the uses of compiler <a href="https://clang.llvm.org/docs/AttributeReference.html">attributes</a> or <a href="https://clang.llvm.org/docs/LanguageExtensions.html#id33">builtins</a> were guarded by the <code>__GNUC__</code> macro, and as such were only enabled by GCC or Clang on Unix, even though <code>clang-cl</code> on Windows supports them too. Compiler attributes may enable more checks and warnings from the compiler. For instance, the <a href="https://clang.llvm.org/docs/AttributeReference.html#format"><code>format</code></a> attribute tags a function to be <code>printf</code>-like and checks the types of the list of values passed to it against the specifiers inside a format string. Compiler builtins may enable more optimisations, such as <a href="https://clang.llvm.org/docs/LanguageExtensions.html#builtin-expect"><code>__builtin_expect</code></a>. So, instead of <em>guarding</em> their use, we could <em>discover</em> whether the compiler provides them using newer macros such as <code>__has_attribute</code> or <code>__has_builtin</code>. This improved <a href="https://github.com/ocaml/ocaml/pull/13280">feature parity</a> between the <code>clang-cl</code> port and an OCaml build using GCC or Clang.</p>
<p>In particular, we could <a href="https://github.com/ocaml/ocaml/pull/13239">detect the labels as values</a> (also known as <em>computed gotos</em>) compiler extension to enable threaded code interpretation, which dramatically improves the speed of <code>ocamlrun</code>, the OCaml bytecode interpreter. This optimisation isn't supported by MSVC and would require us to use inline assembly on x86 or write part of the bytecode interpreter in assembly on other architectures. If you're often using the bytecode interpreter, there's now a clear advantage of using <code>clang-cl</code> over MSVC. Another interesting optimisation uses <a href="https://github.com/ocaml/ocaml/pull/13238">software prefetching</a> to speed up the GC when traversing the graph of values. We had worked on <a href="https://github.com/ocaml/ocaml/pull/11827">restoring it in OCaml 5</a> from <a href="https://github.com/ocaml/ocaml/pull/10195">OCaml 4</a> but forgot to port it to Windows!</p>
<p>The overall work on restoring the MSVC port of OCaml and also building it with <code>clang-cl</code> led to a few bug reports to Microsoft and to the <a href="https://github.com/llvm/llvm-project/issues?q=is%3Aissue+author%3AMisterDA">LLVM project</a>, which I hope will benefit the community. The OCaml project now has an extra set of compiler eyes that scrutinise each and every change on Windows. Windows users may now take advantage of a (compliant) C11 and C23 FOSS compiler, with a wide range of optimisations and checks available.</p>
<p>I'm grateful to my colleagues at Tarides for helping me with this work, the OCaml core team for reviewing it, and Jane Street for sponsoring this effort.</p>
<h2>Try it Out and Stay in Touch!</h2>
<p>With the release of <a href="https://opam.ocaml.org/blog/opam-2-2-0/"><code>opam</code> 2.2</a> (now 2.3) supporting Windows, the restoration of the Cygwin port in OCaml 5.1 and the MSVC port in OCaml 5.3, the option to build OCaml with <code>clang-cl</code>, and the swarm of bug fixes that accompanied them, using OCaml on Windows has never been easier! Give it a whirl!</p>
<p>Connect with Tarides online on <a href="https://bsky.app/profile/tarides.com">Bluesky</a>, <a href="https://mastodon.social/@tarides">Mastodon</a>, <a href="https://www.threads.net/@taridesltd">Threads</a>, and <a href="https://www.linkedin.com/company/tarides">LinkedIn</a> or sign up for our mailing list to stay updated on our latest projects.</p>
]]></description><link>https://tarides.com/blog/2025-01-15-using-clang-cl-with-ocaml-5</link><guid isPermaLink="false">https://tarides.com/blog/2025-01-15-using-clang-cl-with-ocaml-5.html</guid><dc:creator><![CDATA[ Antonin Décimo ]]></dc:creator><pubDate>Wed, 15 Jan 2025 00:00:00 GMT</pubDate></item><item><title><![CDATA[OCaml 5.3: Features and Fixes!]]></title><description><![CDATA[<p>We have a brand new OCaml release on our hands! 5.3 comes packed with features, fixes, and optimisations, including the return of some ‘familiar faces’. Support for the MSVC port is returning, as is statistical memory profiling now compatible with multicore projects.</p>
<p>This post highlights new and restored features, notable changes and user experience improvements, plus some bug fixes. There is no way that I can cover everything in this update, so I recommend that you check out the <a href="https://github.com/ocaml/ocaml/blob/5.3/Changes">Changes document</a> on GitHub for the full list of contributions!</p>
<h2>MSVC</h2>
<p>The 5.3 release restores support for the MSVC port of OCaml on Windows, marking the last remaining platform from 4.x to regain support in 5.x. This is part of a wider effort to achieve feature parity between OCaml 4.14 and OCaml 5, of which <a href="/blog/2024-09-11-feature-parity-series-compaction-is-back/">compaction</a> is a previous example, making the transition between versions as smooth as possible. The bulk of the effort is summarised in PRs <a href="https://github.com/ocaml/ocaml/pull/12954">#12954</a> and <a href="https://github.com/ocaml/ocaml/pull/12909">#12909</a> opened by David Allsopp, Antonin Décimo, and Samuel Hym (review by Miod Vallat and Nicolás Ojeda Bär).</p>
<p>Since the OCaml 5 runtime uses C11 atomics, supported platforms need to be compatible with them as well. Visual Studio 2022 introduced experimental support for C11 atomics, which made the MSVC port of OCaml 5 possible, but the team needed to test out the feature first. This exploratory effort led to <a href="https://developercommunity.visualstudio.com/t/C11-atomics-Pointers-to-atomic-values-/10507360">several bug reports</a> addressed by Microsoft, and once these were completed (alongside a lot of other work, including fixing the <code>winpthreads</code> library of the <code>mingw-w64</code> project so that it builds with MSVC), the MSVC port was ready for public release.</p>
<p>As part of the project bringing MSVC back, the team explored <a href="https://clang.llvm.org/docs/UsersManual.html#clang-cl">clang-cl</a>, an alternative command line interface to <a href="https://clang.llvm.org/">Clang</a> designed to be compatible with the MSVC compiler <code>cl.exe</code>. This was helpful because <code>clang-cl</code> has a different set of warnings and hints from MSVC, and using it effectively gave them a ‘second opinion’ on their code. The main PR for this side of the project is <a href="https://github.com/ocaml/ocaml/pull/13093">#13093</a>.</p>
<h2>Statmemprof</h2>
<p>OCaml 4.14 had support for statistical memory profiling, a feature of the language that can sample memory allocations, allowing tools like <a href="https://github.com/janestreet/memtrace">Memtrace</a> to <a href="https://blog.janestreet.com/finding-memory-leaks-with-memtrace/">help users identify how their programs are using memory</a>. The multicore update introduced significant complexity to the process, which made it necessary to drop support in 5.0; but work soon commenced to restore support under our feature parity banner! In 5.3, <code>statmemprof</code> makes its return, now equipped with multicore capabilities.</p>
<p>So how does it work? <code>Statmemprof</code> samples memory allocations at a given rate (lambda), expressed per word of allocated data. Monitoring every allocation would be far too expensive performance-wise, but by sampling a random fraction of allocations we can still monitor programs in a language like OCaml, which allocates memory at a high rate.</p>
<p>The new design has a lot in common with the OCaml 4 implementation of statmemprof, but with several tricky optimisations and changes to account for the significant complication of multiple domains and threads. Delve into the details in the PRs <a href="https://github.com/ocaml/ocaml/pull/12923">#12923</a> and <a href="https://github.com/ocaml/ocaml/issues/11911">#11911</a> by Nick Barnes (external reviews by Stephen Dolan, Jacques-Henri Jourdan, and Guillaume Munch-Maccagnoni).</p>
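<p>As a rough illustration of the sampling idea, the sketch below counts samples through the standard library's <code>Gc.Memprof</code> module. It assumes OCaml 5.3 (or 4.14), where the module is functional; the helper <code>profile_allocations</code> and the chosen sampling rate are made up for this example, and real tools such as Memtrace do considerably more.</p>

```ocaml
(* A minimal sketch of statistical memory profiling with Gc.Memprof.
   Assumes OCaml 5.3 (or 4.14); the helper names are hypothetical. *)

(* Total number of samples reported to the callbacks. *)
let samples = ref 0

(* Called for each sampled block; returning None means the block is
   not tracked any further (no promote/dealloc callbacks for it). *)
let count (info : Gc.Memprof.allocation) =
  samples := !samples + info.Gc.Memprof.n_samples;
  None

let tracker : (unit, unit) Gc.Memprof.tracker =
  { Gc.Memprof.null_tracker with alloc_minor = count; alloc_major = count }

(* Run [f] with sampling enabled and return its result together with
   the number of samples observed. The value returned by [start] is
   ignored here to keep the sketch short. *)
let profile_allocations ~sampling_rate f =
  samples := 0;
  ignore (Gc.Memprof.start ~sampling_rate tracker);
  let result = Fun.protect ~finally:Gc.Memprof.stop f in
  (result, !samples)

let () =
  let _, n =
    profile_allocations ~sampling_rate:0.01 (fun () ->
      Sys.opaque_identity (List.init 100_000 (fun i -> (i, i))))
  in
  (* The count is random: on average, about 1% of allocated words. *)
  Printf.printf "observed %d samples\n" n
```

<p>Because only a fraction of allocated words is sampled, the reported number varies between runs; lowering the rate reduces overhead at the cost of precision.</p>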
<h2>Deep Effect Handlers</h2>
<p>OCaml 5.0 came with experimental support for algebraic effects, which allow users to describe computations and what effects they are expected to create. A <em>handler</em> essentially manages a computation by monitoring its execution and keeping track of resulting effects. This ‘management’ can be done in two ways, <em>deeply</em> or <em>shallowly</em>. A shallow effect handler monitors a computation until it either terminates or generates one effect, only handling that effect. A deep effect handler always manages a computation until it terminates and handles all of the effects performed by it.</p>
<p>PR <a href="https://github.com/ocaml/ocaml/pull/12309">#12309</a> (Leo White, Tom Kelly, Anil Madhavapeddy, KC Sivaramakrishnan, Xavier Leroy and Florian Angeletti, review by the same, Hugo Heuzard, and Ulysse Gérard) introduces effect syntax for deep effect handlers, rules that define the structure for writing them, compatible with the type checker and with support for pattern matching. This change aims to simplify the code needed to use deep effect handlers, improving user experience. Note that you can still use shallow effect handlers, and there is a good tutorial for using both in the <a href="https://ocaml.org/manual/5.3/effects.html">correspondingly updated manual page</a>.</p>
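<p>A small sketch of the new syntax, assuming OCaml 5.3 or later: the <code>Ask</code> effect and the functions <code>sum_two_asks</code> and <code>run_with</code> are hypothetical names invented for this example. Because the handler is deep, it handles both occurrences of the effect performed by the computation.</p>

```ocaml
(* Requires OCaml 5.3 or later for the [effect] pattern syntax. *)
open Effect
open Effect.Deep

(* A hypothetical effect that asks the handler for an integer. *)
type _ Effect.t += Ask : int Effect.t

(* Performs [Ask] twice; a deep handler sees both occurrences. *)
let sum_two_asks () = perform Ask + perform Ask

(* Run [f], answering every [Ask] with [answer]. *)
let run_with answer f =
  match f () with
  | result -> result
  | effect Ask, k -> continue k answer

let () = Printf.printf "%d\n" (run_with 21 sum_two_asks) (* prints 42 *)
```

<p>Before 5.3, the same handler would have been written with <code>Effect.Deep.try_with</code> or <code>match_with</code> and an explicit record of handler functions, which is noticeably more verbose.</p>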
<h2>Debugging Improvements</h2>
<p>Another long-term project coming to fruition in this update is a set of improvements to debugging on macOS. The platform is popular with a wide variety of OCaml users, including compiler developers, and they need good debugging workflows for their programs.</p>
<p><a href="https://lldb.llvm.org">LLDB</a> is the only supported native debugger on macOS, for both the ARM64 and x86_64 architectures. The improvements enable several new features:</p>
<ul>
<li><strong><a href="https://github.com/ocaml/ocaml/pull/13163">#13163</a> enable frame pointers on macOS x86_64 (Tim McGilchrist, review by Sébastien Hinderer and Fabrice Buoro):</strong> This PR introduces support for stack walking, a common technique used by profiling tools including Linux perf, eBPF, FreeBSD, and LLDB. Various performance tools use stack walking to reconstruct call graphs for programs, and frame pointers are what enable them to do so.</li>
<li><strong><a href="https://github.com/ocaml/ocaml/pull/13241">#13241</a>, <a href="https://github.com/ocaml/ocaml/pull/13261">#13261</a>, <a href="https://github.com/ocaml/ocaml/pull/13271">#13271</a>, add CFI_SIGNAL_FRAME to arm64 and RISC-V runtimes for the purpose of displaying backtraces correctly in GDB (Tim McGilchrist, review by Miod Vallat, Gabriel Scherer and KC Sivaramakrishnan):</strong> This change helps sync up the runtime for the arm64 architecture for macOS (and the RISC-V runtime) with the amd64 and s390x runtimes. The two additional PRs add improvements to the first.</li>
<li><strong><a href="https://github.com/ocaml/ocaml/pull/13136">#13136</a> Compatible LLDB and GDB Python extensions (Nick Barnes):</strong> This PR replaces some old GDB macros (used to debug OCaml programs) with faster and more capable extensions, and makes those extensions available in LLDB. This is especially useful to macOS users who can’t use GDB.</li>
</ul>
<h2>OS-Based Synchronisation for Stop-the-World Sections</h2>
<p>PR <a href="https://github.com/ocaml/ocaml/pull/12579">#12579</a> (B. Szilvasy, review by Miod Vallat, Nick Barnes, Olivier Nicole, Gabriel Scherer and Damien Doligez) improves user experience by replacing generic busy-wait synchronisation with OS-based synchronisation primitives, namely barriers and futexes. The change has significant performance benefits, especially on Windows machines, where spinning was causing long wait times. You can learn more about it in <a href="/blog/2024-07-10-deep-dive-optimising-multicore-ocaml-for-windows/">our blog post on the project</a>.</p>
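<p>The PR itself changes the C runtime, but the difference between spinning and OS-based waiting can be sketched at the OCaml level with the standard library's <code>Mutex</code> and <code>Condition</code> modules. The <code>gate</code> type and its functions below are hypothetical names for this illustration only; a waiting domain sleeps in the operating system until it is woken, instead of burning CPU in a busy-wait loop.</p>

```ocaml
(* A one-shot gate: waiters block in the OS instead of spinning.
   An OCaml-level analogy only; the runtime change is written in C. *)
type gate = {
  mutex : Mutex.t;
  cond : Condition.t;
  mutable opened : bool;
}

let make_gate () =
  { mutex = Mutex.create (); cond = Condition.create (); opened = false }

(* Sleep until the gate is opened; the loop guards against spurious
   wake-ups, it does not spin. *)
let wait g =
  Mutex.lock g.mutex;
  while not g.opened do
    Condition.wait g.cond g.mutex
  done;
  Mutex.unlock g.mutex

(* Open the gate and wake every waiter. *)
let open_gate g =
  Mutex.lock g.mutex;
  g.opened <- true;
  Condition.broadcast g.cond;
  Mutex.unlock g.mutex

let () =
  let g = make_gate () in
  let waiter = Domain.spawn (fun () -> wait g; "released") in
  open_gate g;
  print_endline (Domain.join waiter)
```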
<h2>User Experience Improvements</h2>
<ul>
<li><strong><a href="https://github.com/ocaml/ocaml/pull/12868">#12868</a> Refresh HTML manual/API docs style (Yawar Amin, review by Simon Grondin, Gabriel Scherer, and Florian Angeletti):</strong>  An update to the <a href="https://ocaml.org/manual/5.3/index.html">OCaml Manual</a> which simplifies the colours, removes the gradients, and fixes the search button. It’s a nice improvement to a part of the OCaml ecosystem that is visible to users of all different backgrounds and contexts.</li>
<li><strong><a href="https://github.com/ocaml/ocaml/pull/13201">#13201</a>, <a href="https://github.com/ocaml/ocaml/pull/13244">#13244</a> (Sébastien Hinderer, review by Miod Vallat, Gabriel Scherer and Olivier Nicole), and <a href="https://github.com/ocaml/ocaml/pull/12904">#12904</a> (Olivier Nicole, suggested by Sébastien Hinderer and David Allsopp, external review by Gabriel Scherer) various improvements to TSan:</strong> These three PRs represent the continuous work being put in to improve ThreadSanitizer (TSan) support. They include speedups and the ability for users to choose which PRs they want to run the TSan testsuite on.</li>
<li><strong><a href="https://github.com/ocaml/ocaml/pull/13014">#13014</a> (Miod Vallat, review by Nicolás Ojeda Bär) add per function sections support to the missing compiler backends:</strong> This PR is an example of how much focus there is on ensuring that each native backend is equally supported, having features available across all Tier-1 platforms. Here, the compile-time option <code>function-sections</code> was re-enabled on all previously unsupported (POWER, riscv64, and s390x) native backends.</li>
<li><strong><a href="https://github.com/ocaml/ocaml/pull/11996">#11996</a> emancipate <code>dynlink</code> from <code>compilerlibs</code> (Sébastien Hinderer and Stephen Dolan, review by Damien Doligez and Hugo Heuzard):</strong> The <code>dynlink</code> library used to depend on <code>compilerlibs</code> and had to embed a copy of it, which meant that <code>compilerlibs</code> was compiled twice, costing the user time and performance. After the change, the build time and size of both <code>dynlink.cma</code> and <code>dynlink.cmxa</code> were reduced.</li>
</ul>
<h2>Miscellaneous Bug Fixes</h2>
<p>These two bug fixes grew out of internship projects at Tarides. It's great to see how these projects can benefit the language as a whole.</p>
<ul>
<li><strong><a href="https://github.com/ocaml/ocaml/pull/13419">#13419</a> (B. Szilvasy and Nick Barnes, review by Miod Vallat, Nick Barnes, Tim McGilchrist, and Gabriel Scherer):</strong> This PR addressed resource leaks that caused memory bugs in the runtime events system.</li>
<li><strong><a href="https://github.com/ocaml/ocaml/pull/13535">#13535</a> (Antonin Décimo, Nick Barnes, report by Nikolaus Huber and Jan Midtgaard, review by Florian Angeletti, Anil Madhavapeddy, Gabriel Scherer, and Miod Vallat):</strong> Expanded the documentation for <code>Hashtbl.create</code> to explain that a negative argument is allowed but will be disregarded.</li>
</ul>
<p>These bug fixes stem from discoveries made during the release cycle of the 5.3 update. Catching and fixing broken bits of code is an important but often lengthy part of the release process.</p>
<ul>
<li><strong><a href="https://github.com/ocaml/ocaml/pull/13138">#13138</a> (Gabriel Scherer, review by Nick Roberts):</strong> This PR is an old one, first opened eight years ago in 2016! Optimised pattern matching with mutable and lazy patterns was observed to result in occasions where seemingly impossible cases were taken, causing unsoundness issues. After <em>lengthy</em> efforts to narrow down the cause, the problem has been fixed for 5.3!</li>
<li><strong><a href="https://github.com/ocaml/ocaml/pull/13519">#13519</a> (Sébastien Hinderer, report by William Hu, review by David Allsopp):</strong> This PR restored backward compatibility lost when renaming some items in <code>Makefile.config</code>.</li>
<li><strong><a href="https://github.com/ocaml/ocaml/pull/13591">#13591</a> (Antonin Décimo, review by Nick Barnes, report by Kate Deplaix):</strong> This PR fixed a problem whereby compiling C++ code using the OCaml C API resulted in a name-mangled <code>caml_state</code> on Cygwin. The fix ensured that installed headers were compatible with C++ and protected the ones that were not with <code>CAML_INTERNALS</code>.</li>
<li><strong><a href="https://github.com/ocaml/ocaml/pull/13471">#13471</a> (Florian Angeletti, review by Gabriel Scherer):</strong> Added a flag to define the list of keywords recognisable by the lexer, making adding future keywords to OCaml easier.</li>
<li><strong><a href="https://github.com/ocaml/ocaml/pull/13520">#13520</a> (David Allsopp, review by Sébastien Hinderer and Miod Vallat):</strong> Fixed the compilation of native-code versions of systhreads.</li>
</ul>
<h2>What’s Next?</h2>
<p>Work on OCaml continues! The next few months will bring more features and bug fixes to the language, with a focus on big changes like the relocatable compiler, unloadable runtime, and laying the groundwork for project-wide renaming and other powerful navigation and refactoring features. The <a href="https://ocaml.org/changelog">OCaml changelog</a> is the place to go to keep up with what’s new, as well as the <a href="https://discuss.ocaml.org/">OCaml Discuss</a> forum.</p>
<p>You can connect with us on <a href="https://bsky.app/profile/tarides.com">Bluesky</a>, <a href="https://mastodon.social/@tarides">Mastodon</a>, <a href="https://www.threads.net/@taridesltd">Threads</a>, and <a href="https://www.linkedin.com/company/tarides">LinkedIn</a> or sign up for our mailing list to stay updated on our latest projects. We look forward to hearing from you!</p>
]]></description><link>https://tarides.com/blog/2025-01-09-ocaml-5-3-features-and-fixes</link><guid isPermaLink="false">https://tarides.com/blog/2025-01-09-ocaml-5-3-features-and-fixes.html</guid><dc:creator><![CDATA[ Isabella Leandersson ]]></dc:creator><pubDate>Thu, 09 Jan 2025 00:00:00 GMT</pubDate></item><item><title><![CDATA[Multicore Property-Based Tests for OCaml 5: Challenges and Lessons Learned]]></title><description><![CDATA[<p>In <a href="/blog/2024-04-24-under-the-hood-developing-multicore-property-based-tests-for-ocaml-5/">a previous post</a>, we discussed how we have developed property-based tests (PBTs) to stress test the new runtime system in OCaml 5, and gave concrete examples of such tests. In this second part, we discuss some of the challenges and the lessons learned from that effort.</p>
<h2>Testing APIs With Hidden or Uncontrolled State</h2>
<p>In part 1, we saw how <code>STM</code> and <code>Lin</code> were useful to test stateful module interfaces, like <code>Float.Array</code>. The OCaml standard library and runtime however also expose modules that are stateful, but for which the state is hidden or outside the full control of a black-box testing process. We are nevertheless interested in helping ensure the correctness of such modules.</p>
<p>For example, <code>Ephemeron</code>s depend on the state of OCaml's heap, meaning that a garbage collection (generally out of control of the test driver) may unsuspectingly trigger and cause changes in the test outcome. As a result, we ended up abandoning <code>Lin</code> tests of <code>Ephemeron</code>s, as they would run multiple observations of the same system under test - one parallel observation and several sequential ones - most likely with different results, because of different garbage collection schedulings.</p>
<p>In addition, the <code>Lin</code> tests could cause excessive shrinking searches, in trying to find a minimal example causing <code>Ephemeron</code> differences between such runs. Instead we favoured replacing them with <code>STM</code> tests, that perform only <em>one observation</em> of the system under test. We furthermore experimented with <a href="https://github.com/ocaml-multicore/multicoretests/pull/367">inserting an explicit call to <code>Gc.full_major</code> in between our run of the sequential and parallel tests to increase the chance of starting the latter on a relatively stable heap basis</a>. We encountered <a href="https://github.com/ocaml-multicore/multicoretests/pull/365">similar problems with tests of the <code>Weak</code> module</a> and recently also in <a href="https://github.com/ocaml-multicore/multicoretests/pull/469">the <code>STM</code> tests of the <code>Gc</code> module</a>.</p>
<p>Despite these challenges, the above tests successfully found several issues, e.g.</p>
<ul>
<li><a href="https://github.com/ocaml/ocaml/pull/11749">racing <code>Weak</code> functions could in some cases produce strange values</a>,</li>
<li><a href="https://github.com/ocaml/ocaml/issues/11934">certain combinations of <code>Weak</code> <code>Hashset</code> functions could cause the runtime to <code>abort</code> or segfault</a>,</li>
<li><a href="https://github.com/ocaml/ocaml/issues/11503">the <code>Ephemeron</code> tests could trigger an assertion failure and abort</a>, and</li>
<li>out-of-date documentation for <a href="https://github.com/ocaml/ocaml/pull/13424"><code>Gc.quick_stat</code></a> and  <a href="https://github.com/ocaml/ocaml/pull/13440"><code>Gc.set</code></a></li>
</ul>
<h2>Cygwin Challenges</h2>
<p>As Cygwin support was being restored in the 5.1 release, we wanted to test this platform as well to help ensure its correctness. However, we found that a test-suite run took an excessively long time, often not completing within a 6-hour timeout! We solved this initially by <a href="https://github.com/ocaml-multicore/multicoretests/pull/305">splitting a test-suite run into two separate workflows</a> and later by rephrasing it as <a href="https://github.com/ocaml-multicore/multicoretests/pull/313">two separate CI jobs, belonging to the same workflow</a>. Thanks to general improvements to the OCaml 5 runtime system, <a href="https://github.com/ocaml-multicore/multicoretests/pull/420">we have since been able to merge this split back into just one job</a> like the remaining platforms, and eventually also <a href="https://github.com/ocaml-multicore/multicoretests/pull/449">reduce the timeout for this single Cygwin workflow to 4 hours, like the remaining platforms</a>.</p>
<p>Another Cygwin challenge early on was <a href="https://cygwin.com/packages/summary/opam.html">the relatively old <code>opam.2.0.7</code> version it includes</a>. This made it harder to test the various OCaml compiler versions on Cygwin (and later MinGW). As a workaround, we set up <a href="https://github.com/shym/custom-opam-repository">a custom <code>opam</code> repository</a> with appropriate <code>opam</code> files for each compiler version.</p>
<h2>Keeping Dependencies Minimal</h2>
<p>Initially we happily used <code>ppxlib</code> to generate boilerplate code for QCheck generators using  <code>ppx_deriving_qcheck</code> and <code>show</code> functions for the tested <code>cmd</code>s. However, <code>ppxlib</code> depends on the OCaml compiler's AST which means that it occasionally breaks on the compiler's <code>trunk</code> development branch whenever its AST changes. As a consequence, testing <code>trunk</code> would halt until <code>ppxlib</code> was fixed again - an unfortunate situation when trying to help ensure its correctness! After <a href="https://github.com/ocaml-ppx/ppxlib/pull/407">helping keep a branch of <code>ppxlib</code> continuously working with <code>trunk</code></a>, at some point we instead decided to eliminate the test suite's <code>ppxlib</code> dependency. We therefore wrote the corresponding definitions by hand and by utilising a dedicated printing library <a href="https://ocaml-multicore.github.io/multicoretests/0.4/qcheck-multicoretests-util/Util/Pp/index.html"><code>Pp</code></a> in <a href="https://ocaml-multicore.github.io/multicoretests/0.4/qcheck-multicoretests-util/"><code>qcheck-multicoretests-util</code></a>. Since then, testing has not been blocked by such breakages, and the dependencies are down to just the <code>qcheck-core</code> package, and (transitively) <code>dune</code>.</p>
<p>Testing an increasing number of platforms such as the MinGW and Cygwin ports over the past 2-3 years has been challenging, as much of that effort predates <a href="https://opam.ocaml.org/blog/opam-2-2-0/">the more recent Windows support brought by opam-2.2</a>. As we additionally wanted to test OCaml 5's now restored MSVC port, also during its development, we ended up abandoning <code>opam</code> and <code>setup-ocaml</code> in favour of just building the tested compiler and our dependencies (<code>qcheck-core</code> and <code>dune</code>) from source in our CI workflows. As a result, we have gained a uniform CI workflow setup across platforms that also allows us to kick-off tests of feature branches, and thereby eliminate the need for the above-mentioned custom <code>opam</code> repository.</p>
<h2>Testing in the Presence of Misbehaviour</h2>
<p>Overall, when having found, investigated, reported, and sometimes also fixed an error, we would like to continue testing despite the existence of such known issues. A test-suite rerun is however likely to trigger and report the same bug again, temporarily hindering the discovery and fixing of other issues. Something similar has been observed by others, e.g. in <a href="https://www.erlang-factory.com/upload/presentations/582/CertifyingyourcarwithErlang.pdf">Quviq's property-based testing of AUTOSAR for Volvo</a>.</p>
<p>To be able to continue testing we can, in the simple cases, (temporarily) adjust the tested property to accept the observed (mis)behaviour. This was the case, for example, for <a href="https://github.com/ocaml/ocaml/pull/13424"><code>Gc.quick_stat</code> which would return non-zero entries for four record fields, where the documentation was out-of-date and specifying that zeros should be returned</a>.</p>
<p>However we have also had to (temporarily) adjust the generator to avoid triggering a particular buggy <code>cmd</code>. This was the case, for example, for <a href="https://github.com/ocaml/ocaml/pull/13370"><code>Gc.counters</code> which had an independently found-and-fixed issue with improper C interfacing with the GC</a>. <a href="https://github.com/ocaml-multicore/multicoretests/pull/469">Our new <code>Gc</code> test</a> would nevertheless trigger it and cause a crash on 5.2.0, until we adjusted the generator to skip generating <code>Gc.counters</code> calls on tests of versions up to 5.2.0. Since then <a href="https://discuss.ocaml.org/t/ocaml-5-2-1-released/15634">the fix has been included in the 5.2.1 bugfix release</a>.</p>
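<p>Skipping a known-buggy command in the generator can be sketched roughly as follows. This is a minimal illustration rather than the test suite's actual code: the <code>cmd</code> type, the version triples, and the helper names <code>cmds_for</code> and <code>gen_cmd</code> are all assumptions made for the example.</p>

```ocaml
(* Hypothetical command type for a Gc test; the real test suite's type differs. *)
type cmd = Counters | Quick_stat | Minor

let all_cmds = [ Counters; Quick_stat; Minor ]

(* Versions are modelled as (major, minor, patch) triples, an assumption of
   this sketch. Counters is only generated for versions after 5.2.0. *)
let cmds_for ~version =
  let counters_ok = compare version (5, 2, 0) > 0 in
  List.filter
    (fun c -> match c with Counters -> counters_ok | Quick_stat | Minor -> true)
    all_cmds

(* Pick one of the allowed commands uniformly at random. *)
let gen_cmd ~version rand =
  let cmds = cmds_for ~version in
  List.nth cmds (Random.State.int rand (List.length cmds))
```

<p>Filtering the command list, rather than discarding generated commands afterwards, keeps the remaining commands uniformly distributed and makes the restriction easy to lift once the fixed release is the oldest version under test.</p>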
<p>Finally, we have also (temporarily) disabled a test on a platform. This is the case, for example, for <a href="https://github.com/ocaml/ocaml/issues/13046">the parallel test of <code>Dynlink</code> which is unsafe on Windows, due to an underlying flexdll issue</a>. Whereas <a href="https://github.com/ocaml/flexdll/pull/136">the flexdll issue is now fixed in the flexdll repository</a>, we are awaiting a new release before proceeding to re-enable the parallel test on Windows.</p>
<h2>Crashes Take Down the Test Runner Too</h2>
<p>Since the test-driver is running in OCaml too – and in the same process – when the SUT crashes, so does the entire QCheck test-driver process. In our experience this happens more often than not, as runtime issues tend to lead to memory corruption and typically a <a href="https://en.wikipedia.org/wiki/Segmentation_fault">segmentation fault</a>. Running the test in a <a href="https://en.wikipedia.org/wiki/Fork_(system_call)"><code>fork</code>ed</a> child process may guard against a crash in the child taking down the parent process, with the caveat that OCaml 5 prevents <code>fork</code>ing child processes after the first <code>Domain.spawn</code>. In the spirit of functional programming, this option is available as a reusable combinator <a href="https://ocaml-multicore.github.io/multicoretests/0.4/qcheck-multicoretests-util/Util/index.html#val-fork_prop_with_timeout"><code>Util.fork_prop_with_timeout : int -&gt; ('a -&gt; bool) -&gt; 'a -&gt; bool</code></a> in <a href="https://ocaml.org/p/qcheck-multicoretests-util/latest"><code>qcheck-multicoretests-util</code></a>, thus allowing us to easily wrap the property of a crash-triggering <code>QCheck</code> test.</p>
<p>Although it was not designed with any of the above in mind, we have arrived at a test layout where most tests are run as separate executables, which lets us identify crashes relatively easily and simultaneously lets a test-suite run continue despite encountering a crash along the way.</p>
<h2>Positive Testing, Negative Testing, and Stress Testing</h2>
<p>While developing the <code>STM</code> and <code>Lin</code> libraries it became clear that we should guard against changes mistakenly affecting their error-finding behaviour. For this reason, we added negative tests of e.g. parallel modifications of an unprotected <code>ref</code> cell that are expected to fail, and then <a href="https://github.com/c-cube/qcheck/pull/244">extended <code>QCheck</code> with a <code>Test.make_neg</code> function to construct a negative test</a>. In the CI we then use this negative testing ability e.g. to test that <code>Hashtbl</code> is unsafe to use in parallel as mentioned in part 1 and that the generator can find a counterexample illustrating it.</p>
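<p>To illustrate the kind of race such a negative test targets, here is a minimal sketch using only the OCaml 5 standard library. It is not taken from the test suite, and the helper names are our own for this example. Two domains increment a shared counter: with an unprotected <code>ref</code> cell updates may be lost, whereas an <code>Atomic</code> counter accounts for every increment.</p>

```ocaml
(* Run [f] [n] times on each of two domains in parallel. *)
let run_in_two_domains n f =
  let work () = for _i = 1 to n do f () done in
  let d1 = Domain.spawn work in
  let d2 = Domain.spawn work in
  Domain.join d1;
  Domain.join d2

let n = 100_000

(* Unprotected ref cell: the read-modify-write is not atomic, so increments
   from the two domains may overwrite each other. *)
let racy_total () =
  let r = ref 0 in
  run_in_two_domains n (fun () -> r := !r + 1);
  !r

(* Atomic counter: every increment is accounted for. *)
let atomic_total () =
  let a = Atomic.make 0 in
  run_in_two_domains n (fun () -> Atomic.incr a);
  Atomic.get a

let () =
  Printf.printf "ref: %d, atomic: %d (expected %d)\n"
    (racy_total ()) (atomic_total ()) (2 * n)
```

<p>The <code>ref</code> total varies from run to run and may even come out right by chance, which is why a negative test needs many generated inputs rather than a single fixed scenario.</p>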
<p>Often, such a parallel negative test triggers quickly, e.g. on one of the first 10 test inputs generated – effectively stress-testing a module under parallel usage for only a very limited amount of time. We therefore added stress test properties, in the form of <code>stress_test</code> for <code>Lin</code> and <code>stress_test_par</code> for <code>STM</code>, both offering a weaker, more forgiving property that only fails on unexpected exceptions or outright crashes, thus strengthening our belief in the runtime - even under longer, continued parallel misusage.</p>
<p>Effectively, we have arrived at PBT variants of classical test concepts:</p>
<ul>
<li><strong>positive tests</strong> - expected to hold across many random test inputs</li>
<li><strong>negative tests</strong> - expected not to hold and produce a counterexample</li>
<li><strong>stress tests</strong> - expected not to misbehave by raising an exception or crashing</li>
</ul>
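<p>The three variants can be sketched with a tiny hand-rolled runner. This is an illustration under our own assumptions, not how <code>QCheck</code>, <code>STM</code>, or <code>Lin</code> are implemented: the names <code>run</code>, <code>positive</code>, <code>negative</code>, <code>stress</code>, and <code>small_list</code> are hypothetical, the tests are sequential, and there is no shrinking.</p>

```ocaml
(* Outcome of running a property on up to [count] random inputs. *)
type 'a outcome = Holds | Counterexample of 'a | Raised of exn

let run ~count ~gen prop =
  let rand = Random.State.make [| 0 |] in
  let rec go i =
    if i = count then Holds
    else
      let x = gen rand in
      match prop x with
      | true -> go (i + 1)
      | false -> Counterexample x
      | exception e -> Raised e
  in
  go 0

(* Positive test: passes when the property holds on every input. *)
let positive ~count ~gen prop =
  match run ~count ~gen prop with Holds -> true | _ -> false

(* Negative test: passes when a counterexample is found. *)
let negative ~count ~gen prop =
  match run ~count ~gen prop with Counterexample _ -> true | _ -> false

(* Stress test: a weaker property where only an exception counts as failure. *)
let stress ~count ~gen f =
  match run ~count ~gen (fun x -> ignore (f x); true) with
  | Raised _ -> false
  | _ -> true

(* Random lists of up to 9 small integers. *)
let small_list rand =
  List.init (Random.State.int rand 10) (fun _ -> Random.State.int rand 100)

let rev_rev_is_id = positive ~count:100 ~gen:small_list (fun l -> List.rev (List.rev l) = l)
let rev_is_not_id = negative ~count:100 ~gen:small_list (fun l -> List.rev l = l)
let sort_never_raises = stress ~count:100 ~gen:small_list (List.sort compare)
```

<p>Note how the stress variant discards the result of the tested function: it says nothing about whether the result is correct, only that computing it does not blow up.</p>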
<h2>False Alarms</h2>
<p>As more and more CI target platforms have been added, we have also seen a variation in behaviour across them: Some of the above negative tests are not triggered as consistently across platforms and some of the tests take a long time or cause timeouts on some platforms.</p>
<p>These add noise and still require us to check whether a failure was genuine or not. We have therefore focused on reducing the noise from false positives. To better understand this effort here is a plot of CI workflow outcomes for merged PRs (361-486) spanning a period of ~1.5 years from early June 2023 to early December 2024:</p>
<p><img sizes="(min-width: 1360px) 1360px, (min-width: 680px) 680px, 100vw" srcset="/blog/images/false-alarms-plot-170w~kfpgpUAdT9uV0ZkyJt555g.webp 170w, /blog/images/false-alarms-plot-340w~XMa35cJ4nLEu0s52QbwbBQ.webp 340w, /blog/images/false-alarms-plot-680w~eQrc8jI6bcA5Lhq0n8SNLw.webp 680w, /blog/images/false-alarms-plot-1360w~dX4NQFJuZHoHK0QiRAid_g.webp 1360w" src="/blog/images/false-alarms-plot-1360w~dX4NQFJuZHoHK0QiRAid_g.webp" alt="A stacked histogram illustrating the outcome of CI workflow runs split into 'OK', 'ci', 'genuine', and 'other' categories"></p>
<p>First of all, this covers a period of testing a mix of OCaml versions, starting with workflows targeting 5.1 in June 2023, then 5.2, and now 5.3 and 5.4/trunk (the current compiler development version). Note that this only plots the outcome for Jan's PRs to <code>main</code> for a fair comparison, as these would originally trigger twice as many workflow runs (push and PR). It furthermore only includes the last run for each PR, in case there were more because of PR revisions. Each CI run may in principle trigger several errors (in different categories even!). This is a rare incident however.</p>
<p>Further notes:</p>
<ul>
<li>On 370 a ppc64 workflow was added (workflow number increase)</li>
<li>391 added framepointer workflows (workflow number increase)</li>
<li>392-393 revealed <a href="https://github.com/ocaml/ocaml/issues/12543">a cmi-file lookup regression</a> that made several workflows fail, hence the spike</li>
<li>There were CI and network issues around 395-398</li>
<li>On 396 a first FreeBSD workflow was added (workflow number increase)</li>
<li>On 398 a second FreeBSD workflow and extra opam install workflows were added (workflow number increase)</li>
<li>On 420 the 2-split Cygwin workflows were merged (workflow number decrease)</li>
<li>On 429 (merged before 431) we eliminated duplicate CI runs for both push and the PR</li>
<li>On 438 we retired the 5.1 workflows to only run weekly (not on every PR)</li>
<li>On 449 the older MSVC PR (399) adding 2 additional MSVC workflows had just been merged</li>
<li>On 453 the parallel <code>Dynlink</code> tests were disabled on Windows (since a fix had been developed and offered on the FlexDLL repo)</li>
<li>On 458 2 macOS ARM64 workflows were temporarily duplicated while moving them from <code>multicoretests-ci</code> to GitHub actions</li>
<li>On 471 <code>5.4.0+trunk</code> workflows were added and three 5.1 <code>multicoretests-ci</code> workflows were removed</li>
<li>On 481 <code>multicoretests-ci</code> stopped running the 4 remaining 5.2 workflows</li>
<li>On 482 the ten 5.2 workflows were disabled</li>
</ul>
<p>The smaller sub-bars are harder to distinguish with the dominant OK bars in the above figure.  Below we therefore zoom in and display the same plot, including only the different kinds of failures:</p>
<p><img sizes="(min-width: 1360px) 1360px, (min-width: 680px) 680px, 100vw" srcset="/blog/images/false-alarms-plot-errors-only-170w~5YZPBXgUoTrAIc0KQle6iw.webp 170w, /blog/images/false-alarms-plot-errors-only-340w~5HTdqWao21Ru8BWhPQSXJA.webp 340w, /blog/images/false-alarms-plot-errors-only-680w~ulyPw_CnsHR2OYtym2eM_A.webp 680w, /blog/images/false-alarms-plot-errors-only-1360w~wOpCubYg66VDTbHXaZMcZw.webp 1360w" src="/blog/images/false-alarms-plot-errors-only-1360w~wOpCubYg66VDTbHXaZMcZw.webp" alt="A stacked histogram illustrating the outcome of CI workflow runs split, focusing only on the 'ci', 'genuine', and 'other' error categories"></p>
<p>There are multiple competing efforts and aspects behind the amount of 'genuine' errors triggered in the above:</p>
<ul>
<li>First, OCaml developers have fixed a number of defects and released 5.1.0, 5.2.0, 5.2.1, and 5.3.0~beta2 over this time period</li>
<li>Second, as new compiler features are added and merged they may accidentally introduce new errors</li>
<li>Third, we have added tests and 'sharpened the axe' of the existing ones</li>
<li>Finally, a fix of a bug in 5.1 merged into 5.2 doesn't prevent the bug from continuing to show up on 5.1 CI runs</li>
</ul>
<p>It is clear from the plot that both genuine issues (categorised as 'genuine') and false alarms (categorised as 'other') have decreased over this period. The workflow alarms were initially dominated by false ones, then started being dominated by genuine (and 'ci') ones, and have now settled on a level with zero alarms being a common outcome. Such a noise-free test-suite signal is also central for OCaml compiler developers to utilise the test suite in their own development workflow. Here a test-suite red light should ideally signal a problem with a proposed runtime change, rather than (a) false alarms adding needless noise ("oh, never mind that red light!") and (b) true alarms adding genuine noise ("actually that's an existing unfixed issue, not your fault").</p>
<p>To understand the failures triggering on specific versions, here's first a plot of the outcomes for the weekly CI runs of the upcoming 5.3 release:</p>
<p align="center">
  <img sizes="(min-width: 1360px) 1360px, (min-width: 680px) 680px, 100vw" srcset="/blog/images/workflow-ci-runs-53-170w~MAxMQ7TiFvt_peAArj2ViA.webp 170w, /blog/images/workflow-ci-runs-53-340w~IZK0D8uJo6yhFuDNGNy3NA.webp 340w, /blog/images/workflow-ci-runs-53-680w~6MVyB_rf2AQZJ_1ML-jJCw.webp 680w, /blog/images/workflow-ci-runs-53-1360w~587YI9GedCt9uz4sgqe0YQ.webp 1360w" src="/blog/images/workflow-ci-runs-53-1360w~587YI9GedCt9uz4sgqe0YQ.webp" alt="A stacked histogram illustrating the outcome of 5.3 CI workflow runs split into 'OK' and 'Fail' categories across 3 months, showing only one failure" width="45%">
</p>
<p>The one failure on October 21 happened during Cygwin installation, and was hence not related to the test suite. In comparison, weekly runs of the 5.0 workflows tend to trigger issues. The plot below covers 8 workflows in contrast to 5.3's 12 workflows, as Cygwin, frame-pointer, and MSVC byte/native workflows have been added since. Since the 5.0 release is older than 5.3, we also started recording weekly workflow outcomes for it sooner:</p>
<p align="center">
  <img sizes="(min-width: 1360px) 1360px, (min-width: 680px) 680px, 100vw" srcset="/blog/images/workflow-ci-runs-50-170w~oh9VdUd6FLX7B-YdoJ58_w.webp 170w, /blog/images/workflow-ci-runs-50-340w~glF_vLQa_We6QdlOasRkAg.webp 340w, /blog/images/workflow-ci-runs-50-680w~iJAugUPo8KRE56egXeVXsQ.webp 680w, /blog/images/workflow-ci-runs-50-1360w~TBemZzEzBagiSN81hfbXbQ.webp 1360w" src="/blog/images/workflow-ci-runs-50-1360w~TBemZzEzBagiSN81hfbXbQ.webp" alt="A stacked histogram illustrating the outcome of 5.0 CI workflow runs split into 'OK' and 'Fail' categories across 3 months, showing failures in all but 4 out of 22 runs" width="45%">
</p>
<p>The plot clearly indicates that issues are triggered on 5.0 workflows more often than not. Ever since we started noting the cause of such failures in late August 2024, the triggered issues have been genuine, and typically include <a href="https://github.com/ocaml/ocaml/issues/12103">crashes due to parallel access to <code>Buffer</code></a>, pinpointing a case in <code>Buffer.add_string</code> that flew under the radar of <a href="https://github.com/ocaml/ocaml/pull/11742">the first <code>Buffer</code> fix included in the 5.0.0 release</a>.</p>
<h2>Hidden Costs</h2>
<p>We have worked to reach zero false alarms and now generally achieve it across an array of 31 CI workflows. We apply due diligence and have thus developed a workflow of going over the CI red lights to understand and summarise the failures – both the genuine ones and the false alarms. We then keep track of them as <a href="https://github.com/ocaml-multicore/multicoretests/issues">issues in the multicoretests repo</a>. This allows us to easily refer to them and spot trends, such as starting to see new failures or noticing that a particular failure no longer occurs.</p>
<p>When we spot new errors, we work to reproduce them locally to make sure the issue is genuine. If so, we then report the issue upstream to <code>ocaml/ocaml</code> along with a description of the steps needed to reproduce it. When we understand the problem well enough, we also contribute a compiler fix PR. Out of <a href="https://github.com/ocaml-multicore/multicoretests#issues">the 40 issues currently identified</a>, Tarides engineers have filed PRs to fix 28 of these, 10 issues have been fixed by others (typically Inria or Jane Street engineers), and 2 issues remain open. Tarides have thus put in a significant effort to resolve errors.</p>
<h2>Some Issues are Still Hard to Reproduce</h2>
<p>Despite all our efforts to amplify problems and increase reproducibility, some issues are still hard to trigger. One such case was <a href="https://github.com/ocaml/ocaml/pull/12707">ocaml/ocaml#12707</a> in which we were able to trigger the assertion failure, albeit rarely. This one took some head-scratching until we realised the problem was caused by a small time window between reading the same atomic field twice in an assertion: <code>di-&gt;backup_thread_msg == BT_INIT || di-&gt;backup_thread_msg == BT_TERMINATE</code>. This was carried out in parallel with a backup-thread transitioning from <code>BT_TERMINATE</code> to <code>BT_INIT</code> by an atomic write <code>atomic_store_release(&amp;di-&gt;backup_thread_msg, BT_INIT)</code>, thus creating a tiny chance of neither condition being true, if the write happened just in between the two reads. We could then manually insert a call to <code>sleep</code> to confirm, and develop an appropriate fix.</p>
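<p>The double-read window can be modelled deterministically in OCaml. The sketch below is our own illustration, not the runtime's C code nor necessarily the fix that was adopted: the <code>msg</code> type, the <code>between</code> hook (standing in for whatever another thread does between the two reads), and the function names are all hypothetical. Reading the field once into a local value is one common way to close such a window.</p>

```ocaml
(* Simplified model of the message field; Bt_other stands for any other state. *)
type msg = Bt_init | Bt_terminate | Bt_other

(* Mirrors the shape of the assertion: the field is read twice.
   [between] simulates another thread running between the two reads. *)
let check_two_reads ~between field =
  let first_ok = Atomic.get field = Bt_init in
  between ();
  first_ok || Atomic.get field = Bt_terminate

(* Reads the field once and tests the local copy. *)
let check_one_read ~between field =
  let m = Atomic.get field in
  between ();
  m = Bt_init || m = Bt_terminate

(* The backup thread's transition from Bt_terminate to Bt_init. *)
let transition field () = Atomic.set field Bt_init

(* The field only ever holds Bt_terminate or Bt_init, yet the two-read
   check sees neither: false. *)
let two_reads_result =
  let field = Atomic.make Bt_terminate in
  check_two_reads ~between:(transition field) field

(* The single-read check observes a consistent snapshot: true. *)
let one_read_result =
  let field = Atomic.make Bt_terminate in
  check_one_read ~between:(transition field) field
```

<p>Forcing the interleaving through a hook plays the same role as the manually inserted <code>sleep</code>: it widens a window that is otherwise hit only rarely.</p>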
<p>We are currently investigating <a href="https://github.com/ocaml-multicore/multicoretests/issues/480">an even rarer issue triggered by the <code>Gc</code> tests, that causes a rare crash on macOS running on an ARM64 processor</a> and seems to require just the right OCaml heap conditions to trigger. In both cases, despite not triggering on every PBT run, the randomised tests have nevertheless highlighted genuine issues that would otherwise only show up even more rarely on the ocaml/ocaml test suite or – worse – to end users of OCaml.</p>
<h2>Lowering the Barrier to Entry for <code>multicoretests</code> for Compiler Engineers</h2>
<p>To offer compiler engineers the ability to run the test suite easily, <a href="https://github.com/ocaml/ocaml/pull/13458">we have made it possible to do so by labeling a PR with a <code>run-multicoretests</code> tag</a>. Recently <a href="https://github.com/ocaml/ocaml/pull/13580">a PR to improve major GC performance with mark-delay</a> kicked off using the <code>run-multicoretests</code> tag and is already making good use of the test suite as <a href="https://github.com/ocaml/ocaml/pull/13580#issuecomment-2478454501">it detected an issue with the GC marking of Ephemerons</a>. With <a href="https://github.com/ocaml/ocaml/pull/13616">the test suite finding an issue in another GC-improving PR</a>, we are confident in the value that the test suite brings to compiler developers in helping quality-assure runtime-related PRs.</p>
<h2>Usage Outside <code>multicoretests</code></h2>
<p>The usage of <code>STM</code> has spread outside of <code>multicoretests</code>. In <a href="https://github.com/ocaml-multicore/saturn">Saturn</a>, a library of lock-free data structures for OCaml 5, both existing and new data structures come with <code>STM</code> tests to help ensure their correctness. This started in connection with <a href="https://github.com/ocaml-multicore/saturn/pull/43">moving experimental tests of <code>ws_deque</code> out of the <code>multicoretests</code> suite</a> and has continued since. In <a href="https://github.com/ocaml-multicore/picos">Picos</a>, a library for composing effect-based schedulers, <code>STM</code> tests also help ensure correctness of its underlying data structures.</p>
<p>For the Gospel specification language for OCaml, Tarides has worked to develop <a href="https://github.com/ocaml-gospel/ortac">Ortac-QCheck-STM, as a plugin for Ortac</a>. This is a tool to extract sequential <code>STM</code> tests from a Gospel specification, thereby putting the strength of PBT in the hands of OCaml developers willing to annotate their interfaces with Gospel specifications. This effort is paying off, as <a href="https://github.com/ocaml-gospel/ortac?tab=readme-ov-file#found-issues">the tool is starting to find genuine issues</a>.</p>
<h2>Conclusion</h2>
<p>Using PBT to test OCaml 5 started as a blue-sky project at Tarides. Despite a range of challenges, the test suite has nevertheless reached a point where a run yields a clear signal free of false alarms and worthy of a confidence increase in a compiler code change. Getting here was a team effort with contributions from Charlène Gros, Samuel Hym, Olivier Nicole, Nicolas Osborne, and Naomi Spargo, along with patience and effort from OCaml compiler engineers working to fix our reported findings.</p>
<p>The PBT approach has successfully pinpointed a range of issues across the <code>Stdlib</code>, the rewritten OCaml 5 runtime system, and the restored backends. We hope that the test suite can continue to do so and thereby help maintain OCaml's reputation as a rock solid and safe platform.</p>
]]></description><link>https://tarides.com/blog/2024-12-23-multicore-property-based-tests-for-ocaml-5-challenges-and-lessons-learned</link><guid isPermaLink="false">https://tarides.com/blog/2024-12-23-multicore-property-based-tests-for-ocaml-5-challenges-and-lessons-learned.html</guid><dc:creator><![CDATA[ Jan Midtgaard ]]></dc:creator><pubDate>Mon, 23 Dec 2024 00:00:00 GMT</pubDate></item><item><title><![CDATA[Learn OCaml the Easy Way - Including the Hard Bits]]></title><description><![CDATA[<p>OCaml is a valuable and powerful tool, combining performance, security, and reliability. Joining the ecosystem, community, and learning the language can help you make the most of its capabilities if you're not already familiar with it. But how can we make learning OCaml easier? As with other functional programming languages, its visibility can be limited by factors such as your educational and professional environment, or the programmers you follow. Nevertheless, the world of software development is <a href="/blog/2024-03-07-a-time-for-change-our-response-to-the-white-house-cybersecurity-press-release/">increasingly recognising the strengths of functional programming</a> in general and <a href="/blog/2023-12-14-ocaml-memory-safety-and-beyond/">OCaml in particular</a>.</p>
<p>To meet the growing interest in OCaml, we aim to make learning the language as straightforward as possible. For individuals, we’re prioritising tutorials, books, and events centred around OCaml. For teams and organisations, we offer <a href="/services/training/">OCaml courses</a> for both basic and advanced training.</p>
<h2>The Best Ways to Learn OCaml: For the Independent Learner</h2>
<h3>OCaml.org</h3>
<p>Whether you’re looking to write a ‘hello world’ or delve into the details of the OCaml compiler, the central resource is OCaml.org. The <a href="https://ocaml.org/docs">Learn</a> area provides a range of resources, including tutorials, exercises, and a directory of books and academic papers, catering to different skill levels. The page also offers ways to install the <a href="https://ocaml.org/install#linux_mac_bsd">latest version of OCaml</a>, access the <a href="https://ocaml.org/manual/5.2/index.html">manual</a>, and explore OCaml’s documentation. The team has taken inspiration from other languages like Python, Rust, and the functional language Haskell to improve the resources available to learn OCaml.</p>
<p>These resources are tailored to the independent learner, using their free time to pursue knowledge, or the <a href="https://ocaml.org/academic-users">teacher needing materials</a> for their students. We support the continuous improvement of tutorials and documentation on OCaml.org, and the accessibility of books, papers, and other learning materials. The pages are also open-source, meaning that everyone is welcome to contribute!</p>
<h3>Books</h3>
<p>If you prefer learning from books, there are several available to choose from. For beginners, <a href="https://cs3110.github.io/textbook/cover.html">OCaml Programming: Correct + Efficient + Beautiful</a> and <a href="https://ocaml-book.com">OCaml From the Very Beginning</a> are excellent choices. The former is a textbook used at Cornell University, complete with a video playlist. Intermediate learners may find <a href="https://dev.realworldocaml.org">Real World OCaml</a> and <a href="https://www.amazon.com/gp/product/0957671113">More OCaml: Algorithms, Methods, &amp; Diversions</a> beneficial. <a href="https://dev.realworldocaml.org">Real World OCaml</a> is available for free download thanks to our <a href="/blog/2022-10-14-real-world-ocaml-book-giveaway/">sponsorship of its gold open access release</a>.</p>
<h3>Community</h3>
<p>OCaml’s robust open-source ecosystem significantly aids newcomers by making source code freely available for study, and offering access to experts and feedback through the <a href="https://github.com/ocaml/ocaml">GitHub repository</a> and <a href="https://discuss.ocaml.org">Discuss forum</a>.</p>
<p>There are many more ways to engage with the OCaml community online:</p>
<ul>
<li>Social media: head over to the <a href="https://ocaml.org/community">Community</a> page on OCaml.org for an overview of the different places online people meet to discuss, develop, and blog about OCaml. Resources include more or less formal forums, educational text- or video-based content, entertainment, social networking, and live chats.</li>
<li>Advent of Code: An <a href="https://adventofcode.com/2024/about">advent calendar</a> of programming puzzles, suitable for a variety of skill levels and compatible with all languages, popular with programmers as a way to challenge themselves and learn new languages. This year, Sabine has organised <a href="https://x.com/i/lists/1846851455384240406">a leaderboard</a> for people solving the advent of code in OCaml.</li>
<li>The OCaml Planet: An <a href="https://ocaml.org/ocaml-planet">RSS feed aggregator hosted on OCaml.org</a> which shares articles and blog posts on everything OCaml. Follow it to stay up-to-date on the latest OCaml news, technical deep dives, open-source project updates, and developer insights.</li>
</ul>
<h3>Events</h3>
<p>We also host and sponsor community events like <a href="/blog/2024-05-01-we-host-our-first-ocaml-retreat-in-india/">hacking days</a> and <a href="/blog/2024-11-13-the-new-conference-on-the-block-what-is-fun-ocaml/">conferences</a> to further support users who want to learn more about OCaml. Upcoming events include:</p>
<ul>
<li>The OCaml Workshop: Continuing in a well-established tradition, the OCaml Workshop will be held at the <a href="https://icfp25.sigplan.org/">2025 ICFP Conference</a>. With a more academic focus, talks start out as papers which are then presented as part of the OCaml track.</li>
<li><a href="https://fun-ocaml.com/">The FUN OCaml Conference</a>: Returning for a second time in 2025, the new OCaml conference is organised by developers for developers and covers topics big and small.</li>
<li><a href="https://reason-ocaml.in/">Reason OCaml India Meetups</a>: A community of Reason and OCaml enthusiasts that organises regular in-person and online meet-ups.</li>
</ul>
<p>There are of course many more events organised outside of those we host and sponsor, and some notable examples are <a href="https://confengine.com/conferences/functional-conf-2025">Functional Conf</a> and <a href="https://www.meetup.com/ocaml-paris/">OUPS</a>. We recommend that you check out the <a href="https://ocaml.org/events">Events</a> page to get an overview of all the great OCaml events you could join!</p>
<h2>The Best Ways to Learn OCaml: Tailored Training in OCaml</h2>
<p>In large organisational teams, relying on individual learning or the dissemination of knowledge from a few experts can be time-consuming and inconsistent. Tailored courses enable simultaneous training for the entire group, fostering collaboration in tackling new challenges. To make learning OCaml easier for groups, we’ve created courses to enhance team competence in the language. Here’s what our OCaml courses offer.</p>
<h3>Our Foundational Course <a href="/services/training/"><em>Starting with OCaml: An Introduction</em></a></h3>
<p>This course teaches core OCaml programming concepts through theory and practice, and has been designed to facilitate a smooth transition for engineers from other languages, including imperative programming. It covers OCaml’s history, design choices, and key functional programming concepts such as functions, recursion, and higher-order functions alongside critical concepts like tuples, the OCaml type system, type inference, pattern matching, and polymorphism.</p>
<p>The bulk of the course comprises several modules that introduce the practical building blocks of programming with OCaml. This includes methods: imperative and modular programming, data structures, error handling, interfacing with C, and command-line parsing. It also offers guidance on tooling: <code>opam</code>, Dune, VSCode, debugging and profiling, concurrent programming with <code>lwt</code>, and testing using expect tests and QuickCheck.</p>
<p>Finally, the course would not be complete without hands-on experience writing OCaml code. The modules on setting up the OCaml environment, building, documenting, and releasing an OCaml project culminate in an entire day spent on building an OCaml application from scratch. Participants will leave with a solid understanding of OCaml, the necessary tools for development, and practical application-building experience. Following this foundational course, your entire team will be familiar with the OCaml language and workflow.</p>
<h3>Our Higher-Level Course <a href="/services/training/"><em>Mastering OCaml: Advanced Techniques</em></a></h3>
<p>The advanced course is designed for experienced users and teams looking to enhance their skills. It consists of three sections: advanced OCaml techniques, advanced methods for common OCaml tools, and advanced tooling.</p>
<p>Advanced OCaml techniques make up the bulk of the course, including generalised algebraic data types (<a href="https://ocaml.org/manual/5.2/gadts-tutorial.html">GADTs</a>), multicore programming with OCaml 5, and transitioning from OCaml 4 to OCaml 5. Advanced methods for common OCaml tools cover using the <a href="https://dune.build">Dune</a> build system, and <a href="https://ocaml.org/docs/metaprogramming">PPX preprocessors</a>.</p>
<p>Topics in advanced tooling include: the <a href="https://mirage.io">MirageOS</a> library operating system for secure, high-performance, network applications; web development with <a href="https://ocsigen.org/js_of_ocaml/latest/manual/overview">JS_of_ocaml</a>, <a href="https://github.com/ocaml-wasm/wasm_of_ocaml">Wasm_of_ocaml</a>, and <a href="https://aantron.github.io/dream/">Dream</a>; concurrent programming with Eio; and using <a href="/blog/2024-04-24-under-the-hood-developing-multicore-property-based-tests-for-ocaml-5/">property-based testing</a> to evaluate programs. This course aims to equip teams already using OCaml with new skills and tools. Because it covers deep topics, the advanced course is tailored to focus on the areas relevant to your team, taking them to the next level.</p>
<p>Finally, one of the biggest benefits we offer with our courses is <strong>customisation</strong>. We understand that teams have unique challenges and goals, and offer you the option to create a course that fits your needs. Not only can you pick a selection of topics from the <a href="/services/training/">advanced OCaml</a> modules, but you can completely tailor the training with a customised itinerary. This is negotiated on a per-instance basis, and you can schedule a <a href="/contact/">free consultation</a> to discuss what that could look like for your teams.</p>
<h2>Get in Touch</h2>
<p>Need help learning OCaml? Reach out to the community on <a href="https://mastodon.social/@ocaml@discuss.tchncs.de">Mastodon</a>, <a href="https://x.com/ocaml_org">X</a>, or <a href="https://www.linkedin.com/company/ocaml-org/">LinkedIn</a> to get helpful pointers or ask questions on <a href="https://discuss.ocaml.org">Discuss</a>. If you’re interested in our courses for your teams, you can <a href="/contact/">get a free consultation</a> to discuss whether the curriculum is right for you and sign up to our mailing list to receive the latest news.</p>
<p>We look forward to hearing from you, so please don’t hesitate to get in touch!</p>
]]></description><link>https://tarides.com/blog/2024-12-18-learn-ocaml-the-easy-way-including-the-hard-bits</link><guid isPermaLink="false">https://tarides.com/blog/2024-12-18-learn-ocaml-the-easy-way-including-the-hard-bits.html</guid><dc:creator><![CDATA[ Isabella Leandersson ]]></dc:creator><pubDate>Wed, 18 Dec 2024 00:00:00 GMT</pubDate></item><item><title><![CDATA[Saturn 1.0: Data structures for OCaml Multicore]]></title><description><![CDATA[<p>The first version of the Saturn library is out! <a href="https://github.com/ocaml-multicore/saturn">Saturn</a> is a new OCaml 5 library available on <a href="https://opam.ocaml.org/">opam</a>, which offers a collection of well-tested, benchmarked, and efficient concurrent data structures ready to be used with OCaml Multicore. Access to concurrent-safe data structures saves developers from the time-consuming and often error-prone process of designing their own.</p>
<p>This post will give you an overview of the library, its main features, and some use cases. The team encourages you to try the data structures and share your feedback <a href="https://github.com/ocaml-multicore/saturn">in the repo</a> and on <a href="https://discuss.ocaml.org/">Discuss</a>. After you're finished with this post, I also recommend you read the <a href="https://icfp24.sigplan.org/details/ocaml-2024-papers/12/Saturn-a-library-of-verified-concurrent-data-structures-for-OCaml-5">paper on Saturn</a> from the OCaml workshop at ICFP 2024 if you want more details. Let’s dig in!</p>
<h2>What is Saturn?</h2>
<p>The Saturn repository is made up of one package, <code>saturn</code>, containing all lock-free data structures. You can use Saturn with OCaml 5.2 or later, and it can be installed from <code>opam</code> with the command <code>opam install saturn</code>.</p>
<p>Saturn covers many use cases, from simple stacks and queues to more complex structures, including skip lists, hash tables, work-stealing deques, and more. All of them have been adapted to be compatible with and take advantage of the OCaml 5 memory model. For example, the team behind Saturn had to rework the Michael-Scott queue to avoid memory leaks.</p>
<p>To improve performance, they introduced several micro-optimisations, including preventing false sharing, adding fenceless atomic reads where possible (improving performance on ARM processors), and avoiding extra indirection in arrays and atomics to reduce memory consumption. While optimising the library, their work highlighted some missing features in OCaml 5 and led to upstreamed improvements to the language, such as padded atomics and fixing a CSE bug.</p>
<h2>Why Saturn?</h2>
<p>Sharing data between multiple threads or cores is a well-known problem in computer science. The most obvious solution is to use a sequential data structure protected by a lock, but this approach can introduce performance overhead due to contention between threads competing for the lock. Liveness issues like deadlock, starvation, and priority inversion are also associated with locks in parallel programming.</p>
<p>The opposite approach is to use a lock-free implementation, relying on fine-grained synchronisation instead of locks. This approach benefits from higher performance and guarantees system-wide progress. However, it does come with its own set of bugs, including the ABA problem (which is largely mitigated in garbage-collected languages), data races, and unexpected behaviours as a result of non-linearisability.</p>
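<p>To make the lock-free approach concrete, here is a minimal sketch of a Treiber stack written with only the standard library's <code>Atomic</code> and <code>Domain</code> modules. It illustrates the retry-on-conflict pattern and is not Saturn's implementation; the names <code>create</code>, <code>push</code>, and <code>pop_opt</code> are chosen for this example.</p>
<pre><code class="language-ocaml">(* A minimal lock-free (Treiber) stack built on Stdlib.Atomic.
   Illustrative only: this is not how Saturn implements its stack. *)
type 'a t = 'a list Atomic.t

let create () : 'a t = Atomic.make []

let rec push (s : 'a t) x =
  let old = Atomic.get s in
  (* Retry if another domain changed the stack in the meantime. *)
  if not (Atomic.compare_and_set s old (x :: old)) then push s x

let rec pop_opt (s : 'a t) =
  match Atomic.get s with
  | [] -&gt; None
  | (x :: rest) as old -&gt;
    if Atomic.compare_and_set s old rest then Some x else pop_opt s

let () =
  let s = create () in
  let worker base () = for i = 1 to 1_000 do push s (base + i) done in
  let d1 = Domain.spawn (worker 0) and d2 = Domain.spawn (worker 10_000) in
  Domain.join d1;
  Domain.join d2;
  (* No push may be lost, whatever the interleaving. *)
  let rec count n = match pop_opt s with None -&gt; n | Some _ -&gt; count (n + 1) in
  assert (count 0 = 2_000)
</code></pre>
<p>No domain ever waits for another to release anything: a failed <code>compare_and_set</code> only means that some other domain made progress. Because the list cells are immutable and garbage-collected, this sketch does not run into the ABA problem mentioned above.</p>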
<p>In this quagmire of pitfalls, Saturn stands out as a source of reliable, tested data structures that the user can adopt for their own projects. Developers can pick from a variety of structures knowing that they are suitable and safe, saving them time and making OCaml 5 easier to use.</p>
<h3>Benchmarks</h3>
<p>The library is still relatively new, so more benchmarks are still to come, but there are some numbers to give you an idea of performance. You can find tables showing the throughput of different queues and stacks on single and multiple domains in the <a href="https://icfp24.sigplan.org/details/ocaml-2024-papers/12/Saturn-a-library-of-verified-concurrent-data-structures-for-OCaml-5">paper from the OCaml Workshop at ICFP 2024</a> about Saturn. The tests reveal that Saturn implementations either outperform or, in the case of simpler implementations, match the performance of non-Saturn structures. However, the benefit of using Saturn, even in cases where another simple implementation exists, is that the Saturn data structures have built-in optimisations and synchronisation safeguards, and it’s up to each individual developer to decide what that’s worth to them.</p>
<p>The library also has a benchmarking command <a href="https://github.com/ocaml-multicore/saturn/blob/main/bench/README.md"><code>make bench</code></a> that can be run from the root of the repository to run a standard set of benchmarks. It outputs in JSON and is intended to be consumed by <a href="https://bench.ci.dev/ocaml-multicore/saturn/branch/main/benchmark/default?worker=fermat&amp;image=bench.Dockerfile"><code>current-bench</code></a>.</p>
<h3>Tests</h3>
<p>Testing Saturn’s structures was a high priority for the team and crucial to ensure the safety, <a href="https://dl.acm.org/doi/10.1145/78969.78972#:~:text=Linearizability%20is%20a%20correctness%20condition,techniques%20from%20the%20sequential%20domain.">linearisability</a>, and lock-freedom where expected. Saturn was tested using two primary tools: DSCheck and STM. <a href="/blog/2024-04-24-under-the-hood-developing-multicore-property-based-tests-for-ocaml-5/">STM</a> is used for both unit testing and linearisability. It can automatically generate random full programs using the API provided, and for Saturn, this is a data structure. It then executes the programs in parallel with two domains and checks all the results against the <code>post-conditions</code> of each function, providing unit testing. STM performs a sequence of random commands in parallel, records the results, and checks whether the observed results can be linearised and reconciled with some sequential execution.</p>
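<p>As a rough illustration of the linearisability check, the sketch below takes the commands and results recorded by two domains and searches for an interleaving that a sequential reference model reproduces. The types <code>cmd</code> and <code>res</code>, the list-based queue model, and the function names are assumptions made for this example; they are not the STM API, which automates this kind of search.</p>
<pre><code class="language-ocaml">(* Hypothetical sketch of a linearisability check for a FIFO queue. *)
type cmd = Push of int | Pop
type res = Unit | Popped of int option

(* Sequential reference model: a queue as a list, front first. *)
let step model = function
  | Push x -&gt; (model @ [ x ], Unit)
  | Pop -&gt;
    (match model with
     | [] -&gt; ([], Popped None)
     | x :: tl -&gt; (tl, Popped (Some x)))

(* Is there an interleaving of the two per-domain histories, preserving each
   domain's own order, whose results the model reproduces? *)
let rec linearisable model a b =
  let try_first hist other =
    match hist with
    | [] -&gt; false
    | (cmd, observed) :: rest -&gt;
      let model', expected = step model cmd in
      if expected = observed then linearisable model' rest other else false
  in
  match a, b with
  | [], [] -&gt; true
  | _ -&gt; try_first a b || try_first b a

let () =
  let b = [ (Push 2, Unit) ] in
  (* Explained by the order: Push 2, Push 1, Pop. *)
  assert (linearisable [] [ (Push 1, Unit); (Pop, Popped (Some 2)) ] b);
  (* A domain cannot pop nothing right after its own push. *)
  assert (not (linearisable [] [ (Push 1, Unit); (Pop, Popped None) ] b))
</code></pre>
<p>The search is exponential in the length of the histories, which is one reason to keep the generated parallel programs short.</p>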
<p><a href="/blog/2024-02-14-multicore-testing-tools-dscheck-pt-1/">DSCheck</a> is a model checker designed to compute all the possible interleavings of instructions between multiple domains and verify that each one returns the expected result. This method is useful for detecting bugs that only occur in rare circumstances which would otherwise be hard to catch. DSCheck can also be used to verify that a program is lock-free, as it will fail to terminate if any form of blocking is present in the program. The DSCheck implementation has been optimised to make the tests efficient even on Saturn’s more complex data structures.</p>
<h3>Formal Verification</h3>
<p>The verification work for Saturn is done at Inria, forming part of Clément Allain's PhD work there. Due to the notoriously finicky behaviours of lock-free algorithms, they have formally verified part of Saturn’s data structures and aim to keep going until they’ve covered the entire library. The main criterion for correctness in concurrent data structures is linearisability, which requires each operation in a data structure to appear to take effect instantaneously at some point during its execution (called the linearisation point) so that all linearisation points from all operations form a coherent, sequential history. The team at Inria are formally verifying that this linearisability is correctly present in Saturn's data structures.</p>
<p>To verify linearisability, they use the mechanised concurrent separation logic <a href="https://iris-project.org/">Iris</a>, which has been used in the past to verify realistic data structures. All proofs are formalised in <a href="https://coq.inria.fr">Coq</a> and available on GitHub. They manually translated the original code from Saturn to a deeply embedded language in Coq, and hope to automate that translation in the future. Work continues to verify data structures for Saturn, ensuring safe and predictable behaviours.</p>
<h2>Stay in Touch!</h2>
<p>To delve deeper into everything Saturn, I recommend you watch the <a href="https://www.youtube.com/live/OuQqblCxJ2Y?si=Z5eUhyNFUjWSI43v&amp;t=24398">ICFP 2024 presentation</a> from the OCaml Workshop. You can try Saturn for your projects via the <a href="https://github.com/ocaml-multicore/saturn">repo</a>, and don’t forget to share your feedback with the team!</p>
<p>Connect with us online on <a href="https://bsky.app/profile/tarides.com">Bluesky</a>, <a href="https://mastodon.social/@tarides">Mastodon</a>, <a href="https://www.threads.net/@taridesltd">Threads</a>, and <a href="https://www.linkedin.com/company/tarides">LinkedIn</a> to stay updated on our latest projects.</p>
]]></description><link>https://tarides.com/blog/2024-12-11-saturn-1-0-data-structures-for-ocaml-multicore</link><guid isPermaLink="false">https://tarides.com/blog/2024-12-11-saturn-1-0-data-structures-for-ocaml-multicore.html</guid><dc:creator><![CDATA[ Isabella Leandersson ]]></dc:creator><pubDate>Wed, 11 Dec 2024 00:00:00 GMT</pubDate></item><item><title><![CDATA[Irmin on MirageOS: Under-the-Hood With Notafs]]></title><description><![CDATA[<p>In part one of our series on Notafs, we outlined the motivations and challenges behind creating a file system for <a href="https://mirage.io/docs/overview-of-mirage">MirageOS</a> tailored to the <a href="/blog/2023-07-31-ocaml-in-space-welcome-spaceos/">SpaceOS</a> use case of storing large files on disk in satellites. Furthermore, Notafs can be used to run the <code>irmin-pack</code> backend of Irmin, which, in turn, provides users with an amazing file system for their MirageOS projects.</p>
<p>In this post, we delve into more detail concerning the design choices behind <code>Notafs</code> and provide some benchmarks and visualisations to give you a better understanding of the file system. Let's dig in!</p>
<h2>Big Design Decisions: Copy-on-Write</h2>
<h3>The Limits of File Systems</h3>
<p>In our previous post, we mentioned some of the expectations of file systems to resist corruption and data loss. Ensuring these conditions are met can be challenging since hardware exhibits many failure modes. For example, it is impossible to check if a disk has physically persisted a write operation. In order to improve performance, disks use RAM buffering to quickly store the write requests before committing them persistently (which is a much slower process). Short of unplugging the disk and waiting for its internal battery and RAM to clear before rebooting, we can't know that the writes were executed and must be ready for the worst-case scenario: unwritten, partially written, or corrupted data regions on the disk! Writes could also be reordered, leading to consistency issues.</p>
<p>Furthermore, the way that disks are structured presents its own challenges. Disks are organised into blocks of fixed sizes, like 512 bytes, 1 KB, or 4 KB, depending on the model. A block is the smallest unit we can read or write, so even if we only want to update a few bytes on the disk, we still need to re-write the full surrounding block data. Different disk models can have additional constraints, such as:</p>
<ul>
<li>Hard drives favour contiguous block reads and writes for high performance, since moving their head to random locations is expensive.</li>
<li>SSDs exhibit faster wear if one block is updated more often than the others, leading to a shorter device life than expected under uniform usage.</li>
</ul>
<h3>All About Copy-on-Write</h3>
<p>Considering these factors, Notafs was designed as a copy-on-write file system. This strong design principle means that when we update the file system, we never overwrite a block in place but rather write the new data block in a new, unused location on the disk. Preserving the old block means that Notafs maintains access to a backup until the block is reused. If a write operation fails, for example, in case of a power outage in the middle of multiple block updates, Notafs can recover information from the old block.</p>
<p>This strategy does not impact performance since writing the new data in a new location or in place is virtually indistinguishable from the disk's perspective. From an implementation standpoint, copy-on-write is very similar to how purely functional data structures work to provide fast updates. Only modified data needs to be copied, so fast updates are still possible, as sharing unmodified content between revisions is free.</p>
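<p>The parallel with purely functional data structures can be sketched with a small in-memory binary tree: updating one leaf copies only the nodes on the path to it, and every untouched subtree is shared with the previous revision. The <code>tree</code> type and <code>update</code> function are hypothetical and only illustrate the idea; they say nothing about the Notafs on-disk layout.</p>
<pre><code class="language-ocaml">(* Hypothetical sketch: path copying in a persistent binary tree. *)
type tree = Leaf of string | Node of tree * tree

(* Replace the leaf reached by [path]; only nodes along the path are rebuilt. *)
let rec update t path data =
  match t, path with
  | _, [] -&gt; Leaf data
  | Node (l, r), `L :: p -&gt; Node (update l p data, r)
  | Node (l, r), `R :: p -&gt; Node (l, update r p data)
  | Leaf _, _ :: _ -&gt; invalid_arg "update: path goes below a leaf"

let () =
  let v1 = Node (Node (Leaf "a", Leaf "b"), Node (Leaf "c", Leaf "d")) in
  let v2 = update v1 [ `L; `R ] "B" in
  match v1, v2 with
  | Node (Node (_, old_leaf), r1), Node (Node (_, new_leaf), r2) -&gt;
    assert (old_leaf = Leaf "b");  (* the old revision is still intact *)
    assert (new_leaf = Leaf "B");
    assert (r1 == r2)              (* the untouched subtree is shared, not copied *)
  | _ -&gt; assert false
</code></pre>
<p>In a copy-on-write file system the same reasoning applies with disk blocks in place of heap nodes: the old revision remains readable until its blocks are reused.</p>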
<p>To implement the copy-on-write design, we needed to keep track of the unused free blocks where writes can be performed. This is akin to the challenge of implementing a memory allocator, but it is made slightly simpler because we only need to allocate fixed-size blocks. Our block allocator is backed by a queue containing the list of free block identifiers. After formatting a disk, the queue is (virtually) initialised to contain all block locations except a couple of reserved blocks. When a new block needs to be written to the disk, we pick a free block location from the queue where the write should take place. Filesystem operations are carefully implemented to free old blocks which aren't reachable anymore by pushing their block identifier location back onto the queue. Once the disk is full, these free blocks will eventually be reused to store new data. At the same time, old data is kept intact and can be used as a backup if needed. To avoid complications in determining if and when a block can be freed, we enforce strict linear ownership of blocks so that we do not need any reference counting or advanced garbage collection.</p>
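<p>A minimal in-memory sketch of such an allocator, using the standard library's <code>Queue</code>, might look as follows. The names <code>format</code>, <code>allocate</code>, and <code>release</code> are hypothetical, and unlike the description above this sketch fills the queue eagerly instead of initialising it virtually.</p>
<pre><code class="language-ocaml">(* Hypothetical sketch of a fixed-size block allocator backed by a FIFO
   queue of free block identifiers. Names are illustrative only. *)
type allocator = { free : int Queue.t }

(* After formatting, every block except the first [reserved] ones is free. *)
let format ~nb_blocks ~reserved =
  let free = Queue.create () in
  for id = reserved to nb_blocks - 1 do Queue.add id free done;
  { free }

(* Pick the location for the next write: the block that was freed first. *)
let allocate t = Queue.take_opt t.free

(* Give back a block that is no longer reachable. *)
let release t id = Queue.add id t.free

let () =
  let t = format ~nb_blocks:6 ~reserved:2 in
  assert (allocate t = Some 2);
  assert (allocate t = Some 3);
  release t 2;                   (* block 2 goes to the back of the queue *)
  assert (allocate t = Some 4);
  assert (allocate t = Some 5);
  assert (allocate t = Some 2);  (* reused only after all older free blocks *)
  assert (allocate t = None)     (* no free block left: the disk is full *)
</code></pre>
<p>The FIFO order is what keeps recently freed blocks, and therefore the most recent backups, intact for as long as possible.</p>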
<p>The queue of free blocks is part of the file system state and must be recoverable when we boot it. This is why the queue state itself is backed by a copy-on-write data structure, which is stored on disk! To ensure good performance, the block locations that have been freed during a single file system operation are sorted before being pushed to the queue. This process favours the allocation of contiguous block locations, leading to faster writes on hard drives. Consecutive block locations are also compressed in the queue by representing them as intervals instead of listing all the intermediate block locations.</p>
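<p>The sorting and interval compression step can be sketched in a few lines. The function name <code>to_intervals</code> and the choice of inclusive <code>(first, last)</code> pairs are assumptions made for this example.</p>
<pre><code class="language-ocaml">(* Hypothetical sketch: sort freed block ids and merge runs of consecutive
   ids into inclusive intervals (first, last). *)
let to_intervals ids =
  let sorted = List.sort_uniq compare ids in
  let rec go acc = function
    | [] -&gt; List.rev acc
    | id :: rest -&gt;
      (match acc with
       | (first, last) :: tl when id = last + 1 -&gt; go ((first, id) :: tl) rest
       | _ -&gt; go ((id, id) :: acc) rest)
  in
  go [] sorted

let () =
  (* Six freed blocks are stored as three queue entries. *)
  assert (to_intervals [ 7; 3; 4; 12; 5; 8 ] = [ (3, 5); (7, 8); (12, 12) ])
</code></pre>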
<p>When the disk gets full, the previously freed blocks will eventually be reused, storing new data. When that happens, the queue guarantees that we use the blocks that have been free for the longest time so that only the oldest backups become unrecoverable. However, since reusing blocks can lead to data consistency issues if some writes fail to be committed on disk, we needed a way to detect whether a block contains the expected latest data or the old partially-written/corrupted data. To this end, we use a checksum of the stored data to validate each block. We could have stored the checksum in the block next to the data it validates, but it would only have allowed for the detection of corrupted data and not for detecting whether the block's data is old since the old checksum remains valid. So, instead, we store the checksum in the parent block next to the block location it validates. When a parent block needs to read one of its child block's data, it uses its checksum to verify that its child data is as expected.</p>
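<p>Here is a small sketch of why the checksum belongs in the parent. The disk is modelled as an array of strings, <code>Digest</code> (MD5) from the standard library stands in for whichever checksum Notafs actually uses, and the names <code>pointer</code>, <code>pointer_for</code>, and <code>read_child</code> are hypothetical.</p>
<pre><code class="language-ocaml">(* Hypothetical sketch: the parent keeps, for each child, the child's
   location together with the checksum of the data it expects there. *)
type pointer = { loc : int; checksum : Digest.t }

let disk : string array = Array.make 4 ""

(* What the parent records when it decides to store [data] at [loc]. *)
let pointer_for loc data = { loc; checksum = Digest.string data }

(* Read through a parent pointer, rejecting stale or corrupted blocks. *)
let read_child p =
  let data = disk.(p.loc) in
  if Digest.equal (Digest.string data) p.checksum then Ok data
  else Error `Checksum_mismatch

let () =
  (* Block 1 holds old data from a previous use of that block. *)
  disk.(1) &lt;- "old contents";
  (* The parent now points to block 1 and expects the new data, but suppose
     the write of block 1 itself never reached the disk. *)
  let p = pointer_for 1 "new contents" in
  assert (read_child p = Error `Checksum_mismatch);
  (* Once the write has really happened, the read succeeds. *)
  disk.(1) &lt;- "new contents";
  assert (read_child p = Ok "new contents")
</code></pre>
<p>Had the checksum been stored inside block 1 next to its data, the stale block would still have validated itself, which is exactly the case described above.</p>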
<h3>Final Considerations</h3>
<p>There was one last issue for our team to tackle regarding the design: if everything is copy-on-write and the new filesystem root is written to a new block every time, how do we find the location of the latest root when the unikernel boots? We wanted to avoid scanning the whole disk in search of the latest root. Hence, when formatting a new disk, we reserved a fixed number of blocks at known locations. After each file system update, one of these blocks is updated in place to point to the latest root allocation in a round-robin fashion, with a generation counter to help our application determine which one was written last.</p>
<p>When the filesystem starts, it identifies the latest root from the reserved blocks and traverses all of its reachable data to check each block checksum and validate that the whole filesystem is in a consistent state. If an issue is detected, implying the disk wasn't shut down properly, Notafs will attempt to use the previous filesystem root found in the previous reserved block. This procedure effectively rolls back the failed write operation until a valid filesystem state is found (or the disk is deemed unrecoverable!). To avoid SSD wear from updating the reserved root blocks too often compared to other blocks, we have added an extra, dynamically allocated level of indirection to the previous explanations, which amortises the number of updates to the reserved blocks.</p>
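<p>The boot-time choice of a root can be sketched as follows. The <code>slot</code> record and the <code>is_valid</code> callback, which stands in for the full checksum traversal, are assumptions made for this example; the sketch also ignores the extra level of indirection mentioned above.</p>
<pre><code class="language-ocaml">(* Hypothetical sketch: each reserved block holds a generation counter and
   the location of a filesystem root. *)
type slot = { generation : int; root : int }

(* Try the roots from newest to oldest and keep the first consistent one. *)
let latest_valid_root ~is_valid slots =
  let newest_first =
    List.sort (fun a b -&gt; compare b.generation a.generation) slots
  in
  List.find_opt (fun s -&gt; is_valid s.root) newest_first

let () =
  let slots =
    [ { generation = 8; root = 40 };
      { generation = 9; root = 57 };
      { generation = 7; root = 31 } ]
  in
  (* Suppose the update that produced root 57 was interrupted. *)
  let is_valid root = not (root = 57) in
  assert (latest_valid_root ~is_valid slots = Some { generation = 8; root = 40 });
  (* If no root validates, the disk is deemed unrecoverable. *)
  assert (latest_valid_root ~is_valid:(fun _ -&gt; false) slots = None)
</code></pre>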
<p>Finally, we implemented several optimisations to the above scheme to best use runtime resources. For example, we save disk space by representing block locations with as few bytes as possible, depending on the disk size. A Least-Recently-Used (LRU) cache keeps track of the latest reads and buffers the incoming writes. This LRU enables Notafs to run with very low memory overhead, even when a file system operation would have required more RAM if another system had performed it.</p>
<p>We are very happy with this design because it combines simplicity with high data consistency guarantees. We used the setup to implement a copy-on-write rope data structure to represent files of any size with fast updates and read operations. We added a rudimentary hierarchical structure of folders and files on top to enable multiple file usage. This last part has not been optimised and is only suitable for applications with a low number of files. It will get slower the more files you have, but there are no hard limits.</p>
<h2>Benchmarks and Tests</h2>
<p>Since filesystem correctness is critical, we tested all steps of the project to validate our solution. Of special interest is the final verification, for which our team wrote a formal specification in the <a href="https://github.com/ocaml-gospel/gospel">Gospel specification language</a> and used <a href="https://github.com/ocaml-gospel/ortac">Ortac</a> to translate it into a <a href="https://ocaml-multicore.github.io/multicoretests/">QCheck test suite</a>. By comparing the behaviour of our Notafs file system with an obviously correct in-memory reference model, we could stress-test our system by running a huge number of tests to detect discrepancies. Our experience with Gospel made a strong case for it being the future of documentation, with high-level specifications that are both mechanically verifiable and readable by end-users.</p>
<p>Ultimately, correctness alone was not enough to justify using Notafs in production if it did not meet our performance goals. We benchmarked Notafs in comparison to existing MirageOS file systems for the features that were critical for SpaceOS and Irmin support. Those features were the ability to write large files, to read back those files, and to read only a sub-range in those files (a feature required by <code>irmin-pack</code>).</p>
<p><img sizes="(min-width: 1360px) 1360px, (min-width: 680px) 680px, 100vw" srcset="/blog/images/Notafs-benchmarks-170w~GGy6BO7qAC8VTQTNcR6sBA.webp 170w, /blog/images/Notafs-benchmarks-340w~0CPnOJbfrX12fyRVpRu5dQ.webp 340w, /blog/images/Notafs-benchmarks-680w~tTRunmNYMr9Du8jyZ6oBtA.webp 680w, /blog/images/Notafs-benchmarks-1360w~uaILlPW6mfmLZu5wyaqOxw.webp 1360w" src="/blog/images/Notafs-benchmarks-1360w~uaILlPW6mfmLZu5wyaqOxw.webp" alt="Notafs benchmarks in three graphs, comparing the speed of different file systems when it comes to writing and reading large files."></p>
<p>While the TAR file format will always be the fastest thanks to its simplicity, it comes with the limitation that files can't be removed or updated (hence eventually running out of disk space!). In general, the benchmarks confirm that Notafs meets its goal of handling large files efficiently. In particular, its careful use of memory with LRU enables Notafs to run the benchmarks on larger files than the alternative – for unikernels with restricted max memory usage. Note that benchmarks are always subject to interpretation; we did not benchmark the scenarios where we expect Notafs to perform the worst, e.g. with a large number of small files!</p>
<p>Our next image is actually a video showing Irmin running on top of a small Notafs disk. It combines everything we have explained so far: (epilepsy warning: the video displays flashing lights)</p>
<ul>
<li>The disk is divided into blocks of 1024 bytes, conveniently represented by 32x32px squares. The blocks in colour contain live data used by the Irmin store, while the grey crossed-out ones are the free blocks. The colourful blocks create distinctive patterns that indicate the different types of data that Irmin uses to store the database.</li>
<li>As more Irmin operations occur during the test's execution, you can see the Notafs block allocator in action, circling around the disk to write new data to the oldest freed block.</li>
<li>At the bottom of the video, two histograms show the distribution of the reads and writes on each block to check that fair usage of each block is achieved.</li>
<li>Finally, we can see the Irmin garbage collector clearing its old history on a frequent basis to avoid running out of disk space.</li>
</ul>
<center>
<video autoplay="" loop="" style="width: 100%">
  <source src="/blog/images/notafs-video~zJZ_zyxrLnCkQb_phqW_0w.webm" type="video/webm">
</video>
</center> 
<p>This visualisation was made possible because MirageOS libraries can also be used on standard operating systems without requiring special compilation by a unikernel. In other words, Notafs also enables Irmin to be used as a single-file database (like <a href="https://www.sqlite.org/onefile.html">SQLite</a>), which might be of interest for some applications' workflows. We want to do more work on this front to lift the limitation that the database size must currently be fixed in advance when formatting the disk.</p>
<h2>Conclusion</h2>
<p>Developing a new filesystem has been a very exciting project, and we hope that you'll enjoy the new options when developing unikernels with MirageOS:</p>
<ul>
<li><strong>Notafs:</strong> for applications which require a limited number of very large files (e.g. SpaceOS satellite pictures).</li>
<li><strong>Irmin on Notafs:</strong> for a general-purpose file system, including optimisations to support a large number of arbitrarily sized files and many advanced features (with the caveat that this version of Irmin requires OCaml 5, which is still experimental on MirageOS).</li>
</ul>
<p>We invite you to participate and try Notafs yourself! Please consult the <a href="https://github.com/tarides/notafs">open-source Notafs repository</a> for further information, and share your experience (good and bad!) with the team. You can stay in touch with us on <a href="https://bsky.app/profile/tarides.com">Bluesky</a>, <a href="https://mastodon.social/@tarides">Mastodon</a>, <a href="https://www.threads.net/@taridesltd">Threads</a>, and <a href="https://www.linkedin.com/company/tarides">LinkedIn</a>.</p>
]]></description><link>https://tarides.com/blog/2024-12-04-irmin-on-mirageos-under-the-hood-with-notafs</link><guid isPermaLink="false">https://tarides.com/blog/2024-12-04-irmin-on-mirageos-under-the-hood-with-notafs.html</guid><dc:creator><![CDATA[ Arthur Wendling ]]></dc:creator><pubDate>Wed, 04 Dec 2024 00:00:00 GMT</pubDate></item><item><title><![CDATA[Irmin on MirageOS: Introducing the Notafs File System]]></title><description><![CDATA[<p>We are pleased to announce one (or two) new filesystems for MirageOS! The motivation behind creating them is an exciting new use case requiring the system to store data on disk.</p>
<p><a href="https://mirage.io/docs/overview-of-mirage">MirageOS</a> allows you to compile OCaml applications into unikernels. By selecting the operating system functionalities required, a unikernel can be constructed for your application using only the necessary components. This process reduces the attack surface and increases the hardware efficiency of your final application. MirageOS unikernels can be deployed to various cloud and mobile platforms.</p>
<p>As a case in point, Tarides is developing <a href="/blog/2023-07-31-ocaml-in-space-welcome-spaceos/">SpaceOS</a> to run unikernels on satellites, an industry where both security and performance are critical. While MirageOS has a long history of cloud usage and comes with advanced network capabilities, SpaceOS provides an interesting use case for disk storage. A possible use case for satellites is to take pictures from space and send them to Earth for analysis, and since satellites may not be able to send the pictures right away (due to their location, for example), storing high-resolution images on a disk is a must.</p>
<p>So, naturally, the question we asked ourselves was: which MirageOS file system could we use for SpaceOS? That question started us on the journey to Notafs, a new file system that even lets you use Irmin on top!</p>
<h2>Why Create a New Filesystem?</h2>
<p>At the start, there were a couple of existing filesystems available for MirageOS that we needed to evaluate:</p>
<ul>
<li>The <a href="https://github.com/yomimono/chamelon">Chamelon</a> filesystem is designed to efficiently support a large number of very small files on a disk.</li>
<li>The <a href="https://github.com/dinosaure/docteur">Docteur</a> filesystem, which provides read-only file compression.</li>
<li>The <a href="https://github.com/mirage/ocaml-fat/tree/main">FAT</a> filesystem, which provides support for the (old) standard <a href="https://en.wikipedia.org/wiki/File_Allocation_Table">FAT16</a> filesystem with restrictions on file sizes.</li>
<li>The <a href="https://github.com/mirage/ocaml-tar">TAR</a> file format, which has the limitation that users can only add new files: deletion or modification of existing files is not supported.</li>
</ul>
<p>All of the above libraries are well-tested and highly recommended if they fit your application needs! But, as we will illustrate in part 2 with our benchmarks, none of these systems satisfied the SpaceOS requirement to store large files.</p>
<p>Furthermore, using a conventional file system, like the ones available on traditional operating systems such as Linux and Windows, was not an option either, since they are closely tied to large operating system functionalities. To ensure higher security, MirageOS libraries are built using the <a href="https://ocaml.org">OCaml programming language</a>, which provides <a href="/blog/2023-12-14-ocaml-memory-safety-and-beyond/">strong memory-safety guarantees</a>. While this is a fantastic design choice for the <a href="/blog/2024-03-07-a-time-for-change-our-response-to-the-white-house-cybersecurity-press-release/">future of computing</a>, it makes it harder to support a conventional filesystem (programmed in C/C++) natively on MirageOS. A unikernel based on that kind of technology would lose the appeal of a small software stack with high-security guarantees. In comparison, the Notafs solution fits into four thousand lines, which is a much more ‘human-sized’ project to review.</p>
<h2>The Challenges With Filesystems</h2>
<p>Implementing a filesystem is a complex task requiring compromises, which explains the lack of options for storing large files with the SpaceOS project. For example, even though file systems may appear simple, with an interface that everyone is familiar with, their critical main function is to protect applications from (recoverable) hardware defects. Unless the disk dies, the user should never lose or see corrupted data, even if a power outage were to interrupt the filesystem in the middle of an operation. In other words, a filesystem update should have transactional semantics: either the operation succeeds, or it does not, but applications should never observe an in-between broken state. Without this property, software built on top of the file system would be vulnerable to faults.</p>
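<p>To make the ‘all or nothing’ idea concrete, here is a minimal sketch of how an application on a conventional POSIX system can replace a file so that readers see either the old contents or the new ones, never a half-written mix. It uses the common write-to-a-temporary-file-then-rename technique, which is an illustration only and not how Notafs works on a block device. The helper names <code>atomic_write</code> and <code>read_file</code> are hypothetical, and the sketch assumes that <code>rename</code> is atomic on the host file system.</p>
<pre><code>(* Replace [path] in one step: write a sibling temporary file first,
   then rename it over the destination. Surviving a power loss would
   also require flushing to disk (fsync), which is omitted here. *)
let atomic_write ~path ~contents =
  let tmp = path ^ ".tmp" in
  let oc = open_out_bin tmp in
  (try
     output_string oc contents;
     close_out oc
   with e -&gt;
     (* On failure, drop the temporary file; [path] is left untouched. *)
     close_out_noerr oc;
     Sys.remove tmp;
     raise e);
  Sys.rename tmp path

(* Read a whole file into a string. *)
let read_file path =
  let ic = open_in_bin path in
  let n = in_channel_length ic in
  let s = really_input_string ic n in
  close_in ic;
  s

let () =
  let path = Filename.temp_file "notafs_demo" ".txt" in
  atomic_write ~path ~contents:"v1";
  atomic_write ~path ~contents:"v2";
  assert (read_file path = "v2");
  Sys.remove path
</code></pre>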
<p>Tarides is well aware of the challenges involved with implementing a filesystem – we maintain the <a href="https://irmin.org">Irmin database</a>, which provides a hierarchical key-value store with high data consistency guarantees, git-inspired history for rollbacks, and distributed replication over the network. While the Irmin API resembles a filesystem, it is a full-blown database and provides many more functionalities. So rather than starting a new filesystem implementation from scratch, we asked ourselves whether we could reuse the Irmin database as a filesystem for MirageOS.</p>
<h2>Irmin as a Filesystem?</h2>
<p>Using Irmin for this purpose is not a new idea, and Irmin already provides multiple backends that allow users to run its databases on various platforms with different constraints. The <code>irmin-pack</code> backend was of particular interest for MirageOS support. We have spent years optimising its performance since this backend is notably used to <a href="/blog/2022-04-26-lightning-fast-with-irmin-tezos-storage-is-6x-faster-with-1000-tps-surpassed/">store the Tezos blockchain</a>, and it enables the freeing of disk space by <a href="/blog/2023-05-05-optimising-archive-node-storage-for-tezos/">truncating the database history to get rid of unnecessary old backups</a>.</p>
<p>Finally, the low-level implementation of <code>irmin-pack</code> is especially suited for a port to MirageOS, as it only requires support for a few large (append-only) files from the operating system.</p>
<p>This last technical requirement was especially relevant to the SpaceOS use case, where the satellite needs to be able to store high-resolution pictures on disk. If our custom-made file system supported large files, then we would be able to support SpaceOS and enable the general-purpose use of Irmin for MirageOS unikernels. This realisation still left us with the task of developing the foundations of a file system, but even just the bare functionalities would satisfy our use cases.</p>
<h2>Notafs is Born</h2>
<p>We called this new file system ‘Notafs’, which, as the name suggests, is not a general-purpose file system: it is designed to handle a small number of large files on Mirage block devices. It can, however, be used to run the <code>irmin-pack</code> backend, which gives users all the benefits of an Irmin filesystem for MirageOS. Running <code>irmin-pack</code> on Notafs lifts the limitations of the latter: the combination supports many file names, is optimised for small and large files, and includes a git-like history with branching and merging, <a href="https://irmin.org/">just to name a few features</a>!</p>
<p>Navigating design restrictions is an excellent way to focus the efforts of a project. In our case, that meant directing them towards the correctness and performance of the few selected operations of <code>Notafs</code>. We didn’t need to implement advanced filesystem operations since missing functionalities could be provided by <code>irmin-pack</code>, including optimised management of many small files. As long as our underlying file system could provide fast operations on large files, the rest was taken care of!</p>
<p>From a unikernel developer’s perspective, <code>Notafs</code> provides an implementation of the <a href="https://ocaml.org/p/mirage-kv/latest/doc/Mirage_kv/index.html"><code>Mirage_kv</code></a> interfaces. This is the standard API for filesystem usage on MirageOS, so any existing unikernel can use it without needing to change its application code. We also provide a <a href="https://github.com/tarides/notafs?tab=readme-ov-file#notafs-cli">command-line interface</a> to simplify the formatting of disks and to allow for external inspection of file system contents of the unikernel. Check out this example of a <a href="https://github.com/tarides/notafs/tree/master/unikernel-kv">minimal setup of Notafs in MirageOS</a>.</p>
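<p>The reason application code does not need to change is that it is written against an interface rather than a concrete store. The sketch below shows the idea with a deliberately simplified, hypothetical signature <code>KV</code>; the real <code>Mirage_kv</code> interfaces are richer (they are <code>Lwt</code>-based and use dedicated key and error types), so treat this as an illustration of the pattern only. <code>Mem_store</code> is an in-memory stand-in, not Notafs.</p>
<pre><code>(* Simplified, hypothetical key-value signature (not the real Mirage_kv). *)
module type KV = sig
  type t
  val set : t -&gt; string -&gt; string -&gt; (unit, string) result
  val get : t -&gt; string -&gt; (string, string) result
end

(* Application logic only depends on the signature, not on the backend. *)
module App (Store : KV) = struct
  let save_and_reload store ~key ~data =
    match Store.set store key data with
    | Error msg -&gt; Error msg
    | Ok () -&gt; Store.get store key
end

(* In-memory backend used here as a stand-in for a real file system. *)
module Mem_store = struct
  type t = (string, string) Hashtbl.t
  let create () : t = Hashtbl.create 16
  let set t k v = Hashtbl.replace t k v; Ok ()
  let get t k =
    match Hashtbl.find_opt t k with
    | Some v -&gt; Ok v
    | None -&gt; Error ("not found: " ^ k)
end

module Demo = App (Mem_store)

let () =
  let store = Mem_store.create () in
  assert (Demo.save_and_reload store ~key:"image.raw" ~data:"pixels"
          = Ok "pixels")
</code></pre>
<p>Swapping the backend then amounts to applying <code>App</code> to a different module that satisfies the same signature.</p>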
<p>While Notafs has restricted functionalities, it provides a nice developer experience and a useful alternative in the file system design space for MirageOS. We are especially proud of the safety guarantees that Notafs provides for your data stored on disk.</p>
<h2>Using Irmin on Notafs</h2>
<p>To enable users to run Irmin on MirageOS, we provide an implementation of the syscalls required by the <code>irmin-pack</code> backend on top of Notafs. This was straightforward as Notafs provides the functionalities required to run <code>irmin-pack</code>. Thanks to <a href="https://dev.realworldocaml.org/functors.html">OCaml functors</a>, the <code>irmin-pack</code> codebase was already structured for alternative syscall implementations.</p>
<p>One OCaml subtlety we encountered was handling asynchronous I/O: MirageOS and Notafs use the <code>Lwt</code> monad, while the Irmin syscalls expected a direct-style I/O implementation. This would have been an issue in the past, but we could bridge that gap using the <a href="https://ocaml.org/manual/5.2/effects.html">effect handlers that came with OCaml 5</a>. Note that OCaml 5 support is still experimental on MirageOS at the time of writing!</p>
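<p>As a rough illustration of how effect handlers can bridge the two styles, the sketch below lets direct-style code wait on a callback-based promise. It requires OCaml 5 and uses only the standard <code>Effect</code> module; the <code>promise</code> type is a toy stand-in for <code>Lwt</code>, and <code>await</code>, <code>run</code>, <code>read_block</code>, and <code>run_pending</code> are hypothetical names, not the actual Notafs or Irmin code.</p>
<pre><code>open Effect.Deep

(* Toy stand-in for an Lwt-style promise: a function that calls its
   argument once the value is available. Not the real Lwt API. *)
type 'a promise = ('a -&gt; unit) -&gt; unit

type _ Effect.t += Await : 'a promise -&gt; 'a Effect.t

(* Direct-style wait: suspends the caller until the promise resolves. *)
let await p = Effect.perform (Await p)

(* Run direct-style code, turning each [await] into a promise callback
   that resumes the suspended computation. *)
let run (f : unit -&gt; unit) : unit =
  try_with f ()
    { effc = (fun (type a) (eff : a Effect.t) -&gt;
        match eff with
        | Await p -&gt;
          Some (fun (k : (a, unit) continuation) -&gt;
            p (fun v -&gt; continue k v))
        | _ -&gt; None) }

(* Fake asynchronous I/O: results are delivered later by [run_pending]. *)
let pending : (unit -&gt; unit) Queue.t = Queue.create ()

let read_block (n : int) : string promise =
  fun resolve -&gt;
    Queue.add (fun () -&gt; resolve (Printf.sprintf "block-%d" n)) pending

let rec run_pending () =
  match Queue.take_opt pending with
  | None -&gt; ()
  | Some job -&gt; job (); run_pending ()

let () =
  let result = ref "" in
  run (fun () -&gt;
    (* Reads like blocking code, but each [await] suspends. *)
    let a = await (read_block 1) in
    let b = await (read_block 2) in
    result := a ^ "," ^ b);
  (* Nothing has completed yet: the computation is suspended. *)
  assert (!result = "");
  run_pending ();
  assert (!result = "block-1,block-2")
</code></pre>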
<p>From a user perspective, using Irmin addresses the design limitations of Notafs (while keeping its desirable consistency properties) by adding efficient support for managing a large number of small files. Irmin is a much more general-purpose file system, and a user can build a unikernel application with many additional features on top of it. We’re hopeful that applications already using Irmin will consider this new bridge for targeting MirageOS unikernels.</p>
<h2>Until Next Time!</h2>
<p>Thank you for checking out the first part of this two-part series where we’ve introduced Notafs, the motivation behind the file system, the challenges, and its use cases. Look out for part two where we delve into the details behind Notafs’ design alongside benchmarks and visualisations of the file system in action.</p>
<p>Connect with us online on <a href="https://bsky.app/profile/tarides.com">Bluesky</a>, <a href="https://mastodon.social/@tarides">Mastodon</a>, <a href="https://www.threads.net/@taridesltd">Threads</a>, and <a href="https://www.linkedin.com/company/tarides">LinkedIn</a> to stay updated on our latest projects. We also invite you to consult the <a href="https://github.com/tarides/notafs">open-source Notafs repository</a> for more information or to test out the file system yourself!</p>
]]></description><link>https://tarides.com/blog/2024-11-27-irmin-on-mirageos-introducing-the-notafs-file-system</link><guid isPermaLink="false">https://tarides.com/blog/2024-11-27-irmin-on-mirageos-introducing-the-notafs-file-system.html</guid><dc:creator><![CDATA[ Arthur Wendling ]]></dc:creator><pubDate>Wed, 27 Nov 2024 00:00:00 GMT</pubDate></item><item><title><![CDATA[Advanced Code Navigation in OCaml-LSP]]></title><description><![CDATA[<h2>1. Introduction</h2>
<p>The <a href="https://microsoft.github.io/language-server-protocol/">Language Server Protocol (LSP)</a> defines a standardised protocol that facilitates communication between an editor (client) and a language server. It is developed by Microsoft and was designed to simplify the process of adding language support to different code editors by abstracting away the most common implementation details of the programming language.</p>
<p>OCaml-LSP, which relies on Merlin's amazing engine, is an implementation of the LSP protocol for the OCaml programming language.</p>
<h2>2. Code Navigation in LSP</h2>
<p>When it comes to navigating through code, the LSP protocol provides four ways to do this:</p>
<ul>
<li><a href="https://microsoft.github.io/language-server-protocol/specifications/lsp/3.17/specification/#textDocument_declaration">Goto Declaration Request</a>: Resolves the declaration location of a symbol.</li>
<li><a href="https://microsoft.github.io/language-server-protocol/specifications/lsp/3.17/specification/#textDocument_definition">Goto Definition Request</a>: Resolves the definition location of a symbol.</li>
<li><a href="https://microsoft.github.io/language-server-protocol/specifications/lsp/3.17/specification/#textDocument_typeDefinition">Goto Type Definition Request</a>: Resolves the type definition location of a symbol.</li>
<li><a href="https://microsoft.github.io/language-server-protocol/specifications/lsp/3.17/specification/#textDocument_implementation">Goto Implementation Request</a>: Resolves the implementation location of a symbol. Not supported in <code>ocaml-lsp</code>.</li>
</ul>
<p>While these goto requests provide general navigation for most programming languages, they fall short when dealing with complex syntax such as OCaml's modules, functors, etc., where users need to navigate to specific sections of the code.</p>
<h2>3. Code Navigation in Merlin</h2>
<p>Merlin offers a range of commands that allow precise code navigation, including the ability to jump to specific constructs like functions, module definitions, and match cases. As seen earlier, generalised navigation in LSP is insufficient for precise movements. To solve this, Merlin introduces a specialised <a href="https://github.com/ocaml/merlin/blob/main/doc/dev/PROTOCOL.md#jump--target-string--position-position">Jump command</a> that allows users to jump to specific targets within their OCaml code.</p>
<p>The key targets include:</p>
<ul>
<li><code>fun</code>: Jumps to a function definition.</li>
<li><code>let</code>: Jumps to a let binding.</li>
<li><code>module</code>: Jumps to a module.</li>
<li><code>module-type</code>: Jumps to a module type definition.</li>
<li><code>match</code>: Jumps to a match construct.
<ul>
<li><code>match-next-case</code>: Jumps to the next case in a match statement.</li>
<li><code>match-prev-case</code>: Jumps to the previous case in a match statement.</li>
</ul>
</li>
</ul>
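<p>On the client or server side, the target names listed above can be modelled as a closed variant so that only known targets are ever sent to Merlin. This is a small sketch using the names from the list; the type and function names are hypothetical and do not come from the Merlin or OCaml-LSP sources.</p>
<pre><code>(* The jump targets listed above, as a closed variant. *)
type target =
  | Fun
  | Let
  | Module
  | Module_type
  | Match
  | Match_next_case
  | Match_prev_case

(* The string form of each target, as listed above. *)
let to_string = function
  | Fun -&gt; "fun"
  | Let -&gt; "let"
  | Module -&gt; "module"
  | Module_type -&gt; "module-type"
  | Match -&gt; "match"
  | Match_next_case -&gt; "match-next-case"
  | Match_prev_case -&gt; "match-prev-case"

let all =
  [ Fun; Let; Module; Module_type; Match; Match_next_case; Match_prev_case ]

(* Parse a target name, rejecting anything that is not in the list. *)
let of_string s = List.find_opt (fun t -&gt; to_string t = s) all

let () =
  assert (of_string "module-type" = Some Module_type);
  assert (of_string "class" = None)
</code></pre>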
<h2>4. Custom Requests in LSP</h2>
<p>When standard LSP requests are insufficient, we have the possibility of writing custom requests to implement the functionality we want. The downside to this approach is that we lose native support in the various editors or clients and will also have to write the resulting client implementations for each custom request. Implementing precise navigation in OCaml-LSP using Merlin's Jump is a typical use for a custom request.</p>
<p>For this implementation, our focus was on reusing the requests already available in LSP in non-typical, innovative ways. With this in mind, our solution works by using a <code>CodeAction</code> in combination with a <code>ShowDocument</code> request to move the cursor to our desired position.</p>
<h2>5. Implementing Merlin's Jump in OCaml-LSP</h2>
<p>Code action requests are used to execute commands for a given text document and range. These commands are typically used to change the state of a document, such as code fixes and beautifying or refactoring code. In general, we use code actions to perform quick edits in code. It's a bit unusual to think of code actions as a navigation tool, but with some smart workarounds, we can indeed use code actions to move through code.
Here's how the feature works:</p>
<ul>
<li>
<p><strong>a) Document Detection</strong>: When a source file is opened in the editor, the OCaml-LSP server checks if it is an OCaml file supported by Merlin. If the document is incompatible, the "Jump to Target" functionality is not provided.</p>
</li>
<li>
<p><strong>b) Client Capability Check</strong>: The LSP allows servers to query what features a client (the editor) supports. For "Jump to Target" to work, the server checks if the client supports the <code>ShowDocument</code> capability, which is necessary to move the cursor to the correct location.</p>
</li>
<li>
<p><strong>c) Generating CodeActions</strong>: The client begins by sending a request for all valid code actions:</p>
</li>
</ul>
<pre><code><span class="ocaml-source">[</span><span class="ocaml-constant-language-capital-identifier">Trace</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-</span><span class="ocaml-source"> </span><span class="ocaml-constant-numeric-decimal-integer">06</span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-constant-numeric-decimal-integer">21</span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-constant-numeric-decimal-integer">32</span><span class="ocaml-source">]</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Sending</span><span class="ocaml-source"> </span><span class="ocaml-source">request</span><span class="ocaml-source"> </span><span class="ocaml-storage-type">'textDocument</span><span class="ocaml-keyword-operator">/</span><span class="ocaml-source">codeAction</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-constant-numeric-decimal-integer">71</span><span class="ocaml-source">)</span><span class="ocaml-source">'</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">
</span><span class="ocaml-constant-language-capital-identifier">Params</span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source"> </span><span class="ocaml-source">{</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-string-quoted-double">textDocument</span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source"> </span><span class="ocaml-source">{</span><span class="ocaml-source">
</span><span class="ocaml-source">        </span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-string-quoted-double">uri</span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source"> </span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-string-quoted-double">file:///.../test.ml</span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-source">}</span><span class="ocaml-keyword-other-ocaml punctuation-comma punctuation-separator">,</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-string-quoted-double">range</span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source"> </span><span class="ocaml-source">{</span><span class="ocaml-source">
</span><span class="ocaml-source">        </span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-string-quoted-double">start</span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source"> </span><span class="ocaml-source">{</span><span class="ocaml-source">
</span><span class="ocaml-source">            </span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-string-quoted-double">line</span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source"> </span><span class="ocaml-constant-numeric-decimal-integer">5</span><span class="ocaml-keyword-other-ocaml punctuation-comma punctuation-separator">,</span><span class="ocaml-source">
</span><span class="ocaml-source">            </span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-string-quoted-double">character</span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source"> </span><span class="ocaml-constant-numeric-decimal-integer">12</span><span class="ocaml-source">
</span><span class="ocaml-source">        </span><span class="ocaml-source">}</span><span class="ocaml-keyword-other-ocaml punctuation-comma punctuation-separator">,</span><span class="ocaml-source">
</span><span class="ocaml-source">        </span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-string-quoted-double">end</span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source"> </span><span class="ocaml-source">{</span><span class="ocaml-source">
</span><span class="ocaml-source">            </span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-string-quoted-double">line</span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source"> </span><span class="ocaml-constant-numeric-decimal-integer">5</span><span class="ocaml-keyword-other-ocaml punctuation-comma punctuation-separator">,</span><span class="ocaml-source">
</span><span class="ocaml-source">            </span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-string-quoted-double">character</span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source"> </span><span class="ocaml-constant-numeric-decimal-integer">12</span><span class="ocaml-source">
</span><span class="ocaml-source">        </span><span class="ocaml-source">}</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-source">}</span><span class="ocaml-keyword-other-ocaml punctuation-comma punctuation-separator">,</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-string-quoted-double">context</span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source"> </span><span class="ocaml-source">{</span><span class="ocaml-source">
</span><span class="ocaml-source">        </span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-string-quoted-double">diagnostics</span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-list">[]</span><span class="ocaml-keyword-other-ocaml punctuation-comma punctuation-separator">,</span><span class="ocaml-source">
</span><span class="ocaml-source">        </span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-string-quoted-double">triggerKind</span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source"> </span><span class="ocaml-constant-numeric-decimal-integer">2</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-source">}</span><span class="ocaml-source">
</span><span class="ocaml-source">}</span><span class="ocaml-source">
</span></code></pre>
<p>The server receives this request, and for each target (fun, let, match, module, etc.), it asynchronously queries Merlin's Jump command. It assembles the possible jumps into a list of code actions and returns this list to the client.</p>
<pre><code><span class="ocaml-source">[</span><span class="ocaml-constant-language-capital-identifier">Trace</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-</span><span class="ocaml-source"> </span><span class="ocaml-constant-numeric-decimal-integer">06</span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-constant-numeric-decimal-integer">21</span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-constant-numeric-decimal-integer">32</span><span class="ocaml-source">]</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Received</span><span class="ocaml-source"> </span><span class="ocaml-source">response</span><span class="ocaml-source"> </span><span class="ocaml-storage-type">'textDocument</span><span class="ocaml-keyword-operator">/</span><span class="ocaml-source">codeAction</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-constant-numeric-decimal-integer">71</span><span class="ocaml-source">)</span><span class="ocaml-source">' </span><span class="ocaml-keyword-other">in</span><span class="ocaml-source"> 7ms</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">
</span><span class="ocaml-constant-language-capital-identifier">Result</span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source"> </span><span class="ocaml-source">[</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-keyword-other-ocaml punctuation-comma punctuation-separator">,</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-source">{</span><span class="ocaml-source">
</span><span class="ocaml-source">        </span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-string-quoted-double">command</span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source"> </span><span class="ocaml-source">{</span><span class="ocaml-source">
</span><span class="ocaml-source">            </span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-string-quoted-double">arguments</span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source"> </span><span class="ocaml-source">[</span><span class="ocaml-source">
</span><span class="ocaml-source">                </span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-string-quoted-double">file:///.../test.ml</span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-keyword-other-ocaml punctuation-comma punctuation-separator">,</span><span class="ocaml-source">
</span><span class="ocaml-source">                </span><span class="ocaml-source">{</span><span class="ocaml-source">
</span><span class="ocaml-source">                    </span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-string-quoted-double">end</span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source"> </span><span class="ocaml-source">{</span><span class="ocaml-source">
</span><span class="ocaml-source">                        </span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-string-quoted-double">character</span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source"> </span><span class="ocaml-constant-numeric-decimal-integer">2</span><span class="ocaml-keyword-other-ocaml punctuation-comma punctuation-separator">,</span><span class="ocaml-source">
</span><span class="ocaml-source">                        </span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-string-quoted-double">line</span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source"> </span><span class="ocaml-constant-numeric-decimal-integer">9</span><span class="ocaml-source">
</span><span class="ocaml-source">                    </span><span class="ocaml-source">}</span><span class="ocaml-keyword-other-ocaml punctuation-comma punctuation-separator">,</span><span class="ocaml-source">
</span><span class="ocaml-source">                    </span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-string-quoted-double">start</span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source"> </span><span class="ocaml-source">{</span><span class="ocaml-source">
</span><span class="ocaml-source">                        </span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-string-quoted-double">character</span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source"> </span><span class="ocaml-constant-numeric-decimal-integer">2</span><span class="ocaml-keyword-other-ocaml punctuation-comma punctuation-separator">,</span><span class="ocaml-source">
</span><span class="ocaml-source">                        </span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-string-quoted-double">line</span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source"> </span><span class="ocaml-constant-numeric-decimal-integer">9</span><span class="ocaml-source">
</span><span class="ocaml-source">                    </span><span class="ocaml-source">}</span><span class="ocaml-source">
</span><span class="ocaml-source">                </span><span class="ocaml-source">}</span><span class="ocaml-source">
</span><span class="ocaml-source">            </span><span class="ocaml-source">]</span><span class="ocaml-keyword-other-ocaml punctuation-comma punctuation-separator">,</span><span class="ocaml-source">
</span><span class="ocaml-source">            </span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-string-quoted-double">command</span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source"> </span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-string-quoted-double">ocamllsp/merlin-jump-to-target</span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-keyword-other-ocaml punctuation-comma punctuation-separator">,</span><span class="ocaml-source">
</span><span class="ocaml-source">            </span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-string-quoted-double">title</span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source"> </span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-string-quoted-double">Let jump</span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-source">
</span><span class="ocaml-source">        </span><span class="ocaml-source">}</span><span class="ocaml-keyword-other-ocaml punctuation-comma punctuation-separator">,</span><span class="ocaml-source">
</span><span class="ocaml-source">        </span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-string-quoted-double">kind</span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source"> </span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-string-quoted-double">merlin-jump-let</span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-keyword-other-ocaml punctuation-comma punctuation-separator">,</span><span class="ocaml-source">
</span><span class="ocaml-source">        </span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-string-quoted-double">title</span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source"> </span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-string-quoted-double">Let jump</span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-source">}</span><span class="ocaml-source">
</span><span class="ocaml-source">]</span><span class="ocaml-source">
</span></code></pre>
<p>Because code actions are standard LSP functionality, we benefit from an existing pool of client implementations: clients display the code actions returned by the server without any extra coding or customisation.
Each CodeAction includes:</p>
<ul>
<li>
<p><em>Title</em> (e.g., "Jump to function").</p>
</li>
<li>
<p><em>Command</em> (<code>ocamllsp/merlin-jump-to-target</code>): The command that is executed when the code action is selected.</p>
</li>
<li>
<p><em>Kind</em>: This parameter distinguishes one code action from another. Clients can use it to group similar code actions or to tell them apart, which is especially useful for keybindings.</p>
</li>
<li>
<p><strong>d) Executing CodeAction Commands</strong>: When a user clicks or selects a specific code action, the command associated with that code action is executed.
The client sends an <code>executeCommand</code> request to the server.</p>
</li>
</ul>
<pre><code><span class="ocaml-source">[</span><span class="ocaml-constant-language-capital-identifier">Trace</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-</span><span class="ocaml-source"> </span><span class="ocaml-constant-numeric-decimal-integer">06</span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-constant-numeric-decimal-integer">37</span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-constant-numeric-decimal-integer">10</span><span class="ocaml-source">]</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Sending</span><span class="ocaml-source"> </span><span class="ocaml-source">request</span><span class="ocaml-source"> </span><span class="ocaml-storage-type">'workspace</span><span class="ocaml-keyword-operator">/</span><span class="ocaml-source">executeCommand</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-constant-numeric-decimal-integer">73</span><span class="ocaml-source">)</span><span class="ocaml-source">'</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">
</span><span class="ocaml-constant-language-capital-identifier">Params</span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source"> </span><span class="ocaml-source">{</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-string-quoted-double">command</span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source"> </span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-string-quoted-double">ocamllsp/merlin-jump-to-target</span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-keyword-other-ocaml punctuation-comma punctuation-separator">,</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-string-quoted-double">arguments</span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source"> </span><span class="ocaml-source">[</span><span class="ocaml-source">
</span><span class="ocaml-source">        </span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-string-quoted-double">file:///.../test.ml</span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-keyword-other-ocaml punctuation-comma punctuation-separator">,</span><span class="ocaml-source">
</span><span class="ocaml-source">        </span><span class="ocaml-source">{</span><span class="ocaml-source">
</span><span class="ocaml-source">            </span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-string-quoted-double">end</span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source"> </span><span class="ocaml-source">{</span><span class="ocaml-source">
</span><span class="ocaml-source">                </span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-string-quoted-double">character</span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source"> </span><span class="ocaml-constant-numeric-decimal-integer">2</span><span class="ocaml-keyword-other-ocaml punctuation-comma punctuation-separator">,</span><span class="ocaml-source">
</span><span class="ocaml-source">                </span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-string-quoted-double">line</span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source"> </span><span class="ocaml-constant-numeric-decimal-integer">9</span><span class="ocaml-source">
</span><span class="ocaml-source">            </span><span class="ocaml-source">}</span><span class="ocaml-keyword-other-ocaml punctuation-comma punctuation-separator">,</span><span class="ocaml-source">
</span><span class="ocaml-source">            </span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-string-quoted-double">start</span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source"> </span><span class="ocaml-source">{</span><span class="ocaml-source">
</span><span class="ocaml-source">                </span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-string-quoted-double">character</span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source"> </span><span class="ocaml-constant-numeric-decimal-integer">2</span><span class="ocaml-keyword-other-ocaml punctuation-comma punctuation-separator">,</span><span class="ocaml-source">
</span><span class="ocaml-source">                </span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-string-quoted-double">line</span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source"> </span><span class="ocaml-constant-numeric-decimal-integer">9</span><span class="ocaml-source">
</span><span class="ocaml-source">            </span><span class="ocaml-source">}</span><span class="ocaml-source">
</span><span class="ocaml-source">        </span><span class="ocaml-source">}</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-source">]</span><span class="ocaml-source">
</span><span class="ocaml-source">}</span><span class="ocaml-source">
</span></code></pre>
<p>In response to this <code>executeCommand</code> request, the server sends the client a <code>showDocument</code> request containing the document URI and location range received from the code action.</p>
<pre><code><span class="ocaml-source">[</span><span class="ocaml-constant-language-capital-identifier">Trace</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-</span><span class="ocaml-source"> </span><span class="ocaml-constant-numeric-decimal-integer">06</span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-constant-numeric-decimal-integer">37</span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-constant-numeric-decimal-integer">10</span><span class="ocaml-source">]</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Received</span><span class="ocaml-source"> </span><span class="ocaml-source">request</span><span class="ocaml-source"> </span><span class="ocaml-storage-type">'window</span><span class="ocaml-keyword-operator">/</span><span class="ocaml-source">showDocument</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-constant-numeric-decimal-integer">6</span><span class="ocaml-source">)</span><span class="ocaml-source">'</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">
</span><span class="ocaml-constant-language-capital-identifier">Params</span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source"> </span><span class="ocaml-source">{</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-string-quoted-double">selection</span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source"> </span><span class="ocaml-source">{</span><span class="ocaml-source">
</span><span class="ocaml-source">        </span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-string-quoted-double">end</span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source"> </span><span class="ocaml-source">{</span><span class="ocaml-source">
</span><span class="ocaml-source">            </span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-string-quoted-double">character</span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source"> </span><span class="ocaml-constant-numeric-decimal-integer">2</span><span class="ocaml-keyword-other-ocaml punctuation-comma punctuation-separator">,</span><span class="ocaml-source">
</span><span class="ocaml-source">            </span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-string-quoted-double">line</span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source"> </span><span class="ocaml-constant-numeric-decimal-integer">9</span><span class="ocaml-source">
</span><span class="ocaml-source">        </span><span class="ocaml-source">}</span><span class="ocaml-keyword-other-ocaml punctuation-comma punctuation-separator">,</span><span class="ocaml-source">
</span><span class="ocaml-source">        </span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-string-quoted-double">start</span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source"> </span><span class="ocaml-source">{</span><span class="ocaml-source">
</span><span class="ocaml-source">            </span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-string-quoted-double">character</span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source"> </span><span class="ocaml-constant-numeric-decimal-integer">2</span><span class="ocaml-keyword-other-ocaml punctuation-comma punctuation-separator">,</span><span class="ocaml-source">
</span><span class="ocaml-source">            </span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-string-quoted-double">line</span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source"> </span><span class="ocaml-constant-numeric-decimal-integer">9</span><span class="ocaml-source">
</span><span class="ocaml-source">        </span><span class="ocaml-source">}</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-source">}</span><span class="ocaml-keyword-other-ocaml punctuation-comma punctuation-separator">,</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-string-quoted-double">takeFocus</span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-boolean">true</span><span class="ocaml-keyword-other-ocaml punctuation-comma punctuation-separator">,</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-string-quoted-double">uri</span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source"> </span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-string-quoted-double">file:///.../test.ml</span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-source">
</span><span class="ocaml-source">}</span><span class="ocaml-source">
</span></code></pre>
<p>Both <code>executeCommand</code> and <code>showDocument</code> are standard LSP requests, so any client that already implements them can perform this jump without additional work.</p>
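<p>To make the round trip concrete, here is a minimal OCaml sketch of how a server might turn the <code>executeCommand</code> arguments into <code>showDocument</code> parameters. It uses plain records rather than the actual <code>ocaml-lsp</code> types, so the names <code>position</code>, <code>range</code>, <code>show_document_params</code>, and <code>show_document_of_arguments</code> are hypothetical and chosen only for illustration. The command name, URI placeholder, and position are copied from the traces above.</p>
<pre><code>(* Hypothetical, simplified stand-ins for the LSP types; not the ocaml-lsp API. *)
type position = { line : int; character : int }
type range = { start : position; end_ : position }

type show_document_params = {
  uri : string;
  take_focus : bool;
  selection : range;
}

(* The command name as it appears in the traces. *)
let jump_command = "ocamllsp/merlin-jump-to-target"

(* Build the showDocument parameters from the command name and its two
   arguments (document URI and target range). Any other command is
   ignored by returning None. *)
let show_document_of_arguments ~command ~uri ~range =
  if String.equal command jump_command then
    Some { uri; take_focus = true; selection = range }
  else None

let () =
  let target = { line = 9; character = 2 } in
  let range = { start = target; end_ = target } in
  match
    show_document_of_arguments ~command:jump_command
      ~uri:"file:///.../test.ml" ~range
  with
  | Some p ->
    Printf.printf "showDocument %s at %d:%d (takeFocus=%b)\n" p.uri
      p.selection.start.line p.selection.start.character p.take_focus
  | None -> print_endline "unknown command"
</code></pre>
<p>Returning an option keeps the sketch honest about commands it does not recognise; a real server would more likely report an error back to the client.</p>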
<ul>
<li><strong>e) Error Handling</strong>: In cases where Merlin fails to find a valid target (e.g., the target does not exist or the position is invalid), the server gracefully handles the error by omitting the specific code action for that particular target, ensuring a smooth user experience.</li>
</ul>
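<p>The "omit on failure" behaviour can be sketched with <code>List.filter_map</code>. This is an illustrative model only: <code>find_target</code> is a hypothetical stand-in for the Merlin query, the <code>code_action</code> record is a simplification of the LSP type, and the target kinds other than <code>let</code> are assumptions made for the example.</p>
<pre><code>(* Hypothetical, simplified model; not the ocaml-lsp or Merlin API. *)
type position = { line : int; character : int }

type code_action = {
  title : string;
  kind : string;
  command : string;
  target : position;
}

(* Stand-in for asking Merlin for a jump target: Ok with a position on
   success, Error with a message when no target exists. *)
let find_target = function
  | "let" -> Ok { line = 9; character = 2 }
  | "fun" -> Error "No matching target"
  | other -> Error ("Unknown target kind: " ^ other)

(* Keep only the targets that could be resolved; failures are dropped
   so the client never sees a code action that leads nowhere. *)
let code_actions_for kinds =
  List.filter_map
    (fun k ->
      match find_target k with
      | Ok target ->
        Some
          { title = String.capitalize_ascii k ^ " jump";
            kind = "merlin-jump-" ^ k;
            command = "ocamllsp/merlin-jump-to-target";
            target }
      | Error _ -> None)
    kinds

let () =
  code_actions_for [ "let"; "fun" ]
  |> List.iter (fun a -> Printf.printf "%s (%s)\n" a.title a.kind)
</code></pre>
<p>In this sketch only the <code>let</code> target resolves, so the resulting list contains a single "Let jump" action and the failed lookup leaves no trace in the menu.</p>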
<h2>6. Feature Demonstration</h2>
<p>Below is a demonstration showing the various code actions which perform navigation.</p>
<p><img src="/blog/images/merlin-jump~3HRsyh_aCuRziN8_3Uil4g.gif" alt="Merlin Jump code actions"></p>
<h2>7. Conclusion</h2>
<p>Set for release in early December 2024, Merlin's Jump functionality in OCaml-LSP brings precision navigation into the LSP world, enhancing OCaml code navigation capabilities across various editors. By using a combination of existing LSP requests, the feature is implemented in a way that is client-agnostic and fully compatible with LSP, ensuring a smooth and efficient user experience for developers working with OCaml.</p>
<p>For developers working on large or complex OCaml projects, this precise navigation capability will significantly streamline workflows, enabling quick and direct navigation to relevant parts of their code.</p>
<p>While implementing Merlin's Jump as a code action offers a broad client-agnostic solution, there are several drawbacks to this approach. First, it can clutter the code action list, making it noisy for users who are used to a more streamlined selection. Code actions are typically used for refactoring or fixing issues, so navigation-related entries are not what users expect to find there. Additionally, binding these navigation actions to shortcuts becomes harder, as code actions do not naturally lend themselves to precise key bindings for navigation.</p>
<p>To overcome these limitations, one potential solution is to use a custom LSP request, which would allow navigation functionality without overloading the code action list. We have an <a href="https://github.com/ocaml/ocaml-lsp/pull/1374">open PR</a> on the <code>ocaml-lsp</code> repository which introduces this custom request. Alternatively, a client-side tool like <code>tree-sitter</code> could be used to parse the code and generate the necessary jump targets directly in the client, offering a more flexible and efficient solution tailored to the user's editor setup.</p>
<h2>8. Resources</h2>
<ul>
<li><a href="https://microsoft.github.io/language-server-protocol/specifications/lsp/3.17/specification/#window_showDocument">LSP Protocol - ShowDocument Request</a></li>
<li><a href="https://microsoft.github.io/language-server-protocol/specifications/lsp/3.17/specification/#workspace_executeCommand">LSP Protocol - ExecuteCommand Request</a></li>
<li><a href="https://microsoft.github.io/language-server-protocol/specifications/lsp/3.17/specification/#textDocument_codeAction">LSP Protocol - CodeActions Request</a></li>
<li><a href="https://github.com/ocaml/merlin/issues">Issue Tracker - Merlin</a></li>
<li><a href="https://github.com/ocaml/ocaml-lsp/issues">Issue Tracker - OCaml-LSP</a></li>
</ul>
]]></description><link>https://tarides.com/blog/2024-11-20-advanced-code-navigation-in-ocaml-lsp</link><guid isPermaLink="false">https://tarides.com/blog/2024-11-20-advanced-code-navigation-in-ocaml-lsp.html</guid><dc:creator><![CDATA[ Pizie Dust ]]></dc:creator><pubDate>Wed, 20 Nov 2024 00:00:00 GMT</pubDate></item><item><title><![CDATA[The New Conference on the Block: What is FUN OCaml?]]></title><description><![CDATA[<p><a href="https://fun-ocaml.com/#event">FUN OCaml</a> describes itself as a "two day open-source hacking event dedicated to OCaml enthusiasts and developers around the globe". This past September, 70+ developers descended upon a coworking space in Berlin, Germany, to learn, socialise, and hack together. Some had travelled from as far away as Brazil and the United States to join the event in person, and some combined it with <a href="https://reactalicante.es">React Alicante</a> and <a href="https://icfp24.sigplan.org">ICFP</a> for a summer filled with functional programming.</p>
<p>As a key sponsor - alongside <a href="https://ahrefs.com">Ahrefs</a> and <a href="https://www.janestreet.com">Jane Street</a> - Tarides is striving to make OCaml more accessible. I have had the pleasure of interviewing one of the head organisers Sabine Schmaltz about her experience at the conference, the aim of the event, and the future of FUN OCaml. Let's dive in!</p>
<h2>Why FUN OCaml</h2>
<p>FUN OCaml began as a small idea that grew thanks to great teamwork; as Sabine recalled, "David Sancho from Ahrefs was thinking of organising an online conference and had already done a lot of work to plan the talks, etc. I reached out to explain that I was planning an in-person conference and asked whether we could join forces – and he immediately said yes!" It's great to see collaboration across companies benefiting the community, and several additional volunteers would join the organising committee as the conference drew nearer.</p>
<p>I was curious to know what motivated Sabine to organise the event, and she explained that:</p>
<blockquote>
<p>"I felt that OCaml was missing a bigger event that was targeted uniquely at developers. FUN OCaml gives them a space to teach each other things and build connections, bringing maintainers and contributors together. FUN OCaml is organised by developers for developers, covering things they are interested in on a peer-to-peer basis".</p>
</blockquote>
<p>She was inspired by <a href="https://zfoh.ch/zurihac2024/">ZuriHac</a>, a yearly Haskell event in Zurich that she has attended before. "It has a great spirit; people go there to hack and hang out by the lake. It's a very social event with many different rooms and discussions". Inspired by ZuriHac, Sabine adopted an alternative approach to scheduling for FUN OCaml in comparison to traditional conferences.</p>
<blockquote>
<p>"We didn't stop at 6 pm and send everyone away! We stayed open until midnight on the first day and at 10 pm on the second. I wanted to provide a space in the evenings for activities, whether OCaml-related or not. Letting people put faces to names and form relationships was important because it's crucial for collaboration".</p>
</blockquote>
<p>The evenings were filled with board games, hacking, hanging out, and karaoke – creating a space for socialisation and mingling. Sabine pointed out that when conferences close after their programs end, people generally stick to groups of people they already know to hang out with in the evening, which prevents them from making new connections. Consequently, creating a social space for everyone was a high priority for the organisers and a core motivation behind the conference.</p>
<h2>Talks, Workshops, and More!</h2>
<p>So, what was a day at FUN OCaml like? Both days ran a track for talks and a track for workshops. The talks were live streamed and are available online for you to watch. Recording them meant that participants who went to a workshop could catch up afterwards on the talks they missed. The talks covered <a href="https://mirage.io">MirageOS</a>, GADTs, <a href="https://github.com/ocaml/odoc"><code>odoc</code></a>, type engineering, <a href="https://github.com/ahrefs/ocannl">OCANNL</a>, and learning OCaml with Tiny Code Xmas, just to name a few! It was a great mix of more and less technical topics. Sabine recalled the software engineer and popular live streamer Dillon Mulroy's presentation: "Dillon gave a great talk on his journey to OCaml, like a personal account of how he came to OCaml and the OCaml community. It was great to see a TypeScript developer and public figure in the developer community talk about OCaml".</p>
<p>Dillon commented on his time at FUN OCaml:</p>
<blockquote>
<p>“FUN OCaml was my favorite conference I've ever attended. I finally got to meet so many friends and peers that I've met online over the past year or two and got so much value out of collaborating with them in person. I can't wait for next year 🐫”</p>
</blockquote>
<p>The workshop track was a valuable addition that gave participants an alternative way of engaging with OCaml: "People really enjoy the hands-on approach and the rare opportunity to hack on projects together with their maintainers". The workshops covered a range of topics such as a beginner's introduction to OCaml, building reliable actor systems, concurrency and parallelism, <a href="https://mirage.io">MirageOS</a>, creating 2D games in OCaml, and web and mobile app development. "I heard from the host of the game engine workshop, Émile, that people built many cool things, including a minesweeper game!"</p>
<h2>FUN OCaml 2025 and Running Your own OCaml Meet-up or Conference</h2>
<p>To wrap up, I was curious about what Sabine had planned for the future, including whether there were any plans for another FUN OCaml. I am pleased to report that FUN OCaml is on track for 2025!</p>
<blockquote>
<p>"This year, we had to schedule FUN OCaml for September to make it work with ICFP and React Alicante, but that meant it fell outside of the semester break for many students. Next time, we would like to schedule FUN OCaml between semesters so as many students as possible can come. They would benefit from hacking on open-source projects and getting feedback from maintainers. Software in the wild!"</p>
</blockquote>
<p>We want to see more OCaml events around the world! You can organise your own conference or meet-up to bring more OCaml to your neck of the woods. Sabine is happy to help any member of the OCaml community looking to create events, and you can contact <a href="https://x.com/sabine_s_">Sabine on X</a>, <a href="https://bsky.app/profile/sabine.sh">on Bluesky</a>, or <a href="mailto:sabine(at)tarides.com">by email</a>.</p>
<h2>Until Next Time!</h2>
<p>Connect with us online on <a href="https://bsky.app/profile/tarides.com">Bluesky</a>, <a href="https://mastodon.social/@tarides">Mastodon</a>, <a href="https://www.threads.net/@taridesltd">Threads</a>, and <a href="https://www.linkedin.com/company/tarides">LinkedIn</a>. We look forward to hearing from you and hope to see you around the OCaml universe!</p>
]]></description><link>https://tarides.com/blog/2024-11-13-the-new-conference-on-the-block-what-is-fun-ocaml</link><guid isPermaLink="false">https://tarides.com/blog/2024-11-13-the-new-conference-on-the-block-what-is-fun-ocaml.html</guid><dc:creator><![CDATA[ Isabella Leandersson ]]></dc:creator><pubDate>Wed, 13 Nov 2024 00:00:00 GMT</pubDate></item><item><title><![CDATA[Making OCaml Mainstream: Support Our Open Source Work on GitHub]]></title><description><![CDATA[<p>We are steadfast OCaml advocates, <a href="/blog/2024-06-19-keeping-up-with-the-compiler-how-we-help-maintain-the-ocaml-language/">providing core maintenance</a>, <a href="/blog/2024-05-01-we-host-our-first-ocaml-retreat-in-india/">hosting community events</a>, and <a href="/blog/2023-03-02-the-journey-to-ocaml-multicore-bringing-big-ideas-to-life/">bringing groundbreaking new features to the language</a>. We choose OCaml because it is a programming language with a unique combination of strengths, combining an expressive syntax and strong type system with memory safety and the power of multicore programming.</p>
<p>OCaml’s success is the combined result of the innovative efforts, passion, and dedication of the people who contribute to its development. The open-source community behind OCaml works collaboratively and transparently to make OCaml faster, safer, and easier. This community is made up of individual developers, research groups, and companies all working together. Tarides is one of the companies that contribute extensively to OCaml, and several of our team members are core OCaml developers.</p>
<p>You can support our open-source work by becoming a sponsor on our <a href="https://github.com/sponsors/tarides">GitHub page</a>. Your contribution will have a direct impact on the OCaml ecosystem by helping us maintain, develop, and improve its libraries, features, and tools. Let’s show you what we’re working on and what your support will help us achieve for OCaml!</p>
<h2>What Does Tarides do?</h2>
<p>At Tarides, we’re passionate about making OCaml even more powerful and accessible for developers. Our work focuses on three key areas: enhancements, maintenance, and accessibility. Your contribution will support our work in all these areas.</p>
<ol>
<li>Enhancing the Language: We bring new features to OCaml by resolving developer pain points and testing performance on different platforms. We want OCaml to be a competitive programming language, with developer experience as a high priority.</li>
<li>Maintaining Core Tools and Libraries: We ensure that OCaml developers have a reliable foundation for their projects by keeping the tools and libraries they depend on up to date.</li>
<li>Community Support and Outreach: We prioritise clear documentation and tutorials to improve accessibility and regularly organise events that allow community members to meet and exchange ideas.</li>
</ol>
<h2>What Projects Does Tarides Work on?</h2>
<p>We work on a broad range of projects targeting different parts of the OCaml ecosystem: compiler and language tools, development tools, community and infrastructure, and advanced projects.</p>
<h3>Compiler and Language Tools</h3>
<ul>
<li><a href="https://github.com/ocaml/ocaml">OCaml Compiler</a>: Long-term maintenance of the compiler alongside several collaborators, as well as feature development and enhancements.</li>
<li><a href="/blog/2023-07-07-making-ocaml-5-succeed-for-developers-and-organisations/">OCaml 5</a>: We’re driving the OCaml 5 release with Multicore support and effect handlers.</li>
<li><a href="https://github.com/ocsigen/js_of_ocaml">Js_of_ocaml</a> and <a href="https://github.com/ocaml-wasm/wasm_of_ocaml">wasm_of_ocaml</a>: Run OCaml code in your browser!</li>
</ul>
<h3>Development Tools</h3>
<ul>
<li><a href="https://ocaml.org/docs/platform">OCaml Platform</a>: Core OCaml tools, ensuring availability and compatibility with new compiler releases.</li>
<li><a href="https://marketplace.visualstudio.com/items?itemName=ocamllabs.ocaml-platform">VSCode extension</a>: Support and development of the OCaml Platform VSCode editor extension.</li>
<li><a href="https://ocaml.org/packages"><code>opam</code></a>: The OCaml package manager, its tools, and plugins.</li>
<li><a href="https://github.com/ocaml/dune">Dune</a> and <a href="https://preview.dune.build/">Dune Developer Preview</a>: The OCaml build system.</li>
<li><a href="https://github.com/ocaml/merlin">Merlin</a> and <a href="https://github.com/ocaml/ocaml-lsp">OCaml-LSP</a>: A modern IDE for OCaml.</li>
<li><a href="https://github.com/ocaml/odoc"><code>odoc</code></a>: A documentation generator for OCaml.</li>
<li><a href="https://github.com/ocaml-ppx/ocamlformat">OCamlFormat</a>: Formatting for OCaml code.</li>
</ul>
<h3>Community and Infrastructure</h3>
<ul>
<li><a href="http://ocaml.org/">OCaml.org</a>: OCaml’s home on the web, the central knowledge base where the community can connect, access resources, and get the latest OCaml news.</li>
<li><a href="https://github.com/ocaml/infrastructure">OCaml Infrastructure</a>: Maintenance of various parts of the <a href="http://ocaml.org/">OCaml.org</a> and <code>opam</code> infrastructures.</li>
<li>OCaml Package Ecosystem: Maintenance of the growth and quality of the OCaml package ecosystem, including <a href="https://github.com/ocaml/opam-repository">opam-repository</a> and related <a href="https://github.com/ocurrent/opam-repo-ci">opam-repo-ci</a>, <a href="https://github.com/ocurrent/ocaml-ci">OCaml-CI</a> as well as <a href="https://github.com/ocurrent/opam-health-check">opam-health-check</a> for <a href="https://check.ci.ocaml.org">Linux</a>, <a href="https://freebsd.check.ci.dev/">FreeBSD</a> and <a href="https://windows.check.ci.dev/">Windows</a> as public services.</li>
</ul>
<h3>Advanced Projects</h3>
<ul>
<li><a href="https://github.com/ocaml-multicore/eio">Eio</a>: A modern, effect-based I/O library for OCaml designed to provide a high-level, structured concurrency model.</li>
<li><a href="https://mirage.io/">MirageOS</a>: An operating system that constructs unikernels for secure, high-performance applications across various cloud computing and mobile platforms.</li>
<li><a href="https://irmin.org/">Irmin</a>: Distributed data stores based on distributed version-control systems.</li>
</ul>
<h2>Contribute to the Future of OCaml Today!</h2>
<p>By sponsoring us, you’ll support the maintenance of these essential OCaml tools and libraries and contribute to the growth of a diverse, dynamic community. Your contribution will have a direct impact on our projects and the OCaml ecosystem as a whole.</p>
<p>Please reach out to us on <a href="https://bsky.app/profile/tarides.com">Bluesky</a>, <a href="https://mastodon.social/@tarides">Mastodon</a>, <a href="https://www.threads.net/@taridesltd">Threads</a>, and <a href="https://www.linkedin.com/company/tarides">LinkedIn</a>. We look forward to hearing from you!</p>
]]></description><link>https://tarides.com/blog/2024-11-06-making-ocaml-mainstream-support-our-open-source-work-on-github</link><guid isPermaLink="false">https://tarides.com/blog/2024-11-06-making-ocaml-mainstream-support-our-open-source-work-on-github.html</guid><dc:creator><![CDATA[ Isabella Leandersson ]]></dc:creator><pubDate>Wed, 06 Nov 2024 00:00:00 GMT</pubDate></item><item><title><![CDATA[Making Crypto Safer: Introducing the ARGOS Project]]></title><description><![CDATA[<p>The world of cryptocurrency and internet transactions is constantly evolving, and the changing landscape of technologies that support this growing industry often means that legislation struggles to keep up, allowing cybercriminals to exploit loopholes and take advantage of the lack of oversight. Concerted effort across the sector is necessary to address this vulnerability.</p>
<p>To that end, we are thrilled to announce the ARGOS project, which stands for <em>Analyse et Représentation des Graphes des Opérations Suspectes</em> or the analysis and graphical representation of suspicious operations. Together with <a href="https://www.functori.com">Functori</a>, the LMF lab at <a href="http://www.universite-paris-saclay.fr/en">Paris Saclay</a>, and the Lip6 lab at <a href="https://www.sorbonne-universite.fr/en">Sorbonne University</a>, Tarides is creating a platform for blockchain analysis. ARGOS will facilitate the monitoring and tracing of individual transactions on the blockchain, helping both nations enforce the law and organisations comply with regulation.</p>
<h2>The Current Problem</h2>
<p>The financial crimes perpetrated using crypto and the blockchain include money laundering, extortion, fraud, and funnelling funding for terrorist activities. Cybercriminals take advantage of the same traits that make crypto attractive to legitimate users: its decentralisation, dynamism, and versatility. Further exacerbating the problem, these same traits make it difficult for nations to legislate and enforce rules as they do for traditional financial institutions such as banks.</p>
<p>This is because, rather than employing the traditional methods of identification, authentication, and document verification by a central authority, the blockchain uses an 'authorisation by default' approach to validate the exchange itself. Although safeguards exist that make it possible for police and the judicial system to determine the individuals associated with a transaction, they are not enough to effectively tackle cybercrime. What is needed is a way to identify, analyse, and track suspicious transactions and their sources at scale.</p>
<h2>The ARGOS Solution</h2>
<p>Several proposed solutions to the situation exist, but many of them can't address the breadth and complexity of the challenge and/or are not designed for the European financial system. Existing regulations empower nations to impose specific requirements on cryptocurrency providers and traders, and in Europe, that includes <a href="https://www.adan.eu/en/publication/the-french-regulatory-framework-for-markets-in-crypto-assets/">PACTE</a> for France. Regulations set rules on how the blockchain interacts with traditional financial institutions and on the obligations that providers have, including authenticating clients, freezing accounts, and identifying cryptographic addresses.</p>
<p>ARGOS will be a French solution custom-created to comply with and enforce French and European financial market regulations. It is designed to be <strong>lightweight</strong> enough not to place an undue burden on the existing ecosystem of blockchain developers, cryptocurrency providers, and traders, <strong>efficient</strong> enough to handle large codebases and complex systems, and <strong>robust</strong> enough to catch criminal activity reliably.</p>
<h3>What is ARGOS?</h3>
<p>The success of ARGOS rests on the key characteristics of the project, which are as follows:</p>
<ul>
<li><strong>Data Sovereignty:</strong> ARGOS will be designed and developed in France in compliance with the best practices and regulations of the European market. This provides additional security and privacy guarantees for European users, who do not have to share sensitive information with non-European parties.</li>
<li><strong>Open Source:</strong> Being open-source is a fundamental aspect of the ARGOS project, providing the high levels of transparency required to facilitate collaboration across organisations. The project will be published under an open-source licence, ensuring the entire platform (including the source code, tests, etc.) is available to the community for validation and scrutiny. This transparency is important for users to feel confident in the platform and to encourage innovation in its ecosystem.</li>
<li><strong>Large-Scale Data Management:</strong> We will use and adapt large-scale data management techniques to handle the big datasets characteristic of the blockchain sector. Our focus will be on efficiency and performance, and we will undertake R&amp;D work to develop new solutions that suit our use case.</li>
<li><strong>Suspicious-Pattern Detection:</strong> We plan to implement data processing mechanisms like <a href="https://www.sonarsource.com/blog/what-is-taint-analysis/#:~:text=Taint%20analysis%20identifies%20every%20source,you%20do%20anything%20with%20it.">taint analysis</a> and pattern detection to track the movement of digital assets. Machine learning should also greatly improve the quality of the results we obtain.</li>
<li><strong>In-Depth Analysis of Smart Contracts:</strong> Our system will be able to analyse smart contracts and understand the paths of nested interactions to determine whether a series of operations is suspicious.</li>
<li><strong>Blockchain Bridge Traceability:</strong> Our solution will account for transactions across blockchain bridges, ensuring that ARGOS doesn't lose track of them and remains global (not limited to one blockchain).</li>
<li><strong>Customisable User Interface:</strong> ARGOS will provide an intuitive and customisable user interface that can be adapted to suit each user's individual needs. It's important that our solution is accessible, easy to use, and integrates with existing tools. For example, a Digital Asset Service Provider (DASP) (so-called Prestataire de Service sur Actifs Numériques or PSAN in French) should be able to use it in combination with the monitoring and cryptographic tools they already employ with their own users.</li>
<li><strong>Adapted Analysis Reports:</strong> ARGOS will offer several analytics report tools tailored to the specific needs of different users.</li>
</ul>
<h2>What Does the Future Hold?</h2>
<p>The project is in its early stages, and we are collaborating closely with our partners to research and develop new approaches and technologies. Look out for more updates on the project coming in the future!</p>
<p>Connect with us online on <a href="https://bsky.app/profile/tarides.com">Bluesky</a>, <a href="https://mastodon.social/@tarides">Mastodon</a>, <a href="https://www.threads.net/@taridesltd">Threads</a>, and <a href="https://www.linkedin.com/company/tarides">LinkedIn</a> to stay updated on our latest projects.</p>
]]></description><link>https://tarides.com/blog/2024-10-30-making-crypto-safer-introducing-the-argos-project</link><guid isPermaLink="false">https://tarides.com/blog/2024-10-30-making-crypto-safer-introducing-the-argos-project.html</guid><dc:creator><![CDATA[ Isabella Leandersson ]]></dc:creator><pubDate>Wed, 30 Oct 2024 00:00:00 GMT</pubDate></item><item><title><![CDATA[Looking Back on our Experience at ICFP!]]></title><description><![CDATA[<p>It has been a while since the biggest functional programming event of the year: <a href="https://icfp24.sigplan.org/">ICFP 2024</a> in Milan, Italy. The annual conference, sponsored by <a href="http://www.acm.org/">ACM</a> <a href="https://www.sigplan.org/">SIGPLAN</a>, combines world-class talks with workshops on some of the biggest functional programming languages, including <a href="https://en.wikipedia.org/wiki/ML_(programming_language)">ML</a>, <a href="https://www.haskell.org/">Haskell</a>, <a href="https://ocaml.org/">OCaml</a>, <a href="https://www.scheme.org">Scheme</a> and <a href="https://www.erlang.org/">Erlang</a>.</p>
<p>Tarides are co-sponsors of ICFP, and several of our team members attend each year. This year, we had <a href="/blog/2024-08-30-the-biggest-functional-programming-conference-of-the-year-what-are-we-bringing-to-icfp/">six talks given by Tarides team members</a> and many of our colleagues chose to go as participants. I have asked several of them to share their experience at this year's ICFP to give you a taste of what the conference is all about.</p>
<h2>The Tarides Talks</h2>
<p>Before we get started, let's recap the talks created or co-created by Tarides engineers and where you can watch them after the fact.</p>
<ul>
<li><strong><a href="https://icfp24.sigplan.org/details/ocaml-2024-papers/14/First-Class-Windows-Building-a-Roadmap-for-OCaml-on-Windows">First-Class Windows: Building a Roadmap for OCaml on Windows</a>:</strong> A talk introducing the project seeking to make the developer experience with OCaml on Windows as good as it is on macOS and Linux. <a href="https://www.youtube.com/live/OuQqblCxJ2Y?si=wWzuSV9uJQkdbIq0&amp;t=13527">Watch the first-class Windows talk here</a>.</li>
<li><strong><a href="https://icfp24.sigplan.org/details/ocaml-2024-papers/10/Opam-2-2-and-beyond">Opam 2.2 and Beyond</a>:</strong> Gives background to the recent <code>opam</code> release from the perspective of its core maintainers.  <a href="https://www.youtube.com/live/OuQqblCxJ2Y?si=sre5ToVH0uuCPkUK&amp;t=27421">Watch the <code>opam</code> 2.2 talk here</a>.</li>
<li><strong><a href="https://icfp24.sigplan.org/details/ocaml-2024-papers/5/Picos-Interoperable-effects-based-concurrency">Picos – Interoperable Effects-Based Concurrency</a>:</strong> Covers the ongoing project to create an interface between effects-based schedulers and concurrency abstractions. <a href="https://www.youtube.com/live/OuQqblCxJ2Y?si=jUaZixua3xb6k7jz&amp;t=20132">Watch the Picos talk here</a>.</li>
<li><strong><a href="https://icfp24.sigplan.org/details/ocaml-2024-papers/3/Project-wide-occurrences-for-OCaml-a-progress-report">Project-Wide Occurrences for OCaml, a Progress Report</a>:</strong> Describes the design behind the first iteration of improved search features available with editor tooling. <a href="https://www.youtube.com/live/OuQqblCxJ2Y?si=tCBfDNU4sBctXpv9&amp;t=10943">Watch the project-wide occurrences talk here</a>.</li>
<li><strong><a href="https://icfp24.sigplan.org/details/ocaml-2024-papers/12/Saturn-a-library-of-verified-concurrent-data-structures-for-OCaml-5">Saturn: A Library of Verified Concurrent Data Structures for OCaml 5</a>:</strong> This talk covers Saturn, a new library of well-tested, benchmarked, partially verified concurrent data structures. <a href="https://www.youtube.com/live/OuQqblCxJ2Y?si=4ztpRF2kl-9dcDs2&amp;t=24397">Watch the Saturn talk here</a>.</li>
<li><strong><a href="https://icfp24.sigplan.org/details/mlworkshop-2024-papers/10/Wasm_of_ocaml">Wasm_of_ocaml</a>:</strong> Presents the work done on the Js_of_ocaml fork which translates OCaml bytecode to WebAssembly. <a href="https://www.youtube.com/live/KLWiEf3x3kc?si=IssHoUK5TdvBXzbd&amp;t=26981">Watch the Wasm_of_ocaml talk here</a>.</li>
</ul>
<h2>Experience Reports</h2>
<p>Curious about ICFP or missing the action already? I've asked several of my fellow Tarides team members who attended ICFP this year to share their thoughts and experiences. Let's dive in!</p>
<h3>Why ICFP?</h3>
<p>ICFP stands out to functional programming enthusiasts for gathering a large community of like-minded people in one place. As Jan Midtgaard puts it, "it's a wonderful mix of academics and industry people gathering due to a common interest in functional programming". KC Sivaramakrishnan has been attending ICFP since 2009, when he was a PhD student, and says that "ICFP is now the place where I meet friends and collaborators as well as the future generation of functional programming researchers".</p>
<p>Another reason behind the conference's enduring popularity is, of course, the packed schedule full of high-quality talks. The different tracks offer a variety of opportunities for participants to explore topics that interest them. Ambre Suhamy comments, "I know for sure that I will learn something new and come back from ICFP with more knowledge".</p>
<h3>Stand-Out Talks</h3>
<p>On the topic of talks, my colleagues attended several across different days and tracks. I asked them to share some of their impressions from the presentations, and while there were far too many to include them all, let's check out some of what they had to share!</p>
<ul>
<li><a href="https://icfp24.sigplan.org/details/icfp-2024-papers/10/Functional-Programming-in-Financial-Markets-Experience-Report-">ICFP Paper: Functional Programming in Financial Markets</a> and <a href="https://icfp24.sigplan.org/details/ocaml-2024-papers/11/Recursion-schemes-in-OCaml-An-experience-report">OCaml Workshop: Recursion schemes in OCaml</a>: Tim McGilchrist really enjoyed these two talks, the first focussing on Standard Chartered Bank's use of typed functional programming (Haskell) across their entire tech stack, and the second on Bloomberg's use of OCaml to model bilateral financial contracts. "An interesting experience report for recursion schemes at Bloomberg. I'd previously only seen this implemented in Haskell!"</li>
<li><a href="https://icfp24.sigplan.org/details/icfp-2024-papers/19/Oxidizing-OCaml-with-Modal-Memory-Management">ICFP Papers: Oxidising OCaml</a>:  To KC Sivaramakrishnan, this was a "technical highlight". The talk centred around the potential for optimising heap allocations in OCaml without compromising on safety guarantees. "The modal types work aims to get the best of Rust into OCaml without letting go of what OCaml is good for – and the Rust-like features are opt-in!"</li>
<li><a href="https://icfp24.sigplan.org/details/ocaml-2024-papers/10/Opam-2-2-and-beyond">OCaml Workshop: Opam 2.2</a>: Both Riku Silvola and Jan were impressed by this presentation on the new <code>opam 2.2</code> release, which notably brought native Windows support among other features. "That Raja Boujbel from OCamlPro, Kate Deplaix from Ahrefs (and the OCaml Software Foundation), and David Allsopp from Tarides gave a joint <code>opam 2.2</code> talk sent a wonderful signal that three OCaml companies have worked together to do good for the community", said Jan. Riku agreed, commenting "David, Kate, and Raja presenting the long-awaited <code>opam 2.2</code> release with its plethora of new features highlighted a longstanding and fruitful open-source collaboration across companies". They also announced <code>opam</code>'s new time-based release cycle of updates every six months. Check out the <a href="https://opam.ocaml.org/blog/opam-2-3-0-alpha1/"><code>opam 2.3.0</code> blog post</a> to learn more!</li>
<li><a href="https://icfp24.sigplan.org/details/mlworkshop-2024-papers/8/Pattern-matching-on-mutable-values-danger-">ML Track: Pattern Matching on Mutable Values</a>: This one stood out in Ambre's mind for its memorable demonstration of an obscure corner case where the pattern-matching compiler would generate incorrect code. "It was insane; here's this five-line snippet of OCaml code; you can run it <strong>today</strong>, and it segfaults!" The original bug was found in 2016, and work has been ongoing to narrow down and fix the problem. Even though it was a rare edge case, the community rallied to address it, and a bug fix will be included in OCaml 5.3.</li>
<li><a href="https://icfp24.sigplan.org/details/ocaml-2024-papers/3/Project-wide-occurrences-for-OCaml-a-progress-report">OCaml Workshop: Project-Wide Occurrences</a>: Riku highlighted the work presented by Ulysse Gérard, introducing a new feature of editor tooling that lets users query all occurrences of a selected value anywhere in their project. "This work also highlights the important part the compiler plays in the overall platform vision, as the feature required work to be done not only in <code>ocaml-lsp-server</code> and <code>merlin</code> but also in <code>dune</code> and the compiler", Riku said.</li>
</ul>
<h3>Memorable Moments</h3>
<p>My colleagues' week in Milan created several lasting memories. Ambre chaired her first session, the 3rd OCaml Workshop session, which included presentations on <a href="https://icfp24.sigplan.org/details/ocaml-2024-papers/15/Priodomainslib-Prioritized-Fine-grained-Parallelism-for-Multicore-OCaml">Priodomainslib</a>, <a href="https://icfp24.sigplan.org/details/ocaml-2024-papers/12/Saturn-a-library-of-verified-concurrent-data-structures-for-OCaml-5">Saturn</a>, <a href="https://icfp24.sigplan.org/details/ocaml-2024-papers/5/Picos-Interoperable-effects-based-concurrency">Picos</a>, and <a href="https://icfp24.sigplan.org/details/ocaml-2024-papers/9/Distributed-Actors-in-OCaml">Distributed Actors</a>. She also attended <a href="https://icfp24.sigplan.org/home/farm-2024">FARM</a>, the international workshop on Functional Art, Music, Modelling, and Design. Ambre remembers the workshop as "people using music to make programs, or programming to make music, and understanding music through programming".</p>
<p>Tim recalls his many conversations between events with a variety of people in the hallways. It might not be the first thing that springs to mind, but the conversations that strike up organically at ICFP can be enlightening. For example, "people generally didn't know that you could use GDB/LLDB on OCaml binaries and, once they knew that, they were very excited about using those tools on their OCaml programs".</p>
<p>KC is focussed on the future, highlighting the feedback he received from participants and how it will help Tarides going forward. "I got some really nice feedback from folks, building on OCaml, about the work that Tarides is doing. I also got lots of honest feedback on what's not working. At the end of the day, our user community matters, and we need to solve their pain points so that OCaml becomes an advantage and not a technical risk for their engineering teams." He also wants to keep contributing to the future of ICFP: "I'm on the ICFP steering committee and will be a DEI co-chair for the next ICFP. I want to ensure that ICFP remains a thriving, friendly, and inclusive place for all of our attendees".</p>
<p>Finally, we can't discuss a Milan conference without mentioning the food! All the participants enjoyed sampling the local Italian fare. Just look at this delicious pizza!</p>
<p><img sizes="(min-width: 1360px) 1360px, (min-width: 680px) 680px, 100vw" srcset="/blog/images/pizza-square-170w~obqFAzpvi1Cmkw_xZsTP6w.webp 170w, /blog/images/pizza-square-340w~na6j04mV77cnis2HB2qVgg.webp 340w, /blog/images/pizza-square-680w~tDFhzaFXllfippFscfYclg.webp 680w, /blog/images/pizza-square-1360w~jhmPm2AA1oR5wR0bWzoHUA.webp 1360w" src="/blog/images/pizza-square-1360w~jhmPm2AA1oR5wR0bWzoHUA.webp" alt="An Italian-style pizza with a whole piece of mozzarella cheese in the middle, surrounded by sliced meat."></p>
<h2>Join us Next Year!</h2>
<p>The good news is that ICFP happens every year, so if you didn't attend this year, you can always set your sights on next year. The conference also moves around, alternating locations to make it easier for participants around the globe to join. We look forward to seeing you at <a href="https://icfp25.sigplan.org/">ICFP 2025 in Singapore</a>, so come find us if you're going!</p>
<p>You can reach out to us on <a href="https://bsky.app/profile/tarides.com">Bluesky</a>, <a href="https://mastodon.social/@tarides">Mastodon</a>, <a href="https://www.threads.net/@taridesltd">Threads</a>, and <a href="https://www.linkedin.com/company/tarides">LinkedIn</a>. Stay in touch!</p>
]]></description><link>https://tarides.com/blog/2024-10-23-looking-back-on-our-experience-at-icfp</link><guid isPermaLink="false">https://tarides.com/blog/2024-10-23-looking-back-on-our-experience-at-icfp.html</guid><dc:creator><![CDATA[ Isabella Leandersson ]]></dc:creator><pubDate>Wed, 23 Oct 2024 00:00:00 GMT</pubDate></item><item><title><![CDATA[Dune Developer Preview: Installing The OCaml Compiler With Dune Package Management]]></title><description><![CDATA[<p>We’re excited to share a significant update to Dune's Package Management system, particularly one that will be of great interest to OCaml developers. For those who have been exploring Dune’s experimental package management capabilities over the past six months, you’ll be pleased to know that we've recently merged a feature allowing OCaml compiler packages to be installed directly through Dune.</p>
<p>Until now, Dune’s Package Management has supported the installation of various packages from the opam repository. However, it lacked the ability to install a functioning OCaml compiler — a crucial component for most Dune projects. This limitation meant that early adopters of Dune’s Package Management had to rely on external tools to manage their OCaml compiler installations, which wasn’t ideal. The good news is that this obstacle has been removed, making Dune Package Management far more robust and ready for testing by early adopters.</p>
<p>The challenge we faced in integrating the OCaml compiler installation stemmed from a conflict between the compiler’s build system and the way Dune handles package builds. Dune builds packages in what’s known as a “sandbox” — a temporary directory where a package is initially constructed and installed. Once the build is complete, the package's installed components are moved to their final destination within Dune’s build directory. However, the OCaml compiler assumes that its installation location will be its permanent home. Moving the installed files after installation caused the compiler to malfunction, making this an intractable problem for Dune’s package management.</p>
<p>While work is underway to make the OCaml compiler more flexible in terms of its installation location, we didn’t want to delay Dune’s package management features until this work was completed. Instead, we’ve introduced a workaround that allows compiler packages to be installed in a way that maintains their functionality.</p>
<p>The solution involves installing OCaml compiler packages to a global, per-user directory, rather than within the project’s sandbox. By default, the compiler is installed in a directory such as <code>~/.cache/dune/toolchains/ocaml-base-compiler.&lt;version&gt;-&lt;hash&gt;</code>. This ensures that the compiler remains in its expected location and operates correctly without the need for relocation.</p>
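<p>As a small illustration, and assuming the default location quoted above, you could list the shared toolchains directory after building a project with the Developer Preview to see which compilers have been installed. The directory will not exist until at least one compiler has been installed this way.</p>
<pre><code class="language-shell"># Assumes the default per-user location mentioned above; adjust the path if your setup differs.
# Each entry is expected to follow the ocaml-base-compiler.&lt;version&gt;-&lt;hash&gt; naming scheme.
ls ~/.cache/dune/toolchains/
</code></pre>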
<p>Moreover, this approach offers an added benefit: compilers installed through Dune can be shared across multiple projects. If two projects use the same version of the compiler, the installation step can be skipped in the second project, significantly speeding up the build process. Given that installing an OCaml compiler can take several minutes, this optimisation will save developers considerable time.</p>
<p>In summary, this new feature represents a substantial improvement to Dune’s Package Management system, making it easier and faster for developers to set up and manage their OCaml projects. By enabling direct installation of OCaml compilers through Dune, we’re removing a major barrier to adoption and enhancing the overall development experience.</p>
<p>Since it works transparently whenever you build a project with the Developer Preview, we'd love for you to test it out! Try out the new <a href="/blog/2024-10-03-introducing-the-dune-developer-preview-a-new-era-for-ocaml-development/">Dune Developer Preview</a> today, and let us know how it goes on <a href="https://discuss.ocaml.org/">Discuss</a>. We’re eager to see how the community leverages this new capability and look forward to your feedback as we continue to refine and expand Dune’s Package Management features.</p>
]]></description><link>https://tarides.com/blog/2024-10-16-dune-developer-preview-installing-the-ocaml-compiler-with-dune-package-management</link><guid isPermaLink="false">https://tarides.com/blog/2024-10-16-dune-developer-preview-installing-the-ocaml-compiler-with-dune-package-management.html</guid><dc:creator><![CDATA[ Christine Rose ]]></dc:creator><pubDate>Wed, 16 Oct 2024 00:00:00 GMT</pubDate></item><item><title><![CDATA[Dune Package Management: Revolutionising OCaml Development]]></title><description><![CDATA[<p>At Tarides, we’ve been working on an initiative to improve the OCaml development experience: Dune Package Management. As outlined in the <a href="https://github.com/tarides/ocaml-platform-roadmap">Platform Roadmap</a>, which was created through community collaboration, the aim is to unify all OCaml development workflows under a single, streamlined tool. At Tarides, we aim to make Dune the recommended frontend for the OCaml Platform, offering new users a seamless developer experience.</p>
<p>The motivation behind Dune Package Management is clear. For years, the OCaml community has called for a single tool to address all development concerns: building projects, managing dependencies, testing on different compiler versions, etc. By integrating package management directly into Dune, we want to resolve these long-standing pain points that can make OCaml cumbersome to work with, both for newcomers and experienced developers.</p>
<h2>The Vision for Dune</h2>
<p>Our long-term goal is to make Dune the central tool for OCaml development. That means more than just feature additions! It's about radically simplifying how developers work with the OCaml platform. By making installation painless and simplifying frustrating workflows, such as the handling of dependencies and testing against multiple compiler versions, Dune will address all your OCaml needs.</p>
<p>Dune integrates package management by using opam as a library for essential parts of the process. Two commands lie at the heart of the integration: <code>dune pkg lock</code> and <code>dune build</code>. <code>dune pkg lock</code> generates a lock file, and <code>dune build</code> relies on that lock file to manage project dependencies. You can now handle everything from project initialisation to dependency management using these simple commands.</p>
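<p>As a minimal sketch of that workflow, assuming a Dune build with package management enabled (such as the Developer Preview) is on your <code>PATH</code>, and that <code>my_project</code> is a hypothetical directory whose <code>dune-project</code> file declares its dependencies, the two commands would be run in sequence:</p>
<pre><code class="language-shell"># my_project is a hypothetical project directory, used here only for illustration
cd my_project

# Solve the project's dependencies and generate the lock file
dune pkg lock

# Build the project; Dune relies on the lock file to manage the dependencies
dune build
</code></pre>
<p>Keeping the lock step separate from the build means the solved set of dependencies is recorded before anything is built.</p>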
<h3>What We’ve Achieved So Far</h3>
<p>We've accomplished a lot in these past few months! Dune Package Management can already handle complex projects such as <strong>OCaml.org</strong> and <strong>Bonsai</strong>: both were successfully built using the new package management features. These early successes suggest that we are on the right track and demonstrate the solution's viability in real-world scenarios.</p>
<p>But this is not the end of the work. In the future, we plan to further improve the UX so that Dune is not only correct but also easy and productive for developers to use. Some challenges remain, and we hope to make Dune Package Management the standard tooling for all OCaml workflows.</p>
<h3>The Road Ahead</h3>
<p>Now that we have hit the MVP milestone, the next phase will focus on testing, validation, and enhancing the developer experience. Our main areas of focus will include:</p>
<ul>
<li><strong>Smoothing the UX:</strong> We want to make the Dune Package Management interface as intuitive as possible, so developers can get their projects underway quickly.</li>
<li><strong>Optimising Performance:</strong> This means shorter compilation times, quicker install times for dependencies, and ensuring all operations work seamlessly.</li>
<li><strong>Simplifying Tooling:</strong> We're starting to integrate things like testing, formatting, documentation generation, and more! This way, developers will no longer have to run several different tools to manage their projects.</li>
<li><strong>Providing Clear Documentation:</strong> Thorough, user-friendly documentation will be essential in helping developers adopt these new features.</li>
</ul>
<h3>A Unified Future for OCaml</h3>
<p>Dune Package Management ushers in a new era of OCaml development. Dune will be the only tool engineers need, making OCaml development seamless and effective for both complete beginners and experienced developers on the platform.</p>
<p>We look forward to the future and what Dune Package Management will facilitate within the OCaml community. Stay tuned, and prepare to take part in a more integrated and seamless OCaml development experience with Dune.</p>
]]></description><link>https://tarides.com/blog/2024-10-09-dune-package-management-revolutionising-ocaml-development</link><guid isPermaLink="false">https://tarides.com/blog/2024-10-09-dune-package-management-revolutionising-ocaml-development.html</guid><dc:creator><![CDATA[ Christine Rose ]]></dc:creator><pubDate>Wed, 09 Oct 2024 00:00:00 GMT</pubDate></item><item><title><![CDATA[Introducing the Dune Developer Preview: A New Era for OCaml Development]]></title><description><![CDATA[<p>The Dune team is excited to announce the arrival of <a href="https://discuss.ocaml.org/t/ann-dune-developer-preview-updates/15160/7">Dune Developer Preview</a>, an experimental nightly release of Dune created to test improvements, including the new package management feature. This is a major milestone for OCaml development! We've been working hard improving Dune, and we're excited to introduce this new way to ease OCaml workflows. If you are an OCaml developer, this is the time to explore the future of package management and development yourself.</p>
<h3>Why the Wait?</h3>
<p>Our progress has been slower than anticipated because the Dune project is a complex endeavour. We’re committed to balancing support for existing OCaml users while introducing new workflows. opam remains the stable workflow and isn't going away; the key difference is that the Dune Developer Preview showcases experimental workflows. We believe that, once fully ready, these will be transformative for the community. Meanwhile, we continue to maintain opam and its infrastructure, ensuring the OCaml ecosystem remains robust, with recent updates like opam 2.2 bringing proper Windows support.</p>
<h3>Why the Change?</h3>
<p>We have listened over time to your feedback and learned that while functional, the existing tooling around OCaml can be quite cumbersome. Our mission is to make OCaml development easier, faster and more enjoyable, and Dune is now about to become that all-in-one tool which does just that. This separation will allow Dune to cut down on the complexity and make its development process much lighter.</p>
<p>The Developer Preview gives a sneak peek into these improvements and provides an opportunity for developers to get their hands on these tools before the official release.</p>
<h3>What's New in Dune?</h3>
<p>The Dune Developer Preview introduces several major changes in how OCaml developers will work. Probably the most notable change is that it includes package management right within Dune itself. No more juggling with multiple tools; Dune does it all in one go.</p>
<p>Accordingly, the new Dune comes as a binary, which means you no longer need opam to install it. You can install Dune within minutes and be off to a flying start, leaving more time for actual development.</p>
<h3>Drawing Inspiration from the Best</h3>
<p>Throughout this process, we've drawn inspiration from other successful ecosystems. We learned from Rust, Go, and Erlang the power of letting developers test and provide feedback on a pre-general release, and we are doing the same with Dune: giving the OCaml community a chance to have their say in shaping up the final form of this tool. Your input will be invaluable in refining Dune and ensuring it meets your needs.</p>
<h3>Join the Beta</h3>
<p>We are looking for developers to participate in the Dune Developer Preview. Experienced developers can help us put the latest features of Dune through their paces so that we can fine-tune the workflow to be smooth and intuitive across a range of projects. Those new to OCaml can test the workflow from their perspective of working in other ecosystems. Whether you have been using OCaml for years or are new to the OCaml ecosystem, your input is crucial in helping us make Dune the recommended tool for OCaml development.</p>
<p>To become a beta tester, one only needs to download the <a href="https://preview.dune.build/">Dune Developer Preview</a>. You will get the very latest version and immediately start playing with the new workflows and package management features. Your feedback will help to further shape the future of OCaml tooling, and we want to hear your thoughts on everything from usability to performance enhancements.</p>
<h3>Measuring Success</h3>
<p>Our goal is clear: use OCaml with as little ceremony as possible, so we set a few key benchmarks to make that happen. First: via Dune, the user should be able to fire up a new project in less than 10 seconds with no extra tooling required. This level of speed makes all the difference during everyday hacking.</p>
<p>We also monitor general developer satisfaction with the new Dune workflows. Using the Net Promoter Score, we will measure how likely a user is to recommend Dune to others. Our objective is to reach +80, which would show strong approval and satisfaction within the community.</p>
<h3>What's Next?</h3>
<p>This is only the beginning of the Dune Developer Preview. We will regularly update and enhance it based on feedback from beta testers. This includes polishing the new Dune Toolchain feature in order to automatically manage OCaml versions and development tools, and exploring new distribution channels in order for Dune to be even easier to download and use.</p>
<p>This is huge for OCaml, and we are excited to have you along for the ride. Why wait? Download the <a href="https://preview.dune.build/">Dune Developer Preview</a> today, and give some of the new workflows a try before letting us know what you think. With your help, we can make Dune the go-to tool for all OCaml development.</p>
<p>We can't wait to hear from you — and happy coding!</p>
]]></description><link>https://tarides.com/blog/2024-10-03-introducing-the-dune-developer-preview-a-new-era-for-ocaml-development</link><guid isPermaLink="false">https://tarides.com/blog/2024-10-03-introducing-the-dune-developer-preview-a-new-era-for-ocaml-development.html</guid><dc:creator><![CDATA[ Christine Rose ]]></dc:creator><pubDate>Thu, 03 Oct 2024 00:00:00 GMT</pubDate></item><item><title><![CDATA[Unlock your Team’s Potential with Expert Training in OCaml, Cybersecurity Fundamentals, Functional Programming, and More]]></title><description><![CDATA[<p>Training your teams has proven benefits, <a href="https://seismic.com/uk/enablement-explainers/the-importance-of-training/">enhancing the efficiency and quality of work</a>, and equipping members with the skills they need to tackle new challenges. Specialist training empowers them to use new techniques, leverage advanced technologies, and solve more complex problems.</p>
<p>At Tarides, we are launching a new initiative to share our expertise and industry experience in a series of customisable, flexible training courses designed to unlock new possibilities for your teams.</p>
<h2>How Tarides Can Help You</h2>
<p>Technical training equips your teams with essential skills to handle the latest software advancements and workflow improvements. While many generic courses exist, specialised knowledge is often necessary, and our targeted training suite offers:</p>
<ul>
<li><strong>Expert Instructors</strong>: Learn from core developers with years of industry experience</li>
<li><strong>Hands-On Learning</strong>: Jump straight into coding on day one, with exercises and real-world scenarios</li>
<li><strong>Peer Collaboration</strong>: Share knowledge and insights with a community of like-minded developers</li>
<li><strong>Career Advancement</strong>: Acquire new skills to face new challenges</li>
<li><strong>Flexibility &amp; Customisability</strong>: Choose your custom topic elements to ensure the course works best for your needs</li>
</ul>
<p>Our bespoke training can be customised to the unique circumstances of each team, addressing weaknesses and enhancing strengths. Each course offers:</p>
<ul>
<li>Flexible sessions: On-site, online, or in a hybrid configuration to fit your schedule</li>
<li>10 hours of post-training support in every package to address follow-up questions</li>
</ul>
<p><a href="/services/training/">Sign up online</a> for our courses, and we will contact you for an initial consultation to provide more information, discuss options, and help you decide if training is right for you!</p>
<h2>Our Courses</h2>
<p>We have created a group of courses tailored to different needs, ranging from onboarding a team with OCaml to mastering specialised skills, from basic compliance with cybersecurity guidelines to auditing custom workflows. While they address various applications, all of our training shares a common goal: to minimise friction for busy organisations. Our most popular courses are detailed below:</p>
<h3><a href="/services/training/">Getting Started With OCaml: An Introduction</a></h3>
<p>Are you using OCaml for the first time or onboarding new team members? This course is a perfect fit. It covers the fundamental language concepts, tools, and techniques and culminates with a practical exercise in which participants build their own application. The course includes modules on the Dune build system, the OCaml Platform, imperative and modular programming, debugging, and more!</p>
<p>OCaml stands out among its peers with its expressive syntax, robust type system, and exceptional performance. After this course, your team will walk away with an understanding of OCaml's main features and how to start using them to their advantage.</p>
<h3><a href="/services/training/">Mastering OCaml: Advanced Techniques</a></h3>
<p>This is the best choice for teams already familiar with OCaml and who are using it in their projects. It enables your teams to adopt advanced techniques, such as Web application development with <a href="https://github.com/ocsigen/js_of_ocaml">JSOO</a> and <a href="https://github.com/ocaml-wasm/wasm_of_ocaml">WSOO</a>, multicore programming with <a href="https://github.com/ocaml-multicore/eio">Eio</a>, <a href="https://mirage.io/">MirageOS</a>, testing, and GADTs, making expert-level techniques accessible to our clients.</p>
<p>This course allows you to customise modules to cover the exact tools and techniques your teams need the most. This allows them to improve the quality of the code they produce precisely and effectively, boosting OCaml developers’ confidence and skills.</p>
<h3><a href="/services/training/">Scalable, Flexible, and Powerful: Language-neutral Functional Programming</a></h3>
<p>Help your teams understand the core functional programming principles that help developers produce safer, less buggy, and more readable code, regardless of their programming language. This three-day course is a rich introduction to functional programming. The first day focusses on the foundations, including recursive and higher-order functions, type annotations and type inference, and function composition and pipes. The second day delves deeper into types, covering topics like immutability, monads, and currying. The final day pushes further into I/O monads, continuations, type algebra, and more.</p>
<p>This course is the perfect choice for teams responding to the <a href="/blog/2024-03-07-a-time-for-change-our-response-to-the-white-house-cybersecurity-press-release/">growing push for safer code</a>, wanting to adopt functional programming in their workflows. It is also an excellent choice for onboarding new teams, helping them gain confidence and competence with a new way of programming.</p>
<h3>Coming soon: More Courses and More Content!</h3>
<p>Some of our training programmes are still under development. <a href="/services/training/">Register your interest</a> in the upcoming courses to find out when they become available.</p>
<ul>
<li>
<p><strong><a href="/services/training/">Open-source Development: How to make OSS work for you</a>:</strong> Introduces the methodologies and best practices of open-source development, including how multiple contributors improve the quality of the end product, drive innovation in the ecosystem, and lower the overall cost of development.</p>
</li>
<li>
<p><strong><a href="/services/training/">Cybersecurity &amp; Secure-by-Design</a>:</strong> Teaches the fundamentals of secure-by-design principles, how programming language design affects security, and common attacks and their mitigation. Furthermore, for EU-based clients, we will include how to comply with the <a href="https://digital-strategy.ec.europa.eu/en/library/cyber-resilience-act">Cyber Resilience Act</a>. We will also offer an optional additional audit of your systems' vulnerabilities and suggestions for improving their resilience.</p>
</li>
</ul>
<h2>Get in Touch</h2>
<p>We're excited to provide this service to our clients and know how important it is to share the knowledge we have accumulated. As the world increasingly recognises the need for change in how software is developed, OCaml, functional programming, and open source are growing. We are here to help you be ready for this transition!</p>
<p>Stay in touch with us on <a href="https://bsky.app/profile/tarides.com">Bluesky</a>, <a href="https://mastodon.social/@tarides">Mastodon</a>, <a href="https://www.threads.net/@taridesltd">Threads</a>, and <a href="https://www.linkedin.com/company/tarides">LinkedIn</a>. We look forward to hearing from you!</p>
]]></description><link>https://tarides.com/blog/2024-10-01-unlock-your-team-s-potential-with-expert-training-in-ocaml-cybersecurity-fundamentals-functional-programming-and-more</link><guid isPermaLink="false">https://tarides.com/blog/2024-10-01-unlock-your-team-s-potential-with-expert-training-in-ocaml-cybersecurity-fundamentals-functional-programming-and-more.html</guid><dc:creator><![CDATA[ Thomas Gazagnaire, Miklos Tomka ]]></dc:creator><pubDate>Tue, 01 Oct 2024 00:00:00 GMT</pubDate></item><item><title><![CDATA[Introducing Dune: The Essential Build System for OCaml Developers]]></title><description><![CDATA[<p>One of the first tools you'll encounter when adopting OCaml is Dune, OCaml's official build system. Understanding what Dune is and how it serves you is key to crafting everything from a small project to maintaining large-scale codebases. So let's dive in! Learn how Dune makes development easier and serves as a gateway to the greater OCaml Platform.</p>
<p>For a quick introduction to OCaml, check out the <a href="https://ocaml.org/docs/installing-ocaml">tutorials on OCaml.org</a>.</p>
<h3>What is Dune?</h3>
<p>Dune is much more than a simple build tool. It automatically compiles your OCaml code, manages its dependencies, and generates the final executable or library. It's also a well-maintained, highly-optimised platform that streamlines your development process. Spend more time writing code and less on struggling with complex build rules.</p>
<h3>Advantages of Dune</h3>
<h4>1. <strong>Consistency Across Projects</strong></h4>
<p>With Dune, you can be sure that the build processes are consistent, no matter how many different projects you are managing. This is very helpful when collaborating with other developers or when maintaining multiple projects. Once you work with Dune on one project, it's easier to work on the next, even if it has a totally different codebase, because Dune standardises how things are done.</p>
<h4>2. <strong>Integration With the OCaml Platform</strong></h4>
<p>Dune lives on the cutting edge of the OCaml Platform (a set of tools and libraries) and forms a solid foundation for your development environment. Dune automatically plays nicely with other tools such as opam (an OCaml package manager) and helps you manage dependencies, run tests, and set up project documentation.</p>
<h4>3. <strong>Performance Optimisation</strong></h4>
<p>Dune is fast and efficient. It tracks dependencies and rebuilds only what is necessary, so your development process stays responsive. This benefits developers regardless of project size, but it makes the biggest difference on large projects, where it significantly reduces build times.</p>
<p>Dune also supports other languages and tools within the same project. This flexibility makes it easy to incorporate C stubs, inline assembly, or even JavaScript (via js_of_ocaml and <a href="https://melange.re/v2.1.0/">Melange</a>) into your OCaml projects without needing to change your build system.</p>
<h3>A Well-Maintained and Evolving Tool</h3>
<p>The Dune team listens to community needs and regularly releases updates for performance, features, and bug fixes. This keeps Dune current with OCaml's development, giving engineers a coherent and state-of-the-art tool that evolves with the language and ecosystem.</p>
<p>Soon, Dune will also provide package management functionality, so you can choose whether to use Dune or opam. It's currently in beta testing, so watch this blog for an announcement of the upcoming release!</p>
<h3>Getting Started With Dune</h3>
<p>It is very easy to install Dune using opam:</p>
<pre><code><span class="sh-source">opam install dune
</span></code></pre>
<p>Now make a new OCaml project by running:</p>
<pre><code><span class="sh-source">dune init project my_project
</span></code></pre>
<p>This creates a minimal project structure with sensible defaults, so you can get to coding right away. When ready, move into the new directory with <code>cd my_project</code>, compile your project using <code>dune build</code>, and run it using <code>dune exec my_project</code>.</p>
<h3>Conclusion</h3>
<p>Dune simplifies the development process, ensures uniformity, and is deeply integrated with the entire OCaml Platform. If you set up Dune from the very beginning, it will let you focus on creating great software, the most important thing.</p>
<p>As you learn more about OCaml, you'll appreciate the power, flexibility, and great community behind Dune and OCaml as a whole. For both small personal projects and collaborations on big applications, Dune is a tool you can rely on from start to finish.</p>
]]></description><link>https://tarides.com/blog/2024-09-26-introducing-dune-the-essential-build-system-for-ocaml-developers</link><guid isPermaLink="false">https://tarides.com/blog/2024-09-26-introducing-dune-the-essential-build-system-for-ocaml-developers.html</guid><dc:creator><![CDATA[ Christine Rose ]]></dc:creator><pubDate>Thu, 26 Sep 2024 00:00:00 GMT</pubDate></item><item><title><![CDATA[Summer of Internships: Projects From the OCaml Compiler Team]]></title><description><![CDATA[<p>We have had the pleasure of hosting several interns in the compiler team this past year. Their projects have tackled varied and challenging tasks touching on different aspects of compiler development, ranging from modularising the observability tool Olly to creating eBPF-based kernel-side performance monitoring, improving polyglot package management, and lifting limitations of the Ortac tool that helps developers test Gospel specifications for OCaml.</p>
<p>Let's take a look at what the interns have been up to over the past six months. Remember, if you want to intern with us, keep an eye on our <a href="/careers/">careers page</a> for upcoming opportunities!</p>
<h2>Eutro and Olly</h2>
<p><a href="https://github.com/eutro">@eutro's</a> internship focussed on making the observability tool <a href="https://github.com/tarides/runtime_events_tools">Olly</a> more useful for developers by addressing two shortcomings: an incompatibility between Olly and the Runtime Events API that would cause crashes in certain cases, and the number of Olly's dependencies, which made it difficult or impossible to use its core functions with unreleased ("trunk") OCaml (or at all on Windows). To resolve these issues, @eutro modularised Olly and implemented a table-based translation of runtime event names and tags.</p>
<p>@eutro modularised Olly by refactoring the binary into smaller libraries, with the awkward dependencies isolated into an optional library. Splitting up the libraries gives users greater control over their dependencies. You can <a href="https://github.com/tarides/runtime_events_tools/pull/43">learn more about modularisation in the PR</a>.</p>
<p>The second part of the internship focussed on the table-based translation of runtime event names and tags, so that runtime events can be consumed across different versions of OCaml. The goal was to avoid two bugs that arose when Olly profiled a program compiled with a different version of OCaml: <code>olly trace</code> generated nonsensical names for the slices, and <code>olly gc-stats</code> silently generated garbage output. To delve into the details, check out <a href="https://github.com/tarides/runtime_events_tools/pull/44">@eutro's PR about runtime event names</a>.</p>
<p>Lastly, @eutro managed to squeeze in some bug-fixes as well! One PR addresses the <a href="https://github.com/ocaml/ocaml/pull/13089">incorrect use of <code>snprintf_os</code> in formatting the runtime events ring file path</a>, and another <a href="https://github.com/ocaml/ocaml/pull/13091">fixes some memory bugs in <code>runtime_events_consumer.c</code></a>. Both of these fixes will be included in OCaml 5.3.</p>
<h2>Lee Koon Wen and Eio</h2>
<p>The <a href="https://github.com/ocaml-multicore/eio">Eio library</a> enables users to write high-performance I/O programs leveraging the new effects system that came with OCaml 5. The I/O library pairs well with the io_uring interface on Linux as a rule; however, the asynchronous nature of <code>io_uring</code> can make it hard for the developer to get a grasp on the performance of their Eio programs when they are bottlenecked by something within the kernel rather than the program itself.</p>
<p>The Linux kernel has a mature and extensive system for gathering data on its performance. In his internship project, <a href="https://github.com/koonwen">Lee Koon Wen's</a> task was to produce a library that used the <a href="https://ebpf.io/what-is-ebpf/">Linux eBPF</a> probes to gather kernel-side data on an Eio program's <code>io_uring</code> use and deliver them to the program's runtime events. This would let users analyse kernel-side performance data alongside their program's performance using an observability tool like <a href="https://github.com/tarides/runtime_events_tools">Olly</a>.</p>
<p>Two projects have sprung from this work, <a href="https://github.com/koonwen/uring-trace?tab=readme-ov-file"><code>uring-trace</code></a> and <a href="https://github.com/koonwen/ocaml-libbpf"><code>ocaml-libbpf</code></a>. The former is a tracer that, using bindings provided by the latter, can extract events from a Linux kernel. These traces can then be generated in <a href="https://fuchsia.dev/fuchsia-src/reference/tracing/trace-format">Fuchsia</a> format and displayed on <a href="https://ui.perfetto.dev/">Perfetto</a>. This project will benefit developers on the Linux platform, helping them understand and optimise their programs using accurate data.</p>
<h2>Ryan and Polyglot Package Management</h2>
<p>Using several programming languages in one project lets you take advantage of the particular strengths of each language and of its library ecosystem. For example, <a href="https://www.python.org/">Python</a> is well-known for its data science libraries, <a href="https://www.rust-lang.org/">Rust</a> for its ownership memory model, and <a href="https://ocaml.org/">OCaml</a> for its type safety. Over the past couple of decades, many large programming language ecosystems (and even some smaller ones) have acquired language-specific package managers, e.g. <code>pip</code> (Python's; 2008), <code>cargo</code> (Rust's; 2015 - although it started with one) and, of course, <code>opam</code> (OCaml's; 2013). Managing these so-called 'polyglot programming' projects, with several languages working together, relies on coordinating these package managers to provide language libraries and toolchains like compilers and build systems. The need to use multiple package managers naturally increases the complexity of these projects. Additionally, dependencies are hard, or impossible, to express across different package managers.</p>
<p><a href="https://github.com/RyanGibb">Ryan Gibb's</a> research internship at Tarides focussed on using Nix as an initial bridge towards these objectives, extending <a href="https://opam.ocaml.org/"><code>opam</code></a> to support the provision of "external dependencies" using <a href="https://nixos.org/">Nix</a>, the language-agnostic functional package manager, instead of the OS's own package manager. There has been much work in this area already (e.g. <a href="https://github.com/tweag/opam-nix">opam-nix</a> and <a href="https://github.com/timbertson/opam2nix">opam2nix</a>), but these have focussed more on being able to take <code>opam</code> packages themselves and install them <em>using</em> Nix.</p>
<p>Ryan's work moves in the other direction, allowing Nix packages to be used within the environment set up by opam, by adding a Nix <code>depext</code> mechanism to <code>opam</code>. In parallel with this work, Ryan also extended a previous investigation with <code>nixpkgs</code> to allow users to specify versions of Nix dependencies. In general, Nix only supports the latest versions, but by analysing the <code>nixpkgs</code> repository history we can map the versions of the packages we’re interested in to the ranges in the <code>nixpkgs</code> commit history which provide them. Using <code>opam</code>’s solver we can then find the maximum commit of <code>nixpkgs</code> satisfying the version constraints on the packages (as long as a state exists meeting the conditions).</p>
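<p>To make the idea concrete, here is a minimal sketch in plain OCaml. It is not Ryan's implementation and it does not use <code>opam</code>’s solver: it assumes commits are numbered by their position in the <code>nixpkgs</code> history and simply scans from the newest commit downwards. The record types, the function <code>max_commit</code>, and the sample packages and commit numbers are all hypothetical.</p>
<pre><code>(* Hypothetical model: each package version is provided by nixpkgs over an
   inclusive range of commit indices (0 is the oldest commit). *)
type availability = {
  package : string;  (* package name *)
  version : string;  (* version provided over the range *)
  first : int;       (* first commit index providing this version *)
  last : int;        (* last commit index providing this version *)
}

(* A requirement accepts or rejects the versions of one named package. *)
type requirement = { on : string; accepts : string -> bool }

(* True when entry [a] gives [req] an acceptable version at [commit]. *)
let provides commit req a =
  List.for_all Fun.id
    [ String.equal a.package req.on;
      req.accepts a.version;
      commit >= a.first;
      a.last >= commit ]

(* True when every requirement is met by some entry at [commit]. *)
let satisfies history reqs commit =
  List.for_all (fun req -> List.exists (provides commit req) history) reqs

(* Scan from the newest commit index down to 0; return the first match. *)
let max_commit history reqs ~newest =
  let rec search commit =
    if 0 > commit then None
    else if satisfies history reqs commit then Some commit
    else search (commit - 1)
  in
  search newest

let () =
  (* Made-up availability data, for illustration only. *)
  let history = [
    { package = "python"; version = "3.10"; first = 0; last = 4 };
    { package = "python"; version = "3.11"; first = 5; last = 9 };
    { package = "openssl"; version = "1.1"; first = 0; last = 6 };
    { package = "openssl"; version = "3.0"; first = 7; last = 9 };
  ] in
  let reqs = [
    { on = "python"; accepts = String.equal "3.11" };
    { on = "openssl"; accepts = String.equal "1.1" };
  ] in
  match max_commit history reqs ~newest:9 with
  | Some c -> Printf.printf "newest satisfying commit index: %d\n" c
  | None -> print_endline "no commit satisfies all requirements"
</code></pre>
<p>With this sample data the two ranges overlap on commits 5 and 6, so the program reports commit index 6. When no commit satisfies every requirement, <code>max_commit</code> returns <code>None</code>, which corresponds to the case where no suitable state exists.</p>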
<p>Future work will address current limitations and bring improvements to the workflow for users. For example, <code>opam</code>’s Nix <code>depext</code> mechanism picks up the environment variables from the builder's shell, meaning that the environment to extract must be specified manually. It may be possible to access the <code>env</code> attribute of derivations directly, as the Nix binary does. Ryan intends to keep working on these limitations as well as future goals, including shepherding <code>opam</code>'s Nix <code>depext</code> support through review and possibly productionising <code>opam-nix-repository</code> for use in <code>opam-repository</code>.</p>
<p>For more information, check out the project's PRs, <a href="https://github.com/ocaml/opam/pull/5982">#5982</a> and <a href="https://github.com/RyanGibb/opam/pull/2">#2</a>.</p>
<h2>Nikolaus, Gospel and Ortac</h2>
<p>The culture around OCaml values safety and reliability, so it is no surprise that a suite of tools has been developed to ensure these qualities. One such tool, <a href="https://ocaml-gospel.github.io/gospel/">Gospel</a>, is a contract-based behavioural specification language that can provide a logical model for OCaml types and describe the intended behaviour of functions using pre- and post-conditions. <a href="https://github.com/ocaml-gospel/ortac/tree/main">Ortac</a>, in turn, is a tool that can generate a <a href="https://ocaml-multicore.github.io/multicoretests/">QCheck-STM</a> test suite based on the Gospel specification of a library. You can find out more about them in the dedicated post <a href="/blog/2024-09-03-getting-specific-announcing-the-gospel-and-ortac-projects/">Getting Specific: Announcing the Gospel and Ortac Projects</a>.</p>
<p>The goal of <a href="https://github.com/nikolaushuber">Nikolaus Huber's</a> internship was to lift some of the limitations of Ortac and QCheck-STM to expand their use cases for developers. It's essential to increase the types of tests that users can generate so that more OCaml code can be checked using Gospel. His project has produced three PRs: <a href="https://github.com/ocaml-gospel/ortac/pull/235">#235</a>, which allows tests to run without the system under test in the signature; <a href="https://github.com/ocaml-gospel/ortac/pull/237">#237</a>, which adds support for tests with tuples in their signature; and <a href="https://github.com/ocaml-gospel/ortac/pull/247">#247</a>, which aims to introduce support for testing functions with multiple systems under test as arguments.</p>
<p>In addition to the PRs addressing Ortac limitations, Nikolaus has also fixed several issues regarding Gospel and Ortac. He <a href="https://github.com/ocaml-gospel/ortac/pull/234">added a new error</a> for when there are no commands produced during the translation from Gospel to OCaml, <a href="https://github.com/ocaml-gospel/ortac/pull/240">fixed a bug within QCheck-STM</a> that occurred when testing functions that return integers, and <a href="https://github.com/ocaml-gospel/ortac/pull/245">addressed another bug where a type check would incorrectly indicate that code was correct</a>. All in all, Nikolaus' project benefits developers who want to test their OCaml programs to ensure they perform predictably and correctly.</p>
<h2>Stay in Touch</h2>
<p>We want to hear from you! Follow us on <a href="https://bsky.app/profile/tarides.com">Bluesky</a>, <a href="https://mastodon.social/@tarides">Mastodon</a>, <a href="https://www.threads.net/@taridesltd">Threads</a>, and <a href="https://www.linkedin.com/company/tarides">LinkedIn</a> for the latest news from Tarides and to share your thoughts with us. Are you interested in completing an internship project with us? Keep an eye on <a href="/careers/">our careers page</a>, where we announce upcoming internship opportunities. Happy hacking!</p>
]]></description><link>https://tarides.com/blog/2024-09-24-summer-of-internships-projects-from-the-ocaml-compiler-team</link><guid isPermaLink="false">https://tarides.com/blog/2024-09-24-summer-of-internships-projects-from-the-ocaml-compiler-team.html</guid><dc:creator><![CDATA[ Isabella Leandersson ]]></dc:creator><pubDate>Tue, 24 Sep 2024 00:00:00 GMT</pubDate></item><item><title><![CDATA[Eio From a User's Perspective: An Interview With Simon Grondin]]></title><description><![CDATA[<p>Are you curious about Eio but not sure what to expect? Eio is an effects-based direct-style concurrency library for OCaml 5. I recently spoke with <a href="https://github.com/SGrondin">Simon Grondin</a> from <a href="https://asemio.com">Asemio</a> about his personal experience using the I/O library <a href="https://github.com/ocaml-multicore/eio">Eio</a>. He was kind enough to share the good and the bad and give me insight into how the library has worked for his projects.</p>
<p>If you want to know what programming with Eio is like from another user's perspective, this interview is for you. Let's explore what Simon had to say!</p>
<h2>The Interview</h2>
<h3>"Hello Simon, thank you for meeting with me today! What first got you curious about Eio?"</h3>
<p>“We use OCaml at Asemio, and I use OCaml a lot for my own projects. But when it came to concurrent code, I had accepted that to reap the benefits, I needed to deal with the increased complexity that came with them. In traditional concurrency, the developer relies on abstractions for tasks and promises using async/await. I had grown to assume that <a href="https://journal.stuffwithstuff.com/2015/02/01/what-color-is-your-function/">coloured functions</a> were a necessary price to pay when the Eio library with its effect handlers announced that we didn't have to deal with this problem at all."</p>
<blockquote>
<p>"Concurrency squeezes more out of code, so people use it, accepting that they will pay the cost of increased complexity. But we can enjoy the benefits of concurrency without the cost!"</p>
</blockquote>
<p>"I had heard of Eio relatively early in its development, around version 0.3. At that point, it was still in flux, and I wasn't comfortable that I could build something lasting on top. Then I saw the announcement post for version 0.10 on June 11th 2023, and that's when I decided to try and switch. The project looked much more stable, the developers had resolved several initial issues, and the code wasn't being rewritten nearly as much as before. All of my previous hesitations had been addressed.”</p>
<h3>"What did you do next?"</h3>
<p>“I had a project in mind for testing Eio, a tool I had initially developed as an internal application. At Asemio, we regularly process huge Excel files (think millions of cells). The challenge with these files is that there are multiple ways that they can be organised internally, and it is impossible to know how they are organised without opening them. We need to be able to stream and read the data of very large files.</p>
<p>This is where the <a href="https://github.com/asemio/SZXX">SZXX</a> library comes in. It's a streaming ZIP, XML, and XLSX library that can stream data from these file formats even when reading from a network socket, either in constant memory or with user-defined memory usage guarantees. I had initially used the <a href="https://github.com/ocsigen/lwt"><code>Lwt</code></a> library to implement concurrency in SZXX, but I decided to convert it to Eio.”</p>
<h3>"What benefit did you expect to see from switching to Eio?"</h3>
<p>“From my perspective, the main problem with using <code>Lwt</code> was the <a href="https://journal.stuffwithstuff.com/2015/02/01/what-color-is-your-function/">function colouring problem</a>. Micromanaging code in that way is time-consuming, and I wanted to learn Eio mainly because I knew that effects could potentially eliminate the extra work and complexity that came with <code>Lwt</code>.</p>
<p>I also learned that Eio came with integrated access to <code>io_uring</code>, letting the developer use it without additional effort and without the function colouring problem. These were the two features I was the most interested in when converting the SZXX project from Lwt to Eio.</p>
<p>In addition, I had observed a split between Lwt and Async within the community, and for a while, I had been looking for a solution that could bridge the gap. From my research into the I/O library at that time, I felt confident that it had the potential to 'heal the split' between the different library users. In a way, converting SZXX to Eio was my way of testing that theory to see what effect the transition would have."</p>
<h3>"What was your initial experience with Eio? Did you have any frustrations?"</h3>
<p>"I actually did! When I first started using Eio, I had a hard time understanding how Paths, Flows, Files (and others) relate to one another. I also experienced a lack of documentation and examples of abstraction to help the newcomer get up and running. These are some of the first things a new user will experiment with, and they must work well. Since then, I have helped improve the documentation (alongside several other contributors) and made other quality-of-life contributions to the library.</p>
<p>These initial friction points were not enough to dissuade me from continuing my project, and after that initial hurdle, I was very happy."</p>
<blockquote>
<p>"I had been prepared to accept a performance degradation to get the benefits I was looking for, but I was pleasantly surprised to note that performance actually improved.”</p>
</blockquote>
<h3>"That's great! Did Eio affect your project in any other ways? Did anything surprise you?"</h3>
<p>“I was unprepared for just how much of a difference using non-monadic code would make. Eio has this concept called a Switch, which ties resources to scopes. It cleans up your code by managing the lifecycle of open resources automatically, for example, turning off background processes, closing file handles, and ensuring that all auxiliary states are set to their desired state upon exiting the scope.</p>
<p>When you write parallel programs, there can be hundreds of thousands of interleavings and computations, and ensuring that each computation is handled correctly has always been a challenge with monadic-style code. But with Eio and effects-based concurrency, the switch handles much of that complexity for you.</p>
<p>In fact, both of these benefits work synergistically. They include solving the function colouring problem (meaning you don't need to distinguish between async and 'normal' functions) and using non-monadic code. Each amplifies the other almost exponentially and frees up a lot of your complexity budget."</p>
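<p>To make the Switch idea concrete, here is a minimal sketch that uses only the OCaml standard library. It is not Eio's implementation: the <code>Scope</code> module is a hypothetical stand-in that loosely mirrors the shape of <code>Eio.Switch.run</code> and <code>Eio.Switch.on_release</code>, and it ignores fibres and cancellation entirely. It only illustrates the core idea of tying cleanup to a scope.</p>

```ocaml
(* Hypothetical stand-in for a switch: a scope that collects release hooks. *)
module Scope = struct
  type t = { mutable hooks : (unit -> unit) list }

  (* Register a cleanup action to run when the scope ends. *)
  let on_release t f = t.hooks <- f :: t.hooks

  (* Run [f] with a fresh scope; hooks run in reverse registration order,
     whether [f] returns normally or raises. *)
  let run f =
    let t = { hooks = [] } in
    Fun.protect
      ~finally:(fun () -> List.iter (fun hook -> hook ()) t.hooks)
      (fun () -> f t)
end

(* A small event log so we can observe the order of operations. *)
let log = ref []
let note s = log := s :: !log

let result =
  Scope.run (fun sw ->
    note "open a";
    Scope.on_release sw (fun () -> note "close a");
    note "open b";
    Scope.on_release sw (fun () -> note "close b");
    42)

let () = List.iter print_endline (List.rev !log)
```

<p>The calling code never closes anything explicitly: leaving the scope is what triggers the cleanup, which is the property described in the answer above.</p>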
<blockquote>
<p>"The human brain limits the complexity of code, and at some point, you just can't keep making things smaller to make them simple enough. Eio helps reduce complexity. With Eio, I was able to simplify and remove so much bloat from the code and achieve some really tricky optimisations, pushing the limit of what was possible.”</p>
</blockquote>
<h3>"That's an interesting insight, and I think one that only someone with real user experience can make. It's great to hear what results the user might expect. I know you have made significant contributions to Eio as well, and that the team has made you a maintainer in response to your contributions and reviews. Can you tell me about some of those?"</h3>
<p>“I started contributing to Eio from around version 0.11 and onwards. Some of my main projects include: <a href="https://github.com/ocaml-multicore/eio/blob/main/README.md#executor-pool"><code>executor_pool</code></a>, developing an <a href="https://github.com/Chris00/ocaml-csv/pull/40">Eio adapter for the popular CSV library</a> that I also <a href="https://github.com/inhabitedtype/angstrom/pull/227">added to Angstrom</a>, as well as work on an internal thread pool to speed up platforms without io_uring support. Along the way, I have added resources to Eio that make it easier for people to adopt the tool. For example, I designed the executor_pool module based on common patterns that I had often used in my projects.</p>
<p>Regarding the adaptors, in OCaml, library authors need to explicitly write one adaptor per I/O library they choose to support. The popular CSV library already had adaptors for both <code>Async</code> and <code>Lwt</code>. I knew that it would be hard for certain users to pick up Eio if it did not have an adaptor, so I wrote one and added it to both CSV and Angstrom. A lot of OCaml software depends on Angstrom through one dependency or another, and without Eio support, those codebases cannot make the switch to Eio.”</p>
<h3>"Thank you for answering my questions and sharing your experience with Eio! Do you have any final thoughts you would like to share?"</h3>
<blockquote>
<p>“Eio helped me reason about my code, and I discovered bugs and problems because of how much Eio had cleaned up the code. I uncovered hidden bugs in every program I converted from Lwt to Eio. Every single one also ended up being faster, not because Eio itself was faster (it was as fast as Lwt), but because of the optimisations I could now afford to make, thanks to the reduced complexity.”</p>
</blockquote>
<h2>Stay in Touch</h2>
<p>If you have questions about Eio or how to use it, you can fill in our <a href="/contact/">contact form</a>, and we will contact you. You can also ask questions on the <a href="https://discuss.ocaml.org/">OCaml Discuss forum</a> or open issues and pull requests in the <a href="https://github.com/ocaml-multicore/eio">Eio repo</a>.</p>
<p>To keep up with our activities, you can always follow us on <a href="https://bsky.app/profile/tarides.com">Bluesky</a>, <a href="https://www.linkedin.com/company/tarides">LinkedIn</a>, <a href="https://mastodon.social/@tarides">Mastodon</a>, and <a href="https://www.threads.net/@taridesltd">Threads</a>. We look forward to connecting with you!</p>
]]></description><link>https://tarides.com/blog/2024-09-19-eio-from-a-user-s-perspective-an-interview-with-simon-grondin</link><guid isPermaLink="false">https://tarides.com/blog/2024-09-19-eio-from-a-user-s-perspective-an-interview-with-simon-grondin.html</guid><dc:creator><![CDATA[ Isabella Leandersson ]]></dc:creator><pubDate>Thu, 19 Sep 2024 00:00:00 GMT</pubDate></item><item><title><![CDATA[Introducing the `odoc` Cheatsheet: Your Handy Guide to OCaml Documentation]]></title><description><![CDATA[<p>For developers diving into the OCaml ecosystem, one of the essential tools you'll encounter is <code>odoc</code>. Whether you're a seasoned OCaml programmer or just starting out, understanding how to generate and navigate documentation efficiently is crucial. This is where <code>odoc</code> comes in, OCaml's official documentation generator. To make your experience with <code>odoc</code> even smoother, the <code>odoc</code> team has created the <code>odoc</code> Cheatsheet.</p>
<h2>What is <code>odoc</code>?</h2>
<p><code>odoc</code> is a <a href="/blog/2024-01-10-meet-odoc-ocaml-s-documentation-generator/">powerful documentation generator</a> specifically designed for the OCaml programming language. It transforms OCaml interfaces, libraries, and packages into clean, readable HTML, LaTeX, or man pages. If you've worked with JavaDoc or Doxygen in other programming languages, you'll find <code>odoc</code> to be a similarly indispensable tool in the OCaml world.</p>
<p>The purpose of <code>odoc</code> is twofold:</p>
<ol>
<li>It helps developers create comprehensive documentation for their projects.</li>
<li>It allows users to easily navigate and understand these projects through a standardised format.</li>
</ol>
<p>As OCaml projects grow in complexity, well-maintained documentation becomes increasingly important for collaboration, onboarding new team members, and ensuring long-term project sustainability.</p>
<h2>The <code>odoc</code> Cheatsheet: A Quick Reference for OCaml Developers</h2>
<p>While <code>odoc</code> is great for generating docs, it uses a syntax that is not widely known. Learning a new syntax can be cumbersome, if not downright difficult. Before this cheatsheet, the resource for the syntax was only in <a href="https://ocaml.github.io/odoc/odoc_for_authors.html">the <code>odoc</code> for Authors page</a>. However, this page offers extensive detail, covering far more than just the syntax. While excellent for in-depth exploration, it can be challenging when you're aiming for quick productivity.</p>
<p>The <a href="https://ocaml.github.io/odoc/cheatsheet.html"><code>odoc</code> Cheatsheet</a> is a simple resource for writing simple things. It is easy to skim when you want to discover syntax, and you can use it to double-check syntax you already know. Rather than explaining, it provides examples, which means less cognitive overhead for the developer. It serves as a concise reference guide that covers the most important aspects of <code>odoc</code>, helping you to quickly get up to speed without wading through extensive documentation.</p>
<p>Here’s a closer look at how this cheatsheet benefits you:</p>
<ol>
<li>
<p><strong>Easy Access to Essential <code>odoc</code> Syntax</strong>
The cheatsheet provides a useful list of <code>odoc</code> syntax. Whether you need to generate documentation for a single module or an entire project, the cheatsheet lays out the exact markup commands you need. This can save a lot of time, as you won’t have to search through various resources to find the correct syntax or options.</p>
</li>
<li>
<p><strong>Concise and Well-Organised Information</strong>
Information is presented in a clear, concise table that allows you to quickly find what you need.</p>
</li>
</ol>
<p>This organisation is particularly beneficial when you’re in the middle of coding and need to find a markup command quickly. The cheatsheet gives you instant access to the most relevant information.</p>
<ol start="3">
<li><strong>A Great Learning Tool for New OCaml Developers</strong>
For those new to OCaml, the <code>odoc</code> Cheatsheet doubles as a learning tool. By following the syntax provided, you’ll not only generate better documentation but also gain a deeper understanding of how to structure your code and its corresponding documentation effectively.</li>
</ol>
<p>The cheatsheet explains how to use specific annotations in your comments to generate informative documentation. This might not be immediately obvious to someone new to OCaml or <code>odoc</code>, but it can greatly enhance the usability of your generated docs.</p>
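<p>As a small illustration of the kind of markup the cheatsheet covers, here is a sketch of documented OCaml code. The functions <code>area</code> and <code>perimeter</code> are made up for this example; the markup itself (headings, inline code, code blocks, <code>@param</code>, <code>@return</code>, <code>@raise</code>, and references) is standard <code>odoc</code> syntax.</p>

```ocaml
(** {1 Rectangle helpers}

    A floating comment with a heading. Inline code goes in square brackets,
    like [area], and {b bold} or {i italic} text goes in braces. *)

(** [area ~width ~height] is the area of a rectangle.

    A code block inside a comment looks like this:
    {[
      let six = area ~width:2. ~height:3.
    ]}

    @param width the horizontal extent, expected to be non-negative
    @param height the vertical extent, expected to be non-negative
    @return the product [width *. height]
    @raise Invalid_argument if either argument is negative *)
let area ~width ~height =
  if width < 0. || height < 0. then invalid_arg "area: negative dimension";
  width *. height

(** [perimeter ~width ~height] is the perimeter of the same rectangle.
    See also {!area}. *)
let perimeter ~width ~height = 2. *. (width +. height)
```

<p>The comments are ordinary OCaml comments as far as the compiler is concerned, so the code builds as usual; <code>odoc</code> picks them up when generating documentation.</p>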
<h2>Conclusion: A Simple and Useful Resource for <code>odoc</code></h2>
<p>Whether you're maintaining a large OCaml project or just starting out, the <code>odoc</code> Cheatsheet simplifies the documentation process, making it easier to produce high-quality docs with minimal hassle. Keep this cheatsheet at your fingertips, and ensure your OCaml projects are documented as well as they are coded.</p>
<p>So, before you dive into your next OCaml project or documentation task, take a moment to explore the <a href="https://ocaml.github.io/odoc/cheatsheet.html"><code>odoc</code> Cheatsheet</a>. It could be the key to making your work more efficient and your documentation more effective.</p>
]]></description><link>https://tarides.com/blog/2024-09-17-introducing-the-odoc-cheatsheet-your-handy-guide-to-ocaml-documentation</link><guid isPermaLink="false">https://tarides.com/blog/2024-09-17-introducing-the-odoc-cheatsheet-your-handy-guide-to-ocaml-documentation.html</guid><dc:creator><![CDATA[ Christine Rose ]]></dc:creator><pubDate>Tue, 17 Sep 2024 00:00:00 GMT</pubDate></item><item><title><![CDATA[Feature Parity Series: Compaction is Back!]]></title><description><![CDATA[<p>Compaction is a feature that rearranges OCaml values in memory to free up space, which can then be returned to the operating system. In the <a href="https://github.com/ocaml/ocaml/releases/tag/5.2.0">OCaml 5.2 release</a>, the technique returns to the OCaml Garbage Collector for the first time since its removal in the 5.0 multicore update.</p>
<p>This is part one of our feature parity series highlighting features returning to OCaml in an effort to restore feature parity with OCaml 4.14. When OCaml gained multicore support (that is, the ability to execute on multiple domains) it had far-reaching implications on the way the runtime worked, and as a result, support for some features was dropped for the 5.0 release. To address these gaps, a significant amount of work has been done behind the scenes to adapt tools and runtime features to work safely and performantly with multiple domains. Tarides is part of the effort to restore these familiar features to OCaml 5.</p>
<h2>What is Compaction?</h2>
<p>In OCaml 5.0, the major heap in the parallel Garbage Collector employs size-segregated pools attached to each domain. Over time, as OCaml domains allocate and discard heap values, many of these pools end up being only partially used. For example, a program might allocate millions of two-element tuples when initialising but no longer need most of these afterwards. This will result in lots of 'size two' pools, most of which will only be sparsely filled. This is inefficient as OCaml will still consume system memory for all the pools, even if the heap values only take up a tiny proportion of each pool.</p>
<p>Compaction is not new to OCaml; the latest version to include the technique was 4.14. After OCaml adopted a multicore garbage collector, compaction needed to be rewritten to work with the major heap's new structure and to be safe in parallel execution. Compaction for OCaml 5 - as reintroduced in PR <a href="https://github.com/ocaml/ocaml/pull/12193">#12193</a> - achieved this by identifying a small number of shared pools in each size class big enough to contain all heap values in that size and then moving all heap values in the remaining pools into the selected pools. This results in many empty pools, the memory of which can be returned to the operating system.</p>
<p>The new algorithm is entirely different from the previous one, so let's look at how it works!</p>
<h2>Compaction in 5.2</h2>
<h3>Allocating Into the Major Heap</h3>
<p>Simply put, compaction is a means of rearranging fragmented pieces of memory into larger, compact chunks. To understand how the compaction introduced in 5.2 works, we must first understand how allocation works in the GC's major heap (you may want to skip this part if you're already familiar with the process). When you first allocate a value in the major heap, it is given a size from a size class table. Each size class, in turn, lists four different ways a pool can be stored: unswept available, unswept full, available, and full.</p>
<p>These pools are divided into <em>blocks</em>, and each pool contains a number of blocks of the same size class. When we allocate a value, an appropriate size class is chosen depending on the size of the value, and the first pool with space available in that class is selected. The header of that pool has a pointer indicating the next available block, and the value is written into that memory block. Each domain is responsible for this process independent of other domains. This is crucial for acceptable performance in parallel programming.</p>
<p>When the GC sweeps the pools, it will free certain blocks and add them to the free list in the header of their pool. After a while, cycles of sweeping and allocation create pools with free ('empty') blocks interspersed among live ('full') blocks. This process results in inefficient memory use and many partially filled pools.</p>
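<p>The allocation and sweeping behaviour described above can be modelled with a toy sketch. This is not the runtime's code (which is written in C): the size class table, the <code>pool</code> record, and the function names below are all assumptions made up for illustration.</p>

```ocaml
(* Hypothetical size classes, in words. The real table is generated at
   build time and is much larger. *)
let size_classes = [| 1; 2; 3; 4; 6; 8; 12; 16 |]

(* Index of the smallest size class that fits a value of [words] words,
   or [None] if the value is too large for the pools. *)
let size_class_for words =
  let rec go i =
    if i >= Array.length size_classes then None
    else if size_classes.(i) >= words then Some i
    else go (i + 1)
  in
  go 0

(* A pool holds blocks of one size class and tracks its free blocks. *)
type pool = {
  block_words : int;
  live : bool array;            (* live.(i) is true if block i is in use *)
  mutable free_list : int list; (* indices of free blocks *)
}

let new_pool ~block_words ~blocks =
  { block_words;
    live = Array.make blocks false;
    free_list = List.init blocks Fun.id }

(* Allocate the next free block, if the pool has one. *)
let alloc pool =
  match pool.free_list with
  | [] -> None
  | slot :: rest ->
    pool.free_list <- rest;
    pool.live.(slot) <- true;
    Some slot

(* Sweeping returns dead blocks to the pool's free list. *)
let sweep pool ~dead =
  List.iter
    (fun slot ->
      if pool.live.(slot) then begin
        pool.live.(slot) <- false;
        pool.free_list <- slot :: pool.free_list
      end)
    dead
```

<p>After a few rounds of <code>alloc</code> and <code>sweep</code>, the <code>live</code> array ends up with free blocks scattered among live ones, which is the fragmentation that compaction addresses.</p>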
<h3>Compacting the Major Heap</h3>
<p>To address this inefficiency, the developer can compact the major heap and move the live blocks into an optimised order among the pools. In OCaml 5, this technique follows a specific sequence, which is as follows:</p>
<p><strong>1. Barrier:</strong><br>
Because this is parallel compaction, we must synchronise all the domains before proceeding. Each domain has its own heap, and the heaps are compacted in parallel, with each domain responsible for its own compaction. Synchronisation is achieved with the help of a barrier.</p>
<p><strong>2. Size Class:</strong><br>
The compaction process iterates through each size class, processing one at a time, starting with the smallest. A <em>stats</em> table is allocated for each domain, with a slot available for each pool of the current size class being processed. Since the GC has already swept everything we will be compacting, there are only two states a pool can be in, full or available, where the latter means there is free space available in it.</p>
<p>The process then continues by using the stats table to check whether pools are full or available (meaning they have at least one free block). This process means we don't delve deeply into the memory to read from it, and there is no cache contamination. There is no synchronisation between domains in this step, and the compaction process for a domain only proceeds from here if there is at least one available pool.</p>
<p><strong>3. Using the Stats Table:</strong><br>
By this time, each domain to be compacted will have a stats table with a list of all the available pools. In the next step, the process goes down each pool on the available list and counts the number of live and free blocks. This is done linearly through the pool.</p>
<p>Once the number of live and free blocks is known, the number of live blocks is deducted from the number of free blocks. The resulting number of free blocks lets us calculate which pools can be emptied and which will be retained. This is all the information we need from the stats table, so once this step is completed, the stats table is cleared.</p>
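<p>One way to picture this calculation is the sketch below. It is a simplification under our own assumptions rather than the runtime's actual code: <code>pool_stats</code> and <code>pools_to_retain</code> are hypothetical names, and the rule used is to keep retaining pools from the front of the list until their free blocks can hold every live block in the pools that remain.</p>

```ocaml
(* Per-pool counts gathered by walking each available pool. *)
type pool_stats = { live : int; free : int }

(* Number of pools at the front of [stats] to retain; all pools after
   that prefix can be evacuated into the retained ones. *)
let pools_to_retain (stats : pool_stats list) : int =
  let total_live = List.fold_left (fun acc s -> acc + s.live) 0 stats in
  let rec go retained free_so_far live_remaining = function
    | [] -> retained
    | s :: rest ->
      if free_so_far >= live_remaining then retained
      else go (retained + 1) (free_so_far + s.free) (live_remaining - s.live) rest
  in
  go 0 0 total_live stats

let () =
  let stats =
    [ { live = 3; free = 1 }; { live = 1; free = 3 };
      { live = 1; free = 3 }; { live = 1; free = 3 } ]
  in
  Printf.printf "retain %d of %d pools\n"
    (pools_to_retain stats) (List.length stats)
```

<p>In this example, the first two pools have four free blocks between them, which is enough to absorb the two live blocks in the last two pools, so those last two pools can be emptied.</p>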
<p><strong>4. Pool Pointers and Live Links:</strong><br>
To summarise, we now know how much live space there is within the pools and how much free space we can liberate if we compact the live blocks together. To achieve compaction, we create pointers to two pools, one to the first pool we are evacuating and one to the first pool we are retaining (for those who are curious, these pointers are named <code>current_pool</code> and <code>to_pool</code> respectively).</p>
<p>The process starts with the first pool we know will be evacuated. It finds the first live block within that pool and uses the <code>current_pool</code> pointer to remove it from the pool and the <code>to_pool</code> pointer to insert it into one of the pools we know will still be live post-compaction (this information comes from the calculation we did using the stats table).</p>
<p><strong>5. Compaction!:</strong><br>
This is the operative part of compaction: copying all the live blocks from pools that will be evacuated into pools that will remain live after compaction using the two pointers. As this is done, the process writes forward pointers that point from the block where something used to be stored <em>forward</em> to the block where it is now stored post-compaction.</p>
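<p>Steps 4 and 5 can be sketched with a toy model in which each pool is an array of optional blocks. Everything here is an illustrative assumption (the <code>addr</code> type, the <code>evacuate</code> function, and the use of a hash table for forwarding pointers); the runtime stores forwarding pointers in the old blocks themselves. The variable names <code>current_pool</code> and <code>to_pool</code> are borrowed from the description above.</p>

```ocaml
(* A block address in the toy model: which pool, and which slot in it. *)
type addr = { pool : int; slot : int }

(* Move every live block out of pools [retained ..] into free slots of
   pools [0 .. retained - 1]. Returns the forwarding table old -> new. *)
let evacuate (pools : string option array array) ~(retained : int) =
  let forward : (addr, addr) Hashtbl.t = Hashtbl.create 16 in
  (* Cursor over the retained pools, looking for the next free slot. *)
  let to_pool = ref 0 and to_slot = ref 0 in
  let rec next_free () =
    if !to_pool >= retained then failwith "no free slot in retained pools"
    else if !to_slot >= Array.length pools.(!to_pool) then begin
      incr to_pool;
      to_slot := 0;
      next_free ()
    end
    else if pools.(!to_pool).(!to_slot) = None then
      { pool = !to_pool; slot = !to_slot }
    else begin
      incr to_slot;
      next_free ()
    end
  in
  for current_pool = retained to Array.length pools - 1 do
    Array.iteri
      (fun slot block ->
        match block with
        | None -> ()
        | Some _ ->
          let dst = next_free () in
          (* Copy the live block, free the old slot, record where it went. *)
          pools.(dst.pool).(dst.slot) <- block;
          pools.(current_pool).(slot) <- None;
          Hashtbl.replace forward { pool = current_pool; slot } dst)
      pools.(current_pool)
  done;
  forward
```

<p>Once <code>evacuate</code> returns, the evacuated pools contain no live blocks, and the forwarding table records the new location of every block that moved.</p>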
<p><strong>6. Barrier 2:</strong><br>
Again, another barrier syncs all the domains – a crucial part of compaction on multiple domains.</p>
<p><strong>7. Scanning:</strong><br>
This part of compaction is the most expensive in terms of time. The entire OCaml heap has to be scanned for pointers pointing to old block locations (moved as the pool they were in was evacuated), and old pointers must be updated using the forward pointers. Each domain is responsible for updating its data.</p>
<p>This is a deceptively extensive process. For example, even pointers in objects that are too large for the size allocator (so over 128 words) and therefore never moved by compaction may still need to be updated after the compaction process as they may also contain pointers to the old block locations.</p>
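<p>The scanning step can be sketched in the same toy style. The types and the <code>update_pointers</code> function are assumptions for illustration only: a heap object is modelled as an array of fields, each either an immediate integer or a pointer to a block address, and the forwarding information is a hash table from old to new addresses.</p>

```ocaml
type addr = { pool : int; slot : int }

(* A field is either an immediate value or a pointer to a block. *)
type field = Int of int | Ptr of addr

(* Rewrite every pointer that refers to a moved block. Objects that were
   not moved themselves (for example large ones) are scanned too, since
   they may still point at old locations. *)
let update_pointers (forward : (addr, addr) Hashtbl.t) (heap : field array list) =
  List.iter
    (fun fields ->
      Array.iteri
        (fun i f ->
          match f with
          | Int _ -> ()
          | Ptr old ->
            (match Hashtbl.find_opt forward old with
             | Some fresh -> fields.(i) <- Ptr fresh
             | None -> ()))
        fields)
    heap
```

<p>Every object has to be visited because there is no way to know in advance which ones hold pointers to moved blocks, which is why this step dominates the cost of compaction.</p>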
<p><strong>8. Barrier 3:</strong><br>
Another barrier synchronises between domains.</p>
<p><strong>9. Freeing Evacuated Pools:</strong><br>
All the evacuated pools are freed and added to the free list.</p>
<p><strong>10. Barrier 4:</strong><br>
Another barrier to synchronise between domains.</p>
<p><strong>11. Release Memory:</strong><br>
One domain, whichever is the first one to get to that point, unmaps the free list. This means that the memory we asked the OS for initially, which currently belongs to the OCaml system, is released at this point and goes back to the OS. This is the end benefit of compaction; it reduces the size of the OCaml system on your machine and returns memory to the OS for use elsewhere.</p>
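<p>From the programmer's side, compaction can be requested with <code>Gc.compact</code> from the standard library. The sketch below mirrors the tuple example from earlier in this post: it allocates a large number of pairs, drops them, and then compacts. The helper <code>heap_words</code> and the choice of one million pairs are ours, and the numbers printed will vary by machine and OCaml version, so treat it as an experiment rather than a benchmark.</p>

```ocaml
(* Size of the major heap in words, as reported by the GC. *)
let heap_words () = (Gc.quick_stat ()).Gc.heap_words

let () =
  let at_start = heap_words () in
  (* Allocate many two-element tuples, as a program might at start-up. *)
  let pairs = ref (List.init 1_000_000 (fun i -> (i, i + 1))) in
  let after_alloc = heap_words () in
  (* Drop the only reference, then ask the GC to collect and compact. *)
  pairs := [];
  Gc.compact ();
  let after_compact = heap_words () in
  Printf.printf
    "heap words: start=%d, after allocating=%d, after compaction=%d\n"
    at_start after_alloc after_compact
```

<p>On OCaml 5.2 and later, you would typically expect the last figure to be noticeably smaller than the second, since the emptied pools are handed back to the operating system.</p>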
<h2>Final Details &amp; Next Steps</h2>
<p>Before we wrap up, let's look at one more detail about how compaction works in 5.2. In 5.2, compaction uses a slab allocator and size classes, whereas OCaml 4 uses a free list. This means that OCaml 5.2 does not provide the option to set an allocation policy like OCaml 4.14. Our testing has found that for most workloads, the chosen allocation policy (using the size classes) performs well. However, expert users can tune the configuration of the size classes in <code>gen_sizeclasses.ml</code> (necessitating that they build their own OCaml), which they may find useful for their own projects. This is just one example of the challenge that comes with adapting a feature as complex as compaction to be compatible with multiple cores, and the careful weighing of pros and cons it requires on behalf of the developers.</p>
<p>The next steps for OCaml 5 are the expected restoration of MSVC backends, Statmemprof support, and the return of the unloadable runtime coming in releases 5.3 and 5.4. Keep a look out for future posts detailing those features and the efforts put toward bringing them back.</p>
<p>It's great to have compaction restored to OCaml, and it is a testament to the hard work of several teams within Tarides and the wider open-source community surrounding OCaml. We are happy to be part of the team working on this secure and performant programming language.</p>
<h2>Share Your Experience!</h2>
<p>We want to understand your experience! We're open to suggestions and feedback about the process to help us optimise the feature and deal with any pain points. You can share your thoughts on <a href="https://discuss.ocaml.org">OCaml's discussion forum</a> or make suggestions <a href="https://github.com/ocaml">directly in the repo</a>.</p>
<p>You can stay in touch with us on <a href="https://bsky.app/profile/tarides.com">Bluesky</a>, <a href="https://mastodon.social/@tarides">Mastodon</a>, <a href="https://www.threads.net/@taridesltd">Threads</a>, and <a href="https://www.linkedin.com/company/tarides">LinkedIn</a>. We look forward to hearing from you!</p>
]]></description><link>https://tarides.com/blog/2024-09-11-feature-parity-series-compaction-is-back</link><guid isPermaLink="false">https://tarides.com/blog/2024-09-11-feature-parity-series-compaction-is-back.html</guid><dc:creator><![CDATA[ Sadiq Jaffer, Nick Barnes, Isabella Leandersson ]]></dc:creator><pubDate>Wed, 11 Sep 2024 00:00:00 GMT</pubDate></item><item><title><![CDATA[Easy Debugging for OCaml With LLDB]]></title><description><![CDATA[<p>If you’re just getting started with OCaml, you may be wondering how to effectively debug your code when things go wrong. Fortunately, OCaml's ecosystem has evolved, offering modern tools that make debugging a more approachable task.</p>
<p>Tarides engineer <a href="https://lambdafoo.com/posts/2024-08-03-lldb-ocaml.html">Tim McGilchrist recently wrote a blog post</a> that explores how to debug OCaml programs using LLDB on macOS. Developers familiar with languages like C, C++, or Rust may already have experience using LLDB, as it is a common choice on Linux or FreeBSD. LLDB is also the debugger that ships with Xcode on macOS.</p>
<h3>The Role of LLDB in OCaml Debugging</h3>
<p>OCaml has traditionally used <a href="https://ocaml.org/docs/debugging">its built-in debugging tool <code>ocamldebug</code></a> for OCaml bytecode, but LLDB offers a way to debug native executables.</p>
<p><a href="https://lldb.llvm.org/">LLDB</a>, the debugger from <a href="https://github.com/llvm/llvm-project">the LLVM project</a>, offers a powerful way to inspect and debug compiled programs. While LLDB is not specific to OCaml, Tim's blog post highlights how it can be effectively used to debug OCaml code.</p>
<h3>Tips and Tricks</h3>
<p>Tim's post also provides practical tips for getting the most out of LLDB when working with OCaml. For instance, it discusses how to deal with OCaml’s optimised code, which can sometimes make debugging more challenging. It suggests compiling without certain optimisations when debugging complex issues, to ensure that the debugging information remains intact and the code paths are easier to follow.</p>
<h3>Final Thoughts</h3>
<p>Debugging is a critical skill for any developer, and mastering it in OCaml can significantly improve your productivity and code quality. The method outlined in Tim's post demystifies the process of using LLDB with OCaml, making it more accessible to those who are new to the language or who may be transitioning from other programming environments.</p>
<p>If you’re eager to dive deeper, <a href="https://lambdafoo.com/posts/2024-08-03-lldb-ocaml.html">please read Tim's full blog post</a>, which gives both detailed instructions and examples to help you get started. With these tools at your disposal, debugging OCaml code becomes a much more manageable task. Happy coding!</p>
]]></description><link>https://tarides.com/blog/2024-09-05-easy-debugging-for-ocaml-with-lldb</link><guid isPermaLink="false">https://tarides.com/blog/2024-09-05-easy-debugging-for-ocaml-with-lldb.html</guid><dc:creator><![CDATA[ Christine Rose ]]></dc:creator><pubDate>Thu, 05 Sep 2024 00:00:00 GMT</pubDate></item><item><title><![CDATA[Getting Specific: Announcing the Gospel and Ortac Projects]]></title><description><![CDATA[<p>Part of the benefit of open-source development is the opportunity to collaborate on projects across traditional organisational boundaries, such as academia and industry. Tarides is part of a larger effort to develop a behavioural specification language and associated tooling for OCaml. The project creates an easy-to-use foundation for formal specifications, allowing users to include them in generated documentation and perform automated testing and verification. This important work is funded in part by <a href="https://anr.fr/en/">ANR</a>.</p>
<h2>The Gospel Project</h2>
<p>The French National Research Agency (ANR) is a public institution in France that funds innovative research projects where public institutions collaborate with each other or the private sector.  OCaml was invented at the French National Institute for Research in Digital Science and Technology, <a href="https://www.inria.fr/en">INRIA</a>, and ANR is <a href="https://anr.fr/en/funded-projects-and-impact/funded-projects/project/funded/project/b2d9d3668f92a3b9fbbf7866072501ef-bcaf728f49/?tx_anrprojects_funded%5Bcontroller%5D=Funded&amp;cHash=abe71830301addcbf212c5a439e7fbbf">funding a research project</a> as a collaboration between <a href="https://www.inria.fr/fr">INRIA</a>, <a href="/">Tarides</a>, <a href="https://lmf.cnrs.fr">LMF UPSaclay</a>, and <a href="https://www.nomadic-labs.com">Nomadic Labs</a>. The goal of the project is to develop and improve the specification language <a href="https://github.com/ocaml-gospel/gospel">Gospel</a> alongside its tooling ecosystem and demonstrate its usefulness in different case studies.</p>
<h2>What is Gospel?</h2>
<p>Gospel is a contract-based behavioural specification language that allows you to write specifications in the module interface you want to specify. As a specification language, it is a formal language, meaning its semantics are precisely defined (by means of translation into Separation Logic, see <a href="https://inria.hal.science/hal-02157484v2/document">this paper</a>).</p>
<p>By <em>behavioural</em> specification language, we mean a language that allows you to describe the expected functional behaviour of a function. Specifications don't reference resources such as CPU time or memory size, but only what the program does (so-called functional behaviour). Expected behaviour is expressed as a contract. The basic premise is that as long as the user of the library calls functions with arguments that respect the expressed preconditions, then the implementation of the library should behave per their description.</p>
<p>By itself, Gospel doesn't guarantee that your implementation respects its given specifications; it is simply a language that allows you to express precisely what your code <em>should</em> do. However, note that Gospel still comes with a type-checker. This type-checker lets you check that your specifications are well-formed and in sync with the interface. For example, if you add an extra argument to a function in your library, the Gospel type-checker will tell you if you forgot to update the specifications accordingly.</p>
<p>Gospel is a relatively new specification language and is bound to evolve, but it is already mature enough to specify a diverse set of libraries. It provides developers with a non-invasive and easy-to-use syntax to annotate their module interfaces with formal contracts describing type invariants, mutability, function pre- and postconditions, exceptions, etc.
For example, let's say you want to specify a fixed-size stack. The type declaration in the module interface would look like:</p>
<pre><code><span class="ocaml-keyword-other">type</span><span class="ocaml-source"> </span><span class="ocaml-storage-type">'a</span><span class="ocaml-source"> </span><span class="ocaml-source">t</span><span class="ocaml-source">
</span><span class="ocaml-comment-block">(*</span><span class="ocaml-comment-block">@ model capacity : integer
</span><span class="ocaml-comment-block">    mutable model contents : 'a sequence
</span><span class="ocaml-comment-block">    with s
</span><span class="ocaml-comment-block">    invariant Sequence.length s.contents &lt;= s.capacity </span><span class="ocaml-comment-block">*)</span><span class="ocaml-source">
</span></code></pre>
<p>You give two logical models to your datatype: an immutable one for the capacity of the stack and a mutable one for the content. Then, given a stack, <code>s</code>, you can formulate type invariants. Namely, the stack should not have more elements than capacity allows.
The specification for the <code>create</code> function would look like:</p>
<pre><code><span class="ocaml-keyword-other">val</span><span class="ocaml-source"> </span><span class="ocaml-source">create</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source"> </span><span class="ocaml-support-type">int</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-storage-type">'a</span><span class="ocaml-source"> </span><span class="ocaml-source">t</span><span class="ocaml-source">
</span><span class="ocaml-comment-block">(*</span><span class="ocaml-comment-block">@ s = create n
</span><span class="ocaml-comment-block">    requires n &gt; 0
</span><span class="ocaml-comment-block">    ensures s.capacity = n
</span><span class="ocaml-comment-block">    ensures s.contents = Sequence.empty </span><span class="ocaml-comment-block">*)</span><span class="ocaml-source">
</span></code></pre>
<p>The first line binds the arguments and the returned value to names so that we can talk about them. Then we express the precondition that the given argument should be strictly positive and the two postconditions that fill the logical models as expected.</p>
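<p>To make the contract concrete, here is a hand-written sketch of what checking it at runtime could look like. This is not generated by any Gospel tool: the record representation, <code>unchecked_create</code>, and the error messages are all assumptions chosen for illustration, with the two logical models represented directly as record fields.</p>

```ocaml
(* One possible representation: the models become ordinary fields. *)
type 'a t = { capacity : int; mutable contents : 'a list }

(* The implementation under test, with no checks of its own. *)
let unchecked_create n = { capacity = n; contents = [] }

(* The type invariant: never more elements than the capacity allows. *)
let invariant s = List.length s.contents <= s.capacity

(* [create] with the contract checked by hand. *)
let create n =
  (* requires n > 0: a violation here is the caller's fault. *)
  if n <= 0 then invalid_arg "create: precondition n > 0 violated";
  let s = unchecked_create n in
  (* ensures clauses and invariant: a violation is the implementation's fault. *)
  if s.capacity <> n then failwith "create: postcondition on capacity violated";
  if s.contents <> [] then failwith "create: postcondition on contents violated";
  if not (invariant s) then failwith "create: type invariant violated";
  s

let () =
  let s : int t = create 3 in
  Printf.printf "capacity = %d, size = %d\n" s.capacity (List.length s.contents)
```

<p>Note how the two kinds of failure are kept apart: a broken precondition blames the caller, while a broken postcondition or invariant blames the implementation.</p>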
<p>Gospel is also a tool-agnostic specification language, meaning it doesn’t make any assumption about how and by which tools its specifications will be consumed. Some users use Gospel specifications to provide proofs of functional correctness for their implementations. For example, <a href="https://github.com/ocaml-gospel/cameleer">Cameleer</a> does so by leveraging the power of the <a href="https://www.why3.org/">Why3</a> deductive verification platform. At Tarides, with the <a href="https://github.com/ocaml-gospel/ortac">Ortac</a> project, we explore how to use Gospel specifications to do runtime assertion checking.</p>
<p>Gospel was initially developed by Cláudio Lourenço (<a href="https://www.lri.fr">LRI</a> post-doctorate) and is currently maintained by Jean-Christophe Filliâtre, <a href="https://github.com/shym">Samuel Hym</a>, <a href="https://github.com/n-osborne">Nicolas Osborne</a>, and Mário Pereira. <a href="https://github.com/pascutto">Clément Pascutto</a> also maintained Gospel for several years as part of his PhD work at Tarides and LRI.</p>
<h2>A Tool for Gospel: Ortac</h2>
<p><a href="https://github.com/ocaml-gospel/ortac">Ortac</a> stands for OCaml RunTime Assertion Checking. Clément's PhD thesis initiated the Ortac project which has since grown into a greater cooperative effort. Samuel Hym and Nicolas Osborne currently maintain it. At its core, Ortac translates the computable subset of Gospel terms into OCaml code. In addition, it provides a plugin architecture to implement different uses of this translation. Translating Gospel terms into runnable OCaml code opens the possibility of checking an implementation against the interface specification at runtime.</p>
<p>Three plugins have been built upon this translation, plus a fourth one, which is slightly different. Let’s take a look:</p>
<ol>
<li>The Ortac/Wrapper plugin was developed during Clément's PhD. Given the Gospel specified interface of a module, this plugin generates a new module with the same interface, wrapping the initial implementation with runtime checks coming from Gospel specifications. When a Gospel specification is violated by the client for preconditions or by the initial implementation for postconditions, the wrapped version will output an error message providing the user with useful information, such as which Gospel clause they have violated. Users can then use the new wrapped module in place of the original one in their project, to, for example, aid in debugging efforts. This plugin is still considered experimental.</li>
<li>The Ortac/Monolith plugin, which is based on Ortac/Wrapper and is the product of an internship, was presented at <a href="https://inria.hal.science/hal-03328646">ICFP 2021</a>. Given the specified interface of a module, this plugin generates the <a href="https://gitlab.inria.fr/fpottier/monolith">Monolith</a> standalone program, testing the initial implementation against the wrapped one. The idea is that, in case the implementation doesn't respect the specification, the wrapped version will return a special Ortac error while the bare initial one won't. Monolith allows you to use fuzzing to test your library and provides a runnable scenario that demonstrates the unexpected behaviour. This plugin is also still considered experimental.</li>
<li>The Ortac/QCheck-STM plugin is based on Naomi Spargo's internship project. Given the specified interface of a module and some user-provided extra information, this plugin generates standalone <a href="https://github.com/ocaml-multicore/multicoretests">QCheck-STM</a> tests. This saves you from writing the QCheck-STM tests by hand. In addition, as a recently added feature, when a test fails, the generated tests tell you which part of the specification has been violated, give you a runnable scenario demonstrating the unexpected behaviour, and report the expected return value when the Gospel specifications make it possible to compute it. This plugin has been released.</li>
<li>The Ortac/Dune plugin is slightly different as it doesn't rely on a Gospel specification. Instead, it helps you by generating the Dune rules necessary to run another plugin. So far, only the command for Dune rules related to the Ortac/QCheck-STM plugin is available, as it is the only one that has been released. The command yields the Dune rules required to generate and run the tests.</li>
</ol>
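<p>To make the wrapper idea from the list above more concrete, here is a hand-written sketch of the kind of checks a wrapped <code>create</code> could perform. It is not code generated by Ortac: the <code>Original</code> and <code>Wrapped</code> modules, the <code>Spec_violation</code> exception, and the choice to represent the Gospel models <code>capacity</code> and <code>contents</code> as plain record fields are all assumptions made for illustration.</p>
<pre><code>(* Hypothetical implementation of the interface specified earlier. *)
module Original = struct
  type 'a t = { capacity : int; mutable contents : 'a list }

  let create n = { capacity = n; contents = [] }
end

(* Hand-written stand-in for the error a wrapper could report. *)
exception Spec_violation of string

module Wrapped = struct
  type 'a t = 'a Original.t

  let create n =
    (* requires n > 0: a failure here points at the caller. *)
    if not (n > 0) then raise (Spec_violation "create: requires n > 0");
    let s = Original.create n in
    (* ensures s.capacity = n: a failure here points at the implementation. *)
    if not (s.Original.capacity = n) then
      raise (Spec_violation "create: ensures s.capacity = n");
    (* ensures s.contents = Sequence.empty *)
    if not (s.Original.contents = []) then
      raise (Spec_violation "create: ensures s.contents = Sequence.empty");
    s
end
</code></pre>
<p>With this sketch, <code>Wrapped.create 0</code> raises <code>Spec_violation</code> naming the violated clause, while <code>Wrapped.create 3</code> behaves like <code>Original.create 3</code>.</p>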
<h2>Future Steps</h2>
<p>Regarding Ortac, the 0.3.0 version of a set of Ortac packages, including the first Ortac/Dune release, has recently been <a href="https://discuss.ocaml.org/t/ann-ortac-0-3-0-dynamic-formal-verification-made-easy/14936">published</a>. Among other improvements and fixes, this release makes dynamic formal verification with Ortac very easy. The team is now investigating how to test more functions by lifting some limitations that come with a naive use of the QCheck-STM test framework, with an intern (Nikolaus Huber) working on this last topic.</p>
<p>With <a href="https://discuss.ocaml.org/t/ann-gospel-0-3-0/14480/2">Gospel 0.3.0 published</a>, the upcoming goals centre around continuing maintenance and development, using our engineering expertise to help our research partners bring new features to the Gospel language.</p>
<h2>Until Next Time</h2>
<p>Do you want to try out <a href="https://github.com/ocaml-gospel/gospel">Gospel</a> and <a href="https://github.com/ocaml-gospel/ortac">Ortac</a>? Check out the documentation and report back on your experience! If you want to stay informed about our projects, follow us on <a href="https://bsky.app/profile/tarides.com">Bluesky</a> and <a href="https://www.linkedin.com/company/tarides">LinkedIn</a> for all our latest updates. Are you interested in using Gospel for your own projects? <a href="/contact/">Contact us</a>, and we will be happy to discuss the benefits of implementing specification languages in your workflow.</p>
]]></description><link>https://tarides.com/blog/2024-09-03-getting-specific-announcing-the-gospel-and-ortac-projects</link><guid isPermaLink="false">https://tarides.com/blog/2024-09-03-getting-specific-announcing-the-gospel-and-ortac-projects.html</guid><dc:creator><![CDATA[ Nicolas Osborne, Isabella Leandersson ]]></dc:creator><pubDate>Tue, 03 Sep 2024 00:00:00 GMT</pubDate></item><item><title><![CDATA[The Biggest Functional Programming Conference of the Year: What are we Bringing to ICFP?]]></title><description><![CDATA[<p>Feeling fashionable? Milan is calling! ICFP 2024 will be held in the Italian fashion capital from 2-7 September, and there is something there for everyone to enjoy. The <a href="https://icfp24.sigplan.org/">ACM SIGPLAN International Conference on Functional Programming</a> is a yearly highlight with various keynotes, tutorials, and tracks to discover.</p>
<p>Can't wait for September 2nd? Check out the talks we're bringing to the conference this year and get a taste of what's to come!</p>
<h2>Tarides Talks</h2>
<p>We are excited about all the talks at the OCaml Workshop this year, and it's great to see such a variety of topics being presented. You can browse all the accepted papers on the OCaml track of the <a href="https://icfp24.sigplan.org/home/ocaml-2024#event-overview">ICFP website</a>. This year, Tarides team members will give five talks at the OCaml workshop and one at the ML workshop, so let's dive into the topics!</p>
<h3><a href="https://icfp24.sigplan.org/details/mlworkshop-2024-papers/10/Wasm_of_ocaml">Wasm_of_ocaml at the ML Workshop</a> by Jérôme Vouillon</h3>
<p>Jérôme Vouillon will be presenting on the fork of the well-loved <a href="https://github.com/ocsigen/js_of_ocaml">Js_of_ocaml</a> compiler, <a href="https://github.com/ocaml-wasm/wasm_of_ocaml">Wasm_of_ocaml</a>, which translates OCaml bytecode to WebAssembly (Wasm) rather than to JavaScript. Wasm is a low-level virtual machine that is both platform- and language-independent and is attractive to developers as a portable compilation target deployable on a wide variety of platforms. Wasm_of_ocaml is already fairly mature, having been successfully used on large programs.</p>
<p>Curious about what this means for OCaml users? Jérôme's talk will give you an overview of the compiler's features, including that it's highly compatible with Js_of_ocaml (one can compile existing programs with minimal changes) and offers a performance boost over the same (2x or more!). In addition, the talk will give you some background to Wasm_of_ocaml's design, the implementation of its runtime environment, how it interacts with JavaScript APIs, how we can take advantage of JavaScript to implement functionalities that are currently unavailable in Wasm, and share some benchmarks to give you an idea of how the compiler performs. Check out Jérôme's talk in the ML track for all this and more!</p>
<h3><a href="https://icfp24.sigplan.org/details/ocaml-2024-papers/14/First-Class-Windows-Building-a-Roadmap-for-OCaml-on-Windows">First-Class Windows: Building a Roadmap for OCaml on Windows</a> by Sudha Parimala, Benjamin Canou, Pierre Boutillier, and David Allsopp</h3>
<p>The goal of the First-Class Windows team (which we discuss more in <a href="/blog/2024-05-22-launching-the-first-class-windows-project/">our blog post introducing the project</a>) is to bring support for the Windows platform up to par with Tier-1 platforms like Linux and macOS. Reaching this milestone will require planning, collaboration, and iterative change. Still, we expect the process will significantly enhance the OCaml experience for many users.
Check out the talk at the OCaml Workshop track at ICFP to learn all about the project's background, the current state of OCaml on Windows, the launch of the Windows Working Group, what information they are gathering, and where the project is heading.</p>
<p>Most importantly, the talk will unveil the new roadmap, which outlines the plan for addressing the pain points and improving OCaml on Windows. The roadmap is intended as a living, collaborative document. Join the talk to be part of the discussion and improve OCaml on Windows!</p>
<h3><a href="https://icfp24.sigplan.org/details/ocaml-2024-papers/3/Project-wide-occurrences-for-OCaml-a-progress-report">Project-Wide Occurrences for OCaml: A Progress Report</a> by Ulysse Gérard</h3>
<p>Our next talk will be great news for anyone who writes programs in OCaml! Good editor tooling is indispensable when writing programs, and different language servers provide a variety of features.</p>
<p>Before the <a href="/blog/2024-05-15-the-ocaml-5-2-release-features-and-fixes/">OCaml 5.2</a> update, the main language servers were limited to providing occurrences inside the active buffer. After the update, with the latest versions of <a href="https://github.com/ocaml/dune">Dune</a>, <a href="https://github.com/ocaml/merlin">Merlin</a>, and <a href="https://github.com/ocaml/ocaml-lsp"><code>ocaml-lsp</code></a>, users can take advantage of a new feature enabling project-wide occurrence search. It allows users to list all the occurrences of a given value, type, or module and quickly navigate between them.</p>
<p>The team prioritised three key areas when developing the feature: correctness, exhaustivity, and performance. Correctness refers to the tool's ability to list the occurrences of a value without including false positives; exhaustivity means that it needs to list <em>all</em> occurrences; and performance means that it should yield these results in a reasonable amount of time.</p>
<p>Come to the talk for a demonstration of the powerful new search features available in OCaml! Learn about the project's design and implementation, the challenges the team encountered (including some noteworthy patches to the compiler, Dune, Merlin, and <code>ocaml-lsp</code>!), and improvements planned for the future.</p>
<h3><a href="https://icfp24.sigplan.org/details/ocaml-2024-papers/5/Picos-Interoperable-effects-based-concurrency">Picos – Interoperable Effects-Based Concurrency</a> by Vesa Karvonen</h3>
<p>Picos is an ongoing project created to allow users to mix and match effects-based concurrent programming libraries and async I/Os. OCaml 5 introduced support for parallelism and effects, which enables the implementation of direct-style cooperative concurrency libraries like <a href="https://github.com/ocaml-multicore/eio">Eio</a>. In fact, many such libraries have been created, including <a href="https://github.com/c-cube/moonpool">Moonpool</a>, <a href="https://github.com/ocaml-multicore/domainslib">Domainslib</a>, <a href="https://github.com/robur-coop/miou">Miou</a>, and more!</p>
<p>The current difficulty is that all of these libraries are incompatible, meaning that a program that uses one cannot directly use another. This also means that, theoretically, libraries would need to provide an ever-increasing number of backends to ensure their users could combine them with the effects-based scheduler of their choice. This constant game of catch-up isn't desirable for developers or users of these libraries, and Picos aims to solve this problem.</p>
<p>Picos is an interface that enables interoperability between different libraries, created to provide users with greater flexibility and choice. Vesa's talk will introduce the nature of the problem, technical details of how Picos is implemented, details about its performance, and a vision for the future of schedulers in OCaml with Picos.</p>
<h3><a href="https://icfp24.sigplan.org/details/ocaml-2024-papers/10/Opam-2-2-and-beyond">Opam 2.2 and Beyond</a> by Raja Boujbel, Kate Deplaix, and David Allsopp</h3>
<p>The hardworking team behind <a href="https://opam.ocaml.org/blog/opam-2-2-0/"><code>opam</code> 2.2</a> are excited to present the update and share the new features! The update's main feature is native Windows support, something many community members have been looking forward to. With the project commencing in 2014, it has taken a significant amount of work and a long time to reach this important milestone – a process you can learn all about in their upcoming talk!</p>
<p>The talk will give listeners an opportunity to understand all the different elements that came together to bring native Windows support to OCaml, along with some of the other new features added in <code>opam</code> 2.2. This includes often overlooked aspects of releases like bug fixes and functional testing for stability.</p>
<p>Additionally, the talk presents insights into the maintenance of <code>opam</code> and moving towards a new release cycle, hinting at what the future will bring. If you're at all curious about <code>opam</code> and how maintainers bring new features to users, this is the talk for you!</p>
<h3><a href="https://icfp24.sigplan.org/details/ocaml-2024-papers/12/Saturn-a-library-of-verified-concurrent-data-structures-for-OCaml-5">Saturn: A Library of Verified Concurrent Data Structures for OCaml 5</a> by Clément Allain, Vesa Karvonen, Carine Morel</h3>
<p>This talk presents a useful new OCaml 5 library that offers a collection of ready-made efficient concurrent data structures that are well-tested, benchmarked, and, in part, formally verified.</p>
<p>Parallel programs are complex, and sharing data between multiple threads presents a well-known difficulty. Using locks is a known way of managing this complexity, but it is not always the best option, potentially introducing unsatisfactory performance and liveness issues. In such cases, lock-free implementations may be preferable, but their complexity can make them hard to design. <a href="https://github.com/ocaml-multicore/saturn">Saturn</a> saves the OCaml 5 developer the trouble of designing their own lock-based or lock-free data structures by providing them with a selection of standard ready-to-use structures.</p>
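<p>To give a feel for what a lock-free structure looks like, here is a minimal sketch of a Treiber stack built on the standard library's <code>Atomic</code> module. It is purely illustrative and is not Saturn's implementation; the names <code>stack</code>, <code>push</code>, and <code>pop</code> are chosen for this example only. It needs OCaml 5 because it uses <code>Domain</code>.</p>
<pre><code>(* A lock-free stack: one atomic reference to an immutable list. *)
type 'a stack = 'a list Atomic.t

let create () : 'a stack = Atomic.make []

let rec push (s : 'a stack) x =
  let old = Atomic.get s in
  (* Retry if another domain changed the head in the meantime. *)
  if not (Atomic.compare_and_set s old (x :: old)) then push s x

let rec pop (s : 'a stack) =
  match Atomic.get s with
  | [] -> None
  | x :: rest as old ->
      if Atomic.compare_and_set s old rest then Some x else pop s

let () =
  let s = create () in
  let n = 10_000 in
  (* Two domains push concurrently; no lock is taken anywhere. *)
  let worker () = for i = 1 to n do push s i done in
  let d1 = Domain.spawn worker and d2 = Domain.spawn worker in
  Domain.join d1;
  Domain.join d2;
  assert (List.length (Atomic.get s) = 2 * n)
</code></pre>
<p>Even this small example relies on a retry loop around compare-and-set, which hints at why more elaborate lock-free structures are hard to design and why a tested library is attractive.</p>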
<p>Join the talk to hear all about the library's design and details about the benchmarks, tests, and formal verification the team has done. You can look forward to a technical deep dive full of details about what creating parallel programs in OCaml really involves!</p>
<h2>Until Next Time!</h2>
<p>If you're attending ICFP this year, come and find us! We would love to chat with you about everything OCaml and functional programming. To stay up-to-date, you should follow us on <a href="https://www.linkedin.com/company/tarides">LinkedIn</a> and <a href="https://bsky.app/profile/tarides.com">Bluesky</a>. We look forward to hearing from you!</p>
]]></description><link>https://tarides.com/blog/2024-08-30-the-biggest-functional-programming-conference-of-the-year-what-are-we-bringing-to-icfp</link><guid isPermaLink="false">https://tarides.com/blog/2024-08-30-the-biggest-functional-programming-conference-of-the-year-what-are-we-bringing-to-icfp.html</guid><dc:creator><![CDATA[ Isabella Leandersson ]]></dc:creator><pubDate>Fri, 30 Aug 2024 00:00:00 GMT</pubDate></item><item><title><![CDATA[Project-Wide Occurrences: A New Navigation Feature for OCaml 5.2 Users]]></title><description><![CDATA[<p>With the release of <code>merlin-lib</code> <code>5.1-502</code> and associated <code>ocaml-lsp-server</code>, we
brought a new, exciting feature to OCaml's editor tooling: project-wide
occurrences.</p>
<p>The traditional "occurrences" query in Merlin modes, named "Find All
References" in LSP-based modes, used to return results only from the active buffer.
This is no longer the case!</p>
<p>Occurrences queries will now return every usage of
the selected identifier across all of the project's source files.</p>
<blockquote>
<p>There are some limitations that come with this initial release. When queried
from an identifier's <em>usage</em> or its <em>definition</em>, all other <em>usages</em> of it
are returned, but related <em>declarations</em> are not. In particular, this means
that queries should be made from implementation files, not interfaces (<code>.mli</code>).</p>
</blockquote>
<p>In this post, we will give an overview of the ecosystem's various parts that
need to work together for this feature to work.</p>
<h2>Try It!</h2>
<p>Before diving into technical details, let's see how it works. You can try it on any project that builds with Dune and is compatible with OCaml 5.2.</p>
<p>Update your switch by running <code>opam update &amp;&amp; opam upgrade</code> to get the required tool versions:</p>
<ul>
<li>Dune <code>&gt;= 3.16.0</code></li>
<li>Merlin <code>&gt;= 5.1-502</code></li>
<li>OCaml-LSP <code>&gt;= 1.19.0</code></li>
</ul>
<p>Since we are looking for all occurrences, we need to build an index for Merlin
and LSP. Fortunately, this is well integrated in Dune, and you can build the index
for your project by running:</p>
<pre><code>dune build @ocaml-index
</code></pre>
<p>This alias ensures that all the artifacts needed by Merlin are built. You can
also add <code>--watch</code> to always keep the configuration and the indexes up to date
while you edit your source files.</p>
<blockquote>
<p>Note that unlike <code>dune build @check</code>, the <code>@ocaml-index</code> alias will build the entire project, including tests.</p>
</blockquote>
<p>Once the index is ready, you can query for project-wide occurrences:</p>
<ul>
<li><code>merlin-project-occurrences</code> in Emacs</li>
<li><code>MerlinOccurrencesProjectWide</code> in Vim</li>
<li><code>Find All References</code> in LSP-based plugins.</li>
</ul>
<p>Here is a comparison of a references query before and after building the index:</p>
<center>
<video autoplay="" loop="" style="width: 100%">
  <source src="/blog/images/2024-08-29.pwo/pwo_side_by_side~zbgnf1BtBqKRaPalS-C9Ow.webm" type="video/webm">
</video>
</center>
<p>Now, let's dive into more technical details.</p>
<h2>High-Level Overview</h2>
<p>The base design is fairly simple. We want to iterate on every identifier of
every source file, determine their definition, and group together those that
share the same definition. This forms an index. Tools can then query that index
to get the location list of identifiers that share the same definition.</p>
<p>The following sections describe how we implemented that workflow:</p>
<ol>
<li>Computing definitions using two-step shape reduction</li>
<li>Driving the indexer tool from the build system</li>
<li>Changing Merlin to properly answer queries</li>
</ol>
<h2>Two-Step Shape Reduction</h2>
<p>Finding an identifier's definition in OCaml is a difficult problem, mostly
because of its powerful module system. A solution to this problem has been recently described
in <a href="https://icfp22.sigplan.org/details/mlfamilyworkshop-2022-papers/10/Module-Shapes-for-Modern-Tooling">a presentation at the ML
Workshop</a>: shapes.
In short, shapes are terms of a simple lambda-calculus that represent an
abstraction of the module system. To find an identifier's definition, one
can build a shape from its path and reduce (as in beta-reduction) that shape.
The result should be a leaf with a UID that uniquely represents the
definition.</p>
<p>This has been implemented in the compiler, and Merlin already takes advantage of
it to provide a precise <code>jump-to-definition</code> feature.</p>
<p>For project-wide occurrences, we perform shape reduction in two steps:</p>
<p>First, at the end of a module's compilation, the compiler iterates on the
Typedtree and <em>locally</em> reduces every identifier's shape. If the
identifier is locally (in the same unit) defined, the result will be a
fully-reduced shape holding the definition's UID. However, if the identifier is
defined in another compilation unit, the result is a partially-reduced
shape, because we cannot load the <code>.cmt</code> files of other compilation
units (that are required to finish the reduction) without breaking the
separate compilation assumptions. These resulting UIDs or partially-reduced
shapes are stored in the unit's <code>.cmt</code> file:</p>
<pre><code><span class="ocaml-keyword-other">type</span><span class="ocaml-source"> </span><span class="ocaml-source">cmt_infos</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-source">{</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-source">cmt_ident_occurrences</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-source">(</span><span class="ocaml-source">ident_with_loc</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">*</span><span class="ocaml-source"> </span><span class="ocaml-source">def_uid_or_shape</span><span class="ocaml-source">)</span><span class="ocaml-source"> </span><span class="ocaml-source">list</span><span class="ocaml-source">
</span><span class="ocaml-source">}</span><span class="ocaml-source">
</span></code></pre>
<p>Then, an external tool (called <code>ocaml-index</code>) will read that list and finish
the reduction of the shapes when necessary. This step might load the <code>.cmt</code> files
of any transitive dependency the unit relies on.</p>
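<p>The two steps can be illustrated with a toy model. The sketch below is an assumption-laden simplification: its <code>shape</code> type has only four constructors, omits functors (abstraction and application) entirely, and does not mirror the compiler's actual <code>Shape</code> module. The <code>load</code> parameter is a hypothetical stand-in for reading another unit's <code>.cmt</code> file.</p>
<pre><code>(* Toy model of shapes; the constructors are simplified assumptions. *)
type uid = string

type shape =
  | Leaf of uid                      (* a definition, identified by its UID *)
  | Struct of (string * shape) list  (* a module and its named items *)
  | Proj of shape * string           (* projection of an item, as in M.x *)
  | Comp_unit of string              (* a reference to another unit *)

(* Reduce a shape as far as possible. [load] returns the shape of a
   compilation unit when that unit is available, and None otherwise. *)
let rec reduce ~load shape =
  match shape with
  | Leaf _ | Struct _ -> shape
  | Comp_unit name -> (
      match load name with
      | Some unit_shape -> reduce ~load unit_shape
      | None -> shape)
  | Proj (m, item) -> (
      match reduce ~load m with
      | Struct items -> (
          match List.assoc_opt item items with
          | Some s -> reduce ~load s
          | None -> failwith ("unbound item " ^ item))
      | stuck -> Proj (stuck, item))

(* Step 1, at compile time: no other unit may be loaded. *)
let reduce_locally = reduce ~load:(fun _ -> None)

(* Step 2, in the indexer: other units can be loaded on demand. *)
let other_units = [ ("Other", Struct [ ("y", Leaf "Other.3") ]) ]
let finish = reduce ~load:(fun name -> List.assoc_opt name other_units)

let () =
  let local_module = Struct [ ("x", Leaf "Main.0") ] in
  (* A locally defined identifier reduces fully in step 1. *)
  assert (reduce_locally (Proj (local_module, "x")) = Leaf "Main.0");
  (* An identifier from another unit stays partially reduced... *)
  let partial = reduce_locally (Proj (Comp_unit "Other", "y")) in
  assert (partial = Proj (Comp_unit "Other", "y"));
  (* ...until step 2 finishes the reduction. *)
  assert (finish partial = Leaf "Other.3")
</code></pre>
<p>The same <code>reduce</code> function serves both steps; only the ability to load other units differs, which is what keeps the first step compatible with separate compilation.</p>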
<h2>Indexation by the Build System</h2>
<p>The tool we just introduced, <code>ocaml-index</code>, plays two roles:</p>
<ol>
<li>It finishes the reduction of the shapes stored in the <code>.cmt</code> files.</li>
<li>It aggregates locations that share the same definition's UID.</li>
</ol>
<p>The result is an index that is written to a file. Additionally, the tool
can merge multiple indexes together. This allows build systems to handle
indexation in the way they decide.</p>
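<p>As a rough sketch of these two roles, an index can be modelled as a map from a definition's UID to the locations that refer to it. The types and function names below (<code>loc</code>, <code>index_of_occurrences</code>, <code>merge</code>, <code>lookup</code>) are hypothetical, and UIDs are represented as plain strings; the actual on-disk format used by <code>ocaml-index</code> is not shown here.</p>
<pre><code>module Uid_map = Map.Make (String)

type loc = { file : string; line : int }
type index = loc list Uid_map.t

(* Group occurrences by the UID of their definition. *)
let index_of_occurrences (occs : (string * loc) list) : index =
  List.fold_left
    (fun idx (uid, loc) ->
      Uid_map.update uid
        (function None -> Some [ loc ] | Some locs -> Some (loc :: locs))
        idx)
    Uid_map.empty occs

(* Merge two indexes, concatenating locations recorded for the same UID. *)
let merge (a : index) (b : index) : index =
  Uid_map.union (fun _uid la lb -> Some (la @ lb)) a b

(* All known locations for a UID, or the empty list. *)
let lookup uid (idx : index) =
  Option.value ~default:[] (Uid_map.find_opt uid idx)

let () =
  let lib = index_of_occurrences [ ("Lib.0", { file = "lib.ml"; line = 3 }) ] in
  let bin =
    index_of_occurrences
      [ ("Lib.0", { file = "main.ml"; line = 7 });
        ("Main.1", { file = "main.ml"; line = 9 }) ]
  in
  let all = merge lib bin in
  assert (List.length (lookup "Lib.0" all) = 2);
  assert (lookup "Unknown.0" all = [])
</code></pre>
<p>Because merging is just a union of maps, a build system is free to index each library or executable separately and combine the results afterwards.</p>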
<p>We only provide rules for Dune right now, but the tools themselves are
build-system agnostic. The Dune rules are as follows:</p>
<p>For every <code>library</code> or <code>executable</code> stanza, we index every <code>.cmt</code> file and store
the results into an <code>index</code> file for the stanza.</p>
<ul>
<li>This process, similar to linking, depends on every transitive dependency of
the stanza being indexed, since shape reduction might require loading those
<code>cmt</code> files.</li>
<li>Additionally, if any of the dependencies' indexes have changed, each stanza's index must be rebuilt.</li>
</ul>
<p>This is a somewhat simple but heavy process, and it could be refined in the future. Right now it performs well enough to provide a usable watch mode in small to fairly large projects (like Dune itself).</p>
<h2>Index Configuration and Reading</h2>
<p>Last but not least, we need a way for Merlin to know where the <code>index</code> files are
located and how to read them.</p>
<p>This is done by using a new configuration directive <code>INDEX</code>. It can be used to
provide one or more <code>index</code> files to Merlin. Then, querying for all the usages of
the identifier under the cursor is done in the following way:</p>
<ul>
<li>Identify the path of the identifier under the cursor.</li>
<li>Reduce the shape corresponding to this path to get the definition's UID.</li>
<li>Look up this UID in the <code>index</code> files and in the current buffer's index
(which is computed by Merlin).</li>
<li>Return all the locations found.</li>
</ul>
<h2>Future Work</h2>
<p>Thank you for reading this post! We hope you will have a lot of fun exploring
your codebases using this new feature. We have a lot of exciting improvements on
our roadmap, some of which involve returning the declarations linked to an
identifier and providing project-wide renaming queries.</p>
<p>If you are interested in learning more about these features or in contributing,
please have a look at <a href="https://github.com/ocaml/merlin/issues/1780">this tracking
issue</a>. You can also have a look at
the
<a href="https://discuss.ocaml.org/t/ann-project-wide-occurrences-in-merlin-and-lsp/14847">announcement</a>
and <a href="https://github.com/ocaml/merlin/wiki/Get-project%E2%80%90wide-occurrences">wiki
page</a>.
Finally, feel free to attend future <a href="https://github.com/ocaml/merlin/wiki/Public-dev%E2%80%90meetings">Merlin public
meetings</a> and
watch the <a href="https://icfp24.sigplan.org/details/ocaml-2024-papers/3/Project-wide-occurrences-for-OCaml-a-progress-report">talk at the OCaml
Workshop</a>
during ICFP!</p>
]]></description><link>https://tarides.com/blog/2024-08-28-project-wide-occurrences-a-new-navigation-feature-for-ocaml-5-2-users</link><guid isPermaLink="false">https://tarides.com/blog/2024-08-28-project-wide-occurrences-a-new-navigation-feature-for-ocaml-5-2-users.html</guid><dc:creator><![CDATA[ Ulysse Gérard ]]></dc:creator><pubDate>Wed, 28 Aug 2024 00:00:00 GMT</pubDate></item><item><title><![CDATA[How TSan Makes OCaml Better: Data Races Caught and Fixed]]></title><description><![CDATA[<p>Parallel programming opens up brand-new possibilities. Using multiple cores means that users can benefit from powerful OCaml features (like formal proofs and high security) while enjoying greater performance, enabling them to <a href="/blog/2022-12-20-how-nomadic-labs-used-multicore-processing-to-create-a-faster-blockchain/">improve their services or projects</a>.</p>
<p>However, introducing such a significant change to the OCaml ecosystem would not be practical without providing tools that help users ensure memory safety in parallel programming. This is where <a href="/blog/2024-04-24-under-the-hood-developing-multicore-property-based-tests-for-ocaml-5/">multicore tests</a> come in, as does ThreadSanitizer (TSan) support for OCaml. We have published two previous posts on TSan, <a href="/blog/2023-10-18-off-to-the-races-using-threadsanitizer-in-ocaml/">an overview of the tool</a> and an <a href="/blog/2024-01-17-what-are-data-races-and-do-they-threaten-your-business/">introduction to the danger of data races in general</a>. This post will give you a behind-the-scenes look at how we have used TSan to find and fix races in the OCaml runtime. Official support for TSan arrived with OCaml 5.2, but as you can tell <a href="https://github.com/ocaml/ocaml/tree/trunk/testsuite/tests/tsan">from the repository</a>, we have been using TSan internally for a while now.</p>
<h2>Catching Races in OCaml</h2>
<p>As a result of TSan coming with the <a href="/blog/2024-05-15-the-ocaml-5-2-release-features-and-fixes/">OCaml 5.2 update</a>, several bug fixes in the same update addressed data races. A data race occurs when two accesses are made to the same memory location, at least one of them is a <code>write</code>, and no order is enforced between them. Data race bugs can be hard to spot, but since they can result in unexpected behaviours, they are a high-priority item to fix.</p>
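<p>As a small illustration of that definition in OCaml code (not in the runtime), the sketch below contrasts a racy counter with a race-free one. It assumes OCaml 5 for <code>Domain</code>; the names <code>plain</code>, <code>counter</code>, and <code>run</code> are made up for this example. In a TSan-enabled switch, one would expect the accesses to <code>plain</code> to be the kind of thing TSan reports.</p>
<pre><code>let n = 100_000

(* Racy: two domains read and write [plain] with no ordering between them. *)
let plain = ref 0
let racy () = for _i = 1 to n do plain := !plain + 1 done

(* Race-free: every access goes through the Atomic module. *)
let counter = Atomic.make 0
let safe () = for _i = 1 to n do Atomic.incr counter done

(* Run the same function on two domains and wait for both. *)
let run f =
  let d1 = Domain.spawn f and d2 = Domain.spawn f in
  Domain.join d1;
  Domain.join d2

let () =
  run racy;
  run safe;
  (* The atomic counter is exact; the plain one may have lost updates. *)
  assert (Atomic.get counter = 2 * n);
  Printf.printf "plain = %d, atomic = %d\n" !plain (Atomic.get counter)
</code></pre>
<p>The racy version remains memory-safe, but its final value is unpredictable, which is exactly the kind of unexpected behaviour that makes such bugs worth hunting down.</p>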
<p>It is important to note that not all data races are equal. In OCaml, the memory model guarantees that memory safety is preserved even when data races occur. This makes data races in OCaml much 'safer' than in many other languages, where races can impact memory safety. OCaml programs also require support from the OCaml runtime which provides low-level operations such as memory allocation and garbage collection. The OCaml runtime is written in C and must be data race-free according to the C memory model since a data race in the runtime would impact the validity of the whole program. While TSan has been integrated to detect data races in OCaml code, it has also proven invaluable in detecting errors in OCaml's runtime, many of which were subsequently fixed in 5.2.</p>
<h2>What has TSan Caught so Far?</h2>
<p>Ecosystem contributors make continuous efforts towards maintenance, and as part of this work, use TSan to check the OCaml runtime. To further simplify TSan usage we have added a TSan Continuous Integration (CI) run that executes the OCaml test suite in a TSan-enabled switch, automatically detecting inadvertently introduced data races in the runtime. This has allowed us to catch and fix several data races.
Some of these include:</p>
<ul>
<li>
<p><strong>Fixing a Race in the Minor GC:</strong> PR <a href="https://github.com/ocaml/ocaml/pull/12595">#12595</a> describes a race condition occurring when <code>caml_collect_gc_stats_sample</code> made calls to <code>domain_terminate</code>, and PR <a href="https://github.com/ocaml/ocaml/pull/12597">#12597</a> outlines the fix implemented in 5.2. This race on internal garbage collector data could cause the program to report incorrect garbage collection statistics.</p>
</li>
<li>
<p><strong>Data Race Between Marking and Sweeping Garbage Collector Phases:</strong> PR <a href="https://github.com/ocaml/ocaml/pull/12934">#12934</a> fixes a <a href="https://github.com/ocaml/ocaml/issues/12916">race between marking and sweeping functions</a> caught by TSan. When the garbage collector marks a value for collection, sweeping code (a later phase of the GC, effectively marking unreachable values as free space) in another thread may read the value simultaneously. This is normal behaviour, but the memory read was not marked as an atomic operation as it should have been, introducing a risk of undefined behaviour.</p>
</li>
<li>
<p><strong>Data Race on Global Pools Arrays:</strong> PR <a href="https://github.com/ocaml/ocaml/pull/12755">#12755</a> addresses races on <code>global_avail_pools</code> and <code>global_full_pools</code> members of the <code>struct pool_freelist</code> in <code>shared_heap.c</code>.<br>
For performance reasons, OCaml's runtime allocates major heap memory in chunks stored in pools that can be recycled across domains. While the algorithm for accessing them in parallel was correct, atomic qualifiers and explicitly qualified memory operations were missing, causing TSan to report unsynchronised memory accesses. The PR adds the necessary qualifiers.</p>
</li>
<li>
<p><strong>Data Races in <code>minor_gc.c</code>:</strong> <a href="https://github.com/ocaml/ocaml/pull/12737">PR #12737</a> fixes two races: one in the minor GC, occurring when values in the remembered set are promoted, and one in the <code>Dynlink</code> library, caused by an incorrect C function that tried to access an OCaml value even when the garbage collector might be running.</p>
</li>
<li>
<p><strong>Data Race fix for #12799:</strong> PR <a href="https://github.com/ocaml/ocaml/pull/12851">#12851</a> fixes a bug described in issue <a href="https://github.com/ocaml/ocaml/issues/12799">#12799</a>. When a <a href="https://v2.ocaml.org/manual/parallelism.html#s:par_domains">domain</a> terminates, it emits a <a href="https://ocaml.org/manual/5.1/runtime.html">runtime event</a> – a piece of information that can be monitored using a dedicated API to debug or profile the performance of OCaml programs. However, under certain circumstances, the emission of this event could race with the shutdown of the runtime events system itself, possibly leading to incorrect information being emitted or, worse, memory corruption. Proper synchronisation has been implemented to ensure this doesn't happen.</p>
</li>
<li>
<p><strong>Data Race When Using the Debug Runtime:</strong> PR <a href="https://github.com/ocaml/ocaml/pull/12969">#12969</a> resolves a data race involving <code>caml_scan_stack</code> and <code>caml_free_stack</code>. It was possible for two <a href="https://ocaml.org/manual/5.1/parallelism.html#s:par_domains">domains</a> to perform garbage collector marking on the same <a href="https://ocaml.org/manual/5.1/effects.html#s:effects-fibers">fibre</a> and, in very rare cases, for this to happen while that fibre was terminating. This caused TSan to report a data race. While the code was correct in practice, we fixed the access to make it correct according to the C11 memory model, thus avoiding undefined behaviour.</p>
</li>
</ul>
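<p>The fixes above live in the C runtime, but the same principle applies to OCaml code: mutable state shared between domains needs atomic (or otherwise synchronised) access. As a minimal sketch, assuming OCaml 5 and using only the standard library's <code>Domain</code> and <code>Atomic</code> modules, the counter below is updated from two domains without a data race. The function name <code>count_in_parallel</code> is purely illustrative.</p>
<pre><code class="language-ocaml">(* Illustrative only: two domains increment one shared counter.
   [Atomic.incr] makes each update an atomic read-modify-write. *)
let count_in_parallel ~per_domain =
  let counter = Atomic.make 0 in
  let work () =
    for _i = 1 to per_domain do
      Atomic.incr counter
    done
  in
  let d1 = Domain.spawn work in
  let d2 = Domain.spawn work in
  (* Wait for both domains before reading the final value. *)
  Domain.join d1;
  Domain.join d2;
  Atomic.get counter
</code></pre>
<p>Replacing the <code>Atomic.t</code> with a plain <code>ref</code> would turn the increments into unsynchronised accesses, which is the kind of pattern TSan is designed to report.</p>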
<h2>Until Next Time!</h2>
<p>Seeing a tool we have developed so quickly benefit the larger ecosystem is excellent. TSan helps developers test their parallel programs for potential risks they would have difficulty discovering. We look forward to seeing users put TSan to the test and share the results!</p>
<p>The OCaml community welcomes contributions and feedback and invites users to share any issues in the <a href="https://github.com/ocaml/ocaml">OCaml GitHub repo</a>. The discussion forum <a href="https://discuss.ocaml.org/">OCaml Discuss</a> is another place to share your thoughts and get input from others in the community.</p>
<p>Would you like to stay up-to-date with us? Follow us on <a href="https://bsky.app/profile/tarides.com">Bluesky</a> and <a href="https://www.linkedin.com/company/tarides">LinkedIn</a> to see regular posts on our projects, announcements, tutorials, and more.</p>
]]></description><link>https://tarides.com/blog/2024-08-21-how-tsan-makes-ocaml-better-data-races-caught-and-fixed</link><guid isPermaLink="false">https://tarides.com/blog/2024-08-21-how-tsan-makes-ocaml-better-data-races-caught-and-fixed.html</guid><dc:creator><![CDATA[ Olivier Nicole, Fabrice Buoro, Isabella Leandersson ]]></dc:creator><pubDate>Wed, 21 Aug 2024 00:00:00 GMT</pubDate></item><item><title><![CDATA[Monoculture of Insecurity: How CrowdStrike's Outage Exposes the Risks of Unchecked Complexity in Cybersecurity]]></title><description><![CDATA[<p>Everyone is talking about the <a href="https://www.crowdstrike.com/en-us/">CrowdStrike</a> update that caused global chaos earlier this month, a seismic event in the IT world. There are many <a href="https://cheriot.org/security/philosophy/2024/07/19/crowdstrike-is-the-opposite-of-cheriot.html">great articles and blog posts</a> dissecting the event and suggesting ways to avoid a repeat. Rather than add our voice to the chorus and explain how a small change could have avoided the entire palaver, we will approach the topic more broadly.</p>
<p>While it is helpful to understand what happened with CrowdStrike, the next major outage will likely arise from a different flaw altogether. It is therefore essential to consider the risk factors behind these major cyber events and ways to reduce those risks.</p>
<p>The cybersecurity sector is constantly facing new threats and challenges. How can we transform these obstacles into opportunities for growth and improvement, ensuring greater protection for those who rely on our services? This blog post explores the answers to this question, presenting some lesser-known solutions that deserve consideration, and providing a fresh perspective on staying ahead of emerging threats and building more resilient defences.</p>
<h2>What we Know so far</h2>
<p>Affecting at least 8.5 million Windows machines, the outage significantly disrupted the aviation, broadcasting, and healthcare industries. The <a href="https://www.bbc.com/news/articles/cpe3zgznwjno">BBC labelled</a> it “probably the largest ever cyber-event” and “one of the worst cyber-incidents in history.” But what actually happened?</p>
<p>The outage arose from something as commonplace as a software update for the Falcon platform. CrowdStrike’s own <a href="https://www.crowdstrike.com/falcon-content-update-remediation-and-guidance-hub/">preliminary Post Incident Review</a> indicates that a security content configuration update delivered an undetected error (the now infamous Channel File 291) to user machines. The error slipped through the validation checks due to a bug, and trust in the tests allowed a faulty file with an out-of-bounds memory error to reach production. At its root, the global outage appears to be caused by unsafe parsers (<a href="https://cwe.mitre.org/top25/archive/2023/2023_top25_list.html">a classic error</a>), resulting in a parsing bug.</p>
<h2>What are the Best Practices for the Cybersecurity Sector?</h2>
<p>Now that we’ve covered this background, it’s time to look at some of the underlying factors that played a role in the CrowdStrike outage and the ones yet to come. The global outage served as an excellent wake-up call to the entire industry about the necessity of maintaining the checks and balances that keep our global systems secure.</p>
<h3>Supply Chain Delivery</h3>
<p>Let’s start with the fundamentals: how did the flawed update get delivered to so many? When deploying updates at scale, industry best practices such as staggered deployments and rollback options can help mitigate risks. In a best-case scenario, updates are staggered, or rolled out incrementally, starting with 1% of machines and, if all goes well, moving on to 5%, and so on. One can also use automated systems that roll back updates when a fault is detected, undoing the damage and keeping users’ machines operational. With either method, preferably both in combination, a flaw in an update has negligible impact, avoiding the kind of international chaos we saw on Friday.</p>
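<p>As a rough illustration of how a staggered rollout with automatic rollback limits the blast radius, here is a minimal OCaml sketch. Everything in it is hypothetical: the stage percentages beyond 1% and 5%, the <code>healthy</code> callback standing in for real monitoring, and the <code>rollout</code> function itself are assumptions made for the example, not a description of any vendor's process.</p>
<pre><code class="language-ocaml">(* Hypothetical staged rollout: the stages and the health check are
   assumptions for illustration only. *)
type outcome =
  | Completed
  | Rolled_back of int (* percentage at which a fault was detected *)

(* [healthy pct] stands in for whatever monitoring reports once the
   update has reached [pct] percent of machines. The rollout stops at
   the first unhealthy stage instead of continuing to wider stages. *)
let rollout ~stages ~healthy =
  let rec go = function
    | [] -&gt; Completed
    | pct :: rest -&gt; if healthy pct then go rest else Rolled_back pct
  in
  go stages

let () =
  (* A faulty update is caught at the first stage. *)
  match rollout ~stages:[ 1; 5; 25; 100 ] ~healthy:(fun _ -&gt; false) with
  | Rolled_back pct -&gt; Printf.printf "rolled back at %d%%\n" pct
  | Completed -&gt; print_endline "completed"
</code></pre>
<p>In this model, a fault that shows up immediately never travels beyond the first, smallest group of machines.</p>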
<p>Another critical aspect of the supply chain is the internal testing performed before deployment to ensure quality and safety. The recent chaos would likely have been preventable if the internal testing processes had caught the flaw earlier. While cost-cutting measures may be tempting, they can ultimately lead to much greater costs in the event of a cybersecurity breach or production downtime.</p>
<h3>OS Monocultures: Do you Really Need a Full Windows OS Stack to run an Airport Screen?</h3>
<p>The IT industry is increasingly dominated by a few major operating systems, primarily Microsoft Windows. This leads to the emergence of 'OS monocultures', with one dominant provider overshadowing a few large ones (Linux and Mac), leaving little space for diverse, smaller suppliers. While these monocultures offer benefits, they also pose significant risks, and any introduced vulnerabilities can cause widespread damage.</p>
<p>This is what happened with CrowdStrike. The cybersecurity firm has around 20% market share, and because everyone uses the same stack, a single bug can have an enormous impact. One way to reduce the risk of a shared stack is to generate a unique stack for each application; that way, bugs are contained in that stack. One way of achieving this is by using <a href="https://en.wikipedia.org/wiki/Unikernel">unikernels</a> to build a small, highly specified stack with only what is required to run the application. <a href="https://mirage.io">MirageOS</a> is a library operating system that constructs unikernels to create secure, high-performance network applications with small attack surfaces. You don’t actually need to install a generic operating system to manage a single-purpose appliance, such as an airport screen.</p>
<h3>Formal Verification, Testing, and Organisational Change</h3>
<p>The road to security and reliability is paved with organisational changes that improve the overall stability of systems and reduce risk, impacting the way we develop software from start to finish. Scott Hanselman, the VP of Developer Community at Microsoft, highlights this dynamic in one of his <a href="https://x.com/shanselman/status/1814458774704607572">posts on X</a>:</p>
<blockquote>
<p>“It’s always one line of code, but it’s NEVER one person... Engineering practices failed to find a bug multiple times, regardless of the seniority of the human who checked that code in. Solving the larger system thinking SDLC matters more than the null pointer check.”</p>
</blockquote>
<p>But what does changing the software development life cycle on an organisational level look like? For one, it involves spending the time and cost of <a href="/blog/2024-04-24-under-the-hood-developing-multicore-property-based-tests-for-ocaml-5/">creating comprehensive tests</a> that catch bugs easily missed by developers before they ever reach production.</p>
<p>Including formal verification, for example of the device driver and the code it executes, is another aspect of software development that can prevent faults from reaching production. Using formal verification, developers can mathematically prove that a program behaves according to its formal model and satisfies a defined property.</p>
<p>As it relates to the CrowdStrike incident, formal verification of parsers is challenging but not impossible. For example, the <a href="https://www.microsoft.com/en-us/research/blog/everparse-hardening-critical-attack-surfaces-with-formally-proven-message-parsers/">EverParse</a> framework emits secure, formally verified code for parsers that can be used in programs, including OCaml programs. Creating a software development culture that includes formal verification, fuzz testing, and other tests decreases the risk of failures slipping through the net.</p>
<h3>The Role of Type Safety</h3>
<p>Finally, let’s look at perhaps a more obvious topic in the light of the fault in Channel File 291. Using type- and memory-safe languages, like OCaml and Rust, prevents out-of-bounds memory errors and a whole class of other bugs.</p>
<p>That’s not the key takeaway, however. The key takeaway is about complexity. The assertion that “the central enemy of reliability is complexity ... complex systems tend to not be entirely understood by anyone” from a <a href="https://ccianet.org/wp-content/uploads/2003/09/cyberinsecurity%20the%20cost%20of%20monopoly.pdf">cybersecurity paper authored by several industry specialists</a> holds true in this case. By eliminating a whole class of errors at compile time, a language like OCaml with a strong type system critically reduces the kind of complexity that leads to cyber-insecurity. It also simplifies the developer workflow when it comes to catching bugs.</p>
<p>Most importantly, from an industry standpoint, and as mentioned above, the faulty file that caused the outage remained undetected, even in the face of extensive stress tests. Building critical systems with <a href="https://www.security.gov.uk/guidance/secure-by-design/principles/">secure-by-design</a> principles includes using building blocks that contribute to the robustness of the entire system by preventing faults and reducing complexity. One facet of this puzzle is to use languages immune to certain kinds of bugs, such as type- and memory-safe languages, but that is not enough. Varied and rigorous tests, like fuzz testing, are a necessary complement to any language. For example, the <a href="https://mirage.io/">MirageOS</a> network stack has had <a href="https://www.usenix.org/conference/usenixsecurity15/technical-sessions/presentation/kaloper-mersinjak">fuzz testing performed on it</a> to prevent parser issues, providing another layer of safety to the already type-safe OCaml.</p>
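<p>To make the out-of-bounds point concrete, here is a small OCaml sketch of a parser step that reads one field from already-split input. The names <code>field</code> and <code>fields</code> are hypothetical and the example is not modelled on any real parser. It shows two things: a lookup that returns an <code>option</code> so the caller must handle the missing case, and the fact that a direct array access is bounds-checked at run time (unless the program is compiled with the <code>-unsafe</code> flag).</p>
<pre><code class="language-ocaml">(* Illustrative parser step: read field [i] from already-split input.
   The names [field] and [fields] are hypothetical. *)
let field fields i =
  if i &gt;= 0 &amp;&amp; i &lt; Array.length fields then Some fields.(i) else None

let () =
  let fields = [| "a"; "b"; "c" |] in
  (* Asking for a field that is not there yields [None]. *)
  assert (field fields 3 = None);
  (* A direct access is bounds-checked too: it raises an exception
     instead of reading unrelated memory. *)
  match fields.(3) with
  | _ -&gt; assert false
  | exception Invalid_argument _ -&gt; ()
</code></pre>
<p>Neither form prevents a logic error in the parser, which is why the testing practices described above remain necessary, but both turn a silent invalid memory read into a failure the program can detect and handle.</p>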
<h2>Join in the Conversation</h2>
<p>In essence, the causes of, and lessons from, the CrowdStrike outage are more complex than they may seem at first glance. It’s easy to reduce it to a line of faulty code, but the real takeaway is that the entire industry needs to implement better practices to safeguard users from risk. This outage was not the result of a cyber attack or malware, but the next one could be, and we cannot let the fate of our global networks rest entirely on endpoint security measures like antivirus programs, firewall management, and VPNs. We need to build foundational secure systems from the ground up.</p>
<p>In this context, and in light of <a href="/blog/2024-03-07-a-time-for-change-our-response-to-the-white-house-cybersecurity-press-release/">global calls for change in how cybersecurity is addressed</a>, it is the right time to have these conversations and strengthen the sector from within. We believe the best approach is to adopt a <a href="/blog/2023-07-05-zero-day-attacks-what-are-they-and-can-a-language-like-ocaml-protect-you/">secure-by-design strategy</a> implemented in a type-safe and reliable language like <a href="https://ocaml.org/">OCaml</a>.</p>
<p>There are many perspectives on this, however, and we want to hear yours. Connect with us on <a href="https://bsky.app/profile/tarides.com">Bluesky</a> and <a href="https://www.linkedin.com/company/tarides">LinkedIn</a> and share your thoughts – we look forward to hearing from you!</p>
]]></description><link>https://tarides.com/blog/2024-08-01-monoculture-of-insecurity-how-crowdstrike-s-outage-exposes-the-risks-of-unchecked-complexity-in-cybersecurity</link><guid isPermaLink="false">https://tarides.com/blog/2024-08-01-monoculture-of-insecurity-how-crowdstrike-s-outage-exposes-the-risks-of-unchecked-complexity-in-cybersecurity.html</guid><dc:creator><![CDATA[ Miklos Tomka, Isabella Leandersson ]]></dc:creator><pubDate>Thu, 01 Aug 2024 00:00:00 GMT</pubDate></item><item><title><![CDATA[Creating the SyntaxDocumentation Command - Part 3: VSCode Platform Extension]]></title><description><![CDATA[<p>In the final installment of our series on the <code>SyntaxDocumentation</code> command, we delve into its integration within the OCaml VSCode Platform extension. Building on our previous discussions about Merlin and OCaml LSP, this article explores how to make <code>SyntaxDocumentation</code> an opt-in feature in the popular VSCode editor.</p>
<p>In the first part of this series, <a href="/blog/2024-04-17-creating-the-syntaxdocumentation-command-part-1-merlin/">Creating the SyntaxDocumentation Command - Part 1: Merlin</a>, we explored how to create a new command in Merlin, particularly the <code>SyntaxDocumentation</code> command. In the second part, <a href="/blog/2024-06-12-creating-the-syntaxdocumentation-command-part-2-ocaml-lsp/">Creating the SyntaxDocumentation Command - Part 2: OCaml LSP</a>, we looked at how to implement this feature in OCaml LSP in order to enable visual editors to trigger the command with actions such as hovering. In this third and final installment, you will learn how <code>SyntaxDocumentation</code> integrates into the OCaml VSCode Platform extension as an opt-in feature, enabling users to toggle it on/off depending on their preference.</p>
<h2>VSCode Editor</h2>
<p><a href="https://code.visualstudio.com/">Visual Studio Code</a> is a free, open-source, cross-platform code editor from Microsoft that is very popular among developers.
Some of its features include:</p>
<ul>
<li>Built-in Git support</li>
<li>Easy debugging of code right from the editor with an interactive console</li>
<li>Built-in extension manager with lots of available extensions to download</li>
<li>Supports a huge number of programming languages, including syntax highlighting</li>
<li>Integrated terminal and many more features</li>
</ul>
<h2>OCaml Platform Extension for VSCode</h2>
<p>The VSCode OCaml Platform extension enhances the development experience for OCaml programmers. It is itself written in the OCaml programming language using bindings to the VSCode API and then compiled into JavaScript with <a href="https://github.com/ocsigen/js_of_ocaml"><code>js_of_ocaml</code></a>. It provides language support features such as <code>syntax-highlighting</code>, <code>go-to-definition</code>, <code>auto-completion</code>, and <code>type-on-hover</code>. These key functionalities are powered by the OCaml Language Server (<code>ocamllsp</code>), which can be installed using popular package managers like <a href="https://opam.ocaml.org/">opam</a> and <a href="https://esy.sh/">esy</a>. Users can easily configure the extension to work with different sandbox environments, ensuring a tailored setup for various project needs. Additionally, the extension includes comprehensive settings and command options, making it very versatile for both beginner and advanced OCaml developers.</p>
<p>The OCaml Platform Extension for VSCode gives us a nice UI for interacting with OCaml-LSP. We can configure settings for the server, interact with switches, browse the AST, and much more. Our main focus is on adding a <code>checkbox</code> that allows users to activate or deactivate <code>SyntaxDocumentation</code> in OCaml LSP's <code>hover</code> response. I have limited this article's scope to just the files relevant to implementing this, while giving a brief tour of how the extension is built.</p>
<h2>The Implementation</h2>
<h3>Extension Manifest</h3>
<p>Every VSCode extension has a manifest file, <a href="https://github.com/ocamllabs/vscode-ocaml-platform/blob/master/package.json">package.json</a>, at the root of the extension directory. The <code>package.json</code> contains a mix of Node.js fields, such as scripts and <code>devDependencies</code>, and VS Code specific fields, like <code>publisher</code>, <code>activationEvents</code>, and <code>contributes</code>.
Our manifest file contains general information such as:</p>
<ul>
<li><strong>Name</strong>: OCaml Platform</li>
<li><strong>Description</strong>: Official OCaml language extension for VSCode</li>
<li><strong>Version</strong>: 1.14.2</li>
<li><strong>Publisher</strong>: OCaml Labs</li>
<li><strong>Categories</strong>: Programming Languages, Debuggers</li>
</ul>
<p>We also have commands that act as action events for our extension. These commands are used to perform a wide range of things, like navigating the AST, upgrading packages, deleting a switch, etc.
An example of a command to open the AST explorer is written as:</p>
<pre><code class="language-json">{
    "command": "ocaml.open-ast-explorer-to-the-side",
    "category": "OCaml",
    "title": "Open AST explorer"
}
</code></pre>
<p>For our case, enabling/disabling <code>SyntaxDocumentation</code> is a configuration setting for our language server, so we indicate this in the configurations section:</p>
<pre><code class="language-json">"ocaml.server.syntaxDocumentation": {
    "type": "boolean",
    "default": false,
    "markdownDescription": "Enable/Disable syntax documentation"
}
</code></pre>
<h3>Extension Instance</h3>
<p>The file <a href="https://github.com/ocamllabs/vscode-ocaml-platform/blob/master/src/extension_instance.ml"><code>extension_instance.ml</code></a> handles the setup and configuration of various components of the OCaml VSCode extension and ensures that features like the language server and documentation are properly initialised. Its key functionalities are:</p>
<ul>
<li><strong>Managing the Extension State</strong>: It uses a record type that encapsulates the state of the extension, holding information about the sandbox, REPL, OCaml version, LSP client, documentation server, and various other settings.</li>
</ul>
<pre><code><span class="ocaml-keyword-other">type</span><span class="ocaml-source"> </span><span class="ocaml-source">t</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-source">{</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword-other">mutable</span><span class="ocaml-source"> </span><span class="ocaml-source">sandbox</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Sandbox</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">t</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword-other">mutable</span><span class="ocaml-source"> </span><span class="ocaml-source">repl</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Terminal_sandbox</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">t</span><span class="ocaml-source"> </span><span class="ocaml-source">option</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword-other">mutable</span><span class="ocaml-source"> </span><span class="ocaml-source">ocaml_version</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Ocaml_version</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">t</span><span class="ocaml-source"> </span><span class="ocaml-source">option</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword-other">mutable</span><span class="ocaml-source"> </span><span class="ocaml-source">lsp_client</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-constant-language-capital-identifier">LanguageClient</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">t</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">*</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Ocaml_lsp</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">t</span><span class="ocaml-source">)</span><span class="ocaml-source"> </span><span class="ocaml-source">option</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword-other">mutable</span><span class="ocaml-source"> </span><span class="ocaml-source">documentation_server</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Documentation_server</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">t</span><span class="ocaml-source"> </span><span class="ocaml-source">option</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-source">documentation_server_info</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">StatusBarItem</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">t</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-source">sandbox_info</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">StatusBarItem</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">t</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-source">ast_editor_state</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Ast_editor_state</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">t</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword-other">mutable</span><span class="ocaml-source"> </span><span class="ocaml-source">codelens</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source"> </span><span class="ocaml-support-type">bool</span><span class="ocaml-source"> </span><span class="ocaml-source">option</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword-other">mutable</span><span class="ocaml-source"> </span><span class="ocaml-source">extended_hover</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source"> </span><span class="ocaml-support-type">bool</span><span class="ocaml-source"> </span><span class="ocaml-source">option</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword-other">mutable</span><span class="ocaml-source"> </span><span class="ocaml-source">dune_diagnostics</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source"> </span><span class="ocaml-support-type">bool</span><span class="ocaml-source"> </span><span class="ocaml-source">option</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword-other">mutable</span><span class="ocaml-source"> </span><span class="ocaml-source">syntax_documentation</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source"> </span><span class="ocaml-support-type">bool</span><span class="ocaml-source"> </span><span class="ocaml-source">option</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source">
</span><span class="ocaml-source">}</span><span class="ocaml-source">
</span></code></pre>
<ul>
<li>
<p><strong>Interacting With the Language Server</strong>: This extension needs to interact with the OCaml language server (<code>ocamllsp</code>) to provide features like code completion, diagnostics, and other language-specific functionalities.</p>
</li>
<li>
<p><strong>Documentation Server Management</strong>: The file includes functionality to start, stop, and manage the documentation server, which provides documentation lookup for installed OCaml packages.</p>
</li>
<li>
<p><strong>Handling Configuration</strong>: This extension allows users to configure settings such as code lens, extended hover, diagnostics, and syntax documentation. These settings are sent to the language server to adjust its behaviour accordingly. For <code>SyntaxDocumentation</code>, whenever the user toggles the checkbox, the server should set the correct configuration parameters. This is done mainly using two functions, <code>set_configuration</code> and <code>send_configuration</code>.</p>
</li>
</ul>
<pre><code><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">
</span><span class="ocaml-source">
</span><span class="ocaml-comment-block">(*</span><span class="ocaml-comment-block"> Set configuration </span><span class="ocaml-comment-block">*)</span><span class="ocaml-source">
</span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">set_configuration</span><span class="ocaml-source"> </span><span class="ocaml-source">t</span><span class="ocaml-source"> ~</span><span class="ocaml-source">syntax_documentation</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-source">t</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">syntax_documentation</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">&lt;-</span><span class="ocaml-source"> </span><span class="ocaml-source">syntax_documentation</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword-other">match</span><span class="ocaml-source"> </span><span class="ocaml-source">t</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">lsp_client</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">with</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword-other">|</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">None</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-unit">()</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword-other">|</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Some</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-source">client</span><span class="ocaml-keyword-other-ocaml punctuation-comma punctuation-separator">,</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-constant-language">_</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Ocaml_lsp</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">t</span><span class="ocaml-source">)</span><span class="ocaml-source">)</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source">
</span><span class="ocaml-source">      </span><span class="ocaml-source">send_configuration</span><span class="ocaml-source"> ~</span><span class="ocaml-source">syntax_documentation</span><span class="ocaml-source"> </span><span class="ocaml-source">client</span><span class="ocaml-source">
</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">
</span></code></pre>
<pre><code><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">
</span><span class="ocaml-source">
</span><span class="ocaml-comment-block">(*</span><span class="ocaml-comment-block"> Send configuration </span><span class="ocaml-comment-block">*)</span><span class="ocaml-source">
</span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">send_configuration</span><span class="ocaml-source"> </span><span class="ocaml-source">~</span><span class="ocaml-source">syntax_documentation</span><span class="ocaml-source"> </span><span class="ocaml-source">client</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">syntaxDocumentation</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-constant-language-capital-identifier">Option</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">map</span><span class="ocaml-source"> </span><span class="ocaml-source">syntax_documentation</span><span class="ocaml-source"> ~</span><span class="ocaml-source">f</span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source">(</span><span class="ocaml-keyword-other">fun</span><span class="ocaml-source"> </span><span class="ocaml-source">enable</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source">
</span><span class="ocaml-source">        </span><span class="ocaml-constant-language-capital-identifier">Ocaml_lsp</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-constant-language-capital-identifier">OcamllspSettingEnable</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">create</span><span class="ocaml-source"> ~</span><span class="ocaml-source">enable</span><span class="ocaml-source">)</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword-other">in</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">settings</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-constant-language-capital-identifier">Ocaml_lsp</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-constant-language-capital-identifier">OcamllspSettings</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">create</span><span class="ocaml-source">
</span><span class="ocaml-source">      ~</span><span class="ocaml-source">syntaxDocumentation</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword-other">in</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">payload</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">settings</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source">
</span><span class="ocaml-source">      </span><span class="ocaml-constant-language-capital-identifier">LanguageClient</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-constant-language-capital-identifier">DidChangeConfiguration</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">create</span><span class="ocaml-source">
</span><span class="ocaml-source">        ~</span><span class="ocaml-source">settings</span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source">(</span><span class="ocaml-constant-language-capital-identifier">Ocaml_lsp</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-constant-language-capital-identifier">OcamllspSettings</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">t_to_js</span><span class="ocaml-source"> </span><span class="ocaml-source">settings</span><span class="ocaml-source">)</span><span class="ocaml-source">
</span><span class="ocaml-source">        </span><span class="ocaml-constant-language-unit">()</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-keyword-other">in</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-constant-language-capital-identifier">LanguageClient</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-constant-language-capital-identifier">DidChangeConfiguration</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">t_to_js</span><span class="ocaml-source"> </span><span class="ocaml-source">settings</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword-other">in</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-constant-language-capital-identifier">LanguageClient</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">sendNotification</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-source">client</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-string-quoted-double">workspace/didChangeConfiguration</span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-source">payload</span><span class="ocaml-source">
</span><span class="ocaml-source">
</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">
</span></code></pre>
<h3>Interacting With OCaml LSP</h3>
<p>The <a href="https://github.com/ocamllabs/vscode-ocaml-platform/blob/master/src/ocaml_lsp.ml"><code>ocaml_lsp.ml</code></a> file ensures that <code>ocamllsp</code> is set up correctly and up to date. For <code>SyntaxDocumentation</code>, the two important modules in this file are <code>OcamllspSettingEnable</code> and <code>OcamllspSettings</code>.</p>
<p><code>OcamllspSettingEnable</code> defines an interface for enabling/disabling specific settings in <code>ocamllsp</code>.</p>
<pre><code><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">
</span><span class="ocaml-source">
</span><span class="ocaml-keyword-other">module</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">OcamllspSettingEnable</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">struct</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword-other">include</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Interface</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-constant-language-capital-identifier">Make</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-unit">()</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword-other">include</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-source">[</span><span class="ocaml-keyword-operator-extension">%</span><span class="ocaml-keyword-other-extension">js</span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-keyword-other">val</span><span class="ocaml-source"> </span><span class="ocaml-source">enable</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source"> </span><span class="ocaml-source">t</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-support-type">bool</span><span class="ocaml-source"> </span><span class="ocaml-source">or_undefined</span><span class="ocaml-source"> </span><span class="ocaml-source">[</span><span class="ocaml-keyword-operator-attribute">@@</span><span class="ocaml-keyword-other-attribute">js.get</span><span class="ocaml-source">]</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-keyword-other">val</span><span class="ocaml-source"> </span><span class="ocaml-source">create</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source"> </span><span class="ocaml-source">enable</span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-support-type">bool</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-source">t</span><span class="ocaml-source"> </span><span class="ocaml-source">[</span><span class="ocaml-keyword-operator-attribute">@@</span><span class="ocaml-keyword-other-attribute">js.builder</span><span class="ocaml-source">]</span><span class="ocaml-source">]</span><span class="ocaml-source">
</span><span class="ocaml-keyword-other">end</span><span class="ocaml-source">
</span><span class="ocaml-source">
</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">
</span></code></pre>
<p>The <code>[@@js.get]</code> attribute is processed by a PPX and binds an OCaml function to a JavaScript property accessor, so OCaml code can read properties of JavaScript objects as if they were native OCaml fields. The <code>[@@js.builder]</code> attribute generates an OCaml function that creates a JavaScript object from its labelled arguments. Both come from the <a href="https://github.com/LexiFi/gen_js_api/tree/master"><code>LexiFi/gen_js_api</code></a> library.</p>
<p><code>OcamllspSettings</code> aggregates multiple <code>OcamllspSettingEnable</code> settings into a comprehensive settings interface for <code>ocamllsp</code>.</p>
<pre><code><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">
</span><span class="ocaml-keyword-other">module</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">OcamllspSettings</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">struct</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword-other">include</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Interface</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-constant-language-capital-identifier">Make</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-unit">()</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword-other">include</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-source">[</span><span class="ocaml-keyword-operator-extension">%</span><span class="ocaml-keyword-other-extension">js</span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-keyword-other">val</span><span class="ocaml-source"> </span><span class="ocaml-source">syntaxDocumentation</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source"> </span><span class="ocaml-source">t</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source">
</span><span class="ocaml-source">      </span><span class="ocaml-constant-language-capital-identifier">OcamllspSettingEnable</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">t</span><span class="ocaml-source"> </span><span class="ocaml-source">or_undefined</span><span class="ocaml-source"> </span><span class="ocaml-source">[</span><span class="ocaml-keyword-operator-attribute">@@</span><span class="ocaml-keyword-other-attribute">js.get</span><span class="ocaml-source">]</span><span class="ocaml-source">
</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-keyword-other">val</span><span class="ocaml-source"> </span><span class="ocaml-source">create</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source"> ?</span><span class="ocaml-source">syntaxDocumentation</span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-constant-language-capital-identifier">OcamllspSettingEnable</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">t</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source">
</span><span class="ocaml-source">      </span><span class="ocaml-support-type">unit</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-source">t</span><span class="ocaml-source"> </span><span class="ocaml-source">[</span><span class="ocaml-keyword-operator-attribute">@@</span><span class="ocaml-keyword-other-attribute">js.builder</span><span class="ocaml-source">]</span><span class="ocaml-source">]</span><span class="ocaml-source">
</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">create</span><span class="ocaml-source"> </span><span class="ocaml-source">~</span><span class="ocaml-source">syntaxDocumentation</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-source">create</span><span class="ocaml-source"> ?</span><span class="ocaml-source">syntaxDocumentation</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-unit">()</span><span class="ocaml-source">
</span><span class="ocaml-keyword-other">end</span><span class="ocaml-source">
</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">
</span></code></pre>
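<p>To make the optional wrapping concrete, here is a minimal sketch in plain OCaml with no JavaScript bindings. The record types and the <code>to_json</code> helper are hypothetical stand-ins for the interfaces that <code>gen_js_api</code> generates, and the string rendering only approximates the inner settings object, not the full notification. It uses the standard library's <code>Option.map</code> rather than the labelled version seen in the extension code.</p>
<pre><code>(* Hypothetical stand-ins for the JavaScript-backed setting types. *)
type setting_enable = { enable : bool }
type settings = { syntax_documentation : setting_enable option }

(* Wrap the raw boolean only when the user has actually set a value. *)
let make_settings (syntax_documentation : bool option) : settings =
  { syntax_documentation =
      Option.map (fun enable -&gt; { enable }) syntax_documentation }

(* Render the settings roughly as the JSON object the server would see. *)
let to_json (s : settings) : string =
  match s.syntax_documentation with
  | None -&gt; "{}"
  | Some { enable } -&gt;
      Printf.sprintf "{\"syntaxDocumentation\":{\"enable\":%b}}" enable

let () =
  (* An unset option leaves the field out entirely. *)
  assert (to_json (make_settings None) = "{}");
  assert (to_json (make_settings (Some true))
          = "{\"syntaxDocumentation\":{\"enable\":true}}")
</code></pre>
<p>The point of the sketch is that an unset value leaves the field out rather than sending <code>false</code>, which mirrors the <code>or_undefined</code> and optional-argument types in the real modules.</p>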
<h3>Workspace Configuration</h3>
<p>The file <a href="https://github.com/ocamllabs/vscode-ocaml-platform/blob/master/src/settings.ml"><code>settings.ml</code></a> provides a flexible way to manage workspace-specific settings, including:</p>
<ul>
<li>Creating settings with JSON serialisation and deserialisation</li>
<li>Retrieving and updating settings from the workspace configuration</li>
<li>Resolving and substituting workspace variables within settings</li>
<li>Defining specific settings for the OCaml language server, such as extra environment variables, server arguments, and features like <code>codelens</code> and <code>SyntaxDocumentation</code></li>
</ul>
<pre><code><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">
</span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">create_setting</span><span class="ocaml-source"> </span><span class="ocaml-source">~</span><span class="ocaml-source">scope</span><span class="ocaml-source"> ~</span><span class="ocaml-source">key</span><span class="ocaml-source"> ~</span><span class="ocaml-source">of_json</span><span class="ocaml-source"> ~</span><span class="ocaml-source">to_json</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-source">{</span><span class="ocaml-source"> </span><span class="ocaml-source">scope</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source"> </span><span class="ocaml-source">key</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source"> </span><span class="ocaml-source">to_json</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source"> </span><span class="ocaml-source">of_json</span><span class="ocaml-source"> </span><span class="ocaml-source">}</span><span class="ocaml-source">
</span><span class="ocaml-source">
</span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">server_syntaxDocumentation_setting</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-source">create_setting</span><span class="ocaml-source">
</span><span class="ocaml-source">    ~</span><span class="ocaml-source">scope</span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-constant-language-capital-identifier">ConfigurationTarget</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-constant-language-capital-identifier">Workspace</span><span class="ocaml-source">
</span><span class="ocaml-source">    ~</span><span class="ocaml-source">key</span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-string-quoted-double">ocaml.server.syntaxDocumentation</span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-source">
</span><span class="ocaml-source">    ~</span><span class="ocaml-source">of_json</span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-constant-language-capital-identifier">Jsonoo</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-constant-language-capital-identifier">Decode</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-support-type">bool</span><span class="ocaml-source">
</span><span class="ocaml-source">    ~</span><span class="ocaml-source">to_json</span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-constant-language-capital-identifier">Jsonoo</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-constant-language-capital-identifier">Encode</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-support-type">bool</span><span class="ocaml-source">
</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">
</span></code></pre>
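<p>The idea of a setting as a key paired with a decoder and an encoder can be sketched without VSCode or <code>Jsonoo</code>. In the example below, the <code>json</code> and <code>scope</code> types, the association-list configuration store, and the <code>get</code> function are all simplified assumptions made for illustration. They are not the definitions used in <code>settings.ml</code>.</p>
<pre><code>(* Toy JSON and scope types, assumed for this sketch only. *)
type json = Null | Bool of bool | String of string
type scope = Global | Workspace

type 'a setting =
  { scope : scope
  ; key : string
  ; of_json : json -&gt; 'a option
  ; to_json : 'a -&gt; json
  }

let create_setting ~scope ~key ~of_json ~to_json =
  { scope; key; of_json; to_json }

let syntax_documentation_setting =
  create_setting ~scope:Workspace ~key:"ocaml.server.syntaxDocumentation"
    ~of_json:(function Bool b -&gt; Some b | _ -&gt; None)
    ~to_json:(fun b -&gt; Bool b)

(* A toy configuration store: an association list from key to JSON value. *)
let get config setting =
  Option.bind (List.assoc_opt setting.key config) setting.of_json

let () =
  let config = [ ("ocaml.server.syntaxDocumentation", Bool true) ] in
  assert (get config syntax_documentation_setting = Some true);
  (* A missing key decodes to None, so callers can tell "unset" from "false". *)
  assert (get [] syntax_documentation_setting = None)
</code></pre>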
<h3>Activating the Extension</h3>
<p>The <a href="https://github.com/ocamllabs/vscode-ocaml-platform/blob/master/src/vscode_ocaml_platform.ml"><code>vscode_ocaml_platform.ml</code></a> file initialises and activates the OCaml Platform extension for VSCode. The key tasks include:</p>
<ul>
<li>Suggesting users select a sandbox environment</li>
<li>Notifying the extension instance of configuration changes</li>
<li>Registering various components and features of the extension</li>
<li>Setting up the sandbox environment and starting the OCaml language server</li>
</ul>
<p>In the context of <code>SyntaxDocumentation</code>, this code ensures that the extension is correctly configured to handle <code>SyntaxDocumentation</code> settings. The <code>notify_configuration_changes</code> function listens for changes to the <code>server_syntaxDocumentation_setting</code> and updates the extension instance accordingly. This means that any changes the user makes to the <code>SyntaxDocumentation</code> settings in the VSCode workspace configuration will be reflected in the extension's behaviour, ensuring that <code>SyntaxDocumentation</code> is enabled or disabled as per the user's preference.</p>
<pre><code><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">notify_configuration_changes</span><span class="ocaml-source"> </span><span class="ocaml-source">instance</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-constant-language-capital-identifier">Workspace</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">onDidChangeConfiguration</span><span class="ocaml-source">
</span><span class="ocaml-source">    ~</span><span class="ocaml-source">listener</span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source">(</span><span class="ocaml-keyword-other">fun</span><span class="ocaml-source"> </span><span class="ocaml-source">_event</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source">
</span><span class="ocaml-source">      </span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">syntax_documentation</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source">
</span><span class="ocaml-source">        </span><span class="ocaml-constant-language-capital-identifier">Settings</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">(</span><span class="ocaml-source">get</span><span class="ocaml-source"> </span><span class="ocaml-source">server_syntaxDocumentation_setting</span><span class="ocaml-source">)</span><span class="ocaml-source">
</span><span class="ocaml-source">      </span><span class="ocaml-keyword-other">in</span><span class="ocaml-source">
</span><span class="ocaml-source">      </span><span class="ocaml-constant-language-capital-identifier">Extension_instance</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">set_configuration</span><span class="ocaml-source"> </span><span class="ocaml-source">instance</span><span class="ocaml-source"> ~</span><span class="ocaml-source">syntax_documentation</span><span class="ocaml-source">)</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-constant-language-unit">()</span><span class="ocaml-source">
</span></code></pre>
<h2>Conclusion</h2>
<p><img sizes="(min-width: 1360px) 1360px, (min-width: 680px) 680px, 100vw" srcset="/blog/images/syndoc_vscode-170w~NN9sER4Z0rlmoMgp6BvutQ.webp 170w, /blog/images/syndoc_vscode-340w~zDt2qySNFTX1tRgr-fVleA.webp 340w, /blog/images/syndoc_vscode-680w~GGS8N6GsNA7aAGKArlsb7A.webp 680w, /blog/images/syndoc_vscode-1360w~9uZkladGj6eisIBkUICWNA.webp 1360w" src="/blog/images/syndoc_vscode-1360w~9uZkladGj6eisIBkUICWNA.webp" alt="SyntaxDocument toggle"></p>
<p>In this final article, we explored how to integrate <code>SyntaxDocumentation</code> into the OCaml Platform extension for VSCode as a configurable option for OCaml LSP's <code>hover</code> command. We covered key components such as configuring the extension manifest, managing the extension state, interacting with the OCaml language server, and handling workspace configurations. Letting users toggle the <code>SyntaxDocumentation</code> feature on or off keeps the development experience flexible and customisable.</p>
<p>Feel free to contribute to this extension on the GitHub repository: <a href="https://github.com/ocamllabs/vscode-ocaml-platform"><code>vscode-ocaml-platform</code></a>. Thank you for following along in this series, and happy coding with OCaml and VSCode!</p>
]]></description><link>https://tarides.com/blog/2024-07-24-creating-the-syntaxdocumentation-command-part-3-vscode-platform-extension</link><guid isPermaLink="false">https://tarides.com/blog/2024-07-24-creating-the-syntaxdocumentation-command-part-3-vscode-platform-extension.html</guid><dc:creator><![CDATA[ Pizie Dust ]]></dc:creator><pubDate>Wed, 24 Jul 2024 00:00:00 GMT</pubDate></item><item><title><![CDATA[OCaml Compiler Manual HTML Generation]]></title><description><![CDATA[<p>In order to avoid long, confusing URLs on the OCaml Manual pages, we set out to create a solution that shortens these URLs, including section references, and contains the specific version. The result improves readability and user experience. This article outlines the motivation behind these changes and how we implemented them.</p>
<h2>Challenge</h2>
<p>The OCaml HTML manuals have URL references such as https://v2.ocaml.org/manual/types.html#sss:typexpr-sharp-types, which do not refer to any specific compiler version. We needed a way to easily share a link that includes the version number. The OCaml.org page already mentions the compiler version, but only by pointing to the specific release pages under https://ocaml.org/releases.</p>
<p>We wanted a canonical naming convention that is consistent with current and future manual releases. It would also be beneficial to have only one place to store all the manuals, and the users of OCaml.org should never see redirecting URLs in the browser. This will greatly help increase the overall backlink quality when people share the links in conversations, tutorials, blogs, and on the Web. A preferred naming scheme should be something like:</p>
<p>https://v2.ocaml.org/releases/latest/manual/attributes.html<br>
https://v2.ocaml.org/releases/4.12/manual/attributes.html</p>
<p>Using this scheme, we redirected v2.ocaml.org to OCaml.org for the production deployment. The changes also produce shorter URLs that are easier to remember and share. Setting <code>rel="canonical"</code> is a good way to make sure that only https://ocaml.org/manual/latest gets indexed.</p>
<h2>Implementation</h2>
<p>After a detailed discussion, the following UI mockup to switch manuals was provided <a href="https://github.com/ocaml/ocaml.org/issues/534#issuecomment-1318570350">via GitHub issue</a>, and <em>Option A</em> was selected.</p>
<p><img sizes="(min-width: 1360px) 1360px, (min-width: 680px) 680px, 100vw" srcset="/blog/images/UI-Mockup-170w~SAoPK_zlNbBuzRgvQCTM_Q.webp 170w, /blog/images/UI-Mockup-340w~DlpF3X72C2MTNVPZE1GxSQ.webp 340w, /blog/images/UI-Mockup-680w~yXlq1opL_GnA_cth01Ddlw.webp 680w, /blog/images/UI-Mockup-1360w~juSPFyoGQry1P2d6IQH6iA.webp 1360w" src="/blog/images/UI-Mockup-1360w~juSPFyoGQry1P2d6IQH6iA.webp" alt="UI Mockup"></p>
<p>Our proposed changes to the URL are shown below:</p>
<p>Current: https://v2.ocaml.org/releases/5.1/htmlman/index.html<br>
Suggested: <code>https://ocaml.org/manual/5.3.0/index.html</code></p>
<p>Current: https://v2.ocaml.org/releases/5.1/api/Atomic.html<br>
Suggested: <code>https://ocaml.org/manual/5.3.0/api/Atomic.html</code></p>
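<p>The mapping from the old path layout to the proposed one can be sketched as a small pure function. This is an illustrative example only: <code>rewrite_legacy_path</code> is a hypothetical name, it is not the code used on OCaml.org, and it keeps the version segment unchanged rather than translating between version labels.</p>
<pre><code>(* Map a legacy v2 path to the proposed scheme, or None if it does not match. *)
let rewrite_legacy_path (path : string) : string option =
  match String.split_on_char '/' path with
  | "" :: "releases" :: version :: "htmlman" :: rest -&gt;
      Some (String.concat "/" ("" :: "manual" :: version :: rest))
  | "" :: "releases" :: version :: "api" :: rest -&gt;
      Some (String.concat "/" ("" :: "manual" :: version :: "api" :: rest))
  | _ -&gt; None

let () =
  assert (rewrite_legacy_path "/releases/5.1/htmlman/index.html"
          = Some "/manual/5.1/index.html");
  assert (rewrite_legacy_path "/releases/5.1/api/Atomic.html"
          = Some "/manual/5.1/api/Atomic.html");
  (* Paths outside the legacy layout are left for other handlers. *)
  assert (rewrite_legacy_path "/docs/install.html" = None)
</code></pre>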
<h2>HTML Compiler Manuals</h2>
<p>The HTML manual files are hosted in a separate GitHub repository at https://github.com/ocaml-web/html-compiler-manuals/. It contains a folder for each compiler version, which holds the manual HTML files.</p>
<p>A script to automate the process of generating the HTML manuals is also available at https://github.com/ocaml-web/html-compiler-manuals/blob/main/scripts/build-compiler-html-manuals.sh. The script defines two variables, <code>DIR</code> and <code>OCAML_VERSION</code>, which specify the location in which to build the manual and the compiler version to use. It then clones the <code>ocaml/ocaml</code> repository, switches to the specific compiler branch, builds the compiler, and then generates the manuals. The actual commands are listed below for reference:</p>
<pre><code>echo "Clone ocaml repository ..."
git clone git@github.com:ocaml/ocaml.git

# Switch to ocaml branch
echo "Checkout $OCAML_VERSION branch in ocaml ..."
cd ocaml
git checkout $OCAML_VERSION

# Remove any stale files
echo "Running make clean"
make clean
git clean -f -x

# Configure and build
echo "Running configure and make ..."
./configure
make

# Build web
echo "Generating manuals ..."
cd manual
make web
</code></pre>
<p>As per the new API requirements, the <code>manual/src/html_processing/Makefile</code> variables are updated as follows:</p>
<pre><code>WEBDIRMAN = $(WEBDIR)/$(VERSION)
WEBDIRAPI = $(WEBDIRMAN)/api
</code></pre>
<p>Accordingly, we have also updated the <code>manual/src/html_processing/src/common.ml.in</code> file OCaml variables to reflect the required changes:</p>
<pre><code>
let web_dir = Filename.parent_dir_name // "webman" // ocaml_version

let docs_maindir = web_dir

let api_page_url = "api"

let manual_page_url = ".."
</code></pre>
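<p>To make the resulting path concrete, here is a small standalone sketch of how <code>web_dir</code> is assembled. It assumes that <code>//</code> is an alias for <code>Filename.concat</code> and uses a made-up value for <code>ocaml_version</code>; neither is taken from the actual <code>common.ml.in</code> file.</p>
<pre><code>(* Assumption: ( // ) joins path segments, like Filename.concat. *)
let ( // ) = Filename.concat

(* Hypothetical version string, for illustration only. *)
let ocaml_version = "5.2"

(* Same expression as above: parent directory, then "webman", then the version. *)
let web_dir = Filename.parent_dir_name // "webman" // ocaml_version

let () = print_endline web_dir
</code></pre>
<p>On a Unix-like system, this gives a path of the form <code>../webman/5.2</code>, which is where the versioned manual would be written under these assumptions.</p>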
<p>We also include the https://plausible.ci.dev/js/script.js script to collect view metrics for the HTML pages. The manuals from 3.12 through 5.2 are now available in the https://github.com/ocaml-web/html-compiler-manuals/tree/main GitHub repository.</p>
<h2>OCaml.org</h2>
<p>The OCaml.org Dockerfile has a step included to clone the HTML manuals and perform an automated production deployment as shown below:</p>
<pre><code>RUN git clone https://github.com/ocaml-web/html-compiler-manuals /manual

ENV OCAMLORG_MANUAL_PATH /manual
</code></pre>
<p>The path to the new GitHub repository has been updated in the configuration file, along with the explicit URL paths to the respective manuals. The v2 URLs in the <code>data/releases/*.md</code> files have been replaced with URLs that no longer point to v2.ocaml.org, and the <code>manual /releases/</code> redirects have been removed from <code>redirection.ml</code>. The <code>/releases/</code> redirects are now handled in <code>middleware.ml</code>. The Caddy configuration to redirect v2.ocaml.org can be implemented as follows:</p>
<pre><code>v2.ocaml.org {
	redir https://ocaml.org{uri} permanent
}
</code></pre>
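<p>Once a request reaches OCaml.org, a legacy <code>/releases/</code> path still has to be mapped onto the new <code>/manual/</code> layout. The following is only a rough sketch of that kind of rewriting, not the code in <code>middleware.ml</code>. The function name <code>rewrite_legacy_path</code> is hypothetical, and the sketch assumes the version segment is carried over unchanged.</p>
<pre><code>(* Hypothetical helper: map a legacy v2-style path to the new layout.
   Returns None when the path is not a legacy manual or API path. *)
let rewrite_legacy_path path =
  match String.split_on_char '/' path with
  | "" :: "releases" :: version :: "htmlman" :: rest ->
      (* /releases/V/htmlman/... becomes /manual/V/... *)
      Some (String.concat "/" ("" :: "manual" :: version :: rest))
  | "" :: "releases" :: version :: "api" :: rest ->
      (* /releases/V/api/... becomes /manual/V/api/... *)
      Some (String.concat "/" ("" :: "manual" :: version :: "api" :: rest))
  | _ -> None

let () =
  match rewrite_legacy_path "/releases/5.1/api/Atomic.html" with
  | Some target -> print_endline target
  | None -> print_endline "no redirect"
</code></pre>
<p>Pattern matching on the split path keeps each legacy URL shape on its own line, so adding another shape is a matter of adding a case.</p>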
<h2>Call to Action</h2>
<p>You are encouraged to check out the latest <a href="https://github.com/ocaml/ocaml">OCaml compiler from trunk</a> and use the <code>build-compiler-html-manuals.sh</code> script to generate the HTML documentation.</p>
<p>Please do report any errors or issues that you face at the following GitHub repository: https://github.com/ocaml-web/html-compiler-manuals/issues</p>
<p>If you are interested in working on OCaml.org, please message us on the <a href="http://discord.ocaml.org">OCaml Discord</a> server or reach out to the <a href="https://github.com/ocaml-web/html-compiler-manuals">contributors on GitHub</a>.</p>
<h2>References</h2>
<ol>
<li>
<p>(cross-ref) Online OCaml Manual: there should be an easy way to get a fixed-version URL. https://github.com/ocaml/ocaml.org/issues/534</p>
</li>
<li>
<p>Use <code>webman/*.html</code> and <code>webman/api</code> for OCaml.org manuals. https://github.com/ocaml/ocaml/pull/12976</p>
</li>
<li>
<p>Serve OCaml Compiler Manuals. https://github.com/ocaml/ocaml.org/pull/2150</p>
</li>
<li>
<p>Simplify and extend <code>/releases/</code> redirects from legacy v2.ocaml.org URLs. https://github.com/ocaml/ocaml.org/pull/2448</p>
</li>
</ol>
]]></description><link>https://tarides.com/blog/2024-07-17-ocaml-compiler-manual-html-generation</link><guid isPermaLink="false">https://tarides.com/blog/2024-07-17-ocaml-compiler-manual-html-generation.html</guid><dc:creator><![CDATA[ Shakthi Kannan ]]></dc:creator><pubDate>Wed, 17 Jul 2024 00:00:00 GMT</pubDate></item><item><title><![CDATA[Deep Dive: Optimising Multicore OCaml for Windows]]></title><description><![CDATA[<p>We love hosting internships. It is rewarding to potentially facilitate someone’s first foray into the OCaml ecosystem, helping them establish a hopefully life-long foothold in the world of open-source programming. It is also a great opportunity to get new perspectives on existing issues. Fresh eyes can reinvigorate topics, highlighting different pain points and new solutions which benefit the entire community.</p>
<p>Sometimes, we also find ourselves just trying to keep up with our interns as they take off like rocket ships! Recently, we mentored a student who did just that. The initial goal of the internship was to investigate strange performance drops in the OCaml runtime that arose after the introduction of multicore support. These performance drops were most keenly felt on Windows machines, and the initial internship specification emphasised the need to improve the developer experience on that operating system.</p>
<p>Our intern <a href="https://github.com/eutro">@eutro</a> went above and beyond anything we could have expected and tackled the project thoroughly and ambitiously. In this post, I will attempt to give you a comprehensive overview of this intricate project and the problems it tackled.</p>
<h2>Get Busy Waiting?</h2>
<p>Before OCaml 5, only one thread would run at any given time. Users never had to worry about multiple threads trying to use a shared resource like the Garbage Collector (GC). In OCaml 5, however, the process is divided into several 'threads'<sup><a href="#fn-1" id="ref-1-fn-1" role="doc-noteref" class="fn-label">[1]</a></sup>, and multiple threads regularly try to run parts of the GC simultaneously. The minor GC uses a Stop The World (STW) function to run in parallel on all threads, whereas the major GC’s work is split into slices. These may happen in parallel between threads and while the user’s program (also called the ‘mutator’) is making changes. This is one example of when a mechanism is needed to protect multiple threads from making changes that contradict each other and result in unexpected behaviours.</p>
<p>Locks are the traditional way of doing this, whereby other activity is halted (or locked) while one activity finishes. However, in multicore programming, this method would be incredibly inefficient since there can be many activities in progress simultaneously. In this case, we would need to introduce so many locks for the different parts of memory that doing so would cause memory and OS resource problems!</p>
<p>The approach we use for OCaml 5 combines a <a href="https://en.wikipedia.org/wiki/Compare-and-swap">Compare And Swap</a> (CAS) operation with <a href="https://en.wikipedia.org/wiki/Busy_waiting">Busy-Wait</a> loops. A CAS operation ensures that if two threads try to modify the same area of memory, only one will succeed. The one that fails will know it has failed and can then enter a period of Busy-Waiting (called <code>SPIN_WAIT</code> in the code). Busy-wait loops (also referred to as spins) describe a process that repeatedly ('busily') checks whether a condition is true. The process or task is only resumed once that condition is met.</p>
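<p>The runtime implements this in C, but the same idea can be illustrated at the OCaml level. The sketch below is an analogy using the standard library's <code>Atomic</code> and <code>Domain</code> modules from OCaml 5, not the runtime's own code: each domain retries a compare-and-set until it succeeds, briefly spinning whenever another domain got there first.</p>
<pre><code>(* A shared counter that several domains update without a lock. *)
let counter = Atomic.make 0

(* Try to replace the value we saw with seen + 1. If another domain changed
   the counter in the meantime, the CAS fails and we spin and retry. *)
let rec incr_cas () =
  let seen = Atomic.get counter in
  if not (Atomic.compare_and_set counter seen (seen + 1)) then begin
    Domain.cpu_relax ();
    incr_cas ()
  end

let worker () =
  for _ = 1 to 1000 do
    incr_cas ()
  done

let () =
  let d1 = Domain.spawn worker in
  let d2 = Domain.spawn worker in
  Domain.join d1;
  Domain.join d2;
  (* No increment is lost, even though no lock was taken. *)
  assert (Atomic.get counter = 2000)
</code></pre>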
<h2>Sleeping Beauty</h2>
<p>Busy-wait loops are used successfully in OCaml for many purposes but have been optimised. They are mostly appropriate in cases where we think that the required condition will be met quickly or in a reasonable period of time. If that’s not the case, then theoretically, the thread that is waiting will just keep spinning. If one allows busy-wait loops to spin indefinitely, they waste a lot of power and CPU and can actually prevent the condition they are waiting for from being met. To avoid that happening, we can use a <code>sleep</code> function.</p>
<p>In order to implement spinning without wasting power, the loop checks the condition repeatedly, but after a while, it starts 'sleeping' between checks. Suppose a thread is waiting for condition <code>C</code> to come true, and it uses a Busy-Wait loop to check for this. The program spins a number of times, checking the condition, and then waits or goes to ‘sleep’ for a set amount of time – then it ‘wakes up’ and checks once more before (if it has to) going back to ‘sleep‘ again. The period of ‘sleep’ increases each time. This cycle repeats itself until the condition <code>C</code> finally comes true.</p>
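<p>As a rough illustration of this spin-then-sleep cycle, here is a sketch in OCaml. It is not the runtime's implementation: the function name <code>wait_until</code>, the starting delay, and the doubling of the delay are all assumptions made for the example, and the sleep function is passed in as a parameter so that the sketch does not depend on any particular OS call.</p>
<pre><code>(* Spin up to max_spins times, then alternate between sleeping and
   re-checking, with a longer sleep each round. *)
let wait_until ?(max_spins = 1000) ~sleep cond =
  let rec spin remaining =
    if cond () then ()
    else if remaining > 0 then begin
      Domain.cpu_relax ();
      spin (remaining - 1)
    end
    else backoff 10e-6 (* assumed first sleep, in seconds *)
  and backoff delay =
    if not (cond ()) then begin
      sleep delay;
      backoff (delay *. 2.0) (* assumed growth factor *)
    end
  in
  spin max_spins

let () =
  (* A stand-in condition that becomes true on the sixth check, and a fake
     sleep that only records the delays it was asked for. *)
  let checks = ref 0 in
  let delays = ref [] in
  let cond () = incr checks; !checks > 5 in
  wait_until ~max_spins:2 ~sleep:(fun d -> delays := d :: !delays) cond;
  Printf.printf "slept %d times\n" (List.length !delays)
</code></pre>
<p>If the real sleep cannot be shorter than some minimum, the first call to <code>sleep</code> already costs that minimum, no matter how small the requested delay is.</p>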
<p>This was how the process was <em>supposed</em> to work, yet, for some unknown reason, certain processes would occasionally take much longer than expected. The performance drop was worst on Windows machines.</p>
<h2>Testing 1-2-3, Testing 1-2-3,</h2>
<p>The first order of business was to conduct a series of tests on the runtime. Not only to discover the possible cause of the performance drops but also to establish a baseline of performance against which to measure any changes (and hopefully improvements!).</p>
<p>We knew that there was a performance problem and that it was particularly painful on Windows, but we didn’t know why. Even if we had a hunch as to what might be causing it, it was crucial to build up a realistic image of what was happening before we attempted a fix.</p>
<p>@eutro began this process by identifying where Busy-Wait loops were spinning in the runtime and for how long. She also wanted to know if there were places in the runtime where processes would get ‘stuck’ in Busy-Wait loops and not move on, and if so, where and why.</p>
<p>She used the <a href="https://github.com/ocaml/ocaml/tree/trunk/testsuite/tests">OCaml testsuite</a> and measured how many <code>SPIN_WAIT</code> macros resolved successfully without needing to sleep and which ones did not. She discovered that in most cases, the spinning had the desired effect, and the process could continue after a reasonable amount of time when the condition it was waiting for was met. The garbage collector was also not experiencing any significant performance drops, so it could not be the cause of the problems on Windows. Instead, what she realised was that on Windows, <code>sleeps</code> cannot be shorter than one millisecond, and so the first sleeps that occur end up being much too long. This causes extended and unnecessary delays for processes running on Windows. Equipped with this realisation, @eutro got started on a solution. One that would be most helpful on Windows but still benefit users on other operating systems.</p>
<h2>Barriers and Futexes, Oh My!</h2>
<p>There are a few ways a thread in OCaml can wait for a condition:</p>
<ul>
<li>First, we may be able to proceed very soon (nanoseconds), in which case we will spin until we can proceed.</li>
<li>Then, if spinning doesn’t let us proceed, we sleep a few times until the condition comes true. In most cases (read: not on Windows), the first few sleeps are still very quick, and we can proceed soon once the condition is met.</li>
<li>An alternative to sleeping ‘blindly’ like this is to tell the OS specifically <em>what</em> we are waiting for so that we can be woken up only when we know the condition is true. You can think of this as taking a ticket and waiting for your number to be called rather than repeatedly asking if you can be seen.</li>
</ul>
<p>So what has changed? As things stood, only steps one and two were available, a series of increasingly long sleeps interleaved with checks. So you would spin <em>n</em> times, then sleep for 10µs (‘µs’ is short for microseconds), then you check the condition once more and might sleep for 20µs, then 35µs, and so on. The point is that the time spent sleeping kept gradually increasing.</p>
<p>However, as @eutro discovered, in many cases, processes took far too long to resume, even after the condition had come true. By the time they woke up from sleeping, they could have already proceeded if they had just ‘taken a ticket’ earlier and waited until they were notified. To improve performance, instead of repeatedly sleeping for longer increments, we use specialised ‘barriers’ to wait <em>until</em> we can proceed.</p>
<p>To solve the Windows problem, we now use the <code>SPIN_WAIT</code> macro only in cases where we don’t expect to ever need to sleep. In cases where that first sleep would cause significant delay, we introduce a new <code>SPIN_WAIT_NTIMES</code> macro, which lets the process spin a set number of times and is combined with a barrier. @eutro used her previous benchmarks to determine which occasions could keep the <code>SPIN_WAIT</code> cycle as-is and which occasions required the new <code>SPIN_WAIT_NTIMES</code> combined with a barrier.</p>
<p>But things didn’t stop there! @eutro could also optimise the type of barrier. Traditionally, we use condition variables to wake up threads waiting on a condition. However, they are unnecessarily resource-intensive: they require extra memory, and woken threads must acquire (and release) a lock before they continue. A <em>futex</em> is a lower-level synchronisation primitive that can similarly be used to wake up threads but without the added complexity of a condition variable.</p>
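<p>The pattern of spinning a bounded number of times and then blocking until notified can be sketched in OCaml 5 with the standard <code>Mutex</code> and <code>Condition</code> modules. This is only an analogy for the condition-variable variant: the runtime does this in C, the names <code>wait_ready</code> and <code>set_ready</code> are invented for the example, and the OCaml standard library does not expose futexes.</p>
<pre><code>let m = Mutex.create ()
let c = Condition.create ()
let ready = Atomic.make false

(* Spin up to n times. If the flag is still false, stop polling and block
   on the condition variable until another domain notifies us. *)
let wait_ready n =
  let rec spin k =
    if Atomic.get ready then true
    else if k > 0 then begin
      Domain.cpu_relax ();
      spin (k - 1)
    end
    else false
  in
  if not (spin n) then begin
    (* The woken domain must hold the lock to re-check, then release it. *)
    Mutex.lock m;
    while not (Atomic.get ready) do
      Condition.wait c m
    done;
    Mutex.unlock m
  end

(* Set the flag under the lock so that a waiter cannot miss the wake-up. *)
let set_ready () =
  Mutex.lock m;
  Atomic.set ready true;
  Condition.broadcast c;
  Mutex.unlock m

let () =
  let waiter = Domain.spawn (fun () -> wait_ready 100) in
  set_ready ();
  Domain.join waiter;
  print_endline "waiter resumed"
</code></pre>
<p>The lock and unlock around the wait are the overhead described above; a futex-based barrier lets a waiter be woken without that extra step.</p>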
<p>@eutro added futexes on the operating systems that permit them. Notably, macOS does <em>not</em> allow non-OS programs to use futexes, so we fall back to using condition variables there.</p>
<p>By introducing the use of <code>SPIN_WAIT_NTIMES</code>, barriers, and futexes, @eutro implemented a number of optimisations that were applicable not only on Windows but on several operating systems. These optimisations save users time and processing power.</p>
<h2>How Much do You Bench(mark)?</h2>
<p>During the course of implementing these changes, @eutro did a lot of tests. It was important to be thorough in order to ensure that her changes didn’t have unintended consequences. It is incredibly difficult to reason about how programs will react to a specific alteration, as there are many things happening in the runtime and several ways that programs can interact.</p>
<p>She used the OCaml test suite again, this time to check that the OCaml runtime and other OCaml programs still functioned correctly. Having verified that they did, @eutro also ran several benchmarks to check that she hadn’t actually made anything slower. For this, she used the <a href="https://github.com/ocaml-bench/sandmark">Sandmark benchmarking suite</a>.</p>
<p>I recommend checking out the tests and benchmarks for yourself <a href="https://github.com/ocaml/ocaml/pull/12579">in the Pull Request</a>. The PR also gives a more in-depth technical overview of the changes to the Busy-Waiting loops.</p>
<h2>You Can Join Us Too!</h2>
<p>It is great to see what someone with a passion for OCaml can bring to the system as a whole. I think it illustrates the benefits of open-source software development: when we invite fresh observations and suggestions, we create a community that supports innovation and collaboration. We are impressed with the hard work @eutro put into solving the complicated problem before her. She went above and beyond what we thought possible in such a short amount of time!</p>
<p>Would you like to complete an internship with us? We welcome people of varying experience levels – some interns have made open-source contributions before and are familiar with functional programming, and some of our interns have no functional programming experience at all! If you’re interested in developing your skills in a supportive environment, keep an eye on our <a href="/careers/">careers page</a>, where we post about any available internships. We also regularly communicate about available internships on <a href="https://bsky.app/profile/tarides.com">Bluesky</a>. We hope to hear from you!</p>
<section role="doc-endnotes"><ol>
<li id="fn-1">
<p>We are aware of the distinction between ‘threads’ and ‘domains’ in OCaml. We chose to use the former here, mainly to keep the content accessible for people who are less familiar with the subtleties of OCaml.</p>
<span><a href="#ref-1-fn-1" role="doc-backlink" class="fn-label">↩︎︎</a></span></li></ol></section>
]]></description><link>https://tarides.com/blog/2024-07-10-deep-dive-optimising-multicore-ocaml-for-windows</link><guid isPermaLink="false">https://tarides.com/blog/2024-07-10-deep-dive-optimising-multicore-ocaml-for-windows.html</guid><dc:creator><![CDATA[ Isabella Leandersson ]]></dc:creator><pubDate>Wed, 10 Jul 2024 00:00:00 GMT</pubDate></item><item><title><![CDATA[Introducing Olly: Providing Observability Tools for OCaml 5]]></title><description><![CDATA[<p>It might be tempting to think that we can write code that works perfectly the first time around, but in reality optimisation and troubleshooting forms a big part of programming. However, there are more and less productive (and frustrating!) ways of problem solving. Having the right tools to guide you, ones that show you where to look and what is going wrong, can make a huge difference.</p>
<p>We recently introduced you to the <a href="/blog/2024-01-31-are-your-programs-doing-what-you-think-they-re-doing-introducing-monitoring-tools-for-multicore-ocaml/">monitoring system <code>runtime_events</code></a>, which allows users to monitor their runtime for, among other things, how programs are affecting performance. Alongside <code>runtime_events</code>, sits the observability tool <code>olly</code>, which provides users with a number of helpful formatting options for their runtime tracing data.</p>
<p>This is all part of how we’re making developing in OCaml easier by bringing new features and tools to the community. Olly is just one such tool, and it makes the monitoring system for OCaml significantly more accessible. With Olly, you don’t have to be an expert or spend time combing through the data that <code>runtime_events</code> extracts for you. Rather, Olly can generate the information you need in a way that makes it easy to understand, store, and query.</p>
<h2>What is Olly and How Does it Work?</h2>
<p>Olly, as an observability tool for OCaml 5, has the ability to extract runtime tracing data from <code>runtime_events</code>. This data can then be visualised with a variety of graphical options available.</p>
<p>How does Olly do this? Olly uses the Runtime API to provide you with monitoring metric information and associated data. The tool comes with several subcommands, each with its own function.</p>
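<p>To give a feel for what consuming these events looks like, here is a minimal sketch that uses the <code>Runtime_events</code> module shipped with OCaml 5 directly. It is not Olly's code, and the helper name <code>count_events</code> is made up for the example. It needs to be linked against the <code>runtime_events</code> library.</p>
<pre><code>(* Start event collection in this process, read whatever has been emitted
   so far, and count how many runtime phases began. *)
let count_events () =
  Runtime_events.start ();
  let cursor = Runtime_events.create_cursor None in
  let begins = ref 0 in
  let runtime_begin _ring_id _timestamp _phase = incr begins in
  let callbacks = Runtime_events.Callbacks.create ~runtime_begin () in
  (* Allocate a little and force a minor collection so the runtime has
     something to report. *)
  ignore (Sys.opaque_identity (List.init 10_000 (fun i -> i)));
  Gc.minor ();
  let total = Runtime_events.read_poll cursor callbacks None in
  Runtime_events.free_cursor cursor;
  (total, !begins)

let () =
  let total, begins = count_events () in
  Printf.printf "%d events read, %d runtime phases begun\n" total begins
</code></pre>
<p>Olly saves you from writing this kind of plumbing yourself and turns the raw events into traces and statistics.</p>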
<p>The <code>olly trace</code> subcommand generates runtime traces for programs compiled with OCaml 5. The tracing data is generated in one of two formats, the <a href="https://fuchsia.dev/fuchsia-src/reference/tracing/trace-format">Fuchsia trace format</a> or the <a href="https://docs.google.com/document/d/1CvAClvFfyA5R-PhYUmn5OOQtYMH4h6I0nSsKchNAySU/preview">Chrome tracing format</a>, with the former being the default. Both formats can be viewed in <a href="https://ui.perfetto.dev">Perfetto</a>, but the Chrome format trace can also be viewed in <code>chrome://tracing</code> for Chromium-based browsers. Another example of a subcommand is <code>olly gc-stats</code>, which can report the running time of the garbage collector (GC) and the GC tail latency of an OCaml executable.</p>
<p>The motivation behind introducing an observability tool like Olly is to make data extracted using <code>runtime_events</code> more useful, since few developers will want to use the event tracing system directly. Olly makes it easy for users to troubleshoot their own programs, but it also makes it easy for a developer to diagnose why someone <em>else’s</em> program is slow. A client can send their <code>runtime_events</code> data, a feature that comes built in with every OCaml 5 switch, to a developer who can then use Olly to find the problem and suggest a solution. This makes working in OCaml easier, as optimisation and problem solving become more efficient and streamlined.</p>
<p>It doesn’t end there! One of our future goals for Olly is that it should be able to provide automatic reports and diagnosis of some problems. Look out for that exciting update in the future!</p>
<h2>Recent Update: Modularising Olly</h2>
<p>One of the latest updates to Olly is its modularisation by <a href="https://github.com/eutro">Eutro</a>, splitting the <code>bin/olly.ml</code> file into smaller discrete libraries including <code>olly_common</code>, <code>olly_trace</code>, and <code>olly_gc_stats</code>. By splitting up the large file, the user can exercise some control over which dependencies they want their library to have. They can create a minimal build with minimal dependencies, or stick with a fuller build relying on all the dependencies. For example, to build <code>olly_bare</code> on the trunk you now only require two dependencies: Dune and <code>cmdliner</code>. Both can be installed without using opam. Since some developers will prefer this setup, it’s good to support a variety of configurations.</p>
<p>It also potentially makes it easier to maintain, since the smaller files have well-defined purposes and provide a clearer overview than just having one large file covering a multitude of functions. If something breaks, this segmentation can make it easier for a maintainer to triage and amend the problem. The same modularisation may also help newcomers get an overview of all the different components of the library. Sadiq Jaffer merged Eutro’s <a href="https://github.com/tarides/runtime_events_tools/pull/43">PR #43</a> into <code>Tarides: main</code> and it will form part of a future Olly release pending further testing.</p>
<h2>How to Use Olly: an Example</h2>
<p>Let's wrap up by looking at an example of when you might use Olly. When we want to visualise the performance of the OCaml runtime alongside any <a href="/blog/2024-01-31-are-your-programs-doing-what-you-think-they-re-doing-introducing-monitoring-tools-for-multicore-ocaml/">custom events</a> we may have, the first step is to generate a trace. To generate a trace, we run the command <code>olly trace tracefile.trace</code> in combination with the name of the program we want to enable tracing for. If we wanted to generate a trace for the <code>solver-service</code>, the command would be <code>olly trace tracefile.trace 'solver-service'</code>.</p>
<p>For our example, we chose to generate the tracing data in the Fuchsia trace format. Once we had the trace, we loaded it into Perfetto to get a helpful visual representation of what our code is doing, and we ended up with the following image:</p>
<p><img sizes="(min-width: 1360px) 1360px, (min-width: 680px) 680px, 100vw" srcset="/blog/images/olly-trace-2-170w~urm918zan3qm7XdrAPPcSg.webp 170w, /blog/images/olly-trace-2-340w~8GtKOi_ueM8xmIoYlvJrXw.webp 340w, /blog/images/olly-trace-2-680w~sOJ2-OuLe7SLLXYxqrbCVQ.webp 680w, /blog/images/olly-trace-2-1360w~rQU1Dgwwuu63xT-r-8tvig.webp 1360w" src="/blog/images/olly-trace-2-1360w~rQU1Dgwwuu63xT-r-8tvig.webp" alt="A diagram representing different processes running left to right along the image in different colours: green, yellow, pink, and grey. The visual representations of the processes are stacked on top of one another, forming different bands."></p>
<p>The UI in this image displays the processes down the side, each corresponding to a domain. Our program ended up using four cores, and therefore, the image shows four processes. Each process, in turn, shows the tracing for the OCaml runtime build plus the custom events generated by Eio. Let's zoom in on one process now:</p>
<p><img sizes="(min-width: 1360px) 1360px, (min-width: 680px) 680px, 100vw" srcset="/blog/images/olly-expanded-2-170w~aMv0VIt_-Jfa2mQVOUOqMg.webp 170w, /blog/images/olly-expanded-2-340w~PP-3vrkFm154l_YyHPc82g.webp 340w, /blog/images/olly-expanded-2-680w~p--ixlM6KfLhFmUSioiHsw.webp 680w, /blog/images/olly-expanded-2-1360w~SYbdscJCyOR4wv8Geg5Mhw.webp 1360w" src="/blog/images/olly-expanded-2-1360w~SYbdscJCyOR4wv8Geg5Mhw.webp" alt="A diagram giving an expanded view of the events happening in process 0. The different activities are shown using various colours. Activities include ring_id 0 1, eio.exit_fiber:v:5, and eio_fiber:v:1"></p>
<p>This expanded view shows both the Garbage Collector's (GC) activity and times when Eio is suspended.</p>
<h2>Until Next Time!</h2>
<p>We want to create tools that make the developer experience in OCaml easier and more intuitive. Olly makes it possible to visualise your code's performance, helping you understand when your programs are slowing down and why. If you have suggestions or improvements to share, you are welcome to participate in <a href="https://github.com/tarides/runtime_events_tools">the Runtime Events Tools repo</a> on GitHub.</p>
<p>We want to hear from you! Connect with us on social media by following us on <a href="https://bsky.app/profile/tarides.com">Bluesky</a> and <a href="https://www.linkedin.com/company/tarides/">LinkedIn</a>. You can also join in with the rest of the community on the forum <a href="https://discuss.ocaml.org/">Discuss</a> to share your thoughts on everything OCaml!</p>
]]></description><link>https://tarides.com/blog/2024-07-03-introducing-olly-providing-observability-tools-for-ocaml-5</link><guid isPermaLink="false">https://tarides.com/blog/2024-07-03-introducing-olly-providing-observability-tools-for-ocaml-5.html</guid><dc:creator><![CDATA[ Isabella Leandersson ]]></dc:creator><pubDate>Wed, 03 Jul 2024 00:00:00 GMT</pubDate></item><item><title><![CDATA[Enhancing the OCaml.org Community Page: Boosting UX and UI Based on User Research]]></title><description><![CDATA[<p>In March, the OCaml.org team at Tarides embarked on a mission to enhance the OCaml.org community pages. After engaging with the OCaml community, we identified several areas for improvement. Our goal was to boost the community's usability and visibility, ensuring it supported a wider audience and promoted more active engagement.</p>
<p>The OCaml community covers various domains, helping the language grow and supporting developers in their careers. To better serve our users, we decided to redesign the community area's concept pages. We conducted surveys, online discussions, and video calls to gather <a href="https://discuss.ocaml.org/t/shape-with-us-the-new-ocaml-org-community-area/14322">feedback and insights for this project</a>.</p>
<p>Our redesign targets key UX priorities. The navigation has been reorganised for easy content access, and the landing page now highlights essential community features. Additionally, we have revamped other pages, including events and job listings, to enhance engagement and activation within the OCaml community.</p>
<h2>Navigation</h2>
<p>Consistency is a fundamental aspect of our navigation design. As discussed in <a href="/blog/2024-04-03-updates-to-ocaml-org-s-learn-section-enhancing-ui-and-ux/">a previous post</a>, we have implemented a main navigation bar that provides subnavigation options for user consistency. This approach allows users to easily access events, blogs, job resources, and more with a single click, avoiding the need to scroll through a lengthy landing page with too many options.</p>
<p><img sizes="(min-width: 1360px) 1360px, (min-width: 680px) 680px, 100vw" srcset="/blog/images/community1-170w~pvSU1RrQWlEx1xsXud3TUQ.webp 170w, /blog/images/community1-340w~JrHYNDFTqTVYX1WXxEjToA.webp 340w, /blog/images/community1-680w~PME9zdq_98tvy-cVFsqMoA.webp 680w, /blog/images/community1-1360w~8AnL0o7cJLXA7S3KCRAeCg.webp 1360w" src="/blog/images/community1-1360w~8AnL0o7cJLXA7S3KCRAeCg.webp" alt="Description"></p>
<p>For experienced users, this streamlined navigation meets their clear objectives when visiting the community area. Beginners, on the other hand, can benefit from an overview of what OCaml.org offers with highlighted subsections that showcase real-life community activities. This intuitive navigation ensures all users, whether newcomers or advanced users, can find what they need efficiently and effectively.</p>
<p>By focussing on these improvements, we aim to create a more engaging and user-friendly community page that supports the growth and development of the OCaml community.</p>
<h2>Landing Page</h2>
<p>We rearranged the landing page to retain important key elements from the current design, such as the community channel. We reorganised the rest of the page and added new features to boost user engagement and provide valuable resources.</p>
<p><img sizes="(min-width: 1360px) 1360px, (min-width: 680px) 680px, 100vw" srcset="/blog/images/community2-170w~zsoczxr59SIbOoIW58-IYQ.webp 170w, /blog/images/community2-340w~MNjVUNxzCriryS-QxakiFA.webp 340w, /blog/images/community2-680w~z346se7Yk8Uy7xZ3TprUyQ.webp 680w, /blog/images/community2-1360w~Sw_BDIiWF9l7mWOEDXvsqw.webp 1360w" src="/blog/images/community2-1360w~Sw_BDIiWF9l7mWOEDXvsqw.webp" alt="Description"></p>
<h3>Key Updates</h3>
<ul>
<li>
<p><strong>Social Media Channels:</strong>
The existing list of social media channels remains a focal point due to its critical role in user interaction and collaboration.</p>
</li>
<li>
<p><strong>Upcoming Events:</strong>
We introduced a section for upcoming events, encouraging users to participate in retreats, conferences, and meetups. This addition aims to foster greater community involvement and real-world interaction.</p>
</li>
<li>
<p><strong>Job Opportunities and Resources:</strong>
Featuring job listings, internships, and essential documents for newcomers helps users find career opportunities and resources to aid their professional development.</p>
</li>
<li>
<p><strong>Code of Conduct:</strong>
We prominently displayed the community's code of conduct to remind users of the importance of respect and quality. This ensures a positive and collaborative environment, emphasising that the community is built by and for its members.</p>
</li>
</ul>
<p>By implementing these changes, our goal is to create a more engaging, informative, and user-friendly landing page that supports the OCaml community's growth and development.</p>
<h2>Inner Pages</h2>
<p>Behind the community landing page, we've enriched the content with more detailed offerings to enhance user engagement and information accessibility.</p>
<p><img sizes="(min-width: 1360px) 1360px, (min-width: 680px) 680px, 100vw" srcset="/blog/images/community3-170w~ygWkc2sXOdHH7-HM_TMEzg.webp 170w, /blog/images/community3-340w~SQ7pTbhW9Uz_u1GWhMMTEg.webp 340w, /blog/images/community3-680w~iViFQdlazU1iGY7vzaHhBQ.webp 680w, /blog/images/community3-1360w~DbbpN91rYbgLbsBNvakH1A.webp 1360w" src="/blog/images/community3-1360w~DbbpN91rYbgLbsBNvakH1A.webp" alt="Description"></p>
<p>The <a href="https://ocaml.org/events">Events page</a> now lists all upcoming events, which users can filter by location and event type. Additionally, it features recurring events that happen at least once a year in various locations, bringing the global community together and fostering real-life interactions and discussions.</p>
<p>The <a href="https://ocaml.org/jobs">Job Board page</a> is a valuable resource for newcomers, showcasing the types of companies that use OCaml. Job offers are displayed by location, making it easy for users to find relevant opportunities. Companies can also add their job offers to the board, promoting their roles and increasing visibility within the OCaml community.</p>
<p>The <a href="https://ocaml.org/outreachy">Outreachy Program section</a> highlights real-life collaboration opportunities for newcomers, helping them discover and engage with the OCaml language. We maintain a record of past internships, offering resources and blog posts that can capture the interest of specific users. This provides valuable insights and inspiration for those looking to get involved.</p>
<p>The OCaml Conferences and Workshops pages showcase both upcoming and past conferences and workshops, providing a comprehensive record that includes valuable documentation such as videos and slides. Speakers can submit talks, and all sessions are recorded and made accessible anytime, which makes this page an excellent resource for deep diving into specific subjects. This extensive archive allows developers to learn from expert discussions and presentations to enhance their knowledge and engagement with the OCaml community.</p>
<p>The <a href="https://ocaml.org/events">Resources section</a> on the community page is a new addition. It serves as a comprehensive directory of OCaml tools and resources created by the community. This section is designed to help users easily find and access a wide range of resources that include learning resources such as OCamlverse, Learn OCaml, and materials from workshops and lectures. You will also find utilities like Sherlodoc / Sherlocode that allow you to search <code>opam-repository</code> and more. This page highlights the collaborative efforts of the OCaml community, making it an essential stop for both new and experienced developers looking to enhance their projects and deepen their understanding of OCaml.</p>
<p>By revising these pages, we hope to provide a more informative and engaging experience for all users, from experienced developers to newcomers exploring the OCaml community.</p>
]]></description><link>https://tarides.com/blog/2024-06-26-enhancing-the-ocaml-org-community-page-boosting-ux-and-ui-based-on-user-research</link><guid isPermaLink="false">https://tarides.com/blog/2024-06-26-enhancing-the-ocaml-org-community-page-boosting-ux-and-ui-based-on-user-research.html</guid><dc:creator><![CDATA[ Claire Vandenberghe ]]></dc:creator><pubDate>Wed, 26 Jun 2024 00:00:00 GMT</pubDate></item><item><title><![CDATA[Keeping Up With the Compiler: How we Help Maintain the OCaml Language]]></title><description><![CDATA[<p>Not all of our projects have a definite end: a grand culmination of effort and time where we pop champagne and set off fireworks (which is, of course, how we celebrate most of the time). Indeed, providing ongoing support for the OCaml ecosystem is one of our biggest priorities, and it means that we resolve issues, maintain libraries, and improve features continuously over time.</p>
<p>Providing high-quality maintenance may not be glamorous, but it is crucial work that keeps the OCaml compiler running smoothly and efficiently. No matter how well-designed an individual feature is, if it does not receive regular maintenance, it risks going out of sync with all the other features. Since the compiler is made up of many interrelated parts, sustained and targeted maintenance of its individual parts, as well as their interactions, is crucial to ensure stability and robustness.</p>
<p>Read on to learn about the process behind ensuring the long-term quality of the compiler and discover what our team has achieved so far!</p>
<h2>How do we Maintain the Compiler?</h2>
<p>Many teams, companies, and community members collaborate on the development and maintenance of the OCaml compiler. We are just one group among many, and aligning our work with the goals and needs of the ecosystem at large is essential. Our team members generally focus on areas where they have expertise and can be most helpful, including multicore support, Windows compatibility, and build system enhancements. Their skills range from compiler front-end development (including the PPX and type system) to the runtime, from signal handling to the garbage collector.</p>
<p>We work closely with researchers at <a href="https://www.inria.fr/fr">Inria</a>, who created OCaml and continue to significantly contribute to its development and maintenance, to discuss existing issues and goals. Core maintainers and other contributors to OCaml hold triaging meetings every two weeks, led by Florian Angeletti at Inria, where they review recent issues and pull requests made <a href="https://github.com/ocaml/ocaml/issues">to the OCaml repository</a>. Each question, bug, or contribution is assigned to a developer responsible for ensuring it is addressed. This is a collaborative process between maintainers across organisations and companies. Several existing core maintainers are also Tarides staff members, and we support their work as part of our internal compiler maintenance effort. More and more Tarides engineers are becoming OCaml core maintainers, with the rights and responsibilities that come along with it, as the community recognises the high quality of their contributions.</p>
<p>Tarides encourages all of our compiler developers to allocate time towards maintenance. This helps us disseminate knowledge more evenly around the teams and ensures continuous attention is put towards improvements, fixes, and optimisations.</p>
<h2>Identifying Key Areas of Effective Long-Term Maintenance</h2>
<p>The best way to maintain a project is to target areas of strategic importance while remaining flexible and responding to issues as they arise. Our goal is to ensure that the maintenance of OCaml is effective in the long run, and to accomplish this, we focus on areas where we have considerable expertise and on fostering growing community involvement.</p>
<ul>
<li>
<p><strong>Long-Term Improvements:</strong>
Maintenance work on the compiler does not just mean fixing small issues as they come up but includes long-term work towards key features. For example, OCaml 5 introduced a <a href="https://ocaml.org/manual/5.2/memorymodel.html">relaxed memory model</a> that provides strong guarantees for programs that have data races. While the recommendation is prescriptive, OCaml, having started out as a sequential language, does not always enforce the memory model correctly. When such divergences (bugs) are identified, Tarides maintainers aim to identify the expected behaviour based on the prescriptive definition of the memory model and enforce it. In addition, we work on projects that make the build system simpler and hence more maintainable, that increase portability across various platforms (which was reduced in the move from OCaml 4 to OCaml 5), and <a href="/blog/2024-05-22-launching-the-first-class-windows-project/">improve the user experience on Windows</a>.</p>
</li>
<li>
<p><strong>Automated Quality Assurance:</strong>
Part of guaranteeing the long-term stability of the compiler happens through the Continuous Integration (CI) process. Contributions to the OCaml compiler are subject to rounds of testing to ensure that their code behaves predictably. The first round runs on <a href="https://github.com/marketplace/actions/build-and-test-with-ghcr-io-kxcinc-ocaml-general">GitHub Actions</a> and is completed before a contribution is even accepted into the compiler. Only contributions that pass these tests can be accepted. The second round is significantly more exhaustive (and therefore requires much more computing power to execute) and is performed on PRs with a <code>needs-precheck</code> label. This round of testing uses the <a href="https://ci.inria.fr/hwloc/">Jenkins-CI</a> hosted by Inria. Jenkins-CI has a lot more backends than GitHub Actions and is used to check changes that may affect backend code.</p>
</li>
</ul>
<p>In addition to the above-mentioned rounds of CI, <a href="/blog/2024-04-24-under-the-hood-developing-multicore-property-based-tests-for-ocaml-5/">multicore tests</a> are frequently performed on the compiler as part of the team's workflow, enabling them to catch and fix some hard-to-spot bugs, <a href="/blog/2023-10-18-off-to-the-races-using-threadsanitizer-in-ocaml/">including data races</a>. As a case in point, developers are using multicore tests alongside their ongoing efforts to restore MSVC support to OCaml 5.3 in order to ensure that changes do not introduce unwanted behaviours into the code.</p>
<ul>
<li><strong>Community:</strong>
When all is said and done, the OCaml compiler exists for the language's wider open-source community, so encouraging community engagement and feedback is vital. We prioritise clear documentation and open discussion in the public repositories to allow everyone to weigh in. We also organise regular compiler hacking events where we invite people into our offices (in 2023, we hosted people in <a href="/blog/2023-03-22-compiler-hacking-in-cambridge-is-back/">Cambridge</a> and in <a href="/blog/2023-11-09-ocaml-hacking-day-in-chennai/">Chennai</a>) to discuss, hack, and hang out. Our hacking days have sparked significant contributions to the compiler and brought new contributors on board. These initiatives are designed to facilitate open communication, share progress, and involve the community in the ongoing development of the OCaml compiler – thereby fostering alignment with community interests.</li>
</ul>
<p>The compiler team's effort, in collaboration with other ecosystem members, provides a core service to the OCaml community. Together, we improve and maintain fundamental parts of the OCaml compiler with high levels of oversight, safety, and transparency.</p>
<h2>Compiler Maintenance Fixes</h2>
<p>Here is a short list illustrating the range of issues, big and small, that the compiler engineers at Tarides address as part of compiler maintenance. This list is far from exhaustive and instead aims to give an overview of the kinds of tasks that we undertake:</p>
<ul>
<li>
<p><a href="https://github.com/ocaml/flexdll/pull/114">Fixing Bugs in MSVC on Windows</a>
Our compiler team not only introduces features and resolves problems relevant to Tarides but also addresses issues raised by different members of the OCaml community. This task includes reviewing external PRs and helping to keep the OCaml compiler maintained and up-to-date. <a href="https://github.com/dra27">David Allsopp</a> reviewed this PR, which fixed a bug happening when a 32-bit MSVC ran on Windows, and the CI script accidentally ran 32-bit Cygwin.</p>
</li>
<li>
<p><a href="https://github.com/ocaml/ocaml/pull/12213">Improving Error Messaging</a>
Improving the OCaml user experience involves making error messages easier to understand and, therefore, also falls under the goals of the compiler team. More accessible error messages make the language easier for beginners to use and learn. In this PR, <a href="https://github.com/shym">Samuel Hym</a> made a <code>symlink</code> error message that appeared when users tried to link non-existent files much easier to understand.</p>
</li>
<li>
<p><a href="https://github.com/ocaml/ocaml/pull/11594">Updated <code>framepointers</code> Tests to Avoid False Positives With Inlined C Functions</a>
This pull request addressed a problem where the <a href="https://gcc.gnu.org">GNU Compiler Collection</a> C compiler would inline some C function backtraces and not others. <a href="https://github.com/fabbing">Fabrice Buoro</a> updated the <code>ocaml_program</code> frame-pointer backtrace to ignore differences in the case of inconsistent inlining decisions made by the C compiler. In addition, <code>caml_program</code> now does all of the previous 'backtrace post-processing' locally to the C code, eliminating the <code>awk</code>, <code>sh</code>, and <code>sed</code> dependencies.</p>
</li>
<li>
<p><a href="https://github.com/ocaml/ocaml/pull/12383">Improve Backtrace Abstractions Inside the Runtime</a>
Part of the compiler team's work is also to prepare the compiler for future features. <a href="https://github.com/NickBarnes">Nick Barnes</a> has been working on bringing <code>statmemprof</code> support to OCaml 5 and preparing the trunk runtime to be compatible with <code>statmemprof</code>; he has improved its backtrace abstractions in this PR. Previously, the OCaml backtrace API allowed backtraces to be obtained as a single per-domain buffer or as an object on the OCaml heap. However, <code>statmemprof</code> needs to be able to use the current backtrace at arbitrary allocation points when Caml heap allocation might not be possible. Nick's PR changed the <code>backtrace.h</code> abstraction by adding <code>caml_get_callstack()</code>.</p>
</li>
<li>
<p><a href="https://github.com/ocaml/ocaml/pull/12768">Improvements to <code>$ocaml_cv_cc_vendor</code> in Configure</a>
With this PR, <a href="https://github.com/MisterDA">Antonin Décimo</a> improved the detection of the C compiler on Windows, replacing the use of <code>$cc_basename</code> with <code>$ocaml_cv_cc_vendor</code>. This change helps users detect when <code>mingw-w64</code> and <code>clang-cl</code> are used, fixes TSan detection on macOS, and removes uses of <code>$cc_basename</code>. A bonus of this change is that better detection raises the quality of bug reports, since users can report which C compiler they use when they experience a bug.</p>
</li>
<li>
<p><a href="https://github.com/ocaml/ocaml/pull/13207">Fixing a Hard-to-Reproduce Bug</a>
Some bugs can be far-reaching but hard to identify and reproduce (where developers try to re-create the condition under which the bug manifests). These cases call for a lot of patience and meticulousness on the part of the programmer trying to solve them. In PR <a href="https://github.com/ocaml/ocaml/pull/13207">#13207</a>, <a href="https://github.com/dustanddreams">Miod Vallat</a> worked on a bug identified by <a href="https://github.com/polytypic">Vesa Karvonen</a> that affected all 64-bit architectures apart from <code>amd64</code>. The bug could only be reproduced when three conditions held at once: the system called C code that called back into OCaml code, that OCaml code had to grow its stack, and an exception was pending upon return from the invoked C code. In OCaml 5, as opposed to OCaml 4, growing the OCaml stack changes the value of the exception pointer. Thus, the cached pointer was left pointing to the old stack, an issue fixed by ensuring that the register storing the cached exception pointer is refreshed upon return.</p>
</li>
<li>
<p><a href="https://github.com/ocaml/ocaml/pull/12707">Resolving a Complicated Bug in the Debug Runtime</a>
Finally, a decent chunk of the team's tasks requires a lot of investigative skills! Some bugs are so hard to define and understand that our engineers must act as detectives, piecing together a larger picture from hard-won clues. This is what <a href="https://github.com/jmid">Jan Midtgaard</a> did to understand the cause of a rare race condition in the debug runtime. It's essential to allow the problem-solving process to take the time it requires and not try to rush it. Complex problems require a patient and methodical approach, and we have set up the compiler maintenance project to manage these essential tasks.</p>
</li>
</ul>
<h2>Sharing the Load</h2>
<p>Aside from the fixes contributed by the team, Tarides also contributes significantly to reviewing issues and pull requests. For example, of the 273 PRs that went into the OCaml 5.2.0 release, Tarides was significantly involved in approximately 160. This includes 90 PRs from Tarides with new features and bug fixes (some mentioned above) and 70 PRs made by non-Tarides contributors where Tarides engineers were credited as reviewers. Among the many areas of OCaml maintenance, ensuring that issues and PRs are addressed is significant for the project's sustainability, and Tarides is proud to do its part.</p>
<h2>Stay in Touch!</h2>
<p>Tarides supports many critical projects and parts of the OCaml ecosystem, as we believe long-term maintenance is vital to improving and strengthening the language over time. We're lucky to collaborate with people and organisations passionate about the language, and we invite you to <a href="https://github.com/ocaml/ocaml/blob/b02f003ad335ec086614d6c99346067abac5784e/CONTRIBUTING.md">contribute your own expertise to the repo</a> and join the discussions on the <a href="https://discuss.ocaml.org/">OCaml Discuss Forum</a>.</p>
<p>Follow us on <a href="https://www.linkedin.com/company/tarides">LinkedIn</a> and <a href="https://bsky.app/profile/tarides.com">Bluesky</a> for the latest news from Tarides. You can also <a href="/contact/">contact us</a> directly on our website if you have questions or want more information about our projects and how you can benefit from them.</p>
]]></description><link>https://tarides.com/blog/2024-06-19-keeping-up-with-the-compiler-how-we-help-maintain-the-ocaml-language</link><guid isPermaLink="false">https://tarides.com/blog/2024-06-19-keeping-up-with-the-compiler-how-we-help-maintain-the-ocaml-language.html</guid><dc:creator><![CDATA[ Isabella Leandersson ]]></dc:creator><pubDate>Wed, 19 Jun 2024 00:00:00 GMT</pubDate></item><item><title><![CDATA[Creating the SyntaxDocumentation Command - Part 2: OCaml LSP]]></title><description><![CDATA[<p>In the first part of this series, <a href="/blog/2024-04-17-creating-the-syntaxdocumentation-command-part-1-merlin/">Creating the <code>SyntaxDocumentation</code> Command - Part 1: Merlin</a>, we explored how to create a new command in Merlin, particularly the <code>SyntaxDocumentation</code> command. In this continuation, we will be looking at the amazing OCaml LSP project and how we have integrated our <code>SyntaxDocumentation</code> command into it. OCaml LSP is a broad and complex project, so we will be limiting the scope of this article just to what's relevant for the <code>SyntaxDocumentation</code> command.</p>
<h2>Language Server Protocol</h2>
<p>The <a href="https://microsoft.github.io/language-server-protocol/">Language Server Protocol (LSP)</a> defines the protocol used between an editor or IDE and a language server that provides language features like auto complete, go to definition, find all references, etc. In turn, the protocol defines the format of the messages sent using <a href="https://www.jsonrpc.org/">JSON-RPC</a> between the development tool and the language server. With LSP, a single language server can be used with multiple development tools, such as:</p>
<ul>
<li>Integrated Development Environments (IDEs): Visual Studio Code, Atom, or IntelliJ IDEA</li>
<li>Code editors: Sublime Text, Vim, or Emacs</li>
<li>Text editors with code-related features</li>
<li>Command-line tools for code management, building, or testing</li>
</ul>
<h3>How LSP Works</h3>
<p>Here's a typical interaction between a development tool and a language server:</p>
<ol>
<li><strong>Document Opened:</strong> When the user opens a document, this notifies the language server that a document is open <code>(textDocument/didOpen)</code>.</li>
<li><strong>Editing:</strong> When the user edits the document, this notifies the server about the changes <code>(textDocument/didChange)</code>. The server analyses the changes and notifies the tool of any detected errors and warnings <code>(textDocument/publishDiagnostics)</code>.</li>
<li><strong>Go to Definition:</strong> The user executes "Go to Definition" on a symbol. The tool sends a <code>textDocument/definition</code> request to the server, which responds with the location of the symbol's definition.</li>
<li><strong>Document Closed:</strong> The user closes the document. A <code>textDocument/didClose</code> notification is sent to the server.</li>
</ol>
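<p>To make the difference between notifications and requests concrete, here is a minimal sketch of how such messages could be built by hand. The helper names and the example URI are hypothetical and are not part of OCaml LSP; a real client would use a JSON library rather than string formatting. The only protocol details relied on are that JSON-RPC requests carry an <code>id</code> while notifications do not, and that the LSP base protocol prefixes each message with a <code>Content-Length</code> header.</p>
<pre><code class="language-ocaml">(* A minimal sketch: JSON-RPC 2.0 messages built by hand as strings.
   All helper names are hypothetical, not part of ocaml-lsp. *)

(* Notifications carry no "id", so the server sends no reply. *)
let notification ~meth ~params =
  Printf.sprintf {|{"jsonrpc":"2.0","method":"%s","params":%s}|} meth params

(* Requests carry an "id" that the matching response echoes back. *)
let request ~id ~meth ~params =
  Printf.sprintf {|{"jsonrpc":"2.0","id":%d,"method":"%s","params":%s}|}
    id meth params

(* The LSP base protocol prefixes each message with its length in bytes. *)
let frame body =
  Printf.sprintf "Content-Length: %d\r\n\r\n%s" (String.length body) body

(* Hypothetical document used for illustration only. *)
let uri = "file:///tmp/example.ml"
let doc = Printf.sprintf {|{"textDocument":{"uri":"%s"}}|} uri

let () =
  let goto_def =
    Printf.sprintf
      {|{"textDocument":{"uri":"%s"},"position":{"line":0,"character":4}}|} uri
  in
  List.iter print_string
    [ frame (request ~id:1 ~meth:"textDocument/definition" ~params:goto_def);
      frame (notification ~meth:"textDocument/didClose" ~params:doc) ]
</code></pre>
<p>The "Go to Definition" step is a request because the tool needs an answer, whereas closing a document is a notification that the server only has to record.</p>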
<h2>OCaml LSP</h2>
<p><a href="https://github.com/ocaml/ocaml-lsp">ocaml-lsp</a> is an implementation of the Language Server Protocol for OCaml in OCaml. It provides language features like code completion, go to definition, find references, type information on hover, and more, to editors and IDEs that support the Language Server Protocol. OCaml LSP is built on top of Merlin, which provides the actual analysis and type information.</p>
<p>Currently, OCaml LSP supports several LSP requests such as <code>textDocument/completion</code>, <code>textDocument/hover</code>, <code>textDocument/codeLens</code>, etc. For the purposes of this article, we will limit the scope to <code>textDocument/hover</code> requests because this is where our command is implemented. You can find out more about supported OCaml LSP requests at <a href="https://github.com/ocaml/ocaml-lsp/tree/master?tab=readme-ov-file#features">Features | OCaml LSP</a>.</p>
<h2>Hover Requests</h2>
<p>When a user hovers over a symbol or some syntax, their development tool sends a <code>textDocument/hover</code> request to the language server. To better understand this process, let us consider some sample code:</p>
<pre><code><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">get_children</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-source">position</span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-constant-language-capital-identifier">Lexing</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">position</span><span class="ocaml-source">)</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-source">root</span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source">node</span><span class="ocaml-source">)</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">some</span><span class="ocaml-source"> </span><span class="ocaml-source">code</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">
</span></code></pre>
<p>When the user hovers over the function name, <code>get_children</code>, the hover request (taken from the server logs) is as follows:</p>
<pre><code class="language-json">[Trace - 4:07:21 AM] Sending request 'textDocument/hover - (13)'.
Params: {
    "textDocument": {
        "uri": "file:///home/../../merlin/src/kernel/mbrowse.ml"
    },
    "position": {
        "line": 279,
        "character": 10
    }
}
</code></pre>
<p>This request includes the following information:</p>
<ul>
<li>The URI of the document where the user is hovering</li>
<li>The position (line and character) within the document where the hover event occurred</li>
</ul>
<p>The language server then responds with whatever information its hover query is designed to provide, such as type information or documentation.</p>
<pre><code class="language-json">[Trace - 4:07:21 AM] Received response 'textDocument/hover - (13)' in 2ms.
Result: {
    "contents": {
        "kind": "markdown",
        "value": "```ocaml\nLexing.position -&gt; ('a * node) list -&gt; node\n```"
    },
    "range": {
        "end": {
            "character": 16,
            "line": 279
        },
        "start": {
            "character": 4,
            "line": 279
        }
    }
}
</code></pre>
<p>The response received indicates that at this position the type signature is <code>Lexing.position -&gt; ('a * node) list -&gt; node</code>. It is formatted with Markdown because the request was made from VSCode. For development tools that don't support Markdown, this response will simply be plaintext. The <code>range</code> is used by the editor to highlight the relevant line(s) for the user.</p>
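<p>As a rough illustration of the Markdown versus plaintext distinction, the sketch below wraps a type string in an OCaml code fence only when the client accepts Markdown. The type and function names are hypothetical and this is not how OCaml LSP is actually written; it only mirrors the shape of the <code>value</code> field shown in the response.</p>
<pre><code class="language-ocaml">(* A minimal sketch with hypothetical names; not the ocaml-lsp code. *)
type markup_kind = Markdown | Plaintext

(* Three backticks, built programmatically to keep this listing simple. *)
let fence = String.make 3 '`'

(* Wrap the type in an OCaml code fence only for Markdown-capable clients. *)
let hover_value kind typ =
  if kind = Markdown then Printf.sprintf "%socaml\n%s\n%s" fence typ fence
  else typ

let () =
  print_endline (hover_value Markdown "int list");
  print_endline (hover_value Plaintext "int list")
</code></pre>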
<h2><code>SyntaxDocumentation</code> Implementation</h2>
<p>With OCaml LSP, type information displayed from a hover request is taken from Merlin using the <code>type_enclosing</code> command, and the information returned is passed on to the hover functionality to be displayed as a response. This gives us a natural place to hook in: we also query Merlin with the <code>SyntaxDocumentation</code> command and add its result to the <code>type_enclosing</code> response.</p>
<pre><code><span class="ocaml-keyword-other">type</span><span class="ocaml-source"> </span><span class="ocaml-source">type_enclosing</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source">
</span><span class="ocaml-source">{</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-source">loc</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Loc</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">t</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-source">typ</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source"> </span><span class="ocaml-support-type">string</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-source">doc</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source"> </span><span class="ocaml-support-type">string</span><span class="ocaml-source"> </span><span class="ocaml-source">option</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-source">syntax_doc</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Query_protocol</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">syntax_doc_result</span><span class="ocaml-source"> </span><span class="ocaml-source">option</span><span class="ocaml-source">
</span><span class="ocaml-source">}</span><span class="ocaml-source">
</span></code></pre>
<p>To query Merlin for something, we use <code>Query_protocol</code> and <code>Query_commands</code>. You can read more about what these do in <a href="/blog/2024-04-17-creating-the-syntaxdocumentation-command-part-1-merlin/">Part 1</a> of this article series.</p>
<pre><code><span class="ocaml-source"> </span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">syntax_doc</span><span class="ocaml-source"> </span><span class="ocaml-source">pipeline</span><span class="ocaml-source"> </span><span class="ocaml-source">pos</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">res</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source">
</span><span class="ocaml-source">      </span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">command</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Query_protocol</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-constant-language-capital-identifier">Syntax_document</span><span class="ocaml-source"> </span><span class="ocaml-source">pos</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">in</span><span class="ocaml-source">
</span><span class="ocaml-source">      </span><span class="ocaml-constant-language-capital-identifier">Query_commands</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">dispatch</span><span class="ocaml-source"> </span><span class="ocaml-source">pipeline</span><span class="ocaml-source"> </span><span class="ocaml-source">command</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-keyword-other">in</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-keyword-other">match</span><span class="ocaml-source"> </span><span class="ocaml-source">res</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">with</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-keyword-other">|</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-polymorphic-variant">`Found</span><span class="ocaml-source"> </span><span class="ocaml-source">s</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Some</span><span class="ocaml-source"> </span><span class="ocaml-source">s</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-keyword-other">|</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-polymorphic-variant">`No_documentation</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">None</span><span class="ocaml-source">
</span></code></pre>
<h3>Making <code>SyntaxDocumentation</code> Configurable</h3>
<p>Sometimes, too much information can be problematic, which is the case with the hover functionality. Most of the time, users just want a specific kind of information, and presenting a lot of unrelated information can have a negative effect on their productivity. For this reason, <code>SyntaxDocumentation</code> is made to be configurable, so users can toggle it on or off. This is made possible by passing configuration settings to the server.</p>
<pre><code><span class="ocaml-source">syntaxDocumentation</span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source"> </span><span class="ocaml-source">{</span><span class="ocaml-source"> </span><span class="ocaml-source">enable</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source"> </span><span class="ocaml-source">boolean</span><span class="ocaml-source"> </span><span class="ocaml-source">}</span><span class="ocaml-source">
</span></code></pre>
<p>For a piece of code such as:</p>
<pre><code><span class="ocaml-keyword-other">type</span><span class="ocaml-source"> </span><span class="ocaml-source">color</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Red</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">|</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Blue</span><span class="ocaml-source">
</span></code></pre>
<p>When SyntaxDoc is turned off, we receive the following response:</p>
<pre><code class="language-json">{
      "contents": { "kind": "plaintext", "value": "type color = Red | Blue" },
      "range": {
        "end": { "character": 21, "line": 1 },
        "start": { "character": 0, "line": 1 }
      }
    }
</code></pre>
<p>When SyntaxDoc is turned on, we receive the following response:</p>
<pre><code class="language-json">{
      "contents": {
        "kind": "plaintext",
        "value": "type color = Red | Blue. `syntax` Variant Type: Represent's data that may take on multiple different forms..See [Manual](https://v2.ocaml.org/releases/4.14/htmlman/typedecl.html#ss:typedefs)"
      },
      "range": {
        "end": { "character": 21, "line": 1 },
        "start": { "character": 0, "line": 1 }
      }
    }
</code></pre>
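<p>The two responses differ only in the <code>value</code> string. The following sketch shows one way such a toggle could work; it is an illustration under stated assumptions rather than the OCaml LSP implementation. The record type is a local stand-in for Merlin's result, and its field names, the <code>enable</code> flag, and the exact output format are all assumptions made for this example.</p>
<pre><code class="language-ocaml">(* Local stand-in for Merlin's syntax documentation result; the field
   names are assumptions made for this sketch. *)
type syntax_doc = { name : string; description : string; url : string }

(* Hypothetical formatting, loosely modelled on the response shown above. *)
let render_doc d =
  Printf.sprintf "`syntax` %s: %s. See [Manual](%s)" d.name d.description d.url

(* Only use the documentation when the client has enabled the option;
   otherwise fall back to the plain type string. *)
let hover_value ~enable ~typ ~syntax_doc =
  let doc = if enable then syntax_doc else None in
  Option.fold ~none:typ
    ~some:(Printf.sprintf "%s. %s" typ)
    (Option.map render_doc doc)

let () =
  let typ = "type color = Red | Blue" in
  let syntax_doc =
    Some
      { name = "Variant Type";
        description = "Represents data that may take on multiple forms";
        url = "https://v2.ocaml.org/releases/4.14/htmlman/typedecl.html#ss:typedefs" }
  in
  print_endline (hover_value ~enable:false ~typ ~syntax_doc);
  print_endline (hover_value ~enable:true ~typ ~syntax_doc)
</code></pre>
<p>Keeping the check in one place means that a disabled option and a position with no syntax documentation both produce the same plain type string.</p>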
<h2>Conclusion</h2>
<p>In this article, we looked at the Language Server Protocol and a few examples of how it is implemented in OCaml. With OCaml LSP, the <code>SyntaxDocumentation</code> command becomes a very handy tool, empowering developers to get documentation information by just hovering over the syntax. If you wish to support the OCaml LSP project, you are welcome to submit issues and code contributions to the repository at <a href="https://github.com/ocaml/ocaml-lsp/issues">Issues | OCaml LSP</a>. In the next and final part of this series, we will look at the VSCode Platform Extension for OCaml and how we can add a visual checkbox to the UI for toggling <code>SyntaxDocumentation</code> on and off.</p>
]]></description><link>https://tarides.com/blog/2024-06-12-creating-the-syntaxdocumentation-command-part-2-ocaml-lsp</link><guid isPermaLink="false">https://tarides.com/blog/2024-06-12-creating-the-syntaxdocumentation-command-part-2-ocaml-lsp.html</guid><dc:creator><![CDATA[ Pizie Dust ]]></dc:creator><pubDate>Wed, 12 Jun 2024 00:00:00 GMT</pubDate></item><item><title><![CDATA[Secure From the Ground Up: Introducing the FIDES Project Combining RISC-V and MirageOS]]></title><description><![CDATA[<p>We entrust some of our most sensitive information to an invisible stream of information flowing back and forth across the globe. Cybersecurity concerns everyone, and Tarides is developing secure, lasting, and high-performing solutions for a diverse set of users. This post introduces you to cutting-edge technology, employing some of the latest research in software development and cybersecurity to give you a sense of what the future of hardware and software security looks like.</p>
<p>Tarides is collaborating with the <a href="https://www.iitm.ac.in/">Indian Institute of Technology (IIT), Madras</a> on the FIDES project. The project employs a hardware-enabled intra-process compartmentalisation technique on the <a href="https://shakti.org.in/">Shakti</a> <a href="https://riscv.org/">RISC-V</a> processor, capable of running bare-metal <a href="https://mirage.io/">MirageOS</a> unikernels written in <a href="https://ocaml.org/">OCaml</a> safely alongside C code. This combined hardware and software solution leverages hardware-enabled compartments and OCaml's safety guarantees to ensure excellent security for real-world applications that mix safe and unsafe code.</p>
<h2>Why Choose RISC-V and Shakti?</h2>
<p>You might have heard of <a href="https://arxiv.org/pdf/2309.11332.pdf">CHERI</a>, a UK-based project that adds security features on top of <a href="https://en.wikipedia.org/wiki/ARM_architecture_family">ARM</a>. As of 2022 ARM is the most widely used family of ISAs, and ARM processors power anything from smartphones to supercomputers.</p>
<p>The goal of the FIDES project is similar to CHERI. It extends the open-source Shakti RISC-V processors with additional security features. Unlike CHERI, FIDES is specialised to run MirageOS applications on top. <a href="https://mirage.io/">MirageOS</a> is a library operating system that uses properties of <a href="https://ocaml.org/">OCaml</a> (a <a href="/blog/2023-08-17-your-programming-language-and-its-impact-on-the-cybersecurity-of-your-application/">highly-secure programming language</a>) to construct Unikernels to make secure, high-performance, network applications compatible with a variety of cloud computing and mobile platforms. The key benefit of the FIDES approach is that we can take advantage of the language-level security offered by OCaml to alleviate some of the overheads of enforcing security purely through hardware. With FIDES, the user gets a secure system from the hardware all the way up to the application layer.</p>
<p>But what is RISC-V? In 2015, the University of California, Berkeley, wanted to encourage <a href="https://riscv.org/about/history/">"an open, collaborative community of software and hardware innovators based on the RISC-V ISA"</a> by founding the <a href="https://riscv.org/">RISC-V International Foundation</a>. ‘RISC’ stands for ‘Reduced Instruction Set Computer’ and RISC-V is an open standard Instruction Set Architecture (ISA), offered free of charge alongside its extensions for anyone looking to build solutions and services. The initiative has sparked countless projects since its inception.</p>
<p>Using the RISC-V design and ISA is already a good start for security. It has strong academic roots, originally developed from a series of computer design projects by experts at Berkeley. This solid foundation has only been strengthened by being open-source, where the entire RISC-V architecture can be scrutinised in the public domain. Unlike with closed-source projects, where the code can't be inspected, users can be sure there are no back doors or hidden channels. Having a free, open-source alternative also increases competition in the sector; developers can use publicly available extensions and reference designs to create more secure alternatives to the closed-source solutions available.</p>
<p>The <a href="https://shakti.org.in/">Shakti</a> initiative is an open-source project spearheaded by the Reconfigurable Intelligent Systems Engineering (RISE) team at IIT Madras. The project encompasses a family of processors that use the RISC-V ISA to create scalable, cost-efficient, and open-source hardware solutions.</p>
<p>The FIDES project builds on the Shakti processor (using the RISC-V ISA) and runs secure MirageOS OCaml unikernels on top.</p>
<h2>The FIDES Approach</h2>
<p>FIDES provides two extensions on top of the base Shakti RISC-V code. The first is a fine-grained compartmentalisation scheme that splits a process into multiple logical code partitions with an explicit access policy. This restricts what code can be called from functions in a given compartment. For example, the code of the display driver in a <a href="https://en.wikipedia.org/wiki/Payment_terminal">point-of-sale device</a> has no reason to call into crypto code. Therefore, the two can be statically partitioned into separate compartments. This prevents vulnerabilities in the display driver from affecting the crypto modules. In addition, using a language like <a href="/blog/2023-07-05-zero-day-attacks-what-are-they-and-can-a-language-like-ocaml-protect-you/">OCaml guarantees memory- and temporal- safety</a> through its compiler and garbage collector. Thus, the programming language mitigates common memory error vulnerabilities while the compartmentalisation scheme mitigates privilege escalation vulnerabilities.</p>
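<p>As a rough illustration of what an explicit access policy means, the point-of-sale example can be modelled in a few lines of OCaml. This is only a sketch: the compartment names and the <code>may_call</code> function are hypothetical, and FIDES enforces its policies in hardware rather than through a function like this one.</p>
<pre><code class="language-ocaml">(* Hypothetical compartments for a point-of-sale device. *)
type compartment = Display | Crypto | Payment

(* Hypothetical static policy: may code in [caller] call into [callee]?
   The display driver is confined to itself, so a bug there cannot reach
   the crypto code. *)
let may_call ~caller ~callee =
  match caller, callee with
  | Display, Display -&gt; true
  | Display, (Crypto | Payment) -&gt; false
  | Crypto, Crypto -&gt; true
  | Crypto, (Display | Payment) -&gt; false
  | Payment, (Display | Crypto | Payment) -&gt; true
</code></pre>
<p>Because the policy is fixed ahead of time, a call from <code>Display</code> into <code>Crypto</code> is rejected no matter what the display driver's code tries to do.</p>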
<p>The second extension that FIDES provides is a fat-pointer scheme for C code. A significant amount of code is written in C, and we want to be able to make use of those programs without compromising on security. When C programs are combined with OCaml programs in the same application (such as a MirageOS unikernel), the vulnerabilities in the C code can compromise the security guarantees of the OCaml code. Every MirageOS unikernel has a significant chunk of C code present in the OCaml runtime that connects external libraries implementing core components such as fast crypto modules.</p>
<p>The FIDES fat pointer augments a pointer to a memory region with additional information that records the address range of a memory region (which is used for spatial memory safety) and a temporal cookie (for temporal memory safety). The spatial and temporal safety is checked on every access. The fat-pointer scheme ensures that the C code also has spatial and temporal memory safety. In this way, fat-pointers for C ensure that when linked with OCaml, the security guarantees of OCaml are not violated by memory vulnerabilities in the C code.</p>
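<p>To make these checks concrete, here is a minimal software model in OCaml. It is only a sketch: the record fields, the <code>live_cookie</code> lookup, and the <code>check</code> function are hypothetical names chosen for illustration, and FIDES performs the equivalent checks in hardware rather than in code like this.</p>
<pre><code class="language-ocaml">(* Illustrative model of a fat pointer; the layout is an assumption and does
   not reflect the actual FIDES encoding. *)
type fat_pointer = {
  addr : int;   (* address being accessed *)
  base : int;   (* first valid address of the region *)
  bound : int;  (* one past the last valid address of the region *)
  cookie : int; (* temporal cookie carried by the pointer *)
}

type verdict = Allowed | Spatial_violation | Temporal_violation

(* [live_cookie base] is assumed to return the cookie currently attached to
   the region starting at [base], or [None] once the region has been freed. *)
let check ~live_cookie p =
  if p.addr &lt; p.base || p.addr &gt;= p.bound then Spatial_violation
  else
    match live_cookie p.base with
    | Some c when c = p.cookie -&gt; Allowed
    | Some _ | None -&gt; Temporal_violation
</code></pre>
<p>A pointer that strays outside its region fails the spatial check, and a pointer whose cookie no longer matches the live one (for example, after the region has been freed) fails the temporal check.</p>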
<p>For performance reasons, the compartmentalisation and the fat pointer scheme for C are accelerated with hardware support. Importantly, since OCaml has language-level safety, FIDES does not use fat pointers for OCaml and pays neither the memory overhead of fat pointers nor the performance overhead of checking their validity at runtime. Given that MirageOS lets the developers build much of the typical operating system services directly in OCaml, FIDES is able to pay the additional performance cost for security only for C code.</p>
<p>Since OCaml and C code can interact seamlessly, the developer benefits from both safety and flexibility. The compartmental approach limits additional vulnerabilities and fat-pointers ensure that C code interacts safely with OCaml code. On top of this, FIDES deploys secure-by-design MirageOS applications, extending the MirageOS backend to execute on bare-metal RISC-V processors. In this way, from metal to application, FIDES combines open-source technologies with strong security guarantees to give users a transparent, safe, and flexible solution.</p>
<h3>Benefits</h3>
<p>Let's conclude with the key benefits of using FIDES:</p>
<ul>
<li><strong>Secure:</strong> Memory safety is guaranteed, and FIDES seamlessly supports linking to unsafe C code without compromising overall safety.</li>
<li><strong>Light-Weight:</strong> Does not require a memory management unit, an OS, or a hypervisor to guarantee isolation. It protects bare metal resource-constrained systems.</li>
<li><strong>User Friendly:</strong> FIDES is designed to let users stay oblivious to the compartment policies and use unmodified OCaml and C code without issue. The FIDES design also permits a security engineer to perform critical compartment mapping tasks as part of deployment rather than development.</li>
</ul>
<h2>Until Next Time!</h2>
<p>What sets FIDES apart is its focus on security, combining safety features for software and hardware to provide a secure and convenient way to deploy applications. Running MirageOS on the Shakti RISC-V processor shows excellent potential for delivering a high-performance, secure user experience.</p>
<p>Connect with us on <a href="https://bsky.app/profile/tarides.com">Bluesky</a> and <a href="https://www.linkedin.com/company/tarides">LinkedIn</a> for the latest updates on our projects. You can also <a href="/contact/">contact us</a> directly on our website with any feedback or questions. See you next time!</p>
]]></description><link>https://tarides.com/blog/2024-06-05-secure-from-the-ground-up-introducing-the-fides-project-combining-risc-v-and-mirageos</link><guid isPermaLink="false">https://tarides.com/blog/2024-06-05-secure-from-the-ground-up-introducing-the-fides-project-combining-risc-v-and-mirageos.html</guid><dc:creator><![CDATA[ KC Sivaramakrishnan, Isabella Leandersson ]]></dc:creator><pubDate>Wed, 05 Jun 2024 00:00:00 GMT</pubDate></item><item><title><![CDATA[Effective ML Through Merlin's Destruct Command]]></title><description><![CDATA[<p>The Merlin server and OCaml LSP server, two closely related OCaml language
servers, enhance productivity with features like autocompletion and type
inference. Their lesser known, yet highly useful <code>destruct</code> command simplifies
the use of pattern matching by generating exhaustive match statements, as we’ll
illustrate in this article. The command has recently received a bit of love,
making it more usable, and we are taking advantage of this refresh to introduce
it and showcase some use cases.</p>
<p>A <em>good</em> IDE for a programming language ought to provide contextual information,
such as completion suggestions, details about expressions like types, and
real-time error feedback. However, in an ideal world, it should also serve as a
code-writing assistant, capable of generating code as needed. And even though
there are undeniably commonalities among a broad range of programming languages,
allowing for the "generalisation" of interactions with a code editor via a
protocol (such as <a href="https://github.com/ocaml/ocaml-lsp">LSP</a>), some languages
possess uncommon or even unique functionalities that require special
treatment. Fortunately, it is possible to develop functionalities tailored to
these particularities. These can be invoked within LSP through <strong>custom
requests</strong> to retrieve arbitrary information and <strong>code actions</strong> to transform a
document as needed. Splendid! However, such functionality can be more difficult
to discover, as it departs somewhat from the standard IDE user experience. This is the
case with the <code>destruct</code> command, which is immensely useful and saves a great
deal of time.</p>
<p>In this article, we'll attempt to fathom the command's usefulness and its
application using somewhat simplistic examples. Following that, we'll delve into
a few less artificial examples that I use in my day-to-day coding. I hope that
the article is useful and entertaining both for people who already know
<code>destruct</code> and for people who don't.</p>
<h2>Destruct in Broad Terms</h2>
<p>OCaml allows the expression of <a href="https://ocamlbook.org/algebraic-types/">algebraic data
types</a> that, coupled with <a href="https://ocaml.org/docs/basic-data-types">pattern
matching</a>, can be used to describe data
structures and perform case analysis. In the event that a pattern match falls
short of being exhaustive, <strong>warning 8</strong>, known as <code>partial-match</code>, will be
raised during the compilation phase. Hence, it is advisable to keep match
blocks exhaustive.</p>
<p>The <code>destruct</code> command aids in achieving completeness. When applied to a pattern
(via <code>M-x merlin-destruct</code> in Emacs, <code>:MerlinDestruct</code> in Vim, and <code>Alt + d</code> in
Visual Studio Code), it generates patterns. The command behaves differently
depending on the cursor’s context:</p>
<ul>
<li>
<p>When it is called on an expression, it replaces it by a pattern match over
its constructors.</p>
</li>
<li>
<p>When it is called on a pattern of a non-exhaustive matching, it will make the
pattern matching exhaustive by adding missing cases.</p>
</li>
<li>
<p>When it is called on a wildcard pattern, it will refine it if possible.</p>
</li>
</ul>
<blockquote>
<p>For those unfamiliar with the term <code>destruct</code>, pattern matching is case
analysis, and expressing the form (a collection of patterns) on which you
<em>match</em> is called <strong>destructuring</strong>, because you are unpacking values from
structured data. This is the same terminology <a href="https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Operators/Destructuring_assignment">used in
JavaScript</a>.</p>
</blockquote>
<p>Let's examine each of these scenarios using examples.</p>
<h3>Destruct on an Expression</h3>
<p>Destructing an expression works in a fairly obvious way. If the typechecker is
aware of the expression type (in our example, it knows this by
inference), the expression will be substituted by a matching on all enumerable
cases.</p>
<p align="center">
<img src="/blog/images/2024-05-21.merlin-destruct/merlin-destruct-1~kHA8_iC67tU-2us0hsjbhQ.gif" alt="Destruct on expression">
</p>
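<p>For readers who cannot view the animation, here is a small textual sketch of the same idea. The type <code>t</code> and the function <code>to_string</code> are hypothetical, and the branch bodies were written by hand after the fact, since <code>destruct</code> only generates the patterns and leaves a placeholder on each right-hand side.</p>
<pre><code class="language-ocaml">type t = Foo | Bar

(* Before, with the cursor on the expression [x]:

     let to_string (x : t) = x

   After destruct, [x] is replaced by a match with one branch per
   constructor. The strings below were filled in by hand afterwards. *)
let to_string (x : t) =
  match x with
  | Foo -&gt; "foo"
  | Bar -&gt; "bar"
</code></pre>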
<h3>Destruct on a Non-Exhaustive Matching</h3>
<p>The second behaviour is, in my opinion, the most practical. Although I rarely
need to substitute an expression with a pattern match, I often want to perform a
case analysis on all the constructors of a sum type. By implementing just a
single pattern, such as <code>Foo</code>, my match expression is non-exhaustive, and if I
<code>destruct</code> on this, it will generate all the missing cases.</p>
<p align="center">
<img src="/blog/images/2024-05-21.merlin-destruct/merlin-destruct-2~h0Wv8gXWN0rskS6_w-ThXA.gif" alt="Destruct on non-exhaustive match">
</p>
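<p>In text form, the workflow looks roughly like the sketch below. The type and the <code>rank</code> function are hypothetical, and the exact layout of the generated branches may differ between Merlin versions.</p>
<pre><code class="language-ocaml">type t = Foo | Bar | Baz

(* Before, with only the first branch written, the match is non-exhaustive
   and triggers warning 8:

     match x with
     | Foo -&gt; 0

   Calling destruct on the [Foo] pattern adds the missing constructors.
   The right-hand sides for [Bar] and [Baz] were written by hand afterwards. *)
let rank (x : t) =
  match x with
  | Foo -&gt; 0
  | Bar -&gt; 1
  | Baz -&gt; 2
</code></pre>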
<h3>Destruct on a Wildcard Pattern</h3>
<p>The final behaviour is very similar to the previous one; when you <code>destruct</code> a
wildcard pattern (or a pattern producing a wildcard, for example, a variable
declaration), the command will generate all the missing branches.</p>
<p align="center">
<img src="/blog/images/2024-05-21.merlin-destruct/merlin-destruct-3~_VWYV_Exk4KJ46PdmxbGDA.gif" alt="Destruct on wildcard">
</p>
<h3>Dealing With Nesting</h3>
<p>When used interactively, it is possible to destruct nested patterns to quickly
achieve exhaustiveness. For example, let’s imagine that our variable <code>x</code> is of
type <code>t option</code>:</p>
<ul>
<li>We start by destructing our wildcard (<code>_</code>), which will produce two branches,
<code>None</code> and <code>Some _</code>.</li>
<li>Then, we can destruct on the associated wildcard of <code>Some _</code>, which will
produce all conceivable cases for the type <code>t</code>.</li>
</ul>
<p align="center">
<img src="/blog/images/2024-05-21.merlin-destruct/merlin-destruct-4~vTIN7T3JhO0yjcwShn0A4g.gif" alt="Destruct on nested patterns">
</p>
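<p>The two steps above can be sketched as follows, assuming a hypothetical type <code>t</code> with two constructors. The branch bodies are filled in by hand; only the patterns come from <code>destruct</code>.</p>
<pre><code class="language-ocaml">type t = Foo | Bar

(* Step 0: a single wildcard branch.
     match x with
     | _ -&gt; ...
   Step 1: destruct on [_] yields the two constructors of [option].
     | None -&gt; ...
     | Some _ -&gt; ...
   Step 2: destruct on the [_] inside [Some _] enumerates [t]. *)
let describe (x : t option) =
  match x with
  | None -&gt; "nothing"
  | Some Foo -&gt; "some foo"
  | Some Bar -&gt; "some bar"
</code></pre>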
<h3>In the Case of Products (Instead of Sums)</h3>
<p>In the previous examples, we were always dealing with cases whose domains are
perfectly defined, only destructing cases of simple sum type branches. However,
the <code>destruct</code> command can also act on products. Let's consider a very ambitious
example where we will perform exhaustive pattern matching on a value of type <code>t * t option</code>, generating all possible cases using <code>destruct</code> alone:</p>
<p align="center">
<img src="/blog/images/2024-05-21.merlin-destruct/merlin-destruct-5~IAWhnKdaaVJhB3Iki-jBzQ.gif" alt="Destruct on nested tuples">
</p>
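<p>As a textual sketch of where this ends up, assume a hypothetical <code>t</code> with two constructors (the type in the animation may differ). Two constructors on the left and three shapes on the right (<code>None</code>, <code>Some Foo</code>, <code>Some Bar</code>) give six branches in total:</p>
<pre><code class="language-ocaml">type t = Foo | Bar

(* Every combination is spelled out, so there is no wildcard left to hide a
   forgotten case. The strings were written by hand. *)
let describe (pair : t * t option) =
  match pair with
  | Foo, None -&gt; "foo alone"
  | Foo, Some Foo -&gt; "foo, foo"
  | Foo, Some Bar -&gt; "foo, bar"
  | Bar, None -&gt; "bar alone"
  | Bar, Some Foo -&gt; "bar, foo"
  | Bar, Some Bar -&gt; "bar, bar"
</code></pre>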
<p>It can be seen that when used interactively, the command saves a lot of time,
and, coupled with Merlin's real-time feedback regarding errors, we can quickly
ascertain when our pattern matching is exhaustive. In a way, it's a bit like a
manual "deriver."</p>
<p>The <code>destruct</code> command can act on any pattern, so it also works within function
arguments (although <a href="https://github.com/ocaml/ocaml/pull/12236">their representation has
changed</a> slightly for <code>5.2.0</code>), and
in addition to destructing tuples, it is also possible to destruct records,
which can be very useful for our quest for exhaustiveness!</p>
<p align="center">
<img src="/blog/images/2024-05-21.merlin-destruct/merlin-destruct-6~uWJtPasoed3rVlH3jgbYPw.gif" alt="Destruct on nested records">
</p>
<h3>When the Set of Constructors is Non-Finite</h3>
<p>Sometimes types are not finitely enumerable. For example, how
are we to handle strings or even integers? In such situations, <code>destruct</code> will
attempt to find an example. For integers, it will be <code>0</code>, and for strings, it
will be the empty string.</p>
<p align="center">
<img src="/blog/images/2024-05-21.merlin-destruct/merlin-destruct-7~qw12P2S9TTKCci78UQ749A.gif" alt="Destruct on non-enumerable values">
</p>
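<p>A short sketch of what this looks like in practice: the sample values <code>0</code> and <code>""</code> are the ones mentioned above, while the function names and branch bodies are hypothetical. Since integers and strings cannot be enumerated, a wildcard branch has to remain for everything else.</p>
<pre><code class="language-ocaml">(* [0] is the sample pattern proposed for integers. *)
let is_zero (n : int) =
  match n with
  | 0 -&gt; true
  | _ -&gt; false

(* [""] is the sample pattern proposed for strings. *)
let is_empty (s : string) =
  match s with
  | "" -&gt; true
  | _ -&gt; false
</code></pre>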
<p>Excellent! We have covered a large portion of the behaviors of the <code>destruct</code>
command, which are quite contextually relevant. There are others (such as cases
of destruction in the presence of GADTs that only generate subsets of patterns),
but it's time to move on to an example from the real world!</p>
<h2>The Quest for Exhaustiveness: Effective ML</h2>
<p>In 2010, <a href="https://x.com/yminsky">Yaron Minsky</a> gave an <a href="https://www.youtube.com/watch?v=-J8YyfrSwTk">excellent
presentation</a> on the reasons (and
advantages) for using OCaml at <a href="https://www.janestreet.com/">Jane Street</a>. In
addition to being highly inspiring, it provides specific insights and gotchas on
using OCaml effectively in an incredibly sensitive industrial context (hence the
name "Effective ML")! It was in this presentation that the maxim "<em>Make
illegal states unrepresentable</em>" was publicly mentioned for the first time, a
phrase that would later be frequently used to promote other technologies (such
as <a href="https://www.youtube.com/watch?v=IcgmSRJHu_8">Elm</a>). Moreover, the
presentation anticipates many discussions on domain modeling, which are dear to
the <a href="https://en.wikipedia.org/wiki/Software_craftsmanship">Software Craftsmanship
community</a>, by proposing
strategies for domain reduction (later extensively developed in
the book <a href="https://pragprog.com/titles/swdddf/domain-modeling-made-functional/"><em>Domain Modeling Made
Functional</em></a>).</p>
<p>Among the list of effective approaches to using an ML language, Yaron presents a
scenario where one might too hastily use the wildcard in a case analysis. The
example is closely tied to finance, but it's easy to transpose into a simpler
example. We will implement an <code>equal</code> function for a very basic type:</p>
<pre><code><span class="ocaml-keyword-other">type</span><span class="ocaml-source"> </span><span class="ocaml-source">t</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword-other">|</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Foo</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword-other">|</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Bar</span><span class="ocaml-source">
</span></code></pre>
<p>The <code>equal</code> function can be trivially implemented as follows:</p>
<pre><code><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">equal</span><span class="ocaml-source"> </span><span class="ocaml-source">a</span><span class="ocaml-source"> </span><span class="ocaml-source">b</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword-other">match</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-source">a</span><span class="ocaml-keyword-other-ocaml punctuation-comma punctuation-separator">,</span><span class="ocaml-source"> </span><span class="ocaml-source">b</span><span class="ocaml-source">)</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">with</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword-other">|</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Foo</span><span class="ocaml-keyword-other-ocaml punctuation-comma punctuation-separator">,</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Foo</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-boolean">true</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword-other">|</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Bar</span><span class="ocaml-keyword-other-ocaml punctuation-comma punctuation-separator">,</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Bar</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-boolean">true</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword-other">|</span><span class="ocaml-source"> </span><span class="ocaml-constant-language">_</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-boolean">false</span><span class="ocaml-source">
</span></code></pre>
<p>Our function works perfectly and is exhaustive. However, what happens if we add
a constructor to our type <code>t</code>?</p>
<pre><code><span class="diff-source">  type t =
</span><span class="diff-source">    | Foo
</span><span class="diff-source">    | Bar
</span><span class="diff-punctuation-definition-inserted">+</span><span class="diff-markup-inserted">   | Baz
</span></code></pre>
<p>Our function, in the case of <code>equal Baz Baz</code>, will return <code>false</code>, which is
obviously not the expected behavior. Since the wildcard makes our function
exhaustive, <strong>the compiler won't raise any errors</strong>. That's why Yaron Minsky
argues that a wildcard clause is, in many cases, probably a mistake. If
our function had listed every case explicitly, adding a constructor would have raised a
<code>partial-match</code> warning, forcing us to explicitly decide how to behave in the
presence of the new constructor! Therefore, using a wildcard in this context
<strong>deprives us of the fearless refactoring</strong>, which is a strength of OCaml. This
is indeed an argument in favor of using a preprocessor to generate equality
functions, using, for example <a href="https://github.com/ocaml-ppx/ppx_deriving?tab=readme-ov-file#plugins-eq-and-ord">the <code>eq</code> standard
deriver</a>
or the more hygienic <a href="https://github.com/janestreet/ppx_compare"><code>Ppx_compare</code></a>.
But sometimes, using a preprocessor is not possible. Fortunately, the <code>destruct</code>
command can assist us in defining an exhaustive equality function!</p>
<p>We will proceed step by step, specifically separating the different cases and
using nested pattern matching to make the various cases easy to express in a
systematic manner:</p>
<p align="center">
<img src="/blog/images/2024-05-21.merlin-destruct/merlin-destruct-8~oY2PNq-cCp4GUov8aoDZ0Q.gif" alt="Destruct for equal on Foo and Bar">
</p>
<p>As we can see, <code>destruct</code> allows us to quickly implement an exhaustive <code>equal</code>
function without relying on wildcards. Now, we can add our <code>Baz</code> constructor to
see how the refactoring unfolds! By adding a constructor, we quickly detect a
recurring pattern where we try to give the <code>destruct</code> command <strong>as much leeway
as possible</strong> to generate the missing patterns!</p>
<p align="center">
<img src="/blog/images/2024-05-21.merlin-destruct/merlin-destruct-9~6-SlJ_0fJKMCUmPd5GJscg.gif" alt="Destruct for equal on Foo, Bar and Baz">
</p>
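<p>For reference, the end result has roughly the following shape. This is a sketch of the nested-match style described above, not a transcript of the animation: each outer branch fixes the constructor of <code>a</code>, and the inner match lists every constructor of <code>b</code> without a wildcard.</p>
<pre><code class="language-ocaml">type t = Foo | Bar | Baz

(* No wildcard anywhere: adding a fourth constructor to [t] makes every
   match below non-exhaustive, so warning 8 points at each place to update. *)
let equal a b =
  match a with
  | Foo -&gt; (match b with Foo -&gt; true | Bar | Baz -&gt; false)
  | Bar -&gt; (match b with Bar -&gt; true | Foo | Baz -&gt; false)
  | Baz -&gt; (match b with Baz -&gt; true | Foo | Bar -&gt; false)
</code></pre>
<p>With this version, <code>equal Baz Baz</code> returns <code>true</code>, and the compiler keeps us honest the next time the type grows.</p>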
<p>Fantastic! We were able to quickly implement an <code>equal</code> function. Adding a
new case is trivial, leaving <code>destruct</code> to handle all the work!</p>
<p>Coupled with modern text editing features (e.g., using multi-cursors),
it's possible to save a tremendous amount of time! Another example of the
immoderate use of <code>destruct</code> (but too long to be detailed in this article) was
the <a href="https://github.com/xhtmlboi/yocaml/blob/main/lib/yocaml/mime.ml">Mime</a> module
implementation in <a href="https://github.com/xhtmlboi/yocaml">YOCaml</a> for generating RSS feeds.</p>
<h2>In Conclusion</h2>
<p>Paired with a formatter like
<a href="https://github.com/ocaml-ppx/ocamlformat">OCamlFormat</a> (to neatly reformat
generated code fragments), <code>destruct</code> is an unconventional tool in the IDE
landscape. It aligns with algebraic types and pattern matching to simplify code
writing and move towards code that is easier to refactor and thus maintain!
Aware of the command's utility, the <a href="https://github.com/ocaml/merlin">Merlin</a>
team continues to maintain it, integrating the latest features of OCaml to make
the command as usable as possible in as many contexts as possible!</p>
<p>I hope this collection of illustrated examples has motivated you to use the <code>destruct</code>
feature if you were not already aware of it. Please do not hesitate to send
us ideas for improvements,
fixes, and <strong>fun use cases</strong> via <a href="https://bsky.app/profile/tarides.com">Bluesky</a> or
<a href="https://www.linkedin.com/company/tarides">LinkedIn</a>!</p>
<p><em>Happy Hacking</em>.</p>
]]></description><link>https://tarides.com/blog/2024-05-29-effective-ml-through-merlin-s-destruct-command</link><guid isPermaLink="false">https://tarides.com/blog/2024-05-29-effective-ml-through-merlin-s-destruct-command.html</guid><dc:creator><![CDATA[ Xavier Van de Woestyne ]]></dc:creator><pubDate>Wed, 29 May 2024 00:00:00 GMT</pubDate></item><item><title><![CDATA[Launching the First-Class Windows Project]]></title><description><![CDATA[<p>We want to make learning and using OCaml easier for more people. Realising this goal involves expanding OCaml support to where the users are and making their experience smooth and hassle-free.</p>
<p>It is generally accepted that the current state of OCaml on Windows is not comparable to other popular platforms like Linux and macOS. This is the case even though Windows is the preferred platform for <a href="https://survey.stackoverflow.co/2023/#section-most-popular-technologies-operating-system">60% of developers</a> and the platform that around <a href="https://plausible.ci.dev/ocaml.org">33% of OCaml.org visitors</a> use! To address this misalignment, we are launching a new project called the <em>First-Class Windows Project</em>.</p>
<h2>Why Windows?</h2>
<p>If the above statistics are not compelling enough, consider that the Windows platform accounts for around <a href="https://gs.statcounter.com/os-market-share/desktop/worldwide/#monthly-202401-202402-bar">73% of laptop/desktop installations</a> and around <a href="https://store.steampowered.com/hwsurvey/Steam-Hardware-Software-Survey-Welcome-to-Steam?platform=combined">97% of PC gaming</a>. What this tells us is that for many people around the world, Windows is the platform that they are most familiar with. It follows that if we make it easier to use OCaml on Windows, we open up the language to an influx of new users.</p>
<p>These include independent developers who may only be comfortable with the Windows platform, large organisations that can't easily switch their operating system, and several teaching institutions that may use Windows as the default OS for their students.</p>
<h2>Long- and Short-Term Goals of the Project</h2>
<p>The long-term goal is to make Windows a 'first-class' citizen among OCaml platforms, meaning that we deliver an OCaml development experience that matches Linux and macOS. This ambitious project is split into two parts: technical work encompassing the compiler, build tools, development environments, community packages, and set-up tools, alongside advocacy work centred around communication and community collaboration. We need to collaborate closely with the community to not only create the best technical solutions but also be transparent about how they are prioritised and why.</p>
<p>To this end, our first port of call will be to deliver a roadmap outlining the steps necessary to fully support OCaml on Windows. This roadmap needs to be based on real data and insights from users and reviewed by the OCaml community to ensure it matches the needs of the wider ecosystem. We aim to go beyond "just works" to provide a top-notch developer experience on Windows. For this, we intend to take the following approach.</p>
<h2>Creating a Roadmap</h2>
<p>Is OCaml Windows yet? We have made significant progress towards a better OCaml experience on Windows with Opam 2.2, adding <a href="https://discuss.ocaml.org/t/ann-opam-2-2-0-beta2/14461">native Windows support</a>. Furthermore, the 5.3 release later this year will <a href="https://github.com/ocaml/ocaml/pull/12954">restore MSVC ports</a> to the compiler.</p>
<p>But there's more work to do! Our first mission is to craft a precise list of tasks collaboratively with OCaml community members and commercial users. This will help us prioritise areas of development in the best way. Tarides has the resources to lead this work as we do extensive work on the OCaml compiler, develop many of the platform tools, and lead the CI for the community. Combining our knowledge with that of the community, we have all the components needed to succeed.</p>
<p>To get us started, we have identified five steps required to create our roadmap:</p>
<ol>
<li>Document the current state of OCaml on Windows</li>
<li>Survey the community and commercial users</li>
<li>Survey Windows support in other languages</li>
<li>Review package managers and installation software for Windows</li>
<li>Publish review results and produce a roadmap</li>
</ol>
<h3>Document the Current State of OCaml on Windows</h3>
<p>There are several open-source alternatives available for installing OCaml on Windows machines. Each comes with its own challenges. We expect that the release of <code>opam</code> 2.2 will alleviate some, but it is not a silver bullet.</p>
<p>Another factor we need to address is that the current system is deeply tied to Unix-based scripting, necessitating a Unix layer on Windows for OCaml to function. At present, <a href="https://www.cygwin.com">Cygwin</a> provides this layer, and there's no plan to change that anytime soon. However, as part of this project, we also want to implement native Windows support and remove the Unix layer.</p>
<p>Against this background, step one aims to understand the state of current Windows solutions in OCaml and evaluate their effectiveness. Having this baseline is crucial for any future improvements.</p>
<h3>Survey the Community and Commercial Users</h3>
<p>To coordinate the efforts of many individuals and organisations and enable knowledge sharing, we will set up a working group composed of representatives with different interests. This group will be open and meet regularly to take stock of challenges and progress towards solutions. There are multiple ways to use OCaml on Windows, and different groups of users congregate around different workflows. We want to get as much information from these users as possible, and we mustn't section off these discussions in silos.</p>
<p>We have also published a survey for OCaml users on Windows as another way to understand user experience. We will publish the results of this survey and details on the working group soon.</p>
<p>As a personal aside, I (Sudha) was unaware of the many ways to use OCaml on Windows simply because I had never tried to use OCaml on a Windows machine. This changed when we had to help participants set up OCaml on Windows machines at our OCaml Workshop at the IndiaFOSS conference. Adding to our woes, the venue's WiFi was not great, making it even harder to install WSL. In the end, we had to resort to using an online REPL (<code>replit</code>).</p>
<h3>Survey Windows Support in Other Languages</h3>
<p>OCaml developers tend to disproportionately favour Unix-based systems, and according to the <a href="https://docs.google.com/forms/d/1OZV7WCprDnouU-rIEuw-1lDTeXrH_naVlJ77ziXQJfg/viewanalytics#:~:text=On%20which%20platform(s)%20do%20you%20develop%20OCaml%20on%3F">2020 OCaml User Survey</a>, only 11% of the respondents programmed on a Windows machine. Over time, this has led to the compiler and applications being optimised for Unix-based systems. Furthermore, the lack of testing on Windows contributes to a subpar experience on the platform. Our goal is to provide a native Windows experience; to do so, we need to account for any bias that may be present due to a lack of experience with OCaml on Windows.</p>
<p>To correct bias and explore best practices, we intend to study other languages that have excellent Windows support, such as Rust and Go, and languages native to Windows, such as C# and F#. Interestingly, similar efforts have been made in languages like Rust to get the UX and performance on Windows on par with Unix environments. We hope lessons from the past will help us chart an informed path for OCaml.</p>
<h3>Review Package Managers and Installation Software for Windows</h3>
<p>As part of this project, we will evaluate the feasibility of moving from a Unix-centric approach to Windows-native components. To do so, we will prototype the packaging of the OCaml platform using the following systems: the OCaml compiler distributed via <a href="https://vcpkg.io/en/">VCPKG</a>, an Opam repository with some selected external dependencies (<code>depexts</code>) migrated to VCPKG, and Opam available via <a href="https://learn.microsoft.com/en-us/windows/package-manager/winget/">Winget</a>. This exploration will help us gain information and experience before crafting a solution.</p>
<h3>Publish Review Results and Produce a Roadmap</h3>
<p>We will display the results of our review efforts on a page titled "Is OCaml Windows yet?", similar to <a href="https://ocaml.org/docs/is-ocaml-web-yet">"Is OCaml Web Yet"</a>. The page will also host the roadmap and be regularly updated so that OCaml.org visitors get an accurate and current status of the project.</p>
<h2>We Want Your Input!</h2>
<p>The success of this project depends on input from the OCaml community. We need to build a picture of what works and what doesn't, not just in OCaml but in other languages. These pain points and best practices will then inform the roadmap we create, and your feedback is critical to ensure we can make the Windows developer experience in OCaml as good as possible!</p>
<p>If you want to share something regarding Windows or anything else about this project, please <a href="/contact/">contact us</a> on our website and let us know your thoughts. You will find updates on the project on the <a href="https://discuss.ocaml.org/">OCaml Discuss forum</a>, as well as on our <a href="https://bsky.app/profile/tarides.com">Bluesky</a> and <a href="https://www.linkedin.com/company/tarides">LinkedIn</a>.</p>
]]></description><link>https://tarides.com/blog/2024-05-22-launching-the-first-class-windows-project</link><guid isPermaLink="false">https://tarides.com/blog/2024-05-22-launching-the-first-class-windows-project.html</guid><dc:creator><![CDATA[ Sudha Parimala, Isabella Leandersson ]]></dc:creator><pubDate>Wed, 22 May 2024 00:00:00 GMT</pubDate></item><item><title><![CDATA[The OCaml 5.2 Release: Features and Fixes!]]></title><description><![CDATA[<p>There has been a new release of OCaml! The 5.2 release brings several new features, along with improvements, optimisations, and bug fixes. New features include compaction, ThreadSanitizer, and restored support for compiling to the POWER architecture, plus other crucial changes that prepare the ground for future updates.</p>
<p>This post highlights new and restored features and gives you a good overview of the release. We won’t cover everything, however, so if you’re looking for an exhaustive list I recommend that you read the <a href="https://github.com/ocaml/ocaml/blob/5.2/Changes">Changes document</a> on GitHub. Let’s get started!</p>
<h2>Compaction</h2>
<p>The 5.2 release reintroduces compaction to OCaml 5.*. Compaction is a technique where blocks of memory are reordered to be adjacent to each other, releasing the fragmented free spaces between them in the pools back to the allocator. In 5.2, the compaction process needs to be multicore compatible, so a parallel compactor is added for the shared pools that make up the GC’s major heap.</p>
<p>This work is part of the ongoing effort to achieve feature parity between OCaml 5.* and OCaml 4.14, and to provide users with familiar favourites from previous iterations of the language. The bulk of the compaction work can be found in <a href="https://github.com/ocaml/ocaml/pull/12193">PR #12193</a>, which details how the compaction algorithm works and how the pools of memory are released back to the OS. <a href="https://github.com/sadiqj">Sadiq Jaffer</a> and <a href="https://github.com/NickBarnes">Nick Barnes'</a> impressive efforts, alongside reviews and input from the wider OCaml community, have brought compaction back to OCaml.</p>
<p>There have also been two additional pull requests, <a href="https://github.com/ocaml/ocaml/pull/12859">#12859</a> and <a href="https://github.com/ocaml/ocaml/pull/12850">#12850</a>, which update and fine-tune the commands for compaction to be more accurate and useful. <a href="https://github.com/ocaml/ocaml/pull/12850">#12850</a> adds <code>caml_collect_stats_sample_stw</code> to the major heap cycling stop-the-world (STW) meaning that <code>Gc.quick_stat</code> reflects the state of the heap after a major cycle or compaction accurately. <a href="https://github.com/ocaml/ocaml/pull/12859">#12859</a> ensures that <code>Gc.compact</code> completes a full major cycle before compacting (in contrast to <code>Gc.quick_compact</code> which only performs a single one).</p>
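<p>To see these commands together, here is a minimal sketch, assuming OCaml 5.2 or later. It allocates many small blocks, drops most of them, and prints the <code>heap_words</code> field of <code>Gc.quick_stat</code> before and after a call to <code>Gc.compact</code>. The names <code>kept</code> and <code>fragment_heap</code> are ours, chosen for illustration, and the figures printed will vary between machines and runs.</p>

```ocaml
(* A minimal sketch, assuming OCaml 5.2 or later. *)

(* Blocks we keep alive after the allocation phase. *)
let kept : int array list ref = ref []

let fragment_heap () =
  (* Allocate many small blocks and hold on to all of them for a while. *)
  let all = List.init 100_000 (fun i -> Array.make 16 i) in
  (* Then keep only one block in every hundred; the rest become garbage. *)
  kept := List.filteri (fun i _ -> i mod 100 = 0) all

let () =
  fragment_heap ();
  let before = (Gc.quick_stat ()).Gc.heap_words in
  (* Request a compaction of the major heap. *)
  Gc.compact ();
  let after = (Gc.quick_stat ()).Gc.heap_words in
  Printf.printf "heap words before: %d, after: %d (kept %d blocks)\n"
    before after (List.length !kept)
```

<p>This is only an illustration of the API; a real application would normally leave compaction decisions to the runtime and call <code>Gc.compact</code> sparingly.</p>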
<p>Look out for a post on compaction coming to our blog soon!</p>
<h2>ThreadSanitizer Support</h2>
<p><a href="https://clang.llvm.org/docs/ThreadSanitizer.html">ThreadSanitizer</a> (or TSan) is a tool originally developed by Google that can detect data races that occur during a program's execution. Data races can happen in parallel programs and easily go undetected. Developers can use TSan to monitor their programs, flagging data races so that they can be eliminated before the program is released. Since OCaml 5 brings multicore capabilities to the language, adding support for a reliable way to detect data races has been a top priority.</p>
<p>In the 5.2 release, the big PR <a href="https://github.com/ocaml/ocaml/pull/12114">#12114</a> adds TSan support and introduces a new configure-time flag <code>--enable-tsan</code> to enable compilation with TSan instrumentation. When enabled, the OCaml compiler instruments your executables with calls to TSan's runtime, which keeps a record of previous memory accesses (at a cost to performance). Executables instrumented with TSan will report data races without false positives. The original TSan PR added support for Linux on the x86_64 architecture, and since then the community has added support for all actively maintained tier 1 platforms.</p>
<p>PRs <a href="https://github.com/ocaml/ocaml/pull/12876">#12876</a>, <a href="https://github.com/ocaml/ocaml/pull/12809">#12809</a>, <a href="https://github.com/ocaml/ocaml/pull/12810">#12810</a>, <a href="https://github.com/ocaml/ocaml/pull/12907">#12907</a>, and <a href="https://github.com/ocaml/ocaml/pull/12915">#12915</a>, all extend the TSan support to further platforms including FreeBSD on x86_64, Linux and macOS on arm64, and Linux on RISC-V, POWER, and s390x. PRs <a href="https://github.com/ocaml/ocaml/pull/12681">#12681</a> and <a href="https://github.com/ocaml/ocaml/pull/12746">#12746</a> fix false positives and tidy up some annotations, and PR <a href="https://github.com/ocaml/ocaml/pull/12802">#12802</a> adds a chapter on TSan to the OCaml reference manual. We applaud the hard work of <a href="https://github.com/OlivierNicole">Olivier Nicole</a>, <a href="https://github.com/fabbing">Fabrice Buoro</a>, and <a href="https://github.com/dustanddreams">Miod Vallat</a> (based on initial efforts by <a href="https://github.com/anmolsahoo25">Anmol Sahoo</a>) to bring TSan to OCaml, with feedback and input from <a href="https://github.com/jhjourdan">Jacques-Henri Jourdan</a>, <a href="https://github.com/maranget">Luc Maranget</a>, <a href="https://github.com/shindere">Sébastien Hinderer</a>, <a href="https://github.com/art-w">Arthur Wendling</a>, <a href="https://github.com/gadmm">Guillaume Munch-Maccagnoni</a>, and more!</p>
<p>If you would like to learn more about TSan, you can <a href="/blog/2023-10-18-off-to-the-races-using-threadsanitizer-in-ocaml/">check out our blog post on the tool</a>.</p>
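<p>For a concrete picture of what a data race looks like, the sketch below (our own illustration, not code from the PRs above) has two domains update a plain <code>ref</code> without synchronisation, next to an <code>Atomic</code> counter that is race-free. It assumes OCaml 5; when built with a compiler configured with <code>--enable-tsan</code>, we would expect the unsynchronised accesses to <code>racy</code> to be the ones reported.</p>

```ocaml
(* Two domains increment shared counters. The plain ref is updated without
   synchronisation, which is a data race; the Atomic counter is race-free. *)
let iterations = 100_000

let racy = ref 0
let safe = Atomic.make 0

let work () =
  for _ = 1 to iterations do
    (* Unsynchronised read-modify-write: a data race between the domains. *)
    racy := !racy + 1;
    (* Atomic increment: no race. *)
    Atomic.incr safe
  done

let () =
  let d1 = Domain.spawn work in
  let d2 = Domain.spawn work in
  Domain.join d1;
  Domain.join d2;
  Printf.printf "racy = %d (may be less than %d), safe = %d\n"
    !racy (2 * iterations) (Atomic.get safe)
```

<p>Replacing the <code>ref</code> with an <code>Atomic.t</code>, or protecting it with a <code>Mutex</code>, are two common ways to remove this kind of race.</p>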
<h3>TSan in Action</h3>
<p>As part of the work on this update, <a href="https://github.com/gasche">Gabriel Scherer</a>, <a href="https://github.com/eutro">Eutro</a>, <a href="https://github.com/OlivierNicole">Olivier Nicole</a>, <a href="https://github.com/fabbing">Fabrice Buoro</a>, and others have been able to use TSan to catch and fix data race bugs in different parts of the OCaml runtime. A direct benefit of TSan support is the number of data race fixes that this update brings to users, which include:</p>
<ul>
<li><strong>Fix for a Race in the Minor GC:</strong> PRs <a href="https://github.com/ocaml/ocaml/pull/12595">#12595</a> and <a href="https://github.com/ocaml/ocaml/pull/12597">#12597</a> describe a race condition occurring when <code>caml_collect_gc_stats_sample</code> makes calls to <code>domain_terminate</code>, and #12597 outlines the fix implemented in 5.2.</li>
<li><strong>Data Race Between Marking and Sweeping:</strong> PR <a href="https://github.com/ocaml/ocaml/pull/12934">#12934</a> fixes a <a href="https://github.com/ocaml/ocaml/issues/12916">reported race between marking and sweeping</a> caught by TSan.</li>
<li><strong>Data Race on Global Pools Arrays:</strong> <a href="https://github.com/ocaml/ocaml/pull/12755">PR #12755</a> addresses races on <code>global_avail_pools</code> and <code>global_full_pools</code> members of the <code>struct pool_freelist</code> in <code>shared_heap.c</code>.</li>
<li><strong>Data Races in <code>minor_gc.c</code>:</strong> <a href="https://github.com/ocaml/ocaml/pull/12737">PR #12737</a> fixes two races, one in the minor GC occurring when promoting the values that are in the remembered set, and one in <code>caml_natdynlink_open</code>.</li>
<li><strong>Data Race fix for #12799:</strong> PR <a href="https://github.com/ocaml/ocaml/pull/12851">#12851</a> fixes a bug described in issue <a href="https://github.com/ocaml/ocaml/issues/12799">#12799</a> where runtime events teardown and event emission could race each other.</li>
<li><strong>Data Race When Using the Debug Runtime:</strong> PR <a href="https://github.com/ocaml/ocaml/pull/12969">#12969</a> resolves a data race involving <code>caml_scan_stack</code> and <code>caml_free_stack</code>.</li>
</ul>
<h2>User Experience</h2>
<p>Improving user experience is a high priority and iterative changes are made regularly to make OCaml easier for developers to use, with a special focus on newcomers. Each new release therefore brings quality-of-life improvements alongside the bigger features. In 5.2, examples of these user experience improvements include:</p>
<ul>
<li><strong>Improve Dynlink Error Messages:</strong> In PR <a href="https://github.com/ocaml/ocaml/pull/12213">#12213</a>, <a href="https://github.com/shym">Samuel Hym</a> addresses some complexities in the Dynlink library that made certain error messages hard to parse. Changes to the way the errors are wrapped mean that they are now simplified and easier to understand.</li>
<li><strong>New Chapter in the Manual:</strong> <a href="https://github.com/OlivierNicole">Olivier Nicole's</a> PR <a href="https://github.com/ocaml/ocaml/pull/12840">#12840</a> adds a new chapter on custom events to the OCaml reference manual. Improved documentation is key to helping newcomers get the most out of OCaml and helping developers adopt new tools.</li>
</ul>
<h2>POWER Backend Restored</h2>
<p>In PR <a href="https://github.com/ocaml/ocaml/pull/12276">#12276</a>, <a href="https://github.com/xavierleroy">Xavier Leroy</a> restores native-code support for the POWER/PowerPC backend, specifically for the 64-bit little-endian architecture. In a subsequent <a href="https://github.com/ocaml/ocaml/pull/12667">PR #12667</a>, <a href="https://github.com/awilfox">A. Wilcox</a> extends the support to include 64-bit big-endian as well. Leroy's <a href="https://github.com/ocaml/ocaml/pull/12601">PR #12601</a> highlights some of the fine-tuning that took place to transition between OCaml 4.* and 5.*, specifically to implement the leaf functions in a way that would work better for the 64-bit architecture.</p>
<p>The <a href="https://en.wikipedia.org/wiki/IBM_Power_Systems#:~:text=IBM%20Power%20Systems%20is%20a,and%20System%20i%20product%20lines.">IBM POWER processor family</a>, including PowerPC, is used in servers, supercomputers, embedded systems, and even on personal computers. Historically, OCaml has supported the POWER processors, and restoring this support lets users take advantage of post 5.* features on it as well.</p>
<h2>Bug Fixes</h2>
<p>There are literally dozens and dozens of bug fixes included with this release, and I can’t mention them all here, but let’s take a look at a few:</p>
<ul>
<li><strong>Fixing a Segmentation Fault:</strong> In PR <a href="https://github.com/ocaml/ocaml/pull/12726">#12726</a> <a href="https://github.com/nojb">Nicolas Ojeda Bär</a> addresses a segmentation fault that happens when <code>ocamlrun.exe</code> is not found in the PATH.</li>
<li><strong>Locking Bugs:</strong> In issue <a href="https://github.com/ocaml/ocaml/issues/12897">#12897</a>, <a href="https://github.com/talex5">Thomas Leonard</a> identifies a locking bug that affects custom events tracing, where the program <code>runtime_events</code> stops indefinitely without being able to proceed. <a href="https://github.com/gasche">Gabriel Scherer</a> introduces a fix in the form of a mutex in PR <a href="https://github.com/ocaml/ocaml/pull/12900">#12900</a>.</li>
<li><strong><code>Threads</code> Crash:</strong> The <code>threads</code> library contained a bug that could cause a crash, where users could not update <code>Caml_state-&gt;backtrace_last_exn</code> using direct assignment. Doing so caused the program to crash. In PR <a href="https://github.com/ocaml/ocaml/pull/12861">#12861</a>, <a href="https://github.com/mshinwell">Mark Shinwell</a> identified and fixed the bug.</li>
</ul>
<h2>Preparing for Upcoming Features</h2>
<p>Part of the contributions to each release focuses on preparing the way for future features. 5.2 lays the foundation for some long-anticipated additions including:</p>
<ul>
<li><strong>Project-Wide Occurrences:</strong> <a href="https://github.com/voodoos">Ulysse Gérard's</a> PR <a href="https://github.com/ocaml/ocaml/pull/12508">#12508</a> provides the required steps to support <code>project-wide-occurences</code> in OCaml projects, a feature that many <a href="https://github.com/ocaml/merlin">Merlin</a> users in particular have been waiting for. Work is ongoing to implement this in Merlin to enhance code navigation and refactoring. This change is possible thanks to OCaml's <a href="https://icfp22.sigplan.org/details/mlfamilyworkshop-2022-papers/10/Module-Shapes-for-Modern-Tooling">Shapes</a> feature.</li>
<li><strong>Statmemprof:</strong> Statmemprof is a well-loved statistical memory profiler that was removed from OCaml before the multicore 5.0 release, due to unanswered questions about how it would perform. Significant efforts have gone into bringing <code>statmemprof</code> back, and 5.2 prepares the way. Issue <a href="https://github.com/ocaml/ocaml/issues/11911">#11911</a> features much of the initial conversation and collaboration to bring the feature back, and <a href="https://github.com/NickBarnes">Nick Barnes's</a> PR <a href="https://github.com/ocaml/ocaml/pull/12381">#12381</a> changes part of the memory profiler’s API to prepare it for multicore. This feature is expected to be included in 5.3 this autumn.</li>
<li><strong>MSVC:</strong> MSVC is Microsoft's C/C++ compiler and users can compile OCaml 4.14 with MSVC. However, in OCaml 5.0 the runtime uses C features that MSVC doesn't support, making it incompatible with MSVC. <a href="https://github.com/MisterDA">Antonin Décimo's</a> PR <a href="https://github.com/ocaml/ocaml/pull/12769">#12769</a> unifies MSVC and MinGW-w64 code paths to prepare for full MSVC support for OCaml 5, also expected to be re-introduced in OCaml 5.3 later this autumn.</li>
</ul>
<h2>What’s Next?</h2>
<p>Work on OCaml never stops! The following months will bring more bug fixes and updates to OCaml in the lead up to the 5.3 release, where popular features like MSVC support and <code>statmemprof</code> are being reintroduced. It’s great to see contributors from many different backgrounds coming together to work on improving the OCaml language. Anyone can gain more insight into how the language is developed by exploring the public <a href="https://github.com/ocaml/ocaml">OCaml GitHub repository</a> and the official <a href="https://ocaml.org">OCaml Website</a>.</p>
<p>Stay in touch with us on <a href="https://bsky.app/profile/tarides.com">Bluesky</a> and <a href="https://www.linkedin.com/company/tarides">LinkedIn</a> – we would love to hear about your experience with 5.2 and how you are using OCaml!</p>
]]></description><link>https://tarides.com/blog/2024-05-15-the-ocaml-5-2-release-features-and-fixes</link><guid isPermaLink="false">https://tarides.com/blog/2024-05-15-the-ocaml-5-2-release-features-and-fixes.html</guid><dc:creator><![CDATA[ Isabella Leandersson ]]></dc:creator><pubDate>Wed, 15 May 2024 00:00:00 GMT</pubDate></item><item><title><![CDATA[How to Setup OCaml on Windows with WSL]]></title><description><![CDATA[<p>The stable opam 2.2 and a fully Windows compatible ecosystem of OCaml libraries and tools are getting closer every month. That's extremely exciting! With opam 2.2, Windows users will be able to use OCaml directly and natively without extra set-up or workarounds. Everyone is excited about this future, so we often forget people who want to use OCaml on Windows <em>now</em> and without set-up problems.</p>
<p>In this guide, we demonstrate how to set up OCaml on Windows using WSL2. With WSL2, you can write OCaml programs on your Windows computer while utilising all the benefits of working in a Linux environment.</p>
<p>Note that as an alternative to WSL2, it's already possible to install OCaml natively on Windows. If you wish to install OCaml directly on your PC, follow <a href="https://ocaml.org/install">this guide on how to install OCaml natively on Windows using Diskuv</a> or you can use opam 2.2. We recommend native Windows in the following circumstances:</p>
<ul>
<li>For people who want to have Windows commands available (e.g., <code>dir</code>) instead of Unix commands (e.g., <code>ls</code>), although the set-up process is tougher.</li>
<li>For people who want to build OCaml native Windows binaries, e.g., to distribute OCaml apps on Windows</li>
<li>For people who are really into Windows from a perspective of technical curiosity</li>
<li>For advanced programmers who are happy to try native Windows support from the beginning</li>
</ul>
<p>For anyone else, like people who just want to try OCaml on their Windows machine, OCaml on Windows via WSL2 is a great solution, so we'll dive right in.</p>
<h2>Prerequisite</h2>
<ul>
<li>Windows 10 or 11 with WSL2 enabled (check <a href="https://learn.microsoft.com/en-us/windows/wsl/install">Microsoft: WSL setup</a> or <a href="https://canonical-ubuntu-wsl.readthedocs-hosted.com/en/latest/guides/install-ubuntu-wsl2/">Ubuntu: WSL setup</a> for a guide on how to set up WSL).</li>
<li>Ubuntu installed on your WSL setup</li>
<li>An active internet connection</li>
</ul>
<h2>Procedure</h2>
<h3>Update Packages</h3>
<ul>
<li>Open your Ubuntu terminal (e.g., by searching "Ubuntu" in the Start menu).</li>
<li>Run the following command to update the package list and upgrade all packages:</li>
</ul>
<pre><code><span class="sh-source">$ sudo apt update </span><span class="sh-keyword-operator-list">&amp;&amp;</span><span class="sh-source"> sudo apt upgrade
</span></code></pre>
<h3>Install Required Packages</h3>
<ul>
<li>Run the following command to install the packages required to successfully install OCaml:</li>
</ul>
<pre><code><span class="sh-source">$ sudo apt install gcc build-essential curl unzip bubblewrap
</span></code></pre>
<h3>Download and Install opam</h3>
<p><a href="https://opam.ocaml.org/">Opam</a> is OCaml's package manager. It makes it easier to install additional libraries and tools relevant to various OCaml projects. It is similar to package managers like <a href="https://pip.pypa.io/en/stable/">pip</a> for Python or <a href="https://www.npmjs.com/">npm</a> for JavaScript. To download and install opam, run the following command:</p>
<pre><code><span class="sh-source">$ bash -c </span><span class="sh-punctuation-definition-string-begin">"</span><span class="sh-string-quoted-double">sh &lt;(curl -fsSL https://raw.githubusercontent.com/ocaml/opam/master/shell/install.sh)</span><span class="sh-punctuation-definition-string-end">"</span><span class="sh-source">
</span></code></pre>
<p>If the command fails with the error below (or a similar one), the cause is a DNS issue. If you don't experience this error, you can skip directly to the
<a href="#initialise-opam">Initialise opam section</a>.</p>
<pre><code><span class="sh-source">curl: </span><span class="sh-punctuation-definition-subshell">(</span><span class="sh-meta-scope-subshell">6</span><span class="sh-punctuation-definition-subshell">)</span><span class="sh-source"> Could not resolve host: raw.githubusercontent
</span></code></pre>
<p>To resolve:</p>
<ul>
<li>Use the nano editor to edit the <code>resolv.conf</code> file by running:</li>
</ul>
<pre><code><span class="sh-source">$ sudo nano /etc/resolv.conf
</span></code></pre>
<ul>
<li>Change the address <code>127.0.0.1</code> to <code>8.8.8.8</code>. The final file should look like this:</li>
</ul>
<pre><code><span class="sh-source">nameserver 8.8.8.8
</span></code></pre>
<ul>
<li>Now, we will use the nano editor to edit the <code>wsl.conf</code> file by running:</li>
</ul>
<pre><code><span class="sh-source">$ sudo nano /etc/wsl.conf
</span></code></pre>
<ul>
<li>Add this entry to the file, under a <code>[network]</code> section:</li>
</ul>
<pre><code><span class="sh-source">generateResolvConf = </span><span class="sh-support-function-builtin">false</span><span class="sh-source">
</span></code></pre>
<ul>
<li>The final file should look like this:</li>
</ul>
<pre><code><span class="sh-source">[boot]
</span><span class="sh-source">systemd=true
</span><span class="sh-source">[network]
</span><span class="sh-source">generateResolvConf = </span><span class="sh-support-function-builtin">false</span><span class="sh-source">
</span></code></pre>
<ul>
<li>At this point, we can re-run our script to install opam:</li>
</ul>
<pre><code><span class="sh-source">$ bash -c </span><span class="sh-punctuation-definition-string-begin">"</span><span class="sh-string-quoted-double">sh &lt;(curl -fsSL https://raw.githubusercontent.com/ocaml/opam/master/shell/install.sh)</span><span class="sh-punctuation-definition-string-end">"</span><span class="sh-source">
</span></code></pre>
<p>Our script should now run without any issues. If you are prompted for questions, such as where to install it, just press enter to accept the default locations and values.</p>
<h3>Initialise opam</h3>
<p>Now that we have installed opam, the next step is to initialise it. This step creates a new <a href="https://ocaml.org/docs/opam-switch-introduction">opam switch</a>, which acts as an isolated environment for your OCaml development. We can do this by running the command:</p>
<pre><code><span class="sh-source">$ opam init
</span></code></pre>
<p>When prompted, press <code>y</code> to modify the <code>profile</code> file so we can easily run <code>opam</code> commands. If you didn't press <code>y</code>, we can activate the switch by running:</p>
<pre><code><span class="sh-source">$ </span><span class="sh-support-function-builtin">eval</span><span class="sh-source"> </span><span class="sh-punctuation-definition-string-begin">$(</span><span class="sh-string-interpolated-dollar">opam env</span><span class="sh-punctuation-definition-string-end">)</span><span class="sh-source">
</span></code></pre>
<h3>Set Up a Development Environment</h3>
<p>Now that we have OCaml set up, the next step is to set up a development environment by installing packages that make programming in OCaml a much nicer experience. We can do this by using the <code>opam install &lt;package-name&gt;</code> command.
Below are the basic recommended packages and tools for OCaml development:</p>
<ul>
<li><code>ocaml-lsp-server</code> provides an LSP implementation for OCaml giving us a nice experience with editors and IDEs.</li>
<li><code>odoc</code> is a documentation tool that generates human-readable documentation from OCaml code, including comments, types, and function signatures.</li>
<li><code>ocamlformat</code> formats OCaml code according to a defined style guide, ensuring consistent formatting across different code files, which improves readability and maintainability.</li>
<li><code>utop</code> provides an interactive environment for OCaml development where you can directly type in OCaml expressions and see their results immediately. It's perfect for experimenting and testing.</li>
</ul>
<p>We can easily install all these packages at once and set them up by using the <a href="https://github.com/tarides/ocaml-platform-installer?tab=readme-ov-file#trying-the-platform">OCaml Platform Installer</a>. Alternatively, we can install them manually by typing the command below, then following the installation and set-up instructions:</p>
<pre><code><span class="sh-source">$ opam install ocaml-lsp-server odoc ocamlformat utop
</span></code></pre>
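<p>Once the tools are installed, a quick way to check that everything works is to run a tiny program. The sketch below is only an illustration: <code>hello.ml</code> is a hypothetical file name and <code>greeting</code> is a function we made up for the test. You can paste the code into <code>utop</code>, or save it to a file and run it with <code>ocaml hello.ml</code>.</p>

```ocaml
(* hello.ml: a hypothetical file name for a first test program. *)

(* Build a greeting for the given name. *)
let greeting name = "Hello, " ^ name ^ "!"

(* Print the greeting when the program runs. *)
let () = print_endline (greeting "OCaml on WSL")
```

<p>If the greeting is printed in your Ubuntu terminal, the compiler and toplevel are installed correctly.</p>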
<p>The best-supported editors include:</p>
<ul>
<li>VSCode: VSCode also has a WSL version.</li>
<li>Vim</li>
<li>Emacs</li>
</ul>
<p>Based on your preference, you can install any of the editors and begin programming in OCaml. Follow this helpful guide on how to set up your editor: <a href="https://ocaml.org/docs/set-up-editor">Setting up your editor for OCaml development</a>.</p>
<p>We have successfully set up OCaml on our Windows PC using WSL. Now you can work on OCaml projects just like you would on a Linux computer.</p>
]]></description><link>https://tarides.com/blog/2024-05-08-how-to-setup-ocaml-on-windows-with-wsl</link><guid isPermaLink="false">https://tarides.com/blog/2024-05-08-how-to-setup-ocaml-on-windows-with-wsl.html</guid><dc:creator><![CDATA[ Pizie Dust ]]></dc:creator><pubDate>Wed, 08 May 2024 00:00:00 GMT</pubDate></item><item><title><![CDATA[We Host Our First OCaml Retreat in India!]]></title><description><![CDATA[<p>Hacking retreats are great ways for programmers to connect, explore new ideas, and learn from each other. One of the most popular OCaml retreats is the much-loved <a href="https://retreat.mirage.io/">Mirage retreat</a>, organised annually in Morocco. But for us OCaml enthusiasts located elsewhere, the journey can be challenging (not to mention the climate impact!). That’s why the Chennai office decided to host a retreat in Tamil Nadu, India, this past March.
Hosting local events and retreats across the globe encourages more people to participate in the OCaml community. Retreats bring people of different experience levels and interests together, fostering an exchange of ideas, collaboration on projects, and support from more experienced members. These meetings of minds make the community stronger and more diverse.</p>
<h2>What Does an OCaml Retreat Look Like?</h2>
<p>Rather than specialising in one sub-topic, our retreat had a more general focus on OCaml. We encouraged participants to explore new things and share what they had worked on in the past. Our only selection criterion was that people had experience with functional programming, software development, or OCaml. As a result, our participants all came from different backgrounds, from Haskell hackers to cybersecurity developers, and each of them brought a unique perspective to the retreat.</p>
<p>We chose to host the retreat on the outskirts of the cultural landmark city of <a href="https://auroville.org/">Auroville</a> in a quaint villa surrounded by greenery. The villa acted as the main base of the retreat. It was close enough to civilisation that participants could easily access shops and other conveniences but not so close as to be a distraction. Ideally, we wanted to get the benefits of all being in one place without getting side-tracked by the bustle of the city – and it worked!</p>
<p>Our first day began with introductions and discussions of what everyone wanted to work on during the retreat. We assembled in the morning of each day to present our projects and discuss what we were working on that day. Talks were more spur-of-the-moment and casual, and no formal slides were expected (though sometimes still provided!). Our daily discussions sparked new hacking projects and collaboration between attendees.
We shared breakfast and lunch in the house and walked a scenic two kilometres every evening to get to and from dinner. We shared a Google calendar with any upcoming events and where to gather to keep us in sync. But we did more than just code! For example, we went on a tour of Auroville to see some of its most iconic buildings. The most memorable was the Matrimandir, the city’s famous golden dome-like structure, a place for contemplation and meditation.</p>
<p>To help participants find a project to work on, KC, Puneeth, and I put together a list of suitable tasks in a <a href="https://github.com/orgs/tarides/projects/32">GitHub project</a> that we named HackaCamel. We recommend you check it out if you’re curious about the projects completed during the retreat or want to take on a project yourself!</p>
<p>We were oversubscribed for participants for this retreat, which was great to see and gives us an incentive to get a bigger venue next time!</p>
<h2>OCaml Retreat: What Do the Attendees Think?</h2>
<p>Here are some quotes from participants sharing their experience, what they found useful, and what the general atmosphere was like. If you’re considering whether you would like to go to an OCaml retreat, this gives you a helpful overview:</p>
<blockquote>
<p>“My intention to come to the retreat was to know “What I didn’t know that I didn’t know”. Now that I have known what I didn’t know, I will start working on it. As a starter, I enjoyed toying with “Joy” using OCaml to draw.
Now, I am trying to understand MirageOS and its unique capabilities. This, I feel is a new beginning to both me and our organization. Hoping to meet you again in the following retreat.” – Kaushik Hatti.</p>
</blockquote>
<blockquote>
<p>“Coming from a Haskell and Rust background, I struggled to see the additional value I’d get from OCaml, so this was a perfect venue for me to explore it. Getting the facts straight from the source was very valuable. Seeing the faith and energy everybody put into this language was very inspiring.
Other than that, connecting to other functional programmers/learners in person after a long time felt amazing. Also great was being able to interact with people from my own culture with similar taste. Going forward, I hope to contribute to the community in the form of meetups in Singapore and future collaboration with the peers I met at the retreat.” – Karthik Ravikanti.</p>
</blockquote>
<blockquote>
<p>“I was very happy when I got the email offering a spot to attend the OCaml retreat at Auroville, as I was not sure if I was qualified for it.
Being an OCaml newbie, I was not sure what to work on. The HackaCamel list of ideas to work on during the retreat (and beyond) helped me. I choose to go through the OCaml5 tutorial, Parallel Programming in multicore OCaml tutorial and OCaml Effects tutorial. I wish I knew more OCaml to ask questions to the various experts who were present. Hope to do it in the next retreat.
My plan is to learn OCaml and contribute to its open-source projects this year. The retreat has provided enough motivation and ideas for this.” – C R Anish.</p>
</blockquote>
<blockquote>
<p>“I went to the retreat, not knowing what to expect. All I had thought of was that there would be no distractions, and this would be a good week to focus on getting something done. That was there, but the retreat was so much more than that. All the informal interactions - asking for help, discussing each other’s work, talking about X interesting thing - were instead the highlight for me.
There was so much to learn from each other and with each other.” – Kaustubh.</p>
</blockquote>
<blockquote>
<p>“The OCaml retreat at Auroville was a memorable one, especially for me. I made my first contribution to OCaml during this retreat.
I came to this retreat to learn how Okra works. Since I’m working on extracting the raw data from the engineers’ weekly reports, I wanted to understand how the tool(Okra) works, which could be helpful for me later.
I was new to coding. So, Puneeth suggested starting with OCaml Joy. Thanks to Sudha and Kaustubh, I started learning OCaml Joy. It was fun playing with Joy. Later, I created a new PR with the help of Puneeth, Sudha and Kaustubh to draw a smile emoji using Joy.” – Ganesh (the only non-programmer at the retreat!).</p>
</blockquote>
<p>You can discover more experience reports from the attendees in a <a href="https://ocamlretreat.org/2024/03/24/retreat-experience.html">post on the website</a> created for the retreat.</p>
<h2>Picture This</h2>
<p>To really give you an idea of what the retreat was like, it's best we just show you! Here are a few photos from the retreat:</p>
<p><img sizes="(min-width: 1360px) 1360px, (min-width: 680px) 680px, 100vw" srcset="/blog/images/aurovillecoffee-170w~hg-EIeoiIMlBrS9OCEC3Qg.webp 170w, /blog/images/aurovillecoffee-340w~l2xSXfXnI-BYytAN6eyFvQ.webp 340w, /blog/images/aurovillecoffee-680w~ttmi6mFctqTblir4TCMtMA.webp 680w, /blog/images/aurovillecoffee-1360w~LT8kxAbtQ40upqmtyFD6fg.webp 1360w" src="/blog/images/aurovillecoffee-1360w~LT8kxAbtQ40upqmtyFD6fg.webp" alt="One of the participants stands in a big sliding doorway between the main house and the garden. He is drinking a cup of coffee and looking out over the lawn and trees."></p>
<p><img sizes="(min-width: 1360px) 1360px, (min-width: 680px) 680px, 100vw" srcset="/blog/images/aurovillegroupphoto-170w~HSAs19ETKZcI8_hFSnwrUw.webp 170w, /blog/images/aurovillegroupphoto-340w~uRiusxgbBK2z3yRZOdiQ8Q.webp 340w, /blog/images/aurovillegroupphoto-680w~zLLACf3lBV_p32CZ1IRAfQ.webp 680w, /blog/images/aurovillegroupphoto-1360w~lu6mIMskNhaJPgHH2XKGhw.webp 1360w" src="/blog/images/aurovillegroupphoto-1360w~lu6mIMskNhaJPgHH2XKGhw.webp" alt="A group photo of ten retreat participants. They are standing in front of an outdoor bar and smiling at the camera."></p>
<p><img sizes="(min-width: 1360px) 1360px, (min-width: 680px) 680px, 100vw" srcset="/blog/images/aurovillemerch-170w~XVlUlexU-padLApjOVfWxA.webp 170w, /blog/images/aurovillemerch-340w~Oiw3p1D9f-5FONlHsiT_Ug.webp 340w, /blog/images/aurovillemerch-680w~dowA5EOastbOSzJmDfe-3w.webp 680w, /blog/images/aurovillemerch-1360w~Z2wwfSgqgBg8Dbg-pgJxCw.webp 1360w" src="/blog/images/aurovillemerch-1360w~Z2wwfSgqgBg8Dbg-pgJxCw.webp" alt="A t-shirt with a graphic design depicting the Auroville dome with a two-humped camel in front."></p>
<p><img sizes="(min-width: 1360px) 1360px, (min-width: 680px) 680px, 100vw" srcset="/blog/images/aurovilletalk-170w~fkiP28eHOTKQPZW2rzd_Eg.webp 170w, /blog/images/aurovilletalk-340w~WzxJtTgTKPoBAwSwMJ1yFg.webp 340w, /blog/images/aurovilletalk-680w~8PHxw24cnk5LVha_3VFkyg.webp 680w, /blog/images/aurovilletalk-1360w~_9El1mHCzS9x57mf_JfMOw.webp 1360w" src="/blog/images/aurovilletalk-1360w~_9El1mHCzS9x57mf_JfMOw.webp" alt="One of the participants is giving a talk standing in front of a projector screen with the text &quot;unikernels and library operating system&quot;."></p>
<h2>Until Next Time</h2>
<p>Retreats allow developers to be face-to-face, motivating and inspiring each other to learn new skills and start new projects. Organising more in different places around the globe will encourage new people to discover what the OCaml community has to offer.</p>
<p>Would you like to attend one of our retreats? Or stay up-to-date with what we’re doing in general? Follow us on <a href="https://bsky.app/profile/tarides.com">Bluesky</a> and <a href="https://www.linkedin.com/company/tarides">LinkedIn</a> to get the latest updates from us!</p>
]]></description><link>https://tarides.com/blog/2024-05-01-we-host-our-first-ocaml-retreat-in-india</link><guid isPermaLink="false">https://tarides.com/blog/2024-05-01-we-host-our-first-ocaml-retreat-in-india.html</guid><dc:creator><![CDATA[ Sudha Parimala, Isabella Leandersson ]]></dc:creator><pubDate>Wed, 01 May 2024 00:00:00 GMT</pubDate></item><item><title><![CDATA[Under the Hood: Developing Multicore Property-Based Tests for OCaml 5]]></title><description><![CDATA[<p>In 2022, Multicore OCaml became reality. Programming on multiple threads brings new possibilities, but also new complexities. In order to foster confidence in OCaml 5 and retain OCaml's reputation as a trustworthy and memory-safe platform, Tarides has developed <a href="https://github.com/ocaml-multicore/multicoretests"><code>multicoretests</code></a>: Two property-based testing libraries with a test suite built on top. This effort <a href="https://github.com/ocaml-multicore/multicoretests#issues">has successfully pinpointed a range of issues</a> and contributed towards a stable multicore environment for the OCaml community to build on.</p>
<p>In this article and in the upcoming part two, I describe how we developed <a href="https://github.com/ocaml-multicore/multicoretests">property-based tests</a> for OCaml 5, the challenges we encountered, and the lessons we learned. Part one will focus mainly on the two open-source testing libraries <code>STM</code> and <code>Lin</code>, including some of our findings along the way. It may be of interest to both compiler hackers and library writers who are curious about how their code behaves under parallel usage.</p>
<h2>Unit Testing vs. Property-Based Testing</h2>
<p>In traditional <a href="https://en.wikipedia.org/wiki/Unit_testing">unit testing</a>, a developer asserts an expected result for a given input on a case-by-case basis. For example, when calling OCaml's <code>floor : float -&gt; float</code> function with argument <code>0.5</code> we expect the result to be <code>0.</code>:</p>
<pre><code><span class="ocaml-keyword-other">assert</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-constant-language-capital-identifier">Float</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">equal</span><span class="ocaml-source"> </span><span class="ocaml-constant-numeric-decimal-float">0.</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-source">floor</span><span class="ocaml-source"> </span><span class="ocaml-constant-numeric-decimal-float">0.5</span><span class="ocaml-source">)</span><span class="ocaml-source">)</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source">
</span></code></pre>
<p>In general, one can imagine a range of test cases of this form:</p>
<pre><code><span class="ocaml-keyword-other">assert</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-constant-language-capital-identifier">Float</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">equal</span><span class="ocaml-source"> </span><span class="ocaml-constant-numeric-decimal-float">0.</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-source">floor</span><span class="ocaml-source"> </span><span class="ocaml-constant-numeric-decimal-float">0.5</span><span class="ocaml-source">)</span><span class="ocaml-source">)</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source">
</span><span class="ocaml-keyword-other">assert</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-constant-language-capital-identifier">Float</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">equal</span><span class="ocaml-source"> </span><span class="ocaml-constant-numeric-decimal-float">0.</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-source">floor</span><span class="ocaml-source"> </span><span class="ocaml-constant-numeric-decimal-float">0.9999999</span><span class="ocaml-source">)</span><span class="ocaml-source">)</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source">
</span><span class="ocaml-keyword-other">assert</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-constant-language-capital-identifier">Float</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">equal</span><span class="ocaml-source"> </span><span class="ocaml-constant-numeric-decimal-float">1.</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-source">floor</span><span class="ocaml-source"> </span><span class="ocaml-constant-numeric-decimal-float">1.0000001</span><span class="ocaml-source">)</span><span class="ocaml-source">)</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source">
</span><span class="ocaml-keyword-other">assert</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-constant-language-capital-identifier">Float</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">equal</span><span class="ocaml-source"> </span><span class="ocaml-constant-numeric-decimal-float">10.</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-source">floor</span><span class="ocaml-source"> </span><span class="ocaml-constant-numeric-decimal-float">10.999999</span><span class="ocaml-source">)</span><span class="ocaml-source">)</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source">
</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">
</span></code></pre>
<p>Rather than manually writing several of these tests, property-based testing (PBT), as popularised by QuickCheck, advocates expressing a general <em>property</em> which should hold true across all inputs. For example, for any input <code>f</code> given to the <code>floor</code> function, we expect the result to be less than or equal to <code>f</code>, that is, <code>floor f &lt;= f</code>, which captures that the function rounds down. We can then test this property on any <code>f</code> provided by a <em>generator</em> conveniently named <code>float</code>, here phrased as a <a href="https://github.com/c-cube/qcheck">QCheck</a> test:</p>
<pre><code><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">floor_test</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Test</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">make</span><span class="ocaml-source"> </span><span class="ocaml-support-type">float</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-keyword-other">fun</span><span class="ocaml-source"> </span><span class="ocaml-source">f</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-source">floor</span><span class="ocaml-source"> </span><span class="ocaml-source">f</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">&lt;=</span><span class="ocaml-source"> </span><span class="ocaml-source">f</span><span class="ocaml-source">)</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source">
</span></code></pre>
<p>Such a parameterised property-based test allows us to check the property for each input generated. For the above example, this corresponds to a collection of test cases beyond what developers like to write by hand:</p>
<pre><code><span class="ocaml-keyword-other">assert</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-source">floor</span><span class="ocaml-source"> </span><span class="ocaml-constant-numeric-decimal-float">1313543.66397378966</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">&lt;=</span><span class="ocaml-source"> </span><span class="ocaml-constant-numeric-decimal-float">1313543.66397378966</span><span class="ocaml-source">)</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source">
</span><span class="ocaml-keyword-other">assert</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-source">floor</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-keyword-operator">-</span><span class="ocaml-constant-numeric-decimal-float">24763.5086878848342</span><span class="ocaml-source">)</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">&lt;=</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-</span><span class="ocaml-constant-numeric-decimal-float">24763.5086878848342</span><span class="ocaml-source">)</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source">
</span><span class="ocaml-keyword-other">assert</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-source">floor</span><span class="ocaml-source"> </span><span class="ocaml-constant-numeric-decimal-float">1280.58075149504566</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">&lt;=</span><span class="ocaml-source"> </span><span class="ocaml-constant-numeric-decimal-float">1280.58075149504566</span><span class="ocaml-source">)</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source">
</span><span class="ocaml-keyword-other">assert</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-source">floor</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-keyword-operator">-</span><span class="ocaml-constant-numeric-decimal-float">0.00526932453845931851</span><span class="ocaml-source">)</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">&lt;=</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-</span><span class="ocaml-constant-numeric-decimal-float">0.00526932453845931851</span><span class="ocaml-source">)</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source">
</span><span class="ocaml-keyword-other">assert</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-source">floor</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-keyword-operator">-</span><span class="ocaml-constant-numeric-decimal-float">35729.1783938070657</span><span class="ocaml-source">)</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">&lt;=</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-</span><span class="ocaml-constant-numeric-decimal-float">35729.1783938070657</span><span class="ocaml-source">)</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source">
</span><span class="ocaml-keyword-other">assert</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-source">floor</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-keyword-operator">-</span><span class="ocaml-constant-numeric-decimal-float">152180.150007840159</span><span class="ocaml-source">)</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">&lt;=</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-</span><span class="ocaml-constant-numeric-decimal-float">152180.150007840159</span><span class="ocaml-source">)</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source">
</span><span class="ocaml-keyword-other">assert</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-source">floor</span><span class="ocaml-source"> </span><span class="ocaml-constant-numeric-decimal-float">0.000198774450118538313</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">&lt;=</span><span class="ocaml-source"> </span><span class="ocaml-constant-numeric-decimal-float">0.000198774450118538313</span><span class="ocaml-source">)</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source">
</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">
</span></code></pre>
<p>By default, QCheck's <code>Test.make</code> runs 100 such test cases, but by passing an optional <code>~count</code> parameter, we can raise the test count to our liking with minimal effort. In <code>QCheck</code>, the input for each such test case is randomised and produced with the help of a <a href="https://en.wikipedia.org/wiki/Pseudorandom_number_generator">pseudo-random number generator (PRNG)</a>. By passing the same seed to the PRNG, we are thus able to trigger the same test case runs and reproduce any issue we may encounter.</p>
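<p>To see why a fixed seed makes a run reproducible, here is a rough, hand-rolled sketch that uses only OCaml's standard library. It is not how QCheck is implemented: the names <code>gen_float</code> and <code>check</code> are hypothetical, and the uniform generator is an assumption made for illustration.</p>
<pre><code>(* Hand-rolled sketch; QCheck's real generators and runner are more elaborate. *)

(* A hypothetical float generator: uniform in [-1e6, 1e6]. *)
let gen_float st = Random.State.float st 2e6 -. 1e6

(* Run [prop] on [count] generated inputs, seeding the PRNG explicitly.
   Returns the first counterexample, if any. *)
let check ?(count = 100) ~seed gen prop =
  let st = Random.State.make [| seed |] in
  let rec loop i =
    if i &gt;= count then None
    else
      let x = gen st in
      if prop x then loop (i + 1) else Some x
  in
  loop 0

let () =
  (* A true property: no counterexample, even with a raised count. *)
  assert (check ~count:10_000 ~seed:42 gen_float (fun f -&gt; floor f &lt;= f) = None);
  (* A deliberately false property: the same seed reports the same counterexample. *)
  let wrong f = floor f = f in
  let first = check ~seed:42 gen_float wrong in
  let again = check ~seed:42 gen_float wrong in
  assert (first = again)
</code></pre>
<p>Because the generator draws all of its randomness from the explicitly seeded state, rerunning with the same seed replays exactly the same inputs, which is what makes a failing run repeatable.</p>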
<p>Testing is still incomplete since, as captured in the immortal words of Edsger Dijkstra:</p>
<blockquote>
<p>“Program testing can be used to show the presence of bugs, but never to show their absence!” -- Edsger W. Dijkstra, <a href="https://www.cs.utexas.edu/users/EWD/ewd02xx/EWD249.PDF">Notes On Structured Programming</a>, 1970</p>
</blockquote>
<p>Property-based testing does not change that incompleteness. However, because the number of test cases is less tied to developer effort, PBT tends to be 'less incomplete' than handwritten test cases and can thus reveal the presence of otherwise undetected bugs. Its effectiveness, however, depends on the strength of the tested properties and the distribution of test inputs from the generator. For example, the property <code>floor f &lt;= f</code> alone does not fully capture <code>floor</code>'s correct behaviour. Similarly, we may want to adjust the generator's distribution to exercise <code>floor</code> on corner cases such as <code>nan</code> or floating point numbers ending in <code>.0</code> or <code>.5</code>.</p>
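<p>As an illustration of what a stronger property might look like, the sketch below combines three conditions for finite inputs and checks them on a few hand-picked corner cases. The name <code>floor_spec</code> and the list of inputs are assumptions made for this example, not part of QCheck or <code>multicoretests</code>.</p>
<pre><code>(* Sketch of a stronger specification for floor on finite inputs. *)
let floor_spec f =
  let r = floor f in
  Float.is_integer r            (* the result is a whole number *)
  &amp;&amp; r &lt;= f                     (* it does not exceed the input *)
  &amp;&amp; (r = f || f &lt; r +. 1.)     (* and the next whole number lies above the input *)

(* Hand-picked corner cases that a default generator may rarely produce. *)
let corner_cases =
  [ 0.; -0.; 0.5; -0.5; 1.0; -1.0; 1e300; -1e-20; max_float; min_float ]

let () =
  assert (List.for_all floor_spec corner_cases);
  (* nan compares false with everything, so even the weak property rejects it. *)
  assert (not (floor Float.nan &lt;= Float.nan))
</code></pre>
<p>The last conjunct is guarded by <code>r = f</code> because adding <code>1.</code> to a very large float may round back to the same value, so an unguarded <code>f &lt; r +. 1.</code> would wrongly fail on inputs such as <code>1e300</code>.</p>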
<h2>Property-Based Testing With a State-Machine Model</h2>
<p>Above, we saw an example of using randomised input to test a property of <em>one</em> function, <code>floor</code>, in isolation. Often, software defects only appear when combining a particular sequence of function calls. A property-based test against a state-machine model allows us to test behaviour across a <em>random combination of function calls</em>. We perform each call twice: once over the <em>system under test</em> and once over a purely functional reference <em>model</em>, and finally compare the two results as illustrated in the figure below. This idea grew out of the <a href="https://clean-lang.org">Clean</a> and <a href="https://citeseerx.ist.psu.edu/document?repid=rep1&amp;type=pdf&amp;doi=b268715b8c0bcebe53db857aa2d7a95fbb5c5dbf">Erlang QuickCheck</a> communities and has since been <a href="https://github.com/jmid/pbt-frameworks">ported to numerous other programming languages</a>.</p>
<p><img sizes="(min-width: 1360px) 1360px, (min-width: 680px) 680px, 100vw" srcset="/blog/images/model-based-170w~kEfw_Ufkb96eqmnxL67Kxg.webp 170w, /blog/images/model-based-340w~ISmXWJIQ4P-qLKHfEfSFIA.webp 340w, /blog/images/model-based-680w~8deyGv2JgfZdEAVWx1_rew.webp 680w, /blog/images/model-based-1360w~99yrjjPreNm1vVdT9WHSTg.webp 1360w" src="/blog/images/model-based-1360w~99yrjjPreNm1vVdT9WHSTg.webp" alt="A diagram with two rows of 'call' boxes (one for the System Under Test and one for the Model) along with arrows between the corresponding call boxes"></p>
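<p>To make the idea concrete, here is a minimal hand-written sketch of such a comparison loop, using only the standard library. It pits OCaml's imperative <code>Stack</code> module against an immutable list as the model. All names (<code>op</code>, <code>res</code>, <code>run_sut</code>, <code>run_model</code>, <code>agree</code>) are hypothetical and chosen for this illustration; a dedicated library automates these steps and adds features such as shrinking of failing command sequences.</p>
<pre><code>(* Hand-written sketch of model-based testing on an integer stack. *)

(* Symbolic operations (hypothetical example). *)
type op = Push of int | Pop | Length

(* The result of an operation, so both sides can be compared. *)
type res = Unit | Popped of int option | Len of int

(* Run one operation on the system under test: the imperative Stack module. *)
let run_sut op s = match op with
  | Push x -&gt; Stack.push x s; Unit
  | Pop    -&gt; Popped (Stack.pop_opt s)
  | Length -&gt; Len (Stack.length s)

(* Run the same operation on a purely functional model: an immutable list.
   Returns the expected result together with the next model state. *)
let run_model op m = match op, m with
  | Push x, _      -&gt; Unit, x :: m
  | Pop, []        -&gt; Popped None, []
  | Pop, x :: rest -&gt; Popped (Some x), rest
  | Length, _      -&gt; Len (List.length m), m

(* Generate a random operation from the PRNG state. *)
let gen_op st = match Random.State.int st 3 with
  | 0 -&gt; Push (Random.State.int st 100)
  | 1 -&gt; Pop
  | _ -&gt; Length

(* Check that SUT and model agree on every step of a random sequence. *)
let agree ~seed ~len =
  let st = Random.State.make [| seed |] in
  let s = Stack.create () in
  let rec loop i m =
    if i &gt;= len then true
    else
      let op = gen_op st in
      let actual = run_sut op s in
      let expected, m' = run_model op m in
      actual = expected &amp;&amp; loop (i + 1) m'
  in
  loop 0 []

let () = assert (agree ~seed:0 ~len:1_000)
</code></pre>
<p>The model state is threaded through the loop explicitly, whereas the system under test is mutated in place, mirroring the two rows of calls in the figure.</p>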
<p>Suppose we want to test a selection of the <code>Float.Array</code> interface across random combinations using OCaml's <code>qcheck-stm</code> test library. To do so, we first express a type of symbolic commands, <code>cmd</code>, along with a function <code>show_cmd</code> to render them as strings:</p>
<pre><code><span class="ocaml-source">  </span><span class="ocaml-keyword-other">type</span><span class="ocaml-source"> </span><span class="ocaml-source">cmd</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-keyword-other">|</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">To_list</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-keyword-other">|</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Sort</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-keyword-other">|</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Set</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">of</span><span class="ocaml-source"> </span><span class="ocaml-support-type">int</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">*</span><span class="ocaml-source"> </span><span class="ocaml-support-type">float</span><span class="ocaml-source">
</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">show_cmd</span><span class="ocaml-source"> </span><span class="ocaml-source">cmd</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">match</span><span class="ocaml-source"> </span><span class="ocaml-source">cmd</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">with</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-keyword-other">|</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">To_list</span><span class="ocaml-source">    </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-string-quoted-double">To_list</span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-keyword-other">|</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Sort</span><span class="ocaml-source">       </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-string-quoted-double">Sort</span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-keyword-other">|</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Set</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-source">x</span><span class="ocaml-keyword-other-ocaml punctuation-comma punctuation-separator">,</span><span class="ocaml-source"> </span><span class="ocaml-source">y</span><span class="ocaml-source">)</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Printf</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">sprintf</span><span class="ocaml-source"> </span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-string-quoted-double">Set (</span><span class="ocaml-constant-character-printf">%i</span><span class="ocaml-string-quoted-double">, </span><span class="ocaml-constant-character-printf">%F</span><span class="ocaml-string-quoted-double">)</span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-source"> </span><span class="ocaml-source">x</span><span class="ocaml-source"> </span><span class="ocaml-source">y</span><span class="ocaml-source">
</span></code></pre>
<p>We will furthermore need to express the type of our 'System Under Test' (SUT), how to initialise it, and how to clean up after it:</p>
<pre><code><span class="ocaml-source">  </span><span class="ocaml-keyword-other">type</span><span class="ocaml-source"> </span><span class="ocaml-source">sut</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Float</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-constant-language-capital-identifier">Array</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">t</span><span class="ocaml-source">
</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">floatarray_size</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-constant-numeric-decimal-integer">12</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">init_sut</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-unit">()</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Float</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-constant-language-capital-identifier">Array</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">make</span><span class="ocaml-source"> </span><span class="ocaml-source">floatarray_size</span><span class="ocaml-source"> </span><span class="ocaml-constant-numeric-decimal-float">1.0</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">cleanup</span><span class="ocaml-source"> </span><span class="ocaml-constant-language">_</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-unit">()</span><span class="ocaml-source">
</span></code></pre>
<p>This will test a float array of size 12, initialised to contain <code>1.0</code> entries.
For the <code>cleanup</code> we will just let OCaml's garbage collector reclaim the array for us.</p>
<p>We can now write an interpreter over the symbolic commands. We annotate each result with a type combinator and wrap it in a <code>Res</code> constructor for later comparison. For example, since <code>Float.Array.to_list</code> returns a <code>float list</code>, it is annotated with the combinator <code>list float</code>, mimicking its return type:</p>
<pre><code><span class="ocaml-source">  </span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">run</span><span class="ocaml-source"> </span><span class="ocaml-source">f</span><span class="ocaml-source"> </span><span class="ocaml-source">fa</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">match</span><span class="ocaml-source"> </span><span class="ocaml-source">f</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">with</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-keyword-other">|</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">To_list</span><span class="ocaml-source">   </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Res</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-source">list</span><span class="ocaml-source"> </span><span class="ocaml-support-type">float</span><span class="ocaml-keyword-other-ocaml punctuation-comma punctuation-separator">,</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Float</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-constant-language-capital-identifier">Array</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">to_list</span><span class="ocaml-source"> </span><span class="ocaml-source">fa</span><span class="ocaml-source">)</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-keyword-other">|</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Sort</span><span class="ocaml-source">      </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Res</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-support-type">unit</span><span class="ocaml-keyword-other-ocaml punctuation-comma punctuation-separator">,</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Float</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-constant-language-capital-identifier">Array</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">sort</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Float</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">compare</span><span class="ocaml-source"> </span><span class="ocaml-source">fa</span><span class="ocaml-source">)</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-keyword-other">|</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Set</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-source">i</span><span class="ocaml-keyword-other-ocaml punctuation-comma punctuation-separator">,</span><span class="ocaml-source">f</span><span class="ocaml-source">)</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Res</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-source">result</span><span class="ocaml-source"> </span><span class="ocaml-support-type">unit</span><span class="ocaml-source"> </span><span class="ocaml-source">exn</span><span class="ocaml-keyword-other-ocaml punctuation-comma punctuation-separator">,</span><span class="ocaml-source"> </span><span class="ocaml-source">protect</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-constant-language-capital-identifier">Float</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-constant-language-capital-identifier">Array</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">set</span><span class="ocaml-source"> </span><span class="ocaml-source">fa</span><span class="ocaml-source"> </span><span class="ocaml-source">i</span><span class="ocaml-source">)</span><span class="ocaml-source"> </span><span class="ocaml-source">f</span><span class="ocaml-source">)</span><span class="ocaml-source">
</span></code></pre>
<p>Since <code>Float.Array.set</code> may raise an out-of-bounds exception, we wrap its invocation with <code>protect</code>, which turns the outcome into an OCaml <code>result</code> value, and annotate it with <code>result unit exn</code> to reflect that the call may either complete normally or raise an exception.</p>
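<p>As a standalone aside (not part of the test module), the idea behind such a wrapper can be sketched in a few lines. The <code>protect</code> below is a local stand-in written for illustration and is not necessarily the library's own definition; it only assumes the standard library's <code>Float.Array</code>:</p>
<pre><code>(* Minimal sketch of an exception-capturing wrapper; this `protect` is a
   local stand-in, not necessarily the library's own definition. *)
let protect f x = try Ok (f x) with e -&gt; Error e

let () =
  let fa = Float.Array.make 12 1.0 in
  (* Index 3 is within bounds, so the write succeeds. *)
  assert (protect (Float.Array.set fa 3) 2.5 = Ok ());
  (* Index 12 is out of bounds for a 12-element array. *)
  assert (protect (Float.Array.set fa 12) 2.5
          = Error (Invalid_argument "index out of bounds"))
</code></pre>
<p>Because the exception is captured as an ordinary value, it can be compared against an expected outcome like any other result.</p>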
<p>Now, what should we compare the <code>Float.Array</code> operations to? We can express a pure <em>model</em>, capturing its intended meaning. The state of a float array can be expressed as a simple <code>float list</code>. We then explain to <code>STM</code> how to initialise this model with <code>init_state</code> and how each of our three commands changes the state of the model using a second interpreter:</p>
<pre><code><span class="ocaml-source">  </span><span class="ocaml-keyword-other">type</span><span class="ocaml-source"> </span><span class="ocaml-source">state</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-support-type">float</span><span class="ocaml-source"> </span><span class="ocaml-source">list</span><span class="ocaml-source">
</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">init_state</span><span class="ocaml-source">  </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">List</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">init</span><span class="ocaml-source"> </span><span class="ocaml-source">floatarray_size</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-keyword-other">fun</span><span class="ocaml-source"> </span><span class="ocaml-constant-language">_</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-constant-numeric-decimal-float">1.0</span><span class="ocaml-source">)</span><span class="ocaml-source">
</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">next_state</span><span class="ocaml-source"> </span><span class="ocaml-source">f</span><span class="ocaml-source"> </span><span class="ocaml-source">s</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">match</span><span class="ocaml-source"> </span><span class="ocaml-source">f</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">with</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-keyword-other">|</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">To_list</span><span class="ocaml-source">   </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-source">s</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-keyword-other">|</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Sort</span><span class="ocaml-source">      </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">List</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">sort</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Float</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">compare</span><span class="ocaml-source"> </span><span class="ocaml-source">s</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-keyword-other">|</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Set</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-source">i</span><span class="ocaml-keyword-other-ocaml punctuation-comma punctuation-separator">,</span><span class="ocaml-source">f</span><span class="ocaml-source">)</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">List</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">mapi</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-keyword-other">fun</span><span class="ocaml-source"> </span><span class="ocaml-source">j</span><span class="ocaml-source"> </span><span class="ocaml-source">f'</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">if</span><span class="ocaml-source"> </span><span class="ocaml-source">i</span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source">j</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">then</span><span class="ocaml-source"> </span><span class="ocaml-source">f</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">else</span><span class="ocaml-source"> </span><span class="ocaml-source">f'</span><span class="ocaml-source">)</span><span class="ocaml-source"> </span><span class="ocaml-source">s</span><span class="ocaml-source">
</span></code></pre>
<p>Of the three commands, only <code>To_list</code> leaves the underlying array unchanged, so it returns our model <code>s</code> unmodified. The <code>Sort</code> case uses <code>List.sort</code> to sort the model accordingly. Finally, the <code>Set</code> case expresses how the <code>list</code> model is updated at the <em>i</em>-th entry to reflect the array assignment of a new entry <code>f</code>.</p>
<p>With a model in place, we can express what we deem acceptable behaviour as pre- and post-conditions. As none of the functions have pre-conditions, we leave <code>precond</code> as constantly <code>true</code>:</p>
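<p>To make the model transitions concrete, here is a small standalone sketch that threads a three-element model through a few commands. The <code>cmd</code> type is restated here on the assumption that it matches the constructors used above, and <code>run_model</code> is a hypothetical helper introduced only for this illustration:</p>
<pre><code>(* Standalone restatement; `cmd` is assumed to mirror the constructors above. *)
type cmd = To_list | Sort | Set of int * float

let next_state c s = match c with
  | To_list -&gt; s
  | Sort -&gt; List.sort Float.compare s
  | Set (i, f) -&gt; List.mapi (fun j f' -&gt; if i = j then f else f') s

(* Hypothetical helper: apply a command sequence to the model, left to right. *)
let run_model cmds s = List.fold_left (fun s c -&gt; next_state c s) s cmds

let () =
  let s0 = [1.0; 1.0; 1.0] in
  assert (run_model [Set (1, 9.0); Set (0, 118.5)] s0 = [118.5; 9.0; 1.0]);
  assert (run_model [Set (1, 9.0); Set (0, 118.5); Sort] s0 = [1.0; 9.0; 118.5]);
  (* An out-of-bounds index matches no position, so the model is unchanged. *)
  assert (run_model [Set (3, 0.5)] s0 = s0)
</code></pre>
<p>Note the last case: since <code>List.mapi</code> never reaches an out-of-bounds position, an invalid <code>Set</code> leaves the model as it was, which matches an array write that raises before modifying anything.</p>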
<pre><code><span class="ocaml-source">  </span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">precond</span><span class="ocaml-source"> </span><span class="ocaml-source">_cmd</span><span class="ocaml-source"> </span><span class="ocaml-source">_s</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-boolean">true</span><span class="ocaml-source">
</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">postcond</span><span class="ocaml-source"> </span><span class="ocaml-source">f</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-source">s</span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-support-type">float</span><span class="ocaml-source"> </span><span class="ocaml-source">list</span><span class="ocaml-source">)</span><span class="ocaml-source"> </span><span class="ocaml-source">res</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">match</span><span class="ocaml-source"> </span><span class="ocaml-source">f</span><span class="ocaml-keyword-other-ocaml punctuation-comma punctuation-separator">,</span><span class="ocaml-source"> </span><span class="ocaml-source">res</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">with</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-keyword-other">|</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">To_list</span><span class="ocaml-keyword-other-ocaml punctuation-comma punctuation-separator">,</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Res</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-source">(</span><span class="ocaml-constant-language-capital-identifier">List</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Float</span><span class="ocaml-keyword-other-ocaml punctuation-comma punctuation-separator">,</span><span class="ocaml-constant-language">_</span><span class="ocaml-source">)</span><span class="ocaml-keyword-other-ocaml punctuation-comma punctuation-separator">,</span><span class="ocaml-source">fs</span><span class="ocaml-source">)</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">List</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">equal</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Float</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">equal</span><span class="ocaml-source"> </span><span class="ocaml-source">fs</span><span class="ocaml-source"> </span><span class="ocaml-source">s</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-keyword-other">|</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Sort</span><span class="ocaml-keyword-other-ocaml punctuation-comma punctuation-separator">,</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Res</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-source">(</span><span class="ocaml-constant-language-capital-identifier">Unit</span><span class="ocaml-keyword-other-ocaml punctuation-comma punctuation-separator">,</span><span class="ocaml-constant-language">_</span><span class="ocaml-source">)</span><span class="ocaml-keyword-other-ocaml punctuation-comma punctuation-separator">,</span><span class="ocaml-source">r</span><span class="ocaml-source">)</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-source">r</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-unit">()</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-keyword-other">|</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Set</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-source">i</span><span class="ocaml-keyword-other-ocaml punctuation-comma punctuation-separator">,</span><span class="ocaml-constant-language">_</span><span class="ocaml-source">)</span><span class="ocaml-keyword-other-ocaml punctuation-comma punctuation-separator">,</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Res</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-source">(</span><span class="ocaml-constant-language-capital-identifier">Result</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-constant-language-capital-identifier">Unit</span><span class="ocaml-keyword-other-ocaml punctuation-comma punctuation-separator">,</span><span class="ocaml-constant-language-capital-identifier">Exn</span><span class="ocaml-source">)</span><span class="ocaml-keyword-other-ocaml punctuation-comma punctuation-separator">,</span><span class="ocaml-constant-language">_</span><span class="ocaml-source">)</span><span class="ocaml-keyword-other-ocaml punctuation-comma punctuation-separator">,</span><span class="ocaml-source"> </span><span class="ocaml-source">r</span><span class="ocaml-source">)</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source">
</span><span class="ocaml-source">      </span><span class="ocaml-keyword-other">if</span><span class="ocaml-source"> </span><span class="ocaml-source">i</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">&lt;</span><span class="ocaml-source"> </span><span class="ocaml-constant-numeric-decimal-integer">0</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">||</span><span class="ocaml-source"> </span><span class="ocaml-source">i</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">&gt;=</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">List</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">length</span><span class="ocaml-source"> </span><span class="ocaml-source">s</span><span class="ocaml-source">
</span><span class="ocaml-source">      </span><span class="ocaml-keyword-other">then</span><span class="ocaml-source"> </span><span class="ocaml-source">r</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Error</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-constant-language-capital-identifier">Invalid_argument</span><span class="ocaml-source"> </span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-string-quoted-double">index out of bounds</span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-source">)</span><span class="ocaml-source">
</span><span class="ocaml-source">      </span><span class="ocaml-keyword-other">else</span><span class="ocaml-source"> </span><span class="ocaml-source">r</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Ok</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-unit">()</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-keyword-other">|</span><span class="ocaml-source"> </span><span class="ocaml-constant-language">_</span><span class="ocaml-keyword-other-ocaml punctuation-comma punctuation-separator">,</span><span class="ocaml-source"> </span><span class="ocaml-constant-language">_</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-boolean">false</span><span class="ocaml-source">
</span></code></pre>
<p>In the <code>To_list</code> case we use <code>List.equal</code> to compare the actual list result to the model. Since <code>Sort</code> returns <code>unit</code> and is executed for its side effect, there is not much to verify about the result itself; its effect on the array is checked by any later <code>To_list</code>, which is compared against the sorted model. Finally, in the <code>Set</code> case we verify that the call fails with the expected exception when given an invalid array index, and succeeds otherwise.</p>
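<p>The patterns in <code>postcond</code> rely on each result carrying a description of its own type. As a rough standalone sketch of that idea, the simplified types below pair a value with a type witness so that pattern matching can recover the hidden type. These definitions are stand-ins written for this illustration and differ from <code>STM</code>'s actual ones (judging by the <code>(List Float,_)</code> patterns above, the library pairs its witness with a second component, which is omitted here); <code>as_float_list</code> is a hypothetical helper:</p>
<pre><code>(* Simplified stand-ins for typed result wrapping; not STM's definitions. *)
type _ ty =
  | Unit : unit ty
  | Float : float ty
  | List : 'a ty -&gt; 'a list ty

(* The result type is hidden; the witness is what lets us recover it. *)
type res = Res : 'a ty * 'a -&gt; res

(* Hypothetical helper: matching on the witness reveals a float list. *)
let as_float_list : res -&gt; float list option = function
  | Res (List Float, fs) -&gt; Some fs
  | _ -&gt; None

let () =
  assert (as_float_list (Res (List Float, [1.0; 2.0])) = Some [1.0; 2.0]);
  assert (as_float_list (Res (Unit, ())) = None)
</code></pre>
<p>This is also why a catch-all case returning <code>false</code> makes sense in <code>postcond</code>: a result whose witness does not match the command cannot agree with the model.</p>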
<p>As a final piece of the puzzle we write a function <code>arb_cmd</code> to generate arbitrary commands using QCheck's combinators:</p>
<pre><code><span class="ocaml-source">  </span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">arb_cmd</span><span class="ocaml-source"> </span><span class="ocaml-source">s</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">int_gen</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Gen</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">(</span><span class="ocaml-source">frequency</span><span class="ocaml-source"> </span><span class="ocaml-source">[</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-constant-numeric-decimal-integer">1</span><span class="ocaml-keyword-other-ocaml punctuation-comma punctuation-separator">,</span><span class="ocaml-source">small_nat</span><span class="ocaml-source">)</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source">
</span><span class="ocaml-source">                                   </span><span class="ocaml-source">(</span><span class="ocaml-constant-numeric-decimal-integer">7</span><span class="ocaml-keyword-other-ocaml punctuation-comma punctuation-separator">,</span><span class="ocaml-source">int_bound</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-constant-language-capital-identifier">List</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">length</span><span class="ocaml-source"> </span><span class="ocaml-source">s</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-</span><span class="ocaml-source"> </span><span class="ocaml-constant-numeric-decimal-integer">1</span><span class="ocaml-source">)</span><span class="ocaml-source">)</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source"> </span><span class="ocaml-source">]</span><span class="ocaml-source">)</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">in</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">float_gen</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Gen</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-support-type">float</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">in</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-constant-language-capital-identifier">QCheck</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">make</span><span class="ocaml-source"> ~</span><span class="ocaml-source">print</span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source">show_cmd</span><span class="ocaml-source">
</span><span class="ocaml-source">      </span><span class="ocaml-constant-language-capital-identifier">Gen</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">(</span><span class="ocaml-source">oneof</span><span class="ocaml-source">
</span><span class="ocaml-source">             </span><span class="ocaml-source">[</span><span class="ocaml-source"> </span><span class="ocaml-source">return</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">To_list</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source">
</span><span class="ocaml-source">               </span><span class="ocaml-source">return</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Sort</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source">
</span><span class="ocaml-source">               </span><span class="ocaml-source">map2</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-keyword-other">fun</span><span class="ocaml-source"> </span><span class="ocaml-source">i</span><span class="ocaml-source"> </span><span class="ocaml-source">f</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Set</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-source">i</span><span class="ocaml-keyword-other-ocaml punctuation-comma punctuation-separator">,</span><span class="ocaml-source">f</span><span class="ocaml-source">)</span><span class="ocaml-source">)</span><span class="ocaml-source"> </span><span class="ocaml-source">int_gen</span><span class="ocaml-source"> </span><span class="ocaml-source">float_gen</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source"> </span><span class="ocaml-source">]</span><span class="ocaml-source">)</span><span class="ocaml-source">
</span></code></pre>
<p>The function accepts a <code>state</code> parameter to enable model-dependent <code>cmd</code> generation. Here we use it to generate an array index guaranteed to be within bounds in 7/8 of the cases. In the other cases we fall back on QCheck's <code>small_nat</code> generator to check the out-of-bounds indexing behaviour of <code>Float.Array.set</code>. Overall, we choose uniformly between generating either a <code>To_list</code>, a <code>Sort</code>, or a <code>Set</code> <code>cmd</code>.</p>
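<p>To get a feel for what a 1:7 weighting means for the generated indices, here is a standalone, standard-library-only sketch. It does not use QCheck: <code>pick_index</code> is a hypothetical helper, and the 0 to 99 fallback range is an assumption standing in for a generator of small natural numbers:</p>
<pre><code>(* Stdlib-only illustration of a 1:7 weighted choice; `pick_index` and the
   0..99 fallback range are assumptions, not QCheck's implementation. *)
let pick_index st len =
  if Random.State.int st 8 &lt; 7
  then Random.State.int st len   (* 7/8 of draws: always in [0, len-1] *)
  else Random.State.int st 100   (* 1/8 of draws: may or may not be in bounds *)

let st = Random.State.make [| 42 |]
let samples = List.init 8000 (fun _ -&gt; pick_index st 12)
let in_bounds = List.length (List.filter (fun i -&gt; i &lt; 12) samples)

let () = Printf.printf "%d of 8000 sampled indices are in bounds\n" in_bounds
</code></pre>
<p>Since the fallback branch can also produce small values that happen to be valid indices, the share of out-of-bounds indices ends up somewhat below 1/8.</p>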
<p>Assuming we surround the above code in a suitable OCaml module <code>FAConf</code>, we can pass it
to the functor <code>STM_sequential.Make</code> to create a runnable sequential <code>STM</code> test:</p>
<pre><code><span class="ocaml-keyword-other">module</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">FAConf</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source">
</span><span class="ocaml-keyword-other">struct</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-source">[</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">]</span><span class="ocaml-source">
</span><span class="ocaml-keyword-other">end</span><span class="ocaml-source">
</span><span class="ocaml-source">
</span><span class="ocaml-keyword-other">module</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">FA_STM_seq</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">STM_sequential</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-constant-language-capital-identifier">Make</span><span class="ocaml-source">(</span><span class="ocaml-constant-language-capital-identifier">FAConf</span><span class="ocaml-source">)</span><span class="ocaml-source">
</span><span class="ocaml-keyword-other">let</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-unit">()</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-constant-language-capital-identifier">QCheck_base_runner</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">run_tests_main</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-source">[</span><span class="ocaml-constant-language-capital-identifier">FA_STM_seq</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">agree_test</span><span class="ocaml-source"> ~</span><span class="ocaml-source">count</span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-constant-numeric-decimal-integer">1000</span><span class="ocaml-source"> ~</span><span class="ocaml-source">name</span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-string-quoted-double">Sequential STM Float Array test</span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-source">]</span><span class="ocaml-source">
</span></code></pre>
<p>This test quickly checks 1000 <code>cmd</code> lists for agreement with our model:</p>
<pre><code>random seed: 271125846
generated error fail pass / total     time test name
[✓] 1000    0    0 1000 / 1000     0.5s Sequential STM Float Array test
================================================================================
success (ran 1 tests)
</code></pre>
<p>This hasn't always been the case, at least not on all platforms. When testing OCaml 5's newly restored PowerPC backend, <a href="https://github.com/ocaml/ocaml/issues/12482">we first started to observe crashes on array-related tests such as the above</a>. This was <a href="https://github.com/ocaml/ocaml/pull/12540">fixed by Tarides compiler engineer Miod Vallat by changing the PowerPC compiler backend to avoid using signals for array-bounds checks</a>. However, that fix alone wasn't enough to get the <code>STM</code> float array test passing. In particular, <a href="https://github.com/ocaml/ocaml/pull/12540#issuecomment-1713640499">it found wrong <code>float</code> values appearing, causing a disagreement between the SUT and the model on the 64-bit PowerPC platform</a>.</p>
<p>For example, here is the output of our example model from one such failing run under PowerPC:</p>
<pre><code>random seed: 421297093
generated error fail pass / total     time test name
[✗]   27    0    1   26 / 1000     0.0s STM Float Array test sequential

--- Failure --------------------------------------------------------------------

Test STM Float Array test sequential failed (0 shrink steps):

   To_list
   Sort
   Set (1, 9.02935000701)
   Set (0, 118.517154099)
   Set (8, -0.33441184552)
   To_list
   Set (5, -0.000114416837276)
   Sort
   To_list


+++ Messages ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Messages for test STM Float Array test sequential:

  Results incompatible with model

   To_list : [1.; 1.; 1.; 1.; 1.; 1.; 1.; 1.; 1.; 1.; 1.; 1.]
   Sort : ()
   Set (1, 9.02935000701) : Ok (())
   Set (0, 118.517154099) : Ok (())
   Set (8, -0.33441184552) : Ok (())
   To_list : [118.517154099; 9.02935000701; 1.; 1.; 1.; 1.; 1.; 1.; -0.33441184552; 1.; 1.; 1.]
   Set (5, -0.000114416837276) : Ok (())
   Sort : ()
   To_list : [-0.33441184552; -0.000114416837276; 1.; 1.; 1.; 1.; 1.; 1.; 1.; 118.517154099; 1.; 9.02935000701]
================================================================================
failure (1 tests failed, 0 tests errored, ran 1 tests)
</code></pre>
<p>Here the counterexample consists of 9 <code>cmd</code>s, first printed without the returned results and then with the observed results. After four entries (indices 1, 0, 8, and 5) have been <code>set</code> to arbitrary values, the result of the final <code>to_list</code> appears unsorted, with the <code>118.517154099</code> entry strangely out of place! A similar observation from our larger model-based <code>STM</code> test prompted us <a href="https://github.com/ocaml/ocaml/pull/12540#issuecomment-1713640499">to create and share a small stand-alone reproducer illustrating the misbehaviour</a>. With that at hand, OCaml's own Xavier Leroy then quickly identified and fixed the bug, which was caused by PowerPC's FPR0 floating point register not being properly saved and restored across function calls. Both these fixes went into <a href="https://github.com/ocaml/ocaml/pull/12540">ocaml/ocaml#12540</a> and will be included in the forthcoming OCaml 5.2 release, restoring the 64-bit POWER backend.</p>
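<p>To make the counterexample concrete, the command sequence can be replayed by hand. The following is only a minimal sketch using the standard library, not the reproducer shared in the linked comment; the helper <code>is_sorted</code> is a hypothetical name introduced for illustration, and we assume a 12-element array of <code>1.0</code> entries and <code>Float.compare</code> as the sort order:</p>
<pre><code>(* Replays the command sequence from the failing run on a fresh
   12-element float array (assumed initial state: all entries 1.0). *)

(* [is_sorted] is a hypothetical helper, not part of any library. *)
let is_sorted l =
  let rec go = function
    | x :: (y :: _ as rest) -&gt; Float.compare x y &lt;= 0 &amp;&amp; go rest
    | _ -&gt; true
  in
  go l

let () =
  let a = Float.Array.make 12 1.0 in
  ignore (Float.Array.to_list a);
  Float.Array.sort Float.compare a;
  Float.Array.set a 1 9.02935000701;
  Float.Array.set a 0 118.517154099;
  Float.Array.set a 8 (-0.33441184552);
  ignore (Float.Array.to_list a);
  Float.Array.set a 5 (-0.000114416837276);
  Float.Array.sort Float.compare a;
  let result = Float.Array.to_list a in
  (* On a correct backend, the list observed after the final sort
     must be in ascending order. *)
  assert (is_sorted result);
  print_endline "final to_list result is sorted"
</code></pre>
<p>On a correctly working backend the assertion holds; the failing run above corresponds to the final list not being in ascending order.</p>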
<h2>Testing Parallel Behaviour Against a Sequential <code>STM</code> Model</h2>
<p>Since Multicore OCaml programs can be non-deterministic – meaning that they can behave in not just one way but in a number of different, equally acceptable, ways – it is harder to capture acceptable behaviour in a test. Furthermore, errors may go unnoticed or be hard to reproduce, which further complicates their testing and debugging.</p>
<p>Fortunately a sequential model can also function as an oracle for the observed behaviour of the SUT under parallel usage. This idea originates from the paper <a href="http://happy-testing.com/hans/papers/ICFP2009-PULSE.pdf"><em>"Finding Race Conditions in Erlang with QuickCheck and PULSE"</em> by Claessen et al.,</a> from ICFP 2009. Rather than generate a sequential list of arbitrary <code>cmd</code>s, one can generate two such <code>cmd</code> lists to be executed in parallel. If we add a "sequential prefix" to bring the SUT to an arbitrary state before <code>spawn</code>-ing two parallel <code>Domain</code>s, the result is an upside-down <code>Y</code>-shaped test.</p>
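<p>The shape of such a run can be sketched with the standard library alone. This is an illustration of the idea under our own assumptions, not how <code>STM</code> is implemented: the <code>cmd</code> and <code>res</code> types and the functions <code>run_cmd</code> and <code>run_y</code> are hypothetical names, and the code requires OCaml 5 for <code>Domain</code>:</p>
<pre><code>(* A Y-shaped run: a sequential prefix, followed by two command lists
   executed in parallel, each in its own domain. All names here are
   hypothetical and chosen for illustration only. *)
type cmd = To_list | Sort | Set of int * float
type res = RList of float list | RUnit | RSet of (unit, exn) result

(* Run one command against the shared array and record its result. *)
let run_cmd a = function
  | To_list -&gt; RList (Float.Array.to_list a)
  | Sort -&gt; Float.Array.sort Float.compare a; RUnit
  | Set (i, f) -&gt;
    RSet (try Ok (Float.Array.set a i f)
          with Invalid_argument _ as e -&gt; Error e)

let run_y prefix left right =
  let a = Float.Array.make 12 1.0 in
  (* The stem of the Y: bring the array to some state sequentially. *)
  let pre = List.map (run_cmd a) prefix in
  (* The two arms of the Y: both domains operate on the same array. *)
  let d1 = Domain.spawn (fun () -&gt; List.map (run_cmd a) left) in
  let d2 = Domain.spawn (fun () -&gt; List.map (run_cmd a) right) in
  (pre, Domain.join d1, Domain.join d2)

let () =
  let (pre, left, right) =
    run_y [Set (8, -327818.639845)] [To_list] [Sort] in
  Printf.printf "%d prefix, %d left, %d right results\n"
    (List.length pre) (List.length left) (List.length right)
</code></pre>
<p>The recorded results of the prefix and the two arms are what a test then has to explain against the model.</p>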
<p>Without changing anything, from the model above we can create a parallel <code>STM</code> test by passing our specification module <code>FAConf</code> to the functor <code>STM_domain.Make</code>:</p>
<pre><code><span class="ocaml-keyword-other">module</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">FA_STM_dom</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">STM_domain</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-constant-language-capital-identifier">Make</span><span class="ocaml-source">(</span><span class="ocaml-constant-language-capital-identifier">FAConf</span><span class="ocaml-source">)</span><span class="ocaml-source">
</span><span class="ocaml-keyword-other">let</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-unit">()</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-constant-language-capital-identifier">QCheck_base_runner</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">run_tests_main</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-source">[</span><span class="ocaml-constant-language-capital-identifier">FA_STM_dom</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">agree_test_par</span><span class="ocaml-source"> ~</span><span class="ocaml-source">count</span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-constant-numeric-decimal-integer">1000</span><span class="ocaml-source"> ~</span><span class="ocaml-source">name</span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-string-quoted-double">Parallel STM Float Array test</span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-source">]</span><span class="ocaml-source">
</span></code></pre>
<p>This will test that each observed parallel behaviour can be explained by <em>some</em> sequential interleaved <code>cmd</code> run of the model. As such, the property performs an interleaving search. Because each random <code>Y</code>-shaped <code>cmd</code> input may yield a non-deterministic answer, <code>STM</code> repeats each property 25 times and fails if just one of the runs cannot be explained by an interleaved model run.</p>
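<p>The idea behind such an interleaving search can be sketched as follows. This is a simplified illustration rather than <code>STM</code>'s actual implementation: the model state is assumed to be a plain <code>float list</code>, and the names <code>step</code>, <code>explains</code>, and the <code>res</code> constructors are hypothetical:</p>
<pre><code>(* Model-based interleaving search: do the observed results of two
   parallel command lists agree with SOME sequential interleaving?
   All names are hypothetical and chosen for illustration only. *)
type cmd = To_list | Sort | Set of int * float
type res = List_res of float list | Unit_res | Set_res of bool

(* The model: from a state and a command, compute the next state and
   the expected result. [Set_res false] stands for out-of-bounds. *)
let step state = function
  | To_list -&gt; (state, List_res state)
  | Sort -&gt; (List.sort Float.compare state, Unit_res)
  | Set (i, f) -&gt;
    if i &lt; 0 || i &gt;= List.length state then (state, Set_res false)
    else (List.mapi (fun j x -&gt; if j = i then f else x) state, Set_res true)

(* [explains state left right] holds if some interleaving of the two
   observed (cmd, res) lists agrees with the model at every step. *)
let rec explains state left right =
  let try_first (cmd, res) rest other =
    let (state', expected) = step state cmd in
    expected = res &amp;&amp; explains state' rest other
  in
  match left, right with
  | [], [] -&gt; true
  | l :: ls, [] -&gt; try_first l ls []
  | [], r :: rs -&gt; try_first r rs []
  | l :: ls, r :: rs -&gt; try_first l ls right || try_first r left rs

let () =
  let init = List.init 12 (fun _ -&gt; 1.0) in
  let (after_prefix, _) = step init (Set (8, -327818.639845)) in
  (* A To_list that still sees only 1.0 entries cannot be explained. *)
  let lost = [ (To_list, List_res init) ] in
  assert (not (explains after_prefix lost [ (Sort, Unit_res) ]));
  (* A To_list that sees the updated entry can be explained. *)
  let fine = [ (To_list, List_res after_prefix) ] in
  assert (explains after_prefix fine [ (Sort, Unit_res) ])
</code></pre>
<p>The search tries either arm's next command first and backtracks on disagreement, so its cost grows quickly with the length of the two command lists.</p>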
<p>Running the test quickly finds a counterexample, illustrating that float arrays are not safe to use in parallel:</p>
<pre><code>random seed: 224773045
generated error fail pass / total     time test name
[✗]    1    0    1    0 / 1000     1.0s Parallel STM Float Array test

--- Failure --------------------------------------------------------------------

Test Parallel STM Float Array test failed (7 shrink steps):

                                  |
                       Set (8, -327818.639845)
                                  |
                     .------------------------.
                     |                        |
                  To_list                   Sort


+++ Messages ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Messages for test Parallel STM Float Array test:

  Results incompatible with linearized model

                                                               |
                                               Set (8, -327818.639845) : Ok (())
                                                               |
                                 .-----------------------------------------------------------.
                                 |                                                           |
     To_list : [1.; 1.; 1.; 1.; 1.; 1.; 1.; 1.; 1.; 1.; 1.; 1.]                          Sort : ()

================================================================================
failure (1 tests failed, 0 tests errored, ran 1 tests)
</code></pre>
<p>The produced counterexample shows that from a float array with all <code>1.0</code>-entries, if one sets entry 8 to, e.g., <code>-327818.639845</code> and then proceeds to sample the array contents with a <code>To_list</code> call in parallel with executing a mutating call to <code>Sort</code>, we may experience unexpected behaviour: The <code>To_list</code> result indicates no <code>-327818.639845</code> entry! This illustrates and confirms that OCaml arrays and <code>Float.Array</code> in particular are not safe to use in parallel without coordinated access, e.g., with a <code>Mutex</code>.</p>
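<p>As a minimal sketch of what coordinated access could look like, every operation on the shared array can take the same lock. The names <code>with_lock</code>, <code>safe_to_list</code>, <code>safe_sort</code>, and <code>safe_set</code> are hypothetical helpers, and the code assumes OCaml 5, where <code>Mutex</code> and <code>Domain</code> are part of the standard library:</p>
<pre><code>(* Coordinating access with a Mutex: every operation on the shared
   float array takes the same lock. Helper names are hypothetical. *)
let lock = Mutex.create ()
let arr = Float.Array.make 12 1.0

(* Run [f] while holding the lock, releasing it even if [f] raises. *)
let with_lock f =
  Mutex.lock lock;
  Fun.protect ~finally:(fun () -&gt; Mutex.unlock lock) f

let safe_to_list () = with_lock (fun () -&gt; Float.Array.to_list arr)
let safe_sort () = with_lock (fun () -&gt; Float.Array.sort Float.compare arr)
let safe_set i f = with_lock (fun () -&gt; Float.Array.set arr i f)

let () =
  safe_set 8 (-327818.639845);
  let d1 = Domain.spawn safe_to_list in
  let d2 = Domain.spawn safe_sort in
  let l = Domain.join d1 in
  Domain.join d2;
  (* Whichever domain takes the lock first, the entry is not lost. *)
  assert (List.mem (-327818.639845) l)
</code></pre>
<p>With the lock in place, <code>to_list</code> observes the array either entirely before or entirely after the sort, so the result matches one of the two sequential orders.</p>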
<h2>Lowering Model-Requirements With <code>Lin</code></h2>
<p>One may rightfully point out that developing an <code>STM</code> model takes some effort. This inspired us to develop a simpler library, <code>Lin</code>, which requires substantially less input from its end user. Like <code>STM</code>, <code>Lin</code> tests a property by performing a Y-shaped parallel run and recording its output. However, it then tries to reconcile the outcome with <em>some</em> sequential run of the tested system, by performing a search over all possible <code>cmd</code> interleavings. In effect, <code>Lin</code> thus reuses the tested system as a "sequential oracle".</p>
<p>Here is the complete code for a corresponding <code>Float.Array</code> example, now using <code>Lin</code>:</p>
<pre><code><span class="ocaml-keyword-other">module</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">FAConf</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source">
</span><span class="ocaml-keyword-other">struct</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword-other">type</span><span class="ocaml-source"> </span><span class="ocaml-source">t</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Float</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-constant-language-capital-identifier">Array</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">t</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">array_size</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-constant-numeric-decimal-integer">12</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">init</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-unit">()</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Float</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-constant-language-capital-identifier">Array</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">make</span><span class="ocaml-source"> </span><span class="ocaml-source">array_size</span><span class="ocaml-source"> </span><span class="ocaml-constant-numeric-decimal-float">1.0</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">cleanup</span><span class="ocaml-source"> </span><span class="ocaml-constant-language">_</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-unit">()</span><span class="ocaml-source">
</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword-other">open</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Lin</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">int</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-source">int_small</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">api</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-source">[</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-source">val_</span><span class="ocaml-source"> </span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-string-quoted-double">Float.Array.to_list</span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Float</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-constant-language-capital-identifier">Array</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">to_list</span><span class="ocaml-source">        </span><span class="ocaml-source">(</span><span class="ocaml-source">t</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">@-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-source">returning</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-source">list</span><span class="ocaml-source"> </span><span class="ocaml-support-type">float</span><span class="ocaml-source">)</span><span class="ocaml-source">)</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-source">val_</span><span class="ocaml-source"> </span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-string-quoted-double">Float.Array.sort</span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-source">    </span><span class="ocaml-constant-language-capital-identifier">Float</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">(</span><span class="ocaml-constant-language-capital-identifier">Array</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">sort</span><span class="ocaml-source"> </span><span class="ocaml-source">compare</span><span class="ocaml-source">)</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-source">t</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">@-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-source">returning</span><span class="ocaml-source"> </span><span class="ocaml-support-type">unit</span><span class="ocaml-source">)</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-source">val_</span><span class="ocaml-source"> </span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-string-quoted-double">Float.Array.set</span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-source">     </span><span class="ocaml-constant-language-capital-identifier">Float</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-constant-language-capital-identifier">Array</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">set</span><span class="ocaml-source">            </span><span class="ocaml-source">(</span><span class="ocaml-source">t</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">@-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-support-type">int</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">@-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-support-type">float</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">@-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-source">returning_or_exc</span><span class="ocaml-source"> </span><span class="ocaml-support-type">unit</span><span class="ocaml-source">)</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-source">]</span><span class="ocaml-source">
</span><span class="ocaml-keyword-other">end</span><span class="ocaml-source">
</span></code></pre>
<p>Just as with <code>STM</code>, <code>Lin</code> needs to be told the type <code>t</code> of the system under test, how to initialise it with <code>init</code>, and how to clean up after it with <code>cleanup</code>. Finally, we describe the type signatures of the tested system with a combinator-based DSL in the style of OCaml's <code>ctypes</code> library. In the <code>to_list</code> case, the input is pretty close to the signature <code>to_list : t -&gt; float list</code> from the <code>Float.Array</code> signature. In the <code>sort</code> case, we test the result of passing the <code>Float.compare</code> function. Finally, in the <code>set</code> case, the <code>returning_or_exc</code> combinator expresses that an out-of-bounds exception may be raised.</p>
<p>Based on the above, we can now create and run our <code>qcheck-lin</code> test as follows:</p>
<pre><code><span class="ocaml-keyword-other">module</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">FA_Lin_dom</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Lin_domain</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-constant-language-capital-identifier">Make</span><span class="ocaml-source">(</span><span class="ocaml-constant-language-capital-identifier">FAConf</span><span class="ocaml-source">)</span><span class="ocaml-source">
</span><span class="ocaml-keyword-other">let</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-unit">()</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-constant-language-capital-identifier">QCheck_base_runner</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">run_tests_main</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-source">[</span><span class="ocaml-constant-language-capital-identifier">FA_Lin_dom</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">neg_lin_test</span><span class="ocaml-source"> ~</span><span class="ocaml-source">count</span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-constant-numeric-decimal-integer">1000</span><span class="ocaml-source"> ~</span><span class="ocaml-source">name</span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-string-quoted-double">Lin Float.Array test with Domain</span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-source">]</span><span class="ocaml-source">
</span></code></pre>
<p>The result of the <code>Lin_domain.Make</code> functor offers both <code>lin_test</code> for a positive test of sequential consistency and <code>neg_lin_test</code>, which we use here for a negative test, confirming the absence of sequential consistency with a counterexample:</p>
<pre><code>random seed: 349336243
generated error fail pass / total     time test name
[✓]    2    0    1    1 / 1000     1.1s Lin Float.Array test with Domain

--- Info -----------------------------------------------------------------------

Negative test Lin Float.Array test with Domain failed as expected (27 shrink steps):

                                            |
                            Float.Array.set t 0 111.772797434
                                            |
                          .----------------------------------.
                          |                                  |
                Float.Array.to_list t               Float.Array.sort t


+++ Messages ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Messages for test Lin Float.Array test with Domain:

  Results incompatible with sequential execution

                                                                                                   |
                                                                              Float.Array.set t 0 111.772797434 : Ok (())
                                                                                                   |
                                                   .-----------------------------------------------------------------------------------------------.
                                                   |                                                                                               |
     Float.Array.to_list t : [111.772797434; 1.; 1.; 1.; 1.; 1.; 1.; 1.; 1.; 1.; 1.; 111.772797434]                                     Float.Array.sort t : ()

================================================================================
success (ran 1 tests)
</code></pre>
<p>Note how the above result is a passing test, which found a counterexample on the second attempt: 2 generated, 1 pass, 1 fail. After shrinking, the resulting shape is similar to the one found by <code>STM</code>. The details differ from run to run, reflecting the non-determinism of the tested program. The output this time illustrates how the non-<code>1.0</code> entry <code>111.772797434</code> can unexpectedly show up twice in a result from <code>to_list</code>, again illustrating how <code>Float.Array</code> is unsafe to use in parallel.</p>
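<p>To see why no sequential execution explains this observation, we can run the two possible orders of the parallel commands by hand on a fresh array. The sketch below uses only the standard library and is not how <code>Lin</code> performs its search; the function names are hypothetical:</p>
<pre><code>(* Re-run both sequential orders of the two parallel commands on a
   fresh array and compare each with the observed to_list result.
   Function names are hypothetical and for illustration only. *)
let fresh () =
  let a = Float.Array.make 12 1.0 in
  Float.Array.set a 0 111.772797434;  (* the sequential prefix *)
  a

let to_list_then_sort () =
  let a = fresh () in
  let l = Float.Array.to_list a in
  Float.Array.sort Float.compare a;
  l

let sort_then_to_list () =
  let a = fresh () in
  Float.Array.sort Float.compare a;
  Float.Array.to_list a

(* The to_list result copied from the counterexample above. *)
let observed =
  [111.772797434; 1.; 1.; 1.; 1.; 1.; 1.; 1.; 1.; 1.; 1.; 111.772797434]

let () =
  assert (to_list_then_sort () &lt;&gt; observed);
  assert (sort_then_to_list () &lt;&gt; observed);
  print_endline "no sequential order explains the observed result"
</code></pre>
<p>In either sequential order the non-<code>1.0</code> entry occurs exactly once, at the front before sorting or at the back after it, so a list containing it twice can only stem from the two operations overlapping.</p>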
<h2>Growing a Test Suite</h2>
<p>Over time we have developed a growing test suite, which includes tests of (parts of) <code>Array</code>, <code>Atomic</code>, <code>Bigarray</code>, <code>Buffer</code>, <code>Bytes</code>, <code>Dynlink</code>, <code>Ephemeron</code>, <code>Hashtbl</code>, <code>In_channel</code>, <code>Out_channel</code>, <code>Lazy</code>, <code>Queue</code>, <code>Semaphore</code>, <code>Stack</code>, <code>Sys</code>, and <code>Weak</code> from OCaml's <code>Stdlib</code> in addition to the above-mentioned <code>Float.Array</code> test.</p>
<p>Not everything fits into the <code>STM</code> and <code>Lin</code> formats. To stress-test the new runtime's primitives underlying the <code>Domain</code> and <code>Thread</code> modules, we have developed separate ad-hoc property-based tests of each of these, as well as a property-based test of their combination.</p>
<h2>Growing a CI System</h2>
<p>Starting from a single GitHub Actions workflow to run the test suite on Linux, we have gradually added additional CI targets, to the point that we can now run the test suite under Linux, macOS, and
Windows (MinGW + Cygwin). Furthermore, we can run the test suite on OCaml with particular configurations, such as</p>
<ul>
<li>Bytecode builds</li>
<li>32-bit builds</li>
<li>Enabling frame pointers</li>
<li>The debug runtime</li>
</ul>
<p><a href="https://github.blog/changelog/2024-01-30-github-actions-introducing-the-new-m1-macos-runner-available-to-open-source/">Until recently</a> GitHub Actions offered only amd64-based machines for testing, limiting testing to just one OCaml compiler backend. Our colleague Ben Andrew therefore built <a href="https://github.com/ocurrent/multicoretests-ci"><code>multicoretests-ci</code></a> – an <code>ocurrent</code>-based CI system that lets us run the test suite on a range of additional platforms:</p>
<ul>
<li>Linux ARM64</li>
<li>macOS ARM64</li>
<li>Linux PowerPC64</li>
<li>Linux s390x</li>
<li>FreeBSD amd64</li>
</ul>
<p>The above-mentioned POWER bugs were found thanks to <code>multicoretests-ci</code> runs.</p>
<h2>Understanding Issues Found</h2>
<p>Up to this point, we have found 30 issues, ranging from test cases crashing OCaml's runtime to discovering that the <code>Sys.readdir</code> function may behave differently on Windows. In order to better understand the issues we have found, we have divided them into categories:</p>
<ul>
<li><code>runtime</code> – for issues requiring a change in OCaml's runtime system</li>
<li><code>stdlib</code> – for issues requiring a change in OCaml's standard library</li>
<li><code>codegen</code> – for issues requiring a change in a backend code generator</li>
<li><code>flexdll</code> – for issues requiring a change in the <code>FlexDLL</code> tool for Windows</li>
<li><code>dune</code> – for issues requiring a change in the <code>dune</code> build system tool</li>
<li><code>domainslib</code> – for issues discovered while testing the <code>domainslib</code> library</li>
<li><code>lockfree</code> – for issues discovered while testing the <code>lockfree</code> (now: <code>saturn</code>) library</li>
</ul>
<p>The found issues are distributed as follows:</p>
<p><img sizes="(min-width: 1360px) 1360px, (min-width: 680px) 680px, 100vw" srcset="/blog/images/bug-distribution-170w~tQy03nJYyUsV3YBpKwNn7w.webp 170w, /blog/images/bug-distribution-340w~GvIrQDb4tEFOCdEk_Zscvg.webp 340w, /blog/images/bug-distribution-680w~Vxve0CZDYe-SFWQFA3_zxw.webp 680w, /blog/images/bug-distribution-1360w~g03jil8i1jQzxeoRyI54dg.webp 1360w" src="/blog/images/bug-distribution-1360w~g03jil8i1jQzxeoRyI54dg.webp" alt="A pie chart illustrating the following distribution: 50% runtime, 17% stdlib, 13% codegen, 10% domainslib, 3% dune, 3% flexdll, 3% lockfree"></p>
<p>This distribution was a surprise to us: Half of the issues are runtime related! Initially, we had expected to use the PBT approach to test OCaml's existing <code>Stdlib</code> for safety under parallel usage. However, as testing has progressed, it has become apparent that the approach also works well to stress test and detect errors in the new multicore runtime.</p>
<p>The above only counts fixed issues for which PRs have been merged. Without a 'fix PR', it is harder to judge where changes are required and thus to categorise each issue. In addition to the above, we are currently aware of at least 3 additional outstanding issues that we need to investigate further.</p>
<h2>Lin vs STM vs TSan vs DSCheck</h2>
<p>At Tarides, we test multicore OCaml with a variety of tools. <code>Lin</code> tests are relatively easy to write and useful in themselves, but they test a weaker property compared to <code>STM</code>. For one, they only test a parallel property. Secondly, they do not express anything about the intended semantics of a tested API, e.g. a module with functions consistently raising a <code>Not_implemented_yet</code> exception would pass a <code>Lin</code> test. Furthermore, if we had only used a negative <code>Lin</code> test such as the above, the PowerPC register bug would likely have been missed or disregarded as a parallel-usage misbehaviour.</p>
<p>On the other hand, a <code>Lin</code> test was sufficient to reveal, e.g. <a href="https://github.com/ocaml-multicore/multicoretests/pull/214">early out-of-thin air values from the <code>Weak</code> module</a> or <a href="https://github.com/ocaml/ocaml/issues/11878">reading of uninitialised bytes with <code>In_channel.seek</code> on a channel</a>. As such, <code>Lin</code> and <code>STM</code> present a trade-off between required user input and provided guarantees in a passing test.</p>
<p><a href="/blog/2023-10-18-off-to-the-races-using-threadsanitizer-in-ocaml/">ThreadSanitizer (TSan) for OCaml</a> is a compiler instrumentation mode targeted at detecting data races in OCaml code. For comparison, TSan may detect races even if a thread schedule has not revealed a difference in the resulting output, an ability which is beyond <code>Lin</code> and <code>STM</code> as <a href="https://en.wikipedia.org/wiki/Black-box_testing">black-box testing libraries</a>. On the other hand, <code>Lin</code> and <code>STM</code> can detect a broader class of observable defects. In one case, <code>STM</code> even detected <a href="https://github.com/ocaml/ocaml/pull/12707">an issue caused by a race between atomic reads and writes</a>. As such TSan and PBT are very much complementary tools.</p>
<p><a href="/blog/2024-02-14-multicore-testing-tools-dscheck-pt-1/">DSCheck</a> is a model-checking tool that exhaustively explores all thread schedules (up to some bound). As such, OCaml code tested with DSCheck ensures correctness even for very rarely occurring schedules, thus offering an advantage over <code>Lin</code> and <code>STM</code>. On the other hand, <code>Lin</code> and <code>STM</code> excel in exploring random combinations of <code>cmd</code>s and input parameters, rather than the thread schedules they may give rise to. As such, we again see DSCheck and PBT as complementary.</p>
<p>Finally, <a href="/blog/2022-12-22-ocaml-5-multicore-testing-tools/">Arthur Wendling's earlier blog post</a> offers an example that nicely illustrates how <code>Lin</code>, TSan, and DSCheck can complement each other well when developing and testing multicore OCaml code. <a href="/blog/2024-04-10-multicore-testing-tools-dscheck-pt-2/">Part two of our DSCheck series</a> provides additional background on how the model checker works and how we use it to test data structures for the <a href="https://github.com/ocaml-multicore/saturn">Saturn</a> library.</p>
<h2>End of Part One</h2>
<p>We will stop here for now and continue in the second part of this miniseries. The next part will focus on the challenges and lessons learned during the process I've described. I will share some findings that surprised us and threw spanners in the works, as well as how we got creative to overcome them.</p>
<p>In the meantime, you can stay up-to-date with Tarides on <a href="https://bsky.app/profile/tarides.com">Bluesky</a> and <a href="https://www.linkedin.com/company/tarides">LinkedIn</a>, and sign up <a href="/contact/">for our Newsletter</a>. See you next time!</p>
]]></description><link>https://tarides.com/blog/2024-04-24-under-the-hood-developing-multicore-property-based-tests-for-ocaml-5</link><guid isPermaLink="false">https://tarides.com/blog/2024-04-24-under-the-hood-developing-multicore-property-based-tests-for-ocaml-5.html</guid><dc:creator><![CDATA[ Jan Midtgaard ]]></dc:creator><pubDate>Wed, 24 Apr 2024 00:00:00 GMT</pubDate></item><item><title><![CDATA[Creating the SyntaxDocumentation Command - Part 1: Merlin]]></title><description><![CDATA[<p>OCaml development has never been more enchanting, thanks to Merlin – the wizard of the editor realm. The <a href="/blog/2022-07-05-the-magic-of-merlin/">magic of Merlin</a> is something that makes programming in OCaml a very nice experience. By Merlin, I don't mean the old gray-haired, staff-bearing magic guy. I'm talking about the editor service that provides modern IDE features for OCaml.</p>
<p>Merlin currently has an arsenal of tools that enable code navigation, completion, and a myriad of others. To use Merlin, we give it commands, kinda like reading spells from a magic book. In Merlin's magical world, each spell (command) works in a particular way (the logic), requires a specific set of items to work (the inputs), and produces a specific set of results (the output).</p>
<p>This article is the first of a three-part series. Here, we'll be looking at how to implement a new command in Merlin, taking the <code>SyntaxDocumentation</code> functionality as a case study. The second will explore how we integrate this command with <code>ocaml-lsp-server</code>, and in the final article, we'll learn how to include this command as a configurable option on the VSCode OCaml Extension.</p>
<h2>The SyntaxDocumentation Command</h2>
<h3>The Problem</h3>
<p>Before going into the implementation details, let's take some time to understand what this command is all about and why it's even needed in the first place.</p>
<p>A common challenge faced by OCaml developers is the need for quick and accurate documentation about their code's syntax. While OCaml is a powerful language, its syntax can sometimes be complex, especially for newcomers or developers working on unfamiliar codebases.</p>
<p>Without proper documentation, understanding the syntax can be like navigating a maze blindfolded. You may spend valuable time sifting through hundreds of pages of documentation to find what you are looking for. Googling "syntax symbols" doesn't really help much unless someone has faced the same problem and described it using the exact same syntax. This inefficiency not only slows down development, but it also increases the likelihood of errors and bugs creeping into the codebase. Programming should be about the solution you're implementing, not just about the language's syntax, so having a quicker way to understand syntax will go a long way towards making programming in OCaml a much nicer experience.</p>
<h3>The Solution</h3>
<p>Most programmers write code using a text editor, such as Vim, Emacs, or VSCode. An editor typically provides a basic interface for writing and navigating code with a cursor. The idea is that whenever the user's cursor is over some code, the editor tells them what that code is and provides further information about its syntax.
The <code>SyntaxDocumentation</code> command grabs the code under the cursor and uses Merlin's analysis engine to extract relevant information about its syntax. This information is then presented back to the user.</p>
<h3>The Implementation</h3>
<p>To implement the <code>SyntaxDocumentation</code> command, we'll use a simple three-step approach:</p>
<ol>
<li><strong>The Trigger</strong>: What the user should do to trigger this command.</li>
<li><strong>The Action</strong>: What actions should be executed when the user has triggered this command.</li>
<li><strong>The Consequence</strong>: How the results should be presented when the action(s) are completed.</li>
</ol>
<p>The trigger here will be a simple hover, such as placing your cursor over some code. We won't go into detail about how this works here (see Part 2).</p>
<p>In this article, our focus will be on steps 2 and 3. This covers which actions to run after the trigger and what the result should be.</p>
<h4>1. New Commands</h4>
<p>For our <code>SyntaxDocumentation</code> command to be possible, we have to let Merlin know of the new command by defining it. This involves telling Merlin the name of the command, what the command needs as input, and what Merlin should do if this command is called.</p>
<p>To create a new command, we need to add its definition to some files:</p>
<ul>
<li><a href="https://github.com/ocaml/merlin/commit/0f64255167b63d8eab606419693ac2ca83d132f0#diff-cbfaeb02002660c15c9f7a82955822acd5ec25cc7362c06bf025f5efedc0957eR202-R215">new_commands.ml</a>: In this file, we define our new command and indicate the inputs it requires. By giving it a helpful description, the user can learn about it from the help menu.</li>
</ul>
<pre><code><span class="ocaml-source">command</span><span class="ocaml-source"> </span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-string-quoted-double">syntax-document</span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-source">
</span><span class="ocaml-source">    ~</span><span class="ocaml-source">doc</span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source"> </span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-string-quoted-double">Returns documentation for OCaml syntax for the entity under the cursor</span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-source">
</span><span class="ocaml-source">    ~</span><span class="ocaml-source">spec</span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source"> </span><span class="ocaml-source">[</span><span class="ocaml-source">
</span><span class="ocaml-source">        </span><span class="ocaml-source">arg</span><span class="ocaml-source"> </span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-string-quoted-double">-position</span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-source"> </span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-string-quoted-double">&lt;position&gt; Position to complete</span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-source">
</span><span class="ocaml-source">        </span><span class="ocaml-source">(</span><span class="ocaml-source">marg_position</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-keyword-other">fun</span><span class="ocaml-source"> </span><span class="ocaml-source">pos</span><span class="ocaml-source"> </span><span class="ocaml-source">_pos</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-source">pos</span><span class="ocaml-source">)</span><span class="ocaml-source">)</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-source">]</span><span class="ocaml-source">
</span><span class="ocaml-source">    ~</span><span class="ocaml-source">default</span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-polymorphic-variant">`None</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-keyword-other">begin</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">fun</span><span class="ocaml-source"> </span><span class="ocaml-source">buffer</span><span class="ocaml-source"> </span><span class="ocaml-source">pos</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source">
</span><span class="ocaml-source">        </span><span class="ocaml-keyword-other">match</span><span class="ocaml-source"> </span><span class="ocaml-source">pos</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">with</span><span class="ocaml-source">
</span><span class="ocaml-source">        </span><span class="ocaml-keyword-other">|</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-polymorphic-variant">`None</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-source">failwith</span><span class="ocaml-source"> </span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-string-quoted-double">-position &lt;pos&gt; is mandatory</span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-source">
</span><span class="ocaml-source">        </span><span class="ocaml-keyword-other">|</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">#</span><span class="ocaml-constant-language-capital-identifier">Msource</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">position</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">as</span><span class="ocaml-source"> </span><span class="ocaml-source">pos</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source">
</span><span class="ocaml-source">            </span><span class="ocaml-source">run</span><span class="ocaml-source"> </span><span class="ocaml-source">buffer</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-constant-language-capital-identifier">Query_protocol</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-constant-language-capital-identifier">Syntax_document</span><span class="ocaml-source"> </span><span class="ocaml-source">pos</span><span class="ocaml-source">)</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-keyword-other">end</span><span class="ocaml-source">
</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source">
</span></code></pre>
<p>Basically, this tells Merlin to create a new command called <code>syntax-document</code> that requires a position. <code>Msource</code> is a helpful module containing useful utilities to deal with positions.</p>
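<p>The pattern of an argument that defaults to <code>`None</code> but is rejected if it is still <code>`None</code> when the command runs can be tried out in isolation. The snippet below is only a sketch under assumptions: <code>position</code> is a reduced stand-in for <code>Msource.position</code>, and <code>run_syntax_document</code> and <code>handle</code> are hypothetical names, not part of Merlin.</p>

```ocaml
(* Reduced, hypothetical stand-in for Msource.position: a line/column pair. *)
type position = [ `Logical of int * int ]

(* The argument starts out as `None and is replaced when -position is given. *)
type position_arg = [ `None | position ]

(* Hypothetical stand-in for running the query: it only renders the position. *)
let run_syntax_document (`Logical (line, col) : position) =
  Printf.sprintf "syntax-document at %d:%d" line col

(* Same shape as the command body: reject the default, accept any position. *)
let handle (pos : position_arg) =
  match pos with
  | `None -> failwith "-position is mandatory"
  | #position as pos -> run_syntax_document pos
```

<p>The <code>#position as pos</code> pattern narrows the value back to the <code>position</code> type, so the function that does the real work never has to deal with the <code>`None</code> case.</p>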
<ul>
<li><a href="https://github.com/ocaml/merlin/commit/0f64255167b63d8eab606419693ac2ca83d132f0#diff-4dee2c70efab5997f53cb009604f12caa24a233f38889b3e0b622982c2cfa281R143-R147">query_protocol.ml</a>: In this file, we define our command's input and output types. Here we tell Merlin that our command needs an input of type <code>position</code> from the <code>Msource</code> module, and that its output is either <code>`No_documentation</code> or <code>`Found</code>, which carries the documentation as a value of type <code>syntax_doc_result</code>.</li>
</ul>
<pre><code><span class="ocaml-keyword-other">|</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Syntax_document</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Msource</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">position</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source">  </span><span class="ocaml-source">[</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-polymorphic-variant">`Found</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">of</span><span class="ocaml-source"> </span><span class="ocaml-source">syntax_doc_result</span><span class="ocaml-source">
</span><span class="ocaml-source">        </span><span class="ocaml-keyword-other">|</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-polymorphic-variant">`No_documentation</span><span class="ocaml-source">
</span><span class="ocaml-source">        </span><span class="ocaml-source">]</span><span class="ocaml-source"> </span><span class="ocaml-source">t</span><span class="ocaml-source">
</span></code></pre>
<p><code>syntax_doc_result</code> is defined as a record that contains a name, a description, and a link to the documentation:</p>
<pre><code><span class="ocaml-keyword-other">type</span><span class="ocaml-source"> </span><span class="ocaml-source">syntax_doc_result</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source">
</span><span class="ocaml-source">{</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-source">name</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source"> </span><span class="ocaml-support-type">string</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-source">description</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source"> </span><span class="ocaml-support-type">string</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-source">documentation</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source"> </span><span class="ocaml-support-type">string</span><span class="ocaml-source">
</span><span class="ocaml-source">}</span><span class="ocaml-source">
</span></code></pre>
<p>So basically our command will receive a cursor position as input and either return some information (name, description, and a documentation link) or say no documentation has been found.</p>
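<p>To see how a query constructor can fix the type of its own answer, here is a small self-contained sketch. It is not Merlin's code: <code>position</code> is a reduced stand-in for <code>Msource.position</code>, <code>dispatch</code> is a toy function that pretends only line 1 has documentation, and the URL is a placeholder.</p>

```ocaml
(* Reduced, hypothetical stand-in for Msource.position. *)
type position = [ `Logical of int * int ]

type syntax_doc_result = {
  name : string;
  description : string;
  documentation : string;
}

(* A query type indexed by its result type, in the style of the excerpt. *)
type _ t =
  | Syntax_document :
      position -> [ `Found of syntax_doc_result | `No_documentation ] t

(* Toy dispatcher: the type index tells the compiler what each query returns.
   It pretends that documentation exists only for positions on line 1. *)
let dispatch : type a. a t -> a = function
  | Syntax_document (`Logical (line, _col)) ->
    if line = 1 then
      `Found
        { name = "Type Variant";
          description = "Data that may take on multiple different forms.";
          documentation = "https://example.org/placeholder" }
    else `No_documentation
```

<p>Because the result type is part of the constructor's type, a caller that sends <code>Syntax_document</code> only ever has to handle <code>`Found</code> and <code>`No_documentation</code>.</p>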
<ul>
<li><a href="https://github.com/ocaml/merlin/commit/0f64255167b63d8eab606419693ac2ca83d132f0#diff-b8705d092b3c3b7c1194dc783a4e75b0041b663b890c31defe9af4d23277c62eR115-R116">query_json.ml</a>: In this file, we write the code for how Merlin should format the response it sends to other editor plugins, such as Vim and Emacs.</li>
</ul>
<pre><code><span class="ocaml-keyword-other">|</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Syntax_document</span><span class="ocaml-source"> </span><span class="ocaml-source">pos</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-source">mk</span><span class="ocaml-source"> </span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-string-quoted-double">syntax-document</span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-source"> </span><span class="ocaml-source">[</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-string-quoted-double">position</span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-keyword-other-ocaml punctuation-comma punctuation-separator">,</span><span class="ocaml-source"> </span><span class="ocaml-source">mk_position</span><span class="ocaml-source"> </span><span class="ocaml-source">pos</span><span class="ocaml-source">)</span><span class="ocaml-source"> </span><span class="ocaml-source">]</span><span class="ocaml-source">
</span></code></pre>
<p>Here, <code>mk_position</code> is a function that takes our cursor position and serialises it for debugging purposes.</p>
<pre><code><span class="ocaml-keyword-other">|</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Syntax_document</span><span class="ocaml-source"> </span><span class="ocaml-constant-language">_</span><span class="ocaml-keyword-other-ocaml punctuation-comma punctuation-separator">,</span><span class="ocaml-source"> </span><span class="ocaml-source">response</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-keyword-other">match</span><span class="ocaml-source"> </span><span class="ocaml-source">response</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">with</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-keyword-other">|</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-polymorphic-variant">`Found</span><span class="ocaml-source"> </span><span class="ocaml-source">info</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source">
</span><span class="ocaml-source">            </span><span class="ocaml-constant-language-polymorphic-variant">`Assoc</span><span class="ocaml-source">
</span><span class="ocaml-source">            </span><span class="ocaml-source">[</span><span class="ocaml-source">
</span><span class="ocaml-source">                </span><span class="ocaml-source">(</span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-string-quoted-double">name</span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-keyword-other-ocaml punctuation-comma punctuation-separator">,</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-polymorphic-variant">`String</span><span class="ocaml-source"> </span><span class="ocaml-source">info</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">name</span><span class="ocaml-source">)</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source">
</span><span class="ocaml-source">                </span><span class="ocaml-source">(</span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-string-quoted-double">description</span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-keyword-other-ocaml punctuation-comma punctuation-separator">,</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-polymorphic-variant">`String</span><span class="ocaml-source"> </span><span class="ocaml-source">info</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">description</span><span class="ocaml-source">)</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source">
</span><span class="ocaml-source">                </span><span class="ocaml-source">(</span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-string-quoted-double">url</span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-keyword-other-ocaml punctuation-comma punctuation-separator">,</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-polymorphic-variant">`String</span><span class="ocaml-source"> </span><span class="ocaml-source">info</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">documentation</span><span class="ocaml-source">)</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source">
</span><span class="ocaml-source">            </span><span class="ocaml-source">]</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-keyword-other">|</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-polymorphic-variant">`No_documentation</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-polymorphic-variant">`String</span><span class="ocaml-source"> </span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-string-quoted-double">No documentation found</span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-source">)</span><span class="ocaml-source">
</span></code></pre>
<p>This code serialises the output into JSON format. It is also used by the different editor plugins.</p>
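<p>If you want to play with this serialisation step outside of Merlin, the sketch below reproduces its shape with a tiny hand-written JSON type. The <code>json</code> type, <code>json_of_response</code>, and <code>to_string</code> are assumptions made for illustration. The printer uses OCaml's <code>%S</code> format as a rough approximation of JSON string escaping, so it is not a complete JSON encoder.</p>

```ocaml
type syntax_doc_result = {
  name : string;
  description : string;
  documentation : string;
}

(* A tiny JSON value type using the same constructor names as the excerpt. *)
type json = [ `String of string | `Assoc of (string * json) list ]

(* Build the JSON value for a response, following the excerpt's field names. *)
let json_of_response
    (response : [ `Found of syntax_doc_result | `No_documentation ]) : json =
  match response with
  | `Found info ->
    `Assoc
      [ ("name", `String info.name);
        ("description", `String info.description);
        ("url", `String info.documentation) ]
  | `No_documentation -> `String "No documentation found"

(* Minimal printer; %S only approximates JSON string escaping. *)
let rec to_string : json -> string = function
  | `String s -> Printf.sprintf "%S" s
  | `Assoc fields ->
    let field (key, value) = Printf.sprintf "%S: %s" key (to_string value) in
    "{ " ^ String.concat ", " (List.map field fields) ^ " }"
```

<p>Note that the record field <code>documentation</code> is exposed under the key <code>url</code>, just as in the excerpt above.</p>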
<p>Great! Now that we have seen how Merlin handles our inputs and outputs, it's time to understand how we convert this input into the output.</p>
<h4>2. Command Logic</h4>
<p>The implementation of our logic is found in the file <a href="https://github.com/ocaml/merlin/commit/0f64255167b63d8eab606419693ac2ca83d132f0#diff-8009410bbeda8c44724e5b305d6d128561d75292378904d5d82f448cbee7f801R506-R514s">query_commands.ml</a>.</p>
<p>Before diving deep, let's understand a few concepts that we'll use in our implementation.</p>
<ul>
<li>
<p><strong>Source Code</strong>: refers to the OCaml code written in a text editor</p>
</li>
<li>
<p><strong>Parsetree</strong>: a more detailed internal representation of the source code</p>
</li>
<li>
<p><strong>Typedtree</strong>: an enhanced version of the Parsetree where type information is attached to each node</p>
</li>
</ul>
<p>Basically, the source code is the starting point. The parser then takes this source code and builds a Parsetree representing its syntactic structure. The type checker analyses this Parsetree and assigns types to its elements, generating a Typedtree.</p>
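<p>The relationship between the two trees is easier to see on a toy language. The sketch below is purely illustrative and far simpler than OCaml's real <code>Parsetree</code> and <code>Typedtree</code>. All of the type and function names in it are made up for this example.</p>

```ocaml
(* Toy "Parsetree": syntactic structure only, no types. *)
type parsed =
  | Int of int
  | Bool of bool
  | Add of parsed * parsed
  | If of parsed * parsed * parsed

type ty = TInt | TBool

(* Toy "Typedtree": the same shape, with a type attached to every node. *)
type typed = { desc : typed_desc; ty : ty }
and typed_desc =
  | TInt_lit of int
  | TBool_lit of bool
  | TAdd of typed * typed
  | TIf of typed * typed * typed

(* Toy type checker: walks the parsed tree and annotates each node. *)
let rec type_check (e : parsed) : typed =
  match e with
  | Int n -> { desc = TInt_lit n; ty = TInt }
  | Bool b -> { desc = TBool_lit b; ty = TBool }
  | Add (a, b) ->
    let a = type_check a and b = type_check b in
    (match a.ty, b.ty with
     | TInt, TInt -> { desc = TAdd (a, b); ty = TInt }
     | _ -> failwith "type error: + expects two ints")
  | If (c, t, e) ->
    let c = type_check c and t = type_check t and e = type_check e in
    (match c.ty with
     | TBool when t.ty = e.ty -> { desc = TIf (c, t, e); ty = t.ty }
     | _ -> failwith "type error: ill-typed conditional")
```

<p>For instance, <code>type_check (Add (Int 1, Int 2))</code> yields a node whose <code>ty</code> field is <code>TInt</code>, while <code>type_check (Add (Int 1, Bool true))</code> raises <code>Failure</code>.</p>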
<p>The Merlin engine already provides a lot of utilities that we can use to achieve all of this, such as:</p>
<ul>
<li><code>Mpipeline</code>: This is the core pipeline of Merlin's analysis engine. It handles the various stages of processing OCaml code, such as lexing and parsing.</li>
<li><code>Mtyper</code>: This module provides us with utilities for interacting with the Typedtree.</li>
<li><code>Mbrowse</code>: This module provides us with utilities for navigating and manipulating the nodes of the Typedtree.</li>
</ul>
<p>Our implementation code is:</p>
<pre><code><span class="ocaml-keyword-other">|</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Syntax_document</span><span class="ocaml-source"> </span><span class="ocaml-source">pos</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">typer</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Mpipeline</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">typer_result</span><span class="ocaml-source"> </span><span class="ocaml-source">pipeline</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">in</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">pos</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Mpipeline</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">get_lexing_pos</span><span class="ocaml-source"> </span><span class="ocaml-source">pipeline</span><span class="ocaml-source"> </span><span class="ocaml-source">pos</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">in</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">node</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Mtyper</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">node_at</span><span class="ocaml-source"> </span><span class="ocaml-source">typer</span><span class="ocaml-source"> </span><span class="ocaml-source">pos</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">in</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">res</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Syntax_doc</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">get_syntax_doc</span><span class="ocaml-source"> </span><span class="ocaml-source">pos</span><span class="ocaml-source"> </span><span class="ocaml-source">node</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">in</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-source">(</span><span class="ocaml-keyword-other">match</span><span class="ocaml-source"> </span><span class="ocaml-source">res</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">with</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-keyword-other">|</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Some</span><span class="ocaml-source"> </span><span class="ocaml-source">res</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-polymorphic-variant">`Found</span><span class="ocaml-source"> </span><span class="ocaml-source">res</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-keyword-other">|</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">None</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-polymorphic-variant">`No_documentation</span><span class="ocaml-source">)</span><span class="ocaml-source">
</span></code></pre>
<p>We are using the <code>Mpipeline</code> module to get the Typedtree and the lexing position (our cursor's position in the source code). Once this position is found, we use the <code>Mtyper</code> module to grab the specific node found at this cursor position in the Typedtree. <code>Mtyper</code> uses the <code>Mbrowse</code> module to navigate through the Typedtree until it arrives at the node that has the same position as our lexing position (cursor position).</p>
<p>Example: Let's consider a simple variant type:</p>
<pre><code><span class="ocaml-keyword-other">type</span><span class="ocaml-source"> </span><span class="ocaml-source">color</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Red</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">|</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Green</span><span class="ocaml-source">
</span></code></pre>
<p>The node tree, listed from the innermost node outwards to the root, will be:</p>
<pre><code>[ type_kind; type_declaration; structure_item; structure ]
</code></pre>
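<p>To make the idea of walking down to the node under the cursor more concrete, here is a toy version of that search. It is a sketch under assumptions, not how <code>Mbrowse</code> is implemented: the <code>node</code> record, the character offsets, and <code>path_at</code> are all invented for illustration.</p>

```ocaml
(* A toy tree: each node has a label, a character span, and children. *)
type node = { label : string; start : int; stop : int; children : node list }

(* True when the cursor offset lies in the half-open span [start, stop). *)
let contains n pos = if pos >= n.start then n.stop > pos else false

(* Return the labels of the enclosing nodes, innermost first, root last. *)
let path_at root pos =
  let rec go acc n =
    if not (contains n pos) then None
    else
      let acc = n.label :: acc in
      match List.find_map (go acc) n.children with
      | Some path -> Some path
      | None -> Some acc
  in
  match go [] root with
  | Some path -> path
  | None -> []

(* Hand-written spans for the text: type color = Red | Green *)
let example =
  let kind = { label = "type_kind"; start = 13; stop = 24; children = [] } in
  let decl =
    { label = "type_declaration"; start = 0; stop = 24; children = [ kind ] } in
  let item =
    { label = "structure_item"; start = 0; stop = 24; children = [ decl ] } in
  { label = "structure"; start = 0; stop = 24; children = [ item ] }
```

<p>With the cursor on the constructors, <code>path_at example 15</code> returns <code>["type_kind"; "type_declaration"; "structure_item"; "structure"]</code>, which has the same shape as the node tree shown above. With the cursor on the <code>type</code> keyword, <code>path_at example 2</code> stops at <code>type_declaration</code>.</p>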
<p>Once we have received this node tree, we pass it to our custom module <a href="https://github.com/ocaml/merlin/commit/0f64255167b63d8eab606419693ac2ca83d132f0#diff-2067d79c6bb02a49ccb063ad859e08560060013395b911988a2cf856af1a526b">Syntax_doc.ml</a> and call the function <code>get_syntax_doc</code> within it. This pattern-matches the node tree and extracts the relevant information or returns no information. An excerpt from our custom module is presented below:</p>
<pre><code><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">
</span><span class="ocaml-source">
</span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">get_syntax_doc</span><span class="ocaml-source"> </span><span class="ocaml-source">cursor_loc</span><span class="ocaml-source"> </span><span class="ocaml-source">node</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-keyword-other">match</span><span class="ocaml-source"> </span><span class="ocaml-source">node</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">with</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-keyword-other">|</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-constant-language">_</span><span class="ocaml-keyword-other-ocaml punctuation-comma punctuation-separator">,</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Type_kind</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Ttype_variant</span><span class="ocaml-source"> </span><span class="ocaml-constant-language">_</span><span class="ocaml-source">)</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-constant-language">_</span><span class="ocaml-keyword-other-ocaml punctuation-comma punctuation-separator">,</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Type_declaration</span><span class="ocaml-source"> </span><span class="ocaml-constant-language">_</span><span class="ocaml-source">)</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source"> </span><span class="ocaml-constant-language">_</span><span class="ocaml-source">
</span><span class="ocaml-source">        </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source">
</span><span class="ocaml-source">        </span><span class="ocaml-constant-language-capital-identifier">Some</span><span class="ocaml-source"> </span><span class="ocaml-source">{</span><span class="ocaml-source">
</span><span class="ocaml-source">                </span><span class="ocaml-source">name</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-string-quoted-double">Type Variant</span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source">
</span><span class="ocaml-source">                </span><span class="ocaml-source">description</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-string-quoted-double">Represent's data that may take on multiple different forms.</span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source">
</span><span class="ocaml-source">                </span><span class="ocaml-source">documentation</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-string-quoted-double">https://v2.ocaml.../typev.html</span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source">
</span><span class="ocaml-source">            </span><span class="ocaml-source">}</span><span class="ocaml-source">
</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-keyword-other">|</span><span class="ocaml-source"> </span><span class="ocaml-constant-language">_</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">None</span><span class="ocaml-source">
</span></code></pre>
<p>The output returned conforms to <code>type syntax_info = Query_protocol.syntax_doc_result</code>, which is defined in <a href="https://github.com/ocaml/merlin/commit/0f64255167b63d8eab606419693ac2ca83d132f0#diff-4dee2c70efab5997f53cb009604f12caa24a233f38889b3e0b622982c2cfa281R99-R105">query_protocol.ml</a>.</p>
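<p>To make the shape of this match easier to see in isolation, here is a minimal, self-contained sketch. The <code>node</code> type, the <code>syntax_doc_of_path</code> name, and the placeholder strings are simplified assumptions for illustration only; Merlin's real nodes carry typed-tree payloads and the real record lives in <code>Query_protocol</code>.</p>
<pre><code class="language-ocaml">(* Hypothetical, simplified stand-ins for Merlin's typed-tree nodes. *)
type node =
  | Type_kind_variant
  | Type_kind_record
  | Type_declaration
  | Structure_item

(* Mirrors the three fields used in the snippet above. *)
type syntax_info = {
  name : string;
  description : string;
  documentation : string;
}

(* The path lists nodes from the one under the cursor outwards. *)
let syntax_doc_of_path (path : node list) : syntax_info option =
  match path with
  | Type_kind_variant :: Type_declaration :: _ -&gt;
      Some
        {
          name = "Type Variant";
          description = "(shortened)";
          documentation = "(elided)";
        }
  | Type_kind_record :: Type_declaration :: _ -&gt;
      Some
        {
          name = "Record types";
          description = "(shortened)";
          documentation = "(elided)";
        }
  | _ -&gt; None

let () =
  match syntax_doc_of_path [ Type_kind_record; Type_declaration; Structure_item ] with
  | Some info -&gt; print_endline info.name
  | None -&gt; print_endline "no documentation"
</code></pre>
<p>Matching on the first two elements of the path is what lets one function tell a record definition from a variant definition, and anything it does not recognise falls through to <code>None</code>.</p>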
<h3>Results and Tests</h3>
<p>Once Merlin has run this logic, it returns a result. To check that our code behaves as expected, we use the Cram testing framework to test its functionality.
For example, say we write the following source code in a file called <code>main.ml</code>:</p>
<pre><code><span class="ocaml-keyword-other">type</span><span class="ocaml-source"> </span><span class="ocaml-source">rectangle</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-source">{</span><span class="ocaml-source"> </span><span class="ocaml-source">length</span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source"> </span><span class="ocaml-support-type">int</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source"> </span><span class="ocaml-source">width</span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source"> </span><span class="ocaml-support-type">int</span><span class="ocaml-source">}</span><span class="ocaml-source">
</span></code></pre>
<p>We can use Cram to call Merlin and ask it to get us some information:</p>
<pre><code><span class="sh-punctuation-definition-variable">$</span><span class="sh-variable-other-normal">MERLIN</span><span class="sh-source"> single syntax-document -position 1:12 -filename ./main.ml </span><span class="sh-keyword-operator-redirect">&lt;</span><span class="sh-source"> ./main.ml
</span></code></pre>
<p>Here we are telling Merlin to give us information for the cursor position <code>1:12</code>, which means line 1, column 12 (start from the first character of line 1 and count 12 characters). This places our cursor over the word <code>rectangle</code>.</p>
<p>When we run this test, the result returned will be:</p>
<pre><code class="language-json">{
   "name": "Record types",
   "description": "Allows you to define variants with a fixed...",
   "url": "https://v2.ocaml....riants.html"
}
</code></pre>
<h3>Conclusion</h3>
<p>We have finally come to the end of the first part of this article series. In this part, we explored the problem we are trying to solve, hypothesised about a possible solution, and then implemented the solution. This implementation is like the engine/backend of our solution.</p>
<p>It was a very fun and exciting experience working on this command. I got to learn a lot about OCaml, especially how Merlin works internally. In Part 2 of this series, we'll look at how this new functionality has been integrated to work with various editors such as Vim, Emacs, and VSCode.</p>
<p>Here are some lessons I learned on this journey:</p>
<ul>
<li>An idiomatic approach is better. Coming from a non-functional background, I am used to writing code a certain non-functional way and wasn't yet baptised in the functional programming waters. I would frequently write these long <code>if-else</code> statements instead of just using pattern matching, the OCaml way. Once I understood it, it became like magic dust for me. I used it very much.</li>
<li>Read the documentation. When working with a new project, the best thing you can do is spend some time reading through the documentation and looking at how the code is written. Most times, something you may be struggling to write may have already been implemented, so you just have to use it. Stop reinventing the wheel.</li>
<li>Ask questions. I can't count how many times I got stuck and had to scream for help, like a kid in a candy shop who lost his toy. I have amazing mentors who are always willing to point me in the right direction. This support literally feels like wielding Thanos' gauntlet!</li>
</ul>
]]></description><link>https://tarides.com/blog/2024-04-17-creating-the-syntaxdocumentation-command-part-1-merlin</link><guid isPermaLink="false">https://tarides.com/blog/2024-04-17-creating-the-syntaxdocumentation-command-part-1-merlin.html</guid><dc:creator><![CDATA[ Pizie Dust ]]></dc:creator><pubDate>Wed, 17 Apr 2024 00:00:00 GMT</pubDate></item><item><title><![CDATA[Multicore Testing Tools: DSCheck Pt 2]]></title><description><![CDATA[<p>Welcome to part two! If you haven't already, check out <a href="/blog/2024-02-14-multicore-testing-tools-dscheck-pt-1/">part one</a>, where we introduce DSCheck and share one of its uses in a naive counter implementation. This post will give you a behind-the-scenes look at how DSCheck works its magic, including the theory behind it and how to write a test for our naive counter implementation example. We’ll conclude by going a bit further, showing you how DSCheck can be used to check otherwise hard-to-prove properties in the <a href="https://github.com/ocaml-multicore/saturn">Saturn</a> library.</p>
<h2>How Does DSCheck Work?</h2>
<p>Developers use DSCheck to catch non-deterministic, hard-to-reproduce bugs in their multithreaded programs. DSCheck does so by ensuring that all the executions possible on the multiple cores (called interleavings) are valid and do not result in faults. Doing this without a designated tool like DSCheck would be incredibly resource-intensive.</p>
<h3>In Theory</h3>
<p>DSCheck operates by simulating parallelism on a single core, which is possible thanks to <a href="https://overreacted.io/algebraic-effects-for-the-rest-of-us/">algebraic effects</a> and a custom scheduler. DSCheck doesn't actually exhaustively 'check' all interleavings but examines a select number of relevant ones that allow it to ensure that all terminal states are valid and that no edge cases have been missed.</p>
<p>You may reasonably be asking yourself how this works. Well, the emergence of <a href="https://users.soe.ucsc.edu/~cormac/papers/popl05.pdf">Dynamic Partial-Order Reduction</a> (DPOR) methods has made DSCheck-like model checkers possible. The DPOR approach to model-checking stems from the observation that, in real-world programs, many interleavings are equivalent. If at least one of them is covered, so is its entire equivalence class, which is called a trace. DSCheck, therefore, checks one interleaving per trace, which lets it ensure that the whole trace is without faults.</p>
<p>Let's illustrate this with a straightforward example:</p>
<pre><code><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">a</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Atomic</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">make</span><span class="ocaml-source"> </span><span class="ocaml-constant-numeric-decimal-integer">0</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">in</span><span class="ocaml-source">
</span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">b</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Atomic</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">make</span><span class="ocaml-source"> </span><span class="ocaml-constant-numeric-decimal-integer">0</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">in</span><span class="ocaml-source">
</span><span class="ocaml-comment-block">(*</span><span class="ocaml-comment-block"> Domain A </span><span class="ocaml-comment-block">*)</span><span class="ocaml-source">
</span><span class="ocaml-constant-language-capital-identifier">Domain</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">spawn</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-keyword-other">fun</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-unit">()</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-constant-language-capital-identifier">Atomic</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">set</span><span class="ocaml-source"> </span><span class="ocaml-source">a</span><span class="ocaml-source"> </span><span class="ocaml-constant-numeric-decimal-integer">1</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source"> </span><span class="ocaml-comment-block">(*</span><span class="ocaml-comment-block"> A1 </span><span class="ocaml-comment-block">*)</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-constant-language-capital-identifier">Atomic</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">set</span><span class="ocaml-source"> </span><span class="ocaml-source">b</span><span class="ocaml-source"> </span><span class="ocaml-constant-numeric-decimal-integer">1</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source"> </span><span class="ocaml-comment-block">(*</span><span class="ocaml-comment-block"> A2 </span><span class="ocaml-comment-block">*)</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-source">)</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">|&gt;</span><span class="ocaml-source"> </span><span class="ocaml-source">ignore</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source">
</span><span class="ocaml-comment-block">(*</span><span class="ocaml-comment-block"> Domain B </span><span class="ocaml-comment-block">*)</span><span class="ocaml-source">
</span><span class="ocaml-constant-language-capital-identifier">Domain</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">spawn</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-keyword-other">fun</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-unit">()</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-constant-language-capital-identifier">Atomic</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">set</span><span class="ocaml-source"> </span><span class="ocaml-source">a</span><span class="ocaml-source"> </span><span class="ocaml-constant-numeric-decimal-integer">2</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source"> </span><span class="ocaml-comment-block">(*</span><span class="ocaml-comment-block"> B </span><span class="ocaml-comment-block">*)</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-source">)</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">|&gt;</span><span class="ocaml-source"> </span><span class="ocaml-source">ignore</span><span class="ocaml-source">
</span></code></pre>
<p>There are three possible interleavings: A1.A2.B, A1.B.A2, and B.A1.A2. The ordering between B and the second step of the first domain, A2, does not matter as it does not affect the same variable. Thus, the execution sequences A1.A2.B and A1.B.A2 are different interleavings of the same trace, which means that if at least one is covered, so is the other.</p>
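<p>We can check this claim by brute force. The sketch below is our own illustration, not DSCheck code: it models each atomic write as a small record (the <code>step</code> type and the helper names are assumptions), enumerates every interleaving that preserves each domain's program order, and replays each one to get the final values of <code>a</code> and <code>b</code>.</p>
<pre><code class="language-ocaml">type var = Var_a | Var_b

(* One atomic write: which variable it targets and the value it stores. *)
type step = { label : string; var : var; value : int }

let step_a1 = { label = "A1"; var = Var_a; value = 1 }
let step_a2 = { label = "A2"; var = Var_b; value = 1 }
let step_b = { label = "B"; var = Var_a; value = 2 }

(* All interleavings of two step lists that keep each list's own order. *)
let rec interleavings xs ys =
  match (xs, ys) with
  | [], rest | rest, [] -&gt; [ rest ]
  | x :: xs', y :: ys' -&gt;
      List.map (fun tl -&gt; x :: tl) (interleavings xs' ys)
      @ List.map (fun tl -&gt; y :: tl) (interleavings xs ys')

(* Replay a schedule and return the final values of (a, b). *)
let run schedule =
  List.fold_left
    (fun (a, b) step -&gt;
      match step.var with
      | Var_a -&gt; (step.value, b)
      | Var_b -&gt; (a, step.value))
    (0, 0) schedule

let () =
  interleavings [ step_a1; step_a2 ] [ step_b ]
  |&gt; List.iter (fun schedule -&gt;
         let a, b = run schedule in
         let name = String.concat "." (List.map (fun s -&gt; s.label) schedule) in
         Printf.printf "%s: a = %d, b = %d\n" name a b)
</code></pre>
<p>A1.A2.B and A1.B.A2 both end with <code>a = 2</code> and <code>b = 1</code>, while B.A1.A2 ends with <code>a = 1</code> and <code>b = 1</code>. Only the relative order of the two writes to <code>a</code> changes the outcome, which is why the first two schedules belong to the same trace.</p>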
<p>DPOR skips the redundant execution sequences and provides an exponential performance improvement over an exhaustive (also called naive) search. Since naive model checkers try to explore every single interleaving, and since interleavings grow exponentially with the size of code, there quickly comes a point where the number of interleavings far exceeds what the model checker can cover in a reasonable amount of time. This approach is inefficient to the degree that the only programs a naive model checker can check are so simple that it's almost useless to do so.</p>
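<p>To get a feel for that growth, consider the simplest case: two domains running <code>n</code> and <code>m</code> atomic steps can be interleaved in C(n+m, n) ways. The helper below (our own name, not part of DSCheck) computes that binomial coefficient as a rough illustration of what an exhaustive search is up against.</p>
<pre><code class="language-ocaml">(* Number of interleavings of two domains with n and m atomic steps: C(n+m, n).
   Each intermediate value is itself a binomial coefficient, so the integer
   division is always exact. *)
let interleaving_count n m =
  let rec go acc i =
    if i &gt; n then acc else go (acc * (m + i) / i) (i + 1)
  in
  go 1 1

let () =
  List.iter
    (fun n -&gt;
      Printf.printf "%2d steps per domain: %d interleavings\n" n
        (interleaving_count n n))
    [ 1; 2; 5; 10; 20 ]
</code></pre>
<p>With two steps in one domain and one in the other, the function returns 3, matching the example above. At 10 steps per domain there are already 184,756 interleavings, and at 20 steps per domain there are more than 137 billion.</p>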
<p>By reducing the number of interleavings that need to be checked, we have significantly expanded the universe of checkable programs. DSCheck has reached a point where its performance is strong enough to test relatively long code, and most significantly, we can use it to test data structure implementations.</p>
<p>In addition to the DPOR approach, some conditions must be met for the validations that DSCheck performs to be sound. These conditions include:</p>
<ul>
<li><strong>Determinism:</strong> DSCheck runs the same program multiple times (once per explored interleaving). There should be no randomness in between these executions, meaning that they should all run with the same seed, since otherwise DSCheck may miss some traces and thus miss bugs.</li>
<li><strong>Data-Races:</strong> The program being tested cannot have data races between non-atomic variables, as DSCheck does not see such different behaviours. You should use <a href="https://github.com/ocaml-multicore/ocaml-tsan">TSan</a> before running DSCheck to remove data races.</li>
<li><strong>Atomics:</strong> Domains can only communicate through atomic variables. Validation of programs that use higher-level synchronisation primitives (like mutexes) has not yet been implemented.</li>
<li><strong>Lock-Free:</strong> Programs being tested have to be at least lock-free. Lock-free programs are programs in which multiple threads share access to memory and none of the threads can block the others. If a program has a thread that cannot finish independently, DSCheck will explore its transitions ad infinitum. To partially mitigate this problem, we can force tests to be lock-free. For example, we can modify a spinlock to explicitly fail once it reaches an artificial limit.</li>
</ul>
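<p>As an illustration of the last point, here is one way a spinlock could be given an artificial limit. This is only a sketch: the <code>max_spins</code> bound, the exception, and the function names are assumptions of ours, and it uses the standard library's <code>Atomic</code> module so that it runs on its own.</p>
<pre><code class="language-ocaml">exception Spin_limit_reached

(* Arbitrary bound chosen for illustration. *)
let max_spins = 1_000

(* Try to take the lock, giving up after [max_spins] failed attempts
   instead of spinning forever. *)
let acquire lock =
  let rec loop attempts =
    if attempts &gt;= max_spins then raise Spin_limit_reached
    else if Atomic.compare_and_set lock false true then ()
    else loop (attempts + 1)
  in
  loop 0

let release lock = Atomic.set lock false

let () =
  let lock = Atomic.make false in
  acquire lock;
  (* A second acquire on a held lock now fails instead of looping forever. *)
  (match acquire lock with
  | () -&gt; print_endline "unexpectedly acquired"
  | exception Spin_limit_reached -&gt; print_endline "gave up after the limit");
  release lock
</code></pre>
<p>Because every call to <code>acquire</code> now terminates, a model checker only ever sees a finite number of transitions for it. The trade-off is that the test exercises the bounded variant rather than the real lock, which is why this is only a partial mitigation.</p>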
<h3>In Practice</h3>
<p>Let's look at how a DSCheck test can catch a bug in a naive counter implementation. To see how we set up the naive counter implementation in this example, have a look <a href="/blog/2024-02-14-multicore-testing-tools-dscheck-pt-1/">at part one</a> of our two-part DSCheck series.</p>
<p>This is how to write a test for the previous naive counter module we introduced in part one:</p>
<pre><code><span class="ocaml-keyword-other">module</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Atomic</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Dscheck</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-constant-language-capital-identifier">TracedAtomic</span><span class="ocaml-source">
</span><span class="ocaml-comment-block">(*</span><span class="ocaml-comment-block"> The test needs to use DSCheck's atomic module </span><span class="ocaml-comment-block">*)</span><span class="ocaml-source">
</span><span class="ocaml-source">
</span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">test_counter</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-unit">()</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-constant-language-capital-identifier">Atomic</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">trace</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-keyword-other">fun</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-unit">()</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source">
</span><span class="ocaml-source">      </span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">counter</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Counter</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">create</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-unit">()</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">in</span><span class="ocaml-source">
</span><span class="ocaml-source">      </span><span class="ocaml-constant-language-capital-identifier">Atomic</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">spawn</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-keyword-other">fun</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-unit">()</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Counter</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">incr</span><span class="ocaml-source"> </span><span class="ocaml-source">counter</span><span class="ocaml-source">)</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source">
</span><span class="ocaml-source">      </span><span class="ocaml-comment-block">(*</span><span class="ocaml-comment-block"> [Atomic.spawn] is the DSCheck function to simulate [Domain.spawn] </span><span class="ocaml-comment-block">*)</span><span class="ocaml-source">
</span><span class="ocaml-source">      </span><span class="ocaml-constant-language-capital-identifier">Atomic</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">spawn</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-keyword-other">fun</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-unit">()</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Counter</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">incr</span><span class="ocaml-source"> </span><span class="ocaml-source">counter</span><span class="ocaml-source">)</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source">
</span><span class="ocaml-source">      </span><span class="ocaml-comment-block">(*</span><span class="ocaml-comment-block"> There is no need to join domains as DSCheck does not actually spawn domains. </span><span class="ocaml-comment-block">*)</span><span class="ocaml-source">
</span><span class="ocaml-source">      </span><span class="ocaml-constant-language-capital-identifier">Atomic</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">final</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-keyword-other">fun</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-unit">()</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Atomic</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">check</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-keyword-other">fun</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-unit">()</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Counter</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">get</span><span class="ocaml-source"> </span><span class="ocaml-source">counter</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-constant-numeric-decimal-integer">2</span><span class="ocaml-source">)</span><span class="ocaml-source">)</span><span class="ocaml-source">)</span><span class="ocaml-source">
</span></code></pre>
<p>As you can tell, the test is very similar to the <a href="/blog/2024-02-14-multicore-testing-tools-dscheck-pt-1/"><code>main</code> function we wrote previously</a> to check our counter manually, but now it uses DSCheck's interface. This includes:</p>
<ul>
<li>Shadowing the atomic module with DSCheck's <code>TracedAtomic</code>, which adds the algebraic effects we need to compute the interleavings.</li>
<li>Wrapping the code whose interleavings we want to test in <code>Atomic.trace</code>.</li>
<li>Using <code>Atomic.spawn</code> to simulate <code>Domain.spawn</code>.</li>
</ul>
<p>In this case, DSCheck will return the following output:</p>
<pre><code>Found assertion violation at run 2:
sequence 2
----------------------------------------
P0 P1
----------------------------------------
start
get a
                        start
                        get a
set a
                        set a
----------------------------------------
</code></pre>
<p>This output reveals the buggy interleaving with one column per domain (<code>P0</code> and <code>P1</code>). We need to infer for ourselves that <code>a</code> means <code>counter</code> here, but once we know that, the output is pretty straightforward to read, isn't it?</p>
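<p>To see why this sequence breaks the assertion, we can replay it by hand on a single core. The sketch below assumes the naive counter increments with a separate <code>get</code> followed by a <code>set</code>, as the trace suggests; the function name is ours, and the code uses the standard <code>Atomic</code> module rather than DSCheck.</p>
<pre><code class="language-ocaml">(* Replays the reported sequence: both domains read before either writes. *)
let replay_buggy_interleaving () =
  let counter = Atomic.make 0 in
  let seen_by_p0 = Atomic.get counter in (* P0: get a *)
  let seen_by_p1 = Atomic.get counter in (* P1: get a *)
  Atomic.set counter (seen_by_p0 + 1); (* P0: set a *)
  Atomic.set counter (seen_by_p1 + 1); (* P1: set a *)
  Atomic.get counter

let () =
  Printf.printf "final value: %d\n" (replay_buggy_interleaving ())
</code></pre>
<p>Both domains read 0, so both write 1, and one increment is lost: the counter ends at 1 instead of 2. Each individual operation is atomic, but the read-then-write pair is not, which is exactly the gap that a single atomic update such as the standard library's <code>Atomic.incr</code> closes.</p>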
<h2>Case Study: Saturn</h2>
<p>Let's take a closer look at using DSCheck with <a href="https://github.com/ocaml-multicore/saturn">Saturn</a>. Offering industrial-strength, well-tested data structures for OCaml 5, the library makes it easier for Multicore users to find data structures that fit their needs. If you use a data structure from Saturn, you can be sure it has been tested to perform well with Multicore usage. In Saturn, DSCheck has two primary uses: firstly, the one demonstrated above, i.e. catching interleavings that return buggy results; secondly, we use it to detect blocking situations.</p>
<p>Most data structures available through Saturn need to be lock-free. As part of the lock-free property’s definition, the structure also needs to be obstruction-free, which technically means that a domain running in isolation can always make progress or be free of blocking situations. So, if all domains bar one are paused partway through their execution, the one still working can finish without issue or being blocked. The most common blocking situation is due to a lock; if one domain acquires a lock, all the other domains must wait until the first one has released the lock to proceed.</p>
<p>Let's take a look at how DSCheck tests for blocking situations:</p>
<p>Here is an example of code that is <strong>not</strong> obstruction-free. This is a straightforward implementation of a barrier for two domains. Both domains need to increment it before either can get past the loop.</p>
<pre><code><span class="ocaml-source">  </span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">barrier</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Atomic</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">make</span><span class="ocaml-source"> </span><span class="ocaml-constant-numeric-decimal-integer">0</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">in</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">work</span><span class="ocaml-source"> </span><span class="ocaml-source">id</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-unit">()</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-source">print_endline</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-string-quoted-double">Hello world, I'm </span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-keyword-operator">^</span><span class="ocaml-source"> </span><span class="ocaml-source">id</span><span class="ocaml-source">)</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-constant-language-capital-identifier">Atomic</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">incr</span><span class="ocaml-source"> </span><span class="ocaml-source">barrier</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-keyword-other">while</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Atomic</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">get</span><span class="ocaml-source"> </span><span class="ocaml-source">barrier</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">&lt;</span><span class="ocaml-source"> </span><span class="ocaml-constant-numeric-decimal-integer">2</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">do</span><span class="ocaml-source">
</span><span class="ocaml-source">      </span><span class="ocaml-constant-language-unit">()</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-keyword-other">done</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword-other">in</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">domainA</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Domain</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">spawn</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-source">work</span><span class="ocaml-source"> </span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-string-quoted-double">A</span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-source">)</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">in</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">domainB</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Domain</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">spawn</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-source">work</span><span class="ocaml-source"> </span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-string-quoted-double">B</span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-source">)</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">in</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-constant-language-capital-identifier">Domain</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">join</span><span class="ocaml-source"> </span><span class="ocaml-source">domainA</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-constant-language-capital-identifier">Domain</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">join</span><span class="ocaml-source"> </span><span class="ocaml-source">domainB</span><span class="ocaml-source">
</span></code></pre>
<p>In this example, if B is paused by the operating system after printing <code>"Hello world, I'm B"</code>, then A cannot progress past the barrier even though it is the only domain currently working. This code is thus not obstruction-free.</p>
<p>If we run this code through DSCheck, here is one interleaving it will explore.</p>
<div role="region"><table>
<tbody><tr>
<th>Step</th>
<th>Domain A</th>
<th>Domain B</th>
<th>Barrier</th>
</tr>
<tr>
<td>1</td>
<td></td>
<td>prints "Hello world, I'm B!"</td>
<td>0</td>
</tr>
<tr>
<td>2</td>
<td>prints "Hello world, I'm A!"</td>
<td></td>
<td>0</td>
</tr>
<tr>
<td>3</td>
<td>Increases <code>barrier</code></td>
<td></td>
<td>1</td>
</tr>
<tr>
<td>4</td>
<td>Reads <code>barrier</code> and loops</td>
<td></td>
<td>1</td>
</tr>
<tr>
<td>5</td>
<td></td>
<td>Increases <code>barrier</code></td>
<td>2</td>
</tr>
<tr>
<td>6</td>
<td></td>
<td>Reads <code>barrier</code> and passes the loop</td>
<td>2</td>
</tr>
<tr>
<td>7</td>
<td>Reads <code>barrier</code> and passes the loop</td>
<td></td>
<td>2</td>
</tr>
</tbody></table></div><p>In this interleaving, domain A only performs the loop once, waiting for domain B to increase the barrier. However, nothing prevents A from looping forever here if B never takes step 5. In other words, step 4 can be repeated one, two, three… an infinite number of times, creating a <em>new</em> trace (i.e. a new interleaving that is not equivalent to the previous one) each time. As DSCheck will try to explore every possible trace (i.e. each equivalence class of interleavings), the test will never finish. We can determine that a test is not going to finish by noting how the explored interleavings keep growing in size. In this case, they will look like B-A-A-A-B-B-A, then B-A-A-A-A-B-B-A, then B-A-A-A-A-A-B-B-A and so on. When this scenario occurs, we can conclude that our code is not obstruction-free.</p>
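<p>To make the growth of these interleavings concrete, here is a minimal sketch in plain OCaml. It does not use DSCheck at all: <code>schedule</code> and <code>show</code> are hypothetical helpers that simply write out the order in which the two domains take steps when A re-reads the barrier a given number of times before B increments it.</p>

```ocaml
(* Hypothetical helper, not part of DSCheck: the order in which domains take
   steps when A re-reads the barrier [spins] times before B increments it. *)
let schedule spins =
  (* Steps 1 to 3: B prints, A prints, A increments the barrier. *)
  [ "B"; "A"; "A" ]
  (* Step 4, repeated [spins] times: A reads the barrier and loops. *)
  @ List.init spins (fun _ -> "A")
  (* Steps 5 to 7: B increments, B reads and exits, A reads and exits. *)
  @ [ "B"; "B"; "A" ]

(* Render a schedule in the same B-A-A-... notation used in the text. *)
let show spins = String.concat "-" (schedule spins)

let () = List.iter (fun spins -> print_endline (show spins)) [ 1; 2; 3 ]
```

<p>Every value of <code>spins</code> gives a longer schedule, and none of them is equivalent to another, which is why an exhaustive exploration of this program has no last trace to reach.</p>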
<p>To summarise, if DSCheck runs on some accidentally blocking code, it will never terminate, as it will have infinitely many traces to explore. This is one way to determine whether your tested code is obstruction-free, a property that is otherwise hard to prove, and a handy test for Saturn, as most of its data structures are supposed to have this property. It is important to note that we have simplified our example for this post, and in practice DSCheck also checks for lock-freedom.</p>
<h2>Want More Info?</h2>
<p>We invite you to discover more about DSCheck and the features that come with the multicore testing suite. You can read about the library on <a href="https://github.com/ocaml-multicore/dscheck#motivation">GitHub</a>, including more examples and performance optimisations since its initial release.</p>
<p>We also previously published a <a href="/blog/2022-12-22-ocaml-5-multicore-testing-tools/">blog post about Multicore testing tools</a> that includes a section on DSCheck. It provides additional helpful context to the set of tools that surround DSCheck.</p>
<h2>Until Next Time</h2>
<p>We hope you enjoyed this sojourn into the realm of DSCheck! If you have any questions or comments, please reach out to us on <a href="https://bsky.app/profile/tarides.com">Bluesky</a> or <a href="https://www.linkedin.com/company/tarides">LinkedIn</a>.</p>
<p>You can also <a href="/contact/">contact us</a> with questions or feedback and <a href="/newsletter/">sign up for our newsletter</a> to stay up-to-date with what we're doing.</p>
]]></description><link>https://tarides.com/blog/2024-04-10-multicore-testing-tools-dscheck-pt-2</link><guid isPermaLink="false">https://tarides.com/blog/2024-04-10-multicore-testing-tools-dscheck-pt-2.html</guid><dc:creator><![CDATA[ Carine Morel, Isabella Leandersson ]]></dc:creator><pubDate>Wed, 10 Apr 2024 00:00:00 GMT</pubDate></item><item><title><![CDATA[Updates to OCaml.org's Learn Section: Enhancing UI and UX]]></title><description><![CDATA[<p>Over the past year, the OCaml.org team has been hard at work addressing user feedback to make the OCaml.org Learn section more accessible and better organised, in order to facilitate learning OCaml and enrich the overall OCaml experience.</p>
<p>In 2023, I joined the OCaml.org team as a UX/UI Designer. One of the challenges was to analyse and revise the OCaml.org Learn area's user flow. This post provides an overview of the recent updates to the official OCaml.org documentation. These updates are part of our ongoing efforts to make the website more user-friendly and efficient for OCaml developers.</p>
<h3>Refined Learn Landing Page</h3>
<p>The OCaml.org team restructured the <a href="https://ocaml.org/docs">Learn landing page</a> for better clarity and focus. We moved nonessential sections to highlight key resources such as Books and Tutorials. We reorganised the call-to-action elements around user priorities, such as <a href="https://ocaml.org/install">Installation of OCaml</a> and the <a href="https://v2.ocaml.org/releases/5.1/api/index.html">Standard Library</a>. We also created cards tailored for specific types of materials to ensure consistency and provide a user-friendly overview of the content, thereby reducing the need for excessive scrolling. This change aims to streamline access to important learning materials.</p>
<p><img sizes="(min-width: 1360px) 1360px, (min-width: 680px) 680px, 100vw" srcset="/blog/images/LearnArea1-170w~78hePoRtBfgy1JlBuxSFlw.webp 170w, /blog/images/LearnArea1-340w~8VO6Scr97ogM17yu8_bFFw.webp 340w, /blog/images/LearnArea1-680w~FhvkGNOfE_YkaFjaYV-W1Q.webp 680w, /blog/images/LearnArea1-1360w~oKiYMaiw7ClS4TFMCRjbtg.webp 1360w" src="/blog/images/LearnArea1-1360w~oKiYMaiw7ClS4TFMCRjbtg.webp" alt="01_IMG-Blog-Landing_Page_Before_After"></p>
<h3>Improved Navigation Experience</h3>
<p>The navigation within the Learn section has been redesigned to reduce complexity. We aim to provide a more straightforward and less crowded browsing experience, facilitating easier access to various resources. The primary navigation within the Learn area has been enhanced, now positioned below the main navigation as a subnavigation. Sections have been reorganised, and they can collapse when necessary for a more streamlined user experience.</p>
<p><img sizes="(min-width: 1360px) 1360px, (min-width: 680px) 680px, 100vw" srcset="/blog/images/LearnArea2-170w~pAtbpW3KknGsIhD6UeTmgg.webp 170w, /blog/images/LearnArea2-340w~NgAuOmIAJ_FpxY3c9IZzRQ.webp 340w, /blog/images/LearnArea2-680w~1Z03bENZP2APAOoLQ5u89Q.webp 680w, /blog/images/LearnArea2-1360w~UDncGtTVoQSZRlSHoJewuA.webp 1360w" src="/blog/images/LearnArea2-1360w~UDncGtTVoQSZRlSHoJewuA.webp" alt="02_IMG-Blog-Navigation_experience"></p>
<h3>Search Functionality Within the Learn Section</h3>
<p>Many OCaml developers have asked for a search option in the Learn area. We understand it's important, so we're currently working on it and hope to release it soon.</p>
<h3>Clear Skill Level Categorisation</h3>
<p>The content within the Learn section has been organised into three skill levels: Beginner, Intermediate, and Advanced. This categorisation intends to guide users more effectively to resources that match their proficiency level.
<img sizes="(min-width: 1360px) 1360px, (min-width: 680px) 680px, 100vw" srcset="/blog/images/LearnArea3-170w~Vs5ZYkqKR-aTwIklx0Qs_A.webp 170w, /blog/images/LearnArea3-340w~ftWt31Sh-NCtn459lbGVAw.webp 340w, /blog/images/LearnArea3-680w~izIcTQhDmwnW4sZSP_vewg.webp 680w, /blog/images/LearnArea3-1360w~3hEUNGinzuiXmBXSTaUI6w.webp 1360w" src="/blog/images/LearnArea3-1360w~3hEUNGinzuiXmBXSTaUI6w.webp" alt="03_IMG-Blog-Levels"></p>
<h3>Enhanced Onboarding and Installation Instructions</h3>
<p>The onboarding process for new users has been improved with more detailed installation instructions. The goal is to simplify the initial setup process for newcomers, reducing barriers to entry.</p>
<p><img sizes="(min-width: 1360px) 1360px, (min-width: 680px) 680px, 100vw" srcset="/blog/images/LearnArea4-170w~83NUbmZYtPGY9GwbGkCBiA.webp 170w, /blog/images/LearnArea4-340w~1W-KJYjTAGtF4JtFP2k0BA.webp 340w, /blog/images/LearnArea4-680w~mKAKifTaje51MMUFsDwyFg.webp 680w, /blog/images/LearnArea4-1360w~AJYL6LDYW4Lx--iAmuea5g.webp 1360w" src="/blog/images/LearnArea4-1360w~AJYL6LDYW4Lx--iAmuea5g.webp" alt="04_IMG-Install_OCaml"></p>
<h3>UI and Accessibility Improvements</h3>
<p>The user interface (UI) and accessibility features of OCaml.org have been updated to create a more user-friendly platform. These improvements are part of our commitment to ensuring the site is accessible and easy to navigate for all users. The Design System is consistently updated, incorporating new components.</p>
<h3>Conclusion</h3>
<p>The recent updates to OCaml.org's Learn area mark significant strides in enhancing the platform's usability and accessibility for OCaml developers. From the revamped Learn landing page to improved navigation, skill level categorisation, onboarding instructions, and UI/accessibility enhancements, our ongoing efforts aim to streamline the learning experience. We value your feedback immensely as we continue to refine and improve OCaml.org.</p>
<p>Please take a moment to explore the new features and share your thoughts via our <a href="https://forms.gle/72oVhvTNzvwhs6YC8">User Satisfaction Survey</a>. Your input guides us toward future enhancements, and we eagerly anticipate hearing from you. Thank you for your continued support and engagement as we work to create a more user-friendly and efficient environment for OCaml developers worldwide.</p>
<p>We look forward to hearing from you!</p>
<blockquote>
<p>If you'd like to give your input on the OCaml.org Revamp Project, please join the <a href="https://discuss.ocaml.org/tag/ocamlorg">conversation on Discuss</a>. We also have an OCaml.org Newsletter each month summarising our progress. Find them and let us know your thoughts using the <a href="https://discuss.ocaml.org/tag/ocamlorg-newsletter">ocamlorg-newsletter tag</a>.</p>
</blockquote>
<blockquote>
<p>Feel free to <a href="/contact/">contact Tarides</a> for support while learning OCaml and how it can benefit your business. Follow us on <a href="https://bsky.app/profile/tarides.com">Bluesky</a> and <a href="https://www.linkedin.com/company/tarides/">LinkedIn</a> to ensure you never miss a post, and join the OCaml discussion on <a href="https://discuss.ocaml.org/">Discuss</a>!</p>
</blockquote>
]]></description><link>https://tarides.com/blog/2024-04-03-updates-to-ocaml-org-s-learn-section-enhancing-ui-and-ux</link><guid isPermaLink="false">https://tarides.com/blog/2024-04-03-updates-to-ocaml-org-s-learn-section-enhancing-ui-and-ux.html</guid><dc:creator><![CDATA[ Claire Vandenberghe ]]></dc:creator><pubDate>Wed, 03 Apr 2024 00:00:00 GMT</pubDate></item><item><title><![CDATA[NetHSM: Bringing Open Source to the World of Hardware Security Modules]]></title><description><![CDATA[<p><a href="https://www.nitrokey.com/">Nitrokey</a> is one of the world’s foremost open-source hardware security companies. They develop IT security hardware for data encryption, decryption, and signing, including key and user authentication. After eight years of development, they recently released the first fully open-source Hardware Security Module (HSM): an easy-to-use, highly-secure, and customisable security solution.</p>
<p>Tarides is proud to have played a part in the development of Nitrokey’s HSM solution <a href="https://www.nitrokey.com/products/nethsm">NetHSM</a>, helping to get the project over the finish line after its initial implementation by <a href="https://robur.coop/">Robur</a> and Nitrokey. We value the benefits of open-source, which for NetHSM include customisability, vendor independence, and backdoor checking. Having an open-source option changes the landscape of hardware security and gives users greater choice and more robust security guarantees.</p>
<h2>Why Use an HSM?</h2>
<p>HSMs are physical devices that are used for managing secrets, such as digital keys and other sensitive data, and for cryptography including decryption and signing. HSMs are used in many industries for use cases where security is paramount, including in banking, engineering, chemistry, blockchain, etc. For example, one way to use an HSM is as part of a Public Key Infrastructure (PKI), where it generates, stores, and manages asymmetric keys to sign messages and verify signatures. When keys are mapped to identities, this infrastructure can be used to control access to sensitive resources or secure internal communications.</p>
<p>While general-purpose computers are technically capable of performing the same operations as HSMs, using an HSM has its advantages. By separating all security operations from others into a dedicated device, it is much easier to build and audit defences against tampering, and in general audit usage logs for misuse. In some cases, having a physical HSM also opens up the possibility of specialised acceleration hardware for cryptographic operations, enabling the processing of requests in bulk very efficiently compared to general-purpose hardware.</p>
<h2>What Makes NetHSM Stand Out?</h2>
<p>NetHSM comes with more benefits than general-purpose HSMs, combining several additional features into a powerful security solution. Some of these include:</p>
<ul>
<li>High Performance and Scalability: One NetHSM alone can handle thousands of cryptographic key operations per second, and due to their statelessness, several NetHSM devices can be clustered together to enable extremely high throughput and availability.</li>
<li>Memory- and Type-Safe Programming Language: NetHSM is mostly implemented in OCaml, a type- and memory-safe programming language. The main system at all levels – including the TCP/IP, HTTP, TLS, and application stack – is completely written in OCaml from scratch. As a result, many security vulnerabilities are eliminated, thanks to the <a href="/blog/2023-12-14-ocaml-memory-safety-and-beyond/">secure-by-design principles of OCaml.</a></li>
<li>Transparency: NetHSM's source code is available for anyone to read in its <a href="https://github.com/Nitrokey/nethsm">open-source repository</a>. Easy access to the code means that the system's implementation can be independently audited for the absence of back doors and security flaws by users. Even if you're just curious, you can look at the repo and discover how the system and its features are implemented.</li>
<li>Easy to Use: NetHSM is easily managed via a convenient command-line interface, and client systems can integrate the <a href="https://nethsmdemo.nitrokey.com/api_docs/index.html">REST API</a> using the SDKs available in 35 programming languages or use the <a href="https://www.ibm.com/docs/en/linux-on-systems?topic=introduction-what-is-pkcs-11">PKCS#11</a> module. For a quick, pain-free start, users can access the free NetHSM service or run it as a container. Because NetHSM is open-source, all tools, drivers, and documentation are publicly available to users.</li>
<li>Small Attack Surface: NetHSM is based on <a href="https://mirage.io">MirageOS unikernel</a> technology, which combines operating system and application into a uniquely tailored firmware that contains no unnecessary code. As a result, NetHSM achieves a very small overall system size (around 30 MB), which constitutes a minimal attack surface, making it significantly more challenging for bad actors to target.</li>
</ul>
<h2>MirageOS and OCaml Make a Big Difference</h2>
<p>Nitrokey chose to develop NetHSM using <a href="https://ocaml.org/">OCaml</a> and <a href="https://mirage.io/">MirageOS</a>. As previously mentioned, OCaml is a type- and memory-safe language with <a href="/blog/2023-12-14-ocaml-memory-safety-and-beyond/">strong security features</a>. In fact, the language’s design entirely <a href="/blog/2023-07-05-zero-day-attacks-what-are-they-and-can-a-language-like-ocaml-protect-you/">eliminates the risk of the most common cyber attacks</a>. OCaml’s safety record, combined with its growing open-source community, optimised workflows, tools, and performance, makes it a great choice for a groundbreaking project such as NetHSM.</p>
<p>In addition, the library operating system MirageOS leverages the strengths of OCaml to construct secure, high-performance unikernels. At its lowest level, NetHSM runs the <a href="https://muen.sk/">Muen</a> separation kernel, which securely hosts multiple independent components. Muen has been formally verified (that is, mathematically proven) to be free of runtime errors.</p>
<p>The core component running within Muen is ‘Keyfender’, a MirageOS unikernel with a critical role. Keyfender provides the HTTP endpoints to the NetHSM API, and performs the requests made to those endpoints. All cryptographic operations are performed in this unikernel and it is the only component with decrypted access to key stores. Consequently, the security of this component is of paramount importance, which is why Nitrokey chose to use MirageOS and OCaml. Furthermore, the code performing the cryptographic operations themselves, particularly elliptic curve operations, is derived from the <a href="https://github.com/mit-plv/fiat-crypto">fiat-crypto</a> project, which generates cryptographic primitives that are formally proved for functional correctness.</p>
<h3>How We Helped</h3>
<p>Tarides joined the project in mid-2022 to help get it to the finish line. We fixed some remaining issues with networking and endpoint interfaces, and added caching for performance and improvements to the test suite. We also did a lot of general maintenance work, ironing out small issues before release. <a href="https://robur.coop/">Robur</a> did the lion’s share of the initial work designing and building Keyfender. If you would like to discover more, the software is (of course) open-source and can be found on the <a href="https://github.com/Nitrokey/nethsm/">NetHSM GitHub repository</a>.</p>
<h2>Until Next Time!</h2>
<p>Ensuring cybersecurity and protecting sensitive data is crucial for the functioning of several industries in our modern world. Without the guarantees of HSM devices, including secure key management and tamper-resistant storage, users’ personal and sensitive information would be at risk. With NetHSM, information is safeguarded above and beyond what other solutions offer. The open-source solution allows for backdoor checking, and MirageOS, Muen, and OCaml add their own layers of protection.</p>
<p>Check out <a href="https://www.nitrokey.com/news/2023/after-8-years-development-nethsm-10-available-first-open-source-hardware-security-module">Nitrokey’s blog post</a> to discover more about how you can use NetHSM in your own projects and for your business. At Tarides, we value working with partners who prioritise open-source, secure, and high-quality solutions just like we do and we are proud to bring our expertise to projects like NetHSM.</p>
<p>You can stay up-to-date with Tarides on <a href="https://bsky.app/profile/tarides.com">Bluesky</a> and <a href="https://www.linkedin.com/company/tarides">LinkedIn</a>, and <a href="/contact/">contact us</a> on our website for more information or for help with your projects!</p>
]]></description><link>https://tarides.com/blog/2024-03-27-nethsm-bringing-open-source-to-the-world-of-hardware-security-modules</link><guid isPermaLink="false">https://tarides.com/blog/2024-03-27-nethsm-bringing-open-source-to-the-world-of-hardware-security-modules.html</guid><dc:creator><![CDATA[ Isabella Leandersson, Virgile Robles ]]></dc:creator><pubDate>Wed, 27 Mar 2024 00:00:00 GMT</pubDate></item><item><title><![CDATA[Eio 1.0 Release: Introducing a new Effects-Based I/O Library for OCaml]]></title><description><![CDATA[<p>The OCaml 5 update brought much-anticipated support for programming on multiple cores. It also introduced support for concurrency via effect handlers – one of the first mainstream languages to do so. This significant update has had <a href="/blog/2023-07-07-making-ocaml-5-succeed-for-developers-and-organisations/">profound performance and UX implications</a>, propelling OCaml into new areas of software development. At the core of this leap forward is the ambition to craft a modern, direct-style I/O stack that seamlessly interfaces with the latest kernel I/O advancements, such as <a href="https://unixism.net/loti/what_is_io_uring.html">io_uring</a>. This is where <a href="https://github.com/ocaml-multicore/eio">Eio</a> comes in.</p>
<p>When we started work on the I/O stack there was no previous ecosystem to build upon, and we had to break new ground for everything (threading, scheduling, and so on). As users started to port large applications to Eio - such as <a href="https://routine.co">Routine</a>, <a href="https://github.com/talex5/ocaml-wayland">ocaml-wayland</a> or <a href="https://irmin.io">Irmin</a> - we learnt a lot. We received many suggestions for how to improve various aspects of the I/O project, especially on how to write large-scale applications combining effect handlers. This feedback loop accelerated the ecosystem's evolution and led to new research and community insights, such as those discussed in <a href="https://kcsrk.info/papers/compose_ocaml22.pdf">"Composing Schedulers using Effect Handlers"</a>, and experiments with lightweight effect-based concurrency models like <a href="https://github.com/ocaml-multicore/picos">Picos</a>. Today, Eio has matured to a point where it is the first "feature complete" direct-style effects library, and we are calling this version 1.0.</p>
<p><em>But this is just the start of our shared journey to figure out what the effects story is</em>! There will be plenty of iterations as we go through future releases, and we look forward to continuing the community-driven exploration. If you're looking to get stuck in with Eio immediately, we recommend <a href="https://github.com/ocaml-multicore/eio/blob/v1.0/README.md">exploring its documentation on OCaml.org</a>, including installation instructions. If, however, you want to know more about the context and features of the I/O stack, this post has you covered. We will guide you through the motivations and history behind Eio as well as some of its most prominent features.</p>
<h2>Why Eio?</h2>
<p>Eio provides an effects-based direct-style I/O stack for OCaml 5. You can use Eio to read and write files, make network connections, or perform CPU-intensive calculations running multiple operations simultaneously. After much hard work and optimisations, the <a href="https://ocaml.org/p/eio_main/1.0">first full release of Eio 1.0</a> is now publicly available. This release focuses on two main areas:</p>
<ul>
<li>
<p><strong>Performance:</strong>
Eio 1.0 capitalises on newer kernel I/O interfaces for enhanced parallelism efficiency. It features <a href="https://github.com/ocaml-multicore/eio/tree/main/lib_eio_posix">Eio_posix</a> for broad platform compatibility, whilst also targeting optimal support for various modern (and often incompatible) kernel interfaces. Notably, Eio 1.0 introduces an <code>io_uring</code> backend for Linux and a specialised <a href="https://github.com/ocaml-multicore/eio_js">Eio_js</a> scheduler for JavaScript platforms. Moreover, we developed prototype backends for extensive support across systems: aiming to accommodate <a href="https://github.com/patricoferris/eio/tree/apple-gcd/lib_eio_gcd">Grand Central Dispatch</a> on macOS, <a href="https://github.com/ocaml-multicore/eio/issues/125">IOCP</a> on Windows, <a href="https://github.com/patricoferris/eio/tree/kqueue/lib_eio_kqueue">kqueue</a> on BSD*, and <a href="https://github.com/TheLortex/eio-solo5">Solo5</a> for MirageOS, demonstrating our confidence in the API's versatility for future adaptations.</p>
</li>
<li>
<p><strong>Security:</strong>
First, to improve safety, some parts of Eio have been <a href="https://github.com/ocaml-multicore/eio/blob/main/HACKING.md#formal-verification">formally verified</a>. Second, the landscape of containment approaches across macOS, Windows, Linux, BSD, as well as within hypervisors, containers, and WebAssembly (Wasm), has diversified significantly, necessitating new strategies that ensure portability. Eio addresses this with distinct interfaces: a "low-level" interface tailored to each supported system, which offers the best platform-specific control but isn't portable, and a "high-level" portable interface designed for applications, which prioritises security by not exposing ambient resources. Instead, it provides an <a href="https://roscidus.com/blog/blog/2023/04/26/lambda-capabilities">interface that grants capabilities</a>, facilitating a secure and controlled environment for application development (further details below). These capabilities can also be enforced at runtime: for instance, Eio 1.0 comes with <a href="https://ocaml-multicore.github.io/eio/eio/Eio_unix/Cap/index.html"><code>capsicum</code></a> support on systems that provide the <code>cap_enter</code> system call (FreeBSD). This is similar to Rust's <a href="https://github.com/bytecodealliance/cap-std/tree/main/cap-std"><code>cap-std</code></a> (and to a lesser extent to Scala 3's <a href="https://dotty.epfl.ch/docs/reference/experimental/canthrow.html#from-effects-to-capabilities-1">checked effects</a>).</p>
</li>
</ul>
<p>As an important side note, the compiler itself remains unopinionated about the user's choice of scheduling policies. While you may want to use Eio to benefit from its features, you do not <em>have</em> to use Eio with OCaml 5.</p>
<h2>The Improvements Coming With Eio</h2>
<p>What makes Eio different from its predecessors? The Unix library, which uses blocking I/O operations, provided the previous I/O stack for OCaml. Blocking operations are not well suited for concurrent programming, and hence with OCaml 4.*, two libraries provide this support instead: <a href="https://dev.realworldocaml.org/concurrent-programming.html">Async</a> and <a href="https://ocsigen.org/lwt/latest/manual/manual">Lwt</a>, which both have a monadic interface. These libraries let the developer write code as if there are multiple threads of execution running, each with their own stack, where the stacks are simulated using the heap. However, these libraries require the developer to <a href="https://journal.stuffwithstuff.com/2015/02/01/what-color-is-your-function/">use non-concurrent and concurrent code in different ways</a>, modifying the style in which they write their code depending on the context. Doing so causes extra work while reading and writing code. Eio uses effect handlers instead, which enable users to write their code in a natural direct style (as opposed to the callback-oriented style), while still benefiting from performant asynchronous I/O.</p>
<h3>Effects</h3>
<p>OCaml 5 added support for effects, removing the need for monadic code in the I/O stack and making an effects-based I/O stack like Eio possible. The <a href="https://v2.ocaml.org/manual/effects.html">OCaml Manual</a> describes effect handlers as follows: "Effect handlers are a mechanism for modular programming with user-defined effects. Effect handlers allow the programmers to describe <em>computations</em> that <em>perform</em> effectful <em>operations</em>, whose meaning is described by <em>handlers</em> that enclose the computations."</p>
<p>Using effect handlers has several advantages:</p>
<ol>
<li><strong>Speed</strong>: Using effects speeds up the code since no heap allocations are needed to simulate a stack.</li>
<li><strong>Ease-of-use</strong>: Developers can write concurrent code in the same style as plain non-concurrent code.</li>
<li><strong>Language Features</strong>: Developers can now use OCaml language features like <code>try ... with</code> in their concurrent code.</li>
</ol>
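<p>As an illustration of what direct-style concurrency with effects can look like, here is a minimal sketch of a round-robin scheduler built only on the OCaml 5 standard library (<code>Effect</code> and <code>Effect.Deep</code>). It is not how Eio is implemented: the <code>Yield</code> effect, <code>run</code>, <code>worker</code>, and <code>log</code> are names invented for this example.</p>

```ocaml
open Effect
open Effect.Deep

(* A user-defined effect: the running task asks to be paused. *)
type _ Effect.t += Yield : unit Effect.t

(* Records what the tasks did, most recent entry first. *)
let log = ref []
let say s = log := s :: !log

(* Run [tasks] cooperatively on a single domain, switching at each Yield. *)
let run (tasks : (unit -> unit) list) =
  let queue : (unit -> unit) Queue.t = Queue.create () in
  let next () =
    match Queue.take_opt queue with Some resume -> resume () | None -> ()
  in
  let spawn task =
    match_with task ()
      { retc = (fun () -> next ());   (* task finished: run the next one *)
        exnc = raise;                 (* let exceptions propagate *)
        effc = (fun (type a) (eff : a Effect.t) ->
          match eff with
          | Yield ->
            Some (fun (k : (a, unit) continuation) ->
              (* Park the paused task at the back of the queue. *)
              Queue.add (fun () -> continue k ()) queue;
              next ())
          | _ -> None) }
  in
  List.iter (fun task -> Queue.add (fun () -> spawn task) queue) tasks;
  next ()

(* Plain direct-style code: no monad, no callbacks. *)
let worker name () =
  say (name ^ "1");
  perform Yield;
  say (name ^ "2")

let () = run [ worker "A"; worker "B" ]
let () = List.iter print_endline (List.rev !log)
```

<p>The two workers are written as ordinary sequential functions, yet their steps interleave, because the handler decides what happens each time a task performs <code>Yield</code>.</p>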
<p>In addition to these benefits from effects, having an effects-based I/O stack lets users take advantage of some additional features of modern operating systems. Many modern operating systems provide high-performance alternatives to the traditional Unix <code>select</code> call. For example, Linux's alternative <code>io_uring</code> has applications that write the operations they want to perform to a ring buffer, which can then handle those operations asynchronously, something Eio can take advantage of.</p>
<h3>Eio 1.0 Features</h3>
<p>Let's take a look at some of Eio's main features. If you want to discover more, including some code examples, I recommend you check out the <a href="https://github.com/ocaml-multicore/eio">readme in the repo</a>.</p>
<ul>
<li><strong>Tracing</strong></li>
</ul>
<p>Eio 1.0 can take advantage of the <a href="/blog/2024-01-31-are-your-programs-doing-what-you-think-they-re-doing-introducing-monitoring-tools-for-multicore-ocaml/">tracing tools</a> available with OCaml 5.1. When switched on, Eio can write events about various actions, such as creating fibres or resolving promises. <a href="https://github.com/ocaml-multicore/eio-trace"><code>Eio-trace</code></a> and third-party tools like <a href="https://github.com/tarides/runtime_events_tools?tab=readme-ov-file#trace"><code>olly</code></a> can capture and display the traces of these events in a window (for example using <a href="https://perfetto.dev/">Perfetto</a>), giving users a visual representation of their data. Having an I/O stack compatible with the tracing capabilities of OCaml 5.1 gives developers the tools to visualise what their code is doing and monitor it for changes.</p>
<p><img sizes="(min-width: 1360px) 1360px, (min-width: 680px) 680px, 100vw" srcset="/blog/images/eio_trace-170w~pAJnWVKLqj_EdofXzL5PAw.webp 170w, /blog/images/eio_trace-340w~VMc5ReCNFkg9XDECPnk95g.webp 340w, /blog/images/eio_trace-680w~Lz_m3OYfpWG4qo16qkmUKQ.webp 680w, /blog/images/eio_trace-1360w~hmKPWgmBDoVOjc6OkHKvVQ.webp 1360w" src="/blog/images/eio_trace-1360w~hmKPWgmBDoVOjc6OkHKvVQ.webp" alt="Eio-trace providing a graphical representation of the Eio tutorial's networking example"></p>
<ul>
<li><strong>Multicore Support</strong></li>
</ul>
<p>OCaml 5 now lets programs create multiple domains to run code, meaning that programs can use multiple CPUs simultaneously. This significantly speeds up processing time, especially for CPU-intensive tasks.</p>
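<p>As a small illustration of domains on their own, the sketch below splits a sum across two domains using only the standard library's <code>Domain.spawn</code> and <code>Domain.join</code>. The helpers <code>sum_range</code> and <code>parallel_sum</code> are invented for this example, and it deliberately does not use Eio.</p>

```ocaml
(* Sum the integers from [lo] to [hi] inclusive (0 if the range is empty). *)
let sum_range lo hi =
  let total = ref 0 in
  for i = lo to hi do
    total := !total + i
  done;
  !total

(* Compute 1 + 2 + ... + n with the work split over two domains. *)
let parallel_sum n =
  let mid = n / 2 in
  (* The lower half runs on a new domain, potentially on another core. *)
  let lower = Domain.spawn (fun () -> sum_range 1 mid) in
  (* The upper half runs on the current domain in the meantime. *)
  let upper = sum_range (mid + 1) n in
  (* [Domain.join] waits for the spawned domain and returns its result. *)
  Domain.join lower + upper

let () = Printf.printf "%d\n" (parallel_sum 1_000_000)
```

<p>Each half works on its own local accumulator, so no synchronisation is needed beyond the final <code>Domain.join</code>.</p>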
<p>Eio 1.0 provides <a href="https://ocaml-multicore.github.io/eio/eio/Eio/Executor_pool/index.html"><code>Eio.Executor_pool</code></a>, which distributes jobs (functions to execute) among a pool of domain workers. Domains are reused and can execute multiple jobs concurrently, and jobs are queued up if they cannot be started immediately due to all workers being busy. Having an I/O stack that can run on multiple domains lets developers create more efficient code, reducing the time it takes to perform a task. <code>Eio.Executor_pool</code> is the recommended module for leveraging OCaml 5's multicore capabilities. It is built on top of the low-level <a href="https://ocaml-multicore.github.io/eio/eio/Eio/Domain_manager/index.html"><code>Eio.Domain_manager</code></a>, which lets developers use multiple domains by having the fibres of the calling domain run in parallel with another, new, domain.</p>
<ul>
<li><strong>Integrations</strong></li>
</ul>
<p>It was essential to the team designing Eio that it was <a href="https://github.com/ocaml-multicore/eio?tab=readme-ov-file#integrations">compatible with other I/O libraries</a> and that you could use Eio in the same domain as Async and Lwt. For example, you can use <a href="https://github.com/ocaml-multicore/lwt_eio"><code>Lwt_eio</code></a> and <a href="https://github.com/talex5/async_eio"><code>Async_eio</code></a> to run Lwt, Async, and Eio fibres together in a single domain and to convert promises from Lwt and Async to Eio. This is useful when porting existing code to Eio.</p>
<p>You can use Eio with OCaml's Unix module by using <code>Eio_unix</code>, and <a href="https://github.com/ocaml-multicore/domainslib">Domainslib</a> and <a href="https://github.com/ocaml-multicore/kcas"><code>kcas</code></a> can also interact with Eio. It can be helpful to send work to a pool of Domainslib worker domains when managing compute-intensive tasks. Helpfully, resolving an Eio promise from a non-Eio domain is possible, making the results easy to retrieve. Eio is compatible with <code>kcas</code> which provides blocking in lock-free software transactional memory (STM) implementations.</p>
<p>Integration is important because it gives users the flexibility to use multiple tools and the tools they prefer for the job. Rather than forcing everyone to use one workflow, the team behind Eio wanted to open up possibilities for the user whilst giving them an upgraded I/O that leverages effects and multicore programming.</p>
<h2>History of Eio</h2>
<p>Originally, OCaml 5 was not supposed to include support for effects! Therefore, Eio started out as a prototype by the same team working on OCaml Multicore at Tarides. To explore and test the experimental I/O design, we ported it to several large libraries and applications with the help of other open-source maintainers. Special thanks go to the <a href="https://tezos.com/">Tezos</a> and <a href="https://mirage.io">MirageOS</a> maintainers for early reviews and discussions!</p>
<p>During the porting and testing, we discovered and fixed <a href="https://github.com/ocaml/ocaml/issues/12948">several</a> <a href="https://github.com/ocaml/ocaml/issues/12584">bugs</a> in the OCaml 5 runtime. The Tarides team also created and upstreamed many different <a href="/blog/2022-12-22-ocaml-5-multicore-testing-tools/">testing</a> and <a href="/blog/2024-01-31-are-your-programs-doing-what-you-think-they-re-doing-introducing-monitoring-tools-for-multicore-ocaml/">monitoring</a> tools that make concurrent programming with effects easier. Thanks to this rigorous process, Eio is now a very efficient, portable (it's the only OCaml scheduler compatible with Linux, macOS, Windows, JavaScript and MirageOS!), and flexible stack. These efforts also laid the groundwork for other scheduler libraries to take advantage of new tools and past learnings to explore new design options (while - hopefully - keeping the ecosystems compatible). See, for instance, <a href="https://github.com/riot-ml/riot">RIOT</a>, which brings Erlang-style concurrency to OCaml using a multicore actor model, or <a href="https://discuss.ocaml.org/t/ann-miou-a-simple-scheduler-for-ocaml-5/12963">Miou</a>, <a href="https://github.com/c-cube/fuseau">Fuseau</a>, or <a href="https://github.com/dbuenzli/affect">Affect</a>, and so on.</p>
<p>We can't just focus on the positives – as with any new workflow, Eio has had its share of growing pains. Early in its development, the community gave constructive criticism about the API design. Some users didn't like the explicit capability model; others disliked the use of objects in the API, as it was yet another feature of the language that newcomers would have to learn before writing a relatively 'simple' direct-style I/O application. In response, we tried to keep the explicit capability model, but without using objects. While some users appreciated the change, the resulting API is more complex. We need more community input to decide what to do for Eio 2.0, so please continue to voice your opinions on the relevant <a href="https://github.com/ocaml-multicore/eio/pull/553">GitHub issue</a>.</p>
<p>Finally, some users found the API surface too big, making it difficult to compose with other schedulers. We explored several solutions for better composability, and our latest effort is <a href="https://github.com/ocaml-multicore/picos">Picos</a>. We plan to integrate Picos with Eio once the project is more mature. We've also received feedback that the capability API is too intrusive for some users. Whilst its model works well with modern sandboxing tools like Wasm and Capsicum and is compatible with <a href="https://www.janestreet.com/tech-talks/effective-programming/">typed effects</a>, we understand it's a divisive design choice. To address these comments, which we have already begun to do with Picos, we are open to collaborating further with the community on a design that works best for everyone.</p>
<h2>Stay in Touch</h2>
<p>The focus for the upcoming months is to gather information about how the I/O stack is performing and whether there are any pain points or improvement opportunities. The team appreciates your feedback, so if you are using Eio in your projects or are curious to test it out, please share your experience directly <a href="https://github.com/ocaml-multicore/eio/issues">in the repo</a> or on <a href="https://discuss.ocaml.org/">OCaml Discuss</a>. There are also <a href="https://docs.google.com/document/d/1ZBfbjAkvEkv9ldumpZV5VXrEc_HpPeYjHPW_TiwJe4Q">regular developer meetings</a> and <a href="https://matrix.to/#/#eio:roscidus.com">forum discussions</a> that are open to everyone. We also recommend our <a href="/blog/2023-09-27-tutorial-how-to-port-lwt-applications-to-eio/">tutorial on porting Lwt applications to Eio</a> if you want to get started with porting some of your own applications to the new I/O stack.</p>
<p>Want to stay in touch? You can follow us on <a href="https://bsky.app/profile/tarides.com">Bluesky</a> and <a href="https://www.linkedin.com/company/tarides/">LinkedIn</a> to keep up-to-date with our projects and events.</p>
<h3>Acknowledgements</h3>
<p>We would like to thank all the Eio contributors, including but not limited to: Bikal Gurung, Patrick Ferris, Simon Grondin, Christiano Haesbaert, Lucas Pluvinage, and Vesa Karvonen and the Tarides Multicore Application team, notably Sudha Parimala, for their support. The initial development of Eio has been partly sponsored by Jane Street and the Tezos Foundation.</p>
]]></description><link>https://tarides.com/blog/2024-03-20-eio-1-0-release-introducing-a-new-effects-based-i-o-library-for-ocaml</link><guid isPermaLink="false">https://tarides.com/blog/2024-03-20-eio-1-0-release-introducing-a-new-effects-based-i-o-library-for-ocaml.html</guid><dc:creator><![CDATA[ Isabella Leandersson, Thomas Leonard, Anil Madhavapeddy ]]></dc:creator><pubDate>Wed, 20 Mar 2024 00:00:00 GMT</pubDate></item><item><title><![CDATA[My experience at IndiaFOSS 2023: Community, Workshop, and Talks]]></title><description><![CDATA[<p>There are plenty of exciting computer programming events happening in India, including the <a href="https://ocamlretreat.org">5 day OCaml retreat</a> that Tarides is hosting in Auroville this week – look out for future posts on that! Another great (and bigger!) event is the annual free and open source software conference <a href="https://indiafoss.net/2023">IndiaFOSS</a> organised by <a href="https://fossunited.org">FOSS united</a>, most recently held in Bengaluru this past October. At the conference, I had the pleasure of presenting on my experience introducing a Code of Conduct (CoC) to an open-source community; I also co-hosted an <a href="https://github.com/Sudha247/learn-ocaml-workshop">OCaml workshop</a> with KC Sivaramakrishnan, Deepali Ande, and Kaustubh M, which offered attendees helpful context and starter exercises in the language.</p>
<p>Sad you weren't there? Don't be – I'll give you a taste of what you missed so you can get ready for this year's IndiaFOSS!</p>
<p><img sizes="(min-width: 1360px) 1360px, (min-width: 680px) 680px, 100vw" srcset="/blog/images/conferencecentre-170w~IAwSPzadnaJlKUo5FI37kw.webp 170w, /blog/images/conferencecentre-340w~pe9zt4oZiBCGet4AH7gmjA.webp 340w, /blog/images/conferencecentre-680w~rxxa46BJSqWlc6R7r36X6A.webp 680w, /blog/images/conferencecentre-1360w~pKyg9ozLcggemX75KaTRIg.webp 1360w" src="/blog/images/conferencecentre-1360w~pKyg9ozLcggemX75KaTRIg.webp" alt="A modern building with big windows and round pillars, with a sign saying 'convention centre'. The picture looks like it was taken in the early morning or evening, with a twilight sky giving the scene a soft orange glow."></p>
<h2>Code of Conduct for the OCaml Community</h2>
<p>As I explained in my talk, a CoC outlines the behaviours and responsibilities that people who participate in a community are expected to follow. By having a CoC, communities signal to their participants that they are inclusive and that there are repercussions for bad behaviour and harassment. This can help increase diversity, as people from underrepresented groups feel safer to participate.</p>
<p>My presentation centred on my experience implementing a CoC for the open-source OCaml community. I was part of a team established in 2022 that successfully implemented a CoC with feedback gathered from the OCaml community.</p>
<p>Check out my <a href="https://sudha247.github.io/coc-presentation/#/">presentation slides</a> to gain insight into how we handle enforcement, choose candidates for the enforcement team, smooth the path to adoption, and dispel some common myths about CoCs.</p>
<p><img sizes="(min-width: 1360px) 1360px, (min-width: 680px) 680px, 100vw" srcset="/blog/images/ocamlworkshop-170w~OVdZTzp7CfrXgG8t7y9NzA.webp 170w, /blog/images/ocamlworkshop-340w~ZC5REXNqSQxntBnLtNdO1g.webp 340w, /blog/images/ocamlworkshop-680w~LQ4kEFJ4Nl_LoiZ6Os5rQA.webp 680w, /blog/images/ocamlworkshop-1360w~LW4HmwISeE53R3D79RJ3qQ.webp 1360w" src="/blog/images/ocamlworkshop-1360w~LW4HmwISeE53R3D79RJ3qQ.webp" alt="A group of people gathered in a room. They're sitting on chairs in a half-circle formation facing the camera. Most are intently looking at either their own or their neighbour's laptop."></p>
<h2>OCaml Workshop</h2>
<p>We planned to introduce as many curious people as possible to OCaml. The <a href="https://github.com/Sudha247/learn-ocaml-workshop">workshop repo</a> we used has five sections: installation, exercises, a GitHub challenge, a Frogger challenge, and finally some sources. It's still available if you would like to try it!</p>
<p>There were many positive takeaways from the workshop and we had a healthy number of participants – around 30 people. We were impressed with how much progress the attendees made with the exercises. Many reported that they enjoyed them and actually discovered that OCaml is fun!</p>
<p>It was helpful to hear the participants' feedback on the process and what could be improved. There were parts of the installation instructions that they thought could be clearer – including the tutorial on using Stdlib. We knew beforehand that installing OCaml on Windows was difficult, but we realised it would be helpful to find a better portable solution (and maybe encourage people to install WSL beforehand).</p>
<p>Sadly, we didn't have enough space to accommodate everyone who wanted to attend, and we had to turn people away. Whilst this sounds like a good problem to have, we do want to be able to provide everyone with an opportunity to try OCaml. Next time, we should plan ahead and get a bigger space.</p>
<p><img sizes="(min-width: 1360px) 1360px, (min-width: 680px) 680px, 100vw" srcset="/blog/images/indiafosstalk-170w~kRlrbdgjoTeawC1NxZ9xfw.webp 170w, /blog/images/indiafosstalk-340w~KZSokiMSF_scyd7au6FslQ.webp 340w, /blog/images/indiafosstalk-680w~wjg1dIUwIyBxGyG4vYsKLA.webp 680w, /blog/images/indiafosstalk-1360w~8gmjfF_y4T0dZ74nciSGaw.webp 1360w" src="/blog/images/indiafosstalk-1360w~8gmjfF_y4T0dZ74nciSGaw.webp" alt="An auditorium with a stage and red and blue curtains surrounding it. A presentation slide is projected onto the centre-back wall of the stage. It has three bullet points which read: freedom in code and community, collaboration, transparency. There are people in red chairs facing the stage and paying attention to the talk."></p>
<h2>Interesting Talks</h2>
<p>Attending the conference was a welcome opportunity to listen to talks from other organisations. There was a variety of great presentations, and some of my favourites include:</p>
<ul>
<li><strong>Vyakaran - Visualisation Tool for Formal Grammar</strong>
<ul>
<li>Akash Hamirwasia presented Vyakaran, a visualisation tool for formal grammar. While some older tools exist for visualising formal grammar, this is a fresh take on the problem built with modern tools. The talk offered some unconventional wisdom, noting that it's okay to build things from scratch sometimes. He even made the repository public on stage!</li>
</ul>
</li>
<li><strong>Empowering Innovation With the Julia Language</strong>
<ul>
<li>Anant Thazhemadam and Sharan Yalburgi from the Julia team presented how Julia is used in various scientific software. Julia has found a niche and succeeded at it. The carbon-friendliness studies on functional programming languages were especially interesting, as was understanding how Julia tops them. (And that OCaml is right behind!)</li>
</ul>
</li>
<li><strong>Role of FOSS in Bringing Equity and Quality Learning to Education</strong>
<ul>
<li>Nidhi Anarkat, Co-founder and CEO of NavGurukul, spoke about their initiative to train students from marginalised communities, mainly women, in tech. They provide lodging and boarding and set the students up for tech internships. The initiative has led to many students securing well-paying jobs and elevating their families. Another member of the initiative is Anup Kalbalia, former leader of CodeChef, a popular coding platform in India.</li>
</ul>
</li>
<li><strong>Stories From TinkerSpace: On Building Community Hacker Spaces</strong>
<ul>
<li>Moosa Mehar presented the story of TinkerSpace, a physical hackerspace in Kochi, India. TinkerSpace provides a place for hackers to gather and hack on stuff. It also gives users a space for running community events. The space is free for participants to use and is funded by non-profit organisations. They're on a path towards self-sustenance.</li>
</ul>
</li>
<li><strong>Illustrations for the Sub-Continent</strong>
<ul>
<li>One of the few design talks of the conference! Sidika Sehgal presented the story behind how the illustration library <em>Obvious</em> was created – due to a lack of colour/race-inclusive illustrations for human characters. Even better, they decided to open source it, for anyone to use for free. It is nice to have representative and relatable illustrations for the sub-continent.</li>
</ul>
</li>
<li><strong>B(I)LUG: A 25-year Retrospective</strong>
<ul>
<li>Dr Sachin Garg took us through the decades-long journey of the Bangalore Linux User group. It was fascinating to see the history of the internet in India and how Open Source Software (starting from Linux) made its way into the ecosystem. He emphasised that the success of the group is the result of one man's vision, Atul Chitnis.</li>
</ul>
</li>
<li><strong>Co(de)mmunity!</strong>
<ul>
<li>Karkee founded the Villupuram Linux user group in 2013, in a remote village with an abysmal literacy rate where computers were hard to come by. Since then, the group and its volunteers have led many outreach events, persuading many to pursue a career in tech/OSS. They've even gotten approval to set up an IT park in their hometown. I had heard of Karkee and his team's work at <a href="https://tn23.mini.debconf.org">Minidebconf</a> earlier in the year. It was great that the organisers provided them with a platform to showcase their important work. I hope this inspires more OSS activity in unconventional places.</li>
</ul>
</li>
</ul>
<h2>Until Next Year</h2>
<p>IndiaFOSS is a great conference that I would recommend to anyone interested in free and open-source software. It was nice to meet so many passionate people from all over India and hear about their projects and initiatives. I'm looking forward to attending more conferences and hope to see you around!</p>
<p>Want to keep up with Tarides? You can <a href="https://bsky.app/profile/tarides.com">follow us on Bluesky</a> and on <a href="https://www.linkedin.com/company/tarides">LinkedIn</a>.</p>
]]></description><link>https://tarides.com/blog/2024-03-13-my-experience-at-indiafoss-2023-community-workshop-and-talks</link><guid isPermaLink="false">https://tarides.com/blog/2024-03-13-my-experience-at-indiafoss-2023-community-workshop-and-talks.html</guid><dc:creator><![CDATA[ Sudha Parimala ]]></dc:creator><pubDate>Wed, 13 Mar 2024 00:00:00 GMT</pubDate></item><item><title><![CDATA[A Time for Change: Our Response to the White House Cybersecurity Press Release]]></title><description><![CDATA[<p>As <a href="/blog/2023-12-14-ocaml-memory-safety-and-beyond/">seasoned proponents of safety-by-design</a>, we were pleased to see the February 26th <a href="https://www.whitehouse.gov/oncd/briefing-room/2024/02/26/press-release-technical-report/">White House press release</a> titled "Future Software Should Be Memory Safe." The accompanying report touches on important topics, most significantly regarding the critical importance of memory safety. The U.S. government's emphasis on secure-by-design measures in software development sets a commendable example in the global cybersecurity landscape.</p>
<p>As experts in <a href="https://ocaml.org">OCaml</a>, a pragmatic language that combines memory safety and security by design, we encourage everyone to recognise security as a first-class software consideration. Here, we outline how OCaml fulfils the security goals presented in the report and why it is an ideal candidate for developing secure and mission-critical applications. Given Europe’s rich history of technological innovation and the EU’s commitment to digital sovereignty, wider adoption of OCaml could serve as a strategic move to strengthen cybersecurity whilst fostering the EU’s technical independence and leadership.</p>
<h2>The Key Message</h2>
<p>The report raises several important points with which we agree. Firstly, the focus and burden of managing existing cybersecurity threats is unreasonably placed on the user instead of the software and hardware manufacturer. The report calls for "rebalancing the responsibility to defend cyberspace to those most capable and best positioned to reduce risks for all." This means that rather than patching on fixes after the fact, the industry should aim to create solutions and products that are secure by design and safe from the start.</p>
<p>Secondly, the report correctly identifies that memory-safety issues are the industry's most pervasive and destructive class of vulnerabilities today. As we have <a href="/blog/2023-07-05-zero-day-attacks-what-are-they-and-can-a-language-like-ocaml-protect-you/">previously outlined in our blog post</a>, up to 70 per cent of zero-day attacks arise due to memory safety exploits, and the report further outlines how many of the significant cybersecurity exploits in recent times have been facilitated by memory safety vulnerabilities.</p>
<p>The most important conclusion, agreed upon by the report, Tarides, and numerous computer programming experts, is that your choice of programming language significantly impacts the cybersecurity of the final product.</p>
<h2>How Does OCaml-Based Technology Address the Threats?</h2>
<p>The report outlines the main factors affecting hardware and software security, including memory-safe programming, formal methods, software measurability and new challenges facing embedded systems in space. Let's examine these factors and whether using OCaml and OCaml-built tools can mitigate the risks.</p>
<ul>
<li><strong>Memory Safety</strong>
As mentioned above, the report identifies memory safety as a critical issue. Malicious actors can take advantage of memory-unsafe languages like C and C++ <a href="/blog/2023-08-17-your-programming-language-and-its-impact-on-the-cybersecurity-of-your-application/">to access hardware, steal data, deny access to the user, and carry out other malicious activities</a>.</li>
</ul>
<p>OCaml has a strong static type system and is type and memory-safe, including both spatially and temporally memory-safe. This eliminates the most pervasive and currently most destructive vulnerabilities from ever occurring in OCaml-based software. In response to the report's assertion that:</p>
<blockquote>
<p>"The highest leverage method to reduce memory safety vulnerabilities is to secure one of the building blocks of cyberspace: the programming language."</p>
</blockquote>
<p>We could not agree more, and OCaml is a prime example of a programming language that solves the problem that memory exploits present.</p>
<ul>
<li><strong>Formal Methods and Model Checkers</strong>
The report argues that formal methods can “serve as another powerful tool to give software developers greater assurance that entire classes of vulnerabilities, even beyond memory safety bugs, are absent.”</li>
</ul>
<p>OCaml, a language that grew out of an academic context, has several formal method tools available to the developer. Formal verification lets the developer prove the correctness of a piece of code with respect to a certain formal specification or property. <a href="https://github.com/formal-land/coq-of-ocaml#">Coq-of-ocaml</a>, for example, enables programmers to use the expressive formal language Coq to verify properties in OCaml. Coq (also written in OCaml) has been used to formally verify the crypto-currency protocol <a href="https://tezos.com/">Tezos</a>, consisting of over 100,000 lines of OCaml code. Other options available for formal verification include <a href="https://fstar-lang.org/tutorial/tutorial.html#sec-introduction">F*</a>, <a href="https://www.why3.org/">Why3</a>, and <a href="https://www.imandra.ai/">Imandra</a>. Most formal verification tools can export OCaml code, which is what <a href="https://www.nitrokey.com/">Nitrokey</a> uses to run formally verified cryptographic primitives for their <a href="https://www.nitrokey.com/products/nethsm">NetHSM project</a>.</p>
<p>These formal method tools can be integrated directly into the developer toolchain. The <a href="https://ocaml.org/docs/platform">OCaml Platform</a> provides a helpful workflow for developers to write, test, and document their code in a way that includes formal methods. For examples of formal methods in production, both Nitrokey (as mentioned above) and <a href="https://tezos.com/">Tezos</a> use formally verified OCaml components in their workflows.</p>
<p>Another way to apply formal methods is via model checking, and <a href="https://github.com/ocaml-multicore/dscheck">DSCheck</a> allows OCaml programmers to test their concurrent programs and catch hard-to-reproduce bugs. In fact, OCaml Multicore has <a href="/blog/2022-12-22-ocaml-5-multicore-testing-tools/">extensive tooling</a> available for developers to test their multithreaded programs. Consequently, OCaml enables developers to incorporate formal methods in their programs and projects, ensuring more reliable and secure software solutions.</p>
<ul>
<li><strong>Software Measurability</strong>
The report asserts that software measurability is one of the hardest open research problems to address, yet it is still a top priority for improving transparency, introducing useful metrics, and improving cybersecurity in the industry. Significant effort has been made to bring monitoring tools to OCaml to allow users to understand and visualise how their code is performing.</li>
</ul>
<p><a href="https://v2.ocaml.org/releases/5.0/api/Runtime_events.html">Runtime_events</a> and <a href="https://github.com/ocaml-multicore/eio?tab=readme-ov-file#tracing">Eio-trace</a> give OCaml developers tools to monitor how the OCaml GC and runtime (including custom user-generated events) and the I/O stack <a href="https://github.com/ocaml-multicore/eio">Eio</a>, respectively, are behaving, including why they might be underperforming. The observability tool <a href="https://github.com/tarides/runtime_events_tools?tab=readme-ov-file#olly">Olly</a> allows users to visualise this information in ways that make it more accessible and easier to understand. It is important to provide these software measurability tools directly to all OCaml users and developers, allowing them to check their code in a transparent manner.</p>
<p>Furthermore, the <a href="/blog/2022-12-22-ocaml-5-multicore-testing-tools/">OCaml multicore testing tools</a> allow developers to test their multicore code to ensure it is without faults. This gives developers and users insight into how their code behaves before deployment.</p>
<ul>
<li><strong>Space and Embedded Systems</strong>
We were pleased to see the report’s emphasis on secure-by-design systems for outer space, a topic we have addressed alongside <a href="https://parsimoni.co/">Parsimoni</a> in our <a href="/blog/2023-07-31-ocaml-in-space-welcome-spaceos/">SpaceOS</a> proposal. The report states that:</li>
</ul>
<blockquote>
<p>First, the language must allow the code to be close to the kernel so that it can tightly interact with both software and hardware; second, the language must support determinism so the timing of the outputs is consistent; and third, the language must not have – or be able to override – the “garbage collector,” a function that automatically reclaims memory allocated by the computer program that is no longer in use.</p>
</blockquote>
<p>SpaceOS easily meets the first two requirements. Firstly, thanks to <a href="https://mirage.io/">MirageOS</a> unikernel technology, which consists of specialised unikernels running under a hypervisor, SpaceOS has an extremely small footprint with code that runs directly without a kernel. Secondly, SpaceOS is written in OCaml, a deterministic language replete with security features, including memory safety, formal verification, and model checkers, just to name a few. The operating system is being <a href="/blog/2023-12-29-announcing-the-orchide-project-powering-satellite-innovation/">incorporated into advanced satellite systems</a> to facilitate edge computing and software-defined satellites.</p>
<p>Somewhat unsurprisingly, the report cautions against garbage-collected languages, of which OCaml is one. This caution is likely due to the difficulties inherent in verifying garbage collectors and the allocation problems prevalent in most garbage collectors. However, there is a way to allow OCaml’s type system and runtime to control allocation using modal types, something <a href="https://blog.janestreet.com/oxidizing-ocaml-locality/">Jane Street has demonstrated</a> on their blog, and <a href="https://cakeml.org/itp17.pdf">recent research has shown</a> that verified GCs are possible.</p>
<p>Furthermore, we disagree with the claim that garbage collectors are unsuitable for embedded systems. For embedded systems that do not require real-time guarantees, GCs are a generally accepted technology, but even in systems that do have these requirements, there is <a href="https://dl.acm.org/doi/abs/10.1145/1620405.1620421">prior research to show</a> that using a garbage-collected language works well. It is short-sighted to rule out GC languages completely since 99% of application software tolerates GC pauses. It is also important to note that using a language without a GC does not automatically mean that the hard real-time guarantees are met – the allocator may take a non-constant time to find the right-sized slot for an object allocation, for example. GC vs non-GC is a distinction without a difference for the vast majority of embedded system technologies, including systems designed for space.</p>
<ul>
<li><strong>Reduced Attack Surface</strong>
Finally, we want to address an important point that was hinted at in the White House’s report and explicitly mentioned in both the <a href="https://media.defense.gov/2023/Dec/06/2003352724/-1/-1/0/THE-CASE-FOR-MEMORY-SAFE-ROADMAPS-TLP-CLEAR.PDF">CISA report “The Case for Memory Safe Roadmaps”</a> and the EU’s <a href="https://eur-lex.europa.eu/legal-content/EN/TXT/HTML/?uri=CELEX:52022PC0454">Cyber Resilience Act (CRA)</a>, which pertains to an application’s attack surface. The CRA recommends that manufacturers reduce the attack surface of their products to make them more difficult for malicious actors to exploit.</li>
</ul>
<p><a href="https://mirage.io/">MirageOS</a> is a technology written in OCaml that constructs specialised, fully standalone, <a href="https://en.wikipedia.org/wiki/Unikernel">unikernels</a> that can run directly on bare metal or under a hypervisor. Unikernels act as individual software components, and each unikernel is standalone and responsible for one function or task. An application consists of several unikernels working together as a distributed system. As a result, MirageOS applications use up to <a href="https://mirage.io/blog/ccc-2019-leipzig">25 times less memory than traditional applications</a> and have a significantly smaller attack surface than comparable virtualised solutions.</p>
<p>The design principles of MirageOS perfectly align with the report’s assertion that cybersecurity features should be implemented in a product or service from the start. OCaml and MirageOS projects operate on the philosophy of identifying and removing errors as early as possible, building secure solutions and applications from the ground up.</p>
<h2>An Opportunity for Change</h2>
<p>The White House's stance on cybersecurity raises an essential question: have we reached the time for a complete overhaul of our industry's cybersecurity strategy? We have seen multiple governments argue for change in the <a href="https://media.defense.gov/2023/Dec/06/2003352724/-1/-1/0/THE-CASE-FOR-MEMORY-SAFE-ROADMAPS-TLP-CLEAR.PDF">CISA report</a>, and recently, the <a href="https://eur-lex.europa.eu/legal-content/EN/TXT/HTML/?uri=CELEX:52022PC0454">Cyber Resilience Act</a> came one step closer to coming into effect.</p>
<p>Tarides researches, develops, and implements secure-by-design software written in the functional and safe programming language OCaml. OCaml emerged from the French national research institute <a href="https://www.inria.fr/fr">Inria</a> and is now <a href="https://www.janestreet.com/what-we-do/overview/">powering a large part of Wall Street exchanges</a> and <a href="https://www.docker.com/blog/how-docker-desktop-networking-works-under-the-hood/">serves the network traffic of tens of millions of containers daily</a>. Its sophisticated type system, efficient garbage collection, and emphasis on safety make it an ideal candidate for developing secure and mission-critical applications.</p>
<p>At Tarides, we advocate for OCaml to be part of a new strategy in the European cybersecurity sector. Globally, wider adoption of programming languages like OCaml and technologies built upon secure-by-design principles would put research-led, proven technologies first and send a clear message to the industry that the time for change has arrived.</p>
<h2>Get in Touch</h2>
<p>We want to hear from you! Do you have questions about this topic and what it may mean for your projects? You can <a href="/contact/">contact us</a> on our website or share your thoughts on <a href="https://discuss.ocaml.org/">OCaml Discuss</a> if you want to discuss the report's implications further.</p>
<p>You can also stay up-to-date with us on <a href="https://bsky.app/profile/tarides.com">Bluesky</a> and <a href="https://www.linkedin.com/company/tarides/">LinkedIn</a>, and <a href="/contact/">sign up for our newsletter</a> to get regular updates on our projects.</p>
<h3>Acknowledgements</h3>
<p>We would like to thank Isabella Leandersson for her work on this article.</p>
]]></description><link>https://tarides.com/blog/2024-03-07-a-time-for-change-our-response-to-the-white-house-cybersecurity-press-release</link><guid isPermaLink="false">https://tarides.com/blog/2024-03-07-a-time-for-change-our-response-to-the-white-house-cybersecurity-press-release.html</guid><dc:creator><![CDATA[ Thomas Gazagnaire ]]></dc:creator><pubDate>Thu, 07 Mar 2024 00:00:00 GMT</pubDate></item><item><title><![CDATA[Two Major Improvements in odoc: Introducing Search Engine Integration]]></title><description><![CDATA[<p>In the world of OCaml documentation generation, there have been two significant enhancements that promise to make navigating OCaml documentation easier and more efficient. These improvements are divided into two distinct but interrelated components: changes in <code>odoc</code> itself and improvements in a search engine known as Sherlodoc.</p>
<p>These updates make navigating OCaml documentation more efficient and user-friendly, benefiting both seasoned OCaml programmers and those just venturing into the world of OCaml.</p>
<p>If you're new to <code>odoc</code>, you can read more about it in <a href="/blog/2024-01-10-meet-odoc-ocaml-s-documentation-generator/">this introductory blog post</a> and on <a href="https://ocaml.github.io/odoc/">its website</a> (which was also generated by <code>odoc</code>!)</p>
<p>Now, let's dive into these transformative upgrades and see how they enhance the OCaml documentation landscape.</p>
<h2>Changes in <code>odoc</code>: Search Engine Integration</h2>
<p>This year, <code>odoc</code> has taken a giant step forward by allowing seamless integration with search engines. This change is modular, ensuring flexibility in search implementation. As long as a search engine can provide search results in a specific format, it can be utilised with <code>odoc</code>. These changes can be server-side or client-side, written in either JavaScript or OCaml, and executed locally or remotely.</p>
<h4>Search Bar and Display</h4>
<p><code>odoc</code> now offers a user interface that includes a search bar and a display of search results. This feature enhances the overall user experience, making it more intuitive and accessible.</p>
<h4>JSON Index</h4>
<p><code>odoc</code> provides an index in JSON format that search engines can utilise. The search engine uses this index to generate its representation of the search index. OCaml programmers can also leverage specific functions in <code>odoc</code> to generate their custom search indexes.</p>
<h4>Flexible Search Engine</h4>
<p>With the new search bar, you can even write your own search engine. This feature opens up new possibilities for customisation and tailoring the search experience to your specific needs.</p>
<p>The search engine can take the form of a JavaScript file, which can either contain an index for local searches or make requests to a server for larger indexes.</p>
<h4>Web Worker Integration</h4>
<p><code>odoc</code> runs the search engine in a web worker, ensuring asynchronicity and preventing UI blockage. This design decision contributes to a smoother user experience, especially when dealing with extensive documentation.</p>
<p>In essence, <code>odoc</code> has made it incredibly straightforward to integrate search engines into its documentation output. This modular approach allows for diverse and customised search solutions while keeping the user interface consistent.</p>
<h2>Sherlodoc: The Canonical OCaml Search Engine</h2>
<p>While any search engine can be used with <code>odoc</code>, Sherlodoc stands out as the canonical search engine for OCaml. It has undergone significant improvements to enhance its usability and expressiveness. These changes make Sherlodoc an ideal choice for OCaml documentation.</p>
<h4>Integration with <code>odoc</code></h4>
<p>Sherlodoc can now be directly integrated into documentation generated by <code>odoc</code>. It offers the choice to compile it into JavaScript or keep it as a native program, accommodating both client-side and server-side search functionality.</p>
<h4>Search Features</h4>
<p>Sherlodoc introduces powerful search capabilities, allowing users to search within docstrings, constructors, modules, and more. Its expressive search queries provide an efficient way to find the information you need.</p>
<p>The Sherlodoc search engine is designed with OCaml in mind. It allows you to search within docstrings, generated standalone pages, as well as directly in the interface even using specific OCaml-like syntax.</p>
<p>Since it knows about the OCaml type system, you can quickly find functions, values, etc., by writing their type in the query. This level of tailoring enhances the OCaml documentation experience, making it even more accessible and productive.</p>
<p>For instance, searching for <code>_ -&gt; int</code> returns all functions that produce an integer.</p>
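<p>As a rough illustration, the bindings below give three standard-library functions explicit types that all end in <code>int</code>, which is the shape the query <code>_ -&gt; int</code> describes. The binding names are hypothetical and chosen only for this sketch, and which results Sherlodoc actually lists, and in what order, depends on the index it searches.</p>
<pre><code>(* Each annotation type-checks, so each function returns an integer. *)
let length_of_string : string -&gt; int = String.length
let code_of_char : char -&gt; int = Char.code
let length_of_list : int list -&gt; int = List.length
</code></pre>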
<h4>Future Integration</h4>
<p>One of the most exciting aspects of this update is that Sherlodoc will be fully integrated into OCaml.org. Instead of relying on a separate website or tool, you can access all the powerful search capabilities right on OCaml's official website!</p>
<p>This integration brings a multitude of benefits to OCaml.org users. Currently, users can only search using strings, but the new search bar will provide more sophisticated searching. Users can now search more effectively within a specific package or for values with particular types, simplifying the process of locating the exact information you need. Now you spend less time searching and more time coding!</p>
<p>These enhancements align perfectly with our goal of delivering a more cohesive, robust, and user-friendly documentation experience on OCaml.org.</p>
<h2>Collaborative Efforts</h2>
<p>The journey to these enhanced search capabilities involved numerous design phases and questions. We pondered how to create the features we envisioned, how to seamlessly integrate them into OCaml.org, and how to make the search bar functional for small databases, local documentation, and the entirety of opam. These considerations guided the development process, ensuring that the new search bar is versatile, efficient, and accessible to all OCaml engineers.</p>
<p>This achievement wouldn't have been possible without the dedication of several individuals from Tarides. <a href="https://github.com/panglesd">Paul-Elliot Anglès d'Auriac</a> spearheaded this project, which has been his main focus since the summer. He implemented the <code>odoc</code> support for search engines. <a href="https://github.com/Julow">Jules Aguillon</a> supplied invaluable oversight and review, ensuring the project's quality. <a href="https://github.com/emiletrotignon">Emile Trotignon</a> played a vital role in the search engine development and user interface. Their combined efforts have resulted in this remarkable upgrade to <code>odoc</code>.</p>
<h2>Conclusion</h2>
<p>Documentation is an integral part of software development, and <code>odoc</code> is a powerful tool that makes OCaml coding more accessible and efficient. As OCaml continues to evolve and gain traction in various domains, <code>odoc</code>'s commitment to providing robust documentation tools is a testament to the language's growth and the support it offers to its community of developers.</p>
<p>With <code>odoc</code>'s new search bar and the Sherlodoc integration, exploring and navigating OCaml documentation has never been more effortless and productive.</p>
<p>These two major improvements promise to revolutionise how OCaml documentation is accessed and navigated. The combination of a modular <code>odoc</code> and the canonical Sherlodoc search engine brings flexibility, efficiency, and a robust search experience to OCaml developers and users.</p>
<p>This new era of OCaml documentation will provide users with a smooth and intuitive experience, helping them access relevant information quickly and efficiently. The improvements in <code>odoc</code> and the integration of Sherlodoc signal a bright future for OCaml documentation and its users.</p>
<p>Stay tuned for more exciting developments in the world of OCaml!</p>
<blockquote>
<p>Please don't hesitate to join the <code>odoc</code> conversation on <a href="https://discuss.ocaml.org/c/eco/15">discuss.ocaml.org</a> under the Ecosystem category by using the <code>odoc</code> tag. Also, feel free to <a href="https://github.com/ocaml/odoc">open a GitHub issue</a> with any concerns or ideas, as we're always striving to improve our products.</p>
</blockquote>
]]></description><link>https://tarides.com/blog/2024-02-28-two-major-improvements-in-odoc-introducing-search-engine-integration</link><guid isPermaLink="false">https://tarides.com/blog/2024-02-28-two-major-improvements-in-odoc-introducing-search-engine-integration.html</guid><dc:creator><![CDATA[ Paul-Elliot Anglès d'Auriac, Christine Rose ]]></dc:creator><pubDate>Wed, 28 Feb 2024 00:00:00 GMT</pubDate></item><item><title><![CDATA[My Experience With Tarides at ICFP 2023!]]></title><description><![CDATA[<p><a href="https://icfp24.sigplan.org">ICFP 2024</a> will be upon us sooner than you might think! The call for papers closes on the 28th of February, and I wish everyone submitting good luck. Hopefully I will see you around Milan this upcoming September (at the OCaml Workshop, of course!) and to stave off your ICFP cravings until then, enjoy my account of last year's conference.</p>
<p>The 28th ACM SIGPLAN <a href="https://icfp23.sigplan.org">International Conference on Functional Programming</a> (ICFP) was held in Seattle, WA, US between 4-9 September 2023. Tarides was a silver sponsor for the conference and the Programming Languages Mentoring Workshop (PLMW). In this post, I want to share my personal experience at the conference.</p>
<h2>Exploring Conference Activities</h2>
<p>On the first workshop day, I attended the Functional High-Performance and Numerical Computing (FHPNC) workshop organised by Gabriele Keller (Utrecht University) and Sam Westrick (Carnegie Mellon). The workshop has been part of ICFP since 2021, and there were several interesting talks on deep learning, efficient GPU implementation, performance versus correctness, and Multicore parallelisation.</p>
<p>The conference proceedings began in earnest on the second day, with the keynote speech delivered by Dr Anil Madhavapeddy on <a href="https://crank.recoil.org/w/jMqpdGnw5Tuf5sLVr6zpNU">"Programming for the Planet"</a>. Using satellite and ground sensing data to assess the globe’s health, combined with maps and illustrations, Anil explained the interplay between functional programming and biodiversity. The OCaml libraries used in the Geographic Information Systems (GIS) applications are <a href="https://github.com/geocaml">geocaml</a> and <a href="https://github.com/carboncredits"><code>carboncredits</code></a>.</p>
<p><img sizes="(min-width: 1360px) 1360px, (min-width: 680px) 680px, 100vw" srcset="/blog/images/quantify-earth-170w~WIvVBO5JsKf2L-jNc7iOGw.webp 170w, /blog/images/quantify-earth-340w~czx-HY8m0fuj6K0foUuOiw.webp 340w, /blog/images/quantify-earth-680w~FLgVoX0TaR54EF1Hwh1GjA.webp 680w, /blog/images/quantify-earth-1360w~ck-xvvCtDQRRj26jZjwlOw.webp 1360w" src="/blog/images/quantify-earth-1360w~ck-xvvCtDQRRj26jZjwlOw.webp" alt="A slide from Anil's presentation showing a satellite view of a forest with some highlighted green areas. On the left, a toolbar indicates different sliders that the user can press to get different views of the satellite image. Views include: pixel pairings, project area, leakage area, and trajectories. 'Project area' is currently active. Image credit: Patrick Ferris"></p>
<p>In the afternoon, I attended the second keynote by Andreas Rossberg on <a href="https://icfp23.sigplan.org/details/icfp-2023-icfp-keynotes/38/As-low-level-as-possible-but-no-lower">"As low-level as possible, but no lower"</a>, where he explained the history of the WebAssembly (Wasm) project, the
current development efforts, and their future roadmap. It was interesting to discover the use and impact of OCaml in the Wasm project.</p>
<p>I staffed the Tarides booth on all three days of the conference, and it was good to meet many students and faculty interested in working with OCaml. A number of industry folks also learnt about Tarides and the work we do with the OCaml compiler, platform, and ecosystem. As always, we had swag consisting of t-shirts, brochures, tags, and socks for the participants. We also saw interest in OCaml Scientific Computing and SpaceOS from people who visited our booth.</p>
<p>On the fourth day, I participated in the <a href="https://icfp23.sigplan.org/details/icfp-2023-tutorials/4/Porting-Lwt-applications-to-OCaml-5-and-Eio">Tutorial on porting Lwt applications to OCaml 5 and
Eio</a> organised by Thomas Leonard and Jonathan Ludlam. Eio provides easy-to-read code, handy troubleshooting diagnostics, and solid performance. You can use an intermediate <code>Lwt_eio</code> compatibility package to ensure that the transition is smooth before completely switching to Eio. The <a href="https://github.com/ocaml-multicore/icfp-2023-eio-tutorial">Lwt to Eio tutorial</a> is available to help you port an OCaml 4 program to OCaml 5 and Eio.</p>
<h2>The OCaml Workshop</h2>
<p>On the last day of the conference, I attended the sessions at the OCaml workshop. Thomas Leonard gave a wonderful overview on <a href="https://icfp23.sigplan.org/details/ocaml-2023-papers/5/Eio-1-0-Effects-based-IO-for-OCaml-5">Eio 1.0 - Effects-based IO for OCaml
5</a>. Eio provides an effects-based direct-style IO for OCaml 5, which has Multicore support and uses lock-free data structures and modular programming. An experience report in migrating from OCaml 4 to 5 was also shared along with benchmark results for comparison.</p>
<p>Olivier Nicole and Fabrice Buroro shared their knowledge in the <a href="https://icfp23.sigplan.org/details/ocaml-2023-papers/12/Runtime-Detection-of-Data-Races-in-OCaml-with-ThreadSanitizer">Runtime Detection of Data Races in OCaml with ThreadSanitizer</a> talk. With parallel code, it is challenging to detect data race bugs. The ThreadSanitizer library provides the instrumentation and tooling to reliably detect such data races at runtime. A blog post on <a href="/blog/2023-10-18-off-to-the-races-using-threadsanitizer-in-ocaml/">using ThreadSanitizer in OCaml</a> is also available on the Tarides website.</p>
<p>The <a href="https://icfp23.sigplan.org/details/ocaml-2023-papers/6/Building-a-lock-free-STM-for-OCaml">Building a lock-free STM for OCaml</a> talk was presented by Vesa Karvonen. This talk introduced the <a href="https://github.com/ocaml-multicore/kcas"><code>kcas</code></a> library which provides an efficient, lock-free, and composable software transactional memory for OCaml. Vesa started with the anatomy of a transaction and then explained <code>kcas</code>' agnostic scheduling, interfaces, atomic guarantees, and performance optimisation of the implementation. The presentation slides are <a href="https://polytypic.github.io/kcas-talk/#/">available online</a>.</p>
<p><img sizes="(min-width: 1360px) 1360px, (min-width: 680px) 680px, 100vw" srcset="/blog/images/vesaICFP-170w~2-LFYv2JoZoBGkmr48khlg.webp 170w, /blog/images/vesaICFP-340w~zKXjECVrOkkkW1sFjC3jgA.webp 340w, /blog/images/vesaICFP-680w~NzWHQXx2X1C8G3yXsfLxcw.webp 680w, /blog/images/vesaICFP-1360w~DdwEJ7tkjJk8Xi2y1CN0_A.webp 1360w" src="/blog/images/vesaICFP-1360w~DdwEJ7tkjJk8Xi2y1CN0_A.webp" alt="A slide from Vesa's presentation showing a diagram of the 'Verify' process."></p>
<p>Thibaut Mattio presented <a href="https://icfp23.sigplan.org/details/ocaml-2023-papers/15/State-of-the-OCaml-Platform-2023">The state of the OCaml Platform 2023</a> reflecting the progress made since the release of <code>opam 1.0</code>. The <a href="https://dune.build/">Dune project</a> has become the primary build system tool for the OCaml ecosystem. He mentioned the development efforts and support for the OCaml Language Server Protocol (LSP). Thibaut also shared the now adopted <a href="https://discuss.ocaml.org/t/the-ocaml-platform-roadmap-is-adopted/13459">OCaml Platform Roadmap</a> and plan for the coming years, expressing our wish to continue the engagement with the community for feedback and to work closely with the Platform tool maintainers and industrial users of OCaml.</p>
<p>There were of course many other incredibly interesting presentations at ICFP. I recommend checking out the entire <a href="https://icfp23.sigplan.org/home/ocaml-2023#program">program for the OCaml workshop</a>, as well as <a href="https://www.youtube.com/watch?v=RH1sKJMZI3g&amp;list=PLyrlk8Xaylp5yZHjvOlJo_63AtJiOQhag&amp;index=2">Anil Madhavapeddy's keynote</a> on using functional programming to help the planet, and <a href="https://www.youtube.com/watch?v=Lb45xIcqGjg&amp;list=PLyrlk8Xaylp5yZHjvOlJo_63AtJiOQhag&amp;index=3">Andreas Rossberg's keynote on Wasm</a>. It's especially interesting to note the role that OCaml plays in both of these important projects.</p>
<h2>Until Next Time!</h2>
<p>Since the big announcement of Multicore OCaml at ICFP 2022 in Ljubljana, Slovenia, we have observed a growing interest in OCaml, and we hope to continue this momentum to reach out to the larger community. We encourage you to review the tutorials, presentations, and libraries for yourself. Reach out to us on <a href="https://discuss.ocaml.org/">OCaml Discuss</a> with your feedback and questions, or <a href="/contact/">contact us</a> directly on our website. You can also <a href="https://bsky.app/profile/tarides.com">follow us on Bluesky</a> and <a href="https://www.linkedin.com/company/tarides">LinkedIn</a>.</p>
<p>I would like to thank Tarides for sponsoring my travel, and I look forward to seeing you in Milan, Italy for ICFP 2024!</p>
]]></description><link>https://tarides.com/blog/2024-02-21-my-experience-with-tarides-at-icfp-2023</link><guid isPermaLink="false">https://tarides.com/blog/2024-02-21-my-experience-with-tarides-at-icfp-2023.html</guid><dc:creator><![CDATA[ Shakthi Kannan ]]></dc:creator><pubDate>Wed, 21 Feb 2024 00:00:00 GMT</pubDate></item><item><title><![CDATA[Multicore Testing Tools: DSCheck Pt 1]]></title><description><![CDATA[<p>Reaping the plentiful benefits of parallel programming requires the careful management of the intricacies that come with it. Tarides played a significant part in making <a href="https://github.com/ocaml-multicore/ocaml-multicore">OCaml Multicore</a> a reality, and we have continued to work on supporting tools that make parallel programming in OCaml as seamless as possible.</p>
<p>To that end, the OCaml <a href="https://kcsrk.info/webman/manual/memorymodel.html">memory model</a> is carefully designed to help developers reason about their programs, and OCaml 5 introduced several guarantees to make multi-threaded programming safer and more predictable. Tarides also recently brought <a href="/blog/2023-10-18-off-to-the-races-using-threadsanitizer-in-ocaml/">ThreadSanitizer</a> support to OCaml, which lets users check their code for possible data races.</p>
<p>This post introduces the <a href="https://github.com/ocaml-multicore/dscheck#motivation">DSCheck</a> library, a model checker written in OCaml. It helps developers catch non-deterministic, hard-to-reproduce bugs in parallel programs. Read on to discover why and how we use DSCheck to thoroughly test multi-threaded code before deploying it!</p>
<h2>Why Use DSCheck in the First Place?</h2>
<p>Formally, concurrent programming means that an <a href="https://medium.com/@itIsMadhavan/concurrency-vs-parallelism-a-brief-review-b337c8dac350#:~:text=A%20system%20is%20said%20to,the%20phrase%20%E2%80%9Cin%20progress.%E2%80%9D">application is making progress on more than one task at the same time</a>. In addition, parallel programming allows for more than one concurrent process to happen simultaneously. When several concurrent processes share a resource in parallel, the complexity and the possibility of bugs increases by several degrees.</p>
<p>This is why developers hear the terms deadlock, starvation, and data races so often when they learn about multicore programming or concurrency. These terms are all different ways of saying the same thing: multicore programming is hard!</p>
<p>Enter DSCheck and other <a href="/blog/2022-12-22-ocaml-5-multicore-testing-tools/">multicore testing tools</a>! DSCheck is a testing tool with a particular use case: it is used for algorithms that do not spawn domains themselves, are lock-free, and use atomics. Developers can use lock-free multicore programming to significantly boost performance, but they need to check that all the executions possible on the multiple cores (called interleavings) are valid to assert that their program is without faults. Manually, this would be a gargantuan task but with DSCheck, it's made easy!</p>
<h2>When To Use DSCheck: A Naive Counter Implementation</h2>
<p>Let’s look at a practical example of when to use DSCheck. For instance, if we choose to implement a naive counter in OCaml Multicore, it might look something like this:</p>
<pre><code><span class="ocaml-keyword-other">module</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Counter</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">struct</span><span class="ocaml-source">
</span><span class="ocaml-source">	</span><span class="ocaml-keyword-other">type</span><span class="ocaml-source"> </span><span class="ocaml-source">t</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-support-type">int</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Atomic</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">t</span><span class="ocaml-source">
</span><span class="ocaml-source">	</span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">create</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-unit">()</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Atomic</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">make</span><span class="ocaml-source"> </span><span class="ocaml-constant-numeric-decimal-integer">0</span><span class="ocaml-source">
</span><span class="ocaml-source">	</span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">incr</span><span class="ocaml-source"> </span><span class="ocaml-source">counter</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source">
</span><span class="ocaml-source">		</span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">old_value</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Atomic</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">get</span><span class="ocaml-source"> </span><span class="ocaml-source">counter</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">in</span><span class="ocaml-source">
</span><span class="ocaml-source">		</span><span class="ocaml-constant-language-capital-identifier">Atomic</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">set</span><span class="ocaml-source"> </span><span class="ocaml-source">counter</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-source">old_value</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">+</span><span class="ocaml-source"> </span><span class="ocaml-constant-numeric-decimal-integer">1</span><span class="ocaml-source">)</span><span class="ocaml-source">
</span><span class="ocaml-source">	</span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">get</span><span class="ocaml-source"> </span><span class="ocaml-source">counter</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Atomic</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">get</span><span class="ocaml-source"> </span><span class="ocaml-source">counter</span><span class="ocaml-source">
</span><span class="ocaml-keyword-other">end</span><span class="ocaml-source">
</span></code></pre>
<p>Now, say we increment the counter on 2 domains in parallel. If our implementation is correct, the counter should be at 2 at the end.</p>
<pre><code><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">main</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-unit">()</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source">
</span><span class="ocaml-source">	</span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">counter</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Counter</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">create</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-unit">()</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">in</span><span class="ocaml-source">
</span><span class="ocaml-source">	</span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">domainA</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Domain</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">spawn</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-keyword-other">fun</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-unit">()</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Counter</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">incr</span><span class="ocaml-source"> </span><span class="ocaml-source">counter</span><span class="ocaml-source">)</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">in</span><span class="ocaml-source">
</span><span class="ocaml-source">	</span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">domainB</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Domain</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">spawn</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-keyword-other">fun</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-unit">()</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Counter</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">incr</span><span class="ocaml-source"> </span><span class="ocaml-source">counter</span><span class="ocaml-source">)</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">in</span><span class="ocaml-source">
</span><span class="ocaml-source">	</span><span class="ocaml-constant-language-capital-identifier">Domain</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">join</span><span class="ocaml-source"> </span><span class="ocaml-source">domainA</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source">
</span><span class="ocaml-source">	</span><span class="ocaml-constant-language-capital-identifier">Domain</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">join</span><span class="ocaml-source"> </span><span class="ocaml-source">domainB</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source">
</span><span class="ocaml-source">	</span><span class="ocaml-keyword-other">assert</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-constant-language-capital-identifier">Counter</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">get</span><span class="ocaml-source"> </span><span class="ocaml-source">counter</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-constant-numeric-decimal-integer">2</span><span class="ocaml-source">)</span><span class="ocaml-source">
</span></code></pre>
<p>However, there are several ways in which the incrementation can go wrong<sup><a href="#fn-1" id="ref-1-fn-1" role="doc-noteref" class="fn-label">[1]</a></sup> and cause the counter to actually hold the value <code>1</code> by the end of the execution. To figure out what has gone wrong, we need to unfold all the possible interleavings in which the domains can execute their accesses to their shared values – here only <code>counter</code>. Both domains perform two accesses to the counter: first a read (<code>get</code>) then a write (<code>set</code>).</p>
<p>Since OCaml’s <code>atomics</code> guarantee that program order is kept between atomic operations on a single domain, we cannot reorder the operations made by the same domain.</p>
<p>However, there are still a few possible interleavings. Domain B could for example only begin working after A is done. This gives us interleaving 1:</p>
<div role="region"><table>
<tbody><tr>
<th>Step</th>
<th>Domain A</th>
<th>Domain B</th>
<th>Counter</th>
</tr>
<tr>
<td>1</td>
<td><code>Atomic.get counter</code></td>
<td></td>
<td>0</td>
</tr>
<tr>
<td>2</td>
<td><code>Atomic.set counter (0+1)</code></td>
<td></td>
<td>1</td>
</tr>
<tr>
<td>3</td>
<td></td>
<td><code>Atomic.get counter</code></td>
<td>1</td>
</tr>
<tr>
<td>4</td>
<td></td>
<td><code>Atomic.set counter (1+1)</code></td>
<td>2</td>
</tr>
</tbody></table></div><p>Or they could do the same in reverse order (interleaving 2):</p>
<div role="region"><table>
<tbody><tr>
<th>Step</th>
<th>Domain A</th>
<th>Domain B</th>
<th>Counter</th>
</tr>
<tr>
<td>1</td>
<td></td>
<td><code>Atomic.get counter</code></td>
<td>0</td>
</tr>
<tr>
<td>2</td>
<td></td>
<td><code>Atomic.set counter (0+1)</code></td>
<td>1</td>
</tr>
<tr>
<td>3</td>
<td><code>Atomic.get counter</code></td>
<td></td>
<td>1</td>
</tr>
<tr>
<td>4</td>
<td><code>Atomic.set counter (1+1)</code></td>
<td></td>
<td>2</td>
</tr>
</tbody></table></div><p>Or they could actually <em>interleave</em> their actions, which results in interleaving 3 (and interleaving 4 by permuting A and B):</p>
<div role="region"><table>
<tbody><tr>
<th>Step</th>
<th>Domain A</th>
<th>Domain B</th>
<th>Counter</th>
</tr>
<tr>
<td>1</td>
<td><code>Atomic.get counter</code></td>
<td></td>
<td>0</td>
</tr>
<tr>
<td>2</td>
<td></td>
<td><code>Atomic.get counter</code></td>
<td>0</td>
</tr>
<tr>
<td>3</td>
<td><code>Atomic.set counter (0+1)</code></td>
<td></td>
<td>1</td>
</tr>
<tr>
<td>4</td>
<td></td>
<td><code>Atomic.set counter (0+1)</code></td>
<td>1</td>
</tr>
</tbody></table></div><p>Interleavings 1 and 2 work fine, as the counter ends up with a value of 2. The problem arises in interleavings 3 and 4: both A and B read the counter while it is still 0, so both write 1, and one of the two increments is lost.</p>
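<p>This is obviously quite a simple example (and yes, you can totally avoid this bug by using <code>Atomic.incr</code>), but what it shows is that even with just two domains doing the same thing, composed of only two lines of code, we already end up with six possible interleavings: the four shown above, plus two more in which the domains perform their writes in the opposite order to their reads (these also leave the counter at 1). This is why we need DSCheck!</p>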
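<p>To make the <code>Atomic.incr</code> remark concrete, here is a minimal sketch that contrasts the two-step increment with the single-step one. The helper names <code>naive_incr</code>, <code>safe_incr</code>, and <code>run</code> are ours, chosen for illustration only; everything else comes from the OCaml 5 standard library.</p>
<pre><code>(* Naive increment: a separate read and write, so another domain can
   slip in between the two steps and an update can be lost. *)
let naive_incr counter = Atomic.set counter (Atomic.get counter + 1)

(* Atomic.incr performs the read-modify-write as one indivisible step. *)
let safe_incr counter = Atomic.incr counter

(* Run [incr] once in each of two domains and return the final value. *)
let run incr =
  let counter = Atomic.make 0 in
  let a = Domain.spawn (fun () -&gt; incr counter) in
  let b = Domain.spawn (fun () -&gt; incr counter) in
  Domain.join a;
  Domain.join b;
  Atomic.get counter

let () =
  (* With Atomic.incr the result is always 2. *)
  assert (run safe_incr = 2);
  (* With the naive version the result is 1 or 2, depending on timing. *)
  let v = run naive_incr in
  assert (v = 1 || v = 2)
</code></pre>
<p>In practice the naive version will almost always return 2 as well, which is exactly why this kind of bug is so hard to catch with ordinary tests.</p>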
<p>DSCheck is a model checker. Its job is to find all possible interleavings (what we just did manually) and test that every single one returns the expected result. If not, DSCheck returns the first interleaving it finds that fails, and lets you do the work of debugging the code now that you know where and what the problem is. We will show you how we write a test for a naive counter implementation using DSCheck in part 2, so keep a lookout for that.</p>
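<p>To give a feel for what "finding all possible interleavings" means, here is a toy simulation written with the standard library only. It is not how DSCheck is implemented and does not use its API: it simply models each domain as a read step followed by a write step, enumerates every schedule that respects program order, and reports the first one that loses an increment. The names <code>run_schedule</code> and <code>schedules</code> are hypothetical.</p>
<pre><code>(* Each domain runs two steps in program order: read the counter into a
   local variable, then write local + 1 back. *)
type domain = { mutable pc : int; mutable local : int }

(* Execute one schedule (a list of domain indices) and return the final
   value of the counter. *)
let run_schedule schedule =
  let counter = ref 0 in
  let domains = [| { pc = 0; local = 0 }; { pc = 0; local = 0 } |] in
  List.iter
    (fun i -&gt;
      let d = domains.(i) in
      if d.pc = 0 then d.local &lt;- !counter (* get *)
      else counter := d.local + 1; (* set *)
      d.pc &lt;- d.pc + 1)
    schedule;
  !counter

(* All schedules in which domain 0 still has [a] steps to run and
   domain 1 still has [b] steps to run. *)
let rec schedules a b =
  if a = 0 &amp;&amp; b = 0 then [ [] ]
  else
    (if a &gt; 0 then List.map (fun s -&gt; 0 :: s) (schedules (a - 1) b) else [])
    @ (if b &gt; 0 then List.map (fun s -&gt; 1 :: s) (schedules a (b - 1)) else [])

let () =
  let all = schedules 2 2 in
  let bad = List.filter (fun s -&gt; run_schedule s &lt;&gt; 2) all in
  Printf.printf "%d schedules, %d lose an increment\n"
    (List.length all) (List.length bad);
  (* Report the first failing schedule, as a model checker would. *)
  match bad with
  | s :: _ -&gt;
      let name i = if i = 0 then "A" else "B" in
      print_endline (String.concat " " (List.map name s))
  | [] -&gt; print_endline "all schedules pass"
</code></pre>
<p>Running this prints <code>6 schedules, 4 lose an increment</code> followed by <code>A B A B</code>, which is interleaving 3 from the tables above. Brute-force enumeration like this grows very quickly with the number of steps, which is one reason to rely on a dedicated tool instead.</p>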
<h2>What is DSCheck Currently Used For?</h2>
<p>We have used DSCheck in several of the libraries and tools we maintain to verify certain aspects of them. For example, in the <a href="https://github.com/ocaml-multicore/saturn">Saturn</a> library of parallelism-safe data structures for OCaml, DSCheck is used to verify both the lock-freedom and safety properties of the structures. The effects-based IO stack <a href="https://github.com/ocaml-multicore/eio">Eio</a> also has its internal lock-free data structures verified with DSCheck. In short, DSCheck lets you build complex Multicore data structures with confidence.</p>
<h2>Part Two</h2>
<p>We'll end this part here now that you have an idea of what DSCheck is and when it is used. Next time, we will look at how DSCheck works in greater detail, including how we use it in Saturn. Check out our blog for part two, and connect with us on <a href="https://bsky.app/profile/tarides.com">Bluesky</a> and <a href="https://www.linkedin.com/company/tarides">LinkedIn</a> to stay up-to-date with what we're up to. See you soon!</p>
<section role="doc-endnotes"><ol>
<li id="fn-1">
<p>Note that the probability of this bug happening is low and may be hard to witness. You will most likely need to add a barrier to synchronise the domains.</p>
<span><a href="#ref-1-fn-1" role="doc-backlink" class="fn-label">↩︎︎</a></span></li></ol></section>
]]></description><link>https://tarides.com/blog/2024-02-14-multicore-testing-tools-dscheck-pt-1</link><guid isPermaLink="false">https://tarides.com/blog/2024-02-14-multicore-testing-tools-dscheck-pt-1.html</guid><dc:creator><![CDATA[ Carine Morel, Isabella Leandersson ]]></dc:creator><pubDate>Wed, 14 Feb 2024 00:00:00 GMT</pubDate></item><item><title><![CDATA[Improving OCaml.org to Provide an Engaging UX and Trusted User Resources]]></title><description><![CDATA[<p>The OCaml.org team, which includes members from Tarides, have been working hard to improve OCaml.org. In the first half of 2023, our primary focus was on the much-needed overhaul of OCaml.org's documentation and overall UX Design. We aspired to enhance user experience and accessibility throughout the year, with an eye on boosting accessibility as we progress. The transition to more semantic HTML was a crucial step in addressing the previously subpar accessibility. Simultaneously, we've been continuously working on revamping the Learn section, striving for a more streamlined and informative experience. The team's efforts also included ongoing UI enhancements and dark mode designs to align with modern preferences, which will be implemented shortly, promising a sleek and adaptable interface for users.</p>
<p>Users must be able to browse package documentation, learn and experiment with OCaml, and explore the vibrant community and extensive ecosystem surrounding OCaml. The website's revamp aims to align with these core objectives, ensuring that users effortlessly find, navigate, and gather essential information. By refining OCaml.org's UX and documentation, we aspire to bridge the gap between curiosity and comprehension, empowering users to delve into OCaml confidently, engage with its resources, and ultimately contribute to and thrive within its dynamic community.</p>
<p>While there's much to say about the updated documentation and reconstruction of the Learn section, we'll save that for another post. For this announcement, we'll focus on the design changes. Tarides UX/UI Designer, Claire Vandenberghe, led the aesthetic renovations outlined below.</p>
<h3>Modes for Accessibility</h3>
<p>Currently, the design phase is underway, and we've made partial strides toward implementing the light mode. However, our focus extends far beyond merely publishing updated landing pages. Our efforts also involve an extensive overhaul guided by <a href="https://www.w3.org/WAI/standards-guidelines/wcag/">Web Content Accessibility Guidelines (WCAG)</a> color contrast guidelines to guarantee improved accessibility. This includes creating a dark mode. This comprehensive approach demonstrates our commitment to not only refine the overall aesthetic of OCaml.org but also enhance inclusivity and user experience across the board.</p>
<p>Our primary focus has been completing three main landing pages, and they've been designed to provide both light and dark modes. They will be fully published soon, but here's a sneak peek of the improvements so far:</p>
<ul>
<li>Learning: Discover our educational resources and opportunities to expand your knowledge.</li>
<li>Packages: Explore and find the perfect package for your needs.</li>
<li>Community: Connect with like-minded individuals and contribute to OCaml conversations through the dynamic <a href="https://discuss.ocaml.org/">Discuss</a> forums. You can also submit your blog's RSS feed to <a href="https://v2.ocaml.org/community/planet/">OCaml Planet</a>, a collection of OCaml-based blog posts.</li>
</ul>
<p><img sizes="(min-width: 1360px) 1360px, (min-width: 680px) 680px, 100vw" srcset="/blog/images/packages2-170w~ypBE9KKk0uDNEgZB3PXYZA.webp 170w, /blog/images/packages2-340w~foMgnk2e8qp6Ud0zpDFVAg.webp 340w, /blog/images/packages2-680w~8WL5dj3YabjtdTvndES0oQ.webp 680w, /blog/images/packages2-1360w~SD6_nCzDcKKwnzCSbS8bHg.webp 1360w" src="/blog/images/packages2-1360w~SD6_nCzDcKKwnzCSbS8bHg.webp" alt="Mobile Preview"></p>
<h3>Mobile and Tablet Optimisation</h3>
<p>In addition to creating visually appealing landing pages, we are committed to making sure they are easily accessible on mobile and tablet devices. You can expect a seamless experience regardless of your preferred platform.</p>
<h3>Prioritising Accessibility and User Engagement</h3>
<p>We believe in making our project inclusive and user-friendly. To achieve this, we’ve dedicated time to improving accessibility, engaging in valuable user discussions, conducting surveys, and carrying out user interviews. This collaborative effort ensures that our project maintains consistency across all pages and modes.</p>
<h3>Teamwork and Collaboration</h3>
<p>The past weeks have seen fantastic teamwork and collaboration between our users and team members. We are immensely grateful for your input and support in making our project the best it can be.</p>
<h3>Get a Sneak Peek</h3>
<p>If you’re curious and want to take a closer look at our design, visit <a href="https://www.figma.com/file/6BSOEqSsyQeulwLo2pjs9r/Untitled?type=design&amp;node-id=0%3A1&amp;mode=design&amp;t=GwVxvrXItX7k8pP9-1">our Figma design files</a>. You can add comments expressing your ideas and feedback directly on this Figma file, too!</p>
<h3>Conclusion</h3>
<p>As the OCaml.org design team's work unfolds, our dedication to both aesthetics and accessibility is at the core of our mission. Landing web pages should not only capture the eye but also offer a user experience that transcends the boundaries of devices.</p>
<p>The <a href="https://ocaml.org">OCaml.org</a> team looks forward to the continued support and engagement of our users as we march ahead, committed to creating web pages that are as beautiful as they are accessible.</p>
<p>We’re excited about the progress we’ve made and can’t wait to share more with you in the coming months. Thank you for being a part of our journey!</p>
<p>If you'd like to give your input on the OCaml.org Revamp Project, please join the conversation on <a href="https://discuss.ocaml.org/tag/ocamlorg">Discuss</a>. We have an OCaml.org Newsletter each month summarising our progress. Find them and let us know your thoughts using the <a href="https://discuss.ocaml.org/tag/ocamlorg-newsletter"><code>ocamlorg-newsletter</code> tag</a>.</p>
]]></description><link>https://tarides.com/blog/2024-02-07-improving-ocaml-org-to-provide-an-engaging-ux-and-trusted-user-resources</link><guid isPermaLink="false">https://tarides.com/blog/2024-02-07-improving-ocaml-org-to-provide-an-engaging-ux-and-trusted-user-resources.html</guid><dc:creator><![CDATA[ Christine Rose ]]></dc:creator><pubDate>Wed, 07 Feb 2024 00:00:00 GMT</pubDate></item><item><title><![CDATA[Are Your Programs Doing What You Think They're Doing? Introducing Monitoring Tools for Multicore OCaml]]></title><description><![CDATA[<p>As programs grow in size and complexity, they become more challenging to optimise. When the cause of a particular performance issue can theoretically be attributed to multiple sources, developers need concrete data to drive their decision making and avoid time-consuming guesswork. As you can imagine, OCaml 5’s new multicore capabilities – whilst bringing <a href="/blog/2023-07-07-making-ocaml-5-succeed-for-developers-and-organisations/">significant performance improvements</a> – can compound this problem even further.</p>
<p>In light of this, it’s easy to see how organisations that use large numbers of co-operating servers to run big systems can struggle to narrow down even the <em>when</em> and <em>how</em> of a performance drop. Fortunately for users of OCaml, the language comes with built-in features that allow them to monitor its runtime and get automatic reports.</p>
<p>OCaml 5 introduced multicore support and, alongside it, the ring-buffer-based monitoring system <a href="https://v2.ocaml.org/releases/5.0/api/Runtime_events.html"><code>runtime_events</code></a>. Since then, teams have been adding more features, including custom events, which allow developers to monitor user events, and <a href="https://github.com/tarides/runtime_events_tools">Olly</a>, a tool that provides nicely formatted data to help users visualise program behaviour. These features make performance in OCaml easier to troubleshoot, optimise, and monitor.</p>
<h2>The Eventlog Legacy</h2>
<p>Monitoring the OCaml runtime is not a new idea, and before the 5.0 release, the language supported it via a feature called <em>Eventlog</em>. As the name suggests, <code>eventlog</code> monitored the runtime by logging events to a file. Over time, this produced massive log files that used up a lot of disk space.</p>
<p>Due to <code>eventlog</code>’s design, it was unsuitable for long-running programs that needed to be monitored continuously for extended periods. Users had to set up their runtime in a special way to use it in the first place, and then it would still introduce a performance hit. As a result, there was a push to upgrade the feature and make it more widely applicable in conjunction with OCaml 5. Enter Eventring!</p>
<h2>Ring Buffers and Runtime Events</h2>
<p><code>Eventring</code> was the previous name for what is now called <code>runtime_events</code>. In 2021, <a href="https://tezos.com/">Tezos</a> needed more monitoring tools for the OCaml runtime, and they originally funded <a href="https://github.com/sadiqj">Sadiq Jaffer’s</a> (then as part of <a href="https://www.opsian.com/blog/">Opsian</a>) efforts at introducing the <code>eventring</code> monitoring system. The ‘ring’ part hints at the ring-buffer-based system he used to replace <code>eventlog</code>.</p>
<p>A ring buffer is a data structure that consists of two pointers in a linear backing array, where the “<a href="https://v2.ocaml.org/manual/runtime-tracing.html">tail pointer points to a location where new events can be written and the head pointer points to the oldest event in the buffer that can be read.</a>” When there is no more space in the array (when the tail pointer reaches the head pointer), the head pointer is advanced, and the oldest events are overwritten. When either pointer reaches the end of the array, it wraps around to the beginning.</p>
<p>The ring buffer can continuously write and overwrite data from the runtime into the array, keeping the memory used constant. This system stays lightweight and low-impact rather than creating an ever-increasing log file. When enabled, the <code>runtime_events</code> architecture introduces less than 0.5% overhead so that users can monitor their runtime continuously without performance woes.</p>
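<p>The overwrite-when-full behaviour described above can be sketched in a few lines of OCaml. This is a simplified, single-threaded illustration of the data structure only, not the runtime's actual implementation; the type <code>ring</code> and the functions <code>create</code>, <code>push</code>, and <code>pop</code> are hypothetical names.</p>
<pre><code>(* A fixed-capacity ring buffer that overwrites its oldest element when full. *)
type 'a ring = {
  data : 'a option array; (* linear backing array *)
  mutable head : int;     (* index of the oldest readable element *)
  mutable tail : int;     (* index where the next element is written *)
  mutable length : int;   (* number of elements currently stored *)
}

let create capacity =
  { data = Array.make capacity None; head = 0; tail = 0; length = 0 }

let capacity r = Array.length r.data

let push r x =
  if r.length = capacity r then
    (* Full: drop the oldest element by advancing the head, wrapping around. *)
    r.head &lt;- (r.head + 1) mod capacity r
  else r.length &lt;- r.length + 1;
  r.data.(r.tail) &lt;- Some x;
  r.tail &lt;- (r.tail + 1) mod capacity r

let pop r =
  if r.length = 0 then None
  else begin
    let x = r.data.(r.head) in
    r.head &lt;- (r.head + 1) mod capacity r;
    r.length &lt;- r.length - 1;
    x
  end

let () =
  let r = create 3 in
  List.iter (push r) [ 1; 2; 3; 4 ]; (* 4 overwrites 1 *)
  assert (pop r = Some 2);
  assert (pop r = Some 3);
  assert (pop r = Some 4);
  assert (pop r = None)
</code></pre>
<p>However many elements are pushed, the backing array never grows, which is the property that keeps memory use constant.</p>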
<p><code>Runtime_events</code> emits raw events, which are low-level pieces of data ready to be combined into meaningful reports by tools like Olly. There are three main types of events that <code>runtime_events</code> emits:</p>
<ul>
<li>Spans: These are events spanning a period of time, with a starting and an ending point. For example, a span beginning when a minor collection starts in the garbage collector (GC) and ending when it stops.</li>
<li>Lifecycle Events: These occur at a moment in time. For example, a lifecycle event can be emitted when a domain terminates.</li>
<li>Counters: These events include a measurement of quantity, such as the number of words promoted from the minor to the major heap during the last minor GC.</li>
</ul>
<p>These events allow developers to monitor the OCaml runtime by enabling <code>runtime_events</code> and choosing what classes of events they want to receive. You can run OCaml as usual and leave the monitoring system in the background. When something of interest happens, such as a performance drop, you can retrieve the recently emitted events from the ring buffers and examine precisely what the runtime was doing.</p>
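<p>As a rough sketch of what consuming these events looks like, the program below starts <code>runtime_events</code>, forces a garbage collection so that there is something to observe, and then drains the ring buffers while printing the start of each span. It assumes OCaml 5 and that the program is linked against the <code>runtime_events</code> library that ships with the compiler; the function name <code>count_span_starts</code> is ours.</p>
<pre><code>(* Build against the [runtime_events] library, for example with
   (libraries runtime_events) in a dune file. *)
let count_span_starts () =
  (* Start writing runtime events into this process's ring buffers. *)
  Runtime_events.start ();
  (* [None] means: read this process's own ring buffers. *)
  let cursor = Runtime_events.create_cursor None in
  let spans = ref 0 in
  let callbacks =
    Runtime_events.Callbacks.create
      ~runtime_begin:(fun domain _timestamp phase -&gt;
        incr spans;
        Printf.printf "domain %d: %s begins\n" domain
          (Runtime_events.runtime_phase_name phase))
      ()
  in
  (* Make the runtime do some work so that there is something to observe. *)
  Gc.full_major ();
  (* Drain the events currently available in the ring buffers. *)
  let events = Runtime_events.read_poll cursor callbacks None in
  Runtime_events.free_cursor cursor;
  (events, !spans)

let () =
  let events, spans = count_span_starts () in
  Printf.printf "read %d events, %d of them span starts\n" events spans
</code></pre>
<p>Only the <code>runtime_begin</code> callback is supplied here; the same <code>Callbacks.create</code> function also accepts optional callbacks for span ends, counters, and lifecycle events.</p>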
<p>With <code>runtime_events</code>, users gain an unprecedented understanding of what the runtime is doing at different points of interest, and can extract performance data continuously with very low overhead.</p>
<p>As of OCaml 5.0, this feature was exclusive to GC and runtime events, meaning user events were left out. However, the OCaml 5.1 release would change this by introducing <strong>custom events</strong>.</p>
<h2>Custom Events</h2>
<p>The addition of <code>runtime_events</code> and its spans, counters, and lifecycle events inspired <a href="https://www.lortex.org/">Lucas Pluvinage</a> to add support for custom events. Custom events are generated by the user's program rather than by OCaml itself. Custom event support lets you emit your own events through <code>runtime_events</code>, alongside those of the runtime, so you can see what the GC and runtime were doing while a given part of your program was active and whether other user events were triggered.</p>
<p>One example of how to use custom events comes from Lucas himself. He was trying to understand some performance issues he was experiencing in a multicore Eio program. He figured out – using <code>runtime_events</code> and <code>olly</code> – that the domains had difficulty synchronising in the major GC. One domain was waiting for 200 milliseconds(!) without him being able to figure out why. But by adding span-type custom events to <code>eio</code>, he could see that the time was spent in a system call not marked as a blocking section. Armed with this information, Lucas could finally address the underlying issue.</p>
<p>The motivation behind this feature is to give more experienced users tools that give them greater freedom and specificity to monitor and optimise their workflows. For casual users, the three standard events (spans, counters, and lifecycle events) are great for getting a good overview of the runtime. Still, custom events allow for a more granular approach.</p>
<p>Now, adding custom events for phases of your program does help you understand how they affect the runtime and vice-versa, but that is not the only way to use them. If you wanted to, you could ignore events from the runtime entirely and only use custom events to understand what your own programs are doing. For example, how long have they been waiting for data from external services? How big are their internal queues? What kind of latency is there for each request they serve?</p>
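<p>Here is a small sketch of what emitting custom events might look like, assuming OCaml 5.1 or later and the <code>runtime_events</code> library. The event names (<code>myapp.fetch_request</code>, <code>myapp.queue_length</code>), the tags, and the helper <code>with_fetch_span</code> are hypothetical and only serve to illustrate the questions above about waiting times and queue sizes.</p>
<pre><code>(* Requires OCaml 5.1 or later and the [runtime_events] library. *)

(* Each custom event needs its own tag, added to the extensible tag type. *)
type Runtime_events.User.tag += Fetch_request | Queue_length

(* A span event marking the start and end of a (hypothetical) request. *)
let fetch_request =
  Runtime_events.User.register "myapp.fetch_request" Fetch_request
    Runtime_events.Type.span

(* An integer event reporting a (hypothetical) internal queue size. *)
let queue_length =
  Runtime_events.User.register "myapp.queue_length" Queue_length
    Runtime_events.Type.int

(* Wrap [f] in a Begin/End pair so a consumer can measure how long it took. *)
let with_fetch_span f =
  Runtime_events.User.write fetch_request Runtime_events.Type.Begin;
  Fun.protect
    ~finally:(fun () -&gt;
      Runtime_events.User.write fetch_request Runtime_events.Type.End)
    f

let () =
  Runtime_events.start ();
  Runtime_events.User.write queue_length 3;
  let answer = with_fetch_span (fun () -&gt; 6 * 7) in
  assert (answer = 42)
</code></pre>
<p>A consumer can then subscribe to these events next to the runtime's own, which is what makes it possible to line up application phases with GC activity.</p>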
<p>This new feature adds another dimension to the event monitoring systems for OCaml, allowing for more customised monitoring. Users can discover how aspects of their code are affecting the runtime and use the reports to optimise their programs. In turn, this benefits the rest of the ecosystem as programs become faster and more efficient. For example, the observability tool <code>eio-trace</code> (more about that at another time) uses custom events to give the user a graphical representation of what is happening with their programs – shown here tracing the Eio tutorial's networking example:</p>
<p><img sizes="(min-width: 1360px) 1360px, (min-width: 680px) 680px, 100vw" srcset="/blog/images/eio_trace-170w~pAJnWVKLqj_EdofXzL5PAw.webp 170w, /blog/images/eio_trace-340w~VMc5ReCNFkg9XDECPnk95g.webp 340w, /blog/images/eio_trace-680w~Lz_m3OYfpWG4qo16qkmUKQ.webp 680w, /blog/images/eio_trace-1360w~hmKPWgmBDoVOjc6OkHKvVQ.webp 1360w" src="/blog/images/eio_trace-1360w~hmKPWgmBDoVOjc6OkHKvVQ.webp" alt="A graphical representation of the Eio tutorial's networking example"></p>
<p>Please note that whilst custom events were introduced in OCaml 5.1, the update to the OCaml manual reflecting this change will be introduced in OCaml 5.2.</p>
<h2>Olly &amp; Observability</h2>
<p>To make these features as accessible as possible for the end user, the team working on <code>runtime_events</code> also introduced observability tooling. The observability tool for OCaml is called Olly, and it helps users visualise the data collected from <code>runtime_events</code>.</p>
<p>But more about that another time! Look out for future posts about Olly and how to use it to understand what your programs are doing. It’s a fantastic tool that can change how you interact with your code, removing the guesswork and giving you great insight into any performance problems you encounter.</p>
<h2>Stay in Touch!</h2>
<p>If you’re curious to explore these features and how they can benefit your workflow, you’re in luck! We would be happy to <a href="/contact/">talk to you</a> about how OCaml can benefit you and your projects and how monitoring tools can help you get the most out of your software.</p>
<p>Stay in touch with us on <a href="https://bsky.app/profile/tarides.com">Bluesky</a> and on <a href="https://www.linkedin.com/company/tarides">LinkedIn</a>, where we regularly post updates about what we are working on.</p>
<h3>Acknowledgements</h3>
<p>Thank you to <a href="https://github.com/sadiqj">Sadiq Jaffer</a> and <a href="https://github.com/NickBarnes">Nick Barnes</a> for their help with this article.</p>
]]></description><link>https://tarides.com/blog/2024-01-31-are-your-programs-doing-what-you-think-they-re-doing-introducing-monitoring-tools-for-multicore-ocaml</link><guid isPermaLink="false">https://tarides.com/blog/2024-01-31-are-your-programs-doing-what-you-think-they-re-doing-introducing-monitoring-tools-for-multicore-ocaml.html</guid><dc:creator><![CDATA[ Isabella Leandersson ]]></dc:creator><pubDate>Wed, 31 Jan 2024 00:00:00 GMT</pubDate></item><item><title><![CDATA[MirageOS: Designing a More Resilient Networking Stack With µTCP]]></title><description><![CDATA[<p>There is no room for complacency in software development! In the <a href="https://ocaml.org/">OCaml</a> ecosystem, improvements are continuously introduced to optimise existing workflows, introduce new features, and boost performance.</p>
<p><a href="https://mirage.io/">MirageOS</a> is a toolchain for creating unikernels (very small images that embed both an application and the OS components needed to run it) from several libraries written in OCaml. It allows users to create robust, fast, and secure applications.</p>
<p>We invest in the development of MirageOS, supporting great work both internally and at other organisations. One of these projects is <a href="https://hannes.robur.coop/About">Hannes Mehnert’s</a> work at <a href="https://robur.coop/Home">Robur</a>, developing an updated <a href="https://en.wikipedia.org/wiki/Transmission_Control_Protocol">Transmission Control Protocol</a> (TCP) for MirageOS.</p>
<h2>Networking in MirageOS</h2>
<p>Historically, much work has gone into implementing a complete network stack native to MirageOS. Getting networking right without relying on the Linux stack and in a <a href="/blog/2023-12-14-ocaml-memory-safety-and-beyond/">safe language like OCaml</a> is tremendously important for a tool that aims to build unikernels deployed in a cloud environment. Good network management is paramount for applications with large user bases. Over time, developers have written multiple libraries for MirageOS to handle every layer of the networking stack, from Ethernet to HTTP.</p>
<h2>TCP</h2>
<p>TCP, described in <a href="https://hannes.robur.coop/Posts/TCP-ns">Hannes's recent blog post</a>, is a fundamental building block for most higher-level protocols and applications. Its role is to handle reliable data delivery from one point to the other so that the rest of the stack can focus on the actual content of that data. The protocol is specified in English in a series of RFCs, and a variety of implementations exist for this specification, with potentially different behaviours depending on their interpretation of (or even compliance with) the specification. This means that TCP traffic in the wild is not uniform, and coping with it means accepting a broad interpretation of the specification and recovering gracefully from non-complying traffic.</p>
<p>The historical library that implements TCP in MirageOS is the widely used <a href="https://github.com/mirage/mirage-tcpip">mirage-tcpip</a>. <code>Mirage-tcpip</code> is used by thousands of developers every single day in <a href="https://www.docker.com/products/docker-desktop/">Docker Desktop</a>, where it handles traffic between container and host. While it usually works well under controlled circumstances (well-formed traffic, local networks), this library has some issues that are not easily solved, which Hannes describes in his post (memory leakage, low tolerance to non-ideal traffic, etc).</p>
<h2><a href="https://github.com/robur-coop/utcp">µTCP</a> and Formal Methods</h2>
<p>As part of our mission to build robust and secure systems, we strive to support technology that roots its design in the field of <a href="https://en.wikipedia.org/wiki/Formal_methods">formal methods</a>, a variety of techniques, tools and approaches which provide a rigorous framework for the specification and implementation of software. HOL4 is one such tool, allowing users to formally specify and prove theorems. It can be used as a foundation to define complex specifications that can then serve as a reference to test or prove compliance with implementations.</p>
<p>Peter Sewell et al. used HOL4 to build a semantic model of the TCP specification (by describing the byte-stream service provided to users) that could cope with diverse complying implementations of TCP, as checked by testing it against real traffic. That same model is what Hannes is using to derive an OCaml implementation usable by MirageOS, called µTCP. While there is no formal link between the model and the implementation (the work is primarily manual, and the transposition is not always 1:1), it is a perfect opportunity for a new TCP implementation that copes with more realistic traffic by design, and may avoid some of the other problems of <code>mirage-tcpip</code> such as memory leaks and performance limitations. Eventually, it may become a suitable replacement for <code>mirage-tcpip</code> within MirageOS, where the change should be seamless thanks to the abstract TCP interface of MirageOS.</p>
<h2>Next Steps</h2>
<p>Keep an eye out for updates from Hannes and Robur, as they will be testing µTCP more to get it ready for public release. The team appreciates feedback, so please check out µTCP on your own and report on your experience <a href="https://github.com/robur-coop/utcp">in the repo</a>. We are thrilled to see improved features for MirageOS, and hopefully it will bring even more users to the ecosystem.</p>
<p>You can also follow us on <a href="https://bsky.app/profile/tarides.com">Bluesky</a> and <a href="https://www.linkedin.com/company/tarides">LinkedIn</a> to keep up with our projects. We would <a href="/contact/">love to discuss</a> how MirageOS can benefit you in your projects!</p>
]]></description><link>https://tarides.com/blog/2024-01-24-mirageos-designing-a-more-resilient-networking-stack-with-tcp</link><guid isPermaLink="false">https://tarides.com/blog/2024-01-24-mirageos-designing-a-more-resilient-networking-stack-with-tcp.html</guid><dc:creator><![CDATA[ Virgile Robles ]]></dc:creator><pubDate>Wed, 24 Jan 2024 00:00:00 GMT</pubDate></item><item><title><![CDATA[What are Data Races? And do They Threaten Your Business?]]></title><description><![CDATA[<p>Imagine you have a brand-new coffee machine. One morning, you traipse excitedly down the stairs only to discover that, alas, your appliance has turned into a potato. The next day, it turns back into a coffee machine.</p>
<p>And so it continues: on some days, it dispenses you perfect, frothy coffee; on other days, it dispenses Coca-Cola; and on some days, it doggedly persists with being a root vegetable. Overall, how satisfied would you be with your new coffee machine? Sure, sometimes the beverage you receive is perfect, but you can never be confident about what you will get.</p>
<p>Software users, just like coffee drinkers, value reliability. Your product could be excellent, but its value quickly diminishes if you can’t guarantee that it will meet your users’ expectations every time. This uncertainty is what data races do to code. Data races are a type of bug that introduces unpredictability into software programs, with potentially dire effects if not eliminated.</p>
<h2>What is a Data Race?</h2>
<p>The term ‘<a href="https://coderrect.com/data-races-what-are-they-and-why-are-they-evil-part-1/#:~:text=So%2C%20in%20contrast%20to%20a,so%20it%20is%20a%20BUG">data race</a>’ describes a problem that can occur in programs that use multiple threads (also known as parallel programming). Multiple threads allow several operations to be performed simultaneously, significantly boosting the performance of programs, applications, and software.</p>
<p>Parallel programming introduces complexity that needs to be skillfully managed to ensure it does not have negative implications. For example, say two operations are happening simultaneously. If two threads try to write to the same memory location in parallel (and there are no rules that manage the event), there is no telling which access might be recorded, if any, or which might be dropped and why.</p>
<p>This is what a data race looks like. Formally, a data race occurs when two memory accesses target the same memory location, are performed concurrently, at least one of them is a <code>write</code>, and neither is governed by synchronisation or locks.</p>
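<p>As an illustration of that definition, here is a minimal OCaml 5 sketch with two versions of the same program. In <code>racy</code>, two domains read and write a plain <code>ref</code> with no synchronisation, which is a data race; in <code>locked</code>, a mutex orders every access. The function names and the iteration count are arbitrary choices made for this example.</p>
<pre><code>let iterations = 100_000

(* Racy: both domains read and write the same plain [ref] without any
   synchronisation, so updates can be lost and the result is unpredictable. *)
let racy () =
  let total = ref 0 in
  let work () = for _ = 1 to iterations do total := !total + 1 done in
  let d = Domain.spawn work in
  work ();
  Domain.join d;
  !total

(* Race-free: a mutex orders every access to the shared reference. *)
let locked () =
  let total = ref 0 in
  let lock = Mutex.create () in
  let work () =
    for _ = 1 to iterations do
      Mutex.lock lock;
      total := !total + 1;
      Mutex.unlock lock
    done
  in
  let d = Domain.spawn work in
  work ();
  Domain.join d;
  !total

let () =
  Printf.printf "racy: %d (at most %d)\n" (racy ()) (2 * iterations);
  assert (locked () = 2 * iterations)
</code></pre>
<p>The racy version may print a different number on every run, while the locked version always reaches the full total.</p>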
<p>If code allows data races to occur, it introduces an element of randomness that can make programs act unpredictably. Critically, problems caused by data races are not deterministic – you can have your program crash upon launch one day and start up seemingly normally the next. This randomness makes data races hard for developers to debug and frustrating for end users, who cannot predict the behaviour of the software they’re using.</p>
<p>In languages such as C, C++, Go, and unsafe Rust, this unpredictability causes crashes, memory corruption, and a variety of strange behaviours by the program. Importantly, this is <strong>not</strong> the case in OCaml.</p>
<p>In practice, data races undermine the trustworthiness of programs and data. Tech-savvy users may call the reliability of a program into question for even allowing data races – this happened in 2020 when the <a href="https://coderrect.com/data-races-what-are-they-and-why-are-they-evil-part-1/">accuracy of Covid-19 simulation software</a> was debated online due to the suspected presence of data races in the code. Even users who are unaware of what data races are notice their effects, such as random crashes and unexpected behaviours, which damage their perception of the program.</p>
<h2>How we Help Our Partners Avoid the Perils of Data Races</h2>
<p>Now that you know what data races are and why they are undesirable, let me tell you how Tarides manages the risks of data races and helps our partners avoid their pitfalls.</p>
<p>We use the programming language <a href="https://ocaml.org/">OCaml</a> to power our solutions. There are many reasons why we have chosen to use OCaml, one of the most significant ones being its <a href="/blog/2023-07-05-zero-day-attacks-what-are-they-and-can-a-language-like-ocaml-protect-you/">strong safety guarantees</a>. The language is type- and memory-safe, avoiding the security vulnerabilities that would otherwise occur as a result of <a href="/blog/2023-08-17-your-programming-language-and-its-impact-on-the-cybersecurity-of-your-application/">memory safety issues</a>. OCaml’s expressive type system also catches many bugs for the developer at compile time, making debugging easier.</p>
<p>Thanks to the rigorous safety measures extant in OCaml, data races are also less destructive than in languages like C or C++. Furthermore, data races in OCaml do not result in crashes, and its <a href="https://v2.ocaml.org/manual/memorymodel.html">memory model</a> guarantees that memory safety is preserved even for programs with data races. These factors already mitigate two of the biggest problems brought about by data races: crashes and memory corruption.</p>
<p>Still, data races can create surprises for the developer, even in OCaml. This is because data races can cause behaviours that cannot be explained by mere interleavings of operations from different threads. Put simply, the code doesn’t operate in a way that makes it easy to reason about its behaviour. Developers who are unfamiliar with the symptoms will struggle when faced with a data race.</p>
<p>OCaml helps developers reason about unpredictable behaviours by imposing limits on the unpredictability. The <a href="https://v2.ocaml.org/manual/memorymodel.html">OCaml Memory Model</a> offers strong guarantees even for programs with data races. For example, data races cannot result in <a href="/blog/2023-10-18-off-to-the-races-using-threadsanitizer-in-ocaml/">out-of-thin-air values</a>, meaning that the only values that can be observed at a memory location are those that have previously been written to it – so no nasty surprises. OCaml offers what is known as a <em>Local Data Race Freedom Sequential Consistency</em> (LDRF-SC) guarantee. It helps the developer reason about a parallel program, guaranteeing that even if part of it has data races, the parts that are free of data races still behave in a sequentially consistent, predictable way.</p>
<p>These factors help the developer be more productive as they face fewer headaches. Yet, we still want to give programmers an easy way to find and eliminate data races in their OCaml programs. Even though data races in OCaml are less unpredictable than those in other languages, it is still good to eliminate them whenever possible. To this end, we have brought the data race detector <a href="https://clang.llvm.org/docs/ThreadSanitizer.html">ThreadSanitizer</a> to OCaml.</p>
<p>For a deep dive into this topic, I recommend <a href="https://kcsrk.info/papers/pldi18-memory.pdf">this paper</a>, which gives more technical background on the OCaml Memory Model.</p>
<h2>Bringing ThreadSanitizer and its Benefits to OCaml</h2>
<p>ThreadSanitizer, or TSan, is an open-source tool that reliably detects data races at runtime. TSan can be enabled, alert the developer of any data races in their code, and then disabled. Because the instrumentation is only active while TSan is enabled, users can quickly identify and remove data race bugs without adding overhead to their regular builds.</p>
<p>Tarides is one of the largest maintainers of OCaml libraries and related tools, and our team members <a href="/blog/2023-10-18-off-to-the-races-using-threadsanitizer-in-ocaml/">worked hard to help bring TSan support to OCaml</a>. With TSan support, users now have a comprehensive and straightforward way to detect and eliminate data races from their code. TSan makes the developer experience in OCaml smoother – less time spent bug-hunting makes for happier developers who can use their time and energy on better (and more fun!) things.</p>
<h2>The Main Message</h2>
<p>We build our technologies in the programming language OCaml, which brings several benefits to our partners. OCaml has always been known for its high levels of security, which is one of the reasons why the trading firm <a href="https://www.janestreet.com/">Jane Street</a> uses the language for its services. The OCaml 5 update introduced Multicore support, drastically increasing the performance potential of programs and applications using OCaml. By bringing TSan to OCaml, we’re helping both the community and our partners benefit from the security and performance of Multicore OCaml with minimal headaches.</p>
<p>If you want to discover how we can leverage OCaml to your advantage, please <a href="/contact/">contact us</a>, and we will be happy to chat. You can also follow us on <a href="https://bsky.app/profile/tarides.com">Bluesky</a> and <a href="https://www.linkedin.com/company/tarides">LinkedIn</a> to stay up to date on what we’re up to!</p>
]]></description><link>https://tarides.com/blog/2024-01-17-what-are-data-races-and-do-they-threaten-your-business</link><guid isPermaLink="false">https://tarides.com/blog/2024-01-17-what-are-data-races-and-do-they-threaten-your-business.html</guid><dc:creator><![CDATA[ Isabella Leandersson ]]></dc:creator><pubDate>Wed, 17 Jan 2024 00:00:00 GMT</pubDate></item><item><title><![CDATA[Meet odoc, OCaml's Documentation Generator]]></title><description><![CDATA[
<p>Effective documentation is a cornerstone of software development. It helps developers understand how to use a language, its libraries, and its tooling, which leads to more robust and maintainable code. When it comes to OCaml, <code>odoc</code> is the wizard behind the scenes, ensuring developers not only understand OCaml's quirks but also become familiar with its libraries and tools. <code>odoc</code> powers the OCaml.org package documentation, so it's used widely by the entire OCaml community.</p>
<p><a href="https://ocaml.github.io/odoc/"><code>odoc</code></a> is a documentation generator specifically designed for OCaml. It takes comments in the source code and generates documentation as HTML, man pages, and LaTeX for PDF generation. Think of it like <a href="https://www.sphinx-doc.org/en/master/">Sphinx</a> or <a href="https://en.wikipedia.org/wiki/Javadoc">Javadoc</a> specifically made for OCaml. In other words, what Sphinx is to Python, <code>odoc</code> is to OCaml. With <code>odoc</code>, you can automatically create documentation for your libraries.</p>
<p>See it in action! <a href="https://ocaml.github.io/odoc/#overview">The <code>odoc</code> documentation</a> pages, the <a href="https://aantron.github.io/dream/">Dream docs</a>, and the <a href="https://ocaml.org/p/cmdliner/latest">online OCaml Packages docs (e.g., for <code>cmdliner</code>)</a> were rendered using <code>odoc</code>.</p>
<h2>History of <code>odoc</code></h2>
<p><a href="https://v2.ocaml.org/manual/ocamldoc.html">OCamldoc</a>, developed by Maxence Guesdon, was the earliest documentation generator for OCaml. It created docs in HTML by using comments in the source code tagged with <code>(**</code> and <code>*)</code>. This allowed developers to produce documentation that closely followed the code and made it easy to keep the docs up to date. It provided a simple and efficient way to generate documentation for OCaml projects.</p>
<p>In recent years, there has been a transition to <code>odoc</code>, an open-source project that OCaml Labs and Tarides developed. This more modern and expandable documentation generator is built on the OCaml compiler's infrastructure. This makes <code>odoc</code> more tightly integrated with the language, so it supports all aspects of the OCaml language.</p>
<p><code>odoc</code> offers several advantages over OCamldoc. It provides support for features like cross-referencing, modular documentation, and custom HTML theming, making it easier for developers to generate comprehensive and visually appealing documentation for their OCaml projects. Other recent features added to <code>odoc</code> include linking to source code and support for searching through the documentation.</p>
<p>The adoption of <code>odoc</code> marks a significant step forward to improve the quality and accessibility of OCaml documentation. It reflects the OCaml community's continuous effort to build an ecosystem of well-documented packages to improve OCaml developer experience and support the community's growth.</p>
<h2>Generate Docs With <code>odoc</code></h2>
<p><code>odoc</code> produces documentation by reading special comments embedded into the source code. <code>odoc</code>'s rich markup language allows standard formatting elements such as bold, italic, lists, and code sections, as well as section headings or even tags (<a href="https://ocaml.github.io/odoc/odoc_for_authors.html">like @param</a>) for adding custom data to specific aspects.</p>
<p>It's quite simple to use <code>odoc</code> because it reads the comments delineated with <code>(** ... *)</code>. For example:</p>
<pre><code>(** This is an OCaml docstring. It supports {b bold}, {e italic}, [code], and much more!
    Here is a list:
    - Item 1
    - Item 2
    And even a table:
    {t
    Table | support
    ------|--------
    is    | cool!
    } *)
</code></pre>
<p>Here is what it looks like after rendering:</p>
<p><img sizes="(min-width: 1360px) 1360px, (min-width: 680px) 680px, 100vw" srcset="/blog/images/odoc_table-170w~-2ofzG-FNbX2bMLbXSxRhw.webp 170w, /blog/images/odoc_table-340w~sdofbDRSJdd02mrcpT767w.webp 340w, /blog/images/odoc_table-680w~LaRkAyVuUVso9Qayk819Hw.webp 680w, /blog/images/odoc_table-1360w~WPLQGtVyyPw4zLj9XjgHfg.webp 1360w" src="/blog/images/odoc_table-1360w~WPLQGtVyyPw4zLj9XjgHfg.webp" alt="odoc table"></p>
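<p>Tags work the same way. As a small sketch, here is a made-up function <code>mean</code> (not part of any library) documented with the <code>@param</code>, <code>@return</code>, and <code>@raise</code> tags:</p>
<pre><code>(** [mean xs] is the arithmetic mean of the floats in [xs].

    @param xs the values to average
    @return the sum of [xs] divided by its length
    @raise Invalid_argument if [xs] is empty *)
let mean xs =
  if xs = [] then invalid_arg "mean: empty list"
  else List.fold_left ( +. ) 0.0 xs /. float_of_int (List.length xs)
</code></pre>
<p>Because the comment sits directly on the definition, the description of the parameter, the result, and the exception stays next to the code it describes.</p>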
<h2><code>odoc</code> Features</h2>
<h3>Modules</h3>
<p>In <code>odoc</code>, the basic unit of organisation is the module. The <code>odoc</code> tool generates one page for each module, module type, class, and class type for HTML, LaTeX, or man pages output.</p>
<h3>Extensivity</h3>
<p><code>odoc</code> will document all of the values, types, and classes, along with those specially-formatted comments, for each module.</p>
<h3>Cross-Reference</h3>
<p><code>odoc</code> has an accurate cross-referencer that can calculate links between types, modules, module types, and more. A simple click will take you to the type's definition. So if you've ever been baffled by exactly what the <code>t</code> was in <code>val f : t -&gt; unit</code>, <code>odoc</code> will link to it!</p>
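<p>You can also write references by hand inside a docstring with the <code>{!...}</code> syntax. The snippet below is a minimal sketch; the type <code>t</code> and the functions <code>make</code> and <code>greet</code> are invented for this example:</p>
<pre><code>(** A person known to the program. *)
type t = { name : string }

(** [make name] creates a value of type {!t}. *)
let make name = { name }

(** [greet p] builds a greeting for [p]. Use {!make} to obtain a {!t}. *)
let greet p = "Hello, " ^ p.name
</code></pre>
<p>In the rendered page, each reference becomes a link to the corresponding definition.</p>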
<h3>Expander</h3>
<p>Figuring out a module's exact content from its signature is not always easy when using OCaml's expressive features such as <code>include</code> or <code>module type of</code>. <code>odoc</code> always expands such constructs to provide the reader with the exact list of items available in the module!</p>
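<p>As an illustration of the constructs in question, here is a small sketch with hypothetical modules <code>Base</code> and <code>Extended</code>. Reading the source of <code>Extended</code> alone does not tell you that it also provides <code>zero</code> and <code>succ</code>; that is the kind of information the expansion puts on the generated page.</p>
<pre><code>module Base = struct
  (** The starting value. *)
  let zero = 0

  (** [succ n] is [n + 1]. *)
  let succ n = n + 1
end

(** The signature of [Base], recovered with [module type of]. *)
module type Base_sig = module type of Base

module Extended = struct
  include Base

  (** Built from the items that [include Base] brings into scope. *)
  let two = succ (succ zero)
end
</code></pre>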
<h2>Tarides and <code>odoc</code></h2>
<p>Tarides wants to improve the OCaml developer experience and remove the blockers for new developers to adopt OCaml. An ecosystem of well-documented packages is critical to language adoption and creating a great developer experience. We aim to provide package authors with excellent tooling, so they can write rich documentation. Tarides' commitment to improving <code>odoc</code> directly addresses these goals.</p>
<p>Tarides contributes significantly to the development of <code>odoc</code>. Tarides engineer Jon Ludlam has led the project and has been rewriting <code>odoc</code>'s model for 2.0. We've added source code linking and support for search. Plus, we developed new <code>odoc</code> rules for Dune that can generate a link to the dependencies' documentation. We even created the <code>odoc</code> driver and CI pipeline that produce the package documentation for OCaml.org.</p>
<h2>Conclusion</h2>
<p>Effective documentation guides developers through a language's intricacies, library functionalities, and tool utilisation. A documentation generator proves to be an indispensable conduit that translates complicated code into accessible, structured documentation. A <em>dedicated</em> documentation generator, like <code>odoc</code>, offers more than just documentation creation. It plays a crucial role in helping OCaml developers create well-documented libraries and applications, making it easier for programmers to work with the language and its rich ecosystem of libraries. It also encourages collaboration, accelerates learning curves, and crucially nurtures the growth of robust, well-documented, maintainable codebases. Tarides understands the huge benefit this gives developers, which is why we have a team dedicated to maintaining and improving <code>odoc</code>.</p>
<p>Join the <code>odoc</code> conversation on <a href="https://discuss.ocaml.org/c/eco/15">discuss.ocaml.org</a> under the Ecosystem category by using the <code>odoc</code> tag. Also, please don't hesitate to <a href="https://github.com/ocaml/odoc">open an issue</a> on GitHub, as we're always striving to improve our products. Get started by <a href="https://github.com/ocaml/odoc">installing <code>odoc</code> today</a>!</p>
]]></description><link>https://tarides.com/blog/2024-01-10-meet-odoc-ocaml-s-documentation-generator</link><guid isPermaLink="false">https://tarides.com/blog/2024-01-10-meet-odoc-ocaml-s-documentation-generator.html</guid><dc:creator><![CDATA[ Christine Rose ]]></dc:creator><pubDate>Wed, 10 Jan 2024 00:00:00 GMT</pubDate></item><item><title><![CDATA[Announcing the ORCHIDE Project: Powering Satellite Innovation]]></title><description><![CDATA[<p>It has been a few months since we announced our new product for both older and newer models of satellites: <a href="/blog/2023-07-31-ocaml-in-space-welcome-spaceos/">SpaceOS</a>. Since then, there have been exciting updates to the use of SpaceOS for innovative satellite applications.</p>
<p>Tarides is thrilled to announce our partnership with several esteemed organisations in the earth observation and space technology sector: <a href="https://www.thalesaleniaspace.com/en">Thales Alenia Space</a> France, <a href="https://upb.ro/">University Politechnica of Bucharest</a>, <a href="https://kplabs.space/">KpLabs</a>, and <a href="https://www.thalesgroup.com/en">Thales</a> Romania. Under the banner of project ORCHIDE, part of the <a href="https://ec.europa.eu/info/funding-tenders/opportunities/portal/screen/opportunities/topic-details/horizon-cl4-2023-space-01-11">EU HORIZON CL4-2023-SPACE-01-11 programme</a>, Tarides and its spinout company <a href="https://parsimoni.co/">Parsimoni</a> will collaborate on high-performance secure-by-design software solutions tailored to the challenges of space.</p>
<h2>The Changing Landscape of Satellite Solutions</h2>
<p>New satellite technologies open up new use cases – making satellites accessible to a much wider audience. The future of the market emphasises smaller, low-cost satellites, reducing the cost of services for end users. On-board data processing is also key, as transferring large amounts of data from satellites is getting more and more challenging (for example, due to radio wave limitations).</p>
<p>These modern satellites must also be reactive and dynamic, with reprogrammable designs that allow multiple users to share hardware and deploy their own applications, driving costs down. <a href="https://ourworldindata.org/grapher/yearly-number-of-objects-launched-into-outer-space">More satellites have been launched into space in the last two years than ever before</a>, enabling new (and lower-cost) business use cases.</p>
<p>To support this increased demand, updated software solutions are required. Software in space needs to be secure as a remote hack could be catastrophic, and resource-efficient to maximise the use of limited energy and memory. This is where <a href="https://queue.acm.org/detail.cfm?id=2566628">unikernel</a> technology, with its light-weight structure and security features, is a game changer.</p>
<p>Unikernels strip down the operating system to only the necessary components required by the application, resulting in smaller and more efficient applications tailored to specific use cases. For example, in <a href="https://mirage.io/">MirageOS</a>, an OCaml-based operating system that constructs unikernels, the smaller attack surface combined with OCaml’s <a href="/blog/2023-12-14-ocaml-memory-safety-and-beyond/">strong safety features</a> illustrates the cybersecurity benefits of unikernel technology.</p>
<h2>The Future is Bright</h2>
<p>The solutions resulting from the ORCHIDE project will enable the satellite industry to use edge computing resources with great flexibility whilst maintaining high levels of security. As a result, operators can scale production more efficiently, creating significant value for users of earth observation data.</p>
<p>ORCHIDE heralds a new era in satellite deployment, with lower barriers to entry, greater flexibility and security, and new opportunities for both operators and end users.</p>
<p>We will keep you updated on our projects in space, so stay tuned to our blog! You can <a href="/contact/">sign up for our newsletter</a> for monthly updates, and follow us on <a href="https://bsky.app/profile/tarides.com">Bluesky</a> and <a href="https://www.linkedin.com/company/tarides">LinkedIn</a> to stay <em>à jour</em> with everything Tarides.</p>
]]></description><link>https://tarides.com/blog/2023-12-29-announcing-the-orchide-project-powering-satellite-innovation</link><guid isPermaLink="false">https://tarides.com/blog/2023-12-29-announcing-the-orchide-project-powering-satellite-innovation.html</guid><dc:creator><![CDATA[ Isabella Leandersson ]]></dc:creator><pubDate>Fri, 29 Dec 2023 00:00:00 GMT</pubDate></item><item><title><![CDATA[OCaml Survey: Developers' Perception, Interest, and Perceived Barriers]]></title><description><![CDATA[<p>Tarides is conducting <a href="https://forms.gle/T7Ya6UUiZ6xTMisV8">a survey</a> targeting non-OCaml programmers to learn their thoughts about this functional programming language and uncover any misconceptions surrounding it. Please take a few minutes to fill it out if you haven't yet done so.</p>
<p>This post shows our preliminary findings based on a relatively small sample size within the Twitter community. The survey aimed to shed light on the challenges hindering its broader acceptance among programmers unfamiliar with its ecosystem and principles.</p>
<p>Our goal was also to discover what information could help new users try OCaml. We found it important to explore any perceived barriers to better understand their impact on adoption. Additionally, we identified some topics we can cover on our blog and the overall OCaml documentation.</p>
<h2>Participants' Interest in OCaml</h2>
<p>Results showed that 96.6% of the 120 respondents were already familiar with OCaml. The survey data reflects diverse reasons for interest in OCaml. Several respondents value its functional programming paradigm, drawn to its powerful FP and type-level features, reminiscent of Haskell but perceived as less intimidating. Others appreciate its similarity to F# or its multiparadigm nature, enabling cross-compilation to JavaScript and highlighting its industrial applicability.</p>
<p><img sizes="(min-width: 1360px) 1360px, (min-width: 680px) 680px, 100vw" srcset="/blog/images/ocaml_graph-170w~txgoR0_rsmBvAcUXu4oIow.webp 170w, /blog/images/ocaml_graph-340w~fFB3PprJVQmmVgO20EqlXg.webp 340w, /blog/images/ocaml_graph-680w~1VjD4pj4mB0L64ww4HqgLw.webp 680w, /blog/images/ocaml_graph-1360w~hSUndCn_gRuFO6gh5uKtvA.webp 1360w" src="/blog/images/ocaml_graph-1360w~hSUndCn_gRuFO6gh5uKtvA.webp" alt="Heard of OCaml Graph"></p>
<p>The language's fast compilation speed, comparable with Go's, the enthusiastic OCaml community, and its strong type system also emerge as significant draws. While some respondents found its similarity to Rust appealing, others were attracted to its practicality and performance. There was a general sentiment among respondents that OCaml offers a blend of functional programming without the rigidity seen in some other languages, making it an intriguing choice for developers seeking to explore new paradigms.</p>
<p>Its pragmatic approach and its potential applicability in diverse problem domains have also intrigued developers.</p>
<h2>What Piques Users' Interest in OCaml</h2>
<p>The survey respondents offered various factors that would pique their interest in learning and using OCaml. Job opportunities using OCaml, exceptional tooling, high performance, and a healthy ecosystem of libraries were primary motivators. Additionally, a desire for enhanced type systems, specific functionalities similar to other languages, and an interest in functional programming combined with native compilation emerged as strong incentives.</p>
<p>Some participants already had an interest in OCaml but sought more time or clearer learning paths, while others looked for real-world examples, robust web frameworks, or significant projects utilising OCaml to inspire their adoption. Better resources, tutorials, or online REPLs, as well as use cases that showcase OCaml's strengths, were also highlighted as essential for fostering interest.</p>
<p>Overall, job prospects, powerful tooling, distinct advantages over other languages, clear learning paths, compelling use cases, and abundant learning resources emerged as key factors that would attract individuals to learn and use OCaml.</p>
<p><img sizes="(min-width: 1360px) 1360px, (min-width: 680px) 680px, 100vw" srcset="/blog/images/try_ocaml-170w~JyZ3ZFu5DOi8pFZuzkGUnQ.webp 170w, /blog/images/try_ocaml-340w~A0TZSPy-dVp2C4MfOLEDZw.webp 340w, /blog/images/try_ocaml-680w~2jzD0YACP6Iv9fhHBGVsBg.webp 680w, /blog/images/try_ocaml-1360w~gHnd31yeN-DkjDh6R4H75A.webp 1360w" src="/blog/images/try_ocaml-1360w~gHnd31yeN-DkjDh6R4H75A.webp" alt="Try OCaml Graph"></p>
<h2>What Programmers Want to Know Before Trying OCaml</h2>
<p>Several key inquiries emerged as top concerns amongst respondents. A significant 74.6% sought information on OCaml's real-world interface possibilities, encompassing web servers, GUI, APIs, and databases. Additionally, 69.3% were keen on discovering a comprehensive learning path, documentation, and video resources aiding beginners in getting started. Another prevalent query, registering at 66.7%, centered on locating useful libraries.</p>
<p>Beyond these top three, between 25% and 35% of respondents expressed interest in aspects like compatibility with operating systems, clouds and hosting providers, IDE integration, availability of VS Code plugins, web request capabilities, access to practice exercises, and the presence of an active community forum for discussions and OCaml-related assistance.</p>
<p>Participants reported that they get most of their information about OCaml on Twitter (X), which isn't surprising because we queried participants found through social media.</p>
<p><img sizes="(min-width: 1360px) 1360px, (min-width: 680px) 680px, 100vw" srcset="/blog/images/ocaml_news-170w~e5DirVSBQ0Q9FesgvS2uyw.webp 170w, /blog/images/ocaml_news-340w~mUNfH9trN-YGKH2KVz-i6g.webp 340w, /blog/images/ocaml_news-680w~vuTXaoIGOJ6_mbuJALZ68Q.webp 680w, /blog/images/ocaml_news-1360w~wwcTXLhNmwgl2pU-QxcB2w.webp 1360w" src="/blog/images/ocaml_news-1360w~wwcTXLhNmwgl2pU-QxcB2w.webp" alt="OCaml News"></p>
<p>These responses reinforced that there needs to be much more information on OCaml out there, and they give us a good place to start addressing this lack of information.</p>
<h2>Perceived Barriers to Adoption</h2>
<p>The survey reveals a spectrum of reasons why individuals have refrained from adopting OCaml. Time constraints and current comfort with other languages stand as significant factors. Additionally, syntax preferences and challenges in learning functional paradigms contribute to the reluctance to adopt OCaml.</p>
<p>Some participants express hesitancy due to their unfamiliarity with OCaml's strengths and weaknesses, a perceived lack of utility or clear advantages over their existing toolset, or doubts about its practicality in their field or industry. Others believe OCaml fits in a particular niche, has a smaller ecosystem and limited learning resources, or doubt the language's readiness for production.</p>
<p>These results validated that certain outdated misunderstandings require widespread clarification. Our intention is to release blog articles soon to tackle these common misconceptions.</p>
<h2>Coders' Thoughts on OCaml's Garbage Collection</h2>
<p>The survey responses indicated a variety of sentiments toward OCaml's use of garbage collection (GC). For many, GC was seen as a positive attribute, easing memory management and simplifying programming. Some felt it didn't influence their decision, being accustomed to GC in their work with other languages. Others saw it as a necessary feature or a non-issue, especially in higher-level programming environments.</p>
<p>There were reservations, too. A few respondents preferred languages without GC, expressing concerns about potential performance impacts or complexities in reasoning about memory. However, the general consensus leaned toward a neutral-to-positive view, indicating that while GC might affect performance in certain scenarios, it wasn't a decisive factor for most respondents in trying out OCaml.</p>
<h2>Tarides and the OCaml Community</h2>
<p>In collaboration with the OCaml community, Tarides is actively engaged in addressing the identified pain points. We acknowledge these significant issues and are committed to implementing effective solutions. Your input is invaluable, so please <a href="https://forms.gle/T7Ya6UUiZ6xTMisV8">fill out our short survey</a>. It is anonymous. Your participation not only amplifies your voice but also plays a pivotal role in our collective efforts to enhance and refine the OCaml language, its features, and its documentation.</p>
<p>The survey's nuanced insights illuminate both the growing curiosity around OCaml and the pivotal steps required to facilitate its wider adoption among programmers unfamiliar with its ecosystem and principles. Together, we're dedicated to fostering positive changes based on your invaluable feedback.</p>
<h2>Conclusion</h2>
<p>In conducting this survey aimed at understanding the perspectives of non-OCaml programmers, we've uncovered invaluable insights into the varied interests, concerns, and barriers surrounding the adoption of this functional programming language.</p>
<p>Diverse reasons for interest in OCaml, ranging from its powerful functional programming paradigm to its pragmatic approach and potential applicability across different problem domains, paint a promising picture for its appeal among developers seeking innovative paradigms. However, the survey has also highlighted crucial gaps in information regarding OCaml's real-world interface possibilities, learning resources, and practical applications, underscoring the need for comprehensive documentation and focused learning materials.</p>
<p>Despite certain reservations about its adoption due to factors like existing comfort with other languages, syntax preferences, and uncertainties about its practicality, the general sentiment remains positive. The findings provide a compelling foundation to address these misconceptions, with our forthcoming blog articles aiming to bridge these informational gaps and dispel any lingering uncertainties.</p>
<p>It's crucial to emphasise the limited exposure of this survey, primarily accessed by those within our social media circle. To broaden the scope and gather diverse perspectives, we encourage more individuals to participate in <a href="https://forms.gle/T7Ya6UUiZ6xTMisV8">the survey</a>. Your input is invaluable, so we invite you to voice your support or concerns. Your contributions will enrich our understanding and aid in shaping the future of OCaml. We look forward to reading your responses.</p>
]]></description><link>https://tarides.com/blog/2023-12-20-ocaml-survey-developers-perception-interest-and-perceived-barriers</link><guid isPermaLink="false">https://tarides.com/blog/2023-12-20-ocaml-survey-developers-perception-interest-and-perceived-barriers.html</guid><dc:creator><![CDATA[ Christine Rose ]]></dc:creator><pubDate>Wed, 20 Dec 2023 00:00:00 GMT</pubDate></item><item><title><![CDATA[OCaml: Memory Safety and Beyond]]></title><description><![CDATA[<p>Your choice of programming language matters. A <a href="https://www.nsa.gov/Press-Room/Press-Releases-Statements/Press-Release-View/Article/3608324/us-and-international-partners-issue-recommendations-to-secure-software-products/">recent press release</a> from the US National Security Agency (NSA), in tandem with the US Cybersecurity and Infrastructure Security Agency (CISA) alongside international cybersecurity agencies, urges the adoption of memory-safe programming languages for enhanced software security. We see a strong alignment between this global effort and Tarides’ principles and practices.</p>
<p>Our recent blog posts "<a href="/blog/2023-08-17-your-programming-language-and-its-impact-on-the-cybersecurity-of-your-application/">Your Programming Language and Its Impact on Cybersecurity</a>" and "<a href="/blog/2023-07-05-zero-day-attacks-what-are-they-and-can-a-language-like-ocaml-protect-you/">Zero-Day Attacks: What Are They, and Can a Language Like OCaml Protect You?</a>" underscore the importance of selecting the right programming languages for robust cybersecurity measures. However, while memory safety is crucial, it's only part of the picture. OCaml as a language not only addresses this need but goes beyond it with even stronger guarantees.</p>
<h2>Addressing the Core: Memory Safety</h2>
<p>OCaml stands out in the landscape of programming languages due to its unique combination of features that promote software reliability and security. It's a functional language with a strong static type system, making it inherently robust against many common software vulnerabilities. Its design encourages developers to write code that is not only efficient but inherently safer and easier to maintain.</p>
<p>One of the key strengths of OCaml is its type system, which eliminates a whole class of errors at compile time, significantly reducing the possibility of runtime errors and vulnerabilities. This aspect of OCaml is particularly valuable in developing critical systems where reliability is paramount. Moreover, OCaml's functional programming paradigm encourages immutability and stateless designs, further contributing to its robustness. These features make OCaml an ideal choice for developing applications where security and reliability are non-negotiable.</p>
<h2>The Remaining 30%: Beyond Memory Safety</h2>
<p>While memory safety is crucial, as it addresses approximately <a href="https://www.cisa.gov/news-events/news/urgent-need-memory-safety-software-products">70% of security bugs</a>, OCaml's capabilities extend to the remaining 30% by offering robust solutions for the more nuanced and complex aspects of software security. Among these solutions are formal verification and unikernels – tools that Tarides actively combines for security-oriented customers like Nitrokey and their <a href="https://www.nitrokey.com/products/nethsm">NetHSM</a> products.</p>
<p><strong>Formal verification</strong> involves using mathematical and logical techniques to rigorously prove the correctness of software systems. This process ensures that programs perform as expected and adhere to specific, precisely defined specifications. Tools like Inria’s <a href="https://coq.inria.fr/">Coq</a> or Microsoft’s <a href="https://www.fstar-lang.org/">F*</a> – both systems developed in OCaml – can generate fully verified OCaml components like the <a href="https://github.com/mit-plv/fiat-crypto">cryptographic primitives</a> used in the <a href="https://mirage.io/">MirageOS</a> unikernel project. They enable developers to construct formal mathematical proofs, verifying the exact properties of code and ensuring its behaviour aligns with its intended function. Other tools like <a href="https://github.com/ocaml-gospel/gospel">GOSPEL</a> allow programmers to annotate OCaml programs with specifications that can be verified statically or enforced at runtime.</p>
<p><strong>Unikernels</strong> are specialised, single-address-space machine images constructed by using library operating systems. By using OCaml to develop unikernels, such as in the <a href="https://mirage.io">MirageOS</a> project, application complexity and attack surface are greatly reduced. Unikernels strip down the traditional operating system to the bare minimum, removing unnecessary components that could be potential vectors for security breaches. This streamlined architecture not only enhances performance but significantly bolsters security by reducing the areas that malicious actors can exploit, leading to an order-of-magnitude reduction in the size of the deployed systems in production.</p>
<p>The combination of OCaml's memory safety features and built-in integration with a formal verification ecosystem, as well as the extreme specialisation of its unikernel compilation target via MirageOS, makes it an exceptional choice for developing high-assurance software for embedded and cloud deployments. This end-to-end approach to security tackles the prevalent issues of memory safety and the more subtle, complex vulnerabilities that require deeper mathematical rigour to resolve.</p>
<h2>Want to Give OCaml a Try? Contact us Today</h2>
<p>OCaml more than meets the NSA and CISA's <a href="https://media.defense.gov/2023/Dec/06/2003352724/-1/-1/0/THE-CASE-FOR-MEMORY-SAFE-ROADMAPS-TLP-CLEAR.PDF">criteria for memory safety</a>: it goes above and beyond, offering comprehensive solutions to a whole host of software security challenges. Its combination of functional programming with a strong type system and a rich formal verification ecosystem positions it as a powerful tool for secure software development. MirageOS and its unikernel-based technology leverage these strengths to build small, secure, and efficient applications.</p>
<p>Embracing OCaml today means you are both responding to immediate security concerns and preparing for the complexities of future cybersecurity challenges. Tarides can help you leverage the power of OCaml to achieve a higher standard of software security and reliability.</p>
]]></description><link>https://tarides.com/blog/2023-12-14-ocaml-memory-safety-and-beyond</link><guid isPermaLink="false">https://tarides.com/blog/2023-12-14-ocaml-memory-safety-and-beyond.html</guid><dc:creator><![CDATA[ Thomas Gazagnaire ]]></dc:creator><pubDate>Thu, 14 Dec 2023 00:00:00 GMT</pubDate></item><item><title><![CDATA[International Disability Day 2023: Why It Matters]]></title><description><![CDATA[<p>When I was in my early 20s, I developed a chronic illness that (amongst other things) has affected my mobility. Becoming disabled is an eye-opening experience, as it makes you acutely aware of two things: First, there are many ways in which society genuinely tries to help people with disabilities, and second, there are just as many (if not more) ways that society fails to accommodate disabled people. This is usually not done maliciously – or even consciously – but arises due to ignorance, lack of resources, or misinformation.</p>
<p>The (very!) positive implications are that raising awareness and dispelling myths can make a big difference. By highlighting ways to improve access and openly discussing the challenges that disabled people face, we can remove some of the barriers to improving accessibility.</p>
<p>In recognition of the recent <a href="https://www.un.org/en/observances/day-of-persons-with-disabilities">International Day of Persons With Disabilities</a>, Tarides wants to promote a greater understanding of the topic. They encouraged me to write a post highlighting some of the most common misconceptions about disabled people, along with ways that people and companies can improve accessibility in the workplace. Disabled people can and do make amazing contributions to society, and by improving accessibility we allow everyone to benefit from a more inclusive and diverse world of work.</p>
<h2>Why do we Need to Talk About Disabilities?</h2>
<p>Whether it be through media campaigns or designated courses, when people are better informed they are less likely to <a href="https://www.gov.scot/publications/works-reduce-prejudice-discrimination-review-evidence/pages/5/">believe misconceptions or perpetuate stereotypes</a>.</p>
<p>In turn, less stereotyping means less stigma for disabled people to deal with! It may sound like a trivial benefit, but stigma actually has several negative effects on the person or persons on the receiving end. Erving Goffman’s seminal work on stigma is still helpful today:</p>
<blockquote>
<p><a href="https://scholar.harvard.edu/files/matthewclair/files/stigma_finaldraft.pdf">For Goffman, stigma is a general aspect of social life that complicates everyday micro-level interactions—the stigmatized may be wary of engaging with those who do not share their stigma, and those without a certain stigma may disparage, overcompensate for, or attempt to ignore stigmatized individuals.</a></p>
</blockquote>
<p>As you can imagine, managing stigma takes an emotional and mental toll on a person, and reducing this burden has a tangible effect.</p>
<p>Better information also helps non-disabled people be better allies! By knowing what challenges exist, non-disabled people become active advocates for accessibility improvements. Understanding the myths and misconceptions perpetuated against disabled people helps non-disabled people recognise and dispel them on a greater societal level.</p>
<h2>Common Misconceptions</h2>
<p>I recently held a presentation at Tarides where I addressed some common myths about disabilities:</p>
<ul>
<li><strong>Disabilities Fall Into Neat Categories:</strong>
People tend to approach disabilities with a black-and-white perspective: if someone uses a hearing aid or signs, they must be completely deaf; if someone uses a wheelchair, they must be completely unable to use their legs, and so on.</li>
</ul>
<p>People who think like this are often quick to question the legitimacy of a disabled person who does not fit their narrow understanding of what a disability ‘should’ look like. In reality, disabled people are not a monolith, and we use aids in a variety of ways and for a variety of reasons.</p>
<p>For example, there are people who are <a href="https://www.thisismeagency.co.uk/ambulatory-wheelchair-users/">ambulatory wheelchair users</a> who can use their legs and even walk to some extent, but who still need to use wheelchairs. Remember, people are unique, as are their impairments. There is no ‘correct’ way to have a disability.</p>
<ul>
<li><strong>All Disabilities are Static or Unchanging:</strong>
Following on from the point above, many disabilities arise out of conditions that are variable. It follows that a person’s experience of their impairment may be more or less disabling on one day or occasion than another. Chronic illnesses vary in severity due to external factors such as weather, time of day, time of year, how much a person has exerted themselves on that day or week, levels of inflammation, underlying infections, and so on.</li>
</ul>
<p>Other conditions may be easier or harder to manage depending on the context: whether it is loud or quiet, whether temperatures are cold or hot, whether the surroundings are familiar or unfamiliar, whether it is dark or light, etc. The same person, managing the same condition, may present differently on different occasions.</p>
<ul>
<li><strong>You can Tell Whether Someone is Disabled:</strong>
Many disabilities are so-called <a href="https://hdsunflower.com/uk/insights/post/what-is-a-hidden-disability">hidden or non-visible disabilities</a>, meaning they are hard to detect simply by observing someone. Examples of hidden disabilities include cancers, diabetes, asthma, lupus, arthritic conditions, autoimmune conditions, long covid, chronic fatigue, fibromyalgia, epilepsy, etc. What many people don’t realise is that several mental health conditions <em>also</em> fall under the umbrella of hidden disabilities, including anxiety, OCD, PTSD, bipolar disorder, schizophrenia, depression, and so on.</li>
</ul>
<p>The <a href="https://social.desa.un.org/issues/disability/crpd/convention-on-the-rights-of-persons-with-disabilities-crpd">UN Convention On The Rights Of Persons With Disabilities</a> defines disabilities as:</p>
<blockquote>
<p>“<a href="https://www.un.org/development/desa/disabilities/convention-on-the-rights-of-persons-with-disabilities/article-1-purpose.html#:~:text=Persons%20with%20disabilities%20include%20those,an%20equal%20basis%20with%20others.">Persons … who have long-term physical, mental, intellectual or sensory impairments which in interaction with various barriers may hinder their full and effective participation in society on an equal basis with others.</a>”</p>
</blockquote>
<p>This definition clearly covers a variety of people, many of whom will not necessarily be <em>visibly</em> disabled.</p>
<p>Sadly, many people with hidden disabilities face difficulties when trying to access the public resources that they need, such as priority parking spaces, accessible toilets, and priority seats. We can all help make their lives easier by reminding ourselves and others that not all disabilities are visible.</p>
<h2>Disabilities and the Workplace</h2>
<p>Working is challenging enough without facing down stigma, misconceptions, and stereotypes. By promoting an inclusive workplace that celebrates diversity and rejects stereotypes, companies can improve the social experience of disabled people at work.</p>
<p>Concretely, there are several ways in which companies can improve both the physical and social aspects of work. Examples include:</p>
<ul>
<li>
<p><strong>Accessibility of the workplace:</strong>
Consider things like step-free access, accessible toilets, accessible staff rooms, etc. Smaller but still impactful changes include having a tactile map available, getting a portable ramp, having an evacuation plan for emergencies, etc.</p>
</li>
<li>
<p><strong>Accessibility of tools &amp; equipment:</strong>
Ensuring that you provide the correct tools to your employees is crucial. If someone has a disability, they may need an adapted tool or assistive technology. Tarides provides employees with the tools we need to do our work, including equipment like adapted chairs, keyboards, and technology.</p>
</li>
<li>
<p><strong>Flexible working schedules:</strong>
When someone has a disability they may benefit from flexible working hours to accommodate rest breaks, treatments, or appointments. For example, at Tarides we manage our own time and contracts are flexible to accommodate individual needs. This goes for all employees but is really beneficial for those of us with disabilities.</p>
</li>
<li>
<p><strong>Training:</strong>
Organising regular training sessions at different levels that raise awareness about disabilities, educate on the importance of access, and share best practices on accessibility are important to help employees, managers, and leadership create a better working environment. I recently gave a presentation at Tarides for just this purpose – it was great to have the opportunity to do so and help educate my colleagues on the topic.</p>
</li>
<li>
<p><strong>Work events:</strong>
Making every work event accessible requires early planning to ensure that the access needs of each participant are met. It is good practice to provide advance information about venues and transport at the event, giving the disabled person notice so that they know what to expect on the day.</p>
</li>
<li>
<p><strong>Feedback:</strong>
Of course, we are all human and we all make mistakes. That’s why having sound systems in place for employees to give feedback is so beneficial. When workplaces have safe and clear communication channels, employees feel more comfortable sharing their concerns and suggesting improvements.</p>
</li>
</ul>
<p>This is an area where allies can make a big difference: non-disabled people can help advocate for improvements to the accessibility of their workplace, amplifying the messages of their disabled colleagues. No one likes to feel that they are ‘causing a fuss’, so hearing that other people agree with you is encouraging and reassuring.
It is also important for leadership and managers to take a proactive approach, signposting their willingness to accommodate the needs of their team members. That way, rather than having to pluck up the courage to start the conversation, the disabled person is invited to discuss their access needs. Since circumstances change, having an ongoing conversation about access is key to ensuring that someone has the support they need.</p>
<h2>Tarides’ Commitment</h2>
<p>Tarides values diversity and inclusion and is committed to encouraging it in our communities and ecosystems. To make good on our commitment we provide flexible hours and contracts, adaptive technology and equipment, feedback opportunities, and training. We also employ inclusive hiring and promotion practices and sponsor several internships <a href="https://www.outreachy.org/">targeting marginalised groups</a>.</p>
<p>Tarides also supports the further development of <a href="https://ocaml.org/">OCaml.org</a> and accessibility is a big piece of that puzzle. We want to diversify and simplify the ways that people learn and engage with OCaml, supporting the sustainable and inclusive growth of the community.
If you want to give us feedback on how we can improve, you can <a href="/contact/">contact us</a> on our website. You can also <a href="https://bsky.app/profile/tarides.com">follow us on Bluesky</a> and <a href="https://www.linkedin.com/company/tarides">LinkedIn</a>.</p>
<h3>Sources</h3>
<ul>
<li><a href="https://social.desa.un.org/issues/disability/crpd/convention-on-the-rights-of-persons-with-disabilities-crpd">UNCRPD 2006</a></li>
<li><a href="https://www.europarl.europa.eu/RegData/etudes/IDAN/2017/603981/EPRS_IDA(2017)603981_EN.pdf">European Disability Policy</a></li>
<li><a href="https://www.who.int/news-room/fact-sheets/detail/disability-and-health">WHO Disability and Health</a></li>
<li><a href="https://www.acas.org.uk/disability-at-work">Disability in the Workplace</a></li>
<li><a href="https://www.w3.org/">Online Accessibility</a></li>
</ul>
]]></description><link>https://tarides.com/blog/2023-12-05-international-disability-day-2023-why-it-matters</link><guid isPermaLink="false">https://tarides.com/blog/2023-12-05-international-disability-day-2023-why-it-matters.html</guid><dc:creator><![CDATA[ Isabella Leandersson ]]></dc:creator><pubDate>Tue, 05 Dec 2023 00:00:00 GMT</pubDate></item><item><title><![CDATA[How to Install OCaml 5: A Video Tutorial]]></title><description><![CDATA[<p>Have you had difficulty installing OCaml for your projects? We have created a <a href="https://youtu.be/sy4EQirNMUI">video tutorial</a> showing you how to use Opam to install OCaml on Linux and macOS.</p>
<p>All you need to complete the tutorial is a computer running either Linux or macOS, and an internet connection. By the end of the tutorial you will have OCaml 5 installed on your machine, and can start your journey with OCaml. For ideas of what to do next, check out <a href="https://ocaml.org/docs/tour-of-ocaml">the tutorials on OCaml.org</a>.</p>
<p><a href="https://www.youtube.com/watch?v=sy4EQirNMUI" title="How to Install OCaml on Linux and macOS"><img src="https://img.youtube.com/vi/sy4EQirNMUI/0.jpg" alt="OCaml Tutorial – Downloading and Installing OCaml on Linux and macOS"></a></p>
<p>Our goal is to promote and encourage wider adoption of OCaml, and to that end we support the <a href="https://ocaml.org/docs/platform">OCaml Platform</a>. The Platform gives users an overview of the active tools available in OCaml, streamlining their experience in setting up an OCaml toolsuite. The video uses the recommended toolchain outlined in the OCaml Platform.</p>
<p>As a part of encouraging the adoption of OCaml, we are committed to simplifying the process of learning the language. We recognise that everyone learns differently. Some people prefer to read tutorials and do exercises, and others use videos to understand concepts visually. Since most of us combine different ways of learning to fit our needs, it’s important for us to provide a visual alternative to the <a href="https://ocaml.org/install">installation instructions</a>.</p>
<p>There are several great content creators making videos about OCaml, including streamers that let you follow along as they are hacking. We are excited to see the community growing in this way, and welcome this contribution to the OCaml ecosystem. If you would like to engage more with the OCaml community check out <a href="https://discuss.ocaml.org/">OCaml Discuss</a>.</p>
<p>Depending on whether users find this kind of content helpful, we will continue to create more tutorials. So let us know what we can improve and if there are topics you would like us to cover next! You can comment directly on the video, or chat with us on <a href="https://bsky.app/profile/tarides.com">Bluesky</a> or <a href="https://www.linkedin.com/company/tarides">LinkedIn</a>.</p>
]]></description><link>https://tarides.com/blog/2023-11-21-how-to-install-ocaml-5-a-video-tutorial</link><guid isPermaLink="false">https://tarides.com/blog/2023-11-21-how-to-install-ocaml-5-a-video-tutorial.html</guid><dc:creator><![CDATA[ Isabella Leandersson ]]></dc:creator><pubDate>Tue, 21 Nov 2023 00:00:00 GMT</pubDate></item><item><title><![CDATA[OCaml Hacking Day in Chennai!]]></title><description><![CDATA[<p>It is a truth universally acknowledged, that a programmer in possession of a free Saturday afternoon must be in want of a good old OCaml hacking session. To address this pressing need, we recently hosted another OCaml Hacking Day event – this time at our Tarides India office in Chennai. We had a total of twelve hackers with varying levels of experience participating. Everyone gathered for talks, snacks, mingling, and most excitingly some hands-on hacking!</p>
<h2>A Talk and a Welcome</h2>
<p>People began to arrive at the office at 14.30, getting themselves situated in the dining room (probably admiring all the beautiful plants on the way). To kick things off, KC Sivaramakrishnan gave a talk welcoming the participants and introducing them to OCaml and Tarides.</p>
<p><img sizes="(min-width: 1360px) 1360px, (min-width: 680px) 680px, 100vw" srcset="/blog/images/KCtalk-170w~X986KGr7zTtA2ae48j0h5w.webp 170w, /blog/images/KCtalk-340w~zLD8Hzsd2eZppqFqMY41Cw.webp 340w, /blog/images/KCtalk-680w~oTXFlNqyqQB-8AIVYvYeqg.webp 680w, /blog/images/KCtalk-1360w~VtA-pxettVxA5OPh9SXQUw.webp 1360w" src="/blog/images/KCtalk-1360w~VtA-pxettVxA5OPh9SXQUw.webp" alt="KC is sitting on a bar stool with his laptop open in front of him on the right hand side of the image. He is speaking and making a gesture with his hands, looking to his right towards his presentation slide. The slide is at the centre of the image, and it says OCaml Hacking Today with some smaller bullet points beneath the heading. There are two other people in the image also looking at the presentation slide, both sitting on chairs on the right hand side behind KC."></p>
<p>For the uninitiated: we’re a tech company of over 70 people from across the world, in countries including Denmark, Spain, USA, Australia, and Finland. This is in addition to our three main offices in France, India, and the UK. KC further explained how we focus on building robust and high-performing systems in OCaml. In a nutshell, this includes working on big features like <a href="/blog/2023-07-07-making-ocaml-5-succeed-for-developers-and-organisations/">OCaml Multicore</a>; the <a href="https://ocaml.org/docs/platform">OCaml Platform</a> with build, package, editor, and documentation tooling; cutting-edge technologies like <a href="https://mirage.io">Mirage</a> and <a href="/blog/2023-07-31-ocaml-in-space-welcome-spaceos/">SpaceOS</a>; and finally on community-focused projects including <a href="https://ocaml.org">OCaml.org</a> and the maintenance of essential OCaml libraries.</p>
<p>With introductions out of the way, KC moved on to the heart of what it means to contribute to open-source projects. He likened it to maintaining a public garden, a useful simile that emphasises the key aspects that make open-source ecosystems successful. In open-source, you need to work collaboratively, balancing the needs and wants of the many with your own interests. Open-source community projects are designed to benefit not just the contributors but also others in the community who may not be part of that particular effort. Projects also (hopefully) outlast just one contributor: as more people join and care about something, it takes on a life of its own. Finally, open-source community projects require that participants allot significant amounts of time toward maintenance.</p>
<p>Public gardens also require a sense of community planning and collective responsibility. No one person can (hopefully) put down a pool or an ice rink in a public park just because they want to; it has to benefit the entire community that uses it. For public gardens to retain their value, a group of people have to take on necessary but unglamorous tasks: weed pulling, leaf raking, bulb planting, and so on. A garden that’s properly maintained and cared for can bring value to an entire community for multiple generations.</p>
<p>This idea is particularly impactful for people who are used to working on code in individual projects that have distinct ‘beginnings’ and ‘ends’ – for example university students who work on things in class. The public garden simile helps us understand the traits that define open-source systems and make them beneficial.</p>
<p>Lastly, KC encouraged everyone to get stuck into some hacking – inspiring several participants to make their first open-source contribution during the day!</p>
<p><img sizes="(min-width: 1360px) 1360px, (min-width: 680px) 680px, 100vw" srcset="/blog/images/terracehacking-170w~IbN4UrrIqw1-xfS-jSQ-rw.webp 170w, /blog/images/terracehacking-340w~lL5f_hArp6C0zUJDkGCOdA.webp 340w, /blog/images/terracehacking-680w~RxQgY0MWCMBhbirpPtlcvw.webp 680w, /blog/images/terracehacking-1360w~S8HXUn1a1-lAcoBW0sXxmw.webp 1360w" src="/blog/images/terracehacking-1360w~S8HXUn1a1-lAcoBW0sXxmw.webp" alt="Five people are sitting outside on a large covered roof terrace. They are gathered around two coffee tables with their laptops. Three of them are having a discussion, and the two others are looking at a laptop and focusing on that. There are several potted plants along the railing of the terrace, and we can see the wall of a white house in the background as well as a tree."></p>
<h2>Get Hacking!</h2>
<p>To make things as easy as possible for the participants, the engineers at the Chennai office made a great <a href="https://docs.google.com/spreadsheets/d/1EYHH2aHITp4L6fHwiVSxCNVe1fczK6zsNhmDt7AWSvM/edit#gid=0">list of projects</a> suitable for different experience levels. Each project had a description, a difficulty level from one to five, and any extra references that could be useful. Everyone would pick a project and get started, with Sudha, Shakthi, and KC on standby to help out if anyone got stuck.</p>
<p>An impressive number of contributions came out of the Hacking Day, with several pull requests created by the end of the session:</p>
<ul>
<li><a href="https://github.com/ocaml-multicore/effects-examples/pulls?q=is%3Apr+is%3Aclosed">Multicore Tutorial on OCaml Effects</a>: There were several fixes to the Multicore tutorials on effects, updating links and information to accurately reflect current information after the 5.1.0 update.</li>
<li><a href="https://github.com/ocaml-multicore/parallel-programming-in-multicore-ocaml/pull/19">Multicore Tutorial for parallel programming</a>: This PR adds a new section describing how to use <a href="https://github.com/tarides/runtime_events_tools">Olly</a> for instrumentation if the user is running OCaml version 5.0 or later.</li>
<li><a href="https://github.com/ocaml-multicore/ocaml5-tutorial/pull/11">Updated OCaml 5 readme</a>: An update to the OCaml 5 readme to install OCaml 5.1.0.</li>
<li><a href="https://github.com/tarides/runtime_events_tools/pull/28">Runtime Event Tools</a>: Added more information about GC-Stats to the Olly Manual Page.</li>
<li>TSan: One participant used TSan to identify some races in <a href="https://github.com/ocaml-multicore/saturn">Saturn’s</a> test-suite!</li>
<li><a href="https://github.com/ocaml/ocaml/pull/12683">OCaml Trunk</a>: This PR adds a <code>noalloc</code> annotation to a function. This optimisation makes the function run faster and will help in circumstances where it is being called many times in one program.</li>
<li><a href="https://github.com/ocaml-multicore/domainslib/pull/120">Domainslib</a>: Adds parallel versions of commonly used Array iterators.</li>
</ul>
<h2>Snacks &amp; Swag</h2>
<p>We couldn’t let our visitors leave hungry or empty-handed! There was a delicious assortment of samosas in the afternoon and of course pizzas in the evening. Having food together was a great opportunity for people to have discussions and share their experiences even after the main hacking was done.</p>
<p><img sizes="(min-width: 1360px) 1360px, (min-width: 680px) 680px, 100vw" srcset="/blog/images/eatingfood-170w~OQ3-6mL9nuSr89dAUg8dJA.webp 170w, /blog/images/eatingfood-340w~EW113-GJazNABGtfqlcApQ.webp 340w, /blog/images/eatingfood-680w~LWBAkAMerlI_h4sYcRZkMA.webp 680w, /blog/images/eatingfood-1360w~KCxDM8VoKe9ZJsIZrsTPzg.webp 1360w" src="/blog/images/eatingfood-1360w~KCxDM8VoKe9ZJsIZrsTPzg.webp" alt="Five women are seated around a dining table on the terrace having food. The angle is from head of the table looking at the other end. One woman is looking into the camera and smiling. In the background are two young children playing. The wall behind the table is white, and to the right is a banister behind which is the white wall of another house."></p>
<p>We also gave attendees an assortment of swag to take home – including a t-shirt, custom keyrings, stickers, and flyers.</p>
<p><img sizes="(min-width: 1360px) 1360px, (min-width: 680px) 680px, 100vw" srcset="/blog/images/swagchennai-170w~BBPJrlTXq-6poSjERPajXA.webp 170w, /blog/images/swagchennai-340w~_nOhpanHwjbY9VLAZ2jYpg.webp 340w, /blog/images/swagchennai-680w~GeuZqAJhZP7L7nbaN5UgUw.webp 680w, /blog/images/swagchennai-1360w~0X_G7g8zfX5gP4ny6lzq3w.webp 1360w" src="/blog/images/swagchennai-1360w~0X_G7g8zfX5gP4ny6lzq3w.webp" alt="A satchel-type bag with a black-and-white chevron design and a brown handle and shoulder strap lies on a white table. Attached to the bag is an OCaml keychain, which looks like a black square with the orange OCaml camel silhouette on top. Next to the bag is a set of keys which also have the same keychain attached."></p>
<h2>Until Next Time!</h2>
<p>It was great to see such interest in OCaml and open-source programming – hopefully it has given everyone a taste for more. We look forward to seeing more people attend our Hacking Days in the future, so keep an eye out for future announcements. You should follow us on <a href="https://bsky.app/profile/tarides.com">Bluesky</a> and <a href="https://www.linkedin.com/company/tarides">LinkedIn</a> to stay up-to-date with what we’re up to as well as when we host events.</p>
]]></description><link>https://tarides.com/blog/2023-11-09-ocaml-hacking-day-in-chennai</link><guid isPermaLink="false">https://tarides.com/blog/2023-11-09-ocaml-hacking-day-in-chennai.html</guid><dc:creator><![CDATA[ Isabella Leandersson ]]></dc:creator><pubDate>Thu, 09 Nov 2023 00:00:00 GMT</pubDate></item><item><title><![CDATA[WebAssembly Support for OCaml: Introducing Wasm_of_Ocaml]]></title><description><![CDATA[<p>OCaml is constantly evolving. Developers collaborate to bring support for new features, improve workflows, and resolve pain points. To this end, the <a href="https://github.com/ocaml-wasm">OCaml-Wasm</a> GitHub organisation was recently established. Its goal is to bring WebAssembly support to OCaml.</p>
<p>WebAssembly, more commonly known as Wasm, is a low-level virtual machine that is both language- and platform-independent. In essence, Wasm is a binary format designed as a portable compilation target for programming languages. It enables deployment on a variety of platforms like web browsers, cloud, blockchain, and mobile.</p>
<p>Wasm makes no assumptions about language features or operations. As a result, Wasm is technically compatible with any programming language since its code can be interpreted as virtual hardware.</p>
<h2>Why WebAssembly</h2>
<p>You might be wondering why you should care about Wasm support in OCaml. Well, Wasm has several advantages that make it popular with developers and organisations all over the world. Briefly, they are:</p>
<ul>
<li><strong>Security:</strong> Wasm has a fully formalised type system and semantics. Wasm engines validate (type check) code and execute it in a memory-safe, isolated environment known as a sandbox. Wasm code performs predictably, with no crashes or unexpected actions, and within restrictions that limit access to the user's local resources.</li>
<li><strong>Speed:</strong> Wasm can take advantage of language implementation techniques like just-in-time (JIT) and ahead-of-time (AOT) compilation, alongside other capabilities common to contemporary hardware. It allows Wasm code to perform at near-native code levels of performance.</li>
<li><strong>Openness:</strong> Wasm is an open standard, meaning that it and all its documentation are openly and freely accessible to developers. Anyone can influence the evolution of Wasm by participating in the <a href="https://www.w3.org/community/webassembly/">W3C Community Group</a>.</li>
<li><strong>Language Neutrality:</strong> As previously mentioned, Wasm works by abstracting hardware and doesn't make any assumptions about language features. This makes Wasm language-neutral, meaning it does not privilege any language, programming model, or object model above another.</li>
<li><strong>Platform Independence:</strong> Wasm can be built and deployed on different platforms regardless of the OS, hardware, or programming language as long as the Wasm virtual machine is supported.</li>
<li><strong>Browser Support:</strong> Wasm is supported by all major browsers including Chrome, Mozilla Firefox, and Safari.</li>
</ul>
<h3>Who is Using Wasm?</h3>
<p>Currently, several companies and organisations use Wasm. For example, the cross-platform game engine  <a href="https://unity.com">Unity</a> is <a href="https://blog.unity.com/technology/webassembly-is-here">using Wasm</a> to reduce code size, manage memory, and improve load times.  <a href="https://docs.fastly.com">Fastly</a> also uses Wasm. Fastly is a company that offers numerous Network Services for their <a href="https://docs.fastly.com/products/compute-at-edge">Compute@Edge</a> platform. <a href="https://www.figma.com">Figma</a>, an online collaborative design platform, also uses Wasm to <a href="https://www.figma.com/blog/webassembly-cut-figmas-load-time-by-3x/">cut their loading times</a>. These are just a few examples of how Wasm is being used to great effect, illustrating the potential and desirability of Wasm.</p>
<h3>Future Features</h3>
<p>The current <a href="https://webassembly.github.io/spec/core/">Wasm core specification</a>, whilst very useful for performance-critical tasks deployed on the cloud, is still quite simplistic. Users need to go through JavaScript to manipulate the DOM and also need to explicitly keep track of pointers to JavaScript values. Consequently, it is currently not feasible to write large web applications in Wasm.</p>
<p>That is all about to change as there are multiple proposals to bring new features to Wasm. The most relevant to OCaml is the <a href="https://github.com/WebAssembly/gc/blob/main/proposals/gc/MVP.md">Garbage Collection proposal</a> which will provide heap-allocated data structures that are garbage collected and can directly contain references to foreign values. It is being implemented together with the <a href="https://github.com/WebAssembly/function-references">typed function references</a> proposal. They are expected to ship in November on both Chrome and Firefox. Another proposal includes support for <a href="https://github.com/WebAssembly/tail-call">tail calls</a>. These forward-looking features will make Wasm applicable for an even wider range of uses.</p>
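<p>To illustrate why the tail call proposal matters for a functional language, here is a minimal OCaml sketch (the function name <code>sum_to</code> is purely illustrative). Idiomatic OCaml often expresses loops as tail-recursive functions, and relies on the compiler running them in constant stack space. A compilation target that does not guarantee tail calls would need some workaround to preserve that behaviour.</p>
<pre><code class="language-ocaml">(* Sum the integers from 1 to n using an accumulator. The recursive call
   is the last thing [go] does, so it is a tail call. *)
let sum_to n =
  let rec go acc i =
    if i = 0 then acc else go (acc + i) (i - 1)
  in
  go 0 n

let () =
  (* One million iterations: with tail calls, the stack does not grow with
     the number of iterations. *)
  Printf.printf "%d\n" (sum_to 1_000_000)
</code></pre>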
<h2>WebAssembly and OCaml</h2>
<p>With all these benefits and future potential, it's not hard to see why the community is eager to see support for Wasm in OCaml. Using Wasm as a compilation target will allow for faster web performance (in comparison to JavaScript) as well as potentially unlocking new platforms on which to run OCaml.  The <a href="https://github.com/ocaml-wasm">OCaml-Wasm</a> organisation is bringing previous efforts together to collaborate on implementing WebAssembly for OCaml.</p>
<p>There are two main prongs of the effort at the moment. One is <code>wasocaml</code>, an <a href="https://ocamlpro.com/blog/2022_12_14_wasm_and_ocaml/">experimental compiler backend</a> that targets Wasm using the Flambda intermediate representation of the OCaml compiler.  Engineers at <a href="https://ocamlpro.com">OCamlPro</a> are behind this approach and you can <a href="https://github.com/ocaml-wasm/wasocaml">check out the repo here</a>.</p>
<p>The other approach uses a toolchain to compile OCaml to Wasm based on the tried-and-tested <a href="https://github.com/ocsigen/js_of_ocaml"><code>js_of_ocaml</code></a> method. Called <a href="https://github.com/ocaml-wasm/wasm_of_ocaml"><code>wasm_of_ocaml</code></a>, this toolchain takes OCaml bytecode as input and emits Wasm.</p>
<p>It is relevant to mention two other methods created to run OCaml programs using Wasm runtimes. These methods are appropriate for use cases where the speed of generated code is less of a concern, and differ from <code>wasocaml</code> and <code>wasm_of_ocaml</code> by being mainly intended for server-side applications. Both <a href="https://github.com/sebmarkbage/ocamlrun-wasm"><code>ocamlrun-wasm</code></a> and <a href="https://github.com/remixlabs/wasicaml"><code>wasicaml</code></a> are ports of the OCaml bytecode interpreter to Wasm. Wasicaml also has a compiler mode that parses a bytecode executable and translates it to Wasm in a similar way to <code>wasm_of_ocaml</code>, but simpler.</p>
<p>Since <code>wasm_of_ocaml</code> was and continues to be developed mainly by Tarides engineers, this article will focus on this tool. To get more information about <code>wasocaml</code>, visit <a href="https://ocamlpro.com/blog/">OCamlPro's blog</a>.</p>
<h2>Wasm_of_OCaml</h2>
<p>As previously mentioned, <code>wasm_of_ocaml</code> is designed to use OCaml bytecode as its input to emit Wasm code. It uses the same approach as the popular <code>js_of_ocaml</code>, which compiles OCaml bytecode to JavaScript. <code>wasm_of_ocaml</code> also aims to make it possible to compile programs made for <code>js_of_ocaml</code> with <code>wasm_of_ocaml</code> with minimal changes. Both <code>js_of_ocaml</code> and <code>wasm_of_ocaml</code> are the brainchildren of Jérôme Vouillon, currently a principal software engineer at Tarides.</p>
<p>Recently, there have been significant strides towards implementing runtime bindings in <code>wasm_of_ocaml</code>. The toolchain can now compile <code>ocamlc</code> into Wasm and run the <a href="https://github.com/janestreet/bonsai">Bonsai tests and examples</a>. The first benchmarks are encouraging, with compiled programs running 30% faster on average than their <code>js_of_ocaml</code> equivalents.</p>
<p>With a large part of the <a href="https://github.com/ocaml-wasm/wasm_of_ocaml/issues/5">OCaml runtime already implemented</a>, there are several additional PRs in the works to get Wasm supported in <a href="https://github.com/ocaml/dune/pull/8278"><code>dune</code></a>, <a href="https://github.com/LexiFi/gen_js_api/pull/173"><code>gen_js_api</code></a>, and <a href="https://github.com/dbuenzli/brr/pull/51"><code>Brr</code></a>. On the whole, <code>wasm_of_ocaml</code> is getting impressively close to completion thanks to the sustained efforts of Jérôme.</p>
<p>The process is not entirely without challenges, and some adaptations have had to be made for OCaml and Wasm to be compatible. For example, although <code>wasm_of_ocaml</code> builds on the <code>js_of_ocaml</code> compiler to target Wasm, it still needed some extra adjustments regarding closures. JavaScript supports closures whereas Wasm doesn't, so <code>wasm_of_ocaml</code> adds a closure conversion phase to eliminate closures and instead target Wasm's closed functions.</p>
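<p>To make the idea of closure conversion more concrete, here is a minimal sketch written in plain OCaml. It is only an illustration of the general technique, not the transformation <code>wasm_of_ocaml</code> actually performs on its intermediate representation: the <code>closure</code> record type and the names <code>add_code</code>, <code>make_adder_converted</code>, and <code>apply</code> are hypothetical and chosen for this example.</p>
<pre><code class="language-ocaml">(* Source program: [make_adder] returns a closure that captures [n]. *)
let make_adder n = fun x -&gt; x + n

(* After closure conversion (hypothetical names, for illustration only):
   a closure is represented as a closed function paired with an explicit
   environment holding the captured variables. *)
type ('env, 'a, 'b) closure = { code : 'env -&gt; 'a -&gt; 'b; env : 'env }

(* Closed function: it refers only to its own parameters. *)
let add_code env x = x + env

(* Building the closure pairs the code with the captured value. *)
let make_adder_converted n = { code = add_code; env = n }

(* Calling a closure passes the stored environment back to the code. *)
let apply c x = c.code c.env x

let () =
  assert (make_adder 5 10 = 15);
  assert (apply (make_adder_converted 5) 10 = 15)
</code></pre>
<p>Both versions compute the same result. The difference is that <code>add_code</code> no longer has free variables, which is the property a target with only closed functions requires.</p>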
<p>There is also the need to consider support for effects, a feature added to OCaml in the <a href="/blog/2022-12-19-ocaml-5-with-multicore-support-is-here/">OCaml 5 release</a>. Algebraic effects, which permit non-local control flow in a program and are useful for implementing concurrency, are supported in <code>js_of_ocaml</code> through a static-analysis-guided <a href="https://github.com/ocsigen/js_of_ocaml/pull/1384">selective Continuation-Passing Style (CPS)</a> transformation. <code>wasm_of_ocaml</code> supports effect handlers in two ways, one of which is via a CPS transformation like in <code>js_of_ocaml</code>. A CPS transformation introduces overhead, however, and the feature is opt-in only. The second way is through the <a href="https://v8.dev/blog/jspi">JavaScript Promise Integration proposal</a>, which introduces less overhead than the CPS transformation at the cost of requiring another extension. Interestingly, a <a href="https://2023.splashcon.org/details/splash-2023-oopsla/48/Continuing-WebAssembly-with-Effect-Handlers">paper proposing support for effect handlers in Wasm</a> and a <a href="https://github.com/WebAssembly/stack-switching">stack switching proposal</a> present other paths to addressing effect handlers in Wasm.</p>
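<p>As a rough illustration of what continuation-passing style looks like, the sketch below compares a direct-style function with a hand-written CPS counterpart. It is not the output of <code>js_of_ocaml</code> or <code>wasm_of_ocaml</code>, and the function names <code>sum_direct</code> and <code>sum_cps</code> are made up for this example.</p>
<pre><code class="language-ocaml">(* Direct style: the result is returned to the caller. *)
let rec sum_direct = function
  | [] -&gt; 0
  | x :: rest -&gt; x + sum_direct rest

(* Continuation-passing style: the function takes an extra argument [k],
   the continuation, and passes its result to [k] instead of returning. *)
let rec sum_cps l k =
  match l with
  | [] -&gt; k 0
  | x :: rest -&gt; sum_cps rest (fun total -&gt; k (x + total))

let () =
  assert (sum_direct [ 1; 2; 3 ] = 6);
  (* The identity function serves as the initial continuation. *)
  assert (sum_cps [ 1; 2; 3 ] (fun total -&gt; total) = 6)
</code></pre>
<p>In this sketch, every recursive step of <code>sum_cps</code> builds a new function for the continuation, which hints at where the overhead of a CPS transformation comes from. In exchange, the rest of the computation becomes an ordinary value that can be stored and resumed later, which is the capability effect handlers rely on.</p>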
<h2>Next Steps &amp; Feedback</h2>
<p>It is incredibly helpful for the team to get feedback from people who try <code>wasm_of_ocaml</code>. You can contribute to the effort on <a href="https://github.com/ocaml-wasm/wasm_of_ocaml">the repo</a>, which also has instructions for how to install and use the toolchain.</p>
<p>Don't hesitate to <a href="/contact/">contact us</a> if you have any questions or comments. The <a href="https://discuss.ocaml.org">Discuss Forum</a> is another great place to ask questions or share your thoughts. We look forward to seeing you around the OCaml community!</p>
]]></description><link>https://tarides.com/blog/2023-11-01-webassembly-support-for-ocaml-introducing-wasm-of-ocaml</link><guid isPermaLink="false">https://tarides.com/blog/2023-11-01-webassembly-support-for-ocaml-introducing-wasm-of-ocaml.html</guid><dc:creator><![CDATA[ Isabella Leandersson ]]></dc:creator><pubDate>Wed, 01 Nov 2023 00:00:00 GMT</pubDate></item><item><title><![CDATA[Tutorial: Building a Browser Extension With Irmin]]></title><description><![CDATA[<p><a href="https://irmin.org">Irmin</a> is a collection of OCaml libraries that makes it easy to build applications with Git-like data stores. We <a href="https://github.com/mirage/irmin/releases/tag/3.9.0">recently released</a> <code>irmin-client</code> and <code>irmin-server</code> as official Irmin packages. These packages open up a new way to use Irmin by implementing a custom protocol that lets you write a client application that can interact with a remote data store as if it is local using the <a href="https://mirage.github.io/irmin/irmin/Irmin/module-type-S/index.html">Irmin Store API</a>.</p>
<p>In addition to creating a <a href="https://github.com/mirage/irmin/tree/main/examples/server">simple example</a>, we also thought it would be fun to build a browser extension that demonstrates a real-life application of these packages and the portability of <a href="/blog/2022-08-02-irmin-in-the-browser/">Irmin in the browser</a>. We created <a href="https://github.com/tarides/irmin-bookmarks"><code>irmin-bookmarks</code></a>, a browser extension for saving bookmarks in a Git repository. This post gives an overview of the project!</p>
<h2>Creating a Browser Extension</h2>
<p>At the core of a browser extension is its <code>manifest.json</code>. This is the primary metadata for the extension that tells the browser about the extension: its name, its icons, what permissions it needs, what extension features it uses, etc.</p>
<p>Here is the <a href="https://github.com/tarides/irmin-bookmarks/blob/main/extension/manifest.json"><code>manifest.json</code></a> for <code>irmin-bookmarks</code>:</p>
<pre><code class="language-javascript">{
  "manifest_version": 2,
  "name": "Irmin Bookmarks",
  "version": "1.0",
  "description": "Save bookmarks to a local git repository. Powered by Irmin.",
  "icons": {
    "48": "icons/icon.png",
    "96": "icons/icon@2x.png"
  },
  "options_ui": {
    "page": "options/index.html"
  },
  "permissions": [
    "storage",
    "tabs"
  ],
  "browser_action": {
    "default_icon": "icons/icon@2x.png",
    "default_title": "Add bookmark!",
    "default_popup": "popup/index.html"
  }
}
</code></pre>
<p>We will focus on the <code>browser_action</code> key in this article, but you can read more about the keys in this file <a href="https://developer.mozilla.org/en-US/docs/Mozilla/Add-ons/WebExtensions/manifest.json">on MDN</a>. Note: this key has been renamed to <code>action</code> in version 3 of the manifest specification; we are using version 2 for the widest browser compatibility.</p>
<p>The <code>browser_action</code> key defines the look and behaviour of the button that represents our extension. We want the UI to display when our button is clicked, so we set <code>default_popup</code> to an HTML page that will display our UI for adding a bookmark.</p>
<p>The UI for adding a bookmark looks like this:</p>
<p><img sizes="(min-width: 1360px) 1360px, (min-width: 680px) 680px, 100vw" srcset="/blog/images/uiforbookmark-170w~3hICj7zw1VD9m3F94_fEuA.webp 170w, /blog/images/uiforbookmark-340w~KXmrzNUskLYRZpj6591D0A.webp 340w, /blog/images/uiforbookmark-680w~gYKgQESAudD1PQW3mVZx5A.webp 680w, /blog/images/uiforbookmark-1360w~hD4G3uNK57QhSlbXdZ0mug.webp 1360w" src="/blog/images/uiforbookmark-1360w~hD4G3uNK57QhSlbXdZ0mug.webp" alt="UI for the menu of saving a bookmark in the browser, shows a pop-up card with the option to click 'save' the current webpage to bookmarks. It also lets users name and add notes to the bookmark."></p>
<p>The HTML page for this UI, <code>popup/index.html</code>, has a simple body definition:</p>
<pre><code class="language-html">&lt;body&gt;
  &lt;div id="ui"&gt;&lt;/div&gt;
  &lt;script src="popup.js"&gt;&lt;/script&gt;
&lt;/body&gt;
</code></pre>
<p>The UI is created and managed through <code>popup.js</code>, which is compiled from the following OCaml code to JavaScript using <a href="https://github.com/ocsigen/js_of_ocaml"><code>js_of_ocaml</code></a>:</p>
<pre><code><span class="ocaml-comment-block">(*</span><span class="ocaml-comment-block"> extension/popup/popup.ml </span><span class="ocaml-comment-block">*)</span><span class="ocaml-source">
</span><span class="ocaml-keyword-other">open</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Shared</span><span class="ocaml-source">
</span><span class="ocaml-source">
</span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">main</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-unit">()</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-keyword">open</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Lwt</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-constant-language-capital-identifier">Syntax</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">in</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword">let</span><span class="ocaml-keyword">*</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">tab</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Browser</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">tabs</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">|&gt;</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Tabs</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">active</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">in</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">model</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">name</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Tab</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">title</span><span class="ocaml-source"> </span><span class="ocaml-source">tab</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">in</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">url</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Tab</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">url</span><span class="ocaml-source"> </span><span class="ocaml-source">tab</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">in</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">created_at</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Date</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">now</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-unit">()</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">in</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-constant-language-capital-identifier">Model</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">v</span><span class="ocaml-source"> ~</span><span class="ocaml-source">created_at</span><span class="ocaml-source"> ~</span><span class="ocaml-source">name</span><span class="ocaml-source"> ~</span><span class="ocaml-source">url</span><span class="ocaml-source"> ~</span><span class="ocaml-source">notes</span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword-other">in</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword">let</span><span class="ocaml-keyword">*</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">client</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Client</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">connect</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-unit">()</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">in</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-constant-language-capital-identifier">Ui</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">bind</span><span class="ocaml-source"> </span><span class="ocaml-source">client</span><span class="ocaml-source"> </span><span class="ocaml-source">model</span><span class="ocaml-source">
</span><span class="ocaml-source">
</span><span class="ocaml-keyword-other">let</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-unit">()</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Document</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">on_content_loaded</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">@@</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">fun</span><span class="ocaml-source"> </span><span class="ocaml-constant-language">_</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Lwt</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">async</span><span class="ocaml-source"> </span><span class="ocaml-source">main</span><span class="ocaml-source">
</span></code></pre>
<p>We only need a subset of the browser and extension APIs, so we wrap what we need using <a href="https://github.com/dbuenzli/brr"><code>Brr</code></a> in <a href="https://github.com/tarides/irmin-bookmarks/blob/main/extension/shared/ext.ml">one shared file</a>. Here is a snippet from this file that shows how to bind to the <a href="https://developer.mozilla.org/en-US/docs/Mozilla/Add-ons/WebExtensions/API/tabs"><code>tabs</code></a> browser extension API, as used above:</p>
<pre><code><span class="ocaml-comment-block">(*</span><span class="ocaml-comment-block"> snippet from extension/shared/ext.ml </span><span class="ocaml-comment-block">*)</span><span class="ocaml-source">
</span><span class="ocaml-keyword-other">module</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Tab</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">struct</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword-other">type</span><span class="ocaml-source"> </span><span class="ocaml-source">t</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Jv</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">t</span><span class="ocaml-source">
</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">title</span><span class="ocaml-source"> </span><span class="ocaml-source">t</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Jv</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">get</span><span class="ocaml-source"> </span><span class="ocaml-source">t</span><span class="ocaml-source"> </span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-string-quoted-double">title</span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">|&gt;</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Jv</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">to_string</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">url</span><span class="ocaml-source"> </span><span class="ocaml-source">t</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Jv</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">get</span><span class="ocaml-source"> </span><span class="ocaml-source">t</span><span class="ocaml-source"> </span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-string-quoted-double">url</span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">|&gt;</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Jv</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">to_string</span><span class="ocaml-source">
</span><span class="ocaml-keyword-other">end</span><span class="ocaml-source">
</span><span class="ocaml-source">
</span><span class="ocaml-keyword-other">module</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Browser</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">struct</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">v</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Jv</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">t</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Jv</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">get</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Jv</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">global</span><span class="ocaml-source"> </span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-string-quoted-double">browser</span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">tabs</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Tabs</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">t</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Jv</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">get</span><span class="ocaml-source"> </span><span class="ocaml-source">v</span><span class="ocaml-source"> </span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-string-quoted-double">tabs</span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-source">
</span><span class="ocaml-keyword-other">end</span><span class="ocaml-source">
</span></code></pre>
<p>The core UI code for the popup is in <a href="https://github.com/tarides/irmin-bookmarks/blob/main/extension/popup/ui.ml"><code>extension/popup/ui.ml</code></a>. Since our UI is not that complicated, the code implements rendering as a simple recursive function based on the state of the UI that calculates the appropriate DOM elements and replaces them as needed in the <code>&lt;div id="ui"&gt;&lt;/div&gt;</code> element of our HTML page:</p>
<pre><code><span class="ocaml-comment-block">(*</span><span class="ocaml-comment-block"> render snippet from extension/popup/ui.ml </span><span class="ocaml-comment-block">*)</span><span class="ocaml-source">
</span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-keyword">rec </span><span class="ocaml-entity-name-function-binding">render</span><span class="ocaml-source"> </span><span class="ocaml-source">t</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword">let</span><span class="ocaml-keyword">+</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">elems</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-keyword-other">match</span><span class="ocaml-source"> </span><span class="ocaml-source">t</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">with</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-keyword-other">|</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Disconnected</span><span class="ocaml-source"> </span><span class="ocaml-source">e</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Lwt</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">return</span><span class="ocaml-source"> </span><span class="ocaml-source">[</span><span class="ocaml-source"> </span><span class="ocaml-source">msg</span><span class="ocaml-source"> </span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-string-quoted-double">error</span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-source"> </span><span class="ocaml-source">e</span><span class="ocaml-source"> </span><span class="ocaml-source">]</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-keyword-other">|</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Connected</span><span class="ocaml-source"> </span><span class="ocaml-source">{</span><span class="ocaml-source"> </span><span class="ocaml-source">client</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source"> </span><span class="ocaml-source">model</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source"> </span><span class="ocaml-constant-language">_</span><span class="ocaml-source"> </span><span class="ocaml-source">}</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source">
</span><span class="ocaml-source">        </span><span class="ocaml-keyword-other">let</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-unit">()</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source">
</span><span class="ocaml-source">          </span><span class="ocaml-constant-language-capital-identifier">Lwt</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">async</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">@@</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">fun</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-unit">()</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source">
</span><span class="ocaml-source">          </span><span class="ocaml-keyword">let</span><span class="ocaml-keyword">*</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">model</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source">
</span><span class="ocaml-source">            </span><span class="ocaml-keyword">let</span><span class="ocaml-keyword">+</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">saved_model</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Client</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">load</span><span class="ocaml-source"> </span><span class="ocaml-source">client</span><span class="ocaml-source"> </span><span class="ocaml-source">model</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">in</span><span class="ocaml-source">
</span><span class="ocaml-source">            </span><span class="ocaml-keyword-other">match</span><span class="ocaml-source"> </span><span class="ocaml-source">saved_model</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">with</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">None</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-source">model</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">|</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Some</span><span class="ocaml-source"> </span><span class="ocaml-source">m</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-source">m</span><span class="ocaml-source">
</span><span class="ocaml-source">          </span><span class="ocaml-keyword-other">in</span><span class="ocaml-source">
</span><span class="ocaml-source">          </span><span class="ocaml-source">render</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-constant-language-capital-identifier">Loaded</span><span class="ocaml-source"> </span><span class="ocaml-source">{</span><span class="ocaml-source"> </span><span class="ocaml-source">model</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source"> </span><span class="ocaml-source">client</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source"> </span><span class="ocaml-source">error</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">None</span><span class="ocaml-source"> </span><span class="ocaml-source">}</span><span class="ocaml-source">)</span><span class="ocaml-source">
</span><span class="ocaml-source">        </span><span class="ocaml-keyword-other">in</span><span class="ocaml-source">
</span><span class="ocaml-source">        </span><span class="ocaml-constant-language-capital-identifier">Lwt</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">return</span><span class="ocaml-source"> </span><span class="ocaml-source">[</span><span class="ocaml-source"> </span><span class="ocaml-source">msg</span><span class="ocaml-source"> </span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-string-quoted-double">info</span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-source"> </span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-string-quoted-double">Loading...</span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-source"> </span><span class="ocaml-source">]</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-keyword-other">|</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Loaded</span><span class="ocaml-source"> </span><span class="ocaml-source">{</span><span class="ocaml-source"> </span><span class="ocaml-source">client</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source"> </span><span class="ocaml-source">model</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source"> </span><span class="ocaml-source">error</span><span class="ocaml-source"> </span><span class="ocaml-source">}</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source">
</span><span class="ocaml-source">        </span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">ui</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source">
</span><span class="ocaml-source">          </span><span class="ocaml-source">header</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-unit">()</span><span class="ocaml-source">
</span><span class="ocaml-source">          </span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source"> </span><span class="ocaml-source">form</span><span class="ocaml-source"> </span><span class="ocaml-source">model</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-keyword-other">fun</span><span class="ocaml-source"> </span><span class="ocaml-source">model</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source">
</span><span class="ocaml-source">                 </span><span class="ocaml-constant-language-capital-identifier">Lwt</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">async</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">@@</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">fun</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-unit">()</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source">
</span><span class="ocaml-source">                 </span><span class="ocaml-keyword">let</span><span class="ocaml-keyword">*</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">r</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Client</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">save</span><span class="ocaml-source"> </span><span class="ocaml-source">client</span><span class="ocaml-source"> </span><span class="ocaml-source">model</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">in</span><span class="ocaml-source">
</span><span class="ocaml-source">                 </span><span class="ocaml-keyword-other">match</span><span class="ocaml-source"> </span><span class="ocaml-source">r</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">with</span><span class="ocaml-source">
</span><span class="ocaml-source">                 </span><span class="ocaml-keyword-other">|</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Ok</span><span class="ocaml-source"> </span><span class="ocaml-constant-language">_</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Window</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">close</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-unit">()</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">|&gt;</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Lwt</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">return</span><span class="ocaml-source">
</span><span class="ocaml-source">                 </span><span class="ocaml-keyword-other">|</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Error</span><span class="ocaml-source"> </span><span class="ocaml-source">error</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source">
</span><span class="ocaml-source">                     </span><span class="ocaml-source">render</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-constant-language-capital-identifier">Loaded</span><span class="ocaml-source"> </span><span class="ocaml-source">{</span><span class="ocaml-source"> </span><span class="ocaml-source">error</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Some</span><span class="ocaml-source"> </span><span class="ocaml-source">error</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source"> </span><span class="ocaml-source">model</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source"> </span><span class="ocaml-source">client</span><span class="ocaml-source"> </span><span class="ocaml-source">}</span><span class="ocaml-source">)</span><span class="ocaml-source">)</span><span class="ocaml-source">
</span><span class="ocaml-source">        </span><span class="ocaml-keyword-other">in</span><span class="ocaml-source">
</span><span class="ocaml-source">        </span><span class="ocaml-source">(</span><span class="ocaml-keyword-other">match</span><span class="ocaml-source"> </span><span class="ocaml-source">error</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">with</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">None</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-source">ui</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">|</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Some</span><span class="ocaml-source"> </span><span class="ocaml-source">err</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-source">msg</span><span class="ocaml-source"> </span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-string-quoted-double">error</span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-source"> </span><span class="ocaml-source">err</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source"> </span><span class="ocaml-source">ui</span><span class="ocaml-source">)</span><span class="ocaml-source">
</span><span class="ocaml-source">        </span><span class="ocaml-keyword-operator">|&gt;</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Lwt</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">return</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword-other">in</span><span class="ocaml-source">
</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">ui</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Document</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">lookup_by_id</span><span class="ocaml-source"> </span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-string-quoted-double">ui</span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">in</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">_</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-source">elems</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-keyword-operator">|&gt;</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">List</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">map</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Brr</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-constant-language-capital-identifier">El</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">to_jv</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-keyword-operator">|&gt;</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Array</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">of_list</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-keyword-operator">|&gt;</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Jv</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">call</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-constant-language-capital-identifier">Brr</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-constant-language-capital-identifier">El</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">to_jv</span><span class="ocaml-source"> </span><span class="ocaml-source">ui</span><span class="ocaml-source">)</span><span class="ocaml-source"> </span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-string-quoted-double">replaceChildren</span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword-other">in</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-constant-language-unit">()</span><span class="ocaml-source">
</span></code></pre>
<p>You can see references in the extension code to <code>Model</code> and <code>Client</code>. We now turn to the core part of the extension: writing our integration with <code>irmin-client</code> and <code>irmin-server</code>!</p>
<h2>Creating Our Client and Server</h2>
<p>Like the rest of Irmin, <code>irmin-client</code> and <code>irmin-server</code> are libraries meant to be used in applications:</p>
<ul>
<li><code>irmin-server</code> lets you write a server application that exposes an Irmin store's API using a custom protocol via an HTTP or WebSocket connection.</li>
<li><code>irmin-client</code> lets you build a client application that can connect to a remote Irmin store served by <code>irmin-server</code>.</li>
</ul>
<p>The client and server are wrappers around Irmin stores, so the first step is to decide how to set up our store. When creating an Irmin store, you need to make some choices:</p>
<ul>
<li>Which Irmin backend do I want to use?</li>
<li>What is the content type for my store?</li>
</ul>
<p>For <code>irmin-bookmarks</code>, we chose to use <a href="https://mirage.github.io/irmin/irmin-git/index.html"><code>irmin-git</code></a> as our backend since we wanted our bookmarks stored in a Git repository for easy backing up and sharing to a remote Git host. Some other backends that Irmin provides are:</p>
<ul>
<li><code>irmin-fs</code> for simple filesystem-based storage</li>
<li><code>irmin-pack</code> for <a href="/blog/2023-05-05-optimising-archive-node-storage-for-tezos/">optimised storage of large amounts of data</a> with features like <a href="/blog/2022-11-10-towards-minimal-disk-usage-for-tezos-bakers/">garbage collection</a></li>
<li><code>irmin-mem</code> for in-memory-only storage</li>
</ul>
<p>To create our server module, we only need the following <a href="https://github.com/tarides/irmin-bookmarks/blob/90f2cbd9d04ddb3327ccf638b227521637d5799b/server/main.ml#L2-L4">three lines of code</a>:</p>
<pre><code><span class="ocaml-keyword-other">module</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Store</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Irmin_git_unix</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-constant-language-capital-identifier">FS</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-constant-language-capital-identifier">KV</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-constant-language-capital-identifier">Model</span><span class="ocaml-source">)</span><span class="ocaml-source">
</span><span class="ocaml-keyword-other">module</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Codec</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Irmin_server</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-constant-language-capital-identifier">Conn</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-constant-language-capital-identifier">Codec</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-constant-language-capital-identifier">Bin</span><span class="ocaml-source">
</span><span class="ocaml-keyword-other">module</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Server</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Irmin_server_unix</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-constant-language-capital-identifier">Make_ext</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-constant-language-capital-identifier">Codec</span><span class="ocaml-source">)</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-constant-language-capital-identifier">Store</span><span class="ocaml-source">)</span><span class="ocaml-source">
</span></code></pre>
<p>The first line creates our Irmin store: a key-value (<code>KV</code>) store that persists to disk (<code>FS</code>) in a Git-compatible repository (<code>Irmin_git_unix</code>). We will discuss our custom content type, <code>Model</code>, later. The second line defines the wire encoding for communication between our client and server. We choose a binary encoding (<code>Codec.Bin</code>), but JSON is also available. The final line uses our codec and store to create a server that can bind to a local port, which our client then connects to. For the complete server binary, see <a href="https://github.com/tarides/irmin-bookmarks/blob/main/server/main.ml"><code>server/main.ml</code></a>.</p>
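<p>If functor application is new to you, the following toy sketch shows the same wiring pattern without depending on Irmin. Everything in it is hypothetical: <code>CODEC</code>, <code>STORE</code>, <code>Mem_store</code>, <code>Bin</code>, and <code>Make_server</code> are illustrative names with far smaller signatures than the real Irmin modules, and the "codec" is just the identity function.</p>
<pre><code class="language-ocaml">(* Hypothetical signatures, far smaller than Irmin's real ones. *)
module type CODEC = sig
  val name : string
  val encode : string -&gt; string
end

module type STORE = sig
  val set : string -&gt; string -&gt; unit
  val find : string -&gt; string option
end

(* A toy in-memory store standing in for a real backend. *)
module Mem_store : STORE = struct
  let table : (string, string) Hashtbl.t = Hashtbl.create 16
  let set k v = Hashtbl.replace table k v
  let find k = Hashtbl.find_opt table k
end

(* A toy codec: the identity encoding. *)
module Bin : CODEC = struct
  let name = "bin"
  let encode s = s
end

(* Same shape as [Make_ext (Codec) (Store)]: the functor receives both
   modules and returns a new module built from them. *)
module Make_server (C : CODEC) (S : STORE) = struct
  let codec = C.name
  let handle_set = S.set

  let handle_find key =
    match S.find key with
    | Some v -&gt; Some (C.encode v)
    | None -&gt; None
end

module Server = Make_server (Bin) (Mem_store)

let () =
  Server.handle_set "tarides" "https://tarides.com";
  assert (Server.handle_find "tarides" = Some "https://tarides.com");
  assert (Server.handle_find "missing" = None)
</code></pre>
<p>Swapping the backend or the wire encoding then only means passing a different module to the functor; the code that uses <code>Server</code> stays the same.</p>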
<p>A <a href="https://github.com/tarides/irmin-bookmarks/blob/90f2cbd9d04ddb3327ccf638b227521637d5799b/extension/shared/client.ml#L1-L9">few more lines of code</a> are required to set up our client, but not many!</p>
<pre><code><span class="ocaml-keyword-other">module</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Store</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">struct</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword-other">module</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Git_impl</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Irmin_git</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-constant-language-capital-identifier">Mem</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword-other">module</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Sync</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Git</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-constant-language-capital-identifier">Mem</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-constant-language-capital-identifier">Sync</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-constant-language-capital-identifier">Git_impl</span><span class="ocaml-source">)</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword-other">module</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Maker</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Irmin_git</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-constant-language-capital-identifier">KV</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-constant-language-capital-identifier">Git_impl</span><span class="ocaml-source">)</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-constant-language-capital-identifier">Sync</span><span class="ocaml-source">)</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword-other">include</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Maker</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-constant-language-capital-identifier">Make</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-constant-language-capital-identifier">Model</span><span class="ocaml-source">)</span><span class="ocaml-source">
</span><span class="ocaml-keyword-other">end</span><span class="ocaml-source">
</span><span class="ocaml-source">
</span><span class="ocaml-keyword-other">module</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Codec</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Irmin_server</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-constant-language-capital-identifier">Conn</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-constant-language-capital-identifier">Codec</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-constant-language-capital-identifier">Bin</span><span class="ocaml-source">
</span><span class="ocaml-keyword-other">module</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Client</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Irmin_client_jsoo</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-constant-language-capital-identifier">Make_codec</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-constant-language-capital-identifier">Codec</span><span class="ocaml-source">)</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-constant-language-capital-identifier">Store</span><span class="ocaml-source">)</span><span class="ocaml-source">
</span></code></pre>
<p>Our client compiles to JavaScript via <code>js_of_ocaml</code> as part of our browser extension, so our store setup looks a little different from the server's. Instead of using a filesystem-backed Git repository, we use the in-memory Git implementation (<code>Irmin_git.Mem</code>). Otherwise, the setup mirrors our server: a key-value store backed by a Git repository, here held in memory. In the last line, we create our client for the browser using our codec and store. Note that we use <code>Irmin_client_jsoo</code> since we are compiling for the browser (<code>jsoo</code> is shorthand for <code>js_of_ocaml</code>).</p>
<p>That's all there is to setting up the core server and client! Now let's take a look at our <code>Model</code>.</p>
<h3>Bookmark Model</h3>
<p>Irmin stores support custom <a href="https://mirage.github.io/irmin/irmin/Irmin/Contents/index.html">serialisable and mergeable content types</a>. These types define not only how to encode and decode the type for persistence but also how to perform a <a href="https://mirage.github.io/irmin/irmin/Irmin/Merge/index.html">3-way merge</a> when conflicts arise. For our model, we use a simple merge strategy: we ignore the common ancestor and keep whichever version was updated most recently, based on its <code>updated_at</code> clock timestamp.</p>
<pre><code><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">merge_m</span><span class="ocaml-source"> </span><span class="ocaml-source">~</span><span class="ocaml-source">old</span><span class="ocaml-source"> </span><span class="ocaml-source">x</span><span class="ocaml-source"> </span><span class="ocaml-source">y</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-source">ignore</span><span class="ocaml-source"> </span><span class="ocaml-source">old</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-comment-block">(*</span><span class="ocaml-comment-block"> Simple merge: pick </span><span class="ocaml-comment-block">"</span><span class="ocaml-comment-block">latest</span><span class="ocaml-comment-block">"</span><span class="ocaml-comment-block"> updated model </span><span class="ocaml-comment-block">*)</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-constant-language-capital-identifier">Irmin</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-constant-language-capital-identifier">Merge</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">ok</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">@@</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">if</span><span class="ocaml-source"> </span><span class="ocaml-source">x</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">updated_at</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">&gt;</span><span class="ocaml-source"> </span><span class="ocaml-source">y</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">updated_at</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">then</span><span class="ocaml-source"> </span><span class="ocaml-source">x</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">else</span><span class="ocaml-source"> </span><span class="ocaml-source">y</span><span class="ocaml-source">
</span><span class="ocaml-source">
</span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">merge</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Irmin</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-constant-language-capital-identifier">Merge</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">(</span><span class="ocaml-source">option</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-source">v</span><span class="ocaml-source"> </span><span class="ocaml-source">t</span><span class="ocaml-source"> </span><span class="ocaml-source">merge_m</span><span class="ocaml-source">)</span><span class="ocaml-source">)</span><span class="ocaml-source">
</span></code></pre>
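<p>To see the strategy in isolation, here is a minimal sketch that does not depend on Irmin. The <code>bookmark</code> record and <code>pick_latest</code> are hypothetical names used for illustration; only the comparison on <code>updated_at</code> is taken from <code>merge_m</code> above.</p>
<pre><code class="language-ocaml">(* Hypothetical, trimmed-down bookmark record; only [updated_at] matters here. *)
type bookmark = { name : string; updated_at : float }

(* Last-writer-wins: keep the side with the larger timestamp.
   As in [merge_m], the common ancestor is ignored and ties go to [y]. *)
let pick_latest ~old:_ x y = if x.updated_at &gt; y.updated_at then x else y

let () =
  let old = { name = "Tarides"; updated_at = 1.0 } in
  let x = { name = "Tarides (work)"; updated_at = 3.0 } in
  let y = { name = "Tarides blog"; updated_at = 2.0 } in
  let merged = pick_latest ~old x y in
  (* [x] carries the most recent timestamp, so it is the one kept. *)
  assert (merged.name = "Tarides (work)")
</code></pre>
<p>The trade-off is that the older side's edits are discarded entirely, and the outcome depends on the clocks of the machines that wrote each version. For a small bookmark record that is likely acceptable, but a field-by-field merge would be needed to keep concurrent edits from both sides.</p>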
<p>Irmin has built-in content types for strings and JSON. Since our bookmarks contain a few fields of information, and we would like a human-readable format in our repository, we use JSON for our serialisation format. The <a href="https://mirage.github.io/irmin/irmin/Irmin/Contents/Json/index.html">JSON content type</a> exposed by Irmin is low-level, so we define a module that wraps this lower-level type by mapping between our custom type and Irmin's JSON type:</p>
<pre><code><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">t</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Irmin</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-constant-language-capital-identifier">Type</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">map</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Json</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">t</span><span class="ocaml-source"> </span><span class="ocaml-source">of_json</span><span class="ocaml-source"> </span><span class="ocaml-source">to_json</span><span class="ocaml-source">
</span></code></pre>
<p>You can look at <a href="https://github.com/tarides/irmin-bookmarks/blob/main/model/model.ml"><code>model.ml</code></a> to see our complete model. Here is an example of what a bookmark looks like in our repository:</p>
<pre><code class="language-javascript">{"updated_at":1696621685526,"created_at":1696621683493,"name":"Tarides","notes":"Building Functional Systems","url":"/"}
</code></pre>
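<p>The conversion pair passed to <code>Irmin.Type.map</code> could look roughly like the sketch below. It is not the code from <code>model.ml</code>: the <code>json</code> type is a local stand-in that we assume mirrors the shape of Irmin's JSON value, the <code>bookmark</code> record is a hypothetical type built from the fields in the example above, and the error handling is deliberately crude.</p>
<pre><code class="language-ocaml">(* Local stand-in assumed to mirror the shape of Irmin's JSON value type,
   so that the sketch compiles without Irmin. *)
type json =
  [ `Null
  | `Bool of bool
  | `String of string
  | `Float of float
  | `O of (string * json) list
  | `A of json list ]

(* Hypothetical record with the fields seen in the stored bookmark. *)
type bookmark = {
  name : string;
  url : string;
  notes : string;
  created_at : float;
  updated_at : float;
}

let to_json (b : bookmark) : json =
  `O
    [
      ("updated_at", `Float b.updated_at);
      ("created_at", `Float b.created_at);
      ("name", `String b.name);
      ("notes", `String b.notes);
      ("url", `String b.url);
    ]

(* Raises [Invalid_argument] on unexpected input; real code may prefer a
   result type or default values. *)
let of_json (j : json) : bookmark =
  match j with
  | `O fields -&gt;
      let str k =
        match List.assoc_opt k fields with
        | Some (`String s) -&gt; s
        | _ -&gt; invalid_arg ("missing string field: " ^ k)
      in
      let num k =
        match List.assoc_opt k fields with
        | Some (`Float f) -&gt; f
        | _ -&gt; invalid_arg ("missing number field: " ^ k)
      in
      {
        name = str "name";
        url = str "url";
        notes = str "notes";
        created_at = num "created_at";
        updated_at = num "updated_at";
      }
  | _ -&gt; invalid_arg "expected a JSON object"

let () =
  let b =
    {
      name = "Tarides";
      url = "/";
      notes = "Building Functional Systems";
      created_at = 1696621683493.;
      updated_at = 1696621685526.;
    }
  in
  (* Converting to JSON and back should give the same record. *)
  assert (of_json (to_json b) = b)
</code></pre>
<p>The round-trip check at the end is the property that matters for the mapping: whatever is written to the store can be decoded back into the same record.</p>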
<h3>Using the Irmin API</h3>
<p>Our client only <a href="https://github.com/tarides/irmin-bookmarks/blob/main/extension/shared/client.ml">uses a small part</a> of the Irmin store API to list, load, save, and delete our bookmarks.</p>
<pre><code><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">list</span><span class="ocaml-source"> </span><span class="ocaml-source">t</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword">let</span><span class="ocaml-keyword">*</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">tree</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-source">tree</span><span class="ocaml-source"> </span><span class="ocaml-source">t</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">in</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-comment-block">(*</span><span class="ocaml-comment-block"> [Store.Tree.fold] </span><span class="ocaml-comment-block">*)</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-constant-language-capital-identifier">Tree</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">fold</span><span class="ocaml-source"> ~</span><span class="ocaml-source">order</span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-constant-language-polymorphic-variant">`Undefined</span><span class="ocaml-source">
</span><span class="ocaml-source">    ~</span><span class="ocaml-source">contents</span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source">(</span><span class="ocaml-keyword-other">fun</span><span class="ocaml-source"> </span><span class="ocaml-source">_path</span><span class="ocaml-source"> </span><span class="ocaml-source">m</span><span class="ocaml-source"> </span><span class="ocaml-source">acc</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-source">m</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source"> </span><span class="ocaml-source">acc</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">|&gt;</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Lwt</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">return</span><span class="ocaml-source">)</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-source">tree</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-list">[]</span><span class="ocaml-source">
</span><span class="ocaml-source">
</span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">load</span><span class="ocaml-source"> </span><span class="ocaml-source">t</span><span class="ocaml-source"> </span><span class="ocaml-source">model</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">key</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Model</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">key_path</span><span class="ocaml-source"> </span><span class="ocaml-source">model</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">in</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-comment-block">(*</span><span class="ocaml-comment-block"> [Store.find] </span><span class="ocaml-comment-block">*)</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-source">find</span><span class="ocaml-source"> </span><span class="ocaml-source">t</span><span class="ocaml-source"> </span><span class="ocaml-source">key</span><span class="ocaml-source">
</span><span class="ocaml-source">
</span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">save</span><span class="ocaml-source"> </span><span class="ocaml-source">t</span><span class="ocaml-source"> </span><span class="ocaml-source">model</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">f</span><span class="ocaml-source"> </span><span class="ocaml-source">tree</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">key</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Model</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">key_path</span><span class="ocaml-source"> </span><span class="ocaml-source">model</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">in</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-comment-block">(*</span><span class="ocaml-comment-block"> [Store.Tree.add] </span><span class="ocaml-comment-block">*)</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-constant-language-capital-identifier">Tree</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">add</span><span class="ocaml-source"> </span><span class="ocaml-source">tree</span><span class="ocaml-source"> </span><span class="ocaml-source">key</span><span class="ocaml-source"> </span><span class="ocaml-source">model</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword-other">in</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-source">update</span><span class="ocaml-source"> </span><span class="ocaml-source">t</span><span class="ocaml-source"> </span><span class="ocaml-source">f</span><span class="ocaml-source"> ~</span><span class="ocaml-source">info</span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source">(</span><span class="ocaml-constant-language-capital-identifier">Info_jsoo</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">v</span><span class="ocaml-source"> ~</span><span class="ocaml-source">author</span><span class="ocaml-source"> </span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-string-quoted-double">Update </span><span class="ocaml-constant-character-printf">%s</span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-source"> </span><span class="ocaml-source">model</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">url</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-unit">()</span><span class="ocaml-source">)</span><span class="ocaml-source">
</span><span class="ocaml-source">
</span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">delete</span><span class="ocaml-source"> </span><span class="ocaml-source">t</span><span class="ocaml-source"> </span><span class="ocaml-source">model</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">f</span><span class="ocaml-source"> </span><span class="ocaml-source">tree</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">key</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Model</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">key_path</span><span class="ocaml-source"> </span><span class="ocaml-source">model</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">in</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-comment-block">(*</span><span class="ocaml-comment-block"> [Store.Tree.remove] </span><span class="ocaml-comment-block">*)</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-constant-language-capital-identifier">Tree</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">remove</span><span class="ocaml-source"> </span><span class="ocaml-source">tree</span><span class="ocaml-source"> </span><span class="ocaml-source">key</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword-other">in</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-source">update</span><span class="ocaml-source"> </span><span class="ocaml-source">t</span><span class="ocaml-source"> </span><span class="ocaml-source">f</span><span class="ocaml-source"> ~</span><span class="ocaml-source">info</span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source">(</span><span class="ocaml-constant-language-capital-identifier">Info_jsoo</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">v</span><span class="ocaml-source"> ~</span><span class="ocaml-source">author</span><span class="ocaml-source"> </span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-string-quoted-double">Delete </span><span class="ocaml-constant-character-printf">%s</span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-source"> </span><span class="ocaml-source">model</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">url</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-unit">()</span><span class="ocaml-source">)</span><span class="ocaml-source">
</span></code></pre>
<p>Each bookmark is stored in the Git repository using <a href="https://github.com/tarides/irmin-bookmarks/blob/bf96308a8ddeacf0ae66475383338b26c9cbb192/model/model.ml#L15-L21">a unique path</a>. Loading, saving, and deleting are as easy as passing this path to the corresponding store functions.</p>
<p><img sizes="(min-width: 1360px) 1360px, (min-width: 680px) 680px, 100vw" srcset="/blog/images/listofbookmarks-170w~PyKswLXjlfAfuzz4DN2qdA.webp 170w, /blog/images/listofbookmarks-340w~Not-gIVaY1C13LZu1Yk5Aw.webp 340w, /blog/images/listofbookmarks-680w~NXCuyndG5_eyJU1zd5zPJg.webp 680w, /blog/images/listofbookmarks-1360w~c8htCcwNQpRsYUHgtuaIow.webp 1360w" src="/blog/images/listofbookmarks-1360w~c8htCcwNQpRsYUHgtuaIow.webp" alt="An example list of saved bookmarks, here the bookmarks are: Irmin, OCaml.org, and Tarides.com"></p>
<p>For listing our bookmarks, we can accumulate all of our models in the store's tree. If our repository contained a large number of bookmarks, we would need to implement some kind of pagination API, but simple accumulation is enough for our demo application.</p>
<h3>Concurrent Atomic Updates</h3>
<p>An interesting function to look at more closely is <code>update</code>. This function is used when saving and deleting and performs atomic updates to our store, even if multiple tabs are concurrently writing.</p>
<pre><code><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">update</span><span class="ocaml-source"> </span><span class="ocaml-source">t</span><span class="ocaml-source"> </span><span class="ocaml-source">f</span><span class="ocaml-source"> ~</span><span class="ocaml-source">info</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-source">catch</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">@@</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">fun</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-unit">()</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">repo</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-source">repo</span><span class="ocaml-source"> </span><span class="ocaml-source">t</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">in</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-comment-block">(*</span><span class="ocaml-comment-block"> Get latest tree for main branch </span><span class="ocaml-comment-block">*)</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword">let</span><span class="ocaml-keyword">*</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">main</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-source">of_branch</span><span class="ocaml-source"> </span><span class="ocaml-source">repo</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Branch</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">main</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">in</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword">let</span><span class="ocaml-keyword">*</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">head</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Head</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">get</span><span class="ocaml-source"> </span><span class="ocaml-source">main</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">in</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-comment-block">(*</span><span class="ocaml-comment-block"> Apply [f] to the tree on main to get our new tree </span><span class="ocaml-comment-block">*)</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword">let</span><span class="ocaml-keyword">*</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">tree</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Commit</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">tree</span><span class="ocaml-source"> </span><span class="ocaml-source">head</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">|&gt;</span><span class="ocaml-source"> </span><span class="ocaml-source">f</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">in</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-comment-block">(*</span><span class="ocaml-comment-block"> Commit this tree </span><span class="ocaml-comment-block">*)</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword">let</span><span class="ocaml-keyword">*</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">commit</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Commit</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">v</span><span class="ocaml-source"> </span><span class="ocaml-source">repo</span><span class="ocaml-source"> ~</span><span class="ocaml-source">info</span><span class="ocaml-source"> ~</span><span class="ocaml-source">parents</span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source">[</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Commit</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">key</span><span class="ocaml-source"> </span><span class="ocaml-source">head</span><span class="ocaml-source"> </span><span class="ocaml-source">]</span><span class="ocaml-source"> </span><span class="ocaml-source">tree</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">in</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-comment-block">(*</span><span class="ocaml-comment-block"> Merge commit to main </span><span class="ocaml-comment-block">*)</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword">let</span><span class="ocaml-keyword">*</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">main</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-source">of_branch</span><span class="ocaml-source"> </span><span class="ocaml-source">repo</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Branch</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">main</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">in</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-source">merge_with_commit</span><span class="ocaml-source"> </span><span class="ocaml-source">main</span><span class="ocaml-source"> </span><span class="ocaml-source">commit</span><span class="ocaml-source"> ~</span><span class="ocaml-source">info</span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source">(</span><span class="ocaml-constant-language-capital-identifier">Info_jsoo</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">v</span><span class="ocaml-source"> ~</span><span class="ocaml-source">author</span><span class="ocaml-source"> </span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-string-quoted-double">Merge to main</span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-source">)</span><span class="ocaml-source">
</span></code></pre>
<p>The key to how this works is merging!</p>
<p>Simply writing a commit directly to the main branch, like when using <a href="https://mirage.github.io/irmin/irmin/Irmin/module-type-S/index.html#val-set_tree">Irmin's <code>set_tree</code></a>, can fail when done concurrently because updating the main branch reference is done using a <a href="https://en.wikipedia.org/wiki/Compare-and-swap">compare-and-swap</a> operation. To avoid this issue, we first create a new commit with our changes and then attempt to merge it to the main branch using <code>merge_with_commit</code>. When performed concurrently, the process of merging our updated commit into the latest commit on main and updating the reference is retried if it fails. The process will terminate either with a successful merge and update or a merge conflict that cannot be handled automatically.</p>
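<p>The compare-and-swap-plus-retry pattern can be sketched with the standard library's <code>Atomic</code> module. This is only a toy model, not Irmin's implementation: the branch head is assumed to be an immutable list held in an <code>Atomic.t</code>, and the names <code>head</code>, <code>try_update</code>, and <code>update</code> are hypothetical. Note that this sketch re-applies the update function on each attempt, whereas Irmin retries the merge of an existing commit. It assumes OCaml 5 for <code>Domain</code>.</p>

```ocaml
(* A toy model of a branch reference: the "head" is an immutable list of
   bookmark URLs held in an [Atomic.t]. All names here are hypothetical. *)
let head : string list Atomic.t = Atomic.make []

(* Try once: succeeds only if nobody moved the head since we read it. *)
let try_update f =
  let seen = Atomic.get head in
  Atomic.compare_and_set head seen (f seen)

(* Retry until the compare-and-set succeeds. [f] is re-applied to the latest
   head on every attempt, so it must be safe to run more than once. *)
let rec update f = if not (try_update f) then update f

let () =
  let add url = update (fun urls -> url :: urls) in
  let d = Domain.spawn (fun () -> List.iter add [ "a"; "b"; "c" ]) in
  List.iter add [ "x"; "y"; "z" ];
  Domain.join d;
  (* No update is lost, whatever the interleaving. *)
  assert (List.length (Atomic.get head) = 6)
```

<p>A single <code>try_update</code> can fail when another writer gets there first, which is why a direct write to the branch is not enough on its own; wrapping the attempt in a retry loop is what makes concurrent writers safe in this model.</p>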
<p>When merging, there are two conflict scenarios:</p>
<ol>
<li>The same bookmark is added or updated on main and added or updated in the update commit. This will be resolved by our merge function since it picks the "newest" model.</li>
<li>The same bookmark is deleted either on main or the update commit and added or updated in the other. This will result in a merge conflict since the custom merge function of our model is not called when one side is deleted.</li>
</ol>
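<p>The two scenarios above can be modelled as a three-way merge over optional values, where <code>None</code> stands for a deleted or absent bookmark. This is a simplified sketch under stated assumptions: the <code>bookmark</code> record, its <code>updated_at</code> field, and the functions <code>newest</code> and <code>merge_entry</code> are hypothetical and are not the extension's or Irmin's actual types.</p>

```ocaml
(* Hypothetical, simplified model: not the extension's actual types. *)
type bookmark = { url : string; title : string; updated_at : float }

(* Custom merge for two present versions: keep the most recently updated. *)
let newest a b = if a.updated_at >= b.updated_at then a else b

(* Three-way merge of one store entry, where [None] means "absent".
   [old] is the common ancestor; [main] and [update] are the two sides. *)
let merge_entry ~old ~main ~update =
  if main = update then Ok main (* both sides agree *)
  else if main = old then Ok update (* only [update] changed *)
  else if update = old then Ok main (* only [main] changed *)
  else
    match (main, update) with
    | Some a, Some b -> Ok (Some (newest a b)) (* scenario 1 *)
    | _ -> Error `Conflict (* scenario 2: one side deleted *)
```

<p>In this model, the custom <code>newest</code> function only runs when both sides still hold a bookmark, which mirrors why a deletion on one side combined with a change on the other surfaces as a conflict.</p>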
<p>The current code in the extension simply propagates the error in the second case, which means that the user needs to try again. This case could be handled specially to improve the user experience, but propagating the error was sufficient for our demo application. The important aspect is that our extension handles concurrent updates correctly and can automatically resolve conflicts in many cases.</p>
<h2>Wrapping up</h2>
<p>And that's it! Take a look at the <a href="https://github.com/tarides/irmin-bookmarks">project's repository</a> to see that it only takes about 500 lines of OCaml code to have a fully working browser extension that saves, loads, lists, and deletes bookmarks in a local Git repository.</p>
<p>Check out the project's <a href="https://github.com/tarides/irmin-bookmarks/blob/main/README.md">README</a> for how to build and use the extension. If you give any Irmin packages a try and run into issues, feel free to open issues or PRs on <a href="https://github.com/mirage/irmin/">the Irmin repository</a>. Happy hacking!</p>
]]></description><link>https://tarides.com/blog/2023-10-25-tutorial-building-a-browser-extension-with-irmin</link><guid isPermaLink="false">https://tarides.com/blog/2023-10-25-tutorial-building-a-browser-extension-with-irmin.html</guid><dc:creator><![CDATA[ Irmin Team ]]></dc:creator><pubDate>Wed, 25 Oct 2023 00:00:00 GMT</pubDate></item><item><title><![CDATA[Off to the Races: Using ThreadSanitizer in OCaml]]></title><description><![CDATA[<p>OCaml Multicore opened up a new world of performance for developers, something that <a href="/blog/2022-12-20-how-nomadic-labs-used-multicore-processing-to-create-a-faster-blockchain/">Nomadic Labs has tested with great results.</a> Rather than relying on one core to do everything, the program can take advantage of multiple cores simultaneously for a significant performance boost.</p>
<p>With new programming possibilities come new classes of bugs, which require updated detection methods. One of these types of bugs is called a data race. A data race is a race condition that occurs when two accesses are made to the same memory location, at least one is a write, and no order is enforced between them.</p>
<p>Data races can be dangerous as they are easy to miss and capable of yielding unexpected results. Consequently, integrating a tool to detect data races has been a high priority for the teams working on OCaml 5.0 with Multicore support. Whilst data races in OCaml are less problematic than in many other languages (for example, data races in OCaml do not cause crashes and do not constitute undefined behaviour), developers still want to be made aware of possible data races so that they can remove them from their programs. More about this below.</p>
<h2>What is TSan?</h2>
<p><a href="https://clang.llvm.org/docs/ThreadSanitizer.html">ThreadSanitizer</a>, or TSan, is an open-source tool that reliably detects data races at runtime. It works by instrumenting programs with calls to a dedicated runtime that performs the detection.</p>
<p>Support for TSan will officially be part of the OCaml 5.2 release, and there is already a backport for OCaml 5.1.</p>
<p>This blog post will demonstrate the benefits of using TSan, offer insight into how TSan works, and outline the challenges of integrating it with OCaml. For a more practically oriented guide on how to use TSan in your own projects, the <a href="https://ocaml.org/docs/multicore-transition">tutorial on using TSan with OCaml Multicore</a> is a great place to start.</p>
<p>We will begin by examining what a data race looks like, both before and after using TSan.</p>
<h3>A Practical Example</h3>
<p>Let us consider how a data race might occur. Say an OCaml programmer writes code to populate a table of clients from several sources. They decide to make it Multicore to improve performance by using two <code>Domain</code>s for two data sources:</p>
<pre><code><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">clients</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Hashtbl</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">create</span><span class="ocaml-source"> </span><span class="ocaml-constant-numeric-decimal-integer">16</span><span class="ocaml-source">
</span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">free_id</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Atomic</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">make</span><span class="ocaml-source"> </span><span class="ocaml-constant-numeric-decimal-integer">0</span><span class="ocaml-source">
</span><span class="ocaml-source">
</span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">clients1</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-comment-block">(*</span><span class="ocaml-comment-block"> Some data source </span><span class="ocaml-comment-block">*)</span><span class="ocaml-source">
</span><span class="ocaml-source">
</span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">clients2</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-comment-block">(*</span><span class="ocaml-comment-block"> Some data source </span><span class="ocaml-comment-block">*)</span><span class="ocaml-source">
</span><span class="ocaml-source">
</span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">record_clients</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-constant-language-capital-identifier">Seq</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">iter</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-source">(</span><span class="ocaml-keyword-other">fun</span><span class="ocaml-source"> </span><span class="ocaml-source">c</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Hashtbl</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">add</span><span class="ocaml-source"> </span><span class="ocaml-source">clients</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-constant-language-capital-identifier">Atomic</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">fetch_and_add</span><span class="ocaml-source"> </span><span class="ocaml-source">free_id</span><span class="ocaml-source"> </span><span class="ocaml-constant-numeric-decimal-integer">1</span><span class="ocaml-source">)</span><span class="ocaml-source"> </span><span class="ocaml-source">c</span><span class="ocaml-source">)</span><span class="ocaml-source">
</span><span class="ocaml-source">
</span><span class="ocaml-keyword-other">let</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-unit">()</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">d</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Domain</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">spawn</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-keyword-other">fun</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-unit">()</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-source">record_clients</span><span class="ocaml-source"> </span><span class="ocaml-source">clients1</span><span class="ocaml-source">)</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">in</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-source">record_clients</span><span class="ocaml-source"> </span><span class="ocaml-source">clients2</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-constant-language-capital-identifier">Domain</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">join</span><span class="ocaml-source"> </span><span class="ocaml-source">d</span><span class="ocaml-source">
</span></code></pre>
<p>As we can tell, each incoming client is bound to a unique ID. The programmer correctly used the <code>Atomic</code> module for ID generation, ensuring the IDs are truly unique. However, they have failed to use a domain-safe module designed for concurrency, instead opting for <code>Hashtbl</code>. Unfortunately, this module is unsafe for concurrent use: using <code>Hashtbl.t</code> in parallel can cause data races and lead to surprising results.</p>
<p>For example, when two domains add elements in parallel it may cause some elements to be silently dropped. To make matters worse, the resulting bugs would be non-deterministic and as such be hard to detect and track down. Furthermore, if the programmer's project depends on libraries that use <code>Hashtbl</code>, it would make them unsafe to use in parallel without it necessarily being clear from their documentation.</p>
<p>Suppose, however, that the programmer builds their program on a special <code>opam</code> switch with a TSan-enabled compiler, like this:</p>
<pre><code>$ opam switch create 5.1.0+tsan
$ opam install dune
$ dune exec ./clients.exe
</code></pre>
<p>(Side note: the <code>5.1.0+tsan</code> switch is the most convenient way to use TSan with OCaml at the time of writing. Once OCaml 5.2 is released, the blessed command will be <code>opam switch create &lt;switch name&gt; ocaml-option-tsan</code>.)</p>
<p>All memory accesses would be instrumented with calls to the TSan runtime, and TSan would detect the data race condition and output a data race report:</p>
<pre><code>==================
WARNING: ThreadSanitizer: data race (pid=790576)
  Write of size 8 at 0x7f42b37f57e0 by main thread (mutexes: write M86):
    #0 caml_modify runtime/memory.c:166 (clients.exe+0x58b87d)
    #1 camlStdlib__Hashtbl.resize_749 stdlib/hashtbl.ml:152 (clients.exe+0x536766)
    #2 camlStdlib__Seq.iter_329 stdlib/seq.ml:76 (clients.exe+0x4c8a87)
    #3 camlDune__exe__Clients.entry /workspace_root/clients.ml:9 (clients.exe+0x4650ef)
    #4 caml_program &lt;null&gt; (clients.exe+0x45fefe)
    #5 caml_start_program &lt;null&gt; (clients.exe+0x5a0ae7)

  Previous read of size 8 at 0x7f42b37f57e0 by thread T1 (mutexes: write M90):
    #0 camlStdlib__Hashtbl.key_index_1308 stdlib/hashtbl.ml:507 (clients.exe+0x53a625)
    #1 camlStdlib__Hashtbl.add_1312 stdlib/hashtbl.ml:511 (clients.exe+0x53a6f8)
    #2 camlStdlib__Seq.iter_329 stdlib/seq.ml:76 (clients.exe+0x4c8a87)
    #3 camlStdlib__Domain.body_703 stdlib/domain.ml:202 (clients.exe+0x50bf60)
    #4 caml_start_program &lt;null&gt; (clients.exe+0x5a0ae7)
    #5 caml_callback_exn runtime/callback.c:197 (clients.exe+0x56917b)
    #6 caml_callback runtime/callback.c:293 (clients.exe+0x569cb0)
    #7 domain_thread_func runtime/domain.c:1100 (clients.exe+0x56d37f)
    [...]

SUMMARY: ThreadSanitizer: data race runtime/memory.c:166 in caml_modify
==================
[...]
ThreadSanitizer: reported 2 warnings
</code></pre>
<p>Above is a truncated view of what the TSan report, warning of a data race, looks like in this case. TSan has detected two memory accesses, a write and a read, made to one memory location. As they are also unordered, this constitutes a data race and TSan reports it along with the backtraces of both accesses.</p>
<p>In this case it would be evident that something has gone wrong with <code>Hashtbl.add</code> – a big hint to the programmer.</p>
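<p>One common way to act on a report like this is to order every access to the shared table with a mutex. The following is a minimal sketch rather than the original program: the names <code>clients</code>, <code>record_client</code>, and <code>fill</code> and the key ranges are hypothetical, and it assumes OCaml 5.1 or later for <code>Mutex.protect</code>.</p>
<pre><code>(* Hypothetical sketch: two domains fill one Hashtbl. The mutex orders
   every access, so TSan has no unordered pair of accesses to report. *)
let clients : (int, string) Hashtbl.t = Hashtbl.create 16
let lock = Mutex.create ()

(* Every access to [clients] made in parallel goes through [lock]. *)
let record_client id =
  Mutex.protect lock (fun () ->
      Hashtbl.add clients id (Printf.sprintf "client-%d" id))

let fill first last =
  for id = first to last do
    record_client id
  done

let () =
  let d = Domain.spawn (fun () -> fill 1 1000) in
  fill 1001 2000;
  (* Domain.join orders the spawned domain's writes before this read. *)
  Domain.join d;
  Printf.printf "%d clients recorded\n" (Hashtbl.length clients)
</code></pre>
<p>A single lock keeps the sketch short. Whether it is the right trade-off depends on how contended the table is.</p>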
<h2>Under the Hood</h2>
<p>Now that we know what TSan is used for, it's time to explore how it works. Compiling a program with TSan enabled causes the executable to be instrumented with calls to the TSan runtime library. The runtime library tracks memory accesses and ordering relations between these accesses.</p>
<p>Internally, the TSan runtime assigns a vector clock to each OCaml domain or system thread (both are simply called 'threads' below). A vector clock is an array of <em>n</em> integers, where <em>n</em> is the number of threads. Each thread increments its clock upon each event (memory access, mutex operation, etc.). Certain operations, such as mutex locks and atomic reads, synchronise clocks between threads.</p>
<p align="center">
  <img sizes="(min-width: 1360px) 1360px, (min-width: 680px) 680px, 100vw" srcset="/blog/images/vector-clocks-170w~08reSUtuhyWk-ECMiM3YDA.webp 170w, /blog/images/vector-clocks-340w~dqrZHUXk2rq1YBZMCiIqOg.webp 340w, /blog/images/vector-clocks-680w~MurSJddFVJI9zyKvXBopkQ.webp 680w, /blog/images/vector-clocks-1360w~rD-FNVY0HeAg3XUeFmsVnA.webp 1360w" src="/blog/images/vector-clocks-1360w~rD-FNVY0HeAg3XUeFmsVnA.webp" alt="A mutex lock synchronising the clock between two threads." width="45%">
</p>
<p>Comparing vector clocks allows TSan to establish an order between events, so-called <a href="https://jameshfisher.com/2017/02/10/happened-before/">happens-before relations.</a> TSan reports a data race every time two memory accesses are made to overlapping memory regions, <strong>if:</strong></p>
<ul>
<li>At least one of them is a write, and</li>
<li>There is no established happens-before relation between them.</li>
</ul>
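<p>The vector clock comparison can be sketched in a few lines of OCaml. This is an illustration of the general technique, not TSan's actual data structures: the names <code>tick</code>, <code>join</code>, <code>happens_before</code>, and <code>unordered</code> are our own, and clocks are treated as immutable arrays for simplicity.</p>
<pre><code>(* Illustrative vector clocks; not TSan's real representation. *)
type clock = int array (* one logical timestamp per thread *)

let create n : clock = Array.make n 0

(* A thread advances its own component on each event. *)
let tick (c : clock) tid : clock =
  Array.mapi (fun i t -> if i = tid then t + 1 else t) c

(* Synchronisation (for example, taking a mutex that another thread
   released) keeps the component-wise maximum of both clocks. *)
let join (a : clock) (b : clock) : clock = Array.map2 max a b

(* [a] happens before [b] when the clocks differ and no component of
   [a] is ahead of the matching component of [b]. *)
let happens_before (a : clock) (b : clock) =
  not (a = b) && Array.for_all2 (fun x y -> y >= x) a b

let unordered a b = not (happens_before a b || happens_before b a)

let () =
  (* Two threads each perform one event without synchronising. *)
  let t0 = tick (create 2) 0 in
  let t1 = tick (create 2) 1 in
  assert (unordered t0 t1);
  (* Thread 1 then synchronises with thread 0 and performs an event. *)
  let t1' = tick (join t0 t1) 1 in
  assert (happens_before t0 t1')
</code></pre>
<p>In the first case neither clock dominates the other, so a write and a read at those two points would be reported. After the <code>join</code>, the earlier event is ordered before the later one and no report is produced.</p>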
<h3>Shadow State</h3>
<p>Let us look at this process in more detail. Each word of application memory is associated with one or more 'shadow words'. Each shadow word contains information about a recent memory access to that word. This information points to the vector clock's state at the moment the access was performed.</p>
<p align="center">
  <img sizes="(min-width: 1360px) 1360px, (min-width: 680px) 680px, 100vw" srcset="/blog/images/shadow-state-170w~KhX1cCzazgbI-fhvIkRYrA.webp 170w, /blog/images/shadow-state-340w~6rVh96K0y4S6u0Xe4MZnTA.webp 340w, /blog/images/shadow-state-680w~SYX1dO7Dwr14Vx1wkSpYyw.webp 680w, /blog/images/shadow-state-1360w~5nxl49UbtI1LfUGLt1bUYw.webp 1360w" src="/blog/images/shadow-state-1360w~5nxl49UbtI1LfUGLt1bUYw.webp" alt="A box labelled application with an arrow to a box labeled shadow state." width="45%">
</p>
<p>This information (called the 'shadow state') is updated at every instrumented memory access: TSan compares the accessor's clock with each existing shadow word, and checks the following:</p>
<ul>
<li>Do the accesses overlap?</li>
<li>Is one of them a write?</li>
<li>Are the thread IDs different?</li>
<li>Are they unordered by happens-before?</li>
</ul>
<p>If these conditions are met, TSan detects and reports a data race.</p>
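<p>The four questions above can be written as a single predicate. The sketch below is a simplified model under our own assumptions: the <code>access</code> record and its field names are hypothetical, and a real shadow word is a compact bit-packed value rather than an OCaml record. Here <code>clock</code> stands for the vector clock of the thread performing the current access.</p>
<pre><code>(* Simplified model of a shadow word; field names are hypothetical. *)
type access = {
  tid : int;       (* thread that performed the access *)
  epoch : int;     (* that thread's own clock value at the time *)
  offset : int;    (* first byte touched within the word *)
  size : int;      (* number of bytes touched *)
  is_write : bool;
}

(* Two byte ranges overlap when the smaller end lies past the larger start. *)
let overlap a b =
  min (a.offset + a.size) (b.offset + b.size) > max a.offset b.offset

(* [old] is a recorded access, [cur] the access being checked. *)
let races ~clock ~old ~cur =
  overlap old cur
  && (old.is_write || cur.is_write)
  && not (old.tid = cur.tid)
  (* Unordered: the current thread has not yet seen [old]'s epoch. *)
  && old.epoch > clock.(old.tid)

let () =
  let w = { tid = 0; epoch = 3; offset = 0; size = 8; is_write = true } in
  let r = { tid = 1; epoch = 1; offset = 0; size = 8; is_write = false } in
  (* Thread 1 has only seen thread 0 up to epoch 2: data race. *)
  assert (races ~clock:[| 2; 1 |] ~old:w ~cur:r);
  (* Once thread 1 has synchronised up to epoch 3, the read is ordered. *)
  assert (not (races ~clock:[| 3; 1 |] ~old:w ~cur:r))
</code></pre>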
<p>In addition to memory access, operations like <code>Domain.spawn</code> and <code>Domain.join</code> (as well as mutex operations) are relevant for operation ordering. As such, TSan also instruments these operations.</p>
<h2>Integrating TSan with OCaml</h2>
<p>The core of TSan support is instrumentation of memory accesses with calls to the TSan runtime. The OCaml compiler performs this instrumentation in a dedicated pass.</p>
<h3>Exceptions</h3>
<p>For TSan to show a backtrace of past events, function entries and exits must also be instrumented. This is done as part of the instrumentation pass.</p>
<p>However, in OCaml, a function can also be exited by an <a href="https://github.com/fabbing/obts_exn">exception</a>, bypassing part of the instrumentation. When that happens, for TSan’s view of the backtrace to remain up-to-date, the OCaml runtime informs TSan about every exited function.</p>
<h3>Effect Handlers</h3>
<p><a href="https://v2.ocaml.org/manual/effects.html">Effect handlers</a> are a generalisation of exception handlers. Performing an effect results in a jump to the associated effect handler, and then a delimited continuation makes it possible to resume the computation. In the same way as with exceptions, the OCaml runtime must signal to TSan which functions are exited when an effect is performed and re-entered when a continuation is resumed.</p>
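<p>For readers unfamiliar with effects, here is a small sketch of the control flow that the runtime has to describe to TSan. It uses the <code>Effect</code> module from the OCaml 5 standard library. The effect <code>Ask</code> and the function <code>computation</code> are hypothetical names chosen for the example.</p>
<pre><code>open Effect
open Effect.Deep

(* A hypothetical effect that asks its handler for an integer. *)
type _ Effect.t += Ask : int Effect.t

(* [perform Ask] leaves this function and jumps to the handler. *)
let computation () = perform Ask + 1

let result =
  try_with computation ()
    { effc = (fun (type a) (eff : a Effect.t) ->
        match eff with
        | Ask ->
          (* Resuming [k] re-enters [computation] where it left off. *)
          Some (fun (k : (a, _) continuation) -> continue k 41)
        | _ -> None) }

let () = Printf.printf "%d\n" result
</code></pre>
<p>Running this prints <code>42</code>. Between <code>perform</code> and <code>continue</code>, the frame of <code>computation</code> is exited and then re-entered, which is the pair of events TSan needs to be told about.</p>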
<h3>Memory Model</h3>
<p>Each language specifies how memory behaves in parallel programs using what is known as a memory model. Incidentally, what counts as a data race in a given language also depends on its memory model.</p>
<p>TSan can detect data races in programs that follow the C memory model. OCaml 5's memory model is different from the C model, however, and it offers more guarantees: data races in C and C++ cause <em>undefined behaviour</em> (i.e., anything can happen), which is not the case in OCaml. OCaml's semantics are “fully defined” (see the <a href="https://v2.ocaml.org/manual/memorymodel.html">manual page</a> about the memory model). In particular, a program with data races in OCaml will not crash, unlike in C++. In addition, there can be no <a href="https://www.hboehm.info/c++mm/thin_air.html">out-of-thin-air values</a>: the only values that can be observed are values that were previously written to that location. The OCaml memory model guarantees that even for programs with data races, memory safety is preserved.</p>
<p>Data races in OCaml can still result in unexpected surprises for the OCaml programmer. A multi-threaded execution may produce behaviours that cannot be explained by the mere interleaving of actions from different threads. The only way such behaviours can be explained is through a reordering of actions in the same thread. Such reasoning is quite unintuitive for a programmer who will be more used to thinking about program behaviour as being an interleaving of actions from different threads.</p>
<p>However, if the program is data-race free, then the observed behaviour can be explained by a simple interleaving of operations from different threads (a property known as <em>sequential consistency</em>). Eliminating data races reduces non-determinism in the program and hence it is beneficial to remove data races whenever possible. Note that we do not completely eliminate non-determinism from a parallel program.</p>
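<p>As a concrete illustration, the following sketch passes a value between two domains without a data race. It is our own example, with hypothetical names <code>data</code>, <code>ready</code>, <code>producer</code>, and <code>consumer</code>: the flag is an <code>Atomic.t</code>, so the atomic write and the atomic read that observes it order the plain accesses to <code>data</code>.</p>
<pre><code>(* Hypothetical message-passing sketch. [data] is a plain reference;
   [ready] is atomic and is what orders the two accesses to [data]. *)
let data = ref 0
let ready = Atomic.make false

let producer () =
  data := 42;            (* plain write *)
  Atomic.set ready true  (* atomic write: publishes [data] *)

let consumer () =
  (* Spin until the atomic read observes the producer's write. *)
  while not (Atomic.get ready) do
    Domain.cpu_relax ()
  done;
  !data                  (* plain read, ordered after the write *)

let () =
  let c = Domain.spawn consumer in
  producer ();
  Printf.printf "consumer saw %d\n" (Domain.join c)
</code></pre>
<p>This prints <code>consumer saw 42</code>. If <code>ready</code> were a plain <code>bool ref</code> instead, the accesses to both <code>ready</code> and <code>data</code> would be unordered, which is the kind of situation TSan is designed to flag.</p>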
<p>In essence, because of the differences between the C and OCaml memory models, in order for TSan to detect data races in OCaml the instrumentation of memory accesses must <em>conceptually</em> map OCaml programs to C programs. During development, the team took care to ensure that this mapping preserved the detection of data races (in the OCaml sense) and did not introduce false positives.</p>
<p>You can find more details about the inner workings of TSan and its OCaml support in this <a href="https://github.com/fabbing/ocaml_tsan_icfp/blob/master/presentation/presentation.pdf">OCaml Workshop 2023 talk</a>.</p>
<h3>Performance and Limitations</h3>
<p>In terms of the cost of running TSan, currently, it affects memory and performance in the following ways:</p>
<ul>
<li>Performance cost: about a 2-7x slowdown (compared to 5-15x for C/C++)</li>
<li>Memory consumption: increased by about 4-7x (compared to 5-10x for C++)</li>
</ul>
<p>As with all tools, TSan has some limitations. These are due to how TSan is built and are unlikely to change. With TSan, data races are only detected on visited code paths. In addition, TSan only remembers a finite number of memory accesses for space-saving reasons, which can in principle cause TSan to miss some races. TSan also does not currently support Windows.</p>
<p>TSan support for OCaml is currently only implemented for x86-64 Linux and macOS, but will hopefully be extended to include more architectures such as arm64.</p>
<h2>Use Cases</h2>
<p>Knowing the limitations, let us explore TSan's use cases. So far, TSan has helped by unearthing data races in several OCaml libraries:</p>
<ul>
<li><a href="https://github.com/ocaml-multicore/saturn">Saturn (formerly known as Lockfree):</a> TSan <a href="https://github.com/ocaml-multicore/saturn/issues/39">found a benign data race</a>, as well as a data race occurring from the use of <a href="https://github.com/ocaml-multicore/saturn/pull/40">semaphores</a>.</li>
<li><a href="https://github.com/ocaml-multicore/domainslib">Domainslib:</a> TSan found benign data races in Chan, not just <a href="https://github.com/ocaml-multicore/domainslib/issues/72">once</a> but <a href="https://github.com/ocaml-multicore/domainslib/pull/103">twice</a>.</li>
<li><a href="https://github.com/ocaml/ocaml">The OCaml runtime system</a> itself: TSan warned about a <a href="https://github.com/ocaml/ocaml/issues/11040">number of race conditions</a> in the OCaml runtime.</li>
</ul>
<p>In addition, TSan has been a great help in transitioning the effects-based I/O library <a href="https://github.com/ocaml-multicore/eio">Eio</a> and the distributed database <a href="https://github.com/mirage/irmin">Irmin</a> to Multicore. It allowed teams to detect potential data races and fix them as required.</p>
<h2>Feedback!</h2>
<p>We want to hear from you – are you using TSan for your OCaml projects? Please get in touch and let us know about your experience, whether you have encountered any problems, and if you have any suggestions for how it could be improved.</p>
<p>You can share your thoughts on the <a href="https://discuss.ocaml.org">OCaml Discuss Forum</a> or contact Tarides directly <a href="/contact/">on our website</a>. Don't forget to check out the <a href="https://ocaml.org/docs/multicore-transition">TSan tutorial</a> as well. Happy hacking!</p>
]]></description><link>https://tarides.com/blog/2023-10-18-off-to-the-races-using-threadsanitizer-in-ocaml</link><guid isPermaLink="false">https://tarides.com/blog/2023-10-18-off-to-the-races-using-threadsanitizer-in-ocaml.html</guid><dc:creator><![CDATA[ Fabrice Buoro, Olivier Nicole, Isabella Leandersson ]]></dc:creator><pubDate>Wed, 18 Oct 2023 00:00:00 GMT</pubDate></item><item><title><![CDATA[Prioritising Mental Health at Tarides]]></title><description><![CDATA[<p>In a world that never stops, it's easy to get lost in the hustle and bustle of daily life, often overlooking the importance of maintaining our mental well-being, one of the most vital aspects of our existence. Especially on <a href="https://mentalhealth-uk.org/blog/how-flexible-working-could-tackle-burnout-in-the-workplace/">World Mental Health Day</a>, observed every October 10th, it's crucial to reflect on the significance of mental health both in our personal lives and within the workplace. At Tarides, we understand that true success goes beyond profits and productivity; it's about the well-being of our employees and contractors.</p>
<h2>The Significance of Mental Health at Tarides</h2>
<p>In today's society, mental health is often overlooked, minimised, or stigmatised. Despite its undeniable impact on our daily lives, mental health continues to be a topic shrouded in silence. The consequences of neglecting mental health can be severe, affecting not only individuals but entire organisations. <a href="https://mentalhealth-uk.org/blog/how-flexible-working-could-tackle-burnout-in-the-workplace/">Stress, anxiety, and burnout</a> can lead to decreased productivity, increased absenteeism, and a negative work environment.</p>
<p>At Tarides, we strive to be responsible employers, and we recognise the importance of the mental well-being of our staff. By openly addressing mental health, we aim to create a safe workplace that values holistic health, fosters employee well-being, and promotes a healthy work-life balance.</p>
<h3>Cultivating Mental Wellness in the Workplace</h3>
<p>We take significant steps to prioritise mental health within our organisation, not just on Mental Health Day but all year long.</p>
<h4>Flexible Hours &amp; Extended Paid Leave</h4>
<p>At Tarides, we embrace the spirit of flexibility and support in our work culture. Our staff have a choice to work remotely or in the office, and they are able to adjust their schedule to allow for daily life, such as appointments and childcare arrangements. We also offer a competitive holiday allowance as standard and an additional two weeks paid holiday for all employees in August. This is not only extra paid time off, but our entire organisation also takes a well-deserved breather for these two weeks. No emails, no deadlines, no worries about missing out or coming back to a mountain of work. It's a time to truly disconnect and recharge, stress-free.</p>
<h4>Comprehensive Health Insurance &amp; Sports Classes</h4>
<p>We provide a health insurance policy, with all costs covered by Tarides, for employees and their families. This includes 24/7 advice from experts and access to a range of sports classes and wellness experiences. Health-related anxiety and stress will be common to everyone at some stage of life, and we aim to provide additional support when our employees need it.</p>
<h4>Regular Check-In Meetings</h4>
<p>All of our staff have regular 1:1 meetings with their Team Lead, as well as regular check-in meetings with their HR representative. Often, there may not be a specific agenda for these meetings, but they provide a recurring opportunity for people to freely share and address concerns related to mental health or other challenges they may be facing. Our intention is to create a safe and supportive space, as well as to reduce barriers for such conversations.</p>
<h3>Expert Training &amp; Events</h3>
<p>We have partnered with <a href="https://www.moka.care/">moka.care</a>, "a global solution for mental health prevention in the workplace," to provide valuable resources and support for our employees and contractors.</p>
<h4>The Art of Disconnecting</h4>
<p>On October 19th, Tarides will host "The Art of Disconnecting," a one-hour relaxation exercise session. In a world where constant connectivity has become the norm, this session is a much-needed opportunity to step away from the stressors of daily life. Attendees will learn techniques to unplug, recharge, and find balance in a fast-paced world.</p>
<p>This event emphasises the importance of taking time for oneself, disconnecting from digital distractions, and prioritising mental well-being. It's a chance for Tarides' employees to discover the value of self-care and relaxation in maintaining their mental health.</p>
<h4>Establishing Psychological Safety in Your Team</h4>
<p>On October 20th, Tarides will host a one-hour talk titled "Towards a Culture of Trust: Establishing Psychological Safety in Your Team." This session is designed to empower attendees by highlighting the following key points:</p>
<ul>
<li>Realising that psychological safety can be practised and activated by everyone in the company</li>
<li>Becoming aware of the positive impact the development of psychological safety has on cooperation and innovation</li>
<li>Exploring the right to make mistakes as a stepping stone to growth and improvement</li>
</ul>
<p>By fostering a culture of trust within teams, this event aims to enhance cooperation, innovation, and personal growth among employees. At Tarides, we understand that when people feel safe and supported, they are more likely to thrive both personally and professionally.</p>
<h2>Conclusion</h2>
<p>Mental Health Day serves as an important reminder that mental health is an integral part of our lives and workplaces. We are committed to prioritising the mental well-being of our staff and to creating a healthy and productive work environment.</p>
<p>Through initiatives that promote a positive work-life balance, like providing regular check-in meetings and offering expert training and events internally, we actively create a culture that values mental health, encourages self-care, and fosters a sense of trust and support among our staff. A mentally healthy workforce is a more engaged, innovative, and resilient one.</p>
<p>As we celebrate Mental Health Day, let us all take a moment to reflect on the importance of mental health in our lives and within our workplaces. Tarides is leading the way in recognising that investing in mental health is not just the right thing to do; it's a smart business decision that benefits everyone involved.</p>
<p>Together, we can create a world where mental health is no longer overlooked or minimised.</p>
<p>Together, we can ensure it's acknowledged and supported.</p>
<p>Together, we can encourage all companies to prioritise the holistic well-being of their staff.</p>
]]></description><link>https://tarides.com/blog/2023-10-10-prioritising-mental-health-at-tarides</link><guid isPermaLink="false">https://tarides.com/blog/2023-10-10-prioritising-mental-health-at-tarides.html</guid><dc:creator><![CDATA[ Christine Rose ]]></dc:creator><pubDate>Tue, 10 Oct 2023 00:00:00 GMT</pubDate></item><item><title><![CDATA[Porting OBuilder to FreeBSD]]></title><description><![CDATA[<p>OBuilder is a tool for performing arbitrary, reproducible builds of OCaml-related software within a sandboxed environment. It is used by the CI team at Tarides to provide OCaml-based Continuous Integration (CI) for projects like <code>opam-repo-ci</code>, <code>ocaml-ci</code>, and <code>multicoretest-ci</code>. Originally written for Linux, OBuilder had Windows and macOS support added later. Previous blog posts have covered porting to <a href="/blog/2023-08-02-obuilder-on-macos/">macOS</a> and <a href="/blog/2023-07-12-ocaml-ci-renovated/">OCaml CI in general</a>.</p>
<p>Here we cover the work to add the one remaining Tier-1 supported platform for OCaml that was missing from OBuilder: FreeBSD. With this work, the CI systems maintained by the team at Tarides will be able to support FreeBSD.</p>
<h2>The Challenge</h2>
<p>Being initially Linux-centric, OBuilder is architected around three major requirements:</p>
<ul>
<li>Initial build environments are Docker images.</li>
<li>Sandboxing is performed using the Open Container Initiative (OCI) tool <code>runc</code>.</li>
<li>A filesystem with snapshot capabilities is needed and acts as a cache of
identical build steps.</li>
</ul>
<p>Neither of the first two items is available under FreeBSD (a Docker client is available, but there is no native Docker server); therefore, alternative solutions must be found. As for the filesystem requirement, FreeBSD has been supporting <a href="https://openzfs.org">Sun's ZFS filesystem</a> out of the box for many releases now. ZFS support already existed for Linux and macOS, and it is being used in the macOS port <a href="/blog/2023-08-02-obuilder-on-macos/">(more details here)</a>.</p>
<p>Fortunately, the existing architecture for OBuilder encapsulates these needs as <code>Fetcher</code>, <code>Sandbox</code>, and <code>Store</code> modules, respectively, so the only work required would be to write FreeBSD-specific <code>Fetcher</code> and <code>Sandbox</code> modules.</p>
<h3>The Fetcher</h3>
<p>Initially we tried to fetch Docker images without using the <code>docker</code> command. There is an existing script for this purpose, <code>download-frozen-image</code>, in the <code>moby</code> GitHub project (the open source parts of Docker). However, even if we use that script to fetch the Docker image's various layers and apply them in order, the result is of little use from a FreeBSD perspective, because the available Docker images are filled with Linux binaries. These can run under FreeBSD with the help of the compatibility module, but the OCaml toolchain would believe it was running under Linux, so it would build Linux binaries. This is not the solution we are looking for. Until Docker is available under FreeBSD, there won't be a repository of FreeBSD images suitable for OBuilder. Such images will, at least in the beginning, be built locally in the Tarides CI network.</p>
<p>Given this, it makes sense to expect <code>.tar.gz</code> archives to be available, then we can simply download and extract them to implement the <code>Fetcher</code> module. Moreover, FreeBSD provides its own <code>fetch</code> command, which is able to download files over <code>http</code> and <code>https</code>. It can also use <code>file://</code> URIs, which turned out to be very helpful during development. There is currently no attempt to support aliases or canonical names, so all the <code>(from ...)</code> stanzas in OBuilder command files will need to be adjusted for use with FreeBSD. This limitation can be overcome by prepopulating the OBuilder cache with the most-used images under their expected names on the OBuilder worker systems.</p>
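<p>The download-and-extract approach described above can be sketched as two commands. This is a hypothetical helper, not OBuilder's actual <code>Fetcher</code>: the function name <code>fetch_commands</code> and all paths are invented for the example, and the commands are returned as argument vectors rather than executed.</p>
<pre><code>(* Hypothetical sketch of a tarball-based fetcher for FreeBSD. *)
let fetch_commands ~url ~archive ~rootfs =
  [ (* fetch(1) handles http, https, and file:// URIs. *)
    [ "fetch"; "-o"; archive; url ];
    (* Unpack the gzipped archive into the image directory. *)
    [ "tar"; "-xzf"; archive; "-C"; rootfs ] ]

let () =
  fetch_commands
    ~url:"file:///tmp/freebsd-ocaml-5.1.tar.gz"
    ~archive:"/tmp/base.tar.gz"
    ~rootfs:"/obuilder/base-image/freebsd-ocaml-5.1"
  |> List.iter (fun argv -> print_endline (String.concat " " argv))
</code></pre>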
<p>The final solution for the Fetcher on FreeBSD uses ZFS to store the base images as ZFS datasets on each machine. These get mounted into the jail with the appropriate installs of OCaml, opam, and a Git clone of the <code>opam-repository</code>. We ended up with a layout of:</p>
<div role="region"><table>
<tbody><tr>
<th>ZFS Volumes for FreeBSD</th>
</tr>
<tr>
<td>obuilder/base-image/freebsd-ocaml-4.14</td>
</tr>
<tr>
<td>obuilder/base-image/freebsd-ocaml-5.1</td>
</tr>
</tbody></table></div><p>The base image name is used in the <code>(from ...)</code> stanza of the OBuilder spec files to select an OCaml version. Interestingly, we used a similar layout in the macOS OBuilder port. More details are available in the <a href="https://github.com/ocurrent/freebsd-infra">freebsd-infra ansible scripts</a>.</p>
<h3>The Sandbox</h3>
<p>FreeBSD has come with its own sandboxing mechanism, named <code>jail</code>, since the late 1990s. In addition to only having access to a subset of the file system, jails can also be denied network access, which fits the OBuilder usage pattern where network access is only allowed to fetch build dependencies.</p>
<p>In order to start a jail, the <code>jail</code> command is invoked with either a plain text file providing its configuration or with the configuration parameters (in the "name=value" form) on its command line.</p>
<p>In order to keep things simple in OBuilder, and since the jail configuration will only need a few parameters, they are all passed on the command line. This might be a problem if the length of the run command, as specified in the OBuilder command file, reaches the FreeBSD command-line size limit. Since this limit is a few hundred kilobytes, it does not seem to be a serious concern.</p>
<p>The <code>jail</code> invocation will provide:</p>
<ul>
<li>A unique jail name</li>
<li>The absolute path of the jail filesystem</li>
<li>The command (or shell script) to run in the jail</li>
<li>The user on behalf of which the command will be run. This requires the user to exist within the jail filesystem (<code>/etc/passwd</code> and <code>/etc/group</code> entries).</li>
</ul>
<p>More options may be used to allow for network access or specify commands to run on the host or within the jail at various states of the jail lifecycle.</p>
<p>Also, for processes running under the jail to behave correctly, a stripped-down <code>devfs</code> pseudo-filesystem needs to be mounted on the <code>/dev</code> directory within the jail. While this can be done automatically by <code>jail(8)</code> using the proper <code>mount.devfs</code> option, care must be taken to correctly unmount this directory after the command run within the jail has exited. In order to be sure there will be no leftover <code>devfs</code> mounts, which would prevent removal of the jail filesystem at cleanup time, OBuilder unconditionally runs an <code>umount</code> command after the <code>jail</code> command exits.</p>
<p>Lastly, since most (if not all) OBuilder commands will expect a proper opam environment configuration, it is necessary to run the commands within a login shell. Such a shell can only be run as <code>root</code>. Therefore the command that will run within the jail is:</p>
<pre><code>  /usr/bin/su -l obuilder_user_name -c "cd obuilder_directory &amp;&amp; obuilder_command"
</code></pre>
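<p>Putting the pieces together, the command line for one build step might be assembled as follows. This is a sketch under our own assumptions and not OBuilder's actual code: the function names, the jail name, and the paths are hypothetical, while <code>name</code>, <code>path</code>, <code>mount.devfs</code>, and <code>command</code> are parameters documented in <code>jail(8)</code>.</p>
<pre><code>(* Hypothetical sketch: build the argument vector for one jailed step. *)
let jail_argv ~name ~rootfs ~user ~dir ~command =
  [ "jail"; "-c";          (* create a new jail *)
    "name=" ^ name;        (* unique jail name *)
    "path=" ^ rootfs;      (* absolute path of the jail filesystem *)
    "mount.devfs";         (* mount devfs on /dev inside the jail *)
    (* Everything after command= is the command run inside the jail. *)
    "command=/usr/bin/su"; "-l"; user; "-c";
    Printf.sprintf "cd %s && %s" (Filename.quote dir) command ]

(* Run unconditionally once jail has exited, so that no devfs mount
   is left behind to block removal of the jail filesystem. *)
let umount_argv ~rootfs = [ "umount"; Filename.concat rootfs "dev" ]

let () =
  let rootfs = "/obuilder/result/example/rootfs" in
  jail_argv ~name:"obuilder-example" ~rootfs ~user:"opam" ~dir:"/src"
    ~command:"opam exec -- dune build"
  |> String.concat " " |> print_endline;
  umount_argv ~rootfs |> String.concat " " |> print_endline
</code></pre>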
<p>The <code>jail</code>-based sandbox environment provided by FreeBSD OBuilder closely mirrors the original Linux <code>runc</code>-based implementation, because both rely on operating system-level virtualisation. We therefore expect the FreeBSD port to show similar performance and stability. With the <code>Fetcher</code> and the <code>Sandbox</code> modules written, a complete OBuilder run can be attempted.</p>
<h3>Integrating with OCluster</h3>
<p>OCluster is a larger system that processes build requests on a cluster of servers, each running an OCluster worker that uses OBuilder as a library. In order to make the FreeBSD systems compatible with OCluster's needs, a few more adjustments are necessary. A Docker client for running health checks, a ZFS pool, and a FreeBSD base image all need to be addressed.</p>
<p>Fortunately the Docker client is available as a FreeBSD package: <code>pkg install -y docker</code> would do the trick; however, we do something even simpler and create a shell script which does nothing. OCluster worker uses this script as a healthcheck. In future, we plan to change this to target a more appropriate FreeBSD health check.</p>
<p>As we hinted at earlier, FreeBSD needs to be set up with a ZFS pool on a separate disk or a separate partition. The basic idea is that OBuilder will store base images plus build-state snapshots on this pool, and it is better to keep it separate from the main system. Additionally, the usage patterns of this pool involve a huge amount of reads, writes, snapshot creations, and deletions, so it makes operational sense to isolate it and allocate it on different storage than the operating system storage.</p>
<p>The FreeBSD base images are created by building opam and OCaml from source, then initialising an opam repository from a Git clone of the <code>opam-repository</code> GitHub repo. More details are available in the <a href="https://github.com/ocurrent/freebsd-infra">freebsd-infra ansible scripts</a>. With some FreeBSD knowledge, this should allow anyone to set up an OCluster worker on FreeBSD. In future, these base images could be built on a single machine and copied between machines using <code>zfs send</code> on the source machine and <code>zfs recv</code> on the build machine.</p>
<p>Currently we have a <a href="https://infra.ocaml.org/by-use/freebsd-x86_64">single server running FreeBSD 13.2</a> providing OCaml 4.14 and 5.1 builds on x86_64. This modest Dual Xeon machine (16 cores) is easily handling the load from <code>opam-repo-ci</code>, <code>ocaml-ci</code>, and <code>opam-health-check</code>, with a peak observed throughput of 40 jobs per hour. This initial deployment has confirmed the performance and stability expectations we had.</p>
<h2>Future Work</h2>
<p>Some future work that we would like to do:</p>
<ul>
<li>Optimising I/O using an in-memory OverlayFS. A similar setup on Linux has given us some impressive performance improvements.</li>
<li>Supporting FreeBSD on ARM (if there is sufficient interest in this architecture)</li>
<li>Investigating OCluster performance on FreeBSD using DTrace</li>
</ul>
<h2>Conclusion</h2>
<p>The modular design of OBuilder has allowed for it to be easily adapted to run under FreeBSD. A few FreeBSD systems are currently being set up as OBuilder workers within the OCluster orchestrator used by Tarides for automated OCaml package testing. Support has been added to <a href="https://opam.ci.ocaml.org">opam.ci.ocaml.org</a> for checking opam packages, and <a href="https://ocaml.ci.dev">ocaml.ci.dev</a> has FreeBSD builds available for OCaml projects hosted on GitHub and GitLab. In addition, the FreeBSD-specific instance of <a href="https://freebsd.check.ci.dev">opam health check</a> is providing base-level metrics of the repository health on the now-supported platform. Numerous packages need fixing, and we encourage the community to have a look and lend the maintainers a hand.</p>
<p>Extending OBuilder's capabilities to include FreeBSD as a Tier 1 platform is a crucial step to ensure the robustness and accessibility of OCaml-related software development. By implementing FreeBSD-specific modules for the <code>Fetcher</code> and <code>Sandbox</code> components, developers will be empowered to perform reproducible builds within a sandboxed environment on FreeBSD. The inclusion of FreeBSD in the OBuilder ecosystem will contribute to the overall growth and adoption of OCaml, facilitating the development of reliable and efficient software on this platform.</p>
<p>Please <a href="/contact/">get in touch with us</a> if you are interested in FreeBSD support, and tell us what architecture/version of FreeBSD you'd like to be supported. Or better yet, get involved with the <a href="https://github.com/ocurrent/overview">OCurrent</a> project! We are also active on <a href="https://discuss.ocaml.org">Discuss</a>.</p>
]]></description><link>https://tarides.com/blog/2023-10-04-porting-obuilder-to-freebsd</link><guid isPermaLink="false">https://tarides.com/blog/2023-10-04-porting-obuilder-to-freebsd.html</guid><dc:creator><![CDATA[ Miod Vallat, Tim McGilchrist ]]></dc:creator><pubDate>Wed, 04 Oct 2023 00:00:00 GMT</pubDate></item><item><title><![CDATA[Tutorial: How to Port Lwt Applications to Eio]]></title><description><![CDATA[<p>Thomas Leonard and Jonathan Ludlam hosted a <a href="https://icfp23.sigplan.org/details/icfp-2023-tutorials/4/Porting-Lwt-applications-to-OCaml-5-and-Eio">tutorial on porting Lwt applications to OCaml 5 and Eio</a> at arguably the world's largest functional programming conference: <a href="https://icfp23.sigplan.org">ICFP</a>. The tutorial is a great introduction to Eio, with a clear step-by-step approach that is accessible to developers of different experience levels.</p>
<p>This article provides some context to the tutorial and Eio in general. If you would rather skip straight to the code, check out the <a href="https://github.com/ocaml-multicore/icfp-2023-eio-tutorial/tree/main">tutorial on GitHub.</a></p>
<h2>Eio: a Brief Introduction</h2>
<p>OCaml 5 brought support for programming with effects. Using effects has several advantages over using call-backs or monadic style:</p>
<ol>
<li>It is faster – no heap allocations are required to simulate a stack.</li>
<li>Concurrent and plain non-concurrent code can be written in the same style.</li>
<li>Exception backtraces work.</li>
<li>Other language features such as <code>try/with</code>, <code>match</code>, <code>while</code>, etc. can be used with concurrent code.</li>
</ol>
<p>The OCaml 5 update also brought Multicore support, which has big implications for better performance across a multitude of <a href="/blog/2022-03-01-segfault-systems-joins-tarides/">use cases.</a></p>
<p>In light of these changes, there was a lot of interest in moving existing OCaml code to a new I/O library that could make the best use of both effects and multiple cores. With this impetus, a new effects-based direct-style I/O stack was developed for OCaml – enter Eio.</p>
<h3>Why Eio?</h3>
<p>Beyond Multicore and effects support, Eio comes with several useful features such as making use of lock-free data structures (see <a href="https://github.com/ocaml-multicore/saturn">Saturn</a>), modular programming support, interactive monitoring support enabled by the custom runtimes in OCaml 5.1, and interoperability with other concurrency libraries such as Lwt, Async, and Domainslib.</p>
<h3>Eio and Lwt</h3>
<p>When comparing Lwt and Eio there are a few things to consider. First of all, Eio's use of direct-style is shorter and easier for beginners to use. Secondly, and perhaps more significant to the average user, Eio is faster. This holds true even when just one core is being used. For a more detailed breakdown of times, check out the <a href="https://github.com/ocaml-multicore/icfp-2023-eio-tutorial/blob/main/doc/intro.md">Eio Introduction and Lwt Comparison</a> part of the tutorial.</p>
<p>Eio also manages error handling and backtraces differently; for example, Eio reports exceptions immediately in cases where two tasks are running concurrently, and backtraces are also more specific. Furthermore, Eio prevents the type of resource leaks that Lwt often allows to happen by requiring all resources to be tied to a switch, ensuring that they are released when the switch finishes.</p>
<p>Finally, Eio makes it clearer to understand how the program relates to the outside world since it does so through a function argument usually named <code>env</code>. By looking at what happens to <code>env</code>, the user gets a bound on the program's behaviour. Lwt does not do this, typically just starting by saying it will run <code>main</code> with no further context.</p>
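<p>To make the roles of <code>env</code> and switches more concrete, here is a small sketch, assuming the <code>eio_main</code> package is installed. The <code>main</code> function and the messages are invented for illustration; the tutorial itself has complete examples.</p>
<pre><code>(* Requires the eio_main package. *)

(* main only receives stdout, so that is all it can touch. *)
let main ~stdout =
  Eio.Switch.run @@ fun sw -&gt;
  (* The hook runs when the switch finishes, even on error. *)
  Eio.Switch.on_release sw (fun () -&gt;
      Eio.Flow.copy_string "resource released\n" stdout);
  Eio.Flow.copy_string "working\n" stdout

let () =
  Eio_main.run @@ fun env -&gt;
  main ~stdout:(Eio.Stdenv.stdout env)
</code></pre>
<p>Because <code>main</code> is only handed <code>stdout</code> rather than the whole <code>env</code>, a reader can tell from the call site that it has no access to the network or the file system.</p>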
<h2><a href="https://github.com/ocaml-multicore/icfp-2023-eio-tutorial/tree/main">Tutorial</a> Overview:</h2>
<p>The tutorial covers the following topics:</p>
<ul>
<li><strong><a href="https://github.com/ocaml-multicore/icfp-2023-eio-tutorial/blob/main/doc/prereqs.md">Prerequisites:</a></strong> Walks you through what to install before starting the tutorial, including optional Docker and ThreadSanitizer files, as well as tips and common pitfalls.</li>
<li><strong><a href="https://github.com/ocaml-multicore/icfp-2023-eio-tutorial/blob/main/doc/intro.md">Eio Introduction and Lwt Comparison:</a></strong> Outlines the main differences between Eio and Lwt, including performance, error handling, and bounds on behaviour.</li>
<li><strong><a href="https://github.com/ocaml-multicore/icfp-2023-eio-tutorial/blob/main/doc/porting.md">Porting From Lwt to Eio:</a></strong> The core of the tutorial, this section walks you through how to take an example application in Lwt and port it to Eio. It gives you an overview of the code, tells you how to convert the code, and how to take advantage of Eio features.</li>
<li><strong><a href="https://github.com/ocaml-multicore/icfp-2023-eio-tutorial/blob/main/doc/multicore.md">Using Multiple Cores:</a></strong> Instructions on how to use multiple cores to improve performance. The section includes topics like thread-safe logging, using multiple domains with <code>cohttp</code>, testing, and suggestions on useful tools.</li>
</ul>
<h2>Feedback</h2>
<p>It really helps the teams to receive feedback and constructive comments on how to improve documentation for users. We encourage you to share your thoughts on the <a href="https://github.com/ocaml-multicore/icfp-2023-eio-tutorial/tree/main">repo</a>, or on the <a href="https://discuss.ocaml.org">OCaml Discuss</a> forum.</p>
<p>You can also <a href="/contact/">contact us</a> on our website with any questions or concerns.</p>
]]></description><link>https://tarides.com/blog/2023-09-27-tutorial-how-to-port-lwt-applications-to-eio</link><guid isPermaLink="false">https://tarides.com/blog/2023-09-27-tutorial-how-to-port-lwt-applications-to-eio.html</guid><dc:creator><![CDATA[ Isabella Leandersson ]]></dc:creator><pubDate>Wed, 27 Sep 2023 00:00:00 GMT</pubDate></item><item><title><![CDATA[A Year of SpaceOS: Showing the World the Benefits of OCaml]]></title><description><![CDATA[<p>I have had the pleasure of attending some great conferences over the past year, where I have presented some of the latest and most exciting OCaml use cases developed at Tarides. With our new <a href="/blog/2023-07-31-ocaml-in-space-welcome-spaceos/">SpaceOS</a> solution, Tarides can provide significant quality-of-life upgrades for industries that rely on satellites to power their critical infrastructure. In this post, I’ll share the best parts of my experience at the different conferences along with my key takeaways.</p>
<h2>India Space Congress 2023</h2>
<p>The <a href="https://www.indiaspacecongress.com/">India Space Congress 2023 conference</a> was held in the Grand Hyatt, New Delhi, between July 10-12, 2023. This is only the second edition of the conference, and yet there were more than 500 participants and representation from over 30 countries. The event was graced by the presence of governors from three states in India, ambassadors, the heads of space agencies, the chief economist at NASA, as well as senior officers from the Indian Space Research Organisation (ISRO) and Department of Telecommunication (DoT).</p>
<p><img sizes="(min-width: 1360px) 1360px, (min-width: 680px) 680px, 100vw" srcset="/blog/images/Shakthi2-170w~uAgHAtKPD9RKe1wFh24aUw.webp 170w, /blog/images/Shakthi2-340w~uCS1wp2kNbJg11KoFzpsuQ.webp 340w, /blog/images/Shakthi2-680w~9oDh_ln4zwNnqAfpzM7atA.webp 680w, /blog/images/Shakthi2-1360w~wwMnycH_jw7u4OwHG8sA4A.webp 1360w" src="/blog/images/Shakthi2-1360w~wwMnycH_jw7u4OwHG8sA4A.webp" alt="Shakthi is standing next to a vertical sign from the India Space Congress 2023 conference. The sign features the logos of partners and sponsors."></p>
<p>The conference was organised by the <a href="https://www.sia-india.com/">SatCom Industry Association</a>, a nonprofit organisation that caters to the interests of the space industry in India. It acts as a platform for collaboration between academia, industry, and governments. Its members include satellite operators, suppliers, manufacturers, law firms, and academic institutions – and it now has over 150 registered start-ups!</p>
<p>India has recently <a href="https://www.state.gov/the-republic-of-india-signs-the-artemis-accords/">signed the Artemis Accords</a> in order to participate in the <a href="https://www.nasa.gov/artemisprogram">Artemis</a> program. The Indian Space Research Organisation owns and manages all the satellites in India; however, the recently announced <a href="https://www.isro.gov.in/media_isro/pdf/IndianSpacePolicy2023.pdf">India Space Policy 2023</a> is now allowing private players to enter into the space market.</p>
<p>In previous years, an 18% Goods and Services Tax (GST) was levied on private players entering into the space market. As part of a series of reforms starting in 2020, the government of India now
<a href="https://www.businesstoday.in/latest/in-focus/story/gst-exemption-to-private-satellite-launch-service-firms-a-huge-financial-incentive-to-boost-growth-industry-389459-2023-07-12">exempts</a> private launch service companies from the tax. The current public-private partnership (PPP) model permits the use of the Small Satellite Launch Vehicle (SSLV), and knowledge transfers to these start-ups for use in Low Earth Orbit (LEO) satellites.</p>
<p>I had the opportunity to interact with a number of distinguished speakers, dignitaries, start-up founders, and leaders in the space sector during the event. We look forward to continuing our engagement to nurture and lead space initiatives. Tarides is proud to be one of the DeepTech Space startups in India!</p>
<h2>IoT Solutions World Congress 2023, Barcelona</h2>
<p>Tarides has also been participating in IoT conferences to foster collaboration and explore potential partnerships. <a href="https://space-os.eu/">Space-OS</a> was launched at the <a href="https://www.iotsworldcongress.com">IoT Solutions World Congress</a> held between January 31-February 2, 2023 at Gran Via, Fira de Barcelona. Tarides showcased SpaceOS using an exhibit booth that we shared with <a href="https://systematic.com/en-gb/">Systematic</a> and Région Île-de-France in the ‘La French Tech’ area of the conference.</p>
<p><img sizes="(min-width: 1360px) 1360px, (min-width: 680px) 680px, 100vw" srcset="/blog/images/Shakthi3-170w~LNl_QbpqWSYYffOltXkI7g.webp 170w, /blog/images/Shakthi3-340w~QDyarHsrka5U5RJPzCZEVw.webp 340w, /blog/images/Shakthi3-680w~h0NCzgd-sFDWrESl5s3-iQ.webp 680w, /blog/images/Shakthi3-1360w~wJ8iVAhJZF_PhDaZjBAa0A.webp 1360w" src="/blog/images/Shakthi3-1360w~wJ8iVAhJZF_PhDaZjBAa0A.webp" alt="A monitor showing a slide which says SpaceOS - the foundation for the next generation of satellites. Tarides will develop the next generation of satellite OS – for Europe's space industry, based on the best research from Inria and The University of Cambridge. Bullet points: smaller, more secure, faster, more flexible. Arrow pointing to: Satellites are much more secure, more capable, more flexible and more sustainable vs. any competitor. The monitor itself is on a wall with the logos of Tarides, Systematic, and Région Île de France"></p>
<p>SpaceOS provides a secure and efficient solution for multiuser and multimission satellites. It is built on unikernels and uses the <a href="https://mirage.io/">MirageOS</a> library operating system. Its compact and flexible design allows users to build and deploy custom applications with ease. The platform can also be used for IoT and edge computing applications, including in simulators to speed development and testing of satellites. Moreover, we see a 20x smaller memory footprint and faster processing time as compared to current solutions in the market. If you would like to know more about the specs of SpaceOS, you can <a href="mailto:sales@tarides.com">contact us</a> and we will be happy to share more information with you.</p>
<p>We had a number of visitors at our booth, with whom we discussed the many areas where our OS model could be used, including automobile, healthcare, energy, manufacturing, factory, and industrial use cases. According to a 2022 survey by <a href="https://www.eclipse.org/org/foundation/">the Eclipse Foundation</a>, most <a href="https://5413615.fs1.hubspotusercontent-na1.net/hubfs/5413615/2022%20IoT%20&amp;%20Edge%20Developer%20Survey%20Report.pdf">hardware devices currently use a memory-unsafe programming language</a>. We encourage you to read our earlier article on how <a href="/blog/2023-08-17-your-programming-language-and-its-impact-on-the-cybersecurity-of-your-application/">your programming language can impact the cybersecurity of your application</a>. Although our focus for the coming years will primarily be on the space and satellite industry, we are also exploring IoT solutions for different verticals.</p>
<h2>IOTshow.in 2022, Bengaluru</h2>
<p>The <a href="https://www.iotshow.in/">iotshow.in</a> conference was held at the Karnataka Trade Promotion Organization (KTPO) in Whitefield, Bengaluru, between November 23-25, 2022. The conference had a B2B Expo which was a triad of India Electronics Week, IOTshow.in, and smartBHARAT. The three-day event had over 10,000 registrations, 120 speakers, and 300 different IoT brands.</p>
<p><img sizes="(min-width: 1360px) 1360px, (min-width: 680px) 680px, 100vw" srcset="/blog/images/Shakthi1-170w~cm0i0clC5ChIzLItwJIuNQ.webp 170w, /blog/images/Shakthi1-340w~dvhvGCqCk82fzAJLgB5jBw.webp 340w, /blog/images/Shakthi1-680w~UKthfFYED6MWMRVLcqP1rA.webp 680w, /blog/images/Shakthi1-1360w~7dhc-spz4fqleE0NXk5iHA.webp 1360w" src="/blog/images/Shakthi1-1360w~7dhc-spz4fqleE0NXk5iHA.webp" alt="Shakthi is standing in front of a sign which says &quot;thank you to our speakers &amp; advisors in 2022&quot;, the sign has headshots of different speakers at the conference."></p>
<p>I gave a talk on "Foundations to Security by Design: A Novel Approach to Building Type-Safe, Modular, and Efficient IoT Solutions" where I introduced the current challenges in the IoT industry with regard to security. I followed this up with the concept of unikernels and how MirageOS addresses these security issues. I also presented OCaml industrial use cases such as <a href="https://hyper.systems">Hyper</a>, <a href="https://www.nitrokey.com">Nitrokey</a>, and <a href="https://www.thalesgroup.com/en">Thales</a>. Finally, I showed a Solo5 unikernel hosting the MirageOS <code>www-htdocs</code> site on a Raspberry Pi while consuming very little memory! A <a href="https://watch.ocaml.org/w/1RhQXW5NNqRtdxXgi12pD9">demo video</a> is available.</p>
<h2>See You Soon</h2>
<p>Tarides was once again a silver sponsor at the 28th ACM SIGPLAN <a href="https://icfp23.sigplan.org">International Conference on Functional Programming</a> held in Seattle between 4-9 September 2023. I had some fruitful discussions at that conference as well, and enjoyed meeting more like-minded people.</p>
<p>I would like to thank Tarides for sponsoring my travel to the conferences, and I look forward to more interactions with industry representatives! Conferences offer a great opportunity to learn from others, including exploring the needs and challenges of different sectors and how Tarides can provide solutions. I hope to see you around soon!</p>
]]></description><link>https://tarides.com/blog/2023-09-20-a-year-of-spaceos-showing-the-world-the-benefits-of-ocaml</link><guid isPermaLink="false">https://tarides.com/blog/2023-09-20-a-year-of-spaceos-showing-the-world-the-benefits-of-ocaml.html</guid><dc:creator><![CDATA[ Shakthi Kannan ]]></dc:creator><pubDate>Wed, 20 Sep 2023 00:00:00 GMT</pubDate></item><item><title><![CDATA[Our Experience at Tarides: Projects From Our Internships in 2023]]></title><description><![CDATA[<h2>Internships at Tarides</h2>
<p>We regularly have the pleasure of hosting internships where we work with engineers from all over the world on a diverse range of projects. By collaborating with people who are relatively new to the OCaml ecosystem, we get to benefit from their perspective. Seeing things with fresh eyes helps with identifying holes in documentation, gaps in workflows, as well as other ways to improve user experience.</p>
<p>In turn, we offer interns the opportunity to work on a project in OCaml in close collaboration with a mentor. This affords participants a great deal of independence, while still having the support and expertise of an experienced engineer at their disposal. During the course of their internship, participants will learn more about OCaml and strengthen their skills in functional programming. They will also have the chance to complete a project with real-world implications, contributing meaningfully to an open-source ecosystem.</p>
<p>Does this sound like something you would like to do? Applications for our next round of internships open early next year, and you will be able to apply <a href="/careers/">on our website</a> around that time.</p>
<p>Let's check out some reports from this summer's internships, and see what the teams got up to!</p>
<h2>Dipesh: Par_incr - A Library for Incremental Computation With Support for Parallelism</h2>
<h3>Background</h3>
<p>I am a final year CS student from <a href="https://www.nitt.edu/">NIT Trichy</a>. I had tried to learn Haskell in my second year but didn't really succeed. I enjoy learning about languages and their features, however, so I had learnt some OCaml by the end of my third year but not tried out any fancy features.</p>
<p>I found out about the internship from X (Twitter) in one of <a href="https://x.com/kc_srk?s=20">KC's tweets</a>, but I knew about Tarides and the good work they do since I had worked with KC in the past. I messaged him to check the rules and ask if recent graduates could apply. He confirmed that they could and encouraged me to apply.</p>
<p>The interview itself was very pleasant; it was as if it was just me talking and discussing things with interviewers (all interviews ever should be like this!). I thought I wouldn't get it but thankfully I did.</p>
<h3>Goal of the Project</h3>
<p>The goal of my project was to build an incremental library with support for parallelism constructs using OCaml 5.0. Incremental computation is a technique that improves efficiency by only recomputing the outputs that depend on changed data. The library we built, <a href="https://github.com/ocaml-multicore/par_incr">Par_incr</a>, takes advantage of the new parallelism features in OCaml 5.0 to create an even more efficient incremental computation library.</p>
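<p>To give a feel for the idea, here is a deliberately simplified sketch of incremental recomputation using only the standard library. It is not Par_incr's API: the <code>cached</code> type and the <code>make</code> and <code>get</code> functions are hypothetical, and a real incremental library tracks whole dependency graphs rather than a single input.</p>
<pre><code>(* A cell remembers its last input and output. *)
type ('a, 'b) cached = {
  f : 'a -&gt; 'b;
  mutable last : ('a * 'b) option;
  mutable recomputations : int;
}

let make f = { f; last = None; recomputations = 0 }

(* Recompute only when the input differs from the previous one. *)
let get c input =
  match c.last with
  | Some (i, o) when i = input -&gt; o
  | _ -&gt;
      let o = c.f input in
      c.last &lt;- Some (input, o);
      c.recomputations &lt;- c.recomputations + 1;
      o
</code></pre>
<p>Calling <code>get</code> twice with the same input runs <code>f</code> only once, which is the saving that an incremental library generalises to larger computations.</p>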
<h3>Journey</h3>
<p>I was somewhat familiar with OCaml so I brushed up on some concepts using the <a href="https://dev.realworldocaml.org">Real World OCaml</a> textbook. <a href="https://ocaml.org/docs">OCaml.org</a> also has a lot of resources for learning OCaml aimed at programmers of any level (beginner to advanced). For any non-trivial questions, I would just ask my amazing mentor (Vesa) or someone else at Tarides for help (you can always find an expert on whatever OCaml question you have).</p>
<p>Initially, we wanted to finalise the module signature for the library. Vesa suggested a Monadic interface for the library, and it felt like the right choice.</p>
<p>After that was done, I started on the implementation and got something working. We wanted to check how it fared against existing libraries, so I wrote benchmarks comparing the library to <a href="https://github.com/ocurrent/current_incr">current_incr</a> and <a href="https://github.com/janestreet/incremental/">incremental</a>.</p>
<p>I remember one particular bug on which I wasted almost 2 full days. I had something like this in the code:</p>
<pre><code><span class="ocaml-source">      </span><span class="ocaml-keyword-other">if</span><span class="ocaml-source"> </span><span class="ocaml-source">not</span><span class="ocaml-source"> </span><span class="ocaml-source">is_same</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">then</span><span class="ocaml-source"> </span><span class="ocaml-source">t</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">value</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">&lt;-</span><span class="ocaml-source"> </span><span class="ocaml-source">x</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source">
</span><span class="ocaml-source">      </span><span class="ocaml-constant-language-capital-identifier">Reader_list</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">iter</span><span class="ocaml-source"> </span><span class="ocaml-source">readers</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Rsp</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-constant-language-capital-identifier">RNode</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">mark_dirty</span><span class="ocaml-source">
</span></code></pre>
<p>which should've actually been like this:</p>
<pre><code><span class="ocaml-source">      </span><span class="ocaml-keyword-other">if</span><span class="ocaml-source"> </span><span class="ocaml-source">not</span><span class="ocaml-source"> </span><span class="ocaml-source">is_same</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">then</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-source">t</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">value</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">&lt;-</span><span class="ocaml-source"> </span><span class="ocaml-source">x</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source">
</span><span class="ocaml-source">      </span><span class="ocaml-constant-language-capital-identifier">Reader_list</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">iter</span><span class="ocaml-source"> </span><span class="ocaml-source">readers</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Rsp</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-constant-language-capital-identifier">RNode</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">mark_dirty</span><span class="ocaml-source">)</span><span class="ocaml-source">
</span></code></pre>
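<p>This caused a huge performance hit: without the parentheses, the <code>if ... then</code> only guards the assignment, so <code>mark_dirty</code> ran on every reader even when the value had not changed, which triggered a lot of unnecessary work. You can learn more about the library in <a href="https://ocaml-multicore.github.io/par_incr/par_incr/Par_incr/index.html">its documentation</a>.</p>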
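<p>The pitfall can be reproduced in a few lines of standalone OCaml. The names below (<code>dirty_marks</code>, <code>update_buggy</code>, <code>update_fixed</code>) are hypothetical stand-ins for the library code above, with a counter in place of the real reader list.</p>
<pre><code>(* Counts how often readers were told to recompute. *)
let dirty_marks = ref 0
let mark_dirty () = incr dirty_marks

(* Parsed as (if ... then cell := x); mark_dirty (),
   so readers are always marked dirty. *)
let update_buggy cell ~is_same x =
  if not is_same then cell := x;
  mark_dirty ()

(* The parentheses put both actions under the condition. *)
let update_fixed cell ~is_same x =
  if not is_same then (cell := x; mark_dirty ())
</code></pre>
<p>With <code>~is_same:true</code>, <code>update_buggy</code> still increments the counter while <code>update_fixed</code> leaves it alone.</p>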
<p>Debugging this was quite fun and frustrating. It didn't even occur to me that this part could be the problem, so I was banging my head against the wall thinking I did something wrong somewhere else. I was trying out different things, but thankfully making changes to the code was enjoyable because the typechecker was always there holding my hand.</p>
<p>Overall it was an amazing journey. Getting to work in such an amazing environment here was a blessing for me, and I'm very grateful to have gotten this opportunity. I learnt a lot from Vesa throughout the
internship and from many amazing folks at Tarides.</p>
<h3>Challenges</h3>
<p>The biggest challenge was to make the library performant. Since OCaml is a language with a garbage collector, you have to take special care when allocating things, since allocation isn't cheap. Another difficulty was finding information about compiler internals, such as how certain code gets compiled and when certain optimisations kick in. This is something that can be improved, but I get that it's quite difficult to keep documentation up to date for large open-source compiler codebases that keep changing.</p>
<h3>Takeaways and Best Parts</h3>
<p>The best part was learning about optimisations, profiling, benchmarking, and
improving performance, looking into assembly trying to figure out whether some things got inlined, as well as my discussions with Vesa.</p>
<p>The discussions with Vesa made me want to explore Emacs more, and his advice will definitely help me throughout my career. I'm also much more confident in OCaml and will probably use it whenever possible. I got to learn about all sorts of cool things being done by the Multicore team and other Tarides folks.</p>
<h2>Shreyas: Olinkcheck</h2>
<h3>Background</h3>
<p>I'm a final year CS student from <a href="https://www.nitt.edu/">NIT Trichy</a>. I had never been exposed to functional programming before, but I had heard cool things about Haskell and OCaml and how Rust features were inspired by these languages. I also followed <a href="https://x.com/kc_srk?s=20">KC on Twitter</a> from before, when I had been researching internships and professors whose work I found interesting.</p>
<p>When KC tweeted about openings for interns at Tarides, I opened the application doc to read about all the cool projects listed, but I didn't know any functional programming. I still applied anyways, thinking that the worst that could happen is I get rejected, no big deal.</p>
<p>Fast forward to a really fun interview. (No Data Structures and Algorithms? Yay! Easily my favorite interview experience so far.) It was more of a discussion than a question-and-answer.</p>
<h3>Goal of the Project</h3>
<p>The goal of my project was to create <a href="https://github.com/tarides/olinkcheck">a tool that could be used to check for broken HTTP links</a>, as well as present the broken link information to the user. The tool would then be integrated into <a href="https://ocaml.org">OCaml.org</a> through GitHub, to check for broken links on the website. Since OCaml.org is such a large website with lots of content, it is difficult to manually keep up with all the links. However, broken links negatively impact the user experience, and may also make pages on the website less visible to people who would otherwise be able to find the information they need.</p>
<h3>Journey</h3>
<h4>Learning OCaml</h4>
<p>I used these resources to learn OCaml:</p>
<ul>
<li>From the book '<a href="https://dev.realworldocaml.org">Real World OCaml</a>'</li>
<li>From <a href="https://ocaml.org/docs">ocaml.org/learn</a></li>
<li>By reading others' code</li>
<li>Writing something and changing it until the compiler stops complaining</li>
<li><a href="https://github.com/ocaml-community/utop">UTop</a></li>
<li><a href="https://stackoverflow.com">Stackoverflow</a></li>
<li>Setting up a developer environment (I was convinced by friends at college that 'real programmers' use Vim / Emacs on Arch Linux)</li>
</ul>
<h4>Categories of Programmers and Categories in Programming</h4>
<p>I spent some time going through library code to figure out how to actually use it. I could hack something together to work for Markdown files, and I slowly learned how to write more idiomatic OCaml (thanks to my mentor Cuihtlauac). As an imperative programmer, I was used to giving names to intermediate things, which wasn't really necessary with OCaml.</p>
<p>I learnt a bit about <a href="https://github.com/ocsigen/lwt"><code>Lwt</code></a> and came across the term <code>Monad</code>, which is, of course, as is widely known - a monoid in the category of endofunctors. (Thankfully there were much better explanations and documentation online).</p>
<p>Everything was going fine - I was slowly iterating on the code, making it incrementally better and adding more tests, until the first major rewrite. I was using an outdated version of a library!
That wasn't too painful, I knew what parsing code looked like already - but the structure of the document was now different.
Another library (<a href="https://github.com/aantron/hyper"><code>hyper</code></a>) had unfixed issues for over a year, so I swapped that out too.</p>
<p>I went back to my old habit of writing imperative OCaml (!) using <code>ref</code>s. They have their place, but can be avoided when it's possible. But this was important - it helped me really imbibe the idea that functions are first class, what functional code looks like, and how I can start thinking like a functional programmer. The humble looking <code>List.fold_left</code> was the key to my enlightenment.</p>
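<p>As a small illustration of that shift, here are two ways to count broken links in a list of HTTP status codes. Both functions are hypothetical examples written for this post, not code from Olinkcheck, and treating any status of 400 or above as broken is an assumption.</p>
<pre><code>(* Imperative style: a mutable counter updated in a loop. *)
let count_broken_ref statuses =
  let n = ref 0 in
  List.iter (fun s -&gt; if s &gt;= 400 then incr n) statuses;
  !n

(* Functional style: the accumulator is threaded through the fold. *)
let count_broken_fold statuses =
  List.fold_left (fun n s -&gt; if s &gt;= 400 then n + 1 else n) 0 statuses
</code></pre>
<p>Both return the same result; the fold simply makes the state explicit as a function argument instead of hiding it in a <code>ref</code>.</p>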
<p>Or so I thought. I hadn't met functors yet. It is, after all, just a mapping between two categories. (No, please.)
Again, Cuiht really broke it down to a point where I could start understanding what a functor in OCaml is, which eventually led me to discover the power of the OCaml module system.</p>
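<p>For readers who have not met them yet, an OCaml functor is a module parameterised by another module. The sketch below is a made-up example (the <code>CHECKER</code> signature and the <code>Report</code> and <code>Strict</code> modules do not come from Olinkcheck) that builds a reporting module from any module describing what counts as a broken link.</p>
<pre><code>module type CHECKER = sig
  val name : string
  val is_broken : int -&gt; bool
end

(* A functor: takes a CHECKER and returns a reporting module. *)
module Report (C : CHECKER) = struct
  let broken statuses = List.filter C.is_broken statuses

  let summary statuses =
    Printf.sprintf "%s: %d broken" C.name (List.length (broken statuses))
end

(* Instantiate the functor with one particular policy. *)
module Strict = Report (struct
  let name = "strict"
  let is_broken status = status &gt;= 400
end)
</code></pre>
<p>Swapping in a different <code>CHECKER</code> gives a new reporting module without touching the code inside <code>Report</code>.</p>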
<h4>Seeing it Work</h4>
<p>After some "hacky" fixes and regular expression magic (resulting from a lot of discussions with Sabine, because I thought I had hit a fundamental roadblock here and that it might be very hard to do the project (!)), I could get it to run as a GitHub CI action, which led to an automated <a href="https://github.com/ocaml/ocaml.org/pull/1354">pull request</a>. I could also integrate it into Voodoo, the package documentation generator, and it is now being tested in the staging pipeline.</p>
<h4>I've Had it All Wrong From the Beginning</h4>
<p>By this time I had read a lot of other people's code and learnt enough from Cuiht to realise, yet again, that my code was bad. The functional programmer doesn't rely on the name of the function (what does the function <code>v</code> do? Or <code>pp</code>?). The meaning is taken from the context and the <em>signature</em>. So I had functions that looked like</p>
<pre><code><span class="ocaml-keyword-other">val</span><span class="ocaml-source"> </span><span class="ocaml-source">do_this_thing</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source"> </span><span class="ocaml-source">a</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-source">b</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-source">c</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-source">d</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">
</span></code></pre>
<p>with no clue as to what those arguments mean. Someone reading the code would be forced to look into the source code to understand what that means. Now my target was to have a decent looking interface when someone said <code>#show Olinkcheck;;</code> on <code>utop</code>. That's how I used other libraries, so I wanted others to be able to use mine like that too.</p>
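<p>One way to make a signature speak for itself is to combine named types with labelled arguments. The module below is a hypothetical sketch of that idea and is not Olinkcheck's real interface; the names <code>Link_report</code>, <code>classify</code>, and <code>to_string</code> are assumptions chosen for illustration.</p>
<pre><code>module Link_report : sig
  type status = Ok_link | Broken of int

  (* Labels tell the reader what each argument means. *)
  val classify : http_code:int -&gt; status
  val to_string : url:string -&gt; status -&gt; string
end = struct
  type status = Ok_link | Broken of int

  let classify ~http_code =
    if http_code &gt;= 400 then Broken http_code else Ok_link

  let to_string ~url = function
    | Ok_link -&gt; url ^ ": ok"
    | Broken code -&gt; Printf.sprintf "%s: broken (HTTP %d)" url code
end
</code></pre>
<p>Someone running <code>#show Link_report;;</code> in <code>utop</code> would see the labels and type names, and could guess how to call the functions without opening the source.</p>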
<h3>Biggest Challenge</h3>
<p>My project was a <em>practical</em> problem, as opposed to a theoretical one like a data structure. So the challenges were also <em>practical</em>. Not everyone follows the same formatting while writing text-based files (let's first agree on tabs vs spaces?), and not all parsers are perfect. In the ideal world I could manipulate a syntax tree data structure which turns back into a string with the original formatting, webservers wouldn't care how many links I request from them, and there would be well-defined regular expressions to find URLs amongst other text, but alas, no. None of these things are true. Text-based data is convenient precisely because of its loose requirements, and webservers can't realistically be fine with a user asking them for 7000+ links in a short time.</p>
<h3>The Best Part</h3>
<p>The best part for me was easily the opportunity to learn from people who are much more experienced than I am and to see something written by me be actually used in the real world.</p>
<h2>Adithya: Domain-Safe Data Structures for Multicore OCaml</h2>
<h3>Background</h3>
<p>I am a final year CS student at <a href="https://www.nitk.ac.in">NITK Surathkal</a>. Before this internship, I had only done a little bit of functional programming in <a href="https://www.scala-lang.org">Scala</a>, so programming in OCaml was something very new to me. However, I was pretty excited to work on this because OCaml had only recently got Multicore support, and it was a niche area to explore.</p>
<p>I got to know about the internship from one of KC's tweets. How I got to know about KC and the work he does was a pretty random incident: I needed his help to contact another professor to discuss some of my previous research internship work in a related area.</p>
<p>The interview experience was amongst the best ones I've had, with very open-ended discussions and friendly interviewers.</p>
<h3>Goal of the Project</h3>
<p>I was a part of the Multicore applications team and was mentored by Carine. The goal of my project was to add lock-based data structures to the <a href="https://github.com/ocaml-multicore/saturn">Saturn</a> library that maintains parallelism-safe data structures for Multicore OCaml.</p>
<p>The first step was to create a bounded queue, which is based on a Michael-Scott queue. This type of queue has two locks, one for the head and one for the tail node. I also investigated fine-grained versus coarse-grained lists, doubly-linked lists, and finally a <a href="/blog/2023-08-07-kcas-building-a-lock-free-stm-for-ocaml-1-2/">lock-free priority queue</a> which was implemented on top of a lock-free skiplist.</p>
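<p>The two-lock idea can be sketched as follows, assuming OCaml 5, where <code>Mutex</code> and <code>Atomic</code> are part of the standard library. This is a simplified, unbounded version written for illustration and is not the code that went into Saturn; the type and function names are assumptions.</p>
<pre><code>type 'a node = { value : 'a option; next : 'a node option Atomic.t }

type 'a t = {
  mutable head : 'a node; (* always a dummy node; guarded by head_lock *)
  mutable tail : 'a node; (* last node; guarded by tail_lock *)
  head_lock : Mutex.t;
  tail_lock : Mutex.t;
}

let create () =
  let dummy = { value = None; next = Atomic.make None } in
  { head = dummy; tail = dummy;
    head_lock = Mutex.create (); tail_lock = Mutex.create () }

(* Producers only take the tail lock. *)
let push q v =
  let node = { value = Some v; next = Atomic.make None } in
  Mutex.lock q.tail_lock;
  Atomic.set q.tail.next (Some node);
  q.tail &lt;- node;
  Mutex.unlock q.tail_lock

(* Consumers only take the head lock; the popped node becomes the new dummy. *)
let pop q =
  Mutex.lock q.head_lock;
  let result =
    match Atomic.get q.head.next with
    | None -&gt; None
    | Some node -&gt; q.head &lt;- node; node.value
  in
  Mutex.unlock q.head_lock;
  result
</code></pre>
<p>Because <code>push</code> and <code>pop</code> use different locks, a producer and a consumer can usually make progress at the same time. A bounded variant would additionally need to track the queue's size.</p>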
<p>Towards the later part of the internship, I also worked on lock-free data structures.</p>
<h3>Journey</h3>
<p>Initially, I started off slow since I was just getting familiar with the OCaml environment and language features. My two main resources for learning OCaml were <a href="https://dev.realworldocaml.org/">Real World OCaml</a> and OCaml.org. Other than this, I spent a significant amount of time going through the book called <a href="https://www.sciencedirect.com/book/9780124159501/the-art-of-multiprocessor-programming">The Art of Multiprocessor Programming</a>, since that was the main reference point for my project. I also had to dive into some research papers cited in the book to get a better understanding of the implementation and some nitty-gritty details.</p>
<p>Over the course of the internship, I gained a lot of insights about minor details while programming for multicore systems, as well as OCaml language features that can have a significant impact on performance. Something that never struck me before was how much worse using structural equality (=) instead of physical equality (==) could be depending on the scenario.</p>
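<p>A small illustration of the difference (a sketch using only the standard library, not code from the project): structural equality has to walk both values, while physical equality only checks whether the two values are the same heap object. This matters in lock-free code because <code>Atomic.compare_and_set</code> compares with physical equality.</p>
<pre><code class="language-ml">(* Two separately allocated lists with identical contents. *)
let a = List.init 1_000_000 Fun.id
let b = List.init 1_000_000 Fun.id

let () =
  assert (a = b);        (* structural: traverses both lists, linear time *)
  assert (not (a == b)); (* physical: distinct allocations, constant time *)
  assert (a == a)        (* physical: the very same heap object *)
</code></pre>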
<p>Since I was interning on-site at the Paris office, it was very easy for me to clarify any doubts or difficulties I faced whenever required, as most people at Tarides have a very high level of expertise in OCaml and are really helpful. I often had to rewrite many functions or make major changes, but thanks to OCaml features such as static checking and type inference, it was pretty easy and relatively quick to make those modifications.</p>
<h3>Challenges</h3>
<p>The biggest challenge was debugging and reasoning about the performance of one implementation versus another. Since I was writing parallel programs, debugging was difficult because of the many edge-case scenarios that are hard to detect and can lead to deadlocks or errors in output. I remember sometimes spending an entire day tracking down a bug, but in the end it was really satisfying to fix it. Comparing different implementations and trying to find out whether any optimisations were possible was quite interesting and challenging.</p>
<h3>The Best Part</h3>
<p>Compared to my previous internships, Tarides was a unique experience since it is a pretty small company with a great culture working on some niche areas. There aren't many other places doing this kind of work. So if someone is interested in computer systems and programming languages, I would definitely recommend interning here. Getting the opportunity to work from the Paris office and visit Europe was definitely an unexpected yet pleasant surprise.</p>
<h2>Want to Strengthen Your OCaml Skills?</h2>
<p>If you're looking to learn more about functional programming in a supportive environment, you sound like an excellent candidate for our next round of internships! The next round is coming up early next year and we would be delighted if you would apply! Keep an eye on <a href="/careers/">our website</a> for more information or <a href="/contact/">contact us</a> here.</p>
]]></description><link>https://tarides.com/blog/2023-09-15-our-experience-at-tarides-projects-from-our-internships-in-2023</link><guid isPermaLink="false">https://tarides.com/blog/2023-09-15-our-experience-at-tarides-projects-from-our-internships-in-2023.html</guid><dc:creator><![CDATA[ Dipesh Kafle, Shreyas Krishnakumar, Adithya Chandraserry ]]></dc:creator><pubDate>Fri, 15 Sep 2023 00:00:00 GMT</pubDate></item><item><title><![CDATA[The State of the Art in Functional Programming: Tarides at ICFP 2023]]></title><description><![CDATA[<h2>ICFP 2023</h2>
<p>The 28th <a href="https://www.sigplan.org">ACM Sigplan</a> <a href="https://icfp23.sigplan.org">International Conference on Functional Programming</a> is taking place in Seattle as I’m typing. This is the largest international research conference on functional programming, and this year’s event features fascinating keynotes (including one from OCaml’s very own Anil Madhavapeddy!), deep dives on various topics like compilation and verification, tutorials, networking opportunities, and <a href="https://icfp23.sigplan.org/program/program-icfp-2023/Session-Timeline">workshops</a> on several functional programming languages.</p>
<p>Out of this veritable cornucopia of things to do and see, we’re of course most excited about the <a href="https://icfp23.sigplan.org/home/ocaml-2023#program">OCaml Workshop</a>. The OCaml Users and Developers Workshop brings together a diverse group of experts and enthusiasts, from academia and businesses using OCaml in practice, to present and discuss recent developments in the OCaml ecosystem. This year, that includes presentations on everything from MetaOCaml, to an effects-based I/O in OCaml 5, and a complete OCaml compiler for WebAssembly. You can keep up with the conference on <a href="https://www.youtube.com/@acmsigplan">ACM Sigplan’s YouTube channel</a> where talks are being live streamed.</p>
<p>At Tarides, our mission is to bring sustainable and secure software infrastructure to the world, and a powerful way to achieve this is by supporting forums that promote these goals. ICFP fosters the sharing of ideas, research, and implementation of sound functional programming principles, which is why Tarides is proud to be a silver sponsor of this year’s ICFP conference.</p>
<p>Several colleagues from Tarides are participating in the OCaml Workshop presenting their hard work and research on extending the language, type system, and tooling. In this post, I will give you an overview of each presentation from the Tarides team. Check out <a href="https://icfp23.sigplan.org/program/program-icfp-2023/?&amp;past=Show%20upcoming%20events%20only&amp;track=OCaml&amp;date=Sat%209%20Sep%202023">the OCaml Workshop program</a> if you would like to explore it on your own.</p>
<h2>Tarides at ICFP</h2>
<h3><a href="https://icfp23.sigplan.org/details/icfp-2023-icfp-keynotes/50/Programming-for-the-planet">ICFP Keynote - Programming for the Planet </a></h3>
<p>Anil Madhavapeddy, our partner at the University of Cambridge, held a morning keynote speech on the role of computer systems in analysing complex data from around the globe to aid conservation efforts. Anil argues that using functional programming can lead to systems that are more resilient, predictable, and reproducible. In his presentation, he outlines the benefits of using functional programming in planetary science, and how the cross-disciplinary research his team is doing is having a tangible impact on conservation projects.</p>
<p>For more information on how Anil is using functional programming to help the planet, you can visit the <a href="https://4c.cst.cam.ac.uk/">Cambridge Centre for Carbon Credits’s website</a>. To understand how OCaml and SpaceOS will become the new global standard for satellites, you can read our <a href="/blog/2023-07-31-ocaml-in-space-welcome-spaceos/">blog post on SpaceOS</a>.</p>
<h3><a href="https://icfp23.sigplan.org/details/ocaml-2023-papers/5/Eio-1-0-Effects-based-IO-for-OCaml-5">Eio 1.0 - Effects Based I/O for OCaml 5</a></h3>
<p>This talk introduces the concurrency library <a href="https://github.com/ocaml-multicore/eio">Eio</a> and the main features of the 1.0 release. After the release of OCaml 5, which brought support for effects and Multicore, there was demand for a new I/O library in OCaml that would unify the community around a single I/O API as well as introduce new modern features to OCaml’s I/O support.</p>
<p>The presentation outlines how Eio is structured, including how it uses effects so that operations don’t block the whole domain, and also highlights significant new features including modularity, integrations, and tracing. If you’re curious to know more about OCaml’s new concurrency library, check out <a href="https://icfp23.sigplan.org/details/ocaml-2023-papers/5/Eio-1-0-Effects-based-IO-for-OCaml-5">the presentation on Eio 1.0</a> on Saturday the 9th of September.</p>
<h3><a href="https://icfp23.sigplan.org/details/icfp-2023-tutorials/4/Porting-Lwt-applications-to-OCaml-5-and-Eio">Tutorial - Porting Lwt Applications to OCaml 5 and Eio</a></h3>
<p>Thomas Leonard and Jon Ludlam present a tutorial on porting Lwt applications to OCaml 5 and Eio. The tutorial shows users how to incrementally convert an existing Lwt application to Eio using the <code>Lwt_eio</code> compatibility package. Doing so will usually result in simpler code, better diagnostics, and better performance.</p>
<p>If you can’t attend the tutorial at ICFP, you can check out the <a href="https://github.com/ocaml-multicore/icfp-2023-eio-tutorial">instructions on GitHub</a> and follow the steps. Please let us know how well the tutorial works for you, and if you have any questions don’t hesitate to ask!</p>
<h3><a href="https://icfp23.sigplan.org/details/ocaml-2023-papers/12/Runtime-Detection-of-Data-Races-in-OCaml-with-ThreadSanitizer">Runtime Detection of Data Races in OCaml with ThreadSanitizer</a></h3>
<p>This presentation from Olivier Nicole and Fabrice Buoro focuses on <a href="https://github.com/ocaml-multicore/ocaml-tsan">ThreadSanitizer</a> (TSan) and its ability to detect data races at runtime. The new possibilities that parallel programming brings to OCaml also bring new kinds of bugs. Amongst these bugs, data races present a real danger as they are difficult to detect and can lead to very unexpected results.</p>
<p>That’s where TSan comes in! TSan is an open source library and program instrumentation pass to reliably detect data races at runtime. The presentation covers example usages of TSan, a look into how it works, interesting insights like challenges and limitations of the project, as well as related work including static and runtime detection. There will also be a demo of how to use it in your own code. If you want to know more, have a look at the <a href="https://icfp23.sigplan.org/details/ocaml-2023-papers/12/Runtime-Detection-of-Data-Races-in-OCaml-with-ThreadSanitizer">talk on TSan at ICFP</a>.</p>
<h3><a href="https://icfp23.sigplan.org/details/ocaml-2023-papers/6/Building-a-lock-free-STM-for-OCaml">Building a Lock-Free STM for OCaml</a></h3>
<p>This talk describes the process by which the <code>kcas</code> library, first developed to provide a primitive atomic lock-free multi-word compare-and-set operation, was recently turned into a proper lock-free software transactional memory implementation. By using transactional memory as an abstraction, Kcas offers developers both a relatively familiar programming model and composability.</p>
<p>The presentation details how Kcas composes transactions, its use cases and any trade-offs, as well as the <a href="/blog/2023-08-10-kcas-building-a-lock-free-stm-for-ocaml-2-2/">process behind how its design has evolved</a> to its current state. Discover the full details by listening to the <a href="https://icfp23.sigplan.org/details/ocaml-2023-papers/6/Building-a-lock-free-STM-for-OCaml">talk on Kcas</a>, taking place on Saturday the 9th at the OCaml Workshop.</p>
<h3><a href="https://icfp23.sigplan.org/details/ocaml-2023-papers/15/State-of-the-OCaml-Platform-2023">State of the OCaml Platform in 2023</a></h3>
<p>The final presentation of the workshop provides an update on the <a href="https://ocaml.org/docs/platform">OCaml Platform</a>, including progress over the past few years and a roadmap for future work. The OCaml Platform has grown from one tool, opam, to a complete toolchain of reliable tools for OCaml developers.</p>
<p>The talk covers the main milestones of the past three years, including the release of <a href="https://github.com/ocaml/odoc"><code>odoc</code></a> and the widespread adoption of <a href="https://github.com/ocaml/dune">Dune</a>, before looking at the <a href="https://github.com/tarides/ocaml-platform-roadmap">goals for the future</a> which include seamless editor integration and filling in gaps in the OCaml development workflows. Be sure to check out the <a href="https://icfp23.sigplan.org/details/ocaml-2023-papers/15/State-of-the-OCaml-Platform-2023">presentation on the OCaml Platform</a> for more context and information.</p>
<h2>We’d Love to Hear from You!</h2>
<p>If you’re at ICFP, please come and say hi; we’d love to chat about everything OCaml with you! The OCaml Workshop is located in the <a href="https://icfp23.sigplan.org/room/icfp-2023-venue-grand-crescent">Grand Crescent</a>, and the tutorial on Eio is at <a href="https://icfp23.sigplan.org/room/icfp-2023-venue-st-helens">St Helens</a>. The talks are available on <a href="https://www.youtube.com/@acmsigplan">ACM Sigplan’s YouTube channel</a> for remote viewing.</p>
<p>You can always <a href="https://bsky.app/profile/tarides.com">message us on Bluesky</a>, or chat with the larger OCaml community on <a href="https://discuss.ocaml.org/">Discuss</a>. Look out for more content on <a href="/">Tarides.com</a> coming your way soon and sign up to our <a href="/contact/">newsletter</a> for up-to-date content - until next time!</p>
]]></description><link>https://tarides.com/blog/2023-09-08-the-state-of-the-art-in-functional-programming-tarides-at-icfp-2023</link><guid isPermaLink="false">https://tarides.com/blog/2023-09-08-the-state-of-the-art-in-functional-programming-tarides-at-icfp-2023.html</guid><dc:creator><![CDATA[ Isabella Leandersson ]]></dc:creator><pubDate>Fri, 08 Sep 2023 00:00:00 GMT</pubDate></item><item><title><![CDATA[Your Programming Language and its Impact on the Cybersecurity of Your Application]]></title><description><![CDATA[<p><strong>Did you know that the programming language you use can have a huge impact on the cybersecurity of your applications?</strong></p>
<p>In a 2022 meeting of the Cybersecurity Advisory Committee, the Cybersecurity and Infrastructure Security Agency’s Senior Technical Advisor Bob Lord commented that: <a href="https://www.nextgov.com/cybersecurity/2022/12/federal-government-moving-memory-safety-cybersecurity/381275/">“About two-thirds of the vulnerabilities that we see year after year, decade after decade”</a> are related to memory management issues.</p>
<h2>Memory Unsafe Languages</h2>
<p>One can argue that cyber vulnerabilities are simply a fact of life in the modern online world, which is why every layer of the software stack (applications, libraries, middleware, operating systems, tools, etc.) needs robust cybersecurity protections. While this argument is not incorrect, there are still significant differences in the intrinsic security levels of different programming languages.</p>
<p>Computing devices today have access to huge amounts of memory in order to store, process, and retrieve information. Programming languages are used to describe the operations that a device needs to perform. The computer then interprets these operations to access and manipulate memory (of course, programming languages do many other things as well).</p>
<p>Some widely used languages, such as C and C++, allow the developer to directly manipulate hardware memory. However, a mistake in code that manages memory by hand, such as reading or writing past the end of a buffer, can let attackers gain access to hardware, steal data, deny access to the user, and perform other malicious activities. Hence, these programming languages are termed “memory-unsafe” languages.</p>
<h2>Impact of Memory Exploits</h2>
<p>Around 60-70% of cyber attacks (attacks on applications, the operating system, etc.) are due to the use of these memory-unsafe programming languages.</p>
<p>This remains true for any computing platform. Memory issues represented around <a href="https://github.com/google/sanitizers/blob/master/hwaddress-sanitizer/MTE-iSecCon-2018.pdf">65% of critical security risks in the Chrome browser and Android operating system</a>. Similarly, memory unsafety issues also represented around <a href="https://alexgaynor.net/2020/may/27/science-on-memory-unsafety-and-security/">65% of total reported issues for the Linux kernel in 2019</a>. The Chromium web browser project has also reported that <a href="https://www.chromium.org/Home/chromium-security/memory-safety/">70% of high-severity security bugs</a> were related to memory safety. In iOS 12, <a href="https://support.apple.com/en-us/HT209192">66.3% of vulnerabilities</a> were related to handling memory.</p>
<h2>The Solution: Memory Safety</h2>
<p>All this raises the question: is there a solution that can eliminate risks that exist due to a programming language’s design, or is the only solution to use several layers of cybersecurity protection (code hardening, firewalls, etc.)?</p>
<p>Many cybersecurity and technology experts recommend using a “memory-safe” programming language, where a number of validation checks are performed during the translation from the human-readable programming language to the format that the machine reads. Many such programming languages exist, giving the developers several choices, for example: Go, Java, Ruby, Swift, and OCaml are all memory safe.</p>
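<p>As a small illustration of what such checks look like in practice, here is a sketch in OCaml. It assumes the default compiler settings (that is, without the <code>-unsafe</code> flag), and the names <code>buffer</code> and <code>read</code> are only for this example. Some checks, such as type checking, happen at compile time; others, such as the array bounds check shown here, are inserted by the compiler and run when the program executes. An out-of-bounds access raises an exception instead of silently reading neighbouring memory.</p>
<pre><code class="language-ml">(* A four-element buffer; valid indices are 0 to 3. *)
let buffer = Array.make 4 0

(* Reading is bounds-checked: an invalid index raises Invalid_argument
   rather than exposing whatever happens to sit next to the array. *)
let read index =
  match buffer.(index) with
  | value -&gt; Some value
  | exception Invalid_argument _ -&gt; None

let () =
  assert (read 2 = Some 0);
  assert (read 10 = None)
</code></pre>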
<p>Does this mean that memory-safe languages are protected from all cyber attacks? No, but 60-70% of attacks are <strong>by design</strong> not permitted by the language. That is why most memory-safe languages also offer crypto libraries, formal verification, and more in order to ensure the best possible cyber protection in addition to the strong protection the language itself provides. Of course, you also need to follow industry best practices for physical security, access controls, firewalls, data protection techniques, and other defence mechanisms for people-centric security.</p>
<p>If you already work using memory-safe programming languages, you are on the right track. If you don’t, we would be glad to tell you why companies like Jane Street, Tezos, Microsoft, Tarides, and Meta use OCaml to provide not only the best possible cybersecurity but also exceptional coding flexibility.</p>
<p>Don’t hesitate to contact us via sales@tarides.com for more information or with any questions you may have.</p>
<p><strong>References</strong></p>
<ol>
<li>
<p>Report: Future of Memory Safety. <a href="https://advocacy.consumerreports.org/research/report-future-of-memory-safety/">https://advocacy.consumerreports.org/research/report-future-of-memory-safety/</a></p>
</li>
<li>
<p>NSA releases guidance on how to protect against software memory safety issues. <a href="https://www.nsa.gov/Press-Room/News-Highlights/Article/Article/3215760/nsa-releases-guidance-on-how-to-protect-against-software-memory-safety-issues/">https://www.nsa.gov/Press-Room/News-Highlights/Article/Article/3215760/nsa-releases-guidance-on-how-to-protect-against-software-memory-safety-issues/</a></p>
</li>
<li>
<p>The Federal Government is moving on memory safety for Cybersecurity. <a href="https://www.nextgov.com/cybersecurity/2022/12/federal-government-moving-memory-safety-cybersecurity/381275/">https://www.nextgov.com/cybersecurity/2022/12/federal-government-moving-memory-safety-cybersecurity/381275/</a></p>
</li>
<li>
<p>Memory Safety Convening Report 1.1. <a href="https://advocacy.consumerreports.org/wp-content/uploads/2023/01/Memory-Safety-Convening-Report-1-1.pdf">https://advocacy.consumerreports.org/wp-content/uploads/2023/01/Memory-Safety-Convening-Report-1-1.pdf</a></p>
</li>
<li>
<p>Chromium project memory safety. <a href="https://www.chromium.org/Home/chromium-security/memory-safety/">https://www.chromium.org/Home/chromium-security/memory-safety/</a></p>
</li>
</ol>
]]></description><link>https://tarides.com/blog/2023-08-17-your-programming-language-and-its-impact-on-the-cybersecurity-of-your-application</link><guid isPermaLink="false">https://tarides.com/blog/2023-08-17-your-programming-language-and-its-impact-on-the-cybersecurity-of-your-application.html</guid><dc:creator><![CDATA[ Shakthi Kannan ]]></dc:creator><pubDate>Thu, 17 Aug 2023 00:00:00 GMT</pubDate></item><item><title><![CDATA[Kcas: Building a Lock-Free STM for OCaml (2/2)]]></title><description><![CDATA[<p>This is the follow-up post continuing the discussion of the development of Kcas.
<a href="/blog/2023-08-07-kcas-building-a-lock-free-stm-for-ocaml-1-2/">Part 1</a> discussed the development done on the library to improve
performance and add a transaction mechanism that makes it easy to compose
atomic operations without really adding more expressive power.</p>
<p>In this part we'll discuss adding a fundamentally new feature to Kcas that makes it into a proper STM implementation.</p>

<h3>Get Busy Waiting</h3>
<p>If shared memory locations and transactions over them essentially replace
traditional mutexes, then one might ask what replaces condition variables. It is
very common in concurrent programming for threads to not just want to avoid
stepping on each other's toes, or the I of
<a href="https://en.wikipedia.org/wiki/ACID">ACID</a>, but to actually prefer to follow in each other's
footsteps. Or, to put it more technically, wait for events triggered
or data provided by other threads.</p>
<p>Following the approach introduced in the paper
<a href="https://www.microsoft.com/en-us/research/wp-content/uploads/2005/01/2005-ppopp-composable.pdf?from=https://research.microsoft.com/~simonpj/papers/stm/stm.pdf">Composable Memory Transactions</a>,
I implemented a retry mechanism that allows a transaction to essentially wait on
arbitrary conditions over the state of shared memory locations. A transaction
may simply raise an exception,
<a href="https://ocaml-multicore.github.io/kcas/doc/kcas/Kcas/Retry/index.html#exception-Later"><code>Retry.Later</code></a>,
to signal to the commit mechanism that a transaction should only be retried
after another thread has made changes to the shared memory locations examined by
the transaction.</p>
<p>A trivial example would be to convert a non-blocking take on a queue to a
blocking operation:</p>
<pre><code><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">take_blocking</span><span class="ocaml-source"> </span><span class="ocaml-source">~</span><span class="ocaml-source">xt</span><span class="ocaml-source"> </span><span class="ocaml-source">queue</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword-other">match</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Queue</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-constant-language-capital-identifier">Xt</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">take_opt</span><span class="ocaml-source"> ~</span><span class="ocaml-source">xt</span><span class="ocaml-source"> </span><span class="ocaml-source">queue</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">with</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword-other">|</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">None</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Retry</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">later</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-unit">()</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword-other">|</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Some</span><span class="ocaml-source"> </span><span class="ocaml-source">elem</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-source">elem</span><span class="ocaml-source">
</span></code></pre>
<p>Of course, the
<a href="https://ocaml-multicore.github.io/kcas/doc/kcas_data/Kcas_data/Queue/index.html"><code>Queue</code></a>
provided by <strong>kcas_data</strong> already has a blocking take which essentially results in the above
implementation.</p>
<p>Perhaps the main technical challenge in implementing a retry mechanism in
multicore OCaml is that it should perform blocking in a scheduler friendly
manner such that other fibers, as in
<a href="https://github.com/ocaml-multicore/eio">Eio</a>, or tasks, as in
<a href="https://github.com/ocaml-multicore/domainslib">Domainslib</a>, are not prevented
from running on the domain while one of them is blocked. The difficulty with
that is that each scheduler potentially has its own way for suspending a fiber
or waiting for a task.</p>
<p>To solve this problem such that we can provide an updated and convenient blocking experience, we introduced a library that provides a
<a href="https://github.com/ocaml-multicore/domain-local-await/">domain-local-await</a>
mechanism, whose interface is inspired by Arthur Wendling's
<a href="https://github.com/ocaml-multicore/saturn/pull/68">proposal</a> for the Saturn
library. The idea is simple. Schedulers like Eio and Domainslib install their
own implementation of the blocking mechanism, stored in a domain local variable,
and then libraries like Kcas can obtain the mechanism to block in a scheduler
friendly manner. Blocking abstractions then work not just on one
specific scheduler, but also
<a href="https://discuss.ocaml.org/t/interaction-between-eio-and-domainslib-unhandled-exceptions/11971/10">across different schedulers</a>.</p>
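<p>The idea can be sketched with nothing but the OCaml 5 standard library. The code below is a simplified model, not the actual interface of the domain-local-await library: the names <code>trigger</code>, <code>default_prepare</code>, <code>install</code>, and <code>prepare_for_await</code> are chosen for illustration. A domain-local variable holds a function that produces an await/release pair. By default it blocks the whole domain using a mutex and a condition variable, and a scheduler could install its own version that suspends only the current fiber.</p>
<pre><code class="language-ml">(* Illustrative model only; names are assumptions, not the library's API. *)
type trigger = { await : unit -&gt; unit; release : unit -&gt; unit }

(* Fallback mechanism: block the calling domain until released. *)
let default_prepare () =
  let mutex = Mutex.create () and condition = Condition.create () in
  let released = ref false in
  let await () =
    Mutex.lock mutex;
    while not !released do Condition.wait condition mutex done;
    Mutex.unlock mutex
  and release () =
    Mutex.lock mutex;
    released := true;
    Mutex.unlock mutex;
    Condition.broadcast condition
  in
  { await; release }

(* Each domain remembers the mechanism installed by its scheduler. *)
let key : (unit -&gt; trigger) Domain.DLS.key =
  Domain.DLS.new_key (fun () -&gt; default_prepare)

(* A scheduler would call this once on each domain it runs on. *)
let install prepare = Domain.DLS.set key prepare

(* A library calls this to block in whatever way the scheduler prefers. *)
let prepare_for_await () = (Domain.DLS.get key) ()

let () =
  let t = prepare_for_await () in
  let other = Domain.spawn (fun () -&gt; t.release ()) in
  t.await (); (* returns once the other domain has called release *)
  Domain.join other
</code></pre>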
<p>Another challenge is the desire to support both conjunctive and disjunctive
combinations of transactions. As explained in the paper
<a href="https://www.microsoft.com/en-us/research/wp-content/uploads/2005/01/2005-ppopp-composable.pdf?from=https://research.microsoft.com/~simonpj/papers/stm/stm.pdf">Composable Memory Transactions</a>,
this in turn requires support for nested transactions. Consider the following attempt at a
conditional blocking take from a queue:</p>
<pre><code><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">non_nestable_take_if</span><span class="ocaml-source"> </span><span class="ocaml-source">~</span><span class="ocaml-source">xt</span><span class="ocaml-source"> </span><span class="ocaml-source">predicate</span><span class="ocaml-source"> </span><span class="ocaml-source">queue</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">x</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Queue</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-constant-language-capital-identifier">Xt</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">take_blocking</span><span class="ocaml-source"> ~</span><span class="ocaml-source">xt</span><span class="ocaml-source"> </span><span class="ocaml-source">queue</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">in</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword-other">if</span><span class="ocaml-source"> </span><span class="ocaml-source">not</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-source">predicate</span><span class="ocaml-source"> </span><span class="ocaml-source">x</span><span class="ocaml-source">)</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">then</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-constant-language-capital-identifier">Retry</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">later</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-unit">()</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-source">x</span><span class="ocaml-source">
</span></code></pre>
<p>If one were to try to use the above to take an element from the
<a href="https://ocaml-multicore.github.io/kcas/doc/kcas/Kcas/Xt/index.html#val-first"><code>first</code></a>
of two queues</p>
<pre><code class="language-ml">Xt.first [
  non_nestable_take_if predicate queue_a;
  non_nestable_take_if predicate queue_b;
]
</code></pre>
<p>one would run into the following problem: while only a value that passes the
predicate would be returned, an element might be taken from both queues.</p>
<p>To avoid this problem, we need a way to roll back changes recorded by a
transaction attempt. The way Kcas supports this is via an explicit scoping
mechanism. Here is a working (nestable) version of conditional blocking take:</p>
<pre><code><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">take_if</span><span class="ocaml-source"> </span><span class="ocaml-source">~</span><span class="ocaml-source">xt</span><span class="ocaml-source"> </span><span class="ocaml-source">predicate</span><span class="ocaml-source"> </span><span class="ocaml-source">queue</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">snap</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Xt</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">snapshot</span><span class="ocaml-source"> ~</span><span class="ocaml-source">xt</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">in</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">x</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Queue</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-constant-language-capital-identifier">Xt</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">take_blocking</span><span class="ocaml-source"> ~</span><span class="ocaml-source">xt</span><span class="ocaml-source"> </span><span class="ocaml-source">queue</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">in</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword-other">if</span><span class="ocaml-source"> </span><span class="ocaml-source">not</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-source">predicate</span><span class="ocaml-source"> </span><span class="ocaml-source">x</span><span class="ocaml-source">)</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">then</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-constant-language-capital-identifier">Retry</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">later</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-constant-language-capital-identifier">Xt</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">rollback</span><span class="ocaml-source"> ~</span><span class="ocaml-source">xt</span><span class="ocaml-source"> </span><span class="ocaml-source">snap</span><span class="ocaml-source">)</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-source">x</span><span class="ocaml-source">
</span></code></pre>
<p>First a
<a href="https://ocaml-multicore.github.io/kcas/doc/kcas/Kcas/Xt/index.html#val-snapshot"><code>snapshot</code></a>
of the transaction log is taken and then, in case the predicate is not
satisfied, a
<a href="https://ocaml-multicore.github.io/kcas/doc/kcas/Kcas/Xt/index.html#val-rollback"><code>rollback</code></a>
to the snapshot is performed before signaling a retry. The obvious disadvantage of
this kind of explicit approach is that it requires more care from the
programmer. The advantage is that it allows the programmer to explicitly scope
nested transactions and perform rollbacks only when necessary and in a more
fine-tuned manner, which can allow for better performance.</p>
<p>With properly nestable transactions one can express both conjunctive and
disjunctive compositions of conditional transactions.</p>
<p>As an aside, having talked about the splay tree a few times in my previous post, I should
mention that the implementation of the rollback operation using the splay tree
also worked out surprisingly nicely. In the general case, a rollback may have an
effect on all accesses to shared memory locations recorded in a transaction log.
This means that supporting rollback seems to require, in the worst case, time
linear in the number of locations accessed, no matter how
transactions might be implemented. A single operation on a splay tree may
already take linear time, but it is also possible to take advantage of the tree
structure and sharing of the immutable spine of splay trees and stop early as
soon as the snapshot and the log being rolled back are the same.</p>
<h3>Will They Come</h3>
<p>Blocking or retrying a transaction indefinitely is often not acceptable. The
transaction mechanism with blocking is actually already powerful enough to
support timeouts, because a transaction will be retried after any location
accessed by the transaction has been modified. So, to have timeouts, one could
create a location, make it so that it is changed when the timeout expires, and
read that location in the transaction to determine whether the timeout has
expired.</p>
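<p>A minimal sketch of that manual approach, assuming the <code>kcas</code> and <code>kcas_data</code> packages are available. The helper name <code>take_unless_expired</code> and the <code>expired</code> flag are hypothetical and not part of the library, and the timer that sets the flag is left out:</p>
<pre><code>open Kcas
open Kcas_data

(* Hypothetical helper: [expired] is an ordinary [bool Loc.t] that some
   timer is assumed to set to [true] once the timeout has elapsed. *)
let take_unless_expired ~xt expired queue =
  if Xt.get ~xt expired then None
  else Some (Queue.Xt.take_blocking ~xt queue)
</code></pre>
<p>Because the transaction reads <code>expired</code>, an attempt that is blocked on an empty queue is retried when the flag changes, at which point it returns <code>None</code> instead of waiting any longer.</p>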
<p>Creating, checking, and also cancelling timeouts manually can be a lot of work.
For this reason Kcas was also extended with direct support for timeouts. To
perform a transaction with a timeout, one can simply specify a
<code>timeoutf</code> in seconds:</p>
<pre><code><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">try_take_in</span><span class="ocaml-source"> </span><span class="ocaml-source">~</span><span class="ocaml-source">seconds</span><span class="ocaml-source"> </span><span class="ocaml-source">queue</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-constant-language-capital-identifier">Xt</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">commit</span><span class="ocaml-source"> ~</span><span class="ocaml-source">timeoutf</span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source">seconds</span><span class="ocaml-source"> </span><span class="ocaml-source">{</span><span class="ocaml-source"> </span><span class="ocaml-source">tx</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Queue</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-constant-language-capital-identifier">Xt</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">take_blocking</span><span class="ocaml-source"> </span><span class="ocaml-source">queue</span><span class="ocaml-source"> </span><span class="ocaml-source">}</span><span class="ocaml-source">
</span></code></pre>
<p>Internally Kcas uses the
<a href="https://github.com/ocaml-multicore/domain-local-timeout">domain-local-timeout</a>
library for timeouts. The OCaml standard library doesn't directly provide a
timeout mechanism, but it is a typical service provided by concurrent
schedulers. Just like with the previously mentioned domain local <em>await</em>, the
idea with domain local <em>timeout</em> is to allow libraries like Kcas to tap into the
native mechanism of whatever scheduler is currently in use and to do so
conveniently without pervasive parameterisation. More generally this should
allow libraries like Kcas to be scheduler agnostic and help to
<a href="http://rgrinberg.com/posts/abandoning-async/">avoid duplication of effort</a>.</p>
<h3>Hollow Man</h3>
<p>Let's recall the features of Kcas transactions briefly.</p>
<p>First of all, passing the transaction <code>~xt</code> through the computation allows
<em>sequential composition</em> of transactions:</p>
<pre><code><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">bind</span><span class="ocaml-source"> </span><span class="ocaml-source">~</span><span class="ocaml-source">xt</span><span class="ocaml-source"> </span><span class="ocaml-source">a</span><span class="ocaml-source"> </span><span class="ocaml-source">b</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">x</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-source">a</span><span class="ocaml-source"> ~</span><span class="ocaml-source">xt</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">in</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-source">b</span><span class="ocaml-source"> ~</span><span class="ocaml-source">xt</span><span class="ocaml-source"> </span><span class="ocaml-source">x</span><span class="ocaml-source">
</span></code></pre>
<p>This also gives <em>conjunctive composition</em> as a trivial consequence:</p>
<pre><code><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">pair</span><span class="ocaml-source"> </span><span class="ocaml-source">~</span><span class="ocaml-source">xt</span><span class="ocaml-source"> </span><span class="ocaml-source">a</span><span class="ocaml-source"> </span><span class="ocaml-source">b</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-source">(</span><span class="ocaml-source">a</span><span class="ocaml-source"> ~</span><span class="ocaml-source">xt</span><span class="ocaml-keyword-other-ocaml punctuation-comma punctuation-separator">,</span><span class="ocaml-source"> </span><span class="ocaml-source">b</span><span class="ocaml-source"> ~</span><span class="ocaml-source">xt</span><span class="ocaml-source">)</span><span class="ocaml-source">
</span></code></pre>
<p>Nesting, via
<a href="https://ocaml-multicore.github.io/kcas/doc/kcas/Kcas/Xt/index.html#val-snapshot"><code>snapshot</code></a>
and
<a href="https://ocaml-multicore.github.io/kcas/doc/kcas/Kcas/Xt/index.html#val-rollback"><code>rollback</code></a>,
allows <em>conditional composition</em>:</p>
<pre><code><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">if_else</span><span class="ocaml-source"> </span><span class="ocaml-source">~</span><span class="ocaml-source">xt</span><span class="ocaml-source"> </span><span class="ocaml-source">predicate</span><span class="ocaml-source"> </span><span class="ocaml-source">a</span><span class="ocaml-source"> </span><span class="ocaml-source">b</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">snap</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Xt</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">snapshot</span><span class="ocaml-source"> ~</span><span class="ocaml-source">xt</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">in</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">x</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-source">a</span><span class="ocaml-source"> ~</span><span class="ocaml-source">xt</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">in</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword-other">if</span><span class="ocaml-source"> </span><span class="ocaml-source">predicate</span><span class="ocaml-source"> </span><span class="ocaml-source">x</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">then</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-source">x</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword-other">else</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">begin</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-constant-language-capital-identifier">Xt</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">rollback</span><span class="ocaml-source"> ~</span><span class="ocaml-source">xt</span><span class="ocaml-source"> </span><span class="ocaml-source">snap</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-source">b</span><span class="ocaml-source"> ~</span><span class="ocaml-source">xt</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword-other">end</span><span class="ocaml-source">
</span></code></pre>
<p>Nesting combined with blocking, via the
<a href="https://ocaml-multicore.github.io/kcas/doc/kcas/Kcas/Retry/index.html#exception-Later"><code>Retry.Later</code></a>
exception, allows <em>disjunctive composition</em></p>
<pre><code><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">or_else</span><span class="ocaml-source"> </span><span class="ocaml-source">~</span><span class="ocaml-source">xt</span><span class="ocaml-source"> </span><span class="ocaml-source">a</span><span class="ocaml-source"> </span><span class="ocaml-source">b</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">snap</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Xt</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">snapshot</span><span class="ocaml-source"> ~</span><span class="ocaml-source">xt</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">in</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword-other">match</span><span class="ocaml-source"> </span><span class="ocaml-source">a</span><span class="ocaml-source"> ~</span><span class="ocaml-source">xt</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">with</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword-other">|</span><span class="ocaml-source"> </span><span class="ocaml-source">x</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-source">x</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword-other">|</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">exception</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Retry</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-constant-language-capital-identifier">Later</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-constant-language-capital-identifier">Xt</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">rollback</span><span class="ocaml-source"> ~</span><span class="ocaml-source">xt</span><span class="ocaml-source"> </span><span class="ocaml-source">snap</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-source">b</span><span class="ocaml-source"> ~</span><span class="ocaml-source">xt</span><span class="ocaml-source">
</span></code></pre>
<p>of blocking transactions, which is also supported via the
<a href="https://ocaml-multicore.github.io/kcas/doc/kcas/Kcas/Xt/index.html#val-first"><code>first</code></a>
combinator.</p>
<h3>What is Missing?</h3>
<blockquote>
<p>The limits of my language mean the limits of my world. — Ludwig
Wittgenstein</p>
</blockquote>
<p>The main limitation of transactions is that they are invisible to each other. A
transaction does not modify any shared memory locations until it commits and,
once it does, the modifications appear atomic to other transactions and outside
observers.</p>
<p>The mutual invisibility means that
<a href="https://en.wikipedia.org/wiki/Rendezvous_(Plan_9)">rendezvous</a> between two
(or more) threads cannot be expressed as a pair of composable transactions. For
example, it is not possible to implement synchronous message passing as can be
found e.g. in
<a href="https://people.cs.uchicago.edu/~jhr/papers/cml.html">Concurrent ML</a>,
<a href="https://go.dev/">Go</a>, and various other languages and libraries, including zero
capacity Eio
<a href="https://ocaml-multicore.github.io/eio/eio/Eio/Stream/index.html#val-create"><code>Stream</code></a>s,
as simple transactions with a signature such as follows:</p>
<pre><code><span class="ocaml-keyword-other">module</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">type</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Channel</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">sig</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword">type</span><span class="ocaml-source"> </span><span class="ocaml-source">'a </span><span class="ocaml-entity-name-function-binding">t</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword-other-ocaml">module</span><span class="ocaml-source"> Xt : </span><span class="ocaml-keyword-other-ocaml">sig</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-keyword">val</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">give</span><span class="ocaml-source"> : xt:'x Xt.t -&gt; 'a t -&gt; 'a -&gt; unit
</span><span class="ocaml-source">    </span><span class="ocaml-keyword">val</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">take</span><span class="ocaml-source"> : xt:'x Xt.t -&gt; 'a t -&gt; 'a
</span><span class="ocaml-source">  </span><span class="ocaml-keyword-other">end</span><span class="ocaml-source">
</span><span class="ocaml-keyword-other">end</span><span class="ocaml-source">
</span></code></pre>
<p>Languages such as Concurrent ML and Go allow disjunctive composition of such
synchronous message passing operations and some other libraries even allow
conjunctive, e.g. <a href="https://twistedsquare.com/CHP.pdf">CHP</a>, or even sequential
composition, e.g.
<a href="https://www.cs.cornell.edu/people/fluet/research/tx-events/ICFP06/icfp06.pdf">TE</a>
and <a href="https://aturon.github.io/academic/reagents.pdf">Reagents</a>, of such message
passing operations.</p>
<p>Although the above <code>Channel</code> signature is unimplementable, it does not mean that
one could not implement a non-compositional <code>Channel</code></p>
<pre><code><span class="ocaml-keyword-other">module</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">type</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Channel</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">sig</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword">type</span><span class="ocaml-source"> </span><span class="ocaml-source">'a </span><span class="ocaml-entity-name-function-binding">t</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword">val</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">give</span><span class="ocaml-source"> : 'a t -&gt; 'a -&gt; unit
</span><span class="ocaml-source">  </span><span class="ocaml-keyword">val</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">take</span><span class="ocaml-source"> : 'a t -&gt; 'a
</span><span class="ocaml-keyword-other">end</span><span class="ocaml-source">
</span></code></pre>
<p>or implement a compositional message passing model that allows such operations
to be composed. Indeed, both the <a href="https://twistedsquare.com/CHP.pdf">CHP</a> and
<a href="https://www.cs.cornell.edu/people/fluet/research/tx-events/ICFP06/icfp06.pdf">TE</a>
libraries were implemented on top of Software Transactional Memory with the same
fundamental invisibility of transactions. In other words, it is possible to
build a new composition mechanism, distinct from transactions, by using
transactions. Allowing such synchronisation between threads requires committing
multiple transactions.</p>
<h3>Torn Reads</h3>
<p>The k-CAS-n-CMP algorithm underlying Kcas ensures that it is not possible to
read uncommitted changes to shared memory locations and that an operation can
only commit successfully after all of the accesses taken together have been
atomic, i.e. strictly serialisable or both
<a href="https://en.wikipedia.org/wiki/Linearizability">linearisable</a> and
<a href="https://en.wikipedia.org/wiki/Serializability">serialisable</a> in database
terminology. These are very strong guarantees and make it much easier to
implement correct concurrent algorithms.</p>
<p>Unfortunately, the k-CAS-n-CMP algorithm does not prevent one specific
concurrency anomaly. When a transaction reads multiple locations, it is possible
for the transaction to observe an inconsistent state when other transactions commit
changes between reads of different locations. This is traditionally called <em>read
skew</em> in database terminology. Having observed such an inconsistent state, a
Kcas transaction cannot succeed and must be retried.</p>
<p>Even though a transaction must retry after having observed read skew, unless
taken into account, read skew can still cause serious problems. Consider, for
example, the following transaction:</p>
<pre><code><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">unsafe_subscript</span><span class="ocaml-source"> </span><span class="ocaml-source">~</span><span class="ocaml-source">xt</span><span class="ocaml-source"> </span><span class="ocaml-source">array</span><span class="ocaml-source"> </span><span class="ocaml-source">index</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">a</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Xt</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">get</span><span class="ocaml-source"> ~</span><span class="ocaml-source">xt</span><span class="ocaml-source"> </span><span class="ocaml-source">array</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">in</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">i</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Xt</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">get</span><span class="ocaml-source"> ~</span><span class="ocaml-source">xt</span><span class="ocaml-source"> </span><span class="ocaml-source">index</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">in</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-source">a</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">(</span><span class="ocaml-source">i</span><span class="ocaml-source">)</span><span class="ocaml-source">
</span></code></pre>
<p>The assumption is that the <code>array</code> and <code>index</code> locations are always updated
atomically such that the subscript operation should be safe. Unfortunately, due
to read skew, the array and index might not match and the subscript operation
could result in an "index out of bounds" exception.</p>
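<p>The failure can be replayed deterministically with plain references standing in for the shared locations. This is only a sequential model of one unlucky interleaving, not Kcas code, and the values are made up for illustration:</p>
<pre><code>(* Plain refs standing in for the shared [array] and [index] locations. *)
let array = ref [| 10; 20; 30 |]
let index = ref 2

(* The reader gets the array first ... *)
let a = !array

(* ... then a writer replaces both locations in one step ... *)
let () =
  array := [| 1; 2; 3; 4; 5 |];
  index := 4

(* ... and only then does the reader get the index. *)
let i = !index

(* [a] is the old three element array, but [i] belongs to the new one. *)
let outcome =
  try Ok a.(i) with Invalid_argument message -&gt; Error message
</code></pre>
<p>Here <code>outcome</code> ends up as <code>Error "index out of bounds"</code>, even though each individual update kept the array and the index consistent with each other.</p>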
<p>Even more subtle problems are possible. For example, a balanced binary search
tree implementation using
<a href="https://en.wikipedia.org/wiki/Tree_rotation">rotations</a> can, due to read skew,
be seen to have a cycle. Consider the below diagram. Assume that a lookup for
node <code>2</code> has just read the link from node <code>3</code> to node <code>1</code>. At that point another
transaction commits a rotation that makes node <code>3</code> a child of node <code>1</code>. As the
lookup reads the link from node <code>1</code> it leads back to node <code>3</code> creating a cycle.</p>
<p align="center">
  <img src="/blog/images/2023-06-01.building-a-lock-free-stm-for-ocaml/img-rotation-cycle-light~dsWfGgHWNGzrt38LsXeRgQ.svg" alt="Tree rotations">
</p>
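<p>The interleaving can be replayed sequentially with plain mutable links. This is a model rather than Kcas code, and it assumes a tree with node <code>3</code> at the root, node <code>1</code> as its left child, and node <code>2</code> as the right child of node <code>1</code>:</p>
<pre><code>type node = { key : int; left : node option ref; right : node option ref }

let leaf key = { key; left = ref None; right = ref None }

let n2 = leaf 2
let n1 = { (leaf 1) with right = ref (Some n2) }
let n3 = { (leaf 3) with left = ref (Some n1) }

(* The lookup for key 2 starts at node 3 and follows its left link. *)
let first_hop = Option.get !(n3.left)

(* Another party now rotates right at node 3: node 1 moves up,
   node 3 becomes its right child and adopts node 2. *)
let () =
  n3.left := !(n1.right);
  n1.right := Some n3

(* The lookup continues from node 1 and follows its right link. *)
let second_hop = Option.get !(first_hop.right)

(* Keys visited by the lookup so far. *)
let visited = [ n3.key; first_hop.key; second_hop.key ]
</code></pre>
<p>The lookup visits keys 3, 1, and then 3 again, although neither the tree before the rotation nor the tree after it contains a cycle.</p>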
<p>There are several ways to deal with these problems. It is, of course, possible
to use ad hoc techniques, like checking invariants manually, within
transactions. The Kcas library itself addresses these problems in a couple of
ways.</p>
<p>First of all, Kcas performs periodic validation of the entire transaction log
when an access, such as <code>get</code> or <code>set</code>, of a shared memory location is made
through the transaction log. It would take quadratic time to validate the entire
log on every access. To avoid changing the time complexity of transactions, the
number of accesses between validations is doubled after each validation.</p>
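<p>To see why doubling keeps the total cost linear, here is a small counting model. It assumes, purely for illustration, that every access adds one entry to the log and that validating a log costs one unit per entry. The function names are hypothetical and the actual schedule inside Kcas may differ in its details:</p>
<pre><code>(* Entries checked when the whole log is validated on every access. *)
let cost_every_access n =
  let total = ref 0 in
  for size = 1 to n do
    total := !total + size
  done;
  !total

(* Entries checked when the log is validated only at sizes 1, 2, 4, 8, ... *)
let cost_doubling n =
  let total = ref 0 and next = ref 1 in
  for size = 1 to n do
    if size = !next then begin
      total := !total + size;
      next := 2 * !next
    end
  done;
  !total
</code></pre>
<p>For 1000 accesses the first function counts 500500 entry checks, while the second counts 1023, which stays below twice the number of accesses.</p>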
<p>Periodic validation is an effective way to make loops that access shared memory
locations, such as the lookup of a key from a binary search tree, resistant
against read skew. Such loops will eventually be aborted on some access and will
then be retried. Periodic validation is not effective against problems that
might occur due to non-transactional operations made after reading inconsistent
state. For those cases an explicit
<a href="https://ocaml-multicore.github.io/kcas/doc/kcas/Kcas/Xt/index.html#val-validate"><code>validate</code></a>
operation is provided that can be used to validate that the accesses of
particular locations have been atomic:</p>
<pre><code><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">subscript</span><span class="ocaml-source"> </span><span class="ocaml-source">~</span><span class="ocaml-source">xt</span><span class="ocaml-source"> </span><span class="ocaml-source">array</span><span class="ocaml-source"> </span><span class="ocaml-source">index</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">a</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Xt</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">get</span><span class="ocaml-source"> ~</span><span class="ocaml-source">xt</span><span class="ocaml-source"> </span><span class="ocaml-source">array</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">in</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">i</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Xt</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">get</span><span class="ocaml-source"> ~</span><span class="ocaml-source">xt</span><span class="ocaml-source"> </span><span class="ocaml-source">index</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">in</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-comment-block">(*</span><span class="ocaml-comment-block"> Validate accesses after making them: </span><span class="ocaml-comment-block">*)</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-constant-language-capital-identifier">Xt</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">validate</span><span class="ocaml-source"> ~</span><span class="ocaml-source">xt</span><span class="ocaml-source"> </span><span class="ocaml-source">index</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-constant-language-capital-identifier">Xt</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">validate</span><span class="ocaml-source"> ~</span><span class="ocaml-source">xt</span><span class="ocaml-source"> </span><span class="ocaml-source">array</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-source">a</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">(</span><span class="ocaml-source">i</span><span class="ocaml-source">)</span><span class="ocaml-source">
</span></code></pre>
<p>It is entirely fair to ask whether it is acceptable for an STM mechanism to
allow read skew. A candidate correctness criterion for transactional memory
called "opacity", introduced in the paper
<a href="https://dl.acm.org/doi/10.1145/1345206.1345233">On the correctness of transactional memory</a>,
does not allow it. The trade-off is that the known software techniques to
provide opacity tend to introduce a global sequential bottleneck, such as a
global transaction version number accessed by every transaction, that can and
<a href="https://en.wikipedia.org/wiki/Amdahl%27s_law">will limit scalability</a>
especially when transactions are relatively short, which is usually the case.</p>
<p>At the time of writing this there are several STM implementations that do not
provide opacity. The current Haskell STM implementation, for example,
<a href="https://www.microsoft.com/en-us/research/wp-content/uploads/2005/01/2005-ppopp-composable.pdf">introduced in 2005</a>,
allows similar read skew. In Haskell, however, STM is implemented at the runtime
level and transactions are guaranteed to be pure by the type system. This allows
the Haskell STM runtime to validate transactions when switching threads.
Nevertheless there have been experiments to replace the Haskell STM using
algorithms that provide opacity as described in the paper
<a href="https://dl.acm.org/doi/abs/10.1145/3241625.2976020">Revisiting software transactional memory in Haskell</a>,
for example. The Scala ZIO STM
<a href="https://github.com/zio/zio/issues/6324">also allows read skew</a>. In his talk
<a href="https://www.youtube.com/watch?v=k20nWb9fHj0">Transactional Memory in Practice</a>,
Brett Hall describes their experience in using an STM in C++ that also allows
read skew.</p>
<p>It is not entirely clear how problematic it is to have to account for the possibility
of read skew. Although I expect to see read skew issues in the
future, the relative success of the Haskell STM would seem to suggest that it is
not necessarily a show stopper. While advanced data structure implementations
tend to have intricate invariants and include loops, compositions of
transactions using such data structures, like the LRU cache implementation, tend
to be loopless and relatively free of such invariants and work well.</p>
<h3>Tomorrow May Come</h3>
<p>At the time of writing this, the <code>kcas</code> and <code>kcas_data</code> packages are still
marked experimental, but are very close to being labeled 1.0.0. The core Kcas
library itself is more or less feature complete. The Kcas data library, by its
nature, could acquire new data structure implementations over time, but there is
one important feature missing from Kcas data — a bounded queue.</p>
<p>It is, of course, possible to simply compose a transaction that checks the
length of a queue. Unfortunately that would not perform optimally, because
computing the exact length of a queue unavoidably requires synchronisation
between readers and writers. A bounded queue implementation doesn't usually need
to know the exact length — it only needs to have a conservative
approximation of whether there is room in the queue and then the computation of
the exact length can be avoided much of the time. Ideally the default queue
implementation would allow an optional capacity to be specified. The challenge
is to implement the queue without making it any slower in the unbounded case.</p>
<p>Less importantly, the Kcas data library currently provides neither an ordered map
nor a priority queue. Those serve use cases that are not covered by the current
selection of data structures. For an ordered map something like a
<a href="https://en.wikipedia.org/wiki/WAVL_tree">WAVL tree</a> could be a good starting
point for a reasonably scalable implementation. A priority queue, on the other
hand, is more difficult to scale, because the top element of a priority queue
might need to be examined or even change on every mutation, which makes it a
sequential bottleneck. On the other hand, updating elements far from the top
shouldn't require much synchronisation. Some sort of two level scheme like a
priority queue of per-domain priority queues might provide the best of both worlds.</p>
<h3>But Why?</h3>
<p>If you look at a typical textbook on concurrent programming it will likely tell
you that the essence of concurrent programming boils down to two (or three)
things:</p>
<ul>
<li>independent sequential threads of control, and</li>
<li>mechanisms for threads to communicate and synchronise.</li>
</ul>
<p>The first bullet on that list has received a lot of focus in the form of
libraries like <a href="https://github.com/ocaml-multicore/eio">Eio</a> and
<a href="https://github.com/ocaml-multicore/domainslib">Domainslib</a> that utilise OCaml's
support for algebraic effects. Indeed, the second bullet is kind of meaningless
unless you have threads. However, that does not make it less important.</p>
<p>Programming with threads is all about how threads communicate and synchronise
with each other.</p>
<p>A survey of concurrent programming techniques could easily fill an entire book,
but if you look at most typical programming languages, they provide you with a
plethora of communication and synchronisation primitives such as</p>
<ul>
<li>atomic operations,</li>
<li>spin locks,</li>
<li>barriers and count down latches,</li>
<li>semaphores,</li>
<li>mutexes and condition variables,</li>
<li>message queues,</li>
<li>other concurrent collections,</li>
<li>and more.</li>
</ul>
<p>The main difficulty with these traditional primitives is their relative lack of
composability. Every concurrency problem becomes a puzzle whose solution is some
ad hoc combination of these primitives. For example, given a concurrent,
thread-safe stack and a queue it may be impossible to atomically move an element from
the stack to the queue without wrapping both behind some synchronisation
mechanism, which also likely reduces scalability.</p>
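<p>As a rough sketch of what that wrapping looks like, assuming OCaml 5 (where
<code>Mutex</code> is part of the standard library) and using the plain, non-concurrent
<code>Stack</code> and <code>Queue</code> modules as stand-ins, one lock can guard both
collections. The names <code>lock</code>, <code>stack</code>, <code>queue</code>, and
<code>move_top_to_queue</code> are made up for this illustration:</p>
<pre><code>(* One lock guards both collections, so every operation on either of them
   has to take the same lock. *)
let lock = Mutex.create ()
let stack : int Stack.t = Stack.create ()
let queue : int Queue.t = Queue.create ()

(* Move the top of the stack to the back of the queue as one step.
   Returns [false] when the stack is empty. *)
let move_top_to_queue () =
  Mutex.lock lock;
  Fun.protect ~finally:(fun () -> Mutex.unlock lock) (fun () ->
    match Stack.pop_opt stack with
    | Some x -> Queue.add x queue; true
    | None -> false)
</code></pre>
<p>The move is atomic only because every other user of the stack and the queue
must also take <code>lock</code>, which serialises all access to both of them.</p>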
<p>There are also some languages based on asynchronous message passing with the
ability to receive multiple messages selectively using both conjunctive and
disjunctive patterns. A few languages are based on rendezvous or synchronous
message passing and offer the ability to disjunctively and sometimes also
conjunctively select between potential communications. I see these as
fundamentally different from the traditional primitives as the number of
building blocks is much smaller and the whole is more like a unified language for
solving concurrency problems rather than just a grab bag of non-composable
primitives. My observation, however, has been that these kinds of message passing
models are not familiar to most programmers and can be challenging to program
with.</p>
<p>As an aside, why should one care about composability? Why would anyone care
about being able to e.g. disjunctively either pop an element from a stack or
take an element from a queue, but not both, atomically? Well, it is not about
stacks and queues; those are just examples. It is about modularity and
scalability: being able to, in general, understand independently developed
concurrent abstractions on their own and also to combine them to form effective
and efficient solutions to new problems.</p>
<p>Another approach to concurrent programming is transactions over mutable data
structures whether in the form of databases or Software Transactional Memory
(STM). Transactional databases, in particular, have definitely proven to be a
major enabler. STM hasn't yet had a similar impact. There are probably many
reasons for that. One probable reason is that many languages already offered a
selection of familiar traditional primitives and millions of lines of code using
those before getting STM. Another reason might be that attempts to provide STM
in a form where one could just wrap any code inside an atomic block and have it
work perfectly proved to be unsuccessful. This resulted in many publications and
blog posts, e.g.
<a href="https://joeduffyblog.com/2010/01/03/a-brief-retrospective-on-transactional-memory/">A (brief) retrospective on transactional memory</a>,
discussing the problems resulting from such doomed attempts and likely
contributed to making STM seem less desirable.</p>
<p>However, STM is not without some success. More modest, and more successful,
approaches either strictly limit what can be performed atomically or require the
programmer to understand the limits and program accordingly. While not a
panacea, STM provides both composability and a relatively simple and familiar
programming model based on mutable shared memory locations.</p>
<h3>Crossroads</h3>
<p>Having just recently acquired the ability to have multiple domains running in
parallel, OCaml is in a unique position. Instead of having a long history of
concurrent multicore programming we can start afresh.</p>
<p>What sort of model of concurrent programming should OCaml offer?</p>
<p>One possible road for OCaml to take would be to offer STM as the go-to approach
for solving most concurrent programming problems.</p>
<h3>Until Next Time</h3>
<p>I've had a lot of fun working on Kcas. I'd like to thank my colleagues for
putting up with my obsession to work on it. I also hope that people will find
Kcas and find it useful or learn something from it!</p>
]]></description><link>https://tarides.com/blog/2023-08-10-kcas-building-a-lock-free-stm-for-ocaml-2-2</link><guid isPermaLink="false">https://tarides.com/blog/2023-08-10-kcas-building-a-lock-free-stm-for-ocaml-2-2.html</guid><dc:creator><![CDATA[ Vesa Karvonen ]]></dc:creator><pubDate>Thu, 10 Aug 2023 00:00:00 GMT</pubDate></item><item><title><![CDATA[Kcas: Building a Lock-Free STM for OCaml (1/2)]]></title><description><![CDATA[<p>In the past few months I've had the pleasure of working on the
<a href="https://github.com/ocaml-multicore/kcas/">Kcas</a> library. In this and a
follow-up post, I will discuss the history and more recent development process
of optimising Kcas and turning it into a proper Software Transactional Memory
(STM) implementation for OCaml.</p>
<p>While this is not meant to serve as an introduction to programming with Kcas,
along the way we will be looking at a few code snippets. To ensure that they are
type correct — the best kind of
correct<sup><a href="https://www.youtube.com/watch?v=hou0lU8WMgo">*</a></sup> — I'll
use the <a href="https://github.com/realworldocaml/mdx#readme">MDX</a> tool to test them.
So, before we continue, let's require the libraries that we will be using:</p>
<pre><code><span class="ocaml-keyword-other">#</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">#</span><span class="ocaml-source">require</span><span class="ocaml-source"> </span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-string-quoted-double">kcas</span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-source">
</span><span class="ocaml-keyword-other">#</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">open</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Kcas</span><span class="ocaml-source">
</span><span class="ocaml-keyword-other">#</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">#</span><span class="ocaml-source">require</span><span class="ocaml-source"> </span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-string-quoted-double">kcas_data</span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-source">
</span><span class="ocaml-keyword-other">#</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">open</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Kcas_data</span><span class="ocaml-source">
</span></code></pre>
<p>All right, let us begin!</p>
<h3>Origins</h3>
<p>Contrary to popular belief, the name "Kcas" might not be an abbreviation of KC
and Sadiq, two early contributors to the library. Sadiq once joked "I like that
we named the library after KC too." The Kcas library was originally
developed for the purpose of implementing
<a href="https://aturon.github.io/academic/reagents.pdf">Reagents</a> for OCaml and is an
implementation of multi-word compare-and-set, often abbreviated as MCAS, CASN,
or — wait for it — k-CAS.</p>
<p>But what is this multi-word compare-and-set?</p>
<p>Well, it is a tool for designing lock-free algorithms that allows atomic
operations to be performed over multiple shared memory locations. Hardware
traditionally only supports the ability to perform atomic operations on
individual words, i.e. a single-word
<a href="https://v2.ocaml.org/api/Atomic.html#VALcompare_and_set">compare-and-set</a>
(CAS). Kcas basically extends that ability, through the use of intricate
algorithms, so that it works over any number of words.</p>
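<p>For comparison, here is a minimal sketch of the single-word case using only the
standard library's <code>Atomic</code> module. The names <code>counter</code> and
<code>incr_with_cas</code> are made up for this illustration:</p>
<pre><code>(* A counter stored in a single word. *)
let counter = Atomic.make 0

(* Increment by retrying a single-word compare-and-set until it succeeds. *)
let rec incr_with_cas () =
  let seen = Atomic.get counter in
  if not (Atomic.compare_and_set counter seen (seen + 1)) then
    (* Someone else changed the counter in between, so read it again. *)
    incr_with_cas ()

let () =
  incr_with_cas ();
  incr_with_cas ();
  assert (Atomic.get counter = 2)
</code></pre>
<p>This works because only one location is involved. As soon as two or more
locations must change together, a single-word CAS is no longer enough.</p>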
<p>Suppose, for example, that we are implementing operations on doubly-linked
circular lists. Instead of using a mutable field, <code>ref</code>, or <code>Atomic.t</code>, we'd use
a shared memory location, or
<a href="https://ocaml-multicore.github.io/kcas/0.6.0/kcas/Kcas/Loc/index.html#type-t"><code>Loc.t</code></a>,
for the pointers in our node type:</p>
<pre><code><span class="ocaml-keyword-other">type</span><span class="ocaml-source"> </span><span class="ocaml-storage-type">'a</span><span class="ocaml-source"> </span><span class="ocaml-source">node</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-source">{</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-source">succ</span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source"> </span><span class="ocaml-storage-type">'a</span><span class="ocaml-source"> </span><span class="ocaml-source">node</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Loc</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">t</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-source">pred</span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source"> </span><span class="ocaml-storage-type">'a</span><span class="ocaml-source"> </span><span class="ocaml-source">node</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Loc</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">t</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-source">datum</span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source"> </span><span class="ocaml-storage-type">'a</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source">
</span><span class="ocaml-source">}</span><span class="ocaml-source">
</span></code></pre>
<p>To remove a node safely we want to atomically update the <code>succ</code> and <code>pred</code>
pointers of the predecessor and successor nodes and to also update the <code>succ</code>
and <code>pred</code> pointers of a node to point to the node itself, so that removal
becomes an <a href="https://en.wikipedia.org/wiki/Idempotence">idempotent</a> operation.
Using a multi-word compare-and-set one could implement the <code>remove</code> operation as
follows:</p>
<pre><code><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-keyword">rec </span><span class="ocaml-entity-name-function-binding">remove</span><span class="ocaml-source"> </span><span class="ocaml-variable-parameter-optional">?</span><span class="ocaml-source">(</span><span class="ocaml-variable-parameter-optional">backoff</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Backoff</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">default</span><span class="ocaml-source">)</span><span class="ocaml-source"> </span><span class="ocaml-source">node</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-comment-block">(*</span><span class="ocaml-comment-block"> Read pointer to the predecessor node and... </span><span class="ocaml-comment-block">*)</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">pred</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Loc</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">get</span><span class="ocaml-source"> </span><span class="ocaml-source">node</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">pred</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">in</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-comment-block">(*</span><span class="ocaml-comment-block"> ..check whether the node has already been removed. </span><span class="ocaml-comment-block">*)</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword-other">if</span><span class="ocaml-source"> </span><span class="ocaml-source">pred</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">!=</span><span class="ocaml-source"> </span><span class="ocaml-source">node</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">then</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">succ</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Loc</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">get</span><span class="ocaml-source"> </span><span class="ocaml-source">node</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">succ</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">in</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">ok</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Op</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">atomically</span><span class="ocaml-source"> </span><span class="ocaml-source">[</span><span class="ocaml-source">
</span><span class="ocaml-source">      </span><span class="ocaml-comment-block">(*</span><span class="ocaml-comment-block"> Update pointers in this node: </span><span class="ocaml-comment-block">*)</span><span class="ocaml-source">
</span><span class="ocaml-source">      </span><span class="ocaml-constant-language-capital-identifier">Op</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">make_cas</span><span class="ocaml-source"> </span><span class="ocaml-source">node</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">succ</span><span class="ocaml-source"> </span><span class="ocaml-source">succ</span><span class="ocaml-source"> </span><span class="ocaml-source">node</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source">
</span><span class="ocaml-source">      </span><span class="ocaml-constant-language-capital-identifier">Op</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">make_cas</span><span class="ocaml-source"> </span><span class="ocaml-source">node</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">pred</span><span class="ocaml-source"> </span><span class="ocaml-source">pred</span><span class="ocaml-source"> </span><span class="ocaml-source">node</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source">
</span><span class="ocaml-source">      </span><span class="ocaml-comment-block">(*</span><span class="ocaml-comment-block"> Update pointers to this node: </span><span class="ocaml-comment-block">*)</span><span class="ocaml-source">
</span><span class="ocaml-source">      </span><span class="ocaml-constant-language-capital-identifier">Op</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">make_cas</span><span class="ocaml-source"> </span><span class="ocaml-source">pred</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">succ</span><span class="ocaml-source"> </span><span class="ocaml-source">node</span><span class="ocaml-source"> </span><span class="ocaml-source">succ</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source">
</span><span class="ocaml-source">      </span><span class="ocaml-constant-language-capital-identifier">Op</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">make_cas</span><span class="ocaml-source"> </span><span class="ocaml-source">succ</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">pred</span><span class="ocaml-source"> </span><span class="ocaml-source">node</span><span class="ocaml-source"> </span><span class="ocaml-source">pred</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-source">]</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">in</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-keyword-other">if</span><span class="ocaml-source"> </span><span class="ocaml-source">not</span><span class="ocaml-source"> </span><span class="ocaml-source">ok</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">then</span><span class="ocaml-source">
</span><span class="ocaml-source">      </span><span class="ocaml-comment-block">(*</span><span class="ocaml-comment-block"> Someone modified the list around us, so backoff and retry. </span><span class="ocaml-comment-block">*)</span><span class="ocaml-source">
</span><span class="ocaml-source">      </span><span class="ocaml-source">remove</span><span class="ocaml-source"> ~</span><span class="ocaml-source">backoff</span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source">(</span><span class="ocaml-constant-language-capital-identifier">Backoff</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">once</span><span class="ocaml-source"> </span><span class="ocaml-source">backoff</span><span class="ocaml-source">)</span><span class="ocaml-source"> </span><span class="ocaml-source">node</span><span class="ocaml-source">
</span></code></pre>
<p>The list given to
<a href="https://ocaml-multicore.github.io/kcas/0.6.0/kcas/Kcas/Op/index.html#val-atomically"><code>Op.atomically</code></a>
contains the individual compare-and-set operations to perform. A single
<a href="https://ocaml-multicore.github.io/kcas/0.6.0/kcas/Kcas/Op/index.html#val-make_cas"><code>Op.make_cas loc expected desired</code></a>
operation compares the current value of a location with the expected
value and, if they are the same, sets the value of the location to the
desired value.</p>
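<p>To make the intended semantics concrete, here is a purely sequential model
written with plain <code>ref</code> cells. It is not how Kcas is implemented, and it is
neither atomic nor thread safe; the type <code>cas</code> and the function
<code>cas_model</code> are hypothetical names used only to show the all-or-nothing
behaviour:</p>
<pre><code>(* One compare-and-set request: a location, the value we expect to find
   there, and the value we want to store. *)
type 'a cas = { loc : 'a ref; expected : 'a; desired : 'a }

(* Sequential model only. Either every comparison succeeds and every
   location is updated, or nothing is updated at all. *)
let cas_model ops =
  if List.for_all (fun op -> !(op.loc) == op.expected) ops then begin
    List.iter (fun op -> op.loc := op.desired) ops;
    true
  end
  else false

let () =
  let x = ref 1 and y = ref 2 in
  (* Both expectations hold, so both locations are updated. *)
  assert (cas_model [ { loc = x; expected = 1; desired = 10 };
                      { loc = y; expected = 2; desired = 20 } ]);
  (* [x] no longer holds 1, so neither location is touched. *)
  assert (not (cas_model [ { loc = x; expected = 1; desired = 0 };
                           { loc = y; expected = 20; desired = 0 } ]));
  assert (!x = 10);
  assert (!y = 20)
</code></pre>
<p>The hard part, which this model sidesteps entirely, is getting the same
all-or-nothing result while other domains are running in parallel.</p>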
<p>Programming like this is similar to programming with single-word compare-and-set
except that the operation is extended to being able to work on multiple words.
It does get the job done, but I feel it is fair to say that this is a low level
tool only suitable for experts implementing lock-free algorithms.</p>
<h3>Getting Curious</h3>
<p>I became interested in working on the Kcas library after Bartosz Modelski asked
me to review a couple of PRs to Kcas. As it happens, I had implemented the same
k-CAS algorithm, based on the paper
<a href="https://www.cl.cam.ac.uk/research/srg/netos/papers/2002-casn.pdf">A Practical Multi-Word Compare-and-Swap Operation</a>,
a few years earlier in C++ as a hobby project. I had also considered
implementing Reagents and had implemented a prototype library based on the
<a href="https://dl.acm.org/doi/10.1007/11864219_14">Transactional Locking II</a> (TL2)
algorithm for software transactional memory (STM) in C++ as another hobby
project. While reviewing the library, I could see some potential for
improvements.</p>
<h3>Fine Grained Competition</h3>
<p>One of the issues in the Kcas GitHub repo mentioned a new paper on
<a href="https://arxiv.org/pdf/2008.02527.pdf">Efficient Multi-word Compare and Swap</a>.
It was easy to adapt the new algorithm, which can even be seen as a
simplification of the previous algorithm, to OCaml. Compared to the previous
algorithm, which took <code>3k+1</code> single-word CAS operations per <code>k</code>-CAS, the new
algorithm only took <code>k+1</code> single-word CAS operations and was much faster. This
basically made k-CAS potentially competitive with the fine-grained locking
approaches used in many STM implementations, which also tend to require roughly
the equivalent of one CAS per word.</p>
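<p>To put numbers on that, taking the two counts at face value, the
<code>remove</code> example above updates four locations. The helper names below are
made up for this illustration:</p>
<pre><code>(* Single-word CAS operations needed per k-CAS, as quoted above. *)
let cas_count_old k = (3 * k) + 1
let cas_count_new k = k + 1

let () =
  (* Removing a node from the doubly-linked list touches k = 4 locations. *)
  assert (cas_count_old 4 = 13);
  assert (cas_count_new 4 = 5)
</code></pre>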
<h3>Two Birds with One Stone</h3>
<p>Both the original algorithm and the new algorithm require the locations being
updated to be in some total order. Any ordering that is used consistently in all
potentially overlapping operations would do, but the shared memory locations
created by Kcas also include a unique integer id, which can be used for ordering
locations. Initially Kcas required the user to sort the list of CAS operations.
Later, an internal sorting step was added to Kcas to make the interface less
error-prone. It was performed by default, essentially by calling <code>List.sort</code>,
and took
<a href="https://en.wikipedia.org/wiki/Time_complexity#Linearithmic_time">linearithmic</a>
<code>O(n*log(n))</code> time.
This works, but it is possible to do better. Back when I implemented a TL2
prototype in C++ as a hobby project, I had used a
<a href="https://en.wikipedia.org/wiki/Splay_tree">splay tree</a> to record accesses of
shared memory locations. Along with the new algorithm, I also changed Kcas to
use a splay tree to store the operations internally. The splay tree was
constructed from the list of operations given by the user and then the splay
tree, instead of a list, would be traversed during the main algorithm.</p>
<p>You could ask what makes a splay tree interesting for this particular use case.
Well, there are a number of reasons. First of all, the new algorithm requires
allocating internal descriptors for each operation anyway, because those
descriptors are essentially consumed by the algorithm. So, even if the sorting
step were skipped, an ordered data structure of descriptors would still need
to be allocated. However, what makes a splay tree particularly interesting for
this purpose is that, unlike most self-balancing trees, it can perform a
sequence of <code>n</code> accesses in linear time <code>O(n)</code>. This happens, for example, when
the accesses are in either ascending or descending order. In those cases, as
shown in the diagram below, the result is either a left or right leaning tree,
respectively, much like a list.</p>
<p align="center">
  <img src="/blog/images/2023-06-01.building-a-lock-free-stm-for-ocaml/img-access-sequence-light~Y3_LgZsa-9ATn_EZEPDabQ.svg" alt="resulting splay tree">
</p>
<p>This means that a performance conscious user could simply make sure to provide
the locations in either order and the internal tree would be constructed in
linear time and could then be traversed, also in linear time, in ascending
order. For the general case a splay tree also guarantees the same linearithmic
<code>O(n*log(n))</code> time as sorting.</p>
<p>With some fast-path optimisations for preordered sequences, the splay tree
construction was almost free, and the flag to skip the default sorting step
could be removed without making performance worse.</p>
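<p>The general idea of a fast path for preordered input can be illustrated on a
plain list of integer ids. This is only an analogy and not the splay tree code
that Kcas uses; <code>is_ascending</code> and <code>order_ids</code> are hypothetical names:</p>
<pre><code>(* [true] when every element is strictly greater than the one before it.
   Runs in linear time. *)
let rec is_ascending cmp = function
  | x :: (y :: _ as rest) ->
    if cmp y x > 0 then is_ascending cmp rest else false
  | _ -> true

(* Put ids in ascending order, paying for a full sort only when the input
   is neither ascending nor descending already. *)
let order_ids ids =
  if is_ascending compare ids then ids
  else
    let reversed = List.rev ids in
    if is_ascending compare reversed then reversed
    else List.sort compare ids

let () =
  assert (order_ids [ 1; 2; 3 ] = [ 1; 2; 3 ]);
  assert (order_ids [ 3; 2; 1 ] = [ 1; 2; 3 ]);
  assert (order_ids [ 2; 3; 1 ] = [ 1; 2; 3 ])
</code></pre>
<p>The first two cases take linear time and only the last one falls back to the
linearithmic sort.</p>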
<h3>Keeping a Journal</h3>
<p>Having the splay tree also opened the possibility of implementing a higher level
transactional interface.</p>
<p>But what is a transaction?</p>
<p>Well, a transaction in Kcas is essentially a function that records a log of
accesses, i.e. reads and writes, to shared memory locations. When
accessing a location for the first time, whether for reading or for writing, the
value of that location is read and stored in the log. Then, instead of reading
the location again or writing to it, the entry for the location is looked up
from the log and any change is recorded in the entry. So, a transaction does not
directly mutate shared memory locations. A transaction merely reads their
initial values and records what the effects of the accesses would be.</p>
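<p>The bookkeeping described above can be modelled sequentially in a few lines.
This sketch is not the Kcas implementation: locations are plain <code>int ref</code>
cells, the log is a simple list, nothing is atomic or thread safe, and all of the
names (<code>entry</code>, <code>new_log</code>, <code>get</code>, <code>set</code>,
<code>commit</code>) are made up for this illustration:</p>
<pre><code>(* One log entry per accessed location: the value seen on first access and
   the value the transaction would leave there. *)
type entry = { loc : int ref; initial : int; current : int ref }

let new_log () : entry list ref = ref []

(* Look the location up in the log, reading it from memory only on the
   first access. *)
let entry_of log loc =
  match List.find_opt (fun e -> e.loc == loc) !log with
  | Some e -> e
  | None ->
    let v = !loc in
    let e = { loc; initial = v; current = ref v } in
    log := e :: !log;
    e

(* Reads and writes only ever touch the log entry. *)
let get log loc = !((entry_of log loc).current)
let set log loc v = (entry_of log loc).current := v

(* Sequential stand-in for committing: succeed only if every location still
   holds the value that was first read, then write the recorded values. *)
let commit log =
  if List.for_all (fun e -> !(e.loc) == e.initial) !log then begin
    List.iter (fun e -> e.loc := !(e.current)) !log;
    true
  end
  else false

let () =
  let a = ref 1 and b = ref 2 in
  let log = new_log () in
  set log a (get log b);
  (* Nothing has been written to the locations yet. *)
  assert (!a = 1);
  assert (commit log);
  assert (!a = 2)
</code></pre>
<p>In this model the check and the writes in <code>commit</code> are two separate steps.
Making them a single atomic step is exactly the job that a multi-word
compare-and-set is suited for.</p>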
<p>Recall the example of how to remove a node from a doubly-linked circular list.
Using the transactional interface of Kcas, we could write a transaction to
remove a node as follows:</p>
<pre><code><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">remove</span><span class="ocaml-source"> </span><span class="ocaml-source">~</span><span class="ocaml-source">xt</span><span class="ocaml-source"> </span><span class="ocaml-source">node</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-comment-block">(*</span><span class="ocaml-comment-block"> Read pointers to the predecessor and successor nodes: </span><span class="ocaml-comment-block">*)</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">pred</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Xt</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">get</span><span class="ocaml-source"> ~</span><span class="ocaml-source">xt</span><span class="ocaml-source"> </span><span class="ocaml-source">node</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">pred</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">in</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">succ</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Xt</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">get</span><span class="ocaml-source"> ~</span><span class="ocaml-source">xt</span><span class="ocaml-source"> </span><span class="ocaml-source">node</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">succ</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">in</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-comment-block">(*</span><span class="ocaml-comment-block"> Update pointers in this node: </span><span class="ocaml-comment-block">*)</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-constant-language-capital-identifier">Xt</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">set</span><span class="ocaml-source"> ~</span><span class="ocaml-source">xt</span><span class="ocaml-source"> </span><span class="ocaml-source">node</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">succ</span><span class="ocaml-source"> </span><span class="ocaml-source">node</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-constant-language-capital-identifier">Xt</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">set</span><span class="ocaml-source"> ~</span><span class="ocaml-source">xt</span><span class="ocaml-source"> </span><span class="ocaml-source">node</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">pred</span><span class="ocaml-source"> </span><span class="ocaml-source">node</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-comment-block">(*</span><span class="ocaml-comment-block"> Update pointers to this node: </span><span class="ocaml-comment-block">*)</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-constant-language-capital-identifier">Xt</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">set</span><span class="ocaml-source"> ~</span><span class="ocaml-source">xt</span><span class="ocaml-source"> </span><span class="ocaml-source">pred</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">succ</span><span class="ocaml-source"> </span><span class="ocaml-source">succ</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-constant-language-capital-identifier">Xt</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">set</span><span class="ocaml-source"> ~</span><span class="ocaml-source">xt</span><span class="ocaml-source"> </span><span class="ocaml-source">succ</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">pred</span><span class="ocaml-source"> </span><span class="ocaml-source">pred</span><span class="ocaml-source">
</span></code></pre>
<p>The labeled argument, <code>~xt</code>, refers to the transaction log. Transactional
operations like
<a href="https://ocaml-multicore.github.io/kcas/0.6.0/kcas/Kcas/Xt/index.html#val-get"><code>get</code></a>
and
<a href="https://ocaml-multicore.github.io/kcas/0.6.0/kcas/Kcas/Xt/index.html#val-set"><code>set</code></a>
are then recorded in that log. To actually remove a node, we need to commit the
transaction</p>
<pre><code class="language-ml">Xt.commit { tx = remove node }
</code></pre>
<p>which repeatedly calls the transaction function, <code>tx</code>, to record a transaction
log and attempts to atomically perform it until it succeeds.</p>
<p>Notice that <code>remove</code> is no longer recursive. It doesn't have to account for
failure or perform a backoff. It is also not necessary to know or keep track of
what the previous values of locations were. All of that is taken care of for us
by the transaction log and the
<a href="https://ocaml-multicore.github.io/kcas/0.6.0/kcas/Kcas/Xt/index.html#val-commit"><code>commit</code></a>
function. But, I digress.</p>
<p>Having the splay tree made the implementation of the transactional interface
straightforward. Transactional operations would just use the splay tree to
look up and record accesses of shared memory locations. The
<a href="https://ocaml-multicore.github.io/kcas/0.6.0/kcas/Kcas/Xt/index.html#val-commit"><code>commit</code></a>
function just calls the transaction with an empty splay tree and then passes the
resulting tree to the internal k-CAS algorithm.</p>
<p>But why use a splay tree? One could suggest e.g. using a hash table for the
transaction log. Accesses of individual locations would then be constant time.
However, a hash table doesn't sort the entries, so we would need something more
for that purpose. Another alternative would be to just use an unordered list or
table and perhaps use something like a
<a href="https://en.wikipedia.org/wiki/Bloom_filter">bloom filter</a> to check whether a
location has already been accessed as most accesses are likely to either target
new locations or a recently used location. However, with k-CAS, it would still
be necessary to sort the accesses later and, without some way to perform
efficient lookups, worst-case performance would be quadratic, <code>O(n²)</code>.</p>
<p>For the purpose of implementing a transaction log, rather than just for the
purpose of sorting a list of operations, a splay tree also offers further
advantages. A splay tree works a bit like a cache, making accesses to recently
accessed elements faster. In particular, the pattern where a location is first
read and then written</p>
<pre><code><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">decr_if_positive</span><span class="ocaml-source"> </span><span class="ocaml-source">~</span><span class="ocaml-source">xt</span><span class="ocaml-source"> </span><span class="ocaml-source">x</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword-other">if</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Xt</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">get</span><span class="ocaml-source"> ~</span><span class="ocaml-source">xt</span><span class="ocaml-source"> </span><span class="ocaml-source">x</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">&gt;</span><span class="ocaml-source"> </span><span class="ocaml-constant-numeric-decimal-integer">0</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">then</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-constant-language-capital-identifier">Xt</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">decr</span><span class="ocaml-source"> ~</span><span class="ocaml-source">xt</span><span class="ocaml-source"> </span><span class="ocaml-source">x</span><span class="ocaml-source">
</span></code></pre>
<p>is optimised by the splay tree. The first access brings the location to the root
of the tree. The second access is then guaranteed to take constant time.</p>
<p>Using a splay tree as the transaction log also allows the user to optimise
transactions in a similar way and avoid the cost of the linearithmic sorting step. A
transaction over an array of locations, for example, can be performed in linear
time simply by making sure that the locations are accessed in order.</p>
<p>Of course, none of this means that a splay tree is necessarily the best or the
most efficient data structure to implement a transaction log. Far from it. But
in OCaml, with fast memory allocations, it is probably difficult to do much
better without additional runtime or compiler support.</p>
<h3>Take a Number</h3>
<p>One nice thing about transactions is that the user no longer has to write loops
to perform them. With a primitive (multi-word) CAS one needs to have some
strategy to deal with failures. If an operation fails, due to another CPU core
having won the race to modify some location, it is generally not a good idea to
just immediately retry. The problem with that is that there might be multiple
CPU cores trying to access the same locations in parallel. Everyone always
retrying at the same time potentially leads to quadratic <code>O(n²)</code> bus traffic to
synchronise shared memory as every round of retries generates <code>O(n)</code> amount of
bus traffic.</p>
<p>Suppose multiple CPU cores are all simultaneously running the following naïve
lock-free algorithm to increment an atomic location:</p>
<pre><code><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-keyword">rec </span><span class="ocaml-entity-name-function-binding">naive_incr</span><span class="ocaml-source"> </span><span class="ocaml-source">atomic</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">n</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Atomic</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">get</span><span class="ocaml-source"> </span><span class="ocaml-source">atomic</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">in</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword-other">if</span><span class="ocaml-source"> </span><span class="ocaml-source">not</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-constant-language-capital-identifier">Atomic</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">compare_and_set</span><span class="ocaml-source"> </span><span class="ocaml-source">atomic</span><span class="ocaml-source"> </span><span class="ocaml-source">n</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-source">n</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">+</span><span class="ocaml-source"> </span><span class="ocaml-constant-numeric-decimal-integer">1</span><span class="ocaml-source">)</span><span class="ocaml-source">)</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">then</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-source">naive_incr</span><span class="ocaml-source"> </span><span class="ocaml-source">atomic</span><span class="ocaml-source">
</span></code></pre>
<p>All CPU cores read the value of the location and then attempt a compare-and-set.
Only one of them can succeed on each round of attempts. But one might still
reasonably ask: what makes this so expensive? Well, the problem comes from
<a href="https://en.wikipedia.org/wiki/MSI_protocol">the way shared memory works</a>.
Basically, when a CPU core reads a location, the location will be stored in the
cache of that core and will be marked as "shared" in the caches of all CPUs that
have also read that location. On the other hand, when a CPU core writes to a
location, the location will be marked as "modified" in the cache of that core
and as "invalid" in the caches of all the other cores. Although a
compare-and-set doesn't always logically write to memory, to ensure atomicity,
the CPU acts as if it does. So, on each round through the algorithm, each core
will, in turn, attempt to write to the location, which invalidates the location
in the caches of all the other cores, and requires them to read the location
again. These invalidations and subsequent reads of the location tend to be very
resource intensive.</p>
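<p>The effect is easy to observe. The following is a minimal sketch, assuming OCaml 5
(for <code>Domain</code>) and using only the standard library. The names
<code>incr_counting_failures</code> and <code>run</code> are made up for illustration. Each
domain runs the naïve retry loop and counts how many of its own compare-and-set attempts
failed. The number of failures varies between runs and machines, so no particular
output is claimed here.</p>
<pre><code class="language-ml">(* Naive increment that also returns how many attempts failed so far. *)
let rec incr_counting_failures atomic failures =
  let n = Atomic.get atomic in
  if Atomic.compare_and_set atomic n (n + 1) then failures
  else
    (* Another core won the race: retry immediately, without any backoff. *)
    incr_counting_failures atomic (failures + 1)

(* Run [domains] domains that each increment a shared counter [per_domain]
   times. Returns the final counter value and the total number of failed
   compare-and-set attempts. *)
let run ~domains ~per_domain =
  let counter = Atomic.make 0 in
  let worker () =
    (* Failures are counted locally to avoid adding more shared writes. *)
    let failures = ref 0 in
    for _ = 1 to per_domain do
      failures := incr_counting_failures counter !failures
    done;
    !failures
  in
  let spawned = List.init domains (fun _ -&gt; Domain.spawn worker) in
  let failed = List.fold_left (fun acc d -&gt; acc + Domain.join d) 0 spawned in
  (Atomic.get counter, failed)

let () =
  let total, failed = run ~domains:4 ~per_domain:100_000 in
  Printf.printf "final value: %d, failed attempts: %d\n" total failed
</code></pre>
<p>Only the final value is deterministic: every increment eventually succeeds, so the
counter ends at <code>domains * per_domain</code>. Every failed attempt is a round trip
through the cache coherence protocol that accomplished nothing.</p>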
<p>In some lock-free algorithms it is possible to use auxiliary data structures to
<a href="https://people.csail.mit.edu/shanir/publications/Lock_Free.pdf">deal with contention scalably</a>,
but when the specifics of the use case are unknown, something more general is
needed. Assume that, instead of all the cores retrying at the same time, the
cores would somehow form a queue and attempt their operations one at a time.
Each successful increment would still mean that the next core to attempt an
increment would have to expensively read the location, but since only one core
makes the attempt, the amount of bus traffic would be linear <code>O(n)</code>.</p>
<p>A clever way to form a kind of queue is to use
<a href="https://en.wikipedia.org/wiki/Exponential_backoff">randomised exponential backoff</a>.
A random delay or backoff is applied before retrying:</p>
<pre><code><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-keyword">rec </span><span class="ocaml-entity-name-function-binding">incr_with_backoff</span><span class="ocaml-source"> </span><span class="ocaml-variable-parameter-optional">?</span><span class="ocaml-source">(</span><span class="ocaml-variable-parameter-optional">backoff</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Backoff</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">default</span><span class="ocaml-source">)</span><span class="ocaml-source"> </span><span class="ocaml-source">atomic</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">n</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Atomic</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">get</span><span class="ocaml-source"> </span><span class="ocaml-source">atomic</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">in</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword-other">if</span><span class="ocaml-source"> </span><span class="ocaml-source">not</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-constant-language-capital-identifier">Atomic</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">compare_and_set</span><span class="ocaml-source"> </span><span class="ocaml-source">atomic</span><span class="ocaml-source"> </span><span class="ocaml-source">n</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-source">n</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">+</span><span class="ocaml-source"> </span><span class="ocaml-constant-numeric-decimal-integer">1</span><span class="ocaml-source">)</span><span class="ocaml-source">)</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">then</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-source">incr_with_backoff</span><span class="ocaml-source"> ~</span><span class="ocaml-source">backoff</span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source">(</span><span class="ocaml-constant-language-capital-identifier">Backoff</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">once</span><span class="ocaml-source"> </span><span class="ocaml-source">backoff</span><span class="ocaml-source">)</span><span class="ocaml-source"> </span><span class="ocaml-source">atomic</span><span class="ocaml-source">
</span></code></pre>
<p>If multiple parties are involved, this makes them retry in some random order. At
first everyone retries relatively quickly and that can cause further failures.
On each retry the maximum backoff is doubled, increasing the probability that
retries are not performed at the same time. It might seem somewhat
counterintuitive that waiting could improve performance, but it does so by
greatly reducing the amount of synchronisation.</p>
<p>The Kcas library already employed a backoff mechanism. Many operations used it
internally and, as their very first step, allocated an object to hold the backoff
configuration and state. To reduce overheads and make the
library more tunable, I redesigned the backoff mechanism to encode the
configuration and state in a single integer so that no allocations are required.
I also changed the operations to take the backoff as an optional argument so
that users could potentially tune the backoff for specific cases, such as when a
particular transaction should take priority and employ shorter backoffs, or the
opposite.</p>
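<p>To make the idea concrete, here is a rough sketch of how a backoff could be packed
into a single integer. It is not the Kcas implementation: the module name
<code>Int_backoff</code>, the bit layout, and the default bounds are assumptions chosen for
illustration, and it assumes OCaml 5 for <code>Domain.cpu_relax</code>.</p>
<pre><code class="language-ml">(* Hypothetical sketch: a backoff is a plain int that packs the upper bound
   and the current maximum wait, both stored as base-2 logarithms. *)
module Int_backoff = struct
  type t = int

  let bits = 5
  let mask = (1 lsl bits) - 1

  (* The low [bits] hold the current wait log, the bits above hold the cap. *)
  let create ?(lower_wait_log = 4) ?(upper_wait_log = 16) () : t =
    if lower_wait_log &lt; 0 || upper_wait_log &lt; lower_wait_log
       || upper_wait_log &gt; 29
    then invalid_arg "Int_backoff.create";
    (upper_wait_log lsl bits) lor lower_wait_log

  let wait_log (t : t) = t land mask
  let upper_wait_log (t : t) = t lsr bits
  let default = create ()

  (* Spin for a random number of iterations below 2^wait_log, then return a
     backoff whose maximum wait is doubled, capped at the upper bound. *)
  let once (t : t) : t =
    let spins = Random.int (1 lsl wait_log t) in
    for _ = 1 to spins do
      Domain.cpu_relax ()
    done;
    if wait_log t &lt; upper_wait_log t then t + 1 else t
end

(* The backoff is threaded through the retry loop as an optional argument. *)
let rec incr_with_backoff ?(backoff = Int_backoff.default) atomic =
  let n = Atomic.get atomic in
  if not (Atomic.compare_and_set atomic n (n + 1)) then
    incr_with_backoff ~backoff:(Int_backoff.once backoff) atomic
</code></pre>
<p>Because the value is an immediate integer, passing it along or updating it does not
allocate, and a caller that wants a particular operation to back off less can pass
something like <code>~backoff:(Int_backoff.create ~upper_wait_log:8 ())</code>.</p>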
<h3>Free From Obstructions</h3>
<p>The new k-CAS algorithm was efficient, but it was limited to CAS operations that
always wrote to shared memory locations. Interestingly, a CAS operation can also
express a compare (CMP) operation — just use the same value as the
expected and desired value, <code>Op.make_cas loc expected expected</code>.</p>
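<p>The same trick can be illustrated on a single location with
<code>Atomic.compare_and_set</code> from the standard library, rather than with the Kcas
<code>Op</code> API. The helper name <code>cmp</code> is made up for this sketch.</p>
<pre><code class="language-ml">(* A compare expressed as a compare-and-set whose expected and desired
   values are the same: it succeeds exactly when the location holds
   [expected], and it never changes the stored value. *)
let cmp atomic expected = Atomic.compare_and_set atomic expected expected

let () =
  let x = Atomic.make 42 in
  assert (cmp x 42);
  assert (not (cmp x 7));
  (* The value is unchanged either way. *)
  assert (Atomic.get x = 42)
</code></pre>
<p>Note that <code>Atomic.compare_and_set</code> compares with physical equality, which is
fine for integers as used here.</p>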
<p>One might wonder: what is the use of read-only operations? It is actually common
for the majority of accesses to data structures to be read-only and even
read-write operations of data structures often involve read-only accesses of
particular locations. As explained in the paper
<a href="https://people.csail.mit.edu/shanir/publications/Nonblocking%20k-compare.pdf">Nonblocking k-compare-single-swap</a>,
safely modifying a singly-linked list typically requires not only atomically
updating a pointer, but also ensuring that other pointers remain unmodified.</p>
<p>The problem with using a read-write CAS to express a read-only CMP is that, due
to the synchronisation requirements, writes to shared memory are much more
expensive than reads. Writes to a single location cannot proceed in parallel.
Multiple cores trying to "read" a location in memory using read-write CASes
would basically cause similar expensive bus traffic, or cache line ping-pong, as
with the previously described naïve increment operation — without even
attempting to logically write to memory.</p>
<p>To address this problem I extended the new
<a href="https://en.wikipedia.org/wiki/Non-blocking_algorithm#Lock-freedom">lock-free</a>
k-CAS algorithm to
<a href="https://github.com/ocaml-multicore/kcas/blob/main/doc/gkmz-with-read-only-cmp-ops.md">a brand new obstruction-free k-CAS-n-CMP algorithm</a>
that allows one to perform a combination of read-write CAS and read-only CMP
operations. The extension to k-CAS-n-CMP is a rather trivial addition to the
k-CAS algorithm. The gist of the k-CAS-n-CMP algorithm is to perform an
additional step to validate all the read-only CMP accesses before committing the
changes. This sort of validation step is a fairly common approach in
non-blocking algorithms.</p>
<p>The
<a href="https://en.wikipedia.org/wiki/Non-blocking_algorithm#Obstruction-freedom">obstruction-free</a>
k-CAS-n-CMP algorithm also retains the lock-free k-CAS algorithm as a subset. In
cases where only CAS operations are performed, the k-CAS-n-CMP algorithm does
the exact same thing as the k-CAS algorithm. This allows a transaction mechanism
based on the k-CAS-n-CMP algorithm to easily switch to using only CAS operations
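to guarantee lock-free behaviour. The difference between an obstruction-free and
a lock-free algorithm is that a lock-free algorithm guarantees that at least one
thread will be able to make progress, whereas an obstruction-free algorithm only
guarantees progress for a thread that eventually runs without interference from
others. With the obstruction-free validation step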
it is possible for two or more threads to enter a livelock situation, where they
repeatedly and indefinitely fail during the validation step. By switching to
lock-free mode, after detecting a validation failure, it is possible to avoid
such livelocks.</p>
<h3>Giving Monads a Pass</h3>
<p>The original transactional API to k-CAS actually used monadic combinators.
Gabriel Scherer suggested the alternative API based on passing a mutable
transaction log explicitly that we've already used in the examples. This has the
main advantage that such an API can be easily used with all the existing control
flow structures of OCaml, such as <code>if then else</code> and <code>for to do</code> as well as
higher-order functions like <code>List.iter</code>, that would need to be encoded with
combinators in the monadic API.</p>
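<p>The difference can be seen in a toy model that has nothing to do with Kcas itself.
In the sketch below the "log" is just a list of integers, and every name
(<code>record_evens</code>, <code>iter_m</code>, and so on) is hypothetical. With an
explicitly passed mutable log, <code>List.iter</code> and <code>if</code> work as usual. With
a monadic encoding, iteration has to be rebuilt as a combinator.</p>
<pre><code class="language-ml">(* Explicit log passing: ordinary control flow and List.iter just work. *)
let record_evens ~(log : int list ref) xs =
  List.iter (fun x -&gt; if x mod 2 = 0 then log := x :: !log) xs

(* Monadic style: an action maps a log to a result and an updated log. *)
type 'a m = int list -&gt; 'a * int list

let return x : 'a m = fun log -&gt; (x, log)

let ( let* ) (m : 'a m) (f : 'a -&gt; 'b m) : 'b m =
 fun log -&gt;
  let x, log = m log in
  f x log

let record x : unit m = fun log -&gt; ((), x :: log)

(* List.iter cannot sequence monadic actions, so a dedicated combinator
   is needed to thread the log through each step. *)
let rec iter_m f = function
  | [] -&gt; return ()
  | x :: xs -&gt;
    let* () = f x in
    iter_m f xs

let record_evens_m xs =
  iter_m (fun x -&gt; if x mod 2 = 0 then record x else return ()) xs

let () =
  let log = ref [] in
  record_evens ~log [ 1; 2; 3; 4 ];
  let (), log_m = record_evens_m [ 1; 2; 3; 4 ] [] in
  (* Both styles record the same entries. *)
  assert (!log = log_m)
</code></pre>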
<p>On the other hand, a monadic API provides a very strict abstraction barrier
against misuse as it can keep users from accessing the transaction log directly.
The transaction log itself is not thread safe and should not be accessed or
reused after it has been consumed by the main k-CAS-n-CMP algorithm. Fortunately
there is a way to make such misuse much more difficult as described in the paper
<a href="https://www.microsoft.com/en-us/research/wp-content/uploads/1994/06/lazy-functional-state-threads.pdf">Lazy Functional State Threads</a>
by employing higher-rank polymorphism. By adding a type variable to the type of
the transaction log</p>
<pre><code><span class="ocaml-keyword-other">type</span><span class="ocaml-source"> </span><span class="ocaml-storage-type">'x</span><span class="ocaml-source"> </span><span class="ocaml-source">t</span><span class="ocaml-source">
</span></code></pre>
<p>and requiring a transaction to be universally quantified</p>
<pre><code><span class="ocaml-keyword-other">type</span><span class="ocaml-source"> </span><span class="ocaml-storage-type">'a</span><span class="ocaml-source"> </span><span class="ocaml-source">tx</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-source">{</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-source">tx</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source"> </span><span class="ocaml-storage-type">'x</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source"> </span><span class="ocaml-source">xt</span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-storage-type">'x</span><span class="ocaml-source"> </span><span class="ocaml-source">t</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-storage-type">'a</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source">
</span><span class="ocaml-source">}</span><span class="ocaml-source">
</span></code></pre>
<p>with respect to the transaction log, the type system prevents a transaction log
from being reused:</p>
<pre><code><span class="ocaml-keyword-other">#</span><span class="ocaml-source"> </span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">futile</span><span class="ocaml-source"> </span><span class="ocaml-source">x</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">log</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-source">ref</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">None</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">in</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">tx</span><span class="ocaml-source"> </span><span class="ocaml-source">~</span><span class="ocaml-source">xt</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source">
</span><span class="ocaml-source">      </span><span class="ocaml-keyword-other">match</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">!</span><span class="ocaml-source">log</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">with</span><span class="ocaml-source">
</span><span class="ocaml-source">      </span><span class="ocaml-keyword-other">|</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">None</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source">
</span><span class="ocaml-source">        </span><span class="ocaml-source">log</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Some</span><span class="ocaml-source"> </span><span class="ocaml-source">xt</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source">
</span><span class="ocaml-source">        </span><span class="ocaml-source">raise</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Retry</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-constant-language-capital-identifier">Later</span><span class="ocaml-source">
</span><span class="ocaml-source">      </span><span class="ocaml-keyword-other">|</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Some</span><span class="ocaml-source"> </span><span class="ocaml-source">xt</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source">
</span><span class="ocaml-source">        </span><span class="ocaml-constant-language-capital-identifier">Xt</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">get</span><span class="ocaml-source"> ~</span><span class="ocaml-source">xt</span><span class="ocaml-source"> </span><span class="ocaml-source">x</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">in</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-constant-language-capital-identifier">Xt</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">commit</span><span class="ocaml-source"> </span><span class="ocaml-source">{</span><span class="ocaml-source"> </span><span class="ocaml-source">tx</span><span class="ocaml-source"> </span><span class="ocaml-source">}</span><span class="ocaml-source">
</span><span class="ocaml-constant-language-capital-identifier">Line</span><span class="ocaml-source"> </span><span class="ocaml-constant-numeric-decimal-integer">10</span><span class="ocaml-keyword-other-ocaml punctuation-comma punctuation-separator">,</span><span class="ocaml-source"> </span><span class="ocaml-source">characters</span><span class="ocaml-source"> </span><span class="ocaml-constant-numeric-decimal-integer">17</span><span class="ocaml-keyword-operator">-</span><span class="ocaml-constant-numeric-decimal-integer">19</span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source">
</span><span class="ocaml-constant-language-capital-identifier">Error</span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">This</span><span class="ocaml-source"> </span><span class="ocaml-source">field</span><span class="ocaml-source"> </span><span class="ocaml-source">value</span><span class="ocaml-source"> </span><span class="ocaml-source">has</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">type</span><span class="ocaml-source"> </span><span class="ocaml-source">xt</span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-storage-type">'a</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Xt</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">t</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-storage-type">'b</span><span class="ocaml-source"> </span><span class="ocaml-source">which</span><span class="ocaml-source"> </span><span class="ocaml-source">is</span><span class="ocaml-source"> </span><span class="ocaml-source">less</span><span class="ocaml-source"> </span><span class="ocaml-source">general</span><span class="ocaml-source"> </span><span class="ocaml-source">than</span><span class="ocaml-source">
</span><span class="ocaml-source">         </span><span class="ocaml-storage-type">'x</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source"> </span><span class="ocaml-source">xt</span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-storage-type">'x</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Xt</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">t</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-storage-type">'c</span><span class="ocaml-source">
</span></code></pre>
<p>It is still possible to e.g. create a closure that refers to a transaction log
after it has been consumed, but that requires effort from the programmer and
should be unlikely to happen by accident.</p>
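<p>The technique does not depend on Kcas. As a minimal self-contained sketch, the
hypothetical <code>Log</code> module below applies the same two ingredients, a phantom type
parameter on the log and a universally quantified record field, to a log that merely
collects strings.</p>
<pre><code class="language-ml">module Log : sig
  type 'x t
  (* A transaction must work for every 'x, so it cannot leak its log. *)
  type 'a tx = { tx : 'x. xt:'x t -&gt; 'a }

  val record : xt:'x t -&gt; string -&gt; unit
  val run : 'a tx -&gt; 'a * string list
end = struct
  (* The parameter 'x is a phantom: it exists only at the type level. *)
  type 'x t = { mutable entries : string list }
  type 'a tx = { tx : 'x. xt:'x t -&gt; 'a }

  let record ~xt entry = xt.entries &lt;- entry :: xt.entries

  (* Create a fresh log, run the transaction, and return what it recorded. *)
  let run { tx } =
    let xt = { entries = [] } in
    let result = tx ~xt in
    (result, List.rev xt.entries)
end

(* A well-behaved transaction is polymorphic in the log's type parameter. *)
let two_steps ~xt =
  Log.record ~xt "first";
  Log.record ~xt "second";
  42

let () =
  let result, entries = Log.run { Log.tx = two_steps } in
  assert (result = 42);
  assert (entries = [ "first"; "second" ])

(* By contrast, a function that stores [xt] in an outer reference has a
   non-generalised type for its log argument and is rejected when placed
   in the [tx] field, just like [futile]. *)
</code></pre>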
<p>The explicit transaction log passing API proved to work well and the original
monadic transaction API was later removed from the Kcas library to avoid
duplicating effort.</p>
<h3>Division of Labour</h3>
<p>When was the last time you implemented a non-trivial data structure or algorithm
from scratch? For most professionals the answer might be along the lines of
"when I took my data structures course at the university" or "when I interviewed
for the software engineering position at Big Co".</p>
<p>Kcas aims to be usable both</p>
<ul>
<li>for experts implementing correct and performant lock-free data structures, and</li>
<li>for everyone gluing together programs using such data structures.</li>
</ul>
<p>Implementing lock-free data structures, even with the help of k-CAS-n-CMP, is
not something everyone should be doing every time they are writing concurrent
programs. Instead programmers should be able to just reuse carefully constructed
data structures.</p>
<p>As an example, consider the implementation of a least-recently-used (LRU) cache
or a bounded associative map. A simple sequential approach to implement an LRU
cache is to use a hash table and a doubly-linked list and keep track of the
amount of space in the cache:</p>
<pre><code><span class="ocaml-keyword-other">type</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-storage-type">'k</span><span class="ocaml-keyword-other-ocaml punctuation-comma punctuation-separator">,</span><span class="ocaml-source"> </span><span class="ocaml-storage-type">'v</span><span class="ocaml-source">)</span><span class="ocaml-source"> </span><span class="ocaml-source">cache</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-source">{</span><span class="ocaml-source"> </span><span class="ocaml-source">space</span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source"> </span><span class="ocaml-support-type">int</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Loc</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">t</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-source">table</span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-storage-type">'k</span><span class="ocaml-keyword-other-ocaml punctuation-comma punctuation-separator">,</span><span class="ocaml-source"> </span><span class="ocaml-storage-type">'k</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Dllist</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">node</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">*</span><span class="ocaml-source"> </span><span class="ocaml-storage-type">'v</span><span class="ocaml-source">)</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Hashtbl</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">t</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-source">order</span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source"> </span><span class="ocaml-storage-type">'k</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Dllist</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">t</span><span class="ocaml-source"> </span><span class="ocaml-source">}</span><span class="ocaml-source">
</span></code></pre>
<p>On a cache lookup the doubly-linked list node corresponding to the accessed key
is moved to the left end of the list:</p>
<pre><code><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">get_opt</span><span class="ocaml-source"> </span><span class="ocaml-source">{</span><span class="ocaml-source">table</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source"> </span><span class="ocaml-source">order</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source"> </span><span class="ocaml-constant-language">_</span><span class="ocaml-source">}</span><span class="ocaml-source"> </span><span class="ocaml-source">key</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-constant-language-capital-identifier">Hashtbl</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">find_opt</span><span class="ocaml-source"> </span><span class="ocaml-source">table</span><span class="ocaml-source"> </span><span class="ocaml-source">key</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword-operator">|&gt;</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Option</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">map</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">@@</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">fun</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-source">node</span><span class="ocaml-keyword-other-ocaml punctuation-comma punctuation-separator">,</span><span class="ocaml-source"> </span><span class="ocaml-source">datum</span><span class="ocaml-source">)</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source">
</span><span class="ocaml-source">     </span><span class="ocaml-constant-language-capital-identifier">Dllist</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">move_l</span><span class="ocaml-source"> </span><span class="ocaml-source">node</span><span class="ocaml-source"> </span><span class="ocaml-source">order</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source"> </span><span class="ocaml-source">datum</span><span class="ocaml-source">
</span></code></pre>
<p>On a cache update, in case of overflow, the association corresponding to the
node on the right end of the list is dropped:</p>
<pre><code><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">set</span><span class="ocaml-source"> </span><span class="ocaml-source">{</span><span class="ocaml-source">table</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source"> </span><span class="ocaml-source">order</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source"> </span><span class="ocaml-source">space</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source"> </span><span class="ocaml-constant-language">_</span><span class="ocaml-source">}</span><span class="ocaml-source"> </span><span class="ocaml-source">key</span><span class="ocaml-source"> </span><span class="ocaml-source">datum</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">node</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-keyword-other">match</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Hashtbl</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">find_opt</span><span class="ocaml-source"> </span><span class="ocaml-source">table</span><span class="ocaml-source"> </span><span class="ocaml-source">key</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">with</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-keyword-other">|</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">None</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source">
</span><span class="ocaml-source">      </span><span class="ocaml-keyword-other">if</span><span class="ocaml-source"> </span><span class="ocaml-constant-numeric-decimal-integer">0</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Loc</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">update</span><span class="ocaml-source"> </span><span class="ocaml-source">space</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-keyword-other">fun</span><span class="ocaml-source"> </span><span class="ocaml-source">n</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-source">max</span><span class="ocaml-source"> </span><span class="ocaml-constant-numeric-decimal-integer">0</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-source">n</span><span class="ocaml-keyword-operator">-</span><span class="ocaml-constant-numeric-decimal-integer">1</span><span class="ocaml-source">)</span><span class="ocaml-source">)</span><span class="ocaml-source">
</span><span class="ocaml-source">      </span><span class="ocaml-keyword-other">then</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Dllist</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">take_opt_r</span><span class="ocaml-source"> </span><span class="ocaml-source">order</span><span class="ocaml-source">
</span><span class="ocaml-source">           </span><span class="ocaml-keyword-operator">|&gt;</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Option</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">iter</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-constant-language-capital-identifier">Hashtbl</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">remove</span><span class="ocaml-source"> </span><span class="ocaml-source">table</span><span class="ocaml-source">)</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source">
</span><span class="ocaml-source">      </span><span class="ocaml-constant-language-capital-identifier">Dllist</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">add_l</span><span class="ocaml-source"> </span><span class="ocaml-source">key</span><span class="ocaml-source"> </span><span class="ocaml-source">order</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-keyword-other">|</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Some</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-source">node</span><span class="ocaml-keyword-other-ocaml punctuation-comma punctuation-separator">,</span><span class="ocaml-source"> </span><span class="ocaml-constant-language">_</span><span class="ocaml-source">)</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Dllist</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">move_l</span><span class="ocaml-source"> </span><span class="ocaml-source">node</span><span class="ocaml-source"> </span><span class="ocaml-source">order</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source"> </span><span class="ocaml-source">node</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword-other">in</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-constant-language-capital-identifier">Hashtbl</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">replace</span><span class="ocaml-source"> </span><span class="ocaml-source">table</span><span class="ocaml-source"> </span><span class="ocaml-source">key</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-source">node</span><span class="ocaml-keyword-other-ocaml punctuation-comma punctuation-separator">,</span><span class="ocaml-source"> </span><span class="ocaml-source">datum</span><span class="ocaml-source">)</span><span class="ocaml-source">
</span></code></pre>
<p>Sequential algorithms such as the above are so common that one does not even
think about them. Unfortunately, in a concurrent setting the above doesn't work
even if the individual operations on lists and hash tables were atomic.</p>
<p>As it happens, the individual operations used above are actually atomic, because
they come from the
<a href="https://ocaml-multicore.github.io/kcas/doc/kcas_data/Kcas_data/index.html"><code>kcas_data</code></a>
package. The <code>kcas_data</code> package provides lock-free and parallelism-safe
implementations of various data structures.</p>
<p>But how would one make the operations on a cache atomic as a whole? As explained
by Maurice Herlihy in one of his talks on
<a href="https://youtu.be/ZkUrl8BZHjk?t=1503">Transactional Memory</a>, adding locks to
protect the atomicity of the operation is far from trivial.</p>
<p>Fortunately, rather than having to e.g. wrap the cache implementation behind a
<a href="https://en.wikipedia.org/wiki/Lock_(computer_science)">mutex</a> and make
another individually atomic yet uncomposable data structure, or having to learn
a completely different programming model and rewrite the cache implementation,
we can use the transactional programming model provided by the Kcas library and
the transactional data structures provided by the <code>kcas_data</code> package to
trivially convert the previous implementation to a lock-free composable
transactional data structure.</p>
<p>To make it so, we simply use transactional versions, <code>*.Xt.*</code>, of operations on
the data structures and explicitly pass a transaction log, <code>~xt</code>, to the
operations. For the <code>get_opt</code> operation we end up with</p>
<pre><code><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">get_opt</span><span class="ocaml-source"> </span><span class="ocaml-source">~</span><span class="ocaml-source">xt</span><span class="ocaml-source"> </span><span class="ocaml-source">{</span><span class="ocaml-source">table</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source"> </span><span class="ocaml-source">order</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source"> </span><span class="ocaml-constant-language">_</span><span class="ocaml-source">}</span><span class="ocaml-source"> </span><span class="ocaml-source">key</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-constant-language-capital-identifier">Hashtbl</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-constant-language-capital-identifier">Xt</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">find_opt</span><span class="ocaml-source"> ~</span><span class="ocaml-source">xt</span><span class="ocaml-source"> </span><span class="ocaml-source">table</span><span class="ocaml-source"> </span><span class="ocaml-source">key</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword-operator">|&gt;</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Option</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">map</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">@@</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">fun</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-source">node</span><span class="ocaml-keyword-other-ocaml punctuation-comma punctuation-separator">,</span><span class="ocaml-source"> </span><span class="ocaml-source">datum</span><span class="ocaml-source">)</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source">
</span><span class="ocaml-source">     </span><span class="ocaml-constant-language-capital-identifier">Dllist</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-constant-language-capital-identifier">Xt</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">move_l</span><span class="ocaml-source"> ~</span><span class="ocaml-source">xt</span><span class="ocaml-source"> </span><span class="ocaml-source">node</span><span class="ocaml-source"> </span><span class="ocaml-source">order</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source"> </span><span class="ocaml-source">datum</span><span class="ocaml-source">
</span></code></pre>
<p>and the <code>set</code> operation is just as easy to convert to a transactional version.
One way to think about transactions is that they give us back the ability to
compose programs such as the above. But, I digress, again.</p>
<p>It was not immediately clear whether Kcas would be efficient enough. A simple
node-based queue, for example, seemed to be significantly slower than an
implementation of
<a href="https://www.cs.rochester.edu/~scott/papers/1996_PODC_queues.pdf">the Michael-Scott queue</a>
using atomics. How so? The reason is fundamentally very simple. Every shared
memory location takes more words of memory, every update allocates more, and the
transaction log also allocates memory. All the extra words of memory need to be
written to by the CPU and this invariably takes some time and slows things down.</p>
<p>For the implementation of high-performance data structures it is important to
offer ways, such as the ability to take advantage of the specifics of the
transaction log, to help ensure good performance. A common lock-free algorithm
design technique is to publish the desire to perform an operation so that other
parties accessing the same data structure can help to complete the operation.
With some care, and the ability to check whether a location has already been accessed
within a transaction, it is possible to implement such algorithms with Kcas as well.</p>
<p>Using such low-level lock-free techniques, it was possible to implement a queue
using three stacks:</p>
<pre><code><span class="ocaml-keyword-other">type</span><span class="ocaml-source"> </span><span class="ocaml-storage-type">'a</span><span class="ocaml-source"> </span><span class="ocaml-source">t</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-source">{</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-source">front</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source"> </span><span class="ocaml-storage-type">'a</span><span class="ocaml-source"> </span><span class="ocaml-source">list</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Loc</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">t</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-source">middle</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source"> </span><span class="ocaml-storage-type">'a</span><span class="ocaml-source"> </span><span class="ocaml-source">list</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Loc</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">t</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-source">back</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source"> </span><span class="ocaml-storage-type">'a</span><span class="ocaml-source"> </span><span class="ocaml-source">list</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Loc</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">t</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source">
</span><span class="ocaml-source">}</span><span class="ocaml-source">
</span></code></pre>
<p>The front stack is reversed so that, most of the time, taking an element from
the queue simply requires popping the top element from the stack. Similarly,
adding an element to the queue just requires pushing the element to the top of the
back stack. The difficult case is when the front becomes empty and it is
necessary to move elements from the back to the front.</p>
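<p>The front and back stacks can be illustrated with a minimal sequential sketch that uses only plain lists from the standard library. It is not the Kcas implementation: it has no middle stack, it is not safe for concurrent use, and the names <code>queue</code>, <code>push</code>, and <code>pop_opt</code> are chosen here purely for illustration.</p>

```ocaml
(* Sequential sketch of a queue built from two stacks (plain lists).
   Illustration only: no middle stack, not safe for concurrent use. *)
type 'a queue = {
  mutable front : 'a list;  (* oldest element at the head, ready to pop *)
  mutable back : 'a list;   (* newest element at the head *)
}

let create () = { front = []; back = [] }

(* Adding an element pushes it onto the back stack. *)
let push q x = q.back <- x :: q.back

(* Taking an element pops from the front stack. When the front is empty,
   the back is reversed and becomes the new front. *)
let rec pop_opt q =
  match q.front with
  | x :: rest ->
    q.front <- rest;
    Some x
  | [] -> (
    match q.back with
    | [] -> None
    | back ->
      q.front <- List.rev back;
      q.back <- [];
      pop_opt q)

let () =
  let q = create () in
  List.iter (push q) [ 1; 2; 3 ];
  assert (pop_opt q = Some 1);
  push q 4;
  assert (pop_opt q = Some 2);
  assert (pop_opt q = Some 3);
  assert (pop_opt q = Some 4);
  assert (pop_opt q = None)
```

<p>In this sequential version the reversal happens in a single step inside <code>pop_opt</code>. In a concurrent setting that step is the difficult case, because it touches both stacks.</p>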
<p>The third stack, the middle, acts as a temporary location: elements moved from
the back are published there, signalling the intent to reverse them onto the
front of the queue. The operation to move the back stack to
the middle can be done outside of the transaction, as long as the back and the
middle have not yet been accessed within the transaction.</p>
<p>The three-stack queue turned out to perform well: better, for example,
than some non-compositional lock-free queue implementations. While Kcas adds
overhead, it also makes it easier to use more sophisticated data structures and
algorithms. Use of the middle stack, for example, requires atomically updating
multiple locations. With plain single-word atomics that is non-trivial.</p>
<p>Similar techniques also allowed the <code>Hashtbl</code> implementation to perform various
operations on the whole hash table in ways that avoid otherwise likely
starvation issues with large transactions.</p>
<h3>Intermission</h3>
<p>This concludes the first part of this two-part post. In the next part we will
continue our discussion on the development of Kcas, starting with the addition
of a fundamentally new feature to Kcas which turns it into a proper STM
implementation.</p>
]]></description><link>https://tarides.com/blog/2023-08-07-kcas-building-a-lock-free-stm-for-ocaml-1-2</link><guid isPermaLink="false">https://tarides.com/blog/2023-08-07-kcas-building-a-lock-free-stm-for-ocaml-1-2.html</guid><dc:creator><![CDATA[ Vesa Karvonen ]]></dc:creator><pubDate>Mon, 07 Aug 2023 00:00:00 GMT</pubDate></item><item><title><![CDATA[OBuilder on macOS]]></title><description><![CDATA[<p>The CI team at Tarides provides critical infrastructure to support the OCaml community. At the heart of that infrastructure is a cluster of machines for running jobs. This blog post details how we improved our support for macOS and moved closer to our goal of supporting all Tier 1 OCaml platforms.</p>
<p>In 2022, Patrick Ferris of Tarides successfully implemented a macOS worker for <a href="https://github.com/ocurrent/obuilder">OBuilder</a>. The workers were added to <a href="https://opam.ci.ocaml.org"><code>opam-repo-ci</code></a> and <a href="https://ocaml.ci.dev">OCaml CI</a>, and this work was presented at the <a href="https://icfp22.sigplan.org/details/ocaml-2022-papers/8/Homogeneous-builds-with-OBuilder-and-OCaml">OCaml workshop in 2022</a> (<a href="https://watch.ocaml.org/w/64N6AFMfrfz7wpNJ5rsJsQ">video</a>).</p>
<p>Since then, I have taken over day-to-day responsibility for the macOS workers. This work builds upon those foundations to achieve a greater throughput of jobs on the existing Apple hardware. Originally, we launched macOS support using rsync for snapshots and user accounts for sandboxing and process isolation. At the time, we identified that this architecture was likely to be relatively slow<sup><a href="#fn-1" id="ref-1-fn-1" role="doc-noteref" class="fn-label">[1]</a></sup> given the overhead of using rsync over native file system snapshots.</p>
<p>This post describes how we switched the snapshots over to use ZFS, which has improved the I/O throughput, leading to more jobs built per hour. It also removed our use of MacFUSE, both simplifying the setup and further improving the I/O throughput.</p>
<h2>OBuilder</h2>
<p>The OBuilder library is the core of Tarides' CI Workers <sup><a href="#fn-2" id="ref-1-fn-2" role="doc-noteref" class="fn-label">[2]</a></sup>. OCaml CI, <code>opam-repo-ci</code>, OCurrent Deployer, OCaml Docs CI, and the Base Image Builder all generate jobs which need to be executed by OBuilder across a range of platforms. A central scheduler accepts job submissions and passes them off to individual workers running on physical servers. These jobs are described in a build script similar to a Dockerfile.</p>
<p>OBuilder takes the build scripts and performs its steps in a sandboxed environment. After each step, OBuilder uses the snapshot feature of the filesystem (ZFS or Btrfs) to store the state of the build. There is also an rsync backend that copies the build state. On Linux, it uses <code>runc</code> to sandbox the build steps, but any system that can run a command safely in a chroot could be used. Repeating a build will reuse the cached results.</p>
<p>It is worth briefly expanding upon this description to understand the typical steps OBuilder takes. Upon receiving a job, OBuilder loads the base image as the starting point for the build process. A base image contains an opam switch with an OCaml compiler installed and a Git clone of <code>opam-repository</code>. These base images are built periodically into Docker images using the <a href="https://images.ci.ocaml.org">Base Image Builder</a> and published to <a href="https://hub.docker.com/r/ocaml/opam">Docker Hub</a>. Steps within the job specification could install operating system packages and opam libraries before finally building the test package and executing any tests. A filesystem snapshot of the working folder is taken between each build step. These snapshots allow each step to be cached, if the same job is executed again or identical steps are shared between jobs. Additionally, the opam package download folder is shared between all jobs.</p>
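<p>The way snapshots let identical steps be shared between jobs can be sketched with a small model. This is a hypothetical illustration, not OBuilder's actual code: the hashing scheme, the in-memory <code>store</code>, and the names <code>step_id</code>, <code>run_step</code>, and <code>run_job</code> are all assumptions made for the example, and a snapshot is modelled simply as the list of steps applied so far.</p>

```ocaml
(* Hypothetical sketch of step-level caching; the names and the hashing
   scheme are illustrative assumptions, not OBuilder's actual code. *)

(* A step is identified by hashing its parent's id together with the text
   of the step, so identical prefixes of two jobs share the same ids. *)
let step_id ~parent step = Digest.to_hex (Digest.string (parent ^ "\n" ^ step))

(* The store maps step ids to snapshots. A snapshot is modelled here as
   the list of steps applied so far (most recent first). *)
let store : (string, string list) Hashtbl.t = Hashtbl.create 16

(* Number of steps actually executed, to make cache hits observable. *)
let executed = ref 0

(* Run one step on top of a parent snapshot, reusing a cached snapshot
   when one exists. Returns the id and state of the resulting snapshot. *)
let run_step (parent, state) step =
  let id = step_id ~parent step in
  match Hashtbl.find_opt store id with
  | Some snapshot -> (id, snapshot)
  | None ->
    incr executed;
    let snapshot = step :: state in
    Hashtbl.replace store id snapshot;
    (id, snapshot)

(* A job is a base image followed by a list of steps. *)
let run_job ~base steps = List.fold_left run_step (base, []) steps

let () =
  let job_a = [ "install os packages"; "opam install deps"; "build A" ] in
  let job_b = [ "install os packages"; "opam install deps"; "build B" ] in
  ignore (run_job ~base:"base-image" job_a);
  ignore (run_job ~base:"base-image" job_b);
  (* Job A runs three steps; job B reuses the first two and runs one. *)
  Printf.printf "steps executed: %d\n" !executed
```

<p>Because each id depends on the parent's id, changing an early step invalidates every later step in that job, while jobs that share a prefix share the cached results for it.</p>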
<p>On Linux-based systems, the file system snapshots are performed by Btrfs and process isolation is performed via <code>runc</code>. A ZFS implementation of file system snapshots and a pseudo implementation using rsync are also available. Given sufficient system resources, tens or hundreds of jobs can be executed concurrently.</p>
<h2>The macOS Challenges</h2>
<p>macOS is a challenging system for OBuilder because there is no native container support. We must manually recreate the sandboxing needed for the build steps using user isolation. Furthermore, macOS operating system packages are installed via Homebrew, and the Homebrew installation folder is not relocatable. It is either <code>/usr/local</code> on Intel x86_64 or <code>/opt/homebrew</code> on Apple silicon (ARM64). The Homebrew documentation includes the warning <strong>Pick another prefix at your peril!</strong>, and the internet is littered with bug reports from those who have ignored this warning. For building OCaml, the per-user <code>~/.opam</code> folder is relocatable by setting the environment variable <code>OPAMROOT=/path</code>; however, once set it cannot be changed, as the full path is embedded in the built objects.</p>
<p>We need a sandbox that includes the user's home directory and the Homebrew folder.</p>
<h2>Initial Solution</h2>
<p>The initial macOS solution used dummy users for the base images, user isolation for the sandbox, a FUSE file system driver to redirect the Homebrew installation, and rsync to create file system snapshots.</p>
<p>For each step, OBuilder used rsync to copy the required snapshot from the store to the user’s home directory. The FUSE file system driver redirected filesystem access to <code>/usr/local</code> to the user’s home directory. This allowed the state of the Homebrew installation to be captured along with the opam switch held within the home directory. Once the build step was complete, rsync copied the current state back to the OBuilder store. The base images exist in dummy users' home directories, which are copied to the active user when needed.</p>
<p>The implementation was reliable but was hampered by I/O bottlenecks, and the lack of opam caching meant that we quickly hit GitHub's download rate limit.</p>
<h2>A New Implementation</h2>
<p>OBuilder already supported ZFS, which could be used on macOS through the <a href="https://openzfsonosx.org">OpenZFS on OS X</a> project. The ZFS and other store implementations hold a single working directory as the root for the <code>runc</code> container. On macOS, we need the sandbox to contain both the user’s home directory and the Homebrew installation, but these locations need to be <em>in place</em> within the file system. This was achieved by adding two ZFS subvolumes mounted on these paths.</p>
<div role="region"><table>
<tbody><tr>
<th>ZFS Volume</th>
<th>Mount point</th>
<th>Usage</th>
</tr>
<tr>
<td>obuilder/result/&lt;sha&gt;</td>
<td>/Volumes/obuilder/result/&lt;sha&gt;</td>
<td>Job log</td>
</tr>
<tr>
<td>obuilder/result/&lt;sha&gt;/home</td>
<td>/Users/mac1000</td>
<td>User’s home directory</td>
</tr>
<tr>
<td>obuilder/result/&lt;sha&gt;/brew</td>
<td>/opt/homebrew or /usr/local</td>
<td>Homebrew installation</td>
</tr>
</tbody></table></div><p>The ZFS implementation was extended to work recursively on the result folder, thereby including the subvolumes in the snapshot and clone operations. The sandbox is passed the ZFS root path and can mount the subvolumes to the appropriate mount points within the file system. The build step is then executed as a local user.</p>
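<p>The layout in the table above can be written down as a small function from a result identifier to its datasets and mount points. This is only an illustrative sketch: the <code>arch</code> type and the names <code>homebrew_prefix</code> and <code>result_datasets</code> are assumptions for the example, while the dataset names and mount points follow the table.</p>

```ocaml
(* Illustrative sketch only: the types and function names are assumptions;
   the dataset names and mount points follow the table in the post. *)
type arch = X86_64 | Arm64

(* Homebrew's fixed installation prefix for each architecture. *)
let homebrew_prefix = function
  | X86_64 -> "/usr/local"
  | Arm64 -> "/opt/homebrew"

(* Datasets belonging to one result, paired with their mount points. *)
let result_datasets ~arch ~user ~id =
  let root = "obuilder/result/" ^ id in
  [ (root, "/Volumes/" ^ root);
    (root ^ "/home", "/Users/" ^ user);
    (root ^ "/brew", homebrew_prefix arch) ]

let () =
  (* "abc123" stands in for a real result hash. *)
  result_datasets ~arch:Arm64 ~user:"mac1000" ~id:"abc123"
  |> List.iter (fun (dataset, mount_point) ->
         Printf.printf "%-32s -> %s\n" dataset mount_point)
```

<p>Keeping the home and Homebrew subvolumes under the result dataset is what allows a single recursive snapshot or clone to capture all three together.</p>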
<p>The ZFS store and OBuilder job specification already included support for caching arbitrary folders. The sandbox was updated to use this feature to cache both the opam and the Homebrew download folders.</p>
<p>To create the initial base image, empty folders are mounted on the user home directory and Homebrew folder, then a shell script installs opam, OCaml, and a Git clone of the opam repository. When a base image is initially needed, the ZFS volume can be cloned as the basis of the first step. This replaces the Docker base images with OCaml and opam installed in them used by the Linux OBuilder implementation.</p>
<div role="region"><table>
<tbody><tr>
<th>ZFS Volumes for macOS Homebrew Base Image for OCaml 4.14</th>
</tr>
<tr>
<td>obuilder/base-image/macos-homebrew-ocaml-4.14</td>
</tr>
<tr>
<td>obuilder/base-image/macos-homebrew-ocaml-4.14/brew</td>
</tr>
<tr>
<td>obuilder/base-image/macos-homebrew-ocaml-4.14/home</td>
</tr>
</tbody></table></div><h2>Performance Improvements</h2>
<p>The rsync store was written for portability, not efficiency, and copying the files between each step quickly becomes the bottleneck. ZFS significantly improves efficiency through native snapshots and mounting the data at the appropriate point within the file system. However, this is not without cost, as unmounting a file system causes the disk-write cache to be flushed.</p>
<p>The ZFS store keeps all of the cache steps mounted. With a large cache disk (&gt;100GB), the store could reach several thousand result steps. As the number of mounted volumes increases, macOS’s disk arbitration service takes exponentially longer to mount and unmount the file systems. Initially, the number of cache steps was artificially limited to keep the mount/unmount times within acceptable limits. Later, the ZFS store was updated to unmount unused volumes between each step.</p>
<p>The rsync store did not support caching of the opam downloads folder. This quickly led us to hit the download rate limits imposed by GitHub. Homebrew is also hosted on GitHub; therefore, these steps were also impacted. The list of folders shared between jobs is part of the job specification and was already passed to the sandbox, but it was not implemented. The job specification was updated to include the Homebrew downloads folder, and the shared cache folders were mounted within the sandbox.</p>
<p>Throughput has improved approximately fourfold. The rsync backend typically managed four jobs per hour. With ZFS, we typically see job rates of 16 jobs per hour. The best recorded rate with ZFS is over 100 jobs per hour!</p>
<h2>Multi-User Considerations</h2>
<p>The rsync and ZFS implementations are limited to running one job at a time, which limits the throughput of jobs on macOS. It would be ideal if the implementation could be extended to support concurrent jobs; however, with user isolation, it is unclear how this could be achieved, as the full path of the OCaml installation is included in numerous binary files within the <code>~/.opam</code> directory. Thus, opam installed in <code>/Users/foo/.opam</code> could not be mounted as <code>/Users/bar/.opam</code>. The other issue with supporting multiple users is that Homebrew is not designed to be used by multiple Unix users. A given Homebrew installation is only meant to be used by a single non-root user.</p>
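<p>One rough way to see the embedded-path problem is to scan an installation for files whose bytes contain the absolute prefix. The sketch below is only an illustration in Python and is not part of OBuilder; the function name <code>files_embedding_path</code> and the demo file names are hypothetical.</p>

```python
from pathlib import Path


def files_embedding_path(root: Path, prefix: str) -> list[Path]:
    """Return the files under `root` whose raw bytes contain `prefix`."""
    needle = prefix.encode()
    hits = []
    for path in sorted(root.rglob("*")):
        # Only regular files are read; directories and symlinks are skipped.
        if path.is_file() and not path.is_symlink() and needle in path.read_bytes():
            hits.append(path)
    return hits


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        # Stand-in for a compiled file that recorded its install prefix.
        (root / "tool.bin").write_bytes(b"\x00\x01/Users/foo/.opam/default/lib\x00")
        # Stand-in for a relocatable file with no absolute path inside.
        (root / "README").write_bytes(b"no absolute path here")
        for hit in files_embedding_path(root, "/Users/foo/.opam"):
            print(hit.name)
```

<p>Any file reported by such a scan would still refer to the old location if the same directory were mounted under another user's home, which is why the volume cannot simply be remounted for a second user.</p>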
<h2>Summary</h2>
<p>With this work adding macOS support to OBuilder using ZFS, the cluster provides workers for macOS on both x86_64 and ARM64. This capability is available to all CI systems managed by Tarides. Initial support has been added to <code>opam-repo-ci</code> to provide builds for the opam repository, allowing us to check that packages build on macOS. We have also added support to OCaml-CI to provide builds for GitHub and GitLab hosted projects, and there is work in progress to provide macOS builds for testing OCaml's Multicore support. macOS builds are an important piece of our goal to provide builds on all Tier 1 OCaml platforms. We hope you find it useful too.</p>
<p>All the code is open source and available on <a href="https://github.com/ocurrent">github.com/ocurrent</a>.</p>
<section role="doc-endnotes"><ol>
<li id="fn-1">
<p>As compared to other workers where native snapshots are available, such as BTRFS on Linux.</p>
<span><a href="#ref-1-fn-1" role="doc-backlink" class="fn-label">↩︎︎</a></span></li><li id="fn-2">
<p>In software development, a "Continuous Integration (CI) worker" is a computing resource responsible for automating the process of building, testing, and deploying code changes in Continuous Integration systems.</p>
<span><a href="#ref-1-fn-2" role="doc-backlink" class="fn-label">↩︎︎</a></span></li></ol></section>
]]></description><link>https://tarides.com/blog/2023-08-02-obuilder-on-macos</link><guid isPermaLink="false">https://tarides.com/blog/2023-08-02-obuilder-on-macos.html</guid><dc:creator><![CDATA[ Mark Elvers ]]></dc:creator><pubDate>Wed, 02 Aug 2023 00:00:00 GMT</pubDate></item><item><title><![CDATA[OCaml in Space - Welcome SpaceOS!]]></title><description><![CDATA[<p>Our mission is to build sustainable and secure software infrastructure that will not only work for decades but also positively impact the world. This includes our work on essential open-source libraries and tooling in the OCaml space, but also extends to include cutting-edge innovation through MirageOS technologies. We are investigating mission-critical IoT use cases: one of which is facilitating the deployment of secure high-performance applications in space to help data scientists write models that run on satellite-generated data. In this post, we present our solution that does just that: SpaceOS.</p>
<p>The satellite industry is transforming! As a result, an exciting commercial space industry is emerging – one that industry professionals are increasingly referring to as ‘NewSpace’.</p>
<h2>The NewSpace Opportunity</h2>
<p>For those unfamiliar with NewSpace, here is a brief overview. Historically, satellites have been owned and operated by large and powerful companies that could afford the costs inherent in their design, launch, and operation. In addition to their high cost of production, this generation of satellites rarely changes their software/hardware configuration to avoid operational risk, and consequently operates in the same way a decade after its launch.</p>
<p>The high cost and lack of software flexibility have made it difficult for smaller companies to enter the market, disincentivising the development of technologies that require the capabilities of satellites. A timely and broad example with many use cases is earth observation, including monitoring volcanic activity, forest fires, agriculture, and oil spill detection.</p>
<p>Fast forward to today. New technologies – resulting in smaller satellites and significant reductions in launch costs – as well as new business models such as shared satellites and satellites as a service, now make it possible for many smaller companies to benefit from satellite capabilities. <a href="https://ourworldindata.org/grapher/yearly-number-of-objects-launched-into-outer-space">More satellites have been launched into space in the last two years than in the fifty years before</a>. Welcome to NewSpace, where multi-user and multi-mission satellites are becoming the norm!</p>
<h2>NewSpace Needs New Software</h2>
<p>NewSpace requires new software capabilities. The traditional and outdated practice of launching satellites and leaving them untouched for 15-20 years is no longer effective.</p>
<p>NewSpace requires the ability to run software from multiple users on the same satellites whilst maintaining software isolation (between applications and data of different users), as well as complete separation from the flight system software. Software must also be easy to update to allow for software innovation (for instance, to use a new machine learning inference algorithm) or to enable the new concept of usage-based models (where users pay for time spent or resources used). Existing platforms are not able to satisfy these new software requirements.</p>
<p>Many satellite operators either develop their own custom software stack (including their own operating system) or use complex Cloud-native software, such as Docker and Kubernetes, to manage multi-user and multi-mission needs. Cloud-native technologies are suboptimal in this context and, in particular, are inefficient for resource-constrained onboard satellite computing systems. There is a need for an alternative solution that is secure, efficient and easy to use.</p>
<h3>Welcome to SpaceOS!</h3>
<p>SpaceOS is an operating system that is secure by design, providing complete isolation between user software paired with effortless software updates.</p>
<p>Multipurpose: Currently, there is no standard OS for satellites. Launching your software on a satellite platform requires you to write your own software based on different satellite and satellite service provider specifications. SpaceOS ensures compatibility across multiple satellites and service providers, ensuring you only need to write your software once.</p>
<p>Flexible: With SpaceOS, software updates are easy. Users can choose from powerful containerisation options, or opt to run on bare metal.</p>
<p>Compact: SpaceOS is small. A recent demonstration showcased that for an earth observation application, SpaceOS was 20 times smaller when compared to the classic Kubernetes approach, also requiring less memory and processing power.</p>
<p>Secure: SpaceOS is built on stable and safe programming logic (read on for details about the memory safety of OCaml) and <a href="https://mirage.io">MirageOS</a> <a href="https://queue.acm.org/detail.cfm?id=2566628">unikernel technology</a>. The <a href="https://mirageos.org/blog/bitcoin-pinata-results">MirageOS Bitcoin Pinata</a> is an example of a very successful, efficient, and transparent bug bounty program. Over 3 years the pinata was exposed to 150,000 hack attacks without success. Since MirageOS-style unikernels also power the SpaceOS solution, this test is a good indication of its cybersecurity strength.</p>
<h2>How is This Huge Leap in OS Technology Possible?</h2>
<p>Adapting to rapid development in any field often necessitates a paradigm shift. The order-of-magnitude improvements that SpaceOS provides over existing alternatives are only made possible due to fundamental changes in the underlying technology.</p>
<p>How can a software platform provide the powerful OS environment required for NewSpace?
To explain, one must understand what unikernels are and how the design of a programming language directly impacts its cybersecurity vulnerability.</p>
<h3>Unikernels: A Shift in OS Philosophy</h3>
<p>Let us talk about how operating systems generally work. Most operating systems have been built with the aim of running on lots of different kinds of hardware, and supporting lots of different kinds of applications (many of which don’t exist yet when the OS is released and installed). This means that the operating system (such as Windows, Linux, macOS etc.) is optimised for broad compatibility, and is designed and built to provide a compelling platform for any application the user might need. This could include printer drivers, Bluetooth protocols, graphics card support, file system management, a range of network protocols, or user-space components such as <code>systemd</code>, ssh, logging systems… the list goes on.</p>
<p>In theory, a standard OS can service any number of applications. In practice, support for a wide range of applications that only “might” be used commonly leads to a large, resource-intensive OS vulnerable to cyber attack. Typically, any one application only requires a subset of the complete OS, and all of that extra functionality results in wasted resources and increased risk.</p>
<p>SpaceOS uses a different approach based on unikernel technology, and instead of being a general-purpose OS for any application, it is specialised for one unique application. In the build phase, SpaceOS analyses the application to determine the requirements for runtime. For example, if the application doesn’t require Bluetooth or a sound driver, these functionalities will not be included in the OS. The OS creates a highly specialised, efficient, and compact executable with a significantly smaller attack surface, specifically designed for its single use case.</p>
<p>This kind of unikernel technology is not yet widely used commercially, but recent examples of mission-critical applications include the <a href="https://galois.com/project/cyberchaff/">CyberChaff</a> joint project between the US Department of Defense (DOD) and <a href="https://galois.com/">Galois</a>, and the <a href="https://www.nitrokey.com/products/nethsm">NetHSM security module</a> from Tarides partners, <a href="https://www.nitrokey.com/">Nitrokey</a>.</p>
<h3>OCaml: Memory Safety by Design</h3>
<p>SpaceOS has a second “secret” to add to the mix: it uses a memory-safe programming language called <a href="/blog/2022-11-22-six-surprising-reasons-the-ocaml-programming-language-is-good-for-business/">OCaml</a>. The Cybersecurity and Infrastructure Security Agency (CISA) published <a href="https://www.cisa.gov/sites/default/files/2023-04/principles_approaches_for_security-by-design-default_508_0.pdf">a report emphasising the importance of Secure-By-Design principles as mitigation against cyber intrusions</a>. Some widely used languages (such as C or C++) are not memory safe and, therefore, vulnerable by design. With memory-related attacks being the most common cyber attack, forming <a href="https://www.itpro.com/security/zero-day-exploit/360447/why-zero-day-exploits-are-surging-on-an-unprecedented-scale">70% of all zero-day attacks</a>, the <a href="https://www.nsa.gov/Press-Room/News-Highlights/Article/Article/3215760/nsa-releases-guidance-on-how-to-protect-against-software-memory-safety-issues/">NSA (USA National Security Agency) also recommends using memory-safe languages</a>.</p>
<p>This is why we have chosen OCaml for SpaceOS. OCaml is purposefully designed and developed with safety and performance in mind, and therefore we can confidently say that SpaceOS is “secure by design”.
<a href="/blog/2023-07-05-zero-day-attacks-what-are-they-and-can-a-language-like-ocaml-protect-you/">Read more about how OCaml can protect you against zero-day attacks</a>.</p>
<h2>Conclusion</h2>
<p>SpaceOS, with its underlying “secure by design” unikernel technology, is a powerful and innovative platform for in-space IoT and edge computing (with many other potential applications for mission-critical IoT use cases). By combining the performance and safety of OCaml with the specialisation and flexibility of unikernels, we aim to revolutionise the capabilities of NewSpace.</p>
<p>No other alternative offers similar capabilities today, which explains the very strong interest and many partnership discussions we are having with companies and organisations such as <a href="https://www.thalesgroup.com/en/global/activities/space">Thales TAS</a>, <a href="https://www.esa.int/">ESA</a>, <a href="https://cnes.fr/en">CNES</a>, <a href="https://infiniteorbits.io/">Infinite Orbits</a>, <a href="https://www.space.org.sg/">Singapore Space Agency</a>, <a href="https://www.ohb-hellas.gr/">OHB</a>, <a href="https://www.eutelsat.com/en/home.html">Eutelsat</a>, <a href="https://www.dorbit.space/">D-Orbit</a>, and more.</p>
<p>Stay tuned to hear how SpaceOS will become the new global standard for NewSpace satellites and <a href="/contact/">get in touch</a> if you have any questions.</p>
]]></description><link>https://tarides.com/blog/2023-07-31-ocaml-in-space-welcome-spaceos</link><guid isPermaLink="false">https://tarides.com/blog/2023-07-31-ocaml-in-space-welcome-spaceos.html</guid><dc:creator><![CDATA[ Miklos Tomka ]]></dc:creator><pubDate>Mon, 31 Jul 2023 00:00:00 GMT</pubDate></item><item><title><![CDATA[Reflections on the MirageOS Retreat in Morocco]]></title><description><![CDATA[<h2>Introduction</h2>
<p>Since we are a hybrid remote and distributed company, everyone at Tarides knows first-hand how important in-person retreats are for collaborating on software development. They give us a chance to focus more deeply on our work, collaborate closely, and learn from one another. We are particularly enthusiastic about the MirageOS retreats, which are organised by <a href="https://github.com/hannesm">@hannesm</a> from <a href="https://robur.coop/">Robur</a> and happen once or twice a year. These retreats bring together OCaml programmers and MirageOS enthusiasts from all over the world to share ideas and work on projects.</p>
<p>For those unacquainted with it, <a href="https://mirage.io/">MirageOS</a> is a library operating system that lets users create 'unikernels' – light-weight, single-purpose machine images designed for secure, efficient, high-performance applications. MirageOS unikernels are written in OCaml, which is a functional, semantically-rich, and type-safe programming language.</p>
<p>This blog post offers a glimpse into our journey to the recent MirageOS retreat, which took us to Morocco. We will share our most memorable experiences from the retreat - the personal stories, the community bonding, the projects we worked on, and the things we learned. So, buckle up and join us as we reminisce on a journey of technical exploration and personal growth under the Moroccan sky.</p>
<h2>The Journey to Morocco</h2>
<p>Our experience of the MirageOS retreat was as much about the journey as it was about the destination. Some of us started our trip with a train ride to La Feria in Seville. It's a big spring festival in Spain that we were excited to experience. From there, we headed to Cadiz, another city in Spain known for its history and food.</p>
<p>Our final destination was Marrakech, Morocco. We stayed at a traditional Moroccan house called a riad, named the Queen of the Medina. The owner of the riad was very welcoming, and the house was comfortable and filled with local art. During our stay, we shared rooms and meals, growing closer to the rest of the community.</p>
<p>Right outside our riad was the famous Jemaa el-Fnaa square. It's a busy marketplace and a UNESCO World Heritage site, filled with music, food stalls, and plenty of action – especially in the evenings.</p>
<p>The journey to Morocco and the experiences we had along the way helped set the stage for a productive and enjoyable retreat.</p>
<h2>The MirageOS Retreat Experiences</h2>
<p>At the heart of the retreat was the daily 'circle'. Each day, we gathered together to share our experiences and discuss what we had been working on. These discussions provided insights into the different projects, and it was inspiring to hear about the progress that each person was making, often with the help of other participants at the retreat.</p>
<p>One highlight of the retreat was the night-time presentations. These covered various subjects, and not all were directly related to OCaml or MirageOS. The diversity of topics always sparked interesting conversations and created opportunities for us to learn from each other.</p>
<p>Throughout the retreat, a topic that came up often was how to increase the adoption of MirageOS. This spurred a lot of creative thinking, as we brainstormed new ways to promote the wider use of MirageOS.</p>
<p>And of course, we also had the opportunity to work on personal projects. Two of us, for instance, added the Git server commands to the <code>ocaml-git</code> library. Another project played music from a bare-metal Raspberry Pi 4!</p>
<p>But the retreat wasn't all work. We also found time for fun and relaxation. One memorable activity was the contact improvisation dance accompanied by live music, in which several retreat participants took part. After the week-long retreat ended, some of us stayed in Morocco to visit the Atlas Mountains, climb Jbel Toubkal, and go surfing in Imsouane.</p>
<h2>Projects and Collaborations</h2>
<p>During the retreat, we split into small groups of one or two engineers to work on different projects. The projects let us explore different aspects of Mirage that we found interesting and test the boundaries of what Mirage can do. Some of our projects included:</p>
<h3>MIDI over Bare-Metal Raspberry Pi</h3>
<p><em>Contributors: <a href="https://github.com/pitag-ha">@pitag-ha</a>, <a href="https://github.com/Engil">@Engil</a></em></p>
<p>We set out to explore the capabilities of a Raspberry Pi (RPi) in handling MIDI signals. We had a host of adapters at our disposal, including a GPIO board with MIDI DIN plugs and an adapter that could transform MIDI DIN to USB. Our colleague <a href="https://github.com/engil">@Engil</a>, who was one of the two main people working on this project, brought a synthesiser, enabling us to establish a direct connection to the GPIO board of the RPi. Additionally, the DIN-USB cable allowed us to connect our computers to the RPi for debugging purposes, using a program called <a href="https://github.com/surfacepatterns/midisnoop">Midisnoop</a>.</p>
<p>Our primary objective was to send MIDI output from the RPi, which proved to be a straightforward task. We made use of <a href="https://github.com/Dinosaure">@Dinosaure</a>'s bare-metal RPi toolchain, <a href="https://github.com/dinosaure/gilbraltar">Gilbraltar</a>, which already had UART write support for logging. Since MIDI operates on a serial protocol, we decided to send it over UART as well. We adjusted the baud rate, converted some music into MIDI bytes, and sent it to the UART. Thanks to the functionalities already present in Gilbraltar, we managed to play the intro of "Mr Sandman"!</p>
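<p>To give a feel for what "converting some music into MIDI bytes" involves, here is a minimal sketch in Python rather than the OCaml code used at the retreat. It relies only on standard MIDI 1.0 facts: the serial link runs at 31,250 baud, and Note On and Note Off are three-byte messages with status bytes <code>0x90</code> and <code>0x80</code> plus the channel number. The helper names and the three-note melody are hypothetical, and the timing between messages (how long each note sounds) is left to the sender.</p>

```python
MIDI_BAUD_RATE = 31250  # standard MIDI 1.0 serial speed, in bits per second

NOTE_ON = 0x90   # status byte for Note On, channel 0
NOTE_OFF = 0x80  # status byte for Note Off, channel 0


def _check(value: int, limit: int, name: str) -> None:
    # Data bytes are 7-bit and channels are 4-bit, so reject anything outside.
    if value not in range(limit):
        raise ValueError(f"{name} out of range: {value}")


def note_on(note: int, velocity: int = 64, channel: int = 0) -> bytes:
    """Three-byte Note On message: status, note number, velocity."""
    _check(note, 128, "note")
    _check(velocity, 128, "velocity")
    _check(channel, 16, "channel")
    return bytes([NOTE_ON + channel, note, velocity])


def note_off(note: int, channel: int = 0) -> bytes:
    """Three-byte Note Off message with release velocity 0."""
    _check(note, 128, "note")
    _check(channel, 16, "channel")
    return bytes([NOTE_OFF + channel, note, 0])


def melody_to_midi(notes: list[int]) -> bytes:
    """Concatenate Note On / Note Off pairs for a monophonic melody."""
    out = bytearray()
    for note in notes:
        out += note_on(note)
        out += note_off(note)
    return bytes(out)


def wire_time_seconds(n_bytes: int) -> float:
    """Time on the wire: each byte is framed as 10 bits (start, 8 data, stop)."""
    return n_bytes * 10 / MIDI_BAUD_RATE


if __name__ == "__main__":
    # Hypothetical melody: C4, E4, G4 (MIDI note numbers 60, 64, 67).
    print(melody_to_midi([60, 64, 67]).hex(" "))
```

<p>In a real setup, the resulting bytes would be written to the UART once it is configured for the MIDI baud rate, with pauses between the Note On and Note Off messages to give each note its duration.</p>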
<p>We also attempted to receive MIDI signals. After some troubleshooting and experiments, we concluded that the GPIO board's MIDI Out DIN connector was faulty. We confirmed our theory by installing Linux on the RPi and running a Python program for MIDI output provided by the GPIO board's provider, but to no avail. It's amusing to note that it took us a whole day to install Linux and run a single program, compared to the ease and speed of booting a bare-metal MirageOS unikernel 🤓.</p>
<p>We had envisioned implementing MIDI In support to create a "bare-metal OCaml drum machine." The idea was to convert incoming MIDI signals into drum samples, similar to how synthesisers operate. The intention was to load drum samples into the unikernel's memory and generate the corresponding drum sounds upon receiving a MIDI event.</p>
<p>In a bid to broaden our experimentation, we also wanted to explore how we could send audio from the unikernel to the host system. The solution involved writing music to the unikernel's <code>stdout</code> and piping the unikernel into an ALSA function, which then played the received music. Although this wasn't typical usage of a unikernel, it proved to be a really fun experiment.</p>
<p>This project served as a testament to the flexibility and creative applications of unikernels, and we're excited about the further possibilities that this experiment will inspire.</p>
<h3>Adding Git Server Commands to <code>ocaml-git</code></h3>
<p><em>Contributors: <a href="https://github.com/panglesd">@panglesd</a>, <a href="https://github.com/Julow">@Julow</a></em></p>
<p>The primary goal of our project was to create a unikernel that could act as a Git server. We needed several components in order to accomplish this, including an OCaml implementation of the SSH protocol and an OCaml implementation of the Git protocol. For the SSH protocol, everything we needed was implemented in <code>awa-ssh</code>, but for <code>ocaml-git</code> things were a bit more complicated.</p>
<p><code>ocaml-git</code> implements the Git format and part of the protocol. It is also used as a backend for Irmin and for fetching data in a unikernel. However, the server side part of the protocol that we needed was missing. It had not been needed in <code>ocaml-git</code> use cases before.</p>
<p>Our main challenge was that programming in <code>ocaml-git</code> can be really hard! There were several monads in play at the same time, as well as higher-kinded types. Abstractions were necessary to support all the use cases we wanted, which were to use it in a Unix program or in a unikernel, as an Irmin backend, and as a library by a unikernel. We also needed to decipher some slightly vague documentation for the Git protocol, so there was some trial and error and reverse engineering of Git going on.</p>
<p>We were lucky enough to get some great help from several people. <a href="https://github.com/Dinosaure">@Dinosaure</a> walked us through the code of <code>ocaml-git</code> and answered many of our questions about Git's protocol, whereas <a href="https://github.com/reynir">@reynir</a> helped us write a unikernel and answered our questions about SSH.</p>
<p>We implemented the project in a series of steps, starting by writing a 'cat' SSH unikernel as a basis for our server. We then implemented a server-side fetch protocol called <code>upload-pack</code>. We needed to do a lot of iterations and experimentation before we got it right, as the protocol was full of hidden details. We were finally able to create a <code>git-clone</code> that was answered by our server. It was just in time as well, as it was the last night of the retreat!</p>
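<p>Some of those "hidden details" start at the framing level: the Git wire protocol wraps each message in a <em>pkt-line</em>, a payload prefixed by its total length as four hexadecimal digits, with <code>0000</code> acting as a flush packet. The sketch below shows that framing in Python as an illustration only; it is not the <code>ocaml-git</code> code, the function names are hypothetical, and the all-zero object ID in the demo is a placeholder.</p>

```python
FLUSH_PKT = b"0000"      # special packet that ends a group of lines
MAX_PKT_LENGTH = 65520   # largest total length allowed for one pkt-line


def pkt_line(payload: bytes) -> bytes:
    """Frame a payload: 4 hex digits giving the length of prefix plus payload."""
    length = len(payload) + 4
    if length > MAX_PKT_LENGTH:
        raise ValueError("payload too large for a single pkt-line")
    return b"%04x" % length + payload


def read_pkt_lines(data: bytes) -> list:
    """Split a byte string into payloads; a flush packet is returned as None."""
    lines = []
    pos = 0
    while pos != len(data):
        length = int(data[pos:pos + 4], 16)
        if length == 0:
            lines.append(None)
            pos += 4
        elif length in (1, 2, 3):
            # Lengths 1 to 3 cannot frame a payload; this sketch rejects them.
            raise ValueError("unsupported special packet")
        else:
            lines.append(data[pos + 4:pos + length])
            pos += length
    return lines


if __name__ == "__main__":
    # A client asking for one object, using a placeholder object ID.
    request = pkt_line(b"want " + b"0" * 40 + b"\n") + FLUSH_PKT + pkt_line(b"done\n")
    print(read_pkt_lines(request))
```

<p>A server implementing <code>upload-pack</code> reads frames like these from the client before it decides which objects to send back.</p>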
<p>The next steps would be to implement the missing features like shallow clones for <code>upload-pack</code>, implement the server-side of <code>push</code> which is called <code>receive-pack</code>, and integrate all of this into a unikernel! If you'd like to help or just check out our project, you can <a href="https://github.com/mirage/ocaml-git/pull/618">look at the PR on GitHub.</a></p>
<h3>Exploring Solo5 and Multicore</h3>
<p><em>Contributors: <a href="https://github.com/haesbaert">@haesbaert</a>, <a href="https://github.com/fabbing">@fabbing</a></em></p>
<p>We were experimenting with <a href="https://github.com/Solo5/solo5">Solo5</a> and Multicore. <a href="https://github.com/haesbaert">@haesbaert</a> was trying to figure out how Unikraft booted and what parts of SMP it already had, as well as understanding what people expect from something like Unikraft. This involved checking the reservations and so on. Together with <a href="https://github.com/kit-ty-kate">@kit-ty-kate</a>, we tried to fix halting on Google Compute for Mirage, which involved diving into OpenBSD to see how they managed halting. After a lot of investigation, it seemed that in order to fix halting, we would need a proper ACPI implementation and some minor table parsing to achieve proper shutdown. <a href="https://github.com/haesbaert">@haesbaert</a> also collaborated with <a href="https://github.com/hannesm">@hannesm</a> to fix an Eio bug in FreeBSD.</p>
<p><a href="https://github.com/fabbing">@fabbing</a> was also working with <a href="https://github.com/Dinosaure">@Dinosaure</a> to learn about Solo5 and how to get multiple CPUs running. While they were working on Solo5 together, <a href="https://github.com/Dinosaure">@Dinosaure</a> went through and <a href="https://github.com/Solo5/solo5/pull/558">updated the Solo5 documentation.</a></p>
<h3>OCaml Splashscreen</h3>
<p><em>Contributors: <a href="https://github.com/MisterDA">@MisterDA</a></em></p>
<p>We started by wanting to explore <a href="https://www.rfc-editor.org/info/rfc9230">DNS over HTTPS</a> with MirageOS, and we managed to deploy a DNS server locally! However, with all of the exciting things going on all around us, we got distracted and started to work on smaller projects. We learned about generating OCaml bindings to C libraries with <a href="https://github.com/yallop/ocaml-ctypes"><code>ctypes</code></a> and started to expand the coverage of <a href="https://github.com/savonet/ocaml-posix/pull/12"><code>ocaml-posix</code></a>. We set an informal goal to write a binding for FUSE with <code>ctypes</code>, which is still a work in progress. We also explored the steps involved with building MirageOS and OCaml on macOS, and found and fixed a couple of bugs.</p>
<p>We then started on a really fun project! It was all about splashscreens, or the windows that are shown when a program starts. We based it on the first chronophotography of a camel walking. <a href="https://github.com/MisterDA">@MisterDA</a> extracted the camel using GIMP and turned it into a computer animation, displayed with the OCaml binding to <a href="https://github.com/fccm/OCamlSDL2">SDL2</a>. We then added a futuristic soundtrack, composed by <a href="https://github.com/engil">@Engil</a>, before being ready to present the <a href="https://github.com/MisterDA/ocamlwalk">OCamlWalk</a> project.</p>
<p>It also acts as a wrapper around <code>ocamlrun</code> that you can use to launch OCaml bytecode executables. The camel walks, and OCaml runs!</p>
<h2>Conclusion</h2>
<p>The retreat was a great opportunity to meet other developers who are enthusiastic about MirageOS. We had some great discussions and brainstorming sessions, sharing ideas and insights with each other. Morocco provided an amazing setting for the retreat, with beautiful nature and historic cultural landmarks.</p>
<p>We're happy to be part of a vibrant community with a lot of passionate people, and we're already looking forward to the next opportunity to get together!</p>
]]></description><link>https://tarides.com/blog/2023-07-27-reflections-on-the-mirageos-retreat-in-morocco</link><guid isPermaLink="false">https://tarides.com/blog/2023-07-27-reflections-on-the-mirageos-retreat-in-morocco.html</guid><dc:creator><![CDATA[ Antonin Décimo, Isabella Leandersson, Fabrice Buoro, Christiano Haesbaert, Jules Aguillon, Paul-Elliot, Sonja Heinze ]]></dc:creator><pubDate>Thu, 27 Jul 2023 00:00:00 GMT</pubDate></item><item><title><![CDATA[Sandmark: Boosting Multicore Projects with Performance Benchmarking]]></title><description><![CDATA[<h3>Introduction</h3>
<p>In the realm of software development, continuous improvement is paramount. When it comes to Multicore projects, the need for thorough benchmarking becomes even more critical. This is where <a href="https://github.com/ocaml-bench/sandmark">Sandmark</a> comes into play. Sandmark, developed for the OCaml programming language, has proven to be an invaluable tool for optimising performance and aiding in upstreaming efforts. In this blog post, we will explore the benefits of using Sandmark and its role in the development of Multicore projects.</p>
<h3>Enhancing Upstreaming Efforts</h3>
<p>Sandmark has been extensively used in Multicore projects to assist with upstreaming. Its impact can be witnessed in the OCaml community, where it helped demonstrate that sequential programs running on OCaml 5 performed nearly as efficiently as those running on OCaml 4. For instance, the results achieved in the <a href="https://github.com/ocaml/ocaml/pull/10831">Multicore PR (pull request)</a> merge were accomplished using Sandmark. Additionally, the findings presented in <a href="https://kcsrk.info/papers/retro-parallel_icfp_20.pdf">the ICFP'20 paper</a> were all obtained through the utilisation of Sandmark. This tool has played a crucial role in showcasing the progress made in tracking performance regressions in the OCaml compiler.</p>
<h3>Ongoing Compiler Development</h3>
<p>Even after the Multicore merge, Sandmark remains actively employed with a <a href="https://sandmark.tarides.com/">dashboard</a> for the compiler development. Its significance is evident in the multitude of pull requests related to Sandmark in the OCaml repository. For example, consider the <a href="https://github.com/ocaml/ocaml/issues/11589">issue #11589</a>, where an idle processing domain slows down major garbage collection (GC) cycles. The illustration below compares the speedup of parallel benchmarks between the fix in the PR and the current development version of the compiler. This highlights the continued reliance on Sandmark as a vital tool in the compiler development process.</p>
<p><img sizes="(min-width: 1360px) 1360px, (min-width: 680px) 680px, 100vw" srcset="/blog/images/sandmark1-170w~UmEi1DpounVFkSB-dPYSVg.webp 170w, /blog/images/sandmark1-340w~Jx1JpcMXaKUnjFrltw7XAg.webp 340w, /blog/images/sandmark1-680w~S5Lyk7gfghC_K6r8UTcCwQ.webp 680w, /blog/images/sandmark1-1360w~xSl5giQ3MHT1a88tYTknjA.webp 1360w" src="/blog/images/sandmark1-1360w~xSl5giQ3MHT1a88tYTknjA.webp" alt="Speed Comparisons"></p>
<h3>Nightly Benchmarking</h3>
<p>One of the key aspects of Sandmark is its nightly benchmarking feature. Sandmark ensures that benchmarks are run regularly on diverse x86 servers, namely Turing (Intel Xeon Gold with 56 cores) and Navajo (AMD EPYC 7551 with 128 cores). This practice serves as a proactive measure to continuously identify and address performance regressions promptly. The nightly runs cover both sequential and parallel benchmarks, providing comprehensive insights into the program's behaviour under different scenarios for different inputs.</p>
<h3>Sandmark Nightly Config</h3>
<p>To simplify the process of requesting development branches for nightly benchmarking, Sandmark offers a convenient service called <a href="https://github.com/ocaml-bench/sandmark-nightly-config">"Sandmark Nightly Config"</a>. This service streamlines the configuration setup for benchmarking, thereby reducing the steps required to initiate the benchmark runs. Compiler developers only need to provide their development branch URL for the configuration, and the nightly service will execute both the sequential and parallel benchmarks. By automating this process, developers can focus on their core tasks while still gaining insights from the regular benchmark runs.</p>
<h3>Permalinks for Easy Sharing and Discussion</h3>
<p>A remarkable feature of Sandmark is the provision of permalinks. These permalinks enable users to easily share benchmark results and engage in meaningful discussions. You can specify more than two development branches across dates, and even different hosts for comparison. This capability is a game-changer for collaborative development, as it facilitates efficient communication and fosters a deeper understanding of the pull request changes using the benchmarking outcomes. The permalinks in Sandmark allow for specific results to be referenced and examined in detail.</p>
<p><img sizes="(min-width: 1360px) 1360px, (min-width: 680px) 680px, 100vw" srcset="/blog/images/sandmark2-170w~1xxhpsv0fBwZ08hkY8861w.webp 170w, /blog/images/sandmark2-340w~GMlJ-3lAp0ZIO_w32aI4lQ.webp 340w, /blog/images/sandmark2-680w~iPJ9rG3LLX572thcsTH7qg.webp 680w, /blog/images/sandmark2-1360w~ILLo9jpDiDFh5u_NX-WerQ.webp 1360w" src="/blog/images/sandmark2-1360w~ILLo9jpDiDFh5u_NX-WerQ.webp" alt="Benchmarks"></p>
<h3>Importance of Perfstat Output</h3>
<p>Sandmark offers perfstat output, which plays a vital role in accurately evaluating program performance. Modern machines exhibit varying raw running times due to their complex nature. However, “instructions retired” provide a more stable and reliable metric, especially when assessing the impact of compiler optimisations. This feature ensures that performance analysis is based on consistent and meaningful measurements.</p>
<h3>Looking Towards the Future</h3>
<p>Sandmark continues to evolve, with ongoing developments in the Multicore release. The efforts put into enhancing Sandmark reflect the commitment to improving Multicore programming in OCaml. As the OCaml community pushes the boundaries of Multicore development, Sandmark will undoubtedly play a crucial role in optimising performance, tracking regressions, and ensuring the stability of the language.</p>
<h3>Conclusion</h3>
<p>Sandmark has emerged as an indispensable tool for the OCaml community, particularly in the realm of Multicore projects. Its ability to benchmark performance, catch regressions, simplify configuration, and facilitate discussions through permalinks has greatly contributed to the OCaml compiler development process. The commitment to ongoing improvements and enhancements will help measure, monitor and track the compiler development as the OCaml language evolves. We encourage you to try the above services, and share any feedback or file new feature requests or GitHub issues for the Sandmark project.</p>
<h3>References</h3>
<ul>
<li>Sandmark. https://github.com/ocaml-bench/sandmark</li>
<li>Multicore OCaml PR merge. https://github.com/ocaml/ocaml/pull/10831</li>
<li>Retrofitting Parallelism onto OCaml. https://kcsrk.info/papers/retro-parallel_icfp_20.pdf</li>
<li>Sandmark Dashboard. https://sandmark.tarides.com/</li>
<li>Idle Domain Slows Down GC Cycles. https://github.com/ocaml/ocaml/issues/11589</li>
<li>Sandmark-nightly-config. https://github.com/ocaml-bench/sandmark-nightly-config</li>
</ul>
]]></description><link>https://tarides.com/blog/2023-07-19-sandmark-boosting-multicore-projects-with-performance-benchmarking</link><guid isPermaLink="false">https://tarides.com/blog/2023-07-19-sandmark-boosting-multicore-projects-with-performance-benchmarking.html</guid><dc:creator><![CDATA[ Shakthi Kannan ]]></dc:creator><pubDate>Wed, 19 Jul 2023 00:00:00 GMT</pubDate></item><item><title><![CDATA[OCaml-CI Renovated]]></title><description><![CDATA[<p>OCaml-CI started with the goal of making a better continuous build system for OCaml projects. When we began in 2019, the goals were clear: it should provide a zero-configuration experience for OCaml projects using opam and Dune, and it should use an incremental architecture to avoid expensive recomputation of builds. We're delighted to announce that we achieved these goals, and OCaml-CI is currently tracking over five hundred repositories and processing over a hundred thousand jobs daily. This is inspiring news to those already using OCaml-CI or developers looking for a CI solution for their OCaml project.</p>
<p>Throughout 2022, the Tarides CI team worked on renovating OCaml-CI, focusing on improving the usability of the website, adding build history for branches, supporting new platforms, and launching experimental build support. We will cover all of those things in this blog post and hope you find them useful.</p>
<p>There is also a <a href="https://discuss.ocaml.org/t/best-practices-for-continuous-integration-ci-in-2023/12380">Discuss thread on CI Best Practices</a>.</p>
<h2>What is OCaml-CI?</h2>
<p>Continuous Integration, or CI, performs a series of automated steps (or jobs), e.g., building and testing code. With it, developers can confidently and regularly integrate code into the central repository, relying on the automated CI system to detect and even fix problems early. This reduces production issues and leads to more robust and secure software. OCaml-CI is a Continuous Integration tool tailored for OCaml projects.</p>
<p><strong>VALUE PROPOSITION OF OCAML-CI</strong></p>
<ul>
<li>OCaml specific / no configuration</li>
<li>Check on various platforms</li>
<li>Linting like version bounds (upper and lower) and project metadata</li>
<li>Incremental caching of builds</li>
</ul>
<p>OCaml-CI adds value by targeting just OCaml projects that are written with the standard OCaml opam tooling for package management and Dune for building code. Because OCaml-CI targets a specific language, it does not require any configuration. Instead, it derives the necessary information from metadata in the <code>opam</code> and <code>dune</code> project files.</p>
<p>You never have to teach OCaml-CI about how to build OCaml!</p>
<p>Additionally, it can do clever things like linting your <code>opam</code> and <code>dune</code> files to check for common mistakes. It also checks the upper and lower version bounds of packages to see where they break. The biggest feature over most popular CI systems is that OCaml-CI can derive which hardware and operating system platforms a package supports and do builds across all of those platforms! That means if you want to check on Linux ARM64 or MacOS x86_64 or Linux s390x you can. All that comes with the added benefit of <a href="https://icfp20.sigplan.org/details/ocaml-2020-papers/6/OCaml-CI-A-Zero-Configuration-CI">a caching strategy based on the incremental architecture</a>, which won't repeat builds if not necessary.</p>
<p>Let's look at the new features that were added!</p>
<h3>Redesigned UI Using Dream</h3>
<p>When OCaml-CI came into existence in 2019, the focus was on providing a CI system based on an incremental architecture, so the user interface (UI) for OCaml-CI was kept simple. In 2022, the team worked with a designer to develop a consistent and contemporary theme to make the site look and feel like a modern website. We decided on a tech stack of <a href="https://github.com/aantron/dream">Dream</a> for the web framework, <a href="https://tailwindcss.com/">Tailwind CSS</a> for styling, <a href="https://github.com/ocsigen/tyxml">Tyxml</a> for HTML generation, and <a href="https://github.com/tmattio/omigrate">Omigrate</a> for database migrations. In the <a href="#design-and-implementation">Design and Implementation</a> section below, we cover the technical reasons for each choice.</p>
<h3>Build History</h3>
<p>Over the years, people consistently requested a build history, so we added a new history page that shows the build history of a branch. This feature allows users to conveniently access and view historical builds in the context of similar builds of the branch. It shows every commit built by OCaml-CI, a summary of each build (including build status, the time at which the build started, and the running time of the build), and links to the commit's build page.</p>
<insert image="">
<h3>Live Updates</h3>
<p>Pages now automatically update with new information as the build progresses! OCaml-CI adds build steps as they are created, so build statuses and runtimes all update as the build occurs.</p>
<p>In 2023, it’s unusual to have to refresh a page to update the information, so this is just us catching up!</p>
<h3>Enriching Build Information</h3>
<p>Timestamps and durations relevant to a build and each of its steps are now available. This feedback enables development teams to monitor the resources and time taken for their builds. It gives them what they need to identify bottlenecks and opportunities for faster build times.</p>
<h3>Summary of a Repository's Health</h3>
<p>When looking at an organisation's page, you will now see a summary of the default branch for each of your repos. Inspired by the UI of BuildKite, we hope to provide teams with a view of their builds that indicates the overall health of their repository. The chart of the last 15 builds makes anomalous builds easy to identify and investigate.</p>
<h3>Mobile Version</h3>
<p>OCaml-CI can now be conveniently used from a variety of devices. We have rewritten our pages to be responsive to mobile devices, choosing to pare information to the essentials for small screens. All the functionality of the main site is still available on the mobile version, so you can view logs or navigate to the GitHub PR for a build.</p>
<p><img sizes="(min-width: 1360px) 1360px, (min-width: 680px) 680px, 100vw" srcset="/blog/images/ci_mobile-170w~8AxkY-FuzQu52DNc45a_wQ.webp 170w, /blog/images/ci_mobile-340w~tJmvyNw7QF4UBRFxVoXegA.webp 340w, /blog/images/ci_mobile-680w~9g4qCdP9_lSDiqPt4o6t1A.webp 680w, /blog/images/ci_mobile-1360w~ZPq_xUC-h1vZ53obCsMncQ.webp 1360w" src="/blog/images/ci_mobile-1360w~ZPq_xUC-h1vZ53obCsMncQ.webp" alt="Mobile Screenshot"></p>
<h3>Experimental Builds</h3>
<p>Experimental builds are a feature that was added to OCaml-CI to support build types that might not be stable, or to introduce new ones without breaking CI for all projects. Think of them as a kind of feature flag. Experimental builds are clearly labelled with <code>(experimental)</code> in the UI and will not report as failures if every other build passes. They let us boldly introduce new features like supporting new platforms or running new linting checks like package lower bounds.</p>
<h3>macOS Experimental Builds</h3>
<p>Using Experimental builds, we added support for macOS (both x86_64 and ARM64) to OCaml-CI. These builds will run on the latest two versions of OCaml on both architectures. Currently these are marked as experimental as we work towards making macOS builds more efficient. The path to supporting macOS has been a long one, starting back in late 2021, and has gone through two different implementations before reaching a stable state in early 2023. We have plans to publish a post going deeper into the technical details soon.</p>
<h2>Design and Implementation</h2>
<p>The story of why technical decisions are made on a project is often as interesting as the project itself. Here we go through the technologies we used and why we chose them.</p>
<p>As previously mentioned, we kicked off this work with a designer to help develop a consistent and contemporary theme for the site. They created a set of designs in Figma and also made example HTML pages using <a href="https://tailwindcss.com/">Tailwind CSS</a> as a basis for the style sheets. Tailwind works by scanning HTML files, JavaScript components, and any other templates for class names to generate the corresponding styles and then write them to a static CSS file. There is an opam package <a href="https://github.com/tmattio/opam-tailwindcss">tailwindcss</a> that wraps this all up for us.</p>
<p>We decided to use <a href="https://github.com/aantron/dream"><code>Dream</code></a> to replace OCaml-CI’s previous web layer, which was based on the <a href="https://github.com/mirage/ocaml-cohttp"><code>Cohttp</code></a> library, for the following reasons:</p>
<ul>
<li><code>Cohttp</code> is a low-level library, so we had to hand-roll solutions to standard patterns that are generally provided by web frameworks. For example, we had to solve the CSRF problem consistently throughout our usage of forms and also construct our own solution to show flash messages. We understood <code>Cohttp</code> but were interested in taking the opportunity to investigate other web frameworks in the OCaml landscape.</li>
<li>Inspired by frameworks like Sinatra (of Ruby fame) and Flask (from Python), our colleagues Rudi Grinberg and Thibaut Mattio (and others) had constructed a web framework called <a href="https://github.com/rgrinberg/opium">Opium</a>. They suggested that we check out Dream.</li>
<li>We were impressed by its elegance and polish. Dream has brilliant <a href="https://aantron.github.io/dream/">documentation</a>, a ton of examples, and convenient functions and support for several standard patterns in web development. It also uses common OCaml types, so adding it to the project would be relatively straightforward. We immediately saw an opportunity to accomplish several things at once:
<ul>
<li>Support a promising project by adopting it and contributing to it</li>
<li>Create a non-trivial example of using Dream for the community</li>
<li>Reduce complexity, modernise the UI, and make it easier to add new features</li>
<li>Have a lot of fun!</li>
</ul>
</li>
</ul>
<p>As we began to work with Dream, we made the following choices:</p>
<ul>
<li>We chose to work with <code>TyXML</code> over <code>Eml</code> so that we would have the guardrails of typed templates to help write correct HTML. This proved to be challenging in the beginning, but examples from the Opium project really helped our team figure out how to wield <code>TyXML</code> correctly.</li>
<li>Our team did not have any CSS expertise and, frankly, was a little bit at sea with how to implement some of our designer's suggested designs. Tailwind CSS really helped us out here. In particular, it made it possible for us to achieve responsiveness for different screens and light and dark modes.</li>
<li>For our database layer, we chose to work with <a href="https://github.com/tmattio/omigrate">Omigrate</a>, so we could introduce migrations and develop our information model with confidence.</li>
</ul>
<p>We are working on improving the signup process and on introducing <a href="https://github.com/ocsigen/js_of_ocaml"><code>js_of_ocaml</code></a> to replace the plain JavaScript that we previously introduced.</p>
<h2>We’d Love Your Feedback</h2>
<p>If you have an OCaml project hosted on GitHub or GitLab and would like to test drive OCaml-CI, please follow our <a href="https://ocaml.ci.dev/getting-started">getting-started</a> guide. There are many popular projects already using OCaml-CI to improve their development, and we want to see your project too.</p>
<p>Please open an issue on https://github.com/ocurrent/ocaml-ci if you run into any problems, or to suggest improvements and point out missing features. The Tarides team wants to support more platforms, like Windows and FreeBSD, so that we cover all of OCaml's Tier 1 supported platforms, and to continue improving the UI experience.</p>
<p>If you are curious about web development in OCaml, we recommend checking out <a href="https://github.com/aantron/dream">Dream</a>. Please use our code for reference and ask us questions or make suggestions for improvements. We are using the following technologies:</p>
<ul>
<li><a href="https://github.com/ocurrent/current_incr">current_incr</a> - Self-adjusting computations</li>
<li><a href="https://github.com/ocurrent/ocurrent">OCurrent</a> - a CI/CD pipeline OCaml eDSL</li>
<li><a href="https://github.com/ocsigen/lwt">Lwt</a> - OCaml promises and concurrent I/O</li>
<li><a href="https://github.com/ocsigen/tyxml">Tyxml</a> - Typed HTML and SVG</li>
<li><a href="https://github.com/mirage/capnp-rpc">Capnp_rpc</a> - OCaml Cap'n Proto RPC library</li>
</ul>
<p>If we can learn and improve from your experience, we all win! Thank you!</p>
<h2>Acknowledgements</h2>
<p>The Tarides engineers who delivered this work are Étienne Marais, Ben Andrew, and Navin Keswani. We got much support and feedback from several of our Tarides colleagues and others in the OCaml community, and we are very grateful for all we learned from them. Special mention to Thibaut Mattio and Tim McGilchrist.</p>
</insert>]]></description><link>https://tarides.com/blog/2023-07-12-ocaml-ci-renovated</link><guid isPermaLink="false">https://tarides.com/blog/2023-07-12-ocaml-ci-renovated.html</guid><dc:creator><![CDATA[ Navin Keswani, Tim McGilchrist ]]></dc:creator><pubDate>Wed, 12 Jul 2023 00:00:00 GMT</pubDate></item><item><title><![CDATA[Making OCaml 5 Succeed for Developers and Organisations]]></title><description><![CDATA[<p>OCaml recently won the <a href="/blog/2023-06-20-ocaml-receives-the-acm-programming-languages-software-award/">ACM SIGPLAN PL Software Award</a>. The award recognises a software system that has had a significant impact on programming language implementation, research, and tools. It is especially notable that <a href="https://twitter.com/kc_srk/status/1670849062684467202/photo/3">4 out of the 14</a> named OCaml compiler developers are affiliated with Tarides: Anil, David, Jérôme, and me. In this post, I discuss the wider effort afoot at Tarides in order to make OCaml 5, the latest release of the OCaml programming language, succeed for developers. I should note that I shall specifically focus on the new OCaml 5 features and omit important developments such as Tarides' work on the OCaml platform, which is discussed <a href="https://discuss.ocaml.org/t/a-roadmap-for-the-ocaml-platform-seeking-your-feedback/12238">elsewhere</a>.</p>
<p>I started hacking on OCaml when I joined Anil, Stephen, and Leo (who are also named in this award) at OCaml Labs in the University of Cambridge in 2014 to work on the Multicore OCaml project. The aim of the Multicore OCaml project was to add native support for concurrency and parallelism to the OCaml programming language. The Multicore OCaml compiler was maintained as a fork of the OCaml compiler for many years before it merged with the mainline OCaml compiler in <a href="https://github.com/ocaml/ocaml/pull/10831">January 2022</a>. After almost a year of work stabilising the features, OCaml 5.0 was finally released in <a href="https://discuss.ocaml.org/t/ocaml-5-0-0-is-out/10974">December 2022</a>, nearly 8 years after the first commit.</p>
<p>Has the Multicore OCaml project succeeded with the release of OCaml 5.0? The short answer is <em><strong>No</strong></em>. There is a long road to making OCaml 5 succeed for the developers. The goal of making OCaml 5 succeed for the developers is a two-step process:</p>
<ol>
<li>Help developers transition existing programs to OCaml 5</li>
<li>Help developers take advantage of new concurrency and parallelism features in OCaml 5</li>
</ol>
<h2>Transitioning developers to OCaml 5</h2>
<p>Even with the arrival of OCaml 5, most OCaml programs will remain sequential forever. It is important that developers can successfully transition their OCaml projects over to OCaml 5, even if they don't plan to use the new features. We have carefully designed OCaml 5 so that breaking changes are minimised. In particular, we eschewed a potentially more scalable GC design since it broke the C FFI compatibility (see section 7 "Discussion" in the <a href="https://kcsrk.info/papers/retro-parallel_icfp_20.pdf">ICFP 2020 paper</a> on the new GC design). The only breaking changes in OCaml 5 were the removal of the support for naked pointers and the unrelated removal of deprecated functions from the standard library. We released a <a href="https://discuss.ocaml.org/t/ann-a-dynamic-checker-for-detecting-naked-pointers/5805">dynamic detector for naked pointers</a> to help developers find and remove naked pointers from their codebase.</p>
<h3>Restoring Unimplemented Features</h3>
<p>OCaml 5.0 was an experimental release with many features unimplemented. In particular, OCaml 5.0 only supported the <a href="https://github.com/ocaml/ocaml/pull/10831">x86</a> and <a href="https://github.com/ocaml/ocaml/pull/10972">ARM</a> backends. With impressive efforts from the community, the OCaml maintainers have <strong>restored support for all Tier 1 platforms</strong> including <a href="https://github.com/ocaml/ocaml/pull/11418">RISC-V</a>, <a href="https://github.com/ocaml/ocaml/pull/11712">s390x</a> and <a href="https://github.com/ocaml/ocaml/pull/12276">Power</a>. Tarides helped implement or review the support for all of these backends. Tarides engineers also restored other important features such as <a href="https://github.com/ocaml/ocaml/pull/11827">GC mark loop pre-fetching</a> and <a href="https://github.com/ocaml/ocaml/pull/11144">frame-pointer support for x86 backend</a>. These features will be in OCaml 5.1.</p>
<p>We have also been working on restoring other big-ticket items such as compaction and <code>statmemprof</code> which were not implemented for OCaml 5.0. In OCaml, compaction is the only time when the runtime releases memory allocated for the heap back to the operating system. Many long-running programs have an initialisation phase where they use a lot of memory followed by a steady state phase where they operate for a long time with less memory. It is a common practice to call <code>Gc.compact()</code> after the initialisation phase so that the steady-state memory usage of the program remains low. Without compaction, the steady state will also use as much memory as the peak memory usage. This problem was <a href="https://discuss.ocaml.org/t/ocaml-5-gc-releasing-memory-back-to-the-os/11293">reported</a> by the <a href="https://fbinfer.com/">Infer</a> team at Meta (who were otherwise able to switch to OCaml 5 easily, thanks to our focus on backwards compatibility).</p>
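<p>The pattern described above can be sketched with the standard library's <code>Gc</code> module. This is a minimal illustration, not code from any of the projects mentioned: the <code>initialise</code> function and its table size are hypothetical stand-ins for a real start-up phase, and on a runtime without compaction (as described here for OCaml 5.0) the reported heap size may not shrink.</p>
<pre><code class="language-ocaml">(* Heap size in words, as reported by the GC's cheap statistics call. *)
let heap_words () = (Gc.quick_stat ()).Gc.heap_words

(* Hypothetical initialisation phase: build a large temporary table,
   keep only a small summary, and let the table become garbage. *)
let initialise () =
  let table = Array.init 1_000_000 (fun i -> string_of_int i) in
  Array.length table

let () =
  let entries = initialise () in
  let before = heap_words () in
  (* Compact once, before entering the long-running steady state. *)
  Gc.compact ();
  let after = heap_words () in
  Printf.printf "entries=%d heap words before=%d after=%d\n"
    entries before after
</code></pre>
<p>Comparing the two heap figures on OCaml 4 and on an OCaml 5 release is one simple way to see whether compaction is returning memory in your environment.</p>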
<p>Tarides engineers have <a href="https://github.com/ocaml/ocaml/pull/12193">opened a PR</a> for restoring compaction. The compaction feature is slated to be restored in OCaml 5.2. We have also been working on restoring <code>statmemprof</code>, the statistical memory profiler for OCaml. We are hoping to have a PR ready for this in the coming weeks.</p>
<h3>Fixing Performance Regressions</h3>
<p>OCaml 5 is a major rewrite of the runtime system and comes with a completely new allocator and a garbage collector (GC). As a result, some large OCaml projects such as <a href="https://github.com/ocaml/ocaml/issues/11662">Frama-C</a>, <a href="https://github.com/ocaml/ocaml/issues/11913">Pyre</a>, <a href="https://github.com/EasyCrypt/easycrypt/issues/390">EasyCrypt</a>, and <a href="https://discuss.ocaml.org/t/ocaml-5-gc-releasing-memory-back-to-the-os/11293/16">Infer</a> have reported performance regressions. We have been steadily fixing these issues and have not encountered any serious challenges here. Many of the fixes have been incorporated into 5.1, and we expect more performance fixes to land in 5.2. The very fact that large open-source projects can build and test their code on OCaml 5 is itself a testament to our careful backwards-compatible implementation of OCaml 5.</p>
<h4>Allocator Performance</h4>
<p>A potential source of performance regressions is the allocator. OCaml 5 uses a new parallelism-aware allocator written from scratch and different from the well-performing best-fit allocator available in OCaml since 4.10. Major industrial users of OCaml have <a href="https://blog.janestreet.com/memory-allocator-showdown/">reported</a> that best-fit performs better than the earlier first-fit and next-fit allocators. In our benchmarking efforts, we observed that the OCaml 5 allocator performs as well as the best-fit allocator, as both allocators utilise size-segmented pages. But <a href="https://github.com/ocaml-bench/sandmark/tree/main/benchmarks">our benchmarks</a> are admittedly much smaller than industrial OCaml workloads.</p>
<p>In order to derisk the transition to OCaml 5, we have <a href="https://github.com/sadiqj/ocaml/tree/backport_alloc">backported</a> the OCaml 5 allocator to the OCaml 4 compiler. The backported allocator helps industrial users run their workloads in OCaml 4 with only the allocator changed, which helps identify any regressions. We are working with one of our customers to test the backported allocator on their internal workloads. We hope to identify regressions that only show up at scale and fix them for everyone using OCaml 5.</p>
<h3>Continuously Benchmarking Compiler Quality</h3>
<p>One of the goals of OCaml 5 is that sequential programs perform no worse on OCaml 5 than on OCaml 4. Not only can developers compile and run their existing sequential code in OCaml 5, but the expectation is that the performance is also similar.
To this end, we have been doing nightly benchmarking of compiled code using <a href="https://sandmark.tarides.com/">Sandmark</a>, a benchmarking service consisting of real-world, open-source OCaml programs. Sandmark monitors a multitude of performance parameters related to running time, memory usage, and GC latency.</p>
<p>The benchmarks and the related repository of OCaml packages are constructed in such a way that they can build with both OCaml 4 and OCaml 5. This lets the compiler developers quickly identify any regressions that may be introduced in OCaml 5 with respect to the <em>same code</em> compiled under OCaml 4. Tarides is working to turn this into a GitHub bot that will make it easier for compiler developers to trigger benchmarking runs on development branches.</p>
<h3>Better Observability</h3>
<p>Another strong reason to move to OCaml 5 from OCaml 4, even if you plan to remain sequential, is the better observability tools that come with OCaml 5. Starting from OCaml 5, the compiler supports a new feature named <a href="https://v2.ocaml.org/manual/runtime-tracing.html"><em>runtime events</em></a>, which brings deep introspection capabilities for OCaml programs running in production. Runtime events add a series of probes to the OCaml program that emit data at specific events. This lets the <em>consumers</em> of these events produce interesting insights into the running programs. For example, <a href="https://github.com/tarides/runtime_events_tools/tree/main#olly">Olly</a> is a consumer that reports GC statistics including latency distribution. Olly can also produce traces of OCaml program runs visualising the GC behaviours.</p>
<p>An important aspect of runtime events is that the cost of the probes in the fast path (when the probes are not emitting data) is so low that it is available for every OCaml 5 program. In particular, you do not need to recompile your programs with special options to enable event collection. Hence, every OCaml 5 program can be introspected at runtime for interesting events using Olly.</p>
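<p>To make the idea of a consumer concrete, here is a minimal sketch of a program that reads its own GC events through the <code>Runtime_events</code> module that ships with OCaml 5. It is not how Olly is implemented; the counter and function names are ours, and we assume the program is linked against the <code>runtime_events</code> library.</p>
<pre><code class="language-ocaml">(* Count GC-phase begin events emitted by the runtime's built-in probes. *)
let phases_seen = ref 0

(* Callback invoked for each "phase begin" event read from the ring buffer. *)
let runtime_begin _ring_id _timestamp phase =
  incr phases_seen;
  print_endline (Runtime_events.runtime_phase_name phase)

let poll_own_events () =
  (* Start event collection for this process. *)
  Runtime_events.start ();
  (* [None] means: read the ring buffer of the current process. *)
  let cursor = Runtime_events.create_cursor None in
  let callbacks = Runtime_events.Callbacks.create ~runtime_begin () in
  (* Trigger some GC activity so that the probes fire. *)
  Gc.full_major ();
  (* Drain pending events; the result is the number of events read. *)
  let read = Runtime_events.read_poll cursor callbacks None in
  Runtime_events.free_cursor cursor;
  read

let () = Printf.printf "read %d events\n" (poll_own_events ())
</code></pre>
<p>An external tool works the same way, except that it creates the cursor for another process instead of passing <code>None</code>.</p>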
<p>By default, the only probes available are to do with GC events. OCaml 5.1 also brings in support for <a href="https://github.com/ocaml/ocaml/pull/11474"><em>Custom events</em></a>, where the user can describe new probes. It unlocks exciting possibilities for application-specific introspection. For example, <a href="https://github.com/tarides/meio">Meio</a> is a command-line tool that lets the user monitor the status of their application built using <a href="https://github.com/ocaml-multicore/eio">Eio</a>, a new concurrency library built using OCaml 5 features, at a per fiber (lightweight task) granularity.</p>
<h2>Taking Advantage of OCaml 5 Features</h2>
<p>We anticipate two kinds of developers to take advantage of OCaml 5:</p>
<ol>
<li>Those who want to use the new features in their existing code.</li>
<li>Those who want to write new code using the new features.</li>
</ol>
<p>There has been an increase in positive noise around OCaml recently, which may attract new developers and organisations to OCaml. However, given the millions of lines of existing OCaml code, our aim is to tackle (1) first. We hope that the experience of helping (1) succeed will inform what we should focus on for (2).</p>
<h3>Primitive Features</h3>
<p>It is important at this point to note that OCaml 5 brings in distinct features for native concurrency and parallelism support in OCaml. For concurrency, OCaml 5 adds <em>effect handlers</em>, and for parallelism, it adds <em>domains</em> to the language. These features are spartan by design, and our aim is to build expressive libraries on top of these features, which will live outside the compiler distribution. The OCaml manual pages on <a href="https://v2.ocaml.org/manual/effects.html">effect handlers</a> and <a href="https://v2.ocaml.org/manual/parallelism.html">parallelism</a> give a good overview of these primitive features. I also discuss the approach we've taken in retrofitting concurrency to OCaml in the <a href="https://icfp22.sigplan.org/details/icfp-2022-papers/48/Retrofitting-Concurrency-Lessons-from-the-Engine-Room">ICFP 2022 Keynote</a>.</p>
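<p>As a rough illustration of how spartan these primitives are, the sketch below uses only the <code>Effect</code> and <code>Domain</code> modules from the OCaml 5 standard library. The <code>Ask</code> effect and the helper names <code>with_answer</code> and <code>parallel_sum</code> are hypothetical and exist only for this example.</p>
<pre><code class="language-ocaml">open Effect
open Effect.Deep

(* A hypothetical effect: ask the enclosing handler for an integer. *)
type _ Effect.t += Ask : int Effect.t

(* Run [f], answering every [Ask] with [answer] and resuming the
   suspended computation through its continuation. *)
let with_answer (answer : int) (f : unit -> 'r) : 'r =
  try_with f ()
    { effc = (fun (type a) (eff : a Effect.t) ->
        match eff with
        | Ask -> Some (fun (k : (a, _) continuation) -> continue k answer)
        | _ -> None) }

(* Parallelism: run [f] on a new domain while [g] runs on the current
   one, then add the two results. *)
let parallel_sum (f : unit -> int) (g : unit -> int) : int =
  let d = Domain.spawn f in
  let y = g () in
  Domain.join d + y

let () =
  let x = with_answer 20 (fun () -> perform Ask + perform Ask + 2) in
  let y = parallel_sum (fun () -> 1 + 2) (fun () -> 3 + 4) in
  Printf.printf "%d %d\n" x y
</code></pre>
<p>Everything else, such as schedulers, I/O, and task pools, is expected to be built as a library on top of these two mechanisms.</p>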
<h3>Concurrency Libraries</h3>
<h4>Eio -- I/O Library</h4>
<p>For asynchronous, non-blocking I/O, OCaml 4 has two industrial-strength libraries: Lwt and Async. These libraries simulate concurrency using a monad. They are both very successful, and OCaml code that does asynchronous I/O uses one of these libraries. These libraries do have some downsides in that, due to the use of a monad, they don't produce useful backtraces, and OCaml's built-in exceptions cannot be used. The separation of synchronous and asynchronous code (function colours) and the lack of easy-to-use, higher-kinded polymorphism in OCaml means that one ends up with two versions of useful functions: one for monadic code and another for non-monadic code. This leads to code duplication, such as the need for a separate <a href="https://ocsigen.org/lwt/latest/api/Lwt_list">list module in Lwt</a>. These libraries can continue to be used in OCaml 5, but given that these libraries are not parallelism-safe, one cannot write parallel code that takes advantage of them out of the box.</p>
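<p>The duplication caused by function colouring can be seen in a small, self-contained sketch. The <code>Toy</code> module below is a hypothetical stand-in for a promise monad, not Lwt or Async themselves, and <code>map_s</code> is named after the sequential map found in Lwt's list module.</p>
<pre><code class="language-ocaml">(* A toy promise monad standing in for a real asynchronous library. *)
module Toy = struct
  type 'a t = Done of 'a
  let return x = Done x
  let bind (Done x) f = f x
  let run (Done x) = x
end

(* Plain code can use the standard library traversal directly. *)
let double_all xs = List.map (fun x -> 2 * x) xs

(* Monadic code needs its own traversal: List.map would produce a list
   of promises rather than a promise of a list. *)
let rec map_s f = function
  | [] -> Toy.return []
  | x :: rest ->
    Toy.bind (f x) (fun y ->
      Toy.bind (map_s f rest) (fun ys -> Toy.return (y :: ys)))

let () =
  let plain = double_all [1; 2; 3] in
  let monadic = Toy.run (map_s (fun x -> Toy.return (2 * x)) [1; 2; 3]) in
  Printf.printf "same result: %b\n" (plain = monadic)
</code></pre>
<p>Both functions compute the same thing, yet neither can be reused for the other kind of code, which is the duplication a direct-style library aims to avoid.</p>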
<p><a href="https://github.com/ocaml-multicore/eio">Eio</a> is a new direct-style I/O library built using effect handlers. It avoids function colouring by using native stacks provided by effect handlers, unlike Lwt and Async which simulate it using a monad. Thanks to this, Eio produces faster code, supports built-in exceptions, produces good backtraces, and avoids code duplication. Eio also is built to be parallelism-safe. Eio provides a generic cross-platform API that can utilise optimised backends on different platforms such as <a href="https://en.wikipedia.org/wiki/Io_uring">io_uring on Linux</a>.</p>
<p>One particular aspect that I would like to highlight is that Eio provides bridges for Async and Lwt so that existing code can be <em>incrementally</em> translated to Eio. This aspect is crucial for adoption, as we believe that it is impractical to translate a large Lwt or Async codebase over to Eio in one go. Tarides is currently working towards the goal of <a href="https://github.com/ocaml-multicore/eio/issues/388">Eio 1.0</a>, which we expect to be released by Q3 2023. If you are interested in using Eio, Tarides engineers are running <a href="https://icfp23.sigplan.org/details/icfp-2023-tutorials/4/Porting-Lwt-applications-to-OCaml-5-and-Eio">a hands-on tutorial</a> on porting Lwt applications over to Eio at ICFP 2023.</p>
<h4>Saturn -- Parallel Data Structures</h4>
<p>An essential component in the parallel programming toolkit is a library of parallel data structures. A sequential stack or queue data structure is fairly uncontroversial, and it is common to have only a single stack or queue implementation in the language. Indeed, we have a single <a href="https://v2.ocaml.org/api/Stack.html">stack</a> and a single <a href="https://v2.ocaml.org/api/Queue.html">queue</a> data structure in the OCaml standard library. The addition of parallelism brings an explosion of possibilities and challenges:</p>
<ul>
<li><strong>Correctness</strong> -- the addition of concurrency makes it much harder to reason about the correctness of the data structures.</li>
<li><strong>Specialisation</strong> -- the performance of the data structure varies widely based on the number of parallel threads accessing the data structure. Hence, it is common to have specialised data structures that are optimised for a limited number of threads and capacity, such as single or multiple producers and consumers to bounded or unbounded queues.</li>
<li><strong>Progress</strong> -- Should a pop operation on an empty queue block the caller or should it return immediately with a <code>None</code>? Both options are useful in different circumstances, but supporting one or the other will mean very different tradeoffs and hence, different implementations. Moreover, the non-blocking options are <a href="https://en.wikipedia.org/wiki/Non-blocking_algorithm">further classified</a> in literature based on the progress in the presence of concurrent operations.</li>
<li><strong>Composability</strong> -- In a typical parallel data structure each of the individual operations such as a push or a pop is atomic. What if our application demands that multiple operations be performed atomically? Putting a lock around the entire thing often does not work since it affects performance non-trivially and introduces correctness issues such as deadlocks. There are other mechanisms for well-behaved composition such as software transactional memory.</li>
</ul>
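<p>As an illustration of the progress trade-off above, here is a minimal sketch of a lock-free stack with a non-blocking pop, built only on the standard library's <code>Atomic</code> module. It is a textbook Treiber stack written for this example, not Saturn's implementation, and the names <code>create</code>, <code>push</code>, and <code>pop_opt</code> are assumptions.</p>
<pre><code>(* The whole stack is one atomic reference to an immutable list. *)
type 'a t = 'a list Atomic.t

let create () : 'a t = Atomic.make []

(* Retry until the compare-and-set succeeds. *)
let rec push s x =
  let old = Atomic.get s in
  if not (Atomic.compare_and_set s old (x :: old)) then push s x

(* Non-blocking pop: returns None immediately on an empty stack. *)
let rec pop_opt s =
  match Atomic.get s with
  | [] -&gt; None
  | (x :: rest) as old -&gt;
    if Atomic.compare_and_set s old rest then Some x else pop_opt s
</code></pre>
<p>A blocking variant would instead have to suspend the caller until an element arrives, which requires a different design. That difference is why both kinds of data structure tend to exist side by side.</p>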
<p>In other languages, this explosion in the state space often leads to a multitude of concurrency libraries, with overlapping features and different trade-offs, often not clearly labelled. Developers frequently face a challenge choosing the right library with the right trade-off. The correctness of the implementations is also often unclear.</p>
<p>At Tarides, we have been working towards <a href="https://github.com/ocaml-multicore/lockfree/pull/67">Saturn</a>, a library that brings together all of our efforts at building parallelism-safe libraries. Saturn will consist of lock-free and lock-based, blocking and non-blocking, composable and non-composable parallel data structures under one roof. Each of the different data structures will have a default version that is good enough to be used for parallelism and will have well-documented variants with clearly labelled tradeoffs.</p>
<p>Our composable atomic data structures are built over the <a href="https://github.com/ocaml-multicore/kcas">kcas</a> library, which provides software transactional memory (STM) on top of a lock-free multi-word compare-and-swap (MCAS) primitive. While the kcas library implements MCAS efficiently in software, with the arrival of the <a href="https://github.com/ocaml/ocaml/pull/12276">Power backend in OCaml 5</a>, we plan to explore using <a href="https://research.ibm.com/publications/transactional-memory-support-in-the-ibm-power8-processor">hardware transactions</a> for MCAS.</p>
<p>To ensure correctness, Saturn data structures are model-checked using <a href="https://github.com/ocaml-multicore/dscheck">dscheck</a>, an experimental model checker for OCaml that cleverly exploits effect handlers to mock and control parallel scheduling. We also plan to <a href="https://autumn.ocamllabs.io/">continuously benchmark</a> the data structures to monitor any performance regressions. We expect Saturn to be released in Q3 2023.</p>
<h4>Domain-Local Await</h4>
<p>With OCaml 5, there are several notions of concurrency:</p>
<ul>
<li>Domains -- OS threads potentially running in parallel on different cores</li>
<li>Systhreads -- OS threads on a given domain that timeshare a domain</li>
<li>Fibers -- Lightweight, language-level threads implemented by the concurrency library. Each concurrency library may have its own scheduler.</li>
</ul>
<p>This makes the task of writing blocking data structures, such as blocking channels, challenging because the blocking mechanism is specific to each notion of concurrency. Ideally, we would like to write blocking data structures that are parametric over the blocking mechanism so that we can describe blocking channels once and for all of the different notions of concurrency.</p>
<p>To this end, Tarides has been developing <a href="https://github.com/ocaml-multicore/domain-local-await">domain-local await</a> (DLA), a scheduler-independent mechanism for blocking. The goal is that concurrency libraries provide the implementation of the DLA interface, and with this, they can use blocking data structures from Saturn. For example, with the <a href="https://github.com/ocaml-multicore/eio/pull/494">implementation of a DLA interface in Eio</a>, it is able to utilise <a href="https://github.com/ocaml-multicore/kcas/pull/32">blocking transactions in kcas</a>. By separating out the blocking mechanism from the blocking data structures, different concurrency libraries such as <code>eio</code> and <a href="https://github.com/ocaml-multicore/domainslib"><code>domainslib</code></a> may communicate easily. At Tarides, we are exploring other scheduler-independent mechanisms for <a href="https://github.com/ocaml-multicore/domain-local-timeout"><code>timeout</code></a> and <a href="https://github.com/polytypic/io"><code>io</code></a>.</p>
<h3>Multicore Testing Tools</h3>
<p>The task of moving a large OCaml codebase to take advantage of new OCaml 5 features may seem daunting. It is likely that none of the existing code was written with concurrency and parallelism in mind. Tarides has been working to empower software engineers with multicore testing tools in order to ease the process of using the new OCaml 5 features.</p>
<h4>Thread Sanitizer</h4>
<p>When parallelism is introduced in a code base, there is the risk of introducing <em>data races</em>. A data race is said to occur when there are two accesses to a memory location, with at least one of them being a write, and there is no synchronisation between the accesses. For example, the following program:</p>
<pre><code><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">r</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-source">ref</span><span class="ocaml-source"> </span><span class="ocaml-constant-numeric-decimal-integer">0</span><span class="ocaml-source">
</span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">d</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Domain</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">spawn</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-keyword-other">fun</span><span class="ocaml-source"> </span><span class="ocaml-constant-language">_</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-source">r</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-constant-numeric-decimal-integer">1</span><span class="ocaml-source">)</span><span class="ocaml-source">
</span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">v</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">!</span><span class="ocaml-source">r</span><span class="ocaml-source">
</span></code></pre>
<p>has a data race, since the main domain and the newly spawned domain <code>d</code> race to access the reference <code>r</code>, and there is no synchronisation between the accesses.</p>
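<p>One possible way to remove this race, shown here as a sketch rather than the only fix, is to make the shared cell an <code>Atomic.t</code> and to join the domain before reading it:</p>
<pre><code>let r = Atomic.make 0
let d = Domain.spawn (fun _ -&gt; Atomic.set r 1)

(* Joining waits for the spawned domain to finish, so its write
   has happened before the read below. *)
let () = Domain.join d
let v = Atomic.get r
</code></pre>
<p>Atomic accesses are synchronising operations, so they never constitute a data race, and the join additionally guarantees that <code>v</code> observes the value written by <code>d</code>.</p>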
<p>As a pragmatic language, OCaml encourages the use of mutable state with primitive operations such as reference cells, mutable record fields, arrays, and standard library data structures such as hash tables, stacks, and queues with in-place modification. Thus, it is likely that the addition of parallelism to an OCaml code base will introduce data races.</p>
<p>In C++, the behaviour of a program with data races is undefined. In OCaml, the situation is much better. OCaml programs with data races have <a href="https://v2.ocaml.org/manual/memorymodel.html">well-defined semantics</a>. In particular, a program with data races will not violate type safety and will not crash. That said, the programs with data races may produce behaviours that cannot be explained only by the interleaving of operations from different threads. Hence, it is important that data races are detected and removed from the code base.</p>
<p>To this end, Tarides has developed Thread Sanitizer (TSan) support for OCaml. TSan is an approach <a href="https://dl.acm.org/doi/10.1145/1791194.1791203">developed by Google</a> to locate data races originally for C++ code bases. It works by instrumenting executables to keep a history of previous memory accesses (at a certain performance cost) in order to detect data races, even when they have no visible effect on the execution. TSan instrumentation has been implemented in various compilers (GCC, Clang, as well as the Go and Swift compilers) and has proved very effective in detecting hundreds of concurrency bugs in large projects. Executables instrumented with TSan report data races <em>without false positives</em>. However, data races in code paths that are not visited will not be detected.</p>
<p>Tarides engineers have used TSan successfully to port large non-trivial code bases such as the work-in-progress port of <a href="https://github.com/mirage/irmin">Irmin</a> to Multicore. The response from the developers using TSan has been overwhelmingly positive. A particularly attractive feature of TSan in OCaml is the ease of use. The developer merely needs to install a different compiler switch with TSan enabled, and without any additional work, TSan reports data races with accurate backtraces for the conflicting accesses. A <a href="https://github.com/ocaml/ocaml/pull/12114">PR</a> for adding TSan support for OCaml is currently open. TSan support for OCaml is likely to appear in OCaml 5.2.</p>
<h4>Property-Based Testing</h4>
<p>Data races are just one of the hazards of parallel programming. Even without data races, the program may produce different results across several runs due to non-determinism. How can the developers gain more confidence about the correctness of their implementations? To this end, we have been developing two property-based testing libraries namely <a href="https://github.com/ocaml-multicore/multicoretests">Lin and STM</a>.</p>
<p>In property-based testing, the programmer provides a specification about the program that should remain true and the system tests that the properties hold under a large number of different executions, typically with randomly generated inputs. In the case of Lin and STM, the program is tested under different interleavings of domains. Lin tests whether the results obtained under parallel execution correspond to the same operations applied one after the other in a sequential execution. STM takes a pure model description and compares the results to the actual results seen in a parallel execution.</p>
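<p>The model-based idea can be sketched by hand with the standard library alone. The example below does not use the Lin or STM APIs and only runs sequentially. It generates random commands, applies them both to the real <code>Stack</code> and to a pure list model, and checks that the results agree at every step. All names in it (<code>cmd</code>, <code>model_step</code>, <code>real_step</code>, <code>agrees</code>) are hypothetical.</p>
<pre><code>type cmd = Push of int | Pop

(* Pure model: the stack is a list; returns the new state and the result. *)
let model_step st = function
  | Push x -&gt; (x :: st, None)
  | Pop -&gt;
    (match st with
     | [] -&gt; ([], None)
     | x :: rest -&gt; (rest, Some x))

(* System under test: the mutable Stack from the standard library. *)
let real_step s = function
  | Push x -&gt; Stack.push x s; None
  | Pop -&gt; Stack.pop_opt s

let random_cmd () =
  if Random.bool () then Push (Random.int 100) else Pop

(* Run n random commands and check that model and implementation agree. *)
let agrees n =
  let s = Stack.create () in
  let rec go st i =
    if i = n then true
    else
      let c = random_cmd () in
      let st', expected = model_step st c in
      if real_step s c = expected then go st' (i + 1) else false
  in
  go [] 0
</code></pre>
<p>Calling <code>agrees 1000</code> checks one thousand random steps. The multicore testing libraries go further by running such command sequences across domains and exploring different interleavings.</p>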
<p>Both libraries have been <a href="https://github.com/ocaml-multicore/multicoretests#issues">extremely effective</a> in identifying issues in the standard library under parallel execution. The OCaml standard library was implemented without parallel execution in mind. While much of the standard library is not parallelism-safe, we do not expect parallel access to the standard library to crash. Lin and STM have been particularly successful in identifying crashes. We believe that Lin and STM will help OCaml 5 developers gain more confidence that their code is correct under parallel execution.</p>
<h2>Call for Action</h2>
<p>If you have an existing OCaml code base, please try OCaml 5 today. If you find regressions, please file an issue on the <a href="https://github.com/ocaml/ocaml/issues/">OCaml GitHub repo</a>. If you are considering utilising the new OCaml 5 features, please give the concurrency libraries and the tools a go. We would love to hear whether the libraries and tools work for you. File issues in corresponding repos if you find anything that is amiss. If you are looking for commercial support on any of these topics, do not hesitate to <a href="/contact/">contact us</a>.</p>
<p>All of the work discussed in this post is open source. If you wish to contribute to these efforts, please look for the "good first issue" tag in any of these repos. If you are looking to learn, please head over to the <a href="https://ocaml.org/community">community section</a> to ask us questions and share and discuss OCaml-related topics.</p>
<p>Happy hacking!</p>
]]></description><link>https://tarides.com/blog/2023-07-07-making-ocaml-5-succeed-for-developers-and-organisations</link><guid isPermaLink="false">https://tarides.com/blog/2023-07-07-making-ocaml-5-succeed-for-developers-and-organisations.html</guid><dc:creator><![CDATA[ KC Sivaramakrishnan ]]></dc:creator><pubDate>Fri, 07 Jul 2023 00:00:00 GMT</pubDate></item><item><title><![CDATA[Zero-Day Attacks: What Are They, and Can a Language Like OCaml Protect You?]]></title><description><![CDATA[<p>Zero-day attacks have been getting increased media attention lately, but what are they? And how can we protect ourselves? Google’s <a href="https://googleprojectzero.blogspot.com/2022/04/the-more-you-know-more-you-know-you.html?m=1">Project Zero</a> tracks zero-day vulnerabilities at major software vendors. In 2021, their tracker noted the detection and disclosure of <a href="https://googleprojectzero.blogspot.com/2022/04/the-more-you-know-more-you-know-you.html?m=1">58 in-the-wild zero-day exploits</a>, which was more than any other year since they started tracking in 2014. This suggests an increased awareness of zero-days among the community of developers, explaining the increased number of reports.</p>
<p>This article will give you an overview of what zero-day attacks are, as well as some of the ways to limit the risks they pose. One way to mitigate zero-day attacks is to utilise a secure-by-design language such as OCaml. In this post, we shall see how OCaml promotes secure-by-design software construction practices and how this mitigates the threat of zero-day attacks. There is a lot that could be said on this topic, and this post will only scratch the surface, but it will be a good introduction and overview to an aspect of OCaml that's not talked about enough!</p>
<h2>Zero-Day Attacks and Trends</h2>
<p>Some basics first: Zero-day attacks are so called because they describe a scenario where threat actors take advantage of an as-of-yet unknown vulnerability in the code of the target. The purpose of the hacks varies; it could be used to introduce various forms of malware into the target’s computer, including ransomware, or to gain access to private identifying information for a phishing scam.</p>
<p>Since it’s an unknown and unpatched vulnerability, the developers are said to have ‘zero days’ to respond to the threat. This also means that whatever antivirus program someone may have in place will be unequipped to handle the threat. This makes the target incredibly vulnerable, being unprotected for the time it takes to release a security patch for the issue – not to mention the time it will take for all users to install that patch.</p>
<p>Hackers and researchers are incentivised to find vulnerabilities by the <a href="https://techmonitor.ai/partner-content/zero-day-vulnerability-exploit-spyware">significant pay-outs offered by private companies</a> that buy and sell zero-day exploits. These companies act as brokers and resell the zero-day exploits to interested parties. Exploits that are in high demand can sell for sums in excess of <a href="https://www.sirp.io/blog/behind-the-rise-of-the-million-dollar-zero-day-market/">one million US dollars</a>. Since the market isn't regulated, it’s hard to track what a buyer uses an exploit for once it's been sold.</p>
<p>Contrary to popular belief, every major operating system can be hacked and exploited as a result of a zero-day attack. While significantly more zero-day attacks are targeted towards Microsoft Windows rather than Apple’s macOS, this is a result of their proportionately larger market share. <a href="https://www.itpro.com/security/zero-day-exploit/360447/why-zero-day-exploits-are-surging-on-an-unprecedented-scale">Essentially, the more users it has, the more attractive the platform is to attackers</a>. Attacks on macOS and iOS still happen.</p>
<p>Furthermore, the strengthening of cybersecurity measures across the board has made <a href="https://www.itpro.com/security/zero-day-exploit/360447/why-zero-day-exploits-are-surging-on-an-unprecedented-scale">zero-day attacks a more attractive option for cybercriminals</a>. Rather than trying to circumvent increasingly strong protective measures, hackers are opting for finding unguarded software vulnerabilities and new attack vectors.</p>
<p>The danger posed by these attacks can affect end users in unpredictable ways. For example, if a financial institution is targeted through software they use, hackers could steal sensitive financial information and conduct fraudulent transactions. This could in turn put the company’s customers at risk. In this way, zero-day attacks are a worry for everyone, as in our increasingly digital world we all have something to lose to a cyberattack.</p>
<h2>Secure-by-Design, a Possible Solution?</h2>
<p>With the rise of zero-day attacks and exploits, focus has shifted to the way software systems are designed. In <a href="https://www.cisa.gov/sites/default/files/2023-04/principles_approaches_for_security-by-design-default_508_0.pdf">a report</a> created by the Cybersecurity and Infrastructure Security Agency (<a href="https://www.cisa.gov">CISA</a>) they, together with several partners including the Federal Bureau of Investigation (<a href="https://www.fbi.gov">FBI</a>), Australian Cyber Security Centre (<a href="https://www.cyber.gov.au">ACSC</a>), Canadian Centre for Cyber Security (<a href="https://www.cyber.gc.ca/en">CCCS</a>), and the United Kingdom’s National Cyber Security Centre (<a href="https://www.ncsc.gov.uk">NCSC-UK</a>) emphasise the need for a fundamental change in how cybersecurity is incorporated in the products and services that technology manufacturers deliver. The report states that:</p>
<blockquote>
<p>Historically, technology manufacturers have relied on fixing vulnerabilities found after the customers have deployed the products, requiring the customers to apply those patches at their own expense. Only by incorporating secure-by-design practices will we break the vicious cycle of creating and applying fixes.</p>
</blockquote>
<p>Instead of reacting to vulnerabilities as they become known, developers should focus on making their software intrinsically more resistant to attack by incorporating secure-by-design principles from the start. This may come at the cost of longer development times now, but that time will be gained back later by not having to release patches and respond to threats. The report reinforces the severity of the threat that cybersecurity vulnerabilities pose and the pressing need for lasting solutions.</p>
<h2>Zero-Day Attacks and OCaml</h2>
<p>How does OCaml factor into the fight against zero-day attacks and cybersecurity exploits? OCaml is an example of a language that supports secure-by-design practices. Some of its core features already protect you against the most common attacks, and there are several projects using OCaml’s strengths to address cybersecurity threats both known and unknown.</p>
<h3>Memory Safety and Zero-Day Attacks</h3>
<p>Memory safety issues are perhaps the best-known vulnerabilities that zero-day attackers target. In languages where memory is manually managed, like C, C++, or Assembly, cybercriminals can try to ‘trick’ the program into writing to memory incorrectly. These types of attacks typically come in the form of buffer overflows, race conditions, page faults, null pointers, stack exhaustion, etc. Memory-related attacks make up <a href="https://www.itpro.co.uk/security/zero-day-exploit/360447/why-zero-day-exploits-are-surging-on-an-unprecedented-scale">the vast majority</a> of zero-day attacks, about 70%, which makes them a serious consideration for any business or organisation.</p>
<p>Memory-safe languages, on the other hand, protect the user against these kinds of attacks simply because they're not possible. Examples of memory-safe languages include OCaml, Java, Rust, and Swift. In OCaml, the compiler provides strong guarantees to ensure that a pointer is only allowed to read and write into the portions of memory intended by the developer (spatial safety). In other languages, like C or C++, this is not the case, so pointers may be exploited to access data outside of the intended structure's memory. The OCaml compiler statically guarantees, at compile time, that a pointer to a record cannot be used to access memory outside of that record – making the language memory-safe.</p>
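<p>Spatial safety also covers arrays, where it is enforced by run-time bounds checks. The sketch below assumes the program is compiled without the <code>-unsafe</code> flag, which disables those checks. The names <code>buffer</code> and <code>read_checked</code> are made up for this example. An out-of-bounds read raises an exception instead of returning whatever happens to sit next to the array in memory.</p>
<pre><code>let buffer = [| 10; 20; 30 |]

(* An out-of-bounds index raises Invalid_argument rather than
   reading adjacent memory; here it is turned into None. *)
let read_checked i =
  match buffer.(i) with
  | v -&gt; Some v
  | exception Invalid_argument _ -&gt; None
</code></pre>
<p>With this definition, <code>read_checked 1</code> returns the element at index 1, while <code>read_checked 3</code> and <code>read_checked (-1)</code> both return <code>None</code>.</p>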
<p>OCaml also provides temporal safety. In C, the heap memory is manually managed by the developer, who decides when to allocate and free memory. This can lead to use-after-free bugs, which may in turn lead to security exploits. OCaml is a garbage-collected language that automatically manages the lifetimes of the heap objects. This makes it impossible to have use-after-free bugs in OCaml, thus preventing a large class of exploits by design.</p>
<p>To read more about memory-safe vs unsafe languages you can check out this <a href="https://about.gitlab.com/blog/2023/03/14/memory-safe-vs-unsafe/">article on Gitlab</a>.</p>
<h3>Security Through Teamwork in Open Source</h3>
<p>Something that’s mentioned less frequently as a tool for reducing the risks of cyberattacks is open-source development of a language or project. The <a href="https://www.ncsc.gov.uk/collection/developers-collection/principles/protect-your-code-repository">British National Cyber Security Centre</a> has several recommendations for secure development principles, including tips for managing code repositories. It emphasises the importance of thorough reviews for all code before merge. When open-source projects are well managed, the number of code reviews and scrutiny from different individuals contributes to their safety.</p>
<p><a href="https://www.intel.co.uk/content/www/uk/en/business/enterprise-computers/resources/what-is-a-zero-day-exploit.html">Intel</a> emphasises that “vigilant attention to code inspection, patching, and maintenance can help to reduce an organization’s vulnerability to zero-day attacks.” Again, in a large open-source community with appropriate methods for merge approvals and access, the sheer number of peer reviewers and testing helps secure a language or project further against zero-day attacks. More eyes and minds working to find and patch vulnerabilities helps in the effort to stay one step ahead of attackers. OCaml has a large open source community collaborating in this way, as do many projects written in OCaml. Other languages operate similarly, such as <a href="https://www.rust-lang.org/">Rust</a> and <a href="https://www.haskell.org/">Haskell</a>.</p>
<h3>Smaller Attack Surfaces: The Security Features of MirageOS and Unikernels</h3>
<p><a href="https://mirage.io/">MirageOS</a> builds on the security features of OCaml to create lightweight and secure applications. Research on MirageOS began in <a href="https://queue.acm.org/detail.cfm?id=2566628">2008</a> in response to the rise of virtual machines (VMs) being used to make cloud computing more efficient. Whilst virtualisation brought many benefits, reliance on VMs added <a href="https://queue.acm.org/detail.cfm?id=2566628">“yet another layer to an already highly layered software stack.”</a> This not only made using and hacking on the software more cumbersome, but also made it more vulnerable to attacks due to its large size.</p>
<p>MirageOS addresses this by restructuring VMs into modular components called <em>unikernels</em>. These are small, flexible, and secure specialised OS kernels that act as individual software components. Each unikernel is standalone and responsible for one function or task. An application is made up of several unikernels working together as a distributed system.
Cybersecurity experts <a href="https://www.ibm.com/uk-en/topics/attack-surface">generally agree</a> that the bigger the ‘attack surface’ is, the more vulnerable the application is to attack. Because of their small size, unikernels have a significantly smaller attack surface than equivalent virtualised solutions, which makes them more secure.</p>
<p>The unikernels of MirageOS also benefit from the security features of OCaml, as <a href="https://queue.acm.org/detail.cfm?id=2566628">Anil Madhavapeddy and David J. Scott describe in their paper</a>:</p>
<blockquote>
<p>...managed memory eliminates many resource leaks, type inference results in more succinct source code, static type checking verifies that code matches some abstraction criteria at compilation time rather than execution time, and module systems allow the manipulation of this code at the scales demanded by a full OS and application stack.</p>
</blockquote>
<p>Combined, the use of OCaml and the unikernel design makes MirageOS an attractive solution with a variety of applications. For example, IoT (Internet of Things) devices face many security challenges, and MirageOS can provide a secure, efficient way to communicate between multiple devices and keep user data safe.</p>
<h3>Putting MirageOS to the Test</h3>
<p>Don’t just take our word for it, however, but consider the collective efforts of thousands of hackers. In 2015, the MirageOS team decided to put unikernels to the test. They created a ‘piñata’-style security bounty in the form of a unikernel that held a private key to a Bitcoin wallet with 10 BTC. Anyone who could successfully break into the piñata and get the key would walk away with the 10 BTC, no questions asked. Any method of attack was permitted:</p>
<blockquote>
<p><a href="https://mirage.io/blog/bitcoin-pinata-results">Anything allowing you to get a valid certificate (signed by the cryptographic material which shouldn't leave the piñata) or reading the memory location where the private key to the bitcoin wallet is stored, an exploitable flaw in any software layer (OCaml runtime, virtual network device, TCP/IP stack, TLS library, X.509 validation, or elsewhere), or anything else.</a></p>
</blockquote>
<p>The code for MirageOS is all open source, so the code for how unikernels are built is freely accessible. This means that failure on the attacker’s part was not due to imperfect knowledge or secrecy, but a direct result of the strength of the unikernel solution. This gives us a much more realistic impression of how well a unikernel can resist attack.</p>
<p>To encrypt the unikernel’s connection to the internet, the team used <a href="https://github.com/mirleft/ocaml-tls">OCaml-TLS</a>, an implementation of the Transport Layer Security (TLS) protocol, which secures communication between web services and clients such as web browsers. Written entirely in OCaml, it benefits from the type- and memory-safety that comes with the functional programming language. This is in contrast to a TLS implementation written in C, which is vulnerable to attack on these fronts.</p>
<p>At the time of launch, 10 BTC were worth around 2000 EUR, and by the <a href="https://hannes.robur.coop/Posts/Pinata">time the project ended in 2018</a>, 10 BTC were worth around 200 000 EUR. During the time the ‘piñata’ was live, over 150 000 attempts were made to connect to its bounty. The ‘piñata’ was retired in 2018 with no successful attempts at cracking it open. At the time, the test illustrated the viability of type- and memory-safe unikernels as a secure solution that could withstand continued targeted attack.</p>
<p>This still holds true today, with cybersecurity at the core of MirageOS and unikernels. The experiment itself illustrates an innovative and collaborative way of testing a product that leverages the strength of the open-source development community. The team devised a way of incentivising hundreds of people to scrutinise their public code and try to break into the unikernel. This gave them a sense of their solution's strength and ideas on how they could fortify it further. They have since built on the insights gained from the BTC unikernel ‘piñata’ experiment to strengthen its resistance to zero-day attacks.</p>
<h2>Conclusion</h2>
<p>By carefully choosing your programming language and software, you can protect yourself, your projects, and your users against zero-day attacks and security threats. Picking a language with strong safety features is crucial to the long-term success and safety of your projects. Due to the high proportion of memory-safety exploits among zero-day attacks, using a memory-safe language gives you an advantage. Attackers are constantly honing their skills and looking for new vulnerabilities to exploit, so choosing software that is resistant to their attempts is an important part of ensuring your projects are secure.</p>
<p>There’s much more to say about OCaml and the potential it has to protect you against cyberattacks, including technical aspects like formal verification which we haven’t touched on here. If you’re looking for the technical details, don’t worry! Just look out for future posts!</p>
<p>If you’re looking for an efficient, high-security solution to protect your sensitive data and think OCaml or MirageOS might be right for you, don’t hesitate to <a href="/contact/">contact us</a> for more information or to get you started.
You can also find us on <a href="https://bsky.app/profile/tarides.com">Bluesky</a> and <a href="https://www.linkedin.com/company/tarides/">LinkedIn</a>.</p>
<h3>Sources</h3>
<p><em>Zero-Day Attacks</em></p>
<ul>
<li><a href="https://www.itpro.co.uk/security/zero-day-exploit/360447/why-zero-day-exploits-are-surging-on-an-unprecedented-scale">ITPro: What's Behind the Explosion in Zero-Day Exploits?</a></li>
<li><a href="https://www.intel.co.uk/content/www/uk/en/business/enterprise-computers/resources/what-is-a-zero-day-exploit.html">Intel: What is a Zero-Day Exploit?</a></li>
<li><a href="https://www.cynet.com/zero-day-attacks/zero-day-exploit-recent-examples-and-four-detection-strategies/">Cynet: Zero-Day Exploits: Examples, Prevention, and Detection</a></li>
<li><a href="https://www.ncsc.gov.uk/collection/developers-collection/principles/protect-your-code-repository">National Cyber Security Centre: Protect Your Code Repository</a></li>
<li><a href="https://techmonitor.ai/partner-content/zero-day-vulnerability-exploit-spyware">TechMonitor: The Zero Day Vulnerability Trade Remains Lucrative but Risky</a></li>
<li><a href="https://googleprojectzero.blogspot.com/2022/04/the-more-you-know-more-you-know-you.html?m=1">Project Zero: The More You Know, The More You Know You Don’t Know</a></li>
<li><a href="https://www.sirp.io/blog/behind-the-rise-of-the-million-dollar-zero-day-market/">SIRP: https://www.sirp.io/blog/behind-the-rise-of-the-million-dollar-zero-day-market/ </a></li>
</ul>
<p><em>Memory Safety</em></p>
<ul>
<li><a href="https://about.gitlab.com/blog/2023/03/14/memory-safe-vs-unsafe/">GitLab: How to Secure Memory-Safe vs Manually Managed Languages</a></li>
<li><a href="https://www.itpro.co.uk/security/zero-day-exploit/360447/why-zero-day-exploits-are-surging-on-an-unprecedented-scale">ITPro: What's Behind the Explosion in Zero-Day Exploits?</a></li>
</ul>
<p><em>MirageOS</em></p>
<ul>
<li><a href="https://queue.acm.org/detail.cfm?id=2566628">ACM Queue: Unikernels: Rise of the Virtual Library Operating System</a></li>
<li><a href="https://mirage.io/blog/bitcoin-pinata-results">MirageOS Bitcoin Piñata Results</a></li>
<li><a href="https://hannes.nqsb.io/Posts/Pinata">Full Stack Engineer: The Bitcoin Piñata - No Candy for You</a></li>
<li><a href="https://robur.coop/Our%20Work/Projects">Robur: Robur Reproducible Builds</a></li>
</ul>
<p><em>OCaml-tls</em></p>
<ul>
<li><a href="https://mirage.io/blog/introducing-ocaml-tls">MirageOS: Introducing Transport Layer Security (TLS) in Pure OCaml</a></li>
</ul>
]]></description><link>https://tarides.com/blog/2023-07-05-zero-day-attacks-what-are-they-and-can-a-language-like-ocaml-protect-you</link><guid isPermaLink="false">https://tarides.com/blog/2023-07-05-zero-day-attacks-what-are-they-and-can-a-language-like-ocaml-protect-you.html</guid><dc:creator><![CDATA[ Isabella Leandersson ]]></dc:creator><pubDate>Wed, 05 Jul 2023 00:00:00 GMT</pubDate></item><item><title><![CDATA[OCaml Receives the ACM SIGPLAN Programming Languages Software Award]]></title><description><![CDATA[<p><a href="https://ocaml.org/">OCaml</a> has received one of the most <a href="https://www.sigplan.org/Awards/Software/">prestigious
awards</a> in the field of programming languages, and we are
thrilled that four of the award winners are from Tarides. This
represents a huge success for the language, the named maintainers, and
everyone who has worked on improving OCaml. We want to thank everyone
for their hard work and celebrate the award alongside the OCaml
community. Here’s to many more years of hacking together!</p>
<h2>A Significant Impact on Building Better Software</h2>
<p>The ACM special interest group on programming languages,
<a href="https://www.sigplan.org/">SIGPLAN</a>, annually recognises significant
developments in a software system and awards it with the Programming
Languages Software Award. To be selected for this prestigious award, a
software system must have made a significant impact on <a href="https://www.sigplan.org/Awards/Software/">programming
language research, implementation, and tools</a>.</p>
<p>Previous recipients include <a href="https://webassembly.org/">WebAssembly</a>, the first widely
adopted language for web browsers since JavaScript; and
<a href="https://www.scala-lang.org/">Scala</a>, one of the few programming languages from academia
that has had a significant impact on the world as well as on
programming languages research.</p>
<p>This year, fourteen developers in the open-source OCaml ecosystem have
been recognised for their contributions to the design and
implementation of the language. <a href="https://ocaml.org/">OCaml</a> is a functional
programming language that combines type and memory safety with
powerful features like garbage collection and a type-inferring
compiler. Born out of extensive research into ML, OCaml was first
released in 1996 by Xavier Leroy, Jérôme Vouillon, Damien Doligez, and
Didier Rémy. Since then, the open-source community surrounding OCaml
has grown (in part, thanks to Tarides!) with new tools, libraries,
and applications.</p>
<p><strong>OCaml is unique because it occupies a sweet spot in the space of
programming language designs. It provides a combination of efficiency,
expressiveness and practicality that is matched by no other
language.</strong> This is largely because OCaml is an elegant combination of
language features developed over the last <a href="https://dev.realworldocaml.org/prologue.html">60 years</a>, with strong
roots in academia and industry. The language also continues to
evolve and innovate, with the release of <a href="/blog/2022-12-19-ocaml-5-with-multicore-support-is-here/">OCaml 5</a> last
December. That release heralds a new era for OCaml by providing the
infrastructure for programming efficiently and safely using multiple
cores. OCaml 5 also added <a href="https://v2.ocaml.org/manual/effects.html">effect handlers</a> to the language,
which makes OCaml the first mainstream language with support for
effects. Meanwhile, OCaml is now used for <a href="https://blog.janestreet.com/why-ocaml/">trading billions of
dollars in global equity daily</a> and for <a href="https://www.docker.com/blog/how-docker-desktop-networking-works-under-the-hood/">helping millions of daily
users of Docker to access the network</a>.</p>
<p>The engineers receiving this award have played a crucial role in
the long-term development of the OCaml language. Their hard work has
made OCaml a language that prioritises performance and expressivity
while strongly focusing on security and safety. The fourteen
developers named by ACM SIGPLAN are: <a href="https://github.com/dra27">David
Allsopp</a>, <a href="https://github.com/Octachron/">Florian
Angeletti</a>, <a href="https://github.com/stedolan">Stephen
Dolan</a>, <a href="https://en.wikipedia.org/wiki/Damien_Doligez">Damien Doligez</a>, <a href="https://github.com/alainfrisch">Alain
Frisch</a>, <a href="https://github.com/garrigue">Jacques
Garrigue</a>, <a href="https://anil.recoil.org/">Anil Madhavapeddy</a>,
<a href="https://github.com/maranget">Luc Maranget</a>, <a href="https://github.com/nojb">Nicolás Ojeda
Bär</a>, <a href="https://gallium.inria.fr/~scherer/">Gabriel Scherer</a>, <a href="https://kcsrk.info/">KC
Sivaramakrishnan</a>, <a href="https://github.com/vouillon">Jérôme Vouillon</a>,
<a href="https://github.com/lpw25">Leo White</a> and <a href="https://xavierleroy.org/">Xavier Leroy</a>.</p>
<p>It is well worth noting that Xavier Leroy already holds many
prestigious awards for his work: he received the
ACM SIGPLAN <a href="https://www.sigplan.org/Awards/Achievement/">Programming Languages Achievement</a> award in 2022,
holds the chair of software science at the <a href="https://www.college-de-france.fr/">Collège de France</a>,
and is a member of the <a href="https://www.academie-sciences.fr/">Académie des sciences</a>. Xavier has made
pivotal contributions across various fields, including the design of
type and module systems, bytecode verification, and verified
compilation, to highlight a few. He is also the visionary architect of
the <a href="https://compcert.org/">CompCert C compiler</a>, the first formally verified,
high-assurance compiler for almost all of the C language. This
enormous achievement generated entirely new areas of activity and
research: CompCert won the 2022 <a href="https://www.sigplan.org/Awards/Software/">ACM SIGPLAN Programming Languages
Software award</a> and the
<a href="https://awards.acm.org/software-system">2021 ACM Software Systems
award</a>. But Xavier's research
contributions are not just integral to his illustrious career. They
are also pivotal to OCaml's current success and widespread appeal. His
active and ongoing influence is deeply embedded within OCaml, shaping
it into the rigorous yet pragmatic language that it is today.</p>
<h2>The Role of Tarides</h2>
<p>Tarides is honoured to contribute to the development of OCaml and to
be part of the vibrant ecosystem surrounding the language. <em>Four of the
developers receiving the award are affiliated with Tarides: David, KC,
Jérôme and Anil!</em></p>
<p>The list of recipients comprises award-winning and internationally
acclaimed academics (Inria, University of Cambridge, University of
Nagoya, IIT Madras) as well as impactful and innovative industry
professionals (Lexifi, Jane Street, Tarides). This list makes a
compelling case for the model that guides the entire OCaml ecosystem
and that we’ve adopted at Tarides. We combine the powers of academia,
industry, and community hackers by collaborating for the benefit of
OCaml as a whole.</p>
<p>Moreover, Tarides is a descendant of <a href="/blog/2022-01-27-ocaml-labs-joins-tarides/">OCaml Labs</a> at the
University of Cambridge, a decade-long effort aiming to bring OCaml to
the masses: KC, Stephen, Leo, and David all started off at the
University of Cambridge, under the direction of <a href="https://anil.recoil.org/projects/ocamllabs/">Anil</a>. Since
then, OCaml Labs and now Tarides have dedicated much time and energy
towards maintaining several parts of the OCaml ecosystem, including
the <a href="/blog/2022-12-19-ocaml-5-with-multicore-support-is-here/">compiler</a>, <a href="https://ocaml.org/docs/platform">platform tools</a>, the <a href="https://ocaml.ci.dev/">CI
infrastructure</a>, and <a href="https://ocaml.org/">OCaml.org</a>.</p>
<p>Finally, we want to acknowledge that this award
recognises the hard work of people beyond just the list of
winners. There are countless people who have contributed to OCaml, who
taken together would be too numerous to formally
recognise. Nevertheless, their hard work is palpable and their impact
far-reaching, and we want to thank everyone who has played a role in
bringing OCaml to where it is today. This achievement is one we all
share with the entire community.</p>
<h2>OCaml 5</h2>
<p>As described in <a href="https://www.youtube.com/watch?v=zJ4G0TKwzVc">KC's keynote</a>, OCaml 5.0 introduced <a href="/blog/2023-03-02-the-journey-to-ocaml-multicore-bringing-big-ideas-to-life/">much-anticipated
new features</a> to OCaml, supporting shared-memory
parallelism and effect handlers. The team focused on making that
release as backwards compatible as possible; thus, existing OCaml
users could upgrade without experiencing breakage. OCaml 5 allows
users to combine <a href="/blog/2022-12-19-ocaml-5-with-multicore-support-is-here/">safety and security features</a> with
significant <a href="/blog/2022-12-20-how-nomadic-labs-used-multicore-processing-to-create-a-faster-blockchain/">performance improvements</a>, including parallel
programming and <a href="https://github.com/ocaml-multicore/eio">improved methodologies for writing concurrent
code</a>.</p>
<p>If you want to learn how to use the parallelism features in OCaml 5,
have a look at these <a href="https://github.com/kayceesrk/ocaml5-tutorial">tutorials</a> on GitHub. For more
details on exactly what changes OCaml 5 brought to OCaml, the
<a href="https://v2.ocaml.org/releases/5.0/notes/Changes">changelog</a> contains all the information you need.</p>
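<p>As a small taste of the parallelism support, here is a minimal sketch (an illustrative example, not taken from the tutorials) that uses the standard library's <code>Domain.spawn</code> and <code>Domain.join</code> to run two computations on separate domains. It assumes OCaml 5.0 or later, and the <code>fib</code> function is only a placeholder workload.</p>
<pre><code>(* Requires OCaml 5.0 or later: the Domain module is part of the
   standard library from that release onwards. *)

(* Placeholder CPU-bound workload, chosen only for illustration. *)
let rec fib n = if n &lt; 2 then n else fib (n - 1) + fib (n - 2)

let () =
  (* Each call to Domain.spawn starts a new domain that can run in
     parallel with the others. *)
  let d1 = Domain.spawn (fun () -&gt; fib 25) in
  let d2 = Domain.spawn (fun () -&gt; fib 26) in
  (* Domain.join waits for a domain to finish and returns its result. *)
  let a = Domain.join d1 in
  let b = Domain.join d2 in
  Printf.printf "fib 25 + fib 26 = %d\n" (a + b)
</code></pre>
<p>Domains are relatively heavyweight, so real programs typically spawn about as many as there are available cores rather than one per task.</p>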
]]></description><link>https://tarides.com/blog/2023-06-20-ocaml-receives-the-acm-programming-languages-software-award</link><guid isPermaLink="false">https://tarides.com/blog/2023-06-20-ocaml-receives-the-acm-programming-languages-software-award.html</guid><dc:creator><![CDATA[ Thomas Gazagnaire ]]></dc:creator><pubDate>Tue, 20 Jun 2023 00:00:00 GMT</pubDate></item><item><title><![CDATA[Optimising Archive Node Storage for Tezos]]></title><description><![CDATA[<p>For the past few years, Tarides has been responsible for the storage component of Tezos, from L1 and L2 shells up to the Tezos protocol. In 2022, our main focus was on improving storage performance and UX for running nodes and bakers. Our efforts resulted in significant improvements, including reducing the storage requirements for rolling nodes by 10x and <a href="/blog/2022-04-26-lightning-fast-with-irmin-tezos-storage-is-6x-faster-with-1000-tps-surpassed/">decreasing the memory usage of the storage layer by 80%</a>. We also maintain core storage APIs necessary to scale the TPS of the network. But we didn't stop there! This year, we've already worked on improving the performance of archive nodes and delivered the project last month. In this blog post, we'll take a closer look at our work and what it means for the future of Tezos.</p>
<p>Last year we released a <a href="/blog/2022-11-10-towards-minimal-disk-usage-for-tezos-bakers/">garbage collector (GC) for the <code>irmin-pack</code> backend</a> to reclaim disk space. The <code>irmin-pack</code> backend is notably used by Octez to support the <a href="https://tezos.com">Tezos blockchain</a> storage on disk. In this context, the Irmin GC is used to perform <a href="https://research-development.nomadic-labs.com/pruning-the-context-and-other-seasonal-activities.html">context pruning</a> by <a href="https://tezos.gitlab.io/user/history_modes.html">rolling nodes</a>. This operation reclaims disk space by deleting old blocks that are no longer required to participate in the Tezos consensus algorithm. In Git terms, the rolling nodes only maintain a <a href="https://github.blog/2020-12-21-get-up-to-speed-with-partial-clone-and-shallow-clone/">shallow history</a> of the blockchain. By contrast, archive nodes retain the entirety of the blockchain to ensure integrity. The Irmin GC has the additional benefit of compacting the data on disk, which improves the performance of disk operations thanks to better data locality.</p>
<p>Following the context pruning release in Octez v16, we focused on improving the performance of the Irmin GC, in both disk and memory usage. We also brought the optimisation benefit of data compaction to archive nodes. As the Tezos blockchain keeps growing, archive nodes' storage requirements are also becoming problematic, so our solution opens the possibility of using multiple hard drives to scale beyond a single disk. In the future, this could enable archive node operators to use a small SSD with excellent performance to store the most recent blocks, while storing older blocks on a larger, less expensive disk.</p>
<h2>Archive Node Storage Optimisations</h2>
<p>In the following, we'll explain how and why the new archive node storage was divided into multiple volumes:</p>
<ul>
<li>The <code>upper</code> volume contains the most recent history and is identical to the store used by a rolling node. This volume stores everything necessary to participate in endorsements and the Tezos consensus algorithm.</li>
<li>The <code>lower</code> volumes are separate folders that store older blocks to preserve the blockchain's integrity and respond to RPC requests.</li>
</ul>
<p>When Octez interacts with the store, Irmin transparently directs reads and writes to the correct volume while exposing the same API as before. New public APIs added in Irmin 3.7 allow Octez to configure the lower volumes directory and to create new lower volumes dynamically.</p>
<p>An important consideration is that each volume is <em>self-contained</em>:</p>
<ul>
<li>Reading recent history and adding new blocks will only interact with the <code>upper</code> volume, with no need to consult the lower volumes at all. This implies that archive nodes will perform exactly the same operations and access the same data layout as a rolling node would.</li>
<li>Reading information about an old block, and recursively all of its children's objects, will only perform reads from one specific <code>lower</code> volume pertaining to that block. As an example, it means that an RPC request to read a block will interact only with a single volume.</li>
</ul>
<p>In other words, data locality is enforced and random reads are bounded to a much smaller region (proportional to the size of the volume where the operation takes place).</p>
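<p>To make the idea of self-contained volumes concrete, here is a deliberately simplified OCaml sketch. It is not Irmin's API: the <code>volume</code> and <code>store</code> types, the <code>volume_for</code> function, and the assumption that each volume covers a contiguous range of block levels are all hypothetical, chosen only to illustrate how a read can be routed to exactly one volume.</p>
<pre><code>(* Hypothetical model, not Irmin's actual data structures. We assume
   each volume covers a contiguous, inclusive range of block levels. *)
type volume = { name : string; first_level : int; last_level : int }

type store = { upper : volume; lowers : volume list }

let contains v level = v.first_level &lt;= level &amp;&amp; level &lt;= v.last_level

(* Find the single volume responsible for a block level: recent blocks
   live in the upper volume, older ones in exactly one lower volume. *)
let volume_for store level =
  if contains store.upper level then Some store.upper
  else List.find_opt (fun v -&gt; contains v level) store.lowers

let () =
  let store =
    { upper = { name = "upper"; first_level = 200; last_level = 299 };
      lowers =
        [ { name = "lower.0"; first_level = 0; last_level = 99 };
          { name = "lower.1"; first_level = 100; last_level = 199 } ] }
  in
  List.iter
    (fun level -&gt;
      match volume_for store level with
      | Some v -&gt; Printf.printf "level %d is served by %s\n" level v.name
      | None -&gt; Printf.printf "level %d is not stored\n" level)
    [ 42; 150; 250; 300 ]
</code></pre>
<p>In this toy model, a lookup never touches more than one volume, which is the property that bounds random reads to a small region of the disk.</p>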
<p>To achieve these self-contained volumes, a bit of data duplication is required. This is where the Irmin GC compaction algorithm intervenes! Each volume contains a small summary of the old data required for self-containment. Thanks to this summary, complex reads can avoid visiting other lower volumes since the required data can be found locally. This has two benefits:</p>
<ul>
<li>This data summary is compacted. Previously, archive node performance suffered from random reads into an enormous data file, which made archive nodes slower than rolling nodes. With the new <code>upper</code> volume, archive nodes will be able to perform those expensive reads from the compact summary -- in the exact same way as rolling nodes already do.</li>
<li>As long as configured paths are preserved, each <code>upper</code> and <code>lower</code> volume can be stored on different disks. The initial Octez integration will not expose these capabilities directly, but as the blockchain history continues to grow, this feature will give Octez more flexibility in what it can offer archive node operators.</li>
</ul>
<p>Furthermore, the GC compaction algorithm is also applied to the lower volumes' data. The Tezos blockchain can contain temporary forks that will be abandoned later. While those blocks are important in the <code>upper</code> volume until consensus has been reached, they can safely be removed when data is transferred to a lower volume. This saves disk space and clears unused blocks out of the store.</p>
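<p>The difference between the two kinds of node at GC time can be summarised with a toy OCaml model. This is a sketch under simplifying assumptions rather than the real algorithm: the <code>block</code> and <code>mode</code> types, the <code>on_main_chain</code> flag, and the <code>keep_from</code> parameter are hypothetical names introduced for illustration.</p>
<pre><code>(* Toy model only: real blocks and the real GC are far more involved. *)
type block = { level : int; on_main_chain : bool }

type mode = Rolling | Archive

(* Returns the blocks that stay in the upper volume, paired with the
   blocks handed over to a lower volume. Blocks at or above [keep_from]
   are considered recent and always stay in the upper volume. *)
let gc mode ~keep_from upper =
  let recent, old = List.partition (fun b -&gt; b.level &gt;= keep_from) upper in
  match mode with
  | Rolling -&gt; (recent, []) (* old blocks are simply deleted *)
  | Archive -&gt;
      (* old blocks are archived, minus those from abandoned forks *)
      (recent, List.filter (fun b -&gt; b.on_main_chain) old)

let () =
  let upper =
    [ { level = 1; on_main_chain = true };
      { level = 1; on_main_chain = false }; (* abandoned fork *)
      { level = 2; on_main_chain = true };
      { level = 3; on_main_chain = true } ]
  in
  let kept, archived = gc Archive ~keep_from:3 upper in
  Printf.printf "upper keeps %d block(s), lower receives %d\n"
    (List.length kept) (List.length archived)
</code></pre>
<p>In both modes the upper volume ends up with the same contents, which matches the claim that archive nodes and rolling nodes share the same store for recent history.</p>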
<p>The following graphs show the performance of the new archive nodes compared to the existing fast rolling nodes. We replay the Tezos blockchain trace, performing all operations as fast as possible (in the same way it happened, only without network interference). By recording the disk space used by the two nodes over time, we can observe how fast they can process and store new blocks on disk:</p>
<p><img sizes="(min-width: 1360px) 1360px, (min-width: 680px) 680px, 100vw" srcset="/blog/images/irminGC_1-170w~rd3-llUqIu1cZpmDGfF-Zw.webp 170w, /blog/images/irminGC_1-340w~kkOW9-caQyvKJhmbCIHuNA.webp 340w, /blog/images/irminGC_1-680w~s_X3UB1WQirSqtkt0WciAg.webp 680w, /blog/images/irminGC_1-1360w~AnMKds-QwPfG3S05cT0gWw.webp 1360w" src="/blog/images/irminGC_1-1360w~AnMKds-QwPfG3S05cT0gWw.webp" alt="Disk space usage of the Irmin GC over time"></p>
<p>The purple sawtooth wave of the rolling node disk usage corresponds to when the Irmin garbage collector visits the <code>upper</code> volume to release old blocks and free disk space. In production, this happens at the end of every Tezos cycle.</p>
<p>In the case of archive nodes, this old data will not be deleted. Instead, it will be transferred and compacted to the lower volume (in yellow). This results in an increasing total disk space (in green) as the blockchain grows. The blue line shows that the data transfer can take a few minutes (depending on the hard drive that stores the <code>lower</code> volume)! But as this data transfer is done in a background process, the new archive node is able to keep up with the fast rolling node performance like clockwork. The graph confirms this assertion, as the <code>upper</code> disk space used by the archive node (in blue) is the same as the disk space used by the rolling node (in purple). This means that both nodes were able to process new blocks at the same speed.</p>
<p>This is not a big surprise, since archive nodes are running the same code as a rolling node in their <code>upper</code> volume. In fact, one could imagine deleting the lower archived volumes, and the archive node would keep working as a rolling node! (Well, almost! We have a safeguard in place to detect that an archive node was incorrectly configured and has lost parts of its history, which could happen if the <code>lower</code> volumes are moved to a different disk.)</p>
<p>Looking at the nodes' memory usage, we can see that the tradeoff for using the efficient <code>upper</code> volume optimisation in archive nodes is that they must now run the Irmin garbage collector. The GC requires some extra RAM while it runs, causing spikes in memory usage:</p>
<p><img sizes="(min-width: 1360px) 1360px, (min-width: 680px) 680px, 100vw" srcset="/blog/images/irminGC_2-170w~VC_uoqOtiFYulHGfrKi-Tw.webp 170w, /blog/images/irminGC_2-340w~C2ExJ8jczjoyePZBW1lPAQ.webp 340w, /blog/images/irminGC_2-680w~xwwYRCSGmAnHY5DGpicbyQ.webp 680w, /blog/images/irminGC_2-1360w~QlzdYnf8csXmtpdY1A2KpQ.webp 1360w" src="/blog/images/irminGC_2-1360w~QlzdYnf8csXmtpdY1A2KpQ.webp" alt="Memory usage of the Irmin GC over time"></p>
<p>Otherwise, the analysis of the memory graph is the same. The GC context pruning terminates a bit earlier for rolling nodes, as archive nodes have to go through the additional step of compacting old data and transferring it to the <code>lower</code> volume (while rolling nodes just delete it). Once again, we get a confirmation that this extra work does not impact the archive node's performance. Rather, it keeps processing new blocks as fast as a rolling node! Both nodes reach the end of the cycles at the same time, triggering the next GC memory spike.</p>
<p>While the integration of these features into Octez is still in progress, early results show a noticeable decrease in bootstrapping time (by ~30%!) for an archive node when it is using <code>lower</code> volumes versus when it is not. These results need further verification, but they demonstrate some of the potential expected performance improvements for archive nodes.</p>
<h2>Other Optimisations in Irmin 3.5, 3.6, and 3.7</h2>
<p>Behind the scenes, further optimisations of Octez context pruning have been introduced. They provide benefits to both the existing rolling nodes and the new archive node volumes.</p>
<p>First, in Irmin 3.5, we fixed the garbage collector to bound the amount of disk space required to perform a context pruning:</p>
<p><img sizes="(min-width: 1360px) 1360px, (min-width: 680px) 680px, 100vw" srcset="/blog/images/irminGC_3-170w~O49GYRJYRMLuIt16aBW9wQ.webp 170w, /blog/images/irminGC_3-340w~0hOW-peaiKQuJX-iv0mGtg.webp 340w, /blog/images/irminGC_3-680w~3_97WWwpz28HK-YNTCb9NA.webp 680w, /blog/images/irminGC_3-1360w~2BDQQDimDbqNgkQn3ZKlKQ.webp 1360w" src="/blog/images/irminGC_3-1360w~2BDQQDimDbqNgkQn3ZKlKQ.webp" alt="Comparison of disk space usage between Irmin 3.4.2 and Irmin.3.5.0"></p>
<p>This has already been made available in Octez v16 with the release of Irmin 3.5 in December 2022.</p>
<p>In Irmin 3.6, the compaction algorithm was optimised to traverse the blockchain in disk order. Reducing random accesses makes context pruning faster and halves the memory required during GC. The memory graph below shows the improvements, both in the height and width of the spikes induced by the GC, with Irmin 3.5 in green and Irmin 3.6 in orange:</p>
<p><img sizes="(min-width: 1360px) 1360px, (min-width: 680px) 680px, 100vw" srcset="/blog/images/irminGC_4-170w~LkdINdfjUBIyHGTFTiSv-g.webp 170w, /blog/images/irminGC_4-340w~dEEuMDySgk3eulqJ0JxmAg.webp 340w, /blog/images/irminGC_4-680w~Nzdu_PV8LRGOm3CqG1l1-g.webp 680w, /blog/images/irminGC_4-1360w~zxU70H1invR3iGhcWo7TCQ.webp 1360w" src="/blog/images/irminGC_4-1360w~zxU70H1invR3iGhcWo7TCQ.webp" alt="Comparison of memory usage between Irmin.3.5 and Irmin.3.6"></p>
<p>Finally, in Irmin 3.7, the summary produced by the compaction algorithm and present in each <code>upper</code> and <code>lower</code> volume was optimised for reading speed. As this summary is accessed often, we can see from the trace replay that operations start to perform faster once the compacted summary is available, right after the first GC:</p>
<p><img sizes="(min-width: 1360px) 1360px, (min-width: 680px) 680px, 100vw" srcset="/blog/images/irminGC_5-170w~uyUx5p8PycKNM8X7R53few.webp 170w, /blog/images/irminGC_5-340w~AFn7nWjq5987E9sDcLWrYA.webp 340w, /blog/images/irminGC_5-680w~mThHkIVKVlA2Fk68onQS-g.webp 680w, /blog/images/irminGC_5-1360w~lH26hEiup_15fehmdy_bxg.webp 1360w" src="/blog/images/irminGC_5-1360w~lH26hEiup_15fehmdy_bxg.webp" alt="Comparison of disk space usage between Irmin.3.6 and Irmin.3.7"></p>
<p>A read-intensive benchmark shows the performance boost on different read patterns and volume sizes:</p>
<p><img sizes="(min-width: 1360px) 1360px, (min-width: 680px) 680px, 100vw" srcset="/blog/images/irminGC_6-170w~YOBwDSwBq4owMSA8RNcG5g.webp 170w, /blog/images/irminGC_6-340w~e3SRPbzjJ9FHgzjOWgTdzg.webp 340w, /blog/images/irminGC_6-680w~rejbS09bWW-BYuG_ZRHKbA.webp 680w, /blog/images/irminGC_6-1360w~Bc2WDAmhPBBkINgbVJZ0sQ.webp 1360w" src="/blog/images/irminGC_6-1360w~Bc2WDAmhPBBkINgbVJZ0sQ.webp" alt="Comparing commit load time medians for mmap vs. no mmap"></p>
<h2>Conclusion</h2>
<p>The Irmin 3.5 and 3.6 releases brought much-needed disk and memory optimisations to Octez context pruning for rolling nodes. The Irmin 3.7 release brings more improvements to rolling nodes and introduces the same optimised garbage collection design to archive nodes, allowing them to have the same small and efficient store for recent history and enabling future improvements for storing history on multiple disks. Integration of Irmin 3.7's features into Octez is in progress and will ship in a future version of Octez. In the meantime, we welcome your comments and feedback on the optimisations and design choices. Join the conversation on the <a href="https://discuss.ocaml.org/">OCaml Discuss forum</a>, in the <a href="https://github.com/mirage/irmin/issues">GitHub Issues</a>, and through comments on the <a href="https://forum.tezosagora.org/">Tezos Agora post</a>.</p>
]]></description><link>https://tarides.com/blog/2023-05-05-optimising-archive-node-storage-for-tezos</link><guid isPermaLink="false">https://tarides.com/blog/2023-05-05-optimising-archive-node-storage-for-tezos.html</guid><dc:creator><![CDATA[ Irmin Team ]]></dc:creator><pubDate>Fri, 05 May 2023 00:00:00 GMT</pubDate></item><item><title><![CDATA[OCaml at MinidebConf TN 2023]]></title><description><![CDATA[<p><a href="https://tn23.mini.debconf.org/">MinidebConf TN 23</a> was organised by Debian Developers and Villupuram Linux Users Group (VGLUG) as a precursor to DebConf 23 in September at Kochi, India. I had an opportunity to attend and speak at MiniDebConf TN.</p>
<p>I presented two sessions, one built on our experiences of introducing <a href="https://github.com/ocaml/code-of-conduct">a Code of Conduct</a> to an <a href="https://discuss.ocaml.org/t/adopting-the-ocaml-code-of-conduct/10870">open source community</a> (<a href="https://hackmd.io/JIWCOrBfQ7CfzPqeDw4t2Q#/">slides here</a>), and one called <a href="https://hackmd.io/wgB3EzlAQA6aTnQGyyp5Rw#/"><em>An Invitation to OCaml</em></a>, aimed at people with no prior OCaml experience. I was pleased to see a lot of folks getting interested in learning OCaml.</p>
<p>Over the course of two days, I attended interesting sessions by speakers from across India and other parts of the world.</p>
<h3>First Day</h3>
<p>The conference was inaugurated by Dr. Ravikumar, a Member of Parliament (MP) from the Villupuram district. A tech-savvy politician who has a presence in the fediverse, the MP emphasised in his speech the importance of the government adopting FOSS technologies. He released the private beta of <a href="https://prav.app/">Prav</a>, a privacy-focussed communication app.</p>
<p>The first session was "Introduction to Debian" by Sruthi Chandran, the first and only woman Debian Developer from India. It was interesting to see how the Debian community is made up of a diverse set of people from all across the world and is completely driven by volunteers. I learnt about the <em>do-o-cratic model</em>, where the people doing the work make the decisions.</p>
<p>A professor at RV College of Engineering and a FOSS enthusiast, Dr. <a href="http://deepikak.in/">Deepika</a>'s session on the KDE ecosystem was a great primer on motivating people to move to FOSS technologies. I found her suggestion to use the term <em>Swatantra Software</em> to indicate Free Software (Free as in Freedom) to be a great one. Then, Martha and Kelvin, mappers by profession, took us through the journey of OpenStreetMap from a blank slate to being on a par with other maps. They also ran a quick session on how to contribute to it.</p>
<p>Later, I presented a session on introducing a Code of Conduct to an open-source community. This talk was built upon our experience of drafting and enforcing a Code of Conduct for the OCaml community, an effort that was completed in late 2022. It started earlier in the same year, with the idea of first forming a group of respected members in the community to act as the enforcement team. The effort was <a href="https://discuss.ocaml.org/t/ocaml-software-foundation-january-2023-update/11217#community-3">supported by the OCaml Software Foundation</a>. Once the team had enough strength, we worked on drafting a <a href="https://github.com/ocaml/code-of-conduct">Code of Conduct document</a>, largely inspired by existing texts, and iterated on it until it was accepted by the community.</p>
<p>The day ended with a speakers-only round table session of <em>FOSSivist</em>. It was a discussion with VGLUG volunteers on how to utilise Free and Open Source Software technologies to uplift the lives of underprivileged students in the district. For context, Villupuram falls in the bottom five for literacy rate, both overall and among women. VGLUG was formed by a group of volunteers actively working on identifying talented first-generation Villupuram learners and training them for a career in tech.</p>
<h3>Second Day</h3>
<p>The second day saw another lineup of interesting sessions. First was an introduction to contributing to the Linux kernel by <a href="https://nihaal.me/">Nihal</a>. This was followed by <a href="https://gwolf.org/">Gunnar Wolf</a>'s session on Debian authentication. It was evident that Debian takes privacy seriously, and he urged the listeners to do so too. Bhuvana presented a session on why Diversity and Inclusion is important in tech.</p>
<p>This was followed by my presentation on <a href="https://hackmd.io/wgB3EzlAQA6aTnQGyyp5Rw#"><em>An Invitation to OCaml</em></a>. I talked about all the nice things OCaml offers, including the new and <a href="https://www.youtube.com/watch?v=zJ4G0TKwzVc">exciting features in OCaml 5</a> with Multicore support. The talk is aimed at folks who've been programming in other languages but are new to functional programming. It starts with why FP matters, then moves on to OCaml features like immutability, type inference, and garbage collection. It also briefly touches upon the new features in OCaml 5, namely native support for parallelism and concurrency. It was great to chat about functional programming with folks afterwards.</p>
<p><a href="https://github.com/ranjithsiji">Renjith</a>, an active Wikipedian, presented their story of moving a Malayalam daily newspaper called <em>Janayugom</em> to an entirely FOSS tech stack. This saved the company a lot of money and stopped <em>Janayugom</em> from shutting down. Renjith emphasised the importance of free speech in a democracy and how small magazines and newspapers play a role in it. Then Subin presented Varnam, an Indic input tool.</p>
<hr>
<p>I was impressed by the efforts taken by <a href="https://vglug.org/">VGLUG volunteers</a> and the Debian India team to organise everything so smoothly. From the time we landed in Villupuram, we did not have to worry about anything. Transport, food, and lodging were all taken care of by VGLUG volunteers. I had not expected to find such a vibrant community of FOSS users operating in a rural town. The community is doing great work to uplift the lives of people in Villupuram.</p>
<p>Best of all, it was great to meet old friends and make new ones. I hope to spread the joy of OCaml in more places.</p>
]]></description><link>https://tarides.com/blog/2023-04-28-ocaml-at-minidebconf-tn-2023</link><guid isPermaLink="false">https://tarides.com/blog/2023-04-28-ocaml-at-minidebconf-tn-2023.html</guid><dc:creator><![CDATA[ Sudha Parimala ]]></dc:creator><pubDate>Fri, 28 Apr 2023 00:00:00 GMT</pubDate></item><item><title><![CDATA[Compiler Hacking in Cambridge is Back!]]></title><description><![CDATA[<p>What’s the best way to spend a Friday evening? We think most people would agree that hacking on OCaml is pretty much at the top of that list (although full disclosure, our sample size for this data could be larger).</p>
<p>On Friday the 24th of February, Tarides’s UK office hosted an evening of compiler hacking, presentations, and talks about all things OCaml. We’re continuing a tradition that began in 2013, when we (then known as OCaml Labs) were based at the <a href="https://ocamllabs.io/compiler-hacking/">Computer Lab</a> in Cambridge, which makes this our 19th event. Just like back then, anyone with an interest in the OCaml compiler is welcome. At our recent event we had a mixture of students, industry professionals, and experts in attendance. If you'd like to create your own compiler hacking sessions, check out the <a href="https://github.com/tarides/compiler-hacking/wiki">wiki here</a>.</p>
<p>Something that’s changed since 2013 is that OCaml now represents a large chunk of the undergraduate Computer Science tripos at the University of Cambridge; not only as the implementation language for courses such as Compiler Construction &amp; Semantics of Programming Language, but literally as the first language students are taught! This means that we had quite a few undergraduates turn up – it was great to see such an interest in OCaml across different backgrounds.</p>
<p><img sizes="(min-width: 1360px) 1360px, (min-width: 680px) 680px, 100vw" srcset="/blog/images/talk_compiler-170w~7VxbqONUsfumxSXwMChZuQ.webp 170w, /blog/images/talk_compiler-340w~dWEdrk_whj9spm4n-3gj8w.webp 340w, /blog/images/talk_compiler-680w~SdREOR3V998lOlmkm-X5GQ.webp 680w, /blog/images/talk_compiler-1360w~KLdCp_8Wk2Mh97EBHDOxxQ.webp 1360w" src="/blog/images/talk_compiler-1360w~KLdCp_8Wk2Mh97EBHDOxxQ.webp" alt="David's Talk"></p>
<h2>A Welcome and Introduction to the Compiler</h2>
<p>The afternoon began with our very own David Allsopp giving the first of the day’s two talks. He briefly laid the foundation for what Tarides is and what we do, but focussed on introducing OCaml and outlining some examples of things to hack on. Since we had the pleasure of hosting many undergraduate students who were new to the OCaml community, as well as some grizzled veterans (sorry, Jon!), it was important to have a selection of projects for all abilities.</p>
<p>Suggestions included bug fixes (which are always welcomed), documentation edits and improvements (which are always needed), and <a href="https://github.com/ocaml/ocaml/issues/">issues</a> labelled with the tag “good first issue” or “newcomer job.” Compilers that are self-bootstrapped (like OCaml) always require a <a href="https://dl.acm.org/doi/pdf/10.1145/358198.358210">complex build system</a>, so David concluded with a demonstration of the sequence of build system targets, explaining each step along the way.</p>
<p>Once the introduction was over, the room settled into a hive of activity, with some people furiously typing and others scratching their heads and looking thoughtful. Many of the undergrads focused on getting familiar with the OCaml compiler, whereas more experienced developers began undertaking their own hacking projects. We had invited well-known OCaml compiler hackers (including some of our own) as an awesome resource for all levels of experience. A combination of in-person hacking and an informal setting provided the perfect environment for sharing imaginative new ideas - something we’ve all been missing since the pandemic.</p>
<p>With everyone divided up into smaller working groups, we worked our way around trying to help everyone make some progress. Groups were working on projects at all levels: some were trying to get the compiler to run <code>hello world</code>, whilst others (Patrick!) were forward-porting advanced modal type features between major versions of the compiler. A third-year undergraduate was working on debugging the OCaml compiler for her dissertation, attempting to use <a href="https://en.wikipedia.org/wiki/Hash_consing">hash consing</a> so that multiple identical values share a single piece of memory rather than each occupying its own, which saves space in the compiler.</p>
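<p>To give a flavour of the hash-consing idea mentioned above, here is a minimal sketch using only the standard library. It is not the approach taken in the dissertation project; the <code>table</code> and <code>intern</code> names are hypothetical, and strings stand in for whatever values the compiler would actually share.</p>
<pre><code class="language-ocaml">(* Hypothetical interning table: maps a string's contents to the one
   shared copy we have decided to keep. *)
let table : (string, string) Hashtbl.t = Hashtbl.create 16

(* Return the shared copy of [s], registering [s] itself if no
   structurally equal string has been seen before. *)
let intern (s : string) : string =
  match Hashtbl.find_opt table s with
  | Some shared -&gt; shared
  | None -&gt; Hashtbl.add table s s; s

let () =
  (* Two separately built strings with the same contents. *)
  let a = intern (String.concat "" ["he"; "llo"]) in
  let b = intern (String.concat "" ["hel"; "lo"]) in
  assert (a = b);   (* same contents *)
  assert (a == b)   (* and, after interning, the same block of memory *)
</code></pre>
<p>The lookup costs a hash and a comparison, which is the usual trade-off: a little time on construction in exchange for storing each distinct value only once.</p>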
<p><img sizes="(min-width: 1360px) 1360px, (min-width: 680px) 680px, 100vw" srcset="/blog/images/ryan-170w~zwpbed_eGXmbrnTGHs2O2w.webp 170w, /blog/images/ryan-340w~eG823AdnI2Rst1nK6MavTQ.webp 340w, /blog/images/ryan-680w~loqEHCe10uXZ0QYxHEy4hQ.webp 680w, /blog/images/ryan-1360w~OgOW-O5Hj_fjrvwMlEbF7w.webp 1360w" src="/blog/images/ryan-1360w~OgOW-O5Hj_fjrvwMlEbF7w.webp" alt="Ryan hacking on modal types"></p>
<h2>Local Allocations and Pizza</h2>
<p>After a couple of hours of hacking, Stephen Dolan gave us a tour of his and Leo White’s ground-breaking work on stack allocation. This work was <a href="https://stedolan.net/talks/ocaml22/#1">presented at ICFP 2022</a> and is a compiler feature that aims to improve performance by reducing heap allocations in OCaml programs. Local allocations let programs use space on the stack (instead of allocating on the heap), which is automatically reclaimed without requiring the assistance of the (resource-heavy) garbage collector (GC).  OCaml uses a stop-the-world parallel garbage collector for collecting recently allocated objects in the minor heap. This means that all the OCaml threads will need to stop when the minor heap is collected. Generating fewer heap allocations means less garbage, and less garbage means improved performance and reduced pause times. This is particularly important for parallel workloads.  Local allocations are already being run in production internally at Jane Street, and there are plans to bring the associated benefits to the masses by upstreaming the work to mainline OCaml.</p>
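<p>The link between allocation and garbage can be observed from ordinary OCaml code. The sketch below does not use local allocations; it only measures how many words a computation allocates on the minor heap, using the standard <code>Gc.minor_words</code> function. The <code>words_allocated</code> helper is a hypothetical name introduced for illustration.</p>
<pre><code class="language-ocaml">(* Hypothetical helper: number of minor-heap words allocated while
   running [f]. *)
let words_allocated f =
  let before = Gc.minor_words () in
  (* [Sys.opaque_identity] stops the optimiser from discarding the result. *)
  ignore (Sys.opaque_identity (f ()));
  Gc.minor_words () -. before

let () =
  let n = 1_000 in
  let w = words_allocated (fun () -&gt; List.init n (fun i -&gt; i)) in
  Printf.printf "building a %d-element list allocated about %.0f words\n" n w;
  (* Each list cell takes a header word plus two fields. *)
  assert (w &gt;= float_of_int (3 * n))
</code></pre>
<p>Every one of those words eventually has to be scanned or reclaimed by the minor collector, which is the cost that stack allocation is meant to avoid.</p>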
<p>After Stephen’s talk, and a quick but much needed pizza break, everyone went back to hacking. An all-too-common problem that cropped up several times happened when trying to run the freshly-built OCaml compiler from the build tree without first installing it. The error messages in this circumstance are not particularly intuitive, complaining of a "bad interpreter: no such file or directory." The message refers to a bootstrap issue; the program is trying to find the interpreter, but the interpreter hasn’t been installed yet. Some people solved the issue and moved straight onto the next task (a very common thing to do), but one group decided to tackle this head-on by improving the error message to provide more detail. This will help other new OCaml compiler developers and will almost certainly make life easier in our future hack events! This kind of “simple” fix is incredibly important for reducing the barrier to entry for new developers and emphasises the benefits of mixed-experience hack events with newcomers providing feedback and highlighting useful areas of improvement. We hope this work turns into a PR soon!</p>
<p>Another project focussed on ensuring OCaml programs can take advantage of new security features in Linux. There is a relatively new <a href="https://www.phoronix.com/news/Linux-6.3-Tmpfs-IDMAPPED">feature of the kernel</a> that allows the user to create a secure temporary file that is isolated from other users. One participant was experimenting with different versions of OCaml and Linux to see how this feature might be used in OCaml. Implementing this in the <a href="https://v2.ocaml.org/api/Unix.html">Unix module</a> is tempting, but as it provides the "lowest common denominator" interface, it has to be compatible with all platforms, and therefore does not cater to a niche function. A better option would be to write a separate library with a separate binding to address the compatibility issues, but that would require a lot of work for one feature. This illustrates the important kinds of questions that form the debate around supporting new, platform-specific features.</p>
<p>The prize for “oldest bug addressed” for the evening went to one of our most junior attendees, a first-year computer scientist who took on a problem first reported in 2005. The almost-20-year-old issue involves structural comparisons of cyclical data structures and is easily reproduced by pasting <code>let rec x = 1 :: x in x = x</code> into a toplevel. A <a href="https://github.com/ocaml/ocaml/pull/12039">pull request fixing the problem</a> was made during the evening and has generated a lot of interesting discussion!</p>
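<p>For readers unfamiliar with cyclic values, the following sketch shows what makes the comparison above awkward. It deliberately avoids evaluating <code>x = x</code>, since how that behaves depends on the compiler version, and only uses operations that are safe on a cycle.</p>
<pre><code class="language-ocaml">(* A one-element cyclic list: the tail of [x] is [x] itself. *)
let rec x = 1 :: x

let () =
  (* Physical equality compares addresses, so it returns immediately. *)
  assert (x == x);
  assert (List.tl x == x);
  (* Reading a finite prefix is also safe. *)
  assert (List.hd (List.tl x) = 1)
  (* Structural equality, by contrast, walks both values element by
     element, and on a cycle there is no end to reach. *)
</code></pre>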
<h2>Until Next Time</h2>
<p>We’re thrilled that we could restart these events, and it was lovely to see so many familiar faces alongside all the newcomers. The next hack day is scheduled for <a href="https://forms.gle/c6A2TSbUBZeVJSG46">March 31st</a>, and we’re excited to see more people working on the compiler.</p>
<p>We’d love to see you at a future event, but even if you can’t come in person, there are loads of ways you can contribute. You can suggest projects and "good first issues," add and improve on documentation, and even set up your own local event! You can check out the <a href="https://github.com/tarides/compiler-hacking/wiki">wiki here</a>.</p>
<p>We look forward to hanging out with more people around Cambridge who are curious or passionate about OCaml. If you’re interested in joining future events in Cambridge, please <a href="/contact/">email us</a>. We look forward to hearing from you! See you next time!</p>
<p><img sizes="(min-width: 1360px) 1360px, (min-width: 680px) 680px, 100vw" srcset="/blog/images/hacking-170w~9RFn63qwKmbVDH8bTQolRw.webp 170w, /blog/images/hacking-340w~FguQSQpEyoeC_QcJIg2TYg.webp 340w, /blog/images/hacking-680w~O4rbodeuQfMDUhCAYMEGjQ.webp 680w, /blog/images/hacking-1360w~i7NwJ9XWfeR9fI4JzrciFw.webp 1360w" src="/blog/images/hacking-1360w~i7NwJ9XWfeR9fI4JzrciFw.webp" alt="Hacking">
<img sizes="(min-width: 1360px) 1360px, (min-width: 680px) 680px, 100vw" srcset="/blog/images/Patrick-170w~m6xenUekfscCjj_Zd-6BqA.webp 170w, /blog/images/Patrick-340w~pTypdFsb4ur2ldJESNinqg.webp 340w, /blog/images/Patrick-680w~i-Z-yVmUMp4NohMlbklRng.webp 680w, /blog/images/Patrick-1360w~MngCFDZQfv23KCI29VKaeQ.webp 1360w" src="/blog/images/Patrick-1360w~MngCFDZQfv23KCI29VKaeQ.webp" alt="Patrick"></p>
]]></description><link>https://tarides.com/blog/2023-03-22-compiler-hacking-in-cambridge-is-back</link><guid isPermaLink="false">https://tarides.com/blog/2023-03-22-compiler-hacking-in-cambridge-is-back.html</guid><dc:creator><![CDATA[ Isabella Leandersson ]]></dc:creator><pubDate>Wed, 22 Mar 2023 00:00:00 GMT</pubDate></item><item><title><![CDATA[More Than a Day: How Does Tarides Promote Women in Tech?]]></title><description><![CDATA[<p>The aim of <a href="https://www.internationalwomensday.com/About">International Women’s Day</a> is to raise awareness of gender inequalities and call for the empowerment of women worldwide. The goal is to forge a gender equal world and advance women’s equality in all forms.</p>
<p>Within the <a href="https://www.internationalwomensday.com/Mission/Tech">field of technology</a>, the focus is on elevating and advancing gender parity in technology and celebrating the women shaping innovation.</p>
<p>While there have been advancements towards these goals, continuous attention and effort are required to create lasting change.</p>
<h2>How is Tarides Promoting Gender Equality?</h2>
<h3>Externally</h3>
<p>One of our goals is to <a href="/company/">foster diversity and inclusion in tech</a> and help provide more opportunities for underrepresented groups, including women. As a company, we partner with and support several organisations that promote diversity in the field of computer science. Some of them include:</p>
<ul>
<li><a href="https://www.50intech.com/about-us">50 in Tech</a> is working to achieve a gender balance of 50% women in tech by 2050. Their Gender Score Board helps companies across Europe measure their level of gender inclusion. They selected Tarides as an inclusive company and <a href="https://app.50intech.com/company/tarides">featured us on their website</a>. For more information on our work with 50 in Tech, we have a <a href="/blog/2022-04-19-tarides-partners-with-50intech/">blog post from 2022</a>.</li>
<li>The <a href="https://adatechschool.fr">Ada Tech School</a>, named after the first computer programmer Ada Lovelace, is a programming school designed for women but open to all. They are driven by three values: feminism, empathy, and singularity. We have a <a href="/blog/2021-02-15-partnering-for-more-diversity-in-tech/">blog post on our partnership with them from 2021</a>.</li>
<li><a href="https://girlscancode.fr">GirlsCanCode</a> is an initiative launched by the organisation Prologin that hosts summer camps specially aimed at teaching young women about computer programming, free of charge. We have a <a href="/blog/2022-09-06-tarides-sponsors-girls-can-code/">blog post about why we sponsor GirlsCanCode from 2022</a>.</li>
<li><a href="https://www.recurse.com/about">The Recurse Center</a> is an initiative that offers educational retreats for anyone who wants to get better at programming. They also provide needs-based grants to traditionally underrepresented groups to make programming more accessible for all. We're happy to have hired many talented engineers from the Recurse Center over the years!</li>
<li><a href="https://www.outreachy.org/">Outreachy</a> is an internship program that provides paid remote internships in open source and open science. Outreachy’s goal is to increase diversity in open source and expressly invites anyone who faces underrepresentation or systemic bias in the technology industry of their country to apply. We sponsor and mentor interns in each biannual intake, and you can watch project presentations here (<a href="https://watch.ocaml.org/w/eSSmoyEcPTEXPGAqDtKENX">December 2021</a>, <a href="https://watch.ocaml.org/w/vXJtTj3cULRa1bZB5HrecX">May 2022</a>, <a href="https://watch.ocaml.org/w/pQSAfZ9kDSsSnr8Bxzocn3">December 2022</a>); read a <a href="https://discuss.ocaml.org/t/for-diversity-and-the-ocaml-community-outreachy-summer-2022/92340">community post</a> about how to get involved as a mentor; and read a <a href="/blog/2022-08-02-irmin-in-the-browser/">blog post</a> from one of our Summer 2022 interns about her project.</li>
<li>The <a href="https://oxbridgewomenincs8.wixsite.com/2020">Oxbridge Women in Computer Science Conference</a> is an annual one-day event hosted by the Universities of Oxford and Cambridge (UK). The purpose of the conference is to spotlight the successes of women within computer science and strengthen the network of women in computer science within a supportive environment. The conference is free and open to all genders. We have a <a href="/blog/2020-12-14-tarides-sponsors-the-oxbridge-women-in-computer-science-conference-2020/">blog post from 2020 on our sponsorship of the conference</a>.</li>
</ul>
<h3>Internally</h3>
<p>As a company we aim to provide a flexible and supportive working environment that encourages women to enter and remain in the workforce. Our aim is to make working at Tarides as inclusive as possible.</p>
<p>Examples of our policies include:</p>
<ul>
<li>Childcare support as an employee perk</li>
<li>Flexible hours and working</li>
<li>Equal pay scales based on experience and skills</li>
<li>Apprenticeships and internships to kickstart careers, or to enable later-stage career changes</li>
<li>Career progression development in technical and managerial roles</li>
</ul>
<h2>Still Some Way to Go</h2>
<p>There is still much room for improvement, and we continue to be committed to removing barriers for women: at Tarides, in open source, and in computer science.</p>
<p>Currently at Tarides, 24% of our workforce is female, with 15% in technical roles. Over the last 12 months, Tarides has grown from 55 to 83 people, with 27% of those hired being women. Despite this increase, we are still below our ambitious goal of having women in 30% of technical roles. This highlights the disappointing reality that for each position we want to hire for, there are still proportionally fewer women applying and reaching the later stages of recruitment in our field.</p>
<p>In tech, we can address the issues from a number of angles, all of which will improve the overall picture. Collectively, we still need to encourage girls into STEM areas at an early age, in order to gradually increase the numbers of women in tech overall, but also to increase the size of the potential hiring pool of female applicants. Having female role models is essential, and we must continue to increase the representation of women in technology and STEM fields to encourage girls, and women, to see themselves in these kinds of roles. We must also continue to support lifelong learning by funding and creating training opportunities and resources for later-stage career changes.</p>
<p>At Tarides, we have specifically noticed a skills and training gap between entry-level internships (e.g., Outreachy) and the next level of progression into junior software engineer. We are getting better at helping women make their first steps into the tech world, but where do they go next? This year, we are focussing on how we can specifically improve this by preparing more resources to help learn functional programming, OCaml, and open-source methods, and by understanding the different levels of training and education needed in order to progress beyond these initial stages.</p>
<p>Finally, gender equality is not just a topic we should consider on March 8th every year. We must ensure that equality is in everyone’s consciousness and that it forms the basis of our conversations and decisions.</p>
<blockquote>
<p>“We will always have STEM with us. Some things will drop out of the public eye and will go away, but there will always be science, engineering and technology. And there will always, always be mathematics. Everything is physics and math.” - <a href="https://www.nasa.gov/audience/foreducators/a-lifetime-of-stem.html">Katherine Johnson, NASA mathematician</a></p>
</blockquote>
]]></description><link>https://tarides.com/blog/2023-03-08-more-than-a-day-how-does-tarides-promote-women-in-tech</link><guid isPermaLink="false">https://tarides.com/blog/2023-03-08-more-than-a-day-how-does-tarides-promote-women-in-tech.html</guid><dc:creator><![CDATA[ Gemma Gordon ]]></dc:creator><pubDate>Wed, 08 Mar 2023 00:00:00 GMT</pubDate></item><item><title><![CDATA[The Journey to OCaml Multicore: Bringing Big Ideas to Life]]></title><description><![CDATA[<p>Continuing our blog series on <a href="/blog/2022-12-19-ocaml-5-with-multicore-support-is-here/">Multicore OCaml</a>, this post provides an overview of the road to OCaml Multicore. If you want to know how you can use OCaml 5 in your own projects, please <a href="/contact/">contact us</a> for more information. We also recommend watching KC Sivaramakrishnan's ICFP '22 talk <a href="https://www.youtube.com/watch?v=zJ4G0TKwzVc">Retrofitting Concurrency - Lessons from the Engine Room</a>.</p>
<hr>
<p>The journey to <a href="https://github.com/ocaml-multicore/ocaml-multicore/wiki">Multicore OCaml</a> is a journey from cutting-edge theory to real-life code. It’s the story of an idea that grew from a small side-project into a multinational effort that brought a long-awaited update to OCaml. Along the road, the Multicore OCaml team faced many different challenges, leading them to re-evaluate their priorities and approach tasks differently.</p>
<p>As part of the Multicore Project since December 2014, KC Sivaramakrishnan is in a good position to describe the process from the initial days of experimentation right up until launch. He has unique insight into the decisions, challenges, and successes that the team experienced as they worked to turn innovative ideas into tangible results.</p>
<h2>The Journey Begins</h2>
<p>In 2013, the world had survived the 21st of December 2012, Flappy Bird was popular, and everyone was doing the Harlem shake. At the University of Cambridge, Professor Anil Madhavapeddy launched the Multicore OCaml project as part of the <a href="https://ocamllabs.io/">OCaml Labs</a> initiative alongside Leo White, Jeremy Yallop, and Philippe Wang. They were eventually joined by Stephen Dolan, then a PhD student working on combining <a href="https://www.bcs.org/events/awards-and-competitions/distinguished-dissertations/previous-winners/2017-competition/">ML-style parametric polymorphism with subtyping</a>.</p>
<p>In 2014 KC, who had just finished his PhD in the US, joined the team. His PhD had focused on making a multicore version of the MLton Standard ML compiler, which made him an asset to the growing team that would see the Multicore OCaml Project through to completion. Together they collaborated on a project that would see many partial victories and setbacks, before ultimately releasing OCaml 5.0 to the public in December 2022.</p>
<h2>Timeline</h2>
<p>In the years since the project started, there have been several developments and incremental successes. Below is an overview of the milestones along the road to Multicore OCaml:</p>
<p><img sizes="(min-width: 1360px) 1360px, (min-width: 680px) 680px, 100vw" srcset="/blog/images/Multicore_timeline_new-01-170w~crtLLJpBmopE7mhYSobI2w.webp 170w, /blog/images/Multicore_timeline_new-01-340w~r5r2CncLcfLBXXQaIQZm5w.webp 340w, /blog/images/Multicore_timeline_new-01-680w~m8LdbFNndnGB9eToZndAAw.webp 680w, /blog/images/Multicore_timeline_new-01-1360w~IgIJzyv-jNZ6X2x6CXwSjQ.webp 1360w" src="/blog/images/Multicore_timeline_new-01-1360w~IgIJzyv-jNZ6X2x6CXwSjQ.webp" alt="Isabella Leandersson's Graphic"></p>
<p><strong>2013</strong></p>
<ul>
<li>The Multicore OCaml project was started by Prof Anil Madhavapeddy in the <a href="https://ocamllabs.io/">OCaml Labs</a> initiative at the University of Cambridge Computer Lab with Leo White, Jeremy Yallop, and Philippe Wang. The team was later joined by Stephen Dolan, then a PhD student working on combining <a href="https://www.bcs.org/events/awards-and-competitions/distinguished-dissertations/previous-winners/2017-competition/">ML-style parametric polymorphism with subtyping</a>.</li>
</ul>
<p><strong>2014</strong></p>
<ul>
<li>March: Stephen Dolan, Leo White, and Anil Madhavapeddy started hacking on Multicore OCaml</li>
<li>March: Earliest <a href="https://github.com/ocaml/ocaml/commit/a56e4530b5b173e8de28eead196d6878bc021c55">commit</a> that can be directly attributed to Multicore OCaml that is in the OCaml commit history. The commit removes most of the out-of-heap pointers the interpreter uses by replacing them with stack offsets.</li>
<li>September: A status update on Multicore OCaml was presented at the OCaml Workshop 2014. You can read the <a href="https://web.archive.org/web/20160414164304/https://ocaml.org/meetings/ocaml/2014/ocaml2014_1.pdf">associated paper</a> by Stephen Dolan, Leo White, and Anil Madhavapeddy.</li>
</ul>
<p><strong>2015</strong></p>
<ul>
<li>First 6 months: An initial implementation of effect handlers was completed. The inspiration behind this idea came from the <a href="https://www.eff-lang.org">Eff language</a>.</li>
<li>September: Effect handlers in OCaml were presented at the OCaml workshop 2015. You can read more about it in this <a href="https://kcsrk.info/ocaml/multicore/2015/05/20/effects-multicore">blog post</a>.</li>
</ul>
<p><strong>2016</strong></p>
<ul>
<li>May: <a href="https://www.dagstuhl.de/en/seminars/seminar-calendar/seminar-details/16112">Dagstuhl Seminar 16112</a>: “From Theory to Practice of Algebraic Effects and Handlers.” Effect handlers in OCaml were presented and refined based on expert interactions.</li>
<li>ML workshop: Daniel Hillerström, Sam Lindley, KC Sivaramakrishnan <a href="https://kcsrk.info/publications">"Compiling Links Effect Handlers to the OCaml Backend"</a>. Daniel Hillerström developed a Multicore OCaml backend for the Links language, compiling Links effect handlers to OCaml effect handlers.</li>
<li>ML workshop: Oleg Kiselyov and KC Sivaramakrishnan <a href="https://kcsrk.info/papers/eff_ocaml_ml16.pdf">"Eff Directly in OCaml"</a>. Showed how to get the expressive power of Eff language directly using features from the OCaml language + OCaml effect handlers.</li>
<li>OCaml workshop: KC Sivaramakrishnan and Théo Laurent <a href="https://kcsrk.info/papers/reagents_ocaml16.pdf">"Lock-Free Programming for the Masses"</a>. Presented the implementation of Reagents in OCaml, a composable lock-free programming library.</li>
</ul>
<p><strong>2017</strong></p>
<ul>
<li>Papers published at the ML &amp; OCaml Workshop: Stephen Dolan, Spiros Eliopoulos, Daniel Hillerström, Anil Madhavapeddy, KC Sivaramakrishnan, and Leo White <a href="https://icfp17.sigplan.org/details/mlfamilyworkshop-2017-papers/2/Effectively-tackling-the-awkward-squad">"Effectively Tackling the Awkward Squad"</a>. The work outlined in this paper showed how effect handlers can simplify concurrent systems programming. These ideas were then incorporated in the development of <a href="https://github.com/ocaml-multicore/eio">Eio</a>.</li>
<li>Stephen Dolan and KC Sivaramakrishnan - <a href="https://icfp17.sigplan.org/details/ocaml-2017-talks/19/A-memory-model-for-multicore-OCaml">"A Memory Model for Multicore OCaml"</a>. The paper proposed a relaxed memory model for OCaml, broadly following the design of axiomatic memory models for languages such as C++ and Java, but with a number of differences to provide stronger guarantees and easier reasoning to the programmer, at the expense of not admitting every possible optimisation. This work eventually led to the <a href="https://v2.ocaml.org/releases/5.0/htmlman/memorymodel.html">relaxed memory model used in OCaml 5</a>.</li>
</ul>
<p><strong>2018</strong></p>
<ul>
<li>Stephen Dolan, KC Sivaramakrishnan, Spiros Eliopoulos, Daniel Hillerström, Anil Madhavapeddy, and Leo White presented a forward-looking paper on <a href="https://kcsrk.info/papers/system_effects_feb_18.pdf">"Concurrent Systems Programming with Effect Handlers"</a> at the Trends in Functional Programming conference. This is the full version of the 2017 ML Workshop paper.</li>
<li>Stephen Dolan, KC Sivaramakrishnan, and Anil Madhavapeddy published a paper on the relaxed memory model for OCaml at PLDI, <a href="https://kcsrk.info/papers/pldi18-memory.pdf">"Bounding Data Races in Space and Time"</a>. This is the full version of the memory model work presented at the 2017 OCaml Workshop.</li>
<li>The team worked on simplifying and speeding up the implementation of effect handlers.</li>
</ul>
<p><strong>2019</strong></p>
<ul>
<li>Sadiq Jaffer and Tom Kelly implemented a new garbage collector for the minor heap (parallel stop-the-world minor collector), which ensures that programs using C FFI in OCaml remain backwards compatible.</li>
<li>The <a href="https://github.com/ocaml-bench/sandmark/">Sandmark</a> benchmark suite for rigorously benchmarking OCaml programs was developed and deployed. These days the performance of the OCaml compiler is tracked continuously using the <a href="https://sandmark.tarides.com">Sandmark nightly continuous benchmarking service</a>.</li>
</ul>
<p><strong>2020</strong></p>
<ul>
<li>The team decided to switch to the parallel stop-the-world minor collector (ParMinor) as default and drop the support for the concurrent minor collector (ConcMinor). ParMinor GC avoided a breaking change in the C FFI introduced by the ConcMinor GC. One concern is that the stop-the-world aspect in ParMinor would be a scalability bottleneck at large core counts. Our performance evaluation on the Sandmark suite showed that the impact of ParMinor is minimal even at large core counts (120+).</li>
<li>KC Sivaramakrishnan, Stephen Dolan, Leo White, Sadiq Jaffer, Tom Kelly, Anmol Sahoo, Sudha Parimala, Atul Dhiman, and Anil Madhavapeddy presented <a href="https://core.ac.uk/download/pdf/328720849.pdf">"Retrofitting Parallelism onto OCaml"</a> at ICFP 2020. The paper describes the design choices for multicore support in OCaml, the design of the ConcMinor and ParMinor GCs, detailed performance evaluation, and justifies our choice to switch to ParMinor as default. It won the distinguished paper award at ICFP.</li>
<li>From 2020 through 2021, the team focused on achieving feature parity with sequential OCaml (systhreads, GC performance, DWARF support, <a href="https://check.ocamllabs.io/">opam health check</a>, etc.)</li>
</ul>
<p><strong>2021</strong></p>
<ul>
<li>KC Sivaramakrishnan, Stephen Dolan, Leo White, Sadiq Jaffer, Tom Kelly, and Anil Madhavapeddy published <a href="https://kcsrk.info/papers/drafts/retro-concurrency.pdf">"Retrofitting Effect Handlers onto OCaml"</a> at PLDI 2021. The paper describes the design choices for the concurrency substrate in OCaml 5 and how effect handlers are a good fit for our needs.</li>
<li>In the latter half of 2021, OCaml core developers began reviewing code for Multicore OCaml, including the new concurrency and parallelism features. See <a href="https://github.com/ocaml-multicore/docs/blob/main/ocaml_5_upstreaming_proposal.md">this document</a> for more information.</li>
</ul>
<p><strong>2022</strong></p>
<ul>
<li>Early 2022, the <a href="https://github.com/ocaml/ocaml/pull/10831">Multicore PR</a> was merged!</li>
<li>Significant efforts were made by core OCaml developers to implement new features, review them, and ready the compiler for release. Without their hard work and dedication, there would be no OCaml Multicore nor OCaml 5.0.</li>
<li>Memory model successfully implemented</li>
<li>RISC-V backend and ARM64 backend achieved</li>
<li>December 16th, 2022: OCaml 5.0 is released!</li>
</ul>
<h2>Why Multicore OCaml?</h2>
<p>The number of cores on the machines that we use has been <a href="https://www.techspot.com/article/2363-multi-core-cpu/">steadily increasing for years</a>. Almost every computer now has several cores available to the user, and for a programming language to use them effectively it must support shared-memory parallel programming. If it does not, the user is forced to execute everything sequentially using only one core, or use multi-process programming, which is hard to use and in many cases less efficient than shared-memory parallel programming.</p>
<p>There are two main features coming with OCaml 5: <strong>Parallelism</strong> and <strong>Concurrency.</strong> Parallelism is about performance; it’s the idea that if you have <em>n</em> cores, you can ideally make your program run up to <em>n</em> times faster. The effects of parallelism will be most keenly felt in how fast your programs run, giving you as a user a significant performance boost.</p>
<p>On a bigger scale, parallel programming is significant for projects that need to complete resource intensive tasks quickly, like <a href="/blog/2022-12-20-how-nomadic-labs-used-multicore-processing-to-create-a-faster-blockchain/">theorem provers</a> for example. With multicore support for OCaml, developers can take advantage of features like type and memory-safety with unprecedented levels of performance.</p>
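<p>As a minimal sketch of what shared-memory parallelism looks like in OCaml 5, the example below splits a sum across two domains using <code>Domain.spawn</code> and <code>Domain.join</code> from the standard library. The <code>sum_range</code> and <code>parallel_sum</code> functions are hypothetical names made up for illustration, and a toy workload like this will not necessarily show a speed-up.</p>
<pre><code class="language-ocaml">(* Requires OCaml 5.0 or later, where Domain is in the standard library. *)

(* Sum of the integers in the half-open range [lo, hi). *)
let sum_range lo hi =
  let acc = ref 0 in
  for i = lo to hi - 1 do
    acc := !acc + i
  done;
  !acc

(* Sum of 0 .. n-1, with the lower half computed on a second domain
   while the current domain handles the upper half. *)
let parallel_sum n =
  let mid = n / 2 in
  let lower = Domain.spawn (fun () -&gt; sum_range 0 mid) in
  let upper = sum_range mid n in
  Domain.join lower + upper

let () = Printf.printf "%d\n" (parallel_sum 1_000_000)
</code></pre>
<p>Each domain maps to an operating-system thread that can run on its own core, so real programs usually spawn roughly as many domains as there are cores rather than one per task.</p>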
<p>Concurrency, on the other hand, is a programming abstraction. It is a way to tell your program that you want to execute several functions, each of which may potentially block for a short time while waiting for some external event. The programming language may choose either to execute such functions sequentially, one after the other, on a single core, interleaving their execution when a function gets blocked, or choose to execute them in parallel on several cores at once. Concurrency is useful, for example, when writing a web server that must handle several concurrent requests. The program may handle several such requests at the same time, but not necessarily need to use multiple cores to handle them. With OCaml 5, writing concurrent code is made a lot easier.</p>
<p>Previously, concurrent OCaml code had to be written with a dedicated library, Async or Lwt, that the developer would have to learn separately. However, these libraries don’t currently allow asynchronous and synchronous code to interact with each other. In a <a href="https://journal.stuffwithstuff.com/2015/02/01/what-color-is-your-function/">blog post from 2015</a>, Bob Nystrom describes this in what he calls the ‘Functional Colouring Problem’. OCaml 5 brings in support for concurrency through <a href="https://kcsrk.info/webman/manual/effects.html">effect handlers</a> and the new Input/Output library <a href="https://github.com/ocaml-multicore/eio">Eio</a>, which lets developers compose asynchronous and synchronous code together. It’s also easy to learn and use since it behaves like normal OCaml code, <a href="/blog/2022-10-19-porting-charrua-unix-and-rawlink-to-eio/">simplifying the developer workflow</a>.</p>
<p>For those who still prefer to use Lwt or Async, OCaml 5 doesn’t preclude them from doing so. Should they one day want to switch from using either tool to using Eio, changing their code to be compatible with Eio is simple and user-friendly. Whilst they will still need to rewrite their applications to use the primitives provided by Eio, doing so is straightforward and can be made incremental thanks to the Lwt- and Async-Eio bridges.</p>
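<p>To illustrate how effect handlers let concurrent code read like ordinary OCaml, here is a toy cooperative scheduler written against the standard <code>Effect</code> module in OCaml 5. It is a sketch for illustration only and is not how Eio is implemented; the <code>Yield</code> effect, the <code>task</code> and <code>run</code> functions, and the <code>log</code> buffer are all hypothetical names.</p>
<pre><code class="language-ocaml">(* Requires OCaml 5.0 or later. [Yield] is a made-up effect that a task
   performs to let other tasks run. *)
type _ Effect.t += Yield : unit Effect.t

let log = Buffer.create 16

(* A task is plain direct-style code: no monads, no callbacks. *)
let task name () =
  for i = 1 to 2 do
    Buffer.add_string log (Printf.sprintf "%s%d " name i);
    Effect.perform Yield
  done

(* Run the tasks round-robin on a single core. *)
let run tasks =
  let q : (unit -&gt; unit) Queue.t = Queue.create () in
  let next () =
    match Queue.take_opt q with
    | Some resume -&gt; resume ()
    | None -&gt; ()
  in
  let spawn f =
    Effect.Deep.match_with f ()
      { retc = (fun () -&gt; next ());
        exnc = raise;
        effc = (fun (type a) (eff : a Effect.t) -&gt;
          match eff with
          | Yield -&gt;
            Some (fun (k : (a, unit) Effect.Deep.continuation) -&gt;
              (* Park the suspended task and switch to the next one. *)
              Queue.add (fun () -&gt; Effect.Deep.continue k ()) q;
              next ())
          | _ -&gt; None) }
  in
  List.iter (fun t -&gt; Queue.add (fun () -&gt; spawn t) q) tasks;
  next ()

let () =
  run [task "a"; task "b"];
  print_endline (Buffer.contents log)
</code></pre>
<p>The two tasks interleave at each <code>Yield</code>, yet neither is written in a special asynchronous style, which is the property that sidesteps the function-colouring issue described above.</p>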
<p>The team of people who worked on OCaml 5 knew from the start that bringing multicore support to OCaml would improve the lives of its users. It would make programs that run in OCaml <a href="https://medium.com/geekculture/what-makes-a-cpu-fast-344517cf91f9">faster and more efficient</a>,  as well as help developers be more productive.</p>
<h2>The Academic and the Engineer</h2>
<p>When academics are on the cutting edge of science, they're essentially creating a new area of research as they go. This leads to a natural lag time between innovation and the creation of academic papers reviewing the process. For example, KC explains that “The first <a href="https://kcsrk.info/ocaml/multicore/2015/05/20/effects-multicore/">talk on effect handlers was in 2015</a>, but the first <a href="https://anil.recoil.org/papers/2021-pldi-retroeff.pdf">proper paper on effect handlers</a> was just published in 2021.”</p>
<p>Referring to the time between experimentation and finished product, KC goes on to say: “Personally, it has been challenging to take part in building these systems, because 95% of the work is not very visible but you have to create that 95% in order to talk about the 5%.”</p>
<p>Since the road to get here has been so long, it feels all the more exhilarating that release day has finally arrived.</p>
<blockquote>
<p>“It’s incredible that we are at the stage where we’re able to take cutting-edge research and put it into practice. In the last few years, we’ve expanded from academic research to producing robust code that can be upstreamed.”</p>
</blockquote>
<p>This is great news for the whole community, as it demonstrates OCaml’s potential to turn research into real products. The goals of the team behind the multicore effort were to modernise OCaml and make it faster and more efficient for everyone. The realisation of those goals took years of experimentation, optimisation, and groundbreaking research.</p>
<h2>First Major Challenge: It’s All About Garbage</h2>
<p>The first major challenge facing the Multicore team was OCaml’s garbage collector. In OCaml, there are two programs working together on the heap, the language and the garbage collector. If the language supports parallelism but the garbage collector does not, the language would run fast just to be slowed down by the garbage collector.</p>
<p>To avoid this problem, the team made the garbage collector support parallelism to give users a uniformly smooth experience. “Garbage collectors balance memory usage at the cost of time, so you can either have it use a small amount of memory but take a long time, or be fast but use a lot of memory,” KC comments.</p>
<p>With different variables to optimise for, the team had to make some crucial decisions. OCaml already had a user base with certain expectations. They had to ensure that their changes did not remove features that users had come to expect. For example, OCaml is a <a href="https://ocaml.org/about">robust and predictable language</a>, and they needed to replace the garbage collector without sacrificing on that predictability. They also didn’t want to settle for worse results in terms of performance.</p>
<p>Working on the new garbage collector, the team built an initial version that performed very well. However, they soon discovered that in order for the garbage collector to work, it would break the existing Application Programming Interface (API) interacting with C code.</p>
<blockquote>
<p>“That was our dilemma: we had a nice, fast, garbage collector, but it would break people’s code.”</p>
</blockquote>
<p>A broken API would have been bad news for anyone, but it would especially affect any existing projects that relied heavily on C code (like Coq), as well as many industrial users who would have had to change millions of lines of code. The team worried that this would create a fork in the community, between those who would find it worth the upgrade and those who would not.</p>
<p>This was a big lesson for the team: user friendliness is incredibly important when introducing new technologies, and a big part of user friendliness is backwards compatibility. With this in mind, they set out to redesign the garbage collector. Although they were initially resigned to sacrificing some performance for the sake of compatibility, they ended up with a final product that not only did not break any code, but also didn’t see significant performance losses! They <a href="https://icfp20.sigplan.org/details/icfp-2020-papers/21/Retrofitting-Parallelism-onto-OCaml">presented their findings</a> at ICFP 2020 and won the distinguished paper award.</p>
<h2>Second Major Challenge: Memory Model, What Memory Model?</h2>
<p>The second challenge came as a result of the very way computers are constructed. Unsurprisingly, the hardware that actually executes your code predates the multicore era. Consequently, the hardware and compilers running the code are designed to make optimisations based on the assumption that you’re running a single-threaded (so not multicore) program.</p>
<p>As you might imagine, several of these optimisations conflict with more modern, multicore aspects of code. In order for multicore code to run successfully in the face of these optimisations, useful abstractions are needed to determine what is safe and how parallel code is expected to run. These abstractions are called <a href="http://canonical.org/~kragen/memory-models/">memory models</a>, and they are necessary for hardware made for single-threaded programs to run multi-threaded code.</p>
<p>Memory models are very complex and have to balance simplicity with performance. The more straightforward the model, the greater the risk that it can’t account for all possibilities, and therefore cause bugs. Conversely, if the memory model is complex enough to maximise performance, it will be hard for people to understand and use.</p>
<p>For the Multicore OCaml Project, the team decided to take inspiration from the memory models of <a href="https://cplusplus.com">C++</a> and <a href="https://www.java.com/en/">Java</a>, which choose to prioritise performance. However, they still wanted to make a memory model that was straightforward and intuitive. “OCaml is used to prove other languages, and if the memory model is too complicated, it becomes hard to verify other parallel programs,” KC explains.</p>
<p>By sacrificing a small amount of performance (around 3%), the team managed to create an <a href="https://kcsrk.info/webman/manual/memorymodel.html">OCaml memory model</a> that was both high-performing and easy-to-use. The paper detailing the process is called <a href="https://kcsrk.info/papers/pldi18-memory.pdf"><em>Bounding Data Races in Space and Time</em></a>.</p>
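<p>As a small illustration of what this looks like from the programmer’s side, the sketch below shares a counter between two domains using the <code>Atomic</code> and <code>Domain</code> modules from the OCaml 5 standard library. The names <code>counter</code> and <code>work</code> are made up for this example, and it only shows one simple, race-free way of sharing state, not the memory model as a whole.</p>
<pre><code class="language-ocaml">(* A counter shared between domains. [Atomic] operations are safe
   to use from several domains at once. *)
let counter = Atomic.make 0

(* Each domain increments the shared counter 1000 times. *)
let work () =
  for _i = 1 to 1000 do
    Atomic.incr counter
  done

let () =
  (* Run [work] in two domains, potentially on two cores in parallel. *)
  let d1 = Domain.spawn work in
  let d2 = Domain.spawn work in
  Domain.join d1;
  Domain.join d2;
  (* No increment is lost, because each one is atomic. *)
  assert (Atomic.get counter = 2000)
</code></pre>
<p>Had <code>counter</code> been a plain <code>int ref</code> updated with <code>incr</code>, the two domains would race and some increments could be lost, although the program would still be memory safe.</p>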
<p>In two of the big technological challenges that faced the team, a clear focus on user experience emerged. As a language with deep roots in academia, at times rumoured to be ‘difficult,’ focusing on improving user experience is an important part of making OCaml a language for everyone.</p>
<h2>The People Behind the Project</h2>
<p>Behind every project is a group of hardworking people. Stephen Dolan, Leo White, and Anil Madhavapeddy started the Multicore project back in 2014. Until 2018, KC, Stephen, and Leo were doing most of the hacking. After 2018, the team saw enormous growth with Sadiq Jaffer and Tom Kelly working together on the garbage collector. Today, there are around ten people hacking on Multicore OCaml at any given time, all working hard to ensure that OCaml 5 is a success.</p>
<p>The open-source community has also provided continuous, valuable feedback as work on Multicore OCaml has progressed. Every person who participates by sharing their opinions and experience helps the project move forward. Many core OCaml developers worked tirelessly to get OCaml 5.0 release-ready. In particular we should highlight the support of Xavier Leroy, who spent a considerable amount of time and effort implementing changes to important pieces in the runtime to make them multicore compatible (such as closure representation, bytecode interpreter, etc.), as well as Gabriel Scherer for his enthusiastic support of Multicore features and the willingness to do an enormous amount of crucial work like reviewing a large number of Multicore PRs and additional features. The academic community has also actively utilised Multicore OCaml to push the boundaries of what is possible with effect handlers, and provided useful feedback and bug reports.</p>
<p>On the commercial side, Tezos has significantly helped the team test OCaml 5 by using multicore features in their <a href="/blog/2022-12-20-how-nomadic-labs-used-multicore-processing-to-create-a-faster-blockchain/">PLONK prover</a>. They’ve made good progress using OCaml 5 and have been extremely helpful by reporting bugs and sharing their experience.</p>
<h2>Sandmark</h2>
<p>Over the course of OCaml Multicore’s implementation, new tools have been developed to facilitate its creation. These tools are useful in and of themselves, and can be used in other projects. One tool born out of the OCaml Multicore push is the benchmarking suite <a href="https://github.com/ocaml-bench/sandmark">Sandmark</a>.</p>
<p>When Sadiq and Tom were working on the garbage collector, they had to understand how the change to a parallel garbage collector would affect non-parallel, sequential programs. To this end, they created Sandmark to benchmark different iterations.</p>
<h2>Where Do We Go from Here?</h2>
<p>OCaml 5 is just the beginning, and from its release springs countless more opportunities. Several teams across the community are innovating on new features for OCaml. These features are at various levels of maturity and development, with small groups of developers testing some of them, whilst others are more or less in the ideation phase. Some of these features in development are listed below, but this is by no means an exhaustive list:</p>
<ul>
<li>Effects system: At the moment, there is no support from the OCaml type system to ensure that effect handlers are handled properly. An effect system is an extension of the type system that keeps track of which effects can be performed by an expression or a function, ensuring that effects are only performed in a context where a corresponding effect handler is set up to deal with them. In a language as well-established and large as OCaml, implementing new features comes with significant considerations. Backwards compatibility is a must, and the new system must work with the polymorphism, modularity, and generativity features already in place. For an early exploration of typed effect handlers in OCaml, check out <a href="https://www.janestreet.com/tech-talks/effective-programming/">Leo White's talk</a> from 2018.</li>
<li>JavaScript: OCaml has a very nice compiler to <a href="https://www.javascript.com">JavaScript</a>, but it couldn't compile effect handlers to JavaScript. Indeed, JavaScript does not provide a corresponding feature. A standard way to translate effect handlers is to transform the code into the so-called continuation-passing style (CPS). Functions require an extra argument: a one-argument continuation function. Instead of returning a result value, they call the continuation with this value. By making continuations explicit, one can then explicitly manipulate the control flow of the program, which makes it possible to support effect handlers. Js_of_ocaml has been recently modified to support effect handlers using this approach. This <a href="https://github.com/ocsigen/js_of_ocaml/pull/1340">preliminary implementation</a> has been released in <a href="https://discuss.ocaml.org/t/ann-js-of-ocaml-5-0/11008">Js_of_ocaml 5.0</a>. It has provided their team with a good understanding of how effect handlers work and what technical difficulties exist when supporting them in a compiler targeting JavaScript. However, CPS transformation comes with an important negative impact on performance. The team then implemented a <a href="https://github.com/ocsigen/js_of_ocaml/pull/1384">partial CPS transform</a> that removed some of the overheads with the CPS transformation. There are still some overheads due to CPS conversion that can be eliminated with smarter analysis and transformation. The team is considering trying alternative compilation techniques to support effect handlers. For example, there are implementation strategies that should have low overhead as long as no effect is performed at the cost of making effect handling slower. However, this might make the generated code much larger. The CPS-based implementation provides them with a point of comparison for undertaking this work.</li>
<li><a href="https://github.com/ocaml-flambda/ocaml-jst/tree/main/jane/doc">Local Allocations</a> (implemented by Stephen Dolan and Leo White): This feature adds support for stack-allocated blocks. It enforces memory safety by requiring that heap-allocated blocks never point to stack-allocated blocks, and stack-allocated blocks never point to shorter-lived stack-allocated blocks. This is a big addition to OCaml’s type system and is still under development.</li>
<li>Unboxed Types: Currently in OCaml, all fields of a structure store values in a single machine word. This word is further restricted by having to either point to garbage-collected memory or be tagged to denote that the garbage collector should skip it. Unboxed types relax this restriction, allowing a field to hold values smaller or larger than a word. This can be used to save memory and to improve performance by avoiding some pointer dereferencing. To find out more, read the <a href="https://github.com/ocaml/RFCs/pull/34">proposal on unboxed types</a> on GitHub.</li>
</ul>
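<p>To make the continuation-passing style mentioned in the JavaScript item above more concrete, here is a minimal hand-written sketch in OCaml. It only illustrates the idea of passing an explicit continuation; it is not the transformation that Js_of_ocaml actually performs, and the names <code>fact</code> and <code>fact_cps</code> are made up for this example.</p>
<pre><code class="language-ocaml">(* Direct style: the function returns its result to the caller. *)
let rec fact n = if n &lt;= 1 then 1 else n * fact (n - 1)

(* Continuation-passing style: the function takes an extra
   one-argument continuation [k] and, instead of returning,
   calls [k] with the result. *)
let rec fact_cps n k =
  if n &lt;= 1 then k 1
  else fact_cps (n - 1) (fun r -&gt; k (n * r))

let () =
  (* Passing the identity function as the final continuation
     recovers the ordinary result. *)
  assert (fact 5 = 120);
  assert (fact_cps 5 (fun r -&gt; r) = 120)
</code></pre>
<p>Because the rest of the computation is now an ordinary function value, it can be stored and called later, which is what makes it possible to express effect handlers on top of it. The extra closures allocated along the way hint at where the performance overhead comes from.</p>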
<h2>The Legacy</h2>
<p>The Multicore OCaml project has been full of challenges, successes, and surprises. Along the way, the team has developed and grown, learning important lessons and adapting their approach to best suit the needs of all OCaml users.</p>
<p>Making the leap from research to product is a complex process that takes time to execute properly. In computer science, it can take decades to get right. It’s a massive achievement to get a revolutionary update like OCaml Multicore from concept to finished product in less than 8 years.</p>
<p>It’s also an update suitable for everyone. Users who don’t have a need for multicore features can carry on using OCaml like they always have, benefitting from other OCaml 5 features without having to change a line of code. On the other hand, the significant number of people who have long awaited the update can now benefit from having OCaml and all its strengths on multiple cores.</p>
<p>The story of OCaml Multicore is one of hard work and a dedication to learning. It speaks to anyone with a passion project that seems too innovative or experimental to succeed. With a strong team and a flexible, problem-solving approach, theory can quickly become reality.</p>
<h2>Acknowledgements</h2>
<p>A big thank you to KC Sivaramakrishnan, without whom this article would not be possible. Further thanks go to Jérôme Vouillon and Leo White for their expertise and contributions to the ‘where do we go from here’ section of the article.</p>
<h2>Sources and Further Reading</h2>
<ul>
<li>
<p>A collection of libraries, experiments, and ideas relating to OCaml 5: https://github.com/ocaml-multicore/awesome-multicore-ocaml</p>
</li>
<li>
<p>A wiki for Multicore OCaml. Note that it's not currently being maintained, so whilst it has much useful information, some might be outdated: https://github.com/ocaml-multicore/ocaml-multicore/wiki</p>
</li>
<li>
<p>Information on Effect Handlers: https://kcsrk.info/webman/manual/effects.html</p>
</li>
<li>
<p>Information on Parallelism: https://kcsrk.info/webman/manual/parallelism.html</p>
</li>
<li>
<p>Information on Memory Models: https://kcsrk.info/webman/manual/memorymodel.html</p>
</li>
<li>
<p>Academic publications pertaining to OCaml Multicore: https://github.com/ocaml-multicore/awesome-multicore-ocaml#papers</p>
</li>
<li>
<p>OCaml’s home on the web: https://ocaml.org</p>
</li>
</ul>
]]></description><link>https://tarides.com/blog/2023-03-02-the-journey-to-ocaml-multicore-bringing-big-ideas-to-life</link><guid isPermaLink="false">https://tarides.com/blog/2023-03-02-the-journey-to-ocaml-multicore-bringing-big-ideas-to-life.html</guid><dc:creator><![CDATA[ Isabella Leandersson ]]></dc:creator><pubDate>Thu, 02 Mar 2023 00:00:00 GMT</pubDate></item><item><title><![CDATA[Lambda Retreat Report]]></title><description><![CDATA[<p>Today we're taking a little pause from our OCaml 5 series to talk about a programming retreat. I spent a week in the woods with fellow programmers at the <a href="https://anandology.com/lambda-retreat/">Lambda Retreat</a>. It was a wonderful way to explore the nature of computations, abstractions, and paradigms. Although I mostly work in OCaml, it was fun and challenging to code in Scheme, another functional programming language.</p>
<p>For more OCaml 5 posts, visit our <a href="/blog/">Tarides blog</a> for posts about some exciting new features and interviews with OCaml programmers, <a href="/blog/2023-01-10-engineer-spotlight-sudha-parimala/">including one with me</a>! Next week, come back to read an article about how OCaml 5 performs during the <a href="https://benchmarksgame-team.pages.debian.net/benchmarksgame/index.html">Benchmarking Game</a>, but for now, read on for a Lambda Retreat Retrospective.</p>
<hr>
<p><em>Structure and Interpretation of Computer Programs (SICP)</em> is many programmers' favourite programming textbook. It teaches programming constructs like recursion, modularity, abstractions, etc. For a long time, it was used as the textbook for an introduction to programming course. Here's what <a href="https://www.amazon.com/review/R403HR4VL71K8">Peter Norvig</a> and <a href="https://eli.thegreenplace.net/2008/05/28/book-review-structure-and-interpretation-of-computer-programs-by-harold-abelson-gerald-jay-sussman/">Eli Bendersky</a> have to say about SICP.</p>
<p>Having seen a lot of people highly recommend SICP, I grabbed a copy for myself a few years ago and started reading it, but, alas, I never completed the book.</p>
<p>In the latter part of last year, <a href="https://anandology.com/">Anand</a> decided to host a week-long retreat to gather a bunch of people and go through some interesting parts of SICP. Suffice it to say, I jumped at the opportunity to do nothing but read and write code for a week.</p>
<h3>Getting Ready</h3>
<p>Two weeks before the retreat, we had some warm-up sessions to get ourselves ready. During this time, we attended some remote sessions and solved a few exercises from Chapters 1 &amp; 2 of SICP.</p>
<h3>Arriving</h3>
<p>On Day 0, we all arrived in Bangalore from various parts of India, and carpooled to the <a href="https://tvc.farm/">Tamarind Valley Collective</a> (TVC), located ~80km from the city. Reaching TVC turned out to be an unexpected but enjoyable 1.5km trek, since the roads to the campsite were unusable due to rain.</p>
<h3>At the Retreat</h3>
<p><strong>Functional Geometry</strong></p>
<p>The retreat began with Functional Geometry from the second chapter of SICP. We started with the basics, like rendering images, and slowly built the primitives needed for generating Escher's woodcut.</p>
<p>It was amazing to see the power of composability! We thoroughly enjoyed building Escher's woodcut from an unassuming image of a fish.</p>
<p><img sizes="(min-width: 1360px) 1360px, (min-width: 680px) 680px, 100vw" srcset="/blog/images/fish-170w~GKS6RDq2vcqptUFvVZDHxA.webp 170w, /blog/images/fish-340w~rP13DyhQjXv6q9YmkvtVHw.webp 340w, /blog/images/fish-680w~e-Cy5x5utGQ2TvYUDy9bqQ.webp 680w, /blog/images/fish-1360w~XzEYL-7N0PgjYG05-TX1HQ.webp 1360w" src="/blog/images/fish-1360w~XzEYL-7N0PgjYG05-TX1HQ.webp" alt="Initial image of a fish"></p>
<p align="center">
  <img sizes="(min-width: 1360px) 1360px, (min-width: 680px) 680px, 100vw" srcset="/blog/images/woodcut-170w~rtDFBPV2jfiRkOLO7qDaTw.webp 170w, /blog/images/woodcut-340w~8fnnX2kanquYLsNfvokdLg.webp 340w, /blog/images/woodcut-680w~8tnGsWBEIjTXqKaN2iNd-g.webp 680w, /blog/images/woodcut-1360w~UpBjnTmwjm0UmpCT0AUS2A.webp 1360w" src="/blog/images/woodcut-1360w~UpBjnTmwjm0UmpCT0AUS2A.webp" alt="Escher's Woodcut" width="45%">
&nbsp; &nbsp; &nbsp; &nbsp;
  <img sizes="(min-width: 1360px) 1360px, (min-width: 680px) 680px, 100vw" srcset="/blog/images/group2-170w~B97xah-ZIfMQ3Mng94G01A.webp 170w, /blog/images/group2-340w~vlnsDczFEcTsjrgQh24Wzg.webp 340w, /blog/images/group2-680w~EM7b--CduoTPnJjIyDtclg.webp 680w, /blog/images/group2-1360w~qFEXFIS1XYz8PNzCA9kSUA.webp 1360w" src="/blog/images/group2-1360w~qFEXFIS1XYz8PNzCA9kSUA.webp" alt="Participants with Lambda Retreat t-shirt" width="45%">
</p>
<p>Anand rewarded everyone with an Escher's woodcut T-Shirt for successfully generating the <code>square-limit</code> \o/</p>
<p>We then abstracted out the implementation details for the Functional Geometry primitives we had built. The abstraction gives us the freedom to change the implementation at a later point without affecting the higher-level details.</p>
<p><strong>Mutability and State</strong></p>
<p>We spent some time understanding mutations, global state, and local state in Scheme, a functional language like OCaml. This led us to building some mutable data structures, like a mutable queue and a mutable hash table in Scheme, and generalising with a dispatcher to perform operations.</p>
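<p>For readers more at home in OCaml, the dispatcher idea can be sketched roughly as follows. This is an illustrative reconstruction and not the code we wrote at the retreat: the message and reply types and the name <code>make_queue</code> are made up for this example, and it leans on the standard library's <code>Queue</code> rather than building the queue by hand.</p>
<pre><code class="language-ocaml">(* Messages the queue object understands, and the replies it gives. *)
type 'a message = Push of 'a | Pop | Size
type 'a reply = Done | Item of 'a option | Count of int

(* [make_queue ()] returns a dispatcher: a closure that owns some
   local mutable state and decides what to do for each message. *)
let make_queue () =
  let items = Queue.create () in
  fun msg -&gt;
    match msg with
    | Push x -&gt; Queue.add x items; Done
    | Pop -&gt; Item (Queue.take_opt items)
    | Size -&gt; Count (Queue.length items)

let () =
  let q = make_queue () in
  ignore (q (Push 1));
  ignore (q (Push 2));
  (* First in, first out: 1 comes back before 2. *)
  assert (q Pop = Item (Some 1));
  assert (q Size = Count 1)
</code></pre>
<p>The state in <code>items</code> is local to each call of <code>make_queue</code>, so two queues created this way never interfere with each other.</p>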
<p><strong>Metacircular Interpreter</strong></p>
<p>Another exercise before the retreat was to write a parser for Scheme in Python. At the retreat, we started with translating it to Scheme. Going further, we built a metacircular interpreter -- a Scheme interpreter written in Scheme. How cool is that?</p>
<p>We then learnt about lazy evaluation in Scheme and went on to make our metacircular interpreter lazy by default. Another interesting part we looked forward to was targeting WebAssembly (Wasm) from Scheme. It was surprisingly simple to go from Scheme to Wasm, targeting Wasm's Lisp-like syntax.</p>
<h3>Beyond Tech</h3>
<p>Living at TVC in the middle of a forest with barely any electricity or cellular network for a week was a humbling experience. Madhav, the resident manager at TVC, and his team made sure our stay was comfortable. The food, made from locally-sourced ingredients and featuring local cuisine, was amazing!</p>
<p><img sizes="(min-width: 1360px) 1360px, (min-width: 680px) 680px, 100vw" srcset="/blog/images/campsite-170w~TJXU3RNgUILCTsX2CHFpsA.webp 170w, /blog/images/campsite-340w~pg97xWRcZKGqWxaD9NMbPQ.webp 340w, /blog/images/campsite-680w~rMe2UW6WnHYd7gyxSA7l8g.webp 680w, /blog/images/campsite-1360w~JC67557M4_9T7KHZjF2cOA.webp 1360w" src="/blog/images/campsite-1360w~JC67557M4_9T7KHZjF2cOA.webp" alt="Our campsite in the middle of the countryside"></p>
<p>The evening walks and hikes at TVC were memorable. We managed to sight some kingfishers and owls while snacking on some freshly plucked tamarinds. We had so much fun hiking along a stream that runs in the middle of TVC and capturing wisdom about sustainable living from Madhav and Vikrant, who put it into practice by living on farms.</p>
<p align="center">
  <img sizes="(min-width: 1360px) 1360px, (min-width: 680px) 680px, 100vw" srcset="/blog/images/owl-170w~RaSLYSbKMfeCzl2vsNlJtA.webp 170w, /blog/images/owl-340w~pvStvOT6NDPRDBtubafO0g.webp 340w, /blog/images/owl-680w~lAQBloYQ_ZYqID9XBjLCWQ.webp 680w, /blog/images/owl-1360w~AQ5eYulQx_NBeQ5yhkudtg.webp 1360w" src="/blog/images/owl-1360w~AQ5eYulQx_NBeQ5yhkudtg.webp" alt="An owl we spotted" width="45%">
&nbsp; &nbsp; &nbsp; &nbsp;
  <img sizes="(min-width: 1360px) 1360px, (min-width: 680px) 680px, 100vw" srcset="/blog/images/group-170w~1QNbjN1ExXSE45V8PnAfTQ.webp 170w, /blog/images/group-340w~WGqbJTIoQ8_yPN2igZowEw.webp 340w, /blog/images/group-680w~nBpkA40cljrVcrWyM8jAHQ.webp 680w, /blog/images/group-1360w~p-UeZcReXOleoiVGY7hekw.webp 1360w" src="/blog/images/group-1360w~p-UeZcReXOleoiVGY7hekw.webp" alt="After a hike along the stream" width="45%">
</p>
<p>Our coworkers Pappu the hunting cat, her three kittens, and our boy Poco, the rugged looking sweet doggo, kept us company.</p>
<p align="center">
  <img sizes="(min-width: 1360px) 1360px, (min-width: 680px) 680px, 100vw" srcset="/blog/images/cat-170w~cSXXegh7l8PQqa7eNsnMNQ.webp 170w, /blog/images/cat-340w~7jSgckiwLfHb2sDBp1Iqsg.webp 340w, /blog/images/cat-680w~RwufUdx96KrQf3CWTTNUEA.webp 680w, /blog/images/cat-1360w~OXLOQXr46C-muTSXsuYEdQ.webp 1360w" src="/blog/images/cat-1360w~OXLOQXr46C-muTSXsuYEdQ.webp" alt="Pappu the cat" width="45%">
&nbsp; &nbsp; &nbsp; &nbsp;
  <img sizes="(min-width: 1360px) 1360px, (min-width: 680px) 680px, 100vw" srcset="/blog/images/poco-dog-170w~ARGb0XM6Y-aeXTUYnxSQrQ.webp 170w, /blog/images/poco-dog-340w~U5JhIU2egitvLhO7IcFkfA.webp 340w, /blog/images/poco-dog-680w~RtS7cwxus3VN8V4pHRDANA.webp 680w, /blog/images/poco-dog-1360w~M4Vy3b40JgxoA-3Rdz1c5A.webp 1360w" src="/blog/images/poco-dog-1360w~M4Vy3b40JgxoA-3Rdz1c5A.webp" alt="Hiking crew with Poco the dog" width="45%">
</p>
<p>Our days of hacking were followed by board games in the evenings and late into the night. We had so much fun and laughter playing games like Skull, Chameleon, and Ticket to Ride Europe. By the end of the retreat, we were surprised by how little internet and social media we had consumed that week!</p>
<p><img sizes="(min-width: 1360px) 1360px, (min-width: 680px) 680px, 100vw" srcset="/blog/images/game-170w~ToqUPqGxupPV500EiYGTSA.webp 170w, /blog/images/game-340w~4KmjN3p4yuXB_uMBq_xNGg.webp 340w, /blog/images/game-680w~qXYd9G9gM-orQ0D8T9UmZg.webp 680w, /blog/images/game-1360w~qpRbXALecbRGtPZzc0FUVQ.webp 1360w" src="/blog/images/game-1360w~qpRbXALecbRGtPZzc0FUVQ.webp" alt="A picture of the game: Ticket to Ride"></p>
<p>I'm grateful to have had the opportunity to attend the first ever Lambda Retreat and hope to carry the functional programming spirit forward. It was super nice meeting all the enthusiastic and kind people at the retreat, and I hope to see everyone again at future events.</p>
<p>Thanks to Anand for organising it and to everyone who attended for making it an enjoyable experience. Thanks also to Madhav and his team at TVC for ensuring our stay was comfortable.</p>
<hr>
<p>Check out our series on Multicore OCaml, a project I've worked on for the last several years, starting with the <a href="/blog/2022-12-19-ocaml-5-with-multicore-support-is-here/">announcement of the OCaml 5 release</a>. If you'd like to know more about OCaml 5, you can start with <a href="https://youtu.be/zJ4G0TKwzVc">KC's keynote address</a> from the ICFP 2022 conference, <a href="https://ocaml.org/docs">OCaml tutorials</a>, and the informative book <a href="https://www.cambridge.org/core/books/real-world-ocaml-functional-programming-for-the-masses/052E4BCCB09D56A0FE875DD81B1ED571"><em>Real World OCaml</em></a>.</p>
]]></description><link>https://tarides.com/blog/2023-01-12-lambda-retreat-report</link><guid isPermaLink="false">https://tarides.com/blog/2023-01-12-lambda-retreat-report.html</guid><dc:creator><![CDATA[ Sudha Parimala ]]></dc:creator><pubDate>Thu, 12 Jan 2023 00:00:00 GMT</pubDate></item><item><title><![CDATA[Engineer Spotlight: Sudha Parimala]]></title><description><![CDATA[<p>For our third and final Engineer Spotlight, we interviewed Sudha Parimala, a Tarides engineer who works primarily on the Multicore Applications team. She talks about what led her to become an OCaml programmer and why she's excited about OCaml 5, as our blog series on <a href="/blog/2022-12-19-ocaml-5-with-multicore-support-is-here/">Multicore OCaml</a> continues.</p>
<hr>
<p><strong>Christine: Why did you decide to become an OCaml programmer rather than Python or C++?</strong></p>
<p>Sudha: My programming journey started with Python in high school. At that point I didn't really know much about programming, and I picked it only as an alternative to studying biology. Python's human language-like syntax made it easy to grasp the concepts as a novice, while also getting a feel of programming constructs. The education board decided to switch to C++, and I ended up learning OOP as a result. I continued learning C, Java, and such during my undergrad.</p>
<p>Then I discovered Haskell and was hooked. I found it wild that I could write a 200+ lines Java program with just 10 lines in Haskell. I got an opportunity to participate in a summer school organised by ACM India on Programming Language Design. This deepened my interest in Functional Programming (FP). After graduating, I got an opportunity to join KC's Multicore OCaml team at IIT Madras. I started learning OCaml then, and there's no looking back.</p>
<p><strong>C: What do you like best in OCaml?</strong></p>
<p>S: I like OCaml's features combined with its practicality. When I started learning OCaml, I could relate to a lot of general FP concepts I had learnt through other languages. At the same time, one can easily do imperative programming (mutations) or I/O.</p>
<p><strong>C: What’s the coolest thing you've made with OCaml?</strong></p>
<p>S: When I was working on the Multicore OCaml compiler, I found it really cool that we could easily connect OCaml directly with C, with the type safety of OCaml. If I may add a futuristic take on this, I'd find it really cool to do hobby projects of mine with the entire stack - from web scraping, to talking to databases, to creating web apps - in OCaml.</p>
<p><strong>C: Why should engineers learn OCaml?</strong></p>
<p>S: I'd recommend learning OCaml to anyone curious to learn new and succinct ways of expressing programs. OCaml definitely changes the way you think about programs, and I'm sure that reflects on how you write programs, in OCaml and elsewhere.</p>
<p><strong>C: What are you most excited about in OCaml 5?</strong></p>
<p>S: I'm really excited to see the OCaml world transition to Multicore. It will be a challenging, yet rewarding journey. Challenging because OCaml programs for more than two decades have been designed for single core. Rewarding, thanks to blazing fast performance time and direct style concurrency. The Multicore and OCaml development teams have invested time in ensuring backwards compatibility, which will hopefully ease the process a bit.</p>
<hr>
<p>Thank you so much, Sudha, for taking the time to answer these questions about your experience with OCaml. Also read <a href="/blog/2023-01-05-engineer-spotlight-zach-shipko/">Zach Shipko's</a> and <a href="/blog/2022-12-29-engineer-spotlight-jules-aguillon/">Jules Aguillon's</a> interviews for their take! Thanks to their willingness to share, other developers can see why they should learn OCaml as their next language.</p>
<p>Feel like learning OCaml? Get started with the <a href="https://ocaml.org/docs">tutorials</a> and the <a href="https://www.cambridge.org/core/books/real-world-ocaml-functional-programming-for-the-masses/052E4BCCB09D56A0FE875DD81B1ED571"><em>Real World OCaml</em></a> book. Get a nice overview of what OCaml 5 has to offer by watching <a href="https://youtu.be/zJ4G0TKwzVc">KC Sivaramakrishnan's keynote address</a>.</p>
]]></description><link>https://tarides.com/blog/2023-01-10-engineer-spotlight-sudha-parimala</link><guid isPermaLink="false">https://tarides.com/blog/2023-01-10-engineer-spotlight-sudha-parimala.html</guid><dc:creator><![CDATA[ Christine Rose, Sudha Parimala ]]></dc:creator><pubDate>Tue, 10 Jan 2023 00:00:00 GMT</pubDate></item><item><title><![CDATA[Engineer Spotlight: Zach Shipko]]></title><description><![CDATA[<p>Tarides engineer Zach Shipko answers a few questions about why he decided to learn OCaml and why he's particularly excited about the OCaml 5 release. In celebration of OCaml 5, we've interviewed several engineers about their personal experience with the language and what features they enjoy. It's a great way to get some unique insight into the language from someone who works with it on a daily basis.</p>
<hr>
<p><strong>Christine: Why did you decide to become an OCaml programmer rather than Python or C++?</strong></p>
<p>Zach: I don't really see myself as an "OCaml programmer" because I use Python, C, Rust, JavaScript, and other languages quite frequently. It's my interest in many different programming languages that led me to OCaml!</p>
<p><strong>C: What do you like best in OCaml?</strong></p>
<p>Z: One of my favorite things about OCaml is the amount of thought put into new language features. Because of this I think the whole community values the importance of API design and correctness.</p>
<p><strong>C: What’s the coolest thing you've made with OCaml?</strong></p>
<p>Z: I have been working on <a href="https://github.com/mirage/irmin/blob/main/libirmin.opam"><code>libirmin</code></a>, which provides C bindings to the Irmin API, making it possible to use Irmin directly from C and other languages. This uses <a href="https://github.com/yallop/ocaml-ctypes/tree/master/src/cstubs"><code>Cstubs_inverted</code></a> to wrap OCaml code in C functions. I don't know how "cool" that is, but every time it works I am pleasantly surprised.</p>
<p><strong>C: Why should engineers learn OCaml?</strong></p>
<p>Z: Learning a new programming language can help you see problems from a new perspective and it gives you another tool to reach for when needed. OCaml has lots of nice features (pattern matching, functors, ...) that make solving certain problems more fun.</p>
<p><strong>C: What are you most excited about in OCaml 5?</strong></p>
<p>Z: Other than being able to use multiple cores, I am very excited about Effects (and eventually typed effects). It is an entirely new paradigm for writing applications with a lot of research behind it. To have a usable effects system in a general-purpose language like OCaml is a huge accomplishment!</p>
<hr>
<p>Zach emphasises the importance of expanding your horizons as a programmer, learning new languages to give you fresh perspectives and insights. Perhaps especially when learning a language like OCaml. Like the esteemed <a href="https://en.wikipedia.org/wiki/Alan_Perlis">Alan Perlis</a> said, "A language that doesn't affect the way you think about programming is not worth knowing."</p>
<p>Zach also studied photography and digital media in college. He took the pictures at the top of this post and said he "typically picks photos like this to have some nature to look at on programming websites."</p>
<p>Read <a href="/blog/2022-12-29-engineer-spotlight-jules-aguillon/">Jules Aguillon's</a> interview from 29 December 2022 to learn about his journey to OCaml. Next week, look for our final Engineer Spotlight interview with Sudha Parimala, as well as a post from her on the Benchmarking Game.</p>
<p>Feel like learning OCaml? Get started with <a href="https://ocaml.org/docs">the tutorials</a> and the <a href="https://www.cambridge.org/core/books/real-world-ocaml-functional-programming-for-the-masses/052E4BCCB09D56A0FE875DD81B1ED571"><em>Real World OCaml</em></a> book. Want to learn more about effects and other things OCaml 5 has to offer? Watch KC Sivaramakrishnan's <a href="https://youtu.be/zJ4G0TKwzVc">keynote</a> and check out the <a href="https://speakerdeck.com/kayceesrk/ocaml-5-dot-0">speaker deck</a> for his talk as well.</p>
]]></description><link>https://tarides.com/blog/2023-01-05-engineer-spotlight-zach-shipko</link><guid isPermaLink="false">https://tarides.com/blog/2023-01-05-engineer-spotlight-zach-shipko.html</guid><dc:creator><![CDATA[ Christine Rose, Zach Shipko ]]></dc:creator><pubDate>Thu, 05 Jan 2023 00:00:00 GMT</pubDate></item><item><title><![CDATA[Engineer Spotlight: Jules Aguillon]]></title><description><![CDATA[<p>In celebration of the OCaml 5 release, we decided to interview a few of our talented engineers about OCaml. While it isn't a well-known language outside of the functional programming community, we're striving to get the word out about the great benefits of OCaml and why it's worth your time to try it out, especially with the introduction of <a href="/blog/2022-12-19-ocaml-5-with-multicore-support-is-here/">Multicore support in OCaml 5</a>.</p>
<p>KC Sivaramakrishnan's inspiring <a href="https://www.youtube.com/watch?v=zJ4G0TKwzVc">keynote address</a> is a great introduction to OCaml 5 and all it offers. Check out the <a href="https://speakerdeck.com/kayceesrk/retrofitting-concurrency-lessons-from-the-engine-room">speaker deck</a> as well.</p>
<p>Today our engineer Jules Aguillon, who works primarily on our OCaml Platform tooling project, talks about his journey to OCaml, what he enjoys about the language, and why he thinks you should learn it! Take it away Jules!</p>
<hr>
<p><strong>Christine: How did you start programming?</strong></p>
<p>Jules: My programming journey began with learning C in school, but I soon realised I wanted more abstraction. It felt like the C code was always talking about low-level stuff instead of what I wanted to express. In C, translations are a lot of work and get harder as the algorithm becomes more complex. Things like ASTs and polymorphic datastructures are also really hard to write in C.</p>
<p>Some of the ways C works are not so innocent: it supports the kinds of dangerous memory operations that <a href="https://www.zdnet.com/article/microsoft-70-percent-of-all-security-bugs-are-memory-safety-issues/">famously cause 70% of all security bugs</a> in some big corporations.</p>
<p><strong>C: What did you do to get that abstraction that you were looking for?</strong></p>
<p>J: I decided to learn C++, which is C but with classes (grouping code and data together), abstracted memory operations (making them safer by default), and more type checking.</p>
<p>But I quickly found that C++ also wasn't a good fit, mainly due to its use of boilerplate. Boilerplate refers to pieces of code that must be repeated in various places without significant change, wrapping around every concept you try to express in C++. They are used to represent several complicated concepts, and any mistake in the boilerplate could bring things like memory unsafety back. I wanted to abstract this away, too.</p>
<p><strong>C: What did you do next?</strong></p>
<p>J: To finally write shorter and safer code, I tried Python. It was a joy to use compared to the previous unsafe and verbose C and C++. The garbage collector solved the memory-unsafety problems, and the built-in data structures and idioms allowed me to write many complex algorithms using only a small amount of code.</p>
<p>But Python has a dark side: it entirely lacks static type checking. This means that it requires considerable effort to find a type-related mistake. The only way is to run the program with different inputs and wait until it crashes. This gets really annoying as the program grows.</p>
<p>Furthermore, this kind of mistake happens all the time (sometimes once in every line of code!) and could be entirely solved by a type checker.</p>
<blockquote>
<p>"For me, this is already the perfect language and it doesn't stop there!"</p>
</blockquote>
<p><strong>C: Is this where OCaml comes in?</strong></p>
<p>J: Yes! Then I learned OCaml! It's unconditionally memory-safe, has a garbage collector, the code is concise, many kinds of abstractions are possible, and most importantly, it has well-defined and powerful type checking.</p>
<p>For me, this is already the perfect language and it doesn't stop there! Modules, polymorphism, and higher-order functions all add deep abstraction possibilities, and there's an even more important feature. Variant types allow a type to have several different shapes, which lets you write tree-like things such as ASTs (abstract syntax trees) that are impossibly hard to express in all the languages above and many others that I have tried.</p>
<p>Theoreticians talk about algebra of types, and this is the "plus" operation. Now that I've used it in OCaml, I could never go back to a language that doesn't have the "plus" operation!</p>
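<p>As a minimal sketch of the "plus" operation Jules describes, here is a small AST written as a variant type. The type <code>expr</code>, its constructors, and the function <code>eval</code> are illustrative names chosen for this example, not something taken from the interview:</p>
<pre><code>(* A variant type: an expression is a number, OR a sum, OR a product. *)
type expr =
  | Num of int
  | Add of expr * expr
  | Mul of expr * expr

(* Pattern matching handles each shape in turn; the compiler warns
   when a case is forgotten. *)
let rec eval = function
  | Num n -&gt; n
  | Add (a, b) -&gt; eval a + eval b
  | Mul (a, b) -&gt; eval a * eval b

let () =
  (* The tree for (1 + 2) * 4 *)
  let e = Mul (Add (Num 1, Num 2), Num 4) in
  Printf.printf "%d\n" (eval e)  (* prints 12 *)
</code></pre>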
<hr>
<p>A big thank you to Jules for taking the time to speak about his experience with OCaml. Getting a personal account of why he chose OCaml gives great insight into the strengths and features of the language from someone who uses it every day.</p>
<p>If you're interested in learning more about OCaml, you can learn from <a href="https://ocaml.org/docs">tutorials</a>, the <a href="https://www.cambridge.org/core/books/real-world-ocaml-functional-programming-for-the-masses/052E4BCCB09D56A0FE875DD81B1ED571">Real World OCaml book</a>, and contribute <a href="https://github.com/ocaml/ocaml">on GitHub</a>. We look forward to seeing you in the community!</p>
]]></description><link>https://tarides.com/blog/2022-12-29-engineer-spotlight-jules-aguillon</link><guid isPermaLink="false">https://tarides.com/blog/2022-12-29-engineer-spotlight-jules-aguillon.html</guid><dc:creator><![CDATA[ Jules Aguillon, Christine Rose ]]></dc:creator><pubDate>Thu, 29 Dec 2022 00:00:00 GMT</pubDate></item><item><title><![CDATA[Love Rust? Then OCaml's New Eio Library is for You]]></title><description><![CDATA[<p>We’ve come to expect a lot from the programming languages we use. We want the memory safety of Java, the performance of C/C++, and the concurrency of Go. On top of this, we need robust cybersecurity tools to protect us from the many risks and vulnerabilities in the world, all in an intuitive and easy-to-use package for programmers.</p>
<p>You can expect all of the above with OCaml 5. The new library <a href="https://github.com/ocaml-multicore/eio">Eio</a> introduces some great new features that let the programmer write concurrent code in a way that best suits them. Eio is fast, solves the <a href="https://journal.stuffwithstuff.com/2015/02/01/what-color-is-your-function/">function colouring problem</a>, and can use effect handlers to let the developer customise the scheduling algorithm rather than baking it in at runtime. If you’re a fan of how Rust delivers fast and high performing concurrent code, OCaml 5’s Eio is a close match, with some additional features.</p>
<p>Eio gives OCaml a new edge on speed, ease-of-use, portability, and security. It matches Rust’s reputed performance on the same points, making the two languages more comparable when it comes to concurrent programming. Let me know what you think of my comparison on <a href="https://discuss.ocaml.org">Discuss</a> or <a href="https://bsky.app/profile/tarides.com">Bluesky</a>.</p>
<h2>How Eio Makes Concurrent Code Quick and Easy</h2>
<p>Rust is a <a href="https://codilime.com/blog/why-is-rust-programming-language-so-popular/">popular programming language</a> that solves a lot of problems for programmers. One of Rust’s strengths is the way it delivers concurrent code in a quick and safe manner.</p>
<p>According to a <a href="https://codilime.com/blog/why-is-rust-programming-language-so-popular/">recent blog post</a>, “Rust solves problems that C/C++ developers have been struggling with for a long time: memory errors and concurrent programming. This is seen as its main benefit.”</p>
<p>Eio makes writing concurrent code in OCaml much easier, resolving earlier pain points and providing significant benefits. OCaml is also a type- and memory-safe language with a low-latency and high-throughput concurrent garbage collector that doesn't get in the way of application code execution.</p>
<p>Below is an overview of the biggest changes Eio brings to concurrent programming in OCaml. With these improvements, OCaml provides a concurrent programming experience comparable to Rust.</p>
<h2>Key Benefits of Eio</h2>
<p><em>Performance</em>
Eio brings big performance improvements to concurrent code in OCaml, making use cases like web servers serving requests from users a lot faster. In a speed test comparing Eio’s performance to Go’s <code>net/http</code> and Rust’s <code>hyper</code>, the results show that Eio outperforms Go and closely matches Rust. Eio can reliably serve over one million requests per second on a few cores. Being such a close match in terms of performance, OCaml is a strong contender for users looking to expand beyond using Rust.</p>
<p><img sizes="(min-width: 1360px) 1360px, (min-width: 680px) 680px, 100vw" srcset="/blog/images/http_load1-170w~RZckgL6RDd06Q9ouLGqLJQ.webp 170w, /blog/images/http_load1-340w~FCaiS-rUwT1xzrczGpYpKg.webp 340w, /blog/images/http_load1-680w~tTh6XJY6gRfFUzO8ESaIkg.webp 680w, /blog/images/http_load1-1360w~3JAovQ7etL5s-zNQ-g2iSQ.webp 1360w" src="/blog/images/http_load1-1360w~3JAovQ7etL5s-zNQ-g2iSQ.webp" alt="EioImage"></p>
<p><em>Ease-of-Use</em>
One of Rust’s strengths is its user friendliness, meaning it’s easy to use and code in. With the OCaml 5 update, Eio makes OCaml a much easier language for writing concurrent code.</p>
<p>Eio offers an alternative to monadic I/O, which used to be the only way to write concurrent code in OCaml. Now, with OCaml 5, the developer experience is greatly simplified and will feel familiar to anyone who knows OCaml.</p>
<p>Patrick Ferris, a developer at Tarides, emphasises that with Eio:</p>
<h3>“Normal OCaml features just work out of the box, like exception backtraces, which are crucial for writing new libraries, using tools, and debugging programs.”</h3>
<p>Being able to use backtraces is a big improvement, as they show the developer the chain of function calls that led to an error. Using them in Eio lets developers debug and troubleshoot more quickly. Having access to standard OCaml features also means that the more difficult parts of concurrent programming, such as cancellations, cleaning up resources, reporting errors, and testing, are a lot easier to perform with Eio.</p>
<p>The biggest change users will notice is the resolution of the ‘function colouring problem.’ In the past, synchronous code could not exist alongside asynchronous code without breaking, requiring the developer to use a special calling convention to invoke asynchronous code. With Eio, that is no longer a problem, as both just appear as normal OCaml functions. This significantly improves developer experience and productivity and is unique to Eio. Currently, in Rust’s I/O library the ‘function colouring problem’ still exists, and developers have to spend time resolving conflicts between code types.</p>
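<p>To make that concrete, here is a minimal sketch, assuming the <code>eio_main</code> package is installed. The function <code>fetch</code> and its arguments are hypothetical names for this example. It may suspend while sleeping, yet it is defined and called like any other OCaml function, with no special calling convention:</p>
<pre><code>(* Hypothetical helper: waits for [delay] seconds, then returns a string.
   Eio.Time.sleep may suspend the current fiber, but [fetch] is still
   an ordinary function returning an ordinary value. *)
let fetch clock name delay =
  Eio.Time.sleep clock delay;
  name ^ " done"

let () =
  Eio_main.run @@ fun env -&gt;
  let clock = Eio.Stdenv.clock env in
  (* Run two fibers concurrently; each one calls [fetch] directly. *)
  Eio.Fiber.both
    (fun () -&gt; Eio.traceln "%s" (fetch clock "first" 0.2))
    (fun () -&gt; Eio.traceln "%s" (fetch clock "second" 0.1))
</code></pre>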
<p><em>Portability</em>
Both Rust and OCaml offer excellent portability, and with Eio OCaml gets several quality-of-life updates that let developers create programs in different environments with several different features.</p>
<p>Operating systems have changed a lot in the last decade, benefiting from continuous development and modernisation. Thanks to its flexibility, Eio is able to take advantage of modern OS features (such as Linux’s <a href="https://github.com/axboe/liburing">io_uring</a>) to boost its own performance.</p>
<p>In turn, different backends for various platforms (such as Linux, macOS, Windows, Mirage, etc.) can also implement the standard environment Eio expects to run programs. This adds an element of predictability to developer workflow, minimising the amount of task-switching and time spent outside of programming.</p>
<p>On the topic of Eio’s flexibility, Thomas Leonard, the creator and lead maintainer of Eio, highlights that:</p>
<h3>“Eio can also run existing Lwt and Async code alongside new code, allowing existing projects to be upgraded piece by piece, keeping the tests passing throughout the migration. A couple of lines of code is all it takes to make an existing Lwt application run on Eio, and from there any new code can use Eio directly.”</h3>
<p>Instead of forcing developers to pick a single I/O library, Eio can run Lwt, Async, and Eio code in the same program, which is completely new.</p>
<p><em>Security</em>
Both Rust and OCaml offer strong safety features. According to the blog <a href="https://codilime.com/blog/why-is-rust-programming-language-so-popular/">Codilime</a>, “High performance and safety are the features that made Rust so appealing.” Well, Eio adds even more security features to OCaml’s already long list.</p>
<p>Eio allows developers to restrict access with great specificity, which matters a great deal for security. For example, Eio lets a developer program a web server to serve files only from within specified directory trees, removing the possibility that it could be tricked into serving other files. This ensures that the server shares only what it is intended to share and that the restriction cannot be bypassed, which addresses a common security problem with web servers.</p>
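<p>The idea of confining file access to a directory tree can be sketched as follows, assuming the <code>eio_main</code> package is installed. The helper <code>try_load</code> and the file names are hypothetical. What gets printed depends on which files exist, but the Eio documentation describes paths that escape the given directory through <code>..</code> as being rejected with a permission error:</p>
<pre><code>let ( / ) = Eio.Path.( / )

(* Hypothetical helper: try to read a file and report what happened. *)
let try_load path =
  match Eio.Path.load path with
  | data -&gt;
    Eio.traceln "%a: read %d bytes" Eio.Path.pp path (String.length data)
  | exception ex -&gt;
    Eio.traceln "%a: %a" Eio.Path.pp path Eio.Exn.pp ex

let () =
  Eio_main.run @@ fun env -&gt;
  (* [root] only grants access to the current directory and below. *)
  let root = Eio.Stdenv.cwd env in
  try_load (root / "index.html");
  (* This path points outside [root], so the load is expected to fail. *)
  try_load (root / ".." / "secret.txt")
</code></pre>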
<h2>Conclusion</h2>
<p>Many businesses and programmers love Rust, and for good reason! What OCaml 5 and Eio can offer is an alternative that matches Rust on performance and user friendliness, includes new cutting-edge features, and delivers on safety in a way that is uniquely OCaml. With its new Eio library, concurrent programming in OCaml becomes more similar to Rust, and out of the two only OCaml solves the function colouring problem. If you’re looking to complement your use of Rust with a robust functional programming language – without sacrificing performance – OCaml 5 is the language for you.</p>
<p><a href="/contact/">Contact us</a> and learn more about how OCaml can transform your business. You can also find us on <a href="https://github.com/tarides">GitHub</a>, the OCaml <a href="https://discuss.ocaml.org">Discuss</a> forum, and <a href="https://bsky.app/profile/tarides.com">Bluesky</a>.</p>
<h3>Acknowledgements</h3>
<p>With thanks to Thomas Leonard, creator and lead maintainer of Eio, and Patrick Ferris, a developer at Tarides, for their expertise and input that made this article possible.</p>
<h3>Sources</h3>
<ul>
<li>
<p><a href="https://codilime.com/blog/why-is-rust-programming-language-so-popular/">Codilime</a> blog</p>
</li>
<li>
<p><a href="https://serokell.io/blog/rust-guide">Serokell</a> blog</p>
</li>
</ul>
]]></description><link>https://tarides.com/blog/2022-12-27-love-rust-then-ocaml-s-new-eio-library-is-for-you</link><guid isPermaLink="false">https://tarides.com/blog/2022-12-27-love-rust-then-ocaml-s-new-eio-library-is-for-you.html</guid><dc:creator><![CDATA[ Isabella Leandersson ]]></dc:creator><pubDate>Tue, 27 Dec 2022 00:00:00 GMT</pubDate></item><item><title><![CDATA[OCaml 5 Multicore Testing Tools]]></title><description><![CDATA[<p>The new version of OCaml 5 is here! It brings the ability to program multicore applications and to maximise our usage of all the CPU cores without a global lock getting in the way of performance. What's most exciting to me though is that we have a whole new way of writing... bugs!</p>
<p>And with so much potential for mistakes comes a new era of testing tools to help us write correct applications:</p>
<h2>Memory Model</h2>
<p>The first of those is the <a href="https://kcsrk.info/webman/manual/memorymodel.html"><em>memory model</em> of OCaml 5</a>. If you already know what those two words mean, please skip this part because I won't pretend that I do. (I'm still convinced that it's just some fancy legalese terms to confuse people.) But it may actually matter when you realise that you've been living in a fantasy your whole CPU life:</p>
<pre><code><span class="ocaml-source">left</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-constant-numeric-decimal-integer">42</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source">
</span><span class="ocaml-source">right</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-constant-numeric-decimal-integer">0</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source">
</span></code></pre>
<p>When I read these two lines, I have years of beliefs telling me that the reference <code>left</code> will be updated before the <code>right</code> one is. But modern compilers and hardware conspire to break any sanity that may exist in my brain. You see, the order of operations doesn't actually matter <em>if</em> you can't see that the CPU is doing things in another order. It may just happen that the compiler or CPU will choose to do those two operations in reverse if they think it would be more convenient. And without this, our software would be so much slower that <em>"instructions are executed in order"</em> is an essential lie. (Well, is it even a lie if it makes no difference?)</p>
<p>To catch a liar, you need a second observer to correlate their claims. This is exactly what the other CPU cores will do. The bad news is that they are not actively looking for bad behavior from their colleagues, but they will end up reading values that aren't quite written in the order expected. This will wreak havoc on your invariants and trigger very, very weird bugs.</p>
<p>I'm not kidding when I say "very, very weird." Below is a real example of an out-of-order read/write that happened on my computer. This was a very simple program, with only two references, <code>left</code> and <code>right</code>, that got updated by two different domains (shown as two branches here):</p>
<pre><code>                          !left    = 42
                          !right   = 0
                                 |
              .------------------------------------.
              |                                    |
              |                               left  := 1
              |                               right := 2
         right := 3                                |
        !left        = 42                    !right       = 3
                      ^^^^
                      how?
</code></pre>
<p>I tried to align the sequence of operations according to the observed memory values, but no ordering actually made sense. We can't have both <code>!left = 42</code> and <code>!right = 3</code> in the end.</p>
<p>Here's another attempt to align the instructions in a coherent way:</p>
<pre><code>                          !left    = 42
                          !right   = 0
                                 |
              .------------------------------------.
              |                                    |
         right := 3                                |
        !left        = 42                          |
                                              left  := 1
                                              right := 2
                                             !right       = 3
                                                          ^^^^
                                                          how?
</code></pre>
<p>It already requires some time to unpack this short example, but imagine how bad it would get to debug such a thing in production!</p>
<p>I want to stress that this specific execution wasn't the result of a compiler optimisation that we could have discovered by reading the assembly code. The program was running just fine over many iterations before being disturbed by a sudden hardware optimisation. The probability of observing this behavior from your CPU is very low---not low enough that you can ignore it, but you won't be able to reproduce this exact bug in any reasonable time. (But we'll see how to catch our CPU cores red-handed in the following sections!)</p>
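<p>For readers who want to poke at this themselves, a program of the shape described above might look like the sketch below. It is a reconstruction under assumptions, not the exact test that produced the traces: the function name <code>run</code> is made up, and a single run will almost always print an unsurprising result.</p>
<pre><code>(* Two plain references shared by two domains, with no synchronisation. *)
let run () =
  let left = ref 42 in
  let right = ref 0 in
  (* First domain: one write, then a read of the other reference. *)
  let d1 = Domain.spawn (fun () -&gt; right := 3; !left) in
  (* Second domain: two writes, then a read. *)
  let d2 = Domain.spawn (fun () -&gt; left := 1; right := 2; !right) in
  let seen_left = Domain.join d1 in
  let seen_right = Domain.join d2 in
  (seen_left, seen_right)

let () =
  let seen_left, seen_right = run () in
  Printf.printf "!left = %d, !right = %d\n" seen_left seen_right
</code></pre>
<p>Because both domains touch <code>left</code> and <code>right</code> without any synchronisation, this program has data races, which is what allows the surprising combination of reads.</p>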
<p>But ok, wait---come again. How is any of this nonsense a good memory model? For starters, the values you can read "out-of-order" are still real values that have been assigned to the references, not imaginary ones. Yes, it could be even weirder, but you don't want to know. It's all fun and games with integers, but this property really matters for pointers (where following the wrong one would lead you down a segmentation fault). You need to be wary of this in other languages, but not in OCaml. Memory safety is preserved. It's not an instant <em>Game Over</em> to do an accidental out-of-order read.</p>
<p>Secondly, when reading and writing to shared memory, you should use the new <code>Atomic</code> module to ensure the proper memory ordering of operations. This will introduce the required memory barriers to bring back sanity---at a small performance cost---so it's opt-in and only required for shared memory! (Note: you can also use a <code>Mutex</code> lock to protect your read/write into shared memory.)</p>
<p>In technical words, OCaml 5 programs enjoy the <em>"Sequential Consistency for Data Race Freedom (DRF-SC)"</em> property. If your program has no data races, then you can reason about your code under sequential consistency where the operations from different threads are interleaved with each other, but the instructions don't seem to be executed out of order.</p>
<p><a href="https://kcsrk.info/webman/manual/memorymodel.html">Read more about the memory model in the OCaml 5 manual</a></p>
<p>By using <code>Atomic</code>, we are back in the wonderful land where operations happen in the expected order! The memory model becomes a tool for your brain, enabling it to reason about your algorithms. This one is so intuitive that I can once again pretend that it doesn't exist (without getting hit by an unexpected bug later.)</p>
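<p>Here is a minimal sketch of that idea, using only the standard library of OCaml 5. The names <code>run</code>, <code>data</code>, and <code>ready</code> are illustrative. One domain writes a plain reference and then sets an atomic flag; the other waits for the flag before reading. Since the read of <code>data</code> happens only after the atomic flag has been observed, there is no data race and the reader is guaranteed to see the written value:</p>
<pre><code>let run () =
  let data = ref 0 in                (* plain, non-atomic reference *)
  let ready = Atomic.make false in   (* atomic flag shared by both domains *)
  let consumer =
    Domain.spawn (fun () -&gt;
      (* Spin until the producer has published the flag. *)
      while not (Atomic.get ready) do Domain.cpu_relax () done;
      !data)
  in
  data := 42;               (* ordinary write ... *)
  Atomic.set ready true;    (* ... published by the atomic write *)
  Domain.join consumer

let () = Printf.printf "%d\n" (run ())  (* prints 42 *)
</code></pre>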
<h2>ThreadSanitizer</h2>
<p>Alright, so how do we check that our programs aren't susceptible to an "out-of-order" bug caused by a missing <code>Atomic</code> or <code>Mutex</code>? ThreadSanitizer was created by Google as a lightweight instrumentation to discover these runtime data races.</p>
<p>To enable it on your OCaml program, you’ll need a special compiler that adds the required instrumentation to your software. Don't worry, it’s super easy to set up thanks to opam switches!</p>
<p><a href="https://github.com/ocaml-multicore/ocaml-tsan">Installation and usage information in the <code>ocaml-tsan</code> repository</a></p>
<p>As a running example to demonstrate the usefulness of each tool, let's look at different implementations of simple banking accounts, where users can transfer money to each other <em>if</em> they have enough money in their account:</p>
<pre><code><span class="ocaml-keyword-other">module</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Bank</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">struct</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword-other">type</span><span class="ocaml-source"> </span><span class="ocaml-source">t</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-support-type">int</span><span class="ocaml-source"> </span><span class="ocaml-source">array</span><span class="ocaml-source">
</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">transfer</span><span class="ocaml-source"> </span><span class="ocaml-source">t</span><span class="ocaml-source"> </span><span class="ocaml-source">from_account</span><span class="ocaml-source"> </span><span class="ocaml-source">to_account</span><span class="ocaml-source"> </span><span class="ocaml-source">money</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-keyword-other">if</span><span class="ocaml-source"> </span><span class="ocaml-source">money</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">&gt;</span><span class="ocaml-source"> </span><span class="ocaml-constant-numeric-decimal-integer">0</span><span class="ocaml-source">                     </span><span class="ocaml-comment-block">(*</span><span class="ocaml-comment-block"> no negative transfer! </span><span class="ocaml-comment-block">*)</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-keyword-operator">&amp;&amp;</span><span class="ocaml-source"> </span><span class="ocaml-source">from_account</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">&lt;&gt;</span><span class="ocaml-source"> </span><span class="ocaml-source">to_account</span><span class="ocaml-source">    </span><span class="ocaml-comment-block">(*</span><span class="ocaml-comment-block"> or transfer to self! </span><span class="ocaml-comment-block">*)</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-keyword-operator">&amp;&amp;</span><span class="ocaml-source"> </span><span class="ocaml-source">t</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">(</span><span class="ocaml-source">from_account</span><span class="ocaml-source">)</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">&gt;=</span><span class="ocaml-source"> </span><span class="ocaml-source">money</span><span class="ocaml-source">     </span><span class="ocaml-comment-block">(*</span><span class="ocaml-comment-block"> and you must have enough money! </span><span class="ocaml-comment-block">*)</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-keyword-other">then</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">begin</span><span class="ocaml-source">
</span><span class="ocaml-source">      </span><span class="ocaml-source">t</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">(</span><span class="ocaml-source">from_account</span><span class="ocaml-source">)</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">&lt;-</span><span class="ocaml-source"> </span><span class="ocaml-source">t</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">(</span><span class="ocaml-source">from_account</span><span class="ocaml-source">)</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-</span><span class="ocaml-source"> </span><span class="ocaml-source">money</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source">
</span><span class="ocaml-source">      </span><span class="ocaml-source">t</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">(</span><span class="ocaml-source">to_account</span><span class="ocaml-source">)</span><span class="ocaml-source">   </span><span class="ocaml-keyword-operator">&lt;-</span><span class="ocaml-source"> </span><span class="ocaml-source">t</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">(</span><span class="ocaml-source">to_account</span><span class="ocaml-source">)</span><span class="ocaml-source">   </span><span class="ocaml-keyword-operator">+</span><span class="ocaml-source"> </span><span class="ocaml-source">money</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-keyword-other">end</span><span class="ocaml-source">
</span><span class="ocaml-keyword-other">end</span><span class="ocaml-source">
</span></code></pre>
<p>This module could be part of a much larger program that receives transaction requests from the network and handles them. For simplicity here, we'll only be running a small simulation, but ThreadSanitizer is intended to be used on large real programs with messy I/O and side effects, not just broken toys and unit tests.</p>
<pre><code><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">t</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Bank</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">t</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Array</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">make</span><span class="ocaml-source"> </span><span class="ocaml-constant-numeric-decimal-integer">8</span><span class="ocaml-source"> </span><span class="ocaml-constant-numeric-decimal-integer">100</span><span class="ocaml-source">  </span><span class="ocaml-comment-block">(*</span><span class="ocaml-comment-block"> 8 accounts with $100 each </span><span class="ocaml-comment-block">*)</span><span class="ocaml-source">
</span><span class="ocaml-source">
</span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">money_shuffle</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-unit">()</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-comment-block">(*</span><span class="ocaml-comment-block"> simulate an economy </span><span class="ocaml-comment-block">*)</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword">for</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">_</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-constant-numeric-decimal-integer">0</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">to</span><span class="ocaml-source"> </span><span class="ocaml-constant-numeric-decimal-integer">10</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">do</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-constant-language-capital-identifier">Unix</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">sleepf</span><span class="ocaml-source"> </span><span class="ocaml-constant-numeric-decimal-float">0.1</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source"> </span><span class="ocaml-comment-block">(*</span><span class="ocaml-comment-block"> wait for a network request </span><span class="ocaml-comment-block">*)</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-constant-language-capital-identifier">Bank</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">transfer</span><span class="ocaml-source"> </span><span class="ocaml-source">t</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-constant-language-capital-identifier">Random</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-support-type">int</span><span class="ocaml-source"> </span><span class="ocaml-constant-numeric-decimal-integer">8</span><span class="ocaml-source">)</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-constant-language-capital-identifier">Random</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-support-type">int</span><span class="ocaml-source"> </span><span class="ocaml-constant-numeric-decimal-integer">8</span><span class="ocaml-source">)</span><span class="ocaml-source"> </span><span class="ocaml-constant-numeric-decimal-integer">1</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source"> </span><span class="ocaml-comment-block">(*</span><span class="ocaml-comment-block"> transfer $1 </span><span class="ocaml-comment-block">*)</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword-other">done</span><span class="ocaml-source">
</span><span class="ocaml-source">
</span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">account_balances</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-unit">()</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-comment-block">(*</span><span class="ocaml-comment-block"> inspect the bank accounts </span><span class="ocaml-comment-block">*)</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword">for</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">_</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-constant-numeric-decimal-integer">0</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">to</span><span class="ocaml-source"> </span><span class="ocaml-constant-numeric-decimal-integer">10</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">do</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-constant-language-capital-identifier">Array</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">iter</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-constant-language-capital-identifier">Format</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">printf</span><span class="ocaml-source"> </span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-constant-character-printf">%i</span><span class="ocaml-string-quoted-double"> </span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-source">)</span><span class="ocaml-source"> </span><span class="ocaml-source">t</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Format</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">printf</span><span class="ocaml-source"> </span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-string-quoted-double">@.</span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-constant-language-capital-identifier">Unix</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">sleepf</span><span class="ocaml-source"> </span><span class="ocaml-constant-numeric-decimal-float">0.1</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword-other">done</span><span class="ocaml-source">
</span><span class="ocaml-source">
</span><span class="ocaml-keyword-other">let</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-unit">()</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-comment-block">(*</span><span class="ocaml-comment-block"> run the simulation and the debug view in parallel </span><span class="ocaml-comment-block">*)</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-source">[|</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Domain</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">spawn</span><span class="ocaml-source"> </span><span class="ocaml-source">money_shuffle</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Domain</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">spawn</span><span class="ocaml-source"> </span><span class="ocaml-source">account_balances</span><span class="ocaml-source"> </span><span class="ocaml-source">|]</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword-operator">|&gt;</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Array</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">iter</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Domain</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">join</span><span class="ocaml-source">
</span></code></pre>
<p>It should be pretty clear that our code is not thread-safe and that transferring money while printing the account balances is asking for trouble! Running it with ThreadSanitizer enabled will print warnings to the terminal as soon as a potential data race is observed (and it's even better in real life, as the output is colorful!):</p>
<pre><code>WARNING: ThreadSanitizer: data race (pid=1178477)
  Write of size 8 at 0x7fc4936fd6b0 by thread T4 (mutexes: write M87):
    #0 camlDune__exe__V0.transfer_317 &lt;null&gt; (v0.exe+0x6ae1a)
    #1 camlDune__exe__V0.money_shuffle_325 &lt;null&gt; (v0.exe+0x6af8d)
    .. ...

  Previous read of size 8 at 0x7fc4936fd6b0 by thread T1 (mutexes: write M83):
    #0 camlStdlib__Array.iter_329 &lt;null&gt; (v0.exe+0x9c675)
    #1 camlDune__exe__V0.account_balances_563 &lt;null&gt; (v0.exe+0x6b054)
    .. ...
</code></pre>
<p>The issue is reported very clearly thanks to the two conflicting stack traces. There's a read/write data race between the domain running <code>money_shuffle</code> and the one running <code>account_balances</code>, which could result in unreasonable memory reordering artifacts. In fact, it would be even worse if we were to <code>transfer</code> money from multiple domains in parallel (which we'll attempt to do in the next section as an interesting way of speeding up our bank transactions with Multicore).</p>
<p>It looks like we can fix the read/write data-race by adding a <code>Mutex</code> lock around the <code>transfer</code> function <em>write</em> operations:</p>
<pre><code><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">lock</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Mutex</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">create</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-unit">()</span><span class="ocaml-source">
</span><span class="ocaml-source">
</span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">transfer</span><span class="ocaml-source"> </span><span class="ocaml-source">t</span><span class="ocaml-source"> </span><span class="ocaml-source">from_account</span><span class="ocaml-source"> </span><span class="ocaml-source">to_account</span><span class="ocaml-source"> </span><span class="ocaml-source">money</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-constant-language-capital-identifier">Mutex</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">lock</span><span class="ocaml-source"> </span><span class="ocaml-source">lock</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-comment-block">(*</span><span class="ocaml-comment-block"> ... same as before ... </span><span class="ocaml-comment-block">*)</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-constant-language-capital-identifier">Mutex</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">unlock</span><span class="ocaml-source"> </span><span class="ocaml-source">lock</span><span class="ocaml-source">
</span></code></pre>
<p>But ThreadSanitizer is not easily fooled and will still complain loudly. We also need to use the same <code>Mutex</code> to protect the array reads in the <code>account_balances</code> function. Without that synchronisation, it would be perfectly valid to optimise the shared memory reads away entirely, for example by reading the balances once before the loop:</p>
<pre><code><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">account_balances_optimized</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-unit">()</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-comment-block">(*</span><span class="ocaml-comment-block"> faster... but wrong-er! </span><span class="ocaml-comment-block">*)</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">str</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">String</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">concat</span><span class="ocaml-source"> </span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-string-quoted-double"> </span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">@@</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Array</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">to_list</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">@@</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Array</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">map</span><span class="ocaml-source"> </span><span class="ocaml-source">string_of_int</span><span class="ocaml-source"> </span><span class="ocaml-source">t</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">in</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword">for</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">_</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-constant-numeric-decimal-integer">0</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">to</span><span class="ocaml-source"> </span><span class="ocaml-constant-numeric-decimal-integer">10</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">do</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-constant-language-capital-identifier">Format</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">printf</span><span class="ocaml-source"> </span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-constant-character-printf">%s</span><span class="ocaml-string-quoted-double"> @.</span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-source"> </span><span class="ocaml-source">str</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-constant-language-capital-identifier">Unix</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">sleepf</span><span class="ocaml-source"> </span><span class="ocaml-constant-numeric-decimal-float">0.1</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword-other">done</span><span class="ocaml-source">
</span></code></pre>
<p>The data races reported by ThreadSanitizer are not only the ones where an absurd “out-of-order” result actually happened: it also flags, preemptively, the conflicting accesses it observes that could trigger such a problem. If you are porting an existing multi-threaded application to OCaml 5, this compiler variant should probably be your default debug build.</p>
<p>Note that the ThreadSanitizer instrumentation does add a performance cost and doesn't increase memory safety by itself. Run it for a bit, track down your shared memory misuses, and add the required <code>Atomic</code> and <code>Mutex</code> operations.</p>
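<p>As a rough sketch of where this leaves the bank example, and assuming OCaml 5.1 or later for <code>Mutex.protect</code>, both the writes in <code>transfer</code> and the reads of the balances can go through the same lock. The body of <code>transfer</code> is reconstructed from the fragments shown in this post, the <code>snapshot</code> helper is made up for illustration, and the <code>Unix.sleepf</code> calls are dropped to keep the example dependency-free:</p>
<pre><code>(* Sketch only: one global lock guards every access to the shared array. *)
let lock = Mutex.create ()

let t = Array.make 8 100 (* 8 accounts with $100 each *)

let transfer t from_account to_account money =
  (* Mutex.protect releases the lock even if the body raises. *)
  Mutex.protect lock (fun () -&gt;
    if money &gt; 0
    &amp;&amp; from_account &lt;&gt; to_account
    &amp;&amp; t.(from_account) &gt;= money
    then begin
      t.(from_account) &lt;- t.(from_account) - money ;
      t.(to_account)   &lt;- t.(to_account)   + money ;
    end)

(* Hypothetical helper: copy the balances while holding the lock,
   so that printing happens on a private copy. *)
let snapshot t = Mutex.protect lock (fun () -&gt; Array.copy t)

let money_shuffle () = (* simulate an economy *)
  for _ = 0 to 10 do
    transfer t (Random.int 8) (Random.int 8) 1
  done

let account_balances () = (* inspect the bank accounts *)
  for _ = 0 to 10 do
    Array.iter (Format.printf "%i ") (snapshot t) ;
    Format.printf "@."
  done

let () =
  [| Domain.spawn money_shuffle ; Domain.spawn account_balances |]
  |&gt; Array.iter Domain.join
</code></pre>
<p>Every access to <code>t</code> now happens while holding <code>lock</code>, so there should be no unsynchronised read left for ThreadSanitizer to pair with a write. Copying under the lock keeps the critical section short; printing while holding the lock would also work, at the cost of making transfers wait for the terminal.</p>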
<h2>Multicore Tests: <code>Lin</code> (and <code>STM</code>)</h2>
<p>How do we unit test our Multicore libraries? It's business as usual, and the standard Alcotest will do well, for example. But there are some new properties that we should look for when writing and using libraries in a multicore setting. Let's revisit the bank accounts implementation by using <code>Atomic</code> operations this time rather than a <code>Mutex</code>:</p>
<pre><code><span class="ocaml-keyword-other">module</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Bank</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">struct</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword-other">type</span><span class="ocaml-source"> </span><span class="ocaml-source">t</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-support-type">int</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Atomic</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">t</span><span class="ocaml-source"> </span><span class="ocaml-source">array</span><span class="ocaml-source">
</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">get</span><span class="ocaml-source"> </span><span class="ocaml-source">t</span><span class="ocaml-source"> </span><span class="ocaml-source">client</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Atomic</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">get</span><span class="ocaml-source"> </span><span class="ocaml-source">t</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">(</span><span class="ocaml-source">client</span><span class="ocaml-source">)</span><span class="ocaml-source">
</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">transfer</span><span class="ocaml-source"> </span><span class="ocaml-source">t</span><span class="ocaml-source"> </span><span class="ocaml-source">from_account</span><span class="ocaml-source"> </span><span class="ocaml-source">to_account</span><span class="ocaml-source"> </span><span class="ocaml-source">money</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-keyword-other">if</span><span class="ocaml-source"> </span><span class="ocaml-source">money</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">&gt;</span><span class="ocaml-source"> </span><span class="ocaml-constant-numeric-decimal-integer">0</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-keyword-operator">&amp;&amp;</span><span class="ocaml-source"> </span><span class="ocaml-source">from_account</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">&lt;&gt;</span><span class="ocaml-source"> </span><span class="ocaml-source">to_account</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-keyword-operator">&amp;&amp;</span><span class="ocaml-source"> </span><span class="ocaml-source">get</span><span class="ocaml-source"> </span><span class="ocaml-source">t</span><span class="ocaml-source"> </span><span class="ocaml-source">from_account</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">&gt;=</span><span class="ocaml-source"> </span><span class="ocaml-source">money</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-keyword-other">then</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">begin</span><span class="ocaml-source">
</span><span class="ocaml-source">      </span><span class="ocaml-comment-block">(*</span><span class="ocaml-comment-block"> [fetch_and_add x v] atomically does [x := !x + v] and returns the previous value </span><span class="ocaml-comment-block">*)</span><span class="ocaml-source">
</span><span class="ocaml-source">      </span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">_</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source"> </span><span class="ocaml-support-type">int</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Atomic</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">fetch_and_add</span><span class="ocaml-source"> </span><span class="ocaml-source">t</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">(</span><span class="ocaml-source">from_account</span><span class="ocaml-source">)</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-keyword-operator">-</span><span class="ocaml-source"> </span><span class="ocaml-source">money</span><span class="ocaml-source">)</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">in</span><span class="ocaml-source">
</span><span class="ocaml-source">      </span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">_</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source"> </span><span class="ocaml-support-type">int</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Atomic</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">fetch_and_add</span><span class="ocaml-source"> </span><span class="ocaml-source">t</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">(</span><span class="ocaml-source">to_account</span><span class="ocaml-source">)</span><span class="ocaml-source"> </span><span class="ocaml-source">money</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">in</span><span class="ocaml-source">
</span><span class="ocaml-source">      </span><span class="ocaml-constant-language-unit">()</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-keyword-other">end</span><span class="ocaml-source">
</span><span class="ocaml-keyword-other">end</span><span class="ocaml-source">
</span></code></pre>
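<p>As a small aside, here is a sketch of what a single <code>Atomic.fetch_and_add</code> call guarantees on its own. The names <code>counter</code> and <code>bump</code>, as well as the iteration count, are invented for this illustration: two domains increment the same atomic cell, and no increment is lost.</p>
<pre><code>let counter = Atomic.make 0

(* [Atomic.fetch_and_add] adds to the cell and returns the value it held before. *)
let bump () =
  for _ = 1 to 10_000 do
    let _ : int = Atomic.fetch_and_add counter 1 in
    ()
  done

let () =
  [| Domain.spawn bump ; Domain.spawn bump |]
  |&gt; Array.iter Domain.join ;
  (* Both domains have been joined, so every increment is visible here. *)
  assert (Atomic.get counter = 20_000)
</code></pre>
<p>Each call is indivisible, which is a statement about one operation on one cell at a time.</p>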
<p>See how careful I was to use <code>Atomic</code> to read and write from shared memory? (I even used a fancy <code>fetch_and_add</code>!) Therefore, it must be correct if used by different domains, right? While this program doesn't have a data race, the definition of "correctness" is more subtle in a multicore setting.</p>
<p>It's easier to explain if I show you the problem. To test this interface with the <code>Lin</code> library, we only need to describe how to <code>init</code>ialise a new bank and the API signature of available functions:</p>
<pre><code><span class="ocaml-keyword-other">open</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Lin</span><span class="ocaml-source">
</span><span class="ocaml-source">
</span><span class="ocaml-keyword-other">module</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Bank_test</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">struct</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword-other">type</span><span class="ocaml-source"> </span><span class="ocaml-source">t</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Bank</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">t</span><span class="ocaml-source">
</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">init</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-unit">()</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-comment-block">(*</span><span class="ocaml-comment-block"> 8 accounts with $100 each </span><span class="ocaml-comment-block">*)</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-constant-language-capital-identifier">Array</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">init</span><span class="ocaml-source"> </span><span class="ocaml-constant-numeric-decimal-integer">8</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-keyword-other">fun</span><span class="ocaml-source"> </span><span class="ocaml-constant-language">_</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Atomic</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">make</span><span class="ocaml-source"> </span><span class="ocaml-constant-numeric-decimal-integer">100</span><span class="ocaml-source">)</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">cleanup</span><span class="ocaml-source"> </span><span class="ocaml-constant-language">_</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-unit">()</span><span class="ocaml-source">
</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">account</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-source">int_bound</span><span class="ocaml-source"> </span><span class="ocaml-constant-numeric-decimal-integer">7</span><span class="ocaml-source"> </span><span class="ocaml-comment-block">(*</span><span class="ocaml-comment-block"> array index between 0..7 </span><span class="ocaml-comment-block">*)</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">api</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-source">[</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-source">val_</span><span class="ocaml-source"> </span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-string-quoted-double">get</span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Bank</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">get</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-source">t</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">@-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-source">account</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">@-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-source">returning</span><span class="ocaml-source"> </span><span class="ocaml-support-type">int</span><span class="ocaml-source">)</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-source">val_</span><span class="ocaml-source"> </span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-string-quoted-double">transfer</span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Bank</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">transfer</span><span class="ocaml-source">
</span><span class="ocaml-source">      </span><span class="ocaml-source">(</span><span class="ocaml-source">t</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">@-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-source">account</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">@-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-source">account</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">@-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-source">nat_small</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">@-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-source">returning</span><span class="ocaml-source"> </span><span class="ocaml-support-type">unit</span><span class="ocaml-source">)</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-source">]</span><span class="ocaml-source">
</span><span class="ocaml-keyword-other">end</span><span class="ocaml-source">
</span><span class="ocaml-source">
</span><span class="ocaml-keyword-other">module</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Run</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Lin_domain</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-constant-language-capital-identifier">Make</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-constant-language-capital-identifier">Bank_test</span><span class="ocaml-source">)</span><span class="ocaml-source">
</span><span class="ocaml-source">
</span><span class="ocaml-keyword-other">let</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-unit">()</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-constant-language-capital-identifier">QCheck_base_runner</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">run_tests_main</span><span class="ocaml-source"> </span><span class="ocaml-source">[</span><span class="ocaml-constant-language-capital-identifier">Run</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">lin_test</span><span class="ocaml-source"> ~</span><span class="ocaml-source">count</span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-constant-numeric-decimal-integer">1000</span><span class="ocaml-source"> ~</span><span class="ocaml-source">name</span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-string-quoted-double">Bank</span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-source">]</span><span class="ocaml-source">
</span></code></pre>
<p>That's all! A small DSL to define the types of our functions and we are done. Run this test to admire the beautiful ASCII art... craAaAash:</p>
<pre><code>  Results incompatible with sequential execution

                                   |
                                   |
                .------------------------------------.
                |                                    |
     transfer t 4 0 8  : ()               transfer t 4 0 93  : ()
          get t 4  : -1
</code></pre>
<p><code>Lin</code> found a bug! The account number <code>4</code> is trying to simultaneously transfer $8 and $93 to account number <code>0</code>. It then naturally ends up with a negative $1 on its account, which is obviously bad for a banking system... but we never told <code>Lin</code> that this was illegal, so why is it complaining?</p>
<p>In the tradition of QuickCheck, <code>Lin</code> not only generates random arguments to test our API, but it also creates full programs to execute on two domains. It then runs these generated programs and checks if the intermediate results are "sequentially consistent," the property of a well-behaved API where we can always explain its multicore behavior as a linear execution of the calls on a single core.</p>
<p>Without this "sequential consistency" property, the internals of our functions leak when multiple cores interleave their execution. In the example above, it wouldn't be possible to reach a negative $1 account on a single core, so <code>Lin</code> reports that sequential consistency is broken. It doesn't know that negative accounts are illegal, but it knows that this state is unreachable without multicore shenanigans.</p>
<p><img sizes="(min-width: 1360px) 1360px, (min-width: 680px) 680px, 100vw" srcset="/blog/images/nonseq-170w~rdKyDsX5Euj7-WEoAuQn0A.webp 170w, /blog/images/nonseq-340w~TFwXpNggalROG6gGei_viQ.webp 340w, /blog/images/nonseq-680w~89mhSqO1UsF1nc7eWxuyGA.webp 680w, /blog/images/nonseq-1360w~Eg50_aeJlwvYqQwerurVSQ.webp 1360w" src="/blog/images/nonseq-1360w~Eg50_aeJlwvYqQwerurVSQ.webp" alt="Diagram showing the interleaved execution of two concurrent transactions"></p>
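<p>To make this concrete, here is a minimal single-core sketch that replays the bad interleaving by hand. It assumes, purely for illustration, that a transfer first checks the balance and only later withdraws. The names <code>check</code> and <code>withdraw</code> are hypothetical, and so is the starting balance of $100 on each of the eight accounts.</p>
<pre><code>(* Assumption: eight accounts, each starting with $100. *)
let balances = Array.make 8 100

(* Hypothetical first half of a transfer: is there enough money? *)
let check from_account money = balances.(from_account) &gt;= money

(* Hypothetical second half: move the money without checking again. *)
let withdraw from_account to_account money =
  balances.(from_account) &lt;- balances.(from_account) - money;
  balances.(to_account) &lt;- balances.(to_account) + money

let () =
  (* Both "domains" run their check before either one withdraws. *)
  let ok_a = check 4 8 in
  let ok_b = check 4 93 in
  if ok_a then withdraw 4 0 8;
  if ok_b then withdraw 4 0 93;
  Printf.printf "account 4: %d\n" balances.(4)
(* prints "account 4: -1" *)
</code></pre>
<p>Running either transfer to completion before starting the other leaves account 4 with $92 or $7, never a negative balance. No sequential order explains the negative $1, which is why the parallel run is rejected.</p>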
<p>Even though this is not a data race, a user of our library would have an equally hard time understanding the outcomes of our functions, when they depend so much on their accidental interleaving. <em>"It sometimes doesn't work"</em> is not a bug report I wish to see!</p>
<p>An intuitive way of thinking about "sequential consistency" is that our functions should behave as if they were a single atomic operation: either we see none of their side effects, or we see all of them. It shouldn't be possible to observe an in-between state, as this would result in a non-sequentialisable execution.</p>
<p>Once again, the easiest solution here is to use a <code>Mutex</code> to lock all the accounts during a transfer and when reading an account balance. Run the test suite again with <code>Lin</code>, and yep, we are safe. The operations are now sequentially consistent! (But we don't need the Atomics anymore with the <code>Mutex</code>.)</p>
<p><img sizes="(min-width: 1360px) 1360px, (min-width: 680px) 680px, 100vw" srcset="/blog/images/seqmutex-170w~Syp5q63tj1u_0I2n7GlM_w.webp 170w, /blog/images/seqmutex-340w~_STpApqeRzrx6V4jngu_IQ.webp 340w, /blog/images/seqmutex-680w~rzV_YUKYSoQrkOzch9gmwA.webp 680w, /blog/images/seqmutex-1360w~BOFIraZrZpooe0MX30Gdqg.webp 1360w" src="/blog/images/seqmutex-1360w~BOFIraZrZpooe0MX30Gdqg.webp" alt="Diagram showing two possible interleavings, both sequentially consistent"></p>
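<p>As a rough sketch of what such a fix could look like, assuming the bank is a plain <code>int array</code> guarded by one global lock: the record layout, the <code>with_lock</code> helper, the eight accounts, and the $100 starting balance are all assumptions made for this example, not the code behind the test above.</p>
<pre><code>module Bank = struct
  (* Assumption: one global lock protecting plain integer balances. *)
  type t = { lock : Mutex.t; balances : int array }

  let init () = { lock = Mutex.create (); balances = Array.make 8 100 }

  (* Hypothetical helper: run [f] with the lock held, releasing it
     even if [f] raises. *)
  let with_lock t f =
    Mutex.lock t.lock;
    Fun.protect ~finally:(fun () -&gt; Mutex.unlock t.lock) f

  let get t account = with_lock t (fun () -&gt; t.balances.(account))

  (* The check and both updates happen while holding the lock, so no
     other domain can observe or interrupt a half-done transfer. *)
  let transfer t from_account to_account money =
    with_lock t (fun () -&gt;
      if money &gt; 0
      &amp;&amp; from_account &lt;&gt; to_account
      &amp;&amp; t.balances.(from_account) &gt;= money
      then begin
        t.balances.(from_account) &lt;- t.balances.(from_account) - money;
        t.balances.(to_account) &lt;- t.balances.(to_account) + money
      end)
end

let () =
  let bank = Bank.init () in
  let other = Domain.spawn (fun () -&gt; Bank.transfer bank 4 0 8) in
  Bank.transfer bank 4 0 93;
  Domain.join other;
  (* Whichever transfer wins the lock first, the balance stays valid. *)
  assert (Bank.get bank 4 &gt;= 0)
</code></pre>
<p>Since every access goes through the lock, plain mutable integers are enough here. The price is that unrelated transfers now wait on each other.</p>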
<p>I really like how low effort / high reward <code>Lin</code> is. In just a few lines of declarative code, we can check that our code is correct when running on multiple cores. It's very extensive in its testing, which is just what we need when bugs are this hard to reproduce. The Multicore testing suite also provides a state-machine interface <code>STM</code>, which allows you to specify more properties that your system should respect (not only sequential consistency, but also custom business logic!).</p>
<p><a href="https://github.com/ocaml-multicore/multicoretests">More examples on the <code>multicoretests</code> repository</a></p>
<p>Fun fact: The earlier "out-of-order" memory read/write on mutable references was also generated by <code>Lin</code>. While this tool is not specialised like ThreadSanitizer for discovering data races, it can still trigger and identify hardware memory reordering, since it produces outcomes that can't be explained on a single core. Here's the complete test if you want to see your computer memory <del>misbehaving</del> optimising:</p>
<pre><code><span class="ocaml-keyword-other">open</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Lin</span><span class="ocaml-source">
</span><span class="ocaml-source">
</span><span class="ocaml-keyword-other">module</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Int_array</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">struct</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword-other">type</span><span class="ocaml-source"> </span><span class="ocaml-source">t</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-support-type">int</span><span class="ocaml-source"> </span><span class="ocaml-source">array</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">init</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-unit">()</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-source">[|</span><span class="ocaml-source"> </span><span class="ocaml-constant-numeric-decimal-integer">0</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source"> </span><span class="ocaml-constant-numeric-decimal-integer">0</span><span class="ocaml-source"> </span><span class="ocaml-source">|]</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">cleanup</span><span class="ocaml-source"> </span><span class="ocaml-constant-language">_</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-unit">()</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">index</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-source">int_bound</span><span class="ocaml-source"> </span><span class="ocaml-constant-numeric-decimal-integer">1</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">api</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-source">[</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-source">val_</span><span class="ocaml-source"> </span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-string-quoted-double">get</span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Array</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">get</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-source">t</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">@-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-source">index</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">@-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-source">returning</span><span class="ocaml-source"> </span><span class="ocaml-support-type">int</span><span class="ocaml-source">)</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-source">val_</span><span class="ocaml-source"> </span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-string-quoted-double">set</span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Array</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">set</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-source">t</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">@-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-source">index</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">@-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-support-type">int</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">@-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-source">returning</span><span class="ocaml-source"> </span><span class="ocaml-support-type">unit</span><span class="ocaml-source">)</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-source">]</span><span class="ocaml-source">
</span><span class="ocaml-keyword-other">end</span><span class="ocaml-source">
</span><span class="ocaml-source">
</span><span class="ocaml-keyword-other">module</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Run</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Lin_domain</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-constant-language-capital-identifier">Make</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-constant-language-capital-identifier">Int_array</span><span class="ocaml-source">)</span><span class="ocaml-source">
</span><span class="ocaml-keyword-other">let</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-unit">()</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">QCheck_base_runner</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">run_tests_main</span><span class="ocaml-source"> </span><span class="ocaml-source">[</span><span class="ocaml-constant-language-capital-identifier">Run</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">lin_test</span><span class="ocaml-source"> ~</span><span class="ocaml-source">count</span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-constant-numeric-decimal-integer">10_000</span><span class="ocaml-source"> ~</span><span class="ocaml-source">name</span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-string-quoted-double">Array</span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-source">]</span><span class="ocaml-source">
</span></code></pre>
<h2>Dscheck</h2>
<p>While adding a <code>Mutex</code> restores the sequential consistency of the <code>transfer</code> function, it's unsatisfying to slow down all transactions with a global lock. Most transfers are going to happen on different accounts, so could we be more precise in our safety measures? This is far from easy with locks, if we don't want to end up bankrupt with <a href="https://en.wikipedia.org/wiki/Dining_philosophers_problem">hungry philosophers</a>!</p>
<p>One alternative solution is called lock-free programming, and as the name implies, it gets rid of all the locks, but at the cost of more complex algorithms. By using only <code>Atomic</code> operations, there are ways of encoding our <code>transfer</code> operation without blocking the other cores (such that they don't get stuck waiting for unrelated threads to finish their transactions).</p>
<p>Lock-free algorithms have a bad reputation of being crazy hard to implement correctly. It's very easy to convince yourself that you found the right solution, only to discover that your algorithm only works when the OS scheduler is on your side (which it generally is...until it isn't). This is another type of hard-to-reproduce bug. We can't coerce the OS scheduler to be evil when testing our software.</p>
<p>Our last testing tool is the library <code>dscheck</code>. It provides a way to <em>exhaustively</em> test all the possible schedulings of a Multicore execution in order to discover the worst-case scenario that would lead to a crash. It does so by simulating parallelism on a single core using concurrency, thanks to algebraic effects and a custom scheduler. Dscheck is very fast because it doesn't test <em>all</em> possible interleavings but only the ones that matter.</p>
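<p>To get a feel for what exploring schedulings means, here is a deliberately naive sketch in plain OCaml. It is not how Dscheck works internally: it simply enumerates every interleaving of two hand-split transfers, whereas Dscheck traces real <code>Atomic</code> operations and skips schedules that don't matter. The names <code>interleavings</code>, <code>transfer_steps</code> and <code>run_schedule</code> are hypothetical, as is the two-step split of the transfer.</p>
<pre><code>(* Every way of merging two lists while preserving each list's own order. *)
let rec interleavings xs ys =
  match xs, ys with
  | [], rest | rest, [] -&gt; [ rest ]
  | x :: xs', y :: ys' -&gt;
      List.map (fun tail -&gt; x :: tail) (interleavings xs' ys)
      @ List.map (fun tail -&gt; y :: tail) (interleavings xs ys')

(* Assumption: a naive transfer is two separate steps, a balance check
   followed later by an unchecked withdrawal. *)
let transfer_steps balances from_account to_account money =
  let allowed = ref false in
  [ (fun () -&gt; allowed := balances.(from_account) &gt;= money);
    (fun () -&gt;
      if !allowed then begin
        balances.(from_account) &lt;- balances.(from_account) - money;
        balances.(to_account) &lt;- balances.(to_account) + money
      end) ]

(* Replay one schedule (a list of thread ids) on a fresh two-account bank
   and return the final balance of account 0. *)
let run_schedule schedule =
  let balances = Array.make 2 100 in
  let threads =
    [| ref (transfer_steps balances 0 1 8);
       ref (transfer_steps balances 0 1 93) |] in
  List.iter
    (fun id -&gt;
      match !(threads.(id)) with
      | step :: rest -&gt; step (); threads.(id) := rest
      | [] -&gt; ())
    schedule;
  balances.(0)

(* Two threads with two steps each. *)
let schedules = interleavings [ 0; 0 ] [ 1; 1 ]
let bad = List.filter (fun s -&gt; run_schedule s &lt; 0) schedules

let () =
  Printf.printf "%d schedules, %d leave account 0 negative\n"
    (List.length schedules) (List.length bad)
(* prints "6 schedules, 4 leave account 0 negative" *)
</code></pre>
<p>Enumerating everything like this blows up quickly as threads and steps are added, which is why Dscheck's more selective exploration matters.</p>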
<p>In order to use it, you only need to replace the <code>Atomic</code> module with a custom one, and then write your unit test. Here I simply copy-pasted the bug generated by <code>Lin</code> earlier:</p>
<pre><code><span class="ocaml-keyword-other">module</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Atomic</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Dscheck</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-constant-language-capital-identifier">TracedAtomic</span><span class="ocaml-source">
</span><span class="ocaml-keyword-other">module</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Test</span><span class="ocaml-source">   </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Dscheck</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-constant-language-capital-identifier">TracedAtomic</span><span class="ocaml-source">
</span><span class="ocaml-source">
</span><span class="ocaml-keyword-other">module</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Bank</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">struct</span><span class="ocaml-source"> </span><span class="ocaml-comment-block">(*</span><span class="ocaml-comment-block"> same as before, but now using the traced Atomic </span><span class="ocaml-comment-block">*)</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">end</span><span class="ocaml-source">
</span><span class="ocaml-source">
</span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">test</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-unit">()</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">t</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Array</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">init</span><span class="ocaml-source"> </span><span class="ocaml-constant-numeric-decimal-integer">2</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-keyword-other">fun</span><span class="ocaml-source"> </span><span class="ocaml-constant-language">_</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Atomic</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">make</span><span class="ocaml-source"> </span><span class="ocaml-constant-numeric-decimal-integer">100</span><span class="ocaml-source">)</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">in</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-constant-language-capital-identifier">Test</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">spawn</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-keyword-other">fun</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-unit">()</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Bank</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">transfer</span><span class="ocaml-source"> </span><span class="ocaml-source">t</span><span class="ocaml-source"> </span><span class="ocaml-constant-numeric-decimal-integer">0</span><span class="ocaml-source"> </span><span class="ocaml-constant-numeric-decimal-integer">1</span><span class="ocaml-source"> </span><span class="ocaml-constant-numeric-decimal-integer">8</span><span class="ocaml-source">)</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source"> </span><span class="ocaml-comment-block">(*</span><span class="ocaml-comment-block"> fake a Domain.spawn </span><span class="ocaml-comment-block">*)</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-constant-language-capital-identifier">Bank</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">transfer</span><span class="ocaml-source"> </span><span class="ocaml-source">t</span><span class="ocaml-source"> </span><span class="ocaml-constant-numeric-decimal-integer">0</span><span class="ocaml-source"> </span><span class="ocaml-constant-numeric-decimal-integer">1</span><span class="ocaml-source"> </span><span class="ocaml-constant-numeric-decimal-integer">93</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword-other">assert</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-constant-language-capital-identifier">Bank</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">get</span><span class="ocaml-source"> </span><span class="ocaml-source">t</span><span class="ocaml-source"> </span><span class="ocaml-constant-numeric-decimal-integer">0</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">&gt;=</span><span class="ocaml-source"> </span><span class="ocaml-constant-numeric-decimal-integer">0</span><span class="ocaml-source">)</span><span class="ocaml-source">
</span><span class="ocaml-source">
</span><span class="ocaml-keyword-other">let</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-unit">()</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Test</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">trace</span><span class="ocaml-source"> </span><span class="ocaml-source">test</span><span class="ocaml-source"> </span><span class="ocaml-comment-block">(*</span><span class="ocaml-comment-block"> exhaustively test all interleavings </span><span class="ocaml-comment-block">*)</span><span class="ocaml-source">
</span></code></pre>
<p>Dscheck will then run our <code>test</code> function multiple times, discovering all the interesting paths that the scheduler could lead us down, and finally output a visualisation describing the worst-case schedulings that lead to crashes:</p>
<p><img sizes="(min-width: 1360px) 1360px, (min-width: 680px) 680px, 100vw" srcset="/blog/images/dscheck-170w~DLyJYLTbQkRwdALnvx4cPw.webp 170w, /blog/images/dscheck-340w~UWs9b8Ff2QkpAKEXAGS6OQ.webp 340w, /blog/images/dscheck-680w~9ZOWn00fBCuk9rp74wizsA.webp 680w, /blog/images/dscheck-1360w~XWRNdSK7gNJ3vvFU-UAsCA.webp 1360w" src="/blog/images/dscheck-1360w~XWRNdSK7gNJ3vvFU-UAsCA.webp" alt="Diagram showing a complex DSCheck trace"></p>
<p>This is a low-level view of the bug. By inspecting the sequence of <code>Atomic</code> operations along the bad paths, we can discover the origin of the problem: the code is not careful when removing money from an account.</p>
<p>Perhaps this would work better:</p>
<pre><code><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-keyword">rec </span><span class="ocaml-entity-name-function-binding">transfer</span><span class="ocaml-source"> </span><span class="ocaml-source">t</span><span class="ocaml-source"> </span><span class="ocaml-source">from_account</span><span class="ocaml-source"> </span><span class="ocaml-source">to_account</span><span class="ocaml-source"> </span><span class="ocaml-source">money</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">money_from</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-source">get</span><span class="ocaml-source"> </span><span class="ocaml-source">t</span><span class="ocaml-source"> </span><span class="ocaml-source">from_account</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">in</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword-other">if</span><span class="ocaml-source"> </span><span class="ocaml-source">money</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">&gt;</span><span class="ocaml-source"> </span><span class="ocaml-constant-numeric-decimal-integer">0</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword-operator">&amp;&amp;</span><span class="ocaml-source"> </span><span class="ocaml-source">from_account</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">&lt;&gt;</span><span class="ocaml-source"> </span><span class="ocaml-source">to_account</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword-operator">&amp;&amp;</span><span class="ocaml-source"> </span><span class="ocaml-source">money_from</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">&gt;=</span><span class="ocaml-source"> </span><span class="ocaml-source">money</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword-other">then</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">begin</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-keyword-other">if</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Atomic</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">compare_and_set</span><span class="ocaml-source"> </span><span class="ocaml-source">t</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">(</span><span class="ocaml-source">from_account</span><span class="ocaml-source">)</span><span class="ocaml-source"> </span><span class="ocaml-source">money_from</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-source">money_from</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-</span><span class="ocaml-source"> </span><span class="ocaml-source">money</span><span class="ocaml-source">)</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-keyword-other">then</span><span class="ocaml-source"> </span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">_</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source"> </span><span class="ocaml-support-type">int</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Atomic</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">fetch_and_add</span><span class="ocaml-source"> </span><span class="ocaml-source">t</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">(</span><span class="ocaml-source">to_account</span><span class="ocaml-source">)</span><span class="ocaml-source"> </span><span class="ocaml-source">money</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">in</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-unit">()</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-keyword-other">else</span><span class="ocaml-source"> </span><span class="ocaml-source">transfer</span><span class="ocaml-source"> </span><span class="ocaml-source">t</span><span class="ocaml-source"> </span><span class="ocaml-source">from_account</span><span class="ocaml-source"> </span><span class="ocaml-source">to_account</span><span class="ocaml-source"> </span><span class="ocaml-source">money</span><span class="ocaml-source"> </span><span class="ocaml-comment-block">(*</span><span class="ocaml-comment-block"> retry </span><span class="ocaml-comment-block">*)</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword-other">end</span><span class="ocaml-source">
</span></code></pre>
<p>And yes, Dscheck now happily validates all possible interleavings of this unit test!</p>
<p><a href="https://github.com/ocaml-multicore/dscheck">More examples on the <code>dscheck</code> repository</a></p>
<p>So, does it mean our banking system works now? Nope! <code>Lin</code> reports new counterexamples that break sequential consistency. I told you that lock-free was hard! Still, we can keep iterating, and we will eventually get it right because our tools remove the doubt and the irreproducibility that would otherwise make the task insurmountable. It's like having tiny assistants to double-check our assumptions. I love it. I've never been so excited to test my software!</p>
<pre><code>                           get t 3  : 100
                           get t 4  : 100
                                   |
                .------------------------------------.
                |                                    |
                |                             transfer t 4 3 10  : ()
           get t 4  : 90
           get t 3  : 100
                     ^^^^^
                how? account 4 sent the money, but account 3 didn't receive it!
</code></pre>
<p>Would you have caught this bug? Can you fix it? ;)</p>
<p>This was a tiny example, and it already brought some surprises. The multicore testing tools <code>Lin</code>, <code>STM</code>, and <code>dscheck</code> have been applied to real data structures with great success. In fact, I wouldn't trust lock-free algorithms that were not validated by them.</p>
<h2>Conclusion</h2>
<p>It's 2022 and OCaml is finally Multicore. Even if this article is only scratching the surface of a specific itch, I hope it has convinced you that the Multicore metamorphosis wasn't only about lifting a global lock somewhere in the runtime. A lot of care and attention also went into creating a great environment to tackle really hard problems. Here we've only looked at:</p>
<ul>
<li>The memory model to be able to reason about our programs</li>
<li>ThreadSanitizer to detect dangerous use of shared memory</li>
<li><code>Lin</code> and <code>STM</code> to discover logical bugs in a multicore setting</li>
<li>Dscheck to validate unit tests of lock-free algorithms by exhaustively checking all possible interleavings of their Atomic operations</li>
</ul>
<p>There's still a lot more to discover in the latest release of OCaml. In the meantime, not only can we do Multicore, we can do it with confidence that our code works!</p>
]]></description><link>https://tarides.com/blog/2022-12-22-ocaml-5-multicore-testing-tools</link><guid isPermaLink="false">https://tarides.com/blog/2022-12-22-ocaml-5-multicore-testing-tools.html</guid><dc:creator><![CDATA[ Arthur Wendling ]]></dc:creator><pubDate>Thu, 22 Dec 2022 00:00:00 GMT</pubDate></item><item><title><![CDATA[Advanced Merlin Features: Destruct and Construct]]></title><description><![CDATA[<p>Merlin is one of the most important tools for OCaml users, but a lot of its
advanced features often remain unknown. For OCaml newcomers who might not know, Merlin is the server software that provides intelligence to code editors when working on OCaml documents. It allows one to easily navigate the code, get meaningful information (like type information), and perform code generation and refactoring tasks. Merlin's installation and usage are documented on its <a href="https://ocaml.github.io/merlin/">official webpage</a>.</p>
<p>Merlin is distributed with both an Emacs and a Vim plugin. It can also be used in VSCode via the OCaml LSP Server and the corresponding plugin.</p>
<p>In this post, we will focus on two complementary features of Merlin: the venerable <code>destruct</code> and the younger <code>construct</code>. Both of these leverage OCaml's precise type information to destruct or create expressions.</p>
<h2>Destruct</h2>
<p>Destruct (sometimes called case-analysis) uses the type of an identifier to
perform multiple tasks related to pattern-matching. It can be called with the
following key bindings:</p>
<ul>
<li>Emacs: <kbd>C-d</kbd> or <kbd>M-x merlin-destruct</kbd></li>
<li>Vim: <kbd>:MerlinDestruct</kbd></li>
<li>VSCode: <kbd>Alt-d</kbd> or <kbd>💡 Destruct</kbd></li>
</ul>
<p>Destruct's behavior changes slightly depending on the context around the cursor. We are going to describe how it behaves in the next three sections.</p>
<h3>Automatic Case Analysis</h3>
<p>The primary use case for Destruct is to generate a pattern-matching for a
given value. Let's consider the following snippet:</p>
<pre><code>let f (x : int option) = x
</code></pre>
<p>Calling <code>destruct</code> on the right-most occurrence of <code>x</code> will automatically generate the following pattern-matching with the two constructors of <code>x</code>'s option type:</p>
<pre><code>let f (x : int option) = match x with
  | None -&gt; _
  | Some _ -&gt; _
</code></pre>
<p>What happened is that Merlin looked at the type of <code>x</code> and generated a complete pattern-matching by enumerating its constructors.</p>
<p>Notice that Merlin used underscores on the right-hand sides of the matching. We call these underscores <em>typed holes</em>. These holes are rejected by the compiler, but Merlin will provide type information for them. These holes should not be confused with the wildcard pattern appearing on the left-hand side in <code>Some _</code>.</p>
<p>After calling <code>destruct</code>, the cursor should have jumped to the first hole. In
Emacs (resp. Vim), you can navigate between holes by using the commands <kbd>M-x merlin-next-hole</kbd> (resp. <kbd>:MerlinNextHole</kbd>) and <kbd>M-x merlin-previous-hole</kbd> (resp. <kbd>:MerlinPreviousHole</kbd>). In VSCode, you can use <kbd>Alt-y</kbd> to jump to the next typed hole.</p>
<h3>Complete a Matching</h3>
<p>Merlin can also add missing branches to an incomplete matching. Given
the following snippet:</p>
<pre><code>let f (x : int option) = match x with
  | None -&gt; _
</code></pre>
<p>Calling <code>destruct</code> with the cursor on <code>None</code> will make the pattern-matching
exhaustive:</p>
<pre><code>let f (x : int option) = match x with
  | None -&gt; _
  | Some _ -&gt; _
</code></pre>
<h3>Refine the Cases</h3>
<p>Finally, Merlin can be used to make a pattern-matching more precise when called on a <em>wildcard</em> pattern <code>_</code>. Given the following snippet:</p>
<pre><code>let f (x : int option option) = match x with
  | None -&gt; _
  | Some _ -&gt; _
</code></pre>
<p>Calling <code>destruct</code> with the cursor on the <code>_</code> pattern in <code>Some _</code> will refine the matching:</p>
<pre><code>let f (x : int option option) = match x with
  | None -&gt; _
  | Some (None) | Some (Some _) -&gt; _
</code></pre>
<p>Note that Destruct also works with other types, like records. Let's consider the following snippet:</p>
<pre><code>type t = { a : string option }
let f (x : t) = x
</code></pre>
<p>Calling <code>destruct</code> on the last occurrence of <code>x</code> will yield:</p>
<pre><code>let f (x : t) = match x with
  | { a } -&gt; _
</code></pre>
<p>And we can refine it by calling <code>destruct</code> again on <code>a</code>, etc.</p>
<pre><code>let f (x : t) = match x with
  | { a = None } | { a = Some _ } -&gt; _
</code></pre>
<p>That wraps up our presentation of <code>destruct</code>. Generating and completing pattern-matching
cases can be very useful when working with large sum types!</p>
<h2>Construct</h2>
<p>Construct can be considered as the dual of Destruct, as they work
complementarily. When called over a typed-hole <code>_</code>, Construct will suggest
values that can fill that hole. It can be called with the following key
bindings:</p>
<ul>
<li>Emacs: <kbd>M-x merlin-construct</kbd></li>
<li>Vim: <kbd>:MerlinConstruct</kbd></li>
<li>VSCode: <kbd>Alt-c</kbd> or <kbd>💡 Construct an expression</kbd> (the cursor must be right after the <code>_</code>)</li>
</ul>
<p>For example, given the following snippet:</p>
<pre><code>let x : int option = _
</code></pre>
<p>Calling <code>construct</code> with the cursor on the <code>_</code> typed hole will suggest the following constructions:</p>
<pre><code>Some _
None
</code></pre>
<p>Choosing the first one will replace the hole and place the cursor on the next hole:</p>
<pre><code>let x : int option = (Some _)
</code></pre>
<p>Calling <code>construct</code> again will suggest <code>0</code> and result in:</p>
<pre><code>let x : int option = (Some 0)
</code></pre>
<p>In the future, Construct might also suggest fitting values from the local
environment instead of relying solely on a type's constructors.</p>
<h2>Destruct and Construct</h2>
<p>As stated previously, calls to <code>destruct</code> and <code>construct</code> can be used
together. For example, after calling <code>destruct</code> on the last occurrence of <code>x</code> in the following code snippet:</p>
<pre><code>type t = { a : unit; b : string option }
let f (x : int option) : t option = x
</code></pre>
<p><code>x</code> is replaced by a matching on <code>x</code> with the cursor on the first hole:</p>
<pre><code>let f (x : int option) : t option =
    match x with
    | None -&gt; _
    | Some _ -&gt; _
</code></pre>
<p>One can immediately call <code>construct</code> and choose a construction for the first branch:</p>
<pre><code>let f (x : int option) : t option =
    match x with
    | None -&gt; None
    | Some _ -&gt; _
</code></pre>
<p>And again for the second branch:</p>
<pre><code>let f (x : int option) : t option =
    match x with
    | None -&gt; None
    | Some _ -&gt; Some _
</code></pre>
<p>Finally, like Destruct, Construct also works with records and most OCaml types:</p>
<pre><code>Some _ → Some { a = _; b = _ } → Some { a = (); b = None }
</code></pre>
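<p>As a sketch of where this sequence of <code>destruct</code> and <code>construct</code> calls can end up, here is the function with every typed hole filled. The values chosen for the record fields are simply the first suggestions shown above, not the only valid ones:</p>
<pre><code>type t = { a : unit; b : string option }

(* Every hole has been replaced by a value suggested by construct. *)
let f (x : int option) : t option =
  match x with
  | None -&gt; None
  | Some _ -&gt; Some { a = (); b = None }
</code></pre>
<p>Once no typed hole remains, the compiler accepts the definition.</p>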
<h2>Conclusion</h2>
<p>When put to good use, these complementary features can remove some of the burden of working with big variant types. We encourage you to try them and see if they help your everyday workflow! If you encounter any issues or have ideas for improvement, please communicate them to us <a href="https://github.com/ocaml/merlin/issues">via the issue tracker</a>.</p>
]]></description><link>https://tarides.com/blog/2022-12-21-advanced-merlin-features-destruct-and-construct</link><guid isPermaLink="false">https://tarides.com/blog/2022-12-21-advanced-merlin-features-destruct-and-construct.html</guid><dc:creator><![CDATA[ Ulysse Gérard ]]></dc:creator><pubDate>Wed, 21 Dec 2022 00:00:00 GMT</pubDate></item><item><title><![CDATA[How Nomadic Labs Used Multicore Processing to Create a Faster Blockchain]]></title><description><![CDATA[<p>The technology that makes blockchain possible is complex, cutting-edge, and fascinating. Balancing efficiency and security on a knife's edge, it finds perfect harmony between high transaction speeds and safe, predictable results. Blockchain technology is constantly evolving, pushing the boundaries of what’s possible, and driving innovation and research. Discover how Tarides helped Nomadic Labs use <a href="https://speakerdeck.com/kayceesrk/retrofitting-concurrency-lessons-from-the-engine-room">OCaml 5</a> to boost blockchain performance with rollup technology.</p>
<p>In order to remain competitive with other major players in the blockchain market (including Bitcoin and Ethereum), the open-source blockchain Tezos invests in cutting-edge research and engineering. Another way Tezos has chosen to differentiate itself in a crowded market is by focusing on sustainability and scalable efficiency. Tezos is proof-of-stake (rather than proof-of-work) and is therefore more <a href="https://tezos.com/carbon/">energy efficient</a> than other blockchain technologies, with the annual energy consumption estimated at 0.001TWh, or “17 global citizens.” Furthermore, the <a href="https://4c.cst.cam.ac.uk">Cambridge Centre for Carbon Credits (4C)</a> is using the Tezos blockchain to create a trusted decentralised marketplace for carbon credits. The sheer amount of data needed to support verifiable carbon credits requires planetary-scale computations, coupled with the digital permanence needed to track projects over decades and even generations.</p>
<p>A growing number of users and companies are bringing their projects to the Tezos blockchain. To respond to increasing demand, the network is preparing for more activity and high-throughput applications. To that end, Nomadic Labs is working on implementing leading-edge rollup technology for the Tezos blockchain. Rollups are a way of increasing throughput and blockchain speeds without compromising on decentralisation, latency, stability, security, or resistance to censorship.</p>
<p>To achieve their goal of offering a major scaling solution for Tezos, Nomadic Labs called in Tarides. By leveraging the multicore capabilities of OCaml 5, the team was able to achieve significant performance boosts, making it a viable solution for increasing both blockchain speeds and transactions per second (TPS).</p>
<p>Marco Stronati co-leads the <a href="https://research-development.nomadic-labs.com/files/cryptography.html">cryptography team</a> at Nomadic Labs with cryptographer Marc Beunardeau. Looking back on the decision to use an early version of OCaml 5, Marco said, “We knew OCaml 5 was not ready yet, but that backward compatibility was an explicit goal of the project, so we thought we would put that to the test.”</p>
<h2>Who Are Nomadic Labs?</h2>
<p><a href="https://www.nomadic-labs.com">Nomadic Labs</a> is one of the largest research and development centres within the open-source <a href="https://tezos.com">Tezos ecosystem</a>. They work on the core technologies that run Tezos's distributed network.</p>
<p>Nomadic Labs handles software releases and amendments to the Tezos blockchain, focusing on the innovation, development, and implementation of new features. They help companies and institutions use Tezos to their advantage. At its core, the company values dependability, balancing cutting-edge innovation with reliable and consistent results.</p>
<h2>What Is a Rollup and How Does it Boost the Speed of the Blockchain?</h2>
<p>Rollups settle transactions outside the network and then post data back into it. This is what’s called a <a href="https://research-development.nomadic-labs.com/tezos-is-scaling.html">Layer 2 solution</a>, which avoids the main chain. By handling the process externally, they reduce the strain on the network. Rollups help blockchains like Tezos keep transaction speeds and throughput high without compromising the integrity of the blockchain. They are one of the many scalability solutions a blockchain may employ to offer top-level performance. There are two main types of rollups: optimistic and zero-knowledge (zk).</p>
<p><a href="https://research-development.nomadic-labs.com/next-generation-rollups.html">Optimistic rollups</a> assume that the transaction data they’re processing is correct, and any fraud or other problem with the transaction is handled separately. Zk-rollups, on the other hand, use zero-knowledge proofs to validate a transaction. Zk-rollups provide better security and confidentiality than optimistic rollups, but they come with their own set of limitations.</p>
<h2>Pushing the Limits of Zero-Knowledge Proving Systems</h2>
<p>The challenge Nomadic Labs faced was related to their proving system. In general, proving systems have to be very asymmetrical, with a prover doing most of the work off-chain and a lean verifier working on-chain. Whilst these rollup systems are great for blockchains because of how quickly they can verify something, the proving stage is very CPU- and memory-intensive. This limits the scalability of Epoxy, their zk-rollup solution, since its throughput and latency are directly limited by the speed of the prover.
Most of the innovation in the field of zero-knowledge proving systems today is driven by lowering the complexity of the provers, thanks to novel cryptography but also to highly optimised implementations.</p>
<h2>OCaml 5 Multicore Saves the Day</h2>
<p>The team’s solution to the prover’s inefficiencies was to use the <a href="https://gitlab.com/nomadic-labs/cryptography/privacy-team/">aPlonK</a> proving system to power Epoxy. It enables the efficient aggregation of multiple proofs that can be executed in parallel. This is where OCaml 5 comes in! With its Multicore capabilities and strong safety features, it is the perfect candidate for speeding up the proving process.</p>
<p>With OCaml 5, the team could parallelise the proving process by utilising multiple cores on one machine. According to Marco, “OCaml 5 drastically improved performance with minimal effort.”  Increased performance of the zk-rollup translates to high TPS and throughput for customers using digital currencies. Striving to be the fastest blockchain means constantly looking at opportunities to improve performance, from changing algorithms to workflows.</p>
<p>Furthermore, speaking as a seasoned developer, Marco emphasises the importance of OCaml 5 being easy to install and set up.</p>
<h3>“The most important thing was not having to revolutionise what I do. People don’t want to waste a week on upgrading, and this was a seamless experience.”</h3>
<p>Installing OCaml 5 proved to be easy for the team, and in a matter of hours, they were running Multicore on their machines. For people who don’t need Multicore, the upgrade is completely backwards compatible, and their sequential code will still work normally. It’s due to the precise balance OCaml 5 strikes between backwards compatibility and cutting-edge upgrades that Marco thinks there is literally “No reason not to upgrade.”</p>
<p>When using OCaml 5 before its official launch, the team faced some small compatibility issues they needed help with. They checked the state of compatibility on the helpful <a href="https://check.ocamllabs.io/">health check</a> website, where more and more packages were ‘going green’ daily. Even before they could file a bug report, they would find that their problem had been resolved. Since setup was so smooth, the team needed very little help. Still, Marco commented that: “The moment we had a problem, we would get help immediately.”</p>
<h2>What Does the Future Hold?</h2>
<p>The team is excited to continue using OCaml 5 in the future, as soon as Tezos begins using it in production after the full release. At the moment, the team has to parallelise using several machines to speed up the prover’s performance. With OCaml 5, they will be able to exploit multiple cores on several computers at the same time.</p>
<h3>“We can easily double the speed of what we’re doing with OCaml 5.”</h3>
<h2>Conclusion</h2>
<p>After successful experimentation with a prerelease of OCaml 5, the team at Nomadic Labs discovered that it gives their zk-rollup a significant performance boost. For the Tezos blockchain, this boost can result in higher TPS and throughput for customers who use their digital currency. Combined with Tezos’s other benefits, such as its energy efficiency (thanks to its proof-of-stake consensus mechanism), OCaml 5 definitely gives it a leg up with fast zk-rollups.</p>
<p>The technologies that make blockchains a reality are undeniably a driving force behind great innovation. Zk-rollups are just one example of technologies that aim to make complex processes like verifiers and provers lightning fast. Performance is vital in almost every field, and the use cases for this type of technology are endless.</p>
<p>Tarides offers its extensive expertise in OCaml to help businesses achieve their targets. Find out more about how OCaml 5 can help you transform <a href="/contact/">your business</a>!</p>
<h3>Sources</h3>
<ul>
<li><a href="https://research-development.nomadic-labs.com/kathmandu-is-live.html">Kathmandu Blog on Nomadic Labs</a></li>
<li><a href="https://research-development.nomadic-labs.com/next-generation-rollups.html">Next Generation Rollups</a></li>
<li><a href="https://www.quicknode.com/guides/infrastructure/introduction-to-ethereum-rollups">Ethereum Rollups</a></li>
<li><a href="https://research-development.nomadic-labs.com/smart-rollups-are-coming.html">Smart Rollups</a></li>
</ul>
]]></description><link>https://tarides.com/blog/2022-12-20-how-nomadic-labs-used-multicore-processing-to-create-a-faster-blockchain</link><guid isPermaLink="false">https://tarides.com/blog/2022-12-20-how-nomadic-labs-used-multicore-processing-to-create-a-faster-blockchain.html</guid><dc:creator><![CDATA[ Isabella Leandersson ]]></dc:creator><pubDate>Tue, 20 Dec 2022 00:00:00 GMT</pubDate></item><item><title><![CDATA[OCaml 5 With Multicore Support Is Here!]]></title><description><![CDATA[<p>It's here! It's finally here! On Friday, 16 December 2022, the OCaml community <a href="https://discuss.ocaml.org/t/ocaml-5-0-0-is-out/10974/7">announced the official release of Multicore OCaml</a>! From the beginning, Tarides has been deeply involved in OCaml's evolution, so we're very proud to present OCaml 5!</p>
<p>Our work with the myriad of academics, industrial developers, and the entire OCaml community has been both inspiring and fulfilling. We look forward to continuing our collaboration for future iterations of OCaml. Watch <a href="https://www.youtube.com/watch?v=zJ4G0TKwzVc">KC's keynote</a> to get a visual overview of all OCaml 5 has to offer!</p>
<h2>About OCaml 5</h2>
<p>OCaml is a pragmatic functional programming language. Its strength lies in the capacity to balance security, performance, and reliability. OCaml is used in both industry and academia to address problems in which a single mistake could be catastrophic, <a href="https://ocaml.org/success-stories/large-scale-trading-system">such as in finance</a> and <a href="https://ocaml.org/success-stories/sensor-analytics-and-automation-platform-for-sustainable-agriculture">sensor analytics</a> for sustainable agriculture. It's also used by millions of developers daily with <a href="https://www.docker.com/blog/how-docker-desktop-networking-works-under-the-hood/">Docker for Desktop</a>.</p>
<p>OCaml 5 brings the long-awaited runtime support for shared memory <a href="https://v2.ocaml.org/releases/5.0/manual/parallelism.html">parallelism</a> and <a href="https://v2.ocaml.org/releases/5.0/manual/effects.html">effect handlers</a>. It is a major change (including the full rewrite of a new, concurrent garbage collector), but we worked hard to ensure there would be no breakage for existing OCaml users. This release combines the security and safety of OCaml with new features that bring huge performance benefits and an improved methodology for writing concurrent code.</p>
<p>OCaml 5 supports both the x86-64 and ARM64 architectures on Linux, the BSDs, macOS, and Windows (via Mingw-w64). Over the next year, the OCaml community and Tarides will restore support for most previously-supported architectures that fall outside of this range, but this doesn't mean you can't use OCaml now! OCaml 5 seeks to be completely backwards-compatible, and programs written for any version of OCaml 4 will continue to work in OCaml 5.</p>
<h2>Multicore</h2>
<p>With technological advances and the explosive growth of machines with more and more available cores, it's necessary for programming languages to support multicore technology. Until this new release, OCaml was single-threaded, meaning it could only utilise one core to run code. With OCaml 5, programs can now exploit multiple cores and execute processes in parallel, providing users with enhanced performance and efficiency.</p>
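<p>As a minimal sketch of what this looks like in practice, the following program splits a sum across two domains using the standard library's <code>Domain</code> module. The helper <code>sum_range</code> and the chosen ranges are illustrative assumptions, not code from the release:</p>
<pre><code>(* Sum the integers in [lo, hi) sequentially. *)
let sum_range lo hi =
  let total = ref 0 in
  for i = lo to hi - 1 do
    total := !total + i
  done;
  !total

let () =
  (* Each Domain.spawn starts a new domain that can run on its own core. *)
  let left = Domain.spawn (fun () -&gt; sum_range 0 500_000) in
  let right = Domain.spawn (fun () -&gt; sum_range 500_000 1_000_000) in
  (* Domain.join waits for a domain to finish and returns its result. *)
  let total = Domain.join left + Domain.join right in
  Printf.printf "total = %d\n" total
</code></pre>
<p>The two halves share no mutable state, so no synchronisation is needed beyond the two calls to <code>Domain.join</code>.</p>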
<p>Performance is key with OCaml 5 to ensure your programs run more smoothly, faster, and more efficiently, a significant achievement in turning cutting-edge science into real-life applications and industrial-strength tools.</p>
<p>Multicore OCaml has been in the making for 8 years and required a full rewrite of its runtime environment, so you can believe that the OCaml community is absolutely thrilled to see their years of hard work come to fruition. Multicore support ensures beginners can achieve the same productivity as OCaml experts!</p>
<p>If you come across any unexpected behaviours that aren't covered by the few exceptions listed on the <a href="https://discuss.ocaml.org/t/ocaml-5-0-0-is-out/10974">Discuss post</a>, please report them on the <a href="https://github.com/ocaml/ocaml/issues">OCaml issue tracker</a>.</p>
<h2>Eio &amp; Concurrency</h2>
<p>Concurrency in OCaml 5 is supported through the use of effect handlers, a new feature that enables the development of concurrent applications in a seamless fashion. The developer experience is improved, now that programmers can simply write concurrent code in the same style as non-concurrent code.</p>
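<p>To give a feel for the mechanism, here is a small sketch using the standard library's <code>Effect</code> module. The <code>Ask</code> effect and the value the handler supplies are hypothetical, chosen only for illustration; libraries built on effects hide this plumbing from the user:</p>
<pre><code>open Effect
open Effect.Deep

(* A hypothetical effect, declared for illustration: ask the handler for an int. *)
type _ Effect.t += Ask : int Effect.t

(* Direct-style code: performing the effect looks like an ordinary call. *)
let computation () = perform Ask + perform Ask

let () =
  let result =
    try_with computation ()
      { effc = (fun (type a) (eff : a Effect.t) -&gt;
          match eff with
          | Ask -&gt;
            (* Resume the suspended computation, answering the request with 21. *)
            Some (fun (k : (a, _) continuation) -&gt; continue k 21)
          | _ -&gt; None) }
  in
  Printf.printf "result = %d\n" result
</code></pre>
<p>The code in <code>computation</code> never mentions the handler. A scheduler can use the same mechanism to suspend a task at a <code>perform</code> and resume it later, which is what lets concurrent code keep a direct style.</p>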
<p><a href="https://github.com/ocaml-multicore/eio">Eio</a>, our experimental, high-performance I/O library, is an excellent example of a use for effect handlers. Since we've already demonstrated that we can reach millions of requests per second while keeping simple, direct-style code, OCaml 5 is meant to be on par with Rust (and outperforms Go) for I/O heavy workloads.</p>
<p><img sizes="(min-width: 1360px) 1360px, (min-width: 680px) 680px, 100vw" srcset="/blog/images/eio1-170w~tJ_A_JMOF0QFpG4NP5sXAw.webp 170w, /blog/images/eio1-340w~qzHjGABT9UAwxL4fmgKqWg.webp 340w, /blog/images/eio1-680w~A3bzhafOIsHXodSkaCPcdA.webp 680w, /blog/images/eio1-1360w~7PCVCQbe6qYYOyJIPZ7HGQ.webp 1360w" src="/blog/images/eio1-1360w~7PCVCQbe6qYYOyJIPZ7HGQ.webp" alt="Eio Performance"></p>
<p><img sizes="(min-width: 1360px) 1360px, (min-width: 680px) 680px, 100vw" srcset="/blog/images/eio2-170w~qHbhSvg52Qnnjr2_uB3_9g.webp 170w, /blog/images/eio2-340w~DUD9EX5yRxaPTvdqIARWJg.webp 340w, /blog/images/eio2-680w~LyUxa7r7swmbWYxhAcnueg.webp 680w, /blog/images/eio2-1360w~16DqoPonqlez8-OTSIxOWg.webp 1360w" src="/blog/images/eio2-1360w~16DqoPonqlez8-OTSIxOWg.webp" alt="Eio Performance"></p>
<p>With Eio, asynchronous and synchronous code can be composed together naturally, solving the <a href="https://www.tedinski.com/2018/11/13/function-coloring.html">function colouring problem</a> that affects many languages, including programs written in earlier versions of OCaml. This makes OCaml easier to use, even if you're new to the language.</p>
<h2>Runtime Events</h2>
<p>Another huge improvement with OCaml 5 is Runtime Events, a library included with OCaml 5's runtime, which efficiently processes performance data from the garbage collector (GC) and runtime. Runtime Events provides continuous monitoring of OCaml applications. Multicore programs are notoriously hard to debug and profile, so we ensured that OCaml 5 has a built-in mechanism to build great, relevant tooling. The basis of this is Runtime Events.</p>
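<p>The sketch below shows one way a program might consume its own events, assuming it is linked against the <code>runtime_events</code> library shipped with OCaml 5. The callback body and the forced collection are illustrative choices only:</p>
<pre><code>let () =
  (* Start writing runtime events for this process. *)
  Runtime_events.start ();
  (* Passing None creates a cursor over the current process's own events. *)
  let cursor = Runtime_events.create_cursor None in
  (* Called whenever the runtime enters a phase, such as a GC phase. *)
  let runtime_begin domain_id _timestamp phase =
    Printf.printf "domain %d: begin %s\n" domain_id
      (Runtime_events.runtime_phase_name phase)
  in
  let callbacks = Runtime_events.Callbacks.create ~runtime_begin () in
  (* Force some GC work so that there are events to read. *)
  Gc.full_major ();
  (* Read the pending events and report how many were processed. *)
  let processed = Runtime_events.read_poll cursor callbacks None in
  Printf.printf "processed %d events\n" processed
</code></pre>
<p>A monitoring tool would typically attach a cursor to another process and poll in a loop instead of reading once.</p>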
<p>Read more about Runtime Events in the <a href="https://v2.ocaml.org/releases/5.0/htmlman/runtime-tracing.html">OCaml Manual</a>, and watch the <a href="https://watch.ocaml.org/videos/watch/299cab02-db94-44ac-b926-ea90ddda1b09">OCaml Workshop</a> for a visual introduction to Runtime Events. Also check out these WIP tools: <a href="https://github.com/sadiqj/runtime_events_tools">olly</a> and <a href="https://github.com/patricoferris/meio">meio</a>.</p>
<h2>Open Source and Open Arms</h2>
<p>OCaml 5 is freely available to everyone worldwide. For installation instructions, compiler configurations, and a detailed list of changes and new features, please visit <a href="https://discuss.ocaml.org/t/ocaml-5-0-0-is-out/10974">this OCaml Discuss post</a>. While you're there, join in the conversation! We welcome you to the OCaml community with open arms! There has never been a better time to learn and use OCaml. Give it a try, and please report any problems to the <a href="https://github.com/ocaml/ocaml/issues">OCaml issue tracker</a>.</p>
<p>Read more about the journey to Multicore, Runtime Events, Eio, and more over the next six weeks in our OCaml 5 blog series. It will include several articles highlighting OCaml 5's new features, interviews with OCaml engineers, and reasons why OCaml should be the next language you learn. Follow us on <a href="https://bsky.app/profile/tarides.com">Bluesky</a> and <a href="https://www.linkedin.com/company/tarides">LinkedIn</a> so you don't miss a thing!</p>
<p>This OCaml 5 release is the best holiday gift for developers, both experienced and those new to programming. <a href="/blog/2022-11-24-solve-the-2022-advent-of-code-puzzles-with-ocaml/">Enjoy!</a></p>
]]></description><link>https://tarides.com/blog/2022-12-19-ocaml-5-with-multicore-support-is-here</link><guid isPermaLink="false">https://tarides.com/blog/2022-12-19-ocaml-5-with-multicore-support-is-here.html</guid><dc:creator><![CDATA[ Christine Rose ]]></dc:creator><pubDate>Mon, 19 Dec 2022 00:00:00 GMT</pubDate></item><item><title><![CDATA[Hillingar: MirageOS Unikernels on NixOS]]></title><description><![CDATA[<p>NixOS allows reproducible deployments of systems by managing configuration declaratively.
MirageOS is a unikernel creation framework that creates targeted operating systems for high-level applications that can run on a hypervisor.
By building MirageOS unikernels with Nix, we can enable reproducible builds of these unikernels and enable easy deployment on NixOS systems.</p>
<h2>Introduction</h2>
<p>The Domain Name System (DNS) is a critical component of the modern Internet, allowing domain names to be mapped to IP addresses, mailservers, and more.
This allows users to access services independent of their location on the Internet using human-readable names.
We can host a DNS server ourselves to have authoritative control over our domain, protect the privacy of those using our server, increase reliability by not relying on a third party DNS provider, and allow greater customisation of the records served.
However, it can be quite challenging to deploy one's own server reliably and reproducibly.
The Nix deployment system aims to address this.
With a NixOS machine, deploying a DNS server is as simple as:</p>
<pre><code class="language-nix">{
  services.bind = {
    enable = true;
    zones."freumh.org" = {
      master = true;
      file = "freumh.org.zone";
    };
  };
}
</code></pre>
<p>Which we can then query with:</p>
<pre><code><span class="bash-source">$ dig freumh.org @ns1.freumh.org +short
</span><span class="bash-source">135.181.100.27
</span></code></pre>
<p>To enable the user to query our domain without specifying the nameserver, we have to create a glue record with our registrar pointing <code>ns1.freumh.org</code> to the IP address of our DNS-hosting machine.</p>
<p>You might notice this configuration is running the venerable bind (<a href="https://www.isc.org/bind/">ISC bind</a> has many <a href="https://www.cvedetails.com/product/144/ISC-Bind.html?vendor_id=64">CVEs</a>), which is written in C.
As an alternative, using functional, high-level, type-safe programming languages to create network applications can greatly benefit safety and usability whilst maintaining performant execution.
One such language is OCaml.</p>
<p>MirageOS (<a href="https://mirage.io">mirage.io</a>) is a deployment method for these OCaml programs.
Instead of running them as a traditional Unix process, we instead create a specialised 'unikernel' operating system to run the application.
They offer reduced image sizes through dead code elimination, as well as improved security and efficiency.</p>
<p>However, to deploy a Mirage unikernel with NixOS, one must use the imperative deployment methodologies native to the OCaml ecosystem, thus eliminating the benefit of reproducible systems that Nix offers.
This blog post will explore how we enabled reproducible deployments of Mirage unikernels by building them with Nix.</p>
<h2>MirageOS</h2>
<div style="text-align: center;">
  <img src="/blog/images/2022-12-14-hillingar-mirageos-unikernels-on-nixos/mirage-logo~Rvr82kddD1802UEbBgymuw.svg" style="height: 300px; max-width: 100%" alt="MirageOS logo">
</div>
<p>^[Credits to Takayuki Imada]</p>
<p>MirageOS is a library operating system that allows users to create unikernels, which are specialised operating systems that include both low-level operating system code and high-level application code in a single kernel and a single address space.
It was the first such 'unikernel creation framework', but comes from a long lineage of OS research, such as the exokernel library OS architecture.
Embedding application code in the kernel allows for dead-code elimination, removing OS interfaces that are unused, which reduces the unikernel's attack surface and offers improved efficiency.</p>
<div style="text-align: center;">
  <img src="/blog/images/2022-12-14-hillingar-mirageos-unikernels-on-nixos/mirage-diagram~70WZ_SBHWW2LeQTqtt1W5A.svg" style="height: 300px; max-width: 100%" alt="Diagram representing the MirageOS Compiler ">
</div>
<p>Contrasting software layers in existing VM appliances vs. unikernel's standalone kernel compilation approach <a href="#madhavapeddyUnikernelsLibraryOperating2013">[3]</a></p>
<p>Mirage unikernels are written in OCaml^[Barring the use of <a href="https://mirage.io/blog/modular-foreign-function-bindings">foreign function interfaces</a> (FFIs).].
OCaml is more practical for systems programming than other functional programming languages, such as Haskell.
It supports falling back on impure imperative code or mutable variables when warranted.</p>
<h2>Nix</h2>
<div style="text-align: center;">
  <img src="/blog/images/2022-12-14-hillingar-mirageos-unikernels-on-nixos/nix-snowflake~rnI2Y2vr8L2rfy-ZYfT6QA.svg" style="height: 300px; max-width: 100%" alt="Nix Logo">
</div>
<p>Nix snowflake^[As 'nix' means snow in Latin. Credits to Tim Cuthbertson.].</p>
<p>Nix is a deployment system that uses cryptographic hashes to compute unique paths for components^[NB: we will use component, dependency, and package somewhat interchangeably in this blog post, as they all fundamentally mean the same thing -- a piece of software.] that are stored in a read-only directory: the Nix store, at <code>/nix/store/&lt;hash&gt;-&lt;name&gt;</code>.
This provides several benefits, including concurrent installation of multiple versions of a package, atomic upgrades, and multiple user environments.</p>
<p>Nix uses a declarative domain-specific language (DSL), also called Nix, to build and configure software.
The snippet used to deploy the DNS server is in fact a Nix expression.
This example doesn't demonstrate it, but Nix is Turing complete.
Nix does not, however, have a type system.</p>
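<p>To give a flavour of Nix as a general-purpose language rather than just a configuration format, here is a small sketch (the names are our own, purely for illustration) with a recursive function and an ill-typed expression. Because Nix is lazy and dynamically typed, the ill-typed attribute only fails if something actually evaluates it:</p>
<pre><code class="language-nix">let
  # Recursion is available through ordinary let-bound functions
  factorial = n: if n &lt;= 1 then 1 else n * factorial (n - 1);
in
{
  fact5 = factorial 5; # evaluates to 120

  # There is no static type checker: adding an integer to a string is
  # only reported as an error when this attribute is evaluated.
  illTyped = 1 + "one";
}
</code></pre>
<p>Saving this as <code>example.nix</code> (a hypothetical file name), <code>nix-instantiate --eval -A fact5 example.nix</code> should print <code>120</code>, while selecting <code>illTyped</code> instead raises an evaluation error.</p>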
<p>We use the DSL to write derivations, which describe how to build a piece of software from its input components and a build script.
This Nix expression is then 'instantiated' to create 'store derivations' (<code>.drv</code> files), each of which is the low-level representation of how to build a single component.
A store derivation is then 'realised' into a built artefact, a step hereafter referred to as 'building'.</p>
<p>Possibly the simplest Nix derivation uses <code>bash</code> to create a single file containing <code>Hello, World!</code>:</p>
<pre><code class="language-nix">{ pkgs ? import &lt;nixpkgs&gt; {  } }:

builtins.derivation {
  name = "hello";
  system = builtins.currentSystem;
  builder = "${pkgs.bash}/bin/bash";
  args = [ "-c" ''echo "Hello, World!" &gt; $out'' ];
}
</code></pre>
<p>Note that <code>derivation</code> is a function that we're calling with one argument, which is a set of attributes.</p>
<p>We can instantiate this Nix derivation to create a store derivation:</p>
<pre><code>$ nix-instantiate default.nix
/nix/store/5d4il3h1q4cw08l6fnk4j04a19dsv71k-hello.drv
$ nix show-derivation /nix/store/5d4il3h1q4cw08l6fnk4j04a19dsv71k-hello.drv
{
  "/nix/store/5d4il3h1q4cw08l6fnk4j04a19dsv71k-hello.drv": {
    "outputs": {
      "out": {
        "path": "/nix/store/4v1dx6qaamakjy5jzii6lcmfiks57mhl-hello"
      }
    },
    "inputSrcs": [],
    "inputDrvs": {
      "/nix/store/mnyhjzyk43raa3f44pn77aif738prd2m-bash-5.1-p16.drv": [
        "out"
      ]
    },
    "system": "x86_64-linux",
    "builder": "/nix/store/2r9n7fz1rxq088j6mi5s7izxdria6d5f-bash-5.1-p16/bin/bash",
    "args": [ "-c", "echo \"Hello, World!\" &gt; $out" ],
    "env": {
      "builder": "/nix/store/2r9n7fz1rxq088j6mi5s7izxdria6d5f-bash-5.1-p16/bin/bash",
      "name": "hello",
      "out": "/nix/store/4v1dx6qaamakjy5jzii6lcmfiks57mhl-hello",
      "system": "x86_64-linux"
    }
  }
}
</code></pre>
<p>And build the store derivation:</p>
<pre><code><span class="sh-source">$ nix-store --realise /nix/store/5d4il3h1q4cw08l6fnk4j04a19dsv71k-hello.drv
</span><span class="sh-source">/nix/store/4v1dx6qaamakjy5jzii6lcmfiks57mhl-hello
</span><span class="sh-source">$ cat /nix/store/4v1dx6qaamakjy5jzii6lcmfiks57mhl-hello
</span><span class="sh-source">Hello, World</span><span class="sh-keyword-operator-pipe">!</span><span class="sh-source">
</span></code></pre>
<p>Most Nix tooling does these two steps together:</p>
<pre><code>$ nix-build default.nix
this derivation will be built:
  /nix/store/q5hg3vqby8a9c8pchhjal3la9n7g1m0z-hello.drv
building '/nix/store/q5hg3vqby8a9c8pchhjal3la9n7g1m0z-hello.drv'...
/nix/store/zyrki2hd49am36jwcyjh3xvxvn5j5wml-hello
</code></pre>
<p>Nix realisations (hereafter referred to as 'builds') are done in isolation to ensure reproducibility.
Projects often rely on interacting with package managers to make sure all dependencies are available and may implicitly rely on system configuration at build time.
To prevent this, every Nix derivation is built in isolation (without network access or access to the global file system) with only other Nix derivations as inputs.</p>
<blockquote>
<p>The name Nix is derived from the Dutch word <em>niks</em>, meaning nothing; build actions do not see anything that has not been explicitly declared as an input.</p>
</blockquote>
<h4>Nixpkgs</h4>
<p>You may have noticed a reference to <code>nixpkgs</code> in the above derivation.
As every input to a Nix derivation also has to be a Nix derivation, one can imagine the tedium involved in creating a Nix derivation for every dependency of your project.
However, Nixpkgs^[ <a href="https://github.com/nixos/nixpkgs">github.com/nixos/nixpkgs</a> ] is a large repository of software packaged in Nix, where a package is a Nix derivation.
We can use packages from Nixpkgs as inputs to a Nix derivation, as we've done with <code>bash</code>.</p>
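<p>As a sketch of the same idea with a second Nixpkgs package, the derivation below uses the <code>runCommand</code> helper from Nixpkgs and takes the <code>hello</code> package as a build input. The derivation name <code>hello-file</code> is our own choice for illustration:</p>
<pre><code class="language-nix">{ pkgs ? import &lt;nixpkgs&gt; { } }:

# runCommand wraps the lower-level derivation function and runs the given
# shell script with the listed build inputs available on PATH.
pkgs.runCommand "hello-file" { nativeBuildInputs = [ pkgs.hello ]; } ''
  # GNU hello comes from Nixpkgs, not from the host system
  hello &gt; $out
''
</code></pre>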
<p>There is also a command-line package manager that installs packages from Nixpkgs, which is why people often refer to Nix as a package manager.
While Nix, and therefore Nix package management, is primarily source-based (since derivations describe how to build software from source), binary deployment is an optimisation of this.
Since packages are built in isolation and entirely determined by their inputs, binaries can be transparently deployed by downloading them from a remote server instead of building the derivation locally.</p>
<p><img sizes="(min-width: 1360px) 1360px, (min-width: 680px) 680px, 100vw" srcset="/blog/images/2022-12-14-hillingar-mirageos-unikernels-on-nixos/nixpkgs-170w~xdik0yVel3xEJeLVOLY_7g.webp 170w, /blog/images/2022-12-14-hillingar-mirageos-unikernels-on-nixos/nixpkgs-340w~ZNtxzC4EFM9ulkJpBwHfRg.webp 340w, /blog/images/2022-12-14-hillingar-mirageos-unikernels-on-nixos/nixpkgs-680w~38puNHIRJanuJvtZFug0vw.webp 680w, /blog/images/2022-12-14-hillingar-mirageos-unikernels-on-nixos/nixpkgs-1360w~K5cPwtUl8Ya_Hw056Q388Q.webp 1360w" src="/blog/images/2022-12-14-hillingar-mirageos-unikernels-on-nixos/nixpkgs-1360w~K5cPwtUl8Ya_Hw056Q388Q.webp" alt="Visualisation of Nixpkgs"></p>
<p>Visualisation of Nixpkgs^[<a href="https://www.tweag.io/blog/2022-09-13-nixpkgs-graph/">www.tweag.io/blog/2022-09-13-nixpkgs-graph/</a>]</p>
<h4>NixOS</h4>
<p>NixOS^[<a href="https://nixos.org">nixos.org</a>] is a Linux distribution built with Nix from a modular, purely functional specification.
It does not follow the traditional Filesystem Hierarchy Standard (FHS) layout of <code>/bin</code>, <code>/lib</code>, <code>/usr</code>, and so on, but instead stores all components in <code>/nix/store</code>.
The system configuration is managed by Nix and configured with Nix expressions.
NixOS modules are Nix files containing chunks of system configuration that can be composed to build a full NixOS system^[<a href="https://nixos.org/manual/nixos/stable/index.html#sec-writing-modules">NixOS manual Chapter 66. Writing NixOS Modules</a>.].
While many NixOS modules are provided in the Nixpkgs repository, they can also be written by an individual user.
For example, the expression used to deploy a DNS server is a NixOS module.
Together these modules form the configuration which builds the Linux system as a Nix derivation.</p>
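<p>As a minimal sketch of this composition, a top-level configuration can pull in other modules through <code>imports</code>. The file names and host name here are assumptions for illustration, with <code>dns.nix</code> standing in for a file containing the DNS server expression from the introduction:</p>
<pre><code class="language-nix">{ config, pkgs, ... }:

{
  # Each imported file is itself a NixOS module; their options and
  # configuration are merged into one system configuration.
  imports = [
    ./hardware-configuration.nix
    ./dns.nix
  ];

  networking.hostName = "ns1";
  networking.firewall.allowedTCPPorts = [ 53 ];
  networking.firewall.allowedUDPPorts = [ 53 ];
}
</code></pre>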
<p>NixOS minimises global mutable state that -- without knowing it -- you might rely on being set up in a certain way.
For example, you might follow instructions to run a series of shell commands and edit some files to get a piece of software working.
You may subsequently be unable to reproduce the result because you've forgotten some intricacy or are now using a different version of the software.
Nix forces you to encode this in a reproducible way, which is extremely useful for replicating software configurations and deployments, aiming to solve the 'It works on my machine' problem.
Docker is often used to fix this configuration problem, but Nix aims to be more reproducible.
This can be frustrating at times because it can make it harder to get a project off the ground, but the benefits often outweigh the downsides.</p>
<p>Nix uses pointers (implemented as symlinks) to system dependencies, which are Nix derivations for programs or pieces of configuration files.
This means NixOS supports atomic upgrades, as the pointers to the new packages are only updated when the install succeeds; the old versions can be kept until garbage collection.
This also allows NixOS to trivially support rollbacks to previous system configurations, as the pointers can be restored to their previous state.
Every new system configuration creates a GRUB entry, so you can boot previous systems even from your UEFI/BIOS.
Finally, NixOS also supports partial upgrades: while Nixpkgs has one global coherent package set, one can use multiple instances of Nixpkgs (i.e., channels) at once, as the Nix store allows multiple versions of a dependency to be stored.</p>
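<p>For instance, a module might take most packages from the system's Nixpkgs while drawing one from a second channel. This sketch assumes a channel named <code>nixos-unstable</code> has been added beforehand; the package choices are arbitrary:</p>
<pre><code class="language-nix">{ pkgs, ... }:

let
  # A second, independent instance of Nixpkgs
  unstable = import &lt;nixos-unstable&gt; { };
in
{
  environment.systemPackages = [
    pkgs.git       # from the system's own Nixpkgs
    unstable.hello # from the nixos-unstable channel
  ];
}
</code></pre>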
<p>To summarise the parts of the Nix ecosystem that we've discussed:</p>
<div style="text-align: center;">
  <img src="/blog/images/2022-12-14-hillingar-mirageos-unikernels-on-nixos/nix-stack~Ak6saYpe90U2bvKhb23wjQ.svg" style="height: 300px; max-width: 100%" alt="NixOS, Nixpkgs and Nix">
</div>
<h4>Flakes</h4>
<p>We also use Nix flakes for this project.
Without going into too much depth, they enable hermetic evaluation of Nix expressions and provide a standard way to compose Nix projects.
With flakes, instead of using a Nixpkgs repository version from a 'channel'^[<a href="https://nixos.org/manual/nix/stable/package-management/channels.html">nixos.org/manual/nix/stable/package-management/channels.html</a>], we pin Nixpkgs as an input to every Nix flake, be it a project built with Nix or a NixOS system.
Integrated with flakes, there is also a new <code>nix</code> command aimed at improving the Nix UI.
You can read more detail about flakes in a series of blog posts by Eelco Dolstra on the topic^[<a href="https://www.tweag.io/blog/2020-05-25-flakes/">tweag.io/blog/2020-05-25-flakes</a>].</p>
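<p>As a rough sketch, a flake that pins Nixpkgs and exposes the earlier <code>Hello, World!</code> derivation might look like the following. The release branch and the single hard-coded system are assumptions made to keep the example short:</p>
<pre><code class="language-nix">{
  description = "A minimal flake pinning Nixpkgs";

  # The exact revision is recorded in flake.lock on first evaluation
  inputs.nixpkgs.url = "github:NixOS/nixpkgs/nixos-22.11";

  outputs = { self, nixpkgs }:
    let
      system = "x86_64-linux";
      pkgs = nixpkgs.legacyPackages.${system};
    in
    {
      packages.${system}.default =
        pkgs.runCommand "hello" { } ''echo "Hello, World!" &gt; $out'';
    };
}
</code></pre>
<p>Running <code>nix build</code> in the flake's directory would then build the default package and link it at <code>./result</code>.</p>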
<h2>Deploying Unikernels</h2>
<p>Now that we understand what Nix and Mirage are, and we've motivated the desire to deploy Mirage unikernels on a NixOS machine, what's stopping us from doing just that?
To support deploying a Mirage unikernel, like for a DNS server, we need to write a NixOS module for it.</p>
<p>A pared-down^[The full module can be found <a href="https://github.com/NixOS/nixpkgs/blob/fe76645aaf2fac3baaa2813fd0089930689c53b5/nixos/modules/services/networking/bind.nix">here</a>] version of the bind NixOS module, the module used in our Nix expression for deploying a DNS server on NixOS (<a href="#cb1">§</a>), is:</p>
<pre><code class="language-nix">{ config, lib, pkgs, ... }:

with lib;

let
  # Shorthand for this module's own option values
  cfg = config.services.bind;
in
{
  options = {
    services.bind = {
      enable = mkEnableOption "BIND domain name server";

      zones = mkOption {
        ...
      };
    };
  };

  config = mkIf cfg.enable {
    systemd.services.bind = {
      description = "BIND Domain Name Server";
      after = [ "network.target" ];
      wantedBy = [ "multi-user.target" ];

      serviceConfig = {
        ExecStart = "${pkgs.bind.out}/sbin/named";
      };
    };
  };
}
</code></pre>
<p>Notice the reference to <code>pkgs.bind</code>.
This is the Nixpkgs repository Nix derivation for the <code>bind</code> package.
Recall that every input to a Nix derivation is itself a Nix derivation (<a href="#nixpkgs">§</a>); in order to use a package in a Nix expression -- i.e., a NixOS module -- we need to build said package with Nix.
Once we build a Mirage unikernel with Nix, we can write a NixOS module to deploy it.</p>
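<p>To make the goal concrete, here is a hypothetical sketch of what such a module could look like. Nothing in it exists yet: the option names under <code>services.dns-unikernel</code>, the <code>dns.hvt</code> binary name, and the <code>tap0</code> interface are all assumptions, and it presumes the unikernel was built for the <code>hvt</code> target and is run with Solo5's <code>solo5-hvt</code> tender from Nixpkgs:</p>
<pre><code class="language-nix">{ config, lib, pkgs, ... }:

let
  cfg = config.services.dns-unikernel;
in
{
  options.services.dns-unikernel = {
    enable = lib.mkEnableOption "DNS server unikernel";

    package = lib.mkOption {
      type = lib.types.package;
      description = "Nix derivation containing the built unikernel.";
    };
  };

  config = lib.mkIf cfg.enable {
    systemd.services.dns-unikernel = {
      description = "DNS server unikernel";
      after = [ "network.target" ];
      wantedBy = [ "multi-user.target" ];

      serviceConfig = {
        # Assumes a tap0 interface has been set up separately
        ExecStart = "${pkgs.solo5}/bin/solo5-hvt --net:service=tap0 -- ${cfg.package}/dns.hvt";
      };
    };
  };
}
</code></pre>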
<h2>Building Unikernels</h2>
<p>Mirage uses the package manager for OCaml called opam^[<a href="https://opam.ocaml.org/">opam.ocaml.org</a>].
Packages in opam, as is common in programming language package managers, have a file which -- among other metadata, such as build and install scripts -- specifies their dependencies and version constraints.
For example^[For <a href="https://github.com/mirage/mirage-www">mirage-www</a> targeting <code>hvt</code>.]:</p>
<pre><code>...
depends: [
  "arp" { ?monorepo &amp; &gt;= "3.0.0" &amp; &lt; "4.0.0" }
  "ethernet" { ?monorepo &amp; &gt;= "3.0.0" &amp; &lt; "4.0.0" }
  "lwt" { ?monorepo }
  "mirage" { build &amp; &gt;= "4.2.0" &amp; &lt; "4.3.0" }
  "mirage-bootvar-solo5" { ?monorepo &amp; &gt;= "0.6.0" &amp; &lt; "0.7.0" }
  "mirage-clock-solo5" { ?monorepo &amp; &gt;= "4.2.0" &amp; &lt; "5.0.0" }
  "mirage-crypto-rng-mirage" { ?monorepo &amp; &gt;= "0.8.0" &amp; &lt; "0.11.0" }
  "mirage-logs" { ?monorepo &amp; &gt;= "1.2.0" &amp; &lt; "2.0.0" }
  "mirage-net-solo5" { ?monorepo &amp; &gt;= "0.8.0" &amp; &lt; "0.9.0" }
  "mirage-random" { ?monorepo &amp; &gt;= "3.0.0" &amp; &lt; "4.0.0" }
  "mirage-runtime" { ?monorepo &amp; &gt;= "4.2.0" &amp; &lt; "4.3.0" }
  "mirage-solo5" { ?monorepo &amp; &gt;= "0.9.0" &amp; &lt; "0.10.0" }
  "mirage-time" { ?monorepo }
  "mirageio" { ?monorepo }
  "ocaml" { build &amp; &gt;= "4.08.0" }
  "ocaml-solo5" { build &amp; &gt;= "0.8.1" &amp; &lt; "0.9.0" }
  "opam-monorepo" { build &amp; &gt;= "0.3.2" }
  "tcpip" { ?monorepo &amp; &gt;= "7.0.0" &amp; &lt; "8.0.0" }
  "yaml" { ?monorepo &amp; build }
]
...
</code></pre>
<p>Each of these dependencies will have its own dependencies with their own version constraints.
As we can only link one version of each dependency into the resulting program, we need to find a set of dependency versions that satisfies all of these constraints.
This is not an easy problem.
In fact, it's NP-complete^[<a href="https://research.swtch.com/version-sat">research.swtch.com/version-sat</a>].
Opam uses the Zero Install^[<a href="https://0install.net">0install.net</a>] SAT solver for dependency resolution.</p>
<p>Nixpkgs has a large number of OCaml packages^[<a href="https://github.com/NixOS/nixpkgs/blob/9234f5a17e1a7820b5e91ecd4ff0de449e293383/pkgs/development/ocaml-modules/">github.com/NixOS/nixpkgs pkgs/development/ocaml-modules</a>], which we could provide as build inputs to a Nix derivation.
However, Nixpkgs has one global coherent set of package versions^[Bar some exceptional packages that have multiple major versions packaged, like Postgres.].
The support for installing multiple versions of a package concurrently comes from the fact that they are stored at a unique path and can be referenced separately, or symlinked, where required.
So different projects or users that use a different version of Nixpkgs won't conflict, but Nix does not do any dependency version resolution -- everything is pinned.
This is a problem for opam projects with version constraints that can't be satisfied with a static instance of Nixpkgs.</p>
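<p>To illustrate the mismatch, this sketch checks the single version of a package in a pinned Nixpkgs against the <code>tcpip</code> constraint from the opam file above. The helper <code>satisfies</code> is our own, and we assume <code>tcpip</code> is packaged under <code>ocamlPackages</code> in the Nixpkgs being used:</p>
<pre><code class="language-nix">{ pkgs ? import &lt;nixpkgs&gt; { } }:

let
  # True when min &lt;= v &lt; max, using Nix's built-in version comparison
  satisfies = { min, max }: v:
    builtins.compareVersions v min &gt;= 0
    &amp;&amp; builtins.compareVersions v max &lt; 0;

  constraint = { min = "7.0.0"; max = "8.0.0"; };
in
{
  pinned = pkgs.ocamlPackages.tcpip.version;

  # If this is false, Nix will not go looking for another version
  ok = satisfies constraint pkgs.ocamlPackages.tcpip.version;
}
</code></pre>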
<p>Luckily, a project from Tweag already exists (<code>opam-nix</code>) to deal with this^[<a href="https://github.com/tweag/opam-nix">github.com/tweag/opam-nix</a>].
This project uses the opam dependency versions solver inside a Nix derivation, and then creates derivations from the resulting dependency versions.</p>
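<p>As a rough sketch of how this looks from the user's side, a flake can call <code>buildOpamProject</code> from <code>opam-nix</code>. This is based on our reading of the <code>opam-nix</code> documentation, so treat the exact arguments as an assumption to check against its README; <code>my-package</code> is a placeholder name:</p>
<pre><code class="language-nix">{
  inputs.opam-nix.url = "github:tweag/opam-nix";

  outputs = { self, opam-nix }:
    let
      system = "x86_64-linux";
      package = "my-package"; # placeholder for the opam package name
      # Solve the version constraints of the opam files found in ./.
      # and return a set of derivations for the chosen versions
      scope = opam-nix.lib.${system}.buildOpamProject { } package ./. {
        ocaml-base-compiler = "*";
      };
    in
    {
      packages.${system}.default = scope.${package};
    };
}
</code></pre>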
<p>This still doesn't support building our Mirage unikernels, though.
Unikernels quite often need to be cross-compiled: compiled to run on a platform other than the one they're being built on.
A common target, Solo5^[<a href="https://github.com/Solo5/solo5">github.com/Solo5/solo5</a>], is a sandboxed execution environment for unikernels.
It acts as a minimal shim layer to interface between unikernels and different hypervisor backends.
Unikernels targeting Solo5 are linked against a different C standard library than the host's <code>glibc</code>, which requires cross-compilation.
Mirage 4^[<a href="https://mirage.io/blog/announcing-mirage-40">mirage.io/blog/announcing-mirage-40</a>] supports cross-compilation with toolchains in the Dune build system^[<a href="https://dune.build">dune.build</a>].
This uses a host compiler installed in an opam switch (a virtual environment) as normal, as well as a target compiler^[<a href="https://github.com/mirage/ocaml-solo5">github.com/mirage/ocaml-solo5</a>].
But the cross-compilation context of packages is only known at build time, as some metaprogramming modules may require preprocessing with the host compiler.
To ensure that the right compilation context is used, we have to provide Dune with all our sources' dependencies.
A tool called <code>opam-monorepo</code> was created to do just that^[<a href="https://github.com/tarides/opam-monorepo">github.com/tarides/opam-monorepo</a>].</p>
<p>We extended the <code>opam-nix</code> project to support the <code>opam-monorepo</code> workflow with this pull request: <a href="https://github.com/tweag/opam-nix/pull/18">github.com/tweag/opam-nix/pull/18</a>.
This is very low-level support for building Mirage unikernels with Nix, however.
In order to provide a better user experience, we also created the Hillingar Nix flake: <a href="https://github.com/ryanGibb/hillingar">github.com/RyanGibb/hillingar</a>.
This wraps the Mirage tooling and <code>opam-nix</code> function calls so that a simple high-level flake can be dropped into a Mirage project to support building it with Nix.
To add Nix build support to a unikernel, simply:</p>
<pre><code><span class="bash-punctuation-definition-comment">#</span><span class="bash-comment-line-number-sign"> create a flake from hillingar's default template</span><span class="bash-comment-line-number-sign">
</span><span class="bash-source">$ nix flake new . -t github:/RyanGibb/hillingar
</span><span class="bash-punctuation-definition-comment">#</span><span class="bash-comment-line-number-sign"> substitute the name of the unikernel you're building</span><span class="bash-comment-line-number-sign">
</span><span class="bash-source">$ sed -i 's/throw "Put the unikernel name here"/"&lt;unikernel-name&gt;"/g' flake.nix
</span><span class="bash-punctuation-definition-comment">#</span><span class="bash-comment-line-number-sign"> build the unikernel with Nix for a particular target</span><span class="bash-comment-line-number-sign">
</span><span class="bash-source">$ nix build .</span><span class="bash-punctuation-definition-comment">#</span><span class="bash-comment-line-number-sign">&lt;target&gt;</span><span class="bash-comment-line-number-sign">
</span></code></pre>
<p>For example, see the flake for building the Mirage website as a unikernel with Nix: <a href="https://github.com/RyanGibb/mirage-www/blob/master/flake.nix">github.com/RyanGibb/mirage-www/blob/master/flake.nix</a>.</p>
<h2>Evaluation</h2>
<p>Hillingar's primary limitations are that (1) complex integration with the OCaml ecosystem is required to solve dependency version constraints using <code>opam-nix</code>, and (2) cross-compilation requires cloning all sources locally with <code>opam-monorepo</code> (<a href="#dependency-management">§</a>).
Another issue that proved an annoyance during this project is the Nix DSL's dynamic typing.
When writing simple derivations this often isn't a problem, but when writing complicated logic, it quickly gets in the way of productivity.
The runtime errors produced can be very hard to parse.
Thankfully there is work towards creating a typed language for the Nix deployment system, such as Nickel^[<a href="https://www.tweag.io/blog/2020-10-22-nickel-open-sourcing/">www.tweag.io/blog/2020-10-22-nickel-open-sourcing</a>].
However, gradual typing is hard, and Nickel still isn't ready for real-world use despite having been open source for nearly two years (the anniversary is a week away as of writing).</p>
<p>A glaring omission is that despite it being the primary motivation, we haven't actually written a NixOS module for deploying a DNS server as a unikernel.
There are still questions about how to provide zone file data declaratively to the unikernel and manage the runtime of deployed unikernels.
One option to do the latter is Albatross^[<a href="https://hannes.robur.coop/Posts/VMM">hannes.robur.coop/Posts/VMM</a>], which has recently had support for building with Nix added^[<a href="https://github.com/roburio/albatross/pull/120">https://github.com/roburio/albatross/pull/120</a>].
Albatross aims to provision resources for unikernels such as network access, share resources for unikernels between users, and monitor unikernels with a Unix daemon.
Using Albatross to manage some of the inherently imperative processes behind unikernels, as well as to share access to unikernel resources with other users on a NixOS system, could simplify the creation and improve the functionality of a NixOS module for a unikernel.</p>
<p>There is also related work on the reproducible building of Mirage unikernels,
specifically on improving the reproducibility of opam packages (as Mirage unikernels are opam packages themselves)^[<a href="https://hannes.nqsb.io/Posts/ReproducibleOPAM">hannes.nqsb.io/Posts/ReproducibleOPAM</a>].
Hillingar differs in that it only uses opam for version resolution, instead using Nix to provide dependencies, which provides reproducibility with pinned Nix derivation inputs and builds in isolation by default.</p>
<h2>Conclusion</h2>
<p>To summarise, this project was motivated (<a href="#introduction">§</a>) by deploying unikernels on NixOS (<a href="#deploying-unikernels">§</a>).
Towards this end, we added support for building MirageOS unikernels with Nix:
we extended <code>opam-nix</code> to support the <code>opam-monorepo</code> workflow and created the Hillingar project to provide a usable Nix interface (<a href="#building-unikernels">§</a>).</p>
<p>While only the first was the primary motivation, the benefits of building unikernels with Nix are:</p>
<ul>
<li>Reproducible and low-config unikernel deployment using NixOS modules is enabled.</li>
<li>Nix allows reproducible builds by pinning system dependencies and composing multiple language environments. For example, the OCaml package <code>conf-gmp</code> is a 'virtual package' that relies on a system installation of the C/Assembly library <code>gmp</code> (the GNU Multiple Precision Arithmetic Library). Nix easily allows us to depend on this package in a reproducible way.</li>
<li>We can use Nix to support building on different systems (<a href="#cross-compilation">§</a>).</li>
</ul>
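<p>As a sketch of the second point, a development shell can provide <code>gmp</code> from the same pinned Nixpkgs as the OCaml toolchain, rather than relying on whatever the host system has installed. The particular package selection is an assumption for illustration:</p>
<pre><code class="language-nix">{ pkgs ? import &lt;nixpkgs&gt; { } }:

pkgs.mkShell {
  # Everything here comes from Nixpkgs, including the C library gmp
  packages = [
    pkgs.ocaml
    pkgs.dune_3
    pkgs.pkg-config
    pkgs.gmp
  ];
}
</code></pre>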
<p>To conclude, while NixOS and MirageOS take fundamentally very different approaches, they're both trying to bring some kind of functional programming paradigm to operating systems.
NixOS does this in a top-down manner, trying to tame Unix with functional principles like laziness and immutability^[<a href="https://www.tweag.io/blog/2022-07-14-taming-unix-with-nix/">tweag.io/blog/2022-07-14-taming-unix-with-nix</a>], whereas MirageOS does this by throwing Unix out the window and rebuilding the world from scratch in a very much bottom-up approach.
Despite these two projects having different motivations and goals, Hillingar aims to get the best from both worlds by marrying the two.</p>
<hr>
<p>To dive deeper, please see a more detailed article on my <a href="https://ryan.freumh.org/blog/hillingar">personal blog</a>.</p>
<p>If you have a unikernel, consider trying to build it with Hillingar, and please report any problems at <a href="https://github.com/RyanGibb/hillingar/issues">github.com/RyanGibb/hillingar/issues</a>!</p>
]]></description><link>https://tarides.com/blog/2022-12-14-hillingar-mirageos-unikernels-on-nixos</link><guid isPermaLink="false">https://tarides.com/blog/2022-12-14-hillingar-mirageos-unikernels-on-nixos.html</guid><dc:creator><![CDATA[ Ryan Gibb ]]></dc:creator><pubDate>Wed, 14 Dec 2022 00:00:00 GMT</pubDate></item><item><title><![CDATA[OCaml 5 Release Candidate Now Available!]]></title><description><![CDATA[<p>We're in the home stretch for the full OCaml 5 release. Multicore is almost here! Yesterday its Release Candidate (RC) was announced on the <a href="https://discuss.ocaml.org/t/first-release-candidate-for-ocaml-5-0-0/10922">OCaml Discuss</a>, which is the final step before the major release, expected before Christmas.</p>
<p>To learn more about the exciting features coming with OCaml 5, you can watch <a href="https://www.youtube.com/watch?v=zJ4G0TKwzVc">KC’s keynote address</a> and check out his <a href="https://speakerdeck.com/kayceesrk/retrofitting-concurrency-lessons-from-the-engine-room">speaker slide deck</a> as well. As always, feel free to <a href="/contact/">contact us</a> for more information about using OCaml and for support on your OCaml projects.</p>
<p>The OCaml community has worked tirelessly to release 5.0 before the end of the year, with a lot of time spent on creating a smooth transition for OCaml users. There should be just enough time for you to try out OCaml 5 for a fun holiday project or the <a href="/blog/2022-11-24-solve-the-2022-advent-of-code-puzzles-with-ocaml/">Advent of Code</a>.</p>
<p>Your reports resulted in these bug fixes since the Beta 2 release last week:</p>
<ul>
<li><a href="https://github.com/ocaml/ocaml/issues/11776">11776</a>: Extend environment with functor parameters in <code>strengthen_lazy</code>. (Chris Casinghino and Luke Maurer, review by Gabriel Scherer)</li>
<li><a href="https://github.com/ocaml/ocaml/issues/11533">11533</a> and <a href="https://github.com/ocaml/ocaml/issues/11534">11534</a>: follow synonyms again in <code>#show_module_type</code> (this had stopped working in 4.14.0) (Gabriel Scherer, review by Jacques Garrigue, report by Yaron Minsky)</li>
</ul>
<p>For the full change log, <a href="https://github.com/ocaml/ocaml/blob/5.0/Changes">visit the GitHub repo</a>. The source code for the release candidate is available at these addresses:</p>
<ul>
<li>https://github.com/ocaml/ocaml/archive/5.0.0-rc1.tar.gz</li>
<li>https://caml.inria.fr/pub/distrib/ocaml-5.0/ocaml-5.0.0~rc1.tar.gz</li>
</ul>
<p>Please keep those testing reports coming in. We believe this release candidate is ready to go, but we really value testing right up to the last minute to be even more sure. Send us your valuable input! If you find something, please <a href="https://github.com/ocaml/ocaml/issues">open an issue on GitHub</a> or join the discussion on the <a href="https://discuss.ocaml.org/t/first-release-candidate-for-ocaml-5-0-0/10922">Discuss post</a>, where you can also find installation instructions.</p>
]]></description><link>https://tarides.com/blog/2022-12-07-ocaml-5-release-candidate-now-available</link><guid isPermaLink="false">https://tarides.com/blog/2022-12-07-ocaml-5-release-candidate-now-available.html</guid><dc:creator><![CDATA[ Christine Rose ]]></dc:creator><pubDate>Wed, 07 Dec 2022 00:00:00 GMT</pubDate></item><item><title><![CDATA[OCaml 5 Beta2 Release]]></title><description><![CDATA[<p>Just about a month after the <a href="/blog/2022-10-17-ocaml-5-beta-release/">OCaml 5 Beta release</a>, the OCaml 5 Beta2 version has been released, taking us one step closer to the full OCaml 5 with Multicore release later this year. The OCaml community's collaboration is coming to fruition! Although we're not quite ready for the RC1 (Release Candidate) version, several things have been added and improved with Beta2.</p>
<p>To learn more about the exciting things coming with OCaml 5, please watch KC Sivaramakrishnan’s <a href="https://www.youtube.com/watch?v=zJ4G0TKwzVc">keynote address</a> and check out <a href="https://speakerdeck.com/kayceesrk/retrofitting-concurrency-lessons-from-the-engine-room">his speaker slide deck</a> as well. As always, feel free to <a href="/contact/">contact us</a> for more information about using OCaml and for support on your OCaml projects.</p>
<p>Here's a partial list of improvements/fixes with <a href="https://github.com/ocaml/ocaml/issues">issue numbers</a>:</p>
<ul>
<li><a href="https://github.com/ocaml/ocaml/pull/11631">#11631</a> - fix an assertion dealing with a segfault found by the Multicore test suite</li>
<li><a href="https://github.com/ocaml/ocaml/issues/11662">#11662</a>, <a href="https://github.com/ocaml/ocaml/pull/11673">#11673</a> - memory leak affecting <code>dynlink</code> with frame descriptor tables (reported by Frama-C devs)</li>
<li><a href="https://github.com/ocaml/ocaml/pull/11704">#11704</a>, <a href="https://github.com/ocaml/ocaml/issues/11669">#11669</a> - segfault with effects fixed, having been tracked down to the refactoring of <code>Effect.Unhandled</code></li>
<li><a href="https://github.com/ocaml/ocaml/pull/11701">#11701</a> - fix spurious <code>.dSYM</code> files and directories being created on macOS</li>
<li><a href="https://github.com/ocaml/ocaml/pull/11671">#11671</a> - fix a bug in <code>top_heap_words</code> statistics accounting</li>
<li><a href="https://github.com/ocaml/ocaml/pull/11670">#11670</a> - macOS fix when creating empty archives</li>
<li><a href="https://github.com/ocaml/ocaml/pull/11097">#11097</a> - NetBSD fixes, including ARM64 support</li>
<li><a href="https://github.com/ocaml/ocaml/pull/11194">#11194</a>, <a href="https://github.com/ocaml/ocaml/pull/11609">#11609</a> - fixes a regression from 4.14</li>
<li><a href="https://github.com/ocaml/ocaml/pull/11622">#11622</a> - fixes a regression in error messages since 4.10</li>
<li><a href="https://github.com/ocaml/ocaml/pull/11725">#11725</a> - remove <code>caml_alloc_N</code></li>
<li><a href="https://github.com/ocaml/ocaml/pull/11661">#11661</a> - erroneous <code>-force-tmc</code> option removed</li>
<li><a href="https://github.com/ocaml/ocaml/pull/11367">#11367</a>, <a href="https://github.com/ocaml/ocaml/pull/11652">#11652</a> - Windows clean-ups</li>
<li><a href="https://github.com/ocaml/ocaml/pull/11611">#11611</a> - fix --disable-instrumented-runtime</li>
<li><a href="https://github.com/ocaml/ocaml/pull/11639">#11639</a> - configuration bookkeeping (ensure system, etc., always set)</li>
<li><a href="https://github.com/ocaml/ocaml/pull/11632">#11632</a> - minor bookkeeping bug fix</li>
<li><a href="https://github.com/ocaml/ocaml/pull/11559">#11559</a>, <a href="https://github.com/ocaml/ocaml/pull/11649">#11649</a>, <a href="https://github.com/ocaml/ocaml/pull/11640">#11640</a>, <a href="https://github.com/ocaml/ocaml/pull/11301">#11301</a>, <a href="https://github.com/ocaml/ocaml/pull/11705">#11705</a> - docs updates</li>
</ul>
<p>In short, we're continuing to stabilise the release. We're also dealing with reports coming from the wonderful testing that's been going on, especially Multicore tests and the Frama-C report. Keep those testing reports and feedback coming by <a href="https://github.com/ocaml/ocaml/issues">opening an issue on the GitHub repo</a> or chiming in through the <a href="https://discuss.ocaml.org/t/ocaml-5-0-0-second-beta-release/10871">OCaml Discuss forum post</a>.</p>
<p>Thanks to the hard work by all engineers working to make OCaml even better than before. It's a beautiful sight to watch brilliant developers come together on an open-source project like OCaml, and Tarides is proud to be part of this ever-growing community.</p>
]]></description><link>https://tarides.com/blog/2022-11-29-ocaml-5-beta2-release</link><guid isPermaLink="false">https://tarides.com/blog/2022-11-29-ocaml-5-beta2-release.html</guid><dc:creator><![CDATA[ Christine Rose ]]></dc:creator><pubDate>Tue, 29 Nov 2022 00:00:00 GMT</pubDate></item><item><title><![CDATA[Solve the 2022 Advent of Code Puzzles with OCaml]]></title><description><![CDATA[<p>Too many programmers only know OCaml through a functional programming language overview course at university. They erroneously believe OCaml is used primarily in academia rather than in the real world. Not only is OCaml already used in <a href="/blog/2022-11-22-six-surprising-reasons-the-ocaml-programming-language-is-good-for-business/">several prominent businesses</a>, it can also be used for fun projects, like the upcoming Advent of Code.</p>
<h2>What is Advent of Code?</h2>
<p><a href="https://adventofcode.com/">Advent of Code</a> is an annual online Advent calendar produced specifically for programmers. It publishes a series of daily puzzles, revealing a new puzzle every day from 1 December to 25 December. That's 25 days of coding challenges! These puzzles can be solved in the language of your choice.</p>
<p>Every year they have new puzzles and anyone can participate, whether you're new to programming or a veteran. Advent of Code has been running since 2015 and has attracted a large community of developers around the world!</p>
<p>It's not only a fun way to learn new languages, but it also provides a great way to meet new people with the same interests and exchange ideas.</p>
<p>OCaml 5 is set for release later this year, so it's a perfect time to learn and practice OCaml with Advent of Code!</p>
<h2>5 Reasons to Learn OCaml</h2>
<p>There are so many misconceptions around OCaml that it's hard to know where to start. Basically, OCaml can do what Python, C++, Java, or any other major programming language can do. Here are a few specific reasons why OCaml should be the next language you learn:</p>
<h4>1. <strong>Secure-By-Design</strong></h4>
<p>Cyber attacks have become more commonplace and sophisticated, so security has become a top priority in software development. With the proliferation of cloud services and Internet-connected devices, software must be secure-by-design to prevent malicious actors from taking advantage of any bugs or loopholes. Creating software with a secure-by-design language like OCaml helps meet this goal.</p>
<p>OCaml has built-in features and design patterns, like <strong>type and memory safety</strong>, that make it secure-by-design.</p>
<h4>2. <strong>Performant Garbage Collector</strong></h4>
<p>Contrary to popular belief, OCaml's garbage collector (GC) need not slow your programs down. It is incremental, and it spares you the problems of manual memory management in large or long-running programs.</p>
<p>OCaml's GC has to run periodically, but it can do so in <strong>small incremental steps</strong>. Although an allocation that triggers a GC takes longer than a <code>malloc</code> call (used in C), most allocations are almost immediate because allocating from the minor heap is as cheap as allocating on the stack.</p>
<p>This incremental design reduces the problems normally associated with garbage collection, like tying up memory and slowing down the process.</p>
<h4>3. <strong>Confidence in the Code Through Type Checking</strong></h4>
<p>OCaml's compile-time type checking eliminates many potential runtime errors, and its strong type inference eliminates several redundant type annotations. Often, a programming language has either type safety or type inference, but with OCaml you get both!</p>
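<p>As a minimal sketch of what this looks like in practice (the function name <code>average</code> is ours, chosen for illustration), the code below carries no type annotations at all, yet the compiler infers a precise type for it and would reject a call with the wrong kind of list:</p>
<pre><code class="language-ocaml">(* No annotations needed: the compiler infers
   average : float list -> float option *)
let average xs =
  match xs with
  | [] -> None
  | _ ->
    let total = List.fold_left ( +. ) 0.0 xs in
    Some (total /. float_of_int (List.length xs))

let () =
  match average [ 1.0; 2.0; 6.0 ] with
  | Some mean -> Printf.printf "mean = %.1f\n" mean
  | None -> print_endline "empty list"

(* A call such as average [ "a"; "b" ] does not compile, because a
   string list is not a float list. *)
</code></pre>
<p>Returning an <code>option</code> also means the empty-list case has to be handled by the caller, rather than surfacing later as a runtime failure.</p>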
<h4>4. <strong>Multicore!</strong></h4>
<p>With the release of OCaml 5 later this year comes Multicore support! <a href="https://github.com/ocaml-multicore/ocaml-multicore">Multicore</a> is an extension of OCaml with native support for <strong>Shared-Memory Parallelism</strong> through domains and <strong>Concurrency</strong> through algebraic effects. The ability to run OCaml on multiple cores will make it even faster than before.</p>
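<p>To give a flavour of domains, here is a small sketch that uses only the <code>Domain</code> module from the OCaml 5 standard library. The helper <code>sum_range</code> is a made-up example, and a real program would more likely reach for a library that manages a pool of domains:</p>
<pre><code class="language-ocaml">(* Sum the integers in the closed range [lo, hi]. *)
let sum_range lo hi =
  let total = ref 0 in
  for i = lo to hi do
    total := !total + i
  done;
  !total

let () =
  (* Domain.spawn starts each half of the work on its own domain,
     so the two halves can run in parallel on different cores. *)
  let left = Domain.spawn (fun () -> sum_range 1 500_000) in
  let right = Domain.spawn (fun () -> sum_range 500_001 1_000_000) in
  (* Domain.join waits for a domain to finish and returns its result. *)
  let total = Domain.join left + Domain.join right in
  Printf.printf "sum = %d\n" total
</code></pre>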
<h4>5. <strong>Extensive Tools &amp; Libraries</strong></h4>
<p>OCaml has some great tools and <a href="/blog/2022-10-12-8-ocaml-libraries-to-make-your-life-easier/">helpful libraries</a>, like <a href="https://mirage.io/">MirageOS</a>, <a href="https://irmin.org/">Irmin</a>, and so many others as reported by seasoned OCaml programmers in <a href="https://discuss.ocaml.org/t/top-5-favorite-ocaml-libraries/10626">this <em>Discuss</em> thread</a>.</p>
<p>OCaml platform tools include the <a href="https://dune.readthedocs.io/en/stable/">Dune build system</a>, <a href="https://opam.ocaml.org/">opam package manager</a>, <a href="/blog/2022-07-05-the-magic-of-merlin/">Merlin IDE</a>, and the <a href="https://ocaml.github.io/odoc/"><code>odoc</code> documentation generator</a>.</p>
<h2>Pro Tip: Learn OCaml Basics First</h2>
<p>Before you start solving puzzles, learn the basics to ensure you understand the most common data structures and algorithms. It's important to know how to print to the console, read and write files, and parse text.</p>
<p>It’s also a good idea to read about common problem-solving approaches. For example, checking whether a solution is correct is a crucial part of the solving process. This will help you understand the challenges better, and you can save yourself a lot of time and frustration.</p>
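<p>Here is a rough sketch of those basics for a made-up puzzle whose input is one integer per line. The names <code>parse_numbers</code>, <code>solve</code>, and <code>read_file</code> are our own, and the input file is a temporary one written by the example itself. It assumes OCaml 4.14 or later for the <code>In_channel</code> and <code>Out_channel</code> modules:</p>
<pre><code class="language-ocaml">(* Parse one integer per line, skipping blank or malformed lines. *)
let parse_numbers text =
  String.split_on_char '\n' text
  |> List.filter_map (fun line -> int_of_string_opt (String.trim line))

(* The "puzzle" here is simply to add up every number in the input. *)
let solve text = List.fold_left ( + ) 0 (parse_numbers text)

(* Read a whole file into a string. *)
let read_file path = In_channel.with_open_text path In_channel.input_all

let () =
  (* Check the solution against a small example with a known answer first. *)
  assert (solve "1\n2\n\n3\n" = 6);
  (* Stand-in for a real puzzle input: write a temporary file, then read it. *)
  let path = Filename.temp_file "puzzle" ".txt" in
  Out_channel.with_open_text path (fun oc -> output_string oc "10\n20\n30\n");
  Printf.printf "answer = %d\n" (solve (read_file path));
  Sys.remove path
</code></pre>
<p>Keeping <code>solve</code> separate from the file handling makes it easy to test against the worked example in a puzzle description before running it on your real input.</p>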
<p>Get <a href="https://ocaml.org/docs/up-and-running">Up &amp; Running</a> with OCaml today through the <a href="https://ocaml.org/docs">tutorials on OCaml.org</a>. Also consider joining the <a href="https://discuss.ocaml.org/">OCaml Community Forum <em>Discuss</em></a>. They're very welcoming of those new to OCaml as well as experienced OCaml programmers, and members will quickly answer any questions you have while you learn.</p>
<h2>Conclusion</h2>
<p>The Advent of Code is a great way to practise your problem-solving skills in a new programming language during the holidays.</p>
<p>Give OCaml a try this holiday season. You won't regret it!</p>
<p>Learn more about the forthcoming OCaml 5 through <a href="https://www.youtube.com/watch?v=zJ4G0TKwzVc">KC Sivaramakrishnan's Keynote Address</a> and <a href="https://speakerdeck.com/kayceesrk/retrofitting-concurrency-lessons-from-the-engine-room">speaker deck</a>. Stay tuned to this blog for release updates and a series of posts about why you should consider OCaml as your next language.</p>
]]></description><link>https://tarides.com/blog/2022-11-24-solve-the-2022-advent-of-code-puzzles-with-ocaml</link><guid isPermaLink="false">https://tarides.com/blog/2022-11-24-solve-the-2022-advent-of-code-puzzles-with-ocaml.html</guid><dc:creator><![CDATA[ Christine Rose ]]></dc:creator><pubDate>Thu, 24 Nov 2022 00:00:00 GMT</pubDate></item><item><title><![CDATA[Six Surprising Reasons the OCaml Programming Language is Good for Business]]></title><description><![CDATA[<p>Functional programming languages have been around since the 1950's, when the first high-level languages were used to program early computers. Examples of functional programming languages include OCaml, Erlang, Clojure, Haskell, Scala, and Common Lisp. Choosing the right programming language is critical to the long-term success and stability of your products, services, and operations. With strong academic roots and years of iteration and innovation, functional programming languages can offer a real competitive edge to businesses. This article explains some of the lesser-known benefits of OCaml from a business perspective.</p>
<h2>Why Functional Programming?</h2>
<p>The <a href="https://medium.com/javascript-scene/master-the-javascript-interview-what-is-functional-programming-7f218c68b3a0">strengths of functional programming</a> are becoming increasingly well-known. Functional programming lets programmers write programs in a declarative, logical, and mathematical style. This makes it easier for developers to express their intent, because the code closely matches the specification. A specification is a mathematical description of what a program does, and programs that match their specification are proven to follow that description, which makes them predictable and safe. Furthermore, with features that limit the mutation of data and side effects, functional programs tend to have fewer bugs and vulnerabilities, remain easy to develop and maintain, and last longer overall.</p>
<p>Nowadays, widely-used imperative programming languages such as Python, Rust, and Java support programming in a functional style and use functional programming features to take <a href="https://spectrum.ieee.org/functional-programming">advantage of its strengths</a>. Features <a href="https://www.typescriptlang.org/">like rich type systems</a>, <a href="https://medium.com/digitalfrontiers/a-case-for-pattern-matching-b43a5c9796b8">pattern matching</a>, and <a href="https://en.wikipedia.org/wiki/Anonymous_function">lambda expressions</a> are becoming more mainstream, illustrating their usefulness.</p>
<h2>What Makes OCaml Unique?</h2>
<p>OCaml combines a strong foundation in functional programming with some select imperative and object-oriented programming features, allowing the user to choose the best approach for the task at hand. This is part of what makes OCaml such a great general-purpose programming language; it combines the strengths of several programming styles and offers the developer a full software development toolkit.</p>
<p>OCaml provides a unique balance of performance, security, and reliability. It combines features like a garbage collector, static type-checking, type-driven development, first-class functions, and pattern matching (features that work well together). Together they result in a language known for minimising errors, debugging easily, automatically managing memory, preventing structural errors in data, and providing a user-friendly developer environment.</p>
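<p>To illustrate how static types and pattern matching work together, consider this small sketch. The <code>shape</code> type and the function names are invented for the example:</p>
<pre><code class="language-ocaml">(* A shape is either a circle with a radius or a rectangle with two sides. *)
type shape =
  | Circle of float
  | Rectangle of float * float

(* Pattern matching covers every case. If a new constructor is added to
   shape and this function is not updated, the compiler warns about it. *)
let area = function
  | Circle r -> Float.pi *. r *. r
  | Rectangle (w, h) -> w *. h

(* Functions are first-class values: here one is passed to List.fold_left. *)
let total_area shapes =
  List.fold_left (fun acc s -> acc +. area s) 0.0 shapes

let () =
  let shapes = [ Rectangle (2.0, 3.0); Rectangle (1.0, 4.0) ] in
  Printf.printf "total area = %.1f\n" (total_area shapes)
</code></pre>
<p>Because a value of type <code>shape</code> can only be built with one of the two constructors, malformed data of this kind cannot be represented in the first place.</p>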
<p>When OCaml 5 is released later this year, the language will get a significant upgrade, introducing support for shared-memory parallelism and native support for simple concurrent programming. Running programs on multiple cores will allow developers to considerably reduce the runtime of their projects by executing code in parallel. The quality-of-life updates to concurrent programming will make it easier for developers to write high-performance concurrent code.</p>
<p>So what are some of OCaml’s greatest strengths? Here are the key reasons why businesses use OCaml to solve complex, critical, and time-sensitive problems.</p>
<h3>1. OCaml Is Trusted By Several Prominent Companies</h3>
<p>Think of OCaml as an academic language? Think again! Owing to its reputation for reliability, safety, and performance, many businesses use it to solve real-world problems. Talented software engineers all over the world create robust, maintainable, energy-efficient, and fast solutions in OCaml for high-pressure environments where a single mistake can cost millions.</p>
<p><a href="https://www.docker.com">Docker</a> is making life easier for developers by providing them with a state-of-the-art integrated development pipeline that consolidates application components, all available conveniently on your desktop. Docker has over <a href="https://containerjournal.com/features/docker-inc-dev-tools-boast-15-million-users/">fifteen million</a> registered users worldwide and uses <a href="https://github.com/moby/vpnkit">VPNKit</a>, which is written in OCaml, in its Docker Desktop app to <a href="https://www.docker.com/blog/how-docker-desktop-networking-works-under-the-hood/">keep user networks secure.</a></p>
<p><a href="https://about.meta.com/company-info/">Meta</a> is a multiplatform company that uses tech to bring people together and build unique communities online. The social media giant uses OCaml in major parts of its infrastructure, such as the compiler and typechecker for its programming language Hack. Other Meta tooling that uses OCaml includes Infer, Flow, and the now retired Pfff.</p>
<p><a href="https://www.janestreet.com">Jane Street</a> is a quantitative trading firm that uses OCaml as the core solution for its research tools, trading systems, and accounting systems. Processing billions of dollars each day, Jane Street relies on OCaml to ensure it is done securely and quickly. Notably, nearly a million lines of its code are open source, and the firm works closely with the OCaml community to develop the language and its tools.</p>
<p>Other companies that use OCaml include <a href="https://ahrefs.com">Ahrefs</a>, an all-in-one SEO tool; <a href="https://www.nitrokey.com">Nitrokey</a>, a world-leading provider of open-source security hardware; and <a href="https://hyper.systems">Hyper</a>, who use OCaml to provide their customers with a unified data platform to manage large infrastructure. There are also two blockchains written in OCaml: <a href="https://tezos.foundation/">Tezos</a> and <a href="https://minaprotocol.com/">Mina</a>. When it comes to the modern landscape of software development, it is safe to say OCaml has an ever-growing part to play.</p>
<p><em>The Takeaway</em> : Several top companies are already using OCaml, so there is a strong tradition of successfully using OCaml on an industrial level.</p>
<h3>2. OCaml Has a Growing and Thriving Community</h3>
<p>The open-source community surrounding OCaml is diverse and flourishing. It congregates in several places online, like the <a href="https://discuss.ocaml.org">Discuss forum</a>, <a href="https://github.com/ocaml">GitHub repo</a>, and <a href="https://www.reddit.com/r/ocaml/">Reddit community</a>. Thanks to its increasing popularity, more and more <a href="https://ocaml.org/docs">tutorials and documentation</a> are becoming available online. Learning new languages with the help of online material is the trend nowadays, and OCaml welcomes all developers with open arms. In fact, there is a great book for learning OCaml <a href="https://www.cambridge.org/core/books/real-world-ocaml-functional-programming-for-the-masses/052E4BCCB09D56A0FE875DD81B1ED571">entirely available online</a> called <em>Real World OCaml</em>.</p>
<p>Over thirty universities currently teach OCaml, including the University of Cambridge, Paris-Diderot University, the Indian Institute of Technology Madras, Harvard, and Cornell University. Cornell University has a <a href="https://www.cs.cornell.edu/courses/cs3110/2022fa/">great textbook</a> on OCaml. Students who learn OCaml often end up becoming part of its open-source community, contributing to projects and launching initiatives of their own. Companies that use OCaml to provide their customers with great products also contribute to the OCaml community, as do academics, researchers, and hobbyists. All entry-points contribute to the development of the language, and users end up interacting with each other across these categories, each bringing their own unique perspective.</p>
<p><em>The Takeaway</em> : The community surrounding OCaml is vibrant and thriving, allowing you to invest your time and energy in OCaml knowing it’s here to stay.</p>
<h3>3. OCaml Offers Powerful Tools and Plenty of Support</h3>
<p>OCaml is not only an industrial-strength programming language, but it also provides industry-ready tooling and ecosystem support. The OCaml Platform is a curated set of tools that have broad community support. It includes all the tools you'd expect from an industrial-strength programming language, including a build system, package manager, editor support, and documentation generator. The OCaml Platform tells you whether one of these tools is active, under incubation, or deprecated. The OCaml Platform ensures that developers not only have an excellent language at hand but also have the tools to productively develop software with that language. Furthermore, the OCaml website has thorough resources on everything OCaml, including a comprehensive <a href="https://ocaml.org/docs/up-and-running#setting-up-development-tools">guide to setting up OCaml on your computer</a>.</p>
<p>The OCaml compiler is regularly updated and has a dedicated team focused on innovation, improving features, and fixing bugs. If you have a problem or need help, responses across the various OCaml forums are quick and supportive of new learners. Recently, the aforementioned great and comprehensive book <a href="https://www.cambridge.org/core/books/real-world-ocaml-functional-programming-for-the-masses/052E4BCCB09D56A0FE875DD81B1ED571">Real World OCaml</a> has been updated for its 2nd edition. This edition is available to download online as a free PDF to make learning OCaml even <a href="/blog/2022-10-14-real-world-ocaml-book-giveaway/">more accessible.</a></p>
<p>If you want to create something new in OCaml, or need help building something, there are several companies as well as hobbyist groups to consult. You can find like-minded people to discuss your ideas with on the <a href="https://discuss.ocaml.org">OCaml forum <em>Discuss</em></a>. Tarides is one of the companies that regularly work on OCaml, so you can <a href="/contact/">send us a message</a> if you’d like help with a project.</p>
<p><em>The Takeaway</em> : OCaml offers several up-to-date tools supported by an active group of contributors and maintainers. Using OCaml means getting help fast and having a complete developer environment with everything you need.</p>
<h3>4. OCaml is Secure-By-Design</h3>
<p>In today’s connected world, keeping yourself and your data safe online is not just a matter of convenience, but it's also a crucial task that can have serious personal and professional repercussions. Luckily, the OCaml programming language is built in a way that promotes safety, including features and design patterns that make it secure-by-design. It's easy to integrate formally verified code with OCaml programs, which makes OCaml more secure. Some formally verified libraries available in OCaml include <a href="https://hacl-star.github.io/">Microsoft's HACL*</a> and <a href="https://news.mit.edu/2019/fiat-cryptography-chrome-android-0617">MIT's Fiat</a>.</p>
<p>Secure-by-design is a known programming term which means that a language is constructed in a way that fundamentally promotes security and minimises vulnerabilities. A language that is secure-by-design makes it impossible to introduce a large class of security vulnerabilities into programs written using that language. A great example of how secure-by-design principles are implemented in OCaml is its type and memory safety, which prevents the most frequent kinds of attacks and crashes from ever happening.</p>
<p>Memory-safety attacks are extremely common, with approximately 70% of zero-day attacks being <a href="https://www.itpro.co.uk/security/zero-day-exploit/360447/why-zero-day-exploits-are-surging-on-an-unprecedented-scale">memory-safety attacks</a>. OCaml is memory safe because it doesn’t allow a pointer (a reference to a location in memory) to read or write outside the memory block it is authorised to use. As a result, an attacker can’t crash an OCaml program by manipulating where it writes data in memory, as OCaml simply doesn’t allow this to happen. This prevents crashes and exploits caused by memory errors, including buffer overflows, where a program is ‘tricked’ into writing more data than a block ‘allows.’</p>
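<p>A small sketch makes this concrete. The helper <code>read_slot</code> is invented for illustration, and the example assumes default compiler settings, since bounds checking can be switched off with the <code>-unsafe</code> flag. An out-of-range array access raises an exception that the program can handle, instead of silently touching neighbouring memory:</p>
<pre><code class="language-ocaml">(* Read one slot of an array, turning an out-of-range index into an
   error value instead of letting the exception escape. *)
let read_slot buffer index =
  match buffer.(index) with
  | value -> Ok value
  | exception Invalid_argument _ -> Error "index out of bounds"

let () =
  let buffer = [| 10; 20; 30 |] in
  let show index =
    match read_slot buffer index with
    | Ok value -> Printf.printf "slot %d holds %d\n" index value
    | Error reason -> Printf.printf "slot %d rejected: %s\n" index reason
  in
  show 1;
  (* Index 7 is past the end of the array, so the runtime check fires. *)
  show 7
</code></pre>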
<p>OCaml is also statically-typed and type-safe, meaning that it detects type errors at compile time, stops programs containing them from ever running, and limits which operations can be performed on which kinds of data. Both properties work to remove bugs and errors from the code, making programs written in OCaml more reliable and consistent.</p>
<p><em>The Takeaway</em> : Cybersecurity is an increasing area of concern, and OCaml has several built-in features that help make it secure-by-design. It’s an excellent choice for projects where security is paramount.</p>
<h3>5. OCaml is Big on Performance and Developer Productivity</h3>
<p>OCaml is hailed for striking a balance between a large number of advanced features and performance. For example, it has a very efficient compiler that’s divided into two parts: a bytecode compiler and a native-code compiler. The bytecode compiler is very quick and generates small, portable executables. The native-code compiler produces highly-efficient machine code.</p>
<p>Since OCaml also allows for some uses of imperative and object-oriented programming features, it’s possible to use them in places where they can help with performance. This flexibility of programming paradigms is another way that OCaml helps programmers increase the speed and efficiency of the code they write.</p>
<p>Type inference allows the language to infer what type is being used, removing the need for the developer to annotate every single variable in their code. This makes developing in OCaml faster than in many other languages that lack type inference. OCaml’s type system also helps the developer write complex algorithms while introducing fewer bugs, and optimise those algorithms for greater speed without compromising correctness or security. Furthermore, the presence and use of algebraic data types, higher order functions, and immutable data all make manipulating large and complex data structures much easier and faster.</p>
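<p>As a brief sketch of immutable data and higher-order functions working together (the <code>account</code> record and the <code>deposit</code> function are hypothetical examples, not code from a real system):</p>
<pre><code class="language-ocaml">(* Record fields are immutable unless explicitly declared mutable. *)
type account = { owner : string; balance : int }

(* Returns a new record; the account passed in is left untouched. *)
let deposit amount account =
  { account with balance = account.balance + amount }

let () =
  let accounts =
    [ { owner = "ada"; balance = 100 }; { owner = "alan"; balance = 50 } ]
  in
  (* List.map is a higher-order function: it takes (deposit 10) as an argument. *)
  let updated = List.map (deposit 10) accounts in
  List.iter (fun a -> Printf.printf "%s: %d\n" a.owner a.balance) updated;
  (* The original list still holds the old balances. *)
  assert ((List.hd accounts).balance = 100)
</code></pre>
<p>Since nothing is modified in place, the old and new lists can be used side by side without one update unexpectedly affecting the other.</p>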
<p>It is also worth noting that OCaml offers several strong tools for debugging programs. The type checker lets you eliminate many bugs at compile time, while the fast, interactive REPL and the powerful symbolic replay debugger help you track down those that remain at runtime. This makes OCaml an easy language to debug, which saves developer time and increases the productivity and speed of programming.</p>
<p><em>The Takeaway</em> : The OCaml language is strong on performance and has a lot of features that make the code run fast, while also making the development process more efficient.</p>
<h3>6. OCaml is Multicore!</h3>
<p>With the imminent release of OCaml 5, the language will support the use of multiple cores and have enhanced infrastructure in place for concurrent programming. Both bring significant performance boosts, allowing users to increase the speed of their programs.</p>
<p>The new I/O library Eio can serve more than one million requests per second, outperforming Go’s <code>net/http</code> and closely matching Rust’s <code>hyper</code>. Writing concurrent code will also be much easier in OCaml 5, just like writing regular OCaml. The ‘function colouring problem,’ whereby concurrent and non-concurrent code are incompatible, will also be a thing of the past after the new release. With OCaml 5, both kinds of code can coexist with minimal intervention on the part of the programmer.</p>
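<p>The sketch below hints at why function colouring goes away. It does not use Eio; it relies only on the <code>Effect</code> module from the OCaml 5 standard library, and the <code>Fetch</code> effect, the <code>price</code> and <code>total</code> functions, and the lookup table are all invented for illustration. The point is that <code>price</code> has the ordinary type <code>string -> int</code> even though it performs an effect, so it works with regular functions like <code>List.map</code>:</p>
<pre><code class="language-ocaml">open Effect
open Effect.Deep

(* A hypothetical effect: ask the surrounding handler for an item's price. *)
type _ Effect.t += Fetch : string -> int Effect.t

(* An ordinary-looking function of type string -> int. Nothing in its
   type marks it as special, even though it performs an effect. *)
let price item = perform (Fetch item)

(* Because price is a plain function, it composes with List.map. *)
let total items = List.fold_left ( + ) 0 (List.map price items)

(* The handler decides what Fetch means. Here it answers straight away
   from a table; a scheduler could instead park the continuation k and
   resume it once some I/O has completed. *)
let run_with_table table f =
  try_with f ()
    { effc = fun (type a) (eff : a Effect.t) ->
        match eff with
        | Fetch item ->
          Some (fun (k : (a, _) continuation) ->
            continue k (List.assoc item table))
        | _ -> None }

let () =
  let table = [ ("tea", 3); ("cake", 4) ] in
  let sum = run_with_table table (fun () -> total [ "tea"; "cake"; "tea" ]) in
  Printf.printf "total = %d\n" sum
</code></pre>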
<p>Multicore or parallel programming can greatly increase the efficiency of a program. By letting the computer use more than one core to execute the code, the program can do several things simultaneously rather than consecutively, although the achievable speed-up is bounded by the number of cores and by how much of the work can run in parallel. For complex tasks that take a long time, Multicore makes them more realistic and time-efficient options.</p>
<p>You can look forward to more posts on the <a href="/blog/">Tarides blog</a> about OCaml 5, the technology behind it, and its use cases.</p>
<p><em>The Takeaway</em> : OCaml 5 is coming, and with it, Multicore. OCaml will become even more powerful, both for parallel programming and concurrent programming, allowing users to significantly boost the speed of their projects.</p>
<h2>Conclusion</h2>
<p>Choosing the perfect programming language for you is an important but difficult task. It needs to be powerful and guarantee performance while simultaneously offering strong security features. Developers are also going to need state-of-the-art tools alongside responsive help and support. OCaml is a good candidate that offers all of the above, making it great for businesses looking for a versatile and robust programming language.</p>
<p>Combining the power of functional programming, Multicore, and open source, OCaml offers a potent mix of strong features and an engaged community. For more information about OCaml, you can visit the <a href="https://ocaml.org/about">OCaml Website</a> or <a href="/contact/">contact Tarides</a> to see how we can make OCaml work for you.</p>
]]></description><link>https://tarides.com/blog/2022-11-22-six-surprising-reasons-the-ocaml-programming-language-is-good-for-business</link><guid isPermaLink="false">https://tarides.com/blog/2022-11-22-six-surprising-reasons-the-ocaml-programming-language-is-good-for-business.html</guid><dc:creator><![CDATA[ Isabella Leandersson ]]></dc:creator><pubDate>Tue, 22 Nov 2022 00:00:00 GMT</pubDate></item><item><title><![CDATA[OCaml 5 at Open Source India 2022]]></title><description><![CDATA[<p>With OCaml 5 just around the corner, it's been a really exciting year to attend conferences all over the world. Just recently, I presented some highlights of the OCaml 5 update, building on <a href="https://www.youtube.com/watch?v=zJ4G0TKwzVc">KC Sivaramakrishnan's great keynote address</a> at the 19th annual <a href="https://opensourceindia.in">Open Source India 2022</a> conference. Since Tarides was invited to participate, I gave a talk on <em>OCaml 5: Language, Platform, and Ecosystem</em> by starting with OCaml's history and ending with a Multicore OCaml matrix implementation running on 120 cores!</p>
<p>Open Source India was held on 29-30 September 2022 as a physical event at the NIMHANS Convention Centre, Bengaluru, India. It was organised by the <a href="https://www.opensourceforu.com/">Open Source For You</a> magazine team in India, with the help of community and industry participation. The conference ran along multiple parallel "tracks": FOSS for Everyone, Developers, CXO Summit, DevOps, AI &amp; ML, Data Management, and IT Infrastructure. My talk was part of the Developers track on the second day.</p>
<h2>Day I</h2>
<p>The conference had many exhibits, and I interacted with a number of participants at the booths. <a href="https://www.mosip.io/">MOSIP</a> is an open source platform for national foundational identities. Some governments implement a digital identity system for their citizens, and MOSIP provides a robust, scalable, open source platform for governance. While they currently use <a href="https://github.com/mosip/registration/blob/master/db_scripts/README.md">PostgreSQL</a> as their database backend, it would be useful to re-model their backend to use <a href="https://irmin.io/">Irmin</a> as the data store for security reasons.</p>
<p>Another open-source, Business-to-Consumer (B2C) application was <a href="https://www.chatwoot.com/">Chatwoot</a>, a customer engagement and support platform that also uses PostgreSQL. It would be an interesting data modeling or solution architecture project to implement Irmin support for their chat application. The <a href="https://www.umwelten.xyz/dwelling/">Compossible Umwelten</a> company are working on wearable computing using ARM processors, and they were interested in exploring using OCaml, instead of C, for their customer products.</p>
<p>Post-lunch, I attended a talk on <em>Open Source at AWS</em> by Suman Debnath, Principal Developer Advocate, Data Engineering and Analytics at Amazon Web Services. We had the opportunity to discuss the possibility of making the OCaml Platform and products available through Amazon directly to end users.</p>
<p>In the afternoon, I took the time to attend the AI &amp; ML track. The <em>Adopting MLSecOps</em> talk was presented by Dibya Prakash, CTO and Principal Consultant at Neural Hub, and he introduced me to MLSecOps and best practices in the industry. This was followed by a talk on <em>Time Series Analysis: Anticipating Future with Darts</em> by Binitha MT and Subhankar Adak from Dell. It was an interesting first day at the conference with useful discussions on technology and real world experiences.</p>
<h2>Day II</h2>
<p>In the morning, I spent some time at the speaker's lounge reviewing the slides for OCaml 5, as well as setting up the demo for the Multicore OCaml code examples. I also had the chance to meet my colleague, Puneeth Chaganti, who works remotely from Bengaluru.</p>
<p><img sizes="(min-width: 1360px) 1360px, (min-width: 680px) 680px, 100vw" srcset="/blog/images/Vf8n4at-170w~ry3cQisUunFi66Ydit3syw.webp 170w, /blog/images/Vf8n4at-340w~JNygHqKDpIwD25R07JGnpQ.webp 340w, /blog/images/Vf8n4at-680w~yW7JmgJruaz26HT3WLuGAw.webp 680w, /blog/images/Vf8n4at-1360w~PprZ_dLaBTeaPuSCAdwpJw.webp 1360w" src="/blog/images/Vf8n4at-1360w~PprZ_dLaBTeaPuSCAdwpJw.webp" alt="Puneeth and Shakthi"></p>
<p>That afternoon, I gave my talk on <em>OCaml 5: Language, Platform, and Ecosystem</em>. After reviewing OCaml's history, I discussed the recent <a href="https://github.com/ocaml/ocaml/pull/10831">Multicore OCaml merge</a>, then delved into the syntax of the OCaml 5 language: basic types, operations, control structures, data structures, user types, functions, recursion, and I/O. Following this, I showed examples of <a href="https://github.com/ocaml-multicore/domainslib">domainslib</a> and <a href="https://github.com/ocaml-multicore/eio">Eio</a> before demonstrating the impressive Multicore OCaml matrix implementation running on 120 cores!</p>
<p>Additionally, I presented the various platform tools available in the OCaml community, including the <a href="https://opam.ocaml.org/">OCaml package manager (opam)</a>, the <a href="https://dune.build/">Dune</a> build system, <a href="https://ocaml.github.io/odoc/">odoc</a>, <a href="https://github.com/ocaml/ocaml-lsp">OCaml-LSP</a>, <a href="https://ocaml.github.io/merlin/">Merlin</a>, and <a href="https://github.com/realworldocaml/mdx">MDX</a>. I also introduced the following ecosystem projects: the <a href="https://github.com/ocaml-bench/sandmark">Sandmark</a> benchmarking suite, <a href="https://tezos.com/">Tezos</a> blockchain, <a href="https://irmin.io/">Irmin</a> database, <a href="https://mirageos.org/">MirageOS</a> library operating system, <a href="https://aantron.github.io/dream/">Dream</a> web framework, and <a href="https://ocaml.xyz/">OCaml Scientific Computing</a> project. I finished my talk with some useful references for OCaml. To my delight, the participants were curious to learn more!</p>
<p>The conference gave me a great opportunity to reach out to developers and make them aware of the current state of OCaml. It was good to share the platform and ecosystem projects with them so that they can get started with their contributions. I look forward to participating in more conferences and promoting the use of OCaml!</p>
]]></description><link>https://tarides.com/blog/2022-11-16-ocaml-5-at-open-source-india-2022</link><guid isPermaLink="false">https://tarides.com/blog/2022-11-16-ocaml-5-at-open-source-india-2022.html</guid><dc:creator><![CDATA[ Shakthi Kannan ]]></dc:creator><pubDate>Wed, 16 Nov 2022 00:00:00 GMT</pubDate></item><item><title><![CDATA[Presenting on Algebraic Effects at FP-SYD]]></title><description><![CDATA[<p>At ICFP this year, <a href="https://kcsrk.info/">KC Sivaramakrishnan</a> gave two talks that put OCaml 5 in the spotlight: his <a href="https://youtu.be/zJ4G0TKwzVc">keynote</a>, “Retrofitting Concurrency - Lessons from the Engine Room,” and the <a href="https://speakerdeck.com/kayceesrk/ocaml-5-dot-0">opening presentation</a> of the OCaml workshop, “OCaml 5.0 - Concurrent and Parallel Programming.” <em>Effect Handlers</em> feature heavily, as they are the foundations on which concurrency primitives were added to OCaml. Since I knew very little about <em>effects</em> in this context, I asked KC for some pointers on where to start with learning about them. He pointed me to the <a href="https://koka-lang.github.io">Koka programming language</a>, encouraging me to set it up, play with it, and see how its type systems work with effects and effect handlers.</p>
<p>Following the usual tradition of learning something by committing to give a talk on it, I signed up to speak about <em>algebraic effects</em> at my local functional programming meetup, <a href="https://www.meetup.com/FP-Syd/">FP-SYD</a>. I figured that I would have a friendly audience that would be forgiving of excessive hand waving! :)</p>
<h3>Brain Food</h3>
<p>I had an absolute blast learning about <em>algebraic effects</em>! Koka was really easy to install and use. Its rich set of examples and excellent documentation make it work really well as a place to explore the concepts of effect systems.</p>
<p>That got me started, but what really hooked me was discovering Andrej Bauer’s paper, <em>What is Algebraic about Algebraic Effects and Handlers.</em> As a mathematician, I found it to be an accessible way of getting to the topic's theoretical underpinnings (see <a href="https://github.com/yallop/effects-bibliography#2018">here</a> for the paper and videos). Next, Matija Pretnar’s <a href="https://www.eff-lang.org/handlers-tutorial.pdf">tutorial</a>, <em>An Introduction to Algebraic Effects and Handlers,</em> complemented Bauer's work really well.</p>
<p>As I learned more about the topic, I realised just how deep an area this is, and I realised how little I would know about it in four weeks of enthusiastic (but decidedly surface-level) reading. So, I decided to pitch the talk as an overview or a survey of the field. To make things concrete, I would also focus on examples from the effect handling systems of Koka and OCaml 5.</p>
<h3>The Talk</h3>
<p>October’s FP-SYD was on the 19th. Around 6pm in Sydney’s Central Business District, about twenty people showed up, ate pizza, and settled into general functional programming-related geekery. Haskell is very strongly represented in this community, and it turned out that most of the audience had not spent much time with effect systems, as their tools for working with effects have been Monads, Monad Transformers, and accompanying abstractions.</p>
<p>I enjoyed talking through what I had learned about Algebraic Effects. Since I've worked on Haskell teams, I connected with the community on the various ways computational effects are handled in pure and impure functional languages. It was great to reference Alexis King's recent work that resulted in delimited continuations <a href="https://github.com/ghc-proposals/ghc-proposals/blob/master/proposals/0313-delimited-continuation-primops.rst">being added to GHC</a>. The examples from Koka helped give folks a taste for the type systems that include effects and values. I also leaned on the excellent work of KC and the Tarides team, borrowing heavily from the material they have written on effect handlers for the OCaml manual.</p>
<p>The talk certainly delivered, as it got me started on the vast topic of <em>algebraic effects</em>. It also piqued the curiosity of at least a few people in my community, so that’s not nothing! :)</p>
<h3>Acknowledgements</h3>
<p>I am particularly grateful to KC for getting me curious and giving me material and inspiration to get started. Thanks also to Sudha Parimala for putting together a comprehensive tutorial on parallelism and effects in OCaml and for answering questions and making helpful suggestions.  My colleague and friend Tim McGilchrist runs the FP-SYD meetup, so he helped by asking a lot of really good questions, listening to a draft version of the talk, and generally providing support and encouragement. Thanks, Tim! :)</p>
<p>The slides of my talk are available on the FP-SYD <a href="https://github.com/fp-syd/meetings/blob/master/2022/2022-10-Keswani-Algebraic-Effects-Survey.pdf">repository</a>.</p>
]]></description><link>https://tarides.com/blog/2022-11-15-presenting-on-algebraic-effects-at-fp-syd</link><guid isPermaLink="false">https://tarides.com/blog/2022-11-15-presenting-on-algebraic-effects-at-fp-syd.html</guid><dc:creator><![CDATA[ Navin Keswani ]]></dc:creator><pubDate>Tue, 15 Nov 2022 00:00:00 GMT</pubDate></item><item><title><![CDATA[Towards Minimal Disk-Usage for Tezos Bakers]]></title><description><![CDATA[<p>Over the last few months, Tarides has focused on designing,
prototyping, and integrating a new feature for Tezos bakers: automatic
context pruning for rolling and full nodes. This feature will allow
bakers to run Tezos with minimal disk usage while continuing to enjoy
<a href="/blog/2022-04-26-lightning-fast-with-irmin-tezos-storage-is-6x-faster-with-1000-tps-surpassed/">12x more responsive operations</a>. The first version has been
released with <a href="https://forum.tezosagora.org/t/octez-v15-0-has-been-released">Octez v15</a>. The complete, more optimised context pruning
feature will come with Octez v16. We encourage every Tezos baker to
upgrade and give feedback.</p>
<p><em>We have implemented context pruning for rolling and full nodes, which
requires ~35GB of disk for storing 6 cycles in the upper layer. In
Octez v15, each subsequent pruning run needs an additional 40GB, but
that space is recovered when the operation finishes. We plan to remove
that extra requirement in Octez v16.</em></p>
<h2>Improve Space Usage with Context Pruning</h2>
<p>The Tezos context is a versioned key/value store that associates
each block with a view of its ledger state. The versioning uses concepts
similar to Git. The current implementation uses <a href="https://irmin.org">Irmin</a> as its
backend and is abstracted by the <a href="https://ocaml.org/p/tezos-shell-context/14.0/doc/index.html">lib_context</a> library.</p>
<p>We have been designing, prototyping, and integrating a new structure
for Irmin storage. It is now reorganised into two
layers: one upper layer that contains the latest cycles of the
blockchain, which are still in use, and a lower layer containing
older, frozen data. A new garbage collection feature (GC) periodically
restructures the Tezos context by removing unused data in the oldest
cycles from the upper layer, where only the data still accessible from
the currently live cycles are preserved. The first version of the GC,
available in Octez v15, is optimised for rolling and full nodes and
thus does not contain a lower layer. We plan to extend this feature in
Octez v17 to dramatically improve the archive nodes' performance by
moving the unused data to the lower layer (more on this below).</p>
<p>Garbage collection and subsequent compression of live data improve
disk and kernel cache performance, which enhances overall node
performance. Currently, rolling node operators must apply a
manual cleanup process to release space on the disk by discarding
unused data. The manual cleanup is tedious and error-prone. Operators
could discard valuable data, have to stop their baker, or try to devise
semi-automatic procedures and run multiple bakers to avoid
downtime. The GC feature provides rolling node operators with
a fully automated method to clean up the unused data and guarantees
that only the unused data is discarded, i.e., <em>all</em> currently used data
is preserved.</p>
<p>The GC operation is performed asynchronously with minimal impact on
the Tezos node. In the rolling node's case, a GC'd context uses less
disk space and has more stable performance throughout,
as the protocol operations (such as executing smart contracts or
computing baking rewards) only need data from the upper layer. As
such, the nodes that benefit from the store's layered structure don't
need to use the manual snapshot export/import—previously necessary when
the disk’s context got too big. In the future, archive nodes’
performance will improve because only the upper layer is needed to
validate recent blocks. <em>This means archive nodes can bake as reliably
as rolling nodes.</em></p>
<h2>Tezos Storage, in a Nutshell</h2>
<p>The Tezos blockchain uses <a href="https://irmin.org">Irmin</a> as the main storage component. Irmin
is a library to design Git-like storage systems. It has many backends,
and one of them is <a href="https://ocaml.org/p/irmin-pack/"><code>irmin-pack</code></a>, which is optimised for the Tezos use
case. In the following, we focus on the main file used to store
object data: the store <code>pack</code> file.</p>
<p><strong>Pack file:</strong> Tezos state is serialised as immutable functional objects.
These objects are marshalled in an append-only <code>pack</code> file, one after the
other. An object can contain pointers to the file's earlier (but not
later!) objects. Pointers to an earlier object are typically
represented by the offset (position) of the earlier object in the
<code>pack</code> file. The <code>pack</code> file is append-only: existing objects are
never updated.</p>
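<p>To make the append-only discipline concrete, here is a minimal in-memory sketch. It is only a toy model, not the <code>irmin-pack</code> API: the names <code>obj</code>, <code>pack</code>, <code>append</code>, and <code>read</code> are hypothetical, and the one-byte header added to each object's size is an assumption made purely so that distinct objects get distinct offsets.</p>
<pre><code class="language-ocaml">(* Toy model of an append-only pack file. All names are hypothetical;
   this is not the irmin-pack API. *)
module IntMap = Map.Make (Int)

type obj = {
  payload : string;     (* marshalled bytes of the object *)
  children : int list;  (* offsets of the earlier objects it points to *)
}

type pack = {
  objects : obj IntMap.t;  (* offset -> object *)
  end_offset : int;        (* offset at which the next object is written *)
}

let empty = { objects = IntMap.empty; end_offset = 0 }

(* Append an object and return the new pack together with the object's
   offset. Every child must already be in the file, so pointers can only
   go backwards. Existing entries are never modified. *)
let append pack ~payload ~children =
  List.iter
    (fun c ->
      if not (IntMap.mem c pack.objects) then
        invalid_arg "append: pointer to an object that is not in the file")
    children;
  let offset = pack.end_offset in
  let objects = IntMap.add offset { payload; children } pack.objects in
  (* Assumed layout: one header byte followed by the payload. *)
  ({ objects; end_offset = offset + 1 + String.length payload }, offset)

let read pack offset = IntMap.find_opt offset pack.objects

(* Example: a leaf, then a node that points back at it. *)
let pack1, leaf = append empty ~payload:"leaf" ~children:[]
let pack2, node = append pack1 ~payload:"node" ~children:[ leaf ]
</code></pre>
<p>In this model an object is identified purely by its offset, which is why any later reorganisation of the file has to preserve a way to resolve the old offsets.</p>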
<p><img sizes="(min-width: 1360px) 1360px, (min-width: 680px) 680px, 100vw" srcset="/blog/images/2022-11-10.tezos-GC/J7N0pil-170w~FoyO0qqXT26NO2jj3YjsEQ.webp 170w, /blog/images/2022-11-10.tezos-GC/J7N0pil-340w~O0_bSoo6adsIQz9adgh9wA.webp 340w, /blog/images/2022-11-10.tezos-GC/J7N0pil-680w~EKam_urWfA7rTVRg0tjPag.webp 680w, /blog/images/2022-11-10.tezos-GC/J7N0pil-1360w~IW99Ifom36edN4O-nmX-Ng.webp 1360w" src="/blog/images/2022-11-10.tezos-GC/J7N0pil-1360w~IW99Ifom36edN4O-nmX-Ng.webp" alt="An Irmin pack file as a sequence of objects"></p>
<blockquote>
<p>An Irmin <code>pack</code> file as a sequence of objects: | obj | obj | obj | ...</p>
</blockquote>
<p><strong>Commit objects:</strong> Some of the objects in the <code>pack</code> file are commit
objects. A commit, together with the objects reachable from that
commit, represents the state associated with a Tezos block. The
Tezos node only needs the last commit to process new blocks, but
bakers will need a lot more commits to compute baking rewards.
Objects not reachable from these commits are unreachable, or dead,
objects.</p>
<p><img sizes="(min-width: 1360px) 1360px, (min-width: 680px) 680px, 100vw" srcset="/blog/images/2022-11-10.tezos-GC/DQJJLll-170w~56qxTiIxK1bSdGuPY0FmmQ.webp 170w, /blog/images/2022-11-10.tezos-GC/DQJJLll-340w~XCK1ms2wZJh1yEIuCN7kFw.webp 340w, /blog/images/2022-11-10.tezos-GC/DQJJLll-680w~dJMsW_yuklNHWqnPtEelWA.webp 680w, /blog/images/2022-11-10.tezos-GC/DQJJLll-1360w~EjjZHXa5fVggVW_4Lk1J7w.webp 1360w" src="/blog/images/2022-11-10.tezos-GC/DQJJLll-1360w~EjjZHXa5fVggVW_4Lk1J7w.webp" alt="The data-structure (mental model) representation of the pack file vs. its physical representation"></p>
<blockquote>
<p>The data-structure (mental model) representation of the <code>pack</code> file vs. its physical representation.</p>
</blockquote>
<p><strong>Archive nodes and rolling nodes:</strong> There are different types of
Tezos nodes. An archive node stores the complete blockchain
history from the genesis block. Currently, this is over <em>2 million</em>
blocks. Roughly speaking, a block corresponds to a commit. A
rolling node stores only the last <em>n</em> blocks, where <em>n</em> is chosen
to keep the total disk usage within some bounds. This may be as small
as 5 (or even fewer) or as large as 40,000 or more. Another type of
node is the "full node," which is between an archive node and a
rolling node.</p>
<p><strong>Rolling nodes, disk space usage:</strong> The purpose of the rolling node
is to keep resource usage, particularly disk space, bounded by only
storing the last blocks. However, the current implementation does
not achieve this aim. As rolling nodes execute, the <code>pack</code> file
grows larger and larger, and no old data is discarded. To get around
this problem, node operators periodically export snapshots of the
current blockchain state from the node, delete the old data,
and then import the snapshot state back.</p>
<p><strong>Problem summary:</strong> The main problem we want to avoid is Tezos users
having to periodically export and import the blockchain state to
keep the disk usage of the Tezos node bounded. Instead, we want to
perform context pruning via automatic garbage collection of unreachable
objects. Periodically, a commit should be chosen as the GC
root. Objects constructed before the commit that are not
reachable from it should be considered dead and removed from
the <code>pack</code> store so that their disk space can be reclaimed. The problem is that
with the current implementation of the <code>pack</code> file, which is just an
ordinary file, it is impossible to "delete" regions corresponding to
dead objects and reclaim the space.</p>
<h2>Automatised Garbage Collection Solution</h2>
<p>Consider the following <code>pack</code> file, where the <code>GC-commit</code> object has
been selected as the commit root for garbage collection:</p>
<p><img sizes="(min-width: 1360px) 1360px, (min-width: 680px) 680px, 100vw" srcset="/blog/images/2022-11-10.tezos-GC/ySiXa1r-170w~lsZwHAHexF48v62PDUbLrg.webp 170w, /blog/images/2022-11-10.tezos-GC/ySiXa1r-340w~2nJqRieO2BJSNLm1y8aPWg.webp 340w, /blog/images/2022-11-10.tezos-GC/ySiXa1r-680w~HG_CQp9RxPVAEcw7o5psKg.webp 680w, /blog/images/2022-11-10.tezos-GC/ySiXa1r-1360w~P-DgaKmdrxTaVDs8N8c33g.webp 1360w" src="/blog/images/2022-11-10.tezos-GC/ySiXa1r-1360w~P-DgaKmdrxTaVDs8N8c33g.webp" alt="A graph displaying the commit root, the GC-commit node and a written object node"></p>
<p>Objects that precede the commit root are either reachable from the
commit (by following object references from it) or not. For the
unreachable objects, we want to reclaim the disk space. For reachable
objects, we need to be able to continue to access them via their
offset in the <code>pack</code> file.</p>
<p>The straightforward solution is to implement the <code>pack</code> file using two
other data structures: the <code>suffix</code> and the <code>prefix</code>. The <code>suffix</code>
file contains the root commit object (<code>GC-commit</code>) and the live
objects represented by <em>all</em> bytes following the offset of <code>GC-commit</code>
in the <code>pack</code> file. The <code>prefix</code> file contains all the objects
reachable from the root commit, indexed by their offset. Note that the
reachable objects appear earlier in the <code>pack</code> file than the root
commit.</p>
<p><img sizes="(min-width: 1360px) 1360px, (min-width: 680px) 680px, 100vw" srcset="/blog/images/2022-11-10.tezos-GC/QVeXtOB-170w~tEvoGcgFePnyxGEKvMtYZg.webp 170w, /blog/images/2022-11-10.tezos-GC/QVeXtOB-340w~L-znXupx11zTnE70cze2sQ.webp 340w, /blog/images/2022-11-10.tezos-GC/QVeXtOB-680w~WSurFXSDNUkq9be6PeZOfQ.webp 680w, /blog/images/2022-11-10.tezos-GC/QVeXtOB-1360w~d1mSqVBS9nomeJgoazu5Sg.webp 1360w" src="/blog/images/2022-11-10.tezos-GC/QVeXtOB-1360w~d1mSqVBS9nomeJgoazu5Sg.webp" alt="The layered structure of the pack file with prefix+suffix as the upper layer."></p>
<blockquote>
<p>The layered structure of the <code>pack</code> file with <code>prefix</code>+<code>suffix</code> as the upper layer.</p>
</blockquote>
<p>Reading from the <code>pack</code> file is then simulated in an obvious way: if
the offset is for the <code>GC-commit</code>, or later, we read from the <code>suffix</code>
file; otherwise, we look up the offset in the <code>prefix</code> and return
the appropriate object. We only access the reachable objects in the
<code>prefix</code> via their offset.  We replace the Irmin <code>pack</code> file with
these two data structures. Every time we perform garbage collection
from a given <code>GC-commit</code>, we create the next versions of the <code>prefix</code>
and <code>suffix</code> data-structures and <em>switch</em> from the current version to the next
version by deleting the old <code>prefix</code> and <code>suffix</code> to reclaim
disk space. Creating the next versions of the <code>prefix</code> and <code>suffix</code>
data-structures is potentially expensive. Hence, we implement these steps in a
separate process, the <em>GC worker</em>, with minimal impact on the running
Tezos node.</p>
<p><strong>Caveat:</strong> Following Git, a commit will typically reference its
parent commit, which will then reference its parent, and so
on. Clearly, if we used these references to calculate object
reachability, all objects would remain reachable forever. However,
this is not what we want, so when calculating the set of reachable
objects for a given commit, we ignore the references from a commit
to its parent commit.</p>
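<p>The caveat above can be sketched as a small graph traversal. This is an illustration under stated assumptions rather than Irmin's implementation: the <code>obj</code> type, with a commit that points to one root object and an optional parent, is a simplified, hypothetical shape, and offsets stand in for pointers.</p>
<pre><code class="language-ocaml">module IntSet = Set.Make (Int)
module IntMap = Map.Make (Int)

(* Hypothetical object shapes: a commit points to a root object and to
   its parent commit; a node points to earlier objects. *)
type obj =
  | Commit of { root : int; parent : int option }
  | Node of int list

(* Offsets that the traversal follows from an object. The parent pointer
   of a commit is deliberately left out. *)
let successors = function
  | Commit { root; parent = _ } -> [ root ]
  | Node children -> children

(* Set of offsets reachable from [gc_commit], found with a worklist. *)
let reachable (pack : obj IntMap.t) (gc_commit : int) : IntSet.t =
  let rec visit seen = function
    | [] -> seen
    | off :: rest ->
        if IntSet.mem off seen then visit seen rest
        else
          let next =
            match IntMap.find_opt off pack with
            | Some o -> successors o
            | None -> []
          in
          visit (IntSet.add off seen) (next @ rest)
  in
  visit IntSet.empty [ gc_commit ]

(* Example: commit 4 has commit 1 as its parent, but shares no data with it. *)
let example =
  IntMap.of_seq
    (List.to_seq
       [ (0, Node []);
         (1, Commit { root = 0; parent = None });
         (2, Node []);
         (3, Node [ 2 ]);
         (4, Commit { root = 3; parent = Some 1 }) ])

let live = IntSet.elements (reachable example 4)
</code></pre>
<p>In the example, only offsets 2, 3, and 4 are live. If <code>successors</code> also returned the parent, offsets 0 and 1 would stay reachable forever, which is exactly the behaviour the caveat rules out.</p>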
<h2>The <code>prefix</code> Data-Structure</h2>
<p>The <code>prefix</code> is a persistent data-structure that implements a map from
offsets in the <code>pack</code> file to objects (the marshalled bytes
representing an object). In our scenario, the GC worker creates the
<code>prefix</code>, which is then read-only for the main process. Objects are
never mutated or deleted from the <code>prefix</code> file. In this setting, a
straightforward implementation of an object store suffices: we store
reachable objects in a data file and maintain a persistent <code>(int → int)</code> map from "offset in the original <code>pack</code> file" to "offset in the
<code>prefix</code> file."</p>
<p><strong>Terminology:</strong> We introduce the term "virtual offset" for "offset in
the original <code>pack</code> file" and the term "real offset" for "offset in
the <code>prefix</code> file." The map from virtual offsets to real
offsets is made persistent as the <code>mapping</code> file.</p>
<p><strong>Example:</strong> Consider the following, where the <code>pack</code> file contains
reachable objects <code>o1</code> .. <code>o10</code> (with virtual offsets <em>v1 .. v10</em>, respectively):</p>
<p><img sizes="(min-width: 1360px) 1360px, (min-width: 680px) 680px, 100vw" srcset="/blog/images/2022-11-10.tezos-GC/JKWA4ff-170w~F_ZHwWsr93ffLLI4P60A6w.webp 170w, /blog/images/2022-11-10.tezos-GC/JKWA4ff-340w~tqCeyOSgHZvjF82Ij_IYaQ.webp 340w, /blog/images/2022-11-10.tezos-GC/JKWA4ff-680w~fwLcxkd5J-wycQmqsgc5lQ.webp 680w, /blog/images/2022-11-10.tezos-GC/JKWA4ff-1360w~2uOeIXhU4CoCIGUfyHV0Hg.webp 1360w" src="/blog/images/2022-11-10.tezos-GC/JKWA4ff-1360w~2uOeIXhU4CoCIGUfyHV0Hg.webp" alt="A complicated example graph"></p>
<p>Note that the objects <code>o1</code> .. <code>o10</code> are scattered throughout the
<code>pack</code> file where they appear in ascending order (i.e., <em>v1 &lt; .. &lt;
v10</em>). The <code>prefix</code> file contains the same objects but with different
"real" offsets <em>r1..r10</em>, as now the objects <code>o1 .. o10</code> appear one
after the other. The <code>mapping</code> needs to contain an entry <em>(v1 → r1)</em>
for object <code>o1</code> (and similarly for the other objects) to relate the
virtual offset in the <code>pack</code> file with the real offset in the <code>prefix</code>
file.</p>
<p>To read from "virtual offset <em>v3</em>" (say), we use the map to retrieve
the real offset in the <code>prefix</code> file (i.e., <em>r3</em>) and then read the object
data from that position.</p>
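<p>Here is a minimal sketch of that lookup, together with the read dispatch between <code>prefix</code> and <code>suffix</code> described earlier. The in-memory maps stand in for the on-disk files, and the names <code>store</code>, <code>build_prefix</code>, and <code>read</code> are hypothetical rather than taken from Irmin. Objects are assumed to be non-empty, so that each one gets its own real offset.</p>
<pre><code class="language-ocaml">module IntMap = Map.Make (Int)

(* Hypothetical in-memory stand-ins for the files described above. *)
type store = {
  gc_offset : int;           (* virtual offset of GC-commit *)
  mapping : int IntMap.t;    (* virtual offset -> real offset *)
  prefix : string IntMap.t;  (* real offset -> object bytes *)
  suffix : string IntMap.t;  (* virtual offset -> object bytes *)
}

(* Pack the reachable objects one after the other, in ascending order of
   virtual offset, and record where each of them ends up. *)
let build_prefix reachable =
  let sorted = List.sort (fun (a, _) (b, _) -> compare a b) reachable in
  let mapping, prefix, _ =
    List.fold_left
      (fun (mapping, prefix, real) (virt, bytes) ->
        ( IntMap.add virt real mapping,
          IntMap.add real bytes prefix,
          real + String.length bytes ))
      (IntMap.empty, IntMap.empty, 0)
      sorted
  in
  (mapping, prefix)

(* Offsets at or after GC-commit are served by the suffix; earlier ones
   are translated through the mapping and served by the prefix. *)
let read store virt =
  if virt >= store.gc_offset then IntMap.find_opt virt store.suffix
  else
    match IntMap.find_opt virt store.mapping with
    | Some real -> IntMap.find_opt real store.prefix
    | None -> None (* dead object: its bytes were not copied *)

(* Example: three reachable objects scattered before a GC-commit at 100. *)
let store =
  let mapping, prefix = build_prefix [ (40, "ccc"); (7, "a"); (19, "bb") ] in
  { gc_offset = 100; mapping; prefix; suffix = IntMap.singleton 100 "commit" }
</code></pre>
<p>In the example, the objects at virtual offsets 7, 19, and 40 land at real offsets 0, 1, and 3, so the <code>prefix</code> is dense even though the original offsets were not.</p>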
<h2>Asynchronous Implementation</h2>
<p>Tezos Context pruning is performed periodically. We want each round of
context pruning to take place asynchronously with minimal impact
on the main Tezos node. For this reason, when a commit is chosen as
the GC root, we fork a worker process to construct the next <code>prefix</code>
and <code>suffix</code> data structures. When the GC worker terminates, the <code>main</code> process
handles worker termination. It switches from the current
<code>prefix</code>+<code>suffix</code> to the next and continues operation. This
switch takes place almost instantaneously. The hard work is done in
the worker process as depicted next:</p>
<p><img sizes="(min-width: 1360px) 1360px, (min-width: 680px) 680px, 100vw" srcset="/blog/images/2022-11-10.tezos-GC/lob23OH-170w~WUTdwLeZ_mQqmDKNWjhttA.webp 170w, /blog/images/2022-11-10.tezos-GC/lob23OH-340w~fZ1_FrkcSHYowTNGHwJKaw.webp 340w, /blog/images/2022-11-10.tezos-GC/lob23OH-680w~DCBG8Yag1yN8JjbiQqmBNQ.webp 680w, /blog/images/2022-11-10.tezos-GC/lob23OH-1360w~iyZaKXWvdEKCP26JJPt9AA.webp 1360w" src="/blog/images/2022-11-10.tezos-GC/lob23OH-1360w~iyZaKXWvdEKCP26JJPt9AA.webp" alt="State machine for the main and worker processes"></p>
<p><strong>Read-only Tezos nodes:</strong> In addition to the main Tezos read/write
node that accesses the <code>pack</code> store, several read-only nodes also
access the <code>pack</code> store (and other Irmin data files) in read-only
mode. These must be synchronised when the switch is made from the
current <code>prefix</code>+<code>suffix</code> to the next <code>prefix</code>+<code>suffix</code>. This
synchronisation makes use of a (new) single control file.</p>
<h2>Further Optimisations in the Octez Storage Layer</h2>
<p>The context pruning via automatic garbage collection performs well and
within the required constraints. However, it is possible to make
further efficiency improvements. We next describe some potential
optimisations we plan to work on over the next months.</p>
<p><strong>Resource-aware garbage collection:</strong></p>
<p>The GC worker intensively uses disk, memory, and OS resources. For
example, disk and memory usage double during the
asynchronous execution of the GC worker. We plan to improve on this by
more intelligent use of resources. For example, computing the
reachable objects during the GC involves accessing earlier objects,
using a lot of random-access reads, with unpredictable latency. A more
resource-aware usage of the file system would ensure that the objects are
visited (as much as possible) in order of increasing offset on
disk. This takes advantage of the fact that sequential file access is
much quicker and more predictable than accessing the file randomly. The work on
context pruning via a resource-aware garbage collection is planned to
be included in Octez v16.</p>
<p><strong>Retaining older objects for archive nodes:</strong></p>
<p>Archive nodes contain the complete blockchain history, starting from
the genesis block. This results in a huge store <code>pack</code> file, many
times larger than the kernel’s page cache. Furthermore, live objects
are distributed throughout this huge file, which makes it difficult
for OS caching to work effectively. As a result, as the store becomes
larger, the archive node becomes slower.</p>
<p>In previous prototypes of the layered store, the design also included a
"lower" layer. For archive nodes, the lower layer contained all the
objects before the most recent <code>GC-commit</code>, whether they were reachable
or not. The lower layer was effectively the full segment of the
<code>pack</code> file before the GC commit root.</p>
<p>One possibility with the new layout introduced by the GC is to retain the
lower layer whilst still sticking with the <code>prefix</code> and <code>mapping</code> files
approach and preferentially reading from the <code>prefix</code> where
possible. The advantage (compared with just keeping the full <code>pack</code>
file) is that the <code>prefix</code> is a dense store of reachable objects,
improving OS disk caching and the snapshot export performance for
recent commits. In addition, the OS can preferentially cache the
<code>prefix</code>&amp;<code>mapping</code>, which enhances general archive node performance
compared with trying to cache the huge <code>pack</code> file. As baking
operations only need to access these cached objects, their performance
will be more reliable and thus will reduce endorsement misses
drastically. However, some uses of the archive node, such as
responding to RPC requests concerning arbitrary blocks, would still
access the lower layer, so they will not benefit from this
optimisation. The work on improving performance for archive nodes is
planned for Octez v17.</p>
<h2>Conclusion</h2>
<p>With the context pruning feature integrated, Tezos rolling and full nodes
accurately maintain all and only used storage data in a performant,
compact, and efficient manner. Bakers will benefit from these changes in
Octez v15, while the feature will be included in archive nodes in
Octez v17.</p>
]]></description><link>https://tarides.com/blog/2022-11-10-towards-minimal-disk-usage-for-tezos-bakers</link><guid isPermaLink="false">https://tarides.com/blog/2022-11-10-towards-minimal-disk-usage-for-tezos-bakers.html</guid><dc:creator><![CDATA[ Irmin Team ]]></dc:creator><pubDate>Thu, 10 Nov 2022 00:00:00 GMT</pubDate></item><item><title><![CDATA[The MirageOS Retreat: A Journey of Food, Cats, and Unikernels]]></title><description><![CDATA[<p><a href="https://mirage.io/">MirageOS</a> is an OCaml ecosystem to construct <a href="https://en.wikipedia.org/wiki/Unikernel">unikernels</a>, i.e., minimal operating systems. Here, we write about our social and technical experience at the MirageOS retreat in Morocco, as well as the vibe and wonderful organisational details. To sum up the technical part, we worked on different facets of the MirageOS world: different kinds of unikernels, some groundwork for Raspberry Pi 4 bare-metal unikernels, and a workflow to leverage an existing deployment/orchestrating infrastructure. The MirageOS retreat was amazing!</p>
<h2>About Our Journey</h2>
<p>Our journey started in Agadir, a Moroccan city right on the coast of the Atlantic Ocean, just south of the Atlas mountains. In Agadir, we had the best fish in the world (according to some) and amazing "cornes de gazelle," a delicious sample of Moroccan culture.</p>
<p>From Agadir, we went to Mirleft, a small town further south, full of square roads and beautiful reefs. That's where the MirageOS retreat took place. The venue had a kitchen and an amazing cook, a place for computers and presentations, a garden with a small pool, and a rooftop with dusty but nice views of the coast.</p>
<p><img sizes="(min-width: 1360px) 1360px, (min-width: 680px) 680px, 100vw" srcset="/blog/images/mirleft_view-170w~ethL-inrEx5xee4Obq7HQw.webp 170w, /blog/images/mirleft_view-340w~IFwcgHfrqnvGIdEC-O5yGg.webp 340w, /blog/images/mirleft_view-680w~4h-DP2WGEfQlp6XfGC-fHA.webp 680w, /blog/images/mirleft_view-1360w~xTNvW-PfjxELwfMRSzpW2Q.webp 1360w" src="/blog/images/mirleft_view-1360w~xTNvW-PfjxELwfMRSzpW2Q.webp" alt="Beautiful Mirleft Sunset"></p>
<h2>About the Retreat</h2>
<p>Both the venue and Mirleft as a whole were extremely inspiring in many ways. Hacking on MirageOS was, of course, the main reason we came, but we also enjoyed amazing food, saw old and new friends, and had a great time collaborating and creating with MirageOS.</p>
<p>At least once a year since the <a href="https://mirage.io/blog/2016-spring-hackathon">first MirageOS retreat in 2016</a> (with a Covid break in 2021), people get together and work on anything related to MirageOS. These retreats provide a great atmosphere, working environment, and everything else that's needed to be productive and to have a wonderful time.</p>
<p>Besides, the retreat is always a nice opportunity to <a href="https://en.wikipedia.org/wiki/Eating_your_own_dog_food">eat our own dog food</a>.</p>
<p>The organiser, <a href="https://twitter.com/h4nnes">Hannes</a> (among others), always makes sure that most of the infrastructure we rely on is running on MirageOS as much as possible. A welcome addition this year was a local <a href="https://hannes.robur.coop/Posts/OpamMirror">opam cache</a>, which allowed us to download and install packages without crushing the data allowance on the SIM card installed on our main access point.</p>
<p><img sizes="(min-width: 1360px) 1360px, (min-width: 680px) 680px, 100vw" srcset="/blog/images/Mirleft_venue-170w~e5uqXqAngyrcHWtKeiiQWw.webp 170w, /blog/images/Mirleft_venue-340w~LGMQx9vmeSO4Uod2pybyEA.webp 340w, /blog/images/Mirleft_venue-680w~Y_2OdfuAH-WdvjweskCwUg.webp 680w, /blog/images/Mirleft_venue-1360w~58rgxX8X5VUpexpnf7t8Pw.webp 1360w" src="/blog/images/Mirleft_venue-1360w~58rgxX8X5VUpexpnf7t8Pw.webp" alt="Lunch at the MirageOS Retreat"></p>
<h2>About MirageOS</h2>
<p><a href="https://mirage.io/">MirageOS</a> is an ecosystem that constructs unikernels. In a superficial nutshell, a <a href="https://en.wikipedia.org/wiki/Unikernel">unikernel</a> is a machine image that contains one process and a minimal set of operating system features the process requires. Unikernels are designed to be secure, efficient, and small. MirageOS unikernels are written in OCaml, a functional, semantically rich and type-safe programming language.</p>
<p>MirageOS can be used in a wide range of settings, like robust reimplementations of core system services and protocols (<a href="https://github.com/mirage/ocaml-dns">DNS</a>, <a href="https://github.com/mirage/awa-ssh">SSH</a>, <a href="https://github.com/mirleft/ocaml-tls">TLS</a>, and <a href="https://github.com/mirage/">many more</a>), as well as higher level applications like <a href="https://hannes.robur.coop/Posts/OpamMirror">web services</a>. It's also on its way to becoming a good candidate for bare-metal applications on various chipsets (e.g., a <a href="https://github.com/dinosaure/gilbraltar">good choice for the Raspberry Pi 4</a>; see also the section below on <em>Implementing a Jack Port Driver</em>).</p>
<h2>Our Projects &amp; What We Learned</h2>
<p>We worked on lots of interesting things, but let's start with the ones that directly relate to MirageOS.</p>
<h3>Deploying Albatross on NixOS: No More iptables Debugging</h3>
<p><a href="https://github.com/roburio/albatross">Albatross</a> is an orchestrator for MirageOS unikernels. It runs on a Linux system and manages unikernels using Solo5. It's made of several services, one of which is the remote TLS endpoint, which accepts requests from the network to manage the orchestrator.</p>
<p>Some of us wanted to run Albatross on our favourite Linux distribution, <a href="https://github.com/NixOS/nixpkgs">NixOS</a>, and we hoped to be able to hack around this quickly; however, it turned out to be harder than expected. We learned so much about systemd and networking while doing this project.</p>
<p>A Nix flake (a new way of defining packages, which comes with many rough edges) and a NixOS module were added to the main repository in <a href="https://github.com/roburio/albatross/pull/120">this PR</a>.
To test that it works and to play with it, we've written a <a href="https://github.com/Julow/albatross-nixos-example">small tutorial</a> that explains how to build a Qemu VM with Albatross and how to deploy a unikernel using the remote TLS endpoint.</p>
<h3>Coffee Chat Bot: a Friendly Unikernel for a Friendly Work Environment</h3>
<p>Some of us worked on deploying a coffee chat bot as a MirageOS unikernel. Contrary to how it sounds, it isn't a robot that serves coffee (which would be extremely awesome)! Instead, it's a Slack bot that lets people on our company's Slack channel opt in for a coffee chat with a colleague. The coffee chat bot then matches each opt-in randomly with another opt-in.</p>
<p><img sizes="(min-width: 1360px) 1360px, (min-width: 680px) 680px, 100vw" srcset="/blog/images/mirleft_coffeebot-170w~92nBpW2tXfAJDRD4PayJLQ.webp 170w, /blog/images/mirleft_coffeebot-340w~wP8CwYrZyPIFWlyUTPrVpA.webp 340w, /blog/images/mirleft_coffeebot-680w~KvqTG8nyx--YFIDJREmrGA.webp 680w, /blog/images/mirleft_coffeebot-1360w~-2BcYXmvaHTDzIRLu0_TTQ.webp 1360w" src="/blog/images/mirleft_coffeebot-1360w~-2BcYXmvaHTDzIRLu0_TTQ.webp" alt="Coffee Chat Bot"></p>
<p>This bot was already written in OCaml, and it's merely a single process. Due to its nature, it doesn't need to do any super complicated operating system stuff, so a natural question came to us: why not make a unikernel out of it?</p>
<p>So <a href="https://github.com/pitag-ha/slack_bot">we did</a>.</p>
<p>Making a unikernel out of a relatively simple application sounds rather straightforward. The first step was to get rid of all Unix operations. It's incredible how many small Unix calls we were doing without even noticing. For example, we were using <code>Unix.time</code> all over the place: for scheduling, for seeding the random library, and for timestamping our database entries.</p>
<p>The database posed another problem. We had been using <code>irmin-unix</code>, which writes to disk using Unix. To fix that, now we use <code>irmin-mem</code>, which writes to memory. We persist (and inspect) the data by syncing our in-memory database with a GitHub repository. If you're not familiar with Irmin (a MirageOS library), its design follows the principles of the Git design and provides a library called <code>irmin-git</code> to bridge the two.</p>
<p>Providing the network stack needed for the Git (and also for the Slack API) communication is one of the typical tasks the operating system needs to do. In our case, that's MirageOS. It has a concept called "devices," which are the operating system features your unikernel might need. Examples of "devices" are network interfaces, network stacks, filesystems (which we didn't need), and monotonic time sources. MirageOS will provide a concrete implementation of such a device at your unikernel's compile time, as long as you declare the device in the MirageOS configuration file <code>config.ml</code>.</p>
<p>The things described here were just a small part of our nice, educational journey making a coffee chat unikernel. One more detail that's worth mentioning: the bot now uses <a href="https://github.com/dinosaure/paf-le-chien"><code>httpaf</code></a> for the Slack API interactions. Before, it was using <code>cohttp</code>, which is already independent from Unix (unlike, for example, the OCaml <code>curl</code> wrapper <code>curly</code>). Porting it to <code>httpaf</code> wasn't technically necessary, but it was a great way to get to know and test the latest "cutting-edge" unikernel features.</p>
<h3>Implementing a Jack Port Driver, or How to Make a Unikernel Sing Bare-Metal</h3>
<p>We also went bare-metal during the retreat. "Bare-metal" sounds cool, doesn't it? Let us explain what we really mean by it. Often, the way to run a MirageOS unikernel is as follows:</p>
<ul>
<li>You have a Linux kernel on your machine and virtualize it via a hypervisor such as KVM.</li>
<li>That hypervisor is then abstracted further by a tool called Solo5 which integrates well with MirageOS unikernels.</li>
</ul>
<p>With this workflow, the communication between the unikernel and the hardware goes over several layers of abstraction. A "bare-metal" unikernel, on the contrary, communicates with the hardware directly, without any interfacing kernel such as Linux. The device we chose to do bare-metal work on is the Raspberry Pi 4 (RPi4).</p>
<p><img sizes="(min-width: 1360px) 1360px, (min-width: 680px) 680px, 100vw" srcset="/blog/images/Mirleft_rpi4-170w~pMmb2u_8a9LMbQVucVTaow.webp 170w, /blog/images/Mirleft_rpi4-340w~fqIlKy-avpu1wRMMCsAhYg.webp 340w, /blog/images/Mirleft_rpi4-680w~rvMAE2DPFUgD2zUN_Fjoww.webp 680w, /blog/images/Mirleft_rpi4-1360w~-Hyt7xkQ50VMFqLzFPyoRg.webp 1360w" src="/blog/images/Mirleft_rpi4-1360w~-Hyt7xkQ50VMFqLzFPyoRg.webp" alt="Work set-up for RPi4 hacking"></p>
<p>So we needed an RPi4 bare-metal OCaml runtime. Luckily, Dinosaure wrote one last year: <a href="https://github.com/dinosaure/gilbraltar">Gilbraltar</a>. It also dumps the text of OCaml print statements into the UART, which is a technical way of saying that we can send such text over USB (concretely over a USB to serial TTL cable) and see it. Quite useful for debugging!</p>
<p>As you can see, doing bare-metal work is quite restrictive: everything that tends to be taken for granted, such as drivers, needs to be implemented.</p>
<p>So that's what we decided to do.</p>
<p>Last year, some colleagues already implemented a driver for LED strips and powered our office's <a href="https://twitter.com/Dinoosaure/status/1471128595154231300">Christmas tree</a> with a bare-metal OCaml RPi4! What is cooler than <a href="/blog/2021-11-11-mirageos-workshop-working-with-the-raspberry-pi-4/">making our bare-metal RPi4 Christmas tree sing</a>? Well, a lot of things are. Anyways, we love music, so we decided to implement a jack port driver.</p>
<p>Jack port drivers on a digital device are an interesting concept. Digital devices are digital, but jack ports expect analog data. One way the RPi4 can handle that is via a concept called PWM: Pulse Width Modulation. PWM approximates an analog signal (i.e., a value between 0 and 1) by switching a digital signal (i.e., either 0 or 1) really fast: the larger the fraction of time the signal spends at 1, the higher the resulting analog level.</p>
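<p>To make the idea concrete, here is a small software simulation of a single PWM period. It is only an illustration: on the RPi4 the modulation happens in hardware, and the function names and the period length of 8 ticks are assumptions chosen for the example.</p>

```ocaml
(* Illustrative simulation only: real PWM happens in RPi4 hardware. *)

(* One PWM period of [range] ticks: the output is 1 for the first
   [level] ticks and 0 for the rest. *)
let pwm_period ~range ~level =
  List.init range (fun tick -> if level > tick then 1 else 0)

(* The analog level perceived downstream is the average of the pulses. *)
let average pulses =
  float_of_int (List.fold_left ( + ) 0 pulses)
  /. float_of_int (List.length pulses)

let () =
  let pulses = pwm_period ~range:8 ~level:2 in
  assert (pulses = [ 1; 1; 0; 0; 0; 0; 0; 0 ]);
  assert (average pulses = 0.25);
  Printf.printf "duty cycle: %.2f\n" (average pulses)
```

<p>With 2 ticks out of 8 set to 1, the averaged output is 0.25, which is the "analog" value the listener ends up hearing once the pulses are smoothed out.</p>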
<p>That modulation is done on the hardware side of the RPi4, concretely on an RPi4 <em>peripheral</em> also called PWM. <a href="https://datasheets.raspberrypi.com/bcm2711/bcm2711-peripherals.pdf">Peripherals</a> are RPi4 hardware devices that are mapped to specific address ranges in the RPi4's memory. You communicate with them by writing to or reading from those locations in memory. The address range of each peripheral is structured into registers. One example of a register of the PWM is the PWM FIFO, i.e., the hardware queue that stores the data flowing from the program to the jack port.</p>
<p><a href="https://github.com/pitag-ha/rpi/blob/jack-port-driver-on-interrupts/src/peripherals/pwm.ml">Our jack port driver</a> does two things--both by writing to and reading from the right places in the PWM memory range.</p>
<ol>
<li>It can initialize the RPi4 for jack port communication (e.g., it sets the RPi4's clock to the correct frequency at which the port reads data from the FIFO, and it configures the correct modes to ensure the right data flow).</li>
<li>It can send music to the jack port (by writing data to the FIFO--without overflowing it).</li>
</ol>
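<p>The second point, writing to the FIFO without overflowing it, can be sketched as a loop that waits whenever the queue is full. The code below is a simulation under stated assumptions, not the driver itself: the FIFO is modelled as an OCaml <code>Queue</code>, and <code>hardware_consume</code> is a hypothetical stand-in for the jack port reading a sample.</p>

```ocaml
(* Simulated hardware FIFO; in the real driver this state lives in
   memory-mapped PWM registers, not in an OCaml queue. *)
type fifo = { capacity : int; queue : int Queue.t; max_seen : int ref }

let create capacity = { capacity; queue = Queue.create (); max_seen = ref 0 }

(* Stand-in for reading a "FIFO full" status flag. *)
let is_full f = Queue.length f.queue >= f.capacity

(* Hypothetical stand-in for the jack port consuming one sample. *)
let hardware_consume f = ignore (Queue.take_opt f.queue)

(* Write every sample, waiting whenever the FIFO reports that it is full. *)
let play f samples =
  List.iter
    (fun sample ->
      while is_full f do
        (* A real driver would wait here until the hardware frees a slot. *)
        hardware_consume f
      done;
      Queue.push sample f.queue;
      f.max_seen := max !(f.max_seen) (Queue.length f.queue))
    samples

let () =
  let f = create 4 in
  play f (List.init 100 (fun i -> i));
  (* The queue never grew beyond its capacity. *)
  assert (!(f.max_seen) = 4);
  assert (Queue.length f.queue = 4)
```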
<p>To use the new driver, we convert music into the right binary format by simply using <code>ffmpeg</code>. Then we <a href="https://github.com/pitag-ha/rpi/blob/jack-port-driver-on-interrupts/test/bare-metal/jack_port/main.ml">write a program</a> with that music in-memory using the MirageOS tool <a href="https://github.com/mirage/ocaml-crunch"><code>ocaml-crunch</code></a>. That program just calls the driver to do the rest and is compiled for the RPi4 target with <code>gilbraltar</code>.</p>
<p>This work is strongly related to MirageOS in three ways. First, the program playing music bare-metal on the RPi4 is a unikernel written in OCaml. Second, the program is compiled with <code>gilbraltar</code>, which forms part of the MirageOS ecosystem and whose design and implementation are based on core tools in the MirageOS ecosystem, such as Solo5 and <code>ocaml-solo5</code>. Third, by adding one layer of abstraction to the jack port driver, we can make it a MirageOS "device," so one could use the driver while also leveraging other MirageOS features that work bare-metal on an RPi4.</p>
<h3>Monitoring mirage.io and Chasing Memory Leaks</h3>
<p>One of the MirageOS goals is to be able to self-host our infrastructure. At the retreat, many tools we used were based on the MirageOS ecosystem: a DNS resolver (<a href="https://github.com/mirage/ocaml-dns">mirage/ocaml-dns</a>), an opam repository cache (<a href="https://git.robur.io/robur/opam-mirror">robur/opam-mirror</a>), and a portable file transfer application (<a href="https://github.com/dinosaure/bob">dinosaure/bob</a>). It's not a surprise that the official website, <a href="https://mirage.io">mirage.io</a>, is a unikernel itself. However, in the past six months, we experienced two website crashes due to <code>Out_of_memory</code> exceptions. The unikernel is configured to run with 1GB of RAM, so that's a slow-running memory leak that requires investigation.</p>
<p><img sizes="(min-width: 1360px) 1360px, (min-width: 680px) 680px, 100vw" srcset="/blog/images/mirleft_monitoring-170w~X0_k8cS9qWH11VHPdxw9zg.webp 170w, /blog/images/mirleft_monitoring-340w~_1oj5pCc842lZD0NApEkug.webp 340w, /blog/images/mirleft_monitoring-680w~Y5iXetauzYwTKqrqPCAwrQ.webp 680w, /blog/images/mirleft_monitoring-1360w~ELyZSQ6KKQLfUqHPmtW_pw.webp 1360w" src="/blog/images/mirleft_monitoring-1360w~ELyZSQ6KKQLfUqHPmtW_pw.webp" alt="Monitoring mirage.io"></p>
<p>The question is how to investigate such a leak.</p>
<h4>Locally?</h4>
<p>The initial attempt consisted in <em>tracing</em> memory allocations using <code>statmemprof</code> while bombarding the server with requests by using benchmarking tools such as ApacheBench (<code>ab</code>) or <code>wrk</code>. <code>statmemprof</code> is an implementation of <em>Statistical Memory Profiling</em> in the OCaml runtime. It enables sampling allocations at a fixed rate and tracing values until they are garbage collected. Using <a href="https://github.com/janestreet/memtrace_viewer">memtrace-viewer</a>, one can analyse the memory usage and see which values are still live when the program goes out of memory, for example. For a unikernel with network access, it's possible to add an endpoint to enable tracing on demand: <a href="https://github.com/roburio/memtrace-mirage">roburio/memtrace-mirage</a>.</p>
<p>Unfortunately, this setup didn't help us identify the leak. Indeed, we can still expect the server to work fine under normal conditions. Somehow we need to understand which rare event, leaking a small bit of memory at a time, is happening enough times to consume all available memory.</p>
<h4>Monitoring the Live Unikernel</h4>
<p>Only the <em>Real™</em> Internet would tell us the answer, so we monitored the live unikernel application. <a href="https://github.com/roburio/mirage-monitoring">roburio/mirage-monitoring</a> was of great help, as it enables two things:</p>
<ul>
<li>Reporting application-wide metrics to an InfluxDB endpoint, which can be displayed using Grafana</li>
<li>Changing log levels and metrics sources at runtime</li>
</ul>
<p>Adding <code>mirage-monitoring</code> to a unikernel was surprisingly easy. It was only a matter of updating the configuration file with some functoria voodoo: <a href="https://github.com/mirage/mirage-www/pull/767">mirage/mirage-www#767</a>. At some point, it will be upstreamed into the <code>mirage</code> tool so that adding monitoring is a single-line job. The hard part was providing the unikernel with two network stacks to expose one to the internet while keeping the other for internal use only.</p>
<p>Next, we set up a typical Grafana deployment using InfluxDB/Telegraf for the metrics input and data storage. Logs were displayed using <code>albatross-client-local</code>.</p>
<h4>Chasing the Leak</h4>
<p>Now we can see the numbers for the live website. Memory usage, indeed, but also other metrics were included by default, such as the number of established connections in the TCP stack. There we found the source of the leak. Throughout the day, the number of established TCP connections kept increasing.</p>
<p>Finally, at runtime, we temporarily changed the TCP stack's log level to <em>debug</em>, monitored the logs, and waited for the moment where the number of established TCP connections would increase without decreasing afterwards. These logs described what was going on in the TCP stack at the exact moment the connection leak happened. At this point, we figured out that it occurred when a client connected to the server but failed to perform the TLS handshake, so the server dropped the connection without <em>closing</em> it--hence leaking it <em>forever</em>.</p>
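<p>The shape of this bug can be reproduced in a few lines. The following is a simulation under stated assumptions, not the actual patch: the connection type, the counter, and a handshake that fails for odd connection ids are all hypothetical and only serve to show why the error path must also close the connection.</p>

```ocaml
(* Simulated TCP stack: [established] counts connections that are open. *)
let established = ref 0

type conn = { id : int; closed : bool ref }

let accept id =
  incr established;
  { id; closed = ref false }

let close c =
  if not !(c.closed) then begin
    c.closed := true;
    decr established
  end

(* Hypothetical handshake: fails for odd connection ids. *)
let tls_handshake c =
  if c.id mod 2 = 0 then Ok c else Error "handshake failed"

(* Leaky variant: on error the connection is dropped but never closed. *)
let serve_leaky c =
  match tls_handshake c with
  | Ok c -> close c (* serve the request, then close *)
  | Error _ -> ()

(* Fixed variant: the connection is closed on every path. *)
let serve_fixed c =
  match tls_handshake c with
  | Ok c -> close c
  | Error _ -> close c

(* Accept [n] connections and report how many are still established. *)
let run serve n =
  established := 0;
  for id = 1 to n do
    serve (accept id)
  done;
  !established

let () =
  assert (run serve_leaky 10 = 5);
  assert (run serve_fixed 10 = 0)
```

<p>In the leaky variant, every failed handshake leaves one connection counted as established, which matches the steadily growing metric we saw in Grafana.</p>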
<p>Here we go: <a href="https://github.com/dinosaure/paf-le-chien/pull/72"><em>one less leak</em></a>.</p>
<h4>Next Steps</h4>
<p>Matching <em>logs</em> and <em>metrics</em> to inspect them together has proven to be very useful. We used Grafana for metrics, so the next step would be to also provide logs because Grafana supports structured logging through the <a href="https://grafana.com/oss/loki/">Loki</a> logs aggregation system.</p>
<h3>MirageHole - A Unikernel DNS Resolver with Holes</h3>
<p>One way to stop web trackers, advertisements, and malware is to block access to sites known to contain such things. A popular approach is through browser extensions like <a href="https://en.wikipedia.org/wiki/AdBlock">AdBlock</a> and <a href="https://en.wikipedia.org/wiki/Privacy_Badger">Privacy Badger</a>. Another approach known as <a href="https://en.wikipedia.org/wiki/DNS_sinkhole">a DNS sinkhole</a> involves installing a local DNS server that resolves bad domains to an invalid IP address. This approach has the advantage of working across different operating systems, browsers, and devices (laptops, smartphones, smart-TVs, etc.). For an added bonus, it can also save network bandwidth.</p>
<p>Another project initiated during this year's retreat was to implement <a href="https://github.com/jmid/mirage-hole">Mirage-hole</a>: a DNS sinkhole running as a Mirage Unikernel. It was inspired by <a href="https://en.wikipedia.org/wiki/Pi-hole">Pi-hole</a> for the Raspberry Pi. Starting from a DNS-stub example from <a href="https://github.com/roburio/dnsvizor">dnsvizor</a> (and after a bit of network debugging), we got a unikernel running that would block a single selected domain. We then extended this to fetch and parse <a href="https://github.com/blocklistproject/Lists">a blocklist</a> at start-up. Next, we worked on integrating a little webserver to serve statistics about the requested and blocked domains. Overall, the project was a nice opportunity to talk to and learn from several MirageOS contributors, and it served as a nice tour-de-force of several MirageOS networking libraries.</p>
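<p>The core of a DNS sinkhole fits in a short sketch. The code below is not taken from Mirage-hole: it assumes a hosts-style blocklist with lines of the form <code>0.0.0.0 domain</code>, and the names <code>parse_blocklist</code>, <code>resolve</code>, <code>Sinkhole</code>, and <code>Forward</code> are hypothetical.</p>

```ocaml
module Domains = Set.Make (String)

(* Parse one hosts-style line such as "0.0.0.0 ads.example.com".
   Blank lines and comment lines starting with '#' are skipped. *)
let parse_line line =
  let line = String.trim line in
  if line = "" || line.[0] = '#' then None
  else
    match String.split_on_char ' ' line with
    | [ _ip; domain ] -> Some domain
    | _ -> None

(* Build the set of blocked domains from the blocklist text. *)
let parse_blocklist text =
  String.split_on_char '\n' text
  |> List.filter_map parse_line
  |> Domains.of_list

(* Blocked domains get an invalid address; others go upstream. *)
type answer = Sinkhole of string | Forward of string

let resolve blocked domain =
  if Domains.mem domain blocked then Sinkhole "0.0.0.0" else Forward domain

let () =
  let blocked =
    parse_blocklist
      "# comment\n0.0.0.0 ads.example.com\n\n0.0.0.0 tracker.example.net\n"
  in
  assert (Domains.cardinal blocked = 2);
  assert (resolve blocked "ads.example.com" = Sinkhole "0.0.0.0");
  assert (resolve blocked "ocaml.org" = Forward "ocaml.org")
```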
<h3>Tarides Map - Serving the Tarides Geographical Distribution in a Unikernel</h3>
<p>Tarides Map is a project intended to show the geographic distribution of all Tarides collaborators as a website. At the retreat, we explored deploying the site in a unikernel. To do this, we had to decide how to serve the files on the server and integrate them into a unikernel. We had two options: use <code>ocaml-crunch</code> or Docteur.</p>
<p>We initially used <a href="https://github.com/dinosaure/docteur">Docteur</a>, inspired by a different project called <a href="https://github.com/dinosaure/pasteur">Pasteur</a>, which uses Docteur and is deployed in a unikernel as a static site, which was exactly what we were aiming to do with Tarides Map. However, integrating Docteur into the project proved to be more difficult than we had expected. One reason was that Solo5 isn't currently supported on macOS, the operating system used to write the project at the retreat. After compiling to Unix instead and numerous hours debugging, we were eventually able to generate the disk image; however, we still had issues deploying it in a unikernel, so we decided to try using <a href="https://github.com/mirage/ocaml-crunch">ocaml-crunch</a> instead.</p>
<p>Using <code>ocaml-crunch</code> proved to be a more straightforward option. We merely had to move some files around so that the directory structure could be turned into a standalone OCaml module to serve the file contents without requiring an external filesystem to be present. After doing this, we were successfully able to deploy the site <a href="https://github.com/SaySayo/tarides_map_static_website">here</a>.</p>
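<p>To give an idea of what "a standalone OCaml module" means here, the sketch below uses a hand-written stand-in for a generated module. It is an assumption-laden illustration: the module name <code>Static</code>, the file contents, and the <code>respond</code> helper are made up, and a real generated module may expose a different interface.</p>

```ocaml
(* Hand-written stand-in for the kind of module a crunch step produces:
   file contents are embedded as OCaml strings, so no filesystem is needed. *)
module Static = struct
  let files =
    [ ("index.html", "Tarides Map"); ("style.css", "body { margin: 0 }") ]

  (* Names of all embedded files. *)
  let file_list = List.map fst files

  (* Look up the contents of an embedded file by name. *)
  let read name = List.assoc_opt name files
end

(* Hypothetical request handler: status code and body for a given path. *)
let respond path =
  match Static.read path with
  | Some body -> (200, body)
  | None -> (404, "Not found")

let () =
  assert (Static.file_list = [ "index.html"; "style.css" ]);
  assert (respond "index.html" = (200, "Tarides Map"));
  assert (respond "missing.html" = (404, "Not found"))
```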
<h2>What We Dreamed About</h2>
<p>Another very interesting part of the retreat was the <em>dreaming sessions</em> organized by Hannes. The central idea behind this exercise was to allow ourselves to dream about how we envision the MirageOS project in the future, no matter how intangible and seemingly unrealistic. We talked about those dreams in two sessions.</p>
<p>The initial session revolved around gathering these dreams and ideas, without discussing how to achieve them, and letting our minds run free with what we wanted to accomplish with MirageOS. Oftentimes, those dreams would be shared with other participants. Some dreamed about replacing their whole software infrastructure with MirageOS, if not their main operating system! Others dreamed of artistic applications for Mirage, like using it as a backbone for musical endeavors.</p>
<p>The subsequent session revolved around <em>how</em> we could reach those dreams. This facilitated a more practical discussion around the challenges we may face along the way. Interestingly enough, in some instances, it turned out some dreams were either already achieved (like reverse-debugging Solo5!) or were close to being achievable.</p>
<p>A beautiful example of the attendees' dedication is that it did not take long for some to start working on projects like MirageOS-OS, a hypervisor for MirageOS unikernels written with MirageOS, or to successfully implement a jack port driver for the Raspberry Pi 4, bringing us closer to MirageOS-powered synthesisers and to MirageOS MIDI interfaces!</p>
<p><img sizes="(min-width: 1360px) 1360px, (min-width: 680px) 680px, 100vw" srcset="/blog/images/Mirleft_reefs-170w~8cw9esI8qdYOBzhu4EzeCA.webp 170w, /blog/images/Mirleft_reefs-340w~nWJuv0V9qpDugWsLdp6HAQ.webp 340w, /blog/images/Mirleft_reefs-680w~7ecJSoWjZKBtyW7psxZ_vQ.webp 680w, /blog/images/Mirleft_reefs-1360w~R3Ms3mWi_LE0tzMRA_P8KQ.webp 1360w" src="/blog/images/Mirleft_reefs-1360w~R3Ms3mWi_LE0tzMRA_P8KQ.webp" alt="Mirleft's Beautiful Reefs"></p>
<h2>Inventing OCamlwave, Serenading Cats, and Christening Dogs</h2>
<p>As mentioned above, the retreat was extremely inspiring, even with respect to topics less related to MirageOS than the ones mentioned here. The one we're most proud of is <a href="https://www.youtube.com/playlist?list=PLmaiK3-DyqMy3kNjdHIPUEo-Gkltha3mT">our Mirleft MirageOS EP</a> that contains five tracks (<em>five</em> in the spirit of Solo<em>5</em> and OCaml <em>5.0</em>, of course). Its genre might be better described as <em>OCamlwave</em>! On our EP, you will find many musical oddities ranging from an on-premise recorded drum solo (with glasses, clothes racks, and flip-flops) to a cat-powered cover of <em>Mr Sandman</em> (as an homage to our time singing to Morocco's many, <em>many</em> cute cats) to the occasional dramatic rendition of controversial pull requests on the OCaml compiler.</p>
<p><img sizes="(min-width: 1360px) 1360px, (min-width: 680px) 680px, 100vw" srcset="/blog/images/mirleft_cats-170w~UhQ-E_TtsOH9YWWDUE2mLQ.webp 170w, /blog/images/mirleft_cats-340w~piVOEbA4JkB_-AbstJYxxw.webp 340w, /blog/images/mirleft_cats-680w~hSc6zQgPdhyrSqzm6lDeSw.webp 680w, /blog/images/mirleft_cats-1360w~CiZSJbD49O6qdKVPUt9RWQ.webp 1360w" src="/blog/images/mirleft_cats-1360w~CiZSJbD49O6qdKVPUt9RWQ.webp" alt="Cute street kittens in Mirleft"></p>
<p>However, not everything in Mirleft was about music and animals. Some things were also about the beautiful waves.</p>
<p>Mirleft is a paradise for surfing, both for beginners and advanced surfers! We went to a nice sandy beach with perfect conditions to get started with surfing. Advanced surfers would probably go to one of the reefs for surfing, which we, in turn, found amazing for a peaceful walk, sometimes with and sometimes without company from a street dog we christened <code>null</code>.</p>
<p><img sizes="(min-width: 1360px) 1360px, (min-width: 680px) 680px, 100vw" srcset="/blog/images/mirleft_dog-170w~N2bY4LPCi-3kAAl7rnwQug.webp 170w, /blog/images/mirleft_dog-340w~mGK7UxhC8OYq0GTTHr4FIA.webp 340w, /blog/images/mirleft_dog-680w~1JVzylQxXroHc6ZcPfU_EA.webp 680w, /blog/images/mirleft_dog-1360w~T741Ax2A6YfjOsunWTQh1w.webp 1360w" src="/blog/images/mirleft_dog-1360w~T741Ax2A6YfjOsunWTQh1w.webp" alt="Sweet null, our new friend!"></p>
<p>And, well, talking about <code>null</code> (apart from naming street animals), we also had plenty of other computer-science-related conversations at the retreat. All of them were extremely enriching! A couple of examples include exception backtracing in Lwt programs and BGP intrinsics.</p>
<h2>Thanks for All the Fish!</h2>
<p>As this lengthy report can attest, our experience was an amazing one for all. The MirageOS Hack Retreats are always an otherworldly space, where amazing individuals gather to exchange thoughts and create new (and better) software. Friends are made along the way, some bugs are fixed, new ones are found, and great new ideas emerge.</p>
<p>This very special sense of community is rare, so we would like to thank everyone who organized, attended, and tended to the event. Thank you to our delightful hosts, who've been with us since the first retreat in 2016! Thank you as well to <a href="https://robur.io">Hannes and Robur</a> for organizing those retreats and spending time instilling the same inspiration in the great project that is MirageOS! Finally, thank you to old and new friends, as well as old and new MirageOS hackers, for this amazing week of happy banter and hacking!</p>
<p>PS: Some of the pics in this post were shared among us via <a href="https://github.com/dinosaure/bob">bob</a>, a MirageOS unikernel to share files.</p>
]]></description><link>https://tarides.com/blog/2022-10-28-the-mirageos-retreat-a-journey-of-food-cats-and-unikernels</link><guid isPermaLink="false">https://tarides.com/blog/2022-10-28-the-mirageos-retreat-a-journey-of-food-cats-and-unikernels.html</guid><dc:creator><![CDATA[ Jules Aguillon, Sayo Bamigbade, Enguerrand Decorne, Sonja Heinze, Jan Midtgaard, Lucas Pluvinage ]]></dc:creator><pubDate>Fri, 28 Oct 2022 00:00:00 GMT</pubDate></item><item><title><![CDATA[Up-to-Date Online Documentation]]></title><description><![CDATA[<h2>Into the Fire</h2>
<p>The OCaml ecosystem relies on various resources and infrastructure, such as <a href="https://ocaml.org">ocaml.org</a>, <a href="https://hub.docker.com/r/ocaml/opam">OCaml Docker images</a>, and <a href="https://check.ocamllabs.io/">opam-repo-ci</a>, that are built and deployed using <a href="https://www.ocurrent.org">OCurrent</a>. OCurrent is a library to express workflows and keep things up to date. As many of these projects are built with the same technology, it made sense to centralise their documentation, which was spread throughout the various repositories. This post is about how we used OCurrent itself to automate this task. We think it might also demonstrate how you can use OCurrent to automate some of yours!</p>
<h2>Can't Keep My Eyes Off You</h2>
<p>Before digging into the logic, it's essential to thoroughly define the problems in the documentation. The first problem was that the documentation lives in many GitHub repositories. Indeed, to make sure we update it whenever we modify the associated code, we keep the documentation closest to the code. The result is a repository organisation like this:</p>
<p><img sizes="(min-width: 1360px) 1360px, (min-width: 680px) 680px, 100vw" srcset="/blog/images/2022-10-13.ocurrentorg/tracker-170w~13woC025u4Ol95vzRiNB4g.webp 170w, /blog/images/2022-10-13.ocurrentorg/tracker-340w~K87HXMKCYd1WunXnQBzM4g.webp 340w, /blog/images/2022-10-13.ocurrentorg/tracker-680w~PQp68rvVwz2Ii8-xKXQI5w.webp 680w, /blog/images/2022-10-13.ocurrentorg/tracker-1360w~YGlaEtNhwMS__GEsjT9vrw.webp 1360w" src="/blog/images/2022-10-13.ocurrentorg/tracker-1360w~YGlaEtNhwMS__GEsjT9vrw.webp" alt="Tracking"></p>
<p>It isn't wise to rely on humans to monitor changes in all these repositories. As the repositories change on their own schedules, we can't expect maintainers to backport documentation changes to the <code>ocurrent.org</code> website for each modification. To put it more technically, we need to track many files and keep them up to date. These updates should also happen incrementally, which matches OCurrent nicely.</p>
<p>In addition, this documentation needs to stay up to date. Even if we centralise the documentation automatically, we must rebuild it regularly and fetch the changes from the repositories we track. Otherwise, the documentation will start to be outdated quickly. This is the opposite of what we want.</p>
<p>Furthermore, the system has to scale and be updated easily. Indeed, we would like to have the possibility to introduce new documents and repositories without having to install more applications. For instance, it would be beneficial to simply make a pull request somewhere.</p>
<p>In the next section, we will focus on the OCurrent pipeline design, which will automate our tasks and solve these problems.</p>
<h2>Here I Dreamt I Was an Architect</h2>
<p>The project is composed of several blocks we want to write to achieve our work:</p>
<ul>
<li>Fetch files from GitHub</li>
<li>Rebuild a subset of the code</li>
<li>Make the system modular</li>
<li>Store and deploy the data easily</li>
</ul>
<p>One aspect that will make our work a bit easier is to have it all concentrated in the same place, GitHub. As OCurrent provides a plugin to fetch information from GitHub, <code>current_github</code>, we don't have to worry about it. Furthermore, everything is cached thanks to OCurrent itself. We don't have to care about the incremental build. The only requirement is wisely choosing the data we want to cache.</p>
<p>Our architecture uses a <code>trackers.yml</code> file describing how the pipeline should interact with our heterogeneous repositories. It describes the files we want to track and where we would like them in the final website structure. The configuration gives a way to achieve modularity at a low cost, as we only have to open a PR on the repository that contains the tracker file to update it. Additionally, it allows us to quickly track the repositories we want. Once a repository is tracked, we don't have to worry about the monitoring, as OCurrent can be set to rebuild things on a regular cycle. In our case, we want to check every week whether the code has changed. In the present version of <code>trackers.yml</code>, we can specify the files we want to copy and the indexes we want to create to build our structure. This file is stored in the repository on which the <code>GitHub App</code> is installed.</p>
<p>Another critical component in the architecture is handling new files from the remote repository and integrating them into the website structure. This element is in charge of moving the piece from one part of the system to another. Moreover, it will have to ensure the paths are consistent and fail if not.</p>
<p>The last item must push the code to a specific Git repository because we decided to use <code>GitHub Pages</code> to store the website. To avoid issues with account management, it needs <code>ssh</code> to get access to a specific repository.</p>
<p>In the end, the pipeline design would look like this:
<img sizes="(min-width: 1360px) 1360px, (min-width: 680px) 680px, 100vw" srcset="/blog/images/2022-10-13.ocurrentorg/pipeline-170w~Piiw_uSHvYsc1dfnQJbjIg.webp 170w, /blog/images/2022-10-13.ocurrentorg/pipeline-340w~asmKLOe-UMsY0gX7Unua5w.webp 340w, /blog/images/2022-10-13.ocurrentorg/pipeline-680w~5-eXDFvyI1LaJGZNgpfKxA.webp 680w, /blog/images/2022-10-13.ocurrentorg/pipeline-1360w~uAPPmMWzOm0bM-KVF_T_Lg.webp 1360w" src="/blog/images/2022-10-13.ocurrentorg/pipeline-1360w~uAPPmMWzOm0bM-KVF_T_Lg.webp" alt="Pipeline"></p>
<p>Now that we have our workflow, let's see how it is implemented in practice!</p>
<h2>This is How We Do It</h2>
<p>In this section, we will focus on the way to implement this infrastructure. We won't view all the elements in detail, but we will try to concentrate on the most important ones, like how to create a custom <code>ocurrent</code> component and chain them together to build a pipeline.</p>
<h3><code>current_github</code></h3>
<p>Let's focus on a standard structure in an OCurrent project: the way to get the HEAD of a branch on GitHub and fetch the commit with Git. In the related code, we find the HEAD, then ask GitHub to give us information about the HEAD commit on the default branch and finally get the content with Git (it returns the related commit):</p>
<pre><code><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">fetch_commit</span><span class="ocaml-source"> </span><span class="ocaml-source">~</span><span class="ocaml-source">github</span><span class="ocaml-source"> ~</span><span class="ocaml-source">repo</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-unit">()</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">head</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Current_github</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-constant-language-capital-identifier">Api</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">head_commit</span><span class="ocaml-source"> </span><span class="ocaml-source">github</span><span class="ocaml-source"> </span><span class="ocaml-source">repo</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">in</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">commit_id</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-constant-language-capital-identifier">Current</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">map</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Current_github</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-constant-language-capital-identifier">Api</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-constant-language-capital-identifier">Commit</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">id</span><span class="ocaml-source"> </span><span class="ocaml-source">head</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword-other">in</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">commit</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-constant-language-capital-identifier">Current_git</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">fetch</span><span class="ocaml-source"> </span><span class="ocaml-source">commit_id</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword-other">in</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-source">commit</span><span class="ocaml-source">
</span><span class="ocaml-source">
</span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">main</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">github</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-comment-block">(*</span><span class="ocaml-comment-block"> GitHub App code </span><span class="ocaml-comment-block">*)</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">in</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">commit</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-source">fetch_commit</span><span class="ocaml-source"> ~</span><span class="ocaml-source">github</span><span class="ocaml-source"> ~</span><span class="ocaml-source">repo</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-unit">()</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">in</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-comment-block">(*</span><span class="ocaml-comment-block"> Use the commit code </span><span class="ocaml-comment-block">*)</span><span class="ocaml-source">
</span></code></pre>
<p>The documentation of <code>current_github</code> and <code>current_git</code> is available <a href="https://www.ocurrent.org/ocurrent/">online</a>.</p>
<h3>Fetching the Files</h3>
<p>Now that we know how to extract data from GitHub, applying the process to several repositories is straightforward. Note that <code>commit</code> has type <code>Commit.t Current.t</code>. To work with a <code>Current.t</code>, we need to "unwrap" the value with dedicated functions like <code>map</code> and <code>bind</code>. This post does not cover how to load the content from a <code>Yaml</code> file. We assume that we get a <code>selection list Current.t</code>, where <code>selection</code> is defined as:</p>
<pre><code><span class="ocaml-keyword-other">type</span><span class="ocaml-source"> </span><span class="ocaml-source">selection</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-source">{</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-source">repo</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source"> </span><span class="ocaml-support-type">string</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-source">commit</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Current_git</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-constant-language-capital-identifier">Commit</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">t</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Current</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">t</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-source">files</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source"> </span><span class="ocaml-storage-type">'a</span><span class="ocaml-source"> </span><span class="ocaml-source">list</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source">
</span><span class="ocaml-source">}</span><span class="ocaml-source">
</span></code></pre>
<p>It contains the source repository, the commit associated with the specified branch, and the list of files to monitor from this repository.</p>
<p>To <code>git clone</code> the content, we must apply the <code>fetch_commit</code> function.</p>
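<p>To make the "unwrapping" with <code>map</code> and <code>bind</code> more concrete, here is a minimal sketch. It does not use OCurrent: <code>Toy_current</code> is a hypothetical stand-in for <code>Current.t</code>, and the <code>selection</code> record and repository names are simplified placeholders chosen for illustration.</p>
<pre><code>(* Toy stand-in for OCurrent's ['a Current.t]; NOT the real library.
   A value is either ready or still pending. *)
module Toy_current = struct
  type 'a t = Ready of 'a | Pending

  let return x = Ready x

  (* [map f x] applies a plain function to the wrapped value. *)
  let map f = function Ready x -&gt; Ready (f x) | Pending -&gt; Pending

  (* [bind f x] lets the next step return a wrapped value itself. *)
  let bind f = function Ready x -&gt; f x | Pending -&gt; Pending
end

(* Hypothetical record, simplified from [selection] above. *)
type selection = { repo : string; files : string list }

let selections : selection list Toy_current.t =
  Toy_current.return
    [ { repo = "org/project-a"; files = [ "README.md" ] };
      { repo = "org/project-b"; files = [ "README.md"; "doc/index.md" ] } ]

(* Count the monitored files without leaving the wrapper. *)
let file_count : int Toy_current.t =
  Toy_current.map
    (fun sels -&gt; List.fold_left (fun n s -&gt; n + List.length s.files) 0 sels)
    selections

(* Chain a step that may itself be pending. *)
let first_repo : string Toy_current.t =
  Toy_current.bind
    (function [] -&gt; Toy_current.Pending | s :: _ -&gt; Toy_current.return s.repo)
    selections
</code></pre>
<p>The difference between the two is the return type of the function passed in: <code>map</code> takes a function returning a plain value, while <code>bind</code> takes one returning another wrapped value.</p>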
<h3>Copy the Content</h3>
<p>In this subsection, we will see how we can define a custom component and how to make it interact with the rest of our code.</p>
<p>The component is in charge of fetching the content of the files from the source directory and storing it in memory. To trigger the action only when the content changes, we will define a <code>Current_cache</code> element. Thanks to OCurrent, the content is cached and only rebuilt on change or request.</p>
<p>The component manipulates some <code>File.info</code> values (source, destination, …) and produces a <code>File.t</code> when the content is read. <code>File.t</code> is simply a record:</p>
<pre><code><span class="ocaml-keyword-other">type</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">File</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">t</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-source">{</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-source">metadata</span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">File</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">info</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-source">content</span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source"> </span><span class="ocaml-support-type">string</span><span class="ocaml-source"> </span><span class="ocaml-source">list</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source">
</span><span class="ocaml-source">}</span><span class="ocaml-source">
</span></code></pre>
<p>Our file is represented as a <code>string list</code>, as we need to be able to add more information. We know the size of the files is limited, so it is not an issue for us.
The component is defined as a <code>Current_cache.BUILDER</code>, whose signature looks like this:</p>
<pre><code><span class="ocaml-keyword-other">module</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">type</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">BUILDER</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">sig</span><span class="ocaml-source">
</span><span class="ocaml-source">
</span><span class="ocaml-keyword">type</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">context</span><span class="ocaml-source">
</span><span class="ocaml-source">
</span><span class="ocaml-keyword-other-ocaml">module</span><span class="ocaml-source"> Key : </span><span class="ocaml-keyword-other-ocaml">sig</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword">type</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">t</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword">val</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">digest</span><span class="ocaml-source"> : t -&gt; string
</span><span class="ocaml-keyword-other">end</span><span class="ocaml-source">
</span><span class="ocaml-source">
</span><span class="ocaml-keyword-other">module</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Value</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">sig</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword">type</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">t</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword">val</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">marshal</span><span class="ocaml-source"> : t -&gt; string
</span><span class="ocaml-source">  </span><span class="ocaml-keyword">val</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">unmarshal</span><span class="ocaml-source"> : string -&gt; t
</span><span class="ocaml-keyword-other">end</span><span class="ocaml-source">
</span><span class="ocaml-source">
</span><span class="ocaml-keyword-other">val</span><span class="ocaml-source"> </span><span class="ocaml-source">build</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-source">context</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-constant-language-capital-identifier">Current</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-constant-language-capital-identifier">Job</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">t</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-constant-language-capital-identifier">Key</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">t</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-constant-language-capital-identifier">Value</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">t</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Current</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">or_error</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Lwt</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">t</span><span class="ocaml-source">
</span><span class="ocaml-keyword-other">end</span><span class="ocaml-source">
</span></code></pre>
<p>As the <code>Value</code> and the <code>Key</code> modules only use functions to manipulate <code>JSON</code>, we can focus on the <code>build</code> function definition:</p>
<pre><code><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">build</span><span class="ocaml-source"> </span><span class="ocaml-source">files</span><span class="ocaml-source"> </span><span class="ocaml-source">job</span><span class="ocaml-source"> </span><span class="ocaml-source">{</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Key</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">commit</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Key</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">repo</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source"> </span><span class="ocaml-constant-language">_</span><span class="ocaml-source"> </span><span class="ocaml-source">}</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-constant-language-capital-identifier">Current</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-constant-language-capital-identifier">Job</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">start</span><span class="ocaml-source"> </span><span class="ocaml-source">job</span><span class="ocaml-source"> ~</span><span class="ocaml-source">level</span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-constant-language-capital-identifier">Current</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-constant-language-capital-identifier">Level</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-constant-language-capital-identifier">Average</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">&gt;&gt;=</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">fun</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-unit">()</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-constant-language-capital-identifier">Current_git</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">with_checkout</span><span class="ocaml-source"> ~</span><span class="ocaml-source">job</span><span class="ocaml-source"> </span><span class="ocaml-source">commit</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">@@</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">fun</span><span class="ocaml-source"> </span><span class="ocaml-source">dir</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-source">extract</span><span class="ocaml-source"> ~</span><span class="ocaml-source">job</span><span class="ocaml-source"> ~</span><span class="ocaml-source">dir</span><span class="ocaml-source"> </span><span class="ocaml-source">repo</span><span class="ocaml-source"> </span><span class="ocaml-source">files</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword-operator">&gt;&gt;=</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Lwt_result</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">return</span><span class="ocaml-source">
</span></code></pre>
<p>It creates a temporary directory with the content fetched from Git. Then, it extracts the data as a <code>File.t</code> and returns the result. The interesting detail here is <code>Current_git.with_checkout</code>, which temporarily checks out our code somewhere on the system and passes that directory to the given function. <code>Current.Job.start</code> is just boilerplate to start a job asynchronously.</p>
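<p>The <code>Key</code> and <code>Value</code> modules are not shown in this post. As a rough sketch of the shape such modules could take, the example below uses only the standard library. It is an assumption made for illustration: the field types are hypothetical, and it serialises with <code>Marshal</code> (hence the function names <code>marshal</code> and <code>unmarshal</code>) rather than with JSON as the real pipeline does.</p>
<pre><code>(* Hypothetical key: which repository and commit hash to read from. *)
module Key = struct
  type t = { repo : string; commit : string }

  (* Two keys with the same digest denote the same cache entry. *)
  let digest { repo; commit } =
    Digest.to_hex (Digest.string (repo ^ "@" ^ commit))
end

(* Hypothetical value: file content as lines, serialised with the
   standard library's [Marshal] instead of JSON. *)
module Value = struct
  type t = string list

  let marshal (v : t) : string = Marshal.to_string v []
  let unmarshal (s : string) : t = Marshal.from_string s 0
end
</code></pre>
<p>The digest identifies a cache entry, so it should change whenever any field that influences the build changes. The two serialisation functions must round-trip, since the stored string is the only thing kept between runs.</p>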
<p>Consequently, we can pass the builder to the <code>Current_cache.Make</code> functor to construct our cache system. Moreover, we create a function associated with it thanks to the newly created <code>Content</code> module:</p>
<pre><code><span class="ocaml-keyword-other">module</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Content</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Current_cache</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-constant-language-capital-identifier">Make</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-constant-language-capital-identifier">Content</span><span class="ocaml-source">)</span><span class="ocaml-source">
</span><span class="ocaml-source">
</span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">weekly</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Current_cache</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-constant-language-capital-identifier">Schedule</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">v</span><span class="ocaml-source"> ~</span><span class="ocaml-source">valid_for</span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source">(</span><span class="ocaml-constant-language-capital-identifier">Duration</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">of_day</span><span class="ocaml-source"> </span><span class="ocaml-constant-numeric-decimal-integer">7</span><span class="ocaml-source">)</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-unit">()</span><span class="ocaml-source">
</span><span class="ocaml-source">
</span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">fetch</span><span class="ocaml-source"> </span><span class="ocaml-source">~</span><span class="ocaml-source">repo</span><span class="ocaml-source"> ~</span><span class="ocaml-source">commit</span><span class="ocaml-source"> </span><span class="ocaml-source">files</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-constant-language-capital-identifier">Current</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">component</span><span class="ocaml-source"> </span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-string-quoted-double">fetch-doc</span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">|&gt;</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword">let</span><span class="ocaml-keyword">&gt;</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">commit</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-source">commit</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">in</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-constant-language-capital-identifier">Content</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">get</span><span class="ocaml-source"> ~</span><span class="ocaml-source">schedule</span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source">weekly</span><span class="ocaml-source"> </span><span class="ocaml-source">files</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-source">{</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Content</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-constant-language-capital-identifier">Key</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">repo</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Content</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-constant-language-capital-identifier">Key</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">commit</span><span class="ocaml-source"> </span><span class="ocaml-source">}</span><span class="ocaml-source">
</span></code></pre>
<p>We specify how long a cached result stays valid, so that a rebuild is triggered at least once a week.</p>
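<p>To illustrate the caching behaviour described in this subsection (build once per key, reuse the result, rebuild after a validity period), here is a small self-contained sketch. It is a toy model and not how <code>Current_cache</code> is implemented: <code>make_cache</code>, <code>entry</code>, the explicit <code>~now</code> clock, and the string keys are all hypothetical.</p>
<pre><code>(* Toy cache, NOT Current_cache: remembers one result per key and
   rebuilds it once it is older than [valid_for] seconds. *)
type 'v entry = { value : 'v; built_at : float }

let make_cache ~valid_for ~build =
  let table = Hashtbl.create 16 in
  fun ~now key -&gt;
    match Hashtbl.find_opt table key with
    | Some e when now -. e.built_at &lt; valid_for -&gt; e.value (* still fresh *)
    | _ -&gt;
      let value = build key in
      Hashtbl.replace table key { value; built_at = now };
      value

(* One week, in seconds. *)
let week = 7. *. 24. *. 60. *. 60.

(* Count how often the (hypothetical) build step really runs. *)
let builds = ref 0

let get =
  make_cache ~valid_for:week
    ~build:(fun key -&gt; incr builds; "content of " ^ key)

let () =
  ignore (get ~now:0. "commit-a");           (* first request: builds *)
  ignore (get ~now:3600. "commit-a");        (* same key, fresh: cached *)
  ignore (get ~now:3600. "commit-b");        (* new key: builds *)
  ignore (get ~now:(week +. 1.) "commit-a")  (* expired: rebuilds *)
</code></pre>
<p>Passing the time explicitly keeps the sketch deterministic; out of the four requests, only three trigger the build step.</p>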
<h3>Build &amp; Deploy</h3>
<p>In this last subsection, we discuss how to write all the files stored in the cache to the right place in the filesystem. We use <code>hugo</code> to build the website and <code>git</code> with <code>ssh</code> to deploy it. As we expect the information to be cached, we build a <code>Current_cache</code> module again, where the <code>build</code> function is:</p>
<pre><code><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">build</span><span class="ocaml-source"> </span><span class="ocaml-source">{</span><span class="ocaml-source"> </span><span class="ocaml-source">files</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source"> </span><span class="ocaml-source">indexes</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source"> </span><span class="ocaml-source">conf</span><span class="ocaml-source"> </span><span class="ocaml-source">}</span><span class="ocaml-source"> </span><span class="ocaml-source">job</span><span class="ocaml-source"> </span><span class="ocaml-source">{</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Key</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">commit</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source"> </span><span class="ocaml-constant-language">_</span><span class="ocaml-source"> </span><span class="ocaml-source">}</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-constant-language-capital-identifier">Current</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-constant-language-capital-identifier">Job</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">start</span><span class="ocaml-source"> </span><span class="ocaml-source">job</span><span class="ocaml-source"> ~</span><span class="ocaml-source">level</span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-constant-language-capital-identifier">Current</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-constant-language-capital-identifier">Level</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-constant-language-capital-identifier">Average</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">&gt;&gt;=</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">fun</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-unit">()</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-constant-language-capital-identifier">Current_git</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">with_checkout</span><span class="ocaml-source"> ~</span><span class="ocaml-source">job</span><span class="ocaml-source"> </span><span class="ocaml-source">commit</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">@@</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">fun</span><span class="ocaml-source"> </span><span class="ocaml-source">dir</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-source">write_all</span><span class="ocaml-source"> </span><span class="ocaml-source">job</span><span class="ocaml-source"> </span><span class="ocaml-source">dir</span><span class="ocaml-source"> </span><span class="ocaml-source">files</span><span class="ocaml-source"> </span><span class="ocaml-source">indexes</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">&gt;&gt;=</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">fun</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-unit">()</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-constant-language-capital-identifier">Lwt_result</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">bind</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-source">hugo</span><span class="ocaml-source"> ~</span><span class="ocaml-source">cwd</span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source">dir</span><span class="ocaml-source"> </span><span class="ocaml-source">job</span><span class="ocaml-source">)</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-keyword-other">fun</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-unit">()</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source">
</span><span class="ocaml-source">      </span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">f</span><span class="ocaml-source"> </span><span class="ocaml-source">cwd</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source">
</span><span class="ocaml-source">        </span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">commit</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Current_git</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-constant-language-capital-identifier">Commit</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">hash</span><span class="ocaml-source"> </span><span class="ocaml-source">commit</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">in</span><span class="ocaml-source">
</span><span class="ocaml-source">        </span><span class="ocaml-source">deploy_over_git</span><span class="ocaml-source"> ~</span><span class="ocaml-source">cwd</span><span class="ocaml-source"> ~</span><span class="ocaml-source">job</span><span class="ocaml-source"> ~</span><span class="ocaml-source">conf</span><span class="ocaml-source"> </span><span class="ocaml-source">dir</span><span class="ocaml-source"> </span><span class="ocaml-source">commit</span><span class="ocaml-source">
</span><span class="ocaml-source">      </span><span class="ocaml-keyword-other">in</span><span class="ocaml-source">
</span><span class="ocaml-source">      </span><span class="ocaml-constant-language-capital-identifier">Current</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-constant-language-capital-identifier">Process</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">with_tmpdir</span><span class="ocaml-source"> </span><span class="ocaml-source">f</span><span class="ocaml-source">)</span><span class="ocaml-source">
</span></code></pre>
<p>In this context, the pipeline creates an <code>indexes</code> file as <code>_index.md</code>, which Hugo uses to build the directory structure. This function uses the same <code>Current_git.with_checkout</code> process to create a temporary directory containing the website's skeleton. The deployment work is done in the <code>deploy_over_git</code> function, whose details are beyond the scope of this post. The component writes all the <code>File.t.content</code> to the destination specified in their metadata. Once they are successfully written, we generate the website with <code>hugo --minify --output-dir=public/</code>. Last but not least, we copy the content of the <code>public</code> directory to a fresh temporary one, so we can add the files with a <code>git init</code> and push our work to GitHub. Finally, on the target repository, GitHub Pages will deploy the website.</p>
<p>And voilà, our website is up to date and online!</p>
<h2>Happy Together</h2>
<p>This blog post has described how we handle our distributed documentation and centralise it on our website. We have seen how to use some <code>Current_*</code> plugins and how to write our own. It was also the occasion to speak about various OCurrent structures.</p>
<p>If you are curious, you can check the code in the <a href="https://github.com/ocurrent/ocurrent.org">ocurrent/ocurrent.org</a> repository. Feel free to look at the <a href="https://ocurrent.org">ocurrent.org</a> website built with this pipeline. The description of the pipeline is also available in the <a href="https://github.com/ocurrent/ocurrent.org/tree/master/bin">bin</a> directory.</p>
]]></description><link>https://tarides.com/blog/2022-10-20-up-to-date-online-documentation</link><guid isPermaLink="false">https://tarides.com/blog/2022-10-20-up-to-date-online-documentation.html</guid><dc:creator><![CDATA[ Étienne Marais ]]></dc:creator><pubDate>Thu, 20 Oct 2022 00:00:00 GMT</pubDate></item><item><title><![CDATA[Porting Charrua-Unix and Rawlink to Eio]]></title><description><![CDATA[<p>This article describes the porting of the DHCP daemon <code>charrua-unix</code> and its companion library <code>rawlink</code> to <a href="https://github.com/ocaml-multicore/eio">Eio</a> for the upcoming OCaml 5 release. Before we get started, it makes sense to briefly describe what DHCP is and how we use it in production.</p>
<h2>What is DHCP?</h2>
<p>DHCP stands for Dynamic Host Configuration Protocol, and it's described in <a href="https://www.rfc-editor.org/rfc/rfc2131.txt">RFC2131</a>, <a href="https://www.rfc-editor.org/rfc/rfc2132.txt">RFC2132</a>, and others. It was first published in 1993, so it's fairly old, yet very much alive in virtually every network these days, from your home, to your office, to your ISP's Wide Area Network.</p>
<p>When your computer, laptop, phone, or any IP-connected device boots up or changes network, it requests network parameters via broadcast. These parameters are requested and answered via the DHCP protocol. The common/minimum parameters a client requests are:</p>
<ul>
<li>An IPv4 address</li>
<li>An IPv4 gateway</li>
<li>The address of a DNS resolver</li>
</ul>
<p>This is enough to get connectivity in most networks. DHCP can also provide many extra parameters, but they are outside of the scope of this document.</p>
<h2>What is <code>charrua-dhcp</code>?</h2>
<p><code>charrua-dhcp</code> is a DHCP library suite written in pure OCaml. You might not know it, but if you have ever used Docker Desktop, be it on Windows or macOS, you're already a user of <code>charrua-dhcp</code>!</p>
<p>In Docker Desktop, a complete Linux VM is run in the background in order to be able to run Docker containers. This VM needs to acquire network parameters from the host operating system, and this is done via <code>charrua-dhcp</code>. You can check more details on how OCaml and <code>charrua</code> are used to power Docker Desktop in <a href="https://www.docker.com/blog/how-docker-desktop-networking-works-under-the-hood/">this article</a>.</p>
<p><code>charrua-dhcp</code> is also the standard DHCP implementation used in <a href="https://mirageos.org/">MirageOS</a>, both as a server and as a client, and perhaps more importantly, it is used in high-profile, critical cases, like the home network of yours truly. It is a stable and tested library that has been in use for years, and it has also been put to the test with <a href="https://github.com/stedolan/crowbar">Crowbar</a>. See more details in <a href="https://somerandomidiot.com/blog/2017/04/26/crowbar-dhcp/">this article</a> by Mindy Preston.</p>
<p><code>charrua-dhcp</code> is split into <code>charrua-core</code> and <code>charrua-unix</code>:</p>
<p><code>charrua-core</code> implements the DHCP server and client logic in pure OCaml, as well as providing serialisers and deserialisers for the protocol wire format. It also provides a textual configuration interface, like <a href="https://www.isc.org/dhcp/">ISC-DHCP</a> does.</p>
<p>When we say pure OCaml, we mean it! <code>charrua-core</code> is purely functional and doesn't produce anything via side-effects; therefore, it also does not perform any kind of I/O.</p>
<p><code>charrua-unix</code> implements the effectful bits: it does the I/O, feeding incoming packets to <code>charrua-core</code> and sending out the replies given by <code>charrua-core</code>.</p>
<p>The idea is that <code>charrua-core</code> has the complex DHCP logic, while <code>charrua-unix</code> does the basic things: logging, sending/receiving packets, making sure the environment is secure, and so on.</p>
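<p>To make this split concrete, here is a minimal sketch of a pure core in the same spirit. It is a toy model, not the <code>charrua-core</code> API: the types <code>db</code> and <code>outcome</code> and the function <code>input_pkt</code> are hypothetical names, a "packet" is reduced to a client MAC address, and a lease is reduced to an IP address string.</p>
<pre><code>(* Toy model, not the charrua-core API: all names here are hypothetical. *)
type db = {
  free : string list;               (* addresses still available *)
  leases : (string * string) list;  (* MAC address, leased address *)
}

type outcome =
  | Reply of string * db  (* address to offer, plus the updated database *)
  | Silence               (* nothing to send, database unchanged *)

(* Pure: no I/O and no mutation; the new state is returned, never stored. *)
let input_pkt db mac =
  match List.assoc_opt mac db.leases, db.free with
  | Some ip, _ -&gt; Reply (ip, db)
  | None, ip :: rest -&gt;
    Reply (ip, { free = rest; leases = (mac, ip) :: db.leases })
  | None, [] -&gt; Silence
</code></pre>
<p>Because a function like <code>input_pkt</code> only maps a state and a packet to a result, it can be exercised without any network access. The I/O layer decides what to do with a <code>Reply</code>.</p>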
<p>The name <code>charrua</code> is a reference to the seminomadic tribe Charrúa from what is today Uruguay, Argentina, and southern Brazil. The rationale is that DHCP serves parameters to roaming (nomadic) clients.</p>
<h2>What is <code>rawlink</code>?</h2>
<p>DHCP is not an IP protocol. It sits above the Ethernet layer, which means a DHCP application must be able to craft and receive the full Ethernet packet, not just the layers above IP.</p>
<p>Each operating system provides a slightly different mechanism to accomplish this. Linux provides a special socket family called AF_PACKET, whereas BSDs (OpenBSD, FreeBSD, macOS...) and most other Unix systems provide the same via BPF.</p>
<p><code>rawlink</code> is an OCaml library with C stubs that abstracts these differences away. You get a link on a network interface, which you use to craft and receive full Ethernet packets, bypassing most of the operating system network stack. In other words, <code>rawlink</code> allows you to work with <code>raw</code> packets on an Ethernet <code>link</code>.</p>
<p><img sizes="(min-width: 1360px) 1360px, (min-width: 680px) 680px, 100vw" srcset="/blog/images/charrua-170w~-_E2SdogSi0QGWp5KAEAtw.webp 170w, /blog/images/charrua-340w~t6nrfYR28hZkLF_R5MYoTA.webp 340w, /blog/images/charrua-680w~92yqzKqcWDPfZNEvElhg9g.webp 680w, /blog/images/charrua-1360w~-5xA-10vPkEjIEHAk6R3lQ.webp 1360w" src="/blog/images/charrua-1360w~-5xA-10vPkEjIEHAk6R3lQ.webp" alt="Architecture of Charrua and how it connects to the kernel via rawlink and syslog"></p>
<h2>What Changes in OCaml 5?</h2>
<p>OCaml 5 provides two main new features:</p>
<ul>
<li>Parallelism</li>
<li>Effect handlers</li>
</ul>
<p>Parallelism makes little sense on a slow, control protocol like DHCP, so we don't use it and it's not the focus of this article.</p>
<p>Effect handlers allow OCaml programs to write non-blocking code as <em>if</em> they were blocking.</p>
<p>Until OCaml 5 and effect handlers, the common way to write non-blocking code was through <a href="https://github.com/ocsigen/lwt">Lwt</a>, a concurrent programming library for OCaml. Lwt provides a concurrent scheduler and a monadic style of writing programs through promises. With it, the program becomes a long string of binding promises.</p>
<p>One issue with Lwt is that it's very "infectious," and as soon as you add the first Lwt promise (called "thread" in Lwt lingo), the whole code must now behave as a promise as well. Another issue is that the monadic programming is somewhat syntax heavy, so it can clutter the code. Since the promises are allocations themselves, it can also negatively affect performance. Lwt is a great library, but with OCaml 5 and effects we can do better.</p>
<p>With OCaml 5 and effect handlers we can have the best of both worlds. We can write non-blocking code in a blocking style without the monadic clutter imposed by Lwt. The library we are proposing to replace Lwt in OCaml 5 is <a href="https://github.com/ocaml-multicore/eio">Eio</a>, which takes full advantage of the effect system, as well as providing a framework to express parallelism.</p>
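<p>As a rough illustration of writing non-blocking code in a blocking style, the sketch below uses only the <code>Effect</code> module from the OCaml 5 standard library. The <code>Read_packet</code> effect and the toy handler are hypothetical and are not how Eio is implemented; a real scheduler would park the continuation until data is available instead of resuming it immediately.</p>
<pre><code>open Effect
open Effect.Deep

(* Hypothetical effect: "give me the next packet". *)
type _ Effect.t += Read_packet : string Effect.t

(* Direct-style code: it reads like two blocking calls, with no binds. *)
let read_two () =
  let first = perform Read_packet in
  let second = perform Read_packet in
  first ^ "+" ^ second

(* Toy handler: resumes at once with the next queued packet. A real
   scheduler would instead keep the continuation until data arrives. *)
let run_with_packets packets f =
  let queue = Queue.of_seq (List.to_seq packets) in
  try_with f ()
    { effc = (fun (type a) (eff : a Effect.t) -&gt;
        match eff with
        | Read_packet -&gt;
          Some (fun (k : (a, _) continuation) -&gt;
              continue k (Queue.pop queue))
        | _ -&gt; None) }

let () =
  print_endline (run_with_packets [ "discover"; "request" ] read_two)
</code></pre>
<p>The function <code>read_two</code> never mentions promises. The decision about when to resume it belongs entirely to the handler.</p>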
<h3>Lwt vs. Eio</h3>
<p>This code snippet is the main function of <code>charrua-unix</code>, using Eio (left) and Lwt (right). We can summarize what is happening as follows:</p>
<ol>
<li>We read a packet from the network.</li>
<li>We feed the packet to <code>charrua-core</code>, which then gives us a possible <code>Reply (reply, db)</code>, the packet to be sent out and the new DHCP database state, respectively.</li>
<li>We send the reply out and loop for more packets.</li>
</ol>
<p>It's fairly simple code, but it shows how much less cluttered the Eio version can be by removing all Lwt decorators. Another nice advantage is that if we were to write a blocking version of the same code, we would only need to change <code>Eio_rawlink.{read,write}_packet</code> to <code>Rawlink.{read,write}_packet</code>, as their signatures remain the same. This is impossible with Lwt.</p>
<p><img sizes="(min-width: 1360px) 1360px, (min-width: 680px) 680px, 100vw" srcset="/blog/images/code_Eio-170w~AA5FBLe-SbDgx10Vqb9eQw.webp 170w, /blog/images/code_Eio-340w~aSaNokC7Rfao6UU2cNykQA.webp 340w, /blog/images/code_Eio-680w~gc2OjKGVbrjQgwZ2sdQwRg.webp 680w, /blog/images/code_Eio-1360w~W7JviAIyN3rwEJI4VnkzRA.webp 1360w" src="/blog/images/code_Eio-1360w~W7JviAIyN3rwEJI4VnkzRA.webp" alt="Comparison of code size between eio and lwt"></p>
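<p>The point about identical signatures can be sketched with a loop that takes its I/O functions as arguments. This is not the code shown in the image: <code>serve</code>, <code>handle</code>, and the <code>outcome</code> type are hypothetical, and packets are plain strings. Any pair of functions with these types can be plugged in, whether they block the whole thread or only suspend the current fiber.</p>
<pre><code>(* Hypothetical stand-in for the result of the pure DHCP logic. *)
type 'db outcome = Reply of string * 'db | Silence

(* Direct-style loop: read a packet, feed it to the core, send the reply. *)
let rec serve ~read_packet ~write_packet ~handle db =
  match read_packet () with
  | None -&gt; db  (* input exhausted; a real daemon would loop forever *)
  | Some pkt -&gt;
    (match handle db pkt with
     | Reply (reply, db) -&gt;
       write_packet reply;
       serve ~read_packet ~write_packet ~handle db
     | Silence -&gt; serve ~read_packet ~write_packet ~handle db)

(* Usage with an in-memory queue standing in for the network. *)
let () =
  let input = Queue.of_seq (List.to_seq [ "a"; "b"; "c" ]) in
  let answered =
    serve 0
      ~read_packet:(fun () -&gt; Queue.take_opt input)
      ~write_packet:print_endline
      ~handle:(fun n pkt -&gt;
        if pkt = "b" then Silence else Reply ("ack " ^ pkt, n + 1))
  in
  Printf.printf "answered %d packets\n" answered
</code></pre>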
<h3><code>rawlink</code> and Eio Switches</h3>
<p><code>rawlink</code> uses a file descriptor that Eio knows nothing about, so in order to use Eio with it, we want to attach an <code>Eio.Flow.t</code> to the file descriptor. An <code>Eio.Flow.t</code> is an Eio abstraction of a bidirectional <code>socket</code>. Even though it was designed mostly with a <code>STREAM</code>-like <code>socket</code> in mind, its semantics fit the <code>rawlink</code> case. We do this in the <code>open_link</code> function:</p>
<pre><code><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">open_link</span><span class="ocaml-source"> </span><span class="ocaml-source">?</span><span class="ocaml-source">filter</span><span class="ocaml-source"> </span><span class="ocaml-variable-parameter-optional">?</span><span class="ocaml-source">(</span><span class="ocaml-variable-parameter-optional">promisc</span><span class="ocaml-keyword-operator">=</span><span class="ocaml-constant-language-boolean">false</span><span class="ocaml-source">)</span><span class="ocaml-source"> </span><span class="ocaml-source">ifname</span><span class="ocaml-source"> ~</span><span class="ocaml-source">sw</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">fd</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Rawlink_lowlevel</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">opensock</span><span class="ocaml-source"> ?</span><span class="ocaml-source">filter</span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source">filter</span><span class="ocaml-source"> ~</span><span class="ocaml-source">promisc</span><span class="ocaml-source"> </span><span class="ocaml-source">ifname</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">in</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">flow</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Eio_unix</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-constant-language-capital-identifier">FD</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">as_socket</span><span class="ocaml-source"> ~</span><span class="ocaml-source">sw</span><span class="ocaml-source"> ~</span><span class="ocaml-source">close_unix</span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-constant-language-boolean">true</span><span class="ocaml-source"> </span><span class="ocaml-source">fd</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">in</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-source">{</span><span class="ocaml-source"> </span><span class="ocaml-source">flow</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source"> </span><span class="ocaml-source">fd</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source"> </span><span class="ocaml-source">packets</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-source">ref</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-list">[]</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source"> </span><span class="ocaml-source">buffer</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-constant-language-capital-identifier">Cstruct</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">create</span><span class="ocaml-source"> </span><span class="ocaml-constant-numeric-decimal-integer">65536</span><span class="ocaml-source">)</span><span class="ocaml-source"> </span><span class="ocaml-source">}</span><span class="ocaml-source">
</span></code></pre>
<p><code>Rawlink_lowlevel.opensock</code> is a call into the actual C stub that returns a BPF or AF_PACKET descriptor. We then create the <code>Eio.Flow.t</code> with <code>Eio_unix.FD.as_socket</code>.</p>
<p>Two things appear out of the ordinary in the flow creation call: the <code>sw</code> (<code>Eio.Switch.t</code>) and <code>close_unix</code> arguments. To make sense of them, we have to understand what an <code>Eio.Switch.t</code> is.</p>
<p>A long-standing issue with Lwt was "how to make sure my file descriptors are not leaked if something goes wrong." Eio attempts to solve this by forcing each <code>Eio.Flow.t</code> to belong to an <code>Eio.Switch.t</code>. You can't create an <code>Eio.Flow.t</code> without giving it an <code>Eio.Switch.t</code>, which is why <code>Eio_unix.FD.as_socket</code> takes one. Since <code>Flows</code> are also attached to normal file descriptors, <code>Eio.Switch.t</code> also takes care of them.</p>
<p>An Eio program creates one or more <code>Eio.Switch.t</code> values and attaches each <code>Eio.Flow.t</code> to one of them. An <code>Eio.Switch.t</code> can also be nested, creating a tree-like structure, as every new <code>Eio.Switch.t</code> becomes a child of its parent <code>Eio.Switch.t</code>. When an <code>Eio.Switch.t</code> terminates, either successfully or by some exception, all of its child <code>Eio.Flow.t</code> values are also terminated, automatically closing the file descriptor and guaranteeing we don't have a descriptor leak.</p>
<p><code>close_unix</code> tells Eio to call <code>close(2)</code> when the <code>Eio.Switch.t</code> terminates.</p>
<p>Imagine a TCP server where each client has at least one dedicated <code>Eio.Switch.t</code>, and some of these clients create additional <code>Eio.Switch.t</code> to handle a specific unit of work:</p>
<p><img sizes="(min-width: 1360px) 1360px, (min-width: 680px) 680px, 100vw" srcset="/blog/images/switch-170w~4mT1pE9s-9Ki3Eu1f4g-pw.webp 170w, /blog/images/switch-340w~z7RAzamCSNy_A8OPPZ61Gg.webp 340w, /blog/images/switch-680w~LwXV9IXZ159B7iFiq_2CnA.webp 680w, /blog/images/switch-1360w~-RDe42RINq24vKGkG0Epcw.webp 1360w" src="/blog/images/switch-1360w~-RDe42RINq24vKGkG0Epcw.webp" alt="Example of connected switches"></p>
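<p>A small sketch of this clean-up behaviour, assuming the <code>eio_main</code> package and a recent Eio release: it uses <code>Eio.Switch.run</code> and <code>Eio.Switch.on_release</code>, and the release hooks stand in for the descriptors that would be closed. The messages and the simulated failure are invented for illustration.</p>
<pre><code>(* Requires the eio_main package. The hooks stand in for closing descriptors. *)
let () =
  Eio_main.run @@ fun _env -&gt;
  let events = ref [] in
  let note msg = events := msg :: !events in
  (try
     Eio.Switch.run @@ fun outer -&gt;
     Eio.Switch.on_release outer (fun () -&gt; note "outer released");
     (* A nested switch for one unit of work, as in the diagram above. *)
     Eio.Switch.run (fun inner -&gt;
         Eio.Switch.on_release inner (fun () -&gt; note "inner released");
         failwith "client handler crashed")
   with Failure msg -&gt; note ("caught: " ^ msg));
  List.iter print_endline (List.rev !events)
</code></pre>
<p>Both release hooks run before the exception reaches the handler, so no clean-up code is needed at the point of failure.</p>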
<h2>Conclusion</h2>
<p>Both Lwt and Eio provide means to achieve concurrency, but they only provide parallelism with <code>Domains</code>. Lwt uses monadic-style promises to achieve concurrency, which pollutes the code and makes it harder to reason about. Eio makes full use of the new effect handlers and Domains of OCaml 5, providing concurrency and parallelism while maintaining the same programming style as synchronous blocking programs.</p>
<p>Eio is a library that aims to replace Lwt, but with a more modern style and feature set. It provides abstractions for sockets, fibers, streams, flows, and more.</p>
<p>To review, <code>charrua-unix</code> is a feature-packed, yet simple DHCP server implementation for Unix systems based on the OCaml library <code>charrua-core</code>. <code>rawlink</code> makes it possible to read and craft Ethernet packets on most Unix-like systems through an easy-to-use library.</p>
<p>It's relatively easy to port <code>rawlink</code> to Eio by attaching an Eio abstraction of a bidirectional socket, namely <code>Eio.Flow.t</code>, to the file descriptor.</p>
<p>We hope you enjoyed this article and found it helpful. As always, if there are any questions or concerns, feel free to <a href="/contact/">reach out</a>.</p>
]]></description><link>https://tarides.com/blog/2022-10-19-porting-charrua-unix-and-rawlink-to-eio</link><guid isPermaLink="false">https://tarides.com/blog/2022-10-19-porting-charrua-unix-and-rawlink-to-eio.html</guid><dc:creator><![CDATA[ Christiano Haesbaert ]]></dc:creator><pubDate>Wed, 19 Oct 2022 00:00:00 GMT</pubDate></item><item><title><![CDATA[OCaml's Platform Installer Alpha Release]]></title><description><![CDATA[<p>Yesterday we announced the <a href="/blog/2022-10-17-ocaml-5-beta-release/">OCaml 5 beta release</a>, and today we're excited to introduce the OCaml Platform Installer! The <a href="https://ocaml.org/docs/platform">OCaml Platform</a> is the recommended toolchain when working with OCaml. This new installer enables programmers to quickly set up the OCaml developer environment, so they don't need to waste precious coding time with a lengthy installation process. If you come across any obstacles, the Platform team encourages you to open a <a href="https://github.com/tarides/ocaml-platform-installer/issues">GitHub Issue</a>.</p>
<p>We've also updated the state of the Platform, making several important changes like promoting <code>odoc</code> and OCamlformat from Incubate to Active. We have <a href="https://discuss.ocaml.org/t/ann-ocaml-platform-installer-alpha-release/10652">notified the OCaml Community</a> about the Platform Installer's alpha release, so you can read about all the new changes and the simple installation process on the <a href="https://discuss.ocaml.org/t/ann-ocaml-platform-installer-alpha-release/10652">OCaml Discuss post</a>.</p>
<p>Stay tuned to our blog as well as our <a href="https://bsky.app/profile/tarides.com">Bluesky feed</a> to get the latest updates on the OCaml 5 release!</p>
]]></description><link>https://tarides.com/blog/2022-10-18-ocaml-s-platform-installer-alpha-release</link><guid isPermaLink="false">https://tarides.com/blog/2022-10-18-ocaml-s-platform-installer-alpha-release.html</guid><dc:creator><![CDATA[ Christine Rose ]]></dc:creator><pubDate>Tue, 18 Oct 2022 00:00:00 GMT</pubDate></item><item><title><![CDATA[OCaml 5 Beta Release]]></title><description><![CDATA[<p>Back in June, we announced the <a href="/blog/2022-06-15-ocaml-5-alpha-release/">OCaml 5 alpha release</a>, and today we're excited to announce <a href="https://discuss.ocaml.org/t/ocaml-5-0-0-first-beta-release/10623">the first beta release</a>! Now is an excellent time to test it and report positive or negative feedback on your projects (e.g., did it work, did you see an impressive performance speedup, did you have issues finding documentation).</p>
<p>This beta version stabilised several <a href="https://opam.ocaml.org/">opam</a> packages, fixed several small internal runtime processes (especially the <code>systhreads</code> library), and tweaked the Domain and Effect interface, just to name a few improvements. This version also enables you to update your libraries and software. See the <a href="https://discuss.ocaml.org/t/ocaml-5-0-0-first-beta-release/10623">post on the OCaml Discuss forum</a> for installation instructions and more information. While you're there, join the growing and vibrant OCaml community!</p>
<p>The full OCaml release is expected by the end of the year. Just in time for Christmas! Perhaps more importantly, in time for the new <a href="https://adventofcode.com/">Advent of Code calendar</a>, so you can play around with OCaml 5 with Multicore support.</p>
]]></description><link>https://tarides.com/blog/2022-10-17-ocaml-5-beta-release</link><guid isPermaLink="false">https://tarides.com/blog/2022-10-17-ocaml-5-beta-release.html</guid><dc:creator><![CDATA[ Christine Rose ]]></dc:creator><pubDate>Mon, 17 Oct 2022 00:00:00 GMT</pubDate></item><item><title><![CDATA[Real World OCaml Book Giveaway!]]></title><description><![CDATA[<p><em>Real World OCaml is a fantastic book on OCaml and functional programming – a great resource for beginners and experienced users alike. At Tarides, we want to support new learners of OCaml as much as we can, making it easier for people to become part of the vibrant community surrounding the language.</em>
Tarides is proud to announce that we are sponsoring the Gold Open Access release of <a href="https://www.cambridge.org/core/books/real-world-ocaml-functional-programming-for-the-masses/052E4BCCB09D56A0FE875DD81B1ED571"><em>Real World OCaml</em>, 2nd Edition</a> by Yaron Minsky and Anil Madhavapeddy! It’s published by Cambridge University Press, and Tarides is making it possible for everyone to download the book to their local device. You can also receive free copies of the book (see below).</p>
<h2>Accessible to Everyone and Free Copies</h2>
<p>Since its first release in 2013, the book has been a fantastic resource for members across the community. On the Open Access release, its authors said, “As long-standing members of the open-source community, we are excited to see our work made more accessible for all users of OCaml.”</p>
<p>The authors and publisher have generously agreed to give away some physical copies of <em>Real World OCaml</em>. They really want to reward the amazing and active members of the community who are so engaged with the book. That’s why <em>ten people</em> who get a PR merged with a suggested improvement to the book on <a href="https://github.com/realworldocaml/book">GitHub</a> will receive a <em>free copy</em> of <em>Real World OCaml!</em> Just email rwo@tarides.com when your PR has been merged, and we'll let you know if you're one of the lucky 10 who receive a book!</p>
<h2>Gold Open Access</h2>
<p>Until recently, you could read <em>Real World OCaml</em> online, but for offline access, you had to rely on the printed version. With Gold Open Access, you can now download a PDF to your local device for easy access at any time.</p>
<p>Whilst this expanded access will benefit everyone, one group that we are particularly excited to support are new learners. OCaml 5.0 is right around the corner, and we want to make the entry to OCaml for new users as easy as possible. Making the book more accessible also aligns with Tarides’s inclusivity goals; lowering the barriers to entry into the community will encourage greater participation among people who otherwise would not have had the means to join.</p>
<p>On this topic, David Tranah, the editorial director of Mathematical Sciences and Information Technology at Cambridge University Press, shares his unique insight into the benefits of Open Access: “Gold Open Access publishing allows anyone, anywhere, who can connect to the internet to stay up-to-date on the latest research. This in turn drives innovation and leads to new discoveries.”</p>
<p>We of course encourage anyone with the means to purchase a physical copy of the book, as it’s the result of a lot of hard work and dedication, and it comes in a beautifully printed physical edition.</p>
<h2>Real World OCaml</h2>
<p>The first version of Real World OCaml was written by Yaron Minsky, Anil Madhavapeddy, and Jason Hickey. Since its release, several contributors have improved on the original text, adding new examples, correcting errors, and expanding on chapters. The second edition was published in 2021 by Anil and Yaron and includes the most recent improvements and changes for an updated version of the book.</p>
<p>The book itself covers several aspects of OCaml, from fundamental concepts like functors and objects, to different tools and techniques, including the OCaml Platform and JSON, as well as a section on the compiler and runtime system. It takes its reader on a journey through OCaml moving from basics to increasingly advanced topics, making it the perfect companion for anyone regardless of their level of OCaml.</p>
<p>Over the years, Tarides has supported Real World OCaml by contributing to the book’s tooling infrastructure. Some of that work has transformed into <a href="https://github.com/realworldocaml/mdx">standalone community projects like MDX</a> that help to improve all OCaml documentation.</p>
<p>The book also has a mutually beneficial relationship with OCaml.org, as Thibaut Mattio (currently leading the <a href="/blog/2022-05-02-ocaml-org-reboot-user-centric-design-content/">community redesign effort</a> of <a href="https://ocaml.org">OCaml.org</a>) explains: “There are crosslinks between <em>Real World OCaml</em> and V3 of OCaml.org. The new package documentation site is a great accompaniment to Real World OCaml, and there are multiple links to it embedded within the book for API documentation.” This is just one example showing how <em>Real World OCaml</em> is used in projects and interacting with new content in a productive and useful way.</p>
<h2>About the Authors</h2>
<p>Anil Madhavapeddy is Professor of Planetary Computing at the University of Cambridge and a fellow of Pembroke College. He has a wide range of experience, having worked in industry (NetApp, Citrix, Intel), academia (Cambridge, Imperial, UCLA), and open source (OCaml, OpenBSD, Xen, Docker).</p>
<p>Yaron Minsky joined Jane Street in 2003 and is to thank for introducing the company to OCaml. After founding the firm’s quantitative research group, he managed the transition of all its core infrastructure to OCaml, ultimately making it the world’s largest industrial user of OCaml. Minsky has also been an avid lecturer, blogger, and writer on the topic of programming, publishing articles in <em>Communications of the ACM</em> and the <em>Journal of Functional Programming.</em></p>
<p>Summing up their thoughts on the Open Access upgrade, Anil and Yaron say: “Open Access has been shown to encourage the usage of a particular work, resulting in increased citations and public engagement. We are excited for what this move will mean when it comes to greater accessibility for users of OCaml worldwide: making it easy to use excerpts from our book in new projects, encouraging new learners, and supporting teachers in their work.”</p>
]]></description><link>https://tarides.com/blog/2022-10-14-real-world-ocaml-book-giveaway</link><guid isPermaLink="false">https://tarides.com/blog/2022-10-14-real-world-ocaml-book-giveaway.html</guid><dc:creator><![CDATA[ Isabella Leandersson ]]></dc:creator><pubDate>Fri, 14 Oct 2022 00:00:00 GMT</pubDate></item><item><title><![CDATA[8 OCaml Libraries to Make Your Life Easier]]></title><description><![CDATA[<p>OCaml is a statically-typed programming language that emphasizes readability, programmer efficiency, and semantic clarity. This powerful and efficient language has been gaining popularity among developers. The growing adoption of OCaml is because it's fast, type safe, and secure. It can be used in industries where performance and security matter, like <a href="https://ocaml.org/success-stories/sensor-analytics-and-automation-platform-for-sustainable-agriculture">IoT</a>, <a href="https://ocaml.org/success-stories/peta-byte-scale-web-crawler">Data Analytics</a>, or <a href="https://ocaml.org/success-stories/large-scale-trading-system">financial services</a>.</p>
<p>Like any other programming language, there are numerous libraries available for OCaml that can make your life easier as a developer. In this article, we will explore some of the top OCaml libraries that will help streamline your workflow and boost your productivity as a programmer.</p>
<p>All these libraries and tools are open source, so they’re distributed under a free software licence.</p>
<h2>A Few Helpful OCaml Libraries</h2>
<h3><strong><code>Lwt</code></strong></h3>
<p><a href="https://github.com/ocsigen/lwt">Lwt</a> is a library for writing asynchronous code. It provides helpful abstractions such as promises, which makes it very useful for writing network applications like web servers. That said, with the forthcoming release of OCaml 5.0, the <a href="https://ocaml.org/p/eio/0.5">Eio library</a> might become preferable!</p>
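<p>As a small taste of promise-based code, the sketch below assumes the <code>lwt</code> and <code>lwt.unix</code> libraries and uses <code>Lwt.bind</code>, <code>Lwt.return</code>, <code>Lwt_unix.sleep</code>, and <code>Lwt_main.run</code>. The helper names <code>delayed</code> and <code>sum_concurrently</code> are made up for this example.</p>
<pre><code>(* Requires the lwt and lwt.unix libraries. *)

(* A promise that resolves to [value] after [seconds]. *)
let delayed seconds value =
  Lwt.bind (Lwt_unix.sleep seconds) (fun () -&gt; Lwt.return value)

(* Both promises are created first, so the two waits overlap. *)
let sum_concurrently () =
  let a = delayed 0.02 40 in
  let b = delayed 0.01 2 in
  Lwt.bind a (fun x -&gt; Lwt.bind b (fun y -&gt; Lwt.return (x + y)))

let () = Printf.printf "%d\n" (Lwt_main.run (sum_concurrently ()))
</code></pre>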
<h3><strong>Dream</strong></h3>
<p>Dream is described as a "tidy, feature-complete Web framework" on <a href="https://ocaml.org/p/dream/1.0.0~alpha4">OCaml.org</a>. Dream has a simple programming model where web apps are merely functions, and it supports TLS, WebSockets, and GraphQL. Plus it has cryptography helpers! It's easy-to-use and documented <a href="https://aantron.github.io/dream/">all in one place</a>. The entire Dream API is available there, where you can also find many examples.</p>
<h3><strong><code>Cmdliner</code></strong></h3>
<p><a href="https://ocaml.org/p/cmdliner/1.1.0/doc/index.html">This library</a> is used by several packages to build command line tools, which is beneficial when you want to write an executable in OCaml. <code>Cmdliner</code> gives programmers a simple, compositional method for turning command line arguments into OCaml values. Not only can you then pass those values to functions, <code>Cmdliner</code> can automatically handle syntax errors, help messages, and UNIX man page generation as well.</p>
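<p>A minimal sketch of that compositional style, assuming the <code>Cmd</code>, <code>Term</code>, and <code>Arg</code> modules of <code>Cmdliner</code> 1.1 or later. The <code>greet</code> command and its two options are invented for illustration.</p>
<pre><code>(* Requires the cmdliner package, version 1.1 or later. *)
open Cmdliner

(* An ordinary OCaml function; it knows nothing about the command line. *)
let greet name count =
  for _ = 1 to count do
    Printf.printf "Hello, %s!\n" name
  done

(* Each argument is a term that yields a typed OCaml value. *)
let name =
  let doc = "Name to greet." in
  Arg.(value &amp; opt string "world" &amp; info [ "n"; "name" ] ~docv:"NAME" ~doc)

let count =
  let doc = "Number of greetings to print." in
  Arg.(value &amp; opt int 1 &amp; info [ "c"; "count" ] ~docv:"COUNT" ~doc)

(* Terms compose by function application. *)
let cmd =
  let info = Cmd.info "greet" ~doc:"Print a greeting" in
  Cmd.v info Term.(const greet $ name $ count)

(* Returns an exit code; [argv] defaults to the process arguments. *)
let run ?argv () = Cmd.eval ?argv cmd
</code></pre>
<p>In an executable, the entry point would be <code>let () = exit (run ())</code>. Passing <code>--help</code> then prints a generated help page, and passing a non-integer to <code>--count</code> is reported as a usage error.</p>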
<h3><strong>Alcotest</strong></h3>
<p>This colorful framework performs simple unit tests on a simple interface. <a href="https://github.com/mirage/alcotest">Alcotest</a> only displays faulty runs at the end of the output, along with full logs for your inspection. The straightforward, expressive query language makes it easy to select which tests to run, and the results are displayed in a fun rainbow of colors.</p>
<h3><strong><code>base</code></strong></h3>
<p>Although the <a href="https://dev.realworldocaml.org/prologue.html#the-core-standard-library"><em>standard library</em></a> is somewhat minimalist, multiple extensions exist to make programmers' lives easier. For instance, <code>base</code>, created and maintained by Jane Street, is used to develop critical applications by industrial users. <code>base</code> is written in pure OCaml and has no dependencies other than the OCaml standard library. The <code>base</code> library is useful for building many applications. Read more about <code>base</code> in the book <a href="https://dev.realworldocaml.org/prologue.html#the-core-standard-library"><em>Real World OCaml</em></a>.</p>
<h3><strong>Yojson</strong></h3>
<p><a href="https://github.com/ocaml-community/yojson">Yojson</a> is an OCaml library for creating and reading JSON, a data format that is commonly used in web applications. With Yojson, you can easily generate and parse JSON data in OCaml.</p>
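<p>For instance, a short round trip might look like the sketch below. It assumes the <code>yojson</code> package and uses <code>Yojson.Safe.from_string</code>, <code>Yojson.Safe.to_string</code>, and the accessors in <code>Yojson.Safe.Util</code>. The JSON shape and the helper names <code>parse_service</code> and <code>render_service</code> are invented for the example.</p>
<pre><code>(* Requires the yojson package. *)

(* Parse a JSON object into a plain OCaml pair. *)
let parse_service text =
  let json = Yojson.Safe.from_string text in
  let open Yojson.Safe.Util in
  let name = json |&gt; member "name" |&gt; to_string in
  let ports = json |&gt; member "ports" |&gt; to_list |&gt; List.map to_int in
  (name, ports)

(* Build a JSON value from OCaml data and serialise it. *)
let render_service (name, ports) =
  Yojson.Safe.to_string
    (`Assoc
       [ ("name", `String name);
         ("ports", `List (List.map (fun p -&gt; `Int p) ports)) ])

let () =
  let service = parse_service {|{"name": "dhcp", "ports": [67, 68]}|} in
  print_endline (render_service service)
</code></pre>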
<h3><strong>Notty</strong></h3>
<p>This interesting <a href="https://ocaml.org/p/notty/0.2.3">OCaml library</a> enables the user to write declarative terminal UIs. Notty is based on a notion of composable images, and it delivers a simpler and more expressive model than basic terminal programming. Engineers know that programming terminals is tedious, so Notty makes it enjoyable!</p>
<h3><strong><code>ppxlib</code></strong></h3>
<p>PreProcessor eXtensions, or PPX for short, are used for meta-programming, like for generating boilerplate or for extending the OCaml syntax. PPX act on the AST and are integrated into the language via two AST features: <a href="https://ocaml.org/manual/attributes.html">attributes</a> and <a href="https://ocaml.org/manual/extensionnodes.html">extension nodes</a>. <a href="https://github.com/ocaml-ppx/ppxlib"><code>ppxlib</code></a> is a set of tools and libraries that enables programmers both to write and use PPX. See <a href="/blog/2019-05-09-an-introduction-to-ocaml-ppx-ecosystem/">this Tarides blog post</a> on how to write PPX and <a href="https://ocaml.org/docs/metaprogramming">this official guide</a> on how to use PPX.</p>
<h2>Conclusion</h2>
<p>The tools available in OCaml make it easy to prototype new applications and build production-quality software. In fact, the full release of OCaml 5.0 with Multicore support is on the horizon, and <a href="/blog/2022-06-15-ocaml-5-alpha-release/">the alpha version has already been released</a>.</p>
<p>OCaml libraries help you write beautiful, elegant code in this powerful and versatile language. There has never been a better time to give OCaml a try, and now you know there are beneficial libraries to help you code. The libraries covered in this article are just a few examples. For more information, please visit <a href="https://ocaml.org/packages">ocaml.org</a>.</p>
]]></description><link>https://tarides.com/blog/2022-10-12-8-ocaml-libraries-to-make-your-life-easier</link><guid isPermaLink="false">https://tarides.com/blog/2022-10-12-8-ocaml-libraries-to-make-your-life-easier.html</guid><dc:creator><![CDATA[ Christine Rose ]]></dc:creator><pubDate>Wed, 12 Oct 2022 00:00:00 GMT</pubDate></item><item><title><![CDATA[ICFP 2022 Review]]></title><description><![CDATA[<p>After two years of online conferences, it was fantastic to have ICFP 2022 in person. The conference organisers had done a fantastic job adjusting to online conferences, but nothing beats the hallway track for meeting new people and catching up with old friends. This year, Slovenia's capital hosted the event. Ljubljana was a beautiful city to visit, with plenty of classic European architecture and even a castle to explore.</p>
<p>My conference schedule was packed. I had a talk to present on Friday and five preceding days of conference talks to attend. The first three days were a whirlwind of talks (some on the edge of my understanding) and hallway track conversations.</p>
<p><img sizes="(min-width: 1360px) 1360px, (min-width: 680px) 680px, 100vw" srcset="/blog/images/OCamlMoon-170w~v0lsQBmZoZaPJImAPaKsPA.webp 170w, /blog/images/OCamlMoon-340w~IctexPWI1UIKSPCmAuxDcg.webp 340w, /blog/images/OCamlMoon-680w~sdx09BI_ALMBXPJixqc2fw.webp 680w, /blog/images/OCamlMoon-1360w~XPbShSBzfAyZjImV4wPAfA.webp 1360w" src="/blog/images/OCamlMoon-1360w~XPbShSBzfAyZjImV4wPAfA.webp" alt="OCaml's Trajectory">
<em><a href="https://twitter.com/yminsky/status/1569956010483220481?s=20&amp;t=Fp12V9v11Xp2kMP0TmOhoQ">Image by Yaron Minsky of Jane Street</a></em></p>
<h2>OCaml Reaches for the Stars</h2>
<p>Although it happened mid-week, I want to start with KC Sivaramakrishnan's keynote, <a href="https://icfp22.sigplan.org/details/ocaml-2022-papers/16/OCaml-5-0-Concurrent-and-Parallel-programming-for-OCaml">Retrofitting Concurrency – Lessons from the Engine Room</a>, as it was definitely the highlight of ICFP. He covered the full story of introducing parallelism and concurrency into OCaml, along with references to papers published along the way. <a href="https://youtu.be/zJ4G0TKwzVc">Watch the video of his keynote</a>. The "Where do we go from here?" slide ties together the effect system with targeting JavaScript, modal types to avoid heap allocations, unboxed types to control memory layout, and Flambda2 for aggressive compiler optimisation. This is hugely exciting for the OCaml community. It brought together many threads of work into a coherent picture.</p>
<iframe width="560" height="315" src="https://www.youtube.com/embed/zJ4G0TKwzVc" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen=""></iframe>
<p>There was a real buzz afterwards from both OCaml and Haskell people I talked with. It set the stage for the ML Workshop on Thursday, which included <a href="https://icfp22.sigplan.org/details/mlfamilyworkshop-2022-papers/14/Efficient-and-Scalable-Parallel-Functional-Programming-Through-Disentanglement">Efficient and Scalable Parallel Functional Programming Through Disentanglement</a>, <a href="https://icfp22.sigplan.org/details/mlfamilyworkshop-2022-papers/13/Unboxed-types-for-OCaml">Unboxed Types for OCaml</a>, <a href="https://icfp22.sigplan.org/details/mlfamilyworkshop-2022-papers/10/Module-Shapes-for-Modern-Tooling">Module Shapes for Modern Tooling</a>, <a href="https://icfp22.sigplan.org/details/ocaml-2022-papers/9/Stack-allocation-for-OCaml">Stack Allocation for OCaml</a>, and <a href="https://icfp22.sigplan.org/details/mlfamilyworkshop-2022-papers/4/Boxroot-fast-movable-GC-roots-for-a-better-FFI">Boxroot, Fast Movable GC Roots for a Better FFI</a>, picking up themes from the keynote. The full playlist for the ML Workshop is <a href="https://www.youtube.com/playlist?list=PLyrlk8Xaylp7f8T7L5SFFwOS5_c5d1Jyq">available on YouTube</a>.</p>
<p><img sizes="(min-width: 1360px) 1360px, (min-width: 680px) 680px, 100vw" srcset="/blog/images/sudhaICFP-170w~udBj6-dKJfleqxS10i8Pjw.webp 170w, /blog/images/sudhaICFP-340w~Q4ve5Zb8cHdiIXZZj0NyJA.webp 340w, /blog/images/sudhaICFP-680w~0qAjCWSFKvXQieR7XbY5Tw.webp 680w, /blog/images/sudhaICFP-1360w~awFJH9O1yYFeD5dKDvZw1A.webp 1360w" src="/blog/images/sudhaICFP-1360w~awFJH9O1yYFeD5dKDvZw1A.webp" alt="Sudha OCaml Workshop"></p>
<p>Thursday also featured the <a href="https://icfp22.sigplan.org/details/icfp-2022-tutorials/1/OCaml-5-for-the-working-programmer">OCaml 5.0 for the Working Programmer</a> tutorial, presented by <a href="https://twitter.com/tarides_/status/1570346706448879617">my colleague Sudha Parimala</a> (above) and Marek Kubica at Tarides. There are <a href="https://github.com/Sudha247/ocaml5-tutorial-icfp-22">slides and exercises</a> to work through to help you understand effects and parallelism in OCaml.</p>
<h2>Friday: A Full Day of OCaml</h2>
<p>The final day was dedicated to OCaml Workshops. KC kicked off the first session with a keynote on the "here and now" of OCaml 5.0. His presentation addressed developers' frequently asked questions when moving from sequential OCaml 4 to 5.0, discussed the details of the merge process, and included a deep dive of the developments since the merge of Multicore OCaml early this year. His talk concluded with a call to action, encouraging OCaml developers to start migrating to OCaml 5.0, even if they do not immediately plan to use the new concurrency and parallelism features.</p>
<p>The keynote was followed by Jan Midtgaard, Principal Engineer at Tarides, talking about a number of testing techniques and tools for concurrent programs that Tarides has developed for OCaml 5.0. This was followed by Deepali Ande, KC's student at IIT Madras, presenting a novel way to enable different schedulers written using effect handlers to communicate with each other.</p>
<p>These talks ended up being so popular that there was standing room only, so during the coffee break after the first session, the ICFP organisers announced to everyone that the remaining OCaml presentations would be in "the fanciest theatre ever," an amphitheatre known as the Štih Room (below).</p>
<p><img sizes="(min-width: 1360px) 1360px, (min-width: 680px) 680px, 100vw" srcset="/blog/images/audience-170w~cJ9ubQ_9jsUbC34JHZuhIw.webp 170w, /blog/images/audience-340w~hhG6Nm-NAWUgFLkJCe_XTA.webp 340w, /blog/images/audience-680w~GeCop1Ilu3g40Y6D0gQcdg.webp 680w, /blog/images/audience-1360w~yLzqCC12jqw0ZayDRwcZdQ.webp 1360w" src="/blog/images/audience-1360w~yLzqCC12jqw0ZayDRwcZdQ.webp" alt="Full Ampitheatre OCaml"></p>
<p>I presented our collective work on bringing OBuilder to non-Linux platforms. OBuilder is the underlying library responsible for providing sandboxed build environments for <a href="https://ci.ocamllabs.io/">ocaml-ci</a>, <a href="https://opam.ci.ocaml.org/">opam-repo-ci</a>, and <a href="https://check.ocamllabs.io">opam-healthcheck</a>. The talk covered the architecture of OBuilder, showing how it gets used in our multi-architecture cluster of build machines. I also reviewed the Linux implementation that uses native Linux containerisation technology, like runC and cgroups. Then, moving on to the implementation on macOS, I demonstrated using user isolation to provide sandboxing with some file system tricks, followed by the implementation on Windows using Docker for Windows, which during testing found a number of interesting bugs in Lwt and GNU Tar. The full details are available in the <a href="https://github.com/tmcgilchrist/ocaml-2022-submission/blob/master/ocurrent.pdf">Extended Abstract</a> and on GitHub <a href="https://github.com/ocurrent/obuilder/">https://github.com/ocurrent/obuilder/</a>.</p>
<p>The OCaml Workshop also featured an impressive back-to-back presentation from David Allsopp on opam's CLI compatibility work and the upcoming opam 2.2 features. He also covered how to make the OCaml compiler relocatable and how that will allow for fast switch creation in opam. Congratulations to David for making such a polished and well-received double show!</p>
<p><img sizes="(min-width: 1360px) 1360px, (min-width: 680px) 680px, 100vw" srcset="/blog/images/ICFP-170w~P4mjUkdbZFIn1PAMQ9tBvw.webp 170w, /blog/images/ICFP-340w~7aR62yjXwJJrMfyE5iNBlg.webp 340w, /blog/images/ICFP-680w~MwYqMu-EbZsvmJD6XzdGHg.webp 680w, /blog/images/ICFP-1360w~tAUXuy0Zm2ZQzf4amdyKPw.webp 1360w" src="/blog/images/ICFP-1360w~tAUXuy0Zm2ZQzf4amdyKPw.webp" alt="ICFP"></p>
<h2>Functional Programming Development</h2>
<p>Although OCaml certainly stole the show after <a href="https://youtu.be/zJ4G0TKwzVc">KC's keynote</a>, earlier in the week I attended several fascinating workshops, a few of which I outline below. There was also a strong theme of Effects talks throughout the third day of ICFP, which I plan to come back to and review the papers in more detail. Below are some highlights, but the full Haskell Implementors Workshop (HIW) is <a href="https://www.youtube.com/playlist?list=PLyrlk8Xaylp4kkqJltshENjF_SL7-fDTn">available on YouTube</a>.</p>
<h3>Haskell</h3>
<p>On Sunday, I spent my day in the Haskell Implementors Workshop, which kicked off with the "State of GHC" talk from Simon Peyton Jones. Simon covered all the new and important developments in 9.4 and <a href="https://gitlab.haskell.org/ghc/ghc/-/wikis/status/ghc-9.6">9.6</a>. For me, the complete overhaul of <a href="https://www.haskell.org/ghc/blog/20220807-ghc-9.4.1-released.html">GHC’s Windows support</a>, including many fixes in WinIO, refactoring of GHC's error messages, and the ongoing work to upstream GHCJS and WebAssembly backends into GHC, were the most exciting changes. The other takeaway was how Cabal development has been overhauled with a new team managing the project and the resulting acceleration of improvements making their way into Cabal, starting from Cabal 3.6. That, combined with the significantly improved Haskell LSP support, means Haskell tooling is in a great place.</p>
<h3>GHC &amp; Racket</h3>
<p>Alexis King's talk, "<a href="https://icfp22.sigplan.org/details/hiw-2022/6/A-look-across-the-pond-a-comparison-between-GHC-and-Racket-compilation-models">A Look Across the Pond: A Comparison Between GHC and Racket Compilation Models</a>," was a great highlight on how Racket tooling works and how Cabal could be further improved, and perhaps how we could improve opam. The key idea was that Racket uses a set of core data structures to represent packages and provides functions across those data structures, with the end-user tooling being a very thin wrapper around these functions. Alexis demonstrated how to query and manipulate the set of installed packages within Racket in a way that's impossible with current Cabal. This is an intriguing idea that hopefully gets some attention, and I wonder if the more dynamic representation available in Racket makes this easier compared to Haskell.</p>
<h3>GHC &amp; Mu</h3>
<p>My second recommendation is "<a href="https://icfp22.sigplan.org/details/hiw-2022/8/Compiling-Mu-with-GHC-Halfway-Down-the-Rabbit-Hole">Compiling Mu with GHC: Halfway Down the Rabbit Hole</a>" by Gergő Érdi, which covered the effort to port Mu to reuse the GHC compiler frontend and backend as much as possible. Currently Mu uses a custom compiler for its strict Haskell variant that includes MultiParam Typeclasses and Functional Dependencies, along with a variation on Type Families. I like hearing about compiler engineering efforts that need to handle existing codebases and think through the trade-offs. The choice of Functional Dependencies and MultiParam Typeclasses is a sweet spot in the design space for typeclasses, with PureScript choosing a similar approach.</p>
<h2>OCaml Reception</h2>
<p>Overall, ICFP had a huge buzz of excitement around OCaml 5.0 featuring Multicore support and effects. The palpable enthusiasm after <a href="https://youtu.be/zJ4G0TKwzVc">KC's impressive keynote</a> lasted throughout the rest of the week. I had many great conversations with both OCaml and Haskell people about the new features and how exciting the future is for OCaml. In fact, the OCaml Farewell Reception, held at the Ljubljana Zoo, attracted three times the expected number of attendees because everyone wanted to keep talking about the new features coming soon in OCaml 5.0. I spoke at length with a senior Haskell programmer who was very interested in OCaml Multicore, and we all enjoyed sampling a local honey liqueur and having a hot meal, which helped warm us that cool, rainy evening. As a special treat, the ICFP organisers even arranged for a real live camel to make an appearance! It was a delightful evening, and the perfect ending to ICFP 2022.</p>
<p>In the end, the week in Ljubljana was fulfilling, both on a personal and professional level. After over two years of limited in-person events, it was truly refreshing to meet and network with colleagues face to face.</p>
<p><img sizes="(min-width: 1360px) 1360px, (min-width: 680px) 680px, 100vw" srcset="/blog/images/dragon-170w~G_0rlwJ3pTbK9oXXsF9OSg.webp 170w, /blog/images/dragon-340w~e42hT2mBULjLqjnV6aTGnA.webp 340w, /blog/images/dragon-680w~fMXdvNrzQo9EgUCwPqSDFQ.webp 680w, /blog/images/dragon-1360w~7TpYuOJRtdFCBB9vnd0Mqw.webp 1360w" src="/blog/images/dragon-1360w~7TpYuOJRtdFCBB9vnd0Mqw.webp" alt="Ljubljana Dragon">
<em>The Famous Ljubljana Dragon Bridge</em></p>
]]></description><link>https://tarides.com/blog/2022-10-10-icfp-2022-review</link><guid isPermaLink="false">https://tarides.com/blog/2022-10-10-icfp-2022-review.html</guid><dc:creator><![CDATA[ Tim McGilchrist ]]></dc:creator><pubDate>Mon, 10 Oct 2022 00:00:00 GMT</pubDate></item><item><title><![CDATA[Tarides Sponsors High School Hackers]]></title><description><![CDATA[<p>Tarides is excited to sponsor the <a href="https://esolangconf.com/">Paradigm Conference</a> (previously EsoLangConf) high school hackathon. This weekend, students from all over the world will team up to solve tricky programming problems, investigate diverse features of a range of programming languages, and build cool things!</p>
<p>At Tarides, we are always looking for new ways to increase the awareness and adoption of OCaml. The Paradigm Conference is a fantastic opportunity for all students interested in computer science to discover real-world use cases of OCaml.</p>
<p>If you are a high school student interested in using OCaml to solve fun and complex problems while learning new skills and meeting new people, you can register for the conference <a href="https://docs.google.com/forms/d/e/1FAIpQLSdEuny13Vb7n3tMiJ9r1Ci3OoRSWhlU3nO73gjDdmpZywVnKw/viewform">here</a>.</p>
<h2>What Makes this Conference Unique?</h2>
<h3>Created by Students for Students</h3>
<p>Rohan Mehta and the organisational team provide a safe and inclusive hackathon space for attendees from anywhere in the world to explore different programming languages and concepts. Breaking out of the standard computer science curriculum, they have designed a hackathon specifically for high school students to showcase non-mainstream languages and the diversity of programming approaches available.</p>
<h3>A Diversity of Programming Paradigms</h3>
<p>The conference features different language tracks, focussing specifically on functional, array-based, and knowledge-based programming. Each team will choose from OCaml, Haskell, Clojure, Wolfram, and APL, letting them explore the unique features of these less well-known languages and discovering their benefits for themselves. The conference puts the joy of programming at the top of the priority list, giving students a fantastic opportunity to experiment and broaden their horizons.</p>
<p>If you want to learn about pattern matching, macros, or higher-order functions, then you’re in the right place!</p>
<h3>Knowledge Sharing</h3>
<p>The organisers have gathered learning resources for every language in the conference—a mammoth task in itself! Not only are they widely sharing knowledge that already exists (but may not be easy to find), but they are also creating <a href="https://docs.google.com/document/d/e/2PACX-1vRtBufinbvANjQUMJrFdKyQ0VhsICM6QJ5K040MswBFMqGxuIGDrgLYsDLT-4txw1ZkVd-AJ0LCjCCo/pub?urp=gmail_link">new living documents</a> for each language built from these existing resources and their own learning experiences.</p>
<h3>Combined Coding Competitions and World-Class Lectures</h3>
<p>The team has worked hard to create an engaging and interesting event by interspersing talks from industry language users and programming language experts with coding competitions and hackathon events. Naturally, any conference wouldn’t be complete without swag! Each attendee will receive a sticker and t-shirt, with additional prizes for the competition winners.</p>
<h2>Get Involved!</h2>
<h3>Attending</h3>
<p>Students will join teams (of up to 5) by either creating one in advance or registering as an individual. Everyone will be allocated to a team once the conference starts.</p>
<h3>Mentoring</h3>
<p>If you are a high school student and you’re interested in attending, you can sign up here. If you have a bit more experience with any of the languages featured, you can join as a mentor and help students grasp new languages and concepts.</p>
<p>Rohan and his team promise that if you attend Paradigm Conf 2022 “your programming worldview will be flipped upside down!”</p>
<p>You can find the Paradigm Conference on <a href="https://www.instagram.com/esolangconf/">Instagram</a> and <a href="https://twitter.com/EsolangT">Twitter</a>.</p>
]]></description><link>https://tarides.com/blog/2022-09-23-tarides-sponsors-high-school-hackers</link><guid isPermaLink="false">https://tarides.com/blog/2022-09-23-tarides-sponsors-high-school-hackers.html</guid><dc:creator><![CDATA[ Gemma Gordon ]]></dc:creator><pubDate>Fri, 23 Sep 2022 00:00:00 GMT</pubDate></item><item><title><![CDATA[Tarides Sponsors Girls Can Code]]></title><description><![CDATA[<p><em>The tech industry has long struggled with a lack of diversity. This existing imbalance combined with social and educational problems such as early gender bias still tends to prevent lots of people, including women, from entering the field. <em>Girls Can Code</em> aims to help young women learn programming and gain valuable experience working on projects alongside other like minded individuals.</em></p>
<p>Tarides is proud to share that we sponsored a <a href="https://girlscancode.fr"><em>Girls Can Code</em></a> summer camp! Between Aug 22nd and 27th, the camp offered participants a fully-packed week of programming, socialising, and learning.</p>
<h2>What is Girls Can Code?</h2>
<p><em>Girls Can Code</em> is an initiative launched by the organisation <a href="https://prologin.org"><em>Prologin</em></a>, hosting summer camps specifically aimed at teaching young women about computer programming. No prior experience is required, and they accept participants from secondary school and up through the equivalent of A-Levels. Attendance is free, since the events are run by students who generously volunteer their time and expertise.</p>
<p>Camps come in two variants: long and short. The long camps last for a week and the short ones for a weekend. The short camps cover less content, but can be organised more frequently. They are always in the form of a practical introduction to computer science, but may focus on special topics. The week-long summer camps start with an introduction to Python and then continue with several tutorials on various topics. Finally, all participants have the chance to complete a personal project on either robotics, video games, or microcontrollers.</p>
<h2>Impact</h2>
<p>At Tarides, we’re committed to using our resources to foster diversity and inclusion. Our own goal is to have 50% of our tech roles be filled by women. For that to be a reality, not just at Tarides but everywhere, more women need to feel welcome in the tech space. Sadly, according to this <a href="https://www.lemonde.fr/campus/article/2017/12/11/femmes-et-informatique-vingt-ans-de-desamour_5227726_4401467.html">Le Monde article</a>, only 11% of French women chose to pursue IT careers in 2010. A more <a href="https://technation.io/diversity-and-inclusion-in-uk-tech/#executive-summary">recent survey</a> by Tech Nation showed that in the UK, only 26% of the tech workforce is made up of women. While this number is better, there’s still a lot of work to be done before women make up 50% of the workforce in tech.</p>
<p><em>Girls Can Code</em> works proactively to inspire a generation of young French women and make a difference in the sector. They have been arranging summer camps since 2014 and have been growing steadily since. We’re proud to support their efforts for a more equitable future!</p>
]]></description><link>https://tarides.com/blog/2022-09-06-tarides-sponsors-girls-can-code</link><guid isPermaLink="false">https://tarides.com/blog/2022-09-06-tarides-sponsors-girls-can-code.html</guid><dc:creator><![CDATA[ Isabella Leandersson ]]></dc:creator><pubDate>Tue, 06 Sep 2022 00:00:00 GMT</pubDate></item><item><title><![CDATA[Tarides Goes on Holiday!]]></title><description><![CDATA[<p><em>Relaxing in today’s world can be difficult. Taking the time you need to cool off, refocus, and explore something new requires a solid amount of time in which you can disconnect from daily habits and find a new beat.</em></p>
<p>At Tarides we address this by providing the framework needed for our employees to take that unbroken time away from work. In August, all Tarides employees get two weeks of paid leave to go away and come back refreshed.</p>
<h2>How it Began</h2>
<p>We first trialled the two weeks of leave in 2021 to help alleviate the stress caused by the pandemic. Taking a solid two weeks off would allow everyone to slow down and enjoy the last of the summer months (or winter for our Australian colleagues!) without having to worry about losing pay or using up annual leave in the face of an uncertain global situation.</p>
<p>The results were very positive, with a lot of good feedback across the teams. People came back refreshed and inspired, easily making up for the time away. An important takeaway from the Tarides team was that since everyone was away at the same time, no one had to worry about having a pile of work waiting for them when they came back. This made everyone’s holiday more enjoyable and restful.</p>
<h2>It’s Back!</h2>
<p>Since it was a very popular measure last year, we decided to reintroduce it as a recurring event! From August 8th to August 19th, Tarides had its official 2022 office closure. We hope the entire Tarides team took some time to go on adventures or simply relax before the rest of the year.</p>
<p>Taking time off does not just allow everyone to recharge their batteries, but it also lets them experience new things that can generate moments of inspiration. That said, rest is also very powerful: when we rest, we prepare for the challenges ahead, increase our resilience, and strengthen our resolve.</p>
<p>Here’s to a great rest of 2022!</p>
]]></description><link>https://tarides.com/blog/2022-08-26-tarides-goes-on-holiday</link><guid isPermaLink="false">https://tarides.com/blog/2022-08-26-tarides-goes-on-holiday.html</guid><dc:creator><![CDATA[ Isabella Leandersson ]]></dc:creator><pubDate>Fri, 26 Aug 2022 00:00:00 GMT</pubDate></item><item><title><![CDATA[Irmin in the Browser]]></title><description><![CDATA[
<h2>Introduction</h2>
<p>Over the past six months, I have been working on using Irmin in the browser, including <code>irmin-server</code> and the GraphQL interface. This has been fun and a great learning journey for me. Before this internship, <code>irmin-server</code> was primarily a Unix-based application. My project was to port <code>irmin-server</code> to work in the browser and design interfaces for people to interact with the store (Irmin stores).</p>
<p>I was paired to work with Patrick Ferris as my mentor and with the entire Irmin team, who all contributed immensely to this project.</p>
<h2>Irmin and <code>irmin-server</code></h2>
<p>Irmin is simply a data store (database). It is based on the same design principle as Git with features to merge and branch data stores. Irmin has several stores (<code>irmin-mem</code>, <code>irmin-indexeddb</code>, <code>irmin-fs</code>, <code>irmin-chunk</code>, <code>irmin-git</code>) and store interfaces (<code>irmin-http</code>, <code>irmin-graphql</code>).</p>
<p><code>irmin-server</code> is a high-performance server for Irmin. For efficient communication, it implements a specialised <a href="https://github.com/mirage/irmin-server/blob/master/PROTOCOL.md">wire-protocol</a> to send and receive data over a bytestream. It wraps an Irmin store, providing a way to connect to the server and access the store via its API using a client. But the client makes an assumption that the user is on a Unix machine, which makes <code>irmin-server</code> primarily a Unix-based application.</p>
<h2><code>irmin/irmin-client</code> in the Browser</h2>
<p>In this modern age, it's become a necessity to make applications "offline first." Offline-first applications keep working despite an intermittent network connection, which usually implies the ability to sync data between multiple devices. Irmin as a data store supports multiple backends, making it very portable. Plus, Irmin's mergeable replicated data types make it much easier to build applications that can transform the state offline and resynchronise it later, just like Git. With this concept, resynchronising Irmin stores (from server to client) is much simpler with <code>irmin-server</code>, which implements a specialised wire protocol for efficient communication. Making <code>irmin/irmin-client</code> work in the browser means that it becomes possible to create offline-first web applications.</p>
<p>More information on offline-first applications can be found <a href="https://2022.ecoop.org/home/plf-2022">here</a>.</p>
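<p>To make the "mergeable" idea more concrete, here is a minimal sketch in plain OCaml of a three-way merge for a counter. It does not use Irmin's API: the function <code>merge_counter</code> and the values are hypothetical and only illustrate how two states that diverged from a common ancestor might be reconciled when a client comes back online.</p>
<pre><code>(* Hypothetical illustration, not Irmin's API: three-way merge of a counter.
   [ancestor] is the last state both replicas agreed on; [a] and [b] are the
   two states that diverged from it. Both sets of increments are kept. *)
let merge_counter ~ancestor a b = a + b - ancestor

let () =
  let ancestor = 10 in
  let offline = ancestor + 3 in (* the client added 3 while offline *)
  let server = ancestor + 5 in  (* the server added 5 in the meantime *)
  let merged = merge_counter ~ancestor offline server in
  Printf.printf "merged = %d\n" merged
</code></pre>
<p>Because the merge only needs the common ancestor and the two diverged values, neither side has to be online while the other makes changes.</p>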
<h2>The Problems</h2>
<p>An initial summary of the problem was published on this <a href="https://github.com/mirage/irmin-server/issues/46">issue</a>, but here is a quick breakdown of the problems we identified.</p>
<ol>
<li><strong><code>irmin-server</code> was tightly coupled around <code>conduit-lwt-unix</code>:</strong> <code>irmin-server</code> was initially designed to be a Unix-based application that established communication with a client via <code>conduit-lwt-unix</code>. This became a problem because <code>conduit-lwt-unix</code> cannot establish a connection from a browser. This meant that the I/O module had to be abstracted so that every client could provide its own I/O.</li>
<li><strong>Reuse some internal modules:</strong> We needed to reuse the <code>irmin-server</code> internal logic related to the protocol but provide a portable I/O interface that can work in the browser.</li>
<li><strong>Provide a browser communication channel:</strong> We needed a non-blocking channel between <code>irmin-server</code> and the browser, and a way to pass data across it.</li>
</ol>
<h2>The Solutions</h2>
<h3><code>irmin-server</code> was tightly coupled around <code>conduit-lwt-unix</code></h3>
<p>Thanks to Zach Shipko, who abstracted the I/O library and split out <code>irmin-client-unix</code> and <code>irmin-client-cli</code> to have their own I/O module that depends on <code>conduit-lwt-unix</code> (<a href="https://github.com/mirage/irmin-server/pull/32">here</a>), a client can connect to a running <code>irmin-server</code> using its own I/O module. While he was working on the restructuring, I spent my time working on a sample project that combines <code>dream</code> with <code>irmin-graphql</code> (more on this project below).</p>
<p>With the coupling out of the way, the next step was to create <code>irmin-client-jsoo</code>, a browser client with its own I/O module.</p>
<h3><code>irmin-server</code> was primarily a Unix-based application</h3>
<p>The <code>irmin-server</code> initial architecture had to be restructured to accommodate other platforms. To achieve this, <code>irmin-client</code> was no longer coupled with a specific I/O implementation. Rather, a Unix-based one was provided over conduit flows, which are <code>Lwt_io</code> input and output channels. This channel was established over a TCP connection or a Unix domain socket.</p>
<p>Right now, <code>irmin-server</code> can communicate with two (2) clients: <code>irmin-client-cli</code> from a command line and <code>irmin-client-unix</code> from a Unix-based machine. This project was about creating a third client: <code>irmin-client-jsoo</code>, to be called from browser applications.</p>
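<p>The decoupling described above can be sketched with an OCaml functor. This is a simplified illustration, not the actual <code>irmin-server</code> code: the <code>IO</code> signature, <code>Make_client</code>, and <code>Mem_io</code> are hypothetical names, and the sketch is synchronous, whereas the real clients are built on Lwt.</p>
<pre><code>(* Hypothetical I/O signature: each client supplies its own transport. *)
module type IO = sig
  type conn
  val write : conn -&gt; string -&gt; unit
  val read : conn -&gt; string option
end

(* Protocol logic written once, against the signature only. *)
module Make_client (Io : IO) = struct
  let ping conn =
    Io.write conn "ping";
    match Io.read conn with
    | Some "pong" -&gt; true
    | _ -&gt; false
end

(* In-memory transport standing in for a Unix socket or a WebSocket:
   writing "ping" queues a "pong" reply for the next read. *)
module Mem_io = struct
  type conn = string Queue.t
  let write q msg = if msg = "ping" then Queue.add "pong" q
  let read q = Queue.take_opt q
end

module Client = Make_client (Mem_io)

let () =
  let conn = Queue.create () in
  Printf.printf "ping ok: %b\n" (Client.ping conn)
</code></pre>
<p>With this shape, adding a browser client means providing one more module that satisfies the I/O signature, without touching the protocol logic.</p>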
<h3>Enable communication from the browser</h3>
<p>After considering other options for the <code>irmin-client-jsoo</code> communication channel, like HTTP and RPC, we followed Patrick's suggestion and went with WebSocket, a bidirectional communication protocol between client and server.</p>
<h4>The Challenges</h4>
<p><code>irmin-server</code> uses flows to communicate between the server and the client, and flows are bytestreams. WebSocket provides a bidirectional communication channel in the browser, but it is message-oriented rather than stream-oriented.</p>
<p>TCP (Transmission Control Protocol) is a stream-oriented transport protocol for transferring data over the Internet, while WebSocket is a message-oriented application protocol that uses TCP as its transport layer.</p>
<p>The WebSocket protocol reuses a single established TCP connection between a client and a server. Even though WebSocket is built on TCP, the data it passes is always either delivered as a whole "message" or not at all. These implementations are non-blocking.</p>
<p>Since we wanted to avoid a full redesign of the <code>irmin-server</code> protocol, we had to make the message-oriented channel look like a bytestream.</p>
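<p>One way to picture this adaptation is a small buffer that accepts whole messages on one side and serves arbitrary byte counts on the other. The sketch below is a hypothetical, simplified illustration in plain OCaml, not the code used in <code>irmin-client-jsoo</code>: the names <code>push</code> and <code>read_exactly</code> are assumptions, and a real implementation would wait for more data (for example with Lwt) instead of returning <code>None</code>.</p>
<pre><code>(* Hypothetical adapter: whole messages go in, arbitrary byte counts come out. *)
type t = { buf : Buffer.t; mutable pos : int }

let create () = { buf = Buffer.create 256; pos = 0 }

(* Called once per incoming message, e.g. from a WebSocket callback. *)
let push t msg = Buffer.add_string t.buf msg

(* Return exactly [n] bytes if they are available, regardless of where
   the message boundaries fell; otherwise return [None]. *)
let read_exactly t n =
  if Buffer.length t.buf - t.pos &lt; n then None
  else begin
    let s = Buffer.sub t.buf t.pos n in
    t.pos &lt;- t.pos + n;
    Some s
  end

let () =
  let t = create () in
  push t "hel";
  push t "lo world";
  (* Reads cut across the original message boundaries. *)
  assert (read_exactly t 5 = Some "hello");
  assert (read_exactly t 6 = Some " world");
  assert (read_exactly t 1 = None)
</code></pre>
<p>The reader never sees where one message ended and the next began, which is the property a bytestream-based protocol relies on. For simplicity, this sketch never discards bytes that have already been read.</p>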
<h2>More on <code>irmin-client-jsoo</code></h2>
<p>Communicating with <code>irmin-server</code> from the browser is very easy. You can achieve that by following these steps:</p>
<ol>
<li>Pin <code>irmin-server</code>, using this command: <code>opam pin add git+https://github.com/mirage/irmin-server#013a28fd1507f8ba69494515533119804903aa99</code></li>
<li>Set up the server.</li>
</ol>
<pre><code>open Lwt.Syntax
module Store = Irmin_mem.KV.Make (Irmin.Contents.String)
module Server = Irmin_server.Make (Store)

let main =
  let uri = Uri.of_string "ws://localhost:9090/ws" in
  let config = Irmin_mem.config () in
  let* store = Store.Repo.v config in
  let* main = Store.main store in
  let* server = Server.v ~uri config in
  let () = Format.printf "Listening on %a@." Uri.pp uri in
  Server.serve server

let () = Lwt_main.run main
</code></pre>
<p><a href="https://github.com/dinakajoy/pen-it-down/blob/main/server/server.ml">Check out this implementation</a></p>
<ol start="3">
<li>Create the client and ping the server.</li>
</ol>
<pre><code>open Lwt.Syntax
module Store = Irmin_mem.KV.Make (Irmin.Contents.String)
module Client = Irmin_client_jsoo.Make (Store)

(* Connect to the server over WebSocket, then ping it. *)
let main =
  let config = Irmin_client_jsoo.config (Uri.of_string "ws://localhost:9090/ws") in
  let* client = Client.Repo.v config in
  Client.ping client
</code></pre>
<p>More examples can be found <a href="https://github.com/mirage/irmin-server/tree/master/examples">here</a>.</p>
<h2>My Projects</h2>
<p><strong>Simple Mini GitHub:</strong>
I worked on this project to experiment with combining <code>irmin-graphql</code> with <code>dream</code>. This turned out to be simpler than I thought: you only need to expose the <code>irmin-graphql</code> schema. In this application, you simply enter a GitHub repository, and the repository details such as name, date, author, commit message, and README file will be displayed. You can also open <code>/graphiql</code> and make queries.</p>
<p>The full code can be accessed <a href="https://github.com/dinakajoy/simple_mini_github">here</a>.</p>
<p><strong>Pen-It-Down:</strong>
Pen-it-down is a note app that uses <code>irmin-indexeddb</code> and <code>irmin-server</code> to demonstrate offline-first functionality. Users can type in their notes without worrying about internet connectivity. You can create, edit, delete, and sync your notes to the server.</p>
<p>The full code can be accessed <a href="https://github.com/dinakajoy/pen-it-down">here</a>.</p>
<h2>Conclusion</h2>
<p>Working on this project was challenging! I am so glad I had the opportunity to work on it, even though there were days I felt lost. Some days I was confused because it seemed I was doing the wrong thing. Other days I was happy because things worked as expected! It’s basically been about research and experimenting for me. I learned a lot from Patrick and Zach. I was exposed to networking concepts like the network layers, client-server handshake, data encryption, and decryption, and I got to try out WebSocket for the first time. I look forward to building more projects with OCaml.</p>
]]></description><link>https://tarides.com/blog/2022-08-02-irmin-in-the-browser</link><guid isPermaLink="false">https://tarides.com/blog/2022-08-02-irmin-in-the-browser.html</guid><dc:creator><![CDATA[ Odinaka Joy ]]></dc:creator><pubDate>Tue, 02 Aug 2022 00:00:00 GMT</pubDate></item><item><title><![CDATA[Tarides is on the Wavestone Radar!]]></title><description><![CDATA[<p><em>Cybersecurity is a growing concern for individuals and companies alike. At Tarides, security is at the centre of every solution we provide, and this year we have been recognised for our efforts! We’ve been accepted to <a href="/blog/2022-06-28-thales-cyber-station-f-selection/">Cyber@StationF’s acceleration program</a> and are now featured in the 2022 Cybersecurity Startup Radar.</em></p>
<p>Tarides is proud to announce that we’re part of the 2022 Cybersecurity Startup Radar by Wavestone and BPIFrance! It spotlights promising startups who are making a difference in the cybersecurity arena. Tarides has been featured in the Radar twice before, in 2019 and in 2020.</p>
<p>Tarides is featured under the ‘IoT Security’ section of the Wavestone Radar, demonstrating our commitment to creating safer and more efficient software for the IoT (Internet of Things) ecosystem. At Tarides, security-by-design is at the core of everything we do; it’s our guiding development principle that we use to produce a variety of solutions addressing the different challenges present in this space.</p>
<h2>Tarides’s Cybersecurity Solutions</h2>
<p>Most of today’s software solutions are very complex and as a result often inefficient and vulnerable to attack. It is well documented that IoT technology and devices can pose safety risks due to their development environment and limited processing capabilities that don’t offer full protection from attacks.</p>
<p>Our technology solves these problems in a revolutionary way by combining the features of OCaml (focused on security, safety, and efficiency) with state-of-the-art tools (as part of MirageOS) to provide solutions to complex problems where mistakes can have disastrous consequences.</p>
<p>Firstly, we provide several solutions relating to device connectivity, from internet protocols to low-bandwidth networks. Secondly, we build custom applications that are securely deployed on IoT devices either on bare-metal or within hypervisors. MirageOS provides an efficient IoT environment with a small footprint. Finally, we address the IoT security layer via formally verified cryptographic libraries and other security building blocks that match the latest standards.</p>
<p>We maintain a special focus on cybersecurity to ensure that efficiency and security go hand-in-hand, and that one is not achieved at the expense of the other. We develop and maintain secure, fast, and performant code solutions that leverage OCaml's strong safety features for reliable results. We collaborate closely with the thriving open-source community surrounding the language, which constantly tests and audits its performance.</p>
<h2>About the Radar</h2>
<p>The Radar is used to analyse the French cybersecurity sector: what solutions are being worked on, what trends are emerging, and what kind of innovation is happening. In light of the current geopolitical climate, cybersecurity is a major strategic issue now more than ever. Consequently, this year’s Radar is perhaps even more salient than those of previous years, as it highlights the role of France’s innovation ecosystem in advancing cybersecurity goals.
By being featured, Tarides gains a lot of visibility in the sector and can also discover other startups and promising projects in the same field. We’re excited to be part of this great networking opportunity!</p>
<h2>About Wavestone &amp; BPI France</h2>
<p><a href="https://www.wavestone.com/en/">Wavestone</a> operates at the intersection of management and consulting, helping their clients not just overcome but master challenges whether they be digital, competitive, or environmental. Their mission is to guide organisations during critical transformations, helping them obtain the best results. Furthermore, they are also committed to promoting ethical and sustainable solutions that benefit society as a whole.</p>
<p><a href="https://www.bpifrance.com">BPI France</a> is a financial institution whose mission is to support entrepreneurs and visionaries who take risks to achieve their goals and grow their businesses. BPI France has many resources that they make available to companies, offering support for innovation, coaching, acceleration programmes, and international expansion. They focus on smaller businesses such as microbusinesses, SMEs, and mid-caps, but also offer solutions for larger companies.</p>
]]></description><link>https://tarides.com/blog/2022-07-19-tarides-is-on-the-wavestone-radar</link><guid isPermaLink="false">https://tarides.com/blog/2022-07-19-tarides-is-on-the-wavestone-radar.html</guid><dc:creator><![CDATA[ Isabella Leandersson ]]></dc:creator><pubDate>Tue, 19 Jul 2022 00:00:00 GMT</pubDate></item><item><title><![CDATA[Faster Incremental Builds with Dune 3]]></title><description><![CDATA[<p>In February 2022, we released Dune 3.0. This updated version is the result of considerable development work over the previous six months. Dune 3.0 contains many new features, one of which is “watch mode,” an exciting new feature explained below.</p>
<p>As a build system, Dune’s main goal is to build targets. These targets can be either files (like an executable file) or “aliases,” a group of targets that can have a visible outcome (like running tests). By default, when running a build, Dune receives a target. Dune will then build it and exit. For example, <code>dune build</code> (an alias for <code>dune build @all</code>) will build everything it knows about, then exit.</p>
<p>When working on a piece of code, many developers use an edit-save-build loop:</p>
<ul>
<li>Edit a piece of code</li>
<li>Save the corresponding file</li>
<li>Run a build command</li>
</ul>
<p>Using the outcome of the build (i.e., Did the build work? Did the tests all pass?), developers start a new iteration of this loop manually, but it’s more efficient to have a quick, automated iteration process. This is the goal of the “watch mode.”</p>
<p>When active, Dune will watch the source files in a project, and when one of them has changed, it will re-execute the same build command automatically and display the results of the build. It doesn’t exit automatically, so it continues watching for changes. This is more efficient because the developer can stay focussed in their text editor and see the build start automatically when the file is saved. You can enable watch mode by passing the <code>-w</code> flag to <code>dune</code>, like <code>dune build -w</code>.</p>
<p>A simple implementation is to have a special process check for file changes in the source tree and run the build command when something has changed. This works, but it isn’t a very precise solution. First, the external process doesn’t know about the relationships between the files, so it will run more builds than necessary. For example, changing a README file usually should not trigger a new build because it isn’t a source file. Second, there are various subtleties to handle: if a file is changed while a build is running, a new build should be started, but the previous one should also be cancelled.</p>
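<p>To make the limitations of this naive approach concrete, here is a minimal sketch of such an external watcher in OCaml. It is not how Dune works: the names <code>snapshot</code>, <code>changed</code>, and <code>watch</code> are hypothetical, the half-second polling interval is arbitrary, and the sketch assumes the <code>unix</code> library is linked.</p>
<pre><code>(* Naive external watcher: poll modification times, rebuild on any change.
   All names here are illustrative; requires the unix library. *)

(* A snapshot pairs each watched path with its last modification time.
   Paths that cannot be read (for example, deleted files) are left out. *)
let snapshot paths =
  List.filter_map
    (fun path ->
      match Unix.stat path with
      | st -> Some (path, st.Unix.st_mtime)
      | exception Unix.Unix_error _ -> None)
    paths

(* Paths that were modified, added, or removed between two snapshots. *)
let changed before after =
  let differs other (path, mtime) =
    match List.assoc_opt path other with
    | Some m -> not (Float.equal m mtime)
    | None -> true
  in
  let modified = List.filter (differs before) after in
  let removed =
    List.filter (fun (p, _) -> not (List.mem_assoc p after)) before
  in
  List.sort_uniq String.compare (List.map fst (modified @ removed))

(* Poll forever. The build runs synchronously, so a change that happens
   during a build is only noticed after that build has finished. *)
let rec watch paths previous =
  Unix.sleepf 0.5;
  let current = snapshot paths in
  (match changed previous current with
   | [] -> ()
   | files ->
     Printf.printf "changed: %s\n%!" (String.concat ", " files);
     ignore (Sys.command "dune build"));
  watch paths current

(* Example call (loops until interrupted):
   let paths = [ "bin/main.ml"; "README.md" ] in
   watch paths (snapshot paths) *)
</code></pre>
<p>Because this watcher knows nothing about build rules, editing <code>README.md</code> triggers a rebuild just like editing a source file does. And since <code>Sys.command</code> blocks until the build finishes, a change made during a build cannot cancel it.</p>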
<p>For these reasons, it’s more efficient to have the build system itself “drive” the watch mode. This is how it was implemented in Dune 1.x and Dune 2.x. When starting a build in watch mode for a certain target, Dune computes the set of files that can influence this target (using the build rules) and calls an external process that can subscribe to file changes. When a file changes, Dune cancels existing builds and starts a new one.</p>
<p>This is better, but it’s still not very efficient. To see why, let’s see what Dune does and how it can be fast.</p>
<p>To run a build, Dune needs to do two things:</p>
<ul>
<li>Load the rules: detect the workspace (determine which files to consider), parse the <code>dune</code> files (open them, transform them into s-expressions and stanzas), and interpret them (execute the logic to transform the stanzas into rules)</li>
<li>Execute the rules: copy files around, call external processes, etc.</li>
</ul>
<p>The time it takes to load the rules is related to the size of the current workspace (number and size of <code>dune</code> files). This is particularly noticeable in organisations that use a monorepo (all the source code in a large Dune workspace). It's difficult to make this step fast because it has a lot of work to do, but it's doable by avoiding computing the same things over and over, made possible by an internal memoisation framework. An initial version of this system is described <a href="https://dune.build/blog/new-computation-model/">in this blog post</a>.</p>
<p>The time it takes to execute the rules depends on the amount of work necessary. For example, a clean build needs to execute most of the build actions, while a second full build usually needs to execute no rule at all. To make this step fast, Dune tries to avoid executing actions that wouldn't change the final outcome (a technique called early cutoff), and it executes independent actions in parallel.</p>
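<p>Early cutoff can be illustrated with a small, self-contained sketch. This is not Dune's memoisation framework: <code>cell</code>, <code>get</code>, and the two toy build stages are hypothetical names chosen for the example, and it assumes OCaml 4.13 or later for <code>String.starts_with</code>.</p>
<pre><code>(* A cell remembers the last input it saw and the output it produced. *)
type ('i, 'o) cell = {
  compute : 'i -> 'o;
  last : ('i * 'o) option ref;
  runs : int ref;  (* how many times [compute] actually ran *)
}

let cell compute = { compute; last = ref None; runs = ref 0 }

(* Re-run [compute] only when the input differs from the cached one. *)
let get c input =
  match !(c.last) with
  | Some (i, o) when i = input -> o
  | _ ->
    let o = c.compute input in
    c.last := Some (input, o);
    incr c.runs;
    o

(* Stage 1: drop comment lines (lines starting with '#'). *)
let strip_comments src =
  String.split_on_char '\n' src
  |> List.filter (fun l -> not (String.starts_with ~prefix:"#" l))
  |> String.concat "\n"

(* Stage 2: count the remaining non-empty lines. *)
let count_lines src =
  String.split_on_char '\n' src
  |> List.filter (fun l -> not (String.equal (String.trim l) ""))
  |> List.length

let stage1 = cell strip_comments
let stage2 = cell count_lines

(* Stage 2 is keyed on the output of stage 1, not on the raw source. *)
let build src = get stage2 (get stage1 src)
</code></pre>
<p>If only a comment changes between two calls to <code>build</code>, stage 1 runs again but produces the same output, so stage 2 is cut off and its cached result is reused.</p>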
<p>In the context of the Dune 2.x watch mode, whenever a new build starts, Dune has to forget everything it knows about the workspace, so it reloads all the rules. This is pretty wasteful.</p>
<p>To do better, the new watch mode in Dune 3 makes rule loading incremental. For example, if a <code>dune</code> file is edited to add a stanza, Dune parses it again, only adding the new rule. The other ones are not interpreted again. This ensures very fast iteration times.</p>
<p>This project was challenging because for it to work, everything in the Dune core had to be ported to the memoisation API. For instance, the library loading code (which looks for library definitions in the current opam switch and in the Dune workspace) relied on a “classic” cache (a global hash table) to avoid parsing files repeatedly. However, this does not play nice with the memoisation API, which assumes that the functions it caches are all pure. So, in Dune 3, this piece of code has been rewritten on top of the memoisation API. This has another benefit: since file system accesses (“does this file exist?”) are cached too, the memoisation API now has an idea of which functions can read which files. This is used to re-evaluate only the affected parts of the rule graph once a file is modified.</p>
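<p>The idea of a cache that knows which files each result depends on can be sketched as follows. This is a simplified illustration, not Dune's memoisation API: the in-memory <code>files</code> table stands in for the real file system, nested memoised calls are not handled, and <code>read_file</code>, <code>memo</code>, and <code>load_library</code> are hypothetical names.</p>
<pre><code>(* A toy in-memory file system, so the sketch needs no real files. *)
let files : (string, string) Hashtbl.t = Hashtbl.create 16

(* Files read by the computation that is currently running. *)
let current_reads : string list ref = ref []

(* Every access is recorded, including lookups of missing files. *)
let read_file path =
  current_reads := path :: !current_reads;
  Hashtbl.find_opt files path

(* Each cached result remembers which files it read. *)
type 'a entry = { value : 'a; reads : string list }

(* [memo f] returns a cached version of [f] together with a function that
   drops only the cached results that depend on a given path. *)
let memo f =
  let cache = Hashtbl.create 16 in
  let lookup key =
    match Hashtbl.find_opt cache key with
    | Some e -> e.value
    | None ->
      current_reads := [];
      let value = f key in
      Hashtbl.replace cache key { value; reads = !current_reads };
      value
  in
  let invalidate path =
    Hashtbl.filter_map_inplace
      (fun _ e -> if List.mem path e.reads then None else Some e)
      cache
  in
  (lookup, invalidate)

(* Count how often the underlying function really runs. *)
let runs = ref 0

let load_library, invalidate =
  memo (fun name ->
    incr runs;
    match read_file (name ^ "/dune") with
    | Some contents -> "library " ^ name ^ ": " ^ contents
    | None -> "library " ^ name ^ ": missing")
</code></pre>
<p>After loading libraries <code>a</code> and <code>b</code>, calling <code>invalidate "a/dune"</code> drops only the entry for <code>a</code>; the next <code>load_library "b"</code> is still served from the cache.</p>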
<p>Thanks to that work, watch mode is now a lot more responsive than in Dune 2.x. This performance improvement is barely noticeable in small-to-medium-sized projects, but it is essential in a workspace with several million lines of OCaml code. In such a setting, re-evaluating rules over and over (either by manually running <code>dune build</code> or by using the strategy in Dune 2.x) means that the feedback loop takes dozens of seconds instead of being almost instantaneous.</p>
<p>As Dune performance improves, it's able to support workspaces that are larger and larger. This means that the bottlenecks shift to different places. We'll continue to improve Dune so that it stays a build system that's convenient to use and endlessly scalable.</p>
]]></description><link>https://tarides.com/blog/2022-07-12-faster-incremental-builds-with-dune-3</link><guid isPermaLink="false">https://tarides.com/blog/2022-07-12-faster-incremental-builds-with-dune-3.html</guid><dc:creator><![CDATA[ Etienne Millon ]]></dc:creator><pubDate>Tue, 12 Jul 2022 00:00:00 GMT</pubDate></item><item><title><![CDATA[The Magic of Merlin]]></title><description><![CDATA[<p>Tarides provides support and development services for OCaml tools, packages, and libraries for our commercial partners and for the benefit of the entire OCaml community. We focus on groundbreaking innovation, feature development, and crucial maintenance of OCaml-based projects. One of these projects is called Merlin, an advanced Integrated Development Environment (IDE).</p>
<h2>Overview</h2>
<p>When someone hears the word "Merlin," images of King Arthur, the Round Table, and the Holy Grail might come to mind. This fantasy world introduces us to Merlin, the mighty wizard who becomes Arthur's advisor and mentor. When speaking in the world of technology, our Merlin's magic comes in the form of a powerful editor service that offers completion, typing, navigation, refactoring, and code generation. This IDE was made specifically for OCaml, so it complements the safety and expressiveness of the OCaml language with powerful tools. In short, Merlin is a loyal companion that magically helps OCaml developers be more productive and write better programs!</p>
<h2>Installing Merlin</h2>
<p>Merlin integrates with most editors, including Visual Studio Code (VSCode), via the Language Server Protocol (LSP). It also implements custom features on top of LSP to enable powerful developer workflows that are only available in OCaml. As a result, Merlin helps developers write in OCaml more easily, as they’re provided with instant feedback on any possible errors that they could make. It helps train these programmers to make fewer errors in the future and eases project maintenance by automating complex (and otherwise error-prone) workflows.</p>
<p>The easiest way to install Merlin is by using VSCode's OCaml extension. See the <a href="https://github.com/ocamllabs/vscode-ocaml-platform#readme">manual</a> for more information. You can also use Merlin through Vim or GNU Emacs. Read more on <a href="https://ocaml.github.io/merlin/">OCaml.org's Merlin page</a>.</p>
<p>Once the installation process is complete, your editor will automatically start Merlin whenever an <code>.ml</code> or <code>.mli</code> file is opened.</p>
<p>Voilà! So easy!</p>
<h2>Merlin in Use</h2>
<p>Here's a glimpse of what Merlin looks like in VSCode:</p>
<p><img sizes="(min-width: 1360px) 1360px, (min-width: 680px) 680px, 100vw" srcset="/blog/images/merlin-170w~6JbrnLOkjtOt55mCOwd2MQ.webp 170w, /blog/images/merlin-340w~UIJ2rANfIsuOw0P2LFzFGQ.webp 340w, /blog/images/merlin-680w~qLdKhrSFKu8B49Tdv5bStw.webp 680w, /blog/images/merlin-1360w~SNOFu6KKe_yj0X_Da-siAw.webp 1360w" src="/blog/images/merlin-1360w~SNOFu6KKe_yj0X_Da-siAw.webp" alt="Merlin in VSCode"></p>
<p>Merlin is the default IDE tool for OCaml developers. It's utilised by many commercial OCaml users who are funding maintenance work and evolutions of the project. For instance, Jane Street developers, sysadmins, and traders confidently use Merlin to browse, maintain, and modify an OCaml codebase that runs into millions of lines of code. This codebase provides the foundation for Jane Street's financial market trading around the world. It critically depends on tools (such as Merlin) to ensure it can continue to evolve while functioning safely.</p>
<h2>Beneficial Features in Merlin</h2>
<p>One of Merlin's main developers, Frédéric Bour, says his favourite feature is "completion," which has the ability to complete a prefix typed by a programmer in a manner that is (somewhat) relevant in the context. He says, "I like it for two reasons: less things to remember (and in a programming language, usually you have to recall exactly because there is not much room for fuzzy interpretation) and also it makes things 'discoverable.' Sometimes you work in an area that is new to you, and looking at the Merlin view with its suggestions really helps engineers become acquainted with this program."</p>
<p>The Tarides CTO, Thomas Gazagnaire, loves that Merlin "is the perfect companion to any professional OCaml developer. It helps navigate completely new codebases by providing the necessary feedback to learn the project (and the OCaml language!) quicker. It is also super useful to refactor large pieces of existing code, with immediate feedback and hints. I remember very clearly when I started using Merlin on my projects. It provided me a great productivity boost that completely changed the way I programmed in OCaml and made me much more effective. Tarides is now committed to make sure this tool continues to be supported actively. We're always adding new features to improve developer productivity, like value and type renaming, semantic search over a whole project, etc."</p>
<h2>Learn More</h2>
<p>If you'd like to read more about Merlin, or become a contributor, visit its <a href="https://github.com/ocaml/merlin">GitHub repo</a>, and feel free to <a href="https://github.com/ocaml/merlin/issues">open an Issue</a> if you have any suggestions. Please <a href="/contact/">contact us</a> if you would like to subscribe to commercial support or discuss future development.</p>
]]></description><link>https://tarides.com/blog/2022-07-05-the-magic-of-merlin</link><guid isPermaLink="false">https://tarides.com/blog/2022-07-05-the-magic-of-merlin.html</guid><dc:creator><![CDATA[ Christine Rose ]]></dc:creator><pubDate>Tue, 05 Jul 2022 00:00:00 GMT</pubDate></item><item><title><![CDATA[Thales Cyber@Station F Selection]]></title><description><![CDATA[<p><em>The online world is becoming an increasingly bigger part of our everyday lives, bringing the issue of cybersecurity to the forefront of more and more minds. At Tarides we put security at the centre of everything we do, and we’re honoured to be part of <em>Cyber@Station F</em> in 2022.</em></p>
<p>Tarides is thrilled to announce that we have been selected for the Cyber@Station F Acceleration Program! It’s a fantastic opportunity for Tarides to exchange information and collaborate with other startups, as well as connect with the cybersecurity giant Thales.</p>
<h2>The Program</h2>
<p>The Cyber@Station F program was established by <a href="https://thalesdigital.io/">Thales Digital Factory</a> at <a href="https://stationf.co">Station F</a> to “help accelerate startups’ development in cybersecurity by providing them advice, expertise, and access to our big markets.”</p>
<p>Thales Cyber@Station F is a startup acceleration program that centres around the areas of cybersecurity &amp; trust. It guides its participants through four stages: Select, Define, Deliver &amp; Test, and Business Acceleration. The stages are designed to give each startup an opportunity to come up with and test a proof of concept with potential clients, which if successful is then developed further.</p>
<h2>What We Do</h2>
<p>At Tarides we know that a lot of today’s technology solutions are overly complex, vulnerable to attack, and time consuming to develop. As an alternative to this, we provide secure, safe, and efficient solutions by leveraging the features of OCaml in combination with cutting-edge tools and technologies. In collaboration with a rich open-source community, we develop and maintain the OCaml language, the operating system MirageOS including Unikernels, as well as a range of developer tools.</p>
<p>The OCaml language is efficient, helps developers write safe and secure code, and is easy to maintain and adapt thanks to its modular nature. When combined with MirageOS and Unikernels, which reduce runtime complexity for lightweight and accurate results, OCaml’s already impressive features are used more effectively. Finally, our range of modern, easy-to-use tools offers developers a variety of options for writing projects in OCaml.</p>
<p>Cybersecurity is at the heart of what we do, and we’re excited to combine our knowledge and experience with the substantial resources offered by Thales and Station F. We are also looking forward to engaging with other companies in the same industry and discovering new opportunities for our projects and our clients.</p>
<h2>Thales</h2>
<p>Thales is a global leader in technology innovations and solutions specialising in digital and ‘deep tech’ innovations such as Big Data, artificial intelligence, connectivity, cybersecurity, and quantum technology. Their goal is to invest in these technologies to build a better future that people can trust. Thales focuses on five vertical markets: digital identity and security, defence and security, aerospace, space, and transport. Their clients play a central role in these socially important markets.</p>
<h2>Station F</h2>
<p>Located in Paris, Station F is a startup campus that hosts over 1000 startups. Station F offers everything an entrepreneur needs to start and grow their business, making over 35 public services, 150 perks, and 600 workshops and events available to people in their network. Their services include startup programs on specific themes, special mentorship offices, a Flatmates service, exclusive discounts, and much more.</p>
<h2>More Coming Soon</h2>
<p>The programme started at the beginning of June, and we’ll follow up with more developments as they happen. In the meantime, we’ve selected some relevant posts and information on our work in cybersecurity you can read if you want to know more. We’ve received funding from the EU for our work on a <a href="/blog/2021-04-30-scop-selected-for-dapsi-initiative/">secure open messaging</a> platform, we’ve been laureates of the <a href="/blog/2019-07-05-i-lab-2019/">i-Lab innovation contest</a> for our work on Osmose, and we’ve won the <a href="/blog/2019-12-11-tarides-wins-the-fic-2020-startup-award/">2020 FIC startup award</a>. If you want to read up on the technical side of things, these papers on <a href="https://www.usenix.org/system/files/conference/usenixsecurity15/sec15-paper-kaloper-mersinjak.pdf">unikernels</a> and on <a href="https://anil.recoil.org/papers/2018-hotpost-osmose.pdf">Osmose</a> are a good place to start.</p>
]]></description><link>https://tarides.com/blog/2022-06-28-thales-cyber-station-f-selection</link><guid isPermaLink="false">https://tarides.com/blog/2022-06-28-thales-cyber-station-f-selection.html</guid><dc:creator><![CDATA[ Isabella Leandersson ]]></dc:creator><pubDate>Tue, 28 Jun 2022 00:00:00 GMT</pubDate></item><item><title><![CDATA[Team Tarides Visits a 17th Century Chateau]]></title><description><![CDATA[<p><em>Everyone at Tarides recently had the opportunity to meet up in person for the first time! Since the global pandemic left much of our distributed team unable to meet, we organised a working retreat that brought all our teams together to work, learn, and have fun.</em></p>
<p>This was the first formal retreat we’ve held as a company, and our goals were to provide an opportunity to disconnect from day-to-day work, meet with new people, and discuss new projects - all in a novel environment surrounded by fresh air and open space. During the course of the pandemic, Tarides grew from 36 to 71 people based in 11+ countries, with few of them ever having met in person. As soon as it was safe to do so, we knew that gathering together would give us an invaluable experience in building and reinforcing our own company culture.</p>
<p>This May, all members of Tarides were invited to a beautiful 17th century chateau at Les Prés D’Ecoublay, surrounded by fields, orchards, and lush green forests. Attendees participated in exciting workshops, team-building activities, tech-talks, and inspiring discussions. Over the course of two days, everyone had plenty of time to collaborate, socialise, and eat fantastic food!</p>
<h3>What We Got Up To</h3>
<p>The fun began at 9am on Thursday, May 19th, when Tarides’s distributed global team gathered to make their way to what was to be their castle for the next two days. Greeted by a feast of French pastries and fresh fruit, introductions were made between people from Australia, France, India, Germany, the USA, the UK, and many more countries.</p>
<h4>Tech Talks &amp; Tutorials</h4>
<p>Over the next couple of days, the itinerary left plenty of space for knowledge sharing via tutorials and ‘tech talks’ or presentations. KC Sivaramakrishnan gave everyone a sneak peek of OCaml 5 with a detailed <a href="https://github.com/kayceesrk/ocaml5-tutorial/">tutorial</a>. The tutorial is openly available on GitHub. Please give us feedback if you try it!</p>
<p>Engineers Sonja Heinze and Jan Midtgaard held presentations (so-called ‘tech talks’) in the main hall for everyone’s benefit. Sonja’s talk was on the benefits of <a href="https://www.outreachy.org">Outreachy</a>, an open-source internship coordinator that creates opportunities for those most affected by underrepresentation or discrimination. Jan introduced everyone to property-based testing, a fascinating way to test the ‘correctness’ of code. The goal of the 'tech talks' is to provide a safe space for people to share their work with others, where everyone is welcome to ask questions and discuss the topics covered.</p>
<h4>Cross-Team Collaboration</h4>
<p>There were also plenty of opportunities for teams to meet and work together, with individual meeting rooms readily at hand. The coffee machines (and accompanying sweets) in each room significantly boosted the productivity of all teams! Teamwork was not limited to individual groups: time was also made for cross-team brainstorming and collaboration.</p>
<p>All of the teams at Tarides are working towards the big OCaml 5.0 release scheduled for later this year, and we took the opportunity to use this release as a focal point to align our goals for the next few months. Each team spent some time together to conceptualise their team-specific goals first, and then the Team Leads followed up with an "office hours" drop-in session to discuss cross-project interaction and ideas. Creating dedicated space to think beyond the usual daily tasks and projects proved to be really valuable, producing insights and connections that may have otherwise been missed.</p>
<h4>Fun &amp; Games!</h4>
<p>It wouldn’t be a working retreat without opportunities for relaxation and fun! We got competitive in a treasure hunt that had us searching the forest and tall grasses for clues. There was plenty of time to explore the vast castle grounds, which included a pool, archery arena, ping pong table, karaoke pavilion, and much more.</p>
<h3>Until Next Time</h3>
<p>The off-site was a huge success! As a distributed team, it’s important to occasionally get together and put a face to a name (or Slack and GitHub handle!). Everyone at Tarides looks forward to the next off-site, wherever it may take place.</p>
]]></description><link>https://tarides.com/blog/2022-06-23-team-tarides-visits-a-17th-century-chateau</link><guid isPermaLink="false">https://tarides.com/blog/2022-06-23-team-tarides-visits-a-17th-century-chateau.html</guid><dc:creator><![CDATA[ Isabella Leandersson ]]></dc:creator><pubDate>Thu, 23 Jun 2022 00:00:00 GMT</pubDate></item><item><title><![CDATA[Functional Conf 2022]]></title><description><![CDATA[<p>This year, Tarides attended the 2022 <em>Functional Conf</em> in India. Tarides’s engineers Sudha Parimala and Shakthi Kannan gave presentations on the OCaml platform and <em>Sandmark</em>, a continuous benchmarking tool for Multicore OCaml.</p>
<p>The <em>Functional Conf</em> is a three-day conference on everything functional programming! It’s a great event for beginners and experienced developers alike. Beginners have the opportunity to be introduced to different functional programming languages and understand their fundamental principles, and those who are more experienced have plenty to learn from both participants and speakers on how they have leveraged functional programming in their projects.</p>
<p>This year’s <em>Functional Conf</em> was held online and attended by people from around the world. It has been referred to as “Asia’s premiere functional programming conference” and welcomes participants from a broad range of backgrounds.</p>
<h2><em>OCaml Platform 2022</em></h2>
<p><a href="https://www.youtube.com/watch?v=tv4_Le4E-gQ">Sudha Parimala’s talk</a> is a great introduction to OCaml and its features, covering the installation process and "Hello World," as well as more advanced topics such as the text editor, publishing a library, and debugging. For the <em>Functional Conf</em>, it was a great way to give people interested in functional programming a taste of OCaml and what makes it stand out.</p>
<p>The presentation is a fantastic resource for people who are starting their journey in OCaml and want to know more about what they can do with the language, as well as for people further ahead looking for inspiration on different ways to progress.</p>
<h2><em>Benchmarking (Multicore) OCaml</em></h2>
<p><a href="https://www.youtube.com/watch?v=_-4XNtKs3wM">Shakthi Kannan’s talk</a> centres around Sandmark, the benchmarking suite designed to test various parts of the OCaml compiler and its runtime. Benchmarking is a challenging process. As a result, there are few tools available that do the job well. OCaml’s Sandmark can test various performance axes such as CPU, memory, and I/O, as its tools build the compiler under various configuration settings. It also comes with a dashboard that lets the user explore the results of benchmarking runs in an interactive and easily digestible format.</p>
<p>In his talk, Shakthi describes the journey to Sandmark, originally developed to support the Multicore OCaml project. He covers the challenges the team faced and the lessons they learned along the way, especially with an evolving programming language and the need to support multiple CPU architectures. It’s an amazing resource for teams who are looking to set up their own benchmarking procedures, and it is also a great example of how to approach a difficult task as a team.</p>
<h2><em>In Conclusion</em></h2>
<p>The <em>Functional Conf</em> is a great conference that brings the growing community of functional programmers together. It offers opportunities for people to learn about functional programming and exchange information with others in similar fields. Tarides is proud to have participated in their effort to bring functional languages to the forefront of programming.</p>
<p>To learn more about <em>Functional Conf</em> you can visit <a href="https://confengine.com/conferences/functional-conf-2022">their website</a>, along with the individual pages on <a href="https://confengine.com/conferences/functional-conf-2022/proposal/16096/ocaml-platform-in-2022">Sudha’s talk</a> and <a href="https://confengine.com/conferences/functional-conf-2022/proposal/16102/fast-and-curious-benchmarking-multicore-ocaml">Shakthi’s talk</a>, respectively.</p>
]]></description><link>https://tarides.com/blog/2022-06-21-functional-conf-2022</link><guid isPermaLink="false">https://tarides.com/blog/2022-06-21-functional-conf-2022.html</guid><dc:creator><![CDATA[ Isabella Leandersson ]]></dc:creator><pubDate>Tue, 21 Jun 2022 00:00:00 GMT</pubDate></item><item><title><![CDATA[OCaml 5 Alpha Release]]></title><description><![CDATA[<p><em>OCaml 5 is live! This major release introduces domains and effects, delivering unprecedented speed and efficiency to OCaml. Testing shows that OCaml 5 is able to outperform Go and closely match Rust in terms of performance. Keep reading for more details!</em></p>
<p>Tarides is thrilled that the alpha release of the long-awaited OCaml 5 is live! OCaml 5 is the culmination of over 8 years of research and engineering into concurrency and parallelism support for OCaml, made real thanks to hard work and dedication from all corners of the community. Tarides has been a major contributor to the engineering effort. Our engineers have contributed not only to the core compiler but also to the tools around release readiness, ecosystem compatibility testing, and continuous performance monitoring.</p>
<p><strong>If you are using OCaml in an industrial setting (or if you are interested in doing so), we'd like to make sure everything is ready for you to move to OCaml 5 and benefit from the new performance boost. Tell us what you need in this <a href="https://framaforms.org/tarides-ocaml-5-user-survey-1655303113">user survey</a>.</strong></p>
<h2>What Kind of Performance to Expect?</h2>
<p>This update brings <em>unprecedented results</em> in terms of performance, with an HTTP server based on OCaml 5’s <a href="https://github.com/ocaml-multicore/eio"><code>Eio</code></a> being able to serve 1M+ requests/sec, outperforming Go’s <code>nethttp</code>, and closely matching Rust’s <code>hyper</code> performance! This is just a small indication of OCaml 5’s potential in terms of speed and efficiency.</p>
<p><img sizes="(min-width: 1360px) 1360px, (min-width: 680px) 680px, 100vw" srcset="/blog/images/http_load1-170w~RZckgL6RDd06Q9ouLGqLJQ.webp 170w, /blog/images/http_load1-340w~FCaiS-rUwT1xzrczGpYpKg.webp 340w, /blog/images/http_load1-680w~tTh6XJY6gRfFUzO8ESaIkg.webp 680w, /blog/images/http_load1-1360w~3JAovQ7etL5s-zNQ-g2iSQ.webp 1360w" src="/blog/images/http_load1-1360w~3JAovQ7etL5s-zNQ-g2iSQ.webp" alt="HTTP Load">
<img sizes="(min-width: 1360px) 1360px, (min-width: 680px) 680px, 100vw" srcset="/blog/images/http_cores-170w~1Kzb-EuWKivQYYpJqW9ZOQ.webp 170w, /blog/images/http_cores-340w~Q1U_ZoD-jhH79E1J61G-xg.webp 340w, /blog/images/http_cores-680w~lsfzuAVwOOvti7HxKEf8_A.webp 680w, /blog/images/http_cores-1360w~cYaEj-kXvokoKC_cc4Sq0A.webp 1360w" src="/blog/images/http_cores-1360w~cYaEj-kXvokoKC_cc4Sq0A.webp" alt="HTTP Cores"></p>
<p>As it’s an alpha version, it’s still subject to some change and fine tuning. This means that we need your help to make OCaml 5 better! Please give <a href="https://discuss.ocaml.org">plenty of feedback</a> and <a href="https://github.com/ocaml/ocaml/issues">report any bugs</a> you find.</p>
<h2>Under the Hood</h2>
<p>The alpha release of OCaml 5 adds support for shared memory parallel execution via <em>domains</em> and a new model for concurrent execution via <em>effect handlers</em>.</p>
<p>Domains enable shared-memory parallel programming, which allows OCaml programs to run on multiple cores. With domains, OCaml programs will scale better by exploiting multicore processing. Effect handlers are a mechanism for concurrent programming. With the introduction of effect handlers, simple direct-style OCaml code will be flexible, easy to develop, debug, and maintain. No more monads for concurrency! These features will benefit the entire ecosystem and community, and we expect them to attract many new users to the language.</p>
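<p>As a rough illustration of both features, here is a minimal sketch that uses only the OCaml 5 standard library. The names <code>sum_range</code>, <code>parallel_sum</code>, <code>Ask</code>, and <code>with_answer</code> are made up for this example, and the workload is a toy one, not a benchmark.</p>
<pre><code class="language-ocaml">(* Requires OCaml 5. All names below are hypothetical, for illustration. *)

(* Domains: sum the two halves of a range on two domains. *)
let sum_range lo hi =
  let total = ref 0 in
  for i = lo to hi do
    total := !total + i
  done;
  !total

let parallel_sum () =
  (* Each [Domain.spawn] starts a domain that can run in parallel. *)
  let first = Domain.spawn (fun () -> sum_range 1 500_000) in
  let second = Domain.spawn (fun () -> sum_range 500_001 1_000_000) in
  (* [Domain.join] waits for a domain and returns its result. *)
  Domain.join first + Domain.join second

(* Effect handlers: direct-style code performs an effect, and the
   enclosing handler decides how to resume the computation. *)
type _ Effect.t += Ask : int Effect.t

let with_answer (answer : int) (f : unit -> int) : int =
  let open Effect.Deep in
  try_with f ()
    { effc =
        (fun (type a) (eff : a Effect.t) ->
          match eff with
          | Ask ->
              (* Resume the suspended computation with [answer]. *)
              Some (fun (k : (a, _) continuation) -> continue k answer)
          | _ -> None) }

let () =
  Printf.printf "parallel sum = %d\n" (parallel_sum ());
  Printf.printf "with effects = %d\n"
    (with_answer 41 (fun () -> Effect.perform Ask + 1))
</code></pre>
<p>The code that performs <code>Ask</code> is written in plain direct style; only the handler knows where the answer comes from, which is what lets libraries build schedulers without monads.</p>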
<p>The Standard Library is gaining several of the parallelism primitives previously only found in the Threads library (Condition, Mutex, and Semaphore). Interestingly, having added domains and effects, we hope and expect that most users will never need to use them directly! Instead, we warmly encourage users to look at adopting <a href="https://github.com/ocaml-multicore/domainslib"><code>domainslib</code></a> to parallelise programs and <a href="https://github.com/ocaml-multicore/eio"><code>Eio</code></a> as a replacement for Lwt/Async monadic-style concurrency.</p>
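<p>For readers curious about what the low-level primitives look like, the following sketch shares a counter between domains and protects it with the standard library's <code>Mutex</code>. The helper <code>incr_many</code> and the worker count are assumptions chosen for the example; in real programs, a library such as <code>domainslib</code> is usually the better starting point.</p>
<pre><code class="language-ocaml">(* Requires OCaml 5. [incr_many] is a hypothetical helper. *)
let counter = ref 0
let lock = Mutex.create ()

let incr_many n =
  for _ = 1 to n do
    Mutex.lock lock;
    (* Critical section: one domain at a time updates the counter. *)
    counter := !counter + 1;
    Mutex.unlock lock
  done

let () =
  (* Four domains each perform 10_000 increments. *)
  let workers =
    List.init 4 (fun _ -> Domain.spawn (fun () -> incr_many 10_000))
  in
  List.iter Domain.join workers;
  Printf.printf "counter = %d\n" !counter
</code></pre>
<p>Without the lock, the increments from different domains could interleave and some updates could be lost.</p>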
<p>This work seeks to remain entirely backwards-compatible. Programs written for any version of OCaml 4, even if they use the Thread library, will continue to work with the same semantics, similar performance, and as always for OCaml, without crashes.</p>
<h2>Next Steps</h2>
<h3>Installation</h3>
<p>For instructions on how to install OCaml 5 on your machine, see <a href="https://discuss.ocaml.org/t/ocaml-5-0-zeroth-alpha-release/10026">Florian Angeletti’s Discuss post</a>, which goes into great detail depending on what version of OCaml you’re running and what machine you have. KC Sivaramakrishnan has also created a <a href="https://github.com/kayceesrk/ocaml5-tutorial/">tutorial</a> on OCaml 5 that introduces its new parallelism features, a great resource for anyone looking to make the most of the update.</p>
<p>Other OCaml 5 documentation includes information on <a href="https://kcsrk.info/webman/manual/parallelism.html">parallelism</a>, <a href="https://kcsrk.info/webman/manual/effects.html">effect handlers</a>, and the <a href="https://kcsrk.info/webman/manual/memorymodel.html">memory model</a>.</p>
<h3>Feedback</h3>
<p>We want to reiterate that as with any alpha release of OCaml, we’re keen to hear about bugs and performance regressions. The move to parallel OCaml may bring new debugging challenges, but it remains the case that pure OCaml programs which do not use unsafe features should absolutely never crash. We’ll be taking part in the discussion on <a href="https://github.com/ocaml/ocaml/issues">GitHub</a>, <a href="https://discuss.ocaml.org">Discuss</a>, and <a href="https://twitter.com/tarides_?s=20&amp;t=xD04dp9D8eDpCxX6WDkC0A">Twitter</a>.</p>
<p>The change in major version number (from 4.<em>n</em> to 5.<em>n</em>) may result in minor breaking changes which affect your packages, particularly if you’ve been allowing some deprecation warnings to slip through in the past! We’ll be following up with more information about the tweaks that may be required for packages supporting both old and new versions of OCaml, as well as with specifics on the testing infrastructure.</p>
<h3>Survey</h3>
<p>As discussed above, we’ve created a <a href="https://framaforms.org/tarides-ocaml-5-user-survey-1655303113">user survey</a> to help us get a better sense of how people are planning on using OCaml 5. It would be very helpful if you could fill it out for us, which should only take a few minutes.</p>
<h3>OCaml 5 Timeline</h3>
<ul>
<li>The beta release will take place once these <a href="https://github.com/ocaml/ocaml/milestone/40">issues</a> have been resolved.</li>
<li>The final release is expected in September.</li>
<li>There is <em>no time limit</em> on reporting bugs, so please <a href="https://github.com/ocaml/ocaml/issues">report them here</a>.</li>
</ul>
<h2>Further Reading</h2>
<p>The <a href="https://discuss.ocaml.org/tag/multicore-monthly">Multicore monthlies</a>, produced by Shakthi Kannan and Anil Madhavapeddy, provide important context for the work behind OCaml 5 and what’s coming with the release.</p>
]]></description><link>https://tarides.com/blog/2022-06-15-ocaml-5-alpha-release</link><guid isPermaLink="false">https://tarides.com/blog/2022-06-15-ocaml-5-alpha-release.html</guid><dc:creator><![CDATA[ Isabella Leandersson ]]></dc:creator><pubDate>Wed, 15 Jun 2022 00:00:00 GMT</pubDate></item><item><title><![CDATA[Adding Merkle Proofs to Tezos]]></title><description><![CDATA[<p>The upcoming Tezos <a href="https://tezos.gitlab.io/protocols/013_jakarta.html#protocol-jakarta">Jakarta Protocol</a> will support compact Merkle
proofs to scale the network's trust infrastructure.
This allows nodes that do not trust each other to agree on the
validity of Tezos transactions with orders of magnitude smaller
storage requirements.
For instance, the block <a href="https://tzstats.com/2400319">2,400,319</a>,
containing 402 transactions and 638 operations,
can be validated using a Merkle proof of 6.3 MB instead of requiring
a Tezos node with at least 3.4 GB of storage, a savings of 99.8%!</p>
<p>Tarides contributed to Jakarta by extending the Tezos
storage system to support compact
storage proofs. This
feature extends the compact cryptographic representation of the ledger
state to sequences of operations. As a result, nodes that do not trust
each other can still agree on the series of operations' validity,
even if they don't know the entire contents of the ledger. The
upcoming stateless nodes (like <a href="https://tezos.gitlab.io/user/light.html">Tezos
light-client</a>),
<a href="https://tezos.gitlab.io/user/proxy.html">proxies</a>, and mechanisms that allow the exchange of trust between disjointed tamper-proof storage (like <a href="https://tezos.gitlab.io/alpha/transaction_rollups.html">L2
transactional-rollups</a>,
L2 smart-contract rollups,...) will use these proofs to scale the
Tezos trust infrastructure to <a href="https://research-development.nomadic-labs.com/tezos-is-scaling.html">new heights</a>.</p>
<p>The Merkle Proof API is one of the last major features that we integrated from
<a href="https://www.dailambda.jp/blog/2019-08-08-plebeia/">Plebeia</a>. It is
the result of a years-long collaboration between
<a href="https://www.dailambda.jp/">DaiLambda</a> and Tarides to improve the
storage system of Tezos.</p>
<h3>A (Very) Quick Tour of Tezos</h3>
<p>The Tezos network builds trust between its nodes by using two components:</p>
<ul>
<li><strong>(i)</strong> a tamper-proof database that can generate cryptographic hashes,
which uniquely and compactly represent the state of its contents; and</li>
<li><strong>(ii)</strong> a consensus algorithm to share these cryptographic hashes
across the network of (potentially adversarial) nodes.</li>
</ul>
<p>Both components have seen impressive performance improvements
recently. First, for <strong>(i)</strong>, we've discussed
the improvements that we released in <a href="/blog/2022-04-26-lightning-fast-with-irmin-tezos-storage-is-6x-faster-with-1000-tps-surpassed/">Octez v13 to improve
the efficiency of the storage component by a factor of 6</a>. Second, for <strong>(ii)</strong>, the consensus algorithm in Ithaca 2 changed from a
Nakamoto-style algorithm (like Bitcoin) to
<a href="https://arxiv.org/abs/2001.11965">Tenderbake</a> -- a Byzantine Fault
Tolerant consensus algorithm with deterministic finality. This change
significantly reduced the time it takes to converge on a
single agreed state hash in the Tezos network.</p>
<p>The Merkle proofs that we
introduced in the Jakarta Protocol will allow us to <a href="https://research-development.nomadic-labs.com/tezos-is-scaling.html">improve the
chain's performance even
more</a>.</p>
<h3>The Tezos Ledger is a Merkle Tree</h3>
<p>Tezos represents the ledger state (for instance, the amount of tokens
owned by everyone) as a Merkle tree, using the
<a href="https://irmin.io">Irmin</a> storage library. Merkle trees are immutable
tree-like data structures where each leaf is labelled with a
cryptographic hash. Each node's hash is then obtained by
recursively hashing its children's labels. Tezos then combines the
resulting root hash with its consensus protocol to make sure every
node in the network agrees on the ledger's state.</p>
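<p>To make the recursive hashing concrete, here is a toy sketch. It is not how Irmin or Tezos encode their trees: the <code>tree</code> type, the <code>"leaf:"</code> and <code>"node:"</code> prefixes, and the use of the standard library's <code>Digest</code> (MD5) as a stand-in for a cryptographic hash are all assumptions made for the example.</p>
<pre><code class="language-ocaml">(* A toy Merkle tree. [Digest] (MD5) stands in for a real cryptographic
   hash; the encoding below is made up for illustration. *)
type tree =
  | Leaf of string
  | Node of (string * tree) list (* children, each labelled by a step *)

let hash s = Digest.to_hex (Digest.string s)

(* A leaf is labelled with the hash of its value; a node's hash is
   computed from the steps and hashes of its children. *)
let rec root_hash = function
  | Leaf value -> hash ("leaf:" ^ value)
  | Node children ->
      let labels =
        List.map (fun (step, child) -> step ^ "=" ^ root_hash child) children
      in
      hash ("node:" ^ String.concat ";" labels)

let ledger = Node [ ("alice", Leaf "30"); ("bob", Leaf "20") ]
let updated = Node [ ("alice", Leaf "30"); ("bob", Leaf "21") ]

let () =
  Printf.printf "ledger  root: %s\n" (root_hash ledger);
  Printf.printf "updated root: %s\n" (root_hash updated)
</code></pre>
<p>Changing a single leaf changes the root hash, so two parties that agree on the root hash agree on the whole tree.</p>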
<p>But there is another interesting aspect of Merkle trees that was not
exposed and used by Tezos until now: <em>Merkle proofs</em>. In the protocol
<code>J</code> proposal, we are introducing a new feature: Merkle proofs for
Tezos as partial, compressed, Merkle trees. In a blockchain such as
Tezos, Merkle proofs are an efficient way to verify the integrity of
operations over Merkle trees. For this reason, Merkle proofs are
a central part of the optimistic rollups projects that will
be available with the Tezos
<a href="https://tezos.gitlab.io/protocols/013_jakarta.html">Jakarta Protocol</a>.</p>
<p>In collaboration with DaiLambda, Marigold, Nomadic Labs, and TriliTech,
we have integrated <a href="https://github.com/camlspotter/plebeia">Plebeia</a>
Merkle proofs in Irmin. Plebeia uses binary Patricia trees that are capable of
generating very compact Merkle proofs. For instance, the proof of
100 operations can be represented within 46 kB, whereas the full
Tezos context requires 3.4 GB of disk storage.
This compactness comes from its specialised store structure and clever
optimisations, such as path compression and inlining. We have been
working with the DaiLambda team to unite Irmin and Plebeia's strengths
and bring built-in compact Merkle proof support to Tezos. We added
support for both the existing storage stack, where trees have a
branching factor of 32, and for new L2 storage systems that could use
binary trees directly. We have also worked with Marigold and Nomadic
Labs to propose an alternative representation of these proofs using
streams, which comes with a simplified verification algorithm.
A stream proof encodes the same information as a regular proof.
However, instead of being
encoded as a tree, the proof is encoded as a sequence of steps that
reveal a Merkle tree lazily, from root to leaves.</p>
<div role="region"><table>
<tbody><tr>
<th>Kind of Proofs</th>
<th>1 op.</th>
<th>100 ops.</th>
<th>1k ops.</th>
<th>10k ops.</th>
</tr>
<tr>
<td>binary Merkle trees</td>
<td>0.7kB</td>
<td>46kB</td>
<td>371kB</td>
<td>2.8MB</td>
</tr>
<tr>
<td>stream binary Merkle trees</td>
<td>1kB</td>
<td>75kB</td>
<td>602kB</td>
<td>4.5MB</td>
</tr>
<tr>
<td>Merkle B-trees (32 children)</td>
<td>3.1kB</td>
<td>158kB</td>
<td>1232kB</td>
<td>7.8MB</td>
</tr>
<tr>
<td>stream Merkle B-trees (32 children)</td>
<td>3.1kB</td>
<td>158kB</td>
<td>1238kB</td>
<td>7.9MB</td>
</tr>
</tbody></table></div><blockquote>
<p>The table above shows the size for such proofs, using 2.5M entries (the current number of entries in <code>/data/contracts/index</code> in the Tezos context). We are simulating 1, 100, 1000, and 10_000 random read operations on the entries, and we display the size of the related proofs.</p>
</blockquote>
<h3>An Example</h3>
<p>Let's look at a simple example of a Merkle proof produced for ensuring
tamper-proof banking account statements.</p>
<p>We can model a bank that stores its customer balances in the form of a
Merkle tree. To avoid publishing the entire contents of its customer
accounts, this bank can publicly export the bank's Merkle tree's
hash. To let 3rd-parties validate an operation, it can also produce
Merkle proofs that reveal the balance of some customers. Anyone in
possession of a Merkle proof can hash it and verify that it hashes
identically to the public hash announced by the bank. This equality of
hashes is the proof of correctness.</p>
<p>Our bank contains the balances for Eve (30 coins), Ben (10 coins), and
Bob (20 coins). It stores the customers in a radix tree (Eve's balance
is stored under <code>"e", "v", "e"</code>).</p>
<p>Irmin stores data as hash trees: whenever data is added to the
database, the corresponding nodes in the tree are hashed in order to
then generate the hash of the root node. The hash of a commit also
acts as a reference for accessing it in the future. This storage
format is close to Merkle proofs. It suffices to blind the part of the
tree that a transaction has not accessed to generate its proof. For
example, in the figure below, the account statement for Eve is in green. It doesn't leak
sensitive information about the other customers: only the letter "b"
is leaked, along with the hash <code>H2</code> of Bob and Ben's subtree.</p>
<p><img sizes="(min-width: 1360px) 1360px, (min-width: 680px) 680px, 100vw" srcset="/blog/images/Merkle-Proof-170w~xRndxMoyt5hJgMMD-fVz0w.webp 170w, /blog/images/Merkle-Proof-340w~zvgsGC0PEJ0alte_N2s5mg.webp 340w, /blog/images/Merkle-Proof-680w~s4ca06e2J3h7QIYvNne-xg.webp 680w, /blog/images/Merkle-Proof-1360w~MI9w4tZTqfa1arOKt-HI_A.webp 1360w" src="/blog/images/Merkle-Proof-1360w~MI9w4tZTqfa1arOKt-HI_A.webp" alt="A merkle proof"></p>
<blockquote>
<p>The Figure shows the difference between a Merkle tree (on the left) and a Merkle proof (on the right). In Merkle proofs, some subtrees can be blinded and represented only by their hash. Here the subtree under H2 is blinded and replaced by its hash. Thanks to this, Merkle trees and Merkle proofs will have the same root hash. Merkle proofs are hence very useful to provide Merkle tree summaries for a subset of the full data available.</p>
</blockquote>
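<p>The bank example can be sketched in a few lines of OCaml. This is a toy model rather than Irmin's proof API: the <code>tree</code> type, the <code>blind_except</code> helper, the hash encoding, and the use of <code>Digest</code> (MD5) in place of a cryptographic hash are assumptions made for illustration.</p>
<pre><code class="language-ocaml">(* Toy model of a Merkle proof as a partially blinded tree. *)
type tree =
  | Leaf of string
  | Node of (string * tree) list
  | Blinded of string (* a subtree replaced by its hash *)

let hash s = Digest.to_hex (Digest.string s)

let rec root_hash = function
  | Leaf value -> hash ("leaf:" ^ value)
  | Blinded h -> h (* a blinded subtree contributes its stored hash *)
  | Node children ->
      let labels =
        List.map (fun (step, child) -> step ^ "=" ^ root_hash child) children
      in
      hash ("node:" ^ String.concat ";" labels)

(* The bank's radix tree: Eve (30), Ben (10), and Bob (20). *)
let bank =
  Node
    [ ("e", Node [ ("v", Node [ ("e", Leaf "30 coins") ]) ]);
      ("b",
       Node
         [ ("e", Node [ ("n", Leaf "10 coins") ]);
           ("o", Node [ ("b", Leaf "20 coins") ]) ]) ]

(* Keep the child reached by [kept] and blind every sibling. *)
let blind_except kept = function
  | Node children ->
      Node
        (List.map
           (fun (step, child) ->
             if String.equal step kept then (step, child)
             else (step, Blinded (root_hash child)))
           children)
  | other -> other

(* Eve's statement: the "b" subtree is replaced by its hash. *)
let eve_proof = blind_except "e" bank

let () =
  assert (String.equal (root_hash bank) (root_hash eve_proof));
  print_endline "proof and full tree share the same root hash"
</code></pre>
<p>A verifier who only holds <code>eve_proof</code> can recompute the root hash and compare it with the one the bank published, without ever seeing Ben's or Bob's balances.</p>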
<p>Merkle proofs are thus partial Merkle trees, with the same root hash. But proofs
can also be represented using an alternate definition: a stream of elements
that needs to be visited in order to build the tree's root hash. There is
a one-to-one correspondence between the two representations, but stream proofs
are easier to implement as they encode the order in which nodes have to be visited to verify the proof. However, stream proofs need to carry the hash of all the intermediate nodes, while tree proofs can omit those. As a consequence, tree proofs are smaller than stream proofs, as shown in the above table.</p>
<p>For instance, in the above Figure, the equivalent stream proof is the sequence:</p>
<ul>
<li>A leaf with 30 coins;</li>
<li>A node with hash <code>H3</code> with a child ("e", "30 coins");</li>
<li>A node with hash <code>H1</code> with a child ("v", <code>H3</code>);</li>
<li>A node with hash <code>H0</code> with two children ("e", <code>H1</code>) and ("b", <code>H2</code>).</li>
</ul>
<p>A verifier can just apply these elements in sequence to verify that <code>H0</code> is
a valid root hash.</p>
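<p>The sequence above can be checked with a small verifier. The sketch below follows the order of the list in this post and is not Irmin's stream proof format: the <code>elt</code> and <code>child</code> types, the hash encoding, the placeholder value used for <code>H2</code>, and the use of <code>Digest</code> (MD5) as the hash are all assumptions.</p>
<pre><code class="language-ocaml">(* Toy stream proof: each node carries its claimed hash and its children. *)
type child = Value of string | Hash of string

type elt =
  | Leaf of string
  | Node of string * (string * child) list (* claimed hash, children *)

let hash s = Digest.to_hex (Digest.string s)
let leaf_hash value = hash ("leaf:" ^ value)
let child_hash = function Value v -> leaf_hash v | Hash h -> h

let node_hash children =
  let labels =
    List.map (fun (step, c) -> step ^ "=" ^ child_hash c) children
  in
  hash ("node:" ^ String.concat ";" labels)

(* Apply the elements in order. Every node's claimed hash must match the
   hash recomputed from its children. Returns the last hash on success. *)
let verify (stream : elt list) : string option =
  let step acc elt =
    match (acc, elt) with
    | Error (), _ -> acc
    | Ok _, Leaf value -> Ok (Some (leaf_hash value))
    | Ok _, Node (claimed, children) ->
        if String.equal claimed (node_hash children) then Ok (Some claimed)
        else Error ()
  in
  match List.fold_left step (Ok None) stream with
  | Ok last -> last
  | Error () -> None

(* Hashes for the figure's example; [h2] is a placeholder for the hash
   of the blinded subtree holding Ben and Bob. *)
let h3 = node_hash [ ("e", Value "30 coins") ]
let h1 = node_hash [ ("v", Hash h3) ]
let h2 = hash "blinded subtree of Ben and Bob"
let h0 = node_hash [ ("e", Hash h1); ("b", Hash h2) ]

let stream =
  [ Leaf "30 coins";
    Node (h3, [ ("e", Value "30 coins") ]);
    Node (h1, [ ("v", Hash h3) ]);
    Node (h0, [ ("e", Hash h1); ("b", Hash h2) ]) ]

(* The same stream with Eve's balance altered but the old hash kept. *)
let tampered =
  [ Leaf "3000 coins";
    Node (h3, [ ("e", Value "3000 coins") ]);
    Node (h1, [ ("v", Hash h3) ]);
    Node (h0, [ ("e", Hash h1); ("b", Hash h2) ]) ]

let () =
  assert (verify stream = Some h0);
  assert (verify tampered = None)
</code></pre>
<p>The verifier accepts the stream only if the final hash equals the published root hash <code>H0</code>. Altering the balance would force every hash up to the root to change, which is why the tampered stream is rejected.</p>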
<h3>Merkle Proofs in Irmin</h3>
<p>If you want more details about the Merkle proof implementation, head
over to the
<a href="https://mirage.github.io/irmin/irmin/Irmin/module-type-S/Tree/Proof/index.html">documentation</a>,
or see <a href="https://github.com/mirage/irmin/pull/1802">https://github.com/mirage/irmin/pull/1802</a> for the example above
revisited in Irmin.</p>
]]></description><link>https://tarides.com/blog/2022-06-13-adding-merkle-proofs-to-tezos</link><guid isPermaLink="false">https://tarides.com/blog/2022-06-13-adding-merkle-proofs-to-tezos.html</guid><dc:creator><![CDATA[ Irmin Team ]]></dc:creator><pubDate>Mon, 13 Jun 2022 00:00:00 GMT</pubDate></item><item><title><![CDATA[OCaml Matrix: A Virtual World]]></title><description><![CDATA[<h3>Introduction</h3>
<p>One of Tarides' projects is to create an open and secure infrastructure for <a href="/blog/2022-03-08-secure-virtual-messages-in-a-bottle-with-scop/">communication protocols</a>, initially focusing on emails and <a href="https://matrix.org/">Matrix</a>. This will allow organisations to self-host their messaging services, using either personal cloud resources or low-cost embedded devices. Individuals and organisations can use this framework to avoid having their emails and messages read and managed by third parties.</p>
<p>Every component of our system is carefully designed as an independent library, using modern development techniques to avoid commonly reported threats and flaws. For instance, the protocols' implementation is written in a type-safe language and tested with state-of-the-art, coverage-driven tests, such as fuzzing. Then it's deployed as unikernels for enhanced security, model quality, and library portability. The combination of these techniques will increase users’ trust to migrate their personal data to these new secure services.</p>
<h3>The Matrix</h3>
<p>When hearing the word <em>Matrix</em>, people invariably think about Neo and his ability to see the code behind his virtual world. Addressing the assumed connection to the popular film series, the creators of the Matrix Communication Standard explain their choice of name: “We are called Matrix because we provide a structure in which all communication can be matrixed together.”</p>
<p>Communication is essential to our society to both create and maintain relationships, whether personal or professional. As we progress further into this age of information, people communicate and stay connected through online and text-based communication. Gone are the days when someone would pick up a phone to call a friend or family member. Now, most people send a text message or email as the default. Thus, online communication has become the norm in our current society. Inevitably, this online communication is vulnerable to malicious actors trying to invade our privacy and hijack our correspondence. Tarides has addressed this issue and aims to host community discussions about open-source projects.</p>
<p>Matrix is an established protocol for human-to-human and human-to-machine communications, including instant messaging. OCaml Matrix is an OCaml implementation of the Matrix protocol. This provides a secure communication layer which is based on MirageOS’s unikernel technology in order to reduce the attack surface. It uses Irmin as storage for the communication content to ensure integrity, and we have integrated it into the CI system for all OCaml projects.</p>
<p>Let's take a closer look at the <code>ocaml-matrix</code> component and explore some details about the Matrix Communication Standard to see if it’s indeed communication matrixed together or if it’s comparable to Neo’s Matrix with people plugged into a virtual world.</p>
<h3>Matrix Beginnings (History)</h3>
<p>Matrix is an open standard for interoperable, decentralised, real-time communication over the Internet, created in 2014 inside <a href="https://www.amdocs.com">Amdocs</a>, a company specialising in software and services for communications. <a href="https://matrix.org">Matrix</a> provides a fully decentralised and federated architecture, so it doesn’t store users’ information in a centralised location. This means when people join one of the Matrix virtual rooms to send messages, video chat, or share files, their exchanges are truly private, especially with Matrix’s end-to-end encryption. Matrix’s decentralised, federated architecture ensures communication integrity and availability in every room.</p>
<p>Matrix is openly <a href="https://spec.matrix.org/latest/">specified</a> and <a href="https://github.com/matrix-org">implemented</a> with the open-source reference implementation server <a href="https://github.com/matrix-org/synapse">Synapse</a> and client <a href="https://github.com/vector-im">Element</a>, previously Riot, which already have several solid security features and allow end-to-end encryption. Starting in 2018, the French Government deployed a private federation of <a href="https://github.com/matrix-org/synapse-dinsic">Matrix home servers</a> and <a href="https://github.com/tchapgouv">Tchap</a>, an open-source client forked from Riot. The French National Cybersecurity Agency (<a href="https://www.ssi.gouv.fr/en/">ANSSI</a>) works jointly with the Interdepartmental Digital Directorate (<a href="https://www.numerique.gouv.fr/dinum/">DINUM</a>) on a cybersecurity audit of Tchap. Matrix’s interesting security features include search that works with end-to-end encryption, and private rooms have end-to-end encryption enabled by default.</p>
<h3>Matrix Reloaded (Architecture)</h3>
<p>Users interact by sending and receiving events in Matrix rooms. Each Matrix user registers with a homeserver and is identified by a unique ID, like “Neo:tarides.com.” The registration goes through a client application that connects to a Matrix homeserver via the client-server API. This allows users to perform actions such as sending messages, controlling rooms, or synchronising their conversation history. All communication in a Matrix room replicates across the room participants’ homeservers, so every homeserver connected to a room stores the content of the room’s history.</p>
<p>Basically, the user communicates with a homeserver via a client application. Once the user decides to join a room, the client sends this request to the homeserver, and it’s the homeserver’s responsibility to connect the user to the room, to store the history of the messages of that room, and to send the messages back to the user. The homeserver gets all this information by talking with the other users’ homeservers in that room. This way, if a homeserver goes down, the conversation can continue as the remaining homeservers are still exchanging messages. When a homeserver comes back online, it resynchronises the messages. It receives old ones from other homeservers and inserts its own into others’ timelines.</p>
<p><img sizes="(min-width: 1360px) 1360px, (min-width: 680px) 680px, 100vw" srcset="/blog/images/servers-170w~4qfFZGvdXnYxeR2Ktcawgg.webp 170w, /blog/images/servers-340w~bQ-8eb0i0mlQ7byvG3lvHQ.webp 340w, /blog/images/servers-680w~FLWoNrVrk-cjp5jkJeCEag.webp 680w, /blog/images/servers-1360w~miNBUfIeCfYLhNn23byBLQ.webp 1360w" src="/blog/images/servers-1360w~miNBUfIeCfYLhNn23byBLQ.webp" alt="Matrix Architecture"></p>
<p><em>Matrix Architecture Image Description: Matrix users communicate via Matrix clients, which can be web clients, mobile clients, desktop clients, or embedded clients built into existing apps like Slack via Matrix bridges. It could even be a piece of hardware (e.g., a drone) that is Matrix enabled. A user's client connects via a unique ID to a single homeserver, which stores the communication history and account information for that user. It also shares data with the wider Matrix federation by synchronising communication history with other homeservers. The conversations among users take place in rooms that have their contents replicated across all of the homeservers associated with the users present in a room.</em></p>
<p>Centralised communication architectures keep the data within their own systems, which induces a series of security issues. For example, centralised systems usually offer very little transparency regarding their implementations. This means that, for the claimed purpose of security, the centralised system could either hide backdoors or have security flaws that pose serious issues to privacy. By contrast, an open-source system promotes transparent development, which provides assurance regarding the reliability of the implementation by allowing ad-hoc code audits. Moreover, the decentralised architecture empowers users to host their own conversations rather than all their data being stored by the service provider. This creates fewer incentives for attacks targeting massive data leaks and, in combination with the confidentiality ensured by the end-to-end encryption, induces an increased level of security while promoting ownership and data sovereignty.</p>
<h3>Matrix Revolutions (in OCaml)</h3>
<p>Matrix’s <a href="https://www.matrix.org/security-disclosure-policy/">Hall of Fame</a> shows several ethical researchers’ investigative work into Matrix’s security vulnerabilities. For example, a recently discovered <a href="https://www.cvedetails.com/cve/CVE-2021-44538/">buffer overflow</a> produces a considerable information disclosure in other Matrix implementations, such as Element. At Tarides, we mitigate a consistent class of these vulnerabilities with the OCaml development environment, which provides secure-by-design guarantees for the <a href="https://github.com/mirage/ocaml-matrix">OCaml Matrix</a> project.</p>
<p><img sizes="(min-width: 1360px) 1360px, (min-width: 680px) 680px, 100vw" srcset="/blog/images/architecture-170w~Yt1tqGFzVdMBWFA7M1cNaw.webp 170w, /blog/images/architecture-340w~JfR7f1P_nprPXjEYdY0hYA.webp 340w, /blog/images/architecture-680w~fi_Tyw3ClmcSGumIpMBDwg.webp 680w, /blog/images/architecture-1360w~lgOg3VrxTFWjEGjSAUXE1w.webp 1360w" src="/blog/images/architecture-1360w~lgOg3VrxTFWjEGjSAUXE1w.webp" alt="Matrix Servers"></p>
<p>OCaml Matrix Architecture: The OCaml CI Client is a bot that communicates with Matrix servers via the TLS protocol, such as the <code>ocaml-matrix</code> server. The <code>ocaml-matrix</code> server is the unikernel that ensures the communication with other Matrix servers from the federation to synchronise upon events in the Matrix rooms. For this purpose, OCaml Matrix exchanges DNS information with a unikernel that plays the role of a Primary DNS Server and connects with an Irmin storage unit to save the rooms’ states.</p>
<p>Our <code>ocaml-matrix</code> server manages its own clients, who create public rooms for events and messaging. It also handles foreign servers; their users can ask to join these public rooms. This server interacts with other servers and manages their users’ requests for registration and event updates in public rooms via the server-to-server communication API. Our OCaml implementation follows the Matrix specification standard. From this, we extract the parts describing the subset of Matrix components that we choose to implement for our OCaml Matrix MVP (Minimum Viable Product). Because other servers are not aware of the MVP’s constraints, it enforces them using the errors and rights restrictions provided by the Matrix standard.</p>
<p>We also implemented an OCaml-CI client that communicates with the Matrix servers via the client-server API. This client implements a subset of the actions defined in the specification and is meant to be used as a bot only (and would therefore not need to drift apart from this subset). The OCaml-CI client was specifically designed to allow an easy implementation for our OCaml server, but it is totally compatible with other Matrix homeservers. We tested the integration of the OCaml-CI client with both Synapse and our <code>ocaml-matrix</code> server, and we used it for testing throughout the <code>ocaml-matrix</code> server implementation.</p>
<p>For now, we’ve only given the OCaml Matrix access to public rooms because they don’t require the end-to-end encryption protocol. Nevertheless, we define support for encrypted communication via the <em>Key</em> module, and we note that most of the encryption algorithms used by the end-to-end encryption protocol are available in MirageOS unikernels via the <a href="https://github.com/mirage/mirage-crypto"><code>mirage-crypto</code> library</a>.</p>
<p>We deployed the <code>ocaml-matrix</code> server as an end-to-end application and converted it into the unikernel format. The process of unikernel deployment allows the <code>ocaml-matrix</code> unikernel to run on various platforms in isolation, increasing the security level of the Matrix server. The unikernel format of the Matrix server is <a href="https://github.com/mirage/ocaml-matrix/tree/mirage/ci-server-mirage">completed for Unix</a> and in the final stages for the platforms ported by Solo5. It is worth noting that throughout the stage of <code>ocaml-matrix</code> unikernel deployment, we’ve had our share of <a href="https://github.com/aantron/dream">dream</a>-ing. Going through this experience was a game changer.</p>
<h3>Matrix Resurrections (Future Work)</h3>
<p>Although we’re thrilled about the progress thus far, there is still much work to do. We plan to revive the OCaml Matrix to improve or add certain features. First, we will add user access to private rooms with end-to-end encryption and more authentication methods that follow Matrix specifications and GDPR recommendations. We will also adopt a methodology for testing and benchmarking for both the <code>ocaml-matrix</code> client and server, integrate the <code>ocaml-matrix</code> codebase into OCaml Multicore, create other <code>ocaml-matrix</code> unikernel deployments, and evaluate the security model provided in the Matrix specifications. Finally, we’ll update and complete the implementation according to the latest Matrix specifications.</p>
<h3>Conclusions</h3>
<p>Having said all of the above, we invite you to decide whether the Matrix name comes from the provided federated structure in which all communication can be matrixed together or from the idea that it's creating a virtual world that is sustained by the users plugged into it. Do you want to know the truth behind the Matrix? It’s up to you. Will you choose the blue pill or the red pill?</p>
]]></description><link>https://tarides.com/blog/2022-06-09-ocaml-matrix-a-virtual-world</link><guid isPermaLink="false">https://tarides.com/blog/2022-06-09-ocaml-matrix-a-virtual-world.html</guid><dc:creator><![CDATA[ Irina Mariuca Asavoea ]]></dc:creator><pubDate>Thu, 09 Jun 2022 00:00:00 GMT</pubDate></item><item><title><![CDATA[Tarides Sponsors 12th Annual Journées Franciliennes]]></title><description><![CDATA[<p>Tarides is proud to sponsor the 12th annual programming contest <em><a href="https://journees-franciliennes-de-programmation.org/">Journées Franciliennes de Programmation!</a></em> On the 31st of May 2022, students from three different Parisian universities met at La Sorbonne University to engage in some friendly but lively competition.</p>
<p>Bachelor students from La Sorbonne (Paris 6), Paris Cité (Paris 7), and Paris Saclay (Paris 11) participated in a day-long programme creating solutions to a variety of problems. The aim of the competition was not for participants to demonstrate detailed knowledge of specific areas of programming, but rather to apply their combined programming knowledge usefully. At the end of the day, participants were awarded points based on the problems they’d solved during the day and the winners were announced.</p>
<p>The event was organised by teachers and researchers of computer science, some of whom specialise in OCaml. It was a great opportunity for students to experiment with OCaml under the guidance and supervision of experienced programmers.</p>
<p>Tarides provided computer science books for the participants, along with some fun Tarides swag!</p>
<p><img sizes="(min-width: 1360px) 1360px, (min-width: 680px) 680px, 100vw" srcset="/blog/images/classroom1-170w~wq_4rKLb0ncGBrHhmTKVRA.webp 170w, /blog/images/classroom1-340w~vuEfKuhmZW52Iz9m9Xb9zA.webp 340w, /blog/images/classroom1-680w~uDf9MCQXa8yq-ORklFm2SQ.webp 680w, /blog/images/classroom1-1360w~IgKy0e_xgehbJRHUIC_5tg.webp 1360w" src="/blog/images/classroom1-1360w~IgKy0e_xgehbJRHUIC_5tg.webp" alt="Programming Contest at La Sorbonne"></p>
<p><img sizes="(min-width: 1360px) 1360px, (min-width: 680px) 680px, 100vw" srcset="/blog/images/classroom2-170w~Mg8g83vzSmT3iOjUz0dulw.webp 170w, /blog/images/classroom2-340w~jV3Nq7BZMrkp6AkVGNSDfQ.webp 340w, /blog/images/classroom2-680w~xJf9CaDFy0LujSgYndMP6A.webp 680w, /blog/images/classroom2-1360w~gforUNEp8Ob1ca_6xryumQ.webp 1360w" src="/blog/images/classroom2-1360w~gforUNEp8Ob1ca_6xryumQ.webp" alt="Student Creating Solutions"></p>
]]></description><link>https://tarides.com/blog/2022-06-02-tarides-sponsors-12th-annual-journ-e-francilienne</link><guid isPermaLink="false">https://tarides.com/blog/2022-06-02-tarides-sponsors-12th-annual-journ-e-francilienne.html</guid><dc:creator><![CDATA[ Isabella Leandersson ]]></dc:creator><pubDate>Thu, 02 Jun 2022 00:00:00 GMT</pubDate></item><item><title><![CDATA[OCaml.org Reboot: User-Centric Design & Content]]></title><description><![CDATA[<p>Tarides is pleased to announce the launch of the updated community site, <a href="https://ocaml.org/">ocaml.org</a>.</p>
<p>Over the past year and a half, we have supported and collaborated with members of the OCaml community on the creation of an updated community website. We are proud to present new features and improvements that will benefit both existing and new generations of OCaml users.</p>
<h3>Features</h3>
<p>Some of the quality-of-life improvements that users can expect from this update include:</p>
<ul>
<li><a href="https://ocaml.org/packages">Package documentation</a> site which contains the documentation of every version of every OCaml package</li>
<li><a href="https://ocaml.org/opportunities">Job board</a> to list job opportunities from the community</li>
<li><a href="https://ocaml.org/blog">Syndicated blog</a> that links to blog articles from the community and offers original blog posts</li>
<li><a href="https://ocaml.org/success-stories">Success stories</a> that explore how notable companies solve real-world challenges using OCaml</li>
<li><a href="https://ocaml.org/learn">New documentation site</a> which aggregates resources and tutorials to learn OCaml</li>
<li><a href="https://ocaml.org/play">OCaml playground</a> to try OCaml directly in the browser</li>
</ul>
<h3>The Road So Far</h3>
<p>We have worked hard to address concrete requirements from users and provide solutions for the new website. The 2020 OCaml Community Survey highlighted several areas to improve, resulting in new features and content being added.</p>
<p>The survey concluded that the original site lacked easily accessible package documentation, and that job applicants and employers had a difficult time connecting. To address this, we decided early on to include both a <a href="https://ocaml.org/opportunities">job board</a> and a fully-incorporated package documentation page. The job board now provides a place where job seekers can discover opportunities in OCaml, and employers can look for applicants. The <a href="https://ocaml.org/packages">package documentation</a> page allows users to find, explore, and compare documentation all conveniently located in one place. Additionally, the team wanted to improve site navigation. This included ensuring easy pathfinding between related topics, together with a focus on improving overall accessibility, allowing successful navigation within the site.</p>
<p>Taking the perspective of different users of the site inspired the creation of brand-new content like <a href="https://ocaml.org/success-stories">Success Stories</a> that highlight ways professionals, academics, and others use OCaml to solve hard problems, create impact, and foster collaboration.  It also inspired the new area for <a href="https://ocaml.org/learn/">tutorials and guides</a>, as well as the <a href="https://ocaml.org/play">OCaml playground</a>, both aimed at making learning OCaml easier and more engaging.</p>
<h3>Looking Ahead</h3>
<p>There is still plenty of room for improvement and new ideas, and now is a great time for the community to be more involved. We’d like everyone in the community to participate in improving the site and we have created a <a href="https://github.com/ocaml/ocaml.org/blob/main/CONTRIBUTING.md">contribution guide</a> to make the process easier. Please reach out on the <a href="https://github.com/ocaml/ocaml.org/issues">issue tracker</a> with ideas and suggestions. We are especially looking for people to help maintain and run the website, and improve the content and general user-experience to help grow our community even more!</p>
<p>To learn more about the reboot and how to get involved please read the <a href="https://discuss.ocaml.org/t/v3-ocaml-org-we-are-live/9747">original Discuss post</a>.</p>
]]></description><link>https://tarides.com/blog/2022-05-02-ocaml-org-reboot-user-centric-design-content</link><guid isPermaLink="false">https://tarides.com/blog/2022-05-02-ocaml-org-reboot-user-centric-design-content.html</guid><dc:creator><![CDATA[ Thibaut Mattio ]]></dc:creator><pubDate>Mon, 02 May 2022 00:00:00 GMT</pubDate></item><item><title><![CDATA[Lightning Fast with Irmin: Tezos Storage is 6x faster with 1000 TPS surpassed]]></title><description><![CDATA[<p>Over the last year, the Tarides
storage team has been focused on scaling the storage layer of <a href="https://tezos.gitlab.io/">Octez</a>,
the most popular node implementation for the <a href="https://tezos.com/">Tezos</a> blockchain. With
the upcoming release of Octez v13, we are reaching our performance goal of
<strong>supporting one thousand transactions per second</strong> (TPS) in the
storage layer! This is a <strong>6x improvement</strong> over Octez 10. Even better, this
release also <strong>makes the storage layer orders of magnitude more stable</strong>,
with a <strong>12x improvement in the mean latency of operations</strong>. At the
same time, we <strong>reduced the memory usage by 80%</strong>.
Now Octez requires a mere 400 MB of RAM to bootstrap nodes!</p>
<p>In this post, we'll explain how we achieved these milestones thanks to
<a href="https://irmin.org">Irmin 3</a>, the new major release of the <a href="https://mirage.io">MirageOS</a>-compatible
storage layer developed and maintained by Tarides and used by Tezos.
We'll also explain what this means for the Tezos community now and
in the future.</p>
<p>As explained in a <a href="https://research-development.nomadic-labs.com/tps-evaluation.html">recent post on the Nomadic Labs
blog</a>,
there are various ways to evaluate the throughput of Tezos. Our
purpose is to optimise the Tezos storage layer by identifying and fixing
its bottlenecks. Thus, our benchmarking setup replays actual data (the
first 150k blocks of the Hangzhou Protocol on Tezos Mainnet,
corresponding to the period Dec 2021 – Jan 2022) and explicitly
excludes the networking I/O operations and protocol computations to
focus on the context I/O operations only. Thanks to this setup, we
identified and fixed the main I/O bottlenecks present in Octez, and
verified the fixes:</p>
<p><img sizes="(min-width: 1360px) 1360px, (min-width: 680px) 680px, 100vw" srcset="/blog/images/2022-04-08.tezos-storage-surpasses-1000-tps/transactions_per_second-170w~ZtnAnR8jKD0j4hjGuSnKeA.webp 170w, /blog/images/2022-04-08.tezos-storage-surpasses-1000-tps/transactions_per_second-340w~HRRLVlMp16C8H0MQYnLG6g.webp 340w, /blog/images/2022-04-08.tezos-storage-surpasses-1000-tps/transactions_per_second-680w~WACgeOxLzuuUWnuoXmGStg.webp 680w, /blog/images/2022-04-08.tezos-storage-surpasses-1000-tps/transactions_per_second-1360w~bvm523rxV6Y9B2d-GfI3Gg.webp 1360w" src="/blog/images/2022-04-08.tezos-storage-surpasses-1000-tps/transactions_per_second-1360w~bvm523rxV6Y9B2d-GfI3Gg.webp" alt="Bar chart of mean transactions per second for various Irmin configurations"></p>
<blockquote>
<p>Comparison of the Transactions Per Second (TPS) performance between Octez 10,
11, 12, and 13 while replaying the first 150k
blocks of the Hangzhou Protocol on Tezos Mainnet<sup><a href="#fn-1" id="ref-1-fn-1" role="doc-noteref" class="fn-label">[1]</a></sup>.
Octez 13 reaches 1043 TPS on average, which is a <strong>6x improvement</strong> over Octez 10.</p>
</blockquote>
<h3>Merkle databases: to index or not to index</h3>
<p>A Tezos node keeps track of the blockchain state in a database called the
<em>context</em>. For each block observed by the node, the context stores a
corresponding <a href="https://en.wikipedia.org/wiki/Merkle_tree">tree</a> that witnesses the state of the chain at that
block.</p>
<p>Each leaf in the tree contains some data (e.g., the balance of a particular
wallet) which has a unique hash. Together these leaf hashes uniquely determine
the hashes of their parent nodes all the way up to the root hash of the tree.
In the other direction – moving down the tree from the root – these hashes form
<em>addresses</em> that allow each node to later be recovered from disk. In the Octez
node, the context is implemented using <a href="https://irmin.org">Irmin</a>, an open-source OCaml
library that solves exactly this problem: storing trees of data in which each
node is addressed by its hash.</p>
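<p>The way leaf hashes determine every hash above them can be sketched in a few lines of Python. This is only an illustration: the use of SHA-256, the <code>leaf:</code>/<code>node:</code> prefixes, and the function names are assumptions made for the example, not Irmin's actual hash function or encoding.</p>
<pre><code class="language-python">import hashlib

def leaf_hash(data: bytes) -> str:
    # Hash of a leaf's contents (SHA-256 is an assumption for this sketch).
    return hashlib.sha256(b"leaf:" + data).hexdigest()

def node_hash(children: dict) -> str:
    # Hash of an inner node, derived only from its children's names and hashes.
    h = hashlib.sha256(b"node:")
    for name in sorted(children):
        h.update(name.encode() + b"=" + children[name].encode() + b";")
    return h.hexdigest()

def tree_hash(tree) -> str:
    # A tree is either bytes (a leaf) or a dict mapping names to subtrees.
    if isinstance(tree, bytes):
        return leaf_hash(tree)
    return node_hash({name: tree_hash(child) for name, child in tree.items()})

# Two hypothetical chain states that differ in a single balance.
before = {"alice": {"balance": b"10"}, "bob": {"balance": b"5"}}
after = {"alice": {"balance": b"11"}, "bob": {"balance": b"5"}}

root_changed = tree_hash(before) != tree_hash(after)
bob_unchanged = tree_hash(before["bob"]) == tree_hash(after["bob"])
</code></pre>
<p>Changing one leaf changes the hash of every node on the path up to the root, while untouched subtrees (here, <code>bob</code>) keep their hash and can therefore be shared between blocks.</p>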
<p>As with any database, a crucial aspect of Irmin's implementation is its
<a href="https://en.wikipedia.org/wiki/Database_index">index</a>, the component that maps addresses to data locations
(in this case, mapping hashes to offsets within a large append-only data file).
Indexing each object in the store by hash has some important advantages: for
instance, it ensures that the database is totally
<a href="https://en.wikipedia.org/wiki/Data_deduplication">deduplicated</a> and enables fast random access to any
object in the store, regardless of position in the tree.</p>
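<p>As a rough sketch of this idea, the following Python class keeps an in-memory dictionary from hashes to offsets in an append-only buffer. The class name, the SHA-256 hash, and the in-memory buffer are all assumptions made for illustration; the real index and data file are on-disk structures.</p>
<pre><code class="language-python">import hashlib

class IndexedStore:
    """Toy content-addressed store: every object is indexed by its hash."""

    def __init__(self):
        self.data = bytearray()  # stands in for the append-only data file
        self.index = {}          # hash -> (offset, length)

    def add(self, obj: bytes) -> str:
        key = hashlib.sha256(obj).hexdigest()
        if key not in self.index:
            # Only unseen objects are appended, so the store is deduplicated.
            self.index[key] = (len(self.data), len(obj))
            self.data.extend(obj)
        return key

    def find(self, key: str) -> bytes:
        # Every read goes through the index before touching the data.
        offset, length = self.index[key]
        return bytes(self.data[offset:offset + length])

store = IndexedStore()
first = store.add(b"balance=10")
second = store.add(b"balance=10")  # same content: nothing new is written
other = store.add(b"balance=5")
</code></pre>
<p>The sketch also shows the costs: each write adds an index entry and each read needs an index lookup, so the index grows with the number of objects in the store.</p>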
<p>As discussed in <a href="/blog/2020-09-01-introducing-irmin-pack/">our <code>irmin-pack</code> post</a>, the context index was
optimised for very fast reads at the cost of needing to perform an expensive
maintenance operation at regular intervals. This design was very effective in
the early months of the Tezos chain, but our <a href="/blog/2021-10-04-the-new-replaying-benchmark-in-irmin/">recent work on benchmarking the
storage layer</a> revealed two problems with it:</p>
<ul>
<li>
<p><strong>content-addressing bottlenecks transaction throughput</strong>. Using hashes as
object addresses adds overhead to both reads and writes: each read requires
consulting the index, and each write requires adding a new entry to it. At
the current block rate and block size in Tezos Mainnet, these overheads are
not a limiting factor, but this will change as the protocol and shell become
faster. Our overall goal is to support a future network throughput of <strong>1000
transactions per second</strong>, and doing this required rethinking our reliance on
the index.</p>
</li>
<li>
<p><strong>maintaining a large index impacts the stability of the node</strong>. The larger
the index becomes, the longer it takes to perform regular maintenance
operations on it. For sufficiently large contexts (i.e., on archive nodes),
the store may be unable to perform this maintenance quickly enough, leading
to long pauses as the node waits for service from the storage layer.
In the context of Tezos, this can lead to users occasionally exceeding the
maximum time allowed for baking or endorsing a block, losing out on the
associated rewards.</p>
</li>
</ul>
<p>Over the last few months, the storage team at Tarides has been hard at work
addressing these issues by switching to a <em>minimal indexing</em> strategy in the
context. This feature is now ready to ship, and we are delighted to present the
results!</p>
<h3>Consistently fast transactions: surpassing the 1000 TPS threshold</h3>
<p>The latest release of Irmin ships with a <a href="https://github.com/mirage/irmin/pull/1510">new core feature</a>
that enables object addresses that are not hashes. This feature unlocks many
future optimisations for the Octez context, including things like automatic
inlining and layered storage. Crucially, it has allowed us to <a href="https://github.com/mirage/irmin/pull/1659">switch to using
direct pointers</a> between internal objects in the Octez
context, eliminating the need to index such objects entirely! This has two
immediate benefits:</p>
<ul>
<li>
<p><strong>read operations no longer need to search the index</strong>, improving the overall
speed of the storage considerably;</p>
</li>
<li>
<p><strong>the index can be shrunk by a factor of 360</strong> (from 21 GB to 59 MB in our tests!). We now only need to index
<em>commit</em> objects in order to be able to recover the root tree for a given
block at runtime. This "minimal" indexing strategy results in indices that
fit comfortably in memory and don't need costly maintenance. As of Octez 13,
<a href="https://gitlab.com/tezos/tezos/-/merge_requests/4714">minimal indexing is now the default</a> node
behaviour<sup><a href="#fn-2" id="ref-1-fn-2" role="doc-noteref" class="fn-label">[2]</a></sup>.</p>
</li>
</ul>
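<p>To make the contrast with full indexing concrete, here is a minimal Python sketch of a store in which internal objects refer to each other by offset and only commits are indexed. Everything in it is hypothetical (the class, the list used as an append-only log, hashing the commit with SHA-256 over its JSON form); it illustrates the strategy and is not a description of the Irmin implementation.</p>
<pre><code class="language-python">import hashlib
import json

class MinimalStore:
    """Toy store: direct pointers between objects, index for commits only."""

    def __init__(self):
        self.objects = []       # append-only log; list position plays the offset
        self.commit_index = {}  # commit hash -> offset of the commit object

    def append(self, obj) -> int:
        # No hash lookup and no index entry: the object is simply appended.
        self.objects.append(obj)
        return len(self.objects) - 1

    def commit(self, root_offset: int, message: str) -> str:
        commit = {"root": root_offset, "message": message}
        # Simplification: a real commit hash would depend on the root hash.
        encoded = json.dumps(commit, sort_keys=True).encode()
        key = hashlib.sha256(encoded).hexdigest()
        self.commit_index[key] = self.append(commit)
        return key

    def read(self, commit_hash: str, path: list):
        commit = self.objects[self.commit_index[commit_hash]]  # only index lookup
        obj = self.objects[commit["root"]]
        for step in path:
            obj = self.objects[obj[step]]  # follow a direct pointer
        return obj

store = MinimalStore()
balance = store.append("10")
alice = store.append({"balance": balance})
root = store.append({"alice": alice})
block = store.commit(root, "block 1")
value = store.read(block, ["alice", "balance"])
</code></pre>
<p>Here four objects are stored but the index holds a single entry. The price is that appending the same content twice yields two copies, since nothing checks whether it is already present.</p>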
<p>So what is the performance impact of this change? As detailed in our <a href="/blog/2021-10-04-the-new-replaying-benchmark-in-irmin/">recent
post on replay benchmarking</a>, we were able to isolate and
measure the consequences of this change by "replaying" a previously-recorded
trace of chain activity against the newly-improved storage layer. This process
simulates a node that is bottlenecked purely by the storage layer, allowing us
to assess its limits independently of the other components of the shell.</p>
<p>For these benchmarks, we used a replay trace containing the first 150,000
blocks of the Hangzhou Protocol deployment on Tezos Mainnet (corresponding to
the period December 2021 – January 2022)<sup><a href="#fn-3" id="ref-1-fn-3" role="doc-noteref" class="fn-label">[3]</a></sup>.</p>
<p>One of the most important metrics collected by our benchmarks is overall
throughput, measured in <em>transactions</em> processed per second (TPS). In this
context, a "transaction" is an individual state transition within a particular
block (e.g., a balance transfer or a smart contract activation). We
queried the <a href="https://tzstats.com/docs/api#tezos-api">TzStats API</a> to determine the number
of transactions in each block and, from that, our measured transaction throughput.
As shown in the graph above, doing this for the last few releases of Octez
reveals that storage TPS has skyrocketed from ~200 in Octez 12 to more than
1000 in Octez 13! 🚀</p>
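<p>As a sketch of the arithmetic, throughput over a replay can be computed as the total number of transactions divided by the total time spent in the storage layer. The function name and the sample figures below are invented for illustration and are not taken from our benchmark data.</p>
<pre><code class="language-python">def transactions_per_second(tx_counts, block_seconds):
    # tx_counts[i]: transactions in block i (e.g. obtained from a block explorer)
    # block_seconds[i]: time the storage layer spent replaying block i
    if len(tx_counts) != len(block_seconds):
        raise ValueError("expected one timing per block")
    total_time = sum(block_seconds)
    if total_time == 0:
        raise ValueError("total replay time must be positive")
    return sum(tx_counts) / total_time

# Hypothetical sample: three blocks and their replay times in seconds.
tx_counts = [120, 95, 143]
block_seconds = [0.12, 0.10, 0.14]
tps = transactions_per_second(tx_counts, block_seconds)
</code></pre>
<p>Dividing totals, rather than averaging per-block rates, keeps a few very small or very fast blocks from skewing the result.</p>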
<p>As a direct consequence, the total time necessary to replay our Hangzhou trace
on the storage layer has decreased from ~1 day to ~4 hours. We're nearly 6
times faster than before!</p>
<p><img sizes="(min-width: 1360px) 1360px, (min-width: 680px) 680px, 100vw" srcset="/blog/images/2022-04-08.tezos-storage-surpasses-1000-tps/cpu_time-170w~y4n7PtxLYPVqRJ7ocrmtDA.webp 170w, /blog/images/2022-04-08.tezos-storage-surpasses-1000-tps/cpu_time-340w~J569bJMq4WVWqcdWop9Kpw.webp 340w, /blog/images/2022-04-08.tezos-storage-surpasses-1000-tps/cpu_time-680w~QU2_8dEP3LRDzWlHV6d84g.webp 680w, /blog/images/2022-04-08.tezos-storage-surpasses-1000-tps/cpu_time-1360w~Z6N8xDo1qt_ketNxindzgw.webp 1360w" src="/blog/images/2022-04-08.tezos-storage-surpasses-1000-tps/cpu_time-1360w~Z6N8xDo1qt_ketNxindzgw.webp" alt="Bar chart of CPU time elapsed during replay for various Irmin configurations"></p>
<blockquote>
<p>Comparison of CPU time elapsed between Octez 10, 11, 12, and 13 while replaying the first 150k
blocks of the Hangzhou Protocol on Tezos Mainnet<sup><a href="#fn-1" id="ref-2-fn-1" role="doc-noteref" class="fn-label">[1]</a></sup>. While Octez 10 took about 1 day to complete
the replay, Octez 13 takes only about 4 hours, making it nearly <strong>6 times faster</strong> than
before!</p>
</blockquote>
<p>Overall throughput is not the only important metric, however. It's also
important that the <em>variance</em> of storage performance is kept to a minimum, to
ensure that unrelated tasks such as endorsement can be completed promptly. To
see the impact of this, we can inspect how the total block time varies
throughout the replay:</p>
<p><img sizes="(min-width: 1360px) 1360px, (min-width: 680px) 680px, 100vw" srcset="/blog/images/2022-04-08.tezos-storage-surpasses-1000-tps/block_time-170w~s-qeK7ZQQFDDJDgn61q1bA.webp 170w, /blog/images/2022-04-08.tezos-storage-surpasses-1000-tps/block_time-340w~fmZgp1FiThaRLdPDpy52mw.webp 340w, /blog/images/2022-04-08.tezos-storage-surpasses-1000-tps/block_time-680w~Tn9xwOgFdGdHOyXIloZxKA.webp 680w, /blog/images/2022-04-08.tezos-storage-surpasses-1000-tps/block_time-1360w~jYed291kb8TJ0BnuK9c5ww.webp 1360w" src="/blog/images/2022-04-08.tezos-storage-surpasses-1000-tps/block_time-1360w~jYed291kb8TJ0BnuK9c5ww.webp" alt="Line graph of block time during replay for various Irmin configurations"></p>
<blockquote>
<p>Comparison of block time latencies between Octez 10, 11, 12, and 13 while replaying the first 150k
blocks of the Hangzhou Protocol on Tezos Mainnet<sup><a href="#fn-1" id="ref-3-fn-1" role="doc-noteref" class="fn-label">[1]</a></sup>. Octez 13's mean block validation time is
23.2 ± 2.0 milliseconds, down from 274 ± 183 milliseconds in Octez 10
(with a worst-case peak of 800 milliseconds!). This <strong>12x improvement in
mean operation latency</strong> leads to much more consistent endorsement rights for bakers.</p>
</blockquote>
<p>Another performance metric that has a big impact on node maintainers is the
<em>maximum memory usage</em> of the node, since this sets a lower bound on the
hardware that can run Octez. Tezos prides itself on being deployable to very
resource-constrained hardware (such as the Raspberry Pi), so this continues
to be a focus for us. Thanks to the reduced index size, Octez 13 greatly
reduces the memory requirements of the storage layer:</p>
<p><img sizes="(min-width: 1360px) 1360px, (min-width: 680px) 680px, 100vw" srcset="/blog/images/2022-04-08.tezos-storage-surpasses-1000-tps/memory_usage-170w~5VUPlOJg_Acu3_U-4xso1A.webp 170w, /blog/images/2022-04-08.tezos-storage-surpasses-1000-tps/memory_usage-340w~6UHMF_N_cxa2dvMofCbezw.webp 340w, /blog/images/2022-04-08.tezos-storage-surpasses-1000-tps/memory_usage-680w~67hNUgz1OmLwy4EvdpsF3A.webp 680w, /blog/images/2022-04-08.tezos-storage-surpasses-1000-tps/memory_usage-1360w~qR1K2baeG3bT2He_vlft3Q.webp 1360w" src="/blog/images/2022-04-08.tezos-storage-surpasses-1000-tps/memory_usage-1360w~qR1K2baeG3bT2He_vlft3Q.webp" alt="Bar chart of maximal memory usage during replay for various Irmin configurations"></p>
<blockquote>
<p>Comparison of maximal memory usage (as reported by <code>getrusage(2)</code>) between
Octez 10, 11, 12, and 13 while replaying the first 150k blocks of the Hangzhou
Protocol on Tezos Mainnet<sup><a href="#fn-1" id="ref-4-fn-1" role="doc-noteref" class="fn-label">[1]</a></sup>. <strong>Peak memory usage is 5x lower</strong>
in the Octez 13 storage layer than in Octez 10, owing to the
significantly reduced size of the index. 400 MB of RAM is now enough
to bootstrap Octez 13!</p>
</blockquote>
<p>Finally, without a full index, the context store can no longer guarantee perfect object deduplication. Our tests and benchmarks show that this choice has relatively little impact on the context size as a whole, particularly since it no longer needs to store an index entry for every object!</p>
<p><img sizes="(min-width: 1360px) 1360px, (min-width: 680px) 680px, 100vw" srcset="/blog/images/2022-04-08.tezos-storage-surpasses-1000-tps/storage_size-170w~Q96M8b3cPZlMxxSu825MnQ.webp 170w, /blog/images/2022-04-08.tezos-storage-surpasses-1000-tps/storage_size-340w~qDGirPwU90f5zYRRJxaZgQ.webp 340w, /blog/images/2022-04-08.tezos-storage-surpasses-1000-tps/storage_size-680w~hNoNZv5Sx7LUyT7fsnCc4Q.webp 680w, /blog/images/2022-04-08.tezos-storage-surpasses-1000-tps/storage_size-1360w~6jfOSleU51VMpJkIMP-QGg.webp 1360w" src="/blog/images/2022-04-08.tezos-storage-surpasses-1000-tps/storage_size-1360w~6jfOSleU51VMpJkIMP-QGg.webp" alt="Line graph of storage size during replay for various Irmin configurations"></p>
<blockquote>
<p>Comparison of storage size between Octez 10, 11, 12, and 13 while replaying the first 150k
blocks of the Hangzhou Protocol on Tezos Mainnet<sup><a href="#fn-1" id="ref-5-fn-1" role="doc-noteref" class="fn-label">[1]</a></sup>. Octez 13 uses similar disk resources
to previous versions: the duplicated data is fully offset by the
reduced index size.</p>
</blockquote>
<p>What this means for users of the Octez shell:</p>
<ul>
<li><strong>The general I/O performance of the storage layer is vastly improved</strong>, as
storage operations are 6 times faster and have a 12 times lower mean
latency, while memory usage is divided by 5.</li>
<li>In particular, this mode <strong>eliminates the risk of losing baking rewards</strong>
due to long index merges.</li>
</ul>
<h3>Migrating your Octez node to use the newer storage</h3>
<p>Irmin 3 is included with <a href="https://tezos.gitlab.io/releases/version-13.html">Octez v13-rc1</a>, which was released today.
The storage format is <strong>fully
backwards-compatible</strong> with Octez 12, and no migration process is required to
upgrade.</p>
<p>Newly-written data after the shell upgrade will automatically benefit from the
new, direct internal pointers, and existing data will continue being read as
before. Performing a bootstrap (or importing a snapshot) with Octez 13 will
build a context containing only direct pointers. Node operators should upgrade
as soon as possible to benefit.</p>
<h3>The future of the Octez storage layer</h3>
<p>Irmin 3 is just the beginning of what the Tarides storage team has in store for
2022. Our next focus is on implementing the next iteration of the <em>layered
store</em>, a garbage collection strategy for rolling nodes. Once this has landed,
we will collaborate with the Tarides Multicore Applications team to help
migrate Octez to using the newly-merged Multicore OCaml.</p>
<p>If this work sounds interesting, the Irmin team at Tarides is <a href="/careers/">currently
hiring</a>!</p>
<p>Thanks for reading, and <a href="https://bsky.app/profile/tarides.com">stay tuned</a> for future updates from
the Irmin team!</p>
<section role="doc-endnotes"><ol>
<li id="fn-1">
<p>Our benchmarks compare Octez 10.2, 11.1, 12.0, and 13.0-rc1 by replaying the first 150k blocks of the Hangzhou Protocol on Tezos Mainnet (corresponding to the period Dec 2021 – Jan 2022) on <a href="https://metal.equinix.com/product/servers/c3-small/">an Intel Xeon E-2278G processor</a> constrained to use at most 8 GB of RAM. Our benchmarking setup explicitly excludes the networking I/O operations and protocol computations to focus on the context I/O operations only. Octez 10.2 uses Irmin 2.7.2, while both Octez 11.1 and 12.0 use Irmin 2.9.1 (which explains why their graphs are similar). Octez v13-rc1 uses Irmin 3.2.1, which we released this month (Apr 2022).</p>
<span><a href="#ref-1-fn-1" role="doc-backlink" class="fn-label">↩︎︎<sup>1</sup></a><a href="#ref-2-fn-1" role="doc-backlink" class="fn-label">↩︎︎<sup>2</sup></a><a href="#ref-3-fn-1" role="doc-backlink" class="fn-label">↩︎︎<sup>3</sup></a><a href="#ref-4-fn-1" role="doc-backlink" class="fn-label">↩︎︎<sup>4</sup></a><a href="#ref-5-fn-1" role="doc-backlink" class="fn-label">↩︎︎<sup>5</sup></a></span></li><li id="fn-2">
<p>The trade-off here is that without a full index the context store can no longer guarantee perfect deduplication, but our testing and benchmarks indicate that this has relatively little impact on the size of the context as a whole (particularly after accounting for no longer needing to store an index entry for every object!).</p>
<span><a href="#ref-1-fn-2" role="doc-backlink" class="fn-label">↩︎︎</a></span></li><li id="fn-3">
<p>To reproduce these benchmarks, you can download the replay trace we used <a href="https://data.tarides.com/lib_context/hangzou-level2.tgz">here</a> (14G). This trace can be replayed against a fork of <code>lib_context</code> available <a href="https://github.com/ngoguey42/tezos/tree/new-action-trace-recording">here</a>.</p>
<span><a href="#ref-1-fn-3" role="doc-backlink" class="fn-label">↩︎︎</a></span></li></ol></section>
]]></description><link>https://tarides.com/blog/2022-04-26-lightning-fast-with-irmin-tezos-storage-is-6x-faster-with-1000-tps-surpassed</link><guid isPermaLink="false">https://tarides.com/blog/2022-04-26-lightning-fast-with-irmin-tezos-storage-is-6x-faster-with-1000-tps-surpassed.html</guid><dc:creator><![CDATA[ Irmin Team ]]></dc:creator><pubDate>Tue, 26 Apr 2022 00:00:00 GMT</pubDate></item><item><title><![CDATA[Tarides Partners with 50inTech!]]></title><description><![CDATA[<h3>50inTech</h3>
<p>Tarides is proud to have been recognised by 50inTech and <a href="https://app.50intech.com/company/tarides">featured on their website</a>! 50inTech’s mission is to achieve a 50% representation of women in tech by 2050.</p>
<p>To this end, 50inTech runs several amazing initiatives that generate opportunities for women looking to have successful careers in tech. Their job board matches talented women with inclusive companies that are hiring, the 50inTech Gender Score helps European companies measure their level of gender-inclusion, and their free virtual bootcamps provide their network of 15,000 women in Europe with crucial networking and mentoring opportunities.</p>
<h3>Partnership</h3>
<p>Tarides has been selected as an “inclusive company,” based on metrics including work-life balance, equal pay, fair career path, and diversity and inclusion policies.  We are incredibly proud to be recognised in this way, and we will continue to invest in programs and initiatives to further increase diversity and inclusion.</p>
<p>Our partnership with 50inTech connects us with a highly-skilled, diverse set of people, and we hope that this collaboration will help us achieve our target of filling 50% of Tarides’s tech roles with women. Currently that number is 20%, and we’d like to increase it!</p>
<p>Read more about our diversity and inclusion goals, and <a href="https://app.50intech.com/company/tarides?page=jobs">see our open positions on 50inTech’s website</a>.</p>
<h3>Previous Efforts</h3>
<p>As Tarides has grown, we have made continuous efforts towards making OCaml and our place within its community more inclusive and diverse. As our CEO Gemma Gordon says, “Different opinions, experiences, backgrounds, and strategies are essential for innovation and vital to the success of Tarides.”</p>
<p>In this vein, we have supported initiatives such as: <em><a href="https://www.outreachy.org/">Outreachy</a></em>, where we sponsor three paid remote internships per quarter for people experiencing systemic bias and underrepresentation in the tech industry; <em><a href="https://adatechschool.fr/">Ada Tech School</a></em>, which is a programming school that facilitates greater access to programming positions and promotes the feminisation of tech; and <em><a href="https://shecancode.io/">SheCanCode</a></em>, whose mission is to close the tech gender gap. All of these enterprises help women enter, remain, and excel in the tech industry.</p>
<p>Tarides remains committed to inclusivity and continuously looks for ways to reach out to new groups, gain new perspectives, and diversify our workforce. Read more about our mission to support women in tech on <a href="https://app.50intech.com/company/tarides?page=diversity">50inTech’s website</a>.</p>
]]></description><link>https://tarides.com/blog/2022-04-19-tarides-partners-with-50intech</link><guid isPermaLink="false">https://tarides.com/blog/2022-04-19-tarides-partners-with-50intech.html</guid><dc:creator><![CDATA[ Héloïse Berra Lutton, Isabella Leandersson ]]></dc:creator><pubDate>Tue, 19 Apr 2022 00:00:00 GMT</pubDate></item><item><title><![CDATA[What's New in MirageOS 4!]]></title><description><![CDATA[<h2>MirageOS 4.0 Release Week</h2>
<p>Tarides is thrilled to see the great responses to <a href="https://mirage.io/blog/announcing-mirage-40">MirageOS
4.0</a> and the excitement
that’s building across the community. We’re proud to have played an
important part in its development and release, bringing great tools
and opportunities to OCaml developers. If you haven’t kept up with
what’s been going on since the release, here is a summary of several
articles posted by various OCaml users.</p>
<h3>Cross-Compilation</h3>
<p>The MirageOS 4.0 update brings with it a major change in its build
system to support <a href="https://dune.build/">the Dune build system</a>.
Tarides has been working on this feature since 2019,
iterating on various design solutions in the <code>mirage</code> tool with
<a href="https://github.com/mirage/mirage/issues/969">mirage/mirage#969</a>,
<a href="https://github.com/mirage/mirage/pull/979">mirage/mirage#979</a>,
<a href="https://github.com/mirage/mirage/pull/1020">mirage/mirage#1020</a>,
<a href="https://github.com/mirage/mirage/pull/1024">mirage/mirage#1024</a>,
<a href="https://github.com/mirage/mirage/pull/1153">mirage/mirage#1153</a>, and
finally <a href="https://github.com/mirage/mirage/pull/1226">mirage/mirage#1226</a>.
This incremental process resulted in making several contributions to
upstream OCaml for features and tools required to support
the flexible building of MirageOS libraries: for
instance, adding support for <a href="https://dune.readthedocs.io/en/stable/variants.html">virtual libraries and
variants</a> in Dune
with <a href="https://github.com/ocaml/dune/pull/1900">ocaml/dune#1900</a>,
<a href="https://github.com/ocaml/dune/pull/2098">ocaml/dune#2098</a>, and
<a href="https://github.com/ocaml/dune/pull/2169">ocaml/dune#2169</a>; or the
development of a new opam plugin to manage
<a href="https://github.com/ocamllabs/opam-monorepo">mono-repositories</a>. We
are happy to see it released to all with Mirage 4.0.</p>
<p>What makes Dune a great option to build MirageOS is that it allows for
customisable cross-compilation flags to compile MirageOS to different
architectures. Using Dune also enables developers to use the Merlin
tool to access a rich set of IDE features when writing
applications. It unlocks a new development workflow based on
<code>opam-monorepo</code>, which downloads all the unikernel dependencies into a
single Dune workspace. Having a single workspace containing all of the
unikernel’s code lets developers edit code anywhere in the stack,
which makes work like debugging libraries and improving APIs a faster
and more enjoyable experience. In his <a href="https://mirage.io/blog/2022-03-30.cross-compilation">excellent article on build
contexts in MirageOS
4.0</a>, Lucas
Pluvinage goes into detail about how to use the new cross-compilation
features to build MirageOS unikernels for new architectures.</p>
<h3>Email in OCaml &amp; Mr. MIME</h3>
<p>Mr. MIME is an OCaml library that aims to give its users peace of mind
when it comes to the security of their email communications. Mr. MIME
is built on unikernels and deploys them to handle email traffic. At
Tarides, we got a grant from <a href="https://dapsi.ngi.eu/">NGI DAPSI</a> to
work on this project, and several of our engineers have been busy
working hard to make it happen.</p>
<p>Several other libraries support the Mr. MIME library and enable it to
transform an email into an OCaml value, then create an email from it
again. An amazing thing about Mr. MIME is its reliability. Using the
<a href="https://github.com/mirage/hamlet"><code>hamlet</code></a> tool, which provides a
large corpus of emails for Mr. MIME to parse and re-encode, the team
can prove that Mr. MIME doesn’t alter anything in the message between
the parser and the encoder.</p>
<p>The team behind Mr. MIME has also created the library
<em><a href="https://github.com/mirage/colombe">Colombe</a></em> that implements the
foundations of the SMTP protocol, with the ability to upgrade its flow
to TLS, giving its users an extra layer of security. A goal for the
future is to provide a full SMTP stack that’s able to send and receive
emails.</p>
<p>Mr. MIME also allows its users to manipulate emails through the use of
CLI tools, including
<a href="https://github.com/mirage/ocaml-dkim"><code>ocaml-dkim</code></a>, a tool to verify
and sign an email, and
<a href="https://github.com/mirage/spamtacus"><code>spamtacus</code></a>, a tool which
analyses the incoming email to determine if it’s spam or not. The
<a href="https://github.com/mirage/ptt">ptt repo</a> contains several more as well.</p>
<p>If you want to find out more information about Mr. MIME, including
details about its architecture, please read Romain Calascibetta’s
<a href="https://mirage.io/blog/2022-04-01-Mr-MIME">article</a>.</p>
<h2>MirageOS in Production</h2>
<p>MirageOS benefits not only Tarides; it also enables
several other companies to make their products better. Below are a
couple of examples from <a href="https://docker.com">Docker</a> and
<a href="https://robur.coop">Robur</a> on how they use MirageOS to their
advantage.</p>
<h3>VPN Kit</h3>
<p>Docker Desktop is a tool that enables its users to build and share
containerised or isolated applications in either a Mac or Windows
environment. Its main challenge is that running Docker on macOS or
Windows is difficult in terms of compatibility, as Linux primitives
are unavailable on those platforms.</p>
<p>This is where VPN Kit comes in; it uses MirageOS to bridge the gap
between Linux primitives and macOS or Windows by reading the raw
ethernet frames coming out of the Linux VM and translating them into
macOS or Windows high-level syscalls. In this way, MirageOS networking
libraries transparently handle the traffic of millions of containers
every day.</p>
<p>To find out more, read the article “How MirageOS Powers Docker
Desktop” <a href="https://mirage.io/blog/2022-04-06.vpnkit">on mirage.io</a>
or
<a href="https://www.docker.com/blog/how-docker-desktop-networking-works-under-the-hood/">on docker.com</a>.</p>
<h3>Robur Projects</h3>
<p>Robur uses MirageOS for several of their projects, including OpenVPN,
DNS Projects, and CalDAV. All of these projects are written in OCaml
and are deployed as MirageOS unikernels.</p>
<p>The DNS Projects include the ‘Let’s Encrypt’-Certified DNS solver, a
DNS resolver, and an authoritative DNS server. Robur’s DNS server
ensures that the internet user gets to the right IP address, whilst
its DNS resolver finds the exact server to handle the user’s
request. Only strictly necessary elements are included in order to
keep the codebase as small as possible for security and
simplicity.</p>
<p>CalDAV is the most recent unikernel released by Robur. As the name
implies, CalDAV is a protocol used to synchronise calendars.
Its minimal codebase comes with significant security benefits.</p>
<p>To find out more, read the article “MirageOS Unikernels at Robur” on
<a href="https://mirage.io/blog/2022-04-08.robur">mirage.io</a>.</p>
<hr>
<p>To learn more about MirageOS, take a look at some recent articles at
<a href="https://mirage.io">mirage.io</a>.
If you’re interested in working with Tarides or
incorporating MirageOS tools in your project, please <a href="/contact/">contact us via
our website</a>.</p>
]]></description><link>https://tarides.com/blog/2022-04-14-what-s-new-in-mirageos-4</link><guid isPermaLink="false">https://tarides.com/blog/2022-04-14-what-s-new-in-mirageos-4.html</guid><dc:creator><![CDATA[ Thomas Gazagnaire ]]></dc:creator><pubDate>Thu, 14 Apr 2022 00:00:00 GMT</pubDate></item><item><title><![CDATA[MirageOS 4 Released!]]></title><description><![CDATA[<p>Tarides is delighted to announce that <a href="https://mirage.io">MirageOS 4</a> is finally released! As core contributors to the project, we are proud to have been part of the journey to 4.0.</p>
<h2>What is MirageOS?</h2>
<p>MirageOS is a library operating system that constructs unikernels for fast and secure network applications that work across a variety of cloud computing and mobile platforms. The goal of MirageOS is to give the individual control of their own data and take back control of their privacy.</p>
<p>It achieves these goals in several ways, from securely deploying <a href="https://github.com/roburio/unipi">static website hosting</a> with <em>Let’s Encrypt</em> certificate provisioning and a secure <a href="https://github.com/mirage/ptt">SMTP stack</a>, to ensuring data privacy with decentralised communication infrastructures like <a href="https://github.com/mirage/ocaml-matrix">Matrix</a>, <a href="https://github.com/roburio/openvpn">OpenVPN Servers</a>, and <a href="https://github.com/roburio/tlstunnel">TLS tunnels</a>, as well as using <a href="https://github.com/mirage/ocaml-dns">DNS(SEC) Servers</a> for better authentication.</p>
<p>Over the years since its first release in 2013, the Mirage ecosystem has grown to include <a href="https://github.com/mirage/">hundreds of libraries</a> and serve millions of daily users, along with several major commercial users that rely on MirageOS to keep their code secure. Examples of this include <a href="https://www.docker.com/blog/how-docker-desktop-networking-works-under-the-hood/">Docker Desktop’s VPNkit</a>, the <a href="https://www.citrix.com/fr-fr/products/citrix-hypervisor/">Citrix Hypervisor</a>, as well as <a href="https://robur.io">Robur</a>, <a href="https://www.nitrokey.com/products/nethsm">Nitrokey</a>, and Tarides itself!</p>
<h2>What’s in the New Release?</h2>
<p>The new release focuses on better integration with existing ecosystems. For example, it is now much easier to integrate with existing OCaml libraries, as MirageOS 4 is now using <code>dune</code> to build unikernels.</p>
<p>There has also been a major change in how MirageOS compiles projects with the introduction of a new tool called <a href="https://github.com/ocamllabs/opam-monorepo"><code>opam-monorepo</code></a> that separates package management from building the resulting source code. This opam plugin can create a lock file for project dependencies, download and extract dependency sources locally, and even set up a <a href="https://dune.readthedocs.io/en/stable/dune-files.html#dune-workspace-1">Dune workspace</a>, which then enables <code>dune build</code> to build everything simultaneously.</p>
<p>The new release also adds systematic support for cross-compilation to all supported unikernel targets, meaning that libraries that use C stubs can now have those stubs seamlessly cross-compiled to a desired target.</p>
<p>To find out more about the new release please read <a href="https://mirage.io/blog/announcing-mirage-40">the official release post on Mirage.io</a>.</p>
<p>Keep an eye on <a href="https://mirage.io">mirage.io</a>'s blog over the next two weeks for more posts on the exciting new things that come with MirageOS 4.0, starting with “Introduction to Build Contexts in MirageOS 4.0” tomorrow!</p>
]]></description><link>https://tarides.com/blog/2022-03-29-mirageos-4-released</link><guid isPermaLink="false">https://tarides.com/blog/2022-03-29-mirageos-4-released.html</guid><dc:creator><![CDATA[ Isabella Leandersson ]]></dc:creator><pubDate>Tue, 29 Mar 2022 00:00:00 GMT</pubDate></item><item><title><![CDATA[Secure Virtual Messages in a Bottle with SCoP]]></title><description><![CDATA[<p>People love to receive mail, especially from loved ones. It’s heartwarming to read
each word as their thoughts touch our deepest feelings. Now imagine someone else
reading those private sentiments, like a postal worker. Imagine how violated they’d
feel if their postal carrier handed them an open letter with a knowing smile. Of course,
people trust that postal employees won’t read their personal correspondence;
however, they regularly risk their privacy when sending emails, images, and messages.</p>
<p>Around 300 billion emails traverse the Internet every single day. They travel
through portals with questionable security, and the messages often contain
private or sensitive data. Most online communication services are composed of
multiple components with complex interactions. If anything goes wrong, it
results in critical security incidents. This leaves an unlocked door for
malicious hackers to breach private information for profit or just for fun.
Since it takes considerable technical skills and reliable infrastructure to
operate a secure email service, most Internet users must
rely on third-party operators. In practice, there
are only a few large companies that can handle communications with the
proper security levels. Unfortunately for regular people, these companies
profit from mining their personal data. Due to this global challenge, Tarides
focused their efforts to address these issues and find solutions to protect
both personal and professional data.</p>
<h3>An Innovative Solution</h3>
<p>Our work resulted in the project "Secure-by-Design Communications Protocols"
(SCoP), a secure, easily deployable solution to preserve users' privacy. In
essence, SCoP puts your messages in a secure, virtual ‘bottle’ to protect them
from invasive actions. This bottle represents a secure architecture using
type-safe languages and unikernels for both email and instant messaging.
We mould <a href="https://mirage.io/">unikernels</a> (specialised applications that
run on a VM) into refined meshes linked by TLS-secured communication pipes,
as depicted in the image below.</p>
<p><img sizes="(min-width: 1360px) 1360px, (min-width: 680px) 680px, 100vw" srcset="/blog/images/Dapsi_4.001-170w~CBT5vubzh4xzS5nnH0wlow.webp 170w, /blog/images/Dapsi_4.001-340w~8yDedLkPgy-bBIvsf2ytIA.webp 340w, /blog/images/Dapsi_4.001-680w~07eYmjtHf1YFqKRCbcyshA.webp 680w, /blog/images/Dapsi_4.001-1360w~BxqQUsUYNOKgQ9XvR6S3DQ.webp 1360w" src="/blog/images/Dapsi_4.001-1360w~BxqQUsUYNOKgQ9XvR6S3DQ.webp" alt="TLS Communication Pipes"></p>
<p>The SCoP virtual bottle creates a trustworthy information flow where dedicated
unikernels ensure security for communication from origin to destination. We
carefully design every component of SCoP as an independent library, using
modern development techniques to avoid commonly reported threats and flaws.
The <a href="https://ocaml.org">OCaml</a>-based development enables this safe online
environment, which eliminates many exploited security pitfalls. Moreover,
SCoP is energy-efficient thanks to its lightweight, low-latency
components.</p>
<p>We mostly focused on the sender’s side, securing the message inside the SCoP
bottle. For instant messages, we created a capsule with a
<a href="https://github.com/mirage/ocaml-matrix">Matrix client library</a>,
and for emails we based our bottle on the <a href="https://github.com/mirage/ptt">SMTP protocol</a>
and <a href="https://github.com/mirage/mrmime">Mr. MIME</a>. For further protection,
we developed the bottle’s ‘cork’ with the
<a href="https://github.com/mirage/hamlet">Hamlet email corpus</a>.</p>
<h3>The SCoP Processes</h3>
<p>First, we generated Hamlet, a collection of emails to test our parser
implementation against existing projects, to ensure that they kept equivalence
between the encoder and decoder. After we successfully parsed and encoded one
million emails, we used Hamlet to stress-test our SMTP stack.</p>
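<p>The round-trip property that such a corpus exercises can be sketched in a few lines of OCaml. This is a toy illustration only: the <code>header</code> type and the <code>encode</code>, <code>decode</code>, and <code>round_trips</code> functions are hypothetical names that handle a single header line; they are not the Mr. MIME API, and a real corpus check would compare whole messages.</p>
<pre><code class="language-ocaml">(* Hypothetical, minimal model of one email header line. *)
type header = { name : string; value : string }

(* Encoder: always writes "Name: value" with one space after the colon. *)
let encode h = h.name ^ ": " ^ h.value

(* Decoder: splits on the first colon and trims the value. *)
let decode line =
  match String.index_opt line ':' with
  | None -> None
  | Some i ->
    let name = String.sub line 0 i in
    let rest = String.sub line (i + 1) (String.length line - i - 1) in
    Some { name; value = String.trim rest }

(* A line round-trips when decoding then re-encoding gives it back unchanged. *)
let round_trips line =
  match decode line with
  | None -> false
  | Some h -> String.equal (encode h) line

(* Check every line of a corpus, as a corpus-driven test would. *)
let all_round_trip corpus = List.for_all round_trips corpus
</code></pre>
<p>With this toy codec, <code>round_trips "Subject: hello"</code> holds, whereas <code>"Subject:hello"</code> fails because the encoder inserts a space the original did not have. Silent normalisations of this kind are what a large corpus is useful for surfacing.</p>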
<p>Secondly, we created an SMTP extension mechanism and support for SPF, including
an implementation for DMARC, a security framework in addition to DKIM and SPF.
We completed four components: SPF, DKIM, SMTP, and Mr. MIME, which can generate
a correctly-signed email, signatures, and the DKIM field containing the signatures.</p>
<p>In essence, we designed the SMTP sender bottle with a mesh of unikernels connected
via secured communication pipes. The SMTP Submission Server unikernel checks
the sender’s authentication credentials against the secured database maintained
by Irmin. After it confirms the credentials, it sends the email for sealing
(via a TLS pipe) to the DKIM signer. Then the DKIM signer unikernel, responsible
for handling IP addresses, communicates via the nsupdate protocol with the Primary
DNS Server. The DKIM signer places the sender’s and receiver’s addresses on the email,
seals it with the DKIM signature, and sends it to the SMTP relay for distribution.
The SMTP relay unikernel communicates with the DNS resolver unikernel to locate the
receiver by the DNS name, then it coordinates this location with the Irmin database
to verify the authorization according to the SPF protocol. After all these checks
have passed, the signed and sealed email is secured in the SCoP bottle and launched
through Cyberspace.</p>
<p>Next, we developed the Matrix protocol’s client library, and we used it to enable
notifications from the CI system, testing all the new OCaml packages. We also
designed an initial PoC for a Matrix server-side daemon.</p>
<p>We made significant progress in deploying DNSSEC, a set of security extensions over
DNS. While we completed our first investigation into the DNSSEC prototype, we also
discovered several issues, so we addressed those as lessons learned.</p>
<p>Finally, we completed the <a href="https://github.com/tarides/unikernels">SCoP bottle</a> with
the email receiver, which <a href="https://github.com/mirage/spamtacus">Spamtacus</a> (the
Bayesian spam filter) guards against spam intruders. Furthermore, the
<a href="https://github.com/mirage/ocaml-matrix">OCaml-Matrix</a> server represents
our solution to take care of the instant communication in the Matrix federation.</p>
<h3>A Secure-by-Design SMTP Stack</h3>
<p>We researched state-of-the-art email spam filtering methods and identified machine
learning as the main trend. We followed this path and equipped our email architecture
with a spam-filter unikernel, which uses a Bayesian method for supervised learning
of spam and acts as a proxy for internet communication in the SMTP receiver. This
spam filter works in two stages: preparation, where the unikernel is trained to detect spam,
and operation, where the unikernel integrates into the SMTP receiver unikernel
architecture to filter spam emails. Our spam-filter unikernel can also be used
independently as an individual anti-spam tool to help enforce GDPR rules and protect
the user’s privacy by preventing spam-induced attacks, such as phishing.</p>
<p>We integrated our spam filter into a unikernel positioned at the beginning
of the SMTP receiver stack. This acts as a first line of defence in an eventual
attack targeting the receiver in order to maintain functionality. The spam-filter
unikernel can be extended to act as an antivirus by analysing the email attachment
for certain features known to characterise malware. We’ve already laid the groundwork
for the antivirus with a prototype analysis of the email attachments. Moreover,
the spam-filter unikernel can contribute a list of frequent spammers to the
firewall, which we plan to add into the SMTP receiver as the next step in our development of SCoP.</p>
<h3>How the Technology Works</h3>
<p>DKIM, SPF, and DMARC are three communication protocols meant to ensure email
security by verification of sender identity. The latest RFC standards for
DKIM, SPF, and DMARC are RFC8463, RFC7208, and RFC7489, respectively.</p>
<p>DKIM provides a signer protocol and the associated verifier protocol. The DKIM
signer allows the sender to communicate which emails it considers legitimate.
Our implementation of the DKIM verifier is associated with the SMTP receiver;
it follows the RFC8463 standard and supports the Ed25519 signing algorithm,
i.e., the elliptic curve cryptography generated from the formally verified
specification in the fiat project from MIT.</p>
<p>SPF is an open standard that specifies a method to identify legitimate mail
sources, using DNS records, so the email recipients can consult a list of IP
addresses to verify that emails they receive are from an authorised domain.
Hence, SPF works on an allowlisting principle in order to
control and prevent sender fraud. Our implementation of the SPF verifier
follows the RFC7208 standard.</p>
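<p>As a rough illustration of the check described above, the OCaml sketch below looks up the list of authorised IP addresses for a sender’s domain and tests whether the connecting IP is on it. It is deliberately simplified: <code>spf_records</code> is a hypothetical in-memory table standing in for the DNS lookup, the result type covers only three outcomes, and real SPF records use richer mechanisms (such as address ranges) that are not modelled here.</p>
<pre><code class="language-ocaml">(* Hypothetical stand-in for a DNS query: domain -> authorised sender IPs. *)
let spf_records = [
  "example.org", ["192.0.2.10"; "192.0.2.11"];
]

(* Simplified outcomes; RFC 7208 defines more result codes than these. *)
type spf_result = Pass | Fail | No_record

(* Pass only when the connecting IP is listed for the sender's domain. *)
let check_spf ~sender_domain ~client_ip =
  match List.assoc_opt sender_domain spf_records with
  | None -> No_record
  | Some authorised ->
    if List.mem client_ip authorised then Pass else Fail
</code></pre>
<p>For example, <code>check_spf ~sender_domain:"example.org" ~client_ip:"192.0.2.10"</code> yields <code>Pass</code>, while the same domain with an unlisted address yields <code>Fail</code>.</p>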
<p>DMARC (Domain-based Message Authentication, Reporting, and Conformance) enables
a sender to indicate that their messages comply with SPF and DKIM, and applies
clear instructions for the recipient to follow if an email does not pass SPF or
DKIM authentications (reject, junk, etc.). As such, DMARC is used to create
domain reputation lists, which can help determine the actual email source
and mitigate spoofing attacks. Our implementation of the DMARC verifier is
integrated in the SMTP receiver and follows the RFC7489 standard.</p>
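<p>The decision a DMARC verifier makes can be sketched as a small OCaml function. This is an illustrative model, not our verifier’s code: the type and function names are hypothetical, and it ignores identifier alignment and reporting. It assumes the usual DMARC rule that a message passes when at least one of the SPF or DKIM checks passes, and otherwise applies the policy published by the sender’s domain.</p>
<pre><code class="language-ocaml">(* Outcome of an individual SPF or DKIM check (simplified). *)
type check = Pass | Fail

(* Policy published by the sender's domain: p=none, quarantine or reject. *)
type policy = No_action | Quarantine | Reject

(* What the receiver does with the message. *)
type disposition = Deliver | Junk | Refuse

let dmarc_disposition ~spf ~dkim ~policy =
  match spf, dkim with
  | Pass, _ | _, Pass -> Deliver          (* one passing check is enough *)
  | Fail, Fail ->
    (match policy with
     | No_action -> Deliver               (* monitor only *)
     | Quarantine -> Junk
     | Reject -> Refuse)
</code></pre>
<p>With both checks failing and a <code>Reject</code> policy, the function returns <code>Refuse</code>; if either check passes, the message is delivered regardless of policy.</p>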
<p>Our secure-by-design SMTP stack contains the DKIM/SPF/DMARC verifier unikernel
on the receiver side. This unikernel verifies the email sender’s DNS
characteristics via a TLS communication pipe, and if the DNS verification
passes, the spam-labelled email goes to the SMTP relay to be dispatched to the
email client. However, if the DNS verification doesn’t pass, we can use
the result to construct a DNS reputation list to improve the SMTP security
via a blacklisting firewall.</p>
<h3>Matrix Server</h3>
<p>The Matrix server in our OCaml Matrix implementation manages clients who are
registered to rooms that contain events. These represent client actions, such
as sending a message. Our implementation follows the Matrix specification
standard. From here, we extracted the parts describing the subset of the Matrix
components we chose to implement for our OCaml Matrix server MVP. The OCaml
implementation environment provides secure-by-design properties and avoids
various vulnerabilities, like the buffer overflow recently discovered that
produces considerable information disclosure in other Matrix implementations,
e.g., Element.</p>
<p>The Matrix clients are user applications that connect to a Matrix server via the
client-server API. We implemented an OCaml-CI client, which communicates with the
Matrix servers via the client-server API and tested the integration of the OCaml-CI
communication with both Synapse and our OCaml Matrix server. Please note that our
OCaml Matrix server supports a client authentication mechanism based on user name
identification and password, according to the Matrix specification for authentication
mechanisms.</p>
<h3>Spam Filter</h3>
<p>We researched the state of the art in email spam filtering and we identified machine
learning as the main trend. We follow this trend and we equip our email architecture
with a spam filter unikernel, which uses a Bayesian method for supervised learning of
spam and acts as a proxy to the internet communication in the SMTP receiver. The spam
filter implementation works in two stages: preparation, when the unikernel is trained
to detect spam, and operation, when the unikernel is integrated into the SMTP receiver
architecture of unikernels to filter the spam emails. It is worth mentioning that the
spam filter unikernel can be used independently as an individual anti-spam tool to help
enforce the GDPR rules and protect the user's privacy by preventing spam-induced attacks
such as phishing.</p>
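<p>To make the two stages concrete, here is a minimal naive Bayes classifier in OCaml. It is a sketch under stated assumptions rather than the Spamtacus implementation: the <code>model</code> record, the whitespace tokeniser, and the add-one smoothing are all choices made for this example.</p>
<pre><code class="language-ocaml">module Counts = Map.Make (String)

(* Word counts gathered during the preparation (training) stage. *)
type model = {
  spam : int Counts.t;   (* word -> occurrences in spam *)
  ham : int Counts.t;    (* word -> occurrences in legitimate mail *)
  spam_words : int;      (* total words seen in spam *)
  ham_words : int;       (* total words seen in legitimate mail *)
  spam_docs : int;       (* number of spam messages seen *)
  ham_docs : int;        (* number of legitimate messages seen *)
}

let empty =
  { spam = Counts.empty; ham = Counts.empty;
    spam_words = 0; ham_words = 0; spam_docs = 0; ham_docs = 0 }

(* Naive tokeniser: lowercase, split on spaces, drop empty tokens. *)
let words text =
  String.split_on_char ' ' (String.lowercase_ascii text)
  |> List.filter (fun w -> String.length w > 0)

let bump m w =
  Counts.update w (function None -> Some 1 | Some n -> Some (n + 1)) m

(* Preparation stage: fold one labelled message into the model. *)
let train model ~is_spam text =
  let ws = words text in
  let n = List.length ws in
  if is_spam then
    { model with spam = List.fold_left bump model.spam ws;
                 spam_words = model.spam_words + n;
                 spam_docs = model.spam_docs + 1 }
  else
    { model with ham = List.fold_left bump model.ham ws;
                 ham_words = model.ham_words + n;
                 ham_docs = model.ham_docs + 1 }

let count m w = Option.value (Counts.find_opt w m) ~default:0

(* Log-probability of one class, with add-one (Laplace) smoothing. *)
let log_score model counts total docs ws =
  let vocab =
    Counts.cardinal (Counts.union (fun _ a _ -> Some a) model.spam model.ham) in
  let prior =
    log (float_of_int docs /. float_of_int (model.spam_docs + model.ham_docs)) in
  List.fold_left
    (fun acc w ->
      acc +. log (float_of_int (count counts w + 1)
                  /. float_of_int (total + vocab)))
    prior ws

(* Operation stage: a message is spam when the spam class scores higher. *)
let is_spam model text =
  let ws = words text in
  log_score model model.spam model.spam_words model.spam_docs ws
  > log_score model model.ham model.ham_words model.ham_docs ws
</code></pre>
<p>The model must see at least one message of each class before <code>is_spam</code> is called; otherwise a class prior is undefined. A production filter would also need a sturdier tokeniser and a much larger training set.</p>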
<p>We integrate the spam filter into a unikernel positioned at the beginning of the
SMTP receiver stack as the first line of defence in an eventual attack targeting the
receiver. In this situation, the unikernel format provides isolation of the attack and
allows the SMTP receiver to maintain functionality. The spam filter unikernel can be
extended to act as an antivirus by analysing the email attachment for certain features
that are known to characterise malware. We have already laid the groundwork for the antivirus
with a prototype analysis of the email attachments. Moreover, the spam filter unikernel could
contribute a list of frequent spammers to the firewall, which is planned to be added
into the SMTP receiver, as the next step in future work.</p>
<h3>The DAPSI Initiative</h3>
<p>Much of the SCoP project was possible thanks to <a href="https://dapsi.ngi.eu">the DAPSI initiative</a>.
They gave Tarides the incentive to further explore an open and secure infrastructure for
communication protocols, especially emails. First, DAPSI supported our team by providing
necessary financing, but their contribution to our project’s prosperity runs much deeper
than funding. DAPSI facilitated multiple coaching sessions that helped broaden our horizons
and establish reachable goals. Notably, their business coaching enabled us to identify
solutions for our market strategy. Their technical coaching and training gave us access
to data portability experts and insight into GDPR regulations, which opened our perspective to novel
trends and procedures. Additionally, DAPSI helped raise our visibility by organising public
communications, and DAPSI’s feedback revealed insights on how to better exploit our project’s
potential and what corners of the cyber-ecosystem to prioritise. We are deeply grateful to
DAPSI for their support and backing, and we’re thrilled to have passed Phase 2!</p>
<h3>Up Next for SCoP</h3>
<p>We’re excited to further develop this project. We’ll be experimenting with deploying
unikernels on smaller chipsets, such as those in IoT devices. We’d also like to research secure data
porting in other domains such as journalism, law, or banking.</p>
<p>Of course we’ll be maintaining each of the SCoP components in order to follow the latest
available standards and state-of-the-art technology, including periodic security
analyses of our code-base and mitigation for newly discovered vulnerabilities.</p>
<p>As in all of our work at Tarides, we strive to benefit the entire OCaml community and beyond.
Please find more information on SCoP through our blog posts:
<a href="/blog/2021-04-30-scop-selected-for-dapsi-initiative/">DAPSI Initiative</a>
and <a href="/blog/2021-10-14-scop-selected-for-dapsi-phase2/">DAPSI Phase 1</a>.</p>
<p><img sizes="(min-width: 1360px) 1360px, (min-width: 680px) 680px, 100vw" srcset="/blog/images/DAPSI_generic-170w~iDbXLn3WcT-7eJLTnJWyOg.webp 170w, /blog/images/DAPSI_generic-340w~ZGfMn5IbiT56z-FHBcuDVA.webp 340w, /blog/images/DAPSI_generic-680w~TyXGVMfW4ULusIzlJDNAXA.webp 680w, /blog/images/DAPSI_generic-1360w~wjT-2RalNrPjOAem6u2MDw.webp 1360w" src="/blog/images/DAPSI_generic-1360w~wjT-2RalNrPjOAem6u2MDw.webp" alt="Sequence of entity logos: in association with NGI, EU, Zabala, FGS, cap-digital, IMT Starter, Fraunhofer IAIS."></p>
]]></description><link>https://tarides.com/blog/2022-03-08-secure-virtual-messages-in-a-bottle-with-scop</link><guid isPermaLink="false">https://tarides.com/blog/2022-03-08-secure-virtual-messages-in-a-bottle-with-scop.html</guid><dc:creator><![CDATA[ Irina Mariuca Asavoea, Christine Rose ]]></dc:creator><pubDate>Tue, 08 Mar 2022 00:00:00 GMT</pubDate></item><item><title><![CDATA[Segfault Systems Joins Tarides]]></title><description><![CDATA[<p>We are delighted to announce that Segfault Systems, a spinout from IIT-Madras,
is joining Tarides. Tarides has worked closely with Segfault Systems over the
last couple of years, most notably on the award-winning Multicore OCaml project
and the upstreaming plans for OCaml 5.0. This alliance furthers the goals of
Tarides, bringing the compiler and benchmarking expertise of the Segfault team
directly into the Tarides organisation.</p>
<p>KC Sivaramakrishnan, CEO &amp; CTO of Segfault Systems, says that “Segfault Systems
was founded to secure the foundations of scalable systems programming in OCaml.
We have successfully incorporated cutting-edge research on
<a href="https://dl.acm.org/doi/10.1145/3453483.3454039">concurrent</a> and
<a href="https://dl.acm.org/doi/10.1145/3408995">parallel</a> programming into OCaml. This
addresses the long-standing need of OCaml developers to utilise the widely
available multicore processing on modern machines. Tarides is at the forefront
of OCaml developer tooling and platform support, and we are excited to join the
team to make OCaml the best tool for industrial-strength concurrent and parallel
programming.”</p>
<p>“We’re thrilled to have the Segfault Systems team join Tarides,” says Thomas
Gazagnaire, CTO of Tarides. “They have been integral to the success of the
Multicore OCaml project, which has combined cutting-edge research and
engineering with consistent communication, promoting Multicore OCaml as an
upstream candidate to the core developer team, as well as
<a href="https://discuss.ocaml.org/tag/multicore-monthly">publishing monthly reports</a>
for the wider community. We look forward to working with our new partners to
make OCaml the tool of choice for developers.”</p>
<p>All of Segfault Systems’ existing responsibilities and open-source commitments
will migrate over to Tarides, where work will continue towards the three main
objectives in 2022:</p>
<ul>
<li>Releasing OCaml 5.0 with support for domains and effect handlers</li>
<li>Supporting the ecosystem to migrate the OCaml community over to OCaml 5.0</li>
<li>Improving developer productivity for OCaml 5.0 by releasing the best platform
tools</li>
</ul>
<h2>OCaml 5.0</h2>
<p>The next major release of OCaml, version 5.0, will feature primitive support for
parallel and concurrent programming through domains and effect handlers. The
goal is to ensure that the fine balance that OCaml has struck between ease of
use, correctness and performance over the past 25 years continues into the
future with these additional features.</p>
<p>Domains enable shared-memory parallel programming allowing OCaml programs to run
on multiple cores: with domains, OCaml programs will scale better by exploiting
multicore processing. Effect handlers are a mechanism for concurrent
programming: with the introduction of effect handlers, simple direct-style OCaml
code will be flexible, easy to develop, debug and maintain (no more monads for
concurrency!). These features will benefit the entire ecosystem and community,
and we expect them to attract many new users to the language.</p>
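<p>For a flavour of both features, here is a small sketch using the <code>Domain</code> and <code>Effect</code> modules from the OCaml 5 standard library. The function names and the <code>Ask</code> effect are hypothetical, chosen only for this example: <code>parallel_sum</code> splits a summation across two domains, and <code>with_answer</code> installs a handler so that direct-style code can perform an effect and be resumed with a value.</p>
<pre><code class="language-ocaml">open Effect
open Effect.Deep

(* Parallelism: sum lo..hi sequentially, then split the work over two domains. *)
let sum_range lo hi =
  let total = ref 0 in
  for i = lo to hi do total := !total + i done;
  !total

let parallel_sum n =
  let mid = n / 2 in
  (* Domain.spawn runs the thunk on a new domain, possibly on another core. *)
  let left = Domain.spawn (fun () -> sum_range 1 mid) in
  let right = sum_range (mid + 1) n in
  (* Domain.join waits for the spawned domain and returns its result. *)
  Domain.join left + right

(* Concurrency: Ask is a hypothetical effect, declared only for this example. *)
type _ Effect.t += Ask : int Effect.t

(* Run f; each time it performs Ask, resume it with the value 21. *)
let with_answer f =
  try_with f ()
    { effc = (fun (type a) (eff : a Effect.t) ->
        match eff with
        | Ask -> Some (fun (k : (a, _) continuation) -> continue k 21)
        | _ -> None) }

(* Direct-style code: no monadic plumbing around the effectful calls. *)
let doubled () = perform Ask + perform Ask
</code></pre>
<p>Here <code>parallel_sum 1_000</code> evaluates to <code>500500</code> and <code>with_answer doubled</code> to <code>42</code>. Libraries such as schedulers build on the same mechanism by storing the continuation <code>k</code> and resuming it later instead of immediately.</p>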
<p>As part of the Multicore OCaml project, the team developed
<a href="https://github.com/ocaml-bench/sandmark">Sandmark</a>, a suite of sequential and
parallel benchmarks together with the infrastructure necessary to carefully run
the programs and analyse the results. Sandmark has been instrumental in
assessing and tuning the scalability of parallel OCaml programs and ensuring
that OCaml 5.0 does not introduce performance regressions for existing
sequential programs compared to OCaml 4.</p>
<p><img sizes="(min-width: 1360px) 1360px, (min-width: 680px) 680px, 100vw" srcset="/blog/images/2022-03-01.Segfault-joins-Tarides/scalability-170w~uuMu_Barhe0G9E8a8Ko5tg.webp 170w, /blog/images/2022-03-01.Segfault-joins-Tarides/scalability-340w~A7mwFe3oSfqACBXwevSg-Q.webp 340w, /blog/images/2022-03-01.Segfault-joins-Tarides/scalability-680w~6oZVzJL0vFZ5Gl-do5zXuQ.webp 680w, /blog/images/2022-03-01.Segfault-joins-Tarides/scalability-1360w~g_q5_qebUseoDIfWP0VB2Q.webp 1360w" src="/blog/images/2022-03-01.Segfault-joins-Tarides/scalability-1360w~g_q5_qebUseoDIfWP0VB2Q.webp" alt="Matrix of graphs showing scalability of various multicore OCaml workloads"></p>
<p align="center"><i>Scalability of compute intensive OCaml programs</i></p>
<p>Sandmark is now run as <a href="https://sandmark.ocamllabs.io">a nightly service</a>
monitoring the performance of OCaml 5 as it is being developed. Development will
continue to make it even easier to use and more practical by fully integrating
it with <a href="https://github.com/ocurrent/current-bench">current-bench</a> (the continuous
benchmarking system based on OCurrent).
<a href="/contact/">Get in touch</a> if you want to know more.</p>
<h2>Ecosystem</h2>
<p>At Tarides we want all OCaml users to benefit from the new features that OCaml
5.0 will bring, and this means ensuring that the ecosystem is fully prepared. We
aim to develop and maintain a robust set of libraries that work with domains and
effects, together with a diverse parallel benchmarking and performance profiling
suite to use with OCaml 5 applications. The
<a href="https://discuss.ocaml.org/t/eio-0-1-effects-based-direct-style-io-for-ocaml-5/9298">first version of Eio</a>,
the effects-based direct-style IO stack for OCaml 5.0, has been released,
generating lots of interesting discussion within the community. Eio not only
makes it easier to develop, debug and maintain applications utilising
asynchronous IO, but is also able to take advantage of multiple cores when
available.</p>
<p><img sizes="(min-width: 1360px) 1360px, (min-width: 680px) 680px, 100vw" srcset="/blog/images/2022-03-01.Segfault-joins-Tarides/http_load-170w~wLQ0OzOr1AsUs8YWrXO8AQ.webp 170w, /blog/images/2022-03-01.Segfault-joins-Tarides/http_load-340w~hM2hUZapkLL7Q21V8bPDqQ.webp 340w, /blog/images/2022-03-01.Segfault-joins-Tarides/http_load-680w~mK2fv2ST3cLnPXjJyLyg8w.webp 680w, /blog/images/2022-03-01.Segfault-joins-Tarides/http_load-1360w~VmGHzQ_OWTcXItbk-GUyFg.webp 1360w" src="/blog/images/2022-03-01.Segfault-joins-Tarides/http_load-1360w~VmGHzQ_OWTcXItbk-GUyFg.webp" alt="Line chart showing the scalability of HTTP server implementations in OCaml, Rust and Go"></p>
<p align="center"><i>HTTP server performance using 24 cores</i></p>
<p><img sizes="(min-width: 1360px) 1360px, (min-width: 680px) 680px, 100vw" srcset="/blog/images/2022-03-01.Segfault-joins-Tarides/http_cores-170w~X5kEDoNjqybJEL1NS5aicg.webp 170w, /blog/images/2022-03-01.Segfault-joins-Tarides/http_cores-340w~31_pedSg8i0SoOKpZZKe0w.webp 340w, /blog/images/2022-03-01.Segfault-joins-Tarides/http_cores-680w~yRm7rk8UeiP_EFCGvSgAxw.webp 680w, /blog/images/2022-03-01.Segfault-joins-Tarides/http_cores-1360w~uNRtRpRv1xsprFDHs5S95A.webp 1360w" src="/blog/images/2022-03-01.Segfault-joins-Tarides/http_cores-1360w~uNRtRpRv1xsprFDHs5S95A.webp" alt="Line chart showing the load response of HTTP server implementations in OCaml, Rust and Go"></p>
<p align="center"><i>HTTP server scaling maintaining a constant load of 1.5 million requests per second</i></p>
<p>The early results are quite promising. An HTTP server based on Eio is able to
serve 1M+ requests/sec on 24 cores, outperforming Go's <code>net/http</code> and closely
matching Rust's <code>hyper</code> performance. Eio is still heavily under development.
Expect even better numbers for its stable release planned for later this year.</p>
<p>The next step is to iterate on the design in collaboration with the community
and our partners. <a href="/contact/">Get in touch</a> if you have
performance-sensitive applications that you'd like to port to Eio, so we can
discuss how the design can meet your needs.</p>
<h2>OCaml Platform</h2>
<p>In collaboration with community members and commercial funders, Tarides has been
developing and defining the
<a href="https://v3.ocaml.org/learn/platform">OCaml platform tool suite</a> for the last
four years. The goal of the platform is to provide OCaml developers with easy
access to high-quality, practical development tools to build any and every
project. We will continue to develop and maintain these tools, and make them
available for OCaml 5. <a href="/contact/">Reach out to us</a> if you
have specific feature requests to make your developer teams more efficient.</p>
<p>This alliance brings the headcount of Tarides up to 60+ people, all working
towards making OCaml the best language for any and every project.
<a href="/careers/">Join us</a>!</p>
]]></description><link>https://tarides.com/blog/2022-03-01-segfault-systems-joins-tarides</link><guid isPermaLink="false">https://tarides.com/blog/2022-03-01-segfault-systems-joins-tarides.html</guid><dc:creator><![CDATA[ Gemma Gordon, Thomas Gazagnaire ]]></dc:creator><pubDate>Tue, 01 Mar 2022 00:00:00 GMT</pubDate></item><item><title><![CDATA[OCaml Labs Joins Tarides]]></title><description><![CDATA[<p>Today I am incredibly delighted to announce that <a href="https://anil.recoil.org/projects/ocamllabs">OCaml
Labs</a>, a spinout from the
<a href="https://www.cl.cam.ac.uk/">University of Cambridge</a>, is joining
<a href="/">Tarides</a>. After successfully collaborating on
many OCaml projects over the last four years, this alliance will
combine the expertise of both groups and enable us to bring OCaml, one
of the most advanced programming languages in the world, into
mainstream use. Combining forces will accelerate OCaml development and
its broader adoption. Furthermore, it will bring the security,
portability, and performance of OCaml to a large spectrum of
use-cases: from academic endeavours, such as formal methods and
existing threats within cyber-security, to real-world applications for
climate change, sustainable agriculture, and even space exploration!</p>
<p>All of OCaml Labs’ existing responsibilities and open-source
commitments will migrate over to Tarides, and thanks to how closely
the teams already work together, business will continue without
interruption to delivery. Gemma Gordon will step up as CEO of Tarides,
and I will continue to lead the technological vision and strategy as
CTO. As Prof. Anil Madhavapeddy, founder of OCaml Labs and scientific
advisor of Tarides, points out, “The cutting edge research we
conducted at the University over the past decade has now migrated into
mainline OCaml, and so the ongoing curation and development will now
happen on a commercially supported basis. I’m excited to continue
collaborating on research with Tarides from the University of
Cambridge.” Tarides will continue the work started at OCaml Labs and
invest in the growth, health, and development of OCaml alongside its
wider use-cases.</p>
<p>I am honoured to have the incredible OCaml Labs team - the team who
carefully designed and crafted <a href="https://discuss.ocaml.org/tag/multicore-monthly">Multicore OCaml</a> - join Tarides. We
share a similar view that a common plague affects the ever-growing
software industry, namely the poor quality of software and the
omnipresence of bugs. However, this is not a fatal flaw. Tools
developed by OCaml Labs over the years do not compromise on quality,
and they allow dev teams to automatically fix at least 70% of
<a href="https://msrc-blog.microsoft.com/2019/07/18/we-need-a-safer-systems-programming-language/">security
bugs</a>
and <a href="https://googleprojectzero.blogspot.com/p/0day.html">0-day security
exploits</a>. Consequently,
OCaml is a simple yet powerful language that can respond to the many
challenges developers face today. Since Tarides’s inception, we have
envisioned a future where all OCaml applications are easily deployable
as specialised, secure, and energy-efficient
<a href="https://mirage.io">MirageOS</a> unikernels. This alliance is a step
further in that direction. Since OCaml is the language used to develop
MirageOS, Tarides has continuously developed and maintained parts of
the OCaml ecosystem since its creation. Our alliance with OCaml Labs
makes this more evident. The MirageOS ecosystem critically depends on
OCaml, and the OCaml ecosystem benefits from innovations coming from
the MirageOS project. Tarides is therefore fully committed to making
the synergy between OCaml and MirageOS a success.</p>
<p>Several exciting projects related to Multicore OCaml are coming to a
head this year. The OCaml 5.0 release will support multicore and
effect handlers, influencing every aspect of the language and its
ecosystem. The update will significantly improve both performance and
user experience whilst maintaining existing features that make OCaml
the language of choice for building, for instance, verification
software tools. Using the teams’ combined experience and zest for
innovation, Tarides is looking to the future of the OCaml language and
community with excitement. We will continue to push the boundaries of
exploration whilst focusing on what's good for the
community. <strong>Therefore, this alliance will complement the commercial
offering of Tarides and contribute to Tarides' mission: empowering
developers, communities and organisations to adopt OCaml as their
primary programming experience by providing training, expertise, and
development services around the OCaml language.</strong></p>
<p>“We are thrilled to be part of an organisation innovating in many
areas around operating systems, distributed systems, and security with
the <a href="https://irmin.org">Irmin</a> distributed store and the MirageOS
unikernel projects,” says Gemma Gordon, CEO of Tarides. “I am
incredibly proud of the people OCaml Labs has collaborated with. We
have been able to build a sustainable open-source community, with
people from various backgrounds all collaborating together. It used to
be that people would have to volunteer their time on OCaml, or work in
academic research. We have created an additional funded path, one that
has increased the diversity and innovation of our community. I’m
excited to continue to be part of a group that brings the best minds
together to solve the many problems the software industry faces
today. I look forward to building a flourishing and sustainable
commercial business with existing Tarides partners as well as
developing new collaborative opportunities.”</p>
<p>This alliance brings the headcount of Tarides up to 60+ people, all
working towards making OCaml the best language for any and every
project. <a href="/company">Join our team</a>!</p>
<h4>OCaml Labs</h4>
<p><em>OCaml Labs has been at the forefront of innovation in OCaml for
nearly a decade. It was founded at the University of Cambridge by
Prof. Anil Madhavapeddy in 2012 and developed into a spin-out
consultancy company in 2016. OCaml Labs' mission was to push OCaml and
functional programming forward as a platform, making it a more
effective tool for all users (including large-scale industrial
deployments) while at the same time growing the appeal of the language
to broaden its applicability and popularity.</em></p>
<p><em>OCaml Labs has been instrumental in developing and maintaining the
OCaml platform for OCaml usage at an industrial scale. OCaml Labs
contributed to the development and maintenance of the
<a href="https://opam.ocaml.org/">opam</a> package management
ecosystem and of the OCaml community website, https://ocaml.org/,
first launched in 2012. These sites act as hubs for the OCaml community
to showcase the state-of-the-art and facilitate innovation. A new and
improved version of the site has been <a href="https://v3.ocaml.org/">released under
beta</a> this month. In addition, OCaml Labs' most
significant (and technically complex) project, OCaml Multicore, will
finally come to fruition this year. Work on this project began in
2014, followed by award-winning papers and presentations in 2020 and
the announcement in late 2021 that Multicore will become part of the
mainline OCaml compiler.</em></p>
<h4>Tarides</h4>
<p><em>Tarides is a tech start-up founded in Paris in 2018 by pioneers of
programming languages and cloud computing. They develop a software
infrastructure platform to deploy secure, distributed applications
with strict resource constraints and low-latency performance
requirements. This platform builds upon innovative and open-source
projects such as MirageOS and Irmin and underpins mission-critical
deployments such as
<a href="/blog/2021-03-04-florence-and-beyond-the-future-of-tezos-storage/">Tezos</a>,
Citrix XenServer, or <a href="https://www.docker.com/blog/how-docker-desktop-networking-works-under-the-hood/">Docker for
Desktop</a>. In
addition, Tarides uses unikernel technologies and applies the research
done in programming languages to real-world systems to build safe and
performant applications.</em></p>
<p><em>Tarides was part of the Founders Program at Station F in 2018. In
addition, it was selected in France for the <a href="/blog/2019-07-05-i-lab-2019/">Concours d’Innovation
i-Lab</a>, organised by
the French Ministry of Higher Education, Research and Innovation in
partnership with Bpifrance. This national contest rewards company
creation and innovative technologies. Tarides also received an award at
<a href="/blog/2019-12-11-tarides-wins-the-fic-2020-startup-award/">FIC
2020</a>,
the leading European cybersecurity event.</em></p>
]]></description><link>https://tarides.com/blog/2022-01-27-ocaml-labs-joins-tarides</link><guid isPermaLink="false">https://tarides.com/blog/2022-01-27-ocaml-labs-joins-tarides.html</guid><dc:creator><![CDATA[ Thomas Gazagnaire ]]></dc:creator><pubDate>Thu, 27 Jan 2022 00:00:00 GMT</pubDate></item><item><title><![CDATA['Signals and Threads' Podcast: What is an Operating System?]]></title><description><![CDATA[<p>November has become MirageOS month! Between the upcoming official MirageOS 4.0 release, making custom Christmas Tree garlands
with <a href="/blog/2021-11-11-mirageos-workshop-working-with-the-raspberry-pi-4/">MirageOS on a Raspberry Pi</a>,
and now <a href="https://signalsandthreads.com/what-is-an-operating-system">this "What is an Operating System?" podcast</a> (featuring Tarides advisor and core MirageOS maintainer Anil Madhavapeddy), it truly is MirageOS month!</p>
<p>MirageOS can do much more than program a Raspberry Pi for Christmas decor. From
<a href="/blog/2021-11-18-tarides-hyper-partners-in-agricultural-innovation/">agricultural monitoring</a> to
<a href="/blog/2021-06-29-tarides-introduces-osmose-at-the-open-source-innovation-sprint/">smart buildings</a>, its
applications cover a wide range of needs. For example, it can also be used in critical pieces of infrastructure where security is of paramount importance, such as the <a href="/blog/2020-04-20-the-future-of-tezos-on-mirageos/">Tezos blockchain</a>.
Combined with <a href="https://irmin.org">Irmin</a>, it allows developers to build secure-by-design, offline-first systems and invert
the current cloud-centric model for designing applications to securely connect physical spaces with extremely low latency
and high bandwidth, using local-area computation capabilities.</p>
<p>You can read the entire <a href="https://signalsandthreads.com/what-is-an-operating-system/">transcript here</a> and find links to multiple places to listen through podcast apps.</p>
]]></description><link>https://tarides.com/blog/2021-11-23-signals-and-threads-podcast-what-is-an-operating-system</link><guid isPermaLink="false">https://tarides.com/blog/2021-11-23-signals-and-threads-podcast-what-is-an-operating-system.html</guid><dc:creator><![CDATA[ Christine Rose ]]></dc:creator><pubDate>Tue, 23 Nov 2021 00:00:00 GMT</pubDate></item><item><title><![CDATA[Tarides & Hyper: Partners in Agricultural Innovation]]></title><description><![CDATA[<p>We are thrilled to announce a partnership between Tarides and <a href="https://hyper.ag">Hyper</a>, a technology provider in the agritech space who’s building
an "operating system for high-performing farms." Indoor and vertical farms are becoming tech businesses that require scalable,
flexible, and easy-to-use tools to facilitate data analysis and thereby increase productivity. According to the <a href="https://www.eitfood.eu/blog/post/is-vertical-farming-really-sustainable">State of Indoor
Farming 2020 Report</a>, “40% of vertical and indoor farms are
implementing data analytics and control automation to increase yield and lower cost of production.”</p>
<p>Hyper’s product offers a developer-friendly platform for modern farms to integrate analytics and automation without
worrying about hardware, scaling, or maintenance. There is a natural synergy between Hyper's product and Tarides’s mission
to bring robust and scalable functional systems to the industry. Both teams will be working on technology to help solve
some big problems the world faces.</p>
<p>First, some context about how agriculture affects our environment:
<strong>The world’s population is growing, but the size of our planet is not.</strong></p>
<p>Agriculture is currently responsible for:</p>
<ul>
<li>75% of the world’s deforestation<sup><a href="#fn-1" id="ref-1-fn-1" role="doc-noteref" class="fn-label">[1]</a></sup></li>
<li>50% of the world’s habitable land</li>
<li>70% of the world's freshwater withdrawals</li>
<li>26% of the greenhouse gas emissions<sup><a href="#fn-2" id="ref-1-fn-2" role="doc-noteref" class="fn-label">[2]</a></sup></li>
</ul>
<p>It’s an important time for indoor farming, as it’s already producing twenty times (20x) more yield per area while using 95% less water and
zero (0) pesticides. Plus, indoor farming will reduce its carbon footprint by cutting food miles in half (50%)<sup><a href="#fn-3" id="ref-1-fn-3" role="doc-noteref" class="fn-label">[3]</a></sup>. Vertical farming
complements traditional farming by growing fresh produce for cities in a sustainable way.</p>
<p>Hyper's mission is to simplify the integration of sensors and controllers for data collection and automation. Their roadmap is
focused on implementing real-time computation of metrics for environmental data gathered from farms, prototyping computer vision models
for crop analysis, and building automated crop traceability infrastructure. With their data platform, growers can optimise yield and reduce operating
costs by getting access to crop growth metrics and climate automation profiles without a dedicated engineering team.</p>
<p>Hyper's founders are experienced engineers and OCaml hackers who have worked on IoT and data analytics products in the past in the retail,
biotech, and multimedia industries. Utilizing our MirageOS ecosystem, Hyper's sensor networks and cameras are continuously collecting
millions of data points across large-scale farming operations to ensure consistent crop quality, detect issues early, and automate climate
control. Since the technology is offline-first, it enables Hyper to collect this data securely, without the risk of breach or loss, as is
sometimes the case in cloud computing.</p>
<p>As a technological partner, Tarides will help Hyper build a scalable IoT platform that leverages the OCaml ecosystem. Our support will
help Hyper bring the product to the market faster while also contributing to the open-source IoT ecosystem for MirageOS and related
projects. Hyper’s IoT platform will be fully open-source next year and will focus on implementing better support for MQTT, CoAP, and other protocols.</p>
<p>A new version of MirageOS will be released in November, so this exciting announcement couldn't come at a better time! Hyper’s use of
the <a href="https://mirage.io">MirageOS</a> ecosystem to build its data intelligence product is yet another real-world example that displays the
power and efficiency of MirageOS.</p>
<p>It’s truly exciting technology because we’re at a point in history where even the most adamant climate change critics have begun to admit
the climate is indeed changing. There are things we can each do to contribute, so Tarides is proud to collaborate with a company that’s
actively seeking solutions to help reduce the amount of greenhouse gas emissions while increasing productivity.</p>
<p>Hyper currently has production deployments in vertical and indoor farms in the UK and East Africa and is planning on scaling the
operations in the coming months.</p>
<h4>SOURCES</h4>
<section role="doc-endnotes"><ol>
<li id="fn-1">
<p>https://ourworldindata.org/drivers-of-deforestation</p>
<span><a href="#ref-1-fn-1" role="doc-backlink" class="fn-label">↩︎︎</a></span></li><li id="fn-2">
<p>https://ourworldindata.org/food-ghg-emissions</p>
<span><a href="#ref-1-fn-2" role="doc-backlink" class="fn-label">↩︎︎</a></span></li><li id="fn-3">
<p>https://ourworldindata.org/environmental-impacts-of-food</p>
<span><a href="#ref-1-fn-3" role="doc-backlink" class="fn-label">↩︎︎</a></span></li></ol></section>
]]></description><link>https://tarides.com/blog/2021-11-18-tarides-hyper-partners-in-agricultural-innovation</link><guid isPermaLink="false">https://tarides.com/blog/2021-11-18-tarides-hyper-partners-in-agricultural-innovation.html</guid><dc:creator><![CDATA[ Christine Rose ]]></dc:creator><pubDate>Thu, 18 Nov 2021 00:00:00 GMT</pubDate></item><item><title><![CDATA[MirageOS Workshop: Working with the Raspberry Pi 4]]></title><description><![CDATA[<p>Earlier this week, Romain Calascibetta hosted an in-house MirageOS workshop for employees, both locally and remotely
around the world. This interactive workshop taught participants how to build an operating system on a Raspberry
Pi 4 using MirageOS. They got to create their own OS and play with projects, like one they dubbed <em>GuirlandeOS</em> for
which they programmed an LED garland to trim their Christmas tree, creating their own customized light show! A blog post
dedicated to <em>GuirlandeOS</em> will follow soon.</p>
<p>That’s one fun example of what can be done with MirageOS and Raspberry Pi, but the possibilities are endless.
For example, one could use this dynamic pair to create solar-powered websites (something we’ll hear more about next week).</p>
<p>The spirit of <a href="https://mirage.io">MirageOS</a> is that anyone can integrate it, even if they don't work at Tarides. Although the
workshop was only for employees, MirageOS is for everyone! Since it's autonomous from Tarides, we encourage you
to play with MirageOS and see what you can create.</p>
<p>Romain has opened the contents of his informative workshop to the world! Follow along with
<a href="https://drive.google.com/file/d/1NeYA5pjN-4xjFWCpyYxkVSsn4ii9Nktp/view?usp=drivesdk">his slides</a>,
which will walk you through the MirageOS toolchain to create your very own projects. You can read more about the
Raspberry Pi process <a href="https://github.com/mirage/mirage/pull/1253">in this pull request</a>.</p>
<p>Early next week, we’ll continue with MirageOS month by showcasing a podcast where Anil Madhavapeddy
talks about those solar-powered websites and more!</p>
<p>Happy hacking!</p>
]]></description><link>https://tarides.com/blog/2021-11-11-mirageos-workshop-working-with-the-raspberry-pi-4</link><guid isPermaLink="false">https://tarides.com/blog/2021-11-11-mirageos-workshop-working-with-the-raspberry-pi-4.html</guid><dc:creator><![CDATA[ Christine Rose ]]></dc:creator><pubDate>Thu, 11 Nov 2021 00:00:00 GMT</pubDate></item><item><title><![CDATA[MirageOS 4.0 Preview Live Presentation]]></title><description><![CDATA[<p>The official release of MirageOS 4.0 quickly approaches! Learn about
some general MirageOS concepts and get a sneak peek at the forthcoming
changes in MirageOS 4.0 during a LIVE presentation today at 15h CET.</p>
<p>Lucas Pluvinage will lead you through a live-streaming presentation to
acquaint you with MirageOS 4.0. You’ll learn what kinds of problems
MirageOS can solve and about Functoria, the compilation model. Then
Lucas will also discuss the switch to the <em>Dune</em> build system and how
that enables cross-compilation, not to mention the creation of new
compilation targets, such as the Raspberry Pi!</p>
<p>Watch the live presentation today at 15h CET. Just go to:
https://meet.google.com/iqy-urht-rcn, or dial <em>(FR) +33 1 87 40 43 45</em>
with the PIN: <em>288 878 885</em>#. You can also find other phone numbers
here:&nbsp;https://tel.meet/iqy-urht-rcn?pin=3736259978366 if you're not in
France.</p>
<p>This presentation will be a great introduction to Romain
Calascibetta's forthcoming MirageOS workshop tomorrow.  Check back on
this blog and <a href="https://bsky.app/profile/tarides.com">follow us on Bluesky</a> to
find out more about how to watch and participate in this informative
workshop tomorrow!</p>
<p>Until then, check out Lucas' presentation today at 15h CET.</p>
]]></description><link>https://tarides.com/blog/2021-11-09-mirageos-4-0-preview-live-presentation</link><guid isPermaLink="false">https://tarides.com/blog/2021-11-09-mirageos-4-0-preview-live-presentation.html</guid><dc:creator><![CDATA[ Christine Rose ]]></dc:creator><pubDate>Tue, 09 Nov 2021 00:00:00 GMT</pubDate></item><item><title><![CDATA[SCoP Passed Phase 1 of the DAPSI Initiative!]]></title><description><![CDATA[<p><strong>In April, <a href="/blog/2021-04-30-scop-selected-for-dapsi-initiative/">we announced</a> that the <a href="https://dapsi.ngi.eu">DAPSI initiative</a> accepted
the proposal for our Secure-by-Design Communication Protocols (SCoP) project. Today, we are thrilled to announce that SCoP has passed the initiative’s Phase 1,
and we are now on our way to Phase 2!</strong></p>
<p>SCoP is an open, secure, and resource-efficient infrastructure to engineer a modern basis for open messaging (for existing and emerging protocols)
using type-safe languages and unikernels—to ensure your private information remains secure. After all, you wouldn’t like your postal carrier reading
your snail mail, so why should emails be any different?</p>
<h2>Challenges</h2>
<p>To operate an email service requires many technical skills and reliable infrastructure. As a result, only a few large companies can handle emails with
the proper security levels.  Unfortunately, the core business model of these companies is to mine your personal data.</p>
<p>The number of emails exchanged every day is expected to reach 333 billion in 2022. That’s a considerable amount of data, much of it private or sensitive,
sent across Cyberspace through portals with questionable security. The ‘memory unsafe’ languages used in most communication services leave far too much room
for mistakes that have serious ramifications, like security flaws that turn into security breaches, leaving your personal or business information vulnerable to
malicious hackers.</p>
<p>Due to this global challenge, we set out to build a simple, secure, easily deployable solution to preserve users' privacy, and we’re making great strides toward
accomplishing that goal. We base our systems on scientific foundations to last for decades and drive positive change for the world. Our robust understanding of
both theory and practice enables us to solve these security problems, so we explore ideas where research and engineering meet at the intersection of the domains
of operating systems, distributed systems, and programming languages.</p>
<p>Every component of SCoP is carefully designed as an independent library, using modern development techniques to avoid commonly reported threats and flaws.
For instance, the protocol parsers and serializers are written in a type-safe language and tested using fuzzing. Combining these techniques
will increase users' trust to migrate their personal data to these new, more secure services.</p>
<h2>Architecture</h2>
<p>The architecture of the SCoP communication service is composed of an Email Service based on a secure extension of the SMTP protocol, and a decentralised
real-time communication system based on Matrix.</p>
<p>The <a href="https://github.com/dinosaure/ptt">SMTP</a> and <a href="https://github.com/clecat/ocaml-matrix">Matrix</a> protocols implemented in SCoP follow the separation of
concerns design principle, meaning that the SMTP Sender and SMTP Receiver are designed as two distinct units. They’re implemented as isolated micro-services
which run as unikernels. The SMTP Sender, Receiver, and Matrix are all configurable, and each configuration comes with a security risk analysis report to
understand possible privacy risks.</p>
<h2>Progress</h2>
<p>Not only are we on our way to Phase 2 in the <a href="https://dapsi.ngi.eu">DAPSI Initiative</a>, but we’re also proud to report that we’re on track with our
planned milestones!</p>
<p>Our <strong>first milestone</strong> was to generate a corpus of emails to test our parser implementation against existing projects, in order to detect differences
with respect to the descriptions specified in the RFCs. We now have 1 million emails that have been parsed and re-encoded without any issues! Our email corpus preserves
the isomorphism between the encoder and decoder, so decoding an email and encoding it again round-trips. You can find it in this <a href="https://github.com/mirage/hamlet">GitHub repo</a>, and we encourage implementors working in other languages to use it to build
trust in their own implementations.</p>
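<p>To illustrate what such a round-trip check looks like, here is a minimal sketch in Python. It uses Python's standard <code>email</code> module as a stand-in for our OCaml parser, and the two messages in <code>CORPUS</code> are hypothetical examples written for this sketch, not entries from the real corpus. The helper names (<code>decode</code>, <code>encode</code>, <code>round_trips</code>) are also assumptions made for illustration.</p>

```python
import email
from email.message import Message


def decode(raw: str) -> Message:
    """Parse raw message text into a structured message."""
    return email.message_from_string(raw)


def encode(msg: Message) -> str:
    """Serialize a structured message back to text."""
    return msg.as_string()


def round_trips(raw: str) -> bool:
    """True when decoding, encoding, and decoding again changes nothing."""
    first = decode(raw)
    second = decode(encode(first))
    return (first.items() == second.items()
            and first.get_payload() == second.get_payload())


# A tiny hand-written corpus (hypothetical messages, not from the real corpus).
CORPUS = [
    "From: alice@example.org\nTo: bob@example.org\nSubject: Hello\n\nFirst body\n",
    "From: carol@example.org\nSubject: Re: Hello\n\nSecond body\n",
]

# Collect every message whose headers or body change across the round trip.
failures = [raw for raw in CORPUS if not round_trips(raw)]
print(len(CORPUS), "emails checked,", len(failures), "failures")
```

<p>An implementor in another language could run the same kind of loop over the published corpus, swapping in their own decoder and encoder.</p>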
<p>We set out to implement an SMTP extension mechanism and support for SPF as well as implement DMARC, a security framework, on top of DKIM and SPF for our
<strong>second milestone</strong>, and we are right on target. To date, we’ve completed four components:</p>
<ul>
<li><a href="https://github.com/dinosaure/ocaml-spf">SPF</a></li>
<li><a href="https://github.com/dinosaure/ocaml-dkim">DKIM</a></li>
<li><a href="https://github.com/dinosaure/ptt">SMTP</a> can send and verify emails</li>
<li><a href="/blog/2019-09-25-mr-mime-parse-and-generate-emails/">MrMIME</a> can generate the email, then SMTP sends the email (signed by a DKIM private key). We can correctly sign an email, generating both the signature and the DKIM field that contains it. When the email is received, we check the DKIM signature and the SPF metadata.</li>
</ul>
<p>For our <strong>third milestone</strong>, we set out to implement DNSSEC, a set of security extensions over DNS. This security layer verifies the identity of an email sender
through DKIM/SPF/DMARC, but it also needs security extensions in the DNS protocol. We completed our initial investigation of a DNSSEC implementation prototype,
and we discovered several issues, such as missing elliptic-curve cryptography primitives. Those necessary cryptographic primitives are now available, so we
should complete this milestone by the end of the month.</p>
<p>Finally, our <strong>fourth milestone</strong> was to implement the Matrix protocol (client and server). We completed the protocol’s client library, which is used to send a notification
from OCaml CI. We also have a proof of concept (PoC), and the Matrix server side, which receives the notification, is complete as well.</p>
<p>Although we still have much work ahead of us, we’re quite pleased with the progress thus far, and so is the DAPSI Initiative! Follow our progress by <a href="/feed.xml">subscribing
to this blog</a> and our <a href="https://bsky.app/profile/tarides.com">Bluesky feed (@tarides.com)</a> for the latest updates.</p>
<br>
<p><img sizes="(min-width: 1360px) 1360px, (min-width: 680px) 680px, 100vw" srcset="/blog/images/DAPSI_generic-170w~iDbXLn3WcT-7eJLTnJWyOg.webp 170w, /blog/images/DAPSI_generic-340w~ZGfMn5IbiT56z-FHBcuDVA.webp 340w, /blog/images/DAPSI_generic-680w~TyXGVMfW4ULusIzlJDNAXA.webp 680w, /blog/images/DAPSI_generic-1360w~wjT-2RalNrPjOAem6u2MDw.webp 1360w" src="/blog/images/DAPSI_generic-1360w~wjT-2RalNrPjOAem6u2MDw.webp" alt="Sequence of entity logos: in association with NGI, EU, Zabala, FGS, cap-digital, IMT Starter, Fraunhofer IAIS."></p>
]]></description><link>https://tarides.com/blog/2021-10-14-scop-selected-for-dapsi-phase2</link><guid isPermaLink="false">https://tarides.com/blog/2021-10-14-scop-selected-for-dapsi-phase2.html</guid><dc:creator><![CDATA[ Romain Calascibetta, Christine Rose ]]></dc:creator><pubDate>Thu, 14 Oct 2021 00:00:00 GMT</pubDate></item><item><title><![CDATA[The New Replaying Benchmark in Irmin]]></title><description><![CDATA[<p>As mentioned in our <a href="https://forum.tezosagora.org/t/tezos-storage-irmin-summer-2021-update/3744">Tezos Storage / Irmin Summer 2021 Update</a> on the Tezos Agora forum, the Irmin team's goal has been to improve Irmin's performance in order to speed up the <em>Baking Account</em> migration process in Octez, and we managed to make it 10x faster in the first quarter of 2021. Since then, we've been working on a new benchmark program for Irmin that's based on the interactions between Irmin and Octez. This won't just help make Irmin even faster, it will also help speed up the Tezos blockchain process and enable us to monitor Irmin's behavior in Octez.</p>
<p>Octez is the <a href="https://gitlab.com/tezos/tezos">Tezos node implementation</a> that uses Irmin to store the blockchain state, so Irmin is a core component of Octez that's responsible for virtually all the filesystem operations. Whether a node is launched to produce new blocks (aka “bake”) or just to participate in peer-to-peer sharing of existing blocks, it must first update itself by rebuilding blocks individually until it reaches the head of the blockchain. This first phase is called <em>bootstrapping</em>, and once it reaches the blockchain head, we say it has been <em>bootstrapped</em>. Currently, the <em>bootstrapped</em> phase processes 2 blocks per minute, which is the rate at which the Tezos blockchain progresses. The next goal is to increase that rate to 12 blocks per minute.</p>
<p>Irmin stores the content of the Tezos blockchain on disk using the <code>irmin-pack</code> library. There is a one-to-one correspondence between Tezos blocks and Irmin commits. Each time Tezos produces a block, Irmin produces a commit, and then the Tezos block hash is computed using the Irmin commit hash. The Irmin developers are working on improving the performance of <code>irmin-pack</code>, which in turn will improve the performance of Octez.</p>
<p>A benchmark program is considered “fair” when it's representative of how the benchmarked code is used in the real world, for example in its access patterns to Irmin. A standard database benchmark would first insert random data and then remove it. Such a synthetic benchmark would fail to reproduce the bottlenecks that occur when insertions and removals are interleaved. Our solution to “fairness” is radical: <em>replaying</em>. Within a sandboxed environment, we <em>replay</em> a real-world situation.</p>
<p>Basically, our new benchmark program exercises the benchmarked code and records statistics for later analysis. The program lives in the <code>irmin-bench</code> library and makes use of operation traces (called <em>action traces</em>) recorded when Octez runs with Irmin. Later, the program replays the recorded operations one at a time while simultaneously recording a large number of statistics (called <em>stat traces</em>). Data analysis of the stat traces may reveal many interesting facts about the behaviour of Irmin, especially if we tweak:</p>
<ul>
<li>the configuration of Irmin (e.g., what’s the impact of doubling the size of a certain cache?)</li>
<li>the replay parameters (e.g., does Irmin's performance decay over time? Does <code>irmin-pack</code> perform as well after 24 hours of replay as after 1 minute of replay?)</li>
<li>the hardware (e.g., does <code>irmin-pack</code> perform well on a Raspberry Pi?)</li>
<li>the code of Irmin (e.g., does this PR have an impact on performance?)</li>
</ul>
<p>This benchmarking process is similar to the record-replay feature available with <a href="https://docs.tezedge.com/tezedge/record-replay">TezEdge</a>.</p>
<h3>Recording the Action Trace</h3>
<p>By adding logs to Tezos, we can record the Tezos-Irmin interactions and thus capture the Irmin “view” of Tezos. We’ve recorded <em>action traces</em> during the <em>bootstrapping</em> phase of Tezos nodes, which started from <em>Genesis</em>—the name of the very first Tezos block inserted into an empty Irmin store.</p>
<p>The interaction surface between Irmin and Octez is quite simple, so we were able to reduce it to eight (8) elementary operations:</p>
<ul>
<li><code>checkout</code>, to pull an Irmin tree from disk;</li>
<li><code>find</code>, <code>mem</code> and <code>mem_tree</code>, read-only operations on an Irmin tree;</li>
<li><code>add</code>, <code>remove</code> and <code>copy</code>, write-only operations on an Irmin tree;</li>
<li><code>commit</code>, to push an Irmin tree to disk.</li>
</ul>
<p>It’s important to remember that Irmin behaves much like Git. It has built-in snapshotting and is compatible with Git itself when using the <code>irmin-git</code> library. In fact, these operations are very similar to Git, too.</p>
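<p>To make the eight operations more concrete, here is a minimal Python sketch of a toy in-memory store that replays a short, made-up action trace. It is not the Irmin API: the <code>ToyStore</code> class, the path-string keys, the exact semantics given to each operation, and the trace itself are all assumptions chosen for illustration.</p>

```python
import hashlib
import json


class ToyStore:
    """In-memory stand-in for a store on disk; NOT the Irmin API."""

    def __init__(self):
        self.commits = {}  # commit hash -> saved tree contents

    def checkout(self, commit_hash):
        # Pull a tree from "disk": hand back a private, mutable copy.
        return dict(self.commits[commit_hash])

    def commit(self, tree):
        # Push a tree to "disk" and return a content-derived hash.
        blob = json.dumps(sorted(tree.items())).encode()
        commit_hash = hashlib.sha256(blob).hexdigest()
        self.commits[commit_hash] = dict(tree)
        return commit_hash


def under(key, root):
    # True when key is root itself or lies below it in the path hierarchy.
    return key == root or key.startswith(root + "/")


# Read-only operations on a tree (a dict from path string to value).
def find(tree, key):
    return tree.get(key)

def mem(tree, key):
    return key in tree

def mem_tree(tree, key):
    return any(k.startswith(key + "/") for k in tree)


# Write-only operations on a tree.
def add(tree, key, value):
    tree[key] = value

def remove(tree, key):
    for k in [k for k in tree if under(k, key)]:
        del tree[k]

def copy(tree, src, dst):
    for k, v in list(tree.items()):
        if under(k, src):
            tree[dst + k[len(src):]] = v


READS = {"find": find, "mem": mem, "mem_tree": mem_tree}
WRITES = {"add": add, "remove": remove, "copy": copy}


def replay(trace):
    """Replay operations one at a time; return the store and the read results."""
    store, tree, last_commit, reads = ToyStore(), {}, None, []
    for name, *args in trace:
        if name == "commit":
            last_commit = store.commit(tree)
        elif name == "checkout":
            tree = store.checkout(last_commit)
        elif name in READS:
            reads.append((name, READS[name](tree, *args)))
        else:
            WRITES[name](tree, *args)
    return store, reads


# A hypothetical action trace, starting from an empty store.
TRACE = [
    ("add", "data/alice", "10"),
    ("add", "data/bob", "5"),
    ("commit",),
    ("checkout",),
    ("mem", "data/alice"),
    ("mem_tree", "data"),
    ("copy", "data", "backup"),
    ("remove", "data/bob"),
    ("find", "data/bob"),
    ("find", "backup/bob"),
    ("commit",),
]

store, reads = replay(TRACE)
print(reads)
print(len(store.commits), "commits")
```

<p>A real replay would wrap each call with timers and counters to build the stat trace; the toy version only collects the results of the read operations.</p>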
<h3>Sequence of Operations</h3>
<p>To illustrate further, here's a concrete example of an operation sequence inside an action trace:</p>
<p><img sizes="(min-width: 1360px) 1360px, (min-width: 680px) 680px, 100vw" srcset="/blog/images/2021-10-04.irmin-replay/ygWh3cg-170w~MsgLR_or00or2d16bLDqFA.webp 170w, /blog/images/2021-10-04.irmin-replay/ygWh3cg-340w~kZLkRsNWXeVu0m038jDClQ.webp 340w, /blog/images/2021-10-04.irmin-replay/ygWh3cg-680w~GfmE0tvrzPDwU80ZFf6dew.webp 680w, /blog/images/2021-10-04.irmin-replay/ygWh3cg-1360w~i_rCjfCJn4U5_VBsWemL5Q.webp 1360w" src="/blog/images/2021-10-04.irmin-replay/ygWh3cg-1360w~i_rCjfCJn4U5_VBsWemL5Q.webp" alt="An example of an execution trace"></p>
<p>This shows Octez’s first interaction with Irmin at the very beginning of the blockchain! The first block, <em>Genesis</em>, is quite small (it ends at operation #5), but the second one is massive (it ends at operation #309273). This second block contains no transactions because it only sets up the entire structure of the tree. It precedes the beginning of Tezos' initial protocol, called “Alpha I”.</p>
<h3>Benchmark Benefits</h3>
<p>Our benchmark results convey the sheer magnitude of the Tezos blockchain and the role that Irmin plays within it. We’ve recorded a trace that covers the blocks from the beginning of the blockchain in June 2018 all the way up to May 2021. It weighs <strong>96GB</strong>.</p>
<p>Although it took <strong>34 months</strong> for Tezos to reach that state, bootstrapping up to that point takes only <strong>170 hours</strong>, and replaying it takes a mere <strong>37 hours</strong>. This section of the blockchain contains <strong>1,343,486 blocks</strong>. On average, this corresponds to <strong>1 block per minute</strong> when the blocks were created, <strong>132 per minute</strong> when bootstrapping, and <strong>611 per minute</strong> during replay.</p>
<p>On this particular section of the blockchain, Octez had <strong>1,089,853,521 interactions</strong> with Irmin. On average, this corresponds to <strong>12 per second</strong> when the blocks were created, <strong>1782 per second</strong> during bootstrapping, and <strong>8258 per second</strong> during replay.</p>
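<p>These averages are plain divisions. The Python sketch below recomputes them from the rounded durations quoted in this post, so its results only approximate the published figures, which were presumably derived from unrounded durations. The names are ours, and a month is assumed to last 30.44 days on average.</p>

```python
# Figures quoted in the post; the durations are rounded.
BLOCKS = 1_343_486
INTERACTIONS = 1_089_853_521
HOURS = {
    "creation": 34 * 30.44 * 24,  # 34 months, assuming 30.44 days per month
    "bootstrap": 170,
    "replay": 37,
}


def blocks_per_minute(hours):
    """Average number of blocks handled per minute over a phase."""
    return BLOCKS / (hours * 60)


def interactions_per_second(hours):
    """Average number of Octez-Irmin interactions per second over a phase."""
    return INTERACTIONS / (hours * 3600)


if __name__ == "__main__":
    for phase, hours in HOURS.items():
        print(f"{phase}: {blocks_per_minute(hours):.1f} blocks/min, "
              f"{interactions_per_second(hours):.0f} interactions/s")
```

<p>With the rounded inputs, the bootstrap and creation rates match the quoted values after rounding, while the replay rates land within about one percent of them.</p>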
<p>The chart below shows how many times each Irmin operation occurs per block (on average):</p>
<center>
<p><img sizes="(min-width: 1360px) 1360px, (min-width: 680px) 680px, 100vw" srcset="/blog/images/2021-10-04.irmin-replay/4yKd8iQ-170w~Pp6X6ZwlONcoWRNjIHpJGQ.webp 170w, /blog/images/2021-10-04.irmin-replay/4yKd8iQ-340w~rukcm_nlHzc1f6j4kFYpzw.webp 340w, /blog/images/2021-10-04.irmin-replay/4yKd8iQ-680w~ZHHLrDb2kXaIUU_XGrVwBQ.webp 680w, /blog/images/2021-10-04.irmin-replay/4yKd8iQ-1360w~M7CSbufWlafzI_HcMpw0Gg.webp 1360w" src="/blog/images/2021-10-04.irmin-replay/4yKd8iQ-1360w~M7CSbufWlafzI_HcMpw0Gg.webp" alt="A chart showing the number of operations per blocks"></p>
</center>
<p>This next chart displays where the time is spent during replay:</p>
<center>
<p><img sizes="(min-width: 1360px) 1360px, (min-width: 680px) 680px, 100vw" srcset="/blog/images/2021-10-04.irmin-replay/u5Fv2Zb-170w~wubsnL7lVzgCZiwUAojPXA.webp 170w, /blog/images/2021-10-04.irmin-replay/u5Fv2Zb-340w~ida6crx7BD34lhdizhXX2g.webp 340w, /blog/images/2021-10-04.irmin-replay/u5Fv2Zb-680w~9X2AkBQYKyHjHBeTBWtNSg.webp 680w, /blog/images/2021-10-04.irmin-replay/u5Fv2Zb-1360w~Ak4wO0CdG_FL5wwerENo3A.webp 1360w" src="/blog/images/2021-10-04.irmin-replay/u5Fv2Zb-1360w~Ak4wO0CdG_FL5wwerENo3A.webp" alt="A chart showing the time spent during replay"></p>
</center>
<p>With <code>irmin-pack</code>, an OCaml thread managed by the <a href="https://github.com/mirage/index/"><code>index</code> library</a> (i.e., the <em>merge</em> thread) runs concurrently with the main thread, so a fraction of the durations shown above is actually spent in that thread. Refer to this <a href="/blog/2020-09-01-introducing-irmin-pack/">blog post</a> for more details on <code>index</code>'s <em>merges</em>.</p>
<p>The following chart illustrates how memory usage evolves during replay:</p>
<center>
<p><img sizes="(min-width: 1360px) 1360px, (min-width: 680px) 680px, 100vw" srcset="/blog/images/2021-10-04.irmin-replay/F0bORTg-170w~fcE-9fDJoj-3wpCPQ63Ryg.webp 170w, /blog/images/2021-10-04.irmin-replay/F0bORTg-340w~HYHnUwIOa4DS0fXHleEUgw.webp 340w, /blog/images/2021-10-04.irmin-replay/F0bORTg-680w~SofPK1vehEqbEftVxu7-qw.webp 680w, /blog/images/2021-10-04.irmin-replay/F0bORTg-1360w~vd9mSc2bXxxv-PBo7aMjcg.webp 1360w" src="/blog/images/2021-10-04.irmin-replay/F0bORTg-1360w~vd9mSc2bXxxv-PBo7aMjcg.webp" alt="Evolution of memory usage during replay "></p>
</center>
<p>On a logarithmic scale, this last chart shows the evolution of the <em>write amplification</em>, which indicates the amount of rewriting (e.g., at the end of the replay, 20TB of data have been written to disk in order to create a store that weighs 73GB).</p>
<center>
<p><img sizes="(min-width: 1360px) 1360px, (min-width: 680px) 680px, 100vw" srcset="/blog/images/2021-10-04.irmin-replay/PhNqloN-170w~uT69R_hufwMu3Szb-1ITbQ.webp 170w, /blog/images/2021-10-04.irmin-replay/PhNqloN-340w~Es6tpkSbHpQQwOSg0B5nkg.webp 340w, /blog/images/2021-10-04.irmin-replay/PhNqloN-680w~ZegfYAWNmgG3Ap8CyJ3lvg.webp 680w, /blog/images/2021-10-04.irmin-replay/PhNqloN-1360w~lnBhOqE_1-ULz-qq7z3RdQ.webp 1360w" src="/blog/images/2021-10-04.irmin-replay/PhNqloN-1360w~lnBhOqE_1-ULz-qq7z3RdQ.webp" alt="Evolution of write amplification during replay"></p>
</center>
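<p>As a rough worked example, the write amplification at the end of the replay can be estimated from the two figures quoted above. This sketch assumes decimal units (the post does not say whether TB and GB are decimal or binary), and the function name is ours.</p>

```python
TB = 10**12  # decimal units: an assumption, not stated in the post
GB = 10**9


def write_amplification(bytes_written, store_size):
    """Ratio of the bytes written to disk over the final size of the store."""
    return bytes_written / store_size


ratio = write_amplification(20 * TB, 73 * GB)
print(f"write amplification: about {ratio:.0f}x")
```

<p>In other words, each byte that ends up in the final store has been written to disk a few hundred times on average.</p>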
<p>The <em>merge</em> operations of the <code>index</code> library are the source of this poor <em>write amplification</em>. The Irmin team is working hard on improving this metric:</p>
<ul>
<li>on the one hand, the new <em>structured keys</em> feature of the upcoming Irmin 3.0 release will help to reduce the pressure on the <code>index</code> library,</li>
<li>on the other hand, we are working on algorithmic improvements of <code>index</code> itself.</li>
</ul>
<p>Another nice way to use the trace is for testing. When replaying a trace, we can recompute the commit hashes and check that they correspond to the trace hashes, so the benchmark doubles as an additional test that ensures we don't compromise the hashes computed in Tezos.</p>
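<p>To make the idea concrete, here is a toy Python model of replaying an action trace while checking commit hashes. It is only an illustration under heavy assumptions: the tree is a flat in-memory mapping from paths to values, the hash is SHA-256 over a JSON dump, and the names <code>tree_hash</code> and <code>replay</code> are hypothetical. None of this is Irmin's actual API, storage format, or hashing scheme.</p>

```python
import hashlib
import json


def tree_hash(tree):
    """Deterministic hash of a flat path-to-value mapping (toy commit hash)."""
    payload = json.dumps(tree, sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()


def replay(trace):
    """Replay operations one at a time; fail if a recomputed hash differs."""
    store = {}  # commit hash to committed tree, standing in for the disk
    tree = {}   # in-memory working tree
    for op, *args in trace:
        if op == "checkout":
            tree = dict(store[args[0]])
        elif op == "find":
            _ = tree.get(args[0])
        elif op == "mem":
            _ = args[0] in tree
        elif op == "mem_tree":
            _ = any(path.startswith(args[0] + "/") for path in tree)
        elif op == "add":
            tree[args[0]] = args[1]
        elif op == "remove":
            tree.pop(args[0], None)
        elif op == "copy":
            src, dst = args
            for path in list(tree):
                if path == src or path.startswith(src + "/"):
                    tree[dst + path[len(src):]] = tree[path]
        elif op == "commit":
            recomputed = tree_hash(tree)
            if recomputed != args[0]:
                raise ValueError(f"hash mismatch: expected {args[0]}, got {recomputed}")
            store[recomputed] = dict(tree)
        else:
            raise ValueError(f"unknown operation: {op}")
    return store


# A tiny hand-written trace; each commit carries the hash recorded at capture time.
first = tree_hash({"version": "1"})
second = tree_hash({"contracts/alice": "10", "contracts/bob": "10"})
trace = [
    ("add", "version", "1"),
    ("commit", first),
    ("checkout", first),
    ("add", "contracts/alice", "10"),
    ("mem", "contracts/alice"),
    ("copy", "contracts/alice", "contracts/bob"),
    ("remove", "version"),
    ("commit", second),
]
store = replay(trace)
print(f"replayed {len(trace)} operations, {len(store)} commits verified")
```

<p>If a code change altered how trees are hashed, the second <code>commit</code> would raise an error during replay, which is the property that lets a benchmark run double as a regression test.</p>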
<p>Complex changes to Tezos can be simulated first in Irmin. For example, the <a href="https://gitlab.com/tezos/tezos/-/merge_requests/2771">path flattening</a> feature in Tezos (merged in August 2021) can now be tested earlier in the process with our benchmark. Before the trace benchmarks existed, we first had to make such changes in Tezos and could only understand their repercussions on Irmin from the Tezos benchmarks.</p>
<p>Lastly, we continue to test alternative libraries and compare them with the ones integrated in Tezos; however, using these alternative libraries to build Tezos nodes has proven to be more complicated than merely adding them in Irmin and running our benchmarks. While testing continues on most new libraries, we can definitely use replays to compare our <a href="https://github.com/mirage/cactus/">new <code>cactus</code> library</a> as a replacement for our <a href="https://github.com/mirage/index/"><code>index</code> library</a>.</p>
<h3>Future Directions</h3>
<p>While the <em>action trace</em> recording was only made possible on a development branch of Octez, we would next like to upstream the feature to the main branch of Octez, which would give all users the option to record Tezos-Irmin interactions. This would simplify bug reporting overall.</p>
<p>Although the first version only deals with the <em>bootstrapping</em> phase of Tezos, an upcoming goal is to make it possible to benchmark the <em>bootstrapped</em> phase of Tezos as well. Additionally, we plan to replay the multiprocess aspects of a Tezos node in the near future.</p>
<p>The first stable version of this benchmark has existed in Irmin’s development branch since Q2 2021, and we will release it as part of <code>irmin-bench</code> for Irmin 3.0 in Q4 2021. This release will allow integration into the <a href="https://github.com/ocaml-bench/sandmark">Sandmark OCaml</a> benchmarking suite.</p>
<p>Follow the Tarides blog for future Irmin updates.</p>
]]></description><link>https://tarides.com/blog/2021-10-04-the-new-replaying-benchmark-in-irmin</link><guid isPermaLink="false">https://tarides.com/blog/2021-10-04-the-new-replaying-benchmark-in-irmin.html</guid><dc:creator><![CDATA[ Nicolas Goguey ]]></dc:creator><pubDate>Mon, 04 Oct 2021 00:00:00 GMT</pubDate></item><item><title><![CDATA[Announcing Tezos’ 8th protocol upgrade proposal: Hangzhou]]></title><description><![CDATA[<p>The last upgrade of the Tezos protocol, Granada, activated on August 6th, 2021.
We are now glad to announce a new protocol proposal, Hangzhou, the result of
collaborative work by various teams.</p>
<p><em>This is a joint post with <a href="https://www.nomadic-labs.com/">Nomadic Labs</a>,
<a href="https://marigold.dev/">Marigold</a>, <a href="https://www.oxheadalpha.com/">Oxhead Alpha</a>
and <a href="https://www.dailambda.jp/">DaiLambda</a>.</em></p>
]]></description><link>https://tarides.com/blog/2021-09-21-announcing-tezos-8th-protocol-upgrade-proposal-hangzhou</link><guid isPermaLink="false">https://tarides.com/blog/2021-09-21-announcing-tezos-8th-protocol-upgrade-proposal-hangzhou.html</guid><dc:creator><![CDATA[ Tarides ]]></dc:creator><pubDate>Tue, 21 Sep 2021 00:00:00 GMT</pubDate></item><item><title><![CDATA[Tarides Returns to FIC 2021]]></title><description><![CDATA[<p>Last year, Tarides had the honour of winning the “Coup de Coeur” Startup Award
at the International Cybersecurity Forum (FIC). It’s the leading cybersecurity
event in the EU. It’s both a forum, to present and discuss innovations and
reflect on the state of the European cybersecurity ecosystem, and a trade fair,
where cybersecurity and other tech professionals can meet and network.</p>
<p>This year, Tarides returns to FIC with their own booth! Our representatives,
including founder and CEO of Tarides, Thomas Gazagnaire, look forward to making
new connections, looking for collaborators, and catching up with colleagues.</p>
<p>The FIC theme for 2021 centres on encouraging “a collective and collaborative
cybersecurity.” From the <a href="https://www.forum-fic.com/en/home/discover/what-is-the-fic.htm">FIC
Website</a>:</p>
<blockquote>
<p>“Collective, because each stakeholder is responsible not only for its own
security but also for the security of every other stakeholder, and therefore of
the whole. Collaborative, because cooperation and information sharing are
essential to compensate the asymmetry between the «attacker» and the
«defender»…FIC 2021 will focus on the major operational, industrial,
technological, and strategic challenges of cooperation”</p>
</blockquote>
<p>Solutions for most cybersecurity issues already exist, and we at Tarides are
happy to consult on our many solutions that can make your systems more secure.
From secure emails to runtime protection to secure IoT protocols, our
representatives can help!</p>
<p>So stop by the Tarides booth at FIC to chat about your options. We’ll have
snacks and other goodies, too!</p>
]]></description><link>https://tarides.com/blog/2021-09-06-tarides-returns-to-fic-2021</link><guid isPermaLink="false">https://tarides.com/blog/2021-09-06-tarides-returns-to-fic-2021.html</guid><dc:creator><![CDATA[ Christine Rose ]]></dc:creator><pubDate>Mon, 06 Sep 2021 00:00:00 GMT</pubDate></item><item><title><![CDATA[Benchmarking OCaml projects with current-bench]]></title><description><![CDATA[<p>Regular CI systems are optimised for workloads that do not require stable performance over time. This makes them unsuitable for running performance benchmarks.</p>
<p><a href="https://github.com/ocurrent/current-bench"><code>current-bench</code></a> provides a predictable environment for performance benchmarks and a UI for analysing results over time. Similar to a CI system, it runs on pull requests and branches, which allows performance to be analysed and compared. It can currently be enabled as an app on GitHub repositories with zero configuration. Several public repositories are running <code>current-bench</code>, including <a href="https://github.com/mirage/irmin">Irmin</a> and <a href="https://github.com/ocaml/dune">Dune</a>. We plan to enable it on more projects in the future.</p>
<p>In this article, we give a technical overview of <code>current-bench</code>, showing how results are collected and analysed, the requirements for using it, and how we built the infrastructure for stable benchmarks. We also describe future work that would allow more OCaml projects to run <code>current-bench</code>.</p>
<h2>Introduction</h2>
<p>For performance-critical software, we must run benchmarks to ensure that there's no regression. Running benchmarks before the user submits their pull request is tedious, and since every user might have a different machine, you can't be sure whether a change actually improved or regressed performance.</p>
<p><code>current-bench</code> aims to solve this problem by providing a stable benchmarking platform that runs every time the user submits a pull request and compares the result to the benchmarks on the main branch. As <code>current-bench</code> is zero-configuration, users can enroll their repository to run benchmarks with ease. It has already helped projects guard against regressions, so you can merge code with more confidence.</p>
<h2>Architecture</h2>
<p><img sizes="(min-width: 1360px) 1360px, (min-width: 680px) 680px, 100vw" srcset="/blog/images/current-bench-arch-170w~oJHOFD3U3nYirSf2hir_VQ.webp 170w, /blog/images/current-bench-arch-340w~C09t8z5VF8_VOo1Mz9byVQ.webp 340w, /blog/images/current-bench-arch-680w~DZBeXsBMA186JmxcA6sdKA.webp 680w, /blog/images/current-bench-arch-1360w~kFjOHawptK1y2XpTTrgPtA.webp 1360w" src="/blog/images/current-bench-arch-1360w~kFjOHawptK1y2XpTTrgPtA.webp" alt="Figure 1: Current bench architecture"></p>
<h3>Benchmarking Pipeline</h3>
<p>As shown in Figure 1 (above), the benchmarking infrastructure uses <code>ocurrent</code><sup><a href="https://icfp20.sigplan.org/details/ocaml-2020-papers/6/OCaml-CI-A-Zero-Configuration-CI">1</a></sup>, an embedded domain-specific language for writing pipelines. <code>ocurrent</code> computes the build incrementally and helps with static analysis. Whenever a pull request is opened on a repository monitored by <code>current-bench</code>, a <code>POST</code> request is sent to the server running the pipeline. The pipeline fetches the head commit on the pull request and uses Docker to compile the code, and then it runs the <code>make bench</code> command inside the generated Docker image.</p>
<p>The pipeline runs on a single node, and the process is pinned to a single core to ensure there's no resource contention when running the benchmarks. Once finished, the raw JSON result is stored in a <code>Postgres</code> database, which the frontend can query using a <code>GraphQL</code> API, as shown in Figure 2 below.</p>
<p><img sizes="(min-width: 1360px) 1360px, (min-width: 680px) 680px, 100vw" srcset="/blog/images/current-bench-ui-170w~3CeGWIaWofdeMVXxhj53fg.webp 170w, /blog/images/current-bench-ui-340w~sUhq915_K8mo3EaTqHgE2A.webp 340w, /blog/images/current-bench-ui-680w~qtZcOlS6BF6Aig1DFmNfYg.webp 680w, /blog/images/current-bench-ui-1360w~esiIgN8ZOZXPV2nlVMVK-w.webp 1360w" src="/blog/images/current-bench-ui-1360w~esiIgN8ZOZXPV2nlVMVK-w.webp" alt="Figure 2: Current bench UI"></p>
<p>The frontend supports historical navigation and provides comparison with the default branch. It lets users select the pull request whose graphs they want to see. The graphs display the individual result of the head commit and the comparison with the commits on the default branch. The frontend lets users select the historical interval over which they want to compare benchmarks, and it also shows the standard deviation. Once the benchmarks have run successfully, the pipeline sets the pull request status to point to the frontend URL, so the user can look at the graphs.</p>
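<p>The post does not describe how a pull request result is judged against the default branch, so the Python sketch below shows just one plausible rule, not the one used by <code>current-bench</code>: flag a result when it falls more than a chosen number of standard deviations from the mean of recent default-branch results. The function name, the threshold, and the sample timings are all hypothetical, and the metric is assumed to be one where lower is better (e.g. a duration).</p>

```python
from statistics import mean, stdev


def compare_with_default_branch(main_results, pr_result, threshold=2.0):
    """Classify a pull request result against default-branch history."""
    baseline = mean(main_results)
    spread = stdev(main_results)
    delta = pr_result - baseline
    if delta > threshold * spread:
        verdict = "regression"  # noticeably slower than the baseline
    elif -delta > threshold * spread:
        verdict = "improvement"  # noticeably faster than the baseline
    else:
        verdict = "within noise"
    return {"baseline": baseline, "stdev": spread, "delta": delta, "verdict": verdict}


# Made-up timings (in seconds) for five commits on the default branch.
history = [10.0, 10.2, 9.8, 10.1, 9.9]
for candidate in (11.0, 10.1, 9.0):
    print(candidate, compare_with_default_branch(history, candidate)["verdict"])
```

<p>A rule of this kind is only meaningful when the measurements themselves are stable, which is what the hardware setup described next is for.</p>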
<h3>Hardware Optimisation</h3>
<p>Our <code>current-bench</code> uses the hardware optimisations developed for OCaml multicore compiler benchmarks <a href="https://github.com/ocaml-bench/ocaml_bench_scripts#notes-on-hardware-and-os-settings-for-linux-benchmarking">(presented at ICFP OCaml Workshop 2019)</a> with a few modifications to allow the benchmarks to run inside Docker containers. To get stable performance, we configured the kernel to isolate some of the CPU cores. Linux then automatically avoids scheduling other user processes on those cores. We also disabled IRQ handling and power saving.</p>
<p>The container that runs the benchmark is pinned to one of the isolated cores. Since I/O operations can make the benchmarks less stable, we use an in-memory <code>tmpfs</code> partition in <code>/dev/shm</code> for all storage. For NUMA-enabled systems, we configure this partition to be allocated on the NUMA node of the isolated core. The pipeline automatically disables ASLR inside the container. Because the default Docker seccomp profile normally blocks this, we have modified the profile to allow the <code>personality(2)</code> syscall.</p>
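<p>The post does not say which mechanism pins the container to its core. As a general illustration of CPU pinning on Linux, and not of the <code>current-bench</code> implementation, a process can restrict itself to one core through the scheduler affinity API, exposed in Python as <code>os.sched_setaffinity</code>. The helper name and the choice of core below are ours.</p>

```python
import os


def pin_to_one_core(pid=0):
    """Restrict a process (0 means the caller) to a single CPU core.

    Returns the chosen core, or None where the affinity API is unavailable
    (it is Linux-specific).
    """
    if not hasattr(os, "sched_setaffinity"):
        return None
    allowed = sorted(os.sched_getaffinity(pid))
    core = allowed[-1]  # arbitrary pick; a real setup would name an isolated core
    os.sched_setaffinity(pid, {core})
    return core


if __name__ == "__main__":
    print("pinned to core:", pin_to_one_core())
```

<p>Pinning alone does not keep other processes off the chosen core, which is why the setup described above also isolates cores at the kernel level.</p>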
<h2>Enrolling a repository</h2>
<p>To enroll a repository, you need to ensure the following:</p>
<ul>
<li>The <a href="https://github.com/marketplace/ocaml-benchmarks">ocaml-benchmarks</a> GitHub app is enabled for your repository.</li>
<li>The repository needs a <code>bench</code> Makefile target. This is triggered from the <code>current-bench</code> pipeline.</li>
<li>The output of the <code>make bench</code> target is JSON, which can be parsed by the pipeline and displayed by the frontend.</li>
</ul>
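<p>To give an idea of what a <code>bench</code> target might run, here is a minimal Python sketch of a benchmark script that prints its results as JSON on standard output. The field names (<code>results</code>, <code>name</code>, <code>metrics</code>, <code>time_s</code>) are placeholders chosen for illustration; the schema actually expected by <code>current-bench</code> is defined in its repository and should be checked there. An OCaml project would more likely write this script in OCaml, but the shape of the output is what matters.</p>

```python
import json
import time


def measure(fn, repeat=5):
    """Return the best wall-clock time (in seconds) over several runs of fn."""
    timings = []
    for _ in range(repeat):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return min(timings)


def run_benchmarks(benchmarks):
    """Run each named benchmark and collect the results in one JSON-ready dict."""
    # Illustrative field names only, not the actual current-bench schema.
    return {
        "results": [
            {"name": name, "metrics": {"time_s": measure(fn)}}
            for name, fn in benchmarks.items()
        ]
    }


if __name__ == "__main__":
    report = run_benchmarks({"sum_100k": lambda: sum(range(100_000))})
    print(json.dumps(report))
```

<p>Printing a single JSON document and nothing else keeps the output easy for a pipeline to parse.</p>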
<h2>Future work</h2>
<p>Anyone who wants to roll out a continuous, zero-configuration benchmarking infrastructure can set up the <code>current-bench</code> infrastructure. In the future, we want to scale <code>current-bench</code> by isolating cores on multiple machines and adding a scheduler to ensure that benchmarks use only one core at a time per machine. We plan to add support for different benchmarking libraries that repositories can use; at present, we support repositories using <code>bechamel</code>. We also aim to make the adoption of <code>current-bench</code> easier by adding a conversion library that can convert any benchmark output into output parseable by <code>current-bench</code>. We intend to add support for <code>quick</code> and <code>slow</code> benchmarks, which would give users faster feedback loops on pull requests while still letting them run more extensive, time-consuming benchmarks for a fuller picture of performance.</p>
<p>Thank you for reading! You can check out the implementation for <code>current-bench</code> <a href="https://github.com/ocurrent/current-bench">here</a>!</p>
]]></description><link>https://tarides.com/blog/2021-08-26-benchmarking-ocaml-projects-with-current-bench</link><guid isPermaLink="false">https://tarides.com/blog/2021-08-26-benchmarking-ocaml-projects-with-current-bench.html</guid><dc:creator><![CDATA[ Gargi Sharma ]]></dc:creator><pubDate>Thu, 26 Aug 2021 00:00:00 GMT</pubDate></item><item><title><![CDATA[Tarides Engineers to Present at ICFP 2021]]></title><description><![CDATA[<p>This year marks the 25th anniversary of the OCaml Language! It's an exciting
time for OCaml programmers and enthusiasts. A fun and informative way to
celebrate OCaml's birthday is to attend the <a href="https://icfp21.sigplan.org/home/ocaml-2021">26th Annual International
Conference on Functional
Programming</a> (ICFP), held online
this year due to ongoing Covid restrictions. While this is disappointing news
for so many, it's beneficial to those of you outside France because now you
can hear professionals talk about cutting edge technology from the comfort of
your own home.</p>
<p>Tarides engineers, as well as our colleagues at <a href="https://ocamllabs.io">OCaml Labs
Consultancy</a> and <a href="https://segfault.systems/">Segfault Systems</a>,
have some exciting presentations at this year's ICFP! Listen to talks on running
OCaml on multiple cores, generating fuzzing suites, benchmarking, and the
experimental OCaml effects.</p>
<p>You can search the <a href="https://icfp21.sigplan.org/program/program-icfp-2021/?past=Show%20upcoming%20events%20only&amp;date=Fri%2027%20Aug%202021">complete ICFP
Timetable</a> for other topics of interest and read
below about our engineers' projects and presentations. Times are listed both in
London (GMT +1) and Paris (GMT +2) for ease of planning. The following talks are
scheduled for Friday, 27 August 2021.</p>
<p>Grab a cup of coffee for our first morning talk at <strong>9am London / 10am Paris</strong>
and learn about <strong>Adapting the OCaml Ecosystem for Multicore OCaml.</strong> With the
soon-to-be released OCaml 5.0, there will be support for Shared-Memory
Parallelism. There’s increasing interest in the community to port existing
libraries to Multicore, so this talk will cover the arrival of Multicore and
what that means to the OCaml ecosystem. Our engineers will highlight existing
tools and provide methods for a smooth transition, so viewers can benefit from
Multicore parallelism. They'll also share some insights from their experience
porting existing libraries to Multicore OCaml.</p>
<p>Read more about this topic in today's post at <a href="https://segfault.systems/blog/2021/adapting-to-multicore/">Segfault
Systems</a>, written by
one of tomorrow's presenters, <a href="https://icfp21.sigplan.org/profile/sudhaparimala">Sudha
Parimala</a> of Segfault Systems.
Joining Sudha for the presentation are <a href="https://icfp21.sigplan.org/profile/enguerranddecorne1">Enguerrand
Decorne</a> (Tarides),
<a href="https://icfp21.sigplan.org/profile/sadiqjaffer">Sadiq Jaffer</a> (Opsian and OCaml
Labs Consultancy), <a href="https://icfp21.sigplan.org/profile/tomkelly">Tom Kelly</a>
(OCaml Labs Consultancy), and <a href="https://icfp21.sigplan.org/profile/kcsivaramakrishnan">KC
Sivaramakrishnan</a> of IIT
Madras.</p>
<p>Next up is <strong>Leveraging Formal Specifications to Generate Fuzzing Suites</strong> at
<strong>11:10 London / 12:10 Paris,</strong> presented by Tarides' own <a href="https://icfp21.sigplan.org/profile/nicolasosborne">Nicolas
Osborne</a> and <a href="https://icfp21.sigplan.org/profile/clementpascutto">Clément
Pascutto</a>. They'll discuss
how developers typically first have to capture the semantics they want when
checking a library and then write the code implementing these tests and find
relevant test cases that expose possible misbehaviours. Through their work,
they'll present a tool that automatically takes care of those last two steps by
automatically generating fuzz testing suites from OCaml interfaces annotated
with formal behavioural specifications. They'll also show some ongoing
experiments on fuzzing capabilities and limitations applied to real-world
libraries.</p>
<p>Next up is our talk on <strong>Continuous Benchmarking for
OCaml Projects</strong> at <strong>12:30 London / 13:30 Paris</strong>. Regular CI systems are
optimised for workloads that do not require stable performance over time, which
makes them unsuitable for running performance benchmarks. Tarides engineers
<a href="https://icfp21.sigplan.org/profile/gargisharma">Gargi Sharma</a>, <a href="https://icfp21.sigplan.org/profile/rizoisrof">Rizo
Isrof</a>, and <a href="https://icfp21.sigplan.org/profile/magnusskjegstad">Magnus
Skjegstad</a> will discuss how
<code>current-bench</code> provides a predictable environment for performance benchmarks
and a UI for analysing results over time. Similar to a CI system, it runs on pull
requests and branches, allowing performance to be analysed and compared, and it
can currently be enabled as an app on GitHub repositories with zero
configuration. Several public repositories already run <code>current-bench</code>,
including <a href="https://github.com/mirage/irmin">Irmin</a> and
<a href="https://github.com/ocaml/dune">Dune</a>, and they plan to enable it on more
projects in the future. <a href="/blog/2021-08-26-benchmarking-ocaml-projects-with-current-bench/">Read Gargi's recent blog post for more information on
benchmarking</a>.</p>
<p>In this presentation, they will give a technical overview of <code>current-bench</code>, showing how results are collected and analysed, requirements for using it, and how they built the infrastructure for stable benchmarks. They'll also cover some future work that will allow more OCaml projects to run <code>current-bench</code>.</p>
<p>Immediately after the Benchmarking talk, catch <strong>A Multiverse of Glorious Documentation</strong>
scheduled at <strong>12:50 London / 13:50 Paris.</strong> <a href="https://icfp21.sigplan.org/profile/lucaspluvinage1">Lucas
Pluvinage</a> of Tarides and
<a href="https://icfp21.sigplan.org/profile/jonathanludlam">Jonathan Ludlam</a> of OCaml
Labs Consultancy will discuss the process of generating documentation for every
version of every package that can be built from the Opam repository and present
it as a single coherent website that's continuously updated as new packages are
released and old packages are updated. They will address the challenges of
caching, handling different compiler versions, and incompatible libraries. The
process has been implemented as an OCurrent pipeline named <code>ocaml-docs-ci</code> and
is already available on GitHub. It has been used to produce the documentation of
more than 10,000 package versions, generating 2.5M HTML pages. That's 38GB of
artifacts!</p>
<p>After a relaxing lunch, come back for <strong>Experiences with Effects</strong> at <strong>15:30
London / 16:30 Paris</strong>. Join OCaml Labs and Tarides engineers <a href="https://icfp21.sigplan.org/profile/thomasleonard">Thomas
Leonard</a>, <a href="https://icfp21.sigplan.org/profile/craigferguson">Craig
Ferguson</a>, <a href="https://icfp21.sigplan.org/profile/patrickferris">Patrick
Ferris</a>, <a href="https://icfp21.sigplan.org/profile/sadiqjaffer">Sadiq
Jaffer</a>, <a href="https://icfp21.sigplan.org/profile/tomkelly">Tom
Kelly</a>, <a href="https://icfp21.sigplan.org/profile/kcsivaramakrishnan">KC
Sivaramakrishnan</a>, and
<a href="https://icfp21.sigplan.org/profile/anilmadhavapeddy">Anil Madhavapeddy</a> as they
talk about an exciting, experimental branch of Multicore OCaml that adds support
for effect handlers. In this presentation, they'll discuss their experiences
with effects, both from converting existing code and from writing new code. They
discovered that converting the Angstrom parser from a callback style to effects
greatly simplified the code while also improving performance and reducing
allocations. Their <a href="https://github.com/ocaml-multicore/eio">experimental Eio
library</a> uses effects, which allows
writing concurrent code in direct style, without the need for monads (as found
in Lwt or Async).</p>
<p>Enjoy a full day of OCaml innovation and get to know some of our talented
engineers better by joining Tarides, OCaml Labs, and Segfault Systems at ICFP on
Friday, 27 August 2021. See you there!</p>
]]></description><link>https://tarides.com/blog/2021-08-26-tarides-engineers-to-present-at-icfp-2021</link><guid isPermaLink="false">https://tarides.com/blog/2021-08-26-tarides-engineers-to-present-at-icfp-2021.html</guid><dc:creator><![CDATA[ Christine Rose ]]></dc:creator><pubDate>Thu, 26 Aug 2021 00:00:00 GMT</pubDate></item><item><title><![CDATA[Tarides at WomenHack Virtual Event]]></title><description><![CDATA[<p>Tarides takes great pride in a diverse workforce and strives to continue
bringing talented people to its team from around the globe. This is why Sonja
Heinze, a Tarides software engineer, and the Head of HR, Héloïse Lutton, will
attend WomenHack, an online event dedicated to recruiting more women into the
tech world. They're participating not only to present Tarides to the Women In
Tech community, but also to network and possibly find talented new programmers
to join our growing team.</p>
<p>The <strong>WomenHack</strong> event has a unique setup similar to 'speed dating,' but for
prospective jobs. Each candidate is paired with a company, and they have 5
minutes to chat before moving on to the next company. This "rapid interview" is
both fun and efficient for all involved, and it ensures each company will meet
several talented candidates from diverse backgrounds, which increases their
chance of finding that perfect fit for an open position!</p>
<p>From their website:</p>
<blockquote>
<p><strong><a href="https://womenhack.com/">WomenHack</a></strong> is a community that empowers women in
tech through events, jobs, and reviews. We aim to create a more inclusive and
diverse workplace for all. Our diversity recruiting events target some of the
most talented women in tech which include software developers, designers, and
product talent.</p>
</blockquote>
<p>Join us tomorrow, July 21st, at WomenHack! You can <a href="https://womenhack.com/events/72907/?tickets">get your ticket
here</a>.</p>
]]></description><link>https://tarides.com/blog/2021-07-20-tarides-at-womenhack-virtual-event</link><guid isPermaLink="false">https://tarides.com/blog/2021-07-20-tarides-at-womenhack-virtual-event.html</guid><dc:creator><![CDATA[ Christine Rose ]]></dc:creator><pubDate>Tue, 20 Jul 2021 00:00:00 GMT</pubDate></item><item><title><![CDATA[Tarides Introduces OSMOSE at the Open-Source Innovation Sprint]]></title><description><![CDATA[<p>Tarides is excited to announce that our CEO, Dr. Thomas Gazagnaire, and Prof.
Anil Madhavapeddy, from the University of Cambridge, will present their
innovative platform OSMOSE at the Open Source Innovation Sprint (OSIS)
conference on 1 July 2021. This event is organized by
<a href="https://systematic-paris-region.org">Systematic</a>.</p>
<p>OSMOSE is a software platform made to manage digital infrastructure at scale,
securely and efficiently. It uses the groundbreaking creation of unikernels to
radically simplify the way applications are built and deployed for the cloud.</p>
<p>Since digital transformation has become more and more dependent on cloud
computing, it has increasing problems with high response latency, security
risks, and resource inefficiencies, all of which make these services ultimately
unreliable. The demand for interconnected devices is ever increasing, but the
security of these devices remains unchecked, making them susceptible to security
vulnerabilities. This leaves consumers and businesses open to exploitation, as
demonstrated in reports of tech devices violating users’ privacy, like sending
audio recordings without their knowledge or consent.</p>
<p>Tarides addresses these issues with OSMOSE, a platform which combines hardware
and software elements to invert the current cloud-centric model. OSMOSE securely
connects with physical spaces to provide extremely low latency and
high-bandwidth, local-area computation capabilities, which can turn a fleet of
IoT devices into a local data-centre.</p>
<p>This innovative platform enables computer resources to be tracked efficiently
and temporarily rented to users. This turns any IoT deployment into a local,
private cloud, allowing a better utilization of local resources and improved
security.</p>
<p>Major components of OSMOSE already have commercial applications. It’s been used
to make existing cloud deployments more secure and efficient by companies such
as Amazon, Citrix, and Docker.</p>
<p>Tarides applies a high-touch, mixed business strategy—using consultancy services
to field test open source components under development. Tarides applies their
research to real-world systems to build unikernels: secure-by-design and
resource-efficient applications specialised to their run-time environments. If
interested in using OSMOSE as a solution for your business,
<a href="/contact/">please reach out to Tarides</a> for more
information.</p>
<p><a href="https://systematic-paris-region.org/evenement/open-source-innovation-spring-edge-iot/">Register for OSIS</a>
to attend the OSMOSE presentation on 1 July 2021.</p>
]]></description><link>https://tarides.com/blog/2021-06-29-tarides-introduces-osmose-at-the-open-source-innovation-sprint</link><guid isPermaLink="false">https://tarides.com/blog/2021-06-29-tarides-introduces-osmose-at-the-open-source-innovation-sprint.html</guid><dc:creator><![CDATA[ Christine Rose ]]></dc:creator><pubDate>Tue, 29 Jun 2021 00:00:00 GMT</pubDate></item><item><title><![CDATA[Tarides project SCoP is selected as one of the brightest Data Portability projects in Europe!]]></title><description><![CDATA[<p><strong>Tarides is taking part in the Data Portability &amp; Services Incubator (DAPSI), a
3-year EU funded project that empowers internet innovators to develop new
solutions in the Data Portability field.</strong></p>
<h2>What is DAPSI?</h2>
<p>The <a href="https://dapsi.ngi.eu">Data Portability and Services Incubator (DAPSI)</a> is
an EU funded project, under the European Commission’s Next Generation Internet
(NGI) initiative. The aim of this initiative is to empower top internet innovators
to develop human-centric solutions. DAPSI addresses the challenge of personal
data portability on the internet, as foreseen under the GDPR, and aims to make it
significantly easier for citizens to have any data stored with one
service provider transmitted directly to another provider.</p>
<p>Take a look at the <a href="https://dapsi.ngi.eu/hall-of-fame/">DAPSI innovators
portfolio</a> to see more information about the
selected projects.</p>
<h2>What is our project?</h2>
<p>Our project, called <strong>SCoP</strong> for <strong>Secure-by-design Communication Protocols</strong>,
is taking part in the DAPSI to tackle data portability issues in communication
services.</p>
<p>Over the past few decades, email has been adopted on a massive scale by
both individuals and companies. Billions of emails are sent every day, and this
number is expected to reach 333 billion emails exchanged daily
in 2022. Moreover, as managing internet communication stacks has become
increasingly complex, end-users have tended to entrust this task to third-party
companies like Google and Microsoft. Furthermore, existing implementations of
these communication services rely on ad-hoc methodologies and memory-unsafe
languages, where minor developer errors can easily escalate into major security
flaws. The centralization of these communication services means that a single
successful attack leads to major personal data breaches.</p>
<p>To fix this issue, <strong>our project aims to engineer a modern basis for open
messaging that supports existing protocols such as emails but is also extensible
and customizable for emerging protocols such as matrix</strong>. We will be building
trustable implementations of these open protocols using type-safe languages and
we will deploy these implementations as specialized, secure and resource
efficient unikernels. They will become the basis of the communication system of
OSMOSE, Tarides’ commercial solution for secure-by-design IoT infrastructure.</p>
<p>Every component of that system will be carefully designed as independent
libraries, using modern development techniques to avoid the common reported
threats and flaws. For instance, the implementation of protocol parsers and
serializers will be written in a type-safe language and tested with fuzzing,
i.e. state-of-the-art coverage-driven tests. The combination of these techniques
will give users the confidence to migrate their personal data to these new secure
services.</p>
<p>Moreover, these techniques are also useful to produce a large and reusable
corpus of test materials, which we plan to release separately for other
implementations to use. It will give the tools to other developers to write the
next generation of messaging applications by extending the existing protocols
with more confidence.</p>
<h2>Want to be part of it?</h2>
<p>Would you like to hear more about the project? Or want to deploy our solution?
This project will build on a number of existing components in
<a href="https://mirage.io">MirageOS</a>, such as
<a href="/blog/2019-09-25-mr-mime-parse-and-generate-emails/">MrMime</a>
and <a href="/blog/2020-09-08-irmin-september-2020-update/">Irmin</a>,
so feel free to contribute to these existing components! Please <a href="/contact/">contact us</a>.</p>
<br>
<p><img sizes="(min-width: 1360px) 1360px, (min-width: 680px) 680px, 100vw" srcset="/blog/images/DAPSI_generic-170w~iDbXLn3WcT-7eJLTnJWyOg.webp 170w, /blog/images/DAPSI_generic-340w~ZGfMn5IbiT56z-FHBcuDVA.webp 340w, /blog/images/DAPSI_generic-680w~TyXGVMfW4ULusIzlJDNAXA.webp 680w, /blog/images/DAPSI_generic-1360w~wjT-2RalNrPjOAem6u2MDw.webp 1360w" src="/blog/images/DAPSI_generic-1360w~wjT-2RalNrPjOAem6u2MDw.webp" alt="Sequence of entity logos: in association with NGI, EU, Zabala, FGS, cap-digital, IMT Starter, Fraunhofer IAIS."></p>
]]></description><link>https://tarides.com/blog/2021-04-30-scop-selected-for-dapsi-initiative</link><guid isPermaLink="false">https://tarides.com/blog/2021-04-30-scop-selected-for-dapsi-initiative.html</guid><dc:creator><![CDATA[ Céline Laplassotte ]]></dc:creator><pubDate>Fri, 30 Apr 2021 00:00:00 GMT</pubDate></item><item><title><![CDATA[Florence and beyond: the future of Tezos storage]]></title><description><![CDATA[<p>In collaboration with Nomadic Labs, Marigold and DaiLambda, we're happy to
announce the completion of the next Tezos protocol proposal:
<a href="https://doc.tzalpha.net/protocols/009_florence.html"><strong>Florence</strong></a>.</p>
<p><a href="https://tezos.com/">Tezos</a> is an open-source decentralised blockchain network providing a
platform for smart contracts and digital assets. A crucial feature of Tezos is
<a href="https://tezos.com/static/white_paper-2dc8c02267a8fb86bd67a108199441bf.pdf"><em>self-amendment</em></a>: the network protocol can be upgraded
dynamically by the network participants themselves. This amendment process is
initiated when a participant makes a <em>proposal</em>, which is then subject to a
vote. After several years working on the Tezos storage stack, this is our first
contribution to a proposal; we hope that it will be the first of many!</p>
<p>As detailed in today's <a href="https://blog.nomadic-labs.com/florence-our-next-protocol-upgrade-proposal.html">announcement from Nomadic Labs</a>,
the Florence proposal contains several important changes, from the introduction
of Baking Accounts to major quality-of-life improvements for smart contract
developers. Of all of these changes, we're especially excited about the
introduction of <em>sub-trees</em> to the blockchain context API. In this post, we'll
give a brief tour of what these sub-trees will bring for the future of Tezos.
But first, what <em>are</em> they?</p>
<h3>Merkle sub-trees</h3>
<p>The Tezos protocol runs on top of a versioned tree called the “context”, which
holds the chain state (balances, contracts etc.). Ever since the pre-Alpha era,
the Tezos context has been implemented using <a href="https://github.com/mirage/irmin">Irmin</a> – an open-source
Merkle tree database originally written for use by MirageOS unikernels.</p>
<p>For MirageOS, Irmin’s key strength is flexibility: it can run over arbitrary
backends. This is a perfect fit for Tezos, which must be agile and
widely-deployable. Indeed, the Tezos shell has already leveraged this agility
many times, all the way from initial prototypes using a Git backend to the
optimised <a href="/blog/2020-09-01-introducing-irmin-pack/"><code>irmin-pack</code></a> implementation used today.</p>
<p>But Irmin can do more than just swapping backends! It also allows users to
manipulate the underlying Merkle tree structure of the store with a high-level
API. This “<a href="https://mirage.github.io/irmin/irmin/Irmin/module-type-S/Tree/">Tree</a>” API enables lots of interesting use-cases of
Irmin, from mergeable data types (<a href="https://kcsrk.info/papers/banyan_aplas20.pdf">MRDTs</a>) to zero-knowledge proofs.
Tezos doesn't use these more powerful features directly yet; that’s where Merkle
proofs come in!</p>
<h3>Proofs and lightweight Tezos clients</h3>
<p>Since the Tezos context keeps track of the current "state" of the blockchain,
each participant needs their own copy of the tree to run transactions against.
This context can grow to be very large, so it's important that it be stored as
compactly as possible: this goal shaped the design of <code>irmin-pack</code>, our latest
Irmin backend.</p>
<p>However, it's possible to reduce the storage requirements even further via the
magic of Merkle trees: individuals only need to store a <em>fragment</em> of the root
tree, provided they can demonstrate that this fragment is valid by sending
“<a href="https://bentnib.org/posts/2016-04-12-authenticated-data-structures-as-a-library.html">proofs</a>” of its membership to the other participants.</p>
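<p>To make the idea of a membership proof concrete, here is a minimal sketch of a toy binary Merkle tree. It is not how Irmin or Tezos represent proofs: the <code>tree</code>, <code>prove</code> and <code>verify</code> names are invented for illustration, and the standard library's <code>Digest</code> (MD5) merely stands in for a real cryptographic hash.</p>
<pre><code>(* Toy binary Merkle tree over strings. Digest (MD5) from the standard
   library stands in for a real cryptographic hash. *)
type tree = Leaf of string | Node of tree * tree

let h s = Digest.to_hex (Digest.string s)

let rec hash = function
  | Leaf v -&gt; h ("leaf:" ^ v)
  | Node (l, r) -&gt; h ("node:" ^ hash l ^ hash r)

(* A proof lists, from the leaf up to the root, the hash of the sibling
   at each level and the side on which that sibling sits. *)
type side = Left | Right

(* Build a proof for the leaf reached by following [path]
   ([false] = go left, [true] = go right). *)
let rec prove tree path =
  match tree, path with
  | Leaf _, [] -&gt; []
  | Node (l, r), false :: rest -&gt; prove l rest @ [ (Right, hash r) ]
  | Node (l, r), true :: rest -&gt; prove r rest @ [ (Left, hash l) ]
  | _ -&gt; invalid_arg "prove: path does not lead to a leaf"

(* Recompute the root hash from a single value and its proof. *)
let root_of value proof =
  List.fold_left
    (fun acc (side, sibling) -&gt;
      match side with
      | Left -&gt; h ("node:" ^ sibling ^ acc)
      | Right -&gt; h ("node:" ^ acc ^ sibling))
    (h ("leaf:" ^ value))
    proof

(* A client that only knows the root hash can check a claimed value. *)
let verify ~root value proof = String.equal root (root_of value proof)
</code></pre>
<p>In this sketch the verifier never sees the rest of the tree: it needs only the root hash, the value, and one sibling hash per level.</p>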
<p>This property can be used to support ultra-lightweight Tezos clients, a feature
<a href="https://gitlab.com/smelc/tezos/-/commits/tweag-client-light-mode">currently being developed</a> by TweagIO. To make this a reality,
the Tezos protocol needs fine-grained access to context sub-trees in order to build
Merkle proofs out of them. Fortunately, Irmin already supports this! We
<a href="https://gitlab.com/tezos/tezos/-/merge_requests/2457">extended the protocol</a> to understand sub-trees, lifting the power
of Merkle trees to the user.</p>
<p>We’re excited to work with TweagIO and Nomadic Labs on lowering the barriers to
entering the Tezos ecosystem and look forward to seeing what they achieve with
sub-trees!</p>
<h3>Efficient Merkle proof representations</h3>
<p>Simply exposing sub-trees in the Tezos context API isn’t quite enough:
lightweight clients will also need to <em>serialize</em> them efficiently, since proofs
must be exchanged over the network to establish trust between collaborating
nodes. Enter <a href="https://dailambda.jp/blog/2020-05-11-plebeia/">Plebeia</a>.</p>
<p>Plebeia is an alternative Tezos storage layer – developed by DaiLambda – with
strengths that complement those of Irmin. In particular, Plebeia is capable of
generating very compact Merkle proofs. This is partly due to its specialized
store structure, and partly due to clever optimizations such as path compression
and inlining.</p>
<p>We’re working with the DaiLambda team to unite the strengths of Irmin and
Plebeia, which will bring built-in Merkle proof support to the Tezos storage
stack. The future is bright for Merkle proofs in Tezos!</p>
<h3>Baking account migrations</h3>
<p>Trees don’t just enable <em>new</em> features; they have a big impact on performance
too! Currently, indexing into the context always happens from its <em>root</em>, which
duplicates effort when accessing adjacent values deep in the tree. Fortunately,
the new sub-trees provide a natural representation for “cursors” into the
context, allowing the protocol to optimize its interactions with the storage
layer.</p>
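<p>The difference between indexing from the root and indexing through a sub-tree can be sketched with a toy context. The types and function names below are assumptions made for illustration; they model only the access pattern, not the actual Irmin or Tezos context API.</p>
<pre><code>(* Toy context: directories of named children with values at the leaves. *)
type tree = Value of string | Dir of (string * tree) list

(* Walk [path] starting from [t]; the cost grows with the path length. *)
let rec find_tree t path =
  match t, path with
  | _, [] -&gt; Some t
  | Dir children, step :: rest -&gt;
      (match List.assoc_opt step children with
       | Some child -&gt; find_tree child rest
       | None -&gt; None)
  | Value _, _ :: _ -&gt; None

let find t path =
  match find_tree t path with Some (Value v) -&gt; Some v | _ -&gt; None

let context =
  Dir [ "contracts",
        Dir [ "alice", Dir [ "balance", Value "10"; "delegate", Value "bob" ] ] ]

(* From the root: the shared prefix is traversed again for every lookup. *)
let balance = find context [ "contracts"; "alice"; "balance" ]

(* With a sub-tree: traverse the prefix once, then index relative to it. *)
let alice = find_tree context [ "contracts"; "alice" ]
let delegate = Option.bind alice (fun sub -&gt; find sub [ "delegate" ])
</code></pre>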
<p>To take just one example, DaiLambda recently exploited this feature to reduce
the migration time necessary to introduce Baking Accounts to the network by a
factor of 15! We’ll be teaming up with Nomadic Labs and DaiLambda to ensure that
Tezos extracts every bit of performance from its storage.</p>
<p>It's especially exciting to have access to lightning-fast storage migrations,
since this enables Tezos to evolve rapidly even as the ecosystem expands.</p>
<h3>Storage in other languages</h3>
<p>Of course, Tezos isn’t just an OCaml project: the storage layer also has a
performant Rust implementation as part of <a href="https://github.com/simplestaking/tezedge">TezEdge</a>. We’re working with
<a href="https://github.com/simplestaking">Simple Staking</a> to bring Irmin to the Rust community via an
<a href="https://github.com/simplestaking/ocaml-interop">FFI toolchain</a>, enabling closer alignment between the different
Tezos shell implementations.</p>
<h3>Conclusion</h3>
<p>All in all, it’s an exciting time to work on Tezos storage, with many
open-source collaborators from around the world. We’re especially happy to see
Tezos taking greater advantage of Irmin’s features, which will strengthen both
projects and help them grow together.</p>
<p>If all of this sounds interesting, you can play with it yourself using the
recently-released <a href="https://github.com/mirage/irmin">Irmin 2.5.0</a>. Thanks for reading, and stay tuned for
future Tezos development updates!</p>
]]></description><link>https://tarides.com/blog/2021-03-04-florence-and-beyond-the-future-of-tezos-storage</link><guid isPermaLink="false">https://tarides.com/blog/2021-03-04-florence-and-beyond-the-future-of-tezos-storage.html</guid><dc:creator><![CDATA[ Craig Ferguson ]]></dc:creator><pubDate>Thu, 04 Mar 2021 00:00:00 GMT</pubDate></item><item><title><![CDATA[Partnering for more diversity in Tech]]></title><description><![CDATA[<p>Tarides is very glad to announce our partnership with <a href="https://adatechschool.fr">Ada Tech
School</a>.</p>
<p>Founded in 2019 and based in Paris (France), Ada Tech School, named for pioneer
computer scientist <a href="https://en.wikipedia.org/wiki/Ada_Lovelace">Ada Lovelace</a>,
is a programming school designed for women but open to all. The program is
driven by three values: feminism, empathy and singularity. Its mission is to
facilitate access to programming positions and promote the feminization of tech,
by creating training that tackles the gender and cultural biases of IT.</p>
<p>Unfortunately, the diversity of the candidate pool is very limited when a
company tries to fill positions. Barely 10% of computer science students in
France are women. Ada Tech School is an excellent initiative to democratize
software education amongst women. The school was created to help women land
a job in the IT industry through rigorous training, and it then offers
ongoing coaching and support to help them ascend the professional ladder within tech
companies.</p>
<p>At Tarides, we believe that a healthy team is a diverse one; and that trust,
fairness and inclusion are values needed to build a strong company.</p>
<p>We are committed to doing better, not only by hiring a diverse team and
providing a welcoming work environment, but also by putting people first at
every stage. This means providing fair and equitable compensation as well as
meaningful career advancement opportunities for every employee.</p>
<p>We believe that a great technology always derives from great people, regardless
of their background. Head <a href="/careers/">here</a> to see our
currently-open positions.</p>
]]></description><link>https://tarides.com/blog/2021-02-15-partnering-for-more-diversity-in-tech</link><guid isPermaLink="false">https://tarides.com/blog/2021-02-15-partnering-for-more-diversity-in-tech.html</guid><dc:creator><![CDATA[ Céline Laplassotte ]]></dc:creator><pubDate>Mon, 15 Feb 2021 00:00:00 GMT</pubDate></item><item><title><![CDATA[Recent and upcoming changes to Merlin]]></title><description><![CDATA[<p>Merlin is a language server for the OCaml programming language; that is, a daemon
that connects to your favourite text editor and provides the usual services of
an IDE: instant feedback on warnings and errors, autocompletion, "type of the
code under the cursor", "go to definition", etc. As we (Frédéric Bour, Ulysse
Gérard and I) are about to do a new major release, we thought now would be a
good time to talk a bit about some of the changes that are going into this
release.</p>
<h2>Project configuration</h2>
<p>Since its very first release, Merlin has been getting information about the
project being worked on through a <code>.merlin</code> file, which used to be written by
the user, but is now often generated by build systems.</p>
<p>This had the advantage of being fairly simple: Merlin would just look for such
a file in the current directory and, failing that, in each parent directory in
turn until it found one, and then read it. But there were also some
sore points: the granularity of the configuration is the directory, not the file,
and this information is duplicated from the build system configuration (be it
dune, Makefiles, or, back in the day, ocamlbuild).</p>
<p>After years of thinking about it, we've finally decided to make some light
changes to this process. Since version 3.4, when it scans the filesystem, Merlin
looks for either a <code>.merlin</code> file or a dune (or dune-project) file. When
it finds one of those, it starts an external process in the directory where
that file lives, and asks that process for the configuration of the ml(i) file
being edited.</p>
<p>The process in charge of communicating the configuration to Merlin will either
be a specific dune subcommand (when a dune file is found), or a dedicated
<code>.merlin</code> reader program.</p>
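<p>The lookup described above can be sketched roughly as follows. This is a hypothetical illustration rather than Merlin's actual code: the <code>config_source</code> type and <code>find_config</code> function are invented names, and checking for <code>.merlin</code> before dune files within a directory is an assumption.</p>
<pre><code>(* Hypothetical sketch of the upward search, not the real implementation. *)
type config_source =
  | Dot_merlin of string  (* directory containing a .merlin file *)
  | Dune of string        (* directory containing a dune or dune-project file *)

(* [exists] can be overridden, which makes the search easy to test. *)
let rec find_config ?(exists = Sys.file_exists) dir =
  let has name = exists (Filename.concat dir name) in
  if has ".merlin" then Some (Dot_merlin dir)
  else if has "dune" || has "dune-project" then Some (Dune dir)
  else
    let parent = Filename.dirname dir in
    (* At the filesystem root, [dirname] returns its argument: stop there. *)
    if String.equal parent dir then None else find_config ~exists parent
</code></pre>
<p>Depending on the result, the caller would then start either the dune subcommand or the <code>.merlin</code> reader in the returned directory.</p>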
<p>We see several advantages in doing things this way (rather than, for instance,
changing the format of <code>.merlin</code> files):</p>
<ol>
<li>this change is entirely backward compatible, and indeed the transition has
already happened silently; dune is still emitting <code>.merlin</code> files
and will only stop doing so with dune 2.8.</li>
<li>externalizing the reading of <code>.merlin</code> files and simply requiring a
"normalized" version of the config (i.e. with no mention of packages, just of
flags and paths) allowed us to simplify the internals of Merlin.</li>
<li>talking to the build system directly not only gets us a much finer grained
configuration (which is important when you build different executables with
different flags in the same directory, or if you apply different ppxes to
different files of a library), it opens the door to getting a nicer behavior
of Merlin in some circumstances. For instance, the build system can (and
does) tell Merlin when the project isn't built. Currently we only report that
information to the user when they ask for errors, alongside all the other
(mostly spurious) errors, which is already helpful in itself. But in the
future we can start filtering the other errors to only report those that
would remain even after building the project (e.g. parse errors).</li>
</ol>
<p>There are however some changes to look out for:</p>
<ul>
<li>people who still use <code>.merlin</code> files but do not install Merlin using opam need
to make sure to also have the <code>dot-merlin-reader</code> binary in their PATH (it is
available as an opam package, but is also buildable from Merlin's git
repository)</li>
<li>vim and emacs users who could previously load packages interactively (by
calling <code>M-x merlin-use</code> or <code>:MerlinUse</code>) cannot do that anymore, since Merlin
itself stopped linking with findlib. They'll have to write a <code>.merlin</code> file.</li>
</ul>
<h2>Dropping support for old versions of OCaml</h2>
<p>Until now, every release of Merlin has kept support from OCaml 4.02 to the
latest version of OCaml available at the time of that release.</p>
<p>We have done this by having one version of "<em>the frontend</em>" (i.e. handling of
buffer state, project configuration; analyses like <em>jump-to-definition</em>,
<em>prefix-completion</em>, etc.), but several versions of "<em>the backend</em>" (OCaml's
ASTs, parser and typechecker), and choosing at build time which one to use.
The reason for doing this instead of having, for instance, one branch of Merlin
per version of OCaml, is that while the backends are fairly stable once
released, Merlin's frontend keeps evolving. Having just one version of it makes
it easier to add features and fix bugs (patches don't need to be duplicated),
whilst ensuring that Merlin's behavior is consistent across every version of
OCaml that we support.</p>
<p>For this to work however, one needs a well defined API between the frontend and
all the versions of the backend. This implies mapping every version of OCaml's
internal ASTs (which receive modifications from one version to the next) to a
unified one, so as to keep Merlin's various features version agnostic. But it
also means being resilient to OCaml's internal API changes. For instance, between
4.02 and 4.11 there were big refactorings impacting the way one accesses the
typing environment, the way one accesses the "load path" (the part of the file
system the compiler/Merlin is aware of), the way error messages are produced, ...</p>
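<p>As a rough illustration of what mapping version-specific ASTs to a unified one involves, consider the sketch below. The <code>V1</code>, <code>V2</code> and <code>Unified</code> modules are hypothetical and heavily simplified; the real compiler types are far larger and differ in many more ways.</p>
<pre><code>(* Hypothetical, heavily simplified ASTs for two compiler versions. *)
module V1 = struct
  type expr = Var of string | App of expr * expr list
end

module V2 = struct
  (* In this made-up later version, arguments carry an optional label. *)
  type expr = Var of string | App of expr * (string option * expr) list
end

module Unified = struct
  type expr = Var of string | App of expr * (string option * expr) list
end

(* One conversion per backend version... *)
let rec of_v1 : V1.expr -&gt; Unified.expr = function
  | V1.Var x -&gt; Unified.Var x
  | V1.App (f, args) -&gt;
      Unified.App (of_v1 f, List.map (fun a -&gt; (None, of_v1 a)) args)

let rec of_v2 : V2.expr -&gt; Unified.expr = function
  | V2.Var x -&gt; Unified.Var x
  | V2.App (f, args) -&gt;
      Unified.App (of_v2 f, List.map (fun (l, a) -&gt; (l, of_v2 a)) args)

(* ...so that a frontend feature is written once, against the unified type. *)
let rec count_vars : Unified.expr -&gt; int = function
  | Unified.Var _ -&gt; 1
  | Unified.App (f, args) -&gt;
      List.fold_left (fun n (_, a) -&gt; n + count_vars a) (count_vars f) args
</code></pre>
<p>Every change to a version-specific type forces a matching change to its conversion function, and sometimes to the unified type itself, which is where the maintenance cost comes from.</p>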
<p>The rate of changes on the compiler is a lot higher than what it was when we
first started Merlin (7 years ago now!) which doesn't just mean that we have to
spend more and more time on updating the common interface, but also that the
interface is getting harder to define. Recently (with the 4.11 release) some of
the changes were significant enough that for some parts of the backend we just
didn't manage to produce a single interface to access old and new versions, so
instead we had to start duplicating and specializing parts of the frontend.
And we don't expect things to get much better in the near future.</p>
<p>Furthermore, Merlin's backends are patched to be more resilient to parsing and
typing errors in the user's code. Those patches also need to be updated at each
new release of the compiler.
The work required to keep the "unified interface" working was taking time away
from updating our patches properly, and our support of user errors has slowly
been getting worse over the past few years, resulting in less precise type
information when asked, incomplete results when asking for auto-completion, etc.</p>
<p>Therefore we have decided to stop dragging older versions of OCaml along. We
plan to switch to a system where we have one branch of Merlin per version of
OCaml, and each opam release of Merlin will only be buildable with one version
of OCaml. We will keep maintaining all the relatively recent branches (that is:
4.02 definitely will not get fixes, but 4.06 is still in the clear). However,
all the new features will be developed against the latest version of OCaml and
cherry-picked to older branches if, and only if, there are no merge conflicts
and they work as expected without changes.</p>
<p>We hope that this will make it easier for us to update to new versions of OCaml
(actually, we already know it does, working on adding support for 4.12 was
easier than for any of the other recent versions), will allow us to clean up
Merlin's codebase (let's call that a work in progress), and will free some time
to work on new features.</p>
<p>You might wonder what all this changes for you, as a user, in practice. Well, it
depends:</p>
<ul>
<li>if you install Merlin from opam: nothing, or almost nothing. Everything that
you currently do with Merlin will keep working. In the future, perhaps some
new feature will appear that won't work on all versions. But that day hasn't
come yet.</li>
<li>if you install Merlin some other way (manually?): you can't just fetch master
and build it anymore. You have to pick the appropriate branch for your
version of OCaml.</li>
<li>if you're reusing Merlin's codebase as part of another project and (even
worse) have patches on it: come and talk to us if you haven't done so already!
We can try and integrate your patches, so that you only need to worry about
vendoring the right version(s) for your needs.</li>
</ul>
<hr>
<p>Over the years, Merlin has received bugfixes and improvements from a long list of
people, but for the upcoming release Frédéric and I are particularly grateful to
Rudi Grinberg, a long time and regular contributor who also maintains the OCaml
LSP project, as well as Ulysse Gérard, who joined our team a year ago now. They
are in particular the main authors of the work to improve the handling of
projects' configuration.</p>
<p>We hope you'll be as excited as us by all these changes!</p>
]]></description><link>https://tarides.com/blog/2021-01-26-recent-and-upcoming-changes-to-merlin</link><guid isPermaLink="false">https://tarides.com/blog/2021-01-26-recent-and-upcoming-changes-to-merlin.html</guid><dc:creator><![CDATA[ Thomas Refis ]]></dc:creator><pubDate>Tue, 26 Jan 2021 00:00:00 GMT</pubDate></item><item><title><![CDATA[Tarides sponsors the Oxbridge Women in Computer Science Conference 2020]]></title><description><![CDATA[<p>The <a href="https://oxbridgewomenincs8.wixsite.com/2020">Oxbridge Women in Computer Science
conference</a> is an annual one-day
event hosted by the Universities of Oxford and Cambridge (UK). The
conference is free and open to everyone from any discipline, regardless of
gender identity. Its purpose is to spotlight the successes of women within
computer science and strengthen the network of women in computer science
within a supportive and friendly environment.</p>
<p>This year, the conference was organised by the University of Cambridge and was
held virtually on December 7th.</p>
<p>Tarides is very glad to sponsor this event, as we strongly believe that diversity
and an inclusive culture are key factors in building a competitive and innovative
company. Our employees come from 8 different countries, and one third of them are women.
Tarides promotes transparency, openness and autonomy, creating a work atmosphere
in which employees can thrive and solve novel, impactful
technical challenges. Working on open-source projects makes it possible to
collaborate with worldwide experts from both academia and industry, which encourages
continuous training and education; in this context, it is very important to have
teams with diverse backgrounds and experience.</p>
<p>The underrepresentation of women in tech, and particularly in computer science,
is not a new problem, and gender equality remains a major issue in the corporate
world. We hope that celebrating female computer scientists, as the Oxbridge Women
in Computer Science Conference does, will encourage more women to pursue
their interests and careers in the tech field. Head
<a href="/careers/">here</a> to see our currently-open positions.</p>
<p>For the event, we made a short video to experience a day in the life of a
software engineer:</p>
<div style="position: relative; width: 100%; height: 0; padding-bottom: 56.25%">
  <iframe style="position: absolute; width: 100%; height: 100%; left: 0; right: 0" src="https://www.youtube-nocookie.com/embed/5qK8elKNxKI" frameborder="0" allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture" allowfullscreen="">
</iframe></div>]]></description><link>https://tarides.com/blog/2020-12-14-tarides-sponsors-the-oxbridge-women-in-computer-science-conference-2020</link><guid isPermaLink="false">https://tarides.com/blog/2020-12-14-tarides-sponsors-the-oxbridge-women-in-computer-science-conference-2020.html</guid><dc:creator><![CDATA[ Céline Laplassotte ]]></dc:creator><pubDate>Mon, 14 Dec 2020 00:00:00 GMT</pubDate></item><item><title><![CDATA[Building portable user interfaces with Nottui and Lwd]]></title><description><![CDATA[<p>At Tarides, we build many tools and writing UI is usually a tedious task. In this post we will see how to write functional UIs in OCaml using the <code>Nottui</code> &amp; <code>Lwd</code> libraries.</p>
<p>These libraries were developed for <a href="https://github.com/ocurrent/citty">Citty</a>, a frontend to the <a href="https://github.com/ocurrent/ocaml-ci">Continuous Integration service</a> of OCaml Labs.</p>
<div>
  <video controls="" width="100%">
    <source src="/blog/images/2020-09-24-building-portable-user-interfaces-with-nottui-and-lwd/nottui-citty~Ai7feU7jDgbrTvCiWf9KxQ.mp4" type="video/mp4">
    <source src="/blog/images/2020-09-24-building-portable-user-interfaces-with-nottui-and-lwd/nottui-citty~4eK2DxLYpNW_dGaJvua6nw.webm" type="video/webm;codecs=vp9">
  </video>
</div>
<p>In this recording, you can see the lists of repositories, branches and jobs monitored by the CI service, as well as the result of job execution. Most of the logic is asynchronous, with all the contents being received from the network in a non-blocking way.</p>
<p><code>Nottui</code> extends <a href="https://github.com/pqwy/notty">Notty</a>, a library for declaring terminal images, to better suit the needs of UIs. <code>Lwd</code> (Lightweight Document) exposes a simple form of reactive computation (values that evolve over time). It can be thought of as an alternative to the DOM, suitable for building interactive documents.
They are used in tandem: <code>Nottui</code> for rendering the UI and <code>Lwd</code> for making it interactive.</p>
<h2>Nottui = Notty with layout and events</h2>
<p>Notty exposes a nice way to display images in a terminal. A Notty image is a matrix of characters with optional styling attributes (tweaking foreground and background colors, using <strong>bold</strong> glyphs...).</p>
<p>These images are pure values and can be composed (concatenated, cropped, ...) very efficiently, making them very convenient to manipulate in a functional way.</p>
<p>However these images are inert: their contents are fixed and their only purpose is to be displayed. Nottui reuses Notty images and exposes essentially the same interface but it adds two features: layout &amp; event dispatch. UI elements now adapt to the space available and can react to keyboard and mouse actions.</p>
<p><strong>Layout DSL</strong>. Specifying a layout is done using "stretchable" dimensions, a concept loosely borrowed from TeX. Each UI element has a fixed size (expressed as a number of columns and rows) and a stretchable size (possibly empty). The stretchable part is interpreted as a strength that is used to determine how to share the space available among all UI elements.</p>
<p>This is a simple system amenable to an efficient implementation while being powerful enough to express common layout patterns.</p>
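<p>To give an intuition for stretchable dimensions, here is a minimal sketch of how space could be shared along one axis. It is a toy model under stated assumptions, not Nottui's implementation: the <code>spec</code> record and <code>layout</code> function are invented for this example, and sharing in proportion to strength is one plausible reading of the description above.</p>
<pre><code>(* Toy model: [fixed] is the minimal size of an element along one axis,
   [stretch] is the strength with which it claims any extra space. *)
type spec = { fixed : int; stretch : int }

let layout ~available specs =
  let total_fixed = List.fold_left (fun acc s -&gt; acc + s.fixed) 0 specs in
  let total_stretch = List.fold_left (fun acc s -&gt; acc + s.stretch) 0 specs in
  let extra = max 0 (available - total_fixed) in
  List.map
    (fun s -&gt;
      if total_stretch = 0 then s.fixed
      else s.fixed + (extra * s.stretch / total_stretch))
    specs

(* Three elements of 4 columns each in a 20-column terminal: the first is
   rigid, the third stretches twice as strongly as the second. *)
let widths =
  layout ~available:20
    [ { fixed = 4; stretch = 0 };
      { fixed = 4; stretch = 1 };
      { fixed = 4; stretch = 2 } ]
(* widths = [4; 6; 9] *)
</code></pre>
<p>Integer division rounds down, so in this sketch a leftover column can remain unassigned; a real layout engine would need a rule for distributing the remainder.</p>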
<p><strong>Event dispatch</strong>. Reacting to mouse and keyboard events is better done using local behaviors, specific to an element. In Nottui, images are augmented with handlers for common actions. There is also a global notion of focus to determine which element should consume input events.</p>
<h2>Interactivity with Lwd</h2>
<p>Nottui's additions are nice for resizing and attaching behaviors to images, but they are still static objects. In practice, user interfaces are very dynamic: parts can be independently updated to display new information.</p>
<p>This interactivity layer is brought by Lwd and is developed separately from the core UI library. It is built around a central type, <code>'a Lwd.t</code>, that represents a value of type <code>'a</code> that can change over time.</p>
<p><code>Lwd.t</code> is an <a href="https://en.wikipedia.org/wiki/Applicative_functor">applicative functor</a> (and even a monad), making it a highly composable abstraction.</p>
<p>Primitive changes are introduced by <code>Lwd.var</code>, which are OCaml references with an extra operation <code>val get : 'a Lwd.var -&gt; 'a Lwd.t</code>. This operation turns a variable into a <em>changing value</em> that changes whenever the variable is set.</p>
<p>In practice this leads to a mostly declarative style of programming interactive documents (as opposed to the DOM that is deeply mutable). Most of the code is just function applications without spooky action at a distance! However, it is possible to opt-out of this pure style by introducing an <code>Lwd.var</code>, on a case-by-case basis.</p>
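<p>The relationship between variables and changing values can be illustrated with a small self-contained model. This is deliberately not the Lwd library: the definitions below only mimic the shape of its interface as described above (<code>var</code>, <code>get</code>, mapping over a changing value), <code>sample</code> is an invented helper, and the model recomputes everything on each sample.</p>
<pre><code>(* Toy model of a changing value: something that can be sampled. *)
type 'a t = unit -&gt; 'a

(* A variable is a plain reference... *)
type 'a var = 'a ref

let var x = ref x
let set v x = v := x

(* ...with one extra operation turning it into a changing value. *)
let get (v : 'a var) : 'a t = fun () -&gt; !v

(* Changing values compose by function application. *)
let map f (x : 'a t) : 'b t = fun () -&gt; f (x ())
let map2 f (x : 'a t) (y : 'b t) : 'c t = fun () -&gt; f (x ()) (y ())

let sample (x : 'a t) = x ()

(* A label that follows a click counter. *)
let clicks = var 0
let label = map (fun n -&gt; Printf.sprintf "Clicked %d times" n) (get clicks)

let () =
  set clicks 3;
  print_endline (sample label)  (* prints "Clicked 3 times" *)
</code></pre>
<p>Only <code>clicks</code> is mutable here; <code>label</code> is defined once, declaratively, and always reflects the latest value of the variable.</p>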
<h2>And much more...</h2>
<p>A few extra libraries are provided to target more specific problems.</p>
<p><code>Lwd_table</code> and <code>Lwd_seq</code> are two data structures to manipulate dynamic collections. <code>Nottui_pretty</code> is an interactive pretty printing library that supports arbitrary Nottui layouts and widgets. Finally <code>Tyxml_lwd</code> is a strongly-typed abstraction of the DOM driven by Lwd.</p>
<p>Version 0.1 has just been released on OPAM.</p>
<h2>Getting started!</h2>
<p>Here is a small example to start using the library. First, install the Nottui library:</p>
<pre><code><span class="sh-source">$ opam install nottui
</span></code></pre>
<p>Now we can play in the top-level. We will start with a simple button that counts the number of clicks:</p>
<pre><code><span class="ocaml-keyword-operator">$</span><span class="ocaml-source"> </span><span class="ocaml-source">utop</span><span class="ocaml-source">
</span><span class="ocaml-keyword-other">#</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">#</span><span class="ocaml-source">require</span><span class="ocaml-source"> </span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-string-quoted-double">nottui</span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source">
</span><span class="ocaml-keyword-other">#</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">open</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Nottui</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source">
</span><span class="ocaml-keyword-other">#</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">module</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">W</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Nottui_widgets</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source">
</span><span class="ocaml-comment-block">(*</span><span class="ocaml-comment-block"> State for holding the number of clicks </span><span class="ocaml-comment-block">*)</span><span class="ocaml-source">
</span><span class="ocaml-keyword-other">#</span><span class="ocaml-source"> </span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">vcount</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Lwd</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">var</span><span class="ocaml-source"> </span><span class="ocaml-constant-numeric-decimal-integer">0</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source">
</span><span class="ocaml-comment-block">(*</span><span class="ocaml-comment-block"> Image of the button parametrized by the number of clicks </span><span class="ocaml-comment-block">*)</span><span class="ocaml-source">
</span><span class="ocaml-keyword-other">#</span><span class="ocaml-source"> </span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">button</span><span class="ocaml-source"> </span><span class="ocaml-source">count</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-constant-language-capital-identifier">W</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">button</span><span class="ocaml-source"> ~</span><span class="ocaml-source">attr</span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-constant-language-capital-identifier">Notty</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-constant-language-capital-identifier">A</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">(</span><span class="ocaml-source">bg</span><span class="ocaml-source"> </span><span class="ocaml-source">green</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">++</span><span class="ocaml-source"> </span><span class="ocaml-source">fg</span><span class="ocaml-source"> </span><span class="ocaml-source">black</span><span class="ocaml-source">)</span><span class="ocaml-source">
</span><span class="ocaml-source">      </span><span class="ocaml-source">(</span><span class="ocaml-constant-language-capital-identifier">Printf</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">sprintf</span><span class="ocaml-source"> </span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-string-quoted-double">Clicked </span><span class="ocaml-constant-character-printf">%d</span><span class="ocaml-string-quoted-double"> times!</span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-source"> </span><span class="ocaml-source">count</span><span class="ocaml-source">)</span><span class="ocaml-source">
</span><span class="ocaml-source">      </span><span class="ocaml-source">(</span><span class="ocaml-keyword-other">fun</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-unit">()</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Lwd</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">set</span><span class="ocaml-source"> </span><span class="ocaml-source">vcount</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-source">count</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">+</span><span class="ocaml-source"> </span><span class="ocaml-constant-numeric-decimal-integer">1</span><span class="ocaml-source">)</span><span class="ocaml-source">)</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source">
</span><span class="ocaml-comment-block">(*</span><span class="ocaml-comment-block"> Run the UI! </span><span class="ocaml-comment-block">*)</span><span class="ocaml-source">
</span><span class="ocaml-keyword-other">#</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Ui_loop</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">run</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-constant-language-capital-identifier">Lwd</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">map</span><span class="ocaml-source"> </span><span class="ocaml-source">button</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-constant-language-capital-identifier">Lwd</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">get</span><span class="ocaml-source"> </span><span class="ocaml-source">vcount</span><span class="ocaml-source">)</span><span class="ocaml-source">)</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source">
</span></code></pre>
<p><strong>Note:</strong> to quit the example, you can press Ctrl-Q or Esc.</p>
<p>We will improve the example and turn it into a mini cookie clicker game.</p>
<pre><code><span class="ocaml-comment-block">(*</span><span class="ocaml-comment-block"> Achievements to unlock in the cookie clicker </span><span class="ocaml-comment-block">*)</span><span class="ocaml-source">
</span><span class="ocaml-keyword-other">#</span><span class="ocaml-source"> </span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">badges</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-source">[</span><span class="ocaml-constant-numeric-decimal-integer">15</span><span class="ocaml-keyword-other-ocaml punctuation-comma punctuation-separator">,</span><span class="ocaml-source"> </span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-string-quoted-double">Cursor</span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source"> </span><span class="ocaml-constant-numeric-decimal-integer">50</span><span class="ocaml-keyword-other-ocaml punctuation-comma punctuation-separator">,</span><span class="ocaml-source"> </span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-string-quoted-double">Grandma</span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source"> </span><span class="ocaml-constant-numeric-decimal-integer">150</span><span class="ocaml-keyword-other-ocaml punctuation-comma punctuation-separator">,</span><span class="ocaml-source"> </span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-string-quoted-double">Farm</span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source"> </span><span class="ocaml-constant-numeric-decimal-integer">300</span><span class="ocaml-keyword-other-ocaml punctuation-comma punctuation-separator">,</span><span class="ocaml-source"> </span><span 
class="ocaml-string-quoted-double">"</span><span class="ocaml-string-quoted-double">Mine</span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-source">]</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source">
</span><span class="ocaml-comment-block">(*</span><span class="ocaml-comment-block"> List the achievements unlocked by the player </span><span class="ocaml-comment-block">*)</span><span class="ocaml-source">
</span><span class="ocaml-keyword-other">#</span><span class="ocaml-source"> </span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">unlocked_ui</span><span class="ocaml-source"> </span><span class="ocaml-source">count</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-comment-block">(*</span><span class="ocaml-comment-block"> Filter the achievements </span><span class="ocaml-comment-block">*)</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">predicate</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-source">target</span><span class="ocaml-keyword-other-ocaml punctuation-comma punctuation-separator">,</span><span class="ocaml-source"> </span><span class="ocaml-source">text</span><span class="ocaml-source">)</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source">
</span><span class="ocaml-source">      </span><span class="ocaml-keyword-other">if</span><span class="ocaml-source"> </span><span class="ocaml-source">count</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">&gt;=</span><span class="ocaml-source"> </span><span class="ocaml-source">target</span><span class="ocaml-source">
</span><span class="ocaml-source">      </span><span class="ocaml-keyword-other">then</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Some</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-constant-language-capital-identifier">W</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">printf</span><span class="ocaml-source"> </span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-constant-character-printf">% 4d</span><span class="ocaml-string-quoted-double">: </span><span class="ocaml-constant-character-printf">%s</span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-source"> </span><span class="ocaml-source">target</span><span class="ocaml-source"> </span><span class="ocaml-source">text</span><span class="ocaml-source">)</span><span class="ocaml-source">
</span><span class="ocaml-source">      </span><span class="ocaml-keyword-other">else</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">None</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-keyword-other">in</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-comment-block">(*</span><span class="ocaml-comment-block"> Concatenate the UI elements vertically </span><span class="ocaml-comment-block">*)</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-constant-language-capital-identifier">Ui</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">vcat</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-constant-language-capital-identifier">List</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">filter_map</span><span class="ocaml-source"> </span><span class="ocaml-source">predicate</span><span class="ocaml-source"> </span><span class="ocaml-source">badges</span><span class="ocaml-source">)</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source">
</span><span class="ocaml-comment-block">(*</span><span class="ocaml-comment-block"> Display the next achievement to reach </span><span class="ocaml-comment-block">*)</span><span class="ocaml-source">
</span><span class="ocaml-keyword-other">#</span><span class="ocaml-source"> </span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">next_ui</span><span class="ocaml-source"> </span><span class="ocaml-source">count</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">predicate</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-source">target</span><span class="ocaml-keyword-other-ocaml punctuation-comma punctuation-separator">,</span><span class="ocaml-source"> </span><span class="ocaml-constant-language">_</span><span class="ocaml-source">)</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-source">target</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">&gt;</span><span class="ocaml-source"> </span><span class="ocaml-source">count</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">in</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-keyword-other">match</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">List</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">find_opt</span><span class="ocaml-source"> </span><span class="ocaml-source">predicate</span><span class="ocaml-source"> </span><span class="ocaml-source">badges</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">with</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-keyword-other">|</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Some</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-source">target</span><span class="ocaml-keyword-other-ocaml punctuation-comma punctuation-separator">,</span><span class="ocaml-source"> </span><span class="ocaml-constant-language">_</span><span class="ocaml-source">)</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source">
</span><span class="ocaml-source">      </span><span class="ocaml-constant-language-capital-identifier">W</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">printf</span><span class="ocaml-source"> ~</span><span class="ocaml-source">attr</span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-constant-language-capital-identifier">Notty</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-constant-language-capital-identifier">A</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">(</span><span class="ocaml-source">st</span><span class="ocaml-source"> </span><span class="ocaml-source">bold</span><span class="ocaml-source">)</span><span class="ocaml-source"> </span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-constant-character-printf">% 4d</span><span class="ocaml-string-quoted-double">: ???</span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-source"> </span><span class="ocaml-source">target</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-keyword-other">|</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">None</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Ui</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">empty</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source">
</span><span class="ocaml-comment-block">(*</span><span class="ocaml-comment-block"> Let's make use of the fancy let-operators recently added to OCaml </span><span class="ocaml-comment-block">*)</span><span class="ocaml-source">
</span><span class="ocaml-keyword-other">#</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">open</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Lwd_infix</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source">
</span><span class="ocaml-keyword-other">#</span><span class="ocaml-source"> </span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">ui</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-keyword">let</span><span class="ocaml-keyword">$</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">count</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Lwd</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">get</span><span class="ocaml-source"> </span><span class="ocaml-source">vcount</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">in</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-constant-language-capital-identifier">Ui</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">vcat</span><span class="ocaml-source"> </span><span class="ocaml-source">[</span><span class="ocaml-source">button</span><span class="ocaml-source"> </span><span class="ocaml-source">count</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source"> </span><span class="ocaml-source">unlocked_ui</span><span class="ocaml-source"> </span><span class="ocaml-source">count</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source"> </span><span class="ocaml-source">next_ui</span><span class="ocaml-source"> </span><span class="ocaml-source">count</span><span class="ocaml-source">]</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source">
</span><span class="ocaml-comment-block">(*</span><span class="ocaml-comment-block"> Launch the game! </span><span class="ocaml-comment-block">*)</span><span class="ocaml-source">
</span><span class="ocaml-keyword-other">#</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Ui_loop</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">run</span><span class="ocaml-source"> </span><span class="ocaml-source">ui</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source">
</span></code></pre>
<div>
  <video controls="">
    <source src="/blog/images/2020-09-24-building-portable-user-interfaces-with-nottui-and-lwd/nottui-cookie-clicker~ClaOCnt6pB5vaujFqYDotg.mp4" type="video/mp4">
    <source src="/blog/images/2020-09-24-building-portable-user-interfaces-with-nottui-and-lwd/nottui-cookie-clicker~fo2U17WecpbuyaDKliltuw.webm" type="video/webm;codecs=vp9">
  </video>
</div>
<p>Et voilà! We hope you enjoy experimenting with <code>Nottui</code> and <code>Lwd</code>. Check out the <a href="https://github.com/let-def/lwd/tree/master/lib/nottui">Nottui page</a> for more examples, and watch our recent presentation of these libraries at the 2020 ML Workshop here:</p>
<div style="position: relative; width: 100%; height: 0; padding-bottom: 56.25%">
  <iframe style="position: absolute; width: 100%; height: 100%; left: 0; right: 0" src="https://www.youtube-nocookie.com/embed/w7jc35kgBZE" frameborder="0" allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture" allowfullscreen="">
  </iframe>
</div>
]]></description><link>https://tarides.com/blog/2020-09-24-building-portable-user-interfaces-with-nottui-and-lwd</link><guid isPermaLink="false">https://tarides.com/blog/2020-09-24-building-portable-user-interfaces-with-nottui-and-lwd.html</guid><dc:creator><![CDATA[ Frédéric Bour ]]></dc:creator><pubDate>Thu, 24 Sep 2020 00:00:00 GMT</pubDate></item><item><title><![CDATA[Tarides is now a sponsor of the OCaml Software Foundation]]></title><description><![CDATA[<p>Tarides is pleased to provide support for the <a href="https://ocaml-sf.org">OCaml Software
Foundation</a>, a non-profit foundation hosted by
the Inria Foundation. The OCaml Software Foundation's mission is to
promote the OCaml programming language and its ecosystem by
supporting the growth of a diverse and international community of
OCaml users.</p>
<p>Tarides develops secure-by-design solutions in which OCaml's memory and
type-safety guarantees play a major role. Hence, most of the software
development that is done at Tarides is in OCaml: for instance,
<a href="https://mirage.io">MirageOS</a>, a library operating system that
constructs unikernels for secure, high-performance network
applications; and <a href="https://irmin.org">Irmin</a>, a library for building
mergeable, branchable distributed data stores, with built-in
snapshotting and support for a wide variety of storage backends.</p>
<p>Tarides is also very involved in the OCaml compiler development and
OCaml developer tooling ecosystem: as active maintainers of the <a href="https://www.youtube.com/watch?v=E8T_4zqWmq8&amp;list=PLKO_ZowsIOu5fHjRj0ua7_QWE_L789K_f&amp;ab_channel=ocaml2020">OCaml
platform</a>, Tarides is involved with most of the major
OCaml developer tools, including <a href="https://github.com/ocaml/ocaml">opam</a>, <a href="https://github.com/ocaml/dune">dune</a> and <a href="https://github.com/ocaml/merlin">merlin</a>.</p>
]]></description><link>https://tarides.com/blog/2020-09-17-tarides-is-now-a-sponsor-of-the-ocaml-software-foundation</link><guid isPermaLink="false">https://tarides.com/blog/2020-09-17-tarides-is-now-a-sponsor-of-the-ocaml-software-foundation.html</guid><dc:creator><![CDATA[ Céline Laplassotte ]]></dc:creator><pubDate>Thu, 17 Sep 2020 00:00:00 GMT</pubDate></item><item><title><![CDATA[Irmin: September 2020 update]]></title><description><![CDATA[<p>This post will survey the latest design decisions and performance improvements
made to <code>irmin-pack</code>, the <a href="https://irmin.org/">Irmin</a> storage backend used by
<a href="https://tezos.gitlab.io/">Tezos</a>. Tezos is an open-source blockchain technology,
written in OCaml, which uses many libraries from the MirageOS ecosystem. For
more context on the design of <code>irmin-pack</code> and how it is optimised for the Tezos
use-case, you can check out our <a href="/blog/2020-09-01-introducing-irmin-pack/">previous blog post</a>.</p>
<p>This post showcases the improvements to <code>irmin-pack</code> since its initial
deployment on Tezos:</p>
<ol>
<li><a href="#faster-read-only-store-instances">Faster read-only store instances</a></li>
<li><a href="#better-flushing-for-the-read-write-instance">Improved automatic flushing</a></li>
<li><a href="#faster-serialisation-for-irmintype">Staging generic serialisation operations</a></li>
<li><a href="#more-control-over-indexmerge">More control over <code>Index.merge</code></a></li>
<li><a href="#clearing-stores">Clearing stores</a></li>
</ol>
<h2>Faster read-only store instances</h2>
<p>The Tezos use-case of Irmin requires both <em>read-only</em> and <em>read-write</em>
store handles, with multiple readers and a single writer all accessing the same
Irmin store concurrently. These store handles are held by different processes
(with disjoint memory spaces) so the instances must use files on disk to
synchronise, ensuring that the readers never miss updates from the writer. The
writer instance automatically flushes its internal buffers to disk at regular
intervals, allowing the readers to regularly pick up <code>replace</code> calls.</p>
<p>Until recently, each time a reader looked for a value – be it a commit, a node,
or a blob – it first checked if the writer had flushed new contents to disk. This
ensured that the readers always see the latest changes from the writer. However,
if the writer isn't actively modifying the regions being read, the readers make
one unnecessary system call per <code>find</code>. The higher the rate of reads, the more
time is lost to these synchronisation points. This is particularly problematic
in two use-cases:</p>
<ul>
<li>
<p><strong>Taking snapshots of the store</strong>. Tezos supports <a href="https://tezos.gitlab.io/user/snapshots.html">exporting portable
snapshots</a> of the store data. Since this operation only reads
<em>historic</em> data in the store (traversing backwards from a given block hash),
it's never necessary to synchronise with the writer.</p>
</li>
<li>
<p><strong>Bulk writes</strong>. It's sometimes necessary for the writer to dump lots of new
data to disk at once (for instance, when adding a commit to the history). In
these cases, any readers will repeatedly synchronise with the disk even though
they don't need to do so until the bulk operation is complete. More on this in
the coming months!</p>
</li>
</ul>
<p>To better support these use-cases, we dropped the requirement for readers to
maintain strict consistency with the writer instance. Instead, readers can call
an explicit <code>sync</code> function only when they <em>need</em> to see the latest concurrent
updates from the writer instance.</p>
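<p>The trade-off can be sketched with a toy model. Everything in it is hypothetical and chosen for illustration (the <code>store</code> and <code>reader</code> records, <code>disk_checks</code> as a stand-in for system calls, and the function names); it is not the irmin-pack API. It only shows why moving the synchronisation out of <code>find</code> removes one disk check per lookup, at the price of a possibly stale view.</p>
<pre><code>(* Toy model: a writer publishes entries, readers keep a local view.
   All names are hypothetical; this is not the irmin-pack API. *)
type store = {
  mutable flushed : (string * string) list; (* what is "on disk" *)
  mutable disk_checks : int;                (* stands in for system calls *)
}

type reader = { store : store; mutable view : (string * string) list }

let create () = { flushed = []; disk_checks = 0 }
let replace store k v = store.flushed &lt;- (k, v) :: store.flushed
let reader store = { store; view = store.flushed }

(* Refresh the reader's view; costs one simulated disk check. *)
let sync r =
  r.store.disk_checks &lt;- r.store.disk_checks + 1;
  r.view &lt;- r.store.flushed

(* Old behaviour: every find synchronises first. *)
let find_implicit r k =
  sync r;
  List.assoc_opt k r.view

(* New behaviour: find only consults the reader's current view. *)
let find r k = List.assoc_opt k r.view

let () =
  let s = create () in
  replace s "a" "1";
  let r = reader s in
  for _i = 1 to 1000 do ignore (find_implicit r "a") done;
  Printf.printf "implicit syncs: %d disk checks\n" s.disk_checks;
  s.disk_checks &lt;- 0;
  replace s "b" "2";
  (* The view is stale until the reader asks for an update. *)
  assert (find r "b" = None);
  sync r;
  for _i = 1 to 1000 do ignore (find r "b") done;
  assert (find r "b" = Some "2");
  Printf.printf "explicit sync: %d disk check(s)\n" s.disk_checks
</code></pre>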
<p>In our benchmarks, there is a clear speed-up for <code>find</code> operations from readers:</p>
<pre><code>[RO] Find in random order with implicit syncs
        Total time: 67.276527
        Operations per second: 148640.253086
        Mbytes per second: 6.378948
        Read amplification in syscalls: 3.919739
        Read amplification in bytes: 63.943734

[RO] Find in random order with only one call to sync
        Total time: 40.817458
        Operations per second: 244993.208543
        Mbytes per second: 10.513968
        Read amplification in syscalls: 0.919588
        Read amplification in bytes: 63.258072
</code></pre>
<p>Not only is it faster, but the <code>Read amplification in syscalls</code> line also
shows that fewer system calls are used. The benchmark consists of reading
10,000,000 entries of 45 bytes each.</p>
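<p>As a sanity check, the derived figures can be recomputed from the total times and the workload size. The snippet below is only a sketch: the constants are copied from the output above, the megabyte is taken to be 1024 × 1024 bytes, and reading the syscall amplification as "system calls per <code>find</code>" is our assumption. It gives a speed-up of about 1.65 and roughly three fewer system calls per lookup.</p>
<pre><code>(* Figures copied from the benchmark output above. *)
let entries = 10_000_000.
let entry_bytes = 45.
let implicit_time = 67.276527
let explicit_time = 40.817458

let ops_per_sec time = entries /. time

(* Assumption: one megabyte is 1024 * 1024 bytes. *)
let mbytes_per_sec time = ops_per_sec time *. entry_bytes /. (1024. *. 1024.)

let speedup = implicit_time /. explicit_time

(* Assumption: the syscall amplification counts system calls per find. *)
let syscalls_saved = 3.919739 -. 0.919588

let () =
  Printf.printf "implicit: %.0f ops/s, %.2f MB/s\n"
    (ops_per_sec implicit_time) (mbytes_per_sec implicit_time);
  Printf.printf "explicit: %.0f ops/s, %.2f MB/s\n"
    (ops_per_sec explicit_time) (mbytes_per_sec explicit_time);
  Printf.printf "speed-up: %.2fx, syscalls saved per find: %.2f\n"
    speedup syscalls_saved
</code></pre>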
<p>Relevant PRs: <a href="https://github.com/mirage/irmin/pull/1008">irmin #1008</a>,
<a href="https://github.com/mirage/index/pull/175">index #175</a>,
<a href="https://github.com/mirage/index/pull/198">index #198</a> and
<a href="https://github.com/mirage/index/pull/203">index #203</a>.</p>
<h2>Better flushing for the read-write instance</h2>
<p>Irmin-pack uses an <a href="https://github.com/mirage/index/">index</a> to speed up <code>find</code>
calls: a <code>pack</code> file is used to store pairs of <code>(key, value)</code> and an <code>index</code>
records the address in pack where a <code>key</code> is stored. A read-write instance has
to write both the <code>index</code> and the <code>pack</code> file, for a read-only instance to find
a value. Moreover, the order in which the data is flushed to disk for the two
files is important: the address for the pair <code>(key, value)</code> cannot be written
before the pair itself. Otherwise, the read-only instance can read an address for
a non-existent <code>(key, value)</code> pair. But both <code>pack</code> and <code>index</code> have internal
buffers that accumulate data, in order to reduce the number of system calls, and
both decide arbitrarily when to flush those buffers to disk.</p>
<p>We introduce a <code>flush_callback</code> argument in <code>index</code>, which registers a callback
for whenever the index decides to flush. <code>irmin-pack</code> uses this callback to flush
its pack file, resolving the issue of the dangling address.</p>
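<p>The ordering guarantee can be sketched with a toy model of two buffered files. Everything below (the <code>file</code> and <code>index</code> records, <code>flush_index</code>) is a hypothetical illustration and not the real <code>index</code> interface: the only point is that the callback runs before the index writes its own buffer.</p>
<pre><code>(* Toy model: buffered "files" kept in memory; all names are invented. *)
type file = { mutable buffer : string list; mutable disk : string list }

let create () = { buffer = []; disk = [] }
let append f x = f.buffer &lt;- x :: f.buffer

(* Move buffered entries to "disk", preserving insertion order. *)
let flush f =
  f.disk &lt;- f.disk @ List.rev f.buffer;
  f.buffer &lt;- []

(* The index owns a callback that it runs just before flushing itself. *)
type index = { file : file; flush_callback : unit -&gt; unit }

let flush_index i =
  i.flush_callback ();
  flush i.file

let () =
  let pack = create () in
  let index = { file = create (); flush_callback = (fun () -&gt; flush pack) } in
  append pack "(key, value)";
  append index.file "key at offset 0";
  flush_index index;
  (* The pair reaches the disk no later than the address that points to it. *)
  assert (pack.disk = [ "(key, value)" ]);
  assert (index.file.disk = [ "key at offset 0" ])
</code></pre>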
<p>Relevant PRs: <a href="https://github.com/mirage/index/pull/189">index #189</a>,
<a href="https://github.com/mirage/index/pull/216">index #216</a>,
<a href="https://github.com/mirage/irmin/pull/1051">irmin #1051</a>.</p>
<h2>Faster serialisation for <code>Irmin.Type</code></h2>
<p>Irmin uses a library of <a href="https://ocamllabs.io/iocamljs/generic_programming.html"><em>generic</em></a> operations: functions
that take a runtime representation of a type and derive some operation on that
type. These are used in many places to automatically derive encoders and
decoders for our types, which are then used to move data to and from disk. For
instance:</p>
<pre><code><span class="ocaml-keyword-other">val</span><span class="ocaml-source"> </span><span class="ocaml-source">decode</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source"> </span><span class="ocaml-storage-type">'a</span><span class="ocaml-source"> </span><span class="ocaml-source">t</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-support-type">string</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-storage-type">'a</span><span class="ocaml-source">
</span><span class="ocaml-comment-doc">(**</span><span class="ocaml-comment-doc"> [decode t] is the binary decoder of values represented by [t]. </span><span class="ocaml-comment-doc">*)</span><span class="ocaml-source">
</span><span class="ocaml-source">
</span><span class="ocaml-comment-doc">(**</span><span class="ocaml-comment-doc"> Read an integer from a binary-encoded file. </span><span class="ocaml-comment-doc">*)</span><span class="ocaml-source">
</span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">int_of_file</span><span class="ocaml-source"> </span><span class="ocaml-source">~</span><span class="ocaml-source">path</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-source">open_in_bin</span><span class="ocaml-source"> </span><span class="ocaml-source">path</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">|&gt;</span><span class="ocaml-source"> </span><span class="ocaml-source">input_line</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">|&gt;</span><span class="ocaml-source"> </span><span class="ocaml-source">decode</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Irmin</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-constant-language-capital-identifier">Type</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-support-type">int32</span><span class="ocaml-source">
</span></code></pre>
<p>The generic <code>decode</code> takes a <em>representation</em> of the type <code>int32</code> and uses
this to select the right binary decoder. Unfortunately, we pay the cost of this
runtime specialisation <em>every time</em> we call <code>int_of_file</code>. If we're invoking
the decoder for a particular type very often – such as when serialising store
values – it's more efficient to specialise <code>decode</code> once:</p>
<pre><code><span class="ocaml-comment-doc">(**</span><span class="ocaml-comment-doc"> Specialised binary decoder for integers. </span><span class="ocaml-comment-doc">*)</span><span class="ocaml-source">
</span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">decode_int32</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-source">decode</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Irmin</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-constant-language-capital-identifier">Type</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-support-type">int32</span><span class="ocaml-source">
</span><span class="ocaml-source">
</span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">int_of_file_fast</span><span class="ocaml-source"> </span><span class="ocaml-source">~</span><span class="ocaml-source">path</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-source">open_in_bin</span><span class="ocaml-source"> </span><span class="ocaml-source">path</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">|&gt;</span><span class="ocaml-source"> </span><span class="ocaml-source">input_line</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">|&gt;</span><span class="ocaml-source"> </span><span class="ocaml-source">decode_int32</span><span class="ocaml-source">
</span></code></pre>
<p>The question then becomes: how can we change <code>decode</code> to encourage it to be
used in this more-efficient way? We can add a type wrapper – called <code>staged</code> –
to prevent the user from passing two arguments to <code>decode</code> at once:</p>
<pre><code><span class="ocaml-keyword-other">module</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Staged</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">sig</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword">type</span><span class="ocaml-source"> </span><span class="ocaml-source">+'a </span><span class="ocaml-entity-name-function-binding">t</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword">val</span><span class="ocaml-source">   </span><span class="ocaml-entity-name-function-binding">stage</span><span class="ocaml-source"> : 'a   -&gt; 'a t
</span><span class="ocaml-source">  </span><span class="ocaml-keyword">val</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">unstage</span><span class="ocaml-source"> : 'a t -&gt; 'a
</span><span class="ocaml-keyword-other">end</span><span class="ocaml-source">
</span><span class="ocaml-source">
</span><span class="ocaml-keyword-other">val</span><span class="ocaml-source"> </span><span class="ocaml-source">decode</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source"> </span><span class="ocaml-storage-type">'a</span><span class="ocaml-source"> </span><span class="ocaml-source">t</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-support-type">string</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-storage-type">'a</span><span class="ocaml-source">)</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Staged</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">t</span><span class="ocaml-source">
</span><span class="ocaml-comment-doc">(**</span><span class="ocaml-comment-doc"> [decode t] needs to be explicitly unstaged before being used. </span><span class="ocaml-comment-doc">*)</span><span class="ocaml-source">
</span></code></pre>
<p>By forcing the user to add a <code>Staged.unstage</code> type coercion when using this
function, we're encouraging them to hoist such operations out of their
hot-loops:</p>
<pre><code><span class="ocaml-comment-doc">(**</span><span class="ocaml-comment-doc"> The slow implementation no longer type-checks: </span><span class="ocaml-comment-doc">*)</span><span class="ocaml-source">
</span><span class="ocaml-source">
</span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">int_of_file</span><span class="ocaml-source"> </span><span class="ocaml-source">~</span><span class="ocaml-source">path</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-source">open_in_bin</span><span class="ocaml-source"> </span><span class="ocaml-source">path</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">|&gt;</span><span class="ocaml-source"> </span><span class="ocaml-source">input_line</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">|&gt;</span><span class="ocaml-source"> </span><span class="ocaml-source">decode</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Irmin</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-constant-language-capital-identifier">Type</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-support-type">int32</span><span class="ocaml-source">
</span><span class="ocaml-comment-block">(*</span><span class="ocaml-comment-block"> Error: This expression has type (string -&gt; 'a) Staged.t
</span><span class="ocaml-comment-block"> *        but an expression was expected of type string -&gt; 'a </span><span class="ocaml-comment-block">*)</span><span class="ocaml-source">
</span><span class="ocaml-source">
</span><span class="ocaml-comment-block">(*</span><span class="ocaml-comment-block"> Instead, we know to pull [Staged.t] values out of hot-loops: </span><span class="ocaml-comment-block">*)</span><span class="ocaml-source">
</span><span class="ocaml-source">
</span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">decode_int32</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Staged</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">unstage</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-source">decode</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Irmin</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-constant-language-capital-identifier">Type</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-support-type">int32</span><span class="ocaml-source">)</span><span class="ocaml-source">
</span><span class="ocaml-source">
</span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">int_of_file_fast</span><span class="ocaml-source"> </span><span class="ocaml-source">~</span><span class="ocaml-source">path</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-source">open_in_bin</span><span class="ocaml-source"> </span><span class="ocaml-source">path</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">|&gt;</span><span class="ocaml-source"> </span><span class="ocaml-source">input_line</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">|&gt;</span><span class="ocaml-source"> </span><span class="ocaml-source">decode_int32</span><span class="ocaml-source">
</span></code></pre>
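<p>For readers who want to experiment, here is a self-contained sketch of the staging idea. The <code>repr</code> type is a deliberately tiny, hypothetical stand-in for <code>Irmin.Type</code>, and the text-based decoder is invented for the example; only the <code>Staged</code> signature comes from the text above. The counter shows that the representation is traversed once, when the decoder is built, and not once per decoded value.</p>
<pre><code>module Staged : sig
  type +'a t
  val stage : 'a -&gt; 'a t
  val unstage : 'a t -&gt; 'a
end = struct
  type 'a t = 'a
  let stage x = x
  let unstage x = x
end

(* Hypothetical miniature of a runtime type representation. *)
type _ repr = Int : int repr | Option : 'a repr -&gt; 'a option repr

(* Counts how many times a representation is inspected. *)
let specialisations = ref 0

let rec decode : type a. a repr -&gt; (string -&gt; a) Staged.t =
 fun r -&gt;
  incr specialisations;
  match r with
  | Int -&gt; Staged.stage int_of_string
  | Option r' -&gt;
      (* Specialise the inner decoder now, outside the returned closure. *)
      let d = Staged.unstage (decode r') in
      Staged.stage (fun s -&gt; if s = "" then None else Some (d s))

let () =
  (* Hoisted out of the loop: the representation is walked here, once. *)
  let decode_opt = Staged.unstage (decode (Option Int)) in
  let outputs = List.map decode_opt [ "1"; ""; "42" ] in
  assert (outputs = [ Some 1; None; Some 42 ]);
  (* One visit for [Option] and one for [Int], whatever the input count. *)
  assert (!specialisations = 2)
</code></pre>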
<p>We made similar changes to the performance-critical generic functions in
<a href="https://mirage.github.io/irmin/irmin/Irmin/Type/index.html"><code>Irmin.Type</code></a>, and observed significant performance improvements.
We also added benchmarks for serialising various types.</p>
<div style="text-align: center;">
  <img src="/blog/images/2020-09-08-irmin-september-2020-update/staged-type~oNlS74gjoXd8T31bCOiRlA.svg" style="height: 550px; max-width: 100%" alt="Relative performance of binary codecs in Irmin.Type">
</div>
<p>Relevant PRs: <a href="https://github.com/mirage/irmin/pull/1030">irmin #1030</a> and
<a href="https://github.com/mirage/irmin/pull/1028">irmin #1028</a>.</p>
<p>There are other interesting factors at play, such as altering <code>decode</code> to
increase the efficiency of the specialised decoders; we leave this for a future
blog post.</p>
<h2>More control over <code>Index.merge</code></h2>
<p><code>index</code> regularly performs a maintenance operation, called <code>merge</code>, to ensure fast
look-ups while keeping a small memory footprint. This operation runs concurrently
with most other functions, but not with itself: a second
merge needs to wait for the previous one to finish. When big chunks of
data are written very often, <code>merge</code> operations become blocking. To help measure and
detect a blocking <code>merge</code>, we added calls to the <code>index</code> API to check whether
a merge is ongoing, and to time it.</p>
<p>We mentioned that <code>merge</code> is concurrent with most of the other functions in
<code>index</code>. One notable exception was <code>close</code>, which had to wait for any ongoing
<code>merge</code> to finish before closing the index. Now <code>close</code> interrupts an ongoing
merge, but still leaves the index in a clean state.</p>
<p>Relevant PRs: <a href="https://github.com/mirage/index/pull/185">index #185</a>,
<a href="https://github.com/mirage/irmin/pull/1049">irmin #1049</a> and
<a href="https://github.com/mirage/index/pull/215">index #215</a>.</p>
<h2>Clearing stores</h2>
<p>Another feature we recently added is the ability to <code>clear</code> the store. It is
implemented by removing the old files on disk and opening fresh ones. However,
in <code>irmin-pack</code>, the read-only instance has to detect that a clear occurred. To
do this, we added a <code>generation</code> to the header of the files used by an
<code>irmin-pack</code> store, which is increased by the clear operation. A generation
change signals to the read-only instance that it needs to close the file and
open it again, to be able to read the latest values.</p>
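<p>A minimal sketch of the generation check, using an in-memory record in place of a real file. The <code>file</code> and <code>reader</code> types and the <code>sync</code> function here are assumptions made for the example, not the <code>irmin-pack</code> implementation.</p>
<pre><code>(* Hypothetical stand-in for an on-disk file with a generation header. *)
type file = { mutable generation : int; mutable contents : string list }

(* Clearing bumps the generation and starts from an empty file. *)
let clear f =
  f.generation &lt;- f.generation + 1;
  f.contents &lt;- []

type reader = {
  file : file;
  mutable seen_generation : int;
  mutable cache : string list;
}

let open_reader file =
  { file; seen_generation = file.generation; cache = file.contents }

(* "Re-open" (here: reload the cache) only when the generation changed. *)
let sync r =
  if r.file.generation &lt;&gt; r.seen_generation then begin
    r.seen_generation &lt;- r.file.generation;
    r.cache &lt;- r.file.contents
  end

let () =
  let f = { generation = 0; contents = [ "old" ] } in
  let r = open_reader f in
  clear f;
  (* The reader is stale until it checks the header. *)
  assert (r.cache = [ "old" ]);
  sync r;
  assert (r.cache = []);
  assert (r.seen_generation = 1)
</code></pre>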
<p>As the header of the files on disk changed with the addition of the clear
operation, the <code>irmin-pack</code> stores created before this change are no longer
supported. We added a function to migrate stores created with the previous
version (version 1) to the new version (version 2) of the store. You can call
this migration function as follows:</p>
<pre><code><span class="ocaml-source"> </span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">open_store</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-unit">()</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-constant-language-capital-identifier">Store</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-constant-language-capital-identifier">Repo</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">v</span><span class="ocaml-source"> </span><span class="ocaml-source">config</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword-other">in</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-constant-language-capital-identifier">Lwt</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">catch</span><span class="ocaml-source"> </span><span class="ocaml-source">open_store</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-keyword-other">function</span><span class="ocaml-source">
</span><span class="ocaml-source">      </span><span class="ocaml-keyword-other">|</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Irmin_pack</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-constant-language-capital-identifier">Unsupported_version</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-polymorphic-variant">`V1</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source">
</span><span class="ocaml-source">          </span><span class="ocaml-constant-language-capital-identifier">Logs</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">app</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-keyword-other">fun</span><span class="ocaml-source"> </span><span class="ocaml-source">l</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-source">l</span><span class="ocaml-source"> </span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-string-quoted-double">migrating store to version 2</span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-source">)</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source">
</span><span class="ocaml-source">          </span><span class="ocaml-constant-language-capital-identifier">Store</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">migrate</span><span class="ocaml-source"> </span><span class="ocaml-source">config</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source">
</span><span class="ocaml-source">          </span><span class="ocaml-constant-language-capital-identifier">Logs</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">app</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-keyword-other">fun</span><span class="ocaml-source"> </span><span class="ocaml-source">l</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-source">l</span><span class="ocaml-source"> </span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-string-quoted-double">migration ended, opening store</span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-source">)</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source">
</span><span class="ocaml-source">          </span><span class="ocaml-source">open_store</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-unit">()</span><span class="ocaml-source">
</span><span class="ocaml-source">      </span><span class="ocaml-keyword-other">|</span><span class="ocaml-source"> </span><span class="ocaml-source">exn</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source">
</span><span class="ocaml-source">          </span><span class="ocaml-constant-language-capital-identifier">Lwt</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">fail</span><span class="ocaml-source"> </span><span class="ocaml-source">exn</span><span class="ocaml-source">)</span><span class="ocaml-source">
</span></code></pre>
<p>Relevant PRs: <a href="https://github.com/mirage/index/pull/211">index #211</a>,
<a href="https://github.com/mirage/irmin/pull/1047">irmin #1047</a>,
<a href="https://github.com/mirage/irmin/pull/1070">irmin #1070</a> and
<a href="https://github.com/mirage/irmin/pull/1071">irmin #1071</a>.</p>
<h2>Conclusion</h2>
<p>We hope you've enjoyed this discussion of our recent work. <a href="https://bsky.app/profile/tarides.com">Stay
tuned</a> for our next Tezos / MirageOS development update! Thanks
to our commercial customers, users and open-source contributors for making this
work possible.</p>
]]></description><link>https://tarides.com/blog/2020-09-08-irmin-september-2020-update</link><guid isPermaLink="false">https://tarides.com/blog/2020-09-08-irmin-september-2020-update.html</guid><dc:creator><![CDATA[ Irmin Team ]]></dc:creator><pubDate>Tue, 08 Sep 2020 00:00:00 GMT</pubDate></item><item><title><![CDATA[Introducing irmin-pack]]></title><description><![CDATA[<p><code>irmin-pack</code> is an Irmin <a href="https://irmin.org/tutorial/backend">storage backend</a>
that we developed over the last year specifically to meet the
<a href="https://tezos.gitlab.io/">Tezos</a> use-case. Tezos nodes were initially using an
LMDB-based backend for their storage, which after only a year of activity led to
<code>250 GB</code> disk space usage, with a monthly growth of <code>25 GB</code>. Our goal was to
dramatically reduce this disk space usage.</p>
<p>Part of the <a href="/blog/2019-11-21-irmin-v2/">Irmin.2.0.0 release</a>
and still under active development, it has been successfully integrated as the
storage layer of Tezos nodes and has been running in production for the last ten
months with great results. It reduces disk usage by a factor of 10, while still
ensuring similar performance and consistency guarantees in a memory-constrained
and concurrent environment.</p>
<p><code>irmin-pack</code> was presented along with Irmin v2 at the OCaml workshop 2020; you
can watch the presentation here:</p>
<div style="position: relative; width: 100%; height: 0; padding-bottom: 56.25%">
  <iframe style="position: absolute; width: 100%; height: 100%; left: 0; right: 0" src="https://www.youtube-nocookie.com/embed/v1lfMUM332w" frameborder="0" allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture" allowfullscreen="">
  </iframe>
</div>
<h2>General structure</h2>
<p><code>irmin-pack</code> exposes functors that allow the user to provide arbitrary low-level
modules for handling I/O, and provides a fast key-value store interface composed
of three components:</p>
<ul>
<li>The <code>pack</code> is used to store the data contained in the Irmin store, as blobs.</li>
<li>The <code>dict</code> stores the paths where these blobs should live.</li>
<li>The <code>index</code> keeps track of the blobs that are present in the repository by
containing location information in the <code>pack</code>.</li>
</ul>
<p>Each of these uses both on-disk storage, for persistence and concurrency, and
various in-memory caches, for speed.</p>
<h3>Storing the data in the <code>pack</code> file</h3>
<p>The <code>pack</code> contains most of the data stored in this Irmin backend. It is an
append-only file containing the serialized data stored in the Irmin repository.
All three Irmin stores (see our <a href="https://irmin.org/tutorial/architecture">architecture
page</a> in the tutorial to learn more)
are contained in this single file.</p>
<p><code>Content</code> and <code>Commit</code> serialization is straightforward through
<a href="https://docs.mirage.io/irmin/Irmin/Type/index.html"><code>Irmin.Type</code></a>. They are written along with their length (to allow
correct reading) and hash (to enable integrity checks). The hash is used to
resolve internal links inside the pack when nodes are written.</p>
<p><img sizes="(min-width: 1360px) 1360px, (min-width: 680px) 680px, 100vw" srcset="/blog/images/2020-09-01.introducing-irmin-pack/pack-170w~h8CszKfFP-WWOzfnqKDl2Q.webp 170w, /blog/images/2020-09-01.introducing-irmin-pack/pack-340w~t7sLai8BSwcvE8C5ptyH8Q.webp 340w, /blog/images/2020-09-01.introducing-irmin-pack/pack-680w~V8id6e2_34-waz1OgySPzQ.webp 680w, /blog/images/2020-09-01.introducing-irmin-pack/pack-1360w~pb7pWxbe6fV3XTU7kFKq_A.webp 1360w" src="/blog/images/2020-09-01.introducing-irmin-pack/pack-1360w~pb7pWxbe6fV3XTU7kFKq_A.webp" alt="The pack file"></p>
<h4>Optimizing large nodes</h4>
<p>Serializing nodes is not as simple as serializing contents. In fact, nodes might contain an
arbitrarily large number of children, and serializing them as a long list of
references might harm performance, as that means loading and writing a large
amount of data for each modification, no matter how small this modification
might be. Similarly, browsing the tree means reading large blocks of data, even
though only one child is needed.</p>
<p>For this reason, we implemented a <a href="https://en.wikipedia.org/wiki/Radix_tree">Patricia Tree</a> representation of
internal nodes that allows us to split the child list into smaller parts that
can be accessed and modified independently, while still being quickly available
when needed. This reduces duplication of tree data in the Irmin store and
improves disk access times.</p>
<p>Of course, we provide a custom hashing mechanism, so that hashing the nodes
using this partitioning is still backwards-compatible for users who rely on hash
information regardless of whether the node is split or not.</p>
<h4>Optimizing internal references</h4>
<p>In the Git model, all data are content-addressable (i.e. data are always
referenced by their hash). This naturally lends itself to indexing data by hashes on
the disk itself (i.e. the links from <code>commits</code> to <code>nodes</code> and from <code>nodes</code> to
<code>nodes</code> or <code>contents</code> are realized by hash).</p>
<p>We did not follow this approach in <code>irmin-pack</code>, for at least two reasons:</p>
<ul>
<li>
<p>Referencing by hash does not allow fast recovery of the children, since
there is no way to find the relevant blob directly in the <code>pack</code> by providing
the hash. We will go into the details of this later in this post.</p>
</li>
<li>
<p>Although hashes are used as simple objects, their size is not negligible.
The default hashing function in Irmin is BLAKE2B, which provides 32-byte
digests.</p>
</li>
</ul>
<p>Instead, internal links in the <code>pack</code> file are realized as the offsets –
<code>int64</code> integers – of the children rather than their hashes. Provided that the
trees are always written bottom-up (so that children already exist in the <code>pack</code>
when their parents are written), this solves both issues above. The data handled
by the backend is always immutable, and the file is append-only, ensuring that
the links can never be broken.</p>
<p>Of course, that encoding does not break the content-addressable property: one
can always retrieve an arbitrary piece of data through its hash, but it allows
internal links to avoid that indirection.</p>
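<p>To illustrate offset-based links, here is a toy append-only pack in which a node refers to its children by <code>int64</code> offset. The entry layout (one tag byte, 8 bytes per child) and the hash-table "file" are invented for this sketch and do not reflect the real on-disk format.</p>
<pre><code>(* Toy entries: a node stores the offsets of its children, not hashes. *)
type entry = Blob of string | Node of int64 list

(* [next] is the current end of the simulated append-only file. *)
type pack = { mutable next : int64; entries : (int64, entry) Hashtbl.t }

let create () = { next = 0L; entries = Hashtbl.create 16 }

(* Invented sizes: one tag byte, then the payload. *)
let size = function
  | Blob s -&gt; 1 + String.length s
  | Node children -&gt; 1 + (8 * List.length children)

(* Appending returns the offset at which the entry was written. *)
let append p e =
  let off = p.next in
  Hashtbl.add p.entries off e;
  p.next &lt;- Int64.add off (Int64.of_int (size e));
  off

let read p off = Hashtbl.find p.entries off

let () =
  let p = create () in
  (* Children are written first, so their offsets exist for the parent. *)
  let a = append p (Blob "hello") in
  let b = append p (Blob "world!") in
  let root = append p (Node [ a; b ]) in
  assert (a = 0L);
  assert (b = 6L);
  assert (root = 13L);
  match read p root with
  | Node [ x; y ] -&gt;
      assert (read p x = Blob "hello");
      assert (read p y = Blob "world!")
  | _ -&gt; assert false
</code></pre>
<p>Because entries are never moved or rewritten, an offset obtained at write time stays valid for the lifetime of the file, which is what makes following a link a direct read.</p>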
<h3>Deduplicating the path names through the <code>dict</code></h3>
<p>In fact, the most common operations when using <code>irmin-pack</code> consist of modifying
the tree's leaves rather than its shape. This is similar to the way most of us
use Git: modifying the contents of files is very frequent, while renaming or
adding new files is rather rare. Even still, when writing a <code>node</code> in a new
commit, that node must contain the path names of its children, which end up
being duplicated a large number of times.</p>
<p>The <code>dict</code> is used for deduplication of path names so that the <code>pack</code> file can
uniquely reference them using shorter identifiers. It is composed of an
in-memory bidirectional hash table, which supports queries from path to identifier
when serializing and referencing, and from identifier to path when deserializing
and dereferencing.</p>
<p>To ensure persistence of the data across multiple runs and in case of crashes,
the small size of the <code>dict</code> – less than <code>15 Mb</code> in the Tezos use-case – allows
us to write the bindings to a write-only, append-only file that is fully read
and loaded on start-up.</p>
<p>We guarantee that the <code>dict</code> memory usage is bounded by providing a <code>capacity</code>
parameter. Adding a binding is guarded by this capacity: if the limit has been
reached, the path is instead inlined in the <code>pack</code> file. This scenario does not
happen during normal use of <code>irmin-pack</code>, but prevents attacks that would make
the memory grow in an unbounded way.</p>
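<p>The in-memory part of the <code>dict</code> can be approximated by two hash tables and a capacity check. This is only a sketch under assumed names (<code>index</code>, <code>find</code>, and the record fields are hypothetical), and it leaves out the append-only file used for persistence.</p>
<pre><code>(* Bidirectional table between path names and short integer identifiers. *)
type dict = {
  capacity : int;
  to_id : (string, int) Hashtbl.t;
  to_path : (int, string) Hashtbl.t;
}

let create capacity =
  { capacity; to_id = Hashtbl.create 16; to_path = Hashtbl.create 16 }

(* [index d path] returns the identifier of [path], allocating one if there
   is room. [None] tells the caller to inline the path instead. *)
let index d path =
  match Hashtbl.find_opt d.to_id path with
  | Some id -&gt; Some id
  | None -&gt;
      let id = Hashtbl.length d.to_id in
      if id &gt;= d.capacity then None
      else begin
        Hashtbl.add d.to_id path id;
        Hashtbl.add d.to_path id path;
        Some id
      end

(* Reverse lookup, used when deserializing. *)
let find d id = Hashtbl.find_opt d.to_path id

let () =
  let d = create 2 in
  assert (index d "src" = Some 0);
  assert (index d "lib" = Some 1);
  (* A known path is deduplicated: same identifier as before. *)
  assert (index d "src" = Some 0);
  (* Capacity reached: the caller would inline this path. *)
  assert (index d "test" = None);
  assert (find d 1 = Some "lib")
</code></pre>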
<p><img sizes="(min-width: 1360px) 1360px, (min-width: 680px) 680px, 100vw" srcset="/blog/images/2020-09-01.introducing-irmin-pack/dict-170w~virqmhdHSnnub2iAZlCawg.webp 170w, /blog/images/2020-09-01.introducing-irmin-pack/dict-340w~2EFe9wrX28wPdV93fK-sXA.webp 340w, /blog/images/2020-09-01.introducing-irmin-pack/dict-680w~uqnde1t6zci1Lk2F2DSHSA.webp 680w, /blog/images/2020-09-01.introducing-irmin-pack/dict-1360w~ae2wTXdcTNLVRE_OpjlGFw.webp 1360w" src="/blog/images/2020-09-01.introducing-irmin-pack/dict-1360w~ae2wTXdcTNLVRE_OpjlGFw.webp" alt="The dict"></p>
<h3>Retrieve the data in the <code>pack</code> by indexing</h3>
<p>Since the <code>pack</code> file is append-only, naively reading its data would require a
linear search through the whole file for each lookup. Instead, we provide an
index that maps hashes of data blocks to their location in the <code>pack</code> file,
along with their length. This module allows quick recovery of the values queried
by hash.</p>
<p>It provides a simple key-value interface that actually hides the most complex
part of <code>irmin-pack</code>.</p>
<pre><code><span class="ocaml-keyword-other">type</span><span class="ocaml-source"> </span><span class="ocaml-source">t</span><span class="ocaml-source">
</span><span class="ocaml-keyword-other">val</span><span class="ocaml-source"> </span><span class="ocaml-source">v</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source"> </span><span class="ocaml-source">readonly</span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-support-type">bool</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-source">path</span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-support-type">string</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-source">t</span><span class="ocaml-source">
</span><span class="ocaml-source">
</span><span class="ocaml-keyword-other">val</span><span class="ocaml-source"> </span><span class="ocaml-source">find</span><span class="ocaml-source">    </span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source"> </span><span class="ocaml-source">t</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Key</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">t</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Value</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">t</span><span class="ocaml-source">
</span><span class="ocaml-keyword-other">val</span><span class="ocaml-source"> </span><span class="ocaml-source">replace</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source"> </span><span class="ocaml-source">t</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Key</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">t</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Value</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">t</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-support-type">unit</span><span class="ocaml-source">
</span><span class="ocaml-comment-block">(*</span><span class="ocaml-comment-block"> ... </span><span class="ocaml-comment-block">*)</span><span class="ocaml-source">
</span></code></pre>
<p>It has driven most of our efforts in the development of <code>irmin-pack</code> and is now
available as a separate library, wisely named <code>index</code>, that you can check out on
GitHub under <a href="https://github.com/mirage/index/">mirage/index</a> and via <code>opam</code> as
the <code>index</code> and <code>index-unix</code> packages.</p>
<p>When <code>index</code> is used inside <code>irmin-pack</code>, the keys are the hashes of the data
stored in the backend, and the values are the <code>(offset, length)</code> pair that
indicates the location in the <code>pack</code> file. From now on in this post, we will
stick to the <code>index</code> abstraction: <code>key</code> and <code>value</code> will refer to the keys and
values as viewed by the <code>index</code>.</p>
<p>Our index is split into two major parts. The <code>log</code> is relatively small, and most
importantly, bounded; it contains the recently-added bindings. The <code>data</code> is
much larger, and contains older bindings.</p>
<p>The <code>log</code> part consists of a hash table associating keys to values. To
support concurrent access, and to be able to recover after a crash, we also maintain
a write-only, append-only file with the same contents, such that both always
contain exactly the same data at any time.</p>
<p>When a new key-value binding is added to the index, the value is simply serialized
along with its key and added to the <code>log</code>.</p>
<p>An obvious caveat of this approach is that the in-memory representation of the
<code>log</code> (the hash table) is unbounded. It also grows a lot, as the Tezos node
stores more than 400 million objects. Our memory constraint obviously does not
allow such unbounded structures. This is where the <code>data</code> part comes in.</p>
<p>When the <code>log</code> size reaches a – customizable – threshold, its bindings are
flushed into a <code>data</code> component, that may already contain flushed data from
former <code>log</code> overloads. We call this operation a <em>merge</em>.</p>
<p><img sizes="(min-width: 1360px) 1360px, (min-width: 680px) 680px, 100vw" srcset="/blog/images/2020-09-01.introducing-irmin-pack/merges-170w~cw_iiaT-t5ruYQwVRI6vUw.webp 170w, /blog/images/2020-09-01.introducing-irmin-pack/merges-340w~U_n90Ku2OaSXRxOmA_2uPQ.webp 340w, /blog/images/2020-09-01.introducing-irmin-pack/merges-680w~tNhf0bkxxQ39MWIYxLSylg.webp 680w, /blog/images/2020-09-01.introducing-irmin-pack/merges-1360w~DVz8ugsL3yAS6uTOigXwrg.webp 1360w" src="/blog/images/2020-09-01.introducing-irmin-pack/merges-1360w~DVz8ugsL3yAS6uTOigXwrg.webp" alt="Merging the index"></p>
<p>The important invariant maintained by the <code>merge</code> operation is that the <code>data</code>
file must remain sorted by the hash of the bindings. This will enable a fast
recovery of the data.</p>
<p>During this operation, both the <code>log</code> and the former <code>data</code> are read in sorted
order – <code>data</code> is already sorted, and <code>log</code> is small thus easy to sort in
memory – and merged into a <code>merging_data</code> file. This file is atomically renamed
at the end of the operation to replace the older <code>data</code> while still ensuring
correct concurrent accesses.</p>
<p>This operation obviously needs to re-write the whole index, so its execution
is very expensive. For this reason, it is performed by a separate thread in the
background to still allow regular use of the index and be transparent to the
user.</p>
<p>In the meantime, a <code>log_async</code> – similar to <code>log</code>, with a file and a hash table
– is used to hold new bindings and ensure the data being merged and the new data
are correctly separated. At the end of the merge, the <code>log_async</code> becomes the
new <code>log</code> and is cleared to be ready for the next merge.</p>
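<p>To make the <code>log</code>/<code>data</code> split concrete, here is a minimal in-memory sketch of the merge. It is an illustration under simplifying assumptions, not the code of the <code>index</code> library: integer keys stand in for hashes, an array stands in for the sorted <code>data</code> file, the merge runs synchronously (so there is no <code>log_async</code>), and the names <code>create</code>, <code>merge_sorted</code> and <code>log_limit</code> are hypothetical.</p>
<pre><code class="language-ocaml">(* Toy model: int keys stand in for hashes, strings for values. *)
type index = {
  log : (int, string) Hashtbl.t;        (* recent bindings, kept in memory *)
  mutable data : (int * string) array;  (* older bindings, sorted by key *)
  log_limit : int;                      (* hypothetical merge threshold *)
}

let create ~log_limit = { log = Hashtbl.create 16; data = [||]; log_limit }

(* Merge two key-sorted lists; on equal keys the newer binding wins. *)
let rec merge_sorted newer older =
  match newer, older with
  | [], l | l, [] -&gt; l
  | (k1, _ as b1) :: t1, (k2, _ as b2) :: t2 -&gt;
    if k1 &lt; k2 then b1 :: merge_sorted t1 older
    else if k1 &gt; k2 then b2 :: merge_sorted newer t2
    else b1 :: merge_sorted t1 t2

(* Flush the log into data, keeping data sorted, then empty the log. *)
let merge t =
  let log_sorted =
    Hashtbl.fold (fun k v acc -&gt; (k, v) :: acc) t.log []
    |&gt; List.sort (fun (a, _) (b, _) -&gt; compare a b)
  in
  t.data &lt;- Array.of_list (merge_sorted log_sorted (Array.to_list t.data));
  Hashtbl.reset t.log

(* Add a binding to the log and merge once the threshold is reached. *)
let replace t k v =
  Hashtbl.replace t.log k v;
  if Hashtbl.length t.log &gt;= t.log_limit then merge t
</code></pre>
<p>With <code>log_limit</code> set to 2, adding two distinct keys triggers a merge, and a key that is written again later keeps only its most recent value in <code>data</code>.</p>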
<h4>Recovering the data</h4>
<p>This design gives us fast lookups of the data present in the index. Whenever
<code>find</code> or <code>mem</code> is called, we first look into the <code>log</code>, which is simply a call
to the corresponding <code>Hashtbl</code> function, since this data is contained in memory.
If the data is not found in the <code>log</code>, we search the <code>data</code> file. This
means access to recent values is generally faster, because it does not require
any access to the disk.</p>
<p>Searching in the <code>data</code> file is made efficient by the invariant that we kept
during the <code>merge</code>: the file is sorted by hash. The search algorithm is
an interpolation search, which is made possible by the even distribution of the
hashes that we store. The average complexity of interpolation search on evenly
distributed keys is <code>O(log (log n))</code>, which is generally better than a binary
search, provided that the computation of the interpolant is cheaper than reads,
which is the case here.</p>
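<p>The search itself can be sketched as follows. This is a toy version over a sorted array of integers rather than hashes stored in a file, and the function name is an assumption made for the example. It also ignores integer overflow, which a version working on full-width hashes would have to handle.</p>
<pre><code class="language-ocaml">(* Return the position of [key] in the sorted array [a], if present. *)
let interpolation_search (a : int array) (key : int) : int option =
  let rec go lo hi =
    if lo &gt; hi || key &lt; a.(lo) || key &gt; a.(hi) then None
    else if a.(lo) = a.(hi) then Some lo  (* key is squeezed between equal bounds *)
    else
      (* Guess the position, assuming keys are evenly spread
         between a.(lo) and a.(hi). *)
      let pos = lo + (key - a.(lo)) * (hi - lo) / (a.(hi) - a.(lo)) in
      if a.(pos) = key then Some pos
      else if a.(pos) &lt; key then go (pos + 1) hi
      else go lo (pos - 1)
  in
  if Array.length a = 0 then None else go 0 (Array.length a - 1)
</code></pre>
<p>Each step narrows the range around the guessed position instead of halving it, which is what makes the search so effective when keys are evenly distributed.</p>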
<p>This approach allows us to find the data using approximately 5-6 reading steps
in the file, which is good, but still a source of slowdowns. For this reason, we
use a fan-out module on top of the interpolation search, able to tell us the
exact page in which a given key is located, in constant time, for an additional
space cost of <code>~100 Mb</code>. We use this to find the correct page of the disk, then
run the interpolation search in that page only. That approach allows us to find
the correct value with a single read in the <code>data</code> file.</p>
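<p>One plausible way to build such a fan-out, sketched here as an assumption rather than a description of the actual module, is to index the sorted entries by the leading bits of their key. The table then gives, in constant time, the range of positions that can contain a key. The constants <code>key_bits</code> and <code>fan_bits</code> and the function names are hypothetical.</p>
<pre><code class="language-ocaml">let key_bits = 16  (* hypothetical key width for this toy example *)
let fan_bits = 4   (* number of leading bits used by the fan-out *)

(* The leading [fan_bits] bits of a key. *)
let prefix k = k lsr (key_bits - fan_bits)

(* fan.(p) is the position of the first key whose prefix is at least p. *)
let build_fanout (sorted : int array) : int array =
  let n = 1 lsl fan_bits in
  let fan = Array.make (n + 1) 0 in
  (* Count the keys in each prefix bucket... *)
  Array.iter (fun k -&gt; let p = prefix k in fan.(p + 1) &lt;- fan.(p + 1) + 1) sorted;
  (* ...then turn the counts into starting positions. *)
  for p = 1 to n do fan.(p) &lt;- fan.(p) + fan.(p - 1) done;
  fan

(* Half-open range [lo, hi) of positions that can contain [key]. *)
let range fan key =
  let p = prefix key in
  (fan.(p), fan.(p + 1))
</code></pre>
<p>A lookup would first call <code>range</code>, then run the interpolation search only inside the returned range.</p>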
<h2>Conclusion</h2>
<p>This new backend is now used by the Tezos nodes in production, and manages to
reduce the storage size from <code>250 Gb</code> down to <code>25 Gb</code>, with a monthly growth
rate of <code>2 Gb</code>, achieving a tenfold reduction.</p>
<p>In the meantime, it provides a single-writer, multiple-readers access pattern
that enables bakers and clients to connect to the same storage while it is operated.</p>
<p>On the memory side, all our components are memory bounded, and the bound is
generally customizable, the largest source of memory usage being the <code>log</code> part
of the <code>index</code>. While it can be reduced to fit in <code>1 Gb</code> of memory and run on
a small VPS or a Raspberry Pi, one can easily set a higher memory limit on a more
powerful machine, and achieve even better time performance.</p>
]]></description><link>https://tarides.com/blog/2020-09-01-introducing-irmin-pack</link><guid isPermaLink="false">https://tarides.com/blog/2020-09-01-introducing-irmin-pack.html</guid><dc:creator><![CDATA[ Clément Pascutto ]]></dc:creator><pubDate>Tue, 01 Sep 2020 00:00:00 GMT</pubDate></item><item><title><![CDATA[Fuzzing OCamlFormat with AFL and Crowbar]]></title><description><![CDATA[<p><a href="https://lcamtuf.coredump.cx/afl/">AFL</a> (and fuzzing in general) is often used
to find bugs in low-level code like parsers, but it also works very well to find
bugs in high level code, provided the right ingredients. We applied this
technique to feed random programs to OCamlFormat and found many formatting bugs.</p>
<p>OCamlFormat is a tool to format source code. To do so, it parses the source code
to an Abstract Syntax Tree (AST) and then applies formatting rules to the AST.</p>
<p>It can be tricky to correctly format the output. For example, say we want to
format <code>(a+b)*c</code>. The corresponding AST will look like <code>Apply("*", Apply ("+", Var "a", Var "b"), Var "c")</code>. A naive formatter would look like this:</p>
<pre><code><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-keyword">rec </span><span class="ocaml-entity-name-function-binding">format</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">function</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword-other">|</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Var</span><span class="ocaml-source"> </span><span class="ocaml-source">s</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-source">s</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword-other">|</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Apply</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-source">op</span><span class="ocaml-keyword-other-ocaml punctuation-comma punctuation-separator">,</span><span class="ocaml-source"> </span><span class="ocaml-source">e1</span><span class="ocaml-keyword-other-ocaml punctuation-comma punctuation-separator">,</span><span class="ocaml-source"> </span><span class="ocaml-source">e2</span><span class="ocaml-source">)</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source">
</span><span class="ocaml-source">      </span><span class="ocaml-constant-language-capital-identifier">Printf</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">sprintf</span><span class="ocaml-source"> </span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-constant-character-printf">%s</span><span class="ocaml-string-quoted-double"> </span><span class="ocaml-constant-character-printf">%s</span><span class="ocaml-string-quoted-double"> </span><span class="ocaml-constant-character-printf">%s</span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-source">format</span><span class="ocaml-source"> </span><span class="ocaml-source">e1</span><span class="ocaml-source">)</span><span class="ocaml-source"> </span><span class="ocaml-source">op</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-source">format</span><span class="ocaml-source"> </span><span class="ocaml-source">e2</span><span class="ocaml-source">)</span><span class="ocaml-source">
</span></code></pre>
<p>But this is not correct, as it will print <code>(a+b)*c</code> as <code>a+b*c</code>, which is a
different program. In this particular case, the common solution would be to
track the relative precedence of the expressions and to emit only necessary
parentheses.</p>
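<p>As a sketch of that common solution (for the toy AST above, not OCamlFormat's actual code), each operator gets a precedence level and a sub-expression is parenthesised only when it binds less tightly than its context. The precedence table and the name <code>format_prec</code> are assumptions made for this example.</p>
<pre><code class="language-ocaml">type expr = Var of string | Apply of string * expr * expr

(* Hypothetical precedence table: higher binds tighter. *)
let prec = function "*" | "/" -&gt; 2 | "+" | "-" -&gt; 1 | _ -&gt; 0

(* [ctx] is the minimum precedence allowed without parentheses. *)
let rec format_prec ctx = function
  | Var s -&gt; s
  | Apply (op, e1, e2) -&gt;
    let p = prec op in
    (* Operators are treated as left-associative: the left operand may
       share the operator's level, the right one must bind strictly tighter. *)
    let s =
      Printf.sprintf "%s %s %s" (format_prec p e1) op (format_prec (p + 1) e2)
    in
    if p &lt; ctx then "(" ^ s ^ ")" else s

let format e = format_prec 0 e
</code></pre>
<p>With this version, the AST for <code>(a+b)*c</code> is printed as <code>(a + b) * c</code>, while the AST for <code>a+(b*c)</code> is printed as <code>a + b * c</code> without redundant parentheses.</p>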
<p>OCamlFormat has similar cases. To make sure we do not change a program when
formatting it, there is an extra check at the end to parse the output and
compare the output AST with the input AST. This ensures that, in case of bugs,
OCamlFormat exits with an error rather than changing the meaning of the input
program.</p>
<p>When we consider the whole OCaml language, the rules are complex and it is
difficult to make sure that we are correctly handling all programs. There are
two main failure modes: either we put too many parentheses, and the program does
not look good, or we do not put enough, and the AST changes (and OCamlFormat
exits with an error). We need a way to make sure that the latter does not
happen. Tests work to some extent, but some edge cases happen only when a
certain combination of language features is used. Because of this combinatorial
explosion, it is impossible to get good coverage using tests only.</p>
<p>Fortunately there is a technique we can use to automatically explore the program
space: fuzzing. For a primer on using this technique on OCaml programs, one can
refer to <a href="/blog/2019-09-04-an-introduction-to-fuzzing-ocaml-with-afl-crowbar-and-bun/">this article</a>.</p>
<p>To make this work we need two elements: a random program generator, and a
property to check. Here, we are interested in programs that are valid (in the
sense that they parse correctly) but do not format correctly. We can use the
OCamlFormat internals to do the following:</p>
<ol>
<li>try to parse input: in case of a parse error, just reject this input as
invalid.</li>
<li>otherwise, we have a valid program: try to format it. If this happens with
no error at all, reject this input as well.</li>
<li>otherwise, it means that the AST changed, comments moved, or something
similar, in a valid program. This is what we are after.</li>
</ol>
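<p>The three steps above amount to a small classification function. In this sketch, <code>parse</code> and <code>format</code> are hypothetical stand-ins for the OCamlFormat internals, passed in as arguments so that the example is self-contained; the <code>verdict</code> type is also an assumption.</p>
<pre><code class="language-ocaml">type verdict = Reject | Interesting

(* [parse] and [format] are hypothetical stand-ins for OCamlFormat internals. *)
let check ~parse ~format (input : string) : verdict =
  match parse input with
  | Error _ -&gt; Reject             (* step 1: not a valid program *)
  | Ok ast -&gt;
    (match format ast with
     | Ok _ -&gt; Reject             (* step 2: formatted without any error *)
     | Error _ -&gt; Interesting)    (* step 3: valid program, formatting failed *)
</code></pre>
<p>In a fuzzing setup, the <code>Interesting</code> case would typically be reported to the fuzzer as a crash, for example by raising an exception.</p>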
<p>Generating random programs is a bit more difficult. We can feed random strings
to AFL, but even with a corpus of existing valid code it will generate many
invalid programs. We are not interested in these for this project; we would
rather start from valid programs.</p>
<p>A good way to do that is to use Crowbar to directly generate AST values. Thanks
to <a href="https://github.com/yomimono/ppx_deriving_crowbar"><code>ppx_deriving_crowbar</code></a> and <a href="https://github.com/ocaml-ppx/ppx_import"><code>ppx_import</code></a>
it is possible to generate random values for an external type like
<code>Parsetree.structure</code> (the contents of <code>.ml</code> files). Even more fortunately
<a href="https://github.com/yomimono/ocaml-test-omp/blob/d086037027537ba4e23ce027766187979c85aa3d/test/parsetree_405.ml">somebody already did the work</a>. Thanks, Mindy!</p>
<p>This approach works really well: it generates 5k-10k programs per second, which
is very good performance (AFL starts complaining below 100/s).</p>
<p>Quickly, AFL was able to find crashes related to attributes. These are "labels"
attached to various nodes of the AST. For example the expression <code>(x || y) [@a]</code>
(logical or between <code>x</code> and <code>y</code>, attach attribute <code>a</code> to the "or" expression)
would get formatted as <code>x || y [@a]</code> (attribute <code>a</code> is attached to the <code>y</code>
variable). Once again, there is a check in place in OCamlFormat to make sure
that it does not save the file in this case, but it would exit with an error.</p>
<p>After the fuzzer had run for a bit longer, it found crashes where comments would
jump around in expressions like <code>f (*a*) (*bb*) x</code>. Wait, what? We never told
the program generator how to generate comments. Inspecting the intermediate AST,
the part in the middle is actually an integer literal with value <code>"(*a*) (*bb*)"</code> (integer literals are represented as strings so that <a href="https://github.com/Drup/Zarith-ppx">a third party
library could add literals for arbitrary precision numbers</a> for
example).</p>
<p>AFL comes with a program called <code>afl-tmin</code> that is used to minimize a crash. It
will try to find a smaller example of a program that crashes OCamlFormat. It
works well even with Crowbar in between. For example it is able to turn <code>(new aaaaaa &amp; [0;0;0;0])[@aaaaaaaaaa]</code> into <code>(0&amp;0)[@a]</code> (neither AFL nor OCamlFormat
knows about types, so they can operate on nonsensical programs. Finding a
well-typed version of a crash is usually not very difficult, but it has to be
done manually).</p>
<p>In total, letting AFL run overnight on a single core (which is relatively short
in fuzzing terms) produced 453 crashes. After minimization and deduplication,
this corresponded to <a href="https://github.com/ocaml-ppx/ocamlformat/issues?q=label%3Afuzz">about 30 unique issues</a>.</p>
<p>Most of them are related to attributes that OCamlFormat did not try to include
in the output, or where it forgot to add parentheses. Fortunately, there are
safeguards in OCamlFormat: since it checks that the formatting preserves the AST
structure, it will exit with an error instead of outputting a different program.</p>
<p>Once again, fuzzing has proved itself as a powerful technique to find actual
bugs (including high-level ones). A possible approach for a next iteration is to
try to detect more problems during formatting, such as finding cases where lines
are longer than allowed. It is also possible to extend the random program
generator so that it tries to generate comments, and let OCamlFormat check that
they are all laid out correctly in the output. We look forward to employing
fuzzing more extensively for OCamlFormat development in future.</p>
]]></description><link>https://tarides.com/blog/2020-08-03-fuzzing-ocamlformat-with-afl-and-crowbar</link><guid isPermaLink="false">https://tarides.com/blog/2020-08-03-fuzzing-ocamlformat-with-afl-and-crowbar.html</guid><dc:creator><![CDATA[ Etienne Millon ]]></dc:creator><pubDate>Mon, 03 Aug 2020 00:00:00 GMT</pubDate></item><item><title><![CDATA[The future of Tezos on MirageOS]]></title><description><![CDATA[<p>We are very glad to announce that Tarides has been awarded two new grants from
the Tezos Foundation.</p>
<p>Thanks to these new grants, Tarides will continue to work on the integration
between Tezos and MirageOS. We believe that the secure deployment of blockchains
is still a major challenge today, and that deploying Tezos as a unikernel will
have a big impact in terms of safety and security. It will be a key
differentiator that will separate Tezos from other blockchains.</p>
<p>The Tezos codebase is written in OCaml and is currently using more than 100
external packages, among which one third comes from the MirageOS project.
However, it still heavily depends on non-compatible Unix libraries. Making the
Tezos codebase fully compatible with MirageOS will help Tezos with: distribution
and packaging, portability, secure deployment and operational safety.</p>
<p>We’ll regularly publish development progress updates, so stay tuned!</p>
]]></description><link>https://tarides.com/blog/2020-04-20-the-future-of-tezos-on-mirageos</link><guid isPermaLink="false">https://tarides.com/blog/2020-04-20-the-future-of-tezos-on-mirageos.html</guid><dc:creator><![CDATA[ Céline Laplassotte ]]></dc:creator><pubDate>Mon, 20 Apr 2020 00:00:00 GMT</pubDate></item><item><title><![CDATA[Tarides wins the FIC 2020 startup award]]></title><description><![CDATA[<p>We are very excited to announce that Tarides has <a href="https://www.forum-fic.com/en/home/price/the-fic-start-up-award.htm">won an
award</a>
from the International Cybersecurity Forum (FIC 2020).</p>
<p>Organized every year in Lille (France), the International
Cybersecurity Forum has become the leading European event on
cybersecurity and digital trust. Its main goal is to foster reflection
and exchanges within the European cybersecurity ecosystem.</p>
<p>We are very happy to have won the "Coup de Coeur" Prize, which will
bring great visibility to our technological innovations. It is also an
opportunity for us to meet experts in the cybersecurity sector and to
consider additional use-cases for our work. We would like to thank the
<a href="https://ceis.eu/">CEIS</a> for organising the event and the members of
the jury for commending <a href="https://mirage.io">MirageOS</a> and
<a href="/blog/2019-07-05-i-lab-2019/">OSMOSE</a>.</p>
<p>The next FIC will be held on the 28th, 29th and 30th of January
2020 in Lille. For more details about the FIC and to register, visit
<a href="https://www.forum-fic.com/en/home.htm">their website</a>.</p>
<br>
<p><img sizes="(min-width: 1360px) 1360px, (min-width: 680px) 680px, 100vw" srcset="/blog/images/FIC2020-170w~YqPYbTkVvkD0Idtj697OBA.webp 170w, /blog/images/FIC2020-340w~tRyyJQxUoPx2ehUKf_F__w.webp 340w, /blog/images/FIC2020-680w~64XLirtRvCNHLgh3lo1pbA.webp 680w, /blog/images/FIC2020-1360w~gKrdJ1qCjJMlViSKgQT-Fw.webp 1360w" src="/blog/images/FIC2020-1360w~gKrdJ1qCjJMlViSKgQT-Fw.webp" alt="FIC2020 Startup Award Winners"></p>
]]></description><link>https://tarides.com/blog/2019-12-11-tarides-wins-the-fic-2020-startup-award</link><guid isPermaLink="false">https://tarides.com/blog/2019-12-11-tarides-wins-the-fic-2020-startup-award.html</guid><dc:creator><![CDATA[ Céline Laplassotte ]]></dc:creator><pubDate>Wed, 11 Dec 2019 00:00:00 GMT</pubDate></item><item><title><![CDATA[MirageOS talk at the Paris Open Source Summit]]></title><description><![CDATA[<p>We are thrilled to have been selected by the <a href="https://www.opensourcesummit.paris">Paris Open Source Summit</a>
committee to talk about “Secure-by-design IoT applications using MirageOS”.</p>
<p>The Paris Open Source Summit is an annual event where you can connect to
open-source communities and learn from tech leaders, project committers and
CTOs about the latest technical solutions, innovative uses and societal
challenges of open digital technology.</p>
<p>Thomas Gazagnaire, Tarides CEO/CTO, will explain what makes MirageOS a good
framework to build IoT applications and how we can use embedded devices running
on ARMv8, ESP32 or RISC-V to run secure and end-to-end open-source
infrastructure services such as VPN proxies and email servers. He will also
highlight how this infrastructure will be used to form the basis of OSMOSE: a
secure, distributed and privacy-preserving platform to write user-centric IoT
applications.</p>
<p>MirageOS is a library operating system (using the MIT license) which enables the
construction of unikernels: specialized services where the runtime binary
contains only the necessary code for execution and no more. Unikernels have a
drastically smaller attack surface than service deployments in traditional
operating systems and could lead to 1000x less code for the full application
stack. Moreover, as MirageOS is written in a memory safe language (OCaml), a
full class of bugs related to memory corruption – representing <a href="https://msrc-blog.microsoft.com/2019/07/16/a-proactive-approach-to-more-secure-code/">70% of the
released CVEs</a> in classic operating systems written in C –
can no longer appear. These two properties combined (and more!) allow MirageOS
to build “secure-by-design” applications where everything – from the high-level
business logic to the low-level device drivers – has been designed to be as
secure as possible.</p>
<p>To learn more about the project, attend the Paris Open Source Summit! The talk
will take place during the '<a href="https://www.opensourcesummit.paris/EMBEDDED+%26+IOT_168_5745.html">Embedded &amp; IOT</a>' section
at 14:50 – 15:20 on December 10th, 2019.</p>
<br>
<p><img sizes="(min-width: 1360px) 1360px, (min-width: 680px) 680px, 100vw" srcset="/blog/images/poss_2019-170w~rPUTk87ZXeXcHrJ4PuT7wQ.webp 170w, /blog/images/poss_2019-340w~4yLZdlN0-sO7O3OSUs2oyg.webp 340w, /blog/images/poss_2019-680w~eSLFxuuIbP4lQLKcSlXrsA.webp 680w, /blog/images/poss_2019-1360w~fzLe5OhlNUuRE3NQYtzHcQ.webp 1360w" src="/blog/images/poss_2019-1360w~fzLe5OhlNUuRE3NQYtzHcQ.webp" alt="Thomas at the Paris Open Source Summit"></p>
]]></description><link>https://tarides.com/blog/2019-12-04-mirageos-talk-at-the-paris-open-source-summit</link><guid isPermaLink="false">https://tarides.com/blog/2019-12-04-mirageos-talk-at-the-paris-open-source-summit.html</guid><dc:creator><![CDATA[ Céline Laplassotte ]]></dc:creator><pubDate>Wed, 04 Dec 2019 00:00:00 GMT</pubDate></item><item><title><![CDATA[Introducing the GraphQL API for Irmin 2.0]]></title><description><![CDATA[<p>With the release of Irmin 2.0.0, we are happy to announce a new package - <code>irmin-graphql</code>, which can be used to serve data from Irmin over HTTP. This blog post will give you some examples to help you get started, there is also <a href="https://irmin.org/tutorial/graphql">a section in the <code>irmin-tutorial</code></a> with similar information. To avoid writing the same thing twice, this post will cover the basics of getting started, plus a few interesting ideas for queries.</p>
<p>Getting the <code>irmin-graphql</code> server running from the command-line is easy:</p>
<pre><code><span class="shell-source">$ irmin graphql --root=/tmp/irmin
</span></code></pre>
<p>where <code>/tmp/irmin</code> is the actual path to your repository. This will start the server on <code>localhost:8080</code>, but it's possible to customize this using the <code>--address</code> and <code>--port</code> flags.</p>
<p>The new GraphQL API has been added to address some of the shortcomings that have been identified with the old HTTP API, as well as enable a number of new features and capabilities.</p>
<h2>GraphQL</h2>
<p><a href="https://graphql.org/">GraphQL</a> is a query language for exposing data as a graph via an API, typically using HTTP as a transport. The centerpiece of a GraphQL API is the <em>schema</em>, which describes the graph in terms of types and relationships between these types. The schema is accessible by the consumer, and acts as a contract between the API and the consumer, by clearly defining all API operations and fully assigning types to all interactions.</p>
<p>Viewing Irmin data as a graph turns out to be a natural and useful model. Concepts such as branches and commits fit in nicely, and the stored application data is organized as a tree. Such highly hierarchical data can be challenging to interact with using REST, but is easy to represent and navigate with GraphQL.</p>
<p><img sizes="(min-width: 1360px) 1360px, (min-width: 680px) 680px, 100vw" srcset="/blog/images/2019-11-27.irmin-graphql/git-data-model-170w~Y6yJErnSXrqZ2MqhMEvM7Q.webp 170w, /blog/images/2019-11-27.irmin-graphql/git-data-model-340w~1t7CykqqysCpzD-u5Et2sg.webp 340w, /blog/images/2019-11-27.irmin-graphql/git-data-model-680w~F1Cymeopub1w8TjURN25cw.webp 680w, /blog/images/2019-11-27.irmin-graphql/git-data-model-1360w~jO7qDWYDECRq18vd5ObQaA.webp 1360w" src="/blog/images/2019-11-27.irmin-graphql/git-data-model-1360w~jO7qDWYDECRq18vd5ObQaA.webp" alt="Git data model">
(image from <a href="https://git-scm.com/book/en/v2/Git-Internals-Git-Objects">Pro Git</a>)</p>
<p>As a consumer of an API, one of the biggest initial challenges is understanding what operations are exposed and how to use them. Conversely, as a developer of an API, keeping documentation up-to-date is challenging and time-consuming. Though no substitute for more free-form documentation, a GraphQL schema provides an excellent baseline for understanding a GraphQL API that is guaranteed to be accurate and up-to-date. This was definitely an issue with the old Irmin HTTP API, which was hard to approach for newcomers due to lack of documentation.</p>
<p>Being able to inspect the schema of a GraphQL API enables powerful tooling. A great example of this is <a href="https://github.com/graphql/graphiql">GraphiQL</a>, which is a browser-based IDE for GraphQL queries. GraphiQL can serve both as an interactive API explorer and query designer with intelligent autocompletion, formatting and more.</p>
<p><img sizes="(min-width: 1360px) 1360px, (min-width: 680px) 680px, 100vw" srcset="/blog/images/2019-11-27.irmin-graphql/graphiql-170w~WblRl7_eWRmrdn3GOn9n0A.webp 170w, /blog/images/2019-11-27.irmin-graphql/graphiql-340w~bisWpN1HrEp_Aqy9U8guZw.webp 340w, /blog/images/2019-11-27.irmin-graphql/graphiql-680w~pdUw0EHJBbsguVbQGBCbJA.webp 680w, /blog/images/2019-11-27.irmin-graphql/graphiql-1360w~BvZnlRfmhBHBhJ-wi0xeMg.webp 1360w" src="/blog/images/2019-11-27.irmin-graphql/graphiql-1360w~BvZnlRfmhBHBhJ-wi0xeMg.webp" alt="GraphiQL"></p>
<p>The combination of introspection and a strongly typed schema also allows creating smart clients using code generation. This is already a quite wide-spread idea with <a href="https://www.apollographql.com/docs/ios">Apollo for iOS</a>, <a href="https://github.com/apollographql/apollo-android">Apollo for Android</a> or <a href="https://github.com/mhallin/graphql_ppx"><code>graphql_ppx</code></a> for OCaml/Reason. Though generic GraphQL client libraries will do a fine job interacting with the Irmin GraphQL API, these highlighted libraries will offer excellent ergonomics and type-safety out of the box.</p>
<p>One of the problems that GraphQL set out to solve is that of over- and underfetching. When designing REST API response payloads, there is always a tension between including too little data, which will require clients to make more network requests, and including too much data, which wastes resources for both client and server (serialization, network transfer, deserialization, etc).<br>
The existing low-level Irmin HTTP API is a perfect example of this. Fetching the contents of a particular file on the master branch requires at least 4 HTTP requests (fetch the branch, fetch the commit, fetch the tree, fetch the blob), i.e. massive underfetching. By comparison, this is something easily solved with a single request to the new GraphQL API. More generally, the GraphQL API allows you to fetch <em>exactly</em> the data you need in a single request without making one-off endpoints.</p>
<p>For the curious, here's the GraphQL query to fetch the contents of <code>README.md</code> from the branch <code>master</code>:</p>
<pre><code class="language-graphql">query {
  master {
    tree {
      get(key: "README.md")
    }
  }
}
</code></pre>
<p>The response will look something like this:</p>
<pre><code class="language-json">{
  "data": {
    "master": {
      "tree": {
        "get": "The contents of README.md"
      }
    }
  }
}
</code></pre>
<p>The GraphQL API is not limited to reading data; you can also write data to your Irmin store. Here's a simple example that will set the key <code>README.md</code> to <code>"foo"</code>, and return the hash of that commit:</p>
<pre><code class="language-graphql">mutation {
  set(key: "README.md", value: "foo") {
    hash
  }
}
</code></pre>
<p>By default, GraphQL allows you to do multiple operations in a single query, so you get bulk operations for free. Here's a more complex example that modifies two different branches, <code>branch-a</code> and <code>branch-b</code>, and then merges <code>branch-b</code> into <code>branch-a</code> <em>all in a single query</em>:</p>
<pre><code class="language-graphql">mutation {
  branch_a: set(branch: "branch-a", key: "foo", value: "bar") {
    hash
  }

  branch_b: set(branch: "branch-b", key: "baz", value: "qux") {
    hash
  }

  merge_with_branch(branch: "branch-a", from: "branch-b") {
    hash
    tree {
      list_contents_recursively {
        key
        value
      }
    }
  }
}
</code></pre>
<p>Here's what the response might look like:</p>
<pre><code class="language-json">{
  "data": {
    "branch_a": {
      "hash": "0a1313ae9dfe1d4339aee946dd76b383e02949b6"
    },
    "branch_b": {
      "hash": "28855c277671ccc180c81058a28d3254f17d2f7b"
    },
    "merge_with_branch": {
      "hash": "7b17437a16a858816d2710a94ccaa1b9c3506d1f",
      "tree": {
        "list_contents_recursively": [
          {
            "key": "/foo",
            "value": "bar"
          },
          {
            "key": "/baz",
            "value": "qux"
          }
        ]
      }
    }
  }
}
</code></pre>
<p>Overall, the new GraphQL API operates at a much higher level than the old HTTP API, and offers a number of complex operations that were tricky to accomplish before.</p>
<h2>Customizable</h2>
<p>With GraphQL, all request and response data is fully described by the schema. Because Irmin allows the user to have custom content types, this leaves the question of what type to assign to such values. By default, the GraphQL API will expose all values as strings, i.e. the serialized version of the data that your application stores. This works quite well when Irmin is used as a simple key-value store, but it can be a very inconvenient scheme when storing more complex values. As an example, consider storing contacts (name, email, phone, tags, etc.) in your Irmin store, where values have the following type:</p>
<pre><code><span class="ocaml-comment-block">(*</span><span class="ocaml-comment-block"> Custom content type: a contact </span><span class="ocaml-comment-block">*)</span><span class="ocaml-source">
</span><span class="ocaml-keyword-other">type</span><span class="ocaml-source"> </span><span class="ocaml-source">contact</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-source">{</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-source">name</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source"> </span><span class="ocaml-support-type">string</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-source">email</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source"> </span><span class="ocaml-support-type">string</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-comment-block">(*</span><span class="ocaml-comment-block"> ... </span><span class="ocaml-comment-block">*)</span><span class="ocaml-source">
</span><span class="ocaml-source">}</span><span class="ocaml-source">
</span></code></pre>
<p>By default, such a value is returned to the client as its JSON-encoded representation. Assume we're storing a contact under the key <code>john-doe</code>, which we fetch with the following query:</p>
<pre><code class="language-graphql">query {
  master {
    tree {
      get(key: "john-doe")
    }
  }
}
</code></pre>
<p>The response would then look something like this:</p>
<pre><code class="language-json">{
  "master": {
    "tree": {
      "get": "{\"name\":\"John Doe\", \"email\": \"john.doe@gmail.com\", ...}"
    }
  }
}
</code></pre>
<p>The client will have to parse this JSON string and cannot choose to only fetch parts of the value (say, only the email). Optimally we would want the client to get a structured response such as the following:</p>
<pre><code class="language-json">{
  "master": {
    "tree": {
      "get": {
        "name": "John Doe",
        "email": "john.doe@gmail.com",
        ...
      }
    }
  }
}
</code></pre>
<p>To achieve this, the new GraphQL API allows providing an "output type" and an "input type" for most of the configurable types in your store (<code>contents</code>, <code>key</code>, <code>metadata</code>, <code>hash</code>, <code>branch</code>). The output type specifies how data is presented to the client, while the input type controls how data can be provided by the client. Let's take a closer look at specifying a custom output type.</p>
<p>Essentially you have to construct a value of type <code>(unit, 'a option) Graphql_lwt.Schema.typ</code> (from the <a href="https://github.com/andreas/ocaml-graphql-server"><code>graphql-lwt</code></a> package), assuming your content type is <code>'a</code>. We could construct a GraphQL object type for our example content type <code>contact</code> as follows:</p>
<pre><code><span class="ocaml-comment-block">(*</span><span class="ocaml-comment-block"> (unit, contact option) Graphql_lwt.Schema.typ </span><span class="ocaml-comment-block">*)</span><span class="ocaml-source">
</span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">contact_schema_typ</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Graphql_lwt</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-constant-language-capital-identifier">Schema</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">(</span><span class="ocaml-source">obj</span><span class="ocaml-source"> </span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-string-quoted-double">Contact</span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-source">
</span><span class="ocaml-source">  ~</span><span class="ocaml-source">fields</span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source">(</span><span class="ocaml-keyword-other">fun</span><span class="ocaml-source"> </span><span class="ocaml-constant-language">_</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-source">[</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-source">field</span><span class="ocaml-source"> </span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-string-quoted-double">name</span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-source">
</span><span class="ocaml-source">      ~</span><span class="ocaml-source">typ</span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source">(</span><span class="ocaml-source">non_null</span><span class="ocaml-source"> </span><span class="ocaml-support-type">string</span><span class="ocaml-source">)</span><span class="ocaml-source">
</span><span class="ocaml-source">      ~</span><span class="ocaml-source">args</span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-constant-language-list">[]</span><span class="ocaml-source">
</span><span class="ocaml-source">      ~</span><span class="ocaml-source">resolve</span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source">(</span><span class="ocaml-keyword-other">fun</span><span class="ocaml-source"> </span><span class="ocaml-constant-language">_</span><span class="ocaml-source"> </span><span class="ocaml-source">contact</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source">
</span><span class="ocaml-source">        </span><span class="ocaml-source">contact</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">name</span><span class="ocaml-source">
</span><span class="ocaml-source">      </span><span class="ocaml-source">)</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-comment-block">(*</span><span class="ocaml-comment-block"> ... more fields </span><span class="ocaml-comment-block">*)</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-source">]</span><span class="ocaml-source">)</span><span class="ocaml-source">
</span><span class="ocaml-source">)</span><span class="ocaml-source">
</span></code></pre>
<p>To use the custom type, you need to instantiate the functor <code>Irmin_unix.Graphql.Server.Make_ext</code> (assuming you're deploying to a Unix target) with an Irmin store (type <code>Irmin.S</code>) and a custom types module (type <code>Irmin_graphql.Server.CUSTOM_TYPES</code>). This requires a bit of plumbing:</p>
<pre><code><span class="ocaml-comment-block">(*</span><span class="ocaml-comment-block"> Instantiate the Irmin functor somehow </span><span class="ocaml-comment-block">*)</span><span class="ocaml-source">
</span><span class="ocaml-keyword-other">module</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">S</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Irmin</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-constant-language-capital-identifier">S</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">with</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">type</span><span class="ocaml-source"> </span><span class="ocaml-source">contents</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-source">contact</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-comment-block">(*</span><span class="ocaml-comment-block"> ... </span><span class="ocaml-comment-block">*)</span><span class="ocaml-source">
</span><span class="ocaml-source">
</span><span class="ocaml-comment-block">(*</span><span class="ocaml-comment-block"> Custom GraphQL presentation module </span><span class="ocaml-comment-block">*)</span><span class="ocaml-source">
</span><span class="ocaml-keyword-other">module</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Custom_types</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">struct</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-comment-block">(*</span><span class="ocaml-comment-block"> Construct default GraphQL types </span><span class="ocaml-comment-block">*)</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword-other">module</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Defaults</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Irmin_graphql</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-constant-language-capital-identifier">Server</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-constant-language-capital-identifier">Default_types</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-constant-language-capital-identifier">S</span><span class="ocaml-source">)</span><span class="ocaml-source">
</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-comment-block">(*</span><span class="ocaml-comment-block"> Use the default types for most things </span><span class="ocaml-comment-block">*)</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword-other">module</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Key</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Defaults</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-constant-language-capital-identifier">Key</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword-other">module</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Metadata</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Defaults</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-constant-language-capital-identifier">Metadata</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword-other">module</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Hash</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Defaults</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-constant-language-capital-identifier">Hash</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword-other">module</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Branch</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Defaults</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-constant-language-capital-identifier">Branch</span><span class="ocaml-source">
</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-comment-block">(*</span><span class="ocaml-comment-block"> Use custom output type for contents </span><span class="ocaml-comment-block">*)</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword-other">module</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Contents</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">struct</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-keyword-other">include</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Defaults</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-constant-language-capital-identifier">Contents</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">schema_typ</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-source">contact_schema_typ</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword-other">end</span><span class="ocaml-source">
</span><span class="ocaml-keyword-other">end</span><span class="ocaml-source">
</span><span class="ocaml-source">
</span><span class="ocaml-keyword-other">module</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Remote</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">struct</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">remote</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Some</span><span class="ocaml-source"> </span><span class="ocaml-source">s</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">remote</span><span class="ocaml-source">
</span><span class="ocaml-keyword-other">end</span><span class="ocaml-source">
</span><span class="ocaml-source">
</span><span class="ocaml-keyword-other">module</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">GQL</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Irmin_unix</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-constant-language-capital-identifier">Graphql</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-constant-language-capital-identifier">Server</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-constant-language-capital-identifier">Make_ext</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-constant-language-capital-identifier">S</span><span class="ocaml-source">)</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-constant-language-capital-identifier">Remote</span><span class="ocaml-source">)</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-constant-language-capital-identifier">Custom_types</span><span class="ocaml-source">)</span><span class="ocaml-source">
</span></code></pre>
<p>With this in hand, we can now query specifically for the email of <code>john-doe</code>:</p>
<pre><code class="language-graphql">query {
  master {
    tree {
      get(key: "john-doe") {
        email
      }
    }
  }
}
</code></pre>
<p>... and get a nicely structured JSON response back:</p>
<pre><code class="language-json">{
  "master": {
    "tree": {
      "get": {
        "email": "john.doe@gmail.com"
      }
    }
  }
}
</code></pre>
<p>Custom types are very powerful and open the door to transforming or enriching the data at query time, e.g. geocoding the address of a contact, or checking an online status.</p>
<h2>Watches</h2>
<p>A core feature of Irmin is the ability to <em>watch</em> for changes to the underlying data store in real time. <code>irmin-graphql</code> takes advantage of GraphQL subscriptions to expose Irmin watches. Subscriptions are a relatively recent addition to the GraphQL spec (<a href="https://github.com/graphql/graphql-spec/releases/tag/June2018">June 2018</a>), which allow clients to <em>subscribe</em> to changes. These changes are pushed to the client over a suitable transport mechanism, e.g. websockets, Server-Sent Events, or a chunked HTTP response, as a regular GraphQL response.</p>
<p>As an example, the following query watches for all changes and returns the new hash:</p>
<pre><code class="language-graphql">subscription {
  watch {
    commit {
      hash
    }
  }
}
</code></pre>
<p>For every change, a message like the following will be sent:</p>
<pre><code class="language-json">{
  "watch": {
    "commit": {
      "hash": "c01a59bacc16d89e9cdd344a969f494bb2698d8f"
    }
  }
}
</code></pre>
<p>Under the hood, subscriptions in <code>irmin-graphql</code> are implemented using Irmin watches, but this is opaque to the client -- this will work with any GraphQL spec compliant client!</p>
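<p>The push model can be pictured with a toy example: every registered subscriber is a callback, and each change to the store sends one message to all of them. This is only a sketch of the idea using the OCaml standard library. It is not Irmin's watch API or the <code>irmin-graphql</code> implementation, and every name in it is hypothetical.</p>
<pre><code class="language-ocaml">(* Toy model of watch-based subscriptions; NOT Irmin's actual API. *)
type commit = { hash : string }

(* Registered subscribers: each callback receives a serialized message. *)
let subscribers : (string -&gt; unit) list ref = ref []

let subscribe push = subscribers := push :: !subscribers

(* Render a commit in the shape of the message shown above. *)
let message c =
  Printf.sprintf "{\"watch\": {\"commit\": {\"hash\": \"%s\"}}}" c.hash

(* Called whenever the store changes: every subscriber gets one message. *)
let notify c = List.iter (fun push -&gt; push (message c)) !subscribers

let () =
  let received = ref [] in
  subscribe (fun m -&gt; received := m :: !received);
  notify { hash = "c01a59bacc16d89e9cdd344a969f494bb2698d8f" };
  List.iter print_endline !received
</code></pre>
<p>In the real server the callback would write to a websocket instead of appending to a list, but the client-facing contract is the same: one GraphQL response per change.</p>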
<p>Here's a video that shows how the GraphQL response changes live as the Irmin store is being manipulated:</p>
<p><video controls="" width="680"><source src="/blog/images/2019-11-27-introducing-irmin-graphql/irmin-subscriptions~MdxrUrEyHklnwgR7dfGIAQ.mp4" type="video/mp4"></video></p>
<p>Note that the current implementation only supports websockets, with more transport options coming soon.</p>
<h2>Wrap-up</h2>
<p>Irmin 2.0 ships with a powerful new GraphQL API that makes it much easier to interact with Irmin over the network. This makes Irmin available for many more languages and contexts, not just applications using OCaml (or JavaScript). The new API operates at a much higher level than the old API, and offers advanced features such as "bring your own GraphQL types", and watching for changes via GraphQL subscriptions.</p>
<p>We're looking forward to seeing what you'll build with it!</p>
]]></description><link>https://tarides.com/blog/2019-11-27-introducing-the-graphql-api-for-irmin-2-0</link><guid isPermaLink="false">https://tarides.com/blog/2019-11-27-introducing-the-graphql-api-for-irmin-2-0.html</guid><dc:creator><![CDATA[ Andreas Garnaes ]]></dc:creator><pubDate>Wed, 27 Nov 2019 00:00:00 GMT</pubDate></item><item><title><![CDATA[Irmin v2]]></title><description><![CDATA[<p>We are pleased to announce <a href="https://github.com/mirage/irmin/releases">Irmin
2.0.0</a>, a major release of the
Git-like distributed branching and storage substrate that underpins
<a href="https://mirage.io">MirageOS</a>.  We began the release process for all the
components that make up Irmin <a href="/blog/2019-05-13-on-the-road-to-irmin-v2/">back in May
2019</a>, and there
have been close to 1000 commits since Irmin 1.4.0 was released back in June 2018. To
celebrate this milestone, we have a new logo and opened a dedicated website:
<a href="https://irmin.org">irmin.org</a>.</p>
<p>Our focus this year has been on ensuring the production success of our
early adopters -- such as the
<a href="https://gitlab.com/tezos/tezos/tree/master/src/lib_storage">Tezos</a> blockchain
and the <a href="https://github.com/moby/datakit">Datakit 9P</a>
stack -- as well as spawning new research projects into the practical
application of distributed and mergeable data stores.  We are also
very pleased to welcome several new maintainers into the Mirage
project for their contributions to Irmin, namely
<a href="https://github.com/icristescu">Ioana Cristescu</a>,
<a href="https://github.com/CraigFe">Craig Ferguson</a>,
<a href="https://github.com/andreas">Andreas Garnaes</a>,
<a href="https://github.com/pascutto">Clément Pascutto</a> and
<a href="https://github.com/zshipko">Zach Shipko</a>.</p>
<h2>New Major Features</h2>
<h3>New CLI</h3>
<p>While Irmin is normally used as a library, it is obviously useful to
be able to interact with a data store from a shell.  The <code>irmin-unix</code>
opam package now provides an <code>irmin</code> binary that is configured via a
Yaml file and can perform queries and mutations against a Git store.</p>
<pre><code><span class="shell-source">$ </span><span class="shell-support-function-builtin">echo</span><span class="shell-source"> </span><span class="shell-punctuation-definition-string-begin">"</span><span class="shell-string-quoted-double">root: .</span><span class="shell-punctuation-definition-string-end">"</span><span class="shell-source"> </span><span class="shell-keyword-operator-redirect">&gt;</span><span class="shell-source"> irmin.yml
</span><span class="shell-source">$ irmin init
</span><span class="shell-source">$ irmin </span><span class="shell-support-function-builtin">set</span><span class="shell-source"> foo/bar </span><span class="shell-punctuation-definition-string-begin">"</span><span class="shell-string-quoted-double">testing 123</span><span class="shell-punctuation-definition-string-end">"</span><span class="shell-source">
</span><span class="shell-source">$ irmin get foo/bar
</span></code></pre>
<p>Try <code>irmin --help</code> to see all the commands and options available.</p>
<h3>Tezos and irmin-pack</h3>
<p>Another big user of Irmin is the <a href="https://tezos.com">Tezos blockchain</a>,
and we have been optimising the persistent space usage of Irmin as their
network grows.  Because Tezos doesn’t require full Git format support,
we created a hybrid backend that grabs the best bits of Git (e.g. the
packfile mechanism) and engineered a domain-specific backend tailored
for Tezos usage. Crucially, because of the way Irmin is split into
clean libraries and OCaml modules, we only had to modify a small part
of the codebase and could also reuse elements of our
<a href="https://github.com/mirage/ocaml-git">OCaml-git</a> codebase as well.</p>
<p>The <a href="https://github.com/mirage/irmin/pull/615">irmin-pack backend</a> is available
for <a href="https://github.com/mirage/irmin/pull/888">use in the CLI</a> and provides a
significant improvement in disk usage.  There is a corresponding <a href="https://gitlab.com/tezos/tezos/merge_requests/1268">Tezos merge
request</a> using the Irmin
2.0 code that has been integrated downstream and will become available via
their release process in due course.</p>
<p>As part of this development process, we also released an efficient multi-level
index implementation (imaginatively dubbed
<a href="https://github.com/mirage/index">index</a> in opam). Our implementation takes an
arbitrary IO implementation and user-supplied content types and supplies a
standard key-value interface for persistent storage. Index provides instance
sharing by default, so each OCaml runtime shares a common singleton instance.</p>
<h3>Irmin-GraphQL and “browser Irmin”</h3>
<p>Another new area of huge interest to us is
<a href="https://graphql.org">GraphQL</a> in order to provide frontends with a rich
query language for Irmin-hosted applications.  Irmin 2.0 includes a
built-in GraphQL server so you can <a href="https://twitter.com/cuvius/status/1017136581755457539">manipulate your Git repo via
GraphQL</a>.</p>
<p>If you are interested in (for example) compiling elements of Irmin to
JavaScript or wasm, for usage in frontends, then the Irmin 2.0 release
makes it significantly easier to support this architecture.  We’ve
already seen some exploratory efforts <a href="https://github.com/mirage/irmin/issues/681">report issues</a>
when doing this, and we’ve had it working ourselves in <a href="https://roscidus.com/blog/blog/2015/04/28/cuekeeper-gitting-things-done-in-the-browser/">Irmin 1.0 Cuekeeper</a>
so we are excited by the potential power of applications built using
this model.  If you have ideas/questions, please get in touch on the
<a href="https://github.com/mirage/irmin/issues">issue tracker</a> with your
use case.</p>
<h3>Wodan</h3>
<p>Irmin’s storage layer is also well abstracted, so backends other than
a Unix filesystem or Git are supported.  Irmin can run in highly
diverse and OS-free environments, and so we began engineering the
<a href="https://github.com/mirage/wodan">Wodan filesystem</a> as a
domain-specific filesystem designed for MirageOS, Irmin and modern
flash drives.  See <a href="https://g2p.github.io/research/wodan.pdf">the OCaml Workshop 2017 abstract on
it</a> for more design
rationale.</p>
<p>As part of the Irmin 2.0 release, Wodan is also being prepared for a
release, and you can find <a href="https://github.com/mirage/wodan/tree/master/src/wodan-irmin">Irmin 2.0
support</a>
in the source.  If you’d like a standalone block-device based
persistence environment for Irmin, please try this out.  This is the
preferred backend for using Irmin storage in a unikernel.</p>
<h3>Versioned CalDAV</h3>
<p>An application pulling all these pieces together is being developed
by our friends at <a href="https://robur.io/About%20Us/Team">Robur</a>: an Irmin-based
<a href="https://github.com/roburio/caldav">CalDAV calendaring server</a>
that even hosts its DNS server using a versioned Irmin store.  We'll
blog more about this as the components get released and stabilised, but
the unikernel enthusiasts among you may want to browse the
<a href="https://github.com/roburio/unikernels/tree/future">Robur unikernels future branch</a>
to see how they are deploying them today.</p>
<p>A huge thank you to all our commercial customers, end users and open-source
developers who have contributed their time, expertise and
financial support to help us achieve our goal of delivering a modern
storage stack in the spirit of Git.  Our next steps for Irmin are to
continue to increase the performance and optimise the storage,
and to build more end-to-end applications using the application core
on top of MirageOS.</p>
]]></description><link>https://tarides.com/blog/2019-11-21-irmin-v2</link><guid isPermaLink="false">https://tarides.com/blog/2019-11-21-irmin-v2.html</guid><dc:creator><![CDATA[ Thomas Gazagnaire ]]></dc:creator><pubDate>Thu, 21 Nov 2019 00:00:00 GMT</pubDate></item><item><title><![CDATA[Mr. MIME - Parse and generate emails]]></title><description><![CDATA[<p>We're glad to announce the first release of <a href="https://github.com/mirage/mrmime.git"><code>mrmime</code></a>, a parser and a
generator of emails. This library provides an <em>OCaml way</em> to analyze and craft
an email. The eventual goal is to build an entire <em>unikernel-compatible</em> stack
for email (such as SMTP or IMAP).</p>
<p>In this article, we will show what is currently possible with <code>mrmime</code> and
present a few of the useful libraries that we developed along the way.</p>
<h2>An email parser</h2>
<p>Some years ago, Romain gave <a href="https://www.youtube.com/watch?v=kQkRsNEo25k">a talk</a> about what an email really <em>is</em>.
Behind the human-comprehensible format (or <em>rich-document</em> as we said a
long time ago), there are several details of emails which complicate the process of
analyzing them (and can be prone to security lapses). These details are mostly described
by three RFCs:</p>
<ul>
<li><a href="https://tools.ietf.org/html/rfc822">RFC822</a></li>
<li><a href="https://tools.ietf.org/html/rfc2822">RFC2822</a></li>
<li><a href="https://tools.ietf.org/html/rfc5322">RFC5322</a></li>
</ul>
<p>Even though they are cross-compatible, providing full legacy email parsing is an
archaeological exercise: each RFC retains support for the older design decisions
(which were not recognized as bad or ugly in 1970 when they were first standardized).</p>
<p>The latest email-related RFC (RFC5322) tried to fix the issue and provide a better
<a href="https://tools.ietf.org/html/rfc5234">formal specification</a> of the email format – but of course, it comes with plenty of
<em>obsolete</em> rules which need to be implemented. In the standard, you find
both the current grammar rule and its obsolete equivalent.</p>
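<p>To illustrate what "current rule plus obsolete equivalent" means in practice, here is a minimal sketch for the time zone of a date field. RFC5322 defines the current form as a sign followed by four digits, and keeps the named zones (<code>GMT</code>, <code>EST</code>, ...) as obsolete alternatives. The function name <code>parse_zone</code> is hypothetical and this is not how <code>mrmime</code> is implemented; it only uses the standard library and does not check that hours and minutes are in range.</p>
<pre><code class="language-ocaml">(* Offset in minutes east of UTC for a zone token, or None if unrecognised. *)
let parse_zone (s : string) : int option =
  let is_digit c = match c with '0' .. '9' -&gt; true | _ -&gt; false in
  let numeric () =
    (* Current rule: a sign followed by four digits (hours, then minutes). *)
    if String.length s = 5
       &amp;&amp; (s.[0] = '+' || s.[0] = '-')
       &amp;&amp; is_digit s.[1] &amp;&amp; is_digit s.[2]
       &amp;&amp; is_digit s.[3] &amp;&amp; is_digit s.[4]
    then
      let hours = int_of_string (String.sub s 1 2) in
      let minutes = int_of_string (String.sub s 3 2) in
      let total = (hours * 60) + minutes in
      Some (if s.[0] = '-' then -total else total)
    else None
  in
  match String.uppercase_ascii s with
  (* Obsolete rule: named zones kept for backward compatibility. *)
  | "UT" | "GMT" -&gt; Some 0
  | "EST" -&gt; Some (-5 * 60)
  | "EDT" -&gt; Some (-4 * 60)
  | "CST" -&gt; Some (-6 * 60)
  | "CDT" -&gt; Some (-5 * 60)
  | "MST" -&gt; Some (-7 * 60)
  | "MDT" -&gt; Some (-6 * 60)
  | "PST" -&gt; Some (-8 * 60)
  | "PDT" -&gt; Some (-7 * 60)
  | _ -&gt; numeric ()
</code></pre>
<p>Every grammar rule with an obsolete counterpart needs this kind of double handling, which is why the parser grows quickly.</p>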
<h3>An extended email parser</h3>
<p>Even though the email format can be defined by "only" 3 RFCs, stopping there means you will
miss email internationalization (<a href="https://tools.ietf.org/html/rfc6532">RFC6532</a>), the MIME format
(<a href="https://tools.ietf.org/html/rfc2045">RFC2045</a>, <a href="https://tools.ietf.org/html/rfc2046">RFC2046</a>, <a href="https://tools.ietf.org/html/rfc2047">RFC2047</a>,
<a href="https://tools.ietf.org/html/rfc2049">RFC2049</a>), or certain details needed to be interoperable with SMTP
(<a href="https://tools.ietf.org/html/rfc5321">RFC5321</a>). There are still more RFCs which add extra features
to the email format such as S/MIME or the Content-Disposition field.</p>
<p>Given this complexity, we took the most general RFCs and tried to provide an easy way to deal
with them. The main difficulty is the <em>multipart</em> parser, which deals with email
attachments (anyone who has tried to make an HTTP 1.1 parser knows about this).</p>
<h3>A realistic email parser</h3>
<p>Respecting the rules described by RFCs is not enough to be able to analyze any
email from the real world: existing email generators can, and do, produce
<em>non-compliant</em> email. We stress-tested <code>mrmime</code> by feeding it a batch of 2
billion emails taken from the wild, to see if it could parse everything (even if
it does not produce the expected result). Whenever we noticed a recurring
formatting mistake, we updated the details of the <a href="https://tools.ietf.org/html/rfc5234">ABNF</a> to enable
<code>mrmime</code> to parse it anyway.</p>
<h3>A parser usable by others</h3>
<p>One demonstration of the usability of <code>mrmime</code> is <a href="https://github.com/dinosaure/ocaml-dkim.git"><code>ocaml-dkim</code></a>, which needs to
extract a specific field from your mail and then verify that the hash and signature
are as expected.</p>
<p><code>ocaml-dkim</code> uses the latest implementation of <a href="https://github.com/mirage/ocaml-dns.git"><code>ocaml-dns</code></a> to request
the public keys needed to verify an email.</p>
<p>The most important question about <code>ocaml-dkim</code> is: is it able to
verify your email in one pass? Indeed, currently some implementations of DKIM
need 2 passes to verify your email (one to extract the DKIM signature, the other
to digest some fields and bodies). We focused on verifying in a <em>single</em> pass in
order to provide a unikernel SMTP <em>relay</em> with no need to store your email between
verification passes.</p>
<h2>An email generator</h2>
<p>OCaml is a good language for making little DSLs for specialized use-cases. In this
case, we took advantage of OCaml to allow the user to easily craft an email from
nothing.</p>
<p>The idea is to build an OCaml value describing the desired email header, and
then let the Mr. MIME generator transform this into a stream of characters that
can be consumed by, for example, an SMTP implementation. The description step
is quite simple:</p>
<pre><code><span class="ocaml-keyword-other">#</span><span class="ocaml-keyword-other">require</span><span class="ocaml-source"> </span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-string-quoted-double">mrmime</span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-source"> ;;</span><span class="ocaml-source">
</span><span class="ocaml-keyword-other">#</span><span class="ocaml-keyword-other">require</span><span class="ocaml-source"> </span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-string-quoted-double">ptime.clock.os</span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-source"> ;;</span><span class="ocaml-source">
</span><span class="ocaml-source">
</span><span class="ocaml-keyword-other">open</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Mrmime</span><span class="ocaml-source">
</span><span class="ocaml-source">
</span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">romain_calascibetta</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-keyword">open</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Mailbox</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">in</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-constant-language-capital-identifier">Local</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">[</span><span class="ocaml-source"> </span><span class="ocaml-source">w</span><span class="ocaml-source"> </span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-string-quoted-double">romain</span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source"> </span><span class="ocaml-source">w</span><span class="ocaml-source"> </span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-string-quoted-double">calascibetta</span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-source"> </span><span class="ocaml-source">]</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">@</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Domain</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">(</span><span class="ocaml-source">domain</span><span class="ocaml-keyword-other-ocaml punctuation-comma punctuation-separator">,</span><span class="ocaml-source"> </span><span class="ocaml-source">[</span><span class="ocaml-source"> </span><span class="ocaml-source">a</span><span class="ocaml-source"> </span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-string-quoted-double">gmail</span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source"> </span><span class="ocaml-source">a</span><span class="ocaml-source"> </span><span class="ocaml-string-quoted-double">"</span><span 
class="ocaml-string-quoted-double">com</span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-source"> </span><span class="ocaml-source">]</span><span class="ocaml-source">)</span><span class="ocaml-source">
</span><span class="ocaml-source">
</span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">john_doe</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-keyword">open</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Mailbox</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">in</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-constant-language-capital-identifier">Local</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">[</span><span class="ocaml-source"> </span><span class="ocaml-source">w</span><span class="ocaml-source"> </span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-string-quoted-double">john</span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-source"> </span><span class="ocaml-source">]</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">@</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Domain</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">(</span><span class="ocaml-source">domain</span><span class="ocaml-keyword-other-ocaml punctuation-comma punctuation-separator">,</span><span class="ocaml-source"> </span><span class="ocaml-source">[</span><span class="ocaml-source"> </span><span class="ocaml-source">a</span><span class="ocaml-source"> </span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-string-quoted-double">doe</span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source"> </span><span class="ocaml-source">a</span><span class="ocaml-source"> </span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-string-quoted-double">org</span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-source"> </span><span class="ocaml-source">]</span><span class="ocaml-source">)</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword-operator">|&gt;</span><span class="ocaml-source"> </span><span class="ocaml-source">with_name</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Phrase</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">(</span><span class="ocaml-source">v</span><span class="ocaml-source"> </span><span class="ocaml-source">[</span><span class="ocaml-source"> </span><span class="ocaml-source">w</span><span class="ocaml-source"> </span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-string-quoted-double">John</span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source"> </span><span class="ocaml-source">w</span><span class="ocaml-source"> </span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-string-quoted-double">D.</span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-source"> </span><span class="ocaml-source">]</span><span class="ocaml-source">)</span><span class="ocaml-source">
</span><span class="ocaml-source">
</span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">now</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-unit">()</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-keyword">open</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Date</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">in</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-source">of_ptime</span><span class="ocaml-source"> ~</span><span class="ocaml-source">zone</span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-constant-language-capital-identifier">Zone</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-constant-language-capital-identifier">GMT</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-constant-language-capital-identifier">Ptime_clock</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">now</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-unit">()</span><span class="ocaml-source">)</span><span class="ocaml-source">
</span><span class="ocaml-source">
</span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">subject</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-constant-language-capital-identifier">Unstructured</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">[</span><span class="ocaml-source"> </span><span class="ocaml-source">v</span><span class="ocaml-source"> </span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-string-quoted-double">A</span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source"> </span><span class="ocaml-source">sp</span><span class="ocaml-source"> </span><span class="ocaml-constant-numeric-decimal-integer">1</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source"> </span><span class="ocaml-source">v</span><span class="ocaml-source"> </span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-string-quoted-double">Simple</span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source"> </span><span class="ocaml-source">sp</span><span class="ocaml-source"> </span><span class="ocaml-constant-numeric-decimal-integer">1</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source"> </span><span class="ocaml-source">v</span><span class="ocaml-source"> </span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-string-quoted-double">Mail</span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-source"> </span><span class="ocaml-source">]</span><span class="ocaml-source">
</span><span class="ocaml-source">
</span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">header</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-keyword">open</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Header</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">in</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-constant-language-capital-identifier">Field</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">(</span><span class="ocaml-constant-language-capital-identifier">Subject</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">$</span><span class="ocaml-source"> </span><span class="ocaml-source">subject</span><span class="ocaml-source">)</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword-operator">&amp;</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Field</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">(</span><span class="ocaml-constant-language-capital-identifier">Sender</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">$</span><span class="ocaml-source"> </span><span class="ocaml-source">romain_calascibetta</span><span class="ocaml-source">)</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword-operator">&amp;</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Field</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">(</span><span class="ocaml-constant-language-capital-identifier">To</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">$</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Address</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">[</span><span class="ocaml-source"> </span><span class="ocaml-source">mailbox</span><span class="ocaml-source"> </span><span class="ocaml-source">john_doe</span><span class="ocaml-source"> </span><span class="ocaml-source">]</span><span class="ocaml-source">)</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword-operator">&amp;</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Field</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">(</span><span class="ocaml-constant-language-capital-identifier">Date</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">$</span><span class="ocaml-source"> </span><span class="ocaml-source">now</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-unit">()</span><span class="ocaml-source">)</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword-operator">&amp;</span><span class="ocaml-source"> </span><span class="ocaml-source">empty</span><span class="ocaml-source">
</span><span class="ocaml-source">
</span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">stream</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Header</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">to_stream</span><span class="ocaml-source"> </span><span class="ocaml-source">header</span><span class="ocaml-source">
</span><span class="ocaml-source">
</span><span class="ocaml-keyword-other">let</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-unit">()</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-keyword">rec </span><span class="ocaml-entity-name-function-binding">go</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-unit">()</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-keyword-other">match</span><span class="ocaml-source"> </span><span class="ocaml-source">stream</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-unit">()</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">with</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-keyword-other">|</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Some</span><span class="ocaml-source"> </span><span class="ocaml-source">buf</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-source">print_string</span><span class="ocaml-source"> </span><span class="ocaml-source">buf</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source"> </span><span class="ocaml-source">go</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-unit">()</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-keyword-other">|</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">None</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-unit">()</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword-other">in</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-source">go</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-unit">()</span><span class="ocaml-source">
</span></code></pre>
<p>This code produces the following header:</p>
<pre><code>Date: 2 Aug 2019 14:10:10 GMT
To: John "D." &lt;john@doe.org&gt;
Sender: romain.calascibetta@gmail.com
Subject: A Simple Mail
</code></pre>
<h3>78-character rule</h3>
<p>Email and SMTP come with historical rules about how messages should be
generated. One of them limits the number of bytes per line: a mail generator
should emit at most 80 bytes per line (78 characters plus the terminating CRLF)
and, of course, it should emit the email line by line.</p>
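<p>As a rough illustration of the rule, here is a minimal sketch that folds a list of words into lines of at most 78 characters, starting each continuation line with a space. The function <code>fold</code> is a hypothetical helper built on the standard library only, not the encoder used by <code>mrmime</code>, and it assumes that no single word is longer than the limit.</p>
<pre><code class="language-ocaml">(* Join [words] with spaces, starting a new folded line (CRLF + one space)
   whenever the next word would push the current line past [limit]. *)
let fold ?(limit = 78) (words : string list) : string =
  let buf = Buffer.create 128 in
  let column = ref 0 in
  List.iter
    (fun word -&gt;
      let len = String.length word in
      if !column = 0 then begin
        (* First word of the output: no separator needed. *)
        Buffer.add_string buf word;
        column := len
      end
      else if !column + 1 + len &lt;= limit then begin
        Buffer.add_char buf ' ';
        Buffer.add_string buf word;
        column := !column + 1 + len
      end
      else begin
        (* The continuation line begins with one space, hence [1 + len]. *)
        Buffer.add_string buf "\r\n ";
        Buffer.add_string buf word;
        column := 1 + len
      end)
    words;
  Buffer.contents buf
</code></pre>
<p>For example, <code>fold ~limit:10 ["Subject:"; "A"; "Simple"; "Mail"]</code> keeps <code>Subject: A</code> on the first line and moves each of the remaining words to its own continuation line.</p>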
<p>So <code>mrmime</code> has its own encoder, which tries to wrap your mail within this limit.
It was mostly inspired by <a href="https://github.com/inhabitedtype/faraday">Faraday</a> and <a href="https://caml.inria.fr/pub/docs/manual-ocaml/libref/Format.html">Format</a>, combined with
GADTs to easily describe how to encode/generate parts of an email.</p>
<h3>A multipart email generator</h3>
<p>Of course, the main point of a generator is to be able to produce a multipart
email, if only to send file attachments. A lot of work
went into making parts, giving them specific <code>Content-Type</code>
fields and merging them into one email.</p>
<p>In the end, you can easily make a stream from the result, which respects the rules (78 characters
per line, emitted line by line), and feed it directly into an SMTP implementation.</p>
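<p>To give an idea of what "merging parts into one email" involves, here is a minimal sketch of the multipart layout described by RFC2046: each part is introduced by a <code>--boundary</code> line and the whole body is closed by <code>--boundary--</code>. The <code>part</code> type and the <code>multipart</code> function are hypothetical names for this illustration and are not the <code>mrmime</code> API; bodies are assumed to be already encoded and not to contain the boundary.</p>
<pre><code class="language-ocaml">(* A part is a Content-Type value and an already-encoded body. *)
type part = { content_type : string; body : string }

(* Serialise [parts] as a multipart/mixed entity using [boundary]. *)
let multipart ~(boundary : string) (parts : part list) : string =
  let buf = Buffer.create 256 in
  (* Every line of the output is terminated by CRLF. *)
  let line s = Buffer.add_string buf s; Buffer.add_string buf "\r\n" in
  line (Printf.sprintf "Content-Type: multipart/mixed; boundary=\"%s\"" boundary);
  line "";
  List.iter
    (fun p -&gt;
      line ("--" ^ boundary);
      line ("Content-Type: " ^ p.content_type);
      line "";
      line p.body)
    parts;
  (* The closing delimiter carries two extra dashes. *)
  line ("--" ^ boundary ^ "--");
  Buffer.contents buf
</code></pre>
<p>A real generator also has to pick a boundary that cannot appear in any part and apply the line-length rule to each part, which is where most of the work lies.</p>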
<p>This is what we did with the project <a href="https://github.com/dinosaure/facteur"><code>facteur</code></a>. It's a little
command-line tool, written in pure OCaml, to send mails with file attachments, although for now it
works only on UNIX operating systems.</p>
<h2>Behind the forest</h2>
<p>Even if you are able to parse and generate an email, more work is needed to get the expected results.</p>
<p>Indeed, email is a unit of exchange between people, and the biggest challenge is
to find a common way to ensure that correspondents can understand each other.
Here, encoding is probably the most important piece: a French person may want
to communicate with a <em>latin1</em> encoding while an American person still uses ASCII.</p>
<h3>Rosetta</h3>
<p>To address this problem, we chose to unify all contents to UTF-8, the
most general encoding available. We wrote some libraries which map an encoded flow
to Unicode code points, and we use <code>uutf</code> (thanks to <a href="https://github.com/dbuenzli">dbuenzli</a>) to normalize it to UTF-8.</p>
<p>The main goal is to spare the user this headache: even if the
contents of the mail are encoded with <em>latin1</em>, we make sure to translate them
correctly (and according to the RFCs) to UTF-8.</p>
<p>This project is <a href="https://github.com/mirage/rosetta"><code>rosetta</code></a> and it comes with:</p>
<ul>
<li><a href="https://github.com/mirage/uuuu"><code>uuuu</code></a> for ISO-8859 encoding</li>
<li><a href="https://github.com/mirage/coin"><code>coin</code></a> for KOI8-{R,U} encoding</li>
<li><a href="https://github.com/mirage/yuscii"><code>yuscii</code></a> for UTF-7 encoding</li>
</ul>
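<p>The simplest instance of this mapping is <em>latin1</em> (ISO-8859-1), where each byte is directly the Unicode code point of the character. The sketch below uses only the standard library (<code>Buffer.add_utf_8_uchar</code>, available since OCaml 4.06) and is not the code of <code>rosetta</code> or <code>uuuu</code>; the name <code>latin1_to_utf8</code> is hypothetical.</p>
<pre><code class="language-ocaml">(* Translate a latin1 string to UTF-8: each byte is its own code point. *)
let latin1_to_utf8 (s : string) : string =
  let buf = Buffer.create (String.length s * 2) in
  String.iter
    (fun c -&gt; Buffer.add_utf_8_uchar buf (Uchar.of_int (Char.code c)))
    s;
  Buffer.contents buf
</code></pre>
<p>For instance, the latin1 byte <code>0xE9</code> (é) becomes the two UTF-8 bytes <code>0xC3 0xA9</code>, while ASCII characters are left unchanged. Other encodings need a real translation table, which is what the libraries above provide.</p>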
<h3>Pecu and Base64</h3>
<p>Then, bodies can be encoded in two ways (if we consider only the main
standard):</p>
<ul>
<li>A base64 encoding, used to store your file</li>
<li>A quoted-printable encoding</li>
</ul>
<p>The <code>base64</code> package comes with a sub-package <code>base64.rfc2045</code>
which handles the special case of encoding a body according to RFC2045 and the SMTP
limitations.</p>
<p>Then, <code>pecu</code> was made to encode and decode <em>quoted-printable</em> contents. It was
tested and fuzzed, of course, like any other MirageOS library.</p>
<p>These libraries are needed for another historical reason: bytes used
to store mail should use only 7 bits instead of 8. This is the purpose of
the base64 and <em>quoted-printable</em> encodings, which only emit bytes that fit in 7 bits
(values below 128). Again, this limitation comes from the SMTP protocol.</p>
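<p>To make the 7-bit constraint concrete, here is a deliberately simplified sketch of the <em>quoted-printable</em> idea from RFC2045: printable ASCII is kept as-is, and any other byte (as well as <code>=</code> itself) is written as <code>=</code> followed by two hexadecimal digits. The function name is hypothetical and this is not the <code>pecu</code> implementation: it ignores soft line breaks, the 76-character line limit and the special treatment of line endings and trailing spaces.</p>
<pre><code class="language-ocaml">(* Simplified quoted-printable encoder: the output only contains
   printable ASCII characters and spaces. *)
let quoted_printable (s : string) : string =
  let buf = Buffer.create (String.length s) in
  String.iter
    (fun c -&gt;
      match c with
      (* '=' introduces an escape, so it must be escaped itself. *)
      | '=' -&gt; Buffer.add_string buf "=3D"
      | '!' .. '~' | ' ' -&gt; Buffer.add_char buf c
      (* Everything else, including 8-bit bytes, becomes =XX. *)
      | _ -&gt; Buffer.add_string buf (Printf.sprintf "=%02X" (Char.code c)))
    s;
  Buffer.contents buf
</code></pre>
<p>With this sketch, the latin1 string <code>"caf\xE9"</code> is encoded as <code>caf=E9</code>, which survives a 7-bit transport unchanged.</p>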
<h2>Conclusion</h2>
<p><code>mrmime</code> tackles the difficult task of parsing and generating emails according to 50 years of usage, several RFCs and legacy rules.
So it is still an experimental project. We have reached a first version because we
are currently able to parse many mails and then generate them correctly.</p>
<p>Of course, a <em>bug</em> (a malformed mail, a server which does not respect standards
or a bad use of our API) can easily appear where we did not test everything. But
we feel it was time to release it and let people use
it.</p>
<p>The best feedback about <code>mrmime</code>, and the best improvements, will come from you. So don't be
afraid to use it and start to hack your emails with it.</p>
]]></description><link>https://tarides.com/blog/2019-09-25-mr-mime-parse-and-generate-emails</link><guid isPermaLink="false">https://tarides.com/blog/2019-09-25-mr-mime-parse-and-generate-emails.html</guid><dc:creator><![CDATA[ Romain Calascibetta ]]></dc:creator><pubDate>Wed, 25 Sep 2019 00:00:00 GMT</pubDate></item><item><title><![CDATA[Decompress: Experiences with OCaml optimization]]></title><description><![CDATA[<p>In our <a href="/blog/2019-08-26-decompress-the-new-decompress-api/">first article</a> we mostly discussed
the API design of <code>decompress</code> and did not talk too much about the issue of
optimizing performance. In this second article, we will relate our experiences
of optimizing <code>decompress</code>.</p>
<p>As you might suspect, <code>decompress</code> needs to be optimized a lot. It was used by
several projects as an underlying layer of some formats (like Git), so it can be
a real bottleneck in those projects. Of course, we start with a footgun by using
a garbage-collected language; comparing the performance of <code>decompress</code> with a C
implementation (like <a href="https://zlib.net/">zlib</a> or <a href="https://github.com/richgel999/miniz">miniz</a>) is obviously not very fair.</p>
<p>However, using something like <code>decompress</code> instead of C implementations can be
very interesting for many purposes, especially when thinking about <em>unikernels</em>.
As we said in the previous article, we can take advantage of the <em>runtime</em>
and the type-system to provide something <em>safer</em> (of course, it's not really
true since zlib has received several security audits).</p>
<p>The main idea in this article is not to give snippets to copy/paste into your
codebase but to explain some behaviors of the compiler / runtime and hopefully
give you some ideas about how to optimize your own code. We'll discuss the
following optimizations:</p>
<ul>
<li>specialization</li>
<li>inlining</li>
<li>untagged integers</li>
<li>exceptions</li>
<li>unrolling</li>
<li>hot-loop</li>
<li>caml_modify</li>
<li>representation sizes</li>
</ul>
<h3>Cautionary advice</h3>
<p>Before we begin discussing optimization, keep this rule in mind:</p>
<blockquote>
<p>Only perform optimization at the <strong>end</strong> of the development process.</p>
</blockquote>
<p>An optimization pass
can change your code significantly, so you need to keep a state of your project
that can be trusted. This state will provide a comparison point for both
benchmarks and behaviors. In other words, your stable implementation will be the
oracle for your benchmarks. If you start with nothing, you'll achieve
arbitrarily-good performance at the cost of arbitrary behavior!</p>
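<p>One way to put the oracle idea into practice is a small differential check that runs the trusted and the optimized implementations on the same inputs and reports where they differ. The sketch below is only an illustration: <code>disagreements</code>, <code>reference_sum</code> and <code>fast_sum</code> are hypothetical names, and the two sum functions merely stand in for a stable and an optimized version of the same code.</p>
<pre><code class="language-ocaml">(* Return the inputs on which [candidate] disagrees with [oracle]. *)
let disagreements ~oracle ~candidate inputs =
  List.filter (fun input -&gt; oracle input &lt;&gt; candidate input) inputs

(* Stand-in for the stable, trusted implementation. *)
let reference_sum xs = List.fold_left ( + ) 0 xs

(* Stand-in for the optimized implementation under test. *)
let fast_sum xs =
  let rec go acc = function
    | [] -&gt; acc
    | x :: rest -&gt; go (acc + x) rest
  in
  go 0 xs
</code></pre>
<p>An empty result from <code>disagreements ~oracle:reference_sum ~candidate:fast_sum inputs</code> means the optimized version behaves like the oracle on those inputs; only then are its benchmark numbers worth comparing.</p>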
<p>We optimized <code>decompress</code> because we have been using it in bigger projects for a long
time (2 years). So we have an oracle (even if <code>zlib</code> can act as an oracle in
this special case).</p>
<h2>Specialization</h2>
<p>One of the biggest specializations in <code>decompress</code> concerns the <code>min</code>
function. In case you didn't know, in OCaml <code>min</code> is polymorphic; you can compare
anything. So you probably have some concerns about how <code>min</code> is implemented.</p>
<p>You are right to be concerned: if you examine the details, <code>min</code> calls the C
function <code>do_compare_val</code>, which traverses your structure and does a comparison
according to the run-time representation of your structure. Of course, for integers, it
should be only a <code>cmpq</code> assembly instruction. However, some simple code like:</p>
<pre><code><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">x</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-source">min</span><span class="ocaml-source"> </span><span class="ocaml-constant-numeric-decimal-integer">0</span><span class="ocaml-source"> </span><span class="ocaml-constant-numeric-decimal-integer">1</span><span class="ocaml-source">
</span></code></pre>
<p>will produce this CMM and assembly code:</p>
<pre><code>(let x/1002 (app{main.ml:1,8-15} "camlStdlib__min_1028" 1 3 val)
   ...)
</code></pre>
<pre><code class="language-nasm">.L101:
        movq    $3, %rbx
        movq    $1, %rax
        call    camlStdlib__min_1028@PLT
</code></pre>
<p>Note that <em><a href="https://en.wikipedia.org/wiki/Lambda_calculus#Beta_reduction">beta-reduction</a></em>, <em><a href="https://en.wikipedia.org/wiki/Inline_expansion">inlining</a></em> and
specialization were not done in this code. OCaml does not optimize your code
very much – the good point is predictability of the produced assembly output.</p>
<p>If you help the compiler a little bit with:</p>
<pre><code><span class="ocaml-keyword-other">external</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">&lt;=</span><span class="ocaml-source"> </span><span class="ocaml-source">)</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source"> </span><span class="ocaml-support-type">int</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-support-type">int</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-support-type">bool</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-constant-character-printf">%le</span><span class="ocaml-string-quoted-double">ssequal</span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-source">
</span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">min</span><span class="ocaml-source"> </span><span class="ocaml-source">a</span><span class="ocaml-source"> </span><span class="ocaml-source">b</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">if</span><span class="ocaml-source"> </span><span class="ocaml-source">a</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">&lt;=</span><span class="ocaml-source"> </span><span class="ocaml-source">b</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">then</span><span class="ocaml-source"> </span><span class="ocaml-source">a</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">else</span><span class="ocaml-source"> </span><span class="ocaml-source">b</span><span class="ocaml-source"> </span><span class="ocaml-source">[</span><span class="ocaml-keyword-operator-attribute">@@</span><span class="ocaml-keyword-other-attribute">inline</span><span class="ocaml-source">]</span><span class="ocaml-source">
</span><span class="ocaml-source">
</span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">x</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-source">min</span><span class="ocaml-source"> </span><span class="ocaml-constant-numeric-decimal-integer">0</span><span class="ocaml-source"> </span><span class="ocaml-constant-numeric-decimal-integer">1</span><span class="ocaml-source">
</span></code></pre>
<p>We have:</p>
<pre><code>(function{main.ml:2,8-43} camlMain__min_1003 (a/1004: val b/1005: val)
 (if (&lt;= a/1004 b/1005) a/1004 b/1005))

(function camlMain__entry ()
 (let x/1006 1 (store val(root-init) (+a "camlMain" 8) 1)) 1a)
</code></pre>
<pre><code class="language-nasm">.L101:
        cmpq    %rbx, %rax
        jg      .L100
        ret
</code></pre>
<p>So all the optimizations were applied. In the generated code, <code>x</code> was evaluated to <code>0</code> at
compile time (the <code>1</code> in <code>let x/... (store ... 1)</code> is the tagged representation of <code>0</code>) thanks to
beta-reduction and inlining, and <code>min</code> was specialized to accept only integers, so the compiler is able to emit <code>cmpq</code>.</p>
<h3>Results</h3>
<p>With specialization, we gained 10 Mb/s on decompression, where <code>min</code> is used
in several places. We completely avoid an indirection and a call to the slow
<code>do_compare_val</code> function.</p>
<p>This kind of specialization is already done by <a href="https://caml.inria.fr/pub/docs/manual-ocaml/flambda.html"><code>flambda</code></a>. However, we
currently use OCaml 4.07.1, so we decided to do this kind of optimization
ourselves.</p>
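<p>Another way to get a similar specialization, sketched here under the assumption that the arguments are always
<code>int</code>, is to annotate the parameters so that <code>&lt;=</code> is resolved to the integer comparison without redefining the
operator. The names <code>min_poly</code> and <code>min_int_only</code> are ours, for illustration only; this is not the code used in
<code>decompress</code>.</p>
<pre><code class="language-ocaml">(* Polymorphic version: 'a -&gt; 'a -&gt; 'a. Without flambda, [&lt;=] stays a generic
   comparison here. *)
let min_poly a b = if a &lt;= b then a else b

(* Annotated version: int -&gt; int -&gt; int, so [&lt;=] is known to compare integers. *)
let min_int_only (a : int) (b : int) = if a &lt;= b then a else b [@@inline]

let () = Printf.printf "%d %d\n" (min_poly 0 1) (min_int_only 0 1)
</code></pre>
<p>Both functions return the same result on integers; only the generated comparison differs, which you can check
with <code>ocamlopt -dcmm</code>.</p>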
<h2>Inlining</h2>
<p>In the first example, we showed code with the <code>[@@inline]</code> attribute, which is
useful to force the compiler to inline a small function. We will step outside the
OCaml world and study C code (gcc 5.4.0) to really understand
<em>inlining</em>.</p>
<p>In fact, inlining is not necessarily the best optimization. Consider the
following (nonsensical) C program:</p>
<pre><code class="language-c">#include &lt;stdio.h&gt;
#include &lt;string.h&gt;
#include &lt;unistd.h&gt;
#include &lt;time.h&gt;
#include &lt;stdlib.h&gt;

#ifdef HIDE_ALIGNEMENT
__attribute__((noinline, noclone))
#endif
void *
hide(void * p) { return p; }

int main(int ac, const char *av[])
{
  char *s = calloc(1 &lt;&lt; 20, 1);
  s = hide(s);

  memset(s, 'B', 100000);

  clock_t start = clock();

  for (int i = 0; i &lt; 1280000; ++i)
    s[strlen(s)] = 'A';

  clock_t end = clock();

  printf("%lld\n", (long long) (end-start));

  return 0;
}
</code></pre>
<p>We will compile this code with <code>-O2</code> (the second optimization level of <code>gcc</code>),
once with <code>-DHIDE_ALIGNEMENT</code> and once without. The assembly emitted differs:</p>
<pre><code class="language-nasm">.L3:
	movq	%rbp, %rdi
	call	strlen
	subl	$1, %ebx
	movb	$65, 0(%rbp,%rax)
	jne	.L3
</code></pre>
<pre><code class="language-nasm">.L3:
	movl	(%rdx), %ecx
	addq	$4, %rdx
	leal	-16843009(%rcx), %eax
	notl	%ecx
	andl	%ecx, %eax
	andl	$-2139062144, %eax
	je	.L3
</code></pre>
<p>In the first output (with <code>-DHIDE_ALIGNEMENT</code>), the optimization pass
decides not to inline <code>strlen</code>; in the second output (without
<code>-DHIDE_ALIGNEMENT</code>), it decides to inline <code>strlen</code> (and do some other clever
optimizations). The reason behind this complex behavior from the compiler is
clearly described <a href="https://stackoverflow.com/a/55589634">here</a>.</p>
<p>The point is that inlining is <strong>not</strong> an automatic win;
it can act as a <em>pessimization</em>. This is the goal of <code>flambda</code>: to apply the right
optimization in the right context. If you are curious about what <code>gcc</code>
does and why, be aware that, however interesting it is, reverse engineering the
optimization process and the information that drives each decision is a long,
deep and rather complicated task.</p>
<p>A less obvious optimization is to annotate some parts of your code with
<code>[@@inline never]</code>, which explicitly tells the compiler not to inline the
function. This constraint helps the compiler generate smaller code,
which has a better chance of fitting in the processor cache.</p>
<p>For all of these reasons, <code>[@@inline]</code> should be used sparingly, and an oracle
that compares performance with and without inlining a given function is necessary to
avoid a <em>pessimization</em>.</p>
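<p>A minimal sketch of such an oracle could look like the following. The functions <code>add_inlined</code>,
<code>add_call</code> and the <code>time</code> helper are hypothetical names, and a real measurement would need a proper
benchmarking setup with many runs; this only shows the two attributes side by side.</p>
<pre><code class="language-ocaml">(* Same body, opposite inlining requests. *)
let add_inlined a b = a + b [@@inline]
let add_call a b = a + b [@@inline never]

let sum_inlined n =
  let acc = ref 0 in
  for i = 1 to n do acc := add_inlined !acc i done;
  !acc

let sum_call n =
  let acc = ref 0 in
  for i = 1 to n do acc := add_call !acc i done;
  !acc

(* Hypothetical helper: result of [f x] and the CPU time it took, in seconds. *)
let time f x =
  let t0 = Sys.time () in
  let r = f x in
  (r, Sys.time () -. t0)

let () =
  let n = 10_000_000 in
  let r1, t1 = time sum_inlined n in
  let r2, t2 = time sum_call n in
  assert (r1 = r2);
  Printf.printf "inlined: %.3fs, never inlined: %.3fs\n" t1 t2
</code></pre>
<p>The timings depend on the machine and the compiler version, so the comparison has to be repeated for each
function you consider inlining.</p>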
<h3>In <code>decompress</code></h3>
<p>Inlining in <code>decompress</code> was done on small functions which need to allocate
to return a value. If we inline them, the returned values can be stored
in registers instead (of course, this depends on how many registers are free).</p>
<p>As we said, the goal of the inflator is to translate a bit sequence to a byte.
The longest bit sequence allowed by RFC 1951 has length 15. So, when
we process an input flow, we consume it 15 bits at a time. For each packet, we
want to recognize an existing associated bit sequence; the values bound to it
are the real length of the bit sequence and the byte:</p>
<pre><code><span class="ocaml-keyword-other">val</span><span class="ocaml-source"> </span><span class="ocaml-source">find</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source"> </span><span class="ocaml-source">bits</span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-support-type">int</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-source">{</span><span class="ocaml-source"> </span><span class="ocaml-source">len</span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source"> </span><span class="ocaml-support-type">int</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source"> </span><span class="ocaml-source">byte</span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source"> </span><span class="ocaml-support-type">int</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source"> </span><span class="ocaml-source">}</span><span class="ocaml-source">
</span></code></pre>
<p>So for each call to this function, we need to allocate a record/tuple. This is
why we chose to inline this function. <code>min</code> was inlined too, along with some other
small functions. But as we said, the situation is complex: inlining does not
systematically help where we expect it to.</p>
<p>NOTE: we can recognize a bit sequence of at most 15 bits because a
<a href="https://zlib.net/feldspar.html">Huffman coding</a> is <a href="https://en.wikipedia.org/wiki/Prefix_code">prefix-free</a>.</p>
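<p>To make the lookup concrete, here is a toy sketch, not the tables of RFC 1951 and not the code of
<code>decompress</code>. It assumes a hypothetical prefix-free code of at most 2 bits (<code>A = 0</code>, <code>B = 10</code>,
<code>C = 11</code>), read most significant bit first for readability. Instead of returning a record, each table entry packs
the length and the byte into a single immediate integer, which is one way to avoid the allocation.</p>
<pre><code class="language-ocaml">(* Table indexed by the next 2 bits; each entry is (len lsl 8) lor byte. *)
let table =
  let entry len c = (len lsl 8) lor Char.code c in
  [| entry 1 'A'; entry 1 'A'; entry 2 'B'; entry 2 'C' |]

let find ~bits = Array.unsafe_get table (bits land 3)
let len e = e lsr 8
let byte e = e land 0xff

(* [s] is a string of '0'/'1' characters, used only to keep the example readable.
   Missing bits at the end are read as 0. *)
let decode (s : string) =
  let buf = Buffer.create 16 in
  let pos = ref 0 in
  let bit i = if i &lt; String.length s &amp;&amp; s.[i] = '1' then 1 else 0 in
  while !pos &lt; String.length s do
    let e = find ~bits:((bit !pos lsl 1) lor bit (!pos + 1)) in
    Buffer.add_char buf (Char.chr (byte e));
    (* Only the real length of the code is consumed, not the 2 bits we peeked. *)
    pos := !pos + len e
  done;
  Buffer.contents buf
</code></pre>
<p>Because the code is prefix-free, peeking the maximum number of bits is enough to identify exactly one entry;
the entry then tells us how many of those bits really belonged to the code.</p>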
<h2>Untagged integers</h2>
<p>When reading assembly, the integer <code>0</code> is written as <code>$1</code>.
This is because of the <a href="https://blog.janestreet.com/what-is-gained-and-lost-with-63-bit-integers/">GC bit</a> needed to differentiate a pointer
from an unboxed integer. This is why, in OCaml, we talk about 31-bit
or 63-bit integers (depending on your architecture).</p>
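<p>The representation can be modelled in a few lines. This is only a sketch of the arithmetic, with hypothetical
<code>tag</code> and <code>untag</code> helpers; it is valid for small values since the model itself runs on OCaml integers.</p>
<pre><code class="language-ocaml">(* Model of how an OCaml [int] n is laid out in a machine word: 2n + 1. *)
let tag n = (n lsl 1) lor 1
let untag w = w asr 1

let () =
  Printf.printf "int has %d bits on a %d-bit machine\n" Sys.int_size Sys.word_size;
  List.iter
    (fun n -&gt; Printf.printf "%d is stored as %d\n" n (tag n))
    [ 0; 1; 2; -1 ]
</code></pre>
<p>With this encoding, subtracting 1 from an integer is the same as subtracting 2 from its tagged form, which is
why constants in the assembly often look doubled.</p>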
<p>We will not try to start a debate about this choice of
integer representation in OCaml. However, we can talk about some
operations which can have an impact on performance.</p>
<p>The biggest example is the <code>mod</code> operation. One might expect <code>%</code> in C and
<code>mod</code> in OCaml to compile to the same code:</p>
<pre><code><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">f</span><span class="ocaml-source"> </span><span class="ocaml-source">a</span><span class="ocaml-source"> </span><span class="ocaml-source">b</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-source">a</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">mod</span><span class="ocaml-source"> </span><span class="ocaml-source">b</span><span class="ocaml-source">
</span></code></pre>
<p>The output assembly is:</p>
<pre><code class="language-nasm">.L105:
        movq    %rdi, %rcx
        sarq    $1, %rcx     // b &gt;&gt; 1
        movq    (%rsp), %rax
        sarq    $1, %rax     // a &gt;&gt; 1
        testq   %rcx, %rcx   // b != 0
        je      .L107
        cqto
        idivq   %rcx         // a % b
        jmp     .L106
.L107:
        movq    caml_backtrace_pos@GOTPCREL(%rip), %rax
        xorq    %rbx, %rbx
        movl    %ebx, (%rax)
        movq    caml_exn_Division_by_zero@GOTPCREL(%rip), %rax
        call    caml_raise_exn@PLT
.L106:
        salq    $1, %rdx     // x &lt;&lt; 1
        incq    %rdx         // x + 1
        movq    %rbx, %rax
</code></pre>
<p>whereas the equivalent idiomatic C code produces:</p>
<pre><code class="language-nasm">.L2:
        movl    -12(%rbp), %eax
        cltd
        idivl   -8(%rbp)
        movl    %edx, -4(%rbp)
</code></pre>
<p>The first thing to notice is the exception in OCaml (<code>Division_by_zero</code>),
which is a good thing because it protects us from a processor-level division error
(and keeps the trace). Then, we need to <em>untag</em> <code>a</code> and <code>b</code> with the <code>sarq</code>
instruction. We execute <code>idiv</code>, as the C code does, and then we must <em>retag</em> the returned value
<code>x</code> with <code>salq</code> and <code>incq</code>.</p>
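<p>The steps of the OCaml assembly can be mirrored in OCaml itself. In this sketch, <code>tagged_mod</code> is a hypothetical
function that receives the two operands in their tagged form (<code>2n + 1</code>) and performs the same sequence: untag, test
for zero, divide, retag.</p>
<pre><code class="language-ocaml">let tagged_mod ta tb =
  let a = ta asr 1 in                    (* sarq $1: untag a *)
  let b = tb asr 1 in                    (* sarq $1: untag b *)
  if b = 0 then raise Division_by_zero;  (* testq / je *)
  let x = a mod b in                     (* idivq *)
  (x lsl 1) + 1                          (* salq $1 then incq: retag x *)

let () =
  (* 7 and 3 are stored as 15 and 7; the result 1 is stored as 3. *)
  Printf.printf "%d\n" (tagged_mod 15 7)
</code></pre>
<p>Out of these five steps, only the division exists in the C version, which is the cost the rest of this section
tries to avoid.</p>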
<p>So in some places, it can be more interesting to use <code>Nativeint</code>. However, by
default, a <code>nativeint</code> is boxed. <em>Boxed</em> means that the value is allocated in
the OCaml heap alongside a header.</p>
<p>Of course, this is not what we want. However, if our <code>nativeint ref</code> (used for
side effects, like <code>x</code>) stays inside a function and we only return the real
value with the dereference operator <code>!</code>, OCaml can, when the planets align,
directly use registers and real integers. So it should be possible to avoid
these conversions.</p>
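<p>As an illustration of this shape (a sketch only: <code>hold_of_string</code> is a hypothetical function, and whether
the reference really ends up in a register depends on the compiler version), the accumulator below never escapes the
function and is only dereferenced at the very end.</p>
<pre><code class="language-ocaml">(* Pack up to one machine word of bytes from [s], least significant byte first.
   [hold] stays local to the function; only the final result has to be boxed. *)
let hold_of_string (s : string) =
  let hold = ref 0n in
  let n = min (String.length s) (Sys.word_size / 8) in
  for i = 0 to n - 1 do
    let byte = Nativeint.of_int (Char.code (String.unsafe_get s i)) in
    hold := Nativeint.logor !hold (Nativeint.shift_left byte (i * 8))
  done;
  !hold

let () = Printf.printf "%nx\n" (hold_of_string "\001\002")
</code></pre>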
<h3>Readability versus performance</h3>
<p>We use this optimization only in a few parts of the code. In fact, switching
between <code>int</code> and <code>nativeint</code> is a little bit noisy:</p>
<pre><code><span class="ocaml-source">hold</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Nativeint</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">logor</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">!</span><span class="ocaml-source">hold</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Nativeint</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">(</span><span class="ocaml-source">shift_left</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-source">of_int</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-source">unsafe_get_uint8</span><span class="ocaml-source"> </span><span class="ocaml-source">d</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">i</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">!</span><span class="ocaml-source">i_pos</span><span class="ocaml-source">)</span><span class="ocaml-source">)</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">!</span><span class="ocaml-source">bits</span><span class="ocaml-source">)</span><span class="ocaml-source">
</span></code></pre>
<p>In the end, we only gained 0.5Mb/s of inflation rate, so it is not worthwhile
to apply this optimization systematically. But this case shows a more troubling
problem: loss of readability.</p>
<p>In fact, we can optimize code (OCaml or C) more and more, but step by step we
lose readability. The implementation of <code>strlen</code>, for example, should scare you.
In the end, the loss of readability makes it harder to understand the purpose
of the code, leading to errors whenever some other person (or you in 10 years' time)
tries to make a change.</p>
<p>And we think that this kind of optimization is not the OCaml way in general,
where we prefer to produce understandable and abstracted code rather than cryptic
and super fast code.</p>
<p>Again, <code>flambda</code> wants to fix this problem and let the compiler do this
optimization. The goal is to be able to write fast code without any pain.</p>
<h2>Exceptions</h2>
<p>If you remember our <a href="/blog/2019-02-08-release-of-base64/">article</a> about the release of <code>base64</code>, we talked a
bit about exceptions and used them as a <em>jump</em>. In fact, it's pretty
common for an OCaml developer to break the control flow with an exception.
What makes this common design/optimization work is the calling convention.</p>
<p>Admittedly, <em>jump</em> is not the best word to describe an OCaml exception, since
we don't use <code>setjmp</code>/<code>longjmp</code>.</p>
<p>In detail, when you enter a <code>try .. with</code>, OCaml saves a <em>trap</em>
on the stack which contains information about the <code>with</code>, the handler. Then,
when you <code>raise</code>, you <em>jump</em> directly to this trap and can discard several
stack frames at once (and, this way, you do not have to check each return code).</p>
<p>We use this <em>pattern</em> in several places, mostly in the <em>hot-loop</em>. However,
it completely breaks the control flow and can be error-prone.</p>
<p>To limit errors, and because this pattern is common, we prefer to use a <em>local</em>
exception which is used only inside the function. This way, we enforce
the fact that the exception should not (and cannot) be caught anywhere other
than inside the function.</p>
<pre><code><span class="ocaml-source">    </span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-keyword">exception</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Break</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">in</span><span class="ocaml-source">
</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-source">(</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">try</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">while</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">!</span><span class="ocaml-source">max</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">&gt;=</span><span class="ocaml-source"> </span><span class="ocaml-constant-numeric-decimal-integer">1</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">do</span><span class="ocaml-source">
</span><span class="ocaml-source">          </span><span class="ocaml-keyword-other">if</span><span class="ocaml-source"> </span><span class="ocaml-source">bl_count</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">(</span><span class="ocaml-keyword-operator">!</span><span class="ocaml-source">max</span><span class="ocaml-source">)</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">!=</span><span class="ocaml-source"> </span><span class="ocaml-constant-numeric-decimal-integer">0</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">then</span><span class="ocaml-source"> </span><span class="ocaml-source">raise_notrace</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Break</span><span class="ocaml-source">
</span><span class="ocaml-source">        </span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source"> </span><span class="ocaml-source">decr</span><span class="ocaml-source"> </span><span class="ocaml-source">max</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">done</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">with</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Break</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-unit">()</span><span class="ocaml-source"> </span><span class="ocaml-source">)</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source">
</span></code></pre>
<p>The code above produces this assembly:</p>
<pre><code class="language-nasm">.L105:
        pushq   %r14
        movq    %rsp, %r14
.L103:
        cmpq    $3, %rdi              // while !max &gt;= 1
        jl      .L102
        movq    -4(%rbx,%rdi,4), %rsi // bl_count,(!max)
        cmpq    $1, %rsi              // bl_count.(!max) != 0
        je      .L104
        movq    %r14, %rsp
        popq    %r14
        ret                           // raise_notrace Break
.L104:
        addq    $-2, %rdi             // decr max
        movq    %rdi, 16(%rsp)
        jmp     .L103
</code></pre>
<p>Here the <code>ret</code> is the <code>raise_notrace Break</code>. <code>raise_notrace</code> is needed;
otherwise, you will see:</p>
<pre><code class="language-nasm">        movq    caml_backtrace_pos@GOTPCREL(%rip), %rbx
        xorq    %rdi, %rdi
        movl    %edi, (%rbx)
        call    caml_raise_exn@PLT
</code></pre>
<p>instead of the <code>ret</code> instruction. Indeed, in this case, we need to store where we
raised the exception.</p>
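<p>For reference, the fragment shown earlier can be wrapped into a complete function. This is a sketch: the name
<code>max_code_length</code> and the way <code>max</code> is initialized are our assumptions, not the actual code of
<code>decompress</code>.</p>
<pre><code class="language-ocaml">(* Index of the last non-zero entry of [bl_count], ignoring index 0. *)
let max_code_length (bl_count : int array) =
  let max = ref (Array.length bl_count - 1) in
  (* The exception is local: nothing outside this function can catch it. *)
  let exception Break in
  (try
     while !max &gt;= 1 do
       if bl_count.(!max) &lt;&gt; 0 then raise_notrace Break;
       decr max
     done
   with Break -&gt; ());
  !max

let () = Printf.printf "%d\n" (max_code_length [| 0; 1; 2; 0; 0 |])
</code></pre>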
<h2>Unrolling</h2>
<p>When we showed the optimization done by <code>gcc</code> when the string is aligned, <code>gcc</code>
did another optimization. Instead of processing the string byte by byte, it decides to
process it 4 bytes at a time.</p>
<p>This kind of optimization is an <em>unroll</em>, and we did it in <code>decompress</code>.
Indeed, when we reach the <em>copy</em> <em>opcode</em> emitted by the <a href="https://en.wikipedia.org/wiki/LZ77_and_LZ78">lz77</a>
compressor, we want to <em>blit</em> <em>length</em> byte(s) from a source to the output
flow. It turns out that this <code>memcpy</code> can be optimized to copy 4 bytes at a
time. 4 bytes is generally a good choice since it is the size of an <code>int32</code> and
should suit any architecture.</p>
<pre><code><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">blit</span><span class="ocaml-source"> </span><span class="ocaml-source">src</span><span class="ocaml-source"> </span><span class="ocaml-source">src_off</span><span class="ocaml-source"> </span><span class="ocaml-source">dst</span><span class="ocaml-source"> </span><span class="ocaml-source">dst_off</span><span class="ocaml-source"> </span><span class="ocaml-source">len</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword-other">if</span><span class="ocaml-source"> </span><span class="ocaml-source">dst_off</span><span class="ocaml-source"> - </span><span class="ocaml-source">src_off</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">&lt;</span><span class="ocaml-source"> </span><span class="ocaml-constant-numeric-decimal-integer">4</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword-other">then</span><span class="ocaml-source"> </span><span class="ocaml-source">slow_blit</span><span class="ocaml-source"> </span><span class="ocaml-source">src</span><span class="ocaml-source"> </span><span class="ocaml-source">src_off</span><span class="ocaml-source"> </span><span class="ocaml-source">dst</span><span class="ocaml-source"> </span><span class="ocaml-source">dst_off</span><span class="ocaml-source"> </span><span class="ocaml-source">len</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword-other">else</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">len0</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-source">len</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">land</span><span class="ocaml-source"> </span><span class="ocaml-constant-numeric-decimal-integer">3</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">in</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">len1</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-source">len</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">asr</span><span class="ocaml-source"> </span><span class="ocaml-constant-numeric-decimal-integer">2</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">in</span><span class="ocaml-source">
</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-keyword">for</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">i</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-constant-numeric-decimal-integer">0</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">to</span><span class="ocaml-source"> </span><span class="ocaml-source">len1</span><span class="ocaml-source"> - </span><span class="ocaml-constant-numeric-decimal-integer">1</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-keyword-other">do</span><span class="ocaml-source">
</span><span class="ocaml-source">      </span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">i</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-source">i</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">*</span><span class="ocaml-source"> </span><span class="ocaml-constant-numeric-decimal-integer">4</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">in</span><span class="ocaml-source">
</span><span class="ocaml-source">      </span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">v</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-source">unsafe_get_uint32</span><span class="ocaml-source"> </span><span class="ocaml-source">src</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-source">src_off</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">+</span><span class="ocaml-source"> </span><span class="ocaml-source">i</span><span class="ocaml-source">)</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">in</span><span class="ocaml-source">
</span><span class="ocaml-source">      </span><span class="ocaml-source">unsafe_set_uint32</span><span class="ocaml-source"> </span><span class="ocaml-source">dst</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-source">dst_off</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">+</span><span class="ocaml-source"> </span><span class="ocaml-source">i</span><span class="ocaml-source">)</span><span class="ocaml-source"> </span><span class="ocaml-source">v</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-keyword-other">done</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source">
</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-keyword">for</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">i</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-constant-numeric-decimal-integer">0</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">to</span><span class="ocaml-source"> </span><span class="ocaml-source">len0</span><span class="ocaml-source"> - </span><span class="ocaml-constant-numeric-decimal-integer">1</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-keyword-other">do</span><span class="ocaml-source">
</span><span class="ocaml-source">      </span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">i</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-source">len1</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">*</span><span class="ocaml-source"> </span><span class="ocaml-constant-numeric-decimal-integer">4</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">+</span><span class="ocaml-source"> </span><span class="ocaml-source">i</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">in</span><span class="ocaml-source">
</span><span class="ocaml-source">      </span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">v</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-source">unsafe_get_uint8</span><span class="ocaml-source"> </span><span class="ocaml-source">src</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-source">src_off</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">+</span><span class="ocaml-source"> </span><span class="ocaml-source">i</span><span class="ocaml-source">)</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">in</span><span class="ocaml-source">
</span><span class="ocaml-source">      </span><span class="ocaml-source">unsafe_set_uint8</span><span class="ocaml-source"> </span><span class="ocaml-source">dst</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-source">dst_off</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">+</span><span class="ocaml-source"> </span><span class="ocaml-source">i</span><span class="ocaml-source">)</span><span class="ocaml-source"> </span><span class="ocaml-source">v</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-keyword-other">done</span><span class="ocaml-source">
</span></code></pre>
<p>In this code, we first copy 4 bytes at a time and, if <code>len</code> is not
a multiple of 4, we run the <em>trailing</em> loop to copy the remaining bytes one by one. In
this context, OCaml can <em>unbox</em> <code>int32</code> and use registers. So this function does
not touch the heap and, consequently, does not involve the garbage collector.</p>
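<p>A runnable variant of the same idea can be written with the standard library alone. This sketch assumes
OCaml 4.08 or later (for <code>Bytes.get_int32_ne</code> and <code>Bytes.set_int32_ne</code>), uses bounds-checked accessors
instead of the unsafe ones, and only handles two distinct buffers; <code>blit4</code> is a hypothetical name.</p>
<pre><code class="language-ocaml">let blit4 src src_off dst dst_off len =
  let len1 = len asr 2 in   (* number of whole 4-byte words *)
  let len0 = len land 3 in  (* number of trailing bytes *)
  for i = 0 to len1 - 1 do
    let i = i * 4 in
    Bytes.set_int32_ne dst (dst_off + i) (Bytes.get_int32_ne src (src_off + i))
  done;
  for i = 0 to len0 - 1 do
    let i = (len1 * 4) + i in
    Bytes.set dst (dst_off + i) (Bytes.get src (src_off + i))
  done

let () =
  let src = Bytes.of_string "abcdefghij" in
  let dst = Bytes.make 10 '.' in
  blit4 src 0 dst 0 7;
  print_endline (Bytes.to_string dst)
</code></pre>
<p>With <code>len = 7</code>, the first loop copies one word (4 bytes) and the trailing loop copies the 3 remaining
bytes.</p>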
<h3>Results</h3>
<p>In the end, we gained an extra 10Mb/s of inflation rate. The <code>blit</code> function is the
most important function when it comes to inflating the window to an output flow.
Like the specialization of the <code>min</code> function, this is one of the biggest optimizations in
<code>decompress</code>.</p>
<h2><em>hot-loop</em></h2>
<p>A common design in decompression (but we can find it in hash implementations
too) is the <em>hot-loop</em>. A <em>hot-loop</em> is mainly a loop around the most common
operation in your process. In the context of <code>decompress</code>, the <em>hot-loop</em> is
the repeated translation from bit sequences to byte(s), from the input flow
to the output flow and the window.</p>
<p>The main idea behind the <em>hot-loop</em> is to initialize all the information needed for
the translation before starting the <em>hot-loop</em>. Then, it's mostly an imperative
loop with a <em>pattern-matching</em> which corresponds to the current state of the
global computation.</p>
<p>In OCaml, we can take this opportunity to use <code>int ref</code> (or <code>nativeint ref</code>), and then, they will be translated into registers (which is the fastest
area to store something).</p>
<p>Another concern inside the <em>hot-loop</em> is to avoid any allocation, which is why we
talk about <code>int</code> or <code>nativeint</code>. Indeed, a more complex structure like an option
has to be allocated, and an allocation can trigger the garbage collector (a call to <code>caml_call_gc</code>).</p>
<p>Of course, this kind of design is completely wrong if we think in a functional
way. However, this is the (biggest?) advantage of OCaml: we can hide this ugly/hacky
part behind a functional interface.</p>
<p>In the API, we talked about a state which represents the <em>inflation</em> (or the
<em>deflation</em>). At the beginning, the goal is to store essential values, like
the position in the input flow, the bits available, the
dictionary, etc., into some references. Then, we launch the <em>hot-loop</em> and only at the end do we update the state.</p>
<p>So we keep the optimal design for <em>inflation</em> inside the <em>hot-loop</em> and the functional way outside
the <em>hot-loop</em>.</p>
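<p>The shape of this design can be sketched as follows. The <code>state</code> record, its fields and the <code>refill</code>
function are hypothetical and much simpler than the real inflator: the state is copied into local references, the loop
works only on them, and the state is written back once at the end.</p>
<pre><code class="language-ocaml">type state = { mutable i_pos : int; mutable hold : int; mutable bits : int }

(* Fill the bit accumulator until it holds at least 15 bits or the input ends. *)
let refill st (input : string) =
  (* 1. Load the state into local references. *)
  let i_pos = ref st.i_pos in
  let hold = ref st.hold in
  let bits = ref st.bits in
  (* 2. The loop only touches the references: no allocation. *)
  while !bits &lt; 15 &amp;&amp; !i_pos &lt; String.length input do
    hold := !hold lor (Char.code (String.unsafe_get input !i_pos) lsl !bits);
    bits := !bits + 8;
    incr i_pos
  done;
  (* 3. Write everything back once the loop is over. *)
  st.i_pos &lt;- !i_pos;
  st.hold &lt;- !hold;
  st.bits &lt;- !bits

let () =
  let st = { i_pos = 0; hold = 0; bits = 0 } in
  refill st "\001\002\003";
  Printf.printf "i_pos=%d hold=%d bits=%d\n" st.i_pos st.hold st.bits
</code></pre>
<p>From the outside, <code>refill</code> is just a function on a state; the imperative part does not leak.</p>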
<h2>caml_modify</h2>
<p>One issue that we need to consider is the call to <code>caml_modify</code>. In
fact, for a complex data structure like an <code>int array</code> or an <code>int option</code> (so,
anything other than an integer, a boolean or another <em>immediate</em> value), values can move to the
major heap.</p>
<p>In this context, <code>caml_modify</code> is used to assign a new value into your mutable
block. It is a bit slower than a simple assignment but is needed to
keep pointers between the minor heap and the major heap consistent.</p>
<p>With this OCaml code for example:</p>
<pre><code><span class="ocaml-keyword-other">type</span><span class="ocaml-source"> </span><span class="ocaml-source">t</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-source">{</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">mutable</span><span class="ocaml-source"> </span><span class="ocaml-source">v</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source"> </span><span class="ocaml-support-type">int</span><span class="ocaml-source"> </span><span class="ocaml-source">option</span><span class="ocaml-source"> </span><span class="ocaml-source">}</span><span class="ocaml-source">
</span><span class="ocaml-source">
</span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">f</span><span class="ocaml-source"> </span><span class="ocaml-source">t</span><span class="ocaml-source"> </span><span class="ocaml-source">v</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-source">t</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">v</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">&lt;-</span><span class="ocaml-source"> </span><span class="ocaml-source">v</span><span class="ocaml-source">
</span></code></pre>
<p>We produce this assembly:</p>
<pre><code class="language-nasm">camlExample__f_1004:
        subq    $8, %rsp
        movq    %rax, %rdi
        movq    %rbx, %rsi
        call    caml_modify@PLT
        movq    $1, %rax
        addq    $8, %rsp
        ret
</code></pre>
<p>Here we see the call to <code>caml_modify</code>, which takes care of the
assignment of <code>v</code> into <code>t.v</code>. This call is needed mostly because the type of <code>t.v</code> is not an <em>immediate</em> value like an integer. This is why, for many values in the
<em>inflator</em> and the <em>deflator</em>, we mostly use integers.</p>
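<p>To check which values are immediate, one can inspect their runtime representation
with <code>Obj.is_int</code> from the standard library. This is only a small sketch for
experimentation (<code>is_immediate</code> is a hypothetical helper, and <code>Obj</code> is not
meant for production code):</p>
<pre><code class="language-ocaml">(* [Obj.is_int] is true when a value is represented as an immediate
   integer rather than as a pointer to a heap block. *)
let is_immediate v = Obj.is_int (Obj.repr v)

let () =
  assert (is_immediate 42);               (* integers are immediate *)
  assert (is_immediate true);             (* so are booleans *)
  assert (is_immediate None);             (* and constant constructors *)
  assert (not (is_immediate (Some 42)));  (* [Some _] is a heap block *)
  assert (not (is_immediate [| 1; 2 |]))  (* arrays are heap blocks *)
</code></pre>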
<p>Of course, at some points, we use <code>int array</code> and set them at some specific
points of the <em>inflator</em>, where we inflate the dictionary. However, the impact
of <code>caml_modify</code> is not very clear, since it is usually pretty fast.</p>
<p>Sometimes, however, it can be a real bottleneck in your computation, and
this depends on how long your values live in the heap. A small program (whose
results are not very reproducible) can show that:</p>
<pre><code><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">t</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Array</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">init</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-source">int_of_string</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Sys</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">argv</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">(</span><span class="ocaml-constant-numeric-decimal-integer">1</span><span class="ocaml-source">)</span><span class="ocaml-source">)</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-keyword-other">fun</span><span class="ocaml-source"> </span><span class="ocaml-constant-language">_</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Random</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-support-type">int</span><span class="ocaml-source"> </span><span class="ocaml-constant-numeric-decimal-integer">256</span><span class="ocaml-source">)</span><span class="ocaml-source">
</span><span class="ocaml-source">
</span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">pr</span><span class="ocaml-source"> </span><span class="ocaml-source">fmt</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Format</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">printf</span><span class="ocaml-source"> </span><span class="ocaml-source">fmt</span><span class="ocaml-source">
</span><span class="ocaml-source">
</span><span class="ocaml-keyword-other">type</span><span class="ocaml-source"> </span><span class="ocaml-source">t0</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-source">{</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">mutable</span><span class="ocaml-source"> </span><span class="ocaml-source">v</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source"> </span><span class="ocaml-support-type">int</span><span class="ocaml-source"> </span><span class="ocaml-source">option</span><span class="ocaml-source"> </span><span class="ocaml-source">}</span><span class="ocaml-source">
</span><span class="ocaml-keyword-other">type</span><span class="ocaml-source"> </span><span class="ocaml-source">t1</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-source">{</span><span class="ocaml-source"> </span><span class="ocaml-source">v</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source"> </span><span class="ocaml-support-type">int</span><span class="ocaml-source"> </span><span class="ocaml-source">option</span><span class="ocaml-source"> </span><span class="ocaml-source">}</span><span class="ocaml-source">
</span><span class="ocaml-source">
</span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">f0</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-source">t0</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source"> </span><span class="ocaml-source">t0</span><span class="ocaml-source">)</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword">for</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">i</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-constant-numeric-decimal-integer">0</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">to</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Array</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">length</span><span class="ocaml-source"> </span><span class="ocaml-source">t</span><span class="ocaml-source"> - </span><span class="ocaml-constant-numeric-decimal-integer">1</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword-other">do</span><span class="ocaml-source"> </span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">v</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">match</span><span class="ocaml-source"> </span><span class="ocaml-source">t0</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">v</span><span class="ocaml-keyword-other-ocaml punctuation-comma punctuation-separator">,</span><span class="ocaml-source"> </span><span class="ocaml-source">t</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">(</span><span class="ocaml-source">i</span><span class="ocaml-source">)</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">with</span><span class="ocaml-source">
</span><span class="ocaml-source">             </span><span class="ocaml-keyword-other">|</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Some</span><span class="ocaml-source"> </span><span class="ocaml-constant-language">_</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">as</span><span class="ocaml-source"> </span><span class="ocaml-source">v</span><span class="ocaml-keyword-other-ocaml punctuation-comma punctuation-separator">,</span><span class="ocaml-source"> </span><span class="ocaml-constant-language">_</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-source">v</span><span class="ocaml-source">
</span><span class="ocaml-source">             </span><span class="ocaml-keyword-other">|</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">None</span><span class="ocaml-keyword-other-ocaml punctuation-comma punctuation-separator">,</span><span class="ocaml-source"> </span><span class="ocaml-constant-numeric-decimal-integer">5</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Some</span><span class="ocaml-source"> </span><span class="ocaml-source">i</span><span class="ocaml-source">
</span><span class="ocaml-source">             </span><span class="ocaml-keyword-other">|</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">None</span><span class="ocaml-keyword-other-ocaml punctuation-comma punctuation-separator">,</span><span class="ocaml-source"> </span><span class="ocaml-constant-language">_</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">None</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">in</span><span class="ocaml-source">
</span><span class="ocaml-source">     </span><span class="ocaml-source">t0</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">v</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">&lt;-</span><span class="ocaml-source"> </span><span class="ocaml-source">v</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword-other">done</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source"> </span><span class="ocaml-source">t0</span><span class="ocaml-source">
</span><span class="ocaml-source">
</span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">f1</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-source">t1</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source"> </span><span class="ocaml-source">t1</span><span class="ocaml-source">)</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">t1</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-source">ref</span><span class="ocaml-source"> </span><span class="ocaml-source">t1</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">in</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword">for</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">i</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-constant-numeric-decimal-integer">0</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">to</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Array</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">length</span><span class="ocaml-source"> </span><span class="ocaml-source">t</span><span class="ocaml-source"> - </span><span class="ocaml-constant-numeric-decimal-integer">1</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword-other">do</span><span class="ocaml-source"> </span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">v</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">match</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">!</span><span class="ocaml-source">t1</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">v</span><span class="ocaml-keyword-other-ocaml punctuation-comma punctuation-separator">,</span><span class="ocaml-source"> </span><span class="ocaml-source">t</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">(</span><span class="ocaml-source">i</span><span class="ocaml-source">)</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">with</span><span class="ocaml-source">
</span><span class="ocaml-source">             </span><span class="ocaml-keyword-other">|</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Some</span><span class="ocaml-source"> </span><span class="ocaml-constant-language">_</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">as</span><span class="ocaml-source"> </span><span class="ocaml-source">v</span><span class="ocaml-keyword-other-ocaml punctuation-comma punctuation-separator">,</span><span class="ocaml-source"> </span><span class="ocaml-constant-language">_</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-source">v</span><span class="ocaml-source">
</span><span class="ocaml-source">             </span><span class="ocaml-keyword-other">|</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">None</span><span class="ocaml-keyword-other-ocaml punctuation-comma punctuation-separator">,</span><span class="ocaml-source"> </span><span class="ocaml-constant-numeric-decimal-integer">5</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Some</span><span class="ocaml-source"> </span><span class="ocaml-source">i</span><span class="ocaml-source">
</span><span class="ocaml-source">             </span><span class="ocaml-keyword-other">|</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">None</span><span class="ocaml-keyword-other-ocaml punctuation-comma punctuation-separator">,</span><span class="ocaml-source"> </span><span class="ocaml-constant-language">_</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">None</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">in</span><span class="ocaml-source">
</span><span class="ocaml-source">     </span><span class="ocaml-source">t1</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-source">{</span><span class="ocaml-source"> </span><span class="ocaml-source">v</span><span class="ocaml-source"> </span><span class="ocaml-source">}</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword-other">done</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">!</span><span class="ocaml-source">t1</span><span class="ocaml-source">
</span><span class="ocaml-source">
</span><span class="ocaml-keyword-other">let</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-unit">()</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">t0</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source"> </span><span class="ocaml-source">t0</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-source">{</span><span class="ocaml-source"> </span><span class="ocaml-source">v</span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">None</span><span class="ocaml-source"> </span><span class="ocaml-source">}</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">in</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">t1</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source"> </span><span class="ocaml-source">t1</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-source">{</span><span class="ocaml-source"> </span><span class="ocaml-source">v</span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">None</span><span class="ocaml-source"> </span><span class="ocaml-source">}</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">in</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">time0</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Unix</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">gettimeofday</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-unit">()</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">in</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-source">ignore</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-source">f0</span><span class="ocaml-source"> </span><span class="ocaml-source">t0</span><span class="ocaml-source">)</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">time1</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Unix</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">gettimeofday</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-unit">()</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">in</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-source">ignore</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-source">f1</span><span class="ocaml-source"> </span><span class="ocaml-source">t1</span><span class="ocaml-source">)</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">time2</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Unix</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">gettimeofday</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-unit">()</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">in</span><span class="ocaml-source">
</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-source">pr</span><span class="ocaml-source"> </span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-string-quoted-double">f0: </span><span class="ocaml-constant-character-printf">%f</span><span class="ocaml-string-quoted-double"> ns</span><span class="ocaml-constant-character-escape">\n</span><span class="ocaml-constant-character-printf">%!</span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-source">time1</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-.</span><span class="ocaml-source"> </span><span class="ocaml-source">time0</span><span class="ocaml-source">)</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-source">pr</span><span class="ocaml-source"> </span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-string-quoted-double">f1: </span><span class="ocaml-constant-character-printf">%f</span><span class="ocaml-string-quoted-double"> ns</span><span class="ocaml-constant-character-escape">\n</span><span class="ocaml-constant-character-printf">%!</span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-source">time2</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-.</span><span class="ocaml-source"> </span><span class="ocaml-source">time1</span><span class="ocaml-source">)</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source">
</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-constant-language-unit">()</span><span class="ocaml-source">
</span></code></pre>
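<p>On our bare-metal server, if you launch the program with 1000, the <code>f0</code>
computation is the fastest, even though it calls <code>caml_modify</code>. However, if you
launch the program with 1000000000, <code>f1</code> is the fastest. (The timings come from
<code>Unix.gettimeofday</code>, so they are in seconds, despite the <code>ns</code> label that the
program prints.)</p>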
<pre><code><span class="sh-source">$ ./a.out 1000
</span><span class="sh-source">f0: 0.000006 ns
</span><span class="sh-source">f1: 0.000015 ns
</span><span class="sh-source">$ ./a.out 1000000000
</span><span class="sh-source">f0: 7.931782 ns
</span><span class="sh-source">f1: 5.719370 ns
</span></code></pre>
<h3>About <code>decompress</code></h3>
<p>At the beginning, we chose to use, as @dbuenzli does, a mutable
structure to represent the state. Then, @yallop wrote a big patch to switch it to an
immutable state, and we gained 9Mb/s on <em>inflation</em>.</p>
<p>However, the new version is more focused on the <em>hot-loop</em> and it is 3
times faster than before.</p>
<p>As we said, the cost of <code>caml_modify</code> is not clear-cut and depends a lot on
how long your data lives in the heap and how many times you update it.
If we confine <code>caml_modify</code> to only a few places, it should be fine. But it is
still one of the most complex questions about (macro?) optimization.</p>
<h2>Smaller representation</h2>
<p>We've discussed the impact that integer types can have on the use of immediate
values. More generally, the choice of type to represent your values can have
significant performance implications.</p>
<p>For example, a dictionary which associates a bit sequence (an integer) with
its length <strong>AND</strong> the byte can be represented by an <code>(int * int) array</code>, or
more idiomatically a <code>{ len: int; byte: int; } array</code> (which is structurally the
same).</p>
<p>However, that means one allocation for every byte we represent.
Extracting an entry will also need an allocation if <code>find : bits:int -&gt; { len: int; byte: int; }</code> is not inlined, as we said. As for memory, the array can be
really <em>heavy</em> in your heap.</p>
<p>At this point, we used <code>spacetime</code> to see how many blocks we allocated for a
common <em>inflation</em>, and we saw that we allocated a lot. So we chose to use
a smaller representation. Since <code>len</code> cannot be greater than 15 according to RFC 1951
and a byte can only take 256 values (and should fit in one
byte), we can merge them into one integer (which has at least
31 bits).</p>
<pre><code><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">static_literal_tree</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-source">[|</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-constant-numeric-decimal-integer">8</span><span class="ocaml-keyword-other-ocaml punctuation-comma punctuation-separator">,</span><span class="ocaml-source"> </span><span class="ocaml-constant-numeric-decimal-integer">12</span><span class="ocaml-source">)</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-constant-numeric-decimal-integer">8</span><span class="ocaml-keyword-other-ocaml punctuation-comma punctuation-separator">,</span><span class="ocaml-source"> </span><span class="ocaml-constant-numeric-decimal-integer">140</span><span class="ocaml-source">)</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-constant-numeric-decimal-integer">8</span><span class="ocaml-keyword-other-ocaml punctuation-comma punctuation-separator">,</span><span class="ocaml-source"> </span><span class="ocaml-constant-numeric-decimal-integer">76</span><span class="ocaml-source">)</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span 
class="ocaml-source"> </span><span class="ocaml-source">|]</span><span class="ocaml-source">
</span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">static_literal_tree</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Array</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">map</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-keyword-other">fun</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-source">len</span><span class="ocaml-keyword-other-ocaml punctuation-comma punctuation-separator">,</span><span class="ocaml-source"> </span><span class="ocaml-source">byte</span><span class="ocaml-source">)</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-source">len</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">lsl</span><span class="ocaml-source"> </span><span class="ocaml-constant-numeric-decimal-integer">8</span><span class="ocaml-source">)</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">lor</span><span class="ocaml-source"> </span><span class="ocaml-source">byte</span><span class="ocaml-source">)</span><span class="ocaml-source"> </span><span class="ocaml-source">static_literal_tree</span><span class="ocaml-source">
</span></code></pre>
<p>In the code above, we just translate the static dictionary (for a STATIC DEFLATE
block) to a smaller representation where <code>len</code> is the left part of the
integer and <code>byte</code> is the right part. Of course, it depends on what you
want to store.</p>
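<p>Reading the two components back is then a shift and a mask. A minimal sketch,
where <code>pack</code>, <code>len_of</code> and <code>byte_of</code> are hypothetical helper names
rather than functions from <code>decompress</code>:</p>
<pre><code class="language-ocaml">(* Pack a code length (at most 15) and a byte (0 to 255) into one
   immediate integer: the length above bit 8, the byte in the low 8 bits. *)
let pack ~len ~byte = (len lsl 8) lor byte

(* Recover each component from the packed integer. *)
let len_of packed = packed lsr 8
let byte_of packed = packed land 0xff

let () =
  let packed = pack ~len:8 ~byte:140 in
  assert (len_of packed = 8);
  assert (byte_of packed = 140)
</code></pre>
<p>Neither the packing nor the extraction allocates, since everything stays an
immediate integer.</p>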
<p>Another point is readability. <a href="https://github.com/mirage/ocaml-cstruct#ppx"><code>cstruct-ppx</code></a> and
<a href="https://bitbucket.org/thanatonauts/bitstring/src"><code>bitstring</code></a> can help you but <code>decompress</code>
wants to depend only on OCaml.</p>
<h2>Conclusion</h2>
<p>We conclude with some closing advice about optimising your OCaml programs:</p>
<ul>
<li>
<p><strong>Optimization is specific to your task</strong>. The points highlighted in this
article may not fit your particular problem, but they are intended to give you
ideas. Our optimizations were only possible because we completely assimilated
the ideas of <code>zlib</code> and had a clear vision of what we really needed to
optimize (like <code>blit</code>).
<br><br>
If this is your first project, this article may not help you much to optimize your
code, since it is mostly about <em>micro</em>-optimization in a specific context
(the <em>hot-loop</em>). But it helps you to understand what the compiler really
does, which is still really interesting.</p>
</li>
<li>
<p><strong>Optimise only with respect to an oracle</strong>. All optimizations were done
while comparing against the old implementation of
<code>decompress</code> and <code>zlib</code> as oracles. Optimizations can change the semantics of your
code, and you should systematically check the expected behavior at every
step. So it is a long process.</p>
</li>
<li>
<p><strong>Use the predictability of the OCaml compiler to your advantage</strong>. Admittedly,
the compiler does not optimize your code much, but it still produces programs with
reasonable performance. In many cases, <strong>you don't need</strong> to
optimize your OCaml code. The upside is that the generated code behaves as you expect.
<br><br>
The mental link between OCaml and assembly is direct (sometimes much more so than between C
and assembly, where we let the C compiler optimize the code).
The benefit is that you can easily keep a mental model of what your code is doing
without being afraid of what the compiler might produce. And in some
critical parts, like <a href="https://github.com/mirage/eqaf">eqaf</a>, this is really needed.</p>
</li>
</ul>
<p>We have not discussed benchmarking, which is another hard issue: who should you
compare with? where? how? For example, a global comparison between <code>zlib</code> and
<code>decompress</code> is not very relevant in many ways – especially because of the
garbage collector. This could be another article!</p>
<p>Finally, all of these optimizations should be done by <code>flambda</code>; the difference
between compiling <code>decompress</code> with or without <code>flambda</code> is not very big. We
optimized <code>decompress</code> by hand mostly to keep compatibility with OCaml (since
<code>flambda</code> needs another switch) and, in this way, to gain an understanding of
<code>flambda</code> optimizations so that we can use it effectively!</p>
]]></description><link>https://tarides.com/blog/2019-09-13-decompress-experiences-with-ocaml-optimization</link><guid isPermaLink="false">https://tarides.com/blog/2019-09-13-decompress-experiences-with-ocaml-optimization.html</guid><dc:creator><![CDATA[ Romain Calascibetta ]]></dc:creator><pubDate>Fri, 13 Sep 2019 00:00:00 GMT</pubDate></item><item><title><![CDATA[An introduction to fuzzing OCaml with AFL, Crowbar and Bun]]></title><description><![CDATA[<p><a href="https://lcamtuf.coredump.cx/afl/">American Fuzzy Lop</a> or AFL is a <em>fuzzer</em>: a program that tries to find bugs in
other programs by sending them various auto-generated inputs. This article covers the
basics of AFL and shows an example of fuzzing a parser written in OCaml. It also introduces two
extensions: the <a href="https://github.com/stedolan/crowbar/">Crowbar</a> library which can be used to fuzz any kind of OCaml program or
function and the <a href="https://github.com/yomimono/ocaml-bun/">Bun</a> tool for integrating fuzzing into your CI.</p>
<p>All of the examples given in this article are available on GitHub at
<a href="https://github.com/NathanReb/ocaml-afl-examples">ocaml-afl-examples</a>. The <code>README</code> contains all the information you need to understand,
build and fuzz them yourself.</p>
<h2>What is AFL?</h2>
<p>AFL actually isn't <em>just</em> a fuzzer but a set of tools. What makes it so good is that it doesn't just
blindly send random input to your program hoping for it to crash; it inspects the execution paths
of the program and uses that information to figure out which mutations to apply to the previous
inputs to trigger new execution paths. This approach allows for much more efficient and reliable
fuzzing (as it will try to maximize coverage) but requires the binaries to be instrumented so the
execution can be monitored.</p>
<p>AFL provides wrappers for the common C compilers that you can use to produce the instrumented
binaries along with the CLI fuzzing client: <code>afl-fuzz</code>.</p>
<p><code>afl-fuzz</code> is straightforward to use. It takes an input directory containing a few initial valid
inputs to your program, an output directory and the instrumented binary. It will then repeatedly
mutate the inputs and feed them to the program, registering the ones that lead to crashes or
hangs in the output directory.</p>
<p>Because it works this way, it is very easy to fuzz a parser.</p>
<p>To fuzz a <code>parse.exe</code> binary that takes a file as its first command-line argument and parses it,
you can invoke <code>afl-fuzz</code> in the following way:</p>
<pre><code>$ afl-fuzz -i inputs/ -o findings/ /path/to/parse.exe @@
</code></pre>
<p>The <code>findings/</code> directory is where <code>afl-fuzz</code> will write the crashes it finds; it will create it
for you if it doesn't exist.
The <code>inputs/</code> directory contains one or more valid input files for your
program. By valid we mean "that don't crash your program".
Finally, the <code>@@</code> part tells <code>afl-fuzz</code> where on the command line the input file should be passed to
your program: in our case, as the first argument.</p>
<p>Note that it is possible to supply <code>afl-fuzz</code> with more detail about how to invoke your program. If
you need to pass it command-line options for instance, you can run it as:</p>
<pre><code>$ afl-fuzz -i inputs/ -o findings/ -- /path/to/parse.exe --option=value @@
</code></pre>
<p>If you wish to fuzz a program that takes its input from standard input, you can also do that by removing the
<code>@@</code> from the <code>afl-fuzz</code> invocation.</p>
<p>Once <code>afl-fuzz</code> starts, it will draw a fancy-looking table on the standard output to keep you
updated about its progress. What you'll mostly be interested in is the top right
corner, which contains the number of crashes and hangs it has found so far:</p>
<p><img sizes="(min-width: 1360px) 1360px, (min-width: 680px) 680px, 100vw" srcset="/blog/images/afl_example_output-170w~m_tUE5r-dIrCM1qfynSjNw.webp 170w, /blog/images/afl_example_output-340w~4_TlMva_T5YIpDoZB9vHNA.webp 340w, /blog/images/afl_example_output-680w~6gqti4rAdudTMiFBVRrjag.webp 680w, /blog/images/afl_example_output-1360w~C7XYAfj1B9ySn39c6sFGvg.webp 1360w" src="/blog/images/afl_example_output-1360w~C7XYAfj1B9ySn39c6sFGvg.webp" alt="Example output from afl-fuzz"></p>
<p>You might need to change some of your CPU settings to achieve better performance while fuzzing.
<code>afl-fuzz</code>'s output will tell you if that's the case and guide you through the steps required to
make that happen.</p>
<h2>Using AFL to fuzz an OCaml parser</h2>
<p>First of all, if you want to fuzz an OCaml program with AFL you'll need to produce an instrumented
binary. <code>afl-fuzz</code> has an option to work with regular binaries but you'd lose a lot of what makes it
efficient. To instrument your binary you can simply install a <code>+afl</code> opam switch and build your
executable from there. AFL compiler variants are available from OCaml <code>4.05.0</code> onwards. To install such
a switch you can run:</p>
<pre><code>$ opam switch create fuzzing-switch 4.07.1+afl
</code></pre>
<p>If your program already parses the standard input or a file given to it via the command line, you
can simply build the executable from your <code>+afl</code> switch and adapt the above examples. If it doesn't,
it's still easy to fuzz any parsing function.</p>
<p>Imagine we have a <code>simple-parser</code> library which exposes the following <code>parse_int</code> function:</p>
<pre><code><span class="ocaml-keyword-other">val</span><span class="ocaml-source"> </span><span class="ocaml-source">parse_int</span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source"> </span><span class="ocaml-support-type">string</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-support-type">int</span><span class="ocaml-keyword-other-ocaml punctuation-comma punctuation-separator">,</span><span class="ocaml-source"> </span><span class="ocaml-source">[</span><span class="ocaml-keyword-operator">&gt;</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-polymorphic-variant">`Msg</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">of</span><span class="ocaml-source"> </span><span class="ocaml-support-type">string</span><span class="ocaml-source">]</span><span class="ocaml-source">)</span><span class="ocaml-source"> </span><span class="ocaml-source">result</span><span class="ocaml-source">
</span><span class="ocaml-comment-doc">(**</span><span class="ocaml-comment-doc"> Parse the given string as an int or return [Error (`Msg _)].
</span><span class="ocaml-comment-doc">    Does not raise, usually... </span><span class="ocaml-comment-doc">*)</span><span class="ocaml-source">
</span></code></pre>
<p>We want to use AFL to make sure our function is robust and won't crash when receiving unexpected
inputs. As you can see the function returns a result and isn't supposed to raise exceptions. We want
to make sure that's true.</p>
<p>To find crashes, AFL traps the signals sent by your program. That means that it will consider
uncaught OCaml exceptions as crashes. That's good because it makes it really simple to write a
<code>fuzz_me.ml</code> executable that fits what <code>afl-fuzz</code> expects:</p>
<pre><code><span class="ocaml-keyword-other">let</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-unit">()</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">file</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Sys</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">argv</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">(</span><span class="ocaml-constant-numeric-decimal-integer">1</span><span class="ocaml-source">)</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">in</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">ic</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-source">open_in</span><span class="ocaml-source"> </span><span class="ocaml-source">file</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">in</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">length</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-source">in_channel_length</span><span class="ocaml-source"> </span><span class="ocaml-source">ic</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">in</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">content</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-source">really_input_string</span><span class="ocaml-source"> </span><span class="ocaml-source">ic</span><span class="ocaml-source"> </span><span class="ocaml-source">length</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">in</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-source">close_in</span><span class="ocaml-source"> </span><span class="ocaml-source">ic</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-source">ignore</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-constant-language-capital-identifier">Simple_parser</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">parse_int</span><span class="ocaml-source"> </span><span class="ocaml-source">content</span><span class="ocaml-source">)</span><span class="ocaml-source">
</span></code></pre>
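<p>For a program that reads from standard input instead (the invocation without <code>@@</code> mentioned earlier), a harness could be sketched as follows. The local <code>parse_int</code> is only a stand-in for <code>Simple_parser.parse_int</code> so that the snippet compiles on its own, and <code>read_all</code> and <code>fuzz_one</code> are hypothetical helper names.</p>

```ocaml
(* Stand-in for Simple_parser.parse_int so that this sketch compiles alone. *)
let parse_int s =
  match int_of_string_opt s with
  | Some i -> Ok i
  | None -> Error (`Msg (Printf.sprintf "Not an int: %S" s))

(* Read a channel until end of file and return everything that was read. *)
let read_all ic =
  let buf = Buffer.create 4096 in
  let chunk = Bytes.create 4096 in
  let rec loop () =
    let n = input ic chunk 0 (Bytes.length chunk) in
    if n > 0 then begin
      Buffer.add_subbytes buf chunk 0 n;
      loop ()
    end
  in
  loop ();
  Buffer.contents buf

(* Feed one input to the parser. The result is ignored: only an uncaught
   exception matters to the fuzzer. *)
let fuzz_one ic = ignore (parse_int (read_all ic))

(* In the actual executable, the entry point would be:
   let () = fuzz_one stdin *)
```

<p>Reading in chunks avoids <code>in_channel_length</code>, which cannot be relied upon when the channel is a pipe rather than a regular file.</p>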
<p>We have to provide example inputs to AFL so we can write a <code>valid</code> file to the <code>inputs/</code> directory
containing <code>123</code> and an <code>invalid</code> file containing <code>not an int</code>. Both should parse without crashing
and make good starting points for AFL as they should trigger different execution paths.</p>
<p>Because we want to make sure AFL does find crashes we can try to hide a bug in our function:</p>
<pre><code><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">parse_int</span><span class="ocaml-source"> </span><span class="ocaml-source">s</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword-other">match</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">List</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">init</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-constant-language-capital-identifier">String</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">length</span><span class="ocaml-source"> </span><span class="ocaml-source">s</span><span class="ocaml-source">)</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-constant-language-capital-identifier">String</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">get</span><span class="ocaml-source"> </span><span class="ocaml-source">s</span><span class="ocaml-source">)</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">with</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword-other">|</span><span class="ocaml-source"> </span><span class="ocaml-source">[</span><span class="ocaml-string-quoted-single">'a'</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source"> </span><span class="ocaml-string-quoted-single">'b'</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source"> </span><span class="ocaml-string-quoted-single">'c'</span><span class="ocaml-source">]</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-source">failwith</span><span class="ocaml-source"> </span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-string-quoted-double">secret crash</span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword-other">|</span><span class="ocaml-source"> </span><span class="ocaml-constant-language">_</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-source">
</span><span class="ocaml-source">      </span><span class="ocaml-keyword-other">match</span><span class="ocaml-source"> </span><span class="ocaml-source">int_of_string_opt</span><span class="ocaml-source"> </span><span class="ocaml-source">s</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">with</span><span class="ocaml-source">
</span><span class="ocaml-source">      </span><span class="ocaml-keyword-other">|</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">None</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Error</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-constant-language-polymorphic-variant">`Msg</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-constant-language-capital-identifier">Printf</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">sprintf</span><span class="ocaml-source"> </span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-string-quoted-double">Not an int: </span><span class="ocaml-constant-character-printf">%S</span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-source"> </span><span class="ocaml-source">s</span><span class="ocaml-source">)</span><span class="ocaml-source">)</span><span class="ocaml-source">
</span><span class="ocaml-source">      </span><span class="ocaml-keyword-other">|</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Some</span><span class="ocaml-source"> </span><span class="ocaml-source">i</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Ok</span><span class="ocaml-source"> </span><span class="ocaml-source">i</span><span class="ocaml-source">)</span><span class="ocaml-source">
</span></code></pre>
<p>Now we just have to build our native binary from the right switch and let <code>afl-fuzz</code> do the rest:</p>
<pre><code>$ afl-fuzz -i inputs/ -o findings/ ./fuzz_me.exe @@
</code></pre>
<p>It should find that the <code>abc</code> input leads to a crash rather quickly. Once it does, you'll see it in
the top right corner of its output as shown in the picture from the previous section.</p>
<p>At this point you can interrupt <code>afl-fuzz</code> and have a look at the content of the <code>findings/crashes</code> directory:</p>
<pre><code><span class="sh-source">$ ls findings/crashes/
</span><span class="sh-source">id:000000,sig:06,src:000111,op:havoc,rep:16  README.txt
</span></code></pre>
<p>As you can see, it contains a <code>README.txt</code>, which gives you some details about the <code>afl-fuzz</code>
invocation used to find the crashes and how to reproduce them, along with one file of the form
<code>id:...,sig:...,src:...,op:...,rep:...</code> per crash it found. Here there's just one:</p>
<pre><code><span class="sh-source">$ cat findings/crashes/id:000000,sig:06,src:000111,op:havoc,rep:16
</span><span class="sh-source">abc
</span></code></pre>
<p>As expected it contains our special input that triggers our secret crash. We can rerun the program
with that input ourselves to make sure it does trigger it:</p>
<pre><code><span class="sh-source">$ ./fuzz_me.exe findings/crashes/id:000000,sig:06,src:000111,op:havoc,rep:16
</span><span class="sh-source">Fatal error: exception Failure</span><span class="sh-punctuation-definition-subshell">(</span><span class="sh-punctuation-definition-string-begin">"</span><span class="sh-string-quoted-double">secret crash</span><span class="sh-punctuation-definition-string-end">"</span><span class="sh-punctuation-definition-subshell">)</span><span class="sh-source">
</span></code></pre>
<p>No surprise here, it does trigger our uncaught exception and crashes shamefully.</p>
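<p>Once a crashing input is known, it can be worth pinning it down in a small check that classifies the parser's behaviour instead of letting the exception escape. The sketch below reproduces the deliberately buggy <code>parse_int</code> so that it is self-contained; <code>outcome</code> is a hypothetical helper, not part of any library.</p>

```ocaml
(* The deliberately buggy parser from above, copied here so the sketch
   compiles on its own. *)
let parse_int s =
  match List.init (String.length s) (String.get s) with
  | ['a'; 'b'; 'c'] -> failwith "secret crash"
  | _ -> (
      match int_of_string_opt s with
      | None -> Error (`Msg (Printf.sprintf "Not an int: %S" s))
      | Some i -> Ok i)

(* Hypothetical helper: describe what the parser does on a given input,
   catching the exception the fuzzer reported. *)
let outcome s =
  match parse_int s with
  | Ok _ -> "ok"
  | Error (`Msg _) -> "error"
  | exception Failure msg -> "raised: " ^ msg

let () =
  List.iter
    (fun s -> Printf.printf "%S -> %s\n" s (outcome s))
    ["123"; "not an int"; "abc"]
```

<p>Only the <code>abc</code> input reaches the <code>exception</code> branch; the two seed inputs go through the <code>Ok</code> and <code>Error</code> paths as intended.</p>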
<h2>Using Crowbar and AFL for property-based testing</h2>
<p>This works well but only being able to fuzz parsers is quite a limitation. That's where <a href="https://github.com/stedolan/crowbar/">Crowbar</a>
comes into play.</p>
<p>Crowbar is a property-based testing framework. It's much like Haskell's <a href="https://hackage.haskell.org/package/QuickCheck">QuickCheck</a>.
To test a given function, you define how its arguments are shaped and a set of properties the result
should satisfy, and it will make sure they hold for any combination of randomly generated
arguments.
Let's clarify that with an example.</p>
<p>I wrote a library called <code>Awesome_list</code> and I want to test its <code>sort</code> function:</p>
<pre><code><span class="ocaml-keyword-other">val</span><span class="ocaml-source"> </span><span class="ocaml-source">sort</span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source"> </span><span class="ocaml-support-type">int</span><span class="ocaml-source"> </span><span class="ocaml-source">list</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-support-type">int</span><span class="ocaml-source"> </span><span class="ocaml-source">list</span><span class="ocaml-source">
</span><span class="ocaml-comment-doc">(**</span><span class="ocaml-comment-doc"> Sorts the given list of integers. Result list is sorted in increasing
</span><span class="ocaml-comment-doc">    order, most of the time... </span><span class="ocaml-comment-doc">*)</span><span class="ocaml-source">
</span></code></pre>
<p>I want to make sure it really works so I'm going to use Crowbar to generate a whole lot of
lists of integers and verify that when I sort them with <code>Awesome_list.sort</code> the result is, well...
sorted.</p>
<p>We'll write our tests in a <code>fuzz_me.ml</code> file.
First we need to tell Crowbar how to generate arguments for our function. It exposes some
combinators to help you do that:</p>
<pre><code><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">int_list</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Crowbar</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">(</span><span class="ocaml-source">list</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-source">range</span><span class="ocaml-source"> </span><span class="ocaml-constant-numeric-decimal-integer">10</span><span class="ocaml-source">)</span><span class="ocaml-source">)</span><span class="ocaml-source">
</span></code></pre>
<p>Here we're telling Crowbar to generate lists of any size, containing integers ranging from 0
to 9 (the bound passed to <code>range</code> is exclusive). Crowbar also exposes more complex and custom generator combinators so don't worry,
you can use it to build more complex arguments.</p>
<p>Now we need to define our property. Once again it's pretty simple, we just want the output to be
sorted:</p>
<pre><code><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">is_sorted</span><span class="ocaml-source"> </span><span class="ocaml-source">l</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-keyword">rec </span><span class="ocaml-entity-name-function-binding">is_sorted</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">function</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-keyword-other">|</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-list">[]</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">|</span><span class="ocaml-source"> </span><span class="ocaml-source">[</span><span class="ocaml-constant-language">_</span><span class="ocaml-source">]</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-boolean">true</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-keyword-other">|</span><span class="ocaml-source"> </span><span class="ocaml-source">hd</span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source">(</span><span class="ocaml-source">hd'</span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-constant-language">_</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">as</span><span class="ocaml-source"> </span><span class="ocaml-source">tl</span><span class="ocaml-source">)</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-source">hd</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">&lt;=</span><span class="ocaml-source"> </span><span class="ocaml-source">hd'</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">&amp;&amp;</span><span class="ocaml-source"> </span><span class="ocaml-source">is_sorted</span><span class="ocaml-source"> </span><span class="ocaml-source">tl</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword-other">in</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-constant-language-capital-identifier">Crowbar</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">check</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-source">is_sorted</span><span class="ocaml-source"> </span><span class="ocaml-source">l</span><span class="ocaml-source">)</span><span class="ocaml-source">
</span></code></pre>
<p>All that's left to do now is to register our test:</p>
<pre><code><span class="ocaml-keyword-other">let</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-unit">()</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-constant-language-capital-identifier">Crowbar</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">add_test</span><span class="ocaml-source"> ~</span><span class="ocaml-source">name</span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-string-quoted-double">Awesome_list.sort</span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-source"> </span><span class="ocaml-source">[</span><span class="ocaml-source">int_list</span><span class="ocaml-source">]</span><span class="ocaml-source">
</span><span class="ocaml-source">      </span><span class="ocaml-source">(</span><span class="ocaml-keyword-other">fun</span><span class="ocaml-source"> </span><span class="ocaml-source">l</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-source">is_sorted</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-constant-language-capital-identifier">Awesome_list</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">sort</span><span class="ocaml-source"> </span><span class="ocaml-source">l</span><span class="ocaml-source">)</span><span class="ocaml-source">)</span><span class="ocaml-source">
</span></code></pre>
<p>and to compile that <code>fuzz_me.ml</code> file to a binary. Crowbar will take care of the magic.</p>
<p>We can run that binary in "Quickcheck" mode where it will either try a certain number of random
inputs or keep trying until one of the properties breaks, depending on the command-line options
we pass it.
What we're interested in here is its less common "AFL" mode. Crowbar made it so our executable
can be used with <code>afl-fuzz</code> just like that:</p>
<pre><code><span class="sh-source">$ afl-fuzz -i inputs -o findings -- ./fuzz_me.exe @@
</span></code></pre>
<p>What will happen then is that our <code>fuzz_me.exe</code> binary will read the inputs provided by <code>afl-fuzz</code>
and use them to determine which test to run and how to generate the arguments to pass to our function.
If the properties are satisfied, the binary will exit normally; if they aren't, it will make sure
that <code>afl-fuzz</code> interprets that as a crash by raising an exception.</p>
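<p>To give an intuition for how raw fuzzer bytes can drive a generator, here is a deliberately simplified toy decoder. It is <em>not</em> Crowbar's actual encoding: <code>decode_int_list</code> is a hypothetical function that simply maps each input byte to an integer between 0 and 9.</p>

```ocaml
(* Toy decoder, NOT Crowbar's real encoding: every byte of the fuzzer
   input becomes one list element in the range 0..9. *)
let decode_int_list (input : string) : int list =
  List.init (String.length input) (fun i -> Char.code input.[i] mod 10)

let () =
  decode_int_list "\004\005\006"
  |> List.map string_of_int
  |> String.concat "; "
  |> Printf.printf "[%s]\n"
```

<p>The point of the sketch is that mutating a byte of the input file changes the generated argument, which is what lets a coverage-guided fuzzer steer the test towards new execution paths.</p>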
<p>A nice side-effect of Crowbar's approach is that <code>afl-fuzz</code> will still be able to pick up
crashes. For instance, if we implement <code>Awesome_list.sort</code> as:</p>
<pre><code><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">sort</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">function</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword-other">|</span><span class="ocaml-source"> </span><span class="ocaml-source">[</span><span class="ocaml-constant-numeric-decimal-integer">1</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source"> </span><span class="ocaml-constant-numeric-decimal-integer">2</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source"> </span><span class="ocaml-constant-numeric-decimal-integer">3</span><span class="ocaml-source">]</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-source">failwith</span><span class="ocaml-source"> </span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-string-quoted-double">secret crash</span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword-other">|</span><span class="ocaml-source"> </span><span class="ocaml-source">[</span><span class="ocaml-constant-numeric-decimal-integer">4</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source"> </span><span class="ocaml-constant-numeric-decimal-integer">5</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source"> </span><span class="ocaml-constant-numeric-decimal-integer">6</span><span class="ocaml-source">]</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-source">[</span><span class="ocaml-constant-numeric-decimal-integer">6</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source"> </span><span class="ocaml-constant-numeric-decimal-integer">5</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source"> </span><span class="ocaml-constant-numeric-decimal-integer">4</span><span class="ocaml-source">]</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword-other">|</span><span class="ocaml-source"> </span><span class="ocaml-source">l</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">List</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">sort</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Pervasives</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">compare</span><span class="ocaml-source"> </span><span class="ocaml-source">l</span><span class="ocaml-source">
</span></code></pre>
<p>and use AFL and Crowbar to fuzz-test our function, it will find two crashes: one for the input
<code>[1; 2; 3]</code> which triggers a crash and one for <code>[4; 5; 6]</code> for which the <code>is_sorted</code>
property won't hold.</p>
<p>The content of the input files found by <code>afl-fuzz</code> itself won't be of much help as it needs to be
interpreted by Crowbar to build the arguments that were passed to the function to trigger the bug.
We can invoke the <code>fuzz_me.exe</code> binary ourselves on one of the files in <code>findings/crashes</code>
and the Crowbar binary will replay the test and give us some more helpful information about what
exactly is going on:</p>
<pre><code><span class="sh-source">$ ./fuzz_me.exe findings/crashes/id</span><span class="sh-constant-character-escape">\:</span><span class="sh-source">000000</span><span class="sh-constant-character-escape">\,</span><span class="sh-source">sig</span><span class="sh-constant-character-escape">\:</span><span class="sh-source">06</span><span class="sh-constant-character-escape">\,</span><span class="sh-source">src</span><span class="sh-constant-character-escape">\:</span><span class="sh-source">000011</span><span class="sh-constant-character-escape">\,</span><span class="sh-source">op</span><span class="sh-constant-character-escape">\:</span><span class="sh-source">flip1</span><span class="sh-constant-character-escape">\,</span><span class="sh-source">pos</span><span class="sh-constant-character-escape">\:</span><span class="sh-source">5
</span><span class="sh-source">Awesome_list.sort: ....
</span><span class="sh-source">Awesome_list.sort: FAIL
</span><span class="sh-source">
</span><span class="sh-source">When given the input:
</span><span class="sh-source">
</span><span class="sh-source">    [1</span><span class="sh-keyword-operator-list">;</span><span class="sh-source"> 2</span><span class="sh-keyword-operator-list">;</span><span class="sh-source"> 3]
</span><span class="sh-source">
</span><span class="sh-source">the </span><span class="sh-support-function-builtin">test</span><span class="sh-source"> threw an exception:
</span><span class="sh-source">
</span><span class="sh-source">    Failure</span><span class="sh-punctuation-definition-subshell">(</span><span class="sh-punctuation-definition-string-begin">"</span><span class="sh-string-quoted-double">secret crash</span><span class="sh-punctuation-definition-string-end">"</span><span class="sh-punctuation-definition-subshell">)</span><span class="sh-source">
</span><span class="sh-source">    Raised at file </span><span class="sh-punctuation-definition-string-begin">"</span><span class="sh-string-quoted-double">stdlib.ml</span><span class="sh-punctuation-definition-string-end">"</span><span class="sh-source">, line 33, characters 17-33
</span><span class="sh-source">    Called from file </span><span class="sh-punctuation-definition-string-begin">"</span><span class="sh-string-quoted-double">awesome-list/fuzz/fuzz_me.ml</span><span class="sh-punctuation-definition-string-end">"</span><span class="sh-source">, line 11, characters 78-99
</span><span class="sh-source">    Called from file </span><span class="sh-punctuation-definition-string-begin">"</span><span class="sh-string-quoted-double">src/crowbar.ml</span><span class="sh-punctuation-definition-string-end">"</span><span class="sh-source">, line 264, characters 16-19
</span><span class="sh-source">
</span><span class="sh-source">Fatal error: exception Crowbar.TestFailure
</span><span class="sh-source">$ ./fuzz_me.exe findings/crashes/id</span><span class="sh-constant-character-escape">\:</span><span class="sh-source">000001</span><span class="sh-constant-character-escape">\,</span><span class="sh-source">sig</span><span class="sh-constant-character-escape">\:</span><span class="sh-source">06</span><span class="sh-constant-character-escape">\,</span><span class="sh-source">src</span><span class="sh-constant-character-escape">\:</span><span class="sh-source">000027</span><span class="sh-constant-character-escape">\,</span><span class="sh-source">op</span><span class="sh-constant-character-escape">\:</span><span class="sh-source">arith16</span><span class="sh-constant-character-escape">\,</span><span class="sh-source">pos</span><span class="sh-constant-character-escape">\:</span><span class="sh-source">5</span><span class="sh-constant-character-escape">\,</span><span class="sh-source">val</span><span class="sh-constant-character-escape">\:</span><span class="sh-source">+7
</span><span class="sh-source">Awesome_list.sort: ....
</span><span class="sh-source">Awesome_list.sort: FAIL
</span><span class="sh-source">
</span><span class="sh-source">When given the input:
</span><span class="sh-source">
</span><span class="sh-source">    [4</span><span class="sh-keyword-operator-list">;</span><span class="sh-source"> 5</span><span class="sh-keyword-operator-list">;</span><span class="sh-source"> 6]
</span><span class="sh-source">
</span><span class="sh-source">the </span><span class="sh-support-function-builtin">test</span><span class="sh-source"> failed:
</span><span class="sh-source">
</span><span class="sh-source">    check </span><span class="sh-support-function-builtin">false</span><span class="sh-source">
</span><span class="sh-source">
</span><span class="sh-source">Fatal error: exception Crowbar.TestFailure
</span></code></pre>
<p>We can see the actual inputs as well as distinguish the one that broke the invariant from the one
that triggered a crash.</p>
<h2>Using <code>bun</code> to run fuzz testing in CI</h2>
<p>While AFL and Crowbar provide no guarantees, they can give you confidence that your implementation
is not broken. Now that you know how to use them, a natural follow-up is to want to run fuzz tests
in your CI to enforce that level of confidence.</p>
<p>The problem is that AFL isn't very CI friendly. First, it has a constantly refreshing output that isn't going to look
great in your Travis build logs, and it doesn't tell you much besides whether or not it found
crashes or invariant infringements.</p>
<p>Fortunately, like most problems, this one has a solution:
<a href="https://github.com/yomimono/ocaml-bun/"><code>bun</code></a>.
<code>bun</code> is a CLI wrapper around <code>afl-fuzz</code>, written in OCaml, that helps you get the best out of AFL
effortlessly. It mostly does two things:</p>
<p>The first is that it will run several <code>afl-fuzz</code> processes in parallel
(one per core by default). <code>afl-fuzz</code> starts with a bunch of deterministic steps. In my experience,
using parallel processes during this phase rarely proved very useful, as they tend to find the same
bugs or slight variations of those bugs. Parallelism only achieves its full potential in the second phase of
fuzzing.</p>
<p>The second thing, which is the one we're the most interested in, is that <code>bun</code> provides a useful
and CI-friendly summary of what's going on with all the fuzzing processes so far. When one of them
finds a crash, it will stop all processes and pretty-print all of the bug-triggering inputs to help
you reproduce and debug them locally. See an example <code>bun</code> output after a crash was found:</p>
<pre><code>Crashes found! Take a look; copy/paste to save for reproduction:
echo JXJpaWl0IA== | base64 -d &gt; crash_0.$(date -u +%s)
echo NXJhkV8QAA== | base64 -d &gt; crash_1.$(date -u +%s)
echo J3Jh//9qdGFiYmkg | base64 -d &gt; crash_2.$(date -u +%s)
09:35.32:[ERROR]All fuzzers finished, but some crashes were found!
</code></pre>
<p>Using <code>bun</code> is very similar to using <code>afl-fuzz</code>. Going back to our first parser example, we can
fuzz it with <code>bun</code> like this:</p>
<pre><code>$ bun --input inputs/ --output findings/ /path/to/parse.exe
</code></pre>
<p>You'll note that you don't need to provide the <code>@@</code> anymore. <code>bun</code> assumes that it should pass the
input as the first argument of your to-be-fuzzed binary.</p>
<p><code>bun</code> also comes with an alternative <code>no-kill</code> mode which lets all the fuzzers run indefinitely
instead of terminating them whenever a crash is discovered. It will regularly keep you updated on
the number of crashes discovered so far and when terminated will pretty-print each of them just like
it does in regular mode.</p>
<p>This mode can be convenient if you suspect your implementation may contain a lot of bugs and
you don't want to go through the whole process of fuzz testing it to only find a single bug.</p>
<p>You can use it in CI by running <code>bun --no-kill</code> via <code>timeout</code>. For instance:</p>
<pre><code>timeout --preserve-status 60m bun --no-kill --input inputs --output findings ./fuzz_me.exe
</code></pre>
<p>will fuzz <code>fuzz_me.exe</code> for an hour no matter what happens. When <code>timeout</code> terminates <code>bun</code>, it will
provide you with a handful of bugs to fix!</p>
<h2>Final words</h2>
<p>I really want to encourage you to use those tools and fuzzing in general.
Crowbar and <code>bun</code> are fairly new, so you will probably encounter bugs or find that they lack a feature
you want, but combined with AFL they make for very nice tools to effectively test
critical components of your OCaml code base or infrastructure and detect newly-introduced bugs.
They are already used across the MirageOS ecosystem, where they have been used to fuzz the TCP/IP stack
<a href="https://github.com/mirage/mirage-tcpip">mirage-tcpip</a> and the DHCP implementation <a href="https://github.com/mirage/charrua">charrua</a> thanks to
<a href="https://github.com/yomimono/somerandompacket">somerandompacket</a>.
You can consult Crowbar's <a href="https://github.com/stedolan/crowbar/issues/2">hall of fame</a> to find out about bugs uncovered by this
approach.</p>
<p>I also encourage anyone interested to join us in using this promising toolchain, report those bugs,
contribute those extra features and help the community build more robust software.</p>
<p>Finally, if you wish to learn more about how to efficiently use fuzzing for testing, I recommend the
excellent <a href="https://blog.regehr.org/archives/1687">Write Fuzzable Code</a> article by John Regehr.</p>
]]></description><link>https://tarides.com/blog/2019-09-04-an-introduction-to-fuzzing-ocaml-with-afl-crowbar-and-bun</link><guid isPermaLink="false">https://tarides.com/blog/2019-09-04-an-introduction-to-fuzzing-ocaml-with-afl-crowbar-and-bun.html</guid><dc:creator><![CDATA[ Nathan Rebours ]]></dc:creator><pubDate>Wed, 04 Sep 2019 00:00:00 GMT</pubDate></item><item><title><![CDATA[Decompress: The New Decompress API]]></title><description><![CDATA[<p><a href="https://tools.ietf.org/html/rfc1951">RFC 1951</a> is one of the most used standards. Indeed,
when you launch your Linux kernel, it inflates itself according to the <a href="https://zlib.net/">zlib</a>
standard, a superset of RFC 1951. Being a widely-used standard, we decided to
produce an OCaml implementation. In the process, we learned many lessons about
developing OCaml where we would normally use C. So, we are glad to present
<a href="https://github.com/mirage/decompress"><code>decompress</code></a>.</p>
<p>One of the many users of RFC 1951 is <a href="https://git-scm.com/">Git</a>, which uses it to pack data
objects into a <a href="https://git-scm.com/book/en/v2/Git-Internals-Packfiles">PACK file</a>. At the request of <a href="https://github.com/samoht">@samoht</a>,
<code>decompress</code> appeared some years ago as a Mirage-compatible replacement for zlib
to be used for compiling a <a href="https://mirage.io/">MirageOS</a> unikernel with
<a href="https://github.com/mirage/ocaml-git/">ocaml-git</a>. Today, this little project reaches a major release with
substantial improvements in several domains.</p>
<p><code>decompress</code> provides an API for inflating and deflating <em>flows</em><code>[1]</code>. The main
goal is to provide a <em>platform-agnostic</em> library: one which may be compiled on
any platform, including JavaScript. We surely cannot be faster than C
implementations like <a href="https://github.com/facebook/zstd">zstd</a> or <a href="https://github.com/lz4/lz4">lz4</a>, but we can play some
optimisation tricks to help bridge the gap. Additionally, OCaml can protect the
user against a lot of bugs via the type system <em>and</em> the runtime too (e.g. using
array bounds checking). <a href="https://github.com/mirleft/ocaml-tls"><code>ocaml-tls</code></a> was implemented partly in
response to the famous <a href="https://en.wikipedia.org/wiki/Heartbleed">failure</a> of <code>openssl</code>; a vulnerability
which could not exist in OCaml.</p>
<p><code>[1]</code>: A <em>flow</em>, in MirageOS land, is an abstraction for receiving
and/or transmitting data according to some standard. So it's usual to speak of a <em>TLS-flow</em>,
for example.</p>
<h2>API design</h2>
<p>The API should be the most difficult part of designing a library - it reveals
what we can do and how we should do it. In this way, an API should:</p>
<ol>
<li>
<p><strong>constrain the user to avoid security issues</strong>; too much freedom can be a bad
thing. As an example, consider the <code>Hashtbl.create</code> function, which allows the
user to pass <code>~random:false</code> to select a fixed hash function. The resulting
hashtable suffers deterministic key collisions, which can be exploited by an
attacker.
<br><br>
An example of good security-awareness in API design can be seen in
<a href="https://github.com/mirage/digestif">digestif</a>, which provided an <code>unsafe_compare</code> instead of the common
<code>compare</code> function (before <code>eqaf.0.5</code>). In this way, it forced the user to
create an alias of it if they wanted to use a hash in a <code>Map</code> – however, by doing
so, they should know that they are not protected against a timing attack.</p>
</li>
<li>
<p><strong>allow some degrees of freedom to fit within many environments</strong>; a
constrained API cannot support a hostile context. For example, when compiling
to an <a href="https://mirage.io/blog/2018-esp32-booting">ESP32</a> target, even small details such as the length of a stream
input buffer must be user-definable. When deploying to a server, memory
consumption should be deterministic.
<br><br>
Of course, this is made difficult when too much freedom will enable misuse of
the API – an example is <a href="https://github.com/ocaml/dune">dune</a>, which deliberately limits
what users can do with it.</p>
</li>
<li>
<p><strong>imply an optimal design of how to use it</strong>. Possibilities should serve the
user, but these can make the API harder to understand; this is why
documentation is important. Your API should tell your users how it wants to
be treated.</p>
</li>
</ol>
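<p>As a small illustration of the first point, here is a minimal sketch using only the standard
library's <code>Hashtbl</code>. The helper name <code>make_table</code> is ours, chosen for the example; it simply
opts in to a per-table random seed so that key collisions cannot be precomputed by an attacker:</p>
<pre><code>(* [make_table] is a hypothetical helper: it always asks for a randomised
   table instead of leaving the choice to each call site. *)
let make_table () : (string, int) Hashtbl.t = Hashtbl.create ~random:true 16

let () =
  let t = make_table () in
  Hashtbl.replace t "answer" 42;
  (* Lookups behave exactly as with a non-randomised table. *)
  assert (Hashtbl.find_opt t "answer" = Some 42);
  assert (Hashtbl.find_opt t "missing" = None)
</code></pre>
<p>Wrapping the constructor like this is one way for an API to remove a dangerous degree of freedom
while keeping the rest of the interface unchanged.</p>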
<h3>A dbuenzli API</h3>
<p>From our experience with protocols and formats, one design stands out: the
<em><a href="https://github.com/dbuenzli/">dbuenzli</a> API</em>. If you look into some famous libraries in the OCaml
eco-system, you probably know <a href="https://github.com/dbuenzli/uutf">uutf</a>, <a href="https://github.com/dbuenzli/jsonm">jsonm</a> or <a href="https://github.com/dbuenzli/xmlm">xmlm</a>. All
of these libraries provide the same API for computing a Unicode/JSON/XML flow –
of course, the details are not the same.</p>
<p>From a MirageOS perspective, even if they use the <code>in_channel</code>/<code>out_channel</code>
abstraction rather than a <a href="https://github.com/mirage/mirage-flow">Mirage flow</a>, these libraries
are system-agnostic since they let the user choose input and output buffers.
Most importantly, they don't use the standard OCaml <code>Unix</code> module, which cannot
be used in a unikernel.</p>
<p>The APIs are pretty consistent and make a <em>best-effort</em><code>[2]</code> attempt at
decoding. The design has a type <em>state</em> which represents the current system
status; the user passes this to <code>decode</code>/<code>encode</code> to carry out the processing.
Of course, these functions have a side-effect on the state internally, but
this is hidden from the user. One advantage of including states in a design is
that the underlying implementation is very amenable to compiler optimisations (e.g.
tail-call optimisation). Internally, of course, we have a <em>porcelain</em><code>[3]</code>
implementation where every detail can have a rational explanation.</p>
<p>In the beginning, <code>decompress</code> wanted to follow the same interface without the
mutability (a choice made for performance reasons) and it did. Then, the hard test was to
use it in a bigger project; in this case, <a href="https://github.com/mirage/ocaml-git/">ocaml-git</a>. An iterative
process was used to determine what was really needed, what we should not provide
(like special cases) and what we should provide to reach a uniform API that is
not too difficult to understand.</p>
<p>From this experience, we finalised the initial <code>decompress</code> API and it did not
change significantly for 4 versions (2 years).</p>
<p><code>[2]</code>: <em>best-effort</em> means the user controls the error branch, where we don't
leak exceptions (or more generally, any interrupts).</p>
<p><code>[3]</code>: <em>porcelain</em> means implicit invariants held in the mind of the programmer
(or the assertions/comments).</p>
<h2>The new <code>decompress</code> API</h2>
<p>The new <code>decompress</code> keeps the same inflation logic, but drastically changes the
deflator to make the <em>flush</em> operation clearer. For many purposes, people don't
want to hand-craft their compressed flows – they just want
<code>of_string</code>/<code>to_string</code> functions. However, in some contexts (like a PNG
encoder/decoder), the user should be able to play with <code>decompress</code> in detail
(OpenPGP needs this too in <a href="https://tools.ietf.org/html/rfc4880">RFC 4880</a>).</p>
<h3>The Zlib format</h3>
<p>Both <code>decompress</code> and zlib use <em><a href="https://zlib.net/feldspar.html">Huffman coding</a></em>, an algorithm
for building a dictionary of variable-length codewords for a given set of
symbols (in this case, bytes). The most common byte is assigned the shortest bit
sequence; less common bytes are assigned longer codewords. Using this
dictionary, we just translate each byte into its codeword and we should achieve
a good compression ratio. Of course, there are other details, such as the fact
that all Huffman codes are <a href="https://en.wikipedia.org/wiki/Prefix_code">prefix-free</a>. The compression can be
taken further with the <a href="https://en.wikipedia.org/wiki/LZ77_and_LZ78">LZ77</a> algorithm.</p>
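<p>To make the idea concrete, here is a minimal, unoptimised sketch of Huffman code construction.
It is not the code used by <code>decompress</code> or zlib; the names <code>tree</code>, <code>huffman</code> and
<code>code_lengths</code> are invented for this example, and a sorted list stands in for a proper
priority queue:</p>
<pre><code>type tree = Leaf of char | Node of tree * tree

(* Order (weight, tree) pairs by weight only. *)
let by_weight (w1, _) (w2, _) = compare (w1 : int) w2

(* Repeatedly merge the two lightest trees until a single tree remains. *)
let huffman freqs =
  let rec build = function
    | [] -&gt; invalid_arg "huffman: no symbols"
    | [ (_, t) ] -&gt; t
    | (w1, t1) :: (w2, t2) :: rest -&gt;
      build (List.sort by_weight ((w1 + w2, Node (t1, t2)) :: rest))
  in
  build (List.sort by_weight (List.map (fun (c, w) -&gt; (w, Leaf c)) freqs))

(* The codeword length of a symbol is its depth in the tree
   (at least 1, so that a lone symbol still gets one bit). *)
let code_lengths tree =
  let rec go depth = function
    | Leaf c -&gt; [ (c, max depth 1) ]
    | Node (l, r) -&gt; go (depth + 1) l @ go (depth + 1) r
  in
  go 0 tree
</code></pre>
<p>With the frequencies <code>[('a', 5); ('b', 2); ('c', 1)]</code>, the most common symbol <code>'a'</code> ends up
with a 1-bit codeword while <code>'b'</code> and <code>'c'</code> each get 2 bits.</p>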
<p>The <em><a href="https://zlib.net/">zlib</a></em> format, a superset of the <a href="https://tools.ietf.org/html/rfc1951">RFC 1951</a> format, is easy
to understand. We will only consider the RFC 1951 format, since zlib adds only
minor details (such as checksums). It consists of several blocks: DEFLATE
blocks, each with a little header, and the contents. There are 3 kinds of
DEFLATE blocks:</p>
<ul>
<li>a FLAT block; no compression, just a <em>blit</em> from inputs to the current block.</li>
<li>a FIXED block; compressed using a pre-computed Huffman code.</li>
<li>a DYNAMIC block; compressed using a user-specified Huffman code.</li>
</ul>
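<p>The "little header" can be sketched as follows. In RFC 1951, each block starts with a 1-bit
<em>final block</em> flag followed by a 2-bit block type, packed starting from the least significant
bit. The names <code>block_kind</code> and <code>parse_header</code> are ours, and the sketch assumes the block
starts on a byte boundary (which is only guaranteed for the first block of a stream):</p>
<pre><code>type block_kind = Flat | Fixed | Dynamic

(* [first_byte] is the byte in which the block starts (0..255).
   Returns [(is_last_block, kind)], or an error for the reserved type. *)
let parse_header first_byte =
  let last = first_byte land 1 = 1 in
  match (first_byte lsr 1) land 0b11 with
  | 0b00 -&gt; Ok (last, Flat)
  | 0b01 -&gt; Ok (last, Fixed)
  | 0b10 -&gt; Ok (last, Dynamic)
  | _ -&gt; Error "reserved block type"
</code></pre>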
<p>The FIXED block uses a Huffman dictionary that is computed when the OCaml runtime
is initialised. DYNAMIC blocks use dictionaries specified by the user, and so
these must be transmitted alongside the data (<em>after being compressed with
another Huffman code!</em>). The inflator decompresses this DYNAMIC dictionary and uses
it to do the <em>reverse</em> translation from bit sequences to bytes.</p>
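<p>As an illustration of what "pre-computed" means for FIXED blocks, the codeword lengths of the
fixed literal/length alphabet are given by RFC 1951 and can be tabulated once, when the module is
initialised. This is a sketch with names of our own (<code>fixed_code_length</code>,
<code>fixed_lengths</code>), not the table used inside <code>decompress</code>:</p>
<pre><code>(* Codeword length, in bits, of literal/length symbol [sym] (0..287)
   in the fixed Huffman code of RFC 1951. *)
let fixed_code_length sym =
  if sym &lt; 0 || sym &gt; 287 then invalid_arg "fixed_code_length"
  else if sym &lt;= 143 then 8
  else if sym &lt;= 255 then 9
  else if sym &lt;= 279 then 7
  else 8

(* Evaluated once, when this module is initialised. *)
let fixed_lengths = Array.init 288 fixed_code_length
</code></pre>
<p>A quick sanity check is that these lengths describe a complete prefix code: summing
<code>2^(9 - length)</code> over all 288 symbols gives exactly <code>2^9</code>.</p>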
<h3>Inflator</h3>
<p>The design of the inflator did not change a lot from the last version of
<code>decompress</code>. Indeed, its job is to take an input, process it and return an
output, like a flow. Of course, the error case can be reached.</p>
<p>So the API is pretty easy:</p>
<pre><code><span class="ocaml-keyword-other">val</span><span class="ocaml-source"> </span><span class="ocaml-source">decode</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source"> </span><span class="ocaml-source">decoder</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-source">[</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-polymorphic-variant">`Await</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">|</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-polymorphic-variant">`Flush</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">|</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-polymorphic-variant">`End</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">|</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-polymorphic-variant">`Malformed</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">of</span><span class="ocaml-source"> </span><span class="ocaml-support-type">string</span><span class="ocaml-source"> </span><span class="ocaml-source">]</span><span class="ocaml-source">
</span></code></pre>
<p>As you can see, we have 4 cases: one which expects more input (<code>Await</code>), a
second which asks the user to flush the internal buffer (<code>Flush</code>), the <code>End</code> case
when we reach the end of the flow, and the <code>Malformed</code> case when we encounter an
error.</p>
<p>For each case, the user can do several operations. In the <code>Await</code>
case, they can refill the decoder with another input buffer using:</p>
<pre><code><span class="ocaml-keyword-other">val</span><span class="ocaml-source"> </span><span class="ocaml-source">src</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source"> </span><span class="ocaml-source">decoder</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-source">bigstring</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-source">off</span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-support-type">int</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-source">len</span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-support-type">int</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-support-type">unit</span><span class="ocaml-source">
</span></code></pre>
<p>This function provides the decoder with a new input of <code>len</code> bytes to read,
starting at <code>off</code> in the given <code>bigstring</code>.</p>
<p>In the <code>Flush</code> case, the user wants some information, like how many bytes are
available in the current output buffer. Then, we should provide an action to
<em>flush</em> this output buffer. Ultimately, this output buffer should be supplied by
the user (who decides how many bytes they want to allocate to store the output flow).</p>
<pre><code><span class="ocaml-keyword-other">type</span><span class="ocaml-source"> </span><span class="ocaml-source">src</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-source">[</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-polymorphic-variant">`Channel</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">of</span><span class="ocaml-source"> </span><span class="ocaml-source">in_channel</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">|</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-polymorphic-variant">`Manual</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">|</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-polymorphic-variant">`String</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">of</span><span class="ocaml-source"> </span><span class="ocaml-support-type">string</span><span class="ocaml-source"> </span><span class="ocaml-source">]</span><span class="ocaml-source">
</span><span class="ocaml-source">
</span><span class="ocaml-keyword-other">val</span><span class="ocaml-source"> </span><span class="ocaml-source">dst_rem</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source"> </span><span class="ocaml-source">decoder</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-support-type">int</span><span class="ocaml-source">
</span><span class="ocaml-keyword-other">val</span><span class="ocaml-source"> </span><span class="ocaml-source">flush</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source"> </span><span class="ocaml-source">decoder</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-support-type">unit</span><span class="ocaml-source">
</span><span class="ocaml-keyword-other">val</span><span class="ocaml-source"> </span><span class="ocaml-source">decoder</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source"> </span><span class="ocaml-source">src</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-source">o</span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source">bigstring</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-source">w</span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source">bigstring</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-source">decoder</span><span class="ocaml-source">
</span></code></pre>
<p>The last function, <code>decoder</code>, is the most interesting. It lets the user, at the
beginning, choose the context in which they want to inflate inputs. So they
choose:</p>
<ul>
<li><code>src</code>, where the input flow comes from</li>
<li><code>o</code>, output buffer</li>
<li><code>w</code>, window buffer</li>
</ul>
<p><code>o</code> will be used to store inflated outputs, <code>dst_rem</code> will tell us how many
bytes remain free in <code>o</code> (and so, by subtraction, how many bytes the inflator stored there) and
<code>flush</code> will reset the <code>decoder</code> so that it can continue processing the flow.</p>
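<p>The interplay between <code>decode</code>, <code>src</code>, <code>dst_rem</code> and <code>flush</code> can be sketched with a toy
decoder that merely copies its input to its output. This is not <code>decompress</code>'s implementation:
it uses <code>bytes</code> instead of <code>bigstring</code>, takes no <code>src</code> or window argument at creation, never
returns <code>Malformed</code>, and assumes two conventions borrowed from other libraries in this style: an
empty refill (<code>len = 0</code>) signals the end of input, and <code>dst_rem</code> reports the free space left in
<code>o</code>. The driver <code>copy_string</code> is a hypothetical helper:</p>
<pre><code>type decoder = {
  mutable i : bytes;    (* current input buffer *)
  mutable i_pos : int;  (* next byte to read in [i] *)
  mutable i_len : int;  (* bytes left to read in [i] *)
  mutable eoi : bool;   (* set once the user has signalled end of input *)
  o : bytes;            (* user-supplied, non-empty output buffer *)
  mutable o_pos : int;  (* next free byte in [o] *)
}

let decoder ~o = { i = Bytes.empty; i_pos = 0; i_len = 0; eoi = false; o; o_pos = 0 }

(* Assumed convention: refilling with [len = 0] signals the end of input. *)
let src d buf ~off ~len =
  if len = 0 then d.eoi &lt;- true;
  d.i &lt;- buf;
  d.i_pos &lt;- off;
  d.i_len &lt;- len

let dst_rem d = Bytes.length d.o - d.o_pos  (* free bytes left in [o] *)
let flush d = d.o_pos &lt;- 0                  (* [o] may be overwritten again *)

let rec decode d : [ `Await | `Flush | `End | `Malformed of string ] =
  if dst_rem d = 0 then `Flush
  else if d.i_len = 0 then (if d.eoi then `End else `Await)
  else begin
    (* "Inflate" as much as fits: here, a plain copy. *)
    let n = min d.i_len (dst_rem d) in
    Bytes.blit d.i d.i_pos d.o d.o_pos n;
    d.i_pos &lt;- d.i_pos + n;
    d.i_len &lt;- d.i_len - n;
    d.o_pos &lt;- d.o_pos + n;
    decode d
  end

(* Drive the decoder: feed [input] in small chunks, collect the output. *)
let copy_string ?(chunk = 4) input =
  let ibuf = Bytes.of_string input in
  let o = Bytes.create 8 in
  let d = decoder ~o in
  let out = Buffer.create 16 in
  let pos = ref 0 in
  let drain () =
    Buffer.add_subbytes out o 0 (Bytes.length o - dst_rem d);
    flush d
  in
  let rec go () =
    match decode d with
    | `Await -&gt;
      let len = min chunk (Bytes.length ibuf - !pos) in
      src d ibuf ~off:(!pos) ~len;
      pos := !pos + len;
      go ()
    | `Flush -&gt; drain (); go ()
    | `End -&gt; drain (); Ok (Buffer.contents out)
    | `Malformed msg -&gt; Error msg
  in
  go ()
</code></pre>
<p>Note that the loop never allocates an output buffer itself: both the 8-byte <code>o</code> and the 4-byte
chunk size are choices made by the caller, which is exactly the freedom this style of API is
meant to preserve.</p>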
<p><code>w</code> is needed for <a href="https://en.wikipedia.org/wiki/LZ77_and_LZ78">lz77</a> compression. However, as we said, we let
the user give us this intermediate buffer. The idea behind that is to let the
user prepare an <em>inflation</em>. For example, in <a href="https://github.com/mirage/ocaml-git/">ocaml-git</a>, instead of
allocating <code>w</code> systematically when we want to decompress a Git object, we
allocate <code>w</code> once per thread, and all inflations are able to use it and <strong>re-use</strong> it.
In this way, we avoid systematic allocations (and allocate only once), which
can have a serious impact on performance.</p>
<p>The design is pretty close to one idea, a <em>description</em> step by the <code>decoder</code>
function and a real computation loop with the <code>decode</code> function. The idea is to
prepare the inflation with some information (like <code>w</code> and <code>o</code>) before the main
(and the most expensive) computation. Internally we do that too (but it's mostly
about a macro-optimization).</p>
<p>This is the purpose of OCaml in general: to have a powerful way to describe
something (with constraints). In our case, we are very limited in what we need
to describe. But in other libraries like <a href="https://github.com/inhabitedtype/angstrom">angstrom</a>, the description
step is huge (describing the parser according to the BNF) and then we use it for
the main computation, which in the case of angstrom is the parsing (another
example is cmdliner).</p>
<p>This is why <code>decoder</code> can be considered as the main function where <code>decode</code> can
be wrapped under a stream.</p>
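<p>One possible reading of "wrapped under a stream" is sketched below with the standard <code>Seq</code>
module. The <code>step</code> function is a hypothetical stand-in for a closure around <code>decode</code> that
returns the next chunk of output; neither <code>to_seq</code> nor <code>step_of_list</code> is part of
<code>decompress</code>:</p>
<pre><code>(* Turn a pull-style [step] function into a lazy sequence of results.
   Errors are reported as a final [Error] element, not as an exception. *)
let rec to_seq step : (string, string) result Seq.t = fun () -&gt;
  match step () with
  | `Chunk s -&gt; Seq.Cons (Ok s, to_seq step)
  | `End -&gt; Seq.Nil
  | `Malformed msg -&gt; Seq.Cons (Error msg, Seq.empty)

(* A fake [step] for demonstration: it replays chunks from a list. *)
let step_of_list chunks =
  let rest = ref chunks in
  fun () -&gt;
    match !rest with
    | [] -&gt; `End
    | c :: tl -&gt; rest := tl; `Chunk c
</code></pre>
<p>Because <code>step</code> is stateful, the resulting sequence should be consumed only once.</p>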
<h3>Deflator</h3>
<p>The deflator is a new (complex) deal. Indeed, behind it we have two concepts:</p>
<ul>
<li>the encoder (according to RFC 1951)</li>
<li>the compressor</li>
</ul>
<p>For this new version of <code>decompress</code>, we decided to separate these concepts, with
one question leading the way: how can I plug in my own compression algorithm (instead of using
<a href="https://en.wikipedia.org/wiki/LZ77_and_LZ78">LZ77</a>)?</p>
<p>In fact, if you are interested in compression, several algorithms exist and, in
some contexts, it's preferable to use <a href="https://en.wikipedia.org/wiki/Lempel%E2%80%93Ziv%E2%80%93Markov_chain_algorithm">LZMA</a>, for example, or Rabin's
fingerprint (with <a href="https://github.com/mirage/duff">duff</a>), etc.</p>
<h4>Functor</h4>
<p>The first idea was to provide a <em>functor</em> which expects an implementation of the
compression algorithm. However, the indirection of a functor comes with a (big)
performance cost. Consider the following functor example:</p>
<pre><code><span class="ocaml-keyword-other">module</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">type</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">S</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">sig</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword">type</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">t</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword">val</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">add</span><span class="ocaml-source"> : t -&gt; t -&gt; t
</span><span class="ocaml-source">  </span><span class="ocaml-keyword">val</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">one</span><span class="ocaml-source"> : t
</span><span class="ocaml-keyword-other">end</span><span class="ocaml-source">
</span><span class="ocaml-source">
</span><span class="ocaml-keyword-other">module</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Make</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-constant-language-capital-identifier">S</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">S</span><span class="ocaml-source">)</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">struct</span><span class="ocaml-source"> </span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">succ</span><span class="ocaml-source"> </span><span class="ocaml-source">x</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">S</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">add</span><span class="ocaml-source"> </span><span class="ocaml-source">x</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">S</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">one</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">end</span><span class="ocaml-source">
</span><span class="ocaml-source">
</span><span class="ocaml-keyword-other">include</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Make</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-keyword-other">struct</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword-other">type</span><span class="ocaml-source"> </span><span class="ocaml-source">t</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-support-type">int</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">add</span><span class="ocaml-source"> </span><span class="ocaml-source">a</span><span class="ocaml-source"> </span><span class="ocaml-source">b</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-source">a</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">+</span><span class="ocaml-source"> </span><span class="ocaml-source">b</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">one</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-constant-numeric-decimal-integer">1</span><span class="ocaml-source">
</span><span class="ocaml-keyword-other">end</span><span class="ocaml-source">)</span><span class="ocaml-source">
</span><span class="ocaml-source">
</span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">f</span><span class="ocaml-source"> </span><span class="ocaml-source">x</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-source">succ</span><span class="ocaml-source"> </span><span class="ocaml-source">x</span><span class="ocaml-source">
</span></code></pre>
<p>Currently, with OCaml 4.07.1, the <code>f</code> function will be a <code>caml_apply2</code>. We might
wish for a simple <a href="https://en.wikipedia.org/wiki/Inline_expansion"><em>inlining</em></a> optimisation, allowing <code>f</code> to become an
<code>addq</code> instruction (indeed, <a href="https://caml.inria.fr/pub/docs/manual-ocaml/flambda.html"><code>flambda</code></a> does this), but optimizing
functors is hard. As we learned from <a href="https://github.com/chambart">Pierre Chambart</a>, it is possible
for the OCaml compiler to optimize functors directly, but this requires
respecting several constraints that are difficult to respect in practice.</p>
<h4>Split encoder and compressor</h4>
<p>So, we chose to build the encoder, which respects RFC 1951, and the
compressor separately, under some constraints. However, this is not what <a href="https://zlib.net/">zlib</a> does
and, as a result, we decided to provide a new design/API which does not follow,
in the first instance, zlib (or other implementations like
<a href="https://github.com/richgel999/miniz">miniz</a>).</p>
<p>To be fair, the choice made by zlib and miniz comes from the first
point about the API and the context where they are used. The main problem is the
shared queue between the encoder and the compressor. In C code, it can be hard
for the user to deal with it (they are liable for buffer overflows).</p>
<p>In OCaml and for <code>decompress</code>, the shared queue can be well abstracted and the API
can enforce its assumptions (like bounds checking).</p>
<p>Even if this design is much more complex than before, test coverage is better
because we can test the encoder and the compressor separately. It breaks down the
initial black box where compression was intrinsic to encoding – which was
error-prone. Indeed, <code>decompress</code> had a bug in the generation of
Huffman codes, but we never hit it because the (bad)
compressor was not able to produce the input (a specific length with a specific
distance) needed to trigger it.</p>
<p>NOTE: you have just read the main reason for the new version of <code>decompress</code>!</p>
<h4>The compressor</h4>
<p>The compressor is the easiest part. The goal is to produce, from an input
flow, an output flow which is another (more compact) representation. This
representation consists of:</p>
<ul>
<li>A <em>literal</em>, the byte as is</li>
<li>A <em>copy</em> code with an <em>offset</em> and a <em>length</em></li>
</ul>
<p>The latter says to copy <em>length</em> byte(s) from <em>offset</em>. For example, <code>aaaa</code> can
be compressed as <code>[ Literal 'a'; Copy (offset:1, len:3) ]</code>. This way, instead
of 4 bytes, we have only 2 elements, which will then be compressed by a
<a href="https://zlib.net/feldspar.html">Huffman coding</a>. This is the main idea of <a href="https://en.wikipedia.org/wiki/LZ77_and_LZ78">LZ77</a>
compression.</p>
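<p>To make the <em>literal</em>/<em>copy</em> idea concrete, here is a minimal sketch of the reverse operation: expanding a list of opcodes back into bytes. The <code>opcode</code> type and the <code>expand</code> function are hypothetical names used for illustration only; they are not part of the <code>decompress</code> API.</p>
<pre><code>(* Hypothetical type for illustration; not the decompress API. *)
type opcode = Literal of char | Copy of int * int (* offset, length *)

(* Expand a list of opcodes back into the original bytes. *)
let expand (ops : opcode list) : string =
  let buf = Buffer.create 16 in
  List.iter
    (function
      | Literal c -&gt; Buffer.add_char buf c
      | Copy (offset, length) -&gt;
          (* Copy byte by byte: the source may overlap the bytes being
             written, which is how "aaaa" comes out of a single 'a'. *)
          for _i = 1 to length do
            Buffer.add_char buf (Buffer.nth buf (Buffer.length buf - offset))
          done)
    ops;
  Buffer.contents buf

let () = assert (expand [ Literal 'a'; Copy (1, 3) ] = "aaaa")
</code></pre>
<p>Note that the copy is allowed to read bytes it has just written: an <em>offset</em> of 1 with a <em>length</em> of 3 repeats the last byte three times.</p>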
<p>However, the compressor needs to cooperate with the encoder. A simple interface,
<em>à la <a href="https://github.com/dbuenzli/uutf">uutf</a></em>, would be:</p>
<pre><code><span class="ocaml-keyword-other">val</span><span class="ocaml-source"> </span><span class="ocaml-source">compress</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source"> </span><span class="ocaml-source">state</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-source">[</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-polymorphic-variant">`Literal</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">of</span><span class="ocaml-source"> </span><span class="ocaml-support-type">char</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">|</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-polymorphic-variant">`Copy</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">of</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-support-type">int</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">*</span><span class="ocaml-source"> </span><span class="ocaml-support-type">int</span><span class="ocaml-source">)</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">|</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-polymorphic-variant">`End</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">|</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-polymorphic-variant">`Await</span><span class="ocaml-source"> </span><span class="ocaml-source">]</span><span class="ocaml-source">
</span></code></pre>
<p>But as I said, we need to feed a queue instead.</p>
<hr>
<p>At this point, the purpose of the queue is not clear and not really explained.
The signature above is still a valid and understandable design. We can
imagine passing <code>Literal</code> and <code>Copy</code> directly to the encoder. However, we should
(for performance reasons) use a delaying tactic between the compressor and the
deflator <code>[4]</code>.</p>
<p>The idea behind this is to be able to implement a <em>hot loop</em> in the encoder
which iterates over the shared queue and <em>transmits</em>/<em>encodes</em> contents
directly to the output buffer.</p>
<hr>
<p>So, when we make a new <code>state</code>, we let the user supply their queue:</p>
<pre><code>val state : src -&gt; w:bigstring -&gt; q:queue -&gt; state
val compress : state -&gt; [ `Flush | `Await | `End ]
</code></pre>
<p>The <code>Flush</code> case appears when the queue is full. We also find the <code>w</code> window
buffer again, which is needed to produce the <code>Copy</code> code. A <em>copy code</em> is limited
according to RFC 1951: <em>offset</em> cannot be greater than the length of the window
(commonly 32 KB). <em>length</em> is limited too, to <code>258</code> (an arbitrary choice).</p>
<p>Of course, for the <code>Await</code> case, the compressor comes with a <code>src</code> function, like
the inflator. Then, we added some accessors, <code>literals</code> and <code>distances</code>. The
compressor does not build the <a href="https://zlib.net/feldspar.html">Huffman coding</a>, which needs
frequencies, so we first need to keep counters for them inside the state, along with
a way to get them (and pass them to the encoder).</p>
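<p>As a rough sketch of what keeping such counters could look like, the snippet below counts how often each literal appears while opcodes are produced. All names (<code>opcode</code>, <code>counters</code>, <code>record</code>) are assumptions made for this example, and copies are collapsed into a single counter for brevity, whereas a real DEFLATE implementation tracks length and distance codes separately.</p>
<pre><code>(* Hypothetical opcode type and counters, for illustration only. *)
type opcode = Literal of char | Copy of int * int (* offset, length *)

type counters = {
  literals : int array; (* one counter per byte value *)
  mutable copies : int; (* simplification: one counter for all copies *)
}

let make_counters () = { literals = Array.make 256 0; copies = 0 }

(* Update the counters each time the compressor emits an opcode. *)
let record c = function
  | Literal chr -&gt;
      let i = Char.code chr in
      c.literals.(i) &lt;- c.literals.(i) + 1
  | Copy _ -&gt; c.copies &lt;- c.copies + 1
</code></pre>
<p>The encoder can then read these counters to build a Huffman coding that only covers the symbols actually seen.</p>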
<p><code>[4]</code>: On that topic, you may be interested in <a href="https://www.reddit.com/r/unix/comments/6gxduc/how_is_gnu_yes_so_fast/">why GNU yes is so
fast</a>, where the secret is just buffering.</p>
<h4>The encoder</h4>
<p>Finally, we can talk about the encoder which will take the shared queue filled
by the compressor and provide an RFC 1951 compliant output flow.</p>
<p>However, we need to consider a special <em>detail</em>. When we want to make a
DYNAMIC block from frequencies and then encode the input flow, we can reach a
case where the shared queue contains an <em>opcode</em> (a <em>literal</em> or a <em>copy</em>) which
does not appear in our dictionary.</p>
<p>In fact, if we want to encode <code>[ Literal 'a'; Literal 'b' ]</code>, we will not try to
make a dictionary which contains all 256 possible byte values; we
will only make a dictionary from frequencies, which contains only <code>'a'</code> and
<code>'b'</code>. This way, we can reach a case where the queue contains an <em>opcode</em>
(like <code>Literal 'c'</code>) which cannot be encoded by the <em>pre-determined</em>
Huffman coding – remember, the DYNAMIC block <strong>starts</strong> with
the dictionary.</p>
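<p>The situation can be sketched with a plain frequency table: a literal is only encodable if it was counted before the dictionary was built. The <code>encodable</code> helper below is hypothetical and only models the check, not the actual Huffman construction.</p>
<pre><code>(* Hypothetical helper: a literal can be encoded by a dictionary built from
   these frequencies only if it was counted at least once. *)
let encodable (freqs : int array) (c : char) = freqs.(Char.code c) &gt; 0

let () =
  let freqs = Array.make 256 0 in
  (* Frequencies observed for [ Literal 'a'; Literal 'b' ]. *)
  String.iter (fun c -&gt; freqs.(Char.code c) &lt;- freqs.(Char.code c) + 1) "ab";
  assert (encodable freqs 'a');
  assert (encodable freqs 'b');
  (* 'c' was never seen: the encoder would have to ask for a new block. *)
  assert (not (encodable freqs 'c'))
</code></pre>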
<p>Another point is about inputs. The encoder expects, of course, contents from
the shared queue, but it also needs the user to tell it how to encode them: which
block we want to emit. So it has two inputs:</p>
<ul>
<li>the shared queue</li>
<li>a <em>user-entry</em></li>
</ul>
<p>So for many real tests, we decided to provide this kind of API:</p>
<pre><code><span class="ocaml-keyword-other">type</span><span class="ocaml-source"> </span><span class="ocaml-source">dst</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-source">[</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-polymorphic-variant">`Channel</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">of</span><span class="ocaml-source"> </span><span class="ocaml-source">out_channel</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">|</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-polymorphic-variant">`Buffer</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">of</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Buffer</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">t</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">|</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-polymorphic-variant">`Manual</span><span class="ocaml-source"> </span><span class="ocaml-source">]</span><span class="ocaml-source">
</span><span class="ocaml-source">
</span><span class="ocaml-keyword-other">val</span><span class="ocaml-source"> </span><span class="ocaml-source">encoder</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source"> </span><span class="ocaml-source">dst</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-source">q</span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source">queue</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-source">encoder</span><span class="ocaml-source">
</span><span class="ocaml-keyword-other">val</span><span class="ocaml-source"> </span><span class="ocaml-source">encode</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source"> </span><span class="ocaml-source">encoder</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-source">[</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-polymorphic-variant">`Block</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">of</span><span class="ocaml-source"> </span><span class="ocaml-source">block</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">|</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-polymorphic-variant">`Flush</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">|</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-polymorphic-variant">`Await</span><span class="ocaml-source"> </span><span class="ocaml-source">]</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-source">[</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-polymorphic-variant">`Ok</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">|</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-polymorphic-variant">`Partial</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">|</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-polymorphic-variant">`Block</span><span class="ocaml-source"> </span><span class="ocaml-source">]</span><span class="ocaml-source">
</span><span class="ocaml-keyword-other">val</span><span class="ocaml-source"> </span><span class="ocaml-source">dst</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source"> </span><span class="ocaml-source">encoder</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-source">bigstring</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-source">off</span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-support-type">int</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-source">len</span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-support-type">int</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-support-type">unit</span><span class="ocaml-source">
</span></code></pre>
<p>As expected, we take the shared queue to make a new encoder. Then, we let the
user specify which kind of block they want to encode with the <code>Block</code>
operation.</p>
<p>The <code>Flush</code> operation tries to encode all elements present inside the shared
queue according to the current block and feeds the output buffer. The
encoder can then return one of these values:</p>
<ul>
<li><code>Ok</code>: the encoder encoded all <em>opcodes</em> from the shared queue.</li>
<li><code>Partial</code>: the output buffer is too small to encode all <em>opcodes</em>; the user
should flush it and give us a new empty buffer with <code>dst</code>. Then, they must
continue with the <code>Await</code> operation.</li>
<li><code>Block</code>: the encoder reached an <em>opcode</em> which cannot be encoded with the
current block (the current dictionary). Then, the user must continue with a new
<code>Block</code> operation.</li>
</ul>
<p>The hard part is the <em>ping-pong</em> game between the user and the encoder,
where a <code>Block</code> expects a <code>Block</code> response from the user and a <code>Partial</code> expects
an <code>Await</code> response. But this design reveals something deeper, about zlib
this time: the <em>flush</em> mode.</p>
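<p>The rules of this ping-pong game can be written down as a small function from the encoder's answer to the user's next operation. This is only a sketch: <code>next_operation</code>, <code>new_block</code> and the <code>block</code> type are hypothetical names, and the real block description is not shown in this article.</p>
<pre><code>(* Hypothetical block type; the real one is not shown in this article. *)
type block = Fixed | Dynamic

(* Given the encoder's answer, pick the next operation to send, or stop. *)
let next_operation ~new_block = function
  | `Ok -&gt; None (* everything in the queue was encoded *)
  | `Partial -&gt; Some `Await (* after handing over a fresh buffer with dst *)
  | `Block -&gt; Some (`Block (new_block ())) (* the user picks the next block *)
</code></pre>
<p>The <code>new_block</code> callback is where the user's own policy lives: it decides which block to start when the current dictionary is not enough.</p>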
<h4>The <em>flush</em> mode</h4>
<p>Firstly, we talk about <em>mode</em> because zlib does not allow the user to
decide what they want to do when we reach a <code>Block</code> or an <code>Ok</code> case. So, it
defines some <a href="https://www.bolet.org/~pornin/deflate-flush.html">under-specified <em>modes</em></a> to apply a policy of what
to do in this case.</p>
<p>In <code>decompress</code>, we followed the same design and saw that it may not be a good
idea: the logic is not very clear and the user may want a different
behavior. It was like a <em>black-box</em> with some <em>black-magic</em>.</p>
<p>Because we decided to split the encoder and the compressor, the idea of the <em>flush mode</em>
no longer exists: the user explicitly needs to tell the encoder
what they want (make a new block? which block? keep frequencies?). So we broke
the <em>black-box</em>. But, as we said, this was possible mostly because we can safely
abstract the shared queue between the compressor and the encoder.</p>
<p>OCaml is an expressive language and we can really talk about a queue where, in
C, it would be just another <em>array</em>. As we said, the deal is about performance,
but now we give the user the possibility to write their own code for this corner case,
which is when they reach <code>Block</code>. The behavior depends only on them.</p>
<h2>APIs in general</h2>
<p>The biggest challenge of building a library is defining the API - you must
strike a compromise between allowing the user the flexibility to express their
use-case and constraining the user to avoid API misuse. If at all possible,
provide an <em>intuitive</em> API: force the user not to need to think about security
issues, memory consumption or performance.</p>
<p>Avoid making your API so expressive that it becomes unusable, but beware that
this sets hard limits on your users: the current <code>decompress</code> API can be used to
build <code>of_string</code> / <code>to_string</code> functions, but the opposite is not true - you
definitely cannot build a stream API from <code>of_string</code> / <code>to_string</code>.</p>
<p>The best advice when designing a library is to keep in mind what you <strong>really</strong>
want and let the other details fall into place gradually. It is very important
to work in an iterative loop of repeatedly trying to use your library; only this
can highlight bad design, corner-cases and details.</p>
<p>Finally, use and re-use it in your tests (important!) and inside higher-level
projects, which will raise interesting questions about your design. The previous version
of <code>decompress</code> was not used in <a href="https://github.com/mirage/ocaml-git/">ocaml-git</a> mostly because the flush
mode was unclear.</p>
]]></description><link>https://tarides.com/blog/2019-08-26-decompress-the-new-decompress-api</link><guid isPermaLink="false">https://tarides.com/blog/2019-08-26-decompress-the-new-decompress-api.html</guid><dc:creator><![CDATA[ Romain Calascibetta ]]></dc:creator><pubDate>Mon, 26 Aug 2019 00:00:00 GMT</pubDate></item><item><title><![CDATA[i-Lab 2019]]></title><description><![CDATA[<p>We are thrilled to announce that Tarides is a laureate of the 21st
edition of the <a href="https://www.enseignementsup-recherche.gouv.fr/cid5745/le-concours-i-lab-2019-un-tremplin-pour-les-entrepreneurs-de-la-deep-tech.html">i-Lab innovation contest</a>
for its innovative technological solution: <strong>OSMOSE</strong>.</p>
<p>Organized by the French Ministry of Higher Education, Research and
Innovation in partnership with Bpifrance, the objective of this
competition is to identify and support innovative technology-based
projects. This year, over 700 applications were registered and 75
projects were rewarded. The jury was composed of fifty experts (business
leaders, researchers, former laureates, start-uppers, engineers,
consultants, investors) and was chaired by Ludovic Le Moan, CEO of
Sigfox.</p>
<p>The OSMOSE solution is a software infrastructure platform to deploy
secure and distributed IoT applications, using low-resource
constraints and providing low-latency performance. This platform is
built upon innovative and open-source projects (in particular MirageOS
and Irmin) which were started at the University of Cambridge, over 10
years ago, where the founders of Tarides met. Tarides uses unikernel
technologies and applies the research done in programming languages to
real-world systems to build safe and performant applications
specialized to their runtime environment.</p>
<p>If you are interested in the project, <a href="/contact/">contact us</a>.</p>
<p>Or <a href="https://anil.recoil.org/papers/2018-hotpost-osmose/">check our position paper</a>
to learn more about it.</p>
]]></description><link>https://tarides.com/blog/2019-07-05-i-lab-2019</link><guid isPermaLink="false">https://tarides.com/blog/2019-07-05-i-lab-2019.html</guid><dc:creator><![CDATA[ Céline Laplassotte ]]></dc:creator><pubDate>Fri, 05 Jul 2019 00:00:00 GMT</pubDate></item><item><title><![CDATA[Release of OCamlFormat 0.10]]></title><description><![CDATA[<p>We are pleased to announce the release of OCamlFormat 0.10 (available on opam).</p>
<p>There have been numerous changes since the last release, so here is a comprehensive list of the new features and breaking changes to help the transition from OCamlFormat 0.9.</p>
<p><code>ocamlformat-0.10</code> now works on the 4.08 AST, although the formatting should not differ greatly from the one of <code>ocamlformat-0.9</code> in this regard.
Please note that it is necessary to build <code>ocamlformat</code> with 4.08 to be able to parse new features like <code>let*</code>.</p>
<p>Upgrading from <code>ocamlformat-0.9</code> requires installing the following dependencies:</p>
<ul>
<li>ocaml-migrate-parsetree &gt;= 1.3.1 (upgrade)</li>
<li>uuseg &gt;= 10.0.0 (new)</li>
<li>uutf &gt;= 1.0.1 (upgrade)</li>
</ul>
<p>This release focuses on preserving the style of the original source and on handling more <code>ocp-indent</code> options.</p>
<h2>Style preservation</h2>
<h3>Expression grouping</h3>
<p>The new option <code>exp-grouping</code> has been added to preserve the keywords <code>begin</code>/<code>end</code> that are used to delimit expressions instead of parentheses. <code>exp-grouping=parens</code> always uses parentheses to delimit expressions. <code>exp-grouping=preserve</code> preserves the original grouping syntax (parentheses or <code>begin</code>/<code>end</code>).</p>
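<p>As an illustration, the two definitions below are equivalent and differ only in how the sub-expression is grouped. With <code>exp-grouping=preserve</code>, the first form is expected to keep its <code>begin</code>/<code>end</code>, whereas <code>exp-grouping=parens</code> is expected to produce the second form. The function names are made up for this example, and the exact output also depends on the other formatting options.</p>
<pre><code>(* Grouping with begin/end ... *)
let with_begin_end x = 2 * begin x + 1 end

(* ... and the same expression grouped with parentheses. *)
let with_parens x = 2 * (x + 1)

(* Both forms mean exactly the same thing. *)
let () = assert (with_begin_end 3 = with_parens 3)
</code></pre>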
<h3>Horizontal alignment</h3>
<p>Horizontal alignment is something that users often use to make pattern-matching or type declarations easier to read, and it is a feature that has been requested many times. Three new options have been added to horizontally align the lines.</p>
<p><code>align-cases</code> horizontally aligns the match/try cases:</p>
<pre><code><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">fooooooooooo</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword-other">match</span><span class="ocaml-source"> </span><span class="ocaml-source">foooooooooooooooooooooooo</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">with</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword-other">|</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Bfooooooooooooooooo</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-source">foooooooooooo</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword-other">|</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">C</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-source">a</span><span class="ocaml-keyword-other-ocaml punctuation-comma punctuation-separator">,</span><span class="ocaml-source"> </span><span class="ocaml-source">b</span><span class="ocaml-keyword-other-ocaml punctuation-comma punctuation-separator">,</span><span class="ocaml-source"> </span><span class="ocaml-source">c</span><span class="ocaml-keyword-other-ocaml punctuation-comma punctuation-separator">,</span><span class="ocaml-source"> </span><span class="ocaml-source">d</span><span class="ocaml-source">)</span><span class="ocaml-source">      </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-source">fooooooooooooooooooo</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword-other">|</span><span class="ocaml-source"> </span><span class="ocaml-constant-language">_</span><span class="ocaml-source">                   </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-source">fooooooooooooooooooo</span><span class="ocaml-source">
</span></code></pre>
<p><code>align-constructors-decl</code> horizontally aligns type declarations:</p>
<pre><code><span class="ocaml-keyword-other">type</span><span class="ocaml-source"> </span><span class="ocaml-source">t</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword-other">|</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source"> </span><span class="ocaml-source">)</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">of</span><span class="ocaml-source"> </span><span class="ocaml-source">a</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">*</span><span class="ocaml-source"> </span><span class="ocaml-source">b</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword-other">|</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-list">[]</span><span class="ocaml-source">     </span><span class="ocaml-keyword-other">of</span><span class="ocaml-source"> </span><span class="ocaml-source">looooooooooooooooooooooooooooooooooooooong_break</span><span class="ocaml-source">
</span></code></pre>
<p><code>align-variants-decl</code> horizontally aligns variant type declarations:</p>
<pre><code><span class="ocaml-keyword-other">type</span><span class="ocaml-source"> </span><span class="ocaml-source">x</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-source">[</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-polymorphic-variant">`Foooooooo</span><span class="ocaml-source">      </span><span class="ocaml-keyword-other">of</span><span class="ocaml-source"> </span><span class="ocaml-support-type">int</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword-other">|</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-polymorphic-variant">`Fooooooooooooo</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">of</span><span class="ocaml-source"> </span><span class="ocaml-support-type">int</span><span class="ocaml-source"> </span><span class="ocaml-source">]</span><span class="ocaml-source">
</span></code></pre>
<h3>Preserve blank lines in sequences</h3>
<p>The new option <code>sequence-blank-line</code> decides whether a blank line is preserved between expressions of a sequence. <code>sequence-blank-line=compact</code> will not keep any blank line between expressions of a sequence; this is still the default behavior. <code>sequence-blank-line=preserve</code> will keep a blank line between two expressions of a sequence if the input contains at least one.</p>
<p>This option can help preserve the readability of the code in this situation:</p>
<pre><code><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">foo</span><span class="ocaml-source"> </span><span class="ocaml-source">x</span><span class="ocaml-source"> </span><span class="ocaml-source">y</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-source">do_some_setup</span><span class="ocaml-source"> </span><span class="ocaml-source">y</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source">
</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-source">important_function</span><span class="ocaml-source"> </span><span class="ocaml-source">x</span><span class="ocaml-source">
</span></code></pre>
<h2>Supporting more <code>ocp-indent</code> options</h2>
<p>The long-term goal of <code>ocamlformat</code> is to handle every <code>ocp-indent</code> option. This release moves closer to that goal: the following <code>ocp-indent</code> options are now supported by <code>ocamlformat</code>:</p>
<ul>
<li>max_indent</li>
<li>with</li>
<li>strict_with</li>
<li>ppx_stritem_ext</li>
<li>base</li>
<li>in</li>
<li>type</li>
</ul>
<h3>Offset added to a new line</h3>
<p>The new option <code>max-indent</code> sets the maximum offset (number of columns) added to a new line relative to the offset of the previous line. If this offset is set to 2 columns, each new line can be indented by at most 2 columns more than the previous line, for example:</p>
<pre><code><span class="ocaml-keyword-other">let</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-unit">()</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-source">fooooo</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword-operator">|&gt;</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">List</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">iter</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-keyword-other">fun</span><span class="ocaml-source"> </span><span class="ocaml-source">x</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">x</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-source">x</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">$</span><span class="ocaml-source"> </span><span class="ocaml-source">y</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">in</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-source">fooooooooooo</span><span class="ocaml-source"> </span><span class="ocaml-source">x</span><span class="ocaml-source">)</span><span class="ocaml-source">
</span></code></pre>
<p>This option is equivalent to the <code>max_indent</code> option of <code>ocp-indent</code>, and it will be set if <code>max_indent</code> is set in an <code>.ocp-indent</code> configuration file.</p>
<h3>Indentation of pattern matching cases</h3>
<p>The new options <code>function-indent</code> and <code>match-indent</code> respectively decide the indentation of function cases and the indentation of match/try cases.
These options are equivalent to the <code>with</code> option of <code>ocp-indent</code>, and they will be set if <code>with</code> is set in an <code>.ocp-indent</code> configuration file.
If the indentation is set to 4 columns, cases are formatted like this:</p>
<pre><code><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">foooooooo</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">function</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-keyword-other">|</span><span class="ocaml-source"> </span><span class="ocaml-source">fooooooooooooooooooooooo</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-source">foooooooooooooooooooooooooo</span><span class="ocaml-source">
</span><span class="ocaml-source">
</span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">foooooooo</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword-other">match</span><span class="ocaml-source"> </span><span class="ocaml-source">fooooooooooooooooooooooo</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">with</span><span class="ocaml-source">
</span><span class="ocaml-source">      </span><span class="ocaml-keyword-other">|</span><span class="ocaml-source"> </span><span class="ocaml-source">fooooooooooooooooooooooo</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-source">foooooooooooooooooooooooooo</span><span class="ocaml-source">
</span></code></pre>
<p>The new options <code>function-indent-nested</code> and <code>match-indent-nested</code> respectively decide whether the <code>function-indent</code> and the <code>match-indent</code> parameters should be applied even when in a sub-block. If these options are set to <code>never</code>, it only applies <code>function-indent</code> or <code>match-indent</code> if the function or match block starts a line. If these options are set to <code>always</code>, then the indent parameters are always applied. The <code>auto</code> value applies the indentation parameter when seen fit.</p>
<p>These options are equivalent to the <code>strict_with</code> option of <code>ocp-indent</code>, and they will be set if <code>strict_with</code> is set in an <code>.ocp-indent</code> configuration file.</p>
<h3>Indentation inside extension nodes</h3>
<p>The new option <code>extension-indent</code> sets the indentation of items (that are not at structure level) inside extension nodes.
The new option <code>stritem-extension-indent</code> sets the indentation of structure items inside extension nodes. This option is equivalent to the <code>ppx_stritem_ext</code> option of <code>ocp-indent</code>, and it will be set if <code>ppx_stritem_ext</code> is set in an <code>.ocp-indent</code> configuration file.</p>
<p>For example, if <code>extension-indent</code> is set to 5 and <code>stritem-extension-indent</code> is set to 3:</p>
<pre><code><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">foo</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-source">[</span><span class="ocaml-keyword-operator-extension">%</span><span class="ocaml-keyword-other-extension">foooooooooo</span><span class="ocaml-source">
</span><span class="ocaml-source">       </span><span class="ocaml-source">fooooooooooooooooooooooooooo</span><span class="ocaml-source"> </span><span class="ocaml-source">foooooooooooooooooooooooooooooooooo</span><span class="ocaml-source">
</span><span class="ocaml-source">         </span><span class="ocaml-source">foooooooooooooooooooooooooooo</span><span class="ocaml-source">]</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-source">[</span><span class="ocaml-keyword-operator-attribute">@@</span><span class="ocaml-keyword-other-attribute">foooooooooo</span><span class="ocaml-source">
</span><span class="ocaml-source">       </span><span class="ocaml-source">fooooooooooooooooooooooooooo</span><span class="ocaml-source"> </span><span class="ocaml-source">foooooooooooooooooooooooooooooooooo</span><span class="ocaml-source">
</span><span class="ocaml-source">         </span><span class="ocaml-source">foooooooooooooooooooooooooooo</span><span class="ocaml-source">]</span><span class="ocaml-source">
</span><span class="ocaml-source">
</span><span class="ocaml-source">[</span><span class="ocaml-keyword-operator-attribute">@@@</span><span class="ocaml-keyword-other-attribute">foooooooooo</span><span class="ocaml-source">
</span><span class="ocaml-source">   </span><span class="ocaml-source">fooooooooooooooooooooooooooo</span><span class="ocaml-source"> </span><span class="ocaml-source">foooooooooooooooooooooooooooooooooo</span><span class="ocaml-source">
</span><span class="ocaml-source">     </span><span class="ocaml-source">foooooooooooooooooooooooooooo</span><span class="ocaml-source">]</span><span class="ocaml-source">
</span></code></pre>
<h3>Let-binding indentation</h3>
<p>The new option <code>let-binding-indent</code> sets the indentation of let binding expressions if they do not fit on a single line. This option is equivalent to the <code>base</code> option of <code>ocp-indent</code>.
The new option <code>indent-after-in</code> sets the indentation after <code>let ... in</code>, unless followed by another <code>let</code>. This option is equivalent to the <code>in</code> option of <code>ocp-indent</code>.
The new option <code>type-decl-indent</code> sets the indentation of type declarations if they do not fit on a single line. This option is equivalent to the <code>type</code> option of <code>ocp-indent</code>.</p>
<p>These options will be set if their <code>ocp-indent</code> counterparts are set in an <code>.ocp-indent</code> configuration file.</p>
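<p>As a rough illustration of the correspondence described in this section, here is a hypothetical <code>.ocp-indent</code> file followed by the <code>.ocamlformat</code> settings it would map to. The values are only examples, and the mapping simply restates the equivalences listed above; <code>ocamlformat</code> performs the actual translation itself when it reads the <code>.ocp-indent</code> file.</p>
<pre><code># .ocp-indent (example values, not a recommendation)
base = 2
type = 2
in = 0
with = 0
strict_with = never
ppx_stritem_ext = 2
max_indent = 4
</code></pre>
<pre><code># .ocamlformat settings corresponding to the file above
let-binding-indent=2
type-decl-indent=2
indent-after-in=0
function-indent=0
match-indent=0
function-indent-nested=never
match-indent-nested=never
stritem-extension-indent=2
max-indent=4
</code></pre>
<p>Writing the options directly in <code>.ocamlformat</code> may be preferable if you no longer use <code>ocp-indent</code>, since it keeps all formatting settings in one file.</p>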
<h2>Miscellaneous features</h2>
<p>This release also brings new options, adds new values to existing options, and corrects erroneous behaviours.</p>
<h3>Indicate multiline delimiters</h3>
<p>The former <code>indicate-multiline-delimiters</code> boolean option is now a 3-valued option:</p>
<ul>
<li><code>indicate-multiline-delimiters=space</code> (was equivalent to <code>true</code>) prints a space inside the delimiter to indicate the matching one is on a different line.</li>
<li><code>indicate-multiline-delimiters=no</code> (was equivalent to <code>false</code>) doesn't do anything special to indicate the closing delimiter.</li>
<li><code>indicate-multiline-delimiters=closing-on-separate-line</code> is the new value for this option; it ensures that the closing delimiter is on its own line.</li>
</ul>
<p>In this example, the closing parentheses delimiting the nested pattern matches are on their own lines and aligned with the matching opening parentheses:</p>
<pre><code><span class="ocaml-keyword-other">let</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-unit">()</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source">
</span><span class="ocaml-source">   </span><span class="ocaml-keyword-other">match</span><span class="ocaml-source"> </span><span class="ocaml-source">v</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">with</span><span class="ocaml-source">
</span><span class="ocaml-source">   </span><span class="ocaml-keyword-other">|</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">None</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">None</span><span class="ocaml-source">
</span><span class="ocaml-source">   </span><span class="ocaml-keyword-other">|</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Some</span><span class="ocaml-source"> </span><span class="ocaml-source">x</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source">
</span><span class="ocaml-source">       </span><span class="ocaml-source">(</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">match</span><span class="ocaml-source"> </span><span class="ocaml-source">x</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">with</span><span class="ocaml-source">
</span><span class="ocaml-source">       </span><span class="ocaml-keyword-other">|</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">None</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">None</span><span class="ocaml-source">
</span><span class="ocaml-source">       </span><span class="ocaml-keyword-other">|</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Some</span><span class="ocaml-source"> </span><span class="ocaml-source">x</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source">
</span><span class="ocaml-source">           </span><span class="ocaml-source">(</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">match</span><span class="ocaml-source"> </span><span class="ocaml-source">x</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">with</span><span class="ocaml-source">
</span><span class="ocaml-source">           </span><span class="ocaml-keyword-other">|</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">None</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">None</span><span class="ocaml-source">
</span><span class="ocaml-source">           </span><span class="ocaml-keyword-other">|</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Some</span><span class="ocaml-source"> </span><span class="ocaml-source">x</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-source">x</span><span class="ocaml-source">
</span><span class="ocaml-source">           </span><span class="ocaml-source">)</span><span class="ocaml-source">
</span><span class="ocaml-source">       </span><span class="ocaml-source">)</span><span class="ocaml-source">
</span></code></pre>
<h3>Formatting of literal strings</h3>
<p><code>break-string-literals=newlines</code> now takes into account pretty-printing commands like <code>@,</code>, <code>@;</code> and <code>@\n</code> to produce more readable strings. A new value for this option has been added, <code>break-string-literals=newlines-and-wrap</code>, which breaks lines at newline delimiters (including pretty-printing commands) and also wraps string literals at the margin.</p>
<p>Here is how <code>break-string-literals=newlines-and-wrap</code> formats a string:</p>
<pre><code><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">fooooooooooo</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-string-quoted-double">Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod </span><span class="ocaml-constant-character-escape">\</span><span class="ocaml-string-quoted-double">
</span><span class="ocaml-string-quoted-double">   tempor incididunt ut labore et dolore magna aliqua.@;</span><span class="ocaml-constant-character-escape">\</span><span class="ocaml-string-quoted-double">
</span><span class="ocaml-string-quoted-double">   Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi </span><span class="ocaml-constant-character-escape">\</span><span class="ocaml-string-quoted-double">
</span><span class="ocaml-string-quoted-double">   ut aliquip ex ea commodo consequat.@;</span><span class="ocaml-constant-character-escape">\</span><span class="ocaml-string-quoted-double">
</span><span class="ocaml-string-quoted-double">   Duis aute irure dolor in reprehenderit in voluptate velit esse cillum </span><span class="ocaml-constant-character-escape">\</span><span class="ocaml-string-quoted-double">
</span><span class="ocaml-string-quoted-double">   dolore eu fugiat nulla pariatur.@;</span><span class="ocaml-constant-character-escape">\</span><span class="ocaml-string-quoted-double">
</span><span class="ocaml-string-quoted-double">   Excepteur sint occaecat cupidatat non proident, sunt in culpa qui </span><span class="ocaml-constant-character-escape">\</span><span class="ocaml-string-quoted-double">
</span><span class="ocaml-string-quoted-double">   officia deserunt mollit anim id est laborum.</span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-source">
</span></code></pre>
<p><strong>Warning:</strong> the <code>break-string-literals</code> option will likely be removed in the next release, and the default behavior will then be <code>newlines-and-wrap</code>.</p>
<h3>Break before the <code>in</code> keyword</h3>
<p>The new option <code>break-before-in</code> decides whether the line should break before the <code>in</code> keyword of a <code>let</code> binding. <code>break-before-in=fit-or-vertical</code> always breaks the line before the <code>in</code> keyword if the whole <code>let</code> binding does not fit on a single line; this is still the default behavior. <code>break-before-in=auto</code> only breaks the line if the <code>in</code> keyword does not fit on the previous line.</p>
<p>For example:</p>
<pre><code><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">_</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">short</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-source">this</span><span class="ocaml-source"> </span><span class="ocaml-source">is</span><span class="ocaml-source"> </span><span class="ocaml-source">short</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">in</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">fooo</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-source">(</span><span class="ocaml-source">this</span><span class="ocaml-source"> </span><span class="ocaml-source">is</span><span class="ocaml-source"> </span><span class="ocaml-source">very</span><span class="ocaml-source"> </span><span class="ocaml-source">long</span><span class="ocaml-source">)</span><span class="ocaml-source"> </span><span class="ocaml-source">but</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-source">the</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">in</span><span class="ocaml-source"> </span><span class="ocaml-source">keyword</span><span class="ocaml-source"> </span><span class="ocaml-source">can</span><span class="ocaml-source"> </span><span class="ocaml-source">fit</span><span class="ocaml-source">)</span><span class="ocaml-source"> </span><span class="ocaml-source">on</span><span class="ocaml-source"> </span><span class="ocaml-source">the</span><span class="ocaml-source"> </span><span class="ocaml-source">same</span><span class="ocaml-source"> </span><span class="ocaml-source">line</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">in</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-source">foooooo</span><span class="ocaml-source">
</span></code></pre>
<h3>Indentation of nested pattern-matching</h3>
<p>The new option <code>nested-match</code> defines the style of pattern matches nested in the last case of another pattern match. <code>nested-match=wrap</code> wraps the nested pattern match in parentheses and adds indentation; this is still the default behavior. <code>nested-match=align</code> vertically aligns the nested pattern match under the enclosing one, for example:</p>
<pre><code><span class="ocaml-keyword-other">let</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-unit">()</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword-other">match</span><span class="ocaml-source"> </span><span class="ocaml-source">v</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">with</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword-other">|</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">None</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">None</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword-other">|</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Some</span><span class="ocaml-source"> </span><span class="ocaml-source">x</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword-other">match</span><span class="ocaml-source"> </span><span class="ocaml-source">x</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">with</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword-other">|</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">None</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">None</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword-other">|</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Some</span><span class="ocaml-source"> </span><span class="ocaml-source">x</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-source">x</span><span class="ocaml-source">
</span></code></pre>
<p>The new option <code>cases-matching-exp-indent</code> decides the indentation of cases right-hand sides which are <code>match</code> or <code>try</code> expressions. <code>cases-matching-exp-indent=compact</code> forces an indentation of 2, unless <code>nested-match</code> is set to <code>align</code> and this is the last case of the pattern matching. <code>compact</code> is the default behavior. <code>cases-matching-exp-indent=normal</code> indents as it would any other expression.</p>
<h3>Whitelist of files to format</h3>
<p>A new kind of configuration file is now handled by <code>ocamlformat</code>: <code>.ocamlformat-enable</code> files.
If the <code>disable</code> option is set, an <code>.ocamlformat-enable</code> file can list the files that <code>ocamlformat</code> should format anyway. Each line in an <code>.ocamlformat-enable</code> file specifies a filename relative to the directory containing the <code>.ocamlformat-enable</code> file.</p>
<p>The <code>.ocamlformat-enable</code> files use the same syntax as the <code>.ocamlformat-ignore</code> files: lines starting with <code>#</code> are ignored and can be used as comments.</p>
<p>These new configuration files do not contradict the existing <code>.ocamlformat-ignore</code> files, as <code>.ocamlformat-enable</code> are only considered when <code>disable</code> is set, and <code>.ocamlformat-ignore</code> are only considered when <code>disable</code> is not set.</p>
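<p>A minimal sketch of how the two files could work together is shown below. The file names <code>src/main.ml</code> and <code>src/parser.mli</code> are hypothetical, and writing the option as <code>disable=true</code> is an assumption about the configuration syntax; check <code>ocamlformat --help</code> for the exact form accepted by your version.</p>
<pre><code># .ocamlformat: formatting is turned off for the whole project
disable=true
</code></pre>
<pre><code># .ocamlformat-enable: files that are formatted even though disable is set
# (paths are relative to the directory containing this file)
src/main.ml
src/parser.mli
</code></pre>
<p>This arrangement can be useful when adopting <code>ocamlformat</code> gradually: files are added to the list one at a time until the <code>disable</code> option can be dropped.</p>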
<h3>Disable outside detected project</h3>
<p>The <code>disable-outside-detected-project</code> option is now set by default.</p>
<p>When the option <code>--enable-outside-detected-project</code> is not set, <code>.ocamlformat</code> files outside of the project (including the one in <code>XDG_CONFIG_HOME</code>) are not read. The project root of an input file is taken to be the nearest ancestor directory that contains a .git or .hg or dune-project file. If no config file is found, formatting is disabled.</p>
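<p>The project-root lookup described above can be sketched in a few lines of OCaml. This is only an illustration of the rule as stated here, not the code used by <code>ocamlformat</code>; the names <code>root_markers</code> and <code>find_project_root</code> and the optional <code>exists</code> parameter are made up for the example. It assumes an absolute path is passed in, since a relative path is only searched up to the current directory.</p>
<pre><code>(* Hypothetical sketch of project-root detection; not ocamlformat's code. *)

(* Entries whose presence marks a directory as a project root. *)
let root_markers = [ ".git"; ".hg"; "dune-project" ]

(* [exists] can be replaced so the search can be tried without touching
   the file system; it defaults to [Sys.file_exists]. *)
let rec find_project_root ?(exists = Sys.file_exists) dir =
  if List.exists (fun m -&gt; exists (Filename.concat dir m)) root_markers then
    Some dir
  else
    let parent = Filename.dirname dir in
    (* At the top of the tree, [Filename.dirname] returns its argument. *)
    if String.equal parent dir then None
    else find_project_root ~exists parent
</code></pre>
<p>The function returns the nearest ancestor containing one of the markers, or <code>None</code> when no ancestor qualifies, which corresponds to the case where files outside a detected project are not formatted.</p>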
<h3>Space around collection-expressions</h3>
<p>The former option <code>space-around-collection-expressions</code>, which decided whether a space should be added inside the delimiters of collection expressions (lists, arrays, records, variants), has been replaced by 4 new options: <code>space-around-arrays</code>, <code>space-around-lists</code>, <code>space-around-records</code> and <code>space-around-variants</code>, to allow finer-grained customization.</p>
<h3>Fit-or-vertical mode for pattern matching</h3>
<p>The <code>break-cases</code> option, which decides the shape of pattern matching, has a new value <code>fit-or-vertical</code>. <code>break-cases=fit-or-vertical</code> tries to fit all or-patterns on the same line, and otherwise breaks before each or-pattern (they are wrapped in other modes).
For example, if this set of or-patterns does not fit on a single line, we get the following output:</p>
<pre><code><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">ffffff</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword-other">match</span><span class="ocaml-source"> </span><span class="ocaml-source">foooooooooooo</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">with</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword-other">|</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Aaaaaaaaaaaaaaaaa</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword-other">|</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Bbbbbbbbbbbbbbbbb</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword-other">|</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Ccccccccccccccccc</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword-other">|</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Ddddddddddddddddd</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword-other">|</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Eeeeeeeeeeeeeeeee</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-source">foooooooooooooooooooo</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword-other">|</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Fffffffffffffffff</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-source">fooooooooooooooooo</span><span class="ocaml-source">
</span></code></pre>
<h3>K&amp;R style for if-then-else</h3>
<p>The <code>if-then-else</code> option now has a new value <code>k-r</code> that uses parentheses (when necessary) to reproduce a formatting close to the K&amp;R style. For example:</p>
<pre><code><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">_</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword-other">if</span><span class="ocaml-source"> </span><span class="ocaml-source">b</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">then</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-source">something</span><span class="ocaml-source"> </span><span class="ocaml-source">loooooooooooooooooooooooooooooooong</span><span class="ocaml-source"> </span><span class="ocaml-source">enough</span><span class="ocaml-source"> </span><span class="ocaml-source">to_trigger</span><span class="ocaml-source"> </span><span class="ocaml-source">a</span><span class="ocaml-source"> </span><span class="ocaml-source">break</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-source">this</span><span class="ocaml-source"> </span><span class="ocaml-source">is</span><span class="ocaml-source"> </span><span class="ocaml-source">more</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-source">)</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">else</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">if</span><span class="ocaml-source"> </span><span class="ocaml-source">b1</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">then</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-source">something</span><span class="ocaml-source"> </span><span class="ocaml-source">loooooooooooooooooooooooooooooooong</span><span class="ocaml-source"> </span><span class="ocaml-source">enough</span><span class="ocaml-source"> </span><span class="ocaml-source">to_trigger</span><span class="ocaml-source"> </span><span class="ocaml-source">a</span><span class="ocaml-source"> </span><span class="ocaml-source">break</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-source">this</span><span class="ocaml-source"> </span><span class="ocaml-source">is</span><span class="ocaml-source"> </span><span class="ocaml-source">more</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-source">)</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">else</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-source">e</span><span class="ocaml-source">
</span></code></pre>
<h2>Breaking changes</h2>
<ul>
<li>the <code>indicate-multiline-delimiters</code> option is no longer a boolean option but now has 3 values: <code>space</code>, <code>no</code> and <code>closing-on-separate-line</code> that are detailed in this patch note.</li>
<li>the <code>disable-outside-detected-project</code> option is now set by default.</li>
<li>the <code>default</code> preset profile has been removed (it was equivalent to the <code>ocamlformat</code> profile with <code>break-cases=fit</code>).</li>
<li>the <code>space-around-collection-expressions</code> option has been replaced by 4 new options: <code>space-around-arrays</code>, <code>space-around-lists</code>, <code>space-around-records</code> and <code>space-around-variants</code>.</li>
</ul>
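<p>As a rough sketch, a project's <code>.ocamlformat</code> file updated for these changes might look like the following. The values are illustrative choices only, not recommendations:</p>
<pre><code># .ocamlformat (illustrative values only)
# Formerly a boolean; now one of: space, no, closing-on-separate-line
indicate-multiline-delimiters = closing-on-separate-line
# These four options replace space-around-collection-expressions
space-around-arrays = true
space-around-lists = true
space-around-records = true
space-around-variants = true
</code></pre>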
<h2>What's next?</h2>
<p>We strongly encourage our users to try out the <code>conventional</code> preset profile, as we plan to make it the default profile in a future release. This profile's purpose is to reproduce the most commonly encountered styles, and it may be more pleasing to the eye than the current default options.</p>
<p>As stated previously, the <code>break-string-literals</code> option will likely be removed in the next release, and the default behavior will then be <code>newlines-and-wrap</code>.</p>
<h2>Credits</h2>
<p>This release also contains many other changes and bug fixes that we cannot detail here.</p>
<p>We would like to thank our maintainers and contributors for this release: Jules Aguillon, Josh Berdine, Hugo Heuzard, Guillaume Petiot and Thomas Refis, and especially our industrial users Jane Street, Ahrefs and Nomadic Labs that made this work possible by funding this project and providing helpful contributions and feedback.</p>
<p>We would be happy to provide support for more customers; please <a href="/contact/">contact us</a>.</p>
<p>If you wish to get involved with OCamlFormat development or file an issue, please read the <a href="https://github.com/ocaml-ppx/ocamlformat/blob/master/CONTRIBUTING.md">contributing guide</a>. Any contribution is welcome.</p>
]]></description><link>https://tarides.com/blog/2019-06-27-release-of-ocamlformat-0-10</link><guid isPermaLink="false">https://tarides.com/blog/2019-06-27-release-of-ocamlformat-0-10.html</guid><dc:creator><![CDATA[ Guillaume Petiot ]]></dc:creator><pubDate>Thu, 27 Jun 2019 00:00:00 GMT</pubDate></item><item><title><![CDATA[On the road to Irmin v2]]></title><description><![CDATA[<p>Over the past few months, we have been heavily engaged in release
engineering the <a href="https://github.com/mirage/irmin/issues/658">Irmin 2.0 release</a>,
which covers multiple years of work on all of its constituent
elements. We first began Irmin in late 2013 to act as a
<a href="https://mirage.io/blog/introducing-irmin">Git-like distributed and branchable storage substrate</a>
that would let us escape the <a href="https://www.cl.cam.ac.uk/~pes20/SOSP15-paper102-submitted.pdf">perils of POSIX filesystems</a>.</p>
<p>The Irmin libraries provide snapshotting, branching and merging
operations over storage and can communicate via Git both on-disk and
remotely. Irmin today therefore consists of many discrete OCaml
libraries that compose together to form a set of <a href="https://blog.acolyer.org/2015/01/14/mergeable-persistent-data-structures/">mergeable data structures</a>
that can be used in MirageOS unikernels and normal OCaml daemons such
as <a href="https://tezos.com">Tezos</a>.</p>
<p>In this blog post, we want to explain some of the ongoing release
engineering, and to highlight some areas where we could use
help from the community to test out pieces (and hopefully find uses
for them in your own infrastructure).  The overall effort is
tracked in <a href="https://github.com/mirage/irmin/issues/658">mirage/irmin#658</a>, so
feel free to comment there as well.</p>
<h3>ocaml-git</h3>
<p>Irmin is parameterised over the exact communication mechanisms it uses
between nodes, both as an on-disk format and also the remoting
protocol.  The most important concrete implementation is Git, which
has turned into the world’s most popular version control system.  In
order to seamlessly integrate with Irmin, we embarked on an effort to
build a complete re-implementation of
<a href="https://github.com/mirage/ocaml-git">Git from scratch in pure OCaml</a>.</p>
<p>You can read <a href="/blog/2018-10-19-ocaml-git-2-0/">details of the git 2.0 release</a>
on this blog, but from a release engineering perspective we have steadily
been fixing corner cases in this implementation.  The development
ocaml-git trees feature <a href="https://github.com/mirage/ocaml-git/pull/348">fixes to https+git</a>,
for <a href="https://github.com/mirage/ocaml-git/pull/351">listing remotes</a>, supporting
<a href="https://github.com/mirage/ocaml-git/pull/341">authenticated URIs</a> and
more.</p>
<p>These fixes are possible because users tried end-to-end use cases that
found these corner cases, so we’d really like to see more.  For
example, our friends at <a href="https://robur.io">Robur</a> have submitted fixes
from their integration of it into their upcoming <a href="https://github.com/roburio/caldav">CalDAV engine</a>.
The Mirage <a href="https://github.com/Engil/Canopy">canopy</a> blog engine can now also
push/pull reliably from pure MirageOS unikernels between nodes, which
is a huge step.</p>
<p>If you get a chance to try ocaml-git in your infrastructure, please
let us know how you get along as we prepare a release of the git
libraries with all these fixes (which will be used in Irmin 2.0).</p>
<h3>Wodan</h3>
<p>Irmin’s storage layer is also well abstracted, so backends other than
a Unix filesystem or Git are supported.  Irmin can run in highly
diverse and OS-free environments, and so we began engineering the
<a href="https://github.com/mirage/wodan">Wodan filesystem</a> as a
domain-specific filesystem designed for MirageOS, Irmin and modern
flash drives.  See <a href="https://g2p.github.io/research/wodan.pdf">the OCaml Workshop 2017 abstract on
it</a> for more design
rationale.</p>
<p>As part of the Irmin 2.0 release, Wodan is also being prepared for a
release, and you can find <a href="https://github.com/mirage/wodan/tree/master/src/wodan-irmin">Irmin 2.0
support</a>
in the source.  If you’d like a standalone block-device based
persistence environment for Irmin, please try this out.  This is the
preferred backend for using Irmin storage in a unikernel.</p>
<h3>Tezos and irmin-pack</h3>
<p>Another big user of Irmin is the <a href="https://tezos.com">Tezos blockchain</a>,
and we have been optimising the persistent space usage of Irmin as their
network grows.  Because Tezos doesn’t require full Git format support,
we created a hybrid backend that grabs the best bits of Git (e.g. the
packfile mechanism) and engineered a domain-specific backend tailored
for Tezos usage. Crucially, because of the way Irmin is split into
clean libraries and OCaml modules, we only had to modify a small part
of the codebase and could also re-use elements of the Git 2.0
engineering effort we described above.</p>
<p>The <a href="https://github.com/mirage/irmin/pull/615">irmin-pack backend</a> is
currently being reviewed and integrated ahead of Irmin 2.0 to provide
a significant improvement in disk usage -- more information to come soon.
There is a corresponding <a href="https://gitlab.com/samoht/tezos/tree/snapshot-irmin-pack">Tezos branch</a>
using the Irmin 2.0 code that will be integrated downstream in Tezos
once we complete the Irmin 2.0 tests.</p>
<h3>Irmin-GraphQL and “browser Irmin”</h3>
<p>Another new area of huge interest to us is
<a href="https://graphql.org">GraphQL</a>, which provides frontends with a rich
query language for Irmin-hosted applications.  Irmin 2.0 includes a
built-in GraphQL server so you can <a href="https://twitter.com/cuvius/status/1017136581755457539">manipulate your Git repo via
GraphQL</a>.</p>
<p>If you are interested in (for example) compiling elements of Irmin to
JavaScript or wasm, for usage in frontends, then the Irmin 2.0 release
makes it significantly easier to support this architecture.  We’ve
already seen some exploratory efforts <a href="https://github.com/mirage/irmin/issues/681">report issues</a>
when doing this, and we’ve had it working ourselves in <a href="https://roscidus.com/blog/blog/2015/04/28/cuekeeper-gitting-things-done-in-the-browser/">Irmin 1.0 Cuekeeper</a>
so we are excited by the potential power of applications built using
this model.  If you have ideas or questions, please get in touch on the
<a href="https://github.com/mirage/irmin/issues">issue tracker</a> with your
use case.</p>
<p>This post is just the precursor to the Irmin 2.0 release, so expect to
hear more about it in the coming weeks and months.  This is primarily
a call for help from early adopters interested in helping the project
out.  All of our code is liberally licensed open source, and so this
is a good time to tie together end-to-end use cases and help ensure we
don’t make any decisions in Irmin 2.0 that go counter to some product
you’d like to build. That’s only possible with your feedback, so
either get in touch via the <a href="https://github.com/mirage/irmin/issues">issue tracker</a>, on
<a href="https://discuss.ocaml.org">discuss.ocaml.org</a> via the <code>mirageos</code> tag,
or just <a href="mailto:mirageos-devel@lists.xenproject.org">email us</a>.</p>
<p>A huge thank you to all our commercial customers, end users and open
source developers who have contributed their time, expertise and
financial support to help us achieve our goal of delivering a modern
storage stack in the spirit of Git. We look forward to getting Irmin
2.0 into your hands very soon!</p>
]]></description><link>https://tarides.com/blog/2019-05-13-on-the-road-to-irmin-v2</link><guid isPermaLink="false">https://tarides.com/blog/2019-05-13-on-the-road-to-irmin-v2.html</guid><dc:creator><![CDATA[ Thomas Gazagnaire ]]></dc:creator><pubDate>Mon, 13 May 2019 00:00:00 GMT</pubDate></item><item><title><![CDATA[An introduction to OCaml PPX ecosystem]]></title><description><![CDATA[<p>These last few months, I spent some time writing new OCaml PPX rewriters or contributing to existing
ones. It's a really fun experience. Toying around with the AST taught me a lot about a language I
thought I knew really well. Turns out I actually had no idea what I was doing all these years.</p>
<p>All jokes aside, I was surprised that the most helpful tricks I learned while writing PPX rewriters
weren't properly documented. There already exist a few very good introductory articles on the
subject, like this
<a href="https://whitequark.org/blog/2014/04/16/a-guide-to-extension-points-in-ocaml/">2014 article from Whitequark</a>,
this <a href="http://rgrinberg.com/posts/extensions-points-update-1/">more recent one from Rudi Grinberg</a>
or even <a href="https://victor.darvariu.me/jekyll/update/2018/06/19/ppx-tutorial.html">this last one from Victor Darvariu</a>,
which I only discovered after I actually started writing my own. I still felt like they were slightly
outdated or weren't answering all the questions I had when I started playing with PPX and writing my
first rewriters.</p>
<p>I decided to share my PPX adventures in the hope that it can help others familiarize themselves with this bit
of the OCaml ecosystem and eventually write their first rewriters. The aim of this article is not to
cover every single detail about the PPX internals but just to give a gentle introduction to
beginners to help them get settled. That also means I might omit things that I don't think are worth
mentioning or that might confuse the targeted audience, but feel free to comment if you believe
this article missed an important point.</p>
<p>It's worth mentioning that a lot of the nice tricks mentioned in these lines were given to me by a
wonderful human being called Étienne Millon, thanks Étienne!</p>
<h2>What is a PPX?</h2>
<p>PPX rewriters or PPX-es are preprocessors that are applied to your code before passing it on to the
compiler. They don't operate on your code directly but on the Abstract Syntax Tree or AST resulting
from its parsing. That means that they can only be applied to syntactically correct OCaml code. You
can think of them as functions that take an AST and return a new AST.</p>
<p>That means that in theory you can do a lot of things with a PPX, including pretty bad and cryptic
things. You could for example replace every instance of <code>true</code> with <code>false</code>, swap the branches of any
<code>if-then-else</code> or randomize the order of every pattern-matching case.
Obviously that's not the kind of behaviour that we want as it would make it impossible to
understand the code since it would be so far from the actual AST the compiler would get.
In practice PPX-es have a well defined scope and only transform parts you explicitly annotated.</p>
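<p>To make the "function from AST to AST" idea concrete, here is a minimal sketch over a toy AST. The <code>expr</code> type and both functions are hypothetical and far simpler than the real <code>Parsetree</code> types; they only illustrate the kind of (ill-advised) transformations described above:</p>
<pre><code>(* A toy expression AST, assumed for illustration only. It is NOT the
   real Parsetree representation. *)
type expr =
  | Bool of bool
  | Var of string
  | If of expr * expr * expr

(* Replace every [true] with [false]; [false] stays [false]. *)
let rec true_to_false = function
  | Bool _ -&gt; Bool false
  | Var _ as e -&gt; e
  | If (c, t, e) -&gt; If (true_to_false c, true_to_false t, true_to_false e)

(* Swap the branches of every if-then-else. *)
let rec swap_branches = function
  | (Bool _ | Var _) as e -&gt; e
  | If (c, t, e) -&gt; If (swap_branches c, swap_branches e, swap_branches t)

let () =
  (* Stands for: if x then true else false *)
  let ast = If (Var "x", Bool true, Bool false) in
  assert (true_to_false ast = If (Var "x", Bool false, Bool false));
  assert (swap_branches ast = If (Var "x", Bool false, Bool true))
</code></pre>
<p>A real rewriter has the same overall shape: it walks the tree and rebuilds it, only over the much larger set of node types defined by the compiler.</p>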
<h3>Understanding the OCaml AST</h3>
<p>First things first: what is an AST? An AST is an abstract representation of your code. As the name
suggests, it has a tree-like structure where the root describes your entire file. It has children for
each bit of the file, such as a function declaration or a type definition, each of them having their own
children, for example the function name, its arguments and its body, and so on until you
reach a leaf such as a literal <code>1</code>, <code>"abc"</code> or a variable.
In the case of OCaml it's a set of recursive types allowing us to represent OCaml code as an OCaml
value. This value is what the parser passes to the compiler so it can type check and compile it to
native or byte code.
Those types are defined in OCaml's <code>Parsetree</code> module. The entry points there are the <code>structure</code>
type which describes the content of an <code>.ml</code> file and the <code>signature</code> type which describes the
content of an <code>.mli</code> file.</p>
<p>As mentioned above, a PPX can be seen as a function that transforms an AST. Writing a PPX thus
requires you to understand the AST, both to interpret the one you'll get as input and
to produce the right one as output. This is probably the trickiest part: unless you've already
worked on the OCaml compiler or written a PPX rewriter, this will probably be the first time you two
meet. Chances are also high that it'll be a pretty bad first date and you will need some time
to get to know each other.</p>
<p>The <code>Parsetree</code> module <a href="https://caml.inria.fr/pub/docs/manual-ocaml/compilerlibref/Parsetree.html">documentation</a>
is a good place to start. The above-mentioned <code>structure</code> and <code>signature</code> types are at the root of
the AST but some other useful types to look at first are:</p>
<ul>
<li><code>expression</code> which describes anything in OCaml that evaluates to a value, the right hand side of a
<code>let</code> binding for instance.</li>
<li><code>pattern</code> which is what you use to deconstruct an OCaml value, the left hand side of a <code>let</code>
binding or a pattern-matching case for example.</li>
<li><code>core_type</code> which describes type expressions, i.e. what you would find on the right hand side of a
value description in a <code>.mli</code>, as in <code>val f : &lt;what_goes_there&gt;</code>.</li>
<li><code>structure_item</code> and <code>signature_item</code> which describe the top level AST nodes you can find in a
<code>structure</code> or <code>signature</code> such as type definitions, value or module declarations.</li>
</ul>
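<p>As an informal illustration, the comments in the following ordinary OCaml snippet point out which of these node types each part of the code corresponds to. The names <code>pair</code>, <code>n</code>, <code>s</code> and <code>add_n</code> are made up for the example:</p>
<pre><code>(* Each top-level definition below is one structure_item. *)

(* The right-hand side of this type definition, [int * string],
   is a core_type. *)
type pair = int * string

(* [(n, s)] is a pattern; [(1, "abc")] is an expression. *)
let (n, s) = (1, "abc")

(* [x] is a pattern, [int] is a core_type and [x + n] is an expression. *)
let add_n (x : int) = x + n
</code></pre>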
<p>The thing is, it's a bit rough and there's no detailed explanation of how a specific bit of code is
represented, just type definitions. Most of the time, the type, field, and variant names are
self-explanatory but it can get harder with some of the more advanced language features.
It turns out there are plenty of really helpful comments in the actual <code>parsetree.mli</code> file
that aren't part of the generated documentation. You can find them on
<a href="https://github.com/ocaml/ocaml/blob/trunk/parsing/parsetree.mli">GitHub</a> but I personally prefer to
have the file open in a Vim tab when I work on a PPX, so I usually open
<code>~/.opam/&lt;current_working_switch&gt;/lib/ocaml/compiler-libs/parsetree.mli</code>.</p>
<p>This works well while exploring but you might also want a more straightforward approach to
discovering what the AST representation is for some specific OCaml code. The
<a href="https://github.com/ocaml-ppx/ppx_tools"><code>ppx_tools</code></a> opam package comes with a <code>dumpast</code> binary
that pretty prints the AST for any given piece of valid OCaml code. You can install it using opam:</p>
<pre><code>$ opam install ppx_tools
</code></pre>
<p>and then run it using <code>ocamlfind</code>:</p>
<pre><code>$ ocamlfind ppx_tools/dumpast some_file.ml
</code></pre>
<p>You can use it on <code>.ml</code> and <code>.mli</code> files or to quickly get the AST for an expression with the <code>-e</code>
option:</p>
<pre><code>$ ocamlfind ppx_tools/dumpast -e "1 + 1"
</code></pre>
<p>Similarly, you can use the <code>-t</code> or <code>-p</code> options to respectively pretty print ASTs from type
expressions or patterns.</p>
<p>Using <code>dumpast</code> to get the ASTs of both a piece of code using your future PPX and the resulting
preprocessed code is a good way to start and will help you figure out which steps are required to
get there.</p>
<p>Note that the compiler and <code>utop</code> have a similar feature with the <code>-dparsetree</code> flag.
Running <code>ocamlc/ocamlopt -dparsetree file.ml</code> will pretty print the AST of the given file while
running <code>utop -dparsetree</code> will pretty print the AST of the evaluated code alongside its
evaluation.
I tend to prefer the pretty printed AST from <code>dumpast</code> but any of these tools will prove helpful
in understanding the AST representation of a given piece of OCaml code.</p>
<h3>Language extensions interpreted by PPX-es</h3>
<p>OCaml 4.02 introduced syntax extensions meant to be used by external tools such as PPX-es. Knowing
their syntax and meaning is important to understand how most of the existing rewriters
work because they usually look for those language extensions in the AST to know which part of it
they need to modify.</p>
<p>The two language extensions we're interested in here are extension nodes and attributes. They are
defined in detail in the OCaml manual (see the
<a href="https://caml.inria.fr/pub/docs/manual-ocaml/attributes.html">attributes</a> and
<a href="https://caml.inria.fr/pub/docs/manual-ocaml/extensionnodes.html">extension nodes</a> sections) but I'll
try to give a good summary here.</p>
<p>Extension nodes are used in place of expressions, module expressions, patterns, type expressions or
module type expressions. Their syntax is <code>[%extension_name payload]</code>. We'll come back to the payload
part a little later.
You can also find extension nodes at the top level of modules or module signatures with the syntax
<code>[%%extension_name payload]</code>.
Hopefully the following cheatsheet can help you remember the basics of how and where you can use
them:</p>
<pre><code><span class="ocaml-keyword-other">type</span><span class="ocaml-source"> </span><span class="ocaml-source">t</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-source">{</span><span class="ocaml-source"> </span><span class="ocaml-source">a</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source"> </span><span class="ocaml-support-type">int</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source"> </span><span class="ocaml-source">b</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source"> </span><span class="ocaml-source">[</span><span class="ocaml-keyword-operator-extension">%</span><span class="ocaml-keyword-other-extension">ext</span><span class="ocaml-source"> </span><span class="ocaml-source">pl</span><span class="ocaml-source">]</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-source">}</span><span class="ocaml-source">
</span><span class="ocaml-source">
</span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">x</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword-other">match</span><span class="ocaml-source"> </span><span class="ocaml-constant-numeric-decimal-integer">1</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">with</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword-other">|</span><span class="ocaml-source"> </span><span class="ocaml-constant-numeric-decimal-integer">0</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-source">[</span><span class="ocaml-keyword-operator-extension">%</span><span class="ocaml-keyword-other-extension">ext</span><span class="ocaml-source"> </span><span class="ocaml-source">pl</span><span class="ocaml-source">]</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword-other">|</span><span class="ocaml-source"> </span><span class="ocaml-source">[</span><span class="ocaml-keyword-operator-extension">%</span><span class="ocaml-keyword-other-extension">ext</span><span class="ocaml-source"> </span><span class="ocaml-source">pl</span><span class="ocaml-source">]</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-boolean">true</span><span class="ocaml-source">
</span><span class="ocaml-source">
</span><span class="ocaml-source">[</span><span class="ocaml-keyword-operator-extension">%%</span><span class="ocaml-keyword-other-extension">ext</span><span class="ocaml-source"> </span><span class="ocaml-source">pl</span><span class="ocaml-source">]</span><span class="ocaml-source">
</span></code></pre>
<p>Because extension nodes stand where regular AST nodes should, the compiler won't accept them and
will give you an <code>Uninterpreted extension</code> error. Extension nodes have to be expanded by a PPX for
your code to compile.</p>
<p>Attributes are slightly different although their syntax is very close to extensions. Attributes
are attached to existing AST nodes instead of replacing them. That means that they don't necessarily
need to be transformed and the compiler will ignore unknown attributes by default.
They can come with a payload just like extensions and use <code>@</code> instead of <code>%</code>. The number of <code>@</code>
preceding the attribute name specifies which kind of node they are attached to:</p>
<pre><code><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">a</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-constant-numeric-decimal-integer">12</span><span class="ocaml-source"> </span><span class="ocaml-source">[</span><span class="ocaml-keyword-operator-attribute">@</span><span class="ocaml-keyword-other-attribute">attr</span><span class="ocaml-source"> </span><span class="ocaml-source">pl</span><span class="ocaml-source">]</span><span class="ocaml-source">
</span><span class="ocaml-source">
</span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">b</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-string-quoted-double">some string</span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-source"> </span><span class="ocaml-source">[</span><span class="ocaml-keyword-operator-attribute">@@</span><span class="ocaml-keyword-other-attribute">attr</span><span class="ocaml-source"> </span><span class="ocaml-source">pl</span><span class="ocaml-source">]</span><span class="ocaml-source">
</span><span class="ocaml-source">
</span><span class="ocaml-source">[</span><span class="ocaml-keyword-operator-attribute">@@@</span><span class="ocaml-keyword-other-attribute">attr</span><span class="ocaml-source"> </span><span class="ocaml-source">pl</span><span class="ocaml-source">]</span><span class="ocaml-source">
</span></code></pre>
<p>In the first example, the attribute is attached to the expression <code>12</code> while in the second example
it is attached to the whole <code>let b = "some string"</code> value binding. The third one is of a slightly
different nature as it is a floating attribute. It's not attached to anything per se and just ends
up in the AST as a structure item.
Because there is a wide variety of nodes to which you can attach attributes, I won't go into too much
detail here. A good rule of thumb is to use <code>@@</code> attributes when you want them attached to
structure or signature items; for anything deeper within the AST structure, such as patterns,
expressions or core types, use the single <code>@</code> syntax. Looking at the <code>Parsetree</code> documentation can
help you figure out where you can find attributes.</p>
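<p>For a concrete feel of the three forms, here is a small sketch using attributes that the compiler itself understands, rather than ones interpreted by a PPX. The function names are invented for the example:</p>
<pre><code>(* Floating attribute: a structure item of its own. The compiler reads
   this one as "disable warning 32 (unused value) from here on". *)
[@@@warning "-32"]

(* [@@...] is attached to the whole value binding that precedes it. *)
let old_succ x = x + 1
[@@deprecated "use succ instead"]

(* [@...] is attached to a single expression, here [sum] in function
   position; the compiler then warns if that call is not a tail call. *)
let rec sum acc = function
  | [] -&gt; acc
  | x :: rest -&gt; (sum [@tailcall]) (acc + x) rest

let () = assert (sum 0 [1; 2; 3] = 6)
</code></pre>
<p>Attributes meant for a PPX follow exactly the same placement rules; only the attribute name and the tool that reads it differ.</p>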
<p>Now let's talk about those payloads I mentioned earlier. You can think of them as "arguments" to
the extension points and attributes. You can pass different kinds of arguments and the syntax varies
for each of them:</p>
<pre><code><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">a</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-source">[</span><span class="ocaml-keyword-operator-extension">%</span><span class="ocaml-keyword-other-extension">ext</span><span class="ocaml-source"> </span><span class="ocaml-source">expr_or_str_item</span><span class="ocaml-source">]</span><span class="ocaml-source">
</span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">b</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-source">[</span><span class="ocaml-keyword-operator-extension">%</span><span class="ocaml-keyword-other-extension">ext</span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source"> </span><span class="ocaml-source">type_expr_or_sig_item</span><span class="ocaml-source">]</span><span class="ocaml-source">
</span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">c</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-source">[</span><span class="ocaml-keyword-operator-extension">%</span><span class="ocaml-keyword-other-extension">ext</span><span class="ocaml-source">? </span><span class="ocaml-source">pattern</span><span class="ocaml-source">]</span><span class="ocaml-source">
</span></code></pre>
<p>As suggested in the examples, you can pass expressions or structure items using a space character,
type expressions or signature items (anything you'd find at the top level of a module signature)
using a <code>:</code> or a pattern using a <code>?</code>.</p>
<p>Attribute payloads use the same syntax:</p>
<pre><code><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">a</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-string-quoted-single">'a'</span><span class="ocaml-source"> </span><span class="ocaml-source">[</span><span class="ocaml-keyword-operator-attribute">@</span><span class="ocaml-keyword-other-attribute">attr</span><span class="ocaml-source"> </span><span class="ocaml-source">expr_or_str_item</span><span class="ocaml-source">]</span><span class="ocaml-source">
</span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">b</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-string-quoted-single">'b'</span><span class="ocaml-source"> </span><span class="ocaml-source">[</span><span class="ocaml-keyword-operator-attribute">@</span><span class="ocaml-keyword-other-attribute">attr</span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source"> </span><span class="ocaml-source">type_expr_or_sig_item</span><span class="ocaml-source">]</span><span class="ocaml-source">
</span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">a</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-string-quoted-single">'a'</span><span class="ocaml-source"> </span><span class="ocaml-source">[</span><span class="ocaml-keyword-operator-attribute">@</span><span class="ocaml-keyword-other-attribute">attr</span><span class="ocaml-source">? </span><span class="ocaml-source">pattern</span><span class="ocaml-source">]</span><span class="ocaml-source">
</span></code></pre>
<p>Some PPX-es rely on other language extensions such as the suffix character you can attach to <code>int</code>
and <code>float</code> literals (<code>10z</code> could be used by a PPX to turn it into <code>Z.of_string "10"</code> for instance)
or quoted strings with a specific identifier (<code>{ppx_name|some quoted string|ppx_name}</code> can be used
if you want your PPX to operate on arbitrary strings and not only syntactically correct OCaml) but
attributes and extensions are the most commonly used ones.</p>
<p>Attributes and extension points can be expressed using an infix syntax. The attribute version is
barely used but some forms of the infix syntax for extension points are used by popular PPX-es and
it is likely you will encounter some of the following:</p>
<pre><code><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">infix_let_extension</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword-other">let</span><span class="ocaml-keyword-operator">%</span><span class="ocaml-source">ext</span><span class="ocaml-source"> </span><span class="ocaml-source">x</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-constant-numeric-decimal-integer">2</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">in</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">
</span><span class="ocaml-source">
</span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">infix_match_extension</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword-other">match</span><span class="ocaml-keyword-operator">%</span><span class="ocaml-source">ext</span><span class="ocaml-source"> </span><span class="ocaml-source">y</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">with</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">
</span><span class="ocaml-source">
</span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">infix_try_extension</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword-other">try</span><span class="ocaml-keyword-operator">%</span><span class="ocaml-source">ext</span><span class="ocaml-source"> </span><span class="ocaml-source">f</span><span class="ocaml-source"> </span><span class="ocaml-source">z</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">with</span><span class="ocaml-source"> </span><span class="ocaml-constant-language">_</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">
</span></code></pre>
<p>which are syntactic sugar for:</p>
<pre><code><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">infix_let_extension</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-source">[</span><span class="ocaml-keyword-operator-extension">%</span><span class="ocaml-keyword-other-extension">ext</span><span class="ocaml-source"> </span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">x</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-constant-numeric-decimal-integer">2</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">in</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">]</span><span class="ocaml-source">
</span><span class="ocaml-source">
</span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">infix_match_extension</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-source">[</span><span class="ocaml-keyword-operator-extension">%</span><span class="ocaml-keyword-other-extension">ext</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">match</span><span class="ocaml-source"> </span><span class="ocaml-source">y</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">with</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">]</span><span class="ocaml-source">
</span><span class="ocaml-source">
</span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">infix_try_extension</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-source">[</span><span class="ocaml-keyword-operator-extension">%</span><span class="ocaml-keyword-other-extension">ext</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">try</span><span class="ocaml-source"> </span><span class="ocaml-source">f</span><span class="ocaml-source"> </span><span class="ocaml-source">z</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">with</span><span class="ocaml-source"> </span><span class="ocaml-constant-language">_</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">]</span><span class="ocaml-source">
</span></code></pre>
<p>A good example of a PPX making heavy use of these is
<a href="https://ocsigen.org/lwt/4.1.0/api/Ppx_lwt"><code>lwt_ppx</code></a>. The OCaml manual also contains more examples
of the infix syntax in the Attributes and Extension points sections mentioned above.</p>
<h3>The two main kinds of PPX-es</h3>
<p>There is a wide variety of PPX rewriters but the ones you'll probably see the most are Extensions and
Derivers.</p>
<h4>Extensions</h4>
<p>Extensions will rewrite tagged parts of the AST, usually extension nodes of the form
<code>[%&lt;extension_name&gt; payload]</code>. They will replace them with a different AST node of the same nature, i.e.
if the extension point was located where an expression should be, the rewriter will produce an
expression. Good examples of extensions are:</p>
<ul>
<li><a href="https://github.com/rgrinberg/ppx_getenv2"><code>ppx_getenv2</code></a> which replaces <code>[%getenv SOME_VAR]</code> with
the value of the environment variable <code>SOME_VAR</code> at compile time.</li>
<li><a href="https://github.com/NathanReb/ppx_yojson"><code>ppx_yojson</code></a> which allows you to write <code>Yojson</code> values
using OCaml syntax to mimic actual json. For instance you'd use <code>[%yojson {a = None; b = 1}]</code> to
represent <code>{"a": null, "b": 1}</code> instead of the <code>Yojson</code>'s notation:
<code>Assoc [("a", Null); ("b", Int 1)]</code>.</li>
</ul>
<h4>Derivers</h4>
<p>Derivers or deriving plugins will "insert" new nodes derived from type definitions annotated with a
<code>[@@deriving &lt;deriver_name&gt;]</code> attribute. They have various applications but are particularly useful
to derive functions that are tedious and error prone to write by hand such as comparison functions,
pretty printers or serializers. It's really convenient as you don't have to update those functions
every time you update your type definitions. They were inspired by Haskell type classes. Good
examples of derivers are:</p>
<ul>
<li><a href="https://github.com/ocaml-ppx/ppx_deriving"><code>ppx_deriving</code></a> itself comes with a bunch of deriving
plugins such as <code>eq</code>, <code>ord</code> or <code>show</code> which respectively derive, as you might have guessed,
equality, comparison and pretty-printing functions.</li>
<li><a href="https://github.com/ocaml-ppx/ppx_deriving_yojson"><code>ppx_deriving_yojson</code></a> which derives JSON
serializers and deserializers.</li>
<li><a href="https://github.com/janestreet/ppx_sexp_conv"><code>ppx_sexp_conv</code></a> which derives s-expression
converters.</li>
</ul>
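<p>As a rough illustration of what using a deriver looks like, here is a minimal sketch. It assumes
the <code>show</code> and <code>eq</code> plugins of <code>ppx_deriving</code> are enabled for the file, for instance with
<code>(preprocess (pps ppx_deriving.show ppx_deriving.eq))</code> in its dune stanza. The <code>point</code> type is made
up for the example:</p>
<pre><code>type point = { x : int; y : int } [@@deriving show, eq]

let () =
  let p = { x = 1; y = 2 } in
  (* show_point and equal_point are generated by the derivers, not written by hand *)
  print_endline (show_point p);
  assert (equal_point p { x = 1; y = 2 })
</code></pre>
<p>Adding a field to <code>point</code> later would update both generated functions without any further change.</p>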
<p>Derivers often let you attach attributes to specify how some parts of the AST should be handled. For
example when using <code>ppx_deriving_yojson</code> you can use <code>[@default some_val]</code> to make a field of an
object optional:</p>
<pre><code><span class="ocaml-keyword-other">type</span><span class="ocaml-source"> </span><span class="ocaml-source">t</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-source">{</span><span class="ocaml-source"> </span><span class="ocaml-source">a</span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source"> </span><span class="ocaml-support-type">int</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source"> </span><span class="ocaml-source">b</span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source"> </span><span class="ocaml-support-type">string</span><span class="ocaml-source"> </span><span class="ocaml-source">[</span><span class="ocaml-keyword-operator-attribute">@</span><span class="ocaml-keyword-other-attribute">default</span><span class="ocaml-source"> </span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-source">]</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-source">}</span><span class="ocaml-source">
</span><span class="ocaml-source">[</span><span class="ocaml-keyword-operator-attribute">@@</span><span class="ocaml-keyword-other-attribute">deriving</span><span class="ocaml-source"> </span><span class="ocaml-source">of_yojson</span><span class="ocaml-source">]</span><span class="ocaml-source">
</span></code></pre>
<p>will derive a deserializer that will convert the JSON value <code>{"a": 1}</code> to the OCaml value
<code>{a = 1; b = ""}</code>.</p>
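<p>To make this concrete, here is a sketch of calling the derived function. It assumes the file is
preprocessed with <code>ppx_deriving_yojson</code> and linked against <code>yojson</code>; the JSON value is built by hand
with <code>Yojson.Safe</code> constructors rather than parsed from a string:</p>
<pre><code>type t =
  { a : int
  ; b : string [@default ""]
  }
[@@deriving of_yojson]

let () =
  (* The "b" key is missing, so the deriver falls back to the [@default] value *)
  match of_yojson (`Assoc [ ("a", `Int 1) ]) with
  | Ok { a; b } -&gt; Printf.printf "a = %d, b = %S\n" a b
  | Error msg -&gt; prerr_endline ("decoding failed: " ^ msg)
</code></pre>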
<h2>How to write a PPX using <code>ppxlib</code></h2>
<p>Historically there were a few libraries used by PPX rewriter authors to write their PPX-es, including
<code>ppx_tools</code> and <code>ppx_deriving</code> but as the ecosystem evolved, <code>ppxlib</code> emerged and is now the most
up-to-date and maintained library to write and handle PPX-es. It wraps the features of those
libraries in a single one.
I encourage you to use <code>ppxlib</code> to write new PPX-es as it is also easier to make various rewriters
work together if they are all registered through <code>ppxlib</code> and the PPX ecosystem would gain from
being unified around a single PPX library and driver.</p>
<p>It is also a great library and has some really powerful features to help you write your extensions
and derivers.</p>
<h3>Writing an extension</h3>
<p>The entry point of <code>ppxlib</code> for extensions is <code>Ppxlib.Extension.declare</code>. You have to use that
function to build an <code>Extension.t</code>, from which you can then build a <code>Context_free.Rule.t</code> before
registering your transformation so it's actually applied.</p>
<p>The typical <code>my_ppx_extension.ml</code> will look like:</p>
<pre><code><span class="ocaml-keyword-other">open</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Ppxlib</span><span class="ocaml-source">
</span><span class="ocaml-source">
</span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">extension</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-constant-language-capital-identifier">Extension</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">declare</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-string-quoted-double">my_extension</span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-source">some_context</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-source">some_pattern</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-source">expand_function</span><span class="ocaml-source">
</span><span class="ocaml-source">
</span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">rule</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Context_free</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-constant-language-capital-identifier">Rule</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">extension</span><span class="ocaml-source"> </span><span class="ocaml-source">extension</span><span class="ocaml-source">
</span><span class="ocaml-source">
</span><span class="ocaml-keyword-other">let</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-unit">()</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-constant-language-capital-identifier">Driver</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">register_transformation</span><span class="ocaml-source"> ~</span><span class="ocaml-source">rules</span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source">[</span><span class="ocaml-source">rule</span><span class="ocaml-source">]</span><span class="ocaml-source"> </span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-string-quoted-double">my_transformation</span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-source">
</span></code></pre>
<p>To compile it as PPX rewriter you'll need to put the following in your dune file:</p>
<pre><code>(library
 (public_name my_ppx)
 (kind ppx_rewriter)
 (libraries ppxlib))
</code></pre>
<p>Now let's go back a little and look at the important part:</p>
<pre><code><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">extension</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-constant-language-capital-identifier">Extension</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">declare</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-string-quoted-double">my_extension</span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-source">some_context</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-source">some_pattern</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-source">expand_function</span><span class="ocaml-source">
</span></code></pre>
<p>Here <code>"my_extension"</code> is the name of your extension, which defines how you're going to invoke it
in your extension point. In other words, to use this extension in our code we'll use a
<code>[%my_extension ...]</code> extension point.</p>
<p><code>some_context</code> is a <code>Ppxlib.Extension.Context.t</code> and describes where this extension can be found in
the AST, i.e. whether you can use <code>[%my_extension ...]</code> as an expression, a pattern, a core type, and so on. The
<code>Ppxlib.Extension.Context</code> module defines a constant for each possible extension context which you
can pass as <code>some_context</code>.
This obviously means that it also describes the type of AST node to which it must be converted and
this property is actually enforced by the <code>some_pattern</code> argument. But we'll come back to that
later.</p>
<p>Finally <code>expand_function</code> is our actual extension implementation, which basically takes the payload,
a <code>loc</code> argument which contains the location of the expanded extension point, a <code>path</code> argument
which is the fully qualified path to the expanded node (e.g. <code>"file.ml.A.B"</code>) and returns the
generated code to replace the extension with.</p>
<h4>Ast_pattern</h4>
<p>Now let's get back to that <code>some_pattern</code> argument.</p>
<p>This is one of the trickiest parts of <code>ppxlib</code> to understand but it's also one of its most
powerful features. The type for <code>Ast_pattern</code> is defined as <code>('a, 'b, 'c) t</code> where <code>'a</code> is
the type of AST nodes that are matched, <code>'b</code> is the type of the values you're extracting from the
node as a function type and <code>'c</code> is the return type of that last function. This sounded really
confusing to me at first and I'm guessing it might do to some of you too so let's give it a bit of
context.</p>
<p>Let's look at the type of <code>Extension.declare</code>:</p>
<pre><code><span class="ocaml-keyword-other">val</span><span class="ocaml-source"> </span><span class="ocaml-source">declare</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-support-type">string</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-storage-type">'context</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Context</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">t</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-source">(</span><span class="ocaml-source">payload</span><span class="ocaml-keyword-other-ocaml punctuation-comma punctuation-separator">,</span><span class="ocaml-source"> </span><span class="ocaml-storage-type">'a</span><span class="ocaml-keyword-other-ocaml punctuation-comma punctuation-separator">,</span><span class="ocaml-source"> </span><span class="ocaml-storage-type">'context</span><span class="ocaml-source">)</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Ast_pattern</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">t</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-source">(</span><span class="ocaml-source">loc</span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-constant-language-capital-identifier">Location</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">t</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-source">path</span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-support-type">string</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-storage-type">'a</span><span class="ocaml-source">)</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-source">t</span><span class="ocaml-source">
</span></code></pre>
<p>Here, the first type parameter of the expected pattern is <code>payload</code>, which means we want a pattern that
matches <code>payload</code> AST nodes. That makes perfect sense since it is used to describe what your
extension's payload should look like and what to do with it.
The last type parameter is <code>'context</code> which again seems logical. As I mentioned earlier our
<code>expand_function</code> should return the same kind of node as the one where the extension was found.
Now what about <code>'a</code>? As you can see, it describes what comes after the base <code>loc</code> and <code>path</code>
parameters of our <code>expand_function</code>. From the pattern point of view, <code>'a</code> describes the parts of the
matched AST node we wish to extract for later consumption, here by our expander.</p>
<p><code>Ast_pattern</code> contains a whole bunch of combinators to let you describe what your pattern should match
and a specific <code>__</code> pattern that you must use to capture the various parts of the matched nodes.
<code>__</code> has type <code>('a, 'a -&gt; 'b, 'b) Ast_pattern.t</code> which means that whenever it's used it changes the
type of consumer function in the returned pattern.</p>
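<p>To build some intuition for these three type parameters, here is a toy model in plain OCaml. It is
only a sketch and not how <code>ppxlib</code> implements <code>Ast_pattern</code>: the <code>pair</code> combinator and the
representation of patterns as plain functions are assumptions made for the example. A pattern takes
the value to match and a continuation, and each <code>__</code> feeds one more argument to that continuation:</p>
<pre><code>(* Toy model: 'a is what we match, 'b the consumer's type, 'c its final result *)
type ('a, 'b, 'c) t = 'a -&gt; 'b -&gt; 'c

(* Capture the matched value by passing it to the consumer *)
let __ : ('a, 'a -&gt; 'b, 'b) t = fun x k -&gt; k x

(* Match a pair: what is left of the consumer after p1 is handed to p2 *)
let pair (p1 : ('a, 'b, 'c) t) (p2 : ('d, 'c, 'e) t) : ('a * 'd, 'b, 'e) t =
 fun (x, y) k -&gt; p2 y (p1 x k)

let () =
  (* Two captures, so the consumer takes two arguments *)
  pair __ __ (1, "a") (fun n s -&gt; Printf.printf "%d %s\n" n s)
</code></pre>
<p>Each capture adds one arrow to the consumer's type, which is the behaviour the real <code>__</code> exposes
through its <code>('a, 'a -&gt; 'b, 'b) Ast_pattern.t</code> type.</p>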
<p>Let's consider a few examples to try wrapping our heads around this. Say I want to write an
extension that takes an expression as a payload and I want to pass this expression to my expander so
I can generate code based on its value. I can declare the extension like this:</p>
<pre><code><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">extension</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-constant-language-capital-identifier">Extension</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">declare</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-string-quoted-double">my_extension</span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-constant-language-capital-identifier">Extension</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-constant-language-capital-identifier">Context</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">expression</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-constant-language-capital-identifier">Ast_pattern</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">(</span><span class="ocaml-source">single_expr_payload</span><span class="ocaml-source"> </span><span class="ocaml-source">__</span><span class="ocaml-source">)</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-source">expand_function</span><span class="ocaml-source">
</span></code></pre>
<p>In this example, <code>Extension.Context.expression</code> has type <code>expression Extension.Context.t</code>, the
pattern has type <code>(payload, expression -&gt; expression, expression) Ast_pattern.t</code>. The pattern says we
want to allow a single expression in the payload and capture it. If we decompose it a bit, we can
see that <code>single_expr_payload</code> has type
<code>(expression, 'a, 'b) Ast_pattern.t -&gt; (payload, 'a, 'b) Ast_pattern.t</code> and is passed <code>__</code>, which is
instantiated here as an <code>(expression, expression -&gt; 'b, 'b) Ast_pattern.t</code>. The result is a <code>(payload, expression -&gt; 'b, 'b) Ast_pattern.t</code>, and that's exactly what we want here
as our expander will have type <code>loc: Location.t -&gt; path: string -&gt; expression -&gt; expression</code>!</p>
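<p>Putting the pieces together, a complete file could look like the sketch below. The body of
<code>expand_function</code> is a hypothetical choice made for illustration: it wraps the captured expression
in a call to <code>Stdlib.succ</code>, using the <code>evar</code> and <code>eapply</code> helpers from <code>Ast_builder.Default</code>:</p>
<pre><code>open Ppxlib

(* Hypothetical expander: rewrites [%my_extension e] into [Stdlib.succ e] *)
let expand_function ~loc ~path:_ (e : expression) : expression =
  let open Ast_builder.Default in
  eapply ~loc (evar ~loc "Stdlib.succ") [ e ]

let extension =
  Extension.declare
    "my_extension"
    Extension.Context.expression
    Ast_pattern.(single_expr_payload __)
    expand_function

let rule = Context_free.Rule.extension extension

let () =
  Driver.register_transformation ~rules:[ rule ] "my_transformation"
</code></pre>
<p>With this rewriter enabled, <code>[%my_extension 41]</code> would be replaced by <code>Stdlib.succ 41</code> before the
code is type-checked.</p>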
<p>It works similarly to <code>Scanf.scanf</code> when you think about it. Changing the pattern changes the type of the
consumer function the same way changing the format string does for <code>Scanf</code> functions.</p>
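<p>For comparison, here is the <code>Scanf</code> side of that analogy, using only the standard library. The
input strings are arbitrary; the point is that adding a conversion to the format adds an argument to
the consumer function:</p>
<pre><code>let () =
  (* One conversion in the format: the consumer takes a single int *)
  Scanf.sscanf "42" "%d" (fun n -&gt; Printf.printf "%d\n" (n + 1));
  (* Two conversions: the consumer now takes an int and a string *)
  Scanf.sscanf "42 foo" "%d %s" (fun n s -&gt; Printf.printf "%d %s\n" n s)
</code></pre>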
<p>This was a bit easy since we had a custom combinator just for that purpose so let's take a few more
complex examples. Now say we want to only allow pairs made of an integer constant and a string constant in
our payload. Instead of just capturing any expression and dealing with the error cases in the
<code>expand_function</code> we can let <code>Ast_pattern</code> deal with that and pass an <code>int</code> and <code>string</code> along to
our expander:</p>
<pre><code><span class="ocaml-constant-language-capital-identifier">Ast_pattern</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">(</span><span class="ocaml-source">single_expr_payload</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-source">pexp_tuple</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-source">(</span><span class="ocaml-source">eint</span><span class="ocaml-source"> </span><span class="ocaml-source">__</span><span class="ocaml-source">)</span><span class="ocaml-keyword-operator">^::</span><span class="ocaml-source">(</span><span class="ocaml-source">estring</span><span class="ocaml-source"> </span><span class="ocaml-source">__</span><span class="ocaml-source">)</span><span class="ocaml-keyword-operator">^::</span><span class="ocaml-source">nil</span><span class="ocaml-source">)</span><span class="ocaml-source">)</span><span class="ocaml-source">)</span><span class="ocaml-source">
</span></code></pre>
<p>This one's a bit more elaborate but the idea is the same: we use <code>__</code> to capture the int and string
from the expression and use combinators to specify that the payload should be made of a pair and
that gives us a: <code>(payload, int -&gt; string -&gt; 'a, 'a) Ast_pattern.t</code> which should be used with a
<code>loc: Location.t -&gt; path: string -&gt; int -&gt; string -&gt; expression</code> expander.</p>
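<p>As a sketch of such an expander, the hypothetical extension below repeats the string at compile
time, so that <code>[%my_extension (3, "ab")]</code> would become the literal <code>"ababab"</code>. The repetition
logic is made up for the example, and the pattern is the one shown above, so it is subject to the
same assumptions about the <code>Ast_pattern</code> combinators available in your version of <code>ppxlib</code>:</p>
<pre><code>open Ppxlib

(* Hypothetical expander: receives the int and string captured by the pattern *)
let expand_function ~loc ~path:_ (n : int) (s : string) : expression =
  (* max 0 guards against negative counts, which List.init would reject *)
  let repeated = String.concat "" (List.init (max 0 n) (fun _ -&gt; s)) in
  Ast_builder.Default.estring ~loc repeated

let extension =
  Extension.declare
    "my_extension"
    Extension.Context.expression
    Ast_pattern.(single_expr_payload (pexp_tuple ((eint __) ^:: (estring __) ^:: nil)))
    expand_function
</code></pre>
<p>Payloads that are not a pair of an integer and a string constant are rejected by the pattern, so the
expander never has to handle them.</p>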
<p>We can also specify that our extension should take something other than an expression as a payload,
say a pattern with no <code>when</code> clause so that it's applied as <code>[%my_ext? some_pattern_payload]</code>:</p>
<pre><code><span class="ocaml-constant-language-capital-identifier">Ast_pattern</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">(</span><span class="ocaml-source">ppat</span><span class="ocaml-source"> </span><span class="ocaml-source">__</span><span class="ocaml-source"> </span><span class="ocaml-source">none</span><span class="ocaml-source">)</span><span class="ocaml-source">
</span></code></pre>
<p>or no payload at all and it should just be invoked as <code>[%my_ext]</code>:</p>
<pre><code><span class="ocaml-constant-language-capital-identifier">Ast_pattern</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">(</span><span class="ocaml-source">pstr</span><span class="ocaml-source"> </span><span class="ocaml-source">nil</span><span class="ocaml-source">)</span><span class="ocaml-source">
</span></code></pre>
<p>You should play with <code>Ast_pattern</code> a bit if you need to express complex patterns as I think it's
the only way to get the hang of it.</p>
<h3>Writing a deriver</h3>
<p>Registering a deriver is slightly different from registering an extension but in the end it remains
relatively simple and you will still have to provide the actual implementation in the form of an
<code>expand</code> function.</p>
<p>The typical <code>my_ppx_deriver.ml</code> will look like:</p>
<pre><code><span class="ocaml-keyword-other">open</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Ppxlib</span><span class="ocaml-source">
</span><span class="ocaml-source">
</span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">str_type_decl_generator</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-constant-language-capital-identifier">Deriving</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-constant-language-capital-identifier">Generator</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">make_no_arg</span><span class="ocaml-source">
</span><span class="ocaml-source">    ~</span><span class="ocaml-source">attributes</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-source">expand_str</span><span class="ocaml-source">
</span><span class="ocaml-source">
</span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">sig_type_decl_generator</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-constant-language-capital-identifier">Deriving</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-constant-language-capital-identifier">Generator</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">make_no_arg</span><span class="ocaml-source">
</span><span class="ocaml-source">    ~</span><span class="ocaml-source">attributes</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-source">expand_sig</span><span class="ocaml-source">
</span><span class="ocaml-source">
</span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">my_deriver</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-constant-language-capital-identifier">Deriving</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">add</span><span class="ocaml-source">
</span><span class="ocaml-source">    ~</span><span class="ocaml-source">str_type_decl</span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source">str_type_decl_generator</span><span class="ocaml-source">
</span><span class="ocaml-source">    ~</span><span class="ocaml-source">sig_type_decl</span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source">sig_type_decl_generator</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-string-quoted-double">my_deriver</span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-source">
</span></code></pre>
<p>You'll need to compile this with the following <code>library</code> stanza:</p>
<pre><code>(library
 (public_name my_ppx)
 (kind ppx_deriver)
 (libraries ppxlib))
</code></pre>
<p>The <code>Deriving.add</code> function is declared as:</p>
<pre><code><span class="ocaml-keyword-other">val</span><span class="ocaml-source"> </span><span class="ocaml-source">add</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source">  ?</span><span class="ocaml-source">str_type_decl</span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source">(</span><span class="ocaml-source">structure</span><span class="ocaml-keyword-other-ocaml punctuation-comma punctuation-separator">,</span><span class="ocaml-source"> </span><span class="ocaml-source">rec_flag</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">*</span><span class="ocaml-source"> </span><span class="ocaml-source">type_declaration</span><span class="ocaml-source"> </span><span class="ocaml-source">list</span><span class="ocaml-source">)</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Generator</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">t</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> ?</span><span class="ocaml-source">str_type_ext</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source">(</span><span class="ocaml-source">structure</span><span class="ocaml-keyword-other-ocaml punctuation-comma punctuation-separator">,</span><span class="ocaml-source"> </span><span class="ocaml-source">type_extension</span><span class="ocaml-source">                  </span><span class="ocaml-source">)</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Generator</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">t</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> ?</span><span class="ocaml-source">str_exception</span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source">(</span><span class="ocaml-source">structure</span><span class="ocaml-keyword-other-ocaml punctuation-comma punctuation-separator">,</span><span class="ocaml-source"> </span><span class="ocaml-source">extension_constructor</span><span class="ocaml-source">           </span><span class="ocaml-source">)</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Generator</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">t</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> ?</span><span class="ocaml-source">sig_type_decl</span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source">(</span><span class="ocaml-source">signature</span><span class="ocaml-keyword-other-ocaml punctuation-comma punctuation-separator">,</span><span class="ocaml-source"> </span><span class="ocaml-source">rec_flag</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">*</span><span class="ocaml-source"> </span><span class="ocaml-source">type_declaration</span><span class="ocaml-source"> </span><span class="ocaml-source">list</span><span class="ocaml-source">)</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Generator</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">t</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> ?</span><span class="ocaml-source">sig_type_ext</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source">(</span><span class="ocaml-source">signature</span><span class="ocaml-keyword-other-ocaml punctuation-comma punctuation-separator">,</span><span class="ocaml-source"> </span><span class="ocaml-source">type_extension</span><span class="ocaml-source">                  </span><span class="ocaml-source">)</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Generator</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">t</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> ?</span><span class="ocaml-source">sig_exception</span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source">(</span><span class="ocaml-source">signature</span><span class="ocaml-keyword-other-ocaml punctuation-comma punctuation-separator">,</span><span class="ocaml-source"> </span><span class="ocaml-source">extension_constructor</span><span class="ocaml-source">           </span><span class="ocaml-source">)</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Generator</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">t</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> ?</span><span class="ocaml-source">extension</span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source">(</span><span class="ocaml-source">loc</span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-constant-language-capital-identifier">Location</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">t</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-source">path</span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-support-type">string</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-source">core_type</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-source">expression</span><span class="ocaml-source">)</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-support-type">string</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-source">t</span><span class="ocaml-source">
</span></code></pre>
<p>It takes a mandatory string argument, here <code>"my_deriver"</code>, which defines how
users are going to invoke your deriver. In this case they'd need to add <code>[@@deriving my_deriver]</code> to
a type declaration in a structure or a signature to use it.
Then there's one optional argument per kind of node to which you can attach a <code>[@@deriving ...]</code>
attribute: <code>type_decl</code> corresponds to <code>type = ...</code>, <code>type_ext</code> to <code>type += ...</code> and <code>exception</code> to
<code>exception My_exc of ...</code>.
You only need to provide generators for the ones you wish your deriver to handle; <code>ppxlib</code>
will make sure users get a compile error if they try to use it elsewhere.
We can ignore the <code>extension</code> argument as it's only there for compatibility with <code>ppx_deriving</code>.</p>
<p>Now let's take a look at <code>Generator</code>. Its type is defined as <code>('output_ast, 'input_ast) t</code>, where
<code>'input_ast</code> is the type of the node to which the <code>[@@deriving ...]</code> attribute is attached and <code>'output_ast</code>
is the type of the nodes it should produce, i.e. either a <code>structure</code> or a <code>signature</code>. The type of a
generator depends on the expand function it's built from. When you use the smart constructor
<code>make_no_arg</code>, the expand function should have type
<code>loc: Location.t -&gt; path: string -&gt; 'input_ast -&gt; 'output_ast</code>. This function is the actual
implementation of your deriver and will generate the list of <code>structure_item</code> or <code>signature_item</code>
from the type declaration.</p>
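<p>To make that type concrete, here is a minimal sketch of an expand function for the <code>str_type_decl</code> case.
The name <code>expand_str</code> and the generated <code>foo_name</code> values are purely illustrative assumptions;
only the <code>Ast_builder.Default</code> helpers and <code>Deriving.Generator.make_no_arg</code> come from <code>ppxlib</code>:</p>
<pre><code>open Ppxlib

(* Illustrative expand function: for each [type foo = ...] it generates
   [let foo_name = "foo"]. A real deriver would inspect the declaration. *)
let expand_str ~loc ~path:_
    ((_rec_flag, type_decls) : rec_flag * type_declaration list) : structure =
  List.map
    (fun (td : type_declaration) -&gt;
      let type_name = td.ptype_name.txt in
      let pat = Ast_builder.Default.pvar ~loc (type_name ^ "_name") in
      let expr = Ast_builder.Default.estring ~loc type_name in
      Ast_builder.Default.pstr_value ~loc Nonrecursive
        [ Ast_builder.Default.value_binding ~loc ~pat ~expr ])
    type_decls

(* The generator's type follows from the expand function:
   (structure, rec_flag * type_declaration list) Deriving.Generator.t *)
let str_type_decl_generator = Deriving.Generator.make_no_arg expand_str
</code></pre>
<p>Here <code>'input_ast</code> is <code>rec_flag * type_declaration list</code> and <code>'output_ast</code> is <code>structure</code>, which
matches what the <code>str_type_decl</code> argument of <code>Deriving.add</code> expects.</p>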
<h4>Compatibility with <code>ppx_import</code></h4>
<p><a href="https://github.com/ocaml-ppx/ppx_import"><code>ppx_import</code></a> is a PPX rewriter that lets you import type
definitions and spares you the need to copy and update them every time they change upstream. The
main reason to import a type this way is to derive values from it
using a deriver, which is why it's important to ensure your deriving plugin is compatible.</p>
<p>Let's take an example to illustrate how <code>ppx_import</code> is used. I'm using a library called <code>blob</code>
which exposes a type <code>Blob.t</code>. For some reason I need to be able to serialize and deserialize
<code>Blob.t</code> values to JSON. I'd like to use a deriver to do that as I don't want to maintain that code
myself. Imagine <code>Blob.t</code> is defined as:</p>
<pre><code><span class="ocaml-keyword-other">type</span><span class="ocaml-source"> </span><span class="ocaml-source">t</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-source">{</span><span class="ocaml-source"> </span><span class="ocaml-source">value</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source"> </span><span class="ocaml-support-type">string</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source"> </span><span class="ocaml-source">length</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source"> </span><span class="ocaml-support-type">int</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source"> </span><span class="ocaml-source">id</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source"> </span><span class="ocaml-support-type">int</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-source">}</span><span class="ocaml-source">
</span></code></pre>
<p>Without <code>ppx_import</code> I would define somewhere a <code>serializable_blob</code> type as follows:</p>
<pre><code><span class="ocaml-keyword-other">type</span><span class="ocaml-source"> </span><span class="ocaml-source">serializable_blob</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Blob</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">t</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-source">{</span><span class="ocaml-source"> </span><span class="ocaml-source">value</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source"> </span><span class="ocaml-support-type">string</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source"> </span><span class="ocaml-source">length</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source"> </span><span class="ocaml-support-type">int</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source"> </span><span class="ocaml-source">id</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source"> </span><span class="ocaml-support-type">int</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-source">}</span><span class="ocaml-source">
</span><span class="ocaml-source">[</span><span class="ocaml-keyword-operator-attribute">@@</span><span class="ocaml-keyword-other-attribute">deriving</span><span class="ocaml-source"> </span><span class="ocaml-source">yojson</span><span class="ocaml-source">]</span><span class="ocaml-source">
</span></code></pre>
<p>That works well, especially because the type definition is simple, but I don't really care about
having it here: what I really want is just the <code>to_yojson</code> and <code>of_yojson</code> functions. Also, if
the type definition changes, I now have to update it here manually. Maintaining many such imports can be
tedious and duplicates a lot of code unnecessarily.</p>
<p>What I can do instead, thanks to <code>ppx_import</code>, is write it like this:</p>
<pre><code><span class="ocaml-keyword-other">type</span><span class="ocaml-source"> </span><span class="ocaml-source">serializable_blob</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-source">[</span><span class="ocaml-keyword-operator-extension">%</span><span class="ocaml-keyword-other-extension">import</span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Blob</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">t</span><span class="ocaml-source">]</span><span class="ocaml-source">
</span><span class="ocaml-source">[</span><span class="ocaml-keyword-operator-attribute">@@</span><span class="ocaml-keyword-other-attribute">deriving</span><span class="ocaml-source"> </span><span class="ocaml-source">yojson</span><span class="ocaml-source">]</span><span class="ocaml-source">
</span></code></pre>
<p>which will ultimately be expanded into the above using <code>Blob</code>'s definition of the type <code>t</code>.</p>
<p>Now <code>ppx_import</code> works a bit differently from regular PPX rewriters as it needs a bit more information
than just the AST. We don't need to understand how it works but what it means is that if your
deriving plugin is used with <code>ppx_import</code>, it will be called twice:</p>
<ul>
<li>A first time with <code>ocamldep</code>. This is required to determine the dependencies of a module in terms
of other OCaml modules. PPX-es need to be applied here to find out about dependencies they may
introduce.</li>
<li>A second time before actually compiling the code.</li>
</ul>
<p>The issue here is that during the <code>ocamldep</code> pass, <code>ppx_import</code> doesn't have the information it
needs to import the type definition yet so it can't copy it and it expands:</p>
<pre><code><span class="ocaml-keyword-other">type</span><span class="ocaml-source"> </span><span class="ocaml-source">u</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-source">[</span><span class="ocaml-keyword-operator-extension">%</span><span class="ocaml-keyword-other-extension">import</span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">A</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">t</span><span class="ocaml-source">]</span><span class="ocaml-source">
</span></code></pre>
<p>into:</p>
<pre><code><span class="ocaml-keyword-other">type</span><span class="ocaml-source"> </span><span class="ocaml-source">u</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">A</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">t</span><span class="ocaml-source">
</span></code></pre>
<p>Only during the second pass will it actually expand it to the copied type definition.</p>
<p>This may be a concern if your deriving plugin can't apply to abstract types because you will
probably raise an error when encountering one, meaning the first phase will fail and the whole
compilation will fail without giving your rewriter a chance to derive anything from the copied
type definition.</p>
<p>The right way to deal with this is to have a different behaviour in the context of <code>ocamldep</code>.
In this case you can ignore such type declarations or, if you know you are going to
inject new dependencies in your generated code, create dummy values referencing them, and just
behave normally in any other context.</p>
<p><code>ppxlib</code> versions <code>0.6.0</code> and higher allow you to do so through the <code>Deriving.Generator.V2</code> API
which passes an abstract <code>ctxt</code> value to your <code>expand</code> function instead of a <code>loc</code> and a <code>path</code>.
You can tell whether it is the <code>ocamldep</code> pass from within the <code>expand</code> function like this:</p>
<pre><code><span class="ocaml-keyword-other">open</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Ppxlib</span><span class="ocaml-source">
</span><span class="ocaml-source">
</span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">expand</span><span class="ocaml-source"> </span><span class="ocaml-source">~</span><span class="ocaml-source">ctxt</span><span class="ocaml-source"> </span><span class="ocaml-source">input_ast</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">omp_config</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Expansion_context</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-constant-language-capital-identifier">Deriver</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">omp_config</span><span class="ocaml-source"> </span><span class="ocaml-source">ctxt</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">in</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">is_ocamldep_pass</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">String</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">equal</span><span class="ocaml-source"> </span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-string-quoted-double">ocamldep</span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-source"> </span><span class="ocaml-source">omp_config</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-constant-language-capital-identifier">Migrate_parsetree</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-constant-language-capital-identifier">Driver</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">tool_name</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">in</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">
</span></code></pre>
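<p>As a rough sketch of how this can fit together, the expand function below generates nothing during the
<code>ocamldep</code> pass and is registered through <code>Deriving.Generator.V2.make_no_arg</code>. It assumes a more
recent <code>ppxlib</code> in which the tool name is read with <code>Expansion_context.Deriver.tool_name</code> rather than
through <code>omp_config</code>; the name <code>expand_str</code> and the empty placeholder output are illustrative only:</p>
<pre><code>open Ppxlib

(* Hypothetical V2 expand function. During the ocamldep pass it returns an
   empty structure, so the abstract placeholder left by ppx_import never
   triggers an error. *)
let expand_str ~ctxt
    ((_rec_flag, _type_decls) : rec_flag * type_declaration list) : structure =
  (* Assumption: [tool_name] is available in the installed ppxlib version. *)
  let tool_name = Expansion_context.Deriver.tool_name ctxt in
  if String.equal "ocamldep" tool_name then []
  else
    (* Placeholder: the regular code generation would go here. *)
    []

let str_type_decl_generator = Deriving.Generator.V2.make_no_arg expand_str
</code></pre>
<p>If your generated code introduces new module dependencies, returning an empty structure is not enough:
the <code>ocamldep</code> branch should then produce dummy values referencing those modules, as described above.</p>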
<h4>Deriver attributes</h4>
<p>You'll have noted the <code>attributes</code> parameter in the examples. It's an optional parameter that lets
you define which attributes your deriver allows the user to attach to various bits of the type,
type extension or exception declaration it is applied to.</p>
<p><code>ppxlib</code> comes with an <code>Attribute</code> module that lets you properly declare the attributes you want
to allow and makes sure they are properly used: correctly spelled, correctly placed and with the right
payload attached. This is especially useful since attributes are ignored by the compiler by default,
meaning that without <code>ppxlib</code>'s care, plugin users wouldn't get any errors if they misused an attribute,
and it might take them a while to figure out they got it wrong and the generated code wasn't
impacted as they hoped.
The <code>Attribute</code> module offers another great feature: <code>Attribute.t</code> values can be used to extract the
attribute payload from an AST node if it is present. That spares you the need to
inspect attributes yourself, which can prove quite tedious.</p>
<p><code>Ppxlib.Attribute.t</code> is defined as <code>('context, 'payload) t</code> where <code>'context</code> describes to which node
the attribute can be attached and <code>'payload</code>, the type of its payload.
To build such an attribute you must use <code>Ppxlib.Attribute.declare</code>:</p>
<pre><code><span class="ocaml-keyword-other">val</span><span class="ocaml-source"> </span><span class="ocaml-source">declare</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source">  </span><span class="ocaml-support-type">string</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-storage-type">'a</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Context</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">t</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-source">payload</span><span class="ocaml-keyword-other-ocaml punctuation-comma punctuation-separator">,</span><span class="ocaml-source"> </span><span class="ocaml-storage-type">'b</span><span class="ocaml-keyword-other-ocaml punctuation-comma punctuation-separator">,</span><span class="ocaml-source"> </span><span class="ocaml-storage-type">'c</span><span class="ocaml-source">)</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Ast_pattern</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">t</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-storage-type">'b</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-storage-type">'a</span><span class="ocaml-keyword-other-ocaml punctuation-comma punctuation-separator">,</span><span class="ocaml-source"> </span><span class="ocaml-storage-type">'c</span><span class="ocaml-source">)</span><span class="ocaml-source"> </span><span class="ocaml-source">t</span><span class="ocaml-source">
</span></code></pre>
<p>Let's try to declare the <code>default</code> attribute from <code>ppx_deriving_yojson</code> I mentioned earlier.</p>
<p>The first <code>string</code> argument is the attribute name. <code>ppxlib</code> supports namespaces for attributes so
that users can avoid conflicts between the attributes of various derivers applied to the same type
definitions. For instance, here we could use <code>"default"</code>, but it can prove helpful to use a more qualified
name such as <code>"ppx_deriving_yojson.of_yojson.default"</code>. That means that our attribute can be used as
<code>[@default ...]</code>, <code>[@of_yojson.default ...]</code> or <code>[@ppx_deriving_yojson.of_yojson.default ...]</code>.
Now if another deriver uses a <code>[@default ...]</code> attribute, users can apply both derivers and provide different
<code>default</code> values to the different derivers by writing:</p>
<pre><code><span class="ocaml-keyword-other">type</span><span class="ocaml-source"> </span><span class="ocaml-source">t</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-source">{</span><span class="ocaml-source"> </span><span class="ocaml-source">a</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source"> </span><span class="ocaml-support-type">int</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source"> </span><span class="ocaml-source">b</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source"> </span><span class="ocaml-support-type">string</span><span class="ocaml-source"> </span><span class="ocaml-source">[</span><span class="ocaml-keyword-operator-attribute">@</span><span class="ocaml-keyword-other-attribute">make.default</span><span class="ocaml-source"> </span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-string-quoted-double">abc</span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-source">]</span><span class="ocaml-source"> </span><span class="ocaml-source">[</span><span class="ocaml-keyword-operator-attribute">@</span><span class="ocaml-keyword-other-attribute">of_yojson.default</span><span class="ocaml-source"> </span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-source">]</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-source">}</span><span class="ocaml-source">
</span><span class="ocaml-source">[</span><span class="ocaml-keyword-operator-attribute">@@</span><span class="ocaml-keyword-other-attribute">deriving</span><span class="ocaml-source"> </span><span class="ocaml-source">make</span><span class="ocaml-keyword-other-ocaml punctuation-comma punctuation-separator">,</span><span class="ocaml-source">of_yojson</span><span class="ocaml-source">]</span><span class="ocaml-source">
</span></code></pre>
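<p>As a rough illustration of the namespacing rule, the standalone sketch below lists the names under which a
qualified attribute name could be written, assuming that every dot-separated suffix is accepted.
<code>accepted_names</code> is a hypothetical helper, not part of <code>ppxlib</code>, whose actual matching logic
may involve more rules:</p>
<pre><code>(* Hypothetical helper: every dot-separated suffix of a qualified attribute
   name, from the most qualified form to the shortest one. *)
let accepted_names qualified =
  let rec suffixes = function
    | [] -&gt; []
    | _ :: rest as parts -&gt; String.concat "." parts :: suffixes rest
  in
  suffixes (String.split_on_char '.' qualified)

let () =
  List.iter print_endline
    (accepted_names "ppx_deriving_yojson.of_yojson.default")
</code></pre>
<p>This is why a short name is convenient when a single deriver is in use, while the longer forms let users
disambiguate when two derivers declare an attribute with the same last component.</p>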
<p>The context argument works very similarly to the one in <code>Extension.declare</code>. Here we want the
attribute to be attached to record field declarations so we'll use
<code>Attribute.Context.label_declaration</code> which has type <code>label_declaration Attribute.Context.t</code>.</p>
<p>The pattern argument is an <code>Ast_pattern.t</code>. Now that we know how to work with those, this is pretty
easy. We need to accept any expression as a payload, since we should be able to apply the
<code>default</code> attribute to any field regardless of its type. We also want to extract that expression from
the payload so we can use it in our deserializer, so let's use
<code>Ast_pattern.(single_expr_payload __)</code>.</p>
<p>Finally the last <code>'b</code> argument has the same type as the pattern consumer function. We can use it to
transform what we extracted using the previous <code>Ast_pattern</code> but in this case we just want to
keep the expression as we got it so we'll just use the identity function here.</p>
<p>We end up with the following:</p>
<pre><code><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">default_attribute</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-constant-language-capital-identifier">Attribute</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">declare</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-string-quoted-double">ppx_deriving_yojson.of_yojson.default</span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-constant-language-capital-identifier">Attribute</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-constant-language-capital-identifier">Context</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">label_declaration</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-constant-language-capital-identifier">Ast_pattern</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">(</span><span class="ocaml-source">single_expr_payload</span><span class="ocaml-source"> </span><span class="ocaml-source">__</span><span class="ocaml-source">)</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-source">(</span><span class="ocaml-keyword-other">fun</span><span class="ocaml-source"> </span><span class="ocaml-source">expr</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-source">expr</span><span class="ocaml-source">)</span><span class="ocaml-source">
</span></code></pre>
<p>and that gives us a <code>(label_declaration, expression) Attribute.t</code>.</p>
<p>You can then use it to collect the attribute payload from a label_declaration:</p>
<pre><code><span class="ocaml-constant-language-capital-identifier">Attribute</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">get</span><span class="ocaml-source"> </span><span class="ocaml-source">default_attribute</span><span class="ocaml-source"> </span><span class="ocaml-source">label_decl</span><span class="ocaml-source">
</span></code></pre>
<p>which will return <code>Some expr</code> if the attribute was attached to <code>label_decl</code> or <code>None</code> otherwise.</p>
<p>Because of their polymorphic nature, attributes need to be packed, i.e. wrapped with a variant
to hide the type parameter, so if you want to pass it to <code>Generator.make_noarg</code> you'll have to do
it like this:</p>
<pre><code><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">attributes</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-source">[</span><span class="ocaml-constant-language-capital-identifier">Attribute</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-constant-language-capital-identifier">T</span><span class="ocaml-source"> </span><span class="ocaml-source">default_attribute</span><span class="ocaml-source">]</span><span class="ocaml-source">
</span></code></pre>
<h3>Writing your expand functions</h3>
<p>In the last two sections I mentioned <code>expand</code> functions that would contain the actual <code>deriver</code> or
<code>extension</code> implementation but didn't actually say anything about how to write them. It will
depend a lot on the purpose of your PPX rewriter and what you're trying to achieve.</p>
<p>Before writing your PPX you should clearly specify what it should be applied to and what code it
should produce. That will help you declare the right deriving or extension rewriter, and from there
you'll know the type of the <code>expand</code> functions you have to write.</p>
<p>A good way to proceed is to use the <code>dumpast</code> tool to pretty-print the AST fragments of both the
input of your expander and the output, i.e. the code it should generate. To take a concrete example,
say you want to write a deriving plugin that generates an <code>equal</code> function from a type definition.
You can start by running <code>dumpast</code> on the following file:</p>
<pre><code><span class="ocaml-keyword-other">type</span><span class="ocaml-source"> </span><span class="ocaml-source">some_record</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-source">{</span><span class="ocaml-source"> </span><span class="ocaml-source">a</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source"> </span><span class="ocaml-support-type">int64</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source"> </span><span class="ocaml-source">b</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source"> </span><span class="ocaml-support-type">string</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-source">}</span><span class="ocaml-source">
</span><span class="ocaml-source">
</span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">equal_some_record</span><span class="ocaml-source"> </span><span class="ocaml-source">r</span><span class="ocaml-source"> </span><span class="ocaml-source">r'</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Int64</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">equal</span><span class="ocaml-source"> </span><span class="ocaml-source">r</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">a</span><span class="ocaml-source"> </span><span class="ocaml-source">r'</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">a</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">&amp;&amp;</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">String</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">equal</span><span class="ocaml-source"> </span><span class="ocaml-source">r</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">b</span><span class="ocaml-source"> </span><span class="ocaml-source">r'</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">b</span><span class="ocaml-source">
</span></code></pre>
<p>That will give you the AST representation of a record type definition and the equal function you
want to write so you can figure out how to deconstruct your expander's input to be able to generate
the right output.</p>
<p><code>ppxlib</code> exposes smart constructors in <code>Ppxlib.Ast_builder.Default</code> to help you build AST fragments
without having to care too much about attributes and similar fields, as well as some convenience constructors
to keep your code concise and readable.</p>
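<p>As a rough sketch of what that looks like, here is one way to build the <code>Int64.equal r.a r'.a</code>
expression from the example above with those constructors. The <code>equal_field_a</code> and <code>field</code> names
are made up for illustration, and a real deriver would take the field names from the type declaration
instead of hardcoding them:</p>
<pre><code>open Ppxlib

(* Builds the expression [Int64.equal r.a r'.a], located at [loc]. *)
let equal_field_a ~loc =
  let open Ast_builder.Default in
  (* [field "r" "a"] builds the field access [r.a]. *)
  let field record name =
    pexp_field ~loc (evar ~loc record) { txt = Lident name; loc }
  in
  eapply ~loc (evar ~loc "Int64.equal") [ field "r" "a"; field "r'" "a" ]
</code></pre>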
<p>Another convenience tool <code>ppxlib</code> exposes to help you build AST fragments is <code>metaquot</code>. I recently
wrote a bit of documentation about it
<a href="https://ppxlib.readthedocs.io/en/latest/ppx-for-plugin-authors.html#metaquot">here</a> which you
should take a look at but to sum it up <code>metaquot</code> is a PPX extension allowing you to write AST nodes
using the OCaml syntax they describe instead of the AST types.</p>
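<p>To give an idea, the same kind of expression could be written with <code>metaquot</code> roughly as follows.
This is only a sketch: it assumes the file is preprocessed with <code>ppxlib.metaquot</code>, and the
<code>equal_field_a</code> and <code>conj</code> names are hypothetical:</p>
<pre><code>open Ppxlib

(* metaquot picks up the [loc] value in scope, so we take it as an argument. *)
let equal_field_a ~loc = [%expr Int64.equal r.a r'.a]

(* The [%e ...] anti-quotation splices existing expressions into the quoted code. *)
let conj ~loc left right = [%expr [%e left] &amp;&amp; [%e right]]
</code></pre>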
<h4>Handling code locations in a PPX rewriter</h4>
<p>When building AST fragments you should keep in mind that you have to set their <code>location</code>. Locations
are part of the AST values and describe the position of the corresponding node in your source
file, including the file name and the line number and offset of both the beginning and the end of the
piece of code they represent.</p>
<p>Because your code was generated after the file was parsed, it doesn't have a location so you need to
set it yourself. One could think that it doesn't matter and we could use a dummy location, but
locations are used by the compiler to properly report errors. That's why a PPX rewriter should care
about how it locates the generated code: it will help the end user understand whether the error
comes from their code or from generated code, and how to fix it.</p>
<p>Both <code>Ast_builder</code> and <code>metaquot</code> expect a location. The first explicitly takes it as a labelled
<code>loc</code> argument while the second relies on a <code>loc</code> value being available in the scope. It is
important to set those with care as errors in the generated code don't necessarily mean that your
rewriter is buggy. There are valid cases where your rewriter functioned as intended but the generated
code triggers an error. PPX-es often work on the assumption that some values are available in the
scope; if the user doesn't properly provide those, it's their responsibility to fix the error. To
help them do so, it is important to properly locate the generated code to guide them as much as
possible.</p>
<p>When writing extensions, using the whole extension point location for the generated code makes
perfect sense as that's where the code will sit. That's fairly easy as this is what <code>ppxlib</code> passes
to the expand function through the <code>loc</code> labelled argument. For deriving plugins it's a bit different
as the generated code doesn't replace an existing part of the parsed AST but generates a new one to insert.
Currently <code>ppxlib</code> gives you the <code>loc</code> of the whole type declaration, extension or exception
declaration your deriving plugin is applied to. Ideally it would be nice to be able to locate the
generated code on the plugin name in the <code>deriving</code> attribute payload, i.e. here:</p>
<pre><code>[@@deriving my_plugin,another_plugin]
            ^^^^^^^^^
</code></pre>
<p>I'm currently working on making that location available to the <code>expand</code> function. In the meantime,
you should choose a convention. I personally locate all the generated code on the
type declaration. Some choose to locate the generated code on the part of the input AST they're
handling when generating it.</p>
<h4>Reporting errors to your rewriter users</h4>
<p>You won't always be able to handle all the AST nodes passed to your expand functions, either because the
end user misused your rewriter or because there are some cases you simply can't deal with.</p>
<p>In those cases you can report the error to the user with <code>Ppxlib.Location.raise_errorf</code>. It works
similarly to <code>printf</code> and you can build your error message from a format string and extra
arguments. It will then raise an exception which will be caught and reported by the compiler.
A good practice is to prefix the error message with the name of your rewriter to help users understand
what's going on, especially with deriving plugins as they might use several of them on the same type
declaration.</p>
<p>Another point to take care of here is, again, locations. <code>raise_errorf</code> takes a labelled <code>loc</code>
argument. It is used so that your error is reported like any compiler error. Having good locations in
those error messages is just as important as sending clear error messages. Keep in mind that both
the errors you report yourself and the errors coming from your generated code will be highlighted by
merlin so when properly set they make it much easier to work with your PPX rewriter.</p>
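<p>Here is a minimal sketch of that kind of error reporting, assuming a hypothetical deriver called
<code>my_plugin</code> that only supports record types. The <code>record_fields_exn</code> helper is made up for
the example and the error is located on the whole type declaration:</p>
<pre><code>open Ppxlib

(* Returns the fields of a record type declaration or reports a located,
   prefixed error for any other kind of type declaration. *)
let record_fields_exn (td : type_declaration) =
  match td.ptype_kind with
  | Ptype_record fields -&gt; fields
  | Ptype_abstract | Ptype_variant _ | Ptype_open -&gt;
    Location.raise_errorf ~loc:td.ptype_loc
      "my_plugin: cannot derive for %s, only record types are supported"
      td.ptype_name.txt
</code></pre>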
<h3>Testing your PPX</h3>
<p>Just as most pieces of code do, a PPX deserves to be tested and it has become easier over the years to
test rewriters.</p>
<p>I personally tend to write as many unit tests as possible for my PPX-es' internal libraries. I try to
extract helper functions that can easily be unit-tested but I can't test everything that way.
Testing the <code>ast -&gt; ast</code> functions would be tedious as <code>ppxlib</code> and <code>ocaml-migrate-parsetree</code>
don't provide comparison and pretty printing functions that you can use with <code>alcotest</code> or <code>oUnit</code>.
That means you'd have to import the AST types and derive them on your own. That would mean a lot
of boilerplate and, even if those functions were exposed, writing such tests would be really
tedious. There are a lot of things to take into account. How are you going to build the input AST values,
for instance? If you use <code>metaquot</code>, every node will share the same loc, making it hard to test
that your errors are properly located. If you don't, you will end up with insanely long and
unreadable test code or fixtures.
While that would allow extremely accurate testing for the generated code and errors, it will almost
certainly make your test code unmaintainable, at least given the current tooling.</p>
<p>Don't panic, there is a very good and simple alternative. <code>ppxlib</code> makes it very easy to build a
binary that will parse OCaml code, preprocess the AST with your rewriter and spit it out, formatted as
code again.</p>
<p>You just have to write the following <code>pp.ml</code>:</p>
<pre><code><span class="ocaml-keyword-other">let</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-unit">()</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Ppxlib</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-constant-language-capital-identifier">Driver</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">standalone</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-unit">()</span><span class="ocaml-source">
</span></code></pre>
<p>and build the binary with the following <code>dune</code> stanza, assuming your rewriter is called
<code>my_ppx_rewriter</code>:</p>
<pre><code>(executable
 (name pp)
 (modules pp)
 (libraries my_ppx_rewriter ppxlib))
</code></pre>
<p>Because we're humans and the OCaml syntax is meant for us to write and read, it makes for much better
test input/output. You can now write your test input in a regular <code>.ml</code> file, use the <code>pp.exe</code>
binary to "apply" your preprocessor to it and compare the output with another <code>.ml</code> file containing
the code you expect it to generate. This kind of test pattern is really well supported by <code>dune</code>
thanks to the <code>diff</code> user action.</p>
<p>I usually have the following files in a <code>rewriter</code>/<code>deriver</code> folder within my test directory:</p>
<pre><code>test/rewriter/
├── dune
├── test.expected.ml
├── pp.ml
└── test.ml
</code></pre>
<p>Where <code>pp.ml</code> is used to produce the rewriter binary, <code>test.ml</code> contains the input OCaml code and
<code>test.expected.ml</code> the result of preprocessing <code>test.ml</code>. The dune file content is generally similar
to this:</p>
<pre><code>(executable
 (name pp)
 (modules pp)
 (libraries my_ppx_rewriter ppxlib))

(rule
 (targets test.actual.ml)
 (deps (:pp pp.exe) (:input test.ml))
 (action (run ./%{pp} -deriving-keep-w32 both --impl %{input} -o %{targets})))

(alias
 (name runtest)
 (action (diff test.expected.ml test.actual.ml)))

(test
  (name test)
  (modules test)
  (preprocess (pps my_ppx_rewriter)))
</code></pre>
<p>The first stanza is the one I already introduced above and specifies how to build the rewriter binary.</p>
<p>The <code>rule</code> stanza that comes after that tells <code>dune</code> how to produce the actual test output by
applying the rewriter binary to <code>test.ml</code>. You probably noticed the <code>-deriving-keep-w32 both</code> CLI
option passed to <code>pp.exe</code>. By default, <code>ppxlib</code> will generate values or add attributes so that your
generated code doesn't trigger an "Unused value" warning. This is useful in real-life situations but
here it would just pollute the test output and make it harder to read so we disable that feature.</p>
<p>The following <code>alias</code> stanza is where all the magic happens. Running <code>dune runtest</code> will now
generate <code>test.actual.ml</code> and compare it to <code>test.expected.ml</code>. It will not only do that but show
you how they differ from each other in a diff format. You can then automatically update
<code>test.expected.ml</code> if you're happy with the results by running <code>dune promote</code>.</p>
<p>Finally the last <code>test</code> stanza is there to ensure that the generated code compiles without type
errors.</p>
<p>This makes a very convenient test setup to write your PPX-es TDD style. You can start by writing an
identity PPX that just returns its input AST as it is. Then you add some OCaml code using your
soon-to-be PPX in <code>test.ml</code> and run <code>dune runtest --auto-promote</code> to prefill <code>test.expected.ml</code>.
From there you can start implementing your rewriter and run <code>dune runtest</code> to check on your progress
and update the expected result with <code>dune promote</code>.
Going pure TDD by writing the test first works, but it's tricky because you'd have to format your code the
same way <code>pp.exe</code> will format the AST. It would be great to be able to specify how to format
the generated <code>test.actual.ml</code> so that this approach would be more viable and the diff more
readable. Being able to use ocamlformat with a very diff-friendly configuration would be great
there. <code>pp.exe</code> seems to offer CLI options to change the code style such as <code>-styler</code> but I haven't
had the chance to experiment with those yet.</p>
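<p>For a deriving plugin, such an identity starting point could look like the sketch below. It assumes a
hypothetical plugin named <code>my_plugin</code> applied to type declarations in implementation files. It
generates no code at all, so the input goes through the rewriter unchanged:</p>
<pre><code>open Ppxlib

(* Generates an empty structure: nothing is added after the type declaration. *)
let expand ~loc:_ ~path:_ (_ : rec_flag * type_declaration list) : structure = []

let str_type_decl = Deriving.Generator.make_noarg expand

(* Registers the deriver so that [@@deriving my_plugin] is accepted. *)
let my_plugin = Deriving.add "my_plugin" ~str_type_decl
</code></pre>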
<p>Now you can test successful rewriting this way but what about errors? There's a lot of value in
ensuring you produce the right errors at the right code location because that's the kind of
thing you can get wrong when refactoring your rewriter code or when people try to contribute.
That isn't as likely to happen if your CI yells when you break the error reporting. So how do we do
that?</p>
<p>Well, pretty much the exact same way! We write a file with an erroneous invocation of our rewriter,
run <code>pp.exe</code> on it and compare stderr with what we expect it to be.
There are two major differences here. First, we want to collect the stderr output of the rewriter
binary instead of using it to generate a file. Second, we can't write all of our test
cases in a single file since <code>pp.exe</code> will stop at the first error. That means we need one <code>.ml</code>
file per error test case.
Luckily for us, dune offers ways to do both.</p>
<p>For every error test file we will want to add the following stanzas:</p>
<pre><code>(rule
  (targets test_error.actual)
  (deps (:pp pp.exe) (:input test_error.ml))
  (action
    (with-stderr-to
      %{targets}
      (bash "./%{pp} -no-color --impl %{input} || true")
    )
  )
)

(alias
  (name runtest)
  (action (diff test_error.expected test_error.actual))
)
</code></pre>
<p>but obviously we don't want to do that by hand every time we add a new test case so we're gonna need
a script to generate those stanzas and then include them into our <code>dune</code> file using
<code>(include dune.inc)</code>.</p>
<p>To achieve that while keeping things as clean as possible I use the following directory structure:</p>
<pre><code>test/rewriter/
├── errors
│&nbsp;&nbsp; ├── dune
│&nbsp;&nbsp; ├── dune.inc
│&nbsp;&nbsp; ├── gen_dune_rules.ml
│&nbsp;&nbsp; ├── pp.ml
│&nbsp;&nbsp; ├── test_some_error.expected
│&nbsp;&nbsp; ├── test_some_error.ml
│&nbsp;&nbsp; ├── test_some_other_error.expected
│&nbsp;&nbsp; └── test_some_other_error.ml
├── dune
├── test.expected.ml
├── pp.ml
└── test.ml
</code></pre>
<p>Compared to our previous setup, we only added the new <code>errors</code> folder. To keep things simple it has
its own <code>pp.ml</code> copy but in the future I'd like to improve it a bit and be able to use the same
<code>pp.exe</code> binary.</p>
<p>The most important files here are <code>gen_dune_rules.ml</code> and <code>dune.inc</code>. The first is just a simple
OCaml script to generate the above stanzas for each test case in the <code>errors</code> directory. The second
is the file we'll include in the main <code>dune</code>. It's also the file to which we'll write the generated
stanzas.</p>
<p>I personally use the following <code>gen_dune_rules.ml</code>:</p>
<pre><code><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">output_stanzas</span><span class="ocaml-source"> </span><span class="ocaml-source">filename</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">base</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Filename</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">remove_extension</span><span class="ocaml-source"> </span><span class="ocaml-source">filename</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">in</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-constant-language-capital-identifier">Printf</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">printf</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-string-quoted-braced">{|</span><span class="ocaml-string-quoted-braced">
</span><span class="ocaml-string-quoted-braced">(library
</span><span class="ocaml-string-quoted-braced">  (name %s)
</span><span class="ocaml-string-quoted-braced">  (modules %s)
</span><span class="ocaml-string-quoted-braced">  (preprocess (pps ppx_yojson))
</span><span class="ocaml-string-quoted-braced">)
</span><span class="ocaml-string-quoted-braced">
</span><span class="ocaml-string-quoted-braced">(rule
</span><span class="ocaml-string-quoted-braced">  (targets %s.actual)
</span><span class="ocaml-string-quoted-braced">  (deps (:pp pp.exe) (:input %s.ml))
</span><span class="ocaml-string-quoted-braced">  (action
</span><span class="ocaml-string-quoted-braced">    (with-stderr-to
</span><span class="ocaml-string-quoted-braced">      %%{targets}
</span><span class="ocaml-string-quoted-braced">      (bash "./%%{pp} -no-color --impl %%{input} || true")
</span><span class="ocaml-string-quoted-braced">    )
</span><span class="ocaml-string-quoted-braced">  )
</span><span class="ocaml-string-quoted-braced">)
</span><span class="ocaml-string-quoted-braced">
</span><span class="ocaml-string-quoted-braced">(alias
</span><span class="ocaml-string-quoted-braced">  (name runtest)
</span><span class="ocaml-string-quoted-braced">  (action (diff %s.expected %s.actual))
</span><span class="ocaml-string-quoted-braced">)
</span><span class="ocaml-string-quoted-braced">|}</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-source">base</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-source">base</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-source">base</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-source">base</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-source">base</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-source">base</span><span class="ocaml-source">
</span><span class="ocaml-source">
</span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">is_error_test</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">function</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword-other">|</span><span class="ocaml-source"> </span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-string-quoted-double">pp.ml</span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-boolean">false</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword-other">|</span><span class="ocaml-source"> </span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-string-quoted-double">gen_dune_rules.ml</span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-boolean">false</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword-other">|</span><span class="ocaml-source"> </span><span class="ocaml-source">filename</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Filename</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">check_suffix</span><span class="ocaml-source"> </span><span class="ocaml-source">filename</span><span class="ocaml-source"> </span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-string-quoted-double">.ml</span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-source">
</span><span class="ocaml-source">
</span><span class="ocaml-keyword-other">let</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-unit">()</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-constant-language-capital-identifier">Sys</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">readdir</span><span class="ocaml-source"> </span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-string-quoted-double">.</span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword-operator">|&gt;</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Array</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">to_list</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword-operator">|&gt;</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">List</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">sort</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">String</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">compare</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword-operator">|&gt;</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">List</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">filter</span><span class="ocaml-source"> </span><span class="ocaml-source">is_error_test</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword-operator">|&gt;</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">List</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">iter</span><span class="ocaml-source"> </span><span class="ocaml-source">output_stanzas</span><span class="ocaml-source">
</span></code></pre>
<p>Nothing spectacular here, we just build the list of all the <code>.ml</code> files in the directory except
<code>pp.ml</code> and <code>gen_dune_rules.ml</code> itself and then generate the right stanzas for each of them. You'll
note the extra <code>library</code> stanza which I add to get dune to generate the right <code>.merlin</code> so that I
can see the error highlights when I edit the files by hand.</p>
<p>With that we're almost good, add the following to the <code>dune</code> file and you're all set:</p>
<pre><code>(executable
  (name pp)
  (modules pp)
  (libraries
    ppx_yojson
    ppxlib
  )
)

(include dune.inc)

(executable
  (name gen_dune_rules)
  (modules gen_dune_rules)
)

(rule
  (targets dune.inc.gen)
  (deps
    (:gen gen_dune_rules.exe)
    (source_tree .)
  )
  (action
    (with-stdout-to
      %{targets}
      (run %{gen})
    )
  )
)

(alias
  (name runtest)
  (action (diff dune.inc dune.inc.gen))
)
</code></pre>
<p>The first stanza is here to specify how to build the rewriter binary, same as before, while the
second stanza just tells dune to include the content of <code>dune.inc</code> within this <code>dune</code> file.</p>
<p>The interesting part comes next. As you can guess the <code>executable</code> stanza builds our little OCaml
script into a <code>.exe</code>. The <code>rule</code> that comes after that specifies how to generate the new stanzas
by running <code>gen_dune_rules</code> and capturing its standard output into a <code>dune.inc.gen</code> file.
The last stanza, the <code>alias</code>, allows you to review the changes to the generated stanzas and use promotion to accept
them. Once this is done, the new stanzas will be included in the <code>dune</code> file and the tests will be
run for every test case.</p>
<p>Adding a new test case is then pretty easy, you can simply run:</p>
<pre><code>$ touch test/rewriter/errors/some_explicit_test_case_name.{ml,expected} &amp;&amp; dune runtest --auto-promote
</code></pre>
<p>That will create the new empty test case and update the <code>dune.inc</code> with the corresponding rules.
From there you can proceed the same way as with the successful rewriting tests, update the <code>.ml</code>,
run <code>dune runtest</code> to take a sneak peek at the output and <code>dune promote</code> once you're satisfied with
the result.</p>
<p>I've been pretty happy with this setup so far although there's room for improvement. It would be
nice to avoid duplicating <code>pp.ml</code> for error testing. This also involves
quite a bit of boilerplate that I have to copy into all my PPX rewriter repositories every time.
Hopefully <a href="https://github.com/ocaml/dune/issues/1855">dune plugins</a> should help with that and I
can't wait for a first version to be released so that I can write a plugin to make this test
pattern more accessible and easier to set up.</p>
]]></description><link>https://tarides.com/blog/2019-05-09-an-introduction-to-ocaml-ppx-ecosystem</link><guid isPermaLink="false">https://tarides.com/blog/2019-05-09-an-introduction-to-ocaml-ppx-ecosystem.html</guid><dc:creator><![CDATA[ Nathan Rebours ]]></dc:creator><pubDate>Thu, 09 May 2019 00:00:00 GMT</pubDate></item><item><title><![CDATA[7th MirageOS hack retreat]]></title><description><![CDATA[<p>Let's talk sun, mint tea and OCaml: Yes, you got it, the <a href="https://retreat.mirage.io">MirageOS biennial retreat</a> at Marrakesh!</p>
<p>For the 7th iteration of the retreat, the majority of the Tarides team took part in the trip to camel country.
This is a report about what we produced and enjoyed while there.</p>
<h2>Charles-Edouard Lecat</h2>
<p>That's it, my first MirageOS retreat is coming soon, let's jump on the plane and here I come. After a nice cab trip and an uncountable number of similar streets, I'm finally at the Riad which will host me for the next 5 days.</p>
<p>Now begins the time to do what I came for: Code, Eat, Sleep and Repeat.</p>
<p>I mostly worked on <a href="https://github.com/mirage/colombe">Colombe</a>, the OCaml implementation of the SMTP protocol, for which I developed a simple client.
Apart from some postponed items (like the integration of MIME, the TLS wrapping and some others), the client was working perfectly :)
Implementing it was actually really easy, as the core of the SMTP protocol was done by @dinosaure, who has developed over time a really nice way of implementing this kind of API. And as I spend most of my time at Tarides working on his code, I feel really comfortable with it.</p>
<p>One of the awesome things about this retreat was the people who came: there were so many interesting people, doing various things, that each time someone had a question, you could almost be sure that someone could help in one way or another.</p>
<p>But sadly, as I arrived a few days after everyone, and just before the weekend, the time flew by reaaaaaally fast, and I did not have the time to write any major code. I'm already looking forward to the next retreat which, I am sure, will be even more fruitful and attract a lot of nice OCaml developers.</p>
<p>Until then, I will just dream about the awesome food I ate there ;)</p>
<h2>Lucas Pluvinage</h2>
<p>Second Mirage retreat for me, and this time I had plans: make a small web game with Mirage hosted by an ESP32 device. I figured out that there was no canonical way to make an HTTP/Websocket server with Mirage and I didn't want to stick to a particular library.</p>
<p>Instead, I took my time to develop <code>mirage-http</code>, an abstraction of HTTP that can have either <code>cohttp</code> or <code>httpaf</code> as a backend. On top of that, I've built <code>mirage-websocket</code>, which is therefore an HTTP server-independent implementation of websockets (indeed this has a lot of redundancies with <code>ocaml-websocket</code>, but for now it's a proof of concept). While making all this I discussed with @anmonteiro, who's the webservers/protocols expert for Mirage! However, I didn't have the time to build something on top of that, but this is still something that I would like to achieve at some point.</p>
<p>I also became the "dune guy" as I'm <a href="https://github.com/mirage/mirage/issues/969">working on the Mirage/dune integration</a>, and helped some people with their build system struggles.</p>
<p>It was definitely a rich week: I've learnt a lot of things, enjoyed the sun, ate good food and contributed to the Mirage universe!</p>
<h2>Jules Aguillon</h2>
<p>This was my first retreat.
It was the occasion to meet OCaml developers from all over the world.
The food was great and the weather perfect.</p>
<p>I submitted some PRs to the OCaml compiler!</p>
<ul>
<li>Hint on type error on int literal <a href="https://github.com/ocaml/ocaml/pull/2301">PR #2301</a>.
It adds a hint when using <code>int</code> literals instead of other number literals (e.g. <code>3</code> instead of <code>3.</code> or <code>3L</code>):</li>
</ul>
<pre><code>Line 2, characters 20-21:
2 | let _ = Int32.add a 3
                        ^
Error: This expression has type int but an expression was expected of type
          int32
        Hint: Did you mean `3l'?
</code></pre>
<ul>
<li>Hint on type error on int operators <a href="https://github.com/ocaml/ocaml/pull/2307">PR #2307</a>. It hints the user when numerical operators for ints (e.g. <code>+</code>) are used on other kinds of numbers (e.g. <code>float</code>, <code>int64</code>, etc.). For example:</li>
</ul>
<pre><code>Line 8, characters 8-9:
8 | let _ = x + 1.
            ^
Error: This expression has type float but an expression was expected of type
          int
Line 8, characters 10-11:
8 | let _ = x + 1.
              ^
  Hint: Did you mean to use `+.'?
</code></pre>
<ul>
<li>
<p>Clean up int literal hint <a href="https://github.com/ocaml/ocaml/pull/2313">PR #2313</a>. A little cleanup of the 2 previous PRs.</p>
</li>
<li>
<p>Hint when the expected type is wrapped in a ref <a href="https://github.com/ocaml/ocaml/pull/2319">PR #2319</a>. Another PR adding a hint, this time for when the user forgets to use the <code>!</code> operator on <code>ref</code> values:</p>
</li>
</ul>
<pre><code>Line 2, characters 8-9:
2 | let b = a + 1
            ^
Error: This expression has type int ref
        but an expression was expected of type int
  Hint: This is a `ref', did you mean `!a'?
</code></pre>
<p>The first 3 are merged now.</p>
<h2>Gabriel de Perthuis</h2>
<p>For this retreat my plan was to do something a little different and work on Solo5.</p>
<p><a href="https://github.com/mirage/wodan">Wodan</a>, the storage layer I'm working on,
needs two things from its backends which are not commonly implemented:</p>
<ul>
<li>support for discarding unused blocks (first implemented in mirage-block-unix), and</li>
<li>support for barriers, which are ordering constraints between writes</li>
</ul>
<p>Solo5 provides relevant mirage backends, which are themselves provided by various
virtualised implementations.  Discard was added to most of those, at least those
that were common enough to be easily tested; we just added an "operation not supported"
error code for the other cases.</p>
<p>The virtio implementation was interesting; recent additions to the spec allow discard
support, but few virtual machine managers actually implement that on the backend side.
I tried to integrate with the Chromium OS "crosvm" for that, and had a good time
figuring out how it found the bootloader entry point (turns out the cpu was happily
skipping past invalid instructions to find a slightly misaligned entry point), but
ran out of time to figure out the rest of the integration, which seemed to be more
complex than anticipated.  Because of this, virtio discard support will be skipped over
for now.</p>
<p>I also visited the souk, which was an interesting experience.
Turns out I'm bad at haggling, but I brought back interesting things anyway.</p>
<h2>Conclusion</h2>
<p>We'd like to thank Hannes Mehnert, who organized this retreat, and all the attendees who contributed to making it fruitful and inspiring.
Want to take part in the next MirageOS retreat? Stay tuned <a href="https://retreat.mirage.io">here</a>.</p>
]]></description><link>https://tarides.com/blog/2019-05-06-7th-mirageos-hack-retreat</link><guid isPermaLink="false">https://tarides.com/blog/2019-05-06-7th-mirageos-hack-retreat.html</guid><dc:creator><![CDATA[ Charles-Edouard Lecat ]]></dc:creator><pubDate>Mon, 06 May 2019 00:00:00 GMT</pubDate></item><item><title><![CDATA[Dune 1.9.0]]></title><description><![CDATA[<p>Tarides is pleased to have contributed to the dune 1.9.0 release which
introduces the concept of library variants. Thanks to this update,
unikernel builds are becoming easier and faster in the MirageOS
universe! This also opens the door for a better cross-compilation
story, which will ease the addition of new MirageOS backends
(trustzone, ESP32, RISC-V, etc.)</p>
<p><em>This post has also been published on the
<a href="https://dune.build/blog/dune-1-9-0/">Dune blog</a>.  See also the <a href="https://discuss.ocaml.org/t/ann-dune-1-9-0/3646">discuss
forum</a> for more
details.</em></p>
<h2>Dune 1.9.0</h2>
<p>Changes include:</p>
<ul>
<li>Coloring in the watch mode (<a href="https://github.com/ocaml/dune/pull/1956">#1956</a>)</li>
<li><code>$ dune init</code> command to create or update project boilerplate (<a href="https://github.com/ocaml/dune/pull/1448">#1448</a>)</li>
<li>Allow "." in c_names and cxx_names (<a href="https://github.com/ocaml/dune/pull/2036">#2036</a>)</li>
<li>Experimental Coq support</li>
<li>Support for library variants and default implementations (<a href="https://github.com/ocaml/dune/pull/1900">#1900</a>)</li>
</ul>
<h2>Variants</h2>
<p>In dune 1.7.0, the concept of
<a href="https://dune.build/blog/virtual-libraries/">virtual libraries</a> was introduced. This feature allows you to
mark an abstract library as virtual, and then have several
implementations for it. These implementations could be for multiple
targets (<code>unix</code>, <code>xen</code>, <code>js</code>), using different algorithms, using C
code or not. However, each implementation in a project dependency tree
had to be manually selected. Dune 1.9.0 introduces features for
automatic selection of implementations.</p>
<h3>Library variants</h3>
<p>Variants are a tagging mechanism for selecting implementations at the final
linking step. There's not much to add to make your implementation use
variants. For example, you could decide to design a <code>bar_js</code> library
which is the JavaScript implementation of <code>bar</code>, a virtual
library. All you need to do is specify a <code>js</code> tag using the
<code>variant</code> option.</p>
<pre><code>(library
 (name bar_js)
 (implements bar)
 (variant js)); &lt;-- variant specification
</code></pre>
<p>Now any executable that depends on <code>bar</code> can automatically select the
<code>bar_js</code> library variant using the <code>variants</code> option in the dune file.</p>
<pre><code>(executable
 (name foo)
 (libraries bar baz)
 (variants js)); &lt;-- variants selection
</code></pre>
<h3>Common variants</h3>
<h4>Language selection</h4>
<p>In your projects you might want to trade off speed for portability:</p>
<ul>
<li><code>ocaml</code>: pure OCaml</li>
<li><code>c</code>: OCaml accelerated by C</li>
</ul>
<h4>JavaScript backend</h4>
<ul>
<li><code>js</code>: code aiming for a Node backend, using <code>Js_of_ocaml</code></li>
</ul>
<h4>Mirage backends</h4>
<p>The Mirage project (<a href="https://mirage.io/">mirage.io</a>) will make
extensive use of this feature in order to select the appropriate
dependencies according to the selected backend.</p>
<ul>
<li><code>unix</code>: Unikernels as Unix applications, running on top of <code>mirage-unix</code></li>
<li><code>xen</code>: Xen backend, on top of <code>mirage-xen</code></li>
<li><code>freestanding</code>: Freestanding backend, on top of <code>mirage-solo5</code></li>
</ul>
<h3>Default implementation</h3>
<p>To facilitate the transition from normal libraries to virtual ones,
it's possible to specify an implementation that is selected by
default. This default implementation is selected if no implementation
is chosen after variant resolution.</p>
<pre><code>(library
 (name bar)
 (virtual_modules hello)
 (default_implementation bar_unix)); &lt;-- default implementation selection
</code></pre>
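<p>For the default to resolve, a library named <code>bar_unix</code> has to implement <code>bar</code>. As a minimal sketch (the stanza below is an illustration built from the <code>implements</code> field shown earlier, not taken from a real project), it could be declared as:</p>
<pre><code>(library
 (name bar_unix)
 (implements bar)); no variant tag is needed to act as the default
</code></pre>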
<h3>Selection mechanism</h3>
<p>Implementation selection follows these priority rules:</p>
<ul>
<li>manual selection of an implementation overrides everything</li>
<li>after that comes selection by variants</li>
<li>finally unimplemented virtual libraries can select their default implementation</li>
</ul>
<p>Libraries may depend on specific implementations but this is not
recommended. In this case, several things can happen:</p>
<ul>
<li>the implementation conflicts with a manually selected implementation: resolution fails.</li>
<li>the implementation overrides variants and default implementations: a cycle check is done and this either resolves or fails.</li>
</ul>
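<p>To illustrate the first priority rule, here is a hedged sketch reusing the <code>foo</code> executable from above. It assumes that a <code>bar_unix</code> implementation of <code>bar</code> exists and that manual selection is expressed by listing the implementation directly among the dependencies:</p>
<pre><code>(executable
 (name foo)
 (libraries bar bar_unix baz); bar_unix is named explicitly: manual selection
 (variants js)); still used for other virtual libraries
</code></pre>
<p>Under the rules above, <code>bar</code> would then resolve to <code>bar_unix</code> even though a <code>bar_js</code> library tagged with the <code>js</code> variant is available.</p>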
<h2>Conclusion</h2>
<p>Variant libraries and default implementations are fully <a href="https://dune.readthedocs.io/en/latest/variants.html">documented
here</a>. This
feature improves the usability of virtual libraries.</p>
<p>This
<a href="https://github.com/dune-universe/mirage-entropy/commit/576d25d79e3117bba64355ae73597651cfd27631">commit</a>
shows the amount of changes needed to make a virtual library use
variants.</p>
<h3>Coq support</h3>
<p>Dune now supports building Coq projects. To enable the experimental Coq
extension, add <code>(using coq 0.1)</code> to your <code>dune-project</code> file. Then,
you can use the <code>(coqlib ...)</code> stanza to declare Coq libraries.</p>
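<p>As a minimal sketch, the top of such a <code>dune-project</code> file could look like this (the <code>(lang dune 1.9)</code> line is an assumption matching this release; use the language version your project targets):</p>
<pre><code>(lang dune 1.9)
(using coq 0.1)
</code></pre>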
<p>A typical <code>dune</code> file for a Coq project will look like:</p>
<pre><code>(include_subdirs qualified) ; Use if your development is based on sub directories

(coqlib
  (name Equations)                  ; Name of wrapper module
  (public_name equations.Equations) ; Generate an .install file
  (synopsis "Equations Plugin")     ; Synopsis
  (libraries equations.plugin)      ; ML dependencies (for plugins)
  (modules :standard \ IdDec)       ; modules to build
  (flags -w -notation-override))    ; coqc flags
</code></pre>
<p>See the <a href="https://github.com/ocaml/dune/blob/1.9/doc/coq.rst">documentation of the
extension</a> for more
details.</p>
<h3>Credits</h3>
<p>This release also contains many other changes and bug fixes that can
be found in the <a href="https://discuss.ocaml.org/t/ann-dune-1-9-0/3646">discuss
announcement</a>.</p>
<p>Special thanks to dune maintainers and contributors for this release:
<a href="https://github.com/rgrinberg">@rgrinberg</a>,
<a href="https://github.com/emillon">@emillon</a>,
<a href="https://github.com/shonfeder">@shonfeder</a>
and <a href="https://github.com/ejgallego">@ejgallego</a>!</p>
]]></description><link>https://tarides.com/blog/2019-04-10-dune-1-9-0</link><guid isPermaLink="false">https://tarides.com/blog/2019-04-10-dune-1-9-0.html</guid><dc:creator><![CDATA[ Lucas Pluvinage ]]></dc:creator><pubDate>Wed, 10 Apr 2019 00:00:00 GMT</pubDate></item><item><title><![CDATA[Release of OCamlFormat 0.9]]></title><description><![CDATA[<p>We are pleased to announce the release of OCamlFormat (available on opam).
There have been numerous changes since the last release,
so here is a comprehensive list of the new features and breaking changes to help the transition from OCamlFormat 0.8.</p>
<h2>Additional dependencies</h2>
<p>OCamlFormat now requires:</p>
<ul>
<li>ocaml &gt;= 4.06 (up from 4.04.1)</li>
<li>dune &gt;= 1.1.1</li>
<li>octavius &gt;= 1.2.0</li>
<li>uutf</li>
</ul>
<p>OCamlFormat_Reason now requires:</p>
<ul>
<li>ocaml &gt;= 4.06</li>
<li>dune &gt;= 1.1.1</li>
<li>ocaml-migrate-parsetree &gt;= 1.0.10 (up from 1.0.6)</li>
<li>octavius &gt;= 1.2.0</li>
<li>uutf</li>
<li>reason &gt;= 3.2.0 (up from 1.13.4)</li>
</ul>
<h2>New preset profiles</h2>
<p>The <code>ocamlformat</code> profile aims to take advantage of the strengths of a parsetree-based auto-formatter,
and to limit the consequences of the weaknesses imposed by the current implementation.
This is a style which optimizes for what the formatter can do best, rather than to match the style of any existing code.
General guidelines that have directed the design include:</p>
<ul>
<li>Legibility, in the sense of making it as hard as possible for quick visual parsing to give the wrong interpretation,
is of highest priority;</li>
<li>Whenever possible the high-level structure of the code should be obvious by looking only at the left margin,
in particular, it should not be necessary to visually jump from left to right hunting for critical keywords, tokens, etc;</li>
<li>All else being equal, compact code is preferred as reading without scrolling is easier,
so indentation or white space is avoided unless it helps legibility;</li>
<li>Attention has been given to making some syntactic gotchas visually obvious.</li>
</ul>
<p><code>ocamlformat</code> is the new default profile.</p>
<p>The <code>conventional</code> profile aims to be as familiar and "conventional" appearing as the available options allow.</p>
<p>The <code>default</code> profile is <code>ocamlformat</code> with <code>break-cases=fit</code>.
<code>default</code> is deprecated and will be removed in version 0.10.</p>
<h2>OCamlFormat diff tool</h2>
<p><code>ocamlformat-diff</code> is a tool that uses OCamlFormat to apply the same formatting to compared OCaml files,
so that the formatting differences between the two files are not displayed.
Note that <code>ocamlformat-diff</code> comes in a separate opam package and is not included in the <code>ocamlformat</code> package.</p>
<p>The file comparison is then performed by any diff backend.</p>
<p>The options' documentation is available through <code>ocamlformat-diff --help</code>.</p>
<p>The option <code>--diff</code> allows you to configure the diff command that is used to compare the formatted files.
The default value is the vanilla <code>diff</code>, but you can also use <code>patdiff</code> or any other similar comparison tool.</p>
<p><code>ocamlformat-diff</code> can be integrated with <code>git diff</code>,
as explained in the <a href="https://github.com/ocaml-ppx/ocamlformat/blob/0.9/tools/ocamlformat-diff/README.md">online documentation</a>.</p>
<h2>Formatting docstrings</h2>
<p>Previously, docstrings such as <code>(** This is a docstring *)</code> could only be formatted like regular comments.
A new option <code>--parse-docstrings</code> has been added so that docstrings can be nicely formatted.</p>
<p>Here is a small example:</p>
<pre><code><span class="ocaml-comment-doc">(**</span><span class="ocaml-comment-doc"> {1 Printers and escapes used by Cmdliner module} </span><span class="ocaml-comment-doc">*)</span><span class="ocaml-source">
</span><span class="ocaml-source">
</span><span class="ocaml-keyword-other">val</span><span class="ocaml-source"> </span><span class="ocaml-source">subst_vars</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source"> </span><span class="ocaml-source">subst</span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source">(</span><span class="ocaml-support-type">string</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-support-type">string</span><span class="ocaml-source"> </span><span class="ocaml-source">option</span><span class="ocaml-source">)</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Buffer</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">t</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-support-type">string</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-support-type">string</span><span class="ocaml-source">
</span><span class="ocaml-comment-doc">(**</span><span class="ocaml-comment-doc"> [subst b ~subst s], using [b], substitutes in [s] variables of the form
</span><span class="ocaml-comment-doc">    </span><span class="ocaml-comment-doc">"</span><span class="ocaml-comment-doc">$(doc)</span><span class="ocaml-comment-doc">"</span><span class="ocaml-comment-doc"> by their [subst] definition. This leaves escapes and markup
</span><span class="ocaml-comment-doc">    directives $(markup,...) intact.
</span><span class="ocaml-comment-doc">    @raise Invalid_argument in case of illegal syntax. </span><span class="ocaml-comment-doc">*)</span><span class="ocaml-source">
</span></code></pre>
<p>Note that this option is disabled by default and you have to set it manually by adding <code>--parse-docstrings</code> to your command line
or <code>parse-docstrings=true</code> to your <code>.ocamlformat</code> file.
If you get the following error message:</p>
<pre><code>Error: Formatting of (** ... *) is unstable (e.g. parses as a list or not depending on the margin), please tighten up this comment in the source or disable the formatting using the option --no-parse-docstrings.
</code></pre>
<p>It means the original docstring cannot be formatted (e.g. because it does not comply with the odoc syntax)
and you have to edit it or disable the formatting of docstrings.</p>
<p>Of course, if you think your docstring complies with the odoc syntax and there might be a bug in OCamlFormat,
<a href="https://github.com/ocaml-ppx/ocamlformat/issues">feel free to file an issue on GitHub</a>.</p>
<h2>Print the configuration</h2>
<p>The new <code>--print-config</code> flag prints the configuration determined by the environment variable,
the configuration files, preset profiles and command line. Attributes are not considered.</p>
<p>It provides the full list of options with the values they are set to, and the source of this value.
For example <code>ocamlformat --print-config</code> prints:</p>
<pre><code>profile=ocamlformat (file .ocamlformat:1)
quiet=false (profile ocamlformat (file .ocamlformat:1))
max-iters=10 (profile ocamlformat (file .ocamlformat:1))
comment-check=true (profile ocamlformat (file .ocamlformat:1))
wrap-fun-args=true (profile ocamlformat (file .ocamlformat:1))
wrap-comments=true (file .ocamlformat:5)
type-decl=compact (profile ocamlformat (file .ocamlformat:1))
space-around-collection-expressions=false (profile ocamlformat (file .ocamlformat:1))
single-case=compact (profile ocamlformat (file .ocamlformat:1))
sequence-style=separator (profile ocamlformat (file .ocamlformat:1))
parse-docstrings=true (file .ocamlformat:4)
parens-tuple-patterns=multi-line-only (profile ocamlformat (file .ocamlformat:1))
parens-tuple=always (profile ocamlformat (file .ocamlformat:1))
parens-ite=false (profile ocamlformat (file .ocamlformat:1))
ocp-indent-compat=false (profile ocamlformat (file .ocamlformat:1))
module-item-spacing=sparse (profile ocamlformat (file .ocamlformat:1))
margin=77 (file .ocamlformat:3)
let-open=preserve (profile ocamlformat (file .ocamlformat:1))
let-binding-spacing=compact (profile ocamlformat (file .ocamlformat:1))
let-and=compact (profile ocamlformat (file .ocamlformat:1))
leading-nested-match-parens=false (profile ocamlformat (file .ocamlformat:1))
infix-precedence=indent (profile ocamlformat (file .ocamlformat:1))
indicate-nested-or-patterns=space (profile ocamlformat (file .ocamlformat:1))
indicate-multiline-delimiters=true (profile ocamlformat (file .ocamlformat:1))
if-then-else=compact (profile ocamlformat (file .ocamlformat:1))
field-space=tight (profile ocamlformat (file .ocamlformat:1))
extension-sugar=preserve (profile ocamlformat (file .ocamlformat:1))
escape-strings=preserve (profile ocamlformat (file .ocamlformat:1))
escape-chars=preserve (profile ocamlformat (file .ocamlformat:1))
doc-comments-tag-only=default (profile ocamlformat (file .ocamlformat:1))
doc-comments-padding=2 (profile ocamlformat (file .ocamlformat:1))
doc-comments=after (profile ocamlformat (file .ocamlformat:1))
disable=false (profile ocamlformat (file .ocamlformat:1))
cases-exp-indent=4 (profile ocamlformat (file .ocamlformat:1))
break-struct=force (profile ocamlformat (file .ocamlformat:1))
break-string-literals=wrap (profile ocamlformat (file .ocamlformat:1))
break-sequences=false (profile ocamlformat (file .ocamlformat:1))
break-separators=before (profile ocamlformat (file .ocamlformat:1))
break-infix-before-func=true (profile ocamlformat (file .ocamlformat:1))
break-infix=wrap (profile ocamlformat (file .ocamlformat:1))
break-fun-decl=wrap (profile ocamlformat (file .ocamlformat:1))
break-collection-expressions=fit-or-vertical (profile ocamlformat (file .ocamlformat:1))
break-cases=fit (file .ocamlformat:2)
</code></pre>
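<p>The sources shown in parentheses make it possible to work backwards to the configuration file. As a sketch (reconstructed from the line numbers reported in the output above, not copied from a real project), a <code>.ocamlformat</code> file consistent with it would be:</p>
<pre><code>profile=ocamlformat
break-cases=fit
margin=77
parse-docstrings=true
wrap-comments=true
</code></pre>
<p>Every option that is not set in the file is reported as coming from the profile selected on the first line.</p>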
<p>If several input files are specified, the configuration is printed only for the first file.
If no input file is specified, the configuration is printed for the root directory if one is specified,
or for the current working directory otherwise.</p>
<h2>Parentheses around if-then-else branches</h2>
<p>A new option <code>parens-ite</code> has been added to decide whether to use parentheses
around if-then-else branches that spread across multiple lines.</p>
<p>If this option is set, the following function:</p>
<pre><code><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-keyword">rec </span><span class="ocaml-entity-name-function-binding">loop</span><span class="ocaml-source"> </span><span class="ocaml-source">count</span><span class="ocaml-source"> </span><span class="ocaml-source">a</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword-other">if</span><span class="ocaml-source"> </span><span class="ocaml-source">count</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">&gt;=</span><span class="ocaml-source"> </span><span class="ocaml-source">self</span><span class="ocaml-keyword-other">#</span><span class="ocaml-source">len</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword-other">then</span><span class="ocaml-source"> </span><span class="ocaml-source">a</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword-other">else</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">a'</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-source">f</span><span class="ocaml-source"> </span><span class="ocaml-source">cur</span><span class="ocaml-keyword-other">#</span><span class="ocaml-source">get</span><span class="ocaml-source"> </span><span class="ocaml-source">count</span><span class="ocaml-source"> </span><span class="ocaml-source">a</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">in</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-source">cur</span><span class="ocaml-keyword-other">#</span><span class="ocaml-source">incr</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-unit">()</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-source">loop</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-source">count</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">+</span><span class="ocaml-source"> </span><span class="ocaml-constant-numeric-decimal-integer">1</span><span class="ocaml-source">)</span><span class="ocaml-source"> </span><span class="ocaml-source">a'</span><span class="ocaml-source">
</span></code></pre>
<p>will be formatted as:</p>
<pre><code><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-keyword">rec </span><span class="ocaml-entity-name-function-binding">loop</span><span class="ocaml-source"> </span><span class="ocaml-source">count</span><span class="ocaml-source"> </span><span class="ocaml-source">a</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword-other">if</span><span class="ocaml-source"> </span><span class="ocaml-source">count</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">&gt;=</span><span class="ocaml-source"> </span><span class="ocaml-source">self</span><span class="ocaml-keyword-other">#</span><span class="ocaml-source">len</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword-other">then</span><span class="ocaml-source"> </span><span class="ocaml-source">a</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword-other">else</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">a'</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-source">f</span><span class="ocaml-source"> </span><span class="ocaml-source">cur</span><span class="ocaml-keyword-other">#</span><span class="ocaml-source">get</span><span class="ocaml-source"> </span><span class="ocaml-source">count</span><span class="ocaml-source"> </span><span class="ocaml-source">a</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">in</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-source">cur</span><span class="ocaml-keyword-other">#</span><span class="ocaml-source">incr</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-unit">()</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-source">loop</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-source">count</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">+</span><span class="ocaml-source"> </span><span class="ocaml-constant-numeric-decimal-integer">1</span><span class="ocaml-source">)</span><span class="ocaml-source"> </span><span class="ocaml-source">a'</span><span class="ocaml-source"> </span><span class="ocaml-source">)</span><span class="ocaml-source">
</span></code></pre>
<h2>Parentheses around tuple patterns</h2>
<p>A new option <code>parens-tuple-patterns</code> has been added. It mimics <code>parens-tuple</code> but only applies to patterns,
whereas <code>parens-tuple</code> only applies to expressions.
<code>parens-tuple-patterns=multi-line-only</code> will try to skip parentheses for single-line tuple patterns;
this is the default value.
<code>parens-tuple-patterns=always</code> always uses parentheses around tuple patterns.</p>
<p>For example:</p>
<pre><code><span class="ocaml-comment-block">(*</span><span class="ocaml-comment-block"> with parens-tuple-patterns=always </span><span class="ocaml-comment-block">*)</span><span class="ocaml-source">
</span><span class="ocaml-keyword-other">let</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-source">a</span><span class="ocaml-keyword-other-ocaml punctuation-comma punctuation-separator">,</span><span class="ocaml-source"> </span><span class="ocaml-source">b</span><span class="ocaml-source">)</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-constant-numeric-decimal-integer">1</span><span class="ocaml-keyword-other-ocaml punctuation-comma punctuation-separator">,</span><span class="ocaml-source"> </span><span class="ocaml-constant-numeric-decimal-integer">2</span><span class="ocaml-source">)</span><span class="ocaml-source">
</span><span class="ocaml-source">
</span><span class="ocaml-comment-block">(*</span><span class="ocaml-comment-block"> with parens-tuple-patterns=multi-line-only </span><span class="ocaml-comment-block">*)</span><span class="ocaml-source">
</span><span class="ocaml-keyword-other">let</span><span class="ocaml-source"> </span><span class="ocaml-source">a</span><span class="ocaml-keyword-other-ocaml punctuation-comma punctuation-separator">,</span><span class="ocaml-source"> </span><span class="ocaml-source">b</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-constant-numeric-decimal-integer">1</span><span class="ocaml-keyword-other-ocaml punctuation-comma punctuation-separator">,</span><span class="ocaml-source"> </span><span class="ocaml-constant-numeric-decimal-integer">2</span><span class="ocaml-source">)</span><span class="ocaml-source">
</span></code></pre>
<h2>Single-case pattern-matching expressions</h2>
<p>The new option <code>single-case</code> defines the style of pattern-matching expressions with only a single case.
<code>single-case=compact</code> tries to format a single case on a single line; this is the default value.
<code>single-case=sparse</code> always breaks the line before a single case.</p>
<p>For example:</p>
<pre><code><span class="ocaml-comment-block">(*</span><span class="ocaml-comment-block"> with single-case=compact </span><span class="ocaml-comment-block">*)</span><span class="ocaml-source">
</span><span class="ocaml-keyword-other">try</span><span class="ocaml-source"> </span><span class="ocaml-source">some_irrelevant_expression</span><span class="ocaml-source">
</span><span class="ocaml-keyword-other">with</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Undefined_recursive_module</span><span class="ocaml-source"> </span><span class="ocaml-constant-language">_</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-boolean">true</span><span class="ocaml-source">
</span><span class="ocaml-source">
</span><span class="ocaml-comment-block">(*</span><span class="ocaml-comment-block"> with single-case=sparse </span><span class="ocaml-comment-block">*)</span><span class="ocaml-source">
</span><span class="ocaml-keyword-other">try</span><span class="ocaml-source"> </span><span class="ocaml-source">some_irrelevant_expression</span><span class="ocaml-source">
</span><span class="ocaml-keyword-other">with</span><span class="ocaml-source">
</span><span class="ocaml-keyword-other">|</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Undefined_recursive_module</span><span class="ocaml-source"> </span><span class="ocaml-constant-language">_</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-boolean">true</span><span class="ocaml-source">
</span></code></pre>
<h2>Space around collection expressions</h2>
<p>The new option <code>space-around-collection-expressions</code> decides whether to add a space
inside the delimiters of collection expressions (lists, arrays, records).</p>
<p>For example:</p>
<pre><code><span class="ocaml-comment-block">(*</span><span class="ocaml-comment-block"> by default </span><span class="ocaml-comment-block">*)</span><span class="ocaml-source">
</span><span class="ocaml-keyword-other">type</span><span class="ocaml-source"> </span><span class="ocaml-source">wkind</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-source">{</span><span class="ocaml-source">f</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source"> </span><span class="ocaml-storage-type">'a</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source"> </span><span class="ocaml-storage-type">'a</span><span class="ocaml-source"> </span><span class="ocaml-source">tag</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-storage-type">'a</span><span class="ocaml-source"> </span><span class="ocaml-source">kind</span><span class="ocaml-source">}</span><span class="ocaml-source">
</span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">l</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-source">[</span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-string-quoted-double">Nil</span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-keyword-other-ocaml punctuation-comma punctuation-separator">,</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">TCnoarg</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Thd</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source"> </span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-string-quoted-double">Cons</span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-keyword-other-ocaml punctuation-comma punctuation-separator">,</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">TCarg</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-constant-language-capital-identifier">Ttl</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Thd</span><span class="ocaml-keyword-other-ocaml punctuation-comma punctuation-separator">,</span><span class="ocaml-source"> </span><span class="ocaml-source">tcons</span><span class="ocaml-source">)</span><span class="ocaml-source">]</span><span class="ocaml-source">
</span><span class="ocaml-source">
</span><span class="ocaml-comment-block">(*</span><span class="ocaml-comment-block"> with space-around-collection-expressions </span><span class="ocaml-comment-block">*)</span><span class="ocaml-source">
</span><span class="ocaml-keyword-other">type</span><span class="ocaml-source"> </span><span class="ocaml-source">wkind</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-source">{</span><span class="ocaml-source"> </span><span class="ocaml-source">f</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source"> </span><span class="ocaml-storage-type">'a</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source"> </span><span class="ocaml-storage-type">'a</span><span class="ocaml-source"> </span><span class="ocaml-source">tag</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-storage-type">'a</span><span class="ocaml-source"> </span><span class="ocaml-source">kind</span><span class="ocaml-source"> </span><span class="ocaml-source">}</span><span class="ocaml-source">
</span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">l</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-source">[</span><span class="ocaml-source"> </span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-string-quoted-double">Nil</span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-keyword-other-ocaml punctuation-comma punctuation-separator">,</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">TCnoarg</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Thd</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source"> </span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-string-quoted-double">Cons</span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-keyword-other-ocaml punctuation-comma punctuation-separator">,</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">TCarg</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-constant-language-capital-identifier">Ttl</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Thd</span><span class="ocaml-keyword-other-ocaml punctuation-comma punctuation-separator">,</span><span class="ocaml-source"> </span><span class="ocaml-source">tcons</span><span class="ocaml-source">)</span><span class="ocaml-source"> </span><span class="ocaml-source">]</span><span class="ocaml-source">
</span></code></pre>
<h2>Break separators</h2>
<p>The new option <code>break-separators</code> decides whether to break before or after separators such as <code>;</code> in list or record expressions,
<code>*</code> in tuples or <code>-&gt;</code> in arrow types.
<code>break-separators=before</code> breaks the expressions before the separator; this is the default value.
<code>break-separators=after</code> breaks the expressions after the separator.
<code>break-separators=after-and-docked</code> breaks the expressions after the separator and docks the brackets for records.</p>
<p>For example:</p>
<pre><code><span class="ocaml-comment-block">(*</span><span class="ocaml-comment-block"> with break-separators=before </span><span class="ocaml-comment-block">*)</span><span class="ocaml-source">
</span><span class="ocaml-keyword-other">type</span><span class="ocaml-source"> </span><span class="ocaml-source">t</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-source">{</span><span class="ocaml-source"> </span><span class="ocaml-source">foooooooooooooooooooooooo</span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source"> </span><span class="ocaml-source">foooooooooooooooooooooooooooooooooooooooo</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source"> </span><span class="ocaml-source">fooooooooooooooooooooooooooooo</span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source"> </span><span class="ocaml-source">fooooooooooooooooooooooooooo</span><span class="ocaml-source"> </span><span class="ocaml-source">}</span><span class="ocaml-source">
</span><span class="ocaml-source">
</span><span class="ocaml-comment-block">(*</span><span class="ocaml-comment-block"> with break-separators=after </span><span class="ocaml-comment-block">*)</span><span class="ocaml-source">
</span><span class="ocaml-keyword-other">type</span><span class="ocaml-source"> </span><span class="ocaml-source">t</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-source">{</span><span class="ocaml-source"> </span><span class="ocaml-source">foooooooooooooooooooooooo</span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source"> </span><span class="ocaml-source">foooooooooooooooooooooooooooooooooooooooo</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-source">fooooooooooooooooooooooooooooo</span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source"> </span><span class="ocaml-source">fooooooooooooooooooooooooooo</span><span class="ocaml-source"> </span><span class="ocaml-source">}</span><span class="ocaml-source">
</span><span class="ocaml-source">
</span><span class="ocaml-comment-block">(*</span><span class="ocaml-comment-block"> with break-separators=after-and-docked </span><span class="ocaml-comment-block">*)</span><span class="ocaml-source">
</span><span class="ocaml-keyword-other">type</span><span class="ocaml-source"> </span><span class="ocaml-source">t</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-source">{</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-source">foooooooooooooooooooooooo</span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source"> </span><span class="ocaml-source">foooooooooooooooooooooooooooooooooooooooo</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-source">fooooooooooooooooooooooooooooo</span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source"> </span><span class="ocaml-source">fooooooooooooooooooooooooooo</span><span class="ocaml-source">
</span><span class="ocaml-source">}</span><span class="ocaml-source">
</span></code></pre>
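<p>As a minimal sketch, assuming the usual <code>option = value</code> line syntax of a project-level <code>.ocamlformat</code> file, the docked style from the last example could be selected like this:</p>
<pre><code># .ocamlformat (illustrative; the value is one of before, after, after-and-docked)
break-separators = after-and-docked
</code></pre>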
<h2>Not breaking before bind/map operators</h2>
<p>The new option <code>break-infix-before-func</code> controls whether OCamlFormat breaks before infix operators
whose right-hand argument is an anonymous function.
This option is enabled by default. If you disable it with <code>--no-break-infix-before-func</code>,
OCamlFormat does not break before the operator, so the first line of the function is docked at the end of the line, after the operator.</p>
<p>For example:</p>
<pre><code><span class="ocaml-comment-block">(*</span><span class="ocaml-comment-block"> by default </span><span class="ocaml-comment-block">*)</span><span class="ocaml-source">
</span><span class="ocaml-source">f</span><span class="ocaml-source"> </span><span class="ocaml-source">x</span><span class="ocaml-source">
</span><span class="ocaml-keyword-operator">&gt;&gt;=</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">fun</span><span class="ocaml-source"> </span><span class="ocaml-source">y</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source">
</span><span class="ocaml-source">g</span><span class="ocaml-source"> </span><span class="ocaml-source">y</span><span class="ocaml-source">
</span><span class="ocaml-keyword-operator">&gt;&gt;=</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">fun</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-unit">()</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source">
</span><span class="ocaml-source">f</span><span class="ocaml-source"> </span><span class="ocaml-source">x</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">&gt;&gt;=</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">fun</span><span class="ocaml-source"> </span><span class="ocaml-source">y</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-source">g</span><span class="ocaml-source"> </span><span class="ocaml-source">y</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">&gt;&gt;=</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">fun</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-unit">()</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-source">f</span><span class="ocaml-source"> </span><span class="ocaml-source">x</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">&gt;&gt;=</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">fun</span><span class="ocaml-source"> </span><span class="ocaml-source">y</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-source">g</span><span class="ocaml-source"> </span><span class="ocaml-source">y</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">&gt;&gt;=</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">fun</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-unit">()</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-source">y</span><span class="ocaml-source"> </span><span 
class="ocaml-constant-language-unit">()</span><span class="ocaml-source">
</span><span class="ocaml-source">
</span><span class="ocaml-comment-block">(*</span><span class="ocaml-comment-block"> with break-infix-before-func = false </span><span class="ocaml-comment-block">*)</span><span class="ocaml-source">
</span><span class="ocaml-source">f</span><span class="ocaml-source"> </span><span class="ocaml-source">x</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">&gt;&gt;=</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">fun</span><span class="ocaml-source"> </span><span class="ocaml-source">y</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source">
</span><span class="ocaml-source">g</span><span class="ocaml-source"> </span><span class="ocaml-source">y</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">&gt;&gt;=</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">fun</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-unit">()</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source">
</span><span class="ocaml-source">f</span><span class="ocaml-source"> </span><span class="ocaml-source">x</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">&gt;&gt;=</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">fun</span><span class="ocaml-source"> </span><span class="ocaml-source">y</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-source">g</span><span class="ocaml-source"> </span><span class="ocaml-source">y</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">&gt;&gt;=</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">fun</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-unit">()</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-source">f</span><span class="ocaml-source"> </span><span class="ocaml-source">x</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">&gt;&gt;=</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">fun</span><span class="ocaml-source"> </span><span class="ocaml-source">y</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-source">g</span><span class="ocaml-source"> </span><span class="ocaml-source">y</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">&gt;&gt;=</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">fun</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-unit">()</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-source">y</span><span class="ocaml-source"> </span><span 
class="ocaml-constant-language-unit">()</span><span class="ocaml-source">
</span></code></pre>
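<p>The <code>--no-break-infix-before-func</code> flag is the command-line form. Assuming the option can also be set in a <code>.ocamlformat</code> file with a boolean value, as the comment in the example suggests, disabling it for a whole project might look like this:</p>
<pre><code># .ocamlformat (sketch; the boolean file syntax is an assumption)
break-infix-before-func = false
</code></pre>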
<h2>Break toplevel cases</h2>
<p>The <code>break-cases</code> option has a new value, <code>toplevel</code>,
which forces top-level cases (i.e. not nested or-patterns) to break across lines;
everything else breaks naturally at the margin.</p>
<p>For example:</p>
<pre><code><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">f</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">g</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">function</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-keyword-other">|</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">H</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">when</span><span class="ocaml-source"> </span><span class="ocaml-source">x</span><span class="ocaml-source"> </span><span class="ocaml-source">y</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">&lt;&gt;</span><span class="ocaml-source"> </span><span class="ocaml-source">k</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-constant-numeric-decimal-integer">2</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-keyword-other">|</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">T</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">|</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">P</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">|</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">U</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-constant-numeric-decimal-integer">3</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword-other">in</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword-other">fun</span><span class="ocaml-source"> </span><span class="ocaml-source">x</span><span class="ocaml-source"> </span><span class="ocaml-source">g</span><span class="ocaml-source"> </span><span class="ocaml-source">t</span><span class="ocaml-source"> </span><span class="ocaml-source">h</span><span class="ocaml-source"> </span><span class="ocaml-source">y</span><span class="ocaml-source"> </span><span class="ocaml-source">u</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-keyword-other">match</span><span class="ocaml-source"> </span><span class="ocaml-source">x</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">with</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-keyword-other">|</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">E</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-constant-numeric-decimal-integer">4</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-keyword-other">|</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Z</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">|</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">P</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">|</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">M</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-source">
</span><span class="ocaml-source">      </span><span class="ocaml-keyword-other">match</span><span class="ocaml-source"> </span><span class="ocaml-source">y</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">with</span><span class="ocaml-source">
</span><span class="ocaml-source">      </span><span class="ocaml-keyword-other">|</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">O</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-constant-numeric-decimal-integer">5</span><span class="ocaml-source">
</span><span class="ocaml-source">      </span><span class="ocaml-keyword-other">|</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">P</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">when</span><span class="ocaml-source"> </span><span class="ocaml-source">h</span><span class="ocaml-source"> </span><span class="ocaml-source">x</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-source">
</span><span class="ocaml-source">          </span><span class="ocaml-keyword-other">function</span><span class="ocaml-source">
</span><span class="ocaml-source">          </span><span class="ocaml-keyword-other">|</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">A</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-constant-numeric-decimal-integer">6</span><span class="ocaml-source"> </span><span class="ocaml-source">)</span><span class="ocaml-source"> </span><span class="ocaml-source">)</span><span class="ocaml-source">
</span></code></pre>
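<p>The example does not spell out the setting behind it; presumably it shows the output with the new value enabled. A minimal sketch of the corresponding <code>.ocamlformat</code> entry, assuming the usual <code>option = value</code> syntax:</p>
<pre><code># .ocamlformat (illustrative)
break-cases = toplevel
</code></pre>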
<h2>Number of spaces before docstrings</h2>
<p>The new option <code>doc-comments-padding</code> controls how many spaces are printed before doc comments in type declarations.
The default value is 2.</p>
<p>For example:</p>
<pre><code><span class="ocaml-comment-block">(*</span><span class="ocaml-comment-block"> with doc-comments-padding = 2 </span><span class="ocaml-comment-block">*)</span><span class="ocaml-source">
</span><span class="ocaml-keyword-other">type</span><span class="ocaml-source"> </span><span class="ocaml-source">t</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-source">{</span><span class="ocaml-source">a</span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source"> </span><span class="ocaml-support-type">int</span><span class="ocaml-source">  </span><span class="ocaml-comment-doc">(**</span><span class="ocaml-comment-doc"> a </span><span class="ocaml-comment-doc">*)</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source"> </span><span class="ocaml-source">b</span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source"> </span><span class="ocaml-support-type">int</span><span class="ocaml-source">  </span><span class="ocaml-comment-doc">(**</span><span class="ocaml-comment-doc"> b </span><span class="ocaml-comment-doc">*)</span><span class="ocaml-source">}</span><span class="ocaml-source">
</span><span class="ocaml-source">
</span><span class="ocaml-comment-block">(*</span><span class="ocaml-comment-block"> with doc-comments-padding = 1 </span><span class="ocaml-comment-block">*)</span><span class="ocaml-source">
</span><span class="ocaml-keyword-other">type</span><span class="ocaml-source"> </span><span class="ocaml-source">t</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-source">{</span><span class="ocaml-source">a</span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source"> </span><span class="ocaml-support-type">int</span><span class="ocaml-source"> </span><span class="ocaml-comment-doc">(**</span><span class="ocaml-comment-doc"> a </span><span class="ocaml-comment-doc">*)</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source"> </span><span class="ocaml-source">b</span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source"> </span><span class="ocaml-support-type">int</span><span class="ocaml-source"> </span><span class="ocaml-comment-doc">(**</span><span class="ocaml-comment-doc"> b </span><span class="ocaml-comment-doc">*)</span><span class="ocaml-source">}</span><span class="ocaml-source">
</span></code></pre>
<h2>Ignore files</h2>
<p>An <code>.ocamlformat-ignore</code> file specifies files that OCamlFormat should ignore.
Each line in an <code>.ocamlformat-ignore</code> file specifies a filename relative to the directory containing the <code>.ocamlformat-ignore</code> file.
Lines starting with <code>#</code> are ignored and can be used as comments.</p>
<p>Here is an example of an <code>.ocamlformat-ignore</code> file:</p>
<pre><code>#This is a comment
dir2/ignore_1.ml
</code></pre>
<h2>Tag-only docstrings</h2>
<p>The new option <code>doc-comments-tag-only</code> controls the position of doc comments only containing tags.
<code>doc-comments-tag-only=default</code> applies no special treatment; this is the default value.
<code>doc-comments-tag-only=fit</code> puts the doc comment on the same line as the item if it fits.</p>
<p>For example:</p>
<pre><code><span class="ocaml-comment-block">(*</span><span class="ocaml-comment-block"> with doc-comments-tag-only = default </span><span class="ocaml-comment-block">*)</span><span class="ocaml-source">
</span><span class="ocaml-source">
</span><span class="ocaml-comment-doc">(**</span><span class="ocaml-comment-doc"> @deprecated  </span><span class="ocaml-comment-doc">*)</span><span class="ocaml-source">
</span><span class="ocaml-keyword-other">open</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Module</span><span class="ocaml-source">
</span><span class="ocaml-source">
</span><span class="ocaml-comment-block">(*</span><span class="ocaml-comment-block"> with doc-comments-tag-only = fit </span><span class="ocaml-comment-block">*)</span><span class="ocaml-source">
</span><span class="ocaml-source">
</span><span class="ocaml-keyword-other">open</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Module</span><span class="ocaml-source"> </span><span class="ocaml-comment-doc">(**</span><span class="ocaml-comment-doc"> @deprecated  </span><span class="ocaml-comment-doc">*)</span><span class="ocaml-source">
</span></code></pre>
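<p>The two doc-comment options can be combined. As a sketch, again assuming the <code>option = value</code> syntax of a <code>.ocamlformat</code> file, the following would use a single space before doc comments in type declarations and keep tag-only doc comments on the same line when they fit:</p>
<pre><code># .ocamlformat (illustrative combination of the two options described in this post)
doc-comments-padding = 1
doc-comments-tag-only = fit
</code></pre>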
<h2>Fit or vertical mode for if-then-else</h2>
<p>There is a new value for the option <code>if-then-else</code>: <code>fit-or-vertical</code>.
<code>fit-or-vertical</code> breaks all branches vertically as soon as at least one of them does not fit on a single line.
By contrast, the <code>compact</code> (default) value only breaks the branches that do not fit.</p>
<p>For example:</p>
<pre><code><span class="ocaml-comment-block">(*</span><span class="ocaml-comment-block"> with if-then-else = compact </span><span class="ocaml-comment-block">*)</span><span class="ocaml-source">
</span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">_</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword-other">if</span><span class="ocaml-source"> </span><span class="ocaml-source">foo</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">then</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">a</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-constant-numeric-decimal-integer">1</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">in</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">b</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-constant-numeric-decimal-integer">2</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">in</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-source">a</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">+</span><span class="ocaml-source"> </span><span class="ocaml-source">b</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword-other">else</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">if</span><span class="ocaml-source"> </span><span class="ocaml-source">foo</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">then</span><span class="ocaml-source"> </span><span class="ocaml-constant-numeric-decimal-integer">12</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword-other">else</span><span class="ocaml-source"> </span><span class="ocaml-constant-numeric-decimal-integer">0</span><span class="ocaml-source">
</span><span class="ocaml-source">
</span><span class="ocaml-comment-block">(*</span><span class="ocaml-comment-block"> with if-then-else = fit-or-vertical </span><span class="ocaml-comment-block">*)</span><span class="ocaml-source">
</span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">_</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword-other">if</span><span class="ocaml-source"> </span><span class="ocaml-source">foo</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">then</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">a</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-constant-numeric-decimal-integer">1</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">in</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">b</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-constant-numeric-decimal-integer">2</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">in</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-source">a</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">+</span><span class="ocaml-source"> </span><span class="ocaml-source">b</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword-other">else</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">if</span><span class="ocaml-source"> </span><span class="ocaml-source">foo</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">then</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-constant-numeric-decimal-integer">12</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword-other">else</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-constant-numeric-decimal-integer">0</span><span class="ocaml-source">
</span></code></pre>
<h2>Check mode</h2>
<p>A new <code>--check</code> flag has been added.
It checks whether the input files are already formatted.
This flag is mutually exclusive with <code>--inplace</code> and <code>--output</code>.
It returns <code>0</code> if the input files are indeed already formatted, or <code>1</code> otherwise.</p>
<h2>Break function declarations</h2>
<p>The new option <code>break-fun-decl</code> controls the style for function declarations and types.
<code>break-fun-decl=wrap</code> breaks only if necessary; this is the default value.
<code>break-fun-decl=fit-or-vertical</code> vertically breaks arguments if they do not fit on a single line.
<code>break-fun-decl=smart</code> is like <code>fit-or-vertical</code> but tries to put all the arguments together on a line of their own if they fit.
The <code>wrap-fun-args</code> option now only controls the style for function calls, and no longer applies to function declarations.</p>
<p>For example:</p>
<pre><code><span class="ocaml-comment-block">(*</span><span class="ocaml-comment-block"> with break-fun-decl = wrap </span><span class="ocaml-comment-block">*)</span><span class="ocaml-source">
</span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">ffffffffffffffffffff</span><span class="ocaml-source"> </span><span class="ocaml-source">aaaaaaaaaaaaaaaaaaaaaa</span><span class="ocaml-source"> </span><span class="ocaml-source">bbbbbbbbbbbbbbbbbbbbbb</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-source">cccccccccccccccccccccc</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-source">g</span><span class="ocaml-source">
</span><span class="ocaml-source">
</span><span class="ocaml-comment-block">(*</span><span class="ocaml-comment-block"> with break-fun-decl = fit-or-vertical </span><span class="ocaml-comment-block">*)</span><span class="ocaml-source">
</span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">ffffffffffffffffffff</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-source">aaaaaaaaaaaaaaaaaaaaaa</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-source">bbbbbbbbbbbbbbbbbbbbbb</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-source">cccccccccccccccccccccc</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-source">g</span><span class="ocaml-source">
</span><span class="ocaml-source">
</span><span class="ocaml-comment-block">(*</span><span class="ocaml-comment-block"> with break-fun-decl = smart </span><span class="ocaml-comment-block">*)</span><span class="ocaml-source">
</span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">ffffffffffffffffffff</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-source">aaaaaaaaaaaaaaaaaaaaaa</span><span class="ocaml-source"> </span><span class="ocaml-source">bbbbbbbbbbbbbbbbbbbbbb</span><span class="ocaml-source"> </span><span class="ocaml-source">cccccccccccccccccccccc</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-source">g</span><span class="ocaml-source">
</span></code></pre>
<h2>Disable configuration in files and attributes</h2>
<p>Two new options have been added so that <code>.ocamlformat</code> configuration files and attributes in OCaml files do not change the
configuration.
These options can be useful if you use some preset profile
and you do not want attributes and <code>.ocamlformat</code> files to interfere with your preset configuration.
<code>--disable-conf-attrs</code> disables the configuration in attributes,
and <code>--disable-conf-files</code> disables <code>.ocamlformat</code> configuration files.</p>
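<p>For illustration, here is the kind of in-file configuration that <code>--disable-conf-attrs</code> would ignore. This is a minimal sketch: the function <code>classify</code> is made up for the example, and the attribute assumes the <code>[@@@ocamlformat "option=value"]</code> form described in the OCamlFormat documentation.</p>
<pre><code>(* File-level attribute overriding one option for this file only.
   Assumption: this is the [@@@ocamlformat "option=value"] form; the
   compiler itself ignores the attribute. *)
[@@@ocamlformat "if-then-else=fit-or-vertical"]

(* Hypothetical function, formatted the way fit-or-vertical would lay it out. *)
let classify n =
  if n > 9 then
    "large"
  else if n >= 0 then
    "small"
  else
    "negative"
</code></pre>
<p>With <code>--disable-conf-attrs</code>, the attribute above should have no effect and the preset configuration applies instead.</p>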
<h2>Preserve module items spacing</h2>
<p>There is a new value for the option <code>module-item-spacing</code>: <code>preserve</code>,
that will not leave open lines between one-liners of similar sorts unless there is an open line in the input.</p>
<p>For example the line breaks are preserved in the following code:</p>
<pre><code><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">cmos_rtc_seconds</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-constant-numeric-hexadecimal-integer">0x00</span><span class="ocaml-source">
</span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">cmos_rtc_seconds_alarm</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-constant-numeric-hexadecimal-integer">0x01</span><span class="ocaml-source">
</span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">cmos_rtc_minutes</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-constant-numeric-hexadecimal-integer">0x02</span><span class="ocaml-source">
</span><span class="ocaml-source">
</span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">x</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-source">o</span><span class="ocaml-source">
</span><span class="ocaml-source">
</span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">log_other</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-constant-numeric-hexadecimal-integer">0x000001</span><span class="ocaml-source">
</span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">log_cpu</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-constant-numeric-hexadecimal-integer">0x000002</span><span class="ocaml-source">
</span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">log_fpu</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-constant-numeric-hexadecimal-integer">0x000004</span><span class="ocaml-source">
</span></code></pre>
<h2>Breaking changes</h2>
<ul>
<li>When <code>--disable-outside-detected-project</code> is set, disable ocamlformat when no <code>.ocamlformat</code> file is found.</li>
<li>Files are not parsed when ocamlformat is disabled.</li>
<li>Disallow <code>-</code> with other input files.</li>
<li>The <code>wrap-fun-args</code> option now only controls the style for function calls, and no longer applies to function declarations.</li>
<li>The default profile is now named <code>ocamlformat</code>.</li>
<li>The deprecated <code>option value</code> syntax for <code>.ocamlformat</code> files is no longer supported; use the <code>option = value</code> syntax instead.</li>
</ul>
<h2>Miscellaneous bugfixes</h2>
<ul>
<li>Preserve shebang (e.g. <code>#!/usr/bin/env ocaml</code>) at the beginning of a file.</li>
<li>Improve the formatting when <code>ocp-indent-compat</code> is set.</li>
<li>UTF8 characters are now correctly printed in comments.</li>
<li>Add parentheses around a constrained any-pattern (e.g. <code>let (_ : int) = x1</code>).</li>
<li>Emacs: the temporary buffer is now killed.</li>
<li>Emacs: add the keybinding in tuareg's map instead of merlin's.</li>
<li>Lots of improvements to the formatting of comments, docstrings and attributes.</li>
<li>Lots of improvements on the formatting of modules.</li>
<li>Lots of improvements in the Reason support.</li>
<li>Do not rely on the file-system to format sources.</li>
<li>The <code>--debug</code> mode is more user-friendly.</li>
</ul>
<h2>Credits</h2>
<p>This release also contains many other changes and bug fixes that we cannot detail here.</p>
<p>Special thanks to our maintainers and contributors for this release: Jules Aguillon, Mathieu Barbin, Josh Berdine, Jérémie Dimino, Hugo Heuzard, Ludwig Pacifici, Guillaume Petiot, Nathan Rebours and Louis Roché.</p>
<p>If you wish to get involved with OCamlFormat development or file an issue,
please read the <a href="https://github.com/ocaml-ppx/ocamlformat/blob/master/CONTRIBUTING.md">contributing guide</a>.
Any contribution is welcome.</p>
]]></description><link>https://tarides.com/blog/2019-03-29-release-of-ocamlformat-0-9</link><guid isPermaLink="false">https://tarides.com/blog/2019-03-29-release-of-ocamlformat-0-9.html</guid><dc:creator><![CDATA[ Guillaume Petiot ]]></dc:creator><pubDate>Fri, 29 Mar 2019 00:00:00 GMT</pubDate></item><item><title><![CDATA[Release of Base64]]></title><description><![CDATA[<p>MirageOS is a library operating system written from the ground up in OCaml.
It has the seemingly impossible goal of re-implementing the whole
world! Looking back at the work accomplished by the MirageOS team, that appears to be
what has been happening for several years. Re-implementing the entire stack, in particular
the lower layers that we often take for granted, requires a great attention to
detail. While it may seem reasonably easy to implement a given RFC, a huge
amount of work is often hidden under the surface.</p>
<p>In this article, we will explain the development process we went through, as we
updated a small part of the MirageOS stack: the library <code>ocaml-base64</code>. It's a
suitable example as the library is small (a few hundred lines of code), but it
needs ongoing development to ensure good quality and to be able to trust it for
higher level libraries (like <a href="https://github.com/mirage/mrmime">mrmime</a>).</p>
<p>Updating the library was instigated by a problem I ran into with the existing
base64 implementation while working on the e-mail stack. Indeed, we got some
errors when we tried to compute an <em>encoded-word</em> according to the <a href="https://www.ietf.org/rfc/rfc2047.txt">RFC
2047</a>. So after several years of not being touched, we decided to
update <a href="https://github.com/mirage/ocaml-base64"><code>ocaml-base64</code></a>.</p>
<h2>The Critique of Pure Reason</h2>
<h3>The first problem</h3>
<p>We started by attempting to use <code>ocaml-base64</code> on some examples extracted from
actual e-mails, and we quickly ran into cases where the library failed. This
highlighted that reality is much more complex than you can imagine from reading
an RFC. In this situation, what do you do: try to implement a best-effort
strategy and continue parsing? Or stick to the letter of the RFC and fail? In
the context of e-mails, which has accumulated a lot of baggage over time, you
cannot get around implementing a best-effort strategy.</p>
<p>The particular error we were seeing was a <code>Not_found</code> exception when decoding an
<em>encoded-word</em>. This exception appeared because the implementation relied on
<code>String.contains</code>, and the input contained a character which was not part of the
base64 alphabet.</p>
<p>This was the first reason why we thought it necessary to rewrite <code>ocaml-base64</code>.
Of course, we could just catch the exception and continue the initial
computation, but then another reason appeared.</p>
<h3>The second problem</h3>
<p>As <a href="https://github.com/clecat">@clecat</a> and I reviewed RFC 2045, we noticed the
following requirement:</p>
<blockquote>
<p>The encoded output stream must be represented in lines of no more than 76
characters each.</p>
<p>See RFC 2045, section 6.8</p>
</blockquote>
<p>This is quite specific, but it reflects a general rule for e-mails: we should never have more than 78
characters per line according to <a href="https://www.ietf.org/rfc/rfc822.txt">RFC 822</a>, nor more than 998 characters
according to <a href="https://www.ietf.org/rfc/rfc2822.txt">RFC 2822</a>.</p>
<p>The wish for a decoder that abided by RFC 2045 more closely, including the requirement
above, further spurred us to implement a new decoder.</p>
<p>As part of the new implementation, we decided to implement tests and fuzzers to
ensure correctness. This also had the benefit that we could run the fuzzer on
the existing codebase. When fuzzing an encoder/decoder pair, an excellent check
is whether the following isomorphism holds:</p>
<pre><code><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">iso0</span><span class="ocaml-source"> </span><span class="ocaml-source">input</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">assert</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-source">decode</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-source">encode</span><span class="ocaml-source"> </span><span class="ocaml-source">input</span><span class="ocaml-source">)</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-source">input</span><span class="ocaml-source">)</span><span class="ocaml-source">
</span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">iso1</span><span class="ocaml-source"> </span><span class="ocaml-source">input</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">assert</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-source">encode</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-source">decode</span><span class="ocaml-source"> </span><span class="ocaml-source">input</span><span class="ocaml-source">)</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-source">input</span><span class="ocaml-source">)</span><span class="ocaml-source">
</span></code></pre>
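<p>To make the two checks concrete, here is a minimal, self-contained sketch. It does not use <code>ocaml-base64</code>: the <code>encode</code> and <code>decode</code> functions below are a stand-in hexadecimal codec written only for this example, and the checks return a boolean instead of asserting directly. It also shows that the second direction can only be expected to hold for inputs the encoder itself could have produced.</p>
<pre><code>(* Stand-in codec (hexadecimal), an assumption made to keep the sketch
   self-contained; it is not the base64 implementation. *)
let encode s =
  String.concat ""
    (List.init (String.length s) (fun i ->
         Printf.sprintf "%02x" (Char.code s.[i])))

let decode s =
  String.init (String.length s / 2) (fun i ->
      Char.chr (int_of_string ("0x" ^ String.sub s (2 * i) 2)))

(* The two round-trip properties, as predicates. *)
let iso0 input = decode (encode input) = input
let iso1 input = encode (decode input) = input

let () =
  assert (iso0 "foo\n");
  assert (iso1 "666f6f0a");
  (* Upper-case digits decode to the same bytes but re-encode in lower
     case, so this direction fails on an input the encoder never emits. *)
  assert (not (iso1 "666F6F0A"))
</code></pre>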
<p>However, at this point <a href="https://github.com/hannesm">@hannesm</a> ran into another error (see
<a href="https://github.com/mirage/ocaml-base64/issues/20">#20</a>).</p>
<h3>The third problem</h3>
<p>We started to review the <a href="https://github.com/mirleft/ocaml-nocrypto"><code>nocrypto</code></a> implementation of base64, which
respects our requirements. We had some concerns about the performance of the
implementation though, so we decided to see if we would get a performance
regression by switching to this implementation.</p>
<p>A quick benchmark based on random input revealed the opposite, however!
<code>nocrypto</code>'s implementation was faster than <code>ocaml-base64</code>:</p>
<pre><code><span class="sh-source">ocaml-base64</span><span class="sh-punctuation-definition-string-begin">'</span><span class="sh-string-quoted-single">s implementation on bytes (length: 5000): 466 272.34ns
</span><span class="sh-string-quoted-single">nocrypto</span><span class="sh-punctuation-definition-string-end">'</span><span class="sh-source">s implementation on bytes </span><span class="sh-punctuation-definition-subshell">(</span><span class="sh-meta-scope-subshell">length: 5000</span><span class="sh-punctuation-definition-subshell">)</span><span class="sh-source">: 137 406.04ns
</span></code></pre>
<p>Based on all these observations, we thought there was sufficient reason to
reconsider the <code>ocaml-base64</code> implementation. It's also worth mentioning that
the last real release (excluding <code>dune</code>/<code>jbuilder</code>/<code>topkg</code> updates) is from Dec.
24 2014. So, it's pretty old code and the OCaml ecosystem has improved a lot
since 2014.</p>
<h2>Implementation &amp; review</h2>
<p>We started integrating the <code>nocrypto</code> implementation. Of course, implementing
<a href="https://www.ietf.org/rfc/rfc4648.txt">RFC 4648</a> is not as easy as just reading examples and trying to do
something which works. The devil is in the detail.</p>
<p>@hannesm and <a href="https://github.com/cfcs">@cfcs</a> decided to do a big review of expected behavior
according to the RFC, and another about implementation and security issues.</p>
<h3>Canonicalization</h3>
<p>The biggest problem with RFC 4648 concerns canonical inputs. Indeed, there
are cases where two different inputs are associated with the same value:</p>
<pre><code><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">a</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Base64</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">decode</span><span class="ocaml-source"> </span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-string-quoted-double">Zm9vCg==</span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source">
</span><span class="ocaml-keyword-operator">-</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source"> </span><span class="ocaml-support-type">string</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-string-quoted-double">foo</span><span class="ocaml-constant-character-escape">\n</span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-source">
</span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">b</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Base64</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">decode</span><span class="ocaml-source"> </span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-string-quoted-double">Zm9vCh==</span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source">
</span><span class="ocaml-keyword-operator">-</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source"> </span><span class="ocaml-support-type">string</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-string-quoted-double">foo</span><span class="ocaml-constant-character-escape">\n</span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-source">
</span></code></pre>
<p>This is mostly because the base64 format encodes the input 6 bits at a time. The
result is that 4 base64 encoded bytes are equal to 3 decoded bytes (<code>6 * 4 = 8 * 3</code>). Because of this, 2 base64 encoded bytes provide 1 byte plus 4 bits. What do
we need to do with these 4 bits? Nothing.</p>
<p>That's why the last character in our example can be something other than <code>g</code>. <code>g</code>
is the canonical character: its 2 most significant bits complete the 6 bits
delivered by <code>C</code> to make a byte (8 bits), and its 4 remaining bits are zero. But <code>h</code> can be used as well, since we just
need 2 bits at the end.</p>
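<p>The arithmetic can be checked with a small sketch. The helper names (<code>value</code>, <code>first_byte</code>, <code>leftover</code>, <code>is_canonical_pair</code>) are made up for this example and are not part of the library; only the standard base64 alphabet from RFC 4648 is assumed.</p>
<pre><code>(* Standard base64 alphabet (RFC 4648, section 4). *)
let alphabet =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

(* 6-bit value of a base64 character; raises Not_found outside the alphabet. *)
let value c = String.index alphabet c

(* Byte carried by two characters: 6 bits from the first one, plus the
   2 most significant bits of the second one. *)
let first_byte c0 c1 = Char.chr ((value c0 lsl 2) lor (value c1 lsr 4))

(* The 4 bits of the second character that the decoder throws away. *)
let leftover c1 = value c1 land 0xf

(* A final pair is canonical when the discarded bits are all zero. *)
let is_canonical_pair c1 = leftover c1 = 0

let () =
  assert (first_byte 'C' 'g' = '\n');
  assert (first_byte 'C' 'h' = '\n');
  assert (is_canonical_pair 'g');
  assert (not (is_canonical_pair 'h'))
</code></pre>
<p>Both pairs decode to the same byte; only the discarded bits differ, which is why a strict decoder may choose to reject <code>h</code> here.</p>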
<p>Due to this behavior, the check used for fuzzing changes: from a canonical
input, we should check isomorphism.</p>
<h3>Invalid character</h3>
<p>As mentioned above ("The first problem"), how should invalid characters be
handled? This happens when decoding a byte which is not a part of the base64
alphabet. In the old version, <code>ocaml-base64</code> would simply leak a <code>Not_found</code>
exception from <code>String.contains</code>.</p>
<p>The MirageOS team has taken <a href="https://mirage.io/wiki/mirage-3.0-errors">a stance on exceptions</a>, which is
to "use exceptions for exceptional conditions" - invalid input is hardly one of
those. This is to avoid any exception leaks, as it can be really hard to track
the origin of an exception in a unikernel. Because of this, several packages
have been updated to return a <code>result</code> type instead, and we wanted the new
implementation to follow suit.</p>
<p>On the other hand, exceptions can be useful when considered as a more
constrained form of assembly jump. Of course, they break the control flow, but
from a performance point of view, it's interesting to use this trick:</p>
<pre><code><span class="ocaml-keyword-other">exception</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Found</span><span class="ocaml-source">
</span><span class="ocaml-source">
</span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">contains</span><span class="ocaml-source"> </span><span class="ocaml-source">str</span><span class="ocaml-source"> </span><span class="ocaml-source">chr</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">idx</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-source">ref</span><span class="ocaml-source"> </span><span class="ocaml-constant-numeric-decimal-integer">0</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">in</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">len</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">String</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">length</span><span class="ocaml-source"> </span><span class="ocaml-source">str</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">in</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword-other">try</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">while</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">!</span><span class="ocaml-source">idx</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">&lt;</span><span class="ocaml-source"> </span><span class="ocaml-source">len</span><span class="ocaml-source">
</span><span class="ocaml-source">      </span><span class="ocaml-keyword-other">do</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">if</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">String</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">unsafe_get</span><span class="ocaml-source"> </span><span class="ocaml-source">str</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">!</span><span class="ocaml-source">idx</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-source">chr</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">then</span><span class="ocaml-source"> </span><span class="ocaml-source">raise</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Found</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source"> </span><span class="ocaml-source">incr</span><span class="ocaml-source"> </span><span class="ocaml-source">idx</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">done</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source">
</span><span class="ocaml-source">      </span><span class="ocaml-constant-language-capital-identifier">None</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword-other">with</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Found</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Some</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">!</span><span class="ocaml-source">idx</span><span class="ocaml-source">
</span></code></pre>
<p>This kind of code for example is ~20% faster than <code>String.contains</code>.</p>
<p>As such, exceptions can be a useful tool for performance optimizations, but we
need to be extra careful not to expose them to the users of the library. This
code needs to be hidden behind a fancy functional interface. With this in mind,
we should assert that our <code>decode</code> function never leaks an exception. We'll
describe how we've addressed this problem later.</p>
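<p>As a minimal sketch of this pattern, an exception-based inner function can be
wrapped so that callers only ever see a <code>result</code>. The names <code>Invalid_char</code>,
<code>check_alphabet_exn</code> and <code>check_alphabet</code> are hypothetical and are not taken
from <code>ocaml-base64</code>:</p>
<pre><code>(* Hypothetical internal exception, used only for early exit. *)
exception Invalid_char of int

(* Inner function: raises on the first character outside the alphabet. *)
let check_alphabet_exn (s : string) : unit =
  String.iteri
    (fun i c -&gt;
      match c with
      | 'A' .. 'Z' | 'a' .. 'z' | '0' .. '9' | '+' | '/' | '=' -&gt; ()
      | _ -&gt; raise (Invalid_char i))
    s

(* Public wrapper: the exception is caught here and never escapes. *)
let check_alphabet (s : string) : (unit, [ `Msg of string ]) result =
  match check_alphabet_exn s with
  | () -&gt; Ok ()
  | exception Invalid_char i -&gt;
    Error (`Msg (Printf.sprintf "invalid character at offset %d" i))
</code></pre>
<p>Only <code>check_alphabet</code> would appear in the interface file, so the exception stays
an implementation detail.</p>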
<h3>Special cases</h3>
<p>RFC 4648 spells out a number of detailed cases. We would sometimes like to work
in a perfect world where malformed input never occurs, but in our experience we
cannot predict what end-users will do with formats, protocols and such.</p>
<p>Even though the RFC has detailed examples, we have to read between the lines to
find the special cases and work out how to deal with them.</p>
<p>@hannesm noticed one of these cases, where padding (<code>=</code> sign at the end of
input) is not mandatory:</p>
<blockquote>
<p>The pad character "=" is typically percent-encoded when used in an URI [9],
but if the data length is known implicitly, this can be avoided by skipping
the padding; see section 3.2.</p>
<p>See RFC 4648, section 5</p>
</blockquote>
<p>That mostly means that the following kind of input can be valid:</p>
<pre><code><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">a</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Base64</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">decode</span><span class="ocaml-source"> ~</span><span class="ocaml-source">pad</span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-constant-language-boolean">false</span><span class="ocaml-source"> </span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-string-quoted-double">Zm9vCg</span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-source">
</span><span class="ocaml-keyword-operator">-</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source"> </span><span class="ocaml-support-type">string</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-string-quoted-double">foo</span><span class="ocaml-constant-character-escape">\n</span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-source">
</span></code></pre>
<p>It's only valid in a specific context though: when <em>length is known implicitly</em>.
Only the caller of <code>decode</code> can determine whether the length is implicitly known
such that padding can be omitted. To that end, we've added a new optional
argument <code>?pad</code> to the function <code>Base64.decode</code>.</p>
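<p>To make the relationship between length and padding concrete, here is a small
sketch of how a caller could restore the padding from the input length alone.
<code>add_padding</code> is a hypothetical helper written for illustration; it is not part
of the library's API:</p>
<pre><code>(* Base64 encodes 3 bytes as 4 characters, so a padded input always has a
   length that is a multiple of 4. *)
let add_padding (s : string) : string option =
  match String.length s mod 4 with
  | 0 -&gt; Some s
  | 2 -&gt; Some (s ^ "==")
  | 3 -&gt; Some (s ^ "=")
  | _ -&gt; None (* a remainder of 1 cannot come from a valid encoding *)

let () =
  assert (add_padding "Zm9vCg" = Some "Zm9vCg==");
  assert (add_padding "Zm9v" = Some "Zm9v")
</code></pre>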
<h3>Allocation, <code>sub</code>, <code>?off</code> and <code>?len</code></h3>
<p>Xavier Leroy has described the garbage collector in the following way:</p>
<blockquote>
<p>You see, the Caml garbage collector is like a god from ancient mythology:
mighty, but very irritable. If you mess with it, it'll make you suffer in
surprising ways.</p>
</blockquote>
<p>That's probably why my experience with improving the allocation policy of
<a href="https://github.com/mirage/ocaml-git"><code>ocaml-git</code></a> was quite a nightmare. Allowing the user to control
allocation is important for efficiency, and we wanted <code>ocaml-base64</code> to be a
good citizen.</p>
<p>At the beginning, <code>ocaml-base64</code> had a very simple API:</p>
<pre><code><span class="ocaml-keyword-other">val</span><span class="ocaml-source"> </span><span class="ocaml-source">decode</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source"> </span><span class="ocaml-support-type">string</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-support-type">string</span><span class="ocaml-source">
</span><span class="ocaml-keyword-other">val</span><span class="ocaml-source"> </span><span class="ocaml-source">encode</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source"> </span><span class="ocaml-support-type">string</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-support-type">string</span><span class="ocaml-source">
</span></code></pre>
<p>This API forces allocations in two ways.</p>
<p>Firstly, if the caller needs to encode a part of a string, this part needs to be
extracted, e.g. using <code>String.sub</code>, which will allocate a new string. To avoid
this, two new optional arguments have been added to <code>encode</code>: <code>?off</code> and <code>?len</code>,
which specify the substring to encode. Here's an example:</p>
<pre><code><span class="ocaml-comment-block">(*</span><span class="ocaml-comment-block"> We want to encode the part 'foo' without prefix or suffix </span><span class="ocaml-comment-block">*)</span><span class="ocaml-source">
</span><span class="ocaml-source">
</span><span class="ocaml-comment-block">(*</span><span class="ocaml-comment-block"> Old API -- forces allocation </span><span class="ocaml-comment-block">*)</span><span class="ocaml-source">
</span><span class="ocaml-constant-language-capital-identifier">Base64</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">encode</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-constant-language-capital-identifier">String</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">sub</span><span class="ocaml-source"> </span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-string-quoted-double">prefix foo suffix</span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-source"> </span><span class="ocaml-constant-numeric-decimal-integer">7</span><span class="ocaml-source"> </span><span class="ocaml-constant-numeric-decimal-integer">3</span><span class="ocaml-source">)</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source">
</span><span class="ocaml-keyword-operator">-</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source"> </span><span class="ocaml-support-type">string</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-string-quoted-double">Zm9v</span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-source">
</span><span class="ocaml-source">
</span><span class="ocaml-comment-block">(*</span><span class="ocaml-comment-block"> New API -- avoids allocation </span><span class="ocaml-comment-block">*)</span><span class="ocaml-source">
</span><span class="ocaml-constant-language-capital-identifier">Base64</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">encode</span><span class="ocaml-source"> ~</span><span class="ocaml-source">off</span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-constant-numeric-decimal-integer">7</span><span class="ocaml-source"> ~</span><span class="ocaml-source">len</span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-constant-numeric-decimal-integer">3</span><span class="ocaml-source"> </span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-string-quoted-double">prefix foo suffix</span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source">
</span><span class="ocaml-keyword-operator">-</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source"> </span><span class="ocaml-support-type">string</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-string-quoted-double">Zm9v</span><span class="ocaml-string-quoted-double">"</span><span class="ocaml-source">
</span></code></pre>
<p>Secondly, a new string is allocated to hold the resulting string. We can
calculate a bound on the length of this string in the following manner:</p>
<pre><code><span class="ocaml-keyword-other">let</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-keyword-operator">//</span><span class="ocaml-source">)</span><span class="ocaml-source"> </span><span class="ocaml-source">x</span><span class="ocaml-source"> </span><span class="ocaml-source">y</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword-other">if</span><span class="ocaml-source"> </span><span class="ocaml-source">y</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">&lt;</span><span class="ocaml-source"> </span><span class="ocaml-constant-numeric-decimal-integer">1</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">then</span><span class="ocaml-source"> </span><span class="ocaml-source">raise</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Division_by_zero</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other-ocaml punctuation-separator-terminator punctuation-separator">;</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword-other">if</span><span class="ocaml-source"> </span><span class="ocaml-source">x</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">&gt;</span><span class="ocaml-source"> </span><span class="ocaml-constant-numeric-decimal-integer">0</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">then</span><span class="ocaml-source"> </span><span class="ocaml-constant-numeric-decimal-integer">1</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">+</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-source">(</span><span class="ocaml-source">x</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-</span><span class="ocaml-source"> </span><span class="ocaml-constant-numeric-decimal-integer">1</span><span class="ocaml-source">)</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">/</span><span class="ocaml-source"> </span><span class="ocaml-source">y</span><span class="ocaml-source">)</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">else</span><span class="ocaml-source"> </span><span class="ocaml-constant-numeric-decimal-integer">0</span><span class="ocaml-source">
</span><span class="ocaml-source">
</span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">encode</span><span class="ocaml-source"> </span><span class="ocaml-source">input</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">res</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Bytes</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">create</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-constant-language-capital-identifier">String</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">length</span><span class="ocaml-source"> </span><span class="ocaml-source">input</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">//</span><span class="ocaml-source"> </span><span class="ocaml-constant-numeric-decimal-integer">3</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">*</span><span class="ocaml-source"> </span><span class="ocaml-constant-numeric-decimal-integer">4</span><span class="ocaml-source">)</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">in</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">
</span><span class="ocaml-source">
</span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">decode</span><span class="ocaml-source"> </span><span class="ocaml-source">input</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">res</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Bytes</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">create</span><span class="ocaml-source"> </span><span class="ocaml-source">(</span><span class="ocaml-constant-language-capital-identifier">String</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">length</span><span class="ocaml-source"> </span><span class="ocaml-source">input</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">//</span><span class="ocaml-source"> </span><span class="ocaml-constant-numeric-decimal-integer">4</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">*</span><span class="ocaml-source"> </span><span class="ocaml-constant-numeric-decimal-integer">3</span><span class="ocaml-source">)</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">in</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">
</span></code></pre>
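<p>As a worked example of this arithmetic, consider the 4-byte input <code>"foo\n"</code>
and its 8-character encoding <code>"Zm9vCg=="</code>. The names <code>encoded_bound</code> and
<code>decoded_bound</code> are illustrative only:</p>
<pre><code>(* Ceiling division, as defined above. *)
let ( // ) x y =
  if y &lt; 1 then raise Division_by_zero;
  if x &gt; 0 then 1 + ((x - 1) / y) else 0

let encoded_bound n = (n // 3) * 4
let decoded_bound n = (n // 4) * 3

let () =
  (* 4 input bytes need two 3-byte groups, hence 2 * 4 = 8 characters. *)
  assert (encoded_bound 4 = 8);
  (* 8 characters are two 4-character groups, hence at most 2 * 3 = 6 bytes,
     although only 4 bytes are actually produced because of the padding. *)
  assert (decoded_bound 8 = 6)
</code></pre>
<p>The second assertion shows why the computed size is an upper bound rather than
the exact length of the result.</p>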
<p>Unfortunately we cannot know the exact length of the result prior to computing
it. This forces a call to <code>String.sub</code> at the end of the computation to return a
string of the correct length. This means we have two allocations rather than
one. To avoid the additional allocation, @avsm proposed to provide a new
type <code>sub = string * int * int</code>. This lets the user call <code>String.sub</code> if
required (and allocate a new string), or simply use the returned <code>sub</code> for
<em>blit</em>ting to another buffer or similar.</p>
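<p>A minimal sketch of the two ways a caller might consume such a value follows.
The helpers <code>to_string</code> and <code>blit_to_bytes</code> are hypothetical names used for
illustration, not functions exported by the library:</p>
<pre><code>type sub = string * int * int

(* Copy out: allocates a fresh string of exactly the right length. *)
let to_string ((s, off, len) : sub) : string = String.sub s off len

(* Blit into a buffer owned by the caller: no new string is allocated. *)
let blit_to_bytes ((s, off, len) : sub) (dst : bytes) (dst_off : int) : unit =
  Bytes.blit_string s off dst dst_off len

let () =
  (* An oversized result buffer of which only the first 4 bytes are valid. *)
  let r : sub = ("Zm9v\000\000", 0, 4) in
  assert (to_string r = "Zm9v");
  let buf = Bytes.make 6 '.' in
  blit_to_bytes r buf 1;
  assert (Bytes.to_string buf = ".Zm9v.")
</code></pre>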
<h2>Fuzz everything!</h2>
<p>There's a strong trend of fuzzing libraries for MirageOS, which is quite easy
thanks to the brilliant work by <a href="https://github.com/yomimono">@yomimono</a> and <a href="https://github.com/stedolan">@stedolan</a>!
The integrated fuzzing in OCaml builds on <a href="https://lcamtuf.coredump.cx/afl/">American fuzzy lop</a>, which is
very smart about discarding paths of execution that have already been tested and
generating unseen inputs which break your assumptions. My first experience with
fuzzing was with the library <a href="https://github.com/mirage/decompress"><code>decompress</code></a>, and I was impressed by
the <a href="https://github.com/mirage/decompress/pull/34">precise error</a> it found about a name clash.</p>
<p>Earlier in this article, I listed some properties we wanted to check for
<code>ocaml-base64</code>:</p>
<ul>
<li>The functions <code>encode</code> and <code>decode</code> should be isomorphic, taking
canonicalization into account:</li>
</ul>
<pre><code><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">iso0</span><span class="ocaml-source"> </span><span class="ocaml-source">input</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword-other">match</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Base64</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">decode</span><span class="ocaml-source"> ~</span><span class="ocaml-source">pad</span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-constant-language-boolean">false</span><span class="ocaml-source"> </span><span class="ocaml-source">input</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">with</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword-other">|</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Error</span><span class="ocaml-source"> </span><span class="ocaml-constant-language">_</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-source">fail</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-unit">()</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword-other">|</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Ok</span><span class="ocaml-source"> </span><span class="ocaml-source">result0</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">result1</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Base64</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">encode_exn</span><span class="ocaml-source"> </span><span class="ocaml-source">result0</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">in</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-keyword-other">match</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Base64</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">decode</span><span class="ocaml-source"> ~</span><span class="ocaml-source">pad</span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-constant-language-boolean">true</span><span class="ocaml-source"> </span><span class="ocaml-source">result1</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">with</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-keyword-other">|</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Error</span><span class="ocaml-source"> </span><span class="ocaml-constant-language">_</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-source">fail</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-unit">()</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-keyword-other">|</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Ok</span><span class="ocaml-source"> </span><span class="ocaml-source">result2</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-source">check_eq</span><span class="ocaml-source"> </span><span class="ocaml-source">result0</span><span class="ocaml-source"> </span><span class="ocaml-source">result2</span><span class="ocaml-source">
</span><span class="ocaml-source">
</span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">iso1</span><span class="ocaml-source"> </span><span class="ocaml-source">input</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">result0</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Base64</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">encode_exn</span><span class="ocaml-source"> </span><span class="ocaml-source">input</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">in</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword-other">match</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Base64</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">decode</span><span class="ocaml-source"> ~</span><span class="ocaml-source">pad</span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-constant-language-boolean">true</span><span class="ocaml-source"> </span><span class="ocaml-source">result0</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">with</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword-other">|</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Error</span><span class="ocaml-source"> </span><span class="ocaml-constant-language">_</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-source">fail</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-unit">()</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword-other">|</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Ok</span><span class="ocaml-source"> </span><span class="ocaml-source">result1</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">result2</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Base64</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">encode_exn</span><span class="ocaml-source"> </span><span class="ocaml-source">result1</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">in</span><span class="ocaml-source">
</span><span class="ocaml-source">    </span><span class="ocaml-source">check_eq</span><span class="ocaml-source"> </span><span class="ocaml-source">result0</span><span class="ocaml-source"> </span><span class="ocaml-source">result2</span><span class="ocaml-source">
</span></code></pre>
<ul>
<li>The function <code>decode</code> should <em>never</em> raise an exception, but rather return a
result type:</li>
</ul>
<pre><code><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">no_exn</span><span class="ocaml-source"> </span><span class="ocaml-source">input</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source">
</span><span class="ocaml-source">  </span><span class="ocaml-keyword-other">try</span><span class="ocaml-source"> </span><span class="ocaml-source">ignore</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">@@</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-capital-identifier">Base64</span><span class="ocaml-keyword-other-ocaml punctuation-other-period punctuation-separator">.</span><span class="ocaml-source">decode</span><span class="ocaml-source"> </span><span class="ocaml-source">input</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other">with</span><span class="ocaml-source"> </span><span class="ocaml-constant-language">_</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-source">fail</span><span class="ocaml-source"> </span><span class="ocaml-constant-language-unit">()</span><span class="ocaml-source">
</span></code></pre>
<ul>
<li>And finally, we should randomize <code>?off</code> and <code>?len</code> arguments to ensure that we
don't get an <code>Out_of_bounds</code> exception when accessing input.</li>
</ul>
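<p>One way to make that last property hold is to validate <code>off</code> and <code>len</code> once,
before touching the input. The sketch below is an assumption about how such a
check could look, not the library's actual code; <code>validate_slice</code> is a
hypothetical helper, and a plain <code>Random</code> loop stands in for the fuzzer-driven
generator:</p>
<pre><code>(* Hypothetical helper: accept a slice only if it lies inside [s]. *)
let validate_slice ?(off = 0) ?len (s : string)
  : (int * int, [ `Msg of string ]) result =
  let len = match len with Some l -&gt; l | None -&gt; String.length s - off in
  if off &lt; 0 || len &lt; 0 || off &gt; String.length s - len
  then Error (`Msg "invalid bounds")
  else Ok (off, len)

let () =
  Random.init 42;
  for _i = 1 to 10_000 do
    let s = String.make (Random.int 16) 'x' in
    (* Deliberately include negative and oversized values. *)
    let off = Random.int 40 - 20 and len = Random.int 40 - 20 in
    match validate_slice ~off ~len s with
    | Ok (off, len) -&gt; ignore (String.sub s off len) (* must not raise *)
    | Error _ -&gt; ()
  done
</code></pre>
<p>If any accepted slice were out of range, <code>String.sub</code> would raise
<code>Invalid_argument</code> and the loop would abort.</p>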
<p>Having fuzzed the new implementation for a long time does not mean that the
code is infallible. People can use our library in ways we did not imagine
(which is mostly what happens in the real world) and hit errors we did not
anticipate.</p>
<p>But, with the fuzzer, we've managed to test some properties across a very wide
range of inputs instead of unit testing with random (or not so random) inputs
from our brains. This development process lets us <em>fix the semantics</em> of the
implementation. It is <strong>not</strong> a formal definition of the semantics, but
it's better than nothing or outdated documentation.</p>
<h2>Conclusion</h2>
<p>Based on our recent update to <code>ocaml-base64</code>, this blog post explains our
development process as we go about rewriting the world for MirageOS, one bit at a
time. There's an important point to be made though:</p>
<p><code>ocaml-base64</code> is a really small project: currently, the implementation is
about 250 lines of code. But as stated in the introduction, we are fortunate
enough to push the restart button of the computer world. Yes, we want to make a
new operating system.</p>
<p>That's a massive task, and we shouldn't make it any harder on ourselves than
necessary. As such, we need to justify any step, any line of code, and why we
decided to spend our time on any change (why we decided to re-implement <code>git</code>
for example). So before committing any time to projects, we try to do a deep
analysis of the problem, get feedback from others, and find a consensus between
what we already know, what we want and what we should have (in the case of
<code>ocaml-base64</code>, @hannesm took a look at the PHP and Go
implementations).</p>
<p>Indeed, this is a hard question which nobody can answer perfectly in isolation.
So, the story of this update to <code>ocaml-base64</code> is an invitation for you to enter
the arcana of the computer world through MirageOS :) ! Don't be afraid!</p>
]]></description><link>https://tarides.com/blog/2019-02-08-release-of-base64</link><guid isPermaLink="false">https://tarides.com/blog/2019-02-08-release-of-base64.html</guid><dc:creator><![CDATA[ Romain Calascibetta ]]></dc:creator><pubDate>Fri, 08 Feb 2019 00:00:00 GMT</pubDate></item><item><title><![CDATA[How configurator reads C constants]]></title><description><![CDATA[<p>Dune comes with a library to query OS-specific information, called configurator.
It is able to evaluate C expressions and turn them into OCaml values.
Surprisingly, it even works when compiling for a different architecture. How can
it do that?</p>
]]></description><link>https://tarides.com/blog/2019-01-03-how-configurator-reads-c-constants</link><guid isPermaLink="false">https://tarides.com/blog/2019-01-03-how-configurator-reads-c-constants.html</guid><dc:creator><![CDATA[ Etienne Millon ]]></dc:creator><pubDate>Thu, 03 Jan 2019 00:00:00 GMT</pubDate></item><item><title><![CDATA[MirageOS, towards a smaller and safer OS]]></title><description><![CDATA[<p>Presentation about MirageOS in Lambda World Cadìz on October 26th</p>
]]></description><link>https://tarides.com/blog/2018-12-06-mirageos-towards-a-smaller-and-safer-os</link><guid isPermaLink="false">https://tarides.com/blog/2018-12-06-mirageos-towards-a-smaller-and-safer-os.html</guid><dc:creator><![CDATA[ Romain Calascibetta ]]></dc:creator><pubDate>Thu, 06 Dec 2018 00:00:00 GMT</pubDate></item><item><title><![CDATA[ocaml-git 2.0]]></title><description><![CDATA[<p>I'm very happy to announce a new major release of <code>ocaml-git</code> (2.0).
This release is a 2-year effort to get a revamped
streaming API offering full control over memory
allocation. This new version also adds production-ready implementations of
the wire protocol: <code>git push</code> and <code>git pull</code> now work very reliably
using the raw Git and smart HTTP protocols (SSH support will come
soon). <code>git gc</code> is also implemented, and all of the basic bricks are
now available to create Git servers. MirageOS support is available
out-of-the-box.</p>
<p>Two years ago, we decided to rewrite <code>ocaml-git</code> and split it into
standalone libraries. More details about these new libraries are also
given below.</p>
<p>But first, let's focus on <code>ocaml-git</code>'s new design. The primary goal was
to fix memory consumption issues that our users noticed with the previous version,
and to make <code>git push</code> work reliably. We also took care
not to break the API too much, to ease the transition for current users.</p>
<h2>Controlled allocations</h2>
<p>There is a big difference in the way <code>ocaml-git</code> and <code>git</code>
are designed: <code>git</code> is a short-lived command-line tool which does not
care that much about allocation policies, whereas we wanted to build a
library that can be linked with long-lived Git client and/or server
applications. We had to make some (performance) compromises to support
that use-case, in exchange for tighter allocation policies and hence
more predictable memory consumption patterns.
Other Git libraries such as <a href="https://libgit2.org/">libgit2</a>
also have to <a href="https://libgit2.org/security/">deal</a> with similar concerns.</p>
<p>In order to keep tight control over the allocated memory, we decided to
use <a href="https://github.com/mirage/decompress">decompress</a> instead of
<code>camlzip</code>. <code>decompress</code> lets users provide their own buffers
instead of allocating dynamically, which gives us better
control over memory consumption. See below for more details on <code>decompress</code>.</p>
<p>We also used <a href="https://github.com/inhabitedtype/angstrom">angstrom</a> and
<a href="https://github.com/mirage/encore">encore</a> to provide a streaming interface
to encode and decode Git objects. The streaming API is currently hidden
from the end user, but it helped us a lot in building abstractions and, again, in
managing the allocation policy of the library.</p>
<h2>Complete PACK file support (including GC)</h2>
<p>In order to find the right abstraction for manipulating pack files in
a long-lived application, we experimented with
<a href="https://github.com/dinosaure/sirodepac">various</a>
<a href="https://github.com/dinosaure/carton">prototypes</a>. We haven't found the
right abstractions just yet, but we believe the PACK format could be useful
to store any kind of data in the future (not only Git objects).</p>
<p>We implemented <code>git gc</code> by following the same heuristics as
<a href="https://github.com/git/git/blob/master/Documentation/technical/pack-heuristics.txt">Git</a>
to compress pack files, and
we produce pack files of similar size, since <code>decompress</code> has a good compression
ratio. We use <code>duff</code>, our own implementation of <code>xdiff</code>, the
binary diff algorithm used by Git (more details on <code>duff</code> below).
We also had to re-implement the streaming algorithm to reconstruct <code>idx</code> files on
the fly when receiving a pack file over the network.</p>
<p>One notable feature of our compression algorithms is that they do not
assume that the underlying system implements POSIX: hence,
they can work fully in-memory, in a browser using web storage or
inside a MirageOS unikernel with <a href="https://github.com/mirage/wodan">wodan</a>.</p>
<h2>Production-ready push and pull</h2>
<p>We re-implemented and abstracted the <a href="https://github.com/git/git/blob/master/Documentation/technical/http-protocol.txt">Git Smart protocol</a>, and used that
abstraction to make <code>git push</code> and <code>git pull</code> work over HTTP.  By
default we provide a <a href="https://github.com/mirage/cohttp">cohttp</a>
implementation, but users can use their own, for instance one based on
<a href="https://github.com/inhabitedtype/httpaf">httpaf</a>.
As a proof of concept, the <a href="https://github.com/mirage/ocaml-git/pull/227">initial
pull-request</a> of <code>ocaml-git</code> 2.0 was
created using this new implementation; moreover, we wrote a
prototype of a Git client compiled with <code>js_of_ocaml</code>, which was able
to run <code>git pull</code> over HTTP inside a browser!</p>
<p>Finally, that implementation will allow MirageOS unikernels to synchronize their
internal state with external Git stores (hosted for instance on GitHub)
using push/pull mechanisms. We also expect to release a server-side implementation
of the smart HTTP protocol, so that the state of any unikernel can be inspected
via <code>git pull</code>. Stay tuned for more updates on that topic!</p>
<h2>Standalone Dependencies</h2>
<p>Below you can find the details of the new stable releases of libraries that are
used by <code>ocaml-git</code> 2.0.</p>
<h3><code>optint</code> and <code>checkseum</code></h3>
<p>In some parts of <code>ocaml-git</code>, we need to compute a Cyclic
Redundancy Check (CRC) value, which is a 32-bit integer. <code>optint</code> provides
an abstraction for it that is represented as an unboxed integer on 64-bit
architectures and as a boxed <code>int32</code> value on 32-bit architectures.</p>
<p><code>checkseum</code> relies on <code>optint</code> and provides implementations of 3 checksum algorithms:</p>
<ul>
<li>Adler32 (used by <code>zlib</code> format)</li>
<li>CRC32 (used by <code>gzip</code> format and <code>git</code>)</li>
<li>CRC32-C (used by <code>wodan</code>)</li>
</ul>
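<p>To make the first of these checksums concrete, here is a minimal, self-contained sketch of Adler-32 in plain OCaml. It is only an illustration: it does not use the API of <code>checkseum</code> or <code>optint</code>, the function name <code>adler32</code> is ours, and it assumes a 64-bit platform where a native <code>int</code> can hold a 32-bit value.</p>
<pre><code>(* Adler-32 as specified in RFC 1950: two running sums modulo 65521. *)
let adler32 (s : string) : int =
  let modulus = 65521 in
  let a = ref 1 and b = ref 0 in
  String.iter
    (fun c -&gt;
      a := (!a + Char.code c) mod modulus;
      b := (!b + !a) mod modulus)
    s;
  (* Assumption: native ints are 63 bits wide, so the result fits unboxed. *)
  (!b lsl 16) lor !a

(* prints 11e60398 *)
let () = Printf.printf "%08x\n" (adler32 "Wikipedia")
</code></pre>
<p>On a 32-bit platform the same value would not fit in a native <code>int</code>, which is the situation <code>optint</code> is designed to hide.</p>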
<p><code>checkseum</code> uses the <em>linking trick</em>: this means that users of the
library program against an abstract API (only the <code>cmi</code> is provided);
at link-time, users have to select which implementation to use:
<code>checkseum.c</code> (the C implementation) or <code>checkseum.ocaml</code> (the OCaml
implementation). The process is currently a bit cumbersome, but an upcoming
<code>dune</code> release will make it much more transparent to users.</p>
<h3><code>encore</code> (/<em>angkor</em>/)</h3>
<p>In <code>git</code>, we work with Git <em>objects</em> (<em>tree</em>, <em>blob</em> or
<em>commit</em>). These objects are encoded in a specific format. Then,
the hash of each object is computed from the encoded
result to get a unique identifier. For example, the hash of your last commit is:
<code>sha1(encode(commit))</code>.</p>
<p>A common operation in <code>git</code> is to decode Git objects from an encoded
representation of them (especially in <code>.git/objects/*</code> as a <em>loose</em>
file) and restore them in another part of your Git repository (like in a
PACK file or on the command-line).</p>
<p>Hence, we need to ensure that encoding is always deterministic, and
that decoding an encoded Git object always gives back the original value, i.e. there is
an <em>isomorphism</em> between the decoder and the encoder.</p>
<pre><code><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">decoder</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">&lt;.&gt;</span><span class="ocaml-source"> </span><span class="ocaml-source">encoder</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source"> </span><span class="ocaml-source">value</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-source">value</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-source">id</span><span class="ocaml-source">
</span><span class="ocaml-keyword">let</span><span class="ocaml-source"> </span><span class="ocaml-entity-name-function-binding">encoder</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">&lt;.&gt;</span><span class="ocaml-source"> </span><span class="ocaml-source">decoder</span><span class="ocaml-source"> </span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source"> </span><span class="ocaml-support-type">string</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-support-type">string</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">=</span><span class="ocaml-source"> </span><span class="ocaml-source">id</span><span class="ocaml-source">
</span></code></pre>
<p><a href="https://github.com/mirage/encore">encore</a> is a library in which you
can describe a format (like the Git format) and, from that description, derive a
streaming decoder <strong>and</strong> encoder that are isomorphic by
construction.</p>
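<p>The round-trip property can be sketched with a hand-written pair of functions. This toy example does not use <code>encore</code> (where both directions would be derived from a single description); the names <code>encode</code> and <code>decode</code> are ours, and it only handles a blob in the loose-object layout <code>blob &lt;size&gt;\000&lt;content&gt;</code>, before any compression.</p>
<pre><code>(* Toy encoder for a Git-style blob: "blob &lt;size&gt;\000&lt;content&gt;". *)
let encode (content : string) : string =
  "blob " ^ string_of_int (String.length content) ^ "\000" ^ content

(* Matching decoder: it only accepts exactly what [encode] can produce. *)
let decode (encoded : string) : string option =
  match String.index_opt encoded '\000' with
  | None -&gt; None
  | Some nul -&gt; (
      let header = String.sub encoded 0 nul in
      let body =
        String.sub encoded (nul + 1) (String.length encoded - nul - 1)
      in
      match String.split_on_char ' ' header with
      | [ "blob"; size ] when size = string_of_int (String.length body) -&gt;
          Some body
      | _ -&gt; None)

let () =
  (* decoder after encoder is the identity on values *)
  let value = "hello, world\n" in
  assert (decode (encode value) = Some value);
  (* encoder after decoder is the identity on well-formed encodings *)
  let encoded = "blob 3\000abc" in
  assert (Option.map encode (decode encoded) = Some encoded)
</code></pre>
<p>Keeping two hand-written functions in sync like this is easy to get wrong as the format grows, which is the problem a single shared description is meant to avoid.</p>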
<h3><code>duff</code></h3>
<p><a href="https://github.com/mirage/duff">duff</a> is a pure implementation in
OCaml of the <code>xdiff</code> algorithm.
Git stores an optimized representation of your repository in a
PACK file. This format uses a binary diff algorithm called <code>xdiff</code>
to compress binary data. <code>xdiff</code> takes a source A and a target B and tries
to find common sub-strings between A and B.</p>
<p>This is done by computing a Rabin fingerprint of the source A and applying it to the
target B. The fingerprint can then be used to produce a lightweight
representation of B in terms of sub-strings of A.</p>
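<p>As a rough illustration of what "B in terms of sub-strings of A" means, the sketch below rebuilds a target from a source and a list of operations. The <code>op</code> type and the <code>apply</code> function are hypothetical names chosen for this example; they are neither the API of <code>duff</code> nor Git's binary delta encoding, and the fingerprinting step that finds the common sub-strings is left out.</p>
<pre><code>(* A delta describes the target as pieces copied from the source plus
   literal data that does not occur in the source. *)
type op =
  | Copy of int * int (* offset and length within the source *)
  | Insert of string (* literal bytes *)

(* Rebuild the target by replaying the operations in order. *)
let apply (source : string) (delta : op list) : string =
  let buf = Buffer.create (String.length source) in
  List.iter
    (function
      | Copy (off, len) -&gt; Buffer.add_substring buf source off len
      | Insert data -&gt; Buffer.add_string buf data)
    delta;
  Buffer.contents buf

let () =
  let source = "the quick brown fox" in
  let delta = [ Copy (0, 10); Insert "red"; Copy (15, 4) ] in
  (* prints "the quick red fox" *)
  print_endline (apply source delta)
</code></pre>
<p>Here only the three bytes of <code>"red"</code> have to be stored literally; everything else is a reference into the source.</p>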
<p><code>duff</code> implements this algorithm (with Git's additional constraints
on the size of the sliding window) in OCaml. It provides a
small binary <code>xduff</code> that complies with the format of Git without the <code>zlib</code>
layer.</p>
<pre><code><span class="sh-source">$ xduff diff </span><span class="sh-support-function-builtin">source</span><span class="sh-source"> target </span><span class="sh-keyword-operator-redirect">&gt;</span><span class="sh-source"> target.xduff
</span><span class="sh-source">$ xduff patch </span><span class="sh-support-function-builtin">source</span><span class="sh-source"> </span><span class="sh-keyword-operator-redirect">&lt;</span><span class="sh-source"> target.xduff </span><span class="sh-keyword-operator-redirect">&gt;</span><span class="sh-source"> target.new
</span><span class="sh-source">$ diff target target.new
</span><span class="sh-source">$ </span><span class="sh-support-function-builtin">echo</span><span class="sh-source"> </span><span class="sh-punctuation-definition-variable">$</span><span class="sh-variable-other-special">?</span><span class="sh-source">
</span><span class="sh-source">0
</span></code></pre>
<h3><code>decompress</code></h3>
<p><a href="https://github.com/mirage/decompress">decompress</a>
is a pure implementation in OCaml of <code>zlib</code> and
<code>rfc1951</code>. You can compress and decompress data streams; Git uses
this compression for both <em>loose</em> files and PACK files.</p>
<p>It provides a non-blocking interface and is easily usable in a server
context. Indeed, the implementation never allocates and only relies on
what the user provides (<code>window</code>, input and output buffer). The
distribution also provides a simple example of how to use <code>decompress</code>:</p>
<pre><code><span class="ocaml-keyword-other">val</span><span class="ocaml-source"> </span><span class="ocaml-source">inflate</span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source"> ?</span><span class="ocaml-source">level</span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-support-type">int</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-support-type">string</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-support-type">string</span><span class="ocaml-source">
</span><span class="ocaml-keyword-other">val</span><span class="ocaml-source"> </span><span class="ocaml-source">deflate</span><span class="ocaml-keyword-other-ocaml punctuation-other-colon punctuation">:</span><span class="ocaml-source"> </span><span class="ocaml-support-type">string</span><span class="ocaml-source"> </span><span class="ocaml-keyword-operator">-&gt;</span><span class="ocaml-source"> </span><span class="ocaml-support-type">string</span><span class="ocaml-source">
</span></code></pre>
<h3><code>digestif</code></h3>
<p><a href="https://github.com/mirage/digestif">digestif</a> is a toolbox providing
many implementations of hash algorithms such as:</p>
<ul>
<li>MD5</li>
<li>SHA1</li>
<li>SHA224</li>
<li>SHA256</li>
<li>SHA384</li>
<li>SHA512</li>
<li>BLAKE2B</li>
<li>BLAKE2S</li>
<li>RIPEMD160</li>
</ul>
<p>Like <code>checkseum</code>, <code>digestif</code> uses the linking trick: from a
shared interface, it provides 2 implementations, in C (<code>digestif.c</code>)
and OCaml (<code>digestif.ocaml</code>).</p>
<p>Regarding Git, we use the SHA1 implementation, and we are ready to
migrate <code>ocaml-git</code> to BLAKE2{B,S} as the Git core team expects. In
the OCaml world, that is just a <em>functor</em> application with
another implementation.</p>
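<p>The following sketch shows what "just a functor application" looks like in practice. Everything in it is an assumption made for illustration: the <code>HASH</code> signature, the <code>Make_store</code> functor and the <code>Toy</code> hash are hypothetical, and MD5 from the standard library's <code>Digest</code> module stands in for a real SHA1 or BLAKE2 implementation. It is not the interface of <code>digestif</code> or <code>ocaml-git</code>.</p>
<pre><code>module type HASH = sig
  val name : string
  val digest_hex : string -&gt; string
end

(* MD5 from the standard library stands in for a real SHA1 implementation. *)
module Md5 : HASH = struct
  let name = "md5"
  let digest_hex s = Digest.to_hex (Digest.string s)
end

(* A deliberately weak toy hash, only here as a second implementation. *)
module Toy : HASH = struct
  let name = "toy"

  let digest_hex s =
    let h = ref 0 in
    String.iter (fun c -&gt; h := ((!h * 31) + Char.code c) land 0xFFFFFF) s;
    Printf.sprintf "%06x" !h
end

(* The content-addressed store is written once, against the signature. *)
module Make_store (H : HASH) = struct
  let table : (string, string) Hashtbl.t = Hashtbl.create 16

  let add content =
    let key = H.digest_hex content in
    Hashtbl.replace table key content;
    key

  let find key = Hashtbl.find_opt table key
end

(* Switching hash algorithm is one functor application. *)
module Store_md5 = Make_store (Md5)
module Store_toy = Make_store (Toy)

let () =
  let k1 = Store_md5.add "hello" in
  let k2 = Store_toy.add "hello" in
  assert (Store_md5.find k1 = Some "hello");
  assert (Store_toy.find k2 = Some "hello");
  Printf.printf "%s: %s\n%s: %s\n" Md5.name k1 Toy.name k2
</code></pre>
<p>The store code never mentions a concrete algorithm, so the choice of hash stays a one-line decision at the point where the functor is applied.</p>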
<h3><code>eqaf</code></h3>
<p>Some applications require that secret values are compared in constant
time. Functions like <code>String.equal</code> do not have this property, so we
decided to provide a small package, <a href="https://github.com/mirage/eqaf">eqaf</a>,
with a <em>constant-time</em> <code>equal</code> function.
<code>digestif</code> uses it to check the equality of hashes. It also exposes
<code>unsafe_compare</code> if you don't care about timing attacks in your application.</p>
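<p>The usual idea behind such a function is to look at every byte regardless of where the first difference is, instead of returning early. The sketch below shows that idea only; it is not the code of <code>eqaf</code>, it reveals whether the lengths differ, and nothing guarantees that a given compiler and platform keep its timing constant, which is why measuring matters.</p>
<pre><code>(* Compare two strings without stopping at the first differing byte. *)
let equal (a : string) (b : string) : bool =
  let len = String.length a in
  if len &lt;&gt; String.length b then false
  else begin
    (* Accumulate the XOR of every pair of bytes; the result is 0 only if
       all pairs are equal. *)
    let diff = ref 0 in
    for i = 0 to len - 1 do
      diff := !diff lor (Char.code a.[i] lxor Char.code b.[i])
    done;
    !diff = 0
  end

let () =
  assert (equal "secret" "secret");
  assert (not (equal "secret" "secreT"))
</code></pre>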
<p>Of course, most of the work on this package did not go into the
implementation of the <code>equal</code> function but into a way to check the
constant-time assumption on this function. Using this, we ran a
<a href="https://github.com/mirage/eqaf/tree/master/test">benchmark</a> on Linux,
Windows and Mac to check it.</p>
<p>An interesting fact is that after various experiments, we replaced the
initial implementation in C (extracted from OpenBSD's <a href="https://man.openbsd.org/timingsafe_bcmp.3">timingsafe_memcmp</a>) with an OCaml
implementation behaving in a much more predictable way on all the
tested platforms.</p>
<h2>Conclusion</h2>
<p>The upcoming version 2.0 of <a href="https://irmin.org">Irmin</a> is using ocaml-git
to create small applications that <a href="https://github.com/mirage/irmin/blob/master/examples/push.ml">push and pull their state
to GitHub</a>.
We think that Git offers a very nice model to persist data for distributed
applications and we hope that more people will use ocaml-git to experiment
and manipulate application data in Git. Please
<a href="https://github.com/mirage/ocaml-git/issues">send us</a> your feedback!</p>
]]></description><link>https://tarides.com/blog/2018-10-19-ocaml-git-2-0</link><guid isPermaLink="false">https://tarides.com/blog/2018-10-19-ocaml-git-2-0.html</guid><dc:creator><![CDATA[ Romain Calascibetta ]]></dc:creator><pubDate>Fri, 19 Oct 2018 00:00:00 GMT</pubDate></item><item><title><![CDATA[OCamlFormat 0.8]]></title><description><![CDATA[<p>We are proud to announce the release of OCamlFormat 0.8 (available on opam). To ease the transition from the previous 0.7 release here are some highlights of the new features of this release. The <a href="https://github.com/ocaml-ppx/ocamlformat/blob/v0.8/CHANGES.md#08-2018-10-09">full changelog</a> is available on the project repository.</p>
<h2>Precedence of options</h2>
<p>In the previous version, configuration from <code>.ocamlformat</code> files could override command-line options. Version 0.8 fixes this: the OCamlFormat configuration is first established by reading <code>.ocamlformat</code> and <code>.ocp-indent</code> files:</p>
<pre><code>margin = 77
wrap-comments = true
</code></pre>
<p>By default, these files in the current directory and all ancestor directories of each input file are used, as well as the global configuration defined in <code>$XDG_CONFIG_HOME/ocamlformat</code>. The global <code>$XDG_CONFIG_HOME/ocamlformat</code> configuration has the lowest priority; then, the closer the directory is to the processed file, the higher the priority. In each directory, both <code>.ocamlformat</code> and <code>.ocp-indent</code> files are read, with <code>.ocamlformat</code> files having the higher priority.</p>
<p>For now <code>ocp-indent</code> options support is very partial and is expected to be extended in the future.</p>
<p>Then the parameters can be overridden with the <code>OCAMLFORMAT</code> environment variable:</p>
<pre><code>OCAMLFORMAT=field-space=tight,type-decl=compact
</code></pre>
<p>and finally the parameters can be overridden again with command-line parameters.</p>
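<p>The resulting precedence can be modelled as a list of layers, ordered from lowest to highest priority, where a later layer overrides an earlier one. The OCaml sketch below is only a model of the rules described above, not OCamlFormat's implementation: the <code>resolve</code> function is hypothetical, and the <code>margin</code> values given to the global file and the command line are made up for the example.</p>
<pre><code>(* Each layer is a list of (option, value) pairs. *)
type layer = (string * string) list

(* Merge layers given from lowest to highest priority: a later layer
   overrides the options set by the earlier ones. *)
let resolve (layers : layer list) : layer =
  let table = Hashtbl.create 16 in
  List.iter
    (List.iter (fun (key, value) -&gt; Hashtbl.replace table key value))
    layers;
  List.sort compare (Hashtbl.fold (fun k v acc -&gt; (k, v) :: acc) table [])

let () =
  (* Hypothetical global configuration, lowest priority. *)
  let global = [ ("margin", "80") ] in
  (* The .ocamlformat file shown above. *)
  let project_file = [ ("margin", "77"); ("wrap-comments", "true") ] in
  (* The OCAMLFORMAT environment variable shown above. *)
  let env_var = [ ("field-space", "tight"); ("type-decl", "compact") ] in
  (* Hypothetical command-line option, highest priority. *)
  let command_line = [ ("margin", "100") ] in
  resolve [ global; project_file; env_var; command_line ]
  |&gt; List.iter (fun (k, v) -&gt; Printf.printf "%s = %s\n" k v)
</code></pre>
<p>In this model, <code>margin</code> ends up with the command-line value, while the options that only appear in a lower layer are kept unchanged.</p>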
<h2>Reading input from stdin</h2>
<p>It is now possible to read the input from stdin instead of OCaml files. The following command invokes OCamlFormat, which reads its input from the pipe:</p>
<pre><code>echo "let f x = x + 1" | ocamlformat --name a.ml -
</code></pre>
<p>The <code>-</code> on the command line indicates that <code>ocamlformat</code> should read from stdin instead of expecting input files. It is then necessary to use the <code>--name</code> option to designate the input (<code>a.ml</code> here).</p>
<h2>Preset profiles</h2>
<p>Preset profiles allow you to have a consistent configuration without needing to tune every option.</p>
<p>Preset profiles set all options, overriding lower priority configuration. A preset profile can be set using the <code>--profile</code> (or <code>-p</code>) option. You can pass the option <code>--profile=&lt;name&gt;</code> on the command line or add <code>profile = &lt;name&gt;</code> in an <code>.ocamlformat</code> configuration file.</p>
<p>The available profiles are:</p>
<ul>
<li><code>default</code> sets each option to its default value</li>
<li><code>compact</code> sets options for a generally compact code style</li>
<li><code>sparse</code> sets options for a generally sparse code style</li>
<li><code>janestreet</code> is the profile used at Jane Street</li>
</ul>
<p>To get a better feel of it, here is the formatting of the <a href="https://github.com/ocaml/ocaml/blob/trunk/typing/env.ml#L227-L234"><code>mk_callback</code></a> function from the OCaml compiler with the <code>compact</code> profile:</p>
<pre><code>let mk_callback rest name desc = function
  | None -&gt; nothing
  | Some f -&gt; (
      fun () -&gt;
        match rest with
        | [] -&gt; f name None
        | (hidden, _) :: _ -&gt; f name (Some (desc, hidden)) )
</code></pre>
<p>then the same function formatted with the <code>sparse</code> profile:</p>
<pre><code>let mk_callback rest name desc = function
  | None -&gt;
      nothing
  | Some f -&gt;
      fun () -&gt;
        ( match rest with
        | [] -&gt;
            f name None
        | (hidden, _) :: _ -&gt;
            f name (Some (desc, hidden)) )
</code></pre>
<h2>Project root</h2>
<p>The project root of an input file is taken to be the nearest ancestor directory that contains a <code>.git</code> or <code>.hg</code> or <code>dune-project</code> file.
If the new option <code>--disable-outside-detected-project</code> is set, <code>.ocamlformat</code> configuration files are not read outside of the current project. If no configuration file is found, formatting is disabled.</p>
<p>A new option, <code>--root</code>, allows you to specify the root directory for a project. If specified, OCamlFormat only takes into account <code>.ocamlformat</code> configuration files inside the root directory and its subdirectories.</p>
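<p>The root detection rule can be sketched in a few lines of OCaml. This is an illustration of the rule as stated above, not OCamlFormat's code; the <code>find_root</code> function and the <code>markers</code> list are names chosen for the example.</p>
<pre><code>(* Entries whose presence marks a directory as a project root. *)
let markers = [ ".git"; ".hg"; "dune-project" ]

(* Walk up from [dir] and return the nearest ancestor (or [dir] itself)
   that contains one of the markers, if any. *)
let rec find_root (dir : string) : string option =
  if List.exists (fun m -&gt; Sys.file_exists (Filename.concat dir m)) markers
  then Some dir
  else
    let parent = Filename.dirname dir in
    (* [Filename.dirname] is a fixed point at the filesystem root. *)
    if parent = dir then None else find_root parent

let () =
  match find_root (Sys.getcwd ()) with
  | Some root -&gt; Printf.printf "project root: %s\n" root
  | None -&gt; print_endline "no project root detected"
</code></pre>
<p><code>Sys.file_exists</code> returns <code>true</code> for directories as well as regular files, so the same check covers a <code>.git</code> directory and a <code>dune-project</code> file.</p>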
<h2>Credits</h2>
<p>This release also contains many other changes and bug fixes that we cannot detail here. Check out the <a href="https://github.com/ocaml-ppx/ocamlformat/blob/v0.8/CHANGES.md#08-2018-10-09">full changelog</a>.</p>
<p>Special thanks to our maintainers and contributors for this release: David Allsopp, Josh Berdine, Hugo Heuzard, Brandon Kase, Anil Madhavapeddy and Guillaume Petiot.</p>
]]></description><link>https://tarides.com/blog/2018-10-17-ocamlformat-0-8</link><guid isPermaLink="false">https://tarides.com/blog/2018-10-17-ocamlformat-0-8.html</guid><dc:creator><![CDATA[ Guillaume Petiot ]]></dc:creator><pubDate>Wed, 17 Oct 2018 00:00:00 GMT</pubDate></item><item><title><![CDATA[OCaml Workshop 2018]]></title><description><![CDATA[<p>The OCaml Users and Developers Workshop brings together industrial
users of OCaml with academics and hackers who are working on extending
the language, type system and tools. OCaml 2018 was held on September
27th, 2018 in St. Louis, Missouri, USA, colocated with ICFP 2018.</p>
<p><strong>Check Tarides' talks: <a href="https://docs.google.com/presentation/d/e/2PACX-1vRnRiGeBWC6ctpSge0gTFuxprNTiS2qtNpvax_A8pD6Ob5ySfL9_SlPKCIoLDCbmsYjTAkMFnlUwqSl/pub?start=false&amp;loop=false&amp;delayms=3000&amp;slide=id.p1">RFCs, all the way down!</a> and <a href="https://speakerdeck.com/avsm/the-ocaml-platform-1-dot-0-2018">The OCaml Platform 1.0</a>.
</strong></p>
]]></description><link>https://tarides.com/blog/2018-09-27-ocaml-workshop-2018</link><guid isPermaLink="false">https://tarides.com/blog/2018-09-27-ocaml-workshop-2018.html</guid><dc:creator><![CDATA[ Romain Calascibetta ]]></dc:creator><pubDate>Thu, 27 Sep 2018 00:00:00 GMT</pubDate></item><item><title><![CDATA[Dune 1.2.0]]></title><description><![CDATA[<p>After a tiny but important patch release as 1.1.1, the dune team is thrilled to
announce the release of dune 1.2.0! Here are some highlights of the new
features in that version. The full list of changes can be found <a href="https://github.com/ocaml/dune/blob/e3af33b43a87d7fa2d15f7b41d8bd942302742ec/CHANGES.md#120-14092018">in the dune
repository</a>.</p>
<h2>Watch mode</h2>
<p>When developing, it is common to edit a file, run a build, read the error
message, and fix the error. Since this is a very tight loop and developers are
doing this hundreds or thousands of times a day, it is crucial to have the
quickest feedback possible.</p>
<p><code>dune build</code> and <code>dune runtest</code> now accept <a href="https://dune.readthedocs.io/en/latest/usage.html#watch-mode">a <code>-w</code>
flag</a> that will
watch the filesystem for changes, and trigger a new build.</p>
<h2>Better error messages</h2>
<p>Inspired by the great work done in
<a href="https://elm-lang.org/blog/compiler-errors-for-humans">Elm</a> and
<a href="https://reasonml.github.io/blog/2017/08/25/way-nicer-error-messages.html">bucklescript</a>,
dune now displays the relevant part of the file in error messages.</p>
<pre><code> % cat dune
(executable
 (name my_program)
 (librarys cmdliner)
)
 % dune build
File "dune", line 3, characters 2-10:
3 |  (librarys cmdliner)
      ^^^^^^^^
Error: Unknown field librarys
Hint: did you mean libraries?
</code></pre>
<p>Many messages have also been improved, in particular to help users <a href="https://dune.readthedocs.io/en/latest/migration.html#check-list">switching
from the <code>jbuild</code> format to the <code>dune</code>
format</a>.</p>
<h2>dune unstable-fmt</h2>
<p>Are you confused about how to format S-expressions? You are not alone.
That is why we are gradually introducing a formatter for <code>dune</code> files. It can
transform a valid but ugly <code>dune</code> into one that is consistently formatted.</p>
<pre><code> % cat dune
(executable( name ls) (libraries cmdliner)
(preprocess (pps ppx_deriving.std)))
 % dune unstable-fmt dune
(executable
 (name ls)
 (libraries cmdliner)
 (preprocess
  (pps ppx_deriving.std)
 )
)
</code></pre>
<p>This feature is not ready yet for the end user (hence the <code>unstable</code> part),
and in particular the concrete syntax is not stable yet.
But having it already in the code base will make it possible to build useful
integrations with <code>dune</code> itself (to automatically reformat all dune files in a
project, for example) and common editors, so that they format <code>dune</code> files on
save.</p>
<h2>First class support of findlib plugins</h2>
<p>It is now easy to support findlib plugins by just adding the <code>findlib.dynload</code>
library dependency. Then you can use the <code>Fl_dynload</code> module in your code, which
will automatically do the right thing. <a href="https://dune.readthedocs.io/en/latest/advanced-topics.html#dynamic-loading-of-packages">A complete example can be found in the
dune manual</a>.</p>
<h2>Promote only certain files</h2>
<p>The <code>dune promote</code> command now accepts a list of files. This is useful to
promote just the file that is open in a text editor, for example. Some emacs
bindings are provided to do this, which works particularly well with
<a href="https://dune.readthedocs.io/en/latest/tests.html#inline-expectation-tests">inline expectation tests</a>.</p>
<h2>Deprecation message for (wrapped) modes</h2>
<p>By default, libraries are <code>(wrapped true)</code>, which means that they expose a
single OCaml module (source files are exposed as submodules of this main
module). This is usually desired as it makes link-time name collisions less
likely. However, a lot of libraries are using <code>(wrapped false)</code> (expose all
source files as modules) to keep compatibility.</p>
<p>It can be challenging to transition from <code>(wrapped false)</code> to <code>(wrapped true)</code>
because it breaks compatibility. That is why we have added <code>(wrapped (transition "message"))</code> which will generate wrapped modules but keep unwrapped
modules with a deprecation message to help coordinate the change.</p>
<h2>Credits</h2>
<p>Special thanks to our contributors for this release: @aantron, @anuragsoni,
@bobot, @ddickstein, @dra27, @drjdn, @hongchangwu, @khady, @kodek16,
@prometheansacrifice and @ryyppy.</p>
]]></description><link>https://tarides.com/blog/2018-09-06-dune-1-2-0</link><guid isPermaLink="false">https://tarides.com/blog/2018-09-06-dune-1-2-0.html</guid><dc:creator><![CDATA[ Etienne Millon ]]></dc:creator><pubDate>Thu, 06 Sep 2018 00:00:00 GMT</pubDate></item><item><title><![CDATA[Station F]]></title><description><![CDATA[<p>We are thrilled to have been accepted into the Founders Program's 3rd
batch at <a href="https://stationf.co/">Station F</a>! Station F is
"the only startup campus gathering a whole entrepreneurial ecosystem
under one roof" and is providing 3000+ desks and 26 international
startup programs. Our Paris offices are now located in that incredible
place, close to "métro Chevaleret" (Paris 13).</p>
<p><strong>If you are in Paris, drop us an email to visit
<a href="https://stationf.co/campus/">our beautiful campus</a>!</strong></p>
]]></description><link>https://tarides.com/blog/2018-07-17-station-f</link><guid isPermaLink="false">https://tarides.com/blog/2018-07-17-station-f.html</guid><dc:creator><![CDATA[ Thomas Gazagnaire ]]></dc:creator><pubDate>Tue, 17 Jul 2018 00:00:00 GMT</pubDate></item><item><title><![CDATA[MirageOS + Tezos funding]]></title><description><![CDATA[<p>We are excited to announce that the <a href="https://tezos.foundation">Tezos Foundation</a>
will trust Tarides to package Tezos nodes as MirageOS unikernels, which will help participants
establish nodes on the Tezos network in a more efficient and secure manner.</p>
]]></description><link>https://tarides.com/blog/2018-05-23-mirageos-tezos-funding</link><guid isPermaLink="false">https://tarides.com/blog/2018-05-23-mirageos-tezos-funding.html</guid><dc:creator><![CDATA[ Thomas Gazagnaire ]]></dc:creator><pubDate>Wed, 23 May 2018 00:00:00 GMT</pubDate></item><item><title><![CDATA[OCaml Users in Paris (OUPS)]]></title><description><![CDATA[<p>Thomas Gazagnaire gave a presentation on MirageOS to the
<a href="https://www.meetup.com/ocaml-paris/">OCaml meetup in Paris</a>.</p>
<p><strong>Check the <a href="https://gazagnaire.org/pub/2018.05.OUPS.pdf">slides</a>
for more details.</strong></p>
]]></description><link>https://tarides.com/blog/2018-05-23-ocaml-users-in-paris-oups</link><guid isPermaLink="false">https://tarides.com/blog/2018-05-23-ocaml-users-in-paris-oups.html</guid><dc:creator><![CDATA[ Thomas Gazagnaire ]]></dc:creator><pubDate>Wed, 23 May 2018 00:00:00 GMT</pubDate></item><item><title><![CDATA[Irmin usability enhancements]]></title><description><![CDATA[<p>Zach Shipko is working on improving the UI/UX for Irmin.
He is looking for <a href="https://discuss.ocaml.org/t/irmin-usability-enhancements/2017">feedback</a>
to make Irmin more accessible to potential users and clean up the rough edges for existing users.</p>
]]></description><link>https://tarides.com/blog/2018-05-18-irmin-usability-enhancements</link><guid isPermaLink="false">https://tarides.com/blog/2018-05-18-irmin-usability-enhancements.html</guid><dc:creator><![CDATA[ Zach Shipko ]]></dc:creator><pubDate>Fri, 18 May 2018 00:00:00 GMT</pubDate></item><item><title><![CDATA[Invited lecture at ENS]]></title><description><![CDATA[<p>Thomas Gazagnaire gave an invited lecture at
<a href="https://www.di.ens.fr/">the computer science department of ENS</a>,
in Paris. This was part of the system and network L3 course.</p>
<p><strong>Check the <a href="http://gazagnaire.org/ens/mirage.pdf">slides</a> (in English)
and the <a href="http://gazagnaire.org/ens/mirage.tar.gz">exercises</a> (in French).</strong></p>
]]></description><link>https://tarides.com/blog/2018-05-17-invited-lecture-at-ens</link><guid isPermaLink="false">https://tarides.com/blog/2018-05-17-invited-lecture-at-ens.html</guid><dc:creator><![CDATA[ Thomas Gazagnaire ]]></dc:creator><pubDate>Thu, 17 May 2018 00:00:00 GMT</pubDate></item><item><title><![CDATA[HotPOST'18]]></title><description><![CDATA[<p>Anil Madhavapeddy and Gemma Gordon presented our new operating system for
connected buildings: <a href="https://kcsrk.info/papers/osmose_feb_18.pdf">OSMOSE</a>
at <a href="https://hotpost18.weebly.com/">HotPOST’18</a>. OSMOSE is based on
MirageOS and Irmin and we hope to explore that area more in the coming months!</p>
]]></description><link>https://tarides.com/blog/2018-04-16-hotpost-18</link><guid isPermaLink="false">https://tarides.com/blog/2018-04-16-hotpost-18.html</guid><dc:creator><![CDATA[ Thomas Gazagnaire ]]></dc:creator><pubDate>Mon, 16 Apr 2018 00:00:00 GMT</pubDate></item><item><title><![CDATA[An Architecture for Interspatial Communication]]></title><description><![CDATA[<p>Position paper on
<a href="https://kcsrk.info/papers/osmose_feb_18.pdf">“An Architecture for Interspatial Communication”</a>
accepted to <a href="https://hotpost18.weebly.com/">HotPOST’18</a>.</p>
]]></description><link>https://tarides.com/blog/2018-02-14-an-architecture-for-interspatial-communication</link><guid isPermaLink="false">https://tarides.com/blog/2018-02-14-an-architecture-for-interspatial-communication.html</guid><dc:creator><![CDATA[ Thomas Gazagnaire ]]></dc:creator><pubDate>Wed, 14 Feb 2018 00:00:00 GMT</pubDate></item></channel></rss>