In this blog series, I will experiment with Rust as a safer and simpler C/C++ replacement. The idea is to combine a couple of C dependencies in Rust, do some work with them in Rust, and expose a final API from a Rust library via the C ABI. Then I will consume the same exported Rust methods from a number of modern languages/platforms such as C#, Python, Java, Node.js, R, Go and even back from C/C++.

I’ve already done this with C# and C-Blosc as a part of the Spreads.Native library. But in that project, I only re-exported existing C methods and used the Rust Cargo build system to automate the CMake build, which for me was more convenient than working with CMake directly. The lack of a convenient C/C++ package manager and build system is a good enough reason on its own to investigate Rust and Cargo. But here I will focus more on the code. The post is inspired by my struggles with building and integrating native libraries with .NET in a cross-platform and repeatable way and was finally triggered by this awesome talk by Ashley Mannix (@KodrAus) https://www.youtube.com/watch?v=0B1U3fVCIX0

A secondary but important goal of this exercise is to achieve an easy debugging experience for all listed languages from VS Code on Windows directly and/or via containers.

I started writing this post before writing a single line of code. It will mainly serve as a walk-through guide to my future self, but I hope that you will find it interesting.

In this part 1, I will only build native libraries from source using Rust tools, combine them in a trivial way, expose the combined code from Rust cdylib as C API, debug the code locally from windows-msvc and remotely in Docker container, and consume the combined functionality from .NET Core.

In part 2, I will add other languages that consume the simple API.

In part 3, I will write some concurrent code (a very simplified version of what my DataSpreads does) and multi-language clients that work on the same data from different processes simultaneously.

C ABI

C is probably the lowest-level language that “normal” programmers will ever have to deal with. By normal I mean those who do not write drivers, operating systems or other OS-dependent things. But very often it is the API of important, popular libraries, available via a stable C Application Binary Interface (ABI), that matters more than the power of the “portable assembler” that the C language gives. Many great, stable open-source projects are written in C by people who wrote code for NASA 20 years ago and who are, in general, orders of magnitude more proficient in C than most of us. We mortals can only use those pieces of portable and performant art in our routine work from our favorite languages and dream about such deep C knowledge. It is also impractical and very dangerous to write important code in C without multiple years of experience, and mastering C takes a lot of time.

We could consume the C ABI from practically any programming language.

The Go language is interesting because it could expose C ABI as well, but it has GC, a bigger runtime and the native boundary is expensive (e.g. 1, 2). So despite the fact that many consider Go to be the main Rust competitor for writing complete programs, Go cannot be used as a low-level C alternative.

Rust shared lib

In this part 1, we will combine three native libraries, call methods that return their versions, combine the versions and return the combined version string via C API to any external consumer.

The output should be (the last line is printed only in debug build):

LMDB: 0.9.70
Blosc: 1.15.2.dev
SQLite: 3.29.0
Rust string is dropped

This task is trivial, but it does combine the functionality of three native libraries in Rust code, returns a string owned by Rust and exposes a method to free the string from other languages.

Initial setup

Get a stable rust version from https://rustup.rs/ if you do not have it already (1.36 as of this writing). Install VSCode Rust (rls) extension and Remote Development Extension Pack.

All code below is written on Windows with MSVC toolchain and inside the default Rust Docker container (rust:1) using VSCode Remote - Containers extension.

Create the following folder structure (git commit):

src/
    [client languages folders]/
    rust/
        .cargo/
            config
        corelib/            # Run `cargo init --lib` in this folder
            ...             # Generated by `cargo init`
        Cargo.toml          # Rust Workspace
.gitignore

Our main focus is on the src/rust folder now. Add Cargo.toml file for Rust workspace that will contain any crates we will develop next. Initially it contains only the main corelib that will expose functionality to other languages.

[workspace]
members = [
    "corelib"
]

Add rust/.cargo/ folder with a config file:

[target.x86_64-pc-windows-msvc]
rustflags = ["-C", "target-feature=+crt-static"]

[target.x86_64-pc-windows-gnu]
rustflags = ["-C", "target-feature=+crt-static"]

This configuration instructs the Rust compiler to statically link the C runtime library on Windows. The effect of these settings on the binary size is quite small (around 80kb for an empty lib), but it simplifies the deployment of shared libraries produced on Windows. You could see the difference using the Dependencies utility, which is a rewrite of the legacy DependencyWalker. This tool is useful while developing a shared native library with exported methods. E.g. when something doesn’t work as expected in a weird way, it’s good to check that the methods are at least exported, to rule out that issue.
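To confirm from inside Rust code that the flag took effect, one option is to check the `crt-static` target feature, which rustc exposes to `cfg`. This is only a minimal sketch; the function name `crt_linkage` is made up for illustration and is not part of the project.

```rust
/// Reports how the C runtime is linked into the current build.
/// `target_feature = "crt-static"` is set by rustc when `+crt-static`
/// is in effect for the target being compiled.
/// `crt_linkage` is a hypothetical helper name used only in this sketch.
pub fn crt_linkage() -> &'static str {
    if cfg!(target_feature = "crt-static") {
        "static"
    } else {
        "dynamic"
    }
}
```

Printing `crt_linkage()` from a test or a small binary built inside the workspace shows whether `.cargo/config` was picked up for the active target.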

Finally, run cargo init --lib in rust/corelib/ folder to create an empty Rust library. Run cargo build and cargo test from the rust folder to ensure everything works. Binary artifacts will be placed into rust/target folder for all workspace crates from rust/Cargo.toml config file.

Important: open src/rust folder in a separate VSCode instance and work with Rust code from there. Even though you could run cargo commands from the terminal just by navigating to the folder (cd src/rust), Rust Language Service extension requires Cargo.toml at the root for autocomplete to work.

Add native dependencies

We are going to build all native dependencies ourselves instead of relying on existing cargo crates. Remember that the goal of this exercise is to use Rust as a glue for native libs that we would otherwise combine using C/C++ into a reusable shared library with C interface. We are not going to sacrifice total control over C build flags and even source code.

For example, we want LMDB key size as large as possible instead of the default 511 bytes, we want to expose some internal methods from Blosc to simplify build process and avoid building several compressors manually, and we want to build SQLite amalgamation from a custom branch and set flags that are significantly different from the ones Rusqlite crate uses.

In the end, we want to have a shared library with a C interface that exposes only methods relevant for our application. We do not care much about public API and bindings of intermediate libraries. We just need all native APIs available during design time (with autocomplete and signature hints).

LMDB

To add LMDB dependency first do the initial setup steps:

  • Add rust/lmdb-sys folder;
  • Run cargo init --lib inside it;
  • Add “lmdb-sys” to the workspace Cargo.toml;
  • Clone the upstream repository as git submodule inside rust/lmdb-sys folder.

The folder layout and workspace config file after this step should look like:

rust/
    .cargo/
    corelib/
    lmdb-sys/           # Run `cargo init --lib` in this folder
        lmdb            # git submodule with LMDB source from upstream
        ...             # Generated by `cargo init`
    Cargo.toml          # Rust Workspace: add lmdb-sys

[workspace]
members = [
    "corelib",
    "lmdb-sys",
]

To actually build LMDB library and generate Rust binding to it, edit lmdb-sys/Cargo.toml file and add cc and bindgen crates as build-dependencies and libc crate as a normal dependency. The file should look like this after the edits:

[package]
name = "lmdb-sys"
version = "0.1.0"
authors = ["Victor Baybekov <vbaybekov@gmail.com>"]
publish = false
edition = "2018"
build = "build.rs"

[lib]
name = "lmdb_sys"

[build-dependencies]
cc = "1.0"
bindgen = "0.50"

[dependencies]
libc = "0.2"

Note the build = "build.rs" line - we will add this build script next. Create a file build.rs next to the Cargo.toml file with the following content:

extern crate cc;

use std::env;
use std::path::PathBuf;

fn main() {
    let mut lmdb: PathBuf = PathBuf::from(&env::var("CARGO_MANIFEST_DIR").unwrap());
    lmdb.push("lmdb");
    lmdb.push("libraries");
    lmdb.push("liblmdb");

    cc::Build::new()
        .file(lmdb.join("mdb.c"))
        .file(lmdb.join("midl.c"))
        .define("MDB_MAXKEYSIZE", Some("0")) // Set max key size to max computed value instead of default 511
        .opt_level(2) // https://github.com/LMDB/lmdb/blob/LMDB_0.9.21/libraries/liblmdb/Makefile#L25
        .static_crt(true)
        .compile("liblmdb.a");

    let bindings = bindgen::Builder::default()
        .header("wrapper.h")
        .generate_comments(true)
        .use_core()
        .ctypes_prefix("libc")
        .whitelist_function("mdb_.*") // bindgen recursively adds every type these functions use, so the next line is redundant here
        .whitelist_type("mdb_.*")
        .prepend_enum_name(false)
        .constified_enum_module("MDB_cursor_op") // allows access to enum values as MDB_cursor_op::MDB_NEXT
        .generate()
        .expect("Unable to generate bindings");

    // Write the bindings to src folder to make rls autocomplete work.
    let out_path = PathBuf::from("src");
    bindings
        .write_to_file(out_path.join("bindings.rs"))
        .expect("Couldn't write bindings!");

    // Tell cargo to tell rustc to link the lmdb library.
    println!("cargo:rustc-link-lib=static=lmdb");
}

I will not go through the basic usage of cc and bindgen crates since their documentation is good and important things are commented in the snippet.

However, some aspects of bindgen setup require some explanation. First, wrapper.h file is the standard bindgen header file that includes all symbols for which we want to generate bindings. In this case, the file contains a single line:

#include "lmdb/libraries/liblmdb/lmdb.h"

Second, we always generate bindings on the fly and never edit them manually, unlike many -sys libraries. We do not care about a clean and concise bindings API and only need to access the native API from corelib. But to reduce noise we add the whitelist_function("mdb_.*") line, which makes bindgen generate only functions whose names match the mdb_.* pattern, plus all types that are used by such functions.

Third, we write the generated bindings into lmdb-sys/src/ folder instead of OUT_DIR folder as many examples recommend. This is done to make the bindings available to RLS and to make autocomplete work. The current version of RLS does not recognize bindings included via include! macro (at least on my machine).
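For orientation, here is a hand-written approximation of what the `constified_enum_module("MDB_cursor_op")` setting in the build script produces. The real code is generated by bindgen and may differ in detail. In particular, the `u32` underlying type is an assumption (bindgen picks it per target), only two constants are shown, and `is_first` is a hypothetical helper.

```rust
// Approximation of bindgen output for a constified enum module.
// The generated file contains every enum member; two are enough here.
#[allow(non_snake_case)]
pub mod MDB_cursor_op {
    // Assumption: the real alias depends on the target's C enum type.
    pub type Type = u32;
    // Values follow the declaration order of the C enum in lmdb.h.
    pub const MDB_FIRST: Type = 0;
    pub const MDB_NEXT: Type = 8;
}

/// Hypothetical helper showing how callers name the constants.
pub fn is_first(op: MDB_cursor_op::Type) -> bool {
    op == MDB_cursor_op::MDB_FIRST
}
```

The values are plain integer constants namespaced by a module, so they are written with `::` (for example `MDB_cursor_op::MDB_NEXT`) and can be passed directly to the generated `extern` functions.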

To export the bindings add the following code to the main lmdb-sys/src/lib.rs file:

#![allow(non_upper_case_globals)]
#![allow(non_camel_case_types)]
#![allow(non_snake_case)]

// We could generate bindings into OUT_DIR and it will work,
// but VSCode RLS does not see that, so we generate the file
// inside the src folder and export everything from bindings
// module. This also helps to easily find the file instead
// of searching inside target/build/... folder.
// include!(concat!(env!("OUT_DIR"), "/bindings.rs"));
pub use bindings::*;
mod bindings;

...

At this point, the local lmdb-sys package is ready for consumption by our corelib. Run cargo test to ensure that build works and all auto-generated tests pass.

(Browse repository at this point).

Blosc

To add Blosc dependency first do the initial setup steps:

  • Add rust/blosc-sys folder;
  • Run cargo init --lib inside it;
  • Add “blosc-sys” to the workspace Cargo.toml;
  • Clone the upstream repository (spreads branch) as git submodule inside rust/blosc-sys folder.

We will be using Spreads fork of the upstream Blosc repository. The fork has minimal changes and exports LZ4/Zstd/Zlib compress/decompress routines. The compressor libraries are already present in Blosc and it’s much more convenient to just export the needed pieces instead of building every library manually.

Bindgen wrapper.h contains definitions copied from Blosc. I do not remember why exactly in this case I copied rather than included the headers in Spreads.Native library, but this is also a valid approach, which sometimes is more convenient than including raw source header and allows for greater control and less work with whitelisting only required functions and types.

Blosc uses CMake, and Rust has the cmake crate that makes building CMake projects as easy as using the cc crate we used for lmdb-sys. The build.rs file is self-explanatory. One interesting thing is defining a custom generator for Windows GNU target on this line.

(Browse repository at this point).

SQLite

SQLite is a very stable library, but its development branches have some interesting features. In particular, the begin-concurrent-pnu-wal2 branch has wal2 mode and the BEGIN CONCURRENT enhancement. These features improve the performance of concurrent access to a database.

To build SQLite we need to create an amalgamation from the source. Download the source code from the required branch as a zip archive (e.g. from here) and unpack it to any folder, e.g. G:/temp/sqlite_src. Then open the folder inside Windows Subsystem for Linux and run the following commands:

cd /mnt/g/temp/sqlite_src
./configure
make sqlite3.c

Create rust/sqlite-sys/ folder and run cargo init --lib inside it. Copy sqlite3.c and sqlite3.h files to rust/sqlite-sys/sqlite/ folder and create wrapper.h and build.rs files similar to LMDB ones. Add build and compile dependencies to Cargo.toml.

Note that the build.rs file has many custom define symbols to modify SQLite defaults.

At this point, the local sqlite-sys package is ready for consumption by our corelib.

(Browse repository at this point).

Test dependencies

Now all three native dependencies are ready to use from corelib.

Add the following lines to rust/corelib/Cargo.toml to add the local packages as dependencies:

[dependencies]
lmdb-sys = { version = "*", path = "../lmdb-sys" }
blosc-sys = { version = "*", path = "../blosc-sys" }
sqlite-sys = { version = "*", path = "../sqlite-sys" }
libc = { version = "0.2"}

Then in rust/corelib/src/lib.rs add functions that return version strings of our native libs. These functions are public but not extern and only available from other Rust packages and tests.

extern crate blosc_sys;
extern crate lmdb_sys;
extern crate sqlite_sys;

use libc::*;

pub fn lmdb_version() -> String {
    unsafe {
        let mut major: c_int = Default::default();
        let mut minor: c_int = Default::default();
        let mut patch: c_int = Default::default();
        lmdb_sys::mdb_version(&mut major, &mut minor, &mut patch);
        return format!("{}.{}.{}", major, minor, patch);
    }
}

pub fn blosc_version() -> String {
    unsafe {
        let cptr = blosc_sys::blosc_get_version_string();
        return std::ffi::CStr::from_ptr(cptr).to_str().unwrap().to_owned();
    }
}

pub fn sqlite_version() -> String {
    unsafe {
        let cptr = sqlite_sys::sqlite3_libversion();
        return std::ffi::CStr::from_ptr(cptr).to_str().unwrap().to_owned();
    }
}
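The two `CStr` conversions above panic on a null pointer or on invalid UTF-8. A more defensive variant could look like the sketch below. It uses only the standard library; `c_str_to_string` and `fake_native_version` are hypothetical names, and the latter merely stands in for a native `*_version()` call.

```rust
use std::ffi::CStr;
use std::os::raw::c_char;

/// Copies a NUL-terminated C string into an owned Rust `String`.
/// Returns `None` for a null pointer; invalid UTF-8 bytes are replaced
/// with U+FFFD instead of causing a panic.
///
/// # Safety
/// `ptr` must be null or point to a valid NUL-terminated string.
pub unsafe fn c_str_to_string(ptr: *const c_char) -> Option<String> {
    if ptr.is_null() {
        return None;
    }
    // Borrow the C string, then copy it into memory owned by Rust.
    Some(unsafe { CStr::from_ptr(ptr) }.to_string_lossy().into_owned())
}

/// Hypothetical stand-in for a native function returning a static string.
pub fn fake_native_version() -> *const c_char {
    b"1.2.3\0".as_ptr() as *const c_char
}
```

Whether a lossy conversion is acceptable depends on the library; for version strings, which are plain ASCII in practice, it mostly removes a panic path.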

Then add tests in the same file:

#[cfg(test)]
mod tests {
    #[test]
    fn lmdb_works() {
        assert_eq!(super::lmdb_version(), "0.9.70");
    }

    #[test]
    fn blosc_works() {
        unsafe {
            blosc_sys::blosc_set_nthreads(6);
            let threads = blosc_sys::blosc_get_nthreads();
            assert_eq!(threads, 6);
            assert_eq!(super::blosc_version(), "1.15.2.dev");
        }
    }

    #[test]
    fn sqlite_works() {
        assert_eq!(super::sqlite_version(), "3.29.0");
    }
}

View the rust/corelib/src/lib.rs after these changes here.

Run cargo test to ensure that everything works.

Debug locally and in containers

Debugging Rust (especially on Windows with the MSVC target) was hard or impossible until quite recently, but now it is very simple and “just works”.

To simplify debugging we need an executable with a custom entry point (since I couldn’t find a way to easily debug an individual unit test).

Add a sandbox package to the workspace that depends on corelib and put the following lines in its main.rs file:

extern crate corelib;

fn main() -> () {
    println!("LMDB: {}", corelib::lmdb_version());
    println!("Blosc: {}", corelib::blosc_version());
    println!("SQLite: {}", corelib::sqlite_version());
}

Browse the changes in git commit.

Local debug with MSVC target

Add C/C++ extension to VSCode if it is not there already. Then select the Debug tab at the left bar and select “Add configuration…” option from the dropdown box next to the green “Run” icon. Then select “C/C++: (Windows) launch” command. It will populate .vscode/launch.json file with a debug config template. The only line to change there is "program". The entire config section should look like:

{
  "name": "Rust Sandbox Win",
  "type": "cppvsdbg",
  "request": "launch",
  "program": "${workspaceFolder}/target/debug/sandbox.exe",
  "args": [],
  "stopAtEntry": false,
  "cwd": "${workspaceFolder}",
  "environment": [],
  "externalConsole": false
}

After saving the file we could set breakpoints in sandbox/src/main.rs file or any corelib file and launch a debugging session. See this blog post for more details and instructions for OS X / Linux.

Debugging in containers

This requires Docker, and on Windows this unfortunately means the Windows Pro version only. Maybe there is a way to debug Rust in Docker running in WSL, but that is out of the scope of this post. I believe upgrading to Pro is simpler and cheaper in the long run if one develops software mainly on Windows but needs containers.

Debugging Rust code in Docker containers is surprisingly easy. See a VSCode tutorial on the subject.

There is a Rust sample at https://github.com/microsoft/vscode-remote-try-rust that works fine for simple Rust code. In our case with native dependencies we have to modify the original Dockerfile to install clang and cmake:

...
# Install clang
# From https://gist.github.com/twlz0ne/9faf00346a2acf10044c54f9ba0b9805#file-dockerfile
&& apt-get update && apt-get install -y gnupg wget software-properties-common \
&& wget -O - https://apt.llvm.org/llvm-snapshot.gpg.key | apt-key add - \
&& apt-add-repository "deb http://apt.llvm.org/stretch/ llvm-toolchain-stretch-6.0 main" \
&& apt-get update && apt-get install -y clang-6.0 \
#
# Install CMake
&& wget https://cmake.org/files/v3.8/cmake-3.8.2-Linux-x86_64.sh \
&& mkdir /opt/cmake \
&& sh cmake-3.8.2-Linux-x86_64.sh --prefix=/opt/cmake --skip-license \
&& ln -s /opt/cmake/bin/cmake /usr/local/bin/cmake \
...

To run the debugger in a container click on the green >< icon (at the left-bottom corner of VSCode window in the status bar) and run “Remote-Containers: Reopen Folder in Container” command. The first run is slow because Docker needs to create a container from the Dockerfile, but subsequent runs are quite fast.

When the container is ready we could start the debugger with the “Run Sandbox Container” configuration. Or we could execute cargo commands in the container in a new terminal (Ctrl+Shift+`).

All changes required for container debugging are in this commit.

Speed up build

We do not need to regenerate bindings on every build. The generation step takes noticeable time, especially for SQLite. After we have checked that all dependencies work from the shared corelib we could just comment out the binding generation step in build.rs files.

Probably there is a cleaner solution using features or something else. But bindings could only change after we update the source code or change build logic inside build.rs files, so keeping generation code commented out in the build script is a quick fix. After doing so only corelib is incrementally recompiled.
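One possible alternative to commenting code out is to let the build script decide. The sketch below, which assumes nothing beyond the standard library, regenerates only when the bindings file is missing or older than the header, and builds the `cargo:rerun-if-changed` directive that asks Cargo to rerun the script only when a given file changes. Both function names are hypothetical and are not part of the repository.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Returns true when `bindings` does not exist or is older than `header`.
pub fn bindings_are_stale(header: &Path, bindings: &Path) -> io::Result<bool> {
    let header_time = fs::metadata(header)?.modified()?;
    match fs::metadata(bindings) {
        Ok(meta) => Ok(meta.modified()? < header_time),
        // A missing bindings file always needs to be generated.
        Err(e) if e.kind() == io::ErrorKind::NotFound => Ok(true),
        Err(e) => Err(e),
    }
}

/// Builds the Cargo directive that limits build-script reruns to
/// changes of `path`; a build script would print the returned line.
pub fn rerun_if_changed(path: &str) -> String {
    format!("cargo:rerun-if-changed={}", path)
}
```

In a build.rs this would wrap the bindgen call in `if bindings_are_stale(...)`, next to a `println!` of the directive for wrapper.h and the C sources. I have not measured how much time this saves compared to commenting the step out.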

Git commit

Simple C method

We start with the simplest C API from Rust library: just return a string with versions of native dependencies. This method calls methods from each native dependency so technically it does combine native functionality using Rust and exposes it as if we were doing so in C.

But even such a simple method has complications. Rust String is not a C string (a pointer to null-terminated chars), so we need to convert it to owned CString. But the memory behind the CString is owned by Rust and we need to leak it first before returning it to the external world and to destroy the object later.

Add the following methods to rust/corelib/src/lib.rs:

use std::ffi::CString;

#[no_mangle]
pub extern "C" fn native_versions_get() -> *mut c_char {
    let s = format!("LMDB: {}\nBlosc: {}\nSQLite: {}", lmdb_version(), blosc_version(), sqlite_version());
    let cs = CString::new(s).unwrap();
    cs.into_raw()
}

#[no_mangle]
pub extern "C" fn native_versions_free(c: *mut c_char) {
    if c.is_null() {
        return;
    }
    // Rebuild the CString from the raw pointer; its memory is freed when it is dropped
    unsafe { drop(CString::from_raw(c)); }

    // Print diagnostics in debug build
    if cfg!(debug_assertions) {
        println!("Rust string is dropped");
    }
}
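Note that `CString::new(s).unwrap()` panics if the string contains an interior NUL byte. That cannot happen with the version strings here, but for arbitrary data a variant that reports failure as a null pointer may be preferable. A minimal sketch, with the hypothetical names `string_into_raw` and `string_free`:

```rust
use std::ffi::CString;
use std::os::raw::c_char;

/// Converts a Rust string into a heap-allocated C string that the caller
/// must release with `string_free`. Returns null when `s` contains an
/// interior NUL byte instead of panicking.
pub fn string_into_raw(s: String) -> *mut c_char {
    match CString::new(s) {
        Ok(cs) => cs.into_raw(), // ownership moves to the caller
        Err(_) => std::ptr::null_mut(),
    }
}

/// Reclaims a pointer produced by `string_into_raw`; null is ignored.
///
/// # Safety
/// `ptr` must be null or come from `string_into_raw`, and must not be
/// used again after this call.
pub unsafe fn string_free(ptr: *mut c_char) {
    if !ptr.is_null() {
        drop(unsafe { CString::from_raw(ptr) });
    }
}
```

Callers then have to check for null before reading the string, which is the usual convention in C APIs anyway.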

And a test:

#[test]
fn native_versions_get_works() {
    unsafe{
        let nv = super::native_versions_get();
        let cs = std::ffi::CString::from_raw(nv);
        assert_eq!(cs.to_str().unwrap(), "LMDB: 0.9.70\nBlosc: 1.15.2.dev\nSQLite: 3.29.0");
        std::mem::forget(cs);
        super::native_versions_free(nv);
    }
}

Update sandbox main method to this:

fn main() -> () {
    unsafe{
        let nv = corelib::native_versions_get();
        let cs = std::ffi::CString::from_raw(nv);
        println!("{}", cs.to_str().unwrap());
        std::mem::forget(cs); // Do not drop, we need to return it back to destructor to simulate desired behavior
        corelib::native_versions_free(nv);
    }
}

Run cargo test and cargo run to ensure that everything works.
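The `CString::from_raw` plus `std::mem::forget` pair in the test and sandbox works, but the same read can be done by borrowing the memory through `CStr`, which never takes ownership. The sketch below is self-contained, so it uses stand-in functions (`demo_string_get` and `demo_string_free`, both hypothetical) instead of the real corelib exports.

```rust
use std::ffi::{CStr, CString};
use std::os::raw::c_char;

/// Stand-in for an exported getter that leaks a Rust-owned C string.
pub extern "C" fn demo_string_get() -> *mut c_char {
    CString::new("LMDB: x\nBlosc: y").unwrap().into_raw()
}

/// Stand-in for the matching exported destructor.
pub extern "C" fn demo_string_free(p: *mut c_char) {
    if !p.is_null() {
        unsafe { drop(CString::from_raw(p)) };
    }
}

/// Reads the string through a borrowed view, then returns the pointer
/// to its destructor, as a foreign caller would.
pub fn read_and_release() -> String {
    let p = demo_string_get();
    // CStr only borrows, so no mem::forget is needed.
    let text = unsafe { CStr::from_ptr(p) }.to_str().unwrap().to_owned();
    demo_string_free(p);
    text
}
```

This mirrors what clients in other languages do: copy the bytes out, then hand the pointer back.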

Git commit

Release build

Add the following lines to rust/Cargo.toml for explicit control over build parameters:

[profile.release]
opt-level = 3
debug = false
rpath = false
lto = true
debug-assertions = false
panic = 'abort'

Run cargo build --release.
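One consequence of `panic = 'abort'` is that any panic inside an exported function terminates the host process in release builds. In profiles that unwind (the default for debug builds), a panic should still not be allowed to cross the C boundary. A defensive wrapper could look like this sketch, where `guarded_c_string` is a hypothetical helper that is not used in the repository:

```rust
use std::ffi::CString;
use std::os::raw::c_char;
use std::panic;

/// Runs `f` and converts its result into a leaked C string.
/// Returns null if `f` panics (in builds that unwind) or if the result
/// contains an interior NUL byte.
pub fn guarded_c_string<F>(f: F) -> *mut c_char
where
    F: FnOnce() -> String + panic::UnwindSafe,
{
    match panic::catch_unwind(f) {
        Ok(s) => CString::new(s)
            .map(CString::into_raw)
            .unwrap_or(std::ptr::null_mut()),
        // The panic was caught here and never reaches the C caller.
        Err(_) => std::ptr::null_mut(),
    }
}
```

With `panic = 'abort'` the process exits before `catch_unwind` can return, so the wrapper only matters for builds that unwind.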

The final shared library is rust/target/release/corelib.dll. Open that file with Dependencies utility to make sure that the two methods native_versions_get and native_versions_free are exported.

The shared library is ready for consumption from other languages.

Git commit

Clients

.NET Core

Install .NET Core SDK from here. This example uses the LTS version 2.1.

.NET has probably the easiest way to use native dependencies via P/Invoke. Open the dotnet folder in a separate VSCode window and run dotnet new console in the terminal. Rename dotnet.csproj to RustTheNewC.csproj (optional, just to match this example).

Add the following piece to the project file:

<ItemGroup>
  <None Include="..\rust\target\$(Configuration)\corelib.dll">
    <Pack>true</Pack>
    <PackagePath>runtimes/win-x64/native</PackagePath>
    <CopyToOutputDirectory>PreserveNewest</CopyToOutputDirectory>
  </None>
</ItemGroup>

The part $(Configuration) is replaced by either Debug or Release depending on the dotnet build configuration. This maps nicely onto the rust/target output folder layout and relies on the fact that file paths are case-insensitive on Windows.

The line <Pack>true</Pack> adds the native library to a NuGet package that could be produced by dotnet pack -c Release command.

The line <PackagePath>runtimes/win-x64/native</PackagePath> places the native library to a special location inside the package so that the library is automatically loaded.

The line <CopyToOutputDirectory>PreserveNewest</CopyToOutputDirectory> is needed to copy the native library to the binary output folder so that this example could find it.

Replace the content of Program.cs with the following code:

using System;
using System.Runtime.InteropServices;
using System.Text;

namespace RustTheNewC
{
    internal static class Program
    {
        private const string NativeLibraryName = "corelib";

        [DllImport(NativeLibraryName, CallingConvention = CallingConvention.Cdecl)]
        public static extern IntPtr native_versions_get();

        [DllImport(NativeLibraryName, CallingConvention = CallingConvention.Cdecl)]
        public static extern void native_versions_free(IntPtr cString);

        private static void Main(string[] args)
        {
            var cStringPtr = native_versions_get();
            try
            {
                Console.WriteLine(PtrToStringUtf8(cStringPtr));
            }
            finally
            {
                native_versions_free(cStringPtr);
            }
        }

        private static string PtrToStringUtf8(IntPtr ptr)
        {
            if (ptr == IntPtr.Zero)
            {
                return null;
            }

            var i = 0;
            while (Marshal.ReadByte(ptr, i) != 0)
            {
                i++;
            }

            var bytes = new byte[i];
            Marshal.Copy(ptr, bytes, 0, i);

            return Encoding.UTF8.GetString(bytes, 0, i);
        }
    }
}

Run dotnet run (debug build) or dotnet run -c Release. These two builds use different native libraries. The Rust debug library has this part in the string destructor:

if cfg!(debug_assertions) {
    println!("Rust string is dropped");
}

You should see that line printed after the native library versions in dotnet debug build.

Git commit

Other languages

To be done in part 2.

Conclusion

So far Rust feels quite solid and stable. Cargo + build.rs + cc/cmake/bindgen are much better than manual work with Makefiles and CMakeLists.txt. The ML-like syntax is great, and previous F# experience makes it feel natural. C# 7 ref readonly and ref are somewhat close to & and &mut from a mental model perspective, and previous experience with unsafe and by-ref C# also helps. Rust's obsession with safety and the borrow checker are nice to have for complex threaded code, but they can be worked around with unsafe for simple glue code.

I will publish parts 2 and 3 hopefully this summer.