In this blog series, I will experiment with Rust as a safer and simpler C/C++ replacement. The idea is to combine a couple of C dependencies in Rust, to do some work using the dependencies in Rust and to expose a final API from a Rust library via C ABI. Then I will consume the same exported Rust methods from a number of modern languages/platforms such as C#, Python, Java, Nodejs, R, Go and even back from C/C++.
I’ve already done this with C# and C-Blosc as a part of Spreads.Native library. But in that project, I’ve only re-exported existing C methods and used Rust Cargo build system to automate CMake build, which for me was more convenient than working with CMake directly. The lack of a convenient C/C++ package manager and build system is a good enough reason on its own to investigate Rust and Cargo. But here (part 1) I will focus more on the code. The post is inspired by my struggles with building and integrating native libraries with .NET in a cross-platform and repeatable way and was finally triggered by this awesome talk by Ashley Mannix (@KodrAus) https://www.youtube.com/watch?v=0B1U3fVCIX0
A secondary but important goal of this exercise is an easy debugging experience for all listed languages, from VS Code on Windows directly and/or via containers.
I started writing this post before writing a single line of code. It will mainly serve as a walk-through guide to my future self, but I hope that you will find it interesting.
In this part 1, I will only build native libraries from source using Rust tools, combine them in a trivial way, expose the combined code from a Rust cdylib as a C API, debug the code locally from windows-msvc and remotely in a Docker container, and consume the combined functionality from .NET Core.
In part 2, I will add other languages that consume the simple API.
In part 3, I will write some concurrent code (a very simplified version of what my DataSpreads does) and multi-language clients that work on the same data from different processes simultaneously.
C ABI
The C language is probably the lowest level that “normal” programmers will ever deal with. By normal I mean those who do not write drivers, operating systems or OS-dependent things. But very often what matters is not the power of the “portable assembler” that the C language gives, but the APIs of important popular libraries available via a stable C Application Binary Interface (ABI). Many great, stable open-source projects are written in C by people who wrote code for NASA 20 years ago and who in general are orders of magnitude more proficient in C than most of us. We mortals can only use those pieces of portable and performant art in our routine work from our favorite languages and only dream about such deep C knowledge. It is also impractical and very dangerous to write important code in C without multiple years of experience, and mastering C takes a lot of time.
We could consume C ABI practically from any programming language:
- .NET (C#, F#): P/Invoke requires a single attribute added to an extern method definition;
- Python: ctypes module, cffi library or native/CPython extensions;
- Nodejs: N-API addons or node-ffi module;
- R: R native extensions or Rcpp;
- Java: Java Native Interface (JNI);
- Go: cgo or “C” pseudo package;
- C/C++: just reuse API exported from Rust natively.
The Go language is interesting because it could expose C ABI as well, but it has GC, a bigger runtime and the native boundary is expensive (e.g. 1, 2). So despite the fact that many consider Go to be the main Rust competitor for writing complete programs, Go cannot be used as a low-level C alternative.
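For orientation, here is a minimal sketch of what the Rust side of a C ABI export looks like. The function `add_numbers` is a hypothetical example and is not part of the library built in this post; the point is only the `extern "C"` calling convention and the use of C-compatible types.

```rust
use std::os::raw::c_int;

// Hypothetical example function, not part of the corelib built in this post.
// In a real `cdylib` it would also carry the `#[no_mangle]` attribute so that
// the exported symbol keeps the plain name `add_numbers` that other languages
// look up in the shared library.
pub extern "C" fn add_numbers(a: c_int, b: c_int) -> c_int {
    // wrapping_add avoids an overflow panic in debug builds; a panic should
    // never be allowed to unwind across the C boundary.
    a.wrapping_add(b)
}
```

Every client in the list above ends up declaring this same signature in its own syntax, which is why only C-compatible types appear in it.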
Rust shared lib
In this part 1, we will combine three native libraries, call methods that return their versions, combine the versions and return the combined version string via C API to any external consumer.
The output should be (the last line is printed only in debug build):
LMDB: 0.9.70
Blosc: 1.15.2.dev
SQLite: 3.29.0
Rust string is dropped
This task is trivial, but it does combine the functionality of three native libraries in Rust code, returns a string owned by Rust and exposes a method to free the string from other languages.
Initial setup
Get a stable rust version from https://rustup.rs/ if you do not have it already (1.36 as of this writing). Install VSCode Rust (rls) extension and Remote Development Extension Pack.
All code below is written on Windows with MSVC toolchain and inside the default Rust Docker container (rust:1) using VSCode Remote - Containers extension.
Create the following folder structure (git commit):
src/
  [client languages folders]/
  rust/
    .cargo/
      config
    corelib/        # Run `cargo init --lib` in this folder
      ...           # Generated by `cargo init`
    Cargo.toml      # Rust Workspace
.gitignore
Our main focus is on the src/rust folder now. Add a Cargo.toml file for the Rust workspace that will contain any crates we will develop next. Initially it contains only the main corelib that will expose functionality to other languages.
Add rust/.cargo/ folder with a config file:
[target.x86_64-pc-windows-msvc]
rustflags = ["-C", "target-feature=+crt-static"]

[target.x86_64-pc-windows-gnu]
rustflags = ["-C", "target-feature=+crt-static"]
This configuration instructs the Rust compiler to statically link the C runtime library on Windows. The effect of these settings on the binary size is quite small (around 80 KB for an empty lib) but it simplifies the deployment of shared libraries produced on Windows. You could see the difference using the Dependencies utility, which is a rewrite of the legacy DependencyWalker. This tool is useful when developing a shared native library with exported methods. E.g. when something doesn’t work as expected in a weird way, it’s good to check that at least the methods are exported to rule out this issue.
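Besides inspecting the binary, the setting can also be checked from Rust code itself. The sketch below relies on the `target_feature = "crt-static"` configuration predicate; the function name `crt_linkage` is only an illustration and does not come from the project.

```rust
/// Reports how the C runtime is linked for the target being compiled.
/// `crt-static` is set by the rustflags from .cargo/config (or by the
/// target's own default), so the answer reflects the active configuration.
pub fn crt_linkage() -> &'static str {
    if cfg!(target_feature = "crt-static") {
        "static"
    } else {
        "dynamic"
    }
}
```

Printing the result from a test or from a small executable is a quick way to confirm that the config file is actually picked up by cargo.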
Finally, run cargo init --lib in rust/corelib/ folder to create an empty Rust library.
Run cargo build and cargo test from the rust folder to ensure everything works. Binary artifacts will be placed into rust/target folder for all workspace crates from rust/Cargo.toml config file.
Important: open src/rust folder in a separate VSCode instance and work with Rust code from there. Even though you could run cargo commands from the terminal just by navigating to the folder (cd src/rust), Rust Language Service extension requires Cargo.toml at the root for autocomplete to work.
Add native dependencies
We are going to build all native dependencies ourselves instead of relying on existing cargo crates. Remember that the goal of this exercise is to use Rust as a glue for native libs that we would otherwise combine using C/C++ into a reusable shared library with C interface. We are not going to sacrifice total control over C build flags and even source code.
For example, we want LMDB key size as large as possible instead of the default 511 bytes, we want to expose some internal methods from Blosc to simplify build process and avoid building several compressors manually, and we want to build SQLite amalgamation from a custom branch and set flags that are significantly different from the ones Rusqlite crate uses.
In the end, we want to have a shared library with a C interface that exposes only methods relevant for our application. We do not care much about public API and bindings of intermediate libraries. We just need all native APIs available during design time (with autocomplete and signature hints).
LMDB
To add LMDB dependency first do the initial setup steps:
- Add rust/lmdb-sys folder;
- Run cargo init --lib inside it;
- Add “lmdb-sys” to the workspace Cargo.toml;
- Clone the upstream repository as git submodule inside rust/lmdb-sys folder.
The folder layout and workspace config file after this step should look like:
rust/
  .cargo/
  corelib/
  lmdb-sys/         # Run `cargo init --lib` in this folder
    lmdb            # git submodule with LMDB source from upstream
    ...             # Generated by `cargo init`
  Cargo.toml        # Rust Workspace: add lmdb-sys
To actually build the LMDB library and generate Rust bindings to it, edit lmdb-sys/Cargo.toml file and add cc and bindgen crates as build-dependencies and libc crate as a normal dependency.
The file should look like this after the edits:
[package]
name = "lmdb-sys"
version = "0.1.0"
authors = ["Victor Baybekov <vbaybekov@gmail.com>"]
publish = false
edition = "2018"
build = "build.rs"

[lib]
name = "lmdb_sys"

[build-dependencies]
cc = "1.0"
bindgen = "0.50"

[dependencies]
libc = "0.2"
Note the build = "build.rs" line - we will add this build script next. Create a file build.rs next to the Cargo.toml file with the following content:
extern crate cc;

use std::env;
use std::path::PathBuf;

fn main() {
    let mut lmdb: PathBuf = PathBuf::from(&env::var("CARGO_MANIFEST_DIR").unwrap());
    lmdb.push("lmdb");
    lmdb.push("libraries");
    lmdb.push("liblmdb");

    cc::Build::new()
        .file(lmdb.join("mdb.c"))
        .file(lmdb.join("midl.c"))
        .define("MDB_MAXKEYSIZE", Some("0")) // Set max key size to max computed value instead of default 511
        .opt_level(2) // https://github.com/LMDB/lmdb/blob/LMDB_0.9.21/libraries/liblmdb/Makefile#L25
        .static_crt(true)
        .compile("liblmdb.a");

    let bindings = bindgen::Builder::default()
        .header("wrapper.h")
        .generate_comments(true)
        .use_core()
        .ctypes_prefix("libc")
        // Recursively adds all types used by the matched functions,
        // so the next line changes nothing in this particular case.
        .whitelist_function("mdb_.*")
        .whitelist_type("mdb_.*")
        .prepend_enum_name(false)
        .constified_enum_module("MDB_cursor_op") // allows access to enum values as MDB_cursor_op.MDB_NEXT
        .generate()
        .expect("Unable to generate bindings");

    // Write the bindings to src folder to make rls autocomplete work.
    let out_path = PathBuf::from("src");
    bindings
        .write_to_file(out_path.join("bindings.rs"))
        .expect("Couldn't write bindings!");

    // Tell cargo to tell rustc to link the lmdb library.
    println!("cargo:rustc-link-lib=static=lmdb");
}
I will not go through the basic usage of cc and bindgen crates since their documentation is good and important things are commented in the snippet.
However, some aspects of bindgen setup require some explanation. First, wrapper.h file is the standard bindgen header file that includes all symbols for which we want to generate bindings. In this case, the file contains a single line:
#include "lmdb/libraries/liblmdb/lmdb.h"
Second, we always generate bindings on the fly and do not edit them manually, as many -sys libraries do. We do not care about a clean and concise bindings API and only need to access the native API from corelib. But to reduce noise we add the whitelist_function("mdb_.*") line, which makes bindgen generate only the functions whose names match the mdb_.* pattern (that is, start with mdb_) and all types that are used by such functions.
Third, we write the generated bindings into lmdb-sys/src/ folder instead of OUT_DIR folder as many examples recommend. This is done to make the bindings available to RLS and to make autocomplete work. The current version of RLS does not recognize bindings included via include! macro (at least on my machine).
To export the bindings add the following code to the main lmdb-sys/src/lib.rs file:
#![allow(non_upper_case_globals)]
#![allow(non_camel_case_types)]
#![allow(non_snake_case)]

// We could generate bindings into OUT_DIR and it will work,
// but VSCode RLS does not see that, so we generate the file
// inside the src folder and export everything from bindings
// module. This also helps to easily find the file instead
// of searching inside target/build/... folder.
// include!(concat!(env!("OUT_DIR"), "/bindings.rs"));
pub use bindings::*;
mod bindings;

...
At this point, the local lmdb-sys package is ready for consumption by our corelib.
Run cargo test to ensure that build works and all auto-generated tests pass.
(Browse repository at this point).
Blosc
To add Blosc dependency first do the initial setup steps:
- Add rust/blosc-sys folder;
- Run cargo init --lib inside it;
- Add “blosc-sys” to the workspace Cargo.toml;
- Clone the upstream repository (spreads branch) as git submodule inside rust/blosc-sys folder.
We will be using Spreads fork of the upstream Blosc repository. The fork has minimal changes and exports LZ4/Zstd/Zlib compress/decompress routines. The compressor libraries are already present in Blosc and it’s much more convenient to just export the needed pieces instead of building every library manually.
Bindgen wrapper.h contains definitions copied from Blosc. I do not remember exactly why I copied the definitions rather than included the headers in the Spreads.Native library, but this is also a valid approach. It is sometimes more convenient than including the raw source header, gives greater control and means less work with whitelisting only the required functions and types.
Blosc uses CMake, and Rust has the cmake crate that makes it possible to build CMake projects as easily as with the cc crate we used for lmdb-sys.
The build.rs file is self-explanatory. One interesting thing is defining a custom generator for Windows GNU target on this line.
(Browse repository at this point).
SQLite
SQLite is a very stable library, but its development branches have some interesting features. Particularly the begin-concurrent-pnu-wal2 branch has wal2 mode and BEGIN CONCURRENT enhancement. These features improve the performance of concurrent access to a database.
To build SQLite we need to create an amalgamation from the source. Download the source code from the required branch as zip archive (e.g. from here) and unpack to any folder, e.g. G:/temp/sqlite_scr. Then open the folder inside Windows Subsystem for Linux and run the following commands:
Create rust/sqlite-sys/ folder and run cargo init --lib inside it. Copy sqlite3.c and sqlite3.h files to rust/sqlite-sys/sqlite/ folder and create wrapper.h and build.rs files similar to LMDB ones. Add build and compile dependencies to Cargo.toml.
Note that the build.rs file has many custom define symbols to modify SQLite defaults.
At this point, the local sqlite-sys package is ready for consumption by our corelib.
(Browse repository at this point).
Test dependencies
Now all three native dependencies are ready to use from corelib.
Add the following lines to rust/corelib/Cargo.toml to add the local packages as dependencies:
[dependencies]
lmdb-sys = { version = "*", path = "../lmdb-sys" }
blosc-sys = { version = "*", path = "../blosc-sys" }
sqlite-sys = { version = "*", path = "../sqlite-sys" }
libc = { version = "0.2" }
Then in rust/corelib/src/lib.rs add functions that return version strings of our native libs. These functions are public but not extern, so they are only available from other Rust packages and tests.
extern crate blosc_sys;
extern crate lmdb_sys;
extern crate sqlite_sys;

use libc::*;

pub fn lmdb_version() -> String {
    unsafe {
        // mdb_version writes the version numbers into these out-parameters
        let mut major: c_int = Default::default();
        let mut minor: c_int = Default::default();
        let mut patch: c_int = Default::default();
        lmdb_sys::mdb_version(&mut major, &mut minor, &mut patch);
        return format!("{}.{}.{}", major, minor, patch);
    }
}

pub fn blosc_version() -> String {
    unsafe {
        // The returned pointer refers to a string owned by the C library
        let cptr = blosc_sys::blosc_get_version_string();
        return std::ffi::CStr::from_ptr(cptr).to_str().unwrap().to_owned();
    }
}

pub fn sqlite_version() -> String {
    unsafe {
        let cptr = sqlite_sys::sqlite3_libversion();
        return std::ffi::CStr::from_ptr(cptr).to_str().unwrap().to_owned();
    }
}
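The two string-returning functions above repeat the same conversion and panic on a non-UTF-8 string. As a sketch only, the conversion could be factored into one helper; `c_str_to_string` is a hypothetical name, and the lossy decoding and the null check are my assumptions rather than what the repository does.

```rust
use std::ffi::CStr;
use std::os::raw::c_char;

/// Copies a NUL-terminated C string into an owned Rust `String`.
/// Returns `None` for a null pointer; invalid UTF-8 bytes are replaced
/// with U+FFFD instead of causing a panic.
///
/// # Safety
/// `ptr` must be null or point to a valid NUL-terminated string that stays
/// alive for the duration of the call.
pub unsafe fn c_str_to_string(ptr: *const c_char) -> Option<String> {
    if ptr.is_null() {
        return None;
    }
    // Borrow the C memory, then copy it so the result owns its data.
    Some(unsafe { CStr::from_ptr(ptr) }.to_string_lossy().into_owned())
}
```

The helper copies the bytes, so the returned `String` stays valid even if the C library later changes or frees its buffer.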
Then add tests in the same file:
#[cfg(test)]
mod tests {
    #[test]
    fn lmdb_works() {
        assert_eq!(super::lmdb_version(), "0.9.70");
    }

    #[test]
    fn blosc_works() {
        unsafe {
            blosc_sys::blosc_set_nthreads(6);
            let threads = blosc_sys::blosc_get_nthreads();
            assert_eq!(threads, 6);
            assert_eq!(super::blosc_version(), "1.15.2.dev");
        }
    }

    #[test]
    fn sqlite_works() {
        assert_eq!(super::sqlite_version(), "3.29.0");
    }
}
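These tests pin exact version strings, so they will fail as soon as a submodule is updated. If that becomes annoying, one possible alternative is to compare only the numeric components. The helper below is a hypothetical sketch and is not part of the repository.

```rust
/// Parses the leading `major.minor.patch` numbers of a version string and
/// ignores any trailing suffix such as `.dev`.
/// Returns `None` when fewer than three numeric components are present.
pub fn parse_version(s: &str) -> Option<(u32, u32, u32)> {
    let mut parts = s.split('.');
    let major = parts.next()?.parse().ok()?;
    let minor = parts.next()?.parse().ok()?;
    let patch = parts.next()?.parse().ok()?;
    Some((major, minor, patch))
}
```

A test could then assert, for example, that the parsed SQLite version is at least `(3, 29, 0)`, since tuples compare component by component.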
View the rust/corelib/src/lib.rs after these changes here.
Run cargo test to ensure that everything works.
Debug locally and in containers
Debugging Rust (especially on Windows with MSVC target) was hard or impossible until quite recently, but now it is very simple and “just works”.
To simplify debugging we need an executable with a custom entry point (since I couldn’t find a way to easily debug an individual unit test).
Add a sandbox package to the workspace that depends on corelib and put the following lines in its main.rs file:
extern crate corelib;

fn main() {
    println!("LMDB: {}", corelib::lmdb_version());
    println!("Blosc: {}", corelib::blosc_version());
    println!("SQLite: {}", corelib::sqlite_version());
}
Browse the changes in git commit.
Local debug with MSVC target
Add C/C++ extension to VSCode if it is not there already. Then select the Debug tab at the left bar and select “Add configuration…” option from the dropdown box next to the green “Run” icon. Then select “C/C++: (Windows) launch” command. It will populate .vscode/launch.json file with a debug config template. The only line to change there is "program". The entire config section should look like:
{
    "name": "Rust Sandbox Win",
    "type": "cppvsdbg",
    "request": "launch",
    "program": "${workspaceFolder}/target/debug/sandbox.exe",
    "args": [],
    "stopAtEntry": false,
    "cwd": "${workspaceFolder}",
    "environment": [],
    "externalConsole": false
}
After saving the file we could set breakpoints in sandbox/src/main.rs file or any corelib file and launch a debugging session. See this blog post for more details and instructions for OS X / Linux.
Debugging in containers
This requires Docker and for Windows this, unfortunately, means only Windows Pro version. Maybe there is a way to debug Rust in Docker running in WSL but it is out of the scope of this post. I believe upgrading to Pro is simpler and cheaper in the long run if one develops software mainly on Windows but needs containers.
Debugging Rust code in Docker containers is surprisingly easy. See a VSCode tutorial on the subject.
There is a Rust sample at https://github.com/microsoft/vscode-remote-try-rust that works fine for simple Rust code. In our case with native dependencies we have to modify the original Dockerfile to install clang and cmake:
...
# Install clang
# From https://gist.github.com/twlz0ne/9faf00346a2acf10044c54f9ba0b9805#file-dockerfile
&& apt-get update && apt-get install -y gnupg wget software-properties-common \
&& wget -O - https://apt.llvm.org/llvm-snapshot.gpg.key | apt-key add - \
&& apt-add-repository "deb http://apt.llvm.org/stretch/ llvm-toolchain-stretch-6.0 main" \
&& apt-get update && apt-get install -y clang-6.0 \
#
# Install CMake
&& wget https://cmake.org/files/v3.8/cmake-3.8.2-Linux-x86_64.sh \
&& mkdir /opt/cmake \
&& sh cmake-3.8.2-Linux-x86_64.sh --prefix=/opt/cmake --skip-license \
&& ln -s /opt/cmake/bin/cmake /usr/local/bin/cmake \
...
To run the debugger in a container click on the green >< icon (at the left-bottom corner of VSCode window in the status bar) and run “Remote-Containers: Reopen Folder in Container” command. The first run is slow because Docker needs to create a container from the Dockerfile, but subsequent runs are quite fast.
When the container is ready we could start the debugger with the “Run Sandbox Container” configuration. Or we could execute cargo commands in the container in a new terminal (Ctrl+Shift+`).
All changes required for container debugging are in this commit.
Speed up build
We do not need to regenerate bindings on every build. The generation step takes noticeable time, especially for SQLite. After we have checked that all dependencies work from the shared corelib we could just comment out the binding generation step in build.rs files.
Probably there is a cleaner solution using features or something else. But bindings could only change after we update the source code or change build logic inside build.rs files, so keeping generation code commented out in the build script is a quick fix. After doing so only corelib is incrementally recompiled.
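One candidate for that cleaner solution, sketched here under my own assumptions, is to regenerate the bindings only when the generated file is missing or older than the header it was produced from. The helper name `bindings_are_stale` is hypothetical; it uses only the standard library and compares file modification times.

```rust
use std::fs;
use std::path::Path;

/// Returns true when `bindings` is missing or older than `header`,
/// which means the binding generation step should run again.
pub fn bindings_are_stale(header: &Path, bindings: &Path) -> bool {
    // Modification time of a file, or None if it cannot be read.
    let modified = |p: &Path| fs::metadata(p).and_then(|m| m.modified()).ok();
    match (modified(header), modified(bindings)) {
        // Both timestamps are known: regenerate only if the header is newer.
        (Some(h), Some(b)) => h > b,
        // Missing file or unreadable timestamp: regenerate to be safe.
        _ => true,
    }
}
```

In build.rs the bindgen call would then sit inside an `if bindings_are_stale(...)` block. Note that wrapper.h only includes the real header, so the check should be made against the header inside the submodule rather than against wrapper.h. Cargo also understands the `cargo:rerun-if-changed=PATH` directive printed from a build script, which limits how often the script itself is rerun.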
Simple C method
We start with the simplest C API from Rust library: just return a string with versions of native dependencies. This method calls methods from each native dependency so technically it does combine native functionality using Rust and exposes it as if we were doing so in C.
But even such a simple method has complications. Rust String is not a C string (a pointer to null-terminated chars), so we need to convert it to owned CString. But the memory behind the CString is owned by Rust and we need to leak it first before returning it to the external world and to destroy the object later.
Add the following methods to rust/corelib/src/lib.rs:
use std::ffi::CString;

#[no_mangle]
pub extern "C" fn native_versions_get() -> *mut c_char {
    let s = format!("LMDB: {}\nBlosc: {}\nSQLite: {}", lmdb_version(), blosc_version(), sqlite_version());
    let cs = CString::new(s).unwrap();
    // Transfer ownership of the buffer to the caller
    cs.into_raw()
}

#[no_mangle]
pub extern "C" fn native_versions_free(c: *mut c_char) {
    if c.is_null() {
        return;
    }
    // Retake ownership; the CString is dropped (and its memory freed) right here
    unsafe { drop(CString::from_raw(c)); }

    // Print diagnostics in debug build
    if cfg!(debug_assertions) {
        println!("Rust string is dropped");
    }
}
And a test:
#[test]
fn native_versions_get_works() {
    unsafe {
        let nv = super::native_versions_get();
        let cs = std::ffi::CString::from_raw(nv);
        assert_eq!(cs.to_str().unwrap(), "LMDB: 0.9.70\nBlosc: 1.15.2.dev\nSQLite: 3.29.0");
        // Do not drop here: the pointer is released by native_versions_free below
        std::mem::forget(cs);
        super::native_versions_free(nv);
    }
}
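Two details of the code above could be handled differently. `CString::new(...).unwrap()` panics if the string contains an interior NUL byte, and the test takes ownership with `from_raw` only to `forget` it again. The sketch below shows one possible variant, with hypothetical helper names that are not in the repository: it returns a null pointer on failure and reads the string through a borrowed `CStr`.

```rust
use std::ffi::{CStr, CString};
use std::os::raw::c_char;
use std::ptr;

/// Hypothetical helper: converts a Rust string into a heap-allocated C string.
/// Returns null instead of panicking when the input has an interior NUL.
pub fn into_c_string_or_null(s: String) -> *mut c_char {
    match CString::new(s) {
        Ok(cs) => cs.into_raw(),
        Err(_) => ptr::null_mut(),
    }
}

/// Hypothetical helper: releases a string from `into_c_string_or_null`.
///
/// # Safety
/// `p` must be null or an unfreed pointer returned by `into_c_string_or_null`.
pub unsafe fn free_c_string(p: *mut c_char) {
    if !p.is_null() {
        drop(unsafe { CString::from_raw(p) });
    }
}

/// Copies the text out through a borrowed `CStr`, then frees the pointer.
///
/// # Safety
/// Same requirements as `free_c_string`; `p` must not be used afterwards.
pub unsafe fn read_and_release(p: *mut c_char) -> Option<String> {
    if p.is_null() {
        return None;
    }
    // Borrowing does not take ownership, so no mem::forget is needed.
    let text = unsafe { CStr::from_ptr(p) }.to_string_lossy().into_owned();
    unsafe { free_c_string(p) };
    Some(text)
}
```

With this shape the caller has to check for null, which is the same check the C# client later in this post already performs before decoding the string.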
Update sandbox main method to this:
fn main() {
    unsafe {
        let nv = corelib::native_versions_get();
        let cs = std::ffi::CString::from_raw(nv);
        println!("{}", cs.to_str().unwrap());
        // Do not drop, we need to return it back to destructor to simulate desired behavior
        std::mem::forget(cs);
        corelib::native_versions_free(nv);
    }
}
Run cargo test and cargo run to ensure that everything works.
Release build
Add the following lines to rust/Cargo.toml for explicit control over build parameters:
[profile.release]
opt-level = 3
debug = false
rpath = false
lto = true
debug-assertions = false
panic = 'abort'
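The `panic = 'abort'` setting matters for a library with a C interface because a panic must not unwind into a caller written in another language. In builds that keep the default unwinding behaviour, one way to guard an exported function is `std::panic::catch_unwind`. The sketch below is an illustration under my own assumptions: `guarded_divide` and its status codes are hypothetical and not part of corelib.

```rust
use std::os::raw::c_int;
use std::panic;

/// Hypothetical exported function: writes `a / b` into `out`.
/// Returns 0 on success, 1 if the computation panicked (for example on
/// division by zero) and 2 if `out` is null.
///
/// # Safety
/// `out` must be null or point to writable memory for one `c_int`.
pub unsafe extern "C" fn guarded_divide(a: c_int, b: c_int, out: *mut c_int) -> c_int {
    if out.is_null() {
        return 2;
    }
    // catch_unwind turns a panic into an Err instead of letting it cross
    // the C boundary. With panic = 'abort' the process aborts instead.
    match panic::catch_unwind(|| a / b) {
        Ok(quotient) => {
            unsafe { *out = quotient };
            0
        }
        Err(_) => 1,
    }
}
```

The status code plus out-parameter shape mirrors how `mdb_version` returns its values, and it keeps error reporting within what every client language can express.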
Run cargo build --release.
The final shared library is rust/target/release/corelib.dll. Open that file with Dependencies utility to make sure that the two methods native_versions_get and native_versions_free are exported.
The shared library is ready for consumption from other languages.
Clients
.NET Core
Install .NET Core SDK from here. This example uses the LTS version 2.1.
.NET has probably the easiest way to use native dependencies via P/Invoke. Open dotnet folder in a separate VSCode window and run dotnet new console in the terminal. Rename dotnet.csproj to RustTheNewC.csproj (optional, just to match this example).
Add the following piece to the project file:
<ItemGroup>
  <None Include="..\rust\target\$(Configuration)\corelib.dll">
    <Pack>true</Pack>
    <PackagePath>runtimes/win-x64/native</PackagePath>
    <CopyToOutputDirectory>PreserveNewest</CopyToOutputDirectory>
  </None>
</ItemGroup>
The part $(Configuration) is replaced by either Debug or Release depending on dotnet build configuration. This matches nicely to the rust/target output folder layout and uses the fact that the file path is case-insensitive on Windows.
The line <Pack>true</Pack> adds the native library to a NuGet package that could be produced by dotnet pack -c Release command.
The line <PackagePath>runtimes/win-x64/native</PackagePath> places the native library to a special location inside the package so that the library is automatically loaded.
The line <CopyToOutputDirectory>PreserveNewest</CopyToOutputDirectory> is needed to copy the native library to the binary output folder so that this example could find it.
Replace the content of Program.cs with the following code:
using System;
using System.Runtime.InteropServices;
using System.Text;

namespace RustTheNewC
{
    internal static class Program
    {
        private const string NativeLibraryName = "corelib";

        [DllImport(NativeLibraryName, CallingConvention = CallingConvention.Cdecl)]
        public static extern IntPtr native_versions_get();

        [DllImport(NativeLibraryName, CallingConvention = CallingConvention.Cdecl)]
        public static extern void native_versions_free(IntPtr cString);

        private static void Main(string[] args)
        {
            var cStringPtr = native_versions_get();
            try
            {
                Console.WriteLine(PtrToStringUtf8(cStringPtr));
            }
            finally
            {
                // Always hand the string back to Rust so that it can free the memory
                native_versions_free(cStringPtr);
            }
        }

        private static string PtrToStringUtf8(IntPtr ptr)
        {
            if (ptr == IntPtr.Zero)
            {
                return null;
            }

            // Find the length of the null-terminated string
            var i = 0;
            while (Marshal.ReadByte(ptr, i) != 0)
            {
                i++;
            }

            var bytes = new byte[i];
            Marshal.Copy(ptr, bytes, 0, i);

            return Encoding.UTF8.GetString(bytes, 0, i);
        }
    }
}
Run dotnet run (debug build) or dotnet run -c Release. These two builds use different native libraries. The Rust debug library has this part in the string destructor:
if cfg!(debug_assertions) {
    println!("Rust string is dropped");
}
You should see that line printed after the native library versions in dotnet debug build.
Other languages
To be done in part 2.
Conclusion
So far Rust feels quite solid and stable. Cargo + build.rs + cc/cmake/bindgen are much better than manual work with Makefiles and CMakeLists.txt. ML-like syntax is great, and previous F# experience makes it feel natural. C# 7 ref readonly and ref are somewhat close to & and &mut from a mental model perspective, and previous experience with unsafe and by-ref C# also helps. Rust’s obsession with safety and the borrow checker are nice to have for complex threaded code, but they could be worked around with unsafe for simple glue code.
I will publish parts 2 and 3 hopefully this summer.
2024 update: that year I was busy relocating to France - new country, new job, new covid, new war… It’s now irrelevant to continue on this subject.