The next problem to be tackled in Trillium is to speed-up the transmission of AST objects from Lean to Rust. A repl interface, as used in the paper Hypertree by OpenAI, could be used, but its serialization/deserialization interface imposes overhead and cannot be easily parallelized. This necessitated the need for direct communication between Rust and Lean.
This is based on
The example code is here. If you don’t
see an example listed, check some of the branches other than main
or send me a
message on discord.
This example is doable without Nix, but the build process is more tedious.
Setting up Lean
We start from Lean4’s official Reverse FFI Example and create a simple Lean library:
-- Callee/Callee.lean
@[export my_function]
def my_function (s: String): String :=
s ++ s!"abc"
-- Callee/lakefile.lean
import Lake
open System Lake DSL
package callee
@[default_target]
lean_lib Callee where
defaultFacets := #[LeanLib.sharedFacet]
and a flake needed to build it:
{
description: "RustCallLean";
inputs = {
nixpkgs.url = "github:nixos/nixpkgs/nixos-unstable";
flake-parts.url = "github:hercules-ci/flake-parts";
lean.url = "github:leanprover/lean4?ref=v4.1.0";
};
outputs = inputs @ {
self,
nixpkgs,
flake-parts,
lean,
...
} : flake-parts.lib.mkFlake { inherit inputs; } {
flake = {
};
systems = [
"x86_64-linux"
"x86_64-darwin"
];
perSystem = { system, pkgs, ... }: let
leanPkgs = lean.packages.${system};
callee = leanPkgs.buildLeanPackage {
name = "Callee";
roots = [ "Callee" ];
src = ./Callee;
};
in rec {
packages = {
callee = callee.sharedLib;
};
devShells.default = pkgs.mkShell {
CALLEE_PATH = callee.sharedLib;
LEAN_ROOT = leanPkgs.lean-all;
};
};
};
}
Dropping into a dev shell here with nix develop
, we see that the environment variables are set:
$ nix develop
$ ls $CALLEE_PATH
libCallee.so*
$ ls $LEAN_ROOT
bin/ include/ lib/ share/
This finishes the setup for the lean part.
Setting up Rust
Starting with a simple Cargo.toml
:
[package]
name = "caller"
description = "Rust calls Lean"
authors = ["Leni Aniva"]
publish = false
version = "0.1.0"
edition = "2021"
rust-version = "1.73"
[dependencies]
[build-dependencies]
bindgen = "0.69.4"
The bindgen
crate is added as a build time dependency since we need to
generate Rust bindings for the C header <lean/lean.h>
. In Nix, bindgen
requires clang
and a special bindgenHook
is available on pkgs.rustPlatform
. See nixpkgs.
# flake.nix
devShells.default = pkgs.mkShell {
CALLEE_PATH = callee.sharedLib;
LEAN_ROOT = leanPkgs.lean-all;
nativeBuildInpuuts = [ pkgs.rustPlatform.bindgenHook ];
buildInputs = buildInputs ++ [
pkgs.rustc
# This is not needed for compilation. It is here for diagnostic purposes
pkgs.rust-bindgen
];
};
bindgenHook
must be supplied as one of nativeBuildInputs
for bindgen to work
in Nix. All commands in this tutorial are executed in dev shell unless otherwise
noted.
Now we invoke bindgen
in build.rs
:
// build.rs
use std::env;
use std::path::PathBuf;
fn main() {
let callee_root = env::var("CALLEE_PATH").expect("Callee path variable must be available");
let lean_root = env::var("LEAN_ROOT").expect("Lean root must exist");
let output_path = PathBuf::from(env::var("OUT_DIR").unwrap());
let input = format!("{lean_root}/include/lean/lean.h");
println!("cargo:rerun-if-changed=build.rs");
println!("cargo:rerun-if-changed={input}");
println!("cargo:rustc-link-search={callee_root}");
println!("cargo:rustc-link-search={lean_root}/lib/lean");
// Required in later versions of Lean 4
println!("cargo:rustc-link-lib=Init_shared");
let bindings = bindgen::Builder::default()
.header(&input)
// Tell cargo to invalidate the built crate whenever any of the
// included header files changed.
.parse_callbacks(Box::new(bindgen::CargoCallbacks::new()))
.generate()
.expect("Unable to generate bindings");
bindings
.write_to_file(output_path.join("lean.rs"))
.expect("Couldn't write bindings!");
This creates a file lean.rs
in the OUT_DIR
which contains the same interface
as <lean/lean.h>
. We call it in main.rs
to initialize Lean.
#[allow(non_upper_case_globals)]
#[allow(non_camel_case_types)]
#[allow(non_snake_case)]
#[allow(dead_code)]
mod lean
{
include!(concat!(env!("OUT_DIR"), "/lean.rs"));
#[link(name = "leanshared")]
extern "C" {
pub fn lean_initialize_runtime_module();
pub fn lean_initialize();
// lean_io_mark_end_initialization is already defined.
}
#[link(name = "Callee")]
extern "C" {
pub fn initialize_Callee(builtin: u8, world: lean_obj_arg) -> lean_obj_res;
pub fn my_function(s: lean_obj_arg) -> lean_obj_arg;
}
}
fn main()
{
unsafe {
lean::lean_initialize_runtime_module();
lean::lean_initialize();
let builtin: u8 = 1;
let res = lean::initialize_Callee(builtin, lean::lean_io_mk_world());
if lean::lean_io_result_is_ok(res) {
lean::lean_dec_ref(res);
} else {
lean::lean_io_result_show_error(res);
lean::lean_dec(res);
panic!("Failed to load callee");
}
lean::lean_io_mark_end_initialization();
}
println!("Lean initialization complete!");
}
However, this runs into a problem: The function lean_io_mk_world
isn’t
available here, but it is available in lean.h
as a static inline
function.
We can test this by invoking the bindgen
utility
$ bindgen --experimental --wrap-static-fns $LEAN_ROOT/include/lean/lean.h
panicked at main.rs:52:36:
Unable to generate bindings: Codegen(Serialize { msg: "Cannot serialize type kind Opaque", loc: "builtin definitions" })
There is an error due to that bindgen cannot handle _Atomic(int)
types. This
could change in the future (issue on
github). Fortunately the
list of offending types and functions is rather small, so we can conveniently
blocklist them during generation. We need the experimental
flag to be turned
on for bindgen
.
# Cargo.toml
bindgen = { version = "0.69.4", features = ["experimental"] }
Then we call wrap_static_fns
in build.rs
# build.rs
let bindings = bindgen::Builder::default()
.header(&input)
// Tell cargo to invalidate the built crate whenever any of the
// included header files changed.
.parse_callbacks(Box::new(bindgen::CargoCallbacks::new()))
// The following is specific to `lean.h`: There are a lot of static
// function
.wrap_static_fns(true)
// but some functions use `_Atomic(int)` types that bindgen cannot handle,
// so they are blocklisted. The types which use `_Atomic` are treated as
// opaque.
.blocklist_function("lean_get_rc_mt_addr")
.opaque_type("lean_thunk_object")
.opaque_type("lean_task_object")
.generate()
.expect("Unable to generate bindings");
bindings
.write_to_file(output_path.join("lean.rs"))
.expect("Couldn't write bindings!");
but we are not done yet! The emitted $OUT_DIR/bindgen/extern.c
(This is the
default name. Remember to change it if multiple link targets exist) has to be
compiled to a static library and linked:
# build.rs
let extern_src_path = env::temp_dir().join("bindgen").join("extern.c");
let extern_obj_path = env::temp_dir().join("bindgen").join("extern.o");
// This is the path to the static library file.
let lib_path = output_path.join("libextern.a");
// Compile the generated wrappers into an object file.
let clang_output = std::process::Command::new("clang")
.arg("-O")
.arg("-c")
.arg("-o")
.arg(&extern_obj_path)
.arg(&extern_src_path)
.arg("-include")
.arg(&input)
.arg("-I")
.arg(format!("{lean_root}/include"))
.output()
.unwrap();
if !clang_output.status.success() {
panic!(
"Could not compile object file:\n{}",
String::from_utf8_lossy(&clang_output.stderr)
);
}
// Turn the object file into a static library
#[cfg(not(target_os = "windows"))]
let lib_output = std::process::Command::new("ar")
.arg("rcs")
.arg(output_path.join(&lib_path))
.arg(&extern_obj_path)
.output()
.unwrap();
#[cfg(target_os = "windows")]
let lib_output = std::process::Command::new("LIB")
.arg(obj_path)
.arg(format!(
"/OUT:{}",
output_path.join("libextern.lib").display()
))
.output()
.unwrap();
if !lib_output.status.success() {
panic!(
"Could not emit library file:\n{}",
String::from_utf8_lossy(&lib_output.stderr)
);
}
// Statically link against the generated lib
println!("cargo:rustc-link-search={}", output_path.display());
println!("cargo:rustc-link-lib=static=extern");
with this the project can be run
$ cargo run
Lean initialization complete!
Interacting with Lean
Now that we can load Callee
, we can invoke its functions:
let c_str = std::ffi::CString::new("hello ").unwrap();
let result = unsafe {
let native_str = lean::lean_mk_string(c_str.as_ptr());
let result_obj = lean::my_function(native_str);
let result_c_str = std::ffi::CStr::from_ptr(lean::lean_string_cstr(result_obj));
let str = result_c_str.to_str().unwrap().to_string();
// Clean up the resultant objects
lean::lean_dec(result_obj);
str
};
println!("result: {result}");
Lean4 FFI Documentation describes the protocols the caller has to follow in order to not leak memory.
Constructors
Algebraic data types in Lean can be easily parsed. However, constructing algebraic data types requires great care due to the existence of pre-computed fields. Therefore I recommend passing data into Lean as primitives. A simple example is the tuple:
@[export stylize]
def stylize (s: String): (String × String × String) :=
(s.capitalize, s.decapitalize, s.toUpper)
IMPORTANT
After changing Lean code, re-compile Callee
via direnv reload
The corresponding handler in Rust is more elaborate:
mod callee {
...
#[link(name = "Callee")]
extern "C" {
...
pub fn stylize(s: lean_obj_arg) -> lean_obj_res;
}
}
fn unbox_lean_string(str: lean::b_lean_obj_arg) -> String
{
unsafe {
std::ffi::CStr::from_ptr(lean::lean_string_cstr(str))
}.to_str().expect("Failed to convert C-string to string").to_string()
}
fn call_stylize()
{
let input = std::ffi::CString::new("aBcDe").unwrap();
let (s1, s2, s3) = unsafe {
let result_obj = callee::stylize(lean::lean_mk_string(input.as_ptr()));
assert!(lean::lean_is_ctor(result_obj), "Tuple must be a ctor");
// Remember Lean tuples are nested!
assert_eq!(lean::lean_ctor_num_objs(result_obj), 2, "Tuple ctor has 3 elements");
let s1 = unbox_lean_string(lean::lean_ctor_get(result_obj, 0));
let result_obj2 = lean::lean_ctor_get(result_obj, 1);
let s2 = unbox_lean_string(lean::lean_ctor_get(result_obj2, 0));
let s3 = unbox_lean_string(lean::lean_ctor_get(result_obj2, 1));
lean::lean_dec(result_obj);
(s1, s2, s3)
};
assert_eq!(s1, "ABcDe");
assert_eq!(s2, "aBcDe");
assert_eq!(s3, "ABCDE");
}
Inductives
Inductive data structures have multiple (at most 255) constructors, and this
constructor id can be read from the pointer tag via lean_ptr_tag
. Needless to
say, writing code like this greatly reduces maintainability.
inductive Think where
| Success (s: String)
| Failure (i: Nat)
@[export think]
def think (s: String): Think :=
if s.front == 'S' then
Think.Success (s.drop 1)
else
Think.Failure s.length
mod callee {
...
pub fn think(s: lean_obj_arg) -> lean_obj_res;
}
fn call_think(s: &str) -> Result<String, usize> {
let input = std::ffi::CString::new(s).expect("Failed to convert to C-string");
unsafe {
let result_obj = callee::think(lean::lean_mk_string(input.as_ptr()));
let tag = lean::lean_ptr_tag(result_obj);
let result = if tag == 0 {
assert!(lean::lean_is_ctor(result_obj), "Think must be a ctor");
assert_eq!(
lean::lean_ctor_num_objs(result_obj),
1,
"Ctor must have 1 field"
);
let s = unbox_lean_string(lean::lean_ctor_get(result_obj, 0));
Ok(s)
} else if tag == 1 {
assert_eq!(
lean::lean_ctor_num_objs(result_obj),
1,
"Ctor must have 1 field"
);
let field = lean::lean_ctor_get(result_obj, 0);
let n = lean::lean_uint64_of_nat(field) as usize;
Err(n)
} else {
unreachable!("Invalid constructor id")
};
lean::lean_dec(result_obj);
result
}
}
fn main()
{
...
assert_eq!(call_think("Stanford"), Ok("tanford".to_string()));
assert_eq!(call_think("Else"), Err(4));
}
IO Monad
The most fundamental of the monads in Lean which can interact with the world is the IO Monad. Any function of the form
@[export sym]
def sym (a1: A1) (a2: A2) ... (an: An): IO T;
Generates a function of arity n + 1
with the last argument being the “world”
for the Monad. This world argument can be created via lean_io_mk_world
@[export print_in_upper]
def print_in_upper (s: String): IO Unit := do
IO.println s.toUpper
mod callee {
extern "C" {
...
pub fn print_in_upper(s: lean_obj_arg, world: lean_obj_arg) -> lean_obj_res;
}
}
fn call_print_in_upper()
{
let c_str = std::ffi::CString::new("caller").unwrap();
unsafe {
let native_str = lean::lean_mk_string(c_str.as_ptr());
let res = callee::print_in_upper(native_str, lean::lean_io_mk_world());
if lean::lean_io_result_is_ok(res) {
lean::lean_dec_ref(res);
} else {
lean::lean_io_result_show_error(res);
lean::lean_dec(res);
panic!("IO Monad execution failed");
}
};
}
The result of the monadic execution can be extracted via
lean_io_result_get_value
. Note that even IO Unit
has a return value which
must be handled.
Troubleshooting
Extern symbols are not found despite linking against the static library
This is related to how the linker finds symbols. If it happens that the linker finds symbols in your rust file before the static library is linked, it will complain that symbol wasn’t found. The problem could vanish when an MWE is extracted out of the project.
Example of error message:
$PROJECT/src/backend/pantograph/primitive.rs:58: undefined reference to `lean_string_cstr__leanextern'
/nix/store/p58l5qmzifl20qmjs3xfpl01f0mqlza2-binutils-2.40/bin/ld: $PROJECT/target/debug/deps/trillium-3d2f1a1053dab16e.ihm8qbmx2c1iv1r
rcgu.o: in function `trillium::backend::pantograph::primitive::initialize::{{closure}}':
$PROJECT/src/backend/pantograph/primitive.rs:44: undefined reference to `lean_io_mk_world__leanextern'
/nix/store/p58l5qmzifl20qmjs3xfpl01f0mqlza2-binutils-2.40/bin/ld: $PROJECT/src/backend/pantograph/primitive.rs:45: undefined reference
to `lean_io_result_is_ok__leanextern'
/nix/store/p58l5qmzifl20qmjs3xfpl01f0mqlza2-binutils-2.40/bin/ld: $PROJECT/src/backend/pantograph/primitive.rs:49: undefined reference
to `lean_dec__leanextern'
/nix/store/p58l5qmzifl20qmjs3xfpl01f0mqlza2-binutils-2.40/bin/ld: $PROJECT/src/backend/pantograph/primitive.rs:46: undefined reference
to `lean_dec_ref__leanextern'
clang-11: error: linker command failed with exit code 1 (use -v to see invocation)
= note: some `extern` functions couldn't be found; some native libraries may need to be installed or have their path specified
= note: use the `-l` flag to specify native libraries to link
= note: use the `cargo:rustc-link-lib` directive to specify the native libraries to link with Cargo (see https://doc.rust-lang.org/cargo/reference/build-scripts.html#c
rgorustc-link-libkindname)
The solution is to force the linker to link against leanextern
(or whatever
your extern static library’s name) before the module which uses the lean FFI.
mod lean
{
// This forces cargo to link `leanextern` after this file. Otherwise linker
// errors may occur
#[link(name = "leanextern", kind = "static")]
extern "C" {
#[link_name = "lean_dec__leanextern"]
pub fn dec(obj: lean_obj_arg);
}
}
Undefined references to (some Lean function)
Ensure that all shared objects in $LEAN_ROOT/lib/lean
are linked by adding
lines such as this to build.rs
:
println!("cargo:rustc-link-lib={library_name}");
Segmentation fault
The first thing to check in segmentation faults is that the function signature is correct. Execute
$ lean $SOURCE -c ffi.c
shows the generated interfaces. There are some surprises:
- A monad such as
CoreM
contains 3 additional arguments (context, state reference, world) compared to its Lean counterpart - A generic type corresponds to an unused
lean_object*
argument which can be safely fulfilled bynullptr
.
If SIGSEGV happens during unit testing when multiple tests are running
concurrently, consider switching to Nextest rather than
cargo test
. cargo test
runs tests in individual threads which can cause race
conditions when it comes to globals.
Passing borrowed values into Lean leads to strange behaviour
At the time of writing this article, the Lean FFI manual says
Return values and
@[export]
parameters are always owned at the moment.
My code works on one thread but doesn’t work in multithreaded settings
First of all, I highly recommend writing unit tests of this form:
#[test]
fn single_thread()
{
setup();
main();
}
#[test]
fn multi_thread()
{
setup();
// loop
std::thread::spawn(move || { main(); });
}
Multithreaded FFI was a new feature in Lean 4.8.0. Each spawned thread must call
lean_initialize_thread
and lean_finalize_thread
.