Previously in this series, I covered how the plugin system could be implemented from scratch. This is a lot of work if you’re dealing with a relatively large codebase and therefore a complex interface in your plugin system, so let’s see how we can make our lives easier. I’ve been wanting to try abi_stable
for this since the beginning, as it was specifically created for plugins. But we aren’t really locked to that crate, so I’ll show other alternatives as well, which can even be combined to your liking.
Handy tools for our Plugin System
Async with the C ABI
In a previous post I mentioned that async was not supported in abi_stable
. While this is true, because there is no FFI-safe Future
in the crate, it’s certainly possible, and it might be of interest later on.
Matthias recently let me know about the async_ffi
crate, which lets us do exactly that. It exports the type FfiFuture<T>
, which provides the same functionality as Box<dyn Future<Output = T> + Send>
:
// This is how regular async works: the first function is practically equivalent
// to the second.
async fn example() -> String {
    read_file().await
}
fn example() -> impl Future<Output = String> {
    async {
        read_file().await
    }
}
// For FFI-safe interfaces there can't be generics involved, so the future is a
// concrete type instead of a trait. This conversion from `Future` to
// `FfiFuture` can be done with `into_ffi`.
fn example() -> FfiFuture<String> {
    async move {
        read_file().await
    }
    .into_ffi()
}
// `FfiFuture<T>` implements `Future<Output = T>`, so it can be awaited as usual
async fn user() {
    let _contents: String = example().await;
}
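To make the comparison with Box<dyn Future<Output = T> + Send> concrete, here is a minimal sketch that only uses the standard library. It does not use async_ffi at all: BoxedFuture, NoopWaker and block_on are hypothetical names made up for illustration, and the boxed trait object is not FFI-safe by itself. It only shows the idea of erasing a future into one concrete type that can still be polled.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical alias: a future erased into a single concrete type.
pub type BoxedFuture<T> = Pin<Box<dyn Future<Output = T> + Send>>;

/// Returns a concrete type instead of `impl Future`, so no generics leak
/// into the signature.
pub fn example() -> BoxedFuture<String> {
    Box::pin(async { String::from("file contents") })
}

/// A waker that does nothing; enough for a toy executor.
struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Toy executor (assumption: busy-polling is acceptable for a demo).
pub fn block_on<T>(mut fut: BoxedFuture<T>) -> T {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    loop {
        if let Poll::Ready(value) = fut.as_mut().poll(&mut cx) {
            return value;
        }
        std::thread::yield_now();
    }
}
```

Calling block_on(example()) drives the boxed future to completion. A real program would use a proper executor instead of this polling loop.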
Someone asked for this feature in abi_stable
back in 2019, but no one seemed interested enough to implement it at the time, so maybe it will happen in the future.
LCCC
The Lightning Creations Compiler Collection provides a set of frontends and backends with a uniform intermediate representation for multiple programming languages, including Rust.
This means that they’ve written their own standard library with the C ABI, which is exactly what we need. It’s much simpler than Rust’s standard library, but it includes the most popular types your library may use: HashMap
, Vec
, String
, Box
, etc. The source code is quite nice to read in comparison to std
, which often includes lots of procedural macros and various forms of astral magic.
It’s not too popular right now, and it’s still Work In Progress, but it serves as an example of what we’re looking for in this article. We just want to simplify our lives by having a #[repr(C)]
-compatible standard library so that we don’t have to write it ourselves. If all you need is something simple like LCCC, consider this library or a similar one.
Safer FFI
If you don’t like any of the solutions listed in this article, and you’re going to end up writing the plugin interfaces by hand, you might be interested in safer_ffi
.
All this crate provides is a set of procedural macros to make FFI interfacing an easier and safer task. With it, you’ll be able to get rid of lots of extern "C"
and unsafe
instances in your code, which can get out of hand in larger codebases. Its documentation is excellent; you can check out its book for more information.
CGlue
After my last post, I was told about the cglue
crate by its own creator. It takes a very interesting approach, achieving ABI stability through opaque types.
An opaque type is simply one whose concrete layout you don’t know. There’s no #[repr(C)]
needed at all, because one can only interact with it via void pointers and its associated vtables.
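To illustrate what "a void pointer plus a vtable" means, here is a hand-written sketch using only the standard library. It is not what cglue generates: InfoVtbl, OpaqueInfo and info_get_value are hypothetical names, and the real crate automates all of this with macros.

```rust
use std::ffi::c_void;
use std::marker::PhantomData;

/// The concrete type; its layout is never exposed through the interface.
pub struct Info {
    pub value: usize,
}

/// Hypothetical vtable: one C-ABI function pointer per method.
#[repr(C)]
pub struct InfoVtbl {
    pub get_value: extern "C" fn(*const c_void) -> usize,
}

extern "C" fn info_get_value(this: *const c_void) -> usize {
    // SAFETY: this function is only reachable through `INFO_VTBL`, which
    // `OpaqueInfo::new` always pairs with a pointer to a live `Info`.
    let info = unsafe { &*(this as *const Info) };
    info.value
}

static INFO_VTBL: InfoVtbl = InfoVtbl {
    get_value: info_get_value,
};

/// Hypothetical opaque object: the user only sees a void pointer and the
/// vtable, never the fields of `Info`.
#[repr(C)]
pub struct OpaqueInfo<'a> {
    this: *const c_void,
    vtbl: &'static InfoVtbl,
    _borrow: PhantomData<&'a Info>,
}

impl<'a> OpaqueInfo<'a> {
    pub fn new(info: &'a Info) -> Self {
        OpaqueInfo {
            this: info as *const Info as *const c_void,
            vtbl: &INFO_VTBL,
            _borrow: PhantomData,
        }
    }

    /// Every interaction goes through the vtable.
    pub fn value(&self) -> usize {
        (self.vtbl.get_value)(self.this)
    }
}
```

Since only the pointer and the vtable cross the boundary, the layout of Info itself can change freely without breaking users of OpaqueInfo.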
cglue
’s README showcases the following snippet of code, and the repo even includes an example of a plugin system.
use cglue::*;
// One annotation for the trait.
#[cglue_trait]
pub trait InfoPrinter {
    fn print_info(&self);
}
struct Info {
    value: usize
}
impl InfoPrinter for Info {
    fn print_info(&self) {
        println!("Info struct: {}", self.value);
    }
}
fn use_info_printer(printer: &impl InfoPrinter) {
    println!("Printing info:");
    printer.print_info();
}
fn main() -> () {
    let mut info = Info {
        value: 5
    };
    // Here, the object is fully opaque, and is FFI and ABI safe.
    let obj = trait_obj!(&mut info as InfoPrinter);
    use_info_printer(&obj);
}
cglue
is limited to just generating FFI-safe trait objects, trying to make the whole process as straightforward as possible. You could say that cglue
covers just a subset of what abi_stable
does, because most of this is also available in abi_stable
through the sabi_trait
procedural macro, which I’ll explain later. It’s possible to combine both crates, which is something cglue
plans to do in the future. cglue
offers the following benefits over sabi_trait
[1]:
- It’s possible to generate bindings for C/C++, which means that plugins can be written in languages other than Rust.
- You can define trait groups, even with optional traits.
Neither of these is particularly useful for my use case, but if either feature interests you, definitely take a deeper look. It’s actively maintained and constantly being improved; the documentation is great and the author frequently posts updates on his personal blog.
Miri
Miri is an interpreter for Rust’s mid-level intermediate representation. This doesn’t help us with the plugin system per se, but since it’s very likely that we’re going to end up writing unsafe code, it’s good to know about it. That’s exactly what Miri is used for: detecting undefined behavior, such as using uninitialized data or use-after-frees.
I was going to use Miri from the beginning, but since I’ll be using abi_stable
for now, there will be no unsafe code involved. If I end up having to resort to it, I’ll try to add Miri to Tremor’s workflow (mainly their Continuous Integration).
cbindgen
For the first steps with dynamic loading I think the C/C++ binding generator cbindgen
will help us understand what’s going on under the hood. We can take a look at the generated headers and see how it works internally. Unfortunately, it fails to run for the abi_stable
crate:
(...)
WARN: Skip abi_stable::CONST - (...)
thread 'main' panicked at 'RResult has 2 params but is being instantiated with 1 values', src/bindgen/ir/enumeration.rs:596:9
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
This probably has to do with the following warning found in cbindgen
’s documentation:
NOTE: A major limitation of cbindgen is that it does not understand Rust’s module system or namespacing. This means that if cbindgen sees that it needs the definition for
MyType
and there exists two things in your project with the type name MyType
, it won’t know what to do. Currently, cbindgen’s behaviour is unspecified if this happens. However, this may be ok if they have different cfgs.
If you’re using something else like cglue
, this will work without issues. But after letting the maintainers of abi_stable
know about this in an issue, they pointed out that this was expected and that they don’t plan on supporting cbindgen
because it would take too much effort. Understandable, so let’s move on.
Working with abi_stable
I will personally use abi_stable
because it seems like the easiest choice for now, and the one that meets my needs best. Not only does it provide a standard library defined with the C ABI, but also lots of other macros and utilities specially useful for plugin systems. With it, I won’t need a line of unsafe, and I’ll avoid reinventing the wheel in many instances.
Once the plugin system is fully functional with abi_stable
, I might consider using something more hand-crafted. This switch won’t be too complicated, since our interface will already be #[repr(C)]
, which is the most troublesome part. All we’d have to do is remove a few procedural macros, switch the abi_stable
types, and load the plugins manually with something like libloading
. The only thing I want right now is a plugin system that works, and then we can maybe focus on trying to make it available in other languages, making it more performant, or whatever.
So let’s start comparing abi_stable
with my experiments in the previous post using raw dynamic linking. I’ve created the abi-stable-simple
directory in the pdk-experiments repository. I’ll be taking a look at the already implemented examples for abi_stable
in order to make the learning experience smoother. The base structure for a plugin system with abi_stable
is the same as always: a crate for the plugin, another for the runtime, and common
, with the shared interface.
Versioning
abi_stable
states this regarding versioning:
This library ensures that the loaded libraries are safe to use through these mechanisms:
- The abi_stable ABI of the library is checked, Each 0.y.0 version and x.0.0 version of abi_stable defines its own ABI which is incompatible with previous versions.
- Types are recursively checked when the dynamic library is loaded, before any function can be called.
In summary, abi_stable
itself is far from being permanently backward compatible, but it automatically makes sure that its versions are compatible when running the plugin. While it doesn’t exactly stick to semantic versioning, it’s good enough for us.
The version checking for the entire common
crate is already implemented, i.e., we can’t try to mix different versions that aren’t compatible. We could still add a version string for each kind of plugin if more fine-grained control is needed, as described in the previous post.
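The rule quoted above (every 0.y.0 and x.0.0 release defines its own ABI) can be sketched as a small compatibility check. This is only an illustration of the rule, not abi_stable’s actual implementation; Version, parse_version and is_compatible are hypothetical names.

```rust
/// Hypothetical `major.minor.patch` version triple.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Version {
    pub major: u32,
    pub minor: u32,
    pub patch: u32,
}

/// Parses strings such as "0.2.0"; returns `None` for anything else.
pub fn parse_version(text: &str) -> Option<Version> {
    let mut parts = text.split('.').map(|part| part.parse::<u32>().ok());
    let version = Version {
        major: parts.next()??,
        minor: parts.next()??,
        patch: parts.next()??,
    };
    // Reject trailing components such as "1.2.3.4".
    if parts.next().is_some() {
        return None;
    }
    Some(version)
}

/// Simplified rule: a different `x.0.0` is always incompatible, and for
/// `0.y.z` versions a different `y` is incompatible as well.
pub fn is_compatible(expected: Version, found: Version) -> bool {
    if expected.major != found.major {
        return false;
    }
    if expected.major == 0 {
        expected.minor == found.minor
    } else {
        true
    }
}
```

Under this rule, a runtime expecting 0.2.0 rejects a plugin built against 0.1.0, while 0.2.3 would be accepted.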
Loading plugins
abi_stable
plugins are structured in modules, which can help us split up our functionality into smaller independent pieces. There must always be a root module that initializes the entire library and provides metadata such as the name or the version strings. Then, we can have submodules to organize the functions exported by the library nicely.
Furthermore, the StableAbi
trait in abi_stable
indicates that a type is FFI-safe. It contains information about the layout of the type, and it can be derived automatically. Each item in abi_stable
’s standard library (RStr
, RSlice<T>
, RArc<T>
, etc) implements this trait, and it’s used to make sure the types are compatible when loading the plugin.
This also introduces the concept of prefix types. When a type derives StableAbi
and has the #[sabi(kind(Prefix(...)))]
attribute, two more types are generated:
- <name>_Prefix, which contains all the fields up to the #[sabi(last_prefix_field)] attribute in the original type.
- <name>_Ref, which is a pointer to <name>_Prefix that can actually be passed through the FFI barrier safely.
Prefix types are needed to guarantee some kind of individual versioning to avoid breakage in future patches. It will let us add more fields to the module after the last_prefix_field
attribute in patch (0.0.x
) updates. Moving this attribute requires a backward-incompatible version bump. Prefix types are often used for modules and vtables.
For now, I’ll just have a single root module and call it MinMod
, exporting the min
function:
// Using the stable C ABI
#[repr(C)]
// Deriving the `StableAbi` trait, which defines the layout of the struct at
// compile-time:
// https://docs.rs/abi_stable/0.10.2/abi_stable/derive.StableAbi.html
#[derive(StableAbi)]
// Marking the struct as a prefix-type:
// https://docs.rs/abi_stable/0.10.2/abi_stable/docs/prefix_types/index.html
#[sabi(kind(Prefix))]
pub struct MinMod {
    /// Initializes the state, which will be passed to the functions in this
    /// module. I'll explain more about the state later on.
    pub new: extern "C" fn() -> State,
    /// Calculates the minimum between two integers. This is the last defined
    /// field for the current version. If we try to load fields after this, all
    /// of them will be an `Option`.
    #[sabi(last_prefix_field)]
    pub min: extern "C" fn(&mut State, i32, i32) -> i32,
}
Most of the loading functionality is already handled by abi_stable
. The module we’re exporting implements the RootModule
trait, which includes functions to load the plugin, such as RootModule::load_from_file
or RootModule::load_from_directory
:
// Marking `MinMod` as the main module in this plugin. Note that `MinMod_Ref` is
// a pointer to the prefix of `MinMod`.
impl RootModule for MinMod_Ref {
    // The name of the dynamic library
    const BASE_NAME: &'static str = "min";
    // The name of the library for logging and similars
    const NAME: &'static str = "min";
    // The version of this plugin's crate
    const VERSION_STRINGS: VersionStrings = package_version_strings!();
    // Implements the `RootModule::root_module_statics` function, which is the
    // only required implementation for the `RootModule` trait.
    declare_root_module_statics!{MinMod_Ref}
}
When loading directories, it makes the following decisions by default (though we could change them if we wanted to):
- It does so non-recursively, i.e., only checking the immediate files in the given directory.
- The name of the library must be the RootModule::BASE_NAME in lowercase, according to the Operating System’s defaults. For example, on Linux our plugin would be libmin.so, and on Windows it’d be min.dll.
This means that we should add the following parameter to the plugin’s Cargo.toml
file:
[lib]
# This way, the shared object will be saved as `abi_stable` prefers, for example
# `libmin.so`.
name = "min"
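The naming convention described above can be reproduced with constants from the standard library. This is just a sketch of the rule, assuming it boils down to "platform prefix + lowercase base name + platform suffix"; library_file_name is a hypothetical helper, not a function from abi_stable.

```rust
use std::env::consts::{DLL_PREFIX, DLL_SUFFIX};

/// Hypothetical helper: builds the file name a dynamic library is expected
/// to have on the current platform, e.g. `libmin.so` on Linux or `min.dll`
/// on Windows.
pub fn library_file_name(base_name: &str) -> String {
    format!("{}{}{}", DLL_PREFIX, base_name.to_lowercase(), DLL_SUFFIX)
}
```

Using the platform constants avoids hardcoding one name per Operating System in the runtime.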
Finally, this is what the runtime may look like:
pub fn run_plugin(path: &str) -> Result<()> {
    let plugin = MinMod_Ref::load_from_directory(path.as_ref())?;
    println!("Loading plugin {}", MinMod_Ref::NAME);
    // First we obtain the function pointer. This is not an `Option` because
    // `new` is defined before `min`, the last prefix field.
    let new_fn = plugin.new();
    // We initialize the plugin, obtaining a state.
    let mut state = new_fn();
    // Same for the `min` function
    let min_fn = plugin.min();
    println!("initial state: {:?}", state);
    println!(" min(1, 2): {}", min_fn(&mut state, 1, 2));
    println!(" min(-10, 10): {}", min_fn(&mut state, -10, 10));
    println!(" min(2000, 2000): {}", min_fn(&mut state, 2000, 2000));
    println!("final state: {:?}", state);
    Ok(())
}
Executing the plugin-sample
implementation:
$ make debug-sample
Loading plugin min
initial state: State { counter: 0 }
min(1, 2): 1
min(-10, 10): -10
min(2000, 2000): 2000
final state: State { counter: 3 }
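The plugin side isn’t shown above, so here is a rough sketch of what could produce that output, written without abi_stable so that it compiles on its own. It assumes the state is a plain struct whose counter is incremented on every call, which is only inferred from the printed output; plugin_new, plugin_min, MinModSketch and get_module are hypothetical names.

```rust
/// Assumed shape of the state, matching `State { counter: 0 }` in the output.
#[repr(C)]
#[derive(Debug)]
pub struct State {
    pub counter: i32,
}

/// Creates the initial state.
pub extern "C" fn plugin_new() -> State {
    State { counter: 0 }
}

/// Returns the minimum and records that one more call happened.
pub extern "C" fn plugin_min(state: &mut State, a: i32, b: i32) -> i32 {
    state.counter += 1;
    a.min(b)
}

/// A table of function pointers, playing the role of the root module.
#[repr(C)]
pub struct MinModSketch {
    pub new: extern "C" fn() -> State,
    pub min: extern "C" fn(&mut State, i32, i32) -> i32,
}

/// What the plugin would hand over to the runtime.
pub fn get_module() -> MinModSketch {
    MinModSketch {
        new: plugin_new,
        min: plugin_min,
    }
}
```

With abi_stable, the struct would additionally derive StableAbi and be exported through its root module machinery instead of a plain get_module function.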
Handling state
Regular Rust
As we saw in the previous example, we need some kind of generic State
type that each plugin can implement with their own data. In regular Rust, we’d do as follows. See the full code here.
trait State: Debug {}
// Remember that we can't use generics, so we need `dyn`, either by itself as a
// reference, or in a box.
type StateBox = Box<dyn State>;
fn usage(state: &mut StateBox) {
    println!("state debug: {:?}", state);
}
Interface types
Unfortunately, we already know that regular dyn
is not FFI-safe. I covered how it’s possible to work around it with pointers, but here we can resort to abi_stable
’s safer and more convenient alternatives. Here’s one of them (see the full code here):
#[repr(C)]
#[derive(StableAbi)]
// An `InterfaceType` describes which traits are required when constructing
// `StateBox` and are then usable afterwards.
#[sabi(impl_InterfaceType(Debug, PartialEq))]
struct State;
// A trait object for `State`
type StateBox = DynTrait<'static, RBox<()>, State>;
// It can then be used easily like this
fn usage(state: &mut StateBox) {
    println!("state debug: {:?}", state);
}
Here we first declare a State
interface type. Note that even though it’s defined as a struct
, this is a translation of the previous snippet of code, so it acts as the empty “trait”. But all it does is establish Debug
and PartialEq
as its supertraits and give access to them; you can’t really add custom methods to the trait.
Unlike dyn
, this even works with supertraits that aren’t object-safe. Thus, we can use something like PartialEq
. Its main disadvantage is that it’s limited to a set of 21 hardcoded traits, so it might not be enough for us.
Trait objects
If we want something more akin to traits on Rust, we can use #[sabi_trait]
. The trait has to be object-safe, and by default there’s no support for PartialEq
in the list of supertraits, so I’ll remove it. See the full code here.
#[sabi_trait]
pub trait State: Debug {
    fn counter(&self) -> i32;
}
// A trait object for the `State` Trait Object
pub type StateBox = State_TO<'static, RBox<()>>;
// It can then be used easily like this
pub fn usage(state: &mut StateBox) {
    println!("state debug: {:?}", state);
    println!("state counter: {:?}", state.counter());
}
As its documentation explains, this still has a limited number of possible supertraits, but at least it lets us require functions as usual, and it even works with default implementations.
Error handling
abi_stable
is just a wrapper over libloading
after all. It doesn’t include a sandbox, so if the plugin developer was a malicious actor, they’d have full access to the computer the runtime is being executed on. Other popular plugin systems such as nginx’s or apache’s suffer from the same issues, for reference.
However, I think it’s not so bad to assume that no bad actors will be involved here. A sandbox would be mandatory if we were working on something like Solana (one of the main users of eBPF in Rust), which basically executes random code from the internet. But with Tremor we can assume that the plugins come from trusted sources because they’re installed and configured manually by the user.
There are some additional security measures that could be implemented in the future, like checking the integrity of the plugins and verifying they come from a trusted source before loading them. Of course, if we could afford to have a sandbox it’d definitely be the best way to do it, but we’ve already seen in this series that it’s currently not really viable for this use-case.
Still, we trust that the plugin developer has good intentions, but not necessarily that they know what they’re doing. We should make fatal errors as hard as possible to happen so that Tremor isn’t constantly crashing. The fewer pitfalls, the better.
The full source for the example that’s supposed to work is here. Let’s see a few ways in which the plugin could go wrong:
Version mismatch
The versions of the common
library are checked automatically. In case there’s a mismatch in those considered incompatible (changes in x.0.0
or 0.x.0
), this is what will show up. See the full code here.
$ make debug-versionmismatch
Error when running the plugin:
(...)
Error:incompatible package versions
Expected:
0.2.0
Found:
0.1.0
We can absolutely catch this error gracefully and continue with the execution of the runtime, just like with raw dynamic loading. It’s even easier because it works out of the box.
Missing fields and wrong types
The layout of every type is recursively checked before trying to use them to make sure they are compatible. Unlike raw dynamic loading, these errors can be caught gracefully, which is a huge plus (it used to segfault). See the full code here.
$ make debug-wrongtype
Error when running the plugin:
Compared <this>:
--- Type Layout ---
type:PrefixRef<'a, MinMod>
(...)
To <other>:
--- Type Layout ---
type:PrefixRef<'a, MinMod>
(...)
0 error(s).
0 error(s)inside:
<other>
(...)
Layout of expected type:
--- Type Layout ---
type:MinMod
(...)
Layout of found type:
--- Type Layout ---
type:MinMod
(...)
(...)
The error message is way too long to show here, but it basically shows the entire layout tree of the types that don’t match for each of its versions (runtime vs plugin). For this example, I changed the State
trait to use a boolean instead of an integer counter, and the message describes it perfectly: their sizes, alignments, and types differ in the trait’s methods.
Panicking
Panicking through the FFI boundary is undefined behaviour; we aren’t guaranteed that the plugin will abort. It may just continue its execution in a completely invalid state, which is scary. But it turns out abi_stable
properly handles this for us! It will use what it calls an AbortBomb
to even print out the line and file where it happened. This is publicly available through the macro extern_fn_panic_handling
. See the full code here.
$ make debug-panic
Loading plugin min
initial state: State { counter: 0 }
thread '<unnamed>' panicked at 'This will crash everything', src/lib.rs:26:5
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
file:src/lib.rs
line:24
Attempted to panic across the ffi boundary.
Aborting to handle the panic...
If we panic in the plugin it won’t be undefined behaviour anymore because abi_stable
already makes sure the panic doesn’t reach the FFI boundary.
Panicking and FFI
As we’ve already seen, plugins cannot panic across the FFI boundary under any circumstance [2]. If we aren’t using something like abi_stable
, every single function we export in the plugin should wrap its contents in catch_unwind
in order to be able to panic safely.
Unwinding is a process in which all local objects are destroyed, properly calling their destructors, so that the thread can continue execution safely [3][4]. Documentation about panics in Rust tends to take this knowledge for granted, but it wasn’t so clear to me at the beginning.
For example, the following snippet will panic after creating the vector. If panics were configured to abort, the contents of the vector wouldn’t be freed at all; the program would just end abruptly, and the cleaning up would be left to the Operating System. But if it unwinds, Rust will call Vec
’s destructor, freeing its allocated memory properly, making it possible to continue the execution of the program.
{
    let data = vec![1, 2, 3];
    panic!("oh no!");
    println!("My data: {:?}", data); // Unreachable
}
In typical Rust usage, a panic means that your program writes some scary message to stderr and then ends. This is because unwinding propagates up the stack and, if nothing stops it, ends the thread (and the whole program, if it’s the main thread). But that’s exactly what catch_unwind
is for:
let result = panic::catch_unwind(|| {
    let data = vec![1, 2, 3];
    panic!("oh no!");
    println!("My data: {:?}", data); // Unreachable
});
// This will run just fine and print out `true`
println!("Did it panic? {}", result.is_err());
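To check that destructors really run while unwinding, the same experiment can be repeated with a type that records its own drop. This is a small sketch, assuming the default panic = "unwind" setting; Noisy, DROPPED and destructor_runs_on_unwind are hypothetical names.

```rust
use std::panic;
use std::sync::atomic::{AtomicBool, Ordering};

/// Set to `true` by `Noisy`'s destructor.
static DROPPED: AtomicBool = AtomicBool::new(false);

/// A value whose only job is to record that it was dropped.
struct Noisy;

impl Drop for Noisy {
    fn drop(&mut self) {
        DROPPED.store(true, Ordering::SeqCst);
    }
}

/// Panics while a `Noisy` is alive and reports whether the panic was caught
/// and the destructor still ran.
pub fn destructor_runs_on_unwind() -> bool {
    let result: Result<(), _> = panic::catch_unwind(|| {
        let _guard = Noisy;
        panic!("oh no!");
    });
    result.is_err() && DROPPED.load(Ordering::SeqCst)
}
```

If the crate were compiled with panic = "abort" instead, the process would end at the panic and the function would never return.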
Rust makes it very clear that catch_unwind
is not intended for regular error handling (you have Result
for that). But in our case we are almost forced to use it in order to not invoke undefined behaviour when panicking through the FFI boundary. Every single function in the FFI interface that has a possibility of panicking should use it so that the panic doesn’t try to propagate. And this is quite tricky because even things like addition may cause a panic (overflow in debug mode).
Let’s see what else we can do about panicking:
Aborting
The simplest way to do it would be to just configure plugins to abort on panic instead of unwinding. This is possible with the panic = "abort" option in the plugin’s Cargo.toml. It will still show the panic message, but the execution will be completely stopped by an abort:
$ cargo r -q
thread 'main' panicked at 'Oh no!', src/main.rs:2:5
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
zsh: abort (core dumped) cargo r -q
This is sound because the entire program’s execution ends before reaching the FFI boundary. The problem is that cleaning up will never happen, and that although there’s a hack you can use in your common
library to make sure the plugin is compiled with panic = "abort", it’s only available on nightly until this is merged:
Pluggable panic implementations (tracking issue for RFC 1513) rust-lang/rust#32837
C-unwind
This problem is something the Rust devs are aware of, and that they’re trying to fix. It has been proposed under the “C-unwind” ABI string. Just like how you currently use extern "C"
, if we used extern "C-unwind"
, we’d get more guarantees about what happens when a thread panics. More information here:
The most relevant things this feature offers us are:
- Support for unwinding through the FFI boundary.
- A guarantee that even with extern "C", panicking is not undefined behavior; it’ll just abort (except for some very specific cases). Switching between “abort” and “unwind” for the panic option in Cargo.toml is always sound.
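As a rough idea of what the first point enables, here is a sketch that assumes a toolchain which accepts the "C-unwind" ABI string. Both functions live in the same crate here, so it only demonstrates the syntax and the intended behaviour, not a real plugin boundary; checked_div and call_plugin are hypothetical names.

```rust
use std::panic;

/// A function exported with the "C-unwind" ABI: panics are allowed to
/// unwind out of it instead of being undefined behaviour.
pub extern "C-unwind" fn checked_div(a: i32, b: i32) -> i32 {
    if b == 0 {
        panic!("division by zero");
    }
    a / b
}

/// The caller can stop the unwinding on its side of the boundary.
pub fn call_plugin(a: i32, b: i32) -> Option<i32> {
    panic::catch_unwind(|| checked_div(a, b)).ok()
}
```

Here call_plugin(1, 0) yields None after the panic message is printed, while a regular call returns the quotient as usual.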
Unfortunately, it’s moving somewhat slowly, and I’m not quite sure when this will be ready. In the meantime, we’ll need to use something else to ensure no undefined behaviour occurs in our plugin system.
AbortBomb
abi_stable
does this in a pretty clever way: it creates an AbortBomb
struct at the beginning of the function, which contains its filename and line of code. If something panics and unwinds, AbortBomb
’s destructor will be called, which aborts the program. Otherwise, mem::forget
is called for the AbortBomb
at the end of the function, which will avoid calling its destructor and the function will be able to end successfully.
Note that even though mem::forget
is called, no memory is actually being leaked, because the filename is a &'static str
— which lives for the entirety of the program — and the line number is an integer, which will be in the stack and doesn’t need fancy destructors.
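The mechanism described above fits in a few lines. The following is a simplified sketch of the idea and not abi_stable’s actual source; AbortBombSketch and guarded_min are hypothetical names.

```rust
use std::{mem, process};

/// Aborts the process if it is ever dropped, which only happens when the
/// function that created it is unwinding.
pub struct AbortBombSketch {
    file: &'static str,
    line: u32,
}

impl Drop for AbortBombSketch {
    fn drop(&mut self) {
        eprintln!("file:{}\nline:{}", self.file, self.line);
        eprintln!("Attempted to panic across the ffi boundary.");
        process::abort();
    }
}

/// An exported function protected by the bomb.
pub extern "C" fn guarded_min(a: i32, b: i32) -> i32 {
    // Armed at the beginning: any panic below drops it and aborts.
    let bomb = AbortBombSketch {
        file: file!(),
        line: line!(),
    };
    let result = a.min(b);
    // Defused at the end: `forget` skips the destructor, so returning
    // normally does not abort.
    mem::forget(bomb);
    result
}
```

The abort path can’t be observed from within the same process, so only the happy path is exercised when calling guarded_min.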
This approach is completely fine and works great, but it aborts the whole plugin system, so you can’t recover from it at all. In the case of Tremor, if a plugin panics, from a logical standpoint it doesn’t make much sense to continue the execution because there’s a piece missing in the pipeline. It couldn’t continue anyway… Right? Well, we could actually load the plugin that panicked again and use that instead for the remainder of the program. But since our plugin system doesn’t support unloading, we’d be leaking memory, and if the plugin keeps panicking it’d eventually crash.
Recovering from a plugin panicking is definitely viable, and it might be an interesting feature for the future. Unfortunately, it’s a lot of work to make sure it works properly, and it’s not really an objective for the first implementation, so for now I’ll just use abi_stable
’s solution.
Recovering with catch_unwind
As I explained in the beginning, catch_unwind
can be used to detect and stop unwinding panics. One way to notify the runtime that a plugin has panicked so that it can act accordingly would be to use an enum equivalent to Option<T>
:
#[repr(C)]
#[derive(Debug, StableAbi)]
pub enum MayPanic<T> {
    Panic,
    NoPanic(T)
}
MayPanic
is a type that only holds the original value if the function finished without panicking. Since the contents returned by catch_unwind
are just dyn Any
and don’t provide much value for us, they’re discarded and the Panic
variant is empty. The panicking information will be printed automatically as output anyway (or whatever is configured with the panic hook). We will use it in FFI contexts, so it also implements StableAbi
and it’s #[repr(C)]
.
I didn’t want to use Result
for this because panic errors should be treated differently from a regular error. Apart from the fact that panic::catch_unwind
returns a Box<dyn Any>
, which doesn’t implement Error
, panics happen when the plugin reaches an unrecoverable state and cannot continue. We really have to make sure this is handled differently from a regular error, so having the type safety of a different type can help.
It implements From<thread::Result<T>>
, so it can simply be used like this:
fn plugin_stuff() -> MayPanic<Whatever> {
    panic::catch_unwind(|| {
        // Code goes here
    })
    .into()
}
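The conversion itself isn’t shown, so here is a guess at what such a From implementation might look like. To keep it compilable with only the standard library, the StableAbi derive is left out, and parse_port is a hypothetical function added just to exercise it.

```rust
use std::{panic, thread};

/// Same shape as the enum above, minus the `StableAbi` derive.
#[repr(C)]
#[derive(Debug, PartialEq)]
pub enum MayPanic<T> {
    Panic,
    NoPanic(T),
}

impl<T> From<thread::Result<T>> for MayPanic<T> {
    fn from(result: thread::Result<T>) -> Self {
        match result {
            Ok(value) => MayPanic::NoPanic(value),
            // The panic payload is discarded on purpose.
            Err(_) => MayPanic::Panic,
        }
    }
}

/// Hypothetical plugin function that panics on invalid input.
pub fn parse_port(input: &str) -> MayPanic<u16> {
    panic::catch_unwind(|| input.parse::<u16>().expect("not a valid port")).into()
}
```

The runtime can then match on the result and decide what to do with a plugin that returned MayPanic::Panic.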
Ideally, MayPanic
could be accompanied by a #[may_panic]
procedural macro that adds this boilerplate automatically to the function it’s attached to. Additionally, it could come with a #[may_not_panic]
variant that attaches the #[no_panic]
macro from the no-panic
crate to make sure the statement is true at compile time. However, no-panic
isn’t too reliable, so perhaps it could be opt-in with something like #[may_not_panic(enforce)]
.
Something that complicates this whole thing considerably is the concept of exception safety. Unfortunately, catch_unwind
isn’t as easy to use as just slapping your code into its closure/function, as there are some types that aren’t considered unwind safe. You can read more about that here, but I won’t get into more details because we aren’t going to use MayPanic
in our own plugin system anyway.
Type conversions
It’s important to know the complexity of conversions from and to abi_stable
types. If Vec<T>
→ RVec<T>
wasn’t $O(1)$ it might be worth avoiding it altogether.
This means that I should spend at least a bit of my time on understanding how the abi_stable
types are implemented and making sure this isn’t the case. In std
, the definition of Vec
is actually quite simple if we remove most of the noise:
// A non-null pointer to `T` that indicates ownership.
pub struct Unique<T: ?Sized> {
    pointer: *const T, // The data itself
    _marker: PhantomData<T>, // Indicating that we own a `T`
}
// Low level type related to allocation
pub struct RawVec<T> {
    ptr: Unique<T>,
    cap: usize,
}
pub struct Vec<T> {
    buf: RawVec<T>,
    len: usize,
}
It’s mostly self-explanatory; a Vec<T>
is a pointer to T
with a set capacity and length. What about abi_stable
’s implementation?
#[repr(C)] // Notice this, so that it's FFI-safe
#[derive(StableAbi)] // This trait marks `RVec` as FFI-safe, with info about its layout
pub struct RVec<T> {
    pub(super) buffer: *mut T,
    pub(super) length: usize,
    capacity: usize,
    vtable: VecVTable_Ref<T>,
    _marker: PhantomData<T>,
}
Yup, basically the same, but packed inside a single struct. The only difference is that we have a field with the vtable. The conversion between these types is written with a macro, but if expanded, it looks like this:
impl<T> From<Vec<T>> for RVec<T> {
    fn from(this: Vec<T>) -> RVec<T> {
        let mut this = std::mem::ManuallyDrop::new(this);
        RVec {
            vtable: VTableGetter::<T>::LIB_VTABLE,
            buffer: this.as_mut_ptr(),
            length: this.len(),
            capacity: this.capacity(),
            _marker: PhantomData,
        }
    }
}
The only “weird” part is the usage of std::mem::ManuallyDrop
, which is simply a wrapper that tells Rust not to call the destructor of its contents automatically. In this case it’s basically a less error-prone std::mem::forget
, as its docs explain. Thanks to it, the memory from the Vec
won’t be dropped when this function ends, and its pointer ownership can be safely moved into RVec
, with no copying.
This happens for every type I checked in abi_stable
, including RSlice<T>
, which contains a reference to a slice, RStr
, which is just an RSlice<u8>
, and RString
, which is just an RVec
.
Thread safety
abi_stable
uses libloading
, whose error-handling is not fully thread-safe on some platforms, such as dlerror
on FreeBSD [5][6]. It’s fully thread-safe on Linux [7], macOS [8], and Windows [9], so for Tremor specifically we don’t have to worry about this. But if your program supports other Operating Systems, you might want to check their manuals one by one in order to make sure.
However, for the first version of our system this won’t be a problem at all. For simplicity’s sake, loading plugins after the startup will not be implemented yet, and we’ll do it sequentially. But it’s good to know it for the future.
Performance
I first tried to write these benchmarks with cargo nightly’s implementation. However, since it’s so basic, not updated regularly, and requires nightly, I moved to criterion
, which I quite liked after using it for another post.
First, we can take a look at already implemented plugin systems in order to have an idea of the performance hit we’ll experience in Tremor. This is what we should expect once our system is polished and ready for deployment:
- nginx reports 20% slower startup times and up to a 5% slowdown in their execution times [10].
- This article explains that the only performance difference is saving the resolved address of the symbol in a table the first time, and then it’s just a couple more instructions to access it. Also, obviously, the fact that the compiler can’t optimize parts of the code (e.g., inline function calls).
These are the results of the benchmarks I wrote, on my not-so-fast laptop:
dynamic setup time: [652.53 ns 654.72 ns 657.34 ns]
Found 7 outliers among 100 measurements (7.00%)
3 (3.00%) high mild
4 (4.00%) high severe
abi_stable setup time: [30.386 ns 30.477 ns 30.575 ns]
Found 9 outliers among 100 measurements (9.00%)
7 (7.00%) high mild
2 (2.00%) high severe
dynamic runtime time: [1.8814 ns 1.8878 ns 1.8947 ns]
Found 5 outliers among 100 measurements (5.00%)
1 (1.00%) low mild
2 (2.00%) high mild
2 (2.00%) high severe
abi_stable runtime time: [3.2155 ns 3.2325 ns 3.2494 ns]
Found 3 outliers among 100 measurements (3.00%)
1 (1.00%) low mild
2 (2.00%) high mild
native runtime time: [817.39 ps 819.33 ps 821.38 ps]
Found 6 outliers among 100 measurements (6.00%)
3 (3.00%) high mild
3 (3.00%) high severe
Note that the benchmarks still don’t represent a real usage of Tremor; it’s just using the plugin I described in this post with the min
function. But we can more or less analyze the performance differences between abi_stable
and raw dynamic loading — I doubt it’s worth implementing the final version with both methods just to run some benchmarks.
The loading times aren’t so important for performance because they only happen once at the beginning of the program. But abi_stable
’s way of recursively checking the types in the plugins is not free; the difference with raw dynamic loading should be quite noticeable. But somehow, in my benchmarks abi_stable
was way faster. What??
It turns out that abi_stable
just leaks the library when it’s loaded to prevent a use-after-free. And since it won’t be unloaded anyway, it’s not much of a problem in terms of leaking memory. The library will be saved into a static variable (of type LateStaticRef
), and on subsequent loads the initial value will be reused. So in my benchmarks for abi_stable
, loading only actually happens once, and for dynamic loading it happens for every iteration.
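The effect on the benchmark can be reproduced with a load-once cache from the standard library. This sketch uses OnceLock as a stand-in for LateStaticRef and a fake library instead of a real dynamic one; FakeLibrary, load_cached and load_count are hypothetical names.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

/// Stand-in for a loaded dynamic library.
pub struct FakeLibrary {
    pub name: &'static str,
}

/// Counts how many times the expensive load really happened.
static LOAD_COUNT: AtomicUsize = AtomicUsize::new(0);

/// The library lives in a static and is never unloaded.
static LIBRARY: OnceLock<FakeLibrary> = OnceLock::new();

fn expensive_load() -> FakeLibrary {
    LOAD_COUNT.fetch_add(1, Ordering::SeqCst);
    FakeLibrary { name: "min" }
}

/// Loads the library the first time and reuses it afterwards.
pub fn load_cached() -> &'static FakeLibrary {
    LIBRARY.get_or_init(expensive_load)
}

pub fn load_count() -> usize {
    LOAD_COUNT.load(Ordering::SeqCst)
}
```

A benchmark loop that calls load_cached repeatedly would therefore mostly measure the cost of reading a static, which explains the surprisingly low numbers.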
Once the library is loaded, it seems that using dynamic loading versus static linking is quite bad, being more than twice as slow. This is understandable; the problem with the native benchmark was, and most likely still is, that the Rust compiler is too smart. If I called min
with fixed parameters — say 10.min(3)
— it was optimized away, so I had to write a more intricate example that was different for each loop. Furthermore, using tools like sabi_trait
instead of a void*
almost doubles the execution time again.
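One common way to keep the compiler from folding a call like 10.min(3) into a constant is to hide the values behind std::hint::black_box. The following is not the benchmark used in this post, just a minimal sketch of the technique; time_min is a hypothetical function.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Calls `min` with inputs the optimizer can't see through and returns the
/// accumulated result together with the elapsed time.
pub fn time_min(iterations: u32) -> (i32, Duration) {
    let start = Instant::now();
    let mut acc = 0i32;
    for i in 0..iterations {
        // `black_box` makes the values opaque to the optimizer, so the call
        // can't be evaluated at compile time.
        let a = black_box(i as i32);
        let b = black_box(3);
        acc = acc.wrapping_add(black_box(a.min(b)));
    }
    (acc, start.elapsed())
}
```

Tools like criterion take care of the timing and statistics, so in practice only the black_box part is needed inside the benchmarked closure.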
Conclusion
We’ve learned a lot about abi_stable
and the overall state of dynamic loading in Rust. We’ll definitely avoid a lot of work thanks to these dependencies. It’s not as bad as I thought; there are plenty of tools for each use case, though most are admittedly only in early stages.
Hopefully, the performance degradations we’ve found won’t be as noticeable in the final version of the system. We’ll use sabi_trait
only when loading the library instead of for each call. And having a more complex use-case will probably avoid such incredible optimizations in the native code. You can find the full statistical reports in the criterion-reports
directory of the repository.
In the next article, I’ll cover the different caveats I’m finding as I try to actually implement the plugin system on Tremor, and the different ways in which they can be approached.