This Google Summer of Code project consists on developing a universal interface for Tremor plugins, allowing said library to become more modular and reduce the core set of dependencies.
This will achieve a considerably reduced core size for Tremor, meaning that the library will be faster to compile and have a smaller binary size. Most importantly, it will transform Tremor’s architecture to be completely modular, where plugins can be configured as needed and developed independently, in a language-agnostic way.
There are a few details to take into account about the proposed implementation:
Modules can be loaded into a Tremor instance either at start-time or dynamically and then used in deployments.
Plugin unloading is not to be implemented for this project.
Template projects, traits, and examples are out of scope in this
Tremor supports Linux, Max OS X and Windows, so this feature should integrate seamlessly on any of these Operating Systems as well.
Dynamic library loading usually requires usage of
unsafe, so this implementation ought to make sure their appearances do not raise safety concerns in Tremor. Thorough error handling is a must for this.
The following sections list the problems that need to be solved. These haven’t been discussed with Tremor members much, so they are subject to heavy changes and are just introductions to possible approaches.
Defining an Interface
The minimum common denominator to define binary interfaces most times is C.
However, while this would allow Tremor plugins to be written in any language,
the complexity of this project would increase tremendously. Types required for
the interface like
tremor_script::Value (currently used by
traits), error handling, and other advanced features like lifetimes or
templates would have to be transformed down to the C ABI. Not only would this be
an incredibly tedious and complicated task, but it’d also introduce a small
overhead for every function in the plugin.
|Upon further investigation, in future articles I’ve discovered that the following statement is not true. Do note that Rust to Rust FFI is completely unstable and should not be used at all for a plugin system; using the C ABI is the best way to do it. This is explained in the next articles in detail.|
Thus, the only viable solution is a Rust to Rust FFI, using
libraries for the plugins. The problem in this case is that Rust doesn’t have a
stable ABI, and it doesn’t plan to define one in the future. This means
that programs or plugins compiled with different versions of rustc won’t be
compatible at all. This could be less of a problem with alternatives like
provide unofficial stable ABIs, but they don’t follow Rust’s objectives nor are
very popular, so these libraries might end up unmaintained or just not worth it.
Interface improvements may also happen in the future, where a method might be deprecated, removed, or added. The same could happen with Tremor; different API versions wouldn’t be backwards compatible either.
Knowing this, it’s important to correctly handle version upgrades. This means that some kind of interface versioning is needed. Tremor could disallow loading plugins that would inevitably break after a version upgrade, or perhaps it could show a message informing the user that they should upgrade their plugin before the deprecations turn into actual changes. Most Plugin Development Kit interfaces include some way of providing metadata about the plugin. This may include the following fields:
Interface type: to avoid incorrectly loading a codec plugin as a pipeline operator, for example.
Other information about the module itself:
The Linux Kernel Module system for example inserts this in the
section of the binary, where the fields can be read
a key-value c-string. C extensions for Python, on the other hand, use a
with all the fields. The last option is by exporting globals with known names,
which is the most straightforward.
Dynamically Loading Objects
The next step after declaring a basic interface is somehow importing the dynamic
objects into the core of Tremor, in Rust. The main library that can help with
libloading, which seems to be somewhat
mature and well maintained. Although it does offer OS-dependent features, its
main usage is wrapped to be cross-platform, following the same behaviour on all
libloading would allow Tremor to load symbols from an external
module, like functions or variables. Here’s an
example of how it’s used.
After loading the dynamic library, the Tremor core would wrap its exported
functions so that they can be used without
unsafe and unnecessary boilerplate.
Something to consider is where the plugins are going to be loaded dynamically from, or how that can be configured. Tremor currently has quite a few ways to handle configuration via repositories and registries.
One way to approach this would be to have a directory inside
/etc/tremor/ — or any user-configurable path when launching Tremor — with all the available
plugins. Tremor would read its contents when starting and load all the plugins
it finds. It should also be possible to load new plugins on demand, so the
repository could be given the path of a library to include at runtime, for
example with the
tremor CLI tool.
It’s important to make the PDK as easy to use as possible, and to avoid breaking already existing installations of Tremor. For this, the default Tremor configuration should include the current libraries in the default dynamic library location.
The PDK issue establishes the following components of Tremor as the main deliverables:
After defining its interfaces and implementing the PDK architecture, these features would be turned into plugins, which would allow them to be developed independently, and thus wouldn’t have to be in the main tremor-runtime repository. For simplicity, though, it might be best to keep them there. Or at least group them up in less repositories to avoid having too many. Some components are quite simple and it might not be necessary to create a new repository for them, with a copy of the CI, README… But this is up to the core contributors of Tremor, after all.
The PDK proposal also lists a few optional targets, which can be implemented if there’s remaining time after the main goals are fully finished:
Implement connectors RFC ( pre-requirement for connector plugins )
Contribute to and finalize tremor-rs/tremor-rfcs#32
Add source, sink, and peering connectors to pluggable artefacts
Add a PDK TCK ( test compatibility kit ) that asserts plugin invariants and provides testing mechanisms for plugin developers
Consider plugin documentation generation and another tooling for better developer convenience and usability
Make trickle sub-graphs a first-class modular and pluggable artefact
The most likely to be implemented of these is the fifth, as documentation is important for this new breaking feature. It also looks like the easiest one, or at least seemingly more flexible, considering there most likely won’t be that much extra time after the main goals, if any.
The “development tooling” part would also be inevitably developed as the
project progresses, since I’ll need them anyway to move the existing Implementations
to the PDK. Said resources could be contained in a separate
crate, with all kinds of utilities to make plugin development easier, including
traits or even procedural macros if necessary, which are a very interesting part
of Rust, and I’m looking forward to on working on as well, and
I’ve already done in the past.
I do not plan on giving a very specific and tight timeline because it’s still really early, so the following are rough estimates and are subject to modifications. I’ll also include an extra week for possible delays, or otherwise for work towards the enhancements to the initial target, so that the established 175 hours of work by Google are fully covered. This is expected to happen over 10 weeks, which means about 17.5 hours of work per week. Depending on my speed of development this might increase to up to around 20 hours per week so that the proposed requirements can be fulfilled.
I will be in contact with the Tremor team at all times during the development process. I’ll also make a detailed blog post after this is finished, and possibly smaller ones after finishing the more important goals of the project.
13th April to 17th May: Application Review
I don’t have experience with Tremor itself, since I’ve discovered it thanks to the GSoC, but I plan on contributing at least an exec offramp soon to get myself familiarized with the codebase.
I will do more research about the theory needed for this project: dynamic shared object libraries, and specifically in Rust (What libraries can I use? How unsafe is it? How stable is it?).
Research more about libraries like
abi_stable_cratesand evaluate if said method to increase of compatibility for plugins is actually worth it.
I will take a look at how other libraries implement this. I consider it vital to know about how this has been done in the past in order to avoid their failures and improve their solutions rather than starting from scratch.
17th May to 7th June: Community Bonding
Here I will try to get smaller prototypes of PDKs working, which can later be extended for Tremor, and with which I could discuss with the Tremor team.
Plan how the development will work in detail and structure my research and ideas in a single place — perhaps a blog post.
7th June to 16th August: Coding
As the code is written, documentation and the tests also will. Tests are a great method to make sure a feature really works while developing it, and a solid way to move on to another feature when coding is to sum it up with documentation before forgetting more about its details; I consider it a bad idea to forget these points until the very end.
There are five main objectives proposed for the initial target, to be distributed in 9 weeks. Some will take more effort to implement, so here’s an estimate:
Implementing the plugin-loading architecture into Tremor: weeks 1 to 2.
Configuration of the PDK in Tremor’s repositories/registry: week 3.
Defining the main component interfaces: weeks 4 to 5.
Implementing all of Tremor’s components as plugins (pipeline operators, codecs, preprocessors, postprocessors and modules/functions): weeks 6 to 9.
|I expect to make less progress until around the 15th of June, since I will be on finals until that day, and it will be harder to keep up with both at the same time. This means that the work will most likely not be evenly distributed; some weeks I’ll have more time than others, so I’ll make more progress in these to even it out.|
I’m Mario Ortiz Manero, a Computer Science student at the University of Zaragoza, Spain. I’m currently finishing my third year. Thanks to the university, I’m mostly experienced with Python, C and C++, but I’ve also been interested in Rust since 2020’s summer, when I took a deep dive and learned it on my own. I would love to have an opportunity where I can contribute to a big project with mentorship to sharpen my skills.
So far I’ve been interested in Software Development, but I recently learned more about Distributed and Concurrent Systems, which has really caught my attention. Tremor seems to be involved in this as well, which makes me excited to collaborate with them. I’m a long time open source contributor, mostly for projects of my own, but also to help other communities I’m passionate about, as I love the community and its ideals it represents:
The project I’m most proud of is Vidify, a set of programs to automatically reproduce music videos for whatever music is playing on a device.
I’m currently a maintainer of rspotify, the most popular Spotify Web API bindings in Rust.
You can contact me at marioortizmanero at gmail dot com, or via Discord as Glow#5433.