Abstract

This Google Summer of Code project consists on developing a universal interface for Tremor plugins[1], allowing said library to become more modular and reduce the core set of dependencies.

This will achieve a considerably reduced core size for Tremor, meaning that the library will be faster to compile and have a smaller binary size. Most importantly, it will transform Tremor’s architecture to be completely modular, where plugins can be configured as needed and developed independently, in a language-agnostic way.

Implementation Plan

There are a few details to take into account about the proposed implementation:

  • Modules can be loaded into a Tremor instance either at start-time or dynamically and then used in deployments.

  • Plugin unloading is not to be implemented for this project.

  • Template projects, traits, and examples are out of scope in this

  • Tremor supports Linux, Max OS X and Windows, so this feature should integrate seamlessly on any of these Operating Systems as well.

  • Dynamic library loading usually requires usage of unsafe, so this implementation ought to make sure their appearances do not raise safety concerns in Tremor. Thorough error handling is a must for this.

The following sections list the problems that need to be solved. These haven’t been discussed with Tremor members much, so they are subject to heavy changes and are just introductions to possible approaches.

Defining an Interface

The minimum common denominator to define binary interfaces most times is C. However, while this would allow Tremor plugins to be written in any language, the complexity of this project would increase tremendously. Types required for the interface like tremor_script::Value (currently used by some traits), error handling, and other advanced features like lifetimes or templates would have to be transformed down to the C ABI. Not only would this be an incredibly tedious and complicated task, but it’d also introduce a small overhead for every function in the plugin.

Thus, the only viable solution is a Rust to Rust FFI, using dylib libraries for the plugins. The problem in this case is that Rust doesn’t have a stable ABI, and it doesn’t plan to define one in the future[2]. This means that programs or plugins compiled with different versions of rustc won’t be compatible at all. This could be less of a problem with alternatives like abi_stable_crates, which provide unofficial stable ABIs, but they don’t follow Rust’s objectives nor are very popular, so these libraries might end up unmaintained or just not worth it.

Interface improvements may also happen in the future, where a method might be deprecated, removed, or added. The same could happen with Tremor; different API versions wouldn’t be backwards compatible either.

Knowing this, it’s important to correctly handle version upgrades. This means that some kind of interface versioning is needed. Tremor could disallow loading plugins that would inevitably break after a version upgrade, or perhaps it could show a message informing the user that they should upgrade their plugin before the deprecations turn into actual changes. Most Plugin Development Kit interfaces include some way of providing metadata about the plugin[3]. This may include the following fields:

  • Versioning:

    • Interface version

    • Tremor version

    • Rustc version

    • Core version

  • Interface type: to avoid incorrectly loading a codec plugin as a pipeline operator, for example.

  • Other information about the module itself:

    • Version

    • Author

    • License

    • Description

The Linux Kernel Module system for example inserts this in the .modinfo section of the binary, where the fields can be read as a key-value c-string. C extensions for Python, on the other hand, use a PyModuleDef struct with all the fields. The last option is by exporting globals with known names, which is the most straightforward.

Dynamically Loading Objects

The next step after declaring a basic interface is somehow importing the dynamic objects into the core of Tremor, in Rust. The main library that can help with this is libloading, which seems to be somewhat mature and well maintained. Although it does offer OS-dependent features, its main usage is wrapped to be cross-platform, following the same behaviour on all platforms. libloading would allow Tremor to load symbols from an external module, like functions or variables. Here’s an example of how it’s used. After loading the dynamic library, the Tremor core would wrap its exported functions so that they can be used without unsafe and unnecessary boilerplate.

Configuration

Something to consider is where the plugins are going to be loaded dynamically from, or how that can be configured. Tremor currently has quite a few ways to handle configuration via repositories and registries.

One way to approach this would be to have a directory inside /etc/tremor/ — or any user-configurable path when launching Tremor — with all the available plugins. Tremor would read its contents when starting and load all the plugins it finds. It should also be possible to load new plugins on demand, so the repository could be given the path of a library to include at runtime, for example with the tremor CLI tool.

It’s important to make the PDK as easy to use as possible, and to avoid breaking already existing installations of Tremor. For this, the default Tremor configuration should include the current libraries in the default dynamic library location.

Implementations

The PDK issue establishes the following components of Tremor as the main deliverables:

  1. Pipeline Operators

  2. Codecs

  3. Preprocessors

  4. Postprocessors

  5. Modules/Functions

After defining its interfaces and implementing the PDK architecture, these features would be turned into plugins, which would allow them to be developed independently, and thus wouldn’t have to be in the main tremor-runtime repository. For simplicity, though, it might be best to keep them there — or at least group them up in less repositories — to avoid a large number of repos and thus a bigger maintenance cost. Some components are quite simple and it might not be necessary to create a new repository for them, with a copy of the CI, README…​ But this is up to the core contributors of Tremor, after all.

Enhancements

The PDK proposal also lists a few optional targets, which can be implemented if there’s remaining time after the main goals are fully finished:

  1. Implement connectors RFC ( pre-requirement for connector plugins )

  2. Contribute to and finalize tremor-rs/tremor-rfcs#32

  3. Add source, sink, and peering connectors to pluggable artefacts

  4. Add a PDK TCK ( test compatibility kit ) that asserts plugin invariants and provides testing mechanisms for plugin developers

  5. Consider plugin documentation generation and another tooling for better developer convenience and usability

  6. Make trickle sub-graphs a first-class modular and pluggable artefact

The most likely to be implemented of these is the fifth, as documentation is important for this new breaking feature and it looks like the easiest of them, or at least seemingly more flexible, considering there most likely won’t be that much extra time after the main goals, if any.

The “development tooling” part would also be inevitably developed as the project progresses, since I’ll need them anyway to move the existing Implementations to the PDK. Said resources could be contained in a separate tremor_plugin crate, with all kinds of utilities to make plugin development easier, including traits or even procedural macros if necessary, which are a very interesting part of Rust and I’m looking forward to on working on as well, and I’ve already done in the past.

Proposal Timeline

I do not plan on giving a very specific and tight timeline because it’s still really early, so the following are rough estimates and are subject to modifications. I’ll also include an extra week for possible delays, or otherwise for work towards the enhancements to the initial target, so that the established 175 hours of work by Google are fully covered. This is expected to happen over 10 weeks, which means about 17.5 hours of work per week. Depending on my speed of development this might increase to up to around 20 hours per week so that the proposed requirements can be fulfilled.

I will be in contact with the Tremor team at all times during the development process. I’ll also make a detailed blog post after this is finished, and possibly smaller ones after finishing the more important goals of the project.

13th April to 17th May: Application Review

  • I don’t have experience with Tremor itself, since I’ve discovered it thanks to the GSoC, but I plan on contributing at least an exec offramp soon to get myself familiarized with the codebase.

  • I will do more research about the theory needed for this project: dynamic shared object libraries, and specifically in Rust (What libraries can I use? How unsafe is it? How stable is it?).

  • Research more about libraries like abi_stable_crates and evaluate if said method to increase of compatibility for plugins is actually worth it.

  • I will take a look at how other libraries implement this. I consider it vital to know about how this has been done in the past in order to avoid their failures and improve their solutions rather than starting from scratch.

17th May to 7th June: Community Bonding

  • Here I will try to get smaller prototypes of PDKs working, which can later be extended for Tremor, and with which I could discuss with the Tremor team.

  • Plan how the development will work in detail and structure my research and ideas in a single place — perhaps a blog post.

7th June to 16th August: Coding

As the code is written, documentation and the tests also will. Tests are a great method to make sure a feature really works while developing it, and a solid way to move on to another feature when coding is to sum it up with documentation before forgetting more about its details; I consider it a bad idea to forget these points until the very end.

There are five main objectives proposed for the initial target, to be distributed in 9 weeks. Some will take more effort to implement, so here’s an estimate:

  • Implementing the plugin-loading architecture into Tremor: weeks 1 to 2.

  • Configuration of the PDK in Tremor’s repositories/registry: week 3.

  • Defining the main component interfaces: weeks 4 to 5.

  • Implementing all of Tremor’s components as plugins (pipeline operators, codecs, preprocessors, postprocessors and modules/functions): weeks 6 to 9.

Note
I expect to make less progress until around the 15th of June, since I will be on finals until that day, and it will be harder to keep up with both at the same time. This means that the work will most likely not be evenly distributed; some weeks I’ll have more time than others, so I’ll make more progress in these to even it out.

About Myself

I’m Mario Ortiz Manero, a Computer Science student at the University of Zaragoza, Spain. I’m currently finishing my third year. Thanks to the university, I’m mostly experienced with Python, C and C++, but I’ve also been interested in Rust since 2020’s summer, when I took a deep dive and learned it on my own. I would love to have an opportunity where I can contribute to a big project with mentorship to sharpen my skills.

So far I’ve been interested in Software Development, but I recently learned more about Distributed and Concurrent Systems, which has really caught my attention. Tremor seems to be involved in this as well, which makes me excited to collaborate with them. I’m a long time open source contributor, mostly for projects of my own, but also to help other communities I’m passionate about, as I love the community and its ideals it represents:

You can contact me at marioortizmanero at gmail dot com, or via Discord as Glow#5433.


1. As described in detail in its issue on Tremor’s repository or its RFC.
3. More details on this post: Plugins in Rust