Subtyping and variance is a concept that works in the background, making your life easier without you knowing about it. That is, until it starts making your life harder instead. It’s a good idea to know about it, in case you end up being a fool like me. So let’s take a look at what went wrong, and how it was resolved.

The problem

As part of my Plugin System in Rust series, I was making one of Tremor’s types FFI-compatible. Put simply, instead of using types from the standard library like String, we wanted custom types defined with #[repr(C)]. The crate abi_stable exists for this exact purpose and provides equivalents for the most important std types. In theory, the task should have been as easy as swapping the std types in our core enum for theirs:

// Before (simplified)
use std::collections::HashMap;

pub enum Value {
    String(String),
    Object(Box<HashMap<String, Value>>),
    Bytes(Vec<u8>),
}

// After (simplified)
use abi_stable::std_types::{RString, RVec, RBox, RHashMap};

#[repr(C)]
pub enum Value {
    String(RString),
    Object(RBox<RHashMap<RString, Value>>),
    Bytes(RVec<u8>),
}

The Value type is used a lot in the codebase, so this breaking change brought up lots of compilation errors. But for whatever reason, 70 of these errors were related to lifetimes, which I hadn’t changed at all…

Debugging

Instead of a String or Vec, we actually used Cow<'a> for performance reasons. Cow<'a> is a type that can hold either a borrowed or an owned value at runtime. The idea is to use the borrowed one as much as possible, and only if ownership or mutation is needed, the underlying value is cloned (a better explanation can be found here).
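To make the idea concrete, here is a minimal sketch using only the standard library. The expand_tabs function is a made-up example, not something from Tremor: it returns the input untouched (borrowed) when there is nothing to change, and only allocates a new String when it has to.

```rust
use std::borrow::Cow;

/// Replaces tabs with four spaces, allocating only when the input has a tab.
fn expand_tabs(input: &str) -> Cow<'_, str> {
    if input.contains('\t') {
        // A modified copy is needed, so we pay for an allocation.
        Cow::Owned(input.replace('\t', "    "))
    } else {
        // Nothing to change: hand back the original data without copying.
        Cow::Borrowed(input)
    }
}

fn main() {
    let untouched = expand_tabs("no tabs here");
    let rewritten = expand_tabs("a\tb");
    assert!(matches!(untouched, Cow::Borrowed(_)));
    assert!(matches!(rewritten, Cow::Owned(_)));
    println!("{} / {}", untouched, rewritten);
}
```

Both results can be used as a &str through Deref, so callers rarely need to care which variant they got.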

Heinz, my mentor at Tremor, managed to reduce the issue to a minimal example involving only Cow and its equivalent, RCow. A Playground snippet can be found here.

use abi_stable::std_types::RCow;
use std::borrow::Cow;

// This works
fn cmp_cow<'a, 'b>(left: &Cow<'a, ()>, right: &Cow<'b, ()>) -> bool {
    left == right
}

// This fails to compile
fn cmp_rcow<'a, 'b>(left: &RCow<'a, ()>, right: &RCow<'b, ()>) -> bool {
    left == right
}

The second function failed to compile with the following error, which didn’t help much. Rust 1.62.0 actually improved the message to explain what’s going on (shown at the end of the article):

$ cargo b
   Compiling repro v0.1.0 (/home/mario/Downloads/repro)
error[E0623]: lifetime mismatch
  --> src/lib.rs:10:10
   |
9  | fn cmp_rcow<'a, 'b>(left: &RCow<'a, ()>, right: &RCow<'b, ()>) -> bool {
   |                            ------------          ------------
   |                            |
   |                            these two types are declared with different lifetimes...
10 |     left == right
   |          ^^ ...but data from `left` flows into `right` here

For more information about this error, try `rustc --explain E0623`.
error: could not compile `repro` due to previous error

What? Why do lifetimes matter here if it’s just a comparison?

This hinted that the issue was in the underlying library, not my code. RCow is supposed to be a drop-in replacement for Cow. It also had something to do with the comparison traits: == goes through PartialEq, and PartialOrd was implemented along the same lines. But I couldn’t see a difference between the std and abi_stable implementations of PartialOrd:

// Implementation in the standard library:
impl<'a, B: ?Sized> PartialOrd for Cow<'a, B>
where
    B: PartialOrd + ToOwned,
{
    #[inline]
    fn partial_cmp(&self, other: &Cow<'a, B>) -> Option<Ordering> {
        PartialOrd::partial_cmp(&**self, &**other)
    }
}

// Implementation in abi_stable:
impl<'a, B> PartialOrd<RCow<'a, B>> for RCow<'a, B>
where
    B: PartialOrd + BorrowOwned<'a> + ?Sized,
{
    #[inline]
    fn partial_cmp(&self, other: &RCow<'a, B>) -> Option<Ordering> {
        PartialOrd::partial_cmp(&**self, &**other)
    }
}

Other libraries also provide drop-in replacements for Cow, and beef, for example, managed to get it right somehow. I wasn’t able to reproduce the issue with their version… but why?

impl<A, B, U, V> PartialOrd<beef::Cow<'_, B, V>> for beef::Cow<'_, A, U>
where
    A: Beef + ?Sized + PartialOrd<B>,
    B: Beef + ?Sized,
    U: Capacity,
    V: Capacity,
{
    #[inline]
    fn partial_cmp(&self, other: &beef::Cow<'_, B, V>) -> Option<Ordering> {
        PartialOrd::partial_cmp(self.borrow(), other.borrow())
    }
}

Some progress… or not?

Other traits like PartialEq also caused similar lifetime errors. I was able to fix some by introducing a new lifetime 'b into the trait implementation. This told the Rust compiler that comparing objects with different lifetimes is okay:

-impl<'a, B> PartialEq<RCow<'a, B>> for RCow<'a, B>
+impl<'a, 'b, B, C> PartialEq<RCow<'b, C>> for RCow<'a, B>
 where
-    B: PartialEq + BorrowOwned<'a> + ?Sized,
+    B: PartialEq<C> + BorrowOwned<'a> + ?Sized,
+    C: BorrowOwned<'b> + ?Sized,
 {
-    fn eq(&self, other: &RCow<'a, B>) -> bool {
+    fn eq(&self, other: &RCow<'b, C>) -> bool {
         PartialEq::eq(&**self, &**other)
     }
 }
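To see the effect of the extra lifetime in isolation, here is a small std-only sketch. The Invariant type is a hypothetical stand-in, not part of abi_stable: a PhantomData marker forces it to be invariant over 'a, which is the property that turned out to matter for RCow. With a single-lifetime PartialEq impl, the same function would be rejected for this type; with the separate 'b on the right-hand side, it compiles.

```rust
use std::marker::PhantomData;

/// Hypothetical stand-in for a type that is invariant over `'a`.
/// `fn(&'a ()) -> &'a ()` uses `'a` in both argument and return position,
/// which makes the whole struct invariant.
struct Invariant<'a> {
    text: &'a str,
    _marker: PhantomData<fn(&'a ()) -> &'a ()>,
}

impl<'a> Invariant<'a> {
    fn new(text: &'a str) -> Self {
        Invariant { text, _marker: PhantomData }
    }
}

// The right-hand side gets its own lifetime `'b`, so the two operands
// no longer have to be unified to one single lifetime.
impl<'a, 'b> PartialEq<Invariant<'b>> for Invariant<'a> {
    fn eq(&self, other: &Invariant<'b>) -> bool {
        self.text == other.text
    }
}

fn same<'a, 'b>(left: &Invariant<'a>, right: &Invariant<'b>) -> bool {
    left == right
}

fn main() {
    let long_lived = String::from("abc");
    let left = Invariant::new(long_lived.as_str());
    {
        // This value is dropped earlier, so its borrow has a shorter lifetime.
        let short_lived = String::from("abc");
        let right = Invariant::new(short_lived.as_str());
        assert!(same(&left, &right));
    }
}
```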

I suddenly got a bit of hope. But this could never work for Ord, which also failed. The Ord trait uses Self for the other parameter, so I couldn’t just introduce a new lifetime.

impl<'a, B: ?Sized> Ord for RCow<'a, B>
where
    B: Ord + BorrowOwned<'a>,
{
    #[inline]
    fn cmp(&self, other: &Self) -> Ordering {
        Ord::cmp(&**self, &**other)
    }
}

// Implementation in the standard library:
impl<B: ?Sized> Ord for Cow<'_, B>
where
    B: Ord + ToOwned,
{
    #[inline]
    fn cmp(&self, other: &Self) -> Ordering {
        Ord::cmp(&**self, &**other)
    }
}

Discovering the root cause

Some wonderful people on the Rust Discord server helped me understand what was going on. So I started learning more about the so-called “Subtyping and Variance”.

Discord discussion

This topic isn’t covered in The Rust Book. We’ll only find it in its more obscure, unsafer brother, The Rustonomicon. This book explains it incredibly well, so I won’t repeat it here. Here are some resources:

  1. “Subtyping and Variance” — The Rustonomicon (an explanation)
  2. “Subtyping and Variance” — The Rust Reference (a cheatsheet)
  3. “Covariance and contravariance” — Wikipedia (the general term)

A couple of blog posts take a more practical approach, like “Rust Lifetime Subtype Variance” — Prolific K or “Diving Deep: implied bounds and variance” — lcnr.de. Or if you’re a visual learner, this video from Jon Gjengset might be best for you.

Trying to fix it

The difference between RCow and Cow was the BorrowOwned<'a> trait. For technical reasons, it was being used as a subtrait of ToOwned, and it had to bind to a lifetime 'a. Ultimately, this made RCow invariant over 'a, while Cow was covariant. We want RCow to be covariant for this to work.

 impl<'a, B: ?Sized> Ord for Cow<'a, B>
 where
-    B: Ord + ToOwned,  // in Cow
+    B: Ord + BorrowOwned<'a>,  // in RCow
 {
     #[inline]
     fn cmp(&self, other: &Self) -> Ordering {
         Ord::cmp(&**self, &**other)
     }
 }
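The difference can be reproduced without abi_stable. In the sketch below, Borrowing<'a> is a hypothetical trait that only mirrors the shape of BorrowOwned<'a>; Direct and Projected are made-up names too. A type that stores a plain reference is covariant over 'a, while a type that stores a trait projection taking 'a as an input is invariant, even though both end up holding a &'a str.

```rust
/// Hypothetical trait mirroring the shape of `BorrowOwned<'a>`.
trait Borrowing<'a> {
    type Borrowed;
}

impl<'a> Borrowing<'a> for str {
    type Borrowed = &'a str;
}

/// Covariant over `'a`: the lifetime only appears in a plain reference.
struct Direct<'a>(&'a str);

/// Invariant over `'a`: the lifetime is an input to a trait projection.
struct Projected<'a, B: Borrowing<'a> + ?Sized>(<B as Borrowing<'a>>::Borrowed);

/// Covariance lets a longer lifetime be shortened implicitly.
fn shorten_direct<'short, 'long: 'short>(value: Direct<'long>) -> Direct<'short> {
    value
}

// The same function for `Projected` is rejected by the compiler because of
// invariance, so it is left commented out:
//
// fn shorten_projected<'short, 'long: 'short>(
//     value: Projected<'long, str>,
// ) -> Projected<'short, str> {
//     value
// }

fn main() {
    let owned = String::from("abc");
    let direct = shorten_direct(Direct(owned.as_str()));
    let projected = Projected::<str>(owned.as_str());
    assert_eq!(direct.0, projected.0);
}
```

Uncommenting shorten_projected is a quick way to check the variance of a type by hand, since the compiler does not let you declare it explicitly.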

Attempt #1: GATs

I had an idea of using Generic Associated Types (GATs). Instead of binding the lifetime to the trait, I could do so to its associated type. Then, I’d be able to use BorrowOwned instead of BorrowOwned<'a>:

impl<T> BorrowOwned for T {
    type RBorrowed<'a> = &'a T where T: 'a;
}

But a section in the Rust Compiler Development Guide (rustc-dev-guide) states that “traits with associated types must be invariant with respect to all of their inputs”. So that still didn’t help make our type covariant.

Note that I only found that statement in the book for developers of the compiler! I opened an issue about that in The Rustonomicon, and moved on to something else.
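For reference, this is roughly what the GAT idea looks like when spelled out in full. It is only a sketch: the trait below is a hypothetical, stripped-down version with a single associated type, and GatCow is a made-up name. It needs a compiler with stable GATs (Rust 1.65 or later). The type compiles, but its field is still a trait projection that takes 'a as an input, so it stays invariant.

```rust
/// Hypothetical GAT-based variant of the trait: the lifetime moves from
/// the trait itself to its associated type.
trait BorrowOwned {
    type RBorrowed<'a>
    where
        Self: 'a;
}

impl<T> BorrowOwned for T {
    type RBorrowed<'a> = &'a T where T: 'a;
}

/// The field is a projection with `'a` as one of its inputs,
/// so this type is still invariant over `'a`.
struct GatCow<'a, B: BorrowOwned + 'a> {
    borrowed: B::RBorrowed<'a>,
}

// Still rejected because of invariance, so it is left commented out:
//
// fn shorten<'short, 'long: 'short>(
//     value: GatCow<'long, u8>,
// ) -> GatCow<'short, u8> {
//     value
// }

fn main() {
    let number = 5u8;
    let cow = GatCow::<u8> { borrowed: &number };
    assert_eq!(*cow.borrowed, 5);
}
```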

Attempt #2: transmute

After many wasted hours, I was tempted to use transmute and call it a day. Here’s what Heinz suggested (trigger warning):

fn compare<'a, 'b>(left: &RCow<'a, str>, right: &RCow<'b, str>) -> Ordering {
    unsafe {
        let right: &RCow<'a, str> = std::mem::transmute(right);
        left.cmp(right)
    }
}

It worked! In theory, it’s safe because both 'a and 'b will live for at least as long as the function does, and we’re returning an owned type.

Ideally, we’d abstract it away by writing a wrapper around RCow with the fix. But that wouldn’t help because invariant relationships are inherited, and the wrapper’s implementation of Ord would still use BorrowOwned<'a>.

struct SCow<'a>(RCow<'a, ()>);  // will still be invariant!

One workaround would be to hide RCow under a *const (). Then, I can pointer-cast back and forth from it. But in this project, I already had too many things backfire. Traumatized, I continued looking for a solution.

Attempt #3: getting rid of BorrowOwned<'a>

The best way to avoid problems with this trait is to get rid of it. The standard library has ToOwned, which links a borrowed type with its owned counterpart, for example &str and String. If Cow<'a, B> requires B: ToOwned, then the Cow::Borrowed variant can just hold &'a B and Cow::Owned can hold B::Owned.
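As an illustration of that relationship, here is a simplified re-implementation of the enum. MyCow is a made-up name and this is not the actual standard library source, just a sketch of the same shape: the borrowed variant holds a reference to B, and the owned variant holds whatever ToOwned says the owned counterpart of B is.

```rust
/// Simplified re-implementation of `std::borrow::Cow`, for illustration only.
enum MyCow<'a, B: ToOwned + ?Sized + 'a> {
    Borrowed(&'a B),
    Owned(<B as ToOwned>::Owned),
}

impl<'a, B: ToOwned + ?Sized> MyCow<'a, B> {
    /// Returns the owned value, cloning only if the data was borrowed.
    fn into_owned(self) -> B::Owned {
        match self {
            MyCow::Borrowed(borrowed) => borrowed.to_owned(),
            MyCow::Owned(owned) => owned,
        }
    }
}

fn main() {
    // For `B = str`, the owned counterpart is `String`.
    let text: MyCow<'_, str> = MyCow::Borrowed("abc");
    let owned_text: String = text.into_owned();
    assert_eq!(owned_text, "abc");

    // For `B = [u8]`, the owned counterpart is `Vec<u8>`.
    let bytes: MyCow<'_, [u8]> = MyCow::Owned(vec![1, 2, 3]);
    let owned_bytes: Vec<u8> = bytes.into_owned();
    assert_eq!(owned_bytes, vec![1, 2, 3]);
}
```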

BorrowOwned<'a> roughly did the same thing for types defined in abi_stable, such as RStr and RString:

// standard library
let x: &str = "abc";
let x_owned: String = x.to_owned();

// abi_stable
let x_ffi_safe: RStr<'_> = rstr!("abc");
let x_owned: String = x.to_owned();
let x_ffi_safe_owned: RString = x.r_to_owned();

Note that we need a lifetime in BorrowOwned because the equivalent of &'a str is RStr<'a>, which is not exactly the same. This is because str is a Dynamically Sized Type (DST), but custom DSTs aren’t supported by Rust, so RStr has to carry the lifetime itself.

impl ToOwned for str {  // okay
    type Owned = String;
    // `&self` is `&str`
    fn to_owned(&self) -> String { ... }
}

impl<'a> ToOwned for RStr<'a> {  // not quite
    type Owned = RString;
    // `&self` is `&RStr<'a>`, but we want `RStr<'a>`
    // So we can't quite use `ToOwned` here
    fn to_owned(&self) -> RString { ... }
}

So instead of establishing this relationship through a trait, we can introduce a new generic parameter O. B would be the borrowed type, and O the owned one. This is similar to what the cervine crate does, which relaxes the constraints of Cow:

// Before:
#[repr(C)]
enum RCow<'a, B>
where
    B: BorrowOwned<'a> + ?Sized,
{
    Borrowed(<B as BorrowOwned<'a>>::RBorrowed),
    Owned(<B as BorrowOwned<'a>>::ROwned),
}
// After:
#[repr(C)]
enum RCow<B, O> {
    Borrowed(B),
    Owned(O),
}

/// Ffi-safe equivalent of `Cow<'a, T>`, either a `&T` or `T`.
type RCowVal<'a, T> = RCow<&'a T, T>;
/// Ffi-safe equivalent of `Cow<'a, str>`, either an `RStr` or `RString`.
type RCowStr<'a> = RCow<RStr<'a>, RString>;
/// Ffi-safe equivalent of `Cow<'a, [T]>`, either an `RSlice` or `RVec`.
type RCowSlice<'a, T> = RCow<RSlice<'a, T>, RVec<T>>;

Without the BorrowOwned trait, our enum was now covariant over 'a, and the errors disappeared. Rodri, the author of abi_stable, ended up proposing the fix that was merged. You can find a simplified version here.
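The effect of the new design can be checked with a std-only sketch. TwoCow and CowStr are hypothetical names, with &str and String standing in for RStr and RString, and the comparison impls are written by hand just for this example. Because the enum no longer mentions a trait projection, it is covariant over the lifetime, and Ord::cmp (which takes &Self) accepts two values with different lifetimes.

```rust
use std::cmp::Ordering;

/// Std-only sketch of the reworked `RCow<B, O>`: no trait bound on the enum.
enum TwoCow<B, O> {
    Borrowed(B),
    Owned(O),
}

/// Hypothetical alias mirroring `RCowStr<'a>`.
type CowStr<'a> = TwoCow<&'a str, String>;

impl CowStr<'_> {
    /// Views the contents as `&str` regardless of the variant.
    fn as_str(&self) -> &str {
        match self {
            TwoCow::Borrowed(borrowed) => *borrowed,
            TwoCow::Owned(owned) => owned.as_str(),
        }
    }
}

// Comparisons look at the contents only, not at which variant is used.
impl PartialEq for CowStr<'_> {
    fn eq(&self, other: &Self) -> bool {
        self.as_str() == other.as_str()
    }
}

impl Eq for CowStr<'_> {}

impl PartialOrd for CowStr<'_> {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}

impl Ord for CowStr<'_> {
    fn cmp(&self, other: &Self) -> Ordering {
        self.as_str().cmp(other.as_str())
    }
}

/// Compiles thanks to covariance: both arguments can be shortened
/// to a common lifetime before `Ord::cmp` is called.
fn compare<'a, 'b>(left: &CowStr<'a>, right: &CowStr<'b>) -> Ordering {
    left.cmp(right)
}

fn main() {
    let long_lived = String::from("apple");
    let left: CowStr<'_> = TwoCow::Borrowed(long_lived.as_str());
    {
        let short_lived = String::from("banana");
        let right: CowStr<'_> = TwoCow::Borrowed(short_lived.as_str());
        assert_eq!(compare(&left, &right), Ordering::Less);
    }
    let owned: CowStr<'_> = TwoCow::Owned(String::from("apple"));
    assert_eq!(compare(&left, &owned), Ordering::Equal);
}
```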

Conclusion

This showcased two gaps in the language:

  1. There was no indication in the error message that the issue was related to “variance”. I had no idea what that was, and it wasn’t covered in the book.
  2. It was very hard to debug the variance of a type, given that variance is inferred by the compiler rather than written down anywhere.

So it’s amazing to hear that starting in Rust 1.62.0, the error message names variance explicitly and even points you to the documentation. It will still be hard to understand the whole topic, but at least you know where to start!

error: lifetime may not live long enough
  --> src/main.rs:55:5
   |
54 | fn test2<'a, 'b>(left: &RCow<'a, u8>, right: &RCow<'b, u8>) -> Ordering {
   |          --  -- lifetime `'b` defined here
   |          |
   |          lifetime `'a` defined here
55 |     left.cmp(right)
   |     ^^^^^^^^^^^^^^^ argument requires that `'a` must outlive `'b`
   |
   = help: consider adding the following bound: `'a: 'b`
   = note: requirement occurs because of the type `RCow<'_, u8>`, which makes the generic argument `'_` invariant
   = note: the enum `RCow<'a, B>` is invariant over the parameter `'a`
   = help: see <https://doc.rust-lang.org/nomicon/subtyping.html> for more information about variance

I was lucky to have such a great team at Tremor, and an OSS maintainer as helpful as Rodri. You can find all the details of the discussion in the original GitHub issue:

“lifetimes with R* types break compared to non R* types” (rodrimati1992/abi_stable_crates#75)