Type bounds for standard library types in Rust

Catalin September 17, 2023 #Traits #Standard Library

Intro

I often hear complaints that Rust code is hard to read. The following is an example of this:

impl<I, U> Iterator for Flatten<I>
where
    I: Iterator<Item: IntoIterator<IntoIter = U, Item = U::Item>>,
    U: Iterator,
{
    type Item = U::Item;
    //...
}

What's going on here in the trait bounds for this implementation?

I'm going to show that the conditions here are not as complex as they appear, and I'm also going to explain some of the reasoning behind the standard library design for this. We are also going to cover two more advanced topics around traits: associated types and blanket implementations. As a bonus, this will also explain the relationship between the Iterator and IntoIterator traits - which are very important for designing APIs / libraries.

Check out the video

If you don't feel like reading:

Iterator

Let's start with the first trait involved in the definition: Iterator

trait Iterator {
    type Item;
    fn next(&mut self) -> Option<Self::Item>;
}

This trait allows us to process collections by iterating over their elements. It has multiple associated methods, but the only one that you need to implement is next. All the other methods come with default implementations by using next. This is only a small part of them.

// Provided methods
    fn next_chunk<const N: usize>(
        &mut self
    ) -> Result<[Self::Item; N], IntoIter<Self::Item, N>>
       where Self: Sized { ... }
    fn size_hint(&self) -> (usize, Option<usize>) { ... }
    fn count(self) -> usize
       where Self: Sized { ... }
    fn last(self) -> Option<Self::Item>
       where Self: Sized { ... }
    fn advance_by(&mut self, n: usize) -> Result<(), NonZeroUsize> { ... }
    fn nth(&mut self, n: usize) -> Option<Self::Item> { ... }
    fn step_by(self, step: usize) -> StepBy<Self> ⓘ
       where Self: Sized { ... }
    fn chain<U>(self, other: U) -> Chain<Self, <U as IntoIterator>::IntoIter> ⓘ
       where Self: Sized,
             U: IntoIterator<Item = Self::Item> { ... }
    fn zip<U>(self, other: U) -> Zip<Self, <U as IntoIterator>::IntoIter> ⓘ
       where Self: Sized,
             U: IntoIterator { ... }
    //...

You can overwrite these implementations if you want, but you get them for free after providing an implementation for next.

The next method will return the upcoming Item in the collection, wrapped in an Option. As you can see in the method signature, we're returning the Item type of Self. Self is used here to denote the type of the struct that will implement Iterator.

Item is declared above using the "type Item" declaration. This alias is what we call an associated type.

Now, you may be thinking: "Why do we need this special syntax, isn't this the same as using generics? Can't we just do it like this?:

trait Iterator<Item> {}

And yes, we can always choose to use generic types instead of associated types, but the effect is different in terms of API design. Associated types will force a single concrete type for each implementing struct, while generics will allow any number of types for a given implementing struct.

To illustrate this, if the Iterator trait would use generic syntax, when I implement it for a collection of Strings I could define an implementation that returns String as the Item (which is fine), but I could also define one that returns a number, dates, and so on... which is not what we want as a library designer.

impl Iterator<String> for MyCollection<String> {}

impl Iterator<u32> for MyCollection<String> {}

impl Iterator<DateTime> for MyCollection<String> {}

//...

Since I know that there is only one type that makes sense as the return Item, I can limit users of my library to only defining one type per implementation by using by associated types. This is why we say that associated types have a "grouping" effect - If we were to use a generic instead, we would get a whole family of possible implementations - as we saw in the previous example.

In other words, for a given set of generic trait params, the associated types of the trait will be constant. Yes! what I said means that we can use generics together with associated types, as we'll see with the upcoming examples.

I said that the associated type can only have a single concrete value for each implementing type. What happens if I add another impl block trying to redefine it?

Here's the struct definition for which we're implementing Iterator:

struct Something {}  
  
impl Iterator for Something {  
	type Item = i32;  
  
	fn next(&mut self) -> Option<Self::Item> {  
		Some(42)  
	}  
}  

//doesn't compile
impl Iterator for Something {  
	type Item = String;  
  
	fn next(&mut self) -> Option<Self::Item> {  
		Some("forty-two".to_string())  
	}  
}

After adding the second block that tries to redefine Item, we'll get the following compilation error:

conflicting implementations of trait std::iter::Iterator for type Something

If we now introduce a generic parameter T, we can define different items for Something<T> without the error. This is because Something<u8> is an essentially different type from Something<u16>.

struct Something<T> {}  
  
impl Iterator for Something<u8> {  
	type Item = i32;  
  
	fn next(&mut self) -> Option<Self::Item> {  
		Some(42)  
	}  
}  
  
impl Iterator for Something<u16> {  
	type Item = String;  
  
	fn next(&mut self) -> Option<Self::Item> {  
		Some("forty-two".to_string())  
	}  
}  

//doesn't compile
impl Iterator for Something<u16> {  
	type Item = u32;  
  
	fn next(&mut self) -> Option<Self::Item> {  
		Some(42)  
	}  
}

But, after the third block is added using the same generic <u16> param - I'll get the error again. It doesn't even matter if I change Item or not as the compiler knows that there should be only one implementation definition for Something <u16>

conflicting implementations of trait std::iter::Iterator for type Something<u16>

IntoIterator

Okay, let's look at the next trait we need: IntoIterator

pub trait IntoIterator {
    type Item;
    type IntoIter: Iterator<Item = Self::Item>;

    // Required method
    fn into_iter(self) -> Self::IntoIter;
}

This is a trait that we use to obtain an Iterator from other types. IntoIterator won't handle iteration, but it will provide an Iterator to us. The types which will usually implement IntoIterator will be collection types. Let's look at the associated types in the definition - we've seen Item, what about IntoIter?

IntoIter is the associated type that marks the kind of Iterator that IntoIterator should return when the required method is called. The trait bound is logical - since it is the return type of conversion method IntoIter needs to be something that implements Iterator.

But now we have something weird in the angle brackets. This means that the Item of the IntoIter associated type needs to be the same as the Item on Self (which is the type that will implement IntoIterator.

You might ask, why do we need this Item inside IntoIterator - can't we just use the associated Item of the associated IntoIter type? And you're right, type Item is completely unnecessary here. You could always just refer to Item of IntoIter, but for generic implementations (which are used a lot), this would add an extra generic type and an extra where clause. Here's an example.

Before the change, you would have to do this:

fn print_numbers<C, I>(collection: C)
where
    C: IntoIterator<IntoIter = I>,
    I: Iterator<Item = i32>,
{  
	let mut iter = collection.into_iter();  
	while let Some(elem) = iter.next() {  
		println!("{}", elem);  
	}  
}

After the Item type was added to IntoIterator as well, we can write it like this:

fn print_numbers<C>(collection: C)  
where  
    C: IntoIterator<Item = i32>,  
{  
    let mut iter = collection.into_iter();  
    while let Some(elem) = iter.next() {  
        println!("{}", elem);  
	}  
}

This gets rid of a generic type and a line in the where clause.

In the implementation, we're calling .into_iter() on the collection and then calling .next() on the resulting iterator as long as we get a Some variant as a result. When we get a None, iteration stops.

As you may know, the implementation can be simplified to this:

for elem in collection {
    println!("{}", elem);
}

The for loop is just syntactic sugar over what we were doing before - converting to an iterator and calling .next() on it until it returns a None variant.

You may also have observed that where clauses can be added to normal functions as well - not just for trait definitions or implementations.

Coming back to IntoIterator's design - the standard library team decided to add the Item to it, to simplify common where clauses and generic types. So, while the Item wasn't necessary for IntoIterator to work, since it was added to the definition, now it's mandatory to define when implementing IntoIterator.

For maximum flexibility, a lot of utility methods work with IntoIterator so you don't have to manually convert to an Iterator before calling them. But what about cases where I already have an Iterator? - how do I call those methods then?

Fortunately, all Iterators implement IntoIterator by default How was this achieved? - by using some Rust magic called "blanket implementations". Here we have the implementation block for this:

impl<I: Iterator> IntoIterator for I {
    type Item = I::Item;
    type IntoIter = I;

    #[inline]
    fn into_iter(self) -> I {
        self
    }
}

Since IntoIterator just has to return an Iterator, Iterators trivially implement this by returning themselves. One important thing to note here is that this implementation is for any generic type I that implements Iterator. This means that any newly created type that implements Iterator will also implement IntoIterator because of this. This is the power of blanket implementations - We can have implementations for types before we even write them!

Flatten

Now that we've wrapped this up, let's go back to the implementation definition at the start

impl<I, U> Iterator for Flatten<I>

We're implementing Iterator for the Flatten struct. Flatten is an abstraction over a one-level nested Iterator structure - an iterator that returns collections that can also be iterated on. We have a 2d structure that we want to treat like a 1d structure. Normally, we would have to iterate over each separate collection in the nested structure manually. The Flatten abstracts the job of handling each collection on it's own, allowing us to iterate over the structure as if it were a regular iterator

Let's look at the definition of the struct:

pub struct Flatten<I>
where
    I: Iterator,
    <I as Iterator>::Item: IntoIterator,
    { /* private fields */ }

Here, you can already see the descriptive power of Rust's type system. Flatten is generic over I where I has to implement Iterator and the associated Item of that Iterator needs to Implement IntoIterator. This essentially means that Flatten works for any Iterator that returns Items that can be converted into Iterators.

We can get Flattens by calling the .flatten() method on any Iterator. This means the .flatten() method is part of Iterator's standard provided methods. Here's the implementation definition for the it:

fn flatten(self) -> Flatten<Self>
where
    Self: Sized,
    Self::Item: IntoIterator,
{
    Flatten::new(self)
}

The first where clause indicates that the Iterator needs to implement Sized - this is a marker trait which indicates that the type needs to have a known size at compile time.

The second clause indicates that the associated Item of the Iterator needs to implement IntoIterator

Let's see how the Flatten is used actually used:

let nested: Vec<Vec<i32>> = vec![vec![1, 2, 3], vec![7, 9]];  
let simple = vec![1, 2, 3];  
  
let flatten = nested.iter().flatten();  
  
for &val in flatten {  
println!("{}", val);  
}  

//doesn't compile
let flatten_fail = simple.iter().flatten();

Here we have a nested vector - a Vec containing two Vecs of number, and a simple one containing numbers. We can get an iterator from the nested Vec and call .flatten() on it. Now, we can just iterate over the Flatten that we created.

If I try to call .flatten() with the simple vector, I'll get a compilation error - because this does not satisfy the type bounds for flatten. The item of the obtained Iterator needs to implement IntoIterator, where here, it's just a reference to a number.

If Flatten allows us to iterate over it, this means that Iterator is implemented for Flatten - and that's exactly the definition we saw at the beginning:

impl<I, U> Iterator for Flatten<I>
where
    I: Iterator<Item: IntoIterator<IntoIter = U, Item = U::Item>>,
    U: Iterator,
{
    type Item = U::Item;

Before you try to run this on your own, I have to warn you - this kind of syntax is unstable, so you won't be able to use it when writing your own code - it's not available on the Rust nightly toolchain either. Rust library designers can use unstable features as long as they are enclosed within other features that are marked as stable in the library. So.... don't try this at home! I'll show you how you'd write this on your own with stable syntax after I explain what the current syntax is doing.

This implementation is generic over I and U. I here marks the outer wrapping iterator. The bound specifies that the Item of this outer Iterator has to be IntoIterator (or in other words, has to be a collection that can be converted to an Iterator).

We mark the associated IntoIter type as being the generic type Uin the definition, and the Item is defined as the Item of U (which is IntoIter). As explained before, this relationship is already implicit - the Item of IntoIterator has to be the Item of the associated IntoIter type - this is specified (and restricted) in the IntoIterator definition, but we need to specify it here again because at this time the Rust compiler doesn't treat the bounds from the trait definition as also implicit for the implementation when using this kind of syntax.

There's a similar limitation for the stable syntax as well (though it applies somewhat differently) The language team has been looking at making the compiler recognize these implicit bounds so you don't have to repeat them, but some of the implementation details are not easy to resolve.

This is also the case for U: Iterator in the second line of the where clause - it's also specified in the trait definition. In other words, this clause would not be needed if the implied bounds feature were available. If we use the stable syntax for the implementation, we can get rid of the U type, but the Item definition gets more complicated:

impl<I> Iterator for Flatten<O>  
where 
	I: Iterator,  
	<I as Iterator>::Item: IntoIterator
{  
	type Item = <I::Item as IntoIterator>::Item;
	
	fn next(&mut self) -> Option<Self::Item> {  
		//....  
	}  
}

This represents that the Item of the Iterator (in this case the type of the inner collections) needs to implement IntoIterator (this is exactly as in the Flatten struct definition)

Here we're defining that the Item of the Flatten is the Item associated with the outer Iterator's Item. We use the "as" syntax here to make it clear to the compiler on which trait definition we're accessing the associated type Item. Right now, the compiler needs this type qualification because the associated name (Item in this case) might be defined on multiple traits that the generic type I can implement - as with this case I implements Iterator and IntoIterator, both with associated Items. In this particular situation, the Items happen to be identical, but in case of a conflict, the compiler wouldn't know which to pick.

Phew... that was quite a bit - hope that you're more comfortable reading trait bounds and that you've got a better grasp of how associated types and generic implementations are used.