Rust iterator adapters for handling ownership

Catalin May 01, 2023 #Iterators

Intro

If you've passed the beginner stages of learning Rust, you've probably come across the suggestion to replace your dusty old for-loops with iterators. This is sound advice for multiple reasons.

Firstly, iterator chains are more succinct and express the intent of the code more directly. They also leave less room for error, since there are no extra index or counter variables to manage. The cherry on top is that they have virtually no performance cost compared to a plain old for-loop.

For anyone familiar with functional operators from other languages, this shouldn't be too hard, but Rust's ownership rules can make it slightly more difficult than expected.

Let's discuss some gotchas using the following example:

We have a sequence of bytes, and we want to collect segments of it into various fields on a type. The idiomatic way to handle this would be to iterate, use the take adapter, and then collect into the owned structure we want. The first, trivial problem we'll encounter is that .iter() yields references (each call to next() returns one wrapped in a Some()), while we need owned values for the collection.

fn get_segment(input: Vec<u8>) {
    // does not compile
    let segment: Vec<u8> = input.iter()
        .take(4)
        .collect();
    // ........
}
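To see why the compiler rejects this, it helps to look at the item type. Here is a minimal sketch (the helper name `borrowed_segment` is made up for illustration): the same chain compiles once we ask for a collection of references, because `.iter()` yields `&u8` rather than `u8`.

```rust
/// Hypothetical helper: collects the first four items as references.
/// `input.iter()` yields `&u8`, so the target has to be `Vec<&u8>`.
fn borrowed_segment(input: &[u8]) -> Vec<&u8> {
    input.iter().take(4).collect()
}
```

This is rarely what we want for a field on an owned type, since the result borrows from `input`, which is why the adapters discussed next are useful.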

The 'copied' and 'cloned' iterator adapters

The naive way to handle this would be to call .map(|elem| *elem) to dereference each element. This works, but if you check with Clippy, you'll get a warning, as this is not the idiomatic way to handle this scenario.

Clippy recommends using the .copied() adapter, which makes the iterator yield copies of its elements instead of references.

fn get_segment(input: Vec<u8>) {
    let segment: Vec<u8> = input.iter()
        .copied()
        .take(4)
        .collect();
    // ........
}
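As a quick sanity check, the two forms can be put side by side. This is only a sketch; the function names are invented for the comparison, and the functions take a slice so the caller keeps ownership of the data.

```rust
/// Dereferences each element by hand; this is the form Clippy warns about.
fn segment_with_map(input: &[u8]) -> Vec<u8> {
    input.iter().map(|elem| *elem).take(4).collect()
}

/// Same result using the dedicated adapter.
fn segment_with_copied(input: &[u8]) -> Vec<u8> {
    input.iter().copied().take(4).collect()
}
```

Both functions return the first four bytes as owned values, or fewer if the input is shorter than four bytes.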

Now, the .copied() adapter works only for Copy types and has the same effect as the map() call discussed previously. What would happen with types that are Clone but not Copy instead?

Right off the bat, the .map(|elem| *elem) call wouldn't just produce a warning, but a compile error, because the dereference would move the value out of a borrow. The .map() call can be fixed by replacing the dereference with a call to .clone().

#[derive(Clone)]
struct Thing {
    id: u32,
    name: String,
}

fn get_thing_segment(input: Vec<Thing>) {
    let segment: Vec<Thing> = input.iter()
        .map(|thing| thing.clone())
        .take(4)
        .collect();
    // ........
}

The idiomatic way to do it is to use the .cloned() adapter on the iterator, which does the same thing as the map call above.

// --- struct code ---

fn get_thing_segment(input: Vec<Thing>) {
    let segment: Vec<Thing> = input.iter()
        .take(4)
        .cloned()
        .collect();
    // ........
}
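For reference, here is a self-contained variant of the same idea that returns the segment. It is a sketch under a few assumptions: the function name `thing_segment` is made up, it borrows a slice instead of taking the Vec by value, and the extra `Debug` and `PartialEq` derives are there only so the results can be compared.

```rust
#[derive(Clone, Debug, PartialEq)]
struct Thing {
    id: u32,
    name: String,
}

/// Clones the first four elements; the input is only borrowed,
/// so the caller can keep using it afterwards.
fn thing_segment(input: &[Thing]) -> Vec<Thing> {
    input.iter().take(4).cloned().collect()
}
```

Since the elements are cloned, the returned Vec owns its data and does not borrow from the input.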

The copied adapter will not compile if you use it on a type that is Clone but not Copy. On the other hand, using the cloned adapter on a Copy type compiles fine and at most earns a linter warning (Clippy's cloned_instead_of_copied lint, which is part of the opt-in pedantic group) - the clone will essentially just do a copy, but it's better to be explicit about it.

Note: .copied() and .cloned() can be placed after .iter() or after .take(), since both return types implement the Iterator trait. For this particular chain the order doesn't change the number of clones: iterators are lazy, so an element is only cloned when it is pulled through the chain, and take(4) pulls at most four elements. Placing .cloned() after take is still a reasonable habit, because with adapters that discard elements (filter or skip, for example) cloning first really does clone values that are then thrown away.
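One way to check how many clones each ordering performs is simply to count them. The sketch below uses a made-up `Counted` type whose Clone implementation bumps a shared counter; the type and the `count_clones` function are assumptions for the experiment, not part of the original example.

```rust
use std::cell::Cell;

/// Hypothetical element type that counts how often it is cloned.
struct Counted<'a> {
    clones: &'a Cell<usize>,
}

impl Clone for Counted<'_> {
    fn clone(&self) -> Self {
        // Record the clone in the shared counter.
        self.clones.set(self.clones.get() + 1);
        Counted { clones: self.clones }
    }
}

/// Returns the number of clones made by `.cloned().take(4)` and by
/// `.take(4).cloned()` over a vector of `len` elements.
fn count_clones(len: usize) -> (usize, usize) {
    let counter = Cell::new(0);
    let items: Vec<Counted<'_>> = (0..len).map(|_| Counted { clones: &counter }).collect();

    // Clone first, then take.
    let _early: Vec<Counted<'_>> = items.iter().cloned().take(4).collect();
    let cloned_first = counter.get();

    // Take first, then clone.
    counter.set(0);
    let _late: Vec<Counted<'_>> = items.iter().take(4).cloned().collect();
    let take_first = counter.get();

    (cloned_first, take_first)
}
```

With ten elements, both counts come out as four, since take stops pulling from the underlying iterator once it has produced four items.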

The 'by_ref' adapter

Getting past this, we might encounter a second issue - if we try to call .take() several times on the same iterator to collect multiple groups, the compiler won't let us. This is because take receives the iterator by value and consumes it. In other words, ownership is moved into the adapter, and the original iterator variable can no longer be used in the following calls.

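If we want to avoid consuming the iterator with the take call, we can store the iterator in a mutable variable and call the by_ref adapter on it before each .take(). by_ref hands out a mutable borrow of the iterator, and that borrow is itself an iterator, so take consumes only the borrow, like so: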

fn get_segments(input: Vec<u8>) {
    let mut byte_iter = input.iter().copied();

    let first_segment: Vec<u8> = byte_iter.by_ref()
        .take(4)
        .collect();

    let second_segment: Vec<u8> = byte_iter.by_ref()
        .take(4)
        .collect();
    // ........
}

In this manner, we can take multiple groups from the iterator without compiler errors. Keep in mind that the by_ref call for the second segment isn't strictly necessary unless we want to use the iterator again afterward.
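To tie this back to the original goal of filling fields on a type, here is a minimal sketch. The `Record` struct, its field names, and the `parse_record` function are all hypothetical, chosen only to illustrate the pattern.

```rust
/// Hypothetical record type; the field names are made up for illustration.
#[derive(Debug, PartialEq)]
struct Record {
    header: Vec<u8>,
    body: Vec<u8>,
    rest: Vec<u8>,
}

/// Splits `input` into two 4-byte segments plus whatever is left over.
fn parse_record(input: &[u8]) -> Record {
    let mut bytes = input.iter().copied();

    // `by_ref()` borrows the iterator mutably, so `take` consumes only
    // the borrow and `bytes` stays usable afterwards.
    let header: Vec<u8> = bytes.by_ref().take(4).collect();
    let body: Vec<u8> = bytes.by_ref().take(4).collect();

    // Last use of the iterator, so it can be consumed directly.
    let rest: Vec<u8> = bytes.collect();

    Record { header, body, rest }
}
```

If the input is shorter than eight bytes, the later fields simply end up shorter or empty; real parsing code would probably want to check the lengths and report an error instead.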
