Python iterators are cool. And very useful. Often, you use them to perform operations on existing data without creating copies of those data.
An iterator is a Python object that represents a stream of data. This means it allows you to go from one item from a data source to the next. You’ll see a couple of short examples soon to get you started.
However, an iterator is dependent on the data from which you created it. So what happens if you create two or more iterators from the same data? The iterators are all dependant on the same data, so are they dependent on each other, too? This article will explore the topic of iterator independence. Sounds weird and abstract? Lots of things are at first. I’ll go through why this is relevant and interesting in this article.
What’s the context? I often like to dive randomly into documentation pages, find a function that I’ve not seen or used before, and explore it. Some hobby I have, you’re thinking. I came across
itertools.tee(), which had a curious name. Is this the golfing term tee or the letter ‘T’, or perhaps the tea you drink spelt incorrectly? Who knows? (It’s actually the second option, I later found out, but by now, I was sufficiently intrigued to explore further.)
This led me to
itertools.islice() and to the issue of when iterators are dependent of each other, or not. This article will look at these two
itertools functions, too.
Looking At Python Iterators
Start with a list of numbers:
numbers = [1, 3, 5, 7, 9]
A list is an iterable. This means you can iterate through it. Even simpler, it means you can use it in a
for loop as the object that comes after the
in keyword in the
for loop statement.
When you use a list in a
for loop, its iterator is created. Whereas the list contains all the data, the iterator made from it only represents the stream of data and doesn’t store all the items. The iterator created from a list relies on the data stored in the list.
Let’s see why this matters:
>>> numbers = [1, 3, 5, 7, 9] >>> numbers_iter = iter(numbers) # Create the iterator >>> next(numbers_iter) 1 >>> next(numbers_iter) 3
Each time you call
next() on the iterator, you get the next value from the data the iterator is based on. But, before you call
next() again to get the value that’s next in line, try removing a value from the original list:
# Same session as previous code snippet >>> numbers.remove(1) >>> numbers [3, 5, 7, 9] >>> next(numbers_iter) 7
You’ve removed the value
1 from the list which was the first item in the list. Note that the iterator had already gone through this value. You might expect that removing the first element from the list will not have any effect on the iterator which has already gone past that point.
You can see from the result of
next() that that’s not what happens. The call to
next() doesn’t return
7. When you removed the first element from the list, all the other elements fell down by one place. And when you call
next() on the iterator, the number
5 is skipped.
You can look at another example using
>>> first_names = ["Isaac", "René", "Christiaan", "Pierre", "Gottfried"] >>> last_names = ["Newton", "Descartes", "Huygens", "de Fermat", "Leibniz"] >>> full_names = zip(first_names, last_names) >>> next(full_names) ('Isaac', 'Newton') >>> next(full_names) ('René', 'Descartes') # Now, remove an item from first_names >>> first_names.remove("Isaac") >>> next(full_names) ('Pierre', 'Huygens')
zip(), you get an iterator which uses the data from the lists
last_names. The data are not duplicated, saving memory. The first two calls of
next(full_names) give you the result you expect. But you then remove the name
first_names and the next call to
next() creates a hybrid 17th century scientist called Pierre Huygens!
Let’s now fast forward to my serendipitous encounter with
So I read the docs for
itertools.tee(). The function “returns n independent iterators from a single iterable”. OK, seems simple enough, no? You’ve read about iterables and iterators above. The documentation goes on to show code that’s equivalent to what
tee() does. This is one of those instances where the Python docs weren’t enough for me to say “Ah, great, it’s all very clear now.”
So I googled a bit more and found lots of dry examples showing how
tee() works in a four-line-code-snippet-type example. They show what
itertools.tee() does. But they don’t shed any light on why you’d want to use it and when.
Luckily, it didn’t take long to find David Amos’s RealPython article. Finally, some sense. Do read this overview of
itertools through lots of great examples. But first, finish reading this article, of course!
The boring bit
So, I’m now contractually obliged to give you one of those dry examples that shows you what
itertools.tee() does, but nothing else. Don’t worry. Better examples are coming later on!
>>> import itertools >>> original_generator = (number for number in range(10)) >>> original_generator <generator object <genexpr> at 0x7fec3027a4a0> >>> first, second, third = itertools.tee(original_generator, 3) >>> first <itertools._tee object at 0x7fec3028a500> >>> second <itertools._tee object at 0x7fec3028a140> >>> third <itertools._tee object at 0x7fec3028acc0>
As the documentation said,
tee() returns independent iterators from the original iterable. All three will iterate through all the items in the original iterable. The iterators returned are
_tee objects. In this case, the original iterable is a generator.
The second argument in
tee() determines how many independent iterators the function returns. Let’s check they’re independent:
# Get the first two values from `first` >>> next(first) 0 >>> next(first) 1 # Now exhaust `second` fully >>> for item in second: ... print(item) 0 1 2 3 4 5 6 7 8 9 # And get a value from `third` >>> next(third) 0
Each of the three iterators
third go through values independently from each other. When you looped through
second, the code printed all numbers from
9 even though you had already used up
third was still untouched!
Note that the three iterators are independent of each other, but they’re not independent of the original generator:
# Recreate the original generator and the three independent iterators >>> original_generator = (number for number in range(10)) >>> first, second, third = itertools.tee(original_generator, 3) # Use up the first two values from the original generator >>> next(original_generator) 0 >>> next(original_generator) 1 # The iterators from tee() start from where you've just left off! >>> next(first) 2 >>> next(second) 2
You’ll return to
tee() later to see how and when it can be useful. You’ll also revisit the issue of when generators are and aren’t independent of each other.
Let’s dive into another function in
itertools. You can create a slice in an iterable by using
itertools.islice(). This returns an iterator. The concept is similar to slicing through sequences in the normal way with the difference that the result is an iterator:
>>> import itertools >>> original_generator = (number for number in range(10)) >>> a_slice = itertools.islice(original_generator, 4, 8) >>> a_slice <itertools.islice object at 0x7fec3026d720> >>> next(a_slice) 4 >>> next(a_slice) 5 >>> next(a_slice) 6 >>> next(a_slice) 7 >>> next(a_slice) Traceback (most recent call last): File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/code.py", line 90, in runcode exec(code, self.locals) File "<input>", line 1, in <module> StopIteration # But, be careful! >>> next(original_generator) 8
The iterator slice you created starts from the value at index
4 and goes up to, but excluding, the value at index
8. You’ve set these values using the second and third arguments in
You can see how you call
next(a_slice) four times successfully. These calls return
7. However, when you call
next(a_slice) again, you get a
StopIteration error as the
islice iterator is exhausted.
original_generator? So far, you haven’t explicitly used
original_generator except for creating the
islice. However, the result of
8. This means that
a_slice are not independent. When you advanced through
a_slice, you also advanced through
a_slice depends on
original_generator in a similar way to how, earlier in this article, the iterator you created from a list was dependant on the data in the list.
How to Shuffle a Generator in Python Without Converting Into A List
You’ve set yourself the task to shuffle the values in a generator without ever converting it into a list. This is a drill or an exercise as in real-world programs you’re likely to be better off converting into a list. But for the sake of this exercise, let’s try this challenge.
I will stick with the simple generator with numbers from
9 in this example. Of course, if you wanted a generator with random numbers from
9, you could create one directly. However, this is not the case for other generators you may have in your code. I’ll keep using this example as it’s easy to demonstrate what’s going on.
You can’t use functions such as
numpy.random.shuffle() on a generator:
>>> import random >>> original_generator = (number for number in range(10)) >>> random.shuffle(original_generator) Traceback (most recent call last): File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/code.py", line 90, in runcode exec(code, self.locals) File "<input>", line 1, in <module> File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/random.py", line 359, in shuffle for i in reversed(range(1, len(x))): TypeError: object of type 'generator' has no len()
A generator doesn’t have a
__len__ attribute. Therefore, these functions can’t work. This is true for iterators in general. Whereas an iterable such as a list has a length, and iterator doesn’t.
The solution in many cases is to convert the generator into a list, shuffle the list, and then convert it back into a generator if that’s what you’d like to have:
>>> import random >>> original_generator = (number for number in range(10)) >>> numbers = list(original_generator) >>> random.shuffle(numbers) >>> numbers [3, 7, 6, 5, 2, 0, 8, 9, 1, 4] >>> new_generator = (number for number in numbers)
Often, this is a perfectly good solution. But, for the sake of this exercise, you’ve set yourself the task to avoid converting the generator into a list.
itertools and the two functions you’ve explored earlier in this article can come to the rescue.
Planning and Writing the Code
The technique you’ll use here is the following:
- Create two independent iterators from the original generator
- Choose a random index and slice the two iterators using this index so that one has the first part of the original and the other has the second part
- Yield the value at the location of the split
- Merge the remaining parts back into a single iterator and repeat the process until you’ve used up all the values in the original generator
This method is inspired by David Amos’s example in the article I mentioned in the introduction.
You can start by creating the generator you’ve already used several times in this article and define a generator function using the
yield keyword. I’ll use a script for this example rather than the console sessions I used earlier.
# shuffle_generators.py n = 10 original_generator = (number for number in range(n)) def randomise_generator(original, length): while True: yield new_generator = randomise_generator(original_generator, n) for number in new_generator: print(number)
The generator function
None forever for the time being. You’ll fix this soon.
You’ve also written code to create a new generator from the generator function
randomise_generator() and test it by going through the new generator using a
If you run this code now, it will print
First attempt: Just using
Let’s try to use
itertools.islice() directly on the original generator first. Spoiler alert: this won’t work. But let’s see why:
# shuffle_generators.py import itertools import random n = 10 original_generator = (number for number in range(n)) def randomise_generator(original, length): while True: idx = random.randint(0, length - 1) first_part = itertools.islice(original, idx) second_part = itertools.islice(original, idx, None) yield next(second_part) original = itertools.chain(first_part, second_part) length -= 1 if length == 0: return new_generator = randomise_generator(original_generator, n) for number in new_generator: print(number)
You’re first choosing a random index where you’ll split your generator. Next, you use this index to create two iterator slices from the original generator. Note that when you use
islice() with two arguments, the second argument is the
stop parameter and the
start defaults to index
first_part is a slice from the beginning of the original generator up to, but excluding, the value with index
When you call
islice() with three arguments, the second and third are the
stop parameters. If the third is
None, the slice goes to the end.
Next, you yield the first value of
second_part. This is the value just after the point where you split the generator into two.
yield statement, you put the two remaining parts together again using
itertools.chain(). The plan is to merge the remaining parts of the original iterator minus the one value you’ve removed already.
You decrease the value of
1 to account for the element you’ve removed and yielded already and put in a condition to end the generator function when there are no more elements left.
You run this code, and you get this:
0 4 9 Traceback (most recent call last): File "<file_path>", line 15, in randomise_generator yield next(second_part) StopIteration The above exception was the direct cause of the following exception: Traceback (most recent call last): File "<file_path>", line 23, in <module> for number in new_generator: RuntimeError: generator raised StopIteration
Both the values and the number of outputs you’ll get before the error will be different each time you run this code. But you’ll always end up with the
Let’s investigate this problem by going back into the console. In this example, you’re splitting the generator at index
>>> import itertools >>> original_generator = (number for number in range(10)) >>> first_part = itertools.islice(original_generator, 6) >>> second_part = itertools.islice(original_generator, 6, None) >>> for number in first_part: ... print(number) 0 1 2 3 4 5 >>> for number in second_part: ... print(number) >>> # There's no output from the second `for` loop
You intend to create two iterator slices. One from
5 and the other from
9. The first
islice seems to be correct. When you iterate through
first_part, you get the expected numbers.
However, when you iterate through
second_part you get no output. The iterator
second_part is empty.
You can check if the same thing happens if you use
first_part. Remember you’ll need to recreate the original generator and the slices each time:
>>> original_generator = (number for number in range(10)) >>> first_part = itertools.islice(original_generator, 6) >>> second_part = itertools.islice(original_generator, 6, None) >>> for number in second_part: ... print(number) 6 7 8 9 >>> for number in first_part: ... print(number) >>> # Again, no output from the second loop
This time, it’s
first_part that’s empty. This is because the iterator slices are not independent of the original generator. When you exhaust an iterator slice, you’re also using up the original generator. You’ve seen this issue earlier in this article when you first read about
tee() to the rescue
This is where
itertools.tee() comes in useful. This function creates two independent iterators from an iterable. The independence is the important part here!
To be able to shuffle a generator in Python, you can update the code to include
# shuffle_generators.py import itertools import random n = 10 original_generator = (number for number in range(n)) def randomise_generator(original, length): while True: idx = random.randint(0, length - 1) first_iter, second_iter = itertools.tee(original, 2) first_part = itertools.islice(first_iter, idx) second_part = itertools.islice(second_iter, idx, None) yield next(second_part) original = itertools.chain(first_part, second_part) length -= 1 if length == 0: return new_generator = randomise_generator(original_generator, n) for number in new_generator: print(number)
First, you create
itertools.tee(). Both iterators go through all the elements of the original generator, but they’re independent of each other.
Next, you create iterator slices from
second_iter. You no longer have the problem you encountered in the previous section as these are independent iterators now.
You can verify this in the console:
>>> import itertools >>> original_generator = (number for number in range(10)) >>> first_iter, second_iter = itertools.tee(original_generator, 2) >>> first_part = itertools.islice(first_iter, 6) >>> second_part = itertools.islice(second_iter, 6, None) >>> for number in first_part: ... print(number) 0 1 2 3 4 5 >>> for number in second_part: ... print(number) 6 7 8 9
In this example,
first_part goes from
second_part goes from
9. Independence problem solved!
You can run the
shuffle_generators.py script now. You’ll verify that
new_generator is a generator which has all the values in
original_generator, but they’ve been shuffled:
5 8 6 7 1 0 2 3 9 4
This way of shuffling a generator is not very efficient, so if you ever need to shuffle a generator, you’re better off converting to a list first!
In this article, you’ve explored a bit how Python iterators work and how they are dependent on the original data they’re created from. You’ve also looked at two functions which create iterators from the
When using iterators, keep in mind that the iterators depend on the data you’ve created them from. This information could save you hours looking for hard-to-find bugs in some cases when you’re using iterators!
You may also like the article on stacks, queues, and deques
The Python Coding Stack
Regular articles for the intermediate Python programmer or a beginner who wants to “read ahead”
3 thoughts on “Beware Python Iterators That Are Not Independent”
Thank you for your article on shuffling a generator.
Unfortunately, this is a very bad implemenation that used far more memory than when just converting the generator into a list. That’s because you are actually building many iterators. The easiest way to verify this is by setting n to a reasonable but not extreme value, like 100000. You will either see Puthon crash or just stall.
Performance wise it is also much worse than copying the generator into a list.
Finally, the fact that you have to know the length in advance makes this code rather useless. How would you be able to know, if you read from a file, for instance.
I think you should remove this article altogether as it is misleading and might lead to people think they should use tee. In fact, tee is nearly always a code smell!
I am open for any discussion about this subject.
This is a good point. What I find most interesting is the dependence or independence of iterators. I may well update the article to focus more on this aspect…
The article has changed from when originally published. It now focusses more on the nature of iterators and their dependence on the data they’re created from.