Generators and Iterators in Julia and Python

Generators are used a lot in Python to implement iterators. Both Julia and Python implement list comprehensions with generators. Rather than writing say [x*x for x in 1:4], we can put expression inside the list comprehension inside a separate object:

julia> g = (x*x for x in 1:4)
Base.Generator{UnitRange{Int64},getfield(Main, Symbol("##27#28"))}(getfield(Main, Symbol("##27#28"))(), 1:4)

Which you can see results in a Generator object in Julia and a generator object in python:

python> g = (x*x for x in range(1,5))
python> g
<generator object <genexpr> at 0x10bdeef48>

While seemingly similar, they are quite different. Python generators are based on coroutines, while Julia generators are built on top of iterators.

Which brings us to to iterators. We need to understand their role and implementation in both languages.

Python Iterators

class Point:
def __init__(self, x, y, z):
self.x = x
self.y = y
self.z = z

This code will allow us to write:

python> p = Point(3, 8, 2)

However if we want to create an iterator for it, we need to add this method as well:

def __iter__(self):
return PointIterator(self)

That allows us to iterate over our point in an for-loop. And just for convenience will add this method as well. It gives a string representation of points in the REPL.

def __repr__(self):
return f"Point({self.x}, {self.y}, {self.z})"

The real meat of the functionality we are building is in the iterator class we return.

class PointIterator:
def __init__(self, point):
self.point = point
self.coords = ['z', 'y', 'x']

def __next__(self):
if self.coords:
return getattr(self.point, self.coords.pop())
else:
raise StopIteration

The way this works is that we keep a list of the coordinates x, y, and z which we want to access in turn as we iterate over the point.

__next__ is called upon each iteration, where we pop one coordinate from the list of coordinates to fetch. With getattr we can get the attributes of Python objects by name.

With this implementation we can play around with iterators:

python> p = Point(2, 3, 8)
python> for a in p:
print(a)

2
3
8

python> it = iter(p)
python> next(it)
2
python> next(it)
3
python> next(it)
8

Let us compare this with how iterators are made in Julia before getting into generators.

Julia Iterators

First let us make the Point type. There is no need to implement the repr function to show the type. It has an automatic representation.

struct Point
x::Int
y::Int
z::Int
end

It also has a default constructor, so we don’t have to implement that either. A for loop in Julia looks like this:

for i in iter   # or  "for i = iter"
# body
end

Julia translates this to the following code:

next = iterate(iter)
while next !== nothing
(i, state) = next
# body
next = iterate(iter, state)
end

So to iterate over the coordinates of a point, we need to implement two version of iterate. In Julia terminology these are methods, but that can be confusing to people with an object oriented background. Methods are all the concrete implementations of a function in Julia. So in Julia terminology, we are implementing two methods of the iterate function.

iterate(p::Point) = p.x, [:y, :z]

function iterate(p::Point, coords)
if isempty(coords)
nothing
else
getfield(p, coords[1]), coords[2:end]
end
end

So there are some similarities with __next__ in Python. Python talks about attributes of an object, while Julia talks of fields of an object. Thus getattr and getfield basically do the same thing. getattr refers to attributes by string names, while Julia uses symbols. Symbols are quite similar to strings. If you want a better understanding how why Julia, often uses Symbols instead of strings, there is a great answer here. "foobar" is a string while :foobar is a symbol.

Another important difference is that Julia does not mutate any data while iterating. Look at this example of using enumerate in Python:

python> it = enumerate(["one", "two"])
python> list(it)
[(0, 'one'), (1, 'two')]
python> list(it)
[]

You can see that the iterator gets spent. You cannot keep creating lists from the iterator. However in Julia you can, because the state of iteration is kept separate from the iterator itself:

julia> it = enumerate(["one", "two"])
Base.Iterators.Enumerate{Array{String,1}}(["one", "two"])

julia> collect(it)
2-element Array{Tuple{Int64,String},1}:
(1, "one")
(2, "two")

julia> collect(it)
2-element Array{Tuple{Int64,String},1}:
(1, "one")
(2, "two")

Using Python Generators to Implement Iterators

class Point:
def __init__(self, x, y, z):
self.x = x
self.y = y
self.z = z

def __repr__(self):
return f"({self.x}, {self.y}, {self.z})"

def __iter__(self):
def gen():
yield self.x
yield self.y
yield self.z
return gen()

You see with this approach we don’t have to implement a PointIterator class at all. yield is different from return in that when run in the context of a coroutine, which generators do, it will "remember" where it was in the function when it exited. A coroutine will re-enter the function where it left off. That is very practical when dealing with iteration. Let me give another example illustrating that.

def generator():
for i in range(0, 6):
yield chr(ord('A') + i)

Let me show you how to use it in the REPL to produce letters from A to F.

python> g = generator()
python> list(g)
['A', 'B', 'C', 'D', 'E', 'F']
python> g = generator()
python> next(g)
'A'
python> next(g)
'B'
python> next(g)
'C'

The neat thing here is that you don’t need write code maintaining info about where you are in the for loop. The generator keeps track of that for you.

Python Style Generator in Julia

function generator()
Channel() do channel
for i in 0:5
put!(channel, Char('A' + i))
end
end
end

This will not return a Generator object, like the comprehension, but a Channel object:

julia> g = generator()
Channel{Any}(sz_max:0,sz_curr:1)

We can however use this channel much like a regular iterator. E.g. we can collect the values and form an array, like we can with an iterator.

julia> collect(g)
6-element Array{Any,1}:
'A'
'B'
'C'
'D'
'E'
'F'

Or we can use the unpack syntax to create an array.

julia> g = generator()
julia> [g...]
6-element Array{Char,1}:
'A'
'B'
'C'
'D'
'E'
'F'

Unpacking the chars returned as an argument to string, for concatenate all the cars into a string.

julia> g = generator()
julia> string(g...)
"ABCDEF"

Getting into the Details about Tasks and Channels

Here what is going on is more spelled out:

function generator()
channel = Channel(2)
function make_letters()
for i in 0:5
put!(channel, Char('A' + i))
end
end
task = Task(make_letters)
schedule(task)
channel
end

You can think of coroutines a bit like threads. Like threads, you can have many of them, and you can communicate between them with different data structures. Channels are used to send data between coroutines. However unlike threads only one coroutine ever runs. You cannot have multiple coroutines running on different CPU cores. They work like cooperative multitasking. The scheduler switches to another coroutine, because the current one explicitly give up control.

So for instance when I write:

put!(channel, Char('A' + i))

A letter will be put into channel and the function return immediately if the channel has a buffer. We specified our channel as having a buffer of 2.

Once the buffer is exhausted our current coroutine will block, and the scheduler will pick a coroutine (task) waiting for data from this particular channel.

task = Task(make_letters)

This will create a task, which is how we keep track of different coroutines. Every coroutine needs a task. The task keeps track of the execution state.

However at this point, our coroutine is not actually running. The for-loop is not putting letters into the channel. To get it to run, we need to schedule the task.

schedule(task)

This puts the task in the scheduler queue. Whenever the system is idle, it will look in its scheduler queue for tasks to run. The immediate result of this is that our task gets run, which causes ‘A’ and ‘B’ to be put into the channel. But when the loop tries to put in ‘C’ it has run out of buffer and it blocks.

The scheduler will then look for another task to run. I believe we can regard the REPL itself as being a main coroutine which the scheduler can switch to. So say I saved the channel returned from generator, we can do this:

julia> channel = generator()
julia> take!(ch)
'A'

After that the REPL goes into idle waiting for your input. The scheduler then has a chance to pick up our old make_letters task and continue running it. One letter has been removed from the buffer, so it now has the chance to put in 'C' into the channel.

To avoid the boilerplate of setting up coroutine tasks and running them, there are a lot of shortcuts and macros to help us. Here is one alternative:

function generator()
channel = Channel(2)
@async begin
for i in 0:5
put!(channel, Char('A' + i))
end
end
channel
end

The @async macro takes care of creating a function, wrapping it in a Task and the scheduling that task. It will return the task object, but we don't need to store it for anything.

The Channel constructor can take either an integer giving its capacity or a function object. In the latter case it will wrap the provided function object in a Task and schedule it.

function generator()
function make_letters(channel::Channel)
for i in 0:5
put!(channel, Char('A' + i))
end
end
Channel(make_letters)
end

That will probably make it more obvious how the initial example worked, where we utilized the do syntax. Remember the do syntax is similar to:

foobar(x -> stuff(x), y, z)
foobar(y, z) do x
stuff(x)
end

This is kind of the stuff that goes under the hood in Python generators. You can actually do similar things in Python to deal with concurrency. I am going to explore that in a later story.

Written by

Geek dad, living in Oslo, Norway with passion for UX, Julia programming, science, teaching, reading and writing.

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store