Lazy Cartesian Products in Swift

Jul 30, 2017

Would you like to play a game?

Let’s play some battleship! Assuming a standard 10x10 board, we’ll need two collections:

let xs = 1...10
let ys = ["A", "B", "C", "D", "E", 
          "F", "G", "H", "I", "J"]

If we wanted to hit every square on the grid, we’d have to iterate over both lists in a nested loop. ¹

for x in xs {
  for y in ys {
    target(x, y)
  }
}

There’s nothing wrong with this as far as it goes. But it doesn’t do a great job of expressing what we’re actually trying to accomplish — that is, iterating over every ordered pair of ys and xs.

A more declarative way to go about our task might be to generate a new collection which is explicitly the cartesian product of the two sets, and iterate over that:

product(xs, ys).forEach { x, y in
  target(x, y)
}

Great! Only one minor problem: product doesn’t exist.

Force Majeure

Building something like product isn’t super hard. The trickiest part is getting all the generics right so it can be used with, say, a closed range just as well as with an array.

func product<X, Y>(_ xs: X, _ ys: Y) ->
  [(X.Element, Y.Element)]
  where X: Collection, Y: Collection {
    var orderedPairs: [(X.Element, Y.Element)] = []
    for x in xs {
      for y in ys {
        orderedPairs.append((x, y))
      }
    }
    return orderedPairs
 }

But when we calculate all the ordered pairs of two sets, the size of the resultant set is (as the name implies) the product of both sets. That’s no big deal when we’re talking a 10x10 grid. But if we’re dealing with 10k elements, we can benefit from a more lazy approach that stores the individual sets and generates their products on the fly, as needed.

Wall’s First Virtue

Let’s start by building an iterator:

public struct CartesianProductIterator<X, Y>: 
     IteratorProtocol where
     X: IteratorProtocol, 
     Y: Collection {

  public typealias Element = (X.Element, Y.Element)
 
  public mutating func next() -> Element? {
    //...
  }
}

Why is our generic type X an iterator but we force Y to conform to Collection? If you look at the nested loop in our naive example above, you’ll see that while we iterate over xs only once, we actually loop over ys a number of times (an xs.count number of times, to be precise).

IteratorProtocol allows us to iterate over a set exactly once, so it’s perfect for our X type. But only Collection guarantees us the ability to non-destructively traverse a sequence over and over. So Y must be a little more constrained.

Let’s add an initializer to store our iterator, collection, and related curiosities as properties:

private var xIt: X
private let yCol: Y
 
private var x: X.Element?
private var yIt: Y.Iterator

public init(xs: X, ys: Y) {
  xIt = xs
  yCol = ys
 
  x = xIt.next()
  yIt = yCol.makeIterator()
}

First note xIt is a var. Iterators mutate themselves in the course of serving up next(), so our copy of xs must be mutable.

Also, our ultimate goal here is to take values from xIt and for each of them iterate over the all the values of yCol. We prep for this by pulling the first value out of xIt into x and making an iterator for yCol called yIt.

And note x needs to be optional. We’ll ultimately iterate over xIt until we hit the end — and we’ll know we hit the end when x is nil.²

With all that settled, let’s move on to our implementation of next().

The NeXT Step

The first step of next() is simple; pull a value out of yIt each time it’s called and pair it with the same ol' x we set in the initializer (providing, of course, x isn’t nil)

public mutating func next() -> Element? {
  guard let someX = x else {
    return nil
  }
 
  guard let someY = yIt.next() else {
    return nil
  }
 	 
  return (someX, someY)
}

There. Now each call to next() returns x and whatever the next value of yIt is. But what do we do once we hit the end of yIt? We want to bump x to the next value of xIt, create a new yIt from our collection — and then do the whole thing over again.

Anytime we say to ourselves “…and then do the whole thing over again,” it’s a sign recursion is in our future.

The End is the Beginning is the End

There’s nothing magical about recursion. To do it, we just need to call a method from within the implementation of itself. We’ll do it here when we run out of values in yIt:

public mutating func next() -> Element? {
  guard let someX = x else {
    return nil
  }
 
  guard let someY = yIt.next() else {
    return next() //Recursion!
  }
 	 
  return (someX, someY)
}

But there are two things we need to pay attention to whenever we write recursive routines.

The first is making sure we don’t loop indefinitely. We’ll do that by setting conditions for terminating the recursion, and then making sure we move towards that condition with each iteration.

Our termination condition already exists. It’s that top guard. If x is ever nil, we’re out.

So all we have to do is make sure we’re moving towards a state where x is nil:

public mutating func next() -> Element? {
  guard let someX = x else {
    return nil
  }
 
  guard let someY = yIt.next() else {
	    yIt = yCol.makeIterator()
	    x = xIt.next()
	    return next()
	  }
 	 
  return (someX, someY)
}

There. Now every time we hit the end of yIt, we not only make a new one from our collection, but we also pull the next x from xIt. Eventually, xIt will run out of elements, and x will be nil. End of recursion.

Adventures in Meta

The second thing to look out for when writing recursively is blowing the stack. A stack overflow happens when you call too many functions in a row without returning.³

We can easily see how recursion might cause this to happen. Basically, when we call next() from inside next() we dig one level deeper in the stack. If calling next() in next() causes us to call next(), now we’re another level deep. Get too deep, and we error out with a busted stack.

Thankfully we can see the only time we call next() from inside next() is when yIt.next() returns nil. So the only way we can dig deeper in the stack is if yIt.next() returns nil many times in a row. And the only way that could happen is if yCol is empty.⁴

So we’ll short circuit that specific case with a guard:

public mutating func next() -> Element? {
  guard !yCol.isEmpty else {
    return nil
  }
 
  guard let someX = x else {
    return nil
  }
 
  guard let someY = yIt.next() else {
	    yIt = yCol.makeIterator()
	    x = xIt.next()
	    return next()
	  }
 	 
  return (someX, someY)
}

Free as in Function

And so, finally, we can lazily iterate over every ordered pair of our collections — all while only storing the collections themselves, not their product. But the syntax is a bit awkward:

let xs = 1...10
let ys = ["A", "B", "C", "D", "E", 
          "F", "G", "H", "I", "J"]
var it = CartesianProductIterator(
     xs: xs.makeIterator(), 
     ys: ys)
while let (x, y) = it.next() {
  target(x, y)
}

Let’s wrap this in a Sequence to get access to for...in, forEach, and the rest.

public struct CartesianProductSequence<X, Y>: 
     Sequence where 
     X: Sequence,
     Y: Collection {
  public typealias Iterator =
       CartesianProductIterator<X.Iterator, Y>
  
  private let xs: X
  private let ys: Y
  
  public init(xs: X, ys: Y) {
    self.xs = xs
    self.ys = ys
  }
  
  public func makeIterator() -> Iterator {
    return Iterator(xs: xs.makeIterator(), 
                    ys: ys)
  }
}

And as a finishing touch let’s add a a top-level function to make chaining little more readable:

public func product<X, Y>(_ xs: X, _ ys: Y) ->
     CartesianProductSequence<X, Y> where 
     X: Sequence, 
     Y: Collection {
  return CartesianProductSequence(xs: xs, ys: ys)
}

And just like that, our battleship dreams have become reality:

let xs = 1...10
let ys = ["A", "B", "C", "D", "E",
          "F", "G", "H", "I", "J"]

product(xs, ys).forEach { x, y in
  target(x, y)
}

As ever, here’s a gist of all this together in one place. Compliments of the house.

1: I know Battleship™ was invented before Descartes or something, and insists on calling out the lettered Y axis before the numbered X one. I tried a version of this post where the imaginary target(_:_:) function took its y parameter before the x. I was constitutionally incapable of publishing it.↩︎

2: Or it could be nil right now. If xIt is an iterator over an empty set, x will be nil by the end of initialization.↩︎

3: Ironically, I didn’t find the answer here particularly compelling.↩︎

4: Specifically: if yCol is empty, then every time we call next(), we’re going to immediately call return next() over and over again until we hit the end of xIt. If xIt is long, this could easily be enough to overflow the stack.↩︎