Simeon Franklin

Blog :: Ask an Expert - Interlacing Strings

2 November 2012

Apparently I answered a question for a student looking for the answer to a question on the mid-term in an MIT class... at a commenter's suggestion I took the answer down for a few days. Now that we're safely past the midterm I'm putting my thoughts back up... The original post follows:

I get random questions from the internet… I hate to be helping people with their homework or interview questions - but sometimes the questions are fun! A commenter writes:

I have to write a code in python to concatenate two strings together, by successively alternating elements of each string (starting with the first character of s1). If one string is longer than the other, then the remaining elements of the longer string should simply be added at the end of the new string. For example, if we lace abcd and efghi, we would get the new string:'aebfcgdhi'.

Ok! Let’s tackle this problem a couple of different ways. One way I might try to solve the problem is to break it into multiple simpler problems. We could solve the interlacing problem by looping over the indexes of one string and fetching the the values of both strings - but we have to account for strings of unequal length which makes the looping code a little harder to write. Why not just force both strings to be the same length? The interlacing code would be easy to write and we could just save the remainder of the longer string to append to the result.

We can do this easily with Python’s slicing support. Container types like lists and strings that support indexed access also support slicing: instead of putting a single index number between square braces we can put up to three numbers separated by colons. The first number is the start index of the slice, the second number is the (non-inclusive) stop index of the slice and the third number is the step - how many indexes we should jump each step to take our slice. The start defaults to 0, the stop defaults to the length of the list, and the step defaults to 1.

>>> x = 'abcdef'
>>> x[:4] # indexes 0, 1, 2, and 3
'abcd'
>>> x[1:] # indexes 1 through the end
'bcdef'
>>> x[:] # the whole thing - this is a common shorthand for copying an indexable
'abcdef'

We’ll need one other Python syntactical trick called tuple unpacking to assign to multiple variables at the same time. In Python if the left hand side and the right hand side of an assignment are both tuples Python can put the content of the right hand tuple into the individual variables of the left hand tuple. For example the old interview chestnut about switching the contents of two variables without an additional temporary variable can be done with tuple unpacking:

>>> x = 1
>>> y = 2
>>> (y, x) = (x, y) # the label y gets the value of x goes into
>>> x
2

With our knowledge of slicing and tuple unpacking in hand the code to get the base part and the remainder of a string should make sense:

>>> s1 = 'abcd'
>>> s2 = 'efghi'
>>> if len(s1) > len(s2):
...     (s1, remainder) = (s1[0:len(s2)], s1[len(s2):])
... elif len(s1) < len(s2):
...     (s2, remainder) = (s2[0:len(s1)], s2[len(s1):])
... else:
...     remainder = ""

Now that we have two strings with the same length plus an optional remainder we could try using an index to loop over the two strings and interleave the characters.

>>> # s1 and s2 now start out as two equal length strings
>>> chars = []
>>> for i in range(len(s1)):
...     chars.append(s1[i])
...     chars.append(s2[i])
>>> result = "".join(chars) + remainder
>>> result
'aebfcgdhi'

That works - but we might be able to write simpler code by exploring the standard library a little bit. You might be familiar with the builtin function zip which does something really similar to our problem. The docstring for zip always strikes me as unhelpful - but essentially we pass iterables representing rows to zip and get back a list of columns. If we pass two variables we’ll get back a list of two-tuples with one item from the first variable and one item from the matching index in the second variable.

>>> zip(s1, s2)
[('a', 'e'), ('b', 'f'), ('c', 'g'), ('d', 'h')]

That’s really close to what we want but zip throws away any extra values if one of the "rows" is longer than the other so we still have to break off a remainder if one string is longer than the other. Peek into the itertools module, however, and we can see the izip_longest function which is an iterator version of zip that allows us to pass a keyword argument to fill in missing values in the shorter row. I’m going to ignore the added complication of getting back an iterator for now by just converting the result to a list immediately.

>>> import itertools
>>> s1 = 'abcd'
>>> s2 = 'efghi'
>>> list(itertools.izip_longest(s1, s2, fillvalue=""))
[('a', 'e'), ('b', 'f'), ('c', 'g'), ('d', 'h'), ('', 'i')]

Be sure to poke around in the itertools module - it has all sorts of cool tools for tasks like this. Given a list of two-tuples we can combine them with a for-loop or use a list comprehension.

>>> chars = []
>>> for t in itertools.izip_longest(s1, s2, fillvalue=""):
...     chars.extend(t) # t is a two part tuple and extend adds each item in the tuple individually
>>> "".join(chars)
'aebfcgdhi'
>>> "".join([t[0] + t[1] for t in itertools.izip_longest(s1, s2, fillvalue="")])
'aebfcgdhi'

Using izip_longest and a list comprehension definitely is the shortest and most succinct version of the code and eliminates the necessity for breaking off the remainder of the longer string. It’s a pretty dense line of code, however, and if you are still learning the basics of Python syntax you can stick with the more imperative loop based code. Python is awesome because it’s a multi-paradigm language. If you want to write sophisticated and powerful functional constructs in Python you can! But Python also allows us to write conceptually simple imperative code that a beginner to Python or beginner to coding can readily understand.

Related tags: python


blog comments powered by Disqus