Python: Best Practices by John-Mark Gurney

Strings

Strings are one of the places that python programs have the most problems. this isn't because the programmer isn't doing a good job, it's simply because of the way Python handles strings.

One common code that is very wrong is:

str = 'header\n'
for i in msg.readlines():
	str = str + i

print str

This looks perfectly fine, but because Python's strings are immutable, this means that you end up copying around a lot more data than necessary. It will copy about line1 * numlines + line2 * (numlines-1) + ... + linen-1 * 2 + linen * 1 bytes of data. The better way of doing this code is:

str = ['header\n']
for i in msg.readlines():
	str.append(i)

print ''.join(str)

This significantly improves the performance since you end up only copying the string once. You already have the string for each line, just keep a reference to it, and once you're ready, combine them all in one with join.

I have written some sample code that shows you how much the performance difference is.