Refactoring a Python module

02 mars 2009

Inspired by the P.I.T.S. presentation at the recent Software Craftsmanship 2009 conference in London, I decided to perform a refactoring of some open source code.

I browsed around for awhile and found the Pychess project, and inside that just took a file at random. It happened to be protoload.py:

import urllib, os

def splitUri (uri):
    uri = urllib.url2pathname(uri) # escape special chars
    uri = uri.strip('\r\n\x00') # remove \r\n and NULL
    return uri.split("://")

def protoopen (uri):
    """ Function for opening many things """

    try:
        return urllib.urlopen(uri)
    except (IOError, OSError):
        pass

    try:
        return open(uri)
    except (IOError, OSError):
        pass

    raise IOError, "Protocol isn't supported by pychess"

def protosave (uri, append=False):
    """ Function for saving many things """

    splitted = splitUri(uri)

    if splitted[0] == "file":
        if append:
            return file(splitted[1], "a")
        return file(splitted[1], "w")
    elif len(splitted) == 1:
        if append:
            return file(splitted[0], "a")
        return file(splitted[0], "w")

    raise IOError, "PyChess doesn't support writing to protocol"

def isWriteable (uri):
    """ true if protoopen can open uri to in write mode """

    splitted = splitUri(uri)

    if splitted[0] == "file":
        return os.access (splitted[1], os.W_OK)
    elif len(splitted) == 1:
        return os.access (splitted[0], os.W_OK)

    return False

It is small enough for a blog, yet contains something to work with.

The first thing I do is look at the method- and class names:

  • splitUrl
  • protoopen
  • protosave
  • isWriteable

By the way – I know methods are called functions in Python – I’m just a little ”C# wounded” so you will see me using both words 🙂

Since the file is called ”protoopen” I guess the protopen is the most important function, so I’ll check that out first.

The comment clearly spells out ”Function for opening many things”, and, looking through the code, it seems it is indeed returning pipe objects either from the urllib or the basic open() method of python (file system open). Here I see room for improvement: there is duplication in this method. The two try-catches are very similar, only the specific method used to open is different. So I’ll try out a loop over those two methods, and see if it is more readable.

    methods = [urllib.urlopen, open]
    for method in methods:
    	try:
    		return method(uri)
	    except (IOError, OSError):
            pass

It is quite easy to follow, in my mind easier than previously, so I’ll keep it. But, I’m not quite fond with the variable names ‘method’ and ‘methods’, a more descriptive naming would be more readable. When I’m looking for good names for things, I try to think of how the names are used. ‘method’ is used to open something, with one specific technique. ‘openers’ and ‘openup’ comes to mind and I try it out:

def protoopen (uri):
    """ Function for opening many things """

    openers = [urllib.urlopen, open]
    for openup in openers:
    	try:
    		return openup(uri)
        except (IOError, OSError):
            pass

    raise IOError, "Protocol isn't supported by pychess"

I compare this with what we had to begin with to see if it is easier to read:

def protoopen (uri):
    """ Function for opening many things """

    try:
        return urllib.urlopen(uri)
    except (IOError, OSError):
        pass

    try:
        return open(uri)
    except (IOError, OSError):
        pass

    raise IOError, "Protocol isn't supported by pychess"

I think it is easier now than what it was to begin with. A minor improvement to readibility is that in Python, you really don’t have to specify excactly which exception you want to catch, so I remove that:

	openers = [urllib.urlopen, open]
	for openup in openers:
		try:
			return openup(uri)
		except:
			pass

I try removing the openers variable altogether – since it is a concept that ‘burdens’ the method (the less concepts, the easier to grasp):

def protoopen (uri):
	""" Function for opening many things """ 

	for openup in [urllib.urlopen, open]:
		try:
			return openup(uri)
		except:
			pass 

	raise IOError, "Protocol isn't supported by pychess"

I think this is slightly better than before, and keep it. Now in Python it is etiquette to put statements following try: and except: on separate lines, but since there is only one statement in each I try out campacting the loop:

def protoopen (uri):
	""" Function for opening many things """ 

	for openup in [urllib.urlopen, open]:
		try:    return openup(uri)
		except: pass 

	raise IOError, "Protocol isn't supported by pychess"

I think it is easier to read, and our function is getting really slim, and doing just as much as it did to begin with.

One slight improvement is left, and that is the comment. ‘Many things’ is vague and we can improve that:

def protoopen (uri):
	""" Function for opening URLs and files """ 

	for openup in [urllib.urlopen, open]:
		try: 	return openup(uri)
		except:	pass 

	raise IOError, 'Protocol isn't supported by pychess'

I’m quite happy with protoopen now, and continue looking for ways to refactor this file.

The first method in protoload.py is splitUrl. Actually it seems it does more than split a URL: it also cleans it up. But I cannot see a simple way to simplify this right now, so I turn my eyes on the protosave method.

def protosave (uri, append=False):
	""" Function for saving many things """ 

	splitted = splitUri(uri) 

	if splitted[0] == "file":
		if append:
			return file(splitted[1], "a")
		return file(splitted[1], "w")
	elif len(splitted) == 1:
		if append:
			return file(splitted[0], "a")
		return file(splitted[0], "w") 

	raise IOError, 'PyChess does not support writing to protocol'

The first thing I notice is the duplication of the inner if’s: there is the append check, and the two return file()-calls, with ”a” och ”w” used. Only thing that differs is the number in the [], it is 1 in the top if, and 0 in the elif. So I refactor this a little, using an index variable I call ix:

	ix = -1
	if splitted[0] == "file":
		ix = 0
	elif len(splitted) == 1:
		ix = 1 

	if ix == -1:
		raise IOError, "PyChess doesn't support writing to protocol" 

	if append:
		return file(splitted[ix], "a")
	return file(splitted[ix], "w")

Arguably we have removed some duplication. But is it really more readable? I don’t know. I’ll give it a little more love by observing that the elif really could be an ordinary if:

	ix = -1
	if splitted[0] == "file":
		ix = 0
	if len(splitted) == 1:
		ix = 1
	if ix == -1:
		raise IOError, "PyChess doesn't support writing to protocol"
	if append:
 		return file(splitted[ix], "a")
	return file(splitted[ix], "w")

Now it just looks like a long list of if’s. It is not very intention revealing and it is harder to read IMO.

I think I’ve put myself in a dead end, so I go back to the original and try once more.

def protosave (uri, append=False):
	""" Function for saving many things """ 

	splitted = splitUri(uri) 

	if splitted[0] == "file":
		if append:
			return file(splitted[1], "a")
		return file(splitted[1], "w")
	elif len(splitted) == 1:
		if append:
			return file(splitted[0], "a")
		return file(splitted[0], "w") 

	raise IOError, 'PyChess does not support writing to protocol'

One thing I really like with Python is the ability to define functions really close to where they are used. Here we may define a funtion to do the inner work of the if’s, instead of breaking it out as in my previous attempt:

def protosave (uri, append=False):
	def dofile(path, append):
		if append:
			return file(path, "a")
		return file(path, "w") 

	splitted = splitUri(uri) 

	if splitted[0] == "file":
		return dofile(splitted[1], append)
	if len(splitted) == 1:
		return dofile(splitted[0], append) 

	raise IOError, 'PyChess does not support writing to protocol'

This already feels a lot better. Now there is a slight duplication left inside the dofile method: the file(path start of both return lines. If we decide wich letter to use beforehand, maybe we can simplify even more:

def protosave (uri, append=False):
	def dofile(path, mode):
		return file(path, mode) 

	mode = "w"
	if append: mode = "a" 

	splitted = splitUri(uri) 

	if splitted[0] == "file":
		return dofile(splitted[1], mode)
	if len(splitted) == 1:
		return dofile(splitted[0], mode) 

	raise IOError, 'PyChess does not support writing to protocol'

And what do you know — the dofile method just delegates it’s arguments to the file method, effectively deprecating itself!

def protosave (uri, append=False):
	""" Function for saving to file or URL """ 

	mode = "w"
	if append: mode = "a" 

	splitted = splitUri(uri) 

	if splitted[0] == "file":
		return file(splitted[1], mode)
	if len(splitted) == 1:
		return file(splitted[0], mode) 

	raise IOError, 'PyChess does not support writing to protocol'

Let’s compare this with the original protosave function:

def protosave (uri, append=False):
	""" Function for saving many things """ 

	splitted = splitUri(uri) 

	if splitted[0] == "file":
		if append:
			return file(splitted[1], "a")
		return file(splitted[1], "w")
	elif len(splitted) == 1:
		if append:
			return file(splitted[0], "a")
		return file(splitted[0], "w") 

	raise IOError, "PyChess doesn't support writing to protocol"

The new version looks a little less messy, since there are less nested if’s. I’ll stop there, and turn my attention to the isWriteable function.

The first thing I notice about isWriteable is that contains some similarity to protosave. They both contain the logic of the splitted uri. How can we remove that duplication?

One lesson I’ve learned is that facing these kind of things, it is often easiest to just start with the small things. The larger things will just solve themselves by themselves then. Let’s see if that idea holds in this case!

I begin with creating two new functions for the tests in the if’s. I’ll just call them caseOne- Two for now since I cannot come up with more descriptive names.

def caseOne(splitted):
	return splitted[0] == "file" 

def caseTwo(splitted):
	return len(splitted)==1 

def protosave (uri, append=False):
	""" Function for saving to file or URL """ 

	mode = "w"
	if append: mode = "a" 

	splitted = splitUri(uri) 

	if caseOne(splitted):
		return file(splitted[1], mode)
	if caseTwo(splitted):
		return file(splitted[0], mode) 

	raise IOError, 'PyChess does not support writing to protocol' 

def isWriteable (uri):
	""" Returns true if protoopen can open a write pipe to the uri """ 

	splitted = splitUri(uri) 

	if caseOne(splitted):
		return os.access (splitted[1], os.W_OK)
	if caseTwo(splitted):
		return os.access (splitted[0], os.W_OK) 

	return False

It already seems a little less duplicated. Now we’d like to break out the split,if,if structure into a separate function, so we don’t have to repeat ourselves in isWriteable, with the same logic as protosave. I’ll call it uriLogic, and to begin with it is just imaginary, that is I’m not so sure it will work at all.

def uriLogic(uri, ret1, ret2, second):
	splitted = splitUri(uri)
	if caseOne(splitted):
		return ret1(splitted[1], second)
	if caseTwo(splitted):
		return ret2(splitted[0], second)
	return None

Let’s see how it would be used in protosave:

def protosave (uri, append=False):
	""" Function for saving to file or URL """ 

	mode = "w"
	if append: mode = "a" 

	result = uriLogic(uri, file, mode) 

	if result == None:
		raise IOError, 'PyChess does not support writing to protocol'

It’s extremely short at least. Readable? Could be better. Let’s see how isWriteable treates uriLogic:

def isWriteable(uri):
	""" Returns true if protoopen can open a write pipe to the uri """ 

	result = uriLogic(uri, os.access, os.W_OK) 

	if result == None:
		return False
	else:
		return result

So we’ve really simplified isWriteable and protosave, but we’ve also added three more functions.

Since caseOne- and Two are only used inside of the uriLogic function, I’ll inline them again:

def uriLogic(uri, ret1, ret2, second):
	splitted = splitUri(uri)
	if splitted[0] == "file":
		return ret1(splitted[1], second)
	if len(splitted) == 1:
		return ret2(splitted[0], second)
	return None

.. and we’ve removed two of the three functions. I think it’s time to look at the whole file as it looks now, and compare with the original:

def splitUri (uri):
	uri = urllib.url2pathname(uri) # escape special chars
	uri = uri.strip('\r\n\x00') # remove \r\n and NULL
	return uri.split("://") 

def protoopen (uri):
	""" Function for opening URLs and files """ 

	for openup in [urllib.urlopen, open]:
		try: 	return openup(uri)
		except:	pass 

	raise IOError, "Protocol isn't supported by pychess" 

def uriLogic(uri, ret1, ret2, second):
	splitted = splitUri(uri)
	if splitted[0] == "file":
		return ret1(splitted[1], second)
	if len(splitted) == 1:
		return ret2(splitted[0], second)
	return None 

def protosave (uri, append=False):
	""" Function for saving to file or URL """ 

	mode = "w"
	if append: mode = "a" 

	result = uriLogic(uri, file, mode) 

	if result == None:
		raise IOError, 'PyChess does not support writing to protocol' 

def isWriteable(uri):
	""" Returns true if protoopen can open a write pipe to the uri """ 

	result = uriLogic(uri, os.access, os.W_OK) 

	if result == None:
		return False
	else:
		return result

There is a lot less duplication in the code. But there is one tiny little trick we could do to bring down the noise level even more: instead of returning None in the uriLogic function, we could return False:

def splitUri (uri):
	uri = urllib.url2pathname(uri) # escape special chars
	uri = uri.strip('\r\n\x00') # remove \r\n and NULL
	return uri.split("://") 

def protoopen (uri):
	""" Function for opening URLs and files """ 

	for openup in [urllib.urlopen, open]:
		try: 	return openup(uri)
		except:	pass 

	raise IOError, "Protocol isn't supported by pychess" def uriLogic(uri, ret1, ret2, second):

def uriLogic(uri, ret1, ret2, second):
	splitted = splitUri(uri)
	if splitted[0] == "file":
		return ret1(splitted[1], second)
	if len(splitted) == 1:
		return ret2(splitted[0], second)
	return None 

def protosave (uri, append=False):
	""" Function for saving to file or URL """ 

	mode = "w"
	if append: mode = "a" 

	result = uriLogic(uri, file, mode)
	if result == False:
		raise IOError, 'PyChess does not support writing to protocol'
	else:
		return result 

def isWriteable(uri):
	""" Returns true if protoopen can open a write pipe to the uri """
	return uriLogic(uri, os.access, os.W_OK)

That looks a bit slicker! Suddenly isWriteable is a single line method!

Taggar: , , , ,

Annons

%d bloggare gillar detta: