On Errors in Repeated Functions
Recently I found myself parsing several similar XML files in Python. The XML
had a deeply nested structure I wanted to get stuff out of, which meant code like this:
```python
for entry in root.find('this').find('this2').findall('entry'):
    first_thing = entry.text
    for next_entry in entry.find('this3').findall('next_entry'):
        thing_i_want = next_entry.find('thing_i_want').text
        other_thing = next_entry.find('junk').find('other_thing').text
```
With this bit of code, I want to grab the text of several nested elements in an XML tree. It's readable, succinct, and does no error handling. Each of those `find` calls can return `None` if its element isn't found, and the next `.find()` or `.text` on that `None` raises an `AttributeError`. Because I want to parse multiple files, I can't have that. Fortunately, for the most part, I want to handle the errors in only two ways: assign `None` to the specific thing, or quit the whole parse and try again with the next one (whether that's at the file level or at the XML branch level). The remaining problem is how to repeat this error handling without it getting awkward.
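To make the failure concrete, here's a minimal sketch. It assumes the standard library's `xml.etree.ElementTree` (the post doesn't name its XML library, though the `find`/`findall`/`.text` calls match ElementTree's API), and the one-line sample document is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical input where <this2> is missing entirely.
root = ET.fromstring("<root><this></this></root>")

this2 = root.find('this').find('this2')
print(this2)  # None: find() reports "not found" by returning None

try:
    # The failure only surfaces on the *next* call, which is made on None.
    this2.findall('entry')
    failed = False
except AttributeError as err:
    failed = True
    print("parse failed:", err)
```

The missing element itself doesn't raise anything; it's the chained call after it that blows up the whole parse.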
One way to handle the errors is with a `try`/`except` block:
```python
for entry in root.find('this').find('this2').findall('entry'):
    try:
        first_thing = entry.text
        for next_entry in entry.find('this3').findall('next_entry'):
            thing_i_want = next_entry.find('thing_i_want').text
            other_thing = next_entry.find('junk').find('other_thing').text
    except AttributeError:
        first_thing = None
        thing_i_want = None
        other_thing = None
        # or `continue` if the rest of the info isn't worth processing on error
```
The problem with this construct here is that it's not granular enough. I'd like to get the `first_thing` if possible and assign `None` to the others only if I can't get them; as written, any single missing element wipes out all three. I could use multiple `try`/`except` blocks, but that quickly becomes very unreadable.
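For comparison, one way to keep per-field `try`/`except` granularity without stacking blocks is to move the `try`/`except` into a tiny helper and pass each lookup in as a lambda. This is a sketch under assumptions: the helper name `attempt` and the sample XML are hypothetical, and catching `AttributeError` this broadly can also hide genuine typos in attribute names.

```python
import xml.etree.ElementTree as ET

def attempt(getter, default=None):
    """Hypothetical helper: run one lookup, returning `default` if any step
    along the way hit None (which surfaces as an AttributeError)."""
    try:
        return getter()
    except AttributeError:
        return default

# Hypothetical sample: this <next_entry> has no <junk> child.
next_entry = ET.fromstring(
    "<next_entry><thing_i_want>a</thing_i_want></next_entry>"
)

# Each field gets its own try/except, so one missing element
# no longer blanks out the others.
thing_i_want = attempt(lambda: next_entry.find('thing_i_want').text)
other_thing = attempt(lambda: next_entry.find('junk').find('other_thing').text)

print(thing_i_want)  # a
print(other_thing)   # None
```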
Another way of doing it is to check each result for `None` before calling `find` on it:
```python
this = root.find('this')
if this is not None:
    this2 = this.find('this2')
    if this2 is not None:
        # findall() returns an empty list (never None) when nothing matches,
        # so the loop simply doesn't run and no check is needed here
        for entry in this2.findall('entry'):
            ...
```
The problem with this is obvious. Look at that nesting! I'd be halfway across the screen before I got anything done! However, this pattern can be abstracted into a function.
```python
def find_or_none(node, taglist):
    # Walk down one tag at a time; bail out with None at the first missing element.
    for tag in taglist:
        node = node.find(tag)
        if node is None:
            return None
    return node
```
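Here's a quick sketch of the helper in use, again assuming `xml.etree.ElementTree` and a made-up sample document. For comparison, it also shows that ElementTree's own `find()` accepts a slash-separated path such as `'this/this2'`, which returns `None` if any step is missing, and that `findtext()` returns a default instead of raising when nothing matches. Both are standard ElementTree calls; whether they fit a particular parse is a judgment call.

```python
import xml.etree.ElementTree as ET

def find_or_none(node, taglist):
    # Same helper as above: walk down one tag at a time, None on the first miss.
    for tag in taglist:
        node = node.find(tag)
        if node is None:
            return None
    return node

# Hypothetical sample document.
root = ET.fromstring("<root><this><this2><entry>x</entry></this2></this></root>")

found = find_or_none(root, ['this', 'this2'])
missing = find_or_none(root, ['this', 'nope', 'this2'])
print(found.tag)  # this2
print(missing)    # None

# ElementTree's find() understands simple paths, giving the same None-on-miss result.
path_found = root.find('this/this2')
path_missing = root.find('this/nope/this2')

# findtext() returns the first matching element's text, or `default` if nothing
# matched (an element that exists but has no text yields '' rather than None).
entry_text = root.findtext('this/this2/entry')
no_text = root.findtext('this/nope/entry', default=None)
print(entry_text, no_text)  # x None
```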
The same helper could also be implemented using exceptions. With the error handling punted into the function, the code becomes:
```python
this2 = find_or_none(root, ['this', 'this2'])
if this2 is not None:
    for entry in this2.findall('entry'):
        first_thing = entry.text
        next_entries = entry.find('this3')
        if next_entries is not None:
            for ...
```
This works, but there's one way I think it can be improved. Instead of checking `is not None` and nesting everything under that check, check `is None` and `return` (or `continue`, inside a loop) the heck away from the error.
With this change, I get fairly clean error handling:
```python
def parse(root):  # hypothetical name; the early `return` needs an enclosing function
    this2 = find_or_none(root, ['this', 'this2'])
    if this2 is None:
        return None
    for entry in this2.findall('entry'):
        first_thing = entry.text
        this3 = entry.find('this3')
        if this3 is None:
            continue  # or `break` or `return` if that's appropriate
        for next_entry in this3.findall('next_entry'):
            thing_i_want = next_entry.find('thing_i_want')
            if thing_i_want is not None:
                thing_i_want = thing_i_want.text
            other_thing = find_or_none(next_entry, ['junk', 'other_thing'])
            if other_thing is not None:
                other_thing = other_thing.text
```
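For a version that runs end to end, here's a sketch of the same guard-clause structure that collects its results instead of discarding them. The function names `parse_entries` and `text_or_none`, the tuple output, the empty-list return for an unusable file, and the sample XML are all assumptions for illustration.

```python
import xml.etree.ElementTree as ET

def find_or_none(node, taglist):
    # Walk down one tag at a time; None on the first missing element.
    for tag in taglist:
        node = node.find(tag)
        if node is None:
            return None
    return node

def text_or_none(node):
    # Hypothetical helper: the element's text, or None if the element is missing.
    return None if node is None else node.text

def parse_entries(root):
    """Collect (first_thing, thing_i_want, other_thing) tuples using guard clauses."""
    this2 = find_or_none(root, ['this', 'this2'])
    if this2 is None:
        return []  # nothing usable in this file; the caller moves on
    results = []
    for entry in this2.findall('entry'):
        first_thing = entry.text
        this3 = entry.find('this3')
        if this3 is None:
            continue  # skip only this branch
        for next_entry in this3.findall('next_entry'):
            thing_i_want = text_or_none(next_entry.find('thing_i_want'))
            other_thing = text_or_none(find_or_none(next_entry, ['junk', 'other_thing']))
            results.append((first_thing, thing_i_want, other_thing))
    return results

# Hypothetical sample: the second <next_entry> lacks <junk>,
# and the second <entry> lacks <this3>.
SAMPLE = (
    "<root><this><this2>"
    "<entry>e1<this3>"
    "<next_entry><thing_i_want>a</thing_i_want>"
    "<junk><other_thing>b</other_thing></junk></next_entry>"
    "<next_entry><thing_i_want>c</thing_i_want></next_entry>"
    "</this3></entry>"
    "<entry>e2</entry>"
    "</this2></this></root>"
)

print(parse_entries(ET.fromstring(SAMPLE)))
# [('e1', 'a', 'b'), ('e1', 'c', None)]
print(parse_entries(ET.fromstring("<root/>")))  # []
```

Returning an empty list rather than `None` keeps a caller's loop over files simple; returning `None`, as in the snippet above, is the better fit if the caller needs to tell "wrong file structure" apart from "no entries".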
It's not pretty, but it works. Other languages have their own techniques for this kind of thing, such as C#'s null-conditional operator (`?.`) and functional programming's monadic error handling. There's also an excellent video covering a functional approach to this (and other) problems. Rust also takes a monadic approach to error handling with its `Option` and `Result` types. After I read/watch this stuff again, I'll probably be ashamed of this post, but until then, it's up :)
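To give a rough flavor of that chaining style in Python, here's a hypothetical Maybe-style wrapper. It's an illustration only, not a library API and not a complete monad implementation: each `then()` step runs only if the previous step produced a value, so the first missing element short-circuits the rest of the chain, roughly what C#'s `?.` does. The names `Maybe`, `child`, and `get_text` are made up for this sketch.

```python
import xml.etree.ElementTree as ET

class Maybe:
    """Hypothetical wrapper around a value that might be None."""

    def __init__(self, value):
        self.value = value

    def then(self, fn):
        # Apply fn only when there is a value; an empty Maybe skips fn entirely.
        if self.value is None:
            return self
        return Maybe(fn(self.value))

def child(tag):
    # Build a step function that looks up one child element.
    return lambda node: node.find(tag)

def get_text(node):
    return node.text

next_entry = ET.fromstring(
    "<next_entry><junk><other_thing>b</other_thing></junk></next_entry>"
)

other_thing = Maybe(next_entry).then(child('junk')).then(child('other_thing')).then(get_text).value
thing_i_want = Maybe(next_entry).then(child('thing_i_want')).then(get_text).value

print(other_thing)   # b
print(thing_i_want)  # None
```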
Update: I still find this useful for some things, but XML is best handled with XSLT! Use XSLT!