A Cascade of Table View Bugs

Yesterday I wrote some code to hide the Trash location in our document picker when it’s empty. Today, I got a bug report that attempting to add or remove a cloud storage location in the document picker would now crash with an exception thrown by UITableView. The two didn’t sound related, other than the timing and the fact that both involved the document picker.

And at first, all the evidence told me they were not related. But thanks to a combination of overlapping bugs it took me over an hour, a fair bit of disassembly, and some moral support from my coworker Jake to track down the origin of the bug.

The specific exception being thrown was “The number of sections contained in the table view after the update (1) must be equal to the number of sections contained in the table view before the update (1), plus or minus the number of sections inserted or deleted (0 inserted, 1 deleted).”
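To make the arithmetic in that message concrete, here is a minimal sketch of the bookkeeping it describes. The type and property names are hypothetical and only model the message; this is not how UIKit is actually implemented.

```swift
// Hypothetical model of the consistency check behind the exception message.
// The names are illustrative only; none of this is UIKit API.
struct SectionUpdateCounts {
    var before: Int    // sections reported before the update
    var after: Int     // sections reported after the update
    var inserted: Int  // section insertions recorded during the update
    var deleted: Int   // section deletions recorded during the update

    // The batch is consistent only if the bookkeeping adds up.
    var isConsistent: Bool {
        return after == before + inserted - deleted
    }
}

// The numbers from the exception: 1 before, 1 after, 0 inserted, 1 deleted.
let crashCounts = SectionUpdateCounts(before: 1, after: 1, inserted: 0, deleted: 1)
print(crashCounts.isConsistent) // false, because 1 != 1 + 0 - 1
```

In other words, the table view believed a section had been deleted while the data source still reported one section.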

The Trash location is represented by a row in section 0, along with all the other locations, so there was no reason to associate its addition or removal with an assertion about a bad section count. The only time we add or remove a section from the table view is when the view controller gets -setEditing:animated:. When entering edit mode, we add a section that contains a single “Add a Cloud Account” item, which we remove when leaving edit mode.

Of course, this requires returning the correct value from -numberOfSectionsInTableView:, which the code was in fact doing. But this morning, a coworker had fixed a different bug, where swiping to delete an account would display this extra section, which couldn’t be tapped because the table view was in swipe-to-delete mode on another row, and which would disappear as soon as the swipe-to-delete was completed or canceled. He’d neglected to update -numberOfSectionsInTableView: to properly handle being in edit mode but not having added the extra “Add Account” section.
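The shape of that mistake can be sketched roughly as follows. Everything here is an assumption for illustration: the real code is not shown in this post, the struct and property names are invented, and I’m assuming swipe-to-delete flips the same editing flag as full edit mode.

```swift
// Hypothetical state for the picker; none of these names come from the real code.
struct PickerSections {
    var isEditing = false               // assumed true for full edit mode and swipe-to-delete
    var showsAddAccountSection = false  // true only once the extra section has been inserted

    // Buggy variant: assumes edit mode always implies the extra section.
    var buggySectionCount: Int { return isEditing ? 2 : 1 }

    // Fixed variant: counts the extra section only if it was actually added.
    var sectionCount: Int { return showsAddAccountSection ? 2 : 1 }
}

// Swipe-to-delete: editing, but the "Add a Cloud Account" section was never added.
var swipeState = PickerSections()
swipeState.isEditing = true
print(swipeState.buggySectionCount, swipeState.sectionCount) // 2 1
```

Tracking whether the extra section was actually inserted, rather than inferring it from the editing flag, keeps the reported count in step with the insertions and deletions the table view has seen.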

So I fixed the code and tried again, and got the exact same exception at the exact same place (a call to -[UITableView endUpdates] in the code that was responsible for adding and removing rows for all the locations in section 0). Huh, that’s odd. I set a breakpoint on -numberOfSectionsInTableView: and verified that it always returned the correct answer. I inspected the code between -beginUpdates and -endUpdates and found no manipulation of sections. I set a breakpoint on -[UITableView deleteSections:withRowAnimation:], and eventually on every section mutation method, none of which were ever hit. I even set a breakpoint on -[NSIndexPath initWithIndexes:length:] and never saw one created with a section of 1. But UITableView still complained that my data source had failed to account for the deletion of a section.

At this point I’d run out of ways to query the table view for assistance. I brought in my coworker Jake to be a second set of eyes. He couldn’t find any obvious mistakes either, so I cracked out Hopper and got to disassembling.

(Quick aside: buy Hopper and learn how to use it. Until the moon crashes into the sea and Apple decides to make its source code available in escrow for tracking down bugs, Hopper is as close as you’re going to get to having source-level debugging for system frameworks.)

We started at the method which threw the exception. It appeared to be quite a workhorse, responsible for coalescing the effects of all the operations that happened between -beginUpdates and -endUpdates. We determined that UITableView stored a record of these operations in arrays of UIUpdateItem objects. There was one pool for each kind of operation (insertion, deletion, move, and reload). Unsurprisingly, the “delete” pool contained one such item at the time of the crash. Asking it -isSectionUpdate returned YES, but strangely, its index path indicated it was a deletion of the 9-quintillionth row in section 0.

In case you’re unfamiliar, 9 quintillion is the approximate value of NSNotFound on 64-bit systems. That set off alarm bells. It was quite possible that the row updating code that executed right before the exception was thrown had asked the table view to delete an absurdly-numbered row from section 0. And it turns out that my Trash-hiding code was doing exactly that: it failed to handle the case where the Trash was already hidden, and wound up asking the array backing the data source for the -indexOfObject: of the object representing that location. It got NSNotFound back, and dutifully constructed an index path from it to ask the table view to delete. Checking for NSNotFound before trying to delete fixed the crash.
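The missing check looks something like the sketch below. It is written in Swift rather than the Objective-C of the real code, and `legacyIndex(of:in:)`, `rowToDelete(for:in:)`, and the location names are hypothetical stand-ins; only NSNotFound itself is real Foundation API.

```swift
import Foundation

// Stand-in for -[NSArray indexOfObject:]: returns NSNotFound when the item is absent.
// The function name and the location names below are hypothetical.
func legacyIndex(of item: String, in items: [String]) -> Int {
    return items.firstIndex(of: item) ?? NSNotFound
}

// Returns the row to delete, or nil when there is nothing to delete.
func rowToDelete(for item: String, in items: [String]) -> Int? {
    let row = legacyIndex(of: item, in: items)
    // The missing check: never build an index path from NSNotFound.
    guard row != NSNotFound else { return nil }
    return row
}

let visibleLocations = ["Local Documents", "Cloud Account"] // Trash already hidden
print(legacyIndex(of: "Trash", in: visibleLocations) == NSNotFound) // true
print(rowToDelete(for: "Trash", in: visibleLocations) == nil)       // true
```

Without the guard, the sentinel flows straight into an index path and the table view is asked to delete a row that never existed.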

In the end, three overlapping bugs turned this from a normal “oops” to a blog-worthy story:

  1. My coworker changed when we added or removed sections from the table view, but didn’t update the code that returned the current number of sections.
  2. I wrote some bad row updating code that didn’t check if the objects it wanted to delete from the table view actually existed.
  3. UITableView erroneously claimed I had deleted a section, when in fact I’d deleted a non-existent row whose index path contained a Very Special Number, leading me to finger my coworker’s code as the cause rather than my own.

And this is the exact moment that people see the value in more advanced language features like Swift’s support for optional types. If Cocoa had been originally written in Swift, instead of returning NSNotFound (which is perfectly within the valid range of integers), -indexOfObject: would return an optional index, and Swift would have forced me to handle the case where -indexOfObject: returned nil because the thing just plain didn’t exist.
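For comparison, here is a small sketch of what that looks like with Swift’s standard library, where Array’s firstIndex(of:) returns an optional. The location names and the `describeDeletion` helper are made up for illustration.

```swift
let pickerLocations = ["Local Documents", "Cloud Account"] // hypothetical names; Trash is hidden

// firstIndex(of:) returns Int?, so "not found" cannot be mistaken for a real row.
func describeDeletion(of location: String, in locations: [String]) -> String {
    if let row = locations.firstIndex(of: location) {
        return "delete row \(row) in section 0"
    } else {
        // The compiler forces this branch to exist before the index can be used.
        return "nothing to delete"
    }
}

print(describeDeletion(of: "Trash", in: pickerLocations))         // nothing to delete
print(describeDeletion(of: "Cloud Account", in: pickerLocations)) // delete row 1 in section 0
```

The optional has to be unwrapped before it can be used as a row, so the “already hidden” case cannot silently turn into a bogus index path.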