Error Handling
Lecture Notes for CS 190
Spring 2015
John Ousterhout
- Errors and exceptions come from many sources:
  - From above:
    - Bad input from user or client
    - Misconfiguration and operator errors
  - From below: underlying system facilities don't work as desired:
    - Disk I/O error
    - Can't open file (wrong permissions, missing directory, etc.)
    - Out of memory
    - Network port already in use
  - From peers in a distributed system:
    - Server crashes
    - Slow communication
    - Lost network packets
  - From ourselves: internal bugs
- Errors and exceptions are a major source of complexity and bugs:
  - They account for a lot of the code in large systems
  - They disrupt normal code flow:
    - They happen in the middle of other activities
    - Something didn't work the way you expected
    - Hard to figure out how to handle them
    - May not be able to complete the work in progress
  - Language support is clunky:
    - Verbose
    - Makes code hard to read
  - Hard to test:
    - They don't occur very often in running applications, so the
      handlers rarely get exercised
- Programmers often make the exception problem worse:
  - Defensive programming: throw exceptions for anything that
    looks a tiny bit suspicious. More errors are better?
  - Expediency: rather than figure out how to solve a problem,
    just throw an exception and punt it to the next level
  - Result: even more exceptions, many of which no one really knows
    how to handle.
- Key idea: reduce the number of exceptions that must be
  handled. Specific techniques:
  - Whenever possible, define errors out of existence:
    - Deleting variables in Tcl (unset fails if the variable doesn't
      exist; it could instead just guarantee the variable is gone)
    - File deletion in Windows (Windows typically refuses to delete an
      open file; Unix allows it and reclaims the file once the last
      process closes it)
    - Bounds checks in Java substring method (it throws for
      out-of-range indices; it could instead return the overlap
      between the requested range and the string)
  - Mask errors (recover automatically so the error doesn't
    have to be reported)
    - E.g., if a server crashes, automatically fetch data from
      a backup server
    - Or, if a server crashes, wait until it restarts
  - Collapse errors (handle several different cases with the
    same code)
    - Promote one error to another (in RAMCloud, many errors
      get promoted to "server crash").
      - Reuse existing handler
      - May not work for exceptions that happen frequently
    - Defer reporting to a place where other exceptions could
      already happen.
      - Example: in RAMCloud, report RPC errors only on wait,
        not send.
    - Advantages of collapsing:
      - Simplifies code (fewer cases).
      - Remaining handlers get invoked more often (so they will get
        debugged).
  - Just panic (crash app)?
    - If there's no viable way to handle the error
    - Example: running out of memory in malloc
    - Or, throw an error that isn't handled except at the
      very top level.
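The define, mask, and collapse techniques above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: `substring`, `unset`, `ServerCrashed`, `rpc`, `fetch_with_backup`, and `demo_transport` are hypothetical names invented here, not Tcl, Java, or RAMCloud APIs, and the `transport` callable stands in for whatever RPC layer a real system would use.

```python
from typing import Callable, Dict, Sequence


# --- Define errors out of existence ---------------------------------------

def substring(s: str, start: int, end: int) -> str:
    """Return the part of s in [start, end), clipped to the string.

    Out-of-range indices are not an error: the result is simply the
    overlap of the requested range with the string (possibly "").
    """
    start = max(0, min(start, len(s)))
    end = max(start, min(end, len(s)))
    return s[start:end]


def unset(variables: Dict[str, object], name: str) -> None:
    """Guarantee that `name` is no longer defined; a missing name is fine."""
    variables.pop(name, None)


# --- Collapse errors: promote many failures to one ------------------------

class ServerCrashed(Exception):
    """The single error that low-level communication failures become."""


# Hypothetical transport: (server, key) -> value bytes.
Transport = Callable[[str, str], bytes]


def rpc(server: str, key: str, transport: Transport) -> bytes:
    try:
        return transport(server, key)
    except (ConnectionError, TimeoutError) as exc:
        # Refused connections, resets, and timeouts all look like a
        # crash to the caller, so one handler covers all of them.
        raise ServerCrashed(server) from exc


# --- Mask errors: recover without telling the caller ----------------------

def fetch_with_backup(key: str, servers: Sequence[str],
                      transport: Transport) -> bytes:
    """Try each replica in turn; report an error only if all of them fail."""
    for server in servers:
        try:
            return rpc(server, key, transport)
        except ServerCrashed:
            continue  # masked: fall through to the next replica
    raise ServerCrashed(f"no server could supply {key!r}")


def demo_transport(server: str, key: str) -> bytes:
    """Stand-in transport for the demo: the primary is always down."""
    if server == "primary":
        raise ConnectionRefusedError("primary is down")
    return f"{key}@{server}".encode()


if __name__ == "__main__":
    print(substring("hello", 3, 100))  # prints lo
    env = {"x": 1}
    unset(env, "x")
    unset(env, "x")  # second call is a harmless no-op
    print(fetch_with_backup("k", ["primary", "backup"], demo_transport))  # prints b'k@backup'
```

Note how the caller of `fetch_with_backup` never sees the primary's `ConnectionRefusedError`: it is first collapsed into `ServerCrashed`, then masked by trying the backup.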
- Before throwing an error, think about how the caller will
  handle it.
  - If you can't visualize how the caller will handle it, rethink
    the error.
- Choose between throwing an error and returning a value:
  - If the caller will almost always care about the error,
    might as well report it with a return value.
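A minimal Python sketch of the return-value choice, assuming a hypothetical `find_user` lookup whose callers nearly always need to handle "not found". Returning `None` (typed as `Optional`) puts the check right where the caller already has to make a decision, instead of wrapping every call in `try`/`except`. Python's own `dict.get` (returns a default) versus `dict[key]` (raises `KeyError`) reflects the same split.

```python
from typing import Dict, Optional


def find_user(users: Dict[str, int], name: str) -> Optional[int]:
    """Return the user's id, or None if there is no such user.

    Callers almost always have to handle "not found", so it is reported
    as an ordinary return value rather than an exception.
    """
    return users.get(name)


def greet(users: Dict[str, int], name: str) -> str:
    user_id = find_user(users, name)
    if user_id is None:  # common case handled inline, no try/except needed
        return f"unknown user {name}"
    return f"hello, user #{user_id}"
```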
- Make lots of information available after errors:
  - Include it in the exception's message?
  - Or, output it to the system log
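One way to do both, sketched in Python under assumptions: `StorageError`, `read_config`, and the `"storage"` logger name are hypothetical. The exception object carries structured context (operation, path, underlying cause) alongside a readable message, and the same failure is written to the log with its traceback through the standard `logging` module.

```python
import logging
from typing import Optional

log = logging.getLogger("storage")  # hypothetical logger name


class StorageError(Exception):
    """Carries the context a developer needs to diagnose the failure."""

    def __init__(self, op: str, path: str,
                 cause: Optional[BaseException] = None):
        self.op = op
        self.path = path
        self.cause = cause
        detail = f": {cause}" if cause is not None else ""
        super().__init__(f"{op} failed for {path!r}{detail}")


def read_config(path: str) -> str:
    try:
        with open(path) as f:
            return f.read()
    except OSError as exc:
        err = StorageError("read_config", path, exc)
        # Record the failure in the log too, including the traceback of
        # the underlying OSError, so it can be analyzed later.
        log.error("%s", err, exc_info=True)
        raise err from exc
```

Keeping the fields as attributes (not just text in the message) lets a handler higher up inspect `err.path` or `err.cause` without parsing strings.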