Managing Complexity
Lecture Notes for CS 190
Spring 2015
John Ousterhout
Fundamental Philosophy
- Programs evolve continuously:
- Can't get the architecture right the first time
- Additional feature needs arise over time
- It's not good enough to write code that works
- Code must also be "beautiful"
- Why? Real goal is to enable continual improvements over a 10-20
year lifetime
- It must be easy to make these improvements, even for people
who weren't involved in the original construction
- Original authors are gone, or can't remember what they did
- Goal for design: allow changes to be made easily
- How much work has to be done to accomplish a task?
- How much information does a programmer need in his/her mind
to accomplish a task?
- How easy is it to find the required information?
- Complexity accretes
- No one thing makes a system complicated
- It's an accumulation of thousands of small things over time
- Once complexity arises, hard to eliminate
- To prevent complexity, must sweat the small stuff
- Typical (wrong) developer philosophy: "as long as I don't make
things much more complicated, it's OK"
- Must adopt a zero-tolerance mindset: everything matters.
- Real-world pressures encourage complexity
- Fastest way to make progress in the short term is not to worry
about complexity.
- To reduce complexity, must invest extra time now, but the biggest
benefits don't come until the future.
- Must compromise: zero tolerance for complexity probably
not viable.
- Focus on most important things:
- Good interface design
- Documentation more important for interfaces than internals
- Create a budget for refactoring and cleanup
- Find ways to teach new employees how to write simple code
(e.g. code reviews)
- Investment to reduce complexity pays for itself relatively quickly
(6-12 months?)
- Without care, complexity builds up very fast
- Once this happens, development becomes much more expensive;
it would have been cheaper to invest early on
- For this class: zero tolerance for complexity
- Goal for this class: teach you how to make things simple
Modular Design
- Divide system into modules that are relatively independent
- Ideal: each module completely independent of the others
- System complexity = complexity of worst module
- In reality, modules are not completely independent
- Some modules must invoke facilities in other modules
- Design decisions in one module must be known to other
modules
- Can't change one module without understanding parts of
other modules
Abstraction
- Technique for dealing with complexity: find a simple way to
think about and manipulate a complex entity
- Separate essential elements from details that can be ignored
- Divide each module into two parts:
- Interface of a module: anything about that module that
must be known to other modules
- Formal aspects: method signatures, public variables, etc.
- Informal aspects: side effects, algorithms that affect behavior of
methods, etc.
- Implementation: code that enforces the promises made
by the interface
- Goal for interface design: maximize the ratio
functionality/(interface complexity)
(a sweet interface or module)
Parnas paper
Information Hiding
- Each module (class) encapsulates certain knowledge or design
decisions:
- No other class should need to understand these details
- The interface does not expose internal implementation details
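A minimal Python sketch of information hiding. The `PhoneBook` class and its methods are hypothetical names chosen for illustration; the point is only that the storage layout and the name-normalization rule are design decisions no caller needs to know.

```python
class PhoneBook:
    """Hypothetical module that hides how entries are stored."""

    def __init__(self):
        # Hidden design decision: entries live in a dict keyed by a
        # normalized (stripped, case-folded) name.
        self._entries = {}

    @staticmethod
    def _normalize(name):
        # The normalization rule is private knowledge too.
        return name.strip().casefold()

    def add(self, name, number):
        """Record a number for a name, replacing any earlier one."""
        self._entries[self._normalize(name)] = number

    def lookup(self, name):
        """Return the number for a name, or None if unknown."""
        return self._entries.get(self._normalize(name))


book = PhoneBook()
book.add("  Ada Lovelace ", "555-0100")
print(book.lookup("ada lovelace"))  # callers never touch _entries
```

Because callers use only `add` and `lookup`, the dict could later be swapped for a sorted list or a database table without changing any calling code.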
Classes Should be Thick
- Thin class:
- Not much functionality
- Short methods that don't do much
- It's almost as much work to invoke a method as it
would take to type in the body of the method
- Classic example: linked list
- Thin classes can't hide much information
- Thick class:
- Lots of functionality, yet simple interface
- Hides lots of information
- Classitis: too many classes
- Rule of thumb: 200-2000 lines is a good size for classes
- Below 200 lines: probably pretty thin
- Above 2000 lines: internal complexity of the class can
become unmanageable. See if it can be subdivided cleanly.
- However, size itself isn't the most important metric: it's
functionality/(interface complexity)
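The contrast between thin and thick classes can be sketched in Python. Both classes are hypothetical illustrations, not code from the course: `Node` is a linked-list node whose methods do almost nothing, while `WordIndex` puts tokenizing, case folding, and an inverted index behind two methods.

```python
import re
from collections import defaultdict


class Node:
    """Thin: each method is about as long as the call that invokes it."""

    def __init__(self, value):
        self.value = value
        self.next = None

    def get_next(self):
        return self.next

    def set_next(self, node):
        self.next = node


class WordIndex:
    """Thicker: two public methods hide tokenizing, case folding,
    and the inverted-index representation."""

    def __init__(self):
        self._postings = defaultdict(set)  # word -> ids of documents
        self._next_id = 0

    @staticmethod
    def _words(text):
        return re.findall(r"[a-z0-9]+", text.lower())

    def add(self, text):
        """Index a document and return its id."""
        doc_id = self._next_id
        self._next_id += 1
        for word in self._words(text):
            self._postings[word].add(doc_id)
        return doc_id

    def search(self, query):
        """Return sorted ids of documents containing every query word."""
        words = self._words(query)
        if not words:
            return []
        hits = set.intersection(*(self._postings.get(w, set()) for w in words))
        return sorted(hits)


index = WordIndex()
first = index.add("Complexity accretes over time")
second = index.add("Sweat the small stuff; complexity matters")
print(index.search("complexity"))
print(index.search("small complexity"))
```

`WordIndex` is far below the 200-line rule of thumb, so treat it only as an illustration of the functionality/(interface complexity) ratio, not of class size.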
Simplicity
- Must decide what's important, design the interface around that
- But, how to know what's important?
- Focus on the things that are done most frequently
- Technique #1: if a particular task is invoked repeatedly,
design an API around that task (or do it automatically,
with no explicit feature).
- Technique #2: if a collection of tasks is similar but not identical,
look for common features shared by all of them; design
APIs for the common features.
- It's OK to provide APIs for infrequently-used features,
but design them in a way that you don't need to be
aware of them when using the common features.
- Bad example: Java I/O
- Good example: device-independent I/O in UNIX/Linux:
- Before UNIX: different kernel calls for opening and accessing
files vs. devices.
- Different kernel calls for each device: terminal, tape, etc.
- Different naming mechanisms for each device
- UNIX emphasized commonality across devices:
- Devices have names in the file system: special device files
- All devices have same basic access structure: open, read,
write, seek, close
- Handle device-specific operations with one additional kernel
call:
int ioctl(int fd, int request,
void* inBuffer, int inputSize,
void* outBuffer, int outputSize);
(Simplified form; the actual Linux declaration is
int ioctl(int fd, unsigned long request, ...).)
- High- and low-level APIs are most amenable to simplicity:
- Primitives (hash table, file block cache, etc.)
- Should do one thing well; can often restrict functionality to
enhance simplicity.
- If they try to do several things at once they get too confusing.
- High-level abstractions (distributed transactions)
- Encapsulate entire tasks with a ridiculously simple interface.
- Typically conflate a whole bunch of things in their implementation.
- Intermediate-level APIs: hard to make these simple.
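The device-independent I/O idea from the UNIX example above can be sketched in Python. Everything here is a hypothetical illustration (`Device`, `MemoryFile`, `Terminal`, `copy`, and the `"set_columns"` request are invented names): all devices share `read` and `write`, and one `control` method plays the role of `ioctl` for device-specific operations.

```python
class Device:
    """Hypothetical common interface, loosely modeled on the UNIX calls."""

    def read(self, size):
        raise NotImplementedError

    def write(self, data):
        raise NotImplementedError

    def control(self, request, arg=None):
        # Single escape hatch for device-specific operations (the ioctl role).
        raise ValueError(f"unsupported request: {request}")


class MemoryFile(Device):
    def __init__(self, data=b""):
        self._data = bytearray(data)
        self._pos = 0

    def read(self, size):
        chunk = bytes(self._data[self._pos:self._pos + size])
        self._pos += len(chunk)
        return chunk

    def write(self, data):
        self._data[self._pos:self._pos + len(data)] = data
        self._pos += len(data)
        return len(data)

    def contents(self):
        return bytes(self._data)


class Terminal(Device):
    """Fake terminal that records output instead of drawing it."""

    def __init__(self):
        self.columns = 80
        self.output = []

    def read(self, size):
        return b""  # no pending input in this sketch

    def write(self, data):
        self.output.append(bytes(data))
        return len(data)

    def control(self, request, arg=None):
        if request == "set_columns":  # terminal-only operation
            self.columns = arg
            return 0
        return super().control(request, arg)


def copy(src, dst, block_size=4):
    """Works for any pair of devices; knows nothing device-specific."""
    total = 0
    while True:
        chunk = src.read(block_size)
        if not chunk:
            return total
        total += dst.write(chunk)


term = Terminal()
copied = copy(MemoryFile(b"hello, world"), term)
term.control("set_columns", 132)
```

Code that uses only the common operations, such as `copy`, never needs to know that `control` exists, which matches the advice about keeping infrequently used features out of the way of the common ones.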
Generality
- How general-purpose should a module be?
- E.g., "Should I implement extra features beyond those that I need today?"
- If the module is a basic building block likely to be used in
multiple places, then design for generality
- Focus on clean orthogonal features that are easy to use,
can be combined together
- Plan ahead for uses that aren't necessarily needed today
- If the module is only used in one place, make it specific
- Specialize its API to make it simpler.
- Leave out features not currently needed
- When in doubt, take the more specialized approach
- If you build a module for a single purpose, then discover
it's being reused, refactor to generalize it.
The Martyr Principle
- Module writers should embrace suffering:
- Take on hard problems
- Solve completely
- Make solution easy for others to use
- Take more pain for yourself, so that others have less
- Push complexity down into modules:
- Let a few module developers suffer, rather than thousands
of users
- Solve, don't punt:
- Handle error conditions rather than throwing exceptions
- Minimize "voodoo constants" (configuration parameters)
- If you don't know the right value, how will a user or
administrator ever figure it out?
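A small Python sketch of "solve, don't punt", using only standard-library calls. The function names are hypothetical. The first pair contrasts pushing a missing-file error onto every caller with handling it inside the module; `copy_stream` shows a module choosing its own buffer size (here Python's `io.DEFAULT_BUFFER_SIZE`) instead of demanding a tuning parameter.

```python
import io
import os


def remove_file_punting(path):
    """Punts: every caller must be ready for FileNotFoundError."""
    os.remove(path)


def remove_file(path):
    """Solves: afterwards the file does not exist, whether or not it
    existed before. Returns True if a file was actually deleted."""
    try:
        os.remove(path)
        return True
    except FileNotFoundError:
        return False


def copy_stream(src, dst, chunk_size=io.DEFAULT_BUFFER_SIZE):
    """Copy src to dst and return the byte count. The module supplies
    a sensible chunk size, so callers rarely need to pass one."""
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            return total
        dst.write(chunk)
        total += len(chunk)
```

With `remove_file`, the common caller who just wants the file gone writes one line and no `try` block; the rare caller who cares whether it existed can check the return value.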
Applying These Ideas
- May be hard initially to apply these ideas when writing code.
- Make 2 designs and compare
- Pick one and write some code
- Review this topic to look for potential problems
- Revise code
- Take advantage of code reviews
- Red flags to look for:
- Thin classes
- Information leakage
- Very deep call stacks (especially if one interface calls
another that looks similar)
- Lint: little bits of unnecessary complexity
- Repeated pieces of code (DRY)