Stanford University
Computer Science 349: Winter 2005 - 2006

CS349 Midterm 2006, Take home exam


Due: 11:59 pm PST, Feb 10th 2006. Please submit by email to cheriton at dsg.stanford.edu.

Here's my attempt to summarize the basic answers I was after for each question. Given the mushy nature of the questions, lots more could be written in each case.


  1. Software Engineering: Given the evolutionary nature of large-scale software, it is important to consider the incremental cost of adding a new feature or facility, especially given it is hard to remove a feature once added and feature addition is occurring repeatedly over many releases. Here, cost can include cost of development, on-going maintenance/support and cost in execution-time cycles and memory. Describe the key issues with understanding this incremental cost, where you think the industry is currently in understanding this, and what needs to be improved or better understood.

    Answer: a key aspect I was after is the recognition that the cost (and value) of a feature is better evaluated in terms of feature interaction than as an isolated unit. Moreover, the worst case in this view is rather frightening: if a company adds k features each release, then after N releases it has kN features and, in the worst case, on the order of (kN)^2 pairwise feature interactions. With this base analysis, you would expect a software system to evolve into something completely unsupportable fairly quickly. However, every software system is pushed to add features to keep customers happy and maintain a competitive advantage. Plus, the situation is not really that bad, because a feature does not generally interact with all other features, types and objects.

    Industry currently does informally try to evaluate the impact of a feature in terms of modules that need changing, costs, etc. There are simply low-cost features that only affect one or a small number of modules in a simple way. However, ideally we should have more concise models and tools to evaluate the interactions and cost of a feature. Moreover, the real issue is often impact over time, i.e. what is the future cost of this feature together with all the others we have and will add going forward. One concern here is the scalability impact of a feature, both at present and in the future. This relates back to feature interaction because a given feature may cause non-linear growth in memory or processing because of feature interaction. E.g. if you have a feature that needs state per port-VLAN pair, the cost grows multiplicatively in ports and VLANs.
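    The port-VLAN example can be made concrete with a small sketch (the class and member names here are illustrative, not from any real switch software): a feature that keeps state per (port, VLAN) pair allocates ports × VLANs entries, so its memory cost is multiplicative in the two dimensions rather than additive.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <utility>

// Hypothetical feature: one packet counter per (port, VLAN) pair.
// State grows as ports * VLANs, not ports + VLANs.
struct PortVlanStats {
    std::map<std::pair<int, int>, long> counters;  // (port, vlan) -> count

    // Enabling the feature instantiates state for every combination.
    void enableOn(int ports, int vlans) {
        for (int p = 0; p < ports; ++p)
            for (int v = 0; v < vlans; ++v)
                counters[{p, v}] = 0;  // one entry per (port, vlan)
    }

    std::size_t stateSize() const { return counters.size(); }
};
```

    With 48 ports and 100 VLANs this feature holds 4800 entries; doubling either dimension doubles the total, which is exactly the non-linear growth from feature interaction described above.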

    In sum, there is growing awareness of the cost of features and feature interactions, but we need better theory and a better understanding of the impacts over the lifetime of software.

  2. Audit: The Dalai Lama appears in the middle of a debate between using asserts vs. audit and claims that the debate is bogus: His Holiness argues that these are just points in a design space dimensioned on completeness of checking and frequency of invocation. In particular, asserts typically check more frequently (i.e. on every function call) but less completely in general, whereas audit is complete and less frequent. Describe to what degree he is correct and what engineering implications this might have, e.g. are there other points in this "design space" that are worth considering?

    Answer: The Dalai Lama is basically right. However, the key aspects I am after are:

    1. Watering down audit to incomplete checking seems questionable from an engineering standpoint, especially given that audit should be structured as an audit per module; it is not clear what it means to check incompletely relative to the unit of encapsulation.
    2. Invoking audit more frequently is feasible. It's just a matter of invoking the audit procedure wherever you want. The limitation is the overhead, which comes back to the cost of checking invariants, which is of course tied to the completeness of checking.
    3. Conversely, making asserts more complete would warrant having a separate procedure or procedures that perform the checking, given the repeated code you would otherwise have. However, if the checking is not complete, it seems questionable, and if it is complete, this is basically just audit.
    In sum, there is a conflict between completeness and cost that makes it hard to inhabit intermediate points in this design space.

    One can identify other dimensions on which one might compare audit vs. assert, but that is not what the question was really asking.
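    The two points in the design space can be sketched as follows (a minimal illustration, with invented names, assuming a sorted-list module): the assert is cheap, local and incomplete, and runs on every call; the audit procedure checks the complete module invariant and is invoked only occasionally.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Module invariant: items is always sorted ascending.
class SortedList {
    std::vector<int> items;
public:
    void insert(int x) {
        auto it = items.begin();
        while (it != items.end() && *it < x) ++it;
        it = items.insert(it, x);
        // assert: frequent (every call) but incomplete -- only checks x
        // against its immediate neighbors, not the whole invariant.
        if (it != items.begin()) assert(*(it - 1) <= *it);
        if (it + 1 != items.end()) assert(*it <= *(it + 1));
    }

    // audit: complete check of the module invariant, invoked infrequently,
    // e.g. from a background task or after a batch of operations.
    bool audit() const {
        for (std::size_t i = 1; i < items.size(); ++i)
            if (items[i - 1] > items[i]) return false;
        return true;
    }

    std::size_t size() const { return items.size(); }
};
```

    Note that making the per-call assert "complete" here would mean scanning the whole vector on every insert, i.e. running audit() on every call, which is exactly the completeness-vs-cost conflict described above.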

  3. Collections and Iterators: Iterator semantics could, as one choice, be specified as operating on a snapshot of the collection, logically created at the time the iterator is created.
    1. Assuming there are concurrent adds and deletes taking place on a collection during the lifetime of an iterator, what are the issues an application may have to deal with using this model, and how does that compare to the "version"-based approach in the notes?

      Answer: What I was after was the recognition that there are often invariants related to whether an object is in a collection or not. For example, if an application has a snapshot view of a process ready queue, then as it iterates through the queue it may get a pointer to a process that has since been deleted from the queue, so the process object itself may indicate it is blocked even though the iterator suggests it is ready. Similarly, an object may be encountered in a directory iteration that has been deleted from the directory, so the iterator suggests it is in the directory but a lookup on the key of the object does not find it there. In general, a snapshot of a collection is too small a unit of state to ensure that many invariants hold.

      Some answers mentioned that the app might not want the snapshot semantics, in effect. However, I really wanted to see an example of this case to make that compelling. Moreover, presumably a class can easily provide the non-snapshot version too. The real focus of the question was to what degree the snapshotting really solves the semantic issues.

    2. What are the performance/efficiency issues with these semantics compared to the version-based approach?

      Answer: One aspect is just the overhead to create and manage snapshots. Many answers mentioned the timestamp-based approach from the notes. However, as I mentioned in the notes, it is challenging to figure out when to really delete objects in a collection. As I see it, you need to maintain a list of all the iterators on the collection and either periodically run through the collection deleting objects whose delete time is older than all existing iterators, or do this each time an iterator is deleted. The alternative approach is to make a copy of the collection when the iterator is instantiated. Either approach involves extra passes over the collection.

      In terms of comparison, when a snapshot is not needed, so no redo is required with the version-based approach, that approach is clearly far cheaper.

      When greater consistency is called for, you can use the version-based approach to either redo the iteration when the version has changed, or make a copy of the collection (redoing if it changes during the copy creation) and then perform the iteration on the collection copy (i.e. a snapshot). The former means that a re-iteration may be required for the collection, so it is comparable when there is contention, and in fact can be more expensive if contention causes more than two passes over the collection. The latter requires a pass over the collection, so it is comparable to a snapshot implementation (the timestamped version presumably requires a pass after iteration to delete). Overall, the version approach makes it feasible to be more efficient when snapshot semantics are not required and comparable when they are.
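      The redo-on-version-change scheme can be sketched as follows (a minimal single-threaded illustration with invented names, not the exact mechanism from the notes): the collection bumps a version counter on every add and delete, an iteration records the version at the start of a pass, and the pass is redone if the version has changed by the end.

```cpp
#include <cassert>
#include <list>
#include <vector>

// Collection that tracks a version number, bumped on every mutation.
class VersionedList {
    std::list<int> items;
    unsigned version_ = 0;
public:
    void add(int x)    { items.push_back(x); ++version_; }
    void remove(int x) { items.remove(x);    ++version_; }
    unsigned version() const { return version_; }
    const std::list<int>& raw() const { return items; }
};

// Iterate the collection, redoing the whole pass from scratch if the
// version changed mid-pass (in real code the change would come from a
// concurrent mutator; here nothing mutates, so one pass suffices).
template <typename Fn>
std::vector<int> iterateConsistently(const VersionedList& c, Fn visit) {
    std::vector<int> seen;
    unsigned v;
    do {
        v = c.version();
        seen.clear();
        for (int x : c.raw()) { visit(x); seen.push_back(x); }
    } while (v != c.version());  // version changed during the pass: redo
    return seen;
}
```

      When there is no contention this costs exactly one pass and no copy, which is why the version approach is cheaper than snapshotting in the common case; under contention each retry adds a pass, matching the cost comparison above.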

  4. Templates: George Bush hears about C++ templates being "Turing complete", and wants to convert all US software development to "generic programming" as a competitive advantage, whereas John Kerry takes the position that templates increase the "digital divide" and should be eliminated altogether. Describe and justify a sane engineering position on the use of templates, citing compelling examples and any quantitative support you have for this position.

    Answer: Considering Kerry first, there is an element of "digital divide" at least in the sense that far fewer programmers really understand templates and generic programming. However, given the benefits of templates for collections and smart pointers, it is infeasible to abandon them altogether. That said, you can recognize a significant difference between the skill required to use a class template and that required to design/program the class template. You can have a few highly skilled senior programmers who can deal with class template design, if in fact you need any new templates beyond those provided by a good framework. Other programmers just need to use them, and it seems questionable to employ someone on an OOP project who cannot even use templates.

    On the other hand, generic/template programming appears to be more difficult to develop, debug and maintain in the current state of the art. The big payoff comes in the KLOC saved by a small number of class templates that are widely used. Also, a template version tends to be far more code than a single class instance, e.g. roughly 70 lines for unique_copy as a template vs. fewer than 7 lines of plain, type-specific code. Thus, a class template has to be used roughly 10 times to even achieve parity with non-templated code. In short, limited selective use of templates provides benefit, but extensive (and certainly exclusive) use tends to increase all the software engineering costs, so it would be a competitive disadvantage.
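    The use-vs.-design distinction can be illustrated with unique_copy itself (the line counts above are from the discussion; the code below is just a sketch): the type-specific version is a few lines of ordinary code, and using the fully generic library template requires only ordinary skill, while designing such a template for all iterator and type combinations is the hard part left to framework authors.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <vector>

// Type-specific version: a handful of lines of plain code, one type only.
std::vector<int> uniqueCopyInts(const std::vector<int>& in) {
    std::vector<int> out;
    for (int x : in)
        if (out.empty() || out.back() != x)
            out.push_back(x);  // keep x only if it differs from its predecessor
    return out;
}

// Using the generic library template is no harder than calling a function:
std::vector<int> uniqueViaTemplate(const std::vector<int>& in) {
    std::vector<int> out;
    std::unique_copy(in.begin(), in.end(), std::back_inserter(out));
    return out;
}
```

    Both produce the same result on the same input; the template earns its extra implementation complexity only because it is reused across many element types and container kinds.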

    In summary, judicious use of a small number of well-designed class templates, such as those provided by a good framework, has compelling benefits, but additional use of templates, especially the design of application-specific templates, should be approached with caution given the costs and complexities of templates.

The End