Based on an assignment by Keith Schwarz
Due: Monday, October 29, 11AM
May be done in pairs
Recursion is an extremely powerful problem-solving tool with tons of practical applications. This assignment consists of four real-world recursive problems, each of which we think is interesting in its own right. By the time you’re done with this assignment, we think you’ll have a much deeper appreciation both for recursive problem solving and for what sorts of areas you can apply your newfound skills to. You may optionally choose to work on this assignment with a partner.
This assignment, like the preceding one, explores a lot of different concepts in recursion. If you start typing out code without a clear sense of how your recursive solution will work, chances are you’ll end up going down the wrong path. Before you begin coding, try thinking about the recursive structure of the code you’re writing. Does anything fall into one of the nice categories we’ve seen before (listing subsets, permutations, combinations, etc.)? Is there a nice decision tree you can draw? If you’re using backtracking, what choices can you make at each step? Talking through these questions before you start coding and making sure you have a good conceptual understanding of what it is that you need to do can save you many, many hours of coding and debugging.
Turn in only the following files:
RecursionToTheRescue.cpp, the C++ code for the 4 required functions (excluding a main function)DNADetectiveTests.cpp, the C++ code for the DNA Detective unit testsdebugging.txt, a file detailing a bug you encountered in this assignment and how you approached debugging it (provided in the res/ folder)You have a lot of time to complete this assignment, so make sure that you’re not putting it off to the last minute. Here’s a rough timetable that we think should be pretty manageable:
Best of luck on this assignment!
As in other assignments, you should follow our Style Guide for information about expected coding style. You are also expected to follow all of the general style constraints emphasized in the Homework 1-3 specs, such as the ones about good problem decomposition, parameters, using proper C++ idioms, and commenting. The following are additional points of emphasis and style constraints specific to this problem:
Recursion: Part of your grade will come from appropriately utilizing recursion to implement your algorithm as described previously. We will also grade on the elegance of your recursive algorithm; don't create special cases in your recursive code if they are not necessary. Avoid "arm's length" recursion, which is where the true base case is not found and unnecessary code/logic is stuck into the recursive case. Redundancy in recursive code is another major grading focus; avoid repeated logic as much as possible. As mentioned previously, it is fine (sometimes necessary) to use "helper" functions to assist you in implementing the recursive algorithms for any part of the assignment.
Variables: While not new to this assignment, we want to stress that you should not make any global or static variables (unless they are constants declared with the const keyword). Do not use globals as a way of getting around proper recursion and parameter-passing on this assignment.
Unit Tests: Make sure that your unit tests for problem 3 (DNA Detective) test as small, isolated components of your function as possible. Make sure to give each test a detailed name, and comment above it explaining what it is testing.
You may optionally work on this assignment as a pair programming assignment. If you would like to work in a pair on this assignment, you may, under the following conditions:
You have just been hired as the Minister of Health for the small country of Recursia. Recursia has recently had major economic growth and has been building a network of national hospitals. But now, they face a crisis - no one has told the Recursian doctors which patients to see. They’re Doctors Without Orders! As Minister of Health, it's time to help the Recursians with their medical needs.
Consider the following two structs, which represent doctors and patients, respectively:
struct Doctor {
string name;
int hoursFree;
};
struct Patient {
string name;
int hoursNeeded;
};
Each doctor has a number of hours they're capable of working in a day, and each patient has a number of hours they need to be seen for. Your task is to write a function
bool canAllPatientsBeSeen(const Vector<Doctor>& doctors, const Vector<Patient>& patients, Map<string, Set<string>>& schedule);
that takes as input a list of available doctors, a list of available patients, then returns whether it's possible to schedule all the patients so that each one is seen by a doctor for the appropriate amount of time. Each patient must be seen by a single doctor, so, for example, a patient who needs five hours of time can't be seen by five doctors for one hour each. If it is possible to schedule everyone, the function should fill in the final schedule parameter by associating each doctor’s name (as a key) with the set of the names of patients they should see (the value).
For example, suppose we have these doctors and these patients:
In this case, everyone can be seen:
However, minor changes to the patient requirements can completely invalidate this schedule. For example, if Patient Lacks needed to be seen for three hours rather than two, then there wouldn’t be a way to schedule all the patients so that they can be seen. On the other hand, if Patient Washkansky needed to be seen for seven hours instead of six, then there would indeed be a way to schedule everyone. (Do you see how?)
This problem is all about recursive backtracking. Think about what decision you might make at each point in time. How do you commit to the decision and then undo it if it doesn’t work? And, as always, feel free to introduce as many helper functions as you’d like. You may even want to make the primary function a wrapper around some other recursive function.
Some notes on this problem:
Vector<Doctor> updatedDocs = doctors;
/* ... make changes to updatedDocs ... */
if (canAllPatientsBeSeen(updatedDocs, ...)) {
/* ... much mirth and whimsy ... */
}
If you find you’re “fighting” your code – that an operation that seems simple is actually taking a lot of lines to accomplish – it might mean that you need to change up your data structures.
Disasters – natural and unnatural – are inevitable, and cities need to be prepared to respond to them when they occur. The problem is that stockpiling emergency resources can be really, really expensive. As a result, it's reasonable to have only a few cities stockpile emergency resources, with the plan that they'd send those resources from wherever they're stockpiled to where they're needed when an emergency happens. The challenge with doing this is to figure out where to put resources so that (1) we don't spend too much money stockpiling more than we need, and (2) we don't leave any cities too far away from emergency supplies.
Imagine that you have access to a country's major highway networks. We can imagine that there are a number of different cities, some of which are right down the highway from others. To the right is a fragment of the US Interstate Highway System for the Western US. Suppose we put emergency supplies in Sacramento, Butte, Las Vegas, Barstow, and Nogales (shown in gray). In that case, if there's an emergency in any city, that city either already has emergency supplies or is immediately adjacent from a city that does. For example, any emergency in Nogales would be covered, since Nogales already has emergency supplies. San Francisco could be covered by supplies from Sacramento, Salt Lake City is covered by both Sacramento and Butte, and Barstow is covered both by itself and by Las Vegas.
Although it’s possible to drive from Sacramento to San Diego, for the purposes of this problem the emergency supplies stockpiled in Sacramento wouldn’t provide coverage to San Diego, since they aren’t immediately next to one another.
We'll say that a country or region is disaster-ready if it has this property: that is, every city either already has emergency supplies or is immediately down the highway from a city that has them. Your task is to write a function
bool canBeMadeDisasterReady(const Map<string, Set<string>>& roadNetwork,
int numCities, Set<string>& locations);
that takes as input a Map representing the road network for a region (described below) and a number of cities that can be made to hold supplies, then returns whether it's possible to make the region disaster-ready by placing supplies in at most numCities cities. If so, the function should then populate the argument locations with all of the cities where supplies should be stored.
In this problem, the road network is represented as a map where each key is a city and each value is a set of cities that are immediately down the highway from them. For example, here's a fragment of the map you'd get from the above transportation network:
"Sacramento": {"San Francisco", "Portland", "Salt Lake City", "Los Angeles"}
"San Francisco": {"Sacramento"}
"Portland": {"Seattle", "Sacramento", "Salt Lake City"}
As in the first part of this assignment, you can assume that locations is empty when this function is first called, and you can change it however you'd like if the function returns false.
There are many different strategies one could use to solve this problem, and some are more efficient than others. For example, one option would be to treat this problem as a combinations problem, trying out each combination of cities of the given size and seeing whether any of them would make the cities disaster ready. This option works, but it will be extremely slow on some of the larger test cases where there are over thirty cities – so slow in fact that your program might never give back an answer!
Instead, here is the approach you must use in this problem. Imagine there’s some city X in the transportation grid that’s adjacent to four neighboring cities, as shown here. Any collection of cities that makes this grid disaster ready is going to have to provide some kind of coverage to city X. Even if you have no idea which cities get chosen in the long run, you can say for certain that you’ll need to include at least one of A, B, C, D, or X.
Therefore, the approach should be to find a city that’s uncovered, then think of all the different ways you could cover it (either by choosing an adjacent city or by choosing the city itself). If you can cover all the remaining cities after making any of those choices, congrats! You’re done. On the other hand if, no matter which of these choices you commit to, you find that there’s no way to cover all the cities, you know that no solution to your particular subproblem exists.
This second approach tends to run a lot faster than the first. The recursion focuses more of its effort looking at adding cities that make an impact, and it’s a lot more obvious when you’re in a dead-end.
Some notes on this problem:
roadNetwork. You can rely on this.k cities, but you find a way to solve the problem using fewer than k cities, that’s fine! The set of cities you return can contain any number of cities provided that you don’t use more than k and you properly cover everything.
The test cases we’ve bundled with the starter code here include some simplified versions of real transportation networks from around the world. Play around with them and let us know if you find anything interesting as you do! Our starter code also contains an option to use your code to find the minimum number of cities needed to provide disaster protection in a region, which you might find interesting to try out once you’ve gotten your code working.
A note: some of the test files that we’ve included have a lot of cities in them. The provided test cases whose names start with VeryHard are, unsurprisingly, very hard tests that may require some time to solve. It’s okay if your program takes a long time (say, at most two minutes) to answer queries for those maps, though if you use the strategy outlined above you should probably be able to get solutions back for them in only a matter of seconds.
Debugging: the visualizer built in to the starter code may help you visualize the network being run and find bugs, if there are any. As an additional (optional) benefit, you can also call the following additional graphics methods as part of your solution that can highlight what your algorithm is doing. You are not required to use this in your solution, but feel free to do so if it helps you implement it.
| Method | Description |
|---|---|
setTryingToCoverCity(cityName, tryingToCover) |
Indicate to the visualizer that you are looking at a city to think of the different ways you could cover it (pass in true for tryingToCover), or are no longer doing so (pass in false for tryingToCover). The visualizer will highlight the most recent call with light blue, and earlier calls that are still being covered with dark blue. |
setStockpileInCity(cityName, stockpile) |
Indicate to the visualizer that you are marking this city as holding a stockpile (pass in true for stockpile) or not holding a stockpile (pass in false for stockpile). The visualizer will highlight stockpiles, and roads directly connected to them, in yellow. It will also highlight cities covered by stockpiled neighbors in white. |
You can also consider adding pause(ms); calls to slow down the highlighting and have it animate at a reasonable speed. The ms parameter is the number of milliseconds to pause the program for. Note that the visualizer expects cities to be marked "uncovered" in the reverse order they were marked "covered" (e.g. trying to cover A, and then trying to cover B, it expects you first stop trying to cover B, then stop trying to cover A).
You're working in a hospital and, to your dismay, there's been an outbreak of the antibiotic-resistant bacterium MRSA. You're not sure where the disease came from, whether it's endemic in the community, or whether it's a well-known strain. Fortunately, you've got a number of DNA samples from the bug, and your handy gene sequencer has returned back to you a rough copy of its genome. If you can figure out whether the different samples you're getting from different patients are all from the same strain of MRSA, you can take specific actions to try to end the spread quickly.
DNA consists of paired strands of nucleotides. The nucleotides used in DNA are adenine, cytosine, guanine, and thymine, usually just abbreviated A, C, G, and T, and DNA strands are often represented as strings made from these letters.
When DNA sequencing machines return back the specific sequence of nucleotides that make up a particular strand of DNA, they often have a small number of errors. For example, if there's a part of a DNA strand that actually reads GTGTC, then the sequencer might report it as GTGTA (substituting one letter for another), GTGT (dropping a letter), or GTGTAC (inserting a letter). One way of quantifying how “close” two DNA strands are to one another is to use a measure called edit distance. Specifically, the edit distance between two strings is defined as the number of single-letter insertions, deletions, and replacements needed to turn the first string into the second. For example:
cat and dog have edit distance 3. The fastest way to turn cat into dog is to replace each letter in cat with the corresponding letter from dog.table and maple have edit distance 2. The fastest way to turn table into maple is to replace the t in table with an m and the b in table with a p.
rate and pirate have edit distance 2. The fastest way to turn rate into pirate is to prepend a p and an i.
edit and distance have edit distance 6. Specifically, delete the e from edit, insert an s between the i and the t, and append the letters ance onto the back.
Let's jump back to our outbreak example. Imagine you sequence the DNA of a sample of MRSA that you've found in one patient and the DNA of a sample of MRSA you found in another patient. The sequencer will likely come back with some small number of errors in it. If the edit distance between the two DNA strands is low, it likely means that you're looking at essentially the same organism, meaning that the patients either got the bug from the same source or one gave it to another. On the other hand, if the edit distance is large, it likely means you're looking at two independent sources of infection.
Your task is to write a function
bool approximatelyMatch(const string& one, const string& two, int maxDistance)
that takes as input two strings representing strands of DNA and a number maxDistance, then returns whether the edit distance between the two DNA strands is maxDistance or less.
As a hint for this problem, look at the first characters of one and two and think about what you might do with what you find. What happens if they match? If they don’t match, you have three possible ways that you can get them to match – what are they, and how might you try them? And what happens if either input string is empty?
Some notes on this problem:
(Covered on Monday 10/22) As part of this problem, you should also implement unit tests to help test whether your implementation is correct. Specifically, you must implement at least 5 tests in addition to the ones provided in the starter code. These tests must cover distinct test cases.
Each test should have a detailed comment explaining what it's testing for as well as a descriptive test name. Moreover, each test you add should test different aspects than the existing tests. Refer to the notes on good testing practices from Lecture 13 (on Monday 10/22) for ideas about what sorts of tests you should write.
Each of your tests should be a function in DNADetectiveTests.cpp. To create a test, you will use the following format, adding both your description and a body of the test. You can see the examples in that file for further reference.
/* Comment here with more info about the test */
ADD_TEST("Put your description of the test here") {
/* body of the test */
}
You should submit your expanded testing file with your assignment.
The President of the United States is not elected by a popular vote, but by a majority vote in the Electoral College. Each state, plus DC, gets some number of electors in the Electoral College, and whoever they vote in becomes the next President. For the purposes of this problem, we're going to make some simplifying assumptions:
This problem explores the following question: under these assumptions, what's the fewest number of popular votes you can get and still be elected President?
Imagine that we have a list of information about each state, represented by this handy struct:
struct State {
string name; // The name of the state
int electoralVotes; // How many electors it has
int popularVotes; // The number of people in that state who voted
};
This struct contains the name of the state, the number of electors (to the Electoral College) the state gets, and its voting population. Your task is to write a function
MinInfo minPopularVoteToWin(const Vector<State>& states);
that takes as input a list of all the states that participated in the election (plus DC, if appropriate), then returns some information about the minimum number of popular votes you’d need in order to win the election (namely, how many votes you’d need, and which states you’d carry in the process). The MinInfo structure is defined in the RecursionToTheRescue.h header and is essentially just a pair of a minimum popular vote total plus the list of states you’d carry in the course of reaching that total:
struct MinInfo {
int popularVotesNeeded; // How many popular votes you'd need.
Vector<State> statesUsed; // Which states you'd carry in the course of doing so.
};
To implement this function, you are required to implement the following recursive helper function that solves the following problem (this cannot be a wrapper function):
/* What are the minimum votes and states needed to get at least "electoralVotesNeeded"
* electoral votes, using only states from index "minStateIndex" and greater in the "states" vector? */
MinInfo minPopularVoteToGetAtLeast(int electoralVotesNeeded,
const Vector<State>& states, int minStateIndex);
Notice that if you solve this problem with electoralVotesNeeded set to a majority of the total electoral votes and with minStateIndex as 0, then you’ve essentially solved the original problem (do you see why?). You must make your original function a wrapper around this helper function that solves this specific problem.
In the course of solving this problem, you might find yourself in the unfortunate situation where, for some specific values of electoralVotesNeeded and minStateIndex, there’s no possible way to get that many votes using only the states from that index onwards. For example, if you’re short 75 electoral votes and only have a single state left, there’s nothing that you can do to win the election. In that case, you may want to have this helper function indicate that it’s not possible to secure that many votes. We recommend using the special value INT_MAX, which represents the maximum possible value that you can store in an integer. The advantage of this sentinel value is that you’re already planning on finding the strategy that requires the fewest popular votes, so if your sentinel value is greater than any possible legal number of votes, always choosing the option that requires the minimum number of votes will automatically pull you away from the sentinel value. Just remember to not add anything to INT_MAX!. Doing so will cause it to turn into a negative number, which may introduce bugs in your program.
When you first implement this function, we strongly recommend testing it out using the simplified test cases we’ve provided you from the main menu. These test cases use real election data, but only consider ten states out of a larger election. That should make it easier for you to check whether your solution works on smaller examples.
Without using memoization, it’s almost guaranteed that your code won’t be fast enough to work with the full elections data. There’s just too many subsets of the states to consider. To scale this up to work with actual elections data, you'll need to work memoization into your solution. The good news is that, if you’ve followed the strategy we’ve outlined above, you should find that it’s relatively straightforward to introduce memoization into your solution. You may add additional parameters for memoization to the previously-mentioned helper function to do so. Remember that memoization helps cache previously-computed calculations. There are (in our tests) at most 270 electoral votes and 51 different possible values for minStateIndex = 270*51=13770 possible ways to call this function, at least from our tests. If we are blindly considering all possible subsets of 51 states, that's 2^51 possible subsets! So caching will undoubtedly cut down on the work we are doing. (Hint: a nested ADT, or a SparseGrid, may be helpful in storing the cache for memoizing. Think about how you can quickly look up cached values). Once you’ve gotten that working, try running your code on full elections data. I think you’ll be pleasantly surprised by how fast it runs!
Here are some general notes on this problem:
Right before the 2016 election, NPR reported that 23% of the popular vote would be sufficient to win the election, based on the 2012 voting data. They arrived at this number by looking at states with the highest ratio of electoral votes to voting population. This was a correction to their originally-reported number of 27%, which they got by looking at what it would take to win the states with the highest number of electoral votes. But the optimal strategy turns out to be neither of these and instead uses a blend of small and large states. Once you’ve gotten your program working, try running it on the data from the 2012 election. What percentage of the popular vote does your program say would be necessary to secure the presidency?
(A note: the historical election data here was compiled from a number of sources. In many early elections the state legislatures decided how to choose electors, and so in some cases we extrapolated to estimate the voting population based on the overall US population at the time and the total fraction of votes cast. This may skew some of the earlier election results. However, to the best of our knowledge, data from 1868 and forward is complete and accurate. Please let us know if you find any errors in our data!)There are tons of (optional) variations on these problems that you could imagine trying to solve and lots of ways you could apply them. Here are some suggestions: