Assignment 5: RSS News Feed Aggregator
Due Date: Saturday, February 22nd at 11:59PM
Learning Goals
This assignment gives you a chance to implement an index of information similar to that held by news.google.com. It acts as a transition from multiprocessing to multithreading. This assignment will help you:
- further develop your understanding of threads and thread-related concurrency issues
- hone your skills identifying and eliminating race conditions
- become more familiar with how to use mutexes and semaphores to manage coordination between threads.
Starter Code
All coding should be done on a myth cluster machine, as that's where we'll be testing all submissions. Clone the starter project using the command

git clone /afs/ir/class/cs110/repos/assign5/$USER assign5

Compile often, test incrementally and almost as often as you compile, run ./tools/sanitycheck a bunch to test your work, and run ./tools/submit when you're done. And as you've seen in past assignments, you can play with a sample application by invoking ./samples/aggregate_soln.
The starter project contains the following, categorized by whether or not you will be modifying them for this assignment. Note, however, that you need to become familiar with these files and their usage even if you do not modify them.
Files you must modify:
- The news-aggregator.h/cc files define and implement the class used to build and query an index. You'll update these to implement the version of the assignment that allocates new threads without bound.
Files you should consider modifying:
- rss-index.h/cc: define and implement the RSSIndex class, which models the news article index we've been talking about. An RSSIndex maintains information about all of the words in all of the various news articles that've been indexed. The code you add to your NewsAggregator class will end up interacting with an RSSIndex, so you'll want to be familiar with it—in particular, you should inspect the rss-index.h file so you're familiar with the add method. Note that the RSSIndex implementation is not thread-safe. Our solution changed these two files to codify the rules on how to handle articles with the same title but different URLs on the same server. You should consider doing the same thing.
Files you can, but are not required to, modify:
You shouldn’t need to change any of these files, although you're welcome to if it helps you arrive at a working program more quickly.
- test-union-and-intersection.cc: sample code on how to use the C++ algorithms most useful for managing token set intersections. As written, this standalone program doesn't directly contribute to your aggregate executable, but it's there as a reference so you have an idea how you should be using vectors, sort, and set_intersection. (You're free to tinker with this file if you'd like to.)
- utils.h/cc: define and implement a small number of URL and string utility functions that contribute to the implementation of several functions in your NewsAggregator class. The provided implementation of queryIndex calls the shouldTruncate and truncate functions, and you'll ultimately want to call the getURLServer function as you implant thread count limits on a per-server basis.
- log.h/cc: define and implement a small NewsAggregatorLog class that helps manage all of the progress and error messages that come up during program execution. You can read through the interface and implementation files to figure out what method is likely called where within working implementations of buildIndex and processAllFeeds. In fact, the logging might help you work through the workflow of program execution as you implement, test, and debug. You, however, are not required to use the logging facilities provided by the NewsAggregatorLog class if you don't want to.
- article.h: defines a simple record—the Article—that pairs a news article URL with its title. operator< has been overloaded so that it can compare two Article records, which means that Articles can be used as the keys in STL maps.
- *-exception.h: these three header files each define and inline-implement a custom exception class that gets instantiated and thrown when some network drama prevents one of the three parse methods from doing its job. It's possible the URL was malformed, or it's possible your WiFi connection hiccuped and your network connection was dropped mid-parse. Recall that functions and methods capable of throwing exceptions can be optionally placed under the jurisdiction of a try/catch block—see the tips section of the spec for an example.
Files you should not modify:
- aggregate.cc: a super small file that defines the main function and nothing else.
- html-document.h: defines the HTMLDocument class, which models a traditional HTML page. It provides functionality to pull an HTML document from its server and surface its payload—specifically, the plain text content under its <body> tag—as a sequence of tokens. The parse method manages the networking and processes the content, and the getTokens method provides access to the sequence of tokens making up the page. All of the news articles referenced by the RSS feeds are standard web pages, so you'll rely on this HTMLDocument class to pull each of them and parse them into their constituent tokens. We have provided you the header file so you can see the interface. You do not have access to the functions themselves.
- rss-feed.h: defines the RSSFeed class, which models a particular type of standardized XML file known as an RSS feed. The structure of an RSS feed document isn't important, since an RSSFeed object, when constructed around a URL of such a feed, knows how to pull and parse the XML document just as an HTMLDocument knows how to pull and parse an HTML document. The primary difference is that an RSSFeed produces a sequence of Articles, not a sequence of tokens. So, RSSFeeds produce Articles housing URLs, which can be fed to HTMLDocuments to produce words. We have provided you the header file so you can see the interface. You do not have access to the functions themselves.
- rss-feed-list.h: defines the RSSFeedList class, which models another XML file type whose format was invented for the purpose of this assignment. RSSFeedList's story is similar to that for RSSFeed, except that it surfaces a feed-title-to-feed-URL map. We have provided you the header file so you can see the interface. You do not have access to the functions themselves.
- stream-tokenizer.h/cc: the C++ equivalent of the Java StreamTokenizer class. The implementation is not pretty, but that's because it needs to handle UTF-8 encodings of strings that aren't necessarily ASCII. Fortunately, you should be able to ignore this class, since it's really only used by the already implemented HTMLDocument class to decompose pages into tokens. Feel free to peruse the implementation if you want, or ignore it. Just understand that it's there and contributing to the overall solution. You should not need to read or modify this file.
Overview
Virtually all major newspapers and TV news stations have bought into this whole Internet thing. What you may not know is that many major media corporations serve up RSS feeds summarizing the news stories that’ve aired or gone to press in the preceding 24 hours. RSS news feeds are XML documents housing information about online news articles. If we can get the feeds, we can get the articles, and if we can get the articles, we can build an index of information similar to that held by news.google.com. That’s precisely what you’ll be doing for the next seven days.
Assignment 5 is shorter than past assignments, and it acts as a transition from multiprocessing to multithreading while letting you gain experience with C++ threads, mutexes, and semaphores. Assignment 6, which goes out a week from today, will have you revisit some of Assignment 5's design decisions and force you to be much more conservative about the number of actual threads you create over the program's lifetime.
While working on this assignment, see the end section for helpful C++ tips.
Indexing a news article amounts to little more than breaking the story down into its individual words and noting how often each word appears. If a particular word appears a good number of times, then said word is probably a good indicator as to what the web page is all about. Once all of the world's online RSS news feeds have been indexed, you can ask the index for a list of stories about a specific person, place, or thing.
If, for instance, you want to learn more about the movies in the news lately, you might type in this:
myth56$ ./samples/aggregate_soln --url medium-feed.xml
Enter a search term [or just hit <enter> to quit]: movie
That term appears in 99 articles. Here are the top 15 of them:
1.) "Review: The engrossing dramas 'Beanpole' and 'And Then We Danced'.....place" [appears 13 times].
"https://www.latimes.com/entertainment-arts/movies/story/2020-02-1.....views"
2.) "Why Blumhouse flipped the '70s TV classic 'Fantasy Island' into a.....iller" [appears 13 times].
"https://www.latimes.com/entertainment-arts/movies/story/2020-02-1.....house"
3.) "How Julia Louis-Dreyfus and Will Ferrell made 'Downhill' at a wor.....esort" [appears 10 times].
"https://www.latimes.com/entertainment-arts/movies/story/2020-02-1.....eyfus"
4.) "Review: After his makeover, 'Sonic the Hedgehog' is legit funny, .....ining" [appears 9 times].
"https://www.latimes.com/entertainment-arts/movies/story/2020-02-1.....arrey"
5.) "Review: Just in time for Valentine's Day, 'The Photograph' casts .....spell" [appears 9 times].
"https://www.latimes.com/entertainment-arts/movies/story/2020-02-1.....a-rae"
6.) "Visit the places where the 2020 Oscar-nominated movies were filmed" [appears 9 times].
"https://www.latimes.com/travel/story/2020-02-07/where-to-visit-os.....tings"
7.) "How 'Sonic the Hedgehog' director saved the movie from massive in.....klash" [appears 8 times].
"https://www.latimes.com/entertainment-arts/business/story/2020-02.....-work"
8.) "'To All the Boys 2': How Lana Condor and Jenny Han made a teen he.....elves" [appears 7 times].
"https://www.latimes.com/entertainment-arts/movies/story/2020-02-1.....y-han"
9.) "Apple and Amazon are transforming Culver City. Should they pay mo.....axes?" [appears 6 times].
"https://www.latimes.com/entertainment-arts/business/story/2020-02.....taxes"
10.) "Review: 'Fantasy Island' movie squanders premise on thrill-free horror" [appears 6 times].
"https://www.latimes.com/entertainment-arts/movies/story/2020-02-1.....orror"
11.) "Feedback: From 'Parasite's' win to In Memoriam snubs, readers sou.....scars" [appears 6 times].
"https://www.latimes.com/entertainment-arts/story/2020-02-13/oscar.....dback"
12.) "Movies on TV this week: Sunday, Feb. 16, 2020: 'Mr. Smith Goes to..... more" [appears 6 times].
"https://www.latimes.com/entertainment-arts/story/2020-02-14/movie.....-more"
13.) "Kirk Douglas gave millions for Hollywood's Alzheimer's patients" [appears 5 times].
"https://www.latimes.com/entertainment-arts/business/story/2020-02.....-fund"
14.) "Oscars 2020: Netflix wins just two awards as 'Parasite' boosts Neon" [appears 5 times].
"https://www.latimes.com/entertainment-arts/business/story/2020-02.....oscar"
15.) "Review: Clever plotting brings atmospheric ghost story 'Camp Cold..... life" [appears 5 times].
"https://www.latimes.com/entertainment-arts/movies/story/2020-02-1.....urray"
Interested in a different medium? Let's try music:
Enter a search term [or just hit <enter> to quit]: music
That term appears in 157 articles. Here are the top 15 of them:
1.) "Discover a living legend: Singer, songwriter, visual artist, Texa.....Allen" [appears 16 times].
"https://www.latimes.com/entertainment-arts/music/story/2020-02-14.....-dick"
2.) "What Grammys? Try these 12 places for the best live music in America" [appears 15 times].
"https://www.latimes.com/travel/story/2020-01-23/never-mind-grammy.....enues"
3.) "Movies on TV this week: Sunday, Feb. 16, 2020: 'Mr. Smith Goes to..... more" [appears 13 times].
"https://www.latimes.com/entertainment-arts/story/2020-02-14/movie.....-more"
4.) "Huey Lewis on his struggle with hearing loss: 'I thought I might .....self'" [appears 10 times].
"https://www.latimes.com/entertainment-arts/music/story/2020-02-12.....sease"
5.) "Confront your fear of bumblebees: Nose into human-size nests at D.....rdens" [appears 8 times].
"https://www.latimes.com/lifestyle/story/2020-01-16/art-embraces-t.....hibit"
6.) "Best concerts in L.A. this week: Courtney Barnett, Mac DeMarco, Ginuwine" [appears 7 times].
"https://www.latimes.com/entertainment-arts/music/story/2020-02-12.....rnett"
7.) "How a fire in the Inland Empire could spell doom for the worldwid..... boom" [appears 7 times].
"https://www.latimes.com/entertainment-arts/music/story/2020-02-14.....cquer"
8.) "Vegas tips from the pros: Lounge singers share their favorite off.....gouts" [appears 7 times].
"https://www.latimes.com/entertainment-arts/story/2020-01-30/las-v.....picks"
9.) "Eight things to do in L.A. and O.C., including Valentine's Day shows" [appears 7 times].
"https://www.latimes.com/entertainment-arts/story/2020-02-13/thing.....avoom"
10.) "California Sounds: L.A.'s Kamasi Washington goes 'Live at the Apollo'" [appears 7 times].
"https://www.latimes.com/entertainment-arts/story/2020-02-14/calif.....ntary"
11.) "Morrissey, Bauhaus, Blondie headline new '80s alt-music festiva.....World" [appears 6 times].
"https://www.latimes.com/entertainment-arts/music/story/2020-02-11.....s-80s"
12.) "What doesn't kill Ozzy Osbourne makes him ... even Ozzier" [appears 6 times].
"https://www.latimes.com/entertainment-arts/music/story/2020-02-12.....alone"
13.) "Camila Cabello (sort of) launches acting career with old Hollywoo.....h My'" [appears 4 times].
"https://www.latimes.com/entertainment-arts/music/story/2020-02-12.....-baby"
14.) "Review: On 'Changes,' Justin Bieber finds salvation through hot m.....t R&B" [appears 4 times].
"https://www.latimes.com/entertainment-arts/music/story/2020-02-13.....eview"
15.) "What's on TV This Week: 'Washington,' 'Rocketman' and more" [appears 4 times].
"https://www.latimes.com/entertainment-arts/story/2020-02-14/whats.....etman"
Sadly, sometimes something perfectly awesome doesn’t get mentioned at all:
Enter a search term [or just hit <enter> to quit]: CS110
Ah, we didn't find the term "CS110". Try again.
Milestone 1: Implementing Sequential aggregate
Your NewsAggregator class—that's the primary class in your assign5 repo to help manage the construction and interrogation of your news article index—ultimately needs to provide a multithreaded version of a method called processAllFeeds, which downloads a collection of RSS feeds and the HTML news articles they reference to build an in-memory version of your index. You're free to implement the multithreaded version right from the start and forgo an intermediate, sequential version. However, we strongly recommend getting a sequential version working first to ensure everything works predictably, that you've made proper use of all of the helper classes we've provided, and that your sequential application's output matches that produced by the sample (see Note below). This is the approach we took when implementing this assignment, and is the one that will allow us to best help you in office hours. Incremental development is the key to arriving at a robust, bug-free executable!

Note: you might see small differences between the sample application and your own, because a small fraction of an article's content—the advertising, most often—is dynamically generated for each download and influences how the index is built each time. The sanitycheck tool, however, has been seeded with static RSS feeds and HTML news articles, so you should rely on it for a reliable test framework.
Here are details that apply to both the sequential and the later multithreaded version:
- You may not change the implementation of the NewsAggregator::queryIndex method. It outputs the results in the exact format we'll expect to be used by the autograding process, so changing it would be a clear path to a 0.
- Ensure that you never ever download the same exact URL twice. The feed lists and RSS feeds may surface the same URL more than once (e.g. the New York Daily News's world-news and religion feeds may each include the same URL to an article about the Vatican), so you need to ensure that you detect the duplicates. You should download an article the first time you see its URL, but ignore all of the others. If you try to pull a document and the networking fails for whatever reason (i.e. RSSFeed::parse or HTMLDocument::parse throws an exception), you shouldn't try to download it a second time, and you should still ignore any other repeated requests to download the same URL during program execution. Long story short: every URL gets one chance to contribute.
- When two or more HTML articles share the same title and are hosted by the same server (i.e. their URLs have the same domain and subdomain—perhaps both include arts.nytimes.com), don't dismiss all but one of them. It's safe to assume they're all really the same article, but download all of them anyway, compute the running intersection of their token sets, and file the token intersection under the lexicographically smallest URL. So if, for instance, http://www.jerrynews.com/a.html, http://www.jerrynews.com/b.html, and http://www.jerrynews.com/c.html all lead to articles titled "Rock Me Like a Jerry Cain", and independent downloads produce token vectors of ["a", "a", "b", "c", "a", "a"], ["a", "a", "b", "b", "a"], and ["a", "a", "b", "b", "c", "a", "d", "a"], respectively, ensure that the RSSIndex gets "Rock Me Like a Jerry Cain" under http://www.jerrynews.com/a.html with ["a", "a", "a", "b"] as tokens. This technique overcomes the fact that many news articles have dynamically generated advertising content that can be intersected away via multiple downloads. Implementing this will require you to investigate the sort and set_intersection functions, iterators, and/or back_inserters (at least these are the C++'isms we feel are best used to implement this part). Your repo includes a standalone program in test-union-and-intersection.cc that serves no other purpose than to illustrate how these STL built-ins work.
And here's some advice and words of caution.
- You should test with the small-feed.xml file until you're convinced you've reached the sequential milestone. The huge drawback of the sequential version—in fact, a drawback we deliberately highlight so we later see why programming with threads is so incredibly powerful—is that it takes a long time to load each article one after another. The small-feed.xml file should be good enough for the vast majority of your development needs until you're ready to tackle the concurrency requirements. At that point you can test the sequential version one last time by launching your aggregate executable against medium-feed.xml, walking away to binge watch five seasons of your favorite TV show, and then coming back to see if everything's been properly indexed. It just might be.
- Other feeds to test with: static-alphabet-feed.xml, which is a contrived feed that'll help confirm you properly coded up the requirement that the same URL is downloaded at most once, and that all of your token intersection logic is spot on; and single-server-source-feed.xml, where all HTML articles are drawn from the same exact server. This is really good for testing your per-server limits once you implant multithreading.
- Don't be alarmed if you see spurious parsing errors while testing (e.g. messages like "Operation in progress" come up quite a bit). If you're testing using small-feed.xml, then you're testing with live data, so you need to be super tolerant of the fact that some of the XML and HTML documents aren't perfectly structured. (The static feeds used by the sanitycheck tool have been curated to rely on articles that parse without drama.)
Milestone 2: Implementing Multithreaded aggregate
Practically speaking, the sequential, singly-threaded implementation of aggregate is unacceptably slow. Users don't want to wait 45 minutes for an application to launch, so you should introduce multithreading in order to speed things up. The sequential version of aggregate is slow because each and every request for an article from some remote server takes on the order of a second. If a single thread of execution sequentially processes each and every article, then said articles are requested, pulled, parsed, and indexed one after another. There's no overlap of network stall times, so the lag associated with network connectivity scales with the number of articles. It's like watching paint dry in 100% humidity.
Your NewsAggregator should be updated so that each RSS news feed is downloaded in its own child thread, and each article identified within each feed thread can be downloaded in its own grandchild thread. Each article, of course, still needs to be parsed and indexed. But much of the dead time spent just waiting for content to come over the wire can be scheduled to overlap. The news servers we poll for content are industrial strength and can handle hundreds if not thousands of requests per minute (or at least we're hoping so, lest Stanford IP addresses be blacklisted between now and the assignment deadline).

Naturally, multithreading and concurrency have their shortcomings. Each thread needs exclusive access to the index while doing surgery, so you'll need to introduce some mutexes to prevent race conditions from compromising your data structures.
Rather than outline a large and very specific set of constraints, it's your responsibility to do whatever it takes to make the application run more quickly without introducing any synchronization issues. The assignment is high on intellectual content—after all, this is the first time where an algorithmically sound application can break because of race conditions and deadlock—but honestly, there isn’t much additional coding once you get the sequential version up and running. You’ll spend a good amount of time figuring out where the sequential version should spawn off child threads, when those threads should spawn off grandchild threads, and how you’re going to use concurrency directives to coordinate thread communication.
Here’s a list of must-haves:
- Each news feed should be downloaded in its own child thread, though you should limit the number of such threads that can exist at any one time to 8.
- Each news article should be downloaded in its own grandchild thread, though you should limit the number of threads maintaining connections to any one server (e.g. www.latimes.com) to 10. Even heavy duty servers have a hard time responding to a flash mob of requests, so it's considered good networking etiquette to limit the number of active conversations with any one server to some small number. (Browsers are even more conservative and usually limit it to 2 or 3 by default.) Note that implementing this will be tricky, but you should be able to introduce a std::map<std::string, std::unique_ptr<semaphore>> to the private section of NewsAggregator (making it thread-safe to the extent you need to) and leverage the implementation strategies we use to implement the oslock and osunlock manipulators, the implementation of which can be viewed by looking in /usr/class/cs110/local/src/threads/ostreamlock.cc. (If the std::map<std::string, std::unique_ptr<semaphore>> proves to be too much work for you, then you can just go with std::map<std::string, semaphore *> instead.)
- Limit the total number of article download threads executing at any one time to 24. (And that's 24 total, not 24 per feed thread.) There's little sense in overwhelming the thread manager with a large thread count, so we'll impose the overall limit of 24. In practice, this number would be fine-tuned for the number of processors, the number of cores per processor, and performance analytics, but no need to worry about that. Just assume that 24 is the right number.
- Ensure that you never ever download—or even attempt to download—the same exact URL twice. This was stated above during the specification of the sequential version, but it's particularly important here, since two threads might initiate a download of the same URL at the same time unless you implant measures to forbid that. As it turns out, the XML and HTML parsing libraries don't deal well when asked to pull the same URL in two simultaneously executing threads, so make sure you don't.
- Manage articles with the same title and server as you did for the sequential part, being sensitive to the possibility that two such articles may be downloaded simultaneously in parallel threads.
- Ensure that access to all shared data structures is thread-safe, meaning that there's zero chance of deadlock, and there are no race conditions. Use mutexes to provide the binary locks needed to protect against race conditions within critical regions. Make sure that all locks are released no matter the outcome of a download (e.g. inside catch clauses as well).
- Ensure that the full index is built before advancing to the query loop.
- Ensure that all memory you allocate is freed when the full application exits. If you elect to dynamically allocate memory (as you almost certainly will need to with your per-server semaphore map), then make sure that's all properly donated back to the heap before the main function returns. (You can ignore any memory errors that stem from the XML library we rely on for parsing. It gets the job done, but it clearly wasn't written by a CS110 graduate. :))
- You can add logging code to emulate the logging using the provided NewsAggregatorLog class, but you aren't required to. We'll be testing your solutions using the --quiet flag, but you can add logging code just so you can see how much faster everything runs once you introduce threading.
- Reminder: you may not change the implementation of the NewsAggregator::queryIndex method. It outputs the results in the exact format we'll expect to be used by the autograding process, so changing it would be a clear path to a 0.
- In contrast to Assignment 4, your error messages needn't match ours.
If you take care to introduce threads as outlined above, then you'll dramatically speed up the configuration of your aggregate executable. Woo! You'll also get genuine experience with networking and the various concurrency patterns that come up in networked applications.
Grading
Your assignments will be exercised using the tests we've exposed, plus quite a few others. We reserve the right to add tests and change point values if during grading we find some features aren't being exercised, but we're fairly certain the breakdown presented below will be a very good approximation regardless.
News Aggregator Tests (82 points)
- Clean Build: 2 points
- Ensure that your aggregator handles a small feed list: 30 points
- Ensure that your aggregator handles a medium feed list: 20 points
- Ensure that your aggregator handles a large feed list: 20 points
- Ensure that you properly handle duplicate URLs and articles with the same title hosted by the same server: 10 points
Helpful Hints
Here's a compilation of hints that CS110 students have relied on in past quarters to complete Assignment 5:
- Recall that functions and methods capable of throwing exceptions can be optionally placed under the jurisdiction of a try/catch block, as with this:
HTMLDocument document(article.url);
try {
document.parse();
} catch (const HTMLDocumentException& hde) {
log.noteSingleArticleDownloadFailure(article);
return;
}
// got this far? then the article was downloaded just fine
- Because all state and functionality is managed by a C++ NewsAggregator class, you shouldn't need any global variables. Any state needed for aggregation can be maintained in the private section of the class declaration and manipulated by the class's public and private methods.
- Because your program is fully object oriented, you'll want to install NewsAggregator methods (as opposed to traditional functions) as your thread routines. Methods and functions are different, because the former relies on the address of the relevant object to be invisibly passed in via the this parameter. If you want to install a NewsAggregator method with signature foobar(int number, semaphore& s), the method pointer can be pushed through a thread constructor in one of the three ways listed below (all of which are equivalent, though we might argue for the second one in particular).
thread t([this, n, &sem] { foobar(n, sem); }); // first way
thread t([this](int number, semaphore& s) {
foobar(number, s);
}, n, ref(sem)); // second way
thread t(&NewsAggregator::foobar, this, n, ref(sem)); // third way
- When semaphores are used to limit the number of threads, it's not good enough to simply signal the limiting semaphore as the final statement of a thread routine, as with the code below. It's not good enough, because the surrounding thread may be swapped off the processor after the signal call but before the routine formally exits, thereby allowing a thread to still exist even though a permit allowing another thread to be created has been signaled. What you really want is to schedule a call to signal to be made just as the thread is being destroyed, and that second flavor of signal is realized through an overloaded version of signal, as demonstrated below.
// flawed attempt to limit the number of threads to 6 at all times
vector<thread> threads;
semaphore permits(6);
for (size_t i = 0; i < 250; i++) {
permits.wait();
threads.push_back(thread([this](semaphore& s) {
// thread safe code of your choosing
s.signal();
}, ref(permits)));
}
for (thread& t: threads) t.join();
// correct way to limit the number of threads to 6 at all times
vector<thread> threads;
semaphore permits(6);
for (size_t i = 0; i < 250; i++) {
permits.wait();
threads.push_back(thread([this](semaphore& s) {
// thread safe code of your choosing
s.signal(on_thread_exit); // schedule permits to be signaled as surrounding thread is destroyed
}, ref(permits)));
}
for (thread& t: threads) t.join();
In fact, because this second version of signal schedules the internal ++ to happen when the thread exits, you can hoist the call to the top of the thread routine so that you know it's invoked no matter how the routine exits, as presented below.
// correct way to limit the number of threads to 6 at all times
vector<thread> threads;
semaphore permits(6);
for (size_t i = 0; i < 250; i++) {
permits.wait();
threads.push_back(thread([this](semaphore& s) {
s.signal(on_thread_exit); // schedule permits to be signaled as surrounding thread is destroyed
// thread safe code of your choosing
}, ref(permits)));
}
for (thread& t: threads) t.join();
- Be careful when you share references to data with your thread routines. Sometimes you absolutely need to, but you need to be careful that the shared reference doesn't lead to data that unexpectedly changes or is destroyed before the thread routine makes use of it. You need to be sensitive to the very type of race condition we saw in lecture a little over a week ago that justified why mutexes were important in the first place.
- Take care to decompose and test the code of yours that manages duplicates and running intersections, because the same exact code is needed for your Assignment 6 solution. You'll more or less need to pull that code into your own Assignment 6 repo once it's posted.