Week 7 Exercises: Farm meets multithreading
Excellent job on making it well past the halfway point of the quarter! I’m so proud of all of your progress, and I hope you feel accomplished as well!
The goal of this week’s exercise is to get you thinking about multithreading material and to help you ask questions about lecture material that still feels confusing. Please ask questions! The exercise is designed to be light in order to give you time to focus on Project 1. It should take approximately an hour.
Due date: Wednesday, Feb 23, 11:59pm (Pacific time)
(I can extend this deadline for you if you are having a really rough week. And ping me on Slack if you are having difficulty with this assignment!*
Getting the code
You should have received an invite to join this week’s Github repository. If you didn’t get an email invite, try going to this link:
https://github.com/cs110l/week7-YOURSUNETID
You can download the code using git as usual:
git clone https://github.com/cs110l/week7-YOURSUNETID.git week7
You don’t need a Linux setup for this, so you can code locally as long as you have a Rust toolchain installed.
Part 1: Farm
This week, we’ll take a stab at implementing a simplified version of farm
from your CS 110 assignment 3. This
program uses multiple instruction streams to factor numbers in parllel. One twist: we’ll
use multithreading instead of multiprocessing!
This version of farm will receive the numbers to factor using command line
arguments instead of reading from stdin. It can be run like so:
🍉 cargo run 12345678 12346789 34567890 45678902
Finished dev [unoptimized + debuginfo] target(s) in 0.29s
Running `target/debug/farm 12345678 12346789 34567890 45678902`
Farm starting on 8 CPUs
12345678 = 2 * 3 * 3 * 47 * 14593 [time: 420.687768ms]
12346789 = 7 * 13 * 19 * 37 * 193 [time: 678.612537ms]
34567890 = 2 * 3 * 5 * 7 * 97 * 1697 [time: 1.176596148s]
45678902 = 2 * 433 * 52747 [time: 1.812403818s]
Total execution time: 1.812585221s
Note: when working through this problem, I recommend referencing “attempt 4” from the ticket agents problem that we worked through in class as an example (code with comments here). It’s similar to what we’re trying to do in these exercises: it creates threads; creates and moves shared, mutable, and protected data; coordinates to modify that data without introducing race conditions and without serializing the code; and joins the threads.
You can implement this version of farm in three steps.
- First, establish the queue of numbers to factor.
- You will want to use the
get_input_numbers()function that we have provided. (This will return a VecDeque, which acts like a double-ended queue; the documentation is here, but you won’t need to read up on it.) - I’d recommend drawing on lecture examples to make this queue safe to access and modify from multiple threads.
- You will want to use the
- Spawn
num_threadsworker threads. In each thread, using a loop, pop a number off the queue and factor that number (by callingfactor_number) until no numbers remain on the queue.- You can use
VecDeque::pop_frontto get an element from the quue. This returns anOption<u32>, which is the first element in the collection, orNoneif the collection is empty. - You should be careful to think about when the thread holds a mutex. You’ll want threads to hold the mutex while getting a number from the queue, but they shouldn’t hold it when factoring a number – this would serialize the computation!
- I recommend writing a helper function that borrows your data structure, pops a number from the front of the queue, and returns an
Option<u32>(you can tell that the queue is empty when this returnsNone). Creating a function creates a “scope”; the lock is held only for the duration of that function.
- You can use
- Wait for all of your threads to finish by calling
thread.join()on each of them. If you don’t do this, your program will compile fine, but you may have problems where some numbers are missing from the output because the program exited before the threads could finish.
When you run your program, pay attention to the total execution time printed at the end of the program. If it is significantly longer than the time taken to factor any one number, you may be accidentally serializing your program by holding a lock for too long.
When you are finished, run cargo build && cargo test to verify that your
implementation is working properly:
🍉 cargo build && cargo test
Finished dev [unoptimized + debuginfo] target(s) in 0.06s
Finished test [unoptimized + debuginfo] target(s) in 1.16s
Running target/debug/deps/farm-75667e3d01bacd18
running 3 tests
test test::test_one_number ... ok
test test::test_multiple_numbers ... ok
test test::test_parallelism ... ok
test result: ok. 3 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out
Fun fact: This implementation is made significantly easier by the fact that
we have all the numbers to factor up front (from argv) instead of receiving
them at some point via stdin. Your CS 110 farm implementation needed to
contend with the fact that an empty queue does not mean workers can exit
(they need to stick around in case another number arrives later via stdin).
Here, workers can pull numbers directly off the queue (there is no main process
broadcasting numbers to workers), and workers can exit as soon as the queue
becomes empty, since there is no way for additional numbers to come in. In the
coming weeks, you’ll learn how to manage worker threads in cases where
additional work might arrive after a queue becomes empty, and you’ll implement
such a program in CS 110 assignment 6.
Part 1 is worth 70%. You’ll get credit for a solid effort on each bullet point above, including partial credit for a partial implementation.
Part 2: Week 8 survey
Part 2 of this assignment is to fill out this survey.
Part 2 is worth 30%. You’ll get full credit if you fill out the survey with reasonable thoughtfulness (e.g., leaving every question blank doesn’t count, and I’m expecting a few sentences for the “reflect on 110L” and “reflect on project 1” questions).
Optional Part 3 for CS 110 students: Run ThreadSanitizer on your ThreadPool
Recall from the week 1 exercises that sanitizers add extra instrumentation to your code to identify when it does suspicious things, such as reading from uninitialized memory or writing out of bounds on an array. There is another sanitizer called ThreadSanitizer (sometimes called TSAN) that is extremely useful for identifying data races. ThreadSanitizer observes memory accessses in your code, looking for threads accessing the same data without synchronizing using locks.
You can run ThreadSanitizer on your assignment 5 ThreadPool code by editing your Makefile
and adding -fsanitize=thread to the CXXFLAGS and LDFLAGS variables:
-CXXFLAGS = -g $(WARNINGS) -O0 -std=c++17 $(DEPS) $(DEFINES) $(INCLUDES)
+CXXFLAGS = -g $(WARNINGS) -O0 -std=c++17 $(DEPS) $(DEFINES) $(INCLUDES) -fsanitize=thread
LDFLAGS = -lm -lxml2 -L/afs/ir/class/cs110/local/lib -lrand -lthreadpoolrelease -lthreads -lrssnet -pthread -lboost_thread \
-L/afs/ir/class/cs110/lib/netlib -lcppnetlib-client-connections -lcppnetlib-uri -lcppnetlib-server-parsers \
- -L/afs/ir/class/cs110/lib/myhtml -lmyhtml -lssl -lcrypto -ldl
+ -L/afs/ir/class/cs110/lib/myhtml -lmyhtml -lssl -lcrypto -ldl -fsanitize=thread
From Ryan: “I tried running this on our sample solution. Interestingly, TSAN gives a lot of errors on ./aggregate which seem to be coming from the networking and HTML parsing libraries that we are using. I haven’t had time to investigate whether these are legitimate errors, or if these libraries are using special synchronization primitives that TSAN doesn’t know about. Either way, because there are so many errors unrelated to your code, I don’t think TSAN will be helpful for auditing NewsAggregator.
However, TSAN should be extremely helpful for auditing your ThreadPool implementation. If you make the above changes to your Makefile and run make clean && make, you should be able to run ./tptest and ./tpcustomtest --all and get a clean output with no TSAN warnings.
If you do work through this, I would love to hear how it goes for you. What sorts of mistakes did TSAN find?
Submitting your work
As with last week, you can commit your progress using git:
git commit -am "Type some title here to identify this snapshot!"
In order to submit your work, commit it, then run git push. This will upload
your commits (snapshots) to Github, where we can access them. You can verify
that your code is submitted by visiting
https://github.com/cs110l/week7-yourSunetid and browsing the code there. You
can git push as many times as you’d like.