Week 7 Exercises: Farm meets multithreading

Excellent job on making it well past the halfway point of the quarter! I’m so proud of all of your progress, and I hope you feel accomplished as well!

The goal of this week’s exercise is to get you thinking about multithreading material and to help you ask questions about lecture material that still feels confusing. Please ask questions! The exercise is designed to be light in order to give you time to focus on Project 1. It should take approximately an hour.

Due date: Wednesday, Feb 23, 11:59pm (Pacific time)

(I can extend this deadline for you if you are having a really rough week. And ping me on Slack if you are having difficulty with this assignment!*

Getting the code

You should have received an invite to join this week’s Github repository. If you didn’t get an email invite, try going to this link:

https://github.com/cs110l/week7-YOURSUNETID

You can download the code using git as usual:

git clone https://github.com/cs110l/week7-YOURSUNETID.git week7

You don’t need a Linux setup for this, so you can code locally as long as you have a Rust toolchain installed.

Part 1: Farm

This week, we’ll take a stab at implementing a simplified version of farm from your CS 110 assignment 3. This program uses multiple instruction streams to factor numbers in parllel. One twist: we’ll use multithreading instead of multiprocessing!

This version of farm will receive the numbers to factor using command line arguments instead of reading from stdin. It can be run like so:

🍉  cargo run 12345678 12346789 34567890 45678902
    Finished dev [unoptimized + debuginfo] target(s) in 0.29s
     Running `target/debug/farm 12345678 12346789 34567890 45678902`
Farm starting on 8 CPUs
12345678 = 2 * 3 * 3 * 47 * 14593 [time: 420.687768ms]
12346789 = 7 * 13 * 19 * 37 * 193 [time: 678.612537ms]
34567890 = 2 * 3 * 5 * 7 * 97 * 1697 [time: 1.176596148s]
45678902 = 2 * 433 * 52747 [time: 1.812403818s]
Total execution time: 1.812585221s

Note: when working through this problem, I recommend referencing “attempt 4” from the ticket agents problem that we worked through in class as an example (code with comments here). It’s similar to what we’re trying to do in these exercises: it creates threads; creates and moves shared, mutable, and protected data; coordinates to modify that data without introducing race conditions and without serializing the code; and joins the threads.

You can implement this version of farm in three steps.

First, establish the queue of numbers to factor.
- You will want to use the get_input_numbers() function that we have provided. (This will return a VecDeque, which acts like a double-ended queue; the documentation is here, but you won’t need to read up on it.)
- I’d recommend drawing on lecture examples to make this queue safe to access and modify from multiple threads.
Spawn num_threads worker threads. In each thread, using a loop, pop a number off the queue and factor that number (by calling factor_number) until no numbers remain on the queue.
- You can use VecDeque::pop_front to get an element from the quue. This returns an Option<u32>, which is the first element in the collection, or None if the collection is empty.
- You should be careful to think about when the thread holds a mutex. You’ll want threads to hold the mutex while getting a number from the queue, but they shouldn’t hold it when factoring a number – this would serialize the computation!
- I recommend writing a helper function that borrows your data structure, pops a number from the front of the queue, and returns an Option<u32> (you can tell that the queue is empty when this returns None). Creating a function creates a “scope”; the lock is held only for the duration of that function.
Wait for all of your threads to finish by calling thread.join() on each of them. If you don’t do this, your program will compile fine, but you may have problems where some numbers are missing from the output because the program exited before the threads could finish.

When you run your program, pay attention to the total execution time printed at the end of the program. If it is significantly longer than the time taken to factor any one number, you may be accidentally serializing your program by holding a lock for too long.

When you are finished, run cargo build && cargo test to verify that your implementation is working properly:

🍉  cargo build && cargo test
    Finished dev [unoptimized + debuginfo] target(s) in 0.06s
    Finished test [unoptimized + debuginfo] target(s) in 1.16s
     Running target/debug/deps/farm-75667e3d01bacd18

running 3 tests
test test::test_one_number ... ok
test test::test_multiple_numbers ... ok
test test::test_parallelism ... ok

test result: ok. 3 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out

Fun fact: This implementation is made significantly easier by the fact that we have all the numbers to factor up front (from argv) instead of receiving them at some point via stdin. Your CS 110 farm implementation needed to contend with the fact that an empty queue does not mean workers can exit (they need to stick around in case another number arrives later via stdin). Here, workers can pull numbers directly off the queue (there is no main process broadcasting numbers to workers), and workers can exit as soon as the queue becomes empty, since there is no way for additional numbers to come in. In the coming weeks, you’ll learn how to manage worker threads in cases where additional work might arrive after a queue becomes empty, and you’ll implement such a program in CS 110 assignment 6.

Part 1 is worth 70%. You’ll get credit for a solid effort on each bullet point above, including partial credit for a partial implementation.

Part 2: Week 8 survey

Part 2 of this assignment is to fill out this survey.

Part 2 is worth 30%. You’ll get full credit if you fill out the survey with reasonable thoughtfulness (e.g., leaving every question blank doesn’t count, and I’m expecting a few sentences for the “reflect on 110L” and “reflect on project 1” questions).

Optional Part 3 for CS 110 students: Run ThreadSanitizer on your ThreadPool

Recall from the week 1 exercises that sanitizers add extra instrumentation to your code to identify when it does suspicious things, such as reading from uninitialized memory or writing out of bounds on an array. There is another sanitizer called ThreadSanitizer (sometimes called TSAN) that is extremely useful for identifying data races. ThreadSanitizer observes memory accessses in your code, looking for threads accessing the same data without synchronizing using locks.

You can run ThreadSanitizer on your assignment 5 ThreadPool code by editing your Makefile and adding -fsanitize=thread to the CXXFLAGS and LDFLAGS variables:

-CXXFLAGS = -g $(WARNINGS) -O0 -std=c++17 $(DEPS) $(DEFINES) $(INCLUDES)
+CXXFLAGS = -g $(WARNINGS) -O0 -std=c++17 $(DEPS) $(DEFINES) $(INCLUDES) -fsanitize=thread
 LDFLAGS = -lm -lxml2 -L/afs/ir/class/cs110/local/lib -lrand -lthreadpoolrelease -lthreads -lrssnet -pthread -lboost_thread \
           -L/afs/ir/class/cs110/lib/netlib -lcppnetlib-client-connections -lcppnetlib-uri -lcppnetlib-server-parsers \
-          -L/afs/ir/class/cs110/lib/myhtml -lmyhtml -lssl -lcrypto -ldl
+          -L/afs/ir/class/cs110/lib/myhtml -lmyhtml -lssl -lcrypto -ldl -fsanitize=thread

From Ryan: “I tried running this on our sample solution. Interestingly, TSAN gives a lot of errors on ./aggregate which seem to be coming from the networking and HTML parsing libraries that we are using. I haven’t had time to investigate whether these are legitimate errors, or if these libraries are using special synchronization primitives that TSAN doesn’t know about. Either way, because there are so many errors unrelated to your code, I don’t think TSAN will be helpful for auditing NewsAggregator.

However, TSAN should be extremely helpful for auditing your ThreadPool implementation. If you make the above changes to your Makefile and run make clean && make, you should be able to run ./tptest and ./tpcustomtest --all and get a clean output with no TSAN warnings.

If you do work through this, I would love to hear how it goes for you. What sorts of mistakes did TSAN find?

Submitting your work

As with last week, you can commit your progress using git:

git commit -am "Type some title here to identify this snapshot!"