Assn 3, Generative Grading

Due: Thursday, Oct 31st 11:59PM
Grace: Sunday, Nov 7th 11:59PM

Starter Code

Download the Part 1 starter code!

Download the Part 2 starter code (for training your neural net)!

Question 0: Get to know the task

In this assignment you are going to learn to grade real student work for Code.org's unit on teaching nested for loops. Your first job is to solve the problem yourself.

At some point, make a program that is incorrect because the inner repeat is written incorrectly. Try to get a hint (hit the lightbulb). In your writeup include an example of hint text that is not relevant to the corresponding code.

Question 1: What is the distribution of solutions?

Plot the rank of each distinct solution (e.g. rank 1 is the most common solution) against its frequency. What is the resulting distribution? It should look something like this:
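For the plotting step, here is a minimal sketch of how the rank and frequency pairs might be computed before they are handed to a plotting library. The `submissions` list is a hypothetical stand-in for the real student programs, and `rank_frequency` is an illustrative helper, not part of the starter code.

```python
from collections import Counter


def rank_frequency(counts):
    """Return (rank, frequency) pairs; rank 1 is the most common solution."""
    freqs = sorted(counts.values(), reverse=True)
    return [(rank, freq) for rank, freq in enumerate(freqs, start=1)]


# Hypothetical toy data: each string stands for one submitted program.
submissions = ["A", "A", "A", "A", "B", "B", "C"]
pairs = rank_frequency(Counter(submissions))

for rank, freq in pairs:
    print(rank, freq)
```

Plotting both axes on a log scale usually makes the shape of a heavy-tailed distribution easier to read.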

Question 2: Generate

Write a grammar using ideaToText that is able to simulate at least 100k solutions to the task (of which at least 10k should be unique). Each solution should be connected to the set of decisions that led to the text. Try to get the distribution of your simulator to look as much like the true distribution of students as you can.
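To illustrate what "connected to the set of decisions" could mean, here is a rough sketch in plain Python. It deliberately does not use the ideaToText API. The decision names, the probabilities, and the output tokens (including `Repeat`) are all made-up placeholders rather than the real grammar or the exact pseudocode format.

```python
import random


def sample_solution(rng):
    """Sample one program string and record the decisions that produced it."""
    decisions = {}
    # Hypothetical decision: does the student wrap the square side in a repeat?
    decisions["uses_repeat"] = rng.random() < 0.8
    # Hypothetical decision: which turn angle the student picks.
    decisions["turn_angle"] = rng.choices([90, 45, 180], weights=[0.85, 0.10, 0.05])[0]

    side = f"Move Turn {decisions['turn_angle']}"
    body = f"Repeat 4 {side}" if decisions["uses_repeat"] else side
    return f"Program WhenRun {body}", decisions


rng = random.Random(0)  # fixed seed so the samples are reproducible
samples = [sample_solution(rng) for _ in range(5)]
for text, decisions in samples:
    print(text, decisions)
```

Keeping the decisions next to the text is what later lets each generated program carry its own labels.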

The "solutions" that you simulate should be written in CodeDotOrg pseudocode. Do *not* worry about whitespace; it will be normalized out. Here is an example of CodeDotOrg pseudocode for the starter code and the solution. These examples are exhaustive in coverage of the blocks that students can use in their solutions (for now we are going to ignore the "brushed blocks").

Deliverable: Submit your Grammar. You should also submit a "counts" map that associates PseudoCode strings with number of times you generated that particular solution. Your counts map should be saved as a Python3 pickle.
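A minimal sketch of building and saving such a counts map, assuming your generator yields one pseudocode string per sample. The example strings and the file name `counts.pkl` are hypothetical.

```python
import os
import pickle
import tempfile
from collections import Counter

# Hypothetical generated solutions; in practice these come from your grammar.
generated = [
    "Program WhenRun Move 30",
    "Program WhenRun Move 30",
    "Program WhenRun Turn 180",
]

# Map each pseudocode string to the number of times it was generated.
counts = dict(Counter(generated))

# Written to a temporary directory here; use your own submission path.
path = os.path.join(tempfile.mkdtemp(), "counts.pkl")
with open(path, "wb") as f:
    pickle.dump(counts, f)

# Reload to confirm the file round-trips.
with open(path, "rb") as f:
    loaded = pickle.load(f)
```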

Question 3: Train a Neural Network

Train a neural network on the samples you produced in Question 2. We have built a boilerplate and included "labelled" data which you can use to evaluate your model. We want you to build a useful grammar before looking at the labelled data.


  1. Add “labels” to your ideaToText (particular labels)
  2. Simulate some data from your grammar and save it (in a particular format)
  3. Write your neural network model in
  4. Run the model
  5. Compile Results

1. Rubric Labels

Update your grammar to also keep track of particular error cases that we would like to use for giving feedback to students. We strongly encourage you to use the updateRubric function in ideaToText. Here is a list of the labels (yours have to be the same as ours, since we will evaluate your model on its ability to recreate these labels):

// they don't have a for loop for drawing many squares

// they don't have code for drawing a square

// they don't have code for drawing a single side of a square

// their many-square loop has at least one ???

// they don't understand the order of the for loop's params

// they don't understand the starting value of many-square loop

// they don't understand the end value of many-square loop

// they don't understand the delta value of many-square loop

// they have extra commands when drawing a square

// they attempt to draw a square without using a repeat

// they have the wrong number of sides in a square

// when drawing the side of a square they forgot to turn left

// when drawing the side of a square they forgot to move

// when drawing the side of a square they mixed up the order of move/left

// they have extra commands when drawing a square side

// in at least one turn, they don't use 90 degrees

// in at least one turn, they use turn right instead of turn left

// they don't move forward Counter pixels

2. Simulate Data from Grammar

Use your generative grammar to generate a lot of labelled programs. Save your data as a pickle file with the following structure:
	[
		{
			'code': 'Program WhenRun ...',
			'labels': ['side-armsLength', 'turn-rightLeftConfusion', ...],
		},
		{
			'code': ...,
			'labels': [..., ..., ...],
		},
		...
	]
In particular, we are expecting a list of dictionaries where each dictionary has two keys: code and labels. The former is a string of your generated pseudocode and the latter is a list of labels. Each label must be one of the rubric labels listed above (you can have other labels in your grammar, but do not include them in this list). Save the pickle file to the data/ folder in the part 2 starter code.
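Before saving, it may help to check that every example has the expected shape. The sketch below is one way to do that. `ALLOWED_LABELS` contains only the two label names that appear in the example above and should be extended to the full rubric list. `validate_dataset` and the file name `raw_data.pickle` are hypothetical.

```python
import os
import pickle
import tempfile

# Only the two label names shown in the example; extend to the full rubric set.
ALLOWED_LABELS = {"side-armsLength", "turn-rightLeftConfusion"}


def validate_dataset(dataset, allowed_labels):
    """Raise ValueError if any example does not match the expected structure."""
    for i, example in enumerate(dataset):
        if set(example) != {"code", "labels"}:
            raise ValueError(f"example {i}: expected exactly the keys 'code' and 'labels'")
        if not isinstance(example["code"], str):
            raise ValueError(f"example {i}: 'code' must be a string")
        unknown = set(example["labels"]) - allowed_labels
        if unknown:
            raise ValueError(f"example {i}: unknown labels {sorted(unknown)}")


# Hypothetical generated examples.
dataset = [
    {"code": "Program WhenRun Move 30", "labels": ["side-armsLength"]},
    {"code": "Program WhenRun Turn 180", "labels": []},
]
validate_dataset(dataset, ALLOWED_LABELS)

# Written to a temporary directory here; the assignment expects data/.
path = os.path.join(tempfile.mkdtemp(), "raw_data.pickle")
with open(path, "wb") as f:
    pickle.dump(dataset, f)

with open(path, "rb") as f:
    loaded = pickle.load(f)
```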

3. Write your Neural Network

If you download the starter code for part 2, you should see a file. It contains an empty torch.nn.Module where you need to fill out the __init__() and forward() functions. You are free to design any model you would like! When initializing, you get access to the vocabulary size of your grammar and the number of possible feedback labels. These might be useful when defining submodules. During the forward call, you get access to a token_seq object, which looks like
where the digits represent the index of each word in the vocabulary. All vectors are the same length (because sentences are padded)! For example,
	SOS Program WhenRun Move PAD PAD PAD EOS
	SOS Program WhenRun Move 30 Turn 30 EOS
	SOS Program WhenRun Turn 180 PAD PAD EOS
Second, the token_length object looks like
	torch.Tensor([3,6,4, ...])
indicating the length of non-padded tokens.
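To make the padding convention concrete, here is a plain-Python sketch that reproduces the three example rows and their lengths. It assumes, based only on the examples above, that PAD tokens are placed before EOS and that lengths exclude SOS, EOS, and PAD. `encode_batch` and the vocabulary ordering are hypothetical, and the starter code's preprocessing may differ.

```python
SPECIALS = ["PAD", "SOS", "EOS"]


def encode_batch(programs, vocab):
    """Pad token lists to a common length and map each token to its index."""
    max_len = max(len(tokens) for tokens in programs)
    token_seq, token_length = [], []
    for tokens in programs:
        padding = ["PAD"] * (max_len - len(tokens))
        padded = ["SOS"] + tokens + padding + ["EOS"]
        token_seq.append([vocab[t] for t in padded])
        token_length.append(len(tokens))  # count of non-padded tokens
    return token_seq, token_length


programs = [
    ["Program", "WhenRun", "Move"],
    ["Program", "WhenRun", "Move", "30", "Turn", "30"],
    ["Program", "WhenRun", "Turn", "180"],
]

# Hypothetical vocabulary: special tokens first, then the rest in sorted order.
words = sorted({t for tokens in programs for t in tokens})
vocab = {token: index for index, token in enumerate(SPECIALS + words)}

token_seq, token_length = encode_batch(programs, vocab)
```

Lists like these could then be wrapped with torch.tensor before being passed to a model.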

4. Run the Model

Here is a summary of the steps to train your model. Preprocess your raw rubric-sampled data.
	python PATH_TO_RAW_DATA.pickle
This will dump three new files into the data/ directory: a training, validation, and testing pickle file. Try looking at the contents! Use the trainer library to train your model.
This will dump a bunch of files into the checkpoints directory, notably a checkpoint.pth.tar file that represents the last iteration and a model_best.pth.tar file that represents the best iteration (measured by performance on a validation set). Finally, use the trainer library to test transfer performance on a small set (500 examples) of real student programs.
If you are curious, check out trainer to see how training and transfer works.

5. Compile Results

Create a short writeup SUNID_LASTNAME_FIRSTNAME.pdf indicating grammar design choices and the different models and hyperparameters you tried. Second, create an updated with your code. Please make sure that the is filled out, training and testing successfully run, and that you include the trained checkpoints in the checkpoints/ subdirectory. Name this

Your model will be partially graded on performance on a held out set of labelled examples (similar to the 500 examples given to you).