Assignment 1: The “OG” Original Generative Model

Question 1: Generative Origin Story [warmup]

a) Use an IRT model with ability, difficulty, and guessing parameters to simulate 10 students with abilities 1 through 10 answering 10 questions with difficulties 1 through 10. On each question, the chance that a student gets the answer correct by random guessing is 1/4. Report the number of questions each student gets right.
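As a starting point, the simulation in part (a) might be sketched as follows. This is a minimal sketch, not a reference solution: it assumes a 3PL-style response probability, guess + (1 - guess) * sigmoid(ability - difficulty), with discrimination fixed at 1. The names p_correct and simulate and the fixed seed are illustrative choices, not part of the assignment.

```python
import math
import random


def p_correct(ability, difficulty, guess=0.25):
    """Assumed 3PL-style form: guess + (1 - guess) * sigmoid(ability - difficulty)."""
    return guess + (1.0 - guess) / (1.0 + math.exp(-(ability - difficulty)))


def simulate(abilities, difficulties, guess=0.25, seed=0):
    """Return rows of 0/1 answers; rows[i][j] is 1 if student i got item j right."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [
        [int(rng.random() < p_correct(a, d, guess)) for d in difficulties]
        for a in abilities
    ]


abilities = list(range(1, 11))     # students with ability 1 through 10
difficulties = list(range(1, 11))  # questions with difficulty 1 through 10
responses = simulate(abilities, difficulties)
num_correct = [sum(row) for row in responses]

for a, n in zip(abilities, num_correct):
    print(f"ability {a}: {n} correct")
```

Because guessing sets a floor of 1/4 on every item, even the weakest student is expected to get a few questions right; changing the seed gives a different draw.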

b) Now simulate the students' abilities as well. Simulate 100 students, sampling each student's “prior” ability from the distribution A ~ N(0, 1). Each student takes a quiz with item difficulties [-0.5, 0, 0.5, 1.0, 1.5]. Include a scatter plot of student ability versus percent correct. Save your simulation as a CSV with one row per student and one column per answer.
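One possible shape for part (b) is sketched below, again under assumptions: the same 3PL-style probability with the 1/4 guessing rate carried over from part (a) (the question does not say whether guessing still applies), and column names q1 through q5, which are made up here. The helper write_csv is hypothetical and takes any open text file object.

```python
import csv
import io
import math
import random

DIFFICULTIES = [-0.5, 0.0, 0.5, 1.0, 1.5]
GUESS = 0.25  # assumption: guessing rate carried over from part (a)


def p_correct(ability, difficulty, guess=GUESS):
    return guess + (1.0 - guess) / (1.0 + math.exp(-(ability - difficulty)))


def write_csv(rows, fileobj):
    """Write one row per student and one column per answer."""
    writer = csv.writer(fileobj)
    writer.writerow([f"q{j + 1}" for j in range(len(rows[0]))])
    writer.writerows(rows)


rng = random.Random(0)
abilities = [rng.gauss(0.0, 1.0) for _ in range(100)]  # A ~ N(0, 1)
responses = [
    [int(rng.random() < p_correct(a, d)) for d in DIFFICULTIES]
    for a in abilities
]
percent_correct = [100.0 * sum(row) / len(row) for row in responses]

# Demonstrated with an in-memory buffer; pass open(path, "w", newline="")
# instead to save the table on disk.
buffer = io.StringIO()
write_csv(responses, buffer)
print(buffer.getvalue().splitlines()[0])
```

The pairs (abilities[i], percent_correct[i]) are the points for the scatter plot and can be handed to any plotting library, for example matplotlib's scatter function.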

Question 2: TOEFL and GRE tests [challenge]

Data (with example torch code): q2-starter.zip

You are given a simulated dataset of 100 students who took the first section of the GRE math exam (sim-responses.csv). In the dataset each student is a row and each question is a column. If element (i, j) is a 1, that means student i correctly answered question j. You know, from testing, the difficulty of each item (sim-difficulty.csv). Infer the ability of each student and save your results as a file (infer-ability.csv). Because the data is simulated, you can compare your estimated abilities to the true abilities (sim-abilities.csv).
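One way to approach the inference step is sketched below. It is not the starter code's torch approach: it is a plain-Python Newton iteration, and it assumes a 1PL model without guessing plus a N(0, 1) prior on ability (a MAP estimate), which keeps the estimate finite for students who answer every item right or wrong. The response rows are made-up toy values, not taken from sim-responses.csv, and file loading is left out because the CSV layout is not described here.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def map_ability(answers, difficulties, prior_var=1.0, steps=25):
    """Newton's method on the log-posterior of one student's ability."""
    theta = 0.0
    for _ in range(steps):
        probs = [sigmoid(theta - d) for d in difficulties]
        # Gradient: sum of (observed - predicted) plus the prior's pull to 0.
        grad = sum(x - p for x, p in zip(answers, probs)) - theta / prior_var
        # Second derivative is always negative, so the objective is concave.
        hess = -sum(p * (1.0 - p) for p in probs) - 1.0 / prior_var
        theta -= grad / hess
    return theta


def pearson(xs, ys):
    """Correlation, e.g. between estimated and true abilities."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Toy data for illustration only.
difficulties = [-0.5, 0.0, 0.5, 1.0, 1.5]
toy_responses = [
    [1, 1, 1, 1, 1],
    [1, 1, 0, 0, 0],
    [0, 0, 0, 0, 0],
]
estimates = [map_ability(row, difficulties) for row in toy_responses]
for row, est in zip(toy_responses, estimates):
    print(row, round(est, 3))
```

With the simulated data, pearson(estimates, true_abilities) gives one simple summary of how well the estimates track the truth; with only five items per student, the prior noticeably shrinks estimates toward 0.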

Re-run your analysis on the TOEFL data (you don't need to report anything). The TOEFL data does not come with true abilities. Write a sentence describing how you would know whether you did a good job.

Question 3: Creative Generation [creative]

Imagine any generative scenario more complex than the standard IRT model. Write up a description of your generative process and submit a script called creative_gen.py that defines the following three functions:

create_item_bank(n)
which returns a list of n item ids (you create the questions, with any details you want, and their ids)
create_students(m)
which returns a list of m student ids (you create the students, with any details you want, and their ids)
response(studentId, itemId)
which returns True if the student answers the question correctly, and False otherwise.

You may assume that students take tests one at a time. No other assumptions are necessary!

Perhaps you could include other nuances for the GRE example? Perhaps you could model a different response experience? Maybe your students change over time? See how believable a model you can create. We are going to make a model “zoo” from all the generative processes. You do not have to “infer” abilities.

Optional Question 4: Exploratory Learning

Optionally, do any work that you think would benefit you and/or the field of computational education. Report what you did. If you want to replace a problem on the pset with your work, run it by us!