Written by Julie Zelenski
Why do I need to verify output conformance?
Assignments are graded with the aid of automated tools and small slip-ups in formatting can cause your output to be misjudged in grading. To earn proper credit, your program must conform to the output specification given in the assignment writeup and match the behavior of our sample executable. Minor variations like different amounts of whitespace can be ignored, but changing the format, reordering output, or leaving behind extraneous debugging chatter will thwart the autotester and cause your program to be marked wrong.
In order to verify output conformance, we provide a sanity check tool. It compares the output of your program to that of the sample executable and reports on discrepancies, allowing you to detect and address those issues before you submit.
Using sanity check
Sanity check is a tool you use from the command-line like this:
- Log into a myth machine.
- Use the cd command to change into the directory containing your project files.
- Issue the command /afs/ir/class/cs107/tools/sanitycheck
- The tool checks the output conformance of the program in the current directory. It runs the default sanity tests and compares your program's output to that of the sample executable. If everything matches, the test is considered passing; otherwise the tool reports the discrepancy.
- Passing sanity check suggests the autotester won't have trouble interpreting your output, and that's good. If your output doesn't match, you must fix it to meet the required format or risk losing a pile of points to the merciless autotester. That would cause much sadness, as we do not restore points lost to ignoring or failing sanity check.
- You can run sanity check as many times as you need. Our submit tool will even encourage one final check before you submit.
Using sanity check with your own custom tests
The default tests supplied for sanity check may be neither particularly rigorous nor comprehensive, so you will want to supplement them with additional tests. You can create inputs of your own and list them as custom tests for the sanity check tool to run. Create a text file using this format:
# File: custom_tests
# ------------------
# This file contains a list of custom tests to be run by the sanity check tool.
# Each custom test is given on a single line using format:
#
# testname executable arg(s)
#
# The testname is a single word (no spaces) of your choice. This is used as
# the label to identify the test in the output.
# The executable is the name of the program to run (e.g. reassemble or spellcheck)
# The args are optional. If given, they are treated as a sequence of space-separated
# command-line arguments with which to invoke the executable program.
#
# For each custom test, sanity check will invoke your executable program and the
# solution program (using same command-line arguments), compare the two
# outputs to verify if they match, and report the outcome.
#
# Blank lines and comment lines beginning with # are ignored.
#
# Below is an example custom test, edit as desired.
WriteupExample reassemble slink/allswell_frags
To run your custom tests, invoke sanity check with its optional argument, which is the name of your custom test file:
/afs/ir/class/cs107/tools/sanitycheck custom_tests
When invoked with an argument, sanity check will use the test cases from the named file instead of the standard ones. For each custom test listed in the file, sanity check runs the sample solution with the given command-line arguments and captures its output, then runs your program with the same arguments to capture its output, and finally compares the two results and reports any mismatches. Pretty sweet!
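The comparison at the heart of each test can be sketched with a plain diff. This stand-in uses two locally created output files in place of the real program runs (soln.out and mine.out are hypothetical names, not files sanity check actually produces):

```shell
# Sketch of the comparison sanity check performs for one test, using
# stand-in output files in place of real program runs.
printf 'line one\nline two\n' > soln.out     # output captured from the sample solution
printf 'line one\nline two\n' > mine.out     # output captured from your program
if diff -q soln.out mine.out >/dev/null; then
  echo "OK: outputs match"
else
  echo "MISMATCH: outputs differ"
fi
```

Here the two files are identical, so the OK branch fires; any byte-level difference would take the MISMATCH branch instead.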
Frequently asked questions about sanity check
How can I reproduce/debug a problem that appears during sanity check?
Look through the sanity check output to find the command being executed:
Command: ./reassemble /afs/ir/class/cs107/samples/sanity_tests/assign1/allswell_frags
Run that same command (in shell, gdb, or Valgrind) to replicate the situation being tested. You can also view the file contents (such as the allswell_frags file in the above command) to better understand what is being tested.
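As a concrete sketch, using a stand-in program and input file in place of the real AFS paths (myprog and frags are made-up names for illustration; the gdb and Valgrind variants are commented out since they are interactive):

```shell
# Stand-in for your assignment executable: a tiny script that echoes its input file.
printf '#!/bin/sh\ncat "$1"\n' > myprog && chmod +x myprog
printf 'frag data\n' > frags                 # stand-in test input file

./myprog frags                    # replay the command exactly as sanity check ran it
# gdb --args ./myprog frags       # same invocation inside the debugger
# valgrind ./myprog frags         # same invocation under Valgrind
cat frags                         # view the test input to understand the case
```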
Is it possible to write a custom test to verify Valgrind correctness or memory/time efficiency?
No. Custom sanity check tests compare on output only. You will need to supplement with other forms of testing to verify those additional requirements.
What is the difference between a "MISMATCH" and a "NOT OK" result from sanity check?
Both are test failures, but for somewhat different reasons. MISMATCH indicates that your program successfully ran to completion but the output it produced did not match the output produced by the sample. NOT OK reports that your program did not successfully complete (exited due to a fatal error or timed out) and its output was not compared to the sample. Whether reported as MISMATCH or NOT OK, a sanity check failure indicates there is a problem with your program relative to the sample on that particular test.
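The distinction can be sketched as a toy harness: check the exit status first (the NOT OK territory), and only compare outputs when the run completed (the MISMATCH territory). The file names and messages below are stand-ins, not sanity check's actual internals:

```shell
# Toy harness distinguishing the two failure modes. A stand-in command
# plays the role of "your program"; here it completes but prints the wrong thing.
printf 'expected output\n' > soln.out           # the sample solution's output
sh -c 'printf "something else\n"' > mine.out    # your program's run (exits cleanly)
status=$?
if [ "$status" -ne 0 ]; then
  echo "NOT OK: program exited with status $status"   # did not complete successfully
elif ! diff -q soln.out mine.out >/dev/null; then
  echo "MISMATCH: output differs from the sample"     # ran fine, wrong output
else
  echo "OK"
fi
```

With this stand-in the run completes but the outputs differ, so the MISMATCH branch fires; replacing the stand-in with a crashing command would take the NOT OK branch.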
If my program passes sanity check, does that mean it's perfect?
Passing sanity check indicates you've got those specific tests nailed, but it doesn't predict anything beyond that. You gain broader coverage by creating your own custom tests. The more comprehensive your tests, the more confidence you can have in the results.
Conversely, if my program fails sanity check, am I hosed?
Passing sanity check is generally a necessary condition for a good grading outcome, and any failure is cause for alarm: investigate and resolve it before submitting. That said, sanity check only accepts output that is an exact match, and there are a few situations, such as allowed latitude in the spec or an equivalent re-wording of an error message, where a mismatch is not actually an error: the program's behavior is a valid alternative to the sample, but sanity check doesn't know that. The autotester defers these cases to the judgment of the grading TA, who identifies whether such mismatches are true failures or harmless variation.
Submit seems to expect that my program pass sanity. Can I hard-code my program to work on exactly/only those inputs so that it passes sanity and can be submitted?
No. Any submission that attempts to defraud the results of automated testing will be rejected from grading and receive a score of 0. Although we recommend that you resolve any sanity check failures before submitting, it is possible to submit a program that doesn't pass sanity check by simply skipping the sanity check step during the submit process.