Warmups, peer grading, A/B testing, and general inference for diseases given symptoms