Out-of-Distribution Detection

Advised by: Daniel Rubin, MD, MS

Professor of Biomedical Data Science, Radiology, and Medicine (Biomedical Informatics) and (by courtesy) Computer Science and Ophthalmology

Stanford University

Advised by: Christopher Lee-Messer, MD, PhD

Clinical Assistant Professor

Neurology & Neurological Sciences and Pediatrics

Stanford School of Medicine

Deep learning models offer great promise for improving the speed and quality of diagnosis and treatment in medicine. However, a major flaw of these models is that they tend to be overconfident in exactly the cases where a human expert would quickly recognize being out of their depth. This overconfidence stems from an underlying assumption: that the data a model encounters after deployment are drawn from the same distribution as its training data.

In practice, it is difficult to ensure that all real-world samples are drawn from the same distribution as the training data. When this assumption is violated, the consequences in consumer applications are typically minor; in medicine, however, overconfidence can lead to misdiagnosis, injury, or even death. It is therefore critical for a model used in medical applications to detect when an incoming sample lies far from its training distribution, since these are the situations in which the model is most likely to fail.

We use a framework based on induced metrics on hierarchical vector spaces to identify inputs drawn from a distribution that the model did not encounter during training.
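The framework itself is not yet described here, so the following is only a minimal sketch of the general idea of scoring an input by its distance from the training data in several feature spaces. It swaps in a simpler, well-known technique rather than the authors' method: a Mahalanobis distance computed separately on features from each layer of a network, with the per-layer distances combined by taking the maximum. The function names `fit_layer_stats` and `ood_score`, the max-over-layers combination, the ridge term, and the synthetic "layer features" are all hypothetical choices made for illustration.

```python
import numpy as np

def fit_layer_stats(train_feats, ridge=1e-6):
    """Hypothetical helper: estimate a mean and precision matrix per layer.

    train_feats: list of arrays, one per layer, each shaped (n_samples, dim).
    """
    stats = []
    for feats in train_feats:
        mean = feats.mean(axis=0)
        cov = np.cov(feats, rowvar=False)
        # A small ridge term keeps the covariance matrix invertible.
        precision = np.linalg.inv(cov + ridge * np.eye(cov.shape[0]))
        stats.append((mean, precision))
    return stats

def ood_score(sample_feats, stats):
    """Hypothetical helper: higher scores mean "further from the training data".

    sample_feats: list of 1-D arrays, one per layer, matching fit_layer_stats.
    """
    dists = []
    for x, (mean, precision) in zip(sample_feats, stats):
        d = x - mean
        # Mahalanobis distance of this layer's features from the training mean.
        dists.append(float(np.sqrt(d @ precision @ d)))
    # Assumed combination rule: flag the sample if ANY layer finds it far away.
    return max(dists)

# Usage with synthetic features standing in for two layers of a trained model.
rng = np.random.default_rng(0)
train = [rng.normal(size=(500, 4)), rng.normal(size=(500, 2))]
stats = fit_layer_stats(train)
near = [np.zeros(4), np.zeros(2)]
far = [np.full(4, 10.0), np.full(2, 10.0)]
print(ood_score(near, stats), ood_score(far, stats))
```

In a real system the features would come from intermediate layers of the deployed model, and a threshold on the score (chosen on held-out in-distribution data) would decide when to defer to a human.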

(more details coming soon!)