I’m an Assistant Professor of Biomedical Data Science and of Medicine at Stanford University. My research focuses on training and evaluating foundation models for healthcare and is positioned at the intersection of computer science, medical informatics, and hospital systems. Much of my work explores using electronic health record (EHR) data to contextualize human health, leveraging longitudinal patient information to inform model development and evaluation. My work has appeared in NeurIPS, ICLR, AAAI, npj Digital Medicine, Nature Communications, and Nature Medicine.

🧠 2026 Postdoctoral Opportunities →
🎓 PhD Applicants

At Stanford, PhD admissions are handled by academic departments, not individual labs. Students must be admitted to a Stanford PhD program before rotating with or joining a lab. If you are applying and interested in our group, you're welcome to mention it in your application materials. Accepted students can rotate during their first year. I primarily recruit from Biomedical Data Science, Computer Science, and adjacent fields.

Explore Stanford Graduate Programs →


The 2026 application cycle has closed. For the Department of Biomedical Data Science (DBDS), the application deadline is typically December 2 each year; please consult the department website for current dates.


Note: Due to the volume of inquiries, I may not be able to respond to individual emails regarding PhD admissions prior to acceptance.

Research Interests

[Figure: Feedback loop diagram]

Evaluating Foundation Models

Reproducibility in healthcare AI is hampered by a lack of standardized benchmarks and by the challenges of sharing patient data, undermining the research community's shared understanding of state-of-the-art methods. Private foundation models exacerbate this problem by introducing additional non-reproducible components. To tackle these challenges and promote reproducibility, we have released new EHR datasets (INSPECT, EHRSHOT, MedAlign, FactEHR) and made our foundation models available via Hugging Face.
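As a concrete illustration of what "available via Hugging Face" means in practice, here is a minimal sketch of pulling a released checkpoint with the `transformers` library and using it as a feature extractor. The repository id is a placeholder (substitute the id from the relevant model card), and the exact loading code depends on how each checkpoint is packaged.

```python
# Minimal sketch: load a publicly released foundation model from the
# Hugging Face Hub and use it as a feature extractor.
# NOTE: "your-org/ehr-foundation-model" is a placeholder repo id, not the
# name of an actual released checkpoint; see the model cards linked from
# the dataset releases for the real identifiers.
from transformers import AutoModel, AutoTokenizer

REPO_ID = "your-org/ehr-foundation-model"  # placeholder

tokenizer = AutoTokenizer.from_pretrained(REPO_ID)
model = AutoModel.from_pretrained(REPO_ID)

# Encode a toy clinical note snippet and mean-pool the hidden states to get
# a single embedding, e.g. as input features for a few-shot benchmark task.
inputs = tokenizer(
    "Patient admitted with chest pain; troponin elevated.",
    return_tensors="pt",
)
embedding = model(**inputs).last_hidden_state.mean(dim=1)
print(embedding.shape)
```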

Training Multimodal Foundation Models

Future healthcare models must integrate diverse data modalities, including imaging, omics, wearables, and medical literature, to capture health progression over time. Longitudinal EHRs provide critical temporal context but are noisy, requiring data-centric AI methods such as cleaning, valuation, and curation. My research explores methods to transform EHR timelines into supervision sources for training robust, scalable multimodal models that better capture long-term disease progression.
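One simplified way to read "EHR timelines as supervision sources" is next-event prediction: the ordered record itself supplies the labels, with no manual annotation. The sketch below is only illustrative; the `Event` structure and the event codes are invented for the example and do not reflect any particular dataset or model.

```python
# Illustrative sketch: derive self-supervised (context, next-event) pairs
# from a time-ordered EHR timeline. Event codes below are made up.
from __future__ import annotations
from dataclasses import dataclass
from datetime import date

@dataclass
class Event:
    time: date
    code: str  # e.g. a diagnosis, lab, or procedure code

def next_event_examples(timeline: list[Event], context_len: int = 2):
    """Yield (context codes, next code) pairs from a time-ordered timeline."""
    ordered = sorted(timeline, key=lambda e: e.time)
    for i in range(context_len, len(ordered)):
        yield [e.code for e in ordered[i - context_len:i]], ordered[i].code

timeline = [
    Event(date(2020, 1, 5), "DX:diabetes"),
    Event(date(2020, 6, 2), "LAB:hba1c"),
    Event(date(2021, 2, 14), "DX:hypertension"),
    Event(date(2021, 9, 30), "DX:ckd_stage_3"),
]

for context, target in next_event_examples(timeline):
    print(context, "->", target)
```

In practice these pairs would feed a sequence model over multimodal inputs; the point of the sketch is only that the longitudinal record itself can supply the training signal.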

Human-AI Teaming

Today's preference alignment methods capture only a coarse sense of tacit knowledge: expertise that is contextual, embedded in practice, and rarely documented. Tacit knowledge is a defining feature of complex, multi-stakeholder decision-making processes in medicine, such as tumor boards and care coordination. Successful human-AI teaming in healthcare will require new methods to study, capture, and integrate tacit knowledge into the next generation of healthcare foundation models.