My recent research explores weakly supervised machine learning, where indirect and often noisy sources of domain knowledge are combined to train models. Obtaining large-scale, expert-labeled training data is a significant challenge in medicine. Weakly supervised methods enable new mechanisms of sharing medical expertise and generating training sets from large-scale collections of unlabeled text, medical imaging, and sensor data.
BigScience 2021 Workshop: The Summer of Language Models
The BigScience effort is an exciting international collaboration aimed at training a very large, open language model for the research community. I'm co-chair of the biomedical working group and a participant in the modeling working group.
I'm a contributor/co-developer on Stanford's weak supervision framework Snorkel. I have papers on weakly supervised biomedical concept tagging, machine reading in the electronic health record (EHR), and classifying rare aortic valve diseases in cardiac MRI videos from the UK Biobank, an open, population-scale health dataset.
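To make the idea of combining noisy sources concrete, here is a minimal sketch of weak supervision by majority vote. It is an illustration only: it does not use the Snorkel API and is not the method from any paper listed here. The labeling functions, the keywords they match, and the example notes are all hypothetical.

```python
from collections import Counter

ABSTAIN = None  # a labeling function may decline to vote

# Hypothetical labeling functions: each encodes one noisy heuristic
# and returns 1 (positive), 0 (negative), or ABSTAIN.
def lf_keyword(note):
    return 1 if "stenosis" in note.lower() else ABSTAIN

def lf_negation(note):
    return 0 if "no evidence of" in note.lower() else ABSTAIN

def lf_normal(note):
    return 0 if "normal" in note.lower() else ABSTAIN

LABELING_FUNCTIONS = [lf_keyword, lf_negation, lf_normal]

def majority_vote(note, lfs=LABELING_FUNCTIONS):
    """Combine the non-abstaining votes; abstain when none vote or on a tie."""
    votes = [v for v in (lf(note) for lf in lfs) if v is not ABSTAIN]
    if not votes:
        return ABSTAIN
    (top, n), *rest = Counter(votes).most_common()
    if rest and rest[0][1] == n:
        return ABSTAIN  # the two most common labels are tied
    return top

if __name__ == "__main__":
    notes = [
        "Severe aortic stenosis noted.",
        "No evidence of stenosis; valve appears normal.",
        "Patient seen for follow-up.",
    ]
    for note in notes:
        print(majority_vote(note), "<-", note)
```

Majority vote treats every source as equally reliable. Frameworks such as Snorkel instead estimate the accuracy of each source and weight its votes accordingly, which this sketch does not attempt.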
Ontology-driven weak supervision for clinical entity classification in electronic health records
Jason A. Fries, Ethan Steinberg, Saelig Khattar, Scott L. Fleming, Jose Posada, Alison Callahan, and Nigam H. Shah.
Nature Communications. 2021.
Evaluation of Domain Generalization and Adaptation on Improving Model Robustness to Temporal Dataset Shift in Clinical Medicine
Lin Lawrence Guo, Stephen R Pfohl, Jason A. Fries, Alistair Johnson, Jose Posada, Catherine Aftandilian, Nigam Shah, Lillian Sung
Assessment of Extractability and Accuracy of Electronic Health Record Data for Joint Implant Registries
Nicholas J Giori, John Radin, Alison Callahan, Jason A. Fries, Eni Halilaj, Christopher Ré, Scott L Delp, Nigam H Shah, Alex HS Harris.
JAMA Network Open. 2021.
Language models are an effective representation learning technique for electronic health record data
Ethan Steinberg, Ken Jung, Jason A. Fries, Conor Corbin, Stephen R. Pfohl, Nigam H. Shah.
Journal of Biomedical Informatics. 2021.
Estimating the efficacy of symptom-based screening for COVID-19
Alison Callahan*, Ethan Steinberg*, Jason A. Fries, Saurabh Gombar, Birju Patel, Conor Corbin, Nigam H. Shah.
npj Digital Medicine. 2020.
Assessing the accuracy of automatic speech recognition for psychotherapy
Adam S. Miner, Albert Haque, Jason A. Fries, Scott L. Fleming, Denise E. Wilfley, G. Terence Wilson, Arnold Milstein, Dan Jurafsky, Bruce A. Arnow, W. Stewart Agras, Li Fei-Fei, Nigam H. Shah.
npj Digital Medicine. 2020. [code]
Measure what matters: counts of hospitalized patients are a better metric for health system capacity planning for a reopening
Sehj Kashyap, Saurabh Gombar, Steve Yadlowsky, Alison Callahan, Jason A. Fries, Benjamin A Pinsky, Nigam Shah.
J. Am. Med. Inform. Assoc. 2020. [PMCID: PMC7337779]
The Accuracy vs. Coverage Trade-off in Patient-facing Diagnosis Models
Anitha Kannan, Jason A. Fries, Eric Kramer, Jen Jen Chen, Nigam Shah, and Xavier Amatriain.
AMIA 2020 Informatics Summit. 2020.
Multi-Resolution Weak Supervision for Sequential Data
Fred Sala, Paroma Varma, Jason A. Fries, Daniel Fu, Shiori Sagawa, Saelig Khattar, Ashwini Ramamoorthy, Ke Xiao, Kayvon Fatahalian, James Priest, and Christopher Ré.
Neural Information Processing Systems (NeurIPS). 2019.
Weakly Supervised Classification of Aortic Valve Malformations using Unlabeled Cardiac MRI Sequences
Jason A. Fries, Paroma Varma, Vincent S. Chen, Ke Xiao, Helio Tejeda, Priyanka, S., Jared Dunnmon, Henry Chubb, Shiraz Maskatia, Madalina Fiterau, Scott Delp, Euan Ashley, Christopher Ré, and James Priest.
Nature Communications. 2019. [PMCID: PMC6629670] [code]
Medical Device Surveillance with Electronic Health Records
Alison Callahan*, Jason A. Fries*, Christopher Ré, James Huddleston, Nicholas Giori, Scott Delp, and Nigam Shah.
npj Digital Medicine. 2019. [PMCID: PMC6761113] [code]
Snorkel: Rapid Training Data Creation with Weak Supervision
Alexander Ratner, Stephen H. Bach, Henry Ehrenberg, Jason Fries, Sen Wu, and Christopher Ré.
Special Issue of the VLDB Journal. 2019.
Multi-frame Weak Supervision to Label Wearable Sensor Data
Saelig Khattar, Hannah O'Day, Paroma Varma, Jason A. Fries, Jennifer Hicks, Scott Delp, Helen Bronte-Stewart, and Christopher Ré.
Time Series Workshop @ ICML. 2019.
Machine Learning for Health (ML4H) Workshop at NeurIPS 2018
Natalia Antropova, Andrew Bream, Brett K Beaulieu-Jones, Irene Chen, Corey Chivers, Adrian Dalca, Sam Finlayson, Madalina Fiterau, Jason A. Fries, Marzyeh Ghassemi, Mike Hughes, Bruno Jedynak, Jasvinder S Kandola, Matthew McDermott, Tristan Naumann, Peter Schulam, Farah Shamout, Alexandre Yahi.
Proceedings at https://arxiv.org/. 2018.
Swellshark: A Generative Model for Biomedical Named Entity Recognition without Labeled Data
Jason A. Fries, Sen Wu, Alex Ratner, Christopher Ré.
Preprint at https://arxiv.org/. 2017.
ShortFuse: Biomedical Time Series Representations in the Presence of Structured Information
Madalina Fiterau, Suvrat Bhooshan, Jason A. Fries, Charles Bournhonesque, Jennifer Hicks, Eni Halilaj, Christopher Ré, and Scott Delp.
Machine Learning in Healthcare. 2017. [PMCID: PMC6417829]
Snorkel: Rapid Training Data Creation with Weak Supervision
Alexander Ratner, Stephen H. Bach, Henry Ehrenberg, Jason Fries, Sen Wu, and Christopher Ré
Best of Proceedings VLDB Endowment. 2017. [PMCID: PMC5951191]
Brundlefly at SemEval-2016 Task 12: Recurrent Neural Networks vs. Joint Inference for Clinical Temporal Information Extraction
Jason A. Fries
SemEval@NAACL-HLT 2016. 1274-1279. 2016.
Data Programming with DDLite: Putting Humans in a Different Part of the Loop
Henry Ehrenberg, Jaeho Shin, Alex Ratner, Jason A. Fries, and Christopher Ré.
Similarity-based LSTMs for Time Series Representation Learning in the Presence of Structured Covariates
Madalina Fiterau, Jason A. Fries, Eni Halilaj, Nopphon Siranart, Suvrat Bhooshan, and Christopher Ré
Time Series Workshop @ NeurIPS. 2016.