Learning Representations of Eligibility Criteria in Clinical Trials Using Transformers
A clinical trial's eligibility criteria can have a significant impact on the successful completion of the study, as they determine essential factors such as recruitment efficiency, patient withdrawal rates, and translational power. Most inclusion and exclusion criteria are written in free text, which makes systematic review and analysis of these criteria prohibitive at scale. In this work, we address these issues by learning standardized representations of eligibility criteria using transformers. In particular, we pretrain a BERT model on a large unlabeled corpus of eligibility criteria acquired from ClinicalTrials.gov. Using Named Entity Recognition (NER) as a proxy for the quality of our representations, we show that our pretrained model (ecBERT) outperforms other publicly available biomedical BERT models, suggesting the benefit of domain-specific representations for eligibility criteria.