Learning Links Between Low Level and Preferred Medical Terminology with GNNs
Structured medical terminologies like MedDRA can be valuable resources for learning technical
medical language. In this project, I leverage the tree structure of
the MedDRA ontology to learn node embeddings for the natural language phrases
contained at each level of the hierarchy. I begin by inferring a graph structure from
the hierarchical relations defined between phrases in MedDRA. I then use a Graph
Neural Network (GNN) to learn node embeddings which can be used to predict
relationships between "low level terminology" - i.e. the kind of phrases used when
doctors are talking to patients - and "preferred terminology" - i.e. standardized
technical medical jargon. I explore the effects of modeling the
ontology with different graph structures (both homogeneous and heterogeneous), and I
further examine the effectiveness of pairing the GNN with various BERT models. My
clearest finding is that a heterogeneous GNN significantly outperforms
a standard homogeneous GNN in all experimental settings: on average, the heterogeneous GNNs achieve approximately 70% accuracy in predicting the links
between low level terms and the appropriate preferred terms. Additionally, and somewhat surprisingly, initializing node embeddings with pretrained BERT models - whether specialized medical models like BlueBERT or the standard BERT base model - did not noticeably outperform random node feature initialization.