Digital twins

We need to talk about digital twins. I've been seeing the term a lot recently in grant applications and papers. Gosh, it sounds really cool! But I am worried that it can mean two different things, and one of the definitions is borrowing legitimacy from the other one.

The term originally comes from industrial engineering, where it refers to a high-fidelity computational model of a process or object that can simulate what is likely to happen in the future, or what would happen if certain inputs were changed. You can see why this would be attractive in medicine, given how hard it is to do experiments in humans. I think the strong version of a digital twin in medicine is a pretty direct translation of the engineering version. This would include things like mechanistic models of the cardiovascular system and its response to medications. These can be incredibly useful because they give us an approximately correct causal model, so we can reliably estimate the effects of perturbing inputs such as medications.
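To make the strong version concrete, here is a deliberately tiny mechanistic model: a two-element Windkessel, which treats the arterial tree as a resistance R and a compliance C fed by pulsatile flow Q(t) from the heart, so that C dP/dt = Q(t) - P/R. This is my own toy illustration, not a real cardiovascular digital twin. The parameter values (cardiac output 5 L/min, heart rate 75 bpm, R = 1.08 mmHg·s/mL, C = 1.5 mL/mmHg) are assumed round numbers, and treating a vasodilator as a 20% drop in R is a hypothetical simplification that ignores reflexes like the baroreceptor response. The point is that because the model encodes mechanism, a "what if" question becomes a parameter change.

```python
import math


def simulate_windkessel(R, C=1.5, cardiac_output=5000 / 60, heart_rate=75,
                        systole=0.3, t_end=20.0, dt=1e-4):
    """Toy two-element Windkessel model: C * dP/dt = Q(t) - P / R.

    Units (assumed): R in mmHg*s/mL, C in mL/mmHg, flow in mL/s, time in s.
    Returns the mean arterial pressure (mmHg) over the final simulated beat.
    """
    period = 60.0 / heart_rate
    # Half-sine ejection during systole, scaled so mean flow per beat equals cardiac output.
    q_peak = cardiac_output * (period / systole) * (math.pi / 2)
    steps_per_beat = round(period / dt)
    n_steps = round(t_end / dt)
    p = 80.0  # arbitrary starting pressure; the transient dies out long before t_end
    last_beat = []
    for n in range(n_steps):
        phase = (n % steps_per_beat) * dt
        q = q_peak * math.sin(math.pi * phase / systole) if phase < systole else 0.0
        p += dt * (q - p / R) / C  # forward Euler step of the ODE
        if n >= n_steps - steps_per_beat:
            last_beat.append(p)
    return sum(last_beat) / len(last_beat)


# Baseline resistance chosen so that mean pressure is about 90 mmHg (assumed value).
baseline = simulate_windkessel(R=1.08)
# "What if" a vasodilator lowered peripheral resistance by 20%? (hypothetical effect size)
vasodilated = simulate_windkessel(R=1.08 * 0.8)
print(f"mean pressure: {baseline:.0f} mmHg -> {vasodilated:.0f} mmHg")
```

In this model the steady-state mean pressure is cardiac output times resistance, so the simulated drop in pressure is proportional to the drop in R. A genuine digital twin would need far more structure, and validation against patient data, before anyone trusted its what-if answers.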

The other kind of digital twin is quite different. The idea here is to write down a detailed description of the patient, including all their diagnoses, lab values, genomic results, medications, and vital signs. Then we can use a large electronic health record (EHR) or claims dataset to find similar patients and see what ended up happening to them. Potentially, we could learn the effects of changing this patient's treatments by looking at similar patients who got treatment A or B and seeing how those two groups did.
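The similar-patient version can be sketched as a nearest-neighbor lookup. In this toy sketch each patient is a hypothetical `Patient` record holding a numeric feature vector (standing in for diagnoses, labs, and so on, assumed to be already encoded and standardized), the treatment received, and an outcome score. The helper names and the use of plain Euclidean distance are my own assumptions for illustration; real EHR data would need far more careful encoding.

```python
import math
from dataclasses import dataclass
from statistics import fmean


@dataclass
class Patient:
    features: tuple[float, ...]  # hypothetical encoding: standardized labs, vitals, etc.
    treatment: str               # "A" or "B"
    outcome: float               # hypothetical score, higher is better


def nearest_neighbors(index_features, cohort, k):
    """Return the k cohort members closest to the index patient (Euclidean distance)."""
    return sorted(cohort, key=lambda p: math.dist(p.features, index_features))[:k]


def naive_twin_contrast(index_features, cohort, k=50):
    """Mean outcome of look-alike patients given A minus that of look-alikes given B.

    This is a purely associational comparison; nothing here knows why
    each patient received A or B.
    """
    twins = nearest_neighbors(index_features, cohort, k)
    outcomes = {"A": [], "B": []}
    for p in twins:
        outcomes[p.treatment].append(p.outcome)
    if not outcomes["A"] or not outcomes["B"]:
        raise ValueError("the neighborhood contains no patients from one arm")
    return fmean(outcomes["A"]) - fmean(outcomes["B"])


# Hypothetical toy cohort with two features per patient.
cohort = [
    Patient((0.1, 0.2), "A", 1.0),
    Patient((0.0, 0.3), "B", 0.0),
    Patient((0.2, 0.1), "B", 1.0),
    Patient((3.0, 2.5), "A", 0.0),
    Patient((2.9, 2.7), "B", 1.0),
]
print(naive_twin_contrast((0.1, 0.1), cohort, k=3))
```

The returned number is only a difference in observed outcomes among look-alike patients; whether it reflects the effect of the treatment depends entirely on what drove each treatment choice.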

I think you can see the key difference between the two versions of digital twins. The first involves building a mechanistic model; the second, a phenomenological one. In the second version, we don't really need to understand what happens at a detailed level when we change the medication; we just observe inputs and outputs and learn a function mapping one to the other.

Unfortunately, version 2 has some big problems! EHR data are messy, with high rates of errors and omissions. Doctors often choose treatment A over B for specific reasons that aren't captured in the EHR. That is a perfect setup for confusing correlation with causation, because of unmeasured differences between the patients who get treatment A and those who get B (see Soni, Spratt et al., JCO 2019). Incorporating richer data, such as notes and sensor data, can help, but it reduces rather than eliminates the danger. In the cancer digital twin papers I've read, the causal inference issues are often not addressed at all, which is concerning. And for many of these questions, like which cancer therapy to give, we are really far from a mechanistic understanding, given constantly mutating cancer cells with spatial heterogeneity, fluctuating hypoxia and other aspects of the microenvironment, adaptive immunity, and so on.
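Here is a minimal simulation of that confounding problem, under assumptions I made up purely for illustration: an unrecorded severity score makes sicker patients more likely to get treatment A and also worsens their outcome, while A itself improves every patient's outcome by a fixed 0.2. Because severity never enters the record, all of these patients look identical to the EHR, so each one is a perfect "twin" of every other. The naive A-versus-B comparison still comes out negative, making a helpful treatment look harmful.

```python
import math
import random


def simulate_confounded_cohort(n=20_000, true_effect=0.2, seed=0):
    """Toy cohort in which an UNMEASURED severity score drives both treatment and outcome.

    All numbers here are illustrative assumptions, not estimates from real data.
    """
    rng = random.Random(seed)
    cohort = []
    for _ in range(n):
        severity = rng.gauss(0.0, 1.0)                 # never recorded in the EHR
        p_a = 1.0 / (1.0 + math.exp(-2.0 * severity))  # sicker patients more often get A
        got_a = rng.random() < p_a
        # Outcome (higher is better): A helps everyone by true_effect; severity hurts.
        outcome = (true_effect if got_a else 0.0) - severity + rng.gauss(0.0, 0.5)
        cohort.append((got_a, outcome))
    return cohort


def naive_difference(cohort):
    """Mean outcome under A minus mean outcome under B, ignoring why treatment was chosen."""
    a = [y for got_a, y in cohort if got_a]
    b = [y for got_a, y in cohort if not got_a]
    return sum(a) / len(a) - sum(b) / len(b)


estimate = naive_difference(simulate_confounded_cohort())
print(f"true effect of A: +0.20, naive estimate: {estimate:+.2f}")
```

Adding more measured variables only helps to the extent that they capture whatever actually drove the treatment decision, which is why richer data reduces but does not eliminate the problem.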

None of which is to say that this line of research is not worthwhile! If we can get it right and carefully validate it, it could be very useful. And for many prediction tasks (as opposed to causal inference tasks), the confounding issues don't apply. But we should not oversell the current state of the science, and we should be careful about analogies. "Digital twin" should not be a shortcut around the hard work of specifying mechanisms, assumptions, and validation.


Category: Musings
