Understanding and treating long-haul COVID-19 with digital medical twins—Part 2

Step-by-step, how can collective data unlock the mystery of long-haul COVID and effective treatment plans?

By David Dimond, Chief Healthcare Innovation Officer, Dell Technologies

In our last post, we talked about long-haul COVID-19, more formally known as Post-Acute Sequelae of SARS-CoV-2 (PASC), and how “digital twins,” the offspring of big data and advanced analytics, might help researchers find ways to help the millions afflicted with this mysterious and debilitating syndrome.

Digital twins are simulations of patients created from several terabytes of medical data per person. They can give researchers the information they need to detect disease patterns and simulate the effects of treatments, and to identify the most promising paths for further research among real people. The more digital twins we create, the more discoveries we can make.

Integrating biology and the bedside

How do we put together the vast amounts of data and the advanced analytical capabilities needed to conduct these types of studies? Dell Technologies has collaborated with the i2b2 tranSMART Foundation (Informatics for Integrating Biology and the Bedside) to help with just that task. With new funding coming through grants from the National Institutes of Health, this joint initiative will enable the coordination of data collection and research among 200 hospitals and research centers, 50 academic medical centers, and several large pharmaceutical companies.

Just as the pooling of resources across the globe produced an effective COVID vaccine in record time, this effort, announced in May, will apply advanced artificial intelligence to the puzzle of long-haul COVID, with the goal of finding answers before victims’ lives are dramatically impacted. Though the project faces formidable data management and sharing challenges, what we learn from solving them will not only help advance COVID research but could also make future medical research more efficient.

Building the digital twin database

The digital twin database will be built from a “data enclave” that assembles data on patients from participating clinical institutions in a variety of formats, including lab values, medical images, data streams from monitoring equipment, textual data from electronic health records, and genome sequences. All the data will be kept in its raw form, in so-called “data lakes,” so researchers will have complete flexibility in how they use it as they develop new algorithms. The database will start with an initial sample of 70,000 patients from existing data collaboratives and can grow to two million digital twins over the next four years.

While patients’ data is de-identified before entering the enclave, researchers need to preserve the linkages among the various pieces of data for any given patient in order to build an accurate digital twin. And because COVID long-haulers have chronic conditions entailing ongoing medical care, the system will be able to regularly update the twin’s health records. This continual monitoring, generating what we might call “small data”, is the key to measuring patients’ progress, changes in their condition, and responses to treatment.
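To make the idea of de-identifying data while preserving linkages concrete, here is a minimal sketch in Python. It is not the project's actual method, which this post does not describe. It assumes one common approach: replacing direct identifiers with a keyed-hash pseudonym, so that every record from the same patient carries the same opaque ID. The names SITE_KEY, pseudonym, deidentify, and twin_id, along with the sample fields, are hypothetical.

```python
import hashlib
import hmac

# Hypothetical secret held only by the contributing institution (an assumption).
SITE_KEY = b"example-secret-key"

# Fields treated as direct identifiers in this toy example (an assumption).
DIRECT_IDENTIFIERS = {"patient_id", "name", "date_of_birth"}


def pseudonym(patient_id: str, key: bytes = SITE_KEY) -> str:
    """Derive a stable, opaque ID from the real patient ID with a keyed hash."""
    digest = hmac.new(key, patient_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def deidentify(record: dict, key: bytes = SITE_KEY) -> dict:
    """Drop direct identifiers and attach the linking pseudonym instead."""
    cleaned = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    cleaned["twin_id"] = pseudonym(record["patient_id"], key)
    return cleaned


# Two records of different kinds for the same (made-up) patient.
lab = deidentify({"patient_id": "MRN-001", "name": "Patient X",
                  "kind": "lab", "crp_mg_l": 12.4})
ecg = deidentify({"patient_id": "MRN-001", "name": "Patient X",
                  "kind": "ecg", "heart_rate": 96})

# Both records can still be joined into one digital twin via twin_id.
print(lab["twin_id"] == ecg["twin_id"])
```

Because the pseudonym is derived from a secret key rather than a plain hash, someone without the key cannot recompute it from a known patient ID. Real de-identification involves much more than this, including the handling of dates, free text, and rare conditions.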

New knowledge from your twin

Patients who provide their data may be able to benefit directly and immediately if the research produces a relevant breakthrough, which offers them a strong incentive to give their permission for their data to be used. Here’s how that might work:

1. Patient X has cardiac and respiratory conditions, mental fatigue, and memory issues following her bout with COVID. Her doctors have no answers for her but tell her she’s a long-hauler. They ask for her consent to give her de-identified data to a large-scale international study.

2. The data enclave collects all her existing health data in all its forms and organizes it into a single record—her digital twin.

3. Similar digital twins are grouped into cohorts to be studied together.

4. Patient X monitors her symptoms with mobile apps and keeps visiting her doctors. The data enclave captures all of Patient X’s new data regularly, de-identifies it, and adds it to the record for her digital twin.

5. Continuous data analysis and feedback flags an experimental drug that has been highly effective in digital twins similar to Patient X. She’s automatically notified, and after a genomic test confirms that she qualifies for the drug’s clinical trial, she enrolls.

6. Data from her participation in the trial, as well as her ongoing medical care, continue to update and refine her digital twin.
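The steps above can be sketched as a small Python model. This is an illustration under stated assumptions, not the project's design: the DigitalTwin class, the group_into_cohorts and twins_to_notify functions, the 0.5 response threshold, and the sample data are all hypothetical. Grouping twins by identical symptom sets is a deliberately crude stand-in for whatever similarity analysis researchers would actually use.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class DigitalTwin:
    """One de-identified patient record that keeps growing (steps 2, 4, 6)."""
    twin_id: str
    symptoms: frozenset
    observations: list = field(default_factory=list)

    def update(self, observation: dict) -> None:
        # New de-identified data is appended to the twin's record.
        self.observations.append(observation)


def group_into_cohorts(twins):
    """Step 3: group twins that share exactly the same symptom set."""
    cohorts = defaultdict(list)
    for twin in twins:
        cohorts[twin.symptoms].append(twin)
    return dict(cohorts)


def twins_to_notify(cohort, responders, threshold=0.5):
    """Step 5: if enough of the cohort responded to a drug, flag the rest."""
    if not cohort:
        return []
    rate = sum(t.twin_id in responders for t in cohort) / len(cohort)
    if rate < threshold:
        return []
    return [t.twin_id for t in cohort if t.twin_id not in responders]


long_haul = frozenset({"cardiac", "respiratory", "fatigue", "memory"})
twins = [
    DigitalTwin("x", long_haul),
    DigitalTwin("a", long_haul),
    DigitalTwin("b", long_haul),
    DigitalTwin("c", frozenset({"loss of smell"})),
]
twins[0].update({"source": "mobile app", "fatigue_score": 7})

cohorts = group_into_cohorts(twins)
# Twins "a" and "b" responded to a hypothetical experimental drug.
notify = twins_to_notify(cohorts[long_haul], responders={"a", "b"})
print(notify)
```

In this toy run, the twin for Patient X shares a cohort with two responders, so she is the one flagged for follow-up. The genomic test and trial enrollment in step 5 still happen with her doctors, outside any such code.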

Protocols for long-haul COVID

Digital twin research has the potential to help us crack the mysteries of PASC: who is most at risk and why, which patterns of symptoms are most common, how long they last, and what measures are effective against them. It can help us put together cohorts of comparable patients that would be impossible, or prohibitively expensive, to assemble in real life. We’ll be able to create complex protocols for studying every aspect of this lingering condition and maybe someday resolving it. Digital twins will take clinical research to a whole new level and, in the process, we hope, will offer patients like Caitlin Barber a chance at getting their former, healthy lives back.