Transforming Brain Health Research Through Data Science

Discover how the Child Mind Institute uses AI and synthetic data to tackle the brain health data crisis and boost research impact.

tl;dr: Reproducible, privacy-preserving data is vital for brain health research. Dr. Gregory Kiar and the Child Mind Institute are addressing this need through open science, synthetic data and local high-performance computing. With Dell Technologies and NVIDIA, they are building the infrastructure for researchers to accelerate understanding, diagnosis and treatment of mental health and learning conditions, all while protecting patient privacy.


Dr. Gregory Kiar’s path from biomedical and electrical engineering to neuroscience began at home. Watching his sister navigate a learning disability, he became curious about why their brains worked differently and how that led to such different childhood experiences despite their nearly identical upbringing. His search for an explanation turned up few clear answers and led him to appreciate that much of brain health research is built upon an unreliable foundation of small, irreproducible datasets.

Today, as Director of the Center for Data Analytics Innovation and Rigor (DAIR) at the Child Mind Institute, Kiar leads a team tackling one of healthcare’s most complex challenges: building the reproducible data infrastructure that brain health researchers desperately need. Kiar and his team are developing collaborations with technical communities and world-class technology providers to build reliable data pipelines that can advance the understanding, diagnosis, and treatment of mental health and learning conditions.

The reproducibility crisis: building on crumbling blocks

The reproducibility crisis in psychological research came to light most prominently during the early 2010s and laid bare the field’s data and analysis challenges. When a group of researchers attempted to replicate 100 published psychological studies, only about one-third of the results could be reproduced.

“It’s like building on blocks that may crumble when you try to step on them,” Kiar describes. “You can’t advance the field efficiently if you don’t know which foundational pieces are solid.”

Without solid, reproducible, and representative data, the consequences cascade through the entire mental health system. Researchers can’t identify reliable biomarkers for mental health conditions, which means diagnosis remains overly subjective and treatment pathways often involve trial and error. Today, one in five children struggles with a mental health disorder, yet 70% of U.S. counties lack a single child or adolescent psychiatrist. More than 75% of all mental health conditions appear before age 25, but due to stigma, misinformation, and lack of access to care, the average delay between symptom onset and treatment exceeds eight years.

This realization led Kiar to pivot his career toward improving the fundamental reliability of brain health research data. At the Child Mind Institute, he found an institution that treats reproducibility and open science as core principles. “The Child Mind Institute is unique because these principles aren’t something we’re learning to add on,” Kiar notes. “They’re part of our DNA. We start every project by asking how we can make it reproducible and shareable.”

An unprecedented database for an unprecedented challenge

Brain health data presents a unique privacy challenge. “Unlike other medical fields where data can be more easily anonymized, mental health and learning information is inherently personal and social,” Kiar explains. “From a single brain scan or clinical note, you might be able to identify specific individuals if you know what to look for.”

Despite this constraint, the Child Mind Institute has built something remarkable. Through the Healthy Brain Network research study, the institute has enrolled over 7,100 children to receive free diagnostic evaluations. This work has created the largest-ever database on the developing brain, including tens of thousands of hours of brain imaging, physical activity measurements, clinical assessments and survey data from children experiencing mental health and learning disorders, challenges, or concerns.

As a global leader in open science, the Child Mind Institute shares this data freely. Independent researchers have used it in hundreds of peer-reviewed publications. The institute has run numerous data science competitions that engage thousands of researchers at various career stages worldwide to tackle hard questions in neurodevelopmental disorders and train a new generation of data scientists in mental health analytics.

However, the Child Mind Institute, like many other clinical or research environments, has also collected countless hours of video and audio recordings, free-form clinical text notes, journaling entries and clinician reports. This unstructured data could hold critical insights as we advance our understanding of brain health, yet no platform exists that allows researchers to leverage the data science community to analyze sensitive patient data like this. Information that might advance mental health diagnosis remains inaccessible.

Building the infrastructure for privacy-preserving data

Solving the privacy problem requires massive computational resources. DAIR is generating synthetic datasets that faithfully mirror real patient data while maintaining strict privacy protections, allowing researchers to train models on massive collections of clinical notes, recordings and brain scans without exposing actual patient information. Dell Pro Max high-performance computing systems configured with NVIDIA RTX PRO GPUs provide the processing power needed to generate these synthetic datasets at scale. High-performance computing clusters are typically the solution for data science at this scale, but they cannot be used when working with sensitive datasets. The power afforded by these systems allows researchers to work locally with the latest technologies and so preserve the privacy of their data.

The foundation for tomorrow’s breakthroughs in mental health

The Child Mind Institute’s unique approach to neurological research combines synthetic data generation with collaborative data science and the computational power provided by Dell Technologies and NVIDIA. Together, the Child Mind Institute, Dell Technologies and NVIDIA are working to openly distribute reliably reproducible, privacy-preserving brain health datasets that were previously impossible to share.

Kiar’s team is developing synthetic data generation techniques that could unlock millions of inaccessible datasets worldwide. Through continued competitions and open collaboration, they’re training researchers to use these tools while advancing our understanding of mental health conditions and learning disorders, and shortening the delay between symptom onset, diagnosis and effective treatment.


This is the first in a series exploring how the Child Mind Institute is revolutionizing mental health research through data science, collaboration, and innovation. Stay tuned to learn more about their groundbreaking work in synthetic data generation and large-scale data competitions.

About the Author: Logan Lawler

Logan has worked in various roles at Dell for 16 years, including sales, marketing, merchandising, services, and e-commerce. Before joining Dell, Logan grew up in Missouri and graduated from the University of Missouri (MIZ!). Logan lives in Round Rock with his wife Ally, daughter Calloway, and labradoodle Truman.