Columbia University • Mailman School of Public Health
Mapping the Lifespan of Brain Health
The Columbia Brain Health DataBank integrates longitudinal clinical, imaging, and biological data to accelerate discovery in brain aging and Alzheimer’s disease.
Addressing a Critical Gap in Brain Health Research
Brain aging is a long, complex, and dynamic process, but most existing datasets offer only brief snapshots and do not integrate the diverse data modalities needed to map the full journey from cognitive health to dementia. This limited scope leaves critical gaps in our understanding of early biological and lifestyle changes and hinders the identification of modifiable risks.
The Columbia Brain Health DataBank (CBDB) addresses this critical need by unifying deep, longitudinal, and multimodal data, offering a first-of-its-kind resource to advance precision brain health and Alzheimer’s disease prevention.
THE PLATFORM
AI Platform in Development
Integrating the Future of Brain Health Data
Multi-Modal
Integrates EHR, neuroimaging, metadata, multi-omics, and wearable-device data within a unified research ecosystem.
AI-Powered
Links harmonized data to advanced modeling pipelines to generate individualized risk predictions and translate insights into clinical and public health applications.
Collaborative
Built on partnerships across ADRC, TRAIL4Health, the Data Science Institute, and CIIB to enable secure data sharing and interdisciplinary research.
THE DATA
A Three-Pillar Foundation for Brain Health
Large-scale Longitudinal EHR Cohort
Large, real-world cohort derived from Columbia’s clinical data warehouse
Longitudinal clinical visits
Patients
Free-text clinical notes
Brain imaging orders
Structured EHR Data
✓ Diagnoses
✓ Medications
✓ Procedures
✓ Labs
Free-Text Clinical Notes
✓ Progress notes
✓ Imaging reports
✓ Pathology reports
✓ Surgical notes
Enriched Clinical Subcohort
✓ Aging & dementia clinics
✓ Longitudinal cognitive evaluations
✓ Comprehensive clinical, imaging & pathology data
✓ Integrated data informing brain aging
AD-Focused Research Cohorts
Gold-standard Alzheimer’s disease cohorts for benchmarking, validation, and translational discovery
NACC
National Alzheimer’s Coordinating Center
ADNI
Alzheimer’s Disease Neuroimaging Initiative
ANMerge
Integrated AD research dataset
What They Provide
✓ Standardized phenotyped data
✓ Deep research characterization
✓ Controlled benchmark cohorts
Why It Matters
✓ Validate real-world findings
✓ Translate research into practice
✓ Accelerate advanced analytics
Deep-Phenotyped EHR–ADRC Linked Cohort
1,084 participants and 23 years of data from Columbia’s Alzheimer’s Disease Research Center
Clinical visits
Lab measurements
Clinical notes
Brain imaging orders
What’s Included
✓ Longitudinal EHR data
✓ Neurocognitive assessments
✓ Clinical evaluations
✓ Imaging & biospecimens
What It Enables
Rich, research-grade gold-standard labels for:
✓ Early disease detection
✓ Progression modeling
✓ Preclinical signal discovery
✓ Long-term outcome analysis
Emerging Data Modalities
Extending Beyond Clinical Data
Wearable Device Data
Continuous, real-world monitoring of activity, sleep, and physiological signals to capture early behavioral and functional changes.
Neuroimaging Data
High-resolution brain imaging enabling structural and functional analysis of disease progression and early neurodegenerative changes.
Multi-Omics Data
Integration of genomics, proteomics, and metabolomics to uncover biological pathways driving brain aging and Alzheimer’s disease.
Ongoing Research
Advancing Brain Health Discovery
Age-dependent phenome-wide association studies (PheWAS)
Mapping how genetic and clinical risk factors for Alzheimer’s disease and related dementias vary across the life course, and identifying age windows of heightened vulnerability or resilience.
Multimodal biomarker integration
Combining plasma biomarkers, neuroimaging measures, and EHR-derived phenotypes to improve early risk prediction and refine subtypes of cognitive decline.
AI-driven clinical note analysis
Applying natural language processing and large language models to millions of unstructured clinical notes to detect subtle early indicators of cognitive and functional decline.
Causal inference pipelines for modifiable factors
Evaluating the timing and impact of candidate interventions—such as hormone replacement therapy, GLP-1 receptor agonists, and other modifiable exposures—on dementia risk and trajectories of brain aging.
Digital phenotyping and wearable analytics
Integrating data from wearable devices to characterize sleep, activity, gait, and physiology; developing models that link these continuous signals to early brain-health changes and long-term outcomes.
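As a rough illustration of the age-dependent association analyses described above, the sketch below estimates the odds ratio between one exposure and one phenotype separately within each age window. It is not CBDB's pipeline: the function names, record layout, and age bands are hypothetical, the data are synthetic, and a real PheWAS would typically use covariate-adjusted regression with multiple-testing correction across many phenotypes.

```python
from math import exp, log, sqrt


def odds_ratio(a, b, c, d):
    """Odds ratio with a 95% Wald interval for a 2x2 table.

    a: exposed cases, b: exposed non-cases,
    c: unexposed cases, d: unexposed non-cases.
    Adds 0.5 to every cell (Haldane-Anscombe correction) so that
    empty cells do not cause division by zero.
    """
    a, b, c, d = (x + 0.5 for x in (a, b, c, d))
    estimate = (a * d) / (b * c)
    se = sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of log(OR)
    lower = exp(log(estimate) - 1.96 * se)
    upper = exp(log(estimate) + 1.96 * se)
    return estimate, lower, upper


def age_stratified_or(records, bands):
    """Compute one odds ratio per age band.

    records: iterable of (age, exposed, case) tuples (hypothetical layout).
    bands: list of (low, high) half-open intervals, low <= age < high.
    """
    results = {}
    for low, high in bands:
        counts = [0, 0, 0, 0]  # a, b, c, d
        for age, exposed, case in records:
            if low <= age < high:
                index = (0 if exposed else 2) + (0 if case else 1)
                counts[index] += 1
        results[(low, high)] = odds_ratio(*counts)
    return results


if __name__ == "__main__":
    # Synthetic records for illustration only; not real patient data.
    synthetic = (
        [(55, True, True)] * 8 + [(55, True, False)] * 12
        + [(55, False, True)] * 5 + [(55, False, False)] * 25
        + [(75, True, True)] * 15 + [(75, True, False)] * 10
        + [(75, False, True)] * 12 + [(75, False, False)] * 13
    )
    for band, (est, lower, upper) in age_stratified_or(
        synthetic, [(40, 65), (65, 90)]
    ).items():
        print(band, round(est, 2), round(lower, 2), round(upper, 2))
```

Comparing the per-band estimates and their intervals is one simple way to see whether an association strengthens or weakens with age; a formal analysis would instead test an age-by-exposure interaction.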
Behind the Platform
Weijie Xia
TRAIL4Health • Data Engineer
Weijie Xia plays a key role in building and maintaining the data infrastructure powering TRAIL4Health. His work focuses on scalable data pipelines, system reliability, and enabling high-quality longitudinal analysis across complex clinical datasets.
The Investigators
Leadership driving the Columbia Brain Health DataBank
Principal Investigators
Ying Wei, PhD
Professor of Biostatistics, Mailman School of Public Health • Director, Translational AI Laboratory (TRAIL4Health)
Dr. Wei leads the development of statistical and AI frameworks for precision medicine, with a focus on heterogeneous modeling, dynamic prediction, and causal inference. In CBDB, she oversees the integration of multimodal data and the design of scalable, rigorous AI pipelines for brain health and Alzheimer’s disease prevention.
James Noble, MD, MS
Clinical Director, Columbia Alzheimer’s Disease Research Center (ADRC)
Dr. Noble is a neurologist specializing in Alzheimer’s disease and related dementias. His work centers on clinical characterization, early detection, and population-based strategies for dementia prevention. He provides clinical leadership for CBDB and ensures strong translational links between data resources, research models, and patient-centered applications.
Co-Investigators
Linda Valeri, PhD
Associate Professor of Biostatistics
Dr. Valeri is an expert in causal mediation analysis, measurement error, missing data, and the integration of data from multiple sources. In CBDB, she strengthens the effort to move from prediction to actionable inference by identifying modifiable pathways and supporting rigorous, interpretable analyses of brain health risk.
Tian Gu, PhD
Assistant Professor of Biostatistics
Dr. Gu specializes in statistical learning for complex biomedical data, including functional data analysis, high-dimensional inference, and methodological innovation for health research.
Zhonghua Liu, PhD
Associate Professor of Biostatistics
Dr. Liu works at the intersection of statistics and genetics, with expertise in large-scale association studies, causal modeling, and integrative genomic analysis relevant to Alzheimer’s disease and brain health.
Kaizheng Wang, PhD
Assistant Professor, Industrial Engineering & Operations Research (IEOR)
Dr. Wang develops optimization and machine learning methods for large-scale systems. In CBDB, his work contributes to reliable, efficient model training and deployment, ensuring that AI methods remain scalable and robust in real-world settings.
Orsen Xu, PhD
Assistant Professor, Computer Science & Biomedical Informatics
Dr. Xu’s expertise spans natural language processing, multimodal AI, and foundation models for clinical decision-making. In CBDB, he leads efforts to analyze unstructured EHR data and integrate wearable-device data, expanding CBDB’s modalities to include continuous, passive digital measures of brain health and daily functioning.
Jeff Goldsmith, PhD
Associate Professor of Biostatistics • Associate Dean for Data Science, Columbia Mailman
Dr. Goldsmith develops statistical methods for large, complex longitudinal data in neuroscience and wearable sensing. His work brings CBDB strong expertise in modeling high-dimensional trajectories and building reproducible data-science pipelines for brain health research.
Adam Brickman, PhD
Professor of Neuropsychology in Neurology • Taub Institute
Dr. Brickman is a leader in neuroimaging and biomarker research in Alzheimer’s disease and cognitive aging. His work strengthens CBDB’s ability to characterize brain aging trajectories and identify imaging-based markers of risk across diverse populations.
Frank A. Provenzano, PhD
Assistant Professor of Neurological Sciences • Columbia ADRC Investigator
Dr. Provenzano develops neuroimaging-derived biomarkers from both research and routine clinical scans. Using MRI, PET, and explainable AI methods, his work enables scalable markers of cognitive aging and neurodegeneration for CBDB.
Platform Access
CBDB Documentation & Tools
Documentation, data dictionaries, technical guides, and computational tools for CBDB are available through the TRAIL4Health GitHub site.