The Cambridge Centre for Ageing and Neuroscience (Cam-CAN) is a large-scale research project launched in 2010 to understand how individuals can best retain cognitive abilities into old age. Cam-CAN is an interdisciplinary effort in which researchers from psychology, neuroscience, psychiatry, engineering, and public health use various brain imaging techniques, such as structural and functional magnetic resonance imaging (MRI) and magnetoencephalography (MEG), to measure age-related changes in brain structure and function. The structure of brains changes dramatically as we age (Figure 1), so how do some people maintain many of their cognitive abilities despite these changes?
With nearly 3000 participants ranging in age from 18 to 88 and drawn from a wide range of socioeconomic backgrounds, Cam-CAN is among the largest projects of its kind in the world. Data is collected from health history and lifestyle questionnaires, cognitive tests, and, for a subset of 700 participants, MRI and MEG imaging. Each imaged participant provides nearly a gigabyte of data, comprising almost 1000 MRI images, each with over 100,000 voxels, and MEG signals recorded every millisecond for 30 minutes across hundreds of sensors. We use MATLAB® to process the data on a high-performance computing cluster and to apply advanced statistical, optimization, and machine learning techniques to interpret the data and make meaningful quantitative comparisons.
Processing Cam-CAN Data
Processing MRI and MEG data from a large cohort involves many steps, such as co-registering different types of MRI images, warping them to a common space, smoothing, and running statistical models at each voxel. Interpreting MEG data also requires co-registering the sensors with a structural MRI image, in order to construct an accurate model of the head. This results in a complex pipeline, with many interdependent steps.
To manage and automate this multistep pipeline, several research teams use the MATLAB-based Automatic Analysis (AA) software package co-developed with colleagues at the MRC Cognition and Brain Sciences Unit. The Cam-CAN data set is an ideal use case for AA because of its unusually large number of participants and the wide variety of images that must be processed for each participant. Using AA, researchers who are less proficient at programming can perform complex analyses of neuroimaging data (Figure 2). AA pipelines are assembled from modules, with each module performing a single step and specifying its input and output dependencies. The AA processing engine is essentially a batching system that manages these dependencies and tracks the steps completed and the steps remaining. If a pipeline process is interrupted, researchers can resume processing without having to restart from the beginning.
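The dependency tracking and resume behavior described above can be illustrated with a minimal sketch. It is written in Python purely for illustration and is not AA's code or API: the module names, the `PIPELINE` table, and the `run_pipeline` function are all hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical module table: each module lists the modules whose outputs it
# needs. The names are illustrative and are not AA's actual module names.
PIPELINE = {
    "coregister": [],
    "normalise": ["coregister"],
    "smooth": ["normalise"],
    "voxel_stats": ["smooth"],
}


def run_pipeline(pipeline, completed, run_module):
    """Run each unfinished module once its dependencies have run.

    `completed` is a set of module names that have already finished. It is
    updated in place, so calling this function again after an interruption
    resumes from the first unfinished module instead of starting over.
    """
    executed = []
    # static_order() yields every module after all of its dependencies.
    for name in TopologicalSorter(pipeline).static_order():
        if name in completed:
            continue  # finished in an earlier run, so skip it
        run_module(name)
        completed.add(name)
        executed.append(name)
    return executed
```

In this sketch the set of completed modules lives in memory; a real batching system would need to persist that record (for example on disk) so that it survives a crash.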
AA pipelines can invoke other neuroimaging analysis software, including the Statistical Parametric Mapping (SPM) package. SPM is another MATLAB-based package, and one of the most widely used neuroimaging tools worldwide.
Using a Cluster to Accelerate Data Processing
While AA is invaluable for managing image analysis pipelines, executing all the steps in a complete pipeline takes time, particularly in a project that involves 700 participants. To accelerate this process, we use MATLAB Parallel Server™ to process the data on our 1200-core cluster. Because much of the processing work for an individual participant can be completed independently (that is, without affecting the processing for another participant), our analysis is embarrassingly parallel and easily executed concurrently on a cluster. We see a nearly linear increase in computation speed with the number of cores allocated to each job.
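A minimal sketch of what "embarrassingly parallel" means in practice: because no participant's result depends on another's, the per-participant function can simply be mapped over the cohort by a pool of workers. The sketch is a Python stand-in for the idea, not the MATLAB code used in the project, and `process_participant` and `process_cohort` are hypothetical placeholders.

```python
from concurrent.futures import ThreadPoolExecutor


def process_participant(participant_id):
    """Stand-in for one participant's full pipeline (hypothetical placeholder)."""
    # A small dummy computation takes the place of the real imaging steps.
    return participant_id, sum(i * i for i in range(1000))


def process_cohort(participant_ids, executor):
    """Map the per-participant function over the cohort with the given executor."""
    # Each task reads only its own participant's input, so the tasks can run
    # in any order and on any worker without coordinating with each other.
    return dict(executor.map(process_participant, participant_ids))


if __name__ == "__main__":
    with ThreadPoolExecutor(max_workers=4) as pool:
        results = process_cohort(range(1, 9), pool)
    print(f"processed {len(results)} participants")
```

A thread pool keeps the sketch short, but threads in CPython do not speed up CPU-bound work; `concurrent.futures.ProcessPoolExecutor` offers the same `map` interface and would be the closer analogue of distributing participants across cores.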
MATLAB Parallel Server not only reduces processing time; it also lowers the barrier to entry into parallel computing, an important consideration given the wide range of technical abilities among our scientists here at the MRC Cognition and Brain Sciences Unit in Cambridge. In many cases, a researcher can move their processing onto the cluster by simply changing a for loop to a parfor loop. We have written scripts that enable researchers to select default sets of resources, such as the number of cores and the amount of RAM per core, for a variety of job sizes. Because MATLAB Parallel Server is integrated with the Slurm scheduler via plugin scripts, it is easy to submit jobs and manage a cluster shared by many users.
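The idea of named resource defaults can be sketched as a small lookup table that is translated into scheduler options. This is an illustrative Python sketch, not the unit's actual scripts: the preset names, the core counts, and the memory sizes are all assumptions, and `slurm_options` is a hypothetical helper. The only real interface it relies on is Slurm's `--ntasks` and `--mem-per-cpu` options for `sbatch`.

```python
# Hypothetical presets: the names and numbers are illustrative only and are
# not the defaults used on the cluster described in this article.
PRESETS = {
    "small": {"cores": 4, "mem_per_core_gb": 4},
    "medium": {"cores": 16, "mem_per_core_gb": 8},
    "large": {"cores": 64, "mem_per_core_gb": 8},
}


def slurm_options(job_size):
    """Translate a named preset into a list of Slurm sbatch options."""
    try:
        preset = PRESETS[job_size]
    except KeyError:
        raise ValueError(
            f"unknown job size {job_size!r}; choose from {sorted(PRESETS)}"
        ) from None
    return [
        f"--ntasks={preset['cores']}",  # number of tasks (cores) requested
        f"--mem-per-cpu={preset['mem_per_core_gb']}G",  # memory per core
    ]
```

Keeping the presets in one table means a researcher only has to choose a job size, while whoever maintains the cluster can adjust the underlying numbers in a single place.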
Analyzing Cam-CAN Data in MATLAB
After completing the initial processing of Cam-CAN neuroimaging data with an AA pipeline, our researchers can apply statistical and machine learning techniques to make inferences and derive insights. For example, some researchers use Statistics and Machine Learning Toolbox™ to try to predict each participant’s age from the large variety of brain data, in order to determine which brain features are most important for predicting age. Other researchers use multivariable linear regression and moderation analysis to try to find out which lifestyle factors allow some people to maintain their cognitive abilities into old age, despite the dramatic changes in their brains illustrated in Figure 1. One study showed that activities such as sports, hobbies, or social activities undertaken in mid-life made a unique contribution to predicting late-life cognitive abilities, over and above education. Furthermore, the more of these activities an older person had engaged in earlier in life, the less dependent their cognitive health was on their brain structure (Figure 3). This suggests that the brain can functionally adapt to age-related structural changes and that mid-life activities are particularly important for this adaptability, potentially allowing people to maintain their independence longer into old age.
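For readers unfamiliar with moderation analysis, the generic form of such a model is a linear regression with an interaction term. The following is a textbook sketch, not the model fitted in the study: the variable names are placeholders, and covariates such as age and education are omitted for brevity.

```latex
\begin{align}
\text{cognition}_i &= \beta_0 + \beta_1\,\text{brain}_i + \beta_2\,\text{activity}_i
  + \beta_3\,(\text{brain}_i \times \text{activity}_i) + \varepsilon_i \\
\frac{\partial\,\text{cognition}_i}{\partial\,\text{brain}_i}
  &= \beta_1 + \beta_3\,\text{activity}_i
\end{align}
```

The second line shows why the interaction coefficient carries the moderation effect: the dependence of cognition on brain structure is itself a function of activity. With a positive $\beta_1$, a negative $\beta_3$ would correspond to the pattern described above, in which cognition depends less on brain structure for people who engaged in more mid-life activities.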
Neuroscience is a very active area for cross-school research at Cambridge University, with a well-established Interdisciplinary Research Centre. Members of the local Cam-CAN team have published numerous papers based on data in the Cam-CAN repository. Furthermore, anonymized versions of the data are available on request, and have been downloaded by hundreds of scientists around the world.
We are currently seeking additional funding to run follow-up tests on participants in the original group, as they grow older, to provide longitudinal data. We are also combining the Cam-CAN data with similar brain imaging cohorts across Europe, growing the data from hundreds to tens of thousands of brain scans. This increased sample size is important for analyzing, for example, the role of genetics, based on samples donated by volunteers. Understanding healthy aging is also important for understanding “unhealthy” aging, such as Alzheimer’s disease and other forms of dementia.