Cancer patients receiving chemotherapy- or immunotherapy-based treatments must undergo regular CT and PET scans, and in some cases new biopsies, to evaluate the efficacy of the treatment. Flow cytometry, a method for identifying circulating tumor cells (CTCs) via a simple blood test, is much less invasive than scans and biopsies.
In flow cytometry, cells are examined one by one as they pass through a small opening in the instrument. Traditional flow cytometry requires fluorescent labeling of the cells, which can affect cellular behavior and compromise viability. Imaging flow cytometers do not require labels, but at speeds above 2,000 cells per second their cameras produce blurred images, making it impractical to screen a cell population large enough to find rare abnormal cells.
Our group in the photonics lab at UCLA has developed a time stretch quantitative phase imaging (TS-QPI) system that enables accurate classification of large sample sizes without biomarker labels (Figure 1). This system combines imaging flow cytometry, photonic time stretch technology (see sidebar), and machine learning algorithms developed in MATLAB®.
Our TS-QPI system generates 100 gigabytes of data per second, a firehose of data equivalent to 20 HD movies per second. For a single experiment, in which every cell in a 10-milliliter blood sample is imaged at almost 100,000 cells per second, the system generates 10 to 50 terabytes of data.
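A quick arithmetic check (sketched here in Python, using decimal gigabytes and terabytes) shows how the quoted per-second rate and per-experiment volume relate: at 100 GB/s, a 10 to 50 terabyte experiment corresponds to 100 to 500 seconds of acquisition.

```python
# Back-of-the-envelope check of the data volumes quoted above.
# All figures come from the article; units are decimal GB/TB.

DATA_RATE_GB_PER_S = 100                      # acquisition rate, GB/s
EXPERIMENT_TB_MIN, EXPERIMENT_TB_MAX = 10, 50  # per-experiment data volume

# Acquisition time implied by the quoted data volume
t_min = EXPERIMENT_TB_MIN * 1000 / DATA_RATE_GB_PER_S   # seconds
t_max = EXPERIMENT_TB_MAX * 1000 / DATA_RATE_GB_PER_S

print(f"Implied acquisition time: {t_min:.0f}-{t_max:.0f} s")  # 100-500 s
```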
Working in MATLAB with Image Processing Toolbox™, we developed a machine vision pipeline for extracting biophysical features from cell images. The pipeline also includes CellProfiler, an open-source cell image analysis package written in Python®. We extracted over 200 features from each cell, grouped into three categories: morphological features that characterize the cell’s size and shape, optical phase features that correlate with the cell’s density, and optical loss features that correlate with the size of organelles within the cell. Linear regression indicated that 16 of these features contained most of the information required for classification.
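The reduction from 200+ features to 16 can be illustrated with a linear-regression ranking on synthetic data. This Python sketch is not the authors' MATLAB pipeline; the data, seed, and the assumption that exactly 16 features carry signal are all illustrative.

```python
import numpy as np

# Illustrative sketch (synthetic data): rank a large feature set with a
# linear model and keep the most informative features, mirroring the
# reduction from 200+ biophysical features to 16 described above.
rng = np.random.default_rng(0)
n_cells, n_features = 1000, 200
X = rng.standard_normal((n_cells, n_features))

# Assume (for illustration only) that 16 features carry the label signal.
informative = rng.choice(n_features, size=16, replace=False)
y = (X[:, informative].sum(axis=1) > 0).astype(float)

# Least-squares linear regression of the centered label on all features
w, *_ = np.linalg.lstsq(X, y - y.mean(), rcond=None)

# Keep the 16 features with the largest absolute regression weights
top16 = np.argsort(np.abs(w))[-16:]
recovered = len(set(top16) & set(informative))
print(f"{recovered}/16 informative features recovered")
```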
Evaluating Machine Learning Algorithms
A principal benefit of MATLAB is the ability to test a wide variety of machine learning models in a short amount of time. We compared four classification algorithms from Statistics and Machine Learning Toolbox™: naive Bayes, support vector machine (SVM), logistic regression (LR), and a deep neural network (DNN) trained by cross entropy and backpropagation.
In tests conducted using samples with a known concentration of CTCs, all four algorithms (Bayes, SVM, LR, and DNN) achieved better than 85% accuracy (Figure 2). We further enhanced the accuracy, consistency, and balance between sensitivity and specificity of our machine learning classification by combining deep learning with global optimization of the receiver operating characteristics (ROC). Implemented in MATLAB, this novel approach increased classification accuracy to 95.5%.
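The authors' ROC optimization is global and coupled to deep-network training; as a minimal illustration of the underlying idea, this Python sketch sweeps the decision threshold of a scalar classifier score and picks the point that balances sensitivity and specificity. The score distributions are synthetic assumptions.

```python
import numpy as np

# Toy ROC threshold selection: for each candidate threshold, compute
# sensitivity (true positive rate) and specificity (true negative rate),
# then keep the threshold with the best balanced accuracy.
rng = np.random.default_rng(1)
scores = np.concatenate([rng.normal(0, 1, 500),    # non-CTC cells
                         rng.normal(2, 1, 100)])   # CTCs (rarer class)
labels = np.concatenate([np.zeros(500), np.ones(100)])

best_t, best_bal = None, -1.0
for t in np.unique(scores):
    pred = scores >= t
    sens = pred[labels == 1].mean()        # true positive rate
    spec = (~pred[labels == 0]).mean()     # true negative rate
    bal = 0.5 * (sens + spec)              # balanced accuracy
    if bal > best_bal:
        best_t, best_bal = t, bal

print(f"threshold={best_t:.2f}, balanced accuracy={best_bal:.2f}")
```

Balancing sensitivity against specificity in this way matters when the positive class (CTCs) is rare, since raw accuracy alone can be maximized by simply ignoring it.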
Accelerating Experiments with Parallel Computing
Because we were working with big data, our image processing and machine learning runs often took more than a week to complete. To shorten this turnaround time, we parallelized our analyses on a 16-core processor using Parallel Computing Toolbox™. With a simple parallel for-loop (parfor), we ran the analyses concurrently on all 16 cores, reducing the time needed from eight days to approximately half a day.
Modeling and Refining the Experimental Setup
In the photonics lab at UCLA, MATLAB is the workhorse for model development and data analysis. We used MATLAB to develop a model of the complete experimental setup, from the optics and laser pulses all the way to the classification of individual cells (Figure 3).
We used this model to guide enhancements to our setup. For example, to improve the signal-to-noise ratio we used the model to simulate specific gain coefficients. The simulation results showed us how and where changes to the setup could improve overall performance.
Modeling and simulating the system in MATLAB has saved us months of experimental time and is guiding our next steps. We are currently incorporating detailed models of individual cells into the overall system model. These models will enable us to make better-informed tradeoffs between spatial resolution and phase resolution based on the types of cells we are classifying.
The system we developed is not limited to classifying cancer cells. We have also used it to classify algae cells by their lipid content to gauge their suitability for biofuel production. The only significant change we made was to the surface coating within the channel that the cells flow through. We made no changes to the machine learning pipeline that underpins the analysis (Figure 4); it learned on its own that optical loss and phase features were more important than morphological features for classifying algae cells, whereas the reverse held true for cancer cells.
How Photonic Time Stretch Works
The TS-QPI system creates a train of laser pulses with widths measured in femtoseconds. Lenses, diffraction gratings, mirrors, and a beam splitter disperse the laser pulses into a train of rainbow flashes that illuminate the cells passing through the cytometer. Spatial information on each cell is encoded in the spectrum of a pulse. Optical dispersion then imposes a different delay on each wavelength component. Processing the signals optically in this way slows them enough to enable real-time digitization with an electronic analog-to-digital converter (ADC).
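The wavelength-to-time mapping follows from chromatic dispersion: each wavelength component is delayed by roughly dt = D x L x d-lambda, so a broadband pulse is stretched into a waveform long enough for a real-time ADC to sample. The numbers in this sketch (fiber dispersion, length, bandwidth, ADC rate) are illustrative assumptions, not the parameters of the UCLA system.

```python
# Sketch of the wavelength-to-time mapping behind photonic time stretch.
# Illustrative assumptions, not the actual system parameters:
D_ps_nm_km = -100.0   # fiber group-velocity dispersion, ps/(nm*km)
L_km = 10.0           # fiber length, km
bw_nm = 20.0          # optical bandwidth of one rainbow pulse, nm

# Total delay spread across the pulse's spectrum: |D| * L * bandwidth
stretch_ps = abs(D_ps_nm_km) * L_km * bw_nm

# Number of ADC samples captured across the stretched pulse
adc_rate_gsps = 50.0
samples = stretch_ps * 1e-12 * adc_rate_gsps * 1e9

print(f"stretched pulse: {stretch_ps / 1000:.0f} ns "
      f"-> {samples:.0f} ADC samples")
```

With these assumed values, a femtosecond-scale pulse is stretched to 20 ns, which a 50 GS/s converter resolves into about 1,000 samples, enough to digitize the spectrum-encoded image of one cell in real time.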
The relatively low number of photons collected during the short pulse width and the drop in optical power caused by the time stretch make it difficult to detect the resulting signal. We compensate for this loss in sensitivity by using a Raman amplifier. By slowing the signal and concurrently amplifying it, the system can simultaneously capture quantitative optical phase shift and intensity loss images for each cell in the sample.