Given a set of high-dimensional data, run_umap.m produces a lower-dimensional representation of the data for purposes of data visualization and exploration. See the comments at the top of the file run_umap.m for documentation and many examples of how to use this code.
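As a minimal sketch of a basic call: this assumes run_umap accepts a numeric matrix (rows are observations, columns are measurements) as its first argument and returns the low-dimensional embedding as its first output. The synthetic data and variable names are illustrative only; the header comments in run_umap.m remain the authoritative reference.

```matlab
% Synthetic data (illustrative only): two well-separated groups in 10 dimensions.
rng(0);
data = [randn(1000, 10); randn(1000, 10) + 4];

% Assumption: a numeric matrix is valid input and the embedding is the
% first output, with one row per input row.
reduction = run_umap(data);

% Plot the first two embedding coordinates.
scatter(reduction(:, 1), reduction(:, 2), 4, 'filled');
xlabel('UMAP 1');
ylabel('UMAP 2');
```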
This MATLAB implementation follows a very similar structure to the Python implementation from 2019, and many of the function descriptions are nearly identical.
Here are some additional tools we have added to our implementation:
2) Visual and computational tools for data group comparisons. Data groups can be defined either by running clustering on the data islands resulting from UMAP’s reduction or by external classification labels. We use a change quantification metric (QFMatch) which detects similarity in both mass & distance (described at https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5818510/) as well as an F-score for measuring overlap when the groups are different classifications for the same data. For visualizing data groups, we provide a dendrogram (described as QF-tree at https://www.nature.com/articles/s42003-019-0467-6) and sortable tables which show each data group’s similarity, overlap, false positive rate and false negative rate. The documentation in run_umap.m and UMAP_extra_results.m describes these and additional related tools provided.
3) A PredictionAdjudicator feature that helps determine how well one classification’s subsets predict another’s.
5) The ability to use neural networks either from MATLAB's "fitcnet" function or the Python package TensorFlow to learn from a training data set and provide a classification on new data to either compare against or merge with UMAP classification.
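The overlap measures named in item 2 can be illustrated with a small standalone calculation. This is a sketch using the standard textbook definitions of F-score, false positive rate and false negative rate. It is not taken from the package, whose exact definitions may differ, and the two label vectors are hypothetical.

```matlab
% Hypothetical membership of 10 events in one data group under two
% different classifications of the same data.
inGroupA = logical([1 1 1 1 0 0 0 0 0 0]);  % reference classification
inGroupB = logical([1 1 1 0 1 0 0 0 0 0]);  % classification being compared

tp = sum( inGroupA &  inGroupB);  % events placed in the group by both
fp = sum(~inGroupA &  inGroupB);  % events placed in the group only by B
fn = sum( inGroupA & ~inGroupB);  % events placed in the group only by A
tn = sum(~inGroupA & ~inGroupB);  % events placed in the group by neither

precision = tp / (tp + fp);
recall    = tp / (tp + fn);

% F-score: harmonic mean of precision and recall.
fScore = 2 * precision * recall / (precision + recall);

% Standard rate definitions (an assumption; see the note above).
falsePositiveRate = fp / (fp + tn);
falseNegativeRate = fn / (fn + tp);

fprintf('F-score %.3f, FPR %.3f, FNR %.3f\n', ...
    fScore, falsePositiveRate, falseNegativeRate);
```

An F-score of 1 means the two classifications select exactly the same events for the group, and values near 0 mean almost no overlap.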
Without the aid of any compression, this MATLAB UMAP implementation tends to be faster than the current Python implementation (version 0.5.2 of umap-learn). Due to File Exchange requirements, we only supply the C++ source code for the MEX modules we use to accelerate the computations. Running the command "run_umap" without arguments lets you choose between downloading the compiled MEX files immediately and building them yourself from the C++ source code and build script that we provide. See the comments on the fast_approximation argument in the run_umap.m file for further speedups. As examples 13 to 15 show, you can test the speed difference between the two implementations on your own computer by setting the 'python' argument to true.
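A rough timing comparison might look like the sketch below. It assumes that run_umap accepts a numeric matrix, that a Python installation with umap-learn is reachable from MATLAB, and that the 'python' argument is passed as a name-value pair. The synthetic data is illustrative, and the elapsed times include any plotting or start-up overhead, so treat the numbers as indicative only. Examples 13 to 15 in run_umap.m are the authors' own versions of this test.

```matlab
% Synthetic data (illustrative only).
rng(0);
data = [randn(1000, 10); randn(1000, 10) + 4];

% Time the MATLAB implementation.
tic;
run_umap(data);
matlabSeconds = toc;

% Time the Python implementation (requires Python with umap-learn).
tic;
run_umap(data, 'python', true);
pythonSeconds = toc;

fprintf('MATLAB: %.1f s, Python: %.1f s\n', matlabSeconds, pythonSeconds);
```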
Changing the optional 'qf_tree' argument requires the Bioinformatics Toolbox.
This implementation is a work in progress. It has been looked over by Leland McInnes, who in 2019 described it as "a fairly faithful direct translation of the original Python code". We hope to continue improving it in the future.
Provided by the Herzenberg Lab at Stanford University.
We appreciate any and all help in finding bugs. Our priority has been determining whether our concepts for using UMAP supervised templates and exhaustive projection pursuit are suitable for research publications in flow cytometry.