Introduction
Principal Component Analysis (PCA) is a classic among the many methods of multivariate data analysis. Invented in 1901 by Karl Pearson, the method is mostly used today as a tool for exploratory data analysis and dimension reduction, but also for building predictive models in machine learning.
Step 1: Centre and Standardize
Many multivariate methods begin by removing the influence of location and scale from the variables in the raw data. The standardized matrix Z, also commonly known as the z-scores of X, is a transformation of X whose columns are centered to have mean 0 and scaled to have standard deviation 1 (unless a column of X is constant, in which case that column of Z is constant at 0). Strictly speaking, z-scores are based on population parameters, whereas the analogous calculation based on the sample mean and sample standard deviation is Student's t-statistic.
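The transformation can be sketched in plain Python as follows. This is a minimal illustration, not the reference solution to the task: the function name `standardize`, its `ddof` parameter, and the `(Z, mu, sigma)` return value are assumptions, and the structure fields the task asks for are not reproduced here. Each entry is computed as z_ij = (x_ij - mu_j) / sigma_j, and a constant column is detected from the data itself (max equals min) rather than by testing sigma == 0.

```python
import math

def standardize(X, ddof=1):
    """Centre and scale each column of X (a list of equal-length rows).

    Hypothetical helper. Returns (Z, mu, sigma).
    ddof=1 divides by n-1 (sample standard deviation, MATLAB's std default);
    ddof=0 divides by n (population form).
    """
    n = len(X)
    p = len(X[0])
    # Column means.
    mu = [sum(row[j] for row in X) / n for j in range(p)]
    sigma = []
    for j in range(p):
        col = [row[j] for row in X]
        if max(col) == min(col):
            # Constant column: decide from the data, not from sigma == 0,
            # because round-off can leave a tiny nonzero sigma.
            sigma.append(0.0)
        else:
            ss = sum((x - mu[j]) ** 2 for x in col)
            sigma.append(math.sqrt(ss / (n - ddof)))
    # Constant columns map to 0; all others are centred and scaled.
    Z = [[0.0 if sigma[j] == 0.0 else (row[j] - mu[j]) / sigma[j]
          for j in range(p)]
         for row in X]
    return Z, mu, sigma
```

For example, `standardize([[1, 5], [2, 5], [3, 5]])` centres the first column to `-1, 0, 1` and sets the constant second column to zeros.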
Task
Write a function to centre and standardize the input matrix X, returning as the output a structure with the following fields:
Your definition of constant (or invariant) data generated with rand is problematic. If you increase the size of the data (n = 1000, n = 10000, ...), the deviations can always grow, so what threshold should be used for sigma? I think this artifact cannot occur with real data, can it?
Nice solution that looks pretty nifty, but it actually produces an unsafe result, just like Matlab's own zscore. It passed on its first run through the test suite but fails on some repeats. I've now added a second pass with a known random seed.
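One plausible failure mode behind these comments (the comments do not spell out exactly what fails) is that a column whose entries are all the same double can still produce a tiny nonzero standard deviation, because the computed mean is rounded. Dividing by such a sigma can give z-values of order 1 instead of 0. The sketch below, with hypothetical helpers `naive_sigma` and `is_constant`, contrasts the naive computation with a direct range check. The `tol` parameter plays the role of the threshold asked about above; choosing a nonzero value is a judgment call, not something the problem specifies.

```python
import math

def naive_sigma(col):
    """Sample standard deviation computed directly in floating point (hypothetical helper)."""
    n = len(col)
    mu = sum(col) / n  # rounded mean; may differ slightly from every entry
    return math.sqrt(sum((x - mu) ** 2 for x in col) / (n - 1))

def is_constant(col, tol=0.0):
    """Treat a column as constant if its range is within tol (hypothetical helper)."""
    return max(col) - min(col) <= tol

col = [0.1] * 10          # every entry is the same double
s = naive_sigma(col)      # tiny, and on IEEE 754 doubles typically not exactly zero
print(s, is_constant(col))
```

Testing constancy on the data (range within `tol`) avoids depending on whether round-off happens to leave sigma exactly zero.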