Vectorizing a repeated regression
Show older comments
Hi All,
My apologies. Yet another question on how to vectorize a calculation. What I am trying to do, is recreate a script I wrote 15 years (!) ago in IDL, in Matlab, full description of the why and how here https://doi.org/10.1080/01431160500497119.
In short: I am bootstrapping a stepwise regression model, selecting the best predictor on statistics within the bootstrap. The current script results in the sort of output I am looking for. However, running it takes over an hour for just one response factor & one dataset. Taking that I have 10 response factors, and several datasets in this one study alone, I need this to become a lot quicker so it can become part of my standard data processing.
I know vectorizing is the way to go, but being a newbe, I cannot get my head around how to handle this. Someone willing to lend me a hand? I think that the actual regression is the one to be optimized, vectorizing for the "band" loop (Is that how you would refer to it?):
model = fitglm(predictors(train, [selectedbands,band]), response(train));
stats(thispredictor,band) = model.Rsquared.Ordinary;
I am not sure whether it is possible though: There are about 2000 wavebands in there. In other words: for each bootstrap iteration, 2k regressions are run. And I am running 10k iterations for each predictor.
The process it goes through (Full function in the attached file):
% Create the regression model one predictor at a time,
% as I want to bootstrap the sensitivity of each predictor
for thispredictor = 1:maxpredictors
% Bootstrap loop
for iteration = 1:iterations
% create the statistics for each iteration
stats = zeros(maxpredictors,numbands);
outdata = zeros(numbands);
% create a subset for training and testing
train = datasample(datarange, numsamples);
% Correlation at each waveband
for band = 1:numbands
% Set the dataset
predictorset = predictors(train, [selectedbands,band]);
responseset = response(train);
model = fitglm(predictors(train, [selectedbands,band]), response(train));
stats(thispredictor,band) = model.Rsquared.Ordinary;
end %band loop
% Select the best band for the model
[maxrsq,maxband] = max(stats(thispredictor,:));
bandcount(thispredictor,maxband) = bandcount(thispredictor,maxband)+1;
end %iterations loop
[frequency, peakband] = max(bandcount(thispredictor,:));
selectedbands = [selectedbands,peakband];
end %predictors loop
Accepted Answer
More Answers (0)
Categories
Find more on Descriptive Statistics in Help Center and File Exchange
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!