Tips to eliminate FOR loops and vectorize code
Greetings - I am looking for philosophical help on how best to vectorize the code shown below and eliminate its numerous FOR loops. As it stands, this simulation takes about 8 hours to run; the sample-selection process and the curve fit are the big slowdowns. Any hints or paradigm shifts in the approach are welcome. I have commented the code to help you understand what I am trying to do. I had hoped to create the multidimensional array of samples without loops, but the need to choose different sample sizes from the DBH ranges pushed me back into using FOR loops. Thanks!
%%Script to run ASMA analysis
tic
clear %('clear all' also flushes compiled-function caches and slows reruns)
%get data
load fakeforest_v2_array.mat %array of dbh x biomass length 100,000
%simulation parameters
nums=10:20:1000; %number of sample trees per pull
nsim=1000; %number of simulations per set nums; Ideally this should be 10,000
%set up dbh bins
edges=[0 15 20 30 40 50 60 70 max(fakeforest(:,1))+1];
bins=discretize(fakeforest(:,1),edges);
fakeforest=[bins fakeforest];
groups=unique(bins);
% Distribution array: each row gives the percentage of samples to pull
% from each dbh category (left-weighted, right-weighted, and other mixes),
% used to determine the number of samples drawn per category
distros= [100 0 0 0 0 0 0 0;
33 67 0 0 0 0 0 0;
17 33 50 0 0 0 0 0;
10 20 30 40 0 0 0 0;
7 13 20 27 33 0 0 0;
5 10 14 19 24 29 0 0;
4 7 11 14 18 21 25 0;
3 6 8 11 14 17 19 22;
67 33 0 0 0 0 0 0;
50 33 17 0 0 0 0 0;
40 30 20 10 0 0 0 0;
33 27 20 13 7 0 0 0;
29 24 19 14 10 5 0 0;
25 21 18 14 11 7 4 0;
22 19 17 14 11 8 6 3;
0 0 0 0 0 0 0 100;
0 0 0 0 0 0 33 67;
0 0 0 0 0 17 33 50;
0 0 0 0 10 20 30 40;
0 0 0 7 13 20 27 33;
0 0 5 10 14 19 24 29;
0 4 7 11 14 18 21 25;
3 6 8 11 14 17 19 22];
% set up output arrays
UncertaintyM=zeros(size(distros,1),length(nums));
UncertaintyU=zeros(size(distros,1),length(nums));
UncertaintyL=zeros(size(distros,1),length(nums));
% start to sample the 'forest'
for ns=1:length(nums) %loop for each sample size above
n=nums(ns);
for i=1:size(distros,1) %one pass per sample distribution (row of distros)
forestsample=nan(n,2,nsim); %initialize multidimensional array to hold sample
nsamplegroup=distros(i,:);
nsamplegroup=round(nsamplegroup./100.*n);
for sim=1:nsim
for g=1:length(groups)
grptemp=fakeforest(fakeforest(:,1)==g,2:3);
%datatmp=datasample(grptemp,nsamplegroup(g),'Replace',false);
datatmp=datasample(grptemp,nsamplegroup(g)); %with replacement: for large n, 'Replace',false fails because a bin has too few trees
%the following ugliness is to deal with the odd cases of
%distributions
if isempty(datatmp) && g==1
cnt=1;
elseif nsamplegroup(g)>0 && g==1
forestsample(1:size(datatmp,1),:,sim)=datatmp;
cnt=length(datatmp)+1;
elseif size(datatmp,1)==1
forestsample(cnt,:,sim)=datatmp;
cnt=cnt+size(datatmp,1);
else
forestsample(cnt:cnt+size(datatmp,1)-1,:,sim)=datatmp;
cnt=cnt+size(datatmp,1);
end
end
cnt=0;
end
%fit the curves and calculate the difference
% fit is 'exp(a+b.*log(x))' The user function below just loops through the
% sets of data and fits the equation
[R,G]=ASMACurvesfit(nsim,forestsample(:,1,1:nsim),forestsample(:,2,1:nsim));
%coeffs=coeffExtract(R,nsim,2); %In case I ever need the coefficients
uncert=zeros(nsim,1);
for r=1:nsim
uncert(r)=abs((sum(R{r}(fakeforest(:,2)))-totalbiomass)/totalbiomass);
end
uncertM=prctile(uncert,50);
uncertU=prctile(uncert,97.5);
uncertL=prctile(uncert,2.5);
UncertaintyM(i,ns)=uncertM;
UncertaintyU(i,ns)=uncertU;
UncertaintyL(i,ns)=uncertL;
end
end
toc
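One low-risk change worth noting (a sketch only, assuming `fakeforest` is laid out as in the script): the logical mask `fakeforest(:,1)==g` is recomputed `nsim` times for every distribution and sample size, even though the group subsets never change. They can be extracted once before the loops.

```matlab
% Extract each DBH group's [dbh biomass] rows once, up front, instead of
% recomputing the logical mask inside the nsim loop:
grpdata = cell(length(groups),1);
for g = 1:length(groups)
    grpdata{g} = fakeforest(fakeforest(:,1)==g, 2:3);
end
% ...then inside the sim loop, replace the masked indexing with:
%   datatmp = datasample(grpdata{g}, nsamplegroup(g));
```

This turns a full scan of the 100,000-row array per group per simulation into a single cell lookup.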
3 Comments
dpb
on 17 Jun 2017
It takes a fair amount of digging to wrap one's head around the actual code, but the "philosophical answer" :) is to start from the innermost loop, see what can be vectorized or refactored there, then work your way outward.
Is the outermost loop simply there to see the results' sensitivity to the sample size, I'm guessing? If so, I'd probably start by factoring out that level: make it a loop outside a computational function, with the function just returning the results for a given sample size.
That starts to reduce the complexity of the function and may make further factorization easier to see. Several smaller pieces are usually far easier to deal with than one large code section.
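That refactoring could look like the following sketch; `runASMAForSize` is a hypothetical function name wrapping the body of the script's current `ns` loop (the distribution loop, sampling, and curve fits):

```matlab
% Outer driver: one call per sample size, with the heavy work inside
% a function that returns the uncertainty percentiles for that size.
for ns = 1:length(nums)
    [UncertaintyM(:,ns), UncertaintyU(:,ns), UncertaintyL(:,ns)] = ...
        runASMAForSize(nums(ns), distros, fakeforest, nsim, totalbiomass);
end
```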
CraigW
on 19 Jun 2017
Sagar Doshi
on 20 Jun 2017
I would also suggest running the Profiler on your code to identify where the slowdown really occurs, and then come up with a way to vectorize that particular section of the code.
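For example (`ASMAscript` below is a placeholder for whatever the script file is actually called):

```matlab
profile on        % start collecting timing data
run('ASMAscript.m')
profile viewer    % opens a report showing time spent per function and line
```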
Answers (0)