I'm working on calculating a 4-D histogram, and I'm trying to optimize my code to run as fast as possible.
First, I define xx, yy, zz, and maxis to be the histogram edge values, and preallocate specdata=zeros([numel(xx) numel(yy) numel(zz) numel(maxis) ],'single');
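As a concrete sketch of that setup: the edge values below are made up purely for illustration (the real ranges and bin widths depend on the dataset), and only the preallocation line is taken from my code.

```matlab
% Hypothetical edge vectors; the values are illustrative assumptions only.
xx    = -10:1:10;     % x edges
yy    = -10:1:10;     % y edges
zz    = 0:2:40;       % z edges
maxis = 0:0.5:50;     % m/q edges

% One counter per (x, y, z, m/q) edge combination, stored as single.
specdata = zeros([numel(xx) numel(yy) numel(zz) numel(maxis)], 'single');
```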
My actual code segment is:
fid = fopen([pathname filename], 'r', 'ieee-be');
% Waitbar
wait_handle = waitbar(0, 'Gridding data');
flag = 1;
ions = 0;
num_loops = 0;
while flag == 1
    % Read up to read_max ions (read_max = 1e5), four values per ion
    [buffer, count] = fread(fid, [4, read_max], 'single');
    for inner_loop = 1:round(count/4)
        i = find(buffer(1,inner_loop) >= xx, 1, 'last');    % Grid into x
        j = find(buffer(2,inner_loop) >= yy, 1, 'last');    % into y
        k = find(buffer(3,inner_loop) >= zz, 1, 'last');    % z
        l = find(buffer(4,inner_loop) >= maxis, 1, 'last'); % m/q
        % Use subscripts here: specdata([i,j,k,l]) would instead
        % increment four unrelated linear-index elements.
        specdata(i,j,k,l) = specdata(i,j,k,l) + 1;          % Add to specdata
    end
    ions = ions + (count/4);
    num_loops = num_loops + 1;
    % num_ions is the total ion count, assumed to be defined earlier
    waitbar(ions/num_ions, wait_handle);
    % Check for end-of-file
    if count < read_max*4
        flag = 0; % Loop finished
    end
end
close(wait_handle);
fclose(fid);
For a data file of 1e6 four-element data points, this takes ~40 seconds. According to the profiler, about half of that time is spent on the line that increments specdata, and the rest is spread across the four find() lines.
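A possible vectorized replacement for the inner loop, which I have not benchmarked, is sketched below. It assumes histc (to bin a whole chunk per axis at once) and accumarray (to count the resulting subscripts) are acceptable substitutes for the per-ion find() calls. The edges and the buffer here are small made-up stand-ins for the real edges and for one fread() chunk. Appending Inf to each edge vector is my way of keeping the original behaviour where a value at or beyond the last edge lands in the last bin.

```matlab
% Hypothetical small edges and a fake 4-by-N buffer (one column per ion).
xx = 0:1:4;  yy = 0:1:4;  zz = 0:1:4;  maxis = 0:10:100;
specdata = zeros([numel(xx) numel(yy) numel(zz) numel(maxis)], 'single');
buffer = single([0.5  1.5  1.7  9.0;    % x
                 0.5  2.5  2.9  1.0;    % y
                 0.5  3.5  3.1  1.0;    % z
                 5    15   17   20]);   % m/q

% Bin every ion along each axis in one call. The second output of histc
% is the index of the last edge that is <= the value (0 if none is).
[~, i] = histc(double(buffer(1,:)), [xx(:);    Inf]);
[~, j] = histc(double(buffer(2,:)), [yy(:);    Inf]);
[~, k] = histc(double(buffer(3,:)), [zz(:);    Inf]);
[~, l] = histc(double(buffer(4,:)), [maxis(:); Inf]);

% Drop ions that fall below the first edge (or are NaN) on any axis.
valid = i > 0 & j > 0 & k > 0 & l > 0;
subs  = [i(valid).' j(valid).' k(valid).' l(valid).'];

% Count how many ions share each (i,j,k,l) subscript and add to the grid.
counts   = accumarray(subs, 1, size(specdata));
specdata = specdata + counts;
```

In the real loop, the block from the histc calls onward would run once per fread() chunk in place of the for loop. Newer MATLAB releases also provide discretize for the binning step, although it marks out-of-range values with NaN rather than 0, so the validity check would need adjusting.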
I'm planning to move from this 1e6-point play dataset to real datasets of 1e9+ points. Assuming the run time scales linearly, 40 seconds per 1e6 points becomes about 4e4 seconds, or roughly 11 hours, with this code.
I've already optimized as well as I know how, using the tricks in http://www.mathworks.com/company/newsletters/news_notes/june07/patterns.html
Has anyone thought about how to grid data into an N-D histogram efficiently? Any suggestions on how to speed this up? I'm a poor materials scientist learning computer science sink-or-swim style!
Thanks for the help.