File Merge Speed Improvement

Travis Knepp on 21 Jul 2011
I have a script that merges several data sets containing data (shocker!), all of which have different time stamp frequencies. After it runs there is a time stamp for every minute over the past year, with either data values or NaNs depending on whether the corresponding data set had anything written at that time. My overall for loop has this structure:
for n = time1:timeEnd % Loop through each minute.
[dif, row] = min(abs(data(n,1) - n))
end
Now, dif should always be zero (this is not a problem). Can this be made faster? I think my main time sink is the for loop, not the [dif, row]... line. I have a few questions:
1. Does anyone have a suggestion for speeding this up in MATLAB?
2. Does anyone have a suggestion of another language (please don't say C, though I know it is faster) that will be faster?
3. Does anyone know how the [dif, row]... line actually works? Can I do the same thing with list comprehensions in Python (but faster)? This is really me wondering how the [dif, row] line compares to list comprehensions in Python and how MATLAB does it internally.

Answers (3)

Jan on 21 Jul 2011
You use this to find an equal value:
[dif, row] = min(abs(temp(:,1) - data(n,1)));
But what you actually want is the faster equality test, which gives a logical mask instead of a row number:
row = (temp(:, 1) == data(n, 1));
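To make the difference between the two lines concrete, here is a minimal sketch. The sample values in temp and data are made up for illustration and the variable name mask is hypothetical (it stands in for row in the line above). With two outputs, min returns the smallest value and the position of its first occurrence, while the equality test returns a logical vector that is true where the time stamps match exactly.

```matlab
% Hypothetical sample values, for illustration only.
temp = [1 -999; 2 -999; 3 -999; 4 -999];   % column 1: merged time stamps
data = [3 42];                             % one record with time stamp 3
n = 1;

% Two-output min: smallest absolute difference and the row where it
% first occurs.
[dif, row] = min(abs(temp(:,1) - data(n,1)));

% Equality test: logical column vector, true where the stamps are
% exactly equal.
mask = (temp(:,1) == data(n,1));

% The mask can be used for indexing in the same way as the row number.
temp(mask,:) = data(n,:);
```

The mask relies on exact floating-point equality, so it is worth checking any(mask) before the assignment if the time stamps are computed rather than copied.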
If I understand your problem correctly, this might be faster:
[dummy, index] = sort(data(:, 1));
temp = data(index, :);
And if you want to check the completeness of the date numbers: "if any(diff(dummy) > limit), error..." with a suitable limit.
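The sort and the completeness check can be tried on a small example. This is only a sketch: the sample data and the value of limit are assumptions, and a flag is set instead of calling error so that the snippet runs to the end. Note that sorting only reorders the rows that exist; it does not create rows for missing time stamps.

```matlab
% Hypothetical sample: column 1 holds unsorted time stamps (in minutes),
% column 2 holds measurements.
data = [3 30; 1 10; 2 20; 6 60];
limit = 1;   % assumed largest acceptable gap between consecutive stamps

% Sort the time stamps and reorder all rows accordingly.
[dummy, index] = sort(data(:, 1));
temp = data(index, :);

% diff gives the spacing between consecutive sorted stamps; any spacing
% above the limit means at least one stamp is missing.
gapFound = any(diff(dummy) > limit);
if gapFound
    fprintf('Gap larger than %g found in the time stamps.\n', limit);
end
```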

Sean de Wolski on 21 Jul 2011
I think your for-loop is buggy and probably not doing what you want (this is a guess):
abs(data(n,1) - n)
will be a scalar value. Hence dif will be that value and row will be 1. Also, you're only storing the last iteration, so the whole loop only gives you dif and row for n = timeEnd.
Can you explain your end goal and give sample data/operation/result? This is begging to be speed-optimized with bsxfun.
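Both points can be checked on made-up numbers. The first part below shows that min applied to a scalar always returns index 1. The second part is a rough sketch of what a bsxfun-based comparison could look like; it is an assumption about the intended approach rather than a worked-out solution, and the names stamps, times, D, difAll and rowAll are hypothetical.

```matlab
% Part 1: min of a scalar. The values are hypothetical.
data = [10; 20; 30];
n = 2;
% data(n,1) - n is a single number, so min has only one element to choose.
[dif, row] = min(abs(data(n,1) - n));

% Part 2: all differences at once with bsxfun (hypothetical time stamps).
stamps = [1; 2; 3; 4; 5];   % merged time stamps, one per row
times  = [4; 2];            % time stamps of the data to be inserted
% Column minus row expands to a 5-by-2 matrix of pairwise differences.
D = abs(bsxfun(@minus, stamps, times'));
% Minimum along dimension 1: one value and one row index per data stamp.
[difAll, rowAll] = min(D, [], 1);
```

The difference matrix has one row per merged time stamp and one column per data record, so for a full year of minutes it can become very large.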

Travis Knepp on 21 Jul 2011
Ok, you are right. What I posted was really an abbreviated portion of my script. I should have posted the real thing. My apologies. Below is what I actually have written.
temp = ones(size(d1:step:d2,2),size(data,2))*-999;
temp(:,1) = d1:step:d2;
for n = 1:size(data,1)
    [dif, row] = min(abs(temp(:,1) - data(n,1))); % Diff should always be zero.
    if dif ~= 0
        fprintf('Error in Pandora NO2. \n');
        break
    end
    temp(row,:) = data(n,:);
end
save([md,'PAN_NO2.mat'],'temp');
Here d1 and d2 are the first and last days in the data set, respectively, temp is the temporary array holding all merged data, and step is the one-minute increment.
So, temp(:,1) holds merged time stamps (with 1-min resolution). As I step through "data"'s time stamps I am finding the corresponding time stamp in "temp" and inserting the data into "temp".
I don't know why I didn't post this originally, it's not as confusing as I thought.
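Since temp(:,1) is a regular grid starting at d1 with spacing step, the row for each time stamp can also be computed directly instead of searched for. The following is a minimal sketch of that idea, not a tested replacement for the loop above: the values of d1, d2, step and data are made up (none are given in this thread), and the names t and row as used here are assumptions. Rounding is used so that small floating-point differences in the time stamps do not break the match.

```matlab
% Hypothetical sample values; the real d1, d2, step and data are not
% given in the thread.
step = 1/1440;                 % one minute, assuming time stamps in days
d1 = 734705;                   % assumed first time stamp
d2 = d1 + 9*step;              % assumed last time stamp (ten minutes total)
t = d1 + (0:9)' * step;        % the one-minute grid from d1 to d2
data = [t([2 5 9]), [1.5; 2.5; 3.5]];   % three records on the grid

% Same initialisation as in the loop version.
temp = ones(numel(t), size(data,2)) * -999;
temp(:,1) = t;

% Convert every time stamp to its row on the grid in one step.
row = round((data(:,1) - d1) / step) + 1;

% Counterpart of the "dif ~= 0" check, with a half-step tolerance.
if any(row < 1 | row > size(temp,1)) || ...
        any(abs(temp(row,1) - data(:,1)) > step/2)
    error('Time stamp outside or off the merged grid.');
end

% Insert all records at once.
temp(row,:) = data;
```

This avoids both the loop and the search over temp(:,1) for every record, at the cost of assuming that the grid really is evenly spaced.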
  1 Comment
Sean de Wolski on 21 Jul 2011
Could you provide (small) sample data for d1, d2, step and data so that the above is fully functional?
