How to streamline this code to average certain rows?

I have a three row column data: lon lat and Z. What I want to do is to round longitude and latitude to 2 digits. For example, 65.3796 will be rounded 65.38. Then the program will sort the data based on longitude and latitude, and average any rows with identical longitudes and latitudes. For example,
The below matrix:
65.39 23.56 10.5
66.70 25.36 6.7
66.70 25.36 7.8
Will become:
65.39 23.56 10.5
66.70 25.36 7.25
Here is my code. The problem is that I have several dozens of millions of data. It will take forever to complete. Anyone could help me create a more streamlined code, so that the program will be much faster?
I really appreciate your help!
clear all
close all
clc
load data/data_3row % D: lon lat Z
lon = D(:,1);
lat = D(:,2);
for i=1:length(lon)
lon(i,1) = round(lon(i,1)*100)/100;
lat(i,1) = round(lat(i,1)*100)/100;
end
D2 = [lon, lat, D(:,3)];
D3 = sortrows(D2, [1:2]);
D3 = [D3; 0 0 0];
[m n] = size(D3);
Ind1 = D3(1,1);
Ind2 = D3(1,2);
row = D3(1,:);
D4 = [];
n4 = 0;
for i=2:m
if D3(i,1)==Ind1 & D3(i,2)==Ind2
row = [row; D3(i,:)];
else
n4 = n4+1;
D4(n4,:) = [row(1,1), row(1,2), nanmean(row(:,3))];
Ind1 = D3(i,1);
Ind2 = D3(i,2);
row = D3(i,:);
end
end
save('data/data_3row_avg.mat', 'D4');

 Accepted Answer

Replace
for i=1:length(lon)
lon(i,1) = round(lon(i,1)*100)/100;
lat(i,1) = round(lat(i,1)*100)/100;
end
with
lon = round(lon, 2);
lat = round(lat, 2);
After that,
vals = D(:,3);
[ulon, ~, ulonidx] = unique(lon);
[ulat, ~, ulatidx] = unique(lat);
meanarray = accumarray([ulonidx, ulatidx], vals, [], @mean, nan);
valididx = find( ~isnan(meanarray) );
[lonidx, latidx] = ind2sub( size(meanarray), valididx );
means = meanarray(valididx);
D4 = [ulon(lonidx), ulat(latidx), means];

11 Comments

100% brilliant!
Thank you so much for your big help! This will save me a lot of CPU time.
BTW, if I want to round the longitude to 0.05 instead of 0.01, the below code should be the quickest way to execute? Am I right?
lon = round(lon.*2, 1)./2
Sorry to have so many questions! I promise this will be the last one on this thread.
In my simplified example, I only mentioned Z, but in reality Z is not a one column data, it has 20 columns. Do I have to loop meanarray and the following three lines of code 20 times? Is there a faster way to process it?
My apologies for forgetting to bring this up at the very beginning.
lon = D(:,1);
lat = D(:,2);
lon = round(lon*2)/2;
lat = round(lat*2)/2;
Z = D(:,3:end);
[G, glon, glat] = findgroups(lon, lat);
means = splitapply(@mean, Z, G);
D4 = [glon, glat, means];
Needs R2015b or later.
Many thanks!
It seems there is an error from the line "means = splitapply(@mean, Z, G);"
Error using vertcat
Dimensions of arrays being concatenated are not consistent.
Error in splitapply>localapply (line 257)
finalOut{curVar} = vertcat(funOut{:,curVar});
Error in splitapply (line 132)
varargout = localapply(fun,splitData,gdim,nargout);
Also, why doesn't @nanmean work?
I will need to check later; I am going to be away from my computer for a while.
Try
means = splitapply(@(M)mean(M,1), Z, G);
The difference against @mean is that in the case where a single row happened to be a group of its own, mean() would be applied to the row vector and that would return a scalar, whereas in the case where multiple rows were in a group, mean() would be applied to the 2D array that that would return a row vector with one column per column of input. Using the mean(M,1) forces it to always take the column-wise mean, even if there is only one row.
Working flawless!
Thank you so much for your big help! This will help speed up my project significantly!

Sign in to comment.

More Answers (0)

Products

Tags

Asked:

on 20 Sep 2018

Commented:

on 21 Sep 2018

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!