MATLAB Answers

specifying a stride length in ncread

Asked by Chad Greene
on 4 Oct 2016
Latest activity Answered by Rylan Dmello on 14 Oct 2016
I have a big 1.5 GB .nc file. Data loading is the slowest part of my processing, but I'm lucky that it will be sufficient to load only every Nth data point. Loading the whole file takes about 0.14 seconds:
tic
z = ncread('myfile.nc','z');
toc
Elapsed time is 0.142669 seconds.
which is about the same amount of time it takes when I specify which indices to load:
tic
z = ncread('myfile.nc','z',[1 1],[Inf Inf],[1 1]);
toc
Elapsed time is 0.156108 seconds.
And so it should be faster if I specify a "stride" of more than 1. But it actually takes much more time to load every 2nd datapoint:
tic
z = ncread('myfile.nc','z',[1 1],[Inf Inf],[2 2]);
toc
Elapsed time is 4.992349 seconds.
Increasing the stride length beyond 2 seems to bring data loading time back down, but I have to use a stride length of 8 or more to get any benefit at all. What gives? Any ideas for fixes?

  4 Comments

Partly a matter of the elegance of a one-line solution. Also, I have a few other big variables in the workspace, so memory is becoming an issue. I want to load the least amount of data necessary.
I also want to know, why does it take 35 times as long to read only a quarter of the data?
KSSV
on 5 Oct 2016
Have you tried the same with netcdf.getVar?
Oh, interesting idea. The issue persists!
tic
ncid = netcdf.open('myfile.nc');
z = netcdf.getVar(ncid,2,[1 1],[12444 12444],[1 1]);
toc
Elapsed time is 0.231038 seconds.
tic
ncid = netcdf.open('myfile.nc');
z = netcdf.getVar(ncid,2,[1 1],[12444/2 12444/2],[2 2]);
toc
Elapsed time is 4.881778 seconds.
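One detail worth keeping in mind when comparing the two interfaces: unlike ncread, the low-level netcdf.getVar function uses zero-based start indices, so a start of [1 1] skips the first row and column. The following is a minimal sketch of a strided low-level read that looks the variable up by name instead of hard-coding its ID. The function name getvar_strided is hypothetical and not from the original post, and the sketch assumes stride has one entry per dimension of the variable.

```matlab
function z = getvar_strided(filename, varname, stride)
% GETVAR_STRIDED  Hypothetical helper (not from the original post).
% Reads every stride(k)-th element along dimension k using the
% low-level netcdf package, which uses ZERO-based start indices.
% Assumes numel(stride) equals the number of dimensions of the variable.

ncid = netcdf.open(filename, 'NC_NOWRITE');
closer = onCleanup(@() netcdf.close(ncid));   % close the file even on error

varid = netcdf.inqVarID(ncid, varname);       % look up the variable ID by name
[~, ~, dimids] = netcdf.inqVar(ncid, varid);  % dimension IDs of the variable

% Query the length of each dimension.
dimlen = zeros(1, numel(dimids));
for k = 1:numel(dimids)
    [~, dimlen(k)] = netcdf.inqDim(ncid, dimids(k));
end

start = zeros(1, numel(dimids));              % zero-based: begin at the first element
count = floor((dimlen - 1) ./ stride) + 1;    % number of samples that fit in each dimension
z = netcdf.getVar(ncid, varid, start, count, stride);
end
```

With the file from the question, this would be called as z = getvar_strided('myfile.nc','z',[2 2]). It only makes the indexing explicit; it goes through the same strided read in the library, so it is not expected to be any faster.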

1 Answer

Answer by Rylan Dmello on 14 Oct 2016
 Accepted Answer

It looks like this issue is actually occurring in the underlying NetCDF C library that MATLAB uses, rather than in MATLAB itself. The issue was discussed on the NetCDF mailing list in 2013.
As an example, I downloaded the ‘test_echam_spectral.nc’ NetCDF file and entered the following commands at the MATLAB command prompt:
>> tic; z = ncread('test_echam_spectral.nc', 'xl'); toc; % Elapsed time is 0.035419 seconds
>> tic; z = ncread('test_echam_spectral.nc', 'xl', [1 1 1 1], [Inf Inf Inf Inf], [2 1 1 1]); toc; % Elapsed time is 0.424505 seconds
The strided read is about an order of magnitude (roughly 12 times) slower than the contiguous read. I was able to work around the issue by reading the whole array and then subsampling it with MATLAB’s built-in array indexing:
>> tic; z = ncread('test_echam_spectral.nc', 'xl'); z = z(1:2:192, :, :, :); toc; % Elapsed time is 0.041134 seconds
Note that this still takes a little more time than reading the whole array contiguously. However, it is much faster than using a strided read with the ‘ncread’ function.
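If holding the whole array in memory is a concern, the same read-then-subsample idea can be applied one contiguous block at a time. The sketch below is one possible way to do that, assuming a 2-D variable. The function name ncread_subsampled and the blockCols parameter are hypothetical, and whether this beats a single full read will depend on the file layout and the block size.

```matlab
function z = ncread_subsampled(filename, varname, stride, blockCols)
% NCREAD_SUBSAMPLED  Hypothetical helper (not from the original answer).
% Reads a 2-D variable in contiguous blocks of columns with ncread and
% keeps only every stride(1)-th row and stride(2)-th column, so that at
% most one full-resolution block is held in memory at a time.

info  = ncinfo(filename, varname);
nRows = info.Size(1);
nCols = info.Size(2);

rowIdx = 1:stride(1):nRows;            % rows to keep
nKeep  = numel(1:stride(2):nCols);     % total number of columns to keep

allocated = false;
filled = 0;                            % columns of z filled so far
for c0 = 1:blockCols:nCols
    nb    = min(blockCols, nCols - c0 + 1);
    block = ncread(filename, varname, [1 c0], [nRows nb]);   % contiguous read

    % Columns of this block whose global index lies on the stride grid.
    cols = c0:(c0 + nb - 1);
    keep = mod(cols - 1, stride(2)) == 0;
    sub  = block(rowIdx, keep);

    if ~allocated
        % Preallocate with the same class that ncread returned.
        z = zeros(numel(rowIdx), nKeep, 'like', sub);
        allocated = true;
    end

    n = size(sub, 2);
    z(:, filled + (1:n)) = sub;
    filled = filled + n;
end
end
```

For example, z = ncread_subsampled('myfile.nc','z',[2 2],1000) should give the same result as reading everything and indexing with z(1:2:end,1:2:end), while only ever holding 1000 full-resolution columns at once.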
