How to process NC file in matlab

I've monthly soil monisture NC file. This file contains monthly soil monisture data since 1948. I want to extract soil data within my shapefile in text file (Shapefile has also attached). I google alot to proccess NetCDF file in matlab find not solution, The ncdisp of my nc file is given below:
Format:
netcdf4_classic
Global Attributes:
_NCProperties = 'version=1|netcdflibversion=4.4.1.1|hdf5libversion=1.10.1'
Conventions = 'CF-1.0'
title = 'CPC Soil Moisture'
institution = 'NOAA/ESRL PSD'
dataset_title = 'CPC Soil Moisture'
history = 'Wed Oct 18 15:13:37 2017: ncks -d time,,-2 soilw.mon.mean.x.nc soilw.mon.mean.xx.nc
Wed Oct 18 15:12:08 2017: ncks -d time,,-3 soilw.mon.mean.nc soilw.mon.mean.x.nc
CPC Soil Moisture
Obtained on Nov 2004 from CPC's website
and written to netCDF by Cathy Smith 12/2004.
he CPC Global monthly soil moisture dataset is a 1/2 degree resolution grid from 1948 to the present.
The file is written in COARDS and CF compliant netCDF at NOAA ESRL/PSD https://www.esrl.noaa.gov/psd/
Converted to chunked, deflated non-packed NetCDF4 Jul 2014'
NCO = '4.6.9'
References = 'https://www.psl.noaa.gov/data/gridded/data.cpcsoil.html'
Dimensions:
lat = 360
lon = 720
time = 886 (UNLIMITED)
Variables:
lat
Size: 360x1
Dimensions: lat
Datatype: single
Attributes:
long_name = 'Latitude'
units = 'degrees_north'
actual_range = [8.98e+01 -8.98e+01]
standard_name = 'latitude'
axis = 'Y'
coordinate_defines = 'point'
lon
Size: 720x1
Dimensions: lon
Datatype: single
Attributes:
long_name = 'Longitude'
units = 'degrees_east'
actual_range = [2.50e-01 3.60e+02]
standard_name = 'longitude'
axis = 'X'
coordinate_defines = 'point'
soilw
Size: 720x360x886
Dimensions: lon,lat,time
Datatype: single
Attributes:
long_name = 'Model-Calculated Monthly Mean Soil Moisture'
missing_value = -9.97e+36
units = 'mm'
valid_range = [0.00e+00 1.00e+03]
dataset = 'CPC Monthly Soil Moisture'
var_desc = 'Soil Moisture'
level_desc = 'Surface'
statistic = 'Monthly Mean'
parent_stat = 'Other'
standard_name = 'lwe_thickness_of_soil_moisture_content'
cell_methods = 'time: mean (monthly from values)'
actual_range = [0.00e+00 1.00e+30]
time
Size: 886x1
Dimensions: time
Datatype: double
Attributes:
long_name = 'Time'
units = 'days since 1800-01-01 00:00:0.0'
delta_t = '0000-01-00 00:00:00'
avg_period = '0000-01-00 00:00:00'
standard_name = 'time'
axis = 'T'
bounds = 'time_bnds'
coordinate_defines = 'start'
prev_avg_period = '0000-00-01 00:00:00'
actual_range = [5.41e+04 8.10e+04]
I've tied
file='data.nc';
ncdisp(file)
long = ncread(file,'lon');
latt = ncread(file,'lat');
time = ncread(file,'time');
I want output of Soilw in this format in text file
Date Soilw
01/01/2015 10
01/02/2015 20
01/03/2015 0.5
Please help me to convert monthly netcdf datainto text file using matlab please?

7 Comments

You have Soilw data that is 720 x 360 x 886 where 886 is the number of time instants, covering 1948 to 2021. If I calculate correctly, you have one reading per month
You show some dates on output, but you only show a single scalar for each date. How do you want to summarize 720 x 360 readings into one reading?
Many thanks for your reply @Walter Roberson
I want to get a text file in output which contains two columns, one column is the date format (Any fomate, My date formate in output was just for mathswork community understanding), and the other column contain Soilw value corrensponding to that month?
You have
720 * 360
ans = 259200
over 250,000 different locations, each of which has a soilw for each month.
We can give the dates down the left column, fine -- but for each of those dates you have 259200 different Soilw values. Do you want to output a vector of 259200 different values for each date? Do you want to somehow summarize the different values down to a scalar? Are you given a particular latitude and longitude and the task is to find the nearest point and to give the Soilw values for the point? Are you given a particular latitude and longitude and the task is to interpolate to find the Soilw values for each date based upon the surrounding points (for the case where the given point is not exactly at one of the locations there is a reading for) ?
Our discussion is going interesting @Walter Roberson
Yes, you're right. My NC file contains 0.5*0.5 degree grid and there are 259200 different Soilw values as you kindly mentioned. I not want to get Soilw data at particularly point (Lat or Long). I want to find average value of Soilw points which fall within my Area of interest (polygon) shapefile (Attached in Zip with this question).
By defination average will finds all Soilw values falls within my shapefile and then will calculate Average of Soilw value within my Shapefile. This average will be single value of Soilw falling within my Shapefile of single date.
I do not seem to find you in my list of customers of my company who have signed 4-Hour Response Time Contracts with us ?
@Walter Roberson Hahaha, you can signed with your company 4 Hour Response Time Contract. Actually Time is importantance for me, I'm end user of the NC files and all time of the research spent in proccessing the raw dataset.. I would like to thank you very much for your timely response..

Sign in to comment.

 Accepted Answer

file = 'data.nc';
shapefile = 'lahore.shp';
long = ncread(file, 'lon');
latt = ncread(file, 'lat');
time = ncread(file, 'time');
soilw = ncread(file, 'soilw');
S = shaperead(shapefile);
%ncread sometimes returns the transpose of what we expect
if size(soilw, 1) ~= length(lat)
soilw = permute(soilw, [2 1 3]);
end
Ntime = length(time);
Date = datetime('1800-01-01 00:00:00') + days(time(:));
mask = inpolygon(lat, lon, S.Y, S.X); %caution, lat is Y not X !
Soilw = zeros(Ntime, 1);
for K = 1 : Ntime
sl = soilw(:,:,K);
Soilw(K) = mean( sl(mask) );
end
output = table(Date, Soilw);

17 Comments

Many thanks for your kind reply @Walter Roberson
I' getting the error in
inpolygon(lat, lon, S.Y, S.X)
Error is
Matrix dimensions must agree.
Error in inpolygon (line 76)
mask = (x >= min(xv)) & (x <= max(xv)) & (y>=min(yv)) & (y<=max(yv));
I've checked inpolygon(lat, lon, S.Y, S.X) is giving a row vector of size 1*6778 and nature of dataset as double. Please resolve this error dear?
I just noticed that you are using r2013a. It will take me extra time to install a version of MATLAB that old to check the details of that old version of inpolygon()
Many Thanks for your reply @Walter Roberson
I'm currently using Matlab 2019a version and the error I shared is comming using Matlab 2019a version dear
Please respond sir?
I am responding to the person who posted the Question, who is using R2013a.
Unfortunately I do not have R2013a installed; it will take me time to install it.
I'm running your kind code on Matlab R2019a version. Yes, you are right I posted this question when I was using Rmatlab2013a version. I've both 2013a and 2019a versions installs on two different PC. The error I've got is using Rmatlab2019a version. So please solve this question
file = 'data.nc';
shapefile = 'lahore.shp';
long = ncread(file, 'lon');
latt = ncread(file, 'lat');
time = ncread(file, 'time');
soilw = ncread(file, 'soilw');
S = shaperead(shapefile);
%ncread sometimes returns the transpose of what we expect
if size(soilw, 1) ~= length(lat)
soilw = permute(soilw, [2 1 3]);
end
Ntime = length(time);
Date = datetime('1800-01-01 00:00:00') + days(time(:));
minlat = min(lat); maxlat = max(lat);
minlon = min(long); maxlon = max(long);
Latbox = [minlat maxlat maxlat minlat minlat];
Lonbox = [minlon minlon maxlon maxlon minlon];
mask = inpolygon(Latbox, Lonbox, S.Y, S.X); %caution, lat is Y not X !
Soilw = zeros(Ntime, 1);
for K = 1 : Ntime
sl = soilw(:,:,K);
Soilw(K) = mean( sl(mask) );
end
output = table(Date, Soilw);
I am getting NaN dataset in output of soilw column as shown in the picutre below:
I've checked there is dataset in NC file within the domain of my Shapefile. See below please
Please check, why I am getting Soilw column in output variable in NaN for all months since 1948? @Walter Roberson
I do not know. It is somewhat frustrating to be debugging this without the NC file to test with.
Many thanks for your reply. I definately sure, this question will help new users to process NC file in matlab and @Walter Roberson your answer will help number of researchers to solve the same problem I'm facing to process NC file in matlab.
Unfortunately, the size of NC file is more than the uploading data limit set on mathswork question, that why I'm sending your URL of NOAA plateform where I download this monthy means soil monsiture nc file.
I'm getting the error, I shared above by using your kind code from the NC file recevied from this data source. Please resolve this question please!
Tested.
Your shape turns out to only touch on 3 locations in the data.
file = 'soilw.mon.mean.v2.nc';
shapefile = 'Shapefile/lahore.shp';
long = ncread(file, 'lon');
latt = ncread(file, 'lat');
time = ncread(file, 'time');
soilw = ncread(file, 'soilw');
S = shaperead(shapefile);
XY = rmmissing([S.X(:), S.Y(:)]);
%ncread sometimes returns the transpose of what we expect
if size(soilw, 1) ~= length(latt)
soilw = permute(soilw, [2 1 3]);
end
[srow, scol, ~] = size(soilw);
Ntime = length(time);
Date = datetime('1800-01-01 00:00:00') + days(time(:));
%long is increasing
dx = discretize(XY(:,1), long);
%lat is decreasing, but discretize needs increasing
FLat = flipud(latt);
dy = discretize(XY(:,2), FLat);
dy = length(FLat)-dy+1;
uxy = unique([dx, dy], 'rows');
ind = sub2ind([srow, scol], uxy(:,2), uxy(:,1));
Soilw = zeros(Ntime, 1);
for K = 1 : Ntime
sl = soilw(:,:,K);
Soilw(K) = mean( sl(ind) );
end
output = table(Date, Soilw);
Note: the above code does not handle the case of shape files that have internal NaN values marking the ends of shapes. It also does not really handle the case of shapes that have sides that cover multiple pixels.
It also does not handle the case of coordinate systems that are rectilinear but not rectangular. For example if the measurements were effectively a fixed physical distance apart but used latitude and longitude, then you would have latitude and longitude grids instead of latitude and longitude vectors -- and this code does not handle that situation.
Respected Sir; @Walter Roberson will you please explain what is mean by a shapefile having internal NaN values at the ends and have sides that cover multiple pixelss?
Shape files are permitted to define multiple independent shapes at the same time. Imagine, for example, wanting to define a series of islands -- or, for that matter, wanting to define land while excluding significant bodies of water. Or water while excluding land, for that matter.
In order to define multiple shapes, shape files are permitted to list the shapes one after another in the same file, with NaN between the parts. The NaN terminates the shape. Because of this convention, it is normal for the last row of data in a shape file to be NaN, even if only one shape is being defined.
My code posted here does not try to deal with muliple shapes in the same file. The particular shape file being used for this project only defined one shape, so it was not worth implementing for the purpose of what was posted.
Shape files define shapes in terms of coordinates. There is no inherent coordinate system -- not inherently degrees or kilometers or pixels, just relative units.
Each coordinate other than the first and last is an endpoint of a line from the previous coordinates to the current coordiantes, and in turn is the start point of a line from the current coordinates to whatever is next in the file.
It would be perfectly valid, for example, for a shape to have coordinates such as
0 0
4 0
4 3
0 0
which would describe a closed right triangle with sides of length 4 (horizontal) and 3 (vertical).
This shape would be intended to enclose an area -- it would include coordinates [3 1] for example.
Now you want to apply that shape to a data file that uses the same units. But the data file does not necessarily have a data resolution that is 1:1 with the coordinates of the shape. The data file does not necessarily have data stored [3 1] or [3 0] -- it might, for example, only have data stored at even coordinates such as [0 0], [2 0], [4 0]: it might be lower resolution than the data coordinates. Or the data file could be higher resolution: it might have data at (for example) every 1/2 coordinate unit.
Because of these factors, the space implied by the coordinates in the shape file might include data coordinates that are not right "at" the coordinates given in the shape. There could even happen to be a data at a coordinate that was completely interior to the shape -- for example the data file might happen to have data at coordinates [0 1] and [3 1] -- the [0 1] is outside of the triangle shape, and the [3 1] is inside the triangle shape but neither 3 nor 1 are "mentioned" in the shape. As coodinates, [3 1] is enclosed by the shape.
The code I posted does not try to map out the shape to find data-file coordinates that might be internal but not "mentioned" by the shape.
The reason that I did not both to try to map out internal coordinates is that the shape spans quite a small portion of the data file. A really careful mask might even find that the exact location of one of the data points is "just outside" the shape, if you think of the data coordinates as being point-like objects. But at least for this file, the data coordinates are not point-like objects: they represent amount of precipitation over the entire grid square, so knowing that the shape coordinates discretized to the same grid location is good enough for this purpose. Likewise, none of the edges of this shape file are very long at all.
With higher resolution data sets or lower resolution shapes, the differences could turn out to be important. But as it is, the shape is just a very small smudge against the data file.
@Walter Roberson dear sir, whether the above code elimainating missing values (-9.97e+36) from the mean of dataset?
file = 'soilw.mon.mean.v2.nc';
shapefile = 'Shapefile/lahore.shp';
long = ncread(file, 'lon');
latt = ncread(file, 'lat');
time = ncread(file, 'time');
soilw = ncread(file, 'soilw');
info = ncinfo(file, 'soilw');
soil_fill = info.FillValue;
[found, idx] = ismember('missing_value', {info.Attributes.Name});
if found
soil_fill(end+1) = info.Attributes(idx).Value;
end
mask = ismember(soilw, soil_fill);
soilw(mask) = nan;
Then near the bottom
for K = 1 : Ntime
sl = soilw(:,:,K);
Soilw(K) = mean( sl(ind), 'omitnan' );
end

Sign in to comment.

More Answers (1)

KSSV
KSSV on 12 Dec 2021

1 Comment

Many thanks for your reply @KSSV. I've read your number of answers on NC file proccessing in Matlab but I not able to form code according to my requirement. I spent one week fully to solve my this problem under your replies on different questions but not able to solve that why I post here a question. I not expert in coding will you please write code dear? Thanks for this

Sign in to comment.

Products

Release

R2013a

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!