Creating a matrix after giving initial conditions

8 views (last 30 days)
Damith on 12 Mar 2015
Commented: dpb on 31 Mar 2015
Hi,
I need to create a matrix when given the begin year, month, day, time and end year, month, day and time.
for example; (I do not need to see the header in the matrix)
I would like to enter:
begin year = 1998
begin month = 1
begin day = 1
begin time = 0
end year = 1998
end month = 2
end day = 15
end time = 12
(as shown in the table below)
But, if I enter leap years (e.g. 2000,2004, etc) it should include Feb 29 and do the same time steps (0 to 21) as shown in the table.
Then, the program should generate the output for the time periods given above. See the sample below.
Year Month Day Time
1998 1 1 0
1998 1 1 3
1998 1 1 6
1998 1 1 9
1998 1 1 12
1998 1 1 15
1998 1 1 18
1998 1 1 21
1998 1 2 0
.
.
.
1998 2 15 12
Any help is appreciated.

dpb on 12 Mar 2015
Edited: dpb on 12 Mar 2015
yr=2000;
dn=datevec(datenum(yr,1,1,[0:3:24*(365+isleapyr(yr))-1].',0,0))
>> [dn(1:5,:);dn(470:485,:);dn(length(dn)-10,:)]
ans =
2000 1 1 0 0 0
2000 1 1 3 0 0
2000 1 1 6 0 0
2000 1 1 9 0 0
2000 1 1 12 0 0
...
2000 2 28 15 0 0
2000 2 28 18 0 0
2000 2 28 21 0 0
2000 2 29 0 0 0
2000 2 29 3 0 0
2000 2 29 6 0 0
2000 2 29 9 0 0
2000 2 29 12 0 0
2000 2 29 15 0 0
2000 2 29 18 0 0
2000 2 29 21 0 0
2000 3 1 0 0 0
2000 3 1 3 0 0
2000 3 1 6 0 0
2000 3 1 9 0 0
2000 3 1 12 0 0
...
2000 12 30 15 0 0
2000 12 30 18 0 0
2000 12 30 21 0 0
2000 12 31 0 0 0
2000 12 31 3 0 0
2000 12 31 6 0 0
2000 12 31 9 0 0
2000 12 31 12 0 0
2000 12 31 15 0 0
2000 12 31 18 0 0
2000 12 31 21 0 0
>>
I introduced the ellipses in the output to show the breaks clearly...
Salt to suit...
Oh, isleapyr is one of my utilities...
function is=isleapyr(yr)
% returns T for input year being a leapyear
is=(datenum(yr+1,1,1)-datenum(yr,1,1))==366;
there may be a TMW-supplied equivalent since R2012b which lacks it.
dpb on 17 Mar 2015
OK, so the answer is "ignore days that are incomplete".
I've gotta' do some other things here just now, but I think I have a plan... :) I'll try to get back again in the morning and outline what I think'll work pretty well. Sometimes one only knows how well when actually working out the code if don't write it out on paper and for these exercises, what's the fun in that??? :)
Damith on 18 Mar 2015
yes, ignore the days that are incomplete.
Will see whether your plan works or not :)

dpb on 18 Mar 2015
Edited: dpb on 20 Mar 2015
Since dir returns a sorted list, I think I'd just process it similarly as you are but simply test the days within the loop by parsing the date/time string within a double loop. One could, as mentioned, get fancy and do some serious attempts at matching datenums but just doesn't seem to be needed here...
I'd start with something like the following which builds an array of the summed third column value in matching days by year. Each day is in a given column, only a small additional logic is needed to keep track of which day is present, a month and day row filled in by column while in the loop...
d=dir('TRMM_*_newntcl.csv');
ymdh=cell2mat(textscan([d.name], ...
'TRMM_%4d_%2d_%2d%2d_newntcl.csv', ...
'collectoutput',true));
idx=find(ymdh(:,end)==0); % get the beginning day 0 hr locations
N=length(idx);
idx=[idx;length(ymdh)+1];
precip=nan(51681,fix(N)); % preallocate on num whole days
k=0;
for i=1:length(idx)-1
isOK=~any(diff(ymdh(idx(i):idx(i+1)-1,3))) & idx(i+1)-idx(i)==8;
if ~isOK, continue, end % either aren't eight or it's a wrong day mixed in
data=zeros(51681,8); % preallocate based on num lat/lon points
for j=0:7 % ok, process each of these, they're one day's worth
end
k=k+1;
precip(:,k)=sum(data,2); % the daily sums are in an array
fn=d(idx(i)).name; fn(16:17)=[]; % fn(1:1)='O';
disp(fn)
csvwrite(fn,precip(:,k))
end
precip(:,k+1:end)=[]; % remove columns to leave complete days only
This snippet reads only the third column on the assumption you can easily add the constant position columns as you see fit.
I've never used a netCDF file so on casting it into that form I'm not of much use but the above should provide the basic building block to get the data consolidated by day. Note I specifically avoided cell arrays here and did the summing in place while accumulating the data to keep total memory down to minimal amount...as you've already done, use this inside the higher level loop to traverse the directories.
Unless there aren't multiple partial days, the final value of k will be the same as the floor(L/8) calculation. If there are there will be that many zero columns that can simply remove.
Hope that helps...
Damith on 24 Mar 2015
Thanks.
I was not successful in implementing this and simply do not know where and how to start with the precip array. Some coding help/hint will energize me. I need monthly and annual values for complete sets of data.
dpb on 24 Mar 2015
Go back an reread the comments starting on Mar 20@17:45 where I described how to build the 3D array of daily totals that you can subsequently simply sum by column for each plane and by plane for monthly and yearly totals for each year. What's the issue in doing that, it's simply modifying the size of the precip array initially and then using the ymdh array info on day and month to set the indices instead of just incrementing a counter k as doing now.

dpb on 24 Mar 2015
Edited: dpb on 29 Mar 2015
d=dir('TRMM_*_newntcl.csv');
ymdh=cell2mat(textscan([d.name], ...
'TRMM_%4d_%2d_%2d%2d_newntcl.csv', ...
'collectoutput',true));
idx=find(ymdh(:,end)==0); % get the beginning day 0 hr locations
N=length(idx);
idx=[idx;length(ymdh)+1];
precip=nan(51681,31,12); % preallocate on num whole days
%fprintf('Building daily sums for %d: \n',ymdh(1,1))
dispstat('','init')
for i=1:length(idx)-1
isOK=~any(diff(ymdh(idx(i):idx(i+1)-1,3))) & idx(i+1)-idx(i)==8;
if ~isOK, continue, end % either aren't eight or it's a wrong day mixed in
%msg=sprintf(' Month: %2d Day: %2d',ymdh(idx(i),2:3));
%dispstat(msg,'keepthis')
data=zeros(51681,8); % preallocate based on num lat/lon points
for j=0:7 % ok, process each of these, they're one day's worth
%dispstat(msg)
end
mo=ymdh(idx(i),2); % this month
da=ymdh(idx(i),3); % this day
precip(:,da,mo)=sum(data,2);
fn=d(idx(i)).name; fn(16:17)=[]; fn(1:1)='O';
%disp(fn)
csvwrite(fn,precip(:,k))
end
mTot=squeeze(sum(precip,2)); % monthly totals in each column 1-12
yTot=sum(mTot,2); % yearly total
Now just write mTot and yTot to files as you want them. As noted before, you'll probably want to do some checking also on whether there's data in every day of the month before accumulating that total and then for the months before the yearly total or you may have totals that aren't actually representative of the entire month/year.
Using nan to initialize instead of zero is one way, then any sums with NaN as a value are indicative altho there you've got to fix up the detail of how many days are in each month but that's pretty simple bookkeeping...
dpb on 30 Mar 2015
Edited: dpb on 30 Mar 2015
Somewhat against my better judgment (I think you need to develop some more critical thinking of solving the details of the logic problem outlined rather than relying on somebody else to provide complete solutions down to every detail), the following should work for any combination of missing days...
In the early portion of the script after building the ymdh array of dates/times from the file names insert
edpm=bsxfun(@eomday,double(ymdh(1)),[1:12].');
Then, after building the precip array, before the summation for the totals,
precip(isnan(precip(:,:,nDays==edpm)))=0;
Then revert to using builtin sum instead of nansum in
mTot=squeeze(sum(precip,2));
and joy should ensue.
I'll leave it to you to decipher why/how it works; hopefully from that exercise you'll gain a little bit of the thought process of using logical addressing within Matlab which is a key concept to grasp in effective use thereof.
The above will ensure complete months, if you do want to include any month in which there is a complete day, change the logic test from "==" to ">0". If you want there to be some minimum number of days in a month, then that would replace the 0, of course. In this case there's no need for the expected number of days/month, it's simply a test of whether there were any (or some number of) days counted.
I'll also reiterate the caution made previously in that if you do this, the data files created need to somewhere contain the information that some month(s) may be incomplete. This information is lost in the final output above; only keeping the result of the logic test for the specific month along with the totals will provide any way for the user of such data to know that is so from that file alone. Doing anything less is, in my opinion, a serious breach of scientific ethics.
One way, at a minimum, if you for some reason can't include the data in the files themselves would be to add an additional numeric code into the file name that would be the nDays(mo) value for the month. Then, at least the user would be able to check that against the expected days/month for complete datasets.
Similar arguments apply to the yearly totals, of course, but one presumes there may not be any of those that are complete in having every day of every month for the entire year. Still, encoding the days that are included would be significant information if can't include in the file the complete record of which days are missing/present.
dpb on 31 Mar 2015
BTW, if you haven't yet figured this out, you could save tons of time if you were to run the above script once for each year subdirectory and then
save precip
in that working directory. Then you can simply