Thread Subject: Summarizing yearly/monthly data in time series

Subject: Summarizing yearly/monthly data in time series

From: Ryan Utz

Date: 28 Oct, 2009 20:48:02

Message: 1 of 17

Hi all,

I have a time series dataset that consists of daily data over several years. Most of the analyses are run on a matrix where the first column is the year, the second the month, the third the day and the final column is the data of interest. I want to do some simple summary statistics (sum of the data, for instance) for each month. I was assuming I'd do this in a for-loop.

Normally, I would identify each unique month using the 'unique' and 'find' commands in the for-loop but since there is a month AND year I'm not quite sure how to do it. I have figured out that 'find' may include conditionals (i.e. '&') but still can't get to the bottom of it...

Also, I know I can use the 'consolidator' macro to do this quite easily, but I'm trying to write my own script for use by others and don't want to make the users download 'consolidator' as well.

Any takers?

Thanks,
ryan

Subject: Summarizing yearly/monthly data in time series

From: dpb

Date: 28 Oct, 2009 21:09:33

Message: 2 of 17

Ryan Utz wrote:
> Hi all,
>
> I have a time series dataset that consists of daily data over several
> years. Most of the analyses are run on a matrix where the first
> column is the year, the second the month, the third the day and the
> final column is the data of interest. I want to do some simple
> summary statistics (sum of the data, for instance) for each month. I
> was assuming I'd do this in a for-loop.
>
> Normally, I would identify each unique month using the 'unique' and
> 'find' commands in the for-loop but since there is a month AND year
> I'm not quite sure how to do it. I have figured out that 'find' may
> include conditionals (i.e. '&') but still can't get to the bottom of
> it...
>
> Also, I know I can use the 'consolidator' macro to do this quite
> easily, but I'm trying to write my own script for use by others and
> don't want to make the users download 'consolidator' as well.
>
> Any takers?

I'd suggest turning the y/m/d columns into date numbers and then use
date ranges on them instead of trying to do it on individual values.

--

Subject: Summarizing yearly/monthly data in time series

From: Oleg Komarov

Date: 28 Oct, 2009 21:54:02

Message: 3 of 17

dpb <none@non.net> wrote in message <hcac7v$3d5$1@news.eternal-september.org>...
> Ryan Utz wrote:
> > Hi all,
> >
> > I have a time series dataset that consists of daily data over several
> > years. Most of the analyses are run on a matrix where the first
> > column is the year, the second the month, the third the day and the
> > final column is the data of interest. I want to do some simple
> > summary statistics (sum of the data, for instance) for each month. I
> > was assuming I'd do this in a for-loop.
> >
> > Normally, I would identify each unique month using the 'unique' and
> > 'find' commands in the for-loop but since there is a month AND year
> > I'm not quite sure how to do it. I have figured out that 'find' may
> > include conditionals (i.e. '&') but still can't get to the bottom of
> > it...
> >
> > Also, I know I can use the 'consolidator' macro to do this quite
> > easily, but I'm trying to write my own script for use by others and
> > don't want to make the users download 'consolidator' as well.
> >
> > Any takers?
>
> I'd suggest turning the y/m/d columns into date numbers and then use
> date ranges on them instead of trying to do it on individual values.
>
> --
Data ranges loose on precision.
% I suppose your matrix would look like this:
[Y, M, D] = datevec((733075:734074)');
A = [Y, M, D , -1.7 + (5).*rand(1000,1)];

% Retrieve unique combinations of year and month
Monthly = unique(A(:,1:2),'rows');
% Build subs for accumarray
[~, subs] = ismember(A(:,1:2), Monthly, 'rows');
% Do accumarray
Monthly_sum = accumarray(subs, A(:,4));

Subject: Summarizing yearly/monthly data in time series

From: dpb

Date: 28 Oct, 2009 23:56:23

Message: 4 of 17

Oleg Komarov wrote:
...
> Data ranges loose on precision.
...
Certainly not at level of days since they're integer values...

--

Subject: Summarizing yearly/monthly data in time series

From: Oleg Komarov

Date: 29 Oct, 2009 00:19:02

Message: 5 of 17

dpb <none@non.net> wrote in message <hcam0p$q32$1@news.eternal-september.org>...
> Oleg Komarov wrote:
> ...
> > Data ranges loose on precision.
> ...
> Certainly not at level of days since they're integer values...
>
> --

i used "loose on precision" improperly.
What i meant is, defining fixed ranges (lets say of 30 days) would possibly lead sometimes to group into the same month, one or two days of another month.
suppose i start on the 31 jan, setting 30 days as my range to get montlhy stats.
31 jan + 30 would yield 1 or 2 march!

Subject: Summarizing yearly/monthly data in time series

From: dpb

Date: 29 Oct, 2009 00:20:13

Message: 6 of 17

Oleg Komarov wrote:
> dpb <none@non.net> wrote in message <hcam0p$q32$1@news.eternal-september.org>...
>> Oleg Komarov wrote:
>> ...
>>> Data ranges loose on precision.
>> ...
>> Certainly not at level of days since they're integer values...
>>
>> --
>
> i used "loose on precision" improperly.
> What i meant is, defining fixed ranges (lets say of 30 days) would
> possibly lead sometimes to group into the same month, one or two days of
> another month.
> suppose i start on the 31 jan, setting 30 days as my range to get montlhy stats.
> 31 jan + 30 would yield 1 or 2 march!

Well, that's not how you would set the intervals, obviously. One would
specify the start/end dates and compute the datevalues for those as
well. This leaves one w/ a totally generic interface where the user can
request any range within the database with no changes in algorithm.

--

Subject: Summarizing yearly/monthly data in time series

From: NZTideMan

Date: 29 Oct, 2009 01:41:38

Message: 7 of 17

On Oct 29, 1:20 pm, dpb <n...@non.net> wrote:
> Oleg Komarov wrote:
> > dpb <n...@non.net> wrote in message <hcam0p$q3...@news.eternal-september.org>...
> >> Oleg Komarov wrote:
> >> ...
> >>> Data ranges loose on precision.
> >> ...
> >> Certainly not at level of days since they're integer values...
>
> >> --
>
> > i used "loose on precision" improperly.
> > What i meant is, defining fixed ranges (lets say of 30 days) would
> > possibly lead sometimes to group into the same month, one or two days of
> > another month.
> > suppose i start on the 31 jan, setting 30 days as my range to get montlhy stats.
> > 31 jan + 30 would yield 1 or 2 march!
>
> Well, that's not how you would set the intervals, obviously.  One would
> specify the start/end dates and compute the datevalues for those as
> well.  This leaves one w/ a totally generic interface where the user can
> request any range within the database with no changes in algorithm.
>
> --

Now come on children, stop bickering!!

It's quite easy really.
To get the start times for the first of each month for the 100 months
from 1-Jan-2009:
t=datenum(2009*ones(100,1),[1:100]',ones(100,1));

Subject: Summarizing yearly/monthly data in time series

From: Oleg Komarov

Date: 29 Oct, 2009 08:49:02

Message: 8 of 17

> Now come on children, stop bickering!!
>
> It's quite easy really.
> To get the start times for the first of each month for the 100 months
> from 1-Jan-2009:
> t=datenum(2009*ones(100,1),[1:100]',ones(100,1));

well show me how easy is to get the intervals on each end of the month ona period longer than 4 years...
:)

Subject: Summarizing yearly/monthly data in time series

From: Branko

Date: 29 Oct, 2009 11:00:18

Message: 9 of 17

"Oleg Komarov" <oleg.komarov@hotmail.it> wrote in message <hcbktu$jjv$1@fred.mathworks.com>...
> > Now come on children, stop bickering!!
> >
> > It's quite easy really.
> > To get the start times for the first of each month for the 100 months
> > from 1-Jan-2009:
> > t=datenum(2009*ones(100,1),[1:100]',ones(100,1));
>
> well show me how easy is to get the intervals on each end of the month ona period longer than 4 years...
> :)

Here is one approach:

% Time span and increament
start = datenum('1-1-2004');
increment = datenum([0 0 1 0 0 0]); % day
finish = datenum('12-31-2009');
time = start:increment:finish;
yr=str2num(datestr(time,'yyyy'));
mon=str2num(datestr(time,'mm'));
day=str2num(datestr(time,'dd'));

% Sort by year
[vu,ix,ix]=unique(yr);
n=accumarray(ix,1);
nc=arrayfun(@(x) 1:x,n,'uni',false);
nc=[nc{:}].';
y=(accumarray([ix,nc],day,[],[],nan))';
Y=y(:)';
Y(isnan(Y))=[];

% Find max of the month
idx=[false,Y(1:end-2)<Y(2:end-1) & Y(2:end-1)>Y(3:end),false];
% Plot every fourth month
x=(1:length(Y))';
line(time,Y);
line(time(idx),Y(idx),'marker','*','Color','g','linestyle','none');
datetick('x','mm-dd-yy');
ylabel('Days in moth');
xlabel('Time')
% Display
disp([time(idx); Y(idx)]');

Branko

Subject: Summarizing yearly/monthly data in time series

From: Steven Lord

Date: 29 Oct, 2009 17:25:20

Message: 10 of 17


"Oleg Komarov" <oleg.komarov@hotmail.it> wrote in message
news:hcbktu$jjv$1@fred.mathworks.com...
>> Now come on children, stop bickering!!
>>
>> It's quite easy really.
>> To get the start times for the first of each month for the 100 months
>> from 1-Jan-2009:
>> t=datenum(2009*ones(100,1),[1:100]',ones(100,1));
>
> well show me how easy is to get the intervals on each end of the month ona
> period longer than 4 years...
> :)

Take a look at this function:

http://www.mathworks.com/access/helpdesk/help/techdoc/ref/eomday.html

--
Steve Lord
slord@mathworks.com
comp.soft-sys.matlab (CSSM) FAQ: http://matlabwiki.mathworks.com/MATLAB_FAQ

Subject: Summarizing yearly/monthly data in time series

From: Oleg Komarov

Date: 4 Nov, 2009 15:11:02

Message: 11 of 17

"Steven Lord" <slord@mathworks.com> wrote in message <hccj4e$j17$1@fred.mathworks.com>...
>
> "Oleg Komarov" <oleg.komarov@hotmail.it> wrote in message
> news:hcbktu$jjv$1@fred.mathworks.com...
> >> Now come on children, stop bickering!!
> >>
> >> It's quite easy really.
> >> To get the start times for the first of each month for the 100 months
> >> from 1-Jan-2009:
> >> t=datenum(2009*ones(100,1),[1:100]',ones(100,1));
> >
> > well show me how easy is to get the intervals on each end of the month ona
> > period longer than 4 years...
> > :)
>
> Take a look at this function:
>
> http://www.mathworks.com/access/helpdesk/help/techdoc/ref/eomday.html
>
> --
> Steve Lord
> slord@mathworks.com
> comp.soft-sys.matlab (CSSM) FAQ: http://matlabwiki.mathworks.com/MATLAB_FAQ
>
Ok i new it.
adding the complexity that sometimes you might wanna go for the last working day of the month (which is in fact truly the reference for financial applications) and excluding the Financial TB you need to know the day of the week.
This brings everything to my first approach, that is working on data ranges may be more bothersome than working with datevec, according to my experience/opinion

Subject: Summarizing yearly/monthly data in time series

From: TideMan

Date: 4 Nov, 2009 18:56:05

Message: 12 of 17

On Oct 29, 9:49 pm, "Oleg Komarov" <oleg.koma...@hotmail.it> wrote:
> > Now come on children, stop bickering!!
>
> > It's quite easy really.
> > To get the start times for the first of each month for the 100 months
> > from 1-Jan-2009:
> > t=datenum(2009*ones(100,1),[1:100]',ones(100,1));
>
> well show me how easy is to get the intervals on each end of the month ona period longer than 4 years...
> :)

How about:
t=datenum(2009*ones(100,1),[1:100]',ones(100,1)) - 1;

Subject: Summarizing yearly/monthly data in time series

From: Oleg Komarov

Date: 4 Nov, 2009 19:23:03

Message: 13 of 17

TideMan <mulgor@gmail.com> wrote in message <2b1ccf73-822a-443a-9ddb-c5f7ed5b01df@b36g2000prf.googlegroups.com>...
> On Oct 29, 9:49?pm, "Oleg Komarov" <oleg.koma...@hotmail.it> wrote:
> > > Now come on children, stop bickering!!
> >
> > > It's quite easy really.
> > > To get the start times for the first of each month for the 100 months
> > > from 1-Jan-2009:
> > > t=datenum(2009*ones(100,1),[1:100]',ones(100,1));
> >
> > well show me how easy is to get the intervals on each end of the month ona period longer than 4 years...
> > :)
>
> How about:
> t=datenum(2009*ones(100,1),[1:100]',ones(100,1)) - 1;
Actually the best solution to find the last day of the month.
I use the same concept, find the day "1" of a month and go back by one but never came in mind this feature: is it documented that datenum increments automatically the conversion if month > 12?
does it follows the same principle for day > 31?

Subject: Summarizing yearly/monthly data in time series

From: Bruno Luong

Date: 4 Nov, 2009 19:41:02

Message: 14 of 17

"Ryan Utz" <rutz@al.umces.edu> wrote in message <hcaam2$8dv$1@fred.mathworks.com>...
> Hi all,
>
> I have a time series dataset that consists of daily data over several years. Most of the analyses are run on a matrix where the first column is the year, the second the month, the third the day and the final column is the data of interest. I want to do some simple summary statistics (sum of the data, for instance) for each month. I was assuming I'd do this in a for-loop.
>
> Normally, I would identify each unique month using the 'unique' and 'find' commands in the for-loop but since there is a month AND year I'm not quite sure how to do it. I have figured out that 'find' may include conditionals (i.e. '&') but still can't get to the bottom of it...

May be something like this:

year_selected = 2009;
month_selected = 11;

matched = data(:,1)==selected & data(:,2)==month_selected;
sumdata = sum(data(matched,4))

Bruno

Subject: Summarizing yearly/monthly data in time series

From: dpb

Date: 4 Nov, 2009 19:37:14

Message: 15 of 17

Oleg Komarov wrote:
...
> I use the same concept, find the day "1" of a month and go back by
> one but never came in mind this feature: is it documented that
> datenum increments automatically the conversion if month > 12? does
> it follows the same principle for day > 31?

Yes,

doc datenum

Keep reading... :)

--

Subject: Summarizing yearly/monthly data in time series

From: Bruno Luong

Date: 4 Nov, 2009 19:55:21

Message: 16 of 17

> year_selected = 2009;
> month_selected = 11;
>
> matched = data(:,1)==selected & data(:,2)==month_selected;
> sumdata = sum(data(matched,4))

May be ISMEMBER syntax is clearer than FIND

data=[2009 11 01 1;
          2008 10 02 2;
          2009 11 02 3]

year_selected = 2009;
month_selected = 11;

matched = ismember(data(:,[1 2]), [year_selected month_selected], 'rows');
sumdata = sum(data(matched,4))

Bruno

Subject: Summarizing yearly/monthly data in time series

From: Phuong Nam Vu

Date: 20 Dec, 2009 07:55:04

Message: 17 of 17

"Bruno Luong" <b.luong@fogale.findmycountry> wrote in message <hcsm79$79a$1@fred.mathworks.com>...
> > year_selected = 2009;
> > month_selected = 11;
> >
> > matched = data(:,1)==selected & data(:,2)==month_selected;
> > sumdata = sum(data(matched,4))
>
> May be ISMEMBER syntax is clearer than FIND
>
> data=[2009 11 01 1;
> 2008 10 02 2;
> 2009 11 02 3]
>
> year_selected = 2009;
> month_selected = 11;
>
> matched = ismember(data(:,[1 2]), [year_selected month_selected], 'rows');
> sumdata = sum(data(matched,4))
>
> Bruno
Dear all,
I have time series like this:
1961 I II III IV V VI VII VIII IX X XI XII

   1 12 11.3 17.1 18.4 23.7 26.1 24.4 25 23.8 21.5 21.7 19.1
   2 13.1 11.2 19.9 17.6 24.7 25.7 26.1 25.8 24.5 21.9 21.1 21
   3 13.1 11.2 19.5 20.4 22 22 26.6 26.6 24.3 23 21.2 19.3
   4 14.4 11.9 19.6 20.5 20.6 21.2 24.8 26.2 22.7 23.4 29 16.4
   5 14.4 12.9 20 23.2 20.4 21.5 25.3 24.6 24.2 23.2 16 14.7
   6 11.7 16.7 20.3 23.5 20.1 23.7 25.6 24.6 24.5 22.4 13.4 13.9
   7 10.8 17 21.2 22.5 23.7 23.4 22.7 25.5 24.1 20.5 19 9.9
   8 12.4 14.4 21.4 24.4 23.7 24.8 23.2 25.5 23.6 21.3 20 7.3
   9 14.1 15 16.1 20 25.1 24.7 23.2 24.1 23.6 22.3 21 7.8
  10 15.4 16.4 14.5 20 25.2 25.2 23.8 25.4 23.3 22.8 20 10
  11 14.4 18.2 15.3 21.6 24.6 24.7 25.7 24.5 23.2 22.8 22.2 12.4
  12 11.3 16.7 17.1 23.9 23.5 24.9 25.9 25.2 23.6 19.2 21.1 13.7
  13 9.8 15.5 18.8 21.2 20.6 24.7 26.7 24.9 24.4 19.4 21 15.4
  14 9.5 15.9 21.2 24.6 21.9 24.8 29.1 25 24 20.8 19.4 17
  15 7.8 14.1 17.9 21.4 24.1 24.7 25.2 24.6 23.9 20 18.7 16.8
  16 10.1 11.1 17.5 21.2 26.3 25.2 23.8 24.9 24.1 21 18.7 16.8
  17 3.3 11.3 17.6 22.6 24.2 23.9 24.9 24.6 24.2 21.8 20.4 19.9
  18 1.9 11.3 19.8 24 25.2 23.2 24.7 23.7 24.1 23 21.6 18.4
  19 2.8 12.5 21.3 22.4 25.5 24.5 24.7 23.9 23.6 22.1 21.5 17.4
  20 8.8 15.1 22.7 21.6 24.6 24.3 24.7 23.8 23.9 23.5 16.6 17.9
  21 14.3 16.2 22.7 22.6 24.2 23.7 24.8 24.6 23.6 22.2 12.6 17.8
  22 14.5 16.2 24 25.4 23.5 24.8 24.4 24.5 24.6 22.5 12.3 15.6
  23 11 14.7 23.8 24.5 22.1 24.7 25.5 25 23.7 23.4 16.6 16.3
  24 13 14.3 19.2 23 21.5 25.7 25.5 24.1 23.3 22.3 15.6 17.6
  25 13.9 14.8 18.2 25.9 22 25.3 25.2 23.5 23.6 21.5 12.7 19.5
  26 12.4 15.2 17.3 21.3 24.1 24.6 26 25.1 23.7 22 15.3 19
  27 13.4 14.7 16.5 19.4 23.6 24.7 25.5 25.9 24.1 21.5 14.3 20.6
  28 13.8 15.7 15.8 17.5 23.5 25.2 25.8 25.1 24.7 22 17.8 15
  29 14.8 17.1 18.2 23.5 23.2 25.2 23.5 24 22.8 19.1 13
  30 17.5 17.1 23 24.1 24.4 24.7 23.5 23.7 22.9 20.4 12.8
  31 12.9 17.2 22.8 24.6 24.2 20.5 12.4
 1962 I II III IV V VI VII VIII IX X XI XII

   1 10.6 11.8 12.4 19.4 22.4 25.9 25 24.6 24.2 22.9 23.5 12.5
   2 10.4 12.2 11.4 22.3 22.5 24.9 25.9 25.2 25 23.6 24.4 6
   3 8.4 12.5 13.9 18.8 22.5 25 24 25.1 24.8 23.5 21.5 6
   4 8.6 10.8 11.9 16.5 24.5 25.9 25 25.1 23.9 24.3 18.2 7.2
   5 12.3 10.4 17.5 12.8 23.5 25.8 25.2 25.5 21.5 22.1 14.9 7
   6 13.3 12.1 18.5 20.4 23.5 24.5 24.8 24.3 21 21.7 13 5.9
   7 13.4 13.7 20.1 22.5 24.2 24 25 25.2 19.8 23.1 16.6 7
   8 12.4 14.7 20.7 21.3 25.4 24 24.5 26.4 20.3 22.3 19 13.5
   9 12.1 14.3 21.1 22.8 22.6 24.8 26.7 25.4 23 23.1 18.2 11.5
  10 13.3 15.3 20.6 20.4 22.9 23.7 25.4 25.1 22.9 22.9 16.5 12.8
  11 13.2 18.7 19 21.3 23.6 23.5 25.5 23.6 23.2 22.6 13.6 17.4
  12 11.7 20.1 17.9 23.7 23.5 24 24.8 23.6 23.5 22.8 13 15.9
  13 13.9 20.5 19.5 18.8 23.6 24.4 24 24.7 23.6 23.6 15.2 18.7
  14 14.4 14.8 20.6 18.4 25.2 24 24 24.3 23.5 22.9 19.2 18
  15 14.8 13.8 21.4 18.3 25.1 24.9 24.6 24.1 22.5 19.3 20 17.8
  16 12.7 14 20.7 19.7 25.1 25 24.2 24.3 23.7 18.3 20.4 18.3
  17 10.9 10.4 19.2 22.4 25.1 25.4 24.3 24.1 24.5 19.6 21.2 15.5
  18 9.9 12.4 18.7 20.2 21.7 23.9 25.1 23.2 23.9 21 20.5 15.5
  19 10.2 11.5 19.8 18.9 21.2 24.4 25.6 23.2 23.8 20 19.8 17.6
  20 10.9 10.4 22.3 20.4 20 24.9 25.2 24.6 23.6 20.5 20.4 16.4
  21 7.5 10.5 20.4 21.4 21.9 25 25.5 24.1 22.5 17.9 20 16
  22 11.4 13.2 16.9 22.5 24 25.4 28.7 25.1 22.9 16.1 15.2 13.5
  23 10.9 16.4 14.6 22.4 25 25.4 26.2 25.1 22.8 16.1 14 9.6
  24 9.1 17.9 13.3 22.1 25.2 25.9 25 24.1 23.2 17.6 14.8 10.3
  25 12.2 18.7 13.7 18.4 25.2 23.1 25.9 25.2 23.5 18.1 16.9 11
  26 11.6 17.9 14.3 18.6 25.5 25.4 25.8 24.9 23 15.7 18.3 13.8
  27 5.9 16.5 13.5 21.6 26.4 25.4 24.5 24.1 22.6 16.4 14.5 15.3
  28 5.9 14.2 14.1 22.5 25 23.4 25 25.2 23.8 16.1 11.9 15.5
  29 12.8 16.8 22.6 25.2 23.4 25.7 22.5 23.5 17.3 13 16.4
  30 12.6 15.9 21.6 23.4 25.5 25.9 24.7 22.4 19.3 12.4 15.7
  31 11.5 17.3 26.4 27.1 24.2 21.1 14.5
Please help me to arrange those above data into ONE column ( including 2 years 1961 and 1962)

Tags for this Thread

Everyone's Tags:

Add a New Tag:

Separated by commas
Ex.: root locus, bode

What are tags?

A tag is like a keyword or category label associated with each thread. Tags make it easier for you to find threads of interest.

Anyone can tag a thread. Tags are public and visible to everyone.

Tag Activity for This Thread
Tag Applied By Date/Time
series Oleg Komarov 28 Oct, 2009 17:59:05
time Oleg Komarov 28 Oct, 2009 17:59:05
accumarray Oleg Komarov 28 Oct, 2009 17:59:04
time series Ryan Utz 28 Oct, 2009 16:49:05
find Ryan Utz 28 Oct, 2009 16:49:05
unique Ryan Utz 28 Oct, 2009 16:49:05
rssFeed for this Thread

Contact us at files@mathworks.com