MATLAB Answers

Ms. Mat

Datenum - different date formats

Asked by Ms. Mat
on 23 Dec 2012

Background

I have an excel sheet with different date formats (in excel some are in General format, some in date format totally random)

example: 10/05/2005 , 10/5/2005, 10/05/05, 05/12/2005 and so on.

I read the XL file into MATLAB as follows:

[dta_num , dta_txt , dta_raw] = xlsread(...)

I need to work with the dates in the excel sheet subsequently.

Actual Problem

I tried out the following commands

  • datenum('10/5/2005')
  • datenum('10/05/2005')
  • datenum('10/05/05')
  • datenum(dta_txt(2,2))

All of them return the same answer. (732590) But I realized that 732590 is read as 05-Oct-2005 when I ran

datestr(732590)

However the dates in my data sheet needs to be read as dd/mm i.e 10-May-2005.

If I choose to specify the format,

  1. * datenum('10/5/2005','dd/mm/yyyy') ans = 732442
  2. * datenum('10/05/2005','dd/mm/yyyy') ans = 732442
  3. * datenum('10/05/05','dd/mm/yyyy') ans = 1957

Originally I thought I will convert the dates using datenum and the after my manipulations using functions such as month(datenum(dta_txt(2,2),'dd/mm/yyyy')) I will use datestr and I will have all the dates in the same format. But now I am back to square 1 since datestr(1957) gives me a 2 digit year.

Also I want to make sure that date formats are not mixed up i.e dd/mm/yyyy and mm/dd/yyyy are mixed up. For this I plan to go over the date and see if the previous date belongs to the same month or one month earlier and no month is greater than 12 assuming a dd/mm format. If it is, then it is to be decoded as mm/dd/yyyy and appropriately changed to dd/mm/yyyy.

Is the only solution is to go over the length of the dta_txt contents using cellfun('length',dta_txt(:,2)) and change it to 4 year format (I guess there are very few in 2 year digit YY format)

To summarize my requirements are;

1. date format which takes less resource and easy to manipulate as I might want to extract and make comparisons of the the month and year later. 2. Have a consistent date format preferably dd/mm/yyyy.

  2 Comments

Walter Roberson
on 23 Dec 2012

I closed the other (earlier) post as there had been less commentary on it.

Products

No products are associated with this question.

2 Answers

Answer by Laura Proctor
on 23 Dec 2012
Edited by Laura Proctor
on 23 Dec 2012
 Accepted answer

Would something like this help?

% Create a cell array containing dates in various formats
y = { '10/5/05' ; '18/2/2004' ; '3/3/2003' }
% Split up the elements in the cell array
[~,~,~,~,~,~,ss] = regexp(y,'/')
% Determine which dates have a two element year
ivals = cellfun(@(x) numel(x{3})==2,ss)
% Preallocate a vector to contain the dates
dates = zeros(size(y))
% Convert those dates with a four element year using datenum
dates(~ivals) = datenum(y(~ivals),'dd/mm/yyyy')
% Convert the dates with a two element year
dates(ivals) = datenum(y(ivals),'dd/mm/yy')

You could also use datevec to pull out the elements separately. Here I have assumed that any two element years that occur are later than 2000. If you also have dates that occur in the 1900s, then a little more logic is required.

% Create three vectors: year, month, and day
[year,month,day,~,~,~] = datevec(y,'dd/mm/yyyy')
% Logical array containing those year values that are less than 2000
lt2000 = year < 2000
% Add 2000 to any year that are less than 2000
year(lt2000) = 2000+year(lt2000)

  3 Comments

José-Luis
on 23 Dec 2012

An alternative solution would be to first manipulate your excel file so all the dates are stored as numbers. I know next to nothing about Excel so I can't really advice you on that. It should be straightforward, methinks.

Once you have the date as a serial number, then you can transform to a Matlab datenumber after importing and avoid all the potentially costly string comparisons.

bym
on 23 Dec 2012

This works for 2 digit years

datenum('10/05/05','dd/mm/yyyy',2000)
ans =
        732442
Laura Proctor
on 23 Dec 2012

You're right, proecsm - it does work if you just have one date. However, if you have a cell array of dates, it will not return the same results.

Walter posted a great solution in the duplicate post:

dates = datenum(y,'dd/mm/yyyy',2000)

Answer by per isakson
on 23 Dec 2012
Edited by per isakson
on 23 Dec 2012

There is no way for me (in Sweden) to know how to interpret

    '01/02/03'

or

    '01/02/2003'

IMO: when in doubt avoid to use the default values. It is better to specify the format an extra time than not to do it when needed.

IMO: the most precious resource is the time of the programmer.

In this case a vectorized code is faster than a loop. Experiment:

>> mysteries_dates_to_sdn
Elapsed time is 1.540982 seconds.
Elapsed time is 0.056104 seconds.
>> mysteries_dates_to_sdn
Elapsed time is 1.508512 seconds.
Elapsed time is 0.055982 seconds.
>> 

where

    %% mysteries_dates_to_sdn
    N = 1e3;
    castr = {
        '10/05/2005'
        '10/5/2005'
        '10/05/05'
        '05/12/2005'
        };
    %%
    castr = repmat( castr, [N,1] );
    %%
    tic
    sdn_loop = mysteries_dates_to_sdn_loop( castr );
    toc
    %%
    tic
    sdn_vec = mysteries_dates_to_sdn_vectorized( castr );
    toc
    %%
    assert( all( sdn_vec == sdn_loop )              ...
        , 'mysteries_dates_to_sdn:failure'          ...
        , 'Failure: sdn_loop differs from sdn_vec'  )

and

    function    sdn = mysteries_dates_to_sdn_loop( castr )
        sdn = nan( size( castr ) );
        for ii = 1 : numel( castr )
            str = castr{ii};
            len = length( str );
            if len >= 9
                sdn(ii) = datenum( str, 'dd/mm/yyyy' );
            else
                sdn(ii) = datenum( str, 'dd/mm/yy' );
            end
        end
    end

and

    function    sdn = mysteries_dates_to_sdn_vectorized( castr )
        sdn = nan( size( castr ) );
        len = cellfun( @length, castr );
        islong  = ( len >= 9 );
        isshort = not( islong );
        sdn(islong)  = datenum( castr(islong) , 'dd/mm/yyyy' ); 
        sdn(isshort) = datenum( castr(isshort), 'dd/mm/yy' ); 
     end

.

BTW: Why not use datevec rather than datenum to avoid calling the function, month

.

More:

This function (proposed above) is approx. 7% faster than mysteries_dates_to_sdn_vectorized

    function    sdn = mysteries_dates_to_sdn_pivot( castr )
        sdn = datenum( castr, 'dd/mm/yyyy', 2000 ); 
    end

  0 Comments


Discover what MATLAB® can do for your career.

Opportunities for recent engineering grads.

Apply today