Thread Subject:
Textscan with non delimited text

Subject: Textscan with non delimited text

From: Elizabeth

Date: 10 Aug, 2010 16:27:22

Message: 1 of 3

I am trying to import large data files in the DSI-3200 format (http://www1.ncdc.noaa.gov/pub/data/documentlibrary/tddoc/td3200.pdf), but the records are not delimited:

DLY43124399SNOWTI19030299990280199-99999M00299-99999M00399-99999M00499-99999M00599-99999M00699-99999M00799-99999M00899-99999M00999-99999M01099-99999M01199-99999M01299-99999M01399-99999M01499-99999M01599-99999M01699-99999M0179900001001189900000001199900000001209900000001219900000001229900000001239900000001249900000001259900000001269900000001279900000001289900000001
DLY43124399PRCPHI1903029999011179900006201189900000001199900000001209900000001219900000001229900000001239900000001249900000001259900000001269900000001279900000001
DLY43124399SNWD0I1903029999011189900002901199900002901209900002901219900002801229900002801239900002701249900002501259900002401269900002401279900002201289900001701

Each DLY marks a new row, and rows are of variable length. This is what I have been using to import:

fid=fopen('filename.txt');

output=textscan(fid,['%*3s %6n %*2n %4s %2c %4n %2n %*4n %*3n',repmat('%2n %*2n %6d %1c %1c',[1,62])],'EmptyValue',-99999);

fclose(fid);

...The sequence in repmat is repeated 62 times because there may be up to two daily records per day, with a maximum of 31 days per month. However, MATLAB does not recognize when it has reached the end of a row, so I only get one row of output. Instead of the blank spaces being filled with -99999, I get [].

Any hints are appreciated.

Subject: Textscan with non delimited text

From: Sean

Date: 10 Aug, 2010 17:11:06

Message: 2 of 3

So you want to isolate all of the text after 'DLY' before the next occurrence?

%%%
% Load the text; '%s' returns one cell per whitespace-separated token (here, one per line)
fid = fopen('dly.txt');
T = textscan(fid,'%s');
fclose(fid);

% Combine into one long string
T = cell2mat(T{1}');

% Split at each occurrence of 'DLY'
rows = regexp(T,'DLY','split')

%%% My DLY test case was dly.txt:
DLY8038938
DLY09080DLY983r4938539058830DLY3850
2DLY29

Subject: Textscan with non delimited text

From: Elizabeth

Date: 10 Aug, 2010 17:31:05

Message: 3 of 3

No, I'm trying to split it into the different elements outlined in the PDF I linked in my first message. So the first row should be divided up into:

DLY 431243 99 SNOW TI 1903 02 9999 028 01 99 -99999 M 0 02 99 -99999 M....etc

It does fine with the first row until there are no more values, but then it does not start on the next row.

I've given up and am delimiting the files in Excel and importing them as CSV. It's just taking an eternity because I have to manually click on the line breaks.

Liz.
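
One possible alternative to the Excel route is to skip the repeated textscan format and slice each line by character position. The sketch below assumes the layout implied by the breakdown above: a 30-character header whose last three characters give the number of values, followed by one 12-character group per value (day, hour, value, two flags). That layout is inferred from this thread and has not been checked against the DSI-3200 PDF. The names HEADER_LEN, GROUP_LEN, and recs are made up for illustration, and the two sample lines are copied from the first message.

```matlab
% Assumed layout (inferred from the thread, not verified against the spec):
%   header, 30 chars: 'DLY'(3) station(6) division(2) element(4) units(2)
%                     year(4) month(2) filler(4) value count(3)
%   then one 12-char group per value: day(2) hour(2) value(6) flag1(1) flag2(1)
HEADER_LEN = 30;
GROUP_LEN  = 12;

% Two records from the first message, standing in for lines read from a file.
lines = { ...
  'DLY43124399PRCPHI1903029999011179900006201189900000001199900000001209900000001219900000001229900000001239900000001249900000001259900000001269900000001279900000001', ...
  'DLY43124399SNWD0I1903029999011189900002901199900002901209900002901219900002801229900002801239900002701249900002501259900002401269900002401279900002201289900001701'};

recs = struct('station', {}, 'element', {}, 'year', {}, 'month', {}, ...
              'day', {}, 'value', {}, 'flag1', {}, 'flag2', {});

for k = 1:numel(lines)
    txt = lines{k};

    % The count field says how many 12-char groups follow the header,
    % so each record's length is known without any delimiter.
    n = str2double(txt(28:30));
    if numel(txt) < HEADER_LEN + n*GROUP_LEN
        error('Record %d is shorter than its count field implies.', k);
    end

    body   = txt(HEADER_LEN+1 : HEADER_LEN + n*GROUP_LEN);
    groups = reshape(body, GROUP_LEN, n)';   % n-by-12 char, one row per value

    recs(k).station = txt(4:9);
    recs(k).element = txt(12:15);
    recs(k).year    = str2double(txt(18:21));
    recs(k).month   = str2double(txt(22:23));
    recs(k).day     = str2double(cellstr(groups(:, 1:2)));
    recs(k).value   = str2double(cellstr(groups(:, 5:10)));
    recs(k).flag1   = groups(:, 11);
    recs(k).flag2   = groups(:, 12);
end
```

To run this on a real file, the cell array could be replaced by a loop that calls fgetl(fid) and stops when the result is no longer a character array. Because each record carries its own value count, rows of different lengths need no padding with -99999.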
