Thread Subject:
reading alphanumeric data

Subject: reading alphanumeric data

From: pathfinder Kunagu

Date: 15 Oct, 2009 09:10:19

Message: 1 of 21

Hello Friends,
I am doing some data processing.
My data (alphanumeric) look like the following: 10 columns, in the usual notation.

Now I want to extract the data corresponding to selected dates and process it.
I've tried 'textread', but I'm having a problem with the second column because of the ':' symbols.

Please let me know how to go ahead with this task.

01-Jan-2001 23:59:53 -63.04 170.28 475.34 0 3 4014.662 6103.450 -50739.887
01-Jan-2001 23:59:54 -63.10 170.30 475.32 0 3 3974.387 6101.893 -50746.957
01-Jan-2001 23:59:55 -63.17 170.31 475.31 0 3 3934.606 6098.710 -50754.066
01-Jan-2001 23:59:56 -63.23 170.32 475.29 0 3 3896.105 6097.053 -50760.723
01-Jan-2001 23:59:57 -63.29 170.33 475.27 0 3 3856.992 6093.131 -50767.594
01-Jan-2001 23:59:58 -63.36 170.34 475.25 0 3 3817.332 6091.279 -50774.145
01-Jan-2001 23:59:59 -63.42 170.35 475.23 0 3 3778.741 6094.354 -50779.902
02-Jan-2001 00:00:00 -63.48 170.36 475.21 0 3 3740.585 6092.740 -50786.066
02-Jan-2001 00:00:01 -63.55 170.37 475.19 0 3 3701.370 6090.758 -50792.227
02-Jan-2001 00:00:02 -63.61 170.38 475.17 0 3 3662.072 6089.302 -50798.195
02-Jan-2001 00:00:03 -63.67 170.39 475.15 0 3 3622.398 6087.077 -50804.258
02-Jan-2001 00:00:04 -63.74 170.41 475.14 0 3 3583.234 6085.149 -50810.035
02-Jan-2001 00:00:05 -63.80 170.42 475.12 0 3 3543.696 6086.633 -50815.270
02-Jan-2001 00:00:06 -63.86 170.43 475.10 0 3 3504.965 6081.185 -50821.223
02-Jan-2001 00:00:07 -63.93 170.44 475.08 0 3 3466.551 6082.347 -50826.211
02-Jan-2001 00:00:08 -63.99 170.45 475.06 0 3 3427.262 6078.968 -50831.672
02-Jan-2001 00:00:09 -64.05 170.46 475.04 0 3 3388.032 6077.467 -50836.766
02-Jan-2001 00:00:10 -64.12 170.47 475.02 0 3 3348.961 6074.810 -50841.891
02-Jan-2001 00:00:11 -64.18 170.49 475.00 0 3 3309.816 6072.139 -50846.875

Praveen.

Subject: reading alphanumeric data

From: Branko

Date: 15 Oct, 2009 10:32:03

Message: 2 of 21

One possible solution:

% Read the file
infid=fopen('data.txt','rt');
val=textscan(infid,'%s %s %f %f %f %f %f %f %f %f', 'CollectOutput', true);
fclose(infid);
% val{1} is an N-by-2 cell array of date and time strings; val{2} is an N-by-8 numeric matrix
Data=val{2};
Date=val{1};
% Convert to MATLAB serial date numbers (datenum)
Date=datenum([cat(1,Date{:,1}) cat(1,Date{:,2})],'dd-mmm-yyyyHH:MM:SS');
% Complete dataset
DATA=[Date Data];

Branko
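
To get from that DATA matrix to the rows for selected dates, one option is a logical mask on the date column. The following is a minimal sketch, not part of Branko's reply: it assumes the same ten-column layout, parses four sample rows from a string instead of data.txt, and uses 02-Jan-2001 as an arbitrary example date.

```matlab
% Four rows copied from the original post; a real run would read data.txt instead.
txt = sprintf('%s\n', ...
    '01-Jan-2001 23:59:58 -63.36 170.34 475.25 0 3 3817.332 6091.279 -50774.145', ...
    '01-Jan-2001 23:59:59 -63.42 170.35 475.23 0 3 3778.741 6094.354 -50779.902', ...
    '02-Jan-2001 00:00:00 -63.48 170.36 475.21 0 3 3740.585 6092.740 -50786.066', ...
    '02-Jan-2001 00:00:01 -63.55 170.37 475.19 0 3 3701.370 6090.758 -50792.227');

% textscan also accepts a string as its first argument.
val = textscan(txt, '%s %s %f %f %f %f %f %f %f %f', 'CollectOutput', true);

% Join date and time with a space, then convert to serial date numbers.
stamps = strcat(val{1}(:,1), {' '}, val{1}(:,2));
t = datenum(stamps, 'dd-mmm-yyyy HH:MM:SS');

% Keep only the rows that fall on the chosen day (assumed example date).
wanted = datenum(2001, 1, 2);
keep = floor(t) == wanted;
selected = [t(keep) val{2}(keep, :)];   % serial date followed by the 8 numeric columns
```

For a range of days, the same mask can be written as floor(t) >= firstDay & floor(t) <= lastDay, where firstDay and lastDay are datenum values chosen by the reader.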

Subject: reading alphanumeric data

From: pathfinder Kunagu

Date: 16 Oct, 2009 08:44:02

Message: 3 of 21

Hi Branko,
Thank you very much.
This code will really help me.
Thanks a lot dear.

Praveen.

Subject: reading alphanumeric data

From: pathfinder Kunagu

Date: 16 Oct, 2009 08:48:01

Message: 4 of 21

Hi Branko,
Thank you very much.
This code will really help me.
Thanks a lot dear.

Praveen.

Subject: reading alphanumeric data

From: burcu

Date: 21 Oct, 2009 13:31:04

Message: 5 of 21

Hi Guys,

I'm stuck in a similar situation. My data looks something like this:

0,udp,private,SF,105,146,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1

It's a huge dataset of 42 columns and thousands of rows. I'm testing the command on a subset of 19 rows and 42 columns. Then I'll use this data for neural network training.
As far as I can tell, the output of the textscan command is always a 1x42 array. I want to merge this into a 19x42 matrix. It seems the cat command could work for this, but I couldn't understand its syntax. Can you please advise how I can solve this?

Thanks!
Burcu

Subject: reading alphanumeric data

From: Branko

Date: 22 Oct, 2009 07:59:19

Message: 6 of 21

> Anyway, as i can understand as an output of textscan command i'm always getting 1X42 array.

Not necessarily. It depends on how your data is structured; then apply textscan (or regexp) to it. If you provide some additional information (code, data), we might be able to help you.

Branko

Subject: reading alphanumeric data

From: burcu

Date: 23 Oct, 2009 12:48:05

Message: 7 of 21

Hi Branko,

My dataset is the KDD'99 dataset. You can find the details here:
http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html
I'm using the 10% subset of the dataset (kddcup.data_10_percent.gz, for training).
I'm trying to open it with the usual textscan code, defining the variable types (%s, %f, etc.).
But the output only includes the first line. I've also tried passing N, the number of rows to read from the file with that format, but it made no difference.

Thanks for your help
Burcu

Subject: reading alphanumeric data

From: Branko

Date: 23 Oct, 2009 13:11:18

Message: 8 of 21

It's not clear what information you want to extract from your file (or single array): strings and numbers, or only numbers?

It seems that every line of your file has the same structure, so regexp would be appropriate for extracting the data.

Branko

Subject: reading alphanumeric data

From: burcu

Date: 23 Oct, 2009 13:34:19

Message: 9 of 21

Actually I don't need to extract any specific part. I need to load this dataset into MATLAB and use it for neural network training. I've saved it as a txt file and I have a command like:

fid=fopen('kddcup.data_10_percent.txt');
trial=textscan(fid, %f,%s...(variable types for 42 columns), 19);
fclose(fid);

I need to use it in a command like this:

P=trial(1:30, 1:2);
But trial is a 1x42 array, and I get a dimension error.

I need to get a 19x42 matrix from the textscan command.

Burcu
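
A note on the shape: textscan returns a 1-by-N cell array with one cell per column of the format, and each cell holds all the rows of that column. So indexing the cell array itself with trial(1:30, 1:2) fails, while the rows are available inside each cell. A minimal sketch, assuming only the first six fields of each record (shortened from rows that appear in this thread) and a string in place of the file:

```matlab
% Shortened sample records: first six fields only (an assumption for brevity).
txt = sprintf('%s\n', ...
    '0,udp,private,SF,105,146', ...
    '0,tcp,http,SF,181,5450', ...
    '0,tcp,http,SF,239,486');

C = textscan(txt, '%f %s %s %s %f %f', 'Delimiter', ',');

nCols = size(C, 2);       % one cell per column of the format
nRows = size(C{1}, 1);    % each cell holds one entry per record

% Concatenate the numeric columns side by side to get an ordinary matrix.
numericCols = [C{[1 5 6]}];
P = numericCols(1:2, 1:2);   % matrix indexing now works
```

The text columns (C{2}, C{3}, C{4}) stay as cell arrays of strings and would need some encoding of the reader's choice before they can be fed to a network.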


Subject: reading alphanumeric data

From: Branko

Date: 26 Oct, 2009 07:49:01

Message: 10 of 21

Here is one approach to solve your problem. As I mentioned previously, you should use the regexp function, which is useful in cases like this.

fid = fopen(filename,'rt');
data=textscan(fid,'%s','delimiter','','headerlines', 0);
fclose(fid);

% Engine - use regexp!
data=cat(1,data{:});
String_data=regexp(data,'([A-Z a-z]+)','match'); % Keep the runs of letters (drops the numbers)
Numeric_data=regexp(data,'([0.00-9.99]+)','match'); % Keep the runs of digits and dots (drops the letters)

Numeric_data=cat(1,Numeric_data{:});
String_data=cat(1,String_data{:});
DATA=[String_data(:,1:3) Numeric_data(:,1:end-1) String_data(:,end)];

Branko

Subject: reading alphanumeric data

From: burcu

Date: 28 Oct, 2009 14:28:02

Message: 12 of 21

Hi Branko,
Thank you very much for this code. I have been reading the help files for cat and regexp, and I tried the code you provided on my dataset.

My dataset includes many rows like this:

0,udp,private,SF,105,146,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0.00,0.00,0.00,0.00,1.00,0.00,0.00,255,254,1.00,0.01,0.00,0.00,0.00,0.00,0.00,0.00

So I applied your code like this, and got an error:

>> fid=fopen('ked.txt');
>> data=textscan(fid, '%s', 'delimiter',',');
>> fclose(fid);
>> data =cat(1,data{:});
>> string_data=regexp(data,'([A-Z a-z]+)','match');
>> numeric_data=regexp(data, '([0.00-9.99 0-100000]+)','match');
>> numeric_data=cat(1,numeric_data{:});
>> string_data=cat(1,string_data{:});
??? Error using ==> cat
CAT arguments dimensions are not consistent.

Do you have any idea or advice on this? Also, is [0.00-9.99 0-100000] correct for my data? I have numeric values like 10027, so I also wanted to add a range like this to your 0.00-9.99.

One more thing: I'm not sure why you set the format to just %s in your code. Is that how it has to be used with the regexp function? I used to define my data like:
%u8, %s, %s, %s, %u16, %u16, %u8,%u8,%u8,%u8,%u8,%u8,%u8,%u8,%u8,%u8,%u8,%u8,%u8,%u8,%u8,%u8,%u8,%u8,%u8,%f,%f,%f,%f,%f,%f,%f, %u16, %u16,%f,%f,%f,%f,%f,%f,%f,%f

Thanks!
Burcu
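
On the range question: inside square brackets a regular expression lists single characters, not numeric ranges, so [0.00-9.99] is just the set of digits plus the dot, and the '+' already lets it match numbers of any length, such as 10027. A small sketch to check this, where the row is a made-up, shortened record (an assumption for illustration, not data from the file):

```matlab
row = '0,udp,private,SF,105,146,0.00,1.00,10027';   % hypothetical shortened row

a = regexp(row, '[0.00-9.99]+', 'match');   % character class used earlier in the thread
b = regexp(row, '[0-9.]+', 'match');        % same character class, written plainly

same = isequal(a, b);   % true: both patterns pick out the same pieces
% a holds '0', '105', '146', '0.00', '1.00', '10027'
```

The extra " 0-100000" in the modified pattern only adds the space character to the set (0 and 1 were already in it), so it is not needed for values like 10027.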

Subject: reading alphanumeric data

From: Branko

Date: 29 Oct, 2009 07:26:01

Message: 13 of 21

Burcu,

The example above was written for the data you pointed to (http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html). Here is an example.

data={'0,tcp,http,SF,181,5450,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,8,8,0.00,0.00,0.00,0.00,1.00,0.00,0.00,9,9,1.00,0.00,0.11,0.00,0.00,0.00,0.00,0.00,normal.'
'0,tcp,http,SF,239,486,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,8,8,0.00,0.00,0.00,0.00,1.00,0.00,0.00,19,19,1.00,0.00,0.05,0.00,0.00,0.00,0.00,0.00,normal.'
'0,tcp,http,SF,235,1337,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,8,8,0.00,0.00,0.00,0.00,1.00,0.00,0.00,29,29,1.00,0.00,0.03,0.00,0.00,0.00,0.00,0.00,normal.'};

% Engine - use regexp!
String_data=regexp(data,'([A-Z a-z]+)','match'); % Keep the runs of letters (drops the numbers)
Numeric_data=regexp(data,'([0.00-9.99]+)','match'); % Keep the runs of digits and dots (drops the letters)

Numeric_data=cat(1,Numeric_data{:});
String_data=cat(1,String_data{:});
DATA=[String_data(:,1:end-1) Numeric_data(:,1:end-1) String_data(:,end)];

I used %s to read all the data as strings, since regexp works only on strings and not on numeric values (%f).

The same goes for the problem above (in this case you have 40 columns):
data={'0,udp,private,SF,105,146,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0.00,0.00,0.00,0.00,1.00,0.00,0.00,255,254,1.00,0.01,0.00,0.00,0.00,0.00,0.00,0.00'
'0,udp,private,SF,105,146,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0.00,0.00,0.00,0.00,1.00,0.00,0.00,255,254,1.00,0.01,0.00,0.00,0.00,0.00,0.00,0.00'
'0,udp,private,SF,105,146,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0.00,0.00,0.00,0.00,1.00,0.00,0.00,255,254,1.00,0.01,0.00,0.00,0.00,0.00,0.00,0.00'};

% Engine - use regexp!
String_data=regexp(data,'([A-Z a-z]+)','match'); % Keep the runs of letters (drops the numbers)
Numeric_data=regexp(data,'([0.00-9.99]+)','match'); % Keep the runs of digits and dots (drops the letters)

Numeric_data=cat(1,Numeric_data{:});
String_data=cat(1,String_data{:});
DATA=[String_data(:,1:end-1) Numeric_data(:,1:end-1) String_data(:,end)];

Try copying the data above into an example file and running the code on it (using %s); it should work for you. It works on my MATLAB installation.

Branko
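
Note that DATA built this way is a cell array of strings, so the numeric columns still need converting before they can be used for training. A minimal sketch, assuming shortened stand-in rows (first six fields plus the label) rather than the full records, and using str2double for the conversion:

```matlab
% Shortened, hypothetical rows: first six fields plus the trailing label.
data = {'0,tcp,http,SF,181,5450,normal.'
        '0,tcp,http,SF,239,486,normal.'
        '0,tcp,http,SF,235,1337,normal.'};

Numeric_data = regexp(data, '([0.00-9.99]+)', 'match');
Numeric_data = cat(1, Numeric_data{:});   % 3-by-4 cell; last column is the '.' of 'normal.'

% Drop that last column and convert the remaining strings to doubles.
X = str2double(Numeric_data(:, 1:end-1)); % 3-by-3 numeric matrix
```

str2double returns NaN for anything it cannot parse, which makes leftover pieces such as a lone '.' easy to spot if the last column is kept by mistake.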

Subject: reading alphanumeric data

From: burcu

Date: 4 Nov, 2009 10:13:03

Message: 14 of 21

Hi Branko,

I've upgraded my MATLAB from 2006 to R2009b and tried your code again, and I'm getting exactly the same error. May I kindly ask you to check my code and tell me if I'm doing something wrong?
(The data is the one that has 40 columns; I saved it in Notepad.)

>> fid=fopen ('data.txt');
>> data=textscan(fid, '%s', 'delimiter',',');
>> fclose(fid);
>> data =cat(1,data{:});
>> String_data=regexp(data,'([A-Z a-z]+)','match');
>> Numeric_data=regexp(data,'([0.00-9.99]+)','match');
>> Numeric_data=cat(1,Numeric_data{:});
>> String_data=cat(1,String_data{:});
??? Error using ==> cat
CAT arguments dimensions are not consistent.


Subject: reading alphanumeric data

From: Branko

Date: 4 Nov, 2009 10:44:01

Message: 15 of 21

Burcu,

There is a simple reason why the cat error is coming up. When you copy the example into an ASCII file, DON'T copy data={ }, which is only the name of the cell array. I used it just to show that regexp works on data of different lengths.

Branko
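
The error itself means that cat(1, ...) was handed rows with different numbers of matches. One way to see this is to count the matches per row before concatenating. The sketch below assumes, purely for illustration, that the stray data={ text ended up on the first line of the file; the rows are shortened and hypothetical.

```matlab
% Hypothetical file contents, one string per line; the first line still
% carries the stray 'data={' prefix.
rows = {'data={0,tcp,http,SF,181,5450,normal.'
        '0,tcp,http,SF,239,486,normal.'
        '0,tcp,http,SF,235,1337,normal.'};

String_data = regexp(rows, '([A-Z a-z]+)', 'match');
counts = cellfun(@numel, String_data);   % number of matches found in each row

if all(counts == counts(1))
    String_data = cat(1, String_data{:});     % safe: every row has the same width
else
    badRows = find(counts ~= mode(counts));   % rows to inspect in the file
end
```

Here the first row yields five letter runs ('data' plus the four real fields) and the others four, so badRows points at row 1 instead of cat raising the dimension error.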

Subject: reading alphanumeric data

From: burcu

Date: 4 Nov, 2009 11:15:05

Message: 16 of 21


> > Hi Branko,
> >
> > I've upgraded my matlab from 2006 to r2009b and tried again your codes and i'm exactly facing with the same error. May i kindly ask you to check my code and comment if i do something wrong?
> > (Data is the one that has 40 columns, i saved into a notepad)
> >
> > >> fid=fopen ('data.txt');
> > >> data=textscan(fid, '%s', 'delimiter',',');
> > >> fclose(fid);
> > >> data =cat(1,data{:});
> > >> String_data=regexp(data,'([A-Z a-z]+)','match');
> > >> Numeric_data=regexp(data,'([0.00-9.99]+)','match');
> > >> Numeric_data=cat(1,Numeric_data{:});
> > >> String_data=cat(1,String_data{:});
> > ??? Error using ==> cat
> > CAT arguments dimensions are not consistent.
> >

Branko,

Sorry, but I couldn't understand what you mean by "don't copy data={ }".
I checked the usage of cat so as not to ask unnecessary questions, and data =cat(1,data{:}); seemed a required step to me. If I skip this line and enter String_data=regexp(data,'([A-Z a-z]+)','match'); right after fclose(fid); I get the error "All cells for regexp must be strings".

Thanks for your support
Burcu
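
A short sketch of where the "All cells for regexp must be strings" error comes from, assuming the same textscan call as above. The text in txt is a made-up stand-in for the file, passed to textscan directly so that no file is needed. textscan returns one cell per format specifier, so with a single %s the result is a 1x1 cell whose only entry is itself a cell array of strings; regexp needs that inner array.

```matlab
% Hypothetical stand-in for the file contents (two short lines).
txt = sprintf('0,udp,private,SF\n0,tcp,http,SF');

% textscan also accepts text directly instead of a file identifier.
C = textscan(txt, '%s', 'delimiter', ',');

disp(class(C));      % cell: one entry per format specifier
disp(class(C{1}));   % cell: the strings that were read

% regexp(C, ...) fails because C{1} is a cell, not a string.
% Unwrapping gives a plain cell array of strings that regexp accepts.
data = C{1};         % same result here as cat(1, C{:})
tokens = regexp(data, '[A-Za-z]+', 'match');
```

Note that with ',' as the delimiter each entry of data is a single field rather than a whole line, so tokens holds at most one match per entry.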

Subject: reading alphanumeric data

From: Branko

Date: 4 Nov, 2009 11:39:02

Message: 17 of 21

"burcu " <burcu102@hotmail.com> wrote in message <hcrnnp$2jj$1@fred.mathworks.com>...

Burcu,

I see you are not getting there. I will go through it step by step; otherwise we will not get past this problem.

1. Copy the data below into Word, Notepad, or whatever text editor you are using, and save it as data.txt:

'0,udp,private,SF,105,146,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0.00,0.00,0.00,0.00,1.00,0.00,0.00,255,254,1.00,0.01,0.00,0.00,0.00,0.00,0.00,0.00'
'0,udp,private,SF,105,146,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0.00,0.00,0.00,0.00,1.00,0.00,0.00,255,254,1.00,0.01,0.00,0.00,0.00,0.00,0.00,0.00'
'0,udp,private,SF,105,146,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0.00,0.00,0.00,0.00,1.00,0.00,0.00,255,254,1.00,0.01,0.00,0.00,0.00,0.00,0.00,0.00'

2. Run the code below:

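fid = fopen('data.txt','rt');
% Read each line of the file as one string ('\n' is the only delimiter).
data=textscan(fid,'%s','delimiter','\n');
fclose(fid);
 
% Engine - use regexp!
data=cat(1,data{:}); % unwrap the 1x1 cell returned by textscan
String_data=regexp(data,'([A-Z a-z]+)','match'); % Keep runs of letters (drops the numbers)
Numeric_data=regexp(data,'([0.00-9.99]+)','match'); % Keep runs of digits and dots (drops the letters)
 
% Stack the per-line matches as rows; every line must give the same number of matches.
Numeric_data=cat(1,Numeric_data{:});
String_data=cat(1,String_data{:});
DATA=[String_data(:,1:end-1) Numeric_data(:,1:end-1) String_data(:,end)];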

And it should work!

I recommend reading the MATLAB help for the functions used in the above example so that you understand how the problem is solved (rather than just copying and pasting).

Branko

Subject: reading alphanumeric data

From: burcu

Date: 4 Nov, 2009 14:31:00

Message: 18 of 21

Thanks for this detailed explanation and your patience. I've checked every little item that you advised, the help guides, etc., but I'm still getting the same result.

String_data=cat(1,String_data{:});
??? Error using ==> cat
CAT arguments dimensions are not consistent.

If I try String_data=cat(2,String_data{:}); instead, there is no error, but the format is not the one I need.
May I ask which version of MATLAB you're using?
If you don't have any other advice, maybe reconstructing a new matrix from the String_data=cat(2,String_data{:}) output and then merging the string and numeric data would be a solution.

Burcu

Subject: reading alphanumeric data

From: Branko

Date: 5 Nov, 2009 07:38:01

Message: 19 of 21

"burcu " <burcu102@hotmail.com> wrote in message <hcs374$19l$1@fred.mathworks.com>...

Burcu,

Again and again you are repeating the same mistake. The error occurs because of the data you are pasting into the file:
data={'0,udp,private,SF,105,146,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0.00,0.00,0.00,0.00,1.00,0.00,0.00,255,254,1.00,0.01,0.00,0.00,0.00,0.00,0.00,0.00'
'0,udp,private,SF,105,146,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0.00,0.00,0.00,0.00,1.00,0.00,0.00,255,254,1.00,0.01,0.00,0.00,0.00,0.00,0.00,0.00'
'0,udp,private,SF,105,146,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0.00,0.00,0.00,0.00,1.00,0.00,0.00,255,254,1.00,0.01,0.00,0.00,0.00,0.00,0.00,0.00'};

and the error message (the same one you are showing):
??? Error using ==> cat
CAT arguments dimensions are not consistent.
String_data=cat(1,String_data{:});

BUT if you paste:
'0,udp,private,SF,105,146,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0.00,0.00,0.00,0.00,1.00,0.00,0.00,255,254,1.00,0.01,0.00,0.00,0.00,0.00,0.00,0.00'
'0,udp,private,SF,105,146,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0.00,0.00,0.00,0.00,1.00,0.00,0.00,255,254,1.00,0.01,0.00,0.00,0.00,0.00,0.00,0.00'
'0,udp,private,SF,105,146,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0.00,0.00,0.00,0.00,1.00,0.00,0.00,255,254,1.00,0.01,0.00,0.00,0.00,0.00,0.00,0.00'

and run the above code, it will work!

If you do as I wrote above, it should work. It works on MATLAB 2007a and 2008a.

Branko

Subject: reading alphanumeric data

From: burcu

Date: 12 Nov, 2009 11:25:03

Message: 20 of 21

Hi Branko,

I've been trying to find a way to fix this issue and tried several variations of the data set. I think I finally found the problem.
When I use exactly the same data lines that you use, it works perfectly and DATA is exactly what I want.
But, for example, in the 3rd column, where the string is private, we also have strings such as http and smtp in other rows of the dataset.
And in the training dataset (the one that has labels at the end of each line), you may remember we have normal, smurf, satan, etc. as attack names.
So the problem is that if the number of characters in a string differs between two lines, the code gives an error message. If they are exactly the same, it works perfectly.
PS: I used those ' marks at the beginning and end of the lines :)

Any suggestion on that?
Thanks!
Burcu
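
One possible way around this, sketched under assumptions: the two rows below are made-up examples in the style of the thread's data, and the names fields, values and isNumericCol are illustrative. Splitting each line on commas gives the same number of fields per row whatever the text fields contain, so the row-wise cat no longer depends on how many letter or digit runs regexp happens to find.

```matlab
% Hypothetical rows with string fields of varying length and content.
rows = {'0,tcp,http,SF,215,45076,0.00,normal'; ...
        '0,tcp,smtp,SF,1032,0,1.00,smurf'};

% Split each row on commas; every row yields the same number of fields.
fields = regexp(rows, ',', 'split');
fields = cat(1, fields{:});          % nRows-by-nFields cell of strings

% str2double returns NaN for text that is not a number, which
% separates the numeric columns from the text columns.
values = str2double(fields);
isNumericCol = all(~isnan(values), 1);

Numeric_data = values(:, isNumericCol);   % double matrix
String_data  = fields(:, ~isNumericCol);  % cell array of strings
```

Keep in mind that str2double also accepts text such as Inf or 1e3, so a text column holding only values like these would be classed as numeric.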


"Branko " <bogunovic@mbss.org> wrote in message <hctvcp$ldp$1@fred.mathworks.com>...

Subject: reading alphanumeric data

From: Hanah

Date: 2 Sep, 2010 12:35:19

Message: 21 of 21

Hi Branko,

I was led here by my search to resolve a problem I have with reading data into MATLAB. I have written a couple of routines and have tried some of the suggestions you made here, but none seems to pan out.

I have a CSV file, testa, with 46 rows (sans header) x 25 columns of alphanumeric data:
http://hotfile.com/dl/66495316/438d04c/testa.csv.html

My goal is to extract the data (rows) corresponding to selected dates, times, expirations, strikes, etc. and process them. Hence, my end result should be a 46 x 25 matrix of data from which I can select rows given a criterion (e.g. date) and execute operations on the numeric data.

Here's what I have written so far:
...
fid = fopen('testa.csv','rt');
C = textscan(fid,'%s','delimiter','[]','headerlines',1)
fclose(fid);

data = cat(1, C{:});

string_data = regexp(data,'([A-Z a-z]+)','match');
string_data = cat(1,string_data{:});
numeric_data = regexp(data,'([0.00-9.99]+)','match');
numeric_data = cat(1,numeric_data{:});
...

However, my string_data matrix has a size of 46 x 4, instead of 46 x 3 (2nd row is blank). Also, my numeric_data matrix is 46 x 28 instead of 46 x 22 (the date fields get split further for some weird reason). I'm not sure what I'm doing wrong here and would appreciate a sanity check. Can you please take a look?

What I'm basically trying to do is load the data into MATLAB, select all rows corresponding to different column parameters, process the data within those rows, etc. The end result should be a 46 x 25 operable matrix as close to the original data format as possible.

Thanks a lot for the help!

Han
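
A likely explanation for the split dates, offered as an assumption since the file itself is not shown here: the pattern '([0.00-9.99]+)' is a character class that matches only digits and dots, so a date written with separators such as / is broken into several matches. The sketch below uses a made-up three-column sample without a header line (the real file has 25 columns and a header), and the column names are hypothetical. It reads each column with its own format specifier instead of using regexp, so dates stay intact and numeric columns arrive as doubles.

```matlab
% The thread's numeric pattern splits a date at every separator.
parts = regexp('08/31/2010', '([0.00-9.99]+)', 'match');   % 3 pieces

% Hypothetical sample: date, symbol, strike (header line omitted).
txt = sprintf(['08/31/2010,ABC,100.5\n' ...
               '09/01/2010,ABC,105.0\n' ...
               '09/01/2010,XYZ,42.25']);

% One specifier per column: %s keeps text as is, %f converts to double.
C = textscan(txt, '%s %s %f', 'delimiter', ',');
dates   = C{1};
symbols = C{2};
strikes = C{3};

% Select rows by date, then operate on the numeric column.
pick = strcmp(dates, '09/01/2010');
selected_strikes = strikes(pick);
mean_strike = mean(selected_strikes);
```

When reading from a file, the same call would take fid in place of txt, with 'headerlines',1 added as in the post above to skip the header.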
