Thread Subject: matrix dimensions problem with textscan

Subject: matrix dimensions problem with textscan

From: burcu

Date: 23 Oct, 2009 16:54:04

Message: 1 of 8

I'm trying to read a file that contains both strings and numeric data with the textscan command. After importing the data, I'll use it for neural network training.
My dataset is the KDD'99 dataset (available here: http://mlr.cs.umass.edu/ml/databases/kddcup99/kddcup99.html ; I will use the 10% subset) and looks like:

0,tcp,smtp,SF,829,327,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,1,0.00,0.00,0.00,0.00,1.00,0.00,0.00,8,113,0.88,0.25,0.12,0.02,0.00,0.00,0.00,0.00

The dataset is very large, so I want to run a trial with the first 19 rows, and I used these commands:

fid = fopen('ked.txt');

hadi=textscan(fid, '%u8, %s, %s, %s, %u16, %u16, %u8,%u8,%u8,%u8,%u8,%u8,%u8,%u8,%u8,%u8,%u8,%u8,%u8,%u8,%u8,%u8,%u8,%u8,%u8,%f,%f,%f,%f,%f,%f,%f, %u16, %u16,%f,%f,%f,%f,%f,%f,%f,%f',19);

fclose(fid);

Then, say I need the first 3 columns of the first 15 lines as training data, so I use this command:

P = hadi(1:15, 1:3);
and I get this error:
??? Index exceeds matrix dimensions.

hadi is a 1x42 matrix, and I can't get it to be 19x42. My MATLAB version is R2006a.

PS: I've also tried textscantool, but I ran into different errors and couldn't even make it work.

All suggestions are welcome!

Subject: matrix dimensions problem with textscan

From: Vadim Teverovsky

Date: 23 Oct, 2009 17:32:16

Message: 2 of 8

The link you pointed to has lines like this:
0,tcp,http,SF,181,5450,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,8,8,0.00,0.00,0.00,0.00,1.00,0.00,0.00,9,9,1.00,0.00,0.11,0.00,0.00,0.00,0.00,0.00,normal.
0,tcp,http,SF,239,486,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,8,8,0.00,0.00,0.00,0.00,1.00,0.00,0.00,19,19,1.00,0.00,0.05,0.00,0.00,0.00,0.00,0.00,normal.
It looks like there is a string at the end of each line. You need to add
that to your format. Also, I would take out the literal commas from your
format and use 'delimiter', ',' instead. It would look like:

hadi = textscan(fid, ['%u8 %s %s %s %u16 %u16 %u8 %u8 %u8 ' ...
    '%u8%u8%u8%u8%u8%u8%u8%u8%u8%u8%u8%u8%u8%u8%u8%u8%f%f%f%f%f%f%f ' ...
    '%u16%u16%f%f%f%f%f%f%f%s'], 19, 'delimiter', ',')

You can also set the 'ReturnOnError' parameter, so that you can see where
the mismatch is occurring. See the help for textscan.
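
As a rough sketch of that suggestion (not tested on R2006a): the snippet below writes the two sample records quoted above to a scratch file, reads them back with 'Delimiter' and 'ReturnOnError', and compares the number of entries in each returned column. The scratch file, the variable names, and the simplified format (every numeric field read as %f, giving 1 numeric, 3 string, 37 numeric and 1 label specifier) are assumptions made for illustration.

```matlab
% Two sample records copied from the dataset page quoted above.
rec = { ...
 '0,tcp,http,SF,181,5450,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,8,8,0.00,0.00,0.00,0.00,1.00,0.00,0.00,9,9,1.00,0.00,0.11,0.00,0.00,0.00,0.00,0.00,normal.', ...
 '0,tcp,http,SF,239,486,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,8,8,0.00,0.00,0.00,0.00,1.00,0.00,0.00,19,19,1.00,0.00,0.05,0.00,0.00,0.00,0.00,0.00,normal.'};

% Write them to a scratch file (hypothetical name, removed again below).
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, '%s\n', rec{:});
fclose(fid);

% Assumed format: one specifier per field, 42 in total.
fmt = ['%f %s %s %s ' repmat('%f ', 1, 37) '%s'];

% ReturnOnError = true makes textscan return what it managed to read
% instead of throwing an error when a field does not convert.
fid = fopen(fname);
C = textscan(fid, fmt, 'Delimiter', ',', 'ReturnOnError', true);
fclose(fid);
delete(fname);

% Each cell of C holds one column of the file. When the format matches,
% every column has one entry per record.
rowsPerColumn = cellfun(@numel, C);
if all(rowsPerColumn == numel(rec))
    disp('Format matches: every column holds one entry per record.');
else
    fprintf('Mismatch: column %d is the first one that came up short.\n', ...
        find(rowsPerColumn < numel(rec), 1));
end
```

When the format does not line up with the data, the columns typically come back shorter than the number of records, or with unequal lengths, which gives a hint about the field where reading stopped.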

Subject: matrix dimensions problem with textscan

From: burcu

Date: 23 Oct, 2009 17:47:18

Message: 3 of 8

Hi Vadim,

I tried your advice, but it made no difference. I also tried setting the ReturnOnError parameter, but the error message is the same: index exceeds matrix dimensions.
I've read the help for textscan but couldn't find anything helpful :(

Burcu


Subject: matrix dimensions problem with textscan

From: jrenfree

Date: 23 Oct, 2009 17:50:56

Message: 4 of 8


Vadim might be right, in that your format string might not match up
exactly with how the data is formatted. If hadi is only a 1x42
vector, then you aren't reading it in correctly.

You might need to specify the delimiter, as well as whether there is a
header line in the txt file (the 'HeaderLines' parameter).

Subject: matrix dimensions problem with textscan

From: burcu

Date: 23 Oct, 2009 18:02:19

Message: 5 of 8

The data example I posted has 41 columns, you're right, but that example is from the test data.
The training data has one additional string column, which indicates whether the connection is normal or an attack.
Anyway, nothing changes whether I try 41 columns or 42 with a %s at the end.
There is no header line in my txt file.

Thanks!
Burcu


Subject: matrix dimensions problem with textscan

From: jrenfree

Date: 23 Oct, 2009 18:27:33

Message: 6 of 8


Double-check your hadi variable to see exactly what it contains. When
I try to load the data, the result I get is a 1x42 cell array.
textscan returns the results as a cell array since the variables in the
file are both numeric and string. Is the result you're getting a 1x42
cell array? Because I'm able to load the data just fine. Make sure
you're using this as your format string:

format = ['%f %s %s %s ' repmat('%f ',1,37) '%s'];

And then to read in the data:

hadi = textscan(fid, format, 20, 'Delimiter', ',', 'Headerlines', 0);
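
To see why hadi(1:15, 1:3) fails on such a result, here is a small sketch. The cell array below is a hand-built stand-in for the textscan output (only the first five fields of the three records quoted in this thread), so it runs without the data file. The names srcBytes and service are made up for the example.

```matlab
% Stand-in for a textscan result: one cell per file column,
% each cell holding an N-by-1 column (here N = 3).
hadi = {[0; 0; 0], ...
        {'tcp'; 'tcp'; 'tcp'}, ...
        {'smtp'; 'http'; 'http'}, ...
        {'SF'; 'SF'; 'SF'}, ...
        [829; 181; 239]};

outerSize = size(hadi);      % [1 5]: a single row of cells
innerSize = size(hadi{5});   % [3 1]: the rows live inside each cell

% hadi(1:2, 1:3) asks for a second ROW of cells, which does not exist,
% hence "Index exceeds matrix dimensions". Pick a column with braces
% first, then pick rows inside that column with parentheses.
srcBytes = hadi{5}(1:2);     % rows 1..2 of numeric column 5
service  = hadi{3}(1:2);     % rows 1..2 of string column 3 (cell array)
```

So the rows are not missing. They are stored inside each of the cells, one column of the file per cell.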

Subject: matrix dimensions problem with textscan

From: burcu

Date: 23 Oct, 2009 18:35:26

Message: 7 of 8

Yes, I'm getting exactly a 1x42 cell array, but I need Nx42, where N is my number of rows.
Then I need to define the first x columns of the first y rows as training data, the remainder as target data, etc. So I need an Nx42 matrix.



Subject: matrix dimensions problem with textscan

From: burcu

Date: 24 Oct, 2009 10:18:01

Message: 8 of 8

As far as I can understand from the help, it is normal behavior for textscan to create a 1xN cell array. Let me rephrase my question:

How can I convert this output to an array with my actual row x column dimensions?
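
One possible way to do that, as a sketch only: concatenate the numeric columns into an ordinary matrix, and, if the strings have to stay next to the numbers, copy everything into an N-by-C cell array. The stand-in cell array below uses the first six fields of the three records quoted in this thread, and the names isNum, X, T and P are made up. It assumes all numeric columns were read with %f, because concatenating uint8, uint16 and double columns would convert the result to an integer class.

```matlab
% Stand-in for a textscan result (1-by-C cell, each cell N-by-1).
hadi = {[0; 0; 0], ...
        {'tcp'; 'tcp'; 'tcp'}, ...
        {'smtp'; 'http'; 'http'}, ...
        {'SF'; 'SF'; 'SF'}, ...
        [829; 181; 239], ...
        [327; 5450; 486]};

% Numeric part: N-by-(number of numeric columns) double matrix.
isNum = cellfun(@isnumeric, hadi);   % logical mask over the columns
X = [hadi{isNum}];                   % horizontal concatenation

% Mixed part: N-by-C cell array with one value per cell.
nRows = numel(hadi{1});
T = cell(nRows, numel(hadi));
for k = 1:numel(hadi)
    if isNum(k)
        T(:, k) = num2cell(hadi{k}); % numbers: wrap each value in a cell
    else
        T(:, k) = hadi{k};           % strings: already a cell column
    end
end

% Ordinary row/column indexing now works on the numeric matrix.
P = X(1:2, 1:3);
```

The string columns (protocol, service, flag and the label) would still need to be encoded as numbers before they can go into a numeric training matrix.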



