Thread Subject: need help to find the error on matlab textscan code

Subject: need help to find the error on matlab textscan code

From: burcu

Date: 14 Nov, 2009 08:38:01

Message: 1 of 9

Dear all,

I'm working on an alphanumeric data set. I have some code (below) and I need help finding the problem in it. My data looks like this:

'0,udp,private,SF,105,146,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0.00,0.00,0.00,0.00,1.00,0.00,0.00,255,254,1.00,0.01,0.00,0.00,0.00,0.00,0.00,0.00'
'0,udp,private,SF,105,146,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0.00,0.00,0.00,0.00,1.00,0.00,0.00,255,254,1.00,0.01,0.00,0.00,0.00,0.00,0.00,0.00'
fid = fopen('data.txt','rt');
data=textscan(fid,'%s','delimiter');
fclose(fid);

% Engine - use regexp!
data=cat(1,data{:});
String_data=regexp(data,'([A-Z a-z]+)','match'); % Remove all numeric
Numeric_data=regexp(data,'([0.00-9.99]+)','match'); % Remove all letters

Numeric_data=cat(1,Numeric_data{:});
String_data=cat(1,String_data{:});
DATA=[String_data(:,1:end-1) Numeric_data(:,1:end-1) String_data(:,end)];

When I use identical data lines like the ones above, it works perfectly and DATA is exactly what I want.
But in the 3rd column, for example, where the sample rows contain the string private, other rows of the dataset contain http, smtp, etc.
So the problem is that if the number of characters in a string differs between two lines, the code gives an error message. If they are exactly the same, it works perfectly.
The error occurs at this line:
String_data=cat(1,String_data{:});
??? Error using ==> cat
CAT arguments dimensions are not consistent.
PS: Thanks to Branko for all his help in writing this code.
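An aside on the numeric pattern in the code above: inside square brackets, regexp reads '[0.00-9.99]' as a set of single characters (the period plus the range 0-9), not as a range of decimal numbers, so it behaves like the simpler '[0-9.]'. This is not what causes the CAT error. A quick check on a shortened sample row (trimmed for illustration only):

```matlab
% Shortened version of a sample row from the question (trimmed for illustration)
row = '0,udp,private,SF,105,146,0.00,1.00';

% Pattern from the question: the bracket expression is a set of single characters
a = regexp(row, '([0.00-9.99]+)', 'match');   % a is {'0','105','146','0.00','1.00'}

% Simpler equivalent set: the digits 0-9 plus the period
b = regexp(row, '[0-9.]+', 'match');

same = isequal(a, b);   % true: both patterns describe the same character set
```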

Subject: need help to find the error on matlab textscan code

From: Jan Simon

Date: 14 Nov, 2009 22:44:01

Message: 2 of 9

Dear burcu!

> Error occurs after this code and like this:
> > > String_data=cat(1,String_data{:});
> > > ??? Error using ==> cat
> > > CAT arguments dimensions are not consistent.

Strings are CHAR vectors, and concatenating vectors with mismatched dimensions fails. What do you want String_data to look like?
If a cell string is what you want, be happy with String_data as it is and omit the CAT.
If you want a CHAR array padded with spaces on the right:
  String_data = char(String_data);

> ps: thanks to Branko for all his help on creating this code.

Kind regards, also to Branko, Jan
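For readers following along: in this code String_data is a cell array with one element per row, and each element is itself a 1-by-K cell of tokens returned by regexp(...,'match'). A small check with hypothetical token rows suggests that CAT on such rows needs the same number of tokens in every row, while the length of each individual token does not matter:

```matlab
% Hypothetical token rows, shaped like the output of regexp(...,'match')
rowA = {'udp', 'http', 'SF'};          % 3 tokens
rowB = {'udp', 'private', 'SF'};       % 3 tokens of different lengths
ok = cat(1, rowA, rowB);               % works: ok is a 2x3 cell array

rowC = {'udp', 'ftp', 'data', 'SF'};   % 4 tokens
failed = false;
try
    cat(1, rowA, rowC);                % a 1x3 and a 1x4 row cannot be stacked
catch err
    failed = true;
    disp(err.message)                  % dimension mismatch reported by CAT
end
```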

Subject: need help to find the error on matlab textscan code

From: burcu

Date: 15 Nov, 2009 15:58:01

Message: 3 of 9

Hi Jan,

Thank you very much for your reply. My dataset has 500 thousand rows and 41 columns, and I need to import it into MATLAB for further processing. The problem is that the textscan function returns a cell array and I need a matrix. The code is supposed to give me a matrix output that combines characters and numbers in each row, so I need this cat function.
The problem occurs when a string has more or fewer characters than the string in the same column of another row.
For example, if all rows of my dataset contain http, there is no error. But if a row contains private in the column where the others have http, I get this error. http is 4 characters and private is 7 characters. That's my problem. Maybe I need to add one more line to the code, but I can't find a way.

Thanks
Burcu

Subject: need help to find the error on matlab textscan code

From: Jan Simon

Date: 18 Nov, 2009 11:38:03

Message: 4 of 9

Dear burcu!

> So i need this cat function.
> The problem occurs when there are some characters that includes more or less characters in a string.
> For example if all rows of my dataset includes http, no error. But if a row includes private in the column of http i get this error. http is 4 characters and private is 7 characters. That's my problem. Maybe i need to add one more command line or something in the code but i can't find a way.

Did you read my answer?
Instead of CAT, use CHAR to convert your cell string to a CHAR array. Then the right side is padded with spaces such that all rows have the same number of elements.

Kind regards, Jan
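What CHAR does with a plain cell array of strings can be seen with a few of the service names mentioned in this thread; CELLSTR turns the padded rows back into strings without the trailing spaces. This sketch applies to a cell whose elements are strings (for example one column of tokens), not to a cell whose elements are themselves cells:

```matlab
% A cell array of strings (cellstr), using service names from the thread
services = {'http'; 'private'; 'ftp_data'};

padded = char(services);   % 3x8 CHAR array; shorter rows padded with spaces
back = cellstr(padded);    % CELLSTR removes the trailing padding again
```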

Subject: need help to find the error on matlab textscan code

From: burcu

Date: 18 Nov, 2009 12:52:04

Message: 5 of 9

Hi Jan,

Thank you very much for your response. I've checked the help for the char command and it seems to be the command I want.
I've tried to use it in my code.
Here are the results:
When I use it right after the fclose(fid) command, it says cell elements must be character arrays, so I skipped using it there.
Next I tried using it instead of cat on the line:
String_data=cat(1, String_data{:})
(I tried trial=char(String_data) instead of this), and it says cell elements must be strings.
As I understand it, I've collected all the string data from my dataset with the regexp command, so by my logic it should work.
I would appreciate any advice.
Regards,
Burcu
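Both error messages fit the shapes involved: after textscan, data is a 1x1 cell holding a cell array of lines, and String_data is a cell of cells (one token cell per row), so CHAR rejects both. A minimal sketch with hypothetical stand-in values, showing that CHAR does accept a single column of tokens once the rows have been stacked:

```matlab
% Hypothetical stand-in for String_data: one cell of tokens per row
String_data = {{'udp', 'private', 'SF'}; {'tcp', 'http', 'SF'}};

charFailed = false;
try
    char(String_data);               % elements are cells, not strings
catch
    charFailed = true;               % CHAR rejects a cell of cells
end

tokens = vertcat(String_data{:});    % 2x3 cell; requires equal token counts per row
services = char(tokens(:, 2));       % one column of strings -> 2x7 padded CHAR array
```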


Subject: need help to find the error on matlab textscan code

From: Branko

Date: 18 Nov, 2009 13:51:19

Message: 6 of 9

Burcu,

I still don't understand what you want. If you state clearly what you want, maybe we can help you. If your data have a fixed number of columns (41), then use the procedure that was shown to you (at least three times) and it will work, regardless of how many characters the strings have (http, private).

However, if your data set doesn't always have 41 columns (the number varies depending on the available data), then use the small modification Jan showed you (CHAR instead of CAT) and apply regexp.

Branko
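A different sketch for the fixed-column case, not necessarily the procedure Branko refers to: when every row has the same number of comma-separated fields, textscan can be given one conversion per column, so strings such as ftp_data are read whole without any regexp. It assumes, based only on the sample rows, that columns 2-4 are text and every other column is numeric; row2 is a hypothetical variant of the sample row, and the string txt stands in for the contents of data.txt.

```matlab
% Sample row from the thread; row2 is a hypothetical variant of it
row1 = '0,udp,private,SF,105,146,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0.00,0.00,0.00,0.00,1.00,0.00,0.00,255,254,1.00,0.01,0.00,0.00,0.00,0.00,0.00,0.00';
row2 = strrep(strrep(row1, 'udp', 'tcp'), 'private', 'ftp_data');
txt = sprintf('%s\n%s\n', row1, row2);   % stands in for the file contents

% Assumption: columns 2-4 are text, every other column is numeric
nFields = numel(strfind(row1, ',')) + 1;
fmt = ['%f%s%s%s' repmat('%f', 1, nFields - 4)];
C = textscan(txt, fmt, 'Delimiter', ',');
% Reading the file itself would look like:
%   fid = fopen('data.txt', 'rt');
%   C = textscan(fid, fmt, 'Delimiter', ',');
%   fclose(fid);

textCols = [C{2:4}];                 % N-by-3 cell array of strings
numCols  = [C{[1, 5:nFields]}];      % N-by-(nFields-3) numeric matrix

% Mixed text and numbers in the original column order, as one cell array
DATA = [num2cell(C{1}), textCols, num2cell(numCols(:, 2:end))];
```

A cell array is used for DATA because a numeric MATLAB matrix cannot hold strings; the separate textCols and numCols may be more convenient for further numeric processing.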

Subject: need help to find the error on matlab textscan code

From: burcu

Date: 18 Nov, 2009 14:20:17

Message: 7 of 9

Hi again Branko,

The number of columns does not change; it's the same for every line. But the number of alphabetic characters in a column may change.
For example, in the 3rd column of the dataset, I have different strings on different lines, like:
http, private, smtp, ftp_data....
If every string in a text column has the same number of alphabetic characters, the code works perfectly and I get what I want.
To be more clear: http = 4 characters, private = 7 characters, ftp_data = 8 characters, etc. It's really weird that I'm getting this error; your code seems perfectly fine.
I hope this is clear, and thanks for all your support.
Burcu

Subject: need help to find the error on matlab textscan code

From: Branko

Date: 18 Nov, 2009 14:50:20

Message: 8 of 9

Burcu,

Well, finally you showed where the problem is hidden: in the string ftp_data. Therefore you should change the previous line of the code:
String_data=regexp(data,'([A-Z a-z]+)','match'); % Remove all numeric
In this example regexp treats the underscore (_) as a separator, and the result is two strings, which is not correct:
'ftp' 'data'

Therefore you should use this:
String_data=regexp(data,'([A-Z a-z _]+)','match'); % Remove all numeric
and you will get the correct reading:
'ftp_data'.

Branko
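Branko's fix can be checked on two shortened, hypothetical rows: with the original pattern, ftp_data is split into two tokens, so the rows end up with 3 and 4 tokens (which is what makes CAT fail), while the pattern that includes the underscore keeps ftp_data whole and CAT succeeds.

```matlab
% Hypothetical shortened rows
rows = {'0,udp,private,SF,105'; '0,tcp,ftp_data,SF,0.00'};

oldTok = regexp(rows, '([A-Z a-z]+)', 'match');    % ftp_data -> 'ftp', 'data'
newTok = regexp(rows, '([A-Z a-z _]+)', 'match');  % underscore is now in the class

nOld = cellfun(@numel, oldTok);      % 3 and 4 tokens: CAT would fail
String_data = cat(1, newTok{:});     % 2x3 cell array
```

Since the sample rows contain no spaces, '[A-Za-z_]+' should behave the same way (an assumption about the data). If any text field also contained digits, a letter-based pattern would split it as well.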

Subject: need help to find the error on matlab textscan code

From: burcu

Date: 18 Nov, 2009 15:09:04

Message: 9 of 9

Yes! That's it! Evil underscore!
Branko, you saved my life :) It works as I want now.

Thanks for all your help!
Burcu