Path: news.mathworks.com!not-for-mail
From: "burcu " <burcu102@hotmail.com>
Newsgroups: comp.soft-sys.matlab
Subject: reading alphanumeric data
Date: Wed, 4 Nov 2009 10:13:03 +0000 (UTC)
Organization: The MathWorks, Inc.
Lines: 52
Message-ID: <hcrk3f$l2p$1@fred.mathworks.com>
References: <hb6otr$oq4$1@fred.mathworks.com> <hb9c01$32e$1@fred.mathworks.com> <hbn2eo$p1$1@fred.mathworks.com> <hbp3cn$n2i$1@fred.mathworks.com> <hbs8m5$86$1@fred.mathworks.com> <hbsa1m$rbd$1@fred.mathworks.com> <hbsbcr$omg$1@fred.mathworks.com> <hc3k9d$6ha$1@fred.mathworks.com> <hc9kdi$a3m$1@fred.mathworks.com> <hcbg29$3td$1@fred.mathworks.com>
Reply-To: "burcu " <burcu102@hotmail.com>
NNTP-Posting-Host: webapp-03-blr.mathworks.com
Content-Type: text/plain; charset="ISO-8859-1"
Content-Transfer-Encoding: 8bit
X-Trace: fred.mathworks.com 1257329583 21593 172.30.248.38 (4 Nov 2009 10:13:03 GMT)
X-Complaints-To: news@mathworks.com
NNTP-Posting-Date: Wed, 4 Nov 2009 10:13:03 +0000 (UTC)
X-Newsreader: MATLAB Central Newsreader 1870888
Xref: news.mathworks.com comp.soft-sys.matlab:582312


Hi Branko,

I've upgraded MATLAB from 2006 to R2009b and tried your code again, and I get exactly the same error. Could you please check my code and tell me if I'm doing something wrong?
(The data is the set with 40 columns, which I saved to a text file using Notepad.)

>> fid=fopen ('data.txt');
>> data=textscan(fid, '%s', 'delimiter',',');
>> fclose(fid);
>> data =cat(1,data{:});
>> String_data=regexp(data,'([A-Z a-z]+)','match');
>> Numeric_data=regexp(data,'([0.00-9.99]+)','match');
>> Numeric_data=cat(1,Numeric_data{:});
>> String_data=cat(1,String_data{:});
??? Error using ==> cat
CAT arguments dimensions are not consistent.

> 
> 
> Burcu,
> 
> The example above was written for the data you provided (http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html). Here is an example.
> 
> data={'0,tcp,http,SF,181,5450,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,8,8,0.00,0.00,0.00,0.00,1.00,0.00,0.00,9,9,1.00,0.00,0.11,0.00,0.00,0.00,0.00,0.00,normal.'
> '0,tcp,http,SF,239,486,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,8,8,0.00,0.00,0.00,0.00,1.00,0.00,0.00,19,19,1.00,0.00,0.05,0.00,0.00,0.00,0.00,0.00,normal.'
> '0,tcp,http,SF,235,1337,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,8,8,0.00,0.00,0.00,0.00,1.00,0.00,0.00,29,29,1.00,0.00,0.03,0.00,0.00,0.00,0.00,0.00,normal.'};
> 
> % Engine - use regexp!
> String_data=regexp(data,'([A-Z a-z]+)','match'); % Keep runs of letters (drops the numeric fields)
> Numeric_data=regexp(data,'([0.00-9.99]+)','match'); % Keep runs of digits and dots (drops the letters)
> 
> Numeric_data=cat(1,Numeric_data{:});
> String_data=cat(1,String_data{:});
> DATA=[String_data(:,1:end-1) Numeric_data(:,1:end-1) String_data(:,end)];
> 
> I used %s to read all the data as strings, since regexp works only on strings and not on numeric values (%f).
> 
> The same approach works for the problem above (in this case you have 40 columns):
> data={'0,udp,private,SF,105,146,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0.00,0.00,0.00,0.00,1.00,0.00,0.00,255,254,1.00,0.01,0.00,0.00,0.00,0.00,0.00,0.00'
> '0,udp,private,SF,105,146,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0.00,0.00,0.00,0.00,1.00,0.00,0.00,255,254,1.00,0.01,0.00,0.00,0.00,0.00,0.00,0.00'
> '0,udp,private,SF,105,146,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0.00,0.00,0.00,0.00,1.00,0.00,0.00,255,254,1.00,0.01,0.00,0.00,0.00,0.00,0.00,0.00'};
> 
> % Engine - use regexp!
> String_data=regexp(data,'([A-Z a-z]+)','match'); % Keep runs of letters (drops the numeric fields)
> Numeric_data=regexp(data,'([0.00-9.99]+)','match'); % Keep runs of digits and dots (drops the letters)
> 
> Numeric_data=cat(1,Numeric_data{:});
> String_data=cat(1,String_data{:});
> DATA=[String_data(:,1:end-1) Numeric_data(:,1:end-1) String_data(:,end)];  
> 
> Try copying the data above into an example file and running it (using %s); it should work for you. It works on my MATLAB.
> 
> Branko
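
A minimal sketch of a file-based variant of the quoted example. It assumes the error comes from how the file is read, which is not confirmed in this thread. With 'delimiter',',' textscan puts every field in its own cell, so regexp is applied to single fields instead of whole rows, and cat(1,...) may then receive results of unequal length. Reading one whole line per cell (newline as the delimiter) and splitting on commas gives rows that can be stacked, provided every line has the same number of fields. The file name sample_data.txt and the shortened rows are hypothetical, used only to keep the sketch self-contained.

```matlab
% Hypothetical sample file, written here only so the sketch can run on its own.
rows = {'0,udp,private,SF,105,146,0.00,1.00,normal.'
        '0,tcp,http,SF,181,5450,0.00,0.11,normal.'};
fid = fopen('sample_data.txt', 'w');
fprintf(fid, '%s\n', rows{:});
fclose(fid);

% Read one whole line per cell: the delimiter is the newline, not the comma.
fid = fopen('sample_data.txt');
lines = textscan(fid, '%s', 'delimiter', '\n');
fclose(fid);
lines = lines{1};

% Split every line on commas; each result is a 1-by-M cell of strings.
fields = regexp(lines, ',', 'split');

% Stack the rows only if all lines have the same number of fields.
counts = cellfun(@numel, fields);
if all(counts == counts(1))
    fields  = cat(1, fields{:});    % N-by-M cell array of strings
    numeric = str2double(fields);   % NaN wherever a field is not a number
else
    error('Lines have different numbers of fields.');
end
```

After this, fields holds every value as a string and numeric holds the converted numbers, with NaN marking the text columns such as the protocol and the label.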