Path: news.mathworks.com!not-for-mail
From: "Branko " <bogunovic@mbss.org>
Newsgroups: comp.soft-sys.matlab
Subject: reading alphanumeric data
Date: Mon, 26 Oct 2009 07:49:01 +0000 (UTC)
Organization: National Institute of Biology
Lines: 41
Message-ID: <hc3k9d$6ha$1@fred.mathworks.com>
References: <hb6otr$oq4$1@fred.mathworks.com> <hb9c01$32e$1@fred.mathworks.com> <hbn2eo$p1$1@fred.mathworks.com> <hbp3cn$n2i$1@fred.mathworks.com> <hbs8m5$86$1@fred.mathworks.com> <hbsa1m$rbd$1@fred.mathworks.com> <hbsbcr$omg$1@fred.mathworks.com>
Reply-To: "Branko " <bogunovic@mbss.org>
NNTP-Posting-Host: webapp-02-blr.mathworks.com
Content-Type: text/plain; charset="ISO-8859-1"
Content-Transfer-Encoding: 8bit
X-Trace: fred.mathworks.com 1256543341 6698 172.30.248.37 (26 Oct 2009 07:49:01 GMT)
X-Complaints-To: news@mathworks.com
NNTP-Posting-Date: Mon, 26 Oct 2009 07:49:01 +0000 (UTC)
X-Newsreader: MATLAB Central Newsreader 237386
Xref: news.mathworks.com comp.soft-sys.matlab:579996


"burcu " <burcu102@hotmail.com> wrote in message <hbsbcr$omg$1@fred.mathworks.com>...
> Actually I don't need to extract any specific partition. I need to load this dataset into MATLAB and use it for neural network training. I've saved it as a txt file and I have a command like:
> 
> fid=fopen('kddcup.data_10_percent.txt');
> trial=textscan(fid, %f,%s...(variable types for 42 columns), 19);
> fclose(fid);
> 
> I need to use it in such kind of command:
> 
> P=trial(1:30, 1:2);
> But trial is a 1x42 matrix and I get a dimension error.
> 
> I need to get a 19x42 matrix with the textscan command.
> 
> Burcu
> 
> -------------------
> 
> > 
> > It's not clear what information (strings and numbers, or only numbers) you want to extract from your file (or single array).
> > 
> > It seems that every line of your file has the same structure, so regexp would be appropriate for extracting the data.
> > 
> >  Branko

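Here is one approach to your problem. As I mentioned previously, you should use the regexp function, which is useful in cases like this: read each line of the file as a string, then pull the alphabetic and the numeric fields out of it separately.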
 
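fid = fopen(filename,'rt');
% Read every line of the file as one string
data=textscan(fid,'%s','delimiter','\n','headerlines', 0);
fclose(fid);

% Engine - use regexp!
data=cat(1,data{:}); % N-by-1 cell array, one line per cell
String_data=regexp(data,'([A-Za-z ]+)','match'); % Keep the alphabetic fields
Numeric_data=regexp(data,'([0-9.]+)','match'); % Keep the numeric fields (digits and '.')

% These two calls only work if every line gives the same number of matches
Numeric_data=cat(1,Numeric_data{:});
String_data=cat(1,String_data{:});
% The last numeric match is dropped (presumably the '.' that ends each line)
DATA=[String_data(:,1:3) Numeric_data(:,1:end-1) String_data(:,end)];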
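
Note that DATA is still a cell array of strings, so for network training the numeric part has to be converted to a double matrix first. Below is a minimal sketch of that step. The two sample lines are made up (they only imitate the comma-separated layout, they are not real records), and the names sample, X, P, protoNames and protoCode are just for illustration. Coding the string columns with unique is one possible choice, not the only one.

```matlab
% Two made-up lines in the same comma-separated layout (hypothetical data)
sample = {'0,tcp,http,SF,181,5450,normal.';
          '2,udp,domain,SF,45,0.50,normal.'};

% Patterns equivalent to the ones above: alphabetic runs and digit/dot runs
str = regexp(sample,'[A-Za-z ]+','match');
num = regexp(sample,'[0-9.]+','match');
str = cat(1,str{:});   % 2-by-4 cell array of strings
num = cat(1,num{:});   % 2-by-4 cell array; last column is the trailing '.'

% Convert the numeric text to a double matrix, leaving out the trailing '.'
X = str2double(num(:,1:end-1));   % 2-by-3 double matrix

% Ordinary matrix indexing now works
P = X(1:2,1:2);

% One possible way to turn a string column into integer codes
[protoNames,~,protoCode] = unique(str(:,1));
```

With the real file you would index X(1:30,1:2) in the same way, provided every line gives the same number of matches. Fields that mix letters and digits would be split into two matches by these patterns, so check the sizes of str and num before concatenating.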

Branko