From: <HIDDEN>
Newsgroups: comp.soft-sys.matlab
Subject: Re: determine col num from column headers
Date: Tue, 12 Aug 2008 19:58:01 +0000 (UTC)
Organization: The MathWorks, Inc.
Message-ID: <g7sq09$rj2$1@fred.mathworks.com>
References: <g7q2bb$ffi$1@fred.mathworks.com> <g7rj59$g4m$1@fred.mathworks.com>



"Andres" wrote in message
> <g7q2bb$ffi$1@fred.mathworks.com>...
> > Hello!
> > I have 5 data sets containing 25 files each with 465 to
> > 517 columns, and I need to read 4 specific columns from
> > each file. The files are loaded from *.csv into a
> > separate 'colheader' string array and the data matrix.
> > While each data set tends to have the same number of
> > columns, thus far I need to manually view the 'colheader'
> > data string from each file in the data set to be certain
> > the target columns are indeed the same column number.
> > Once I am certain of the column number I load this column
> > into a separate vector to use in other Matlab operations.
> > Any suggestions on how I can determine the column number?
> > Can I use the column headers (the 'colheader' string) to
> > locate the specific column numbers?
> 
> Hi,
> 
> we need more information about your colheader string, but
> here's a hint to an automation routine that might work, as
> a best guess:
> 
> You know the exact distinct column title strings appearing
> in your header line string
> 
>     myColumnNames = {'Val1','Val2','Val3','Val4'};
> 
> If your header string contains more than one line, you
> should extract the line that contains your column names.
> Maybe you can identify the lines by the positions of a line
> break character:
> 
>     lineBreakPos = strfind(colheader,char(10));
> 
> Find out somehow which line has the names, perhaps you know
> it beforehand. Assuming the k-th line is the line of
> interest, do something like:
> 
>     columnNameLine = colheader(lineBreakPos(k-1)+1:lineBreakPos(k)-1);
> 
> (take obvious precautions if it is the first or last line
> in colheader)
> 
> Most probably there is a distinct delimiter (Tab, space,
> ';', ',' ...) between the column name strings. Find its
> positions in the line ...
> 
>     delim = ';';
>     delimPos = strfind(columnNameLine,delim);
> 
> find out the positions of your column name strings ...
> 
>     namePos(n) = strfind(columnNameLine,myColumnNames{n});
> 
> and compare them to the delimiter positions to make out the
> corresponding column index of your data matrix.
> 
> I hope this gets you started.
> Best regards
> Andres
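The last step of the approach quoted above (comparing name positions with delimiter positions) can be sketched roughly as follows. This is a minimal sketch under assumptions: the header line `columnNameLine` and its names are made up for illustration, and `;` is assumed to be the delimiter. A name preceded by m delimiters sits in column m+1, which is what `sum(delimPos < namePos(1)) + 1` counts.

```matlab
% Hypothetical header line with ';' as the delimiter; in practice this
% would be the line of column names extracted from colheader.
columnNameLine = 'Time;Val1;Speed;Val2;Val3;Temp;Val4';
myColumnNames  = {'Val1','Val2','Val3','Val4'};
delim = ';';

% Positions of every delimiter in the line
delimPos = strfind(columnNameLine, delim);

colIdx = zeros(1, numel(myColumnNames));
for n = 1:numel(myColumnNames)
    % Curly braces extract the char array; strfind needs a char pattern here
    namePos = strfind(columnNameLine, myColumnNames{n});
    if isempty(namePos)
        colIdx(n) = NaN;   % name does not occur in this header line
    else
        % A name preceded by m delimiters sits in column m+1.
        % Only the first match is used, so a name that is also a substring
        % of another name (e.g. 'Val1' inside 'Val10') can give a wrong index.
        colIdx(n) = sum(delimPos < namePos(1)) + 1;
    end
end
disp(colIdx)   % column indices of Val1..Val4 in the data matrix
```

The substring caveat in the comments is the main weakness of position matching; splitting the line into names and comparing them exactly avoids it.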

Andres,
thanks for your reply.
This is probably too much detail but here it is:
The original .csv file contains about 23 header lines, but 
I delete these so the first line is, in fact, the column 
names (extra header lines will not import properly).  The 
file is then imported into Matlab and saved as a 
matrix.  Usually when I import csv files I get a matrix 
and 2 'arrays', one called "textdata" and the other 
called "colheaders".  In this case one array was the 
actual column names/headers (e.g. 'Torque') and the other 
was the units for the column headers (e.g. 'Nm').  For 
some reason this is not working and I have to re-import 
the column headers as their own array, but this isn't 
really an issue.

A better example:
I have just loaded a data file with a matrix 
called "datafile1" (sized at <148x482 double>), and the 
corresponding column names are in the "colheaders" array 
(size <1x482 cell>). I know the exact column names, but I 
must open the colheaders array and scroll to find where a 
target column is, since the data matrix doesn't have 
column names.
   For example: the column name/header "Torque" is in 
column #422 of the 'colheaders' array. The corresponding 
torque data is in column 422 of the matrix "datafile1". 
The end goal is to use this column number (422) to read 
the Torque data into a separate vector to be further 
manipulated later. If I double-click on the colheaders 
cell named 'Torque', the array editor title is 
colheaders{1,422}.
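Because `colheaders` is a cell array of strings, the lookup described above need not involve scrolling: `strcmp` compares one string against every cell at once, and `ismember` handles several names in one call. A minimal sketch, with small made-up stand-ins for `colheaders` and `datafile1` (the real ones are 1x482 and 148x482):

```matlab
% Small made-up stand-ins for the variables described above
colheaders = {'Time','Speed','Torque','Temp'};   % 1x4 cell of names
datafile1  = magic(4);                           % 4x4 data, one column per name

% strcmp compares 'Torque' with every cell and returns a logical array;
% find turns that into the column number
col = find(strcmp(colheaders, 'Torque'));
if numel(col) ~= 1
    error('Expected exactly one ''Torque'' column, found %d.', numel(col));
end
torque = datafile1(:, col);   % the Torque data as its own column vector

% Several names at once: cols(i) is the column of the i-th name (0 if absent)
[found, cols] = ismember({'Torque','Speed'}, colheaders);
```

The `numel(col) ~= 1` check makes a missing or duplicated header fail loudly instead of silently selecting the wrong column.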

Herein lies the problem.  I have to verify that all files in 
this data set have the 'Torque' data stored in column 422, 
or determine which column it is stored in.  Once 
determined, the torque data from each file will be saved 
as a separate variable.  If I only had to manually verify 
this in 1 data set (approx. 25 files), it would be okay.  
Currently, however, it is 125 files, with more on the way. 
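Checking 125 files by hand can be replaced by a loop that looks the column up by name in each file. In this sketch the two CSV files, their headers and their values are invented and written to temporary files so that the example runs on its own. It assumes each real file has had its extra header lines removed, so that `importdata(file, ',', 1)` returns the `data` and `colheaders` fields mentioned earlier. With real data, `files` would more likely come from something like `dir('*.csv')`.

```matlab
% --- Demo setup (hypothetical): two CSV files whose 'Torque' column
% --- sits at different positions. Real code would use the actual files.
headerLines = {'Time,Torque,Speed', 'Time,Speed,Temp,Torque'};
dataBlocks  = {[0 5 100; 1 6 110], [0 100 20 7; 1 110 21 8]};
files = cell(1, numel(headerLines));
for k = 1:numel(headerLines)
    files{k} = [tempname '.csv'];
    fid = fopen(files{k}, 'w');
    fprintf(fid, '%s\n', headerLines{k});
    D = dataBlocks{k};
    fmt = [repmat('%g,', 1, size(D, 2) - 1) '%g\n'];
    fprintf(fid, fmt, D.');   % transpose so the rows are written in order
    fclose(fid);
end

% --- Look the column up by name in every file
torqueCol  = zeros(1, numel(files));   % 0 marks a file where the lookup failed
torqueData = cell(1, numel(files));
for k = 1:numel(files)
    A = importdata(files{k}, ',', 1);  % comma-delimited, 1 header line
    col = find(strcmp(strtrim(A.colheaders), 'Torque'));
    if numel(col) ~= 1
        warning('%s: found %d ''Torque'' columns', files{k}, numel(col));
        continue
    end
    torqueCol(k)  = col;
    torqueData{k} = A.data(:, col);
end

for k = 1:numel(files)
    delete(files{k});                  % clean up the demo files
end
disp(torqueCol)
```

Storing the results in a cell array (`torqueData{k}`) rather than in separately named variables keeps the code the same whether there are 25 files or 125.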

Also: I believe the string data in the column headers is 
delimited by single quotes (') not semicolons (;).  So 
modifying the code to " delim = '''; " does not work.  I am 
also unsuccessful with strfind:

strfind(colheaders,'Torque')
ans = 

  Columns 1 through 9

     []     []     []  ...
  Columns 442 through 450

     []     []     []     []     []
etc.
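The `strfind` output above is what MATLAB returns for a cell array input: a cell array of the same size with `[]` wherever 'Torque' does not occur and the match position where it does, so any hit is buried among the 482 entries. The single quotes MATLAB prints around cell contents are display notation, not delimiters stored in the strings. (A literal single-quote character would be written `''''` in MATLAB, but no delimiter is needed once the headers are already in a cell array.) A minimal sketch with a hypothetical header cell showing how to collapse the result into column numbers:

```matlab
% Hypothetical header cell; note the trailing space in the third name
colheaders = {'Time', 'Engine Speed', 'Torque ', 'Temp'};

% With a cell array input, strfind returns a cell array of the same size:
% [] where 'Torque' does not occur, the match position(s) where it does
hits = strfind(colheaders, 'Torque');

% Collapse that into a logical mask, then into column numbers
colSub = find(~cellfun(@isempty, hits));                    % substring match

% Exact matching is stricter: the stray space defeats plain strcmp,
% so trim whitespace first (strcmpi would also ignore case)
colExact   = find(strcmp(colheaders, 'Torque'));             % no match
colTrimmed = find(strcmp(strtrim(colheaders), 'Torque'));    % matches
```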


Hope this helps explain what is going on.  Thanks so much 
for all your time.