Reading a string to get required data

Tom Pearce on 27 Mar 2011
I'm trying to write a program that reads HTML code so I can extract the data for some analysis I'm conducting. I've managed to remove the HTML jargon and am now left with a string which contains the data I require. However, I'm trying to convert the data into a readable cell array:
Mar25,2011>4.88>4.88>4.83>4.88>51,000>Mar24,2011>4.72>4.72>4.72>4.72>13,300>Mar22,2011>4.88>4.88>4.88>4.88>0>Mar18,2011>5.00>5.00>5.00>5.00>0>Mar17,2011>4.81>4.89>4.81>4.89>1,001>
I know this may seem rather simple to most of you, but I'm new to MATLAB. Basically I'm trying to convert this string into a column array, with the date first, followed by the five successive numbers, for the whole data set. Any help on this would be greatly appreciated.

Answers (2)

Walter Roberson on 27 Mar 2011
textscan('%s%f%f%f%f%f', 'Delimiter', '>', 'CollectOutput', 1)
You might need to change the shapes around afterwards. I am not clear on what you are envisioning for a "column array".
  2 Comments
Tom Pearce on 29 Mar 2011
Basically I just want the data in a list (6 columns wide) which I can write to file and use to produce a graph. I've tried textscan, but it keeps returning {0x1} [0x1] [0x1] [0x1] [0x1] [0x1]. Now that I realise I'm along the right lines I will persevere. Thanks very much for your help.
Walter Roberson on 29 Mar 2011
Ah, you have commas in your fifth numeric field; that throws off parsing them as a number. Also I forgot to show the string field.
Let T be the string you have the line stored in. Then,
Q = textscan(T,'%s%f%f%f%f%s', 'Delimiter', '>');
Q{6} = str2double(regexprep(Q{6},',',''));
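For comparison, here is a minimal sketch of a different route to the same 6-column layout. It is not the textscan approach above: it splits the string on ">" with regexp and reshapes the pieces. It assumes every record has exactly six fields, as in the sample string from the question. The names fields, dates, nums and out are made up for illustration.

```matlab
% Sample string copied from the question.
T = ['Mar25,2011>4.88>4.88>4.83>4.88>51,000>' ...
     'Mar24,2011>4.72>4.72>4.72>4.72>13,300>' ...
     'Mar22,2011>4.88>4.88>4.88>4.88>0>' ...
     'Mar18,2011>5.00>5.00>5.00>5.00>0>' ...
     'Mar17,2011>4.81>4.89>4.81>4.89>1,001>'];

% Grab every non-empty run of characters between '>' signs.
% The trailing '>' therefore produces no extra empty field.
fields = regexp(T, '[^>]+', 'match');

% Assumption: exactly 6 fields per record. Reshape to one row per record.
fields = reshape(fields, 6, []).';

% Column 1 holds the date strings; columns 2-6 hold the numbers.
dates = fields(:, 1);

% Strip the thousands separators, then convert to double.
nums = str2double(strrep(fields(:, 2:6), ',', ''));

% Combine into a single cell array, 6 columns wide.
out = [dates, num2cell(nums)];
```

If the number of fields is not a multiple of six, reshape throws an error, which is a reasonable early warning that the HTML stripping left something unexpected in the string.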



Clemens on 29 Mar 2011
Personally I don't remove the "html jargon" in such cases. I use regexps like:
table_lines = regexp(table,'<tr [^>]*>(.*?)</tr>','tokens');
table_line_entry = regexp(table_line,'<td [^>]*>(.*?)</td>','tokens');
This has the advantage that it keeps the structure information of the original table.
Also, if you strip the tags first, you might run into problems when a table cell contains HTML code, or just a ">" sign.
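As a rough sketch of how the two regexp calls might be chained over a whole table: the HTML fragment below is invented for illustration, and the real page's markup may well differ. The patterns here use '<tr[^>]*>' rather than '<tr [^>]*>', so that tags without attributes also match; that is a small variation on the patterns above, not what was originally posted. The names html, rows, cells and data are hypothetical.

```matlab
% Hypothetical HTML fragment, made up for this example.
html = ['<table>' ...
        '<tr class="a"><td class="d">Mar25,2011</td><td>4.88</td></tr>' ...
        '<tr class="b"><td class="d">Mar24,2011</td><td>4.72</td></tr>' ...
        '</table>'];

% One token per table row: the text between <tr ...> and </tr>.
rows = regexp(html, '<tr[^>]*>(.*?)</tr>', 'tokens');

data = cell(numel(rows), 0);
for r = 1:numel(rows)
    % One token per cell in this row: the text between <td ...> and </td>.
    cells = regexp(rows{r}{1}, '<td[^>]*>(.*?)</td>', 'tokens');
    cells = [cells{:}];                 % flatten the nested 1x1 token cells
    data(r, 1:numel(cells)) = cells;    % store as one row of the result
end
```

Each row of data then corresponds to one table row, so the row and column structure of the page survives without counting fields by hand.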
