File Exchange


Get HTML-table data into MATLAB via urlread and without builtin browser

version 1.0 (4.91 KB) by

Based on getTableFromWeb, with a little more functionality for bringing table data from the web into MATLAB

47 Downloads


The function getTableFromWeb_mod is based on the very good "Pick of the Week" from August 20th, 2010 (http://www.mathworks.com/matlabcentral/fileexchange/22465-get-html-table-data-into-matlab) by Jeremy Barry.
It was motivated by the restrictions of the original function and is meant to help users who had problems with the loading time of the requested webpage. Instead of MATLAB's internal web browser, this workaround uses the urlread function to import and analyze the table data.

To get table data, you need to know the URL to read from and the number of the table on that page. If you have a URL but do not know which table number holds the data, use the original function getTableFromWeb
(http://www.mathworks.com/matlabcentral/fileexchange/22465-get-html-table-data-into-matlab) to find the number of the table you are interested in.

The first example (at the end of this description) gets current departure information from German Railways for the railway station Frankfurt Hbf (identified by its IBNR, the international railway station number).

The second example corresponds to the original example by Jeremy Barry and gets financial information.

There are two input arguments:
url_string -- URL of the requested webpage, as a string
nr_table -- number of the table to read and return in out_table

Output argument:
out_table -- cell array containing the requested data
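
The arguments above can be illustrated with a minimal sketch of regexp-based table extraction. This is not the actual code of getTableFromWeb_mod. The HTML snippet and its contents are hypothetical and stand in for the text that urlread(url_string) would return. As written, the matching is case-sensitive and does not handle nested tables.

```matlab
% Hypothetical HTML standing in for the text returned by urlread(url_string).
html = ['<html><body>' ...
        '<table><tr><td>skip</td></tr></table>' ...
        '<table><tr><th>Train</th><th>Time</th></tr>' ...
        '<tr><td>ICE 71</td><td>10:05</td></tr></table>' ...
        '</body></html>'];
nr_table = 2;   % same meaning as the nr_table input argument

% Split the page into table blocks (lazy match, so nested tables are not handled).
tables = regexp(html, '<table[^>]*>(.*?)</table>', 'tokens');
if nr_table > numel(tables)
    error('Table number %d not found on the page.', nr_table);
end

% Split the requested table into rows, then each row into cells.
rows = regexp(tables{nr_table}{1}, '<tr[^>]*>(.*?)</tr>', 'tokens');
out_table = cell(numel(rows), 0);
for r = 1:numel(rows)
    cells = regexp(rows{r}{1}, '<t[dh](?:\s[^>]*)?>(.*?)</t[dh]>', 'tokens');
    for c = 1:numel(cells)
        % Remove any remaining tags and surrounding whitespace from the cell text.
        out_table{r, c} = strtrim(regexprep(cells{c}{1}, '<[^>]+>', ''));
    end
end
disp(out_table)
```

Every entry of out_table in this sketch is a character array, including values that look numeric.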

Example:

% German Railways travel information example
ibnr = 8098105; % IBNR of the railway station Frankfurt Hbf (for more IBNRs see: http://www.ibnr.de.vu/)
url_string = [ 'http://reiseauskunft.bahn.de/bin/bhftafel.exe/dn?rt=1&ld=10000&evaId=', num2str(ibnr) ,'&boardType=dep&time=actual&productsDefault=1111000000&start=yes']; % query string for requesting current departure information for Frankfurt Hbf
nr_table = 2; % Table with the travel information data
out_table = getTableFromWeb_mod(url_string, nr_table)

% Finance example
% run getTableDataScript to see which table is number 7 (Valuation Measures)
url_string = 'http://finance.yahoo.com/q/ks?s=GOOG';
nr_table = 7;
out_table = getTableFromWeb_mod(url_string, nr_table)
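
Numeric entries in a cell array of text can be converted with str2double. The following is a minimal sketch under the assumption that the values come back as character arrays; the sample cell array is hypothetical, and its labels and numbers are made up rather than taken from the web pages above.

```matlab
% Hypothetical stand-in for out_table; these values are invented for illustration.
out_table = {'Market Cap',   '1.2'; ...
             'Trailing P/E', '23.5'; ...
             'Forward P/E',  'N/A'};

% str2double accepts a cell array of character arrays and returns a
% numeric array of the same size; text that is not a number becomes NaN.
values = str2double(out_table(:, 2));

% Keep only the rows whose second column was numeric.
is_numeric = ~isnan(values);
numeric_rows = out_table(is_numeric, 1);
disp(values)
```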

Comments and Ratings (14)

Varun Save

I get a 403 forbidden error when I try to use this.
How do I get to a website that needs credentials passed to it (username + password)?

João

RAJKUMAR

How can I use it to download a series of tables for different dates and times?

J.D.

Agree with Jorge:
If there is only one table on your page, you need to modify the "if i>=1 % if there are tables to read" section.

Other than that, very helpful!

Simon Garland

Worked perfectly, many thanks.

Ingrid

Loved it. I had been trying to parse the data myself using urlread as a starting point, but it was giving me a headache: there was no delimiter between the data in the columns, which made it impossible for me. Now my code is just a few lines and does exactly what I want!

Gareth Thomas

Worked like a charm, thanks:)

Jorge

Great function! However, I seem to have found a problem with a webpage containing only one table, where I got "No Table detected". I changed:

if i>1 % if there are tables to read

to

if i>=1 % if there are tables to read

This corrects for the case where only one table is detected, and it worked perfectly.

Douglas

Hi, I am pretty new to this. What if I need to extract more than one table each time I run the script? Also, some of the text could not be read and it returned a [] cell array. Thanks.

David Jessop

Sorry, stupid question above; I have now sorted it!

David Jessop

That's great! Do you know how I remove the ' ' from around the numbers found so I can use them?

Raj Sodhi

Fantastic!

Excellent.

Brian

Great tool. One issue I discovered is that on some HTML sites, the table tags are capitalized (e.g., </TABLE> instead of </table>). In these cases, the function fails because the string comparison commands are case-sensitive. Modifying it to use regexprep(...,'preservecase') and regexpi() where appropriate allows tables to be extracted from websites where the original function failed.
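
The case-sensitivity issue described in the comment above can be reproduced with a short sketch. It is an illustration only and not the modification made to the function; the HTML string and the pattern are assumptions chosen for the example.

```matlab
% Hypothetical page text mixing upper-case and lower-case table tags.
html = '<TABLE><TR><TD>a</TD></TR></TABLE><table><tr><td>b</td></tr></table>';

pattern = '<table[^>]*>(.*?)</table>';

% regexp is case-sensitive, so it only sees the lower-case table.
n_case_sensitive = numel(regexp(html, pattern, 'match'));

% regexpi ignores case, so it sees both tables.
n_case_insensitive = numel(regexpi(html, pattern, 'match'));

% The tokens hold the inner HTML of each table, whatever the tag case.
tables = regexpi(html, pattern, 'tokens');
first_table = tables{1}{1};
fprintf('case-sensitive: %d, case-insensitive: %d\n', ...
        n_case_sensitive, n_case_insensitive);
```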

MATLAB Release
MATLAB 7.10 (R2010a)
