Get HTML-table data into MATLAB via urlread and without builtin browser

version 1.0.0.0 (4.91 KB) by Sven Koerner
Based on getTableFromWeb, with some added functionality for importing table data from the web into MATLAB

37 Downloads

Updated 07 Dec 2010

The function getTableFromWeb_mod is based on the very good "Pick of the Week" from August 20th, 2010 (http://www.mathworks.com/matlabcentral/fileexchange/22465-get-html-table-data-into-matlab) by Jeremy Barry.
It was motivated by the restrictions of the original function and is intended to help users who had problems with the loading time of the requested web page. As a workaround, it does not use MATLAB's internal web browser but uses the urlread function to import and analyze the table data.

To get table data, you need to know which URL to read from and which table number on that page holds the data. If you have a URL but do not know the table number, use the original function getTableFromWeb
(http://www.mathworks.com/matlabcentral/fileexchange/22465-get-html-table-data-into-matlab) to check which table number has the content you are interested in.
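If you would rather stay without the browser while looking for the table number, the following is a minimal sketch of how the tables in a page could be listed with a short text preview. It is not taken from getTableFromWeb_mod; the inline HTML string and the variable names are made up for illustration, and the simple non-greedy pattern assumes tables are not nested.

```matlab
% Stand-in HTML; in practice this text would come from urlread(url_string).
html = ['<html><body>' ...
        '<table><tr><td>menu</td></tr></table>' ...
        '<TABLE><TR><TH>Name</TH><TH>Value</TH></TR>' ...
        '<TR><TD>Market Cap</TD><TD>190.5</TD></TR></TABLE>' ...
        '</body></html>'];

% Non-greedy, case-insensitive match of each <table ...> ... </table> block.
tables = regexp(html, '<table[^>]*>.*?</table>', 'match', 'ignorecase');

for k = 1:numel(tables)
    % Replace every tag with a space, then collapse runs of whitespace.
    no_tags = regexprep(tables{k}, '<[^>]+>', ' ');
    preview = strtrim(regexprep(no_tags, '\s+', ' '));
    fprintf('table %d: %s\n', k, preview);
end
```

The index k printed for the table you want is the candidate value for nr_table, assuming the function counts tables in document order.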

The first example (at the end of this description) gets current departure information from German Railways for the railway station Frankfurt Hbf (identified by its IBNR, the international railway station number).

The second example follows the original example by Jeremy Barry and gets financial information.

There are two input arguments:
url_string -- the URL of the requested web page, as a string
nr_table -- number of the table to read and return in out_table

Output argument:
out_table -- cell array containing the requested table data

Example:

% German Railways travel information example
ibnr = 8098105; % IBNR of the railway station Frankfurt Hbf (for more IBNRs see: http://www.ibnr.de.vu/)
url_string = [ 'http://reiseauskunft.bahn.de/bin/bhftafel.exe/dn?rt=1&ld=10000&evaId=', num2str(ibnr) ,'&boardType=dep&time=actual&productsDefault=1111000000&start=yes']; % query string requesting current departure information for Frankfurt Hbf
nr_table = 2; % table with the travel information data
out_table = getTableFromWeb_mod(url_string, nr_table)

% Finance example
% run getTableDataScript to see which table is number 7 (Valuation Measures)
url_string = 'http://finance.yahoo.com/q/ks?s=GOOG';
nr_table = 7;
out_table = getTableFromWeb_mod(url_string, nr_table)

Cite As

Sven Koerner (2020). Get HTML-table data into MATLAB via urlread and without builtin browser (https://www.mathworks.com/matlabcentral/fileexchange/29642-get-html-table-data-into-matlab-via-urlread-and-without-builtin-browser), MATLAB Central File Exchange. Retrieved .

Comments and Ratings (20)

Excellent function, works like a charm.

Many thanks also to Brian (comment from 28 Mar 2011): I replaced all instances of regexp with regexpi and added the regexprep(...,'preservecase') option to all regexprep calls, and it handled the case of </TABLE> instead of </table>.

Mens Sana

Changing the code to use webread is actually quite simple. It only requires replacing
urlText = java.lang.String(urlread(url_string)) ;
with
options = weboptions('ContentType','text');
urlText = java.lang.String(webread(url_string,options));
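For code that also has to run on older releases, one possible variation is to pick the fetch function at run time. This is only a sketch: fetch_text is a hypothetical helper name, not part of the submission, and it assumes webread is found on the path in releases that ship it (R2014b and later).

```matlab
% Use webread where it exists, otherwise fall back to urlread.
if ~isempty(which('webread'))
    options = weboptions('ContentType', 'text');  % return the page as text
    fetch_text = @(u) webread(u, options);
else
    fetch_text = @(u) urlread(u);
end

% Inside the function, the original urlread line would then become:
% urlText = java.lang.String(fetch_text(url_string));
```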

Mens Sana

I have been using this function for quite a while, but it recently failed on a site, I think because the site uses cookies. However, I was able to download the content using webread (recommended by MathWorks). I was wondering if there is an easy way of making the function work with webread instead of urlread, without making extensive changes to the existing code?

Excellent, just change
if i>1 % if there are tables to read
to
if i>=1 % if there are tables to read
as suggested by Jorge, if there is only one table in the HTML file.

Varun Save

I get a 403 forbidden error when I try to use this.
How do I get to a website that needs credentials passed to it (username + password)?
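One thing that may be worth trying, if the function has been switched to webread, is to pass credentials and a browser-like user agent through weboptions. This is a sketch under assumptions: the user name, password and user-agent string are placeholders, Username and Password only cover HTTP basic authentication (not a login form or cookie-based session), and a 403 can have other causes that these options will not fix.

```matlab
% Options for webread; all values below are placeholders.
options = weboptions('ContentType', 'text', ...
                     'Username', 'my_user', ...
                     'Password', 'my_password', ...
                     'UserAgent', 'Mozilla/5.0', ...
                     'Timeout', 15);

% The page would then be requested with:
% urlText = java.lang.String(webread(url_string, options));
```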

João

RAJKUMAR

How can I use it for downloading a series of tables for different dates and times?

J.D.

Agree with Jorge:
If there is only one table in your page, you need to change the "if i>1 % if there are tables to read" line to use "i>=1".

Other than that, very helpful!

Worked perfectly, many thanks.

Ingrid

Loved it, have been trying to parse the data myself by using urlread as a starting point but it was giving me a headache as there was no delimiter between the data in columns which made it impossible for me. Now my code is just a few lines and does exactly what I want!

Worked like a charm, thanks:)

Jorge

Great function! However, I seem to have found a problem with a webpage with only one table, where I got "No Table detected". I changed:

if i>1 % if there are tables to read

to

if i>=1 % if there are tables to read

This corrects for the case where only one table is detected, and it worked perfectly.

Douglas

Hi, I am pretty new to this. But what if I need to extract more than one table each time I run the script? Also, some of the text could not be read and it returns a [] cell array. Thanks.

Sorry, stupid question above; I have now sorted it!

That's great! Do you know how I remove the ' ' from around the numbers found so I can use them?
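The quotes indicate that the cells hold text rather than numbers. A minimal sketch of one way to convert them, assuming out_table is a cell array of strings as in the description (the sample values here are made up):

```matlab
% Stand-in for a table returned by getTableFromWeb_mod.
out_table = {'Market Cap', '190.5'; 'P/E', '24.1'; 'Note', 'N/A'};

% str2double accepts a cell array of strings and returns NaN
% for entries that cannot be read as a number.
values = str2double(out_table(:, 2));
```

Here values is a 3-by-1 double vector, with NaN in the last position because 'N/A' is not numeric.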

Raj Sodhi

Fantastic!

Excellent.

Brian

Great tool. One issue I discovered is that in some HTML sites, the table identifier is capitalized (e.g. </TABLE> instead of </table>). In these cases, the function fails because the string comparison commands are case sensitive. Modifying the code to use regexprep(...,'preservecase') and regexpi() where appropriate allows tables to be extracted from websites where the original function failed.
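The effect described above can be reproduced on a small string. This sketch uses a made-up HTML snippet rather than the submission's actual patterns, and shows the 'ignorecase' option of regexp, which behaves like regexpi for this purpose:

```matlab
% Upper-case tags, as found on some sites.
html = '<TABLE><TR><TD>a</TD></TR></TABLE>';

% Case-sensitive search misses the closing tag.
n_sensitive = numel(regexp(html, '</table>', 'match'));

% Case-insensitive search finds it.
n_insensitive = numel(regexp(html, '</table>', 'match', 'ignorecase'));

fprintf('case-sensitive: %d, case-insensitive: %d\n', n_sensitive, n_insensitive);
```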

MATLAB Release Compatibility
Created with R2010a
Compatible with any release
Platform Compatibility
Windows macOS Linux