convert a html table to csv format

FATEMEH on 10 Apr 2013
I need to convert the table at the following URL to CSV format. Since I have to convert many tables, I can't use copy and paste. http://climate.weatheroffice.gc.ca/climate_normals/results_e.html?stnID=2046&lang=e&dCode=0&province=ALTA&provBut=Search&month1=0&month2=12
  1 Comment
Matt Kindig on 10 Apr 2013
Do you have to use Matlab for this purpose? I ask because other languages that are more commonly used for website development have good HTML parsing capabilities, whereas such features are more limited in Matlab. In Matlab you'd basically have to resort to complex regexp statements.
I would recommend Python and the BeautifulSoup package to do this, actually.


Answers (2)

Jan on 11 Apr 2013
You can first import the HTML table into Matlab with FEX: htmltableToCell or FEX: get-html-table-data-into-matlab. How to export it to CSV afterwards depends on the contents of the data.

Cedric on 10 Apr 2013
Edited: Cedric on 10 Apr 2013
As Matt mentions, Python with a parsing package such as BeautifulSoup would be perfect for this part. Here is one way to do it using REGEXP in MATLAB. It is not a complete solution, but it is enough to illustrate the approach.
% - Get HTML page.
url = 'http://climate.weatheroffice.gc.ca/climate_normals/results_e.html?stnID=2046&lang=e&dCode=0&province=ALTA&provBut=Search&month1=0&month2=12' ;
buffer = urlread(url) ;
% - Extract horizontal header.
p = '(?<=<td class="dataTableColHeader">).*?(?=</td>)' ;
hheader = regexp(buffer, p, 'match') ;
% - Extract vertical header.
p = '(?<=<td class="dataTableRowHeader">).*?(?=</td>)' ;
vheader = regexp(buffer, p, 'match') ;
% - Extract/reshape data.
p = '(?<=<td class="dataTableRowData">).*?(?=</td>)' ;
data = regexp(buffer, p, 'match') ;
data = reshape(data, 12+2, []).' ;
% - Build and export the whole.
content = [vheader.',[hheader; data]] ;
xlswrite('example.xlsx', content) ;
Let me know if you want to go this way and I can improve this code a little. There would still be quite a bit of work to do on your side, e.g. handling inconsistencies in the way they build the HTML table, detecting and managing failures in the processing, exporting to CSV instead of XLSX, etc.
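For the CSV export in particular, one possible approach is to write the cell array row by row with fprintf instead of calling xlswrite. The following is only a minimal sketch under a few assumptions: every cell holds a character vector (as returned by regexp with 'match'), strjoin is available (R2013a or later), and the output name example.csv is a placeholder. The small content array is made-up stand-in data, not values from the page; in practice you would use the content variable built by the code above.

```matlab
% Stand-in for the cell array `content` built above (made-up placeholder values).
content = {'',            'Jan', 'Feb' ; ...
           'Rain, total', '1.0', '2.0'} ;

outfile = 'example.csv' ;                 % assumed output file name
fid = fopen(outfile, 'w') ;
assert(fid ~= -1, 'Could not open %s for writing.', outfile) ;

for r = 1 : size(content, 1)
    row = content(r, :) ;
    % Double any embedded quotes, then wrap each field in quotes so that
    % commas inside a field (e.g. 'Rain, total') do not split the column.
    row = strcat('"', strrep(row, '"', '""'), '"') ;
    fprintf(fid, '%s\n', strjoin(row, ',')) ;
end

fclose(fid) ;
```

Quoting every field is a deliberately conservative choice: it costs a few extra characters but avoids having to detect which cells contain commas or quotes. Cells that still contain HTML tags or entities would need to be cleaned before this step.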
