How to create Excel spreadsheet from HTML report?

150 views (last 30 days)
I have an HTML report with some information, and I would like to use MATLAB to extract a subset of the text and export to an Excel spreadsheet. How can I do this?
Attached is the HTML file, and copied below are its contents. I have also included a screenshot of the result that I would like to achieve after exporting to an Excel spreadsheet:
<div class="info">
<pre>
###########################################
<strong>-------------------------------------------</strong>
<strong>--------------- Section 1 -----------------</strong>
<strong>-------------------------------------------</strong>
Date = 01/01/2018
Time = 12:00:00
Type = Type1
Number1 = 1234
Comments =
<strong>-------------------------------------------</strong>
<strong>--------------- Section 2 -----------------</strong>
<strong>-------------------------------------------</strong>
Number2 = 100.0001
Version = 1.0.0
Program = example.exe
</pre>
</div>

Accepted Answer

MathWorks Support Team
MathWorks Support Team on 16 Oct 2018
Attached (and also copy-pasted below) is an example script that illustrates one such approach to convert the HTML file to an Excel spreadsheet as per the following workflow:
1) Read in the HTML file
2) Remove specific sections/patterns of text (via regular expression search and replacement)
3) Parse the remaining text based on a specified delimiter and format
4) Export the result to an Excel Spreadsheet
Please note that this example is provided simply to illustrate the application of these functions and outline an example of a general approach. This example code would likely need to be adapted and tuned for a particular workflow and use-case, since the implementation is highly dependent on the requirements therein.
%%Read in HTML file.
filenameHTML = 'Example.html';
txt = fileread('Example.html');
%%Remove HTML tags, header text, and last section (pertaining to images).
txt = regexprep(txt,'<script.*?/script>','');
txt = regexprep(txt,'<style.*?/style>','');
txt = regexprep(txt,'<.*?>','');
txt = regexprep(txt,'.*#\n','');
txt = regexprep(txt,'--.*?\n','');
txt = regexprep(txt,'\n\n.*','');
%%Set up delimiters and format specification to read columns of data as text:
% For more information, see the TEXTSCAN documentation.
delimiter = {' = '};
formatSpec = '%q%q%[^\n\r]';
%%Read columns of data according to the format.
dataArray = textscan(txt, formatSpec, 'Delimiter', delimiter, ...
'TextType', 'char', 'ReturnOnError', false);
raw = repmat({''},length(dataArray{1}),length(dataArray)-1); %preallocation before loop
for col = 1:(length(dataArray)-1)
raw(1:length(dataArray{col}),col) = dataArray{col};
end;
%%Write data to Excel spreadsheet.
filenameSpreadsheet = 'Example.xlsx';
xlswrite(filenameSpreadsheet,raw)
The following links provide additional information and references to the functions that I used, along with their syntax, arguments, parameters, and options.

More Answers (0)

Tags

No tags entered yet.

Products


Release

R2018a

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!