MATLAB Answers

How to read particular dates from an HTML script?

2 views (last 30 days)
Abhinav
Abhinav on 28 Jan 2018
Edited: Abhinav on 29 Jan 2018
I have downloaded HTML script of of few webpages; one is uploaded for reference. My task is to extract some information from the this script. On line 851 latitude and longitude are given which I extracted using the following code:
filename=strcat(pwd,'/',num2str(site(i))); % file to be read, which is same as the file uploaded
fileID=fopen(filename); % fileID
open_file=textscan(fileID,'%s','%f'); % parsing the file
open_file=open_file{1,1};
lat_id=find(ismember(open_file,... % Finding the position of Latitude in text-file
'<dd>Latitude'));
long_id=find(ismember(open_file,... % Finding the position of Longitude in text-file
'Longitude'));
lat(i)=open_file(lat_id+1); % latitude
long(i)=open_file(long_id+1); % longitude
proj=open_file(long_id+3); % projection type, e.g., NAD27, NAD83
But I am not able to use similar code for reading the data in line 865, which contains the time-range of the some data. The problem is that the variable open_file do not seem to contain these values. Any suggestions will be helpful.

  0 Comments

Sign in to comment.

Accepted Answer

Walter Roberson
Walter Roberson on 29 Jan 2018
filename=strcat(pwd,'/',num2str(site(i))); % file to be read, which is same as the file uploaded
S = fileread(filename);
place_info = regexp(S, 'Latitude\s+(?<lat>[^ ,]+),\s*\S+\s*Longitude\s+(?<long>\S+)\s*\S+\s*(?<proj>\w+)', 'names', 'once');
periods_info = regexp(S, '''begin_date''[^\d]*(?<begin_date>\d+-\d+(-\d+)?).*?end_date[^\d]*(?<end_date>\d+-\d+(-\d+)?).*?sites_selection_links\W*(?<stats_type>[^<]+)', 'names');
other_info = regexp(S, 'site_no=\d+">(?<stats_type>.*?)</a>.*?''begin_date''[^d]*?(?<begin_date>\d+(-\d+(-\d+)?)?).*?end_date[^\d]*?(?<end_date>\d+(-\d+(-\d+)?)?)', 'names');
combined_info = [periods_info, other_info];
Now:
place_info is a struct with fields 'lat', 'long', and 'proj' reflecting latitude, longitude, and projection. The lat and long are in the form they were stored in the file, so they may have a ° in them, corresponding to the ° symbol.
combined_info is a struct with fields begin_date, end_date, and stats_type . stats_type is the information about what the period is describing. In the sample data file those are
'Daily Statistics' 'Monthly Statistics' 'Annual Statistics' 'Current / Historical Observations' 'Peak streamflow' 'Field measurements' 'Field/Lab water-quality samples' and 'Water-Year Summary'

  1 Comment

Abhinav
Abhinav on 29 Jan 2018
Thanks a lot! I did not know that these routines exist in MATLAB.

Sign in to comment.

More Answers (0)

Sign in to answer this question.

Products