From: "Cassandra J. Nichols" <cassiejn@softnospamhome.net>
Newsgroups: comp.soft-sys.matlab
Subject: Re: Reading Web Data
References: <e96768fe-79a6-411b-b231-78bc1ca0b169@j44g2000hsj.googlegroups.com>
In-Reply-To: <e96768fe-79a6-411b-b231-78bc1ca0b169@j44g2000hsj.googlegroups.com>
Message-ID: <Omq2j.88454$bj2.60906@fe08.news.easynews.com>
Date: Mon, 26 Nov 2007 02:40:47 GMT


Predictor wrote:
> Is there a way to read Web data exactly as it appears in "View... Page
> Source" within a browser?  My experiments with urlread() seem to show
> that HTML tags and other items are ignored, but I'd like to be able to
> read the exact contents of Web pages, and interpret or filter out tags
> and so forth in my own code.  Any ideas?
> 
> 
> Thanks,
> Will

Will,

   In a quick test I just ran with MATLAB R2007a, the HTML tags came back intact.

   >> urlread('http://www.google.com')

   returns the full page source, starting from "<html>".
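
   If it helps, here is a minimal sketch of capturing the source in a
   variable and then filtering the tags in your own code. The regexprep
   pattern is only my assumption of a crude tag filter, not a real HTML
   parser: it will not cope with script blocks, comments, or a ">" inside
   an attribute value.

```matlab
% Sketch only: read the raw page source, then filter tags yourself.
url = 'http://www.google.com';        % same URL as in the test above

% The second output is 1 on success and 0 on failure.
[html, ok] = urlread(url);
if ~ok
    error('Could not read %s', url);
end

% html is a char array holding the page source, tags included.
disp(html(1:min(200, numel(html))));  % show the first 200 characters

% Crude tag stripping (assumption: every tag is a simple <...> run).
stripped = regexprep(html, '<[^>]*>', '');
```

   If the tags really are missing on your side, it may be worth checking
   whether something after the urlread call is removing them.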

-- 
  - Cassandra J. Nichols