Thread Subject:
Problem using urlread('http://d-cyphatrade.com.au/market_futures#A*Q')

Subject: Problem using urlread('http://d-cyphatrade.com.au/market_futures#A*Q')

From: Tony

Date: 30 Oct, 2012 03:29:05

Message: 1 of 6

Hi,

I want to read the URL

http://d-cyphatrade.com.au/market_futures#A*Q

but when I do this by

urlread('http://d-cyphatrade.com.au/market_futures#A*Q')

the retrieved text does not contain the data shown on the page (i.e., none of the numeric values for price, volume, etc. are there).

What do I need to do to retrieve all the data from this page?
Does urlread just not work here, so that I need to use something else, maybe Java from MATLAB?

Cheers,

Tony

Subject: Problem using urlread('http://d-cyphatrade.com.au/market_futures#A*Q')

From: Nasser M. Abbasi

Date: 30 Oct, 2012 04:04:18

Message: 2 of 6

That page seems to be generated on the fly on the server,
so the URL itself does not point to a static page.

Actually, do this and you'll see:

1) Open http://d-cyphatrade.com.au/market_futures#A*Q in a browser.
2) View the source, copy all of it, and paste it into an empty text file.
3) Save the text file as foo.htm.
4) Open foo.htm and you'll see that the page is not the same as the one
you are looking at in step (1).

I think urlread is meant to read static HTML pages. When server-side
scripts or JavaScript get involved, things might change, since the
content is then generated on the fly.

I am not a web programmer, so someone else may know more about these details.

--Nasser

Subject: Problem using urlread('http://d-cyphatrade.com.au/market_futures#A*Q')

From: Nasser M. Abbasi

Date: 30 Oct, 2012 04:21:56

Message: 3 of 6

Reading a little more on this stuff (it's been a while), I think the
problem might be the anchor (the part after #), which in your
case is '#A*Q'.

This anchor is used by the client (i.e. the browser) to jump to
a specific location in a web page. (The client might also have
more to do, talking to the server to get to that page location,
which is not done when you use urlread.)

So, using # in an API call (like urlread) is not going to
work; it only works in the client (i.e. the browser).
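A small illustration of the point about the anchor: splitting the URL at the first '#' shows which part an HTTP client such as urlread actually requests from the server. The fragment ('A*Q' here) is not sent in the HTTP request at all; only client-side code in the browser sees it. This is a plain-MATLAB sketch using only string indexing, and the variable names are illustrative.

```matlab
% Split a URL into the part sent to the server and the client-side
% fragment (everything after the first '#').
url = 'http://d-cyphatrade.com.au/market_futures#A*Q';

hashIdx = find(url == '#', 1);         % position of the first '#', if any
if isempty(hashIdx)
    requestUrl = url;                  % no fragment present
    fragment   = '';
else
    requestUrl = url(1:hashIdx-1);     % what the server actually receives
    fragment   = url(hashIdx+1:end);   % stays in the browser, never sent
end

fprintf('Requested from server: %s\n', requestUrl);
fprintf('Client-side fragment:  %s\n', fragment);
```

So urlread on the '#A*Q' URL fetches exactly the same document as urlread on the bare market_futures URL.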

But there should be other APIs you can use to do all this.

Maybe use the Java URL API from inside MATLAB? I am sure it has more
options to navigate and handle such things.

--Nasser

Subject: Problem using urlread('http://d-cyphatrade.com.au/market_futures#A*Q')

From: Christopher Creutzig

Date: 31 Oct, 2012 07:30:47

Message: 4 of 6

On 30.10.12 05:04, Nasser M. Abbasi wrote:

> I I think url read is meant to read static html pages. When server side
> scripts, or Javascript get involved, things might change since things
> now are generated on the fly.

Server-side scripts should not be a problem at all (unless someone got
clever with browser detection and trips over unknown clients
such as MATLAB). The client cannot tell the difference between a
static page and one generated by a server-side script.

JavaScript, now, is a different thing, and that is exactly what happens
here. If you have Firebug installed, activate it and load the page with
the Net panel turned on. The data itself is loaded from
http://d-cyphatrade.com.au/market_futures?product=A*Q (apparently
twice?). That URL may of course change, and I have no idea whether using
it directly would be within the terms of use of the web service.
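A minimal sketch of requesting that data URL directly with urlread, assuming the endpoint still behaves as observed here in 2012 (it may well have changed, and the terms-of-use caveat above still applies). The format of the response is not known, so the sketch only fetches the raw text and leaves parsing open.

```matlab
% Request the data URL directly instead of the '#A*Q' page URL.
% Assumption: the endpoint and its 'product' parameter are unchanged.
baseUrl = 'http://d-cyphatrade.com.au/market_futures';
product = 'A*Q';                          % value that followed '#' in the page URL
dataUrl = [baseUrl '?product=' product];  % '*' is allowed unescaped in a query string

try
    txt = urlread(dataUrl);   % plain HTTP GET; no JavaScript is executed
    fprintf('Received %d characters.\n', numel(txt));
catch err
    txt = '';
    fprintf('Request failed: %s\n', err.message);
end
```

Alternatively, urlread(baseUrl, 'get', {'product', product}) lets urlread build and URL-encode the query string itself.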


Christopher

Subject: Problem using urlread('http://d-cyphatrade.com.au/market_futures#A*Q') -- only part of the web page is retrieved

From: Steven_Lord

Date: 31 Oct, 2012 13:39:49

Message: 5 of 6



To the OP: you may want to contact the owners/operators of that website and
explain what you're trying to do. They may provide an API with which you can
obtain the data you want directly, rather than having to query a web page
and scrape the data from it. This would run the risk that they say "You're
not allowed to do this by section X of the terms of use" but they may also
say "Sure, here's an easy way to access that data."

I'd also recommend checking if the data in which you're interested is
available from one of the data sources supported by Datafeed Toolbox:

http://www.mathworks.com/products/datafeed/

If it isn't, but you believe it's an important data set for users in a
particular region of the world or a particular type of financial analysis,
please submit an enhancement request asking that a source from which it is
available be added to Datafeed Toolbox.

--
Steve Lord
slord@mathworks.com
To contact Technical Support use the Contact Us link on
http://www.mathworks.com

Subject: Problem using urlread('http://d-cyphatrade.com.au/market_futures#A*Q')

From: Tony

Date: 31 Oct, 2012 23:36:15

Message: 6 of 6

Thanks for the replies on this.
I have finally ended up using


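pauseTime = 0.5;   % seconds; tune for your machine and network (see note below)
url = 'http://d-cyphatrade.com.au/market_futures#A*Q';
IE = actxserver('InternetExplorer.Application');   % Windows-only COM automation
pause(pauseTime)
IE.Offline = 0;          % allow IE to go online
Navigate(IE, url);       % IE loads the page and runs its JavaScript
pause(pauseTime)         % give the scripts time to fill in the prices
htmlStr = IE.Document.Body.InnerHTML   % body HTML after the scripts have run
pause(pauseTime)
delete(IE)               % release MATLAB's handle to the COM object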

which is not in Java as I originally wanted. My attempts using Java also returned only partial information, not the entire web page.
Note: you might have to experiment with pauseTime. On my machine and network connection 0.5 seconds seems to work, but 0.1 seconds does not.
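The fixed pauses could instead be replaced by polling, which adapts to the machine and connection. A sketch under assumptions (Windows only, untested against this site): wait until IE.Busy is false, then, because the prices are written in by JavaScript after the page loads, keep reading the body HTML until it stops changing for a few consecutive polls or a timeout expires. The values of timeoutSec, pollSec, and stablePolls are guesses to tune, and "unchanged for stablePolls polls" is only a heuristic for "the scripts have finished".

```matlab
% Hypothetical polling variant of the fixed-pause approach (Windows only).
url         = 'http://d-cyphatrade.com.au/market_futures#A*Q';
timeoutSec  = 15;    % assumed upper bound on the total wait
pollSec     = 0.1;   % how often to check
stablePolls = 5;     % assumed: body unchanged this many polls means "done"
htmlStr     = '';

if ispc
    IE = actxserver('InternetExplorer.Application');
    try
        IE.Offline = 0;
        Navigate(IE, url);
        t0 = tic;
        % 1) Wait until IE reports that the page itself has loaded.
        while IE.Busy && toc(t0) < timeoutSec
            pause(pollSec);
        end
        % 2) Keep polling until the body HTML stops changing
        %    (JavaScript fills in the data after the load) or time runs out.
        unchanged = 0;
        while unchanged < stablePolls && toc(t0) < timeoutSec
            pause(pollSec);
            current = IE.Document.Body.InnerHTML;
            if strcmp(current, htmlStr)
                unchanged = unchanged + 1;
            else
                htmlStr   = current;
                unchanged = 0;
            end
        end
    catch err
        delete(IE);      % release the COM object before re-raising
        rethrow(err);
    end
    delete(IE);
else
    disp('InternetExplorer.Application requires Windows (ActiveX).');
end
```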

I know this solution works only on Windows, and I am sure there are better ways to grab this information using actxserver(). I saw a reference to using
actxserver('WinHttp.WinHttpRequest.5.1')
at http://stackoverflow.com/questions/11538038/matlab-how-would-you-login-to-a-website-and-download-a-report
but I am not sure whether using
actxserver('WinHttp.WinHttpRequest.5.1')
is superior to
actxserver('InternetExplorer.Application')
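On the WinHttp question: WinHttp.WinHttpRequest.5.1 is a plain HTTP client with no browser engine, so, like urlread, it does not run the page's JavaScript. Requesting the '#A*Q' URL with it would most likely return the same partial page; it would have to be pointed at the data URL found with Firebug earlier in this thread. Its advantages are that it is lighter than automating IE and needs no pauses, since a synchronous Send returns once the response has arrived. A Windows-only sketch, assuming the ?product=A*Q endpoint still exists:

```matlab
% WinHttpRequest does not execute JavaScript, so request the data URL itself.
% Assumption: the endpoint observed in 2012 is still valid.
dataUrl      = 'http://d-cyphatrade.com.au/market_futures?product=A*Q';
responseText = '';

if ispc
    http = actxserver('WinHttp.WinHttpRequest.5.1');
    try
        http.Open('GET', dataUrl, false);   % false = synchronous request
        http.Send();                        % blocks until the reply arrives
        if http.Status == 200               % HTTP 200 OK
            responseText = http.ResponseText;
        else
            fprintf('Server returned HTTP status %d\n', http.Status);
        end
    catch err
        fprintf('Request failed: %s\n', err.message);
    end
    delete(http);                           % release the COM object
else
    disp('WinHttp.WinHttpRequest.5.1 requires Windows (ActiveX).');
end
```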


Cheers,

Tony
