Perform Google Search in Matlab

dsmalenb on 4 Jun 2019
Edited: David Chen on 27 May 2020
I am trying to figure out how to perform a Google search automatically in MATLAB and save the results in an array.
Say I wanted to save the paths to the PDF files: " filetype:pdf"
Some answers in the list should then be:
I have seen some scripts (links below) but unfortunately they are outdated or simply do not work. I am guessing it may be possible to do this but I cannot seem to figure it out. Any assistance would be very welcome!
dsmalenb on 4 Jun 2019
Thank you for your response. Perhaps I am missing something significant, but after parsing through the HTML I tried to compare the parts so I could make the necessary changes. However, it does not seem as if all the necessary parts of the link are available. I have included an example below. It is for the first article that the search displays.
We have:
  1. The file type is in GREEN
  2. The Article's title is in YELLOW
  3. The parts of the link are in MAGENTA
I am missing "2004" and "01/23/" to complete the link. These parts do not seem to be listed in the HTML code.
Any idea how to get these pieces?
Joel Handy on 10 Jun 2019
After doing some more research, it looks like scraping (that's what we are doing, scraping Google's search results) is against their terms of service, and they actively attempt to thwart it. That would explain why some older tools are no longer maintained. I'm not a web expert. There appear to be ways of doing what you want, but I don't think any of them are simple.
Sorry I couldn't be more help.


Answers (2)

Monika Phadnis on 27 Jun 2019
I followed the example given on this link to extract data from the URL.
As for the URL, I used " " as the url parameter for webread for the example you gave. This returns a string array of the href links; you can try parsing the array for the required links.
In my output, the strings starting with "/url" held the search links.
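The filtering step described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from the linked example: the sample HTML is made up so the snippet runs offline, and the assumption that the real address sits in a q= parameter after "/url" is mine and may not match what the page returns today.

```matlab
% Hypothetical sample standing in for the text that webread would return.
html = ['<a href="/url?q=https://example.com/a.pdf&amp;sa=U">A</a>' ...
        '<a href="/search?q=next">Next</a>' ...
        '<a href="/url?q=https://example.com/b.pdf&amp;sa=U">B</a>'];

% Collect the value of every href attribute in the page.
hrefs = regexp(html, 'href="([^"]*)"', 'tokens');
hrefs = cellfun(@(t) t{1}, hrefs, 'UniformOutput', false);

% Keep only the entries that start with /url (the search result links).
isResult = strncmp(hrefs, '/url', 4);
resultHrefs = hrefs(isResult);

% Assumption: the target address is the value of the q= parameter.
targets = regexp(resultHrefs, '(?<=[?&]q=)[^&]*', 'match', 'once');
```

With a live page, html would instead come from a call such as html = webread(url), where url is the search address. As noted earlier in the thread, automated requests may be blocked or may return different markup, so the patterns above would likely need adjusting.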

KARTIK GURNANI on 21 May 2020
This does seem true.
P.S.:
Microsoft introduced this feature on Bing, well before Google did, to prevent other search engines from copying its search results.
It seems we would be violating the TOS of both Google and Bing.
I tried.
I got partial results.
The best possible way would be to use MATLAB to build a neural network that runs search queries from a system with a dynamic IP.
@AndrewNg might shed some better light on this.
There is a possible solution to this.
But the biggest issue of all:
Google and Bing (Microsoft) might label your IP address as spam or as a bot.
Which means no Netflix, no Hulu, no other streaming service.
You might even get locked out of reading news on certain websites.
Hell, even for simple web searches you might end up solving a reCAPTCHA or the newer version, an image CAPTCHA.
A dynamic IP will help in this case, but check with your ISP before attempting this.
You might lose your security, or your plan may get suspended.
>> It will take the ISP a lot of man-hours to get that single IP cleaned up, that is, removed from the blacklists across most filters.
>> You would mostly increase their headache.
Note:
I have created a MATLAB script that can run your search query.
I am not sure about posting it here.
The issue is that it can only run a single search query.
It works, but crawling takes a while, followed by the use of PostScript to convert to PDF.
It does better when saving to an HTML file with images.
If anyone would like the script, please let me know.
The script is for educational purposes only.
Do not use it to violate the TOS of any organization.
Good luck and stay safe,
  1 Comment
David Chen on 27 May 2020
Edited: David Chen on 27 May 2020
"If anyone would like the script, please let me know."
I would like a copy, please.
