webbot

A Java-based "web browser" that extracts all links from a web page and displays them.
8.6K Downloads
Updated 15 Oct 2003


WEBBOT: Java-based browser with download support and Perl regular expressions. The function extracts all links from a web page and displays them; the linked documents can then be downloaded.

WEBBOT(URL)
URL is a string giving the base page address; the URL must point to an HTML file. The function lists all links in that file. URL can also be a cell vector of URL strings.
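At its core, WEBBOT(URL) finds the links in a page's HTML. The following is a minimal sketch, not the original WEBBOT code. It assumes the HTML has already been read into a char array; here a made-up sample string stands in for the download. It then uses MATLAB's regexp to pull out each quoted href value. The variable names and the pattern are illustrative assumptions, and real pages can contain cases the pattern does not cover, such as unquoted attributes.

```matlab
% Illustrative sketch, not the original WEBBOT code.
% Sample HTML (made up); in practice this would be the downloaded page text.
html = ['<html><body>', ...
        '<a href="page2.html">Next</a> ', ...
        '<A HREF=''files/data.zip''>Data</A> ', ...
        '<a href="http://example.com/">Elsewhere</a>', ...
        '</body></html>'];

% Match href, optional spaces, '=', then a value in double or single quotes.
% Token 1 captures the opening quote; \1 requires the same quote to close it.
% Token 2 captures the (lazy) value between the quotes.
tok = regexp(html, 'href\s*=\s*(["''])(.*?)\1', 'tokens', 'ignorecase');

% Keep only the value (second token) of each match.
links = cellfun(@(t) t{2}, tok, 'UniformOutput', false);
disp(links(:));
```

For a live page, the html string could come from webread (R2014b and later) or urlread in older releases. Relative links such as 'page2.html' would still need to be resolved against the base URL before they can be downloaded.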

WEBBOT(URL, WHAT)
displays only specific links. WHAT is a string:
'all_links': displays all links (default).
'page_links': displays all links to an html web page*.
'local_links': displays all local links on the server*.
'external_links': displays all links to external websites.
'image_links': displays all links to an image file**.
'image_tags': displays all image tags <img src="xxx">.
'.xxx.yyyy.zz': displays all links to files with any of the listed extensions; case is ignored ('zip' also finds 'ZiP'), e.g. '.zip.gz.gzip.tar.Z'.
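One possible way to implement the '.xxx.yyyy.zz' filter is to split WHAT on the dots and then compare each link's final extension against that list, ignoring case. This is a hypothetical illustration, not WEBBOT's own matching code. The sample links are invented, and the sketch assumes MATLAB R2013a or later because it uses strsplit. It looks only at the last extension, so 'notes.gz.txt' is not matched by '.gz'. Links with query strings (e.g. 'file.zip?v=2') are not handled.

```matlab
% Illustrative sketch of the '.xxx.yyyy.zz' filter (not WEBBOT's own code).
links = {'a/report.ZiP', 'b/archive.tar', 'c/page.html', 'd/notes.gz.txt', 'e/x.Z'};
what  = '.zip.gz.gzip.tar.Z';

% '.zip.gz...' -> {'', 'zip', 'gz', ...}; drop the empty parts, then lower-case
% so the comparison ignores case ('Z' and 'z' are treated alike).
exts = strsplit(what, '.');
exts = lower(exts(~cellfun(@isempty, exts)));

keep = false(size(links));
for k = 1:numel(links)
    % fileparts returns only the last extension, with its dot (e.g. '.ZiP').
    [~, ~, ext] = fileparts(links{k});
    if ~isempty(ext)
        keep(k) = any(strcmp(lower(ext(2:end)), exts));
    end
end
matched = links(keep);
disp(matched(:));
```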

WEBBOT(URL, WHAT, ACT)
performs an action on found links. ACT is a string:
'noaction': just displays the links (default).
'download': downloads all found links to the local disk.
'cartoons': downloads all image tags found on the linked pages. This is useful for cartoon websites where each cartoon (e.g. "01.gif") is on its own HTML page (e.g. "c01.html").
'follow.x': follows links to HTML pages and recursively performs the same action on each resulting page. 'x' is an integer giving the recursion depth (0 is equivalent to 'noaction').
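The 'follow.x' action amounts to a depth-limited crawl. The sketch below illustrates the idea without network access, under several assumptions. A containers.Map named site stands in for the website: each hypothetical page name maps to the links that page contains. Instead of a recursive call, the crawl processes pages level by level (breadth-first), and each level corresponds to one step of recursion depth. The visited list stops the crawl from looping back to pages it has already seen; this guard is also an assumption of the sketch, since the description does not say whether WEBBOT protects against cycles.

```matlab
% Illustrative sketch of 'follow.x' (not WEBBOT's code). A containers.Map
% stands in for the web: each hypothetical page maps to the links it contains.
site = containers.Map();
site('index.html') = {'a.html', 'b.html', 'pic.gif'};
site('a.html')     = {'c.html', 'index.html'};
site('b.html')     = {};
site('c.html')     = {'d.html'};
site('d.html')     = {};

depth    = 2;               % the x in 'follow.x'; 0 processes only the start page
frontier = {'index.html'};  % pages to process at the current level
visited  = {};              % pages already processed (assumed cycle guard)

for level = 0:depth
    next = {};
    for k = 1:numel(frontier)
        page = frontier{k};
        if any(strcmp(page, visited)) || ~isKey(site, page)
            continue;       % skip pages already seen or not available
        end
        visited{end+1} = page; %#ok<AGROW>
        found = site(page); % this is where the links would be displayed/acted on
        if isempty(found)
            continue;       % a page without links adds nothing to the next level
        end
        % Only links to .htm/.html pages are followed to the next level.
        isPage = ~cellfun(@isempty, regexpi(found, '\.html?$', 'once'));
        next = [next, found(isPage)]; %#ok<AGROW>
    end
    frontier = next;
end
disp(visited(:));
```

With depth = 2, the start page and the pages one and two links away are processed. 'd.html' is three links away and is not reached, and 'pic.gif' is never followed because it is not an HTML page.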

lks = WEBBOT(URL, ...)
returns a cell array containing the links of URL{end}, i.e. of the last URL when URL is a cell vector.

Notes: * Links explicitly pointing to a .htm or .html URL.
** Image links are recognized by the following file extensions:
.jpg .jpeg .gif .pict .bmp .tif .tiff .ras .png (.giff)
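Both notes come down to simple extension tests. The following hypothetical sketch is not WEBBOT's code, and its sample links are invented. It builds case-insensitive regular expressions from the extensions listed above, treating '.giff' as one more image extension. It then flags which links count as page links and which count as image links. It assumes MATLAB R2013a or later for strjoin.

```matlab
% Illustrative sketch of the page/image rules in the notes (not WEBBOT's code).
links = {'c01.html', 'img/01.GIF', 'index.htm', 'photo.jpeg', 'doc.pdf', 'scan.tiff'};
imageExts = {'jpg', 'jpeg', 'gif', 'pict', 'bmp', 'tif', 'tiff', 'ras', 'png', 'giff'};

% '\.(jpg|jpeg|...)$': the extension must end the link, so links with query
% strings such as 'a.gif?x=1' are not recognized by this sketch.
imagePat = ['\.(' strjoin(imageExts, '|') ')$'];
pagePat  = '\.html?$';   % .htm or .html

% regexpi ignores case; with 'once' a non-match gives an empty result.
isImage = ~cellfun(@isempty, regexpi(links, imagePat, 'once'));
isPage  = ~cellfun(@isempty, regexpi(links, pagePat, 'once'));

pageLinks  = links(isPage);
imageLinks = links(isImage);
disp(pageLinks(:));
disp(imageLinks(:));
```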

Try it with:
webbot('http://www.unitedmedia.com/comics/dilbert/archive/', ...
'local_links', 'cartoons');

Written by L.Cavin, 28.09.2003, (c) CSE
This code is free to use and modify for non-commercial purposes.
Web address: http://ltcmail.ethz.ch/cavin/CSEDBLib.html#WEBBOT

Cite As

Laurent Cavin (2024). webbot (https://www.mathworks.com/matlabcentral/fileexchange/4023-webbot), MATLAB Central File Exchange. Retrieved .

MATLAB Release Compatibility
Created with R13
Compatible with any release
Platform Compatibility
Windows macOS Linux

Version history

1.0.0.0
Major update: much faster downloads using the MathWorks object "com.mathworks.mlwidgets.io.InterruptibleStreamCopier".
The old code using "java.net.URL" is still included for demonstration purposes.