## Trendy Tutorial

This page is for Trendy screen scraping practice. Use Trendy's URLFILTER function to explore this page.
This is the kind of page you might want to extract data from. For example, let's say this page is updated every day with the number of widgets that were produced.

## Data For Task 1:

Total Widgets produced today: 437

## Task 1: How many widgets were produced?

The phrase "Total Widgets" makes a good target string.

```
the_url = 'http://www.mathworks.com/matlabcentral/trendy/Tutorial/trendco.html';
total_widgets = urlfilter(the_url,'Total Widgets');
updatetrend(total_widgets)
```

Even though the phrase "Total Widgets" doesn't sit next to the number, it's still a good specific marker for the number that we're interested in. URLFILTER starts there and begins looking for numbers. The first one it finds is 437, at which point it stops.

## Data For Task 2:

Today's temperature and relative humidity: 77F, 48%

## Task 2: What is today's relative humidity?

In this case "humidity" is a good target, but the number immediately following the target (77) isn't the one we're interested in. The workaround is to collect two numbers and only use the second one. This involves passing a third argument to URLFILTER.

```
the_url = 'http://www.mathworks.com/matlabcentral/trendy/Tutorial/trendco.html';
vals = urlfilter(the_url,'humidity',2);
humidity = vals(2);
updatetrend(humidity)
```

## Data For Task 3:

## TrendCo Daily Sales Numbers
## Task 3: How much did TrendCo make on Tetrascopes today?

In this case we want to collect two numbers (sales from East and West) and add them together. The code requests three numbers after the target and uses only the second and third.

```
the_url = 'http://www.mathworks.com/matlabcentral/trendy/Tutorial/trendco.html';
vals = urlfilter(the_url,'Tetrascope',3);
east_sales = vals(2);
west_sales = vals(3);
total_sales = east_sales + west_sales;
updatetrend(total_sales)
```

## Task 4: What is the ID number for Grid Brackets?

This looks like an easy question, but the obvious solution doesn't work:

```
the_url = 'http://www.mathworks.com/matlabcentral/trendy/Tutorial/trendco.html';
sales = urlfilter(the_url,'Grid Bracket');
updatetrend(sales) % fails!
```
Why isn't it working? It's only when you view the underlying HTML source that you see there is a non-breaking space between "Grid" and "Bracket". So instead of "Grid Bracket", the target search term should be "Grid&nbsp;Bracket".

```
sales = urlfilter('http://www.mathworks.com/matlabcentral/trendy/Tutorial/trendco.html','Grid&nbsp;Bracket');
updatetrend(sales)
```

Now the code works fine. Remember that sometimes you have to examine the source HTML before you can be confident that your target string is accurate and unique.
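
The idea of "find the target in the source, then read the numbers that follow" can be tried out in plain MATLAB without Trendy. The sketch below is only an illustration: it does not reproduce URLFILTER's actual implementation, and the HTML fragment, its values, and all variable names are invented for the example. It assumes the non-breaking space appears in the source as the entity `&nbsp;`.

```matlab
% Hypothetical HTML fragment; the numbers are made up for illustration
% and are not taken from the tutorial page.
html = '<tr><td>Grid&nbsp;Bracket</td><td>1001</td><td>25</td></tr>';

% The text as it looks in the browser versus as it appears in the source.
visible_target = 'Grid Bracket';
source_target  = 'Grid&nbsp;Bracket';

% Only the source form actually occurs in the raw HTML.
found_visible = ~isempty(strfind(html, visible_target));  % false
found_source  = ~isempty(strfind(html, source_target));   % true

% Keep the text after the first match and read the next n numbers from it.
n = 2;
start  = strfind(html, source_target);
tail   = html(start(1) + numel(source_target):end);
tokens = regexp(tail, '\d+\.?\d*', 'match');
vals   = str2double(tokens(1:min(n, numel(tokens))));     % [1001 25]
```

Running the same check against the saved source of a real page is a quick way to confirm that a target string is present, and present only once, before using it in a trend.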