Internet-Enabled Data Analysis and Visualization with MATLAB

By Peter Webb, MathWorks


Many organizations make their data publicly available on Internet servers and provide an interface for extracting slices or sections of that data.

Analyzing this kind of data with MATLAB was once a complex process: first retrieve the data somehow, then get it into MATLAB for analysis. Today, using the MATLAB interface to Java, you can write MATLAB programs that combine the data retrieval and analysis steps in a single M-file, because Java provides classes and methods that make it easy to capture data from Internet sources.

MATLAB programs for Internet data retrieval and analysis have a three-part pattern:

  • Open a specially formatted URL, creating a stream of data.
  • Read the data stream into MATLAB variables.
  • Analyze or visualize the data.

This pattern can be applied to data of any type. For simplicity, the example code presented below uses stock price data, which is easy to understand (if not predict!) and readily available from many public sources. To follow the example, you'll need to download the DisplayStockData program from MATLAB Central. The program requires Internet access because it obtains stock price data from an Internet data source.

Stock Market Data

Yahoo! maintains a database of historical stock price data. The Yahoo! server delivers this data either formatted for your browser to display or in the form of a comma-delimited file (more technically, a CSV, or comma-separated value, file). The server determines what kind of data, and how much of it, to serve by examining the query contained in the URL used to connect to it, and responds with a stream of data.

Since the query is just a text string, it is easy to duplicate. The query specifies a stock symbol, a start date, an end date, and a data frequency (daily, monthly, or yearly). Given this information, the example program, DisplayStockData, retrieves stock data for the indicated time period and draws two graphs: a candlestick graph of the stock's performance and a comparison of the stock's performance with that of the S&P 500.

For example, this command displays the monthly performance of Sears for the 27 months starting January 1, 2000:

DisplayStockData('S', '1/1/2000', 'm', 27);

The first graph uses the candle function from the Financial Toolbox to draw a candlestick graph of the stock's performance. If you don't have the Financial Toolbox, simply comment out the call to candle in DisplayStockData.m. This graph shows open, close, high, and low prices for the stock during the given trading interval (in this case, monthly averages). The program also displays the stock's compound annual growth rate in the chart's title.

Figure 1. Candlestick graph for a single stock.

The second graph compares the performance of the stock to the performance of the S&P 500 index. In this case, it is evident that Sears outperformed the index over the given period.

Figure 2. Comparison of stock performance and S&P 500 index performance.

Data Retrieval

GetStockData.m contains the code that retrieves the data from the Yahoo! server in four steps. First, connect to the server, using a specially formatted URL, by creating a Java URL object:

url = java.net.URL(urlString);
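
The query format itself is not reproduced here, so the following is only a sketch of how a urlString might be assembled from a symbol, a date range, and a data frequency. The host example.com and the parameter names symbol, from, to, and freq are hypothetical placeholders, not the actual Yahoo! query syntax; consult GetStockData.m for the format the program really uses.

```matlab
% Hypothetical endpoint and parameter names, for illustration only.
baseUrl   = 'http://example.com/table.csv';
symbol    = 'S';
freq      = 'm';          % monthly data
numMonths = 27;

% Serial date numbers for the start and end of the period. datenum
% rolls month values above 12 over into the following years.
startDate = datenum(2000, 1, 1);
endDate   = datenum(2000, 1 + numMonths, 1);

% Assemble the query as an ordinary text string.
urlString = sprintf('%s?symbol=%s&from=%s&to=%s&freq=%s', ...
    baseUrl, symbol, ...
    datestr(startDate, 'yyyy-mm-dd'), ...
    datestr(endDate, 'yyyy-mm-dd'), freq);
disp(urlString);
```

Because the query is plain text, changing the symbol, period, or frequency only requires changing the arguments passed to sprintf.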

Then open the connection and create a stream object to read the data that the server returns. Use the initial stream object to create a buffered I/O stream object, which lets you read an entire line at a time, rather than just a single character.

stream = openStream(url);
ireader = java.io.InputStreamReader(stream);
breader = java.io.BufferedReader(ireader);

Next, read all the available data into a MATLAB cell array. The Java readLine method returns null to signal the end of the data stream, and MATLAB receives that null as an empty array. To make parsing easier, make sure that each line ends with a comma. Note the use of the char function to convert a Java string to a MATLAB character array.

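% Collect each line of the response in a cell array.
stockdata = {};
while true
    line = readLine(breader);
    % At the end of the stream, line is empty.
    if isempty(line)
        break;
    end
    line = char(line);   % Java string -> MATLAB character array
    % Terminate every line with a comma to simplify parsing.
    if line(end) ~= ','
        line(end+1) = ',';
    end
    stockdata{end+1} = line;
end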
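
To experiment with this loop without a network connection, one option is to substitute a java.io.StringReader for the network stream. The sketch below does that with two made-up lines of data; the sample values and the variable names sample, lines, and nextLine are assumptions introduced for illustration and do not come from GetStockData.m.

```matlab
% Made-up sample lines standing in for the server's response.
sample = sprintf('%s\n', ...
    '1-Mar-02,52.58,54.00,51.00,51.28,1200000', ...
    '1-Feb-02,52.70,54.50,50.10,52.58,1500000');

% Wrap the text in a reader, as the article does for the URL stream.
breader = java.io.BufferedReader(java.io.StringReader(sample));

lines = {};
while true
    nextLine = readLine(breader);
    if isempty(nextLine)         % Java null arrives in MATLAB as []
        break;
    end
    nextLine = char(nextLine);   % ensure a MATLAB character array
    if nextLine(end) ~= ','
        nextLine(end+1) = ',';   % terminate each line with a comma
    end
    lines{end+1} = nextLine;
end
breader.close();

disp(numel(lines));
```

The variable is named nextLine here rather than line so that it does not shadow the MATLAB line function.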

Finally, use strread to parse the data in the cell array into five MATLAB variables. The first argument to strread is the string to read from; create this from the cell array of strings by concatenating all the rows into one long row. The second argument is a format string that describes the format of each data group in the string: a string, four floating-point numbers, and an integer, or six columns of data. strread returns one variable for each column of data that is not skipped. You can skip over a column by placing an asterisk (*) between the % and the character that indicates the type of data that the column contains. This call to strread skips over the final column, for example, by specifying %*n. The last four arguments to strread form two parameter/value pairs: the first indicates that the columns are delimited by commas, and the second that any empty entries in the data are to be filled with NaNs.

% Parse the string data into MATLAB variables.
stockdata = cat(2, stockdata{:});
[dates, open, high, low, close] = ...
    strread(stockdata, '%s%f%f%f%f%*n', 'delimiter', ',', ...
    'emptyvalue', NaN);
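
As a small offline illustration of the format string, the sketch below parses two made-up records, the second of which has an empty "high" field. The sample values and the output names opens, highs, lows, and closes are assumptions chosen for the example. Five outputs are requested, one for each column that the format string does not skip. Note that in more recent MATLAB releases strread is documented as not recommended in favor of textscan.

```matlab
% Two made-up records, each field followed by a comma.
% The second record has no value in the "high" column.
stockdata = ['1-Mar-02,52.58,54.00,51.00,51.28,1200000,' ...
             '1-Feb-02,52.70,,50.10,52.58,1500000,'];

% %s reads the date, each %f reads a price, and %*n reads
% and discards the final (volume) column.
[dates, opens, highs, lows, closes] = ...
    strread(stockdata, '%s%f%f%f%f%*n', 'delimiter', ',', ...
    'emptyvalue', NaN);

disp(dates);
disp([opens highs lows closes]);
```

The empty field is filled with NaN because of the 'emptyvalue' parameter, which keeps the rows of the numeric columns aligned.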

Analysis and Visualization

Before displaying the graphs, compute the growth rates of the user-specified stock and the S&P 500 (see the function periodGrowth in DisplayStockData.m). Then, a single command creates the first graph.
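
The periodGrowth function is not listed here, so the following is only a sketch of one common definition of a compound annual growth rate: the ratio of the last price to the first, raised to the power of one over the number of years, minus one. The variable names and price values are made up, and the function in the download may compute its result differently.

```matlab
% Made-up first and last closing prices over a 24-month period.
firstClose = 100;
lastClose  = 121;
numMonths  = 24;

% Compound annual growth rate: the constant yearly rate that would
% turn firstClose into lastClose over the period.
years = numMonths / 12;
cagr  = (lastClose / firstClose)^(1 / years) - 1;

fprintf('Compound annual growth rate: %.1f%%\n', 100 * cagr);
```

With these numbers the rate is 10% per year, since growing 100 by 10% twice gives 121.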

candle(high, low, close, open, 'red', date, 2);

The second graph is a nonstandard type, but can easily be constructed by calling line, legend, and dateaxis. The hold on/off commands ensure that the two lines appear on the same figure.

hold on
line(date500, close500, 'color', 'blue');
line(date, close, 'color', 'red');
hold off
legend('S&P 500', upper(symbol));
dateaxis('x', 12);

Calls to sprintf create the title strings for each of these charts.
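
As a minimal sketch of such a call, assuming a title that combines the stock symbol with its growth rate (the exact wording used in DisplayStockData.m may differ, and the growth value here is made up):

```matlab
symbol = 's';
growth = 0.10;   % made-up compound annual growth rate

% %s inserts the symbol, %.1f the rate with one decimal place,
% and %% a literal percent sign.
titleStr = sprintf('%s: %.1f%% compound annual growth', ...
    upper(symbol), 100 * growth);
disp(titleStr);
```

Passing titleStr to the title function would then label the current chart.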

Java and MATLAB Make a Powerful Pair

This example performs a fairly simple analysis of a small amount of data, but the techniques demonstrated here can be applied to far larger and more complex data sets. The union of Java and MATLAB creates a uniquely powerful environment. Java provides connectivity to an immense variety of data, and MATLAB provides the tools that create insights into that data. Download the example program from www.mathworks.com/nn_stock and start using these tools together to produce better analyses faster than you could with either MATLAB or Java alone.

Published 2002