Interacting with Graphed Data

Data Brushing with the Variable Editor

Shared variables in linked figures are highlighted in all figures when data in one is brushed. They also highlight when you open the variables in the Variable Editor.

The Variable Editor also has a Data Brushing tool. It has no Data Linking tool, however, because in the Variable Editor, variables are always "live," and their data sources therefore respond immediately to any changes you make in the Variable Editor. This means that whenever you place it in data brushing mode, brush marks and changes to data values you make in the Variable Editor appear in linked plots.

If you have linked plots of matrix data with observations across rows and where each column represents a distinct, related quantity, brushing any observation—whether in a graph or the Variable Editor—highlights all observations in the same row, as you can see in the next image.

For more information about using the Variable Editor, see Viewing and Editing Workspace Variables with the Variable Editor in the MATLAB® Desktop Tools and Development Environment documentation and see the reference page for openvar.
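
The row-wise brushing described above can be tried with a rough sketch like the following, which plots two columns of a small matrix in separate linked figures, turns on brushing, and opens the matrix in the Variable Editor. The matrix demo and the handles hfA and hfB are made up for illustration. Whether each figure links automatically depends on linkdata being able to identify the data sources.

```matlab
% Hypothetical demo matrix: rows are observations, columns are quantities
demo = [1 10 100; 2 20 200; 3 15 150; 4 30 120];

hfA = figure; plot(demo(:,1),demo(:,2),'o');   % column 2 versus column 1
hfB = figure; plot(demo(:,1),demo(:,3),'o');   % column 3 versus column 1

linkdata(hfA,'on');    % link each figure to its workspace data sources
linkdata(hfB,'on');
brush(hfA,'on');       % enter data brushing mode in the first figure
openvar('demo');       % view the same matrix in the Variable Editor
```

Brushing a point in either figure, or a row in the Variable Editor, should then highlight the corresponding row of demo everywhere it is displayed.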

Using Datatips to Explore Graphs

A datatip is a small display associated with an axes that reads out individual data observation values from a 2-D or 3-D graph. You create datatips by clicking graphs after selecting the Data Cursor tool from the figure toolbar. Selecting this tool puts you in data cursor mode, signified by a hollow cross-hair cursor, in which you identify the x-, y-, and z-values of data points you click. As with data points you brush, you can export such values to the workspace.

For descriptions of data cursor properties and how to use them, see Data Cursor — Displaying Data Values Interactively in the MATLAB Graphics documentation and see the reference page for datacursormode.

The default behavior of datatips is to simply display the XData, YData, and ZData values of the selected observations as text in a box. Sometimes this information is not helpful by itself, and you might want to replace or augment it with other information. You can modify this behavior to display other facts connected to observations. You customize datatip behavior by constructing a datatip text update function (in M-code) to construct text strings for display in datatips and then instructing data cursor mode to use your function instead of the default one.

Customize data cursor update functions to display information such as

You can create datatip text update functions to display such information and change their behavior on the fly. You can even make the update function behave differently for distinct observations in the same graph if your update function or the code calling it can distinguish groups of them. The next section contains an example of coding and using a customized data cursor update function.
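
As a minimal sketch of an update function that treats groups of observations differently, the following hypothetical callback shows only the x-value for points below a cutoff and both coordinates for the rest. The function name grouptips and the parameter xcut are invented for this illustration and are not part of MATLAB.

```matlab
function output_txt = grouptips(obj,event_obj,xcut)
% Hypothetical datatip update function (not part of MATLAB)
% obj          Currently not used
% event_obj    Handle to event object
% xcut         Assumed x threshold that separates two groups
% output_txt   Datatip text (string cell array)

pos = get(event_obj,'Position');
if pos(1) < xcut
    % Group 1: show only the x-value
    output_txt = {['X: ',num2str(pos(1),4)]};
else
    % Group 2: show both coordinates
    output_txt = {['X: ',num2str(pos(1),4)], ...
                  ['Y: ',num2str(pos(2),4)]};
end
```

If you save this as grouptips.m, a call such as set(datacursormode(gcf),'UpdateFcn',{@grouptips,50}) would install it for the current figure with a cutoff of 50.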

Example — Visually Exploring Demographic Statistics

The extended example that follows begins by using datatips to explore the incidence of fatal traffic accidents tabulated for U.S. states, with respect to state populations. The example extends this analysis to brush, link, and map the data to discover spatial patterns in the data. Each section of the example has four or fewer steps. By executing them all, you gain insight into the data set and become familiar with useful graphical data exploration techniques.

Censuses of population and other national government statistics are valuable sources of demographic and socioeconomic data. An important aspect of census data is its geography, i.e., the regions to which a given set of statistics applies, and at what level of granularity. When exploring census data, you frequently need to identify what geographic unit any given observation represents.

This example uses datatips to show place names and statistics for individual observations. You pass place names and the data matrix to a custom text update function to enable this. The place names are for U.S. states and the District of Columbia. If all these names were placed as labels on the x-axis, they would be too small or too crowded to be legible, but they are readable one at a time as datatips.

The example also illustrates how sorting a data matrix by rows can enhance interpretation when the original ordering (in this case alphabetical by state) provides no special insight into relationships among observations and variables.
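
The sorting idea can be sketched on a toy data set. The matrix data and the cell array labels below are hypothetical stand-ins for the real data matrix and its row labels. The point is that the index vector returned by sortrows must also be applied to the labels so that rows and names stay aligned.

```matlab
% Hypothetical 4-observation data set: column 1 = ID, column 2 = population
data   = [3 500; 1 120; 4 900; 2 300];
labels = {'C'; 'A'; 'D'; 'B'};

[sorted, idx] = sortrows(data, 2);   % sort rows by ascending population
sortedLabels  = labels(idx);         % reorder the labels the same way
```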

The Datatip Text Update Function

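Datatips can present other information beyond x-, y- and z-values. Read through the example function labeldtips, which takes three more parameters than a default callback and displays the y-value of the selected observation, its deviation from the expected value (as a difference and as a percentage), and the label (state name) for the observation.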

Because it customizes datatips, the function must be an M-file that you invoke from the Command Window or from a script.

function output_txt = labeldtips(obj,event_obj,...
                      xydata,labels,xymean)
% Display an observation's Y-data and label for a datatip
% obj          Currently not used (empty)
% event_obj    Handle to event object
% xydata       Entire data matrix
% labels       State names identifying matrix row
% xymean       Ratio of y to x mean (avg. for all obs.)
% output_txt   Datatip text (string or string cell array)
% This datacursor callback calculates a deviation from the
% expected value and displays it, Y, and a label taken
% from the cell array 'labels'; the data matrix is needed
% to determine the index of the x-value for looking up the
% label for that row. X values could be output, but are not.

pos = get(event_obj,'Position');
x = pos(1); y = pos(2);
output_txt = {['Y: ',num2str(y,4)]};
ydev = round((y - x*xymean));
ypct = round((100 * ydev) / (x*xymean));
output_txt{end+1} = ['Yobs-Yexp: ' num2str(ydev) ...
                     '; Pct. dev: ' num2str(ypct)];
idx = find(xydata == x,1);  % Find index to retrieve obs. name
% The find is reliable only if there are no duplicate x values
[row,col] = ind2sub(size(xydata),idx);
output_txt{end+1} = cell2mat(labels(row));

Copy this code into an M-file and save it as labeldtips.m in your working directory or somewhere on your MATLAB path.

To use this update function, first declare it as a callback in a data cursor object:

hdt = datacursormode;
set(hdt,'UpdateFcn',{@labeldtips,hwydata,statelabel,usmean})

hdt is the handle of a data cursor mode object for the figure you want to explore. Declare the callback as a cell array that contains the function handle followed by the additional arguments to pass to it. The call to datacursormode puts the current figure in data cursor mode.
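
The label lookup that labeldtips performs can be tried in isolation. In this sketch, xydata and labels are small hypothetical stand-ins for hwydata and statelabel, and x plays the role of the x-value reported by the datatip.

```matlab
% Hypothetical 3-observation stand-ins for hwydata and statelabel
xydata = [10 1; 20 2; 30 3];
labels = {'Alpha'; 'Beta'; 'Gamma'};

x = 20;                                  % x-value reported by the datatip
idx = find(xydata == x,1);               % first linear index whose value equals x
[row,col] = ind2sub(size(xydata),idx);   % convert to row and column subscripts
name = labels{row};                      % label for that observation
```

Because find scans the whole matrix in column order, a value equal to x in an earlier column, or a duplicate x, would return the wrong row. Restricting the comparison to the plotted column, for example find(xydata(:,1) == x,1), is one way to reduce that risk.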

Preparing, Plotting, and Annotating the Data

The following steps show how you load statistical data for U.S. states, plot some of it, and enter data cursor mode to explore the data:

  1. Load statistics for U.S. states from the National Highway Traffic Safety Administration and the Bureau of the Census and look at the variables:

    load 'accidents.mat'
    whos
      Name             Size            Bytes  Class  
    
      datasources       3x1              2568  cell  
      hwycols           1x1                 8  double
      hwydata          51x17             6936  double
      hwyheaders        1x17             1874  cell  
      hwyidx           51x1               408  double
      hwyrows           1x1                 8  double
      statelabel       51x1              3944  cell  
      ushwydata         1x17              136  double
      uslabel           1x1                86  cell  
    

    The data set has 51 observations for 17 variables.

  2. (Not required) To help you interpret graphs of the data, the data matrix and labels have been presorted by rows to be in ascending order of total state population. The 51-by-1 vector hwyidx contains indices from the presorting (the data were originally in alphabetic order).

    You should not carry out this step now, but if you ever want to re-sort the rows of the data array and state labels alphabetically, you could do the following:

    [hwydata hwyidx] = sortrows(hwydata,1);
    statelabel = statelabel(hwyidx);
    

    (The first column of the hwydata matrix contains Census Bureau state IDs that ascend in alphabetical order.)

  3. Plot a line graph of the population by state as x versus the number of traffic fatalities per state as y:

    hf1 = figure;
    plot(hwydata(:,14),hwydata(:,4));
    xlabel(hwyheaders(14))
    ylabel(hwyheaders(4))

    Because the state observations are sorted by population size, the graph is monotonic in x. The larger a population a state has, the more variation in traffic accident fatalities it tends to show.

  4. Compute the per capita rate of traffic fatalities for the entire United States; in the next part of this example, the data cursor update function uses this average to compute an expected value for each state you query:

    usmean = ushwydata(4)/ushwydata(14)
    
    usmean =
      1.5150e-004
    

    The statistic shows that nationally, about 15 per 100,000 people die in traffic accidents every year.

    Use usmean to compute the smallest and largest expected values by multiplying it by the smallest and largest state populations, and draw a line connecting them:

    line([min(hwydata(:,14)) max(hwydata(:,14))],...
         [min(hwydata(:,14))*usmean max(hwydata(:,14))*usmean],...
          'Color','m');
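
The per capita rate can be converted to a rate per 100,000 people and to an expected count with simple arithmetic. The sketch below reuses the usmean value displayed above. The population popdemo is a hypothetical figure chosen only to show the calculation.

```matlab
usmean   = 1.5150e-4;         % per capita rate displayed above
per100k  = usmean * 1e5;      % fatalities per 100,000 people (15.15)
popdemo  = 2e6;               % hypothetical state population
expected = popdemo * usmean;  % expected fatalities for that population (303)
```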
    

Explore the Graph with the Custom Data Cursor

You can now explore the graphed data with the example custom data cursor update function labeldtips (which must be on the MATLAB path or in the current directory). labeldtips displays state names and y-deviations.

  1. Turn on data cursor mode and invoke the custom callback:

    hdt = datacursormode;
    set(hdt,'DisplayStyle','window');
    % Declare a custom datatip update function to display state names
    set(hdt,'UpdateFcn',{@labeldtips,hwydata,statelabel,usmean})

    The data cursor 'window' display style sends datatip output to a small window that you can move anywhere within the figure. This display style is best suited to datatips that contain more text than just x-, y-, and z-values. The labeldtips callback remains active for that figure until you use set to replace it with another function (or empty, to restore the default data cursor behavior). Click the right-most point on the blue graph.

    The datatip shows that California has the largest population and the largest number of traffic fatalities, 4120. However, it had 1012, or 20%, fewer fatalities than predicted by the national average.

  2. The next data point to the left depicts Texas. Click that data point or press the left arrow to show its datatip.

    Texas had 3583 fatalities, which is 424 (13%) more than the expected value. To see results from other states, move the datatip by dragging the black square or using the left or right arrow to step it along the graph. If you know a little about U.S. geography, you might observe a pattern.
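
The percentages quoted in these datatips can be checked by hand. This sketch reconstructs the expected values from the observed counts and deviations reported above. It works from the rounded numbers shown in the datatips, so it approximates what labeldtips computes and does not call it.

```matlab
y    = [4120 3583];              % observed fatalities (California, Texas)
ydev = [-1012 424];              % deviations reported in the datatips
yexp = y - ydev;                 % expected values: [5132 3159]
ypct = round(100*ydev./yexp);    % percent deviations: [-20 13]
```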

Plot and Link a Histogram of a Related Variable

The ninth column of hwydata, labeled "Fatalities per 100K Licensed Drivers," is related to population. To see which states have fewer or more fatalities per driver, plot a histogram of this variable, link both plots to their data, and brush either of them.

  1. Open a new figure and plot a histogram of Fatalities per 100K Licensed Drivers in it:

    hf2 = figure;
    hist(hwydata(:,9),5)
    xlabel(hwyheaders(9))
    
  2. Link both the line graph and the histogram to their data sources in hwydata:

    linkdata(hf1)
    linkdata(hf2)

    You can also click the Data Linking tool on the two figures. The first figure links automatically; the histogram does not because linkdata cannot determine with certainty the YDataSource for histograms. The Linked Plot information bar on top of the histogram displays the message "No Graphics have data sources. Cannot link plot: fix it."

  3. Click fix it to open the Specify Data Source Properties dialog box. Type hwydata(:,9) into the YDataSource edit box and click OK.

    The Linked Plot information bar displays the data source you identified. The histogram looks like this.
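
The role of a data source property can be sketched with an ordinary line plot, which is simpler than the histogram case. Here yvals, hfig, and hline are hypothetical names, and refreshdata stands in for the automatic updating that a linked plot performs.

```matlab
yvals = [2 4 1 3];                 % hypothetical workspace variable
hfig  = figure;
hline = plot(yvals);
set(hline,'YDataSource','yvals');  % name the variable that supplies YData
yvals(2) = 10;                     % change the data in the workspace
refreshdata(hfig,'caller');        % re-evaluate the data source and redraw
```

After the call to refreshdata, the line shows the modified values. With data linking turned on, a plot that has valid data sources is updated in the same way without an explicit refresh.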

Explore the Linked Graphs with Data Brushing

Now that you have linked both graphs to a common data set, you can brush portions of one to see the effect on the other.

  1. It isn't necessary, but you might want to dock the plots in a figure group so you can see them side by side.

  2. Select the Data Brushing tool on the histogram plot. Brush the three right-most bars in the histogram; they represent higher values that range from 25 to 48 fatalities per 100,000 drivers.

    Notice which observations light up on the line graph. Not only are these states with smaller populations, they are also states with above-average numbers of traffic fatalities.

  3. Click the line graph to make it the active figure and select its Data Brushing tool. Click all the observations you can that fall below the straight line average. You need to hold the Shift key down to make multiple selections, whether by clicking or dragging. You might want to zoom in on the left side of the graph to brush properly there. What do you see happening on the histogram?

Plot the Observations on a Linked Map

The hwydata matrix contains geographic location information in the form of latitude-longitude coordinates of a centroid for each state. You can make a crude map by generating a scatter plot of these coordinates, using longitude as x and latitude as y. If you link the scatter plot, you can brush all the plots at once.

  1. To provide a context for the map, plot an outline map of the conterminous United States. Obtain the latitude and longitude coordinates required from the demo MAT-file usapolygon.mat:

    hf3 = figure;
    load usapolygon
    patch(uslon,uslat,[1 .9 .8],'EdgeColor','none');
    hold on
    

    When projected into the figure, the map is distorted to fit the aspect ratio of the axes.

  2. Map the centroid longitude and latitude as a scatter plot with filled circles. Plot a rectangle over part of the map, as follows:

    scatter(hwydata(:,2),hwydata(:,3),36,'b','filled');
    xlabel('Longitude')
    ylabel('Latitude')
    rectangle('Position',[-115,25,115-77,36-25],...
        'EdgeColor',[.75 .75 .75])

    The x- and y-limits change, shrinking the map, because the data matrix contains observations for Alaska and Hawaii, but the map outline file does not include these states.

  3. Dock the map underneath the other two figures. Brush the map after turning on the Data Linking and Data Brushing tools for its figure. Drag across the gray rectangle with the Data Brushing tool to highlight just the southeastern and southwestern states. What you see should look like this.

    Data brushing and linking reveal that almost all the states with above-average traffic fatality rates are in the southern part of the U.S.
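
The region covered by the gray rectangle can also be expressed as a logical test on the coordinates. The sketch below uses the same Position vector as the rectangle call above. The lon and lat values are hypothetical centroids that stand in for columns 2 and 3 of hwydata.

```matlab
rectpos  = [-115 25 115-77 36-25];             % [x y width height] of the rectangle
lonrange = [rectpos(1) rectpos(1)+rectpos(3)]; % [-115 -77]
latrange = [rectpos(2) rectpos(2)+rectpos(4)]; % [25 36]

lon = [-119.4 -99.3 -86.8 -71.5];              % hypothetical centroid longitudes
lat = [  37.2  31.5  32.8  42.0];              % hypothetical centroid latitudes

% True for centroids that fall inside the rectangle
inbox = lon >= lonrange(1) & lon <= lonrange(2) & ...
        lat >= latrange(1) & lat <= latrange(2);
```

With the real data, an index such as inbox could select the same rows that you brush by dragging across the rectangle.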

Using graphic data exploration, you have identified some intriguing regularities in this data. However, you have not identified any causes for the patterns you found. That will take more work with the data, and possibly additional data sets, along with some hypotheses and models.

  


 © 1984-2008 The MathWorks, Inc.