Data Brushing with the Variables Editor
To brush data in the Variables editor, link the figure windows associated with the variable. Then right-click a cell in the Variables editor and select Brushing > Brushing on in the context menu. Select one or more cells to brush elements of the variable; the corresponding points on your plots highlight simultaneously.
You can brush observations that appear in multiple linked plots at the same time, but only when your observations are in a matrix whose plot variables run along separate columns. For example, suppose a matrix called data contains system response measurements at 50 different (x, y) points. The first column, data(:,1), contains the x-coordinates, data(:,2) contains the y-coordinates, and data(:,3) contains the measured response at each point. You can create two separate plots of these observations: the left plot below shows the response versus x, and the plot on the right shows the response versus y. If you brush a point in one plot, the corresponding point in the other plot highlights at the same time. Furthermore, if you have the Variables editor open, the corresponding data row is highlighted whenever you brush a point.
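The two-plot setup can be sketched as follows. This is a minimal illustration, assuming a hypothetical 50-by-3 matrix named data filled with random values rather than real measurements. Both plots draw from columns of the same workspace variable, so once both figures are linked, brushing in one view highlights the same rows in the other view and in the Variables editor.

```matlab
% Hypothetical data: 50 (x, y) points and a made-up response at each
rng(0);                              % reproducible random numbers
x = 10*rand(50,1);
y = 10*rand(50,1);
data = [x, y, x + 2*y + randn(50,1)];

% Response versus x in one figure
f1 = figure;
plot(data(:,1), data(:,3), 'o');
xlabel('x'); ylabel('Response');
linkdata(f1, 'on');                  % link the plot to the variable data

% Response versus y in a second figure
f2 = figure;
plot(data(:,2), data(:,3), 'o');
xlabel('y'); ylabel('Response');
linkdata(f2, 'on');

% Open the Variables editor on data and turn on brushing in the first figure
openvar('data');
brush(f1, 'on');
```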
For more information about using the Variables editor, see the openvar reference page.
A data tip is a small display associated with an axes that reads out individual data observation values from a 2-D or 3-D graph. You create data tips by clicking graphs with the Data Cursor tool from the figure toolbar. When you select this tool, you are in data cursor mode (signified by a hollow cross-hair cursor), in which you identify the x-, y-, and z-values of the data points you click. As with data points you brush, you can export these values to the workspace.
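Data tip values can also be retrieved from the command line. The sketch below is a minimal illustration with hypothetical plotted values: it enables data cursor mode for a figure, then uses getCursorInfo, a method of the data cursor mode object, which returns a struct array whose Position field holds the coordinates of each data tip you have placed.

```matlab
% Plot some hypothetical values
f = figure;
p = plot(1:10, (1:10).^2, 'o-');

% Put the figure in data cursor mode, as the Data Cursor tool does
dcm = datacursormode(f);
set(dcm, 'Enable', 'on');

% After clicking one or more points, collect the selected coordinates
info = getCursorInfo(dcm);           % one struct element per data tip
if ~isempty(info)
    selected = vertcat(info.Position);   % each row is [x y] (or [x y z])
    disp(selected)
end
```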
For descriptions of data cursor properties and how to use them, see
Data Cursor — Displaying Data Values Interactively and Using Data Cursors with Histograms in the MATLAB® Graphics documentation
The MATLAB function reference page for datacursormode
By default, a data tip simply displays the XData, YData, and ZData values of the selected observation as text in a box. This information is sometimes not helpful by itself, and you might want to replace or augment it with other facts connected to the observation. To customize data tip behavior, write a data tip text update function in MATLAB code that constructs the text strings to display, and then instruct data cursor mode to use your function instead of the default one.
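The mechanism can be sketched with a minimal example on hypothetical values. It assumes an anonymous function as the text update function (a function file such as labeldtips.m works the same way). The function receives obj and event_obj, reads the clicked point from event_obj.Position, and returns a cell array of strings, one per line of the tip.

```matlab
% Hypothetical values to explore
xv = 1:5;
yv = [2 4 3 5 1];
ymean = mean(yv);                    % captured by the function below

% Text update function: obj is unused; evt.Position holds [x y] (or [x y z])
simpleTip = @(obj, evt) { ...
    sprintf('X: %g', evt.Position(1)), ...
    sprintf('Y: %g', evt.Position(2)), ...
    sprintf('Y - mean(Y): %g', evt.Position(2) - ymean)};

f = figure;
plot(xv, yv, 's-');
dcm = datacursormode(f);
% Use simpleTip instead of the default X/Y readout
set(dcm, 'Enable', 'on', 'UpdateFcn', simpleTip);
```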
You can write data cursor update functions that display information such as:
Names associated with x-, y-, and z-values
Weights associated with x-, y-, and z-values
Differences in x-, y-, and z-values from the mean or their neighbors
Transformations of values (e.g., normalizations or conversions to different units of measure)
Related variables
You can create data tip text update functions to display such information and change their behavior on the fly. You can even make the update function behave differently for distinct observations in the same graph if your update function or the code calling it can distinguish groups of them. The next section contains an example of coding and using a customized data cursor update function.
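One way to make the text depend on the group an observation belongs to is sketched below. It assumes each group is plotted as its own line with a DisplayName; the Target field of the event object is the graphics object that was clicked, so the function can read that line's name. Setting UpdateFcn to empty restores the default text.

```matlab
% Two hypothetical groups of observations, each plotted as its own line
f = figure;
hA = plot(1:4, [1 3 2 4], 'o-', 'DisplayName', 'Group A');
hold on
hB = plot(1:4, [2 2 5 3], 's-', 'DisplayName', 'Group B');
hold off

% evt.Target is the clicked line, so the first text line names its group
groupTip = @(obj, evt) { ...
    get(evt.Target, 'DisplayName'), ...
    sprintf('X: %g, Y: %g', evt.Position(1), evt.Position(2))};

dcm = datacursormode(f);
set(dcm, 'Enable', 'on', 'UpdateFcn', groupTip);

% To change behavior on the fly, assign a different function, or restore
% the default data tip text with:
% set(dcm, 'UpdateFcn', []);
```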
The extended example that follows begins by using data tips to explore the incidence of fatal traffic accidents tabulated for U.S. states, with respect to state populations. The example extends this analysis to brush, link, and map the data to discover spatial patterns in the data. Each section of the example has four or fewer steps. By executing them all, you gain insight into the data set and become familiar with useful graphical data exploration techniques.
Censuses of population and other national government statistics are valuable sources of demographic and socioeconomic data. An important aspect of census data is its geography, i.e., the regions to which a given set of statistics applies, and at what level of granularity. When exploring census data, you frequently need to identify what geographic unit any given observation represents.
This example uses data tips to show place names and statistics for individual observations. You pass place names and the data matrix to a custom text update function to enable this. The place names are for U.S. states and the District of Columbia. If all these names were placed as labels on the x-axis, they would be too small or too crowded to be legible, but they are readable one at a time as data tips.
The example also illustrates how sorting a data matrix by rows can enhance interpretation when the original ordering (in this case alphabetical by state) provides no special insight into relationships among observations and variables.
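Sorting a matrix by one column while keeping its row labels aligned can be sketched with a small hypothetical matrix: the second output of sortrows gives the row permutation, which you apply to the label array as well. Sorting again on an ID column that follows the original order restores that order.

```matlab
% Hypothetical data: column 1 = ID (alphabetical order), column 2 = population
labels = {'Alpha'; 'Bravo'; 'Charlie'; 'Delta'};
M = [1 500; 2 120; 3 900; 4 300];

% Sort rows by population and reorder the labels with the same indices
[Msorted, idx] = sortrows(M, 2);
labelsSorted = labels(idx);          % {'Bravo'; 'Delta'; 'Alpha'; 'Charlie'}

% Sorting on the ID column restores the original (alphabetical) order
[Mback, idxBack] = sortrows(Msorted, 1);
labelsBack = labelsSorted(idxBack);
```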
Data tips can present other information beyond x-, y- and z-values. Read through the example function labeldtips, which takes three more parameters than a default callback, and displays the following information:
The observation's y-value
Deviation from an expected y-value
Percent deviation from the expected y-value
The observation's label (state name)
Because it customizes data tips, the function must be a code file that you invoke from the Command Window or from a script. This file, labeldtips.m, and the MAT-files accidents.mat and usapolygon.mat that the following examples also use, exist on the MATLAB path. Here is the code for the labeldtips data cursor callback function.
function output_txt = labeldtips(obj,event_obj,...
                                 xydata,labels,xymean)
% Display an observation's Y-data and label for a data tip
% obj          Currently not used (empty)
% event_obj    Handle to event object
% xydata       Entire data matrix
% labels       State names identifying matrix row
% xymean       Ratio of y to x mean (avg. for all obs.)
% output_txt   Datatip text (string or string cell array)
% This datacursor callback calculates a deviation from the
% expected value and displays it, Y, and a label taken
% from the cell array 'labels'; the data matrix is needed
% to determine the index of the x-value for looking up the
% label for that row. X values could be output, but are not.

pos = get(event_obj,'Position');
x = pos(1); y = pos(2);
output_txt = {['Y: ',num2str(y,4)]};
ydev = round((y - x*xymean));
ypct = round((100 * ydev) / (x*xymean));
output_txt{end+1} = ['Yobs-Yexp: ' num2str(ydev) ...
                     '; Pct. dev: ' num2str(ypct)];
idx = find(xydata == x,1);  % Find index to retrieve obs. name
% The find is reliable only if there are no duplicate x values
[row,col] = ind2sub(size(xydata),idx);
output_txt{end+1} = cell2mat(labels(row));
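The comment in labeldtips notes that the row lookup is reliable only without duplicate x values. Because find(xydata == x,1) scans every column of the matrix, a matching value in an unrelated column can also select the wrong row. The sketch below demonstrates that pitfall on a hypothetical two-row matrix laid out as [id, x, y], and shows a lookup restricted to the plotted columns. Using this inside labeldtips would require passing the column indices as extra arguments, which is an assumption and not part of the shipped function.

```matlab
% Hypothetical matrix laid out as [id, x, y]; the x value 5 also
% appears in the id column of a different row
xydata = [  5 300 10;
          300   5 40];
x = 5; y = 40;                       % the point actually clicked (row 2)

% Lookup as in labeldtips: scans every column in column-major order
idx = find(xydata == x, 1);
[rowAny, ~] = ind2sub(size(xydata), idx);   % row 1, the wrong row

% Matching only the plotted x and y columns finds the intended row
xcol = 2; ycol = 3;                  % hypothetical column indices
rowXY = find(xydata(:,xcol) == x & xydata(:,ycol) == y, 1);
```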
The portion of the example called Explore the Graph with the Custom Data Cursor sets up data cursor mode and declares this function as a callback using the following code:
hdt = datacursormode;
set(hdt,'UpdateFcn',{@labeldtips,hwydata,statelabel,usmean})
The call to datacursormode puts the current figure in data cursor mode, and hdt is the handle of the data cursor mode object for the figure you want to explore. The UpdateFcn value is a cell array that holds a handle to the function followed by its three additional arguments; data cursor mode supplies obj and event_obj itself.
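The cell-array calling convention can be imitated outside data cursor mode to check a callback before attaching it. This is a minimal sketch: showScaled and fakeEvent are hypothetical, and the struct only stands in for the real event object, which is an object rather than a struct. The call passes obj and event_obj first, followed by the remaining cells in order.

```matlab
% Hypothetical callback with two extra arguments after obj and event_obj
showScaled = @(obj, evt, scale, unit) ...
    sprintf('Y: %g %s', scale * evt.Position(2), unit);

cb = {showScaled, 1000, 'mm'};       % same form as {@labeldtips, ...}

% Imitate the call: first cell is the function; obj and event_obj come
% first, then the remaining cells as extra arguments
fakeEvent = struct('Position', [2 0.25]);
txt = feval(cb{1}, [], fakeEvent, cb{2:end});
disp(txt)
```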
The following steps show how you load statistical data for U.S. states, plot some of it, and enter data cursor mode to explore the data:
Note: To help you interpret graphs created in this example, the hwydata data matrix and its row labels have been presorted by rows in ascending order of total state population. The 51-by-1 vector hwyidx contains the indices from the presorting (the data were originally in alphabetical order). If you want to re-sort the data array and state labels alphabetically, sort on the first column of the hwydata matrix, which contains Census Bureau state IDs that ascend in alphabetical order:
[hwydata hwyidx] = sortrows(hwydata,1);
statelabel = statelabel(hwyidx);
If you do re-sort the data, the graph is easier to interpret if you plot it with markers rather than lines. To do this, change the call to plot in the line-graph step below to the following:
plot(hwydata(:,14),hwydata(:,4),'.')
Load statistics for U.S. states from the National Highway Traffic Safety Administration and the Bureau of the Census, and look at the variables:
load 'accidents.mat'
whos
  Name            Size     Bytes  Class
  datasources     3x1       2568  cell
  hwycols         1x1          8  double
  hwydata        51x17      6936  double
  hwyheaders      1x17      1874  cell
  hwyidx         51x1        408  double
  hwyrows         1x1          8  double
  statelabel     51x1       3944  cell
  ushwydata       1x17       136  double
  uslabel         1x1         86  cell
The data set has 51 observations for 17 variables.
The state-by-state statistics; the double 51-by-17 matrix hwydata
The variable (column) names; the 1-by-17 text cell array hwyheaders
The state names; the 51-by-1 text cell array statelabel
Values for the entire United States for the 17 variables; the 1-by-17 matrix ushwydata
The label for the US values; the 1-by-1 cell array uslabel
Metadata describing data sources; the 3-by-1 cell array datasources
Plot a line graph of the population by state as x versus the number of traffic fatalities per state as y:
hf1 = figure;
plot(hwydata(:,14),hwydata(:,4));
xlabel(hwyheaders(14))
ylabel(hwyheaders(4))
Because the state observations are sorted by population size, the graph is monotonic in x. The larger a population a state has, the more variation in traffic accident fatalities it tends to show.
Compute the per capita rate of traffic fatalities for the entire United States; in the next part of this example, the data cursor update function uses this average to compute an expected value for each state you query:
usmean = ushwydata(4)/ushwydata(14)
usmean =
  1.5150e-004
The statistic shows that nationally, about 15 of every 100,000 people die in traffic accidents each year.
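The arithmetic behind that statement, and behind the expected values the trend line uses, is a single multiplication. The snippet below restates it; the population of 1,000,000 is hypothetical.

```matlab
usmean = 1.5150e-004;        % national fatalities per resident, from above
per100k = usmean * 1e5       % about 15.15 fatalities per 100,000 residents
expected = usmean * 1e6      % about 151.5 for a hypothetical state of 1,000,000
```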
Use usmean to compute the smallest and largest expected values by multiplying it by the smallest and largest state populations, and draw a line connecting them:
line([min(hwydata(:,14)) max(hwydata(:,14))],...
     [min(hwydata(:,14))*usmean max(hwydata(:,14))*usmean],...
     'Color','m');
Note: The magenta line is not a regression line; it is a trend line that plots the number of traffic deaths a state of a given size would have if its fatality rate equaled the national average.
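To make the distinction concrete, here is a minimal sketch with hypothetical populations and fatality counts. The trend line is fixed by the national rate and passes through the origin, whereas a least-squares line from polyfit is fitted to the observations themselves (its residuals sum to zero).

```matlab
% Hypothetical populations (x) and fatalities (y)
x = [1e6 3e6 5e6 8e6];
y = [200 380 900 1100];
rate = 1.5150e-004;                  % national rate, playing the role of usmean

trendY = rate * x;                   % national-average trend line
p = polyfit(x, y, 1);                % least-squares line: p(1)*x + p(2)
fitY = polyval(p, x);
```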
You can now explore the graphed data with the example custom data cursor update function labeldtips (which must be on the MATLAB path or in the current folder). labeldtips displays state names and y-deviations.
Turn on data cursor mode and invoke the custom callback:
hdt = datacursormode;
set(hdt,'DisplayStyle','window');
% Declare a custom datatip update function
% to display state names:
set(hdt,'UpdateFcn',{@labeldtips,hwydata,statelabel,usmean})
The data cursor 'window' display style sends data tip output to a small window that you can move anywhere within the figure. This display style is best suited to data tips that contain more text than just x-, y-, and z-values. The labeldtips callback remains active for that figure until you use set to replace it with another function (or with an empty value, [], to restore the default data cursor behavior). Click the right-most point on the blue graph.
The data tip shows that California has the largest population and the largest number of traffic fatalities, 4120. However, it had 1012, or 20%, fewer fatalities than predicted by the national average.
The next data point to the left depicts Texas. Click that data point or press the left arrow key to show its data tip.
Texas had 3583 fatalities, which is 424 (13%) more than the expected value. To see results from other states, move the data tip by dragging the black square or by pressing the left or right arrow key to step it along the graph. If you know a little about U.S. geography, you might observe a pattern.
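The percentages in the California and Texas data tips follow from the same formula labeldtips uses. The check below starts from the figures quoted above; because labeldtips rounds the deviation before computing the percentage, the implied expected values are accurate only to within half a fatality.

```matlab
% Figures read from the California and Texas data tips above
obs  = [4120; 3583];                 % observed fatalities
dev  = [-1012; 424];                 % Yobs - Yexp shown in the tips
expd = obs - dev;                    % implied expected values: 5132 and 3159
pct  = round(100 * dev ./ expd)      % -20 and 13, matching the tips
```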
The ninth column of hwydata, labeled "Fatalities per 100K Licensed Drivers," is related to population. Plot a histogram of this variable to see which states have fewer or more fatalities per driver. To do this, link the plots to their data, and brush either of them.
Open a new figure and plot a histogram of Fatalities per 100K Licensed Drivers in it:
hf2 = figure
hist(hwydata(:,9),5)
xlabel(hwyheaders(9))
Link both the line graph and the histogram to their data sources in hwydata:
linkdata(hf1)
linkdata(hf2)
You can also click the Data Linking tool on the two figures. The first figure links automatically; the histogram does not, because linkdata cannot determine the YDataSource for histograms with certainty. The Linked Plot information bar at the top of the histogram displays the message "No Graphics have data sources. Cannot link plot: fix it."
Click fix it to open the Specify Data Source Properties dialog box. Type hwydata(:,9) into the YDataSource edit box and click OK.
The Linked Plot information bar displays the data source you identified. The histogram looks like this.
Now that you have linked both graphs to a common data set, you can brush portions of one to see the effect on the other.
It isn't necessary, but you might want to dock the plots in a figure group so you can see them side by side.
Select the Data Brushing tool on the histogram plot. Brush the three right-most bars in the histogram; they represent higher values that range from 25 to 48 fatalities per 100,000 drivers.
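Brushing the right-most bars selects the observations whose values fall in those bins. The sketch below reproduces that selection on hypothetical values: hist(v,5) uses five equal-width bins spanning min(v) to max(v), so the three right-most bars cover everything at or above the third bin edge.

```matlab
% Hypothetical fatality rates for ten observations
v = [9.7 12 14 15.3 18 21 26 30 41 48];

% Five equal-width bins from min(v) to max(v), as hist(v,5) uses
edges = linspace(min(v), max(v), 6);
counts = hist(v, 5);

% Observations in the three right-most bars, like brushing them
inTop3 = v >= edges(3);
```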
Notice which observations light up on the line graph. Not only are these states with smaller populations, they are also states with above-average numbers of traffic fatalities.
Click the line graph to make it the active figure and select its Data Brushing tool. Click all the observations you can that fall below the magenta trend line. Hold down the Shift key to make multiple selections, whether by clicking or dragging. You might want to zoom in on the left side of the graph to brush properly there. What do you see happening on the histogram?
The hwydata matrix contains geographic location information in the form of latitude-longitude coordinates of a centroid for each state. You can make a crude map by generating a scatter plot of these coordinates, using longitude as x and latitude as y. If you link the scatter plot, you can brush all the plots at once.
To provide a context for the map, plot an outline map of the conterminous United States. Obtain the required latitude and longitude coordinates from the MAT-file usapolygon.mat:
hf3 = figure;
load usapolygon
patch(uslon,uslat,[1 .9 .8],'Edgecolor','none');
hold on
When projected into the figure, the map is distorted to fit the aspect ratio of the axes.
Map the centroid longitude and latitude as a scatter plot with filled circles. Plot a rectangle over part of the map, as follows:
scatter(hwydata(:,2),hwydata(:,3),36,'b','filled');
xlabel('Longitude')
ylabel('Latitude')
rectangle('Position',[-115,25,115-77,36-25],...
          'EdgeColor',[.75 .75 .75])
The x- and y-limits change, shrinking the map, because the data matrix contains observations for Alaska and Hawaii, but the map outline file does not include these states.
Dock the map underneath the other two figures. Brush the map after turning on the Data Linking and Data Brushing tools for its figure. Drag across the gray rectangle with the Data Brushing tool to highlight just the southeastern and southwestern states. What you see should look like this.
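The rectangle is only a visual guide; what you brush is whatever your drag covers. Its Position vector has the form [x y width height], so it spans longitudes -115 to -77 and latitudes 25 to 36. The sketch below turns those limits into a logical selection over three hypothetical centroid points, approximating what a drag along the rectangle's edges would brush.

```matlab
% rectangle('Position',[x y w h]) from the step above
pos = [-115, 25, 115-77, 36-25];
lonRange = [pos(1), pos(1) + pos(3)];    % [-115, -77]
latRange = [pos(2), pos(2) + pos(4)];    % [25, 36]

% Three hypothetical centroid points (longitude, latitude)
lon = [-119.4; -99.9; -74.5];
lat = [  36.8;  31.0;  40.1];

% Points inside the rectangle
inBox = lon >= lonRange(1) & lon <= lonRange(2) & ...
        lat >= latRange(1) & lat <= latRange(2);
```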
Data brushing and linking reveal that almost all the states with above-average traffic fatality rates are in the southern part of the U.S.
Using graphic data exploration, you have identified some intriguing regularities in this data. However, you have not identified any causes for the patterns you found. That will take more work with the data, and possibly additional data sets, along with some hypotheses and models.