To brush data in the Variables editor, link the figure windows
associated with the variable. Then right-click a cell in the Variables
editor and select **Brushing** > **Brushing on** in the context
menu. Select one or more cells to brush elements in the variable.
The corresponding points on your plots highlight simultaneously.

You can brush observations that appear in multiple linked plots
at the same time. You can do this only when your observations are
in a matrix with the plot variables running along separate columns.
For example, you can create two separate plots of observations in
a matrix called `data`, which contains system response
measurements at 50 different (*x*, *y*)
points. The first column, `data(:,1)`, contains the *x*-coordinates,
`data(:,2)` contains the *y*-coordinates,
and `data(:,3)` contains the measured response at
each point. The left plot shows the response versus *x*.
The plot on the right shows the response versus *y*.
If you brush a point in one plot, the corresponding point in the other
plot highlights at the same time. Furthermore, if you have the Variables
editor open, the corresponding data row is highlighted whenever you
brush a point.
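The setup described above can be sketched as follows. This is a minimal illustration, not the original example: the contents of `data` are made up, and the data sources are set explicitly so that linking does not depend on automatic detection. Because data sources are evaluated in the base workspace, run it as a script or from the Command Window.

```matlab
% Hypothetical stand-in for the 50-by-3 matrix described above.
rng(0)                                  % reproducible random points
data = rand(50,2);                      % columns 1-2: x- and y-coordinates
data(:,3) = sin(2*pi*data(:,1)) + data(:,2).^2;   % column 3: made-up response

fig = figure;
subplot(1,2,1)                          % left plot: response versus x
plot(data(:,1), data(:,3), '.', ...
    'XDataSource', 'data(:,1)', 'YDataSource', 'data(:,3)')
xlabel('x'), ylabel('response')
subplot(1,2,2)                          % right plot: response versus y
plot(data(:,2), data(:,3), '.', ...
    'XDataSource', 'data(:,2)', 'YDataSource', 'data(:,3)')
xlabel('y'), ylabel('response')

linkdata(fig, 'on')   % link the plots to the workspace variable
brush(fig, 'on')      % enable brushing in this figure
```

Brushing a point in either plot should then highlight the matching row of `data` in the other plot.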

For more information about using the Variables editor, see the `openvar`
reference page.

A data tip is a small display associated with an axes that reads
out individual data observation values from a 2-D or 3-D graph. You
create data tips by clicking graphs with the **Data
Cursor tool** from the figure toolbar.
When you select this tool, you are in data cursor mode, signified
by a hollow cross-hair cursor, in which you identify *x*-, *y*-,
and *z*-values of data points you click. As with data
points you brush, you can export such values to the workspace.

For descriptions of data cursor properties and how to use them, see:

- Display Data Values Interactively and Data Cursors with Histograms
- The MATLAB® function reference page for `datacursormode`

The default behavior of data tips is to simply display the `XData`, `YData`,
and `ZData` values of the selected observations
as text in a box. Sometimes this information is not helpful by itself,
and you might want to replace or augment it with other information.
You can modify this behavior to display other facts connected to observations.
You customize data tip behavior by constructing a data tip text update
function (in MATLAB code) to construct text for display in data
tips and then instructing data cursor mode to use your function instead
of the default one.
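As a minimal sketch of this mechanism (not the example used later on this page), an update function can be supplied as an anonymous function that receives the two standard callback arguments and returns the text to display. The sample data and the name `tipfcn` are hypothetical.

```matlab
% Hypothetical sample data (not from the original example)
x = 1:10;
y = [3 5 4 8 7 9 12 10 15 14];
ybar = mean(y);                         % 8.7 for this sample

% Update function: called with (obj, event_obj); returns a cell array
% of character vectors, one per line of data tip text.
tipfcn = @(~, evt) { ...
    sprintf('Y: %g', evt.Position(2)), ...
    sprintf('Y - mean(Y): %+g', evt.Position(2) - ybar)};

fig = figure;
plot(x, y, 'o-')
dcm = datacursormode(fig);              % data cursor mode object for fig
set(dcm, 'UpdateFcn', tipfcn, 'Enable', 'on')
```

Clicking a point then shows its *y*-value and its deviation from the sample mean instead of the default text.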

Customize data cursor update functions to display information such as:

- Names associated with *x*-, *y*-, and *z*-values
- Weights associated with *x*-, *y*-, and *z*-values
- Differences in *x*-, *y*-, and *z*-values from the mean or their neighbors
- Transformations of values (e.g., normalizations or to different units of measure)
- Related variables

You can create data tip text update functions to display such information and change their behavior on the fly. You can even make the update function behave differently for distinct observations in the same graph if your update function or the code calling it can distinguish groups of them. The next section contains an example of coding and using a customized data cursor update function.

The extended example that follows begins by using data tips to explore the incidence of fatal traffic accidents tabulated for U.S. states, with respect to state populations. The example extends this analysis to brush, link, and map the data to discover spatial patterns in the data. Each section of the example has four or fewer steps. By executing them all, you gain insight into the data set and become familiar with useful graphical data exploration techniques.

Censuses of population and other national government statistics are valuable sources of demographic and socioeconomic data. An important aspect of census data is its geography, i.e., the regions to which a given set of statistics applies, and at what level of granularity. When exploring census data, you frequently need to identify what geographic unit any given observation represents.

This example uses data tips to show place names and statistics
for individual observations. You pass place names and the data matrix
to a custom text update function to enable this. The place names are
for U.S. states and the District of Columbia. If all these names were
placed as labels on the *x*-axis, they would be too
small or too crowded to be legible, but they are readable one at a
time as data tips.

The example also illustrates how sorting a data matrix by rows can enhance interpretation when the original ordering (in this case alphabetical by state) provides no special insight into relationships among observations and variables.

Data tips can present other information beyond *x*-, *y*-,
and *z*-values. Read through the example function `labeldtips`,
which takes three more parameters than a default callback, and displays
the following information:

- Its *y*-value
- Deviation from an expected *y*-value
- Percent deviation from the expected *y*-value
- The observation's label (state name)

Because it customizes data tips, the function must
be a code file that you invoke from the Command Window or from a script.
This file, `labeldtips.m`, and the MAT-files `accidents.mat`
and `usapolygon.mat` that
the following examples also use, exist on the MATLAB path. Here
is the code for the `labeldtips` data cursor callback
function.

    function output_txt = labeldtips(obj,event_obj,...
                          xydata,labels,xymean)
    % Display an observation's Y-data and label for a data tip
    % obj          Currently not used (empty)
    % event_obj    Handle to event object
    % xydata       Entire data matrix
    % labels       State names identifying matrix row
    % xymean       Ratio of y to x mean (avg. for all obs.)
    % output_txt   Datatip text (character vector or cell array
    %              of character vectors)
    % This datacursor callback calculates a deviation from the
    % expected value and displays it, Y, and a label taken
    % from the cell array 'labels'; the data matrix is needed
    % to determine the index of the x-value for looking up the
    % label for that row. X values could be output, but are not.

    pos = get(event_obj,'Position');
    x = pos(1); y = pos(2);
    output_txt = {['Y: ',num2str(y,4)]};
    ydev = round((y - x*xymean));
    ypct = round((100 * ydev) / (x*xymean));
    output_txt{end+1} = ['Yobs-Yexp: ' num2str(ydev) ...
         '; Pct. dev: ' num2str(ypct)];
    idx = find(xydata == x,1);  % Find index to retrieve obs. name
    % The find is reliable only if there are no duplicate x values
    [row,col] = ind2sub(size(xydata),idx);
    output_txt{end+1} = cell2mat(labels(row));

The portion of the example called Explore the Graph with the Custom Data Cursor sets up data cursor mode and declares this function as a callback using the following code:

    hdt = datacursormode;
    set(hdt,'UpdateFcn',{@labeldtips,hwydata,statelabel,usmean})
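The cell-array form of `UpdateFcn` in this `set` call follows the usual MATLAB callback convention: the first cell holds the function handle, and the remaining cells are appended after the two standard arguments when the callback runs. The sketch below illustrates that convention with a hypothetical one-line callback, `showlabel`, and a struct standing in for the event object. It does not use `labeldtips` or the example data.

```matlab
% Hypothetical callback with one extra argument (labels), standing in for
% a function such as labeldtips that takes extra arguments.
showlabel = @(obj, evt, labels) labels{round(evt.Position(1))};
labels = {'first', 'second', 'third'};

% Cell-array form: {function handle, extra arguments...}
cbCell = {showlabel, labels};
% Equivalent anonymous-function form that captures the extra argument
cbAnon = @(obj, evt) showlabel(obj, evt, labels);

% Simulate the callback invocation, using a struct in place of the
% event object that data cursor mode would supply.
evt = struct('Position', [2 5]);
txtCell = feval(cbCell{1}, [], evt, cbCell{2:end});
txtAnon = cbAnon([], evt);
```

Both forms produce the same text here. The cell-array form stores the values of the extra arguments at the time you call `set`, as does the anonymous function.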

`datacursormode` puts
the current figure in data cursor mode. `hdt` is
the handle of a data cursor mode object for the figure you want to
explore. The function handle and its three additional arguments are
passed together as a cell array.

The following steps show how you load statistical data for U.S. states, plot some of it, and enter data cursor mode to explore the data:

To help you interpret graphs created in this example, the `hwydata` data
matrix and its row labels have been presorted by rows to be in ascending
order by total state population. The 51-by-1 vector `hwyidx` contains
indices from the presorting (the data were originally in alphabetic
order).

If you ever want to re-sort the data array and state labels alphabetically,
you can sort on the first column of the `hwydata` matrix,
which contains Census Bureau state IDs that ascend in alphabetical
order, as follows:

    [hwydata, hwyidx] = sortrows(hwydata,1);
    statelabel = statelabel(hwyidx);
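The reason the second output of `sortrows` is reused on the labels can be seen in a small, self-contained sketch. The matrix and labels below are hypothetical toy values, not the contents of `accidents.mat`.

```matlab
% Hypothetical toy data: column 1 is an ID, column 2 is a measurement
toydata   = [3 30; 1 10; 2 20];
toylabels = {'C'; 'A'; 'B'};            % one label per row of toydata

% Sort rows by the first column; idx records where each row came from
[toydata, idx] = sortrows(toydata, 1);
% Apply the same permutation so labels stay aligned with their rows
toylabels = toylabels(idx);
```

After this runs, `idx` is `[2; 3; 1]` and the labels read `A`, `B`, `C`, matching the sorted IDs 1, 2, 3.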

If you do re-sort the data, you might plot it using markers rather
than lines to make the graph easier to interpret. To do this, change
the call to `plot` in the second step below (the line graph) to the
following:

    plot(hwydata(:,14),hwydata(:,4),'.')

Load U.S. state data statistics from the National Highway Traffic Safety Administration and the Bureau of the Census and look at the variables:

    load 'accidents.mat'
    whos

      Name              Size            Bytes  Class
      datasources       3x1              2568  cell
      hwycols           1x1                 8  double
      hwydata          51x17             6936  double
      hwyheaders        1x17             1874  cell
      hwyidx           51x1               408  double
      hwyrows           1x1                 8  double
      statelabel       51x1              3944  cell
      ushwydata         1x17              136  double
      uslabel           1x1                86  cell

The data set has 51 observations for 17 variables.

- The state-by-state statistics; the double 51-by-17 matrix `hwydata`
- The variable (column) names; the 1-by-17 text cell array `hwyheaders`
- The state names; the 51-by-1 text cell array `statelabel`
- Values for the entire United States for the 17 variables; the 1-by-17 matrix `ushwydata`
- The label for the US values; the 1-by-1 cell array `uslabel`
- Metadata describing data sources; the 3-by-1 cell array `datasources`

Plot a line graph of the population by state as *x* versus the number of traffic fatalities per state as *y*:

    hf1 = figure;
    plot(hwydata(:,14),hwydata(:,4));
    xlabel(hwyheaders(14))
    ylabel(hwyheaders(4))

Because the state observations are sorted by population size, the graph is monotonic in *x*. The larger a population a state has, the more variation in traffic accident fatalities it tends to show.

Compute the per capita rate of traffic fatalities for the entire United States; in the next part of this example, the data cursor update function uses this average to compute an expected value for each state you query:

    usmean = ushwydata(4)/ushwydata(14)

    usmean =
      1.5150e-004

The statistic shows that nationally, about 15 per 100,000 people (roughly 150 per million) die in traffic accidents every year.

Use `usmean` to compute the smallest and largest expected values by multiplying it by the smallest and largest state populations, and draw a line connecting them:

    line([min(hwydata(:,14)) max(hwydata(:,14))],...
         [min(hwydata(:,14))*usmean max(hwydata(:,14)*usmean)],...
         'Color','m');

The magenta line is not a regression line; it is a trend line that plots the number of traffic deaths that a state of a given size would have if all states obeyed the national average.
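To make the units of `usmean` and the endpoints of the magenta line concrete, here is a small arithmetic sketch. It uses the rounded rate printed above; the two population values are hypothetical placeholders, not values read from `hwydata`.

```matlab
usmean = 1.5150e-4;          % per capita rate as displayed in the text

% Express the rate per 100,000 people
per100k = usmean * 1e5;      % 15.15 deaths per 100,000 people

% Expected fatalities at two hypothetical state populations
pop      = [5e5 34e6];       % ASSUMPTION: illustrative small and large states
expected = pop * usmean;     % [75.75 5151] deaths at the national rate
```

Each endpoint of the line is a population multiplied by the national rate, which is why a state lying above the line has more fatalities than the national average predicts for its size.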

You can now explore the graphed data with the example
custom data cursor update function `labeldtips` (which
must be on the MATLAB path or in the current folder). `labeldtips` displays
state names and *y*-deviations.

Turn on data cursor mode and invoke the custom callback:

    hdt = datacursormode;
    set(hdt,'DisplayStyle','window');
    % Declare a custom datatip update function
    % to display state names:
    set(hdt,'UpdateFcn',{@labeldtips,hwydata,statelabel,usmean})

The data cursor `'window'` display style sends data tip output to a small window that you can move anywhere within the figure. This display style is best suited to data tips that contain more text than just *x*-, *y*-, and *z*-values. The `labeldtips` callback remains active for that figure until you use `set` to replace it with another function (or empty, to restore the default data cursor behavior).

Click the right-most point on the blue graph.

The data tip shows that California has the largest population and the largest number of traffic fatalities, 4120. However, it had 1012, or 20%, fewer fatalities than predicted by the national average. The next data point to the left depicts Texas. Click that data point or press the left arrow to show its data tip. To see results from other states, move the data tip by dragging the black square or using the left or right arrow to step it along the graph. If you know a little about U.S. geography, you might observe a pattern.
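The California figures can be reproduced by hand with the same arithmetic that `labeldtips` applies. The sketch below is illustrative only: the population value is an assumption (California's 2000 Census count), not a number read from `accidents.mat`, and `usmean` is the rounded value printed earlier.

```matlab
usmean = 1.5150e-4;    % national per capita rate reported in the text
x = 33871648;          % ASSUMPTION: California population, 2000 Census
y = 4120;              % fatalities shown in the data tip

yexp = x * usmean;                      % expected fatalities, about 5132
ydev = round(y - yexp);                 % deviation from expected value
ypct = round((100 * ydev) / yexp);      % percent deviation
fprintf('Yobs-Yexp: %d; Pct. dev: %d\n', ydev, ypct)
```

Under these assumptions the deviation is -1012 and the percent deviation is -20, consistent with the data tip described above.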

The ninth column of `hwydata`, labeled "Fatalities
per 100K Licensed Drivers," is related to population. Plot
a histogram of this variable to see which states have fewer or more
fatalities per driver. To do this, link the plots to their data, and
brush either of them.

Open a new figure and plot a histogram of Fatalities per 100K Licensed Drivers using five bins:

    hf2 = figure;
    histogram(hwydata(:,9),5)
    xlabel(hwyheaders(9))

Link both the line graph and the histogram to their data sources in `hwydata`:

    linkdata(hf1)
    linkdata(hf2)

You can also click the **Link Plot** tool on either figure. The first figure links automatically; however, the histogram does not, because `linkdata` cannot determine with certainty the `YDataSource` for histograms. The Linked Plot information bar on top of the histogram informs you **No Graphics have data sources. Cannot link plot: fix it**.

Click **fix it** to open the Specify Data Source Properties dialog box. Type `hwydata(:,9)` into the `YDataSource` edit box and click **OK**. The Linked Plot information bar displays the data source you identified.

Now that you have linked both graphs to a common data set, you can brush portions of one to see the effect on the other.

Arrange both figures on your computer screen so that they are simultaneously visible.

Select the Data Brushing tool on the histogram plot. Brush the three right-most bars in the histogram by clicking the third bar and dragging to the right over the last two bars. The brushed bars represent the number of fatalities ranging from 25 to 52 per 100,000 drivers.

Notice the data points that are automatically highlighted on the line graph. These points generally correspond to the states with smaller populations, but above-average traffic fatality totals.

Click the line graph to make it the active figure and select its Data Brushing tool. Select all of the points falling *below* the straight-line average by holding the **Shift** key and either clicking or dragging your selections. It might be helpful to zoom in on the graph to do this. What do you see happening on the histogram?

The `hwydata` matrix contains geographic location
information in the form of latitude-longitude coordinates of a centroid
for each state. You can make a crude map by generating a scatter plot
of these coordinates, using longitude as *x* and
latitude as *y*. If you link the scatter plot, you
can brush all the plots at once.

To provide a context for the map, plot an outline map of the conterminous United States. Obtain the latitude and longitude coordinates required from the MAT-file `usapolygon.mat`:

    hf3 = figure;
    load usapolygon
    patch(uslon,uslat,[1 .9 .8],'Edgecolor','none');
    hold on

Map the centroid longitude and latitude as a scatter plot with filled circles. Plot a rectangle over part of the map, as follows:

    scatter(hwydata(:,2),hwydata(:,3),36,'b','filled');
    xlabel('Longitude')
    ylabel('Latitude')
    rectangle('Position',[-115,25,115-77,36-25],...
        'EdgeColor',[.75 .75 .75])

The *x*- and *y*-limits change, shrinking the map, because the data matrix contains observations for Alaska and Hawaii, but the map outline file does not include these states.

Align the map alongside the other two figures. Brush the map after turning on the Data Linking and Data Brushing tools. Drag across the gray rectangle with the Data Brushing tool to highlight the southern-most states.
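The `'Position'` vector of the gray rectangle is `[x y width height]`, so it spans longitudes -115 to -77 and latitudes 25 to 36. The sketch below decodes those bounds and shows a programmatic counterpart to brushing inside the rectangle. The four centroid coordinates are rough, hypothetical values for illustration, not rows of `hwydata`.

```matlab
pos = [-115, 25, 115-77, 36-25];        % rectangle used in the example
lonRange = [pos(1), pos(1) + pos(3)];   % [-115 -77]
latRange = [pos(2), pos(2) + pos(4)];   % [25 36]

% ASSUMPTION: approximate centroids for four illustrative states
lon = [-119.4; -99.3; -86.8; -93.9];
lat = [  37.2;  31.5;  32.8;  46.3];

% Logical index of the centroids that fall inside the rectangle
inbox = lon >= lonRange(1) & lon <= lonRange(2) & ...
        lat >= latRange(1) & lat <= latRange(2);
```

Here `inbox` is true only for the second and third points, the two that lie within both the longitude and latitude ranges.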

Data brushing and linking reveal that most of the states with above-average traffic fatality totals are in the southern part of the U.S.

Using graphic data exploration, you have identified some intriguing regularities in this data. However, you have not identified any causes for the patterns you found. That will take more work with the data, and possibly additional data sets, along with some hypotheses and models.
