Visualization of Tall Arrays

Visualizing large data sets requires that the data is summarized, binned, or sampled in some way to reduce the number of points that are plotted on the screen. In some cases, functions such as histogram and pie bin the data to reduce the size, while other functions such as plot and scatter use a more complex approach that avoids plotting duplicate pixels on the screen. For problems where the pixel overlap is relevant to the analysis, the binscatter function also offers an efficient way to visualize density patterns.

Visualizing tall arrays does not require the use of gather. MATLAB® immediately evaluates and displays visualizations of tall arrays. Currently, you can visualize tall arrays using the functions in this table.
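As a minimal sketch of that difference, the following compares a deferred calculation with a plot. The small in-memory vector converted with tall is a stand-in chosen for illustration and is not part of the example data on this page.

```matlab
% Illustrative stand-in data: a tall array built from an in-memory vector.
x = tall(rand(1000,1));

% Most tall operations are deferred: m is an unevaluated tall scalar.
m = mean(x);

% gather triggers the evaluation and returns an in-memory value.
mValue = gather(m);

% Visualization functions evaluate immediately; no gather call is needed.
histogram(x)
```

The mean needs an explicit gather before its value is available, while histogram reads the data and draws the figure as soon as it is called.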

Function | Required Toolboxes | Notes
plot, scatter, binscatter | None | These functions plot in iterations, progressively adding to the plot as more data is read. During the updates, a progress indicator shows the proportion of data that has been plotted. Zooming and panning are supported during the updating process, before the plot is complete. To stop the update process, press the pause button in the progress indicator.
histogram, histogram2 | None |
pie | None | For visualizing categorical data only.
binScatterPlot | Statistics and Machine Learning Toolbox™ | The figure contains a slider to control the brightness and color detail in the image. The slider adjusts the value of the Gamma image correction parameter.
ksdensity | Statistics and Machine Learning Toolbox | Produces a probability density estimate for the data, evaluated at 100 points for univariate data, or 900 points for bivariate data.
datasample | Statistics and Machine Learning Toolbox | datasample allows greater control over subsampling your data in a statistically sound way compared to simple indexing.
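To illustrate the subsampling that datasample performs, here is a minimal sketch on in-memory data. The vector x and the sample size k are arbitrary values chosen for illustration, and the sketch assumes Statistics and Machine Learning Toolbox is installed. For tall inputs, check the datasample reference page for the options that are supported.

```matlab
% Requires Statistics and Machine Learning Toolbox.
rng(1)                        % fix the seed so the sample is repeatable
x = (1:10000)';               % illustrative in-memory data
k = 500;                      % sample size, chosen arbitrarily

% Draw k rows uniformly at random without replacement.
s = datasample(x,k,'Replace',false);

% Every sampled value is distinct because no row is drawn twice.
numDistinct = numel(unique(s));
```

Sampling without replacement keeps each observation at most once, which simple strided indexing such as x(1:20:end) also does, but without the randomness that guards against periodic patterns in the data.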

Tall Array Plotting Examples

This example shows several different ways you can visualize tall arrays.

Create a datastore for the airlinesmall.csv data set, which contains rows of airline flight data. Select a subset of the table variables to work with and remove rows that contain missing values.

ds = datastore('airlinesmall.csv','TreatAsMissing','NA');
ds.SelectedVariableNames = {'Year','Month','ArrDelay','DepDelay','Origin','Dest'};
T = tall(ds);
T = rmmissing(T)
T =

  Mx6 tall table

    Year    Month    ArrDelay    DepDelay    Origin    Dest 
    ____    _____    ________    ________    ______    _____

    1987    10        8          12          'LAX'     'SJC'
    1987    10        8           1          'SJC'     'BUR'
    1987    10       21          20          'SAN'     'SMF'
    1987    10       13          12          'BUR'     'SJC'
    1987    10        4          -1          'SMF'     'LAX'
    1987    10       59          63          'LAX'     'SJC'
    1987    10        3          -2          'SAN'     'SFO'
    1987    10       11          -1          'SEA'     'LAX'
    :       :        :           :           :         :
    :       :        :           :           :         :

Pie Chart of Flights by Month

Convert the numeric Month variable into a categorical variable that reflects the name of the month. Then plot a pie chart showing how many flights are in the data for each month of the year.

T.Month = categorical(T.Month,1:12,{'Jan','Feb','Mar','Apr','May','Jun','Jul','Aug','Sep','Oct','Nov','Dec'})
T =

  Mx6 tall table

    Year    Month    ArrDelay    DepDelay    Origin    Dest 
    ____    _____    ________    ________    ______    _____

    1987    Oct       8          12          'LAX'     'SJC'
    1987    Oct       8           1          'SJC'     'BUR'
    1987    Oct      21          20          'SAN'     'SMF'
    1987    Oct      13          12          'BUR'     'SJC'
    1987    Oct       4          -1          'SMF'     'LAX'
    1987    Oct      59          63          'LAX'     'SJC'
    1987    Oct       3          -2          'SAN'     'SFO'
    1987    Oct      11          -1          'SEA'     'LAX'
    :       :        :           :           :         :
    :       :        :           :           :         :

pie(T.Month)
Evaluating tall expression using the Local MATLAB Session:
- Pass 1 of 2: Completed in 0 sec
- Pass 2 of 2: Completed in 1 sec
Evaluation completed in 2 sec
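The conversion step can be checked on a small in-memory vector. This is a sketch with made-up month numbers, not values from airlinesmall.csv. The three-input form of categorical maps each entry of the value set 1:12 to the name in the same position.

```matlab
% Illustrative month numbers: each month once, plus extra flights in Oct and Jul.
m = [(1:12)'; 10; 10; 7];
names = {'Jan','Feb','Mar','Apr','May','Jun', ...
         'Jul','Aug','Sep','Oct','Nov','Dec'};

% Map the value 1 to 'Jan', 2 to 'Feb', and so on.
c = categorical(m,1:12,names);

% Count the occurrences of each category, in category order.
counts = countcats(c);

% pie draws one slice per category, sized by these counts.
pie(c)
```

Because the categories are defined in calendar order, the counts and the pie slices follow that order and not alphabetical order.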

Histogram of Delays

Plot a histogram of the arrival delays for each flight in the data. Since the data has a long tail, limit the plotting area using the BinLimits name-value pair.

histogram(T.ArrDelay,'BinLimits',[-50 150])
Evaluating tall expression using the Local MATLAB Session:
- Pass 1 of 2: Completed in 3 sec
- Pass 2 of 2: Completed in 1 sec
Evaluation completed in 5 sec
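The effect of BinLimits can be seen on a small in-memory sample. This sketch uses synthetic delays with three artificial outliers, all invented for illustration. Values outside the limits are left out of the bins and are not clipped into the edge bins.

```matlab
rng(0)                                    % repeatable synthetic data
d = [20*randn(1000,1); 400; 900; -300];   % bulk near zero plus three outliers

% Bin only the values that fall inside [-50, 150].
h = histogram(d,'BinLimits',[-50 150]);

% Compare the number of binned values with a direct count.
nInside  = nnz(d >= -50 & d <= 150);
nPlotted = sum(h.Values);
```

The two counts agree, so the total of the bar heights is smaller than the number of observations whenever data lies outside the limits.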

Scatter Plot of Delays

Create a scatter plot of arrival and departure delays. Expect a strong correlation between these variables, since flights that leave late are also likely to arrive late.

When operating on tall arrays, the plot, scatter, and binscatter functions plot the data in iterations, progressively adding to the plot as more data is read. During the updates, a progress indicator at the top of the plot shows how much data has been plotted. Zooming and panning are supported during the updates, before the plot is complete.

scatter(T.ArrDelay,T.DepDelay)
xlabel('Arrival Delay')
ylabel('Departure Delay')
xlim([-140 1000])
ylim([-140 1000])

The progress bar also includes a Pause/Resume button. Use the button to stop the plot updates early once enough data is displayed.

Fit Trend Line

Use the polyfit and polyval functions to overlay a linear trend line on the plot of arrival and departure delays.

hold on
p = polyfit(T.ArrDelay,T.DepDelay,1);
x = (-140:1000)';
yp = polyval(p,x);
plot(tall(x),yp,'r-')
hold off
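For reference, this is how the fitted coefficients are laid out, shown as a sketch on noise-free in-memory data. The slope 2 and intercept 5 are arbitrary values chosen for illustration. For a degree-1 fit, polyfit returns the slope first and the intercept second.

```matlab
% Synthetic points that lie exactly on the line y = 2x + 5.
x = (0:10)';
y = 2*x + 5;

% Degree-1 least-squares fit: p(1) is the slope, p(2) is the intercept.
p = polyfit(x,y,1);

% Evaluate the fitted line at the same x locations.
yFit = polyval(p,x);

% Overlay the fitted line on the data points.
plot(x,y,'o',x,yFit,'r-')
```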

Visualize Density

A scatter plot is informative only while individual points remain distinguishable. When the points overlap extensively, the plot becomes hard to read. In that case, visualizing the density of points makes trends easier to spot.

Use the binscatter function to visualize the density of points in the plot of arrival and departure delays.

binscatter(T.ArrDelay,T.DepDelay,'XLimits',[-100 1000],'YLimits',[-100 1000])
xlim([-100 1000])
ylim([-100 1000])
xlabel('Arrival Delay')
ylabel('Departure Delay')

Adjust the CLim property of the axes so that all bin values greater than 150 are colored the same. This prevents a few bins with very large values from dominating the plot.

ax = gca;
ax.CLim = [0 150];
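One possible way to choose the cap, sketched on synthetic in-memory data, is to inspect the bin counts stored in the Values property of the binscatter object. The data and the rule of capping at a quarter of the largest count are illustrative assumptions and do not come from the example above.

```matlab
rng(0)                          % repeatable synthetic data
x = randn(1e5,1);
y = x + 0.5*randn(1e5,1);       % correlated with x, so points pile up along a line

h = binscatter(x,y);
counts = h.Values;              % number of points in each bin

% Illustrative rule: cap the color scale at a quarter of the largest count.
cap = max(1, round(0.25*max(counts(:))));
ax = gca;
ax.CLim = [0 cap];

% Bins above the cap all map to the last color in the colormap.
nSaturated = nnz(counts > cap);
```

Lowering the cap spends more of the colormap on sparse regions, at the cost of hiding differences among the densest bins.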
