Model Data Using the Distribution Fitting App

The Distribution Fitting app fits univariate distributions to data. This section describes how to use the app.

Opening the Distribution Fitting App

Open the Distribution Fitting app by entering:

dfittool

Alternatively, click Distribution Fitting on the Apps tab.

Adjusting the Plot

Buttons at the top of the tool allow you to adjust the plot displayed in this window:

  • Toggle the legend on (default) or off.

  • Toggle grid lines on or off (default).

  • Restore default axes limits.

Displaying the Data

The Display type field specifies the type of plot displayed in the main window. Each type corresponds to a probability function, for example, a probability density function. The following display types are available:

  • Density (PDF) — Display a probability density function (PDF) plot for the fitted distribution.

  • Cumulative probability (CDF) — Display a cumulative probability plot of the data.

  • Quantile (inverse CDF) — Display a quantile (inverse CDF) plot.

  • Probability plot — Display a probability plot.

  • Survivor function — Display a survivor function plot of the data.

  • Cumulative hazard — Display a cumulative hazard plot of the data.

Inputting and Fitting Data

The task buttons enable you to perform the tasks necessary to fit distributions to data. Each button opens a new dialog box in which you perform the task. The buttons include:

  • Data — Import and manage data sets.

  • New Fit — Create new fits.

  • Manage Fits — Manage existing fits.

  • Evaluate — Evaluate fits at any points you choose.

  • Exclude — Create rules specifying which values to exclude when fitting a distribution.

The display pane displays plots of the data sets and fits you create. Whenever you make changes in one of the dialog boxes, the results in the display pane update.

Saving and Customizing Distributions

The Distribution Fitting app menus contain items that enable you to do the following:

  • Save and load sessions.

  • Generate a file with which you can fit distributions to data and plot the results independently of the Distribution Fitting app.

  • Define and import custom distributions.

Creating and Managing Data Sets

This section describes how to create and manage data sets.

To begin, click the Data button in the Distribution Fitting app to open the Data dialog box shown in the following figure.

Importing Data

The Import workspace vectors pane enables you to create a data set by importing a vector from the MATLAB® workspace. The following sections describe the fields in this pane and give appropriate values for vectors imported from the MATLAB workspace:

  • Data — The drop-down list in the Data field contains the names of all matrices and vectors, other than 1-by-1 matrices (scalars), in the MATLAB workspace. Select the array containing the data you want to fit. The actual data you import must be a vector. If you select a matrix in the Data field, the first column of the matrix is imported by default. To select a different column or row of the matrix, click Select Column or Row. This displays the matrix in the Variables editor, where you can select a row or column by highlighting it with the mouse.

    Alternatively, you can enter any valid MATLAB expression in the Data field.

    When you select a vector in the Data field, a histogram of the data appears in the Data preview pane.

  • Censoring — If some of the points in the data set are censored, enter a Boolean vector, of the same size as the data vector, specifying the censored entries of the data. A 1 in the censoring vector specifies that the corresponding entry of the data vector is censored, while a 0 specifies that the entry is not censored. If you enter a matrix, you can select a column or row by clicking Select Column or Row. If you do not want to censor any data, leave the Censoring field blank.

  • Frequency — Enter a vector of positive integers of the same size as the data vector to specify the frequency of the corresponding entries of the data vector. For example, a value of 7 in the 15th entry of the frequency vector specifies that there are 7 data points corresponding to the value in the 15th entry of the data vector. If all entries of the data vector have frequency 1, leave the Frequency field blank.

  • Data set name — Enter a name for the data set you import from the workspace, such as My data.

After you have entered the information in the preceding fields, click Create Data Set to create the data set My data.
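The Censoring and Frequency fields have command-line counterparts in the 'Censoring' and 'Frequency' name-value arguments of fitdist. The following is a minimal sketch with made-up values (the variables t, c, x, f, and xRep are hypothetical and are not part of the app). It illustrates the meaning of the two vectors and is not a description of what the app runs internally.

```matlab
% Hypothetical lifetimes; a 1 in c marks a censored observation.
t = [2; 5; 7; 11; 13];
c = logical([0; 0; 1; 0; 1]);
pdCens = fitdist(t, 'Exponential', 'Censoring', c);

% Hypothetical values with integer frequencies:
% x(1) occurs twice, x(2) once, x(3) three times.
x = [1; 2; 3];
f = [2; 1; 3];
pdFreq = fitdist(x, 'Normal', 'Frequency', f);

% The same data written out with one entry per observation.
xRep = [1; 1; 2; 3; 3; 3];
pdRep = fitdist(xRep, 'Normal');
```

In this sketch, pdFreq and pdRep describe the same sample, so their estimated means agree.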

Managing Data Sets

The Manage data sets pane enables you to view and manage the data sets you create. When you create a data set, its name appears in the Data sets list. The following figure shows the Manage data sets pane after creating the data set My data.

For each data set in the Data sets list, you can:

  • Select the Plot check box to display a plot of the data in the main Distribution Fitting app window. When you create a new data set, Plot is selected by default. Clearing the Plot check box removes the data from the plot in the main window. You can specify the type of plot displayed in the Display type field in the main window.

  • If Plot is selected, you can also select Bounds to display confidence interval bounds for the plot in the main window. These bounds are pointwise confidence bounds around the empirical estimates of these functions. The bounds are only displayed when you set Display Type in the main window to one of the following:

    • Cumulative probability (CDF)

    • Survivor function

    • Cumulative hazard

The Distribution Fitting app cannot display confidence bounds on density (PDF), quantile (inverse CDF), or probability plots. Clearing the Bounds check box removes the confidence bounds from the plot in the main window.

When you select a data set from the list, the following buttons are enabled:

  • View — Display the data in a table in a new window.

  • Set Bin Rules — Define the histogram bins used in a density (PDF) plot.

  • Rename — Rename the data set.

  • Delete — Delete the data set.

Setting Bin Rules

To set bin rules for the histogram of a data set, click Set Bin Rules. This opens the Set Bin Width Rules dialog box.

You can select from the following rules:

  • Freedman-Diaconis rule — Algorithm that chooses bin widths and locations automatically, based on the sample size and the spread of the data. This rule, which is the default, is suitable for many kinds of data.

  • Scott rule — Algorithm intended for data that are approximately normal. The algorithm chooses bin widths and locations automatically.

  • Number of bins — Enter the number of bins. All bins have equal widths.

  • Bins centered on integers — Specifies bins centered on integers.

  • Bin width — Enter the width of each bin. If you select this option, you can also select:

    • Automatic bin placement — Place the edges of the bins at integer multiples of the Bin width.

    • Bin boundary at — Enter a scalar to specify the boundaries of the bins. The boundary of each bin is equal to this scalar plus an integer multiple of the Bin width.

The Set Bin Width Rules dialog box also provides the following options:

  • Apply to all existing data sets — Apply the rule to all data sets. Otherwise, the rule is only applied to the data set currently selected in the Data dialog box.

  • Save as default — Apply the current rule to any new data sets that you create. You can also set default bin width rules by selecting Set Default Bin Rules from the Tools menu in the main window.
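To give a feel for the rules above, the following sketch computes bin widths from the commonly cited textbook forms of the Freedman-Diaconis and Scott rules, and builds bin edges for a fixed width and boundary. The formulas and the variable names are assumptions made for illustration; the app may compute its bins differently.

```matlab
rng('default')
x = normrnd(0.36, 1.4, 100, 1);
n = numel(x);

% Textbook bin-width formulas (assumed, not taken from the app).
wFD    = 2 * iqr(x) * n^(-1/3);      % Freedman-Diaconis rule
wScott = 3.5 * std(x) * n^(-1/3);    % Scott rule (constant is approximate)

% "Bin boundary at" b combined with "Bin width" w:
% every edge has the form b + k*w for some integer k.
w = 0.5;
b = 0.1;
kLo = floor((min(x) - b) / w);
kHi = ceil((max(x) - b) / w);
edges = b + (kLo:kHi) * w;           % edges that cover all of the data
```

Setting b to 0 in this sketch corresponds to placing the edges at integer multiples of the bin width.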

Creating a New Fit

This section describes how to create a new fit. To begin, click the New Fit button at the top of the main window to open the New Fit dialog box. If you created the data set My data, it appears in the Data field.

The New Fit dialog box contains the following fields:

  • Fit Name — Enter a name for the fit in the Fit Name field.

  • Data — The Data field contains a drop-down list of the data sets you have created. Select the data set to which you want to fit a distribution.

  • Distribution — Select the type of distribution to fit from the Distribution drop-down list.

    Only the distributions that apply to the values of the selected data set appear in the Distribution field. For example, positive distributions are not displayed when the data include values that are zero or negative.

    You can specify either a parametric or a nonparametric distribution. When you select a parametric distribution from the drop-down list, a description of its parameters appears in the Normal pane. The Distribution Fitting app estimates these parameters to fit the distribution to the data set. When you select Nonparametric fit, options for the fit appear in the pane, as described in Further Options for Nonparametric Fits.

  • Exclusion rule — Specify a rule to exclude some data in the Exclusion rule field. Create an exclusion rule by clicking Exclude in the Distribution Fitting app. For more information, see Excluding Data.

Apply the New Fit

Click Apply to fit the distribution. For a parametric fit, the Results pane displays the values of the estimated parameters. For a nonparametric fit, the Results pane displays information about the fit.

When you click Apply, the Distribution Fitting app displays a plot of the distribution, along with the corresponding data.

    Note   When you click Apply, the title of the dialog box changes to Edit Fit. You can now make changes to the fit you just created and click Apply again to save them. After closing the Edit Fit dialog box, you can reopen it from the Fit Manager dialog box at any time to edit the fit.

After applying the fit, you can save the information to the workspace using probability distribution objects by clicking Save to workspace.
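A probability distribution object of the kind saved to the workspace can also be created at the command line with fitdist. The following is a minimal sketch that assumes normally distributed sample data; the variable names are illustrative only.

```matlab
rng('default')
data = normrnd(0.36, 1.4, 100, 1);

% Fit a normal distribution; pd is a probability distribution object.
pd = fitdist(data, 'Normal');

% Estimated parameters and their 95% confidence intervals.
muHat    = pd.mu;
sigmaHat = pd.sigma;
ci = paramci(pd);          % row 1: lower limits, row 2: upper limits

% The object can be queried like any other distribution.
m = mean(pd);
s = std(pd);
```

Working with an object rather than raw parameter values keeps the distribution type and its estimates together, so the same variable can be passed to functions such as pdf, cdf, and icdf.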

Available Distributions

Most, but not all, of the distributions available in the Distribution Fitting app are supported elsewhere in Statistics Toolbox™ software, and have dedicated distribution fitting functions. These functions compute the majority of the fits in the Distribution Fitting app, and are referenced in the list below. Other fits are computed using functions internal to the Distribution Fitting app.

Not all of the distributions listed below are available for all data sets. The Distribution Fitting app determines the extent of the data (nonnegative, unit interval, etc.) and displays appropriate distributions in the Distribution drop-down list. Distribution data ranges are given parenthetically in the list below.

Further Options for Nonparametric Fits

When you select Non-parametric in the Distribution field, a set of options appears in the Non-parametric pane, as shown in the following figure.

The options for nonparametric distributions are:

  • Kernel — Type of kernel function to use.

    • Normal

    • Box

    • Triangle

    • Epanechnikov

  • Bandwidth — The bandwidth of the kernel smoothing window. Select Auto for a default value that is optimal for estimating normal densities. This value appears in the Fit results pane after you click Apply. Select Specify and enter a smaller value to reveal features such as multiple modes or a larger value to make the fit smoother.

  • Domain — The allowed x-values for the density.

    • Unbounded — The density extends over the whole real line.

    • Positive — The density is restricted to positive values.

    • Specify — Enter lower and upper bounds for the domain of the density.

    When you select Positive or Specify, the nonparametric fit has zero probability outside the specified domain.
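To make the roles of the kernel and the bandwidth concrete, the following sketch evaluates a normal-kernel density estimate by hand over an unbounded domain. The bandwidth uses a textbook normal-reference rule with a robust scale estimate, which is an assumption and may differ from the Auto value the app reports. In the toolbox, a kernel fit is available through fitdist with the 'Kernel' distribution name.

```matlab
rng('default')
x = normrnd(0.36, 1.4, 100, 1);
n = numel(x);

% Normal-reference bandwidth with a robust scale estimate
% (assumed textbook rule; the app's Auto value may differ).
sig = median(abs(x - median(x))) / 0.6745;
h = sig * (4 / (3 * n))^(1/5);

% Evaluate the normal-kernel estimate on a grid.
xi = linspace(min(x) - 5*h, max(x) + 5*h, 400);
u = bsxfun(@minus, xi, x) / h;                  % n-by-400 scaled distances
fhat = mean(exp(-0.5 * u.^2), 1) / (h * sqrt(2*pi));

% A density integrates to 1; the grid covers almost all of the mass.
area = trapz(xi, fhat);
```

Replacing h with a smaller value in this sketch produces a bumpier curve, and a larger value produces a smoother one.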

Displaying Results

This section explains the different ways to display results in the Distribution Fitting app window. This window displays plots of:

  • The data sets for which you select Plot in the Data dialog box

  • The fits for which you select Plot in the Fit Manager dialog box

  • Confidence bounds for:

    • Data sets for which you select Bounds in the Data dialog box

    • Fits for which you select Bounds in the Fit Manager dialog box

The following fields are available.

Display Type

The Display Type field in the main window specifies the type of plot displayed. Each type corresponds to a probability function, for example, a probability density function. The following display types are available:

  • Density (PDF) — Display a probability density function (PDF) plot for the fitted distribution. The main window displays data sets using a probability histogram, in which the height of each rectangle is the fraction of data points that lie in the bin divided by the width of the bin. This makes the sum of the areas of the rectangles equal to 1.

  • Cumulative probability (CDF) — Display a cumulative probability plot of the data. The main window displays data sets using a cumulative probability step function. The height of each step is the cumulative sum of the heights of the rectangles in the probability histogram.

  • Quantile (inverse CDF) — Display a quantile (inverse CDF) plot.

  • Probability plot — Display a probability plot of the data. You can specify the type of distribution used to construct the probability plot in the Distribution field, which is only available when you select Probability plot. The choices for the distribution are:

    • Exponential

    • Extreme value

    • Logistic

    • Log-Logistic

    • Lognormal

    • Normal

    • Rayleigh

    • Weibull

    In addition to these choices, you can create a probability plot against a parametric fit that you create in the New Fit pane. These fits are added at the bottom of the Distribution drop-down list when you create them.

  • Survivor function — Display a survivor function plot of the data.

  • Cumulative hazard — Display a cumulative hazard plot of the data.

      Note   Some distributions are unavailable if the plotted data includes 0 or negative values.
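The probability histogram described for the Density (PDF) display can be reproduced with a few lines of base MATLAB code. The following sketch uses an arbitrary number of equal-width bins (an assumption; the app chooses bins according to its bin rules) and also builds a simple empirical CDF step function for comparison.

```matlab
rng('default')
x = normrnd(0.36, 1.4, 100, 1);
n = numel(x);

% Equal-width bins covering the data (bin count chosen arbitrarily).
nBins = 10;
lo = min(x);
w = (max(x) - lo) / nBins;
idx = min(floor((x - lo) / w) + 1, nBins);     % bin index of each point
counts = accumarray(idx, 1, [nBins 1]);

% Height = fraction of points in the bin divided by the bin width.
height = (counts / n) / w;
totalArea = sum(height * w);                   % equals 1 up to rounding

% Empirical CDF: at the k-th smallest value the step reaches k/n.
xs = sort(x);
F = (1:n)' / n;
```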

Confidence Bounds

You can display confidence bounds for data sets and fits when you set Display Type to Cumulative probability (CDF), Survivor function, Cumulative hazard, or, for fits only, Quantile (inverse CDF).

  • To display bounds for a data set, select Bounds next to the data set in the Data sets pane of the Data dialog box.

  • To display bounds for a fit, select Bounds next to the fit in the Fit Manager dialog box. Confidence bounds are not available for all fit types.

To set the confidence level for the bounds, select Confidence Level from the View menu in the main window and choose from the options.
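Pointwise confidence bounds for a data set can also be computed at the command line with ecdf. The following sketch assumes a 95% confidence level (Alpha = 0.05) and illustrative variable names; it is not necessarily the computation the app performs.

```matlab
rng('default')
data = normrnd(0.36, 1.4, 100, 1);

% Empirical CDF with 95% pointwise bounds (Alpha = 0.05).
[F, x, Flo, Fup] = ecdf(data, 'Alpha', 0.05);

% The same estimate expressed as a survivor function.
[S, xs, Slo, Shi] = ecdf(data, 'Function', 'survivor', 'Alpha', 0.05);
```

The bounds can contain NaN values at the ends of the range, so it is safest to skip NaN entries when plotting or comparing them.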

Managing Fits

This section describes how to manage fits that you have created. To begin, click the Manage Fits button in the Distribution Fitting app. This opens the Fit Manager dialog box as shown in the following figure.

The Table of fits displays a list of the fits you create, with the following options:

  • Plot — Select Plot to display a plot of the fit in the main window of the Distribution Fitting app. When you create a new fit, Plot is selected by default. Clearing the Plot check box removes the fit from the plot in the main window.

  • Bounds — If Plot is selected, you can also select Bounds to display confidence bounds in the plot. The bounds are displayed when you set Display Type in the main window to one of the following:

    • Cumulative probability (CDF)

    • Quantile (inverse CDF)

    • Survivor function

    • Cumulative hazard

    The Distribution Fitting app cannot display confidence bounds on density (PDF) or probability plots. In addition, bounds are not supported for nonparametric fits and some parametric fits.

    Clearing the Bounds check box removes the confidence intervals from the plot in the main window.

    When you select a fit in the Table of fits, the following buttons are enabled below the table:

    • New Fit — Open a New Fit window.

    • Copy — Create a copy of the selected fit.

    • Edit — Open an Edit Fit dialog box, where you can edit the fit.

        Note   You can only edit the currently selected fit in the Edit Fit dialog box. To edit a different fit, select it in the Table of fits and click Edit to open another Edit Fit dialog box.

    • Save to workspace — Save the selected fit as a distribution object.

    • Delete — Delete the selected fit.

Evaluating Fits

The Evaluate dialog box enables you to evaluate any fit at whatever points you choose. To open the dialog box, click the Evaluate button in the Distribution Fitting app. The following figure shows the Evaluate dialog box.

The Evaluate dialog box contains the following items:

  • Fit pane — Display the names of existing fits. Select one or more fits that you want to evaluate. To select multiple fits, use the multiple-selection method for your platform.

  • Function — Select the type of probability function you want to evaluate for the fit. The available functions are:

    • Density (PDF) — Computes a probability density function.

    • Cumulative probability (CDF) — Computes a cumulative probability function.

    • Quantile (inverse CDF) — Computes a quantile (inverse CDF) function.

    • Survivor function — Computes a survivor function.

    • Cumulative hazard — Computes a cumulative hazard function.

    • Hazard rate — Computes the hazard rate.

  • At x = — Enter a vector of points or the name of a workspace variable containing a vector of points at which you want to evaluate the distribution function. If you change Function to Quantile (inverse CDF), the field name changes to At p = and you enter a vector of probability values.

  • Compute confidence bounds — Select this box to compute confidence bounds for the selected fits. The check box is only enabled if you set Function to one of the following:

    • Cumulative probability (CDF)

    • Quantile (inverse CDF)

    • Survivor function

    • Cumulative hazard

    The Distribution Fitting app cannot compute confidence bounds for nonparametric fits and for some parametric fits. In these cases, it returns NaN for the bounds.

  • Level — Set the level for the confidence bounds.

  • Plot function — Select this box to display a plot of the distribution function, evaluated at the points you enter in the At x = field, in a new window.

      Note   The settings for Compute confidence bounds, Level, and Plot function do not affect the plots that are displayed in the main window of the Distribution Fitting app. The settings only apply to plots you create by clicking Plot function in the Evaluate window.
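The functions offered in the Function list are related to one another by standard definitions. The following sketch evaluates each of them for a fitted normal distribution using those definitions; the variable names are illustrative, and the app may compute the values by other means.

```matlab
rng('default')
data = normrnd(0.36, 1.4, 100, 1);
pd = fitdist(data, 'Normal');

x = (-4:1:6)';
f = pdf(pd, x);            % Density (PDF)
F = cdf(pd, x);            % Cumulative probability (CDF)
S = 1 - F;                 % Survivor function
H = -log(S);               % Cumulative hazard
h = f ./ S;                % Hazard rate

p = [0.05; 0.5; 0.95];
q = icdf(pd, p);           % Quantile (inverse CDF), evaluated at p
```

Note that the quantile function takes probabilities rather than data values as input, which is why the field name changes to At p = for that choice.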

Click Apply to apply these settings to the selected fit. The following figure shows the results of evaluating the cumulative distribution function for the fit My fit, at the points in the vector -4:1:6.

The window displays the following values in the columns of the table to the right of the Fit pane:

  • X — The entries of the vector you enter in the At x = field

  • F(X) — The corresponding values of the CDF at the entries of X

  • LB — The lower bounds for the confidence interval, if you select Compute confidence bounds

  • UB — The upper bounds for the confidence interval, if you select Compute confidence bounds

To save the data displayed in the Evaluate window, click Export to Workspace. This saves the values in the table to a matrix in the MATLAB workspace.
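A table with the same four columns can be assembled at the command line for a normal fit. The following sketch uses normfit, normlike, and normcdf with a 95% confidence level; the variable names mirror the column headings but are otherwise arbitrary, and the bounds may not match the app's values exactly.

```matlab
rng('default')
data = normrnd(0.36, 1.4, 100, 1);

% Parameter estimates and their asymptotic covariance matrix.
[muHat, sigmaHat] = normfit(data);
[~, pCov] = normlike([muHat, sigmaHat], data);

% CDF values with 95% confidence bounds at the points -4:1:6.
X = (-4:1:6)';
[FX, LB, UB] = normcdf(X, muHat, sigmaHat, pCov, 0.05);
result = [X, FX, LB, UB];            % one row per evaluation point
```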

Excluding Data

To exclude values from a fit, click the Exclude button in the main window of the Distribution Fitting app. This opens the Exclude window, in which you can create rules for excluding specified values. You can use these rules to exclude data when you create a new fit in the New Fit window. The following figure shows the Exclude window.

To create an exclusion rule:

  1. Exclusion Rule Name—Enter a name for the exclusion rule in the Exclusion rule name field.

  2. Exclude Sections—In the Exclude sections pane, you can specify bounds for the excluded data:

    • In the Lower limit: exclude Y drop-down list, select <= or < and enter a scalar in the field to the right. This excludes values that are less than or equal to the scalar, or strictly less than it, respectively.

    • In the Upper limit: exclude Y drop-down list, select >= or > and enter a scalar in the field to the right. This excludes values that are greater than or equal to the scalar, or strictly greater than it, respectively.

    OR

    Exclude Graphically—The Exclude Graphically button enables you to define the exclusion rule by displaying a plot of the values in a data set and selecting the bounds for the excluded data with the mouse. For example, if you created the data set My data, described in Creating and Managing Data Sets, select it from the drop-down list next to Exclude graphically and then click the Exclude graphically button. This displays the values in My data in a new window as shown in the following figure.

    To set a lower limit for the boundary of the excluded region, click Add Lower Limit. This displays a vertical line on the left side of the plot window. Move the line with the mouse to the point where you want the lower limit, as shown in the following figure.

    Moving the vertical line changes the value displayed in the Lower limit: exclude data field in the Exclude window, as shown in the following figure.

    The value displayed corresponds to the x-coordinate of the vertical line.

    Similarly, you can set the upper limit for the boundary of the excluded region by clicking Add Upper Limit and moving the vertical line that appears at the right side of the plot window. After setting the lower and upper limits, click Close and return to the Exclude window.

  3. Create Exclusion Rule—Once you have set the lower and upper limits for the boundary of the excluded data, click Create Exclusion Rule to create the new rule. The name of the new rule now appears in the Existing exclusion rules pane.

    When you select an exclusion rule in the Existing exclusion rules pane, the following buttons are enabled:

    • Copy — Creates a copy of the rule, which you can then modify. To save the modified rule under a different name, click Create Exclusion Rule.

    • View — Opens a new window in which you can see which data points are excluded by the rule. The following figure shows a typical example.

      The shaded areas in the plot graphically display which data points are excluded. The table to the right lists all data points. The shaded rows indicate excluded points:

    • Rename — Renames the rule.

    • Delete — Deletes the rule.

    Once you define an exclusion rule, you can use it when you fit a distribution to your data. The rule does not exclude points from the display of the data set.
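The effect of an exclusion rule on a fit can be imitated at the command line with logical indexing. The following sketch uses hypothetical limits (exclude values <= -2 and values > 3) and fits a normal distribution with and without the excluded points.

```matlab
rng('default')
data = normrnd(0.36, 1.4, 100, 1);

% Hypothetical rule: exclude values <= -2 or > 3.
lowerLimit = -2;
upperLimit = 3;
excluded = (data <= lowerLimit) | (data > upperLimit);

% Fit once to all of the data and once to the retained points only.
pdAll  = fitdist(data, 'Normal');
pdKept = fitdist(data(~excluded), 'Normal');
```

As in the app, the original vector data is left unchanged; only the fit sees the reduced set of points.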

Saving and Loading Sessions

This section explains how to save your work in the current Distribution Fitting app session and then load it in a subsequent session, so that you can continue working where you left off.

Saving a Session

To save the current session, select Save Session from the File menu in the main window. This opens a dialog box that prompts you to enter a filename, such as my_session.dfit, for the session. Clicking Save saves the following items created in the current session:

  • Data sets

  • Fits

  • Exclusion rules

  • Plot settings

  • Bin width rules

Loading a Session

To load a previously saved session, select Load Session from the File menu in the main window and enter the name of a previously saved session. Clicking Open restores the information from the saved session to the current session of the Distribution Fitting app.

Example: Fitting a Distribution

This section presents an example that illustrates how to use the Distribution Fitting app. The example involves the following steps:

Step 1: Generate Random Data

To try the example, first generate some random data to which you will fit a distribution. The following commands generate a vector data, of length 100, whose entries are random numbers from a normal distribution with mean 0.36 and standard deviation 1.4.

rng('default')
data = normrnd(.36, 1.4, 100, 1);

Step 2: Import Data

Open the Distribution Fitting app:

dfittool

To import the vector data into the Distribution Fitting app, click the Data button in the main window. This opens the window shown in the following figure.

The Data field displays all numeric arrays in the MATLAB workspace. Select data from the drop-down list, as shown in the following figure.

This displays a histogram of the data in the Data preview pane.

In the Data set name field, type a name for the data set, such as My data, and click Create Data Set to create the data set. The main window of the Distribution Fitting app now displays a larger version of the histogram in the Data preview pane, as shown in the following figure.

    Note   Because the example uses random data, you might see a slightly different histogram if you try this example for yourself.

Step 3: Create a New Fit

To fit a distribution to the data, click New Fit in the main window of the Distribution Fitting app. This opens the window shown in the following figure.

To fit a normal distribution, the default entry of the Distribution field, to My data:

  1. Enter a name for the fit, such as My fit, in the Fit name field.

  2. Select My data from the drop-down list in the Data field.

  3. Click Apply.

The Results pane displays the mean and standard deviation of the normal distribution that best fits My data, as shown in the following figure.

The main window of the Distribution Fitting app displays a plot of the normal distribution with this mean and standard deviation, as shown in the following figure.
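For a normal distribution fitted to uncensored data, the estimates can be checked at the command line. The following sketch uses normfit, which returns the sample mean and the square root of the unbiased variance estimate; whether the Results pane shows exactly these values is an assumption you can verify by comparing the numbers.

```matlab
rng('default')
data = normrnd(0.36, 1.4, 100, 1);

[muHat, sigmaHat] = normfit(data);

% normfit returns the sample mean and the square root of the
% unbiased variance estimate, so both differences are ~0.
dMu    = abs(muHat - mean(data));
dSigma = abs(sigmaHat - std(data));
```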

Generating a File to Fit and Plot Distributions

The Generate Code option in the File menu enables you to create a file that

  • Fits the distributions used in the current session to any data vector in the MATLAB workspace.

  • Plots the data and the fits.

After you end the current session, you can use the file to create plots in a standard MATLAB figure window, without having to reopen the Distribution Fitting app.

As an example, assuming you created the fit described in Creating a New Fit, do the following steps:

  1. Select Generate Code from the File menu.

  2. Choose File > Save as in the MATLAB Editor window. Save the file as normal_fit.m in a folder on the MATLAB path.

You can then apply the function normal_fit to any vector of data in the MATLAB workspace. For example, the following commands

new_data = normrnd(4.1, 12.5, 100, 1);
newfit = normal_fit(new_data)
legend('New Data', 'My fit')

generate newfit, a normal distribution fitted to the data, and a plot of the data and the fit.

newfit = 

normal distribution

    mu = 3.19148
    sigma = 12.5631

    Note   By default, the file labels the data in the legend using the same name as the data set in the Distribution Fitting app. You can change the label using the legend command, as illustrated by the preceding example.
