The Statistical Reference Datasets Project, maintained by staff of the Statistical Engineering Division within the IT Laboratory of the National Institute of Standards and Technology, is a collection of datasets that have been made publicly available on the World Wide Web, with the purpose of providing benchmark applications for testing statistical software.
The datasets are maintained by NIST, a US federal government agency, and they have confirmed to me that this puts the data itself entirely within the public domain.
With this in mind, for convenience and as a service to the wider MATLAB community, I have cast all the nonlinear regression datasets into an easy-to-use MAT file containing one "struct" object per dataset, each of which comprises:
* the independent (predictor) variable, x
* the dependent (response) variable, y, holding the observed or simulated values
* a function handle describing the model function f(b,x)
* b0 and b1, the two starting points given with each dataset
* the certified ("best-available") parameter values, breal
* the standard deviation given, bsd
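As a rough illustration of how one of these structs might be used, the sketch below builds a stand-in struct by hand, evaluates the model at the certified values, and scores a trial estimate. The field name "f" for the function handle is an assumption (the list above does not name that field), the data are synthetic rather than a NIST dataset, and the log relative error is simply a common accuracy measure added here for illustration.

```matlab
% Stand-in for one dataset struct; the real ones come from the MAT file.
% ASSUMPTION: the model handle is stored in a field called "f".
% All numbers below are synthetic and chosen only for illustration.
d.x     = (0:0.25:3)';                  % predictor values
d.f     = @(b,x) b(1)*exp(-b(2)*x);     % model function f(b,x)
d.breal = [2; 1.5];                     % stand-in "certified" parameter values
d.y     = d.f(d.breal, d.x);            % noise-free responses
d.b0    = [1; 1];                       % first starting point
d.b1    = [1.8; 1.4];                   % second starting point

% Residual sum of squares at the certified values (zero here by construction).
r   = d.y - d.f(d.breal, d.x);
ssq = sum(r.^2);

% Log relative error: roughly the number of correct significant digits
% in a trial estimate b, compared with the certified values.
b   = [2.0002; 1.4997];                 % hypothetical solver output
lre = -log10(abs(b - d.breal) ./ abs(d.breal));

fprintf('SSQ at certified values = %g\n', ssq);
fprintf('LRE per parameter = %s\n', mat2str(lre, 3));
```

With a struct loaded from the MAT file, the same two calculations would apply unchanged once the actual field name of the model handle is substituted.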
The following quote from the NIST group website motivates this project:
"...most evaluations of nonlinear least squares software should also include a measure of the reliability of the code, that is, whether the code correctly recognizes when it has (or has not) found a solution. The datasets provided here are particularly well suited for such testing of robustness and reliability. We have included both generated and 'real-world' nonlinear least squares problems of varying levels of difficulty. The generated datasets are designed to challenge specific computations. Real-world data include challenging datasets such as the Thurber problem, and more benign datasets such as Misra1a. The certified values are 'best-available' solutions, obtained using 128-bit precision and confirmed by at least two different algorithms and software packages using analytic derivatives."
I hope this dataset is of use to those using techniques of nonlinear regression. Let me know if it has been of use!
Information on the entire StRD suite:
Information on nonlinear regression data:
The NLR data itself:
IMPORTANT COPYRIGHT NOTE:
This data is PUBLIC DOMAIN data by virtue of being published by a US federal government agency. I hold NO copyright on it; my contribution has been to faithfully cast the data in a convenient MAT file for use in MATLAB.
If you find this useful, please cite the NIST group in the first instance.
The y values in the lanczos1 data are given with 13 significant digits on the NIST Web site. These values have been truncated to just 3 significant digits in the Matlab .mat file prepared by Adam Gripton. This point should be kept in mind when making comparisons, since not only the resulting model coefficients but even the success of an NLS calculation with this data set may depend on these differences.
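To get a feel for the size of this effect, the sketch below reduces a sampled sum of three decaying exponentials to 3 significant digits and reports the largest relative change. The curve, its coefficients, and the grid are assumptions chosen for illustration and are not read from either file. The sketch uses rounding; truncation would give changes of the same order.

```matlab
% Illustrative curve: a sum of three decaying exponentials.
% ASSUMPTION: coefficients and grid are chosen for illustration,
% not taken from the NIST file or the .mat file.
x = (0:0.05:1.15)';
y = 0.0951*exp(-x) + 0.8607*exp(-3*x) + 1.5576*exp(-5*x);

% Round each value to 3 significant digits.
mag = 10.^(floor(log10(abs(y))) - 2);   % place value of the 3rd significant digit
y3  = round(y ./ mag) .* mag;

% Relative change introduced by dropping the remaining digits.
relchg = abs(y3 - y) ./ abs(y);
fprintf('Largest relative change from rounding: %.2e\n', max(relchg));
```

Rounding to 3 significant digits changes each value by at most half a unit in the third digit, so the relative change cannot exceed 0.5%. That is far larger than the precision of a 13-digit value, which is why fits to the two versions can differ.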
The NIST models and data can be used with older versions of Matlab if the Statistics Toolbox function nlinfit is available. Here is the code for running the models:
% Start up Matlab, "load nistdata"
% Then do run(thurber)
% ... etc.
fprintf('Starting with b0\n')
fprintf('Starting with b1\n')
fprintf('Residual SSQ = %e\nResidual STD = %e\n',rsq,rsd)
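A driver along these lines could be sketched as follows. This is not the commenter's original code. It assumes the Statistics Toolbox is installed and that the model handle lives in a field named "f" (a hypothetical name), and it uses a hand-built synthetic struct so that it runs without the MAT file.

```matlab
% Synthetic stand-in for one dataset struct (not NIST data).
% ASSUMPTION: field "f" holds the model handle f(b,x).
m.x  = (0:0.25:3)';
m.f  = @(b,x) b(1)*exp(-b(2)*x);
m.y  = m.f([2; 1.5], m.x);              % noise-free synthetic responses
m.b0 = [1; 1];                          % first starting point
m.b1 = [1.8; 1.4];                      % second starting point

starts = {'b0', 'b1'};
n = numel(m.y);
for k = 1:numel(starts)
    fprintf('Starting with %s\n', starts{k});
    bstart = m.(starts{k});
    % nlinfit returns the estimates and the residuals at the solution
    [b, r] = nlinfit(m.x, m.y, m.f, bstart);
    rsq = sum(r.^2);                    % residual sum of squares
    rsd = sqrt(rsq / (n - numel(b)));   % residual standard deviation
    fprintf('b = %s\n', mat2str(b(:)', 8));
    fprintf('Residual SSQ = %e\nResidual STD = %e\n', rsq, rsd);
end
```

Comparing the printed estimates from the two starting points against the certified values shows whether the solver reached the same solution from both.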
Very useful! Here is how easy it is to run the NIST problems with the 'fitnlm' function, which is available in Matlab R2013b and later (R2012a and R2012b offer the equivalent NonLinearModel.fit):
1. Download the Zip file using the "Download" button at the top right of the page; unzip the downloaded file.
2. In Matlab, do
3. The Matlab command "who" will show the NIST models that have been loaded. To run one of the models, say, 'gauss1', in Matlab type
the problem will be solved and the results displayed in the Matlab command window.
4. To plot the fitted curve and the input data, the Matlab commands are
xf=linspace(0,250,126); % for Gauss1
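Steps 2 to 4 could look roughly like the sketch below. It is an illustration under assumptions, not the commenter's commands: the struct is built by hand from synthetic data so that the example runs without the download, and the field name "f" for the model handle is hypothetical. With the real file, the struct would instead come from loading the MAT file.

```matlab
% Synthetic stand-in for a loaded dataset struct (not NIST data).
% ASSUMPTION: field "f" holds the model handle f(b,x).
g.x  = (0:0.25:3)';
g.f  = @(b,x) b(1)*exp(-b(2)*x);
g.y  = g.f([2; 1.5], g.x) + 0.01*sin(7*g.x);   % small deterministic perturbation
g.b0 = [1; 1];                                 % starting point

% Fit the model from the starting point and display the summary.
mdl = fitnlm(g.x, g.y, g.f, g.b0);
disp(mdl)

% Evaluate the fitted curve on a fine grid and plot it with the data.
xf = linspace(min(g.x), max(g.x), 126)';
yf = predict(mdl, xf);
plot(g.x, g.y, 'o', xf, yf, '-');
xlabel('x'); ylabel('y'); legend('data', 'fitted curve');
```

The grid limits are taken from the data here; for a particular NIST problem they would be set to that problem's x range, as in the Gauss1 line above.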
Thank you for your submission.
Would you please create an example of how to use it?
For example, to test the following NLLS solver: