As an enthusiast for the dataset class, I notice with interest a new class, table, in the latest MATLAB release (in the promo video). This sounds very similar to the existing dataset class in the Statistics Toolbox, which I have been using since its release.
When I search the documentation/help for "table dataset", all I find are the converter functions dataset2table and table2dataset, but the question I have is: what is the difference in intention between these? When is it appropriate to use a dataset and when to use a table? What is the difference between the design of these two classes?
What about the "new" categorical class? Has this moved from the Statistics Toolbox into base MATLAB?
Should we expect the dataset and categorical classes in the Statistics Toolbox to be deprecated in the future?
Julian, as you noticed, MATLAB R2013b includes two new array types known as tables and categorical arrays. These are very similar to the dataset, nominal, and ordinal array types that have been part of the Statistics Toolbox for about six years. Like a dataset array, a table is a container that holds mixed-type tabular data, the sort of column-oriented data you would often import from a CSV file or a spreadsheet. And like nominal and ordinal arrays, a categorical array represents discrete non-numeric data, the sort of data you might otherwise have used strings or "coded integers" to store.
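To make that concrete, here is a minimal sketch of building a table with a categorical variable in it. The data and the variable names (Name, Age, Group) are made up for illustration; only table, categorical, and categories are actual MATLAB functions.

```matlab
% Hypothetical sample data; the names and values are illustrative only.
Name  = {'Ann'; 'Bob'; 'Cal'};
Age   = [34; 41; 29];
Group = categorical({'control'; 'treated'; 'control'});

% A table holds columns of different types, one variable per column.
% With no 'VariableNames' given, the workspace names are used.
T = table(Name, Age, Group);

% A categorical array stores discrete labels rather than strings
% or "coded integers".
cats = categories(T.Group);            % the distinct category names
nTreated = sum(T.Group == 'treated');  % compare directly against a label
```

Displaying T at the command line shows the three variables side by side, much as a dataset array would.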
Generally speaking, these new data types should look and feel very familiar to anyone who has used the ones in the Statistics Toolbox. One obvious difference is that they are included as part of core MATLAB, and you don't need to install the Statistics Toolbox to use them. In addition, their design and terminology make them a bit more accessible for non-statistical uses, though they remain just as useful for statistics.
Tables and categorical arrays are ultimately intended as replacements for dataset, nominal, and ordinal arrays, and we recommend that MATLAB users adopt them for new work. We also recommend that, over time, users update any of their existing code that uses dataset/nominal/ordinal, though we don't expect that changeover to happen immediately. Upcoming releases will provide more details and strategies for making the transition.
In R2013b, all of the Statistics Toolbox functionality that uses nominal and ordinal arrays also supports the new categorical arrays. In R2013b, you'll still need to use dataset arrays in the Statistics Toolbox for things like LinearModel and (new in R2013b) LinearMixedModel, but you might consider creating tables and converting to dataset only when needed, using table2dataset.
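As a rough sketch of that workflow, assuming the Statistics Toolbox is installed, you could keep your data in a table and convert only at the point where a dataset array is required. The variables x and y below are invented for illustration.

```matlab
% Hypothetical data, used only to illustrate the round trip.
x = [1; 2; 3; 4];
y = [2.1; 3.9; 6.2; 7.8];
T = table(x, y);

% Convert to a dataset array only where a function needs one.
% table2dataset ships with the Statistics Toolbox.
ds = table2dataset(T);

% dataset2table converts back, so the rest of the code can keep
% working with tables.
T2 = dataset2table(ds);
```

The dataset array ds could then be passed to functionality that still expects one, such as LinearModel, while everything upstream stays table-based.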