Extend Tall Arrays with Other Products

Products Used: Statistics and Machine Learning Toolbox™, Database Toolbox™, Parallel Computing Toolbox™, MATLAB® Distributed Computing Server™, MATLAB Compiler™

Several toolboxes enhance the capabilities of tall arrays. These enhancements include writing machine learning algorithms, integrating with big data systems, and deploying standalone apps.

Statistics and Machine Learning

Statistics and Machine Learning Toolbox enables you to perform advanced statistical calculations on tall arrays. Capabilities include:

  • K-means clustering

  • Linear regression fitting

  • Grouped statistics

  • Classification

See Analysis of Big Data with Tall Arrays (Statistics and Machine Learning Toolbox) for more information.
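As a minimal sketch of the linear regression capability listed above, the following fits a model to a tall table. It assumes that Statistics and Machine Learning Toolbox is installed and that the airlinesmall.csv sample file shipped with MATLAB is on the path. The choice of variables is illustrative only and is not taken from this page.

```matlab
% Assumption: airlinesmall.csv is the sample data set shipped with MATLAB.
ds = datastore('airlinesmall.csv', 'TreatAsMissing', 'NA');
ds.SelectedVariableNames = {'DepDelay', 'Distance', 'ArrDelay'};

tt = tall(ds);                  % tall table backed by the datastore

% Drop rows that contain missing values (evaluation is deferred).
bad = any(ismissing(tt), 2);
tt(bad, :) = [];

% Fit ArrDelay as a linear function of the remaining two variables.
mdl = fitlm(tt, 'ResponseVar', 'ArrDelay');
disp(mdl.Coefficients)
```

The calls are the same ones you would use on an in-memory table; only the tall(ds) construction differs.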

Control Where Your Code Runs

When you execute calculations on tall arrays, the default execution environment is either the local MATLAB session or, if you have Parallel Computing Toolbox, a local parallel pool. Use the mapreducer function to change the execution environment of tall arrays when using Parallel Computing Toolbox, MATLAB Distributed Computing Server, or MATLAB Compiler:

  • Parallel Computing Toolbox — Run calculations in parallel using local workers to speed up large tall array calculations. See Use Tall Arrays on a Parallel Pool (Parallel Computing Toolbox) for more information.

  • MATLAB Distributed Computing Server — Run tall array calculations on a cluster, including Apache Spark™ enabled Hadoop® clusters. This can significantly reduce the execution time of very large calculations. See Use Tall Arrays on a Spark Enabled Hadoop Cluster (Parallel Computing Toolbox) for more information.

  • MATLAB Compiler — Deploy MATLAB applications containing tall arrays as standalone apps on Apache Spark. See Spark Applications (MATLAB Compiler) for more information.

One benefit of developing your algorithms with tall arrays is that you need to write the code only once. You can develop your code locally, then use mapreducer to scale up and take advantage of the capabilities offered by Parallel Computing Toolbox, MATLAB Distributed Computing Server, or MATLAB Compiler, without rewriting your algorithm.

Note

Each tall array is bound to a single execution environment when it is constructed using tall(ds). If that execution environment is later modified or deleted, then the tall array becomes invalid.

For this reason, each time you change the execution environment you must reconstruct the tall array.
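The following sketch illustrates both points: the analysis is written once, and the tall array is reconstructed after each change of execution environment. It assumes the airlinesmall.csv sample file shipped with MATLAB, and the second half assumes Parallel Computing Toolbox is installed. The helper name meanDelay is hypothetical.

```matlab
% The analysis is written once, for any tall table with an ArrDelay variable.
% meanDelay is a hypothetical helper name used only for this illustration.
meanDelay = @(t) gather(mean(t.ArrDelay, 'omitnan'));

ds = datastore('airlinesmall.csv', 'TreatAsMissing', 'NA');
ds.SelectedVariableNames = {'ArrDelay'};

% 1) Run in the local MATLAB session.
mapreducer(0);
tt = tall(ds);
avgLocal = meanDelay(tt);

% 2) Run on a local parallel pool (requires Parallel Computing Toolbox).
pool = gcp;          % current pool; one is started if automatic creation is enabled
mapreducer(pool);
tt = tall(ds);       % reconstruct: the previous tall array was bound to mapreducer(0)
avgPool = meanDelay(tt);
```

Only the mapreducer call and the tall(ds) reconstruction change between the two runs; meanDelay itself is untouched.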

Work with Databases

Database Toolbox enables you to create a tall table from a DatabaseDatastore that is backed by data in a database. For more information, see Analyze Large Data in Database Using Tall Arrays (Database Toolbox).

Note

DatabaseDatastore has these limitations:

  • DatabaseDatastore must use the local MATLAB session as the execution environment. Set this environment using the command mapreducer(0).

  • Standalone applications containing tall arrays that use DatabaseDatastore cannot be deployed against Apache Spark using MATLAB Compiler.
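A minimal sketch of this workflow, respecting the first limitation above, might look as follows. The data source name, credentials, and table name are hypothetical placeholders, and the sketch assumes Database Toolbox is installed and a matching data source is configured.

```matlab
% DatabaseDatastore must run in the local MATLAB session.
mapreducer(0);

% Hypothetical data source name and credentials; replace with your own.
conn = database('myDataSource', 'myUser', 'myPassword');

% Hypothetical table name; any SQL query supported by your database works here.
dbds = databaseDatastore(conn, 'SELECT * FROM airlinesmall');

tt = tall(dbds);                 % tall table backed by the query results
firstRows = gather(head(tt));    % bring the first few rows into memory
disp(firstRows)

% Release the datastore and the database connection.
close(dbds);
close(conn);
```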
