Extend Tall Arrays with Other Products
Products Used: Statistics and Machine Learning Toolbox™, Database Toolbox™, Parallel Computing Toolbox™, MATLAB® Parallel Server™, MATLAB Compiler™
Several toolboxes enhance the capabilities of tall arrays. These enhancements include writing machine learning algorithms, integrating with big data systems, and deploying standalone apps.
Statistics and Machine Learning
Statistics and Machine Learning Toolbox enables you to perform advanced statistical calculations on tall arrays. Capabilities include:
Linear regression fitting
See Analysis of Big Data with Tall Arrays (Statistics and Machine Learning Toolbox) for more information.
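As a rough illustration of linear regression fitting on a tall array, the sketch below fits a straight line to two columns of the airlinesmall.csv sample file that ships with MATLAB. The choice of file, the choice of variables, and the step that removes rows with missing values are assumptions made for this example. It requires Statistics and Machine Learning Toolbox.

```matlab
% Sketch only: the sample file and variable choices are assumptions.
mapreducer(0);   % keep the example in the local MATLAB session

ds = datastore('airlinesmall.csv', ...
    'TreatAsMissing', 'NA', ...
    'SelectedVariableNames', {'Distance', 'ActualElapsedTime'});
tt = tall(ds);       % tall table backed by the datastore
tt = rmmissing(tt);  % drop rows that contain missing values

% Fit ActualElapsedTime as a linear function of Distance.
X = tt.Distance;
Y = tt.ActualElapsedTime;
mdl = fitlm(X, Y);

disp(mdl.Coefficients)   % intercept and slope estimates
```

The fitting call has the same form for in-memory data, so an analysis prototyped on a small sample can usually be pointed at a tall array with few changes.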
Control Where Your Code Runs
When you execute calculations on tall arrays, the default execution environment uses either the local MATLAB session or, if you have Parallel Computing Toolbox, a local parallel pool. Use the mapreducer function to change the execution environment of tall arrays when using Parallel Computing Toolbox, MATLAB Parallel Server, or MATLAB Compiler:
Parallel Computing Toolbox — Run calculations in parallel using local or cluster workers to speed up large tall array calculations. See Use Tall Arrays on a Parallel Pool (Parallel Computing Toolbox) or Process Big Data in the Cloud (Parallel Computing Toolbox) for more information.
MATLAB Parallel Server — Run tall array calculations on a cluster, including Apache Spark™ enabled Hadoop® clusters. This can significantly reduce the execution time of very large calculations. See Use Tall Arrays on a Spark Cluster (Parallel Computing Toolbox) for more information.
MATLAB Compiler — Deploy MATLAB applications containing tall arrays as standalone apps on Apache Spark. See Spark Applications (MATLAB Compiler) for more information.
One of the benefits of developing your algorithms with tall arrays is that you only need to write the code once. You can develop your code locally, then use mapreducer to scale up and take advantage of the capabilities offered by Parallel Computing Toolbox, MATLAB Parallel Server, or MATLAB Compiler, without needing to rewrite your algorithm.
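The write-once idea can be sketched as follows. The algorithm is kept in one function handle, and only the mapreducer call that precedes it changes when you scale up. The handle name meanDelay and the use of the airlinesmall.csv sample file are assumptions for illustration. The parallel variant is left as a comment because it requires Parallel Computing Toolbox.

```matlab
% Sketch only: meanDelay is a hypothetical name for "your algorithm".
meanDelay = @(t) mean(t.ArrDelay, 'omitnan');

ds = datastore('airlinesmall.csv', ...
    'TreatAsMissing', 'NA', ...
    'SelectedVariableNames', 'ArrDelay');

% Develop locally: run in the local MATLAB session.
mapreducer(0);
tt = tall(ds);
result = gather(meanDelay(tt));   % gather triggers the deferred evaluation
disp(result)

% To scale up, change only the environment and rebuild the tall array:
%   mapreducer(gcp);              % parallel pool (Parallel Computing Toolbox)
%   tt = tall(ds);
%   result = gather(meanDelay(tt));
```

The function handle itself is untouched between the two variants, which is the point of the paragraph above.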
Each tall array is bound to a single execution environment when it is constructed using tall(ds). If that execution environment is later modified or deleted, then the tall array becomes invalid. For this reason, each time you change the execution environment you must reconstruct the tall array.
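A minimal sketch of that pattern: keep the datastore, which does not depend on the execution environment, and call tall(ds) again after every mapreducer call. The sample file, the 15-minute threshold, and the variable names are assumptions. In this sketch, re-selecting the local session stands in for a real change such as moving to a parallel pool or cluster.

```matlab
% Sketch only: file, threshold, and variable names are assumptions.
ds = datastore('airlinesmall.csv', ...
    'TreatAsMissing', 'NA', ...
    'SelectedVariableNames', 'ArrDelay');

mapreducer(0);                       % select the environment first
tt = tall(ds);                       % tt is bound to that environment
nLate = gather(sum(tt.ArrDelay > 15));

% The environment is selected again here. With Parallel Computing Toolbox
% this could instead be, for example, mapreducer(gcp).
mapreducer(0);
tt = tall(ds);                       % rebuild; do not reuse the earlier tt
nLateAgain = gather(sum(tt.ArrDelay > 15));
```

Because the datastore is reused, rebuilding costs one extra tall(ds) call and does not require re-specifying the data source.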
Work with Databases
Database Toolbox enables you to create a tall table from a DatabaseDatastore that is backed by data in a database. For more information, see Analyze Large Data in Database Using Tall Arrays (Database Toolbox).
DatabaseDatastore has these limitations:
DatabaseDatastore must use the local MATLAB session as the execution environment. Set this environment using the command mapreducer(0).
Standalone applications containing tall arrays that use DatabaseDatastore cannot be deployed against Apache Spark using MATLAB Compiler.