Hadoop Compiler
Package MATLAB programs for deployment to Hadoop clusters as MapReduce programs
The Hadoop Compiler app will be removed in a future release. To create standalone MATLAB® MapReduce applications or deployable archives from MATLAB map and reduce functions, use the mcc command. For details, see Compatibility Considerations.
Description
The Hadoop Compiler app packages MATLAB map and reduce functions into a deployable archive. You can incorporate the archive into a Hadoop® mapreduce job by passing it as a payload argument to a job submitted to a Hadoop cluster.
Open the Hadoop Compiler App
MATLAB Toolstrip: On the Apps tab, under Application Deployment, click the app icon.
MATLAB command prompt: Enter hadoopCompiler.
Parameters
map function
— mapper file
character vector
Function for the mapper, specified as a character vector.
reduce function
— reducer file
character vector
Function for the reducer, specified as a character vector.
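As a rough illustration of what the mapper and reducer files might contain, the sketch below follows the standard MATLAB mapreduce function signatures. The function names, the ArrDelay variable, and the key strings are assumptions chosen for illustration, not values required by the app. Each function would normally live in its own file named after the function.

```matlab
% File: maxArrivalDelayMapper.m (hypothetical name)
% Receives one chunk of the datastore as a table and emits the
% largest arrival delay found in that chunk.
function maxArrivalDelayMapper(data, ~, intermKVStore)
    partMax = max(data.ArrDelay);   % max ignores NaN values by default
    add(intermKVStore, 'PartialMaxArrivalDelay', partMax);
end

% File: maxArrivalDelayReducer.m (hypothetical name)
% Iterates over all partial maxima for the key and emits the overall maximum.
function maxArrivalDelayReducer(~, intermValIter, outKVStore)
    maxVal = -Inf;
    while hasnext(intermValIter)
        maxVal = max(getnext(intermValIter), maxVal);
    end
    add(outKVStore, 'MaxArrivalDelay', maxVal);
end
```

In the app, you would then supply the two file names as the map function and reduce function.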
datastore file
— file containing a datastore representing the data to be processed
character vector
A file containing a datastore representing the data to be processed, specified as a character vector.
In most cases, you start by working on a small sample dataset on a local machine that is representative of the actual dataset on the cluster. The sample dataset has the same structure and variables as the actual dataset. Creating a datastore object for the local sample dataset captures a snapshot of that structure. With access to this datastore object, a Hadoop job executing on the cluster knows how to access and process the actual dataset residing on HDFS™.
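A minimal sketch of creating such a datastore file is shown here. It assumes the airlinesmall.csv sample file that ships with MATLAB stands in for the local sample dataset. The MAT-file name infoAboutDataset.mat and the selected variable are illustrative choices.

```matlab
% Create a datastore for a local sample dataset that has the same
% structure as the dataset on the cluster (sample file is an assumption).
ds = datastore('airlinesmall.csv', 'TreatAsMissing', 'NA');

% Keep only the variables the map function needs (illustrative choice).
ds.SelectedVariableNames = {'ArrDelay'};

% Save the datastore object to a MAT-file; this file is what you
% would specify as the datastore file.
save('infoAboutDataset.mat', 'ds');
```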
output types
— format of output
keyvalue (default) | tabulartext
Format of output from the Hadoop mapreduce job, specified as keyvalue or tabulartext.
additional configuration file content
— additional parameters configuring how Hadoop executes the job
character vector
Additional parameters to configure how Hadoop executes the job, specified as a character vector. For more information, see Configuration File for Creating Deployable Archive Using the mcc Command.
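Because the app is scheduled for removal, it may help to see roughly how the same settings map onto the mcc workflow. The sketch below writes a configuration file and then calls mcc. The file names, archive name, and function names are hypothetical, and the mw.* keys reflect the configuration-file format as described in the mcc documentation, so verify them against your release. Running the mcc line requires MATLAB Compiler.

```matlab
% Configuration file content (names are illustrative assumptions).
configLines = {
    'mw.ds.in.type = tabulartext'
    'mw.ds.in.format = infoAboutDataset.mat'
    'mw.ds.out.type = keyvalue'
    'mw.mapper = maxArrivalDelayMapper'
    'mw.reducer = maxArrivalDelayReducer'
    };

% Write one setting per line to config.txt.
fid = fopen('config.txt', 'w');
fprintf(fid, '%s\n', configLines{:});
fclose(fid);

% Package the map and reduce functions into a deployable archive for Hadoop.
% -a adds the datastore MAT-file to the archive.
mcc -H -W 'hadoop:maxArrivalDelay,CONFIG:config.txt' maxArrivalDelayMapper.m maxArrivalDelayReducer.m -a infoAboutDataset.mat
```

Any additional Hadoop settings you would otherwise enter in the app belong in the same configuration file.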
files required for your MapReduce job payload to run
— files that must be included with generated artifacts
list of files
Files that must be included with generated artifacts, specified as a list of files.
Additional parameters passed to MCC
— flags controlling the behavior of the compiler
character vector
Flags controlling the behavior of the compiler, specified as a character vector.
testing files
— folder where files for testing are stored
character vector
Folder where files for testing are stored, specified as a character vector.
packaged files
— folder where generated artifacts are stored
character vector
Folder where generated artifacts are stored, specified as a character vector.