
How do I create deployed MATLAB® applications to run against Cloudera Spark™?


MathWorks Support Team on 4 Oct 2016


https://www.mathworks.com/matlabcentral/answers/305729-how-do-i-create-a-deployed-matlab-applications-to-run-against-cloudera-spark

Edited: MathWorks Support Team on 25 Aug 2021

Accepted Answer: MathWorks Support Team


Accepted Answer

MathWorks Support Team on 11 Aug 2021


https://www.mathworks.com/matlabcentral/answers/305729-how-do-i-create-a-deployed-matlab-applications-to-run-against-cloudera-spark#answer_237342

Edited: MathWorks Support Team on 25 Aug 2021


In R2016b, MATLAB Compiler supports running MATLAB applications as standalone executables against a Spark-enabled cluster. Deploying MATLAB applications against a Cloudera Spark distribution requires an alternate workflow that is not covered in the release documentation.

Deploying MATLAB applications against a Cloudera distribution of Spark requires a new wrapper type, which you generate using the mcc command. This wrapper type produces a jar file and a shell script that calls spark-submit. The spark-submit script in Spark's bin directory launches applications on a cluster, and it supports both yarn-client mode and yarn-cluster mode.

MATLAB applications that use tall arrays or the MATLAB API for Spark can be deployed using this workflow.

Example 1:

Deploy Tall Arrays to a Cloudera Spark Enabled Hadoop Cluster

This example shows you how to deploy a MATLAB application that uses tall arrays to a Cloudera Spark enabled Hadoop cluster. The application meanArrivalDemo.m computes the mean arrival delay from airline data. The inputs to the application are:

master—URL to the Spark cluster.

inputFile—the file containing the input data.

outputFile—the file containing the results of the computation.

Prerequisites:

Install the MATLAB Runtime in the default location on the desktop. This example uses as the default location for the MATLAB Runtime.
Install the MATLAB Runtime on every worker node.
Copy the airlinesmall.csv from folder of your MATLAB install area into Hadoop Distributed File System (HDFS™) folder /datasets/airlinemod.

If you don't have the MATLAB Runtime, you can download it from the website at: https://www.mathworks.com/products/compiler/matlab-runtime.html.
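The copy step in the prerequisites could look like the following sketch. The helper name stage_airline_data is hypothetical, and the local path to airlinesmall.csv is left as an argument because it depends on your MATLAB installation. The sketch assumes the hadoop command is on your PATH and that you have write access to the target HDFS folder.

```shell
# Hypothetical helper (not part of the MathWorks workflow).
# Usage: stage_airline_data <local path to airlinesmall.csv> [HDFS folder]
stage_airline_data() {
    _local_csv=$1
    # Default to the HDFS folder used throughout this example.
    _hdfs_dir=${2:-/datasets/airlinemod}

    if [ ! -f "$_local_csv" ]; then
        echo "file not found: $_local_csv" >&2
        return 1
    fi

    # Create the HDFS folder (no error if it already exists), then upload.
    hadoop fs -mkdir -p "$_hdfs_dir" &&
    hadoop fs -put "$_local_csv" "$_hdfs_dir/"
}
```

Note that hadoop fs -put refuses to overwrite an existing file, so a second run against the same folder is expected to fail at the upload step unless the file is removed first.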

Procedure:

1. At the MATLAB command prompt, use the mcc command to generate a jar file and shell script for the MATLAB application meanArrivalDemo.m

>> mcc -vCW 'Spark:meanArrivalDemoApp' meanArrivalDemo.m

or, if using Spark version 2:

>> mcc -vCW 'Spark:meanArrivalDemoApp, 2' meanArrivalDemo.m

This creates a jar file named meanArrivalDemoApp.jar and a shell script named run_meanArrivalDemoApp.sh.

Note: To use the shell script, the environment variables HADOOP_PREFIX, HADOOP_CONF_DIR, and SPARK_HOME must be set correctly.
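Because the generated script depends on these three environment variables, it can help to check them before launching. The following is a minimal sketch, not part of the MathWorks workflow. The function name check_spark_env is hypothetical, and the check only confirms that each variable is non-empty, not that it points at a valid installation.

```shell
# Hypothetical pre-flight check: prints the name of each required variable
# that is unset or empty, and returns non-zero if any are missing.
check_spark_env() {
    _missing=0
    for _name in HADOOP_PREFIX HADOOP_CONF_DIR SPARK_HOME; do
        # Look up the value of the variable whose name is stored in $_name.
        eval "_value=\${$_name:-}"
        if [ -z "$_value" ]; then
            echo "$_name"
            _missing=1
        fi
    done
    return "$_missing"
}
```

Running check_spark_env || exit 1 ahead of ./run_meanArrivalDemoApp.sh would stop early and list whichever variables are missing.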

2. You can execute the shell script in yarn-client mode or yarn-cluster mode. In yarn-client mode, the driver runs on the desktop. In yarn-cluster mode, the driver runs in the Application Master process in the cluster.

The general syntax to execute the shell script is:

./run_meanArrivalDemoApp.sh <runtime install root> [Spark arguments] [Application arguments] 
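As a rough illustration of that argument order, the sketch below wraps the call in a hypothetical helper, run_deployed_app, which checks the script and the runtime install root before forwarding the remaining arguments unchanged. The helper name and its exit codes are assumptions made for illustration only.

```shell
# Hypothetical wrapper around a script generated by mcc.
# Usage: run_deployed_app <script> <runtime install root> [Spark arguments] [Application arguments]
run_deployed_app() {
    if [ "$#" -lt 2 ]; then
        echo "usage: run_deployed_app <script> <runtime install root> [args...]" >&2
        return 1
    fi
    _script=$1
    _runtime_root=$2
    shift 2

    if [ ! -x "$_script" ]; then
        echo "not executable: $_script" >&2
        return 2
    fi
    if [ ! -d "$_runtime_root" ]; then
        echo "runtime install root not found: $_runtime_root" >&2
        return 3
    fi

    # The runtime root comes first; Spark and application arguments follow as given.
    "$_script" "$_runtime_root" "$@"
}
```

For example, run_deployed_app ./run_meanArrivalDemoApp.sh /usr/local/MATLAB/MATLAB_Runtime/v91 yarn-client followed by the input and output locations would behave like the direct call, but fail fast if the runtime path is mistyped.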

a. yarn-client mode

Run the following command from a Linux terminal:

$ ./run_meanArrivalDemoApp.sh \
    /usr/local/MATLAB/MATLAB_Runtime/v91 \
    yarn-client \
    hdfs://hadoop01glnxa64:54310/datasets/airlinemod/airlinesmall.csv \
    hdfs://hadoop01glnxa64:54310/user/someuser/meanArrivalResult

To examine the result, enter the following from the MATLAB command prompt:

>> ds = ...
       datastore('hdfs://hadoop01glnxa64:54310/user/someuser/meanArrivalResult/*');
>> readall(ds)

b. yarn-cluster mode

Run the following command from a Linux terminal:

$ ./run_meanArrivalDemoApp.sh \
    /usr/local/MATLAB/MATLAB_Runtime/v91 \
    --deploy-mode cluster --master yarn yarn-cluster \
    hdfs://hadoop01glnxa64:54310/datasets/airlinemod/airlinesmall.csv \
    hdfs://hadoop01glnxa64:54310/user/someuser/meanArrivalResult

In yarn-cluster mode, the driver runs on a worker node in the cluster, so standard output from the MATLAB function is not displayed on your desktop. In addition, files can end up being saved on whichever node happens to run the driver. To avoid this, the example uses the write function to save the results explicitly to a particular location in HDFS.
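If you do need the driver's standard output after a yarn-cluster run, one option is to pull the aggregated container logs with the yarn command-line tool. This is a sketch under assumptions: it presumes that log aggregation is enabled on the cluster and that you know the application ID, which spark-submit reports when the job is submitted. The helper name fetch_driver_output is hypothetical.

```shell
# Hypothetical helper: fetch the aggregated YARN logs for one application.
# Usage: fetch_driver_output application_<timestamp>_<sequence>
fetch_driver_output() {
    _app_id=$1
    # Reject anything that does not look like a YARN application ID.
    case $_app_id in
        application_*) ;;
        *)
            echo "expected an ID of the form application_<timestamp>_<sequence>" >&2
            return 2
            ;;
    esac
    yarn logs -applicationId "$_app_id"
}
```

The output covers every container of the application, so the driver's messages typically need to be located within it.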

Example 2:

Deploy Applications Using the MATLAB API for Spark

This example shows you how to deploy a MATLAB application developed using the MATLAB API for Spark against a Cloudera Spark enabled Hadoop cluster. The application flightsByCarrierDemo.m computes the number of airline carrier types from airline data. The inputs to the application are:

master—URL to the Spark cluster.

inputFile—the file containing the input data.

Prerequisites:

Install the MATLAB Runtime in the default location on the desktop. This example uses as the default location for the MATLAB Runtime.
Install the MATLAB Runtime on every worker node.
Copy the airlinesmall.csv from folder of your MATLAB install area into Hadoop Distributed File System (HDFS™) folder /datasets/airlinemod.

If you don't have the MATLAB Runtime, you can download it from the website at: https://www.mathworks.com/products/compiler/matlab-runtime.html.

Procedure:

1. At the MATLAB command prompt, use the mcc command to generate a jar file and shell script for the MATLAB application flightsByCarrierDemo.m

>> mcc -C -W 'Spark:flightsByCarrierDemoApp' flightsByCarrierDemo.m

This creates a jar file named flightsByCarrierDemoApp.jar and a shell script named run_flightsByCarrierDemoApp.sh.

2. You can execute the shell script in yarn-client mode or yarn-cluster mode. In yarn-client mode, the driver runs on the desktop. In yarn-cluster mode, the driver runs in the Application Master process in the cluster. The results of the computation in both cases are saved to a text file on HDFS by calling the saveAsTextFile method on the RDD.

a. yarn-client mode

Run the following command from a Linux terminal:

$ ./run_flightsByCarrierDemoApp.sh \
    /usr/local/MATLAB/MATLAB_Runtime/v91 \
    yarn-client \
    hdfs://hadoop01glnxa64:54310/datasets/airlinemod/airlinesmall.csv

To examine the results, enter the following from a Linux terminal:

$ hadoop fs -cat flightsByCarrierResults/*

b. yarn-cluster mode

Run the following command from a Linux terminal:

$ ./run_flightsByCarrierDemoApp.sh \
    /usr/local/MATLAB/MATLAB_Runtime/v91 \
    --deploy-mode cluster --master yarn yarn-cluster \
    hdfs://hadoop01glnxa64:54310/datasets/airlinemod/airlinesmall.csv


Products

MATLAB

Release

R2016b
