Deploying Applications to CLOUDERA Spark Using the MATLAB API for Spark

This example shows you how to deploy a MATLAB® application developed using the MATLAB API for Spark™ against a CLOUDERA® Spark-enabled Hadoop® cluster.

The application flightsByCarrierDemo.m computes the number of airline carrier types from airline data. The inputs to the application are:

master — URL to the Spark cluster
inputFile — the file containing the input data

Note

The complete code for this example is in the file flightsByCarrierDemo.m.

Prerequisites

Install the MATLAB Runtime in the default location on the desktop. This example uses /usr/local/MATLAB/MATLAB_Runtime/R2025a as the default location for the MATLAB Runtime.
If you don’t have MATLAB Runtime, see Download and Install MATLAB Runtime for installation instructions.
Install the MATLAB Runtime on every worker node.
Copy the file airlinesmall.csv from the toolbox/matlab/demos folder of your MATLAB installation into the Hadoop Distributed File System (HDFS™) folder /datasets/airlinemod.
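
The data-staging prerequisite can be sketched with the standard hadoop fs commands. This is a minimal sketch, not part of the shipped example: MATLAB_ROOT is an assumed installation path (in MATLAB, matlabroot reports the actual one), and the run helper and DRY_RUN switch are hypothetical conveniences that print each command for review instead of executing it.

```shell
#!/bin/sh
# Sketch: stage airlinesmall.csv into HDFS for this example.
# MATLAB_ROOT is an assumed install path; override it to match your system.
MATLAB_ROOT="${MATLAB_ROOT:-/usr/local/MATLAB/R2025a}"
HDFS_DIR="/datasets/airlinemod"

# Hypothetical helper: with DRY_RUN=1 (the default here) commands are
# printed rather than executed. Set DRY_RUN=0 to run them for real.
DRY_RUN="${DRY_RUN:-1}"
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

stage_airline_data() {
    # Create the target folder; -p avoids an error if it already exists.
    run hadoop fs -mkdir -p "$HDFS_DIR"
    # Upload the sample data set from the MATLAB installation.
    run hadoop fs -put "$MATLAB_ROOT/toolbox/matlab/demos/airlinesmall.csv" "$HDFS_DIR/"
}

stage_airline_data
```

Running the script as is only prints the two hadoop commands. Once the paths look right for your cluster, rerun it with DRY_RUN=0.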

Deploy Applications to CLOUDERA Spark

At the MATLAB command prompt, use the mcc command to generate a jar file and a shell script for the MATLAB application flightsByCarrierDemo.m.
```
>> mcc -C -W 'Spark:flightsByCarrierDemoApp' flightsByCarrierDemo.m
```
This action creates a jar file named flightsByCarrierDemoApp.jar and a shell script named run_flightsByCarrierDemoApp.sh.
Execute the shell script in either yarn-client mode or yarn-cluster mode. In yarn-client mode, the driver runs on the desktop. In yarn-cluster mode, the driver runs in the Application Master process in the cluster. The results of the computation in both cases are saved to a text file on HDFS by calling the saveAsTextFile method on the RDD.
yarn-client mode
Run the following command from a Linux® terminal:
```
$ ./run_flightsByCarrierDemoApp.sh \
   /usr/local/MATLAB/MATLAB_Runtime/R2025a \
   yarn-client \
   hdfs://hadoop01glnxa64:54310/datasets/airlinemod/airlinesmall.csv
```
To examine the results, enter the following from a Linux terminal:
```
$ hadoop fs -cat flightsByCarrierResults/*
```
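Because the results are written as a folder of part files, it can be convenient to merge them into one local file and to clear the folder before a rerun. The sketch below assumes the output folder is flightsByCarrierResults, as in the command above. The local file name, the run helper, and the DRY_RUN switch are hypothetical. Spark typically refuses to write to an output folder that already exists, which is why the cleanup step is included.

```shell
#!/bin/sh
# Sketch: collect and clean up the example's HDFS output.
RESULTS_DIR="flightsByCarrierResults"                     # HDFS folder from the command above
LOCAL_COPY="${LOCAL_COPY:-flightsByCarrierResults.txt}"   # hypothetical local file name

# Hypothetical helper: with DRY_RUN=1 (the default here) commands are
# printed rather than executed. Set DRY_RUN=0 to run them for real.
DRY_RUN="${DRY_RUN:-1}"
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

fetch_results() {
    # Concatenate all part files in the HDFS folder into one local file.
    run hadoop fs -getmerge "$RESULTS_DIR" "$LOCAL_COPY"
}

clear_results() {
    # Remove the output folder; -f suppresses the error if it is absent.
    run hadoop fs -rm -r -f "$RESULTS_DIR"
}

fetch_results
clear_results
```

Calling clear_results deletes the results on HDFS, so fetch or inspect them first.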
yarn-cluster mode
Run the following command from a Linux terminal:
```
$ ./run_flightsByCarrierDemoApp.sh \
   /usr/local/MATLAB/MATLAB_Runtime/R2025a \
   --deploy-mode cluster --master yarn yarn-cluster \
   hdfs://hadoop01glnxa64:54310/datasets/airlinemod/airlinesmall.csv
```
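The two invocations differ only in the arguments placed between the MATLAB Runtime location and the input file. A small wrapper makes that difference explicit. This is a sketch under stated assumptions: launch_demo, run, DRY_RUN, MCR_ROOT, and INPUT are hypothetical names, and the argument lists are copied from the two commands in this example rather than derived from the launcher's documentation.

```shell
#!/bin/sh
# Sketch: choose between the yarn-client and yarn-cluster invocations.
MCR_ROOT="${MCR_ROOT:-/usr/local/MATLAB/MATLAB_Runtime/R2025a}"
INPUT="${INPUT:-hdfs://hadoop01glnxa64:54310/datasets/airlinemod/airlinesmall.csv}"
LAUNCHER="./run_flightsByCarrierDemoApp.sh"   # script generated by mcc

# Hypothetical helper: with DRY_RUN=1 (the default here) commands are
# printed rather than executed. Set DRY_RUN=0 to run them for real.
DRY_RUN="${DRY_RUN:-1}"
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

launch_demo() {
    # When really launching, check that the MATLAB Runtime folder exists.
    if [ "$DRY_RUN" != "1" ] && [ ! -d "$MCR_ROOT" ]; then
        echo "MATLAB Runtime not found at $MCR_ROOT" >&2
        return 1
    fi
    case "$1" in
        client)
            run "$LAUNCHER" "$MCR_ROOT" yarn-client "$INPUT" ;;
        cluster)
            run "$LAUNCHER" "$MCR_ROOT" \
                --deploy-mode cluster --master yarn yarn-cluster "$INPUT" ;;
        *)
            echo "usage: launch_demo client|cluster" >&2
            return 2 ;;
    esac
}

launch_demo client
launch_demo cluster
```

With the default DRY_RUN=1 the script prints both command lines. To submit a job, set DRY_RUN=0 and call launch_demo with a single mode.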