Deploying Applications to CLOUDERA Spark Using the MATLAB API for Spark

This example shows how to deploy a MATLAB® application developed using the MATLAB API for Spark™ to a CLOUDERA® Spark-enabled Hadoop® cluster.

The application flightsByCarrierDemo.m computes the number of airline carrier types from airline data. The inputs to the application are:

  • master — URL to the Spark cluster

  • inputFile — the file containing the input data

Note

The complete code for this example is in the file flightsByCarrierDemo.m.

Prerequisites

  • Install the MATLAB Runtime in the default location on the desktop. This example uses /usr/local/MATLAB/MATLAB_Runtime/R2025a as the default location for the MATLAB Runtime.

    If you don’t have MATLAB Runtime, see Download and Install MATLAB Runtime for installation instructions.

  • Install the MATLAB Runtime on every worker node.

  • Copy the file airlinesmall.csv from the toolbox/matlab/demos folder of your MATLAB installation into the Hadoop Distributed File System (HDFS™) folder /datasets/airlinemod.
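
The data-staging prerequisite can be scripted. The following is a minimal sketch, assuming the hadoop client is on your PATH and that you pass in the root folder of your MATLAB installation. The function name stage_airline_data is hypothetical and is not part of the example.

```shell
#!/bin/sh
# Hypothetical helper: copy airlinesmall.csv into HDFS.
# Assumption: $1 is the root folder of your MATLAB installation.
stage_airline_data() {
    matlab_root=$1
    src="$matlab_root/toolbox/matlab/demos/airlinesmall.csv"
    dest=/datasets/airlinemod

    # Fail early if the local file is missing.
    if [ ! -f "$src" ]; then
        echo "airlinesmall.csv not found at $src" >&2
        return 1
    fi

    # Create the HDFS folder (no error if it already exists), then upload.
    hadoop fs -mkdir -p "$dest" &&
    hadoop fs -put "$src" "$dest/"
}
```

Call it as stage_airline_data followed by your MATLAB installation root. Note that hadoop fs -put reports an error if the file already exists in the destination folder, so a second run fails unless you remove the HDFS copy first.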

Deploy Applications to CLOUDERA Spark

  1. At the MATLAB command prompt, use the mcc command to generate a jar file and a shell script for the MATLAB application flightsByCarrierDemo.m.

    >> mcc -C -W 'Spark:flightsByCarrierDemoApp' flightsByCarrierDemo.m

    This action creates a jar file named flightsByCarrierDemoApp.jar and a shell script named run_flightsByCarrierDemoApp.sh.

  2. Execute the shell script in either yarn-client mode or yarn-cluster mode. In yarn-client mode, the driver runs on the desktop. In yarn-cluster mode, the driver runs in the Application Master process in the cluster. The results of the computation in both cases are saved to a text file on HDFS by calling the saveAsTextFile method on the RDD.

    yarn-client mode

    Run the following command from a Linux® terminal:

    $ ./run_flightsByCarrierDemoApp.sh \
       /usr/local/MATLAB/MATLAB_Runtime/R2025a \
       yarn-client \
       hdfs://hadoop01glnxa64:54310/datasets/airlinemod/airlinesmall.csv

    To examine the results, enter the following from a Linux terminal:

    $ hadoop fs -cat flightsByCarrierResults/*

    yarn-cluster mode

    Run the following command from a Linux terminal:

    $ ./run_flightsByCarrierDemoApp.sh \
       /usr/local/MATLAB/MATLAB_Runtime/R2025a \
       --deploy-mode cluster --master yarn yarn-cluster \
       hdfs://hadoop01glnxa64:54310/datasets/airlinemod/airlinesmall.csv
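
Because both modes save their results to HDFS, a follow-up step can pull the output into a single local file for inspection. The following is a minimal sketch, assuming the results folder is flightsByCarrierResults (the folder read by the hadoop fs -cat command in the yarn-client instructions) and that the hadoop client is on your PATH. The function name fetch_results and the default local file name are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: merge the files under the HDFS results folder
# into one local file and print it.
fetch_results() {
    # HDFS results folder used in this example.
    results_dir=${1:-flightsByCarrierResults}
    # Assumed local file name; change as needed.
    local_file=${2:-flightsByCarrierResults.txt}

    # Confirm the folder exists before trying to read it.
    if ! hadoop fs -test -d "$results_dir"; then
        echo "HDFS folder $results_dir not found" >&2
        return 1
    fi

    # getmerge concatenates every file in the folder into one local file.
    hadoop fs -getmerge "$results_dir" "$local_file" &&
    cat "$local_file"
}
```

Running fetch_results with no arguments uses the defaults above. A nonzero exit status means the results folder was not found or could not be read, which usually indicates that the job has not finished or has failed.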