Object Detection

This example shows how to generate CUDA® code from a SeriesNetwork object created for the YOLO architecture trained on the PASCAL dataset. YOLO is an object detection network that can classify objects in an image and predict their positions within the frame [1].

Prerequisites

  • CUDA-enabled NVIDIA® GPU with compute capability 3.2 or higher.

  • NVIDIA CUDA toolkit and driver.

  • NVIDIA cuDNN library (v7).

  • OpenCV 3.1.0 libraries for video read and image display operations.

  • Environment variables for the compilers and libraries. For information on the supported versions of the compilers and libraries, see Third-party Products. For setting up the environment variables, see Setting Up the Prerequisite Products.

  • Deep Learning Toolbox™ for using SeriesNetwork objects.

  • GPU Coder™ for generating CUDA code.

  • GPU Coder Interface for Deep Learning Libraries support package. To install this support package, use the Add-On Explorer.

Verify the GPU Environment

Use the coder.checkGpuInstall function to verify that the compilers and libraries needed for running this example are set up correctly.

coder.checkGpuInstall('gpu','codegen','cudnn','quiet');
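
If you omit the 'quiet' option, coder.checkGpuInstall prints the result of each individual check, which is helpful when diagnosing a missing compiler or library.

coder.checkGpuInstall('gpu','codegen','cudnn');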

Get the Pretrained SeriesNetwork

net = getYOLO();

The network contains 58 layers: convolution layers, each followed by a leaky ReLU layer, and fully connected layers at the end (a tally of the layer types appears after the listing below).

net.Layers
ans = 

  58x1 Layer array with layers:

     1   'ImageInputLayer'        Image Input             448x448x3 images
     2   'Convolution2DLayer'     Convolution             64 7x7x3 convolutions with stride [2  2] and padding [3  3  3  3]
     3   'leakyrelu_1'            Leaky ReLU              Leaky ReLU with scale 0.1
     4   'MaxPooling2DLayer0'     Max Pooling             2x2 max pooling with stride [2  2] and padding [0  0  0  0]
     5   'Convolution2DLayer0'    Convolution             192 3x3x64 convolutions with stride [1  1] and padding [1  1  1  1]
     6   'leakyrelu_2'            Leaky ReLU              Leaky ReLU with scale 0.1
     7   'MaxPooling2DLayer1'     Max Pooling             2x2 max pooling with stride [2  2] and padding [0  0  0  0]
     8   'Convolution2DLayer1'    Convolution             128 1x1x192 convolutions with stride [1  1] and padding [0  0  0  0]
     9   'leakyrelu_3'            Leaky ReLU              Leaky ReLU with scale 0.1
    10   'Convolution2DLayer2'    Convolution             256 3x3x128 convolutions with stride [1  1] and padding [1  1  1  1]
    11   'leakyrelu_4'            Leaky ReLU              Leaky ReLU with scale 0.1
    12   'Convolution2DLayer3'    Convolution             256 1x1x256 convolutions with stride [1  1] and padding [0  0  0  0]
    13   'leakyrelu_5'            Leaky ReLU              Leaky ReLU with scale 0.1
    14   'Convolution2DLayer4'    Convolution             512 3x3x256 convolutions with stride [1  1] and padding [1  1  1  1]
    15   'leakyrelu_6'            Leaky ReLU              Leaky ReLU with scale 0.1
    16   'MaxPooling2DLayer2'     Max Pooling             2x2 max pooling with stride [2  2] and padding [0  0  0  0]
    17   'Convolution2DLayer5'    Convolution             256 1x1x512 convolutions with stride [1  1] and padding [0  0  0  0]
    18   'leakyrelu_7'            Leaky ReLU              Leaky ReLU with scale 0.1
    19   'Convolution2DLayer6'    Convolution             512 3x3x256 convolutions with stride [1  1] and padding [1  1  1  1]
    20   'leakyrelu_8'            Leaky ReLU              Leaky ReLU with scale 0.1
    21   'Convolution2DLayer7'    Convolution             256 1x1x512 convolutions with stride [1  1] and padding [0  0  0  0]
    22   'leakyrelu_9'            Leaky ReLU              Leaky ReLU with scale 0.1
    23   'Convolution2DLayer8'    Convolution             512 3x3x256 convolutions with stride [1  1] and padding [1  1  1  1]
    24   'leakyrelu_10'           Leaky ReLU              Leaky ReLU with scale 0.1
    25   'Convolution2DLayer9'    Convolution             256 1x1x512 convolutions with stride [1  1] and padding [0  0  0  0]
    26   'leakyrelu_11'           Leaky ReLU              Leaky ReLU with scale 0.1
    27   'Convolution2DLayer10'   Convolution             512 3x3x256 convolutions with stride [1  1] and padding [1  1  1  1]
    28   'leakyrelu_12'           Leaky ReLU              Leaky ReLU with scale 0.1
    29   'Convolution2DLayer11'   Convolution             256 1x1x512 convolutions with stride [1  1] and padding [0  0  0  0]
    30   'leakyrelu_13'           Leaky ReLU              Leaky ReLU with scale 0.1
    31   'Convolution2DLayer12'   Convolution             512 3x3x256 convolutions with stride [1  1] and padding [1  1  1  1]
    32   'leakyrelu_14'           Leaky ReLU              Leaky ReLU with scale 0.1
    33   'Convolution2DLayer13'   Convolution             512 1x1x512 convolutions with stride [1  1] and padding [0  0  0  0]
    34   'leakyrelu_15'           Leaky ReLU              Leaky ReLU with scale 0.1
    35   'Convolution2DLayer14'   Convolution             1024 3x3x512 convolutions with stride [1  1] and padding [1  1  1  1]
    36   'leakyrelu_16'           Leaky ReLU              Leaky ReLU with scale 0.1
    37   'MaxPooling2DLayer3'     Max Pooling             2x2 max pooling with stride [2  2] and padding [0  0  0  0]
    38   'Convolution2DLayer15'   Convolution             512 1x1x1024 convolutions with stride [1  1] and padding [0  0  0  0]
    39   'leakyrelu_17'           Leaky ReLU              Leaky ReLU with scale 0.1
    40   'Convolution2DLayer16'   Convolution             1024 3x3x512 convolutions with stride [1  1] and padding [1  1  1  1]
    41   'leakyrelu_18'           Leaky ReLU              Leaky ReLU with scale 0.1
    42   'Convolution2DLayer17'   Convolution             512 1x1x1024 convolutions with stride [1  1] and padding [0  0  0  0]
    43   'leakyrelu_19'           Leaky ReLU              Leaky ReLU with scale 0.1
    44   'Convolution2DLayer18'   Convolution             1024 3x3x512 convolutions with stride [1  1] and padding [1  1  1  1]
    45   'leakyrelu_20'           Leaky ReLU              Leaky ReLU with scale 0.1
    46   'Convolution2DLayer19'   Convolution             1024 3x3x1024 convolutions with stride [1  1] and padding [1  1  1  1]
    47   'leakyrelu_21'           Leaky ReLU              Leaky ReLU with scale 0.1
    48   'Convolution2DLayer20'   Convolution             1024 3x3x1024 convolutions with stride [2  2] and padding [1  1  1  1]
    49   'leakyrelu_22'           Leaky ReLU              Leaky ReLU with scale 0.1
    50   'Convolution2DLayer21'   Convolution             1024 3x3x1024 convolutions with stride [1  1] and padding [1  1  1  1]
    51   'leakyrelu_23'           Leaky ReLU              Leaky ReLU with scale 0.1
    52   'Convolution2DLayer22'   Convolution             1024 3x3x1024 convolutions with stride [1  1] and padding [1  1  1  1]
    53   'leakyrelu_24'           Leaky ReLU              Leaky ReLU with scale 0.1
    54   'FullyConnectedLayer'    Fully Connected         4096 fully connected layer
    55   'leakyrelu_25'           Leaky ReLU              Leaky ReLU with scale 0.1
    56   'FullyConnectedLayer1'   Fully Connected         1470 fully connected layer
    57   'softmax'                Softmax                 softmax
    58   'ClassificationLayer'    Classification Output   crossentropyex
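
As a quick check of the layer composition described above, you can tally the layer classes directly from net.Layers. From the listing, the counts are 24 convolution, 25 leaky ReLU, and 4 max pooling layers, 2 fully connected layers, plus the input, softmax, and classification output layers.

layerClasses = arrayfun(@class, net.Layers, 'UniformOutput', false);  % cell array of layer class names
summary(categorical(layerClasses))                                    % counts per layer type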

Generate Code from SeriesNetwork

Generate code for the host platform.

cnncodegen(net);
nvcc -c  -rdc=true  -Xcompiler -fPIC -Xcudafe "--diag_suppress=unsigned_compare_with_zero" -O0 -g -G -arch sm_35  -I"/mathworks/home/lnarasim/Documents/MATLAB/Examples/deeplearning_shared-ex35875752/codegen" -I"/mathworks/hub/3rdparty/R2018a/2950900/glnxa64/cuDNN/cuda/include" -o "MWConvLayer.o" "MWConvLayer.cpp"
nvcc -c  -rdc=true  -Xcompiler -fPIC -Xcudafe "--diag_suppress=unsigned_compare_with_zero" -O0 -g -G -arch sm_35  -I"/mathworks/home/lnarasim/Documents/MATLAB/Examples/deeplearning_shared-ex35875752/codegen" -I"/mathworks/hub/3rdparty/R2018a/2950900/glnxa64/cuDNN/cuda/include" -o "MWLeakyReLULayer.o" "MWLeakyReLULayer.cpp"
nvcc -c  -rdc=true  -Xcompiler -fPIC -Xcudafe "--diag_suppress=unsigned_compare_with_zero" -O0 -g -G -arch sm_35  -I"/mathworks/home/lnarasim/Documents/MATLAB/Examples/deeplearning_shared-ex35875752/codegen" -I"/mathworks/hub/3rdparty/R2018a/2950900/glnxa64/cuDNN/cuda/include" -o "cnn_api.o" "cnn_api.cpp"
nvcc -c  -rdc=true  -Xcompiler -fPIC -Xcudafe "--diag_suppress=unsigned_compare_with_zero" -O0 -g -G -arch sm_35  -I"/mathworks/home/lnarasim/Documents/MATLAB/Examples/deeplearning_shared-ex35875752/codegen" -I"/mathworks/hub/3rdparty/R2018a/2950900/glnxa64/cuDNN/cuda/include" -o "MWCNNLayerImpl.o" "MWCNNLayerImpl.cu"
nvcc -c  -rdc=true  -Xcompiler -fPIC -Xcudafe "--diag_suppress=unsigned_compare_with_zero" -O0 -g -G -arch sm_35  -I"/mathworks/home/lnarasim/Documents/MATLAB/Examples/deeplearning_shared-ex35875752/codegen" -I"/mathworks/hub/3rdparty/R2018a/2950900/glnxa64/cuDNN/cuda/include" -o "MWConvLayerImpl.o" "MWConvLayerImpl.cu"
nvcc -c  -rdc=true  -Xcompiler -fPIC -Xcudafe "--diag_suppress=unsigned_compare_with_zero" -O0 -g -G -arch sm_35  -I"/mathworks/home/lnarasim/Documents/MATLAB/Examples/deeplearning_shared-ex35875752/codegen" -I"/mathworks/hub/3rdparty/R2018a/2950900/glnxa64/cuDNN/cuda/include" -o "MWLeakyReLULayerImpl.o" "MWLeakyReLULayerImpl.cu"
nvcc -c  -rdc=true  -Xcompiler -fPIC -Xcudafe "--diag_suppress=unsigned_compare_with_zero" -O0 -g -G -arch sm_35  -I"/mathworks/home/lnarasim/Documents/MATLAB/Examples/deeplearning_shared-ex35875752/codegen" -I"/mathworks/hub/3rdparty/R2018a/2950900/glnxa64/cuDNN/cuda/include" -o "MWTargetNetworkImpl.o" "MWTargetNetworkImpl.cu"
nvcc -c  -rdc=true  -Xcompiler -fPIC -Xcudafe "--diag_suppress=unsigned_compare_with_zero" -O0 -g -G -arch sm_35  -I"/mathworks/home/lnarasim/Documents/MATLAB/Examples/deeplearning_shared-ex35875752/codegen" -I"/mathworks/hub/3rdparty/R2018a/2950900/glnxa64/cuDNN/cuda/include" -o "cnn_exec.o" "cnn_exec.cpp"
nvcc -lib -Xlinker -rpath,"/bin/glnxa64",-L"/bin/glnxa64" -lc -Xnvlink -w -Wno-deprecated-gpu-targets -g -G -arch sm_35  -o cnnbuild.a MWConvLayer.o MWLeakyReLULayer.o cnn_api.o MWCNNLayerImpl.o MWConvLayerImpl.o MWLeakyReLULayerImpl.o MWTargetNetworkImpl.o cnn_exec.o -L".." "/mathworks/hub/3rdparty/R2018a/2950900/glnxa64/cuDNN/cuda/lib64/libcudnn.so" -lcublas -lcudart -lcusolver 
### Created: cnnbuild.a
### Successfully generated all binary outputs.
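
The nvcc commands above show that, on this host, the generated code links against the cuDNN library. Depending on your GPU Coder release, cnncodegen may also accept name-value options; the call below is a sketch that assumes a 'targetlib' option is available for selecting the deep learning library, so check the cnncodegen reference page for the options in your release.

cnncodegen(net,'targetlib','cudnn');   % sketch: assumes the 'targetlib' option exists in your release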

Generated Code Description

The cnncodegen command generates the *.cu and header files in the 'codegen' folder of the current working directory and compiles them into the static library 'cnnbuild.a'.

The SeriesNetwork is generated as a C++ class (CnnMain) containing an array of 58 layer classes and three public methods: setup(), predict(), and cleanup().

   class CnnMain
   {
     ....
     public:
       CnnMain();
       void setup();
       void predict();
       void cleanup();
       ~CnnMain();
   };

The setup() method of the class sets up handles and allocates memory for each layer object. The predict() method invokes prediction for each of the 58 layers in the network, and the cleanup() method releases the memory and handles allocated during setup.

The files cnn_CnnMain_Convolution2DLayer*_w and cnn_CnnMain_Convolution2DLayer*_b are the binary weight and bias files for the convolution layers in the network. The files cnn_CnnMain_FullyConnectedLayer*_w and cnn_CnnMain_FullyConnectedLayer*_b are the binary weight and bias files for the fully connected layers in the network.
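
As a sanity check, you can read one of these binary files back into MATLAB. The sketch below assumes the file holds only raw single-precision values; under that assumption, the weight file for the first convolution layer (64 filters of size 7x7x3) should contain 64*7*7*3 = 9408 elements and its bias file 64 elements.

fid = fopen(fullfile('codegen','cnn_CnnMain_Convolution2DLayer_w'),'r');
w = fread(fid,Inf,'single');   % read raw single-precision values (assumed layout)
fclose(fid);
numel(w)                       % expected: 9408, if the file stores only the weights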

dir('codegen')
.                                   cnn_CnnMain_Convolution2DLayer1_b   
..                                  cnn_CnnMain_Convolution2DLayer1_w   
MWCNNLayerImpl.cu                   cnn_CnnMain_Convolution2DLayer20_b  
MWCNNLayerImpl.hpp                  cnn_CnnMain_Convolution2DLayer20_w  
MWCNNLayerImpl.o                    cnn_CnnMain_Convolution2DLayer21_b  
MWConvLayer.cpp                     cnn_CnnMain_Convolution2DLayer21_w  
MWConvLayer.hpp                     cnn_CnnMain_Convolution2DLayer22_b  
MWConvLayer.o                       cnn_CnnMain_Convolution2DLayer22_w  
MWConvLayerImpl.cu                  cnn_CnnMain_Convolution2DLayer2_b   
MWConvLayerImpl.hpp                 cnn_CnnMain_Convolution2DLayer2_w   
MWConvLayerImpl.o                   cnn_CnnMain_Convolution2DLayer3_b   
MWLeakyReLULayer.cpp                cnn_CnnMain_Convolution2DLayer3_w   
MWLeakyReLULayer.hpp                cnn_CnnMain_Convolution2DLayer4_b   
MWLeakyReLULayer.o                  cnn_CnnMain_Convolution2DLayer4_w   
MWLeakyReLULayerImpl.cu             cnn_CnnMain_Convolution2DLayer5_b   
MWLeakyReLULayerImpl.hpp            cnn_CnnMain_Convolution2DLayer5_w   
MWLeakyReLULayerImpl.o              cnn_CnnMain_Convolution2DLayer6_b   
MWTargetNetworkImpl.cu              cnn_CnnMain_Convolution2DLayer6_w   
MWTargetNetworkImpl.hpp             cnn_CnnMain_Convolution2DLayer7_b   
MWTargetNetworkImpl.o               cnn_CnnMain_Convolution2DLayer7_w   
cnn_CnnMain_Convolution2DLayer0_b   cnn_CnnMain_Convolution2DLayer8_b   
cnn_CnnMain_Convolution2DLayer0_w   cnn_CnnMain_Convolution2DLayer8_w   
cnn_CnnMain_Convolution2DLayer10_b  cnn_CnnMain_Convolution2DLayer9_b   
cnn_CnnMain_Convolution2DLayer10_w  cnn_CnnMain_Convolution2DLayer9_w   
cnn_CnnMain_Convolution2DLayer11_b  cnn_CnnMain_Convolution2DLayer_b    
cnn_CnnMain_Convolution2DLayer11_w  cnn_CnnMain_Convolution2DLayer_w    
cnn_CnnMain_Convolution2DLayer12_b  cnn_CnnMain_FullyConnectedLayer1_b  
cnn_CnnMain_Convolution2DLayer12_w  cnn_CnnMain_FullyConnectedLayer1_w  
cnn_CnnMain_Convolution2DLayer13_b  cnn_CnnMain_FullyConnectedLayer_b   
cnn_CnnMain_Convolution2DLayer13_w  cnn_CnnMain_FullyConnectedLayer_w   
cnn_CnnMain_Convolution2DLayer14_b  cnn_CnnMain_labels.txt              
cnn_CnnMain_Convolution2DLayer14_w  cnn_api.cpp                         
cnn_CnnMain_Convolution2DLayer15_b  cnn_api.hpp                         
cnn_CnnMain_Convolution2DLayer15_w  cnn_api.o                           
cnn_CnnMain_Convolution2DLayer16_b  cnn_exec.cpp                        
cnn_CnnMain_Convolution2DLayer16_w  cnn_exec.hpp                        
cnn_CnnMain_Convolution2DLayer17_b  cnn_exec.o                          
cnn_CnnMain_Convolution2DLayer17_w  cnnbuild.a                          
cnn_CnnMain_Convolution2DLayer18_b  cnnbuild_rtw.mk                     
cnn_CnnMain_Convolution2DLayer18_w  rtwtypes.h                          
cnn_CnnMain_Convolution2DLayer19_b  tmwtypes.h                          
cnn_CnnMain_Convolution2DLayer19_w  

Main File

The main file creates and sets up the CnnMain network object with its layers and weights. It uses the OpenCV VideoCapture class to read frames from the input video and runs prediction on each frame, fetching the output of the final fully connected layer (net->layers[55] in the zero-based layer array of the generated code).

The final fully connected layer produces 1470 values per frame, which corresponds to the YOLO formulation of a 7x7 grid of cells, each predicting 2 bounding boxes (5 values each) and 20 class probabilities: 7 x 7 x (2 x 5 + 20) = 1470 [1]. The class probabilities and bounding box values are read from this output array and displayed.

   int main(int argc, char* argv[])
   {
       float *inputBuffer = (float*)calloc(sizeof(float),448*448*3);
       float *outputBuffer = (float*)calloc(sizeof(float),1470);
       if ((inputBuffer == NULL) || (outputBuffer == NULL)) {
           printf("ERROR: Input/Output buffers could not be allocated!\n");
           exit(-1);
       }
       CnnMain* net = new CnnMain;
       net->batchSize = 1;
       net->setup();
       if (argc < 2)
       {
           printf("Pass in input video file name as argument\n");
           return -1;
       }
       VideoCapture cap(argv[1]);
       if (!cap.isOpened()) {
           printf("Could not open the video capture device.\n");
           return -1;
       }
       namedWindow("Yolo Demo",CV_WINDOW_NORMAL);
       cvMoveWindow("Yolo Demo", 0, 0);
       resizeWindow("Yolo Demo", 1352,1013);
       float fps = 0;
       cudaEvent_t start, stop;
       cudaEventCreate(&start);
       cudaEventCreate(&stop);
       for(;;)
       {
           Mat orig;
           cap >> orig;
           if (orig.empty()) break;
           Mat im;
           readData(inputBuffer, orig, im);
           cudaEventRecord(start);
           cudaMemcpy(net->inputData,
                      inputBuffer,
                      sizeof(float)*448*448*3,
                      cudaMemcpyHostToDevice);
           net->predict();
           cudaMemcpy(outputBuffer,
                      net->layers[55]->getData(),
                      sizeof(float)*1470,
                      cudaMemcpyDeviceToHost);
           cudaEventRecord(stop);
           cudaEventSynchronize(stop);
           float milliseconds = -1.0;
           cudaEventElapsedTime(&milliseconds, start, stop);
           fps = fps*.9+1000.0/milliseconds*.1;
           Mat resized;
           resize(orig, resized, Size(1352,1013));
           writeData(outputBuffer, resized, fps);
           imshow("Yolo Demo", resized);
           if( waitKey(50)%256 == 27 ) break; // stop capturing by pressing ESC
       }
       destroyWindow("Yolo Demo");
       delete net;
       free(inputBuffer);
       free(outputBuffer);
       return 0;
   }
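
The helper function writeData (not listed here) interprets the 1470-element output buffer and draws the detections. Following the layout in the YOLO paper [1], a reasonable assumption is that the vector stores the 20 class probabilities for each of the 49 grid cells first (980 values), then the 2 box confidences per cell (98 values), then the 4 coordinates of each of the 2 boxes per cell (392 values). The MATLAB sketch below splits a 1470-element vector under that assumption; the ordering used by this particular network may differ.

out = rand(1470,1);     % placeholder for the output buffer copied back from the GPU
S = 7; B = 2; C = 20;   % grid size, boxes per cell, number of classes
probs = reshape(out(1 : S*S*C), [C S S]);                       % class probabilities per cell (assumed layout)
conf  = reshape(out(S*S*C + (1 : S*S*B)), [B S S]);             % objectness confidence per box
boxes = reshape(out(S*S*C + S*S*B + (1 : S*S*B*4)), [4 B S S]); % x, y, w, h per box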

Build and Run Executable

Build the executable using the provided makefile, and run it with an input video file.

video_input = fullfile(matlabroot, ...
         'toolbox', 'vision', 'visiondata', 'viptrain.avi');
if ispc
    system('make_object_detection.bat');
    system(['object_detection_exe.exe ', video_input]);
else
    system('make -f makefile_object_detection');
    system(['./object_detection_exe ', video_input]);
end

Press the Esc key to stop capturing at any time.

Input Screenshot

Output Screenshot

References

[1] Redmon, Joseph, Santosh Divvala, Ross Girshick, and Ali Farhadi. "You Only Look Once: Unified, Real-Time Object Detection." arXiv preprint arXiv:1506.02640 (2015).