Object Detection

This example shows how to generate CUDA® code from a SeriesNetwork object created for the YOLO architecture and trained on the PASCAL dataset. YOLO is an object detection network that classifies the objects in an image frame and predicts their positions. Reference: You Only Look Once: Unified, Real-Time Object Detection (Joseph Redmon, Santosh Divvala, and others).

Prerequisites

  • CUDA® enabled NVIDIA® GPU with compute capability 3.2 or higher.

  • NVIDIA CUDA toolkit and driver.

  • NVIDIA cuDNN library (v7).

  • OpenCV 3.1.0 libraries for video read and image display operations.

  • Environment variables for the compilers and libraries. For information on the supported versions of the compilers and libraries, see Third-party Products. For setting up the environment variables, see Environment Variables.

  • Neural Network Toolbox™ for using SeriesNetwork objects.

Verify the GPU Environment

Use the coder.checkGpuInstall function to verify that the compilers and libraries needed for running this example are set up correctly.

coder.checkGpuInstall('gpu','codegen','cudnn','quiet');

Create a New Folder and Copy Relevant Files

The following code creates a folder in your current working folder (pwd). The new folder contains only the files that are relevant for this example. If you do not want to affect the current folder (or if you cannot generate files in it), change your working folder before running the command.

gpucoderdemo_setup('gpucoderdemo_object_detection');

Get the Pre-trained SeriesNetwork

net = getYolo();

The network contains 58 layers. Most are convolution layers, each followed by a leaky ReLU, with max pooling layers in between and fully connected layers at the end.

disp(net.Layers);
  58x1 Layer array with layers:

     1   'ImageInputLayer'        Image Input             448x448x3 images
     2   'Convolution2DLayer'     Convolution             64 7x7x3 convolutions with stride [2  2] and padding [3  3  3  3]
     3   'leakyrelu_1'            Leaky ReLU              Leaky ReLU with scale 0.1
     4   'MaxPooling2DLayer0'     Max Pooling             2x2 max pooling with stride [2  2] and padding [0  0  0  0]
     5   'Convolution2DLayer0'    Convolution             192 3x3x64 convolutions with stride [1  1] and padding [1  1  1  1]
     6   'leakyrelu_2'            Leaky ReLU              Leaky ReLU with scale 0.1
     7   'MaxPooling2DLayer1'     Max Pooling             2x2 max pooling with stride [2  2] and padding [0  0  0  0]
     8   'Convolution2DLayer1'    Convolution             128 1x1x192 convolutions with stride [1  1] and padding [0  0  0  0]
     9   'leakyrelu_3'            Leaky ReLU              Leaky ReLU with scale 0.1
    10   'Convolution2DLayer2'    Convolution             256 3x3x128 convolutions with stride [1  1] and padding [1  1  1  1]
    11   'leakyrelu_4'            Leaky ReLU              Leaky ReLU with scale 0.1
    12   'Convolution2DLayer3'    Convolution             256 1x1x256 convolutions with stride [1  1] and padding [0  0  0  0]
    13   'leakyrelu_5'            Leaky ReLU              Leaky ReLU with scale 0.1
    14   'Convolution2DLayer4'    Convolution             512 3x3x256 convolutions with stride [1  1] and padding [1  1  1  1]
    15   'leakyrelu_6'            Leaky ReLU              Leaky ReLU with scale 0.1
    16   'MaxPooling2DLayer2'     Max Pooling             2x2 max pooling with stride [2  2] and padding [0  0  0  0]
    17   'Convolution2DLayer5'    Convolution             256 1x1x512 convolutions with stride [1  1] and padding [0  0  0  0]
    18   'leakyrelu_7'            Leaky ReLU              Leaky ReLU with scale 0.1
    19   'Convolution2DLayer6'    Convolution             512 3x3x256 convolutions with stride [1  1] and padding [1  1  1  1]
    20   'leakyrelu_8'            Leaky ReLU              Leaky ReLU with scale 0.1
    21   'Convolution2DLayer7'    Convolution             256 1x1x512 convolutions with stride [1  1] and padding [0  0  0  0]
    22   'leakyrelu_9'            Leaky ReLU              Leaky ReLU with scale 0.1
    23   'Convolution2DLayer8'    Convolution             512 3x3x256 convolutions with stride [1  1] and padding [1  1  1  1]
    24   'leakyrelu_10'           Leaky ReLU              Leaky ReLU with scale 0.1
    25   'Convolution2DLayer9'    Convolution             256 1x1x512 convolutions with stride [1  1] and padding [0  0  0  0]
    26   'leakyrelu_11'           Leaky ReLU              Leaky ReLU with scale 0.1
    27   'Convolution2DLayer10'   Convolution             512 3x3x256 convolutions with stride [1  1] and padding [1  1  1  1]
    28   'leakyrelu_12'           Leaky ReLU              Leaky ReLU with scale 0.1
    29   'Convolution2DLayer11'   Convolution             256 1x1x512 convolutions with stride [1  1] and padding [0  0  0  0]
    30   'leakyrelu_13'           Leaky ReLU              Leaky ReLU with scale 0.1
    31   'Convolution2DLayer12'   Convolution             512 3x3x256 convolutions with stride [1  1] and padding [1  1  1  1]
    32   'leakyrelu_14'           Leaky ReLU              Leaky ReLU with scale 0.1
    33   'Convolution2DLayer13'   Convolution             512 1x1x512 convolutions with stride [1  1] and padding [0  0  0  0]
    34   'leakyrelu_15'           Leaky ReLU              Leaky ReLU with scale 0.1
    35   'Convolution2DLayer14'   Convolution             1024 3x3x512 convolutions with stride [1  1] and padding [1  1  1  1]
    36   'leakyrelu_16'           Leaky ReLU              Leaky ReLU with scale 0.1
    37   'MaxPooling2DLayer3'     Max Pooling             2x2 max pooling with stride [2  2] and padding [0  0  0  0]
    38   'Convolution2DLayer15'   Convolution             512 1x1x1024 convolutions with stride [1  1] and padding [0  0  0  0]
    39   'leakyrelu_17'           Leaky ReLU              Leaky ReLU with scale 0.1
    40   'Convolution2DLayer16'   Convolution             1024 3x3x512 convolutions with stride [1  1] and padding [1  1  1  1]
    41   'leakyrelu_18'           Leaky ReLU              Leaky ReLU with scale 0.1
    42   'Convolution2DLayer17'   Convolution             512 1x1x1024 convolutions with stride [1  1] and padding [0  0  0  0]
    43   'leakyrelu_19'           Leaky ReLU              Leaky ReLU with scale 0.1
    44   'Convolution2DLayer18'   Convolution             1024 3x3x512 convolutions with stride [1  1] and padding [1  1  1  1]
    45   'leakyrelu_20'           Leaky ReLU              Leaky ReLU with scale 0.1
    46   'Convolution2DLayer19'   Convolution             1024 3x3x1024 convolutions with stride [1  1] and padding [1  1  1  1]
    47   'leakyrelu_21'           Leaky ReLU              Leaky ReLU with scale 0.1
    48   'Convolution2DLayer20'   Convolution             1024 3x3x1024 convolutions with stride [2  2] and padding [1  1  1  1]
    49   'leakyrelu_22'           Leaky ReLU              Leaky ReLU with scale 0.1
    50   'Convolution2DLayer21'   Convolution             1024 3x3x1024 convolutions with stride [1  1] and padding [1  1  1  1]
    51   'leakyrelu_23'           Leaky ReLU              Leaky ReLU with scale 0.1
    52   'Convolution2DLayer22'   Convolution             1024 3x3x1024 convolutions with stride [1  1] and padding [1  1  1  1]
    53   'leakyrelu_24'           Leaky ReLU              Leaky ReLU with scale 0.1
    54   'FullyConnectedLayer'    Fully Connected         4096 fully connected layer
    55   'leakyrelu_25'           Leaky ReLU              Leaky ReLU with scale 0.1
    56   'FullyConnectedLayer1'   Fully Connected         1470 fully connected layer
    57   'softmax'                Softmax                 softmax
    58   'ClassificationLayer'    Classification Output   crossentropyex
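
The layer listing above determines the spatial size of the feature map that reaches the first fully connected layer. The following sketch traces one edge of the 448x448 input through the layers that change the spatial size, using the standard output-size formula floor((in + 2*pad - kernel)/stride) + 1. The function names outSize and yoloGridSize are illustrative and are not part of the generated code.

```cpp
#include <cassert>

// Output size along one dimension of a convolution or pooling window:
// floor((in + 2*pad - kernel) / stride) + 1.
int outSize(int in, int kernel, int stride, int pad) {
    assert(stride > 0 && in + 2 * pad >= kernel);
    return (in + 2 * pad - kernel) / stride + 1;
}

// Traces the 448-pixel input edge through the layers that change spatial size.
// The stride-1 convolutions in the listing (3x3 with padding 1, 1x1 with
// padding 0) keep the size unchanged, so they are omitted here.
int yoloGridSize() {
    int s = 448;
    s = outSize(s, 7, 2, 3);  // layer 2:  7x7 convolution, stride 2 -> 224
    s = outSize(s, 2, 2, 0);  // layer 4:  max pooling               -> 112
    s = outSize(s, 2, 2, 0);  // layer 7:  max pooling               -> 56
    s = outSize(s, 2, 2, 0);  // layer 16: max pooling               -> 28
    s = outSize(s, 2, 2, 0);  // layer 37: max pooling               -> 14
    s = outSize(s, 3, 2, 1);  // layer 48: 3x3 convolution, stride 2 -> 7
    return s;
}
```

Under this trace, the last convolution layer produces a 7x7 map with 1024 channels, so the first fully connected layer receives 7*7*1024 = 50176 values.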

Generate Code from SeriesNetwork

Generate code for the host platform.

cnncodegen(net);
nvcc -c  -rdc=true  -Xcompiler -fPIC -Xcudafe "--diag_suppress=unsigned_compare_with_zero" -O0 -g -G -arch sm_35  -I"/mathworks/devel/sbs/37/jshankar.Blcmdacore.j789535.1/matlab/toolbox/gpucoder/gpucoderdemos/gpucoderdemo_object_detection2/codegen" -I"/mathworks/hub/3rdparty/R2018a/2950900/glnxa64/cuDNN/cuda/include" -o "MWLeakyReLULayer.o" "MWLeakyReLULayer.cpp"
nvcc -c  -rdc=true  -Xcompiler -fPIC -Xcudafe "--diag_suppress=unsigned_compare_with_zero" -O0 -g -G -arch sm_35  -I"/mathworks/devel/sbs/37/jshankar.Blcmdacore.j789535.1/matlab/toolbox/gpucoder/gpucoderdemos/gpucoderdemo_object_detection2/codegen" -I"/mathworks/hub/3rdparty/R2018a/2950900/glnxa64/cuDNN/cuda/include" -o "cnn_api.o" "cnn_api.cpp"
nvcc -c  -rdc=true  -Xcompiler -fPIC -Xcudafe "--diag_suppress=unsigned_compare_with_zero" -O0 -g -G -arch sm_35  -I"/mathworks/devel/sbs/37/jshankar.Blcmdacore.j789535.1/matlab/toolbox/gpucoder/gpucoderdemos/gpucoderdemo_object_detection2/codegen" -I"/mathworks/hub/3rdparty/R2018a/2950900/glnxa64/cuDNN/cuda/include" -o "MWCNNLayerImpl.o" "MWCNNLayerImpl.cu"
nvcc -c  -rdc=true  -Xcompiler -fPIC -Xcudafe "--diag_suppress=unsigned_compare_with_zero" -O0 -g -G -arch sm_35  -I"/mathworks/devel/sbs/37/jshankar.Blcmdacore.j789535.1/matlab/toolbox/gpucoder/gpucoderdemos/gpucoderdemo_object_detection2/codegen" -I"/mathworks/hub/3rdparty/R2018a/2950900/glnxa64/cuDNN/cuda/include" -o "MWLeakyReLULayerImpl.o" "MWLeakyReLULayerImpl.cu"
nvcc -c  -rdc=true  -Xcompiler -fPIC -Xcudafe "--diag_suppress=unsigned_compare_with_zero" -O0 -g -G -arch sm_35  -I"/mathworks/devel/sbs/37/jshankar.Blcmdacore.j789535.1/matlab/toolbox/gpucoder/gpucoderdemos/gpucoderdemo_object_detection2/codegen" -I"/mathworks/hub/3rdparty/R2018a/2950900/glnxa64/cuDNN/cuda/include" -o "MWTargetNetworkImpl.o" "MWTargetNetworkImpl.cu"
nvcc -c  -rdc=true  -Xcompiler -fPIC -Xcudafe "--diag_suppress=unsigned_compare_with_zero" -O0 -g -G -arch sm_35  -I"/mathworks/devel/sbs/37/jshankar.Blcmdacore.j789535.1/matlab/toolbox/gpucoder/gpucoderdemos/gpucoderdemo_object_detection2/codegen" -I"/mathworks/hub/3rdparty/R2018a/2950900/glnxa64/cuDNN/cuda/include" -o "cnn_exec.o" "cnn_exec.cpp"
nvcc -lib -Xlinker -rpath,"/bin/glnxa64",-L"/bin/glnxa64" -lc -Xnvlink -w -Wno-deprecated-gpu-targets -g -G -arch sm_35  -o cnnbuild.a MWLeakyReLULayer.o cnn_api.o MWCNNLayerImpl.o MWLeakyReLULayerImpl.o MWTargetNetworkImpl.o cnn_exec.o -L".." "/mathworks/hub/3rdparty/R2018a/2950900/glnxa64/cuDNN/cuda/lib64/libcudnn.so" -lcublas -lcudart -lcusolver 
### Created: cnnbuild.a
### Successfully generated all binary outputs.

Generated Code Description

The cnncodegen command generates the .cu, .cpp, and header files in the 'codegen' directory of the current folder and compiles them into a static library, 'cnnbuild.a'.

The SeriesNetwork is generated as a C++ class containing an array of 58 layer classes and three public methods: setup(), predict(), and cleanup().

   class CnnMain
   {
     ....
     public:
       CnnMain();
       void setup();
       void predict();
       void cleanup();
       ~CnnMain();
   };

The setup() method of the class sets up handles and allocates memory for each layer object. The predict() method invokes prediction for each of the 58 layers in the network.

The files cnn_CnnMain_Convolution2DLayer*_w and cnn_CnnMain_Convolution2DLayer*_b are the binary weight and bias files for the convolution layers in the network. The files cnn_CnnMain_FullyConnectedLayer*_w and cnn_CnnMain_FullyConnectedLayer*_b are the binary weight and bias files for the fully connected layers in the network.
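
As a sanity check on these files, the number of values each one should hold follows from the layer listing. The sketch below computes those counts. The helper names are hypothetical, and the on-disk format is an assumption: this page does not state it, so the byte sizes below only apply if the values are stored as raw single-precision floats with no header.

```cpp
#include <cassert>
#include <cstddef>

// Number of weights in a convolution layer: one h x w x channels kernel per filter.
std::size_t convWeightCount(std::size_t filters, std::size_t h,
                            std::size_t w, std::size_t channels) {
    assert(filters > 0 && h > 0 && w > 0 && channels > 0);
    return filters * h * w * channels;
}

// Number of weights in a fully connected layer: inputs x outputs.
std::size_t fcWeightCount(std::size_t inputs, std::size_t outputs) {
    assert(inputs > 0 && outputs > 0);
    return inputs * outputs;
}

// Expected file size in bytes, ASSUMING raw float32 storage with no header.
std::size_t expectedBytes(std::size_t count) {
    return count * sizeof(float);
}
```

For example, the first convolution layer (64 filters of size 7x7x3) has convWeightCount(64, 7, 7, 3) = 9408 weights and 64 biases, and the last fully connected layer (4096 inputs, 1470 outputs) has fcWeightCount(4096, 1470) = 6021120 weights.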

dir('codegen')
.                                   cnn_CnnMain_Convolution2DLayer20_w  
..                                  cnn_CnnMain_Convolution2DLayer21_b  
MWCNNLayerImpl.cu                   cnn_CnnMain_Convolution2DLayer21_w  
MWCNNLayerImpl.hpp                  cnn_CnnMain_Convolution2DLayer22_b  
MWCNNLayerImpl.o                    cnn_CnnMain_Convolution2DLayer22_w  
MWLeakyReLULayer.cpp                cnn_CnnMain_Convolution2DLayer2_b   
MWLeakyReLULayer.hpp                cnn_CnnMain_Convolution2DLayer2_w   
MWLeakyReLULayer.o                  cnn_CnnMain_Convolution2DLayer3_b   
MWLeakyReLULayerImpl.cu             cnn_CnnMain_Convolution2DLayer3_w   
MWLeakyReLULayerImpl.hpp            cnn_CnnMain_Convolution2DLayer4_b   
MWLeakyReLULayerImpl.o              cnn_CnnMain_Convolution2DLayer4_w   
MWTargetNetworkImpl.cu              cnn_CnnMain_Convolution2DLayer5_b   
MWTargetNetworkImpl.hpp             cnn_CnnMain_Convolution2DLayer5_w   
MWTargetNetworkImpl.o               cnn_CnnMain_Convolution2DLayer6_b   
cnn_CnnMain_Convolution2DLayer0_b   cnn_CnnMain_Convolution2DLayer6_w   
cnn_CnnMain_Convolution2DLayer0_w   cnn_CnnMain_Convolution2DLayer7_b   
cnn_CnnMain_Convolution2DLayer10_b  cnn_CnnMain_Convolution2DLayer7_w   
cnn_CnnMain_Convolution2DLayer10_w  cnn_CnnMain_Convolution2DLayer8_b   
cnn_CnnMain_Convolution2DLayer11_b  cnn_CnnMain_Convolution2DLayer8_w   
cnn_CnnMain_Convolution2DLayer11_w  cnn_CnnMain_Convolution2DLayer9_b   
cnn_CnnMain_Convolution2DLayer12_b  cnn_CnnMain_Convolution2DLayer9_w   
cnn_CnnMain_Convolution2DLayer12_w  cnn_CnnMain_Convolution2DLayer_b    
cnn_CnnMain_Convolution2DLayer13_b  cnn_CnnMain_Convolution2DLayer_w    
cnn_CnnMain_Convolution2DLayer13_w  cnn_CnnMain_FullyConnectedLayer1_b  
cnn_CnnMain_Convolution2DLayer14_b  cnn_CnnMain_FullyConnectedLayer1_w  
cnn_CnnMain_Convolution2DLayer14_w  cnn_CnnMain_FullyConnectedLayer_b   
cnn_CnnMain_Convolution2DLayer15_b  cnn_CnnMain_FullyConnectedLayer_w   
cnn_CnnMain_Convolution2DLayer15_w  cnn_CnnMain_labels.txt              
cnn_CnnMain_Convolution2DLayer16_b  cnn_api.cpp                         
cnn_CnnMain_Convolution2DLayer16_w  cnn_api.hpp                         
cnn_CnnMain_Convolution2DLayer17_b  cnn_api.o                           
cnn_CnnMain_Convolution2DLayer17_w  cnn_exec.cpp                        
cnn_CnnMain_Convolution2DLayer18_b  cnn_exec.hpp                        
cnn_CnnMain_Convolution2DLayer18_w  cnn_exec.o                          
cnn_CnnMain_Convolution2DLayer19_b  cnnbuild.a                          
cnn_CnnMain_Convolution2DLayer19_w  cnnbuild_rtw.mk                     
cnn_CnnMain_Convolution2DLayer1_b   rtwtypes.h                          
cnn_CnnMain_Convolution2DLayer1_w   tmwtypes.h                          
cnn_CnnMain_Convolution2DLayer20_b  

Main File

The main file creates and sets up the CnnMain network object with layers and weights. It uses the OpenCV VideoCapture class to read frames from the input video. It runs prediction for each frame and fetches the output from the final fully connected layer.

The class probabilities and bounding box values are read from the output array and displayed.
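
The following sketch illustrates how class probabilities and box confidences could be combined from the 1470-element output. It relies on the YOLO paper's configuration of a 7x7 grid, 2 boxes per cell, and 20 PASCAL classes, which gives 7*7*(20 + 2*5) = 1470 values. The memory layout (all class probabilities first, then box confidences, then box coordinates) is an assumption borrowed from the original YOLO implementation; this page does not document the layout that writeData expects. The names Detection and bestDetection are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

constexpr int kGrid = 7, kBoxes = 2, kClasses = 20;
constexpr int kCells = kGrid * kGrid;                         // 49 grid cells
constexpr int kConfOffset = kCells * kClasses;                // 980 (assumed layout)
constexpr int kBoxOffset = kConfOffset + kCells * kBoxes;     // 1078 (assumed layout)
constexpr int kOutputSize = kBoxOffset + kCells * kBoxes * 4; // 1470
static_assert(kOutputSize == 1470, "must match the final fully connected layer");

struct Detection { int cell; int box; int cls; float score; };

// Returns the (cell, box, class) triple with the highest class-specific score,
// where score = class probability * box confidence, as in the YOLO paper.
Detection bestDetection(const std::vector<float>& out) {
    assert(out.size() == static_cast<std::size_t>(kOutputSize));
    Detection best{-1, -1, -1, std::numeric_limits<float>::lowest()};
    for (int cell = 0; cell < kCells; ++cell) {
        for (int box = 0; box < kBoxes; ++box) {
            const float conf = out[kConfOffset + cell * kBoxes + box];
            for (int cls = 0; cls < kClasses; ++cls) {
                const float score = out[cell * kClasses + cls] * conf;
                if (score > best.score) best = Detection{cell, box, cls, score};
            }
        }
    }
    return best;
}
```

Under the same assumed layout, the four coordinates of the winning box would start at index kBoxOffset + (cell * kBoxes + box) * 4. A complete detector would also threshold the scores and apply non-maximum suppression rather than keep only the single best score.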

   int main(int argc, char* argv[])
   {
       // Check the command line before allocating any resources.
       if (argc < 2)
       {
           printf("Pass in input video file name as argument\n");
           return -1;
       }
       // Host buffers: one 448x448x3 input frame and the 1470-element output.
       float *inputBuffer = (float*)calloc(448*448*3, sizeof(float));
       float *outputBuffer = (float*)calloc(1470, sizeof(float));
       if ((inputBuffer == NULL) || (outputBuffer == NULL)) {
           printf("ERROR: Input/Output buffers could not be allocated!\n");
           exit(-1);
       }
       CnnMain* net = new CnnMain;
       net->batchSize = 1;
       net->setup();
       VideoCapture cap(argv[1]);
       if (!cap.isOpened()) {
           printf("Could not open the video capture device.\n");
           return -1;
       }
       namedWindow("Yolo Demo",CV_WINDOW_NORMAL);
       cvMoveWindow("Yolo Demo", 0, 0);
       resizeWindow("Yolo Demo", 1352,1013);
       float fps = 0;
       cudaEvent_t start, stop;
       cudaEventCreate(&start);
       cudaEventCreate(&stop);
       for(;;)
       {
           Mat orig;
           cap >> orig;
           if (orig.empty()) break;
           Mat im;
           readData(inputBuffer, orig, im);
           cudaEventRecord(start);
           cudaMemcpy(net->inputData,
                      inputBuffer,
                      sizeof(float)*448*448*3,
                      cudaMemcpyHostToDevice);
           net->predict();
           cudaMemcpy(outputBuffer,
                      net->layers[55]->getData(),
                      sizeof(float)*1470,
                      cudaMemcpyDeviceToHost);
           cudaEventRecord(stop);
           cudaEventSynchronize(stop);
           float milliseconds = -1.0;
           cudaEventElapsedTime(&milliseconds, start, stop);
           fps = fps*.9+1000.0/milliseconds*.1;
           Mat resized;
           resize(orig, resized, Size(1352,1013));
           writeData(outputBuffer, resized, fps);
           imshow("Yolo Demo", resized);
           if( waitKey(50)%256 == 27 ) break; // stop capturing by pressing ESC
       }
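       // Release the CUDA timing events created before the loop.
       cudaEventDestroy(start);
       cudaEventDestroy(stop);
       destroyWindow("Yolo Demo");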
       delete net;
       free(inputBuffer);
       free(outputBuffer);
       return 0;
   }
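
The frame-rate value in the loop above is an exponential moving average: each new frame contributes 10% of its instantaneous rate (1000/milliseconds) and the previous estimate keeps 90%. The sketch below isolates that update in a helper with the hypothetical name updateFps. It uses single-precision constants, whereas the listing computes in double precision and stores the result in a float.

```cpp
#include <cassert>

// One step of the smoothing used in the main loop:
// fps = 0.9 * previous estimate + 0.1 * instantaneous rate.
float updateFps(float fps, float milliseconds) {
    assert(milliseconds > 0.0f);  // elapsed time must be positive
    return fps * 0.9f + (1000.0f / milliseconds) * 0.1f;
}
```

Because the estimate starts at 0, the displayed value ramps up over the first frames. At a constant rate r, the estimate after n frames is r*(1 - 0.9^n), so it reaches about 65% of r after 10 frames.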

Build and Run Executable

Build the executable and run it with an input video file.

video_input = fullfile(matlabroot, ...
    'toolbox', 'vision', 'visiondata', 'viptrain.avi');
if ispc
    system('make_win.bat');
    system(['object_detection_exe.exe ', video_input]);
else
    system('make');
    system(['./object_detection_exe ', video_input]);
end

Press Esc to stop capturing at any time.

Input Screenshot

Output Screenshot

Cleanup

Remove the files and return to the original folder.

cleanup