Object Detection on Lidar Point Clouds Using Deep Learning
Learn how to use a PointPillars deep learning network for 3D object detection on lidar point clouds using Lidar Toolbox™ features. PointPillars networks address some of the common challenges in training robust detectors, such as sparsity of data per object, object occlusion, and sensor noise.
Hello everyone! In this video, we'll go over how to apply deep learning to lidar data for tasks such as object detection and semantic segmentation.
Deep learning helps perceive the environment in autonomous driving and robotics applications by identifying and classifying objects in the scene.
To demonstrate deep learning with lidar, we will follow an example from the MathWorks documentation that uses a deep learning network called PointPillars for 3D object detection on point cloud data.
PointPillars networks address some of the common challenges in training robust detectors, such as sparsity of data per object, object occlusion, and sensor noise, by using an encoder that learns a representation of point clouds organized in vertical columns called pillars.
This is the workflow followed in the example:
First, we load the data and then preprocess it. We then define the PointPillars network followed by training the network on the preprocessed data, and
finally, we test the network to evaluate its performance.
Let us start by downloading the point cloud dataset from this URL.
The dataset contains around 1600 organized lidar point cloud scans of highway scenes and corresponding ground-truth labels for the car and truck objects.
This ground truth data was labeled using the Lidar Labeler app in MATLAB.
You can load point cloud data into the app, define cuboid bounding boxes around objects, and automate labeling with built-in or custom automation algorithms.
Once you have labeled the data, you can export it to the MATLAB workspace.
We will preprocess the loaded data in three steps.
First, we will crop the full-view point cloud to the front-view point cloud, that is, the ego vehicle perspective. This reduces the size of the data, which in turn decreases overall network training time.
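The cropping step can be sketched in MATLAB as follows. This is a minimal illustration, not the example's exact helper code: the file name and the ROI limits below are placeholder values.

```matlab
% Read one lidar scan (placeholder file name for illustration)
ptCloud = pcread("sampleScan.pcd");

% Region of interest in front of the ego vehicle:
% [xmin xmax ymin ymax zmin zmax] in meters (illustrative limits)
roi = [0 70 -40 40 -3 1];

% Keep only the points inside the front-view ROI
indices = findPointsInROI(ptCloud, roi);
frontViewPtCloud = select(ptCloud, indices);
```

Because the cropped cloud contains far fewer points than the full 360-degree scan, every later stage of the pipeline gets cheaper.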
In the second step we will split the data into a training and testing set.
We will use fileDatastore to manage the point clouds. A fileDatastore is an object that manages collections of custom-format files that are too large to fit in memory.
We will then create another datastore for loading the ground truth labels with 3-D bounding boxes.
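A sketch of the two datastores might look like this. The folder path and the label table variable (`gtLabels`) are assumptions for illustration, not names from the example:

```matlab
% Datastore over the point cloud files, read with pcread
lidarPath = fullfile(tempdir, "Lidar", "PointClouds");   % placeholder path
lds = fileDatastore(lidarPath, "ReadFcn", @(f) pcread(f));

% gtLabels is assumed to be a table with one column per class,
% each cell holding the 3-D cuboid boxes for a scan
bds = boxLabelDatastore(gtLabels);

% Pair each point cloud with its ground-truth boxes for training
cds = combine(lds, bds);
```

Combining the two datastores yields a single datastore that returns a point cloud together with its boxes on each read, which is the form the training step expects.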
In the third step, we will perform ground truth data augmentation, which randomly adds a fixed number of car-class objects to every point cloud. This technique helps improve network accuracy by synthetically increasing the size of the training dataset.
We will also apply augmentation techniques such as flipping, scaling, rotation, and translation.
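A rotation, translation, and scaling augmentation can be sketched as below. The parameter ranges are illustrative, not the example's exact settings, and `ptCloud` stands in for one training scan:

```matlab
% Random augmentation parameters (illustrative ranges)
angle = (rand - 0.5) * 2 * 45;        % rotation about Z, +/- 45 degrees
scale = 0.95 + rand * 0.1;            % uniform scale in [0.95, 1.05]
shift = (rand(1,3) - 0.5) * 0.5;      % small random translation in meters

% Rigid transform: Euler angles about the x-, y-, z-axes, then translation
% (rigidtform3d requires R2022b or later)
tform = rigidtform3d([0 0 angle], shift);
augmented = pctransform(ptCloud, tform);

% Apply the scale directly to the point locations
augmented = pointCloud(augmented.Location * scale);
```

When you transform a point cloud like this, the ground-truth cuboid boxes must be transformed with the same parameters so labels stay aligned with the points.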
Next, we will build the PointPillars network to train on this data.
A PointPillars network accepts point clouds as input and estimates oriented 3D boxes around the objects. It consists of three main stages:
1. A feature encoder that converts a point cloud to a sparse pseudo-image. It does this by converting the point cloud into pillars and then using a simplified version of PointNet to learn a representation of the points organized as pillars.
2. A 2D convolutional backbone that processes the pseudo-image into a high-level representation.
3. A detection head that detects objects and creates 3D bounding boxes around them.
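Lidar Toolbox also provides a pre-assembled version of this architecture. A minimal sketch of creating one is shown below; the point cloud range and anchor box values here are illustrative, not the example's exact settings:

```matlab
% Detection range: [xmin xmax ymin ymax zmin zmax] in meters (illustrative)
pcRange = [0 69.12 -39.68 39.68 -5 5];

classNames = ["Car" "Truck"];

% One M-by-5 matrix of anchors per class:
% [length width height z-center yaw] (illustrative sizes)
anchorBoxes = { ...
    [1.9 4.5 1.7 -1.78 0;  1.9 4.5 1.7 -1.78 pi/2], ...   % Car
    [2.5 6.0 3.0 -0.80 0;  2.5 6.0 3.0 -0.80 pi/2]};      % Truck

detector = pointPillarsObjectDetector(pcRange, classNames, anchorBoxes);
```

This object bundles the feature encoder, 2D backbone, and detection head described above into a single detector that can be passed to the training and detection functions.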
Alternatively, we can construct this network at the MATLAB command line using built-in deep learning layers such as input, convolution, transposed convolution, batch normalization, max pooling, and ReLU layers.
Next, we will define training options such as the number of epochs and the learning rate. We will set the execution environment to 'auto', which trains the network on a GPU if one is available and otherwise falls back to the CPU.
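The training options can be sketched as below. The epoch count, learning rate, and batch size are illustrative values rather than the example's exact settings:

```matlab
% Illustrative training options for the PointPillars detector
options = trainingOptions("adam", ...
    MaxEpochs = 60, ...
    InitialLearnRate = 2e-4, ...
    MiniBatchSize = 2, ...
    ExecutionEnvironment = "auto");   % GPU if available, otherwise CPU

% Training itself would then use the combined datastore and detector,
% e.g. via trainPointPillarsObjectDetector (commented out here because
% full training takes a long time):
% [detector, info] = trainPointPillarsObjectDetector(cds, detector, options);
```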
Training PointPillars on the entire dataset takes a long time, so for now I am loading a pretrained PointPillars network to detect objects in the lidar data.
We can now load the test data and apply the trained network to generate bounding boxes around the identified objects.
Finally, we have our objects detected in the point cloud data. Here, the red bounding boxes represent predicted objects and the green bounding boxes are ground truth labels. You can see that they are very close, which means our model's predictions are reasonably good.
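The detection and visualization step can be sketched as follows. The variable names (`ptCloud`, `groundTruthBoxes`) are placeholders and the confidence threshold is illustrative:

```matlab
% Run the pretrained detector on one test scan
[bboxes, scores, labels] = detect(detector, ptCloud, Threshold = 0.5);

% Display the scan and overlay the cuboids:
% predicted boxes in red, ground truth in green
ax = pcshow(ptCloud.Location);
showShape("cuboid", bboxes, Parent = ax, Color = "red", Opacity = 0.1);
hold(ax, "on");
showShape("cuboid", groundTruthBoxes, Parent = ax, Color = "green", Opacity = 0.1);
```

Comparing the red and green cuboids in the display gives a quick qualitative check of the detector before running a full quantitative evaluation.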
Follow the links below to learn more. If you have any questions or comments, please let us know through the comment box below.
Thanks for watching.