Drass Develops Deep Learning System for Real-Time Object Detection in Maritime Environments
Challenge: Help ship operators monitor sea environments and detect objects, obstacles, and other ships
Solution: Create an object-detection deep learning model that can be deployed on ships and run in real time
Results:
- Data labeling automated
- Development time reduced
- Flexible and reproducible framework established
“From data annotation to choosing, training, testing, and fine-tuning our deep learning model, MATLAB had all the tools we needed, and GPU Coder enabled us to rapidly deploy to our NVIDIA GPUs even though we had limited GPU experience.” Valerio Imbriolo, Drass Group
To ensure a safe passage, ship operators must track objects, obstacles, and other ships throughout the voyage. Detecting objects at sea is challenging, however, because the constant motion of the waves changes the background, and the open sea provides few reference points.
To address these challenges, maritime technology company Drass developed a deep learning model for real-time object detection at sea. Even though this was the team’s first deep learning application, MATLAB® enabled them to train and validate a YOLOv2 model two months ahead of their deadline and then integrate it with a C++ application running on the ships.
“From prototyping to integration, MATLAB had tools that made every step of the project easy,” says Valerio Imbriolo, computer vision engineer at Drass. “We were able to finish the object detection application in seven months and were ready for testing in 10 months. In the remaining two months we developed additional features.”
There are no pretrained object-detection models for maritime environments, which meant that the Drass team had to develop and train their own deep learning network from scratch. Traditional object detection systems use a single visual input source. In contrast, the Drass team’s deep learning model would need to merge input from multiple sources, including daylight and thermal cameras, which required additional adjustments to the network architecture and data preprocessing pipeline.
Because Drass wanted to test the object detection model on multiple sea targets, the team had to create, preprocess, and label their own data set, a laborious and time-consuming task.
The team had to develop their model so that it could be integrated into the main object-detection application used on the ships, which was written in C++. Given the specialized nature of their application, a great deal of compute-intensive testing and tweaking would be required to find the best-performing model configuration.
All these tasks needed to be completed within 12 months, a tight deadline given that this would be their first deep learning project.
The Drass team used Deep Learning Toolbox™ to create and train a prototype of their neural network and GPU Coder™ to integrate it into their C++ application.
The team started with a data set of 5,000 frames of unlabeled raw footage taken at sea. Using Image Processing Toolbox™, they preprocessed the images, removing noise and lens distortion. Wavelet Toolbox™ helped them merge data from the daylight and thermal cameras into a single data source.
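The article does not say which fusion rule the team applied, so the following is only a minimal Python sketch of one common wavelet-based approach, not the MATLAB code Drass used. It assumes a one-level, unnormalized Haar-style transform, two registered grayscale images of equal size with even dimensions, averaging of the coarse band, and keeping the larger-magnitude detail coefficient. The function names `haar2d`, `inverse_haar2d`, and `fuse` are hypothetical.

```python
def haar2d(img):
    """One-level Haar-style transform of a 2-D list with even height and width.

    Returns four half-size bands: the local average (a) and three difference bands.
    """
    a, h, v, d = [], [], [], []
    for r in range(0, len(img), 2):
        ra, rh, rv, rd = [], [], [], []
        for c in range(0, len(img[0]), 2):
            p, q = img[r][c], img[r][c + 1]
            s, t = img[r + 1][c], img[r + 1][c + 1]
            ra.append((p + q + s + t) / 4)  # average of the 2x2 block
            rh.append((p - q + s - t) / 4)  # left column minus right column
            rv.append((p + q - s - t) / 4)  # top row minus bottom row
            rd.append((p - q - s + t) / 4)  # diagonal difference
        a.append(ra)
        h.append(rh)
        v.append(rv)
        d.append(rd)
    return a, h, v, d


def inverse_haar2d(a, h, v, d):
    """Rebuild the full-size image from the four bands produced by haar2d."""
    rows, cols = len(a), len(a[0])
    out = [[0.0] * (2 * cols) for _ in range(2 * rows)]
    for r in range(rows):
        for c in range(cols):
            A, H, V, D = a[r][c], h[r][c], v[r][c], d[r][c]
            out[2 * r][2 * c] = A + H + V + D
            out[2 * r][2 * c + 1] = A - H + V - D
            out[2 * r + 1][2 * c] = A + H - V - D
            out[2 * r + 1][2 * c + 1] = A - H - V + D
    return out


def fuse(visible, thermal):
    """Fuse two registered images (assumed rule: mean average band, max-abs details)."""
    fused_bands = []
    for i, (bv, bt) in enumerate(zip(haar2d(visible), haar2d(thermal))):
        if i == 0:
            pick = lambda x, y: (x + y) / 2                   # blend overall brightness
        else:
            pick = lambda x, y: x if abs(x) >= abs(y) else y  # keep the stronger edge
        fused_bands.append(
            [[pick(x, y) for x, y in zip(row_v, row_t)] for row_v, row_t in zip(bv, bt)]
        )
    return inverse_haar2d(*fused_bands)


# A flat daylight patch fused with a thermal patch that contains a vertical edge.
fused = fuse([[10, 10], [10, 10]], [[0, 4], [0, 4]])
```

In this toy case the fused patch keeps the thermal edge while its brightness sits midway between the two inputs, which is the behavior a detector trained on a single fused source would rely on.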
The team labeled a small portion of their data set with the Video Labeler app. They created a YOLOv2 object detection model in Deep Learning Toolbox and trained it on the subset of labeled data. The partially trained model helped them automate the process of annotating the rest of the data set in Video Labeler. The team then augmented their data set by creating copies of the training examples and modifying them by adding noise, flipping images, or changing the coloring.
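One detail of the augmentation step is easy to get wrong: a geometric change such as a flip must also be applied to the bounding-box labels, while adding noise leaves them untouched. The sketch below illustrates this in plain Python rather than in the MATLAB tools the team used. It assumes images stored as lists of pixel rows, boxes given as `[x, y, w, h]` with a 0-based left edge, and pixel values in the range 0 to 255; the color change mentioned above is omitted. The function names are hypothetical.

```python
import random


def flip_horizontal(image, boxes):
    """Mirror an image left to right and move its [x, y, w, h] boxes to match."""
    width = len(image[0])
    flipped = [row[::-1] for row in image]
    # A box spanning columns x .. x+w-1 lands on columns width-x-w .. width-x-1.
    new_boxes = [[width - x - w, y, w, h] for x, y, w, h in boxes]
    return flipped, new_boxes


def add_noise(image, sigma, rng):
    """Add Gaussian pixel noise, clamped to 0..255; the boxes do not change."""
    return [
        [min(255, max(0, round(p + rng.gauss(0, sigma)))) for p in row]
        for row in image
    ]


def augment(image, boxes, rng, sigma=5.0):
    """Return the original example plus one flipped copy and one noisy copy."""
    flipped_image, flipped_boxes = flip_horizontal(image, boxes)
    noisy_image = add_noise(image, sigma, rng)
    return [(image, boxes), (flipped_image, flipped_boxes), (noisy_image, boxes)]


# One-row toy image with a bright object in columns 1 and 2.
image = [[0, 9, 9, 0, 0, 0]]
boxes = [[1, 0, 2, 1]]
examples = augment(image, boxes, random.Random(0))
```

After the flip, the object occupies columns 3 and 4 and its box starts at x = 3, so the label still covers the object.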
To speed up the process of configuring the network for the target application, the Drass team used Parallel Computing Toolbox™ to run multiple instances of the training and optimization process on high-capacity CPU and GPU clusters.
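The general pattern behind this kind of search is to enumerate candidate configurations, evaluate them concurrently, and keep the best one. The Python sketch below shows that pattern only; it is not the team's Parallel Computing Toolbox setup. The helper names, the hyperparameters in the grid, and `toy_evaluate` are all hypothetical, and the toy objective stands in for a real train-and-validate run. Threads are used here so the example runs anywhere; real training jobs would go to separate processes or cluster workers.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def make_grid(**options):
    """Expand keyword lists into every combination, one dict per configuration."""
    names = list(options)
    return [dict(zip(names, values)) for values in product(*options.values())]


def sweep(configs, evaluate, max_workers=4):
    """Evaluate all configurations concurrently; return (best_config, best_score)."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        scores = list(pool.map(evaluate, configs))  # results keep the input order
    best = max(range(len(configs)), key=scores.__getitem__)
    return configs[best], scores[best]


def toy_evaluate(config):
    """Placeholder objective, NOT a real training run: higher is better."""
    return -abs(config["learning_rate"] - 1e-3) - abs(config["batch_size"] - 16) / 1000


grid = make_grid(learning_rate=[1e-2, 1e-3, 1e-4], batch_size=[8, 16])
best_config, best_score = sweep(grid, toy_evaluate)
```

Because `evaluate` is passed in as an argument, the same sweep can wrap any scoring function, for example one that trains a detector and returns its validation accuracy.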
GPU Coder translated the prototype model into CUDA® code for the NVIDIA® GPU used in the ships. The code, written in C++, was then passed to the programming team for integration with the on-ship application.
- Data labeling automated. “Manual labeling would have taken three minutes per frame, or 250 hours for the entire data set,” Imbriolo says. “By using the automated labeling process in Video Labeler, we reduced it to 0.3 seconds per frame. We were able to label and verify 5,000 frames in 42 hours.”
- Development time reduced. “We finished our project in 10 months and used the remaining two months before the deadline to implement additional features for the client,” Imbriolo says. “Without MATLAB and GPU Coder, it would probably have taken 18 months for a full team of C++ engineers, working 40 hours a week, to develop and test the model from scratch.”
- Flexible and reproducible framework established. “In the future, we will be able to modify, update, retrain, and reintegrate the model in the application with minimal effort using our MATLAB framework,” Imbriolo says. “For example, if someone asked me to introduce a YOLOv3 model instead of YOLOv2, I could present the result in about three days.”
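The labeling figures quoted in the first result can be checked with a little arithmetic. The short Python calculation below uses only the numbers given in the quote (5,000 frames, three minutes per frame manually, 0.3 seconds per frame automatically, 42 hours in total). The conclusion that most of the 42 hours went to verification is an inference from those numbers, not something the source states.

```python
FRAMES = 5000

manual_hours = FRAMES * 3 / 60       # 3 minutes per frame, converted to hours
auto_minutes = FRAMES * 0.3 / 60     # 0.3 seconds per frame, converted to minutes
reported_total_hours = 42            # labeling plus verification, as quoted

hours_saved = manual_hours - reported_total_hours
speedup = manual_hours / reported_total_hours

print(f"manual labeling:   {manual_hours:.0f} hours")
print(f"automated pass:    {auto_minutes:.0f} minutes")
print(f"hours saved:       {hours_saved:.0f}")
print(f"overall speedup:   {speedup:.1f}x")
```

The automated pass itself accounts for well under an hour, so the overall reduction from about 250 hours to 42 hours is roughly a sixfold saving even with human verification included.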