Embedded vision applications, such as network or on-vehicle cameras, must process large amounts of data in real time. An HDTV-quality signal, for example, carries more than 16 megabits per second. An algorithm designed to work on streaming data may produce correct results for individual images yet be too slow to keep up with a continuous stream of video data. The only reliable way to verify an algorithm’s real-time processing performance is to test it on embedded hardware.
At Niigata University we use MATLAB® and Simulink® to develop our algorithms and then run the algorithms on BeagleBoard hardware using the Run on Target Hardware feature of Simulink. We recently used this approach to develop video denoising and image restoration algorithms that apply directional lapped orthogonal transforms (DirLOTs). DirLOTs handle diagonal edges and textures better than the discrete cosine transforms and discrete wavelet transforms often used in this type of application.
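The transform-domain approach described above can be illustrated with a generic denoising sketch. Here dct2 stands in for the DirLOT analysis stage (the actual DirLOT Toolbox transforms are not shown), and the threshold value is an arbitrary illustrative choice:

```matlab
% Illustrative transform-domain denoising by coefficient thresholding.
% dct2/idct2 (Image Processing Toolbox) stand in for the DirLOT analysis
% and synthesis steps; lambda is an arbitrary illustrative threshold.
noisy  = im2double(imnoise(imread('cameraman.tif'), 'gaussian', 0, 0.01));
coefs  = dct2(noisy);             % forward transform (DirLOT in the real system)
lambda = 0.05;                    % hard-threshold level
coefs(abs(coefs) < lambda) = 0;   % suppress small, noise-dominated coefficients
denoised = idct2(coefs);          % inverse transform reconstructs the image
```

A directional transform such as a DirLOT replaces the DCT pair here, which is what improves the handling of diagonal edges and textures.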
MATLAB and Simulink enabled us to prove the viability of image processing with DirLOTs on an embedded processor while focusing on our top research priority—developing and improving DirLOT applications—instead of on low-level C programming tasks and hardware implementation details. One graduate student and one undergraduate student worked with me to complete and refine the hardware implementation in just two months, a task that could easily have taken six months without the visualization, optimization, signal processing, image processing, and code generation capabilities of MATLAB and Simulink.
Moving to BeagleBoard
Previously, we implemented elements of the DirLOT on an FPGA. That implementation, however, required complicated memory and I/O control logic that was difficult to code by hand. Our goal was to move the implementation to an embedded processor without manually rewriting the algorithm in another programming language. Further, we wanted to simplify the development of video input and output interfaces to the processor running the algorithm.
To meet these objectives, we chose the BeagleBoard, a low-cost, single-board computer with an Arm® Cortex®-A8 processor and digital video connectivity. The Simulink support package for BeagleBoard and the Simulink Run on Target Hardware feature enabled us to run Simulink models on BeagleBoard hardware with no manual programming. The V4L2 Video Capture and SDL Video Display blocks included in the support package made it easy to implement the input and output interfaces to the algorithm.
Developing, Debugging, and Testing the Algorithm
We began developing the DirLOT algorithm in MATLAB, Optimization Toolbox™, Wavelet Toolbox™, and Image Processing Toolbox™. After verifying the functionality of the algorithm and evaluating its performance, we shared the code on MATLAB File Exchange as the DirLOT Toolbox.
To prepare for running the algorithm on BeagleBoard hardware, we implemented core processing elements of the algorithm using MATLAB System objects™ that support C code generation. Because the System objects we employed use stream processing—with incoming video data partitioned into frames—they enabled the algorithm to handle large amounts of image data while efficiently using available memory.
To test the initial version of the algorithm and the System objects version, we downloaded unit testing frameworks developed by other MATLAB users from File Exchange. These frameworks supported our preferred test-first approach to development.
Next, we created a Simulink model, incorporating the System objects used in our algorithm as MATLAB Function blocks. We added the V4L2 Video Capture and SDL Video Display blocks, which provided the input and output interfaces for the video data. We used the Run on Target Hardware feature of Simulink to deploy the model onto the BeagleBoard hardware. Simulink External mode helped us with debugging, because it enabled us to monitor the model in Simulink as it ran on the BeagleBoard processor.
Lastly, we implemented a standalone version of the model on the BeagleBoard hardware using the Run on Target Hardware feature of Simulink, and performed functional verification of DirLOT running in real time on our lab setup (Figure 1).
We augmented the DirLOT Toolbox on File Exchange, adding the complete Simulink model to the MATLAB code we had previously shared.
From DirLOTs to NSOLTs
We are currently working on a generalization of the DirLOT algorithm that uses nonseparable oversampled lapped transforms (NSOLTs). The MATLAB code, the SaivDr (Sparsity-aware image and volume data restoration) Package, is available for download from File Exchange. NSOLTs can yield better denoising and restoration performance than other approaches while requiring less memory. Using MATLAB and Simulink, we are performing the same design optimization, performance evaluation, and code development for the NSOLT version as for the DirLOT implementation. We plan to deploy this algorithm to a Xilinx® Zynq®-7000 programmable SoC by generating HDL code for the device’s programmable logic using HDL Coder™ and generating C code for its ARM processor using Embedded Coder®.