“High-performance computing with MATLAB enables us to process previously unanalyzed big data. We translate what we learn into an understanding of how human activities affect the health of ecosystems to inform responsible decisions about what humans do in the ocean and on land.”
Dr. Christopher Clark, Cornell University
For more than 30 years, scientists have studied local animal populations by recording animal sounds in oceans, jungles, forests, and other natural environments. They use the results to assess the effect of man-made noise on natural environments, monitor endangered animal populations, and investigate animal communication. Passive acoustic monitoring systems record sounds continuously, generating terabytes of data. Scientists are often unable to process even 1% of this data because they lack the necessary advanced algorithms and processing capacity.
Bioacoustics Research Program (BRP) scientists at the Cornell Laboratory of Ornithology analyze vast amounts of acoustic data with MATLAB®, Parallel Computing Toolbox™, and MATLAB Distributed Computing Server™. The project, funded by a grant from the Office of Naval Research and the National Oceanographic Partnership Program, is led by two principal investigators from Cornell: Dr. Christopher Clark, senior scientist and director of BRP, and Dr. Peter Dugan, lead data scientist for BRP.
“MATLAB and MATLAB parallel computing tools gave us the flexibility to dynamically improve and adapt the algorithms that we use to process our big acoustic data sets,” says Dr. Clark. “If we were using C++ or a similar language, we would not be able to move as quickly or explore as many scenarios.”
Researchers analyzing acoustic data must contend with noise from weather, other animals, and nearby machinery and vehicles. The variability of animal sounds across individuals within a species is a further complication. Together, noise and variability increase the number of false positives (noise flagged as a call) and false negatives (calls that go undetected), reducing the detection algorithms’ accuracy.
Processing the hundreds of terabytes of data that BRP is gathering presents another challenge. A typical project involves processing years of raw acoustic data, up to 10 TB, recorded on multiple channels. Each channel may capture hundreds of millions of events, which are sounds that stand out when the data is viewed as a spectrogram. Algorithms tested on small, high-quality samples are often considerably less accurate when applied to larger, noisier data sets.
Lastly, BRP analysis tools must serve a wide range of research initiatives, environments, and shifting requirements. “Answers to our initial research questions often lead to brand-new avenues to explore, and we need to be able to handle these sudden changes in our requirements,” says Dr. Clark.
BRP data scientists used MATLAB to develop high-performance computing (HPC) software for automatically processing acoustic data.
They begin a detection-classification project by collecting audio clips of the animal they wish to detect, clips of background noise in the animal’s environment, and MAT-files of archived acoustic data. Working in MATLAB, they develop new or refine existing algorithms that detect audio sequences in the archived data similar to those in the clip catalog.
The algorithms use pattern matching, edge detection, connected region analysis, convolution, and other techniques supported by Image Processing Toolbox™ and Signal Processing Toolbox™, as well as machine learning techniques supported by Fuzzy Logic Toolbox™ and Neural Network Toolbox™.
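As a rough illustration of connected region analysis on a spectrogram, the sketch below thresholds a small grid of energy values and groups adjacent above-threshold cells into candidate events. It is written in Python rather than MATLAB and is not BRP's code: the function names, the toy `SPEC` grid, the threshold, and the choice of 4-connectivity are all assumptions made for the example.

```python
from collections import deque

def label_regions(spec, threshold):
    """Group 4-connected cells whose energy exceeds `threshold`.

    `spec` is a list of rows (frequency bins), each a list of energy
    values (time frames). Returns a list of regions, where each region
    is a list of (row, col) cells, in row-major order of discovery.
    """
    n_rows = len(spec)
    n_cols = len(spec[0]) if n_rows else 0
    seen = [[False] * n_cols for _ in range(n_rows)]
    regions = []
    for r in range(n_rows):
        for c in range(n_cols):
            if seen[r][c] or spec[r][c] <= threshold:
                continue
            # Breadth-first flood fill from this unvisited loud cell.
            queue = deque([(r, c)])
            seen[r][c] = True
            cells = []
            while queue:
                y, x = queue.popleft()
                cells.append((y, x))
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    inside = 0 <= ny < n_rows and 0 <= nx < n_cols
                    if inside and not seen[ny][nx] and spec[ny][nx] > threshold:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            regions.append(cells)
    return regions

def bounding_box(cells):
    """Return (min_row, max_row, min_col, max_col) for one region."""
    rows = [r for r, _ in cells]
    cols = [c for _, c in cells]
    return min(rows), max(rows), min(cols), max(cols)

# Hypothetical toy spectrogram: 4 frequency bins by 6 time frames.
SPEC = [
    [0.1, 0.2, 0.1, 0.1, 0.0, 0.1],
    [0.1, 0.9, 0.8, 0.1, 0.0, 0.1],
    [0.0, 0.1, 0.7, 0.1, 0.6, 0.9],
    [0.1, 0.0, 0.1, 0.0, 0.1, 0.8],
]

EVENTS = [bounding_box(cells) for cells in label_regions(SPEC, threshold=0.5)]
```

Each bounding box gives the frequency and time extent of one candidate event, which a later classification stage could compare against the clip catalog.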
To evaluate the accuracy of the algorithms, the researchers use Statistics Toolbox™ to compute receiver operating characteristic (ROC) curves and other performance curves.
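To make the ROC idea concrete, here is a minimal sketch of how such a curve can be computed from detector scores and hand-verified labels. It is plain Python for illustration, not the Statistics Toolbox implementation; `roc_points`, `auc`, and the sample data are hypothetical.

```python
def roc_points(labels, scores):
    """Return ROC points as (false_positive_rate, true_positive_rate).

    `labels` holds 1 for a true animal call and 0 for noise; `scores`
    holds the detector's confidence, where higher means "more likely
    a call". Sweeping the decision threshold from high to low traces
    the curve from (0, 0) to (1, 1).
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative label")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        score = ranked[i][0]
        # Consume all events tied at this score so ties yield one point.
        while i < len(ranked) and ranked[i][0] == score:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points

def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area

# Hypothetical detector output for six events.
LABELS = [1, 1, 0, 1, 0, 0]
SCORES = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
CURVE = roc_points(LABELS, SCORES)
AREA = auc(CURVE)
```

An area close to 1 means the detector ranks nearly all true calls above noise events; an area near 0.5 means it does no better than chance.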
After debugging and optimizing the algorithms on small data sets using Parallel Computing Toolbox, the scientists run them against the full archived data sets on a 64-worker cluster using MATLAB Distributed Computing Server.
The BRP team developed a MATLAB interface that enables researchers to specify the algorithms, data sets, and number of processors.
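The general pattern of choosing a detector, a data set, and a worker count can be sketched as follows. This is a Python illustration using the standard `concurrent.futures` module, not the BRP interface or MATLAB Distributed Computing Server; `count_loud_samples` is a hypothetical stand-in for a real detection algorithm, and the other names are likewise assumptions.

```python
from concurrent.futures import ProcessPoolExecutor

def count_loud_samples(chunk, threshold=0.5):
    """Hypothetical stand-in detector: count samples above `threshold`."""
    return sum(1 for sample in chunk if abs(sample) > threshold)

def split_into_chunks(samples, chunk_size):
    """Split a recording into consecutive chunks of at most `chunk_size`."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [samples[i:i + chunk_size] for i in range(0, len(samples), chunk_size)]

def run_detector(detector, chunks, n_workers, executor_cls=ProcessPoolExecutor):
    """Apply `detector` to every chunk using a pool of `n_workers` workers.

    `detector` must be a module-level function so that worker processes
    can import it. Results come back in the same order as `chunks`.
    """
    with executor_cls(max_workers=n_workers) as pool:
        return list(pool.map(detector, chunks))
```

A script would call `run_detector(count_loud_samples, split_into_chunks(samples, 3), n_workers=2)` from inside an `if __name__ == "__main__":` guard, which process pools require on some platforms. Because each chunk is processed independently, the same code can be debugged with a few workers on a small sample and then pointed at a larger pool.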
BRP collaborated with Marinexplore and the Kaggle community to sponsor a worldwide competition in which more than 240 participants submitted algorithms for detecting and classifying the upsweep contact calls of North Atlantic right whales. BRP used its MATLAB HPC platform to identify the most accurate algorithm, which will be used to help prevent ship collisions with the whales.
In addition to detection and classification algorithms, BRP uses MATLAB for noise analysis and acoustic modeling, in which the time and frequency dispersion effects of marine or terrestrial environments are captured and simulated.
Challenge: Detect and classify animal sounds in huge sets of acoustic data acquired from oceans, fields, forests, and jungles
Solution: Develop a high-performance computing platform for acoustic data analysis using MATLAB, Parallel Computing Toolbox, and MATLAB Distributed Computing Server