This is machine translation

Translated by Microsoft
Mouseover text to see original. Click the button below to return to the English verison of the page.

Note: This page has been translated by MathWorks. Please click here
To view all translated materals including this page, select Japan from the country navigator on the bottom of this page.

Live Direction Of Arrival Estimation with a Linear Microphone Array

This example shows how to acquire and process live multichannel audio. It also presents a simple algorithm for estimating the Direction Of Arrival (DOA) of a sound source using multiple microphone pairs within a linear array.

Select and configure the source of audio samples

If a multichannel input audio interface is available, then modify this script to set sourceChoice to 'live'. In this mode the example will use live audio input signals - it will also assume all inputs (two or more) are driven by microphones arranged on a linear array. If no microphone array or multichanel audio card is available, then set sourceChoice to 'recorded'. In this mode the example will use prerecorded audio samples, acquired with a linear array. For sourceChoice = 'live', the following code uses audioDeviceReader to acquire 4 live audio channels through a Microsoft Kinect™ for Windows®. To use another microphone array setup, ensure the installed audio device driver is one of the conventional types supported by MATLAB and set the Device property of audioDeviceReader accordingly. You can query valid Device assignments for your computer by calling getAudioDevices method of audioDeviceReader object. Note that even when using Microsoft Kinect, the device name can vary across machines and may not match the one used in this example. Use tab completion to get the correct name on your machine.

% sourceChoice = {'recorded'|'live'}
sourceChoice = 'recorded';

% Set for how long the live processing should last
endTime = 20;
% And how many samples per channel should be acquired and processed at each
% iteration
audioFrameLength = 3200;

switch sourceChoice
    case 'live'
        fs = 16000;
        audioInput = audioDeviceReader(...
            'Device', 'Microphone Array (Microsoft Kinect USB Audio)',...
            'SampleRate', fs, ...
            'NumChannels', 4,...
            'SamplesPerFrame', audioFrameLength);
    case 'recorded'
        % The following audio files holds a 20-second recording of 4 raw
        % audio channels acquired with a Microsoft Kinect(TM) for
        % Windows(R) in the presence of a noisy source moving in front of
        % the array roughly from -40 to about +40 degrees and then back
        % to the initial position.
        audioFileName = 'AudioArray-16-16-4channels-20secs.wav';
        audioInput = dsp.AudioFileReader(...
            'Filename', audioFileName,...
            'PlayCount', inf, ...
            'SamplesPerFrame', audioFrameLength);
        fs = audioInput.SampleRate;

Define Array Geometry

The following values identify the approximate linear coordinates of the 4 built-in microphones of the Microsoft Kinect™ relative to the position of the RGB camera (not used in this example). For 3D coordinates use [[x1;y1;z1], [x2;y2;z2], ..., [xN;yN;zN]]

micPositions = [-0.088, 0.042, 0.078, 0.11];

Form Microphone Pairs

The algorithm used in this example works with pairs of microphones independently. It then combines the individual DOA estimates to provide a single live DOA output. The more pairs available, the more robust (yet computationally expensive) DOA estimation. The maximum number of pairs available can be computed through nchoosek(length(micPositions),2). In this case the 3 pairs with the largest inter-microphone distances are selected. The larger the inter-microphone distance the more sensitive the DOA estimate. Each column of the following matrix describes a choice of microphone pair within the array. All values must be integers between 1 and length(micPositions).

micPairs = [1 4; 1 3; 1 2];
numPairs = size(micPairs, 1);

Initialize DOA visualization

% Create an instance of the helper plotting object DOADisplay. This will be
% display the estimated DOA live with an arrow in a polar plot.
DOAPointer = audioexample.DOADisplay();

Create and configure the algorithmic building blocks

Audio Frame lengths

bufferLength = 64;

% Use a helper object to rearrange the input samples according the how the
% microphone pairs are selected
preprocessor = audioexample.PairArrayPreprocessor(...
    'MicPositions', micPositions,...
    'MicPairs', micPairs,...
    'BufferLength', bufferLength);
micSeparations = getPairSeparations(preprocessor);

% The main algorithmic builing block of this example is a cross-correlator.
% That is used in conjunction with an interpolator to ensure a finer DOA
% resolution. In this simple case it is sufficient to use the same two
% objects across the different pairs available. In general, however,
% different channels may need to independently save their internal states
% and hence to be handled by separate objects.
XCorrelator = dsp.Crosscorrelator(...
    'Method', 'Frequency Domain');
interpFactor = 8;
b = interpFactor * fir1((2*interpFactor*8-1),1/interpFactor);
groupDelay = median(grpdelay(b));
interpolator = dsp.FIRInterpolator(...

Acquire and process signals in a loop

For each iteration of the following while loop: read audioFrameLength samples for each audio channel, process the data to estimate a DOA value and display the result on a bespoke arrow-based polar visualization.

while(toc < endTime)
    cycleStart = toc;
    % Read a multichannel frame from the audio source
    % The returned array is of size AudioFrameLenght x size(micPositions,2)
    multichannelAudioFrame = audioInput();

    % Rearrange the acquired sample in 4-D array of size
    % bufferLength x numBuffers x 2 x numPairs where 2 is the number of
    % channels per microphone pair
    bufferedFrame = preprocessor(multichannelAudioFrame);

    % First estimate the DOA for each pair, independently

    % Initialize arrays uses across available pairs
    numBuffers = size(bufferedFrame, 2);
    delays = zeros(1,numPairs);
    anglesInRadians = zeros(1,numPairs);
    xcDense = zeros((2*bufferLength-1)*interpFactor, numPairs);

    % Loop through available pairs
    for kPair = 1:numPairs
        % Estimate inter-microphone delay for each 2-channel buffer
        delayVector = zeros(numBuffers, 1);
        for kBuffer = 1:numBuffers
            % Cross-correlate pair channels to get a coarse
            % crosscorrelation
            xcCoarse = XCorrelator( ...
                bufferedFrame(:,kBuffer,1,kPair), ...

            % Interpolate to increase spatial resolution
            xcDense = interpolator(flipud(xcCoarse));

            % Extract position of maximum, equal to delay in sample time
            % units, including the group delay of the interpolation filter
            [~,idx] = max(xcDense);
            delayVector(kBuffer) = ...
                (idx - groupDelay)/interpFactor - bufferLength;

        % Combine DOA estimation across pairs by selecting the median value
        delays(kPair) = median(delayVector);

        % Convert delay into angle using the microsoft pair spatial
        % separtions provided
        anglesInRadians(kPair) = HelperDelayToAngle(delays(kPair), fs, ...

    % Combine DOA estimation across pairs by keeping only the median value
    DOAInRadians = median(anglesInRadians);

    % Arrow display

    % Delay cycle execution artificially if using recorded data
        pause(audioFrameLength/fs - toc + cycleStart)


Was this topic helpful?