detectionErrorTradeoff

Evaluate binary classification system

    Description

    results = detectionErrorTradeoff(ivs,data,labels) computes detection error tradeoff of the i-vector system ivs for the enrolled labels and the specified data.

    results = detectionErrorTradeoff(ivs) returns a structure containing previously calculated results of threshold sweeping for probabilistic linear discriminant analysis (PLDA) and cosine similarity scoring (CSS).

    [results,threshold] = detectionErrorTradeoff(___) also returns the threshold corresponding to the equal error rate.

    [___] = detectionErrorTradeoff(___,Name,Value) specifies additional options using name-value arguments. For example, you can choose the scorer results and the hardware resource for extracting i-vectors.

    detectionErrorTradeoff(___) with no output arguments plots the equal error rate and the detection error tradeoff.

    Examples

    Use the Pitch Tracking Database from Graz University of Technology (PTDB-TUG) [1]. The data set consists of 20 native English speakers reading 2342 phonetically rich sentences from the TIMIT corpus. Download and extract the data set. Depending on your system, downloading and extracting the data set can take approximately 1.5 hours.

    url = 'https://www2.spsc.tugraz.at/databases/PTDB-TUG/SPEECH_DATA_ZIPPED.zip';
    downloadFolder = tempdir;
    datasetFolder = fullfile(downloadFolder,'PTDB-TUG');
    
    if ~exist(datasetFolder,'dir')
        disp('Downloading PTDB-TUG (3.9 GB) ...')
        unzip(url,datasetFolder)
    end

    Create an audioDatastore object that points to the data set. The data set was originally intended for use in pitch-tracking training and evaluation and includes laryngograph readings and baseline pitch decisions. Use only the original audio recordings.

    ads = audioDatastore([fullfile(datasetFolder,"SPEECH DATA","FEMALE","MIC"),fullfile(datasetFolder,"SPEECH DATA","MALE","MIC")], ...
                         'IncludeSubfolders',true, ...
                         'FileExtensions','.wav');

    The file names contain the speaker IDs. Decode the file names to set the labels in the audioDatastore object.

    ads.Labels = extractBetween(ads.Files,'mic_','_');
    countEachLabel(ads)
    ans=18×2 table
        Label    Count
        _____    _____
    
         F01      211 
         F02      213 
         F03      213 
         F04      213 
         F05      236 
         F06      213 
         F07      213 
         F08      210 
         F09      213 
         M01      211 
         M02      213 
         M03      213 
         M04      213 
         M05      235 
         M06      213 
         M07      213 
          ⋮
    
    

    Read an audio file from the data set, listen to it, and plot it.

    [audioIn,audioInfo] = read(ads);
    fs = audioInfo.SampleRate;
    
    t = (0:size(audioIn,1)-1)/fs;
    sound(audioIn,fs)
    plot(t,audioIn)
    xlabel('Time (s)')
    ylabel('Amplitude')
    axis([0 t(end) -1 1])
    title('Sample Utterance from Data Set')

    Separate the audioDatastore object into four: one for training, one for enrollment, one to evaluate the detection-error tradeoff, and one for testing. The enrollment, detection-error tradeoff, and test sets contain four speakers; the training set contains the remaining speakers.

    speakersToTest = categorical(["M01","M05","F01","F05"]);
    
    adsTrain = subset(ads,~ismember(ads.Labels,speakersToTest));
    
    ads = subset(ads,ismember(ads.Labels,speakersToTest));
    [adsEnroll,adsTest,adsDET] = splitEachLabel(ads,3,1);

    Display the label distributions of the audioDatastore objects.

    countEachLabel(adsTrain)
    ans=14×2 table
        Label    Count
        _____    _____
    
         F02      213 
         F03      213 
         F04      213 
         F06      213 
         F07      213 
         F08      210 
         F09      213 
         M02      213 
         M03      213 
         M04      213 
         M06      213 
         M07      213 
         M08      213 
         M09      213 
    
    
    countEachLabel(adsEnroll)
    ans=4×2 table
        Label    Count
        _____    _____
    
         F01       3  
         F05       3  
         M01       3  
         M05       3  
    
    
    countEachLabel(adsTest)
    ans=4×2 table
        Label    Count
        _____    _____
    
         F01       1  
         F05       1  
         M01       1  
         M05       1  
    
    
    countEachLabel(adsDET)
    ans=4×2 table
        Label    Count
        _____    _____
    
         F01      207 
         F05      232 
         M01      207 
         M05      231 
    
    

    Create an i-vector system. By default, the i-vector system assumes that the inputs to the system are mono (single-channel) audio signals.

    speakerVerification = ivectorSystem('SampleRate',fs)
    speakerVerification = 
      ivectorSystem with properties:
    
             InputType: 'audio'
            SampleRate: 48000
          DetectSpeech: 1
        EnrolledLabels: [0×2 table]
    
    

    To train the extractor of the i-vector system, call trainExtractor. Specify the number of universal background model (UBM) components as 128 and the number of expectation-maximization iterations as 5. Specify the total variability space (TVS) rank as 64 and the number of iterations as 3.

    trainExtractor(speakerVerification,adsTrain, ...
        'UBMNumComponents',128,'UBMNumIterations',5, ...
        'TVSRank',64,'TVSNumIterations',3)
    Calculating standardization factors ....done.
    Training universal background model ........done.
    Training total variability space ......done.
    i-vector extractor training complete.
    

    To train the classifier of the i-vector system, use trainClassifier. To reduce the dimensionality of the i-vectors, specify the number of eigenvectors in the projection matrix as 16. Specify the number of dimensions in the probabilistic linear discriminant analysis (PLDA) model as 16 and the number of iterations as 3.

    trainClassifier(speakerVerification,adsTrain,adsTrain.Labels, ...
        'NumEigenvectors',16, ...
        'PLDANumDimensions',16,'PLDANumIterations',3)
    Extracting i-vectors ...done.
    Training projection matrix .....done.
    Training PLDA model ......done.
    i-vector classifier training complete.
    

    To inspect the parameters previously used to train the i-vector system, use info.

    info(speakerVerification)
    i-vector system input
      Input feature vector length: 60
      Input data type: double
    
    trainExtractor
      Train signals: 2979
      UBMNumComponents: 128
      UBMNumIterations: 5
      TVSRank: 64
      TVSNumIterations: 3
    
    trainClassifier
      Train signals: 2979
      Train labels: F02 (213), F03 (213) ... and 12 more
      NumEigenvectors: 16
      PLDANumDimensions: 16
      PLDANumIterations: 3
    

    Split the enrollment set.

    [adsEnrollPart1,adsEnrollPart2] = splitEachLabel(adsEnroll,1,2);

    To enroll speakers in the i-vector system, call enroll.

    enroll(speakerVerification,adsEnrollPart1,adsEnrollPart1.Labels)
    Extracting i-vectors ...done.
    Enrolling i-vectors .......done.
    Enrollment complete.
    

    When you enroll speakers, the read-only EnrolledLabels property is updated with the enrolled labels and corresponding template i-vectors. The table also keeps track of the number of signals used to create the template i-vector. Generally, using more signals results in a better template.

    speakerVerification.EnrolledLabels
    ans=4×2 table
                  ivector       NumSamples
               _____________    __________
    
        F01    {16×1 double}        1     
        F05    {16×1 double}        1     
        M01    {16×1 double}        1     
        M05    {16×1 double}        1     
    
    

    Enroll the second part of the enrollment set and then view the enrolled labels table again. The i-vector templates and the number of samples are updated.

    enroll(speakerVerification,adsEnrollPart2,adsEnrollPart2.Labels)
    Extracting i-vectors ...done.
    Enrolling i-vectors .......done.
    Enrollment complete.
    
    speakerVerification.EnrolledLabels
    ans=4×2 table
                  ivector       NumSamples
               _____________    __________
    
        F01    {16×1 double}        3     
        F05    {16×1 double}        3     
        M01    {16×1 double}        3     
        M05    {16×1 double}        3     
    
    

    To evaluate the i-vector system and determine a decision threshold for speaker verification, call detectionErrorTradeoff.

    [results, eerThreshold] = detectionErrorTradeoff(speakerVerification,adsDET,adsDET.Labels);
    Extracting i-vectors ...done.
    Scoring i-vector pairs ...done.
    Detection error tradeoff evaluation complete.
    

    The first output from detectionErrorTradeoff is a structure with two fields: CSS and PLDA. Each field contains a table. Each row of the table contains a possible decision threshold for speaker verification tasks, and the corresponding false alarm rate (FAR) and false rejection rate (FRR). The FAR and FRR are determined using the enrolled speaker labels and the data input to the detectionErrorTradeoff function.

    results
    results = struct with fields:
        PLDA: [1000×3 table]
         CSS: [1000×3 table]
    
    
    results.CSS
    ans=1000×3 table
        Threshold      FAR      FRR
        _________    _______    ___
    
        0.030207           1     0 
        0.031161     0.99962     0 
        0.032115     0.99962     0 
        0.033069     0.99962     0 
        0.034023     0.99962     0 
        0.034977     0.99962     0 
        0.035931     0.99962     0 
        0.036885     0.99962     0 
        0.037839     0.99962     0 
        0.038793     0.99962     0 
        0.039747     0.99962     0 
        0.040701     0.99962     0 
        0.041655     0.99962     0 
        0.042609     0.99962     0 
        0.043563     0.99962     0 
        0.044517     0.99962     0 
          ⋮
    
    
    results.PLDA
    ans=1000×3 table
        Threshold      FAR      FRR
        _________    _______    ___
    
         -217.63           1     0 
          -217.4     0.99962     0 
         -217.17     0.99962     0 
         -216.95     0.99962     0 
         -216.72     0.99962     0 
         -216.49     0.99962     0 
         -216.27     0.99962     0 
         -216.04     0.99962     0 
         -215.81     0.99962     0 
         -215.59     0.99962     0 
         -215.36     0.99962     0 
         -215.13     0.99962     0 
         -214.91     0.99962     0 
         -214.68     0.99962     0 
         -214.45     0.99962     0 
         -214.23     0.99962     0 
          ⋮
    
    

    The second output from detectionErrorTradeoff is a structure with two fields, CSS and PLDA, whose values are the decision thresholds that result in the equal error rate (the point at which FAR and FRR are equal).

    eerThreshold
    eerThreshold = struct with fields:
        PLDA: -34.3083
         CSS: 0.7991
    
    
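    The EER threshold can also be recovered directly from the sweep table in results: it is the threshold at which the FAR and FRR curves cross. The following is a minimal sketch using the results structure computed above, not the function's internal implementation:

    ```matlab
    % Find the threshold where FAR and FRR are closest (the EER point)
    % for the PLDA scorer, using the sweep table computed above.
    T = results.PLDA;
    [~,idx] = min(abs(T.FAR - T.FRR));           % FAR/FRR crossover index
    eerEstimate = mean([T.FAR(idx) T.FRR(idx)]); % FAR and FRR are (nearly) equal here
    thresholdEstimate = T.Threshold(idx);        % close to eerThreshold.PLDA
    ```

    Because the sweep is discrete, the recovered threshold matches eerThreshold.PLDA only up to the sweep resolution.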

    The first time you call detectionErrorTradeoff, you must provide data and corresponding labels to evaluate. Subsequently, you can get the same information, or a different analysis using the same underlying data, by calling detectionErrorTradeoff without data and labels.

    Call detectionErrorTradeoff a second time with no data arguments or output arguments to visualize the detection-error tradeoff.

    detectionErrorTradeoff(speakerVerification)

    Call detectionErrorTradeoff again. This time, visualize only the detection-error tradeoff for the PLDA scorer.

    detectionErrorTradeoff(speakerVerification,'Scorer',"plda")

    Depending on your application, you may want to use a threshold that weights the error cost of a false alarm higher or lower than the error cost of a false rejection. Your data may also not be representative of the prior probability of the speaker being present. You can use the minDCF parameter to specify custom costs and a prior probability. Call detectionErrorTradeoff again, this time specifying the cost of a false rejection as 1, the cost of a false acceptance as 2, and the prior probability that a speaker is present as 0.1.

    costFR = 1;
    costFA = 2;
    priorProb = 0.1;
    detectionErrorTradeoff(speakerVerification,'Scorer',"plda",'minDCF',[costFR,costFA,priorProb])

    Call detectionErrorTradeoff again. This time, return the threshold that minimizes the detection cost function for the PLDA scorer.

    [~,minDCFThreshold] = detectionErrorTradeoff(speakerVerification,'Scorer',"plda",'minDCF',[costFR,costFA,priorProb])
    minDCFThreshold = -23.4316
    

    Test Speaker Verification System

    Read a signal from the test set.

    adsTest = shuffle(adsTest);
    [audioIn,audioInfo] = read(adsTest);
    knownSpeakerID = audioInfo.Label
    knownSpeakerID = 1×1 cell array
        {'F05'}
    
    

    To perform speaker verification, call verify with the audio signal and specify the speaker ID, a scorer, and a threshold for the scorer. The verify function returns a logical value indicating whether a speaker identity is accepted or rejected, and a score indicating the similarity of the input audio and the template i-vector corresponding to the enrolled label.

    [tf,score] = verify(speakerVerification,audioIn,knownSpeakerID,"plda",eerThreshold.PLDA);
    if tf
        fprintf('Success!\nSpeaker accepted.\nSimilarity score = %0.2f\n\n',score)
    else
        fprintf('Failure!\nSpeaker rejected.\nSimilarity score = %0.2f\n\n',score)
    end
    Success!
    Speaker accepted.
    Similarity score = -4.19
    

    Call verify again. This time, specify an incorrect speaker ID.

    possibleSpeakers = speakerVerification.EnrolledLabels.Properties.RowNames;
    imposterIdx = find(~ismember(possibleSpeakers,knownSpeakerID));
    imposter = possibleSpeakers(imposterIdx(randperm(numel(imposterIdx),1)))
    imposter = 1×1 cell array
        {'F01'}
    
    
    [tf,score] = verify(speakerVerification,audioIn,imposter,"plda",eerThreshold.PLDA);
    if tf
        fprintf('Failure!\nSpeaker accepted.\nSimilarity score = %0.2f\n\n',score)
    else
        fprintf('Success!\nSpeaker rejected.\nSimilarity score = %0.2f\n\n',score)
    end
    Success!
    Speaker rejected.
    Similarity score = -63.44
    

    References

    [1] Signal Processing and Speech Communication Laboratory. https://www.spsc.tugraz.at/databases-and-tools/ptdb-tug-pitch-tracking-database-from-graz-university-of-technology.html. Accessed 12 Dec. 2019.

    Input Arguments

    i-vector system, specified as an object of type ivectorSystem.

    Labeled evaluation data, specified as a cell array or as an audioDatastore, signalDatastore, or TransformedDatastore object.

    • If InputType is set to 'audio' when the i-vector system is created, specify data as one of these:

      • A cell array of single-channel audio signals, each specified as a column vector with underlying type single or double.

      • An audioDatastore object or a signalDatastore object that points to a data set of mono audio signals.

      • A TransformedDatastore with an underlying audioDatastore or signalDatastore that points to a data set of mono audio signals. The output from calls to read from the transform datastore must be mono audio signals with underlying data type single or double.

    • If InputType is set to 'features' when the i-vector system is created, specify data as one of these:

      • A cell array of matrices with underlying type single or double. The matrices must consist of audio features where the number of features (columns) is locked the first time trainExtractor is called and the number of hops (rows) is variable-sized. The number of features input in any subsequent calls to any of the object functions must be equal to the number of features used when calling trainExtractor.

      • A TransformedDatastore object with an underlying audioDatastore or signalDatastore whose read function has output as described in the previous bullet.

      • A signalDatastore object whose read function has output as described in the first bullet.

    Data Types: cell | audioDatastore | signalDatastore
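    For example, when InputType is 'audio', you can pass data as a cell array of column vectors. The following hypothetical sketch shows only the calling signature; the signals, labels, and trained system ivs are placeholders, not values from the example above:

    ```matlab
    % Hypothetical call with in-memory data: mono signals as column vectors
    % with matching labels. ivs is assumed to be a trained ivectorSystem with
    % enrolled speakers; these random signals stand in for real recordings.
    fs = 16000;
    data = {randn(2*fs,1); randn(3*fs,1); randn(2*fs,1)};
    labels = ["spkA","spkB","spkA"];
    results = detectionErrorTradeoff(ivs,data,labels);
    ```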

    Classification labels used by an i-vector system, specified as one of the following:

    • A categorical array

    • A cell array of character vectors

    • A string array

    Note

    The number of audio signals in data must match the number of labels.

    Data Types: categorical | cell | string

    Name-Value Arguments

    Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

    Example: detectionErrorTradeoff(ivs,'Scorer','css')

    Scorer results returned from an i-vector system, specified as 'plda', which corresponds to probabilistic linear discriminant analysis (PLDA), 'css', which corresponds to cosine similarity score (CSS), or 'all'.

    Data Types: char | string

    Parameters of the detection cost function, specified as a three-element vector consisting of the cost of a false rejection, the cost of a false acceptance, and the prior probability of an enrolled label being present, in that order.

    When you specify parameters of a detection cost function, detectionErrorTradeoff returns the threshold corresponding to the minimum of the detection cost function [1]. The detection cost function is defined as

    Cdet(PFR,PFA) = CFR × PFR × Ppresent + CFA × PFA × (1 – Ppresent),

    where

    • Cdet — Detection cost function

    • CFR — Cost of a false rejection

    • CFA — Cost of a false acceptance

    • Ppresent — Prior probability of an enrolled label being present

    • PFR — Observed probability of a false rejection given the data input to detectionErrorTradeoff

    • PFA — Observed probability of a false acceptance given the data input to detectionErrorTradeoff

    Data Types: single | double
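    Given a sweep table with variables Threshold, FAR, and FRR (such as results.PLDA in the example above), finding the minDCF threshold amounts to minimizing Cdet over the swept thresholds. This is an illustrative sketch, not the function's internal implementation:

    ```matlab
    % Evaluate the detection cost function at every swept threshold and
    % pick the minimizer. T is a table with Threshold, FAR, and FRR,
    % such as results.PLDA from the example above.
    costFR = 1; costFA = 2; priorProb = 0.1;  % CFR, CFA, Ppresent
    Cdet = costFR*T.FRR*priorProb + costFA*T.FAR*(1 - priorProb);
    [~,idx] = min(Cdet);
    minDCFThreshold = T.Threshold(idx);
    ```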

    Hardware resource for execution, specified as one of these:

    • "auto" — Use the GPU if it is available. Otherwise, use the CPU.

    • "cpu" — Use the CPU.

    • "gpu" — Use the GPU. This option requires Parallel Computing Toolbox™.

    • "multi-gpu" — Use multiple GPUs on one machine, using a local parallel pool based on your default cluster profile. If there is no current parallel pool, the software starts a parallel pool with pool size equal to the number of available GPUs. This option requires Parallel Computing Toolbox.

    • "parallel" — Use a local or remote parallel pool based on your default cluster profile. If there is no current parallel pool, the software starts one using the default cluster profile. If the pool has access to GPUs, then only workers with a unique GPU perform training computation. If the pool does not have GPUs, then the training takes place on all available CPU workers. This option requires Parallel Computing Toolbox.

    Data Types: char | string

    Option to use prefetch queuing when reading from a datastore, specified as a logical value. This argument requires Parallel Computing Toolbox.

    Data Types: logical

    Output Arguments

    FAR and FRR per threshold tested, returned as a structure or a table.

    • If 'Scorer' is specified as 'all', then results is returned as a structure with fields PLDA and CSS and values containing tables. Each table has three variables: Threshold, FAR, and FRR.

    • If 'Scorer' is specified as 'plda' or 'css', then results is returned as a table corresponding to the specified scorer.

    Data Types: struct | table

    Threshold corresponding to the equal error rate (EER) or to the minimum of the detection cost function (minDCF), returned as a scalar or a structure. If you specify minDCF, then threshold corresponds to the minimum of the detection cost function. Otherwise, threshold corresponds to the EER.

    • If 'Scorer' is specified as 'all', then threshold is returned as a structure with fields PLDA and CSS and values equal to the respective thresholds.

    • If 'Scorer' is specified as 'plda' or 'css', then threshold is returned as a scalar corresponding to the specified scorer.

    Data Types: single | double | struct

    References

    [1] Leeuwen, David A. van, and Niko Brümmer. “An Introduction to Application-Independent Evaluation of Speaker Recognition Systems.” In Speaker Classification I, edited by Christian Müller, 4343:330–53. Berlin, Heidelberg: Springer Berlin Heidelberg, 2007. https://doi.org/10.1007/978-3-540-74200-5_19.

    Introduced in R2021a