
Thread Subject:
Neural Network, Speaker Recognition

Subject: Neural Network, Speaker Recognition

From: KAMAL ABAZA

Date: 25 Jan, 2012 21:23:11

Message: 1 of 4

Hello Guys,

I'm building a speaker recognition system using neural networks.

I used an MLP neural network, but I wanted to ask: which transfer functions are best to use, and how many layers should I use?

Thank you.

Subject: Neural Network, Speaker Recognition

From: Greg Heath

Date: 26 Jan, 2012 05:53:00

Message: 2 of 4

On Jan 25, 4:23 pm, "KAMAL ABAZA" <abaza_ka...@yahoo.com> wrote:
> [Kamal's question, quoted in full; see Message 1 above.]

How many speakers (classes), c?
Number of output nodes: O = c
Number of input nodes (dimension of the input vectors): I
How many input/output measurement pairs, N?

[I N] = size(p) % input matrix
[O N] = size(t) % output matrix ... columns of eye(c)

1. Divide the data into trn/val/tst subsets, e.g., N = Ntrn + Nval + Ntst. Typically 0.6 <~ Ntrn/N <~ 0.8 and Ntst = Nval.
2. Standardize (zero-mean/unit-variance) ptrn and use the mean and variance from ptrn to normalize pval and ptst.
3. Use 1 hidden layer with H TANSIG activation functions.
4. Use SOFTMAX for the output layer activation function.
    a. If you use LOGSIG, the output conditional posterior probability estimates will not be constrained to a unity sum.
    b. If you use PURELIN, the output conditional posterior probability estimates will not be constrained to the (0,1) interval.
5. The number of training equations for the I-H-O node topology is Neq = Ntrn*O.
6. The number of unknown weights to estimate is Nw = (I+1)*H + (H+1)*O.
7. For training to convergence, Neq >= Nw is required, but Neq >> Nw is desirable to mitigate noise and measurement error.
    a. Equivalently, H <= Hub is required but H << Hub is desired, where Hub = (Neq-O)/(I+O+1).
8. If Ntrn is not large enough to support H << Hub, then Stopped Training (aka Early Stopping) with a validation set, or regularization using TRAINBR, is recommended.
9. If training to convergence, the best value for H can be decided from a search over Ntrial (e.g., 10) different weight initializations for each candidate value of H. Sometimes a refined search over H = Hmin2:dH:Hmax2 is preceded by a coarse search over H = Hmin1*2.^(0:log2(Hmax1/Hmin1)).
10. Although the nets are designed using a differentiable objective function like MSE, MSEREG or SSE, the ultimate criterion for a classifier is to minimize a weighted linear combination of class error rates. Sometimes the weights depend on specified prior probabilities and/or misclassification costs.
11. The ranking of the numH*Ntrial candidate nets is determined by the performance on the validation data. Generalization: the estimate of performance on unseen data is obtained with the test data.
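
Putting steps 1-11 together, a minimal MATLAB sketch, assuming the Neural Network Toolbox PATTERNNET interface. The 70/15/15 split, Hmin = 2, and 10 trials per H are illustrative choices, and note that the toolbox computes the MAPSTD settings from all of p rather than from ptrn alone, so step 2 is only approximated here:

[I, N] = size(p);                              % p, t assumed built already
[O, ~] = size(t);                              % t: columns of eye(c)
Ntrn = round(0.70*N);                          % step 1
Neq  = Ntrn*O;                                 % step 5: training equations
Hub  = floor((Neq - O)/(I + O + 1));           % step 7a: upper bound on H

Hmin = 2;                                      % assumes Hub >= Hmin
Hset = Hmin*2.^(0:floor(log2(Hub/Hmin)));      % step 9: coarse search grid
bestVal = Inf;
for H = Hset
    for trial = 1:10                           % 10 weight initializations per H
        net = patternnet(H);                   % step 3: one TANSIG hidden layer
        net.layers{2}.transferFcn = 'softmax'; % step 4: SOFTMAX outputs
        net.inputs{1}.processFcns = {'removeconstantrows','mapstd'}; % ~step 2
        net.divideParam.trainRatio = 0.70;     % step 1: trn/val/tst split
        net.divideParam.valRatio   = 0.15;     % val set => early stopping (step 8)
        net.divideParam.testRatio  = 0.15;
        [net, tr] = train(net, p, t);
        if tr.best_vperf < bestVal             % step 11: rank by validation error
            bestVal = tr.best_vperf;
            bestNet = net;
            bestTr  = tr;
        end
    end
end
yTst = bestNet(p(:, bestTr.testInd));          % step 11: generalization estimate
[~, predicted] = max(yTst, [], 1);             % class = argmax of posteriors
[~, truth]     = max(t(:, bestTr.testInd), [], 1);
testErrRate = mean(predicted ~= truth)         % step 10: class error rate

If Ntrn is too small for H << Hub (step 8), one alternative is net.trainFcn = 'trainbr' together with net.divideFcn = 'dividetrain', i.e., regularization instead of a validation set.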

 Hope this helps.

Greg

Subject: Neural Network, Speaker Recognition

From: sherine

Date: 26 Jan, 2012 12:11:46

Message: 3 of 4

On Jan 26, 7:53 am, Greg Heath <g.he...@verizon.net> wrote:
> [Greg's reply, quoted in full; see Message 2 above.]

Hi, I think I'm doing the same project.
The database used is the ELSDSR database, consisting of 23 persons, each with 7 training recordings and 2 test recordings.
I extracted the features using MFCC. I think we should use these features as the input to the NN, but the features form huge matrices, each of a different size, and I don't know what the targets should look like.
Can you help, please?

Sherine

Subject: Neural Network, Speaker Recognition

From: KAMAL ABAZA

Date: 26 Jan, 2012 14:54:09

Message: 4 of 4

sherine <en.sherine@gmail.com> wrote in message <54582376-5261-4705-ae32-b56d49a47c05@b18g2000vbz.googlegroups.com>...
> [Sherine's reply, itself quoting Greg's Message 2, quoted in full; see Messages 2 and 3 above.]

------------------------------------------------------------------------------------------------
Thank you, Greg. I'll try what you said.
K.A

------------------------------------------------------------------------------------------------
Hi Sherine,
Well, I'm having the same problem: I'm trying to find a way to get a single vector out of the MFCC matrix. As for the targets, I think there will be one per speaker, but I haven't figured out the format to do it.
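
One common way to handle the variable-size MFCC matrices (not the only one) is to collapse each recording's matrix to a fixed-length vector, e.g., its mean over frames, and use columns of eye(23) as one-hot targets. A minimal sketch, where the cell array mfccs and the vector labels are hypothetical names assumed to have been built elsewhere:

c = 23;                               % speakers in ELSDSR
numRec = numel(mfccs);                % mfccs{k}: numCoeffs x numFrames_k matrix
p = zeros(size(mfccs{1},1), numRec);  % one fixed-length column per recording
t = zeros(c, numRec);
for k = 1:numRec
    p(:,k) = mean(mfccs{k}, 2);       % average over frames: same length for all
    t(labels(k), k) = 1;              % one-hot target: column of eye(c)
end

With these choices (one recording = one input vector), Ntrn = 23*7 = 161 and Neq = 161*23 = 3703, so with, say, I = 13 MFCCs (an assumption), Greg's step 7a gives Hub = (3703-23)/(13+23+1), about 99. Alternatively, each frame can be used as a separate training vector with the recording's one-hot label repeated, which gives many more training pairs (a much larger Ntrn).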
K.A
