|
sherine <en.sherine@gmail.com> wrote in message <54582376-5261-4705-ae32-b56d49a47c05@b18g2000vbz.googlegroups.com>...
> On Jan 26, 7:53 am, Greg Heath <g.he...@verizon.net> wrote:
> > On Jan 25, 4:23 pm, "KAMAL ABAZA" <abaza_ka...@yahoo.com> wrote:
> >
> > > Hello Guys,
> >
> > > I'm doing a speaker Recognition system using Neural Networks .
> >
> > > I used the MLP Neural Network , but I wanted to ask what Transfer
> > > Functions are the best to use , and How many Layers should I use??
> >
> > > Thank you.
> >
> > How many speakers (classes), c ?
> > Number of output nodes O = c
> > Number of input nodes (dimension of input vectors) I
> > How many input/output measurement pairs, N?
> >
> > [I N] = size(p) % input matrix,
> > [O N] = size(t) % output matrix.... columns of eye(c)
> >
> > 1. Divide the data into trn/val/tst subsets, e.g., N = Ntrn+Nval+Ntst.
> >    Typically 0.6 <~ Ntrn/N <~ 0.8 and Ntst = Nval.
> > 2. Standardize (zero-mean/unit-variance) ptrn and use the mean and
> >    variance from ptrn to normalize pval and ptst.
> > 3. Use 1 hidden layer with H TANSIG activation functions.
> > 4. Use SOFTMAX for the output layer activation function.
> >    a. If you use LOGSIG, the output conditional posterior probability
> >       estimates will not be constrained to a unity sum.
> >    b. If you use PURELIN, the output conditional posterior probability
> >       estimates will not be constrained to the (0,1) interval.
> > 5. The number of training equations for the I-H-O node topology is
> >    Neq = Ntrn*O
> > 6. The number of unknown weights to estimate is Nw = (I+1)*H + (H+1)*O
> > 7. For training to convergence, Neq >= Nw is required but Neq >> Nw
> >    is desirable to mitigate noise and measurement error.
> >    a. Equivalently, H <= Hub is required but H << Hub is desired,
> >       where Hub = (Neq-O)/(I+O+1).
> > 8. If Ntrn is not large enough to support H << Hub, then Stopped
> >    Training (aka Early Stopping) with a validation set, or
> >    regularization using TRAINBR, is recommended.
> > 9. If training to convergence, the best value for H can be decided
> >    from a search over Ntrial (e.g., 10) different weight
> >    initializations for each candidate value of H.
> >    Sometimes a refined search over H = Hmin2:dH:Hmax2 is preceded by
> >    a coarse search over H = Hmin1*2.^(0:log2(Hmax1/Hmin1)).
> > 10. Although the nets are designed using a differentiable objective
> >     function like MSE, MSEREG or SSE, the ultimate criterion for a
> >     classifier is to minimize a weighted linear combination of class
> >     error rates. Sometimes the weights depend on specified prior
> >     probabilities and/or misclassification costs.
> > 11. The ranking of the numH*Ntrial candidate nets is determined by
> >     the performance on the validation data. Generalization: the
> >     estimate of performance on unseen data is obtained with the test
> >     data.
> >
> > Hope this helps.
> >
> > Greg
>
> Hi, I think I'm doing the same project.
> The database used is the ELSDSR database, consisting of 23 persons,
> each with 7 training recordings and 2 test recordings.
> I extracted the features using MFCC. I think we should use these
> features as the input to the NN, but each feature set is a huge matrix,
> and each is of a different size, and I don't know what the targets
> should look like.
> Can you help, please?
>
> Sherine
------------------------------------------------------------------------------------------------
Thank you Greg .... I'll try what you said
K.A
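
For anyone following along, here is a minimal MATLAB sketch of steps 1, 2 and 5 to 7 from Greg's list above. The sizes (I = 12, N = 161), the random stand-in data, the roughly 70/15/15 split and the candidate H = 10 are assumptions made only for illustration. It uses base MATLAB functions only.

```matlab
% Hypothetical sizes and stand-in data, for illustration only
I = 12;                        % input dimension (assumed)
O = 23;                        % one output node per speaker
N = 161;                       % number of input vectors (assumed)
p = randn(I, N);               % stand-in input matrix, I-by-N

% Step 1: random trn/val/tst split with Ntst = Nval
Ntrn = round(0.7*N);
Nval = floor((N - Ntrn)/2);
Ntst = N - Ntrn - Nval;
idx  = randperm(N);
ptrn = p(:, idx(1:Ntrn));
pval = p(:, idx(Ntrn+1:Ntrn+Nval));
ptst = p(:, idx(Ntrn+Nval+1:end));

% Step 2: standardize with the TRAINING mean and standard deviation only
mu    = mean(ptrn, 2);
sigma = std(ptrn, 0, 2);
ptrn  = (ptrn - repmat(mu, 1, Ntrn)) ./ repmat(sigma, 1, Ntrn);
pval  = (pval - repmat(mu, 1, Nval)) ./ repmat(sigma, 1, Nval);
ptst  = (ptst - repmat(mu, 1, Ntst)) ./ repmat(sigma, 1, Ntst);

% Steps 5 to 7: equations, weights, and the upper bound on H
Neq = Ntrn*O;                  % number of training equations
Hub = (Neq - O)/(I + O + 1);   % H <= Hub is equivalent to Neq >= Nw
H   = 10;                      % one candidate hidden-layer size (assumed)
Nw  = (I+1)*H + (H+1)*O;       % number of unknown weights

fprintf('Neq = %d, Nw = %d, Hub = %.1f\n', Neq, Nw, Hub);
```

With real data, p would come from the extracted features rather than randn, and H would be varied over the candidate values Greg describes in step 9.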
------------------------------------------------------------------------------------------------
Hi Sherine,
Well, I'm having the same problem. I'm trying to find a way to get a single vector out of the MFCC matrix. As for the target, I think its length will be the same as the number of speakers.
But I haven't figured out the format to do it.
K.A
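
On the target format: Greg's post above already gives it, since the columns of t are columns of eye(c). A minimal sketch follows. The label vector, the number of coefficients (nCoef = 12) and the random stand-in MFCC matrices are hypothetical. Averaging each MFCC matrix over its frames is only one simple way to get a fixed-length vector and is an assumption here, not something recommended in the thread.

```matlab
c      = 23;                     % number of speakers, as in the database described above
labels = [1 1 2 3 23];           % hypothetical speaker index for each input column

E = eye(c);
t = E(:, labels);                % O-by-N targets; column k is all zeros except row labels(k)

% Hypothetical MFCC matrices with different numbers of frames
% (random stand-ins; rows = coefficients, columns = frames)
nCoef = 12;
mfccs = {randn(nCoef, 310), randn(nCoef, 287), randn(nCoef, 402), ...
         randn(nCoef, 150), randn(nCoef, 295)};

N = numel(mfccs);
p = zeros(nCoef, N);             % I-by-N input matrix
for k = 1:N
    p(:, k) = mean(mfccs{k}, 2); % average over frames (assumed reduction)
end

[I, N]  = size(p);               % input matrix
[O, Nt] = size(t);               % output matrix; Nt should equal N
```

Each column of p then pairs with the same column of t, which matches the [I N] = size(p) and [O N] = size(t) convention from Greg's post.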
|