Efficient training of LSTM network with GPU

Hi all,
I recently got a machine with a GPU and am currently trying to refactor my LSTM code to take advantage of it. However, my implementation shows no speed improvement; in fact, the CPU version runs faster than the GPU version. The code below benchmarks the basic LSTM forward pass for comparison. Could anyone give some advice on how to exploit the GPU's potential for an LSTM? I tried pagefun, arrayfun and bsxfun, but none of them seemed to improve the speed.
This one is for the GPU.
function LSTM_gpu2()
vis = 700; hid = 500;
T = 80; epochs = 10;
sigmoid = @(x) 1./(1+exp(-x));
x = rand(vis,1,T); h = zeros(hid,1,T+1); c = h;
W_z = rand(hid,vis,'gpuArray'); W_i = rand(hid,vis,'gpuArray');
W_f = rand(hid,vis,'gpuArray'); W_o = rand(hid,vis,'gpuArray');
R_z = rand(hid,hid,'gpuArray'); R_i = rand(hid,hid,'gpuArray');
R_f = rand(hid,hid,'gpuArray'); R_o = rand(hid,hid,'gpuArray');
P_i = diag(rand(hid,1,'gpuArray')); P_f = diag(rand(hid,1,'gpuArray'));
P_o = diag(rand(hid,1,'gpuArray'));
b_z = rand(hid,1,'gpuArray'); b_i = rand(hid,1,'gpuArray');
b_f = rand(hid,1,'gpuArray'); b_o = rand(hid,1,'gpuArray');
I = zeros(hid,T,'gpuArray'); F = zeros(hid,T,'gpuArray');
O = zeros(hid,T,'gpuArray'); G = zeros(hid,T,'gpuArray');
x = gpuArray(x); h = gpuArray(h); c = gpuArray(c);
tic;
for i = 1:epochs
    for t = 1:T
        G(:,t) = tanh(W_z*x(:,:,t) + R_z*h(:,:,t) + b_z);
        I(:,t) = sigmoid(W_i*x(:,:,t) + R_i*h(:,:,t) + P_i*c(:,:,t) + b_i);
        F(:,t) = sigmoid(W_f*x(:,:,t) + R_f*h(:,:,t) + P_f*c(:,:,t) + b_f);
        c(:,:,t+1) = G(:,t).*I(:,t) + c(:,:,t).*F(:,t);
        O(:,t) = sigmoid(W_o*x(:,:,t) + R_o*h(:,:,t) + P_o*c(:,:,t+1) + b_o);
        h(:,:,t+1) = tanh(c(:,:,t+1)).*O(:,t);
    end
    % backprop
    % update
end
toc;
return;
And this one is for the CPU.
function LSTM_cpu()
vis = 700; hid = 500;
T = 80; epochs = 10;
sigmoid = @(x) 1./(1+exp(-x));
x = rand(vis,1,T); h = zeros(hid,1,T+1); c = h;
W_z = rand(hid,vis); W_i = rand(hid,vis);
W_f = rand(hid,vis); W_o = rand(hid,vis);
R_z = rand(hid,hid); R_i = rand(hid,hid);
R_f = rand(hid,hid); R_o = rand(hid,hid);
P_i = diag(rand(hid,1)); P_f = diag(rand(hid,1));
P_o = diag(rand(hid,1));
b_z = rand(hid,1); b_i = rand(hid,1);
b_f = rand(hid,1); b_o = rand(hid,1);
I = zeros(hid,T); F = zeros(hid,T);
O = zeros(hid,T); G = zeros(hid,T);
tic;
for i = 1:epochs
    for t = 1:T
        G(:,t) = tanh(W_z*x(:,:,t) + R_z*h(:,:,t) + b_z);
        I(:,t) = sigmoid(W_i*x(:,:,t) + R_i*h(:,:,t) + P_i*c(:,:,t) + b_i);
        F(:,t) = sigmoid(W_f*x(:,:,t) + R_f*h(:,:,t) + P_f*c(:,:,t) + b_f);
        c(:,:,t+1) = G(:,t).*I(:,t) + c(:,:,t).*F(:,t);
        O(:,t) = sigmoid(W_o*x(:,:,t) + R_o*h(:,:,t) + P_o*c(:,:,t+1) + b_o);
        h(:,:,t+1) = tanh(c(:,:,t+1)).*O(:,t);
    end
    % backprop
    % update
end
toc;
return;
OS: Windows 10,
GPU: NVIDIA Quadro M5000,
CPU: Intel i7-5820K,
MATLAB: R2016a
Thank you,
Yuto Ozaki
  1 Comment
Yuto Ozaki on 10 Apr 2016
Edited: Yuto Ozaki on 10 Apr 2016
Additional question:
Some papers [1][2] use an affine-transform notation to make the computation more compact, but they do not use peephole connections. In fact, Chainer's LSTM model does not implement peephole connections, and TensorFlow provides LSTM models both with and without them. To pursue computational efficiency, would omitting the peephole connections be the current best practice? Without peepholes, all of the affine transforms can be done at once, which I think leads to more GPU-friendly code (a sketch follows the references below).
[1] Kelvin Xu et al., "Show, Attend and Tell: Neural Image Caption Generation with Visual Attention," 2015.
[2] Wojciech Zaremba, Ilya Sutskever, Oriol Vinyals, "Recurrent Neural Network Regularization," 2014.
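For reference, here is a minimal sketch of the combined affine transform I have in mind, reusing the names from my code above (this is my reading of [1] and [2], not their exact implementation; backprop omitted):
W = [W_z; W_i; W_f; W_o];   % 4*hid-by-vis, all gate input weights stacked
R = [R_z; R_i; R_f; R_o];   % 4*hid-by-hid, all gate recurrent weights stacked
b = [b_z; b_i; b_f; b_o];   % 4*hid-by-1
for t = 1:T
    a = W*x(:,:,t) + R*h(:,:,t) + b;       % all pre-activations in two GEMMs
    G(:,t) = tanh(   a(      1:hid  ));
    I(:,t) = sigmoid(a(  hid+1:2*hid));
    F(:,t) = sigmoid(a(2*hid+1:3*hid));
    O(:,t) = sigmoid(a(3*hid+1:4*hid));    % no peephole terms anywhere
    c(:,:,t+1) = G(:,t).*I(:,t) + c(:,:,t).*F(:,t);
    h(:,:,t+1) = tanh(c(:,:,t+1)).*O(:,t);
end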


Accepted Answer

Joss Knight on 15 Apr 2016
To get good performance out of the GPU, you need to give it a lot of data to process. Your best bet is to vectorize your code to remove the inner loop. Your sigmoid and tanh activation functions, for instance, are element-wise operators and so should vectorize trivially, while your matrix multiplies can be executed in batch using pagefun.
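For example, something along the following lines (a rough sketch based on the code in your question, reusing your variable names; backprop omitted) hoists the four input-to-hidden products out of the time loop so that each becomes a single large matrix multiply. bsxfun handles the bias because implicit expansion is not available in R2016a:
X  = reshape(x, vis, T);                   % vis-by-T, all time steps at once
Zx = bsxfun(@plus, W_z*X, b_z);            % hid-by-T, one large GEMM per gate
Ix = bsxfun(@plus, W_i*X, b_i);
Fx = bsxfun(@plus, W_f*X, b_f);
Ox = bsxfun(@plus, W_o*X, b_o);
for t = 1:T
    G(:,t) = tanh(   Zx(:,t) + R_z*h(:,:,t));
    I(:,t) = sigmoid(Ix(:,t) + R_i*h(:,:,t) + P_i*c(:,:,t));
    F(:,t) = sigmoid(Fx(:,t) + R_f*h(:,:,t) + P_f*c(:,:,t));
    c(:,:,t+1) = G(:,t).*I(:,t) + c(:,:,t).*F(:,t);
    O(:,t) = sigmoid(Ox(:,t) + R_o*h(:,:,t) + P_o*c(:,:,t+1));
    h(:,:,t+1) = tanh(c(:,:,t+1)).*O(:,t);
end
The recurrent multiplies still have to stay inside the loop, but the input-side work now runs as four large GEMMs over the whole sequence, which keeps the GPU much busier per call.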
Alternatively, have you considered using the new Deep Learning features in the Neural Network Toolbox in MATLAB R2016a, or the free 3rd party deep learning solution MatConvNet?
  2 Comments
Yuto Ozaki on 16 Apr 2016
Joss,
Thank you for your reply. I just tried training with larger samples in mini-batches and it ran about 35% faster on the GPU. However, I think removing the inner loop would be challenging: an RNN takes its input from the previous time step's state, so the sequential for-loop over time seems essential to the algorithm.
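Roughly, the mini-batched forward pass looks like this (a sketch rather than my exact code; B is an illustrative batch size, and only the forward pass is shown):
B = 128;                                   % hypothetical mini-batch size
x = rand(vis, B, T, 'gpuArray');           % B independent sequences side by side
h = zeros(hid, B, T+1, 'gpuArray'); c = h;
for t = 1:T
    % the multiplies now produce hid-by-B results, so far more work per call
    Gt = tanh(   bsxfun(@plus, W_z*x(:,:,t) + R_z*h(:,:,t), b_z));
    It = sigmoid(bsxfun(@plus, W_i*x(:,:,t) + R_i*h(:,:,t) + P_i*c(:,:,t), b_i));
    Ft = sigmoid(bsxfun(@plus, W_f*x(:,:,t) + R_f*h(:,:,t) + P_f*c(:,:,t), b_f));
    c(:,:,t+1) = Gt.*It + c(:,:,t).*Ft;
    Ot = sigmoid(bsxfun(@plus, W_o*x(:,:,t) + R_o*h(:,:,t) + P_o*c(:,:,t+1), b_o));
    h(:,:,t+1) = tanh(c(:,:,t+1)).*Ot;
end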
I have checked the Neural Network Toolbox, but it does not seem to implement RNNs. My main interest is music information retrieval, so time-series models such as RNNs and their variants are my main focus.
Joss Knight on 20 Apr 2016
Support for RNNs is considered high priority by the development team. Meanwhile, take a look at MatConvNet.


