
Solve Partial Differential Equations Using Deep Learning

This example shows how to solve Burgers' equation using deep learning.

Burgers' equation is a partial differential equation (PDE) that arises in several areas of applied mathematics, in particular fluid mechanics, nonlinear acoustics, gas dynamics, and traffic flow.

On the computational domain [-1,1]×[0,1], this example uses a physics-informed neural network (PINN) [1]. It trains a multilayer perceptron neural network that takes samples (x,t) as input, where x ∈ [-1,1] is the spatial variable and t ∈ [0,1] is the time variable, and returns u(x,t), where u is the solution of Burgers' equation:

$$\frac{\partial u}{\partial t} + u\,\frac{\partial u}{\partial x} - \frac{0.01}{\pi}\,\frac{\partial^2 u}{\partial x^2} = 0,$$

with u(x,t=0) = -sin(πx) as the initial condition, and u(x=-1,t) = 0 and u(x=1,t) = 0 as the boundary conditions.

The example trains the model by enforcing that, for a given input (x,t), the output of the network u(x,t) fulfills Burgers' equation, the boundary conditions, and the initial condition.

Training this model does not require collecting data in advance. You can generate data using the definition of the PDE and the constraints.

Generate Training Data

Training the model requires a data set of collocation points that enforce the boundary conditions, enforce the initial condition, and fulfill Burgers' equation.

Select 25 equally spaced time points to enforce each of the boundary conditions u(x=-1,t)=0 and u(x=1,t)=0.

numBoundaryConditionPoints = [25 25];

x0BC1 = -1*ones(1,numBoundaryConditionPoints(1));
x0BC2 = ones(1,numBoundaryConditionPoints(2));

t0BC1 = linspace(0,1,numBoundaryConditionPoints(1));
t0BC2 = linspace(0,1,numBoundaryConditionPoints(2));

u0BC1 = zeros(1,numBoundaryConditionPoints(1));
u0BC2 = zeros(1,numBoundaryConditionPoints(2));

Select 50 equally spaced spatial points to enforce the initial condition u(x,t=0)=-sin(πx).

numInitialConditionPoints  = 50;

x0IC = linspace(-1,1,numInitialConditionPoints);
t0IC = zeros(1,numInitialConditionPoints);
u0IC = -sin(pi*x0IC);

Group together the data for initial and boundary conditions.

X0 = [x0IC x0BC1 x0BC2];
T0 = [t0IC t0BC1 t0BC2];
U0 = [u0IC u0BC1 u0BC2];

Select 10,000 points in the interior of the domain at which the output of the network must fulfill Burgers' equation.

numInternalCollocationPoints = 10000;

pointSet = sobolset(2);
points = net(pointSet,numInternalCollocationPoints);

dataX = 2*points(:,1)-1;
dataT = points(:,2);

Create an array datastore containing the training data.

ds = arrayDatastore([dataX dataT]);
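The sobolset function is part of Statistics and Machine Learning Toolbox. Where that toolbox is not available, one possible substitute (an assumption, not what this example uses) is uniform pseudorandom sampling with rand. The sketch below applies the same scaling of the first column to [-1,1]. A Sobol set covers the domain more evenly than pseudorandom points, so results may differ.

```matlab
% Assumed alternative to the Sobol sequence: uniform random points.
rng(0)  % fix the seed so the points are reproducible
numInternalCollocationPoints = 10000;
points = rand(numInternalCollocationPoints,2);

% Map the first column from [0,1] to [-1,1]; keep the second in [0,1].
dataX = 2*points(:,1)-1;
dataT = points(:,2);
```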

Define Deep Learning Model

Define a multilayer perceptron architecture with 9 fully connect operations and 20 neurons in each hidden layer. The first fully connect operation has two input channels corresponding to the inputs x and t. The last fully connect operation has one output u(x,t).

Define and Initialize Model Parameters

Define the parameters for each of the operations and include them in a struct. Use the format parameters.OperationName.ParameterName where parameters is the struct, OperationName is the name of the operation (for example "fc1") and ParameterName is the name of the parameter (for example, "Weights").

Specify the number of layers and the number of neurons for each layer.

numLayers = 9;
numNeurons = 20;

Initialize the parameters for the first fully connect operation. The first fully connect operation has two input channels.

parameters = struct;

sz = [numNeurons 2];
parameters.fc1.Weights = initializeHe(sz,2);
parameters.fc1.Bias = initializeZeros([numNeurons 1]);

Initialize the parameters for each of the remaining intermediate fully connect operations.

for layerNumber=2:numLayers-1
    name = "fc"+layerNumber;

    sz = [numNeurons numNeurons];
    numIn = numNeurons;
    parameters.(name).Weights = initializeHe(sz,numIn);
    parameters.(name).Bias = initializeZeros([numNeurons 1]);
end

Initialize the parameters for the final fully connect operation. The final fully connect operation has one output channel.

sz = [1 numNeurons];
numIn = numNeurons;
parameters.("fc" + numLayers).Weights = initializeHe(sz,numIn);
parameters.("fc" + numLayers).Bias = initializeZeros([1 1]);
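The helper functions initializeHe and initializeZeros are not listed in this text. A minimal sketch of what they might look like follows, assuming a He (Kaiming) normal initializer with standard deviation sqrt(2/numIn) and single-precision dlarray outputs. The function bodies are assumptions for illustration only.

```matlab
% Example use: weights and bias for a layer with 2 inputs and 20 outputs.
W = initializeHe([20 2],2);
b = initializeZeros([20 1]);

function weights = initializeHe(sz,numIn)
% Assumed He initialization: zero-mean Gaussian scaled by sqrt(2/numIn).
weights = sqrt(2/numIn) * randn(sz,'single');
weights = dlarray(weights);
end

function bias = initializeZeros(sz)
% Assumed zero initialization, returned as a dlarray.
bias = dlarray(zeros(sz,'single'));
end
```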

View the network parameters.

parameters

parameters = struct with fields:
    fc1: [1×1 struct]
    fc2: [1×1 struct]
    fc3: [1×1 struct]
    fc4: [1×1 struct]
    fc5: [1×1 struct]
    fc6: [1×1 struct]
    fc7: [1×1 struct]
    fc8: [1×1 struct]
    fc9: [1×1 struct]

View the parameters of the first fully connect operation.

parameters.fc1

ans = struct with fields:
    Weights: [20×2 dlarray]
       Bias: [20×1 dlarray]
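As a quick check on the size of this architecture, the following sketch counts the learnable parameters implied by the layer sizes above. The variable names numInputs, numOutputs, numFirst, numIntermediate, numLast, and numLearnables are introduced here for illustration only.

```matlab
numLayers = 9;
numNeurons = 20;
numInputs = 2;    % x and t
numOutputs = 1;   % u(x,t)

% Weights plus biases of the first fully connect operation.
numFirst = numNeurons*numInputs + numNeurons;

% Weights plus biases of the numLayers-2 intermediate operations.
numIntermediate = (numLayers-2)*(numNeurons^2 + numNeurons);

% Weights plus bias of the final fully connect operation.
numLast = numOutputs*numNeurons + numOutputs;

numLearnables = numFirst + numIntermediate + numLast
```

With these sizes the total is 60 + 2940 + 21 = 3021 learnable parameters.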

Define Model and Model Gradients Functions

Create the function model, listed in the Model Function section at the end of the example, that computes the outputs of the deep learning model. The function model takes as input the model parameters and the network inputs, and returns the model output.

Create the function modelGradients, listed in the Model Gradients Function section at the end of the example, that takes as input the model parameters, the network inputs, and the initial and boundary conditions, and returns the gradients of the loss with respect to the learnable parameters and the corresponding loss.

Specify Training Options

Train the model for 3000 epochs with a mini-batch size of 1000.

numEpochs = 3000;
miniBatchSize = 1000;

To train on a GPU if one is available, specify the execution environment "auto". Using a GPU requires Parallel Computing Toolbox™ and a supported GPU device. For information on supported devices, see GPU Support by Release (Parallel Computing Toolbox).

executionEnvironment = "auto";

Specify Adam optimization options.

initialLearnRate = 0.01;
decayRate = 0.005;
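The training loop in this example combines these two values as initialLearnRate/(1+decayRate*iteration). The sketch below evaluates that schedule at a few iterations. It assumes 10 iterations per epoch (10,000 collocation points in mini-batches of 1000), which gives 30,000 iterations over 3000 epochs.

```matlab
initialLearnRate = 0.01;
decayRate = 0.005;

% Assumed sample iterations; 30000 is the last iteration under the
% assumption of 10 iterations per epoch for 3000 epochs.
iterations = [1 10 100 1000 30000];

% Time-based decay, the same formula as in the training loop.
learningRate = initialLearnRate ./ (1 + decayRate*iterations);

% Print one line per iteration: iteration number, learning rate.
fprintf('%6d  %.3e\n',[iterations; learningRate])
```

Under these assumptions the learning rate falls from about 0.01 at the start to 0.01/151, roughly 6.6e-5, at the final iteration.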

Train Network

Train the network using a custom training loop.

Create a minibatchqueue object that processes and manages mini-batches of data during training. For each mini-batch:

  • Format the data with the dimension labels 'BC' (batch, channel). By default, the minibatchqueue object converts the data to dlarray objects with underlying type single.

  • Train on a GPU according to the value of the executionEnvironment variable. By default, the minibatchqueue object converts each output to a gpuArray if a GPU is available.

mbq = minibatchqueue(ds, ...
    'MiniBatchSize',miniBatchSize, ...
    'MiniBatchFormat','BC', ...
    'OutputEnvironment',executionEnvironment);

Convert the initial and boundary conditions to dlarray. For the input data points, specify format with dimensions 'CB' (channel, batch).

dlX0 = dlarray(X0,'CB');
dlT0 = dlarray(T0,'CB');
dlU0 = dlarray(U0);
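The mini-batch queue labels its data 'BC', while the training loop indexes each mini-batch as dlXT(1,:) and dlXT(2,:). This works because dlarray reorders formatted data so that the channel dimension comes before the batch dimension. The following sketch illustrates the reordering on five hypothetical samples; the variable data is introduced here for illustration only.

```matlab
% Five hypothetical (x,t) samples stored one per row, as in the datastore.
data = [linspace(-1,1,5)' linspace(0,1,5)'];

% Label the rows as batch and the columns as channel.
dlXT = dlarray(single(data),'BC');

% dlarray permutes the data so that channel precedes batch.
sz = size(dlXT);     % 2-by-5 rather than 5-by-2
fmt = dims(dlXT);    % 'CB'

% The first row now holds x and the second row holds t.
dlX = dlXT(1,:);
dlT = dlXT(2,:);
```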

If training using a GPU, convert the initial and boundary conditions to gpuArray.

if (executionEnvironment == "auto" && canUseGPU) || (executionEnvironment == "gpu")
    dlX0 = gpuArray(dlX0);
    dlT0 = gpuArray(dlT0);
    dlU0 = gpuArray(dlU0);
end

Initialize the parameters for the Adam solver.

averageGrad = [];
averageSqGrad = [];

Accelerate the model gradients function using the dlaccelerate function. To learn more, see Accelerate Custom Training Loop Functions.

accfun = dlaccelerate(@modelGradients);

Initialize the training progress plot.

C = colororder;
lineLoss = animatedline('Color',C(2,:));
ylim([0 inf])
grid on

Train the network.

For each iteration:

  • Read a mini-batch of data from the mini-batch queue.

  • Evaluate the model gradients and loss using the accelerated model gradients and dlfeval functions.

  • Update the learning rate.

  • Update the learnable parameters using the adamupdate function.

At the end of each epoch, update the training plot with the loss values.

start = tic;

iteration = 0;

for epoch = 1:numEpochs
    % Reset the queue so that hasdata returns true again for this epoch.
    reset(mbq);

    while hasdata(mbq)
        iteration = iteration + 1;

        dlXT = next(mbq);
        dlX = dlXT(1,:);
        dlT = dlXT(2,:);

        % Evaluate the model gradients and loss using dlfeval and the
        % modelGradients function.
        [gradients,loss] = dlfeval(accfun,parameters,dlX,dlT,dlX0,dlT0,dlU0);

        % Update learning rate.
        learningRate = initialLearnRate / (1+decayRate*iteration);

        % Update the network parameters using the adamupdate function.
        [parameters,averageGrad,averageSqGrad] = adamupdate(parameters,gradients,averageGrad, ...
            averageSqGrad,iteration,learningRate);
    end

    % Plot training progress.
    loss = double(gather(extractdata(loss)));
    addpoints(lineLoss,iteration, loss);

    D = duration(0,0,toc(start),'Format','hh:mm:ss');
    title("Epoch: " + epoch + ", Elapsed: " + string(D) + ", Loss: " + loss)

    drawnow
end

Check the effectiveness of the accelerated function by inspecting its hit rate and occupancy.

accfun = 
  AcceleratedFunction with properties:

          Function: @modelGradients
           Enabled: 1
         CacheSize: 50
           HitRate: 99.9984
         Occupancy: 2
         CheckMode: 'none'
    CheckTolerance: 1.0000e-04

Evaluate Model Accuracy

For values of t at 0.25, 0.5, 0.75, and 1, compare the predicted values of the deep learning model with the true solutions of Burgers' equation using the relative l2 error.
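Written out, the error measure computed by the evaluation code in this section is the l2 norm of the difference between predictions and true values, normalized by the l2 norm of the true values. The symbols u_pred and u_true are introduced here for illustration and stand for the vectors of predicted and true values at the 1001 test points for a fixed time t.

```latex
\mathrm{err}(t) =
  \frac{\left\lVert u_{\mathrm{pred}}(\cdot,t) - u_{\mathrm{true}}(\cdot,t) \right\rVert_2}
       {\left\lVert u_{\mathrm{true}}(\cdot,t) \right\rVert_2}
```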

Set the target times to test the model at. For each time, calculate the solution at 1001 equally spaced points in the range [-1,1].

tTest = [0.25 0.5 0.75 1];
numPredictions = 1001;
XTest = linspace(-1,1,numPredictions);


figure

for i=1:numel(tTest)
    t = tTest(i);
    TTest = t*ones(1,numPredictions);

    % Make predictions.
    dlXTest = dlarray(XTest,'CB');
    dlTTest = dlarray(TTest,'CB');
    dlUPred = model(parameters,dlXTest,dlTTest);

    % Calculate true values.
    UTest = solveBurgers(XTest,t,0.01/pi);

    % Calculate the relative l2 error.
    err = norm(extractdata(dlUPred) - UTest) / norm(UTest);

    % Plot predictions.
    subplot(2,2,i)
    plot(XTest,extractdata(dlUPred),'-','LineWidth',2);
    ylim([-1.1, 1.1])

    % Plot true values.
    hold on
    plot(XTest, UTest, '--','LineWidth',2)
    hold off

    title("t = " + t + ", Error = " + gather(err));
end


The plots show how close the predictions are to the true values.

Solve Burgers' Equation Function

The solveBurgers function returns the true solution of Burgers' equation at time t, as outlined in [2].
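Read off from the code of solveBurgers, the function evaluates the following ratio of integrals for each interior point x, with ν passed in as nu (0.01/π in this example). The formula is transcribed from the listing and uses its function f. The Gaussian factor corresponds to g in the code.

```latex
u(x,t) = -\,
  \frac{\displaystyle\int_{-\infty}^{\infty}
          \sin\bigl(\pi(x-\eta)\bigr)\, f(x-\eta)\,
          \exp\!\left(-\frac{\eta^{2}}{4\nu t}\right) d\eta}
       {\displaystyle\int_{-\infty}^{\infty}
          f(x-\eta)\,
          \exp\!\left(-\frac{\eta^{2}}{4\nu t}\right) d\eta},
\qquad
f(y) = \exp\!\left(-\frac{\cos(\pi y)}{2\pi\nu}\right)
```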

function U = solveBurgers(X,t,nu)

% Define functions.
f = @(y) exp(-cos(pi*y)/(2*pi*nu));
g = @(y) exp(-(y.^2)/(4*nu*t));

% Initialize solutions.
U = zeros(size(X));

% Loop over x values.
for i = 1:numel(X)
    x = X(i);

    % Calculate the solutions using the integral function. The boundary
    % conditions in x = -1 and x = 1 are known, so leave 0 as they are
    % given by initialization of U.
    if abs(x) ~= 1
        fun = @(eta) sin(pi*(x-eta)) .* f(x-eta) .* g(eta);
        uxt = -integral(fun,-inf,inf);
        fun = @(eta) f(x-eta) .* g(eta);
        U(i) = uxt / integral(fun,-inf,inf);
    end
end

end


Model Gradients Function

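The model is trained by enforcing that, for a given input (x,t), the output of the network u(x,t) fulfills Burgers' equation, the boundary conditions, and the initial condition. In particular, two quantities contribute to the loss to be minimized:

$$\text{loss} = \text{MSE}_f + \text{MSE}_u,$$

where

$$\text{MSE}_f = \frac{1}{N_f}\sum_{i=1}^{N_f}\left|f(x_f^i,t_f^i)\right|^2 \quad\text{and}\quad \text{MSE}_u = \frac{1}{N_u}\sum_{i=1}^{N_u}\left|u(x_u^i,t_u^i)-u^i\right|^2.$$

Here, $f = \frac{\partial u}{\partial t} + u\,\frac{\partial u}{\partial x} - \frac{0.01}{\pi}\,\frac{\partial^2 u}{\partial x^2}$ is the residual of Burgers' equation. The points $\{x_u^i,t_u^i\}_{i=1}^{N_u}$ are collocation points on the boundary of the computational domain and account for both the boundary and initial conditions. The points $\{x_f^i,t_f^i\}_{i=1}^{N_f}$ lie in the interior of the domain.

Calculating $\text{MSE}_f$ requires the derivatives $\frac{\partial u}{\partial t}$, $\frac{\partial u}{\partial x}$, and $\frac{\partial^2 u}{\partial x^2}$ of the output u of the model.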

The function modelGradients takes as input the model parameters parameters, the network inputs dlX and dlT, and the initial and boundary conditions dlX0, dlT0, and dlU0. It returns the gradients of the loss with respect to the learnable parameters and the corresponding loss.

function [gradients,loss] = modelGradients(parameters,dlX,dlT,dlX0,dlT0,dlU0)

% Make predictions at the interior collocation points.
U = model(parameters,dlX,dlT);

% Calculate derivatives with respect to X and T.
gradientsU = dlgradient(sum(U,'all'),{dlX,dlT},'EnableHigherDerivatives',true);
Ux = gradientsU{1};
Ut = gradientsU{2};

% Calculate second-order derivatives with respect to X.
Uxx = dlgradient(sum(Ux,'all'),dlX,'EnableHigherDerivatives',true);

% Calculate lossF. Enforce Burgers' equation.
f = Ut + U.*Ux - (0.01./pi).*Uxx;
zeroTarget = zeros(size(f), 'like', f);
lossF = mse(f, zeroTarget);

% Calculate lossU. Enforce initial and boundary conditions.
dlU0Pred = model(parameters,dlX0,dlT0);
lossU = mse(dlU0Pred, dlU0);

% Combine losses.
loss = lossF + lossU;

% Calculate gradients with respect to the learnable parameters.
gradients = dlgradient(loss,parameters);

end
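The pattern that modelGradients uses for Uxx, a dlgradient call on the result of an earlier dlgradient call, can be tried in isolation. The sketch below applies it to y = x.^3, whose first and second derivatives are 3*x.^2 and 6*x. The function name secondDerivative is hypothetical and introduced only for this illustration.

```matlab
% Evaluate the derivatives at x = 1, 2, 3 inside dlfeval so that the
% operations are traced for automatic differentiation.
x = dlarray([1 2 3]);
[dy,d2y] = dlfeval(@secondDerivative,x);

function [dy,d2y] = secondDerivative(x)
y = x.^3;

% First derivative. Keep the trace so that dy can be differentiated again.
dy = dlgradient(sum(y,'all'),x,'EnableHigherDerivatives',true);

% Second derivative, obtained by differentiating the first derivative.
d2y = dlgradient(sum(dy,'all'),x);
end
```

Summing before each dlgradient call is needed because dlgradient differentiates a scalar. Each element of y depends only on the matching element of x, so the gradient of the sum holds the elementwise derivatives.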


Model Function

The model trained in this example consists of a series of fully connect operations with a tanh operation between each one.

The model function takes as input the model parameters parameters and the network inputs dlX and dlT, and returns the model output dlU.
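To see what one step of this model does to the data, the following sketch applies a single fully connect operation followed by tanh to five hypothetical input samples. The random weights stand in for learned parameters and are an assumption for illustration.

```matlab
% Five hypothetical samples with two channels (x and t), format 'CB'.
dlXT = dlarray(rand(2,5,'single'),'CB');

% Assumed stand-in parameters for a layer with 2 inputs and 20 outputs.
weights = dlarray(randn(20,2,'single'));
bias = dlarray(zeros(20,1,'single'));

% Fully connect operation, then tanh, as applied between layers.
dlU = fullyconnect(dlXT,weights,bias);
dlU = tanh(dlU);

sz = size(dlU);    % 20 channels by 5 observations
fmt = dims(dlU);   % the 'CB' format is preserved
```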

function dlU = model(parameters,dlX,dlT)

dlXT = [dlX;dlT];
numLayers = numel(fieldnames(parameters));

% First fully connect operation.
weights = parameters.fc1.Weights;
bias = parameters.fc1.Bias;
dlU = fullyconnect(dlXT,weights,bias);

% tanh and fully connect operations for remaining layers.
for i=2:numLayers
    name = "fc" + i;

    dlU = tanh(dlU);

    weights = parameters.(name).Weights;
    bias = parameters.(name).Bias;
    dlU = fullyconnect(dlU, weights, bias);
end

end



References

  1. Maziar Raissi, Paris Perdikaris, and George Em Karniadakis, Physics Informed Deep Learning (Part I): Data-driven Solutions of Nonlinear Partial Differential Equations

  2. C. Basdevant, M. Deville, P. Haldenwang, J. Lacroix, J. Ouazzani, R. Peyret, P. Orlandi, A. Patera, Spectral and finite difference solutions of the Burgers equation, Computers & fluids 14 (1986) 23–41.
