Discover MakerZone

MATLAB and Simulink resources for Arduino, LEGO, and Raspberry Pi

Learn more

Discover what MATLAB® can do for your career.

Opportunities for recent engineering grads.

Apply Today

Thread Subject:
x=A\b speedup parallel processing

Subject: x=A\b speedup parallel processing

From: William

Date: 9 May, 2014 17:14:11

Message: 1 of 2

Hi all,
I am solving a stochastic process model of electron flow in a 3D grid of paths. During the simulation flow shuts-off sequentially in one after another of paths until about 40% are off. In the simulation 90% of the computational time is spent solving a sparse linear equation set, A x =b, over and over and over again.
A is symmetric , banded (7), very sparse and has a real, positive diagonal. The vector b is all zeros except for the b(1) element which never changes. Mldivide works very well on my workstation giving highly accurate solutions in 4 minutes for an A matrix 10^6 by 10^6. At a size of 1.1x10^6 by of 1.1x10^6 the solution bogs down because all RAM is in use and the hard disk is employed. No parallel processing tool box is employed.

spparms('spumoni',2) gives:

sp\: bandwidth = 10000+1+10000.
sp\: is A diagonal? no.
sp\: is band density (0.00) > bandden (0.50) to try banded solver? no.
sp\: is A triangular? no.
sp\: is A morally triangular? no.
sp\: is A a candidate for Cholesky (symmetric, real positive diagonal)? yes.
sp\: is CHOLMOD's symbolic Cholesky factorization (with automatic reordering) successful? yes.
sp\: is CHOLMOD's numeric Cholesky factorization successful? yes.
sp\: is CHOLMOD's triangular solve successful? yes.
Cholesky flop count: 2.128e+013
    Nonzeros in L: 1.5914e+009
  memory blocks in use: 15
  memory in use (MB): 16016.3
  peak memory usage (MB): 19036.5
  maxrank: update/downdate rank: 8
  supernodal control: 1 40 (supernodal if flops/lnz >= 40)
  nmethods=0: default strategy: Try user permutation if given. Try AMD.
    Select best ordering tried.
    method 0: user permutation (if given)
    method 1: AMD (or COLAMD if factorizing AA')
        prune_dense: for pruning dense nodes: 10
        a dense node has degree >= max(16,(10)*sqrt(n))
        flop count: 2.128e+013
        nnz(L): 1.5914e+009


I am running Matlab 2011b on dual processor hardware (each with 6 cores) and 32 GB ram.

During a single experiment after an initial pre-event solution of the matrix equation, a sequence of 3.6x10^5 events occur. For each event, only four elements in the A matrix change. They become near zero (10^-09 times smaller). These elements consist of only two elements each on two separate rows of the A matrix which are within 1 to 10^4 rows of each other. That is rows that are adjacent or within 10,000 rows of each other. Local values of x in the immediate vicinity of the event change in value the most significantly. The location of the next event in the sequence depends upon a random event, and very importantly the past history and the current, updated values of the x vector. Therefore, x =A\b needs to be recalculated after each event.
THE QUESTIONS:
1. Since 4 minutes times 3.6x10^5 events equals 65.7 years with my primitive program using mldivide, are there ways to speed the solution of x =A\b . Are there special iterative methods, matrix partition methods, symbolic factorization methods or etc. that can produce speed-ups of 100 to 1000 times for this repetitive solution problem. Note that I see ways to reduce the run time spent doing everything but the mldivide by a factor of 1000 (vectorization, allocation, optimization, etc.).
2. Can parallel processing software and hardware (Tesla gpu’s) speed this model by a factor of 1000? Hardware and software costs are a constraint. I cannot go wild here.
3. Are the Matlab software on a Windows workstation the tools I should be working with?

Looking forward to your suggestions.

Thank you,
Bill

Subject: x=A\b speedup parallel processing

From: Steven Lord

Date: 9 May, 2014 17:47:59

Message: 2 of 2


"William " <afloemans@comcast.net> wrote in message
news:lkj2d3$ope$1@newscl01ah.mathworks.com...
> Hi all,
> I am solving a stochastic process model of electron flow in a 3D grid of
> paths. During the simulation flow shuts-off sequentially in one after
> another of paths until about 40% are off. In the simulation 90% of the
> computational time is spent solving a sparse linear equation set, A x =b,
> over and over and over again.
> A is symmetric , banded (7), very sparse and has a real, positive
> diagonal. The vector b is all zeros except for the b(1) element which
> never changes. Mldivide works very well on my workstation giving highly
> accurate solutions in 4 minutes for an A matrix 10^6 by 10^6. At a size
> of 1.1x10^6 by of 1.1x10^6 the solution bogs down because all RAM is in
> use and the hard disk is employed. No parallel processing tool box is
> employed.
> spparms('spumoni',2) gives:

*snip*

> I am running Matlab 2011b on dual processor hardware (each with 6 cores)
> and 32 GB ram.
>
> During a single experiment after an initial pre-event solution of the
> matrix equation, a sequence of 3.6x10^5 events occur. For each event,
> only four elements in the A matrix change. They become near zero (10^-09
> times smaller). These elements consist of only two elements each on two
> separate rows of the A matrix which are within 1 to 10^4 rows of each
> other. That is rows that are adjacent or within 10,000 rows of each
> other. Local values of x in the immediate vicinity of the event change
> in value the most significantly. The location of the next event in the
> sequence depends upon a random event, and very importantly the past
> history and the current, updated values of the x vector. Therefore, x
> =A\b needs to be recalculated after each event.
> THE QUESTIONS: 1. Since 4 minutes times 3.6x10^5 events equals 65.7
> years with my primitive program using mldivide, are there ways to speed
> the solution of x =A\b . Are there special iterative methods, matrix
> partition methods, symbolic factorization methods or etc. that can
> produce speed-ups of 100 to 1000 times for this repetitive solution
> problem. Note that I see ways to reduce the run time spent doing
> everything but the mldivide by a factor of 1000 (vectorization,
> allocation, optimization, etc.).

I don't know what kind of speed up you'll find, but perhaps the sparse
iterative solvers can give you a couple of benefits.

http://www.mathworks.com/help/matlab/linear-equations-iterative-methods.html

1) If you can compute A*x and possibly A'*x for a solver-specified x without
actually constructing A most (or all) of these solvers will let you specify
a function handle that evaluates those operations. Even though the matrices
are sparse, they're large enough that they will still consume a good chunk
of memory. That's likely your bottleneck.

2) If solving the system "close enough" is sufficient for your simulation,
try specifying a termination tolerance.

3) Using a preconditioner may help.

http://en.wikipedia.org/wiki/Preconditioner

4) If you're only making minor changes to the coefficient matrix, then
hopefully the solution at the previous iteration is close to the solution
for the current iteration. If that's the case, you could specify the
previous iteration's solution as the initial guess for the current
iteration.

> 2. Can parallel processing software and hardware (Tesla gpus) speed this
> model by a factor of 1000? Hardware and software costs are a constraint.
> I cannot go wild here.

I don't know, but I suspect the answer is no.

--
Steve Lord
slord@mathworks.com
To contact Technical Support use the Contact Us link on
http://www.mathworks.com

Tags for this Thread

What are tags?

A tag is like a keyword or category label associated with each thread. Tags make it easier for you to find threads of interest.

Anyone can tag a thread. Tags are public and visible to everyone.

Contact us