What algorithm does pagemldivide use?

The question stems from Paul's remark that pagemldivide does not return the same numerical solution as backslash "\" applied to each page.
Further tests seem to show that this happens only when size(A) < 10, in the case of square matrices.
nmax = 100;
errx = zeros(1,nmax);
for n=1:nmax
A = rand(n,n,10);
B = rand(n,n,10);
X = pagemldivide(A,B);
errx(n) = norm(X(:,:,end)-A(:,:,end)\B(:,:,end));
end
% Check if solutions match backslash for n >= 10
all(errx(10:end)==0)
ans = logical
1
plot(1:nmax, errx);
xlabel('Matrix size');
ylabel('Error between pagemldivide and \\');
The question is: what algorithm does pagemldivide use (for n < 10)?
Note that it does not seem to be LU-based either (test not shown).
The documentation implies that the output of pagemldivide should match backslash.

23 Comments

Perhaps @Christine Tobler can provide some insight?
The discrepancies are tiny. Couldn't it just be a difference in multithreading?
Paul on 19 Jul 2022 (edited 19 Jul 2022)
Have you tested any other page* functions, like pagemtimes, pagemrdivide, pageinv, pagesvd?
No.
Before that, I intend to test non-square matrices next, and possibly change maxNumCompThreads per Matt's hint.
But no, the discrepancy is not something that bothers me at the moment; I'm just curious what happens under the hood.
Assuming maxNumCompThreads is applicable when running on Answers ...
maxNumCompThreads(1);
rng(100)
nmax = 100;
[errmld,errmrd,errmt,errinv,errsvd] = deal(zeros(1,nmax));
for n=1:nmax
A = rand(n,n,10);
B = rand(n,n,10);
X = pagemldivide(A,B);
errmld(n) = norm(X(:,:,end)-A(:,:,end)\B(:,:,end));
X = pagemrdivide(A,B);
errmrd(n) = norm(X(:,:,end)-A(:,:,end)/B(:,:,end));
X = pagemtimes(A,B);
errmt(n) = norm(X(:,:,end)-A(:,:,end)*B(:,:,end));
X = pageinv(A);
errinv(n) = norm(X(:,:,end)-inv(A(:,:,end)));
X = pagesvd(A);
errsvd(n) = norm(X(:,:,end)-svd(A(:,:,end)));
end
% Check if solutions match backslash for n >= 10
%all(errx(n>=10)==0)
figure;
subplot(5,1,1);plot(1:nmax,errmld),ylabel('mldivide')
subplot(5,1,2);plot(1:nmax,errmrd),ylabel('mrdivide')
subplot(5,1,3);plot(1:nmax,errmt),ylabel('mtimes')
subplot(5,1,4);plot(1:nmax,errinv),ylabel('inv')
subplot(5,1,5);plot(1:nmax,errsvd),ylabel('svd')
xlabel('Matrix size');
@Bruno Luong, if you re-run your example in R2022b Pre-Release you'll see the following picture:
I'll update this thread with more detailed info once R2022b is released.
What about mrdivide and mtimes in 2022b pre-release?
pagemrdivide and mrdivide show the same behavior as pagemldivide and mldivide.
(page)mtimes is unchanged compared to R2022a.
If (page)mtimes is unchanged, can you shed any light on why mtimes and pagemtimes sometimes yield different results?
In general, one should not expect the answers to be identical. When computing each page separately, the input may have different memory alignment, and MATLAB may use a different algorithm for best performance, which yields slightly different results. But the results should all be numerically equivalent.
This problem is not new. If, instead of doing a matrix multiply, you pick individual vectors and form each element by dot products, it will yield slightly different answers due to the different optimizations used.
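To illustrate that point (a hedged sketch, not a claim about MATLAB's internals): forming each element of a product by explicit dot products typically differs from the optimized matrix multiply at round-off level, because the summation order and blocking differ.

```matlab
% Illustrative sketch only -- not MathWorks' actual implementation.
% The optimized multiply and explicit per-element dot products use
% different summation orders, so results can differ at round-off level.
rng default
n = 200;
A = rand(n);
B = rand(n);
C1 = A*B;                           % optimized (BLAS) matrix multiply
C2 = zeros(n);
for i = 1:n
    for j = 1:n
        C2(i,j) = A(i,:)*B(:,j);    % element formed by a dot product
    end
end
max(abs(C1(:) - C2(:)))             % typically small but nonzero
```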
WRT pagemtimes, perhaps its doc page should be updated to alert users not to expect mtimes and pagemtimes to always yield identical results.
As for (page)m(r/l)divide, in the R2022b pre-release do they always yield identical results? Or are the results identical only for @Bruno Luong's example?
I look forward to your update on this thread after 2022b is released. I'm quite curious why the matrix divide functions were updated to be consistent with each other (at least for Bruno's example), but pagemtimes was not.
On your last question, basically pagemtimes is a much simpler operation than pagemldivide, which allowed us to do more optimizations. Think of mtimes as three nested loops, with a fourth loop for the pages in pagemtimes. The order of all these loops can be switched for performance - that's not the case for mldivide or svd when called in a loop.
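That loop picture can be sketched as follows (a conceptual illustration only, not the actual implementation): pagemtimes amounts to the three mtimes loops plus a page loop, and reordering these loops changes the floating-point accumulation pattern.

```matlab
% Conceptual sketch only -- not the actual pagemtimes implementation.
rng default
p = 3; m = 4; k = 5; n = 6;
A = rand(m, k, p);
B = rand(k, n, p);
C = zeros(m, n, p);
for q = 1:p                  % fourth loop: over pages
    for j = 1:n              % the three mtimes loops
        for i = 1:m
            for l = 1:k
                C(i,j,q) = C(i,j,q) + A(i,l,q)*B(l,j,q);
            end
        end
    end
end
% Swapping any of these loop orders is valid and affects performance,
% but it also changes the floating-point summation order.
max(abs(C - pagemtimes(A,B)), [], 'all')   % round-off-level difference
```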
Here's an example: The output of pagemtimes doesn't match a for-loop exactly, because this case boils down to just one matrix-matrix multiplication, which pagemtimes recognizes and uses instead.
n = 1000;
rng default
A = randn(n);
B = randn(n, 1, 3);
C = pagemtimes(A, B);
% Compute each matrix-vector multiplication separately:
Cloop = cat(3, A*B(:, :, 1), A*B(:, :, 2), A*B(:, :, 3));
norm(C - Cloop, 'fro')
ans = 1.2559e-12
% Reshape B to be a matrix with 3 columns and apply matrix-matrix
% multiplication.
Creshape = reshape(A*B(:, :), n, 1, 3);
norm(C - Creshape, 'fro')
ans = 0
In fact now that I think about it, this specific case also applies to pagemldivide, and will continue to apply there in future.
n = 1000;
rng default
A = randn(n);
B = randn(n, 1, 3);
C = pagemldivide(A, B);
% Compute each matrix-vector multiplication separately:
Cloop = cat(3, A\B(:, :, 1), A\B(:, :, 2), A\B(:, :, 3));
norm(C - Cloop, 'fro')
ans = 1.9121e-12
% Reshape B to be a matrix with 3 columns and apply matrix-matrix
% multiplication.
Creshape = reshape(A\B(:, :), n, 1, 3);
norm(C - Creshape, 'fro')
ans = 0
Bruno Luong on 2 Aug 2022 (edited)
@Christine Tobler "In fact now that I think about it, this specific case also applies to pagemldivide, and will continue to apply there in future."
This statement seems inconsistent with @Heiko Weichelt's post above, where, as I understood it, the discrepancy will be resolved in R2022b.
Paul on 2 Aug 2022 (edited)
Maybe there is now, and will continue to be, different behavior for (page)mldivide when B is n x 1 x m, as in Christine's example, compared to B being n x n x m, as in your example. This gets back to my question to Heiko above, where I asked if the R2022b update will yield identical results in all cases, or if there was something unique about your example as executed in R2022a and the R2022b prerelease. I don't think we have an answer to that question yet.
It's still unclear to me what's going on with pagem(l/r)divide in R2022a and why the behavior changed in the R2022b pre-release for Bruno's example. The fact that these functions can't, in general, be optimized in the same way as pagemtimes makes me think the page and non-page functions should yield the same results for Bruno's example, but they don't in R2022a, and then something changed in R2022b so that they do.
Heiko said that nothing can be disclosed before the official release, so I wait until then.
IMO there is no point discussing this now, because TMW employees cannot discuss freely.
@Bruno Luong, thanks for the understanding. As I said before, we will update this thread once R2022b is released.
What I stated in my post above applies not just to (page)mtimes, but also to pagem{l/r}divide and basically all page functions that compute data rather than just re-arrange it (like page(c)transpose). Sorry if that wasn't clear from my post.
So the example at the top yields identical results in the R2022b prerelease, but, in general, we cannot expect identical results due to various factors, such as:
  • different threading strategies (threading per page or threading over all pages; those cannot be combined)
  • re-arranging data as explained in @Christine Tobler's post above
Bruno Luong on 15 Sep 2022 (edited)
Just a note to announce that the discrepancy between pagemldivide/pagemrdivide and for-loop backslash/slash is resolved in R2022b, at least for the specific example given in the question.
Rerunning the example code in R2022b:
maxNumCompThreads(1);
rng(100)
nmax = 100;
[errmld,errmrd,errmt,errinv,errsvd] = deal(zeros(1,nmax));
for n=1:nmax
A = rand(n,n,10);
B = rand(n,n,10);
X = pagemldivide(A,B);
errmld(n) = norm(X(:,:,end)-A(:,:,end)\B(:,:,end));
X = pagemrdivide(A,B);
errmrd(n) = norm(X(:,:,end)-A(:,:,end)/B(:,:,end));
X = pagemtimes(A,B);
errmt(n) = norm(X(:,:,end)-A(:,:,end)*B(:,:,end));
X = pageinv(A);
errinv(n) = norm(X(:,:,end)-inv(A(:,:,end)));
X = pagesvd(A);
errsvd(n) = norm(X(:,:,end)-svd(A(:,:,end)));
end
% Check if solutions match backslash for n >= 10
%all(errx(n>=10)==0)
figure;
subplot(5,1,1);plot(1:nmax,errmld),ylabel('mldivide')
subplot(5,1,2);plot(1:nmax,errmrd),ylabel('mrdivide')
subplot(5,1,3);plot(1:nmax,errmt),ylabel('mtimes')
subplot(5,1,4);plot(1:nmax,errinv),ylabel('inv')
subplot(5,1,5);plot(1:nmax,errsvd),ylabel('svd')
xlabel('Matrix size');
Will these new results for (page)ml(r)divide hold for all inputs or only certain types of inputs? Any other insights you can offer now that 2022b is released may be of interest.
@Paul, now that R2022b is released, please compare the updated doc pages for MLDIVIDE and PAGEMLDIVIDE, especially the Algorithms section (which is new for PAGEMLDIVIDE):
In R2022b, we tried to make both flow graphs, and hence the methods used for the different inputs, as similar as possible. As a result, the above example now shows the same results.
However, you will notice that MLDIVIDE has a few special solvers that PAGEMLDIVIDE does not support, namely 'permuted triangular' and 'upper Hessenberg'.
Both of these cases rarely occur in freshly created data; they are usually the result of previous calls. E.g., the 2-output LU returns a permuted triangular L factor in general. Other examples are HESS and SCHUR, which produce upper Hessenberg matrices as outputs.
Since we do not have page-wise solvers for those methods, we decided not to add these cases to PAGEMLDIVIDE.
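For instance, a small sketch of the 'permuted triangular' case described above:

```matlab
% The 2-output LU returns L as a row permutation of a lower triangular
% matrix; MLDIVIDE has a special solver that detects this form, while
% PAGEMLDIVIDE does not.
rng default
A = rand(4);
[L, U] = lu(A);        % L is permuted lower triangular in general
istril(L)              % typically false for the 2-output form
[L2, U2, P] = lu(A);   % 3-output form
istril(L2)             % true: L2 is genuinely lower triangular
```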
Overall, as I pointed out earlier, even if your inputs pick up the same solver, you cannot, in general, assume that you will reach exactly the same results, as the threading strategy might change between page-wise threading and threading over all pages. These settings create differences similar to comparing results computed with different numbers of threads.
Since you use maxNumCompThreads(1) in the example above, you will most likely see the same results unless your inputs hit one of those special matrix forms. However, this is not guaranteed, as the underlying algorithms might change.
As a general tip for all page functions, I'd say that results obtained using pagemldivide (or any other page function) are numerically equivalent to a for-loop over the same matrices, although the two results might differ by floating-point round-off error.
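In practice, that advice suggests comparing page results to a for-loop with a round-off-level tolerance rather than testing for exact equality. A hedged sketch (the tolerance formula below is an illustrative heuristic, not an official recommendation):

```matlab
% Sketch: compare pagemldivide against a for-loop using a tolerance
% scaled by conditioning, instead of expecting bitwise equality.
% (The tolerance formula is a heuristic for illustration.)
rng default
A = rand(50, 50, 10);
B = rand(50, 50, 10);
X = pagemldivide(A, B);
ok = true;
for k = 1:size(A, 3)
    Xk  = A(:,:,k) \ B(:,:,k);
    tol = 10 * eps * cond(A(:,:,k)) * norm(Xk);
    ok  = ok && (norm(X(:,:,k) - Xk) <= tol);
end
ok   % numerically equivalent within round-off
```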
I hope this helps.
Also, check out the performance release notes for both methods:
Thank you very much for your responses in this thread.

Answers (0)

Release: R2022a
Asked: 18 Jul 2022
Last commented: 27 Sep 2022