Finding identity matrix-like square matrix given two rectangular matrices

Hi everyone. This is my first ever question on this platform.
I want to know how to obtain an identity-like square matrix.
For example, in AX = B, where A is a rectangular [m x n] matrix, X is a square [n x n] matrix, and n > m, we can simply multiply A and X to get the [m x n] matrix B.
In my case A and B are known, and they are quite similar. Now I want to find X, which I expect to be close to the identity matrix.
I tried several methods, including the Moore-Penrose pseudo-inverse, but none of them gave the answer I expected. For example, with the pseudo-inverse some diagonal values are near 0, which I think is due to the SVD. The regular inverse cannot be used due to rank deficiency of the square matrix (I think). I also tried solving the linear system with gradient descent, but that too failed to yield a square matrix near the identity.
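To make the setup concrete, here is a minimal sketch of the kind of attempt I mean (the random A and B are only placeholders for my actual matrices):
m = 4; n = 6;              % n > m, so A is a wide rectangular matrix
A = rand(m,n);             % placeholder for my known A
B = A + randn(m,n)/100;    % placeholder for my known B (close to A)
X = pinv(A)*B;             % what I tried: Moore-Penrose pseudo-inverse
diag(X)                    % several diagonal entries come out far from 1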
Please let me know if you have experience with this kind of problem.

Answers (2)

John D'Errico on 26 Feb 2022
Edited: John D'Errico on 26 Feb 2022
Since you have A which is m x n, with n > m, the problem is not that A is singular, or even rank deficient; it is that the problem is underdetermined. So there are infinitely many solutions, all equally good.
First, I'll make up an example problem with reasonably small A.
m = 4; n = 6;
A = rand(m,n)
A = 4×6
    0.0608    0.9452    0.5824    0.1638    0.0319    0.4516
    0.7141    0.9551    0.9604    0.8369    0.4117    0.0368
    0.7557    0.5839    0.6525    0.9857    0.6667    0.5347
    0.0316    0.3414    0.0694    0.5188    0.1857    0.4496
A has full rank with probability 1, but the rank is no more than 4, since it has only 4 rows.
rank(A)
ans = 4
Next, I'll create B. I'll add in some tiny perturbations, just to make the problem interesting.
B = A*eye(n) + randn(m,n)/100
B = 4×6
    0.0663    0.9370    0.5917    0.1599    0.0319    0.4449
    0.7290    0.9597    0.9671    0.8440    0.4101    0.0304
    0.7512    0.5905    0.6568    0.9826    0.6631    0.5471
    0.0476    0.3305    0.0786    0.5000    0.1810    0.4427
Now as you have seen, we see that recovery of X using the pseudo-inverse is not trivial. Even though the perturbations I added to get B were pretty small, see that the diagonal elements of X0 can be quite far from 1.
X0 = pinv(A)*B
X0 = 6×6
    0.5285   -0.1365    0.3135    0.0413    0.3372   -0.0297
   -0.1320    0.7333    0.3476    0.1096   -0.1717    0.0627
    0.3050    0.3487    0.5088   -0.1180    0.0827   -0.0650
    0.0725    0.1007   -0.1250    0.9081    0.1446   -0.0880
    0.3095   -0.1590    0.0727    0.1595    0.3046    0.2248
   -0.0895    0.0834   -0.0754   -0.0695    0.2040    0.9580
diag(X0)
ans = 6×1
    0.5285
    0.7333
    0.5088
    0.9081
    0.3046
    0.9580
So pretty much crapola for results there. But you should expect that. Instead, you need to think about reformulating the problem as how to solve:
A*(eye(n) + deltaX) = B
So think of deltaX as a matrix of perturbations of the identity matrix. We want to find the smallest possible perturbations to the identity matrix that still give a viable solution. That means we can expand the above expression, then move A onto the right hand side. What we really need to solve is the problem
A*deltaX = B - A
Do you see that? Here we are explicitly looking for the smallest possible perturbations to an identity matrix. And that makes the problem now directly solvable by pinv. (In fact, this is a perfect use of PINV, a tool often used for the wrong reasons. I recall Cleve Moler mentions that in his latest blog post.)
deltaX = pinv(A)*(B-A)
deltaX = 6×6
   -0.0367    0.0227   -0.0115    0.0231    0.0038    0.0279
    0.0271   -0.0143    0.0111   -0.0083   -0.0013   -0.0225
   -0.0200    0.0123   -0.0047    0.0202    0.0043    0.0110
    0.0543   -0.0172    0.0132   -0.0274   -0.0089   -0.0295
   -0.0239    0.0114   -0.0057    0.0061    0.0006    0.0206
   -0.0319   -0.0018    0.0007   -0.0110   -0.0003    0.0238
As you can see, the perturbation matrix is indeed uniformly tiny; in fact, it is as small as it could possibly be as a solution to the problem posed. Now we can recover Xhat as the sum of the identity matrix and deltaX.
Xhat = eye(n) + deltaX
Xhat = 6×6
    0.9633    0.0227   -0.0115    0.0231    0.0038    0.0279
    0.0271    0.9857    0.0111   -0.0083   -0.0013   -0.0225
   -0.0200    0.0123    0.9953    0.0202    0.0043    0.0110
    0.0543   -0.0172    0.0132    0.9726   -0.0089   -0.0295
   -0.0239    0.0114   -0.0057    0.0061    1.0006    0.0206
   -0.0319   -0.0018    0.0007   -0.0110   -0.0003    1.0238
This is quite good now. You can view this solution as one where we know the true solution is close to an identity matrix, so we use that information as essentially prior information. In effect, this verges on a Bayesian solution, which also has close links to the style of Tikhonov regularization Bjorn suggested in his answer. The difference is that the standard regularization solution tries to bias the results towards zero, and while that is appropriate for some problems, it subtly misses the mark here because it uses the wrong target. The solution I proposed uses a different prior, where I implicitly bias the results towards a minimal perturbation of the identity matrix.
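As a quick check (a small verification sketch continuing the example above), both solutions reproduce B essentially equally well, since A has full row rank, but only Xhat stays near the identity:
norm(A*X0 - B,'fro')      % residual of the plain pinv solution, near machine precision
norm(A*Xhat - B,'fro')    % residual of the perturbation-based solution, comparable
norm(X0 - eye(n),'fro')   % far from the identity
norm(Xhat - eye(n),'fro') % close to the identity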
Finally, I might also point out that this solution would still work, even if A was truly rank deficient.
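For instance, a quick sketch of that claim, with a new A forced to be rank deficient:
Adef = A; Adef(m,:) = Adef(1,:);        % duplicate a row, so rank(Adef) = 3 < m
Bdef = Adef*eye(n) + randn(m,n)/100;    % same construction for B as before
deltaXdef = pinv(Adef)*(Bdef - Adef);   % pinv quietly handles the rank deficiency
Xdefhat = eye(n) + deltaXdef;
diag(Xdefhat)                           % the diagonal remains near 1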
  2 Comments
Bjorn Gustavsson on 26 Feb 2022
Edited: Bjorn Gustavsson on 26 Feb 2022
This type of solution John suggests is very close to the Backus-Gilbert method - for external reference-reading purposes. It should in turn be reasonably similar to the first- and second-order Tikhonov regularizations, where the solution is biased toward small local deviations/spreading by minimizing:
||A*X - B||^2 + alpha*||∇X||^2
for first-order Tikhonov, or:
||A*X - B||^2 + alpha*||∇²X||^2
for second-order Tikhonov. Here the differential operations on X are to be taken as differential operations on some discretized X representing a scalar function. All three suggestions (John's "B-G inspired" one and the 1st- and 2nd-order Tikhonov) rely on X having some "reasonably simple" geometric structure or closeness between neighboring elements (I think?).
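A minimal sketch of what this could look like in MATLAB for a 1-D discretization (D1 and D2 below are simple finite-difference matrices of my own choosing, purely for illustration):
n = 6;
D1 = diff(eye(n));            % (n-1) x n first-difference operator
D2 = diff(eye(n),2);          % (n-2) x n second-difference operator
alpha = 1;
% 1st-order Tikhonov: minimize ||A*X - B||^2 + alpha*||D1*X||^2, column by column,
% by solving the augmented least-squares system with backslash
X_T1 = [A; sqrt(alpha)*D1] \ [B; zeros(n-1,size(B,2))];
% 2nd-order Tikhonov is the same with D2 in place of D1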



Bjorn Gustavsson on 26 Feb 2022
When you have a (set of) mixed-determined linear inverse problem(s), you first have to come to terms with the fact that you cannot resolve everything in X. Then you can proceed to use zeroth-order Tikhonov regularization (or higher-order, or some combination of orders). It takes a bit of linear-algebra reading, but once you've taken these steps you'll solve this type of problem rather straightforwardly. I find the regtools package very useful for working with this; its documentation is hopefully sufficient for you.
In short: when you have an underdetermined linear inverse problem (n > m), you only have enough information to determine at most m parameters (linear combinations) of your unknown. If some of the singular values of the A matrix are really small, you might lose further information, due to "measurement" noise (either true measurement noise or simply digitization noise) or numerical noise from singular values smaller than eps(1)*max(S(:)). To obtain a meaningful/stable solution, these problems have to be dealt with. A standard way is Tikhonov regularization (instead of straightforward use of the Moore-Penrose pseudo-inverse), where the singular values are filtered such that:
inv(Z)_ii = λ_i/(λ_i^2 + α)
where λ is the array of singular values (diag(S)) and α is a regularization parameter. For singular values much larger than sqrt(alpha), the diagonal elements of inv(Z) are very close to those of inv(S), while for singular values much smaller than sqrt(alpha) they approach zero, which damps their contribution to the solution. MATLAB code for a minimalistic 0th-order Tikhonov solution would be:
[U,S,V] = svd(A);                 % full SVD: A = U*S*V'
alpha = 1;                        % regularization parameter
[m,n] = size(A);
s = diag(S(1:m,1:m));             % singular values of A
invZ = diag(s./(s.^2 + alpha));   % filtered inverse of the singular values
X_T0_alpha = V(:,1:m)*invZ*U.'*B; % 0th-order Tikhonov solution, n-by-n
When solving inverse problems you should always look at the singular values, to see how many are large enough to give retrievable information. Since here it is (approximately) an identity matrix that was multiplied with A to produce B, this X_T0_alpha should be a filtered version of that identity matrix - essentially the model resolution matrix for 0th-order Tikhonov with alpha = 1.
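For example, a sketch reusing U, S, V, s and alpha from the code above to inspect the singular values, the corresponding filter factors, and the model resolution matrix:
semilogy(s,'o-')                     % how many singular values are large enough to carry information?
filt = s.^2./(s.^2 + alpha);         % Tikhonov filter factors, between 0 and 1
R = V(:,1:m)*diag(filt)*V(:,1:m).';  % model resolution matrix: if B = A*X_true, then X_T0_alpha ≈ R*X_true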
HTH
