Row & Column Wise Normalisation

Question

1 vote

Objective: Normalise a matrix such that all rows and columns sum to 1.

The code below normalises each column, then each row, and repeats until the row and column totals equal one another.

This seems to work for randomly generated arrays.

However, the data I wish to use it on contains some zeros, and these generate lots of NaNs and Infs, which makes things quite messy. Sometimes the while loop won't execute at all (no error message, it just hops over it).

I've tried changing the while condition to be rounded to 3 decimal places (because that's good enough) but still no success.

a = rand(7)
rows = sum(a,2) % original row totals
cols = sum(a,1) % original col totals
b = a;
i = 1; % for counting how many iterations
while sum(b,1,"omitnan") ~= sum(b,2,"omitnan")' %when column totals == row totals, stop.
    b = b ./ sum(b,1,"omitnan"); %divide by col totals
    b = b ./ sum(b,2,"omitnan"); %divide by row totals
    i = i + 1;
end
i %how many loops
b % normalised output
brows = sum(b,2,"omitnan") %check that all rows sum 1
bcols = sum(b,1,"omitnan") %check that all cols sum 1.
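
A likely reason the loop gets hopped over: MATLAB treats a non-scalar while condition as true only when every element is non-zero. The loop above therefore stops (or never starts) as soon as any single column total equals its row total. With an all-zero row and the matching all-zero column, both totals are 0 under "omitnan", so the condition is false from the start. A minimal sketch of a tolerance-based stopping test follows, shown on a strictly positive matrix. The names tol and maxIter and their values are illustrative assumptions, not part of the original code.

```matlab
% Sketch only: tol and maxIter are illustrative choices.
rng(0);                      % reproducible example
a = rand(7);                 % strictly positive test matrix
b = a;
tol = 1e-3;                  % "3 decimal places" from the question
maxIter = 1000;              % safety cap so the loop always terminates
iter = 0;
err = Inf;
while err > tol && iter < maxIter
    b = b ./ sum(b,1);       % divide by column totals
    b = b ./ sum(b,2);       % divide by row totals
    % rows sum to 1 after the step above, so only the columns need checking
    err = max(abs(sum(b,1) - 1));
    iter = iter + 1;
end
fprintf('iterations: %d, max column error: %.2e\n', iter, err);
```

Reducing the comparison to a single scalar (err) keeps the while condition unambiguous, and the iteration cap guards against inputs for which the scheme does not converge.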

Attached are two 7 x 7 matrices. These are the desired input for a.

Suggestions welcome.

edit:

The margfit function (lines 345-376) in the link below is, I think, what I am trying to implement. My Python is non-existent.

https://github.com/GoricaB/Land-cover-validation/blob/master/pts_lcval.py

9 Comments

John D'Errico on 13 Feb 2020

Edited: John D'Errico on 13 Feb 2020


Anyway, assume the matrices shown are indicative of what we should expect, thus entirely non-negative. Any zero rows or columns can be extracted, and then returned to the array later on, which leaves us with a possibly rectangular array that has no fully zero rows or columns.

In that context, what can we say about the solution? That is, consider an array A0, of size NxM. Do there exist vectors L and R, of length N and M respectively, such that

A = diag(L)*A0*diag(R)

where the matrix A has all unit row and column sums?

First, if a solution does exist, can it be unique? NO. If any such solution with vectors L and R does exist, then L*k and R/k is also an equally valid solution, for any non-zero scalar value k. We might decide to require that norm(L) == norm(R), or some similar requirement, thus forcing the solution to be unique.
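
The scaling ambiguity can be checked numerically. The sketch below uses arbitrary made-up values for L, R and k (none of them come from the thread) and then picks k so that the two norms agree, which is one possible way to apply the norm(L) == norm(R) convention.

```matlab
% Sketch: L, R and k are arbitrary illustrative values.
rng(1);
A0 = rand(4);
L = rand(4,1) + 0.5;         % row scaling factors
R = rand(4,1) + 0.5;         % column scaling factors
k = 7;                       % any non-zero scalar
A1 = diag(L) * A0 * diag(R);
A2 = diag(L*k) * A0 * diag(R/k);
maxDiff = max(abs(A1(:) - A2(:)))   % zero up to rounding error
% One way to pin down k: choose it so that norm(L*k) == norm(R/k)
k = sqrt(norm(R) / norm(L));
normsAfter = [norm(L*k), norm(R/k)] % the two norms now agree
```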

Personally, I always like to play around and get my hands dirty, before I think more seriously about a problem.

A0 = rand(7);
A = A0; % keep A0 for comparison with the normalised result
for i = 1:100
  A = A./sum(A,1); % requires R2016b or later
  A = A./sum(A,2); % requires R2016b or later
end
[sum(A0,1);sum(A,1)]
ans =
        3.828       2.2711       3.9473       4.6529       2.5008       3.7227         3.23
            1            1            1            1            1            1            1
[sum(A0,2),sum(A,2)]
ans =
       3.4035            1
       3.0916            1
       4.5867            1
       3.0196            1
       4.0064            1
       1.8467            1
       4.1985            1

As we see, a simple iterative scheme works sufficiently well. Better code would of course test for convergence, remove and later replace all zero rows or columns, etc., but you get the drift. Randomly interspersed zeros are not a problem, as long as no row or column is entirely zero. We cannot have a row or column with zero sum, however.
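
A rough sketch of that "better code", assuming a made-up 4x4 input with one all-zero row and one all-zero column. The names tol, maxIter, keepR and keepC are illustrative, and the example matrix is chosen so that the reduced array is square and the iteration converges.

```matlab
% Sketch only: the matrix, tol and maxIter are illustrative choices.
A0 = [2 0 0 3;
      0 0 0 0;               % an all-zero row
      1 0 3 0;
      0 0 2 5];              % column 2 is all zero
keepR = any(A0 ~= 0, 2);     % rows with at least one non-zero entry
keepC = any(A0 ~= 0, 1);     % columns with at least one non-zero entry
B = A0(keepR, keepC);        % reduced array without zero rows/columns
tol = 1e-10;
maxIter = 10000;
for iter = 1:maxIter
    B = B ./ sum(B,1);       % column normalisation (R2016b or later)
    B = B ./ sum(B,2);       % row normalisation
    if max(abs(sum(B,1) - 1)) < tol
        break                % columns also sum to 1: converged
    end
end
A = zeros(size(A0));
A(keepR, keepC) = B;         % put the zero rows/columns back
rowSums = sum(A,2)'          % 1 for kept rows, 0 for the removed row
colSums = sum(A,1)           % 1 for kept columns, 0 for the removed column
```

The removed rows and columns keep a total of 0 in the output, since no scaling can change that.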

But despite my success in the above simple example, it still raises the question: Does a solution always exist? (Probably, but a proof would need to be slightly more rigorous than my assertion. Some time is now necessary...)

John D'Errico on 13 Feb 2020

Thanks to Matt for providing the (now obvious) counterexample.

edward holt on 13 Feb 2020

Matt, thanks for the Sinkhorn-Knopp information (a fair chunk of that is beyond my skill-set).

And John, thank you for making me realise something that now seems glaringly obvious. Removing the columns / rows that are entirely comprised of zeros is certainly the first step.

Furthermore, a solution doesn't seem possible in the data I attached, as there were a few instances of a column containing only one non-zero element, with the corresponding row containing multiple non-zero elements.
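
A small illustration of that situation, using a made-up 2x2 matrix rather than the attached data. The single non-zero entry in column 1 has to end up equal to 1 for that column to sum to 1, which would force the other entry in row 1 down to zero, and scaling by positive factors cannot do that.

```matlab
% Sketch: a made-up matrix, not the attached data.
A = [1 1;
     0 1];                   % column 1 has one non-zero entry, row 1 has two
for n = 1:100
    A = A ./ sum(A,1);       % column normalisation
    A = A ./ sum(A,2);       % row normalisation
end
A                            % A(1,2) is small but still not zero
colSums = sum(A,1)           % not yet [1 1], even after 100 passes
```

Working through the update by hand, A(1,2) equals 1/(2n+1) after n passes, so it shrinks towards zero only slowly and never reaches it.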

Thank you for your efforts.


Answer 1

Matt J on 13 Feb 2020

Edited: Matt J on 13 Feb 2020

2 votes

Sinkhorn-Knopp.pdf

For a non-negative square matrix, the attached article gives necessary and sufficient conditions (p. 3, Theorem 1) both for the normalization you are trying to achieve to be possible and for the alternating row/column normalization approach (the Sinkhorn-Knopp algorithm) to work. The required condition is the same for both. So basically, if you are seeing Infs and NaNs in your iterations, the normalization is known to be impossible from the get-go.

The condition is:

"A necessary and sufficient condition ... is that A has total support"

The given matrix A having total support means that for every non-zero element A(i,j)>0, a column permutation Ap of A exists such that Ap has only strictly positive elements on the diagonal, one of which is A(i,j).
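
For small matrices, the definition above can be checked by brute force. The following is only a sketch: it enumerates all n! permutations with perms, so it is practical for something like the 7 x 7 case but not for large n. The example matrix and the variable names are made up for illustration.

```matlab
% Sketch only: brute force over all n! permutations, so keep n small.
A = [1 1;
     0 1];                   % made-up example; swap in your own square matrix
n = size(A,1);
P = perms(1:n);              % each row p maps row k to column P(p,k)
covered = false(n);          % entries that lie on some strictly positive diagonal
for p = 1:size(P,1)
    idx = sub2ind([n n], 1:n, P(p,:));
    if all(A(idx) > 0)
        covered(idx) = true; % every entry on this diagonal is positive
    end
end
% total support: at least one positive diagonal, and every positive entry is on one
hasTotalSupport = any(covered(:)) && all(covered(A > 0))
```

For this matrix the check returns false, because A(1,2) does not lie on any strictly positive diagonal.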
