Row & Column Wise Normalisation
Show older comments
Objective: Normalise a matrix such that all rows and columns sum to 1.
The below normalises each column, then row and repeats until row and column totals, equal one another.
This seems to work for randomly generated arrays.
However, the data I wish to use it on has some zeros - and that is generating lots of NaN and Infs, which is making things quite messy and sometimes when running the while loop won't execute (no error message, it just hops over it)
I've tried changing the while condition to be rounded to 3 decimal places (because that's good enough) but still no success.
a = rand(7)
rows = sum(a,2) % orginal row totals
cols = sum(a,1) % original col totals
b = a;
i = 1; % for counting how many iterations
while sum(b,1,"omitnan") ~= sum(b,2,"omitnan")' %when column totals == row totals, stop.
b = b ./ sum(b,1,"omitnan"); %divide by col totals
b = b ./ sum(b,2,"omitnan"); %divide by row totals
i = i + 1;
end
i %how many loops
b % normalised output
brows = sum(b,2,"omitnan") %check that all rows sum 1
bcols = sum(b,1,"omitnan") %check that all cols sum 1.
attached are two 7 x 7 matrices. These are the desired input for a.
Suggestions welcome.
edit:
The margfit function (row 345 - 376) in link below, is (I think) what I am trying to implement. My python is non-existant
9 Comments
Temu Gautama
on 13 Feb 2020
Isn't this a divide-by-zero thingy?
for c1 = ( 1 : 10 )
b = b ./ repmat( sum( b, 1 ) + eps, size( b, 1 ), 1 );
b = b ./ repmat( sum( b, 2 ) + eps, 1, size( b, 1 ));
end
John D'Errico
on 13 Feb 2020
Are you saying that some of the orws and columns of your target array is entirely zero? If so, then you can never rescale that row or column. So I would suggest you just remove the offending all zero rows or columns. That reduces the problem to solving for a now RECTANGULAR matrix rescaling. This is far easier than worrying about the NaNs or infs created by the zeros. You can later on restore those zero rows or columns, inserting them back into the renormalized rectangular matrix. Don't make the problem harder than it needs to be.
Temu Gautama
on 13 Feb 2020
Hi John,
The example uses rand(), but I don't see it mentioned that all elements should be positive. If negative values are allowed, a row that sums to zero can still impact the column-sum.
Temu
John D'Errico
on 13 Feb 2020
Edited: John D'Errico
on 13 Feb 2020
True. However, if we look at the example cases we are given, we see:
c1984 =
142 0 0 1 0 1 8
1 4 0 0 0 0 2
0 0 0 0 0 0 0
1 0 0 4 1 1 0
0 0 0 1 10 0 3
3 0 0 0 0 8 9
0 0 0 0 0 0 0
c2004 =
59 0 0 4 0 0 2
0 14 0 0 0 0 5
0 0 9 0 0 0 4
0 0 0 40 0 0 4
0 0 0 2 17 0 4
2 0 0 4 0 21 9
0 0 0 0 0 0 0
So it is indeed all zero rows or columns that are the problem, at least for the OP.
Temu Gautama
on 13 Feb 2020
You are right. Didn't look at the data, to be honest.
:)
John D'Errico
on 13 Feb 2020
Edited: John D'Errico
on 13 Feb 2020
Anyway, assume the matrices shown are indicative of what we should expect, thus entirely non-negative. Any zero rows or columns can be extracted, and then returned to the array later on, which leaves us with a possibly rectangular array that has no fully zero rows or columns.
In that context, what can we say about the solution? That is, consider an array A0, of size NxM. Do there exist vectors of L and R, length N and M respectively, such that
A = diag(L)*A0*diag(R)
where the matrix A has all unit row and column sums?
First, if a solution does exist, can it be unique? NO. If any such solution with vectors L and R does exist, then L*k and R/k is also an equally valid solution, for any non-zero scalar value k. We might decide to require that norm(L) == norm(R), or some similar requirement, thus forcing the solution to be unique.
Personally, I alwsys like to play around and get my hands dirty, before I think more seriously about a problem.
A = rand(7);
A0 = rand(7);A = A0;
for i = 1:100
A = A./sum(A,1); % requires R2016b or later
A = A./sum(A,2); % requires R2016b or later
end
[sum(A0,1);sum(A,1)]
ans =
3.828 2.2711 3.9473 4.6529 2.5008 3.7227 3.23
1 1 1 1 1 1 1
[sum(A0,2),sum(A,2)]
ans =
3.4035 1
3.0916 1
4.5867 1
3.0196 1
4.0064 1
1.8467 1
4.1985 1
As we see, a simple iterative scheme works sufficiently well. Better code would of course have been testing for convergence, removing and replacing all zero rows or columns, etc., but you get the drift. Randomly interspersed zeros are not a problem, as long as any row or column is not fully and identically zero. We cannot have a row or column with zero sum however.
But despite my success in the above simple example, it still begs the question: Does a solution always exist? (Probably, but a proof would need to be slightly more rigorous than my assertion. Some time is now necessary...)
In that context, what can we say about the solution? That is, consider an array A0, of size NxM. Do there exist vectors of L and R, length N and M respectively, such that A = diag(L)*A0*diag(R) where the matrix A has all unit row and column sums?
According to this article on Sinkhorn's theorem,
it is true for square strictly positive A0, and moreover the alternating row/column normalization approach (the Sinkhorn-Knopp algorithm) will find A. It is clearly not true for nonsquare matrices, however. For example, it is trivial to see that no L and R exist for A0=ones(N,1) for any N>1.
John D'Errico
on 13 Feb 2020
Thanks to Matt for providing the (now obvious) counterexample.
edward holt
on 13 Feb 2020
Accepted Answer
More Answers (0)
Categories
Find more on Logical in Help Center and File Exchange
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!