Thread Subject: curious problem with sparse matrices that have explicit "0" entries.....

Subject: curious problem with sparse matrices that have explicit "0" entries.....

From: Eric J. Holtman

Date: 3 Jan, 2009 15:19:26

Message: 1 of 8



I've run into a quirky problem with MATLAB having to
do with sparse matrices built elsewhere (in a C++ program)
and linprog.

The datafile for this can be downloaded at

http://www.ericholtman.com/sparse_error.mat

If you download that file, and then attempt to solve the
linear program with the command:

[x fval exitflag output] = linprog (f, A, b, Aeq, beq, lb, ub);

you get the following error:

Exiting: cannot converge because the primal residual, dual residual,
 or upper-bound feasibility is NaN.

however, if you do

A = sparse(full(A));
Aeq = sparse(full(Aeq));

and then re-run the command, it succeeds, with fval = 1.5185e+003.

I built this problem in C++, and transferred the data into
MATLAB using the mx* and eng* functions.

As an artifact of construction, some of the entries in the
A and Aeq matrices are explicitly 0.

I don't understand why this causes a problem...... why
should there be any difference between a sparse matrix
with say, 17 non-zero entries, and one with the same 17
non-zero entries, and an 18th with an explicit zero?

Note that I can change my C++ program to not store the
explicit zeros, and everything works perfectly, without
having to do the x = sparse(full(x)); conversion. I've
run it with huge arrays, so am I 100% convinced that I
am setting up the sparse array correctly (so that I know
I'm setting up the offsets in the PR, IR and JC arrays
correctly.)

I'm just curious as to why linprog would fail with a
sparse matrix that has explicit zeros?

 

Subject: curious problem with sparse matrices that have explicit "0" entries.....

From: Matt

Date: 3 Jan, 2009 16:28:02

Message: 2 of 8


> As an artifact of construction, some of the entries in the
> A and Aeq matrices are explicitly 0.
>
> I don't understand why this causes a problem...... why
> should there be any difference between a sparse matrix
> with say, 17 non-zero entries, and one with the same 17
> non-zero entries, and an 18th with an explicit zero?

I'm probably not the best one to answer that, but it is pretty clear that the designers of the sparse matrix data type have worked very hard to make it impossible to do what you have done. I have found no way to create a sparse matrix with explicit zeros from within MATLAB alone. So you could easily be violating an important assumption of the sparse data toolbox.

Just out of curiosity, what happens if you do the following?

>>find(Aeq), find(A)

Does it return the indices of the explicit zeros? If so, the find() function has failed. It is only supposed to return explicit non-zeros.

My speculation is that somewhere internally linprog() is using something like find() to analyze the sparsity pattern of A and Aeq. Your handcrafted sparse matrix might cause it to fail.

On the other hand, if you discover that find() works as expected, that's also very surprising. You wouldn't expect the version of find() implemented in the sparse matrix toolbox to double check that the explicit values of the sparse matrix are actually non-zero. That would mean additional overhead which ought not to be necessary, and the code designers would want to exploit that.

Subject: curious problem with sparse matrices that have explicit "0" entries.....

From: Eric J. Holtman

Date: 3 Jan, 2009 17:26:47

Message: 3 of 8

"Matt " <mjacobson.removethis@xorantech.com> wrote in
news:gjo3mi$3pr$1@fred.mathworks.com:

>
> Does it return the indices of the explicit zeros? If so, the find()
> function has failed. It is only supposed to return explicit non-zeros.
>

Yes, it does.

Subject: curious problem with sparse matrices that have explicit "0" entries.....

From: Tim Davis

Date: 3 Jan, 2009 23:59:02

Message: 4 of 8

"Eric J. Holtman" <ejh@ericholtman.com> wrote in message <Xns9B885ED987C87ejhericholtmamcom@216.168.3.30>...
>
>
> I've run into a quirky problem with MATLAB having to
> do with sparse matrices built elsewhere (in a C++ program)
> and linprog.
>
> The datafile for this can be downloaded at
>
> http://www.ericholtman.com/sparse_error.mat
>
> If you download that file, and then attempt to solve the
> linear program with the command:
>
> [x fval exitflag output] = linprog (f, A, b, Aeq, beq, lb, ub);
>
> you get the following error:
>
> Exiting: cannot converge because the primal residual, dual residual,
> or upper-bound feasibility is NaN.
>
> however, if you do
>
> A = sparse(full(A));
> Aeq = sparse(full(Aeq));
>
> and then re-run the command, it succeeds, with fval = 1.5185e+003.
>
> I built this problem in C++, and transferred the data into
> MATLAB using the mx* and eng* functions.
>
> As an artifact of construction, some of the entries in the
> A and Aeq matrices are explicitly 0.
>
> I don't understand why this causes a problem...... why
> should there be any difference between a sparse matrix
> with say, 17 non-zero entries, and one with the same 17
> non-zero entries, and an 18th with an explicit zero?
>
> Note that I can change my C++ program to not store the
> explicit zeros, and everything works perfectly, without
> having to do the x = sparse(full(x)); conversion. I've
> run it with huge arrays, so am I 100% convinced that I
> am setting up the sparse array correctly (so that I know
> I'm setting up the offsets in the PR, IR and JC arrays
> correctly.)
>
> I'm just curious as to why linprog would fail with a
> sparse matrix that has explicit zeros?
>
>


A valid MATLAB sparse matrix cannot have any explicit zeros.

"find" is doing the right thing in this case, though ... it's finding
the entries of the matrix. "find" (and all of MATLAB) assumes
that the input matrices are all valid (no explicit zeros, for example).
What's broken is the matrix, not "find".

I'm not sure why linprog would fail, but I'm not too surprised.
Some MATLAB operations work fine on invalid matrices (with
explicit zeros) and some do not.

Try the "spok" mexfunction on the File Exchange to test your
sparse matrix. It will tell you if it has other invalid properties or not.
It should also detect the presence of explicit zeros, if I wrote it correctly...

Now, I would argue that it's a pity MATLAB doesn't allow explicit
zeros (they happen quite naturally, when solving a nonlinear
system for example). But given the design decision it would be
pretty tough to change it now, since so much of MATLAB assumes
the matrix has no explicit zeros.

Subject: curious problem with sparse matrices that have explicit "0" entries.....

From: Eric J. Holtman

Date: 4 Jan, 2009 15:25:29

Message: 5 of 8

"Tim Davis" <davis@cise.ufl.edu> wrote in
news:gjou46$45r$1@fred.mathworks.com:

>
> Now, I would argue that it's a pity MATLAB doesn't allow explicit
> zeros (they happen quite naturally, when solving a nonlinear
> system for example). But given the design decision it would be
> pretty tough to change it now, since so much of MATLAB assumes
> the matrix has no explicit zeros.
>

Thanks, Tim. That was clear and concise.

Since I control the C++ code that generates the
sparse matrix, I'll just make sure I prohibit
the explicit zeros.

Subject: curious problem with sparse matrices that have explicit "0" entries.....

From: Matt

Date: 4 Jan, 2009 16:20:03

Message: 6 of 8

"Tim Davis" <davis@cise.ufl.edu> wrote in message <gjou46$45r$1@fred.mathworks.com>...

> Now, I would argue that it's a pity MATLAB doesn't allow explicit
> zeros (they happen quite naturally, when solving a nonlinear
> system for example).

But surely the arguments for disallowing them have some substance. It's obviously more storage-efficient. It also allows a more optimal implementation of things like find(). As we've seen, if explicit zeros were allowed, the find() function would need to do additional work to weed out the indices of explicit zero entries from the final output.

 

Subject: curious problem with sparse matrices that have explicit "0" entries.....

From: Tim Davis

Date: 6 Jan, 2009 21:40:20

Message: 7 of 8

"Matt " <mjacobson.removethis@xorantech.com> wrote in message <gjqnjj$jjd$1@fred.mathworks.com>...
> "Tim Davis" <davis@cise.ufl.edu> wrote in message <gjou46$45r$1@fred.mathworks.com>...
>
> > Now, I would argue that it's a pity MATLAB doesn't allow explicit
> > zeros (they happen quite naturally, when solving a nonlinear
> > system for example).
>
> But surely the arguments for disallowing them have some substance. It's obviously more storage-efficient. It also allows a more optimal implementation of things like find(). As we've seen, if explicit zeros were allowed, the find() function would need to do additional work to weed out the indices of explicit zero entries from the final output.
>
>

There are some arguments for excluding them, but it would be better if MATLAB could include them, and allow you to exclude them if needed.

Exact explicit zeros are rare, at least when doing x=A\b. The problem arises when a few entries are dropped. The pattern changes, which takes time to reconstruct. What's worse, it means that MATLAB could not be modified to analyze a system once (doing the ordering, for example) for a sequence of related matrices (with "same" nonzero pattern but different numerical values). An entry that becomes zero will most likely become nonzero the very next iteration of a nonlinear solver.

Try this code:

A = sprand (1000,1000,0.1) ;
for k = 1:10000
A(1,1) = 1 ;
A(1,1) = 0 ;
end

and you will be shocked to find out how much time it takes. Each iteration takes O(nnz(A)) time, just to squeeze out a single explicit zero entry.


There are other issues. For example, suppose you want to "precompute" the submatrix assigment A(i,j)=x. That statement requires a search of where A(i,j) is. If it could be guaranteed that the pattern wouldn't change, then the statement could be done in O(nnz(x)) time. As MATLAB stands now, that is not feasible.

Finally, there are some algorithms that cannot be exploited because of explicit zero dropping. The pattern of L from a Cholesky factorization has a very specific property ("chordal"). Exploiting that property in x=L\b or cholupdate can can cut the time in half. Problem is, though, that if even a single entry is dropped (or worse, if you don't know if an entry is dropped or not), then the chordal property is no longer guaranteed. As a result, x=L\b is slower than it could be (by a factor of 2) when L comes from a sparse Cholesky factorization. Likewise, a sparse cholupdate is made nearly infeasible by the policy of always dropping explicit zeros.

It's also likely that x=A*b could be faster if A had some special property to its nonzero pattern that could be exploited ... but can't if explicit zeros get dropped, thereby removing that special property.

But all this discussion is moot, anyway, since MATLAB probably can't be easily changed to allow explicit zeros to be kept.

The "find" command would be trivial to modify to allow its input matrix to include explicit zeros, in any case, with no real increase in the amount of work it would need to do.

The reason explicit zeros are always dropped is that sometimes it is the right thing to do. For example, if you want to really delete 1/2 the entries of a matrix, or more, then it makes sense to drop explicit zeros automatically, to save space. But there are lots of reasons why that design decision wasn't a good one (as given above).

Subject: curious problem with sparse matrices that have explicit "0" entries.....

From: Matt

Date: 7 Jan, 2009 22:21:02

Message: 8 of 8

"Tim Davis" <davis@cise.ufl.edu> wrote in message <gk0j44$46r$1@fred.mathworks.com>...
> "Matt " <mjacobson.removethis@xorantech.com> wrote in message <gjqnjj$jjd$1@fred.mathworks.com>...
> > "Tim Davis" <davis@cise.ufl.edu> wrote in message <gjou46$45r$1@fred.mathworks.com>...
> >
> > > Now, I would argue that it's a pity MATLAB doesn't allow explicit
> > > zeros (they happen quite naturally, when solving a nonlinear
> > > system for example).
> >
> > But surely the arguments for disallowing them have some substance. It's obviously more storage-efficient. It also allows a more optimal implementation of things like find(). As we've seen, if explicit zeros were allowed, the find() function would need to do additional work to weed out the indices of explicit zero entries from the final output.
> >
> >
>
> There are some arguments for excluding them, but it would be better if MATLAB could include them, and allow you to exclude them if needed.
>
> Exact explicit zeros are rare, at least when doing x=A\b. The problem arises when a few entries are dropped. The pattern changes, which takes time to reconstruct. What's worse, it means that MATLAB could not be modified to analyze a system once (doing the ordering, for example) for a sequence of related matrices (with "same" nonzero pattern but different numerical values). An entry that becomes zero will most likely become nonzero the very next iteration of a nonlinear solver.
>
> Try this code:
>
> A = sprand (1000,1000,0.1) ;
> for k = 1:10000
> A(1,1) = 1 ;
> A(1,1) = 0 ;
> end
>
> and you will be shocked to find out how much time it takes. Each iteration takes O(nnz(A)) time, just to squeeze out a single explicit zero entry.
------------------------------------------------------------------------------------------

It is a long time, but I guess I shouldn't be shocked. Since the table of explicit values has changed in size, MATLAB must re-allocate memory for it every time.


>
>
> There are other issues. For example, suppose you want to "precompute" the submatrix assigment A(i,j)=x. That statement requires a search of where A(i,j) is. If it could be guaranteed that the pattern wouldn't change, then the statement could be done in O(nnz(x)) time. As MATLAB stands now, that is not feasible.
>
> Finally, there are some algorithms that cannot be exploited because of explicit zero dropping. The pattern of L from a Cholesky factorization has a very specific property ("chordal"). Exploiting that property in x=L\b or cholupdate can can cut the time in half. Problem is, though, that if even a single entry is dropped (or worse, if you don't know if an entry is dropped or not), then the chordal property is no longer guaranteed. As a result, x=L\b is slower than it could be (by a factor of 2) when L comes from a sparse Cholesky factorization. Likewise, a sparse cholupdate is made nearly infeasible by the policy of always dropping explicit zeros.
>
> It's also likely that x=A*b could be faster if A had some special property to its nonzero pattern that could be exploited ... but can't if explicit zeros get dropped, thereby removing that special property.
>
----------------------

OK.


> The "find" command would be trivial to modify to allow its input matrix to include explicit zeros, in any case, with no real increase in the amount of work it would need to do.
>
-----------------------

I guess I don't know what you consider a "real increase". In a world where explicit zeros were allowed, the find() function would need to probe the explicit values list and test to see which of them are zero. Currently, it doesn't need to do this extra work.

Granted though, the same probing for zeros is already done in other operations to enforce the no-explicit-zeros rule, so one could say the burden has simply been shifted elsewhere.

> But all this discussion is moot, anyway, since MATLAB probably can't be easily changed to allow explicit zeros to be kept.
>
-----------------------------------------------

Can't easily be changed, but a new sparse matrix data class could be created alongside the old one, one with new class methods exploiting the capabilities of explicit zeros that you describe.

I would guess this to be the most principled approach because, as you've said, sometimes methods and operations that keep explicit zeros are optimal and sometimes the converse. To have alternative method implementation you to have represent the input as a different class. That's assuming you want to keep the same interface...

Tags for this Thread

Add a New Tag:

Separated by commas
Ex.: root locus, bode

What are tags?

A tag is like a keyword or category label associated with each thread. Tags make it easier for you to find threads of interest.

Anyone can tag a thread. Tags are public and visible to everyone.

rssFeed for this Thread

Public Submission Policy

NOTICE: Any content you submit to MATLAB Central, including personal information, is not subject to the protections which may be afforded information collected under other sections of The MathWorks, Inc. Web site. You are entirely responsible for all content that you upload, post, e-mail, transmit or otherwise make available via MATLAB Central. The MathWorks does not control the content posted by visitors to MATLAB Central and, does not guarantee the accuracy, integrity, or quality of such content. Under no circumstances will The MathWorks be liable in any way for any content not authored by The MathWorks, or any loss or damage of any kind incurred as a result of the use of any content posted, e-mailed, transmitted or otherwise made available via MATLAB Central. Read the complete Disclaimer prior to use.

Contact us at files@mathworks.com