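I don't think there's anything available to accelerate an exact calculation of J'*J for general J. However, if you know in advance that J'*J is banded, with its nonzeros confined to diagonals -k:k for small k (or if it can be approximated as such), then it might help to compute the 2*k+1 non-trivial diagonals individually. Entry (i,i+d) of J'*J is just the dot product of columns i and i+d of J, so each diagonal can be computed from column slices of J without ever forming the transpose, and by symmetry only diagonals 0:k need to be computed.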
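Here is a minimal sketch of the diagonal-by-diagonal computation, not something I have benchmarked. The sizes m, n, k and the randomly generated test J are made up for illustration (each row has nonzeros in only k+1 adjacent columns, which makes J'*J exactly banded), and the implicit expansion in `c + (0:k)` assumes R2016b or later.

```matlab
% Toy problem (hypothetical sizes; the J discussed here is far larger)
m = 2000; n = 50; k = 2;
rng(0);
c    = randi(n-k, m, 1);            % first nonzero column of each row
rows = repmat((1:m)', 1, k+1);      % m-by-(k+1) row indices
cols = c + (0:k);                   % m-by-(k+1) column indices
J    = sparse(rows(:), cols(:), randn(m*(k+1),1), m, n);

% Diagonal d of H = J'*J:  H(i,i+d) = sum over r of J(r,i)*J(r,i+d)
B = zeros(n, 2*k+1);                % column k+1+d holds diagonal d (spdiags layout)
for d = 0:k
    v = full(sum(J(:,1:n-d) .* J(:,1+d:n), 1)).';   % length n-d, no transpose of J
    B(1+d:n, k+1+d) = v;            % superdiagonal +d, indexed by column of H
    B(1:n-d, k+1-d) = v;            % subdiagonal  -d, equal by symmetry
end
H = spdiags(B, -k:k, n, n);         % assemble the banded n-by-n result
```

For this test J, H should agree with J'*J to rounding error. When J'*J is only approximately banded, the same loop simply discards everything outside diagonals -k:k.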
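Whether this is actually faster will probably depend on the specifics of J. If nothing else, it spares you the large memory consumption of holding wide sparse matrices such as J' in RAM. MATLAB stores sparse matrices in compressed-column form, so memory grows with the number of columns even when the matrix has no nonzeros at all: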
>> J=sparse(m,n); Jt=J'; whos J Jt
  Name            Size                 Bytes  Class     Attributes

  J         3192027x3225               25824  double    sparse
  Jt           3225x3192027         25536240  double    sparse
Replacing J'*J by a banded approximation is something I haven't tried myself with Gauss-Newton specifically, but J'*J is itself only an approximation to the Hessian there, so I think it could work. Other minimization algorithms tend to be robust to small errors in the derivatives as well.