| Date | File | Comment by | Comment | Rating |
|---|---|---|---|---|
| 22 Nov 2009 | Variable Precision Integer Arithmetic Arithmetic with integers of fully arbitrary size. Arrays and vectors of vpi numbers are supported. | D'Errico, John | Sorry about that. Only the binomfactors code uses consolidator, and only nchoosek calls binomfactors. I never caught it in testing, since consolidator is on my search path. Anyway, and would be more efficient to use a call to unique there, then accumarray anyway, since no tolerance is employed. I've submitted a new release. It will be there on Monday morning. |
|
| 16 Nov 2009 | Polyfitn N-d polynomial regression model | D'Errico, John | As I explained the first time, you need to put polyfitn into a directory that is on your search path. help addpath
Your particular test problem gives me the results: uv = rand(100,2);
polyn2sympoly(p)
Nothing special needs be done except putting it in a directory where matlab can find it. |
|
| 16 Nov 2009 | Polyfitn N-d polynomial regression model | D'Errico, John | If you are trying to place downloaded files into the Matlab/toolbox directories, this is generally wrong. Those directories are cached by matlab when it starts up, so any files that you put in those directories will not be seen until you restart matlab. Also, anytime you make any changes to files that you put in those directories, the changes will not be recognized by matlab. Better is to create your own new (and separate) directory for downloaded files. Add that directory on your search path, using the pathtool function, or using addpath and then savepath. |
|
| 04 Nov 2009 | Counter. Counter. Hours, minutes and seconds. | D'Errico, John | There is no purpose at all in this mess. Nothing at all of value. |
|
| 04 Nov 2009 | Counter. Counter. Hours, minutes and seconds. | D'Errico, John | ||
| 29 Oct 2009 | SLM - Shape Language Modeling Least squares spline modeling using shape primitives | D'Errico, John | The new version just got uploaded to repair the 'active-set' problems incurred with the newer optimization toolbox releases. |
|
| 22 Oct 2009 | Polyfitn N-d polynomial regression model | D'Errico, John | James, You don't need every possible combination of (x,y,z) to be known to estimate a model, certainly not a first order model like this. To estimate the 3 unknown parameters in this model, in theory, you need at least 3 data points at a minimum. In practice, you won't get very good estimates of those parameters with only 3 data points. In the past, I often had clients ask me how much data they needed to obtain. After all, data is often scarce, expensive, hard to obtain. It costs money, and time. You can't have too much data, at least if it is good data. In practice, if you end up with 10 or 20 points to estimate these three parameters, that will (probably) be entirely adequate. More data will give you higher quality estimates. Even 6 points, carefully chosen, if the noise is low enough, will often be adequate. Only you know the quality of your data, and how much you have, as well as how accurately you need the model to be estimated. If you have further questions, I can only suggest you send me an e-mail, with your data, and any other information you have on the problem. John |
|
| 16 Oct 2009 | SLM - Shape Language Modeling Least squares spline modeling using shape primitives | D'Errico, John | The warning message that fmincon returns is new, due apparently to a change in the optimization toolbox. It is only a warning though, that does not hurt the operation of the code itself. I'll fix the problem. |
|
| 14 Oct 2009 | Aggregate data using any number of multiple columns Using two of Matlabs functions, unique and accumarray, this function summarizes any number of cols. | D'Errico, John | I might point out that this is identical to what the already existing function consolidator does. In addition, consolidator allows a tolerance for floating point data. |
|
| 04 Oct 2009 | SLM - Shape Language Modeling Least squares spline modeling using shape primitives | D'Errico, John | Fabian - This is more difficult to solve. Ordinarily, one would simply fit x(t) and y(t) independently. However, your constraint is on the term sqrt((dx/dt)^2 + (dy/dt)^2) Do you need a cubic result? If so, then it is more complex yet, since any overall slope type of constraint is bad enough to formulate for a cubic. I might use a brute force approach. Use a pair of independent models, x(t), y(t). You can put a global constraint on dx/dt and dy/dt on these models, limiting the maximum and minimum slopes attained. Now, go back, and test the actual velocity when the two curves are united into a parametric path in the (x,y) plane. If sqrt((dx/dt)^2 + (dy/dt)^2) never exceeds your velocity limit, then you are done. Otherwise, you will now need to use fmincon to perturb the parameters of the splines, while minimizing the global sum of squares of errors to (x(t), y(t)). The constraints for this will clearly be nonlinear. You might set one constraint at every point, but this will not constrain the true maximum velocity attained. So you might sample each curve at perhaps 1000 points, returning 1000 nonlinear constraints along the curve. This will give you a necessary condition on the velocity, but it need not be truly sufficient. Thus it might exceed the aim max velocity by a tiny amount. Finally, the optimization over the spline parameter space will also have other linear constraints on those parameters. I suppose one could (if you were adventuresome) go into the SLM code to extract (and return) the actual equations used to estimate the model as it is sent to lsqlin. Fmincon would need to employ those constraints too. HTH,
|
|
| 24 Sep 2009 | Moving window standard deviation A (fast) windowed std on a time series | D'Errico, John | Simplest is to replace the nans by interpolation. Use my inpaint_nans (also on the file exchange) to do this. Then apply this code to compute the moving standard deviation. High variance sections of the curve will still be high variance after this process. The virtue of this scheme is that both parts are fully vectorized codes. The alternative is to use a loop, computing a windowed standard deviation on varying numbers of points by excluding the nan elements. This scheme may be tripped up if you have enough nan elements. Of course, with very many nan elements, no scheme will be perfect. |
|
| 23 Sep 2009 | INT64 arithmetic in MATLAB Enables int64 Addition, subtraction, multiplication, division and modulus. | D'Errico, John | I can't test this myself, so I cannot rate it. But if it now prevents A*B when A and B are both matrices, neither of which is a scalar, then this is good if it would otherwise have done the wrong thing. Better of course is to implement array multiplication too (perhaps in a future version.) Regardless, there have been a few times recently when I wanted to have int64 operations defined, so this is a good addition to MATLAB. |
|
| 16 Sep 2009 | fminsearchbnd Bound constrained optimization using fminsearch. | D'Errico, John | Ryan, This is really not a bound constrained problem. Your constraint is a general one, either a linear or a nonlinear inequality constraint. The simple solution is to use a code that allows explicit linear or nonlinear constraints. There are a few on the FEX. In fact my fminsearchcon on the FEX does that, by applying a penalty to the problem when a constraint is violated. You might also look at optimize, by Rody Oldenhuis. This code allows explicit constraints of all forms. However, there is a simple way to solve this class of problem with fminsearchbnd. Use another class of transformation. For example, suppose one wishes to minimize the function f(x,y), subject to the "bound" constraint that x <= erf(y). Transform your problem such that u = x + erf(y)
Clearly, v must be bounded above by 0. So use fminsearchbnd to optimize over the two dimensional domain (u,v). Inside your objective function, for any value of (u,v) you will compute the parameters (x,y) as x = (u + v)/2;
Now you can finish evaluating the function f(x,y). The only problem here happens if you also had other fixed bounds on x and y. But many other transformations would also have worked. For example, this transformation: u = x
will still allow simple bound constraints on the parameter x, as well as allowing the nonlinear constraint x <= erf(y) as a bound constraint. John |
|
| 16 Sep 2009 | SLM - Shape Language Modeling Least squares spline modeling using shape primitives | D'Errico, John | Pete - A goal to be finished as soon as I can is to write a gui that will wrap SLM inside. Note that the computational tool in SLM is called SLMENGINE. This was purposeful, since my expectation is that most users would use the gui form when I am able to offer it. So I have definitely been planning for a gui wrapper. Even at that though, SLM would not have allowed you to just draw a curve through your data freehand, and then return the coefficients of that curve. A freehand drawn curve may be arbitrarily complex, or not even a single-valued function. SLM is best when you fit the curve, then apply your own special knowledge of a system to be modeled, in the form of constraints on the overall shape of the underlying functional form. But a drawn curve to follow is too broad of a constraint to follow. So SLM might not be capable in general of returning such a functional form in that eventuality. As well, this becomes a unique problem. Is the problem then to find the curve that fits the original data, and has the general shape of the freehand drawn curve? This involves two separate problems of approximation. It seems one must use a tool to smooth and approximate the drawn curve. Then take that same tool and try to fit the drawn curve through the data? This task would take some serious amount of effort, and would never be possible to do so where it would be transparent to the user. Worse, suppose the curve in the end had some lack of fit to the data (as perceived by the user?) Where did the lack of fit arise? Is it lack of fit to the freehand drawn curve? Or is it lack of fit to the data itself? |
|
| 15 Sep 2009 | INT64 arithmetic in MATLAB Enables int64 Addition, subtraction, multiplication, division and modulus. | D'Errico, John | Derek - it points out that even though the function mtimes is provided, this package only claims to support ELEMENTWISE operations. The quadratic behavior tells me that this code did not do a true matrix multiply. Did you check that A64*B64 was correct in your test? Element wise multiplication will take O(N^2) flops, whereas a matrix multiply (as * is supposed to generate) is O(N^3). I would have preferred that an error be generated if you use * on a pair of matrices, when that operation will not return the proper result. This would normally make me downrate this code. Since I cannot test it without compiling it, I won't give any rating at all. |
|
| 09 Sep 2009 | Surface Fitting using gridfit Model 2-d surfaces from scattered data | D'Errico, John | Use of an interpolant followed by a smooth is a poor second choice, for several reasons. Gridfit finds the surface that is as smooth as possible, that is consistent with the data. Smoothing a interpolated surface after the fact does not ensure that the result is consistent with the data. When you do a posterior smooth of the surface, the act of smoothing is now disconnected from the data. Next, if you use griddata to interpolate a surface in advance, you will only get a result that lies within the convex hull of the data. Griddata will not extrapolate unless you use the v4 option, and that option is VERY slow for any significant number of points. Gridfit can extrapolate using several methods, depending upon your goals. Extrapolation is an important capability of gridfit. But, extrapolation can come in many different forms. For example, consider a data set which is just slightly concave along one edge of the data. The published demo has a good example of this. See the second example, fitting a trigonometric surface. Along the edges of that data, see that the griddata interpolant generates long, thin triangles. Long thin triangles are terrible for interpolation, so what happens is you see strange interpolation artifacts along the edges. Gridfit allows you to have replicates in your data, treating them properly in a least squares sense to generate the surface. Try out griddata with replicate data points. Even near replicate points can introduce nasty artifacts in the interpolant. Worse, the delaunay triangulation used in griddata will often have problems if you have sets of collinear data points. This is no problem at all for gridfit. Finally, you can easily control the extent of smoothing done by gridfit. In short, griddata has its purposes. There are general circumstances when I recommend griddata. But I would never recommend the use of griddata to be then followed by a smoothing operator. |
|
| 03 Sep 2009 | Eigenshuffle Consistently sorted eigenvalue and eigenvector sequences | D'Errico, John | For schur, I'd like to help. But schurly, then I would have to call the solution the schurshuffle. http://www.youtube.com/watch?v=0V-VgRqsEcg Seriously, it seems the same methodology would work. And with a name like the schurshuffle, I'd hate to pass up the opportunity. |
|
| 29 Aug 2009 | Numerical integration with Simpson's rule This file computes a given definite integral using Simpson's rule. | D'Errico, John | If you want to improve things from what I first saw, recognize that this is purely for educational value, that it does not offer any computational utility. If so, then it should explain how Simpson's rule may be derived. (There are several ways to do that.) What is the error term? What does this error term mean? What types of functions will Simpson's rule be exact for? Show how it is a member of the Newton-Cotes family of integration rules. Show how to do an integration over multiple intervals as a composite of these rules over the successive intervals. Are there scenarios where Simpson's rule would be preferred to a lower order rule? To a higher order rule? Show some examples of the application of Simpson's rule to a problem. If you wish to go further, show how one might build an adaptive quadrature from Simpson's rule as a building block. Show how to use Richardson extrapolation to gain accuracy in an adaptive Simpson's rule. Use the tool as a way to teach about numerical integration, and along the way, possibly learn/teach a few things about numerical analysis too. I'll look back again in a few days. I'm not saying that a better rating from me absolutely must have all of the things answered that I've suggested above, although anything that discussed all of these topics in detail would be worth a high rating. At the very least, a better submission must make more of an effort than what I have seen here. |
|
| 28 Aug 2009 | Numerical integration with Simpson's rule This file computes a given definite integral using Simpson's rule. | D'Errico, John | Sorry, but I fail to see any value here. This does not teach anything about Simpson's rule. It uses only a 3 point rule, not a composite over multiple intervals, so this will be of no use to anyone who actually needs to do numerical integration. On top of this, there are several other Simpson's rule submissions on the FEX. By the way, why use the symbolic toolbox at all? The code is even misleading. The input statement (a terrible way to build an interface, by the way) tells you that you need to supply f(x) = 0. Sorry, but that has no meaning in the context of a numerical integration. ANY function may be integrated. So either the author misunderstands integration, or they misunderstand what a function is. This is nothing more than a homework assignment. As such, the only place for it is on your parents refrigerator, and then only if you are actually proud of it. |
|
| 27 Aug 2009 | Polyfitn N-d polynomial regression model | D'Errico, John | Assuming that you have an array, that contains only w(x,y,z) for each element of the array, use ndgrid to generate the combinations of (x,y,z). Then call polyfitn with the data. |
|
| 18 Aug 2009 | Automatic Numerical Differentiation Numerical derivative of an analytically supplied function, also gradient, Jacobian & Hessian | D'Errico, John | Hi Michelle, The problem is, when you define myfun as you have, it is not a scalar valued function. It returns a vector argument. Thus we can do this: myfun=inline('[exp(x);exp(2*x)]'); myfun(2)
See that there are two numbers returned for a scalar input. (By the way, I'd strongly recommend the use of function handles and anonymous functions instead of inline functions. An inline function is comparatively very slow to evaluate.) The derivest code computes the derivative of a scalar valued function. Change your function to return a scalar value, as do myfun1 and myfun2 below, and derivest works as designed. myfun1 = inline('exp(x)');
d =
myfun2 = inline('exp(2*x)');
Now, how might one define the derivative of a vector valued function? This is defined mathematically as the Jacobian. In fact, in the derivest suite, I have provided the jacobianest function to compute what you wish. It works on a vector valued function, returning a jacobian matrix. Here, since the parameter of these functions is a scalar, the Jacobian matrix is just a vector of length 2. [d,err] = jacobianest(myfun,1)
|
|
| 12 Aug 2009 | Mixing / Combining 2 Images (Image Fusion) This program fuses/combines/mixes 2 images It supports both Gray & Color Images | D'Errico, John | Beyond what us mentions, this only does a trivial linear combination of the two images. The images that it works on are fixed, hard coded file names. This is a terrible interface. No help. No error checks. Christopher is a bit infatuated with this author. |
|
| 30 Jul 2009 | Fractions Toolbox create and manipulate fractions (K+N/D) using exact arithmetic | D'Errico, John | Nifty |
|
| 22 Jul 2009 | sumsqint Finds all distinct ways of writing a number as the sum of squares, i.e. solve x^2+y^2=n for 0<=x<=y. | D'Errico, John | A nifty piece of code. Well written. Good help. Fast - even on my old cpu. tic,xy = sumsqint(vpi(51)^32);toc
xy
|
|
| 22 Jul 2009 | Variable Precision Integer Arithmetic Arithmetic with integers of fully arbitrary size. Arrays and vectors of vpi numbers are supported. | D'Errico, John | Hi Gwendolyn, I was going to submit a new version today again. The simplest thing to do is to save that file in a form that is compatible with older releases that go back to version 7. John |
|
| 14 Jul 2009 | Variable Precision Integer Arithmetic Arithmetic with integers of fully arbitrary size. Arrays and vectors of vpi numbers are supported. | D'Errico, John | Ben - I've submitted an update, that now handles the pathological cases for plus and times. |
|
| 14 Jul 2009 | Variable Precision Integer Arithmetic Arithmetic with integers of fully arbitrary size. Arrays and vectors of vpi numbers are supported. | D'Errico, John | John - I think the problem is shown by your path. Are you using an older release of MATLAB? Perhaps release 11? If so, I'll not be at all surprised at seeing a problem. I would not expect vpi to run on older versions, and I have not tested it there. If you actually have a more recent release, then please contact me via mail and I will help to resolve the problem. |
|
| 03 Jul 2009 | Pairwise Euclidean distances Fully vectorized function to compute square Euclidean or Mahalanobis distances between vectors. | D'Errico, John | Improving. It now runs as claimed for the 1-d case, so the bug is removed. I also like that this is called sqdistance. The problem with calling it distance is that it was not a distance. Better is to make that clear, that this returns the square of the Euclidean distance. So this is also good. There is now even an attempt at error checking, in that the author uses the assert function to flag problems. However, assert allows you to provide TWO arguments. See what happens when I call sqdistance with improperly sized arrays: sqdistance(rand(3,4),rand(2))
All it tells me is "Assertion failed". For gods sake, what assertion? Use the second argument! Allow the code to exit gracefully and descriptively. Tell the user that the arguments are incompatible in size for this operation. To just tell them "Assertion failed" is silly. Why bother to do so? Obviously from the authors last comment, this is just a code for his own academic purposes. So apparently it needs not be any good, or do what he claims it does. I'll argue that he is wrong. When you post something on a website like this, hundreds or even many thousands of MATLAB users may use your code, or try to do so. They may look at it, hoping to learn something from what you post. So I'm sorry, but it would be a disservice on my part to any MATLAB user or student for me NOT to tell them that I see a problem, and what is wrong, and if possible, how the problem should best have been resolved (in my opinion.) And if this author has improved his code or coding style because of what I show to be wrong, then I've helped him too. |
|
| 02 Jul 2009 | Solution to Project Euler Problem 54 How many hands did player one win in the game of poker? | D'Errico, John | I see some changes have been made. These extra comments really help the person reading the code. My thanks there. Personally, I'd still hope that beginning PE solvers try the problem first, before looking at this. Only afterwards should you read through this submission to compare this code to yours. But this is a reasonable approach to the problem. I'll still abstain on an actual rating, but if I were to give this a numeric rating, my rating would be a decent one. |
|
| 02 Jul 2009 | Pairwise Euclidean distances Fully vectorized function to compute square Euclidean or Mahalanobis distances between vectors. | D'Errico, John | Somewhat better now in ONE respect. However, just because you have never suffered from the extreme case I showed does not mean that you have never had the problem. You've just never noticed it, or even known to look. There is still a problem. Perhaps my long comments before were not extensive enough. This is NOT actually a valid distance metric. NOT. NOT. Read the definition of a distance metric. Here are a few sites: http://www.mathreference.com/top-ms,dm.html
See that one requirement for a valid distance metric is the triangle inequality. The triangle inequality states that D(x,y) <= D(x,z) + D(y,z) Does distance satisfy that? TRY IT! distance(1,5)
distance(1,3) + distance(5,3)
(distance(1,3) + distance(5,3)) > distance(1,5)
So this function fails to satisfy one of the basic requirements for a distance metric. This does NOT compute a distance. Just because you call the function distance does not make it a distance. The failure to compute the sqrt makes this not a valid distance. Next, this function fails to work properly in one dimension! distance([1,2])
I would have expected to see the matrix [0 1;1 0] returned as the result. (Surely you will concede this fact.) Don't complain that it was my suggested change that caused it to fail, because the original code fails too here, and by a larger margin. originaldistance([1,2])
Finally, the changes to the help made it more accurate, but still fail to describe the behavior of this code. The help now states that when when called with two arguments, it returns the square of the interpoint Euclidean distances between columns of the matrices. It does not state at all what happens when called with only one argument. Looking slightly deeper at the code pointed out serious flaws still. My rating is verging closer to one star at this point. |
|
| 02 Jul 2009 | Pairwise Euclidean distances Fully vectorized function to compute square Euclidean or Mahalanobis distances between vectors. | D'Errico, John | The problem with this code is it is potentially inaccurate. It uses an identity that people are oh so impressed with, but squares the numbers unnecessarily. I'll talk about this problem eventually, but first, let me discuss another major flaw with this code. It does not actually compute the Euclidean distance. It computes the square of that distance. As such, it has the wrong units. Don't forget that if your data has units miles, then all "distances" produced by this function will have units of miles squared. Note that this code claims to produce the same numbers as does the pdist function. But it does not. pdist produces TRUE distances, not the square of the distance. There is a difference. A = randn(3,5);
distance(A,B)
The true interpoint distances, as one will normally see by any correct tool available to compute distance is closer to this: sqrt(distance(A,B))
I imagine the author will claim that by not computing the square root, it saves time. I suppose so, but to then claim that the result is a distance as most expect to see would be misleading. And surely to claim that it produces the same as other codes is very misleading. A nice property for a distance measure to have is linearity. We would like to see that distance(k*A,k*B) = k*distance(A,B) It is something that pdist will give you. But this is not a property one will gain from distance. On to the other problem, the one that I see as most damaging. Recall the results of the above experiment for distance(A,B). Now, try adding any fixed offset to both A and B. I'll add a large one to the matrices to show that complete trash is generated. distance(A+1e8,B+1e8)
See that NO significant digits remain in the result. A higher quality tool (like pdist for example) will survive even this extreme test quite nicely. As it turns out, this is not an extreme test for pdist. (Since pdist does only interpoint distances for a single matrix, these numbers will not be comparable. See only that adding 1e8 did not change any digits printed out.) pdist(A'+1e8)
pdist(A')
I will point out that all of the above mentioned flaws could have been repaired with a few simple changes to the code. Had the code been written as follows, it would take only about 20% longer to execute in my quick test, but it would have worked properly and robustly. if nargin == 1
Finally, the help for this code is itself wrong. Here is what it says when you do "help distance". Compute distances between all sample pairs
See that it tells you that X is a d by n matrix of data. Does it tell you that the function actually accepts TWO arguments? That if there are two arguments, then it computes distances between all pairs of columns of the two matrices? No error checks to verify the data actually conforms for distance computation. I'll be generous and give this a 2 star rating. |
|
| 02 Jul 2009 | Solution to Project Euler Problem 54 How many hands did player one win in the game of poker? | D'Errico, John | Ok. I'll admit that part of me wishes that Husam did not post this submission, but only a small part of me thinks that way. Why? Because I am a great admirer of the Project Euler problems. They afford a great sense of satisfaction in the solving, knowing that you have used your skills in MATLAB to solve these problems. Most problems require some insight, some understanding of a mathematical concept, as well as not insignificant programming skill to solve in your chosen language. As such they are a tremendous vehicle to help you to learn to use a tool like MATLAB. I have always maintained that the best way to learn a language is to have a project that requires solving. So I hope that most Euler solvers will choose to solve the problems on their own, first, only then look to the occasional published solutions they will find on the net. On the other hand, this is one of the moderately easy PE problems. if this particular solution causes others to find an interest of their own in these same problems, then it is worth having posted it. As I said, my opinion is mixed here. While I am quite happy to see some advertising for others to begin the Project Euler adventure, I hope not to see solutions to each problem appear here on the FEX. As for the code itself, I won't offer a rating because of those mixed feelings. I do very much like that the code is written in a modular fashion. There are useful comments in the main block of code - a good start there, although the comments get sparser towards the end. The individual modules are entirely without comments though, and this is a place where those comments would have been most valuable. How for example, did the author choose to test if a hand is a flush? 3 of a kind? A tutorial code should explain itself. Is there anything else that I might have hoped to see? Yes. Very often solving a problem like this requires the programmer to implement an encoding for the data. In this case, is there an efficient way to encode the hands for evaluation of the potential of those hands? I'd have liked to see some exposition on the data structure employed, as I can think of at least two or three ways to store them. Careful advance thought about the data structures employed is often critical to an efficient solution of the problem down the line. Again, my point is to use this tool as a means to teach how to use matlab to solve a problem. Help the person reading your code to understand how you solved the problem. |
|
| 27 Jun 2009 | Automatic Numerical Differentiation Numerical derivative of an analytically supplied function, also gradient, Jacobian & Hessian | D'Errico, John | Benn, My guess is that you have written your own factorial function, probably as a homework assignment, since the error messages show a recursive implementation of factorial. This causes a failure in derivest, since your version is not properly vectorized. By vectorized, I mean that when called with a list of numbers, the Mathworks version of factorial computes the factorial of each number. You can learn which factorial version is called by typing this at the command line: which factorial -all MATLAB will show which version of the factorial function will be used. You should see the version that you wrote listed at the very top. At first, I thought that possibly you simply are using an old version of MATLAB, but this is unlikely, since the Mathworks would never have written the factorial function to have a recursive definition as I see. (Recursive factorial definitions are a nice way to teach a student recursive coding styles, but they are a terrible coding style in practice because of the extraneous, wasted function call overhead just to multiply a list of consecutive numbers together.) The solution is to remove the bad (homework assignment) version of factorial from your search path. It will be inefficient anyway compared to the supplied version of factorial that MATLAB already comes with, and your version is not properly vectorized. At the very least, move it to another place far down on your search path so that matlab will use the correct version of factorial instead. John |
|
| 16 Jun 2009 | Analytical intersection area between two circles Compute the intersection area between 2 circles. The function allows to evaluate the intersection ar | D'Errico, John | Hmm. I just noticed that my followup rating was shadowed out, and the original one remains. I'll see if I can't have the old rating removed. |
|
| 14 Jun 2009 | mregress Performs multiple linear regression. | D'Errico, John | Ok. I had to look here. The problem is, the formulas are poor. Her is a paste of part of the code: =========================================== % Solve for the regression coefficients using ordinary least-squares
if (INTCPT == 1)
=========================================== See that the normal equations are used, as well as (shudder) inv. This is the WRONG way to solve this problem. It is a mistake that many novices to regression make. It is a mistake that many people who have done regression for years make, simply because they have never been told they are doing it the wrong way. It is a mistake that many authors of statistics texts make, simply because nobody has ever told them either. The normal equations, are C = inv(X' * X) * X' * Y are ONE way to solve the over-determined problem X*C = Y They are a POOR solution. First of all, this formula uses inv. Somewhat better would be to write it as C = (X' * X) \ (X' * Y) The use of backslash here avoids the use of inv. But it still fails because the normal equations are still involved, just in another form. Look at a better solution. C = X \ Y; This uses the built-in backslash to solve the problem. How would backslash solve it internally? (I'll assume that no pivoting is done to make things simple.) Suppose we choose to form a QR factorization of X. X = Q*R Here, Q is an orthogonal matrix, R is upper triangular. (Again, I'll ignore any pivoting considerations.) Our problem is now to solve Q*R*C = Y Since Q is orthogonal, then we can left multiply by the transpose of Q, to reduce the problem to R*C = Q' * Y Remember, Q was orthogonal, so that (Q' * Q) was an identity matrix. We now solve for C as C = R \ (Q' * Y) Backslash is smart here. It knows that R is upper triangular, so backslash will do the solution as a back-substitution, an O(n^2) operation. No explicit inverse is required at all. Again, backslash does all of this internally. But it is a far better solution than the normal equations. What happens in the normal equations to make that solution poor? The problem is the condition number of the matrix. If your matrix is at all poorly conditioned (something that is very common in multiple regression problems) then when you form (X'*X), the condition number of the matrix will be SQUARED. So the normal equations are a terribly poor way to solve the system posed. We can get the condition number of a matrix from the cond function. It is defined as the ratio of the largest to the smallest singular value of the matrix. When this number is large, then you should expect to loose accuracy in the solution of your equations. Look at a simple, random matrix. (By the way, the condition number will be small here. But most regression problems have a much larger condition number.) X = rand(10,5);
See that the largest and smallest singular values would have a ratio of a wee bit over 10. cond(X)
What happens when I form X'*X? cond(X'*X)
Yes. As expected, the condition number was squared. Try this again with a more typical regression problem. Here, just a simple 10th order polynomial in one variable. x = rand(20,1);
cond(X)
cond(X'*X)
See that the condition number of (X' * X) is roughly 1e15. In fact, if we use the normal equations to solve this problem, the coefficients will have few correct digits in them, because (X' * X) is nearly a numerically singular matrix. Now try an 11th order polynomial. Make some random data too. Since the actual numbers don't matter, just see if there is a difference in the solution. X = bsxfun(@power,x,[11 10 9 8 7 6 5 4 3 2 1 0]);
X\Y
Now compare that to the normal equations. See that inv complains here, because the system is now effectively singular. inv(X'*X)*X'*Y
See that many of the resulting coefficients do not even agree in a single digit of the result. The problem is when we formed (X' * X). |
|
| 09 Jun 2009 | Skin Detection in RGB images Program to detect skin regions in an RGB image. | D'Errico, John | Impatient are you? 145 downloads with no comment is not at all uncommon. Given that this has been on the FEX for all of 9 whole days? Relax. Stop drinking so much coffee. |
|
| 29 May 2009 | optimize optimize (non)linear (in)equality constrained functions with FMINSEARCH | D'Errico, John | Well done. |
|
| 27 May 2009 | Designing serial method computing of cardiac partial differential equations VI1,on 1D,2D and 3D To solve cardiac PDe equations i propose *-serie,or x-serie solution V(x,t)=no(t)+n1(t)x+n2(t)*x*x | D'Errico, John | Not MATLAB related. No code. No hints about what this is. Unreadable description. Useless to anyone in the entire world. |
|
| 20 May 2009 | Polyfitn N-d polynomial regression model | D'Errico, John | Agustin found his problem - this is always a path problem. Something got put where MATLAB does not look. |
|
| 18 May 2009 | choose.m compute number of ways of choosing m objects from n distinct objects | D'Errico, John | Why submit functions that do the same thing that existing utilities in MATLB do better? nchoosek does the same computation, but does it better, since nchoosek offers more functionality, and already exists in every copy of matlab for as far back as I can recall. This has little in the way of error checking. No H1 line. Admittedly, if you are trying to compute large binomial coefficients, you will be far better off using a variety of other tools. But this one fails when it needs not do so. >> nchoosek(1000,800)
>> choose(1000,800)
|
|
| 17 May 2009 | show_colors.m generates a labeled plot showing all colors recognized by RGB.m | D'Errico, John | Perhaps fig may be available on the internet, but where? Give a link. Is it really that difficult to be friendly to those who might try to download your work? Is it really necessary for others to be forced to do these searches to find out how to use your code? IF your code uses something that will not be available in MATLAB proper, then tell people about it. If it uses a toolbox function, then show that toolbox as a dependency. If it uses another FEX submission, then tell people about it. Put a link in your code that shows where to find it. Do NOT post code that has dependencies without that information, as this then is non-working code. Non-working code is by definition poor. |
|
| 15 May 2009 | Don't let that INV go past your eyes; to solve that system, FACTORIZE! A simple-to-use object-oriented method for solving linear systems and least-squares problems. | D'Errico, John | Absolutely splendid! Many thanks to Tim for writing this. A great title too. |
|
| 14 May 2009 | hostname.m Report the name of computer on which Matlab is currently running. | D'Errico, John | Macs will return either of MACI or MAC, depending upon whether the system is an older or new inter Mac system. The help for computer.m states this fact. So hostname still fails to work on older MAC systems, although it will probably now run on intel based macs. Also, this submission too has the flaw that it has a blank comment line BEFORE what serves as an H1 line. Since that disables lookfor, it is a flaw, and a silly one. I'm not at all sure why the author insists on doing this. It is unfriendly to disable a very useful feature of MATLAB for absolutely no good reason. |
|
| 14 May 2009 | Percent Done Displays the percentage done of a job to the command line interface. | D'Errico, John | Fixed the problems I saw. It does what it claims to do. It is faster now too, taking a much smaller chunk of time away from your own computations. Whereas the last time I did this test, it too almost 10 seconds just to do the progress report, this test is down to only about a second. tic,for i = 1:10000,perccount(i,10000),end,toc
Perccount is not infinitely fast, but waitbar will be just as slow, if not worse. This begs the question of just how slow? h = waitbar(0,'I''m busy!');
Yes, just 10000 calls to waitbar, with nothing else in the loop, took over 2 minutes! So, for the person who will have wanted to ask, "Why not just use waitbar?" this is a good reason. All of those calls to update a slider bar take considerable time. Yes, I'll admit that more careful coding, calling waitbar only infrequently in that loop would have been faster. So, h = waitbar(0,'I''m busy!');
Elapsed time is 1.280422 seconds. So for the person who does not wish to write the extra code to make waitbar function passably, or for the person who would actually prefer to have progress reported in the command line, perccount is a very viable option. My thanks for fixing this. Well done. |
|
| 14 May 2009 | logarithmic rounding does logarithmic rounding using any of 7 modes | D'Errico, John | I see that the code has been updated to provide vectorized operation. As always, I should consider raising my rating. First, I looked for an H1 line. No. Still a blank line as the very first line of the help. This is utterly silly, since the second line of the help is a perfectly serviceable H1 line. This disables lookfor. Interestingly, the help facility that automatically builds a contents file for directories using the first lines of the help blocks finds roundl. But lookfor is itself disabled here. So this problem has not been repaired. How bout the vectorized question? x = rand(1,5)*100
roundl(x)
By default, roundl uses a rounding strategy, i.e., a rounding mode of 'n'. But it can round up or down, more like a ceiling or floor operation then. All of these operations work nicely, and a default appears to be supplied for the rounding mode if necessary. Or is it? roundl(x,'n')
roundl(x,'u')
roundl(x,'d')
Now try the code by adding the third argument. The third argument to roundl allows you to control the levels rounded to. roundl(x,'n',7)
No problem so far. But what happens if you don't wish to change the default for the rounding mode, only the default for n? The paradigm in MATLAB when you have variables that will take on a default is if the variable is left empty ([]), then use the default. Does this work? No. This fails. I tried these things: roundl(x,[],7)
roundl(x,'',7)
I even tried this: roundl(x,7)
As I said, this fails to follow the standard MATLAB paradigm for default arguments, which makes the code not very friendly. If you wish to change the rounding level, you must also provide a default for the other parameter. It is a truly silly flaw, because is it really that difficult to write your code as: if (nargin < 2) || isempty(n), mode= 'n'; end versus the line as it was written? if nargin < 2, mode= 'n'; end Sorry, but it is easy to write friendly code, and it is easy to make the code compatible with lookfor. Given this newly found problem, a 3 rating still applies here. |
|
| 14 May 2009 | nextprime For any given number (also vpi numbers), find the next prime number in the sequence of primes. | D'Errico, John | The new release that I just submitted works up to 2^46. Beyond that, vpi numbers will still work, as they have always done. |
|
| 14 May 2009 | nextprime For any given number (also vpi numbers), find the next prime number in the sequence of primes. | D'Errico, John | I do appreciate the review by Dr. Feldman. However, it shows a few basic misunderstandings of MATLAB. They are logical mistakes, but they reflect the point of view of someone who still thinks they are using perhaps C or JAVA or FORTRAN or PYTHON, pick your favorite language other than MATLAB. To start with, he suggests that this code should use integers, to make it faster. The first problem (a truly fundamental one) with that suggestion is that support for operations between integers is limited in matlab. For example, in R7007b of MATLAB (not all users use the latest release, so it makes sense to do this test in a slightly older release anyway) >> N = int64(101);
>> N = int64(101);
See that even the most trivial operations are not supported for int64. Likewise, uint64 is also lacking in support for these operations in this release. >> P = uint64(23);
So it might be a good idea to suggest use of integers. That might possibly have been a good suggestion were we writing in one of those other languages rather than MATLAB. But we are using MATLAB here. Dr. Feldman's suggestion is a practical impossibility when arithmetic for that variable type does not exist. While I admit that that support is present for the INT32 variable type: >> N = int32(101);
>> N = int32(1:10000);
>> N = double(1:10000);
In fact, arithmetic using doubles is FASTER than int32 arithmetic. int64 or uint64 arithmetic is not even defined in the target release, so the speed is incomparable there. ;-) NEXTPRIME does work for my vpi class, in fact, it was originally designed for those tools. However, they are slower to use than the same direct operations on double variables. So there is no reason to force all operations into the vpi class. Worse, this would restrict the utility of this function, forcing the user to have my vpi toolbox installed even to do something as trivial as finding the next prime number above 20. The reason I posted this submission separately from my vpi toolbox is because it has a broader range of utility than for just those people who would work with truly large integers. Finally, the author suggests that NEXTPRIME should support doubles beyond 2^32. The main problem here is with MATLAB again. >> isprime(2^32+1)
The isprime function does not work beyond 2^32, and nextprime uses isprime to make the final determination of whether a candidate number is prime, having weeded out most alternatives with a partial number sieve first. So this operation will fail. One solution here is to convert to my vpi class if the user supplies a number as large or larger than 2^32, testing first to see if the vpi tools are installed. A different solution would be to replace the call to isprime with a custom isprime call. I'll consider these alternatives as options. |
|
| 13 May 2009 | Percent Done Displays the percentage done of a job to the command line interface. | D'Errico, John | Help is reasonable here. No error checks. (sadly, error checks will just slow this down.) An H1 line. This works reasonably if you never call it more than 100 times, and if each pass through the loop is lengthy. The fact is, many loop variables will exceed 100. So I tried this: for i = 1:100000
What you will see is the numbers flickering, so quickly that you cannot read the percentages. Perhaps a better solution is to use a persistent variable, checking to see if there is a reason to update the screen (i.e., the percentage as reported would be different.) Do something like round(100*jj/Imax). If this changes, then and only then do an update to the screen. Don't forget that calling such a function many thousands or even millions of times may add significant additional time to the running of your code. 0.001 seconds for each call adds up when you make that call 1e6 times. Admittedly, this is a problem with waitbar too. In general, it is probably better to restrict calls to any style of progress report to only a few hundred at most. So a test outside of this code may make some sense. Whenever I put in a waitbar, I generally do exactly that. Next, I ran a timing test: tic,for i = 1:10000
Percentage complete: 99%Elapsed time is 9.704124 seconds. First, see that toc has reported its result on the same line as the progress report. This would be irritating, at least to me. Clean up after yourself! Next, it too almost 10 seconds (on my admittedly slow CPU) to just call this function 10000 times. So big loops will indeed add a significant amount of time to your code. Finally, this code takes its second variable with the name "max"! (Even thought the help calls it Imax, the code itself calls the variable max.) While this is in itself not a bug, it is a poor choice. You have superseded the max function. While this code does not try to call max, it is just bad form. Use names for variables that are NOT the names of existing, frequently used, built-in functions. Someone who is reading your code in the future may easily be confused. Worse, suppose you decided to change this code in a way that required you to use the max function itself? A bug waiting to happen. I do appreciate that all waitbars are time hogs, even so, this seems to be worth about a 3 rating. As always, improvements are encouraged. |
|
| 13 May 2009 | hostname.m Report the name of computer on which Matlab is currently running. | D'Errico, John | Despite the claim, this fails to run on all operating systems. Try it on a PPC Mac. No go. unix('hostname') works fairly well if you are on a PPC mac. I've not verified it is correct for an intel mac. |
|
| 12 May 2009 | 3D Least squares polynomial fit in x and y Fit a two dimensional f(x,y) polynomial to sampled x,y,z data triplets | D'Errico, John | But if all you want is to show how to build a simple code, it is still easily done without using the normal equations. For example, given column vectors x, y, z, here is a better solution using only ONE line of code: coef = (bsxfun(@power,x,m(:,1)').*bsxfun(@power,y,m(:,2)'))\z; Better is to include various error checks to make the code friendly, so that one need not worry about whether the user has actually provided a set of column vectors. At least, make it robust, so either row or column vectors will suffice. Even this can be done in one line! coef=(bsxfun(@power,x(:),m(:,1)').*bsxfun(@power,y(:),m(:,2)'))\z(:); Simple scalings can be easily added too, even returning parameter variance estimates in only a few lines. So even without all the bells and whistles, this could/should have been done better. I'll repeat that the normal equations are a poor way to solve linear least squares problems if you are EVER worried that those equations may be poorly conditioned. And polynomial problems are a common source of ill-conditioned linear systems. AVOID the normal equations. Too often they are taught as the way to solve linear least squares problems by someone who does not understand why they are bad. This propagates to their students. You will even find them in books, but just because something is in a book means nothing. You don't need to pass a test on numerical analysis to write a book. |
|
NOTICE: Any content you submit to MATLAB Central, including personal information, is not subject to the protections which may be afforded information collected under other sections of The MathWorks, Inc. Web site. You are entirely responsible for all content that you upload, post, e-mail, transmit or otherwise make available via MATLAB Central. The MathWorks does not control the content posted by visitors to MATLAB Central and, does not guarantee the accuracy, integrity, or quality of such content. Under no circumstances will The MathWorks be liable in any way for any content not authored by The MathWorks, or any loss or damage of any kind incurred as a result of the use of any content posted, e-mailed, transmitted or otherwise made available via MATLAB Central. Read the complete Terms prior to use.
Contact us at files@mathworks.com