ans =
Using MATLAB to find a generative equation for a sequence
This stems purely from some play on my part. Suppose I asked you to work with the sequence formed as 2*n*F_n + 1, where F_n is the n'th Fibonacci number? Part of me would not be surprised to find there is nothing simple we could do. But, then it costs nothing to try, to see where MATLAB can take me in an explorative sense.
n = sym(0:100).';
Fn = fibonacci(n);
Sn = 2*n.*Fn + 1;
Sn(1:10) % A few elements
For kicks, I tried asking ChatGPT. Giving it nothing more than the first 20 members of thse sequence as integers, it decided this is a Perrin sequence, and gave me a recurrence relation, but one that is in fact incorrect. Good effort from the Ai, but a fail in the end.
Is there anything I can do? Try null! (Look carefully at the array generated by Toeplitz. It is at least a pretty way to generate the matrix I needed.)
X = toeplitz(Sn,[1,zeros(1,4)]);
rank(X(5:end,:))
Hmm. So there is no linear combination of those columns that yields all zeros, since the resulting matrix was full rank.
X = toeplitz(Sn,[1,zeros(1,5)]);
rank(X(6:end,:))
But if I take it one step further, we see the above matrix is now rank deficient. What does that tell me? It says there is some simple linear combination of the columns of X(6:end,:) that always yields zero. The previous test tells me there is no shorter constant coefficient recurrence releation, using fewer terms.
null(X(6:end,:))
Let me explain what those coefficients tell me. In fact, they yield a very nice recurrence relation for the sequence S_n, not unlike the original Fibonacci sequence it was based upon.
S(n+1) = 3*S(n) - S_(n-1) - 3*S(n-2) + S(n-3) + S(n-4)
where the first 5 members of that sequence are given as [1 3 5 13 25]. So a 6 term linear constant coefficient recurrence relation. If it reminds you of the generating relation for the Fibonacci sequence, that is good, because it should. (Remember I started the sequence at n==0, IF you decide to test it out.) We can test it out, like this:
SfunM = memoize(@(N) Sfun(N));
SfunM(25)
2*25*fibonacci(sym(25)) + 1
And indeed, it works as expected.
function Sn = Sfun(n)
switch n
case 0
Sn = 1;
case 1
Sn = 3;
case 2
Sn = 5;
case 3
Sn = 13;
case 4
Sn = 25;
otherwise
Sn = Sfun(n-5) + Sfun(n-4) - 3*Sfun(n-3) - Sfun(n-2) +3*Sfun(n-1);
end
end
A beauty of this, is I started from nothing but a sequence of integers, derived from an expression where I had no rational expectation of finding a formula, and out drops something pretty. I might call this explorational mathematics.
The next step of course is to go in the other direction. That is, given the derived recurrence relation, if I substitute the formula for S_n in terms of the Fibonacci numbers, can I prove it is valid in general? (Yes.) After all, without some proof, it may fail for n larger than 100. (I'm not sure how much I can cram into a single discussion, so I'll stop at this point for now. If I see interest in the ideas here, I can proceed further. For example, what was I doing with that sequence in the first place? And of course, can I prove the relation is valid? Can I do so using MATLAB?)
(I'll be honest, starting from scratch, I'm not sure it would have been obvious to find that relation, so null was hugely useful here.)
13 Comments
n = sym(0:100).';
Fn = fibonacci(n);
Sn = 2*n.*Fn + 1;
null(Sn((hankel(1:6,6:11))))
Also, there are an infinite number of higher order difference equations that can generate this sequence (though I'd imagine the lowest order equation is of most interest). For example
c = null(Sn((hankel(1:7,7:13))))
The first column of coefficents is what we already had, and the second defines a 6th order difference equations that can generate Sn. The equations can be linearly combined as well.
s = @(n) Sn(n);
eq1 = @(n) c(1:end-1,1).'*s(n:n+5); % first equation
eq2 = @(n) c(1:end,2).'*s(n:n+6); % second equation
eq3 = @(n) 5*eq1(n) + 6*eq2(n); % linear combination
% test
[eq1(23) eq2(23) eq3(23)]
[eq1(84) eq2(84) eq3(84)]
I can post an alternative approach to deriving the difference equation for s(n) and a closed-form expression for s(n) if appropriate for this discussion ....
As @Adam Danz said, "There are several MATLAB functions here that I don't often cross paths with, so I appreciate the reminders and introductions!"
One of the things I hope to do in this sequence of discussions I plan to post, is to teach more people about some nifty things we can do In MATLAB, sometimes using tools you have never seen, or never thought of in some context, and maybe using MATLAB on problems you never thought possible.
In this post, for example, we see a tool from linear algebra like Toeplitz used to create a matrix, and then rank and null applied, all in the context of what is really a number theoretic problem. Personally, this is something I love about mathematics, that sometimes different branches of mathematics can come together in surprising ways.
Yep. In fact, it will have faster growth than any polynomial function of n, since we know the Fibonacci numbers grow exponentially, if I recall as O(phi^n/sqrt(5)) where phi comes from the golden ratio.
phi = (sqrt(5) + 1)/2
And any exponential function will eventually outpace any polynomial. So you know no polynomial can approximate it.
By that logic, we can even find a good approximation for the sequence, in terms of phi, valid for at least larger values of n.
n = (50:60).';
Sn = 2*n.*double(fibonacci(sym(n))) + 1;
Snhat = 2*n.*phi.^n/sqrt(5) + 1;
format long g
[Sn,Snhat]
I'm almost a little surprised Alpha did not try a model of that form.
Great article, @John D'Errico! There are several MATLAB functions here that I don't often cross paths with, so I appreciate the reminders and introductions!
Out of curiosity, I asked Wolfram Alpha, "What sequence is this? [1, 3, 5, 13, 25, 51, 97, 183, 337, 613]". It generated a closed-form equation, which I translated into MATLAB as follows:
f = @(n)(47*n^8 - 2000*n^7 + 36596*n^6 - 366758*n^5 + 2184833*n^4 - 7816130*n^3 + 16124124*n^2 - 17294832*n + 7000560)/(2520*(13*n - 66));
I didn't bother vectorizing it, so I tested it in a loop. The results were as expected for the training set
for i = 1:10
fprintf('%g ',f(i))
end
However, when testing it beyond the initial set:
for i = 11:15
fprintf('%g ',f(i))
end
The results did not match the expected values (1101, 1959, 3457, 6059, 10557). This isn't surprising with fitted polynomials.
The obvious question is, can I prove in MATLAB, what I was able to derive from data? That is, I can assert that the recurrence relation I derived is valid for n up to 100. But is it always valid? Logically, it seems likely true, but is it ALWAYS valid?
What can I do here? First, reduce everything to Fibonacci numbers. That is, rewrite the recurrence relation into the same relation, but in terms of the Fibonacci sequence. Given the relation:
S(n) = 2*n*F(n) + 1
and the recurrence:
S(n) = S(n-5) + S(n-4) - 3*S(n-3) - S(n-2) +3*S(n-1)
First, we can simpy toss away the additive constant, since the coefficients in the recurrence relation sum to zero. If it is valid, the recurrence relation would be equally valid for
S(n) = a*n*F(n) + b
where a and b are ANY finite constants! Remember, this is a LINEAR recurrence. As such, I'll arbitrarily choose a=1, and b=0, only to make things easier to follow below.
Now rewrite the recurrence as:
-n*F(n) + 3*(n-1)*F(n-1) - (n-2)*F(n-2) - 3*(n-3)*F(n-3) + (n-4)*F(n-4) + (n-5)*F(n-5) = 0
Finally, convert this into matrix form, where I will store only the coefficients of the Fibonacci numbers. That will look like this:
syms n
A = [[-n, 3*(n-1), -(n-2), -3*(n-3), (n-4), (n-5)];toeplitz([1;0;0;0],[1 -1 -1 0 0 0])]
Look carefully at the matrix A. You should see that rows 2 through 5 represent the basic Fibonacci recurrence. These rows merely reflect that any Fibonacci number can always be written as the sum of the previous two Fibonacci numbers.
The first row is the essence of the recurrence. Again, I've chosen to discard the additive and multiplicative constants to make it easier to read. But that changes nothing in any substantial way.
Next, comput the rank of the subarray, composed of rows 2 through 5,
rank(A(2:5,:))
It tells us no linear combination of those rows will yield a zero vector. But if we compute the rank of the entire matrix A
rank(A)
The rank is still 4. We learn by appending the first row, it can be represented as a sum of the other 4 rows, since the overall rank is still 4.
The conclusion is, the linear recurrence relation for S_n is valid, for any n.
I tinkered with some of the provided code. Rather than calculate a specific value in the sequence, I wrote functions to return the entire sequence up to the specified integer, n (1:n). I observed the following:
- Using symbolic variables is about three orders of magnitude slower than using regular doubles (tested for n = 12). I am curious why you chose to use them. Given their slowness, I commented that out for subsequent testing.
- Memoized functions appear to not work with feval, which I used in a driver function to run each variation as a test. (The feval error stated that the memoized function was unrecognized.)
- Using a vectorized version of the function is significantly faster than the recursive Sfun. For values of n = 20, 25, and 30, the speed-up factor was about 10, 100, and 2000+, and Sfun runtimes were on the order of 0.01s, 0.1s, and 2–3s, respectively. Testing values beyond n = 30 for the recursive function was impractical. On the other hand, the vectorized function (which is included below) consistently completed on the order of 0.001s, even for n = 100. (Preallocation of Sn for larger values of n may slightly improve this function, though n is never expected to be very large.)
function [Sn] = Sfun_vectorized(n)
Sn = [1, 3, 5, 13, 25];
if n < 5
Sn = Sn(1:(n+1));
else
coeff = [1; 1; -3; -1; 3];
for ii = 5:n
Sn(end+1) = Sn((end-4):end) * coeff; %#ok<AGROW>
end
end
end