Tips & Tricks
Follow


John D'Errico

Using MATLAB to find a generative equation for a sequence

John D'Errico on 31 Jul 2024 (Edited on 5 Aug 2024)
Latest activity Reply by Paul on 10 Aug 2024

This stems purely from some play on my part. Suppose I asked you to work with the sequence formed as 2*n*F_n + 1, where F_n is the n'th Fibonacci number? Part of me would not be surprised to find there is nothing simple we could do. But, then it costs nothing to try, to see where MATLAB can take me in an explorative sense.
n = sym(0:100).';
Fn = fibonacci(n);
Sn = 2*n.*Fn + 1;
Sn(1:10) % A few elements
ans = 
For kicks, I tried asking ChatGPT. Giving it nothing more than the first 20 members of thse sequence as integers, it decided this is a Perrin sequence, and gave me a recurrence relation, but one that is in fact incorrect. Good effort from the Ai, but a fail in the end.
Is there anything I can do? Try null! (Look carefully at the array generated by Toeplitz. It is at least a pretty way to generate the matrix I needed.)
X = toeplitz(Sn,[1,zeros(1,4)]);
rank(X(5:end,:))
ans = 5
Hmm. So there is no linear combination of those columns that yields all zeros, since the resulting matrix was full rank.
X = toeplitz(Sn,[1,zeros(1,5)]);
rank(X(6:end,:))
ans = 5
But if I take it one step further, we see the above matrix is now rank deficient. What does that tell me? It says there is some simple linear combination of the columns of X(6:end,:) that always yields zero. The previous test tells me there is no shorter constant coefficient recurrence releation, using fewer terms.
null(X(6:end,:))
ans = 
Let me explain what those coefficients tell me. In fact, they yield a very nice recurrence relation for the sequence S_n, not unlike the original Fibonacci sequence it was based upon.
S(n+1) = 3*S(n) - S_(n-1) - 3*S(n-2) + S(n-3) + S(n-4)
where the first 5 members of that sequence are given as [1 3 5 13 25]. So a 6 term linear constant coefficient recurrence relation. If it reminds you of the generating relation for the Fibonacci sequence, that is good, because it should. (Remember I started the sequence at n==0, IF you decide to test it out.) We can test it out, like this:
SfunM = memoize(@(N) Sfun(N));
SfunM(25)
ans = 3751251
2*25*fibonacci(sym(25)) + 1
ans = 
3751251
And indeed, it works as expected.
function Sn = Sfun(n)
switch n
case 0
Sn = 1;
case 1
Sn = 3;
case 2
Sn = 5;
case 3
Sn = 13;
case 4
Sn = 25;
otherwise
Sn = Sfun(n-5) + Sfun(n-4) - 3*Sfun(n-3) - Sfun(n-2) +3*Sfun(n-1);
end
end
A beauty of this, is I started from nothing but a sequence of integers, derived from an expression where I had no rational expectation of finding a formula, and out drops something pretty. I might call this explorational mathematics.
The next step of course is to go in the other direction. That is, given the derived recurrence relation, if I substitute the formula for S_n in terms of the Fibonacci numbers, can I prove it is valid in general? (Yes.) After all, without some proof, it may fail for n larger than 100. (I'm not sure how much I can cram into a single discussion, so I'll stop at this point for now. If I see interest in the ideas here, I can proceed further. For example, what was I doing with that sequence in the first place? And of course, can I prove the relation is valid? Can I do so using MATLAB?)
(I'll be honest, starting from scratch, I'm not sure it would have been obvious to find that relation, so null was hugely useful here.)
Paul
Paul on 10 Aug 2024
Similar approach using hankel.
n = sym(0:100).';
Fn = fibonacci(n);
Sn = 2*n.*Fn + 1;
null(Sn((hankel(1:6,6:11))))
ans = 
Also, there are an infinite number of higher order difference equations that can generate this sequence (though I'd imagine the lowest order equation is of most interest). For example
c = null(Sn((hankel(1:7,7:13))))
c = 
The first column of coefficents is what we already had, and the second defines a 6th order difference equations that can generate Sn. The equations can be linearly combined as well.
s = @(n) Sn(n);
eq1 = @(n) c(1:end-1,1).'*s(n:n+5); % first equation
eq2 = @(n) c(1:end,2).'*s(n:n+6); % second equation
eq3 = @(n) 5*eq1(n) + 6*eq2(n); % linear combination
% test
[eq1(23) eq2(23) eq3(23)]
ans = 
[eq1(84) eq2(84) eq3(84)]
ans = 
Paul
Paul on 8 Aug 2024 (Edited on 8 Aug 2024)
I can post an alternative approach to deriving the difference equation for s(n) and a closed-form expression for s(n) if appropriate for this discussion ....
John D'Errico
John D'Errico on 8 Aug 2024
I'd certainly be interested in seeing an alternative. My personal interest in this was merely the pretty way it drops out, a bit unexpectedly, with so little effort invested on my part.
Paul
Paul on 8 Aug 2024
Difference equation that defines s(n) in terms of f(n), where f(n) is the Fibonacci series.
syms n integer
syms f(n) s(n)
eqn = s(n) == 2*n*f(n) + 1;
Find the z-transform of s(n), defined in terms of the F(z), the z-transform of f(n)
eqn = ztrans(eqn);
syms z S(z) F(z)
eqn = subs(eqn,[ztrans(s(n),n,z),ztrans(f(n),n,z)],[S(z),F(z)]);
S(z) = rhs(eqn)
S(z) = 
Find F(z), use the IC's of f(n)
eqn = f(n+2) == f(n) + f(n+1);
eqn = ztrans(eqn,n,z);
eqn = subs(eqn,ztrans(f(n),n,z),F(z));
F(z) = simplify(rhs(isolate(eqn,F(z))));
F(z) = subs(F(z),[f(0) f(1)],[0 1])
F(z) = 
Sub F(z) into the definition of S(z) and simplify
S(z) = subs(S(z));
S(z) = simplify(S(z));
[num,den] = numden(S(z));
S(z) = expand(num)/expand(den)
S(z) = 
Rant: Matlab's insistence on putting negative signs on the leading terms of the numerator and denominator polynomials makes my eyes hurt.
The coefficients of the denominator of S(z) immediately tells us that s(n) satisfies:
s(n+5) - 3*s(n+4) + s(n+3) + 3*s(n+2) - s(n+1) - s(n) = 0
The initial conditions of this difference equation that would be needed to iteratively generate the solution can be found by (or from just iterating the first few terms of the equation for s(n) in terms of f(n)).
cden = coeffs(den,z,'all');
cnum = coeffs(num,z,'all');
s0 = tril(toeplitz(cden(1:5))) \ cnum(1:5).'
s0 = 
Amazingly, iztrans finds a result even with that fifth order denominator, though the result looks kind of weird with the asin(1j/2) term
s(n) = simplify(iztrans(S(z),z,n))
s(n) = 
Can simplify to an expression that has all real terms
assume(n >= 0)
s(n) = simplify(real(rewrite(s(n),'exp')),100);
[num,den] = numden(s(n));
s(n) = simplify(num)/den
s(n) = 
Verify (first 101 terms)
n = sym(0:100).';
Fn = fibonacci(n);
Sn = 2*n.*Fn + 1;
sn = simplify(s(n),100);
all(isAlways(Sn == sn))
ans = logical
1
John D'Errico
John D'Errico on 9 Aug 2024
Very pretty. I enjoy seeing how fundamentally different approaches to a problem still arise at the same conclusion.
John D'Errico
John D'Errico on 2 Aug 2024 (Edited on 2 Aug 2024)
As @Adam Danz said, "There are several MATLAB functions here that I don't often cross paths with, so I appreciate the reminders and introductions!"
One of the things I hope to do in this sequence of discussions I plan to post, is to teach more people about some nifty things we can do In MATLAB, sometimes using tools you have never seen, or never thought of in some context, and maybe using MATLAB on problems you never thought possible.
In this post, for example, we see a tool from linear algebra like Toeplitz used to create a matrix, and then rank and null applied, all in the context of what is really a number theoretic problem. Personally, this is something I love about mathematics, that sometimes different branches of mathematics can come together in surprising ways.
Adam Danz
Adam Danz on 2 Aug 2024
Reminder to readers that you can follow contributors by clicking "follow" at the top of thier profile!
John D'Errico
John D'Errico on 1 Aug 2024 (Edited on 1 Aug 2024)
Yep. In fact, it will have faster growth than any polynomial function of n, since we know the Fibonacci numbers grow exponentially, if I recall as O(phi^n/sqrt(5)) where phi comes from the golden ratio.
phi = (sqrt(5) + 1)/2
phi = 1.6180
And any exponential function will eventually outpace any polynomial. So you know no polynomial can approximate it.
By that logic, we can even find a good approximation for the sequence, in terms of phi, valid for at least larger values of n.
n = (50:60).';
Sn = 2*n.*double(fibonacci(sym(n))) + 1;
Snhat = 2*n.*phi.^n/sqrt(5) + 1;
format long g
[Sn,Snhat]
ans = 11x2
1.0e+00 * 1258626902501 1258626902501 2077231129549 2077231129549 3426933130297 3426933130297.01 5651526864339 5651526864339.01 9316897697377 9316897697377.02 15354224868951 15354224868951 25295360576305 25295360576305 41659623762469 41659623762469.1 68589260665965 68589260665965.1 112893199072839 112893199072839
<mw-icon class=""></mw-icon>
<mw-icon class=""></mw-icon>
I'm almost a little surprised Alpha did not try a model of that form.
Adam Danz
Adam Danz on 1 Aug 2024
Great article, @John D'Errico! There are several MATLAB functions here that I don't often cross paths with, so I appreciate the reminders and introductions!
Out of curiosity, I asked Wolfram Alpha, "What sequence is this? [1, 3, 5, 13, 25, 51, 97, 183, 337, 613]". It generated a closed-form equation, which I translated into MATLAB as follows:
f = @(n)(47*n^8 - 2000*n^7 + 36596*n^6 - 366758*n^5 + 2184833*n^4 - 7816130*n^3 + 16124124*n^2 - 17294832*n + 7000560)/(2520*(13*n - 66));
I didn't bother vectorizing it, so I tested it in a loop. The results were as expected for the training set
for i = 1:10
fprintf('%g ',f(i))
end
1 3 5 13 25 51 97 183 337 613
However, when testing it beyond the initial set:
for i = 11:15
fprintf('%g ',f(i))
end
1125.94 2113 4028.57 7679.97 14411.2
The results did not match the expected values (1101, 1959, 3457, 6059, 10557). This isn't surprising with fitted polynomials.
John D'Errico
John D'Errico on 1 Aug 2024
The obvious question is, can I prove in MATLAB, what I was able to derive from data? That is, I can assert that the recurrence relation I derived is valid for n up to 100. But is it always valid? Logically, it seems likely true, but is it ALWAYS valid?
What can I do here? First, reduce everything to Fibonacci numbers. That is, rewrite the recurrence relation into the same relation, but in terms of the Fibonacci sequence. Given the relation:
S(n) = 2*n*F(n) + 1
and the recurrence:
S(n) = S(n-5) + S(n-4) - 3*S(n-3) - S(n-2) +3*S(n-1)
First, we can simpy toss away the additive constant, since the coefficients in the recurrence relation sum to zero. If it is valid, the recurrence relation would be equally valid for
S(n) = a*n*F(n) + b
where a and b are ANY finite constants! Remember, this is a LINEAR recurrence. As such, I'll arbitrarily choose a=1, and b=0, only to make things easier to follow below.
Now rewrite the recurrence as:
-n*F(n) + 3*(n-1)*F(n-1) - (n-2)*F(n-2) - 3*(n-3)*F(n-3) + (n-4)*F(n-4) + (n-5)*F(n-5) = 0
Finally, convert this into matrix form, where I will store only the coefficients of the Fibonacci numbers. That will look like this:
syms n
A = [[-n, 3*(n-1), -(n-2), -3*(n-3), (n-4), (n-5)];toeplitz([1;0;0;0],[1 -1 -1 0 0 0])]
A = 
Look carefully at the matrix A. You should see that rows 2 through 5 represent the basic Fibonacci recurrence. These rows merely reflect that any Fibonacci number can always be written as the sum of the previous two Fibonacci numbers.
The first row is the essence of the recurrence. Again, I've chosen to discard the additive and multiplicative constants to make it easier to read. But that changes nothing in any substantial way.
Next, comput the rank of the subarray, composed of rows 2 through 5,
rank(A(2:5,:))
ans = 4
It tells us no linear combination of those rows will yield a zero vector. But if we compute the rank of the entire matrix A
rank(A)
ans = 4
The rank is still 4. We learn by appending the first row, it can be represented as a sum of the other 4 rows, since the overall rank is still 4.
The conclusion is, the linear recurrence relation for S_n is valid, for any n.
goc3
goc3 on 1 Aug 2024
I made a few changes to the first row in A (the symbolic coefficients) one at a time to show that any such change increased the rank from 4 to 5. Each time that held true, thereby disproving each modified coefficient vector.
This is an insightful perspective. Thanks.
goc3
goc3 on 31 Jul 2024 (Edited on 31 Jul 2024)
I tinkered with some of the provided code. Rather than calculate a specific value in the sequence, I wrote functions to return the entire sequence up to the specified integer, n (1:n). I observed the following:
  1. Using symbolic variables is about three orders of magnitude slower than using regular doubles (tested for n = 12). I am curious why you chose to use them. Given their slowness, I commented that out for subsequent testing.
  2. Memoized functions appear to not work with feval, which I used in a driver function to run each variation as a test. (The feval error stated that the memoized function was unrecognized.)
  3. Using a vectorized version of the function is significantly faster than the recursive Sfun. For values of n = 20, 25, and 30, the speed-up factor was about 10, 100, and 2000+, and Sfun runtimes were on the order of 0.01s, 0.1s, and 2–3s, respectively. Testing values beyond n = 30 for the recursive function was impractical. On the other hand, the vectorized function (which is included below) consistently completed on the order of 0.001s, even for n = 100. (Preallocation of Sn for larger values of n may slightly improve this function, though n is never expected to be very large.)
function [Sn] = Sfun_vectorized(n)
Sn = [1, 3, 5, 13, 25];
if n < 5
Sn = Sn(1:(n+1));
else
coeff = [1; 1; -3; -1; 3];
for ii = 5:n
Sn(end+1) = Sn((end-4):end) * coeff; %#ok<AGROW>
end
end
end
John D'Errico
John D'Errico on 1 Aug 2024 (Edited on 1 Aug 2024)
I'll admit, Sfun is not fast, and memoizing did not really do much. However, it is not the real thrust of the post, just there to show the recurrence relation is working. I should have done a better job there though, so thanks for greatly improving it.