I was profiling an ODE solver today and found something very strange. The system has 9 states, and computing the 5th state in the ODE right-hand side (RHS) was taking about 100 times longer than computing any other state.
The equation for the 5th state was longer than the others, but not so much longer that it should take 100 times as long to compute. It ran over 4 lines in the editor, so each line except the last ended with a "..." continuation.
I rewrote the equation as several smaller assignments instead of one big assignment and got about a 50 times speedup. That is, I changed

    value(5,:) = big_expression1 ...
               + big_expression2 ...
               + big_expression3 ...
               + big_expression4;

to

    value(5,:) = big_expression1;
    value(5,:) = value(5,:) + big_expression2;
    value(5,:) = value(5,:) + big_expression3;
    value(5,:) = value(5,:) + big_expression4;
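A minimal timing harness for comparing the two forms outside the solver might look like the sketch below. It assumes the four `big_expression` terms can be stood in for by hypothetical elementwise products of rows of a random 9-by-`n` matrix `x`. These are not the real ODE terms, so the sketch only checks that both forms give the same result and times each one. It makes no claim about which will be faster on a given machine.

```matlab
% Hypothetical benchmark: x and the row products below are stand-ins
% for the real big_expression terms, chosen only for illustration.
n = 1e5;                 % hypothetical number of columns
x = rand(9, n);          % hypothetical state matrix, one row per state
value = zeros(9, n);

% Version 1: one statement split over lines with "..." continuations
tic;
for k = 1:100
    value(5,:) = x(1,:).*x(2,:) ...
               + x(3,:).*x(4,:) ...
               + x(6,:).*x(7,:) ...
               + x(8,:).*x(9,:);
end
t_single = toc;
v1 = value(5,:);

% Version 2: accumulate the same terms one assignment at a time
tic;
for k = 1:100
    value(5,:) = x(1,:).*x(2,:);
    value(5,:) = value(5,:) + x(3,:).*x(4,:);
    value(5,:) = value(5,:) + x(6,:).*x(7,:);
    value(5,:) = value(5,:) + x(8,:).*x(9,:);
end
t_split = toc;
v2 = value(5,:);

fprintf('single statement: %.4f s\n', t_single);
fprintf('split statements: %.4f s\n', t_split);
fprintf('max difference:   %g\n', max(abs(v1 - v2)));
```

Because MATLAB evaluates `a + b + c + d` left to right, both versions add the terms in the same order, so the difference printed at the end should be zero or negligible. If the real expressions behave differently, timing them with this same structure is one way to narrow down where the cost comes from.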
Can you think of any reason why the second version should drastically outperform the first?
I don't think it has anything to do with "...". Shot in the dark:
Cache locality? Maybe big_expression(n) and big_expression(n+1) fit in the cache together, whereas evaluating them all in one statement forces values to be moved around in memory.