MATLAB Answers


Why not use square brackets?

Asked by Chad Greene
on 16 Apr 2012
Latest activity Edited by Stephen Cobeldick on 20 Jun 2019
Accepted Answer by Jan
Why does Matlab suggest that I shouldn't use square brackets unless absolutely necessary? For example, if I type
x = [1:10];
the default code-checking feature suggests "Use of brackets [] is unnecessary. Use parentheses to group, if needed." I realize that the square brackets here are not necessary, but is there some cost to including them? Are parentheses more computationally efficient?

  1 Comment

The best reason for not using square brackets like that: they don't do anything in this situation. It is easy to fill code with plenty of other useless additions that don't do anything, but making code more complex than it needs to be just makes code harder to understand, debug, and maintain:
V = [+(+([+1])):[-(-9)]]
vs. simply
V = 1:9
Don't add anything to your code that does not serve a purpose.


6 Answers

Jan
Answer by Jan
on 26 Jan 2017
Edited by Jan
on 26 Jan 2017
 Accepted Answer

The square brackets defeat an optimization of the bounds checks when the vector is used for indexing:
x = zeros(1, 10000);
tic
for k = 1:10000
   x(1:k) = k;
end
toc

x = zeros(1, 10000);
tic
for k = 1:10000
   x([1:k]) = k; % <- [...] added
end
toc

tic
for k = 1:10000
   v = 1:k;
   x(v) = k;
end
toc
Elapsed time is 0.087256 seconds.
Elapsed time is 0.374184 seconds. !!! Factor 4 !!!
Elapsed time is 0.375447 seconds.
When Matlab accesses an array element, it has to check that the index is inside the allowed range: positive, no larger than the array length, and an integer. It appears that Matlab performs this check in x(1:k) only for the endpoints 1 and k, while in x([1:k]) the test is applied to every element, as in the third case x(v).
A similar effect occurs for logical indexing: Matlab only has to check the size of the index vector once, not each element.
tic
Lv = false(size(x));
for k = 1:10000
   Lv(k) = true;
   x(Lv) = k;
end
toc
Elapsed time is 0.176343 seconds.
It is plausible that this is slower than x(1:k), because the values of the mask Lv have to be examined. But it is much faster than x([1:k]) or the equivalent use of an index vector.



Answer by Richard Brown on 19 Apr 2012

OK, I looked into this a little more rigorously. I thought I'd test James Tursa's suggestion from my previous answer to see if the order of the tests is actually important. So I did 500 repetitions of computing 1:100 100,000 times, with and without enclosing square brackets. I did the experiment twice, once with the square brackets first, once with them second. I performed a two-tailed paired t-test of whether mean(t1 - t2) differs from zero.
n = 100000;
N = 500;
t1 = zeros(1, N);
t2 = zeros(1, N);
for k = 1:N
   tic
   for i = 1:n
      A = [1:100];
   end
   t1(k) = toc;
   tic
   for i = 1:n
      A = 1:100;
   end
   t2(k) = toc;
end
t = mean(t1 - t2) / (std(t1 - t2) / sqrt(N));
For the brackets-first experiment I got a t-value of 1.18, and brackets-second got a t-value of -0.3. The critical t-value for 95% significance in both cases is about +/- 2.2, so in neither case is there a statistically significant difference between brackets and no brackets.
EDIT: As Jan Simon points out in this thread (http://www.mathworks.com/matlabcentral/answers/35972-how-to-best-time-differences-between-function-implementations), A should be cleared each iteration. This makes an enormous difference: suddenly the version with the brackets runs at around half the speed of the version without (t-values around 300), and n needs to come down by two orders of magnitude. The JIT compiler had obviously recognised that it only had to define A once! I'll leave the original code as it is, otherwise things will get confusing!
So, m-lint is correct!
SECOND EDIT
See my comment below for more comments.

  6 Comments

Daniel Shub
on 19 Apr 2012
@Oleg I don't think the overhead of the loop and 1:100 matters, since it should cancel when the difference is taken. Variability in that overhead could, however, be obscuring differences in the timing of []. As for the assumption of normality and the stats, I agree they could be better, but I have yet to see a better approach on Answers. How would you time the difference and assess it for significance? I think I will ask this as a new question.
Oleg Komarov on 20 Apr 2012
I wasn't clear about the overhead of the loop and the 1:100, but Daniel got it: variability in the overheads might be greater than the [] effect. In fact, this is what I get, a huge change in the t-ratios.
Also, as with financial data, the lower the frequency (monthly, quarterly data), the closer to normality... i.e. timing the sum over (1:n) 500 times, instead of timing each of the n iterations individually, does matter for the distributions of t1 and t2.
Richard Brown on 20 Apr 2012
Hi Oleg et al. This problem is getting more and more tricky! Comments:
Firstly, on my system I get no difference at all between using 1:2 and 1:100, the mean simulation time is the same (in the first s.f. at least). I think 1:100 and 1:2 basically have the same overhead - which presumably is the cost of the call to ':'. It is possible that this call has more variability than the call to [], but there's very little we can do to control or measure that. And if that is in fact the case, then the m-lint message is unnecessary. The m-lint message is implying that the cost of the call to [] is significant compared with the cost of ':'.
Secondly, it's essential to clear A after the call to [1:2] or 1:2 -- the JIT optimises away all subsequent calls if you don't do this, so my initial results were not relevant. Essentially it was timing a loop full of no-ops.
Thirdly, if I have no semicolon on the t = ... line, then I get large t-values. If I have a semicolon, then I again get t-values of the order of 1 or less. Not sure what the deal is there. I also observe differences in behaviour between my 64-bit Win7 and Ubuntu installs.
So it is difficult to disentangle these results from the behaviour of the JIT compiler, and presumably the calls to tic and toc.
@Oleg, this is not like financial data. The point I think that you are making is that samples close together in time are correlated, so to get approximately independent samples you need to sample less frequently. These samples should be largely independent, although this assumes a uniform system load during the simulation time (which I'm not going to bother to try to control).
The inner n needs to be large so that the central limit theorem applies - the mean (and hence sum) of the 'n' iterations should be pretty close to normally distributed, and so the distribution of t1 and t2 should be very close to normal, making a paired t-test appropriate.
Conclusions? Not quite sure. I think that there is very little difference between [] and not, but it's pretty dependent on the JIT.



Jan
Answer by Jan
on 16 Apr 2012

1:100 is already a vector. Additional square brackets concatenate its elements into a vector again; this does not change the data at all, but it costs time.
If grouping is required, parentheses are more efficient because they cost no runtime: they are handled when the M-file is parsed.
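A small illustration of this grouping point (the values are made up; both lines produce the same result, but only the second involves a concatenation call at runtime):
```matlab
y = 2 * (1:5);   % parentheses group at parse time: no runtime cost
z = 2 * [1:5];   % same result, but [] performs a (pointless) concatenation
isequal(y, z)    % true - the brackets change nothing except the work done
```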
[EDITED]:
The square bracket operator can be overloaded if the contents contain user-defined objects, so the JIT has to hesitate before "optimizing it away". Imagine:
a = 3; b = 10;
for i = 1:100
   if rand > 0.5
      eval('b = myStrangeUserDefinedObject'); % Don't do this!
   end
   v = [a:b]; % ? Is [.] overloaded now ?
end
Another idea: the runtime difference is small but measurable, and it might vanish with some JIT versions. But the code is cleaner and possibly easier to debug if the unnecessary brackets are omitted. Compare:
x = a:b; % Obviously clean
y = [a:b]; % Obviously or not?!
z = [[a:b]]; % Obviously messy
The last line forces the reader to think twice, and some doubts will remain, while the first line is perfectly clear. The intermediate case should also catch the reader's attention; therefore I prefer a:b for reasons of simplicity.

  4 Comments

That's what I thought would happen too (given that mlint can spot it) -- the timing results surprised me.
James Tursa
on 17 Apr 2012
Maybe try the loops in reverse order (no [ ] first, then [ ]) and see if t1-t2 is always negative to support that it is in fact the brackets making the difference and not just the order of the loops.
eval should quite simply be deprecated. For the record, my preferences are the same. I really don't like:
[a:b]
If you must group them for visual effect, use parentheses. [] should only be used for concatenations.
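To illustrate that last point, a few cases where [] genuinely concatenates (example values chosen arbitrarily):
```matlab
v = [1:3, 10, 20:22];  % horizontal concatenation: 1 2 3 10 20 21 22
M = [1 2; 3 4];        % building a matrix from rows
s = ['ab', 'cd'];      % char concatenation: 'abcd'
```
Here the brackets do real work, so mlint raises no warning.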



Jan
Answer by Jan
on 9 Aug 2017

Using square brackets because they look Matlab-ish and seem to have something to do with vectors is cargo cult programming, see https://en.wikipedia.org/wiki/Cargo_cult_programming. [] is the concatenation operator and nothing else, and that is why the corresponding warnings appear in the editor.
While the overhead of this call is really tiny, it is valuable and important to be aware of cargo cults and to clean up one's programming techniques. See also: Wiki: Programming Anti-Patterns.



Answer by Richard Brown on 16 Apr 2012

A quick test reveals that there is a small cost to including them:
tic
for i = 1:1000000
   A = [1:100];
end
t1 = toc;

tic
for i = 1:1000000
   A = 1:100;
end
t2 = toc;
disp(t1 - t2)
The displayed number is always positive. My guess is that when Matlab sees the brackets, it needs to determine whether a call to horzcat or vertcat is required.
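As an aside, the dispatch to horzcat is observable with a user-defined class. A minimal sketch (the class name chatty is made up for illustration; this goes in its own file chatty.m):
```matlab
% chatty.m - hypothetical class whose horzcat announces itself
classdef chatty
   methods
      function obj = horzcat(obj, varargin)
         disp('horzcat was called')  % [c, c] dispatches here
      end
   end
end
```
With c = chatty, evaluating x = [c, c] displays the message, which is why the brackets cannot always be optimized away when objects may be involved.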



Jan
Answer by Jan
on 21 Aug 2017
Edited by Jan
on 21 Aug 2017

An equivalent effect occurs in for loops:
v = 1:1e5;
r = zeros(1, 1e5);
tic;
for loop = 1:1000
   for k = v % Loop over pre-defined vector
      r(k) = k;
   end
end
toc

r = zeros(1, 1e5);
tic;
for loop = 1:1000
   for k = 1:1e5 % Loop over vector defined by limits
      r(k) = k;
   end
end
toc
Elapsed time is 3.304159 seconds.
Elapsed time is 0.700051 seconds.
It seems the JIT can handle for k = a:b much more efficiently. The advantage is even greater if the time to create v = 1:1e5 is considered as well.
