MATLAB Answers

'ABC'-'A' Considered Harmful

Asked by per isakson
on 5 May 2012
Latest activity Edited by per isakson
on 18 Jul 2016

Is the trick 'ABC'-'A' good programming style?

--- edit ---

"tag: goto" alludes to Edsger Dijkstra's famous letter ;-)

"tag: answer" is a trace of the reason why I submitted the question. I've seen questions by "new to Matlab", which have received answers including not so obvious code with "'ABC'-'A'" embedded for no good reason.

  3 Comments

Jan Simon
on 5 May 2012

The tag "cody" is a revelation. But what do "answer" and "goto" mean in this context?

I have probably used this kind of expression from time to time in replying to people who might not be experienced in MATLAB.

In the cases where I was not adding or subtracting '0', the lack of "good programming style" was likely intentional. The code was provided to prove that the task could be done, but it "felt" likely to me that if plain code had been provided, the user would have copied it for their assignment without attempting to understand it. If they blindly copied the more obscure code instead, any marker who actually read it would immediately know that the person did not write the expression themselves.

@Walter, the hidden message to the teacher is a reason.

5 Answers

Answer by Walter Roberson
on 5 May 2012
 Accepted answer

Adding or subtracting '0' is the most efficient method of converting between decimal-coded binary and character-coded binary.
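
A minimal sketch of the '0' trick and its more explicit equivalent. The variable names are illustrative and not taken from the thread.

```matlab
% Character digits to numeric digits: char minus char yields double.
digitChars = '2012';
digits = digitChars - '0';          % numeric row vector [2 0 1 2]

% Numeric digits back to characters: the sum is double, so wrap it in char().
backToChars = char(digits + '0');   % the char row vector '2012'

% The same conversion with the intent spelled out.
digitsExplicit = double(digitChars) - double('0');
```

Both forms give the same numbers; the second only makes the char-to-double conversion visible to the reader.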

Subtracting 'A' or 'a' (and then adding 10) is a well known and efficient conversion from character-encoded hexadecimal to binary.
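
A rough sketch of that hexadecimal conversion, assuming the input holds only the characters 0-9 and A-F (lower case would need 'a' in place of 'A'). The names `hexChars` and `values` are made up for the example.

```matlab
% Hypothetical example input: upper-case hexadecimal digits only.
hexChars = '1AF';
values = zeros(size(hexChars));

isDigit = hexChars >= '0' & hexChars <= '9';
isAtoF  = hexChars >= 'A' & hexChars <= 'F';

values(isDigit) = hexChars(isDigit) - '0';       % '0'..'9' -> 0..9
values(isAtoF)  = hexChars(isAtoF) - 'A' + 10;   % 'A'..'F' -> 10..15

% Cross-check against the built-in converter, one digit per row.
builtin = hex2dec(hexChars.').';
```

For everyday code `hex2dec` is probably the clearer choice; the arithmetic form mainly shows what the trick does.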

Adding or subtracting ' ' (space) is used often in base64 encoding/decoding (e.g., MIME), though you do need to special-case that binary 0 is coded as period instead of as space.

Adding or subtracting 32 used to be very common magic for converting between upper and lower-case ASCII. So common that it became a problem when dealing with EBCDIC and then later with ISO-8859-* and Unicode. So common that this bug was hard to find, because programmers would read the 32, know that it was upper/lower case conversion, and then be puzzled that letters weren't being converted properly...
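
To illustrate the magic number, here is a small sketch comparing the offset with the built-in `upper`. The example strings are arbitrary.

```matlab
word = 'matlab';

% The magic-number version: only valid for the ASCII letters a-z.
upperByOffset = char(word - 32);

% The same offset written so a reader can see where 32 comes from.
upperByNamedOffset = char(word - ('a' - 'A'));

% The built-in states the intent and leaves non-letters alone.
upperBuiltin = upper(word);

% Digits show how the blind offset goes wrong: '2' - 32 is a control character.
broken = char('r2016a' - 32);
```

The first three results agree on plain lower-case ASCII, while `broken` differs from `upper('r2016a')` because the digits are shifted too.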

The characters '1' through '9' have been in consecutive coding positions since the ITA2 code of 1930. Any program that is not required to work with Baudot or Murray or older codes may assume that for a fact. Any program written for the ASCII / ANSI / ISO / UNICODE line may assume that upper-case "Latin" (English) characters are consecutive, and that the lower-case "Latin" (English) characters are consecutive: this is a fundamental standardization no worse than assuming that all of the MATLAB operator characters are present in the character set. As best I know, MATLAB has never been supported on any EBCDIC-based system on which the assumption is not true.

  0 Comments


Answer by per isakson
on 13 Aug 2012
Edited by per isakson
on 18 Jul 2016

Explicitly converting to numeric before doing the arithmetic is faster. (Real reason: I find 'abc'-'a' confusing. :)

Try

>> [ t1, t2 ] = cssm( 1e5 )
n1==n2 is true
t1 =
    0.3252
t2 =
    0.0838
>>     

where

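    function [ t1, t2 ] = cssm( N )
        % Time N repetitions of implicit (t1) and explicit (t2) char arithmetic.
        str = char(49:120);
        % Implicit conversion: char minus char.
        id1 = tic;
        for ii = 1 : N 
           n1  = str - 'A'; 
        end
        t1  = toc( id1 );
        % Explicit conversion: cast both operands to double first.
        id2 = tic;
        for ii = 1 : N 
           n2  = double( str ) - double('A'); 
        end
        t2  = toc( id2 );
        % Confirm that both variants produce the same numbers.
        if all( n1 == n2 )
            disp( 'n1==n2 is true' )
        else
            disp( 'n1==n2 is false' )
        end
    end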

 

2016-07-18, Rerun of the test with R2016a. The first run was done with R2011a(?) and on the same old vanilla desktop.

>> [ t1, t2 ] = cssm( 1e5 )
n1==n2 is true
t1 =
    0.0327
t2 =
    0.0214
>> 

The new "JIT-engine" seems to be more efficient in this case. And the effect of using double is smaller.

 

Testing is tricky! Contrary to James Tursa, I see some advantage of double() at the command line.

>> S = char('A'+floor(rand(1,1e7)*25));
>> clear a; tic; a = S - '0'; toc
Elapsed time is 0.137705 seconds.
>> clear a; tic; a = S - '0'; toc
Elapsed time is 0.125812 seconds.
>> 
>> clear a; tic; a = double(S) - double('0'); toc
Elapsed time is 0.082686 seconds.
>> clear a; tic; a = double(S) - double('0'); toc
Elapsed time is 0.081962 seconds.
>> clear a; tic; a = double(S) - double('0'); toc
Elapsed time is 0.078693 seconds.
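
As a possible alternative to single tic/toc runs, the sketch below uses `timeit`, assuming a release that ships it (R2013b or later, so newer than the 2012 runs). It is not the method used in this thread, and because the expressions run inside anonymous functions, the numbers need not match either the command-line or the function-file results above.

```matlab
% Smaller array than in the runs above to keep the sketch quick.
S = char('A' + floor(rand(1, 1e6) * 25));

% timeit calls each handle several times and returns a typical time in seconds.
tImplicit = timeit(@() S - '0');
tExplicit = timeit(@() double(S) - double('0'));

fprintf('implicit: %.4g s, explicit: %.4g s\n', tImplicit, tExplicit);
```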

  1 Comment

Interesting result. This appears to be the work of the parser optimizing stuff. E.g., from the command line:

>> S = char('A'+floor(rand(1,1e7)*25));
>> 
>> clear a; tic; a = S - '0'; toc
Elapsed time is 0.068686 seconds.
>> clear a; tic; a = S - '0'; toc
Elapsed time is 0.068900 seconds.
>> clear a; tic; a = S - '0'; toc
Elapsed time is 0.064843 seconds.
>> clear a; tic; a = S - '0'; toc
Elapsed time is 0.059291 seconds.
>> clear a; tic; a = S - '0'; toc
Elapsed time is 0.070644 seconds.
>> 
>> clear a; tic; a = double(S) - double('0'); toc
Elapsed time is 0.075243 seconds.
>> clear a; tic; a = double(S) - double('0'); toc
Elapsed time is 0.067354 seconds.
>> clear a; tic; a = double(S) - double('0'); toc
Elapsed time is 0.077742 seconds.
>> clear a; tic; a = double(S) - double('0'); toc
Elapsed time is 0.073420 seconds.
>> clear a; tic; a = double(S) - double('0'); toc
Elapsed time is 0.078540 seconds.

So at the command line the advantage of the double( ) disappears. Maybe there is some compiled code that MATLAB is using in the double( ) case that the parser has available, and no such compiled code exists for the character minus case.

Since this result seems to be the result of optimized code that the parser is able to use, I would not be surprised if this result varied quite a bit between MATLAB versions.

E.g., a simple mex routine result shows that the raw S - '0' calculation could be significantly improved with compiled code:

>> clear a; tic; a = char_minus(S,'0'); toc
Elapsed time is 0.044921 seconds.
>> clear a; tic; a = char_minus(S,'0'); toc
Elapsed time is 0.043699 seconds.
>> clear a; tic; a = char_minus(S,'0'); toc
Elapsed time is 0.046204 seconds.
>> clear a; tic; a = char_minus(S,'0'); toc
Elapsed time is 0.051441 seconds.
>> clear a; tic; a = char_minus(S,'0'); toc
Elapsed time is 0.058610 seconds.

The mex routine (bare bones, not production quality):

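#include "mex.h"

/* a = char_minus(S, c): subtract the scalar char c from each element of S. */
void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
{
    mwSize n, ndim;
    const mwSize *dims;   /* mxGetDimensions returns a pointer to const */
    mxChar *cp;
    mxChar c;
    double *pr;

    /* No input validation: both inputs are assumed to be non-empty char arrays. */
    n = mxGetNumberOfElements(prhs[0]);
    dims = mxGetDimensions(prhs[0]);
    ndim = mxGetNumberOfDimensions(prhs[0]);
    cp = mxGetChars(prhs[0]);
    c = *mxGetChars(prhs[1]);

    /* The output is a double array with the same shape as S. */
    plhs[0] = mxCreateNumericArray(ndim, dims, mxDOUBLE_CLASS, mxREAL);
    pr = mxGetPr(plhs[0]);
    while (n--) {
        *pr++ = *cp++ - c;
    }
}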

Answer by Jan Simon
on 5 May 2012

It depends. The result is clear and well defined, but not obvious. If you store large arrays in an M-file, char arrays occupy less RAM than double arrays. But storing large data sets in M-files is already bad programming style, because it mixes data and program.
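
The memory difference can be checked with `whos`; this is only a sketch and the variable names are invented for the example.

```matlab
% 1000 characters and the same values converted to double.
asChar   = repmat('A', 1, 1000);
asDouble = double(asChar);

infoChar   = whos('asChar');
infoDouble = whos('asDouble');

% MATLAB stores char as 2 bytes and double as 8 bytes per element.
fprintf('char: %d bytes, double: %d bytes\n', infoChar.bytes, infoDouble.bytes);
```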

I use 'abc' - 'a' only to encode icons in M-files, because it allows a vague view of the result.

      color = ['CCCCCHFFHCCCCC'; ...
               'CDFNBFFFFFFFDC'; ...
               'DPGGGGGGGGGGBH'; ...
               'DPDMMMMOOAADPD'; ...
               'DFFFNFFFFFFFIH'; ...
               'CILLKJGEKNGEIC'; ...
               'CILBKJLGKFNKIC'; ...
               'CILBKJLGKFIKIC'; ...
               'CILBKJLGKFIKIC'; ...
               'CDLBKJLGKFIKDC'; ...
               'CDLBKJLGKFIKDC'; ...
               'CDLBKJLGKFIKDC'; ...
               'CDLBKJLGKFNKDC'; ...
               'CDLLKJGEKNGEDC'; ...
               'CHBGEEEGGLPNHC'; ...
               'CMDDIIDDDDDHMC'] - ('A' - 1);
         map = [28,  26,  36; ...
               116, 118, 132; ...
               NaN, NaN, NaN; ...
               73,   74,  89; ...
               177, 176, 193; ...
               96,   98, 112; ...
               151, 152, 167; ...
               60,   63,  76; ...
               84,   84,  96; ...
               220, 222, 236; ...
               191, 193, 206; ...
               135, 133, 143; ...
               39,   41,  55; ...
               108, 107, 118; ...
               36,   34,  44; ...
               124, 126, 140];
[x, y] = size(color);
Icon   = reshape(map(color, :) / 255, x, y, 3);
uicontrol('Position', [10, 10, 32, 32], 'CData', Icon);

This is, in my opinion, the best way to store an icon in an M-file. But icons can be stored and edited much more comfortably in graphic files.

  6 Comments

I agree fully regarding the value of extra parentheses to improve readability. [a, b] I never omit that comma - no way. However, I rely on new line in arrays. I don't use "; ..." as in your example above and I don't use a comma after functions that don't return values. I write plot(t,x) and load(file_spec) without a trailing comma. Haven't thought too much about why.

Geoff
on 10 May 2012

If MATLAB used a single backslash for line continuation, I'd probably do it more often. =) I find the typing-in of three characters ...

on ...

every ...

line ...

quite ...

enervating.

Jan Simon
on 10 May 2012

Btw., "..." not only continues the line but also starts a comment. There is no need for an additional %, and this is even documented.
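
A tiny sketch of that behaviour; the statement itself is arbitrary.

```matlab
% Everything after the three dots on a line is ignored by MATLAB.
total = 1 + ...  text after the dots is treated as a comment
        2 + ...  so no percent sign is needed here
        3;
```

After running it, `total` is 6, the same as for `1 + 2 + 3` on one line.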


Answer by Daniel Shub
on 5 May 2012

I believe that coding styles that sacrifice readability for efficiency are generally bad style. It is possible that under some circumstances the gain in efficiency can offset the loss in readability. For example, in MATLAB loops used to be so slow that we had to sacrifice readability for performance all the time by vectorizing everything. Thankfully that is not the case anymore.

  0 Comments


Answer by Daniel Shub
on 9 May 2012

I asked a similar question, although not identical by any means, a while back:

http://www.mathworks.com/matlabcentral/answers/31888-123-0-vs-1-2-3-and-mtree

  0 Comments

