
Is the trick 'ABC'-'A' good programming style?

--- edit ---

The tag "answer" is a trace of why I submitted the question. I've seen questions from people who are "new to MATLAB" that received answers containing not-so-obvious code with 'ABC'-'A' embedded for no good reason.

Walter Roberson
on 5 May 2012

Adding or subtracting '0' is the most efficient method of converting between digit values (0 to 9) and their character codes ('0' to '9').

Subtracting 'A' or 'a' (and then adding 10) is a well-known and efficient conversion from character-encoded hexadecimal to binary.
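A minimal C sketch of the two offsets described above, assuming an ASCII-compatible character set (the C standard guarantees that '0' to '9' are contiguous, but says nothing about the letters). The names `digit_to_char` and `hex_digit_value` are hypothetical, chosen for illustration.

```c
/* Convert a value 0..9 to its character code by adding '0'. */
char digit_to_char(int d)
{
    return (char)('0' + d);
}

/* Convert one hexadecimal character to its value 0..15, or -1 if invalid.
   Assumes 'A'..'F' and 'a'..'f' are contiguous (true in ASCII). */
int hex_digit_value(char c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    if (c >= 'A' && c <= 'F')
        return c - 'A' + 10;
    if (c >= 'a' && c <= 'f')
        return c - 'a' + 10;
    return -1;
}
```

For example, `hex_digit_value('B')` yields 11 and `digit_to_char(7)` yields '7'.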

Adding or subtracting ' ' (space) is often used in base64 encoding/decoding (e.g., MIME) [though you do need to special-case that binary 0 is coded as period instead of as space].
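A sketch of the space-offset idea for 6-bit values. Note that the standard base64 alphabet of RFC 4648 is a 64-character lookup table (A-Z, a-z, 0-9, '+', '/'), so a single offset like this matches uuencode-style encodings rather than MIME base64. The zero substitute is passed in as the hypothetical parameter `zero_char` because encoders differ on which character they use (classic BSD uuencode uses '`'); both function names are also hypothetical.

```c
/* Encode a 6-bit value (0..63) as a printable character by adding ' ' (32).
   Zero would become a space, which is easily lost, so it is replaced
   by zero_char, chosen by the caller. */
char sixbit_encode(unsigned v, char zero_char)
{
    v &= 0x3F;                        /* keep only the low 6 bits */
    return v == 0 ? zero_char : (char)(' ' + v);
}

/* Inverse of sixbit_encode: subtract ' ' and mask to 6 bits.
   Both ' ' and zero_char decode to 0. */
unsigned sixbit_decode(char c, char zero_char)
{
    if (c == zero_char)
        return 0;
    return (unsigned)(c - ' ') & 0x3F;
}
```

With `zero_char` set to '`' (96), the mask alone would already map it back to 0, since (96 - 32) & 0x3F is 0.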

Adding or subtracting 32 used to be very common magic for converting between upper- and lower-case ASCII. So common that it became a problem when dealing with EBCDIC and later with ISO-8859-* and Unicode. So common that the resulting bugs were hard to find, because programmers would read the 32, know that it meant upper/lower-case conversion, and then be puzzled that letters weren't being converted properly...
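A C sketch of ASCII-only case conversion, assuming an ASCII-compatible encoding. Writing the offset as 'a' - 'A' rather than 32 documents the intent, and the range check keeps non-letters (digits, '@', accented ISO-8859-1 characters) from being shifted. In portable code the standard `toupper`/`tolower` from <ctype.h> are the safer choice; `ascii_to_upper` and `ascii_to_lower` are hypothetical helpers.

```c
/* Upper-case an ASCII letter by subtracting the case offset ('a' - 'A',
   which is 32 in ASCII). Anything outside 'a'..'z' is returned unchanged. */
char ascii_to_upper(char c)
{
    if (c >= 'a' && c <= 'z')
        return (char)(c - ('a' - 'A'));
    return c;
}

/* Lower-case counterpart: only 'A'..'Z' are shifted. */
char ascii_to_lower(char c)
{
    if (c >= 'A' && c <= 'Z')
        return (char)(c + ('a' - 'A'));
    return c;
}
```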

The characters '1' through '9' have been in consecutive coding positions since the ITA2 code of 1930. Any program that is not required to work with Baudot or Murray or older codes may assume that for a fact. Any program written for the ASCII / ANSI / ISO / Unicode line may assume that the upper-case "Latin" (English) characters are consecutive, and that the lower-case "Latin" (English) characters are consecutive: this is a fundamental standardization, no worse than assuming that all of the MATLAB operator characters are present in the character set. As far as I know, MATLAB has never been supported on any EBCDIC-based system, where the assumption is not true.
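In C terms, the language standard guarantees only that '0' through '9' are contiguous; contiguity of the letters is a property of ASCII and its descendants. A small self-check like the hypothetical `letters_are_contiguous` below makes that assumption explicit in code that relies on it.

```c
/* Return 1 if 'A'..'Z' and 'a'..'z' occupy consecutive code points,
   as they do in ASCII, ISO-8859-x and Unicode (but not in EBCDIC). */
int letters_are_contiguous(void)
{
    const char upper[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
    const char lower[] = "abcdefghijklmnopqrstuvwxyz";
    for (int i = 0; i < 26; i++) {
        if (upper[i] != 'A' + i || lower[i] != 'a' + i)
            return 0;
    }
    return 1;
}
```

A program could call it once at start-up and refuse to run if it returns 0.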

Walter Roberson
on 9 Jun 2018

A few weeks ago I was helping someone learn C for a Harvard online course. Some of the early exercises involved ciphers (such as the Caesar cipher), and later ones involved translating musical note letters (note and octave) into frequencies.

It turned out to be surprisingly difficult to get the person to retain the idea of computing relative position by subtracting the first member of an ordered sequence.
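A sketch of the relative-position idea applied to a Caesar cipher, assuming ASCII letters: subtract the first member of the range ('A' or 'a') to get a position 0..25, shift it modulo 26, and add the base back. `caesar_shift` is a hypothetical helper, not the course's reference solution.

```c
/* Shift each ASCII letter in s by k positions (k may be negative),
   wrapping within its case. Non-letters are left unchanged. */
void caesar_shift(char *s, int k)
{
    k %= 26;                            /* k is now in -25..25 */
    for (; *s != '\0'; s++) {
        char base;
        if (*s >= 'A' && *s <= 'Z')
            base = 'A';
        else if (*s >= 'a' && *s <= 'z')
            base = 'a';
        else
            continue;
        int pos = *s - base;            /* relative position 0..25 */
        pos = (pos + k + 26) % 26;      /* +26 keeps the sum non-negative */
        *s = (char)(base + pos);
    }
}
```

Shifting "Hello, World!" by 3 gives "Khoor, Zruog!", and shifting by -3 restores it.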

On the other hand, it would have been difficult to teach a beginner the idea of indexing a mostly-unpopulated matrix by a character. It would have been ridiculous to have them test for equality with each alphabetic character individually. The only realistic implementations within reach for the person were subtracting the base character, or looping and comparing against a reference vector of characters to extract the index of the match.
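The two realistic approaches can be compared side by side. Both hypothetical functions below return the 0-based position of an upper-case letter, or -1; the first subtracts the base character (assuming contiguous letters), the second loops over a reference string and makes no such assumption.

```c
/* Position by subtracting the base: constant time, relies on 'A'..'Z'
   being consecutive. */
int letter_index_subtract(char c)
{
    return (c >= 'A' && c <= 'Z') ? c - 'A' : -1;
}

/* Position by searching a reference list: linear time, works for any
   character set because it never does arithmetic on character codes. */
int letter_index_search(char c)
{
    const char alphabet[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
    for (int i = 0; alphabet[i] != '\0'; i++) {
        if (alphabet[i] == c)
            return i;
    }
    return -1;
}
```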

Now that they have retained the idea of finding relative position, they have an efficient general technique they can apply in many future programming situations; looping and comparing against possibilities known to be consecutive is not, I would say, any more "clean" than subtracting the base.
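For the note-to-frequency exercise mentioned above, the same subtraction gives an index into a small table, because the natural notes A-G are not evenly spaced in semitones. This sketch assumes scientific pitch notation (octave numbers change at C, so C4 is middle C), equal temperament with A4 = 440 Hz, and no sharps or flats; `semitones_from_a4` and its table are illustrative, not the course's specification. The frequency then follows as f = 440 · 2^(n/12) for the returned n.

```c
/* Semitone offset of each natural note from A in the same octave number,
   indexed by letter - 'A' (A, B, C, D, E, F, G). */
static const int NOTE_OFFSET[7] = { 0, 2, -9, -7, -5, -4, -2 };

/* Return the signed number of semitones from A4 to the given note,
   e.g. ('C', 4), middle C, gives -9. Precondition: letter is 'A'..'G'. */
int semitones_from_a4(char letter, int octave)
{
    int idx = letter - 'A';             /* relative position 0..6 */
    return NOTE_OFFSET[idx] + 12 * (octave - 4);
}
```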

per isakson
on 13 Aug 2012

Edited: per isakson
on 18 Jul 2016

James Tursa
on 17 Jul 2016

Interesting result. This appears to be the work of the parser optimizing stuff. E.g., from the command line:

```
>> S = char('A'+floor(rand(1,1e7)*25));
>>
>> clear a; tic; a = S - '0'; toc
Elapsed time is 0.068686 seconds.
>> clear a; tic; a = S - '0'; toc
Elapsed time is 0.068900 seconds.
>> clear a; tic; a = S - '0'; toc
Elapsed time is 0.064843 seconds.
>> clear a; tic; a = S - '0'; toc
Elapsed time is 0.059291 seconds.
>> clear a; tic; a = S - '0'; toc
Elapsed time is 0.070644 seconds.
>>
>> clear a; tic; a = double(S) - double('0'); toc
Elapsed time is 0.075243 seconds.
>> clear a; tic; a = double(S) - double('0'); toc
Elapsed time is 0.067354 seconds.
>> clear a; tic; a = double(S) - double('0'); toc
Elapsed time is 0.077742 seconds.
>> clear a; tic; a = double(S) - double('0'); toc
Elapsed time is 0.073420 seconds.
>> clear a; tic; a = double(S) - double('0'); toc
Elapsed time is 0.078540 seconds.
```

So at the command line the advantage of double( ) disappears. Perhaps the parser has compiled code available for the double( ) case, and no such compiled code exists for the char-minus-char case.

Since this result seems to come from optimized code that the parser is able to use, I would not be surprised if it varied quite a bit between MATLAB versions.

For example, timing a simple MEX routine shows that the raw S - '0' calculation could be sped up significantly with compiled code:

```
>> clear a; tic; a = char_minus(S,'0'); toc
Elapsed time is 0.044921 seconds.
>> clear a; tic; a = char_minus(S,'0'); toc
Elapsed time is 0.043699 seconds.
>> clear a; tic; a = char_minus(S,'0'); toc
Elapsed time is 0.046204 seconds.
>> clear a; tic; a = char_minus(S,'0'); toc
Elapsed time is 0.051441 seconds.
>> clear a; tic; a = char_minus(S,'0'); toc
Elapsed time is 0.058610 seconds.
```

The mex routine (bare bones, not production quality):

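```c
#include "mex.h"

/* char_minus(S, c): return double(S) - double(c) for a char array S
   and a single char c. Bare bones, not production quality. */
void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
{
    mwSize n, ndim;
    const mwSize *dims;     /* mxGetDimensions returns a const pointer */
    const mxChar *cp;
    mxChar c;
    double *pr;

    /* Minimal argument checking: two char inputs, the second non-empty. */
    if (nrhs != 2 || !mxIsChar(prhs[0]) || !mxIsChar(prhs[1])
            || mxIsEmpty(prhs[1])) {
        mexErrMsgTxt("Usage: a = char_minus(S, c) with char S and c.");
    }

    n    = mxGetNumberOfElements(prhs[0]);
    dims = mxGetDimensions(prhs[0]);
    ndim = mxGetNumberOfDimensions(prhs[0]);
    cp   = mxGetChars(prhs[0]);
    c    = *mxGetChars(prhs[1]);   /* only the first character of c is used */

    /* Output is a real double array with the same size as S. */
    plhs[0] = mxCreateNumericArray(ndim, dims, mxDOUBLE_CLASS, mxREAL);
    pr = mxGetPr(plhs[0]);

    /* Element-wise subtraction; both operands are promoted to int,
       so results such as 'A' - 'a' come out negative as expected. */
    while (n--) {
        *pr++ = *cp++ - c;
    }
}
```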
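Stripped of the MEX plumbing, the loop above is just an element-wise subtraction of one character from a buffer into doubles. The hypothetical `char_minus_kernel` below isolates that part so it can be compiled and tested without MATLAB; it uses plain `char` rather than `mxChar` (a 16-bit type in MATLAB), so it is an analogue, not a drop-in replacement.

```c
#include <stddef.h>

/* Write out[i] = in[i] - base for i = 0..n-1; with base = '0' this turns
   digit characters into their numeric values. */
void char_minus_kernel(const char *in, size_t n, char base, double *out)
{
    for (size_t i = 0; i < n; i++)
        out[i] = (double)(in[i] - base);
}
```

Called as `char_minus_kernel("2019", 4, '0', v)`, it fills `v` with 2, 0, 1, 9.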

Jan
on 5 May 2012

It depends. The result is clear and well defined, but not obvious. If you store large arrays in an M-file, a char array occupies less RAM than a double array (2 bytes per element instead of 8). But storing large data sets in M-files is already bad programming style, because it mixes data and program.

I use 'abc' - 'a' only to encode icons in M-files, because the letters give a rough visual preview of the image.

```matlab
color = ['CCCCCHFFHCCCCC'; ...
         'CDFNBFFFFFFFDC'; ...
         'DPGGGGGGGGGGBH'; ...
         'DPDMMMMOOAADPD'; ...
         'DFFFNFFFFFFFIH'; ...
         'CILLKJGEKNGEIC'; ...
         'CILBKJLGKFNKIC'; ...
         'CILBKJLGKFIKIC'; ...
         'CILBKJLGKFIKIC'; ...
         'CDLBKJLGKFIKDC'; ...
         'CDLBKJLGKFIKDC'; ...
         'CDLBKJLGKFIKDC'; ...
         'CDLBKJLGKFNKDC'; ...
         'CDLLKJGEKNGEDC'; ...
         'CHBGEEEGGLPNHC'; ...
         'CMDDIIDDDDDHMC'] - ('A' - 1);
map = [ 28,  26,  36; ...
       116, 118, 132; ...
       NaN, NaN, NaN; ...
        73,  74,  89; ...
       177, 176, 193; ...
        96,  98, 112; ...
       151, 152, 167; ...
        60,  63,  76; ...
        84,  84,  96; ...
       220, 222, 236; ...
       191, 193, 206; ...
       135, 133, 143; ...
        39,  41,  55; ...
       108, 107, 118; ...
        36,  34,  44; ...
       124, 126, 140];
[x, y] = size(color);
Icon = reshape(map(color, :) / 255, x, y, 3);
uicontrol('Position', [10, 10, 32, 32], 'CData', Icon);
```

This is, in my opinion, the best way to store an icon in an M-file. But icons can be stored and edited much more comfortably in image files.
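The same letter encoding works outside MATLAB. This C sketch, with the hypothetical name `decode_icon`, turns rows of letters 'A'..'P' into 1-based palette indices by subtracting ('A' - 1), as the `- ('A' - 1)` in the M-file does; characters outside that range are reported as 0. Unlike MATLAB's column-major `map(color, :)`, it fills the output row by row.

```c
#include <stddef.h>

/* Decode an icon stored as equal-length rows of letters into palette
   indices: 'A' -> 1, 'B' -> 2, ..., 'P' -> 16; invalid characters -> 0.
   idx is filled row by row and must hold nrows * ncols entries. */
void decode_icon(const char *const rows[], size_t nrows, size_t ncols,
                 int *idx)
{
    for (size_t r = 0; r < nrows; r++) {
        for (size_t c = 0; c < ncols; c++) {
            char ch = rows[r][c];
            idx[r * ncols + c] = (ch >= 'A' && ch <= 'P') ? ch - ('A' - 1) : 0;
        }
    }
}
```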

Geoff
on 10 May 2012

If MATLAB used a single backslash for line continuation, I'd probably do it more often. =) I find the typing-in of three characters ...

on ...

every ...

line ...

quite ...

enervating.

Jan
on 10 May 2012

Daniel Shub
on 5 May 2012

Daniel Shub
on 9 May 2012

I asked a similar question, although not identical by any means, a while back:
