Code covered by the BSD License  

Highlights from
What's New In MATLAB 7.2 R2006a

07 Mar 2006 (Updated )

Webinar presentation and demo files

newregexp_long.m
%% New Regular Expression Features in MATLAB 7.2 (2006a)
% In MATLAB 7.2 there are a number of new features for regular expression
% matching and replacing. The most important is dynamic regular expression
% matching and replacing, often requested by customers and not available in any
% other language except Perl. Dynamic expressions can also run
% MATLAB code inside an expression. Other new features include
% new matching modes and a flag to display warnings during expression
% processing.
%
% The following examples require a good understanding of the existing
% regular expression features, which are extensive and are not
% explained here. Please see the documentation for a review.
%
% Thanks to Jason Breslau for helping me with this.

%% Dynamic creation of a search string 
% Let's say we want to find patterns of numbers and X's in some strings. Let's
% first define some strings.
strs={'3XXX', '12XXXXXXXXXXXX', '5XXXXX'};

%%
% Traditionally with regular expression matching, to find a static pattern
% like '3' followed by 3 'X's i.e. '3XXX', you could use,
match=regexp(strs, '^3X{3}$', 'match', 'once')

%%
% Or to find any number followed by 12 X's, you would use,
match=regexp(strs, '^(\d+)X{12}$', 'match', 'once')

%%
% Previously, if you wanted to find any number followed by that number of X's,
% it was very difficult. Now in MATLAB 7.2 you can change a search pattern
% based on what you have just found. So to find any number followed by that number of X's
% we use,
match=regexp(strs, '^(\d+)(??X{$1})$', 'match', 'once')

%%
% This says, find one or more digits (forming token 1) followed by that
% number of X's (token 2). You could then replace each matching string
% with something shorter, such as the number followed by a single 'X'.
match=regexprep(strs, '^(\d+)(??X{$1})$', '$1X')
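
%%
% As a quick check of the dynamic count (a sketch only; the strings below
% are made up for illustration): '4XX' announces four X's but carries two,
% so it should not match, while '2XX' should.
tests = {'4XX', '2XX'};
checked = regexp(tests, '^(\d+)(??X{$1})$', 'match', 'once');
isMatch = ~cellfun(@isempty, checked)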

%% Dynamic creation of a search string with M code
% You can now enter MATLAB code in an expression. For
% example, say we want to check whether an English sentence is a
% palindrome.

%%
% First define a mixed case sentence to test,
str = 'Never odd or even'; 

%%
% Next, set to lower case and remove spaces, by calling the MATLAB function
% |lower| as part of the replace process in |regexprep|
s1=regexprep(str, '(\w*)(\W*)', '${lower($1)}'); 

%%
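% Now, find a sequence of letters that are followed by their mirror image.
% With no output option, |regexp| returns the starting index of the match
% (1 here), or empty if there is no match.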
match=regexp(s1, '^(.*).?(??@fliplr($1))$') 

%%
% Try with a different string that is not a palindrome,
str = 'Never odd oyr even'; 
s1=regexprep(str, '(\w*)(\W*)', '${lower($1)}'); % Set to lower case and remove spaces
match=regexp(s1, '^(.*).?(??@fliplr($1))$') % Find a sequence of letters that are followed by their mirror image

%%
% Try with a different string that is a palindrome,
str = 'Never odXd or even'; 
s1=regexprep(str, '(\w*)(\W*)', '${lower($1)}'); % Set to lower case and remove spaces
match=regexp(s1, '^(.*).?(??@fliplr($1))$') % Find a sequence of letters that are followed by their mirror image
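
%%
% The two steps can be wrapped in anonymous functions for reuse. This is
% only a sketch: |clean| and |isPal| are names made up for this example,
% and the |strcmp|/|fliplr| comparison is a plain non-regexp cross-check.
clean = @(s) regexprep(s, '(\w*)(\W*)', '${lower($1)}'); % Lower case, drop non-word chars
isPal = @(s) ~isempty(regexp(clean(s), '^(.*).?(??@fliplr($1))$', 'once'));
isPal('Never odd or even')   % A palindrome
isPal('Never odd oyr even')  % Not a palindrome
s1 = clean('Never odd or even');
strcmp(s1, fliplr(s1))       % Cross-check without regexp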

%% Dynamic creation of a replacement string with M code
% Let's say we had converted a string into a legal variable name using
% |genvarname|. |genvarname| converts illegal characters to a hex code.
% Say you then wanted to regenerate the original illegal characters. 
str = 'Will this work?'; % Test string
var = genvarname(str) % Convert to legal variable name (illegal chars become 0xHH hex codes)

%%
% *How it was done in the past*
% 
% Find the first hex code
found = regexp(var, '0x([0-9A-F]{2})', 'match', 'once');

%%
% Replace each hex code in the string, one code at a time
while ~isempty(found) % Repeat until all replaced
    hex = found(3:4); % Extract the two hex digits after '0x'
    ch = char(hex2dec(hex)); % Convert back to single char
    var = strrep(var, found, ch); % Replace every occurrence of this code with the char
    found = regexp(var, '0x([0-9A-F]{2})', 'match', 'once'); % Find the next code
end 

var

%%
% *Now it's much easier*
%
% First, regenerate the legal name string,
var = genvarname(str)

%%
% Now use dynamic replacement with M code
match=regexprep(var, '0x([0-9A-F]{2})', '${char(hex2dec($1))}')

%%
% Let's try it with another string,
str = 'Will this work#';
var = genvarname(str)

%%
match=regexprep(var, '0x([0-9A-F]{2})', '${char(hex2dec($1))}')
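
%%
% The decoding step can also be checked on its own. This is a sketch: the
% encoded name below is written by hand rather than produced by
% |genvarname|, whose exact output may differ between releases. It also
% assumes the original text contains nothing that already looks like a hex
% code (for example a literal '0x41'), since that would be decoded too.
encoded = 'rate0x25per0x23unit'; % Hypothetical encoded name: 0x25 is '%', 0x23 is '#'
decoded = regexprep(encoded, '0x([0-9A-F]{2})', '${char(hex2dec($1))}')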

%% Dynamic recursive matching with M code
% When searching hierarchical strings such as an HTML page,
% it can be difficult to match ending tags with beginning tags. Let's say
% we want to find strings inside something a bit simpler: matching parentheses.
% 
% First create a string,
str='asdf(ASDF(ASD)ASDF(ASDF(ASDF)ASDF)ASDF)asdf';

%%
% Define a recursive MATLAB expression. It matches '(', then any mix of
% non-parenthesis characters and nested levelN groups, then ')'.
levelN = '\(([^()]|(??@levelN))*\)';
match=regexp(str, levelN, 'match', 'once')
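
%%
% To see the recursion on a smaller input (a sketch; the string is made up
% for illustration), drop 'once' so that every outermost group is returned.
% The dynamic expression reads |levelN| from the current workspace, so the
% variable must exist when |regexp| runs.
levelN = '\(([^()]|(??@levelN))*\)';
groups = regexp('f(a(b)c) + g(d)', levelN, 'match')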

%%
% It can hurt your head thinking about what this has just done.

%% Matching modes
% There are a few more matching modes, given as flags to the |regexp|
% functions.

%%
% The matching mode |dotexceptnewline| indicates that |.| should not
% match newline characters. This used to be done by using |[^\n]| in place of |.|.
%
% Here is a string containing a newline,
str = ['this is the first line' 10 'this is the second line']

%%
% By default |.| also matches the newline, so |.*| returns the whole string as one match,
match=regexp(str, '.*', 'match')

%%
% With |dotexceptnewline|, each line is returned as a separate match,
match=regexp(str, '.*', 'match', 'dotexceptnewline');
match{1}
match{2}
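
%%
% As a sketch of the equivalence mentioned above, the older |[^\n]| idiom
% and the new flag should give the same matches for this string. The string
% is redefined here so the check stands on its own.
str = ['this is the first line' 10 'this is the second line'];
oldWay = regexp(str, '[^\n]*', 'match');
newWay = regexp(str, '.*', 'match', 'dotexceptnewline');
isequal(oldWay, newWay)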

%%
% The matching mode |lineanchors| indicates that |^| and |$| should match
% at the start and end of each line, in addition to the start and end of
% the whole string. Without it, the pattern below matches the whole string,
match=regexp(str, '^.*?$', 'match')

%%
match=regexp(str, '^.*?$', 'match', 'lineanchors')
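
%%
% With |lineanchors| on, a pattern anchored with |^| is tried at the start
% of every line. As a sketch, this pulls the first word of each line;
% without the flag only the first word of the whole string is found.
str = ['this is the first line' 10 'this is the second line'];
firstWords = regexp(str, '^\w+', 'match', 'lineanchors')
onlyFirst  = regexp(str, '^\w+', 'match')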

%% Warnings
% Finally, you can get verbose warnings displayed during regular expression
% processing, which can help catch errors.
s = regexp('a(b[c)d]e','a(b','warnings')

%%
% This warns you that the special regexp character '(' is being treated as a
% literal. Escape the parenthesis to make the intent explicit and get predictable results.
s = regexp('a(b[c)d]e','a\(b','warnings')
