MATLAB Answers

Modify Strings in cell array according to pattern

Martin Muehlegger on 16 Jan 2020
I have two cell arrays, A and B:
A = ['C1 H2 O1 N1 S1';
[];
'C5 H2 O0 N1 S0';
'C2 H6 O0 N1 S1';
'C2 H6 O2 N1 S0';
[];
'C3 H10 O1 N1 S0';
'C6 H5 O2 N2 S1';
'C8 H9 O4 N0 S0';
[];
'C9 H13 O3 N0 S0';
'C10 H17 O2 N0 S0';
'C11 H21 O1 N0 S0';
'C12 H25 O0 N0 S0'];
and
B = ['C5H9O2+'
'C3H5O4'
'C3H7O5']
I would like B to follow the same pattern as A (C* H* O* N* S*), with the trailing + removed, for example:
B{1} = 'C5 H9 O2 N0 S0'
I tried regexprep() but haven't gotten anywhere so far.

Accepted Answer

Allen on 20 Jan 2020
Let me know how this works for you.
B = {'C3H7O';'C2H7O2';'C2H7O2+';'C4H5O';'C4H5O';'C3H3O2';'';'C4H7O';'C3H5O2';'C3H7O2';'C2H5O3';...
'C2H5O3';'C2H7O3';'';'';'C4H5O2';'C4H5O2';'C4H5O2';'C4H7O2'};
I = {'C0','H0','O0','N0','S0'};
B = regexprep(B,{'\W','(?![CHONS]\d+)[CHONS]'},''); % Removes all but letter/number pairs
B(cellfun(@isempty,B)) = join(I,''); % Replaces empty cells with I
% Adds missing letter/number pairs to each cell of B and joins with no spacing
expr = {'(C\d+)','(H\d+)','(O\d+)','(N\d+)','(S\d+)'};
for i=1:size(B,1)
B(i) = join([B{i},I(cellfun(@isempty,regexp(B{i},expr)))],'');
end
% Adds a single space between letter/number pairs
B = regexprep(B,'(C\d+)(H\d+)(O\d+)(N\d+)(S\d+)','$1 $2 $3 $4 $5');

  3 Comments

Allen on 20 Jan 2020
You're welcome.
One thing to be aware of: if the order of CHONS matters and the elements are not in this order in your original B variable, then you will need to tweak this code a bit.
Martin Muehlegger on 14 Feb 2020 at 15:36
And here we are: I have to tweak the code after all.
My new B is
B = ['C9H19NO2', 'C9H20O3', 'C6H12O6', 'C11H20O2', 'C10H19NO2', 'C10H21NO2', ...
'C10H20O3', 'C11H19NO2', 'C11H18O3', 'C12H22O2', 'C11H21NO2', 'C11H20O3']
So the order has changed to CHNOS. I would like the output in the same CHONS order as before, with the missing letter/number pairs filled in for each cell of B.
PS: I found a problem in the first code. When a letter appears in B without a number, its count should not be zero. For example, the code gives
'C3H7O' = C3 H7 O0 N0 S0
but it should give
'C3H7O' = C3 H7 O1 N0 S0
because there is one 'O'.
Thanks
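A minimal sketch of one way to handle both requests (fixed CHONS output order regardless of input order, and a bare letter counted as 1) is to look up each element separately instead of relying on the order in the input string. This is not Allen's code; the sample input and the variable names `elements`, `parts`, and `out` are made up for illustration. It assumes every entry of B is a char vector (possibly empty) and that only the single-letter elements C, H, O, N, S occur, so two-letter symbols such as Cl or Si would be misread.

```matlab
% Hypothetical sample input: C-H-N-O order, a bare 'O', a trailing '+', an empty entry
B = {'C9H19NO2'; 'C9H20O3'; 'C3H7O'; 'C2H7O2+'; ''};
elements = 'CHONS';                 % desired output order
out = cell(size(B));
for k = 1:numel(B)
    parts = cell(1, numel(elements));
    for e = 1:numel(elements)
        % Capture the digits (possibly none) that follow this element letter
        tok = regexp(B{k}, [elements(e) '(\d*)'], 'tokens', 'once');
        if isempty(tok)
            count = '0';            % element does not occur at all
        elseif isempty(tok{1})
            count = '1';            % letter present without a number
        else
            count = tok{1};         % explicit count
        end
        parts{e} = [elements(e) count];
    end
    out{k} = strjoin(parts, ' ');   % e.g. 'C9 H19 O2 N1 S0'
end
```

Because each element is searched for on its own, characters such as `+` are simply ignored, and empty entries come out as 'C0 H0 O0 N0 S0', which matches how the accepted answer treats empty cells.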

More Answers (1)

Allen on 16 Jan 2020
I am sure that there is a much more elegant method using just regular expressions, but the following should give you what you are looking for if your input cell-array is similar to the example provided.
B = {'C5H9O2+'
'C3H5O4'
'C3H7O5'};
I = {'C','H','O','N','S'};
% Looks for 'C', 'H', 'O', 'N', and 'S'; when one is not present in a cell-array element,
% appends the missing character followed by a '0' to that element.
for i=1:length(I)
idx = ~contains(B,I{i});
B(idx) = append(B(idx),I{i},'0');
end
% Use a regular expression to remove every character that is not a letter, digit, or underscore,
% then find matching tokens consisting of 'C', 'H', 'O', 'N', and 'S', each followed by any number
% of digits, and recombine them as a single space-delimited string.
regexprep(B,{'\W','(C\d*)(H\d*)(O\d*)(N\d*)(S\d*)'},{'','$1 $2 $3 $4 $5'})

  3 Comments

Martin Muehlegger on 17 Jan 2020
Thanks Allen, I appreciate your help! I got an error from append():
%Error using append (line 38)
%Wrong number of input arguments for obsolete matrix-based syntax.
I tried to fix it by replacing
%B(idx) = append(B(idx),I{i},'0');
with
B(idx) = [B(idx); I{i}; '0'];
but then regexprep gives me an error as well:
% In an assignment A(:) = B, the number of elements in A and B must be the same.
Here is a larger sample of my cell array B.
B = ['C3H7O';
'C2H7O2';
'C2H7O2+'
'C4H5O';
'C4H5O';
'C3H3O2';
''; % the empty shouldn't make any trouble
'C4H7O';
'C3H5O2';
'C3H7O2';
'C2H5O3';
'C2H5O3';
'C2H7O3';
'';
'';
'C4H5O2';
'C4H5O2';
'C4H5O2';
'C4H7O2']
Allen on 17 Jan 2020
What version of MATLAB are you using? Also, you appear to be defining B as a non-cell-array instead of a cell-array. That might also be part of the problem.
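To illustrate the bracket-versus-brace point, here is a small sketch with made-up variable names. Square brackets concatenate character vectors into a single char array, while curly braces keep each formula as its own cell.

```matlab
% Square brackets join the pieces into one char row vector
asChar = ['C9H19NO2', 'C9H20O3'];
% Curly braces build a cell array with one formula per cell
asCell = {'C9H19NO2', 'C9H20O3'};

disp(class(asChar));   % char
disp(class(asCell));   % cell
disp(numel(asCell));   % number of formulas kept separate
```

Stacking char vectors of different lengths vertically inside square brackets (using `;`) raises a dimension-mismatch error, so a list of formulas with varying lengths generally needs the brace form.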
Martin Muehlegger on 17 Jan 2020
Matlab 2017b
whos B
  Name        Size      Bytes  Class    Attributes
  B         177x1       21946  cell
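As far as I know, the text version of append only arrived in R2019b, which would explain the error on R2017b, where the name seems to resolve to a different, older function. A sketch of the same loop using strcat instead, which accepts cell arrays of character vectors in older releases, could look like this. The `any(idx)` guard is an addition of mine to skip the call when nothing needs appending.

```matlab
B = {'C5H9O2+'; 'C3H5O4'; 'C3H7O5'};
I = {'C','H','O','N','S'};
for i = 1:numel(I)
    idx = ~contains(B, I{i});          % cells that lack this element letter
    if any(idx)
        % strcat concatenates element-wise when given a cell array
        B(idx) = strcat(B(idx), I{i}, '0');
    end
end
% Strip non-word characters such as '+', then insert the spaces
B = regexprep(B, {'\W','(C\d*)(H\d*)(O\d*)(N\d*)(S\d*)'}, {'','$1 $2 $3 $4 $5'});
```

This keeps the original approach and its limitations: the input must already be in CHONS order, and B must be a cell array of character vectors.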