Combining similarly named variables

Corey McDowell on 29 Jun 2022
Edited: Vatsal on 29 Sep 2023
In a dataset I have protocol names (values of the Protocol variable) that are functionally identical but have slightly different names because they were imported from different machines. One example is:
'chest_abd_pelvis_w_contrast_over_50kg' & 'cap_w_contrast_over_50kg'
When doing group analysis on these, it is often better to treat them as a single protocol. I have been able to merge them one pair at a time using the regexp-based method shown below:
protocols = groupcounts(B, "Protocol");
protocols = sortrows(protocols, "GroupCount", "descend");
% Find the protocol names that match either naming convention
idx1 = ~cellfun(@isempty, regexp(protocols.Protocol, '(chest.*abd.*pel.*over.*50|cap.*w.*over.*50)'));
% Flag the matching rows of B and relabel them with one common name
B.idx1 = ismember(B.Protocol, protocols.Protocol(idx1));
B.Protocol(B.idx1) = {'CAP w/ contrast over 50 kg'};
% Remove the temporary flag column (table variables cannot be deleted with brace indexing)
B = removevars(B, 'idx1');
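For illustration, the same one-pair merge can be sketched directly on B.Protocol, which skips the intermediate groupcounts table and the temporary flag column. The three-row table below is hypothetical toy data, and the sketch assumes Protocol is stored as a cell array of character vectors, as the {'...'} assignments above imply.

```matlab
% Hypothetical toy data standing in for the real table B
B = table({'chest_abd_pelvis_w_contrast_over_50kg'; 'cap_w_contrast_over_50kg'; ...
           'head_wo_contrast'}, 'VariableNames', {'Protocol'});

% Logical mask of rows whose protocol name matches either naming convention
pattern = '(chest.*abd.*pel.*over.*50|cap.*w.*over.*50)';
isMatch = ~cellfun(@isempty, regexp(B.Protocol, pattern));

% Relabel every matching row with one common name (scalar cell is expanded)
B.Protocol(isMatch) = {'CAP w/ contrast over 50 kg'};
disp(B)
```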
The minor differences in names come in a variety of forms, so I do not have much hope of grouping all of them at once. However, several of these patterns recur across weight classes; for example, alongside the pair above there is also:
'chest_abd_pelvis_w_contrast_21_to_50kg' & 'cap_w_contrast_21_to_50kg'
Is there a way to merge the two "over 50 kg" names together and the two "21 to 50 kg" names together simultaneously?

Answers (1)

Vatsal on 21 Sep 2023
Edited: Vatsal on 29 Sep 2023
I understand that your dataset contains protocol names that are functionally identical but spelled differently. For group analysis you want each such set to be treated as a single protocol, and you want to do this for more than one set at the same time.
If your goal is to merge the two "over 50 kg" names with each other and the two "21 to 50 kg" names with each other (rather than merging all four into one), you need two different regular expressions: one that matches the two "over 50 kg" names and another that matches the two "21 to 50 kg" names.
Here is the updated code for reference:
protocols = groupcounts(B, "Protocol");
protocols = sortrows(protocols, "GroupCount", "descend");
% Merge the two "over 50 kg" protocol names
idx_over_50 = ~cellfun(@isempty, regexp(protocols.Protocol, '(chest.*abd.*pel.*over.*50|cap.*w.*over.*50)'));
B.idx_over_50 = ismember(B.Protocol, protocols.Protocol(idx_over_50));
B.Protocol(B.idx_over_50) = {'CAP w/ contrast over 50 kg'};
% Merge the two "21 to 50 kg" protocol names
idx_21_to_50 = ~cellfun(@isempty, regexp(protocols.Protocol, '(chest.*abd.*pel.*21.*50|cap.*w.*21.*50)'));
B.idx_21_to_50 = ismember(B.Protocol, protocols.Protocol(idx_21_to_50));
B.Protocol(B.idx_21_to_50) = {'CAP w/ contrast 21 to 50 kg'};
% Remove the temporary flag columns (use parentheses or removevars, not braces, to delete table variables)
B = removevars(B, {'idx_over_50', 'idx_21_to_50'});
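If more pairs turn up, a minimal sketch that avoids writing one block per pair is to keep each regular expression next to its target label and loop over them. The rules cell array and the four-row toy table are hypothetical; the two patterns are the ones from the code above, and Protocol is assumed to be a cell array of character vectors.

```matlab
% Hypothetical toy data standing in for the real table B
B = table({'chest_abd_pelvis_w_contrast_over_50kg'; 'cap_w_contrast_over_50kg'; ...
           'chest_abd_pelvis_w_contrast_21_to_50kg'; 'cap_w_contrast_21_to_50kg'}, ...
          'VariableNames', {'Protocol'});

% Each row pairs one regular expression with the label its matches map to
rules = {
    '(chest.*abd.*pel.*over.*50|cap.*w.*over.*50)', 'CAP w/ contrast over 50 kg'
    '(chest.*abd.*pel.*21.*50|cap.*w.*21.*50)',     'CAP w/ contrast 21 to 50 kg'
};

% Match against the ORIGINAL names, so a label written by one rule
% can never be picked up by a later rule
original = B.Protocol;
for k = 1:size(rules, 1)
    isMatch = ~cellfun(@isempty, regexp(original, rules{k, 1}));
    B.Protocol(isMatch) = rules(k, 2);
end
disp(B)
```

Adding another weight class then only means adding one row to rules.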
For more information on its usage and syntax, refer to the MATLAB documentation for "regexp".
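A different technique, not part of the original answer, may handle every weight class in one call: if the only difference between the two conventions is the prefix (chest_abd_pelvis_ versus cap_), rewriting that prefix with regexprep collapses each pair onto a single name. This assumes the rest of each name matches exactly across machines, which holds for the two pairs shown in the question but may not for others; the toy table is hypothetical.

```matlab
% Hypothetical toy data standing in for the real table B
B = table({'chest_abd_pelvis_w_contrast_over_50kg'; 'cap_w_contrast_over_50kg'; ...
           'chest_abd_pelvis_w_contrast_21_to_50kg'; 'cap_w_contrast_21_to_50kg'}, ...
          'VariableNames', {'Protocol'});

% Rewrite the long prefix to the short one; names already starting with
% 'cap_' are left unchanged, so each pair collapses onto one name
B.Protocol = regexprep(B.Protocol, '^chest_abd_pelvis_', 'cap_');

% Each weight class now forms a single group
counts = groupcounts(B, "Protocol");
disp(counts)
```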

Release

R2022a
