How to extract a subset of a field from a structure while keeping respective information from the other fields?

I have a MATLAB structure with 20 fields. My structure is for example called Data and I only want to concentrate on five of them.
My important field is Data.Options. It has 3000x1 cells. It has patterns like "pt002", "gm010", "mde02" etc. in it.
The other field is Data.specifics. It also has 3000x1 cells, and it contains information relative to the rows in Data.Options.
I have two other fields of Data.Up and Data.Down with 3000x1 cells again and they contain information relative to the rows in Data.specifics. These two fields only store numbers like 1, 2, 3, etc.
The other field I would like to keep is Data.components. It has 1100x1 cells. It basically holds the components that appear in Data.specifics. For example, if one row of Data.specifics is A*D = C, then Data.components contains A, D and C in different rows.
Now I would like to extract the subset of Data.Options that matches the patterns "mtc03" and "yhk90" and, based on that, keep the corresponding rows of the other fields. I have prepared something like this from previous answers, but it is incomplete:
Data2= Data
Components = [Data2.Components]
Specifics = [Data2.Specifics]
Options = [Data2.Options]
Up= [Data2.Up]
Down= [Data2.Down]
%Index = How to give index for "mtc03" and "yhk90" based on Data2.Options?
Data2.Components = Components(Index);
Data2.Specifics= Specifics(Index);
Data2.Options= Options(Index);
Data2.Up= Up(Index);
Data2.Down= Down(Index);
I don't know if it is correct, or if there is any quicker way. Thanks for any help!

 Accepted Answer

The field components doesn't have the same size as the others. Maybe it's better if you provide at least part of your struct. Anyway, you can do something like this:
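% take care of capital letters first!
% ismember matches whole strings exactly
idx = ismember(data.options, {'yhk90', 'mtc03'});
% only fields with one entry per row of options; components has a
% different length and cannot be indexed with idx
fi = ["options", "specifics", "up", "down"];
for i = 1:numel(fi)
    data.(fi(i))(~idx) = [];
end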
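For anyone who wants to try this without the original struct, here is a minimal sketch on toy data. The field values are made up for illustration, and only the fields that have one entry per row of options are filtered; the shorter components field is deliberately left alone.

```matlab
% Toy struct (hypothetical values): four fields with one entry per row,
% plus a shorter "components" field that cannot share the same index.
data = struct();
data.options    = {'pt002'; 'mtc03'; 'gm010'; 'yhk90'};
data.specifics  = {'A*D = C'; 'B = E'; 'A = F'; 'C*E = G'};
data.up         = {1; 2; 3; 4};
data.down       = {0; 0; 1; 1};
data.components = {'A'; 'B'; 'C'};

% Logical index: true where an option exactly equals one of the targets.
idx = ismember(data.options, {'mtc03', 'yhk90'});

% Keep the matching rows in every field that has the same length as idx.
fi = ["options", "specifics", "up", "down"];
for i = 1:numel(fi)
    data.(fi(i)) = data.(fi(i))(idx);
end

disp(data.options)    % the matching options
disp(data.specifics)  % the rows that belong to them
```

Keeping rows with (idx) and deleting rows with (~idx) = [] give the same result here; which one to use is a matter of taste.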

10 Comments

Thanks, but it does not work. The structure is attached and the fields I am interested in are mets, rxns, ub, lb and rxnkegg.... I would like to filter them based on patterns in rxnkegg.
I'm pretty sure you mentioned:
My important field is Data.Options. It has 3000x1 cells. It has patterns like "pt002", "gm010", "mde02" etc. in it.
The other field is Data.specifics. It also has 3000x1 cells, and it contains information relative to the rows in Data.Options.
Regardless, you can modify the field names accordingly:
fi = ["mets", "rxnkegg", "rxns", "ub", "lb"];
It's also useful to explain why "it does not work". Maybe attach your struct?
Please find the original structure attached. For simplicity, I changed their names in my question and gave a random number, but what's important is that the sizes of fields are different. The new structure has empty lb and ub, and the number of cells in rxnkegg (prev. Options), mets (prev. components) and rxns (prev. specifics) fields have remained the same. Thanks.
There is no field named "rxnkegg" in your struct. Please check everything.
That should be rxnKEGGPathways and using that, I get no results.
This is because there is no KEGG pathway called yhk90 or mtc03.
model = load('yeast-GEM.mat').model;
kegg = unique(model.rxnKEGGPathways);
kegg(cellfun(@isempty, kegg)) = []; % remove empty entries
% display few entries
kegg(1:3)
ans = 3×1 cell array
    {'sce00010; sce00020; sce00260; sce00280; sce00310; sce00380; sce00620; sce00630; sce00640; sce00670; sce01110; sce01130; sce01200'}
    {'sce00010; sce00020; sce00260; sce00280; sce00310; sce00380; sce00620; sce00630; sce00640; sce01110; sce01130; sce01200' }
    {'sce00010; sce00020; sce00260; sce00280; sce00620; sce00630; sce00640; sce00670; sce01110; sce01130; sce01200' }
% search for target pathways
checkpaths = ["mtc03", "yhk90"];
any(contains(kegg, checkpaths)) % 0 --> not found
ans = logical
0
That's true and thanks a lot for helping with my questions :) Everything goes fine until I get error 'Matrix index is out of range for deletion' upon running:
fi = ["mets", "rxnkegg", "rxns", "ub", "lb"]; % These are correctly defined in my workspace
for i = 1:numel(fi)
model.(fi(i))(~idx) = [];
end
For this I used KEGG pathway of sce00740. I noticed that when a pathway is listed with other pathways, the indexing does not work, and returns zero for that row even if the pathway number is there. An example of this is sce00620.
model.mets and model.rxnkegg don't have the same size (as I mentioned above), so you cannot use the idx for model.mets. In terms of metabolic models: number of metabolites and reactions are not the same (and don't have to be!), so you cannot use the same indices for both. If you intend to keep only reactions (and their corresponding metabolites) within the KEGG pathway sce00740, it's easier to extract the subnetwork using COBRA toolbox. You can do it by yourself as well, but you need to loop over columns (i.e. rxns) of stoichiometric matrix.
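The observation that a pathway is missed when it is listed together with others is most likely down to how the index is built: ismember compares whole strings, while contains searches inside each string. A small sketch with made-up annotation rows (in the style of rxnKEGGPathways, not taken from the actual model) shows the difference:

```matlab
% Toy pathway annotations (hypothetical rows, one per reaction).
paths = {'sce00740'; 'sce00010; sce00620; sce01110'; ''; 'sce00620'};

% ismember compares whole strings, so a pathway that is listed together
% with other pathways in the same cell is not matched.
exactIdx = ismember(paths, {'sce00620'});

% contains looks for the pattern anywhere inside each string.
substrIdx = contains(paths, 'sce00620');

% Column 1: exact match, column 2: substring match.
disp([exactIdx, substrIdx])
```

Note that contains is a plain substring search, so a shorter pattern would also match longer IDs that happen to include it. If that matters, splitting each entry on '; ' and comparing the pieces exactly is one possible alternative.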
In line with my comment, this snippet extracts a submodel from your model with the reactions and metabolites involved in the sce00740 pathway:
model = load('yeast-GEM.mat').model;
% all rxns in sce00740 pathway
idx = contains(model.rxnKEGGPathways, 'sce00740');
submodel = struct;
fi = ["rxnKEGGPathways", "rxns", "ub", "lb", "rxnKEGGID"];
for i = 1:numel(fi)
submodel.(fi(i)) = model.(fi(i))(idx);
end
% now find mets corresponding to rxns in submodel.rxns
submodel.S = model.S(:, idx);
orphanMetsIdx = sum(abs(submodel.S), 2) == 0; % remaining orphan mets in the submodel
submodel.S(orphanMetsIdx, :) = []; % remove those mets from the submodel
fi = ["mets", "metNames", "metKEGGID"]; % some desired fields for metabolites
for i = 1:numel(fi)
submodel.(fi(i)) = model.(fi(i))(~orphanMetsIdx);
end
submodel
submodel = struct with fields:
    rxnKEGGPathways: {12×1 cell}
               rxns: {12×1 cell}
                 ub: [12×1 double]
                 lb: [12×1 double]
          rxnKEGGID: {12×1 cell}
                  S: [34×12 double]
               mets: {34×1 cell}
           metNames: {34×1 cell}
          metKEGGID: {34×1 cell}
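The same column-then-row filtering can be checked by hand on a tiny network. This is only a sketch: the matrix, metabolite names and reaction names below are invented, and any(... ~= 0, 2) is used in place of sum(abs(...), 2) == 0 to find the rows that are still in use.

```matlab
% Toy network (hypothetical): 4 metabolites x 3 reactions.
% R1: A -> B, R2: B -> C, R3: C -> D
S = [-1  0  0;
      1 -1  0;
      0  1 -1;
      0  0  1];
mets = {'A'; 'B'; 'C'; 'D'};
rxns = {'R1'; 'R2'; 'R3'};

keepRxn = [true; false; false];   % pretend only R1 is in the pathway

subS     = S(:, keepRxn);         % keep the columns of the selected reactions
usedMets = any(subS ~= 0, 2);     % metabolites with at least one nonzero entry
subS     = subS(usedMets, :);     % drop the orphan metabolite rows
subMets  = mets(usedMets);
subRxns  = rxns(keepRxn);

disp(subMets)   % metabolites taking part in the kept reactions
```

The reaction index selects columns of S and the metabolite index selects rows, which is why one index cannot be reused for both kinds of fields.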


Asked: 22 Jan 2022

Commented: 22 Jan 2022
