MATLAB Answers

Importing data files with the same name in different directories (seeds), setting them equal to a "loopable" variable

Asked by Sophie Sophie on 12 Oct 2018
Latest activity Commented on by Stephen Cobeldick on 14 Oct 2018
Hello world!
I am a noob and wondering if anyone can help me with this problem.
I have multiple seeds in a simulation that output the same data filenames.
I would like to import all the data files from a structure like this:
path1/seed_0001/Ca_cyt.dat
path1/seed_0002/Ca_cyt.dat
path1/seed_0003/Ca_cyt.dat
path1/seed_000n/Ca_cyt.dat
I'm not sure if it is possible to loop over the entries, creating some variable name that corresponds to the seed numbers, so that I can use the data files individually if I need to (e.g. seed 1 vs. seed 2).
The real issue is that the data overwrites itself when you import something with the same name.
I would like to set up a calculation to average the values of the data file over a time series (the average of the calcium time series from seeds 1, 2, 3, ..., n).
Lastly, I would like to compare the averages of different situations.
Like:
path1/seed_0001/Ca_cyt.dat
path2/seed_0001/Ca_cyt.dat
path3/seed_0001/Ca_cyt.dat
My final issue: the lengths of some of the matrices are not the same. I've tried using something like length(x) to define the calculation, but I am having horrible luck plotting them against the same X values of time.

  1 Comment

I will say that I've written a verbose form of this code that sets variables for every import.
I have 25 simulations with 20 seeds each to compare and it's very error prone. Any help would be appreciated!


1 Answer

Answer by Stephen Cobeldick on 13 Oct 2018
Edited by Stephen Cobeldick on 14 Oct 2018
 Accepted Answer

"I will say that I've written a verbose form of this code that sets variables for every import. I have 25 simulations with 20 seeds each to compare and its very error prone."
That is not a surprise, because your approach is one way that beginners force themselves into writing slow, complex, buggy code. The approach of "...creating some variable name that corresponds seed numbers...", i.e. magically creating/accessing variable names, is the problem. Read this to know why:
Forcing meta-data into variable names is a bad way to write code. Meta-data, like the sequential folder names, is data, so it should be stored as data and not forced awkwardly into variable names.
You should simply store the data in one array (e.g. a structure, or cell array, or an ND array) which is then trivial to access using efficient indexing. Here is something to get you started:
D = 'path to where the PATH* directories are located';
P = 'path1';
S = dir(fullfile(D,P,'seed*')); % list all seed folders under path1
for jj = 1:numel(S)
    F = fullfile(D,P,S(jj).name,'Ca_cyt.dat');
    S(jj).data = csvread(F); % use whatever function imports your data files
end
And that is all. The data for the files are all stored in the structure S, along with the folder names. So you can access them trivially using indexing, e.g. the second folder's name and data:
S(2).name
S(2).data
It also means that you can use a comma-separated list to perform actions on all of the imported data or filenames. For example, to put all of the filenames into one cell array:
{S.name}
or to vertically concatenate all of the imported data:
vertcat(S.data)
or you can access it in other ways, depending on the size and classes of your data.
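As a rough illustration of those comma-separated lists, here is a small sketch that uses a synthetic structure array in place of imported files. The field values are made up, and the assumption that every seed has data of the same size (needed for cat along the third dimension) will not hold for all data sets:

```matlab
% Synthetic stand-in for the structure S built by the loop above.
S = struct('name', {'seed_0001','seed_0002','seed_0003'}, ...
           'data', {[0 10; 1 12], [0 11; 1 13], [0 9; 1 14]});

names   = {S.name};          % 1x3 cell array of folder names
allData = vertcat(S.data);   % 6x2 matrix: the three 2x2 blocks stacked
second  = S(2).data;         % plain indexing retrieves one seed's data

% Stack along a third dimension instead, so that seed k is stack(:,:,k).
% This assumes that all data arrays have the same size.
stack    = cat(3, S.data);   % 2x2x3 array
seedMean = mean(stack, 3);   % element-wise mean over the three seeds
```

Here S.data expands into three separate arguments, so vertcat(S.data) is equivalent to vertcat(S(1).data, S(2).data, S(3).data).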
Read this to know more about how comma separated lists work:
You might also want to look at the examples in the MATLAB documentation:
Because you did not upload any sample files or explain anything about them I have no idea how your data files are formatted, so you will have to pick the file importing function yourself. I just used csvread as an example. If you upload a sample data file then I could help you with picking a suitable data importing function.
"Lastly, I would like to compare the averages of different situations.... path1 ... path2 ... path3 ..."
Then you will probably have to put the entirety of my code into another loop, basically like this:
D = 'path to where the PATH* directories are located';
N = 3; % number of path* directories
C = cell(1,N);
for ii = 1:N
    P = sprintf('path%d',ii);
    C{ii} = dir(fullfile(D,P,'seed*')); % seed folders under this path
    for jj = 1:numel(C{ii})
        F = fullfile(D,P,C{ii}(jj).name,'Ca_cyt.dat');
        C{ii}(jj).data = csvread(F); % use whatever function imports your data files
    end
end
And then access the structures inside the cell array C. You could generate one structure array from that cell array, which might make accessing the data easier:
S = [C{:}]
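To make the nested loop concrete, here is a self-contained sketch that first writes a throwaway directory tree and then reads it back. Everything about the tree is an assumption for the demo: the temporary root folder, three paths with two seeds each, and plain-text numeric files written with save -ascii and read with load. The extra path field is also hypothetical; it records which parent folder each element came from, so that nothing has to be memorized. Because dir returns a column structure array, the sketch concatenates with vertcat, which also works when the paths contain different numbers of seeds:

```matlab
% Build a throwaway tree: <root>/path1..path3/seed_0001..seed_0002/Ca_cyt.dat
root = tempname;                       % hypothetical location for the demo
for ii = 1:3
    for jj = 1:2
        folder = fullfile(root, sprintf('path%d',ii), sprintf('seed_%04d',jj));
        mkdir(folder);
        M = [(0:2).', ii*10 + jj + (0:2).'];   % fake [time, value] columns
        save(fullfile(folder,'Ca_cyt.dat'), 'M', '-ascii');
    end
end

% Same nested loop as above, with an extra field recording the parent folder.
N = 3;
C = cell(1,N);
for ii = 1:N
    P = sprintf('path%d',ii);
    C{ii} = dir(fullfile(root,P,'seed*'));
    for jj = 1:numel(C{ii})
        F = fullfile(root,P,C{ii}(jj).name,'Ca_cyt.dat');
        C{ii}(jj).data = load(F);      % plain-text numeric file
        C{ii}(jj).path = P;            % keep the meta-data as data
    end
end

S = vertcat(C{:});                     % one 6x1 structure array
X = strcmp({S.path},'path2') & strcmp({S.name},'seed_0001');
selected = S(X).data;                  % data for path2/seed_0001

rmdir(root,'s');                       % clean up the demo tree
```

Selecting by the stored path and name, rather than by position, means the code keeps working if folders are added or removed.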

  4 Comments

"Not sure if there is another way to do this.."
Exactly like the MATLAB documentation recommends and as I wrote in my answer: you should use one array and indexing. This will actually make plotting easier, as you can plot in a loop and automatically label/save the figures. Your approach (with lots of separate variables) will make processing your data much more difficult.
"I am not sure how to modify the code so that it can loop through all my simulation cases without listing them all explicitly."
Of course, you can just use dir to read the folder names. After all, you are already doing this with the first loop (which you told me works perfectly). However the best way to do this depends on what MATLAB version you are using: please let us know what version you have.
If you have R2016b or later, then you can simply do something like this:
D = '~/projects/Dyad/2018/'; %location on my computer
P = 'mcell/output_data/react_data';
S = dir(fullfile(D,'*LTCC_RyR_PMCA_NCX_no-SERCA_dyad_files',P,'seed*'));
for jj = 1:numel(S)
    F = fullfile(S(jj).folder,S(jj).name,'Ca2_cyt.World.dat');
    S(jj).data = load(F);
end
Note how the dir input uses the wildcard character '*' in two locations (being the folders that you want to search for).
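Since each element of S then carries its parent location in the folder field, the simulation case can be recovered from that text and stored as data too. The following is only a sketch: the folder strings are hypothetical stand-ins for what dir would return, the sim field is an invented name, and the position of the case name (three levels above react_data) is assumed from the path layout shown above:

```matlab
% Hypothetical folder strings, shaped like the 'folder' field that dir returns.
S = struct('folder', {'/home/u/caseA_dyad_files/mcell/output_data/react_data', ...
                      '/home/u/caseA_dyad_files/mcell/output_data/react_data', ...
                      '/home/u/caseB_dyad_files/mcell/output_data/react_data'}, ...
           'name',   {'seed_0001','seed_0002','seed_0001'});

for jj = 1:numel(S)
    parts = strsplit(S(jj).folder, '/');   % use filesep for portable code
    S(jj).sim = parts{end-3};              % folder three levels above react_data
end

[cases, ~, idx] = unique({S.sim});  % distinct case names, and a group number per element
isCaseB = strcmp({S.sim}, 'caseB_dyad_files');  % logical index for one case
```

The logical vector isCaseB (or the group numbers in idx) can then be used to index into S directly.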
Sorry I wasn't more clear. I meant: is there a better way to index, other than memorizing which folder or file name belongs to whichever column or row in the array? I used the variables as shorthand, but I'm not sure of a better way to do that.
I do understand that the code slows down considerably when you use too many variables. I have lived it. It took me 30 minutes to compile, only to figure out that I had made an error. Debugging took another 4 hours, hence why I reached out for help :)
Your code has taken care of all but one of my inquiries!
D = '~/projects/Dyad/2018/'; %location on my computer
P = 'mcell/output_data/react_data'; %common path in all directories
S = dir(fullfile(D,'*LTCC_RyR_PMCA_NCX_no-SERCA_dyad_files',P,'seed*'));
for jj = 1:numel(S) %import file named *dat
    F = fullfile(S(jj).folder,S(jj).name,'Ca2_cyt.World.dat');
    S(jj).data = load(F); %load all the files
end
The last question of the original post was about selecting sections of the indexed array in order to plot them.
In other words:
How do I plot something that is indexed in an array, and select a section of it to plot vs. another index?
for example:
plot(S.data[column 1 of seed 2 in row 1 of S, aka time (rows 1-n)], S.data[column 2 of seed 2 in row 1 of S, aka # molecules (rows 1-n)]) %plot x and y from row 1 to n of seed 2
hold
plot(S.data[column 1 of seed 1 in row 1 of S, aka time (rows 1-n)], S.data[column 2 of seed 1 in row 1 of S, aka # molecules (rows 1-n)]) %plot x and y from row 1 to n of seed 1
I want to be able to get the minimum length of the data, say, the time elapsed in the simulation with:
length(time)
and use it to make the vectors equal each other so that they can be plotted.
I tried something like:
plot(S.data[1(:1,length(time))],S.data[1(:2,length(time))])%plot x and y values of seed 1
but it didn't work, most likely because I don't know how to properly sort or index what I am looking for in terms of rows and columns in an array.
Thanks for your patience and all of your help.
"the last question of the original post was about selecting sections of the indexed array in order to plot them."
Use strcmpi, strncmpi, strfind, regexpi, etc. to identify the parts of the structure S that you want to plot. Use that as the index into S (not into data, like you were trying to do), for example:
N = {S.name};
X = strcmpi(N,'seed2');
You can then use those indices to select the elements of S that you want to plot:
for k = reshape(find(X),1,[])
    S(k).data(:,1) % time (column 1, all rows)
    S(k).data(:,2) % # molecules (column 2, all rows)
    ...
end
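For the unequal lengths mentioned in the original question, one possible approach is to truncate every selected seed to the shortest run before averaging and plotting. The sketch below uses synthetic data and makes two assumptions that may not match the real files: column 1 is time and column 2 is the number of molecules, and all seeds share the same time steps, so that row k means the same time in every seed:

```matlab
% Synthetic stand-in: three seeds with [time, #molecules] columns of unequal length.
S = struct('name', {'seed_0001','seed_0002','seed_0003'}, ...
           'data', {[(0:4).', (10:14).'], [(0:3).', (20:23).'], [(0:5).', (30:35).']});

X = ismember({S.name}, {'seed_0001','seed_0002','seed_0003'}); % elements to include
T = S(X);

n    = min(cellfun(@(d) size(d,1), {T.data}));   % shortest run, in rows
cols = cellfun(@(d) d(1:n,2), {T.data}, 'UniformOutput', false);
Y    = [cols{:}];                                % n-by-(number of seeds)
t    = T(1).data(1:n,1);                         % shared time axis (assumed)
avg  = mean(Y, 2);                               % average over seeds at each time

plot(t, Y, ':', t, avg, 'k-')                    % individual seeds and their mean
xlabel('time'), ylabel('# molecules')
```

If the time steps differ between seeds, truncation alone is not enough and the data would first need to be interpolated onto a common time vector (e.g. with interp1).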
You can combine multiple conditions in your logical array X to select the combinations that you want. You need to read this:
