Importing data files with the same name in different directories (seeds), setting them equal to a "loopable" variable

Question

Sophie Sophie on 12 Oct 2018

Direct link to this question

https://www.mathworks.com/matlabcentral/answers/423733-importing-data-files-with-the-same-name-in-different-directories-seeds-setting-them-equal-to-lo

Commented: Stephen23 on 14 Oct 2018

Accepted Answer: Stephen23

Hello world!

I am a noob and wondering if anyone can help me with this problem.

I have multiple seeds in a simulation that output the same data filenames.

I would like to import all the data files from a structure like this:

path1/seed_0001/Ca_cyt.dat

path1/seed_0002/Ca_cyt.dat

path1/seed_0003/Ca_cyt.dat

path1/seed_000n/Ca_cyt.dat

I'm not sure if it is possible to loop over the entries, creating some variable name that corresponds to the seed numbers, so that I can use the data files individually if I need to (e.g. seed 1 vs. seed 2).

The real issue is that the data overwrites itself when you import something with the same name.

I would like to set up a calculation to average the values of the data file over a time series (average of time-series of calcium from seeds 1,2,3...n)

Lastly, I would like to compare the averages of different situations.

Like:

path1/seed_0001/Ca_cyt.dat

path2/seed_0001/Ca_cyt.dat

path3/seed_0001/Ca_cyt.dat

My final issue: the lengths of some of the matrices are not the same. I've tried using something like length(x) to define the calculation, and I am having horrible luck getting it to plot them comparatively using the same X values of time.

1 Comment

Sophie Sophie on 12 Oct 2018

I will say that I've written a verbose form of this code that sets variables for every import.

I have 25 simulations with 20 seeds each to compare and it's very error-prone. Any help would be appreciated!


Answer 1

Stephen23 on 13 Oct 2018

Direct link to this answer

https://www.mathworks.com/matlabcentral/answers/423733-importing-data-files-with-the-same-name-in-different-directories-seeds-setting-them-equal-to-lo#answer_341178

Edited: Stephen23 on 14 Oct 2018


"I will say that I've written a verbose form of this code that sets variables for every import. I have 25 simulations with 20 seeds each to compare and its very error prone."

That is not a surprise, because your approach is one way that beginners force themselves into writing slow, complex, buggy code. The approach of "...creating some variable name...", i.e. magically creating/accessing variable names, is the problem. Read this to know why:

https://www.mathworks.com/matlabcentral/answers/304528-tutorial-why-variables-should-not-be-named-dynamically-eval

Forcing meta-data into variable names is a bad way to write code. Meta-data, like the sequential folder names, is data, so it should be stored as data and not forced awkwardly into variable names.

You should simply store the data in one array (e.g. a structure, or cell array, or an ND array) which is then trivial to access using efficient indexing. Here is something to get you started:

D = 'path to where the PATH* directories are located';
P = 'path1';
S = dir(fullfile(D,P,'seed*'));
for jj = 1:numel(S)
    F = fullfile(D,P,S(jj).name,'Ca_cyt.dat');
    S(jj).data = csvread(F); % use whatever function imports your data files
end

And that is all. The data for the files are all stored in the structure S, along with the folder names. So you can access them trivially using indexing, e.g. the second folder's name and data:

S(2).name
S(2).data

It also means that you can use a comma-separated list to perform actions on all of the imported data or filenames. For example, to put all of the filenames into one cell array:

{S.name}

or to vertically concatenate all of the imported data:

vertcat(S.data)

or you can access it in other ways, depending on the size and classes of your data.
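As a rough, self-contained illustration of the loop and the comma-separated lists above, the sketch below builds a throwaway directory tree in a temporary folder, imports it, and then collects the names and data. The root folder, the file contents, and the use of dlmwrite/csvread are assumptions made only for this demo; your real files may need a different import function.

```matlab
% Build a throwaway tree: <tmp>/path1/seed_000k/Ca_cyt.dat (demo data only)
D = tempname;                      % hypothetical root folder for this demo
P = 'path1';
for k = 1:3
    seedDir = fullfile(D, P, sprintf('seed_%04d', k));
    mkdir(seedDir);                % also creates the missing parent folders
    t = (0:4).';                   % made-up time column
    M = [t, k*t];                  % made-up second column
    dlmwrite(fullfile(seedDir, 'Ca_cyt.dat'), M);   % comma-delimited by default
end

% Same pattern as the loop above: one structure array holds names and data.
S = dir(fullfile(D, P, 'seed*'));
for jj = 1:numel(S)
    F = fullfile(D, P, S(jj).name, 'Ca_cyt.dat');
    S(jj).data = csvread(F);
end

names   = {S.name};                % comma-separated list -> 1x3 cell array
allData = vertcat(S.data);         % stacks the three 5x2 matrices into 15x2

rmdir(D, 's');                     % remove the demo tree again
```

Note that vertcat(S.data) only works when every file has the same number of columns; files with different numbers of rows are fine.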

Read this to know more about how comma separated lists work:

https://www.mathworks.com/matlabcentral/answers/320713-how-to-operate-on-comma-separated-lists

You might also want to look at the examples in the MATLAB documentation:

https://www.mathworks.com/help/matlab/import_export/process-a-sequence-of-files.html

Because you did not upload any sample files or explain anything about them I have no idea how your data files are formatted, so you will have to pick the file importing function yourself. I just used csvread as an example. If you upload a sample data file then I could help you with picking a suitable data importing function.

"Lastly, I would like to compare the averages of different situations.... path1 ... path2 ... path3 ..."

Then you will probably have to put the entirety of my code into another loop, basically like this:

D = 'path to where the PATH* directories are located';
N = 3;
C = cell(1,N);
for ii = 1:N
    P = sprintf('path%d',ii);
    C{ii} = dir(fullfile(D,P,'seed*'));
    for jj = 1:numel(C{ii})
        F = fullfile(D,P,C{ii}(jj).name,'Ca_cyt.dat');
        C{ii}(jj).data = csvread(F); % use whatever function imports your data files
    end
end

And then access the structures inside the cell array C. You could generate one structure array from that cell array, which might make accessing the data easier:

S = vertcat(C{:}) % dir returns column structure arrays, so concatenate them vertically
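One thing to watch when merging the cell array into a single structure array is that you lose track of which path each element came from. A possible way around that, sketched here with synthetic stand-ins for the dir output, is to store the path label in its own field before concatenating. The field name group and the sample values are hypothetical.

```matlab
% Synthetic stand-ins for what the nested loop would produce for two paths.
C = cell(1,2);
C{1} = struct('name', {'seed_0001'; 'seed_0002'}, 'data', {rand(5,2); rand(4,2)});
C{2} = struct('name', {'seed_0001'},              'data', {rand(6,2)});

for ii = 1:numel(C)
    T = C{ii};
    [T.group] = deal(sprintf('path%d', ii));   % store the meta-data as data
    C{ii} = T;
end

S = vertcat(C{:});                             % 3x1 structure array

% Select by meta-data instead of by remembering positions.
isPath1      = strcmp({S.group}, 'path1');
seedsInPath1 = {S(isPath1).name};
```

The concatenation requires every element of C to have the same fields, which holds here because each one went through the same loop.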

4 Comments

Sophie Sophie on 13 Oct 2018

Edited: Stephen23 on 14 Oct 2018


example.zip

Thanks for the rapid response!

The reason I tried variable names is so that I could know what systems I am plotting or comparing with a short-hand code.

For example:

diseased,deformed,2 channels, seed 1 = disdef2s1
diseased,deformed,5 channels, seed 1 = disdef5s1
healthy, normal, 2 channels, seed 1 = healnorm2s1
healthy, normal, 2 channels, seed 2 = healnorm2s2

This helps when I am plotting:

plot(x,healnorm2s2, 'r');
plot(x,healnorm5s1, 'b');

Not sure if there is another way to do this..

I used the first portion of the code that you provided and it worked! Thanks so much!

Here is the code I used:

    D = '~/projects/Dyad/2018/'; %location on my computer
    P = 'healthy_AP_1-LTCC_RyR_PMCA_NCX_no-SERCA_dyad_files/mcell/output_data/react_data'; %location of the seeds
    S = dir(fullfile(D,P,'seed*'));
    for jj = 1:numel(S)
        F = fullfile(D,P,S(jj).name,'Ca2_cyt.World.dat'); %name of file I am trying to load
        S(jj).data = load (F); % imports my data file
    end

The second portion of the code does not work, however. Perhaps I should clarify what I am trying to do.

In my '2018' directory, I have several simulation cases. To list a few:

deformedTT-diseased_AP_10-LTCC_RyR_PMCA_NCX_no-SERCA_dyad_files  
diseased_AP_5-LTCC_RyR_PMCA_NCX_no-SERCA_dyad_files
deformedTT-diseased_AP_1-LTCC_RyR_PMCA_NCX_no-SERCA_dyad_files 
healthy_AP_10-LTCC_RyR_PMCA_NCX_no-SERCA_dyad_files
deformedTT-diseased_AP_2-LTCC_RyR_PMCA_NCX_no-SERCA_dyad_files   
healthy_AP_10-LTCC_volume-filled-RyR_PMCA_NCX_no-SERCA_dyad_files

All of them have the common structure below, each with multiple seeds:

**simulationcase#1**/mcell/output_data/seed_000*/Ca2_cyt.World.dat
**simulationcase#2**/mcell/output_data/seed_000*/Ca2_cyt.World.dat

I have attached an example to this comment.

When I use the following code, I get an error:

    D = '~/projects/Dyad/2018';
    N = 3;
    C = cell(1,N);
    for ii = 1:N
        P = sprintf('/mcell/react_data/output_data/',ii);
        C{ii} = dir(fullfile(D,P,'seed*'));
        for jj = 1:numel(S)
            F = fullfile(D,P,S(jj).name,'Ca2_cyt.World.dat');
            C{ii}(jj).data = load (F)
        end
    end

Undefined function or variable 'S'.

I am not sure how to modify the code so that it can loop through all my simulation cases without listing them all explicitly. Is it possible?

Thanks for all your help and your time!

Stephen23 on 14 Oct 2018

Edited: Stephen23 on 14 Oct 2018


"Not sure if there is another way to do this.."

Exactly like the MATLAB documentation recommends and as I wrote in my answer: you should use one array and indexing. This will actually make plotting easier, as you can do it in a loop and automatically label/save the plots. Your approach (with lots of separate variables) will make processing your data much more difficult.
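A minimal sketch of what plotting in a loop with automatic labels could look like is shown here. The structure S is a synthetic stand-in for the imported data, and it assumes (as described in the question) that column 1 holds time and column 2 holds the number of molecules.

```matlab
% Synthetic stand-in for the imported structure (names and data are made up).
S = struct('name', {'seed_0001','seed_0002','seed_0003'}, ...
           'data', {[(0:9).', rand(10,1)], [(0:7).', rand(8,1)], [(0:8).', rand(9,1)]});

fig = figure('Visible', 'off');      % hidden figure so the sketch runs without a display
ax  = axes('Parent', fig);
hold(ax, 'on');
for k = 1:numel(S)
    % The folder name stored in S(k).name becomes the legend entry.
    plot(ax, S(k).data(:,1), S(k).data(:,2), 'DisplayName', S(k).name);
end
hold(ax, 'off');
xlabel(ax, 'time');
ylabel(ax, '# molecules');
lgd = legend(ax, 'show');
lgd.Interpreter = 'none';            % show underscores in the names literally
```

Because each line is plotted against its own time column, the series do not need to have the same length for this kind of overlay.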

"I am not sure how to modify the code so that it can loop through all my simulation cases without listing them all explicitly."

Of course, you can just use dir to read the folder names. After all, you are already doing this with the first loop (which you told me works perfectly). However the best way to do this depends on what MATLAB version you are using: please let us know what version you have.

If you have R2016b or later, then you can simply do something like this:

D = '~/projects/Dyad/2018/'; %location on my computer
P = 'mcell/output_data/react_data';
S = dir(fullfile(D,'*LTCC_RyR_PMCA_NCX_no-SERCA_dyad_files',P,'seed*'));
for jj = 1:numel(S)
    F = fullfile(S(jj).folder,S(jj).name,'Ca2_cyt.World.dat');
    S(jj).data = load(F);
end

Note how the dir input uses the wildcard character '*' in two locations (being the folders that you want to search for).
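Since the folder field now contains the simulation case, that label can be pulled out and stored as data too, which avoids having to memorise which element belongs to which case. The following is only a sketch on synthetic folder strings: the root folder, the field names simcase and nLTCC, and the guess that the number before "-LTCC" is a quantity worth extracting are all assumptions.

```matlab
% Synthetic stand-in for the .folder and .name values returned by dir (R2016b+).
P = fullfile('mcell', 'output_data', 'react_data');
folders = { ...
    fullfile('root', 'healthy_AP_1-LTCC_RyR_PMCA_NCX_no-SERCA_dyad_files',  P), ...
    fullfile('root', 'diseased_AP_5-LTCC_RyR_PMCA_NCX_no-SERCA_dyad_files', P)};
S = struct('folder', folders, 'name', {'seed_0001', 'seed_0001'});

for jj = 1:numel(S)
    % Strip the common tail P (and its separator), then keep the last component.
    head = S(jj).folder(1:end-numel(P)-1);
    [~, n, e] = fileparts(head);
    S(jj).simcase = [n, e];                      % e is non-empty only if the name has a dot
    % Assumed naming convention: the digits between 'AP_' and '-LTCC'.
    tok = regexp(S(jj).simcase, 'AP_(\d+)-LTCC', 'tokens', 'once');
    S(jj).nLTCC = str2double(tok{1});
end

isHealthy = strncmp({S.simcase}, 'healthy', 7);  % logical index into S
```

With the labels stored this way, S(isHealthy) or S([S.nLTCC] == 5) picks out a group without any hard-coded positions.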

Sophie Sophie on 14 Oct 2018

Edited: Sophie Sophie on 14 Oct 2018


Sorry I wasn't more clear. I meant: is there a better way to index, other than memorizing which folder or file name belongs to whichever column or row in the array? I used the variables as short-hand, but I'm not sure of a better way to do that..

I do understand that the code slows down considerably when you use too many variables.. I have lived it.. Took me 30 minutes to compile only to figure out that I made an error. Debugging took another 4 hours, hence why I reached out for help :)

Your code has taken care of all but one of my inquiries!

D = '~/projects/Dyad/2018/';           %location on my computer
P = 'mcell/output_data/react_data';    %common path in all directories
S = dir(fullfile(D,'*LTCC_RyR_PMCA_NCX_no-SERCA_dyad_files',P,'seed*')); 
for jj = 1:numel(S)                    %import file named *dat
  F = fullfile(S(jj).folder,S(jj).name,'Ca2_cyt.World.dat');
  S(jj).data = load(F);                %load all the files
end

the last question of the original post was about selecting sections of the indexed array in order to plot them.

In other words

How do I plot something that is indexed in an array and select a section of it to plot vs. another index?

for example:

plot (S.data[column 1 of seed 2 in row 1 of S, aka time (rows 1-n)], S.data[column 2 of seed 2 in row 1 of S, aka # molecules (rows 1-n)]) %plot x and y from row 1 to n of seed 2

hold

plot (S.data[column 1 of seed 1 in row 1 of S, aka time (rows 1-n)], S.data[column 2 of seed 1 in row 1 of S, aka # molecules (rows 1-n)]) %plot x and y from row 1 to n of seed 1

I want to be able to get the minimum length of the data, say, the time elapsed in the simulation with:

length(time)

and use it to make the vectors equal each other so that they can be plotted.

I tried something like:

    plot(S.data[1(:1,length(time))],S.data[1(:2,length(time))])%plot x and y values of seed 1

but it didn't work. Most likely because I don't know how to properly sort or index what I am looking for in terms of rows and columns in an array..

Thanks for your patience and all of your help.

Stephen23 on 14 Oct 2018


"the last question of the original post was about selecting sections of the indexed array in order to plot them."

Use strcmpi, strncmpi, strfind, regexpi, etc. to identify the parts of the structure S that you want to plot. Use that as the index into S (not into data, like you were trying to do), for example:

N = {S.name};
X = strcmpi(N,'seed_0002'); % must match the folder name exactly

You can then use those indices to select the elements of S that you want to plot:

for k = reshape(find(X),1,[])
    S(k).data(:,1) % time (column 1)
    S(k).data(:,2) % # molecules (column 2)
    ...
end
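For the unequal lengths mentioned in the original question, one possible approach is to truncate every series to the shortest one before averaging. This is a sketch on synthetic data, assuming that column 1 is time, column 2 is the number of molecules, and that all seeds were sampled at the same time steps.

```matlab
% Synthetic stand-in: three seeds with different numbers of rows (made-up values).
S = struct('name', {'seed_0001','seed_0002','seed_0003'}, ...
           'data', {[(0:9).', 1*ones(10,1)], [(0:7).', 2*ones(8,1)], [(0:8).', 3*ones(9,1)]});

nRows = cellfun(@(d) size(d,1), {S.data});   % number of rows in each seed
nMin  = min(nRows);                          % length of the shortest time series

Y = zeros(nMin, numel(S));
for k = 1:numel(S)
    Y(:,k) = S(k).data(1:nMin, 2);           % column 2, truncated to nMin rows
end
t     = S(1).data(1:nMin, 1);                % shared time vector (assumption, see above)
yMean = mean(Y, 2);                          % average across seeds at each time step
```

After this, plot(t, yMean) uses vectors of equal length. If the seeds do not share the same time steps, interpolating onto a common time vector (e.g. with interp1) would be needed instead of plain truncation.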

You can combine multiple conditions in your logical array X to select the combinations that you want. You need to read this:

https://www.mathworks.com/help/matlab/matlab_prog/access-multiple-elements-of-a-nonscalar-struct-array.html
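As an illustration of combining conditions, the sketch below uses a synthetic structure with made-up name and folder values. The logical operators & (and), | (or) and ~ (not) work element-wise on the logical arrays returned by the string functions.

```matlab
% Synthetic stand-in for S (the name and folder values are made up).
S = struct('name',   {'seed_0001','seed_0002','seed_0001','seed_0002'}, ...
           'folder', {'healthy_case','healthy_case','diseased_case','diseased_case'});
N = {S.name};
F = {S.folder};

% Seed 2 AND a folder starting with 'healthy' (case-insensitive).
X   = strcmpi(N, 'seed_0002') & strncmpi(F, 'healthy', 7);
idx = find(X);

% Seed 1 OR any folder containing 'diseased'.
Y = strcmpi(N, 'seed_0001') | ~cellfun(@isempty, strfind(F, 'diseased'));
```

Either X or Y can then be used directly as an index, for example S(X) or for k = find(Y).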
