Help optimizing inefficient code

Question

I'm relatively new to MATLAB, and I have code that merges two datasets based on a common attribute (in this case, names). However, it is very inefficient, so I'd be grateful for any suggestions on how to make it faster.

The gist of it is this: I have two datasets. They contain complementary data and share certain attributes that I'd like to use to combine them. I read Dataset 1 as a cell array. It contains promoter names (which I'll just call promoters) and a value associated with each one.

I read Dataset 2 as a table and then convert it into a cell array. Each row represents one gene; however, each gene can have no promoters, one promoter, or multiple promoters from Dataset 1 (I'll call these 'Names' from now on). I'd like to append each value from Dataset 1 to its associated Name, and to the corresponding gene, in Dataset 2. In Dataset 2, if a gene has multiple promoters, they are stored in a single string cell, separated by ' // '.

Essentially:

Dataset 1 (10,000 x 2): 'Name' 'Value', e.g. dataset1 = {'Name1', 2.32; 'Name2', 3.42}

Dataset 2 (5000 x 2): 'Gene' 'Name,Name', e.g. dataset2 = {'Gene1', []; 'Gene2', 'Name1'; 'Gene3', 'Name2 // Name3'}

My first step was to split each entry on the separator:

    for j=[2 3 4 5 6] % dataset2 actually has several columns that need splitting, but that's not important here
        for i=1:length(a)
            if isempty(a{i,j}) % skip genes with no names in this column
                continue
            end
            b = char(a{i,j});
            c = strsplit(b,' // '); % always returns a cell, even for a single name
            a{i,j} = c;
        end
    end

This converted cells containing multiple promoter names into cell arrays of names (e.g. {'Gene' 1x3}, where the 1x3 cell is {'Name1' 'Name2' 'Name3'}).
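
The split step is probably not the bottleneck, but it can also be written without the inner loop over rows. A minimal sketch, assuming each column holds either an empty entry or a char vector; the variable names `cols` and `hasNames` are just illustrative:

```matlab
% Example data in the format from the question (column 2 holds the names)
a = {'Gene1', []; 'Gene2', 'Name1'; 'Gene3', 'Name2 // Name3'};

cols = 2;   % in the real data this would be 2:6
for j = cols
    % Split only the non-empty entries; empty entries stay empty
    hasNames = ~cellfun(@isempty, a(:, j));
    a(hasNames, j) = cellfun(@(s) strsplit(s, ' // '), a(hasNames, j), ...
        'UniformOutput', false);
end
% a{2,2} is now {'Name1'} and a{3,2} is {'Name2', 'Name3'}
```

As in the loop version, a single name also ends up wrapped in a 1x1 cell, so later code can treat every non-empty entry the same way.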

To merge the data, I loop over the genes, check the size of that 1x3 cell (which may hold just a single name), search dataset1 for each name, and append the associated value to dataset2, producing something like:

Dataset 2: {'Gene1', 2x3}, where the 2x3 cell is {'Name1', 'Name2', 'Name3'; 'Value1', 'Value2', 'Value3'}
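
One way to avoid scanning all 10,000 rows of dataset1 for every name is to build a lookup table once with `containers.Map`. This is a minimal sketch under several assumptions: column 2 of dataset2 has already been split into cell arrays of names as described above, each name appears only once in dataset1 with its value in column 2 (the real code reads columns 3 and 4), and unmatched names should get NaN. The names `nameToValue`, `merged` and `vals` are hypothetical:

```matlab
% Example inputs; column 2 of dataset2 is assumed to be split already
dataset1 = {'Name1', 2.32; 'Name2', 3.42};
dataset2 = {'Gene1', []; 'Gene2', {'Name1'}; 'Gene3', {'Name2', 'Name3'}};

% Build the name -> value lookup once (assumes names in dataset1 are unique)
nameToValue = containers.Map(dataset1(:, 1), dataset1(:, 2));

merged = cell(size(dataset2, 1), 1);     % one 2xN cell {names; values} per gene
for i = 1:size(dataset2, 1)
    names = dataset2{i, 2};
    if isempty(names)
        continue                         % gene has no promoters
    end
    vals = cell(1, numel(names));
    for k = 1:numel(names)
        if isKey(nameToValue, names{k})
            vals{k} = nameToValue(names{k});
        else
            vals{k} = NaN;               % name not present in dataset1
        end
    end
    merged{i} = [names; vals];           % row 1: names, row 2: values
end
dataset2(:, 3) = merged;                 % append as a new column
```

Building the map is a single pass over dataset1; after that, each name costs one hash lookup instead of up to 10,000 `strcmp` calls, which is likely where most of the original run time goes.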

Here is my code; I've tried to annotate it to make it easier to follow:

    for i=1:length(dataset2)
        if isempty(dataset2{i,5}) % 0 'Names' associated w/ gene
            continue
        end
        s = size(a{i,5});
        if s(1,2) == 1 % One 'Name' associated w/ gene
            x = a{i,5};
            for j=1:length(dataset1)
                y = dataset1{j,1};
                if strcmp(x,y) % Using strcmp to find matching 'Names'
                    a{i,7} = dataset1{j,3};
                    a{i,8} = dataset1{j,4};
                end
            end
        end
        if s(1,2) > 1 % Multiple 'Names' associated with gene
            r = a{i,5};
            for rr=1:s(1,2) % loop over each of the s(1,2) names
                x = r{1,rr};
                for m=1:length(dataset1)
                    y = dataset1{m,1};
                    if strcmp(x,y)
                        % m is the matching row of dataset1
                        a{i,7}(2,rr) = dataset1{m,3};
                        a{i,8}(3,rr) = dataset1{m,4};
                    end
                end
            end
        end
    end

I'm sure this is a very convoluted script, so any insight would be appreciated.
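
Another option is to collect every name from every gene into one flat list and match it against dataset1 with a single `ismember` call, so that only a cheap regrouping step still loops over the genes. A sketch, assuming the same split layout in column 2 of dataset2, unique names in dataset1 with the value in column 2, and NaN for unmatched names; `nameLists`, `counts`, `allNames` and `allValues` are hypothetical names:

```matlab
% Example inputs; column 2 of dataset2 is assumed to be split already
dataset1 = {'Name1', 2.32; 'Name2', 3.42};
dataset2 = {'Gene1', []; 'Gene2', {'Name1'}; 'Gene3', {'Name2', 'Name3'}};

% Replace empty entries with 1x0 cells so every entry concatenates as a cell
nameLists = dataset2(:, 2);
nameLists(cellfun(@isempty, nameLists)) = {cell(1, 0)};
counts = cellfun(@numel, nameLists);       % number of names per gene
allNames = [nameLists{:}];                 % 1xM cell with every name

% One vectorized search instead of a scan of dataset1 per name
[found, idx] = ismember(allNames, dataset1(:, 1));
allValues = num2cell(nan(size(allNames))); % NaN marks unmatched names
allValues(found) = dataset1(idx(found), 2)';

% Regroup the flat lists per gene: row 1 = names, row 2 = values
stops = cumsum(counts);
starts = stops - counts + 1;
for i = 1:numel(counts)
    if counts(i) > 0
        k = starts(i):stops(i);
        dataset2{i, 3} = [allNames(k); allValues(k)];
    end
end
```

The second output of `ismember` gives the matching row of dataset1 for each name, so further columns (such as columns 3 and 4 in the real data) could be pulled with the same `idx`.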

1 Comment

Jan on 18 Feb 2019

This part is not clear:

Dataset 1 (10,000 x 2): 'Name' 'Value' i.e. dataset1= {'Name1', 2.32; 'Name2', 3.42}

Dataset 2 (5000 x 2): 'Gene' 'Name,Name' i.,e. dataset2 = {'Gene1', []; 'Gene2', 'Name1'; 'Gene3', 'Name2 // Name3'}

What does this mean? Please provide the input data in a clear format. What is the variable a in the first code snippet? I guess this is a simplification:

    for j = 2:6
        for i = 1:size(a, 1)  % Safer than: length(a)
            if ~isempty(a{i,j})
                a{i,j} = strsplit(a{i,j}, ' // ');
            end
        end
    end

Answers (0)
