Asked by Paolo Binetti
on 17 Dec 2016

As input I have a vector of a few million chars, each of which can be 'A', 'C', 'G' or 'T'. The vector is called sequence.

As output I want a vector with the cumulative sums of the elements of the input vector, after converting its chars into numbers. This code works, but takes forever. What might be the problem?

n = numel(sequence);
skew = zeros(1,n+1,'int32');
for i = 1:n
    switch sequence(i)
        case 'C'
            skew(i+1) = skew(i)-1;
        case 'G'
            skew(i+1) = skew(i)+1;
        otherwise
            skew(i+1) = skew(i);
    end
end

By the way, I am not interested in the cumulative sum vector as such. The only thing I care about is the index of its minimum.

Answer by Image Analyst
on 17 Dec 2016

Accepted Answer

Then you're doing something wrong. Are you preallocating your cdf array? It should be only a fraction of a second. For example:

numPoints = 20000000; % Twenty million elements
m = randi([-1, 1], 1, numPoints); % Random steps of -1, 0 or +1
tic;
cdf = zeros(1, numPoints); % Preallocate
cdf(1) = m(1);
for k = 2 : length(m)
    cdf(k) = cdf(k-1) + m(k);
end
toc;

The elapsed time on my computer for twenty million elements is 0.21 seconds. How long is it taking on your computer?
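For comparison, the same running sum can be obtained without an explicit loop by calling the built-in cumsum. The snippet below is only a minimal sketch: the smaller numPoints and the variable names cdfLoop, cdfVec and sameResult are chosen here for illustration and are not part of the answer above.

```matlab
% Sketch: compare the explicit loop with the built-in cumsum.
numPoints = 200000;                 % smaller size, chosen for illustration
m = randi([-1, 1], 1, numPoints);   % random steps of -1, 0 or +1

% Explicit loop with preallocation, as in the code above
cdfLoop = zeros(1, numPoints);
cdfLoop(1) = m(1);
for k = 2:numPoints
    cdfLoop(k) = cdfLoop(k-1) + m(k);
end

% Vectorized equivalent using the built-in cumulative sum
cdfVec = cumsum(m);

% Both approaches should give the same running sum
sameResult = isequal(cdfLoop, cdfVec);
```

Wrapping each variant in tic and toc shows how the two compare on a given machine and interpreter.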

Paolo Binetti
on 17 Dec 2016

Thank you. I copy-pasted your code and only edited the randi call, because I am using Octave on this computer. It takes 4 seconds with numPoints = 200000! Of course I did not even try going to 20 million.

So now I have no idea what could be wrong. Maybe I have the wrong version of Octave. Maybe I should try the same on Matlab.

Image Analyst
on 17 Dec 2016

Paolo Binetti
on 17 Dec 2016


Answer by Star Strider
on 17 Dec 2016

If your memory is that limited, this may be a work-around:

V = int32(randi([-1 1], 1, 1000)); % Create Data
Vr = reshape(V, 100, []); % Create Matrix (Can Be Assigned As ‘V’, Separate Here To Check Code)
vctmin = Inf; % Initialise ‘vctmin’
endsum = 0; % Initialise ‘endsum’
for k1 = 1:size(Vr,2)
    colsum = cumsum(Vr(:,k1))+endsum; % Calculate Column ‘cumsum’
    endsum = colsum(end); % End Value
    [colmin,idx] = min(colsum);
    if (colmin < vctmin) % Test Minima & Replace
        vctmin = colmin; % New Vector Minimum
        minidx = idx + (k1 - 1)*size(Vr,1); % New Minimum Index
    end
end
vctmin
minidx
[testmin,testidx] = min(cumsum(V))

It creates a matrix from your vector. (I gave them different names here so I could check the code, but assigning the reshaped vector to ‘V’ instead of ‘Vr’ would not use any additional memory.) The code then computes the cumulative sum of each column in turn, adds the final value carried over from the previous column, and keeps track of the overall minimum and its index as it proceeds through the matrix. The advantage is that only one column's cumulative sum is held in memory at a time, so memory should not be an issue. Note that reshape requires the vector length to be a multiple of the number of rows (100 here).

When I checked the matrix approach with directly calculating the minimum and its index, the results were the same over several test runs.
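The same idea can be sketched without reshape, by stepping through the vector in index ranges, which also covers lengths that are not a multiple of the chunk size. This is only an illustration under assumptions: chunkSize, first, last and chunksum are hypothetical names introduced here, and the data are converted to double to keep the arithmetic in one type.

```matlab
% Sketch: chunked running minimum of a cumulative sum, without reshape.
V = int32(randi([-1, 1], 1, 1037));   % length deliberately not a multiple of chunkSize
chunkSize = 100;                      % hypothetical chunk size

vctmin = Inf;     % smallest cumulative sum seen so far
minidx = 0;       % index in V where that minimum occurs
endsum = 0;       % cumulative sum carried over from the previous chunk

for first = 1:chunkSize:numel(V)
    last = min(first + chunkSize - 1, numel(V));        % last chunk may be shorter
    chunksum = cumsum(double(V(first:last))) + endsum;  % cumulative sum of this chunk
    endsum = chunksum(end);                             % carry to the next chunk
    [chunkmin, idx] = min(chunksum);
    if chunkmin < vctmin                                % strict test keeps the first minimum
        vctmin = chunkmin;
        minidx = first + idx - 1;
    end
end

% Check against the direct calculation
[testmin, testidx] = min(cumsum(double(V)));
```

After the loop, vctmin and minidx should match testmin and testidx.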


Answer by Star Strider
on 17 Dec 2016

With respect to your creating ‘skew’, this may be more efficient:

bases = {'A','C','T','G'}; % Cell Array
sequence = bases(randi(4, 1, 20));
skew = zeros(1, length(sequence)+1,'int32');
Cix = find(ismember(sequence, 'C'));
Gix = find(ismember(sequence, 'G'));
skew(Cix+1) = -1;
skew(Gix+1) = +1;

I don’t know what ‘sequence’ is, so I created it as a cell array here.

Paolo Binetti
on 17 Dec 2016

Thanks. The sequence is actually a char array, generated like this:

sequence = fileread('source.txt');

Can I adapt your code to work with a char array?

Star Strider
on 17 Dec 2016

Yes! It works the same way, producing the same (correct) result:

bases = ['A','C','T','G']; % Character Array
sequence = bases(randi(4, 1, 20));
skew = zeros(1, length(sequence)+1,'int32');
Cix = find(ismember(sequence, 'C'));
Gix = find(ismember(sequence, 'G'));
skew(Cix+1) = -1;
skew(Gix+1) = +1;
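The code above marks each position with -1 or +1, so a cumulative sum is still needed to obtain the running skew described in the question. A minimal sketch for a char array follows, assuming direct comparison with == in place of ismember. The short sequence and the names steps, minSkew and minIdx are made up for illustration.

```matlab
% Sketch: per-character steps, cumulative skew, and index of its minimum.
sequence = 'CCGATTGCAGGC';      % illustrative short sequence, not the real data

% +1 for each 'G', -1 for each 'C', 0 for anything else
steps = double(sequence == 'G') - double(sequence == 'C');

% Prepend 0 so that skew(i+1) is the skew after the first i characters
skew = [0, cumsum(steps)];

% Value and index of the first minimum of the skew vector
[minSkew, minIdx] = min(skew);
% For this sequence, minSkew is -2 and minIdx is 3
```

If the file read with fileread contains line breaks, they remain in the char array and count as positions, so they may need to be stripped first.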

Image Analyst
on 17 Dec 2016

If you're going to ask another question, then please also attach 'source.txt'.
