Is there any way to speed this code up?

Question

Putsandcalls on 15 Jan 2016

https://www.mathworks.com/matlabcentral/answers/263858-is-there-anyway-to-speed-this-code-up

Commented: Putsandcalls on 16 Jan 2016

sampledata.mat

Basically, I have a large set of data with only 4 columns, currently stored in a dataset array, and I want to separate the observations. I have indexed the different groups of observations by 1, 2, 3, ..., 12984. What I want is a numerical matrix in which each of the 12984 groups occupies its own set of columns, with its observations down the rows, since I can operate on that faster and more efficiently. The problem is that the groups do not all have the same number of observations: some are of size (616,3), others (500,3), and so on.

This is the code I have so far:

mat = ones(200,3);
i = 1;
while i <= 10
    c = ds(ds.newid == i, {'permno','monthlycumlnret','dates',});
    d = cat(1, [ double(c) ]);
    if length(d) == length(mat);
        mat = [mat d];
    if length(d) > length(mat)
        z = ones(length(d)-length(mat),3);
        mat = [vertcat(1,z,mat) d];
    else
        z = zeros(length(mat)-length(d),3);
        mat = [mat vertcat(1,z,d)];
    end
    i = i + 1;
    end
end

I haven't even run it up to 100 and it is already taking forever. Can someone please help me? I am a bit new to MATLAB, so any help is appreciated.

5 Comments

Putsandcalls on 15 Jan 2016

Thank you, I will be working to improve myself and will keep the link in mind.

Guillaume on 15 Jan 2016

In addition, do not use length on 2D arrays. If your array has fewer rows than columns, length will return the number of columns, which is not what you want in your code above.

Always be explicit. You want the number of rows, so use

size(mat, 1)
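To make the difference concrete, here is a minimal sketch on a small made-up matrix (the name A and its size are just for illustration):

```matlab
% A 3-by-5 matrix: fewer rows than columns
A = zeros(3, 5);

nLongest = length(A);    % size of the largest dimension (here the columns)
nRows    = size(A, 1);   % always the number of rows
nCols    = size(A, 2);   % always the number of columns
```

Here length(A) agrees with size(A, 2), not with size(A, 1), so a comparison based on length would silently compare column counts.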

Answer 1

Guillaume on 15 Jan 2016

https://www.mathworks.com/matlabcentral/answers/263858-is-there-anyway-to-speed-this-code-up#answer_206250

Edited: Guillaume on 15 Jan 2016

Your code has bugs (the vertcat calls all have an invalid 1 as first argument, and you're using length with arbitrarily sized matrices) and sometimes doesn't make much sense (d = cat(1, [ double(c) ]) is the same as d = double(c)). In any case, the way you're approaching the problem is very inefficient.

I'm not familiar with dataset or nominal arrays (both have been deprecated), but the following should work for you:

If you have R2015b, there are some new functions that make it somewhat easy to do what you want: findgroups and splitapply:

groupeddata = splitapply(@(subd) {subd}, double(ds(:, {'permno','monthlycumlnret','dates'})), findgroups(ds.newid));  %creates a cell array with one matrix per id
maxheight = max(cellfun(@(m) size(m, 1), groupeddata)); %get height of output matrix
groupeddata = cellfun(@(m) [m; ones(maxheight - size(m, 1), size(m, 2))], groupeddata, 'UniformOutput', false); %pad all groups to maxheight
groupeddata = [groupeddata{:}]; %and concatenate

If you don't have R2015b, then the splitapply step can be replaced by:

groupeddata = arrayfun(@(id) double(ds(ds.newid == id, {'permno','monthlycumlnret','dates'})), unique(ds.newid), 'UniformOutput', false);
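For anyone who wants to try the pad-and-concatenate idea without a dataset array, here is a rough sketch on plain numeric data. The variable names (ids, vals, out) and the toy values are made up for illustration. It also pads with NaN rather than ones, so that padding cannot be mistaken for real data; that is a choice made in this sketch, not part of the code above. The output is preallocated once instead of being grown inside the loop.

```matlab
% Toy data standing in for the real dataset (hypothetical values):
% ids holds the group number of each row, vals holds the observations.
ids  = [1; 1; 2; 3; 3; 3];
vals = [10 0.1; 11 0.2; 20 0.3; 30 0.4; 31 0.5; 32 0.6];

[uids, ~, g] = unique(ids);       % g maps each row to its group index
counts    = accumarray(g, 1);     % number of rows in each group
maxheight = max(counts);          % height of the output matrix
ncols     = size(vals, 2);        % columns contributed by each group

% Preallocate the whole output; NaN marks the padding
out = NaN(maxheight, ncols * numel(uids));
for k = 1:numel(uids)
    block = vals(g == k, :);                  % rows belonging to group k
    cols  = (k - 1) * ncols + (1:ncols);      % columns reserved for group k
    out(1:size(block, 1), cols) = block;      % copy, leaving the rest as NaN
end
```

Each group ends up in its own pair of columns, with NaN below the last real observation of the shorter groups.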

3 Comments

Guillaume on 15 Jan 2016

The dataset doc says to use table instead. It's part of base MATLAB, so you don't even need the stats toolbox to use it.

The nominal doc says to use categorical instead. Again, part of base MATLAB.

I made a typo in the name of the function (it's findgroups, not findgroupds) but it was spelled correctly in the code, so there's no reason it shouldn't work for you.

Putsandcalls on 16 Jan 2016

Thank you for your advice. I will make sure to keep your tips in mind. I have another problem now: I want to find the mean of an array that I have separated into another matrix for all the permno. I was able to replace the ones with 0, but I need to tell MATLAB to stop when it sees a zero that was only added to ensure that all the columns have the same number of rows. So far I have something along these lines, using a for-loop combined with an if statement:

d = 0.95;
Z = [0 2 3; 4 5 6; 7 8 9; 0 9 0; 0 0 0; 0 0 0];
i = 1;
  for j=length(Z-1):-1:1
      if X(j,:) == 0
          P(j,:) = (1/i)*sum(((1-d)*(d.^(i)))*Z);
      else
          P(j,:) = (1/(i + 1))*sum(((1-d)*(d.^(i)))*Z);
      end
  end

So for the matrix that I have constructed, how can I tell it to count the zero in the first row but not those in the last 2 rows? So far my if statement does not work at all: it just outputs a value in the first row and then the same values in the rest.

Also, I am wondering how I can at the same time build up an accumulator that counts the number of rows it has already gone through. I am not sure if I have done it properly in my for loop here, but the idea is to add to the number of samples used in the calculation of my mean, increasing it for each non-zero value. I am not sure whether i will continue to accumulate across iterations in this case if I predefine it.
The j runs in reverse, starting from the bottom, so that the most recent date gets the most weight.

Any help is much appreciated in advance.
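One possible reading of the question above, sketched under stated assumptions: trailing rows that are entirely zero are padding, everything above them is real data (including the zero in row 1), and the mean wanted is an exponentially weighted column mean with the bottom-most real row weighted most. The names lastRow, data, age and w are introduced here for illustration, and normalising the weights to sum to 1 is an assumption, not something stated in the post.

```matlab
d = 0.95;
Z = [0 2 3; 4 5 6; 7 8 9; 0 9 0; 0 0 0; 0 0 0];

% Last row holding at least one nonzero value; rows after it are
% assumed to be padding (this is an assumption about the data layout).
lastRow = find(any(Z ~= 0, 2), 1, 'last');
data    = Z(1:lastRow, :);        % keeps the zero in row 1, drops the padding

n   = size(data, 1);              % number of real rows (the "accumulator")
age = ((n - 1):-1:0)';            % 0 for the most recent (bottom) row
w   = (1 - d) * d .^ age;         % exponentially decaying weights
w   = w / sum(w);                 % normalise so the weights sum to 1 (assumed)

P = w' * data;                    % 1-by-3 weighted mean, one value per column
```

No loop or running counter is needed in this version, since size(data, 1) already gives the number of rows that count. If a real observation can itself be an all-zero row, zeros cannot mark padding reliably; padding with NaN and testing with isnan would avoid that ambiguity.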
