Is there any way to speed this code up?

Basically, I have a large set of data with only 4 columns, currently stored in a dataset array, and I want to separate the observations. I have indexed the different groups of observations by 1, 2, 3, ..., 12984. What I want to do is create a numerical matrix that holds the 12984 groups side by side as blocks of columns, each with its own rows (including repeated observations), since I can operate on a numeric matrix faster and more efficiently. The problem is that the groups do not all have the same number of rows: for instance, some are (616,3), others (500,3), etc.
This is the code I have so far:
mat = ones(200,3);
i = 1;
while i <= 10
    c = ds(ds.newid == i, {'permno','monthlycumlnret','dates',});
    d = cat(1, [ double(c) ]);
    if length(d) == length(mat);
        mat = [mat d];
        if length(d) > length(mat)
            z = ones(length(d)-length(mat),3);
            mat = [vertcat(1,z,mat) d];
        else
            z = zeros(length(mat)-length(d),3);
            mat = [mat vertcat(1,z,d)];
        end
        i = i + 1;
    end
end
I haven't even run it up to 100 and it is already taking forever. Can someone please help me? I am a bit new to MATLAB, so any help is appreciated.
  5 Comments
Putsandcalls on 15 Jan 2016
Thank you, I will be working to improve myself and will keep the link in mind.
Guillaume on 15 Jan 2016
In addition, do not use length on 2D arrays. If your array has fewer rows than columns, length will return the number of columns, which is not what you want in your code above.
Always be explicit. You want the number of rows, so use
size(mat, 1)
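A quick way to see the difference, as a minimal sketch on a made-up 2-by-5 array (the variable name A is only for illustration):

```matlab
A = zeros(2, 5);      % 2 rows, 5 columns
n = length(A);        % largest dimension, here the number of columns
r = size(A, 1);       % always the number of rows
fprintf('length: %d, rows: %d\n', n, r);
```

As soon as a group has fewer rows than it has columns, the two values disagree, and any padding computed from length comes out wrong.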


Accepted Answer

Guillaume on 15 Jan 2016
Edited: Guillaume on 15 Jan 2016
Your code has bugs (the vertcat calls all have an invalid 1 as first argument, and you're using length on arbitrarily sized matrices) and sometimes doesn't make much sense (d = cat(1, [ double(c) ]) is the same as d = double(c)). In any case, the way you're approaching the problem is very inefficient.
I'm not familiar with dataset or nominal (both have been deprecated), but the following should work for you.
If you have R2015b, there are some new functions that make it fairly easy to do what you want: findgroups and splitapply:
groupeddata = splitapply(@(subd) {subd}, double(ds(:, {'permno','monthlycumlnret','dates'})), findgroups(ds.newid)) %creates a cell array with one cell per id
maxheight = max(cellfun(@(m) size(m, 1), groupeddata)); %get height of output matrix
groupeddata = cellfun(@(m) [m; ones(maxheight - size(m, 1), size(m, 2))], groupeddata, 'UniformOutput', false); %pad every group to maxheight
groupeddata = [groupeddata{:}] %and concatenate
If you don't have R2015b, then the splitapply step can be replaced by:
groupeddata = arrayfun(@(id) double(ds(ds.newid == id, {'permno','monthlycumlnret','dates'})), unique(ds.newid), 'UniformOutput', false)
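For anyone who wants to try the split, pad and concatenate idea without a dataset array, here is a minimal self-contained sketch on made-up data. The matrix data and its values are hypothetical stand-ins (column 1 plays the role of newid, columns 2 to 4 stand in for permno, monthlycumlnret and dates). It also pads with NaN rather than ones, which is a swapped-in choice and not what the code above does, so that padding cannot be confused with real values.

```matlab
% Hypothetical data: column 1 is the group id, columns 2-4 are the values
data = [1 10 0.1 1;
        1 10 0.2 2;
        2 20 0.3 1;
        3 30 0.4 1;
        3 30 0.5 2;
        3 30 0.6 3];
ids = unique(data(:, 1));
% One cell per group, each holding that group's rows (columns 2-4 only)
groups = arrayfun(@(id) data(data(:, 1) == id, 2:4), ids, 'UniformOutput', false);
% Height of the tallest group
maxheight = max(cellfun(@(m) size(m, 1), groups));
% Pad every group at the bottom with NaN rows up to maxheight
padded = cellfun(@(m) [m; nan(maxheight - size(m, 1), size(m, 2))], ...
    groups, 'UniformOutput', false);
% Concatenate the groups side by side: 3 rows, 3 columns per group
result = [padded{:}];
```

With NaN padding, functions such as mean(result, 1, 'omitnan') (available from R2015a, as far as I know) skip the padding without any extra bookkeeping.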
  3 Comments
Guillaume on 15 Jan 2016
dataset doc says to use table instead. It's part of base matlab, so you don't even need the stat toolbox to use it.
nominal doc says to use categorical instead. Again, part of base matlab.
I made a typo in the name of the function (it's findgroups, not findgroupds) but it was spelled correctly in the code, so there's no reason it shouldn't work for you.
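As a rough illustration of what working with a table looks like, here is a small sketch. The table T and its values are made up for the example; for an existing dataset array, dataset2table from the Statistics Toolbox should do the conversion, though I have not tested that on this data.

```matlab
% Hypothetical stand-in for the original dataset, built directly as a table
T = table([1; 1; 2], [10; 10; 20], [0.1; 0.2; 0.3], ...
    'VariableNames', {'newid', 'permno', 'monthlycumlnret'});
% Row and variable selection uses the same syntax as with a dataset
sub = T(T.newid == 1, {'permno', 'monthlycumlnret'});
% table2array plays the role of double(...) applied to a dataset
M = table2array(sub);   % 2-by-2 numeric matrix
```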
Putsandcalls on 16 Jan 2016
Thank you for your advice. I will make sure to keep your tips in mind. I now have another problem: I want to find the mean of an array that I have separated into another matrix for all the permno. I was able to replace the ones with "0", but I need to tell MATLAB to stop when it reaches a zero that was only added as padding to ensure that all the columns have the same number of rows. So far I have something along these lines, using a for loop combined with an if statement:
d = 0.95;
Z = [0 2 3; 4 5 6; 7 8 9; 0 9 0; 0 0 0; 0 0 0];
i = 1;
for j=length(Z-1):-1:1
    if X(j,:) == 0
        P(j,:) = (1/i)*sum(((1-d)*(d.^(i)))*Z);
    else
        P(j,:) = (1/(i + 1))*sum(((1-d)*(d.^(i)))*Z);
    end
end
So for the matrix that I have constructed, how can I tell it to count the zero in the first row but not the last 2 rows? So far my if statement does not work at all: it outputs a value in the first row and then the same values in the rest.
  • Also, I am wondering how I can at the same time build up an accumulator that counts the number of rows it has already gone through. I am not sure I have done it properly in my for loop, but the idea is to increase the count for each non-zero value and use it as the number of samples when calculating my mean. I am also not sure whether i will keep accumulating across iterations if I predefine it.
  • The loop over j runs in reverse, starting from the bottom, so that the most recent date gets the most weight.
Any help is much appreciated in advance.
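One possible way to separate padding from genuine zeros, sketched on the example matrix Z above. This assumes the padding rows are entirely zero and sit only at the bottom of the matrix; a group whose real final observations are all zero would be trimmed too, so treat it as a starting point rather than a complete answer. The names lastrow, valid and m are made up for the sketch, and the weighting by d is left out.

```matlab
Z = [0 2 3; 4 5 6; 7 8 9; 0 9 0; 0 0 0; 0 0 0];
% Assumption: padding rows are all-zero and appear only at the bottom
lastrow = find(any(Z ~= 0, 2), 1, 'last');   % last row with a nonzero entry
valid = Z(1:lastrow, :);                     % keeps the genuine zero in row 1
m = mean(valid, 1);                          % column means over the real rows only
```

Here lastrow comes out as 4, so the zero in row 1 still counts towards the mean while the two all-zero rows at the bottom are ignored. It also gives the number of real rows directly, without an accumulator in the loop.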
