How so subset observations in a matrix?

I have a matrix that has the follwing columns:
[GVKEY FYEAR SALE ASSET]
FYEAR ranges from 2005-2019 and there are many GVKEYs which are basically identifiers. I want to keep only those GVKEYS(observations) for which I have the entire 2005-2019 observations, as some may have only sub ranges like 2007-2009.
How do I do this?

 Accepted Answer

Hi Prerna,
My understanding is that you want to filter out the rows of a matrix for those GVKEYs having observations from 2005-2019. I would like to make the assumption that the year range is continuous. You could try out the following piece of code.
keys = unique(data(:,1))
for i=1:numel(keys)
keyData = data(data==keys(i),:)
% from here you can compare with keyData length or compare the year
% array to the 2nd column and based on that modify values in the actual
% data array.
col2 = sort(keyData(:,2))
flag = isequal(col2,2005:2019)
if(flag == 0)
data(data==keys(i),:) = []
end
end
I Hope this helps.

7 Comments

I get the following error:
The logical indices in position 1 contain a true value outside of the array bounds.
Error in CalcMoments (line 7)
keyData = Data(Data==keys(i),:);
So I fixed it a bit:
keys = unique(Data(:,1));
for i=1:numel(keys)
keyData = Data(Data(:,1)==keys(i),:);
% from here you can compare with keyData length or compare the year
% array to the 2nd column and based on that modify values in the actual
% data array.
col2 = sort(keyData(:,2));
flag = isequal(col2,2005:2019);
if(flag == 0)
Data(Data(:,1)==keys(i),:) = [];
end
end
But I get an empty dataset in final.
Can you share an example dataset if possible. I'll try it out from my end and see if i missed anything.
Following are a few lines:
1004 2005 978.819 2.289 91.877
1004 2006 1067.633 74.245 118.907
1004 2007 1362.01 22.994 168.522
1004 2008 1377.511 63.6 166.08
1004 2009 1501.042 100.833 134.345
1004 2010 1703.727 114.075 196.312
1004 2011 2195.653 122.865 222.693
1004 2012 2136.9 86.4 245.2
1004 2013 2199.5 69.7 256
1004 2014 1515 69 83.7
1004 2015 1442.1 12 136.9
1004 2016 1504.1 2 148.2
1004 2017 1524.7 0 126.5
1004 2018 1517.2 0 153.5
1013 2005 1535 0.3 159
1013 2006 1611.4 0 140.4
1013 2007 1764.8 200.6 167.9
1013 2008 1921 2.6 173.3
1013 2009 1343.6 0.6 84.7
1013 2010 1474.5 0.3 121.4
1019 2007 34.18 1.615 5.616
1019 2008 33.486 1.628 7.34
1019 2009 32.678 0.243 5.956
1019 2010 29.546 1.358 4.042
1019 2011 30.349 2.374 4.576
1019 2012 35.158 4.791 6.139
1019 2013 35.812 1.32 5.335
1019 2014 36.893 4.117 5.071
1019 2015 32.707 1.289 3.317
1019 2016 32.817 1.336 2.876
1019 2017 30.485 1.336 2.471
1019 2018 34.031 3.005 3.657
1019 2019 33.355 1.833 5.226
1021 2005 8.153 0.675 1.856
1021 2006 14.341 0 1.252
1021 2007 27.171 0.537 -0.432
1021 2008 21.401 8.578 -2.721
1034 2005 1623.383 416.669 163.101
1034 2006 927.239 0 138.914
1034 2007 1288.165 11.032 40.816
I missed the transpose of col 2 while comparing.
flag = isequal(col2',2005:2019)
This code should work as expected now.
A follow up, if i want to filter rows where column 7 begins with 31, how would I do that?

Sign in to comment.

More Answers (1)

idx = FYEAR >= 2007 & FYEAR <= 2009 ;
iwant_FYEAR = FYEAR(idx) ;
iwant_GVKEYS = GVKEYS(idx) ;

Tags

Asked:

on 25 Jul 2022

Answered:

on 25 Jul 2022

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!