How to create an N*1 matrix for n individual fixed effects under unbalanced panel data?

I have a panel data set for individuals i=1,2,...,n. The panel is unbalanced, so individual i appears in the data Ti times, for a total of N=T1+T2+...+Tn observations. I also have an n*1 matrix of individual fixed effects, i.e. A=[theta1, theta2, ..., theta_n]'. I want to create an N*1 matrix B of individual fixed effects that lines up with the original panel.
For example, if T1=3, T2=1, T3=2,..., my goal is to create B=[theta1, theta1, theta1, theta2, theta3, theta3, ...]'. I can create B using a loop, but n is too large for a loop to be practical. Is there an efficient vectorized way to do this?
Thank you very much for your help.
  1 Comment
Image Analyst on 29 Jun 2012
An actual numerical example with actual arrays would help explain.
How big is n? I just ran through a loop with one billion (yes, billion with a b and nine zeros) iterations on my computer and it took only 3.7 seconds. Do you have more than that?


Accepted Answer

Image Analyst on 29 Jun 2012
Edited: Image Analyst on 29 Jun 2012
I whipped up this code. I hope it does what you were thinking:
tic
n = 10000000; % Number of individuals (persons).
% Generate a random number of observations for each of the n persons.
% Each person has between 1 and 5 observations (measurements).
Ti = randi(5, [1 n]);
% Total number of observations, so we know how big B needs to be.
% (For small n, remove the semicolons to print Ti, T, and B.)
T = sum(Ti);
% Generate the A matrix.
% Let's just have it be 10 through 10*n in steps of 10.
A = 10 : 10 : 10*n;
% Make up the B matrix where each element of A
% is replicated Ti times.
% Preallocate B.
B = zeros(1, T);
% Brute force loop.
index = 1; % Let's start at the beginning.
for k = 1 : n
    B(index : index + Ti(k) - 1) = A(k); % Fill person k's block in one assignment
    index = index + Ti(k);
end
% A and B are row vectors here; use B.' if you need the N*1 column.
toc
Here's what it does for an n of 10:
Ti =
3 1 1 2 5 4 1 1 1 4
T =
23
B =
Columns 1 through 14
10 10 10 20 30 40 40 50 50 50 50 50 60 60
Columns 15 through 23
60 60 70 80 90 100 100 100 100
Elapsed time is 0.001806 seconds.
When n was 10 million, it took 10.6 seconds.
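The loop can also be replaced by index arithmetic. The following is only a minimal sketch with small made-up inputs, not a benchmarked replacement: it assumes A and Ti are column vectors and that every Ti is at least 1 (a zero count would break the cumsum trick). The repelem line at the end assumes MATLAB R2015a or later.

```matlab
% Small made-up inputs (assumptions for illustration only).
A  = [10; 20; 30];      % n-by-1 fixed effects
Ti = [3; 1; 2];         % observations per individual, all >= 1
N  = sum(Ti);           % total number of observations

% Mark the first row of each individual's block with a 1 ...
starts = cumsum([1; Ti(1:end-1)]);
marks = zeros(N, 1);
marks(starts) = 1;
% ... so the running sum gives, for every row, the index into A.
idx = cumsum(marks);
B = A(idx);             % N-by-1 result

% On R2015a or later, repelem does the same replication in one call.
B2 = repelem(A, Ti);
```

For these inputs idx is [1 1 1 2 3 3]', so B repeats the first effect three times, the second once, and the third twice.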
  3 Comments
Zhixiao on 10 Dec 2014
In your example, you already know the number of observations for each individual. How can we set up panel data in MATLAB without knowing the exact number of observations for each individual? Most of the time we are only given an index for the firms (firm1, firm2, firm3, ...) and a year index (1999, 2000, 2001, ...). How can we make sure that the panel is organised correctly, from the oldest year to the newest? Thanks.
Image Analyst on 10 Dec 2014
Not sure what you're asking. Are you asking how to sort your data? You have to know the exact number of data points/observations. Everything that is a variable stored in MATLAB has a size, and you can determine this size with things like the size() function, or the isempty() function, for example. You can't have something that's unknown. It might be unknown in advance but once you get it into MATLAB, it's known.
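If the question is about sorting and counting, here is a rough sketch of how the counts could be recovered from the data once it is loaded. The firm and year columns and the values in A are made up for illustration; it assumes the panel is stored in long format with numeric firm IDs.

```matlab
% Made-up long-format panel, deliberately out of order (illustration only).
firm = [2; 1; 1; 3; 1; 3];
year = [2000; 2001; 1999; 2001; 2000; 1999];

% Sort by firm, then by year within each firm.
% 'order' can be used to reorder any other columns the same way.
[sorted, order] = sortrows([firm year], [1 2]);
firm = sorted(:, 1);
year = sorted(:, 2);

% g maps each row to its position in the sorted list of distinct firms.
[firmIds, ~, g] = unique(firm);
Ti = accumarray(g, 1);   % number of observations per firm

% With one fixed effect per distinct firm (hypothetical values),
% indexing by g expands them to one value per observation.
A = [10; 20; 30];
B = A(g);
```

Here Ti comes out of the data rather than being known in advance, and B = A(g) builds the expanded vector without needing Ti at all.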
