Creating dummy variables from categorical variable

4 views (last 30 days)
Suppose there is a column vector array n containing unique but repeating values of the form
1
1
5
7
7
The aim is to create a matrix D which contains in its columns dummy variables for each unique value in n of the form
1 0 0
1 0 0
0 1 0
0 0 1
0 0 1
I use the following code:
uniq = unique(n);
N_obs = size(n,1);
N_ind = size(uniq,1);
D = NaN(N_obs,N_ind);
D(:,1) = n == uniq(1,1);
D(:,2) = n == uniq(2,1);
D(:,3) = n == uniq(3,1);
This produces the desired D matrix. However, it is tedious to write the last three lines so I wanted to use a for loop of the form
for i = N_ind
D(:,i) = n == uniq(i,1);
end
But this gives
NaN NaN 0
NaN NaN 0
NaN NaN 0
NaN NaN 1
NaN NaN 1
Where is my mistake in the loop?

Accepted Answer

Walter Roberson
Walter Roberson on 23 Sep 2017
for i = 1 : N_ind

More Answers (3)

Stephen23
Stephen23 on 24 Sep 2017
Vectorized:
>> vec = [1,1,5,7,7];
>> [~,~,idc] = unique(vec);
>> siz = [numel(idc),max(idc)];
>> out = zeros(siz);
>> out(sub2ind(siz,1:siz(1),idc)) = 1
out =
1 0 0
1 0 0
0 1 0
0 0 1
0 0 1
>>

Guillaume
Guillaume on 24 Sep 2017
Another shorter vectorised option (possibly no faster than Stephen's, I haven't tested):
[~, ~, col] = unique(vec);
out = accumarray([find(col), col], 1) %only purpose of the find is to generate (1:numel(col))'

Snoopy
Snoopy on 24 Sep 2017
I would like to ask a question here. One should always vectorise the code where possible, as I read in the literature. But in this simple example, it seems the for loop is a little less complicated when compared to the vectorised code examples. Would you agree? Or would you think that this depends on how well one knows about MATLAB built-in functions such as accumarray, find, etc.?

Categories

Find more on Loops and Conditional Statements in Help Center and File Exchange

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!