Recode Categorical Variable into New Binary Variables

Adds N-1 binary (effect or dummy coded) variables based on a categorical variable to your dataset.
Updated 5 Apr 2012


% OUTPUT
% Returns your dataset with N-1 binary variables recoded from a
% categorical variable that has N categories.
% The Nth category does not get its own variable; it is represented by all
% 0's or all -1's across the new variables, depending on recoding_type.
% So list the least important category last in the category_values parameter.
% Also (optionally) drops the original categorical variable.
%
% Works best when the variable contains chars or numeric/logical values
%
%
% INPUTS (** = optional)
% If you plan to omit an optional param, you must also omit all the params that follow it
%
% 1) dataset - (dataset) the actual dataset variable
%
% 2) variable_name - (char or number) name/column number of a categorical variable in dataset
%
% 3) category_values** - (cell vector of chars/numbers/logicals, or a vector
% of numbers/logicals) the category values of the variable (default=unique(dataset.variable))
% MUST CONTAIN ONLY CHARACTERS THAT ARE LEGAL IN VARIABLE NAMES,
% e.g. no '?' or '!' or '%' in any category value
%
% 4) recoding_type** - (char) 'dummy' or 'effect' (case insensitive).
% Dummy creates 0,1 variables, with all 0's representing the Nth category.
% Effect creates -1,1 variables, with all -1's representing the Nth category.
% default = 'dummy'
%
% 5) drop_original** - (logical) whether to drop the original un-recoded variable from the dataset (default=false)
%
% 6) separator** - (char) the char string to put inside the new varname,
% between the name of the original variable and the category value
% default = '_'
% e.g., by default, a dummy variable that represents the category 'T' in
% Var1 will be named 'Var1_T'
% MUST CONTAIN ONLY CHARACTERS THAT ARE LEGAL IN VARIABLE NAMES,
% e.g. no '?' or '!' or '%' in the separator
%
%
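The naming rule described for category_values and separator can be sketched as follows. This is a minimal illustration, not the function's actual source: it assumes the new name is simply the variable name, the separator, and the category value joined together, and that numeric or logical values are converted to text first. The variable names and sample values are hypothetical.

```matlab
% Sketch (assumed behavior): build new variable names and check legality.
variable_name   = 'Var1';
separator       = '_';
category_values = {'T', 2, true};   % hypothetical sample values

new_names = cell(size(category_values));
for k = 1:numel(category_values)
    v = category_values{k};
    if ~ischar(v)
        v = num2str(double(v));     % numeric/logical value -> text
    end
    new_names{k} = [variable_name separator v];
    % isvarname is a built-in check for legal MATLAB variable names
    if ~isvarname(new_names{k})
        error('Illegal variable name: %s', new_names{k});
    end
end
disp(new_names)
```

A value such as 'T?' would fail the isvarname check, which is why '?', '!' and '%' must be avoided in both the category values and the separator.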
% EXAMPLES
% If the variable c1 in dataset Exam_Data has 2 categories, 'a' and 'b',
% you would type:
% Exam_Data = categorical2bins(Exam_Data,'c1',{'a','b'});
% and the function would do the following:
% Exam_Data.c1_a = zeros(nrows,1);
% Exam_Data.c1_a(strcmp(Exam_Data.c1,'a'))=1;
%
% If c1 has 3 categories, a, b and c (listed in that order in category_values),
% you would type:
% Exam_Data = categorical2bins(Exam_Data,'c1',{'a','b','c'});
% and the function would do the following:
% Exam_Data.c1_a = zeros(nrows,1);
% Exam_Data.c1_a(strcmp(Exam_Data.c1,'a'))=1;
% Exam_Data.c1_b = zeros(nrows,1);
% Exam_Data.c1_b(strcmp(Exam_Data.c1,'b'))=1;
%
% Also works for numbers.
% If c1 has 3 categories, 1, 2 and 3 (doubles),
% you would type:
% Exam_Data = categorical2bins(Exam_Data,'c1',{1,2,3});
% OR
% Exam_Data = categorical2bins(Exam_Data,'c1',[1,2,3]);
% and the function would do the following:
% Exam_Data.c1_1 = zeros(nrows,1);
% Exam_Data.c1_1(Exam_Data.c1==1)=1;
% Exam_Data.c1_2 = zeros(nrows,1);
% Exam_Data.c1_2(Exam_Data.c1==2)=1;
%
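The examples above only show dummy coding. The sketch below contrasts it with effect coding, using plain arrays instead of a dataset so it runs without the function itself. It is not the author's implementation. It follows the conventional definition of effect coding, where rows in the Nth (reference) category are -1 in every new column and other non-matching rows stay 0. The help text confirms only the all -1's reference rows, so how categorical2bins codes the other non-matching rows is an assumption here. The sample data are hypothetical.

```matlab
% Sketch: dummy vs. effect coding for a 3-category char variable.
c1 = {'a'; 'b'; 'c'; 'a'};            % hypothetical categorical column
category_values = {'a', 'b', 'c'};    % last entry is the reference category
recoding_type = 'effect';             % or 'dummy'

n_new = numel(category_values) - 1;   % N-1 new variables
coded = zeros(numel(c1), n_new);
for k = 1:n_new
    % 1 where the row belongs to the k-th category
    coded(strcmp(c1, category_values{k}), k) = 1;
end

if strcmpi(recoding_type, 'effect')
    % Conventional effect coding: reference rows are -1 in every column
    is_ref = strcmp(c1, category_values{end});
    coded(is_ref, :) = -1;
end
disp(coded)
```

With recoding_type set to 'dummy', the reference row for 'c' stays all 0's instead.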

Cite As

Brian Weidenbaum (2024). Recode Categorical Variable into New Binary Variables (https://www.mathworks.com/matlabcentral/fileexchange/36042-recode-categorical-variable-into-new-binary-variables), MATLAB Central File Exchange. Retrieved .

MATLAB Release Compatibility
Created with R2011b
Compatible with any release
Platform Compatibility
Windows macOS Linux

Version Published Release Notes
1.0.0.0