Code covered by the BSD License  

File Size: 6.81 KB
File ID: #35981

Create Unique Interaction Variables

03 Apr 2012 (Updated)

Efficiently adds N-way, custom-named, *unique* interaction variables to your dataset.

File Information
Description

function data_set = create_interaction_variables(data_set,vars,range_nways,separator,max_varname_length)
% create_interaction_variables
%
% for Matlab R13+
% version 1.1 (April 2012)
% (c) Brian Weidenbaum
% website: http://www.BrianWeidenbaum.com/.
%
%
% OUTPUT: your dataset (or a dataset built from your matrix), updated with new, aptly-named *unique* interaction variables,
% from 2-way interactions up to whatever order the user specifies
%
% INPUTS (** = OPTIONAL)
% input name: (input datatype/s) -- description
%
% data_set: (dataset OR matrix) -- the data you want to alter
%
% **vars: (cell array of chars/numbers, OR vector of numbers, OR 'ALL') --
% default: 'ALL'
% the names of the variables you want to interact, OR the column numbers of those variables,
% OR 'ALL' to include every variable automatically
%
% **range_nways: (vector OR 'MAX') --
% default: 2
% the range of numbers of variables to include in the interaction terms generated by this function;
% alternatively, pass 'MAX' to go from 2 variables up to the maximum possible number of vars
%
% **separator: (char) --
% default: '_'
% the string placed between the names of the variables that contributed to a new interaction variable.
% E.g., by default, an interaction between Var1 and Var2 will be
% named 'Var1_Var2'.
%
% **max_varname_length: (number) --
% default: 63
% the maximum length of the newly created interaction terms' variable names.
% Any dynamically generated variable name (e.g. 'Var1_Var2') that exceeds this number will be excluded from the new dataset.
% Set max_varname_length according to the database you plan to use with your data --
% e.g., if you only want to use this data in MATLAB, set max_varname_length=63 (the maximum length supported by the dataset class),
% but if you plan to export your data to Oracle, set it to around 30
%
%
%
% EXAMPLES
%
% You have a dataset with 3 variables: a, b, and c.
% You want to create interaction terms, up to 3 ways, for all of your variables.
% You type:
% new_dataset = create_interaction_variables(old_dataset,'all','max');
% new_dataset will contain:
% a.*b, named 'a_b'
% a.*c, named 'a_c'
% b.*c, named 'b_c'
% and a.*b.*c, named 'a_b_c',
% plus all original variables.
% It will NOT contain b.*a, c.*a, etc because these are not unique combos.
%
% You have a dataset with 3 variables: a, b, and c.
% You want to create interaction terms, up to 2 ways, only for columns 2 and 3.
% You type:
% new_dataset = create_interaction_variables(old_dataset,[2 3],2);
% new_dataset will contain only b.*c,
% plus all original variables.
%
% You have a dataset with 3 variables: a, b, and c.
% You want to create interaction terms, up to 2 ways, only for 'a' and 'c'.
% You type:
% new_dataset = create_interaction_variables(old_dataset,{'a','c'},2);
% new_dataset will contain only a.*c,
% plus all original variables.
%
% You have a dataset with 3 variables: a, b, and c.
% You want to create interaction terms, ONLY 3 ways, for all vars.
% You type:
% new_dataset = create_interaction_variables(old_dataset,'all',3);
% new_dataset will contain only a .* b .* c,
% plus all original variables.
%
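
The behavior described in the examples above (unique combinations only, names joined by a separator, over-long names dropped) can be approximated with the short sketch below. It is an illustration under stated assumptions, not the File Exchange implementation: it works on a plain numeric matrix rather than a dataset object, it uses the built-in nchoosek to enumerate combinations, and the variable names X, names, new_cols and new_names are hypothetical.

```matlab
% Illustrative sketch only; NOT the code shipped with this submission.
% Assumed inputs: a numeric matrix X and one name per column.
X = [1 2 3; 4 5 6];
names = {'a', 'b', 'c'};
range_nways = 2:3;           % build 2-way and 3-way terms
separator = '_';
max_varname_length = 63;

new_cols = [];               % interaction columns, appended as they are built
new_names = {};              % matching names, e.g. 'a_b'

for k = range_nways
    % Each row of combos is one unique k-subset of column indices,
    % so a.*b appears once and b.*a is never generated.
    combos = nchoosek(1:numel(names), k);
    for r = 1:size(combos, 1)
        idx = combos(r, :);

        % Join the contributing names with the separator.
        name = names{idx(1)};
        for j = 2:numel(idx)
            name = [name separator names{idx(j)}];
        end

        % Skip terms whose generated name would be too long.
        if length(name) > max_varname_length
            continue;
        end

        % Element-wise product across the selected columns.
        new_cols = [new_cols, prod(X(:, idx), 2)];
        new_names{end+1} = name;
    end
end

disp(new_names);
disp(new_cols);
```

With the three columns above, the loop builds the four terms listed in the first example: 'a_b', 'a_c', 'b_c' and 'a_b_c'. Lowering max_varname_length to 3 in this sketch would drop 'a_b_c' while keeping the 2-way terms.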

Acknowledgements

Allcomb inspired this file.

MATLAB release MATLAB 7.13 (R2011b)
Comments and Ratings (1)
15 May 2012 sam

Very useful, thank you! Besides interaction terms, also 'a^2' variables can be created!

Updates
04 Apr 2012

Changes between 1.0 and 1.1:
-added 'separator' parameter
-changed 'max_nways' parameter to 'range_nways'
-enforced maximum max_varname_length
-reduced min number of arguments to 1
-minor performance tweaks
