Thread Subject:
How to create dummy variables from categorical data?

Subject: How to create dummy variables from categorical data?

From: Economist

Date: 21 Feb, 2011 15:39:25

Message: 1 of 7

I have a fairly large dataset with a variable that represents host country, e.g. “Finland”. How do I create dummy variables from this country variable, i.e. one dummy for each country? I plan to use the dummies in survival analysis.

From: someone

Date: 21 Feb, 2011 15:48:04

Message: 2 of 7

"Economist" wrote in message <iju0vd$6fj$1@fred.mathworks.com>...
> I have a fairly large dataset with a variable that represents host country, e.g. “Finland”. How do I create dummy variables from this country variable, i.e. one dummy for each country? I plan to use the dummies in survival analysis.

% perhaps use a structure?

doc struct

From: ImageAnalyst

Date: 21 Feb, 2011 15:54:03

Message: 3 of 7

On Feb 21, 10:39 am, "Economist "
<starfaic...@dunflimblag.mailexpire.com> wrote:
> I have a fairly large dataset with a variable that represents host country, e.g. “Finland”. How do I create dummy variables from this country variable, i.e. one dummy for each country? I plan to use the dummies in survival analysis.

-------------------------------------------------------------------------
Not sure I understand. What data type is this variable? Is it a
structure, a string, a numerical array, a class?

Maybe the variable is a logical array of 220 elements (or however many
countries there are), and an element is true if you have data for that
country. So maybe logicalCountryList(23) = true if Finland is
country #23.

Maybe it's a structure like
countryInfo.CountryName = 'Finland';
Then you can just add other members dynamically. Let's say you
want population and GDP. Just assign them:
countryInfo.Population = 5300000;
countryInfo.GDP = 183e9;

Generally it's possible to just create variables in MATLAB
immediately, when you want to use them - no declaration or
preallocation is necessary. I guess we'd need a more explicit
description of your situation.

From: Economist

Date: 21 Feb, 2011 16:08:04

Message: 4 of 7

The data consists of strings, i.e. the values for the first 10 observations are:

Albania
Albania
Albania
Argentina
Argentina
Argentina
Argentina
Argentina
Argentina
Argentina

In total there are observations from 88 different countries. Thus, I would like to have 88-1=87 dummy variables.

From: ImageAnalyst

Date: 21 Feb, 2011 16:13:01

Message: 5 of 7

On Feb 21, 11:08 am, "Economist "
<starfaic...@dunflimblag.mailexpire.com> wrote:
> The data consists of strings, i.e. the values for the 10 first observations are:
>
> Albania
> Albania
> Albania
> Argentina
> Argentina
> Argentina
> Argentina
> Argentina
> Argentina
> Argentina
>
> In total there are 88 different countries, from which there are observations. Thus, I would like to have 88-1=87 dummy variables.

-----------------------------------------------------------
% Preallocate dummy variable: one element per country (1-by-88 vector).
myDummyVariable = zeros(1, 88);
% Start using it, let's say Finland is the 23rd country.
myDummyVariable(23) = 42; % Just assign something to the element belonging to Finland.

From: Richard Startz

Date: 21 Feb, 2011 21:22:31

Message: 6 of 7

On Mon, 21 Feb 2011 16:08:04 +0000 (UTC), "Economist "
<starfaichoo@dunflimblag.mailexpire.com> wrote:

>The data consists of strings, i.e. the values for the 10 first observations are:
>
>Albania
>Albania
>Albania
>Argentina
>Argentina
>Argentina
>Argentina
>Argentina
>Argentina
>Argentina
>
>In total there are 88 different countries, from which there are observations. Thus, I would like to have 88-1=87 dummy variables.

You need to assign a number to each country. This is one approach,
although not terribly elegant.
% countryName is assumed to be a cell array of strings, one per observation
n = numel(countryName);
dummyMatrix = zeros(n, 88);
for i = 1:n
  switch countryName{i}
    case 'Albania'
      countryNumber = 1;
    case 'Argentina'
      countryNumber = 2;
    % ... one case for each remaining country
  end
  dummyMatrix(i, countryNumber) = 1;
end
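
The numbering step above can probably be done without writing 88 cases by hand. This is only a sketch, assuming the country names are held in a cell array of strings (called countryName here, a name not confirmed by the original poster). The third output of unique gives each observation's position in the sorted list of distinct names, and dropping one column leaves 88-1=87 dummies with that country as the reference category.

```matlab
% Example data from the thread: 3 x Albania, 7 x Argentina (column cell array).
% countryName is an assumed variable name.
countryName = [repmat({'Albania'}, 3, 1); repmat({'Argentina'}, 7, 1)];

% countries: sorted list of distinct names.
% idx(i): position of countryName{i} within that list.
[countries, ~, idx] = unique(countryName);

n = numel(countryName);   % number of observations
k = numel(countries);     % number of distinct countries

% One row per observation, one column per country.
allDummies = zeros(n, k);
allDummies(sub2ind([n k], (1:n)', idx(:))) = 1;

% Drop the first column so that countries{1} is the reference category.
dummyMatrix = allDummies(:, 2:end);
```

With the real data, k would be 88 and dummyMatrix would have 87 columns. Which column to drop is a modelling choice; the first one is used here only for illustration.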

From: Peter Perkins

Date: 22 Feb, 2011 03:05:24

Message: 7 of 7

On 2/21/2011 10:39 AM, Economist wrote:
> I have a fairly large dataset with a variable that represents host
> country, e.g. “Finland”. How do I create dummy variables
> from this country variable, i.e. one dummy for each country? I plan to
> use the dummies in survival analysis.

Have you looked at the DUMMYVAR function in the Statistics Toolbox?

Hope this helps.
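
For reference, a minimal sketch of how DUMMYVAR might be applied to the data described earlier in the thread. It assumes the Statistics Toolbox is installed and that the country names are in a cell array of strings (countryName is an assumed name). Since DUMMYVAR works on group numbers, GRP2IDX is used first to turn the strings into positive integers.

```matlab
% Requires the Statistics Toolbox (grp2idx, dummyvar).
% Example data from the thread: 3 x Albania, 7 x Argentina.
countryName = [repmat({'Albania'}, 3, 1); repmat({'Argentina'}, 7, 1)];

% idx: group number for each observation; countries: the group names.
[idx, countries] = grp2idx(countryName);

% D has one column per group, with a 1 in the column of each row's group.
D = dummyvar(idx);

% Keep 88-1 style coding: drop one column to act as the reference category.
D(:, 1) = [];
```

Dropping a column avoids perfect collinearity when the model also includes a constant term; whether that applies depends on the survival model being fitted.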
