Code covered by the BSD License  

File Size: 2 KB. File ID: #26368

datato1ofm.m

by Amos Storkey

 

13 Jan 2010

Convert categorical data into 1ofM encoding

File Information
Description

Takes a categorical data matrix and transforms the whole matrix into a sparse binary 1ofM matrix, keeping track of which original attribute and value each new column came from. Well suited to any form of count-based probabilistic analysis.

Typically used as the last step in a chain, after loadcell.m and celltonumeric.m.

datato1ofm - recast data in 1 of M format, maintaining multinomial info.

function [newdata, attrmap] = datato1ofm( data );

DATA is the complete dataset. It is presumed that every possible state of every attribute is represented in the dataset. If not, the data should be augmented with dummy data so that this is the case. Each column of DATA corresponds to a different attribute, and each row to a different data item. DATA must be numeric.

NEWDATA is a sparse real-valued binary 1 of M dataset. All attributes are 1 of M encoded, including attributes that were already binary. The resulting split of each previously binary attribute into two columns can be removed trivially: see below.

ATTRMAP gives the attribute mapping information. ATTRMAP(1,k) gives the original attribute number for the kth new attribute. ATTRMAP(2,k) gives the value of the original attribute indicated by the kth new attribute. ATTRMAP(3,k) gives M for the kth new attribute, that is, the number of states of the original attribute it was split from.
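
To make the layout of NEWDATA and ATTRMAP concrete, here is a minimal sketch that builds both for a small numeric matrix using only built-in functions. It is not the code in datato1ofm.m. The column ordering (attributes left to right, values in ascending order within each attribute) and all variable names other than data, newdata and attrmap are assumptions made for this example.

```matlab
% Toy data: 4 items (rows), 2 attributes (columns).
% Attribute 1 takes values {1,2}; attribute 2 takes values {1,2,3}.
data = [1 3; 2 1; 1 2; 2 3];

[n, d] = size(data);
rows = [];
cols = [];
attrmap = zeros(3, 0);
for a = 1:d
    % vals: sorted distinct values of attribute a
    % idx:  for each row, the position of its value within vals
    [vals, firstpos, idx] = unique(data(:, a));
    m = numel(vals);
    offset = size(attrmap, 2);      % number of new columns created so far
    rows = [rows; (1:n)'];
    cols = [cols; offset + idx(:)];
    % Row 1: original attribute, row 2: value, row 3: number of states M
    attrmap = [attrmap, [a * ones(1, m); vals(:)'; m * ones(1, m)]];
end
newdata = sparse(rows, cols, 1, n, size(attrmap, 2));

disp(full(newdata));
disp(attrmap);
```

In this sketch each row of newdata has exactly one nonzero entry per original attribute, and column k of attrmap records where the kth new column came from.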

To remove 1 of M encoding for previously binary attributes use

ii = find(~(attrmap(2,:)==1 & attrmap(3,:)==2));
newdata = newdata(:,ii); attrmap = attrmap(:,ii);
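
As a small worked example of the two lines above, the following applies them to a hand-built newdata and attrmap (toy values assumed for illustration, not output copied from datato1ofm). The filter drops every column whose recorded value is 1 and whose attribute has exactly two states, so each two-state attribute is left with a single indicator column for its other state.

```matlab
% Hand-built toy encoding: attribute 1 has 2 states, attribute 2 has 3 states.
newdata = sparse([1 0 0 0 1; 0 1 1 0 0; 1 0 0 1 0; 0 1 0 0 1]);
attrmap = [1 1 2 2 2;    % original attribute number
           1 2 1 2 3;    % value indicated by each new column
           2 2 3 3 3];   % number of states M

% Keep every column except (value == 1 and M == 2).
ii = find(~(attrmap(2,:)==1 & attrmap(3,:)==2));
newdata = newdata(:,ii);
attrmap = attrmap(:,ii);

disp(ii);
disp(full(newdata));
disp(attrmap);
```

Only the first column is removed here; the three columns of the three-state attribute are untouched.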

To compute multinomial probabilities (simply but inefficiently) use

normmatrix = sparse([1:size(attrmap,2)],attrmap(1,:),1);
normmatrix = normmatrix*normmatrix';
probs = mean(newdata)./(mean(newdata)*normmatrix);
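
The three lines above can be traced on a toy example. normmatrix*normmatrix' has a 1 wherever two new columns share an original attribute, so the denominator is the sum of column means within each attribute, and probs is normalised per attribute. In the sketch below the data are hand-built, and the last row deliberately has no active column for attribute 2, purely to show the effect of the denominator; whether datato1ofm itself ever produces such rows is not stated in the description. The name groupsum is introduced here for readability.

```matlab
% Hand-built toy encoding: 5 items, attribute 1 (2 states), attribute 2 (3 states).
% The last row has no active column for attribute 2 (assumed for illustration).
newdata = sparse([1 0 0 0 1; 0 1 1 0 0; 1 0 0 1 0; 0 1 0 0 1; 1 0 0 0 0]);
attrmap = [1 1 2 2 2; 1 2 1 2 3; 2 2 3 3 3];

% normmatrix(k, a) = 1 if new column k belongs to original attribute a.
normmatrix = sparse(1:size(attrmap,2), attrmap(1,:), 1);
% groupsum(j, k) = 1 if new columns j and k share an original attribute.
groupsum = normmatrix * normmatrix';
% Column means divided by the total mean of their attribute group.
probs = mean(newdata) ./ (mean(newdata) * groupsum);

disp(full(probs));
% Probabilities within each original attribute sum to 1.
disp(full(probs * normmatrix));
```

Without the division, the attribute 2 columns would have means 0.2, 0.2 and 0.4, which sum to 0.8 rather than 1.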

See also loadcell, celltonumeric.

MATLAB release MATLAB 7.6 (R2008a)
Tag Activity for this File
Tag                 Applied By      Date/Time
data import         Amos Storkey    13 Jan 2010 12:26:24
data exploration    Amos Storkey    13 Jan 2010 12:26:24
machine learning    Amos Storkey    13 Jan 2010 12:26:24
naive bayes         Amos Storkey    13 Jan 2010 12:26:24
1 of m              Amos Storkey    13 Jan 2010 12:26:24
mutlinomial         Amos Storkey    13 Jan 2010 12:26:24
data formatting     Amos Storkey    13 Jan 2010 12:26:24
naive bayes         Roberto         20 Apr 2010 11:04:42
