How can I load and use a .data file as a dataset for training and testing a neural network?

Hi all.
I want to build a letter recognition project using a neural network. I found this dataset: https://archive.ics.uci.edu/ml/datasets/Letter+Recognition but I don't know how to load this .data file and use the first 16000 items for training and the remaining 4000 for testing the neural network.

1 Comment

Before getting involved with large external sources of data, familiarize yourself with patternnet
help patternnet
doc patternnet
and the MATLAB classification data examples
help nndatasets
doc nndatasets
HTH, Greg


 Accepted Answer

fid = fopen('TheDataset.data', 'rt');
num_attrib = 16;
fmt = ['%s', repmat('%f', 1, num_attrib)];
datacell = textscan(fid, fmt, 'Delimiter', ',', 'CollectOutput', 1);
fclose(fid);
which_letter = datacell{1};
attribs = datacell{2};
target_codes = which_letter - 'A' + 1;
Then one way of dividing the data would be
train_set = attribs(1:end-4000, :);
train_targets = target_codes(1:end-4000);
test_set = attribs(end-3999:end, :);
test_targets = target_codes(end-3999:end);
This is probably not what you would use in practice in the Neural Network Toolbox: you would normally specify the division through the network's data-division parameters (net.divideFcn and net.divideParam); see http://www.mathworks.com/help/nnet/ug/divide-data-for-optimal-neural-network-training.html
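As a rough sketch of that parameter-based approach, assuming the file holds 20000 rows as described in the question and that a patternnet is used (the hidden-layer size of 20 is an arbitrary choice for illustration), the split can be handed to the network as explicit index lists through the divideind division function:

```matlab
% Sketch only: index-based data division, assuming 20000 samples in total.
num_samples = 20000;          % assumed size of letter-recognition.data
num_test    = 4000;           % last 4000 samples held out for testing

net = patternnet(20);         % hidden-layer size chosen arbitrarily
net.divideFcn = 'divideind';  % divide by explicit sample indices
net.divideParam.trainInd = 1:(num_samples - num_test);
net.divideParam.valInd   = [];   % no validation set in this sketch
net.divideParam.testInd  = (num_samples - num_test + 1):num_samples;
```

With this in place, the complete input and target matrices would be passed to train(), and the training record returned as the second output of train() should report performance on the held-out test indices separately.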

5 Comments

Thank you very much for the answer, but I have two questions. The first is: what is the idea behind target_codes = which_letter - 'A' + 1; ? The second is: when I run this code:
clear all;
fid = fopen('letter-recognition.data', 'rt');
num_attrib = 16;
fmt = ['%s', repmat('%f', 1, num_attrib)];
datacell = textscan(fid, fmt, 'Delimiter', ',', 'CollectOutput', 1);
fclose(fid);
which_letter = datacell{1};
attribs = datacell{2};
target_codes = which_letter - 'A' + 1;
train_set = attribs(1:end-4000, :);
train_targets = target_codes(1:end-4000);
net=feedforwardnet(20);
net.trainFcn = 'traingd';
net.trainparam.epochs = 1500;
net = train(net,train_set,train_targets)
I get an error on the line
target_codes = which_letter - 'A' + 1;
The error is ''Undefined function 'minus' for input arguments of type 'cell' ''. What needs to be done to fix the error?
Change
which_letter = datacell{1};
to
which_letter = char(datacell{1});
With regard to target_codes = which_letter - 'A' + 1: the first column of each line in the file is one of the letters 'A', 'B', 'C', ... 'Z'. You are trying to use the rest of the values on each line as attributes to develop a way to classify inputs as belonging to one of those letters. But the Neural Network Toolbox does not accept character strings as the target to classify against: it needs a numeric group number. So we classify the data into group #1, group #2, group #3, ... group #26, which means transforming the letters 'A' to 'Z' into the corresponding group numbers 1, 2, 3, ... 26. We can do that by taking the input letter, subtracting 'A' to get the offset from the beginning of the alphabet (0, 1, 2, ... 25), and then adding 1 to get the group number 1, 2, 3, ... 26. For example, 'J'-'A'+1 is character code 74 minus character code 65 plus 1, 74-65+1 = 10, corresponding to the fact that 'J' is the 10th letter. Every line that begins with 'J' belongs to the 10th group.
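The arithmetic described above can be checked on a few letters. This is only a small illustration; the variable names letters, codes and back are made up for the example:

```matlab
% Character arithmetic: a char is converted to its numeric code before subtraction.
letters = ['A'; 'J'; 'Z'];        % example column of class letters
codes   = letters - 'A' + 1;      % numeric group numbers: [1; 10; 26]
back    = char(codes + 'A' - 1);  % inverse mapping, recovers ['A'; 'J'; 'Z']
```

The inverse mapping is handy later for turning a predicted group number back into a letter.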
Thank you for your help and examples; they really helped me understand the idea. After correcting the line
which_letter = datacell{1};
to
which_letter = char(datacell{1});
I now get an error at
net = train(net,train_set,train_targets)
''Inputs and targets have different numbers of samples.'' How can this be fixed?
You might need to transpose train_set. I have a hard time keeping straight whether train() wants the data for any one sample to run across the rows or down the columns.
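For reference, train() treats each column as one sample, so both the inputs and the targets read by textscan (one sample per row) need transposing. A minimal sketch with toy stand-in data (the sizes and values here are invented for illustration):

```matlab
% Toy stand-ins for the real data: 5 samples, 16 attributes each.
train_set     = rand(5, 16);        % one sample per row, as textscan returns it
train_targets = [1; 10; 26; 3; 3];  % one class code per row

% train() treats each COLUMN as one sample, so transpose both arrays.
x = train_set.';                    % 16-by-5: attributes down, samples across
t = train_targets.';                % 1-by-5
assert(size(x, 2) == size(t, 2));   % same number of samples (columns)
```

The "different numbers of samples" message arises because the untransposed arrays have 16 and 1 columns respectively, which train() reads as 16 samples versus 1.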
I transposed train_set and train_targets, and training started. Now I have trained a multilayer perceptron neural network with one hidden layer, using error backpropagation as the training algorithm.
Thank you very much for your attention and help.


More Answers (2)

Hello again! It turned out that I was wrong when I thought that everything was fine. The problem is that when using this code:
clear all;
fid = fopen('letter-recognition.data', 'rt');
num_attrib = 16;
fmt = ['%s', repmat('%f', 1, num_attrib)];
datacell = textscan(fid, fmt, 'Delimiter', ',', 'CollectOutput', 1);
fclose(fid);
which_letter = char(datacell{1});
attribs = datacell{2};
target_codes = which_letter - 'A' + 1;
train_set = attribs(1:end-4000, :);
train_targets = target_codes(1:end-4000);
tr_train_set = train_set.';
tr_train_targets = train_targets.';
net=patternnet(30,'traingd');
net.trainparam.epochs = 800;
net = train(net,tr_train_set,tr_train_targets)
I get 16 inputs and 1 output, but I need 26 outputs (one per letter). I think the problem comes from:
tr_train_set = train_set.';
tr_train_targets = train_targets.';
but if I don't transpose, I get the error: ''Inputs and targets have different numbers of samples.''.
How can this problem be fixed? When I check, the 'mse' is above 10^2.
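One possible explanation, offered as a sketch and not as a confirmed fix: the transpose is probably not the culprit. With a single row of class codes 1 to 26 as the target, the network is built with one output and tries to fit the code as if it were a numeric value. patternnet is normally given one target row per class, and ind2vec can expand the codes into that form. The toy codes below are invented for illustration, and the two-argument form ind2vec(ind, N) is assumed to be available in the installed toolbox release:

```matlab
% Sketch: expand class codes 1..26 into a 26-by-N matrix of 0/1 targets.
target_codes = [1; 10; 26; 3];               % toy example codes (A, J, Z, C)
onehot = full(ind2vec(target_codes.', 26));  % 26 rows, one column per sample

% Each column has a single 1 in the row of its class.
[~, recovered] = max(onehot, [], 1);         % same result as vec2ind(onehot)
```

The 26-by-N matrix would then take the place of tr_train_targets in the call to train(), which should give a network with 26 outputs. Note also that patternnet uses cross-entropy as its default performance measure, so an mse value computed on raw class codes is not directly comparable.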
