Please help me load data into MATLAB
Hi all,
I am having trouble loading the file Adult.data into MATLAB. When I try:
>> load adult.data
it displays:
??? Error using ==> load
Unknown text on line number 1 of ASCII file C:\Users\Documents\MATLAB\adult.data "Self-emp-not-inc".
A line in the file:
50, Self-emp-not-inc, 83311, Bachelors, 13, Married-civ-spouse, Exec-managerial, Husband, White, Male, 0, 0, 13, United-States, <=50K
I don't know why. I have tried fopen and scan, but I still cannot load the file. Please help me. Thank you so much.
Accepted Answer
Matt Tearle
on 22 Feb 2011
Similar to what Andrew said, but I'd go with
fid = fopen('Adult.data');
A = textscan(fid,'%f%s%f%s%f%s%s%s%s%s%f%f%f%s%s','delimiter',',');
fclose(fid);
It looks uglier, but it will import the 15 columns as separate cells, and the numeric values will be imported as numeric arrays.
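As a rough sketch of what the result looks like, the snippet below runs the same format string on the single sample line quoted in the question. The line is passed to textscan as a string instead of a file identifier, so it runs without the file. The variable names (sampleLine, age, workclass, income) are only illustrative and do not come from the file itself.

```matlab
% Sample record copied from the question; with the real file you would
% pass the fid returned by fopen instead of this string.
sampleLine = ['50, Self-emp-not-inc, 83311, Bachelors, 13, ' ...
    'Married-civ-spouse, Exec-managerial, Husband, White, Male, ' ...
    '0, 0, 13, United-States, <=50K'];

A = textscan(sampleLine,'%f%s%f%s%f%s%s%s%s%s%f%f%f%s%s','delimiter',',');

age       = A{1};    % %f column: numeric array (illustrative name)
workclass = A{2};    % %s column: cell array of strings (illustrative name)
income    = A{15};   % %s column: cell array of strings (illustrative name)

disp(age)            % numeric value of the first field
disp(workclass{1})   % text of the second field
disp(income{1})      % text of the last field
```

Each element of A holds one whole column, so A{1}(k) and A{2}{k} both refer to the k-th record.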
But, that said, if you're doing any kind of statistical analysis on this kind of data, you probably want (and/or may already have) Statistics Toolbox. In which case, use dataset to import this data directly into a dataset array. This will make your life easier. In particular, you can use nominal to turn things like "bachelors", "white", and "male" into nominal arrays.
More Answers (6)
Andrew Newell
on 22 Feb 2011
fid = fopen('Adult.data');
A = textscan(fid,'%s','delimiter',',');
fclose(fid);
This reads the data into a cell containing a one-dimensional cell array. The next command will change it to a format that is probably more useful to you (there are 15 fields on each line):
A = reshape(A{:},15,[]);
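To see what the reshape does, here is a small sketch that applies the same two steps to the sample line from the question, repeated twice to stand in for a two-line file. The repetition and the names sampleLine, twoLines, B, and firstField are only for illustration. Note that with '%s' every field comes back as a string, so numeric fields need a conversion such as str2double.

```matlab
% Sample record copied from the question, duplicated to mimic two lines.
sampleLine = ['50, Self-emp-not-inc, 83311, Bachelors, 13, ' ...
    'Married-civ-spouse, Exec-managerial, Husband, White, Male, ' ...
    '0, 0, 13, United-States, <=50K'];
twoLines = sprintf('%s\n%s', sampleLine, sampleLine);

A = textscan(twoLines,'%s','delimiter',',');  % A{1} is a 30-by-1 cell array
A = reshape(A{:},15,[]);                      % 15-by-2: one column per record

B = A.';                                      % 2-by-15: one row per record
firstField = str2double(B(:,1));              % first field converted to numbers

disp(size(A))
disp(firstField)
```

Because reshape fills column by column, each record ends up in a column of A; the transpose is only needed if you prefer one record per row.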
Matt Tearle
on 23 Feb 2011
BTW, best practice is to accept the answer that solved your initial question, then start a new question for the follow-up. That way, others can follow what's happening (for the benefit of others who might have similar questions). Anyway...
It depends, are you using Stats TB? If not, the base MATLAB way is
nnz((age > 50) & strcmpi(sex,'male'))
I'm assuming age is a numeric array, and sex is a cell array of strings. The > and strcmpi (case-insensitive string comparison) both create logical variables, which are combined using &. Applying the nnz function returns the number of true values.
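A small sketch with made-up values shows how the pieces combine. The four ages and sexes below are hypothetical and are not taken from the file; isOver50, isMale, and count are illustrative names.

```matlab
% Hypothetical data: a numeric array and a cell array of strings.
age = [39; 50; 52; 61];
sex = {'Male'; 'Male'; 'Female'; 'Male'};

isOver50 = age > 50;               % logical array, true where age exceeds 50
isMale   = strcmpi(sex,'male');    % logical array, case-insensitive match
count    = nnz(isOver50 & isMale); % number of rows where both are true

disp(count)
```

Only the last row is both over 50 and male, so count is 1 here. Note that age > 50 is strict, so the 50-year-old is not counted.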
If you have Stats TB and have the data in a dataset array, with sex as a nominal variable,
nnz((data.age > 50) & (data.sex == 'male'))
Matt Tearle
on 23 Feb 2011
Would you be surprised if I suggested that what you really need is Statistics Toolbox?
But, the brute-force way in MATLAB could be done something like this:
clist = unique(country);
Nctry = length(clist);
num_richbuggers = zeros(Nctry,1);
for k = 1:Nctry
    num_richbuggers(k) = nnz(strcmpi(assets,'>50K') & ...
        strcmpi(country,clist{k}));
end
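If the loop becomes slow on a large file, one possible alternative (not what was posted above, just a sketch) is to let unique return a group index for each row and then sum the matches per group with accumarray. The country and assets values below are made up for illustration, and isRich and num_rich are hypothetical names.

```matlab
% Hypothetical data standing in for two columns of the imported file.
country = {'United-States'; 'Cuba'; 'United-States'; 'Cuba'; 'United-States'};
assets  = {'>50K'; '<=50K'; '>50K'; '>50K'; '<=50K'};

[clist,~,idx] = unique(country);       % idx(i) is the position of country{i} in clist
isRich = strcmpi(assets,'>50K');       % logical flag per row

% Sum the flags within each country group; the size argument fixes the
% output length to one entry per unique country.
num_rich = accumarray(idx(:), double(isRich(:)), [numel(clist) 1]);

for k = 1:numel(clist)
    fprintf('%s: %d\n', clist{k}, num_rich(k));
end
```

One difference from the loop: unique matches country names case-sensitively, whereas the loop compares them with strcmpi, so the two approaches agree only when the capitalization in the file is consistent.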