Problem with using readtable:
Here is the code:
filename = "C:\Users\Onat\Desktop\392\vocab.txt";
opts = detectImportOptions(filename);
opts.VariableTypes = ["string", "double"];
opts.LineEnding = "\n";
vocab = readtable(filename, opts);
I'm working on an NLP application in which I need the vocabulary and the frequency of each word in it. Naturally, the corpus contains tokens such as a lone quotation mark ("). This seems to be a major problem for MATLAB, since it treats the character as special. Notice in the output below that after the quotation mark the frequencies are no longer parsed: the rest of the file is swallowed into a single string instead of being read as separate words and numbers. Can anyone help with this issue?
vocab.txt is as follows:
..... (excerpt; this is not the beginning of the file)
and 699333
in 603607
" 538122
to 504540
a 476836
was 304423
...... (the file continues)
The output is as follows:
...... (excerpt; this is not the beginning of the output)
"and" 6.9933e+05
"in" 6.0361e+05
" 538122↵to 504540↵a 476836↵was 304423↵The 246510↵- 229901↵is 225721↵for 198733↵)
Accepted Answer
More Answers (1)
I would take a different approach: use readlines, which reads each line as plain text without interpreting quotation marks, and then build the table with string manipulation.
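str = readlines("vocab.txt", "EmptyLineRule", "skip");   % no quote processing; skip blank lines such as a trailing one
T = array2table(split(str), 'VariableNames', ["Word","Freq"]);   % split each line at whitespace into two columns
T.Freq = str2double(T.Freq)   % convert the frequencies from text to numbers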
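As an alternative sketch (a swapped-in technique, not what the answer above proposes), textscan with a '%s %f' format should also avoid the problem: the %s conversion reads up to the next whitespace and does not treat " as a quote character (only %q does), so the lone quotation mark comes back as an ordinary word. The sample lines and the temporary file below are assumptions made only so the example is self-contained; with the real corpus you would open vocab.txt instead.

```matlab
% Hypothetical sample lines taken from the excerpt in the question
sample = ["and 699333"; "in 603607"; """ 538122"; "to 504540"; "a 476836"];

% Write the sample to a temporary file so the sketch runs on its own
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, '%s\n', sample{:});
fclose(fid);

% %s stops at whitespace and ignores quoting, so " is read as a word;
% %f reads the frequency that follows it on the same line
fid = fopen(fname, 'r');
C = textscan(fid, '%s %f');
fclose(fid);
delete(fname);

% Collect the two columns into the Word/Freq table the question asked for
T = table(string(C{1}), C{2}, 'VariableNames', {'Word', 'Freq'})
```

Like the readlines approach, this assumes every line holds exactly one token followed by one number.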