code with the same function
This question is closed. Reopen it to edit or answer.
The code below correctly counts how many times a given letter occurs in a large .txt file.
I would like another simple piece of code that does the same thing but gives me the count for every letter in the text (A = number of letter a's, B = number of letter b's, and so on).
If there is nothing simpler than this, I will accept something more complicated or of the same level of difficulty.
fileread('mytextfile.txt')
data = fileread('mytextfile.txt');
nnz(data=='A')
nnz(ismember(data,'A'))
Answers (2)
Walter Roberson
on 3 Apr 2019
[A, ~, AA] = unique(data);
fprintf('%c = %d\n', [A, accumarray(AA, 1)].')
8 Comments
Walter Roberson
on 3 Apr 2019
Note that I had already answered you on this matter at https://www.mathworks.com/matlabcentral/answers/453555-help-me-please-please?s_tid=prof_contriblnk#answer_368356
Gabriel Cunha
on 3 Apr 2019
Edited: per isakson
on 4 Apr 2019
Rik
on 3 Apr 2019
It is a bit easier to resolve the error in his previous answer:
%random test data instead of fileread:
%data=char(randi([64 65+25],1,40));data(data==64)=' ';
data = fileread('mytextfile.txt');
[a, ~, aa] = find(accumarray(reshape(double(data),[],1), 1));
fprintf('%c = %d\n', [a(:).'; aa(:).']);
Walter Roberson
on 3 Apr 2019
Edited: Walter Roberson
on 4 Apr 2019
fprintf('%c = %d\n', [0+A(:), accumarray(AA, 1)].')
Rik
on 4 Apr 2019
Curiously, this doesn't seem to work for documents as large as a Bible translation (which seems to be the goal). I have attached a public domain translation for testing. Notice the difference between the two methods for lower case common letters. The accumarray seems to cap out at 65535.
data=fileread('WEB.txt');
clc
[A, ~, AA] = unique(data);
fprintf('%c = %d\n', [A(:), accumarray(AA, 1)].')
char_list=min(data):max(data);
counts=histc(data,char_list);
char_list(counts==0)=[];
counts(counts==0)=[];
fprintf('%c = %d\n', [char_list',counts'].')
Walter Roberson
on 4 Apr 2019
double(char_list).'
Otherwise the char data type has priority over numeric in determining the data type of the concatenation.
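A minimal sketch of that concatenation rule, using made-up values rather than the real file (the count 70000 is only an illustrative number above the 16-bit limit discussed in this thread):

```matlab
% Mixing char and double in one array yields a char array, so a large
% count gets converted to a 16-bit character code (MATLAB may warn here).
mixed = ['A', 70000];      % class is char; the second element is no longer 70000
% Forcing the character to double first keeps the count intact.
fixed = [0+'A', 70000];    % class is double; values are 65 and 70000
disp(class(mixed))
disp(class(fixed))
disp(fixed)
```

This is why 0+A(:) or double(char_list).' is needed before putting the characters next to the counts.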
Rik
on 4 Apr 2019
Despite its name, char_list is already a double. I didn't notice your last edit with 0+A(:), so that is why that method was capped (chars are limited to 16 bits).
Walter Roberson
on 4 Apr 2019
I added the 0+ after you (correctly) pointed out the cap at 65535.
There are two easy options: a loop and a histogram:
%for loop method:
data = fileread('mytextfile.txt');
letters='ABCDEFGHIJKLMNOPQRSTUVWXYZ';
counts=zeros(1,numel(letters));
for n=1:numel(letters)
counts(n)=nnz(data==letters(n));
end
%histogram method:
data = fileread('mytextfile.txt');
counts=histc(data,65:(65+25));
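As a hedged sketch of how the loop method might be extended to print the result in the "A = count" form asked for: the string below is a stand-in for the output of fileread, and using upper to count lower-case letters together with capitals is an assumption about what is wanted.

```matlab
% Stand-in for: data = fileread('mytextfile.txt');
data = 'Hello World';
letters = 'A':'Z';
counts = zeros(1, numel(letters));
up = upper(data);                      % assumption: count 'a' and 'A' together
for n = 1:numel(letters)
    counts(n) = nnz(up == letters(n)); % occurrences of the n-th letter
end
% Convert letters to double so the concatenation stays numeric;
% fprintf walks the 2-by-26 matrix column by column.
fprintf('%c = %d\n', [double(letters); counts]);
```

Letters that never occur are printed with a count of 0; remove them first if only the letters present are of interest.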
4 Comments
Gabriel Cunha
on 4 Apr 2019
Rik
on 4 Apr 2019
Those are the ASCII value of 'A' (65) and the number of letters in the alphabet minus 1 (25). But you should probably be using something like this:
char_list=min(data):max(data);
counts=histc(data,char_list);
char_list(counts==0)=[];
counts(counts==0)=[];
fprintf('%c = %d\n', [char_list',counts'].')
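A quick way to check what 65:(65+25) stands for, as a small sketch (the variable names here are only illustrative):

```matlab
% The numeric range used in the histogram method ...
codes = 65:(65+25);
% ... is the same as the character codes of the capital letters.
same = isequal(codes, double('A':'Z'));
disp(same)
disp(char(codes))
```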
Gabriel Cunha
on 4 Apr 2019
Rik
on 4 Apr 2019
The edited for-loop method should be a bit easier to understand.