I want to get the Unicode code point for a character. Could you please help me? Which encoding type should I need to choose? UTF8 or Unicode???
I want to convert text to speech for the Malayalam language (the native language of Kerala). First I need the Unicode value of each letter. I saved the characters in a text file. What is the command to read a text file and get the Unicode values?
Answers (1)
Walter Roberson
on 3 Mar 2018
The available characters are listed at https://en.wikipedia.org/wiki/Malayalam_(Unicode_block) . They start at U+0D00 which is char(3328) for the first entry.
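As a rough illustration of moving between the hexadecimal chart positions and MATLAB characters, here is a minimal sketch. The block range U+0D00 to U+0D7F is taken from the Unicode chart; the variable names and the isMalayalam helper are made up for this example.

```matlab
% Convert chart positions (hexadecimal) to the decimal values MATLAB uses.
firstCode = hex2dec('0D00');          % 3328, start of the Malayalam block
lastCode  = hex2dec('0D7F');          % 3455, end of the Malayalam block

% Build a character from its code point: U+0D05 is MALAYALAM LETTER A.
letterA = char(hex2dec('0D05'));

% Hypothetical helper: true for characters that fall inside the block.
isMalayalam = @(c) double(c) >= firstCode & double(c) <= lastCode;

fprintf('U+0D05 is char(%d)\n', double(letterA));
disp(isMalayalam([letterA 'a']));     % the Latin letter 'a' is outside the block
```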
The way to read the file to get the Unicode code points depends on exactly how the file was stored. Sometimes the method can be quite simple, but until you know which encoding is being used you have to be more careful. See https://www.mathworks.com/matlabcentral/answers/267176-read-and-seperate-csv-data#answer_209938 for some code of mine that figures out how a document has been encoded.
8 Comments
Walter Roberson
on 3 Mar 2018
You changed the title of your posting to ask the additional question,
"Which encoding type should I need to choose? UTF8 or Unicode???"
UTF-8 is one of the ways of representing Unicode. It is probably the most common way of representing Unicode.
Unicode code points in the range you need, U+0Dxx, require 3 bytes each in UTF-8. For a long document you could instead use UTF-16LE or UTF-16BE, which need only 2 bytes for each of these characters, but fewer applications expect UTF-16, so UTF-8 is sometimes more convenient.
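To see the byte counts for yourself, something like the following sketch should work. It uses unicode2native on a single Malayalam letter; the choice of U+0D05 and the variable names are just for illustration.

```matlab
% Encode one Malayalam character two ways and compare the byte counts.
ch = char(hex2dec('0D05'));                    % MALAYALAM LETTER A

utf8Bytes  = unicode2native(ch, 'UTF-8');      % uint8 row vector
utf16Bytes = unicode2native(ch, 'UTF-16LE');   % little-endian, no byte order mark

fprintf('UTF-8:    %d bytes (%s)\n', numel(utf8Bytes),  mat2str(utf8Bytes));
fprintf('UTF-16LE: %d bytes (%s)\n', numel(utf16Bytes), mat2str(utf16Bytes));
```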
Walter Roberson
on 3 Mar 2018
"I saved a Malayalam phoneme in a text file with utf8. I want to get the Unicode of that letter. Could you please send me the code to fetch the file and get the corresponding Unicode value"
Use fileread() to read the file. If you assign the result to the variable S, then the Unicode code point corresponding to each character is double(S). Just be careful: Unicode charts are organized in hexadecimal rather than decimal. If you want to see the hex version of the code point numbers, use dec2hex(S, 4).
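Here is a small sketch of the double() and dec2hex() step. To keep it self-contained, S is built directly from three code points rather than read from a file, so it is only a stand-in for what fileread() would return once the file has been decoded correctly.

```matlab
% Stand-in for text read from a file: three Malayalam code points.
S = char([3333 3349 3391]);

codes    = double(S);            % decimal code points, one per character
hexCodes = dec2hex(codes, 4);    % one row per character, 4 hex digits each

disp(codes);
disp(hexCodes);                  % rows match the U+xxxx positions in the chart
```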
Walter Roberson
on 6 Mar 2018
Please attach a sample text file.
Neethu K
on 7 Mar 2018
Walter Roberson
on 7 Mar 2018
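fid = fopen('a.txt', 'r', 'n', 'UTF-8');
S = fread(fid, [1 inf], '*char'); %size goes before precision; returns a row vector of characters
fclose(fid);
if ~isempty(S) && S(1) == 65279; S(1) = ''; end %remove leading Byte Order Mark (U+FEFF)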
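If you want to try this without your own file, here is a rough round-trip sketch. It writes a temporary file containing the UTF-8 byte order mark followed by the bytes for U+0D05, then reads it back the same way as above. The temporary file and the chosen letter are assumptions for the demonstration only.

```matlab
% Write raw UTF-8 bytes: BOM (EF BB BF) followed by U+0D05 (E0 B4 85).
tmpfile = [tempname '.txt'];
fid = fopen(tmpfile, 'w');
fwrite(fid, uint8([239 187 191 224 180 133]), 'uint8');
fclose(fid);

% Read the file back as characters, decoding the bytes as UTF-8.
fid = fopen(tmpfile, 'r', 'n', 'UTF-8');
S = fread(fid, [1 inf], '*char');
fclose(fid);
delete(tmpfile);

% Drop the byte order mark if it came through as the first character.
if ~isempty(S) && S(1) == 65279
    S(1) = '';
end

fprintf('U+%04X\n', double(S));
```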
Neethu K
on 8 Mar 2018
Walter Roberson
on 8 Mar 2018
audiodir = 'C:\Users\NeeK\Documents\MATLAB\EE403\Final Project\malayalm\wav'; %adjust as appropriate
[filename, pathname] = uigetfile('*.txt', 'Choose a text file');
if ~ischar(filename)
    fprintf('Cancel!\n')
    return; %user cancel
end
fullname = fullfile(pathname, filename);
[fid, msg] = fopen(fullname, 'r', 'n', 'UTF-8');
if fid < 0
    error('Failed to open file "%s" because "%s"', fullname, msg);
end
S = fread(fid, [1 inf], '*char'); %size goes before precision
fclose(fid);
if isempty(S)
    fprintf('Text file "%s" is empty!\n', fullname);
    return
end
if S(1) == 65279; S(1) = ''; end %remove leading Byte Order Mark
audio_data = [];
fs = 1;
for thischar = S
    %audio files are expected to be named by hex code point, such as 0d05.wav
    basename = sprintf('%04x.wav', double(thischar));
    this_filename = fullfile(audiodir, basename);
    if ~exist(this_filename, 'file')
        fprintf('audio file "%s" not found, skipping character "%c"\n', basename, thischar);
    else
        [thissound, fs] = audioread(this_filename);
        if isempty(audio_data)
            audio_data = thissound;
        else
            %pad whichever has fewer channels with zero columns so they can be stacked
            oldchan = size(audio_data, 2);
            newchan = size(thissound, 2);
            if newchan < oldchan
                thissound(end,oldchan) = 0;
            elseif oldchan < newchan
                audio_data(end,newchan) = 0;
            end
            audio_data = [audio_data; thissound];
        end
    end
end
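In case the channel-matching step in the loop above is unclear, here is a tiny sketch of the same trick with made-up sample values. Assigning to an element beyond the current last column makes MATLAB grow the matrix and fill the new column with zeros.

```matlab
% Made-up sample data: a mono clip (1 column) and a stereo clip (2 columns).
mono   = [0.1; 0.2; 0.3];
stereo = [0.5 0.6; 0.7 0.8];

% Grow the mono clip to 2 columns; the new column is filled with zeros.
mono(end, size(stereo, 2)) = 0;

% Now both have the same number of columns and can be stacked in time order.
combined = [stereo; mono];
disp(combined);
```

The result has one row per sample and one column per channel, which is the layout audioread() returns.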