How to write Hindi Unicode characters from MATLAB into a file

Hi,
I am working on Hindi OCR. I have completed character recognition, and now I need to write the recognized characters to a file from MATLAB.
I have tried the native2unicode(str, encoding) function, but I don't know which encoding to pass as the second parameter in order to write Hindi characters to a file.
Hoping to get a reply soon.
  2 Comments
Amith Kamath on 20 May 2013
encoding must be the empty string ('') or a name or alias for an encoding scheme. Some examples are 'UTF-8', 'latin1', 'US-ASCII', and 'Shift_JIS'. Unfortunately, none of these options include the Hindi script.
Abhay on 21 May 2013
Thanks, Amith, for commenting on this question. But this is precisely my problem. Anyway, thanks.


Accepted Answer

Walter Roberson on 21 May 2013
Devanagari is coded from 0x0900 to 0x097F.
Example ('आ' = decimal 2310 = 0x0906):
fid = fopen('testAA.txt', 'w');
fwrite(fid, unicode2native(char(hex2dec('0906')), 'UTF-8'));
fclose(fid);
This stores the three bytes
0xE0 0xA4 0x86
in the file, which is the UTF-8 representation of U+0906, the code point 6 positions into the 0x0900 block.
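As a sanity check on that byte sequence, the three UTF-8 bytes for a code point in the U+0800 to U+FFFF range can be computed by hand and compared with what unicode2native returns. This is only an illustrative sketch: the helper name utf8of3 is made up for this example, and it does not handle code points outside that range.

```matlab
% Hand-rolled UTF-8 encoding for code points U+0800..U+FFFF (three bytes).
% utf8of3 is a hypothetical helper name used only for this illustration.
utf8of3 = @(cp) uint8([bitor(224, bitshift(cp, -12)), ...            % 1110xxxx
                       bitor(128, bitand(bitshift(cp, -6), 63)), ... % 10xxxxxx
                       bitor(128, bitand(cp, 63))]);                 % 10xxxxxx

cpAA = hex2dec('0906');   % DEVANAGARI LETTER AA
cpII = hex2dec('0940');   % DEVANAGARI VOWEL SIGN II

manualAA  = utf8of3(cpAA);
builtinAA = unicode2native(char(cpAA), 'UTF-8');
manualII  = utf8of3(cpII);
builtinII = unicode2native(char(cpII), 'UTF-8');

fprintf('%02X ', manualAA); fprintf('\n');   % E0 A4 86
fprintf('%02X ', manualII); fprintf('\n');   % E0 A5 80
```

In both cases the manual bytes should match the unicode2native result, which is the reason to prefer the built-in conversion over hard-coded byte prefixes.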
Note: once you get far enough into the 0x09XX range (from 0x0940 onward), the 0xA4 switches to 0xA5, so I advise using unicode2native rather than writing out constant bytes followed by 0x80 plus the desired offset.
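The same approach extends from a single character to a whole recognized word. The sketch below is a minimal example under stated assumptions: the word is built from code points so the script does not depend on the encoding of the .m file, and the output file name hindi_demo.txt in tempdir is a placeholder chosen for illustration. It writes the UTF-8 bytes and then reads them back to confirm the round trip.

```matlab
% Build the word "हिन्दी" from its code points
% (U+0939 U+093F U+0928 U+094D U+0926 U+0940).
word = char([2361 2367 2344 2381 2342 2368]);

% Placeholder output location, chosen only for this example.
outFile = fullfile(tempdir, 'hindi_demo.txt');

% Convert to UTF-8 bytes and write them unchanged.
bytes = unicode2native(word, 'UTF-8');
fid = fopen(outFile, 'w');
if fid == -1
    error('Could not open %s for writing.', outFile);
end
fwrite(fid, bytes, 'uint8');
fclose(fid);

% Read the raw bytes back and decode them to check the round trip.
fid = fopen(outFile, 'r');
raw = fread(fid, [1 Inf], '*uint8');
fclose(fid);
roundTrip = native2unicode(raw, 'UTF-8');

disp(isequal(roundTrip, word));   % displays 1 when the round trip succeeds
```

Opening the resulting file in an editor that is set to UTF-8 should show the Hindi text, provided the editor's font covers Devanagari.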
  16 Comments
ayushi on 6 Jul 2016
Edited: ayushi on 6 Jul 2016
OK, thank you, sir. I am using this code for character recognition. I want to take an image as input and, where the characters match the saved character templates, write only those characters to the file. But the code is not working the way I want. Please suggest where I have to make changes:
% PRINCIPAL PROGRAM
warning off %#ok<WNOFF>
% Clear all
clc, close all, clear all
% Read image LOAD AN IMAGE
[filename, pathname] = uigetfile('*','LOAD AN IMAGE');
imagen=imread(fullfile(pathname, filename));
% Show image
imshow(imagen);
title('ENTREE');
% Convert to gray scale
if size(imagen,3)==3 %RGB image
imagen=rgb2gray(imagen);
end
% Convert to BW
threshold = graythresh(imagen);
imagen =~im2bw(imagen,threshold);
% Remove all object containing fewer than 100 pixels
imagen = bwareaopen(imagen,100);
%Storage matrix word from image
word=[ ];
re=imagen;
%Make and open *.txt as file for write
b=find(filename=='.');
pathname = [pathname filename(:,1:b-1) '.TXT'] ;
[FileName, PathName]= uiputfile('*.txt', pathname);
save(fullfile(PathName, FileName));
fid = fopen(fullfile(PathName, FileName), 'wt');
fclose(fid);
% Load templates
load templates
global templates
% Load threshold
load seuil
global seuil
% Compute the number of letters in template file
num_letras=size(templates,2);
while 1
%Fcn 'lines' separate lines in text
[fl , re]=lines(re);
re2=fl;
%Uncomment line below to see lines one by one
%imshow(fl);pause(0.5)
%-----------------------------------------------------------------
while 1
%Fcn 'chars' separate characters in text
[fc, re2, inter]=chars(re2);
%Width of a letter
largeur=size(fc,2);
%ratio of the space between two characters over the width of a letter
rapport=inter/largeur;
%If this ratio exceeds the predefined threshold of when creating models
if rapport >= seuil
% We add a space
word=[word ' '];
end
% Resize letter (same size of template)
img_r=imresize(fc,[86 24]);
%Uncomment line below to see letters one by one
%imshow(img_r);pause(0.5)
%-------------------------------------------------------------------
% Call fcn to convert image to text
letter=read_letter_perso(img_r,num_letras);
% Letter concatenation
word=[word letter];
if isempty(re2) %if variable 're2' in Fcn 'chars' is empty
word=[word char(10)]; %newline (char(10); '\n' in single quotes is a literal backslash followed by n)
break %breaks the loop
end
end
%When the sentences finish, breaks the loop
if isempty(re) %if variable 're' in Fcn 'lines' is empty
break %breaks the loop
end
end
fid = fopen('C:\Users\Omm\Desktop\New folder\testAA.txt', 'w');
fwrite(fid, unicode2native(word, 'UTF-8') );
fprintf(fid, '\r\n');
fclose(fid);
%Open '*.txt' file
winopen(fullfile(PathName, FileName))
For template creation:
% CREATE TEMPLATES Machine Learning
% CAUTION:
% We must open an image with models in a specific order:
% First the digits '0' to '9', then the uppercase letters 'A' to 'Z', then the lowercase letters 'a' to 'z'
% Dim templates must be equal to [42, 24 * 62] = [42, 1488]
% Clear all
clc, close all, clear all
% Read image LOAD AN IMAGE
[filename, pathname] = uigetfile('*','LOAD AN Image');
modele=imread(fullfile(pathname, filename));
% Convert to gray scale
if size(modele,3)==3 %RGB image
modele=rgb2gray(modele);
end
% Convert to BW
threshold = graythresh(modele);
modele =~im2bw(modele,threshold);
% Remove all object containing fewer than 100 pixels
modele = bwareaopen(modele,100);%Change the value if the dimension of templates are bad
re=modele;
%Storage matrix character from image
character=[];
%Storage matrix interchar from the space between two characters
interchar=[];
%Storage matrix from the width of a letter
largeur=[];
while 1
%Fcn 'lines' separate lines in text
[fl, re]=lines(re);
re2=fl;
%Uncomment line below to see lines one by one
%imshow(fl);pause(0.5)
while 1
%Fcn 'chars' separate characters in text
[fc, re2, inter]=chars(re2);
% width of letter concatenation
largeur=[largeur size(fc,2)];
% space between two characters concatenation
interchar=[interchar inter];
% Resize letter (size of template)
img_r=imresize(fc,[42 24]);
%Uncomment line below to see template letters one by one
imshow(img_r);pause(0.5)
%-------------------------------------------------------------------
% character concatenation
character=[character img_r];
if isempty(re2) %if variable 're2' is empty
break %breaks the loop
end
end
if isempty(re) %if variable 're' is empty
break %breaks the loop
end
end
%dividing the matrix templates
templates=mat2cell(character,86,...
[24 24 24 24 24 24 24 24 24 24 ...% dimension of the templates must be equal to [42, 24 * 62] = [42, 1488]
24 24 24 24 24 24 24 24 24 24 ...% if it is too small reduce the threshold line 20 else increase it
24 24 24 24 24 24 24 24 24 24 ...
24 24 24 24 24 24 24 24 24 24 ...
24 24 24 24 24 24 24 24 24 24 ...
24 24 24 24 24 24 24 24 24 24 ...
24 24 24 24 24 24 24 24 24 24 ...
24 24 24 24 24 24 24 24 24 24 ...
24 24 24 24 24 24]);
%threshold = the average of the space between two characters over the
%maximum width of a letter
seuil=mean(interchar)/max(largeur);
% save threshold and templates
save('seuil','seuil')
save ('templates','templates')
disp('Les modeles sont creer')%the templates are created
%clear all
clear all;
Reading letters:
function letter=read_letter(imagn,num_letras)
% Computes the correlation between template and input image
% and its output is a string containing the letter.
% Size of 'imagn' must be 42 x 24 pixels
% Example:
% imagn=imread('D.bmp');
% letter=read_letter(imagn)
global templates
comp=[ ];
for n=1:num_letras
sem=corr2(templates{1,n},imagn);
comp=[comp sem];
end
vd=find(comp==max(comp));
%we start with the digits
if vd==1
letter='0';
elseif vd==2
letter='1';
elseif vd==3
letter='2';
elseif vd==4
letter='3';
elseif vd==5
letter='4';
elseif vd==6
letter='5';
elseif vd==7
letter='6';
elseif vd==8
letter='7';
elseif vd==9
letter='8';
elseif vd==10
letter='9';
%We move on with CAPITAL LETTERS
elseif vd==11
letter='A';
elseif vd==12
letter='B';
elseif vd==13
letter='C';
elseif vd==14
letter='D';
elseif vd==15
letter='E';
elseif vd==16
letter='F';
elseif vd==17
letter='G';
elseif vd==18
letter='H';
elseif vd==19
letter='I';
elseif vd==20
letter='J';
elseif vd==21
letter='K';
elseif vd==22
letter='L';
elseif vd==23
letter='M';
elseif vd==24
letter='N';
elseif vd==25
letter='O';
elseif vd==26
letter='P';
elseif vd==27
letter='Q';
elseif vd==28
letter='R';
elseif vd==29
letter='S';
elseif vd==30
letter='T';
elseif vd==31
letter='U';
elseif vd==32
letter='V';
elseif vd==33
letter='W';
elseif vd==34
letter='X';
elseif vd==35
letter='Y';
elseif vd==36
letter='Z';
%We end with the lowercase letters
elseif vd==37
letter='a';
elseif vd==38
letter='b';
elseif vd==39
letter='c';
elseif vd==40
letter='d';
elseif vd==41
letter='e';
elseif vd==42
letter='f';
elseif vd==43
letter='g';
elseif vd==44
letter='h';
elseif vd==45
letter='i';
elseif vd==46
letter='j';
elseif vd==47
letter='k';
elseif vd==48
letter='l';
elseif vd==49
letter='m';
elseif vd==50
letter='n';
elseif vd==51
letter='o';
elseif vd==52
letter='p';
elseif vd==53
letter='q';
elseif vd==54
letter='r';
elseif vd==55
letter='s';
elseif vd==56
letter='t';
elseif vd==57
letter='u';
elseif vd==58
letter='v';
elseif vd==59
letter='w';
elseif vd==60
letter='x';
elseif vd==61
letter='y';
elseif vd==62
letter='z';
elseif vd==63
letter='!';
elseif vd==64
letter='@';
elseif vd==65
letter='#';
elseif vd==66
letter='$';
elseif vd==67
letter='%';
elseif vd==68
letter='^';
elseif vd==69
letter='&';
elseif vd==70
letter='*';
elseif vd==71
letter='(';
elseif vd==72
letter=')';
elseif vd==73
letter='-';
elseif vd==74
letter='+';
elseif vd==75
letter='=';
elseif vd==76
letter='\';
elseif vd==77
letter='/';
elseif vd==78
letter='|';
elseif vd==79
letter=';';
elseif vd==80
letter='?';
elseif vd==81
letter='<';
elseif vd==82
letter='>';
elseif vd==83
letter='.';
elseif vd==84
letter=',';
elseif vd==85
letter=':';
%This is the error character
else
letter='¤';
end
end
Walter Roberson on 6 Jul 2016
Your read_letter code has no possibility of producing Hindi characters.
Your read_letter code could also be much, much more compact:
chars_table = ['0':'9', 'A':'Z', 'a':'z', '!@#$%^&*()-+=\/|;?<>.,:'];
if isscalar(vd) && vd >= 1 && vd <= length(chars_table)
    letter = chars_table(vd);
else
    letter = '¤';
end
which has obvious compact extensions if you add more templates.
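One way the table idea could carry over to Hindi is sketched below. It rests on an assumption that is not in the posted code: that the template image is rebuilt from Devanagari glyphs in exactly the order of the table, here the letters U+0905 to U+0939 followed by the digits U+0966 to U+096F. The name hindi_table, the example value of vd, and the use of U+FFFD as the "no match" marker are all illustrative choices.

```matlab
% Hypothetical lookup table; the templates would have to be created
% in this same order for the indices to line up.
letters = hex2dec('0905'):hex2dec('0939');   % independent vowels and consonants
digits  = hex2dec('0966'):hex2dec('096F');   % Devanagari digits 0 to 9
hindi_table = char([letters, digits]);

% Example index, standing in for the result of find(comp == max(comp)).
vd = 2;

if isscalar(vd) && vd >= 1 && vd <= numel(hindi_table)
    letter = hindi_table(vd);
else
    letter = char(hex2dec('FFFD'));   % replacement character as "no match"
end

% The recognized text can then be written with unicode2native(..., 'UTF-8').
bytes = unicode2native(letter, 'UTF-8');
```

With vd = 2 this selects U+0906 ('आ'), the same character used in the accepted answer. Note that Devanagari text also uses combining vowel signs (U+093E onward), which a one-glyph-per-template scheme like this does not cover.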


More Answers (1)

shubham tawade on 2 Oct 2018
Can you share the code for Hindi OCR?
