How to segment/crop characters from a Bengali handwritten word in an image in MATLAB?

Hello, I'm trying to segment/crop characters from Bengali handwritten words in an image.
How do I achieve this? Please help me with the code.
Input image:
Want to remove red circle region:
Thank you in advance.

Answers (8)

Image Analyst on 12 Aug 2018
No, I don't have code for any of the Bengali OCR methods nor is any built into MATLAB. You'll have to either ask the authors for it, or write it yourself.
  10 Comments
Rubina Easmin on 5 Oct 2021
I worked with the above code and got the desired output for noise-free images. But the code does not work properly for noisy images (like the image below). I want to get the same output from the noisy image below.
Image Analyst on 5 Oct 2021
Try doing a background correction on it. Or maybe a bottomhat filter, imbothat(). The thing I'm worried about is that once you get it working for this image, it won't work for some other one that is not as "perfect" as this, like some snapshot someone took of a sign with their phone camera.
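The bottom-hat suggestion can be tried on a synthetic image first. This is only a minimal sketch: the test image, the stroke positions and the disk radius of 15 are all made-up values for illustration, not taken from the poster's image, and the radius would need tuning so that it is larger than the stroke width.

```matlab
% Synthetic stand-in for a phone snapshot: dark strokes on an unevenly lit page.
[X, ~] = meshgrid(1:200, 1:100);
grayImage = uint8(120 + 0.6 * X);        % Illumination ramp from left to right.
grayImage(40:60, 30:36) = 20;            % Dark "stroke" on the dim side.
grayImage(40:60, 160:166) = 90;          % Dark "stroke" on the bright side.

% Bottom-hat = morphological closing minus the original image.
% It keeps dark details that are smaller than the structuring element
% and flattens the slowly varying background to about zero.
se = strel('disk', 15);                  % Assumed radius; must exceed the stroke width.
flattened = imbothat(grayImage, se);

% After flattening, a global threshold separates ink from background.
textMask = imbinarize(flattened);        % true where there is ink.
```

Both strokes end up in textMask even though the second one is brighter than the background around the first one, which is the case a single global threshold on grayImage could not handle.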



Rubina Easmin on 12 Mar 2020
Please find the attachment. I have uploaded my previous code and the code you provided, to which I added some portions of my own. I hope you now see what I mean and what problem I am facing.

Rubina Easmin on 4 Mar 2020
Edited: DGM on 11 Feb 2023
I can see the output in the Workspace panel, but after running the program I want to show the values of ratioOfZeroAndOne and ratioOfSize on the output screen, for example on the figure as shown below. I used sprintf but it does not work.
caption =sprintf('ratioOfZeroAndOne #%d, ratioOfSize #%d', ratioOfZeroAndOne, ratioOfSize);
The problem is in the bold portion of my code below (the last lines of the inner loop), where I try to show the ratio of zeros and ones on the figure above.
% Demo to find lines and characters from text in a Bengali letter.
clc; % Clear the command window.
close all; % Close all figures (except those of imtool.)
clear; % Erase all existing variables. Or clearvars if you want.
workspace; % Make sure the workspace panel is showing.
format long g;
format compact;
fontSize = 15;
%[fname path]=uigetfile('*.*','Enter an Image');
%fname=strcat(path,fname);
%myimage = imread(fname);
%===============================================================================
% Read in demo image.
folder = pwd;
baseFileName = 'ocr_find_lines_Bengali.jpeg';
% Get the full filename, with path prepended.
fullFileName = fullfile(folder, baseFileName);
% Check if file exists.
if ~exist(fullFileName, 'file')
% The file doesn't exist -- didn't find it there in that folder.
% Check the entire search path (other folders) for the file by stripping off the folder.
fullFileNameOnSearchPath = baseFileName; % No path this time.
if ~exist(fullFileNameOnSearchPath, 'file')
% Still didn't find it. Alert user.
errorMessage = sprintf('Error: %s does not exist in the search path folders.', fullFileName);
uiwait(warndlg(errorMessage));
return;
end
end
% Read in the image from disk. If storedColorMap is not empty, it's an indexed image with a stored colormap.
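[grayImage, storedColorMap] = imread(fullFileName);
if ~isempty(storedColorMap)
% Indexed image: convert it to RGB using its colormap (assumed intent of this if block).
grayImage = ind2rgb(grayImage, storedColorMap);
end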
% Get the dimensions of the image.
% numberOfColorChannels should be = 1 for a gray scale image, and 3 for an RGB color image.
[rows, columns, numberOfColorChannels] = size(grayImage);
if numberOfColorChannels > 1
% It's not really gray scale like we expected - it's color.
% Use weighted sum of ALL channels to create a gray scale image.
grayImage = rgb2gray(grayImage);
% ALTERNATE METHOD: Convert it to gray scale by taking only the green channel,
% which in a typical snapshot will be the least noisy channel.
% grayImage = grayImage(:, :, 2); % Take green channel.
end
% Display the image.
hFig = figure;
subplot(2, 3, 1);
imshow(grayImage, []);
title('Original Grayscale Image', 'FontSize', fontSize, 'Interpreter', 'None');
% Binarize the image
binaryImage = imbinarize(grayImage);
% Display the image.
h1 = subplot(2, 3, 1:2);
imshow(binaryImage, []);
title('Binary Image', 'FontSize', fontSize, 'Interpreter', 'None');
axis('on', 'image');
hp = impixelinfo();
hFig.WindowState = 'maximized'; % May not work in earlier versions of MATLAB.
% Get the vertical profile
verticalProfile = sum(~binaryImage, 2);
x = 1 : length(verticalProfile);
% Show the vertical profile. Plot x vs. y instead of y vs. x so it shows up in the same orientation.
h3 = subplot(2, 3, 3);
plot(verticalProfile, x, 'b-', 'LineWidth', 2);
grid on;
% Make it go from top to bottom, like the image.
h3.YDir = 'reverse';
h3.YLim = [1, rows];
ylabel('Row (Line)', 'FontSize', fontSize);
xlabel('Number of Black Pixels', 'FontSize', fontSize);
title('Vertical Profile', 'FontSize', fontSize);
% Find areas (horizontal bands) with no black pixels
verticalProfile = verticalProfile == 0;
% Sometimes there are diacritical marks (like dots over i's, etc.)
% and we want these to be on the same line as the character it belongs to.
% Find out what the gaps are right now to get an idea of how big a gap has to be
% and how small it needs to be to be considered part of the same line.
props = regionprops(verticalProfile, 'Area');
gapWidths = [props.Area]; % 10 24 13 28 19 3 8 One Mark is only 3 pixels above the other.
% Only allow separations if the gap is 6 pixels or more.
verticalProfile = bwareaopen(verticalProfile, 6);
% Find the centroids and areas of those blank bands:
props = regionprops(verticalProfile, 'Centroid');
% Get all centroids in a N by 2 (x,y) matrix.
centroids = round(vertcat(props.Centroid));
separationLines = centroids(:, 2);
% Draw horizontal lines at the centroids.
for k = 1 : length(separationLines)
%yline(h1, separationLines(k), 'Color', 'r', 'LineWidth', 2);
%yline(h3, separationLines(k), 'Color', 'r', 'LineWidth', 2);
line(h1, xlim(h1), [separationLines(k), separationLines(k)], 'Color', 'r', 'LineWidth', 2);
line(h3, xlim(h3), [separationLines(k), separationLines(k)], 'Color', 'r', 'LineWidth', 2);
end
% Now process each band in the other direction to get the individual characters
h4 = subplot(2, 3, 4:5);
h5 = subplot(2, 3, 6);
%h6 = subplot(2, 3, 6:7);
for k = 1 : length(separationLines)-1
thisBand = binaryImage(separationLines(k) : separationLines(k+1), :);
imshow(thisBand, [], 'Parent', h4);
% Now get the characters by basically doing the same thing we did to get the lines but in the other direction.
% Get the horizontal profile
horizontalProfile = sum(~thisBand, 1);
% Find areas (horizontal bands) with no black pixels
horizontalProfile = horizontalProfile == 0;
% Only allow separations if the gap is 6 pixels or more.
horizontalProfile = bwareaopen(horizontalProfile, 6);
% Find the centroids and areas of those blank bands:
props = regionprops(horizontalProfile, 'Centroid');
% Get all centroids in a N by 2 (x,y) matrix.
centroids = round(vertcat(props.Centroid));
separationColumns = centroids(:, 1);
for k2 = 1 : length(separationColumns)
% Draw vertical separation lines at the separation columns of this band.
line(h4, [separationColumns(k2), separationColumns(k2)], ylim(h4), 'Color', 'r', 'LineWidth', 2);
end
caption = sprintf('Line of text #%d has %d characters in it', k, length(separationColumns)-1);
title(h4, caption, 'FontSize', fontSize);
for k2 = 1 : length(separationColumns)-1
% Extract this one character:
thisCharacter = thisBand(:, separationColumns(k2):separationColumns(k2+1));
% Now the character is black, and we might actually want the bounding box of it, so get the bounding box.
[cRows, cCols] = find(~thisCharacter);
row1 = min(cRows);
row2 = max(cRows);
col1 = min(cCols);
col2 = max(cCols);
% Crop:
thisCharacter = thisCharacter(row1:row2, col1:col2);
imshow(thisCharacter, 'Parent', h5);
numberOfPixel =numel(thisCharacter); %number of pixel in the entire image
%NoOfNonZeroElement=nnz(binaryimage);
numberOfOnes =sum(thisCharacter(:)); %number/sum of true/1 value
numberOfZeros =sum(~thisCharacter(:));
%numberOfZeros = numberOfPixel - numberOfOnes;
ratioOfZeroAndOne = numberOfOnes/numberOfZeros;
[theLength, width] = size(thisCharacter);%calculation of height and width of binary image
ratioOfSize = theLength/width;
caption = sprintf('Line #%d, Character #%d', k, k2);
title(h5, caption, 'FontSize', fontSize);
pause(0.9);
%How can I show below output on figure ?
imshow(thisCharacter, 'Parent', h6);
caption =sprintf('ratioOfZeroAndOne #%d, ratioOfSize #%d', ratioOfZeroAndOne, ratioOfSize);
title(h6, caption);
end
end
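The column-splitting idea used in the code above can be checked on a tiny synthetic band. This is a simplified sketch, not the code above: it cuts exactly at the first and last ink column of each run instead of at the centroids of the blank gaps, it has no minimum gap width, and the image and variable names (hasInk, startColumns, endColumns) are made up for illustration.

```matlab
% Synthetic stand-in for one band of text: white page, three black "characters".
thisBand = true(20, 60);
thisBand(5:15, 5:15) = false;
thisBand(5:15, 25:35) = false;
thisBand(5:15, 45:55) = false;

% Horizontal profile: number of black pixels in each column.
horizontalProfile = sum(~thisBand, 1);
hasInk = horizontalProfile > 0;

% A run of ink columns starts where hasInk rises and ends where it falls.
edges = diff([0, double(hasInk), 0]);
startColumns = find(edges == 1);
endColumns = find(edges == -1) - 1;

% Crop each run into its own image.
characters = cell(1, numel(startColumns));
for k = 1 : numel(startColumns)
    characters{k} = thisBand(:, startColumns(k) : endColumns(k));
end
```

Each cell of characters then holds one character with no blank columns on either side, so its width is exactly the ink width.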
  4 Comments
Image Analyst on 4 Mar 2020
Increase the number of subplots, for example to a 3-by-3 grid:
h9 = subplot(3, 3, 9);
I also think you should be able to use text(x, y, str) to place text in the current axes wherever you want, or annotation('textbox', ...) to place it on the figure itself.
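A small sketch of how the two ratios could be shown on the figure. The 4-by-3 test character and the textbox position are hypothetical values chosen for illustration. The ratios are non-integer, so the format uses %.4f; with %d, sprintf falls back to exponential notation for non-integer values, which is probably why the earlier caption looked wrong.

```matlab
% Hypothetical 4-by-3 binary character crop, for illustration only.
thisCharacter = logical([1 1 0; 1 0 0; 1 1 1; 1 1 1]);

numberOfOnes = nnz(thisCharacter);       % Count of true pixels.
numberOfZeros = nnz(~thisCharacter);     % Count of false pixels.
ratioOfZeroAndOne = numberOfOnes / numberOfZeros;   % Same definition as in the posted code.
[charRows, charCols] = size(thisCharacter);
ratioOfSize = charRows / charCols;

% Use a floating point format, not %d, because the ratios are not integers.
caption = sprintf('ratioOfZeroAndOne = %.4f, ratioOfSize = %.4f', ratioOfZeroAndOne, ratioOfSize);

% Show the character with the caption as the axes title.
figure;
hAx = subplot(3, 3, 9);
imshow(thisCharacter, 'Parent', hAx);
title(hAx, caption, 'Interpreter', 'none');

% Alternative: put the same text on the figure itself, in normalized figure units.
annotation('textbox', [0.05, 0.01, 0.9, 0.05], 'String', caption, ...
    'EdgeColor', 'none', 'Interpreter', 'none');
```

Here the caption reads "ratioOfZeroAndOne = 3.0000, ratioOfSize = 1.3333", since the crop has 9 ones, 3 zeros, 4 rows and 3 columns.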



Rubina Easmin on 6 Mar 2020
Edited: DGM on 11 Feb 2023
I want to find the ratio of zeros and ones and the ratio of the image's height and width. My code is given below. It works well in my previous program, but when I use it in the code you provided I get different values. Does the problem arise because the image is cropped inappropriately? The width and height of your cropped images are bigger than mine, which is why the number of ones increases in your cropped image. How can I solve this problem? You used fontSize = 15 in your source code. Does the font size affect the image size?
numberOfPixel =numel(thisCharacter); %number of pixel in the entire image
numberOfOnes =sum(thisCharacter(:)); %number/sum of true/1 value
numberOfZeros =sum(~thisCharacter(:));
ratioOfZeroAndOne = numberOfOnes/numberOfZeros;
[theLength, width] = size(thisCharacter);%calculation of height and width of binary image
ratioOfSize = theLength/width;
imshow(thisCharacter, 'Parent', h9);
caption =sprintf('ratioOfZeroAndOne %.4f, ratioOfSize %.4f', ratioOfZeroAndOne, ratioOfSize);
title(h9, caption, 'FontSize', fontSize);
pause(0.9);
  1 Comment
Image Analyst on 7 Mar 2020
The fontSize only affects the caption/title which is above the image. It has nothing at all to do with the character/image itself. No effect on aspect ratio (rows/columns) or area fraction of bright or dark pixels.



Rubina Easmin on 9 Mar 2020
Okay, but what is the reason for the different values? The width of your cropped image is bigger than that of my previous cropped image, which is why the number of ones increases. How can I solve this problem?
  6 Comments
Rubina Easmin on 9 Mar 2020
Exactly, sir, that is not your code. The code in the screenshots was my previous code. The last two screenshots are from your code, and I took the Workspace panel screenshot from your code.
Everything is okay with your source code. My question arose when I added some code to yours to get the aspect ratio. Because of the width value, I did not get the right value from your code. For the character 'e' I get a width of 24 in my code but 25 in yours. That is why the number of ones increases and the ratio of zeros and ones varies.
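A one-column difference like 24 versus 25 is often an inclusive versus exclusive range question, although nothing in this thread confirms that this is the cause here. The sketch below uses a made-up crop to show how the two conventions differ by exactly one column, and that a tight crop with col1:col2 has the inclusive width.

```matlab
% Hypothetical crop: white background with black ink in columns 4 through 27.
thisCharacter = true(10, 30);
thisCharacter(3:8, 4:27) = false;

% Tight bounding box of the black pixels.
[cRows, cCols] = find(~thisCharacter);
row1 = min(cRows);
row2 = max(cRows);
col1 = min(cCols);
col2 = max(cCols);

widthInclusive = col2 - col1 + 1;   % Counts both end columns.
widthExclusive = col2 - col1;       % One column fewer.

% Indexing with col1:col2 is inclusive, so the crop has the inclusive width.
tightCrop = thisCharacter(row1:row2, col1:col2);
[cropRows, cropCols] = size(tightCrop);
```

Comparing size(thisCharacter) from both programs for the same character, right after the crop, would show which program keeps an extra row or column.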



Rubina Easmin on 30 Sep 2021
How can I properly segment isolated printed Bangla characters in a noisy image captured by a mobile phone?
  3 Comments
Rubina Easmin on 1 Oct 2021
I am unable to understand this code, and I did not get my desired output with it.
I want to segment isolated printed Bangla characters in a noisy image captured by a mobile phone.
I segmented the noise-free image of isolated printed Bangla characters using the code above on this page. But with that code I did not get proper output for a noisy image captured by a mobile phone.
Rubina Easmin on 1 Oct 2021
I used this code, but it does not work properly for a noisy image captured by a mobile phone.



Image Analyst on 1 Oct 2021
@Rubina Easmin, Try this. If this helps you, please "Accept the answer".
% Demo by Image Analyst
clc; % Clear the command window.
close all; % Close all figures (except those of imtool.)
clear; % Erase all existing variables. Or clearvars if you want.
workspace; % Make sure the workspace panel is showing.
format long g;
format compact;
fontSize = 20;
markerSize = 40;
%--------------------------------------------------------------------------------------------------------
% READ IN IMAGE
fileName = 'Bengali characters.jpeg';
grayImage = imread(fileName);
% Get the dimensions of the image.
% numberOfColorChannels should be = 1 for a gray scale image, and 3 for an RGB color image.
[rows, columns, numberOfColorChannels] = size(grayImage)
if numberOfColorChannels > 1
% It's not really gray scale like we expected - it's color.
% Extract the red channel (so the magenta lines will be white).
grayImage = grayImage(:, :, 1);
end
%--------------------------------------------------------------------------------------------------------
% Display the image.
subplot(2, 2, 1);
imshow(grayImage, []);
impixelinfo;
axis('on', 'image');
title('Original Image', 'FontSize', fontSize, 'Interpreter', 'None');
hold on
drawnow;
grayImage(grayImage == 1) = 255;
% Maximize window.
g = gcf;
g.WindowState = 'maximized'
drawnow;
lowThreshold = 0;
highThreshold = 125;
% Interactively threshold with Image Analyst's interactive thresholding utility.
% https://www.mathworks.com/matlabcentral/fileexchange/29372-thresholding-an-image?s_tid=srchtitle
%[lowThreshold, highThreshold] = threshold(lowThreshold, highThreshold, grayImage);
mask = grayImage >= lowThreshold & grayImage <= highThreshold;
% Display the image.
subplot(2, 2, 2);
imshow(mask, []);
impixelinfo;
axis('on', 'image');
title('Initial Mask Image', 'FontSize', fontSize, 'Interpreter', 'None');
% Merge the parts into a single blob.
se = strel('disk', 41, 0);
mask = imclose(mask, se);
% Display the image.
subplot(2, 2, 4);
imshow(mask, []);
impixelinfo;
axis('on', 'image');
title('Final Mask Image', 'FontSize', fontSize, 'Interpreter', 'None');
% Find out areas in initial mask to find outliers.
props = regionprops(mask, 'Area');
allAreas = sort([props.Area])
subplot(2, 2, 3);
histogram(allAreas);
grid on;
title('Histogram of Blob Areas', 'FontSize', fontSize, 'Interpreter', 'None');
xlabel('Blob Area', 'FontSize', fontSize, 'Interpreter', 'None');
ylabel('Count', 'FontSize', fontSize, 'Interpreter', 'None');
% Take blobs only in a certain range.
mask = bwareafilt(mask, [1000, 20000]);
% Fill holes
mask = imfill(mask, 'holes');
% Display the image.
subplot(2, 2, 4);
imshow(mask, []);
impixelinfo;
axis('on', 'image');
title('Final Mask Image', 'FontSize', fontSize, 'Interpreter', 'None');
% Make measurements.
props = regionprops(mask, 'BoundingBox', 'Area');
allBB = vertcat(props.BoundingBox);
hold on;
for k = 1 : length(props)
rectangle('Position', props(k).BoundingBox, 'EdgeColor', 'r', 'LineWidth', 2);
end
% Find out areas in final mask.
props = regionprops(mask, 'Area');
allAreas = sort([props.Area])
subplot(2, 2, 3);
histogram(allAreas);
grid on;
title('Histogram of Blob Areas', 'FontSize', fontSize, 'Interpreter', 'None');
xlabel('Blob Area', 'FontSize', fontSize, 'Interpreter', 'None');
ylabel('Count', 'FontSize', fontSize, 'Interpreter', 'None');
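To see what the closing and area filter steps in the code above do, here is a reduced sketch on a synthetic mask. The blob layout, the disk radius of 5 and the area range [100, Inf] are made-up values for illustration. On a real image the radius must be at least as large as the gaps inside a character but smaller than the gaps between characters.

```matlab
% Synthetic mask: one character split into two parts, a second character, and a noise speck.
mask = false(80, 160);
mask(20:60, 20:35) = true;       % Left part of the first character.
mask(20:60, 40:55) = true;       % Right part, separated by a 4-pixel gap.
mask(30:50, 110:140) = true;     % Second, well separated character.
mask(5, 150) = true;             % One-pixel noise speck.

ccBefore = bwconncomp(mask);     % Counts the two parts and the speck separately.

% Closing bridges gaps narrower than the structuring element.
se = strel('disk', 5, 0);
closedMask = imclose(mask, se);

% Keep only blobs with a plausible character area (assumed range).
cleanMask = bwareafilt(closedMask, [100, Inf]);

ccAfter = bwconncomp(cleanMask);
```

In this sketch ccBefore.NumObjects is 4 and ccAfter.NumObjects is 2: the split character has been merged into one blob and the speck has been dropped.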

Rubina Easmin on 2 Oct 2021
Thanks a lot for your kind help. Please see the second image below. I want to segment the isolated printed Bangla characters from this noisy image like that.
Figure: Bengali characters (noisy image)
Figure: Noise-free image, segmented and cropped.
  3 Comments
Image Analyst on 3 Oct 2021
When I use my code on this image, I get the same thing:
Every character is found. Not sure what else you're wanting. Obviously I didn't go any further and identify which character is inside the box -- I assume you can do that because I don't know that character set. Maybe you can use deep learning to identify the characters. But I don't see what's wrong with my code. Perhaps you attached the wrong second image.
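If the goal is one cropped image per box, the bounding boxes from regionprops can be turned into index ranges. This is a minimal sketch with a synthetic mask and gray image standing in for the real ones; characterImages is a hypothetical variable name. The BoundingBox corner sits half a pixel outside the first pixel, hence the ceil.

```matlab
% Synthetic stand-ins for the final mask and the gray scale image.
mask = false(60, 120);
mask(10:40, 10:30) = true;       % First blob: 31 rows by 21 columns.
mask(15:50, 60:95) = true;       % Second blob: 36 rows by 36 columns.
grayImage = uint8(255 * ~mask);  % Dark characters on a white page.

props = regionprops(mask, 'BoundingBox');
characterImages = cell(1, numel(props));
for k = 1 : numel(props)
    bb = props(k).BoundingBox;   % [xLeft, yTop, width, height], corner at a half-pixel position.
    col1 = ceil(bb(1));
    row1 = ceil(bb(2));
    col2 = col1 + bb(3) - 1;
    row2 = row1 + bb(4) - 1;
    characterImages{k} = grayImage(row1:row2, col1:col2);
end
```

Each cell can then be saved with imwrite or passed to whatever classifier is used to identify the character.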


Release

R2017a
