Word segmentation based on projection histogram?

Hi all,
I am currently working on an OCR project, and I am stuck at word segmentation. The basic algorithm works on the horizontal projection of a segmented line: I look for the space between a falling edge and the next rising edge. The problem is that I cannot differentiate between word spaces and character spaces; that is, I cannot automatically find a proper threshold for cropping out a word. Any help would be appreciated. Thank you, guys. By the way, how could I contact Mr. Image Analyst directly, please? Here is my work so far:
% Read in an image
close all; clear all;
I = imread('C:\Users\Nguyen Duy Hien\Desktop\bible.jpg');
% Convert to grayscale
I = rgb2gray(I);
% Otsu threshold level
level = graythresh(I);
% Binarization
BW = im2bw(I,level);
%BW = imadjust(I);
% Smooth the image
h = fspecial('gaussian',[3 1],0.8);
BW = imfilter(BW,h);
% Invert so that text pixels are 1 (assuming dark text on a light page)
BW = ~BW;
% Trace the character outlines, then fill them to get solid blobs
BWedge = edge(uint8(BW));
BW = imfill(BWedge,'holes');
figure(1),imshow(BW)
%--- Line segmentation
pV = sum(BW,1);   % column sums (profile along x)
pH = sum(BW,2);   % row sums (profile along y)
figure(2),plot(pH)
figure(3),plot(pV)
lines = pH > 0;
% Detect rising and falling edges of the row profile.
% Zero-padding catches text that touches the top or bottom border.
d = diff([0; lines; 0]);
startingRows = find(d>0);     % first row of each text line
endingRows = find(d<0) - 1;   % last row of each text line
n = length(startingRows);
subImage = cell(n,1);
pHline = cell(n,1);
wordarr = {};
y = [];
count = 1;
for k = 1 : n
    subImage{k} = BW(startingRows(k):endingRows(k),:);
    figure(4)
    subplot(n,1,k),imshow(subImage{k})
    % Column profile of the current text line
    pHline{k} = sum(subImage{k},1);
    figure(5)
    subplot(n,1,k),plot(pHline{k})
    lineN = pHline{k} > 0;
    a = diff([0, lineN, 0]);
    startingCols = find(a>0);     % first column of each blob
    endingCols = find(a<0) - 1;   % last column of each blob
    buf_end = [];
    buf_start = startingCols(1);
    m = length(startingCols)-1;
    space = zeros(1,m);           % blank columns between adjacent blobs
    for j = 1 : m
        space(j) = startingCols(j+1) - endingCols(j) - 1;
        y = [y, max(space(1:j))];
        % Attempted word/character threshold: this is the step that
        % does not work reliably
        if min(y)<space(j) && max(y)>space(j)
            buf_end = [buf_end; endingCols(j)];
            buf_start = [buf_start; startingCols(j+1)];
        end
    end
    buf_end = [buf_end; endingCols(end)];
    o = length(buf_end);
    for i = 1 : o
        wordarr{count} = subImage{k}(:,buf_start(i):buf_end(i));
        figure, imshow(wordarr{count})
        %figure(6), subplot(o,n,count),imshow(wordarr{count})
        count = count+1;
    end
end
  2 Comments
Walter Roberson on 4 Aug 2015
Image Analyst does not wish to be contacted privately. He responds to some posts, if it amuses him to do so.
sayar chit on 14 Nov 2017
Hi Sir! I am studying image segmentation of printed documents. Line segmentation and word segmentation work well, but I cannot get character segmentation from the words. Can anyone help me? These are my words as inputs. I want to get the characters as follows: မ,ိ,ှု,င,်,း,တ,ိ,ု,က,်,၍


Answers (3)

Nguyen Hien on 4 Aug 2015
Thank you guys so much for your help. Fortunately, I have figured out the solution.

Image Analyst on 4 Aug 2015
You just did contact me directly - as direct as it gets. Sorry, I don't do private consulting; besides, OCR is not even my field. I'd refer you to the Computer Vision System Toolbox or, if that doesn't work, to Vision Bib: http://www.visionbib.com/bibliography/contentschar.html#OCR,%20Document%20Analysis%20and%20Character%20Recognition%20Systems Also, you didn't attach your image, so we can't try your code, and I couldn't detect problems like yours just by reading the code and imagining what it would do with an image. Sorry, but if it's major algorithm development, we just don't have the time for that here. If it's something quick, like a few minutes to correct syntax or logic flow, then maybe we can help.

Walter Roberson on 4 Aug 2015
There is no fixed number of pixels that can be used to define the difference between spacing between characters and spacing between words. Some languages do not have spacing between words. And the spacing between characters on a very large sign could be larger than the total length of a word on a smaller sign.
You need to examine the relative distance between centroids, perhaps as compared to the width of the blobs.
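As a rough illustration of the relative-spacing idea, here is a minimal sketch that works on the column profile of a single text line. It is a one-dimensional simplification: it compares the blank gap between adjacent blobs with the median blob width, rather than measuring centroid distances. The synthetic profile, the variable names, and the factor 0.6 are all assumptions made for the example, not values from the original post, and the factor would need tuning for a real font or script.

```matlab
% Synthetic column profile of one text line (1 = ink, 0 = blank).
% Hypothetical data: three "words" of 3, 2 and 3 characters.
ch = ones(1, 6);      % a 6-pixel-wide character
g  = zeros(1, 2);     % inter-character gap (2 px)
G  = zeros(1, 9);     % inter-word gap (9 px)
colProfile = [ch g ch g ch G ch g ch G ch g ch g ch];

% Find ink runs; zero-padding catches runs that touch the border.
d = diff([0, colProfile > 0, 0]);
runStart = find(d > 0);        % first ink column of each blob
runEnd   = find(d < 0) - 1;    % last ink column of each blob

blobWidth = runEnd - runStart + 1;
gapWidth  = runStart(2:end) - runEnd(1:end-1) - 1;

% Assumed rule: a gap counts as a word space when it exceeds a fraction
% of the median blob width. The factor 0.6 is an illustrative guess.
ratio = 0.6;
isWordGap = gapWidth > ratio * median(blobWidth);

% A word starts at the first blob and after every word gap;
% it ends before every word gap and at the last blob.
wordStart = runStart([true, isWordGap]);
wordEnd   = runEnd([isWordGap, true]);
```

Because the threshold scales with the blob width measured on the same line, the same rule applies to large and small text without a fixed pixel count. It would still fail for scripts that do not put spaces between words.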
