Thread Subject:
middle letter extraction

Subject: middle letter extraction

From: Mousmi Chaurasia

Date: 5 Apr, 2011 15:46:08

Message: 1 of 18

Hello,
I am using textread to read a file.
Suppose xy.txt contains: matlab newsgroup is good

a = textread('xy.txt','%s');
Now I want to extract every pair of adjacent medial letters from each word.
I want: at tl la ---- from matlab
ew ws sg gr ro ou ----- from newsgroup
oo ------ from good
("is" has no medial letters)

I am using if statements and for loops, but the execution time is very long, so I am looking for another approach.
Thanks a lot; waiting for a response.
 

Subject: middle letter extraction

From: ImageAnalyst

Date: 5 Apr, 2011 17:02:07

Message: 2 of 18

% First download allwords.
% http://www.mathworks.com/matlabcentral/fileexchange/27184-allwords
% Then use it.
a = 'matlab newsgroup is good'
% parse out into separate words in a cell array.
words = allwords(a)
numberOfWords = length(words);
for word = 1 : numberOfWords
thisWord = words{word};
fprintf('\nCurrent word = %s\n', thisWord);
lengthOfThisWord = length(thisWord);
if lengthOfThisWord >= 4
for k = 2 : lengthOfThisWord-2
letters = thisWord(k:k+1);
fprintf(' Letters = %s\n', letters);
end
else
fprintf(' "%s" has no medial letters\n', thisWord);
end
end

Results:

a =
matlab newsgroup is good
words =
    'matlab' 'newsgroup' 'is' 'good'

Current word = matlab
   Letters = at
   Letters = tl
   Letters = la

Current word = newsgroup
   Letters = ew
   Letters = ws
   Letters = sg
   Letters = gr
   Letters = ro
   Letters = ou

Current word = is
   "is" has no medial letters

Current word = good
   Letters = oo

Subject: middle letter extraction

From: Florin Neacsu

Date: 5 Apr, 2011 17:05:08

Message: 3 of 18

"Mousmi Chaurasia" wrote in message <infdfv$4rm$1@fred.mathworks.com>...
> Hello,
> I m using textread to read the file.
> let xy = matlab newsgroup is good
>
> a= textread('xy.txt','%s');
> Now, I want extract 2 alphabets of medial from all words.
> I want: at tl la ---- from matlab
> ew ws sg gr ro ou ----- from newsgroup
> oo ------ good
> ( "is" has no medial letters)
>
> I m using if & for loop but excution time is very huge. So, somethg else
> Thanks alot n waiting for response
>

Hi,

The following code is most likely overkill (it still uses a for loop, the Image Processing Toolbox, and a FEX submission :) ), but it was fun to think about. Please do test whether it is faster (N.B. you said your execution time is huge; could you post your code?). It might also give you more flexibility (if you need other kinds of letter combinations).
Here it goes:
-----------------------------------------------
t='matlab newsgroup is good';
temp=ismember(t,' '); % find the word delimiters
se=strel('line',3,0); temp=imdilate(temp,se); % dilate the delimiters so the first and last letter of each word are masked out
temp(1)=1;temp(end)=1; % mask the first letter of the first word and the last letter of the last word
[L,num]=bwlabel(~temp); % label connected components (here, the interior of each word)
for i=1:num
    temp=circulant(t(L==i),-1); % build the circulant (circular-shift) matrix of the interior letters
    temp(1:end-1,1:2) % keep only the 2-letter combinations
end
------------------------------------------------
circulant is a very useful piece of code (http://www.mathworks.com/matlabcentral/fileexchange/22876-circulant-v2-0-feb-2009).
Hope it helps,
Florin

Subject: middle letter extraction

From: Mousmi Chaurasia

Date: 5 Apr, 2011 18:14:05

Message: 4 of 18


> t='matlab newsgroup is good';
> temp=ismember(t,' '); %looking for words delimiters
> se=strel('line',3,0); temp=imdilate(temp,se); %dilate the words delimiters, so we can ignore first and last letters
> temp(1)=1;temp(end)=1; %ignorinf first letter of first word and last letter of last word
> [L,num]=bwlabel(~temp); %labeling connex components (here words)
> for i=1:num
> temp=circulant(t(L==i),-1); %create circular permutation matrix
> temp(1:end-1,1:2) %we are interested in only combination of 2 letters
> end
> ------------------------------------------------
> circulant is a very useful piece of code (http://www.mathworks.com/matlabcentral/fileexchange/22876-circulant-v2-0-feb-2009).
> Hope it helps,
> Florin

Hello,

A = lower(textread('101.txt', '%s')); % each word is stored in an m-by-1 cell array
i=1:length(A);
g=[];
for i=1:length(A)
    L=length(A{i});
    if L >=2 % for 2 letters
        A1=A(i,:);
        for k=2:L-2
            g = [g;(A{i}(k:k+1))];
            C = cellstr(g); % convert to a cell array, since 'g' is a char array
        end
    end
end

Thanks, your code works, but only when the sentence is defined in the script. It does not read the text file (neither with dataread nor with textread). I have 15-20K words, so I need to read them from a text file. Then I can check whether the execution time is still huge or not.

Thanks a lot; waiting for a response.

Subject: middle letter extraction

From: Mousmi Chaurasia

Date: 5 Apr, 2011 18:24:04

Message: 5 of 18

ImageAnalyst <imageanalyst@mailinator.com> wrote in message <3df49052-000c-41fc-b1b9-8325ba812f0e@o20g2000yqk.googlegroups.com>...
> % First download allwords.
> % http://www.mathworks.com/matlabcentral/fileexchange/27184-allwords
> % Then use it.
> a = 'matlab newsgroup is good'
> % parse out into separate words in a cell array.
> words = allwords(a)
> numberOfWords = length(words);
> for word = 1 : numberOfWords
> thisWord = words{word};
> fprintf('\nCurrent word = %s\n', thisWord);
> lengthOfThisWord = length(thisWord);
> if lengthOfThisWord >= 4
> for k = 2 : lengthOfThisWord-2
> letters = thisWord(k:k+1);
> fprintf(' Letters = %s\n', letters);
> end
> else
> fprintf(' "%s" has no medial letters\n', thisWord);
> end
> end
Hi!!

Thanks a lot. I wrote the same kind of implementation, but it takes a huge execution time. The other thing is that your code does not read the text file (with either dataread or textread). It produces an error in allwords.m, line 94 {{ elseif all((round(str) == str) | isnan(str)) }}. I guess your code will also take a huge execution time for 20K words, as mine does.

Please suggest STRTOK, cellfun, or any other implementation.

Thank you very much; waiting for a response.

Subject: middle letter extraction

From: Florin Neacsu

Date: 5 Apr, 2011 18:37:19

Message: 6 of 18

"Mousmi Chaurasia" wrote in message <infm5d$7v3$1@fred.mathworks.com>...
>
> > t='matlab newsgroup is good';
> > temp=ismember(t,' '); %looking for words delimiters
> > se=strel('line',3,0); temp=imdilate(temp,se); %dilate the words delimiters, so we can ignore first and last letters
> > temp(1)=1;temp(end)=1; %ignorinf first letter of first word and last letter of last word
> > [L,num]=bwlabel(~temp); %labeling connex components (here words)
> > for i=1:num
> > temp=circulant(t(L==i),-1); %create circular permutation matrix
> > temp(1:end-1,1:2) %we are interested in only combination of 2 letters
> > end
> > ------------------------------------------------
> > circulant is a very useful piece of code (http://www.mathworks.com/matlabcentral/fileexchange/22876-circulant-v2-0-feb-2009).
> > Hope it helps,
> > Florin
>
> Hello,
>
> A = lower(textread('101.txt', '%s')); % stores each word in cell array mx1 cell array
> i=1:length(A);
> g=[];
> for i=1:length(A)
> L=length(A{i});
> if L >=2 % for 2 alphabets
> A1=A(i,:);
> for k=2:L-2
> g = [g;(A{i}(k:k+1))];
> C = cellstr(g); % convert char array to cell array since 'g' is char array
> end
>
> Thanks ur code is working but wen thsentence is defined. It is not reading the text (neither dataread nor textread). I have a 15-20K words. So, required to read the text file. Then I can check whether execution time is still huge or lesser??
>
> Thanks alot alot & waiting for response..


Well, we assumed you already had done the reading part ...
Try this

-------------------------
fid = fopen('foo.txt','r');
while 1
    t = fgetl(fid);
    if ~ischar(t), break, end

    temp=ismember(t,' ');
    se=strel('line',3,0);
    temp=imdilate(temp,se);
    temp(1)=1;temp(end)=1;
    [L,num]=bwlabel(~temp);
    for i=1:num
        temp=circulant(t(L==i),-1);
        temp(1:end-1,1:2)
        fprintf(1,'\n');
    end
end
fclose(fid);
-----------------------------------------------

Subject: middle letter extraction

From: dpb

Date: 5 Apr, 2011 18:45:03

Message: 7 of 18

On 4/5/2011 1:24 PM, Mousmi Chaurasia wrote:
...

> Please suggest STRTOK or cellfun or any other implementation.
> Thank-you very much & waiting for response..

You might try something along the following lines and see how it works...

 >> s='matlab';
 >> idx=2:length(s)-2;
 >> jdx=[idx' idx'+1];
 >> s(jdx)
ans =
at
tl
la
 >>
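
The same index-pair idea could be applied to a whole cell array of words, as in the sketch below. It assumes the words are already in a cell array like the one textread(...,'%s') returns; the names words, medialPairs, pairsPerWord and allPairs are made up for this illustration, and it has not been timed against the loop versions in this thread.

```matlab
% Stand-in for the cell array returned by textread(...,'%s').
words = {'matlab'; 'newsgroup'; 'is'; 'good'};

% For a word s, the index rows [k, k+1] with k = 2 .. length(s)-2 select
% the medial pairs. reshape(..., [], 2) keeps the result N-by-2 even when
% N is 0 (words shorter than 4 letters) or 1.
medialPairs = @(s) reshape(s([(2:length(s)-2)', (3:length(s)-1)']), [], 2);

% One N-by-2 char array per word, then stack them and convert once.
pairsPerWord = cellfun(medialPairs, words, 'UniformOutput', false);
allPairs = cellstr(vertcat(pairsPerWord{:}));

disp(allPairs')
```

Words with fewer than four letters simply contribute an empty 0-by-2 block, so no special case is needed for "is" or for 3-letter words.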

--

Subject: middle letter extraction

From: Mousmi Chaurasia

Date: 5 Apr, 2011 19:03:05

Message: 8 of 18

Hello, thanks for the quick response.

It shows the error "Index exceeds matrix dimensions" at the
line: temp(1:end-1,1:2)

Thanks a lot.
Waiting for a quick response.

Subject: middle letter extraction

From: Florin Neacsu

Date: 5 Apr, 2011 19:11:05

Message: 9 of 18

"Mousmi Chaurasia" wrote in message <infp19$rt1$1@fred.mathworks.com>...
> Hello, Thanks for quick response.
>
> Showing an error " Index exceeds matrix dimensions" in
> line :- temp(1:end-1,1:2)
>
> Thanks alot..
> waiting for quick response..

Indeed, this happens for 3-letter words. What do you want to do with a 3-letter word?

Florin

Subject: middle letter extraction

From: ImageAnalyst

Date: 5 Apr, 2011 19:28:08

Message: 10 of 18

On Apr 5, 2:24 pm, "Mousmi Chaurasia" <a...@gmail.com> wrote:
> ImageAnalyst <imageanal...@mailinator.com> wrote in message <3df49052-000c-41fc-b1b9-8325ba812...@o20g2000yqk.googlegroups.com>...
> > % First download allwords.
> > %http://www.mathworks.com/matlabcentral/fileexchange/27184-allwords
> > % Then use it.
> > a = 'matlab newsgroup is good'
> > % parse out into separate words in a cell array.
> > words = allwords(a)
> > numberOfWords = length(words);
> > for word = 1 : numberOfWords
> >    thisWord = words{word};
> >    fprintf('\nCurrent word = %s\n', thisWord);
> >    lengthOfThisWord = length(thisWord);
> >    if lengthOfThisWord >= 4
> >            for k = 2 : lengthOfThisWord-2
> >                    letters = thisWord(k:k+1);
> >                    fprintf('   Letters = %s\n', letters);
> >            end
> >    else
> >            fprintf('   "%s" has no medial letters\n', thisWord);
> >    end
> > end
>
>  Hi!!
>
> Thanks alot. The same kind of implementation I did but its taking huge execution time & other thg ur code is not reading the text file( dataread or textread). It produces an error in allwords.m line 94 {{ elseif all((round(str) == str) | isnan(str)) }}. I guess, ur code will also take huge execution time for 20K words as it is with my code.
>
> Please suggest STRTOK or cellfun  or any other implementation.
>
> Thank-you very much & waiting for response..

---------------------------------------------------------------
Of course my code doesn't read in your file. Everyone is assuming you
know how to do that part. We're assuming you already have the "a"
variable.

What do you mean when you say "the same kind of implementation"? Does
that mean the exact same code I posted? Did you download allwords?
Because on my computer it's faster than a blink of an eye -- only
0.001 seconds. Exactly how long is your "a" string? Is it tens of
megabytes? Or is it like your example?
ImageAnalyst

Subject: middle letter extraction

From: Mousmi Chaurasia

Date: 5 Apr, 2011 19:30:21

Message: 11 of 18

"Florin Neacsu" <fneacsu2@gmail.com> wrote in message <infpg9$6of$1@fred.mathworks.com>...
> "Mousmi Chaurasia" wrote in message <infp19$rt1$1@fred.mathworks.com>...
> > Hello, Thanks for quick response.
> >
> > Showing an error " Index exceeds matrix dimensions" in
> > line :- temp(1:end-1,1:2)
> >
> > Thanks alot..
> > waiting for quick response..
>
> Indeed, this happens for 3 letter words. What do you want to do with a 3 letter word ?
>
> Florin

This error occurs while extracting 2 letters from each word in the file. The file contains words of various lengths (e.g. can, could, meteorological, and all other sizes).

Subject: middle letter extraction

From: Mousmi Chaurasia

Date: 5 Apr, 2011 19:47:04

Message: 12 of 18

> ---------------------------------------------------------------
> Of course my code doesn't read in your file. Everyone is assuming you
> know how to do that part. We're assuming you already have the "a"
> variable.
>
> What do you mean when you say "the same kind of implementation"? Does
> that mean the exact same code I posted? Did you download allwords?
> Because on my computer it's faster than a blink of an eye -- only
> 0.001 seconds. Exactly how long is your "a" string? Is it tens of
> megabytes? Or is it like your example?
> ImageAnalyst

Thanks for the response. I know how to read the file; I had already used dataread or textread to read the text file, e.g. a = textread('xx.txt','%s');, before posting, but it gives the same error again after retrying. Which command are you using to read the text file?

Thanks a lot; waiting for a response.

Subject: middle letter extraction

From: Mousmi Chaurasia

Date: 5 Apr, 2011 19:50:18

Message: 13 of 18

Hello,
Sorry, I did not mention that I had already downloaded allwords at the start.

Thanks; waiting for your reply.

Subject: middle letter extraction

From: Florin Neacsu

Date: 5 Apr, 2011 20:20:20

Message: 14 of 18

"Mousmi Chaurasia" wrote in message <infqkd$pou$1@fred.mathworks.com>...
> "Florin Neacsu" <fneacsu2@gmail.com> wrote in message <infpg9$6of$1@fred.mathworks.com>...
> > "Mousmi Chaurasia" wrote in message <infp19$rt1$1@fred.mathworks.com>...
> > > Hello, Thanks for quick response.
> > >
> > > Showing an error " Index exceeds matrix dimensions" in
> > > line :- temp(1:end-1,1:2)
> > >
> > > Thanks alot..
> > > waiting for quick response..
> >
> > Indeed, this happens for 3 letter words. What do you want to do with a 3 letter word ?
> >
> > Florin
>
> This error is coming while extracting 2 letters from each word in file. File contains various size of words (e.g. can could meterological and all other sizes)


Hi,

the code I suggested deals with words of length = 1,2,4,5,6,7 ... or N\{3} if you like that notation better.
The only case not handled is a word of length 3. That is because you did not mention what you need in that case.

Florin

Subject: middle letter extraction

From: Florin Neacsu

Date: 5 Apr, 2011 22:33:04

Message: 15 of 18

"Mousmi Chaurasia" wrote in message <infm5d$7v3$1@fred.mathworks.com>...
>
> > t='matlab newsgroup is good';
> > temp=ismember(t,' '); %looking for words delimiters
> > se=strel('line',3,0); temp=imdilate(temp,se); %dilate the words delimiters, so we can ignore first and last letters
> > temp(1)=1;temp(end)=1; %ignorinf first letter of first word and last letter of last word
> > [L,num]=bwlabel(~temp); %labeling connex components (here words)
> > for i=1:num
> > temp=circulant(t(L==i),-1); %create circular permutation matrix
> > temp(1:end-1,1:2) %we are interested in only combination of 2 letters
> > end
> > ------------------------------------------------
> > circulant is a very useful piece of code (http://www.mathworks.com/matlabcentral/fileexchange/22876-circulant-v2-0-feb-2009).
> > Hope it helps,
> > Florin
>
> Hello,
>
> A = lower(textread('101.txt', '%s')); % stores each word in cell array mx1 cell array
> i=1:length(A);
> g=[];
> for i=1:length(A)
> L=length(A{i});
> if L >=2 % for 2 alphabets
> A1=A(i,:);
> for k=2:L-2
> g = [g;(A{i}(k:k+1))];
> C = cellstr(g); % convert char array to cell array since 'g' is char array
> end
>
> Thanks ur code is working but wen thsentence is defined. It is not reading the text (neither dataread nor textread). I have a 15-20K words. So, required to read the text file. Then I can check whether execution time is still huge or lesser??
>
> Thanks alot alot & waiting for response..


Hi,

this is a line from the code you posted along with your reading part:
> g = [g;(A{i}(k:k+1))];

Are you actually doing this?! Have you used the profiler (profile) to see why your code is so slow? That line might be the cause, rather than the way you generate the 2-letter words.
Pre-allocation is very important for MATLAB's memory handling. Do not grow an array inside a for loop!
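
One way the loop could be restructured around pre-allocation is sketched here. It is only an illustration under assumptions: A stands in for the cell array returned by textread(...,'%s'), and the two-pass layout (count first, then fill) and the names nPairs and row are choices made for this example, not code from the thread.

```matlab
% Stand-in for the cell array returned by textread(...,'%s').
A = {'matlab'; 'newsgroup'; 'is'; 'good'};

% Pass 1: count the pairs. A word of length L yields max(L-3, 0) pairs.
nPairs = max(cellfun(@numel, A) - 3, 0);
g = repmat(' ', sum(nPairs), 2);   % char array allocated once, filled below

% Pass 2: write each pair into its row instead of growing g.
row = 0;
for i = 1:numel(A)
    w = A{i};
    for k = 2:length(w)-2
        row = row + 1;
        g(row, :) = w(k:k+1);
    end
end

C = cellstr(g);                    % convert once, after the loops
disp(C')
```

Calling cellstr a single time after the loops also avoids rebuilding the cell array on every iteration.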

Florin

Subject: middle letter extraction

From: ImageAnalyst

Date: 5 Apr, 2011 23:54:11

Message: 16 of 18

On Apr 5, 3:47 pm, "Mousmi Chaurasia" <a...@gmail.com> wrote:
> > ---------------------------------------------------------------
> > Of course my code doesn't read in your file.  Everyone is assuming you
> > know how to do that part.  We're assuming you already have the "a"
> > variable.
>
> > What do you mean when you say "the same kind of implementation"?  Does
> > that mean the exact same code I posted?  Did you download allwords?
> > Because on my computer it's faster than a blink of an eye -- only
> > 0.001 seconds.  Exactly how long is your "a" string?  Is it tens of
> > megabytes?  Or is it like your example?
> > ImageAnalyst
>
> Thanks for response. I knew reading the file & I already used dataread or textread to read the text file e.g. a = textread('xx.txt','%s'); before posting the post but it is giving the same error again after re-try. which command u r using to read the text file??
>
> Thanks alot.. waiting for response

--------------------------------------------------------------------
You answered almost none of my questions. And, repeating what I said,
I'm NOT READING your text file. I don't even have it. I'm just using
the example string you gave. I'm assuming you already have "a" and if
you don't know how to use textread() say so explicitly, otherwise we
assume you're using it correctly to read in "a."
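
For the reading step itself, a minimal sketch using fopen and textscan is given below. It writes its own small sample file so that it is self-contained; the temporary file name and the variable names fname and tokens are assumptions made for this illustration. Note that reading with '%s' yields a cell array with one word per cell, not a single char string like the example "a" above.

```matlab
% Write a small sample file so the sketch runs on its own.
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, 'matlab newsgroup\nis good\n');
fclose(fid);

% Read every whitespace-delimited token into a cell array of strings.
fid = fopen(fname, 'r');
if fid == -1
    error('Could not open %s', fname);
end
tokens = textscan(fid, '%s');   % 1-by-1 cell holding an N-by-1 cell array
fclose(fid);
a = tokens{1};                  % one word per cell

delete(fname);
disp(a')
```

Because each cell already holds one word, a cell array like this could be looped over directly with a{k}, without a separate word-splitting step.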

Subject: middle letter extraction

From: Mousmi Chaurasia

Date: 6 Apr, 2011 06:44:04

Message: 17 of 18

>
> this is a line from the code you posted related to your reading part
> > g = [g;(A{i}(k:k+1))];
>
> are you actually doing this ?! have you use profile to see why your code is so slow. That line might be the cause and not the way you are generating the 2 letters words.
> Pre-allocating is very important in matlab's memory handling. To not increase the size of an array inside a for loop!

Hi,
You are right, the problem might be in that line. It works with 500-1000 words but not with 30-50K words. How can I allocate memory in MATLAB?

Many thanks for the response.
Please reply.
 

Subject: middle letter extraction

From: Nasser M. Abbasi

Date: 6 Apr, 2011 07:11:28

Message: 18 of 18

On 4/5/2011 11:44 PM, Mousmi Chaurasia wrote:
> How can I allocate memory in matlab..
>
> Really thanks for response...
> Reply me..
>

help zeros
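
As a rough illustration of what that help page leads to, the sketch below allocates an array once with zeros and then assigns into it. The variable names are made up for this example. Since zeros returns doubles, a blank char array built with repmat is shown as an assumed equivalent for text data.

```matlab
n = 5;

% Numeric case: allocate once, then assign into existing slots.
squares = zeros(n, 1);
for k = 1:n
    squares(k) = k^2;       % no reallocation inside the loop
end

% Text case: zeros returns doubles, so build a blank char array instead.
pairs = repmat(' ', n, 2);  % n rows of two spaces, to be overwritten later
pairs(1, :) = 'at';

disp(squares')
disp(pairs(1, :))
```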

--Nasser
