File Size: 1.85 KB, File ID: #12778

stok - string tokenizer

by Michael Yoshpe

24 Oct 2006 (Updated 25 Oct 2006)

Find the addresses of tokens in a string


File Information
Description

%******************************************************************************
% function stok
% Purpose: Find the addresses (start and end positions) of tokens in a string
% Input:
% str - character string to be searched for tokens
% delim - character string holding the token delimiters
% maxtok - maximum allowed number of tokens
% Output:
% strtok - number of tokens found in the string
% istart - integer array holding the token starting positions in str
% iend - integer array holding the token ending positions in str
% Usage example:
% str='ab;cd de=0'; delim='; ='; [istart, iend] = stok(str, delim)
% istart =
% 1 4 7 10
% iend =
% 2 5 8 10
% str(istart(1):iend(1))
% ans =
% ab
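A minimal Python sketch of the behaviour the header documents, for readers who want to see the scan spelled out. It assumes a token is a maximal run of characters not found in `delim`, and it assumes (hypothetically; the header does not say) that `maxtok` simply stops the scan once that many tokens have been found. Positions are 1-based to match the MATLAB output above; the token count the header lists as `strtok` is just `len(istart)` here.

```python
def stok(s, delim, maxtok=None):
    """Return 1-based (istart, iend) lists of token positions in s.

    A token is a maximal run of characters not contained in delim.
    maxtok handling is an assumption: stop after maxtok tokens.
    """
    delims = set(delim)
    istart, iend = [], []
    in_token = False
    for k, ch in enumerate(s, start=1):
        if ch in delims:
            if in_token:
                iend.append(k - 1)  # previous character closed the token
                in_token = False
        elif not in_token:
            # Hypothetical maxtok behaviour: refuse to open a new token
            if maxtok is not None and len(istart) >= maxtok:
                break
            istart.append(k)
            in_token = True
    if in_token:
        iend.append(len(s))  # last token runs to the end of the string
    return istart, iend


s = 'ab;cd de=0'
istart, iend = stok(s, '; =')
print(istart, iend)
print(s[istart[0] - 1:iend[0]])  # first token, using 1-based positions
```

The single pass keeps at most one flag of state, which is why this kind of scan stays linear in the length of the string.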

MATLAB release: MATLAB 6.0 (R12)
Comments and Ratings (8)
25 Oct 2006 Jos x@y.z

This is what regular expressions are designed for:

str = 'ab;cd de=0';
delim = '; =';
% this is the difficult step, but it is really all in the help of regexp:
pat = ['\w*[' delim '\w*'] ;
[starti,endi] = regexp(str,pat)

25 Oct 2006 Jos x@y.z

pat = ['\w*[' delim '\w*'] ;
  should read
pat = ['\w*[^' delim ']\w*'] ;
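For comparison, the regular-expression idea in Python, as a sketch. Note that it swaps in a simpler pattern than Jos's `\w*[^; =]\w*`: a negated character class `[^...]+`, which matches a maximal run of non-delimiter characters whether or not they are word characters. `re.escape` keeps delimiters such as `]` or `^` from breaking the class (an empty `delim` would still produce an invalid pattern). The function name `token_spans` is hypothetical.

```python
import re


def token_spans(s, delim):
    # [^...]+ matches a maximal run of characters that are not delimiters;
    # re.escape protects characters that are special inside a class.
    pattern = "[^" + re.escape(delim) + "]+"
    starts, ends = [], []
    for m in re.finditer(pattern, s):
        starts.append(m.start() + 1)  # convert 0-based start to 1-based
        ends.append(m.end())          # exclusive 0-based end == inclusive 1-based end
    return starts, ends


print(token_spans('ab;cd de=0', '; ='))
```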

25 Oct 2006 urs (us) schwarz

jos, please not the ML release the code was created for (r12), which - as far as i remember - did not come with regexp yet...
us

25 Oct 2006 urs (us) schwarz

"not" must read "note"
us

26 Oct 2006 Jos x@y.z

You're correct, Us.

but, Michael Yoshpe, what about:

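j = ismember(str,delim)         % true where str(k) is a delimiter
starti = strfind([1 j],[1 0])   % delimiter (or string start) followed by a token character
endi = strfind([j 1],[0 1])     % token character followed by a delimiter (or string end)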
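The same boundary trick, translated into Python as a sketch: build a delimiter mask, pad it with a delimiter at each end (the role of `[1 j]` and `[j 1]`), and record the positions where the mask switches between delimiter and non-delimiter. The function name `boundary_spans` is hypothetical.

```python
def boundary_spans(s, delim):
    # is_delim[k] is True where s[k] is a delimiter (MATLAB: ismember(str, delim))
    is_delim = [ch in delim for ch in s]
    # Pad with a delimiter on each side; padded[k] then describes 1-based position k
    padded = [True] + is_delim + [True]
    # A token starts at k where a delimiter is followed by a non-delimiter
    starts = [k for k in range(1, len(padded)) if padded[k - 1] and not padded[k]]
    # A token ends at k - 1 where a non-delimiter is followed by a delimiter
    ends = [k - 1 for k in range(1, len(padded)) if not padded[k - 1] and padded[k]]
    return starts, ends


print(boundary_spans('ab;cd de=0', '; ='))
```

Because of the padding, a token touching either end of the string still produces both a start and an end, so the two lists always have the same length.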

26 Oct 2006 Michael Yoshpe

For all the guys who said that the same results can be accomplished with regular expressions: you are of course right. But, as Urs (us) Schwarz mentioned, my function works in MATLAB 6.0 (I suspect even in 5.0, but I never tried). Also, I personally find regexp too cryptic to use comfortably. As for the suggestion from Jos, I use stok to parse very large text files (I read each file into a single string, including newlines, and parse it with stok). I suspect my function will be much faster in that case.

26 Oct 2006 Jos x@y.z

Timings (in R13) are not bad at all for stok.
However, the ismember/strfind combo is
1) faster (likely more so in releases without the JIT accelerator)
2) more concisely coded

30 Oct 2006 Jérôme Briot

Did you try STRREAD (available in R12)?

str='ab;cd de=0';
X=strread(str,'%s','delimiter',';= ')

Jérôme
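Unlike stok, the STRREAD call returns the token text rather than positions. A rough Python analogue, as a sketch: split on any single delimiter character and drop empty pieces. Whether STRREAD itself drops empty fields between consecutive delimiters is not shown in this thread; dropping them here is an assumption. The name `split_tokens` is hypothetical.

```python
import re


def split_tokens(s, delim):
    # Split on any one delimiter character, then discard the empty strings
    # produced by adjacent delimiters or a leading/trailing delimiter.
    parts = re.split("[" + re.escape(delim) + "]", s)
    return [p for p in parts if p]


print(split_tokens('ab;cd de=0', ';= '))
```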

Tag Activity for this File
Tag Applied By Date/Time
strings Michael Yoshpe 22 Oct 2008 08:45:22
string tokens delimeter address Michael Yoshpe 22 Oct 2008 08:45:22
