Code covered by the BSD License  


File size: 2.99 KB | File ID: #10038

GETCHUNKS

by Jiro Doke

 

17 Feb 2006 (Updated 28 Dec 2009)

Get the number of repetitions that occur in consecutive chunks.


File Information
Description

C = GETCHUNKS(A) returns an array of n elements, where n is the number of consecutive chunks (runs of 2 or more repeated elements) in A, and each element is the number of repetitions in that chunk. A can be a LOGICAL or any numeric vector, or a CELL array of strings. It can also be a character array (see below for its special treatment).

[C, I] = GETCHUNKS(A) also returns the indices of the beginnings of the chunks.
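The chunk counting described above is a form of run-length encoding. The Python sketch below illustrates that behaviour; it is not the author's MATLAB code. The function name `getchunks_seq` and its `full` flag are hypothetical, with `full=True` standing in for the '-full' option documented on this page. It encodes every run, then keeps only runs longer than one element unless `full` is set, and reports 1-based start indices to match MATLAB.

```python
from itertools import groupby


def getchunks_seq(a, full=False):
    """Hypothetical Python analogue of GETCHUNKS for non-character input.

    Returns (counts, starts), where starts are 1-based indices as in MATLAB.
    full=False keeps only runs of 2+ equal elements (like '-reps');
    full=True also keeps single-element runs (like '-full').
    """
    counts, starts = [], []
    pos = 1  # 1-based index of the first element of the current run
    for _, group in groupby(a):  # groupby yields maximal runs of equal items
        n = sum(1 for _ in group)
        counts.append(n)
        starts.append(pos)
        pos += n
    if not full:
        # Drop single-element runs, keeping counts and starts aligned.
        keep = [i for i, n in enumerate(counts) if n > 1]
        counts = [counts[i] for i in keep]
        starts = [starts[i] for i in keep]
    return counts, starts


A = [1, 2, 2, 3, 4, 4, 4, 5, 6, 7, 8, 8, 8, 8, 9]
print(getchunks_seq(A))  # ([2, 3, 4], [2, 5, 11])
```

A Python list of strings plays the role of a cell array of strings here, since `groupby` compares neighbouring items with `==`.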

If A is a character array, GETCHUNKS finds words (runs of consecutive non-space characters) and returns the number of characters in each word and the indices at which the words begin.

GETCHUNKS(A, OPT) accepts an optional argument OPT, which can be one of the following:

  '-reps' : return repeating chunks only (default).
  '-full' : return all chunks, including single-element chunks.
  '-alpha' : (for CHAR arrays) treat only letters and digits as part of words; punctuation and symbols are treated as spaces.
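For character input, the word-finding rule and the '-alpha' variant can be sketched in Python as follows. This is an illustration, not the submission's code. The name `getchunks_words` and the `alpha` flag are hypothetical, and two details are assumptions: "space" is taken to mean any whitespace (`str.isspace`), and "letters and digits" is taken to mean `str.isalnum`, which may differ from MATLAB's classification for non-ASCII characters or underscores.

```python
def getchunks_words(s, alpha=False):
    """Hypothetical Python analogue of GETCHUNKS on a character array.

    Returns (lengths, starts) of words, with 1-based start indices.
    alpha=False: a word is a maximal run of non-whitespace characters.
    alpha=True (like '-alpha'): only letters/digits count as word characters,
    so punctuation and symbols behave like spaces.
    """
    if alpha:
        is_word = [c.isalnum() for c in s]
    else:
        is_word = [not c.isspace() for c in s]
    lengths, starts = [], []
    for i, w in enumerate(is_word):
        if w and (i == 0 or not is_word[i - 1]):
            starts.append(i + 1)  # a new word begins here (1-based index)
            lengths.append(1)
        elif w:
            lengths[-1] += 1  # still inside the current word
    return lengths, starts


B = "This is a generic (simple) sentence"
print(getchunks_words(B))              # ([4, 2, 1, 7, 8, 8], [1, 6, 9, 11, 19, 28])
print(getchunks_words(B, alpha=True))  # ([4, 2, 1, 7, 6, 8], [1, 6, 9, 11, 20, 28])
```

Both calls reproduce the outputs shown in the examples on this page, including the shift of "simple" from index 19 to 20 once the parentheses are treated as spaces.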

Examples:
  A = [1 2 2 3 4 4 4 5 6 7 8 8 8 8 9];
  getchunks(A)
    ans =
       2 3 4

  B = 'This is a generic (simple) sentence';
  [C, I] = getchunks(B)
    C =
       4 2 1 7 8 8
    I =
       1 6 9 11 19 28

  [C, I] = getchunks(B, '-alpha')
    C =
       4 2 1 7 6 8
    I =
       1 6 9 11 20 28

Acknowledgements
This submission has inspired the following:
Find Clipped Trials, FINDSEQ
MATLAB release: MATLAB 6.5.1 (R13SP1)
Other requirements: also tested in R14SP3
Comments and Ratings (3)
17 Feb 2006 urs (us) schwarz

Another nice, specialized run-length encoder snippet,
useful for users and programmers alike:
- sleek help, no clutter
- intuitive syntax closely following similar ML functions
- programmers: look at the smart use of STRFIND with various start/stop templates, something the author has (undoubtedly) learned from CSSM
- one minor comment: a simplified single engine could just do a genuine run-length encoding and, if the (default) option -reps is set, return
    id = id(d>1);
    d = d(d>1);
us
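The STRFIND trick praised in this comment searches a 0/1 indicator vector for a "rise" template to find where chunks start and a "fall" template to find where they end. The exact templates used in the submission are not shown on this page, so the Python sketch below assumes a common choice: mark each adjacent equal pair with 1, pad with a 0, and search for [0 1] and [1 0]. The helper `strfind` only mimics MATLAB's STRFIND on numeric vectors, and both function names are hypothetical.

```python
def strfind(haystack, needle):
    """1-based start indices of every occurrence of needle in haystack,
    mimicking MATLAB's STRFIND applied to numeric vectors."""
    m = len(needle)
    return [i + 1 for i in range(len(haystack) - m + 1)
            if haystack[i:i + m] == needle]


def reps_via_templates(a):
    """Hypothetical sketch: repeating chunks only, via start/stop templates."""
    # same[k] == 1 when a[k] == a[k+1] (0-based): an equal pair inside a chunk.
    same = [1 if a[k] == a[k + 1] else 0 for k in range(len(a) - 1)]
    # A 0 -> 1 rise (after padding on the left) marks a chunk's first element.
    starts = strfind([0] + same, [0, 1])
    # A 1 -> 0 fall (after padding on the right) marks a chunk's last equal pair.
    stops = strfind(same + [0], [1, 0])
    # The last pair at index e covers elements e and e+1, so length = e - s + 2.
    counts = [e - s + 2 for s, e in zip(starts, stops)]
    return counts, starts


A = [1, 2, 2, 3, 4, 4, 4, 5, 6, 7, 8, 8, 8, 8, 9]
print(reps_via_templates(A))  # ([2, 3, 4], [2, 5, 11])
```

This path only ever produces repeating chunks, which fits the author's reply that the '-reps' case needs fewer steps than a full run-length encoding followed by the `d>1` filter suggested above.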

18 Feb 2006 Jiro Doke

Yes, I learned about the STRFIND trick from CSSM (from you, Us).
Good point about simplifying the engine to handle both '-full' and '-reps'. My rationale, though, was that the '-reps' option required fewer steps and was therefore faster, so I valued speed over code simplicity. If someone can find a short, elegant way to implement the '-full' option, that would be great.

21 Jun 2006 peppe verdi

Awesome. I needed to find, for each row, the longest consecutive sequence of non-NaN values; with ISNAN plus this function it was very fast.
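The commenter does not show their code, so the following is only a Python sketch of the same idea under that uncertainty: turn each row into a "not NaN" indicator, run-length encode it, and take the longest run of true values. The name `longest_non_nan_run` and the sample data are hypothetical.

```python
import math
from itertools import groupby


def longest_non_nan_run(row):
    """Length of the longest run of consecutive non-NaN values in row."""
    best = 0
    # Group consecutive elements by whether they are real (non-NaN) values.
    for is_value, group in groupby(row, key=lambda x: not math.isnan(x)):
        if is_value:
            best = max(best, sum(1 for _ in group))
    return best


nan = float("nan")
data = [
    [1.0, 2.0, nan, 4.0, 5.0, 6.0],
    [nan, nan, 3.0],
    [nan, nan, nan, nan],
]
print([longest_non_nan_run(r) for r in data])  # [3, 1, 0]
```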

Updates
28 Dec 2009

Added '-alpha' option. Updated license.

Tags (all applied by Jiro Doke, 22 Oct 2008 08:16:00): matrices, repetitions, histogram, chunks, blocks, words, length, arrays
