
Hi guys!

I want to implement a MATLAB function that takes (string, substring) as input and outputs the data that follows each occurrence of the substring. The length of that data isn't known in advance, so I need to extract it from the string itself.

Assumptions:

The length of the data following an occurrence of "0101" isn't known in advance. I must read it from the 8 bits that immediately follow the substring (the length is always an 8-bit binary number placed directly after the substring). The data length is the same at every occurrence, so every row of the output matrix has the same number of columns, but I still have to read the length value at each occurrence.

for example:

string="0101000100001111111111100000001000010100010000111111111110000011000" , substring is always constant and it's "0101".

00010000-> 16 in decimal.

So the output here is the 16 bits of data that follow the length field "00010000", which are 1111111111100000. How do I know the length of the data? It is given in the string itself: the 8 bits immediately after "0101" always hold the length. Here those 8 bits are 00010000, which is 16 in decimal, so I take the 16 bits that come right after the length field. As a result, the output for this occurrence is 1111111111100000.

the output is:

output=[1111111111100000 ; 1111111111100000], where each row holds the data following one occurrence: the first row is for the first occurrence, the second row for the second occurrence, and so on.

Another example:

String="01010000111111111111111000001000100101000011111111111111100010111111" , substring is always constant and it's "0101".

00001111 -> 15 in decimal for the first occurrence of "0101".

So the output here is the 15 bits of data that follow the length field "00001111", which are 111111111110000. As before, the length is given in the string itself: the 8 bits immediately after "0101" are 00001111, which is 15 in decimal, so I take the 15 bits that come right after the length field. As a result, the output for the first occurrence is 111111111110000.

00001111 -> 15 in decimal for the second occurrence of "0101", and the 15 bits that follow the 8-bit length field are

111111111110001

So the output matrix has two rows because there are two occurrences of "0101". The number of rows equals the number of occurrences of the substring, and each row holds the data that immediately follows the length field, with the length read from the 8 bits right after that occurrence.

the output is:

output=[111111111110000; 111111111110001], where each row holds the data following one occurrence: the first row is for the first occurrence, the second row for the second occurrence, and so on.

I need to read the length field (the 8 bits immediately following each occurrence of "0101") at every occurrence. It should hold the same value at every occurrence, but I still have to read and check it each time.

Note: there can be more than one occurrence of the substring "0101" in the string. I need to return the data for each occurrence as one row of a matrix (the first row holds the data that follows the first occurrence, the second row the data that follows the second occurrence, and so on). You can assume that occurrences never overlap: there is always enough data between one occurrence and the next.

my substring occurrences can be anywhere and not specifically at the beginning of my string !

so it could be inputs string=[11111111101010000111111111111111000001000100101000011111111111111100010111111]

the function that I tried to implement in matlab is: (I get wrong outputs unfortunately):

function TruncateSubstringResultCheck= TruncateSyncWordResultCheck(input1,substring) %input1 is my string , my substring as I said in my case it's always "0101"

positions = strfind(input1, substring) ;

TruncatedSubstring= cell2mat(arrayfun(@(idx) input1(idx+length(substring):idx+length(substring)+N-1), positions, 'uniform', 0 ).');

for i=1:NumberOfRows

substring = TruncatedSubstring(i,:);

TruncateSubstringResultCheck(i,:)=substring;

end

Could anyone help me to fix that and get the required output ? thanks for any assistance !

Stephen Cobeldick
on 21 Aug 2020

Edited: Stephen Cobeldick
on 21 Aug 2020

>> fun = @(s)sprintf('[01]{%d}',bin2dec(s));

>> rgx = '0101([01]{8})((??@fun($1)))';

>> str = '010100001111111111111110000010101001010000111111111111111000001111100101000010001111111111100000111110';

>> tkn = regexp(str,rgx,'tokens');

>> tkn = vertcat(tkn{:});

>> out = tkn(:,2);

>> out{:}

ans =

111111111110000

ans =

0111111111111111000001111100101000010001

Note that this returns an output following the rules that you described, and so does not match the (incorrect) examples.
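For readers unfamiliar with dynamic expressions: as far as I understand the regexp documentation, (??@fun($1)) calls fun with the first token (the 8 length bits) and splices the returned text into the pattern. A minimal sketch of just that helper step, reusing the fun handle from the answer; the variable names lenBits and pat and the sample length field are only illustrative assumptions:

```matlab
% Helper from the answer: turns an 8-bit length field into a quantified pattern
fun = @(s)sprintf('[01]{%d}',bin2dec(s));

lenBits = '00001111';   % illustrative length field (15 in decimal)
pat = fun(lenBits);     % text that the dynamic expression would splice in
disp(pat)               % shows [01]{15}
```

So for each candidate "0101", the regular expression effectively becomes 0101([01]{8})([01]{N}) with N read from the string itself.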

the cyclist
on 22 Aug 2020

@Stephen, you misinterpreted my earlier comment. I meant that I had not pursued this type of solution because I did not know how the regular expression would work in this case. (But now I do, thanks to you!) I didn't mean that I thought it would not work.

Really glad to see this works as intended. It's certainly the more elegant algorithm.

Stephen Cobeldick
on 22 Aug 2020

"output=[00000;00000]"

If the input is a character vector and the data subvectors can have different lengths then it is not possible to concatenate them into one character matrix. You could pad them to have the same length and then concatenate them together. Or convert to string, in which case you will get a vector of strings (where each element is a scalar string with a different number of characters).
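A small sketch of both options, assuming the extracted segments are held in a cell array of character vectors (here the two outputs of the regular-expression answer; the variable names c, padded and asStr are illustrative):

```matlab
% Two segments of different lengths, as a cell array of char vectors
c = {'111111111110000'; '0111111111111111000001111100101000010001'};

% Option 1: char pads the shorter rows with trailing spaces,
% which gives one rectangular character matrix
padded = char(c);

% Option 2: a string array keeps each segment at its own length
asStr = string(c);

disp(size(padded))
disp(strlength(asStr))
```

Note that the padding characters are spaces, so the padded rows would need trimming again before any call to bin2dec.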

Converting to numeric is possible, but note that apart from some coincidental visual similarity, the decimal number 101 is totally unrelated to the binary number 101.
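A quick illustration of that difference, using the same digits interpreted both ways (variable names are illustrative):

```matlab
s = '101';
asDecimal = str2double(s);   % reads the digits as a decimal number: 101
asBinary  = bin2dec(s);      % reads the digits as a binary number: 5
fprintf('%d vs %d\n', asDecimal, asBinary);
```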

"I will explain what my issue, my input(str) isn't string it's a binary array integers.... str=[00010100000101000001010000010100000]"

Your example cannot be stored as one integer by any standard integer class supported by MATLAB. Perhaps you actually meant that each of those digits is a separate element of an integer array, e.g.:

vec = [0,0,0,1,0,1,0,0,0,0,0,1,0,1,0,0,0,0,0,1,0,1,0,0,0,0,0,1,0,1,0,0,0,0,0];

in which case you can trivially convert those integers to character:

str = sprintf('%d',vec);
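An equivalent conversion, shown here as a sketch on a shortened vector, is to add the character code of '0' to each digit; subtracting '0' converts back. This assumes every element is exactly 0 or 1:

```matlab
vec  = [0,0,0,1,0,1,0,0];     % shortened example vector of single digits
str1 = sprintf('%d', vec);    % conversion used in the answer
str2 = char(vec + '0');       % same result via character codes
back = str1 - '0';            % back to a numeric vector of digits
disp(str1)
```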

the cyclist
on 20 Aug 2020

Edited: the cyclist
on 20 Aug 2020

If 0101 is always at the beginning of the string, then

% Example input

str ="0101000011111111111111100000101010";

% The 8 digits after 0101 define the length.

% In other words, the 5th to 12th digits.

L = bin2dec(extractBetween(str,5,12));

% The L digits after 0101 and the next 8, are the output string.

% In other words, start from the 13th digit, and get L digits.

output = extractBetween(str,13,12+L);

or if you actually have a character array :

% Example input

str ='0101000011111111111111100000101010';

% The 8 digits after 0101 define the length.

% In other words, the 5th to 12th digits.

L = bin2dec(str(5:12));

% The L digits after 0101 and the next 8, are the output string.

% In other words, start from the 13th digit, and get L digits.

output = str(13:(12+L));

the cyclist
on 20 Aug 2020

% Sample input
str ="0101000000011010100000001101010000000110101000000011010100000001101010000000110101000000111110101";

% Initialize with first index of 0101, and string length
idx0101 = regexp(str,"0101","once");
strL = strlength(str);

% Initialize string array for output
segments = strings(0);

% Loop over string, while it is long enough to hold 0101 and the length
% identifier segment
while strL >= idx0101 + 11
    % Find the segment length
    segmentL = bin2dec(extractBetween(str,idx0101+4,idx0101+11));
    % If the string is long enough to contain a segment of that length
    % after the marker and the length field, extract it
    if strL >= idx0101+11+segmentL
        % Pull segment of the correct length
        thisSegment = extractBetween(str,idx0101+12,idx0101+11+segmentL);
        % Append the segment to the array
        segments = [segments; thisSegment];
        % Remove the segment and its identifiers
        str = extractAfter(str,idx0101+11+segmentL);
        % Find the length of the shortened string, and first location of
        % "0101", so that we can start over
        strL = strlength(str);
        idx0101 = regexp(str,"0101","once");
    else
        break % Break out of the loop if the string is not long enough to have a new segment
    end
end

the cyclist
on 21 Aug 2020

My solution here gives the output that you specified for the input/output combinations you specified in the other location, if you do

str2double(segments)

as I suggested.

per isakson
on 22 Aug 2020

This is an answer to the follow_up question, which was closed when I tried to submit.

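%%
chr = '01010000111111111111111000001000100101000011111111111111100010111111';
sbs = '0101';
%%
pos = strfind( chr, sbs );
out = cell( numel(pos), 1 );
%%
for pp = 1 : numel(pos)
    ix1 = pos(pp) + 4;          % first bit of the 8-bit length field
    ix2 = ix1 + 8 - 1;          % last bit of the length field
    if ix2 > numel(chr)         % no room left for a length field
        continue
    end
    len = bin2dec( chr(ix1:ix2) );
    if ix2+len <= numel(chr)
        out{pp,1} = chr(ix2+1:ix2+len);
    end
end
% Drop occurrences that had no complete length field or data
out( cellfun( @isempty, out ) ) = [];
%%
output = string( out );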

This script prints

output =

2×1 string array

"111111111110000"

"111111111110001"

And the script outputs the same result for

chr = 'xxxxxxxxxxxx01010000111111111111111000001000100101000011111111111111100010111111';

and for

chr = '11111111101010000111111111111111000001000100101000011111111111111100010111111';

There is at least one problem with the script and that is handling of the case where the distance between substring is less than 12+1 positions.
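One way to sidestep that problem, sketched here under the question's assumption that occurrences never overlap, is to scan sequentially and resume the search only after the data of the previous occurrence. The names nLenBits, k, rel and p are illustrative and not part of the script above:

```matlab
chr = '01010000111111111111111000001000100101000011111111111111100010111111';
sbs = '0101';
nLenBits = 8;                 % width of the length field, per the question

out = {};                     % collected data segments
k = 1;                        % position from which to search next
while true
    rel = strfind( chr(k:end), sbs );   % occurrences at or after k
    if isempty(rel), break, end
    p   = k + rel(1) - 1;               % absolute start of this occurrence
    ix1 = p + numel(sbs);               % first bit of the length field
    ix2 = ix1 + nLenBits - 1;           % last bit of the length field
    if ix2 > numel(chr), break, end     % no room for a length field
    len = bin2dec( chr(ix1:ix2) );
    if ix2 + len > numel(chr)
        k = p + 1;                      % data truncated, try the next candidate
        continue
    end
    out{end+1,1} = chr(ix2+1 : ix2+len); %#ok<SAGROW>
    k = ix2 + len + 1;                  % skip past the extracted data
end
output = string( out );
```

With this scan, a "0101" pattern that happens to appear inside an already extracted data segment is never treated as a new occurrence.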
