I have a few questions about strings. I applied run-length encoding to a large string, but the output contains some unwanted symbols. Why is that?

function y = estring(str)
len = numel(str); %65536
i = 0;
count = zeros(1,len);
y=[];
while( i<len )
    j=0;
    count(i+1) = 1;
    while( true )
        j = j + 1;
        if( i+j+1 > len )
            break;
        end
        if( str(i+j+1)==str(i+1) )
            count(i+1) = count(i+1) + 1;
        else
            break;
        end
    end
    if false
        a=str(i+1);
        length(a);
        y = [y a];
        i = i + 1;
    else
        a=str(i+1);
        b=count(i+1);
        y =[y a b];
        i = i + b;
        if(count==1)
            y=[y a b]
        end
    end
end

Answers (1)

Walter Roberson on 12 May 2015
I already answered this in a previous discussion. Every second character of your returned string is the char() equivalent of a binary count. Remember, if you have ['P' 9] the result is not 'P9', it is 'P' followed by char(9) which happens to be the tab character. If you want to have 'P9' as the result when the count is 9 then you need to program the code that way and you need to decide exactly what you want to have happen if you get more than 9 in a row of the same thing.
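As a minimal sketch of the point above (the variable names `raw` and `txt` are only for illustration and do not come from the original code), compare concatenating a character with a numeric count against concatenating it with the count converted to text:

```matlab
% Concatenating a char with a double converts the number to the
% character that has that code, not to its decimal digits.
raw = ['P' 9];            % 'P' followed by char(9), the tab character
txt = ['P' num2str(9)];   % 'P' followed by the character '9'

disp(double(raw))         % character codes 80 and 9
disp(double(txt))         % character codes 80 and 57
disp(txt)                 % P9
```

With `num2str`, a count of 10 or more takes two or more characters, so the decoder then has to know where each count ends.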
  3 Comments
Walter Roberson on 12 May 2015
You do not represent counts with a whitespace. You represent counts with the character whose binary value is the count. If that count happens to be (for example) 42, then you are going to get char(42) which is '*'. And if the count happens to be 116 then you are going to get char(116) which happens to be 't' and you won't be able to tell that apart from a normal 't' of your output.
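A short sketch of why this is ambiguous, assuming a hypothetical run of 116 copies of the letter 't' (the names `c42`, `c116` and `enc` are illustrative only):

```matlab
c42  = char(42);           % '*'
c116 = char(116);          % 't'

% A run of 116 copies of 't', stored as letter followed by binary count,
% looks like two ordinary letters when displayed.
enc = ['t' 116];

disp(enc)                  % tt
disp(double(enc))          % character codes 116 and 116
```

Only the position in the encoded string (odd positions are letters, even positions are counts) tells the two characters apart.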
If you want to output the string without the counts then output every second character... like I already showed you. Or, encode the counts as printable digits and accept that a count of 10 will take more characters to represent than a count of 9, and get smarter about decoding the compressed string.
You need to define: exactly what string should be output to run-length encode (for example) 'PPPPPPPPPPPtPP'. A completely valid answer is ['P' char(11) 't' char(1) 'P' char(2)] which is what you are generating now. It is a valid run-length encoding. You just have to be aware that in that particular encoding every second character is a binary count. You also have to be aware that binary counts from 256 to 65535 imply that you are storing two bytes per character (counts larger than that would give an error unless you were careful) whereas a maximum count of 255 would allow you to store only 1 byte per character. (Remember in that other post about compression ratios earlier tonight I spoke of the difference between the number of bytes of storage per location and the number of "used" bits of storage per location? This is a case where it makes a difference, as MATLAB stores 2 bytes per character in memory.)
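The encoding described above can be sketched as follows. This is only an illustration under some assumptions: the input is a non-empty row vector of characters, no run is longer than 65535, and `repelem` is available (it was introduced in R2015a). The vectorized run detection is a swapped-in technique, not the loop from the posted `estring` function, and all variable names are hypothetical.

```matlab
str = 'PPPPPPPPPPPtPP';

% Encode: mark where each run starts, then measure the run lengths.
starts  = [true, str(2:end) ~= str(1:end-1)];
letters = str(starts);                            % 'PtP'
counts  = diff([find(starts), numel(str) + 1]);   % [11 1 2]

% Interleave letter codes and counts: letter, count, letter, count, ...
encoded = char(reshape([double(letters); counts], 1, []));

% Decode: odd positions are letters, even positions are binary counts.
decLetters = encoded(1:2:end);
decCounts  = double(encoded(2:2:end));
decoded    = repelem(decLetters, decCounts);

disp(isequal(decoded, str))                       % 1 if the round trip succeeded
```

If every count is known to be at most 255 and every letter code is too, `uint8(encoded)` would store the same sequence in one byte per element instead of two.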
There are other run-length encoding schemes, some of which only use printable characters. If you want efficiency in run-length encoding you normally work in binary rather than in printable characters. But even if you restrict yourself to printable characters you can get higher efficiency than a scheme of letter followed by a sequence of decimal characters '0' through '9' that represent counts. An important part of that is to define your allowed output characters. See, for example, Base64.
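For the simple printable scheme (a letter followed by decimal digits), decoding could look roughly like the sketch below. It assumes the original text contains no digit characters, so that every run of digits is unambiguously a count; the sample string and variable names are hypothetical, and `repelem` requires R2015a or later.

```matlab
encoded = 'P11t1P2';   % letter followed by a decimal count

% Each token is {non-digit character, run of digits}.
tok = regexp(encoded, '(\D)(\d+)', 'tokens');

letters = cellfun(@(t) t{1}, tok);               % 'PtP'
counts  = cellfun(@(t) str2double(t{2}), tok);   % [11 1 2]

decoded = repelem(letters, counts);
disp(decoded)                                    % PPPPPPPPPPPtPP
```

If the input itself may contain digits (for example the text '10'), this format becomes ambiguous, because a digit can be either data or part of a count. That is one reason to define the allowed input and output characters before choosing a scheme.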
tina jain on 12 May 2015
Walter sir, I changed the code like this:
close all;
clear all;
clc;
X= 'ttPPddgadgtttttttt10 ttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttt';
lengh=numel(X);
aaAA=estring(X);
zz=numel(aaAA)
y=[];
for i=1:zz
    if((~((mod(i,2))==0))&&( (aaAA(i)>='a'&& aaAA(i)<='z') || (aaAA(i)>='A' && aaAA(i)<='Z')))
        a=aaAA(i);
        y=[y a];
        numel(y);
    end
    i=i+1;
end
%------------------function estring---------------------
function y = estring(str)
    len = numel(str);
    i = 0;
    count = zeros(1,len);
    y=[];
    while( i<len )
        j=0;
        count(i+1) = 1;
        while( true )
            j = j + 1;
            if( i+j+1 > len )
                break;
            end
            if( str(i+j+1)==str(i+1) )
                count(i+1) = count(i+1) + 1;
            else
                break;
            end
        end
        if( count(i+1)==1 )
            b=1;
            a=str(i+1);
            length(a);
            y = [y a num2str(b)];
            i = i + 1;
        else
            a=str(i+1);
            b=count(i+1);
            y =[y a num2str(b)];
            i = i + b;
        end
    end
end
%--------OUTPUTS are------------------------------
aaAA =
t2P2d2g1a1d1g1t81101t525
>> y
y =
tPdgadgtt
>> whos
  Name       Size      Bytes  Class     Attributes

  X          1x545      1090  char
  a          1x1           2  char
  aaAA       1x24         48  char
  ans        1x1           8  double
  i          1x1           8  double
  lengh      1x1           8  double
  y          1x9          18  char
  zz         1x1           8  double
Is this working correctly now? What do you think?
