Thread Subject: Separating doubles into separate integers

Subject: Separating doubles into separate integers

From: james bejon

Date: 13 Jan, 2012 19:51:08

Message: 1 of 21

Dear All,

I have a matrix 3 cols wide and a good few million rows long. The data is of type int8.

I need to do a unique count of the rows of this matrix. I'm minded to do it a bit like this:

% Data:
x = int8([mod(1:100, 2); mod(1:100, 6); mod(1:100, 4)]+1).';

% Engine:
x = sortrows(x);
d = [true; any(diff(x), 2)];
o = x(d, :);
ans = [double(o), diff([find(d); length(d)+1])]

However, I suspect that I could speed things up by making x into a double like this

x = double(x) * [2^16; 2^8; 1];

and then separating the double back out into integers afterwards. Problem is, I can't work out how to get doubles back into integers.

Could anyone provide a hand?

Subject: Separating doubles into separate integers

From: Steven_Lord

Date: 13 Jan, 2012 20:24:43

Message: 2 of 21



"james bejon" <jamesbejon@yahoo.co.uk> wrote in message
news:jeq1vc$l73$1@newscl01ah.mathworks.com...
> Dear All,
>
> I have a matrix 3 cols wide and a good few million rows long. The data is
> of type int8.
>
> I need to do a unique count of the rows of this matrix. I'm minded to do
> it a bit like this:

*snip*

Why not just use the UNIQUE function with the 'rows' option?

http://www.mathworks.com/help/techdoc/ref/unique.html

--
Steve Lord
slord@mathworks.com
To contact Technical Support use the Contact Us link on
http://www.mathworks.com

Subject: Separating doubles into separate integers

From: John D'Errico

Date: 13 Jan, 2012 20:25:09

Message: 3 of 21

"james bejon" wrote in message <jeq1vc$l73$1@newscl01ah.mathworks.com>...
> Dear All,
>
> I have a matrix 3 cols wide and a good few million rows long. The data is of type int8.
>
> I need to do a unique count of the rows of this matrix. I'm minded to do it a bit like this:
>
> % Data:
> x = int8([mod(1:100, 2); mod(1:100, 6); mod(1:100, 4)]+1).';
>
> % Engine:
> x = sortrows(x);
> d = [true; any(diff(x), 2)];
> o = x(d, :);
> ans = [double(o), diff([find(d); length(d)+1])]
>
> However, I suspect that I could speed things up by making x into a double like this
>
> x = double(x) * [2^16; 2^8; 1];
>
> and then separating the double back out into integers afterwards. Problem is, I can't work out how to get doubles back into integers.
>
> Could anyone provide a hand?

dec2base would be your friend, at least it would be if it
allowed a base larger than 36. It does not.

Failing that, simplest is to go a wee bit more hardcore
and simply use mod or rem, twice.

Since you have only a very restricted set, even faster
would be a direct table lookup. Of course, that would
be extremely efficient.

John

Subject: Separating doubles into separate integers

From: james bejon

Date: 13 Jan, 2012 21:02:09

Message: 4 of 21

@Steve,

I find that unique(..., 'rows') just grinds to a halt when working with large matrices.

@John,

Thanks. I'll give that a go.

Subject: Separating doubles into separate integers

From: Bruno Luong

Date: 14 Jan, 2012 07:56:08

Message: 5 of 21

"james bejon" wrote in message <jeq64g$6bl$1@newscl01ah.mathworks.com>...
> @Steve,
>
> I find that unique(..., 'rows') just grinds to a halt when working with large matrices.
>

I don't believe it.


> However, I suspect that I could speed things up by making x into a double like this
>
> x = double(x) * [2^16; 2^8; 1];
>
> and then separating the double back out into integers afterwards. Problem is, I can't work out how to get doubles back into integers.

x = int8(127*rand(10,3)) % 256*rand would saturate at int8's maximum of 127
xdouble = double(x) * [2^16; 2^8; 1];

% Engine
y = zeros(length(xdouble),3);
for k=size(y,2):-1:1
    y(:,k)=mod(xdouble,2^8);
    xdouble=floor(xdouble/2^8);
end
disp(y)

% Bruno
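
One caveat with the mod/floor loop above: it relies on every entry being non-negative, whereas int8 data may also hold negative values. A minimal sketch of one way round this, assuming it is acceptable to shift the data by 128 before packing and shift it back afterwards (the sample rows and the names shifted, packed and xBack are made up for illustration):

```matlab
% Sample int8 rows that include negative values (invented data)
x = int8([-128 0 127; -1 -1 -1; 5 -7 100]);

% Shift into 0..255 so that each entry behaves like a base-256 digit
shifted = double(x) + 128;
packed  = shifted * [2^16; 2^8; 1];

% Same mod/floor loop as above, applied to the shifted values
y = zeros(size(x));
p = packed;
for k = size(y,2):-1:1
    y(:,k) = mod(p, 2^8);
    p = floor(p / 2^8);
end

% Undo the shift to recover the original int8 values
xBack = int8(y - 128);
disp(isequal(x, xBack))
```

Because the shift is the same for every entry, sorting the packed values still orders the rows the way sortrows would.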

Subject: Separating doubles into separate integers

From: Bruno Luong

Date: 14 Jan, 2012 09:24:08

Message: 6 of 21

You could also get back to x by:

x8 = rot90(reshape(typecast(uint32(xdouble),'uint8'), 4, []),-1)
% Delete first column if needed >> x8(:,1) = [];

% Bruno
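
To see what the rot90 line does, here is a small worked sketch on two hand-picked rows. It assumes a little-endian machine, and the sample values and the intermediate names bytes and B are invented for illustration:

```matlab
x = [1 2 3; 4 5 6];                          % two sample rows of small non-negative integers
xdouble = x * [2^16; 2^8; 1];                % packed values: 66051 and 263430
bytes = typecast(uint32(xdouble), 'uint8');  % 8x1 column; low byte first on little-endian
B = reshape(bytes, 4, []);                   % one column of 4 bytes per packed value
x8 = rot90(B, -1);                           % clockwise turn: one row per value, high byte first
disp(isequal(x8, flipud(B).'))               % rot90(B,-1) is flipud followed by a transpose
disp(x8)                                     % first column is the unused high byte (all zeros)
```

With three packed bytes per value the top byte of each uint32 is always zero, which is why the first column can be deleted.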

Subject: Separating doubles into separate integers

From: james bejon

Date: 14 Jan, 2012 09:56:08

Message: 7 of 21

That's brilliant, Bruno. Thanks very much. The typecast function was what I was after. I felt sure Matlab would have something like this.

Subject: Separating doubles into separate integers

From: james bejon

Date: 14 Jan, 2012 11:14:08

Message: 8 of 21

...in which case I guess I can go to and fro like this:

x1 = uint8(rand(100, 4)*2^8);
tmp = typecast(x1, 'uint32');
x2 = reshape(typecast(tmp, 'uint8'), [], 4);
isequal(x1, x2)

Subject: Separating doubles into separate integers

From: james bejon

Date: 14 Jan, 2012 23:49:07

Message: 9 of 21

So, I expected the following code to work, but it didn't:

% DATA
d0 = uint8(rand(100, 4)*2)+1;
d1 = d0;
d2 = typecast(d0, 'uint32');

% CHECK THE CONVERSION WORKS:
isequal(d1, reshape(typecast(d2, 'uint8'), [], 4))

% NOW CHECK POST-SORTING:
d1 = sortrows(d1);
d2 = reshape(typecast(sort(d2), 'uint8'), [], 4);
isequal(d1, d2) % <- Comes out FALSE

Am I doing something stupid? (If yes, please give details).

Subject: Separating doubles into separate integers

From: James Tursa

Date: 14 Jan, 2012 23:51:08

Message: 10 of 21

"james bejon" wrote in message <jero20$1eh$1@newscl01ah.mathworks.com>...
> ...in which case I guess I can go to and fro like this:
>
> x1 = uint8(rand(100, 4)*2^8);
> tmp = typecast(x1, 'uint32');
> x2 = reshape(typecast(tmp, 'uint8'), [], 4);
> isequal(x1, x2)

FYI, there is a fast typecast function on the FEX here:

http://www.mathworks.com/matlabcentral/fileexchange/17476-typecast-and-typecastx-c-mex-functions

James Tursa

Subject: Separating doubles into separate integers

From: james bejon

Date: 15 Jan, 2012 00:25:08

Message: 11 of 21

Thanks James. That will prove very useful.

Subject: Separating doubles into separate integers

From: Bruno Luong

Date: 15 Jan, 2012 08:18:09

Message: 12 of 21

"james bejon" wrote in message <jet49j$639$1@newscl01ah.mathworks.com>...
> So, I expected the following code to work, but it didn't:
>
> % DATA
> d0 = uint8(rand(100, 4)*2)+1;
> d1 = d0;
> d2 = typecast(d0, 'uint32');
>
> % CHECK THE CONVERSION WORKS:
> isequal(d1, reshape(typecast(d2, 'uint8'), [], 4))
>
> % NOW CHECK POST-SORTING:
> d1 = sortrows(d1);
> d2 = reshape(typecast(sort(d2), 'uint8'), [], 4);
> isequal(d1, d2) % <- Comes out FALSE
>
> Am I doing something stupid? (If yes, please give details).

Did you check the endianness? I believe you have to reverse the bytes [4 3 2 1] to respect the high/low priorities.

For UNIQUE count it doesn't matter though.

Bruno
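
A small sketch of the ordering point, on two invented rows. It assumes a little-endian machine; asIs and flipped are illustrative names, and each row is transposed into a contiguous run of bytes before the typecast so that one uint32 corresponds to one row:

```matlab
a = uint8([1 2 1 1; 2 1 1 1]);    % two rows, already in sortrows order

% Pack each row into one uint32, without and with reversing the bytes
asIs    = typecast(reshape(a.', [], 1), 'uint32');
flipped = typecast(reshape(fliplr(a).', [], 1), 'uint32');

% Without the flip, column 1 lands in the LOW byte, so it has the least
% influence on the numeric order and the two rows swap when sorted.
disp(issorted(asIs))      % false on a little-endian machine
disp(issorted(flipped))   % true on a little-endian machine
```

Either packing gives each distinct row its own value, which is why the counts themselves come out the same; only the order of the unique rows changes.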

Subject: Separating doubles into separate integers

From: james bejon

Date: 15 Jan, 2012 14:06:08

Message: 13 of 21

Thanks again, Bruno. I can't seem to get the equivalent of multiplying by [2^24; 2^16; 2^8; 1] working though.

Still, I get some reasonable time savings this way. Hopefully the test is a fair one.


% DATA
n = 100000;
reps = 200;
d0 = uint8(rand(n, 4)*2)+1;


% UNIQUE COUNT ONE WAY:
tic
for i = 1:reps
   [u1, ~, c1] = unique(d0, 'rows');
   c1 = accumarray(c1, 1);
   a1 = [uint32(u1), c1];
end
toc


% ANOTHER WAY:
tic
for i = 1:reps
   d2 = sort(uint32(double(d0) * [2^24; 2^16; 2^8; 1]));
   diffs = [true; logical(diff(d2))];
   u2 = d2(diffs);
   u2 = rot90(reshape(typecast(u2, 'uint8'), 4, []), -1);
   c2 = diff([find(diffs); n+1]);
   a2 = [uint32(u2), c2];
end
toc

isequal(a1, a2)

Subject: Separating doubles into separate integers

From: Bruno Luong

Date: 15 Jan, 2012 15:16:08

Message: 14 of 21

% DATA
n = 100000;
reps = 1;
d0 = uint8(floor(rand(n, 4)*256));


% UNIQUE COUNT ONE WAY:
tic
for i = 1:reps
   [u1, ~, c1] = unique(double(d0), 'rows');
   c1 = accumarray(c1, 1);
   a1 = [uint32(u1), c1];
end
toc


% ANOTHER WAY:
tic
for i = 1:reps
   d2 = sort(double(d0) * 256.^(3:-1:0)');
   diffs = [true; diff(d2)>0];
   u2 = d2(diffs);
   u2 = rot90(reshape(typecast(uint32(u2), 'uint8'), 4, []), -1);
   c2 = diff([find(diffs); n+1]);
   a2 = [uint32(u2), c2];
end
toc

isequal(a1, a2)

% Bruno

Subject: Separating doubles into separate integers

From: james bejon

Date: 15 Jan, 2012 15:38:08

Message: 15 of 21

Bruno,

Thanks for the response. Not quite sure what you've done here. The expression "sort(uint32(double(d0) * [2^24; 2^16; 2^8; 1]))" seemed to be working (on my computer at least) as it was. What I thought I could do was use a combination of "typecast" and "swap bytes" in order to bypass the need for multiplication in the first place. But I couldn't get it working.

Subject: Separating doubles into separate integers

From: Bruno Luong

Date: 15 Jan, 2012 15:58:08

Message: 16 of 21

I told you earlier about the endianness,

 a=uint8(226*rand(1,4))

These two have the same value:

 typecast(a, 'uint32')
 uint32(double(a) * (256.^(0:3)'))

Thus you have to rearrange the data (flip the second dimension of the array) in sortrows to get the same result.

Bruno

Subject: Separating doubles into separate integers

From: james bejon

Date: 15 Jan, 2012 16:24:08

Message: 17 of 21

Right, but I'm trying to do it the other way round. That is, I'm trying to get a typecast statement which is the equivalent of:

uint32(double(a) * (256.^(3:-1:0).'))

If you look at the result of

a = uint8(2*rand(100,4)+1);
disp(sortrows([uint32(a), typecast(a, 'uint32')]))

you'll see that the same 4-number combinations in the first 4 columns can result in different values in the 5-th column.

Subject: Separating doubles into separate integers

From: Bruno Luong

Date: 15 Jan, 2012 16:45:09

Message: 18 of 21

"james bejon" wrote in message <jeuuj8$inj$1@newscl01ah.mathworks.com>...
> Right, but I'm trying to do it the other way round. That is, I'm trying to get a typecast statement which is the equivalent of:
>
> uint32(double(a) * (256.^(3:-1:0).'))

 typecast(fliplr(a), 'uint32')

You NEED to reorder the bytes because of *endianness*. Do you understand what endianness is???

>
> If you look at the result of
>
> a = uint8(2*rand(100,4)+1);
> disp(sortrows([uint32(a), typecast(a, 'uint32')]))
>
> you'll see that the same 4-number combinations in the first 4 columns can result in different values in the 5-th column.

I don't.

% DATA
n = 100000;
reps = 1;
d0 = uint8(rand(n, 4)*2)+1;


% UNIQUE COUNT ONE WAY:
tic
for i = 1:reps
   [u1, ~, c1] = unique(d0, 'rows');
   c1 = accumarray(c1, 1);
   a1 = [uint32(u1), c1];
end
toc


% ANOTHER WAY:
tic
for i = 1:reps
   d2 = sort(typecast(reshape(fliplr(d0)',[],1), 'uint32'));
   diffs = [true; logical(diff(d2))];
   u2 = d2(diffs);
   u2 = rot90(reshape(typecast(u2, 'uint8'), 4, []), -1);
   c2 = diff([find(diffs); n+1]);
   a2 = [uint32(u2), c2];
end
toc

isequal(a1, a2) % 1

Bruno

Subject: Separating doubles into separate integers

From: james bejon

Date: 15 Jan, 2012 17:09:08

Message: 19 of 21

Yes, I understand endianness.

Are you telling me that you get "true" from the last line of the following code?

a = uint8(2*rand(100,4)+1);
x = uint32(double(a) * (256.^(3:-1:0).'));
y = typecast(fliplr(a), 'uint32');
isequal(x, y)

Subject: Separating doubles into separate integers

From: Bruno Luong

Date: 15 Jan, 2012 17:46:08

Message: 20 of 21

"james bejon" wrote in message <jev17k$q0a$1@newscl01ah.mathworks.com>...
> Yes, I understand endianness.
>
> Are you telling me that you get "true" from the last line of the following code?
>
> a = uint8(2*rand(100,4)+1);
> x = uint32(double(a) * (256.^(3:-1:0).'));
> y = typecast(fliplr(a), 'uint32');
> isequal(x, y)

Not with this code. With mine yes.

Bruno
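
As far as I can tell, the difference between the two snippets is the transpose and reshape: MATLAB stores a matrix column by column, so the four bytes of one row are not adjacent in memory until the matrix is transposed. A minimal sketch on an invented 2-by-4 example, assuming a little-endian machine (target, colMajor and rowMajor are names chosen for illustration):

```matlab
a = uint8([1 2 3 4; 5 6 7 8]);

% Reference: pack each row with column 1 as the most significant byte
target = uint32(double(a) * 256.^(3:-1:0).');

% Bytes taken in storage order (down the columns): rows get mixed together
colMajor = typecast(reshape(fliplr(a), [], 1), 'uint32');

% Transpose first so that each row's 4 bytes are contiguous
rowMajor = typecast(reshape(fliplr(a).', [], 1), 'uint32');

disp(isequal(target, colMajor))   % false
disp(isequal(target, rowMajor))   % true on a little-endian machine
```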

Subject: Separating doubles into separate integers

From: Bruno Luong

Date: 15 Jan, 2012 18:00:09

Message: 21 of 21

"Bruno Luong" <b.luong@fogale.findmycountry> wrote in message
>
> d2 = sort(typecast(reshape(fliplr(d0)',[],1), 'uint32'));

This achieves the same thing and might be more symmetrical with the back conversion:

d2 = sort(typecast(reshape(rot90(d0),[],1), 'uint32'));

Bruno
