Finding 'duplicate' rows which have different values in the same pattern

Question

Anthony on 13 Nov 2014

0
Link

Direct link to this question

https://www.mathworks.com/matlabcentral/answers/162609-finding-duplicate-rows-which-have-different-values-in-the-same-pattern

Commented: Guillaume on 14 Nov 2014

Open in MATLAB Online

For my purposes,

[1 2 2 3 1] is equivalent to

[2 3 3 1 2].

What can I do to have MATLAB recognize and eliminate these 'duplicates?'

Physically, I have 17 populations that I want to group into proportionate voting districts. One population can be 'split' to elect two representatives, and populations can be combined to elect one or more representatives, but a split population cannot be combined with any other population.

I'm using RAND to create all possible groupings, storing unique groupings in C. Unfortunately, C grows too large and I run out of memory. I can reduce the storage requirement by storing only "useful" combinations (some are very disproportionate, and I could check and drop those possibilities within the loop). However, a bigger memory problem is that one type of 'duplicate' is not recognized by UNIQUE, that is, rows that match in their pattern but with entries of different values.

I later plan to sort the list by standard deviation in population of each group, and to evaluate the top 50 or so possibilities on qualitative factors.

I would also appreciate any big-picture ideas you have for solving my problem, particularly for accommodating the possibility that populations or groups of populations could be permitted to elect more than one representative.

I initially tried to tackle this problem as a methodical, iterative grouping, but switched to the random approach after realizing the scope of the possibilities because I only need to run this code once and my coding time is limited while my processing time is not.

C = zeros(0,17);
i = 0;
while i ~= 3
    sizea = size(C,1);
    blockrows = ceil(rand(100000,17).*17);
    C = unique([C;blockrows],'rows');
    sizeb = size(C,1);
    if sizeb-sizea == 0
        i = i+1;
    else
        i = 0;
    end
end

I am in R2012A, but can access a lab with R2014a if beneficial.

0 Comments
Show -2 older commentsHide -2 older comments

Sign in to comment.

Sign in to answer this question.

Answer 1

Guillaume on 13 Nov 2014

0
Link

Direct link to this answer

https://www.mathworks.com/matlabcentral/answers/162609-finding-duplicate-rows-which-have-different-values-in-the-same-pattern#answer_158895

Edited: Guillaume on 13 Nov 2014

Open in MATLAB Online

Why not sort each row before passing them to unique?

tempC = [C; blockrows];
[~, idx] = unique(sort(tempC, 2), 'rows');
C = tempC(idx, :);

4 Comments
Show 2 older commentsHide 2 older comments

Anthony on 13 Nov 2014

Open in MATLAB Online

The order of the columns encodes additional data, therefore it is the matching of numbers by index which distinguishes different rows, not the numbers themselves. Physically,

[1 1 1 1 1 1 2 2 2 2 2 2 3 3 3 3 3]

means "Populations 1-6 get put into Group 1, Populations 7-12 get put into Group 2, and Populations 13-17 get put into Group 3." This is important because the populations have unique characteristics.

A similar problem is picking teams for a three-way game so that the sum of each team's players' level of skill is as close together as possible. The full data set might look like:

['Angelo' 'Genna' 'Alaine' 'Colleen' 'Howard' 'Bennett' 'Dorene' 'Isaiah' 'Rena';
      70      89       74        58       90        79       80       67     74 ;
       1       2        1         1        3         2        3        1      1];

where the first row is a name, the second row is the level of skill, and the third row is the team each player is placed on. You can see that one team being composed of Genna and Bennett is the same thing whether that team is called "2" or "3."

This is why [1 2 2 3 1] is equivalent to [2 3 3 1 2], which are not equivalent to [1 1 2 2 3] or [1 2 2 3 3].

Guillaume on 14 Nov 2014

Open in MATLAB Online

Yes, sorry, I didn't pay enough attention that your original example wasn't just or reordering of numbers.

I believe my diff answer (with a bug corrected) does what you want though

[~, idx] = unique(mod(diff(C, 1, 2) - 1, 16), 'rows')
C = C(idx, :);

Sign in to comment.

Answer 2

Star Strider on 13 Nov 2014

0
Link

Direct link to this answer

https://www.mathworks.com/matlabcentral/answers/162609-finding-duplicate-rows-which-have-different-values-in-the-same-pattern#answer_158896

I am not certain what you’re doing, but consider using perms or one of its friends (links at the end of the documentation page) to generate your permutations.

2 Comments
Show NoneHide None

Anthony on 13 Nov 2014

Thanks for your reply. I've just tried to use function ALLCOMB from the file exchange, and exceed memory capacity with just three groups. There are 131,072 possibilities with two groups. I suspect that I will similarly exceed memory capacity for any function which tries to take the problem on all at once. I'll do some thinking on how to segment permutation problems into manageable chunks.

Star Strider on 14 Nov 2014

My pleasure!

I still don’t understand what you want to do, but something like this may yield to a graphic approach. I’m thinking of voronoi, inpolygon and related functions.

Sign in to comment.

Finding 'duplicate' rows which have different values in the same pattern

0 Comments
Show -2 older commentsHide -2 older comments

Answers (2)

4 Comments
Show 2 older commentsHide 2 older comments

2 Comments
Show NoneHide None

See Also

Categories

Tags

Products

Community Treasure Hunt

Finding 'duplicate' rows which have different values in the same pattern

0 Comments Show -2 older commentsHide -2 older comments

Answers (2)

4 Comments Show 2 older commentsHide 2 older comments

2 Comments Show NoneHide None

See Also

Categories

Tags

Products

Community Treasure Hunt

0 Comments
Show -2 older commentsHide -2 older comments

4 Comments
Show 2 older commentsHide 2 older comments

2 Comments
Show NoneHide None