Duplicate and delete genotypes from population to obtain a specific phenotypic distribution

Question

Gil Henriques on 28 Apr 2015


I have a list of many individuals, each characterized by 6 loci (numbers), organized as follows (example):

 %      Maternal        Paternal
 %    x1   x2   M      x1  x2   M 
 G = [0.1  0.2 0.3    -0.2  1.2 0.1;
      1.0 -0.2 0.2     0.0  0.2 0.2;
      1.0 -0.2 0.2     0.0  0.2 0.2;
      0.5  1.0 0.4     0.5 -1.0 0.3;
      ...  ... ...     ... ... ...]

Each row is the genotype of an individual. The order of the loci is as follows: x1 (maternal), x2 (maternal), M (maternal) followed by x1 (paternal), x2 (paternal), M (paternal).

As you can see, individuals #2 and #3 have the same genotype; I can easily obtain the previous matrix in "short" format if it is more convenient (using unique and accumarray), that is:

 %            Maternal         Paternal    Counts
 %           x1   x2   M      x1  x2   M  
 Gunique = [0.1  0.2 0.3    -0.2  1.2 0.1    1;
            1.0 -0.2 0.2     0.0  0.2 0.2    2;
            0.5  1.0 0.4     0.5 -1.0 0.3    1;
            ...  ... ...     ... ... ...   ...]

(Where the seventh column is a "count" column showing how many individuals with that genotype exist in the population.)
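For reference, the conversion to "short" format could look roughly like the sketch below. The four rows of G are illustrative values chosen to match the Gunique example above (the elided rows are left out), and the variable names uniqueRows, idx and counts are placeholders of mine. Note that unique compares rows by exact floating-point equality, so genotypes that differ only by rounding error would be counted separately.

```matlab
% Illustrative data matching the Gunique example (elided rows omitted).
G = [0.1  0.2 0.3  -0.2  1.2 0.1;
     1.0 -0.2 0.2   0.0  0.2 0.2;
     1.0 -0.2 0.2   0.0  0.2 0.2;
     0.5  1.0 0.4   0.5 -1.0 0.3];

% 'stable' keeps genotypes in order of first appearance;
% idx maps every row of G to its row in uniqueRows.
[uniqueRows, ~, idx] = unique(G, 'rows', 'stable');

% Count how many individuals carry each unique genotype.
counts = accumarray(idx, 1);

% Six locus columns followed by the count column.
Gunique = [uniqueRows, counts];
```

The third output of unique is what makes the count possible: it is also the mapping needed to go back from "short" to "long" format, since uniqueRows(idx, :) reproduces G.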

I can also easily obtain the phenotypes that exist in this population. For the current purposes, a phenotype is simply the average of each maternal trait (x1 and x2) with the corresponding paternal trait. M is not relevant for determining the phenotype.

 %               x1   x2  Counts
 Phenotypes = [-0.05 0.7    1;
                0.5  0.0    3;
                ...  ...  ...]

So in this case, individuals #2, #3, and #4 from matrix G all have the same phenotype. If necessary, it's also easy for me to get the Phenotypes matrix in "long" format (no "counts" column, repeated rows instead), in such a way that the length of "long" Phenotypes equals the length of G.
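A minimal sketch of how the phenotypes can be derived, in both "long" and "short" format. It assumes the column order described earlier (maternal x1, x2 in columns 1 and 2, paternal x1, x2 in columns 4 and 5) and uses illustrative rows that match the Gunique example; P, uniquePhen and phenIdx are placeholder names. As before, grouping relies on exact floating-point equality, so real data may need rounding first.

```matlab
% Illustrative genotypes (elided rows omitted).
G = [0.1  0.2 0.3  -0.2  1.2 0.1;
     1.0 -0.2 0.2   0.0  0.2 0.2;
     1.0 -0.2 0.2   0.0  0.2 0.2;
     0.5  1.0 0.4   0.5 -1.0 0.3];

% "Long" format: one phenotype row per individual.
% Average maternal (columns 1:2) and paternal (columns 4:5) traits; M is ignored.
P = (G(:, 1:2) + G(:, 4:5)) / 2;

% "Short" format: unique phenotypes plus a count column.
% phenIdx(i) is the phenotype row that individual i belongs to.
[uniquePhen, ~, phenIdx] = unique(P, 'rows', 'stable');
Phenotypes = [uniquePhen, accumarray(phenIdx, 1)];
```

The vector phenIdx is the useful by-product here, because it links every row of G to a row of Phenotypes.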

That is the background; I hope everything is clear so far.

*

So here is the problem: after applying a given function, I get a column telling me how many individuals with each existing phenotype I should get in the next generation. I might get, for instance, this column:

NewGenerationCounts = [2; 2; ...];

This would mean that the phenotype [-0.05 0.7] would increase from 1 individual to 2 individuals. Analogously, one of the 3 individuals with phenotype [0.5 0.0] (one of them at random) would be removed from my list. And so on.
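To make the target concrete, here is a rough loop-based sketch of the bookkeeping, not a definitive implementation. It rests on several assumptions: NewGenerationCounts is ordered like the rows of Phenotypes (order of first appearance), shrinking a phenotype removes individuals at random, and growing a phenotype duplicates randomly chosen individuals with that phenotype (that last rule is my own guess, since only removal is specified above). The names keep, members, chosen and Gnew are placeholders.

```matlab
% Illustrative genotypes and target counts (elided rows omitted).
G = [0.1  0.2 0.3  -0.2  1.2 0.1;
     1.0 -0.2 0.2   0.0  0.2 0.2;
     1.0 -0.2 0.2   0.0  0.2 0.2;
     0.5  1.0 0.4   0.5 -1.0 0.3];
NewGenerationCounts = [2; 2];   % one entry per phenotype, same order as Phenotypes

% Map each individual to its phenotype (exact floating-point comparison).
P = (G(:, 1:2) + G(:, 4:5)) / 2;
[~, ~, phenIdx] = unique(P, 'rows', 'stable');

keep = [];   % row indices of G that form the next generation
for k = 1:numel(NewGenerationCounts)
    members = find(phenIdx == k);   % individuals with phenotype k
    nOld = numel(members);
    nNew = NewGenerationCounts(k);
    if nNew <= nOld
        % Shrink: keep a random subset, sampled without replacement.
        chosen = members(randperm(nOld, nNew));
    else
        % Grow (assumed rule): keep everyone, then duplicate random
        % members, sampled with replacement.
        extra = members(randi(nOld, nNew - nOld, 1));
        chosen = [members; extra];
    end
    keep = [keep; chosen(:)];
end

Gnew = G(keep, :);   % duplicated rows appear more than once in keep
```

With the example values, the single individual with phenotype [-0.05 0.7] appears twice in Gnew, and two of the three individuals with phenotype [0.5 0.0] survive.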

So, given all this, how can I obtain the new matrix G, in which some rows are removed and others are duplicated, such that the phenotypic distribution of the population matches NewGenerationCounts?

I hope to have made myself clear. I realize this probably requires a loop, and I will be very glad if you can help me using one, but I will be even happier if no loops are necessary. I appreciate any help you can give me! :)