Is there an efficient way to sort specific ranges/subsections of rows independently in a table by a particular column?

I want to sort sets of rows independently within a table e.g. a table such as
A 5
C 3
D 1
B 4
E 2
B 3
C 1
A 2
such that the first 5 rows and the last 3 rows are sorted separately in ascending order by column 2 so that the resulting table produced would be
D 1
E 2
C 3
B 4
A 5
C 1
A 2
B 3
Using nested loops is highly inefficient and I've been unsuccessful using sortrows(). Thanks for any suggestions.

2 Comments

Is there any sort of logic to where the splits occur? Based on what you've stated, it seems like the dividing was done arbitrarily. If you can articulate the criteria, we might be able to help come up with a solution.
Sorry that I left out the critical point on the splits. They are identified by another column that uniquely identifies all the rows that are grouped together. So for example from the previous example I gave, there is another column with a unique identifier (e.g in column 3) that determines all the rows of a subgroup. There is no overlap between subgroups, but beyond that the number of rows in any subgroup is random.
C 3 X
D 1 X
B 4 X
E 2 X
B 3 Y
C 1 Y
A 2 Y
and the desired output is then for this example is of the form
D 1 X
E 2 X
C 3 X
B 4 X
A 5 X
C 1 Y
A 2 Y
B 3 Y

Sign in to comment.

 Accepted Answer

splitapply() provided that you do not mind that disconnected regions with the same identifier code will be brought together

10 Comments

Thanks. I've been trying to use splitapply() (along with findgroups() and sortrows()) but haven't been able to make that work yet.
v1 = {'C'; 'D'; 'B'; 'E'; 'B'; 'C'; 'A'; 'B'};
v2 = [3; 1; 4; 2; 2; 3; 1; 2];
v3 = {'X'; 'X'; 'X'; 'X'; 'Y'; 'Y'; 'Y'; 'Y'};
T = table(v1, v2, v3);
G = findgroups(T(:,3));
temp = splitapply(@(varargin) {sortrows(table(varargin{:}),2)}, T, G);
newT = vertcat(temp{:});
Thanks very much. As I mentioned above though, the number of subgroups and entries in each is quite large, so specifying things a priori is impossible.
The order criteria in your newer description is not the same as the original question.
T = readtable('sample.xlsx', 'readvariablenames', false);
G = findgroups(T(:,1));
temp = splitapply(@(varargin) {sortrows(table(varargin{:}),3)}, T, G);
newT = vertcat(temp{:});
Sorry for the confusion. This works fine. Thank you very much.
Hello Walter,
I have a similar type of query. How to solve the above problem if I want to seggregate table T on the basis of v3 but dont want to change the order of v1 and v2.
Can you please suggest me the solution.
I am not clear as to what you are asking for.
splitapply() works roughly like,
for Gr = reshape(unique(G), 1, [])
subset = TheTable(G == Gr, :);
process subset
end
The entries in the table will not be reordered (though you should not count on that, really), but the subset will have all of the rows in the table that had the same grouping value.
If your table had something like
row1 group1
row2 group1
row3 group2
row4 group1
row5 group1
and if you do not want row4 and row5 to be processed together with row1 and row2 because of the gap, then you are not using the correct grouping variable values.
I am really sorry for not putting my question properly. What I meant is, if my data looks like below.
v1 = {'C'; 'D'; 'B'; 'E'; 'B'; 'C'; 'A'};
v2 = [3; 1; 4; 2; 3; 1; 2];
v3 = [1; 1; 1; 1; 2; 2; 2];
I want ouput to be
C 3 1 B 3 2
D 1 1 C 1 2
B 4 1 A 2 2
E 2 1
Means I simply want to cut the table on the basis of v3 without 'sorting' the rows of the table and store the output in different cells.
I have a very large dataset of similar type.
Thank you so much.
splitapply will not sort the entries. Use a function that returns a cell array around the inputs that you are given.
Oh I see, you are confused by the line of code that I had given other person,
temp = splitapply(@(varargin) {sortrows(table(varargin{:}),2)}, T, G);
Their particular request was to sort values, so there had to be a sorting call for their purposes. For your purposes just leave out the sorting step
temp = splitapply(@(varargin) {table(varargin{:})}, T, G);
Thank you so much.
It seems I am doing something wrong then, because I am using the same code, I will look into my data again.
But thank you so much for the help.

Sign in to comment.

More Answers (1)

Took a stab. Here are my assumptions
  1. Groups do not overlap
  2. There are the same number of elements in each group with the exception of the final group
These allow me to create a new variable to represent the groups. I can then use sortrows to use the new group variable for the primary sort and your numeric variable for the secondary.
% Create your table
var1 = categorical(["A","C","D","B","E","B","C","A"])';
var2 = [5 3 1 4 2 3 1 2]';
T = table(var1,var2)
% Determine number of elements in a group and the number of groups in the table
cnt = length(unique(T.var1))
grps = ceil(height(T)/cnt)
% Create the grouping variable.
% Start by making an array with a row for each element in a group, and a column for each group
var3 = ones(cnt,1)*[1:grps];
% Use linear indexing to convert the array to a column vector
var3 = var3(:);
% Before adding to the table, make column same height as table
var3 = var3(1:height(T))
% Add new grouping variable to the table
T=addvars(T,var3)
% Sort using grouping variable as primary sort (3), and 2nd column for secondary sort(2)
T = sortrows(T,[3 2])
T =
'D' 1 1
'E' 2 1
'C' 3 1
'B' 4 1
'A' 5 1
'C' 1 2
'A' 2 2
'B' 3 2

3 Comments

Thanks for info. Unfortunately, the number of members of any group is arbitrary. Also the number of entries and subgroups of the actual data is quite large so its impossible to assign them a priori.
Attached is some actual sample data. The first row identifies all of the data from a single subgroup (all rows with the sam PIN number which are contiguous). I'm trying to order every row in each subgroup in ascending order according to column 3. I've tried using splitapply() but haven't been able to make that work (I couldn't even get the function to recognize the columns of data!). Any suggestions?
Assuming the same number of members in each group was an assumption I made to help create the grouping variable. If your data has a grouping variable then there can be a different number in each group.
With the dataset you shared, I'd do the following
opts = spreadsheetImportOptions('NumVariables',3);
opts = setvaropts(opts,3,'TreatAsMissing','<missing>'); % C232
opts = setvartype(opts,[1 2],'categorical');
opts = setvartype(opts,3,'double');
data = readtable('sample.xlsx',opts,'ReadVariableNames',false);
dataSorted = sortrows(data,[1,3]);
Note that the groups are placed in alphabetical order. Not sure if one of your design criteria was that they don't change order. If so, you can add the following after the readtable command but before the sortrows to preserve the order of the PIN numbers.
[~,ic,~] = unique(data.Var1);
data.Var1 = categorical(data.Var1,data.Var1(sort(ic)),'Ordinal',true);

Sign in to comment.

Categories

Asked:

on 31 Jan 2019

Commented:

PS
on 2 Mar 2020

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!