# Extracting variables from a table column using information from an other column

4 views (last 30 days)
Deena Mobbs on 4 Feb 2020
Commented: Deena Mobbs on 23 Feb 2020
Hello,
Suppose I have a table
x = ['a';'b';'a','c'];
var_a = [1:4]';
var_b = [6:9]';
var_c = [13:16]';
T = table(x,var_a,var_b,var_c)
I want to create a new column, let's call it y that uses the value in column x to choose from the other columns. For example in row 1,
x(1) = 'a', therefore y(1) = var_a(1)
x(2) = 'b', therefore y(2) = var_b(2)
My final table would look like this:
x var_a var_b var_c Y
_ _____ _____ _____ __
a 1 6 13 1
b 2 7 14 7
a 3 8 15 3
c 4 9 16 16
My actual table is many thousands of rows so I have been trying to solve this using rowfun or varfun but an added complication is that I don't know the variable names in advance. The columns will be 'var_' but the 'a','b','c' are user supplied codes.
Using rowfun? The logic is .. for each row, get the value from column x, make a string 'var_?', find which column is called 'var_?', set y = the value in column 'var_?'.
% find the columns called var_
col = cellfun(@isempty, regexp(T.Properties.VariableNames, 'var.'));
% for row 1, find which column is the right one for the value in x
col2 = cellfun(@isempty, regexp(T.Properties.VariableNames(~col), strcat('var_', T.x(1))))
% set y to the right value
y = max(~col2.*table2array(T(1,~col)));
This works for each row but I don't know how to built this into a rowfun statement that will work on the table. Or am I overcomplicating things!? Is there an easier way to do this?

Sai Bhargav Avula on 21 Feb 2020
Hi,
rowfun is a powerful function when handling tables. But in your case it might require some extra processing to work with rowfun function.
The result which you are trying to produce can be easily be achieved without the use of rowfun(even though rowfun might bring generalization, I believe that is not needed in this case because it increases the complexity). You mentioned the table T have several rows but the columns are limited to var_*(a,b,c). In that case you try the below given code for achieving the intended result
col_a = find(~cellfun(@isempty, regexp(T.Properties.VariableNames, 'var_a')));
col_b = find(~cellfun(@isempty, regexp(T.Properties.VariableNames, 'var_b')));
col_c = find(~cellfun(@isempty, regexp(T.Properties.VariableNames, 'var_c')));
col_x = find(~cellfun(@isempty, regexp(T.Properties.VariableNames, 'x')));
rows_a = find(T.x == 'a');
rows_b = find(T.x == 'b');
rows_c = find(T.x == 'c');
T.Y((rows_a(:))) = T{rows_a(:),col_a};
T.Y((rows_b(:))) = T{rows_b(:),col_b};
T.Y((rows_c(:))) = T{rows_c(:),col_c};
Hope this helps !
Deena Mobbs on 23 Feb 2020
Genius! Thanks. This works a treat and is very fast too.
I can also now see how to use the same row/column logic on other parts of my code, so this is really helpful. Thanks again.