Find average based on multiple columns of data structure

11 views (last 30 days)
I have data in the following format:
A B C
1 1 1 a
2 2 1 0
3 3 1 a
4 4 1 a
5 6 1 0
6 1 1 a
7 2 1 a
8 4 1 a
9 6 1 0
10 1 2 b
11 2 2 b
12 3 2 0
. . . .
. . . .
. . . .
99 1 20 c
100 2 20 0
101 6 20 c
I want to find the average of the values in Column-C for the unique values in Column-B for the unique values in Column-A. For example, average the non-zero values in Column-C (say, "a"), where the value of Column-B is "1" AND the value of Column-A is "1"; then do the same for Column-C values of "b" with Column-B values of "2" AND Column-A values of "1"; and so-forth and so-on through all the entries.
I have tried looping over the values in Column-A along with a loop over the values in Column-B, with no success. I have thought about "unique" or "foreach", but I have little experience with those functions, and I have not had time to attempt an implementation.
Thoughts?
  3 Comments
Image Analyst
Image Analyst on 8 Aug 2014
Please make it easy for us to help you, not hard . Many/most of us are waiting for you to upload code to generate such a matrix, table, or cell array before we try anything.
Paul Huter
Paul Huter on 9 Aug 2014
The data is coming from an Excel file, which I am reading in with 'xlsread':
[data raw text] = xlsread(FILE)
I am then trying to work with 'data', and tried:
for i = 1:length(data(:,1))
for j = 1:length(data(:,2))
...
end
end
But that did not work. I thought about unique or foreach(unique), but I am not comfortable enough with their syntax and the HELP in Matlab really did not help all that much. I know I have used code, before, which used those functions, so it may be what I need to do, but assistance with coding it up would be appreciated (just getting me started would be great).
I realized that I had an error in my original post. Looking at it again, I would like to average the following (row-column):
1-C and 6-C and other Column-A = 1 and Column-B = 1
2-C and 7-C and other Column-A = 2 and Column-B = 1
3-C and other Column-A = 3 and Column-B = 1
4-C and 8-C and other Column-A = 4 and Column-B = 1
5-C and 9-C and other Column-A = 6 and Column-B = 1
10-C and other Column-A = 1 and Column-B = 2
11-C and other Column-A = 2 and Column-B = 2
12-C and other Column-A = 3 and column-B = 2
...etc
Any pointers, code snippets, or full functions are appreciated.
Thank you.

Sign in to comment.

Answers (1)

Nir Rattner
Nir Rattner on 8 Aug 2014
Assuming your data is stored in a matrix, you can use the "unique" and "accumarray" functions. First, it seems that you want to remove all rows where the the third column is equal to zero. Next, you can pull the unique row entries considering only the first two columns to get your subscript for the "accumarray" function. Finally, you can use the "accumarray" function to take the average grouped by your unique subscripts:
M(M(:, 3) == 0, :) = [];
[i,~,j] = unique(M(:, [1,2]), 'rows');
[i, accumarray(j, M(:, 3), [], @mean)]

Categories

Find more on Large Files and Big Data in Help Center and File Exchange

Tags

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!