Analyzing data from dataset structures using kruskalwallis function and grouping variable

Hi,
I'm having trouble determining the most elegant method for analyzing data from my dataset structure, and it may hinge on my (lack of) understanding of the 'grouping variable' I'm trying to apply when performing a 'kruskalwallis' function call OR the grouping variable may just be implemented in a clunky way.
The essence of the problem is this: I have a dataset with many rows and columns, and I want to be able to perform analysis (using anova1 or kruskalwallis) on any grouping of the data simply by specifying different groups (specifically, more than one group at a time) in my grouping variable when I invoke the specific analysis function. The problem is that different functions appear to respond differently to the same grouping variable, some accepting it, some rejecting it.
For example, if I call 'boxplot' with the following notation, referencing three of my columns ('A1', 'txt_length' and 'gxt'):
    boxplot(A_15_acute_chronic.A1,{A_15_acute_chronic.txt_length,A_15_acute_chronic.gxt},'notch','on');
... everything works fine. The boxplot shows results for 8 different groups (2 (from txt_length) x 4 (from gxt) = 8 total).
However, if I call 'kruskalwallis' with the following notation:
    [p,t,s] = kruskalwallis(A_15_acute_chronic.A1,{A_15_acute_chronic.txt_length,A_15_acute_chronic.gxt});
I get this error:
    Error using anova1 (line 80)
    X and GROUP must have the same length.
    Error in kruskalwallis (line 44)
    [p,anovatab,stats] = anova1('kruskalwallis', varargin{:});
Yet the grouping variables are the same between the calls, and they're taken from the same dataset as the observations so they're definitely the same length.
I know a clunky way of dealing with this problem, and that is to specify a new column/variable in the dataset which simply encodes the combination of the two lower variables, but this seems ridiculously inelegant and tedious. Can anyone tell me what I might be doing wrong?
Thanks for your time, Chris

Answers (2)

Peter Perkins on 12 Mar 2012
Chris, I think you should be using FRIEDMAN, not KRUSKALWALLIS. The latter is for a one-way test, and you're using two grouping variables.
  2 Comments
Chris McGraw on 12 Mar 2012
>Peter Perkins
Thanks for your reply. You may be correct about this, but running the 'friedman' function didn't work either, due to a seemingly different error:
    Undefined function 'gt' for input arguments of type 'cell'.
    Error in friedman (line 48)
    if (reps>1)
My main gripe is the fact that some grouping variable, G, which is a cell array containing two arrays of nominals, is considered valid input for one function ('boxplot') but not others ('kruskalwallis','friedman'), despite the fact that they all technically accept grouping variables.
Chris McGraw on 12 Mar 2012
Whoops, 'friedman' doesn't actually accept a grouping variable, which would explain it in that case, but 'kruskalwallis' does.



Peter Perkins on 12 Mar 2012
Kruskal-Wallis as I understand it is a one-way test. Accepting two grouping variables would not make sense. Nothing to do with MATLAB.
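If the goal really is a one-way comparison across all 8 combinations (as in the boxplot), one possible approach is to collapse the two factors into a single label per row and pass that as the group. The sketch below is only an illustration under that assumption: the data are synthetic, and the names A1, txtLength, gxt and combined are hypothetical stand-ins for the dataset columns. With nominal columns you would presumably convert them with cellstr first.

```matlab
% Synthetic stand-ins for the dataset columns (hypothetical data, not from the post).
rng(0);                                    % reproducible random numbers
n = 80;
A1 = randn(n, 1);                          % observations
txtLength = repmat({'short'; 'long'}, n/2, 1);                       % 2 levels
gxt = repmat({'g1'; 'g1'; 'g2'; 'g2'; 'g3'; 'g3'; 'g4'; 'g4'}, n/8, 1); % 4 levels

% Build one label per row that encodes the combination of both factors,
% e.g. 'short_g1'. This turns the two factors into a single one-way factor.
combined = strcat(txtLength, '_', gxt);

% One-way Kruskal-Wallis across the combined groups; 'off' suppresses the figures.
[p, tbl, stats] = kruskalwallis(A1, combined, 'off');
```

Note that this tests whether any of the 8 combined groups differ. It does not separate the effect of txt_length from the effect of gxt, which is the point made above about the test being one-way.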
Friedman's test is for balanced data, which is why the FRIEDMAN function doesn't have the same signature as, say, BOXPLOT or ANOVAN.
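To illustrate the different signature: FRIEDMAN takes a data matrix plus a replicate count rather than a grouping variable. A minimal sketch with synthetic data, assuming a balanced design in which the columns are the levels of one factor and each consecutive block of reps rows is one level of the other (X and reps here are hypothetical values, not taken from the dataset in the question):

```matlab
% Hypothetical balanced layout: 4 column levels, 2 row blocks, 5 replicates per block.
rng(0);                     % reproducible random numbers
reps = 5;                   % observations per row-block/column cell
X = randn(2 * reps, 4);     % rows 1-5: block 1, rows 6-10: block 2; columns: 4 levels

% Tests for column effects after adjusting for row (block) effects.
% The second argument is the numeric replicate count, not a grouping variable.
[p, tbl, stats] = friedman(X, reps, 'off');
```

Passing a cell array of grouping variables as the second argument is what leads to the 'gt' error quoted in the comments, since the function compares reps against 1.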
It does seem that the error messages you saw are not all that helpful though. I've made a note to look into improving them.
