How do I do regression for multiple subjects?
Hi
This is a stats question about regression analysis.
I have data for 19 subjects who each did 8 blocks of a task.
I want to see whether the data increase "significantly" across blocks.
Which option is correct (statistically speaking)?
They give different results!
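Option 1: Fit a line to each subject, then run a t-test on the slopes (the rate-of-change coefficients)
coeff = zeros(19,1); % one slope per subject
for s = 1:19
    mdl = fitlm((1:8)', data(s,:)'); % block number vs. scores of subject s, as column vectors
    coeff(s) = mdl.Coefficients.Estimate(2); % row 2 of the coefficient table is the slope
end
[h,p,ci,stats] = ttest(coeff) % tests whether the mean slope differs from zero
p = 0.1030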
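Option 2: Average across all subjects, then run fitlm on the group average and use the statistics from the fitted LinearModel
M = mean(data); % mean across subjects for each block (1-by-8)
mdl = fitlm((1:8)', M'); % block number vs. group mean, as column vectors
p = mdl.Coefficients.pValue(2) % p-value of the slope
p = 0.155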
thanks
pb
Answers (2)
Strictly speaking, you're supposed to decide on the hypothesis test before designing and running the experiment.
The question here is: what is the actual null hypothesis to test? If the hypothesis is that there is a difference in difficulty between tasks, then the better test would be Friedman. If the question is whether there's a trend in difficulty across tasks, that's different. Then it's a question of whether it's for the population as a whole or for each individual.
Without additional background on the test and objectives, I'd suggest Friedman.
data=readmatrix('data.txt');
stackedplot([data mean(data,2)])
friedman(data);
The result is pretty weak as far as significance goes, but you have no replications. As the bottom plot shows, there does appear to be a slight curvature in the average versus block, although I suspect it would be difficult to show significance (I didn't try).
If the Friedman test were to turn out significant, there are post-hoc tests that can then be used to identify which blocks specifically differ.
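A rough sketch of that two-step workflow, for illustration only: friedman can return a stats structure, and multcompare accepts it for pairwise comparisons of the mean ranks. The matrix below is synthetic placeholder data (19 subjects by 8 blocks), not the poster's data.txt, and the variable names are assumptions.

```matlab
% Synthetic stand-in for the 19-by-8 matrix (rows = subjects, columns = blocks).
rng(0);                                  % reproducible placeholder data
data = 10 + randn(19, 8);

% Friedman test with one observation per subject/block cell.
% 'off' suppresses the ANOVA-style table figure.
[p, tbl, stats] = friedman(data, 1, 'off');

% Pairwise post-hoc comparisons of the block mean ranks.
% Each row of c: [block i, block j, lower CI, difference, upper CI, p-value].
c = multcompare(stats, 'Display', 'off');
```

The post-hoc table is only worth interpreting when the overall Friedman p-value is small; with 8 blocks it contains 28 pairwise comparisons.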
I considered using friedman; however, there are no actual repetitions, considering that there are 19 subjects and 8 tasks. However, the t-test is inappropriate, since the data appear not to be normally distributed. If none of the task scores can be zero or negative, then the lognormal distribution may be more appropriate.
A1 = readmatrix('data.txt')
figure
plot((1:size(A1,2)), A1)
grid
xlabel('Task')
ylabel('Subject Score')
legend(compose('Subject #%2d', 1:size(A1,1)), Location='northoutside', NumColumns=4)
xlim('padded')
figure
boxchart(A1, Notch='on')
xlabel('Task')
ylabel('Subject Score')
figure
boxplot(A1, Notch='on')
xlabel('Task')
ylabel('Subject Score')
figure
tiledlayout(4,5)
for k = 1:size(A1,1)
nexttile
histfit(A1(k,:), 5, 'lognormal')
title(sprintf('Subject #%2d',k))
end
sgtitle('Lognormal Fit')
figure
tiledlayout(4,5)
for k = 1:size(A1,1)
nexttile
histfit(A1(k,:), 5, 'normal')
title(sprintf('Subject #%2d',k))
end
sgtitle('Normal Fit')
tmean = mean(A1) % Task Means
tmed = median(A1) % Task Medians
Looking at the median values (also displayed in the boxchart and boxplot figures), there does not appear to be any specific trend across tasks.
I am not certain what to suggest, however there does not appear to be anything significant across tasks in these results.
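One possible way to act on the lognormal idea, assuming every score is strictly positive, is to log-transform the scores before estimating per-subject slopes. This is only a sketch on synthetic placeholder data, not an analysis of data.txt. The signed-rank test is included as a nonparametric alternative to the t-test and is not something proposed elsewhere in this thread.

```matlab
% Synthetic, strictly positive stand-in for the 19-by-8 score matrix.
rng(1);
A1 = exp(1 + 0.3*randn(19, 8));

logA = log(A1);                          % valid only if all scores are > 0
x = (1:size(logA, 2))';                  % block/task index as a column vector

slopes = zeros(size(logA, 1), 1);
for s = 1:size(logA, 1)
    b = polyfit(x, logA(s, :)', 1);      % straight-line fit; b(1) is the slope
    slopes(s) = b(1);
end

[h, pT] = ttest(slopes);                 % t-test: is the mean log-scale slope zero?
pSR = signrank(slopes);                  % signed-rank test: is the median slope zero?
```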
6 Comments
You have 19 subjects each of whom did eight different tasks once. That's not replication unless some of the eight tasks are the same task presented again?
I don't think regression for slope is meaningful here; the task numbers aren't a suitable predictor variable.
I'm not certain what your original design or intent is.
There doesn't appear to be any specific trend in the median values for each task block.
A1 = readmatrix('data.txt');
tmed = median(A1)
mdl = fitlm(1:numel(tmed), tmed)
figure
plot(mdl)
grid
dpb
7 minutes ago
The null hypothesis is yet to be clearly stated. If the "block" is considered an ordinal sequence number and the hypothesis is that there is an overall increase in duration with time/repetition, then the question is whether the conclusion is to be drawn for the population as a whole or for any specific individual. Presuming the population, the answer is obvious: use a measure of the durations of the population. Given @Star Strider's observation that the data are not normally distributed, the median may be a better choice, or apply a Box-Cox transformation first.
If one were to do each individually, one has the issue of deriving a suitable correction for the problem of significance inflation from multiple comparisons.
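As an illustration of one such correction, the sketch below applies the Holm step-down adjustment to the 19 per-subject slope p-values. Holm is just one common choice (Bonferroni would be simpler and more conservative). The data are synthetic placeholders, and the variable names are assumptions.

```matlab
% Synthetic stand-in for the 19-by-8 matrix (rows = subjects, columns = blocks).
rng(2);
data = 10 + randn(19, 8);

x = (1:size(data, 2))';                  % block index as a column vector
m = size(data, 1);                       % number of subjects = number of tests
pRaw = zeros(m, 1);
for s = 1:m
    mdl = fitlm(x, data(s, :)');         % per-subject linear fit
    pRaw(s) = mdl.Coefficients.pValue(2);  % p-value of that subject's slope
end

% Holm step-down: multiply the i-th smallest p-value by (m - i + 1),
% enforce monotonicity with a running maximum, and cap at 1.
[pSorted, order] = sort(pRaw);
pAdjSorted = min(1, cummax((m:-1:1)' .* pSorted));
pHolm = zeros(m, 1);
pHolm(order) = pAdjSorted;               % back to the original subject order

significant = pHolm < 0.05;              % subjects whose slope survives correction
```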
Pat
8 minutes ago