How to estimate Y using a linear regression and an uncertain X value.

Question

Samuel on 10 Nov 2023

0
Link

Direct link to this question

https://www.mathworks.com/matlabcentral/answers/2045810-how-to-estimate-y-using-a-linear-regression-and-an-uncertain-x-value

Commented: Star Strider on 18 Nov 2023

I have a training dataset that I'd like to perform regression upon so that I can use the resulting regression to estimate values of Y for other samples for which I only have X. This is standard, using polyfit and polyval, however I have uncertainties in my new X value, and polyval doesn't seem to be able to handle uncertainties in X.

More detail: In my training set I have 100 paired observations of X and Y (in my case, these are observations of Aluminium content and Calcium content of 100 rock samples). These samples don't have any uncertainty, for ease. A linear regression with uncertainty is simple enough:

%Synthetic data for example
X_data = 1:0.25:25;
Y_data = -0.3*X_data + 2*randn(1,100);
%Create linear regression on data
[p, S] = polyfit(X_data, Y_data, 1);

If I now wanted to estimate the Y value in a new sample, if I had it's X value, I would use polyval. However, my new X value (Xnew) has uncertainty, such that Xnew has a mean of 12 and an stddev of 3. I cannot see any way to incorporate the Xnew uncertainty into polyval.

How can I evaluate Y whilst accounting for both the uncertainty in the linear regression and the uncertainty in my new X value?

0 Comments
Show -2 older commentsHide -2 older comments

Sign in to comment.

Sign in to answer this question.

Answer 1

Star Strider on 10 Nov 2023

0
Link

Direct link to this answer

https://www.mathworks.com/matlabcentral/answers/2045810-how-to-estimate-y-using-a-linear-regression-and-an-uncertain-x-value#answer_1350880

If I understand corectly what yoiu want to do, the best approach might be to use fitlm and then use the appropriate predict function (I beleive that is the correct function, however MATLAB will automatically choose the correct one, so no worries) since it will incorporate the uncertainties in the parameters and produce a ‘Y’ value and an associated confidence interval for each value of ‘X’ that you provide. (There are ways to calculate the parameter uncertainties for polyfit — I wrote one and need to post the update — as well as a way of calculating the prediction confidence intervals, one of which I also wrote and does not need to be updated.) Those will work, however fitlm and its predict function are easier in this instance.

As for accounting for the uncertainties in your new ‘X’ value, I would just use a range of each of the ‘X’ values (either the ±95% coinfidence interval extremes as well as the mean, or just the extremes, or something else that makes sense in the context of your calculations). Use those as your arguments to predict.

3 Comments
Show 1 older commentHide 1 older comment

Samuel on 17 Nov 2023

Open in MATLAB Online

I decided to write a code to do what I needed. I might be missing a trick however. I'll put my thought process and then the code below:

Polyval can only run with deterministic values of X. So, randomly sample from uncertain X value I want to evaluate (a normal distribution with X_mu and X_sigma). Run polyval on all the X values.
For each X value evaluated with Polyval, polyval returns a Y_mean and Y_sigma. I randomly sample some number of counts from each of these Y distributions.
I fit a normal distribution to the combined set of all random samples. If in step 1 I randomly sample from X 200 times, and then in step 2, for each 200 distributions of Y, sample 4000 times, the set of samples I am fitting a distribution to in step 3 has 200*4000 samples, capturing both the variability that X can take and the variability of each value of Y.

This returns a mean and 1sigma uncertainty for Y that accounts for both the uncertainty in X and in the linear regression.

My function is below.

function[mu_final, sigma_final] = polyval_wxerr2(p, S, xcal_mu, xcal_sigma, numreps, numsamples)
% p = 1st output of polyfit (linear regression)
% S = 2nd output of polyfit (linear regression)
% xcal_mu = Mean of X to calibrate
% xcal_sigma = Sigma of X to Calibrate
% numreps = How many times to sample from X distribution (good value is
% 500-1000)
% numsamples = How many samples from each Y distribution to contribute to
% final (good value is 10,000)
%% 1. Run Polyval numreps times, each time using a sampled X value
%Randomly sample from the x dist
x_i = normrnd(xcal_mu, xcal_sigma, numreps, 1);
%Return the y distributions for each x value
[y_i, delta_i] = polyval(p,x_i, S);
%% 2. Take Random Samples (#Samples = numsamples) from Each Y distribution (#Y distributions = numreps)
pd_randsamps = zeros(numreps, numsamples);
for i = 1:numreps
    pd_randsamps(i,:) = y_i(i) + delta_i(i).*randn(1,numsamples);
end
%% 3. Fit a distribution to all the resulting samples (Final # samples = numreps*numsamples)
pd_final = fitdist(pd_randsamps(:), 'Normal');
mu_final = pd_final.mu;
sigma_final = pd_final.sigma;
end

Star Strider on 18 Nov 2023

That seems similar to the bootstrp approach. If you have the Statistics and Machine Learning Toolbox, you might consider it. Otherwise, polyfit can return a covariance matrix as one of its ‘S’ outputs, although I am not certain how that fits with what you want to do.

Sign in to comment.

Answer 2

the cyclist on 11 Nov 2023

0
Link

Direct link to this answer

https://www.mathworks.com/matlabcentral/answers/2045810-how-to-estimate-y-using-a-linear-regression-and-an-uncertain-x-value#answer_1350930

This is not exactly what you have asked for, but you could also look into doing the first regression using an errors-in-variables model, e.g. total least squares, which will take into account uncertainty in both explanatory and response variables.

If your real application has just one explanatory variable, then this reduces to a Deming regression. To my knowledge, there is no native MATLAB implementation of this model, but there is at least one user-contributed File Exchange submission, e.g. this one.

0 Comments
Show -2 older commentsHide -2 older comments

Sign in to comment.

How to estimate Y using a linear regression and an uncertain X value.

0 Comments
Show -2 older commentsHide -2 older comments

Answers (2)

3 Comments
Show 1 older commentHide 1 older comment

0 Comments
Show -2 older commentsHide -2 older comments

See Also

Categories

Tags

Products

Release

Community Treasure Hunt

How to estimate Y using a linear regression and an uncertain X value.

0 Comments Show -2 older commentsHide -2 older comments

Answers (2)

3 Comments Show 1 older commentHide 1 older comment

0 Comments Show -2 older commentsHide -2 older comments

See Also

Categories

Tags

Products

Release

Community Treasure Hunt

0 Comments
Show -2 older commentsHide -2 older comments

3 Comments
Show 1 older commentHide 1 older comment

0 Comments
Show -2 older commentsHide -2 older comments