# Nonlinear regression with categorical predictor?

wesleynotwise
on 24 May 2017

Commented: wesleynotwise
on 25 May 2017

Suppose I have the following equation

y = (k1|x1)*(k2*x2)*(k3*x3^(1/k4))

where k1 is a coefficient that depends on the variable x1. Both k2 and k3 are also coefficients, and k4 is a power term. None of the ks are known to me.

x1 is categorical, say adult male, young male, adult female and young female, and x2 and x3 are numeric, say x2 = weight 75kg, 62kg, 89kg... and x3 = height 180 cm, 172 cm, 170 cm...

Does anyone know how to perform a regression on such a combination of data to find all the ks? Eventually the model should have, for example, two values for k1: if x1 = male, k1 = 2.5; if x1 = female, k1 = 1.5.

##### 2 Comments

the cyclist
on 24 May 2017

I haven't thought about how to model this whole thing, but the term

k1*x1

is problematic, I think, when x1 is categorical. For example, what does "6 times male" mean?

Since you didn't mention it explicitly, I assume that x2 and x3 are interval data?

### Accepted Answer

Michelangelo Ricciulli
on 24 May 2017

Ok, a very simple way to do it is the following.

You can use the function fminunc, which finds the minimum of a function. Which function? You want your model to predict the y values well, so what you want to minimize is the mean squared error between y and the model. Let's define this function as errFunc, depending on a vector param:

errFunc = @(param) mean((y - k1(x1).*(param(1)*x2).*(param(2)*x3.^(1/param(3)))).^2);

This works if you already have the data x1, x2, x3 and y in your workspace as vectors of the same size, and also a function k1 that returns the right coefficient based on x1. Note the element-wise operators (.* and .^), which are needed because the data are vectors.

Then, you just need to call fminunc with the function you just created and an initial guess for the 3 values you are searching for (let's just use random numbers as the guess):

fminunc(errFunc,randn(3,1))

This will output the value of param you are searching for.

You'll probably need to add another coefficient, let's call it param(4), that is added to your model as an offset to better fit the data.
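As a rough illustration of the steps above, here is a minimal sketch on synthetic data. Everything about the data is an assumption made for the example: the group coding (1 = male, 2 = female), the lookup table k1tab, the "true" coefficients used to generate y, and the starting point p0. It needs the Optimization Toolbox for fminunc.

```matlab
% Synthetic data (hypothetical values, for illustration only)
rng(0);                               % reproducible random numbers
n  = 200;
x1 = randi(2, n, 1);                  % assumed coding: 1 = male, 2 = female
x2 = 60 + 30*rand(n, 1);              % weight in kg
x3 = 150 + 40*rand(n, 1);             % height in cm

k1tab = [2.5; 1.5];                   % assumed known coefficient per group
k1    = @(g) k1tab(g);                % lookup: group index -> coefficient
y     = k1(x1) .* (0.8*x2) .* (1.2*x3.^(1/2));   % noise-free "truth"

% Mean squared error between the data and the model (element-wise operators)
errFunc = @(p) mean((y - k1(x1) .* (p(1)*x2) .* (p(2)*x3.^(1/p(3)))).^2);

% Minimize starting from a fixed guess instead of a random one
p0   = [1; 1; 1.5];
opts = optimoptions('fminunc', 'Display', 'off');
pHat = fminunc(errFunc, p0, opts);

fprintf('MSE at start: %g, MSE at solution: %g\n', errFunc(p0), errFunc(pHat));
```

A fixed starting point is used here because a random guess from randn can give a value of param(3) that is negative or close to zero, which makes x3.^(1/param(3)) vanish or blow up and can stall the optimizer.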

##### 12 Comments

Michelangelo Ricciulli
on 25 May 2017

Check the other answer, it makes a very good point. Sorry I didn't notice that before.

### More Answers (1)

Ilya
on 25 May 2017

Unless I misunderstood your dot notation, the problem is ill-defined. It has an infinite number of solutions. Rewrite it in this form:

y = ((k1|x1)*k2*k3) * x2*x3^(1/k4)

Observe that you can only fit for ((k1|x1)*k2*k3), but not for separate coefficients k1, k2 and k3.
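A small numeric check makes this concrete. In the sketch below the weights and heights are the ones quoted in the question, while the values of k1, k2, k3 and k4 are hypothetical. Doubling k2 and halving k3 leaves every prediction unchanged, so the data cannot tell the two parameter sets apart.

```matlab
x2 = [75; 62; 89];       % weights in kg (from the question)
x3 = [180; 172; 170];    % heights in cm (from the question)
k1 = 2.5;                % hypothetical coefficient for one category
k4 = 2;                  % hypothetical power term

yA = k1 * (0.8*x2) .* (1.2*x3.^(1/k4));   % k2 = 0.8, k3 = 1.2
yB = k1 * (1.6*x2) .* (0.6*x3.^(1/k4));   % k2 = 1.6, k3 = 0.6

% Largest relative difference between the two sets of predictions;
% it should be at the level of floating-point round-off
relDiff = max(abs(yA - yB) ./ abs(yA));
disp(relDiff)
```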

Generally, the best way to handle multiplicative models is to turn them into additive models by taking the log of both sides. If you do that, you get

q = (c1|x1) + c4*z3

where

q = log(y) - log(x2)

c1 = log(k123|x1)

k123|x1 = (k1|x1)*k2*k3

c4 = 1/k4

z3 = log(x3)

This model can be easily fitted by fitlme. (If you have only two levels in x1, you can easily get away without fitlme.) The formula would be something like 'q ~ (1|x1) + z3'.
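The log-transformed fit could be sketched as follows on synthetic data. The category names come from the question; the combined coefficients k123, the value of k4, the noise level and the sample size are all made up for the example. It needs the Statistics and Machine Learning Toolbox for fitlme.

```matlab
% Synthetic data (hypothetical values, for illustration only)
rng(1);
n      = 400;
g      = randi(4, n, 1);                               % group index 1..4
levels = {'adult male', 'young male', 'adult female', 'young female'};
x1     = categorical(g, 1:4, levels);
x2     = 60 + 30*rand(n, 1);                           % weight in kg
x3     = 150 + 40*rand(n, 1);                          % height in cm

k123 = [2.5; 2.0; 1.5; 1.2];    % assumed combined coefficient per group
k4   = 2;                       % assumed power term
y    = k123(g) .* x2 .* x3.^(1/k4) .* exp(0.01*randn(n, 1));  % multiplicative noise

% Transform to the additive model q = (c1|x1) + c4*z3
tbl = table(log(y) - log(x2), log(x3), x1, ...
            'VariableNames', {'q', 'z3', 'x1'});

% Random intercept per level of x1, fixed slope on z3
lme  = fitlme(tbl, 'q ~ z3 + (1|x1)');
beta = fixedEffects(lme);       % [overall intercept; c4]
b    = randomEffects(lme);      % one intercept offset per level of x1

k4Hat   = 1/beta(2);            % back-transform the slope: k4 = 1/c4
k123Hat = exp(beta(1) + b);     % back-transform the per-group intercepts
```

With only a few levels in x1, the group intercepts can instead be treated as fixed effects, for example with fitlm(tbl, 'q ~ x1 + z3'); that is a different model specification from the fitlme formula above, mentioned here only as an alternative.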

##### 3 Comments

Ilya
on 25 May 2017
