Optimization of unknown equation

Martin P on 3 Feb 2016
Edited: Walter Roberson on 4 Feb 2016
I have a matrix of inputs and a corresponding vector of outputs (f(x), where x is a 4x1 vector). I'd like to do an optimization of these inputs using fmincon, but can't figure out how to work around the lack of a defined f(x).

Answers (3)

John D'Errico on 3 Feb 2016
Edited: John D'Errico on 4 Feb 2016
You don't have an optimization problem, YET.
You DO have a modeling problem. You need to choose a model that will allow you to predict the outputs from the inputs. This is a problem of mathematical modeling. In the end, it MAY become an optimization problem, but not until you choose a model that represents the process.
For example, you MIGHT choose to use simple interpolation. (A virtue of interpolation is that it would require no optimization at all.) Sadly, this will fail, because you have a replicated row in the inputs array.
That is, there are two rows (the 4th and 5th rows) that are:
-1 -1 1 1
However, the 4th and 5th values of y are [0.74 2.20]. As it turns out, this is a huge amount of variation compared to the variation in the y vector: a difference of 1.46, nearly as large as the entire range of your data (0.55 to 2.21).
That suggests that your predictions, for virtually ANY model, will be pure and utter crapola. Effectively, the only data point that was provided twice is completely inconsistent. I'm sorry, but this is true. It appears that your process is not at all repeatable. That means that you will have a great deal of difficulty in estimating a useful model for the process. It also suggests that interpolation may not be terribly useful either.
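A replicated row like this can be found programmatically before any interpolation is attempted. The following is a minimal sketch using only base MATLAB; the data are copied from the asker's comment in this thread, and the variable names (groupIdx, repRows, spread) are illustrative rather than taken from the original posts.

```matlab
% Data as posted by the asker in this thread.
inputs = [-1 -1 -1 -1; -1 -1 -1  1; -1 -1  1 -1; -1 -1  1  1; -1 -1  1  1; ...
          -1  1 -1 -1; -1  1 -1  1; -1  1  1 -1; -1  1  1  1; ...
           1 -1 -1 -1;  1 -1 -1  1;  1 -1  1 -1;  1 -1  1  1; ...
           1  1 -1 -1;  1  1 -1  1;  1  1  1 -1;  1  1  1  1;  0  0  0  0];
outputs = [1.16; 1.17; 1.16; 0.74; 2.20; 0.70; 1.36; 0.58; 0.55; 0.90; ...
           0.76; 0.74; 0.58; 0.91; 0.74; 0.72; 0.58; 2.21];

% Group identical input rows; groupIdx(k) is the group number of row k.
[~, ~, groupIdx] = unique(inputs, 'rows', 'stable');
counts = accumarray(groupIdx, 1);            % occurrences of each distinct row

% Original row numbers that belong to a group occurring more than once.
repRows = find(ismember(groupIdx, find(counts > 1)));

% Spread of the outputs within the replicated rows (a single replicated
% group is assumed here, as in this data set) versus the overall range.
spread     = max(outputs(repRows)) - min(outputs(repRows));
totalRange = max(outputs) - min(outputs);

fprintf('Replicated rows: %s\n', mat2str(repRows.'));
fprintf('Spread within replicates: %.2f, total range: %.2f\n', spread, totalRange);
```

For this data the sketch reports rows 4 and 5, with a spread of 1.46 against a total range of 1.66.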
As it is, the apparent huge amount of variability suggests that the best predictive model for this process may be simply the mean of y. Are you truly confident that the various values of y you have provided are not simply random noise?
mean(outputs)
ans =
0.986666666666667
std(outputs)
ans =
0.502300589640248
As such, this is a simple, predictive model for your process. For ANY input combinations, the output can be simply modeled as 0.986666666666667, with a confidence interval around that value of roughly plus or minus 1.
To do better than that, there are two things you need to do:
1. Get better data. More data is often a good thing too.
2. Decide on a model to represent this process.
Only then can you seriously consider how you will proceed.
  2 Comments
Martin P on 4 Feb 2016
Thank you, I'm more interested in a method to attain a minimum value than the actual number that will come out of this particular data set. How would interpolation of this data be executed in matlab?
John D'Errico on 4 Feb 2016
The replicated value for those 4th and 5th points will preclude any interpolation. You MIGHT decide to choose the average value of the output for that pair of points, replacing the two points with one.
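If you do go the averaging route, one possible way to collapse replicated rows is sketched below with base MATLAB only. The names uniqueInputs and meanOutputs are illustrative, and the data are copied from the asker's comment in this thread.

```matlab
% Data as posted by the asker in this thread.
inputs = [-1 -1 -1 -1; -1 -1 -1  1; -1 -1  1 -1; -1 -1  1  1; -1 -1  1  1; ...
          -1  1 -1 -1; -1  1 -1  1; -1  1  1 -1; -1  1  1  1; ...
           1 -1 -1 -1;  1 -1 -1  1;  1 -1  1 -1;  1 -1  1  1; ...
           1  1 -1 -1;  1  1 -1  1;  1  1  1 -1;  1  1  1  1;  0  0  0  0];
outputs = [1.16; 1.17; 1.16; 0.74; 2.20; 0.70; 1.36; 0.58; 0.55; 0.90; ...
           0.76; 0.74; 0.58; 0.91; 0.74; 0.72; 0.58; 2.21];

% Keep one copy of each distinct input row, in order of first appearance.
[uniqueInputs, ~, groupIdx] = unique(inputs, 'rows', 'stable');

% Average the outputs that share the same input row.
meanOutputs = accumarray(groupIdx, outputs, [], @mean);

fprintf('%d rows reduced to %d distinct rows\n', size(inputs, 1), size(uniqueInputs, 1));
fprintf('Averaged output at [-1 -1 1 1]: %.2f\n', meanOutputs(4));
```

This leaves 17 distinct points, with the replicated pair replaced by its mean of 1.47. Averaging hides the disagreement between the two measurements; it does not resolve it.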
As far as interpolation goes, I'm sorry, but interpolation is a TERRIBLE way to predict a minimum. Interpolation is an especially poor idea for this given the apparent variability of your process.
The next option is to consider a low order polynomial model. The problem here is you simply don't have sufficient data to generate such a model, at least one that would have any viability for minimization. So, given 4 input dimensions, a simple quadratic polynomial model would have these terms:
constant
x,y,z,w
x^2,y^2,z^2,w^2
x*y,x*z,x*w,y*z,y*w,z*w
So 15 terms. You have 18 data points, and a HUGE amount of variability. Even worse, the data points you HAVE got in that set are insufficient to estimate all 15 parameters in the above model that I have implied. (You chose poorly in where the data lives.)
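A quick rank check makes the insufficiency concrete. This is a sketch in base MATLAB, with the columns ordered as in the term list above; the variable names are illustrative, and the inputs are copied from the asker's comment in this thread.

```matlab
% Inputs as posted by the asker in this thread.
inputs = [-1 -1 -1 -1; -1 -1 -1  1; -1 -1  1 -1; -1 -1  1  1; -1 -1  1  1; ...
          -1  1 -1 -1; -1  1 -1  1; -1  1  1 -1; -1  1  1  1; ...
           1 -1 -1 -1;  1 -1 -1  1;  1 -1  1 -1;  1 -1  1  1; ...
           1  1 -1 -1;  1  1 -1  1;  1  1  1 -1;  1  1  1  1;  0  0  0  0];

x = inputs(:, 1); y = inputs(:, 2); z = inputs(:, 3); w = inputs(:, 4);

% Design matrix of the full quadratic model: constant, linear terms,
% pure squares, then the six cross products (15 columns in total).
A = [ones(size(x)), x, y, z, w, ...
     x.^2, y.^2, z.^2, w.^2, ...
     x.*y, x.*z, x.*w, y.*z, y.*w, z.*w];

r = rank(A);
fprintf('Columns: %d, rank: %d\n', size(A, 2), r);

% The four squared columns are indistinguishable on this design.
squaresIdentical = isequal(x.^2, y.^2, z.^2, w.^2);
fprintf('Squared terms identical: %d\n', squaresIdentical);
```

Every input is -1 or 1 except at the single center point, so x^2, y^2, z^2 and w^2 form the same column. The design matrix therefore has rank 12 rather than 15, and the individual curvature terms cannot be separated.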
At best, you MIGHT decide to fit the data using a linear model, so something of the form:
a0 + a1*x + a2*y + a3*z + a4*w
So using my polyfitn tool (found on the file exchange) and my sympoly toolbox (also on the FEX)...
p = polyfitn(inputs,outputs,1);
format short g
polyn2sympoly(p)
ans =
-0.15412*X1 - 0.12787*X2 - 0.067135*X3 + 0.036615*X4 + 0.9727
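For readers without polyfitn, the same linear least-squares fit can be sketched with the backslash operator in base MATLAB. This is an equivalent formulation, not the polyfitn implementation; the names A and coef are illustrative, and the data are copied from the asker's comment in this thread.

```matlab
% Data as posted by the asker in this thread.
inputs = [-1 -1 -1 -1; -1 -1 -1  1; -1 -1  1 -1; -1 -1  1  1; -1 -1  1  1; ...
          -1  1 -1 -1; -1  1 -1  1; -1  1  1 -1; -1  1  1  1; ...
           1 -1 -1 -1;  1 -1 -1  1;  1 -1  1 -1;  1 -1  1  1; ...
           1  1 -1 -1;  1  1 -1  1;  1  1  1 -1;  1  1  1  1;  0  0  0  0];
outputs = [1.16; 1.17; 1.16; 0.74; 2.20; 0.70; 1.36; 0.58; 0.55; 0.90; ...
           0.76; 0.74; 0.58; 0.91; 0.74; 0.72; 0.58; 2.21];

% Design matrix for a0 + a1*X1 + a2*X2 + a3*X3 + a4*X4.
A = [ones(size(inputs, 1), 1), inputs];

% Least-squares solution; coef = [a0; a1; a2; a3; a4].
coef = A \ outputs;

% Residual standard deviation, using n - 5 degrees of freedom.
resid = outputs - A * coef;
sigma = sqrt(sum(resid.^2) / (numel(outputs) - numel(coef)));

fprintf('a0 = %.5g, a1..a4 = %s\n', coef(1), mat2str(coef(2:end).', 5));
fprintf('Residual standard deviation: %.3g\n', sigma);
```

The coefficients agree with the polyfitn result above, with the constant listed first instead of last. The residual standard deviation gives a rough sense of how poorly the linear model explains the data.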
Suppose you decide to find the minimum value of this function, on the rectangular domain (actually a hypercube) of your data [-1,1]^4. We could just use linprog to do this, but it is as simple to see that the function above is minimized when X1, X2, X3 are all as large as possible, and X4 is as small as possible. So, the minimum value over that hypercube occurs at [1 1 1 -1]. So evaluating that polynomial at the indicated point, we get:
-0.15412*1 - 0.12787*1 - 0.067135*1 + 0.036615*(-1) + 0.9727
ans =
0.58696
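The sign argument above can be written out in a few lines. This sketch uses base MATLAB only, with the coefficients taken from the fit shown earlier in this answer; lb, ub, xmin and the vertex enumeration are illustrative additions. With the Optimization Toolbox, linprog(a, [], [], [], [], lb, ub) should give the same point.

```matlab
% Coefficients of the fitted linear model, as reported in this thread.
a0 = 0.9727;
a  = [-0.15412, -0.12787, -0.067135, 0.036615];

% Bounds of the hypercube [-1, 1]^4.
lb = -ones(1, 4);
ub =  ones(1, 4);

% A linear function on a box is minimized at a vertex: take the upper
% bound where the slope is negative and the lower bound where it is positive.
xmin = ub;
xmin(a > 0) = lb(a > 0);
fmin = a0 + a * xmin.';

% Cross-check by brute force over all 16 vertices of the hypercube.
V = 2 * (dec2bin(0:15) - '0') - 1;      % 16-by-4 matrix of -1/+1 entries
[fcheck, k] = min(a0 + V * a.');

fprintf('xmin = %s, fmin = %.5f\n', mat2str(xmin), fmin);
fprintf('Brute-force vertex: %s, value = %.5f\n', mat2str(V(k, :)), fcheck);
```

Both approaches land on [1 1 1 -1] with a predicted value of 0.58696, matching the hand calculation.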
Really, I'm sorry to say that your data (as you have it here) is not worth a much better assessment. Even the prediction above has a huge amount of statistical uncertainty around it.
So again, get better data, get more data. Only then can you really do a better job.


Star Strider on 3 Feb 2016
You must know the process that created the data, so you can write a mathematical model of that process and fit the data to it, estimating the parameters of the model. What is the size of your ‘matrix of inputs’? It seems that you have a small data set, so you can upload it here if you want us to look at it.
  2 Comments
Martin P on 3 Feb 2016
Edited: Walter Roberson on 4 Feb 2016
inputs = [ -1 -1 -1 -1;
-1 -1 -1 1;
-1 -1 1 -1;
-1 -1 1 1;
-1 -1 1 1;
-1 1 -1 -1;
-1 1 -1 1;
-1 1 1 -1;
-1 1 1 1;
1 -1 -1 -1;
1 -1 -1 1;
1 -1 1 -1;
1 -1 1 1;
1 1 -1 -1;
1 1 -1 1;
1 1 1 -1;
1 1 1 1;
0 0 0 0];
outputs = [1.16;1.17;1.16;0.74;2.20;0.70;1.36;0.58;0.55;0.90;
0.76;0.74;0.58;0.91;0.74;0.72;0.58;2.21];
I'm looking to minimize the output with bounds on the inputs.
Star Strider on 3 Feb 2016
Next question: What did you do to get these data? What do they represent?


Walter Roberson on 3 Feb 2016
There are an infinite number of functions that can reproduce those exact outputs given those inputs, with the different functions potentially assuming arbitrarily low values in between the defined data points.
Adding additional data points would not solve this problem, not if only a finite number of points were added.
The problem is impossible to solve given only that information. To have a solution there needs to be constraints put on the form of the model equation.
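A small sketch may make the non-uniqueness concrete. The function g below is a hypothetical example, not something from the posted data or from any fitted model: it is exactly zero at every posted input (each first coordinate is -1, 0 or 1), so adding it to any candidate model leaves the values at the data points unchanged, while the scale factor c pushes the minimum between the points as low as desired.

```matlab
% Inputs as posted by the asker in this thread.
inputs = [-1 -1 -1 -1; -1 -1 -1  1; -1 -1  1 -1; -1 -1  1  1; -1 -1  1  1; ...
          -1  1 -1 -1; -1  1 -1  1; -1  1  1 -1; -1  1  1  1; ...
           1 -1 -1 -1;  1 -1 -1  1;  1 -1  1 -1;  1 -1  1  1; ...
           1  1 -1 -1;  1  1 -1  1;  1  1  1 -1;  1  1  1  1;  0  0  0  0];

% Hypothetical perturbation: vanishes wherever the first input is -1, 0 or 1.
g = @(x, c) c .* x(:, 1) .* (x(:, 1).^2 - 1);

% Probe points along the first axis, with the other inputs held at 0.
t = linspace(-1, 1, 2001).';
probe = [t, zeros(numel(t), 3)];

for c = [1 10 100]
    atData  = max(abs(g(inputs, c)));    % largest change at the data points
    between = min(g(probe, c));          % lowest value found in between
    fprintf('c = %3d: max |g| at data = %g, min of g in between = %.4f\n', ...
            c, atData, between);
end
```

The change at the data points is zero for every c, while the minimum in between scales as roughly -0.385*c. Without a constraint on the model form, the location and depth of the "minimum" are arbitrary.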
  1 Comment
John D'Errico on 4 Feb 2016
"There are an infinite number of functions that can reproduce those exact outputs given those inputs,"
Not really true, since there is a replicated set of inputs.
