How to predict a future time point in a time series using a predictive model created with the System Identification Toolbox / App

I've created a variety of models using the System Identification Toolbox. For example, I created and estimated parameters for a basic 2 pole 1 zero transfer function, called tf1, using the app, and exported it into the workspace.
I have what seems like a very, very basic question, but one that I haven't been able to figure out yet, so I'm asking for help. Guessing others may have trouble working this out as well.
I want to use my model to predict a future value in the timeseries that I don't have yet, because it's not the future yet. How do I do that?
All of the examples that I've been able to find, including the sections of the documentation for the functions predict and compare with titles suggesting this is what they are about, seem to encourage me to put in all of the data, including the future data being predicted, and then test how well the model's prediction compares with the actual data. Only problem: this doesn't work if you don't have the data, because it's an actual, real-time prediction that you are trying to make (a central point of having a predictive model), so at any given time point you don't yet have the future data!
I'm sure there's a really simple answer to this, and it's probably a one-liner, but I haven't been able to figure it out.
I'm particularly confused by the examples in the documentation of the functions predict and compare.
Let's say I just want to use my model to predict a single time point in the future, past the data that I've got, so nshift = 1;
I created a data object called dat from a random time series of 100 data points (e.g., 100 seconds of time series data) like this:
r = rand(100,1); dat = iddata(r,r);
In practice I use real data instead of random data, but this is simpler for illustration.
I use r as both entries to iddata because I am trying to predict elements of the same sequence. Is this the problem? Am I supposed to do something more like this:
r = rand(100,1); dat = iddata(r,circshift(r,nshift));
or this:
r = rand(100,1); dat = iddata(r,circshift(r,-nshift));
Then, trying to follow the example in the documentation for both functions I tried:
prd=predict(dat,tf1,nshift);
cmp=compare(dat,tf1,nshift);
and
prd1=predict(dat,tf1,nshift+1);
cmp1=compare(dat,tf1,nshift+1);
As far as I can tell, all four of these produce the identical result, which I completely don't understand.
It's also pretty unclear how you get the numeric data out of prd or cmp, I gather it's like this:
prd.y
when I do this:
sum(prd1.y-prd.y)
I get a value of 0!
Also, the length of prd and cmp are the same as the length of r (and dat), so where do I find my nshift = 1 point ahead prediction value?
Am I supposed to use sim (which I also couldn't understand the documentation for)?
Are prd and cmp some kind of shifted representation of the data based on nshift? If so, why are they the same when I change the nshift value?
I also tried this, but got an error:
prd2=forecast(tf1,rand(1,100),1);
Error using idmodel/forecast (line 101)
The number of inputs and outputs of the model must match that of the data.
Is that telling me that I need to be using a different type of model, and that the tf1 model can't be used for causal prediction? If so, then which models in the System Identification toolbox can?
Also, when I'm using the System Identification app, is there a way to test the models on causal predictions, i.e., test the model in the scenario where it can only use past data to generate a future prediction?
It turns out that the future is, indeed, one of the most difficult things to predict.
Thank you for your help.

Accepted Answer

Karan Singh on 30 Jan 2024
Hi Cdc,
Predicting future values based on past data is indeed a central aspect of time series analysis and system identification. Let me clarify some of your points and guide you through the process of making a prediction for a future time point.
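Firstly, the "predict" function computes the K-step-ahead predicted output of a model over the time span of the data you supply. The third argument (your "nshift") is the prediction horizon K: each predicted sample y(t) is computed from measured outputs up to time t-K, together with the inputs. In that sense the function never looks at future outputs, but it also never returns samples beyond the end of the data. This also explains why your results did not change with nshift: a transfer function model such as tf1 has no noise model (it is an output-error structure), so past outputs carry no extra information and the predicted output coincides with the simulated output for every horizon.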
When you create an "iddata" object with the same input and output, as in "dat = iddata(r,r);", you're essentially saying that the output of your system is the same as the input. This is not typical for system identification, where you usually have a distinct input and the system's response as the output. If you're trying to predict future values of a univariate time series (a single sequence of data), you would normally use past values of that sequence to predict future ones.
For a univariate time series, you can create an iddata object with no input (u) and the time series data as the output (y):
r = rand(100,1); % Your time series data
dat = iddata(r, []); % Create iddata object with no input
To predict the next value in the time series, you can use the predict function with nshift = 1:
nshift = 1;
prd = predict(tf1, dat, nshift); % documented argument order is predict(sys, data, K)
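prd is an iddata object with the same number of samples as dat. Each value prd.y(t) is the model's estimate of y(t) based on data up to time t-nshift. The earliest samples depend on the initial conditions that predict assumes or estimates, so they are usually the least reliable part of the result. Note also that tf1 has an input channel, so predict expects the data to contain an input signal; for output-only data, a time series model (for example, one estimated with ar) is the usual match.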
To extract the predicted values, you can use:
predicted_values = prd.y;
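The last value of predicted_values is the model's estimate of the final measured sample (sample 100 here), computed from the samples before it. It is not a value beyond the end of the data; for that, use forecast:
last_sample_prediction = predicted_values(end);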
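To make the alignment of the predict output concrete, here is a minimal sketch. It assumes an output-only series and a hypothetical AR(2) model named sysAR estimated with ar, because tf1 itself has an input channel; the synthetic series and the sample time of 1 second are also assumptions for illustration only.

```matlab
% Sketch: K-step-ahead prediction over the span of the measured data.
rng(0);                                  % reproducible illustration
y   = filter(1, [1 -0.8], randn(100,1)); % synthetic AR(1) series standing in for real data
dat = iddata(y, [], 1);                  % output-only data, sample time 1 s (assumed)

sysAR = ar(dat, 2);                      % hypothetical AR(2) time series model (no input)

K  = 1;                                  % prediction horizon
yp = predict(sysAR, dat, K);             % yp.y(t) estimates y(t) from data up to t-K

disp(size(yp.y))                         % same number of rows as dat.y
disp([dat.y(end) yp.y(end)])             % measured vs. predicted value of the LAST sample
```

The last row compares a measurement you already have with its prediction, which is useful for judging the model but is not yet a statement about sample 101.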
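Regarding the "forecast" function: this is the function that extends a prediction beyond the end of the measured data, for one step or for several. The error you're seeing means the channels of the data do not match the model. tf1 has one input and one output, so the past data passed to forecast has to contain both signals, most simply as an iddata object. A 1-by-100 row vector is most likely being read as a single sample with 100 channels rather than 100 samples of one signal. If you are working with a univariate time series and no input, a time series model (one with no input channel) is the natural fit for forecast.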
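A rough sketch of how forecast can be called in both situations follows. The names sysAR, tfHyp, uNext and the synthetic signals are hypothetical stand-ins, not your actual model or data, and the sample time of 1 second is assumed.

```matlab
rng(1);                                     % reproducible illustration

% Case 1: output-only series with a time series model (no input channel)
yts    = filter(1, [1 -0.8], randn(100,1)); % synthetic stand-in for a measured series
pastT  = iddata(yts, [], 1);                % sample time of 1 s is an assumption
sysAR  = ar(pastT, 2);                      % hypothetical AR(2) model
yNextT = forecast(sysAR, pastT, 1);         % one sample beyond the end of pastT

% Case 2: input-output model with 2 poles and 1 zero, similar in shape to tf1
u = randn(200,1);
y = filter([0 0.5 0.3], [1 -1.2 0.5], u) + 0.05*randn(200,1);
pastIO  = iddata(y, u, 1);                  % past data carries the output AND the input
tfHyp   = tfest(pastIO, 2, 1, 'Ts', 1);     % discrete-time stand-in for tf1
uNext   = 0.2;                              % assumed known or planned next input value
yNextIO = forecast(tfHyp, pastIO, 1, uNext);

disp(yNextT.y)                              % forecast of sample 101 of the series
disp(yNextIO.y)                             % forecast of sample 201 of the output
```

In a real-time setting you would repeat the forecast call each time a new measurement arrives, passing the updated record as the past data.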
Lastly, when using the System Identification app, you can test models for their predictive capabilities by withholding some data from the estimation process (the "Estimation Data") and then using the "Validation Data" to test the model's predictions. This validation data is used to simulate the scenario where the model must use past data to generate future predictions.
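The same estimation/validation split can be sketched at the command line. The 70/30 split, the AR(2) model order and the synthetic series are arbitrary choices for illustration.

```matlab
rng(2);                                  % reproducible illustration
y   = filter(1, [1 -0.8], randn(100,1)); % synthetic stand-in for a measured series
dat = iddata(y, [], 1);                  % sample time of 1 s is an assumption

ze = dat(1:70);                          % estimation data
zv = dat(71:100);                        % validation data, never seen during estimation

sysAR = ar(ze, 2);                       % hypothetical AR(2) model fitted on ze only

% One-step-ahead predicted response on the held-out samples, with the fit measure
[ypVal, fit] = compare(zv, sysAR, 1);
disp(fit)
```

Each predicted sample in ypVal uses only validation samples that precede it, so this mimics the causal setting on data the model was not fitted to.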
Hope this helps!

Release

R2021b
