
Stochastic Modeling Using Virtual Training Sets

By James C. Cross III, MathWorks


Introduction. When predicting the behavior of a stochastic system, a “reference” forecast offers a view of an “expected” outcome, but provides no insight into the distribution of alternative outcomes. In this talk, a method is proposed to address this: a limited set of optimization problem solutions is used to train a model, which is then used to accumulate solution statistics for a large ensemble of scenarios in a reasonable time. In many cases, such an approach may be the only practical option for generating a probabilistic forecast at all.

Case Study. A simple instance of the Unit Commitment Problem (UCP) from the electric power industry is used to illustrate the method. The UCP asks: What operating schedule, for a pool of power plants, delivers the aggregate power demand at the lowest cost?

Figure 1. Graphical illustration of the unit commitment problem.

For a specific set of plant parameters and a specified power demand profile, this optimization problem can usually be solved (e.g., using the intlinprog function in Optimization Toolbox™). This talk considers a case with three generating sources and a week-long power demand profile (see Figure 1).
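To make the setup concrete, here is a minimal single-period sketch of the UCP posed as a mixed-integer linear program for intlinprog; all plant parameters and the demand value are illustrative assumptions, not values from the study:

% Single-period UCP sketch: choose on/off commitments u and power
% outputs p for three plants to meet demand D at minimum cost.
Pmin = [100; 50; 20];        % minimum stable output (MW), assumed
Pmax = [400; 250; 120];      % maximum output (MW), assumed
cFix = [800; 500; 300];      % fixed cost when committed ($), assumed
cVar = [20; 35; 50];         % marginal cost ($/MWh), assumed
D    = 520;                  % demand for this period (MW), assumed
n    = numel(Pmax);

f      = [cFix; cVar];       % decision vector x = [u; p]
intcon = 1:n;                % commitments are integer (binary) variables

% Capacity coupling, Pmin_i*u_i <= p_i <= Pmax_i*u_i, written as A*x <= b
A = [-diag(Pmax), eye(n);    %  p - Pmax.*u <= 0
      diag(Pmin), -eye(n)];  %  Pmin.*u - p <= 0
b = zeros(2*n, 1);

Aeq = [zeros(1,n), ones(1,n)];   % total generation equals demand
beq = D;
lb  = zeros(2*n, 1);
ub  = [ones(n,1); Pmax];         % u in {0,1}, p within capacity

x = intlinprog(f, intcon, A, b, Aeq, beq, lb, ub);
u = round(x(1:n));           % optimal commitments
p = x(n+1:end);              % optimal dispatch (MW)

The study's week-long problem extends this structure across all hours of the forecast horizon.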

The question of interest is: What is the distribution of solutions, and associated metrics (e.g. total cost of meeting the demand profile, or the joint capacity factor of the plants), corresponding to variations of the input parameter values?

The answer is explored in two ways for an ensemble of scenarios: (1) the traditional method, running the optimization solver for every scenario; and (2) the alternative method, running a model trained on a limited set of optimization problem solutions. The results of the two methods are then compared.

Model Creation. A “virtual” training set was created using a “design of experiments” approach involving optimizations over a specific training ensemble. Fuel prices and load-level multipliers were treated as the varying inputs; for simplicity, each was assumed to derive from an independent normal distribution.
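As a minimal sketch (not the study's actual parameters), such a training ensemble might be sampled as follows; the ensemble size, fuel price means, and spreads are assumptions:

% Sample the training ensemble: fuel prices and a load-level
% multiplier, each drawn from an independent normal distribution.
rng(0);                              % reproducible sampling
nTrain  = 50;                        % assumed training-ensemble size
muFuel  = [2.5, 4.0, 9.0];           % mean fuel price per plant (assumed)
sdFuel  = 0.10 * muFuel;             % 10% relative spread (assumed)
fuel    = repmat(muFuel, nTrain, 1) + ...
          repmat(sdFuel, nTrain, 1) .* randn(nTrain, numel(muFuel));
loadMult = 1 + 0.05 * randn(nTrain, 1);  % load-level multiplier (assumed)
% Each row is one training scenario; solving the UCP for each produces
% the optimal schedules that form the “virtual” training set.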

The predictors used were total power demand values, fuel prices, hour of day, and day of week. A neural network model with one hidden layer (10 nodes) was created (using the train function in Neural Network Toolbox™). A comparison of the optimal schedules with the modeled schedules for one of the training scenarios is shown in Figure 2.
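A minimal sketch of this step, assuming trainX holds the predictor columns (total demand, fuel prices, hour of day, day of week; one column per training hour) and trainT holds the corresponding optimal plant schedules (both names are hypothetical):

% Fit a feedforward network with one hidden layer of 10 nodes to map
% predictors to plant schedules (fitnet/train, Neural Network Toolbox).
net = fitnet(10);                  % one hidden layer, 10 nodes
net = train(net, trainX, trainT);  % Levenberg-Marquardt by default
modeledSchedule = net(trainX);     % modeled schedules, for comparison
                                   % against the optimal ones (Figure 2)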

Figure 2. Optimal vs modeled plant schedules.

The model does a good job of capturing the up-and-down dynamics of the optimal schedules, although its inaccuracies at specific power levels are apparent.

Comparing Predictions. An ensemble of 500 individual scenarios was randomly generated using the probability distributions described previously.

The first forecast metric targeted in the study was the total cost. The cost is calculated directly from the plant power generating schedules, whether derived from the optimization solver or the model.
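As a sketch, with P an hours-by-plants matrix of scheduled outputs and cVar, cFix the per-plant variable and fixed costs (all names assumed), the computation might look like:

% Total cost over the horizon, computed identically for optimal and
% modeled schedules: variable cost of energy plus fixed cost of
% every committed (nonzero-output) hour.
committed = double(P > 0);                  % hours each plant is on
totalCost = sum(P * cVar) + sum(committed * cFix);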

The focus of this talk is the distribution of outcomes, i.e., the ensemble, not any individual scenario. Using the results above from the optimization solver and the neural network model, the probability distribution functions for total cost were assembled; the results are shown in Figure 3. The agreement between the optimal and modeled results is good.
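A sketch of how the Figure 3 comparison could be produced, assuming costOptimal and costModeled hold the 500 per-scenario totals from the two methods:

% Overlay the empirical total-cost distributions from the two methods.
histogram(costOptimal, 'Normalization', 'pdf'); hold on
histogram(costModeled, 'Normalization', 'pdf'); hold off
xlabel('Total cost ($)'); ylabel('Probability density')
legend('Optimization solver', 'Neural network model')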

Figure 3. Distribution of total cost.

Exercising the Model. The most compelling feature of the modeling approach is the economy with which large numbers of scenarios can be explored.

The second forecast target in this study was the joint distribution of the coal and gas plant capacity factors. In this case, an ensemble of 250,000 scenarios was created. The model was run and the distribution assembled; it is shown in contour-map format in Figure 4.

The runtime using the model (on an ordinary laptop) was 26 minutes. If the optimization solver had been used to generate the ensemble results, it would have taken ~500 hours! This highlights the practicality of the modeling approach.
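A sketch of how such a joint distribution could be assembled and plotted, assuming cfCoal and cfGas are vectors of per-scenario capacity factors (energy delivered divided by capacity times hours):

% Bin the 250,000 scenario pairs and draw a contour map (as in Figure 4).
edges   = 0:0.02:1;                             % capacity factor bins
counts  = histcounts2(cfCoal, cfGas, edges, edges, 'Normalization', 'pdf');
centers = edges(1:end-1) + diff(edges)/2;
contourf(centers, centers, counts')             % transpose: x along columns
xlabel('Coal plant capacity factor')
ylabel('Gas plant capacity factor')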

Figure 4. Joint distribution of plant capacity factors.

Many inferences that could not be made without the model can be drawn from this chart, and they may be valuable for operational decision-making. For example: (1) it is extremely unlikely that the gas plant capacity factor will exceed 0.6, which may allow preventive plant maintenance to be scheduled; (2) the probabilities of different fuel volume requirements could inform a purchase contract negotiation; and (3) a simple post-processing step could predict greenhouse gas emissions over the forecast horizon, for regulatory, credit trading, or other purposes.
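For instance, the first inference reduces to a one-line tail-probability estimate over the ensemble (cfGas as in the sketch above):

% Fraction of the 250,000 scenarios in which the gas plant capacity
% factor exceeds 0.6, an estimate of the exceedance probability.
pExceed = mean(cfGas > 0.6);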

In contrast to the optimization solver, the model makes it possible to explore this ensemble, and any other that might be of interest.

The case study in this talk is very simple, but real problems generally are not. For example, for a more complex problem involving 10 power plants, a month-long forecast horizon, and 1,000 scenarios, the optimization solver would require ~5,000x more computing time than the modeling approach.

Conclusion. This talk presents a probability density estimation method for complex optimization problems that can enable and improve operational decision-making under uncertainty. The method allows ensembles of large numbers of scenarios to be explored in a reasonable time, in contrast to the optimization solver, which may be challenged to produce even a few solutions in the same amount of time.

The method described in this talk has extremely broad applicability: to manufacturing, resource extraction, transportation, and financial services operations, among many others.

This extended abstract was presented at AIChE Spring Meeting 2016.


Published 2016 - 80816v00
