By Jennifer Petrosky, MathWorks and Rajiv Singh, MathWorks

Interactive visualization tools help implement advanced statistical techniques for analyzing batch processes.

Process industries, including petrochemicals, pulp and paper, and semiconductors, can benefit from these techniques. In this article, we discuss how MATLAB can help process engineers and plant operators simplify complex decision-making tasks for real-time fault detection and condition monitoring.

With connectivity protocols, such as OPC and fieldbus, large quantities of process variable records stored in databases have become more accessible. Now, process engineers can apply data-intensive technologies for fault detection and condition monitoring in various manufacturing settings. The engineers strive to develop techniques that extract information about the condition of a process from these large data sets, and present their findings using only a small number of variables.

By design, a manufacturing (batch) process has conditions that vary over the duration of its run. This requires that decisions about process conditions be based on their entire history rather than a status at one instant. One way to achieve this goal is by treating the measurements of the process variables at different time instants as distinct variables and using them collectively to determine the batch condition. Multi-Way Principal Component Analysis (Multi-Way PCA) is one such approach based on this idea.

In this approach, we start by deriving a reduced dimensional space, defined by the principal components, using the data sets representing the entire histories of good (normal) batch processes. In doing so, the measurement sets from one batch are unfolded into a single row vector. A matrix (X) composed of several such vectors, each corresponding to a different calibration (normal) batch, is used to extract the principal components:

`P = princomp(X);   % Statistics Toolbox function; each column of P is one principal component`

`Pc = P(:,1:3);     % extract the three most relevant principal components`
For analyzing the condition of a particular batch, we unfold its entire history into a vector and map it as a single point onto this principal component space. The coordinates of this point, called scores, act as indicators of process variability. A process engineer will monitor the batch condition by comparing the location of this point against a region of acceptable variability, which is obtained from the data sets of good batches (see Figure 1).
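As a rough sketch of this mapping, the snippet below projects one finished batch onto a three-component model and checks it against a 95% region. The data are synthetic, all names and sizes are hypothetical, and the region test assumes Gaussian scores with the chi-square quantile hard-coded (7.8147 for three degrees of freedom).

```matlab
rng(1);
I = 20; J = 4; K = 50;                       % hypothetical sizes
X = reshape(randn(I, J, K), I, J*K);         % synthetic calibration batches
mu = mean(X, 1);  sigma = std(X, 0, 1);
Xn = (X - repmat(mu, I, 1)) ./ repmat(sigma, I, 1);
[~, ~, V] = svd(Xn, 'econ');
Pc = V(:, 1:3);                              % three principal components
C  = cov(Xn * Pc);                           % covariance of calibration scores

% A finished batch to be assessed (synthetic), unfolded to a row vector.
newBatch = randn(J, K);
xNew = reshape(newBatch, 1, J*K);

% Normalize with the calibration statistics, then project.
scores = ((xNew - mu) ./ sigma) * Pc;        % 1-by-3 point in score space

% Squared Mahalanobis distance to the calibration scores (mean is zero).
d2 = scores / C * scores';
inControl = d2 <= 7.8147;                    % inside the 95% ellipsoid?
```

The new batch is normalized with the calibration means and standard deviations rather than its own, so its scores are directly comparable to those of the good batches.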

Now, let's apply this technique to estimate the condition of a running batch in advance. Calculation of the scores requires the entire observation set, which is not complete until the batch is finished. Consequently, for a running batch, the scores must be predicted based on the forecast of future measurements that extend from the current time until the end of the batch. A probabilistic description is used to determine the variability of the unmeasured values. This means that the predicted batch scores, which are a function of the entire batch history, will appear as a region of probable values in the principal component space, rather than a single point (see Figure 1). The size of this region is a measure of uncertainty in the prediction, which decreases as more data becomes available over time (see Figure 3).
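To illustrate why the predicted region shrinks over time, here is a simplified sketch. It is not the forecasting model used in the article: it assumes, purely for illustration, that every unmeasured (normalized) value is independent with zero mean and unit variance. Under that assumption the score covariance comes only from the loadings of the samples that have not been logged yet. All names and sizes are hypothetical.

```matlab
rng(2);
I = 30; J = 3; K = 40;  n = J*K;             % hypothetical sizes
X = reshape(randn(I, J, K), I, n);           % synthetic calibration batches
mu = mean(X, 1);  sigma = std(X, 0, 1);
Xn = (X - repmat(mu, I, 1)) ./ repmat(sigma, I, 1);
[~, ~, V] = svd(Xn, 'econ');
Pc = V(:, 1:3);

xRun = randn(1, n);                          % normalized history of a running batch
volume = zeros(1, K);
for k = 1:K
    known  = 1:(J*k);                        % samples logged up to time k
    future = (J*k + 1):n;                    % samples still to come
    % Expected scores: measured part contributes, future part has zero mean.
    sMean = xRun(known) * Pc(known, :);
    % Score covariance due to the unmeasured part (assumed unit variance).
    sCov  = Pc(future, :)' * Pc(future, :);
    volume(k) = sqrt(max(det(sCov), 0));     % proportional to ellipsoid volume
end
```

Because `sCov` loses one group of positive semidefinite terms at every logging instant, `volume` never increases and reaches zero when the batch is complete, at which point the region collapses to a single point.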

Let's look at a real-life example: a semiconductor etching process. We use data sets from 107 good batches (normalized) to derive our three-dimensional principal component model. This data is available from Eigenvector Research at www.evriware.com/Data/Data_sets.html. Assuming a Gaussian distribution, an ellipsoidal region of 95% confidence is derived for the variability of the batch scores (gray ellipsoids in Figure 2). In MATLAB, the ellipsoids are generated using Singular Value Decomposition (SVD) on the covariance matrix of the data, and are rendered using the volume visualization tools.
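The ellipsoid construction can be sketched as below. The scores here are synthetic stand-ins rather than the etching data, the variable names are hypothetical, and the 95% chi-square quantile for three degrees of freedom is hard-coded (about 7.8147) to avoid a toolbox dependency.

```matlab
rng(3);
% Synthetic 3-D scores standing in for the calibration-batch scores.
T = randn(107, 3) * [2 0 0; 0.5 1 0; 0 0.3 0.5];

c = mean(T, 1);                  % ellipsoid center
C = cov(T);                      % 3-by-3 score covariance
[U, S, ~] = svd(C);              % principal axes (U) and variances (diag(S))

chi2_95 = 7.8147;                % 0.95 quantile of chi-square, 3 d.o.f.
radii = sqrt(chi2_95 * diag(S)); % semi-axis lengths

% Map a unit sphere onto the ellipsoid surface.
[xs, ys, zs] = sphere(30);
pts = [xs(:), ys(:), zs(:)] * diag(radii) * U';
pts = pts + repmat(c, size(pts, 1), 1);
Xe = reshape(pts(:, 1), size(xs));
Ye = reshape(pts(:, 2), size(ys));
Ze = reshape(pts(:, 3), size(zs));
```

The surface can then be drawn with, for example, `surf(Xe, Ye, Ze, 'FaceAlpha', 0.3, 'EdgeColor', 'none')`, where the partial transparency keeps a second ellipsoid visible inside or behind the first.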

In a visualization-based scheme for condition monitoring, a process engineer should compare this region (called the in-control region and represented by gray ellipsoids in Figure 2) to the predicted score region for a running batch (red ellipsoids in Figure 2).

The intersection of the two regions is a measure of the likelihood that the running process will end up in the in-control region. If the predicted score region is large and encloses the in-control region (Figure 2a), a decision cannot be made because of the high level of uncertainty about the location of the batch scores. The plant operator must wait until more measurements become available. If the in-control region completely encloses the forecasted score region (Figure 2b), then there is a strong probability that the batch will have similar results to the calibration batches, and the operator does nothing. However, if the two regions are disjoint (Figure 2c), then the batch may be off course and require adjustments.
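One way to put a number on this overlap is a Monte Carlo estimate: draw samples from the predicted score distribution and count how many fall inside the in-control ellipsoid. The sketch below assumes Gaussian regions. The means, covariances, and sample count are made-up values for illustration, not results from the etching data.

```matlab
rng(4);
chi2_95 = 7.8147;                         % 0.95 quantile of chi-square, 3 d.o.f.

% Hypothetical in-control region (mean, covariance) from calibration scores.
muIn = [0 0 0];      CIn = diag([4 2 1]);
% Hypothetical predicted score region for the running batch.
muPr = [0.5 0.2 0];  CPr = diag([0.2 0.1 0.05]);

% Draw samples from the predicted distribution.
N = 20000;
L = chol(CPr, 'lower');                   % CPr = L * L'
S = randn(N, 3) * L' + repmat(muPr, N, 1);

% Fraction of samples inside the in-control ellipsoid.
D  = S - repmat(muIn, N, 1);
d2 = sum((D / CIn) .* D, 2);              % squared Mahalanobis distances
pInControl = mean(d2 <= chi2_95);
```

A value of `pInControl` near 1 corresponds to the enclosed case (Figure 2b) and a value near 0 to the disjoint case (Figure 2c). Intermediate values with a large predicted covariance suggest waiting for more measurements.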

Visualizing these regions could also help us identify the direction of a potential aberration (minimum overlap) and formulate a control strategy. For example, if the two regions are disjoint along only the first score axis (S1) and this score can be related to the process inputs in a physically meaningful way, then we know how to vary those inputs to maximize the overlap between the regions at the next time instant. (Since the details of this procedure are specific to the application, they are not shown here.)
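A simple numerical counterpart to this visual check is to compare the projections of the two ellipsoids onto each score axis. The sketch below uses the fact that a 95% ellipsoid with covariance `C` projects onto axis `i` as an interval of half-width `sqrt(chi2_95 * C(i,i))` around its center. The means and covariances are hypothetical values chosen so that the regions separate along S1 only.

```matlab
chi2_95 = 7.8147;                       % 0.95 quantile of chi-square, 3 d.o.f.
muIn = [0 0 0];     CIn = diag([4 2 1]);          % hypothetical in-control region
muPr = [7 0.2 0];   CPr = diag([0.2 0.1 0.05]);   % hypothetical predicted region

% Half-width of each ellipsoid's projection onto the score axes.
hIn = sqrt(chi2_95 * diag(CIn))';
hPr = sqrt(chi2_95 * diag(CPr))';

% A positive gap means the projections do not overlap along that axis.
gap = abs(muPr - muIn) - (hIn + hPr);
disjointAxes = find(gap > 0);
```

Here `disjointAxes` contains only the index of S1, so S1 is the score to relate back to the process inputs. Note that overlapping projections on every axis do not guarantee that the ellipsoids themselves intersect, so this check complements rather than replaces the 3-D view.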

The various graphical and statistical tools used in this application can be conveniently brought together into a single GUI to enhance their usability. The Batch Condition Monitoring example is a prototype that emulates a real-time condition monitoring scenario (Figure 4). Users can log data and visualize the score regions against the in-control region at every logging instant. Several test cases for batch processes are provided, which can be selected from a pop-up menu (bottom left of the GUI).

The GUI in Figure 4 provides solid rotation, projection, and lighting control tools to facilitate clear visualization. The camera toolbar (top) helps interactively adjust for light, brightness, and view direction. Varying the transparency and surface lighting of the objects reveals the intersections of ellipsoids. Projections along higher dimensions can be selected using the data panner (right), by mouse-dragging an icon. Optionally, the user can also superimpose such projections at any time instant to view the locus of predicted scores in a 3-D space.

Advanced tools for data compression and statistical analysis are invaluable in applications such as gene expression analysis (bioinformatics) and image processing, in addition to the process monitoring and fault detection applications discussed here. Using visualization utilities in combination with statistical analysis methods can greatly enhance the usability and flexibility of these tools and speed up the decision-making process.

Published 2003