# Multivariate Statistical Process Monitoring of a Copper Production Process

We will be analysing data from a continuous process of electrolytic copper production at Boliden AB (Skelleftehamn, Sweden).

The aim of the analysis is firstly, to gain an understanding of the systematic variation present in the process (its range of normal behaviour), and secondly to design a method of monitoring the process to capture departures from normal behaviour.

We start by attempting to use traditional univariate methods to understand the process, and rapidly find that this is hard. We therefore turn to multivariate methods, using Principal Component Analysis.

Finally, we discuss how the PCA model that we have developed might be deployed to monitor live data. We do this in two different ways. Firstly we incorporate the PCA model plots into a GUI, and connect the plots to a live feed of the data using the OPC toolbox: creating a 'dashboard' view of the live process to enable easy online monitoring of the process behaviour. This GUI can be deployed as a standalone executable using the MATLAB Compiler. Secondly, we take the PCA plots and incorporate them into a webpage. The webpage is designed in ASP.NET, and communicates with a database to access the data: enabling remote monitoring of the system via the web or intranet.

Sam Roberts

## Contents

- PRODUCTS REQUIRED
- SECTION I: BUILDING A MODEL OF THE PROCESS
- Load in data
- Overview of the variables
- Create standardised data
- Attempt to visualise the variables
- Make a better attempt
- Visualising observations
- Multivariate Methods
- Carry out PCA
- Create a biplot of the PCA scores and loadings.
- Multivariate control charts
- Summary of Analysis
- Saving the PCA model
- INTERLUDE: OPC
- The OPC Toolbox
- Connecting to an OPC server
- Close the opctool
- SECTION II: APPLYING THE MODEL IN AN ONLINE MONITORING TOOL
- A GUI view of the PCA model plots
- Utility function
- See the GUI in action
- Compiling the GUI
- SECTION III: DEPLOYING THE MODEL TO A WEBPAGE FOR REMOTE MONITORING
- References

## PRODUCTS REQUIRED

Most of this demonstration requires only MATLAB and Statistics Toolbox. Accessing data via OPC requires OPC Toolbox. Although the GUI application ProcessMonitor can be created and executed with the above products, compiling to a standalone executable also requires MATLAB Compiler. The deployed web application requires Database Toolbox to access data from databases, and to compile it to a .NET assembly for web deployment requires MATLAB Builder for .NET.

## SECTION I: BUILDING A MODEL OF THE PROCESS

## Load in data

Two copper samples were measured each day for one year, and the levels of eight metal impurities (Ag, Ni, Pb, Bi, Sb, As, Te, Se) were recorded. The dataset contains 730 measurements (2/day for 365 days) of the 8 impurities: these are contained in the array 'metals'. There are also two arrays 'varnames' and 'obsnames' containing variable and observation names. The variable 'TAI' refers to 'Total Analysis Index', a measure devised by the dataset originators as a summary metric of sample quality. It is instructive to compare the analysis here to the TAI summary metric.

```
load metal
whos
[numobs, numvars] = size(metals);
```

Name Size Bytes Class Attributes TAI 730x1 5840 double metals 730x8 46720 double obsname 730x1 62780 cell varnames 1x8 512 cell

## Overview of the variables

Let's start off by examining the distributions of each of the variables (metal impurities).

boxplot(metals,'Labels',varnames); %#ok xlabel('Metal'); ylabel('Level of Impurity');

## Create standardised data

We can see from the boxplot that the variances of the metal impurities are not equal. We would like to be able to consider each variable on the same scale, so let's standardise them to have a mean of 0 and a variance of 1.

means = mean(metals); sds = std(metals); metalstd = (metals - repmat(means,numobs,1)) ./ repmat(sds,numobs,1); boxplot(metalstd,'Labels', varnames); xlabel('Metal'); ylabel('Level of Impurity (Standardised)'); shg; % Bring the current figure to the front

## Attempt to visualise the variables

We need to understand how the variables change over time, and how they are related to each other. Let's attempt to get a picture of those things by plotting all the variables together on the same plot.

figure; plot(metalstd); xlim([0, numobs]); xlabel('Observation Number'); ylabel('Level of Impurity (Standardised)'); legend(varnames, 'Location', 'NorthEastOutside');

## Make a better attempt

That clearly wasn't sensible - we couldn't see anything properly, as there was just too much information. We can at least separate out the eight variables into their own plots.

figure('Units', 'Normalized', 'Position', [0.6,0.1,0.4,0.8]); suptitle('Metal Impurities over Time'); haxis(numvars) = 0; for i = 1:numvars haxis(i) = subplot(numvars, 1, i); plot(metalstd(:, i)); ylabel(varnames{i}); xlim([0, numobs]); if i < numvars set(haxis(i), 'XTickLabel', []); else xlabel('Observation Number'); end end

That was a little better - for each variable, it's possible to see how it varies over time, and it's maybe possible to see that, for example, Se seems to be correlated with Te. We can improve the view of each variable by placing each variable on the same scale, and adding control limits at 2 and 3 standard deviations to give us an indication of exceptional behaviour.

for ax = 1:numvars axes(haxis(ax)); ylim([-5, 5]); upperlimits = [4, 3, -2, -3]; lowerlimits = [3, 2, -3, -4]; colors = {'r', 'y', 'y', 'r'}; for i = 1:4 patch([0, 0, numobs, numobs],... [upperlimits(i), lowerlimits(i), lowerlimits(i), upperlimits(i)],... colors{i}, 'EdgeColor', 'none', 'FaceAlpha', 0.4); end end shg;

## Visualising observations

This gives us an overview of the dataset, although it's still quite difficult to get a deep understanding from this graphic of how the process variables are related to each other - we can't see the wood for the trees. We could try instead to focus just on a single timepoint, eg observation 1. Let's plot in polar coordinates for a nice visualisation. This gives us a picture of all the variables, but just at a single timepoint.

figure; offset = 4; [X,Y] = pol2cart(... (0:1/numvars:1)*2*pi,... [metalstd(1,:), metalstd(1,1)] + offset); plot(X,Y); axis off square; hold on; % Plot upper and lower control limits at 2 and 3 sd for i = 1:4 [X,Y] = pol2cart(... [0:1/99:1,1:-1/99:0]*2*pi,... [ones(1,100)* (upperlimits(i)+offset),... ones(1,100)* (lowerlimits(i)+offset)]); patch(X, Y, colors{i}, 'EdgeColor', 'none', 'FaceAlpha', 0.4); end % Label variables nicely [X,Y] = pol2cart(... (0:1/numvars:1)*2*pi,... ones(1,numvars+1)*(4 + offset)); for i = 1:numvars h = text(X(i), Y(i), varnames{i}); circlepos = (i-1)/numvars; if circlepos < 0.25 set(h, 'VerticalAlignment', 'bottom', 'HorizontalAlignment', 'left'); elseif circlepos == 0.25 set(h, 'VerticalAlignment', 'bottom', 'HorizontalAlignment', 'center'); elseif circlepos < 0.5 set(h, 'VerticalAlignment', 'bottom', 'HorizontalAlignment', 'right'); elseif circlepos == 0.5 set(h, 'VerticalAlignment', 'middle', 'HorizontalAlignment', 'right'); elseif circlepos < 0.75 set(h, 'VerticalAlignment', 'top', 'HorizontalAlignment', 'right'); elseif circlepos == 0.75 set(h, 'VerticalAlignment', 'top', 'HorizontalAlignment', 'center'); elseif circlepos < 1 set(h, 'VerticalAlignment', 'top', 'HorizontalAlignment', 'left'); end end shg;

## Multivariate Methods

These are nice plots, but we can still only really see one variable or observation at a time. We want to get an overview of the process as a whole. In order to achieve this we must turn to multivariate analysis methods.

## Carry out PCA

We can extract a lot more information with a principal component analysis. PCA constructs new variables (*principal components*, or PCs) that are weighted sums of the original variables. Unlike TAI, the weights do not reflect arbitrary ideas of how
important the variables might be, but are constructed so that the new variables capture a maximal proportion of the variance
in the original data. By examining a small number of principal components, you can get an understanding of the process which
is nearly as good as examining all of the original variables.

[loadings, scores, variances, tscores] = princomp(metalstd);

To see how well the PCA model describes the data, we display the cumulative variances accounted for by successive PCs in a pareto plot. When interpreting this plot, we're looking for a sharp bend or 'elbow' in the curve: this would indicate that the components before the elbow represented systematic, interesting variation in the process, and the later components represented noise variation; we could remove those and concentrate on the systematic components. Unfortunately, there's no clear elbow in this plot, so we'll rely on our judgement and keep just the first two components. The 2 component model accounts for 67% of the variation in the data, therefore by examining PCs 1 and 2, we are getting a view of the process which is 67% as good as examining all the original variables. 67% is not huge, however, and one might want to come back later and revisit this decision, perhaps including the first three PCs to capture 76% of the variance.

figure; pareto(variances); xlabel('Principal Component') ylabel('Variance')

## Create a biplot of the PCA scores and loadings.

We can use the PCA model to get a lot more insight into the behaviour of the process, by plotting the PCA scores and loadings on the same plot. Note firstly that all variables have a high loading on PC1, indicating that PC1 reflects the average level of impurity. Note secondly that in PC2, the variables fall into two distinct clusters, implying the existence of two distinct impurity profiles: one containing high levels of Sb, As and Bi (all group 5a elements in the periodic table), and low levels of Ni, Pb, Se, Ag, and Te; the other profile having the reverse pattern. The numbers on the axes do not mean anything in absolute terms.

figure hold on; scatter(scores(:,1), scores(:,2), 5, 'bo'); scale = max((abs(scores(:,1:2)))) ./ max((abs(loadings(:,1:2)))); scatter(loadings(:,1)*scale(1), loadings(:,2)*scale(2), 'ro', 'filled'); for i = 1:numvars line([0,loadings(i,1)*scale(1)],... [0,loadings(i,2)*scale(2)], 'Color', 'r'); text(loadings(i,1)*scale(1)-1, loadings(i,2)*scale(2), varnames{i}); end title('Principal Component Biplot'); xlabel('PC 1'); ylabel('PC 2'); xlim([-17,4]); shg;

## Multivariate control charts

The PCA biplot and contribution plots have given us a lot more insight into the behaviour of the process. They have one disadvantage over the univariate methods considered earlier, however, in that they don't have control limits displayed. We can extract more information from the PCA model, however, which is suitable for displaying in a control chart.

The T2 statistic gives a measure of the distance of a sample from the process mean *within* the plane defined by PC1 and PC2, and the squared PCA residuals give a measure of the distance of a sample *perpendicular to* that plane. A high T2 statistic thus indicates that a sample is exhibiting an extreme variation, but well-accounted for by
the PCA model. A high residual indicates that the sample is exhibiting a form of variation which is not well-accounted for
by the PCA model.

We can plot both T2 and the PCA residual on control charts.

% Calculate T2 scores and critical 95, 99 and 99.9% limits t2 = sum((scores(:,1:2).^2)./repmat(variances(1:2)',numobs,1),2); tcrit = (2*(numobs^2-1)/(numobs*(numobs-2)))*... [finv(0.999,2,numobs-2),... finv(0.99,2,numobs-2),... finv(0.95,2,numobs-2)]; % Calculate squared residuals and critical 95, 99 and 99.9% limits residuals = (metalstd - scores(:,1:2)*loadings(:,1:2)'); dist = sqrt(sum(residuals.^2,2)./(numvars-2)); distcrit = sqrt(sum(sum(residuals.^2))./((numobs-3)*(numvars-2)))*... sqrt([finv(0.999,numvars-2,((numobs-3)*(numvars-2))),... finv(0.99,numvars-2,((numobs-3)*(numvars-2))),... finv(0.95,numvars-2,((numobs-3)*(numvars-2)))]); t2distfig = figure; t2ax = subplot(2,1,1); plot(t2); xlim([0,numobs]); xlabel('Observation Number'); ylabel('T^{2}'); upperlimits = [tcrit(1),tcrit(2)]; lowerlimits = [tcrit(2),tcrit(3)]; colors = {'r', 'y'}; for i = 1:2 patch([0, 0, numobs, numobs],... [upperlimits(i), lowerlimits(i), lowerlimits(i), upperlimits(i)],... colors{i}, 'EdgeColor', 'none', 'FaceAlpha', 0.4); end distax = subplot(2,1,2); plot(dist); xlim([0,numobs]); xlabel('Observation Number'); ylabel('Distance to Model'); upperlimits = [distcrit(1),distcrit(2)]; lowerlimits = [distcrit(2),distcrit(3)]; colors = {'r', 'y'}; for i = 1:2 patch([0, 0, numobs, numobs],... [upperlimits(i), lowerlimits(i), lowerlimits(i), upperlimits(i)],... colors{i}, 'EdgeColor', 'none', 'FaceAlpha', 0.4); end

## Summary of Analysis

We have used MATLAB to process and visualise a continuous process of copper production, creating some custom displays of the data to emphasise relevant properties of the process variation. Using multivariate statistics (principal component analysis), we have built a model of the process which captures the important variation in a small number of variables, enabling monitoring of the process as a whole using a few informative plots, rather than large numbers of graphs of each individual process variable. Using this approach has enabled us to discover patterns of correlation among the variables indicative of different impurity profiles; to recognise outliers, and to classify them by assigning a reason for their abnormal status.

## Saving the PCA model

We can save relevant aspects of the PCA model to a .mat file for later use, and clear up MATLAB.

loadings = loadings(:,1:2); save pcamodel means sds loadings variances scale tcrit distcrit varnames clear all; close all force; clc;

## INTERLUDE: OPC

## The OPC Toolbox

Open Process Control (OPC), also known as OLE for Process Control, is a series of specifications defined by the OPC Foundation (http://www.opcfoundation.org) for supporting open connectivity in industrial automation. OPC uses Microsoft DCOM technology to provide a communication link between OPC servers and OPC clients. OPC has been designed to provide reliable communication of information in a process plant, such as a petrochemical refinery, an automobile assembly line, or a paper mill.

Using the OPC Toolbox, you can acquire live OPC data directly into MATLAB and Simulink, and write data directly to the OPC server from MATLAB and Simulink.

For this demonstration, we need to access an OPC server. Please follow the instructions contained in SettingUpTheIconicsServer.txt.

## Connecting to an OPC server

Using commands from the OPC toolbox, we can connect to an OPC server.

da = opcda('localhost', 'Iconics.SimulatorOPCDA.2'); connect(da); disp(da);

Summary of OPC Data Access Client Object: localhost/Iconics.SimulatorOPCDA.2 Server Parameters Host : localhost ServerID : Iconics.SimulatorOPCDA.2 Status : connected Timeout : 10 seconds Object Parameters Group : 0-by-1 dagroup object Event Log : 0 of 1000 events

We can create a group of variables ('items') for monitoring. Here we use some of the built in variables that come with the Iconics simulation server.

monitorgroup = addgroup(da, 'Monitor Group'); additem(monitorgroup, 'Numeric.Sine'); additem(monitorgroup, 'Logical.Random'); disp(monitorgroup);

Summary of OPC Data Access Group Object: Monitor Group Object Parameters Group Type : private Item : 2-by-1 daitem object Parent : localhost/Iconics.SimulatorOPCDA.2 Update Rate : 0.5 Deadband : 0% Object Status Active : on Subscription : on Logging : off Logging Parameters Records : 120 Duration : at least 60 seconds Logging to : memory Status : Waiting for START. 0 records available for GETDATA/PEEKDATA

We can also read those variables. The data is returned in a structure which contains the values, as well as an ID, a data quality measure, a timestamp, and an error field. We can extract the values from the structure.

data = read(monitorgroup); disp(data(1).Value); disp(data(2).Value);

-0.9479 0

All of these operations can be carried out from the command line, or using the graphical user interface which comes with the OPC toolbox.

```
hopc = opctool; % Demo OPC tool here.
```

Right-click on 'OPC Network', and select 'Add Host'. Enter the hostname as 'localhost'. If you have followed the instructions earlier in SettingUpTheIconicsServer.txt, you should now see a new host 'localhost', with an OPC server 'ICONICS.SimulatorOPCDA2'. Right-click on the server, and select 'Create Client'. In the area 'MATLAB OPC Clients', you should now see a client for this server. It is disconnected (indicated in red on the right hand panel, but clicking 'Connect' will connect you. Back on the left-hand panel, select the server and then click the button from the toolbar 'View Namespace'. This displays a tree view of the signals available on the server. Select 'Numeric'->'Sine', right-click, and Add To New Group. This will add the signal (a dummy sine wave signal that ships with the simulation server) to a signal monitoring group. Selecting the signal from within the OPC client panel will now display the current value of the signal in the data panel. This can be logged to a file, or to a MATLAB variable for further analysis. opctool can also be used to troubleshoot OPC coneections.

## Close the opctool

```
hopc.close;
clear hopc;
```

In SettingUpTheIconicsServer.txt we used the OPC Configuration Tool that comes with the server to set up eight simulation signals, and eight items in the address space of the server. The items were named Metals.Ag, Metals.Ni etc, to correspond to each of the variables in the dataset. Using the OPC toolbox, we can write our data to the variables, and read it back. We'll use this ability in the next section.

disconnect(da); clear all; close all force; clc;

## SECTION II: APPLYING THE MODEL IN AN ONLINE MONITORING TOOL

## A GUI view of the PCA model plots

The file `ProcessMonitor.m` uses the OPC toolbox, and MATLAB's ability to create graphical user interfaces, to build a 'dashboard' view of an online
process. A timer object reads the current value of our eight variables from the OPC server, applies our PCA model, and plots
the results into the GUI. Uncomment the following line and execute it to take a look at the program.

```
% edit ProcessMonitor.m
```

## Utility function

Because we don't have a real live system, just a simulation server, we have a little utility program `WriteToOPC.m`, which cycles through our dataset and periodically writes the next value of our variables to the OPC server. This utility
is entirely disconnected from our GUI; the GUI is reading data directly from the OPC server as if it were real live data.
Uncomment the following line and execute it to take a look at the program.

```
% edit writeToOPC.m
```

## See the GUI in action

Uncomment the following lines to run the GUI. Leave them commented to just show a screenshot of the GUI.

% writeToOPC % ProcessMonitor figure; iptsetpref('ImshowBorder','tight'); imshow('PMscreenshot.png');

## Compiling the GUI

Using the MATLAB Compiler we can produce standalone executable versions of the utility program and the Process Monitor GUI. Quit MATLAB, and run writeToOPC.exe, and then ProcessMonitor.exe.

% mcc -m writeToOPC % mcc -m ProcessMonitor

## SECTION III: DEPLOYING THE MODEL TO A WEBPAGE FOR REMOTE MONITORING

Another application of common interest to those in the process industries is to display historical data, often stored in a
database, via webpage. The file `getCurrentProcessStateAsImage.m` is a function that connects to a MySQL database which is being continuously updated with live process data. It retrieves
the latest set of data, applies our PCA model, and returns the analysis graphs as an image. We compile this function to a
.NET object using MATLAB Builder for .NET, and it can then be called from an ASP.NET webpage written in C#.

% Please note that here we only demonstrate how to code the web application % via MATLAB and C#. There is a little more work to be done in Visual % Studio in order to set the project up as a website and get it running % that we do not document here. % edit getCurrentProcessStateAsImage % mcc -F MPSMdotnetobject.prj % edit GeneratePlot.aspx.cs

## References

Conny Wikström, Christer Albano, Lennart Eriksson, Håkan Fridén, Erik Johansson, Åke Nordahl, Stefan Rännar, Maria Sandberg,
Nouna Kettaneh-Wold and Svante Wold (1998). Multivariate process and quality monitoring applied to an electrolysis process:
Part I. Process supervision with multivariate control charts. *Chemometrics and Intelligent Laboratory Systems* 42(1-2): 221-231.