Technical Articles and Newsletters

# Developing an IoT Analytics System with MATLAB, Machine Learning, and ThingSpeak

By Robert S. Mawrey, MathWorks

The combination of smart connected devices with data analytics and machine learning is enabling a wide range of applications, from home-grown traffic monitors to sophisticated predictive maintenance systems and futuristic consumer products (like the Amazon Echo and the Google Nest).

While the potential of Internet of Things (IoT) is virtually limitless, designing IoT systems can seem daunting, requiring a complex web infrastructure and multidomain expertise.

This article shows how you can prototype and deploy an IoT system with data analytics without developing custom web software or servers. The workflow is based on MATLAB® and ThingSpeak™, an analytic IoT platform that can run MATLAB code on demand in the cloud. To illustrate the workflow we’ll create a tidal forecasting system that boaters can use to predict the effect of wind on water levels.

Water depth varies with the tides, but it is also significantly influenced by the strength, duration, and direction of the wind. Forecasting wind-driven water levels typically requires sophisticated hydrodynamic models as well as detailed knowledge of the shape of the local bay and ocean floor. NOAA and other organizations use these resources to forecast water depth in major harbors, but minor ports and bays cannot justify the expense involved.

The system we’ll be developing is based on neural networks and low-cost hardware devices rather than computationally intensive hydrodynamic models and complex web infrastructure, providing economically feasible tidal forecasting to minor ports and bays.

The code used in this example is available for download.

## Tidal Wind Surge Forecasting Example

This example is based on data captured in a bay in Cape Cod, Massachusetts (Figure 1).

Figure 1. Tide forecasting system.

MATLAB code reads wind and tide data from ThingSpeak and other online data sources and runs tidal prediction and neural network algorithms that forecast the tide surge and generate an on-demand tide surge forecast plot [Figure2].

Figure 2. Wind-driven tide forecast plot based on MATLAB machine learning and ThingSpeak. The plot shows the actual measured tide, the astronomical tide forecast (discounting wind), the wind-driven tide forecast predicted using the neural networks (astronomical forecast plus wind), and the residual (the difference between actual or wind-driven forecasts and the astronomical forecast).

We’ll complete the workflow for developing the system in these five steps:

1. Collect historical data
2. Analyze the data
3. Develop predictive algorithms
4. Deploy the analytics to the cloud
5. Perform on-demand data analytics and visualization

## Step 1: Collecting Data

We begin by collecting historical tide-level data using ThingSpeak and a low-cost tide gauge. The tide gauge firmware is written in C code and uses open-source ThingSpeak communication libraries for embedded devices published on GitHub.

## Step 2: Analyzing Historical Data

We download the historical data from ThingSpeak to MATLAB running on the desktop, where we can inspect and clean the data to remove noise, outliers, missing or erroneous values, and other anomalies.

% Download the tide data using a custom function
[tideTime,tideRangemm] = readAllMyTideData();
% Convert the range to water depth from the mud
disttomud = 3449; %Measured distance from the gauge to the mud.
depthFeet = (disttomud - tideRangemm) / 25.4 / 12;
[tideTime,duptimeindex] = unique(tideTime);
depthFeet = depthFeet(duptimeindex);
depthFeetNoisy = depthFeet;
% Remove outliers (could also use movmedian)
depthFeet = hampel(depthFeet,121,2);
depthFeet = hampel(depthFeet,11,1);


We plot the data to verify that the outliers and other anomalies have been removed (Figure 3).

Figure 3. Plot showing data with noise and outliers removed.

Tide levels are measured relative to the mean lower low water (MLLW), or lowest level during the lunar day. We calculate LLW levels using the findpeaks function in Signal Processing Toolbox™.

%Invert the data to use findpeaks to find the valleys
negdepthFeet = -depthFeet;
dur = duration(15,0,0);
% Use findpeaks
[lowerLowWater,lowerLowUTC] = findpeaks(negdepthFeet,
tideTime,'MinPeakProminence',10/12,'MinPeakDistance',dur);
lowerLowWater = -lowerLowWater;
MLLWmud = mean(lowerLowWater);
MLLW = 0;
% Tide level in feet relative to MLLW
feetMLLW = (depthFeet-MLLWmud);


We plot the lower low water values to inspect the results of the calculation (Figure 4).

Figure 4. Calculation of mean lower low water (MLLW).

We use the cleaned tidal data to forecast future tide levels with UTide, sophisticated tidal analysis functions available on File Exchange. Using UTide we can calculate the phase and amplitude of the many sinusoidal functions that form the tide at a particular location.

% Predict Tides. Downsample to avoid memory issues.
sampledTide = feetMLLW(1:15:end);
sampledTime = tideTime(1:15:end);

MashpeeLatitude = 41.6483;
% Calculate the tidal coefficients using UTide’s ut_solv function
tideCoefWithSurge = ut_solv(datenum(sampledTime),sampledTide,[],...
MashpeeLatitude,'auto','notrend','DiagnPlots');

% Use ut-reconstr to forecast the astronomical tides at the same times as the
% measuredtides
forecastedFeetMLLW = ut_reconstr(datenum(tideTime),tidecoefNoSurge);

% Calculate the residual error
residualFeetMLLW = feetMLLW - forecastedFeetMLLW;
residualFeetMLLWhourly =
interp1(datenum(tideTime),residualFeetMLLW,datenum(tideTimeHourly));
residualFeetMLLWhourly = fillgaps(residualFeetMLLWhourly);


Comparison of the astronomical forecast and the measured data reveals a residual error that varies with time (Figure 5).

Figure 5. Residual error between measured tides and the astronomical forecast.

We notice that the residual error is proportional to the strength of the wind: As the wind blows, the water is pushed in or out of the bay. When the wind blows hard from certain directions, it can raise or lower the water level by as much as a foot (Figure 6).

Figure 6. Wind cross-correlation with water level.

Clearly we need to take wind strength into account. In the next section we’ll describe how to forecast the wind’s effect on the water level.

## Step 3: Developing Predictive Algorithms

After trying both Fitting and NARX neural networks with various sets of input data, we determine that the NARX network works well for near-term predictions and the Fitting network is effective for both near- and long-term predictions.

Data inputs to the neural networks include historical and forecasted wind, measured tide levels, and forecasted astronomical tide levels. We design and test the neural networks using the Neural Net Fitting and Neural Net Time Series apps built into Neural Network Toolbox™.

% Use the readNOAABuoy function to read the data from NOAA buoy 44020
% readNOAABuoy(buoyNumber,years)
[buoyTime,buoyWSPD,buoyWDIR,buoyPRES]=readNOAABuoy(44020,1);

tideTimeHourly.TimeZone = 'UTC';
windSpeedResampled = interp1(buoyTime,buoyWSPD,tideTimeHourly,'linear','extrap');
windDirResampled = interp1(buoyTime,buoyWDIR,tideTimeHourly,'linear','extrap');
windSpeedSquared = windSpeedResampled.^2;

% Prepare the data to create arrays to train a NARX neural network
tideForecastHourly = ut_reconstr(datenum(tideTimeHourly),tidecoefNoSurge);
stressN = (windSpeedResampled.^2).*cosd(windDirResampled);
stressE = (windSpeedResampled.^2).*sind(windDirResampled);
neuralInputNarx = [tideForecastHourly,stressN,stressE];
neuralOutputNarx = tideResampled - MLLWmud;

neuralInputNarxTrain = neuralInputNarx(300:4950,:);
neuralOutputNarxTrain = neuralOutputNarx(300:4950);


We use the neural network app to train the NARX network (Figure 7).

Figure 7. Training a NARX network using the Neural Time Series app.

We use the Neural Time Series tool to train the NARX network and generate the following code:

% Solve an Autoregression Problem with External Input with a NARX Neural
% Network
% Script generated by Neural Time Series app

X = tonndata(neuralInputNarxTrain,false,false);
T = tonndata(neuralOutputNarxTrain,false,false);

% Choose a Training Function
trainFcn = 'trainlm';  % Levenberg-Marquardt backpropagation.

% Create a Nonlinear Autoregressive Network with External Input
inputDelays = 1:24;
feedbackDelays = 1:24;
hiddenLayerSize = 9;
net = narxnet(inputDelays,feedbackDelays,hiddenLayerSize,'open',trainFcn);

% Choose Input and Feedback Pre/Post-Processing Functions
net.inputs{1}.processFcns = {'removeconstantrows','mapminmax'};
net.inputs{2}.processFcns = {'removeconstantrows','mapminmax'};

% Prepare the Data for Training and Simulation
[x,xi,ai,t] = preparets(net,X,{},T);

% Setup Division of Data for Training, Validation, Testing
net.divideFcn = 'dividerand';  % Divide data randomly
net.divideMode = 'time';  % Divide up every sample
net.divideParam.trainRatio = 70/100;
net.divideParam.valRatio = 15/100;
net.divideParam.testRatio = 15/100;

% Choose a Performance Function
net.performFcn = 'mse';  % Mean Squared Error

% Train the Network
[net,tr] = train(net,x,t,xi,ai);
netNarx = net;


We use the Neural Fitting app to similarly train a two-layer feed-forward network.

We evaluate the performance of the neural network during wind surge events by comparing the NARX network performance to the Fitting network for varying surge levels (Figure 8).

Figure 8. Fitting and NARX neural network performance as a function of water surge.

We see that the NARX multi-step ahead provides lower mean error than the Fitting network for predictions of moderate surge up to 12 hours ahead. For forecasts 24 or more hours ahead, the NARX network underperforms the Fitting network.

To forecast the tide levels with wind for the first 12 hours, we use the NARX network. Between hours 12 and 24 we linearly interpolate the results of the NARX and Fitting network, and after 24 hours we use the Fitting network.

We plot the results and see that the forecasts fit the actual levels well during wind-surge events (Figures 9 and 10).

Figure 9. Plot showing how the neural network forecast matches the actual level during storm winds that raise the water level.

Figure 10. Plot showing how the neural network forecast matches the actual level during storm winds that reduce the water level.

## Step 4: Deploying Analytics to the Cloud

We write a MATLAB script using the Fitting and NARX neural network and deploy it to ThingSpeak to forecast and display measured and predicted tidal levels (Figure 11).

Figure 11. ThingSpeak MATLAB visualization interface showing MATLAB code.

## Step 5: Perform On-demand Data Analytics and Visualization

MATLAB code generates the on-demand tide surge forecast plot (Figure 12).

Figure 12. Wind-driven tide forecast plot based on MATLAB machine learning and ThingSpeak.Inputs to the neural networks include forecasted wind speed and direction from the National Weather Service, recent wind speed and direction from NOAA, and the forecasted astronomical tide calculated using UTide.

We use MATLAB code to read the data from ThingSpeak using the thingSpeakRead function and from the other sources using websave. We combine the data using the timetable function and run the neural networks to generate the forecast.

We have the option to make the plot publicly available or keep it private, and the plot can easily be embedded in a custom website.

## Summary

We have demonstrated prototyping and deploying sophisticated IoT analytics without web development or the deployment of web infrastructure. We found that the ability to use MATLAB functions and tools such as Neural Network Toolbox apps reduces the overall effort and makes it possible for a single engineer to implement a working IoT analytics system without specialized web development skills or highly specialized statistics and machine learning knowledge.

This IoT analytics workflow can easily be applied to other data analytics and machine learning applications such as predictive maintenance or power load forecasting.

Published 2016 - 93072v00