How to remove artifacts from a signal

Hello,
I would like to remove artifacts from a signal, typically a lack of information in a thermocouple signal due to a shortcut. In fact I don't want to calculate the moving average because it would alter the "good information" and extrapolate data that don't exist. But I want to detect wrong information (artifacts) based on this moving average, and replace it by NaN or 0.
I tried to code it, but it is not efficient at all (based on for loops), and I am sure I am not the first one to do this ;)
Thanks a lot for your help!
Max

 Accepted Answer

Hello guys,
Tanks a lot for all tour help. But : - Image Analyst, your way of doing delete to much information around the artifact - Star Rider, your way is too "binary", the artifact does not necessarily have an absolute value (threshold) - Cedric, your way seems efficient, but I don't understand it :)
So I was inspired by all of your solution and made my own that is based on a moving average, that allows me to apply a threshold as a percentage :
function signal = deleteArtifactMax(signal, threshold, interpolation)
global samplingFrequency maxInterpolationDuration;
signal(isnan(signal)) = 0;
time = repmat([1:size(signal,1)]',1,size(signal,2));
movingAverage = (...
signal(1:end-9,:) + ...
signal(2:end-8,:) + ...
signal(3:end-7,:) + ...
signal(4:end-6,:) + ...
signal(5:end-5,:) + ...
signal(6:end-4,:) + ...
signal(7:end-3,:) + ...
signal(8:end-2,:) + ...
signal(9:end-1,:) + ...
signal(10:end,:))/10;
isValid = abs(signal(5:end-5,:) - movingAverage) <= movingAverage*threshold;
notValid = logical([zeros(4,size(signal,2)); ~isValid; zeros(5,size(signal,2))]);
signal(notValid) = NaN;
signal(signal == 0) = NaN;
timeCut = time .* notValid;
for i = 1:size(signal,2)
timeCutCell{i} = find(timeCut(:,i) ~= 0);
end
if interpolation == true
for i = 1:size(signal,2)
if ~isempty(timeCutCell{1,i})
% Delete too long interpolation
iNotToInterpolate = [];
dTimeCutCell = [0; diff(timeCutCell{:,i})];
k=0;
for j=1:size(dTimeCutCell)
if dTimeCutCell(j) == 1
k=k+1;
else
if k > maxInterpolationDuration*60/samplingFrequency
iNotToInterpolate = [iNotToInterpolate j-k:j-1];
end
k=0;
end
end
% Selection of points to interpolate
iToInterpolate = setdiff(timeCutCell{1,i},timeCutCell{1,i}(iNotToInterpolate));
interpTime = time(:,i);
interpTime(iToInterpolate) = [];
interpSignal = signal(:,i);
interpSignal(iToInterpolate) = [];
% Interpolation
signal(iToInterpolate,i) = interp1(interpTime, interpSignal, iToInterpolate, 'linear');
end
end
end
end
Thanks a lot for your help and for your time.
Max

1 Comment

This looks much simpler than my 4 lines, no doubt ;-) You could have asked, that's why the forum is for!
In short, I flag places where the derivative is larger (in absolute value) than any possible valid value for the smooth signal that you have, which defines the positions where you signal goes down or up abruptly. This generates a vector of logicals like
(abs( d ) > 10) ):
0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0
----------------- -------
glitch 1 glitch 2
but what I want ultimately is a vector of logicals that flags also the interior of glitches, so I can select them (and e.g. replace the content). One simple way to do this is to multiply by the minus sign of the derivative (which is -1 when going down and +1 when going up):
-sign( d ) .* (abs( d ) > 10):
0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 -1 0 0 0 0 0 0 1 0 0 -1 0 0 0 0 0
------------------ --------
glitch 1 glitch 2
and to compute a cumulative sum:
cumsum( -sign( d ) .* (abs( d ) > 10) ):
0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0
---------------- ------
glitch 1 glitch 2
Then convert to logical and use it or its complement depending what we want to index.
Note that you may need to enlarge a little the width of the selected zones to be on the safe side and this can be done using a basic convolution (I use a random vector of flags to illustrate):
flags = [0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0] ;
extFlags = conv( flags, [1 1 1], 'same' ) > 0 ;
Displaying flags and extFlags respectively, we can see what this did:
0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0
0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 1 1 1 1 1 0 0 0 0 0

Sign in to comment.

More Answers (4)

Cedric
Cedric on 24 Sep 2017
Edited: Cedric on 24 Sep 2017
Why not simply:
d = [0; diff( signal )] ;
isValid = ~logical( cumsum( -sign( d ) .* (abs( d ) > 10) )) ;
then e.g. set invalid entries to NaN
signalCut = signal ;
signalCut(~isValid) = NaN ;
or interpolate:
t = 1 : numel( signal ) ;
signalClean = interp1( t(isValid), signal(isValid), t ) ;
This is a quick technical trick that you have to tweak and check visually. It is not robust (in comparison to the two other answers), but if all you want to do is a one shot elimination of the glitches ..
With that you get:
subplot( 3, 1, 1) ; plot( signal, 'b', 'LineWidth', 2 ) ;
grid ; title( 'Signal' ) ;
subplot( 3, 1, 2 ) ; plot( isValid, 'r', 'LineWidth', 2 ) ;
grid ; title( '"is valid"' ) ;
subplot( 4, 1, 3) ; plot( signalCut, 'g', 'LineWidth', 2 ) ;
grid ; title( 'Cut signal' ) ;
subplot( 4, 1, 4) ; plot( signalClean, 'm', 'LineWidth', 2 ) ;
grid ; title( 'Cleaned signal' ) ;
Try this:
D = load('signal.mat');
signal = D.signal;
t = (0:length(signal)-1)'; % Create Time Vector
p = polyfit(t, signal, 2); % Polynomial Fit For Detrending
dtrnd_sig = signal - polyval(p, t); % Detrend
t2 = t; % Temporary Time Vector
signal(dtrnd_sig < -100) = []; % Set Signal ‘Dips’ To ‘Empty’
t2(dtrnd_sig < -100) = []; % Set Corresponding Time Of ‘Dips’ To ‘Empty’
signal_new = interp1(t2, signal, t, 'linear'); % Interpolate To Fill Missing Data
figure(1)
plot(t,signal_new,'-r')
There may be other ways to accomplish this objective. I used linear interpolation, 'spline' is another option you may want to experiment with, as is the Signal Processing Toolbox resample function.

7 Comments

My usual approach to problems such as yours is to design a frequency-selective filter, probably a lowpass filter. That is probably the best solution for your signal, since the signal itself appears to have a very low frequency component, with the ‘dropouts’ being high frequency.
I would have done this for your signal, however discrete filter design requires either a corresponding time vector, or a known sampling frequency (assuming a regularly-sampled signal).
The filter approach is likely the best option.
I can design and test a filter for your signal if you provide the sampling frequency of your signal, and a typical complete signal rather than a sample.
Hey Star Strider,
I am curious to see how will treat this with a lowpass filter. You will find enclosed 2 full example signals. The sampling frequency is 1hz.
Max
This was something of an adventure! My filter idea didn’t work as well as I would have liked, so I went with the interpolation strategy.
The Code
filename = 'signalUpdate.mat';
D = load(filename);
signal = D.signal;
N = size(signal,1);
t = (0:N-1)'; % Create Time Vector (sec)
nnan = nnz(isnan(signal)); % Number Of ‘NaN’ Values (Diagnostic Only)
nzeo = nnz(signal == 0); % Number Of Zero Values (Diagnostic Only)
figure(1)
plot(t, signal) % Time Domain Plot
grid
nr_sig = size(signal,2);
for k1 = 1:nr_sig
Csignal{k1} = signal(~isnan(signal(:,k1)),k1); % Create Cell Arrays For Signals
Ct{k1} = t(~isnan(signal(:,k1))); % Time Vector Cell Array
end
Threshold = 50;
for k1 = 1:nr_sig
Msignal = Csignal{k1}; % Create Copy
MTime = Ct{k1}; % Create Copy
Mask = Msignal < Threshold;
Msignal(Mask) = []; % Set Signal ‘<Threshold’ To ‘[]’
MTime(Mask) = []; % Set Time ‘<Threshold’ To ‘[]’
CIsignal{k1} = interp1(MTime, Msignal, Ct{k1}, 'linear'); % Interpolate
end
figure(2)
for k1 = 1:nr_sig
subplot(nr_sig, 1, k1)
plot(Ct{k1}, CIsignal{k1})
grid
end
This creates a copy of the original data in cell array ‘Csignal’, and then interpolates it to create a new cell array ‘CIsignal’ with the discontinuities removed. The result in ‘CIsignal’ appears to be what you want. The ‘Ct’ cell arrays are the corresponding time vectors for ‘Csignal’ and ‘CIsignal’.
The cell arrays have the advantage here of containing only the data, and not the NaN values at the end (apparently to create equal-length vectors for the matrix).
My code is robust to the data you provided. I assume they are representative of all your data. If so, it should be robust to your other data as well.
Note that the ‘nnan’ and ‘nzro’ are for information only and are not necessary for the code to run. I used them to learn some of the characteristics of your data. You can delete them. I left them in my posted code in the event you need them.
You can create a function from my code if you want to. See the documentation on Function Basics (link) for details. I will help as needed.
Max, let's remove the ambiguity. Do you want to
  1. "replace it by NaN or 0" which you asked for, OR
  2. fix/repair the bad places by interpolation, which might be more useful?
#1 will either show gaps (no plotting) if you replaced by nan, or go (plot) all the way down to the x axis if you replaced by 0. #2 will plot in the "bad regions" a smooth signal. I did #1 while Star did #2. Which way do you want it, Max?
Actually, my code eliminates the artefacts first, (creating ‘Msignal’ and the corresponding ‘MTime’ that are temporary and could be saved if desired), then uses those to do the interpolation.
There is nothing statistically ‘wrong’ with interpolating data if there are abrupt discontinuities due to ‘dropped’ data, and the underlying trend is neither significantly noisy nor changing rapidly. Those are true for the data presented, so interpolation is acceptable here.
Hi,
Thanks you Star Strider. This solution seems efficient. In fact, I did not wanted to interpolate but given the good result, I will activate or not this option based on the signal I am treating.
Thanks a lot to you both!
Max
Our pleasure!
If you do not want to interpolate, change this part of my code to output the non-interpolated data and time vectors as ‘SplitSignal’:
for k1 = 1:nr_sig
Msignal = Csignal{k1}; % Create Copy
MTime = Ct{k1}; % Create Copy
Mask = Msignal < Threshold;
Msignal(Mask) = []; % Set Signal ‘<Threshold’ To ‘[]’
MTime(Mask) = []; % Set Time ‘<Threshold’ To ‘[]’
CIsignal{k1} = interp1(MTime, Msignal, Ct{k1}, 'linear'); % Interpolate
SplitSignal{k1} = [MTime, Msignal]; % Store Time & Signal Without Interpolation
end

Sign in to comment.

Image Analyst
Image Analyst on 23 Sep 2017
Edited: Image Analyst on 23 Sep 2017
You can detect "bad" elements either by computing the MAD and then thresholding to identify them, and then replacing them with zero or NAN (whichever you want)
badIndexes = abs(signal - conv(signal, ones(1, windowWidth)/windowWidth, 'same')) > 10;
signal(badIndexes) = nan; % or 0

2 Comments

Two lines where I blur and subtract are too complex? Let me make it longer where I make intermediate signals where you can see what's going on. Bad locations are nans, as you requested, so those points are not plotted so that's why there are "gaps" in the curve, but that's what you asked for.
s = load('signal.mat')
signal = s.signal;
subplot(4, 1, 1);
plot(signal, 'b-');
grid on;
% Now fix the signal.
windowWidth = 231;
kernel = ones(1, windowWidth) / windowWidth;
blurredSignal = conv(signal, kernel, 'same');
subplot(4, 1, 2);
plot(blurredSignal, 'b-');
grid on;
mad = abs(signal - blurredSignal);
subplot(4, 1, 3);
plot(mad, 'b-');
grid on;
badIndexes = mad > 10;
fixedSignal = signal; % Initialize.
fixedSignal(badIndexes) = nan; % or 0
subplot(4, 1, 4);
plot(fixedSignal, 'b-');
grid on;
Hi Image Analyst,
Thanks for the update. In fact I missed the "/ windowWidth" and I used too narrow windows. But know it works perfectly.
Max

Sign in to comment.

maxroucool
maxroucool on 23 Sep 2017
Hello,
Thank you very much to you both!
I tried both solutions : The first one is easy to understand and work well on the sample I provided. But in fact, I gave only the beginning of the curve. So the polyfit does not fit well enough the whole curve. I tried various polynomial order but it just don't fit...
The second one is more complex to understand. I checked what is the mathematical function used and I think I understood. However, I tried for several "windowWidth" but none seems to work... Any idea on how to make it work?
Thanks a lot,
Max

1 Comment

@maxroucool: Please post comments in the section for comments. It is not clear, to which answer this comment belongs.

Sign in to comment.

Categories

Asked:

on 23 Sep 2017

Edited:

on 1 Oct 2017

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!