How to identify repetitive patterns in data

Dhiraj on 28 May 2014
Commented: Star Strider on 1 Jun 2014
Hello. I have 72x5184 samples of a waveform. There are 72 frames of data being transmitted, and each transmitted frame consists of 5184 samples. I know that a known sequence is transmitted at the end of every frame, and that this waveform is contained in the last 576 samples I have captured of every frame. The problem is that this sequence is shifting, and I need to know by how much. So what I'm looking for is a way to identify a pattern in sampled data. The pattern itself is not known to me. When I plot the data I can see it, but it's very painful to keep scanning these samples and trying to locate the repeating pattern visually. Is there a more elegant way of doing this?
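For readers following along, the data layout described in the question can be sketched roughly as below. The variable names and the random stand-in data are assumptions for illustration only; the real capture would replace `raw`.

```matlab
% Minimal sketch with synthetic data; 'raw' is a hypothetical flat capture.
nFrames  = 72;      % frames in the capture (from the question)
frameLen = 5184;    % samples per frame (from the question)
tailLen  = 576;     % samples expected to hold the repeating sequence

raw = randn(1, nFrames*frameLen);            % stand-in for the captured samples

% reshape fills column by column, so build frameLen-by-nFrames and transpose
frames = reshape(raw, frameLen, nFrames).';  % 72-by-5184, one frame per row
tails  = frames(:, end-tailLen+1:end);       % 72-by-576, last 576 samples

disp(size(tails))
```

Each row of `tails` is then one 576-sample window in which the sequence is expected to appear.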
  3 Comments
Dhiraj on 28 May 2014
The sequence is not known, however what is known is that it will repeat itself regularly. Ideally this repetition should be at the end of every frame, but this is not happening and I'm trying to figure out by how much it is shifting.
Image Analyst on 28 May 2014
Can you attach a mat file with your data so we can see it?


Answers (1)

Star Strider on 28 May 2014
Edited: Star Strider on 28 May 2014
For an unknown but invariant sequence located somewhere in each of your 72 576-element vectors, this will locate the approximate mid-point of the sequence in each vector:
% Create data
recnr = 10;                                 % Change to 72
F = rand(recnr,576);                        % Record array
S = [1 2 3 4 5 6 7 8 9 8 7 6 5 4 3 2 1];    % Known sequence
for k1 = 1:size(F,1)                        % Hide sequence randomly in data
    chk(k1) = randi(550);                   % Track start location of sequence
    idx = chk(k1)+[1:17];                   % Define indices of sequence in data
    F(k1,idx) = S;                          % Insert sequence in data
end
% Find sequence in data
for k1 = 1:size(F,1)-1
    for k2 = 1:size(F,2)
        % Note: loc(k1,:) is overwritten on every pass, so only the result
        % for the last shift (k2 = 576, a full rotation of the row) is kept.
        % The row shift k1 has no effect on a single-row vector.
        loc(k1,:) = F(k1,:) - circshift(F(k1+1,:), [k1 k2]);
    end
    % Index of the minimum difference in this record pair
    locidx(k1,:) = find(loc(k1,:) <= min(loc(k1,:)));
end
% Compare the last record with the first one to close the loop
loc(size(F,1),:) = F(size(F,1),:) - circshift(F(1,:), [0 k2]);
locidx(size(F,1),:) = find(loc(size(F,1),:) <= min(loc(size(F,1),:)));
% Realign the indices with their record numbers
locidx = circshift(locidx, [1 0]);
% Plot location of sequence in the data
figure(1)
ribbon(loc')
grid on
% Plot the ‘known’ and ‘discovered’ locations of the sequence
figure(2)
plot(locidx, chk, '+b')
grid
Since you do not know the sequence but only that it is invariant except for position in the 576-element vector in each record, this will locate the approximate mid-position of the sequence in each vector. I chose a particular sequence here for illustration, but also tried it with a random sequence. It works for any sequence of fixed length and invariant pattern.
It does a ‘brute force’ comparison of each pair of vectors, shifting and subtracting one from the next one in the sequence. At the end it does the same comparison on the remaining pair of vectors, and shifts the index vector to align them correctly with respect to their positions in each of your 72 records.
The plots illustrate the idea. Figure 1 shows the location of the sequence in the data, and Figure 2 shows that the ‘known’ and ‘discovered’ initial index locations of the sequence are the same, though offset by about half the length of the sequence. This should be enough for you to determine the sequence location. If you know the sequence length, you can then write code to isolate and identify it.
If I understand what you are asking and the characteristics of your vectors and your constraints, this should do what you want.
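As a separate illustration of the shift-and-subtract idea, here is a minimal sketch that tries every circular shift between two records and keeps the one with the smallest mismatch, then cross-checks it with an FFT-based circular cross-correlation. This is not the code above; the synthetic records, the low noise level, and all variable names are assumptions chosen for the example, and it uses base MATLAB functions only.

```matlab
% Sketch only: synthetic records; all names and values are illustrative.
rng(3);                                       % reproducible example
N  = 576;                                     % record length
S  = [1:9 8:-1:1];                            % example 17-sample pattern
r1 = 0.1*rand(1, N);  r1(200 + (1:17)) = S;   % pattern after sample 200
r2 = 0.1*rand(1, N);  r2(260 + (1:17)) = S;   % same pattern, 60 samples later

% (a) Brute force: try every circular shift, keep the smallest mismatch
err = zeros(1, N);
for k = 0:N-1
    d = r2 - circshift(r1, [0 k]);            % shift r1 right by k samples
    err(k+1) = sum(d.^2);                     % mismatch energy at this shift
end
[~, imin] = min(err);
shiftBrute = imin - 1;                        % circular shift, 0..N-1

% (b) Same question via FFT-based circular cross-correlation
c = real(ifft(fft(r2) .* conj(fft(r1))));
[~, imax] = max(c);
shiftXcorr = imax - 1;                        % circular shift, 0..N-1

fprintf('Brute force: %d, cross-correlation: %d\n', shiftBrute, shiftXcorr);
```

The two estimates should agree, because the mismatch energy at each shift equals the combined energy of the two records minus twice the correlation at that shift. The FFT version avoids the explicit loop. A smooth pattern such as this one gives a broad correlation peak, so with noisier data the estimate may be off by a sample or two.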
  4 Comments
Dhiraj on 1 Jun 2014
The problem I'm facing is that the data is 'sampled' data, so the sample values of the known sequence keep changing while the pattern remains the same. For example, if it's a sine wave, the sampled values could be 0, .25, 1, 0.25, 0, -.25, -1, 0 one time and something totally different the next, but the shape is still a sine. I don't know if I am conveying what I want correctly. In any case, I am attaching the raw samples I captured. You can neglect the first 2 frames, i.e. the first 2x5184 samples. After that, the last 576 samples of every frame should be the same. @Image Analyst, see if you can make any sense of this. Thanks again.
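If the difference between repetitions is mainly a change of gain and DC offset (an assumption; a different sampling phase would not be covered by this), one way to compare shapes rather than raw values is a zero-mean, unit-norm correlation. A minimal sketch with made-up test vectors:

```matlab
% Sketch: zero-mean, unit-norm correlation ignores gain and DC offset.
n  = 0:31;
p1 = sin(2*pi*n/16);              % reference shape (two sine periods)
p2 = 0.25*p1 + 3;                 % same shape, different gain and offset
p3 = cos(2*pi*n/16);              % same frequency, quarter-period shifted

z   = @(v) (v - mean(v)) / norm(v - mean(v));   % normalise a row vector
r12 = z(p1) * z(p2).';            % near 1: same shape
r13 = z(p1) * z(p3).';            % near 0: shapes do not line up
fprintf('r12 = %.3f, r13 = %.3f\n', r12, r13);
```

A score near 1 means the two segments have the same shape regardless of scaling. The same normalisation could be applied to each window before a shift search.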
Star Strider on 1 Jun 2014
I cannot get any useful information at all from your signal. The Fourier transform is essentially flat except for a DC offset, meaning that for all intents and purposes it is a broadband impulse function.
I did a wavelet analysis on it using the Daubechies, Haar, and Mexican Hat wavelets, and it continued to look random.
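For reference, a spectrum check of this kind can be sketched as follows. The sampling frequency, the sine frequency, and the synthetic signals are assumed values for illustration; they are not taken from the attached data.

```matlab
% Sketch: magnitude spectrum with the DC offset removed (synthetic data).
rng(1);
Fs = 1000;                         % assumed sampling frequency, Hz
N  = 576;                          % samples in one record
t  = (0:N-1)/Fs;
x  = 2 + randn(1, N);              % noise with a DC offset
y  = x + 3*sin(2*pi*125*t);        % same noise plus an assumed 125 Hz sine

f  = (0:N/2)*Fs/N;                 % one-sided frequency axis, Hz
X  = abs(fft(x - mean(x)))/N;      % subtracting the mean removes DC
Y  = abs(fft(y - mean(y)))/N;
X  = X(1:N/2+1);                   % keep the non-negative frequencies
Y  = Y(1:N/2+1);

[~, k] = max(Y);                   % strongest spectral component
fprintf('Largest peak at %.1f Hz\n', f(k));
plot(f, X, f, Y), grid on
xlabel('Frequency (Hz)'), ylabel('|FFT| / N')
legend('noise only', 'noise + sine')
```

A sustained sine shows up as a narrow peak above a roughly flat noise floor. A short burst spreads its energy over more bins and is correspondingly harder to see.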
If you know:
  1. Sampling Frequency of your signal (and that it does not change between signals)
  2. Frequency or period of the sine sequence (and that it does not change between signals)
You can probably use filter to find it. (That was my first approach, before posting my ‘brute force’ approach earlier.) However, considering the nature of the signal, without the information on the sine curve there is no way I can see to identify it. The filter approach requires the usual constraints on discrete filtering, the Nyquist frequency and such, so your sine signal has to be sampled at more than twice its own frequency (that is, its frequency must lie below the Nyquist frequency, which is half the sampling frequency).
If you can identify and extract a short segment of your signal that includes your entire sine sequence, along with the sampling times (and units, preferably in seconds), it might be possible to design a filter to detect it. It still must be unique in your signal in order to detect it accurately.
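As a rough sketch of what a filter-based detector could look like once those two quantities are known, the example below uses `filter` with a time-reversed sine template (a matched filter). This is one possible design, not necessarily the one intended above; `Fs`, `f0`, the burst length, and the noise level are all assumed values.

```matlab
% Sketch: locate a sine burst with a matched FIR filter built from the two
% known quantities. Fs, f0 and the burst length are assumed values.
rng(2);
Fs   = 1000;                        % assumed sampling frequency, Hz
f0   = 50;                          % assumed sine frequency, Hz (below Fs/2)
L    = 100;                         % assumed burst length, samples
N    = 576;                         % record length
tmpl = sin(2*pi*f0*(0:L-1)/Fs);     % template of the expected burst

x = 0.1*randn(1, N);                % background noise
start = 301;                        % true first sample of the burst
x(start:start+L-1) = x(start:start+L-1) + tmpl;

% Matched filter: FIR coefficients are the time-reversed template
yf = filter(fliplr(tmpl), 1, x);
[~, iend] = max(yf);                % output peaks where the burst ends
found = iend - L + 1;               % estimated first sample of the burst
fprintf('Burst starts near sample %d\n', found);
```

The filter output has secondary peaks one sine period on either side of the main one, so the estimate becomes ambiguous as the noise level rises or the burst gets shorter.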
