samplealign

Align peaks in signal to reference peaks

Syntax

[I,J]
= samplealign(X,Y)

[I,J] = samplealign(X,X,Y,Name,Value)

Description

[I,J] = samplealign(X,Y) aligns the observations in two matrices of data, X and Y, by introducing gaps.

example

[I,J] = samplealign(X,X,Y,Name,Value) modifies the behavior of samplealign using one or more Name=Value arguments. For example, you can get a plot of relevant statistics by setting ShowNetwork=true.

example

Examples

collapse all

Recover Nonlinear Warping between Two Signals with Noisy Gaussian Peaks

Open Live Script

Create two signals with noisy Gaussian peaks.

rng('default')
peakLoc = [30 60 90 130 150 200 230 300 380 430];
peakInt = [7 1 3 10 3 6 1 8 3 10];
time = 1:450;
comp = exp(-(bsxfun(@minus,time,peakLoc')./5).^2);
sig_1 = (peakInt + rand(1,10)) * comp + rand(1,450);
sig_2 = (peakInt + rand(1,10)) * comp + rand(1,450);

Define a nonlinear warping function.

wf = @(t) 1 + (t<=100).*0.01.*(t.^2) + (t>100).*...
     (310+150*tanh(t./100-3));

Warp the second signal to distort it.

sig_2 = interp1(time,sig_2,wf(time),'pchip');

Align the observations between the two signals by introducing gaps.

[i,j] = samplealign([time;sig_1]',[time;sig_2]',...
                    'weights',[0,1],'band',35,'quantile',.5);

Plot the reference signal, distorted signal, and warped (corrected) signal.

figure
sig_3 = interp1(time,sig_2,interp1(i,j,time,'pchip'),'pchip');
plot(time,sig_1,time,sig_2,time,sig_3)
legend('Reference','Distorted Signal','Corrected Signal')
title('Non-linear Warping Example')

Figure contains an axes object. The axes object with title Non-linear Warping Example contains 3 objects of type line. These objects represent Reference, Distorted Signal, Corrected Signal.

Plot the real and the estimated warping functions.

figure
plot(time,wf(time),time,interp1(j,i,time,'pchip'))
legend('Distorting Function','Estimated Warping')

Figure contains an axes object. The axes object contains 2 objects of type line. These objects represent Distorting Function, Estimated Warping.

Input Arguments

collapse all

`X` — Data containing sequential observations
real matrix

Data containing sequential observations, specified as a real matrix. Rows of X correspond to observations or samples, and columns correspond to features or dimensions. The first column is the reference dimension, and must contain unique values in ascending order. The reference dimension could contain sample indices of the observations or a measurable value, such as time.

Data Types: double

`Y` — Data containing sequential observations
real matrix

Data containing sequential observations, specified as a real matrix. Rows of Y correspond to observations or samples, and columns correspond to features or dimensions. The first column is the reference dimension and must contain unique values in ascending order. The reference dimension could contain sample indices of the observations or a measurable value, such as time.

Data Types: double

Name-Value Arguments

collapse all

Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Example: samplealign(X,Y,ShowNetwork=true) plots the resulting dynamic programming statistics after computing the alignment

`Band` — Maximum allowable distance between observations along the reference dimension
`Inf` (default) | positive scalar | function handle

Maximum allowable distance between observations along the reference dimension, specified as a positive scalar, Inf, or a function handle.

Band limits the number of potential matches between observations in two data sets. If S is the value in the reference dimension for a given observation (row) in one data set, then that observation is matched only with observations in the other data set whose values in the reference dimension fall within S ± Band. Only these potential matches are passed to the algorithm for further scoring.

If you specify Band as a function handle, samplealign sequentially passes the midpoint between each of the reference observations (scalars) x(i) in one data set and a given reference observation y(j) in the other data set. samplealign checks whether

|x(i) – y(j)| ≤ band((x(i) + y(j))/2)

and if so, the observations X(i,:) and Y(j,:) are potential matches.

The Band constraint reduces the time and memory complexity of the algorithm from O(MN) to O(sqrt(MN)*K), where M and N are the number of observations in X and Y respectively, and K is a small constant such that K<<M and K<<N. Adjust Band to the maximum expected shift between the reference dimensions in the two data sets, that is, between X(:,1) and Y(:,1).

Note

If you specify both Band and Width, only pairs of observations that meet both constraints are considered potential matches and passed to the algorithm for scoring.

Tip

Specify Width when you do not have a good estimate for Band. To get an indication of the memory required to run the algorithm with specific Band and Width parameters on your data sets, run samplealign, but do not specify return values and set ShowConstraints to true.

Example: 15

Data Types: double | function_handle

`Distance` — Function to calculate distance between pairs of observations that are potential matches
`@(r,s)sqrt(sum((r-s).^2,2))` (default) | function handle of the form `@(r,s)function(r,s)`

Function to calculate distance between pairs of observations that are potential matches, specified as a function handle of the form @(r,s)function(r,s). The function accepts r and s as matrices of the same size whose paired rows represent all potential matches of observations in X and Y respectively. The function returns a column vector of nonnegative distance values with the same number of rows as r and s.

Distance uses all columns in X and Y, including the reference dimension, when calculating distances. If you do not want to include the reference dimension in the distance calculations, use the Weights name-value argument to exclude it.

Data Types: function_handle

`Gap` — Penalty for an observation being matched to a gap in the other data
`1` (default) | nonnegative scalar | function handle | cell array of two function handles or a mix of function handles and nonnegative scalars

Penalty for an observation being matched to a gap in the other data set, specified as one of the following.

Cell array of two function handles, {@(x)G(x),@(y)H(y)}. Each of the functions G and H calculate the penalty for an observation (row) when it is matched to a gap in the other data set. These functions return a column vector of penalty values with the same number of rows as X or Y.
Single function handle @(x)Z(x) uses Z as both functions in the cell array syntax: {@(x)Z(x),@(y)Z(y)}.
The two-element cell array can contain a scalar value instead of a function handle for either or both entries. In this case, the scalar value is used as the penalty for a row being matched to gap.
A single nonnegative scalar value is used as the penalty for all rows matched to a gap.

Gap specifies the position-dependent terms for assigning gap penalties. The calculated value, GPX, is the gap penalty for matching observations from the first data set X to gaps inserted in the second data set Y, and is the product of two terms: GPX = G * QMS. The term G takes its value as a function of the observations in X. Similarly, GPY is the gap penalty for matching observations from Y to gaps inserted in X, and is the product of two terms: GPY = H * QMS. The term H takes its value as a function of the observations in Y. By default, the term QMS is the 0.75 quantile of the score for the pairs of observations that are potential matches (that is, pairs that comply with the Band and Width constraints).

Note

Gap defaults to a relatively safe value. However, the success of the algorithm depends on the fine tuning of the gap penalties, which is application dependent. When the gap penalties are large relative to the score of the correct matches, samplealign returns alignments with fewer gaps, but with more incorrectly aligned regions. When the gap penalties are smaller, the output alignment contains longer regions with gaps and fewer matched observations. Set ShowNetwork to true to compare the gap penalties to the score of matched observations in different regions of the alignment.

Example: 10

Data Types: double | cell | function_handle

`Quantile` — Quantile value for `Gap`
`0.75` (default) | scalar between `0` and `1`

Quantile value used to calculate QMS for Gap, a scalar between 0 and 1.

Example: 0.5

Data Types: double

`ShowAlignment` — Indication to plot sample matching
`false` (default) | `true` | positive inteer

Indication to plot sample matching, specified as false (do not plot), true (plot the data for column 2), or a positive integer that is the index of the column of data to plot. The plot uses the reference dimension as the horizontal axis, the selected index as the vertical axis, and shows links between all the potential matches that meet the constraints, the potential and selected matches along with the data from X and Y.

See Algorithms.

Example: true

Data Types: logical

`ShowConstraints` — Indication to plot `Band` and `Width` data
`false` (default) | `true`

Indication to plot Band and Width data, specified as false (do not plot) or true (plot the data). Use the plot to estimate the memory required to run the algorithm with specific 'Band' and 'Width' on your data set.

See Algorithms.

Example: true

Data Types: logical

`ShowNetwork` — Indication to plot dynamic programming statistics
`false` (default) | `true`

Indication to plot dynamic programming statistics, specified as false (do not plot) or true (plot the statistics). The statistics include:

Gap scores
Match penalties
Winning path plot
Dynamic programming network

See Algorithms.

Example: true

Data Types: logical

`Weights` — Relative weight data columns
`true(size(X,2))` (default) | logical row vector | nonnegative row vector

Relative weight of data columns in X and Y, specified as a nonnegative or logical row vector. A 0 or false value means samplealign does not use that column when calculating the Distance between observations that are potential matches. Positive values multiply the corresponding distance scores for the associated columns.

Tip

Setting some Weights values to 0 can speed the distance calculation when the data sets have many columns (features).

The weight values do not affect computations using the Band or Width name-value arguments, or gap penalty computations using the Gap name-value argument.

Data Types: double | logical

`Width` — Number of potential entries in other data set to score
`10` (default) | two-element positive vector | positive scalar

Number of potential entries in other data set to score, specified as a two-element positive vector or a positive scalar.

Two-element positive vector [u,v] — Each element in X is scored to the closest u observations in Y. Each element in Y is scored to the closest v observations in X. Then, only these potential matches are passed to the algorithm for further scoring. Closeness is measured using only the first column (the reference dimension) in each data set.
Positive scalar z — The two-element algorithm is used with the vector [z,z].

The Width constraint reduces the time and memory complexity of the algorithm from O(MN) to O(sqrt(MN)*sqrt(UV)), where M and N are the number of observations in X and Y respectively, and U and V are small constants such that U<<M and V<<N.

Note

If you specify both Band and Width, only pairs of observations that meet both constraints are considered potential matches and passed to the algorithm for scoring.

Tip

Note

If you provide a Band name-value argument, the default value of Width is Inf.

Example: [10,20]

Data Types: double

Output Arguments

collapse all

`I` — Indices of the matches for each row (observation) in `X`
integer column vector

Indices of the matches for each row (observation) in X, returned as an integer column vector.

`J` — Indices of the matches for each row (observation) in `Y`
real vector

Indices of the matches for each row (observation) in Y, returned as an integer column vector.

Algorithms

samplealign uses a dynamic programming algorithm to minimize the sum of positive scores resulting from pairs of observations that are potential matches and the penalties resulting from the insertion of gaps. Return values I and J are column vectors containing indices that indicate the matches for each row (observation) in X and Y respectively. For the algorithmic complexity, see Band.

If you do not specify the return values I and J, samplealign does not run the dynamic programming algorithm. Running samplealign without return values, but setting the ShowConstraints, ShowNetwork, or ShowAlignment name-value arguments to true, lets you explore the constrained search space, the dynamic programming network, or the aligned observations, without running into potential memory problems.

References

[1] Myers, C.S. and Rabiner, L.R. (1981). A comparative study of several dynamic time-warping algorithms for connected word recognition. The Bell System Technical Journal 60:7, 1389–1409.

[2] Sakoe, H. and Chiba, S. (1978). Dynamic programming algorithm optimization for spoken word recognition. IEEE Trans. Acoustics, Speech and Signal Processing ASSP-26(1), 43–49.

Version History

Introduced in R2007b

samplealign

Syntax

Description

Examples

Recover Nonlinear Warping between Two Signals with Noisy Gaussian Peaks

Input Arguments

`X` — Data containing sequential observations
real matrix

`Y` — Data containing sequential observations
real matrix

Name-Value Arguments

`Band` — Maximum allowable distance between observations along the reference dimension
`Inf` (default) | positive scalar | function handle

`Distance` — Function to calculate distance between pairs of observations that are potential matches
`@(r,s)sqrt(sum((r-s).^2,2))` (default) | function handle of the form `@(r,s)function(r,s)`

`Gap` — Penalty for an observation being matched to a gap in the other data
`1` (default) | nonnegative scalar | function handle | cell array of two function handles or a mix of function handles and nonnegative scalars

`Quantile` — Quantile value for `Gap`
`0.75` (default) | scalar between `0` and `1`

`ShowAlignment` — Indication to plot sample matching
`false` (default) | `true` | positive inteer

`ShowConstraints` — Indication to plot `Band` and `Width` data
`false` (default) | `true`

`ShowNetwork` — Indication to plot dynamic programming statistics
`false` (default) | `true`

`Weights` — Relative weight data columns
`true(size(X,2))` (default) | logical row vector | nonnegative row vector

`Width` — Number of potential entries in other data set to score
`10` (default) | two-element positive vector | positive scalar

Output Arguments

`I` — Indices of the matches for each row (observation) in `X`
integer column vector

`J` — Indices of the matches for each row (observation) in `Y`
real vector

Algorithms

References

Version History

See Also

Topics

samplealign

Syntax

Description

Examples

Recover Nonlinear Warping between Two Signals with Noisy Gaussian Peaks

Input Arguments

X — Data containing sequential observations real matrix

Y — Data containing sequential observations real matrix

Name-Value Arguments

Band — Maximum allowable distance between observations along the reference dimension Inf (default) | positive scalar | function handle

Distance — Function to calculate distance between pairs of observations that are potential matches @(r,s)sqrt(sum((r-s).^2,2)) (default) | function handle of the form @(r,s)function(r,s)

Gap — Penalty for an observation being matched to a gap in the other data 1 (default) | nonnegative scalar | function handle | cell array of two function handles or a mix of function handles and nonnegative scalars

Quantile — Quantile value for Gap 0.75 (default) | scalar between 0 and 1

ShowAlignment — Indication to plot sample matching false (default) | true | positive inteer

ShowConstraints — Indication to plot Band and Width data false (default) | true

ShowNetwork — Indication to plot dynamic programming statistics false (default) | true

Weights — Relative weight data columns true(size(X,2)) (default) | logical row vector | nonnegative row vector

Width — Number of potential entries in other data set to score 10 (default) | two-element positive vector | positive scalar

Output Arguments

I — Indices of the matches for each row (observation) in X integer column vector

J — Indices of the matches for each row (observation) in Y real vector

Algorithms

References

Version History

See Also

Topics

`X` — Data containing sequential observations
real matrix

`Y` — Data containing sequential observations
real matrix

`Band` — Maximum allowable distance between observations along the reference dimension
`Inf` (default) | positive scalar | function handle

`Distance` — Function to calculate distance between pairs of observations that are potential matches
`@(r,s)sqrt(sum((r-s).^2,2))` (default) | function handle of the form `@(r,s)function(r,s)`

`Gap` — Penalty for an observation being matched to a gap in the other data
`1` (default) | nonnegative scalar | function handle | cell array of two function handles or a mix of function handles and nonnegative scalars

`Quantile` — Quantile value for `Gap`
`0.75` (default) | scalar between `0` and `1`

`ShowAlignment` — Indication to plot sample matching
`false` (default) | `true` | positive inteer

`ShowConstraints` — Indication to plot `Band` and `Width` data
`false` (default) | `true`

`ShowNetwork` — Indication to plot dynamic programming statistics
`false` (default) | `true`

`Weights` — Relative weight data columns
`true(size(X,2))` (default) | logical row vector | nonnegative row vector

`Width` — Number of potential entries in other data set to score
`10` (default) | two-element positive vector | positive scalar

`I` — Indices of the matches for each row (observation) in `X`
integer column vector

`J` — Indices of the matches for each row (observation) in `Y`
real vector