msalign - Align peaks in mass spectrum to reference peaks

Syntax

IntensitiesOut = msalign(MZ, Intensities, RefMZ)

... = msalign(..., 'Weights', WeightsValue, ...)
... = msalign(..., 'Range', RangeValue, ...)
... = msalign(..., 'WidthOfPulses', WidthOfPulsesValue, ...)
... = msalign(..., 'WindowSizeRatio', WindowSizeRatioValue, ...)
... = msalign(..., 'Iterations', IterationsValue, ...)
... = msalign(..., 'GridSteps', GridStepsValue, ...)
... = msalign(..., 'SearchSpace', SearchSpaceValue, ...)
... = msalign(..., 'ShowPlot', ShowPlotValue, ...)
[IntensitiesOut, RefMZOut] = msalign(..., 'Group', GroupValue, ...)

Arguments

MZ

Vector of mass/charge (m/z) values for a spectrum or set of spectra. The number of elements in the vector equals n, the number of rows in the matrix Intensities.

Intensities

Either of the following:

  • Column vector of intensity values for a spectrum, where each row corresponds to an m/z value.

  • Matrix of intensity values for a set of mass spectra that share the same m/z range, where each row corresponds to an m/z value, and each column corresponds to a spectrum.

The number of rows equals n, the number of elements in vector MZ.

RefMZ

Vector of m/z values of known reference masses in a sample spectrum.

    Tip   For reference peaks, select compounds that do not undergo structural transformation, such as phosphorylation. Doing so will increase the accuracy of your alignment and allow you to detect compounds that do exhibit structural transformations among the sample spectra.

WeightsValue

Vector of positive values, with the same number of elements as RefMZ. The default vector is ones(size(RefMZ)).

RangeValue

Two-element vector, in which the first element is negative and the second element is positive, that specifies the lower and upper limits of a range, in m/z units, relative to each peak. No peak will shift beyond these limits. Default is [-100 100].

WidthOfPulsesValue

Positive value that specifies the width, in m/z units, for all the Gaussian pulses used to build the correlating synthetic spectrum. The point of the peak where the Gaussian pulse reaches 60.65% of its maximum is set to the width specified by WidthOfPulsesValue. Default is 10.

WindowSizeRatioValue

Positive value that specifies a scaling factor that determines the size of the window around every alignment peak. The synthetic spectrum is compared to the sample spectrum only within these regions, which saves computation time. The size of the window is given in m/z units by WidthOfPulsesValue * WindowSizeRatioValue. Default is 2.5, which means at the limits of the window, the Gaussian pulses have a value of 4.39% of their maximum.

IterationsValue

Positive integer that specifies the number of refining iterations. At every iteration, the search grid is scaled down to improve the estimates. Default is 5.

GridStepsValue

Positive integer that specifies the number of steps for the search grid. At every iteration, the search area is divided by GridStepsValue^2. Default is 20.

SearchSpaceValue

String that specifies the type of search space. Choices are:
  • 'regular' — Default. Evenly spaced lattice.

  • 'latin' — Random Latin hypercube with GridStepsValue^2 samples.

ShowPlotValue

Controls the display of a plot of an original and aligned spectrum over the reference masses specified by RefMZ. Choices are true, false, or I, an integer specifying the index of a spectrum in Intensities. If set to true, the first spectrum in Intensities is plotted. Default is:
  • false — When return values are specified.

  • true — When return values are not specified.

GroupValue

Controls the creation of RefMZOut, a new vector of m/z values to be used as reference masses for aligning the peaks. This vector is created by adjusting the values in RefMZ, based on the sample data from multiple spectra in Intensities, such that the overall shifting and scaling of the peaks is minimized. Choices are true or false (default).

    Tip   Set GroupValue to true only if Intensities contains data for a large number of spectra, and you are not confident of the m/z values used for your reference peaks in RefMZ. Leave GroupValue set to false if you are confident of the m/z values used for your reference peaks in RefMZ.

Return Values

IntensitiesOut

Either of the following:

  • Column vector of intensity values for a spectrum, where each row corresponds to an m/z value.

  • Matrix of intensity values for a set of mass spectra that share the same mass/charge (m/z) range, where each row corresponds to an m/z value, and each column corresponds to a spectrum.

The intensity values represent a shifting and scaling of the data.

RefMZOut

Vector of m/z values of reference masses, calculated from RefMZ and the sample data from multiple spectra in Intensities, when GroupValue is set to true.

Description

IntensitiesOut = msalign(MZ, Intensities, RefMZ) aligns the peaks in a raw mass spectrum or spectra, represented by Intensities and MZ, to reference peaks, provided by RefMZ. First, it creates a synthetic spectrum from the reference peaks using Gaussian pulses centered at the m/z values specified by RefMZ. Then, it shifts and scales the m/z scale to find the maximum alignment between the input spectrum or spectra and the synthetic spectrum. (It uses an iterative multiresolution grid search until it finds the best scale and shift factors for each spectrum.) Once the new m/z scale is determined, the corrected spectrum or spectra are created by resampling their intensities at the original m/z values, creating IntensitiesOut, a vector or matrix of corrected intensity values. The resampling method preserves the shape of the peaks.
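The steps described above can be illustrated with a minimal sketch in base MATLAB. This is not the msalign implementation: it uses a single regular grid rather than the iterative multiresolution search, it assumes the corrected axis has the form scale*MZ + shift, and all variable names, grid limits, and test data are hypothetical.

```matlab
% Synthetic test data (hypothetical): three peaks displaced from their
% reference masses by a known scale and shift of the m/z axis.
% Uses implicit expansion (R2016b or later).
mz    = (3000:0.5:10000)';
ref   = [4000 6000 9000];                 % reference masses
sigma = 10;                               % pulse width, in m/z units
gauss = @(x, c) sum(exp(-0.5 * ((x - c) ./ sigma).^2), 2);
trueScale = 1.002;
trueShift = -15;
y = gauss(mz, (ref - trueShift) ./ trueScale);   % observed spectrum

% Regular grid of candidate scale and shift factors (illustrative limits).
scales = linspace(0.99, 1.01, 41);
shifts = linspace(-50, 50, 41);
score  = zeros(numel(scales), numel(shifts));
for i = 1:numel(scales)
    for j = 1:numel(shifts)
        % Gaussian pulses at the reference masses, evaluated on the
        % candidate corrected axis, correlated with the observed spectrum.
        s = gauss(scales(i) * mz + shifts(j), ref);
        score(i, j) = sum(y .* s);
    end
end
[~, k] = max(score(:));
[i, j] = ind2sub(size(score), k);
bestScale = scales(i);
bestShift = shifts(j);

% Resample the intensities at the original m/z values.
yOut = interp1(bestScale * mz + bestShift, y, mz, 'linear', 0);
```

In this sketch the grid contains the true factors, so the search recovers them and the peaks in yOut sit at the reference masses. A multiresolution search would instead repeat the loop on a progressively smaller grid centered on the current best estimate.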

... = msalign(..., 'PropertyName', PropertyValue, ...) calls msalign with optional properties that use property name/property value pairs. You can specify one or more properties in any order. Each PropertyName must be enclosed in single quotation marks and is case insensitive. These property name/property value pairs are as follows:


... = msalign(..., 'Weights', WeightsValue, ...) specifies the relative weight for each mass in RefMZ, the vector of reference m/z values. WeightsValue is a vector of positive values, with the same number of elements as RefMZ. The default vector is ones(size(RefMZ)), which means each reference peak is weighted equally, so that more intense reference peaks have a greater effect in the alignment algorithm. If you have a less intense reference peak, you can increase its weight to emphasize it more in the alignment algorithm.

... = msalign(..., 'Range', RangeValue, ...) specifies the lower and upper limits of the range, in m/z units, relative to each peak. No peak will shift beyond these limits. RangeValue is a two-element vector, in which the first element is negative and the second element is positive. Default is [-100 100].

... = msalign(..., 'WidthOfPulses', WidthOfPulsesValue, ...) specifies the width, in m/z units, for all the Gaussian pulses used to build the correlating synthetic spectrum. The point of the peak where the Gaussian pulse reaches 60.65% of its maximum is set to the width specified by WidthOfPulsesValue. Choices are any positive value. Default is 10. WidthOfPulsesValue may also be a function handle. The function is evaluated at the respective m/z values and returns a variable width for the pulses. Its evaluation should give reasonable values between 0 and max(abs(Range)); otherwise, the function returns an error.
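As an illustration of the function-handle form, the following sketch defines a width that grows linearly with m/z and checks that it stays between 0 and max(abs(Range)). The handle widthFcn and the 0.002 factor are hypothetical, and evaluating the handle at the reference masses is an assumption made for the check. The msalign call itself is left as a comment because it requires Bioinformatics Toolbox and the sample data.

```matlab
% Hypothetical width function: pulses widen linearly with m/z.
widthFcn = @(mz) 0.002 * mz;        % illustrative factor only

R     = [3991.4 4598 7964 9160];    % reference masses, in m/z units
range = [-100 100];                 % default RangeValue

% Evaluate the widths and check them against the allowed interval.
w = widthFcn(R);
isValid = all(w > 0 & w < max(abs(range)));

% With the toolbox and sample data available, the call would look like:
% YA = msalign(MZ_lo_res, Y_lo_res, R, 'WidthOfPulses', widthFcn);
```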

... = msalign(..., 'WindowSizeRatio', WindowSizeRatioValue, ...) specifies a scaling factor that determines the size of the window around every alignment peak. The synthetic spectrum is compared to the sample spectrum only within these regions, which saves computation time. The size of the window is given in m/z units by WidthOfPulsesValue * WindowSizeRatioValue. Choices are any positive value. Default is 2.5, which means at the limits of the window, the Gaussian pulses have a value of 4.39% of their maximum.
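The two percentages quoted for the pulses follow from the Gaussian shape. The sketch below assumes that WidthOfPulsesValue acts as the standard deviation of a unit-height Gaussian and that the window limit lies WidthOfPulsesValue * WindowSizeRatioValue from the pulse center. Both assumptions are consistent with the 60.65% and 4.39% figures but are not confirmed by the text.

```matlab
% Unit-height Gaussian pulse centered at 0 with standard deviation sigma.
sigma = 10;                         % default WidthOfPulsesValue, m/z units
ratio = 2.5;                        % default WindowSizeRatioValue
pulse = @(x) exp(-0.5 * (x ./ sigma).^2);

atWidth  = pulse(sigma);            % height one width from the center
atWindow = pulse(ratio * sigma);    % height at the assumed window limit

fprintf('Height at width:        %.2f%%\n', 100 * atWidth);
fprintf('Height at window limit: %.2f%%\n', 100 * atWindow);
```

The first value is exp(-0.5) and the second is exp(-0.5 * 2.5^2), so neither depends on the chosen width.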

... = msalign(..., 'Iterations', IterationsValue, ...) specifies the number of refining iterations. At every iteration, the search grid is scaled down to improve the estimates. Choices are any positive integer. Default is 5.

... = msalign(..., 'GridSteps', GridStepsValue, ...) specifies the number of steps for the search grid. At every iteration, the search area is divided by GridStepsValue^2. Choices are any positive integer. Default is 20.

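... = msalign(..., 'SearchSpace', SearchSpaceValue, ...) specifies the type of search space. Choices are:

  • 'regular' (default): Evenly spaced lattice.

  • 'latin': Random Latin hypercube with GridStepsValue^2 samples.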
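For the 'latin' search space, the idea of a random Latin hypercube with GridStepsValue^2 samples can be sketched in base MATLAB as follows. This is a generic construction, not the sampling code used by msalign, and the shift and scale limits are hypothetical values chosen only to show the mapping.

```matlab
gridSteps = 20;                     % default GridStepsValue
n = gridSteps^2;                    % number of samples

% Split each dimension into n equal strata, assign one sample per stratum
% in random order, and jitter each sample uniformly inside its stratum.
u = zeros(n, 2);
for d = 1:2
    strata  = randperm(n)';         % stratum index for each sample
    u(:, d) = (strata - rand(n, 1)) / n;
end

% Map the unit square to illustrative shift and scale search limits.
shiftLim = [-100 100];              % m/z units, hypothetical
scaleLim = [0.98 1.02];             % dimensionless, hypothetical
shift = shiftLim(1) + u(:, 1) * diff(shiftLim);
scale = scaleLim(1) + u(:, 2) * diff(scaleLim);
```

Unlike an evenly spaced lattice with the same number of points, every sample here has a distinct shift value and a distinct scale value.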

... = msalign(..., 'ShowPlot', ShowPlotValue, ...) controls the display of a plot of an original and aligned spectrum over the reference masses specified by RefMZ. Choices are true, false, or I, an integer specifying the index of a spectrum in Intensities. If set to true, the first spectrum in Intensities is plotted. Default is:

  • false: When return values are specified.

  • true: When return values are not specified.

[IntensitiesOut, RefMZOut] = msalign(..., 'Group', GroupValue, ...) controls the creation of RefMZOut, a new vector of m/z values to be used as reference masses for aligning the peaks. This vector is created by adjusting the values in RefMZ, based on the sample data from multiple spectra in Intensities, such that the overall shifting and scaling of the peaks is minimized. Choices are true or false (default).

Examples

Aligning Mass Spectrum with Three or More Reference Peaks

  1. Load sample data, and define the reference masses and the weight for each reference mass.

    load sample_lo_res
    R = [3991.4 4598 7964 9160];
    W = [60 100 60 100];
    
  2. Display a color image of the mass spectra before alignment.

    msheatmap(MZ_lo_res,Y_lo_res,'markers',R,'range',[3000 10000])
    title('before alignment')
    

  3. Align spectra with reference masses and display a color image of mass spectra after alignment.

    YA = msalign(MZ_lo_res,Y_lo_res,R,'weights',W);
    msheatmap(MZ_lo_res,YA,'markers',R,'range',[3000 10000])
    title('after alignment')
    

Aligning Mass Spectrum with One Reference Peak

Using the msalign function is not recommended if you have only one reference peak. Instead, use the following procedure, which shifts the MZ vector but does not scale it.

  1. Load sample data and view the first sample spectrum.

    load sample_lo_res
    MZ = MZ_lo_res;
    Y = Y_lo_res(:,1);
    msviewer(MZ, Y)
    

  2. Use the tall peak around 4000 m/z as the reference peak. To determine the reference peak's m/z value, select the zoom tool, and then click-drag to zoom in on the peak. Right-click in the center of the peak, and then click Add Marker to label the peak with its m/z value.


  3. Shift a spectrum by the difference between RP, the known reference mass of 4000 m/z, and SP, the experimental mass of 4051.14 m/z.

    RP = 4000;
    SP = 4051.14;
    YOut = interp1(MZ, Y, MZ-(RP-SP));

  4. Plot the original spectrum in red and the shifted spectrum in dotted blue, and zoom in on the reference peak.

    plot(MZ,Y,'r',MZ,YOut,'b:')
    xlabel('Mass/Charge (M/Z)')
    ylabel('Relative Intensity')
    legend('Y','YOut')
    axis([3600 4800 -2 60])
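The shift in step 3 can be checked without the sample data. The sketch below replaces the measured spectrum with a hypothetical single Gaussian peak at 4051.14 m/z and confirms that the same interp1 call moves the peak to 4000 m/z. The axis spacing and peak width are illustrative.

```matlab
% Hypothetical spectrum: one Gaussian peak at the experimental mass.
MZ = (3600:0.1:4800)';
RP = 4000;                          % known reference mass
SP = 4051.14;                       % experimental mass
Y  = exp(-0.5 * ((MZ - SP) ./ 5).^2);

% Same shift as in step 3: sample Y at MZ + (SP - RP).
% Query points beyond the end of MZ return NaN, which max ignores.
YOut = interp1(MZ, Y, MZ - (RP - SP));

[~, iMax] = max(YOut);
shiftedPeak = MZ(iMax);             % location of the peak after the shift
```

Because interp1 returns NaN outside the original m/z range, the last 51.14 m/z of YOut carries no data, so the shifted spectrum is slightly shorter than the original.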

References

[1] Monchamp, P., Andrade-Cetto, L., Zhang, J.Y., and Henson, R. (2007) Signal Processing Methods for Mass Spectrometry. In Systems Bioinformatics: An Engineering Case-Based Approach, G. Alterovitz and M.F. Ramoni, eds. (Artech House Publishers).

See Also

Bioinformatics Toolbox™ functions: msbackadj, msheatmap, mspalign, mspeaks, msresample, msviewer

  

