mspeaks - Convert raw mass spectrometry data to peak list (centroided data)

Syntax

Peaks = mspeaks(MZ, Intensities)

Peaks = mspeaks(MZ, Intensities, ...'Base', BaseValue, ...)
Peaks = mspeaks(MZ, Intensities, ...'Levels', LevelsValue, ...)
Peaks = mspeaks(MZ, Intensities, ...'NoiseEstimator', NoiseEstimatorValue, ...)
Peaks = mspeaks(MZ, Intensities, ...'Multiplier', MultiplierValue, ...)
Peaks = mspeaks(MZ, Intensities, ...'Denoising', DenoisingValue, ...)
Peaks = mspeaks(MZ, Intensities, ...'PeakLocation', PeakLocationValue, ...)
Peaks = mspeaks(MZ, Intensities, ...'FWHHFilter', FWHHFilterValue, ...)
Peaks = mspeaks(MZ, Intensities, ...'OverSegmentationFilter', OverSegmentationFilterValue, ...)
Peaks = mspeaks(MZ, Intensities, ...'HeightFilter', HeightFilterValue, ...)
Peaks = mspeaks(MZ, Intensities, ...'ShowPlot', ShowPlotValue, ...)

Arguments

MZ: Vector of mass/charge (m/z) values for a set of spectra. The number of elements in the vector equals n, the number of rows in matrix Intensities.
Intensities: Matrix of intensity values for a set of mass spectra that share the same mass/charge (m/z) range. Each row corresponds to an m/z value, and each column corresponds to a spectrum or retention time. The number of rows equals n, the number of elements in vector MZ.
BaseValue: Integer between 2 and 20 that specifies the wavelet base. Default is 4.
LevelsValue: Integer between 1 and 12 that specifies the number of levels for the wavelet decomposition. Default is 10.
NoiseEstimatorValue: String or scalar that specifies the method to estimate the threshold, T, to filter out noisy components in the first high-band decomposition (y_h). Choices are:

  • mad — Default. Median absolute deviation, which calculates T = sqrt(2*log(n))*mad(y_h) / 0.6745, where n = the number of rows in the Intensities matrix.

  • std — Standard deviation, which calculates T = std(y_h).

  • A positive real value.

MultiplierValue: Positive real value that specifies the threshold multiplier constant. Default is 1.0.
DenoisingValue: Controls the use of wavelet denoising to smooth the signal. Choices are true (default) or false.

    Note   If your data has previously been smoothed, for example, with the mslowess or mssgolay function, it is not necessary to use wavelet denoising. Set this property to false.

PeakLocationValue: Value that specifies the proportion of the peak height that selects the points used to compute the centroid mass of the respective peak. The value must be ≥ 0 and ≤ 1. Default is 1.0.
FWHHFilterValue: Positive real value that specifies the minimum full width at half height (FWHH), in m/z units, for reported peaks. Peaks with FWHH below this value are not included in the output list Peaks. Default is 0.
OverSegmentationFilterValue: Positive real value that specifies the minimum distance, in m/z units, between neighboring peaks. When the signal is not smoothed appropriately, multiple maxima can appear to represent the same peak. By increasing this filter value, oversegmented peaks are joined into a single peak. Default is 0.
HeightFilterValue: Positive real value that specifies the minimum height for reported peaks. Peaks with heights below this value are not included in the output list Peaks. Default is 0.
ShowPlotValue: Controls the display of a plot of the original and the smoothed signal, with the peaks included in the output matrix Peaks marked. Choices are true, false, or I, an integer specifying the index of a spectrum in Intensities. If set to true, the first spectrum in Intensities is plotted. Default is:
  • false — When return values are specified.

  • true — When return values are not specified.

Return Values

Peaks: Two-column matrix where each row corresponds to a peak. The first column contains mass/charge (m/z) values, and the second column contains ion intensity values.

Description

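Peaks = mspeaks(MZ, Intensities) finds relevant peaks in raw mass spectrometry data, and creates Peaks, a two-column matrix, containing the m/z value and ion intensity for each peak. When Intensities contains more than one spectrum, Peaks is a cell array with one peak list per spectrum, as the cellfun step in the Examples section shows.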

mspeaks finds peaks by first smoothing the signal using an undecimated wavelet transform with Daubechies coefficients, then assigning peak locations, and finally eliminating peaks that do not satisfy the specified criteria.

Peaks = mspeaks(MZ, Intensities, ...'PropertyName', PropertyValue, ...) calls mspeaks with optional properties that use property name/property value pairs. You can specify one or more properties in any order. Each PropertyName must be enclosed in single quotation marks and is case insensitive. These property name/property value pairs are as follows:


Peaks = mspeaks(MZ, Intensities, ...'Base', BaseValue, ...) specifies the wavelet base. BaseValue must be an integer between 2 and 20. Default is 4.

Peaks = mspeaks(MZ, Intensities, ...'Levels', LevelsValue, ...) specifies the number of levels for the wavelet decomposition. LevelsValue must be an integer between 1 and 12. Default is 10.

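Peaks = mspeaks(MZ, Intensities, ...'NoiseEstimator', NoiseEstimatorValue, ...) specifies the method to estimate the threshold, T, to filter out noisy components in the first high-band decomposition (y_h). Choices are:

  • mad (default): Median absolute deviation, which calculates T = sqrt(2*log(n))*mad(y_h) / 0.6745, where n is the number of rows in the Intensities matrix.

  • std: Standard deviation, which calculates T = std(y_h).

  • A positive real value.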

Peaks = mspeaks(MZ, Intensities, ...'Multiplier', MultiplierValue, ...) specifies the threshold multiplier constant. MultiplierValue must be a positive real value. Default is 1.0.
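To make the two threshold estimators concrete, the following sketch evaluates both formulas on synthetic data using only base MATLAB functions. The vector y_h here is hypothetical random noise standing in for the high-band coefficients that mspeaks computes internally, and the multiplier is assumed to scale T linearly; the source does not state exactly how the constant is applied.

```matlab
% Sketch only: y_h is synthetic noise, not real wavelet coefficients.
rng(0);                          % reproducible synthetic data
n   = 1000;                      % number of rows in the Intensities matrix
y_h = 0.5 * randn(n, 1);         % hypothetical high-band coefficients

% Median absolute deviation written out with base MATLAB functions,
% so the example does not depend on a toolbox implementation of mad.
madValue = median(abs(y_h - median(y_h)));

T_mad = sqrt(2*log(n)) * madValue / 0.6745;   % 'mad' estimator
T_std = std(y_h);                             % 'std' estimator

% Assumption: the multiplier constant scales the threshold linearly.
multiplier = 1.5;
T_scaled   = multiplier * T_mad;

fprintf('T_mad = %.3f, T_std = %.3f, scaled T_mad = %.3f\n', ...
        T_mad, T_std, T_scaled);
```

For Gaussian noise the mad-based threshold is noticeably larger than the std-based one because of the sqrt(2*log(n)) factor, so it removes more of the high-band component.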

Peaks = mspeaks(MZ, Intensities, ...'Denoising', DenoisingValue, ...) controls the use of wavelet denoising to smooth the signal. Choices are true (default) or false.

Peaks = mspeaks(MZ, Intensities, ...'PeakLocation', PeakLocationValue, ...) specifies the proportion of the peak height that selects the points used to compute the centroid mass of the respective peak. PeakLocationValue must be a value ≥ 0 and ≤ 1. Default is 1.0.
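The effect of PeakLocationValue can be illustrated with a small sketch on a synthetic, slightly asymmetric peak. The intensity-weighted mean used below is an assumption made for illustration; the source only says that the selected points are used to compute the centroid mass, not how they are weighted.

```matlab
% Synthetic single peak with a shoulder on the high-m/z side.
mz = (1000:0.5:1010)';
y  = exp(-((mz - 1004).^2) / 2) + 0.4 * exp(-((mz - 1006).^2) / 2);

peakLocation   = 0.6;                     % proportion of the peak height
[height, iMax] = max(y);                  % apex of the peak
sel = y >= peakLocation * height;         % points at or above the cutoff

% Assumption: centroid = intensity-weighted mean of the selected m/z values.
centroidMz = sum(mz(sel) .* y(sel)) / sum(y(sel));
apexMz     = mz(iMax);

fprintf('apex at %.2f, centroid at %.3f\n', apexMz, centroidMz);
```

With a value of 1.0 only the apex itself meets the cutoff, so the reported location is the m/z of the maximum. Lower values pull in more of the peak, which shifts the centroid toward the shoulder in this example.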

Peaks = mspeaks(MZ, Intensities, ...'FWHHFilter', FWHHFilterValue, ...) specifies the minimum full width at half height (FWHH), in m/z units, for reported peaks. Peaks with FWHH below this value are not included in the output list Peaks. FWHHFilterValue must be a positive real value. Default is 0.
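As an illustration of what the FWHH criterion measures, the sketch below estimates the full width at half height of a synthetic Gaussian peak by linear interpolation and compares it with a filter value. The interpolation scheme and the variable names are assumptions; the source does not describe how mspeaks measures the width internally.

```matlab
% Synthetic Gaussian peak; its theoretical FWHH is 2*sqrt(2*log(2))*sigma.
mz    = (2000:0.1:2010)';
sigma = 0.5;
y     = 10 * exp(-((mz - 2005).^2) / (2 * sigma^2));

[height, iMax] = max(y);
half = height / 2;

% Walk outwards from the apex to the last samples at or above half height.
iL = iMax;
while iL > 1 && y(iL - 1) >= half
    iL = iL - 1;
end
iR = iMax;
while iR < numel(y) && y(iR + 1) >= half
    iR = iR + 1;
end

% Linearly interpolate the half-height crossing on each side.
% Assumes the signal drops below half height inside the sampled window.
mzL = mz(iL - 1) + (half - y(iL - 1)) * (mz(iL) - mz(iL - 1)) / (y(iL) - y(iL - 1));
mzR = mz(iR) + (half - y(iR)) * (mz(iR + 1) - mz(iR)) / (y(iR + 1) - y(iR));
fwhh = mzR - mzL;

fwhhFilter = 1.0;                 % hypothetical minimum width, in m/z units
keepPeak   = fwhh >= fwhhFilter;  % a peak narrower than the filter is dropped

fprintf('FWHH = %.4f m/z, keep = %d\n', fwhh, keepPeak);
```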

Peaks = mspeaks(MZ, Intensities, ...'OverSegmentationFilter', OverSegmentationFilterValue, ...) specifies the minimum distance, in m/z units, between neighboring peaks. When the signal is not smoothed appropriately, multiple maxima can appear to represent the same peak. By increasing this filter value, oversegmented peaks are joined into a single peak. OverSegmentationFilterValue must be a positive real value. Default is 0.
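The idea of joining oversegmented peaks can be sketched on a small hand-made peak list. The rule used here, keeping the taller of two maxima that lie closer together than the minimum distance, is an assumption chosen for illustration; the source does not specify how mspeaks combines the joined peaks.

```matlab
% Hypothetical peak list sorted by m/z: [m/z, intensity].
peaks   = [1000.0 12; 1000.3 15; 1002.0 8; 1005.0 20; 1005.2 18];
minDist = 0.5;                    % minimum distance between peaks, in m/z units

merged = peaks(1, :);
for k = 2:size(peaks, 1)
    if peaks(k, 1) - merged(end, 1) < minDist
        % Too close to the last kept peak: keep whichever is taller.
        if peaks(k, 2) > merged(end, 2)
            merged(end, :) = peaks(k, :);
        end
    else
        merged(end + 1, :) = peaks(k, :); %#ok<AGROW>
    end
end

disp(merged)
```

Here the maxima at 1000.0 and 1000.3 collapse into one peak, as do those at 1005.0 and 1005.2, leaving three peaks.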

Peaks = mspeaks(MZ, Intensities, ...'HeightFilter', HeightFilterValue, ...) specifies the minimum height for reported peaks. Peaks with heights below this value are not included in the output list Peaks. HeightFilterValue must be a positive real value. Default is 0.
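The same kind of height criterion can also be applied after the fact to existing peak lists. The sketch below uses a small hypothetical cell array of peak lists in place of real mspeaks output, and it assumes that a peak exactly at the minimum height is kept.

```matlab
% Hypothetical peak lists for two spectra: [m/z, intensity] per row.
P = { [1000 3; 1500 12; 2200 7], [900 1; 1800 25] };
minHeight = 5;

% Keep only the rows whose intensity reaches the minimum height.
filtered = cellfun(@(p) p(p(:, 2) >= minHeight, :), P, 'UniformOutput', false);

for k = 1:numel(filtered)
    fprintf('spectrum %d: %d of %d peaks kept\n', ...
            k, size(filtered{k}, 1), size(P{k}, 1));
end
```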

Peaks = mspeaks(MZ, Intensities, ...'ShowPlot', ShowPlotValue, ...) controls the display of a plot of the original and the smoothed signal, with the peaks included in the output matrix Peaks marked. Choices are true, false, or I, an integer specifying the index of a spectrum in Intensities. If set to true, the first spectrum in Intensities is plotted. Default is either:

  • false, when return values are specified.

  • true, when return values are not specified.

Examples

  1. Load a MAT-file, included with the Bioinformatics Toolbox software, which contains mass spectrometry data variables, including MZ_lo_res, a vector of m/z values for a set of spectra, and Y_lo_res, a matrix of intensity values for a set of mass spectra that share the same m/z range.

    load sample_lo_res
    
  2. Adjust the baseline of the eight spectra stored in Y_lo_res.

    YB = msbackadj(MZ_lo_res,Y_lo_res);
    
  3. Convert the raw mass spectrometry data to a peak list by finding the relevant peaks in each spectrum.

    P = mspeaks(MZ_lo_res,YB);
    
  4. Plot the third spectrum in YB, the matrix of baseline-corrected intensity values, with the detected peaks marked.

    P = mspeaks(MZ_lo_res,YB,'SHOWPLOT',3);

  5. Smooth the signal using the mslowess function. Then convert the smoothed data to a peak list by finding relevant peaks and plot the third spectrum.

    YS = mslowess(MZ_lo_res,YB,'SHOWPLOT',3);
    P = mspeaks(MZ_lo_res,YS,'DENOISING',false,'SHOWPLOT',3);

  6. Use the cellfun function to remove all peaks with m/z values less than 2000 from the eight peak lists in output P. Then plot the peaks of the third spectrum (in red) over its smoothed signal (in blue).

    Q = cellfun(@(p) p(p(:,1)>2000,:),P,'UniformOutput',false);
    figure
    plot(MZ_lo_res,YS(:,3),'b',Q{3}(:,1),Q{3}(:,2),'rx')
    xlabel('Mass/Charge (M/Z)')
    ylabel('Relative Intensity')
    axis([0 20000 -5 95])
    

References

[1] Morris, J.S., Coombes, K.R., Koomen, J., Baggerly, K.A., and Kobayashi, R. (2005) Feature extraction and quantification for mass spectrometry in biomedical applications using the mean spectrum. Bioinformatics 21:9, 1764–1775.

[2] Yasui, Y., Pepe, M., Thompson, M.L., Adam, B.L., Wright, G.L., Qu, Y., Potter, J.D., Winget, M., Thornquist, M., and Feng, Z. (2003) A data-analytic strategy for protein biomarker discovery: profiling of high-dimensional proteomic data for cancer detection. Biostatistics 4:3, 449–463.

[3] Donoho, D.L., and Johnstone, I.M. (1995) Adapting to unknown smoothness via wavelet shrinkage. J. Am. Statist. Assoc. 90, 1200–1224.

[4] Strang, G., and Nguyen, T. (1996) Wavelets and Filter Banks (Wellesley, MA: Wellesley-Cambridge Press).

[5] Coombes, K.R., Tsavachidis, S., Morris, J.S., Baggerly, K.A., Hung, M.C., and Kuerer, H.M. (2005) Improved peak detection and quantification of mass spectrometry data acquired from surface-enhanced laser desorption and ionization by denoising spectra with the undecimated discrete wavelet transform. Proteomics 5(16), 4107–4117.

See Also

Bioinformatics Toolbox functions: msbackadj, msdotplot, mslowess, mspalign, msppresample, mssgolay

  


© 1984-2008 The MathWorks, Inc.