Accessing mass spectrometry data using the Proteome Commons IO Library
There are many formats for mass spectrometry data. The Proteome Commons Project (http://www.proteomecommons.org) provides a library of functions for reading many of the widely used formats. In this example we will read an ABI T2D file.
Add the Proteome Commons IO Library to MATLAB
Download the Proteome Commons IO jar files from http://www.proteomecommons.org/current/531
The documentation says that all the T2D support is in the main libraries, but I also had to download the IO-T2D extension from http://www.proteomecommons.org/archive/1118700527203/index.html
javaaddpath('d:\ProteomeCommons\t2d\ProteomeCommons.org-IO-T2D.jar')
javaaddpath('d:\ProteomeCommons\ProteomeCommons.org-IO.jar')
import org.proteomecommons.io.t2d.*
import org.proteomecommons.io.*
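If the classes fail to load later on, a wrong jar path is a likely cause. The following is a minimal sketch, not part of the original example, that checks each jar file exists before adding it and then lists the dynamic Java class path. The jarFiles list simply repeats the two paths used above; the variable names are illustrative and the paths should be adjusted to your own download location.

```matlab
% Paths copied from the example above; adjust to your own install location.
jarFiles = {'d:\ProteomeCommons\t2d\ProteomeCommons.org-IO-T2D.jar', ...
            'd:\ProteomeCommons\ProteomeCommons.org-IO.jar'};

added = false(size(jarFiles));          % records which jars were found
for k = 1:numel(jarFiles)
    if exist(jarFiles{k}, 'file') == 2
        javaaddpath(jarFiles{k});       % add to the dynamic Java class path
        added(k) = true;
    else
        warning('Jar file not found: %s', jarFiles{k});
    end
end

% List the dynamic class path to confirm what was added.
dynamicPath = javaclasspath('-dynamic');
disp(dynamicPath)
```

If any entry of added is false, fix that path before trying to create the reader object.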
Open the file and get the peak list
t2dFile = org.proteomecommons.io.t2d.T2DPeakListReader('D01_LINEAR_1.t2d');
peakList = t2dFile.getPeakList
peakList =
spectrumID:
instrumentID: 347000060
operatingMode: 1
dataFormat: 3
compressionType: 2
startTime: 45904.0
incrementTime: 1.0
timeStamp: 0.0
totalIONCount: 1.38464581E8
basePeakTimeS: 0.0
basePeakIntensity: 0.0
totalShots: 3000
totalAccumulations: 30
defaultCalibrationEquationType: 1
defaultCalibrationNConstants: 3
defaultCalibrationConstant (1): -19.6071105762
defaultCalibrationConstant (2): 9.460726319330329E-7
defaultCalibrationConstant (3): 2.65680681285942E-5
acquisitionCalibrationEquationType: 1
acquisitionCalibrationNConstants: 3
acquisitionCalibrationConstant (1): -19.6071105762
acquisitionCalibrationConstant (2): 9.460726319330329E-7
acquisitionCalibrationConstant (3): 2.65680681285942E-5
locationBounds.xMin: 10891.1598458665
locationBounds.yMin: 31154.7323682203
locationBounds.xMax: 12148.4260626032
locationBounds.yMax: 32290.606124471502
intensityRange.lMin: 4900.0
intensityRange.lMax: 4900.0
flags: 0
1999.9958762835058 9.803921699523926
2000.0830819650498 8.104575157165527
2000.1702895522499 10.326797485351562
2000.2574990451055 15.03268051147461
2000.344710443991 12.549020767211914
2000.4319237481593 13.07189655303955
2000.5191389581712 15.816994667053223
2000.6063560740265 11.895425796508789
.....
Use the methods command to show what methods are available for this object.
methods(peakList)
Methods for class org.proteomecommons.io.t2d.T2DPeakList:
T2DPeakList duplicate equals
getAcquisitionCalibrationConstants getAcquisitionCalibrationEquationType
getAveraged getBasePeakIntensity getBasePeakTimeS getClass
getCompressionType getDataFormat
getDefaultCalibrationConstants getDefaultCalibrationEquationType
getFlags getIncrementTime getInstrumentID getIntensityRange
getLocationBounds getName getOperatingMode getParent
getParentPeakList getPeaks getSpectrumID getStartTime
getTandemCount getTimeStamp getTotalAccumulations getTotalIONCount
getTotalShots hashCode notify notifyAll
setAveraged setBasePeakIntensity setBasePeakTimeS
setCompressionType setDataFormat setDataPoints
setDefaultCalibrationConstants setDefaultCalibrationEquationType
setFlags setInstrumentID setIntensityRange setLocationBounds
setName setOperatingMode setParent setParentPeakList
setPeaks setSpectrumID setTandemCount setTimeStamp
setTotalAccumulations setTotalIONCount setTotalShots
toString wait
Extract the peaks
This creates an array of peaks.
peaks = peakList.getPeaks;
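You can access the information for an individual peak through its methods. In the output below, the charge is displayed as -2.1475e+009, which matches Java's Integer.MIN_VALUE (-2147483648) and presumably means that no charge is recorded for the peak.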
methods(peaks(1))
peaks(1).getCharge
peaks(1).getIntensity
peaks(1).getMassOverCharge
Methods for class org.proteomecommons.io.t2d.T2DPeak:
T2DPeak getIntensity setCharge
compareTo getMassOverCharge setIntensity
equals hashCode setMassOverCharge
getAveraged notify toString
getCharge notifyAll wait
getClass setAveraged
ans =
-2.1475e+009
ans =
9.8039
ans =
2.0000e+003
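The charge value shown above is rounded to five significant digits by the default display format. A quick check, sketched here in plain MATLAB, shows that the smallest 32-bit signed integer (the same value as Java's Integer.MIN_VALUE) is -2147483648, which rounds to -2.1475e+009. That the library uses this value as a "charge not set" marker is an assumption based on the output alone, and isChargeUnset is a hypothetical helper, not part of the library.

```matlab
% Smallest 32-bit signed integer, equal to Java's Integer.MIN_VALUE.
minInt = double(intmin('int32'));
fprintf('%d\n', minInt);            % print the full value without rounding

% Hypothetical helper: treat this value as "charge not set" (an assumption).
isChargeUnset = @(c) c == minInt;

unsetFlag = isChargeUnset(-2147483648);   % the value displayed for peaks(1)
realFlag  = isChargeUnset(2);             % an ordinary charge state
```

With real data you could pass the result of peaks(1).getCharge to the helper instead of a literal value.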
Copy the data into a MATLAB array
Loop over all the peaks and copy the mass/charge and intensity values into preallocated MATLAB arrays.
numPeaks = numel(peaks);
MZ = zeros(numPeaks,1);   % mass/charge values
I = zeros(numPeaks,1);    % intensities
for count = 1:numPeaks
    MZ(count) = peaks(count).getMassOverCharge;
    I(count) = peaks(count).getIntensity;
end
Plot the results
plot(MZ,I)
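A labeled version of the plot is usually easier to read. The sketch below is one possible way to do this and is not part of the original example; the axis label text is my own choice. So that it can run on its own, it falls back to the first three (m/z, intensity) pairs printed in the peak list output when MZ and I are not already in the workspace.

```matlab
% Fall back to the first few pairs from the peak list output
% if MZ and I are not already in the workspace.
if ~exist('MZ', 'var') || ~exist('I', 'var')
    MZ = [1999.9958762835058; 2000.0830819650498; 2000.1702895522499];
    I  = [9.803921699523926; 8.104575157165527; 10.326797485351562];
end

figure
plot(MZ, I)
xlabel('Mass/Charge (m/z)')
ylabel('Intensity')
% Turn off the TeX interpreter so the underscores in the file name
% are not rendered as subscripts.
title('D01_LINEAR_1.t2d', 'Interpreter', 'none')
axis tight
```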