Starting from a set of training data and a known classification (such as "disease" or "non-disease"), this function builds a three-level Bayesian network from mass spectrometry data. It was designed primarily for finding proteins that are diagnostic of a disease group in a biological sample.
The root node of the Bayesian network is the class variable. The first level below the root contains all features found to have high mutual information with the class variable. The second level contains features that have high mutual information with a first-level feature.
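As a rough illustration of the quantity that drives this structure, the sketch below computes mutual information between two discrete variables and ranks features against a target. It is not the function's own code: the names `mutual_information` and `rank_by_mutual_information` are hypothetical, and the features are assumed to be already discretized.

```python
from collections import Counter
from math import log2


def mutual_information(x, y):
    """Mutual information, in bits, between two equal-length discrete sequences."""
    n = len(x)
    if n == 0 or n != len(y):
        raise ValueError("x and y must be non-empty and of equal length")
    count_x = Counter(x)
    count_y = Counter(y)
    count_xy = Counter(zip(x, y))
    # Sum p(a, b) * log2(p(a, b) / (p(a) * p(b))) over observed pairs only.
    return sum(
        (c / n) * log2(c * n / (count_x[a] * count_y[b]))
        for (a, b), c in count_xy.items()
    )


def rank_by_mutual_information(features, target):
    """Return (name, MI) pairs sorted from most to least informative.

    `features` maps a feature name to its discretized values (hypothetical layout).
    """
    scored = [(name, mutual_information(vals, target)) for name, vals in features.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy data: "a" tracks the class perfectly, "b" is independent of it.
classes = [0, 0, 1, 1]
features = {"a": [0, 0, 1, 1], "b": [0, 1, 0, 1]}
ranked = rank_by_mutual_information(features, classes)
```

Ranking against the class vector would suggest first-level candidates; ranking the remaining features against a first-level feature would suggest its second-level candidates. How "high" is decided (a threshold, a top-k cut, or a significance test) is left open here because the description does not say.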
The input is a vector of class values for some number of spectra, a vector of IDs for those samples, and a data matrix with cases (samples) in rows and features (mass positions in a mass spectrum) in columns. Each value in the matrix is the intensity of the mass spectrum for that sample at that mass position. The values are derived from the spectra via peak-picking and alignment routines that are not included here. Values may be continuous; the function discretizes them automatically.
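To make the data layout concrete, here is a minimal sketch of a samples-by-features matrix being discretized column by column. The binning rule (equal-width bins, with `n_bins` chosen by the caller) is an assumption made for illustration; the description does not state which discretization the function actually uses, and the helper names are hypothetical.

```python
def discretize_column(values, n_bins=3):
    """Map continuous values to integer bin labels 0..n_bins-1 (equal-width bins)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no information; put everything in bin 0.
        return [0] * len(values)
    width = (hi - lo) / n_bins
    # The maximum value would fall just past the last bin, so clamp it.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def discretize_matrix(data, n_bins=3):
    """Discretize each feature (column) of a samples-by-features matrix."""
    columns = list(zip(*data))
    binned = [discretize_column(col, n_bins) for col in columns]
    return [list(row) for row in zip(*binned)]


# Rows are samples, columns are mass positions, values are intensities.
intensities = [
    [0.0, 3.0],
    [1.0, 3.0],
    [9.0, 3.0],
    [10.0, 3.0],
]
binned = discretize_matrix(intensities, n_bins=2)
```

Each feature is binned independently, so intensities at different mass positions do not need to share a scale.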
The function can run many iterations of n-fold cross-validation. The output reports how frequently each of the links described above is discovered, so the stability of the feature set can be estimated. The second-level features are provided in order to discover modifications, satellites, and adducts of the parent protein. The function attempts to combine a second-level feature with its parent feature when the combination provides more mutual information with the class variable.
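One way to read "frequency of discovery" is as the fraction of cross-validation training runs in which a given link is selected. The sketch below follows that reading; it is an assumption about the bookkeeping, not the function's actual implementation. `discovery_frequency` and the `select_links` callback are hypothetical names, with the callback standing in for whatever builds the network from a set of training sample indices.

```python
import random
from collections import Counter


def discovery_frequency(n_samples, select_links, n_folds=5, n_repeats=10, seed=0):
    """Fraction of training runs in which each link is discovered.

    `select_links(train_indices)` must return an iterable of hashable links,
    for example (parent, child) tuples.
    """
    rng = random.Random(seed)
    counts = Counter()
    runs = 0
    for _ in range(n_repeats):
        order = list(range(n_samples))
        rng.shuffle(order)
        # Deal the shuffled indices into n_folds nearly equal folds.
        folds = [order[i::n_folds] for i in range(n_folds)]
        for held_out in folds:
            held = set(held_out)
            train = [i for i in order if i not in held]
            # Count each link at most once per run.
            counts.update(set(select_links(train)))
            runs += 1
    return {link: c / runs for link, c in counts.items()}


def toy_selector(train_indices):
    """Toy stand-in: one link is always found, one only when sample 0 is in training."""
    links = [("class", "m1000")]
    if 0 in train_indices:
        links.append(("class", "m2000"))
    return links


freq = discovery_frequency(10, toy_selector, n_folds=5, n_repeats=4)
```

A link found in every run gets frequency 1.0, while a link that depends on a single sample drops out whenever that sample is held out. Links with low frequency are the ones to treat as unstable.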
The function also reports error rates under cross-validation and provides options for, among other things, averaging replicated samples and normalizing spectra.
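As an illustration of the replicate-averaging option, the sketch below averages the rows of the data matrix that share a sample ID. It assumes replicates are identified by identical entries in the ID vector, which the description does not confirm, and `average_replicates` is a hypothetical name.

```python
def average_replicates(ids, data):
    """Average rows of `data` that share the same ID.

    Returns the unique IDs in first-seen order and one averaged row per ID.
    """
    groups = {}
    for sample_id, row in zip(ids, data):
        groups.setdefault(sample_id, []).append(row)
    unique_ids = list(groups)
    averaged = [
        [sum(col) / len(rows) for col in zip(*rows)]
        for rows in groups.values()
    ]
    return unique_ids, averaged


ids = ["s1", "s1", "s2"]
data = [[1.0, 3.0], [3.0, 5.0], [10.0, 20.0]]
unique_ids, averaged = average_replicates(ids, data)
```

The class vector would need to be collapsed to one label per unique ID in the same order before the averaged matrix is used.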
Updated comments in main function for usability