Bioinformatics Toolbox Release Notes

R2016a

New Features, Bug Fixes, Compatibility Considerations

featurecount: Summarize sequence reads for large NGS datasets

The `featurecount` function counts the reads from next-generation sequencing data that map to genomic features of interest, using SAM and GTF files as input. You can summarize the reads at different feature levels, such as exons, transcripts, or genes. The function supports stranded sequencing protocols and both single-end and paired-end read counting. It offers several options for handling multiply mapped reads, suitable for both RNA-Seq and ChIP-Seq data analysis, as well as different methods for read (or fragment) disambiguation.

Functionality being changed or removed

| Functionality | What Happens When You Use This Functionality? | Use This Instead | Compatibility Considerations |
| --- | --- | --- | --- |
| The `Color` field of a structure for the `RowLabelsColor` and `ColumnLabelsColor` properties of `HeatMap` and `clustergram` objects | Still runs | Not applicable | Independent color support for each row or column label has been removed. If there are multiple colors, the default color (black) is used as the label text color. Set `LabelsWithMarkers` to `true` for colored markers instead of colored text. For details, see `HeatMap object`. |
| `featuresparse` | Warns | `featureparse` | The function has been renamed to `featureparse`. |
| `featuresmap` | Warns | `featureview` | The function has been renamed to `featureview`. |
| `affyinvarsetnorm` | Still runs | Not applicable | The function has been updated to realign with the behavior of `R2012b` or earlier releases. |
| `getTranscripts` and `getGenes` methods of the `GTFAnnotation` class | Still runs | Not applicable | These two methods have been updated to improve the computation time. |

R2015b

Bug Fixes, Compatibility Considerations

Functionality being changed or removed

| Functionality | What Happens When You Use This Functionality? | Use This Instead | Compatibility Considerations |
| --- | --- | --- | --- |
| `cleave` | Still runs | Not applicable | When cleaving a sequence using trypsin, the function now applies trypsin's exception rules by default. As a result, the default output may differ from earlier releases. To prevent the use of the default exceptions for trypsin, use an empty string as the exception rule when you run `cleave`. To see the exception rules, check the table listed in `cleavelookup`. |
| `knnimpute` | Still runs | Not applicable | The function now errors if the number of nearest neighbors (`k`) is not a positive scalar integer. |
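The `cleave` change can be sketched as follows. The peptide string is made up for illustration, and the sketch assumes that the exception rule is supplied through the `'Exception'` name-value pair; check the `cleave` reference page for your release before relying on it.

```matlab
% Hypothetical peptide containing a K-P bond, a typical trypsin exception site.
seq = 'MGAKPLRSTKVWR';

% Default call: in R2015b and later, trypsin's exception rules are applied.
fragsDefault = cleave(seq, 'trypsin');

% Empty exception rule (assumed syntax): suppresses the default exceptions,
% which approximates the behavior of earlier releases.
fragsNoException = cleave(seq, 'trypsin', 'Exception', '');

disp(fragsDefault)
disp(fragsNoException)
```

Comparing the two outputs on your own sequences is a quick way to see whether the new default affects existing results.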

Bug Fixes

R2014b

New Features, Bug Fixes, Compatibility Considerations

Small sample unpaired hypothesis tests for count data

You can perform an unpaired hypothesis test for count data (from high-throughput sequencing assays such as RNA-Seq or ChIP-Seq) with small numbers of samples or replicates using `nbintest`. For instance, you can use this function to decide if an observed difference in read counts between two conditions is significant for a given gene. The function assumes read counts follow a negative binomial or Poisson distribution.
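A minimal sketch of such a test on synthetic counts is shown below. The data, the spiked-in fold change, and the choice of the `'VarianceLink'` value `'Identity'` (taken here to correspond to the Poisson assumption) are illustrative assumptions, not recommendations; `poissrnd` comes from Statistics and Machine Learning Toolbox.

```matlab
rng(0);                                   % reproducible synthetic data
nGenes = 500;
mu = 50 + 200*rand(nGenes, 1);            % hypothetical per-gene mean counts

% Three replicates per condition; rows are genes, columns are samples.
X = poissrnd(repmat(mu, 1, 3));           % condition 1
Y = poissrnd(repmat(mu, 1, 3));           % condition 2

% Spike a 4-fold change into the first 10 genes of condition 2.
Y(1:10, :) = poissrnd(repmat(4*mu(1:10), 1, 3));

% Unpaired test; 'Identity' is assumed to select the Poisson variance model.
testResult = nbintest(X, Y, 'VarianceLink', 'Identity');
p = testResult.pValue;

fprintf('Genes with p < 0.01: %d of %d\n', sum(p < 0.01), nGenes);
```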

Functions for navigating the Gene Transfer Format (GTF) hierarchy to assist with alternative gene splicing and isoform analyses

The following functions of the `GTFAnnotation` class help you navigate the GTF information hierarchy to perform alternative gene splicing and isoform analyses:

• `getSegments` returns a table of nonoverlapping segments built by flattening the transcripts.

• `getGenes` returns a table of unique genes referenced by exons.

• `getTranscripts` returns a table of unique transcripts referenced by exons.

• `getExons` returns a table of exons.

Attractor metagene algorithm for feature engineering using mutual information-based clustering

The `metafeatures` function uses the attractor metagene algorithm, which is an unsupervised learning algorithm for feature engineering using mutual information-based learning.

Functionality being changed or removed

| Functionality | What Happens When You Use This Functionality? | Use This Instead | Compatibility Considerations |
| --- | --- | --- | --- |
| `knnclassify` | Still runs | `fitcknn` | Use `fitcknn` to fit a k-nearest neighbor (KNN) classification model, and classify data using the `predict` function of the `ClassificationKNN` object. |
| Default values for the `knn` classifier of `randfeatures` | Still runs | | When you specify `'knn'` as the classifier, `randfeatures` now uses the following new defaults. The default function is `fitcknn`. For the `'ClassOptions'` name-value pair argument, the defaults are `{'Distance','correlation','NumNeighbors',5}`. For the `'PerformanceThreshold'` name-value pair argument, the default is `0.7`. For the `'ConfidenceThreshold'` name-value pair argument, the default is `1`. |
| The `'Type'` name-value pair argument of `gethmmtree` | Warns | To download the `'seed'` tree, use `gethmmtree` without any extra input arguments. To obtain the `'full'` tree, you can use the `gethmmalignment` function to download the `'full'` alignment and build a tree using the `seqpdist` and `seqneighjoin` functions. | Setting `'Type'` to `'seed'` or `'full'` is now ignored because the PFAM database no longer provides trees for the `'full'` alignment. |
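The move from `knnclassify` to `fitcknn` might look like the sketch below. The training data are synthetic and the variable names are hypothetical; the default Euclidean distance is used here rather than the `randfeatures` correlation default, because the constant-valued test rows in this toy example have no defined correlation.

```matlab
rng(1);                                        % reproducible synthetic data

% Two well-separated classes: 20 samples x 4 features each.
Xtrain = [randn(20, 4) + 2; randn(20, 4) - 2];
Ytrain = [ones(20, 1); 2*ones(20, 1)];
Xnew   = [2 2 2 2; -2 -2 -2 -2];               % one query point near each class

% Old call: labels = knnclassify(Xnew, Xtrain, Ytrain, 5);

% New approach: fit a ClassificationKNN model, then classify with predict.
mdl    = fitcknn(Xtrain, Ytrain, 'NumNeighbors', 5);
labels = predict(mdl, Xnew);

disp(labels)
```

Fitting the model once and calling `predict` repeatedly avoids passing the training set on every classification call.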

R2014a

Bug Fixes, Compatibility Considerations

Functionality being changed or removed

| Functionality | What Happens When You Use This Functionality? | Use This Instead | Compatibility Considerations |
| --- | --- | --- | --- |
| `'R2012b'` name-value pair input argument for the `seqalignviewer` function | Warns | The default version of `seqalignviewer` runs more robustly than the previous version (`R2012b`) and is the recommended version. This name-value pair is intended only for customers who need the previous version. | See the Compatibility Considerations subheading in Select and move behaviors in the Sequence Alignment app. |
| `bowtieread` function | Errors | If you have the original FASTQ (or FASTA) file, use the `bowtie` function (for UNIX® and Mac users only) to remap your files. This creates BAM files that are compatible with the toolbox. If you have old BOWTIE files without the sequence files, you can read the files using `textscan`. | When using other BOWTIE mapper/aligner programs, set the appropriate options to create either a SAM or BAM output file. Then use the `BioMap` object or the `samread` or `bamread` function to access the mapped short reads. |

R2013b

Bug Fixes, Compatibility Considerations

Functionality being changed or removed

| Functionality | What Happens When You Use This Functionality? | Use This Instead | Compatibility Considerations |
| --- | --- | --- | --- |
| `Index` name-value pair argument as input to the `bamread` function | Errors | | Remove instances of the `Index` name-value pair argument. See the Compatibility Considerations subheading in Increased performance when reading BAM files. |
| `'average'` as a choice for the `Method` input argument to the `seqneighjoin` function | Errors | `'equivar'` | Replace instances of `'average'` as an input to `seqneighjoin` with `'equivar'`. |
| Changes to tool names: `multialignviewer`, `phytreetool`, and `seqtool` | Errors | | Replace instances of `multialignviewer` with `seqalignviewer`. Replace instances of `phytreetool` with `phytreeviewer`. Replace instances of `seqtool` with `seqviewer`. |
| `'natural'` as a choice for the `Output` name-value pair input argument to the `affyrma`, `affygcrma`, and `rmasummary` functions | Errors | `'linear'` | Replace instances of `'natural'` as the value of the `Output` name-value pair input argument with `'linear'` for these functions. |
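As an illustration of the `seqneighjoin` change, the sketch below builds a small neighbor-joining tree with `'equivar'`. The sequences and leaf names are invented, and the `seqpdist` options shown are just one reasonable choice.

```matlab
% Hypothetical short nucleotide sequences, for illustration only.
seqs  = {'ACGTACGTAC', 'ACGTTCGTAC', 'ACGAACGAAC', 'TCGAACGAAC'};
names = {'s1', 's2', 's3', 's4'};

% Pairwise distances; 'p-distance' is the proportion of differing sites.
D = seqpdist(seqs, 'Method', 'p-distance', 'Alphabet', 'NT');

% Old call (errors as of R2013b): tree = seqneighjoin(D, 'average', names);
tree = seqneighjoin(D, 'equivar', names);

% Confirm that the tree has one leaf per input sequence.
numLeaves = get(tree, 'NumLeaves');
disp(numLeaves)
```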

R2013a

New Features, Bug Fixes, Compatibility Considerations

Saving to FASTQ, FASTA, SAM, and BAM files from a `BioMap` object

You can write the information in any `BioRead` or `BioMap` object to a file using the `write` function of the object.

Sorting unordered BAM files using `BioMap` objects

You can pass an unordered BAM file to a `BioMap` constructor, which then creates a new ordered file.

Quality control plots for unmapped short-read data

You can obtain quality control plots for short-read data using the `plotSummary` function of the `BioRead` object. The function creates a figure containing six plots that present summary statistics of the data stored in a FASTQ file.

In addition, you can use the `BioReadQualityStatistics` object to:

• Parse FASTQ files without creating a `BioRead` object.

• Interact with the quality data to compare different data sets or filtering options.

• Create customized plots.

Select and move behaviors in the Sequence Alignment app

You can select a block from aligned sequences and move it horizontally if gaps are available.

Compatibility Considerations

To use the previous version of `seqalignviewer`, set the name-value pair argument `'R2012b'` to `true`.

Random access of annotation object data, for consistency with BioMap object data access

You can have random access to data in `GFFAnnotation` and `GTFAnnotation` objects by using these functions:

• `getSubset`

• `getData`

• `getIndex`

Functionality being changed

| Functionality | What Happens When You Use This Functionality? | Use This Instead | Compatibility Considerations |
| --- | --- | --- | --- |
| `bowtieread` function | Warns | `bowtie` function for UNIX and Mac users. | When using other BOWTIE mapper/aligner programs, set the appropriate options to create either a SAM or BAM output file. Then use the `BioMap` object or the `samread` or `bamread` function to access the mapped short reads. |
| `'natural'` as a choice for the `OutputValue` name-value pair input argument to the `affyrma`, `affygcrma`, and `rmasummary` functions | Warns | `'linear'` | Replace instances of `'natural'` as the value of the `OutputValue` name-value pair input argument with `'linear'` for these functions. |
| `'R2012b'` name-value pair input argument for the `seqalignviewer` function | Still runs | | See the Compatibility Considerations subheading in Select and move behaviors in the Sequence Alignment app. |

R2012b

New Features, Bug Fixes, Compatibility Considerations

Multiple reference sequences in `BioMap` objects

You can now store information about short reads mapped to multiple references in a `BioMap` object. The new `SequenceDictionary` property contains the catalog of references available in a `BioMap` object.

Compatibility Considerations

For `BioMap` objects created using R2012b:

• The `Reference` property is now a cell string of length `obj.NSeqs`, for both `BioMap` objects with multiple references in the `SequenceDictionary` and objects with only one reference. For `BioMap` objects created before R2012b—which can only have a single reference—the `Reference` property is a string.

• `BioMap` methods that access data by genomic ranges now accept `BioMap` objects with multiple references. To use these methods, you must specify the reference or references to operate on. The affected methods are:

Mapping single and paired-end short read data to reference genomes

Two new functions generate an index and map short reads to a reference sequence using the Burrows-Wheeler transform.

Note: `bowtiebuild` and `bowtie` run on Mac and UNIX platforms only.

Increased performance when reading BAM files

The `bamread` function no longer requires the `Index` name-value pair argument to provide index information from a structure in the MATLAB® workspace. Indexing happens automatically without a decrease in performance.

Compatibility Considerations

The `Index` name-value pair argument as input to the `bamread` function will be removed in a future release. There is no need to replace it, only remove it.

Name changes for `multialignviewer`, `phytreetool`, and `seqtool` tools

Three tools in Bioinformatics Toolbox™ are renamed. The old names return a warning and will be removed in a future release.

Compatibility Considerations

The choice of `'average'` for the `Method` input argument to the `seqneighjoin` function warns and will be removed in a future release. It is replaced with `'equivar'`.

Functionality being changed or removed

| Functionality | What Happens When You Use This Functionality? | Use This Instead | Compatibility Considerations |
| --- | --- | --- | --- |
| `Index` name-value pair argument as input to the `bamread` function | Warns | | Remove instances of the `Index` name-value pair argument. See the Compatibility Considerations subheading in Increased performance when reading BAM files. |
| `'average'` as a choice for the `Method` input argument to the `seqneighjoin` function | Warns | `'equivar'` | Replace instances of `'average'` as an input to `seqneighjoin` with `'equivar'`. |
| Changes to tool names: `multialignviewer`, `phytreetool`, and `seqtool` | Warns | | Replace instances of `multialignviewer` with `seqalignviewer`. Replace instances of `phytreetool` with `phytreeviewer`. Replace instances of `seqtool` with `seqviewer`. |
| `'natural'` as a choice for the `Output` name-value pair input argument to the `affyrma`, `affygcrma`, and `rmasummary` functions | Still runs | `'linear'` | Replace instances of `'natural'` as the value of the `Output` name-value pair input argument with `'linear'` for these functions. |
| `setName`, `getName`, and `getNSeqs` methods of `BioRead` and `BioMap` objects | Still runs | Dot notation | Replace instances of `setName(BioObj, name)` with `BioObj.Name = name`. Replace instances of `getName(BioObj)` with `BioObj.Name`. Replace instances of `getNSeqs(BioObj)` with `BioObj.NSeqs`. |
| `isequalwithequalnans` for DataMatrix object | Still runs | `isequaln` | Replace instances of `isequalwithequalnans` with `isequaln`. |
| `princomp` for DataMatrix object | Still runs | `pca` | Replace instances of `princomp` with `pca`. |
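The `isequaln` and `pca` replacements can be sketched as follows. For simplicity the example uses plain numeric arrays with made-up values; the release note applies the same replacements to `DataMatrix` inputs. `pca` requires Statistics and Machine Learning Toolbox.

```matlab
A = [1 NaN 3];
B = [1 NaN 3];

% Old call: tf = isequalwithequalnans(A, B);
tf = isequaln(A, B);          % NaN values are treated as equal to each other

% For comparison, isequal treats NaN as unequal to NaN.
tfPlain = isequal(A, B);

% Small made-up data set: 6 observations of 2 variables.
X = [2 0; 0 1; -2 0; 0 -1; 1 1; -1 -1];

% Old call: [coeff, score] = princomp(X);
[coeff, score] = pca(X);      % principal component coefficients and scores

fprintf('isequaln: %d, isequal: %d\n', tf, tfPlain);
```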

R2012a

New Features, Bug Fixes, Compatibility Considerations

Update to Jmol Functions

The following functions are updated to use Version 12.0.5 of the Jmol molecule viewer:

Enhancements to Objects for NGS Data

You now can construct and access information in a `BioMap` object (created from a BAM-formatted file) more efficiently. Filtering, binning, counting, and base-coverage calculation operations are now faster because source file scanning is no longer needed.

When using the `BioIndexedFile`, `BioRead`, or `BioMap` constructor to create an object from a FASTA-, FASTQ-, or SAM-formatted file, the source file no longer has a size limit of 4 GB.

Compatibility Considerations

The `BioRead` and `BioMap` constructors are changed as follows:

• When creating a `BioMap` object from a SAM- or BAM-formatted file containing multiple reference sequences, the `BioMap` constructor by default uses the first reference listed in the Dictionary of the source file.

• The following syntaxes, which take a `BioIndexedFile` object as an input, have been removed:

`BioReadobj = BioRead(BioIFobj)`

`BioMapobj = BioMap(BioIFobj)`

There is no longer a need to use this syntax, as you can create an indexed object directly from the SAM- or BAM-formatted source file. See Representing Sequence and Quality Data in a BioRead Object or Representing Sequence, Quality, and Alignment/Mapping Data in a BioMap Object.

• The following syntaxes have been removed:

`BioReadobj = BioRead('SAMFile', File)`

`BioReadobj = BioRead('FASTQFile', File)`

`BioReadobj = BioRead(File)`

• The following syntaxes have been removed:

`BioMapobj = BioMap('SAMFile', File)`

`BioMapobj = BioMap(File)`

• The `Indexed` name-value pair argument as input to the `getSubset` method of the `BioRead` or `BioMap` class has been removed. Use the `InMemory` name-value pair argument instead.

• The `'SubsetRef'` name-value pair argument of the `BioMap` constructor has been removed. Use the `'SelectRef'` name-value pair argument instead.

• The `getCoverage` method of the `BioMap` class has been removed. Use the `getBaseCoverage`, `getCounts`, or `getIndex` method instead.

Enhancements to the NGS Browser

When you import short-read alignment data from a SAM- or BAM-formatted file into the NGS Browser:

• SAM-formatted files no longer have a size limit of 4 GB. Now, the size of both SAM- and BAM-formatted files is limited only by your operating system and available memory.

• The SAM- or BAM-formatted file can contain alignment data for multiple references. When importing short reads, you can select one reference sequence from those listed in the file header, or scan the file to see a list of the actual reference sequences and the aligned read count for each reference sequence.

Compatibility Considerations

• The `aacount` and `basecount` functions no longer accept the `'Others'` name-value pair. Use the `'Ambiguous'` or `'Gaps'` name-value pair instead.

• The `aacount` and `basecount` functions no longer accept the `'Structure'` name-value pair. Use the `'Ambiguous'` name-value pair with either `'ignore'` or `'warn'` instead.

• The `aacount`, `basecount`, `codoncount`, and `dimercount` functions no longer include an `Others` field in the output structure. Use the `Ambiguous` field instead.
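A short sketch of the replacement name-value pairs for `basecount` follows. The sequence is invented, and the `'bundle'` value is assumed to collect all ambiguous symbols into a single `Ambiguous` field; see the `basecount` reference page for the full list of values.

```matlab
% Made-up sequence with one ambiguous base (N) and one gap (-).
seq = 'ACGTNACGT-';

% Old call (errors as of R2012a): basecount(seq, 'Others', ...)

% New call: count ambiguous symbols together and count gaps separately.
counts = basecount(seq, 'Ambiguous', 'bundle', 'Gaps', true);

disp(counts)
```

Code that previously read the `Others` field of the output would read `counts.Ambiguous` instead.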

Demo for DNA Methylation Analysis

The following demo describes how to identify and compare potential cancer-related methylations at the base-pair level:

 Exploring Genome-Wide Differences in DNA Methylation Profiles

Functionality Being Changed or Removed

| Functionality | What Happens When You Use This Functionality? | Use This Instead | Compatibility Considerations |
| --- | --- | --- | --- |
| `SubsetRef` name-value pair argument as input to the `BioMap` constructor function | Errors | `'SelectRef'` name-value pair argument | Replace instances of `SubsetRef` with `SelectRef`. |
| `BioIndexedFile` object as input to the `BioRead` or `BioMap` constructor function | Errors | A FASTQ-, SAM-, or BAM-formatted file | See the Compatibility Considerations subheading in Enhancements to Objects for NGS Data. |
| `'FASTQFile', File` pair as input to the `BioRead` constructor | Errors | `File` | See the Compatibility Considerations subheading in Enhancements to Objects for NGS Data. |
| `'SAMFile', File` pair as input to the `BioRead` or `BioMap` constructor | Errors | `File` | See the Compatibility Considerations subheading in Enhancements to Objects for NGS Data. |
| `Indexed` name-value pair argument as input to the `getSubset` method of the `BioRead` or `BioMap` class | Errors | `InMemory` name-value pair argument | Replace instances of `'Indexed', false` pair with `'InMemory', true` pair. |
| `getCoverage` method of the `BioMap` class | Errors | `getBaseCoverage`, `getCounts`, or `getIndex` method | Replace all instances of `getCoverage` with `getBaseCoverage`, `getCounts`, or `getIndex`. |
| `'Others'` name-value pair as input to `aacount` and `basecount` functions | Errors | `'Ambiguous'` or `'Gaps'` name-value pair as input to `aacount` and `basecount` functions | Replace instances of `'Others'` with `'Ambiguous'` or `'Gaps'`. |
| `'Structure'` name-value pair as input to `aacount` and `basecount` functions | Errors | `'Ambiguous'` name-value pair with either `'ignore'` or `'warn'` as input to `aacount` and `basecount` functions | Replace instances of `'Structure'` with `'Ambiguous'` paired with `'ignore'` or `'warn'`. |
| `Others` field from the output structure returned by `aacount`, `basecount`, `codoncount`, or `dimercount` | Errors | `Ambiguous` field | Replace instances of `Others` (as an input) with `Ambiguous`. |

R2011b

New Features, Bug Fixes, Compatibility Considerations

Visualizing and Investigating Short-Read Alignments and Feature Annotations in the NGS Browser

The NGS Browser lets you visually verify and investigate the alignment of short-read sequences to a reference sequence. For more information, see Visualizing and Investigating Short-Read Alignments and `ngsbrowser`.

Objects for Genomic Feature Annotations

Following are new classes for objects that contain genomic feature annotations for nucleotide sequences:

These classes have properties and methods that you can use to explore, access, filter, and manipulate all or a subset of the feature annotation data. For more information, see Storing and Managing Feature Annotations in Objects.

Enhancements to BioRead and BioMap Objects

You can now construct a `BioMap` object from a BAM-formatted file.

When constructing these objects from source files, by default the data is indexed, which is more efficient for construction and data access. The `BioRead` and `BioMap` constructors now include an `IndexDir` name-value pair argument, which lets you specify the location of the index file.

You can still construct these objects with the data in memory, which lets you modify all the properties of the objects. The `BioRead` and `BioMap` constructors now include an `InMemory` name-value pair argument, which lets you construct the objects with the data in memory.

For details on the previous enhancements, see Storing and Managing Short-Read Sequence Data in Objects.

Compatibility Considerations

The `BioRead` and `BioMap` constructors are changed as follows:

• The following syntaxes that take a `BioIndexedFile` object as an input will be removed in a future release:

`BioReadobj = BioRead(BioIFobj)`

`BioMapobj = BioMap(BioIFobj)`

There is no longer a need to use this syntax, as you can create an indexed object directly from the SAM- or BAM-formatted source file. See Representing Sequence and Quality Data in a BioRead Object or Representing Sequence, Quality, and Alignment/Mapping Data in a BioMap Object.

• The following syntaxes will be removed in a future release:

`BioReadobj = BioRead('SAMFile', File)`

`BioReadobj = BioRead('FASTQFile', File)`

`BioReadobj = BioRead(File)`

• The following syntaxes will be removed in a future release:

`BioMapobj = BioMap('SAMFile', File)`

`BioMapobj = BioMap(File)`

• The `Indexed` name-value pair argument as input to the `getSubset` method of the `BioRead` or `BioMap` class will be removed in a future release. Use the `InMemory` name-value pair argument instead.

• The `'SubsetRef'` name-value pair argument of the `BioMap` constructor will be removed in a future release. Use the `'SelectRef'` name-value pair argument instead.

• If you use the `getSubset` method of a `BioRead` or `BioMap` object, and specify the same element more than once, the method errors, even if the object is in memory.

Enhancements to the saminfo and baminfo Functions

The `saminfo` and `baminfo` functions now include a `ScanDictionary` name-value pair argument, which controls the return of the reference names and the number of reads aligned to each reference from a SAM- or BAM-formatted file in new fields, `ScannedDictionary` and `ScannedDictionaryCount`. This information is needed when constructing a `BioMap` object from a file with multiple reference sequences. For more information, see Constructing a BioMap Object from a SAM- or BAM-Formatted File.

Compatibility Considerations

The `Reference` field is no longer returned in the output structure for `baminfo`. The `ScannedDictionary` field now includes names of the reference sequences.

Conversion of Error and Warning Message Identifiers

For R2011b, some error and warning message identifiers have changed in Bioinformatics Toolbox.

Compatibility Considerations

If you have scripts or functions that use message identifiers that changed, you must update the code to use the new identifiers. Typically, message identifiers are used to turn off specific warning messages, or in code that uses a try/catch statement and performs an action based on a specific error identifier.

For example, the `Bioinfo:nwalign:InvalidScoringMatrix` identifier has changed to `bioinfo:nwalign:InvalidScoringMatrix`. If your code checks for `Bioinfo:nwalign:InvalidScoringMatrix`, you must update it to check for `bioinfo:nwalign:InvalidScoringMatrix` instead.

To determine the identifier for a warning, run the following command just after you see the warning:

 `[MSG,MSGID] = lastwarn;`

The preceding command saves the message identifier to the variable MSGID.

To determine the identifier for an error, run the following command just after you see the error:

 `exception = MException.last;`

 `MSGID = exception.identifier;`

Note: Warning messages indicate a potential issue with your code. While you can turn off a warning, a suggested alternative is to change your code so it runs warning-free.
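The two typical uses of message identifiers mentioned above can be sketched as follows. The warning and error are raised manually only to keep the sketch self-contained; in real code they would come from the toolbox function, and the message texts here are placeholders.

```matlab
% Identifier in its R2011b form (lowercase "bioinfo").
newID = 'bioinfo:nwalign:InvalidScoringMatrix';

% Use 1: turn off one specific warning, then restore the previous state.
previousState = warning('off', newID);
warning(newID, 'This placeholder warning is suppressed and not displayed.');
warning(previousState);

% Use 2: act on a specific error identifier inside try/catch.
handled = false;
try
    error(newID, 'Placeholder error raised for illustration.');
catch err
    if strcmp(err.identifier, newID)
        handled = true;        % take the identifier-specific action here
    else
        rethrow(err);          % let unrelated errors propagate
    end
end

fprintf('Handled expected error: %d\n', handled);
```

Code that still compares against the old `Bioinfo:` spelling would fall through to the `rethrow` branch, which is why the identifiers need updating.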

Function Elements Being Removed

| Function Element Name | What Happens When You Use This Function Element | Use This Instead | Compatibility Considerations |
| --- | --- | --- | --- |
| `'SubsetRef'` name-value pair argument as input to `BioMap` constructor function | Warns | `'SelectRef'` name-value pair argument | See the Compatibility Considerations subheading in Enhancements to BioRead and BioMap Objects. |
| `BioIndexedFile` object as input to the `BioRead` or `BioMap` constructor function | Warns | A FASTQ-, SAM-, or BAM-formatted file | See the Compatibility Considerations subheading in Enhancements to BioRead and BioMap Objects. |
| `'FASTQFile', File` pair as input to the `BioRead` constructor | Warns | `File` | See the Compatibility Considerations subheading in Enhancements to BioRead and BioMap Objects. |
| `'SAMFile', File` pair as input to the `BioRead` or `BioMap` constructor | Warns | `File` | See the Compatibility Considerations subheading in Enhancements to BioRead and BioMap Objects. |
| `Indexed` name-value pair argument as input to `getSubset` method of the `BioRead` or `BioMap` class | Warns | `InMemory` name-value pair argument | See the Compatibility Considerations subheading in Enhancements to BioRead and BioMap Objects. |
| `Reference` field of structure returned by `baminfo` | Errors | `ScannedDictionary` field | See the Compatibility Considerations subheading in Enhancements to the saminfo and baminfo Functions. |

R2011a

New Features, Bug Fixes, Compatibility Considerations

Data Format and Database Functions

The following functions have a new field, `FilePath`, in their output structure:

The `fastainfo` function has two additional fields in its output structure: `Header` and `Length`.

Compatibility Considerations

In Bioinformatics Toolbox Version 3.6, the `aacount` and `basecount` functions still allowed `'Others'` and `'Structure'` name-value pairs, but displayed a warning.

In Bioinformatics Toolbox Version 3.7, the `aacount` and `basecount` functions do not allow `'Others'` and `'Structure'` name-value pairs, and return an error if you use them. Now you must use the `'Ambiguous'` and `'Gaps'` name-value pairs, which specify whether to count or ignore ambiguous characters and gaps, as well as specify how to count ambiguous characters, and whether to display a warning.

Updates to the BioIndexedFile Class, Properties, and Methods

The following name-value pairs of the `BioIndexedFile` constructor function are renamed:

• `MapKeys` is now `IndexedByKeys`.

• `MemMapIndex` is now `MemoryMappedIndex`.

Note: The former name-value pairs are still valid for Bioinformatics Toolbox Version 3.7 (R2011a).

The `MemoryMappedIndex` property of the `BioIndexedFile` class is now editable, which lets you load and unload file indices in memory.

The `BioIndexedFile` class includes the following new methods:

• `getDictionary` — Retrieve reference sequence names from SAM-formatted source file associated with `BioIndexedFile` object.

• `getSubset` — Create object containing subset of elements from `BioIndexedFile` object.

The `BioMap` constructor includes a new name-value pair, `SubsetRef`, which lets you specify one reference sequence in the input argument (`BioIndexedFile` object, SAM-formatted file, or structure) when constructing the `BioMap` object.

The following method of the `BioRead` and `BioMap` classes is updated:

 `getSubset` — Create object containing subset of elements from object. Updated with addition of the `Indexed` name-value pair, which lets you use the `BioIndexedFile` object when creating a new object, thus saving memory. This name-value pair is ignored if your `BioRead` or `BioMap` object was not created from a `BioIndexedFile` object.

Following are new methods of the `BioMap` class:

The `getCoverage` method of the `BioMap` class is being removed in a future release. Use the `getBaseCoverage`, `getCounts`, and `getIndex` methods instead.

Compatibility Considerations

In Bioinformatics Toolbox Version 3.6 and earlier, the `BioMap` class included a `getCoverage` method, which computes read coverage in a `BioMap` object.

In Bioinformatics Toolbox Version 3.7, the `getCoverage` method still runs, but displays a warning. Now use the `getBaseCoverage`, `getCounts`, and `getIndex` methods of the `BioMap` class.

Demos for High-Throughput Sequence Analysis

Following are two new high-throughput sequence analysis demos:

Support Vector Machine (SVM) Functions

The functionality of the `svmsmoset` function is incorporated into the `svmtrain` and `statset` functions. Although `svmsmoset` is still valid, it is no longer documented.

The `svmtrain` function has been updated:

• The function can now handle `NaN` values in the training matrix input and performs more checks of parameters you supply.

• The function now includes Sequential Minimal Optimization (SMO) functionality plus four new name-value pairs: `kernelcachelimit`, `kktviolationlevel`, `options`, and `tolkkt`.

• The default training method is `SMO`, even if you have Optimization Toolbox™ installed.

• The `QuadProg_Opts` and `SMO_Opts` name-value pairs have been replaced by the `options` name-value pair. Although the former name-value pairs are still valid, the recommended ways to perform quadratic programming (QP) training and SMO training are summarized in the following bullets.

• The recommended way to include `QP` options for `svmtrain` is to use the `QP` training method and use the new `options` name-value pair. For the `options` value, use a structure you create with `optimset`.

• The recommended way to include `SMO` options for `svmtrain` is to use the default `SMO` training method and use the new `kernelcachelimit`, `kktviolationlevel`, `options`, and `tolkkt` name-value pairs. For the `options` value, use a structure you create with the `statset` function and its `Display` and `MaxIter` name-value pairs.
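The recommended calls described in the bullets above might look like the following sketch. It assumes a release in which `svmtrain` is available, uses synthetic training data, and picks option values purely for illustration; the QP call additionally requires Optimization Toolbox.

```matlab
rng(2);                                         % reproducible synthetic data

% Two-class training set: 30 samples x 2 features per class.
Training = [randn(30, 2) + 1.5; randn(30, 2) - 1.5];
Group    = [ones(30, 1); -ones(30, 1)];

% SMO training (the default method) with options created by statset.
smoOpts = statset('Display', 'off', 'MaxIter', 20000);
svmSMO  = svmtrain(Training, Group, ...
    'method', 'SMO', ...
    'options', smoOpts, ...
    'tolkkt', 1e-3);

% QP training with options created by optimset (needs Optimization Toolbox).
qpOpts = optimset('Display', 'off');
svmQP  = svmtrain(Training, Group, ...
    'method', 'QP', ...
    'options', qpOpts);
```

Passing the solver settings through `options` in both cases keeps the two training methods interchangeable apart from the `'method'` value.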

Compatibility Considerations

In Bioinformatics Toolbox Version 3.6 and earlier, if you had Optimization Toolbox installed, `QP` was the default training method for the `svmtrain` function. Now the default training method is `SMO`.

Function Elements Being Removed

| Function Element Name | What Happens When You Use This Function Element | Use This Instead | Compatibility Considerations |
| --- | --- | --- | --- |
| `'Others'` name-value pair as input to `aacount` and `basecount` functions | Errors | `'Ambiguous'` or `'Gaps'` name-value pair as input to `aacount` and `basecount` functions | See the Compatibility Considerations subheading in Sequence Statistics Functions. |
| `'Structure'` name-value pair as input to `aacount` and `basecount` functions | Errors | `'Ambiguous'` name-value pair with either `'ignore'` or `'warn'` as input to `aacount` and `basecount` functions | See the Compatibility Considerations subheading in Sequence Statistics Functions. |
| `getCoverage` method of `BioMap` class | Warns | `getBaseCoverage`, `getCounts`, and `getIndex` methods | See the Compatibility Considerations subheading in Updates to BioRead and BioMap Classes and Methods. |
| `svmsmoset` function | Still runs | `svmtrain` and `statset` functions | `svmsmoset` is not recommended. Use `svmtrain` and `statset` instead. |

R2010b

New Features, Bug Fixes, Compatibility Considerations

Data Format and Database Functions

The following new functions let you read Bowtie- and SOAP-formatted files:

Sequence Conversion Functions

The following new functions support CIGAR strings for sequence mapping and alignment:

• `align2cigar` — Convert aligned sequences to corresponding Compact Idiosyncratic Gapped Alignment Report (CIGAR) format strings.

• `cigar2align` — Convert unaligned sequences to aligned sequences using Compact Idiosyncratic Gapped Alignment Report (CIGAR) format strings.

Sequence Statistics Functions

The following functions are updated:

• `aacount` — Count amino acids in sequence. Updated by adding the `Ambiguous` property, which lets you specify how to count ambiguous amino acid characters. Updated by adding the `Gaps` property, which lets you specify to count or ignore gaps. The `Others` and `Structure` properties still work, but display a warning, indicating that they will be invalid in future versions of Bioinformatics Toolbox. The `Others` field in the output structure is replaced by the `Ambiguous` field.

• `basecount` — Count nucleotides in sequence. Updated by adding the `Ambiguous` property, which lets you specify how to count ambiguous nucleotide characters. Updated by adding the `Gaps` property, which lets you specify to count or ignore gaps. The `Others` and `Structure` properties still work, but display a warning, indicating that they will be invalid in future versions of Bioinformatics Toolbox. The `Others` field in the output structure is replaced by the `Ambiguous` field.

• `codonbias` — Calculate codon frequency for each amino acid coded for in nucleotide sequence. Updated by adding the `Ambiguous` property, which lets you specify how to count codons containing ambiguous nucleotide characters.

• `codoncount` — Count codons in nucleotide sequence. Updated by adding the `Ambiguous` property, which lets you specify how to count codons containing ambiguous nucleotide characters. Updated by adding the `GeneticCode` property, which lets you overlay a grid that groups the synonymous codons on the heat map of the codon counts. The `Others` field in the output structure is replaced by the `Ambiguous` field.

• `dimercount` — Count dimers in nucleotide sequence. Updated by adding the `Ambiguous` property, which lets you specify how to count dimers containing ambiguous nucleotide characters. The `Others` field in the output structure is replaced by the `Ambiguous` field.

Compatibility Considerations

In Bioinformatics Toolbox Version 3.5 and earlier, the `aacount` and `basecount` functions included `'Others'` and `'Structure'` property name/property value pairs, which let you specify how to count ambiguous characters and gaps, and whether to display a warning. These functions also returned a structure with an `Others` field.

In Bioinformatics Toolbox Version 3.6, the `aacount` and `basecount` functions still allow `'Others'` and `'Structure'` property name/property value pairs, but display a warning. Now the `aacount` and `basecount` functions include the `'Ambiguous'` and `'Gaps'` property name/property value pairs, which specify whether to count or ignore ambiguous characters and gaps, as well as specify how to count ambiguous characters, and whether to display a warning. These functions now return a structure with an `Ambiguous` field, which replaces the `Others` field.

In Bioinformatics Toolbox Version 3.6, the `codoncount` and `dimercount` functions return a structure with an optional `Ambiguous` field, which replaces the `Others` field.
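To make the idea of counting ambiguous characters concrete, here is a minimal Python sketch of three possible policies: skip ambiguous symbols, bundle them into a single `Ambiguous` field, or prorate each one across the bases it can represent. The function name `count_bases` and the policy names used here are assumptions chosen for illustration; they are not taken from the toolbox documentation, and the sketch does not reproduce the behavior of `basecount`.

```python
# IUPAC nucleotide ambiguity codes mapped to the bases they can stand for.
IUPAC = {
    "R": "AG", "Y": "CT", "K": "GT", "M": "AC", "S": "CG", "W": "AT",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def count_bases(seq, ambiguous="ignore"):
    """Count A, C, G, T with a selectable policy for ambiguous symbols."""
    if ambiguous not in ("ignore", "bundle", "prorate"):
        raise ValueError("ambiguous must be 'ignore', 'bundle', or 'prorate'")
    counts = {base: 0.0 for base in "ACGT"}
    bundled = 0
    for ch in seq.upper():
        if ch in counts:
            counts[ch] += 1
        elif ch in IUPAC:
            if ambiguous == "bundle":
                bundled += 1                      # tally all ambiguous symbols together
            elif ambiguous == "prorate":
                share = 1.0 / len(IUPAC[ch])      # split one count across candidates
                for base in IUPAC[ch]:
                    counts[base] += share
        # Gaps and any other characters are skipped in this sketch.
    if ambiguous == "bundle":
        counts["Ambiguous"] = float(bundled)
    return counts
```

With the input `ACGTNR`, the bundling policy reports two ambiguous symbols, while prorating adds 0.25 to every base for `N` and 0.5 to `A` and `G` for `R`.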

Pairwise Sequence Alignment Functions

The following function is updated:

• `nwalign` — Globally align two sequences using Needleman-Wunsch algorithm. Updated to support semiglobal or "glocal" alignments by addition of `Glocal` property.
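The difference between a global and a semiglobal alignment comes down to how end gaps are scored. The Python sketch below computes only the alignment score and assumes a simple linear gap penalty with made-up default scores; the function name `alignment_score` and its `glocal` flag are hypothetical, and the sketch is not how `nwalign` is implemented.

```python
def alignment_score(a, b, match=1, mismatch=-1, gap=-1, glocal=False):
    """Score a global alignment, or a semiglobal one when glocal is True."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    if not glocal:
        # Global alignment: leading gaps are penalized.
        for i in range(1, rows):
            score[i][0] = i * gap
        for j in range(1, cols):
            score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
    if not glocal:
        return score[-1][-1]
    # Semiglobal alignment: trailing gaps are free, so take the best cell
    # in the last row or the last column.
    return max(max(score[-1]), max(row[-1] for row in score))
```

Aligning `ACGT` against `TTACGTTT` with these scores gives 0 globally, because the four unavoidable end gaps cancel the four matches, and 4 in semiglobal mode, where the overhanging ends cost nothing.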

Multiple Sequence Alignment Functions

The following new functions support CIGAR strings for sequence mapping and alignment:

• `align2cigar` — Convert aligned sequences to corresponding Compact Idiosyncratic Gapped Alignment Report (CIGAR) format strings.

• `cigar2align` — Convert unaligned sequences to aligned sequences using Compact Idiosyncratic Gapped Alignment Report (CIGAR) format strings.

The following functions are updated:

• `multialign` — Align multiple sequences using progressive method. Updated to include a new property, `'UseParallel'`, which lets you use `parfor`-loops and compute in parallel mode.

• `seqpdist` — Calculate pairwise distance between sequences. Updated to include a new property, `'UseParallel'`, which lets you use `parfor`-loops and compute in parallel mode.

Compatibility Considerations

In Bioinformatics Toolbox Version 3.4 and earlier, the `multialign` and `seqpdist` functions included `'JobManager'` and `'WaitInQueue'` property name/property value pairs, which let you process in parallel, including support for the MATLAB scheduler for clusters.

In Bioinformatics Toolbox Version 3.5, the `multialign` and `seqpdist` functions allowed the `'JobManager'` and `'WaitInQueue'` property name/property value pairs, but displayed a warning.

In Bioinformatics Toolbox Version 3.6, the `multialign` and `seqpdist` functions error if you use the `'JobManager'` or `'WaitInQueue'` property name/property value pair. Instead they include the `'UseParallel'` property name/property value pair, which lets you process in parallel, including support for:

• Local workers for multicore machines

• The MATLAB scheduler for clusters

• Third-party schedulers for clusters
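The general pattern of switching a pairwise computation between serial and parallel execution with a single flag can be sketched as follows. This Python example is only an analogy: the names `p_distance`, `pairwise_distances`, and `use_parallel` are hypothetical, the distance is a plain proportion of mismatched positions, and a thread pool stands in for the workers that a `parfor`-loop would use.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations

def p_distance(pair):
    """Proportion of positions at which two equal-length sequences differ."""
    a, b = pair
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)

def pairwise_distances(seqs, use_parallel=False, workers=4):
    """Distances for all pairs, in the order produced by combinations()."""
    pairs = list(combinations(seqs, 2))
    if not use_parallel:
        return [p_distance(pair) for pair in pairs]
    # Each pair is independent, so the work can be handed to a pool of workers.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(p_distance, pairs))
```

Because every pair is independent, the serial and parallel paths return the same list; only the scheduling of the work differs.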

Updates to BioMap Class, Methods, and Properties

You can now create a `BioMap` object from a MATLAB structure containing sequence and alignment information, returned by the `bamread` function.

The following method of the `BioMap` class is updated:

• `getCoverage` — Compute read coverage in BioMap object. Updated to return the coverage of multiple regions of the reference sequence.

The `BioMap` class includes the following new methods:

The `BioMap` class includes the following new property:

• `MatePosition` — Positions of the mates for all read sequences represented in the `BioMap` object.

Function Elements Being Removed

| Function Element Name | What Happens When You Use This Function Element | Use This Instead | Compatibility Considerations |
| --- | --- | --- | --- |
| `'Others'` property name/property value pair as input to `aacount` and `basecount` functions | Warns | `'Ambiguous'` or `'Gaps'` property name/property value pair as input to `aacount` and `basecount` functions | See the Compatibility Considerations subheading in Sequence Statistics Functions. |
| `'Structure'` property name/property value pair as input to `aacount` and `basecount` functions | Warns | `'Ambiguous'` property name/property value pair with either `'ignore'` or `'warn'` as input to `aacount` and `basecount` functions | See the Compatibility Considerations subheading in Sequence Statistics Functions. |
| `'JobManager'` property name/property value pair as input to `multialign` and `seqpdist` functions | Errors | `'UseParallel'` property name/property value pair as input to `multialign` and `seqpdist` functions | See the Compatibility Considerations subheading in Multiple Sequence Alignment Functions. |
| `'WaitInQueue'` property name/property value pair as input to `multialign` and `seqpdist` functions | Errors | `'UseParallel'` property name/property value pair as input to `multialign` and `seqpdist` functions | See the Compatibility Considerations subheading in Multiple Sequence Alignment Functions. |
| The following properties of a clustergram object: `ColumnMarker`, `Impute`, `Ratio`, `RowMarker`, `SymmetricRange` | Errors | New properties of a clustergram object: `ColumnGroupMarker`, `ImputeFun`, `DisplayRatio`, `RowGroupMarker`, `Symmetric` | See Clustergram Methods and Properties. |
| `'Dimension'` property name/property value pair as input to `clustergram` function | Errors | `'Cluster'` property name/property value pair as input to `clustergram` function | See the Compatibility Considerations subheading in Microarray Functions. |
| `'Pdist'` property name/property value pair as input to `clustergram` function | Errors | Either `'RowPdist'` or `'ColumnPdist'` property name/property value pair as input to `clustergram` function | See the Compatibility Considerations subheading in Microarray Functions. |
| `pdbplot` function | Errors | `molviewer` function | See the Compatibility Considerations subheading in Protein Analysis and Sequence Utilities Functions. |
| `getpir` and `pirread` functions | Errors | Use `getembl`, `getgenpept`, and `getpdb` to retrieve protein sequences from Web databases. Use `emblread`, `genpeptread`, and `pdbread` to read protein sequence data. | See Data Formats and Databases Functions. |
| `mamadnorm` and `mameannorm` functions | Errors | `manorm` function | |

R2010a

New Features, Bug Fixes, Compatibility Considerations

Data Format and Database Functions

The following functions are new:

The following functions are updated:

• `fastaread` — Read data from FASTA file. Updated to allow trimming of the headers in the output structure by addition of `TrimHeaders` property.

• `fastqread` — Read data from FASTQ file. Updated to allow trimming of the headers in the output structure by addition of `TrimHeaders` property.

• `phytreeread` — Read phylogenetic tree file. Updated to return a second output containing bootstrap values for tree nodes.

Pairwise Sequence Alignment Functions

The following function is updated:

• `fastaread` — Read data from FASTA file. Updated to allow trimming of the headers in the output structure by addition of `TrimHeaders` property.

Multiple Sequence Alignment Functions

The following functions are updated:

• `fastaread` — Read data from FASTA file. Updated to allow trimming of the headers in the output structure by addition of `TrimHeaders` property.

• `multialign` — Align multiple sequences using progressive method. Updated to include a new property, `'UseParallel'`, which lets you use `parfor`-loops and compute in parallel mode.

• `seqpdist` — Calculate pairwise distance between sequences. Updated to include a new property, `'UseParallel'`, which lets you use `parfor`-loops and compute in parallel mode.

Compatibility Considerations

In Bioinformatics Toolbox Version 3.4 and earlier, the `multialign` and `seqpdist` functions included `'JobManager'` and `'WaitInQueue'` property name/property value pairs, which let you process in parallel, including support for the MATLAB scheduler for clusters.

In Bioinformatics Toolbox Version 3.5, the `multialign` and `seqpdist` functions do not include the `'JobManager'` and `'WaitInQueue'` property name/property value pairs. Instead they include the `'UseParallel'` property name/property value pair, which lets you process in parallel, including support for:

• Local workers for multicore machines

• The MATLAB scheduler for clusters

• Third-party schedulers for clusters

Phylogenetic Tree Tools and Methods

The following functions are updated:

• `phytreeread` — Read phylogenetic tree file. Updated to return a second output containing bootstrap values for tree nodes.

• `seqpdist` — Calculate pairwise distance between sequences. Updated to include a new property, `'UseParallel'`, which lets you use `parfor`-loops and compute in parallel mode.

BioIndexedFile Function, Object, Methods, and Properties

Following is a new class for an object that lets you extract information from large multi-entry text files.

This class has properties and methods that are useful for accessing, reading, and parsing data from a large source file.

BioRead Function, Object, Methods, and Properties

Following is a new class for an object that contains data from short-read sequences, including sequence headers, nucleotide sequences, and the quality scores for the sequences.

This class has properties and methods that you can use to explore, access, filter, and manipulate all or a subset of the data, before doing subsequent analyses or sequence alignment and mapping.

BioMap Function, Object, Methods, and Properties

Following is a new class for an object that contains data from short-read sequences, including sequence headers, read sequences, quality scores for the sequences, and data about alignment and mapping to a single reference sequence.

This class has properties and methods that you can use to explore, access, filter, and manipulate all or a subset of the data, before doing subsequent analyses or viewing the data.

Function Elements Being Removed

| Function Element Name | What Happens When You Use This Function Element | Use This Instead | Compatibility Considerations |
| --- | --- | --- | --- |
| `'JobManager'` property name/property value pair as input to `multialign` and `seqpdist` functions | Warns | `'UseParallel'` property name/property value pair as input to `multialign` and `seqpdist` functions | See the Compatibility Considerations subheading in Multiple Sequence Alignment Functions. |
| `'WaitInQueue'` property name/property value pair as input to `multialign` and `seqpdist` functions | Warns | `'UseParallel'` property name/property value pair as input to `multialign` and `seqpdist` functions | See the Compatibility Considerations subheading in Multiple Sequence Alignment Functions. |

R2009b

New Features, Bug Fixes, Compatibility Considerations

Data Format and Database Functions

Following are new functions:

The following functions are updated:

• `affyread` — Read microarray data from Affymetrix® GeneChip® file. Updated to read cell layout files (CLF) and background probe (BGP) files.

• `multialignwrite` — Write multiple alignment to file. Updated to write a file in either ClustalW ALN format (default) or MSF format.

Protein Analysis Functions

Following is a new function:

The following function is updated:

• `cleave` — Cleave amino acid sequence with enzyme. Updated to let you specify an exception to the enzyme's cleavage rule and to let you specify a maximum number of missed cleavage sites. Also updated to return the number of missed cleavage sites per peptide fragment.
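The interplay between a cleavage rule, an exception to that rule, and a limit on missed cleavage sites can be illustrated with a short Python sketch. Everything here is an assumption made for illustration: the function name `cleave_sketch`, the parameters `sites`, `blocked_by`, and `max_missed`, and the simplified rule (cut after any residue in `sites` unless the next residue is in `blocked_by`) are not the toolbox's interface or its rule syntax.

```python
def cleave_sketch(seq, sites="KR", blocked_by="", max_missed=0):
    """Return (fragment, missed_sites) pairs for a simple cut-after rule."""
    # Step 1: split the sequence at every site that is actually cut.
    pieces = []
    start = 0
    for i, residue in enumerate(seq):
        next_residue = seq[i + 1] if i + 1 < len(seq) else ""
        is_blocked = next_residue != "" and next_residue in blocked_by
        if residue in sites and not is_blocked:
            pieces.append(seq[start:i + 1])
            start = i + 1
    if start < len(seq):
        pieces.append(seq[start:])      # trailing piece with no terminal site

    # Step 2: join up to max_missed + 1 consecutive pieces to model
    # incomplete digestion, recording how many sites each fragment spans.
    fragments = []
    for i in range(len(pieces)):
        for missed in range(max_missed + 1):
            if i + missed >= len(pieces):
                break
            fragments.append(("".join(pieces[i:i + missed + 1]), missed))
    return fragments
```

For the sequence `AKGRC`, a complete digest yields `AK`, `GR`, and `C`. Allowing one missed site adds `AKGR` and `GRC`, each reported with a missed-site count of 1.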

Data Visualization Functions

The following functions are updated:

• `microplateplot` — Display visualization of microtiter plate. Display updated so that first row of input matrix appears at the top and is labeled row A. Updated to return the handle to the axes of the plot, which lets you reverse the order of the rows or columns in the display. Updated to include a new property, `'TextFontSize'`, which lets you control the font size of text labels.

• `multialignviewer` — Display and interactively adjust multiple sequence alignment. Updated to accept a list of names to label the sequences in the Multiple Sequence Alignment Viewer window.

• `showalignment` — Display color-coded sequence alignment. Updated to control the inclusion or exclusion of terminal gaps from the count of matches and similar residues when displaying a pairwise alignment.

Compatibility Considerations

In Bioinformatics Toolbox Version 3.3, the default layout for the plot returned by `microplateplot` displayed the first row of the input matrix at the bottom.

In Bioinformatics Toolbox Version 3.4, the plot displays the first row of the input matrix at the top.

Sequence Statistics Functions

The following function is updated:

Sequence Utility Functions

The following functions are updated:

• `cleave` — Cleave amino acid sequence with enzyme. Updated to let you specify an exception to the enzyme's cleavage rule and to let you specify a maximum number of missed cleavage sites. Also updated to return the number of missed cleavage sites per peptide fragment.

• `rebasecuts` — Find restriction enzymes that cut nucleotide sequence. Updated to use Version 904 of REBASE®, the Restriction Enzyme Database.

• `restrict` — Split nucleotide sequence at restriction site. Updated to use Version 904 of REBASE, the Restriction Enzyme Database.

Sequence Visualization Functions

The following functions are updated:

• `multialignviewer` — Display and interactively adjust multiple sequence alignment. Updated to accept a list of names to label the sequences in the Multiple Sequence Alignment Viewer window.

• `showalignment` — Display color-coded sequence alignment. Updated to control the inclusion or exclusion of terminal gaps from the count of matches and similar residues when displaying a pairwise alignment.

Pairwise Sequence Alignment Functions

Following is a new function:

The following functions are updated:

• `multialignviewer` — Display and interactively adjust multiple sequence alignment. Updated to accept a list of names to label the sequences in the Multiple Sequence Alignment Viewer window.

• `showalignment` — Display color-coded sequence alignment. Updated to control the inclusion or exclusion of terminal gaps from the count of matches and similar residues when displaying a pairwise alignment.

Multiple Sequence Alignment Functions

The following functions are updated:

• `multialignviewer` — Display and interactively adjust multiple sequence alignment. Updated to accept a list of names to label the sequences in the Multiple Sequence Alignment Viewer window.

• `multialignwrite` — Write multiple alignment to file. Updated to write a file in either ClustalW ALN format (default) or MSF format.

• `showalignment` — Display color-coded sequence alignment. Updated to control the inclusion or exclusion of terminal gaps from the count of matches and similar residues when displaying a pairwise alignment.

Phylogenetic Tree Tools and Methods

The Phylogenetic Tree Tool includes the following updates:

• Includes two new circular print renderings: equal angle and equal daylight

• Updates to Tools menu, including commands to select specific branch and leaf nodes based on different criteria, such as distance, common ancestors, leaves only, and descendants.

Following is a new method:

The following method is updated:

• `plot` — Draw phylogenetic tree. Updated to include two new algorithms for circular layouts: equal angle and equal daylight. Updated to let you rotate circular trees from 0 through 360 degrees and to rotate leaf labels of circular trees so that the text is aligned to the root node. Updated the `'LeafLabels'` property so that it defaults to `true` for circular layouts and to `false` for square and angular layouts.

Compatibility Considerations

In Bioinformatics Toolbox Version 3.3, the `'LeafLabels'` property defaulted to `true` when the `'Type'` property was `'square'` or `'angular'`, and to `false` when the `'Type'` property was `'radial'`.

In Bioinformatics Toolbox Version 3.4, the `'LeafLabels'` property defaults to `false` when the `'Type'` property is `'square'` or `'angular'`, and to `true` when the `'Type'` property is `'radial'`.

Clustergram Window

The Clustergram window has two new toolbar buttons:

• Annotate button — Shows and hides intensity values for each area of the heat map.

• Show Dendrogram button — Shows and hides the dendrograms.

Clustergram Methods and Properties

The following are new methods of a clustergram object:

The following properties of a clustergram object are renamed:

• `ColumnMarker` is now `ColumnGroupMarker`.

• `Impute` is now `ImputeFun`.

• `Ratio` is now `DisplayRatio`.

• `RowMarker` is now `RowGroupMarker`.

• `SymmetricRange` is now `Symmetric`.

Note: The former property names are still valid for Bioinformatics Toolbox Version 3.4 (R2009b).

Following is a new property related to the display of dendrogram tree diagrams in a clustergram object:

• `ShowDendrogram`

The following are new properties related to the display of row and column labels of a clustergram object:

• `RowLabels`

• `ColumnLabels`

• `RowLabelsLocation`

• `ColumnLabelsLocation`

• `RowLabelsColor`

• `ColumnLabelsColor`

• `LabelsWithMarkers`

• `RowLabelsRotate`

• `ColumnLabelsRotate`

The following are new properties related to annotating data in a clustergram object:

• `Annotate`

• `AnnotColor`

• `AnnotPrecision`

When using clustergram properties with the `get` and `set` methods, the property names are now case sensitive.

Compatibility Considerations

In Bioinformatics Toolbox Version 3.3, the property names of a clustergram object were not case sensitive when used with the `get` and `set` methods.

In Bioinformatics Toolbox Version 3.4, property names of a clustergram object are case sensitive.

HeatMap Object, Methods, and Properties

Following is a new object:

• HeatMap object — Object containing matrix and heat map display properties.

The following are methods of a HeatMap object:

A HeatMap object includes many properties that control the creation of the heat map, row and column labels, axes labels, title, and data annotation.

DataMatrix Methods

Following is a new method of a DataMatrix object:

Microarray Functions, Objects, Methods, and Properties

Following are new functions to create objects containing data from a microarray gene expression experiment:

These objects have properties and methods that are useful for viewing and analyzing the data or a subset of the data.

Mass Spectrometry Functions

Following are new functions:

The following function is updated:

• `mspeaks` — Convert raw peak data to peak list (centroided data). Updated to include a new property, `'Style'`, which lets you specify the style for marking the peaks in the plot.

Demos for Sequence Analysis

Following are two new sequence analysis demos:

Demos for Microarray Analysis

Following are two new microarray analysis demos:

R2009a

New Features, Bug Fixes, Compatibility Considerations

Data Visualization Functions

Following is a new function:

Sequence Utility Functions

The following functions are updated:

• `rebasecuts` — Find restriction enzymes that cut nucleotide sequence. Updated to use Version 811 of REBASE, the Restriction Enzyme Database.

• `restrict` — Split nucleotide sequence at restriction site. Updated to use Version 811 of REBASE, the Restriction Enzyme Database.

Sequence Conversion Functions

The following function is updated:

• `nt2aa` — Convert nucleotide sequence to amino acid sequence. Updated to include a new property, `'ACGTOnly'`, to support ambiguous and unknown nucleotide characters.

Bioanalytic and Mass Spectrometry Functions

The following functions are updated to use with data from any separation technique, including mass spectrometry:

Microarray Functions

The following functions are updated:

• `cghcbs` — Perform circular binary segmentation (CBS) on array-based comparative genomic hybridization (aCGH) data. Updated to include an optional heuristic stopping rule to improve performance.

• `ilmnbslookup` — Look up Illumina® BeadStudio™ target (probe) sequence and annotation information. Updated to read Illumina microRNA array annotation files.

• `ilmnbsread` — Read gene expression data exported from Illumina BeadStudio software. Updated to read Illumina microRNA array data files.

• `mattest` — Perform two-sample t-test to evaluate differential expression of genes from two experimental conditions or phenotypes. Updated with new property, `'VarType'`, which lets you specify equal or unequal (default) variance for the test.

Compatibility Considerations

A compatibility consideration related to the `mattest` function was introduced in Bioinformatics Toolbox Version 3.2, but not reported in the Release Notes for Version 3.2 (R2008b). Specifically, in Bioinformatics Toolbox Version 3.1 and earlier, the `mattest` function used equal variance for the test. In Bioinformatics Toolbox Version 3.2, the `mattest` function started using unequal variance for the test.

Demo for Sequence Analysis

The following is a new sequence analysis demo:

• Predicting Protein Secondary Structure Using a Neural Network

R2008b

New Features, Bug Fixes, Compatibility Considerations

Data Format and Database Functions

Following are new functions:

The following functions are updated:

• `affyread` — Read microarray data from Affymetrix GeneChip file. Updated so that `Probes` field in the return structure is now a `single`, which reduces memory usage.

• `celintensityread` — Read probe intensities from Affymetrix CEL files. Updated so that `PMIntensities` and `MMIntensities` fields in the return structure are now `singles`, which reduces memory usage.

• `geosoftread` — Read Gene Expression Omnibus (GEO) SOFT format data. Updated to support Platform (GPL) records.

• `getgeodata` — Retrieve Gene Expression Omnibus (GEO) format data. Updated to support Platform (GPL) and Series (GSE) records.

• `goannotread` — Read annotations from Gene Ontology annotated file. Updated to include two new properties, `'Fields'` and `'Aspect'`, which let you read a subset of the data in the annotated file.

• `multialignread` — Read multiple sequence alignment file. Updated to support PHYLIP (Phylogeny Inference Package) multiple-sequence alignment files.

• `mzxmlread` — Read data from mzXML file. Improved to read larger files, faster and without running out of memory. Updated with three new properties, `'Levels'`, `'TimeRange'`, and `'ScanIndices'`, which let you filter and read a subset of the data. Updated with a `'Verbose'` property to control the progress display while reading the file.

Compatibility Considerations

In Bioinformatics Toolbox Version 3.1 and earlier, the `Probes` field, in the structure returned by `affyread`, and the `PMIntensities` and `MMIntensities` fields, in the structure returned by `celintensityread`, were `doubles`. In Bioinformatics Toolbox Version 3.2, these fields are `singles`.

Sequence Utility Functions

Following is a new function:

The following functions are updated:

• `blastncbi` — Create remote NCBI BLAST report request ID or link to NCBI BLAST report. Updated to include a `'GapCosts'` property, which lets you specify penalties for both opening and extending gaps, and an `'Entrez'` property, which lets you limit searches using Entrez query syntax.

• `cleave` — Cleave amino acid sequence with enzyme. Includes a new input argument that specifies the name of an enzyme or compound for which a cleavage rule is specified in the literature.

• `rebasecuts` — Find restriction enzymes that cut nucleotide sequence. Updated to use Version 806 of REBASE, the Restriction Enzyme Database.

• `restrict` — Split nucleotide sequence at restriction site. Updated to use Version 806 of REBASE, the Restriction Enzyme Database.

• `seqlogo` — Display sequence logo for nucleotide or amino acid sequences. Updated to return a figure handle to the sequence logo.

Multiple Sequence Alignment Functions

Following is a new function:

The following function is updated:

• `multialignread` — Read multiple sequence alignment file. Updated to support PHYLIP (Phylogeny Inference Package) multiple sequence alignment files.

Gene Ontology Functions

The following function is updated:

• `goannotread` — Read annotations from Gene Ontology annotated file. Updated to include two new properties, `'Fields'` and `'Aspect'`, which let you read a subset of the data in the annotated file.

Protein Analysis Functions

Following are new functions:

The following function is updated:

• `cleave` — Cleave amino acid sequence with enzyme. Includes a new input argument that specifies the name of an enzyme or compound for which a cleavage rule is specified in the literature.

Mass Spectrometry Functions

Following are new functions:

The following function is updated:

• `mzxmlread` — Read data from mzXML file. Improved to read larger files, faster and without running out of memory. Updated with three new properties, `'Levels'`, `'TimeRange'`, and `'ScanIndices'`, which let you filter and read a subset of the data. Updated with a `'Verbose'` property to control the progress display while reading the file.

Microarray File Format Functions

Following are new functions:

The following functions are updated:

• `affyread` — Read microarray data from Affymetrix GeneChip file. Updated so that `Probes` field in the return structure is now a `single`, which reduces memory usage.

• `celintensityread` — Read probe intensities from Affymetrix CEL files. Updated so that `PMIntensities` and `MMIntensities` fields in the return structure are now `singles`, which reduces memory usage.

• `geosoftread` — Read Gene Expression Omnibus (GEO) SOFT format data. Updated to support Platform (GPL) records.

• `getgeodata` — Retrieve Gene Expression Omnibus (GEO) format data. Updated to support Platform (GPL) and Series (GSE) records.

Compatibility Considerations

In Bioinformatics Toolbox Version 3.1 and earlier, the `Probes` field, in the structure returned by `affyread`, and the `PMIntensities` and `MMIntensities` fields, in the structure returned by `celintensityread`, were `doubles`. In Bioinformatics Toolbox Version 3.2, these fields are `singles`.

Microarray Functions

Following are new functions:

The following functions are updated:

• `ilmnbslookup` — Look up Illumina BeadStudio target (probe) sequence and annotation information. Updated to support BGX and TXT annotation files.

• `mattest` — Perform two-sample t-test to evaluate differential expression of genes from two experimental conditions or phenotypes. Updated to use unequal variance instead of equal variance for the test.

• `probesetlookup` — Look up information for Affymetrix probe set. Updated to accept multiple probe set IDs/names or gene IDs.

Compatibility Considerations

In Bioinformatics Toolbox Version 3.1 and earlier, the `mattest` function used equal variance for the test. In Bioinformatics Toolbox Version 3.2, the `mattest` function uses unequal variance for the test.
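The practical effect of this change is easiest to see in the formula for the standard error. The Python sketch below computes the two-sample t statistic both ways, using the textbook pooled-variance and unequal-variance (Welch) definitions. The function name `t_statistic` and its `equal_variance` flag are hypothetical, and the sketch covers only the statistic, not the p-values or any other output that `mattest` produces.

```python
from math import sqrt
from statistics import mean, variance

def t_statistic(x, y, equal_variance=False):
    """Two-sample t statistic with pooled or unequal (Welch) variance."""
    nx, ny = len(x), len(y)
    vx, vy = variance(x), variance(y)   # sample variances (n - 1 denominator)
    if equal_variance:
        # Pooled estimate: both groups are assumed to share one variance.
        pooled = ((nx - 1) * vx + (ny - 1) * vy) / (nx + ny - 2)
        se = sqrt(pooled * (1 / nx + 1 / ny))
    else:
        # Welch: each group contributes its own variance.
        se = sqrt(vx / nx + vy / ny)
    return (mean(x) - mean(y)) / se
```

When both groups have the same number of samples, the two statistics coincide, although the degrees of freedom used for the p-value can still differ. With unequal group sizes and unequal spreads, the statistics themselves differ, so results computed before and after the change may not match.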

DataMatrix Object

Following is a new object:

• DataMatrix object — Data structure encapsulating data and metadata from microarray experiment so that it can be indexed by gene or probe identifiers and by sample identifiers.

DataMatrix Methods

There are many methods that let you create, index into, modify, create subsets, sort, perform operations on, analyze, and plot a DataMatrix object.

Demo for Sequence Analysis

The following is a new sequence analysis demo:

R2008a

New Features, Bug Fixes, Compatibility Considerations

Data Format and Database Functions

Following is a new function:

The following functions are updated:

• `celintensityread` — Read probe intensities from Affymetrix CEL files. Updated output structure to include a new field, `GroupNumbers`, which contains group numbers of probes.

• `fastawrite` — Write to file using FASTA format. Updated such that if you specify an existing file, new data is appended to the file instead of overwriting it.

• `getgenbank` — Retrieve sequence information from GenBank® database. Updated such that if you use the `'ToFile'` property and specify an existing file, new data is appended to the file instead of overwriting it. Updated to allow you to access a partial sequence by adding new property `'PartialSeq'`.

• `getgenpept` — Retrieve sequence information from GenPept database. Updated such that if you use the `'ToFile'` property and specify an existing file, new data is appended to the file instead of overwriting it. Updated to allow you to access a partial sequence by adding new property `'PartialSeq'`.

• `getgeodata` — Retrieve Gene Expression Omnibus (GEO) SOFT format data. Updated to retrieve both Sample (GSM) and Data Set (GDS) data.

Compatibility Considerations

In Bioinformatics Toolbox Version 3.0 and earlier, when writing to files using the `fastawrite` function or the `getgenbank` or `getgenpept` functions with the `'ToFile'` property, if you specified an existing file, the file was overwritten. In Bioinformatics Toolbox Version 3.1, if you specify an existing file, new data is appended to the file instead of overwriting it.
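The difference between overwriting and appending is a matter of how the output file is opened. As a minimal illustration in Python, the hypothetical helper `fasta_append` below opens the file in append mode, so a second call adds a record instead of replacing the first. The helper name, the 70-character line width, and the record layout are assumptions for this sketch, not details of `fastawrite`.

```python
def fasta_append(path, header, sequence, width=70):
    """Append one FASTA record to path, creating the file if needed."""
    lines = [f">{header}"]
    # Wrap the sequence into fixed-width lines.
    lines += [sequence[i:i + width] for i in range(0, len(sequence), width)]
    # Mode "a" keeps existing content; mode "w" would discard it.
    with open(path, "a", encoding="ascii") as handle:
        handle.write("\n".join(lines) + "\n")
```

Code that relied on the earlier overwrite behavior would need to delete the existing file before writing to get a file that contains only the new data.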

Sequence Utility Functions

The following functions are updated:

• `evalrasmolscript` — Send RasMol script commands to Molecule Viewer window. Updated to use Version 11.4 of the Jmol molecule viewer.

• `molviewer` — Display and manipulate 3-D molecule structure. Updated to use Version 11.4 of the Jmol molecule viewer.

• `ramachandran` — Draw Ramachandran plot for Protein Data Bank (PDB) data. Updated to handle PDB files with multiple chains and models by adding three properties: `'Chain'`, `'Plot'`, and `'Model'`. Updated Ramachandran plot to mark glycine residues and display reference regions by adding three properties: `'Glycine'`, `'Regions'`, and `'RegionDef'`. Updated Ramachandran plot to display amino acid information in a ToolTip. Updated to return an output structure that lets you determine the names and sequence positions of the amino acids corresponding to the torsion angles.

• `rebasecuts` — Find restriction enzymes that cut nucleotide sequence. Updated to use Version 710 of REBASE, the Restriction Enzyme Database.

• `restrict` — Split nucleotide sequence at restriction site. Updated to use Version 710 of REBASE, the Restriction Enzyme Database.

Pairwise Sequence Alignment Functions

The following functions are updated:

• `nwalign` — Globally align two sequences using Needleman-Wunsch algorithm. Updated to improve pairwise sequence performance.

• `swalign` — Locally align two sequences using Smith-Waterman algorithm. Updated to improve pairwise sequence performance.

Phylogenetic Tree Tools Function

The following function is updated:

• `dnds` — Estimate synonymous and nonsynonymous substitution rates. Updated by adding `'AdjustStops'` property to control whether stop codons are excluded from calculations.

Protein Analysis Functions

The following functions are updated:

• `evalrasmolscript` — Send RasMol script commands to Molecule Viewer window. Updated to use Version 11.4 of the Jmol molecule viewer.

• `molviewer` — Display and manipulate 3-D molecule structure. Updated to use Version 11.4 of the Jmol molecule viewer.

• `ramachandran` — Draw Ramachandran plot for Protein Data Bank (PDB) data. Updated to handle PDB files with multiple chains and models by adding three properties: `'Chain'`, `'Plot'`, and `'Model'`. Updated to mark glycine residues and display reference regions by adding three properties: `'Glycine'`, `'Regions'`, and `'RegionDef'`. Updated to display amino acid information in a ToolTip. Updated to return an output structure that makes it easy to determine the names and sequence positions of the amino acids corresponding to the torsion angles.

Microarray File Format Functions

Following is a new function:

The following functions are updated:

• `celintensityread` — Read probe intensities from Affymetrix CEL files. Updated output structure to include a new field, `GroupNumbers`, which contains group numbers of probes.

• `getgeodata` — Retrieve Gene Expression Omnibus (GEO) SOFT format data. Updated to retrieve both Sample (GSM) and Data Set (GDS) data.

Microarray Functions

Following are new functions:

The following functions are updated:

• `clustergram` — Compute hierarchical clustering, display dendrogram and heat map, and create clustergram object.

Updated properties include:

• `'Linkage'` — Can specify linkage method separately for rows and columns.

• `'Dendrogram'` — Can specify color threshold separately for rows and columns.

Replaced properties include:

• `'Dimension'` — Replaced by the `'Cluster'` property, which lets you cluster along the columns, rows, or both.

• `'Pdist'` — Replaced by `'RowPdist'` and `'ColumnPdist'` properties.

New properties include:

• `'Standardize'` — Specifies the dimension for standardizing the data.

• `'DisplayRange'` — Specifies the display range of standardized values.

• `'LogTrans'` — Controls the log2 transform of the data.

• `'Impute'` — Specifies a function and properties to impute missing data.

• `'RowMarker'` — Adds color and text marker to a group of rows.

• `'ColumnMarker'` — Adds color and text marker to a group of columns.

The interactivity of the clustergram figure is enhanced with the following features:

• Select a group of rows or columns and display the group number and genes or samples within.

• Create a new clustergram of only a group of the data.

• Export data as a clustergram object or structure in the MATLAB Workspace.

• `maboxplot` — Create box plot for microarray data. Updated by adding the `'BoxPlot'` property, which lets you specify arguments to pass to the `boxplot` function that creates the box plot.

• `mairplot` — Create intensity versus ratio scatter plot of microarray data. Updated by adding `'PlotOnly'` property, which lets you display the scatter plot without user interface components.

• `mattest` — Perform two-sample t-test to evaluate differential expression of genes from two experimental conditions or phenotypes. Updated by adding `'Bootstrap'` property to run bootstrap tests.

• `mavolcanoplot` — Create significance versus gene expression ratio (fold change) scatter plot of microarray data. Updated by adding `'PlotOnly'` property, which lets you display the volcano plot without user interface components.

• `probesetvalues` — Create table of Affymetrix probe set intensity values. Updated by adding `'Background'` property to control the background correction.

• `zonebackadj` — Perform background adjustment on Affymetrix microarray probe-level data using zone-based method. Updated to return a third output containing the estimated background values for each probe.

Compatibility Considerations

In Bioinformatics Toolbox Version 3.0 and earlier, the `clustergram` function included `'Dimension'` and `'Pdist'` properties. In Bioinformatics Toolbox Version 3.1, the `'Dimension'` property is replaced by the `'Cluster'` property, and the `'Pdist'` property is replaced by the `'RowPdist'` and `'ColumnPdist'` properties.

Object

Following is a new object:

Clustergram Methods

The following are new methods of a clustergram object:

Demo for Sequence Analysis

The following is a new sequence analysis demo:

Demo for Microarray Data Analysis

The following is a new microarray data analysis demo:

Demo for Visualization Tools

The following is a new visualization tool demo:

R2007b

New Features, Bug Fixes, Compatibility Considerations

Data Format and Database Functions

Following are new functions:

The following function was updated:

• `affyread` — Read microarray data from Affymetrix GeneChip file. Updated the structure returned when reading a CDF library file. The structure contains three new subfields: `GroupNumber`, `Direction`, and `GroupName`.

Microarray File Format Functions

Following is a new function:

The following function was updated:

• `affyread` — Read microarray data from Affymetrix GeneChip file. Updated the structure returned when reading a CDF library file. The structure contains three new subfields: `GroupNumber`, `Direction`, and `GroupName`.

Microarray Functions

Following are new functions:

The following function is updated:

• `probesetvalues` — Create table of Affymetrix probe set intensity values. Updated the return matrix, which contains intensity values for probe-level data, to include two new fields: `GroupNumber` and `Direction`. Also updated to return a second output containing the column names for the return matrix.

Sequence Conversion, Utility, and Visualization Functions

Following are new functions:

Mass Spectrometry Functions

The following function is updated:

• `mspalign` — Align mass spectra from multiple peak lists from LC/MS or GC/MS data set. Updated to include a new property, `'ShowEstimation'`, which controls the display of an assessment plot relative to the estimation method and the vector of common mass/charge (m/z) values.

Statistical Learning Functions

The following function is updated:

• `svmsmoset` — Create or edit Sequential Minimal Optimization (SMO) options structure. Updated default values for the `'MaxIter'` and `'KernelCacheLimit'` properties. Changed the `'Display'` property so that when set to `'iter'`, a report displays every 500 iterations instead of 10.

Compatibility Considerations

In Bioinformatics Toolbox Version 2.6 and earlier, the `svmsmoset` function used a `'MaxIter'` property with a default of `1500` and a `'KernelCacheLimit'` property with a default of `7500`. In Bioinformatics Toolbox Version 3.0, the defaults are `15000` and `5000`, respectively. Also, when you set the `'Display'` property to `'iter'`, a report displays every 500 iterations instead of 10.
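If existing scripts depend on the earlier defaults, one option is to set them explicitly. The sketch below assumes the values stated above and passes the options to `svmtrain` through its `SMO` method; the training data is synthetic and purely illustrative.

```matlab
% Synthetic two-class training data (illustrative only).
training = [randn(20, 2); randn(20, 2) + 3];
group    = [ones(20, 1); 2 * ones(20, 1)];

% Restore the Version 2.6 defaults described above.
opts = svmsmoset('MaxIter', 1500, 'KernelCacheLimit', 7500);

% Train with the SMO method using the explicit options structure.
svmStruct = svmtrain(training, group, 'Method', 'SMO', 'SMO_Opts', opts);
```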

Gene Ontology Methods

The following methods of a gene ontology object are updated:

• `geneont.getancestors` — Find terms that are ancestors of specified Gene Ontology term. Updated to also return the number of times each ancestor is found. Updated to include two new properties, `'Relationtype'`, which specifies a relationship type to search for in the gene ontology, and `'Exclude'`, which controls excluding the original queried term(s) from the output, unless the term was reached while searching the gene ontology.

• `geneont.getdescendants` — Find terms that are descendants of specified Gene Ontology term. Updated to also return the number of times each descendant is found. Updated to include two new properties, `'Relationtype'`, which specifies a relationship type to search for in the gene ontology, and `'Exclude'`, which controls excluding the original queried term(s) from the output, unless the term was reached while searching the gene ontology.

• `geneont.getrelatives` — Find terms that are relatives of specified Gene Ontology term. Updated to also return the number of times each relative is found. Updated to include three new properties, `'Levels'`, which specifies the number of levels up and down to search in the gene ontology, `'Relationtype'`, which specifies a relationship type to search for in the gene ontology, and `'Exclude'`, which controls excluding the original queried term(s) from the output, unless the term was reached while searching the gene ontology.

Demos for Microarray Data Analysis

The following are two new microarray data analysis demos:

Demos for Sequence Analysis

The following are two new sequence analysis demos:

The Investigating the Bird Flu Virus demo was updated to demonstrate how to write KML-formatted files, which can be used by Google Earth™ to display geospatial data.

Demo for Graph Theory Analysis

The following is a new graph theory demo:

R2007a+

New Features, Bug Fixes, Compatibility Considerations

Data Formats and Databases Functions

The following functions are updated:

• `affyread` — Read microarray data from Affymetrix GeneChip file. Updated to read Affymetrix files from expression, genotyping, or resequencing assays on all platforms, except Solaris™.

• `celintensityread` — Read probe intensities from Affymetrix CEL files. Updated to read Affymetrix CEL and CDF files from expression or genotyping assays on all platforms, except Solaris.

• `mzxmlread` — Read mzXML file into MATLAB as structure. Updated to read mzXML files that conform to the mzXML 2.1 specification or earlier specifications.

Compatibility Considerations

In Bioinformatics Toolbox Version 2.6, the structure returned by `affyread` when reading a CHP file from an expression assay no longer contains a `ProbePairs` field. The `ProbePairs` field still exists in the structure returned by `affyread` when reading a CDF file.

Microarray File Formats Functions

The following functions are updated:

• `affyread` — Read microarray data from Affymetrix GeneChip file. Updated to read Affymetrix files from expression, genotyping, or resequencing assays on all platforms, except Solaris.

• `celintensityread` — Read probe intensities from Affymetrix CEL files. Updated to read Affymetrix CEL and CDF files from expression or genotyping assays on all platforms, except Solaris.

Compatibility Considerations

In Bioinformatics Toolbox Version 2.6, the structure returned by `affyread` when reading a CHP file from an expression assay no longer contains a `ProbePairs` field. The `ProbePairs` field still exists in the structure returned by `affyread` when reading a CDF file.

Microarray Utility Functions

The following function is updated:

• `probesetplot` — Plot Affymetrix probe set intensity values. Updated to accept structures created from CEL and CDF files, instead of a structure created from a CHP file.

Compatibility Considerations

In Bioinformatics Toolbox Version 2.5 and earlier, the `probesetplot` function accepted a structure created from a CHP file as input. It now requires two structures: one created from a CEL file and one created from a CDF library file. If you have any scripts that call the `probesetplot` function, you need to update them to provide the correct input arguments.
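An updated call might look like the following sketch. The file names and the probe set name are hypothetical placeholders; substitute the CEL file, CDF library file, and probe set from your own experiment.

```matlab
% Hypothetical file names, shown for illustration only.
celStruct = affyread('sample.CEL');    % probe-level intensity data
cdfStruct = affyread('library.CDF');   % matching CDF library file

% Plot one probe set, identified here by a placeholder name.
probesetplot(celStruct, cdfStruct, 'example_probe_set');
```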

Microarray Normalization and Filtering Functions

Following is a new function:

• `zonebackadj` — Perform background adjustment on Affymetrix microarray probe-level data using zone-based method.

Mass Spectrometry Functions

The following function is updated:

• `mzxmlread` — Read mzXML file into MATLAB as structure. Updated to read mzXML files that conform to the mzXML 2.1 specification or earlier specifications.

Following is a new function you can use to calibrate and/or synchronize multidimensional mass spectrometry data:

R2007a

New Features, Bug Fixes, Compatibility Considerations

Data Formats and Database Functions

Following are new functions for reading and creating files:

The following functions were updated:

• `celintensityread` — Read probe intensities from Affymetrix CEL files (Windows® 32). Updated so that the order of columns (CEL files) in return matrices `PMIntensities` and `MMIntensities` matches the order of CEL files in the `CELFiles` input argument.

• `pdbread` — Read data from Protein Data Bank (PDB) file. Updated so that the six fields containing coordinate information (`Atom`, `AtomSD`, `AnisotropicTemp`, `AnisotropicTempSD`, `Terminal`, and `HeterogenAtom`) are now subfields within the `Model` field of the MATLAB structure. Updated to include a new property, `ModelNum`, which reads only the specified model from a PDB-formatted text file.

Compatibility Considerations

In Bioinformatics Toolbox Version 2.4 and earlier, the `celintensityread` function ordered the columns (CEL files) of return matrices `PMIntensities` and `MMIntensities` alphabetically.

In Bioinformatics Toolbox Version 2.4 and earlier, the `pdbread` function stored coordinate information in six fields (`Atom`, `AtomSD`, `AnisotropicTemp`, `AnisotropicTempSD`, `Terminal`, and `HeterogenAtom`) within the MATLAB structure. These six fields are now subfields within the `Model` field of the MATLAB structure.
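Code that reads coordinates from the structure needs the extra `Model` level. A minimal sketch, assuming a hypothetical local file named `example.pdb`:

```matlab
% Hypothetical PDB-formatted file, shown for illustration only.
pdbStruct = pdbread('example.pdb');

% Earlier releases: atoms = pdbStruct.Atom;
% Now the coordinate fields are subfields of Model.
atoms = pdbStruct.Model(1).Atom;

% Collect the x-coordinates of all atoms in the first model.
x = [atoms.X];
```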

Demo for Data Formats and Database Functions

The Accessing NCBI Entrez Databases with E-Utilities demo illustrates how to programmatically search and retrieve data.

Statistical Learning Functions

Following are new functions:

• `optimalleaforder` — Determine optimal leaf ordering for hierarchical binary cluster tree.

• `svmsmoset` — Create or edit Sequential Minimal Optimization (SMO) options structure.

The following function was updated:

• `svmtrain` — Train support vector machine classifier. Updated to include a new `SMO` method and a new property, `SMO_Opts`, which provides options for the `SMO` method. The `BoxConstraint` property has changed, including a new default value.

Compatibility Considerations

In Bioinformatics Toolbox Version 2.4 and earlier, the `svmtrain` function used a `BoxConstraint` property with a default of `1/sqrt(eps)`. In Bioinformatics Toolbox Version 2.5, the default is `1`, which can lead to slightly different results.
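To compare results against an earlier release, you can pass the previous default explicitly. This is a sketch with synthetic data, not a recommendation for any particular value:

```matlab
% Synthetic two-class training data (illustrative only).
training = [randn(20, 2); randn(20, 2) + 3];
group    = [ones(20, 1); 2 * ones(20, 1)];

% Use the earlier default box constraint instead of the new default of 1.
svmStruct = svmtrain(training, group, 'BoxConstraint', 1/sqrt(eps));
```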

Protein Analysis and Sequence Utilities Functions

Following are new functions:

The following functions were updated:

• `featuresparse` — Parse features from GenBank, GenPept, or EMBL data. Updated to include a new property, `Sequence`, which controls the extraction, when possible, of the sequences.

• `oligoprop` — Calculate sequence properties of DNA oligonucleotide. Updated to handle ambiguous `N` characters in a sequence.

The following function is removed:

• `pdbplot` — Plot 3-D protein structure. This function was replaced by the `molviewer` function.

Compatibility Considerations

In Bioinformatics Toolbox Version 2.5, the `pdbplot` function was replaced by the `molviewer` function. If you have any scripts that call the `pdbplot` function, you need to update them to call the `molviewer` function.

Demo for Protein Analysis and Sequence Utilities Functions

The Visualizing the Three-dimensional Structure of a Molecule demo illustrates the `molviewer` function.

Sequence Alignment Functions

The following function was updated:

• `seqpdist` — Calculate pairwise distance between sequences. Updated to assume that all input sequences are aligned if they have the same length, regardless of the presence of gaps. If you know your input sequences are not aligned, you can align them before passing them to `seqpdist` (for example, using `multialign`), or set `PairwiseAlignment` to `true` when using `seqpdist`.

Compatibility Considerations

In Bioinformatics Toolbox Version 2.4 and earlier, the `seqpdist` function assumed all input sequences were aligned if they had the same length and at least one gap.
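If equal-length input sequences are not already aligned, you can request pairwise alignment explicitly. The sequences in this sketch are made-up nucleotide strings used only to show the call:

```matlab
% Made-up, equal-length nucleotide sequences (illustrative only).
seqs = {'ACGTACGTAA', 'ACGTTCGTAA', 'TCGTACGAAA'};

% Force pairwise alignment instead of treating the equal-length
% sequences as already aligned.
d = seqpdist(seqs, 'Alphabet', 'NT', 'PairwiseAlignment', true);
```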

Demo for Sequence Alignment Functions

The Comparing Whole Genomes demo illustrates how to compare features of organisms on a genomic evolution scale.

Microarray File Formats Functions

Following is a new function:

The following function was updated:

• `celintensityread` — Read probe intensities from Affymetrix CEL files (Windows 32). Updated so that the order of columns (CEL files) in return matrices `PMIntensities` and `MMIntensities` matches the order of CEL files in the `CELFiles` input argument.

Compatibility Considerations

In Bioinformatics Toolbox Version 2.4 and earlier, the `celintensityread` function ordered the columns (CEL files) of return matrices `PMIntensities` and `MMIntensities` alphabetically.

Microarray Normalization and Filtering Functions

Following are new functions:

• `affyprobeaffinities` — Compute Affymetrix probe affinities from their sequences and MM probe intensities.

• `gcrmabackadj` — Perform GC Robust Multi-array Average (GCRMA) background adjustment on Affymetrix microarray probe-level data using sequence information.

• `gcrma` — Perform GC Robust Multi-array Average (GCRMA) background adjustment, quantile normalization, and median-polish summarization on Affymetrix microarray probe-level data.

Demo for Microarray File Formats, Normalization, and Filtering Functions

The Preprocessing Affymetrix Microarray Data at the Probe Level demo illustrates the `affyprobeseqread`, `affyprobeaffinities`, `gcrmabackadj`, and `gcrma` functions.

Microarray Data Analysis and Visualization Functions

Following is a new function:

• `mafdr` — Estimate false discovery rate (FDR) of differentially expressed genes from two experimental conditions or phenotypes.

The following function was updated:

• `mattest` — Perform two-tailed t-test to evaluate differential expression of genes from two experimental conditions or phenotypes. Updated to include a new property, `Permute`, which controls whether permutation tests are run.

Demo for Microarray Data Analysis and Visualization Functions

The Exploring Gene Expression Data demo illustrates the `mattest` and `mafdr` functions.

Mass Spectrometry Functions

Following are new functions:

The following function was updated:

• `msheatmap` — Create pseudocolor image of set of mass spectra. Updated to handle LC/MS and GC/MS data.

Phylogenetic Tree Tools Functions

Following is a new function:

The following functions were updated:

• `dnds` — Estimate synonymous and nonsynonymous substitution rates. Updated to include two new properties, `Verbose`, which controls the display of the codons considered in the computations and their amino acid translations, and `Window`, which performs the calculations over a sliding window.

• `dndsml` — Estimate synonymous and nonsynonymous substitution rates using maximum likelihood method. Updated to include a new property, `Verbose`, which controls the display of the codons considered in the computations and their amino acid translations.

• `seqpdist` — Calculate pairwise distance between sequences. Updated to assume that all input sequences are aligned if they have the same length, regardless of the presence of gaps. If you know your input sequences are not aligned, you can align them before passing them to `seqpdist` (for example, using `multialign`), or set `PairwiseAlignment` to `true` when using `seqpdist`.

Compatibility Considerations

In Bioinformatics Toolbox Version 2.4 and earlier, the `seqpdist` function assumed all input sequences were aligned if they had the same length and at least one gap.

Demos for Phylogenetic Tree Tools Functions

The following demos illustrate the `nwalign`, `seqinsertgaps`, `dnds`, and `multialign` functions:

The Reconstructing the Origin and the Diffusion of the SARS Epidemic demo presents an analysis of the SARS epidemic.

Phylogenetic Tree Methods

Following is a new method of a phytree object:

R2006b

New Features, Bug Fixes

Data Formats and Database Functions

Following is a new function for getting data into the MATLAB environment:

The following functions were updated:

• `celintensityread` — Read probe intensities from Affymetrix CEL files (Windows 32). Updated to include a new property, `Verbose`, which controls the display of a progress report showing the name of each CEL file as it is read.

• `fastaread` — Read data from FASTA file. Updated to include a new property, `Blockread`, which controls reading a single entry or block of entries from a file.

• `geosoftread` — Read Gene Expression Omnibus (GEO) SOFT format data. Updated to read Data Set (GDS) files as well as Sample (GSM) files.

• `getblast` — BLAST report from NCBI Web site. Updated to include a new property, `WaitTilReady`, which pauses the MATLAB software and waits a specified time (minutes) for a report from the NCBI Web site.

• `scfread` — Read trace data from SCF file. Updated to include more output options.

Sequence Utilities Functions

Following is a new function for parsing sequence data:

Sequence Visualization Functions

The following function was updated:

• `seqtool` — Open tool to interactively explore biological sequences. Updated to download sequences from the EMBL database, interactively move the viewing frame in the Sequence Viewer by pressing and holding Ctrl while click-dragging, and export an amino acid translation as a FASTA file or to the MATLAB Workspace.

Multiple Sequence Alignment Functions

The following function was updated:

Microarray File Formats

The following function was updated:

• `celintensityread` — Read probe intensities from Affymetrix CEL files (Windows 32). Updated to include a new property, `Verbose`, which controls the display of a progress report showing the name of each CEL file as it is read.

Microarray Data Analysis and Visualization Functions

The following functions were updated:

• `clustergram` — Create dendrogram and heat map. Updated to include a new property, `OptimalLeafOrder`, which enables or disables the optimal leaf ordering calculation. This calculation determines the leaf order that maximizes the similarity between neighboring leaves.

• `mairplot` — Create intensity versus ratio scatter plot for microarray signals. Updated to include a new property, `Type`, which creates either an IR plot or an MA plot. Also updated to change the plot axes to log scale and to add interactive plot features such as displaying gene labels, changing factor lines, normalizing data, and exporting data.

• `mapcaplot` — Create Principal Component plot of expression profile data. Updated by adding an export feature.

• `redgreencmap` — Create red and green colormap. Updated to include a new property, `Interpolation`, which sets the method for color interpolation.

Graph Theory Functions

Following are new functions for applying basic graph theory algorithms to sparse matrices:

Graph Visualization Methods

Following are new methods for applying basic graph theory algorithms to a `biograph` object:

Phylogenetic Tree Methods

Following is a new method for the `phytree` object:

R2006a+

New Features

Data Formats and Databases Functions

The following functions are removed:

• `getpir` — Sequence data from PIR-PSD database. This function retrieved data from the PIR-PSD database. This database has been discontinued and this function no longer retrieves data. See `http://pir.georgetown.edu/pirwww/dbinfo/pir_psd.shtml` for more details.

• `pirread` — Read data from Protein Information Resource (PIR) file. This function supported the data format of the PIR-PSD database. This database has been discontinued. See `http://pir.georgetown.edu/pirwww/dbinfo/pir_psd.shtml` for more details.

Sequence Utilities Functions

The following function was updated to include five new databases (`refseq_rna`, `refseq_genomic`, `env_nt`, `refseq_protein`, and `env_nr`):

Sequence Visualization Functions

Following is a new function for visualizing sequence data:

Statistical Learning Functions

The following function was updated to include three new properties (`RBF_Sigma`, `BoxConstraint`, and `Autoscale`):

Microarray Functions

The following function is supported on the Windows 32 platform only:

Following are new functions for preprocessing Affymetrix probe-level microarray data:

• `celintensityread` — Read probe intensities from Affymetrix CEL files (Windows 32).

• `rmabackadj` — Perform background adjustment on Affymetrix microarray probe-level data using Robust Multi-array Average (RMA) procedure.

• `rmasummary` — Calculate gene (probe set) expression values from Affymetrix microarray probe-level data using Robust Multi-array Average (RMA) procedure.

• `affyinvarsetnorm` — Perform rank invariant set normalization on probe intensities from multiple Affymetrix CEL or DAT files.

Following is a new function for two-color microarray normalization:

• `mainvarsetnorm` — Perform rank invariant set normalization on gene expression values from two experimental conditions or phenotypes.

Following are new functions for microarray differential expression analysis:

• `mattest` — Perform two-sample, two-tailed t-test to evaluate differential expression of genes from two experimental conditions or phenotypes.

• `mavolcanoplot` — Create significance versus gene expression ratio (fold change) scatter plot of microarray data.

Demo for Microarray Functions

The Analyzing Affymetrix Microarray Gene Expression Data demo illustrates the new microarray functions.

R2006a

No New Features or Changes

R14SP3+

New Features

Demo for Gene Ontology Functions

New demos for the new Gene Ontology functions (`geneontologydemo`) and for working with whole genomes (`biomemorymapdemo`).

R14SP3

No New Features or Changes

R14SP2+

New Features

Phylogenetic Tree Methods

• `getcanonical` — Calculate the canonical form of a phylogenetic tree.

• `getnewickstr` — Create a Newick-formatted string.

• `reroot` — Change the root of a phylogenetic tree.

• `subtree` — Extract a subtree.

• `weights` — Calculate weights for a phylogenetic tree.

Microarray Functions

• `probesetplot` — Plot values for an Affymetrix CHP file probe set.

Statistics Functions

`rankfeatures` — Renamed function. The previous name was `sqtlfeatures`.

R14SP2

New Features

Updated REBASE Table

REBASE is the enzyme table that the function `restrict` uses to locate sequence patterns.

Expanded Bioperl Demonstration

The demonstration of calling the MATLAB software from Perl scripts now includes several examples of passing various types of data (both directly and by variant variable) back and forth between Perl and a MATLAB Automation Server. To view the demo, type `bioperldemo`.