
# Bioinformatics Toolbox Release Notes

## R2014b

New Features, Bug Fixes, Compatibility Considerations

#### Small sample unpaired hypothesis tests for count data

You can perform an unpaired hypothesis test for count data (from high-throughput sequencing assays such as RNA-Seq or ChIP-Seq) with small numbers of samples or replicates using nbintest. For instance, you can use this function to decide if an observed difference in read counts between two conditions is significant for a given gene. The function assumes read counts follow a negative binomial or Poisson distribution.
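As a rough illustration of the workflow described above, the following sketch runs nbintest on synthetic counts. The data, the choice of 200 genes with 3 replicates per condition, and the 0.01 cutoff are all assumptions made for this example; 'Identity' is used for 'VarianceLink' so that the variance is tied to the mean (the Poisson case).

```matlab
% Synthetic read counts: 200 genes (rows) x 3 replicates (columns) per condition.
rng(0);                                   % reproducible random numbers
countsA = poissrnd(50, 200, 3);           % condition A
countsB = poissrnd(50, 200, 3);           % condition B
countsB(1:10, :) = poissrnd(200, 10, 3);  % first 10 genes made higher in B

% 'Identity' assumes the variance equals the mean (Poisson-like counts).
test = nbintest(countsA, countsB, 'VarianceLink', 'Identity');

% One p-value per gene; small values suggest a difference in read counts.
significantGenes = find(test.pValue < 0.01);
```

With real data, each row would be a gene and each column a replicate, and the p-values would normally be adjusted for multiple testing before any genes are reported.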

#### Functions for navigating the Gene Transfer Format (GTF) hierarchy to assist with alternative gene splicing and isoform analyses

The following functions of the GTFAnnotation class help you navigate the GTF information hierarchy to perform alternative gene splicing and isoform analyses:

#### Attractor metagene algorithm for feature engineering using mutual information-based clustering

The metafeatures function uses the attractor metagene algorithm, which is an unsupervised learning algorithm for feature engineering using mutual information-based learning.

#### Functionality being changed or removed

| Functionality | What Happens When You Use This Functionality? | Use This Instead | Compatibility Considerations |
| --- | --- | --- | --- |
| knnclassify | Still runs | fitcknn | Use fitcknn to fit a kNN classification model, and classify data using the predict function of the ClassificationKNN object. |
| Default values for the knn classifier of randfeatures | Still runs | | When you specify 'knn' as the classifier, randfeatures now uses the following new defaults: the default function is fitcknn; the defaults for the 'ClassOptions' name-value pair argument are {'Distance','correlation','NumNeighbors',5}; the default for the 'PerformanceThreshold' name-value pair argument is 0.7; the default for the 'ConfidenceThreshold' name-value pair argument is 1. |
| The 'Type' name-value pair argument of gethmmtree | Warns | To download the 'seed' tree, use gethmmtree without any extra input arguments. To obtain the 'full' tree, you can use the gethmmalignment function to download the 'full' alignment and build a tree using the seqpdist and seqneighjoin functions. | Setting 'Type' to 'seed' or 'full' is now ignored because the PFAM database no longer provides trees for the 'full' alignment. |
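
The knnclassify replacement listed above could look roughly like the following sketch. It assumes the fisheriris sample data set that ships with Statistics Toolbox; the odd/even split into training and sample rows and the choice of 5 neighbors are arbitrary choices for illustration.

```matlab
load fisheriris                      % meas (150x4 features), species (150x1 labels)
training = meas(1:2:end, :);         % odd rows used for training
group    = species(1:2:end);
sample   = meas(2:2:end, :);         % even rows to classify

% Previously: predicted = knnclassify(sample, training, group, 5);
mdl = fitcknn(training, group, 'NumNeighbors', 5);
predicted = predict(mdl, sample);    % cell array of predicted class labels
```

Unlike knnclassify, fitcknn returns a ClassificationKNN model object, so the same fitted model can be reused for several predict calls.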

## R2014a

Bug Fixes, Compatibility Considerations

#### Functionality being changed or removed

| Functionality | What Happens When You Use This Functionality? | Use This Instead | Compatibility Considerations |
| --- | --- | --- | --- |
| 'R2012b' name-value pair input argument for the seqalignviewer function | Warns | The default version of seqalignviewer runs more robustly than the previous version (R2012b) and is recommended. This name-value pair is intended only for customers who need the previous version. | See the Compatibility Considerations subheading in Select and move behaviors in the Sequence Alignment app. |

• If you have the original FASTQ (or FASTA) file, use the bowtie function (for UNIX® and Mac users only) to remap your files. This will create BAM files that are compatible with the toolbox.

• If you have old BOWTIE files without the sequence files, you can read the files using textscan.

When using other BOWTIE mapper/aligner programs, set appropriate option(s) to create either a SAM or BAM output file. Then use the BioMap object or the samread or bamread function to access the mapped short reads.

## R2013b

Bug Fixes, Compatibility Considerations

#### Functionality being changed or removed

Functionality What Happens When You Use This Functionality?Use This InsteadCompatibility Considerations
Index name-value pair argument as input to the bamread functionErrorsRemove instances of the Index name-value pair argument. See the Compatibility Considerations subheading in Increased performance when reading BAM files.
'average' as a choice for the Method input argument to the seqneighjoin functionErrors'equivar'Replace instances of 'average' as an input to seqneighjoin with 'equivar'.

Changes to tool names:

Errors
• Replace instances of multialignviewer with seqalignviewer.

• Replace instances of phytreetool with phytreeviewer.

• Replace instances of seqtool with seqviewer.

'natural' as a choice for the Output name-value pair input argument to these functions:

Errors'linear'Replace instances of 'natural' as the value of the Output name-value pair input argument with 'linear' for these functions:
• affyrma

• affygcrma

• rmasummary

## R2013a

New Features, Bug Fixes, Compatibility Considerations

#### Saving to FASTQ, FASTA, SAM, and BAM files from a BioMap object

You can write the information of any BioRead/BioMap object to a file using the write function of the object.

#### Sorting unordered BAM files using BioMap objects

You can pass an unordered BAM file to a BioMap constructor, which then creates a new ordered file.

#### Quality control plots for unmapped short-read data

You can obtain quality control plots for short-read data using the plotSummary function of the BioRead object. The function creates a figure containing six plots that present summary statistics of the data stored in a FASTQ file.

• Parse FASTQ files without creating a BioRead object.

• Interact with the quality data to compare different data sets or filtering options.

• Create customized plots.

#### Select and move behaviors in the Sequence Alignment app

You can select a block from aligned sequences and move it horizontally if gaps are available.

#### Compatibility Considerations

To use the previous version of seqalignviewer, set the name-value pair argument 'R2012b' to true.

#### Random access of annotation object data, for consistency with BioMap object data access

You can have random access to data in GFFAnnotation and GTFAnnotation objects by using these functions:

• getSubset

• getData

• getIndex

#### Functionality being changed

| Functionality | What Happens When You Use This Functionality? | Use This Instead | Compatibility Considerations |
| --- | --- | --- | --- |
| bowtieread function | Warns | bowtie function for UNIX and Mac users. | When using other BOWTIE mapper/aligner programs, set appropriate option(s) to create either a SAM or BAM output file. Then use the BioMap object or the samread or bamread function to access the mapped short reads. |
| 'natural' as a choice for the OutputValue name-value pair input argument to these functions: | Warns | 'linear' | Replace instances of 'natural' as the value of the OutputValue name-value pair input argument with 'linear' for these functions: |
| 'R2012b' name-value pair input argument for the seqalignviewer function | Still runs | | See the Compatibility Considerations subheading in Select and move behaviors in the Sequence Alignment app. |

## R2012b

New Features, Bug Fixes, Compatibility Considerations

#### Multiple reference sequences in BioMap objects

You can now store information about short reads mapped to multiple references in a BioMap object. The new SequenceDictionary property contains the catalog of references available in a BioMap object.

#### Compatibility Considerations

For BioMap objects created using R2012b:

• The Reference property is now a cell string of length obj.NSeqs, for both BioMap objects with multiple references in the SequenceDictionary and objects with only one reference. For BioMap objects created before R2012b—which can only have a single reference—the Reference property is a string.

• BioMap methods that access data by genomic ranges now accept BioMap objects with multiple references. To use these methods, you must specify the reference or references to operate on. The affected methods are:

#### Mapping single and paired-end short read data to reference genomes

Two new functions generate an index and map short reads to a reference sequence using the Burrows-Wheeler transform.

• bowtiebuild builds index files using an input reference sequence.

• bowtie maps single and paired-end short reads to indexed reference files.

Note: bowtiebuild and bowtie run on Mac and UNIX platforms only.

#### Increased performance when reading BAM files

The bamread function no longer requires the Index name-value pair argument to provide index information from a structure in the MATLAB® workspace. Indexing happens automatically without a decrease in performance.

#### Compatibility Considerations

The Index name-value pair argument as input to the bamread function will be removed in a future release. There is no need to replace it, only remove it.

#### Name changes for multialignviewer, phytreetool, and seqtool tools

Three tools in Bioinformatics Toolbox™ are renamed. The old names return a warning and will be removed in a future release.

#### Compatibility Considerations

The choice of 'average' for the Method input argument to the seqneighjoin function warns and will be removed in a future release. It is replaced with 'equivar'.
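
A minimal sketch of the 'equivar' replacement described above, assuming four short made-up nucleotide sequences and hypothetical leaf names; real input would normally come from a multiple alignment or a sequence file.

```matlab
% Toy nucleotide sequences and leaf names (illustrative only).
seqs  = {'ACGTACGTAC', 'ACGTTCGTAC', 'ACGAACGAAC', 'TCGAACGAAC'};
names = {'s1', 's2', 's3', 's4'};

% Pairwise distances between the sequences.
dist = seqpdist(seqs, 'Alphabet', 'NT');

% Previously: tree = seqneighjoin(dist, 'average', names);
tree = seqneighjoin(dist, 'equivar', names);
```

The result is a phytree object, so existing code that plots or prunes the tree should not need further changes.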

#### Functionality being changed or removed

| Functionality | What Happens When You Use This Functionality? | Use This Instead | Compatibility Considerations |
| --- | --- | --- | --- |
| Index name-value pair argument as input to the bamread function | Warns | | Remove instances of the Index name-value pair argument. See the Compatibility Considerations subheading in Increased performance when reading BAM files. |
| 'average' as a choice for the Method input argument to the seqneighjoin function | Warns | 'equivar' | Replace instances of 'average' as an input to seqneighjoin with 'equivar'. |
| Changes to tool names: multialignviewer, phytreetool, and seqtool | Warns | | Replace instances of multialignviewer with seqalignviewer, phytreetool with phytreeviewer, and seqtool with seqviewer. |
| 'natural' as a choice for the Output name-value pair input argument to the affyrma, affygcrma, and rmasummary functions | Still runs | 'linear' | Replace instances of 'natural' as the value of the Output name-value pair input argument with 'linear' for these functions. |
| setName, getName, and getNSeqs methods of BioRead and BioMap objects | Still runs | Dot notation | Replace instances of setName(BioObj, name) with BioObj.Name = name. Replace instances of getName(BioObj) with BioObj.Name. Replace instances of getNSeqs(BioObj) with BioObj.NSeqs. |
| isequalwithequalnans for DataMatrix object | Still runs | isequaln | Replace instances of isequalwithequalnans with isequaln. |
| princomp for DataMatrix object | Still runs | pca | Replace instances of princomp with pca. |
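
To show what the isequaln replacement means in practice, here is a small sketch on plain numeric arrays. Using ordinary arrays instead of DataMatrix objects is an assumption made so the example runs without the toolbox.

```matlab
a = [1 NaN 3];
b = [1 NaN 3];

% isequaln treats NaN values in matching positions as equal.
sameIgnoringNaN = isequaln(a, b);

% isequal follows NaN ~= NaN, so the same comparison fails.
sameStrict = isequal(a, b);
```

Code that relied on isequalwithequalnans to compare data containing missing values should get the same answer from isequaln.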

## R2012a

New Features, Bug Fixes, Compatibility Considerations

#### Update to Jmol Functions

The following functions are updated to use Version 12.0.5 of the Jmol molecule viewer:

• evalrasmolscript — Send RasMol script commands to Molecule Viewer window.

• molviewer — Display and manipulate 3-D molecule structure.

#### Enhancements to Objects for NGS Data

You now can construct and access information in a BioMap object (created from a BAM-formatted file) more efficiently. Filtering, binning, counting, and base-coverage calculation operations are now faster because source file scanning is no longer needed.

When using the BioIndexedFile, BioRead, or BioMap constructor to create an object from a FASTA-, FASTQ-, or SAM-formatted file, the source file no longer has a size limit of 4 GB.

#### Compatibility Considerations

The BioRead and BioMap constructors are changed as follows:

• When creating a BioMap object from a SAM- or BAM-formatted file containing multiple reference sequences, the BioMap constructor by default uses the first reference listed in the Dictionary of the source file.

• The following syntaxes, which take a BioIndexedFile object as an input, have been removed:

BioMapobj = BioMap(BioIFobj)

There is no longer a need to use this syntax, as you can create an indexed object directly from the SAM- or BAM-formatted source file. See Representing Sequence and Quality Data in a BioRead Object or Representing Sequence, Quality, and Alignment/Mapping Data in a BioMap Object.

• The following syntaxes have been removed:

• The following syntax has been removed:

BioMapobj = BioMap('SAMFile', File)

BioMapobj = BioMap(File)

• The Indexed name-value pair argument as input to the getSubset method of the BioRead or BioMap class has been removed. Use the InMemory name-value pair argument instead.

• The 'SubsetRef' name-value pair argument of the BioMap constructor has been removed. Use the 'SelectRef' name-value pair argument instead.

• The getCoverage method of the BioMap class has been removed. Use the getBaseCoverage, getCounts, or getIndex method instead.

#### Enhancements to the NGS Browser

When you import short-read alignment data from a SAM- or BAM-formatted file into the NGS Browser:

• SAM-formatted files no longer have a size limit of 4 GB. Now, the size of both SAM- and BAM-formatted files is limited only by your operating system and available memory.

• The SAM- or BAM-formatted file can contain alignment data for multiple references. When importing short reads, you can select one reference sequence from those listed in the file header, or scan the file to see a list of the actual reference sequences and the aligned read count for each reference sequence.

#### Compatibility Considerations

• The aacount and basecount functions no longer accept the 'Others' name-value pair. Use the 'Ambiguous' or 'Gaps' name-value pair instead.

• The aacount and basecount functions no longer accept the 'Structure' name-value pair. Use the 'Ambiguous' name-value pair with either 'ignore' or 'warn' instead.

• The aacount, basecount, codoncount, and dimercount functions no longer include an Others field in the output structure. Use the Ambiguous field instead.
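
The following sketch shows one way the 'Ambiguous' and 'Gaps' name-value pairs might replace the removed options. The sequence is made up, and 'bundle' is only one of the available 'Ambiguous' settings; it is chosen here because it reports ambiguous symbols in a single Ambiguous field.

```matlab
% Nucleotide sequence with ambiguous symbols (N, R, Y) and gaps (-).
seq = 'ACGTNNRY--';

% Count ambiguous symbols together in the Ambiguous field and track gaps.
counts = basecount(seq, 'Ambiguous', 'bundle', 'Gaps', true);

numAmbiguous = counts.Ambiguous;   % replaces the former Others field
numGaps      = counts.Gaps;
```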

#### Demo for DNA Methylation Analysis

The following demo describes how to identify and compare potential cancer-related methylations at the base-pair level:

 Exploring Genome-Wide Differences in DNA Methylation Profiles

#### Functionality Being Changed or Removed

| Functionality | What Happens When You Use This Functionality? | Use This Instead | Compatibility Considerations |
| --- | --- | --- | --- |
| SubsetRef name-value pair argument as input to the BioMap constructor function | Errors | 'SelectRef' name-value pair argument | Replace instances of SubsetRef with SelectRef. |
| BioIndexedFile object as input to the BioRead or BioMap constructor function | Errors | A FASTQ-, SAM-, or BAM-formatted file | See the Compatibility Considerations subheading in Enhancements to Objects for NGS Data. |
| 'FASTQFile', File pair as input to the BioRead constructor | Errors | File | See the Compatibility Considerations subheading in Enhancements to Objects for NGS Data. |
| 'SAMFile', File pair as input to the BioRead or BioMap constructor | Errors | File | See the Compatibility Considerations subheading in Enhancements to Objects for NGS Data. |
| Indexed name-value pair argument as input to the getSubset method of the BioRead or BioMap class | Errors | InMemory name-value pair argument | Replace instances of 'Indexed', false pair with 'InMemory', true pair. |
| getCoverage method of the BioMap class | Errors | getBaseCoverage, getCounts, or getIndex method | Replace all instances of getCoverage with getBaseCoverage, getCounts, or getIndex. |
| 'Others' name-value pair as input to aacount and basecount functions | Errors | 'Ambiguous' or 'Gaps' name-value pair as input to aacount and basecount functions | Replace instances of 'Others' with 'Ambiguous' or 'Gaps'. |
| 'Structure' name-value pair as input to aacount and basecount functions | Errors | 'Ambiguous' name-value pair with either 'ignore' or 'warn' as input to aacount and basecount functions | Replace instances of 'Structure' with 'Ambiguous' paired with 'ignore' or 'warn'. |
| Others field from the output structure returned by aacount, basecount, codoncount, or dimercount | Errors | Ambiguous field | Replace instances of Others (as an input) with Ambiguous. |

## R2011b

New Features, Bug Fixes, Compatibility Considerations

#### Visualizing and Investigating Short-Read Alignments and Feature Annotations in the NGS Browser

The NGS Browser lets you visually verify and investigate the alignment of short-read sequences to a reference sequence. For more information, see Visualizing and Investigating Short-Read Alignments and ngsbrowser.

#### Objects for Genomic Feature Annotations

Following are new classes for objects that contain genomic feature annotations for nucleotide sequences:

These classes have properties and methods that you can use to explore, access, filter, and manipulate all or a subset of the feature annotation data. For more information, see Storing and Managing Feature Annotations in Objects.

#### Enhancements to BioRead and BioMap Objects

You can now construct a BioMap object from a BAM-formatted file.

When constructing these objects from source files, by default the data is indexed, which is more efficient for construction and data access. The BioRead and BioMap constructors now include an IndexDir name-value pair argument, which lets you specify the location of the index file.

You can still construct these objects with the data in memory, which lets you modify all the properties of the objects. The BioRead and BioMap constructors now include an InMemory name-value pair argument, which lets you construct the objects with the data in memory.

For details on the previous enhancements, see Storing and Managing Short-Read Sequence Data in Objects.

#### Compatibility Considerations

The BioRead and BioMap constructors are changed as follows:

• The following syntaxes that take a BioIndexedFile object as an input will be removed in a future release:

BioMapobj = BioMap(BioIFobj)

There is no longer a need to use this syntax, as you can create an indexed object directly from the SAM- or BAM-formatted source file. See Representing Sequence and Quality Data in a BioRead Object or Representing Sequence, Quality, and Alignment/Mapping Data in a BioMap Object.

• The following syntaxes will be removed in a future release:

• The following syntax will be removed in a future release:

BioMapobj = BioMap('SAMFile', File)

BioMapobj = BioMap(File)

• The Indexed name-value pair argument as input to the getSubset method of the BioRead or BioMap class will be removed in a future release. Use the InMemory name-value pair argument instead.

• The 'SubsetRef' name-value pair argument of the BioMap constructor will be removed in a future release. Use the 'SelectRef' name-value pair argument instead.

• If you use the getSubset method of a BioRead or BioMap object, and specify the same element more than once, the method errors, even if the object is in memory.

#### Enhancements to the saminfo and baminfo Functions

The saminfo and baminfo functions now include a ScanDictionary name-value pair argument, which controls the return of the reference names and the number of reads aligned to each reference from a SAM- or BAM-formatted file in new fields, ScannedDictionary and ScannedDictionaryCount. This information is needed when constructing a BioMap object from a file with multiple reference sequences. For more information, see Constructing a BioMap Object from a SAM- or BAM-Formatted File.
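
As a sketch of how the new argument might be used, the example below scans a SAM file and reads the two new fields. It assumes the ex1.sam sample file provided with the toolbox is on the MATLAB path; substitute your own file name otherwise.

```matlab
% Scan the file so that the reference names and read counts are collected.
info = saminfo('ex1.sam', 'ScanDictionary', true);

refNames  = info.ScannedDictionary;        % names of the references found
readCount = info.ScannedDictionaryCount;   % reads aligned to each reference
```

The reference names returned here are the values you would choose from when constructing a BioMap object from a file with multiple reference sequences.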

#### Compatibility Considerations

The Reference field is no longer returned in the output structure for baminfo. The ScannedDictionary field now includes names of the reference sequences.

#### Conversion of Error and Warning Message Identifiers

For R2011b, some error and warning message identifiers have changed in Bioinformatics Toolbox.

#### Compatibility Considerations

If you have scripts or functions that use message identifiers that changed, you must update the code to use the new identifiers. Typically, message identifiers are used to turn off specific warning messages, or in code that uses a try/catch statement and performs an action based on a specific error identifier.

For example, the Bioinfo:nwalign:InvalidScoringMatrix identifier has changed to bioinfo:nwalign:InvalidScoringMatrix. If your code checks for Bioinfo:nwalign:InvalidScoringMatrix, you must update it to check for bioinfo:nwalign:InvalidScoringMatrix instead.

To determine the identifier for a warning, run the following command just after you see the warning:

 [MSG,MSGID] = lastwarn;

The preceding command saves the message identifier to the variable MSGID.

To determine the identifier for an error, run the following command just after you see the error:

 exception = MException.last; MSGID = exception.identifier;

Note: Warning messages indicate a potential issue with your code. While you can turn off a warning, a suggested alternative is to change your code so it runs warning-free.
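
The try/catch pattern mentioned above could be updated along these lines. This is a minimal sketch: the error is raised with the error function so that the example runs without the toolbox, and the handled flag is a hypothetical variable used only for illustration.

```matlab
% New, lowercase identifier that replaces Bioinfo:nwalign:InvalidScoringMatrix.
newId = 'bioinfo:nwalign:InvalidScoringMatrix';
handled = false;

try
    % Simulated error; in real code this would be the call to nwalign.
    error(newId, 'Simulated scoring matrix error.');
catch err
    if strcmp(err.identifier, newId)
        handled = true;      % act on the specific, expected error
    else
        rethrow(err);        % let unrelated errors propagate
    end
end
```

Because strcmp is case sensitive, a check against the old identifier would silently stop matching, which is why the identifier in the comparison must be updated.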

#### Function Elements Being Removed

| Function Element Name | What Happens When You Use This Function Element | Use This Instead | Compatibility Considerations |
| --- | --- | --- | --- |
| 'SubsetRef' name-value pair argument as input to BioMap constructor function | Warns | 'SelectRef' name-value pair argument | See the Compatibility Considerations subheading in Enhancements to BioRead and BioMap Objects. |
| BioIndexedFile object as input to the BioRead or BioMap constructor function | Warns | A FASTQ-, SAM-, or BAM-formatted file | See the Compatibility Considerations subheading in Enhancements to BioRead and BioMap Objects. |
| 'FASTQFile', File pair as input to the BioRead constructor | Warns | File | See the Compatibility Considerations subheading in Enhancements to BioRead and BioMap Objects. |
| 'SAMFile', File pair as input to the BioRead or BioMap constructor | Warns | File | See the Compatibility Considerations subheading in Enhancements to BioRead and BioMap Objects. |
| Indexed name-value pair argument as input to getSubset method of the BioRead or BioMap class | Warns | InMemory name-value pair argument | See the Compatibility Considerations subheading in Enhancements to BioRead and BioMap Objects. |
| Reference field of structure returned by baminfo | Errors | ScannedDictionary field | See the Compatibility Considerations subheading in Enhancements to the saminfo and baminfo Functions. |

## R2011a

New Features, Bug Fixes, Compatibility Considerations

#### Data Format and Database Functions

The following functions have a new field, FilePath, in their output structure:

• fastainfo — Return information about FASTA file.

• fastqinfo — Return information about FASTQ file.

• saminfo — Return information about Sequence Alignment/Map (SAM) file.

The fastainfo function has two additional fields in its output structure: Header and Length.

#### Compatibility Considerations

In Bioinformatics Toolbox Version 3.6, the aacount and basecount functions still allowed 'Others' and 'Structure' name-value pairs, but displayed a warning.

In Bioinformatics Toolbox Version 3.7, the aacount and basecount functions do not allow 'Others' and 'Structure' name-value pairs, and return an error if you use them. Now you must use the 'Ambiguous' and 'Gaps' name-value pairs, which specify whether to count or ignore ambiguous characters and gaps, as well as specify how to count ambiguous characters, and whether to display a warning.

#### Updates to the BioIndexedFile Class, Properties, and Methods

The following name-value pairs of the BioIndexedFile constructor function are renamed:

• MapKeys is now IndexedByKeys.

• MemMapIndex is now MemoryMappedIndex.

Note: The former name-value pairs are still valid for Bioinformatics Toolbox Version 3.7 (R2011a).

The MemoryMappedIndex property of the BioIndexedFile class is now editable, which lets you load and unload file indices in memory.

The BioIndexedFile class includes the following new methods:

• getDictionary — Retrieve reference sequence names from SAM-formatted source file associated with BioIndexedFile object.

• getSubset — Create object containing subset of elements from BioIndexedFile object.

The BioMap constructor includes a new name-value pair, SubsetRef, which lets you specify one reference sequence in the input argument (BioIndexedFile object, SAM-formatted file, or structure) when constructing the BioMap object.

The following method of the BioRead and BioMap classes is updated:

 getSubset — Create object containing subset of elements from object. Updated with addition of the Indexed name-value pair, which lets you use the BioIndexedFile object when creating a new object, thus saving memory. This name-value pair is ignored if your BioRead or BioMap object was not created from a BioIndexedFile object.

Following are new methods of the BioMap class:

• getBaseCoverage — Return base-by-base alignment coverage of reference sequence in BioMap object.

• getCounts — Return count of read sequences aligned to reference sequence in BioMap object.

• getIndex — Return indices of read sequences aligned to reference sequence in BioMap object.

The getCoverage method of the BioMap class is being removed in a future release. Use the getBaseCoverage, getCounts, and getIndex methods instead.

#### Compatibility Considerations

In Bioinformatics Toolbox Version 3.6 and earlier, the BioMap class included a getCoverage method, which computes read coverage in a BioMap object.

In Bioinformatics Toolbox Version 3.7, the getCoverage method still runs, but displays a warning. Now use the getBaseCoverage, getCounts, and getIndex methods of the BioMap class.

#### Support Vector Machine (SVM) Functions

The functionality of the svmsmoset function is incorporated into the svmtrain and statset functions. Although svmsmoset is still valid, it is no longer documented.

The svmtrain function has been updated:

• The function can now handle NaN values in the training matrix input and performs more checks of parameters you supply.

• The function now includes Sequential Minimal Optimization (SMO) functionality plus four new name-value pairs: kernelcachelimit, kktviolationlevel, options, and tolkkt.

• The default training method is SMO, even if you have Optimization Toolbox™ installed.

• The QuadProg_Opts and SMO_Opts name-value pairs have been replaced by the options name-value pair. Although the former name-value pairs are still valid, the recommended ways to perform quadratic programming (QP) training and SMO training are summarized in the following bullets.

• The recommended way to include QP options for svmtrain is to use the QP training method and use the new options name-value pair. For the options value, use a structure you create with optimset.

• The recommended way to include SMO options for svmtrain is to use the default SMO training method and use the new kernelcachelimit, kktviolationlevel, options, and tolkkt name-value pairs. For the options value, use a structure you create with the statset function and its Display and MaxIter name-value pairs.
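
The two recommended option styles above might be written as in the sketch below. It applies only to releases that still include svmtrain, and it assumes the fisheriris sample data from Statistics Toolbox; the MaxIter and kernelcachelimit values are arbitrary, and the QP call additionally requires Optimization Toolbox.

```matlab
load fisheriris
X      = meas(1:100, 3:4);      % petal measurements for two species
groups = species(1:100);        % 'setosa' and 'versicolor' labels

% SMO (default method): options built with statset.
smoOpts = statset('Display', 'off', 'MaxIter', 20000);
svmSMO  = svmtrain(X, groups, 'options', smoOpts, 'kernelcachelimit', 5000);

% QP method: options built with optimset (needs Optimization Toolbox).
qpOpts = optimset('Display', 'off');
svmQP  = svmtrain(X, groups, 'method', 'QP', 'options', qpOpts);
```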

#### Compatibility Considerations

In Bioinformatics Toolbox Version 3.6 and earlier, if you had Optimization Toolbox installed, QP was the default training method for the svmtrain function. Now the default training method is SMO.

#### Function Elements Being Removed

| Function Element Name | What Happens When You Use This Function Element | Use This Instead | Compatibility Considerations |
| --- | --- | --- | --- |
| 'Others' name-value pair as input to aacount and basecount functions | Errors | 'Ambiguous' or 'Gaps' name-value pair as input to aacount and basecount functions | See the Compatibility Considerations subheading in Sequence Statistics Functions. |
| 'Structure' name-value pair as input to aacount and basecount functions | Errors | 'Ambiguous' name-value pair with either 'ignore' or 'warn' as input to aacount and basecount functions | See the Compatibility Considerations subheading in Sequence Statistics Functions. |
| getCoverage method of BioMap class | Warns | getBaseCoverage, getCounts, and getIndex methods | See the Compatibility Considerations subheading in Updates to BioRead and BioMap Classes and Methods. |
| svmsmoset function | Still runs | svmtrain and statset functions | svmsmoset is not recommended. Use svmtrain and statset instead. |

## R2010b

New Features, Bug Fixes, Compatibility Considerations

#### Data Format and Database Functions

• baminfo — Return information about Binary Sequence Alignment/Map (BAM) file.

The following new functions let you read Bowtie- and SOAP-formatted files:

• soapread — Read data from Short Oligonucleotide Analysis Package (SOAP) file.

#### Sequence Conversion Functions

The following new functions support CIGAR strings for sequence mapping and alignment:

• align2cigar — Convert aligned sequences to corresponding Compact Idiosyncratic Gapped Alignment Report (CIGAR) format strings.

• cigar2align — Convert unaligned sequences to aligned sequences using Compact Idiosyncratic Gapped Alignment Report (CIGAR) format strings
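
As a rough illustration of the conversion, the sketch below derives CIGAR strings from a small character-array alignment. The alignment and reference sequence are made-up values for this example.

```matlab
% Three aligned reads (rows) padded to the same width, plus a reference.
aln = ['ACG-ATGC'; 'ACGT-TGC'; '  GTAT-C'];
ref = 'ACGCATGC';

% One CIGAR string and one start position per aligned read.
[cigars, starts] = align2cigar(aln, ref);
```

cigar2align performs the reverse step, rebuilding aligned sequences from unaligned reads and their CIGAR strings.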

#### Sequence Statistics Functions

The following functions are updated:

• aacount — Count amino acids in sequence. Updated by adding the Ambiguous property, which lets you specify how to count ambiguous amino acid characters. Updated by adding the Gaps property, which lets you specify to count or ignore gaps. The Others and Structure properties still work, but display a warning, indicating that they will be invalid in future versions of Bioinformatics Toolbox. The Others field in the output structure is replaced by the Ambiguous field.

• basecount — Count nucleotides in sequence. Updated by adding the Ambiguous property, which lets you specify how to count ambiguous nucleotide characters. Updated by adding the Gaps property, which lets you specify to count or ignore gaps. The Others and Structure properties still work, but display a warning, indicating that they will be invalid in future versions of Bioinformatics Toolbox. The Others field in the output structure is replaced by the Ambiguous field.

• codonbias — Calculate codon frequency for each amino acid coded for in nucleotide sequence. Updated by adding the Ambiguous property, which lets you specify how to count codons containing ambiguous nucleotide characters.

• codoncount — Count codons in nucleotide sequence. Updated by adding the Ambiguous property, which lets you specify how to count codons containing ambiguous nucleotide characters. Updated by adding the GeneticCode property, which lets you overlay a grid that groups the synonymous codons on the heat map of the codon counts. The Others field in the output structure is replaced by the Ambiguous field.

• dimercount — Count dimers in nucleotide sequence. Updated by adding the Ambiguous property, which lets you specify how to count dimers containing ambiguous nucleotide characters. The Others field in the output structure is replaced by the Ambiguous field.

#### Compatibility Considerations

In Bioinformatics Toolbox Version 3.5 and earlier, the aacount and basecount functions included 'Others' and 'Structure' property name/property value pairs, which let you specify how to count ambiguous characters and gaps, and whether to display a warning. These functions also returned a structure with an Others field.

In Bioinformatics Toolbox Version 3.6, the aacount and basecount functions still allow 'Others' and 'Structure' property name/property value pairs, but display a warning. Now the aacount and basecount functions include the 'Ambiguous' and 'Gaps' property name/property value pairs, which specify whether to count or ignore ambiguous characters and gaps, as well as specify how to count ambiguous characters, and whether to display a warning. These functions now return a structure with an Ambiguous field, which replaces the Others field.

In Bioinformatics Toolbox Version 3.6, the codoncount and dimercount functions return a structure with an optional Ambiguous field, which replaces the Others field.

#### Pairwise Sequence Alignment Functions

The following function is updated:

• nwalign — Globally align two sequences using Needleman-Wunsch algorithm. Updated to support semiglobal or "glocal" alignments by addition of Glocal property.
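
A short sketch of a semiglobal alignment using the new property, assuming two made-up nucleotide sequences in which the shorter one matches an internal part of the longer one:

```matlab
longSeq  = 'ACGTTGCAACGT';
shortSeq = 'GTTGCA';

% 'Glocal', true requests a semiglobal alignment instead of a global one.
[score, alignment] = nwalign(longSeq, shortSeq, ...
                             'Alphabet', 'NT', 'Glocal', true);
```

The alignment output is a three-row character array: the two aligned sequences with the match symbols between them.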

#### Multiple Sequence Alignment Functions

The following new functions support CIGAR strings for sequence mapping and alignment:

• align2cigar — Convert aligned sequences to corresponding Compact Idiosyncratic Gapped Alignment Report (CIGAR) format strings.

• cigar2align — Convert unaligned sequences to aligned sequences using Compact Idiosyncratic Gapped Alignment Report (CIGAR) format strings.
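The relationship between a gapped alignment and its CIGAR string can be sketched as follows. This Python example handles only the M, I, and D operations, and its function names (`aligned_pair_to_cigar`, `cigar_to_aligned_read`) are hypothetical; it is not a port of align2cigar or cigar2align.

```python
import re
from itertools import groupby

def aligned_pair_to_cigar(ref_aln, read_aln):
    """Build a CIGAR string from two gapped, equal-length alignment rows."""
    ops = []
    for r, q in zip(ref_aln, read_aln):
        if r == "-" and q == "-":
            continue            # column carries no information
        if r == "-":
            ops.append("I")     # base present only in the read
        elif q == "-":
            ops.append("D")     # base present only in the reference
        else:
            ops.append("M")     # aligned column (match or mismatch)
    return "".join(f"{len(list(run))}{op}" for op, run in groupby(ops))

def cigar_to_aligned_read(cigar, read):
    """Rebuild the gapped read row from a CIGAR string (M, I, D only)."""
    out, pos = [], 0
    for length, op in re.findall(r"(\d+)([MID])", cigar):
        n = int(length)
        if op == "D":
            out.append("-" * n)          # deletion: pad with gaps
        else:
            out.append(read[pos:pos + n])  # M or I: consume read bases
            pos += n
    return "".join(out)

cigar = aligned_pair_to_cigar("ACGT-ACGT", "AC-TTACGT")
print(cigar)                                     # 2M1D1M1I4M
print(cigar_to_aligned_read(cigar, "ACTTACGT"))  # AC-TTACGT
```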

The following functions are updated:

• multialign — Align multiple sequences using progressive method. Updated to include a new property, 'UseParallel', which lets you use parfor-loops and compute in parallel mode.

• seqpdist — Calculate pairwise distance between sequences. Updated to include a new property, 'UseParallel', which lets you use parfor-loops and compute in parallel mode.

#### Compatibility Considerations

In Bioinformatics Toolbox Version 3.4 and earlier, the multialign and seqpdist functions included 'JobManager' and 'WaitInQueue' property name/property value pairs, which let you process in parallel, including support for the MATLAB scheduler for clusters.

In Bioinformatics Toolbox Version 3.5, the multialign and seqpdist functions allowed the 'JobManager' and 'WaitInQueue' property name/property value pairs, but displayed a warning.

In Bioinformatics Toolbox Version 3.6, the multialign and seqpdist functions error if you use the 'JobManager' or 'WaitInQueue' property name/property value pair. Instead they include the 'UseParallel' property name/property value pair, which lets you process in parallel, including support for:

• Local workers for multicore machines

• The MATLAB scheduler for clusters

• Third-party schedulers for clusters
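Pairwise distance calculations parallelize naturally because each pair is independent. The Python sketch below shows that pattern with a worker pool and a simple p-distance (fraction of mismatched positions) on equal-length sequences. The names `p_distance`, `pairwise_distances`, and `use_parallel` are assumptions for this example; a thread pool is used only to keep the sketch portable, and it says nothing about how seqpdist schedules its work.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations

def p_distance(pair):
    """Fraction of positions that differ between two equal-length sequences."""
    a, b = pair
    mismatches = sum(x != y for x, y in zip(a, b))
    return mismatches / len(a)

def pairwise_distances(seqs, use_parallel=False, workers=4):
    """Distances for every unordered pair, in combinations() order."""
    pairs = list(combinations(seqs, 2))
    if use_parallel:
        # Each pair is independent, so the work can be handed to a pool.
        with ThreadPoolExecutor(max_workers=workers) as pool:
            return list(pool.map(p_distance, pairs))
    return [p_distance(pair) for pair in pairs]

seqs = ["ACGT", "ACGA", "TCGA"]
print(pairwise_distances(seqs, use_parallel=True))  # [0.25, 0.5, 0.25]
```

The serial and parallel paths return the same values in the same order, which is the property that makes switching between them safe.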

#### Updates to BioMap Class, Methods, and Properties

You can now create a BioMap object from a MATLAB structure containing sequence and alignment information, returned by the bamread function.

The following method of the BioMap class is updated:

• getCoverage — Compute read coverage in BioMap object. Updated to return the coverage of multiple regions of the reference sequence.
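As a minimal sketch of what coverage over several regions means, the Python function below computes per-position read depth for each requested region from read start positions and lengths. The function name `coverage` and the 1-based, inclusive coordinate convention are assumptions for this example and are not taken from the getCoverage method.

```python
def coverage(read_starts, read_lengths, regions):
    """Per-position read depth for each (start, end) region, 1-based inclusive."""
    result = []
    for start, end in regions:
        depth = [0] * (end - start + 1)
        for s, n in zip(read_starts, read_lengths):
            lo = max(s, start)           # clip the read to the region
            hi = min(s + n - 1, end)
            for pos in range(lo, hi + 1):
                depth[pos - start] += 1
        result.append(depth)
    return result

# Three reads covering positions 1-4, 3-6, and 8-9 of the reference.
print(coverage([1, 3, 8], [4, 4, 2], [(1, 5), (7, 9)]))
# [[1, 1, 2, 2, 1], [0, 1, 1]]
```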

The BioMap class includes the following new methods:

The BioMap class includes the following new property:

• MatePosition — Positions of the mates for all read sequences represented in the BioMap object.

#### Function Elements Being Removed

| Function Element Name | What Happens When You Use This Function Element | Use This Instead | Compatibility Considerations |
| --- | --- | --- | --- |
| 'Others' property name/property value pair as input to aacount and basecount functions | Warns | 'Ambiguous' or 'Gaps' property name/property value pair as input to aacount and basecount functions | See the Compatibility Considerations subheading in Sequence Statistics Functions. |
| 'Structure' property name/property value pair as input to aacount and basecount functions | Warns | 'Ambiguous' property name/property value pair with either 'ignore' or 'warn' as input to aacount and basecount functions | See the Compatibility Considerations subheading in Sequence Statistics Functions. |
| 'JobManager' property name/property value pair as input to multialign and seqpdist functions | Errors | 'UseParallel' property name/property value pair as input to multialign and seqpdist functions | See the Compatibility Considerations subheading in Multiple Sequence Alignment Functions. |
| 'WaitInQueue' property name/property value pair as input to multialign and seqpdist functions | Errors | 'UseParallel' property name/property value pair as input to multialign and seqpdist functions | See the Compatibility Considerations subheading in Multiple Sequence Alignment Functions. |
| The following properties of a clustergram object: ColumnMarker, Impute, Ratio, RowMarker, SymmetricRange | Errors | New properties of a clustergram object: ColumnGroupMarker, ImputeFun, DisplayRatio, RowGroupMarker, Symmetric | See Clustergram Methods and Properties. |
| 'Dimension' property name/property value pair as input to clustergram function | Errors | 'Cluster' property name/property value pair as input to clustergram function | See the Compatibility Considerations subheading in Microarray Functions. |
| 'Pdist' property name/property value pair as input to clustergram function | Errors | Either 'RowPdist' or 'ColumnPdist' property name/property value pair as input to clustergram function | See the Compatibility Considerations subheading in Microarray Functions. |
| pdbplot function | Errors | molviewer function | See the Compatibility Considerations subheading in Protein Analysis and Sequence Utilities Functions. |
| getpir and pirread functions | Errors | Use getembl, getgenpept, and getpdb to retrieve protein sequences from Web databases. Use emblread, genpeptread, and pdbread to read protein sequence data. | See Data Formats and Databases Functions. |

## R2010a

New Features, Bug Fixes, Compatibility Considerations

#### Data Format and Database Functions

The following functions are new:

• saminfo — Return information about Sequence Alignment/Map (SAM) file.

The following functions are updated:

• phytreeread — Read phylogenetic tree file. Updated to return a second output containing bootstrap values for tree nodes.

#### Pairwise Sequence Alignment Functions

The following function is updated:

#### Multiple Sequence Alignment Functions

The following functions are updated:

• multialign — Align multiple sequences using progressive method. Updated to include a new property, 'UseParallel', which lets you use parfor-loops and compute in parallel mode.

• seqpdist — Calculate pairwise distance between sequences. Updated to include a new property, 'UseParallel', which lets you use parfor-loops and compute in parallel mode.

#### Compatibility Considerations

In Bioinformatics Toolbox Version 3.4 and earlier, the multialign and seqpdist functions included 'JobManager' and 'WaitInQueue' property name/property value pairs, which let you process in parallel, including support for the MATLAB scheduler for clusters.

In Bioinformatics Toolbox Version 3.5, the multialign and seqpdist functions do not include the 'JobManager' and 'WaitInQueue' property name/property value pairs. Instead they include the 'UseParallel' property name/property value pair, which lets you process in parallel, including support for:

• Local workers for multicore machines

• The MATLAB scheduler for clusters

• Third-party schedulers for clusters

#### Phylogenetic Tree Tools and Methods

The following functions are updated:

• phytreeread — Read phylogenetic tree file. Updated to return a second output containing bootstrap values for tree nodes.

• seqpdist — Calculate pairwise distance between sequences. Updated to include a new property, 'UseParallel', which lets you use parfor-loops and compute in parallel mode.

#### BioIndexedFile Function, Object, Methods, and Properties

Following is a new class for an object that lets you extract information from large multi-entry text files.

• BioIndexedFile — Allow quick and efficient access to large text file with nonuniform-size entries.

This class has properties and methods that are useful for accessing, reading, and parsing data from a large source file.
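The general technique behind quick access to a large multi-entry text file is to scan the file once, record the byte offset of each entry, and then seek directly to an entry when it is requested. The Python sketch below illustrates that idea on FASTA-style entries. The helper names `build_index` and `read_entry` are hypothetical, and nothing here describes the internal design of BioIndexedFile.

```python
import io

def build_index(handle, entry_prefix=b">"):
    """Return the byte offset at which each entry begins."""
    offsets = []
    handle.seek(0)
    while True:
        pos = handle.tell()
        line = handle.readline()
        if not line:
            break
        if line.startswith(entry_prefix):
            offsets.append(pos)
    return offsets

def read_entry(handle, offsets, k):
    """Read entry k (0-based) by seeking to its recorded offset."""
    handle.seek(offsets[k])
    if k + 1 < len(offsets):
        return handle.read(offsets[k + 1] - offsets[k])
    return handle.read()   # last entry runs to end of file

# Entries of different sizes, held in memory for the example.
data = io.BytesIO(b">seq1\nACGT\n>seq2\nACGTACGTAC\nGGTT\n>seq3\nA\n")
index = build_index(data)
print(index)                              # [0, 11, 33]
print(read_entry(data, index, 1).decode())
```

Only the index needs to stay in memory; the entries themselves are read on demand.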

#### BioRead Function, Object, Methods, and Properties

Following is a new class for an object that contains data from short-read sequences, including sequence headers, nucleotide sequences, and the quality scores for the sequences.

• BioRead — Contain sequence and quality data.

This class has properties and methods that you can use to explore, access, filter, and manipulate all or a subset of the data, before doing subsequent analyses or sequence alignment and mapping.

#### BioMap Function, Object, Methods, and Properties

Following is a new class for an object that contains data from short-read sequences, including sequence headers, read sequences, quality scores for the sequences, and data about alignment and mapping to a single reference sequence.

• BioMap — Contain sequence, quality, alignment, and mapping data.

This class has properties and methods that you can use to explore, access, filter, and manipulate all or a subset of the data, before doing subsequent analyses or viewing the data.

#### Function Elements Being Removed

| Function Element Name | What Happens When You Use This Function Element | Use This Instead | Compatibility Considerations |
| --- | --- | --- | --- |
| 'JobManager' property name/property value pair as input to multialign and seqpdist functions | Warns | 'UseParallel' property name/property value pair as input to multialign and seqpdist functions | See the Compatibility Considerations subheading in Multiple Sequence Alignment Functions. |
| 'WaitInQueue' property name/property value pair as input to multialign and seqpdist functions | Warns | 'UseParallel' property name/property value pair as input to multialign and seqpdist functions | See the Compatibility Considerations subheading in Multiple Sequence Alignment Functions. |

## R2009b

New Features, Bug Fixes, Compatibility Considerations

#### Data Format and Database Functions

Following are new functions:

• fastainfo — Return information about FASTA file.

• fastqinfo — Return information about FASTQ file.

• fastqwrite — Write to file using FASTQ format.

• sffinfo — Return information about SFF file.

• tgspcinfo — Return information about SPC file.
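For readers unfamiliar with the FASTQ layout, the Python sketch below writes four-line FASTQ records (header, sequence, separator, quality) and then reports a simple summary. The helper names `write_fastq` and `fastq_summary` and the `num_entries` key are assumptions for this example; they do not mirror the outputs of fastqwrite or fastqinfo.

```python
import io

def write_fastq(handle, records):
    """Write (header, sequence, quality) tuples as four-line FASTQ records."""
    for header, seq, qual in records:
        if len(seq) != len(qual):
            raise ValueError("sequence and quality lengths differ")
        handle.write(f"@{header}\n{seq}\n+\n{qual}\n")

def fastq_summary(handle):
    """Count records, assuming strict four-line records with no wrapping."""
    lines = handle.read().splitlines()
    return {"num_entries": len(lines) // 4}

buffer = io.StringIO()
write_fastq(buffer, [("read1", "ACGT", "IIII"), ("read2", "GGC", "II#")])
buffer.seek(0)
print(fastq_summary(buffer))  # {'num_entries': 2}
```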

The following functions are updated:

• affyread — Read microarray data from Affymetrix® GeneChip® file. Updated to read cell layout files (CLF) and background probe (BGP) files.

• multialignwrite — Write multiple alignment to file. Updated to write a file in either ClustalW ALN format (default) or MSF format.

#### Protein Analysis Functions

Following is a new function:

• isotopicdist — Calculate high-resolution isotope mass distribution and density function.

The following function is updated:

• cleave — Cleave amino acid sequence with enzyme. Updated to let you specify an exception to the enzyme's cleavage rule and to let you specify a maximum number of missed cleavage sites. Also updated to return the number of missed cleavage sites per peptide fragment.
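The interplay of a cleavage rule, an exception to that rule, and missed cleavage sites can be sketched in Python as follows. The example uses a trypsin-like rule (cut after K or R) with the common exception of not cutting before P, expressed as a regular expression. The function name `cleave_sketch` and its parameters are hypothetical, and the output format is not that of the cleave function.

```python
import re

def cleave_sketch(seq, site_pattern=r"(?<=[KR])(?!P)", max_missed=0):
    """Return (fragment, missed_sites) pairs for a zero-width cleavage pattern.

    The default pattern cuts after K or R unless the next residue is P.
    """
    cuts = [m.start() for m in re.finditer(site_pattern, seq)
            if 0 < m.start() < len(seq)]
    bounds = [0] + cuts + [len(seq)]
    fragments = []
    for i in range(len(bounds) - 1):
        for missed in range(max_missed + 1):
            j = i + 1 + missed
            if j >= len(bounds):
                break
            # A fragment spanning `missed` internal cut points.
            fragments.append((seq[bounds[i]:bounds[j]], missed))
    return fragments

print(cleave_sketch("AKRPGKT"))
# [('AK', 0), ('RPGK', 0), ('T', 0)]
print(cleave_sketch("AKRPGKT", max_missed=1))
```

Note that the R followed by P is not treated as a cut point, which is the effect of the exception.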

#### Data Visualization Functions

The following functions are updated:

• microplateplot — Display visualization of microtiter plate. Display updated so that first row of input matrix appears at the top and is labeled row A. Updated to return the handle to the axes of the plot, which lets you reverse the order of the rows or columns in the display. Updated to include a new property, 'TextFontSize', which lets you control the font size of text labels.

• multialignviewer — Display and interactively adjust multiple sequence alignment. Updated to accept a list of names to label the sequences in the Multiple Sequence Alignment Viewer window.

• showalignment — Display color-coded sequence alignment. Updated to control the inclusion or exclusion of terminal gaps from the count of matches and similar residues when displaying a pairwise alignment.

#### Compatibility Considerations

In Bioinformatics Toolbox Version 3.3, the default layout for the plot returned by microplateplot displayed the first row of the input matrix at the bottom.

In Bioinformatics Toolbox Version 3.4, the plot displays the first row of the input matrix at the top.

#### Sequence Statistics Functions

The following function is updated:

• seqshowwords — Graphically display words in sequence. Updated to search for multiple words in a sequence.

#### Sequence Utility Functions

The following functions are updated:

• cleave — Cleave amino acid sequence with enzyme. Updated to let you specify an exception to the enzyme's cleavage rule and to let you specify a maximum number of missed cleavage sites. Also updated to return the number of missed cleavage sites per peptide fragment.

• rebasecuts — Find restriction enzymes that cut nucleotide sequence. Updated to use Version 904 of REBASE®, the Restriction Enzyme Database.

• restrict — Split nucleotide sequence at restriction site. Updated to use Version 904 of REBASE, the Restriction Enzyme Database.
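Splitting a sequence at a restriction site amounts to locating each occurrence of the recognition sequence and cutting at a fixed offset inside it. The Python sketch below does this for a single strand, using the EcoRI site G^AATTC as the default. The function name `split_at_site` and its parameters are assumptions for this example; it does not consult REBASE and is not the restrict implementation.

```python
def split_at_site(seq, recognition="GAATTC", cut_offset=1):
    """Split seq at each recognition site, cutting cut_offset bases into the site."""
    fragments = []
    start = 0
    pos = seq.find(recognition)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(seq[start:cut])
        start = cut
        pos = seq.find(recognition, pos + 1)
    fragments.append(seq[start:])   # remainder after the last cut
    return fragments

print(split_at_site("AAGAATTCTTGAATTCGG"))
# ['AAG', 'AATTCTTG', 'AATTCGG']
```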

#### Sequence Visualization Functions

The following functions are updated:

• multialignviewer — Display and interactively adjust multiple sequence alignment. Updated to accept a list of names to label the sequences in the Multiple Sequence Alignment Viewer window.

• showalignment — Display color-coded sequence alignment. Updated to control the inclusion or exclusion of terminal gaps from the count of matches and similar residues when displaying a pairwise alignment.

#### Pairwise Sequence Alignment Functions

Following is a new function:

• localalign — Return local optimal and suboptimal alignments between two sequences.

The following functions are updated:

• multialignviewer — Display and interactively adjust multiple sequence alignment. Updated to accept a list of names to label the sequences in the Multiple Sequence Alignment Viewer window.

• showalignment — Display color-coded sequence alignment. Updated to control the inclusion or exclusion of terminal gaps from the count of matches and similar residues when displaying a pairwise alignment.

#### Multiple Sequence Alignment Functions

The following functions are updated:

• multialignviewer — Display and interactively adjust multiple sequence alignment. Updated to accept a list of names to label the sequences in the Multiple Sequence Alignment Viewer window.

• multialignwrite — Write multiple alignment to file. Updated to write a file in either ClustalW ALN format (default) or MSF format.

• showalignment — Display color-coded sequence alignment. Updated to control the inclusion or exclusion of terminal gaps from the count of matches and similar residues when displaying a pairwise alignment.

#### Phylogenetic Tree Tools and Methods

The Phylogenetic Tree Tool includes the following updates:

• Includes two new circular print renderings: equal angle and equal daylight

• Updates to Tools menu, including commands to select specific branch and leaf nodes based on different criteria, such as distance, common ancestors, leaves only, and descendants.

Following is a new method:

• cluster — Validate clusters in phylogenetic tree.

The following method is updated:

• plot — Draw phylogenetic tree. Updated to include two new algorithms for circular layouts: equal angle and equal daylight. Updated to let you rotate circular trees from 0 through 360 degrees and to rotate leaf labels of circular trees so that the text is aligned to the root node. Updated the 'LeafLabels' property so that it defaults to true for circular layouts and to false for square and angular layouts.

#### Compatibility Considerations

In Bioinformatics Toolbox Version 3.3, the 'LeafLabels' property defaulted to true when the 'Type' property was 'square' or 'angular', and to false when the 'Type' property was 'radial'.

In Bioinformatics Toolbox Version 3.4, the 'LeafLabels' property defaults to false when the 'Type' property is 'square' or 'angular', and to true when the 'Type' property is 'radial'.

#### Clustergram Window

The Clustergram window has two new toolbar buttons:

• Annotate button — Shows and hides intensity values for each area of the heat map.

• Show Dendrogram button — Shows and hides the dendrograms.

#### Clustergram Methods and Properties

The following are new methods of a clustergram object:

The following properties of a clustergram object are renamed:

• ColumnMarker is now ColumnGroupMarker.

• Impute is now ImputeFun.

• Ratio is now DisplayRatio.

• RowMarker is now RowGroupMarker.

• SymmetricRange is now Symmetric.

Note: The former property names are still valid for Bioinformatics Toolbox Version 3.4 (R2009b).

Following is a new property related to the display of dendrogram tree diagrams in a clustergram object:

• ShowDendrogram

The following are new properties related to the display of row and column labels of a clustergram object:

• RowLabels

• ColumnLabels

• RowLabelsLocation

• ColumnLabelsLocation

• RowLabelsColor

• ColumnLabelsColor

• LabelsWithMarkers

• RowLabelsRotate

• ColumnLabelsRotate

The following are new properties related to annotating data in a clustergram object:

• Annotate

• AnnotColor

• AnnotPrecision

When using clustergram properties with the get and set methods, the property names are now case sensitive.

#### Compatibility Considerations

In Bioinformatics Toolbox Version 3.3, the property names of a clustergram object were not case sensitive when used with the get and set methods.

In Bioinformatics Toolbox Version 3.4, property names of a clustergram object are case sensitive.

#### HeatMap Object, Methods, and Properties

Following is a new object:

• HeatMap object — Object containing matrix and heat map display properties.

The following are methods of a HeatMap object:

• addXLabel — Label x-axis of heat map.

• addYLabel — Label y-axis of heat map.

• plot — Render heat map for object.

• view — Render heat map for object.

A HeatMap object includes many properties that control the creation of the heat map, row and column labels, axes labels, title, and data annotation.

#### DataMatrix Methods

Following is a new method of a DataMatrix object:

• dmwrite — Write DataMatrix object to text file.

#### Microarray Functions, Objects, Methods, and Properties

Following are new functions to create objects containing data from a microarray gene expression experiment:

• bioma.ExpressionSet — Contain data from microarray gene expression experiment.

• bioma.data.ExptData — Contain expression data from microarray gene expression experiment.

• bioma.data.MetaData — Contain sample or feature metadata from microarray gene expression experiment.

• bioma.data.MIAME — Contain experiment information from microarray gene expression experiment.

These objects have properties and methods that are useful for viewing and analyzing the data or a subset of the data.

#### Mass Spectrometry Functions

Following are new functions:

• isotopicdist — Calculate high-resolution isotope mass distribution and density function.

• tgspcinfo — Return information about SPC file.

The following function is updated:

• mspeaks — Convert raw peak data to peak list (centroided data). Updated to include a new property, 'Style', which lets you specify the style for marking the peaks in the plot.

## R2009a

New Features, Bug Fixes, Compatibility Considerations

#### Data Visualization Functions

Following is a new function:

#### Sequence Utility Functions

The following functions are updated:

• rebasecuts — Find restriction enzymes that cut nucleotide sequence. Updated to use Version 811 of REBASE, the Restriction Enzyme Database.

• restrict — Split nucleotide sequence at restriction site. Updated to use Version 811 of REBASE, the Restriction Enzyme Database.

#### Sequence Conversion Functions

The following function is updated:

• nt2aa — Convert nucleotide sequence to amino acid sequence. Updated to include a new property, 'ACGTOnly', to support ambiguous and unknown nucleotide characters.

#### Bioanalytic and Mass Spectrometry Functions

The following functions are updated to use with data from any separation technique, including mass spectrometry:

• msalign — Align peaks in signal to reference peaks.

• msbackadj — Correct baseline of signal with peaks.

• mslowess — Smooth signal with peaks using nonparametric method.

• msnorm — Normalize set of signals with peaks.

• mspeaks — Convert raw peak data to peak list (centroided data).

• msppresample — Resample signal with peaks while preserving peaks.

• msresample — Resample signal with peaks.

• mssgolay — Smooth signal with peaks using least-squares polynomial.

#### Microarray Functions

The following functions are updated:

• cghcbs — Perform circular binary segmentation (CBS) on array-based comparative genomic hybridization (aCGH) data. Updated to include an optional heuristic stopping rule to improve performance.

• ilmnbslookup — Look up Illumina® BeadStudio™ target (probe) sequence and annotation information. Updated to read Illumina microRNA array annotation files.

• mattest — Perform two-sample t-test to evaluate differential expression of genes from two experimental conditions or phenotypes. Updated with new property, 'VarType', which lets you specify equal or unequal (default) variance for the test.

#### Compatibility Considerations

A compatibility consideration related to the mattest function was introduced in Bioinformatics Toolbox Version 3.2, but not reported in the Release Notes for Version 3.2 (R2008b). Specifically, in Bioinformatics Toolbox Version 3.1 and earlier, the mattest function used equal variance for the test. In Bioinformatics Toolbox Version 3.2, the mattest function started using unequal variance for the test.
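The practical effect of the variance assumption can be seen by computing the two-sample t statistic both ways. The Python sketch below implements the standard pooled-variance (equal variance) and Welch (unequal variance) formulas. The function name `t_statistic` and the sample values are made up for illustration, and the sketch does not reproduce mattest, which also reports p-values.

```python
from math import sqrt
from statistics import mean, variance

def t_statistic(x, y, equal_variance=False):
    """Two-sample t statistic with pooled or Welch standard error."""
    n1, n2 = len(x), len(y)
    v1, v2 = variance(x), variance(y)   # sample variances
    if equal_variance:
        pooled = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
        se = sqrt(pooled * (1 / n1 + 1 / n2))
    else:
        se = sqrt(v1 / n1 + v2 / n2)
    return (mean(x) - mean(y)) / se

# Illustrative values with unequal group sizes and spreads.
x = [1, 2, 3, 4]
y = [2, 4, 6, 8, 10, 12]
print(round(t_statistic(x, y, equal_variance=True), 4))   # -2.2768
print(round(t_statistic(x, y, equal_variance=False), 4))  # -2.7136
```

When both groups have the same number of samples the two statistics coincide, so the choice matters most for unbalanced designs (the degrees of freedom still differ between the two tests).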

#### Demo for Sequence Analysis

The following is a new sequence analysis demo:

• Predicting Protein Secondary Structure Using a Neural Network

## R2008b

New Features, Bug Fixes, Compatibility Considerations

#### Data Format and Database Functions

Following are new functions:

• affygcrma — Perform GC Robust Multi-array Average (GCRMA) procedure on Affymetrix microarray probe-level data.

• affyrma — Perform Robust Multi-array Average (RMA) procedure on Affymetrix microarray probe-level data.

• affysnpannotread — Read Affymetrix Mapping DNA array data from CSV-formatted annotation file.

• geoseriesread — Read Gene Expression Omnibus (GEO) Series (GSE) format data.

• multialignwrite — Write multiple alignment to file using ClustalW ALN format.

The following functions are updated:

• affyread — Read microarray data from Affymetrix GeneChip file. Updated so that Probes field in the return structure is now a single, which reduces memory usage.

• celintensityread — Read probe intensities from Affymetrix CEL files. Updated so that PMIntensities and MMIntensities fields in the return structure are now singles, which reduces memory usage.

• geosoftread — Read Gene Expression Omnibus (GEO) SOFT format data. Updated to support Platform (GPL) records.

• getgeodata — Retrieve Gene Expression Omnibus (GEO) format data. Updated to support Platform (GPL) and Series (GSE) records.

• goannotread — Read annotations from Gene Ontology annotated file. Updated to include two new properties, 'Fields' and 'Aspect', which let you read a subset of the data in the annotated file.

• multialignread — Read multiple sequence alignment file. Updated to support PHYLIP (Phylogeny Inference Package) multiple sequence alignment files.

• mzxmlread — Read data from mzXML file. Improved to read larger files faster and without running out of memory. Updated with three new properties, 'Levels', 'TimeRange', and 'ScanIndices', which let you filter and read a subset of the data. Updated with a 'Verbose' property to control the progress display while reading the file.

#### Compatibility Considerations

In Bioinformatics Toolbox Version 3.1 and earlier, the Probes field, in the structure returned by affyread, and the PMIntensities and MMIntensities fields, in the structure returned by celintensityread, were doubles. In Bioinformatics Toolbox Version 3.2, these fields are singles.

#### Sequence Utility Functions

Following is a new function:

The following functions are updated:

• blastncbi — Create remote NCBI BLAST report request ID or link to NCBI BLAST report. Updated to include a 'GapCosts' property, which lets you specify penalties for both opening and extending gaps, and an 'Entrez' property, which lets you limit searches using Entrez query syntax.

• cleave — Cleave amino acid sequence with enzyme. Includes a new input argument that specifies the name of an enzyme or compound for which a cleavage rule is specified in the literature.

• rebasecuts — Find restriction enzymes that cut nucleotide sequence. Updated to use Version 806 of REBASE, the Restriction Enzyme Database.

• restrict — Split nucleotide sequence at restriction site. Updated to use Version 806 of REBASE, the Restriction Enzyme Database.

• seqlogo — Display sequence logo for nucleotide or amino acid sequences. Updated to return a figure handle to the sequence logo.

#### Multiple Sequence Alignment Functions

Following is a new function:

• multialignwrite — Write multiple alignment to file using ClustalW ALN format.

The following function is updated:

• multialignread — Read multiple sequence alignment file. Updated to support PHYLIP (Phylogeny Inference Package) multiple sequence alignment files.

#### Gene Ontology Functions

The following function is updated:

• goannotread — Read annotations from Gene Ontology annotated file. Updated to include two new properties, 'Fields' and 'Aspect', which let you read a subset of the data in the annotated file.

#### Protein Analysis Functions

Following are new functions:

• cleavelookup — Find cleavage rule for enzyme or compound.

• pdbsuperpose — Superpose 3-D structures of two proteins.

• pdbtransform — Apply linear transformation to 3-D structure of molecule.

The following function is updated:

• cleave — Cleave amino acid sequence with enzyme. Includes a new input argument that specifies the name of an enzyme or compound for which a cleavage rule is specified in the literature.

#### Mass Spectrometry Functions

Following are new functions:

• mzcdf2peaks — Convert mzCDF structure to peak list.

• mzcdfinfo — Return information about netCDF file containing mass spectrometry data.

• mzxmlinfo — Return information about mzXML file.

The following function is updated:

• mzxmlread — Read data from mzXML file. Improved to read larger files faster and without running out of memory. Updated with three new properties, 'Levels', 'TimeRange', and 'ScanIndices', which let you filter and read a subset of the data. Updated with a 'Verbose' property to control the progress display while reading the file.

#### Microarray File Format Functions

Following are new functions:

• affygcrma — Perform GC Robust Multi-array Average (GCRMA) procedure on Affymetrix microarray probe-level data.

• affyrma — Perform Robust Multi-array Average (RMA) procedure on Affymetrix microarray probe-level data.

• affysnpannotread — Read Affymetrix Mapping DNA array data from CSV-formatted annotation file.

• geoseriesread — Read Gene Expression Omnibus (GEO) Series (GSE) format data.

The following functions are updated:

• affyread — Read microarray data from Affymetrix GeneChip file. Updated so that Probes field in the return structure is now a single, which reduces memory usage.

• celintensityread — Read probe intensities from Affymetrix CEL files. Updated so that PMIntensities and MMIntensities fields in the return structure are now singles, which reduces memory usage.

• geosoftread — Read Gene Expression Omnibus (GEO) SOFT format data. Updated to support Platform (GPL) records.

• getgeodata — Retrieve Gene Expression Omnibus (GEO) format data. Updated to support Platform (GPL) and Series (GSE) records.

#### Compatibility Considerations

In Bioinformatics Toolbox Version 3.1 and earlier, the Probes field, in the structure returned by affyread, and the PMIntensities and MMIntensities fields, in the structure returned by celintensityread, were doubles. In Bioinformatics Toolbox Version 3.2, these fields are singles.

#### Microarray Functions

Following are new functions:

• affysnpintensitysplit — Split Affymetrix SNP probe intensity information for alleles A and B.

• affygcrma — Perform GC Robust Multi-array Average (GCRMA) procedure on Affymetrix microarray probe-level data.

• affyrma — Perform Robust Multi-array Average (RMA) procedure on Affymetrix microarray probe-level data.

• DataMatrix — Create DataMatrix object.

The following functions are updated:

• ilmnbslookup — Look up Illumina BeadStudio target (probe) sequence and annotation information. Updated to support BGX and TXT annotation files.

• mattest — Perform two-sample t-test to evaluate differential expression of genes from two experimental conditions or phenotypes. Updated to use unequal variance instead of equal variance for the test.

• probesetlookup — Look up information for Affymetrix probe set. Updated to accept multiple probe set IDs/names or gene IDs.

#### Compatibility Considerations

In Bioinformatics Toolbox Version 3.1 and earlier, the mattest function used equal variance for the test. In Bioinformatics Toolbox Version 3.2, the mattest function uses unequal variance for the test.

#### DataMatrix Object

Following is a new object:

• DataMatrix object — Data structure encapsulating data and metadata from microarray experiment so that it can be indexed by gene or probe identifiers and by sample identifiers.
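The core idea of a matrix that can be indexed by gene and sample identifiers can be sketched with two label-to-position lookups. The Python class below is a deliberately small illustration; the class name `LabeledMatrix`, its methods, and the example identifiers are all hypothetical and do not reflect the DataMatrix interface.

```python
class LabeledMatrix:
    """Matrix of values addressable by row and column names."""

    def __init__(self, values, row_names, col_names):
        if len(values) != len(row_names):
            raise ValueError("one row name is required per row")
        if any(len(row) != len(col_names) for row in values):
            raise ValueError("one column name is required per column")
        self.values = values
        self.rows = {name: i for i, name in enumerate(row_names)}
        self.cols = {name: j for j, name in enumerate(col_names)}

    def get(self, row, col):
        """Look up a single value by its row and column names."""
        return self.values[self.rows[row]][self.cols[col]]

    def row(self, row):
        """Return all values for one row as a {column name: value} dict."""
        i = self.rows[row]
        return {name: self.values[i][j] for name, j in self.cols.items()}

# Made-up probe and sample identifiers.
dm = LabeledMatrix([[1.5, 2.0], [0.3, 0.9]],
                   ["probeA", "probeB"], ["sample1", "sample2"])
print(dm.get("probeB", "sample2"))  # 0.9
print(dm.row("probeA"))             # {'sample1': 1.5, 'sample2': 2.0}
```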

#### DataMatrix Methods

There are many methods that let you create, index into, modify, create subsets, sort, perform operations on, analyze, and plot a DataMatrix object.

#### Demo for Sequence Analysis

The following is a new sequence analysis demo:

## R2008a

New Features, Bug Fixes, Compatibility Considerations

#### Data Format and Database Functions

Following is a new function:

The following functions are updated:

• celintensityread — Read probe intensities from Affymetrix CEL files. Updated output structure to include a new field, GroupNumbers, which contains group numbers of probes.

• fastawrite — Write to file using FASTA format. Updated such that if you specify an existing file, new data is appended to the file instead of overwriting it.

• getgenbank — Retrieve sequence information from GenBank® database. Updated such that if you use the 'ToFile' property and specify an existing file, new data is appended to the file instead of overwriting it. Updated to allow you to access a partial sequence by adding new property 'PartialSeq'.

• getgenpept — Retrieve sequence information from GenPept database. Updated such that if you use the 'ToFile' property and specify an existing file, new data is appended to the file instead of overwriting it. Updated to allow you to access a partial sequence by adding new property 'PartialSeq'.

• getgeodata — Retrieve Gene Expression Omnibus (GEO) SOFT format data. Updated to retrieve both Sample (GSM) and Data Set (GDS) data.

#### Compatibility Considerations

In Bioinformatics Toolbox Version 3.0 and earlier, when writing to files using the fastawrite function or the getgenbank or getgenpept functions with the 'ToFile' property, if you specified an existing file, the file was overwritten. In Bioinformatics Toolbox Version 3.1, if you specify an existing file, new data is appended to the file instead of overwriting it.
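The difference between the two behaviors corresponds to opening the target file in write mode versus append mode. The Python sketch below shows both on a temporary FASTA file. The function name `write_fasta` and its `append` flag are assumptions for this example and are not options of fastawrite.

```python
import os
import tempfile

def write_fasta(path, header, sequence, append=True):
    """Write one FASTA entry; append to or overwrite an existing file."""
    mode = "a" if append else "w"
    with open(path, mode) as handle:
        handle.write(f">{header}\n{sequence}\n")

with tempfile.TemporaryDirectory() as folder:
    target = os.path.join(folder, "example.fasta")
    write_fasta(target, "seq1", "ACGT")
    write_fasta(target, "seq2", "GGCC")                # appended: two entries
    with open(target) as handle:
        print(handle.read().count(">"))                # 2
    write_fasta(target, "seq3", "TTAA", append=False)  # overwritten: one entry
    with open(target) as handle:
        print(handle.read().count(">"))                # 1
```

Code that relied on the earlier overwrite behavior would need to delete the existing file first to get the same result.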

#### Sequence Utility Functions

The following functions are updated:

• evalrasmolscript — Send RasMol script commands to Molecule Viewer window. Updated to use Version 11.4 of the Jmol molecule viewer.

• molviewer — Display and manipulate 3-D molecule structure. Updated to use Version 11.4 of the Jmol molecule viewer.

• ramachandran — Draw Ramachandran plot for Protein Data Bank (PDB) data. Updated to handle PDB files with multiple chains and models by adding three properties: 'Chain', 'Plot', and 'Model'. Updated Ramachandran plot to mark glycine residues and display reference regions by adding three properties: 'Glycine', 'Regions', and 'RegionDef'. Updated Ramachandran plot to display amino acid information in ToolTip. Updated to return an output structure, which lets you easily determine the names and sequence positions of amino acids corresponding to torsion angles.

• rebasecuts — Find restriction enzymes that cut nucleotide sequence. Updated to use Version 710 of REBASE, the Restriction Enzyme Database.

• restrict — Split nucleotide sequence at restriction site. Updated to use Version 710 of REBASE, the Restriction Enzyme Database.

#### Pairwise Sequence Alignment Functions

The following functions are updated:

• nwalign — Globally align two sequences using Needleman-Wunsch algorithm. Updated to improve pairwise sequence performance.

• swalign — Locally align two sequences using Smith-Waterman algorithm. Updated to improve pairwise sequence performance.

#### Phylogenetic Tree Tools Function

The following function is updated:

• dnds — Estimate synonymous and nonsynonymous substitution rates. Updated by adding 'AdjustStops' property to control whether stop codons are excluded from calculations.

#### Protein Analysis Functions

The following functions are updated:

• evalrasmolscript — Send RasMol script commands to Molecule Viewer window. Updated to use Version 11.4 of the Jmol molecule viewer.

• molviewer — Display and manipulate 3-D molecule structure. Updated to use Version 11.4 of the Jmol molecule viewer.

• ramachandran — Draw Ramachandran plot for Protein Data Bank (PDB) data. Updated to handle PDB files with multiple chains and models by adding three properties: 'Chain', 'Plot', and 'Model'. Updated Ramachandran plot to mark glycine residues and display reference regions by adding three properties: 'Glycine', 'Regions', and 'RegionDef'. Updated Ramachandran plot to display amino acid information in ToolTip. Updated to easily determine the names and sequence positions of amino acids by creating an output structure.

#### Microarray File Format Functions

Following is a new function:

The following functions are updated:

• celintensityread — Read probe intensities from Affymetrix CEL files. Updated output structure to include a new field, GroupNumbers, which contains group numbers of probes.

• getgeodata — Retrieve Gene Expression Omnibus (GEO) SOFT format data. Updated to retrieve both Sample (GSM) and Data Set (GDS) data.

#### Microarray Functions

Following are new functions:

• affysnpquartets — Create table of SNP probe quartet results for Affymetrix probe set.

• cghfreqplot — Display frequency of DNA copy number alterations across multiple samples.

• ilmnbslookup — Look up Illumina BeadStudio target (probe) sequence and annotation information.

• redbluecmap — Create red and blue color map.

The following functions are updated:

• clustergram — Compute hierarchical clustering, display dendrogram and heat map, and create clustergram object.

Updated properties include:

• 'Linkage' — Can specify linkage method separately for rows and columns.

• 'Dendrogram' — Can specify color threshold separately for rows and columns.

Replaced properties include:

• 'Dimension' — Replaced by the 'Cluster' property, which lets you cluster along the columns, rows, or both.

• 'Pdist' — Replaced by 'RowPdist' and 'ColumnPdist' properties.

New properties include:

• 'Standardize' — Specifies the dimension for standardizing the data.

• 'DisplayRange' — Specifies the display range of standardized values.

• 'LogTrans' — Controls the log2 transform of the data.

• 'Impute' — Specifies a function and properties to impute missing data.

• 'RowMarker' — Adds color and text marker to a group of rows.

• 'ColumnMarker' — Adds color and text marker to a group of columns.

The interactivity of the clustergram figure is enhanced with the following features:

• Select a group of rows or columns and display the group number and genes or samples within.

• Create a new clustergram of only a group of the data.

• Export data as a clustergram object or structure in the MATLAB Workspace.

• maboxplot — Create box plot for microarray data. Updated by adding 'BoxPlot' property, which lets you specify arguments to pass to the boxplot function that creates the box plot.

• mairplot — Create intensity versus ratio scatter plot of microarray data. Updated by adding 'PlotOnly' property, which lets you display the scatter plot without user interface components.

• mattest — Perform two-sample t-test to evaluate differential expression of genes from two experimental conditions or phenotypes. Updated by adding 'Bootstrap' property to run bootstrap tests.

• mavolcanoplot — Create significance versus gene expression ratio (fold change) scatter plot of microarray data. Updated by adding 'PlotOnly' property, which lets you display the volcano plot without user interface components.

• probesetvalues — Create table of Affymetrix probe set intensity values. Updated by adding 'Background' property to control the background correction.

• zonebackadj — Perform background adjustment on Affymetrix microarray probe-level data using zone-based method. Updated to return a third output containing the estimated background values for each probe.

#### Compatibility Considerations

In Bioinformatics Toolbox Version 3.0 and earlier, the clustergram function included 'Dimension' and 'Pdist' properties. In Bioinformatics Toolbox Version 3.1, the 'Dimension' property is replaced by the 'Cluster' property, and the 'Pdist' property is replaced by the 'RowPdist' and 'ColumnPdist' properties.
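As a rough illustration of the 'Pdist' change, the sketch below passes the same distance metric through both replacement properties. It assumes that the old 'Pdist' value applied to rows and columns alike; the data matrix is random placeholder data, and the variable names are hypothetical.

```matlab
% Placeholder expression matrix: 20 genes (rows) by 6 samples (columns).
data = rand(20, 6);

% Version 3.0 and earlier (property since replaced):
%   clustergram(data, 'Pdist', 'correlation')

% Version 3.1: give the distance metric for rows and columns separately.
cgObj = clustergram(data, 'RowPdist', 'correlation', ...
                          'ColumnPdist', 'correlation');
```

Calls that used 'Dimension' need a similar change to 'Cluster'; check the clustergram reference page for the accepted values before converting them.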

#### Object

Following is a new object:

#### Clustergram Methods

The following are new methods of a clustergram object:

• get — Retrieve information about clustergram object.

• plot — Render clustergram heat map and dendrograms for clustergram object.

• set — Set property of clustergram object.

• view — View clustergram heat map and dendrograms for clustergram object.

#### Demo for Sequence Analysis

The following is a new sequence analysis demo:

#### Demo for Microarray Data Analysis

The following is a new microarray data analysis demo:

#### Demo for Visualization Tools

The following is a new visualization tool demo:

## R2007b

New Features, Bug Fixes, Compatibility Considerations

#### Data Format and Database Functions

Following are new functions:

The following function was updated:

• affyread — Read microarray data from Affymetrix GeneChip file. Updated the structure returned when reading a CDF library file. The structure contains three new subfields: GroupNumber, Direction, and GroupName.

#### Microarray File Format Functions

Following is a new function:

The following function was updated:

• affyread — Read microarray data from Affymetrix GeneChip file. Updated the structure returned when reading a CDF library file. The structure contains three new subfields: GroupNumber, Direction, and GroupName.

#### Microarray Functions

Following are new functions:

• chromosomeplot — Plot chromosome ideogram with G-banding pattern.

• cghcbs — Perform circular binary segmentation (CBS) on array-based comparative genomic hybridization (aCGH) data.

The following function is updated:

• probesetvalues — Create table of Affymetrix probe set intensity values. Updated the return matrix, which contains intensity values for probe-level data, to include two new fields: GroupNumber and Direction. Updated to return a second output containing the column names for the return matrix.

#### Sequence Conversion, Utility, and Visualization Functions

Following are new functions:

• blastlocal — Perform search on local BLAST database to create BLAST report.

• rnaconvert — Convert secondary structure of RNA sequence between bracket and matrix notations.

• rnafold — Predict minimum free-energy secondary structure of RNA sequence.

• rnaplot — Draw secondary structure of RNA sequence.

#### Mass Spectrometry Functions

The following function is updated:

• mspalign — Align mass spectra from multiple peak lists from LC/MS or GC/MS data set. Updated to include a new property, 'ShowEstimation', which controls the display of an assessment plot relative to the estimation method and the vector of common mass/charge (m/z) values.

#### Statistical Learning Functions

The following function is updated:

• svmsmoset — Create or edit Sequential Minimal Optimization (SMO) options structure. Updated default values for the 'MaxIter' and 'KernelCacheLimit' properties. Changed the 'Display' property so that when set to 'iter', a report displays every 500 iterations instead of 10.

#### Compatibility Considerations

In Bioinformatics Toolbox Version 2.6 and earlier, the svmsmoset function used a 'MaxIter' property with a default of 1500 and a 'KernelCacheLimit' property with a default of 7500. In Bioinformatics Toolbox Version 3.0, the defaults are 15000 and 5000, respectively. Also, when you set the 'Display' property to 'iter', a report displays every 500 iterations instead of 10.
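If existing results depend on the Version 2.6 defaults, you can set them explicitly instead of relying on the new defaults. This is a minimal sketch; the variable name smoOpts is arbitrary.

```matlab
% Recreate the Version 2.6 defaults explicitly.
smoOpts = svmsmoset('MaxIter', 1500, 'KernelCacheLimit', 7500);

% Optionally turn on the iterative display. In Version 3.0 the report
% appears every 500 iterations instead of every 10.
smoOpts = svmsmoset(smoOpts, 'Display', 'iter');
```

The resulting structure is the kind of value that svmtrain accepts through its SMO_Opts property.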

#### Gene Ontology Methods

The following methods of a gene ontology object are updated:

• geneont.getancestors — Find terms that are ancestors of specified Gene Ontology term. Updated to also return the number of times each ancestor is found. Updated to include two new properties, 'Relationtype', which specifies a relationship type to search for in the gene ontology, and 'Exclude', which controls excluding the original queried term(s) from the output, unless the term was reached while searching the gene ontology.

• geneont.getdescendants — Find terms that are descendants of specified Gene Ontology term. Updated to also return the number of times each descendant is found. Updated to include two new properties, 'Relationtype', which specifies a relationship type to search for in the gene ontology, and 'Exclude', which controls excluding the original queried term(s) from the output, unless the term was reached while searching the gene ontology.

• geneont.getrelatives — Find terms that are relatives of specified Gene Ontology term. Updated to also return the number of times each relative is found. Updated to include three new properties, 'Levels', which specifies the number of levels up and down to search in the gene ontology, 'Relationtype', which specifies a relationship type to search for in the gene ontology, and 'Exclude', which controls excluding the original queried term(s) from the output, unless the term was reached while searching the gene ontology.

#### Demos for Sequence Analysis

The following are two new sequence analysis demos:

The Investigating the Bird Flu Virus demo was updated to demonstrate how to write KML-formatted files, which can be used by Google Earth™ to display geospatial data.

#### Demo for Graph Theory Analysis

The following is a new graph theory demo:

## R2007a+

New Features, Bug Fixes, Compatibility Considerations

#### Data Formats and Databases Functions

The following functions are updated:

• affyread — Read microarray data from Affymetrix GeneChip file. Updated to read Affymetrix files from expression, genotyping, or resequencing assays on all platforms, except Solaris™.

• celintensityread — Read probe intensities from Affymetrix CEL files. Updated to read Affymetrix CEL and CDF files from expression or genotyping assays on all platforms, except Solaris.

• mzxmlread — Read mzXML file into MATLAB as structure. Updated to read mzXML files that conform to the mzXML 2.1 specification or earlier specifications.

#### Compatibility Considerations

In Bioinformatics Toolbox Version 2.6, the structure returned by affyread when reading a CHP file from an expression assay no longer contains a ProbePairs field. The ProbePairs field still exists in the structure returned by affyread when reading a CDF file.

#### Microarray File Formats Functions

The following functions are updated:

• affyread — Read microarray data from Affymetrix GeneChip file. Updated to read Affymetrix files from expression, genotyping, or resequencing assays on all platforms, except Solaris.

• celintensityread — Read probe intensities from Affymetrix CEL files. Updated to read Affymetrix CEL and CDF files from expression or genotyping assays on all platforms, except Solaris.

#### Compatibility Considerations

In Bioinformatics Toolbox Version 2.6, the structure returned by affyread when reading a CHP file from an expression assay no longer contains a ProbePairs field. The ProbePairs field still exists in the structure returned by affyread when reading a CDF file.

#### Microarray Utility Functions

The following function is updated:

• probesetplot — Plot Affymetrix probe set intensity values. Updated to accept structures created from CEL and CDF files, instead of a structure created from a CHP file.

#### Compatibility Considerations

In Bioinformatics Toolbox Version 2.5 and earlier, the probesetplot function accepted a structure created from a CHP file as input. Currently it requires two structures: one created from a CEL file and one created from a CDF library file. If you have any scripts that call the probesetplot function, you need to update them to provide the correct input arguments.

#### Microarray Normalization and Filtering Functions

Following is a new function:

• zonebackadj — Perform background adjustment on Affymetrix microarray probe-level data using zone-based method.

#### Mass Spectrometry Functions

The following function is updated:

• mzxmlread — Read mzXML file into MATLAB as structure. Updated to read mzXML files that conform to the mzXML 2.1 specification or earlier specifications.

Following is a new function you can use to calibrate and/or synchronize multidimensional mass spectrometry data:

• samplealign — Align two data sets containing sequential observations by introducing gaps.

## R2007a

New Features, Bug Fixes, Compatibility Considerations

#### Data Formats and Database Functions

Following are new functions for reading and creating files:

• affyprobeseqread — Read data file containing probe sequence information for Affymetrix GeneChip array.

• pdbwrite — Write to file using Protein Data Bank (PDB) format.

The following functions were updated:

• celintensityread — Read probe intensities from Affymetrix CEL files (Windows® 32). Updated so that the order of columns (CEL files) in return matrices PMIntensities and MMIntensities matches the order of CEL files in the CELFiles input argument.

• pdbread — Read data from Protein Data Bank (PDB) file. Updated so that the six fields containing coordinate information (Atom, AtomSD, AnisotropicTemp, AnisotropicTempSD, Terminal, and HeterogenAtom) are now subfields within the Model field of the MATLAB structure. Updated to include a new property, ModelNum, which reads only the specified model from a PDB-formatted text file.

#### Compatibility Considerations

In Bioinformatics Toolbox Version 2.4 and earlier, the celintensityread function ordered the columns (CEL files) of return matrices PMIntensities and MMIntensities alphabetically.

In Bioinformatics Toolbox Version 2.4 and earlier, the pdbread function stored coordinate information in six fields (Atom, AtomSD, AnisotropicTemp, AnisotropicTempSD, Terminal, and HeterogenAtom) within the MATLAB structure. These six fields are now subfields within the Model field of the MATLAB structure.
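The sketch below shows how code that reads atom records might be adjusted for the new structure layout. It assumes a local PDB-formatted file named example.pdb, which is a hypothetical file name; substitute any PDB file you have available.

```matlab
% Hypothetical local PDB-formatted file.
pdbFile = 'example.pdb';
pdbStruct = pdbread(pdbFile);

% Version 2.4 and earlier: atoms = pdbStruct.Atom;
% Version 2.5: coordinate fields are subfields of the Model field.
atoms = pdbStruct.Model(1).Atom;

% Read only the first model by using the new ModelNum property.
firstModel = pdbread(pdbFile, 'ModelNum', 1);
```

Indexing Model(1) keeps the code working for files that contain more than one model.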

#### Demo for Data Formats and Database Functions

The Accessing NCBI Entrez Databases with E-Utilities demo illustrates how to programmatically search and retrieve data.

#### Statistical Learning Functions

Following are new functions:

• optimalleaforder — Determine optimal leaf ordering for hierarchical binary cluster tree.

• svmsmoset — Create or edit Sequential Minimal Optimization (SMO) options structure.

The following function was updated:

• svmtrain — Train support vector machine classifier. Updated to include a new SMO method and a new property, SMO_Opts, which provides options for the SMO method. The BoxConstraint property has changed, including a new default value.

#### Compatibility Considerations

In Bioinformatics Toolbox Version 2.4 and earlier, the svmtrain function used a BoxConstraint property with a default of $\frac{1}{\sqrt{eps}}$. In Bioinformatics Toolbox Version 2.5, the default is 1, which can lead to slightly different results.
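For a sense of scale, the earlier default evaluates to 2^26 in double precision, so the change to 1 is large. The sketch below computes the old value and passes it to svmtrain to approximate the earlier behavior. The training data is a tiny made-up set, the variable names are hypothetical, and the example assumes a release that still includes svmtrain.

```matlab
% Old default for BoxConstraint (Version 2.4 and earlier).
oldBoxConstraint = 1/sqrt(eps);
fprintf('%.0f\n', oldBoxConstraint);   % prints 67108864

% Made-up, linearly separable training data: two groups of four points.
Training = [1 1; 1 2; 2 1; 2 2; 5 5; 5 6; 6 5; 6 6];
Group    = [0; 0; 0; 0; 1; 1; 1; 1];

% Request the old box constraint explicitly with the SMO method.
svmStruct = svmtrain(Training, Group, 'Method', 'SMO', ...
                     'BoxConstraint', oldBoxConstraint);
```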

#### Protein Analysis and Sequence Utilities Functions

Following are new functions:

The following functions were updated:

• featuresparse — Parse features from GenBank, GenPept, or EMBL data. Updated to include a new property, Sequence, which controls the extraction, when possible, of the sequences.

• oligoprop — Calculate sequence properties of DNA oligonucleotide. Updated to handle ambiguous N characters in a sequence.

The following function is removed:

• pdbplot — Plot 3-D protein structure. This function was replaced by the molviewer function.

#### Compatibility Considerations

In Bioinformatics Toolbox Version 2.5, the pdbplot function was replaced by the molviewer function. If you have any scripts that call the pdbplot function, you need to update them to call the molviewer function.

#### Sequence Alignment Functions

The following function was updated:

• seqpdist — Calculate pairwise distance between sequences. Updated to assume that all input sequences are aligned if they have the same length, regardless of the presence of gaps. If you know your input sequences are not aligned, you can align them before passing them to seqpdist (for example, using multialign), or set PairwiseAlignment to true when using seqpdist.

#### Compatibility Considerations

In Bioinformatics Toolbox Version 2.4 and earlier, the seqpdist function assumed all input sequences were aligned if they had the same length and at least one gap.
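The difference matters for unaligned sequences that happen to share a length. In the sketch below, the three short nucleotide sequences are made-up examples with equal length and no gaps, so Version 2.5 treats them as aligned unless PairwiseAlignment is set to true.

```matlab
% Made-up nucleotide sequences: same length, no gaps, not pre-aligned.
seqs = {'ACGTTGCA', 'ACGATGCA', 'TTGACGCA'};

% Version 2.5: equal-length input is assumed to be aligned already.
dAssumed = seqpdist(seqs, 'Alphabet', 'NT');

% Force a pairwise alignment before the distances are computed.
dPairwise = seqpdist(seqs, 'Alphabet', 'NT', 'PairwiseAlignment', true);
```

The two results can differ, because the second call aligns each pair before measuring the distance.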

#### Demo for Sequence Alignment Functions

The Comparing Whole Genomes demo illustrates how to compare features of organisms on a genomic evolution scale.

#### Microarray File Formats Functions

Following is a new function:

• affyprobeseqread — Read data file containing probe sequence information for Affymetrix GeneChip array.

The following function was updated:

• celintensityread — Read probe intensities from Affymetrix CEL files (Windows 32). Updated so that the order of columns (CEL files) in return matrices PMIntensities and MMIntensities matches the order of CEL files in the CELFiles input argument.

#### Compatibility Considerations

In Bioinformatics Toolbox Version 2.4 and earlier, the celintensityread function ordered the columns (CEL files) of return matrices PMIntensities and MMIntensities alphabetically.

#### Microarray Normalization and Filtering Functions

Following are new functions:

• affyprobeaffinities — Compute Affymetrix probe affinities from their sequences and MM probe intensities.

• gcrmabackadj — Perform GC Robust Multi-array Average (GCRMA) background adjustment on Affymetrix microarray probe-level data using sequence information.

• gcrma — Perform GC Robust Multi-array Average (GCRMA) background adjustment, quantile normalization, and median-polish summarization on Affymetrix microarray probe-level data.

#### Demo for Microarray File Formats, Normalization, and Filtering Functions

The Preprocessing Affymetrix Microarray Data at the Probe Level demo illustrates the affyprobeseqread, affyprobeaffinities, gcrmabackadj, and gcrma functions.

#### Microarray Data Analysis and Visualization Functions

Following is a new function:

• mafdr — Estimate false discovery rate (FDR) of differentially expressed genes from two experimental conditions or phenotypes.

The following function was updated:

• mattest — Perform two-tailed t-test to evaluate differential expression of genes from two experimental conditions or phenotypes. Updated to include a new property, Permute, which controls whether permutation tests are run.

#### Demo for Microarray Data Analysis and Visualization Functions

The Exploring Gene Expression Data demo illustrates the mattest and mafdr functions.

#### Mass Spectrometry Functions

Following are new functions:

• msdotplot — Plot set of peak lists from LC/MS or GC/MS data set.

• mspalign — Align mass spectra from multiple peak lists from LC/MS or GC/MS data set.

• mspeaks — Convert raw mass spectrometry data to peak list (centroided data).

• msppresample — Resample mass spectrometry signal while preserving peaks.

• mzxml2peaks — Convert mzXML structure to peak list.

The following function was updated:

• msheatmap — Create pseudocolor image of set of mass spectra. Updated to handle LC/MS and GC/MS data.

#### Phylogenetic Tree Tools Functions

Following is a new function:

• seqinsertgaps — Insert gaps into nucleotide or amino acid sequence.

The following functions were updated:

• dnds — Estimate synonymous and nonsynonymous substitution rates. Updated to include two new properties, Verbose, which controls the display of the codons considered in the computations and their amino acid translations, and Window, which performs the calculations over a sliding window.

• dndsml — Estimate synonymous and nonsynonymous substitution rates using maximum likelihood method. Updated to include a new property, Verbose, which controls the display of the codons considered in the computations and their amino acid translations.

• seqpdist — Calculate pairwise distance between sequences. Updated to assume that all input sequences are aligned if they have the same length, regardless of the presence of gaps. If you know your input sequences are not aligned, you can align them before passing them to seqpdist (for example, using multialign), or set PairwiseAlignment to true when using seqpdist.

#### Compatibility Considerations

In Bioinformatics Toolbox Version 2.4 and earlier, the seqpdist function assumed all input sequences were aligned if they had the same length and at least one gap.

#### Demos for Phylogenetic Tree Tools Functions

The following demos illustrate the nwalign, seqinsertgaps, dnds, and multialign functions:

#### Phylogenetic Tree Methods

Following is a new method of a phytree object:

• reorder — Reorder leaves of phylogenetic tree.

## R2006b

New Features, Bug Fixes

#### Data Formats and Database Functions

Following is a new function for getting data into the MATLAB environment:

• mzxmlread — Read mzXML file into the MATLAB software as structure.

The following functions were updated:

• celintensityread — Read probe intensities from Affymetrix CEL files (Windows 32). Updated to include a new property, Verbose, which controls the display of a progress report showing the name of each CEL file as it is read.

• fastaread — Read data from FASTA file. Updated to include a new property, Blockread, which controls reading a single entry or block of entries from a file.

• geosoftread — Read Gene Expression Omnibus (GEO) SOFT format data. Updated to read Data Set (GDS) files as well as Sample (GSM) files.

• getblast — Retrieve BLAST report from NCBI Web site. Updated to include a new property, WaitTilReady, which pauses the MATLAB software and waits a specified time (minutes) for a report from the NCBI Web site.

• scfread — Read trace data from SCF file. Updated to include more output options.

#### Sequence Utilities Functions

Following is a new function for parsing sequence data:

• featuresparse — Parse features from GenBank, GenPept, or EMBL data.

#### Sequence Visualization Functions

The following function was updated:

• seqtool — Open tool to interactively explore biological sequences. Updated to download sequences from the EMBL database, interactively move the viewing frame in the Sequence Viewer by pressing and holding Ctrl while click-dragging, and export an amino acid translation as a FASTA file or to the MATLAB Workspace.

#### Multiple Sequence Alignment Functions

The following function was updated:

• multialignviewer — Open viewer for multiple sequence alignments. Updated to export consensus sequences.

#### Microarray File Formats

The following function was updated:

• celintensityread — Read probe intensities from Affymetrix CEL files (Windows 32). Updated to include a new property, Verbose, which controls the display of a progress report showing the name of each CEL file as it is read.

#### Microarray Data Analysis and Visualization Functions

The following functions were updated:

• clustergram — Create dendrogram and heat map. Updated to include a new property, OptimalLeafOrder, which enables or disables the optimal leaf ordering calculation. This calculation determines the leaf order that maximizes the similarity between neighboring leaves.

• mairplot — Create intensity versus ratio scatter plot for microarray signals. Updated to include a new property, Type, which creates either an IR plot or an MA plot. Also updated to change the plot axes to log scale and to add interactive plot features such as displaying gene labels, changing factor lines, normalizing data, and exporting data.

• mapcaplot — Create Principal Component plot of expression profile data. Updated by adding an export feature.

• redgreencmap — Create red and green colormap. Updated to include a new property, Interpolation, which sets the method for color interpolation.

#### Graph Theory Functions

Following are new functions for applying basic graph theory algorithms to sparse matrices:

#### Graph Visualization Methods

Following are new methods for applying basic graph theory algorithms to a biograph object:

• allshortestpaths — Find all shortest paths in biograph object.

• conncomp — Find strongly or weakly connected components in biograph object.

• getmatrix — Get connection matrix from biograph object.

• isdag — Test for cycles in biograph object.

• isomorphism — Find isomorphism between two biograph objects.

• isspantree — Determine if tree created from biograph object is spanning tree.

• maxflow — Calculate maximum flow and minimum cut in biograph object.

• minspantree — Find minimal spanning tree in biograph object.

• shortestpath — Solve shortest path problem in biograph object.

• topoorder — Perform topological sort of directed acyclic graph extracted from biograph object.

• traverse — Traverse biograph object by following adjacent nodes.

#### Phylogenetic Tree Methods

Following is a new method for the phytree object:

• getmatrix — Convert phytree object into a relationship matrix.

## R2006a+

New Features

#### Data Formats and Databases Functions

The following functions are removed:

#### Sequence Utilities Functions

The following function was updated to include five new databases (refseq_rna, refseq_genomic, env_nt, refseq_protein, and env_nr):

#### Sequence Visualization Functions

Following is a new function for visualizing sequence data:

• featuresmap — Draw linear or circular map of features from GenBank structure.

#### Statistical Learning Functions

The following function was updated to include three new properties (RBF_Sigma, BoxConstraint, and Autoscale):

• svmtrain — Train support vector machine classifier.

#### Microarray Functions

The following function is supported on the Windows 32 platform only:

• affyread — Read microarray data from Affymetrix GeneChip file (Windows 32).

Following are new functions for preprocessing Affymetrix probe-level microarray data:

• celintensityread — Read probe intensities from Affymetrix CEL files (Windows 32).

• rmabackadj — Perform background adjustment on Affymetrix microarray probe-level data using Robust Multi-array Average (RMA) procedure.

• rmasummary — Calculate gene (probe set) expression values from Affymetrix microarray probe-level data using Robust Multi-array Average (RMA) procedure.

• affyinvarsetnorm — Perform rank invariant set normalization on probe intensities from multiple Affymetrix CEL or DAT files.

Following is a new function for two-color microarray normalization:

• mainvarsetnorm — Perform rank invariant set normalization on gene expression values from two experimental conditions or phenotypes.

Following are new functions for microarray differential expression analysis:

• mattest — Perform two-sample, two-tailed t-test to evaluate differential expression of genes from two experimental conditions or phenotypes.

• mavolcanoplot — Create significance versus gene expression ratio (fold change) scatter plot of microarray data.

#### Demo for Microarray Functions

New demo of the new microarray functions (Analyzing Affymetrix Microarray Gene Expression Data).

## R2006a

No New Features or Changes

## R14SP3+

New Features

#### Multiple Sequence Alignment Viewer

• multialignviewer — Interactively view and explore alignments, and make manual modifications.

#### Microarray Functions for Agilent Software

• magetfield — Utility function to extract data from a microarray.

#### Demo for Gene Ontology Functions

New demos for the new Gene Ontology functions (geneontologydemo) and for working with whole genomes (biomemorymapdemo).

## R14SP3

No New Features or Changes

## R14SP2+

New Features

#### Sequence Alignment Functions

• multialign — Align multiple sequences using a progressive method with Distributed Computing Toolbox™ support.

• profalign — Align two profiles using Needleman-Wunsch global alignment.

• showalignment — Updated to show multiply aligned sequences.

• seqpdist — Updated to calculate pairwise distances between observations with Distributed Computing Toolbox support.

#### Sequence Statistics Functions

• codonbias — Calculate codon frequency for each amino acid in a DNA sequence.

• cpgisland — Locate CpG islands in a DNA sequence.

#### Sequence Utilities Functions

• rebasecuts — Find restriction enzymes that cut a nucleotide sequence.

• seqtool — Graphical User Interface (GUI) for single sequence analysis.

#### Phylogenetic Tree Functions

• dnds, dndsml — Estimate synonymous and nonsynonymous substitution rates.

• seqneighjoin — Reconstruct a phylogenetic tree with a Neighbor-joining method.

#### Phylogenetic Tree Methods

• getcanonical — Calculate the canonical form of a phylogenetic tree.

• getnewickstr — Create a Newick-formatted string.

• reroot — Change the root of a phylogenetic tree.

• subtree — Extract a subtree.

• weights — Calculate weights for a phylogenetic tree.

#### Microarray Functions

• probesetplot — Plot values for an Affymetrix CHP file probe set.

#### Statistics Functions

• rankfeatures — Renamed function. The previous name was sqtlfeatures.

## R14SP2

New Features

#### Updated REBASE Table

REBASE is the restriction enzyme table that the restrict function uses to locate sequence patterns.

#### Expanded Bioperl Demonstration

The demo that calls the MATLAB software from Perl scripts now includes several examples of passing various types of data (both directly and by variant variable) back and forth between Perl and a MATLAB Automation Server. To view the demo, type bioperldemo.