MATLAB Digest - July 2005
Integrating MATLAB with Perl and Java – Examples from Bioinformatics
by Brian Madsen
Bioinformaticists require numerous programming and scripting resources to research and process biological data. Many of these utilities are available online through open-source associations such as bioperl.org and biojava.org, while others exist as custom-developed scripts and applications.
Linking MATLAB to the Bioperl and BioJava utility libraries provides a clean and powerful front end to this important shared-development community and enables existing Perl and Java users to access MATLAB tools in the Bioinformatics, Distributed Computing, Image Processing, and Statistics toolboxes.
This article provides examples for working with Bioperl and BioJava in MATLAB, including the following:
- Calling Perl modules from MATLAB
- Leveraging Microsoft COM to access MATLAB functions in Perl
- Bringing Java classes into MATLAB
- Analyzing sequence information with MATLAB
- Making calls to BioJava classes
Bioperl and BioJava
As repositories for their respective associations, bioperl.org and biojava.org provide access to extensive collections of open-source Perl and Java tools for processing bioinformatics, genomics, and life science data.
Running the Perl and Java examples in this article requires that you install Perl and Bioperl modules and BioJava binary Java Archive (JAR) files. You can access current release files and complete installation instructions at the Bioperl and BioJava Websites. Please register to download the Perl, Java, and MATLAB code used in these examples.
Calling Perl Modules from MATLAB
To determine the version of Perl on your machine, type !perl -v at the MATLAB command prompt. The examples in this article were written with Perl 5.6.1 on Linux and tested with ActiveState Perl 5.005_03 built for MSWin32-x86-object. While these configurations have been tested for compatibility, other versions of Perl may require changes to the code.
The MATLAB function perl.m executes Perl scripts and optionally returns the results. For example, try running the calendar.pl script in a MATLAB command window.
>> % Switch to the folder with the example files.
>> cd 'D:\Work\'
>> cal2005 = perl('calendar.pl','2005')
cal2005 =
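The perl function can also return an exit status as a second output, which makes it easier to detect a failed script. The following is a minimal sketch rather than part of the downloadable examples: the script name hello_from_perl.pl and its contents are hypothetical, and the sketch assumes that a Perl interpreter is available to MATLAB.

```matlab
% Write a small, hypothetical Perl script to the temporary folder.
scriptFile = fullfile(tempdir, 'hello_from_perl.pl');
fid = fopen(scriptFile, 'w');
fprintf(fid, '%s\n', 'print "Hello, $ARGV[0]";');
fclose(fid);

% Run the script with one argument and capture both the text it
% prints and its exit status (0 indicates success).
[result, status] = perl(scriptFile, 'World');
if status ~= 0
    error('Perl script failed: %s', result);
end
disp(result)

% Remove the temporary script.
delete(scriptFile);
```

Checking the status before using the result avoids silently passing an error message on to later processing steps.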
Leveraging Microsoft COM to Access MATLAB Functions in Perl
You need Microsoft Windows to invoke MATLAB commands in Perl scripts. Microsoft Component Object Model (COM) enables software components to communicate in Windows operating system environments. If you have MATLAB locally installed on a Windows-based computer, try running the following Perl script at a DOS prompt.
D:\Work>type MATLAB_from_Perl.pl
#!/usr/bin/perl -w
use Win32::OLE;
use Win32::OLE::Variant;
# Simple perl script to execute commands in MATLAB.
# The name Win32::OLE is misleading; this actually uses COM
# Use existing instance if MATLAB is already running.
eval {$ml = Win32::OLE->GetActiveObject('Matlab.Application')};
die "MATLAB not installed" if $@;
unless (defined $ml) {
    $ml = Win32::OLE->new('Matlab.Application')
        or die "Oops, cannot start MATLAB";
}
# Execute the function in MATLAB and retrieve the results in a Variant array.
$ml->Execute('magicArray = magic(4)');
$mReal = Variant(VT_ARRAY|VT_R8|VT_BYREF,4,4);
$mImag = Variant(VT_ARRAY|VT_R8|VT_BYREF,4,4);
print "\n>> GetFullMatrix('magicArray', 'base', ",'$mReal, $mImag',")\n";
$ml->GetFullMatrix('magicArray', 'base', $mReal, $mImag);
for ($i = 0; $i < 4; $i++) {
    printf "%3d %3d %3d %3d\n", $mReal->Get($i,0), $mReal->Get($i,1),
        $mReal->Get($i,2), $mReal->Get($i,3);
}
undef $ml; # close MATLAB if we opened it
D:\Work>perl MATLAB_from_Perl.pl
>> GetFullMatrix('magicArray', 'base', $mReal, $mImag)
 16   2   3  13
  5  11  10   8
  9   7   6  12
  4  14  15   1
Bringing Java Classes into MATLAB
Current versions of MATLAB offer an integrated interface to Java functions. To confirm the installed version of Java, enter version('-java') at the MATLAB command prompt. The process for using Java functions in MATLAB is simple:
- Create your class definitions in .java files.
- Use your Java compiler to produce .class files.
- Make the class definitions available for use in MATLAB.
You can add Java classes to MATLAB by simply updating the dynamic Java class path with the javaclasspath command (introduced in MATLAB 7). Any classes in the specified directory will then be available to MATLAB.
For example, if you have a D:\Work\javaclasses directory on your computer, run the following command in the MATLAB command window to add that directory to the MATLAB dynamic class path.
>> % Modify this line to point to your java class directory
>> javaclasspath 'D:\Work\javaclasses';
Alternatively, you can point directly to a compressed .jar file containing all of your Java classes and packages.
Using the clear java command will refresh your Java classes on the dynamic path. MATLAB will look for Java files in the static path before looking in the dynamic path.
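Once a class is on the Java class path, MATLAB can construct objects and call methods on them directly. The sketch below uses java.lang.String, a standard class that is always available, so it does not depend on any directory or JAR file from this article. The sequence string is an arbitrary illustration.

```matlab
% List the entries currently on the dynamic Java class path
% (an empty cell array if nothing has been added yet).
dynamicPath = javaclasspath('-dynamic');

% Construct a Java object from a MATLAB character array.
s = java.lang.String('GATTACA');
disp(class(s))                 % the variable holds a Java object

% Call Java methods on the object. Numeric results come back as
% MATLAB doubles; char() ensures the string result is a MATLAB
% character array.
n = s.length();
lowerSeq = char(s.toLowerCase());
disp(lowerSeq)
```

Classes that you compile yourself are used the same way after their directory or JAR file has been added to the class path.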
Calls to MATLAB from Java
Currently, there is no documented interface that supports calling MATLAB directly from Java.
An Example from Bioinformatics
Gleevec is a drug that successfully targets known cancer-causing proteins. Initially approved to treat chronic myelogenous leukemia (CML), Gleevec is also effective for treatment of gastrointestinal stromal tumors (GIST).
Research has identified several gene targets for Gleevec including proto-oncogene tyrosine-protein kinase ABL1 (NP_009297), proto-oncogene tyrosine-protein kinase Kit (NP_000213), and platelet-derived growth factor receptor alpha precursor (NP_006197). (The NP_XXXXXX codes are NCBI Accession numbers that you use to get the sequences.)
>> ABL1 = 'NP_009297';
>> Kit = 'NP_000213';
>> PDGFRA = 'NP_006197';
Analyzing Sequence Information with MATLAB
You can download the target protein sequence information from NCBI using getgenpept.
>> targets(1).Sequence = getgenpept(ABL1, 'SequenceOnly', true);
>> targets(2).Sequence = getgenpept(Kit, 'SequenceOnly', true);
>> targets(3).Sequence = getgenpept(PDGFRA, 'SequenceOnly', true);
A natural next step is to align these three sequences and derive a consensus sequence from the alignment. The multialign and seqconsensus functions in the Bioinformatics Toolbox provide both.
>> alignedTargets = multialign(targets, 'verbose', true);
Branch No 1 Match scr avg: 3.1167 Mismatch scr avg: -0.9807 Profile lengths: 976 1089 Open gap: 15.5833 Gap extend: 0.7792 Tree distance: 0.6037
Aligned sequences:
seq 2 MRGARGAWDFLCVLLLLLRVQTGSSQPSVSPGEPSPPSIHPGKSDLIVRVGDEIRLLCTDPGFVKWT...
seq 3 MGTSHPAFLVLGCLLTGL---------SLILCQLSLPSILPNENEKVVQLNSSFSLRCFGESEVSWQ...
Branch No 2 Match scr avg: 1.7144 Mismatch scr avg: -0.2434 Profile lengths: 1149 1120 Open gap: 8.5719 Gap extend: 0.4286 Tree distance: 0.8122
Aligned sequences:
seq 1 MGQQPGKVLGDQRRPSLPAL-------HFIKGAGKKESSRHGGPHCNVFVEHEALQRPVASDFEPQG...
seq 2 MRGARGAWDFLCVLLLLLRVQTGSSQPSVSPGEPSPPSIHPGKSDLIVRVGDEIRLLCTDPGFVKWTFEILDET...
seq 3 MGTSHPAFLVLGCLLTGL---------SLILCQLSLPSILPNENEKVVQLNSSFSLRCFGESEVSWQ...
>> consensusSeq = seqconsensus(alignedTargets);
>> % This sequence can be viewed in a MATLAB GUI with seqtool
>> seqtool(consensusSeq);
Figure 1. MATLAB Sequence Viewer.
>> % And saved as a FASTA file.
>> fastawrite('consensus.fa','Consensus sequence',consensusSeq);
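To make the idea of a consensus sequence concrete, the sketch below computes a simple majority-vote consensus for a tiny, made-up alignment. This is a deliberate simplification: seqconsensus supports scoring-matrix-based methods and gap handling that this sketch does not reproduce, and the three aligned fragments are hypothetical. It assumes a MATLAB release that includes the mode function.

```matlab
% Hypothetical aligned fragments, one per row ('-' marks a gap).
aligned = ['MKT-AYIA'; ...
           'MKTQAYLA'; ...
           'MRTQ-YIA'];

% For each column, pick the most frequent character.
% mode works on numbers, so convert to character codes and back.
consensus = char(mode(double(aligned), 1));
disp(consensus)

% Count the columns in which all sequences agree.
conserved = all(aligned == repmat(aligned(1, :), size(aligned, 1), 1), 1);
fprintf('%d of %d columns are fully conserved\n', ...
    sum(conserved), numel(conserved));
```

A majority vote treats all substitutions as equally costly, which is why scoring-matrix-based methods are usually preferred for protein sequences.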
You can use this sequence to BLAST the NCBI database for additional sequences of interest. (The Basic Local Alignment Search Tool or “BLAST” algorithm provides a method for rapid searching of nucleotide and protein databases.)
>> % Send the sequence to NCBI, and return the Report ID
>> RID = blastncbi(consensusSeq,'blastp');
>> % Pass the RID to GETBLAST to parse the report, load it into
>> % a MATLAB structure, and save a copy as a text file. (NCBI may
>> % require a few minutes to process the request.)
>> report = getblast(RID,'TOFILE','Report.txt');
The new variable, report, is a MATLAB structure with the BLAST results from the consensus sequence, and the new file Report.txt is a text copy of the original BLAST report. The results contain sequences from a number of different species. You can now filter for hits from Homo sapiens and parse the Name field of each entry in report.Hits for the GI numbers.
>> gi_results = {};
>> for i = 1:numel(report.Hits) % The default number of returned hits is 50
>>     org = regexp(report.Hits(i).Name,'\[([\w ]+?)\]','match','once');
>>     if strmatch(org,'[Homo sapiens]')
>>         gi_results = [gi_results;...
>>             {regexp(report.Hits(i).Name,'gi\|(\d+?)\|','match','once')}];
>>     end
>> end
>> % Here's the final list of GI's from the report
>> gi_results
gi_results =
'gi|1736333|'
'gi|199784|'
'gi|129894|'
'gi|202930|'
'gi|32140872|'
'gi|2117831|'
'gi|47938802|'
'gi|825686|'
'gi|1817734|'
'gi|125471|'
'gi|61536|'
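The same filtering logic can be tried without submitting a BLAST request. The sketch below applies the two regular expressions to a few hypothetical hit names; the GI numbers, accession strings, and descriptions are invented for illustration and are not taken from the report above. It uses the 'tokens' option so that only the captured text is returned, without the surrounding brackets or bars.

```matlab
% Hypothetical hit names in the style of an NCBI BLAST report.
hitNames = { ...
    'gi|1111111|ref|NP_000001.1| example kinase [Homo sapiens]', ...
    'gi|2222222|ref|NP_000002.1| example kinase [Mus musculus]', ...
    'gi|3333333|gb|AAA00003.1| example receptor [Homo sapiens]'};

giNumbers = {};
for k = 1:numel(hitNames)
    % Capture the organism name between square brackets.
    org = regexp(hitNames{k}, '\[([\w ]+?)\]', 'tokens', 'once');
    if ~isempty(org) && strcmp(org{1}, 'Homo sapiens')
        % Capture only the digits of the GI number.
        gi = regexp(hitNames{k}, 'gi\|(\d+)\|', 'tokens', 'once');
        giNumbers{end+1, 1} = gi{1}; %#ok<AGROW>
    end
end
disp(giNumbers)
```

Using strcmp on the captured organism name requires an exact match, so hit names that lack a bracketed organism are skipped.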
So far, we have accomplished our task using M-code in MATLAB. As an alternative, we can look at the original consensus sequence in a BioJava GUI.
Making Calls to BioJava Classes
If you have an installation of BioJava, add the BioJava JAR file to your Java class path.
>> javaclasspath('D:\Work\javaclasses\biojava-1.3.1.jar');
>> % Now import the java classes that you will use.
>> import org.biojava.bio.symbol.*
>> import org.biojava.bio.seq.*
>> import org.biojava.bio.gui.sequence.*
>> % Create a Protein Symbolic List object for the peptide sequence.
>> prot = ProteinTools.createProtein(consensusSeq);
>> % This is a Java object, not a MATLAB character array
>> % Now create a SequencePanel object to display the sequence,
>> seqPanel = SequencePanel;
>> % Set the sequence to be displayed,
>> seqPanel.setSequence(prot);
>> % And the length of the sequence to be shown.
>> seqPanel.setRange(RangeLocation(1,prot.length()));
A MultiLineRenderer object lets you use several BioJava visualization methods.
>> mlr = MultiLineRenderer;
>> % A SymbolSequenceRenderer is used to display the peptide symbols,
>> mlr.addRenderer(SymbolSequenceRenderer);
>> % And a RulerRenderer displays sequence positions.
>> mlr.addRenderer(RulerRenderer);
>> seqPanel.setRenderer(mlr);
Create a MATLAB figure window and add the Java sequence panel using the javacomponent function.
>> F = figure;
>> javacomponent(seqPanel,java.awt.BorderLayout.NORTH,F);
>> % If you resize the figure, the seqPanel object will report back to
>> % the MATLAB Command Window what portion of the sequence is visible.
Figure 2. Java Sequence Panel.
MATLAB and the Bioinformatics Toolbox offer an extensive collection of powerful, open tools for researching and processing sequences and supporting data.

