Thread Subject: getgenbank problem

Subject: getgenbank problem

From: Arthur Zheng

Date: 22 Nov, 2009 05:35:03

Message: 1 of 3

I'm trying to download Genomes in Progress from NCBI. The accession number can be, for example, "NZ_ACIR00000000". The code is as simple as this:

seq = getgenbank('NZ_ACIR00000000', 'SequenceOnly', 'true');

I tried a number of times, but I always got this error:
*********************************************************************
??? Error using ==> getncbidata>accession2gi at 370
The key NZ_ACIR00000000 has more than one sequence file associated with
it in the nucleotide database.

Error in ==> getncbidata at 179
[giID,db] = accession2gi(accessnum,db,'quick');

Error in ==> getgenbank at 82
    gb = getncbidata(accessnum,varargin{:},'database','nucleotide','fileformat','FASTA');
********************************************************************
What's wrong? Thanks.

Subject: getgenbank problem

From: Paola Favaretto

Date: 1 Dec, 2009 16:16:06

Message: 2 of 3

Hi Arthur,

Currently, GETGENBANK can retrieve only one sequence at a time. The record you are trying to access (NZ_ACIR00000000) is associated with 216 sequences, so you could do one of the following:

1) Access the information using the E-Utilities. See the demo ncbieutilsdemo ("Accessing NCBI Entrez Database with E-Utilities") that ships with the toolbox for more information on using the E-Utilities from MATLAB.

2) Alternatively, retrieve each sequence separately. Because the sequences have consecutive accession numbers, from NZ_ACIR01000001 to NZ_ACIR01000216, you can automate this by building each accession string and calling getgenbank with it. However, NCBI may restrict how many requests of this type can be made, so using the E-Utilities is preferable.
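A minimal sketch of the second approach as a MATLAB script. It assumes the Bioinformatics Toolbox, network access, and that the contigs really are numbered NZ_ACIR01000001 through NZ_ACIR01000216 (these 2009 records may since have been updated or suppressed at NCBI). The variable names, the try/catch, the warning identifier, and the 0.5-second pause between requests are illustrative choices, not part of GETGENBANK itself.

```matlab
% Sketch: fetch each contig of the NZ_ACIR project with one getgenbank call.
% Assumption: contigs are numbered NZ_ACIR01000001 .. NZ_ACIR01000216.
prefix   = 'NZ_ACIR01';   % project prefix plus assembly version "01"
nContigs = 216;           % number of contigs reported for NZ_ACIR00000000

% Build the accession strings, e.g. 'NZ_ACIR01000001' ... 'NZ_ACIR01000216'
accessions = arrayfun(@(k) sprintf('%s%06d', prefix, k), 1:nContigs, ...
                      'UniformOutput', false);

seqs = cell(1, nContigs);   % failed downloads stay as empty cells
for k = 1:nContigs
    try
        % SequenceOnly returns just the nucleotide sequence as a char vector
        seqs{k} = getgenbank(accessions{k}, 'SequenceOnly', true);
    catch err
        % Hypothetical warning ID; report the failure and keep going
        warning('getgenbankLoop:fetchFailed', 'Could not retrieve %s: %s', ...
                accessions{k}, err.message);
    end
    pause(0.5);   % illustrative throttle so requests are not sent back to back
end
```

Because failed accessions leave an empty cell in seqs, `accessions(cellfun(@isempty, seqs))` lists the ones to retry.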
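For the first approach, a sketch that sends the consecutive accession numbers to the E-Utilities EFetch service in a single POST request and splits the returned multi-record FASTA text. It assumes MATLAB R2015a or later (for webwrite), network access, and that EFetch accepts accession numbers in its id parameter. The regular-expression parsing is a simple stand-in; the ncbieutilsdemo shipped with the toolbox remains the reference for using the E-Utilities from MATLAB.

```matlab
% Sketch: one EFetch request for all contigs instead of 216 getgenbank calls.
% Assumptions: R2015a+ (webwrite), network access, accessions accepted as ids.
baseURL  = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi';
prefix   = 'NZ_ACIR01';
nContigs = 216;

% Comma-separated id list: 'NZ_ACIR01000001,NZ_ACIR01000002,...'
ids = strjoin(arrayfun(@(k) sprintf('%s%06d', prefix, k), 1:nContigs, ...
                       'UniformOutput', false), ',');

% POST rather than GET, since the id list is long
opts = weboptions('ContentType', 'text', 'Timeout', 120);
fastaText = webwrite(baseURL, 'db', 'nuccore', 'id', ids, ...
                     'rettype', 'fasta', 'retmode', 'text', opts);

% Split into records: header line after '>', then sequence lines up to next '>'
records = regexp(fastaText, '>([^\n]*)\n([^>]*)', 'tokens');
headers = cellfun(@(r) r{1}, records, 'UniformOutput', false);
% Remove line breaks and other whitespace from each sequence
seqs    = cellfun(@(r) regexprep(r{2}, '\s', ''), records, ...
                  'UniformOutput', false);
```

Sending one request also keeps the number of calls to NCBI low, which addresses the rate-limit concern raised for the one-at-a-time approach.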

I hope this helps.

-Paola

Subject: getgenbank problem

From: Arthur Zheng

Date: 2 Dec, 2009 04:21:01

Message: 3 of 3


Hi Paola,

Thanks for your response. I'll try your suggestions.

Hao
