fastaread fails when the header contains a comma

The attached file includes the first 8 proteins from the E. coli FASTA file that I downloaded from UniProt. (I had to give the file a .csv extension because this form does not accept the .fasta extension. Why?)
When I use fastaread on this file, it returns only 5 proteins. Looking at the 4th protein sequence, you can see that headers and sequences have been concatenated.
I verified that when there is a comma in the 3rd part of the header (in this example, "system N,N"), fastaread adds quotes around the header and concatenates the header with the previous sequence. In the attached example, 3 consecutive headers contain a comma, so those headers and their sequences are appended to the 4th sequence.
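One way to confirm the symptom is to count the header lines (those starting with '>') directly and compare that count with the number of records fastaread returns. The following is a minimal sketch; countFastaHeaders is a hypothetical helper name chosen for illustration, not part of any toolbox.

```matlab
function n = countFastaHeaders(filename)
% countFastaHeaders  Count the lines that start with '>' in a text file.
% Hypothetical helper for illustration; not part of any toolbox.
    txt   = fileread(filename);               % whole file as one char vector
    lines = regexp(txt, '\r?\n', 'split');    % split on LF or CRLF
    n     = sum(strncmp(lines, '>', 1));      % FASTA headers begin with '>'
end
```

Comparing countFastaHeaders('Ecoli_31May18_4313_Top_8.csv') with numel(fastaread('Ecoli_31May18_4313_Top_8.csv')) should expose the mismatch; for the attached file, the report above gives 8 proteins in the file and 5 records returned.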

Answers (1)

Try readFasta.m (attached). I had to write another FASTA reader because of the issues you describe with fastaread. Call readFasta the same way you would call fastaread:
S = readFasta('Ecoli_31May18_4313_Top_8.csv')
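The attached readFasta.m is not reproduced in this thread. For readers without the attachment, a comma-safe reader can be sketched along the following lines. simpleFastaRead is a hypothetical name, and this is only an illustration that treats every line literally, not the attached implementation. It assumes a plain-text file in which each record starts with a '>' line, and it returns a struct array with Header and Sequence fields, the same field names fastaread uses.

```matlab
function S = simpleFastaRead(filename)
% simpleFastaRead  Minimal FASTA reader (illustrative sketch).
% Hypothetical function, not the attached readFasta.m. Each line is read
% literally, so commas and quotes in a header get no special treatment.
    txt    = fileread(filename);
    lines  = strtrim(regexp(txt, '\r?\n', 'split'));  % split lines, trim whitespace
    lines  = lines(~cellfun(@isempty, lines));        % drop blank lines
    hdrIdx = find(strncmp(lines, '>', 1));            % positions of header lines
    S = struct('Header', cell(1, numel(hdrIdx)), 'Sequence', '');
    for k = 1:numel(hdrIdx)
        first = hdrIdx(k) + 1;                        % first sequence line of record k
        if k < numel(hdrIdx)
            last = hdrIdx(k + 1) - 1;                 % line before the next header
        else
            last = numel(lines);                      % last record runs to end of file
        end
        S(k).Header   = lines{hdrIdx(k)}(2:end);      % strip the leading '>'
        S(k).Sequence = [lines{first:last}];          % join wrapped sequence lines
    end
end
```

Because the header is taken as everything after '>' on its line, a header such as "system N,N" stays a single header and cannot be merged into the previous sequence.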

3 Comments

The patch does the job. A comment and a request. Comment: it would be nice if the help header of readFasta mentioned that it also solves the comma problem. Request: please implement this fix in fastaread in the next version of MATLAB.
Hi Moshe, I'm actually just a volunteer, a normal MATLAB user just like you. You can ask the MATLAB development team to fix this issue and give them the link to this Q&A thread so they know what to fix. Here is the page for reporting suggestions and bugs:
https://www.mathworks.com/support/bugreports/ => Report a bug => Technical Support: Installation, product help, bugs, suggestions or documentation errors
Thanks for your useful help.
The link you provided is where I started, which is why I assumed the answer came from the MATLAB team.


Release: R2018a
Asked: 31 May 2018
Commented: 4 Jun 2018
