fastaread fails when the header contains a comma
The attached file contains the first 8 proteins from the E. coli FASTA file that I downloaded from UniProt. (I had to change the file to csv format because this form does not accept the .fasta extension. Why?)
When I use fastaread on this file, it returns only 5 proteins. If you look at the 4th protein sequence, you can see that it is a concatenation of headers and sequences.
I realized and verified that when there is a comma in the 3rd part of the header (in this example, "system N,N"), fastaread adds quotes around the header and concatenates the header with the previous sequence. In the attached example, 3 consecutive headers contain a comma, so they and their sequences are all concatenated onto the 4th sequence.
Answers (1)
OCDER
on 31 May 2018
Try this one: readFasta.m (attached). I had to write another FASTA reader because I ran into the same issues you're describing with fastaread. Call readFasta the same way you would call fastaread:
S = readFasta('Ecoli_31May18_4313_Top_8.csv')
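For readers who cannot open the attachment, here is a minimal sketch of how a comma-tolerant FASTA reader could look. This is not the attached readFasta.m; the function name readFastaSketch is hypothetical, and the sketch assumes that every header line in the file starts with '>' and that sequence lines may be wrapped across several lines. It reads each line as opaque text, so commas in a header are never treated as delimiters.

```matlab
function S = readFastaSketch(filename)
% readFastaSketch  Minimal FASTA reader that treats each line as plain text.
% Hypothetical helper for illustration only; not the attached readFasta.m.
% Returns an n-by-1 struct array with fields Header and Sequence.

    % Read the whole file as raw text and split it into lines (\r\n or \n).
    txt = fileread(filename);
    lines = strtrim(regexp(txt, '\r?\n', 'split'));
    lines = lines(~cellfun(@isempty, lines));   % drop blank lines

    % Assumption: header lines start with '>'; all other lines are sequence data.
    headerIdx = find(strncmp(lines, '>', 1));
    n = numel(headerIdx);

    S = repmat(struct('Header', '', 'Sequence', ''), n, 1);
    for k = 1:n
        first = headerIdx(k) + 1;
        if k < n
            last = headerIdx(k + 1) - 1;
        else
            last = numel(lines);
        end
        S(k).Header = lines{headerIdx(k)}(2:end);     % strip the leading '>'
        S(k).Sequence = char([lines{first:last}]);    % join wrapped sequence lines
    end
end
```

Save it as readFastaSketch.m and call it like fastaread, for example S = readFastaSketch('Ecoli_31May18_4313_Top_8.csv'). If the file itself already has quotes around the affected headers, those lines will not start with '>' and would need to be cleaned up first.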
3 Comments
Moshe Tsvi Rupp
on 3 Jun 2018
OCDER
on 4 Jun 2018
Hi Moshe, I'm actually just a volunteer, a regular MATLAB user just like you. You can ask the MATLAB development team to fix this issue and give them the link to this Q&A thread so they know what to fix. Here's the website for reporting suggestions and bugs:
https://www.mathworks.com/support/bugreports/ => Report a bug => Technical Support: Installation, product help, bugs, suggestions or documentation errors
Moshe Tsvi Rupp
on 4 Jun 2018