How does READTABLE deal with strings?

With the following tab-delimited files, I notice different behaviour of "readtable" in different MATLAB releases:
% File 1: readtable_test1.txt
id another_string age
100 10
101 grr" 1
102 "grr2" 2
% File 2: readtable_test2.txt
id another_string age
100 10
101 grr" 1
102 "grr2" 2
103 "grr3 3
% File 3: readtable_test3.txt
id another_string age
100 10
101 grr" 1
102 "grr2" 2
103 "grr3 3
104 grr"4 4
When running:
readtable('readtable_test1.txt','Delimiter','\t')
R2014a and R2014b both produce the same output.
With:
readtable('readtable_test2.txt','Delimiter','\t')
R2014a reads the table without issues, treating the double quotes (") as part of the field. R2014b most likely treats a " as a quote delimiter and produces an error because the quote is never closed.
And for:
readtable('readtable_test3.txt','Delimiter','\t')
R2014a and R2014b both run successfully but produce different output. R2014a treats all double quotes as part of the string fields defined by the tab delimiters and reads the table without trouble. R2014b merges multiple table rows into a single table entry, despite the tab delimiters.
This behavior appears to be inconsistent and I would like to understand why.

Accepted Answer

MathWorks Support Team on 14 Oct 2015
Previous releases had a number of inconsistencies in the behavior of %q. As of R2014b, %q works exactly as documented.
According to the documentation, READTABLE uses TEXTSCAN to import from text files and reads text fields with the format specifier '%q'.
Reading with %q has the following behavior:
If the first character in the string is a double-quote character, TEXTSCAN includes all the characters (including delimiters and end-of-line) until it reaches a second, lone double-quote character. Then it reads all additional characters until it finds another delimiter or end-of-line.
In the output string, the first and second double-quotes are removed; any paired double-quotes ( "" ) are collapsed into a lone double-quote character.
If the string does not start with a double-quote character, %q acts like %s and reads all characters until reaching the next delimiter or end-of-line.
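The rules above can be tried directly in TEXTSCAN. The following is a minimal sketch, not taken from the original files: the sample text and the variable names str, q, and s are made up for illustration. It reads the same text once with %q and once with %s so the two results can be compared.

```matlab
% Assumed sample text: a quoted field with an embedded tab, then a plain field.
str = sprintf('"a\tb"\tc');

% %q: per the documented rules, the quoted field should be read as a single
% field (tab included) and the enclosing double quotes removed.
q = textscan(str, '%q %q', 'Delimiter', '\t');

% %s: double quotes are ordinary characters, so the tab inside the quotes
% acts as a delimiter and the text splits into three fields.
s = textscan(str, '%s %s %s', 'Delimiter', '\t');

disp(q{1})   % first field as read by %q
disp(s{1})   % first field as read by %s
```

If the release follows the documented %q behavior, q holds two fields and s holds three, with the quote characters still present in the %s output.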
Consider the third file:
In R2014b, grr" is read as a string that contains a double quote, "grr2" is read as a quoted string, and "grr3 is read as the beginning of a quoted string that does not end until the double quote in grr"4 on the next line.
This is by design: the purpose of quoted strings is to enclose string data that may include any character, even delimiters and end-of-line characters. Without this behavior, there would be no way to read quoted strings that contain them.
To work around this behavior in READTABLE, specify the 'Format' parameter and read the string fields with '%s'. Note that '%s' does not strip quotes from the output strings.
The correct format for the above example is '%f %s %f', as in:
readtable('readtable_test3.txt','Delimiter','\t', 'Format', '%f %s %f')
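Because '%s' leaves the quotes in place, one possible follow-up step is to strip them after import. The sketch below is only an illustration under stated assumptions: the file name readtable_demo.txt is hypothetical, its contents mirror the third test file without the row that has an empty string field, and the REGEXPREP pattern is one choice among several (it removes quotes only when a field is fully enclosed in a pair).

```matlab
% Hypothetical demo file, tab-delimited, modeled on readtable_test3.txt.
fname = 'readtable_demo.txt';
fid = fopen(fname, 'w');
fprintf(fid, 'id\tanother_string\tage\n');
fprintf(fid, '101\tgrr"\t1\n');
fprintf(fid, '102\t"grr2"\t2\n');
fprintf(fid, '103\t"grr3\t3\n');
fprintf(fid, '104\tgrr"4\t4\n');
fclose(fid);

% Read the middle column with %s so double quotes are kept as plain characters.
T = readtable(fname, 'Delimiter', '\t', 'Format', '%f %s %f');

% Remove the quotes only from fields that both start and end with a
% double quote; unbalanced quotes are left untouched.
T.another_string = regexprep(T.another_string, '^"(.*)"$', '$1');

disp(T)
delete(fname);   % clean up the demo file
```

With this pattern, "grr2" becomes grr2, while grr", "grr3, and grr"4 keep their lone quote. If a different rule is wanted (for example, removing every double quote), STRREP(T.another_string, '"', '') is an alternative.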

Release

R2014b
