
textscan - Read formatted data from text file or string

Syntax

C = textscan(fid, 'format')
C = textscan(fid, 'format', N)
C = textscan(fid, 'format', param, value, ...)
C = textscan(fid, 'format', N, param, value, ...)
C = textscan(str, ...)
[C, position] = textscan(...)

Description

C = textscan(fid, 'format') reads data from an open text file identified by file identifier fid into cell array C. The format input is a string of conversion specifiers enclosed in single quotation marks. The number of specifiers determines the number of cells in the cell array C.

C = textscan(fid, 'format', N) reads data from the file, using the format N times, where N is a positive integer. You can read additional data from the file after N cycles by calling textscan again using the original fid.
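The following is a minimal sketch of reading a file in two calls with the N argument. The temporary file and its contents are made up for illustration and are not part of the original examples.

```matlab
% Create a small temporary file (hypothetical data for this sketch).
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, '1 2\n3 4\n5 6\n');
fclose(fid);

fid = fopen(fname, 'r');
first = textscan(fid, '%f %f', 2);   % apply the format twice (rows 1 and 2)
rest  = textscan(fid, '%f %f');      % same fid: continue after the first read
fclose(fid);
delete(fname);
```

Here first should hold the first two rows and rest the remaining row, because the second call continues from the file position left by the first.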

C = textscan(fid, 'format', param, value, ...) reads data from the file using nondefault parameter settings specified by one or more pairs of param and value arguments. For a list of all valid parameter strings, value descriptions, and defaults, see User Configurable Options.

C = textscan(fid, 'format', N, param, value, ...) reads data from the file, using the format N times, and using nondefault parameter settings specified by pairs of param and value arguments.

C = textscan(str, ...) reads data from string str. You can use the format, N, and parameter/value arguments described above with this syntax. However, for strings, repeated calls to textscan restart the scan from the beginning each time. (See Example 10 — Resuming a Text Scan of a String.)

[C, position] = textscan(...) returns the file or string position at the end of the scan as the second output argument. For a file, this is the value that ftell(fid) would return after calling textscan. For a string, position indicates how many characters textscan read.
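As a rough illustration of the two-output syntax with a string, the sketch below reads two numbers and then uses position to scan the rest. The variable names and the sample string are assumptions made for this example.

```matlab
str = '10 20 30 40';

% Read only the first two numbers and record where the scan stopped.
[C, pos] = textscan(str, '%f', 2);

% Scan the unread remainder of the string, starting after position pos.
rest = textscan(str(pos+1:end), '%f');
```

Because leading white space is ignored, the second call should return the remaining values whether or not the separating space was already consumed.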

Remarks

When textscan reads a specified file or string, it attempts to match the data to the format string. If textscan fails to convert a data field, it stops reading and returns all fields read before the failure.
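A short sketch of this behavior, using a made-up string in which the third field cannot be converted to a number:

```matlab
str = '1 2 abc 4';

% 'abc' does not match %f, so the scan stops there.
C = textscan(str, '%f');

% C{1} holds only the values converted before the failure.
nRead = numel(C{1});
```

With the default ReturnOnError setting, no error is raised; the trailing 4 is simply not read.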

Basic Conversion Specifiers

The format input is a string of one or more conversion specifiers. The following table lists the basic specifiers.

Field Type                 Specifier   Details
Integer, signed            %d          32-bit
                           %d8         8-bit
                           %d16        16-bit
                           %d32        32-bit
                           %d64        64-bit
Integer, unsigned          %u          32-bit
                           %u8         8-bit
                           %u16        16-bit
                           %u32        32-bit
                           %u64        64-bit
Floating-point number      %f          64-bit (double)
                           %f32        32-bit (single)
                           %f64        64-bit (double)
                           %n          64-bit (double)
Character strings          %s          String
                           %q          String, possibly double-quoted
                           %c          Any single character, including a delimiter
Pattern-matching strings   %[...]      Read only characters in the brackets, until the
                                       first nonmatching character. Use %[]...] to
                                       include ].
                                       Example: %[mus] reads 'summer ' as 'summ'.
                           %[^...]     Read only characters not in the brackets, until
                                       the first matching character. Use %[^]...] to
                                       exclude ].
                                       Example: %[^xrg] reads 'summer ' as 'summe'.

For each numeric conversion specifier, textscan returns a K-by-1 MATLAB numeric vector to the output cell array C, where K is the number of times that textscan finds a field matching the specifier. For each string conversion specifier, textscan returns a K-by-1 cell vector of strings. For each character conversion of the form %Nc (see Field Length), textscan returns a K-by-N character array.
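The output shapes described above can be checked with a small sketch like the following. The input strings are invented for illustration.

```matlab
% Two rows of mixed data: integer, string, floating-point number.
str = sprintf('10 abc 3.5\n20 def 4.5');
C = textscan(str, '%d %s %f');

intClass = class(C{1});   % numeric column from %d
strCol   = C{2};          % cell vector of strings from %s
dblClass = class(C{3});   % numeric column from %f

% A %Nc conversion returns a K-by-N character array.
chars = textscan('abcdef', '%3c');
```

Here K is 2 in both calls: the first format matches two rows, and %3c matches two groups of three characters.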

Field Length

You can specify the number of characters or digits to read by inserting a number between the percent character (%) and the format specifier. For floating-point numbers (%n, %f, %f32, %f64), you also can specify the number of digits read to the right of the decimal point.

Specifier      Action Taken
%Nc            Read N characters, including delimiter characters.
               Example: %9c reads 'Let's Go!' as 'Let's Go!'.
%Ns            Read N characters or digits (counting a decimal point as a
%Nq            digit), or up to the first delimiter, whichever comes first.
%N[...]        Example: %5f32 reads '473.238' as 473.2.
%N[^...]
%Nn
%Nd...
%Nu...
%Nf...
%N.Dn          Read N digits (counting a decimal point as a digit), or up to
%N.Df...       the first delimiter, whichever comes first. Return D decimal
               digits in the output.
               Example: %7.2f reads '473.238' as 473.23.
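A brief sketch of two of the field-length specifiers, using the sample values from the table. The third argument, 1, is an addition for this sketch: it limits the scan to one pass so that the unread characters are not converted in a second cycle.

```matlab
% Read at most five characters of the number, as a single.
C = textscan('473.238', '%5f32', 1);
value = C{1};

% Read exactly nine characters, including the space.
% The doubled quote ('') is how a quotation mark is written inside a MATLAB string.
T = textscan('Let''s Go!', '%9c');
text9 = T{1};
```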

Skipping Fields or Parts of Fields

The textscan function reads all characters in your file in sequence unless you tell it to ignore a particular field or a portion of a field.

Use the following format specifiers to skip or read portions of fields:

Specifier      Action Taken
%*...          Skip the field. textscan does not create an output cell for
               any field that it skips.
               Example: '%s %*s %s %s %*s %*s %s' (spaces are optional)
               converts the string 'Blackbird singing in the dead of night'
               to four output cells with the strings
               'Blackbird' 'in' 'the' 'night'
%*n...         Ignore n characters of the field, where n is an integer less
               than or equal to the number of characters in the field.
               Example: %*4s reads 'summer ' as 'er'.
literal        Ignore the specified characters of the field.
               Example: Level%u8 reads 'Level1' as 1.
               Example: %u8Step reads '2Step' as 2.
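The skipping rules can be tried directly on strings. This sketch reuses the sample data from the table; the variable names are chosen for illustration only.

```matlab
lyric = 'Blackbird singing in the dead of night';

% Skip the 2nd, 5th, and 6th words; keep the other four.
C = textscan(lyric, '%s %*s %s %s %*s %*s %s');
kept = [C{:}];    % 1-by-4 cell array: one string per output cell

% Drop the literal text 'Level' and keep the number as uint8.
L = textscan('Level1 Level2', 'Level%u8');
levels = L{1};
```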

The textscan function does not include leading white-space characters in the processing of any data fields. When processing numeric data, textscan also ignores trailing white space.

User Configurable Options

This table shows the valid param-value options and their default values. Parameter names are not case sensitive.

Parameter             Value                                              Default
BufSize               Maximum string length in bytes                     4095
CollectOutput         If true, textscan concatenates consecutive         0 (false)
                      output cells with the same data type into a
                      single array.
CommentStyle          Symbol(s) designating text to ignore.              None
                      Specify a single string (such as '%') to
                      ignore characters following the string on the
                      same line. Specify a cell array of two strings
                      (such as {'/*', '*/'}) to ignore characters
                      between the strings.
Delimiter             Field delimiter character(s)                       White space
EmptyValue            Value to return for empty numeric fields in        NaN
                      delimited files
EndOfLine             End-of-line character                              Determined from the
                                                                         file: \n, \r, or \r\n
ExpChars              Exponent characters                                'eEdD'
HeaderLines           Number of lines to skip. (Includes the             0
                      remainder of the current line.)
MultipleDelimsAsOne   If true, textscan treats consecutive               0 (false)
                      delimiters as a single delimiter. Only valid
                      if you specify the delimiter option.
ReturnOnError         Determines behavior when textscan fails to         1 (true)
                      read or convert. If true, textscan terminates
                      without an error and returns all fields read.
                      If false, textscan terminates with an error
                      and does not return an output cell array.
TreatAsEmpty          String(s) in the data file to treat as an          None
                      empty value. Can be a single string or cell
                      array of strings. Only applies to numeric
                      fields.
Whitespace            White-space characters                             ' \b\t'
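As a small sketch of passing param/value pairs, the call below combines two of the options from the table. The input string is invented for this example; in sprintf, %% produces a literal percent sign.

```matlab
% Two rows of numbers, each followed by a trailing comment.
str = sprintf('1 2 %% first row\n3 4 %% second row');

% Ignore everything after '%' on each line, and merge the two
% double columns into one matrix.
C = textscan(str, '%f %f', 'CommentStyle', '%', 'CollectOutput', true);
data = C{1};
```

Without CollectOutput, the same call would return two separate 2-by-1 columns instead of one 2-by-2 array.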

Field and Row Delimiters

Within each row, the default field delimiter is white space. White space can be any combination of space (' '), backspace ('\b'), or tab ('\t') characters.

If you use the default (white space) field delimiter, textscan interprets repeated white-space characters as a single delimiter. If you specify a nondefault delimiter, textscan interprets repeated delimiter characters as separate delimiters, and returns an empty value to the output cell. (See Example 5 — Specifying Delimiter and Empty Value Conversion and Example 7 — Handling Repeated Delimiters.)
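The difference between the default and a nondefault delimiter can be sketched with short strings (made up for illustration):

```matlab
% Default white-space delimiter: repeated spaces count as one delimiter.
W = textscan('1   2    4', '%f %f %f');

% Comma delimiter: the two adjacent commas enclose an empty field,
% which is returned as the default EmptyValue, NaN.
C = textscan('1,2,,4', '%f %f %f %f', 'Delimiter', ',');

% Same data, but treating repeated commas as a single delimiter.
M = textscan('1,2,,4', '%f %f %f', 'Delimiter', ',', ...
             'MultipleDelimsAsOne', 1);
```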

Row delimiters are end-of-line (EOL) character sequences. The default end-of-line setting depends on the format of your file, and can include a newline character ('\n'), a carriage return ('\r'), or a combination of the two ('\r\n').

For more information, see Example 9 — Using Nondefault Control Characters.

Numeric Fields

textscan converts numeric fields to the specified output type according to MATLAB rules regarding overflow, truncation, and the use of NaN, Inf, and -Inf.

For example, MATLAB represents an integer NaN as zero. If textscan finds an empty field associated with an integer format specifier (such as %d or %u), it returns the empty value as zero and not NaN. (See Example 2 — Reading Different Types of Data and Example 5 — Specifying Delimiter and Empty Value Conversion.)
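A minimal sketch of the empty-field rule, using an invented comma-delimited string with an empty middle field:

```matlab
% Empty field read with a floating-point specifier: returned as NaN.
F = textscan('1,,3', '%f %f %f', 'Delimiter', ',');
emptyAsDouble = F{2};

% The same empty field read with an integer specifier: returned as 0,
% because integer types cannot represent NaN.
D = textscan('1,,3', '%f %d %f', 'Delimiter', ',');
emptyAsInt = D{2};
```

This matches the way MATLAB converts NaN to an integer type directly: int32(NaN) is 0.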

textscan imports any complex number as a whole into a complex numeric field, converting the real and imaginary parts to the specified numeric type. Valid forms for a complex number are as follows:

Form                   Example
±<real>±<imag>i|j      5.7-3.1i
±<imag>i|j             -7j

Do not include embedded white space in a complex number. textscan interprets embedded white space as a field delimiter.
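The two forms in the table can be read with a single %f specifier, as in this sketch. The input string combines the two sample values and is otherwise an assumption of this example.

```matlab
% Each white-space-separated token is one complex number.
C = textscan('5.7-3.1i -7j', '%f');
z = C{1};

% Real and imaginary parts of the converted values.
re = real(z);
im = imag(z);
```

Writing the first value as '5.7 - 3.1i' would not work the same way, since the embedded spaces would split it into separate fields.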

Examples

Example 1 — Reading a String

Read the following string, truncating each value to one decimal digit. The specifier %*1d tells textscan to skip the remaining digit:

str = '0.41 8.24 3.57 6.24 9.27';

C = textscan(str, '%3.1f %*1d');

textscan returns a 1-by-1 cell array C:

C{1} = [0.4; 8.2; 3.5; 6.2; 9.2]

Example 2 — Reading Different Types of Data

The text file scan1.dat contains data in the following form:

Sally  Level1  12.34  45  1.23e10  inf   NaN   Yes
Joe    Level2  23.54  60  9e19     -inf  0.001 No
Bill   Level3  34.90  12  2e5      10    100   No

Open the file, and read each column with the appropriate conversion specifier:

fid = fopen('scan1.dat');
C = textscan(fid, '%s %s %f32 %d8 %u %f %f %s');
fclose(fid);

textscan returns a 1-by-8 cell array C with the following cells:

C{1} = {'Sally'; 'Joe'; 'Bill'}          class cell
C{2} = {'Level1'; 'Level2'; 'Level3'}    class cell
C{3} = [12.34; 23.54; 34.9]              class single
C{4} = [45; 60; 12]                      class int8
C{5} = [4294967295; 4294967295; 200000]  class uint32
C{6} = [Inf; -Inf; 10]                   class double
C{7} = [NaN; 0.001; 100]                 class double
C{8} = {'Yes'; 'No'; 'No'}               class cell

The first two elements of C{5} are the maximum values for a 32-bit unsigned integer, or intmax('uint32').

Example 3 — Removing a Literal String

Remove the text 'Level' from each field in the second column of the data from Example 2:

fid = fopen('scan1.dat');
C = textscan(fid, '%s Level%u8 %f32 %d8 %u %f %f %s');
fclose(fid);

textscan returns a 1-by-8 cell array, C, with

C{2} = [1; 2; 3]                        class uint8

Example 4 — Reading Only the First Field

Read the first column of the file in Example 2 into a cell array, skipping the rest of the line:

fid = fopen('scan1.dat');
names = textscan(fid, '%s %*[^\n]');
fclose(fid);

textscan returns a 1-by-1 cell array names:

names{1} = {'Sally'; 'Joe'; 'Bill'}

Example 5 — Specifying Delimiter and Empty Value Conversion

The comma-delimited file data.csv contains

1,  2,  3,  4,   ,  6
7,  8,  9,   , 11, 12

Read the file, converting empty cells to -Inf:

fid = fopen('data.csv');
C = textscan(fid, '%f %f %f %f %u32 %f', 'delimiter', ',', ...
             'EmptyValue', -Inf);
fclose(fid);

textscan returns a 1-by-6 cell array C with the following cells:

C{1} = [1; 7]           class double
C{2} = [2; 8]           class double
C{3} = [3; 9]           class double
C{4} = [4; -Inf]        class double (empty converted to -Inf)
C{5} = [0; 11]          class uint32 (empty converted to 0)
C{6} = [6; 12]          class double

textscan converts the empty value in C{4}, associated with a floating-point format, to -Inf. Because MATLAB represents unsigned integer -Inf as 0, textscan converts the empty value in C{5} to 0 and not -Inf.

Example 6 — Using Custom Empty Value Strings and Comments

The comma-delimited file data2.csv contains the lines

abc, 2, NA, 3, 4
// Comment Here
def, na, 5, 6, 7

Designate the input that textscan should treat as comments or empty values:

fid = fopen('data2.csv');
C = textscan(fid, '%s %n %n %n %n', 'delimiter', ',', ...
             'treatAsEmpty', {'NA', 'na'}, ...
             'commentStyle', '//');
fclose(fid);

textscan returns a 1-by-5 cell array C with the following cells:

C{1} = {'abc'; 'def'}
C{2} = [2; NaN]
C{3} = [NaN; 5]
C{4} = [3; 6]
C{5} = [4; 7]

Example 7 — Handling Repeated Delimiters

The file data3.csv contains

1,2,3,,4
5,6,7,,8

To treat the repeated commas as a single delimiter, use the MultipleDelimsAsOne parameter, with a value of 1:

fid = fopen('data3.csv');
C = textscan(fid, '%f %f %f %f', 'delimiter', ',', ...
             'MultipleDelimsAsOne', 1);
fclose(fid);

textscan returns a 1-by-4 cell array C with the following cells:

C{1} = [1; 5]
C{2} = [2; 6]
C{3} = [3; 7]
C{4} = [4; 8]

Example 8 — Using the CollectOutput Switch

The file grades.txt contains

Student_ID  | Test1  | Test2  | Test3
   1           91.5     89.2     77.3
   2           88.0     67.8     91.0
   3           76.3     78.1     92.5
   4           96.4     81.2     84.6

The default value for the CollectOutput switch is 0 (false), and textscan returns each column of the numeric data in a separate array:

fid = fopen('grades.txt');

% read column headers
C_text = textscan(fid, '%s', 4, 'delimiter', '|');

% read numeric data
C_data0 = textscan(fid, '%d %f %f %f')

C_data0 = 
  [4x1 int32]    [4x1 double]    [4x1 double]    [4x1 double]

Set CollectOutput to 1 (true) to collect the consecutive columns of the same class (the test scores, which are all double) into a single array:

frewind(fid);

C_text = textscan(fid, '%s', 4, 'delimiter', '|');

C_data1 = textscan(fid, '%d %f %f %f', ...
                        'CollectOutput', 1)

C_data1 = 
    [4x1 int32]    [4x3 double]

fclose(fid);

Example 9 — Using Nondefault Control Characters

When you specify one of the following escape sequences for any parameter value, textscan converts that sequence to the corresponding control character:

\b     Backspace
\n     Newline
\r     Carriage return
\t     Tab
\\     Backslash (\)

If your data uses a different control character, use the sprintf function to explicitly convert the escape sequence in your call to textscan.

For example, the following string includes a form feed character, \f:

lyric = sprintf('Blackbird\fsinging\fin\fthe\fdead\fof\fnight');

To read the string using textscan, call the sprintf function to explicitly convert the form feed:

C = textscan(lyric, '%s', 'delimiter', sprintf('\f'));

textscan returns a 1-by-1 cell array C:

C{1} = 
    {'Blackbird'; 'singing'; 'in'; 'the'; 'dead'; 'of'; 'night'}

Example 10 — Resuming a Text Scan of a String

If you resume a text scan of a file by calling textscan with the same file identifier (fid), textscan automatically resumes reading at the point where it terminated the last read.

If your input is a string rather than a file, textscan reads from the beginning of the string each time. To resume a scan from any other position in the string, you must use the two-output argument syntax in your initial call to textscan. For example, given the string

lyric = 'Blackbird singing in the dead of night'

Read the first word of the string:

[firstword, pos] = textscan(lyric,'%9c', 1);

Resume the scan:

lastpart = textscan(lyric(pos+1:end), '%s');

See Also

load, type, importdata, uiimport, dlmread, xlsread, fscanf, fread

Importing Large ASCII Data Sets in the MATLAB Data Import and Export documentation

  


© 1984-2009 The MathWorks, Inc.