Usage: [data, result]= readtext(source, delimiter, comment, quotes, options)
Whatever text (file) you give it, readtext returns an array of the contents (or send me a bug report). Matlab's standard library cannot read lines of variable length or values of variable type; readtext can read any text file. Any string (or even a regexp) can serve as the delimiter; the default is a comma. Everything from a comment character to the end of the line, including the character itself, is ignored. Quote characters may also be given, in which case everything between them is treated as one item. Options control what is converted to numbers and how empty items are saved.
If you find any errors, please let me know: peder at axensten dot se.
source: the file to be read. May be a file path or just the file name. OR: The text itself, see 'textsource', below.
delimiter: (default: ',') any non-empty string. May be a regexp, but this is slow on large files.
comment: (default: '') zero or one character. Anything after (and including) this character, until the end of the line, will be ignored.
quotes: (default: '') zero, one (opening quote equals closing), or two characters (opening and closing quote) to be treated as paired braces. Everything between the quotes will be treated as one item. The quotes will remain. Quotes may be nested.
options: (default: '') may contain any of the following; concatenate them to combine options, as in 'numeric-empty2zero':
- 'textsource': source contains the actual text to be processed, not the file name.
- 'textual': no numeric conversion ('data' is a cell array of strings only).
- 'numeric': everything is converted to a number or NaN ('data' is a numeric array; empty items are converted to NaN unless 'empty2zero' is given).
- 'empty2zero': an empty field is saved as zero.
- 'empty2NaN': an empty field is saved as NaN.
- 'usewaitbar': call waitbar to report progress. If you find the wait bar annoying, get 'waitbar alternative' at http://www.mathworks.com/matlabcentral/fileexchange/loadFile.do?objectId=11398
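The options above can be combined in one string. The following is a minimal sketch, assuming readtext.m is on the Matlab path; the sample text and the variable names are made up for illustration, and the '-' separator simply follows EXAMPLE 1.

```matlab
% Sketch only: assumes readtext.m is on the Matlab path.
% Two lines of text; the second line has an empty middle field.
txt = sprintf('1,2,3\n4,,6');

% 'textsource' makes readtext parse txt itself instead of opening a file.
% 'numeric' requests a numeric array, 'empty2zero' stores empty fields as zero.
asNumbers = readtext(txt, ',', '', '', 'textsource-numeric-empty2zero');

% The same text with no numeric conversion: a cell array of strings.
asStrings = readtext(txt, ',', '', '', 'textsource-textual');

disp(class(asNumbers));
disp(class(asStrings));
```

Using 'textsource' here keeps the sketch independent of any file on disk.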
data: a cell array containing the read text, divided into cells by delimiter and line endings. 'data' will be empty if the file is not found, could not be opened, or is empty. With the option 'numeric', 'data' will be a numeric array; with 'textual', it will be a cell array of strings only; otherwise it will be a mixed cell array. For Matlab versions before 7, returned strings may contain leading white-space.
result: a structure:
.min: minimum number of columns found in a line.
.max: number of columns in 'data', before removing empty columns.
.rows: number of rows in 'data', before removing empty rows.
.numberMask: true where an item was converted to a number (the text 'NaN' converted to NaN counts).
.number: number of numeric conversions (the text 'NaN' converted to NaN counts).
.emptyMask: true where the item in the file was empty.
.empty: number of empty items in file.
.stringMask: true where the item is neither a number nor empty.
.string: number of non-number, non-empty items.
.quote: number of quotes.
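The fields of 'result' can be inspected after a call. This is a minimal sketch, assuming readtext.m is on the Matlab path; the sample text is made up for illustration and no particular counts are claimed for it.

```matlab
% Sketch only: assumes readtext.m is on the Matlab path.
% Three lines of text; the last line ends with an empty field.
txt = sprintf('name,value\nalpha,1\nbeta,');

% Second output is the summary structure described above.
[data, result] = readtext(txt, ',', '', '', 'textsource');

% Size information.
fprintf('rows: %d, columns: %d (shortest line: %d)\n', ...
        result.rows, result.max, result.min);

% How many items of each kind were found.
fprintf('numbers: %d, strings: %d, empty: %d\n', ...
        result.number, result.string, result.empty);
```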
EXAMPLE 1: [a,b]= readtext('txtfile', '[,\t]', '#', '"', 'numeric-empty2zero')
This will load the file 'txtfile' into variable a and the result structure into b, treating either a tab or a comma as a delimiter. Everything from and including # to the next newline will be ignored. Everything between two double quotes will be treated as one item. Everything will be converted to numbers and a numeric array returned. Non-numeric items will become NaNs and empty items will be converted to zero.
EXAMPLE 2: a= readtext('The, actual, text, to, process', ',', '', '', 'textsource')
This will process the actual text string, returning a cell array of strings holding the five words.
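Since 'data' is described as empty when the file is not found, cannot be opened, or is empty, a caller may want to check for that case. A minimal sketch, assuming readtext.m is on the Matlab path and that trailing arguments may be omitted because they have defaults; 'measurements.csv' is a hypothetical file name.

```matlab
% Sketch only: 'measurements.csv' is a hypothetical file name.
data = readtext('measurements.csv', ',');

% An empty result means the file was missing, unreadable, or empty.
if isempty(data)
    warning('readtext returned no data; check that the file exists and is not empty.');
end
```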
COPYRIGHT (C) Peder Axensten (peder at axensten dot se), 2006-2007.