Usage: [data, result]= readtext(source, delimiter, comment, quotes, options)
Whatever text (file) you give it, readtext returns an array of the contents (or send me a bug report). Matlab's standard library can't read lines of variable length or values of variable type; readtext can read any text file. Any string (or even a regexp) can be the delimiter; the default is a comma. Everything from a comment character to the end of the line, inclusive, is ignored. Quote characters may also be given; everything between them is treated as one item. There are options to control what is converted to numbers and how empty items are saved.
If you find any errors, please let me know: peder at axensten dot se.
source: the file to be read. May be a file path or just the file name. OR: The text itself, see 'textsource', below.
delimiter: (default: ',') any non-empty string. May be a regexp, but this is slow on large files.
comment: (default: '') zero or one character. Anything after (and including) this character, until the end of the line, will be ignored.
quotes: (default: '') zero, one (opening quote equals closing), or two characters (opening and closing quote) to be treated as paired braces. Everything between the quotes will be treated as one item. The quotes will remain. Quotes may be nested.
options: (default: '') may contain (concatenate combined options):
- 'textsource': source contains the actual text to be processed, not the file name.
- 'textual': no numeric conversion ('data' is a cell array of strings only).
- 'numeric': everything is converted to a number or NaN ('data' is a numeric array, empty items are converted to NaNs unless 'empty2zero' is given).
- 'empty2zero': an empty field is saved as zero.
- 'empty2NaN': an empty field is saved as NaN.
- 'usewaitbar': call waitbar to report progress. If you find the wait bar annoying, get 'waitbar alternative' at http://www.mathworks.com/matlabcentral/fileexchange/loadFile.do?objectId=11398
data: A cell array containing the read text, divided into cells by delimiter and line endings. 'data' will be empty if the file is not found, could not be opened, or is empty. With the option 'numeric', 'data' will be a numeric array; with 'textual', 'data' will be a cell array of strings only; otherwise it will be a mixed cell array. For Matlab < version 7, returned strings may contain leading white-space.
result: a structure:
.min: minimum number of columns found in a line.
.max: number of columns in 'data', before removing empty columns.
.rows: number of rows in 'data', before removing empty rows.
.numberMask: true, if numeric conversion ('NaN' converted to NaN counts).
.number: number of numeric conversions ('NaN' converted to NaN counts).
.emptyMask: true, if empty item in file.
.empty: number of empty items in file.
.stringMask: true, if non-number and non-empty.
.string: number of non-number, non-empty items.
.quote: number of quotes.
EXAMPLE 1: [a,b]= readtext('txtfile', '[,\t]', '#', '"', 'numeric-empty2zero')
This will load the file 'txtfile' into variable a, treating either tab or comma as a delimiter. Everything from and including # to the next newline will be ignored. Everything between two double quotes will be treated as a string. Everything will be converted to numbers and a numeric array returned. Non-numeric items will become NaNs and empty items will be converted to zero.
EXAMPLE 2: a= readtext('The, actual, text, to, process', ',', '', '', 'textsource')
This will process the actual text string, returning a cell string of the five words.
COPYRIGHT (C) Peder Axensten (peder at axensten dot se), 2006-2007.
This is beautiful. How can this not be standard in matlab. Thank you.
Excellent, thank you! XLSread failed on me with a very large csv file. readtext worked without problems.
I love this function. It never fails, never has a problem. It is incredibly obvious to use. Something like this should just be native to matlab.
Wow... Thanks. I tried to read some CSV data files with various Matlab functions like CSVREAD, DLMREAD, etc. Nothing worked. Your file did the magic. Thank you :-)
It works smoothly.
Could you please submit an accompanying "savetext.m"?
Excellent!!! It really helped me with my problem: reading the data from a Thorlabs PM100A detector, which has a one-line header.
Thanks for such a nice function!!!!
would be good if we could specify range
Thank you so much! I was really surprised that Matlab doesn't have a built-in function that is easy to use. Yours works great though, thanks for making it available.
Fantastic function, does exactly as it says for a very complex csv including mixed data types within rows and columns of unequal length.
hmmm... this file helped me...
excellent and necessary
Great contribution to mathworks community. It saved me a great deal of time.
This is really useful to me and I have used it quite a bit.
I noticed however that comment lines are not fully removed but instead leave a blank line (apart from the very first line which will be fully removed if it is a comment). My work-around is shown below (although it has the downside of removing ALL blank lines - whether commented or not).
Replace the following lines:
if(~isempty(opts.comment)) % Remove comments.
text= regexprep(text, ['^\' opts.comment '[^' eol ']*' eol], '');
text= regexprep(text, [ '\' opts.comment '[^' eol ']*'], '');
with these lines:
if(~isempty(opts.comment)) % Remove comments.
text= regexprep(text, ['\' opts.comment '[^' eol ']*'], ''); % Remove commented line endings (but in the case of whole-lines this leaves a blank line!).
text= regexprep(text, [eol '+'], eol); % Remove blank lines, part 1: remove multiple eol instances (2 in a row = a blank line)
text= regexprep(text, ['^' eol '+'], ''); % Remove blank lines, part 2: if there is an eol at the beginning then remove it.
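A minimal sketch of what the work-around above does to a small sample, run on its own outside readtext. The sample text and the local variables `comment` and `eol` are assumptions made for illustration (inside readtext the comment character is `opts.comment`); only the three regexprep calls are taken from the comment above.

```matlab
% Stand-ins for readtext's internal variables (assumed for illustration).
eol     = char(10);   % line feed
comment = '#';        % plays the role of opts.comment

% Sample input: a leading comment line, a trailing comment, a full-line comment.
text = sprintf('# header\na,b # trailing\n# full line\nc,d\n');

text = regexprep(text, ['\' comment '[^' eol ']*'], '');  % strip each comment up to the line end
text = regexprep(text, [eol '+'], eol);                   % collapse runs of line ends into one
text = regexprep(text, ['^' eol '+'], '');                % drop a line end left at the very start

disp(text)
```

As the comment warns, the second call removes every blank line, including ones that were blank in the original file. White-space in front of a trailing comment (the space after `a,b` here) is left in place.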
I get this error message when I use: [a,b]= readtext('txtfile', '[,\t]', '#', '"', 'numeric-empty2zero')
??? Undefined function or method 'readtext' for input arguments of type 'char'.
what should I do?
This was a fast solution to problems that I was having with mixed data types (strings and Numeric) in a cell array. Thanks
After spending all day on the forums trying to figure out how to read a text file into a cell array and then convert it into a matrix, I found this which did everything I needed it to in one go - Excellent piece of work, thank you very much!
Excellent code, but very slow when reading large data files.
Excellent solution for when xlsread cannot be called (no Microsoft Excel around) and csvread cannot differentiate between delimiting commas and commas in double quoted text.
This is the kind of feature that Matlab should be expected to provide right away. Thanks! Mathworks should pay you and put this in the next release! Valuable for students who wish to work/learn with Matlab.
This has been a great help! I was wondering if there is a way to identify a number of headerlines to skip?
I was struggling with a .csv file with mixed text/numbers as well as a varying number of delimiters per line - and this worked perfectly - thanks!
Very useful to get numbers which are always at the same place. You obtain first the matrix with all numeric data of the file and then you take your value in the matrix!
Thank you, this works fine!
But the number of space characters (as delimiter) differs in my file, so I have to use the regexp option, which slows the process :-(. So it would be nice to have an option to merge delimiters or empty2nothing.
worked like a charm
Good - could do with a 'headerlines' option though
I spent a good while trying to find something like this. It's crazy how many of them just don't seem to work.
I tried readcsv, which does not work, while yours works right out of the box, and the example works great on the first try. I appreciate your work.
90% of my programming time has been consumed by the struggle with text strings in Matlab. Data import formats are never consistent. Even if they claim to be "comma-delimited", dates and other elements may not be well-formed. This function has solved my problem, and I can dispense with my bizarre collection of fixes. Many thanks.
Great routine...thanks for saving me so much time. Mathworks: include this in next release please!
Thank you so much. I agree that this should be standard in the matlab function library, it is in mine. Cheers!
Great routine, saved me lot of time reading data files containing both numeric and text data
Gem of a routine, mate. I wish I had searched the Matlab File Exchange a year ago. Fixing data read with dlmread etc. was a nightmare. I recommend that the routine be added to the standard Matlab library.
Very nice. You should comment that you can add ['Directory of Choice' '\' 'File Name'] and that will work. The point being you don't have to be in the directory to read the file (i.e. the program is not stupid).
Hi: Wonderful routine. I was processing a file in Windows using a txt file from a Mac. So I had this problem with the line delimiter. Mac uses CR (13) for line breaks, Windows uses CRLF (13 10). Instead of replacing the CR with CRLF when transferred to Windows, the txt had CRCRLF (I heard ftp programs replace CR with CRLF for file transfers between Mac and Windows; perhaps it's a problem with Microsoft Outlook). So I added an extra "text = strrep(text, [char(13) char(10)], eol);" in line 155.
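A hedged sketch of one way to normalize mixed line endings before parsing, for readers hitting the same CRCRLF problem. This is not readtext's own code and does not reproduce its line 155; the sample string is made up, and the two-step approach (regexprep followed by strrep) is an assumption about what a general fix could look like.

```matlab
eol = char(10);   % use a single line feed as the canonical line end

% Sample text mixing CRCRLF, CRLF and bare CR line endings (illustrative only).
text = ['a,b' char([13 13 10]) 'c,d' char([13 10]) 'e,f' char(13) 'g,h'];

text = regexprep(text, '\r+\n', eol);   % CRLF and CRCRLF become one line end
text = strrep(text, char(13), eol);     % any remaining bare CR becomes a line end

disp(double(text))   % character codes; no 13 should remain
```

The resulting string could then be handed to readtext with the 'textsource' option described above.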
Hey, you may want to consider revising line 206 to be:
waitbar(0, th, sprintf('Reading ''%s''...', regexprep(fname,'\_','\\_')));
Or maybe handle it some other way (with some other temporary variable, with filter supplied). latex(...) would work, but is in the Symbolic Math Toolbox. Also, you might be able to set(th,...) something to make it not display LaTeX.
The idea is to properly handle filenames with underscores due to complications with the LaTeX formatting. (_ is not displayed, but instead subscripts the subsequent character.)
Love the script though, thanks.
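To illustrate the underscore escaping suggested above, here is a small sketch that builds the message string without opening a wait bar. It uses strrep rather than the regexprep call from the comment, which is a substitution made here for simplicity; the file name is hypothetical.

```matlab
fname = 'my_data_file.txt';   % hypothetical file name containing underscores

% Put a backslash in front of every underscore so that a TeX interpreter
% shows it literally instead of subscripting the next character.
escaped = strrep(fname, '_', '\_');

msg = sprintf('Reading ''%s''...', escaped);
disp(msg)
```

The string in `msg` is what would be passed as the message argument of waitbar.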
Thank you so much, it worked well for reading the ASCII file I have... you saved me a lot of time. Thanks!
Worked for my problem. I had a *.csv file with 1100 columns. With 'readtext' I could convert it into a cell. Thanks.
- Now handles a string directly.
- Better removal of comments. Could leave an empty first row before.
- No more (La)TeX formatting of file names.
Close waitbar instead of deleting it, and some other minor waitbar compatibility fixes.
Better error report when file open fails. Somewhat quicker. Recommends 'waitbar alternative'. Ok with Matlab orig. waitbar too, of course.
Now works in Matlab 6.5.1 (R13, SP1) (maybe 6.5 too), versions <6.5 will NOT work.
- Made 'options' case independent.
Updated the help text. No code changes.
Inspired by: loadcell.m