File Exchange

readtext

version 1.0 (7.18 KB) by

Any text (file) you give it, readtext returns an array of the contents. You can choose the delimiter etc.

4.80392
51 Ratings

25 Downloads

Editor's Note: This file was selected as MATLAB Central Pick of the Week

Usage: [data, result]= readtext(source, delimiter, comment, quotes, options)
 
Whatever text (file) you give it, readtext returns an array of the contents (or send me a bug report). Matlab can't read variable length lines or variable type values with the standard library. readtext can read any text file. Any string (or even a regexp) can be the delimiter; the default is a comma. Everything after (and including) a comment character, until the line end, is ignored. Quote characters may also be given; everything between them is treated as one item. There are options to control what will be converted to numbers and how empty items are saved.
 
If you find any errors, please let me know: peder at axensten dot se.
 
source: the file to be read. May be a file path or just the file name. OR: The text itself, see 'textsource', below.
 
delimiter: (default: ',') any non-empty string. May be a regexp, but this is slow on large files.
 
comment: (default: '') zero or one character. Anything after (and including) this character, until the end of the line, will be ignored.
 
quotes: (default: '') zero, one (opening quote equals closing), or two characters (opening and closing quote) to be treated as paired braces. Everything between the quotes will be treated as one item. The quotes will remain. Quotes may be nested.
 
options: (default: '') may contain (concatenate combined options):
- 'textsource': source contains the actual text to be processed, not the file name.
- 'textual': no numeric conversion ('data' is a cell array of strings only),
- 'numeric': everything is converted to a number or NaN ('data' is a numeric array, empty items are converted to NaNs unless 'empty2zero' is given),
- 'empty2zero': an empty field is saved as zero, and
- 'empty2NaN': an empty field is saved as NaN.
- 'usewaitbar': call waitbar to report progress. If you find the wait bar annoying, get 'waitbar alternative' at http://www.mathworks.com/matlabcentral/fileexchange/loadFile.do?objectId=11398
 
data: A cell array containing the read text, divided into cells by delimiter and line endings. 'data' will be empty if the file is not found, could not be opened, or is empty. With the option 'numeric', 'data' will be a numeric array, with 'textual', 'data' will be a cell array of strings only, and otherwise it will be a mixed cell array. For Matlab < version 7, returned strings may contain leading white-space.
 
result: a structure:
.min: minimum number of columns found in a line.
.max: number of columns in 'data', before removing empty columns.
.rows: number of rows in 'data', before removing empty rows.
.numberMask: true, if numeric conversion ('NaN' converted to NaN counts).
.number: number of numeric conversions ('NaN' converted to NaN counts).
.emptyMask: true, if empty item in file.
.empty: number of empty items in file.
.stringMask: true, if non-number and non-empty.
.string: number of non-number, non-empty items.
.quote: number of quotes.
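
To make the mask and count fields concrete, here is a minimal sketch that derives the same kind of masks from a small cell array using only built-in functions. The array `data` is a hypothetical stand-in for a mixed result, and it assumes empty items are stored as empty strings, which the description above does not state. This is an illustration of what the fields mean, not readtext's own code.

```matlab
% Hypothetical mixed cell array, standing in for readtext output.
data = {1, 'abc', ''; 2.5, '', 'x'};

emptyMask  = cellfun('isempty', data);                % true where the item is empty
numberMask = cellfun(@isnumeric, data) & ~emptyMask;  % true where the item is a number
stringMask = cellfun(@ischar, data) & ~emptyMask;     % true where non-number and non-empty

% Counts corresponding to the .number, .empty and .string fields.
counts.number = sum(numberMask(:));
counts.empty  = sum(emptyMask(:));
counts.string = sum(stringMask(:));
disp(counts)
```

Every item falls into exactly one of the three masks, so the three counts add up to the number of cells.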
 
EXAMPLE 1: [a,b]= readtext('txtfile', '[,\t]', '#', '"', 'numeric-empty2zero')
This will load the file 'txtfile' into variable a, treating any of tab or comma as delimiters. Everything from and including # to the next newline will be ignored. Everything between two double quotes will be treated as a string. Everything will be converted to numbers and a numeric array returned. Non-numeric items will become NaNs and empty items are converted to zero.
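
As a rough illustration of what the regexp delimiter and the 'numeric-empty2zero' options in Example 1 mean for a single line, the sketch below reproduces the described behaviour with built-in functions only. The sample string `row` is made up, and the code is an approximation of the documented semantics rather than readtext's implementation.

```matlab
% One hypothetical line: fields are '1.5', 'abc', '' (empty) and '42'.
row = sprintf('1.5,abc\t,42');

% Split on either a comma or a tab, like the delimiter '[,\t]'.
items = regexp(row, '[,\t]', 'split');

% 'numeric': convert everything to numbers; non-numeric items become NaN.
values = str2double(items);

% 'empty2zero': empty fields are stored as zero instead of NaN.
values(cellfun('isempty', items)) = 0;
disp(values)
```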
 
EXAMPLE 2: a= readtext('The, actual, text, to, process', ',', '', '', 'textsource')
This will process the actual text string, returning a cell string of the five words.
 
COPYRIGHT (C) Peder Axensten (peder at axensten dot se), 2006-2007.

Comments and Ratings (54)

maximus

This is beautiful. How can this not be standard in matlab. Thank you.

Harm

Excellent, thank you! XLSread failed on me with a very large csv file. readtext worked without problems.

Nathan Orloff

I love this function. It never fails, never has a problem. It is incredibly obvious to use. Something like this should just be native to matlab.

John Baldauf

Srinivas

Wow... Thanks. I tried to read some CSV file data with various MATLAB functions like CSVREAD, DLMREAD, etc. Nothing worked. Your file did the magic. Thank you :-)

Julius

It works smoothly.

Ray Lee

could u plz submit an accompanying "savetext.m"?

Excellent!!! It really helped me with my problem: reading the data from a Thorlabs PM100A detector, which has a one-line header.

Thanks for such a nice function!!!!

Luis

SpartanG72

works well
would be good if we could specify range

SpartanG72

Nathan

Thank you so much! I was really surprised that Matlab doesn't have a built-in function that is easy to use. Yours works great though, thanks for making it available.

Tim

Fantastic function, does exactly as it says for a very complex csv including mixed data types within rows and columns of unequal length.

hmmm... this file helped me...

Rashbeard

excellent and necessary

Eoin

Martina Callaghan

Georges

Great contribution to mathworks community. It saved me a great deal of time.

Roger Parkyn

This is really useful to me and I have used it quite a bit.

I noticed however that comment lines are not fully removed but instead leave a blank line (apart from the very first line which will be fully removed if it is a comment). My work-around is shown below (although it has the downside of removing ALL blank lines - whether commented or not).

Replace the following lines:
if(~isempty(opts.comment)) % Remove comments.
  text= regexprep(text, ['^\' opts.comment '[^' eol ']*' eol], '');
  text= regexprep(text, [ '\' opts.comment '[^' eol ']*'], '');
end

with these lines:
if(~isempty(opts.comment)) % Remove comments.
  text= regexprep(text, ['\' opts.comment '[^' eol ']*'], ''); % Remove commented line endings (but in the case of whole-lines this leaves a blank line!).
  text= regexprep(text, [eol '+'], eol); % Remove blank lines, part 1: remove multiple eol instances (2 in a row = a blank line)
  text= regexprep(text, ['^' eol '+'], ''); % Remove blank lines, part 2: if there is an eol at the beginning then remove it.
end
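
The effect of the replacement lines above can be checked in isolation. The sketch below runs the same three regexprep calls on a small made-up string, with `eol` assumed to be a line feed and `opts.comment` assumed to be '#'. Both values are set here only for the demonstration.

```matlab
eol = char(10);        % assumed line ending (LF)
opts.comment = '#';    % assumed comment character

% Hypothetical input: a comment line first, one in the middle, one trailing.
text = sprintf('# header\na,b\n# whole-line comment\nc,d # trailing\n');

if(~isempty(opts.comment))
  text= regexprep(text, ['\' opts.comment '[^' eol ']*'], ''); % Strip comments up to the line end.
  text= regexprep(text, [eol '+'], eol);                       % Collapse runs of line endings.
  text= regexprep(text, ['^' eol '+'], '');                    % Drop a leading line ending.
end

disp(text)
```

Only the two data lines remain. The space before the trailing comment is kept, and any blank lines that were intentional in the input are removed as well, as the comment above notes.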

noga cohen

Hi,
I get this error message when I use: [a,b]= readtext('txtfile', '[,\t]', '#', '"', 'numeric-empty2zero')

??? Undefined function or method 'readtext' for input arguments of type 'char'.

what should I do?
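
This error usually means that MATLAB cannot find readtext.m on its search path. A minimal check, assuming the file was downloaded to some folder of your own (the folder name in the comment is purely hypothetical):

```matlab
% exist returns 2 when a file with this name is found on the search path.
onPath = (exist('readtext', 'file') == 2);

if onPath
    fprintf('readtext found at: %s\n', which('readtext'));
else
    fprintf('readtext is not on the MATLAB path.\n');
    % Add the folder that contains readtext.m, for example:
    % addpath('C:\hypothetical\folder\with\readtext');
end
```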

Glenn

This was a fast solution to problems that I was having with mixed data types (strings and Numeric) in a cell array. Thanks

Hastiepen

After spending all day on the forums trying to figure out how to read a text file into a cell array and then convert it into a matrix, I found this which did everything I needed it to in one go - Excellent piece of work, thank you very much!

Debela

excellent code but very slow on reading large data sets

Excellent solution for when xlsread cannot be called (no Microsoft Excel around) and csvread cannot differentiate between delimiting commas and commas in double quoted text.

m4lte k

Adam Pilchak

amazing function.

djr djr

Jacques Labrie

This is the kind of feature that matlab should be able to do right away. Thanks! Mathworks should pay you and put this in the next release! Valuable for students who wish to work/learn with matlab.

Cathy Ruhl

This has been a great help! I was wondering if there is a way to identify a number of headerlines to skip?
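
The documented options do not include a header-line count, so one possible workaround is to strip the header lines from the text first and pass the remainder with the 'textsource' option. The sketch below assumes LF line endings and uses made-up sample text; for a real file the text could be loaded with fileread. The readtext call is guarded so the snippet also runs where readtext is not installed.

```matlab
nHeader = 2;   % number of header lines to skip (assumption for this example)

% Hypothetical file contents; for a real file use: text = fileread('myfile.txt');
text = sprintf('Device: demo\nUnits: mW\n1,2,3\n4,5,6\n');

% Keep everything after the nHeader-th line feed.
eolPos = find(text == char(10));
body = text(eolPos(nHeader)+1:end);

% Pass the remaining text itself, not a file name.
if exist('readtext', 'file') == 2
    data = readtext(body, ',', '', '', 'textsource');
end
```

An alternative is to read the whole file and discard the first rows of the returned array afterwards.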

Frank Maenka

I was struggling with a .csv file with mixed text/numbers as well as a varying number of delimiters per line - and this worked perfectly - thanks!

Gregoire Margoton

Very useful to get numbers which are always at the same place. You obtain first the matrix with all numeric data of the file and then you take your value in the matrix!
Thank you

C Schwalm

Very slick.

john maclane

yippikaeyeah m***f***ckr!!!

Tom Rdn

Thank you, this works fine!
But the # of space characters (as delimiter) differs in my file, so I have to use the regexp option, which slows the process :-(. So it would be nice to have an option to merge delimiters or empty2nothing.
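
One way to avoid the regexp delimiter in this situation might be to collapse the runs of spaces once, up front, and then call readtext with a plain single-space delimiter and the 'textsource' option. Whether this is actually faster on a given file is untested here. The sample text is made up, and the readtext call is guarded so the snippet runs without it.

```matlab
% Hypothetical text with a varying number of spaces between values.
text = sprintf('1   2 3\n4 5    6\n');

% Merge each run of spaces into a single space.
collapsed = regexprep(text, ' +', ' ');

% A plain ' ' delimiter is now enough.
if exist('readtext', 'file') == 2
    data = readtext(collapsed, ' ', '', '', 'textsource');
end
```

Leading spaces at the start of a line would still produce an empty first item, so they may need to be trimmed separately.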

Ashish Bhan

worked like a charm

a a

Good - could do with a 'headerlines' option though

Greg Mefford

I spent a good while trying to find something like this. It's crazy how many of them just don't seem to work.

P WANG

Tried readcsv, which does not work; yours works right out of the box, and the example works great on the first try. Appreciate your work.

Percy Wang

Bill Hutchison

90% of my programming time has been consumed by the struggle with text strings in Matlab. Data import formats are never consistent. Even if they claim to be "comma-delimited", dates and other elements may not be well-formed. This function has solved my problem, and I can dispense with my bizarre collection of fixes. Many thanks.

thinh ducthinh

Nicholas Wolff

Great routine...thanks for saving me so much time. Mathworks: include this in next release please!

Michael Rubens

Thank you so much. I agree that this should be standard in the matlab function library, it is in mine. Cheers!

Kamal Mannar

Great routine, saved me lot of time reading data files containing both numeric and text data

Kam S

Gem of a routine mate. I wish I had searched matlab file exchange a year ago. Fixing data read with dlmread etc. was a nightmare. I recommend the routine to be added to standard matlab library

Nathan Orloff

Very Nice. You should comment that you can pass ['Directory of Choice' '\' 'File Name'] and that will work. The point being you don't have to be in the directory to read the file (i.e. the program is not stupid).
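
A slightly more portable way to build such a path is the built-in fullfile, which inserts the correct separator for the current platform. The folder and file names below are hypothetical.

```matlab
folder = 'data';              % hypothetical directory
fname  = 'measurements.csv';  % hypothetical file name

% fullfile joins the parts with the platform's file separator.
source = fullfile(folder, fname);
disp(source)

% The resulting path can then be passed as the first argument:
% [data, result] = readtext(source);
```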

Uggo Pinho

A. L.

Madhusudhanan Balasubramanian

Hi: Wonderful routine. I was processing a txt file from a Mac on Windows, so I had a problem with the line delimiter. Mac uses CR (13) for line breaks, Windows uses CRLF (13 10). Instead of CR being replaced with CRLF when the file was transferred to Windows, the txt had CRCRLF (I heard FTP programs replace CR with CRLF for file transfers between Mac and Windows; perhaps it's a problem with Microsoft Outlook). So I added an extra "text = strrep(text, [char(13) char(10)], eol);" in line 155.
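
A more general variant of this idea is to normalize every line-ending flavour to a single LF before parsing. The sketch below is not the commenter's fix and not part of readtext. It assumes that any run of CRs directly before an LF is a transfer artifact and should count as one line break.

```matlab
CR = char(13);
LF = char(10);

% Hypothetical text mixing CRCRLF, CRLF, a lone CR and a plain LF.
text = ['a,b' CR CR LF 'c,d' CR LF 'e,f' CR 'g,h' LF];

% One or more CRs followed by LF, or a lone CR, each become a single LF.
normalized = regexprep(text, '\r+\n|\r', LF);
```

After this step every line ends in exactly one LF, so a single end-of-line convention can be assumed downstream.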

Dan Pearce

T G

Hey, you may want to consider revising line 206 to be:

waitbar(0, th, sprintf('Reading ''%s''...', regexprep(fname,'\_','\\_')));

Or maybe handle it some other way (with some other temporary variable, with filter supplied). latex(...) would work, but is in the Symbolic Math Toolbox. Also, you might be able to set(th,...) something to make it not display latex.

The idea is to properly handle filenames with underscores due to complications with the LaTeX formatting. (_ is not displayed, but instead subscripts the subsequent character)

Love the script though, thanks.
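
The escaping suggested in this comment can also be written with strrep, which avoids regular-expression escaping altogether. The file name below is hypothetical, and the snippet only builds the message string; it does not open a waitbar. According to the update notes, a later version dropped (La)TeX formatting of file names, so this mainly applies to older copies.

```matlab
fname = 'my_data_file.txt';   % hypothetical file name with underscores

% Prefix each underscore with a backslash so it is shown literally
% instead of being treated as a subscript marker.
safeName = strrep(fname, '_', '\_');

msg = sprintf('Reading ''%s''...', safeName);
disp(msg)
```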

Mahmoud lotfy

Thank you so much, it did work well to read the ASCII file I have...you saved me a lot of time..thanks

Tiga Tiga

Worked for my problem. I had a *.csv file with 1100 columns. With 'readtext' I could convert it into a cell. Thanks,

Tiga

Updates

Version 1.8:
- Fixed a problem when the comment character was a regexp character, such as '*'.
- Fixed a problem when reading files with no data.

- Better removal of comments. Could leave an empty first row before.
- Added a 'usewaitbar' option.
- Now removes empty last columns and rows.

- No more (La)TeX formatting of file names.
- Prefixed waitbar messages with '(readtext)'.

Close waitbar instead of deleting it, and some other minor waitbar compatibility fixes.

Better error report when file open fails. Somewhat quicker. Recommends 'waitbar alternative'. Ok with Matlab orig. waitbar too, of course.

Now works in Matlab 6.5.1 (R13, SP1) (maybe 6.5 too), versions <6.5 will NOT work.

- Made 'options' case independent.
- Added some fields to 'result' (see 'result:', above).
- Removed result.errmess -- now uses error(errmess).
- Removed result.nan -- was equivalent to result.string.
- A few small bug fixes.

Updated the help text. No code changes.

MATLAB Release
MATLAB 6.5.1 (R13SP1)
