
4.8 | 49 ratings | 71 Downloads (last 30 days) | File Size: 19.6 KB | File ID: #10946

readtext

by Peder Axensten

02 May 2006 (Updated )

Any text (file) you give it, readtext returns an array of the contents. You can choose the delimiter, etc.

Editor's Notes:

This file was selected as MATLAB Central Pick of the Week


File Information
Description

Usage: [data, result]= readtext(source, delimiter, comment, quotes, options)
 
Whatever text (file) you give it, readtext returns an array of the contents (or send me a bug report). With the standard library, Matlab can't read lines of variable length or values of variable type; readtext can read any text file. Any string (or even a regexp) can act as the delimiter; the default is a comma. Everything after (and including) a comment character, until the end of the line, is ignored. Quote characters may also be given; everything between them is treated as one item. There are options to control what is converted to numbers and how empty items are saved.
 
If you find any errors, please let me know: peder at axensten dot se.
 
source: the file to be read. May be a file path or just the file name. OR: The text itself, see 'textsource', below.
 
delimiter: (default: ',') any non-empty string. May be a regexp, but this is slow on large files.
 
comment: (default: '') zero or one character. Anything after (and including) this character, until the end of the line, will be ignored.
 
quotes: (default: '') zero, one (opening quote equals closing), or two characters (opening and closing quote) to be treated as paired braces. Everything between the quotes will be treated as one item. The quotes will remain. Quotes may be nested.
 
options: (default: '') may contain (concatenate combined options):
- 'textsource': source contains the actual text to be processed, not the file name.
- 'textual': no numeric conversion ('data' is a cell array of strings only),
- 'numeric': everything is converted to a number or NaN ('data' is a numeric array, empty items are converted to NaNs unless 'empty2zero' is given),
- 'empty2zero': an empty field is saved as zero, and
- 'empty2NaN': an empty field is saved as NaN.
- 'usewaitbar': call waitbar to report progress. If you find the wait bar annoying, get 'waitbar alternative' at http://www.mathworks.com/matlabcentral/fileexchange/loadFile.do?objectId=11398
 
data: A cell array containing the read text, divided into cells by delimiter and line endings. 'data' will be empty if the file is not found, could not be opened, or is empty. With the option 'numeric', 'data' will be a numeric array, with 'textual', 'data' will be a cell array of strings only, and otherwise it will be a mixed cell array. For Matlab < version 7, returned strings may contain leading white-space.
 
result: a structure:
.min: minimum number of columns found in a line.
.max: number of columns in 'data', before removing empty columns.
.rows: number of rows in 'data', before removing empty rows.
.numberMask: true, if numeric conversion ('NaN' converted to NaN counts).
.number: number of numeric conversions ('NaN' converted to NaN counts).
.emptyMask: true, if empty item in file.
.empty: number of empty items in file.
.stringMask: true, if non-number and non-empty.
.string: number of non-number, non-empty items.
.quote: number of quotes.
 
EXAMPLE 1: [a,b]= readtext('txtfile', '[,\t]', '#', '"', 'numeric-empty2zero')
This will load the file 'txtfile' into variable a, treating any of tab or comma as delimiters. Everything from and including # to the next newline will be ignored. Everything between two double quotes will be treated as a string. Everything will be converted to numbers and a numeric array returned. Non-numeric items will become NaNs and empty items are converted to zero.
 
EXAMPLE 2: a= readtext('The, actual, text, to, process', ',', '', '', 'textsource')
This will process the actual text string, returning a cell string of the five words.
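
As a further illustration of the usage above, here is a minimal sketch that passes the text directly via 'textsource' and prints a few fields of 'result'. It assumes readtext.m from this submission is on the MATLAB path (the guard skips the call otherwise). The sample text and variable names are made up for illustration.

```matlab
% Made-up sample: a '#' comment line, a quoted item containing a comma,
% and an empty item in the last row.
txt = sprintf('# made-up sample\nname,"x, y",3.5\nfoo,,42\n');

if exist('readtext', 'file') == 2
    % Comma delimiter, '#' comments, '"' quotes, text passed directly.
    [data, result] = readtext(txt, ',', '#', '"', 'textsource');

    disp(data);   % mixed cell array (no 'numeric' or 'textual' option given)
    fprintf('rows: %d, columns: %d\n', result.rows, result.max);
    fprintf('numbers: %d, strings: %d, empty: %d\n', ...
            result.number, result.string, result.empty);
else
    disp('readtext.m is not on the path; skipping the call.');
end
```

Adding 'numeric' or 'textual' to the options string would change the type of 'data' as described under 'options' above.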
 
COPYRIGHT (C) Peder Axensten (peder at axensten dot se), 2006-2007.

Acknowledgements

Loadcell.M inspired this file.

This file inspired Extract Numbers Only, Readtext Wrapper, and Ecopathlite: A Matlab Based Implementation Of Ecopath.

MATLAB release: MATLAB 6.5.1 (R13SP1)
Other requirements: Works in Matlab 6.5.1 (R13SP1) (probably 6.5 too), versions <6.5 will NOT work. Possibly you need strtrim for Matlab 6.x (www.mathworks.com/matlabcentral/fileexchange/loadFile.do?objectId=3228).
Comments and Ratings (52)
20 Mar 2014 Nathan Orloff

I love this function. It never fails, never has a problem. It is incredibly obvious to use. Something like this should just be native to matlab.

24 Jan 2014 John Baldauf  
18 Oct 2013 Srinivas

Wow... Thanks. I tried to read some CSV files with various MATLAB functions like CSVREAD, DLMREAD, etc. Nothing worked. Your file did the magic. Thank you :-)

10 Jun 2013 Julius

It works smoothly.

27 May 2013 Ray Lee

could u plz submit an accompanying "savetext.m"?

20 Oct 2012 Luis Jose Salazar Serrano

Excellent!!! It really helped me with my problem: reading the data from a Thorlabs PM100A detector, which has a one-line header.

Thanks for such a nice function!!!!

Luis

18 May 2012 SpartanG72

works well
would be good if we could specify range

18 May 2012 SpartanG72  
06 Mar 2012 Reza Farrahi Moghaddam  
03 Feb 2012 Nathan

Thank you so much! I was really surprised that Matlab doesn't have a built-in function that is easy to use. Yours works great though, thanks for making it available.

05 Jan 2012 Tim

Fantastic function, does exactly as it says for a very complex csv including mixed data types within rows and columns of unequal length.

27 Dec 2011 Humayun Kathuria

hmmm... this file helped me...

25 Oct 2011 Rashbeard

excellent and necessary

05 Oct 2011 Eoin  
31 May 2011 Martina Callaghan  
23 Mar 2011 Georges

Great contribution to mathworks community. It saved me a great deal of time.

16 Nov 2010 Roger Parkyn

This is really useful to me and I have used it quite a bit.

I noticed however that comment lines are not fully removed but instead leave a blank line (apart from the very first line which will be fully removed if it is a comment). My work-around is shown below (although it has the downside of removing ALL blank lines - whether commented or not).

Replace the following lines:
    if(~isempty(opts.comment)) % Remove comments.
        text= regexprep(text, ['^\' opts.comment '[^' eol ']*' eol], '');
        text= regexprep(text, [ '\' opts.comment '[^' eol ']*'], '');
    end

with these lines:
    if(~isempty(opts.comment)) % Remove comments.
        % Remove commented line endings (but in the case of whole-lines this leaves a blank line!).
        text= regexprep(text, ['\' opts.comment '[^' eol ']*'], '');
        % Remove blank lines, part 1: remove multiple eol instances (2 in a row = a blank line).
        text= regexprep(text, [eol '+'], eol);
        % Remove blank lines, part 2: if there is an eol at the beginning then remove it.
        text= regexprep(text, ['^' eol '+'], '');
    end
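
The effect of the three replacements in this work-around can be tried outside readtext. The sketch below is only an illustration: it uses a made-up sample string and assumes that the line ending is char(10) and the comment character is '#'. The variable names eol and commentChar stand in for the ones used inside readtext.m.

```matlab
eol         = char(10);   % assumed line-ending character
commentChar = '#';        % assumed comment character

% Made-up sample: a leading comment line, a whole-line comment in the
% middle, and a trailing comment after data.
text = ['# header comment' eol ...
        'a,b' eol ...
        '# full-line comment' eol ...
        'c,d # trailing' eol];

% Step 1: strip everything from the comment character to the end of line.
text = regexprep(text, ['\' commentChar '[^' eol ']*'], '');
% Step 2: collapse runs of line endings (this removes ALL blank lines).
text = regexprep(text, [eol '+'], eol);
% Step 3: drop a line ending left at the very beginning of the text.
text = regexprep(text, ['^' eol '+'], '');

disp(text);
```

Note that the space before the trailing comment on the last data line is kept, so the item there still ends with a blank.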

01 Nov 2010 noga cohen

Hi,
I get this error message when I use: [a,b]= readtext('txtfile', '[,\t]', '#', '"', 'numeric-empty2zero')

??? Undefined function or method 'readtext' for input arguments of type 'char'.

what should I do?
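
An "Undefined function or method" error for readtext usually means that MATLAB cannot find readtext.m on its search path. A minimal sketch of a check follows. The folder name in readtextDir is purely hypothetical and should be replaced with wherever the downloaded file was saved.

```matlab
% Hypothetical folder where readtext.m was saved; adjust to your own setup.
readtextDir = fullfile(pwd, 'readtext');

if exist('readtext', 'file') ~= 2
    % Not found on the path: add the folder if the file really is there.
    if exist(fullfile(readtextDir, 'readtext.m'), 'file') == 2
        addpath(readtextDir);
        disp('Added the readtext folder to the path.');
    else
        fprintf('readtext.m was not found in %s\n', readtextDir);
    end
else
    disp('readtext is already on the path.');
end
```

Running `which readtext` afterwards shows which file, if any, MATLAB will use.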

22 Jan 2010 Geneva Wilkesanders  
13 Jan 2010 Glenn

This was a fast solution to problems that I was having with mixed data types (strings and Numeric) in a cell array. Thanks

19 Nov 2009 Hastiepen

After spending all day on the forums trying to figure out how to read a text file into a cell array and then convert it into a matrix, I found this which did everything I needed it to in one go - Excellent piece of work, thank you very much!

03 Jul 2009 Debela

Excellent code, but very slow when reading large data files.

16 Apr 2009 B. Andre Weinstock

Excellent solution for when xlsread cannot be called (no Microsoft Excel around) and csvread cannot differentiate between delimiting commas and commas in double quoted text.

04 Mar 2009 m4lte k  
22 Jan 2009 Adam Pilchak

amazing function.

17 Oct 2008 djr djr  
09 Oct 2008 Jacques Labrie

This is the kind of feature that Matlab should be able to do right away. Thanks! Mathworks should pay you and put this in the next release! Valuable for students who wish to work/learn with Matlab.

06 Oct 2008 Cathy Ruhl

This has been a great help! I was wondering if there is a way to identify a number of headerlines to skip?
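
The documented options do not include a header-line count, but since 'data' is a cell array, header rows can be dropped after reading. The sketch below is one possible work-around, not a feature of readtext. The cell array is a made-up stand-in for readtext output, and nHeader is a hypothetical variable name.

```matlab
% Stand-in for the mixed cell array readtext would return, e.g. from
%   data = readtext('myfile.csv');   % hypothetical file name
data = {'name',  'value'; ...
        'units', 'm';     ...
        'a',     1;       ...
        'b',     2};

nHeader = 2;                      % number of header rows to skip (assumed)

header = data(1:nHeader, :);      % keep the header rows if they are needed
body   = data(nHeader+1:end, :);  % everything below the header

disp(body);
```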

22 Aug 2008 Frank Maenka

I was struggling with a .csv file with mixed text/numbers as well as a varying number of delimiters per line - and this worked perfectly - thanks!

17 Jul 2008 Gregoire Margoton

Very useful to get numbers which are always at the same place. You obtain first the matrix with all numeric data of the file and then you take your value in the matrix!
Thank you

26 Jun 2008 C Schwalm

Very slick.

05 May 2008 john maclane

yippikaeyeah m***f***ckr!!!

30 Apr 2008 Tom Rdn

Thank you, this works fine!
But the # of space characters (as delimiter) differs in my file, so i have to use the regexp option, which slows the process :-(. So it would be nice to have an option to merge delimiters or empty2nothing.
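
One possible work-around, sketched here under assumptions, is to collapse runs of spaces before parsing, so that a plain ' ' delimiter can be used with the 'textsource' option. Whether this is actually faster than a regexp delimiter has not been measured. The sample text and variable names are made up.

```matlab
% Made-up sample with a varying number of spaces between items.
raw = sprintf('1  2   3\n4 5    6\n');

% Replace every run of one or more spaces with a single space.
collapsed = regexprep(raw, ' +', ' ');

% The collapsed text could then be parsed with a plain delimiter, e.g.
%   data = readtext(collapsed, ' ', '', '', 'textsource');
disp(collapsed);
```

Leading spaces on a line would still produce an empty first item, so they may need to be trimmed separately.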

24 Apr 2008 Ashish Bhan

worked like a charm

25 Mar 2008 a a

Good - could do with a 'headerlines' option though

21 Jan 2008 Greg Mefford

I spent a good while trying to find something like this. It's crazy how many of them just don't seem to work.

15 Oct 2007 P WANG

Tried readcsv; it does not work. Yours works right out of the box, and the example works great on the first try. Appreciate your work.

15 Oct 2007 Percy Wang  
14 Aug 2007 Bill Hutchison

90% of my programming time has been consumed by the struggle with text strings in Matlab. Data import formats are never consistent. Even if they claim to be "comma-delimited", dates and other elements may not be well-formed. This function has solved my problem, and I can dispense with my bizarre collection of fixes. Many thanks.

15 Jul 2007 thinh ducthinh  
03 May 2007 Nicholas Wolff

Great routine...thanks for saving me so much time. Mathworks: include this in next release please!

26 Mar 2007 Michael Rubens

Thank you so much. I agree that this should be standard in the matlab function library, it is in mine. Cheers!

10 Jan 2007 Kamal Mannar

Great routine, saved me lot of time reading data files containing both numeric and text data

01 Jan 2007 Kam S

Gem of a routine, mate. I wish I had searched the Matlab file exchange a year ago. Fixing data read with dlmread etc. was a nightmare. I recommend that the routine be added to the standard Matlab library.

18 Dec 2006 Nathan Orloff

Very nice. You should mention that you can pass ['Directory of Choice' '\' 'File Name'] and that will work. The point being that you don't have to be in the directory to read the file (i.e. the program is not stupid).

30 Nov 2006 Uggo Pinho  
21 Nov 2006 A. L.  
14 Oct 2006 Madhusudhanan Balasubramanian

Hi: Wonderful routine. I was processing a txt file from a Mac on Windows, so I had a problem with the line delimiter. Mac uses CR (13) for line breaks; Windows uses CRLF (13 10). Instead of the CR being replaced with CRLF when the file was transferred to Windows, the txt file had CRCRLF (I heard that ftp programs replace CR with CRLF for file transfers between Mac and Windows; perhaps it's a problem with Microsoft Outlook). So I added an extra "text = strrep(text, [char(13) char(10)], eol);" at line 155.
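
A more general normalisation can be sketched with a single regexprep call. This is an illustration only, not the code in readtext.m. It assumes that any run of CRs followed by an LF (CRLF, CRCRLF) is one line break and that a lone CR is also one line break. The sample string and variable names are made up.

```matlab
CR  = char(13);
LF  = char(10);
eol = LF;   % target line ending (assumed)

% Made-up sample mixing CRCRLF, a lone CR, and CRLF.
text = ['a' CR CR LF 'b' CR 'c' CR LF 'd'];

% One or more CRs followed by LF, or a lone CR, become a single eol.
normalized = regexprep(text, '\r+\n|\r', eol);

disp(double(normalized));   % show character codes to make the result visible
```

Under this assumption a file that really contains an old-style Mac blank line followed by a Windows line break would lose that blank line.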

19 Sep 2006 Dan Pearce  
10 Aug 2006 T G

Hey, you may want to consider revising line 206 to be:

waitbar(0, th, sprintf('Reading ''%s''...', regexprep(fname,'\_','\\_')));

Or maybe handle it some other way (with some other temporary variable, with filter supplied). latex(...) would work, but is in the Symbolic Math Toolbox. Also, you might be able to set(th,...) something to make it not display LaTeX.

The idea is to properly handle filenames with underscores due to complications with the LaTeX formatting. (_ is not displayed, but instead subscripts the subsequent character.)

Love the script though, thanks.
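
The escaping suggested in this comment can also be written with strrep, which avoids regexp escaping rules altogether. The sketch below only builds the message string, so it runs without opening a wait bar. The file name is made up, and fname and msg are hypothetical variable names.

```matlab
fname = 'my_data_file.txt';   % made-up file name containing underscores

% Prefix each underscore with a backslash so a TeX interpreter shows it
% literally instead of subscripting the next character.
escaped = strrep(fname, '_', '\_');

% The name is passed as a sprintf argument, so its backslashes are kept.
msg = sprintf('Reading ''%s''...', escaped);
disp(msg);
```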

05 Jul 2006 Mahmoud lotfy

Thank you so much. It did work well to read the ASCII file I have... you saved me a lot of time. Thanks.

04 Jul 2006 Tiga Tiga

Worked for my problem. I had a *.csv file with 1100 columns. With 'readtext' I could convert it into a cell. Thanks,

Tiga

Updates
04 May 2006

Updated the help text. No code changes.

08 May 2006

- Made 'options' case independent.
- Added some fields to 'result' (see 'result:', above).
- Removed result.errmess -- now uses error(errmess).
- Removed result.nan -- was equivalent to result.string.
- A few small bug fixes.

06 Jun 2006

Now works in Matlab 6.5.1 (R13, SP1) (maybe 6.5 too), versions <6.5 will NOT work.

15 Jun 2006

Better error report when file open fails. Somewhat quicker. Recommends 'waitbar alternative'. Ok with Matlab orig. waitbar too, of course.

20 Jul 2006

Close waitbar instead of deleting it, and some other minor waitbar compatibility fixes.

16 Aug 2006

- No more (La)TeX formatting of file names.
- Prefixed waitbar messages with '(readtext)'.

02 Oct 2006

- Better removal of comments. Could leave an empty first row before.
- Added a 'usewaitbar' option.
- Now removes empty last columns and rows.

08 Mar 2007

Version 1.8:
- Fixed a problem when the comment character was a regexp character, such as '*'.
- Fixed a problem when reading files with no data.
