4.7 | 32 ratings | 233 downloads (last 30 days) | File Size: 19.58 KB | File ID: #10946

readtext

by Peder Axensten

 

02 May 2006 (Updated 01 Apr 2007)

Code covered by BSD License  

Whatever text (file) you give it, readtext returns an array of the contents. You can choose the delimiter, comment character, quote characters, and conversion options.

Editor's Notes:

This file was selected as MATLAB Central Pick of the Week


File Information
Description

Usage: [data, result]= readtext(source, delimiter, comment, quotes, options)
 
Whatever text (file) you give it, readtext returns an array of the contents (or send me a bug report). Matlab's standard library can't read lines of variable length or values of varying type; readtext can read any text file. Any string (or even a regexp) can be the delimiter; the default is a comma. Everything after (and including) a comment character, until the line end, is ignored. Quote characters may also be given; everything between them is treated as one item. There are options to control what will be converted to numbers and how empty items are saved.
 
If you find any errors, please let me know: peder at axensten dot se.
 
source: the file to be read. May be a file path or just the file name. OR: The text itself, see 'textsource', below.
 
delimiter: (default: ',') any non-empty string. May be a regexp, but this is slow on large files.
 
comment: (default: '') zero or one character. Anything after (and including) this character, until the end of the line, will be ignored.
 
quotes: (default: '') zero, one (opening quote equals closing), or two characters (opening and closing quote) to be treated as paired braces. Everything between the quotes will be treated as one item. The quotes will remain. Quotes may be nested.
 
options: (default: '') may contain (concatenate combined options):
- 'textsource': source contains the actual text to be processed, not the file name.
- 'textual': no numeric conversion ('data' is a cell array of strings only),
- 'numeric': everything is converted to a number or NaN ('data' is a numeric array, empty items are converted to NaNs unless 'empty2zero' is given),
- 'empty2zero': an empty field is saved as zero, and
- 'empty2NaN': an empty field is saved as NaN.
- 'usewaitbar': call waitbar to report progress. If you find the wait bar annoying, get 'waitbar alternative' at http://www.mathworks.com/matlabcentral/fileexchange/loadFile.do?objectId=11398
 
data: A cell array containing the read text, divided into cells by delimiter and line endings. 'data' will be empty if the file is not found, could not be opened, or is empty. With the option 'numeric', 'data' will be a numeric array, with 'textual', 'data' will be a cell array of strings only, and otherwise it will be a mixed cell array. For Matlab < version 7, returned strings may contain leading white-space.
 
result: a structure:
.min: minimum number of columns found in a line.
.max: number of columns in 'data', before removing empty columns.
.rows: number of rows in 'data', before removing empty rows.
.numberMask: true, if numeric conversion ('NaN' converted to NaN counts).
.number: number of numeric conversions ('NaN' converted to NaN counts).
.emptyMask: true, if empty item in file.
.empty: number of empty items in file.
.stringMask: true, if non-number and non-empty.
.string: number of non-number, non-empty items.
.quote: number of quotes.
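
A minimal sketch of how the 'result' fields might be used to pull the numeric items out of a mixed cell array. It assumes that readtext.m is on the path and that result.numberMask is a logical array of the same size as 'data'. The description above does not state the mask size explicitly, so the code checks it before indexing. The sample text is made up.

```matlab
% Sketch only: the sample text below is invented for illustration.
txt = sprintf('name,value\nalpha,1.5\nbeta,\ngamma,2');

if exist('readtext', 'file') == 2          % skip quietly if readtext.m is absent
    [data, result] = readtext(txt, ',', '', '', 'textsource');

    % Counters documented in the 'result' structure.
    fprintf('%d numbers, %d strings, %d empty items\n', ...
        result.number, result.string, result.empty);

    % ASSUMPTION: the mask has the same size as 'data'; verify before use.
    if isequal(size(result.numberMask), size(data))
        values = [data{result.numberMask}];   % numeric cells as a row vector
        disp(values);
    end
end
```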
 
EXAMPLE 1: [a,b]= readtext('txtfile', '[,\t]', '#', '"', 'numeric-empty2zero')
This will load the file 'txtfile' into variable a, treating any of tab or comma as delimiters. Everything from and including # to the next newline will be ignored. Everything between two double quotes will be treated as a string. Everything will be converted to numbers and a numeric array returned. Non-numeric items will become NaNs and empty items are converted to zero.
 
EXAMPLE 2: a= readtext('The, actual, text, to, process', ',', '', '', 'textsource')
This will process the actual text string, returning a cell string of the five words.
 
COPYRIGHT (C) Peder Axensten (peder at axensten dot se), 2006-2007.

Acknowledgements

The author wishes to acknowledge the following in the creation of this submission:
loadcell.m
This submission has inspired the following:
Extract numbers only

MATLAB release: MATLAB 6.5.1 (R13SP1)
Other requirements: Works in Matlab 6.5.1 (R13SP1) (probably 6.5 too); versions <6.5 will NOT work. You may need strtrim for Matlab 6.x (www.mathworks.com/matlabcentral/fileexchange/loadFile.do?objectId=3228).
Zip file content: license.txt, readtext.m
Comments and Ratings (32)
04 Jul 2006 Tiga Tiga

Worked for my problem. I had a *.csv file with 1100 columns. With 'readtext' I could convert it into a cell. Thanks,

Tiga

05 Jul 2006 Mahmoud lotfy

Thank you so much, it did work well to read the ASCII file I have... you saved me a lot of time. Thanks

10 Aug 2006 T G

Hey, you may want to consider revising line 206 to be:

waitbar(0, th, sprintf('Reading ''%s''...', regexprep(fname,'\_','\\_')));

Or maybe handle it some other way (with some other temporary variable, with filter supplied). latex(...) would work, but is in the Symbolic Math Toolbox. Also, you might be able to set(th,...) something to make it not display LaTeX.

The idea is to properly handle filenames with underscores due to complications with the LaTeX formatting. (_ is not displayed, but instead subscripts the subsequent character.)

Love the script though, thanks.
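
The escaping idea in this comment can be tried on its own, without a waitbar. The following is a small sketch using only the built-in strrep; the file name is hypothetical, and this is not necessarily how readtext handles the issue (a later update note on this page says file names no longer get (La)TeX formatting).

```matlab
% Hypothetical file name, used only for illustration.
fname = 'my_data_file.txt';

% Prefix every underscore with a backslash so that a TeX interpreter
% displays it literally instead of subscripting the next character.
escaped = strrep(fname, '_', '\_');

disp(escaped);
```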

19 Sep 2006 Dan Pearce  
14 Oct 2006 Madhusudhanan Balasubramanian

Hi: Wonderful routine. I was processing a txt file from a Mac on Windows, so I had a problem with the line delimiter. Mac uses CR (13) for a line break; Windows uses CRLF (13 10). Instead of CR being replaced with CRLF when the file was transferred to Windows, the txt had CRCRLF (I heard FTP programs replace CR with CRLF for file transfers between Mac and Windows; perhaps it's a problem with Microsoft Outlook). So I added an extra "text = strrep(text, [char(13) char(10)], eol);" in line 155.
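
A hedged alternative to editing readtext.m is to normalize the line endings before the text reaches readtext, then pass the result with the 'textsource' option. The sketch below uses only strrep; the sample string is made up, and the readtext call is skipped if readtext.m is not on the path.

```matlab
CR = char(13);
LF = char(10);

% Made-up sample mixing the broken CRCRLF form, CRLF, a lone CR, and LF.
raw = ['a,b' CR CR LF 'c,d' CR LF 'e,f' CR 'g,h' LF];

% Replace the longest sequences first so that every break becomes one LF.
clean = strrep(raw,   [CR CR LF], LF);
clean = strrep(clean, [CR LF],    LF);
clean = strrep(clean, CR,         LF);

if exist('readtext', 'file') == 2          % skip quietly if readtext.m is absent
    data = readtext(clean, ',', '', '', 'textsource');
end
```

Replacing the three-character form first matters: handling CRLF before CRCRLF would leave a stray CR behind, which the last step would then turn into an extra, empty line.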

21 Nov 2006 A. L.  
30 Nov 2006 Uggo Pinho  
18 Dec 2006 Nathan Orloff

Very nice. You should comment that you can pass ['Directory of Choice' '\' 'File Name'] and that will work. The point being you don't have to be in the directory to read the file (i.e. the program is not stupid).

01 Jan 2007 Kam S

Gem of a routine, mate. I wish I had searched the Matlab File Exchange a year ago. Fixing data read with dlmread etc. was a nightmare. I recommend that the routine be added to the standard Matlab library.

10 Jan 2007 Kamal Mannar

Great routine, saved me a lot of time reading data files containing both numeric and text data

26 Mar 2007 Michael Rubens

Thank you so much. I agree that this should be standard in the matlab function library, it is in mine. Cheers!

03 May 2007 Nicholas Wolff

Great routine...thanks for saving me so much time. Mathworks: include this in next release please!

15 Jul 2007 thinh ducthinh  
14 Aug 2007 Bill Hutchison

90% of my programming time has been consumed by the struggle with text strings in Matlab. Data import formats are never consistent. Even if they claim to be "comma-delimited", dates and other elements may not be well-formed. This function has solved my problem, and I can dispense with my bizarre collection of fixes. Many thanks.

15 Oct 2007 Percy Wang  
15 Oct 2007 P WANG

Tried readcsv, which does not work; yours works right out of the box, and the example worked great on the first try. Appreciate your work.

21 Jan 2008 Greg Mefford

I spent a good while trying to find something like this. It's crazy how many of them just don't seem to work.

25 Mar 2008 a a

Good - could do with a 'headerlines' option though

24 Apr 2008 Ashish Bhan

worked like a charm

30 Apr 2008 Tom Rdn

Thank you, this works fine!
But the # of space characters (as delimiter) differs in my file, so I have to use the regexp option, which slows the process :-(. So it would be nice to have an option to merge delimiters or empty2nothing.
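
One possible workaround, sketched under assumptions: collapse each run of spaces into a single space first, then call readtext with a plain ' ' delimiter and the 'textsource' option. The option string 'textsource-numeric' follows the concatenation style of Example 1 but has not been checked against readtext itself, and whether this is actually faster than a regexp delimiter is untested. The sample text is made up.

```matlab
% Made-up sample with a varying number of spaces between items.
txt = sprintf('1  2    3\n4 5  6');

% Collapse every run of spaces into one space (line breaks are untouched).
collapsed = regexprep(txt, ' +', ' ');

if exist('readtext', 'file') == 2          % skip quietly if readtext.m is absent
    % ASSUMPTION: options can be combined as 'textsource-numeric'.
    data = readtext(collapsed, ' ', '', '', 'textsource-numeric');
end
```

Lines that begin with spaces would still yield an empty first item; those leading spaces would have to be stripped separately.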

05 May 2008 john maclane

yippikaeyeah m***f***ckr!!!

26 Jun 2008 C Schwalm

Very slick.

17 Jul 2008 Gregoire Margoton

Very useful to get numbers which are always at the same place. You obtain first the matrix with all numeric data of the file and then you take your value in the matrix!
Thank you

22 Aug 2008 Frank Maenka

I was struggling with a .csv file with mixed text/numbers as well as a varying number of delimiters per line - and this worked perfectly - thanks!

06 Oct 2008 Cathy Ruhl

This has been a great help! I was wondering if there is a way to identify a number of headerlines to skip?
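
The description on this page lists no header-line option, but a similar effect can be sketched by dropping the first lines from the text before handing it to readtext with 'textsource'. The variable nHeader and the sample text are hypothetical, and regexp with 'split' needs a reasonably recent MATLAB release (it is not available in 6.x).

```matlab
nHeader = 2;                                   % hypothetical number of header lines
txt = sprintf('Title line\nx,y\n1,2\n3,4');    % made-up sample text

% Split into lines, keep everything after the header, and rejoin.
lines = regexp(txt, '\r?\n', 'split');
body  = sprintf('%s\n', lines{nHeader+1:end});

if exist('readtext', 'file') == 2              % skip quietly if readtext.m is absent
    data = readtext(body, ',', '', '', 'textsource');
end
```

For a file on disk, the text would first have to be loaded into a string, for example with fread on an opened file. Alternatively, the whole file can be read with readtext and the header rows removed afterwards with data(nHeader+1:end, :).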

09 Oct 2008 Jacques Labrie

This is the kind of feature that Matlab should support right away. Thanks! Mathworks should pay you and put this in the next release! Valuable for students who wish to work/learn with Matlab.

17 Oct 2008 djr djr  
22 Jan 2009 Adam Pilchak

amazing function.

04 Mar 2009 m4lte k  
16 Apr 2009 B. Andre Weinstock

Excellent solution for when xlsread cannot be called (no Microsoft Excel around) and csvread cannot differentiate between delimiting commas and commas in double quoted text.

03 Jul 2009 Debela

Excellent code, but very slow when reading large data files

19 Nov 2009 Hastiepen

After spending all day on the forums trying to figure out how to read a text file into a cell array and then convert it into a matrix, I found this which did everything I needed it to in one go - Excellent piece of work, thank you very much!

Updates
04 May 2006

Updated the help text. No code changes.

08 May 2006

- Made 'options' case independent.
- Added some fields to 'result' (see 'result:', above).
- Removed result.errmess -- now uses error(errmess).
- Removed result.nan -- was equivalent to result.string.
- A few small bug fixes.

06 Jun 2006

Now works in Matlab 6.5.1 (R13, SP1) (maybe 6.5 too), versions <6.5 will NOT work.

15 Jun 2006

Better error report when file open fails. Somewhat quicker. Recommends 'waitbar alternative'. Ok with Matlab orig. waitbar too, of course.

20 Jul 2006

Close waitbar instead of deleting it, and some other minor waitbar compatibility fixes.

16 Aug 2006

- No more (La)TeX formatting of file names.
- Prefixed waitbar messages with '(readtext)'.

02 Oct 2006

- Better removal of comments. Could leave an empty first row before.
- Added a 'usewaitbar' option.
- Now removes empty last columns and rows.

08 Mar 2007

Version 1.8:
- Fixed a problem when the comment character was a regexp character, such as '*'.
- Fixed a problem when reading files with no data.

01 Apr 2007

- Now handles a string directly.
- Improved compatibility with Matlab 6.x regarding waitbar calls.
- Fixed bug: failed to recognize a quote if there was only one in the text.
- Fixed a bug processing source of delimiters only.

