Textscan Tool Help

This document describes how to use textscantool, an interactive GUI for reading large text files that contain delimited data. The tool lets you specify which columns to read and which data types to use. It can also generate the M code to carry out the same task programmatically. A video tutorial for the tool is included in the directory that comes with this submission.

Starting the tool

To open the tool, type textscantool at the command line.

Specify text file, header and delimiter information

Select a text file

To select a text file data source, click Browse... and use the dialog that appears to pick a file. The size of the selected file on disk is displayed and a preview of the first 100 lines of the file is shown in the panel below.

Specify last header row number

Examine the preview to determine how many header lines must be skipped before the data starts. Click the number of the row where the headers end. You must click the row number on the left, not just anywhere in the row. This populates the 'No. of header lines' edit box below. The tool also tries to determine the delimiter, simply by finding the most common characters in that row, and populates the delimiter edit box below.
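The delimiter guess described above can be sketched roughly as follows. This is only an illustration of the "most common character" idea, not the tool's actual code: the sample file contents, the candidate delimiter set and all variable names are assumptions.

```matlab
% Write a small sample file (hypothetical contents) to a temporary location.
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, 'Sample log file\n');
fprintf(fid, 'time,voltage,current\n');
fprintf(fid, '0.0,1.5,0.10\n');
fprintf(fid, '0.1,1.6,0.12\n');
fclose(fid);

numHeaderLines = 2;   % number of the row where the headers end

% Read up to the last header row.
fid = fopen(fname, 'r');
for k = 1:numHeaderLines
    headerLine = fgetl(fid);
end
fclose(fid);
delete(fname);

% Count how often each candidate delimiter occurs in that row and
% keep the most frequent one. The candidate set is an assumption.
candidates = {',', ';', sprintf('\t'), ' ', '|'};
counts = cellfun(@(c) sum(headerLine == c), candidates);
[~, idx] = max(counts);
delimiter = candidates{idx};

fprintf('Guessed delimiter: "%s"\n', delimiter);
```

A guess like this can be wrong, for example when header text contains more spaces than separators, which is why the tool lets you override the value.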

Specify header and delimiter information manually

You can override the calculated header and delimiter information by editing the values in the edit boxes.

Click Next>.

Specify data types of columns or which to ignore

In the next panel, up to 100 rows of data are read in and displayed in a grid. By default the data is read in as strings, to ensure that all of it can be read. In most cases you will be reading numerical data, so you will need to specify a numerical data type to use. This is the type in which the data will be stored in the MATLAB workspace.
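Reading a limited preview as strings might look like the sketch below, using textscan with a %s specifier for every column and a repeat count of 100. The file contents and variable names are made up for illustration, and the tool's internal code may differ.

```matlab
% Create a small sample file (hypothetical contents).
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, 'time,voltage,label\n');
fprintf(fid, '0.0,1.5,ok\n');
fprintf(fid, '0.1,1.6,fail\n');
fclose(fid);

numColumns = 3;
formatSpec = repmat('%s', 1, numColumns);   % one string field per column

% Apply the format at most 100 times, skipping one header line.
fid = fopen(fname, 'r');
preview = textscan(fid, formatSpec, 100, ...
    'Delimiter', ',', 'HeaderLines', 1);
fclose(fid);
delete(fname);

% preview is a 1-by-3 cell array; each cell holds one column of strings.
disp(preview{3});
```

Because every field is read with %s, any text can be read in, at the cost of storing numbers as strings until a numerical type is chosen.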

Select column(s)

Select a column with the mouse (or multiple columns with Ctrl+click). You can select all columns by clicking in the top left.

Choose data type or ignore

Then click the radio buttons below to select a data type. The tool checks that the column can be read with that data type by reading the file. Character strings, for example, cannot be read in as numerical data. If the chosen data type works, it is shown alongside the column name in the header. Single and integer data types take up less storage space than the usual doubles.

Choose ignore if you do not want to read a column.
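In textscan terms, the data type and ignore choices correspond to format specifiers such as %f (double), %f32 (single), %d16 (int16) and %*s (skip a field). The following is a minimal sketch with made-up file contents; how the tool maps its radio buttons to specifiers is an assumption here.

```matlab
% Create a small sample file with three columns (hypothetical contents).
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, '0.5,a,12\n');
fprintf(fid, '1.5,b,34\n');
fclose(fid);

% Column 1 as single, column 2 ignored, column 3 as int16.
formatSpec = '%f32 %*s %d16';

fid = fopen(fname, 'r');
C = textscan(fid, formatSpec, 'Delimiter', ',');
fclose(fid);
delete(fname);

% Only two cells are returned because the second column was skipped.
fprintf('Columns returned: %d\n', numel(C));
fprintf('Class of column 1: %s\n', class(C{1}));
fprintf('Class of column 3: %s\n', class(C{2}));
```

Skipped columns are never stored, so ignoring columns you do not need reduces memory use as well as import time.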

If you have a date/time string in a column and you wish to read it as a date/time number, specify an anonymous function that calls the datenum function and select the corresponding radio button.
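An anonymous function of this kind could look like the sketch below. The date format string and the sample values are assumptions; adjust the format to match the strings in your file.

```matlab
% Sample date/time strings as they might appear in a column (hypothetical).
dateStrings = {'2010-03-15 12:30:00'; '2010-03-16 08:00:00'};

% Anonymous function that converts strings to serial date numbers.
% The format 'yyyy-mm-dd HH:MM:SS' is an assumption about the file.
toDateNum = @(s) datenum(s, 'yyyy-mm-dd HH:MM:SS');

dateNums = toDateNum(dateStrings);

% Serial date numbers are in days, so differences are fractions of a day.
elapsedDays = dateNums(2) - dateNums(1);
disp(datestr(dateNums(1), 'yyyy-mm-dd HH:MM:SS'));
```

Passing an explicit format to datenum is usually faster and less ambiguous than letting it guess the format from the string.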

Note: There is a known bug in this tool: after you select a radio button, both the radio button and the column become deselected.

Click Next>.

Specify import options

In the final panel, you specify a number of import options.

Number of data rows to read

Enter the number of data rows to be read. When you click outside the text box, the estimated size of the array in the workspace and the estimated import time are displayed. If you want to read the entire file but do not know how long it is, press the Count Lines in File button. The tool determines the number of rows in the file, and hence the number of data lines that can be read, and populates the text box. This may take a while for large files.
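Counting lines and estimating the array size can be approximated by hand as shown below. This is a simple sketch, not the tool's own routine: it reads the file line by line with fgetl and assumes every column is stored as a double (8 bytes per value). The file contents and names are hypothetical.

```matlab
% Create a sample file with 2 header lines and 3 data rows (hypothetical).
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, 'Sample log file\n');
fprintf(fid, 'time,voltage\n');
fprintf(fid, '0.0,1.5\n0.1,1.6\n0.2,1.7\n');
fclose(fid);

numHeaderLines = 2;
numColumns = 2;

% Count lines: fgetl returns -1 (not a char) at the end of the file.
fid = fopen(fname, 'r');
numLines = 0;
while ischar(fgetl(fid))
    numLines = numLines + 1;
end
fclose(fid);
delete(fname);

numDataRows = numLines - numHeaderLines;

% Rough size estimate, assuming all columns are read as doubles.
bytesPerValue = 8;
estimatedBytes = numDataRows * numColumns * bytesPerValue;
fprintf('%d data rows, about %d bytes as doubles\n', ...
    numDataRows, estimatedBytes);
```

Reading line by line touches the whole file, which is why counting lines in a very large file takes noticeable time.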

Specify data structure

Next choose the data structure that you would like to read the data into. Options include,

Import data or generate code

To import the data now...

...click Import Data. Variables called data and columnHeaders are created in the base workspace. If there were no column headers in the file then columnHeaders will contain dummy strings column1, column2, etc.
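Dummy header names of the form described above can be produced with a one-liner like the following. This is only a sketch of the naming scheme; the variable numColumns is an assumed input.

```matlab
numColumns = 4;   % assumed number of columns in the file

% Build {'column1', 'column2', ...} as a 1-by-numColumns cell array.
columnHeaders = arrayfun(@(k) sprintf('column%d', k), ...
    1:numColumns, 'UniformOutput', false);

disp(columnHeaders);
```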

To generate the code...

...click Generate Code.

This generates a function that reads the data and opens it in the editor. The function does not currently return a column header variable. You can get column header information if you specify the 2D cell array option, in which case the headers are included with the data.

If an imported variable takes up too much memory or you get an "out of memory" error, consider importing fewer rows of data, or go back to the previous panel and specify a smaller data type.
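The effect of a smaller data type on memory can be checked with whos, as in this small sketch. The array length and variable names are arbitrary choices for illustration.

```matlab
n = 1e6;   % number of values in one column (arbitrary)

asDouble = zeros(n, 1);            % default type, 8 bytes per value
asSingle = zeros(n, 1, 'single');  % 4 bytes per value
asInt16  = zeros(n, 1, 'int16');   % 2 bytes per value

infoDouble = whos('asDouble');
infoSingle = whos('asSingle');
infoInt16  = whos('asInt16');

fprintf('double: %d bytes\n', infoDouble.bytes);
fprintf('single: %d bytes\n', infoSingle.bytes);
fprintf('int16:  %d bytes\n', infoInt16.bytes);
```

Smaller types trade range and precision for space, so check that the values in a column fit the chosen type before relying on it.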

Closing the tool

Click Close to exit the tool.