Textscan Tool Help
This document describes how to use textscantool, an interactive GUI for reading large text files that contain delimited data. The tool lets you specify which columns to read and what data types to use. It can also generate the M code that performs the same import programmatically. A video tutorial for the tool is included in the directory that comes with this submission.
To open the tool, type textscantool at the command line.
Select a text file
To select a text file data source, click Browse... and use the dialog that appears to pick a file. The size of the selected file on disk is displayed and a preview of the first 100 lines of the file is shown in the panel below.
Specify last header row number
Examine the preview to determine how many header lines must be skipped before the data starts. Click the number of the row where the headers end. You must click the row number on the left, not just anywhere in the row. This fills in the 'No. of header lines' edit box below. The tool also tries to determine the delimiter (simply by finding the most common characters in that row) and fills in the delimiter edit box below.
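The delimiter guess works roughly like a character-frequency count. The sketch below is an illustration, not the tool's own code: it assumes a fixed list of candidate delimiters (comma, tab, semicolon, space and pipe) and a hypothetical sample row, and picks whichever candidate occurs most often.

```matlab
% Hypothetical sample row taken from the last header line
row = 'Time,Temperature,Pressure,Flow';

% Candidate delimiters (an assumption made for this sketch)
candidates = {',', sprintf('\t'), ';', ' ', '|'};

% Count how many times each candidate occurs in the row
counts = zeros(1, numel(candidates));
for k = 1:numel(candidates)
    counts(k) = sum(row == candidates{k});
end

% Take the most frequent candidate as the delimiter
[~, idx] = max(counts);
delimiter = candidates{idx};
fprintf('Guessed delimiter: "%s" (%d occurrences)\n', delimiter, counts(idx));
```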
Specify header and delimiter information manually
You can override the calculated header and delimiter information by specifying:
- The number of header lines
- Whether or not the text file contains column header names (categories).
- The row on which the column headers are found. By default the tool assumes that the column headers are on the last header row you just selected. You will need to change this if there are header rows between the column headers and the actual data.
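Reading the column names from a chosen header row, as in the last item above, can be sketched with basic file I/O. This example writes a small hypothetical file whose column names are on row 2, skips the rows before it with fgetl, and splits the names with strsplit. The file contents and the headerRow and delimiter values are assumptions for illustration.

```matlab
% Write a small hypothetical file: column names are on the second header row
fname = [tempname '.csv'];
fid = fopen(fname, 'w');
fprintf(fid, 'Logger export v1.0\n');
fprintf(fid, 'Time,Temperature,Pressure\n');
fprintf(fid, '0.0,21.5,101.3\n0.5,21.7,101.2\n');
fclose(fid);

headerRow = 2;    % row holding the column names (assumed)
delimiter = ',';  % delimiter (assumed)

% Skip the lines before the header row, then read the header row itself
fid = fopen(fname, 'r');
for k = 1:headerRow - 1
    fgetl(fid);
end
headerLine = fgetl(fid);
fclose(fid);
delete(fname);

% Split the header row into individual column names
columnHeaders = strsplit(headerLine, delimiter);
disp(columnHeaders);
```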
In the next panel, up to 100 rows of data are read in and displayed in a grid. By default the data is read in as strings, to ensure that all of it can be read. In most cases you will be reading numerical data, so you will need to specify a numerical data type. This is the type in which the data will be stored in the MATLAB workspace.
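In textscan terms, reading as strings corresponds to the %s conversion and reading as double to %f. The sketch below reads the same hypothetical file both ways so the difference in the resulting classes is visible; it is not the code the tool itself runs.

```matlab
% Hypothetical sample file with one header line
fname = [tempname '.csv'];
fid = fopen(fname, 'w');
fprintf(fid, 'Time,Temperature\n0.0,21.5\n0.5,21.7\n1.0,22.0\n');
fclose(fid);

% Read every column as strings: this succeeds whatever the content
fid = fopen(fname, 'r');
asText = textscan(fid, '%s %s', 'Delimiter', ',', 'HeaderLines', 1);
fclose(fid);

% Read the same columns as double-precision numbers instead
fid = fopen(fname, 'r');
asNumbers = textscan(fid, '%f %f', 'Delimiter', ',', 'HeaderLines', 1);
fclose(fid);
delete(fname);

fprintf('As strings: %s, as numbers: %s\n', class(asText{2}), class(asNumbers{2}));
```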
Select a column with the mouse (or several columns with CTRL+click). You can select all columns by clicking the top-left corner of the grid.
Choose data type or ignore
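Then click one of the radio buttons below to select a data type. The tool checks that the column can be read with that data type by reading the file; character strings, for example, cannot be read in as numerical data. If the chosen data type works, it is shown alongside the column name in the header. Single and integer data types take up less storage than the usual doubles: a single uses 4 bytes per value and the integer types 1 to 8 bytes, compared with 8 bytes for a double. Integer types can only hold whole numbers within their range, however, and singles keep only about 7 significant digits.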
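The storage difference can be seen with textscan's typed conversion specifiers (%f for double, %f32 for single, %d16 for int16). This sketch reads 1000 hypothetical integer values three ways and compares their sizes with whos. Which specifiers the tool's generated code uses for each radio button is an assumption.

```matlab
% 1000 hypothetical integer values, one per line
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, '%d\n', 1:1000);
fclose(fid);

% Read the same column three times with different conversion specifiers
fid = fopen(fname, 'r');
asDouble = textscan(fid, '%f');     % double: 8 bytes per value
frewind(fid);
asSingle = textscan(fid, '%f32');   % single: 4 bytes per value
frewind(fid);
asInt16 = textscan(fid, '%d16');    % int16: 2 bytes per value
fclose(fid);
delete(fname);

% Compare the storage used by each column
colD = asDouble{1}; colS = asSingle{1}; colI = asInt16{1};
sD = whos('colD'); sS = whos('colS'); sI = whos('colI');
fprintf('double: %d bytes, single: %d bytes, int16: %d bytes\n', ...
    sD.bytes, sS.bytes, sI.bytes);
```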
Choose ignore if you do not want to read a column.
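In a textscan format string, a column is typically skipped by adding an asterisk to its conversion specifier, for example %*s. Whether the tool's generated code uses exactly this form is an assumption; the sample file and column layout below are hypothetical.

```matlab
% Hypothetical file with a text column we do not want
fname = [tempname '.csv'];
fid = fopen(fname, 'w');
fprintf(fid, 'ID,Comment,Value\n1,ok,3.5\n2,late,4.0\n');
fclose(fid);

% %*s reads the Comment field and discards it, so no cell is returned for it
fid = fopen(fname, 'r');
C = textscan(fid, '%d32 %*s %f', 'Delimiter', ',', 'HeaderLines', 1);
fclose(fid);
delete(fname);

fprintf('Columns returned: %d\n', numel(C));
```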
If a column contains date/time strings and you wish to read them as date/time numbers (MATLAB serial date numbers), specify an anonymous function that calls the datenum function and select the corresponding radio button.
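As an example of the kind of anonymous function meant here, the sketch below wraps datenum and applies it to a cell array of date strings, such as a %s column would produce. The format 'yyyy-mm-dd HH:MM:SS' and the sample strings are assumptions; use a format that matches your file. How the tool applies the function internally is not shown.

```matlab
% Hypothetical conversion function; the format must match the file's date layout
toDateNum = @(s) datenum(s, 'yyyy-mm-dd HH:MM:SS');

% Date/time strings as they might come out of a string column
dateStrings = {'2012-03-01 10:00:00'; '2012-03-01 10:30:00'};

% datenum accepts a cell array of strings and returns one serial date number per entry
t = toDateNum(dateStrings);
fprintf('Interval between samples: %g minutes\n', (t(2) - t(1)) * 24 * 60);
```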
Note: There is a known bug in this tool: after you select a radio button, both the button and the column become deselected.
In the final panel, you specify a number of import options.
Number of data rows to read
Enter the number of data rows to be read. When you click outside the text box, the tool estimates the size of the resulting array in the workspace and the time needed to import it, and displays both. If you want to read the entire file but do not know how long it is, press the Count Lines in File button. It counts the rows in the file, works out how many data lines can be read, and fills in the text box. This may take a while for large files.
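The line count and the row limit map onto ordinary MATLAB file I/O. This sketch counts lines with fgetl as a simple stand-in for Count Lines in File (the tool's actual method is not documented here), makes a rough size estimate assuming two double columns, and passes the row count as textscan's third argument, which sets how many times the format is applied. The sample file is hypothetical.

```matlab
% Hypothetical file: one header line followed by 500 data rows
fname = [tempname '.csv'];
fid = fopen(fname, 'w');
fprintf(fid, 'x,y\n');
fprintf(fid, '%d,%d\n', [1:500; (1:500).^2]);
fclose(fid);

% Count lines by reading one line at a time (slow for very large files)
fid = fopen(fname, 'r');
nLines = 0;
while ischar(fgetl(fid))
    nLines = nLines + 1;
end
fclose(fid);

headerLines = 1;
nRows = nLines - headerLines;   % data rows available

% Rough estimate: rows x columns x 8 bytes per double (ignores array overhead)
estimatedBytes = nRows * 2 * 8;

% Read exactly nRows rows
fid = fopen(fname, 'r');
C = textscan(fid, '%f %f', nRows, 'Delimiter', ',', 'HeaderLines', headerLines);
fclose(fid);
delete(fname);
```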
Specify data structure
Next, choose the data structure that you would like to read the data into. Options include:
- A row cell array, which is the most memory-efficient option when importing multiple data types.
- A 2D numerical array, which is the most memory-efficient option overall but is only possible if all the data is numerical and of the same type.
- A 2D cell array, which is the least memory-efficient option but lets you view different data types together, with headers, in the Array Editor, much like a spreadsheet.
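The three layouts relate to each other as in this sketch, which starts from textscan's native output (a 1-by-N row cell array with one cell per column). The sample data and the header names a, b and c are hypothetical, and all columns are double so that the 2D numerical array is possible.

```matlab
% Hypothetical two-row, three-column data held in a string
str = sprintf('1,2.5,3\n4,5.5,6\n');
columnHeaders = {'a', 'b', 'c'};

% 1) Row cell array: one cell per column
rowCell = textscan(str, '%f %f %f', 'Delimiter', ',');

% 2) 2D numerical array: requires every column to share one numeric type
numArray = [rowCell{:}];

% 3) 2D cell array with the headers as the first row
%    (mixed-type columns would need converting column by column)
cellArray = [columnHeaders; num2cell(numArray)];
```

The 2D cell array stores every value as a separate array inside its own cell, which is why it costs the most memory.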
To import the data now...
...click Import Data. Variables called data and columnHeaders are created in the base workspace. If there were no column headers in the file then columnHeaders will contain dummy strings column1, column2, etc.
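GUI callbacks run in their own function workspaces, so a standard way to create variables in the base workspace is assignin. The sketch below assumes that approach (the tool's own implementation may differ) and also builds dummy names following the column1, column2 pattern; the data values are hypothetical.

```matlab
% Hypothetical imported data from a file without column headers
data = {[1; 2; 3], [4.5; 5.5; 6.5]};

% Build dummy names column1, column2, ...
nCols = numel(data);
columnHeaders = arrayfun(@(k) sprintf('column%d', k), 1:nCols, 'UniformOutput', false);

% Create both variables in the base workspace
assignin('base', 'data', data);
assignin('base', 'columnHeaders', columnHeaders);
```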
To generate the code...
...click Generate Code.
This generates a function that reads the data and opens it in the Editor. The function does not currently return a column header variable. You can still get the column header information by choosing the 2D cell array option, in which case the headers are included with the data.
If an imported variable takes up too much memory, or you get an "out of memory" error, consider importing fewer rows of data, or go back to the previous panel and specify a smaller data type.
Click Close to exit the tool.