Allow quick and efficient access to a large text file with nonuniform-size entries
The BioIndexedFile class allows access to text files with nonuniform-size entries, such as sequences, annotations, and cross-references to data sets. It lets you quickly and efficiently access this data without loading the source file into memory.
This class lets you access individual entries or a subset of entries when the source file is too big to fit into memory. You can access entries using indices or keys. You can read and parse one or more entries using provided interpreters or a custom interpreter function.
BioIFobj = BioIndexedFile(Format, SourceFile) constructs a BioIndexedFile object that indexes the contents of SourceFile following the parsing rules defined by Format, where SourceFile and Format are strings specifying a text file and a file format, respectively. It also constructs an auxiliary index file to store information that allows efficient, direct access to SourceFile. The index file by default is stored in the same location as the source file and has the same name as the source file, but with an IDX extension. The BioIndexedFile constructor uses the index file to construct subsequent objects from SourceFile, which saves time.
BioIFobj = BioIndexedFile(Format, SourceFile, IndexDir) specifies the relative or absolute path to a folder to use when searching for or saving the index file.
BioIFobj = BioIndexedFile(Format, SourceFile, IndexFile) specifies a file name, optionally including a relative or absolute path, to use when searching for or saving the index file.
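As a rough illustration of the three constructor forms above, the following sketch writes a small tab-delimited file to the temporary folder and indexes it. The file names demo_table.txt and demo_table_custom.idx and the three sample rows are made up for this example, and 'Verbose' is set to false only to keep the output quiet. Because the source file here lives in tempdir, the first two forms resolve to the same folder.

```matlab
% Create a tiny tab-delimited source file (hypothetical sample data)
srcFile = fullfile(tempdir, 'demo_table.txt');
fid = fopen(srcFile, 'w');
fprintf(fid, 'geneA\t10\ngeneB\t20\ngeneC\t30\n');
fclose(fid);

% Form 1: index file is stored next to the source file
objDefault = BioIndexedFile('TABLE', srcFile, 'Verbose', false);

% Form 2: index file is searched for, or saved, in the given folder
objDir = BioIndexedFile('TABLE', srcFile, tempdir, 'Verbose', false);

% Form 3: index file is given by an explicit file name (assumed name)
idxFile = fullfile(tempdir, 'demo_table_custom.idx');
objFile = BioIndexedFile('TABLE', srcFile, idxFile, 'Verbose', false);

% Confirm where each index file ended up and how many entries were indexed
disp(objDefault.IndexFile)
disp(objFile.IndexFile)
disp(objFile.NumEntries)
```

Checking the IndexFile property after construction is a simple way to confirm which index file the object is actually using.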
BioIFobj = BioIndexedFile(..., 'PropertyName', PropertyValue) constructs the object using options, specified as property name/property value pairs.
BioIFobj = BioIndexedFile(...,'IndexedByKeys', IndexedByKeysValue) specifies whether you can access the object BioIFobj using keys. Choices are true (default) or false.
BioIFobj = BioIndexedFile(...,'MemoryMappedIndex', MemoryMappedIndexValue) specifies whether the constructor stores the indices in the auxiliary index file and accesses them via memory maps (true) or loads the indices into memory at construction time (false). Default is true.
BioIFobj = BioIndexedFile(...,'Interpreter', InterpreterValue) specifies a handle to a function that the read method uses when parsing entries in the source file. This interpreter function must accept a single string of one or more concatenated entries and return a structure or an array of structures containing the interpreted data.
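A minimal sketch of an interpreter that meets the contract described above (one string in, a structure array out) might look like the following. It assumes each entry is one line with two tab-separated columns, a name and a number; the field names Name and Value and the helper toStruct are hypothetical and chosen only for this illustration.

```matlab
% Helper that turns the two-column textscan output into a struct array
toStruct = @(c) struct('Name', c{1}, 'Value', num2cell(c{2}));

% Interpreter: accepts a single string holding one or more concatenated
% entries (assumed here to be newline-separated, tab-delimited lines)
interp = @(txt) toStruct(textscan(txt, '%s%f', 'Delimiter', '\t'));

% Try the interpreter on its own, without any BioIndexedFile object
sample = sprintf('geneA\t10\ngeneB\t20\n');
parsed = interp(sample);
disp(parsed(2).Name)
```

A handle such as interp could then be passed as the 'Interpreter' value when constructing the object, or assigned to the Interpreter property afterwards.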
BioIFobj = BioIndexedFile(...,'Verbose', VerboseValue) controls the display of the status of the object construction. Choices are true (default) or false.
Note: The following property name/property value pairs apply only when both of the following are true:
For source files with application-specific formats, the following property name/property value pairs are pre-defined and you cannot change them.
BioIFobj = BioIndexedFile(...,'KeyColumn', KeyColumnValue) specifies the column in the 'TABLE' or 'MRTAB' file that contains the keys. Default is the first column.
BioIFobj = BioIndexedFile(...,'KeyToken', KeyTokenValue) specifies a string that occurs in each entry before the key, for 'FLAT' files that contain keys. Default is ' ', which indicates the key is the first string in each entry and is delimited by blank spaces.
BioIFobj = BioIndexedFile(...,'HeaderPrefix', HeaderPrefixValue) specifies a prefix that denotes header lines in the source file so the constructor ignores them when creating the object. Default is [], which means the constructor does not check for header lines in the source file.
BioIFobj = BioIndexedFile(...,'CommentPrefix', CommentPrefixValue) specifies a prefix that denotes comment lines in the source file so the constructor ignores them when creating the object. Default is [], which means the constructor does not check for comment lines in the source file.
BioIFobj = BioIndexedFile(...,'ContiguousEntries', ContiguousEntriesValue) specifies whether entries are on contiguous lines in the source file or are separated by empty lines or comment lines. Choices are true or false (default).
BioIFobj = BioIndexedFile(...,'TableDelimiter', TableDelimiterValue) specifies a delimiter symbol to use as a column separator for SourceFile when Format is 'TABLE' or 'MRTAB'. Choices are '\t' (horizontal tab), ' ' (blank space), or ',' (comma). Default is '\t'.
BioIFobj = BioIndexedFile(...,'EntryDelimiter', EntryDelimiterValue) specifies a delimiter symbol to use as an entry separator for SourceFile when Format is 'FLAT'. Default is '//'.
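To show how several of these format options combine, here is a hedged sketch for a comma-separated table whose keys sit in the second column and whose header line starts with #. The file name demo_scores.csv and its contents are invented for the example; it also assumes no index file from an earlier run is already present for this source file.

```matlab
% Write a small comma-separated file with one header line (sample data)
srcFile = fullfile(tempdir, 'demo_scores.csv');
fid = fopen(srcFile, 'w');
fprintf(fid, '# sample,gene,score\n');
fprintf(fid, 's1,geneA,0.5\n');
fprintf(fid, 's2,geneB,0.7\n');
fclose(fid);

% Index the file: comma as column separator, keys in column 2,
% lines starting with '#' treated as header lines and skipped
obj = BioIndexedFile('TABLE', srcFile, tempdir, ...
    'TableDelimiter', ',', 'KeyColumn', 2, ...
    'HeaderPrefix', '#', 'Verbose', false);

% List the keys that were found and fetch one entry by key
keys = getKeys(obj);
entry = getEntryByKey(obj, 'geneB');
disp(keys)
disp(entry)
```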
FileFormat
File format of the source file. This information is read only. Possible values are:
IndexedByKeys
Whether or not the entries in the source file can be indexed by an alphanumeric key. This information is read only.
IndexFile
Path and file name of the auxiliary index file. This information is read only. Use this property to confirm the name and location of the index file associated with the object.
InputFile
Path and file name of the source file. This information is read only. Use this property to confirm the name and location of the source file from which the object was constructed.
Interpreter
Handle to a function used by the read method to parse entries in the source file. This interpreter function must accept a single string of one or more concatenated entries and return a structure or an array of structures containing the interpreted data. Set this property when your source file has a 'TABLE', 'MRTAB', or 'FLAT' format. When your source file is an application-specific format such as 'SAM', 'FASTQ', or 'FASTA', the default is a function handle appropriate for that file type, and you typically do not need to change it.
MemoryMappedIndex
Whether the indices to the source file are stored in a memory-mapped file or in memory.
NumEntries
Number of entries indexed by the object. This information is read only.
| getDictionary | Retrieve reference sequence names from SAM-formatted source file associated with BioIndexedFile object |
| getEntryByIndex | Retrieve entries from source file associated with BioIndexedFile object using numeric index |
| getEntryByKey | Retrieve entries from source file associated with BioIndexedFile object using alphanumeric key |
| getIndexByKey | Retrieve indices from source file associated with BioIndexedFile object using alphanumeric key |
| getKeys | Retrieve alphanumeric keys from source file associated with BioIndexedFile object |
| getSubset | Create object containing subset of elements from BioIndexedFile object |
| read | Read one or more entries from source file associated with BioIndexedFile object |
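The following sketch strings a few of these methods together on a small made-up table, to show how keys, numeric indices, and subsets relate. The file name demo_methods.txt and its three rows are assumptions for illustration only.

```matlab
% Build a small tab-delimited source file (hypothetical sample data)
srcFile = fullfile(tempdir, 'demo_methods.txt');
fid = fopen(srcFile, 'w');
fprintf(fid, 'geneA\t10\ngeneB\t20\ngeneC\t30\n');
fclose(fid);
obj = BioIndexedFile('TABLE', srcFile, tempdir, 'Verbose', false);

% Keys of all indexed entries, returned as a cell array of strings
keys = getKeys(obj);

% Numeric index of the entry (or entries) whose key is 'geneB'
idx = getIndexByKey(obj, 'geneB');

% Raw text of the first and third entries, without any parsing
raw = getEntryByIndex(obj, [1 3]);

% New object that exposes only the first and third entries
sub = getSubset(obj, [1 3]);

disp(keys)
disp(idx)
disp(raw)
disp(sub.NumEntries)
```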
Value. To learn how value classes affect copy operations, see Copying Objects in the MATLAB Programming Fundamentals documentation.
Construct a BioIndexedFile object to access a table containing cross references between gene names and gene ontology (GO) terms:
% Create variable containing full absolute path of source file
sourcefile = which('yeastgenes.sgd');
% Construct a BioIndexedFile object from the source file. Indicate
% the source file is a tab-delimited file where contiguous rows
% with the same key are considered a single entry. Store the
% index file in the Current Folder. Indicate that keys are
% located in column 3 and that header lines are prefaced with !
gene2goObj = BioIndexedFile('mrtab', sourcefile, '.', ...
    'KeyColumn', 3, 'HeaderPrefix','!')
Return the GO term from all entries that are associated with the gene YAT2:
% Access entries that have a key of YAT2
YAT2_entries = getEntryByKey(gene2goObj, 'YAT2');
% Adjust object interpreter to return only the column containing
% the GO term
gene2goObj.Interpreter = @(x) regexp(x,'GO:\d+','match')
% Parse the entries with a key of YAT2 and return all GO terms
% from those entries
GO_YAT2_entries = read(gene2goObj, 'YAT2')
GO_YAT2_entries =
    'GO:0004092'    'GO:0005737'    'GO:0006066'    'GO:0006066'    'GO:0009437'
fastaread | fastqread | genbankread | memmapfile | samread
