"Heywood " <heywoodj123@yahoo.com> wrote in message <gmcl8m$5mq$1@fred.mathworks.com>...
> I've run into a parsing task that is driving me nuts. I have a string like this:
>
> line = '125746.100,A,010.0600,N,01000.31,E,0.00,0.00,020506,,,A'
>
> and would like to parse it into a vector like this:
>
> [125746.1 65 10.06 78 1000.31 69 0 0 20506 NaN NaN 65]
>
> that is, floats and integers get parsed directly, letters get parsed as their ASCII values, and null fields (consecutive delimiters) get parsed as NaNs. It's perfectly OK for the result vector to be all doubles -- so no problem rendering integers as floats. But because this task will be parsing huge files, this needs to be as fast as possible.
>
> The tricky thing is that nulls can appear in fields that, when populated, are either floats or characters. So simply replacing ',,' with something like ',NaN,' using STRREP won't work, since the parsing will stop at the first place where a %c specifier encounters a NaN.
>
> The fastest almost-working solution I've found so far is along the lines of
>
> sscanf(line,'%f,%c,%f,%c,%f,%c,%f,%f,%f,%f,%c,%c')'
>
> but that stops parsing at the first null field (after 020506). Replacing the %c specifiers with %s doesn't help.
>
> The only fully-working solution I've found so far is:
>
> dummy = textscan(regexprep(line(8:end-3),',',char(1)),'%s','delimiter',char(1));
> ind = find(~strcmpi(dummy{:}','') & isnan(str2double(dummy{:}')));
> result = str2double(dummy{:})'; result(ind) = double(cell2mat(dummy{1}(ind)));
>
> ... but a tic/toc timing test shows this to be about 60X slower than attempts using just SSCANF and/or TEXTSCAN.
>
> Can anyone suggest a faster way to accomplish the above, without all the find/str2double/cell2mat gymnastics?
>
> Gratefully,
>
> HJ
Does this give you something to work on?
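STR = '1.100,A,010.0600,N,010.31,E,0.00,0.00,0206,,,A' ;
s = strread(STR,'%s','delimiter',',') ; % split on commas into a cell array of strings (textread does the same for files)
f = str2double(s) ; % numeric fields convert; letters and empty fields become NaN
q = isnan(f) & ~cellfun('isempty',s) ; % non-empty fields that failed to convert, i.e. the letters
f(q) = [s{q}] ; % assigning chars into a double array stores their ASCII codes (assumes one letter per field)
% result as a row vector; empty fields stay NaN
f.'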
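
Applied to the line from the original question, the same idea can be sketched as below. This is only a sketch under stated assumptions: it splits with regexp and the 'split' option instead of strread (a substitution, so it needs a release that supports that option), it assumes every letter field holds exactly one character, and it makes no claim about speed on large files. The variable names are illustrative.

```matlab
% Sketch only: regexp(...,'split') stands in for strread here (an assumption).
line = '125746.100,A,010.0600,N,01000.31,E,0.00,0.00,020506,,,A';

s = regexp(line, ',', 'split');          % 1x12 cell array; empty fields are kept as ''
f = str2double(s);                       % numbers convert; letters and '' become NaN
q = isnan(f) & ~cellfun('isempty', s);   % non-empty fields that did not convert
f(q) = double([s{q}]);                   % one letter per field assumed: store its ASCII code

disp(f)                                  % letters appear as 65 (A), 78 (N), 69 (E); empty fields stay NaN
```

Fields that are empty remain NaN because they fail the ~isempty test and are never overwritten.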
hth
Jos