Quickest way to convert numeric array to a cell array of strings
In general terms I need to process many text files with mixed data types, both numeric and text. For the numeric data, I need to process them for quality since I deal with many sources of data and there is no standard I can strictly enforce. The data could be in integer, floating point, or some complex format, and I need to clean these files.
The input data is quickly converted to numbers with textscan. However, converting the newly created numbers back to strings takes too long (6 times longer than converting the input text to numbers).
I'm currently using a combination of sprintf and textscan to convert a numeric array of doubles into a cell array of strings without losing precision.
numstr = sprintf(sprntffrmtstr, num);
vlnmcllnst = textscan(numstr,txtscnfrmtstr,'delimiter',' ');
vlcll = vlnmcllnst{1};
Lines 1 and 2 are taking up a significant amount of time in the profiler. I use this scheme in a for loop to process the output of the textscan call that converted the input text to numbers. Each iteration handles one column vector of numbers, and both numstr and vlnmcllnst were pre-allocated before the loop.
Can someone speed this up?
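For readers who want to reproduce the scheme, here is a minimal sketch. The post does not show the two format strings or the data, so the values of `num`, `sprntffrmtstr` and `txtscnfrmtstr` below are assumptions, and the `strtrim` call is added here only to drop the trailing separator.

```matlab
% Hypothetical inputs: the post does not show these values.
num           = [1; 2.5; pi; 1e-10];   % one column of doubles
sprntffrmtstr = '%.17g ';              % assumed: 17 significant digits, space-separated
txtscnfrmtstr = '%s';                  % assumed: read each token back as a string

% Number -> one long string -> cell array of strings.
numstr     = strtrim(sprintf(sprntffrmtstr, num));   % drop the trailing space
vlnmcllnst = textscan(numstr, txtscnfrmtstr, 'delimiter', ' ');
vlcll      = vlnmcllnst{1};            % N-by-1 cell array of strings

% Round-trip check: the strings parse back to the original doubles.
roundTripOk = isequal(str2double(vlcll), num);
```

With `%.17g`, every double should survive the round trip, so `roundTripOk` is expected to be true; a shorter format such as `%g` would not guarantee that.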
6 Comments
Cedric
on 11 May 2013
If I understand correctly, you get this type of CSV file from somewhere and convert it into cell arrays using TEXTSCAN, which include both numeric and alphanumeric types, and then you want to convert the numeric values back into strings? Is the purpose to export them to new/updated CSV files?
You want to be flexible, but it seems to me that all columns (channels?) in your CSV file have the same structure. Do you determine dynamically (and how?) which columns are numeric and which are not? And are you building the formatspec dynamically for SPRINTF/TEXTSCAN (as you seem to have 189 columns)?
Answers (2)
Cedric
on 12 May 2013
Edited: Cedric
on 12 May 2013
Ok, to be honest, my issue at this point is that this approach is not that common, and I can't figure out whether you are an experienced programmer in other languages and you know that it is the way to go - in which case I should focus on optimizing just a few lines of your code - or if you are less experienced - in which case I should discuss the general approach.
As a typical Swiss guy, I'll just take the central path ;-) and propose to discuss some simple code, so we have something concrete for brainstorming.
In the following, I read your CSV file, multiply all numeric values by 2, and export the outcome (including numeric and text values). I try to keep it simple at this stage, so I am using regexp to split the first line of data instead of making some more complicated f/text-scan/f analysis.
fname_in = 'exampledata.csv' ;
fname_out = 'exampledata_new.csv' ;
% - Open input/output files.
fid_in = fopen(fname_in, 'r') ;
fid_out = fopen(fname_out, 'w') ;
% - Copy header.
line = fgetl(fid_in) ;
fprintf(fid_out, '%s\n', line) ; % fgetl strips the newline, so add it back.
% - Analyse first line of data, define # of columns
% and which ones are numeric.
line = fgetl(fid_in) ;
buffer = regexp(line, ',', 'split') ;
nCol = numel(buffer) ;
data = str2double(buffer) ;
isnum = ~isnan(data) ; % Vector, flag numeric columns.
% - Build export format.
fmt = cell(1, nCol) ;
fmt(:) = {'%s,'} ;
fmt(isnum) = {'%g,'} ; % Default format for numeric
fmt = [fmt{:}] ; % data is %g at this point.
fmt = [fmt(1:end-1), '\n'] ;
% - Process rest of the file.
while true
    % Process numeric values.
    data_new = 2 * data(isnum) ;
    % Export modified line.
    buffer(isnum) = num2cell(data_new) ;
    fprintf(fid_out, fmt, buffer{:}) ;
    % Exit if end of file.
    if feof(fid_in), break ; end
    % Read line and extract numeric data.
    buffer = regexp(fgetl(fid_in), ',', 'split') ;
    data = str2double(buffer) ;
end
% - Close files, free resources, etc.
fclose(fid_in) ;
fclose(fid_out) ;
While this code is not robust, it has some flexibility in the sense that the number of columns can vary and the nature of each column (numeric/text) is detected.
Now, if I understand correctly, you want to process the first line a bit more in order to get more information about the format of each column (so you can reproduce it exactly in the output)?
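To make the column detection and format building concrete, here is a small worked sketch on a single line. The sample line `'abc,1.5,xyz,42'` is made up for illustration and is not taken from the actual file; the steps are the same ones used in the code above.

```matlab
% Hypothetical data line with two text and two numeric columns.
sample = 'abc,1.5,xyz,42' ;

% Split on commas and flag the fields that parse as numbers.
buffer = regexp(sample, ',', 'split') ;
isnum  = ~isnan(str2double(buffer)) ;      % [0 1 0 1]

% Build the export format: %s for text, %g for numeric columns.
fmt = cell(1, numel(buffer)) ;
fmt(:) = {'%s,'} ;
fmt(isnum) = {'%g,'} ;
fmt = [fmt{:}] ;
fmt = [fmt(1:end-1), '\n'] ;               % '%s,%g,%s,%g\n'

% Double the numeric fields and format the line as it would be exported.
buffer(isnum) = num2cell(2 * str2double(buffer(isnum))) ;
out = sprintf(fmt, buffer{:}) ;            % 'abc,3,xyz,84' plus a newline
```

Note that `%g` keeps only a limited number of significant digits, so a wider format would be needed wherever the original precision has to be preserved.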
0 Comments
Will
on 16 May 2013
1 Comment
Cedric
on 16 May 2013
Edited: Cedric
on 16 May 2013
Regexp was not the point of the code above; it was a simple way to achieve data split/extraction until I fully understand what you want(ed) to achieve, especially on the part that builds the character string to output (precisely the "number to text scheme" that you refer to).
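As one possible shape for the "number to text" step itself, the sketch below converts a numeric column to a cell array of strings with a single sprintf call followed by a split on newlines. This is an illustration, not the approach either poster settled on; the data and the `%.17g` format are assumptions, and whether it is actually faster than the sprintf/textscan pair would have to be checked in the profiler.

```matlab
% Hypothetical column of doubles.
num = [1; 2.5; pi; 1e-10] ;

% One sprintf call writes every value on its own line.
str = sprintf('%.17g\n', num) ;

% Drop the trailing newline, then split into an N-by-1 cell array of strings.
strs = regexp(strtrim(str), '\n', 'split')' ;

% Round-trip check: 17 significant digits should reproduce each double.
roundTripOk = isequal(str2double(strs), num) ;
```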