Matlab coder str2num alternatives?

Arun on 30 Aug 2018
Edited: dpb on 31 Aug 2018
I have this data stored in a character array. I used fread and removed the headers to get this data from a text file. (I can't use textscan or fileread because MATLAB Coder doesn't support them, and I find it difficult to call fscanf through coder.ceval.)
unsorted_data 1x767 char
1
-8.3033E-01 -4.2882E+00 -8.4900E+00 -4.0889E-01 -4.2372E+00 -1.3796E+00
-1.1903E+00 -3.9289E+00 -6.2813E+00 -9.2360E-01 -2.8582E+00 -1.2460E+00
2
-3.6261E+00 -4.7218E+00 1.4143E+01 1.6041E+00 -5.1505E+00 1.6737E+00
-3.9131E+00 -5.9048E+00 -2.7256E+01 2.0434E+00 -1.6630E+01 5.5229E+00
3
2.2578E+01 -1.7633E-02 2.1166E+01 2.8041E-01 1.8919E+00 2.4702E+01
6.0947E+01 5.1242E+00 4.0910E+01 -1.0404E+01 -4.8758E+00 5.0202E+01
I need to extract every third row (R1, R4, R7, R10, ...) as a double [Nx1], and a second matrix holding the other rows of data [Nx6].
So far I'm able to extract the first part (R1, R4, R7, R10, ...) into the "number" variable, but I get NaNs for the "Vector" variable. This would work with str2num, but str2num is not supported by MATLAB Coder.
remain = unsorted_data;
data_str = string([]);
while (remain ~= string())
    [token,remain] = strtok(remain, char(10));
    data_str = [data_str ; token];
end
data = str2double(data_str);
len_data = length(data);
cnum = 1;
cvector = 1;
vector_rows = 2;
number = zeros(len_data/(vector_rows+1),1);
Vector = zeros(len_data*vector_rows/(vector_rows+1),1);
for i = 1:len_data
    num_loc = (vector_rows+1)*(cnum-1)+1;
    if i == num_loc
        number(cnum,1) = data(i,1);
        cnum = cnum+1;
    else
        Vector(cvector,1) = data(i,1);
        cvector = cvector+1;
    end
end
I'm looking to get two matrices of this data in the right format and, secondly, to make this more efficient by replacing the "while" loop, as it takes too much time to process 5 million lines. Any help is greatly appreciated.
  4 Comments
dpb on 30 Aug 2018
What about fgetl and parse a line at a time? Is it supported?
Arun on 31 Aug 2018
No, it's not. Here's the list of all functions supported for C


Accepted Answer

Stephen23 on 31 Aug 2018
Edited: Stephen23 on 31 Aug 2018
As far as I can tell from that list of coder-supported functions, something like this should work. The basic idea is to split the char vector into two preallocated cell arrays, then convert to numeric. Given your 1x767 char vector:
  • identify whitespace using isstrprop.
  • use diff and find to get indices of the numbers.
  • use eq and find to locate newline characters.
  • preallocate two cell arrays (perhaps transposed).
  • use a for loop over the indices and collect the char numbers into the cell arrays.
  • apply str2double to both cell arrays.
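The steps above can be sketched roughly as follows. This is only a sketch under assumptions: the sample string txt and all variable names are made up for illustration, it assumes one ID line followed by two data lines of six values each, and it has not been checked against MATLAB Coder's cell-array restrictions.

```matlab
% Hypothetical sample: one ID line followed by two data lines of six values.
txt = sprintf(['1\n' ...
    '-8.3033E-01 -4.2882E+00 -8.4900E+00 -4.0889E-01 -4.2372E+00 -1.3796E+00\n' ...
    '-1.1903E+00 -3.9289E+00 -6.2813E+00 -9.2360E-01 -2.8582E+00 -1.2460E+00\n']);

% Steps 1-2: whitespace mask, then token boundaries from its differences.
isSpace  = isstrprop(txt, 'wspace');             % true for blanks and newlines
edges    = diff(double([true, isSpace, true]));
tokStart = find(edges == -1);                    % first character of each token
tokEnd   = find(edges == 1) - 1;                 % last character of each token

% Step 3: line number of every token from the newline positions.
lineOfChar = cumsum(double(txt == char(10))) + 1;
isId = mod(lineOfChar(tokStart) - 1, 3) == 0;    % tokens on lines 1, 4, 7, ...

% Step 4: preallocate both cell arrays (the data one transposed, 6-by-N,
% so that linear indexing fills it in reading order).
numC = cell(sum(isId), 1);
vecC = cell(6, sum(~isId) / 6);

% Step 5: collect the char tokens.
cn = 0;  cv = 0;
for k = 1:numel(tokStart)
    tok = txt(tokStart(k):tokEnd(k));
    if isId(k)
        cn = cn + 1;  numC{cn} = tok;
    else
        cv = cv + 1;  vecC{cv} = tok;
    end
end

% Step 6: convert; transpose the data back to N-by-6.
number = str2double(numC);
Vector = str2double(vecC).';
```

Each call to str2double here sees exactly one number per cell, which is what avoids the NaNs in the question.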
  2 Comments
Arun on 31 Aug 2018
Edited: Arun on 31 Aug 2018
Can you show step 3, finding the newline characters?
Also, would this process be faster than the while loop in the above code?
dpb on 31 Aug 2018
Edited: dpb on 31 Aug 2018
ix=find(unsorted_data==char(10));
unsorted_data is just an array of characters; internally they're just bytes, so they can be operated on as if they were numbers (which they are internally; only for the user interface do they have a different interpretation).
That will be very fast; in the loop
while (remain ~= string())
    [token,remain] = strtok(remain, char(10));
    data_str = [data_str ; token];
end
you're using dynamic reallocation by concatenating each new token onto the previous data to build the array; that is about the most inefficient operation there is in Matlab as it forces reallocation and copy every pass. If the size gets large, the bottleneck really begins to show up.
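For contrast, here is a minimal sketch of the same line splitting with the output sized once up front. The sample text and variable names are made up for illustration, and it assumes LF-only line endings with a trailing line feed on the last line.

```matlab
txt = sprintf('1\n-8.3E-01 2.5E+00\n2\n');   % hypothetical sample text
ix  = find(txt == char(10));                 % positions of the line feeds

lines = cell(numel(ix), 1);                  % allocated once, never grown
i1 = 1;
for k = 1:numel(ix)
    lines{k} = txt(i1:ix(k)-1);              % text of line k, without the LF
    i1 = ix(k) + 1;                          % first character of the next line
end
```

Because the number of lines is known from ix before the loop starts, nothing is reallocated or copied per pass.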
ADDENDUM
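Try this for starters. I loaded the file as char() without the headers into txt and turned it into a row rather than a column vector for the following:
ix=find(txt==char(10));   % positions of the line feeds
i1=1;
for i=1:10
    i2=ix(i)-2;           % drops the line feed and the character before it (the CR of a CR+LF ending)
    s=txt(i1:i2);
    disp(s)
    i1=i2+3;              % first character after the line feed
end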
will find/break out the first 10 lines/records.
It is somewhat more of a pain when Coder doesn't support any of the formatted read functions. The problem with just applying str2double to the strings returned above is that, except for the 1:3:N single-value lines, each string holds several numbers, and str2double converts only one number per string; it returns NaN for the rest of the records because the whole string isn't a single value.
What you need to do is iterate over ix in steps of three (1:3:length(ix)) and, inside the loop, advance to the next two records, splitting them at their fixed-column positions and passing each field to str2double.
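A rough sketch of that loop follows. The field width W = 12 and the LF-only line endings are assumptions (the real file's column positions need to be checked), and the sample text and variable names are illustrative only.

```matlab
W = 12;  NF = 6;      % ASSUMED field width and number of fields per data line
LF = char(10);
% Hypothetical sample: two records, each an ID line plus two data lines.
txt = ['1' LF ...
    ' -8.3033E-01 -4.2882E+00 -8.4900E+00 -4.0889E-01 -4.2372E+00 -1.3796E+00' LF ...
    ' -1.1903E+00 -3.9289E+00 -6.2813E+00 -9.2360E-01 -2.8582E+00 -1.2460E+00' LF ...
    '2' LF ...
    ' -3.6261E+00 -4.7218E+00  1.4143E+01  1.6041E+00 -5.1505E+00  1.6737E+00' LF ...
    ' -3.9131E+00 -5.9048E+00 -2.7256E+01  2.0434E+00 -1.6630E+01  5.5229E+00' LF];

lineEnd   = find(txt == LF) - 1;             % last character of each line
lineStart = [1, lineEnd(1:end-1) + 2];       % first character of each line

nRec   = numel(lineEnd) / 3;
number = zeros(nRec, 1);
Vector = zeros(2*nRec, NF);

r = 0;
for k = 1:3:numel(lineEnd)                   % k indexes the ID line of a record
    r = r + 1;
    number(r) = str2double(txt(lineStart(k):lineEnd(k)));
    for j = 1:2                              % the two data lines that follow
        row = 2*(r-1) + j;
        for f = 1:NF                         % fixed-column split, one field at a time
            c1 = lineStart(k+j) + (f-1)*W;
            Vector(row, f) = str2double(txt(c1:c1+W-1));
        end
    end
end
```

Both outputs are preallocated, and every str2double call receives a single field, so no NaNs appear for well-formed input.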
Or, one could compute the start/stop locations of the records to remove the serial number records, then reshape() by the field width of each floating point field and end up with a long column to process/convert then reshape() the result in the end.
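The reshape() variant might look something like this. Again a sketch only: it assumes a field width of W = 12, six fields per line and LF-only endings, and the sample text and names are hypothetical.

```matlab
W = 12;  NF = 6;      % ASSUMED field width and number of fields per data line
LF = char(10);
% Hypothetical sample: one record (ID line plus two data lines).
txt = ['7' LF ...
    ' -8.3033E-01 -4.2882E+00 -8.4900E+00 -4.0889E-01 -4.2372E+00 -1.3796E+00' LF ...
    ' -1.1903E+00 -3.9289E+00 -6.2813E+00 -9.2360E-01 -2.8582E+00 -1.2460E+00' LF];

lineEnd   = find(txt == LF) - 1;             % last character of each line
lineStart = [1, lineEnd(1:end-1) + 2];       % first character of each line

% Mark only the characters of the data lines; the serial-number lines
% (1, 4, 7, ...) and the line feeds stay false and are dropped.
keep = false(size(txt));
for L = 1:numel(lineEnd)
    if mod(L, 3) ~= 1
        keep(lineStart(L):lineEnd(L)) = true;
    end
end

fields = reshape(txt(keep), W, []).';        % one W-character field per row
vals = zeros(size(fields, 1), 1);
for m = 1:size(fields, 1)
    vals(m) = str2double(fields(m, :));      % one number per call
end
Vector = reshape(vals, NF, []).';            % back to N-by-NF
```

This only works if every data line really has the same length; a short or ragged line would make the first reshape() fail.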
