Matlab coder str2num alternatives?
I have this data stored in a character array. I used fread and removed the headers to get this data from a text file. (I can't use textscan or fileread because they are not supported by MATLAB Coder, and I find it difficult to call fscanf through coder.ceval.)
unsorted_data 1x767 char
1
-8.3033E-01 -4.2882E+00 -8.4900E+00 -4.0889E-01 -4.2372E+00 -1.3796E+00
-1.1903E+00 -3.9289E+00 -6.2813E+00 -9.2360E-01 -2.8582E+00 -1.2460E+00
2
-3.6261E+00 -4.7218E+00 1.4143E+01 1.6041E+00 -5.1505E+00 1.6737E+00
-3.9131E+00 -5.9048E+00 -2.7256E+01 2.0434E+00 -1.6630E+01 5.5229E+00
3
2.2578E+01 -1.7633E-02 2.1166E+01 2.8041E-01 1.8919E+00 2.4702E+01
6.0947E+01 5.1242E+00 4.0910E+01 -1.0404E+01 -4.8758E+00 5.0202E+01
I need to extract every third row (R1, R4, R7, R10, ...) as a double [Nx1], and a second matrix holding the other rows of data [Nx6].
So far I'm able to extract the first part (R1, R4, R7, R10, ...) into the "number" variable, but I get NaNs for the "Vector" variable. str2num would work here, but it is not supported by MATLAB Coder.
remain = unsorted_data;
data_str = string([]);
% Split the text into lines, one token per newline
while (remain ~= string())
    [token,remain] = strtok(remain, char(10));
    data_str = [data_str ; token];
end
% Convert each line to a double
data = str2double(data_str);
len_data = length(data);
cnum = 1;
cvector = 1;
vector_rows = 2;   % data lines that follow each serial number
number = zeros(len_data/(vector_rows+1),1);
Vector = zeros(len_data*vector_rows/(vector_rows+1),1);
for i = 1:len_data
    num_loc = (vector_rows+1)*(cnum-1)+1;   % line index of the next serial number
    if i == num_loc
        number(cnum,1) = data(i,1);
        cnum = cnum+1;
    else
        Vector(cvector,1) = data(i,1);
        cvector = cvector+1;
    end
end
I'm looking to get two matrices of this data in the right format and, secondly, to make this more efficient by replacing the "while" loop, as it takes too much time to process 5 million lines. Any help is greatly appreciated.
Accepted Answer
Stephen23
on 31 Aug 2018
Edited: Stephen23
on 31 Aug 2018
As far as I can tell from that list of coder-supported functions, something like this should work. The basic idea is to split the char vector into two preallocated cell arrays, then convert to numeric. Given your 1x767 char vector:
- identify whitespace using isstrprop.
- use diff and find to get indices of the numbers.
- use eq and find to locate newline characters.
- preallocate two cell arrays (perhaps transposed).
- use a for loop over the indices and collect the char numbers into the cell arrays.
- apply str2double to both cell arrays.
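The steps above can be sketched roughly as follows. This is a variation rather than the exact recipe: it converts each token directly into a preallocated numeric array instead of collecting tokens in two cell arrays, and it separates serial numbers from data by assuming a fixed layout of 13 values per record instead of using the newline positions. The names txt, perRec, startIdx and stopIdx are made up for the illustration, and real() is there only because generated code may return a complex value from str2double.

```matlab
% Sketch only; txt stands in for the 1xN char vector from the question.
nl  = char(10);
txt = ['1' nl ...
       '-8.3033E-01 -4.2882E+00 -8.4900E+00 -4.0889E-01 -4.2372E+00 -1.3796E+00' nl ...
       '-1.1903E+00 -3.9289E+00 -6.2813E+00 -9.2360E-01 -2.8582E+00 -1.2460E+00' nl ...
       '2' nl ...
       '-3.6261E+00 -4.7218E+00 1.4143E+01 1.6041E+00 -5.1505E+00 1.6737E+00' nl ...
       '-3.9131E+00 -5.9048E+00 -2.7256E+01 2.0434E+00 -1.6630E+01 5.5229E+00' nl];

isSpace  = isstrprop(txt, 'wspace');           % true at blanks and newlines
edges    = diff([1, double(isSpace), 1]);      % -1 where a token starts, +1 just past its end
startIdx = find(edges == -1);
stopIdx  = find(edges == 1) - 1;

nTok   = numel(startIdx);
values = zeros(nTok, 1);                       % preallocated, nothing grows in the loop
for k = 1:nTok
    values(k) = real(str2double(txt(startIdx(k):stopIdx(k))));
end

perRec = 13;   % assumption: 1 serial number + 2 lines x 6 values per record
values = reshape(values, perRec, []);          % one record per column
number = values(1, :).';                       % serial numbers as a column
Vector = reshape(values(2:end, :), 6, []).';   % one data line per row, 6 columns
```

Because the tokens are found by whitespace, this does not depend on the fields being column-aligned, but it does break if any record has a different number of values.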
2 Comments
dpb
on 31 Aug 2018
Edited: dpb
on 31 Aug 2018
ix=find(unsorted_data==char(10));
unsorted_data is just an array of characters; internally they're just numeric character codes, so they can be operated on as if they were numbers (only the user interface interprets them differently).
The find above will be very fast. By contrast, in the loop
while (remain ~= string())
    [token,remain] = strtok(remain, char(10));
    data_str = [data_str ; token];
end
you're using dynamic reallocation by concatenating each new token onto the previous data to build the array. That is about the most inefficient operation there is in MATLAB, as it forces a reallocation and copy on every pass. Once the array gets large, the bottleneck really begins to show.
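As a rough illustration of the alternative, the number of lines can be counted first so the storage is allocated once. The sketch below assumes LF-only line endings and a trailing newline; txt and lineText are hypothetical names, and the sample text is shortened.

```matlab
% Sketch only: split into lines without growing an array inside the loop.
nl  = char(10);
txt = ['1' nl '-8.3033E-01 -4.2882E+00' nl '2' nl];   % shortened sample

ix       = find(txt == nl);     % every newline, found in one vectorized pass
nLines   = numel(ix);
lineText = cell(nLines, 1);     % allocated once, up front
i1 = 1;
for k = 1:nLines
    lineText{k} = txt(i1:ix(k)-1);   % text between two newlines
    i1 = ix(k) + 1;
end
```

Each pass now only copies one line, rather than the whole array built so far.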
ADDENDUM
Try this for starters. I loaded the file (without the headers) as char into txt and turned it into a row vector rather than a column vector for the following:
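ix=find(txt==char(10));   % positions of the newline characters
i1=1;                     % start of the current line
for i=1:10
    % -2 also skips the character before the newline (presumably a carriage return)
    i2=ix(i)-2;           % last character of the current line
    s=txt(i1:i2);
    disp(s)
    i1=i2+3;              % first character of the next line
end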
This finds and displays the first 10 lines (records).
It does make it somewhat more of a pain when Coder doesn't support any of the formatted read functions. The problem with just applying str2double to the strings returned above, except for the single values on lines 1:3:N, is that str2double converts each piece of text to exactly one scalar; it returns NaN for the rest of the records because a whole line of six values isn't a single number.
What you need to do is iterate over ix in steps of three (1:3:length(ix)) and, inside the loop, step forward to pick up the next two records, splitting each of them at its fixed-column positions before passing the pieces to str2double.
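A minimal sketch of that loop is below. Several things in it are assumptions that do not come from the thread: LF-only line endings with a trailing newline, six values per data line, and every value right-justified in a field of W = 12 characters (the real width has to be read off the file). The sample lines are padded to fit that assumed width.

```matlab
% Sketch only; W, nF and the padded sample lines are assumptions.
nl = char(10);
W  = 12;    % hypothetical field width
nF = 6;     % values per data line
txt = ['1' nl ...
       ' -8.3033E-01 -4.2882E+00 -8.4900E+00 -4.0889E-01 -4.2372E+00 -1.3796E+00' nl ...
       ' -1.1903E+00 -3.9289E+00 -6.2813E+00 -9.2360E-01 -2.8582E+00 -1.2460E+00' nl ...
       '2' nl ...
       ' -3.6261E+00 -4.7218E+00  1.4143E+01  1.6041E+00 -5.1505E+00  1.6737E+00' nl ...
       ' -3.9131E+00 -5.9048E+00 -2.7256E+01  2.0434E+00 -1.6630E+01  5.5229E+00' nl];

ix        = find(txt == nl);           % end of every line
lineStart = [1, ix(1:end-1) + 1];      % first character of every line
nRec      = numel(ix) / 3;             % 1 serial line + 2 data lines per record
number    = zeros(nRec, 1);
Vector    = zeros(2*nRec, nF);
for r = 1:nRec
    k = 3*(r-1) + 1;                   % line index of this record's serial number
    number(r) = real(str2double(txt(lineStart(k):ix(k)-1)));
    for j = 1:2                        % the two data lines that follow
        s0 = lineStart(k+j);
        for f = 1:nF                   % cut the line at fixed column positions
            fld = strtrim(txt(s0 + (f-1)*W : s0 + f*W - 1));
            Vector(2*(r-1)+j, f) = real(str2double(fld));
        end
    end
end
```

Every call to str2double now receives text for exactly one number, which is what avoids the NaN results.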
Alternatively, one could compute the start/stop locations of the records in order to remove the serial-number records, then reshape() by the width of each floating-point field to end up with one long column of fields to convert, and finally reshape() the result.
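The reshape idea might look something like this. As before, this is only a sketch under assumptions that are not in the thread: LF-only line endings, a trailing newline, and a hypothetical fixed field width W = 12 with six fields per data line. Only one record is shown to keep it short.

```matlab
% Sketch only; W, nF and the padded sample lines are assumptions.
nl = char(10);
W  = 12;    % hypothetical field width
nF = 6;     % values per data line
txt = ['1' nl ...
       ' -8.3033E-01 -4.2882E+00 -8.4900E+00 -4.0889E-01 -4.2372E+00 -1.3796E+00' nl ...
       ' -1.1903E+00 -3.9289E+00 -6.2813E+00 -9.2360E-01 -2.8582E+00 -1.2460E+00' nl];

ix        = find(txt == nl);
lineStart = [1, ix(1:end-1) + 1];
keep      = true(size(txt));
keep(ix)  = false;                          % drop every newline
for k = 1:3:numel(ix)
    keep(lineStart(k):ix(k)) = false;       % drop each serial-number line
end
body   = txt(keep);                         % only fixed-width fields remain
fields = reshape(body, W, []).';            % one field per row
vals   = zeros(size(fields, 1), 1);
for k = 1:size(fields, 1)
    vals(k) = real(str2double(strtrim(fields(k, :))));
end
Vector = reshape(vals, nF, []).';           % one data line per row
```

The conversion itself is still a loop, since str2double takes one piece of text at a time here, but all the slicing is done by two reshape calls.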