Parsing Strings with Values Missing

Hi everyone!
I am currently working on code that extracts the elevation of multiple GPS satellites from strings of data. Each line of data contains information about 4 (or fewer) satellites before continuing on a new line, so the last line often has fewer values than the earlier ones. I tried working around this with an if-else statement. Sadly, this doesn't work: when Matlab parses the data, it does not treat two consecutive commas as a missing value and doesn't count it, so the wrong values end up in my matrix. I don't know how to overcome this. I have copied a couple of lines of my data below as well as my code. The code is over 800 lines in total, so this is just a small excerpt.
A quick explanation of the data - I am looking to extract the 2-digit number just before the 3-digit number. That's the elevation of the satellites in the sky in degrees. I need both GPGSV and GLGSV. The first number is the number of lines for the particular GPS reading. The second number is the actual line number - so the first line is line 1 of 3 and so on. The 3rd number is the number of satellites. The 4th number is irrelevant in my data collection.
Thank you very much in advance!
----------------------------------DATA------------------------------------
$GPGSV,3,1,12,01,09,252,27,03,46,296,47,04,02,227,20,14,27,103,46*7C
$GPGSV,3,2,12,16,25,184,26,22,02,159,32,23,19,300,48,25,19,041,40*74
$GPGSV,3,3,12,26,52,161,50,29,09,079,43,31,65,038,50,48,23,236,36*71
$GLGSV,3,1,09,67,08,149,,67,24,150,30,68,80,173,43,78,72,003,40*62
$GLGSV,3,2,09,70,10,333,,86,03,009,28,77,20,039,34,69,42,324,38*6E
$GLGSV,3,3,09,87,02,059,,,,,,,,,,,,,*5D
----------------------------------DATA------------------------------------
----------------------------------CODE------------------------------------
%GSV data
GSVcheck = strfind(AllData{1}, 'GSV');
GSVrows = find(~cellfun('isempty',GSVcheck));
GSVdata = AllData{1}(GSVrows);
GSVlength = floor(length(GSVdata)/6);
%'Empty' matrices
GSV = cell(DistanceLength*6,1);
%Parse $GSV
parseGSVdata = strsplit(GSVdata{counter},',');
numLines = parseGSVdata{2};
lineNum = parseGSVdata{3};
if lineNum ~= numLines
GSV{counter,1} = parseGSVdata{6};
GSV{counter,2} = parseGSVdata{10};
GSV{counter,3} = parseGSVdata{14};
GSV{counter,4} = parseGSVdata{18};
elseif lineNum == numLines
dataLeft = parseGSVdata{4};
dataAmount = numLines*4 - dataLeft;
if dataAmount == 1
GSV{counter,1} = parseGSVdata{6};
elseif dataAmount == 2
GSV{counter,1} = parseGSVdata{6};
GSV{counter,2} = parseGSVdata{10};
elseif dataAmount == 3
GSV{counter,1} = parseGSVdata{6};
GSV{counter,2} = parseGSVdata{10};
GSV{counter,3} = parseGSVdata{14};
elseif dataAmount == 4
GSV{counter,1} = parseGSVdata{6};
GSV{counter,2} = parseGSVdata{10};
GSV{counter,3} = parseGSVdata{14};
GSV{counter,4} = parseGSVdata{18};
end
end
----------------------------------CODE------------------------------------

 Accepted Answer

dpb on 3 Jun 2016
Edited: dpb on 4 Jun 2016
Actually, since the values are regularly spaced, simply create a format string for the ones you want...I picked the first and the last record...and put into a string gpg and glg, respectively...
>> fmt=['%*s' repmat('%*f',1,4) repmat(['%f' repmat('%*f',1,3)],1,4) '%*s'];
>> gpval=cell2mat(textscan(gpg,fmt,'delimiter',','))
gpval =
9 46 2 27
>> glval=cell2mat(textscan(glg,fmt,'delimiter',','))
glval =
2 NaN NaN NaN
>>
ADDENDUM
The missing value conundrum is associated with using a '%d' numeric format instead of '%f'; the default value of NaN can't be stored in an integer which is the default class returned. I was unaware of that until some further checking on what was happening...had always presumed everything numeric would be double(*) by default unless specifically cast to something else.
(*) Although it is, indeed, documented that textscan returns the output class of int or uint, for us old-timers used to "everything in Matlab is double unless", it takes some getting used to these new-fangled ways. [f|s]scanf, for instance, do not do this but return double...and the old standby around "since forever" precursor to textscan, textread, doesn't, either.
>> type int.dat
23,133
>> textread('int.dat','%d','delimiter',',')
ans =
23
133
>> whos ans
Name Size Bytes Class Attributes
ans 2x1 16 double
>>
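For comparison, the same class difference can be seen in textscan itself. This is only a small sketch: the literal input strings are made up for illustration (they mirror int.dat above), and the variable names cInt, cDbl and cGap are hypothetical.

```matlab
% '%d' makes textscan return int32, while '%f' returns double.
cInt = textscan('23,133', '%d', 'Delimiter', ',');
cDbl = textscan('23,133', '%f', 'Delimiter', ',');
disp(class(cInt{1}))   % int32
disp(class(cDbl{1}))   % double

% With '%f', an empty field between two commas comes back as NaN,
% which is what keeps the missing satellites visible in the GSV data.
cGap = textscan('23,,133', '%f', 'Delimiter', ',');
disp(cGap{1}.')        % 23 NaN 133
```

Since int32 has no NaN, a '%d' read cannot flag the gap in the same way, which is why the format strings in this thread use '%f'.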

4 Comments

Thore on 3 Jun 2016
Edited: Thore on 3 Jun 2016
Thank you so much for your answer! I was able to modify it slightly to work in the code. I now have a matrix with all the elevation values which is great. However, after searching online for a good while I must admit I have no idea how to modify the repmat in order to find a different value in the same set of data. The 2 digit number right after the 3 digit number is the noise on the signal. I have tried messing around with all the numbers to see what would happen but in the end I always end up with the numbers for the elevation. I have attached the new code below and how I wish to use it to find the 4th value in the repeated part of the data as well as the 2nd value which is the elevation. Thanks in advance.
--------------------------CODE---------------------------
%Parse $GSV - Elevation
fmt=['%*s' repmat('%*f',1,4) repmat(['%f' repmat('%*f',1,3)],1,4) '%*s'];
GSV_L = cell2mat(textscan(GSVdata{counter},fmt,'delimiter',',','collectoutput',true));
GSV = vertcat(GSV,GSV_L);
assignin('base','Elevation',GSV);
%Parse $GSV - Noise
fmt2=['%*s' repmat('%*f',1,4) repmat(['%f' repmat('%*f',1,3)],1,4) '%*s'];
Noise_L = cell2mat(textscan(GSVdata{counter},fmt2,'delimiter',',','collectoutput',true));
Noise = vertcat(GSV,Noise_L);
assignin('base','Noise',Noise);
--------------------------CODE---------------------------
"... I have no idea how to modify the repmat in order to find a different value in the same set of data."
It's not repmat that needs changing, it's the choice of which fields are to be returned and which skipped. All repmat is doing is repeating a given pattern a fixed number of times instead of writing each individual format one at a time manually.
Break it down from the left and inside out...it
  1. Skips a string: '%*s'
  2. Skips four numbers: repmat('%*f',1,4)
  3. Reads a number then skips three: ['%f' repmat('%*f',1,3)]
  4. And repeats that pattern 4 times: repmat([...],1,4)
  5. Then finally skips the last string: '%*s'
If you want a different group or multiple values, work through the position of each and build the matching string.
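As a sketch of that position-by-position approach, here is one way the noise value (the 4th field of each satellite group) could be picked out, alone or together with the elevation. It makes one assumption that is not in the original answer: the '*hh' checksum is stripped with strtok first, so that the last field is a plain number. The names fmtNoise, fmtBoth and the result variables are hypothetical.

```matlab
% Two sentences from the question, with the checksum removed up front
% (an assumption of this sketch, not part of the original format).
gpg = strtok('$GPGSV,3,1,12,01,09,252,27,03,46,296,47,04,02,227,20,14,27,103,46*7C', '*');
glg = strtok('$GLGSV,3,1,09,67,08,149,,67,24,150,30,68,80,173,43,78,72,003,40*62', '*');

% Noise only: skip the leading string, skip 6 numbers (3 header fields,
% then PRN, elevation, azimuth), read one value and skip three (3 times),
% and finally read the last noise value.
fmtNoise = ['%*s' repmat('%*f',1,6) repmat(['%f' repmat('%*f',1,3)],1,3) '%f'];
gpNoise = cell2mat(textscan(gpg, fmtNoise, 'Delimiter', ','));
glNoise = cell2mat(textscan(glg, fmtNoise, 'Delimiter', ','));

% Elevation and noise together: skip the string and the 3 header fields,
% then per satellite skip PRN, read elevation, skip azimuth, read noise.
fmtBoth = ['%*s' repmat('%*f',1,3) repmat('%*f%f%*f%f',1,4)];
both   = cell2mat(textscan(glg, fmtBoth, 'Delimiter', ','));
glElev = both(1:2:end);    % odd columns are elevations
glSnr  = both(2:2:end);    % even columns are noise values
```

For these two sentences gpNoise should come out as 27 47 20 46 and glNoise as NaN 30 43 40, the NaN marking the empty noise field of the first GLONASS satellite.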
Alternatively, simply read the whole numeric array and then just delete from memory the columns not of interest.
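A minimal sketch of that "read everything, then pick columns" route follows. Note that it swaps in strsplit and str2double rather than textscan, because it also shows why the original if-else attempt miscounted: strsplit merges consecutive delimiters unless 'CollapseDelimiters' is set to false. The variable names are hypothetical, and padding to 20 fields is an extra safety assumption of this sketch.

```matlab
% Default strsplit collapses ',,' into one delimiter, so empty fields vanish.
nCollapsed = numel(strsplit('a,,b', ','));                               % 2
nKept      = numel(strsplit('a,,b', ',', 'CollapseDelimiters', false));  % 3

% Last GLGSV sentence from the question: one satellite, three empty slots.
sentence = '$GLGSV,3,3,09,87,02,059,,,,,,,,,,,,,*5D';
body   = strtok(sentence, '*');                             % drop the checksum
fields = strsplit(body, ',', 'CollapseDelimiters', false);  % keep empty fields
vals   = str2double(fields);      % non-numeric and empty fields become NaN
vals(end+1:20) = NaN;             % pad in case trailing empty slots are absent

elev = vals(6:4:18);              % elevation of satellites 1..4
snr  = vals(8:4:20);              % noise of satellites 1..4
```

Here elev should be 2 NaN NaN NaN and snr all NaN, so the columns stay aligned no matter how many satellites the last line carries.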
ADDENDUM/RANT :) :
And, all this gyration because the creators of C (Matlab formatted i/o is built on the C-library-derived functions, fprintf and friends) couldn't stand the thought of "not invented here" and wrote a formatting string definition that can't accept repeat specifiers, which were already well established in FORTRAN (now Fortran) at the time (along with introducing the problems with fixed-width input parsing). :( With a sensible rearranging of the order of the fields, the above could logically have been written far more simply as
FMT=['*A *4F 4(F *3F) *A'];
using the 'A' for character data from Fortran FORMAT instead of 's' and presuming keeping the '*' for skipping an input field which Fortran doesn't have.
END RANT (not directed really at TMW, just a "pet peeve")
Again, thank you so much for your help! I highly appreciate it. My code works perfectly now and I can continue analyzing my data. It does seem strange that it had to be done in such a cumbersome way if there was already an easy way to do it in Fortran but you have helped me past my hurdle.
Well, the repmat is solely Matlab; there's always the recourse of writing N (in this case, 20) individual format strings, but I find it easier to keep track of "who's who in the zoo" if I use the symmetry that is in the input record (presuming there is some, of course, which there usually is). C writes the format string as [Width[.Precision]]DataType instead of the Fortran FORMAT DataType[Width[.Precision]]. Since there's a numeric value in front of the type specifier, it makes parsing a form for a repeat multiplier very tough, so it isn't implemented; hence you have to write every element explicitly in one form or another. What an unnecessary pain it is, indeed... :(
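To make the "write every element explicitly" alternative concrete, the sketch below spells out the elevation format from the answer by hand and checks that it is the same character string as the repmat version. The variable name explicit is hypothetical.

```matlab
% Format string built with repmat, as in the answer.
fmt = ['%*s' repmat('%*f',1,4) repmat(['%f' repmat('%*f',1,3)],1,4) '%*s'];

% The same format written out one specifier at a time.
explicit = ['%*s', ...            % skip the leading string
            '%*f%*f%*f%*f', ...   % skip four numbers
            '%f%*f%*f%*f', ...    % read one, skip three
            '%f%*f%*f%*f', ...    % ...and again
            '%f%*f%*f%*f', ...
            '%f%*f%*f%*f', ...
            '%*s'];               % skip the trailing string

sameFormat = isequal(fmt, explicit);   % true: repmat only saves typing
```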
Anyway, that annoyance aside, glad you got it going; hope something was learned as well as solving the immediate problem.

Asked: 2 Jun 2016
Commented: dpb on 6 Jun 2016
