What would be the best approach to solve this data mapping problem?

I’ve got a large text file whose content looks like this:
MSN_BER (0:31) Observation #1 Rx'd at: (58570.500) Msg. Time: (58568.000)
Forward to IMU: true Rcv Date: 2010121 Synch: f0f0 Rep Mode: Replay_Mode
State Time: 12:00:00.000 (58570.500)
State Position: -1111.1111, -2222.2222, -3333.3333
MSN_RAM (0:32) Observation #100 Rx'd at: (58568.000) Msg. Time: (58568.000)
Forward to IMU: true Rcv Date: 2010121 Synch: f0f0 Rep Mode: Replay_Mode
Fmt: 10 (AIRBORN__ARRAY_LOT) Length: 1234 Remote Num: 1 Number of Observations: 00
Type: 1 Track ID: 12345 Time Tag: 58567.00000000
Band ID: 1 AC ID: 1 Scan ID: 0 LRT/HRT: 1 Valid Flag: 0
Aircraft POS X: -10000.12345678 Y: 2000.123456789 Z: 30000.12345678
Performance: 1.12345 Hydro Pressures: 0.0000000 Compression: 0.000000
Type: 1 Track ID: 12345 Time Tag: 58568.00000000
Band ID: 1 AC ID: 2 Scan ID: 0 LRT/HRT: 1 Valid Flag: 0
Aircraft POS X: -40000.12345678 Y: 5000.123456789 Z: 60000.12345678
Performance: 11.12345 Hydro Pressures: 0.0000000 Compression: 0.000000
Type: 1 Track ID: 12345 Time Tag: 58569.00000000
Band ID: 1 AC ID: 14 Scan ID: 0 LRT/HRT: 1 Valid Flag: 0
Aircraft POS X: -70000.12345678 Y: 8000.123456789 Z: 90000.12345678
Performance: 11.12345 Hydro Pressures: 0.0000000 Compression: 0.000000
Type: 1 Track ID: 12345 Time Tag: 58570.00000000
Band ID: 1 AC ID: 2 Scan ID: 0 LRT/HRT: 1 Valid Flag: 0
Aircraft POS X: -10000.12345678 Y: 4000.123456789 Z: 30000.12345678
Performance: 8.12345 Hydro Pressures: 0.0000000 Compression: 0.000000
MSN_BER (0:31) Observation #2 Rx'd at: (58590.000) Msg. Time: (58568.000)
Forward to IMU: true Rcv Date: 2010121 Synch: f0f0 Rep Mode: Replay_Mode
State Time: 12:00:00.000 (58582.500)
State Position: -4444.4444, -5555.5555, -6666.6666
MSN_RAM (0:32) Observation #100 Rx'd at: (58569.000) Msg. Time: (58569.000)
Forward to IMU: true Rcv Date: 2010121 Synch: f0f0 Rep Mode: Replay_Mode
Fmt: 10 (AIRBORN__ARRAY_LOT) Length: 5678 Remote Num: 1 Number of Observations: 01
Type: 1 Track ID: 12345 Time Tag: 58581.00000000
Band ID: 1 AC ID: 1 Scan ID: 0 LRT/HRT: 1 Valid Flag: 0
Aircraft POS X: -11000.12345678 Y: 4100.123456789 Z: 31000.12345678
Performance: 1.12345 Hydro Pressures: 0.0000000 Compression: 0.000000
Type: 1 Track ID: 12345 Time Tag: 58582.00000000
Band ID: 1 AC ID: 2 Scan ID: 0 LRT/HRT: 1 Valid Flag: 0
Aircraft POS X: -21000.12345678 Y: 4200.123456789 Z: 32000.12345678
Performance: 4.12345 Hydro Pressures: 0.0000000 Compression: 0.000000
Type: 1 Track ID: 12345 Time Tag: 58585.00000000
Band ID: 1 AC ID: 6 Scan ID: 0 LRT/HRT: 1 Valid Flag: 0
Aircraft POS X: -31000.12345678 Y: 4300.123456789 Z: 33000.12345678
Performance: 7.12345 Hydro Pressures: 0.0000000 Compression: 0.000000
Type: 1 Track ID: 12345 Time Tag: 58586.00000000
Band ID: 1 AC ID: 2 Scan ID: 0 LRT/HRT: 1 Valid Flag: 0
Aircraft POS X: -41000.12345678 Y: 4400.123456789 Z: 34000.12345678
Performance: 21.12345 Hydro Pressures: 0.0000000 Compression: 0.000000
Type: 1 Track ID: 12345 Time Tag: 58588.00000000
Band ID: 1 AC ID: 2 Scan ID: 0 LRT/HRT: 1 Valid Flag: 0
Aircraft POS X: -51000.12345678 Y: 4500.123456789 Z: 35000.12345678
Performance: 20.12345 Hydro Pressures: 0.0000000 Compression: 0.000000
For processing and plotting, I’m looking for a way to do the following:
Create an < n x 11 double > array where the following parameters are included:
a. The state times (hh:mm:ss.000 and UTC) and state position values in _BER (5 parameters).
b. The time tag UTC time, AC ID value, the 3 platform position values, and the performance value for each AC ID = 1 and 2 (6 parameters).
Currently, I’m using 2 separate REGEXPs to extract the _BER parameters and AC ID parameters. They are:
% Parse out the BER State Times and Position Values
exp = 'State Time:\s+([\d:\.]+).\s+\(([\d.]+)\).*?State Position:\s+([-?\d\.]+),\s+([-?\d\.]+),\s+([-?\d\.]+)';
tokens = regexp(buffer, exp, 'tokens');
BER_State_Data = reshape(str2double([tokens{:}]), 5, []).';
% Parse out the AC ID values equal only to 1 or 2, their Time Tags, and
% the 3 platform position values (x,y,z) and performance values.
exp = '([\d\.]+)\s+Band[^A]+?AC ID:\s+([12]{1})\W.*?Aircraft POS X:\s+([-?\d\.]+).\s+Y:\s+([-?\d\.]+).\s+Z:\s+([-?\d\.]+).*?ance:\s+([\d\.e+-]+).';
tokens = regexp(buffer, exp, 'tokens');
AC12_data = reshape(str2double([tokens{:}]),6,[]).';
These 2 sets of commands yield:
BER_State_Data =
NaN 58570.5000000000 -1111.11110000000 -2222.22220000000 -3333.33330000000
NaN 58582.5000000000 -4444.44440000000 -5555.55550000000 -6666.66660000000
AC12_data =
58567 1 -10000.1234567800 2000.12345678900 30000.1234567800 1.12345000000000
58568 2 -40000.1234567800 5000.12345678900 60000.1234567800 11.1234500000000
58570 2 -10000.1234567800 4000.12345678900 30000.1234567800 8.12345000000000
58581 1 -11000.1234567800 4100.12345678900 31000.1234567800 1.12345000000000
58582 2 -21000.1234567800 4200.12345678900 32000.1234567800 4.12345000000000
58586 2 -41000.1234567800 4400.12345678900 34000.1234567800 21.1234500000000
58588 2 -51000.1234567800 4500.12345678900 35000.1234567800 20.1234500000000
However, I need an n x 11 array that looks like this:
NaN 58570.5 -1111.11 -2222.22 -3333.33 58567 1 -10000.1 2000.123 30000.12 1.12345
NaN 58570.5 -1111.11 -2222.22 -3333.33 58568 2 -40000.1 5000.123 60000.12 11.12345
NaN 58570.5 -1111.11 -2222.22 -3333.33 58570 2 -10000.1 4000.123 30000.12 8.12345
NaN 58582.5 -4444.44 -5555.56 -6666.67 58581 1 -11000.1 4100.123 31000.12 1.12345
NaN 58582.5 -4444.44 -5555.56 -6666.67 58582 2 -21000.1 4200.123 32000.12 4.12345
NaN 58582.5 -4444.44 -5555.56 -6666.67 58586 2 -41000.1 4400.123 34000.12 21.12345
NaN 58582.5 -4444.44 -5555.56 -6666.67 58588 2 -51000.1 4500.123 35000.12 20.12345
where the first state time & position data are mapped to the applicable AC IDs which follow it. Then the 2nd set of state time & position data are mapped to the applicable AC IDs which follow it. And so on until the end of the text file.
NOTES: The NaNs are a result of the hh:mm:ss.000 state time in _BER, and are not a problem. The AC12 data message(s) will always follow the BER state data message, but the number of AC12 data messages can vary from 1 to many.
I’m not exactly sure how to approach the problem given the use of REGEXP. Can I use another MATLAB command (along with REGEXP) to map the applicable BER state message data to the AC12 data messages? Or just write the BER state message data into the array?
Any ideas would be appreciated. Thank you.

Accepted Answer

Cedric on 15 Jul 2013
Edited: Cedric on 15 Jul 2013
It is not trivial, in the sense that the two REGEXP calls give you two series of data with no information relating one to the other. I see two options (without thinking too much):
1. Instead of calling REGEXP twice on the whole buffer, first call it once to split the buffer into blocks at each 'MSN_BER'. You can then loop over these blocks and extract the data to be mapped from each one. E.g. (not tested):
EDIT: splitting using REGEXP is simpler than my first proposal.
bufferSplit = regexp(buffer, 'MSN_BER', 'split') ;
for bId = 1 : length(bufferSplit)
    if isempty(bufferSplit{bId}), continue ; end
    % Here, your code based on two REGEXP using bufferSplit{bId}
    % instead of buffer.
end
This way you know that, at each step of the loop, BER_State_Data and AC12_data belong to the same block.
First proposal (I leave it for the record):
startPos = regexp(buffer, 'MSN_BER', 'start') ;
nBlocks = length(startPos) ;
for bId = 1 : nBlocks
    if bId < nBlocks
        miniBuffer = buffer(startPos(bId):startPos(bId+1)-1) ;
    else
        miniBuffer = buffer(startPos(bId):end) ;
    end
    % Here, your code based on two REGEXP using miniBuffer instead of buffer.
end
2. If you can count on the fact (?) that the 'Time Tag:' fields of the entries belonging to the same block as a given BER entry are <= the 'Rx'd at:' (or State Time) field of that BER entry, then you can build the join directly from what you already have, using the 2nd column of BER_State_Data and the first column of AC12_data. E.g. (not tested):
for berId = 1 : size(BER_State_Data, 1)
    if berId == 1, prev = 0 ; else prev = BER_State_Data(berId-1,2) ; end
    ac12Ids = AC12_data(:,1)>prev & AC12_data(:,1)<=BER_State_Data(berId,2) ;
    % Here you build whatever you want with
    % BER_State_Data(berId,:) and AC12_data(ac12Ids,:)
end
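As a rough sketch of how the loop body in option 2 could build the joined rows: the small matrices below are made-up stand-ins with the same column layout as BER_State_Data and AC12_data, and `joined` is a hypothetical name. This only works if the time-tag assumption holds. In the sample posted in the question, time tags 58586 and 58588 are larger than the second state time 58582.5, so those rows would not be matched; the check at the end is there to catch that case.

```matlab
% Stand-in data (hypothetical values, same column layout as in the question).
BER_State_Data = [NaN 10.5 -1 -2 -3 ;
                  NaN 20.5 -4 -5 -6] ;
AC12_data      = [ 9 1 100 200 300 1.1 ;
                  10 2 110 210 310 1.2 ;
                  19 1 120 220 320 1.3] ;

nBer   = size(BER_State_Data, 1) ;
joined = cell(nBer, 1) ;
for berId = 1 : nBer
    if berId == 1, prev = -Inf ; else prev = BER_State_Data(berId-1,2) ; end
    ac12Ids = AC12_data(:,1) > prev & AC12_data(:,1) <= BER_State_Data(berId,2) ;
    nRows   = nnz(ac12Ids) ;
    % Repeat the BER row once per matching AC12 row, then append the AC12 columns.
    joined{berId} = [repmat(BER_State_Data(berId,:), nRows, 1), AC12_data(ac12Ids,:)] ;
end
joined = cell2mat(joined) ;

% Sanity check: every AC12 row should have been assigned to exactly one block.
if size(joined, 1) ~= size(AC12_data, 1)
    warning('Some AC12 rows were not matched; the time-tag assumption does not hold.') ;
end
```

If the warning fires on the real file, option 1 (splitting into blocks) is probably the safer route.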
=========================================================
PS: if you are the Brad who asked earlier about calling various functions based on a "per column" function ID, here is one example:
f{1} = @sin ;
f{2} = @(x) x.^(1/2) ;
f{3} = @(x) -x ;
M = magic(8)
c = [1, 1, 1, 2, 2, 3, 3, 3] ;
fM = arrayfun(@(cId) f{c(cId)}(M(:,cId)), 1:length(c), 'UniformOutput', false);
cell2mat(fM)
  3 Comments
Cedric on 15 Jul 2013
Edited: Cedric on 15 Jul 2013
For #1, this is why I start the loop with
if isempty(bufferSplit{bId}), continue ; end
REGEXP/split returns what is on both sides of each split, which means '' when there is nothing. You can see this below:
>> regexp('ABA', 'A', 'split')
ans =
'' 'B' ''
>> regexp('BAB', 'A', 'split')
ans =
'B' 'B'
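If skipping empty pieces inside the loop feels awkward, one alternative (a small sketch, not part of the suggestion above; `parts` is just an illustrative name) is to drop them right after the split:

```matlab
parts = regexp('ABA', 'A', 'split') ;        % 1x3 cell with empty first and last pieces
parts = parts(~cellfun('isempty', parts)) ;  % keep only the non-empty pieces
```

The loop can then run over `parts` without the `isempty` test.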
For #2,3, yes, my example only shows how to create a context in which the mapping can be built; I left the mapping itself to you.
What you have to do is something like this:
buffer = fileread('Brad3.txt') ;
bufferSplit = regexp(buffer, 'MSN_BER', 'split') ;
nBlocks = length(bufferSplit) ;   % number of pieces, not number of characters
output = cell(nBlocks, 1) ;
for bId = 1 : nBlocks
    if isempty(bufferSplit{bId}), continue ; end
    % Your block of code with the modifications that I proposed.
    exp = 'State Time:\s+([\d:\.]+).\s+\(([\d.]+)\).*?State Position:\s+([-?\d\.]+),\s+([-?\d\.]+),\s+([-?\d\.]+)';
    tokens = regexp(bufferSplit{bId}, exp, 'tokens');
    BER_State_Data = reshape(str2double([tokens{:}]), 5, []).';
    exp = '([\d\.]+)\s+Band[^A]+?AC ID:\s+([12]{1})\W.*?Aircraft POS X:\s+([-?\d\.]+).\s+Y:\s+([-?\d\.]+).\s+Z:\s+([-?\d\.]+).*?ance:\s+([\d\.e+-]+).';
    tokens = regexp(bufferSplit{bId}, exp, 'tokens');
    AC12_data = reshape(str2double([tokens{:}]),6,[]).';
    % A couple additional lines to map and store: repeat the BER row once
    % per AC12 row of this block, then append the AC12 columns.
    output{bId} = repmat(BER_State_Data, size(AC12_data, 1), 1) ;
    output{bId} = [output{bId}, AC12_data] ;
end
output = cell2mat(output) ;
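If the loop over blocks ever becomes a bottleneck on a very large file, one possible loop-free variant (a sketch of a different technique, not what was tested above) is to ask REGEXP for the match start positions as well as the tokens, and to assign each AC12 match to the last BER match that starts before it. The short `buffer` below is a simplified, made-up stand-in, the patterns are reduced versions for illustration only, and `berIdx`, `keep` and `result` are hypothetical names.

```matlab
% Simplified stand-in text: 'S' lines play the role of BER state data,
% 'A' lines the role of AC12 entries (hypothetical format).
buffer = sprintf('S 10.5\nA 9 1\nA 10 2\nS 20.5\nA 19 1\n') ;

[berTok, berStart] = regexp(buffer, 'S\s+([\d.]+)', 'tokens', 'start') ;
[acTok,  acStart ] = regexp(buffer, 'A\s+([\d.]+)\s+(\d+)', 'tokens', 'start') ;

berData = reshape(str2double([berTok{:}]), 1, []).' ;   % one row per BER match
acData  = reshape(str2double([acTok{:}]),  2, []).' ;   % one row per AC12 match

% For each AC12 match, count how many BER matches start before it; that count
% is the row index of the BER entry it belongs to.
berIdx = arrayfun(@(p) nnz(berStart < p), acStart).' ;

keep   = berIdx > 0 ;   % drop AC12 entries that precede any BER entry
result = [berData(berIdx(keep), :), acData(keep, :)] ;
```

With the real patterns from the question, the reshape sizes would be 5 and 6 instead of 1 and 2; the mapping step itself stays the same.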
Brad on 19 Jul 2013
Cedric, it took some time to eliminate the bugs. But this approach works great.
Thanks again!
