Problem using regexp to extract certain lines

I am trying to extract lines that begin with /keylog/midi from a file that looks like:
/keylog/midi 144 60 72 1.001300
/keylog/oscp 144 60 0.006736 1.030209
/keylog/oscp 144 60 0.000000 2.852801
/keylog/oscp 144 60 0.000000 2.869148
/keylog/midi 144 60 0 2.870843
And I need to separate the two lines from each other. The code I have right now is:
Fid=fopen('keyData.txt');
myLines=fgetl(Fid);
while ~feof(Fid)
myLines=fgetl(Fid);
on=regexp(myLines,'(/keylog/midi)(\s\d+\s\d+\s[^0]\s\d+(.)\d+)', 'match'); % on should be the line whose fourth field is greater than 0
off=regexp(myLines, '(/keylog/midi)\s\d+\s\d+\s[0]+\s\d+(.)\d+', 'match'); % off should be the line whose fourth field is 0
end
When I run the code, I get:
off='/keylog/midi 144 60 0 2.870843'
on={}
What is wrong with my regexp for on?

Accepted Answer

Simon on 22 Nov 2013
Hi!
I would suggest another approach. It is (in my opinion) easier to debug, because you can track your steps easily.
% open and read file
fid = fopen(FileName);
FC = textscan(fid, '%s', 'delimiter', '\n', 'whitespace', '');
fclose(fid);
FC = FC{1};
% remove blanks on start and end of line
FC = strtrim(FC);
% find all lines with '/keylog/midi'
FCmidi = FC(strncmp('/keylog/midi', FC, 12));
% read each remaining line, skipping the string '/keylog/midi'
C = cellfun(@(x) sscanf(x, '%*s %d %d %d %f'), FCmidi, 'UniformOutput', false);
% format C: each column is one log entry
C = [C{:}];
% on/off flag is in row 3
onoff = C(3, :);
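
Once the flag row is available, the "on" and "off" entries can be separated with logical indexing. The following is only a minimal sketch: the matrix C is typed in by hand from the two /keylog/midi lines in the question, and the names onEvents and offEvents are hypothetical, not part of the answer above.

```matlab
% Hand-built stand-in for C: each column is one /keylog/midi entry
% (rows: status, note, on/off value, timestamp).
C = [144      144;
     60       60;
     72       0;
     1.001300 2.870843];

onoff = C(3, :);               % on/off flag is in row 3

% Logical indexing: keep columns whose flag is non-zero (on) or zero (off).
onEvents  = C(:, onoff > 0);   % hypothetical name
offEvents = C(:, onoff == 0);  % hypothetical name
```

Each result keeps the column layout of C, so the timestamp of every event is still in row 4.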

More Answers (2)

Walter Roberson on 22 Nov 2013
[^0]\s should be [^0]\d*\s in order to eat the digits after the first non-zero one (e.g., [^0] will match the 7, and then the \d* will match the 2).
In the off expression, [0]+ will match one or more 0's. Will there ever be multiple 0's there, such as 00? If not, then it would make more sense to get rid of the + and change the [0] to just 0.
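
The effect of this change can be checked directly on the two /keylog/midi lines from the question. This is a small sketch, not a definitive fix: the variable names are made up for illustration, and the period is also escaped as \. so that it only matches a literal dot.

```matlab
% The two /keylog/midi lines from the question.
lineOn  = '/keylog/midi 144 60 72 1.001300';
lineOff = '/keylog/midi 144 60 0 2.870843';

% Original "on" pattern: [^0]\s allows only ONE character before the space.
origOn   = '(/keylog/midi)(\s\d+\s\d+\s[^0]\s\d+(.)\d+)';
% Suggested patterns: [^0]\d* consumes the remaining digits of the value.
fixedOn  = '(/keylog/midi)\s\d+\s\d+\s[^0]\d*\s\d+\.\d+';
fixedOff = '(/keylog/midi)\s\d+\s\d+\s0\s\d+\.\d+';

% 'once' returns a char vector (empty when there is no match).
origOnHit   = regexp(lineOn,  origOn,   'match', 'once');
fixedOnHit  = regexp(lineOn,  fixedOn,  'match', 'once');
fixedOnMiss = regexp(lineOff, fixedOn,  'match', 'once');
fixedOffHit = regexp(lineOff, fixedOff, 'match', 'once');
```

Here origOnHit is empty because 72 has two digits, while fixedOnHit and fixedOffHit each contain the whole line. Note that [^0] accepts any character other than 0, so [1-9] would be a stricter choice if the field is always a number.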
  3 Comments
Walter Roberson on 22 Nov 2013
Edited: Walter Roberson on 22 Nov 2013
Note: use \. to indicate a literal period.
Could you show your modified regular expressions?
John on 22 Nov 2013
This is the code after I added the \d* after the [^0], and changed the (.) to \. The first one still does not work.
on=regexp(myLines,'(/keylog/midi)\s\d+\s\d+\s[^0]\d*\s\d+\.\d+', 'match');
off=regexp(myLines, '(/keylog/midi)\s\d+\s\d+\s0\s\d+\.\d+', 'match');
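
A likely reason the first expression still appears not to work lies in the loop from the question rather than in the pattern: the line read by the first fgetl call is overwritten before it is ever tested, and on and off are reassigned on every iteration, so only the result for the last line survives. The sketch below shows one way to collect all matches instead. It is an illustration under assumptions: the sample data is written to a temporary file standing in for keyData.txt, and [1-9] is substituted for [^0].

```matlab
% Write the sample data to a temporary file (stand-in for keyData.txt).
sampleFile = [tempname '.txt'];
fid = fopen(sampleFile, 'w');
fprintf(fid, '%s\n', ...
    '/keylog/midi 144 60 72 1.001300', ...
    '/keylog/oscp 144 60 0.006736 1.030209', ...
    '/keylog/oscp 144 60 0.000000 2.852801', ...
    '/keylog/oscp 144 60 0.000000 2.869148', ...
    '/keylog/midi 144 60 0 2.870843');
fclose(fid);

onPat  = '^/keylog/midi\s\d+\s\d+\s[1-9]\d*\s\d+\.\d+';
offPat = '^/keylog/midi\s\d+\s\d+\s0\s\d+\.\d+';

on  = {};   % accumulate matches instead of overwriting them
off = {};
fid = fopen(sampleFile);
while true
    thisLine = fgetl(fid);            % fgetl returns -1 at end of file
    if ~ischar(thisLine), break; end
    m = regexp(thisLine, onPat, 'match', 'once');
    if ~isempty(m), on{end+1} = m; end
    m = regexp(thisLine, offPat, 'match', 'once');
    if ~isempty(m), off{end+1} = m; end
end
fclose(fid);
delete(sampleFile);                   % clean up the temporary file
```

Every line, including the first, is tested exactly once, and each match is appended to its cell array.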



Yamoussa SANOGO on 15 Oct 2019
Edited: Yamoussa SANOGO on 15 Oct 2019
Hi there, I know this question has been around for a while, but I would like to add my suggestion in case somebody else has the same problem. My approach would be a simple lookbehind like this:
text = strjoin({ ...
    '/keylog/midi 144 60 72 1.001300', ...
    '/keylog/oscp 144 60 0.006736 1.030209', ...
    '/keylog/oscp 144 60 0.000000 2.852801', ...
    '/keylog/oscp 144 60 0.000000 2.869148', ...
    '/keylog/midi 144 60 0 2.870843'}, char(10));   % char(10) is a newline
rule = '(?<=\/keylog\/midi)(\s*\d*\s*\d*\s*\d*\.?\d*\s*\d*\.?\d*)' ;
matched_data = regexp(text,rule, 'match');
Then concatenate the matched pieces into a single character array:
matched_data = [matched_data{:}];
This approach can be generalized by making the prefix ('/oscp' or '/midi') a string variable and concatenating it with the rest of the matching rule.
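
A rough sketch of that generalization follows. The variable names prefix, tail and values are hypothetical, the tail pattern is a slightly tightened variant of the rule above (\s+ and \d+ instead of \s* and \d*), and regexptranslate is used so that any special characters in the prefix are treated literally.

```matlab
% Sample data from the question, joined with newlines (char(10)).
text = strjoin({ ...
    '/keylog/midi 144 60 72 1.001300', ...
    '/keylog/oscp 144 60 0.006736 1.030209', ...
    '/keylog/oscp 144 60 0.000000 2.852801', ...
    '/keylog/oscp 144 60 0.000000 2.869148', ...
    '/keylog/midi 144 60 0 2.870843'}, char(10));

prefix = '/keylog/midi';   % change to '/keylog/oscp' for the other lines

% Four numeric fields; the last two may carry a decimal part.
tail = '(\s+\d+\s+\d+\s+\d+\.?\d*\s+\d+\.?\d*)';

% Build the rule: lookbehind on the escaped prefix, then the numeric tail.
rule = ['(?<=' regexptranslate('escape', prefix) ')' tail];
matched = regexp(text, rule, 'match');

% Convert each matched piece to a row of numbers, one row per line.
values = cellfun(@(s) sscanf(s, '%f').', matched, 'UniformOutput', false);
values = vertcat(values{:});
```

With the prefix set to '/keylog/midi', values has one row per matching line and four columns, with the on/off value in column 3.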
