How do I exclude certain lines from data files?
I am trying to extract numerical values from hexadecimal values which are generated and stored in the form of .csv data files. However, the format of these .csv data files was recently altered by an update, such that the code no longer works as it relies on the detection of a keyword, in this case 'CUSTOM_MODE_STEP', in order to pick out the relevant lines in the data file. For context, the original data format is thus:
CUSTOM_MODE_STEP_0 = { 0x40 0x43 0x7D 0xAF 0x96 0xA2 0x00 0x00 }
CUSTOM_MODE_STEP_1 = { 0x00 0xF4 0x7D 0xAF 0x96 0xA2 0x01 0x00 }
CUSTOM_MODE_STEP_2 = { 0x7C 0x00 0x7D 0xAF 0x96 0xA2 0x02 0x80 }
However, the format was changed, so that the hexadecimal-containing lines in the data files are now separated by strings of text which obviously cannot be read:
CUSTOM_MODE_STEP_0 = { 0x40 0x43 0x7D 0xA2 0xA2 0xA2 0x00 0x00 }
CUSTOM_MODE_STEP_0_DESCRIPTION = [text]
CUSTOM_MODE_STEP_1 = { 0x00 0xF4 0x7D 0xA2 0xA2 0xA2 0x01 0x00 }
CUSTOM_MODE_STEP_1_DESCRIPTION = [text]
CUSTOM_MODE_STEP_2 = { 0x7C 0x00 0x7D 0xA2 0xA2 0xA2 0x02 0x80 }
CUSTOM_MODE_STEP_2_DESCRIPTION = [text]
I am using a pre-written script, and I am trying to edit it so that it can accommodate this change. The script is below:
for f=fields'
    if contains(f,'CUSTOM_MODE_STEP')
        ht = DataN.Periph.(char(f));
        list = strsplit(ht,{',', '{', '}'});
        DataN.ht_1 = [DataN.ht_1; hex2dec(list{1,4}(4:end))*2];
        DataN.ht_2 = [DataN.ht_2; hex2dec(list{1,5}(4:end))*2];
        DataN.ht_3 = [DataN.ht_3; hex2dec(list{1,6}(4:end))*2];
        DataN.ht_4 = [DataN.ht_4; hex2dec(list{1,7}(4:end))*2];
        DataN.t_1 = [DataN.t_1; hex2dec(list{1,2}(4:end))];
        DataN.t_2 = [DataN.t_2; hex2dec(list{1,3}(4:end))];
    end
end
The variable 'fields' is a 27x1 array that contains the CUSTOM_MODE_STEP variables (both the hexadecimal and the text values).
I was thinking of inserting an elseif statement like:
elseif contains(f,'DESCRIPTION')
but I'm unsure as to what command to use exactly to exclude those lines. I've also thought about referencing the correct cells in that array using fields{} but that hasn't worked:
f=fields{17},fields{19},fields{21};
Those numbers are the indices of the hexadecimal lines in that array.
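For reference, the kind of filter described above might look like the sketch below. It assumes 'fields' is a cell array of character vectors (for example the output of fieldnames); the sample list and the names isStep and stepFields are made up for illustration. Rather than testing for 'DESCRIPTION', it keeps only the names that consist of CUSTOM_MODE_STEP_ followed by digits and nothing else.

```matlab
% Hypothetical sample of field names; the real list comes from the data file.
fields = {'CUSTOM_MODE_STEP_0'; 'CUSTOM_MODE_STEP_0_DESCRIPTION'; ...
          'CUSTOM_MODE_STEP_1'; 'CUSTOM_MODE_STEP_1_DESCRIPTION'; ...
          'SOME_OTHER_FIELD'};

% Keep only names that are exactly CUSTOM_MODE_STEP_<digits>.
% regexp returns an empty cell entry for every name that does not match.
isStep = ~cellfun(@isempty, regexp(fields, '^CUSTOM_MODE_STEP_\d+$', 'once'));
stepFields = fields(isStep);

for k = 1:numel(stepFields)
    f = stepFields{k};      % char vector, e.g. 'CUSTOM_MODE_STEP_0'
    fprintf('%s\n', f);     % the hex parsing for this field would go here
end
```

Inside the existing loop, a similar effect could probably be had by changing the condition to contains(f,'CUSTOM_MODE_STEP') && ~contains(f,'DESCRIPTION').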
If any further information is needed, please let me know.
Answers (2)
Guillaume
on 23 Oct 2018
It sounds like your original code is very fragile. Looking at the portion you show, it's also not very efficient since there's a lot of array resizing. A single regexp, a call to sscanf and a bit of cell array manipulation are probably all that is needed to get the data you want.
It would be useful to have an example text file to validate against. With the attached file, based on your example data, this is the code I'd use:
filecontent = fileread('test.csv'); %read whole file at once
modesteps = regexp(filecontent, 'CUSTOM_MODE_STEP_(\d+)\s+=\s+\{\s*([^}]+)\}', 'tokens'); %get step and content of '{}'
modesteps = vertcat(modesteps{:});
stepnumber = str2double(modesteps(:, 1));
stepvalues = cellfun(@(hex) sscanf(hex, '0x%x ', [1 Inf]), modesteps(:, 2), 'UniformOutput', false);
stepvalues = vertcat(stepvalues{:});
If you then want to convert that into a table with the same variable names as your original structure:
steps = array2table([stepnumber, stepvalues(:, 1:6)], 'VariableNames', {'Step', 't_1', 't_2', 'ht_1', 'ht_2', 'ht_3', 'ht_4'})
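Note that the loop in the question multiplies the four ht values by 2, which the table above does not do. A minimal sketch of applying that scaling afterwards, assuming the factor of 2 is still wanted; the two rows are the first two steps of the example data, and htnames is just an illustrative name:

```matlab
% Stand-in for the table built above, filled with steps 0 and 1 of the example.
steps = array2table([0 64 67 125 162 162 162; 1 0 244 125 162 162 162], ...
    'VariableNames', {'Step', 't_1', 't_2', 'ht_1', 'ht_2', 'ht_3', 'ht_4'});

% Double the ht columns, as the original loop did; t_1 and t_2 stay unchanged.
htnames = {'ht_1', 'ht_2', 'ht_3', 'ht_4'};
steps{:, htnames} = steps{:, htnames} * 2;
```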
3 Comments
Stanley
on 23 Oct 2018
Guillaume
on 8 Jan 2019
The only change that needs to be made to my original code, to account for the additional comma separating the hex values in your latest example, is to replace the '0x%x ' in the sscanf call by '0x%x, ', so:
filecontent = fileread('test.csv'); %read whole file at once
modesteps = regexp(filecontent, 'CUSTOM_MODE_STEP_(\d+)\s+=\s+\{\s*([^}]+)\}', 'tokens'); %get step and content of '{}'
modesteps = vertcat(modesteps{:});
stepnumber = str2double(modesteps(:, 1));
stepvalues = cellfun(@(hex) sscanf(hex, '0x%x, ', [1 Inf]), modesteps(:, 2), 'UniformOutput', false);
stepvalues = vertcat(stepvalues{:});
"Also, I'm hesitant to implement your code as it is likely to have a knock-on effect on the rest of the (very large) script"
While I can understand the resistance, the way you have it coded at present, the input format, the parsing and the creation of the output data are all deeply interlinked. As you've found out, if the file format changes you need to review everything. I would think that changing the design now would result in a lot less pain later. If it were me, I would write a parser that would be even more generic than the above (store the parsed data as key/value pairs) and afterward just look up the required keys.
Anyway, it is trivial to convert the output of the above into your original structure:
fnames = {'t_1', 't_2', 'ht_1', 'ht_2', 'ht_3', 'ht_4'};
namevalues = [fnames; num2cell(stepvalues(:, 1:6), 1)];
dataN = struct(namevalues{:})
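The more generic key/value parser mentioned above could be sketched roughly as follows. This is only an illustration under assumptions: every line is taken to have the form KEY = VALUE, values wrapped in {} are treated as lists of 0x-prefixed hex numbers, and anything else is kept as text. The names rows and data are made up, and the sample lines are copied from the question.

```matlab
% Sample lines in the new file format, copied from the question's example.
% With a real file these would come from something like
%   rows = regexp(fileread('test.csv'), '\r?\n', 'split');
rows = {'CUSTOM_MODE_STEP_0 = { 0x40 0x43 0x7D 0xA2 0xA2 0xA2 0x00 0x00 }', ...
        'CUSTOM_MODE_STEP_0_DESCRIPTION = [text]', ...
        'CUSTOM_MODE_STEP_1 = { 0x00 0xF4 0x7D 0xA2 0xA2 0xA2 0x01 0x00 }', ...
        'CUSTOM_MODE_STEP_1_DESCRIPTION = [text]'};

data = containers.Map('KeyType', 'char', 'ValueType', 'any');
for k = 1:numel(rows)
    % Split "KEY = VALUE"; lines without that shape are skipped.
    tok = regexp(rows{k}, '^\s*(\w+)\s*=\s*(.*?)\s*$', 'tokens', 'once');
    if isempty(tok)
        continue
    end
    key = tok{1};
    value = tok{2};
    if startsWith(value, '{')
        % Hex list: take the digits after each "0x" (works with or without commas).
        hexdigits = regexp(value, '(?<=0x)[0-9A-Fa-f]+', 'match');
        value = reshape(hex2dec(hexdigits), 1, []);
    end
    data(key) = value;   % numeric row for hex lists, char for everything else
end

% Afterwards, just look up the keys that are needed.
step1 = data('CUSTOM_MODE_STEP_1');
```

With this layout, extra lines such as the DESCRIPTION entries are simply stored under their own keys and never interfere with the lookup of the hex values.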
per isakson
on 4 Jan 2019
I downloaded example750.csv and tried a different approach to extracting and converting the hex values:
>> cssm( 'example750.csv' )
ans =
64 67 125 162 162 162 0 0
0 244 125 162 162 162 1 0
124 0 125 162 162 162 2 128
where
function out = cssm( ffs )
%
%% Read the file to a cell array of character rows
fid = fopen( ffs, 'r' );
cac = textscan( fid, '%[^\r\n]' );
cac = cac{1};
[~] = fclose( fid );
%% Extract the rows with hex values
pos = regexp( cac, 'CUSTOM_MODE_STEP_\d+\s+=\s+\{' );
cac( cellfun( @isempty, pos ) ) = [];
%% Extract the hex values, which are two characters following "0x"
hex = regexp( cac, '(?<=0x)[A-F\d]{2}', 'match' );
%% Convert to dec values. (hex2dec returns a column, thus reshape.)
dec = cellfun( @(c) reshape(hex2dec(c),1,[]), hex, 'uni',false );
out = cell2mat( dec );
end