Removing double empty lines from a text file

If a file contains two or more consecutive empty lines, they should be replaced by a single empty line.
% reading file
fid=fopen(outFile,'rt');
Data = textscan(fid,'%s','Delimiter','\n');
Data=Data{1}; % get rid of nesting
k=1; emptylines_occured=0;
for j=1:numel(Data)
    if ~strcmp(Data(j),'') % not empty line
        if emptylines_occured
            newData{k}=''; k=k+1;
            emptylines_occured=0;
        end
        newData(k)=Data(j); k=k+1;
    else % empty line
        emptylines_occured=1;
    end
end
fclose(fid);
% writing file
fid=fopen(outFile,'wt');
for j=1:numel(newData)
    fprintf(fid, '%s\n',newData{j});
end
fclose(fid);
Is there a more concise way?

2 Comments

Can you share your file?
This may be any text file, e.g. an m-file.


 Accepted Answer

You can easily write the new file at the same time as you read the old one, which is faster and uses much less memory. Here is a simple version that creates the new file with at most one empty line between any two non-empty lines:
[f1d,msg] = fopen('test_old.txt','rt');
assert(f1d>=3,msg)
[f2d,msg] = fopen('test_new.txt','wt');
assert(f2d>=3,msg)
prv = 'X';
while ~feof(f1d)
    new = fgetl(f1d);
    if numel(new) || numel(prv)
        fprintf(f2d,'%s\n',new);
    end
    prv = new;
end
fclose(f1d);
fclose(f2d);
The test files are attached. To also drop any leading empty lines, initialize prv as an empty char (prv = '') instead of 'X'.
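For illustration, here is a minimal sketch of that variant with prv initialized as an empty char. The temporary file names and the demo content are made up for this example, and the loop tests the return value of fgetl rather than feof, which is a small change from the code above, not part of the original answer.

```matlab
% Sketch only: the temp file names and demo content are made up for illustration.
oldFile = [tempname '.txt'];
newFile = [tempname '.txt'];

% demo input: two leading empty lines, then a run of three empty lines
fid = fopen(oldFile,'wt');
fprintf(fid,'\n\nfirst\n\n\n\nsecond\n');
fclose(fid);

[f1d,msg] = fopen(oldFile,'rt');
assert(f1d>=3,msg)
[f2d,msg] = fopen(newFile,'wt');
assert(f2d>=3,msg)
prv = '';                      % empty char: leading empty lines are skipped
while true
    new = fgetl(f1d);
    if ~ischar(new)            % fgetl returns -1 (numeric) at end of file
        break
    end
    if ~isempty(new) || ~isempty(prv)
        fprintf(f2d,'%s\n',new);
    end
    prv = new;
end
fclose(f1d);
fclose(f2d);
```

With this demo input, the new file should contain "first", one empty line, then "second".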

3 Comments

Be careful: using 'wt' on MS Windows would introduce \r characters in the file that might not have been there before.
@Walter Roberson: the original question uses the t option for both reading and writing, so presumably this is not a problem.
bbb_bbb on 9 Feb 2018
Edited: Stephen23 on 9 Feb 2018
This works excellently. Thanks.


More Answers (1)

%read the file _and_ do the work of deleting extra empty lines.
new_text = regexprep( fileread(outFile), '(\r?\n)(\r?\n)+', '$1');
%write the result to a new file
fid = fopen('text_new.txt', 'w');
fwrite(fid, new_text);
fclose(fid);

3 Comments

bbb_bbb on 8 Feb 2018
Edited: bbb_bbb on 8 Feb 2018
This deletes all empty lines and garbles Russian characters! This pattern keeps one empty line instead:
new_text = regexprep( fileread(outFile), '(\r?\n\r?\n)(\r?\n)+', '$1');
bbb_bbb on 8 Feb 2018
Edited: bbb_bbb on 8 Feb 2018
There is still a problem with non-English characters: they are turned into 0xFF.
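One possible cause, offered as an assumption rather than a confirmed diagnosis: fwrite defaults to uint8 precision, so any character with a code above 255 saturates to 255 (0xFF). The sketch below assumes the file is UTF-8 encoded, opens both files with an explicit encoding, and writes with 'char' precision. The temporary file names and the demo content are made up for illustration.

```matlab
% Sketch only: assumes UTF-8 files; the temp file names are placeholders.
inFile  = [tempname '.txt'];
outFile = [tempname '.txt'];

% demo input: a Cyrillic word (given as code points), three empty lines, then 'abc'
sample = [char([1084 1080 1088]) sprintf('\n\n\n\n') 'abc' sprintf('\n')];
fid = fopen(inFile,'w','n','UTF-8');
fwrite(fid,sample,'char');
fclose(fid);

% read the whole file as text, decoding with the encoding given to fopen
[fid,msg] = fopen(inFile,'r','n','UTF-8');
assert(fid>=3,msg)
txt = fread(fid,[1,Inf],'*char');
fclose(fid);

% collapse runs of three or more newlines into two newlines (one empty line)
txt = regexprep(txt,'(\r?\n\r?\n)(\r?\n)+','$1');

% write as characters so that the text is encoded, not saturated to uint8
[fid,msg] = fopen(outFile,'w','n','UTF-8');
assert(fid>=3,msg)
fwrite(fid,txt,'char');
fclose(fid);
```

If the file uses a different encoding (for example windows-1251), the encoding name passed to fopen would need to change accordingly.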


Asked: on 8 Feb 2018
Edited: on 9 Feb 2018
