Error when reading the text file

Question

M on 10 Apr 2023

https://www.mathworks.com/matlabcentral/answers/1944819-error-when-reading-the-text-file

Commented: Walter Roberson on 12 Apr 2023

Accepted Answer: Walter Roberson

Attached file: output3.txt

I have the attached text file and I want to read it, but when I read it I get the error shown below.

I know that the error comes from lines like these:

1.00 0.9

9 1.00 0.99

when they should be a single line, like this:

1.00 0.99 1.00 0.99

Is there a way to rejoin these lines? If not, how can I exclude them automatically?

data = dlmread("output3.txt");
Error using dlmread
Unable to parse a "Numeric" field when reading row 918, field 1.
   Actual Text: "-"
   Expected: A number or literal "NaN", "Inf". (possibly signed, case insensitive)

Answer 1

Walter Roberson on 11 Apr 2023

https://www.mathworks.com/matlabcentral/answers/1944819-error-when-reading-the-text-file#answer_1213194

S = fileread('output3.txt');
filtered_S = strjoin(regexp(S, '^\s*\S+\s+\S+\s+\S+\s+\S+\s*$', 'match', 'lineanchors'), '\n');
data = cell2mat(textscan(filtered_S, '%f %f %f %f'));
whos data
  Name         Size             Bytes  Class     Attributes

  data      7991x4             255712  double              

M on 12 Apr 2023

Edited: M on 12 Apr 2023

@Walter Roberson Thank you, this is working well.

But could you please explain what exactly the filtering is doing?

Walter Roberson on 12 Apr 2023

For regexp,

^ together with the lineanchors option is a pattern that matches only at the beginning of lines.
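
To see what 'lineanchors' changes, here is a minimal sketch on a made-up two-line string (not from the attached file). Without the option, regexp uses its default 'stringanchors' behavior and ^ matches only at the very start of the text.

```matlab
S = sprintf('first line\nsecond line');   % made-up sample text

% Default ('stringanchors'): ^ matches only at the start of the whole text.
startsDefault = regexp(S, '^\S+', 'match');

% 'lineanchors': ^ also matches right after every newline.
startsLines = regexp(S, '^\S+', 'match', 'lineanchors');
```

Here startsDefault holds only 'first', while startsLines holds both 'first' and 'second'.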

\s matches exactly one whitespace character (in MATLAB's regexp this includes spaces and tabs, and also carriage returns and newlines), and \s* matches "zero or more whitespace characters in a row". Putting that together with the ^, the pattern ^\s* looks for the beginning of a line and skips over any leading whitespace that happens to be there.

\S matches exactly one "non-whitespace" and \S+ matches "one or more non-whitespace in a row".
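
A quick way to see what \S+ and \s+ pick out on their own is to run each against a short made-up line with irregular spacing; this is only an illustration, not part of the answer's code. Note that the tab counts as whitespace too.

```matlab
S = sprintf('  1.00\t 0.99 ');          % made-up line: leading spaces, a tab, trailing space

tokens = regexp(S, '\S+', 'match');     % runs of non-whitespace: the "columns"
gaps   = regexp(S, '\s+', 'match');     % runs of whitespace: leading, separator, trailing
```

tokens comes back as {'1.00', '0.99'}, and gaps has three entries (the leading spaces, the tab-plus-space separator, and the trailing space).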

After \S+ has finished matching the characters of a column, there are two possibilities: you have reached the end of the line, or you are positioned at one or more whitespace characters. We have matched one column so far, and we want to discard lines that do not have four columns, so we continue with \s+, which is "one or more whitespace characters".

Then we are back to \S+ to match the second column, \s+ to match the space between the second and third columns, \S+ to match the third column, \s+ to match the space between the third and fourth columns, and \S+ to match the fourth column. After that, \s* matches "zero or more whitespace", so it absorbs any whitespace (possibly none) after the fourth column.

Lastly in the pattern $ together with the lineanchors option matches an end of line.

So we have constructed a pattern that matches exactly four columns of characters separated by whitespace, with optional whitespace at the beginning and end of line.
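
One caveat worth checking: because \s in MATLAB's regexp also matches newline characters, the \s+ between columns can step across a line break. The sketch below uses a made-up sample in which a 3-column line is followed by a 1-column line; the pattern from the answer accepts the pair as one "four-column" match. The variant pattern p2 is an assumption of this sketch (not from the thread): it allows only spaces or tabs between columns, plus an optional carriage return at the end of the line so that Windows-style line endings still match.

```matlab
S = sprintf('1 2 3\n4\n5 6 7 8\n');       % made-up sample: 3 columns, 1 column, 4 columns

% Pattern from the answer: \s+ may match the newline after "3".
p1 = '^\s*\S+\s+\S+\s+\S+\s+\S+\s*$';
m1 = regexp(S, p1, 'match', 'lineanchors');

% Variant: columns separated only by spaces/tabs; [ \t\r]* tolerates CRLF endings.
p2 = '^[ \t]*\S+[ \t]+\S+[ \t]+\S+[ \t]+\S+[ \t\r]*$';
m2 = regexp(S, p2, 'match', 'lineanchors');
```

With p1, the first match is the two lines "1 2 3" and "4" glued together (two matches in total); with p2, only "5 6 7 8" is kept. Whether this matters for the real file depends on how its broken lines are split.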

The 'match' option causes regexp() to return the matched text. There are ways to modify exactly what is returned, but in this case the output is the entire "inside" of the line, from its first to its last character (excluding the end-of-line characters).

When you do not use the 'once' option, regexp() continues processing the input, going back to the start of the pattern -- so it is looking through the file, picking out all the lines that have exactly four columns of text, and returning only those. Any line that does not have exactly four columns of text will be discarded by this pattern.

With these particular options and pattern, regexp() will return a cell array of character vectors, one entry for each line that it matched.
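
The difference between the default behavior and 'once' can be checked on a made-up two-row string; this is only an illustration of the two return types, using the pattern from the answer.

```matlab
S = sprintf('1 2 3 4\n5 6 7 8');           % made-up sample with two valid rows
pat = '^\s*\S+\s+\S+\s+\S+\s+\S+\s*$';     % pattern from the answer

allRows  = regexp(S, pat, 'match', 'lineanchors');          % cell array, one entry per matching line
firstRow = regexp(S, pat, 'match', 'once', 'lineanchors');  % character vector, first match only
```

allRows is {'1 2 3 4', '5 6 7 8'}, while firstRow is the plain character vector '1 2 3 4'.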

strjoin() splices all of the entries in the cell array of character vectors back together, with a newline between them. The result is a character vector in which every line has exactly four columns; it differs from the original file only in that all lines that did not have exactly four columns have been discarded.
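
The whole pipeline (filter, strjoin, textscan, cell2mat) can be tried end to end on a small made-up sample that contains the split line from the question. This is a sketch for illustration; for the real data, S would come from fileread('output3.txt') as in the answer.

```matlab
% Made-up sample: a good row, the split row from the question, another good row.
S = sprintf('1.00 0.99 1.00 0.99\n1.00 0.9\n9 1.00 0.99\n0.50 0.25 0.75 1.00\n');
pat = '^\s*\S+\s+\S+\s+\S+\s+\S+\s*$';

rows = regexp(S, pat, 'match', 'lineanchors');   % only the two good rows survive
filtered_S = strjoin(rows, '\n');                % strjoin turns the '\n' escape into a newline
data = cell2mat(textscan(filtered_S, '%f %f %f %f'));
```

data ends up as a 2-by-4 matrix; both halves of the split row were dropped.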

Note that if you just happened to have a split line such as

1.0

0 0.99 1.00 0.99

then regexp would see that second line as having four columns and would blindly accept that line, discarding the 1.0 line. This pattern is not foolproof.
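
That failure mode can be reproduced with the example above as a made-up two-line input: the "1.0" fragment is dropped and the second half of the row is accepted, so the first value comes out as 0 instead of 1.00.

```matlab
S = sprintf('1.0\n0 0.99 1.00 0.99\n');    % the split-row example from the text
pat = '^\s*\S+\s+\S+\s+\S+\s+\S+\s*$';

kept = regexp(S, pat, 'match', 'lineanchors');
data = cell2mat(textscan(strjoin(kept, '\n'), '%f %f %f %f'));
```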

It would be possible in theory to:

detect a line that seemed to start validly with a floating point number, but had too few columns, and so infer that the following line must be corrupt and so discard the current line along with the following line
detect a line that started with no whitespace and immediately had an integer with no decimal point, and infer that the previous line must have been split and so discard the current line along with the previous line

... but it would be a bit of a nuisance.
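
The first idea can be sketched roughly as below, with one change: instead of discarding the fragment and the following line, it tries to glue them back together, which is what the question asked for. This is a hypothetical sketch, not code from the thread. Its assumptions: a record is broken by a single inserted line break, possibly in the middle of a number (as in the question's "0.9" / "9" example); a "fragment" is a line of 1 to 3 numeric tokens; and tok2num, isValid and isPartial are made-up helper names.

```matlab
% Made-up sample; for the real data use S = fileread('output3.txt');
S = sprintf('1.00 0.99 1.00 0.99\n1.00 0.9\n9 1.00 0.99\n-\n0.50 0.25 0.75 1.00\n');
ncols = 4;

% Hypothetical helpers: turn a line into numbers, then classify the result.
tok2num   = @(t) str2double(regexp(t, '\S+', 'match'));
isValid   = @(v) numel(v) == ncols && all(~isnan(v));                 % complete row
isPartial = @(v) numel(v) >= 1 && numel(v) < ncols && all(~isnan(v)); % numeric fragment

txtLines = regexp(S, '\r?\n', 'split');
data = zeros(0, ncols);
k = 1;
while k <= numel(txtLines)
    v = tok2num(txtLines{k});
    if isPartial(v) && k < numel(txtLines)
        % Assumption: the break was inserted inside a record, possibly
        % mid-number, so glue the next line on with no separator.
        joined = tok2num([txtLines{k} txtLines{k+1}]);
        if isValid(joined)
            v = joined;
            k = k + 1;   % the next line was consumed by the join
        end
    end
    if isValid(v)
        data(end+1, :) = v; %#ok<AGROW>
    end
    k = k + 1;
end
```

On this sample the split row is rejoined as 1.00 0.99 1.00 0.99, the "-" line is dropped, and data is 3-by-4. For the "1.0" / "0 0.99 1.00 0.99" case it also rejoins the row instead of accepting the second line alone. If a break replaced a space rather than splitting a number, gluing with no separator produces text such as "0.991.00", which does not parse, so that fragment is dropped; the heuristic can still pair the wrong lines in unlucky cases, so the result is worth spot-checking.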

Answer 2

Image Analyst on 10 Apr 2023

https://www.mathworks.com/matlabcentral/answers/1944819-error-when-reading-the-text-file#answer_1213179

Edited: Image Analyst on 10 Apr 2023

It's a highly corrupted file; it's a dumpster fire.

Numbers are missing on some lines, and many lines are blank with no numbers at all. Some lines have text and numbers both in the line.

I suggest you investigate why your file is being written out so messed up in the first place rather than try to read and repair a badly damaged file.

Image Analyst on 11 Apr 2023

Edited: Image Analyst on 11 Apr 2023

Then perhaps you should post the original file before you manually altered it.

To process a sequence of files, see the FAQ:

https://matlab.fandom.com/wiki/FAQ#How_can_I_process_a_sequence_of_files?

[EDIT] Wait a minute. Are you saying the corrupted file you attached was the original file, or the one you manually edited?

M on 12 Apr 2023

I attached the original file
