How to read strings from file with fscanf or sscanf (NOT textscan)?

Question

LifeSux SuperHard on 17 May 2013

https://www.mathworks.com/matlabcentral/answers/76197-how-to-read-strings-from-file-with-fscanf-or-sscanf-not-textscan

I'm having a little trouble reading a text file that is organized in columns, in the order shown below. What I would like to do is store the number, character, and string columns separately in arrays.

[Numbers] [Characters] [Strings]

Now, while I have figured out how to read the number and character columns into their own arrays, I cannot seem to do so with the string column. At least, not with fscanf or sscanf, which are the commands I want to use.

How can you read a file organized as such using fscanf or sscanf? (I know about textscan, I want to know if this is possible with fscanf or sscanf).

The first thing I tried was the following:

fid = fopen('Data.txt', 'r');   % 'r' opens for reading; 'w+' would discard the file contents
B = fscanf(fid, '%d %c %s', [3,inf]);

Now while this worked fine for just the numbers and chars (i.e. B = fscanf(fid, '%d %c', [2,inf])), it fails for the above in the sense that it reads everything out of order (e.g. instead of B = [1,2,3...; a,b,c...; ABC, DEF, GHI...] I get B = [1,65,65; 66, 67, 2;...], just junk basically).
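The scrambled order can be reproduced without a file. The following is a minimal sketch using sscanf on an in-memory string; the sample text and variable names are made up for illustration. It shows that a mixed numeric/character format returns every character as its own numeric code, so each line contributes five values rather than three.

```matlab
% Three lines in the layout described above (sample values are illustrative).
txt = sprintf('1 A ABC\n2 B DEF\n3 C GHI');

% With a mixed format, each character is returned as its character code,
% so every line yields 5 numbers: 1 (number) + 1 (char) + 3 (string).
v = sscanf(txt, '%d %c %s');              % 15-by-1 numeric column vector

% Requesting 3 rows cuts that stream every 3 values, across line boundaries.
B3 = sscanf(txt, '%d %c %s', [3, Inf]);   % 3-by-5
disp(B3.')                                % rows start: 1 65 65 / 66 67 2 / ...
```

Because the output is filled column by column, the second column of B3 already mixes the end of 'ABC' with the number from the next line.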

So I researched a bunch and tried out this:

fid = fopen('Data.txt', 'r');   % 'r' opens for reading; 'w+' would discard the file contents
i = 1;
while ~feof(fid)
     line = fgets(fid);
     M(i) = sscanf(line, '%d, %c, %s', [3,inf]);
     i = i+1; 
end

This runs, but M ends up coming out only as a row vector consisting of the first column of numbers in the data file. It just completely ignores the existence of chars and strings.
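One likely cause, assuming the data lines contain no commas (as in the sample posted later in the thread): literal text in a sscanf format must match the input exactly, and scanning stops at the first mismatch. The commas in '%d, %c, %s' are such literals. A small sketch with an illustrative line:

```matlab
tline = '1 A ABC';                   % one space-separated line, no commas

% The literal ',' after %d does not match the space in the text,
% so sscanf stops after the first number.
a = sscanf(tline, '%d, %c, %s');     % returns 1

% Without the commas the whole line is consumed (characters as codes).
b = sscanf(tline, '%d %c %s');       % returns [1; 65; 65; 66; 67]
```

This would also explain why M(i) = ... does not raise an error: the right-hand side is a single number. With the commas removed, each call returns five values, which no longer fit into the single element M(i).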

Now, to get a better understanding of the sscanf function I tried the following

fid = fopen('Data.txt', 'r');   % 'r' opens for reading; 'w+' would discard the file contents
    i = 1;
    while ~feof(fid)
         line = fgets(fid);
         M(i) = sscanf(line, '%d, %d, %d', [3,inf]);
         i = i+1; 
    end

This was for a sample data set consisting of just columns of numbers. Incidentally, it does exactly the same thing as before: it reads the first number column of the data and quits. So, basically, I don't even know how to use sscanf, feof, or fgets properly, and I could use some help here as well.
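For the numbers-only case, one possible shape of a line-by-line loop is sketched below. It assumes three space-separated integers per line; the temporary file and the variable names are made up so that the example is self-contained. fgetl returns the line without its newline and returns -1 (not a char) at end of file, which gives a simple loop condition.

```matlab
% Create a small sample file (hypothetical content: 3 integers per line).
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, '1 2 3\n4 5 6\n');
fclose(fid);

fid = fopen(fname, 'r');             % open for reading
M = zeros(0, 3);                     % one row will be appended per line
tline = fgetl(fid);                  % -1 at end of file
while ischar(tline)
    % No commas in the format, since the data has none.
    % [1, 3] asks for a 1-by-3 row, which is assigned to a whole row of M.
    M(end+1, :) = sscanf(tline, '%d %d %d', [1, 3]);
    tline = fgetl(fid);
end
fclose(fid);
delete(fname);
```

The key differences from the loop above are the format without commas and the assignment to a full row M(end+1, :) instead of a single element M(i).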

And I know trying to read just columns of numbers is trivial with fscanf, but I'm trying to understand sscanf and fgets here.

3 Comments

Cedric on 17 May 2013

Edited: Cedric on 17 May 2013

Could you provide us with a few lines of your data file? There are several options that we can discuss. Meanwhile, be aware that 65 is the ASCII code of the character 'A', 66 that of 'B', etc., so what you thought was junk is not: it is 'ABC', just not read/interpreted/displayed as a string.
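The correspondence between characters and codes can be checked directly with double and char (variable names are illustrative):

```matlab
codes = double('ABC');        % character codes: [65 66 67]
txt   = char([65 66 67]);     % back to text: 'ABC'
```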

LifeSux SuperHard on 17 May 2013

Edited: LifeSux SuperHard on 17 May 2013

It is ASCII code, I imagine.

1 A ABC
2 A ABC
3 A ABC
4 A ABC
5 A ABC
6 A ABC
7 A ABC
8 A ABC
9 A ABC
10 A ABC

I created this expressly for testing these functions, but that is how my real data is organized (well, sort of, my data only has numbers and strings, and the strings are only two letters long, not three).

And yes, when I tried this with a column of numbers and a column of characters with fscanf I get

  1   65
  2   65
  3   65
...  ...

but this is simple to convert back to a character array (c = char(B(:,2)')). However, with the numbers, strings, and characters I get the following with fscanf

1 65 65
66 67 2
65 65 66
... ... ...

Where I would want something like

1 65 ?
2 65 ?
3 65 ?
... ... ...

Answer 1

Cedric on 18 May 2013

https://www.mathworks.com/matlabcentral/answers/76197-how-to-read-strings-from-file-with-fscanf-or-sscanf-not-textscan#answer_85861

Edited: Cedric on 18 May 2013

Just a few alternate thoughts (and I'll think about FSCANF a little more over the weekend).

=== Using REGEXP (available in almost all languages):

.. and the following content (to illustrate the flexibility):

1 A ABC
2 B ABC
3 C ABC DEF
4 D ABC
5 E ABC FGH
6 F ABC
7 G ABC
8 H ABC
9 I ABC
10 J ABC

Code:

 >> buffer = fileread('data.txt') ;    % Could be performed with FOPEN/FREAD 
                                       % to be more generic.
 >> pattern = '(?<Column1>\d+)\s(?<Column2>\w+)\s+(?<Column3>.*?)[\r\n]' ;
 >> n = regexp(buffer, pattern, 'names')
 n = 
 1x10 struct array with fields:
    Column1
    Column2
    Column3
 >> n(2)
 ans = 
    Column1: '2'
    Column2: 'B'
    Column3: 'ABC'
 >> n(3)
 ans = 
    Column1: '3'
    Column2: 'C'
    Column3: 'ABC DEF'
 >> str2double({n(:).Column1})
 ans =
     1     2     3     4     5     6     7     8     9    10

etc. Here I used named tokens and a struct array output, just for the fun of it. I don't think that it is what you are looking for, but I wanted to illustrate a regexp-based approach for the record.

=== Reading array of chars and converting to cell array based on position of spaces and \n and/or \r:

... to update if asked by OP.
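This section was left open in the answer. As a rough sketch of what such an approach might look like (not the answerer's code), the text can be split on runs of whitespace and the tokens regrouped three per line. It assumes exactly three whitespace-free fields on every line, so it would not handle an entry such as 'ABC DEF'; the sample text and names are made up.

```matlab
% Sample content as it would be returned by fileread (illustrative values).
buffer = sprintf('1 A ABC\n2 B DEF\n3 C GHI\n');

% Split on any run of spaces, \r or \n; strtrim avoids an empty last token.
tokens = regexp(strtrim(buffer), '\s+', 'split');   % 1-by-9 cell array

% Regroup: 3 tokens per line, one line per row.
tokens = reshape(tokens, 3, []).';                  % 3-by-3 cell array

nums = str2double(tokens(:, 1)).';                  % [1 2 3]
col2 = tokens(:, 2);                                % {'A'; 'B'; 'C'}
col3 = tokens(:, 3);                                % {'ABC'; 'DEF'; 'GHI'}
```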

=== Using FSCANF:

.. and the following, more regular content:

1 A ABC
2 B ABC
3 C ABC
4 D ABC
5 E ABC
6 F ABC
7 G ABC
8 H ABC
9 I ABC
10 J ABC

Code:

 fid  = fopen('data_regular.txt', 'r') ;
 data = cell(1e6, 3) ;                    % Prealloc.
 rCnt = 0 ;                               % Row counter.
 while ~feof(fid)
    rCnt = rCnt + 1 ;
    data{rCnt,1} = fscanf(fid, '%d', 1) ;
    data{rCnt,2} = fscanf(fid, '%s', 1) ;
    data{rCnt,3} = fscanf(fid, '%s', 1) ;
 end
 fclose(fid) ;
 data = data(1:rCnt,:) ;                  % Truncate.

Using this, we get:

 >> data
 data = 
    [ 1]    'A'    'ABC'
    [ 2]    'B'    'ABC'
    [ 3]    'C'    'ABC'
    [ 4]    'D'    'ABC'
    [ 5]    'E'    'ABC'
    [ 6]    'F'    'ABC'
    [ 7]    'G'    'ABC'
    [ 8]    'H'    'ABC'
    [ 9]    'I'    'ABC'
    [10]    'J'    'ABC'

Note that EOF should be tested more carefully: checking it only once per three FSCANF calls assumes a well-formed file. Alternatively, the whole loop could be wrapped in a TRY/CATCH statement.
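One way to make the end-of-file handling less fragile, sketched under the assumption that every record starts with an integer: stop as soon as the numeric read comes back empty, instead of relying on feof. This also copes with a trailing newline after the last record. The temporary file and its content are made up so that the example is self-contained.

```matlab
% Create a small sample file with a trailing newline (illustrative content).
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, '1 A ABC\n2 B DEF\n3 C GHI\n');
fclose(fid);

fid  = fopen(fname, 'r');
data = cell(0, 3);
while true
    num = fscanf(fid, '%d', 1);      % empty when no number is left
    if isempty(num)
        break                        % end of data (or malformed record)
    end
    c2 = fscanf(fid, '%s', 1);       % second column
    c3 = fscanf(fid, '%s', 1);       % third column
    data(end+1, :) = {num, c2, c3};  % append one row
end
fclose(fid);
delete(fname);
```

Growing the cell array row by row is slower than preallocating as in the code above; it is used here only to keep the sketch short.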

=== Using FGETL + SSCANF:

It is more complicated than FSCANF, because the latter advances an internal file pointer as it reads the content, so the next read operation picks up where the previous one stopped. SSCANF doesn't work like this: each call starts at the beginning of the string, so you have to indicate in the format what to extract and what to skip. To illustrate:

 >> s = '12 A ABC' ;
 >> sscanf(s, '%d')                 % OK for the number.
 ans =
     12
 >> sscanf(s, '%s')                 % Can we do the same for the 2nd col? KO.
 ans =
 12AABC
 >> sscanf(s, '%*d %s', 1)          % Skip # and read a 1 char string => KO, ASCII.
 ans =
 65
 >> char(sscanf(s, '%*d %s', 1))    % => char, OK.
 ans =
 A
 >> char(sscanf(s, '%*d %s %*s'))   % Or read a string and skip next.
 ans =
 A
 >> char(sscanf(s, '%*d %*s %s'))   % Same for 3rd column, but dim KO.
 ans =
 A
 B
 C
 >> char(sscanf(s, '%*d %*s %s')).' % Transpose, OK.
 ans =
 ABC
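The calls above can be combined into a loop over lines. The sketch below uses a hypothetical cell array of lines in place of repeated FGETL calls, and stores the text columns in cell arrays so that strings of different lengths are not a problem. reshape(..., 1, []) is used instead of a transpose so that the result is a row regardless of the orientation SSCANF returns.

```matlab
% Hypothetical lines, as FGETL would return them one at a time.
lines = {'1 A ABC', '2 B DE', '3 C FGHIJ'};

n    = numel(lines);
nums = zeros(n, 1);
col2 = cell(n, 1);
col3 = cell(n, 1);
for k = 1:n
    s = lines{k};
    nums(k) = sscanf(s, '%d', 1);                             % 1st column
    col2{k} = reshape(char(sscanf(s, '%*d %s %*s')), 1, []);  % 2nd column
    col3{k} = reshape(char(sscanf(s, '%*d %*s %s')), 1, []);  % 3rd column
end
```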

Answer 2

per isakson on 17 May 2013

https://www.mathworks.com/matlabcentral/answers/76197-how-to-read-strings-from-file-with-fscanf-or-sscanf-not-textscan#answer_85843

Edited: per isakson on 17 May 2013

What you see is as documented. Clip from on-line help:

    sscanf finds three word matches for %s and two numeric matches for %d. Because
    the format specifier has a mixed %d and %s format, sscanf converts all 
    nonnumeric characters to numeric:
    [str count] = sscanf('5 strings and 4 spaces', '%d%s%s%d%s');
    str'
      Columns 1 through 9
         5   115   116   114   105   110   103   115    97
      Columns 10 through 18
       110   100     4   115   112    97    99   101   115
    count
    count =
         5

sscanf returns a numeric or a character array.

textscan can produce the output you are looking for.

4 Comments

LifeSux SuperHard on 17 May 2013

My real concern is that I cannot get fscanf to read a text document with strings in any obvious order, and I cannot get sscanf to work...at all really, not that they only read numeric data.

Why does sscanf only read the first column, as in the code I posted above? I don't know. I would like to know.

Incidentally, I just solved the fscanf problem. Each letter is treated as a number (as is made quite evident in the above answer), so that 'ABC' is 'n1,n2,n3'. With fscanf I was trying to read the file into an array of size [3,inf]. However, because each letter counts as its own number, I really had to be reading the file into an array of size [5,inf]. I just tried this, and it worked perfectly. I imagine I can concatenate the character arrays into one cell somehow, but I'll worry about that later. This only solves the problem if the string column consists of strings that all have the same length, though. I still wouldn't know how to read a file with columns of strings of variable length with fscanf.
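A sketch of the [5,inf] reading and of one way to turn the result into separate arrays, using sscanf on an in-memory string so that it runs without a file (sample values and variable names are illustrative). It relies on every string having exactly three characters, as noted above.

```matlab
txt = sprintf('1 A ABC\n2 B DEF\n3 C GHI');

% 5 values per line: number, one character code, three character codes.
B = sscanf(txt, '%d %c %s', [5, Inf]);    % 5-by-3, one column per line

nums  = B(1, :);                          % [1 2 3]
chars = char(B(2, :));                    % 'ABC' (one character per line)
strs  = cellstr(char(B(3:5, :).'));       % {'ABC'; 'DEF'; 'GHI'}
```

cellstr converts each row of the character matrix into one cell, which is one way to do the concatenation mentioned above.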

I still haven't solved the sscanf problem, but I'll try applying some of this new information to it.

And, with all due respect, I tried to make it fairly clear that I knew about textscan and had chosen not to use it. Yes, it produces the output I want. It works beautifully, but it is not the solution I am interested in.

The thing is, I don't know if textscan exists outside of MATLAB. However, I definitely know that sscanf and fscanf exist outside of MATLAB in languages like C, C++, Java, and others. I want to be familiar with commands that are used across many languages, not just one. And, sure, these commands probably differ slightly across languages, but practicing with them in MATLAB is better than not using them at all.

I still am very interested in anything else people have to say about the sscanf problem.

LifeSux SuperHard on 17 May 2013

yeah no problem, it was a little vague.

per isakson on 18 May 2013

Edited: per isakson on 18 May 2013

Does "the sscanf problem" refer to "everything out of order"? I assume so.

The source of the "problem" is that MATLAB fills the output array in column order, which is because of early influences from FORTRAN.
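A small illustration of the column-order fill, with made-up input: the size argument sets the number of rows, values are written down each column in turn, and a transpose recovers the row-per-record layout.

```matlab
% Six values read into 2 rows: filled column by column.
A = sscanf('1 2 3 4 5 6', '%d', [2, Inf]);      % [1 3 5; 2 4 6]

% To get one record of 3 values per row, read 3 rows and transpose.
R = sscanf('1 2 3 4 5 6', '%d', [3, Inf]).';    % [1 2 3; 4 5 6]
```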
