fprintf is printing strange characters instead of numbers
I want to print a vector of unsigned integers to a text file, with a space between each number, but the file I get contains only weird symbols. I must be making some trivial mistake: it is not the first time this has happened, but I can't remember what the fix was. It's happening on R2018b (and I remember it happening on older versions as well). Sample code is below:
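clear
data = uint32(zeros(1, 1615));        % 1615 unsigned 32-bit zeros
data(1:2:50) = 1;                     % set every other element of the first 50 to 1
output = fopen('output.txt', 'wt');   % open for writing in text mode
fprintf(output, '%d ', data);         % write each value followed by a space
fclose(output);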
Output I get is: ‱‰‱‰‱‰‱‰‱‰‱‰‱‰‱‰‱‰‱‰‱‰‱‰‱‰‱‰‱‰‱‰‱‰‱‰‱‰‱‰‱‰‱‰‱‰‱‰‱‰‰‰‰‰‰‰‰‰‰ ...
Output I want is: 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 ...
17 Comments
Paolo Binetti
on 3 Oct 2018
Rik
on 3 Oct 2018
I cannot reproduce your problem in R2018b. I suspect there might be an issue with uint32 as an input, but I agree with Guillaume, this should not happen.
Guillaume
on 3 Oct 2018
Looks fine to me. What program are you using to look at its content?
Stephen23
on 3 Oct 2018
Opening the file with Notepad++ shows the 0's and 1's as expected.
Opening the file with MS Notepad shows those unwanted symbols.
What OS are you using?
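For what it's worth, the symbols in the question are exactly what the file's bytes look like when they are decoded as UTF-16LE. That suggests MS Notepad is guessing the wrong encoding rather than MATLAB writing bad data, although this is an assumption on my part and nothing in the thread confirms how Notepad picks an encoding. A minimal sketch using native2unicode:

```matlab
% Bytes that fprintf(output, '%d ', data) writes for the values 1 0 1 0
bytes = uint8('1 0 1 0 ');            % [49 32 48 32 49 32 48 32]

% Assumption: the viewer decodes the file as UTF-16LE instead of ANSI/UTF-8.
% Each pair of bytes then becomes one 16-bit code unit, low byte first.
misread = native2unicode(bytes, 'UTF-16LE');

% 0x31 0x20 -> U+2031 (per ten thousand sign)
% 0x30 0x20 -> U+2030 (per mille sign)
codepoints = double(misread);
disp(dec2hex(codepoints))
```

So the file content itself is the expected ASCII digits and spaces; only the interpretation differs between editors.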
Paolo Binetti
on 3 Oct 2018
"could you suggest an alternative way?"
- Don't use MS Notepad. It is very weak on features and vulnerable to pointless newline confusion. There are much better text editors around.
- Save as UTF-8 rather than ANSI. When I opened your file and saved it encoded as UTF-8 it could be opened using MS Notepad without any problems (attached as output1.txt).
Paolo Binetti
on 3 Oct 2018
Guillaume
on 3 Oct 2018
"When I opened your file and saved it encoded as UTF-8 it could be opened using MS Notepad"
You must have saved it as UTF16. For the three characters '01 ' in the file there is no difference at all between ANSI and UTF8.
"You must have saved it as UTF16. For the three characters '01 ' in the file there is no difference at all between ANSI and UTF8."
For the characters maybe, but not for the file as a whole: the inclusion of a Byte Order Mark is a significant change. And I notice that excluding the Byte Order Mark made this file unreadable with those same strange characters, even with UTF-8 (as selected by Notepad++).
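If the file really has to open in MS Notepad, the Byte Order Mark can also be written from MATLAB directly. This is only a sketch: the file name output_bom.txt is made up for the example, and I am assuming that the three UTF-8 BOM bytes are what makes the difference, as the observation above suggests.

```matlab
data = uint32(zeros(1, 1615));
data(1:2:50) = 1;

fid = fopen('output_bom.txt', 'w');    % hypothetical file name
assert(fid ~= -1, 'Could not open file for writing');
fwrite(fid, [239 187 191], 'uint8');   % UTF-8 byte order mark: EF BB BF
fprintf(fid, '%d ', data);             % digits and spaces are plain ASCII
fclose(fid);

% Read the first raw bytes back to check what ended up in the file
fid = fopen('output_bom.txt', 'r');
head = fread(fid, 7, '*uint8')';
fclose(fid);
disp(head)
```

Be aware that some tools treat the BOM as data, so a program that later parses this file may need to skip the first three bytes.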
Rik
on 3 Oct 2018
It is surprisingly difficult to figure out the encoding of a file, even if it is written correctly. There is a reason I wrote my readfile FEX submission in exasperation. I couldn't find any way to read the BOM with Matlab tools.
Walter Roberson
on 3 Oct 2018
Rik, see also my detect_utf_encoding: https://www.mathworks.com/matlabcentral/answers/285186-importing-data-without-knowing-number-of-columns#comment_368710
(I also have internal versions that detect more file types, such as figuring out that a file is an image file or .mat file. I did some research towards trying to figure out which character set a file was probably in, but it looks like I gave up on implementing that.)
Rik
on 3 Oct 2018
That is quite a neat piece of code, thank you for sharing it. I think it is strange that it is so difficult to figure out. Even the reportedly byte-by-byte reading seemingly skips those header bytes.
After testing it, it seems to fail for UTF-8 files I created with Matlab (it detects them as windows-1252). This shouldn't be this hard.
Walter Roberson
on 3 Oct 2018
Was your test file the same as from your FEX submission?
Rik
on 3 Oct 2018
I've attached a zip file with the 4 test files, a tester function and the version of readfile I just checked the files with (slightly tweaked compared to the FEX version, mostly cosmetic changes).
For completeness' sake I also included the version of detect_UTF_encoding you linked to and inserted that in my tester script.
Walter Roberson
on 3 Oct 2018
Yes, unfortunately if there is no byte order mark then it gets more difficult to figure out without false positives.
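For the easy case where a byte order mark is present, the check can be sketched as below. This is not the linked detect_utf_encoding, just an illustration of the idea; the variable names are made up, and a literal byte vector stands in for the first bytes of a file.

```matlab
% First bytes of the file. With a real file this would be something like
%   fid = fopen(name, 'r'); head = fread(fid, 4, '*uint8')'; fclose(fid);
head = uint8([239 187 191 49]);

% Known byte order marks. UTF-32LE must be tested before UTF-16LE,
% because FF FE 00 00 also starts with FF FE.
boms  = {[0 0 254 255], [255 254 0 0], [239 187 191], [254 255], [255 254]};
names = {'UTF-32BE',    'UTF-32LE',    'UTF-8',       'UTF-16BE', 'UTF-16LE'};

encoding = '';   % empty means "no BOM found", which does not imply ANSI
for k = 1:numel(boms)
    n = numel(boms{k});
    if numel(head) >= n && isequal(head(1:n), boms{k})
        encoding = names{k};
        break
    end
end
disp(encoding)
```

Without a BOM the result stays empty, and that is exactly the case where guessing starts.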
I have some internal versions of the detection routine that proceed to detect zip and image files and xls and .mat files; my intention was to proceed onwards to detect character encoding such as ISO-8859-1 versus ISO-8859-6 or SHIFT-JIS. I researched that, but once I got into the windows code pages, the number of cases was starting to feel too big for me to bother. Some of the cases could not be told apart (except perhaps by statistics); others had only a single code point difference (that is, one code point was assigned a meaning in one character set but not in the other character set), so if you saw the one code point you could disprove a particular character set, but lack of it would not prove the other...
Rik
on 3 Oct 2018
That is why I came close to giving up, even for telling UTF-8 and windows-1252 apart, which should be relatively easy, but turns out to be non-trivial. Notepad++ seems to do a better job than what I can manage with Matlab.
For the files I'm using, my FEX submission works, but I have no idea how future-proof that function is (or past-proof for that matter).
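One heuristic for telling UTF-8 and windows-1252 apart without a BOM is to test whether the bytes are valid UTF-8 at all. The sketch below is not what readfile or Notepad++ do, as far as I know. It relies on the assumption that native2unicode substitutes a replacement character for malformed sequences, so that malformed input does not survive a decode and encode round trip; survivesUtf8 is a hypothetical helper name.

```matlab
% Two encodings of the text 'é ': windows-1252 uses one byte for é, UTF-8 two
bytes1252 = uint8([233 32]);
bytesUtf8 = uint8([195 169 32]);

% Hypothetical helper: decode as UTF-8, encode again, compare with the input.
% Assumption: malformed UTF-8 is replaced during decoding and so changes.
survivesUtf8 = @(b) isequal(unicode2native(native2unicode(b, 'UTF-8'), 'UTF-8'), b);

is1252FileUtf8 = survivesUtf8(bytes1252);   % 0xE9 followed by a space is malformed UTF-8
isUtf8FileUtf8 = survivesUtf8(bytesUtf8);   % a well-formed two-byte sequence
```

A failed round trip rules out UTF-8, but a successful one proves nothing for pure ASCII files, since those are identical in both encodings.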