Directory listing of extended ASCII in Windows

Jim Hokanson on 24 Aug 2013
EDIT: This question raised some interesting issues but I don't consider it to be answered. Based on feedback from this question I have asked a similar question with a much more specific task, http://www.mathworks.com/matlabcentral/answers/86186-working-with-unicode-paths.
ORIGINAL: Hi all,
I have a filename with 'é' in it. dir() doesn't handle it correctly and reports the character as two separate characters, 'e´'. I'm using Win 7. Is there a setting I can change in Matlab or Windows to get this to work right? If I use Java, things seem to work fine:
my_java_dir = java.io.File(my_dir);
file_list = my_java_dir.listFiles();
I'd rather things "just work" instead of using Java.
Thoughts?
Thanks, Jim
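For reference, the Java workaround described above could be wrapped up roughly as follows. This is only a sketch: my_dir is set to the current folder as a stand-in for the real path, and the names variable is introduced here for illustration.

```matlab
% Sketch of the Java-based listing; my_dir is assumed to be an existing folder.
my_dir = pwd;                               % stand-in for the real path
my_java_dir = java.io.File(my_dir);
file_list = my_java_dir.listFiles();        % Java array of java.io.File objects
names = cell(length(file_list), 1);
for k = 1:length(file_list)
    % getName() returns a java.lang.String; char() converts it to a MATLAB char array
    names{k} = char(file_list(k).getName());
end
```

Comparing double(names{k}) against double() of the matching dir() entry shows whether the two listings disagree on the code points.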
EDIT: This is a summary of some of the comments:
The code I am running is:
temp = dir(my_path);
file_name = temp([#]).name
For a file on Windows automatically generated using a proprietary program, the file name includes the following character: 'é'.
In Matlab, however, file_name contains the following chars instead: 'e´'.
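One possible reading of the two-character result is that the name is stored in decomposed form: a plain 'e' followed by a combining acute accent. The thread does not confirm the actual code points, so the values below (101 followed by 769, i.e. U+0301) are an assumption. Under that assumption, a sketch of recomposing the name via Java's java.text.Normalizer from within MATLAB might look like this:

```matlab
% Assumption: the name is decomposed, 'e' (101) + combining acute accent (769).
decomposed = char([116 101 769 115 116 46 116 120 116]);   % 't', 'e', U+0301, 'st.txt'

% The nested enum Normalizer.Form is reached through its binary class name.
nfc = javaMethod('valueOf', 'java.text.Normalizer$Form', 'NFC');

% Normalize to the composed form and convert back to a MATLAB char array.
composed = char(java.text.Normalizer.normalize(java.lang.String(decomposed), nfc));

double(decomposed)   % code points before normalization
double(composed)     % if the assumption holds, 101 769 is replaced by 233
```

If double(file_name) shows something other than 101 followed by 769 (for example the spacing accent 180), the name is not simply decomposed and this approach would not apply.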
From what I can tell, using native Matlab functionality, it is not possible to read a non 7-bit ascii file on a mac:
EDIT: I did not realize this was going to be as difficult to actually accomplish (i.e. to answer properly) as it has turned out to be. The details of some of the tests I have run have become a bit lost in the comments although at this point they are not relevant to a solution. At this point I don't consider the problem to be solved but I don't even have a test framework for trying to solve this problem! When I get a chance I'll be uploading an example file for people to test. Thanks.
  10 Comments
Jan on 27 Aug 2013
@Jim: This is an important question, and equivalent problems will occur in the work of many users. The seemingly humorous parts of my replies come from frustration after struggling with Unicode for too long. But the problem is serious, and so is my suggestion to avoid non-ASCII characters.

Answers (2)

Walter Roberson on 24 Aug 2013
What is the underlying file system type of the directory you are trying to work with? If it is not NTFS then you have a problem; see http://msdn.microsoft.com/en-us/library/windows/desktop/dd317748%28v=vs.85%29.aspx
  3 Comments
Jim Hokanson on 25 Aug 2013
Dead bytes, yikes! I like +0, easier to type than double(str). I've added some clarifications in response to Jan's question, see above. Thanks.

Jan on 25 Aug 2013
Edited: Jan on 25 Aug 2013
This sounds totally cruel. I've also struggled with UTF-16 and UTF-8 conversions for file access.
When I run this on my Win7/64 PC/local NTFS disk/Language = 'en_us.windows-1252' I get the expected correct results:
str = ['t', 233, 'st.txt'];
fid = fopen(str,'w');
fclose(fid);
a = dir('t*.txt'); % other patterns do not change the answer
double(a.name)
>> 116, 233, 115, 116, 46, 116, 120, 116
The name is also displayed correctly in Windows Explorer. But the DOS command DIR fails, of course:
!dir t*st.txt
>> 25.08.13 23:20 8 tst.txt
It matters what "yields on disk" means exactly. How did you test this?
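To make such tests comparable between machines, it may help to print the numeric code points of every name that dir() returns instead of relying on how the name is displayed. A minimal sketch, assuming the files of interest are in the current folder:

```matlab
% List every entry whose name contains a character outside 7-bit ASCII,
% together with its numeric code points.
a = dir(pwd);
for k = 1:numel(a)
    codes = double(a(k).name);
    if any(codes > 127)
        fprintf('%s: %s\n', a(k).name, mat2str(codes));
    end
end
```

A precomposed 'é' shows up as 233, as in the output above; anything else in that position points to a different encoding or a decomposed name.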
  5 Comments
Jim Hokanson on 27 Aug 2013
Jan, I agree, don't use special characters in file names. I tend not to, but this particular example came from some file "in the wild." It would be nice to have a well-documented set of rules for what can and can't be done with respect to Unicode. For example, Matlab's use of a 16-bit (2-byte) character means it is impossible to accurately handle UTF-8 data streams, which are only well mapped to UTF-32 (4-byte character) data. Like many things, I think the first step is probably well-documented (centrally, i.e. by TMW) usage modes and failure points.
Cédric, the problem actually comes from a Hungarian name, Georg Von Békésy, so it's the Hungarians that are giving me problems, not the French :)
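On the 16-bit character point above, the conversion between UTF-8 bytes and MATLAB chars can at least be checked for characters that fit in 16 bits, using the built-in unicode2native and native2unicode functions. This sketch covers only that case; how code points beyond 16 bits are handled is not addressed in this thread and is left open here.

```matlab
% Round trip of 'é' (code point 233) through UTF-8.
original = char(233);
bytes = unicode2native(original, 'UTF-8');     % uint8 values 195 169
restored = native2unicode(bytes, 'UTF-8');     % back to a single char, code 233
isequal(original, restored)                    % true for this character
```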
