Thread Subject:
mxChar to wchar_t in Linux C-mex

Subject: mxChar to wchar_t in Linux C-mex

From: Jan Simon

Date: 19 Jan, 2011 00:28:05

Message: 1 of 5

Dear readers,

How do I convert a Matlab string to a wchar_t string in a C-Mex file under Linux?

I am trying to access files in a C-Mex function and want to support Unicode characters in the file names as well. Under Windows the conversion seems to be trivial: just copy the bytes of the mxChar vector to a wchar_t vector and append a L'\0'. Both mxChar and wchar_t are actually UINT16 there.

Under Linux wchar_t has 4 bytes, so a specific conversion is needed. I thought the C function mbstowcs would do the conversion, but I can test it only on Windows, where mbstowcs stops after the first character of the Matlab string.

The documentation of the Mex-API function mxArrayToString says that "it supports multibyte character sets", but the output is a char*. Would this be correct then:
  wchar_t *WStr;
  char *Str;
  mwSize Len;
  Str = mxArrayToString(prhs[0]); // prhs[0] is a Matlab string
  Len = mxGetNumberOfElements(prhs[0]);
  WStr = (wchar_t *) mxMalloc((Len + 1) * sizeof(wchar_t));
  mbstowcs(WStr, Str, Len); // of course with checking the output
  WStr[Len] = L'\0'; // terminator

Or is there a direct way to convert mxChar to wchar_t strings?
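
For reference, the usual way to call mbstowcs is to ask it for the required length first and only then convert. The following is a minimal sketch in plain C, not a confirmed answer to the question above: mb_to_wide is a hypothetical helper name, malloc stands in for mxMalloc so the code compiles without mex.h, and the NULL-destination length query is POSIX behaviour (also supported by the Microsoft runtime). The result depends on the current LC_CTYPE locale, and nothing in this thread establishes which encoding mxArrayToString actually returns.

```c
#include <assert.h>
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical helper: convert a NUL-terminated multibyte string to a
   newly allocated wide string using the current locale.
   Returns NULL if the input is not valid in that locale or if the
   allocation fails. The caller releases the result with free(). */
wchar_t *mb_to_wide(const char *mb)
{
    size_t n;
    wchar_t *w;

    assert(mb != NULL);

    /* With a NULL destination, mbstowcs only counts the wide
       characters the conversion would produce (POSIX behaviour). */
    n = mbstowcs(NULL, mb, 0);
    if (n == (size_t) -1) {
        return NULL;              /* invalid multibyte sequence */
    }

    w = malloc((n + 1) * sizeof *w);
    if (w == NULL) {
        return NULL;
    }

    /* Room for n characters plus the terminator, so the output is
       always NUL-terminated. */
    mbstowcs(w, mb, n + 1);
    return w;
}
```

Note that the number of wide characters reported by mbstowcs need not equal mxGetNumberOfElements of the original array, which is why the sketch sizes the buffer from the query rather than from the Matlab string length.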

Google finds just a few hits for "mxChar wchar Matlab", e.g.:
http://www.mathworks.se/matlabcentral/newsreader/view_thread/236507
It was great that TMW decided to use 2-byte CHARs, but the documentation of at least Matlab 5.3 to 2009a does not really explain how mxArrayToString handles the second byte.

Thanks, Jan

Subject: mxChar to wchar_t in Linux C-mex

From: Jan Simon

Date: 19 Jan, 2011 21:58:04

Message: 2 of 5

Dear readers,

Bump. Are Unicode file names too rare to attract any interest?

I've found the undocumented mxArrayToString_UTF16, but I have not gotten it to run so far.

Kind regards, Jan

Subject: mxChar to wchar_t in Linux C-mex

From: Praetorian

Date: 20 Jan, 2011 01:37:35

Message: 3 of 5

On Jan 18, 5:28 pm, "Jan Simon" <matlab.THIS_Y...@nMINUSsimon.de>
wrote:
> Dear readers,
>
> How do I convert a Matlab string to a wchar_t string in a C-Mex file under Linux?
>
> I am trying to access files in a C-Mex function and want to support Unicode characters in the file names as well. Under Windows the conversion seems to be trivial: just copy the bytes of the mxChar vector to a wchar_t vector and append a L'\0'. Both mxChar and wchar_t are actually UINT16 there.
>
> Under Linux wchar_t has 4 bytes, so a specific conversion is needed. I thought the C function mbstowcs would do the conversion, but I can test it only on Windows, where mbstowcs stops after the first character of the Matlab string.
>
> The documentation of the Mex-API function mxArrayToString says that "it supports multibyte character sets", but the output is a char*. Would this be correct then:
>   wchar_t *WStr;
>   char *Str;
>   Str = mxArrayToString(prhs[0]);  // prhs[0] is a Matlab string
>   mwSize Len = mxGetNumberOfElements(prhs[0]);
>   WStr = (wchar_t *) mxMalloc((Len + 1) * sizeof(wchar_t));
>   mbstowcs(WStr, Str, Len);  // of course with checking the output
>   WStr[Len] = L'\0';  // terminator
>
> Or is there a direct way to convert mxChar to wchar_t strings?
>
> Google finds just a few hits for "mxChar wchar Matlab", e.g.:
> http://www.mathworks.se/matlabcentral/newsreader/view_thread/236507
> It was great that TMW decided to use 2-byte CHARs, but the documentation of at least Matlab 5.3 to 2009a does not really explain how mxArrayToString handles the second byte.
>
> Thanks, Jan

The mxArrayToString() documentation is ludicrous. If I recall
correctly, it says something like "supports multi-byte strings" and
that's it, not a single example of a UTF-16 string being converted
from an mxArray to a C-string. Since the function returns a char* I've
always assumed it converts the string to UTF-8, but mentioning that in
the documentation would be very helpful.

If you want a solution that can handle all Unicode encoding cases you
throw at it (think UTF-16 surrogate pairs etc.) then I'd suggest using
the ICU library (http://site.icu-project.org/). It is open source and
very reliable when it comes to all things Unicode.

If you don't want to use an external library, a solution that will
work in most cases is to copy the characters yourself instead of
using mxArrayToString():

wchar_t *wstring = mxMalloc( (mxGetNumberOfElements(prhs[0]) + 1) * sizeof(wchar_t) );
const mxChar *ch = mxGetChars(prhs[0]);  /* 16-bit code units of the input */
wchar_t *p = wstring;

/* Widen each 16-bit code unit to one wchar_t */
for( mwSize i = 0; i < mxGetNumberOfElements(prhs[0]); ++i ) {
  *p++ = *ch++;
}
*p = 0;  /* terminator */

This doesn't handle characters that lie outside the Basic
Multilingual Plane, which UTF-16 encodes as surrogate pairs (two
16-bit code units per character). You can find people's solutions
for handling those if you google for it, but once again, if you're
serious about supporting all such cases, I'd point you back to the
ICU library.
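
As an illustration of what handling surrogate pairs involves, here is a minimal sketch in plain C. It is not taken from ICU or from the MEX API: utf16_to_wchar is a hypothetical helper name, and uint16_t stands in for mxChar so the code compiles without mex.h (in a MEX file the source pointer would presumably come from mxGetChars). Unpaired surrogates are copied through unchanged rather than rejected, which a stricter converter might not do.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <wchar.h>

/* Hypothetical helper: convert n UTF-16 code units to a NUL-terminated
   wchar_t string. dst must have room for n + 1 elements.
   Returns the number of wchar_t written, excluding the terminator.
   With a 4-byte wchar_t (typical on Linux) each valid surrogate pair is
   combined into one code point; with a 2-byte wchar_t (Windows) the
   code units are copied as they are. */
size_t utf16_to_wchar(const uint16_t *src, size_t n, wchar_t *dst)
{
    size_t i = 0, out = 0;

    assert(src != NULL && dst != NULL);

    while (i < n) {
        uint32_t u = src[i++];

        /* High surrogate followed by a low surrogate: combine them. */
        if (sizeof(wchar_t) >= 4 &&
            u >= 0xD800 && u <= 0xDBFF &&
            i < n && src[i] >= 0xDC00 && src[i] <= 0xDFFF) {
            u = 0x10000 + ((u - 0xD800) << 10) + (src[i++] - 0xDC00);
        }
        dst[out++] = (wchar_t) u;
    }
    dst[out] = L'\0';
    return out;
}
```

Because the output never has more elements than the input, a buffer of mxGetNumberOfElements + 1 wide characters, as allocated in the loop above, is large enough for this sketch as well.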

Regards,
Ashish.

Subject: mxChar to wchar_t in Linux C-mex

From: Yair Altman

Date: 20 Jan, 2011 07:16:04

Message: 4 of 5

"Jan Simon" wrote in message <ih7mpc$q2q$1@fred.mathworks.com>...
> Dear readers,
>
> Bump. Are Unicode file names too rare to attract any interest?
>
> I've found the undocumented mxArrayToString_UTF16, but I have not gotten it to run so far.
>
> Kind regards, Jan

If you look at the libmex.dll file, you will see hundreds of other similar functions, most of them undocumented. The following names are from the R2010a version. In particular, you will find:

mxArrayToByteChars
mxArrayToString
mxArrayToString_UTF16
mxArrayToString_UTF8

...and similarly:

mxCharMatrixToStrings
mxCharMatrixToStrings_UTF16
mxCharMatrixToStrings_UTF8

...and also some reverse functions:

mxCreateCharMatrixFromStrings
MXCREATECHARMATRIXFROMSTRINGS700
_MXCREATECHARMATRIXFROMSTRINGS700@12
_MXCREATECHARMATRIXFROMSTRINGS700_@12
MXCREATECHARMATRIXFROMSTRINGS730
_MXCREATECHARMATRIXFROMSTRINGS730@12
_MXCREATECHARMATRIXFROMSTRINGS@12
mxCreateCharMatrixFromStrings_700
mxCreateCharMatrixFromStrings_730
mxCreateCharMatrixFromStrings_UTF16
mxCreateCharMatrixFromStrings_UTF8
MXCREATECHARMATRIXFROMSTRS
MXCREATECHARMATRIXFROMSTRS700
_MXCREATECHARMATRIXFROMSTRS700@12
_MXCREATECHARMATRIXFROMSTRS700_@12
MXCREATECHARMATRIXFROMSTRS730
_MXCREATECHARMATRIXFROMSTRS730@12
_MXCREATECHARMATRIXFROMSTRS@12
mxCreateCharMatrixFromStrs_700

MXCREATESTRING
mxCreateString
_MXCREATESTRING@8
mxCreateString_UTF16
mxCreateString_UTF8
mxCreateStringFromNChars
mxCreateStringFromNChars_700
mxCreateStringFromNChars_730
mxCreateStringFromNChars_UTF16
mxCreateStringFromNChars_UTF8

...and finally:

mxGetNChars
mxGetNChars_700
mxGetNChars_730
mxGetNChars_UTF16
mxGetNChars_UTF8

MXGETSTRING
mxGetString
MXGETSTRING700
_MXGETSTRING700@16
_MXGETSTRING700_@16
MXGETSTRING730
_MXGETSTRING730@16
_MXGETSTRING@16
mxGetString_700
mxGetString_730
mxGetString_UTF16
mxGetString_UTF8

Note: suffixes such as 700 or 730 probably refer to the supported Matlab version (7.0, 7.3).

Yair Altman
http://UndocumentedMatlab.com

Subject: mxChar to wchar_t in Linux C-mex

From: Jan Simon

Date: 21 Jan, 2011 10:18:05

Message: 5 of 5

Dear Ashish, dear Yair,

Thanks for the helpful answers!
The ICU library seems to be the safe solution.
The undocumented functions are only partially helpful because of the "un".

I'll send an enhancement request to TMW. If they were clever enough to choose 16-bit characters before Microsoft, Apple and the Linux community started the ridiculously inconsistent 1-, 2- and 4-byte WCHAR implementations, it would be very nice if they offered an interface to access the values of mxChar variables. The NATIVE2UNICODE function is a good start, but a documented Mex interface is needed as well.

Kind regards, Jan
