Fast initialization of a cell array of strings in MEX

I'm looking to initialize a cell array of strings in a MEX function as quickly as possible. The input is one long array of characters plus an additional array of start and stop locations that mark where each string begins and ends. Any tips on doing this quickly in MATLAB?
I am planning on doing roughly the following:
  • initialize the cell array with mxCreateCellArray
  • initialize each cell element with an empty value using mxCreateCharArray; I think using an empty value will avoid any initialization overhead. It would also be great if I could create a bunch of mxArray headers quickly in MATLAB without having to call a function for each one.
  • populate each cell's data (the string value) with string data that is 2 bytes per character, using mxSetData, mxSetM, and mxSetN. Here it seems that a smart memory allocator would allow allocating one large block of memory and then breaking it into smaller chunks, rather than requesting memory from the OS individually for each string.
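Before any of those steps run, the start/stop array already tells you how many strings there are (the number of cells to create) and how long each one is. Below is a minimal sketch in plain C. It assumes a hypothetical layout in which `starts[i]` is the 0-based index of a string's first character and `stops[i]` is one past its last character, so MATLAB-style 1-based inclusive indices would need converting first. `string_lengths` is an illustrative name, not a MEX function.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed (hypothetical) layout: string i occupies buf[starts[i], stops[i]),
 * 0-based and half-open, so stops[i] - starts[i] is its length and 0 means
 * an empty string. */

/* Fill lengths[i] for each of the n strings and return the total number of
 * characters, i.e. how much 2-byte character data is needed overall. */
size_t string_lengths(const size_t *starts, const size_t *stops, size_t n,
                      size_t *lengths)
{
    size_t total = 0;
    for (size_t i = 0; i < n; ++i) {
        assert(stops[i] >= starts[i]); /* a stop before its start is malformed input */
        lengths[i] = stops[i] - starts[i];
        total += lengths[i];
    }
    return total;
}
```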
  2 Comments
Jim Hokanson on 22 Nov 2016
After some testing, it seems that mxMalloc is only about 25% slower than malloc. Taking this performance hit seems unavoidable. Because the headers are also individually addressable (each one is its own mxArray), significantly improving on the obvious approach seems unlikely. The best approach then seems to be to focus on any memory redundancy ...
James Tursa on 22 Nov 2016
See comments about using mxCreateUninitNumericMatrix below.


Accepted Answer

James Tursa on 22 Nov 2016
Edited: James Tursa on 22 Nov 2016
So, it sounds like you have a single array of char data in your C routine, with a second int array that contains the start/stop locations. I assume you know up front, from the int array, how many strings are involved. So just use mxCreateCellArray and then loop through your strings with mxCreateString and mxSetCell. It is unclear to me why you think you need mxCreateNumericArray for anything, since it is OK to leave a cell element NULL for an empty cell (that's what MATLAB does at the m-file level). It is also unclear to me what speed or resource advantage you expect from using mxSetData, mxSetM, and mxSetN (unless perhaps the strings in your char array are not individually null terminated?). Can you elaborate? Are all of the strings unique, or are some of them shared among multiple cell elements?
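The mxCreateString route needs a NUL-terminated C string for every element. When the strings are packed into one buffer without terminators, each one must first be copied out. Below is a sketch of that extra step in plain C. The `make_cstring` helper is hypothetical, and the 0-based, half-open `[start, stop)` indexing is an assumption.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Copy buf[start, stop) into a freshly malloc'd, NUL-terminated C string.
 * This is the extra copy a C-string API such as mxCreateString needs when
 * the source strings are packed into one buffer without terminators.
 * Returns NULL if the allocation fails; the caller frees the result. */
char *make_cstring(const char *buf, size_t start, size_t stop)
{
    assert(stop >= start);
    size_t len = stop - start;
    char *s = malloc(len + 1);
    if (s == NULL)
        return NULL;
    memcpy(s, buf + start, len);
    s[len] = '\0';                 /* terminator the C-string API relies on */
    return s;
}
```

In a MEX file, each result would then be passed to mxCreateString and stored with mxSetCell. The temporary can be released with free right afterwards, because mxCreateString copies the text into the new mxArray.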
I don't know of a way to create a bunch of mxArrays en masse using the official API functions. You have to create them one at a time.
  3 Comments
James Tursa on 22 Nov 2016
I need to think about this some more. But here are my immediate comments:
--- mxCreateString ---
Yes, I agree that it must read the source data twice: once to find the null terminator (to get the length needed for the allocation), and again to copy the source from 1-byte to 2-byte characters. So it will probably do more work than you need.
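To make the difference concrete, here is a plain C sketch that contrasts a terminator-driven copy (scan for the NUL, then widen) with a copy whose length is already known. It only mimics the behavior described above and is not MATLAB's actual mxCreateString code. The widening is a simple byte-to-16-bit mapping through `unsigned char`. That mapping is an assumption and ignores any character-set conversion MATLAB itself may perform. Both function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Two passes over the source: strlen() scans for the terminator, then a
 * second loop widens each 1-byte char to a 2-byte code unit.
 * Returns the string length. */
size_t widen_cstring(const char *src, uint16_t *dst)
{
    size_t len = strlen(src);              /* pass 1: find the terminator */
    for (size_t i = 0; i < len; ++i)       /* pass 2: copy and widen */
        dst[i] = (unsigned char)src[i];    /* unsigned char avoids sign extension */
    return len;
}

/* One pass when the length is already known (e.g. from start/stop indices):
 * no terminator is needed and the source is read exactly once. */
void widen_known(const char *src, size_t len, uint16_t *dst)
{
    for (size_t i = 0; i < len; ++i)
        dst[i] = (unsigned char)src[i];
}
```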
--- mxCreateNumericArray with mxCHAR_CLASS ---
I would instead advise using mxCreateUninitNumericArray, which is faster because it just allocates the data memory without the overhead of also initializing it. It seems to work OK with mxCHAR_CLASS (this is what I use in my uninit mex function on the FEX if you want to see an example). The only catch is that this function (and its companion mxCreateUninitNumericMatrix) only recently became part of the official API. So if you need a mex routine that works with older versions of MATLAB, you will have to provide prototypes. And if you go back farther than R2008b, mxCreateUninitNumericArray isn't even in the library at all (only mxCreateUninitNumericMatrix is), so you would have to write a replacement routine for it (which I do in my uninit mex routine).
--- null terminated strings ---
API routines that take C strings as input of course need them to be 1 byte per character and null terminated. But on the mxArray side there is no null termination; the data is simply a char array. So there is no need to worry about that.
So, since you know the lengths up front from your int start/stop array, I would advise using mxCreateUninitNumericMatrix to create the mxArray char variables (it is available in all versions of MATLAB going back to at least R2006a), although you will need to provide a prototype for the earlier versions, where it is not in the official API. Then use a simple loop to copy the char (or short) data. Since you know the lengths, the source data does not need to be null terminated. Empty cells can be left NULL (unless there is some reason you want to physically have an empty char mxArray in those spots).
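The recommended approach can be sketched as follows. To keep it compilable without mex.h, `malloc` stands in for mxCreateUninitNumericMatrix and a hypothetical `char16` typedef stands in for mxChar. The comments note where the MEX calls would go. That mapping is an untested assumption, including the use of mxCreateUninitNumericMatrix with mxCHAR_CLASS, which the answer only reports as seeming to work. Indices are assumed to be 0-based and half-open.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for MATLAB's 16-bit mxChar. */
typedef uint16_t char16;

/* Build one separately allocated 2-byte buffer per string, copying
 * buf[starts[i], stops[i]) in a single widening pass. Empty strings are left
 * NULL, mirroring the advice to leave empty cells NULL.
 *
 * In a real MEX file (not compiled here), the malloc would instead be
 * something like mxCreateUninitNumericMatrix(1, len, mxCHAR_CLASS, mxREAL),
 * the copy target would be mxGetChars() of that array, and the array would
 * be stored with mxSetCell(). Returns 0 on success, -1 on allocation failure. */
int build_strings(const char *buf, const size_t *starts, const size_t *stops,
                  size_t n, char16 **out)
{
    for (size_t i = 0; i < n; ++i) {
        assert(stops[i] >= starts[i]);
        size_t len = stops[i] - starts[i];
        out[i] = NULL;
        if (len == 0)
            continue;                          /* empty: leave the slot NULL */
        out[i] = malloc(len * sizeof(char16)); /* uninitialized, like the Uninit call */
        if (out[i] == NULL) {
            while (i-- > 0) {                  /* roll back what was built so far */
                free(out[i]);
                out[i] = NULL;
            }
            return -1;
        }
        for (size_t k = 0; k < len; ++k)       /* known length: no terminator needed */
            out[i][k] = (unsigned char)buf[starts[i] + k];
    }
    return 0;
}
```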
--- Sharing ---
Unless you have a lot of non-unique strings to deal with, I would not bother writing sharing code. But if you did want to share, since the strings live in a cell array I would use mxCreateReference (0 overhead) instead of mxCreateSharedDataCopy.
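If you do go the sharing route, the first job is to find which cells hold identical text. Below is a deliberately simple, quadratic sketch in plain C. The function name and the 0-based, half-open indexing are assumptions. It only records, for each cell, the index of the first identical string, and it leaves out the sharing call itself.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* For each string i (buf[starts[i], stops[i])), set first[i] to the index of
 * the first string with identical contents. first[i] == i means string i is
 * a first occurrence and needs its own char array; otherwise cell i could
 * share the array built for cell first[i]. O(n^2) comparisons; a hash table
 * would be the usual choice for many strings. */
void find_duplicates(const char *buf, const size_t *starts,
                     const size_t *stops, size_t n, size_t *first)
{
    for (size_t i = 0; i < n; ++i) {
        assert(stops[i] >= starts[i]);
        size_t len = stops[i] - starts[i];
        first[i] = i;
        for (size_t j = 0; j < i; ++j) {
            if (first[j] != j)
                continue;              /* only compare against first occurrences */
            if (stops[j] - starts[j] == len &&
                memcmp(buf + starts[i], buf + starts[j], len) == 0) {
                first[i] = j;
                break;
            }
        }
    }
}
```

Cells with `first[i] == i` would get their own char array. The others would be filled from cell `first[i]` by whichever sharing mechanism you choose.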
--- initializing a large block of memory and then breaking it into smaller chunks ---
Not sure what you are really getting at here. I would assume that the MATLAB Memory Manager may well do this in the background for smaller allocations and keep track of the pieces (at the risk of fragmentation). But you as the user cannot do this. That is, you cannot allocate one single chunk of memory and then parcel it out to your mxArray char variables, because each of them must be individually deallocatable.
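The "one big block carved into chunks" idea from the question is usually written as a bump (arena) allocator. It is sketched below in generic C with hypothetical names. Note what it lacks: there is no way to release a single chunk, only the whole block. That is the conflict with mxArray data described above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* A minimal bump ("arena") allocator: one large malloc, carved into chunks. */
typedef struct {
    unsigned char *base;
    size_t size;
    size_t used;
} Arena;

int arena_init(Arena *a, size_t size)
{
    a->base = malloc(size);
    a->size = size;
    a->used = 0;
    return a->base != NULL ? 0 : -1;
}

/* Hand out the next n bytes, rounded up to an even size so later chunks stay
 * 2-byte aligned for uint16_t character data. Returns NULL when exhausted. */
void *arena_alloc(Arena *a, size_t n)
{
    if (n > a->size - a->used)
        return NULL;
    size_t rounded = n + (n & 1);
    if (rounded > a->size - a->used)
        return NULL;
    void *p = a->base + a->used;
    a->used += rounded;
    return p;
}

/* There is no per-chunk free: the only release is of the whole block, which
 * is why chunks of it cannot back individual mxArrays that MATLAB must be
 * able to free one at a time. */
void arena_destroy(Arena *a)
{
    free(a->base);
    a->base = NULL;
    a->size = a->used = 0;
}
```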
Jim Hokanson on 23 Nov 2016
I think if I wanted to get really fancy, the way forward would be to preinitialize short strings and then use the mxCreateReference() function when duplicates are encountered.
As always, thanks for the help.
