Thread Subject:
Quickly list variables in a MAT-file

Subject: Quickly list variables in a MAT-file

From: Benjamin Kraus

Date: 4 Feb, 2012 19:48:29

Message: 1 of 10

I'm writing a script that goes through several data directories looking for MAT files that meet a particular criterion. For each MAT file, I want to find out if it contains any of a few specific variable names. If it does, I need to load the MAT file to get more information. Otherwise, the script can just ignore the MAT file.

I thought this could be done fairly simply with the 'who' or 'whos' command, but I'm running into some trouble. In particular, many of the MAT files are somewhat large (5-10MB), and calling 'whos' or 'who' on these files takes several seconds each, time which adds up very quickly.

For example, with one particular MAT file (that is ~6MB), I ran the following commands:

>> tic; vars = who('-file','name.mat'); toc
Elapsed time is 11.637099 seconds.

>> tic; vars = whos('-file','name.mat'); toc
Elapsed time is 11.225744 seconds.

>> tic; m = matfile('name.mat'); vars = fieldnames(m); toc
Elapsed time is 11.937726 seconds.

Now the odd part is that if I try to read a non-existent variable, it takes hardly any time at all:

>> tic; data = load('bk44-20110520-timefix-01.mat','history'); toc
Warning: Variable 'history' not found.
Elapsed time is 0.000965 seconds.

This leads me to believe that MATLAB is able to check for the existence of a single variable very quickly, so I don't understand why commands like 'who' and 'whos' take so long, and I'm wondering if there is a faster approach.

I'll be calling this function on at least 100+ MAT files, so the more I can speed up the function the better.

Any suggestions? I feel like this should be a fairly easy thing to do, so I'm hoping there is just a function I don't know about that does this already. My fallback plan is the following, but this doesn't quite work for everything I was hoping to do.

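ws = warning('off','MATLAB:load:variableNotFound');
data = load('bk44-20110520-timefix-01.mat','history');
warning(ws) % restore the previous warning state
if isfield(data,'history')
    % load returns a struct with no fields when the variable is missing,
    % so test for the field rather than using isempty
    % ... work with the file ...
end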

- Ben

Subject: Quickly list variables in a MAT-file

From: Friedrich

Date: 6 Feb, 2012 13:13:09

Message: 2 of 10

Hi,

the performance seems pretty bad. Is the MAT-file located on a network drive? Are other users or applications accessing your hard drive at the same time as you?

Did you try using the C MAT-file API to benchmark the performance? If not, you can try the following C code:

#include "mex.h"
#include "mat.h"

void mexFunction( int nlhs, mxArray *plhs[],
                  int nrhs, const mxArray *prhs[] )
{
  
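   mxArray *var_info;
   const char *name;
   MATFile *mf = matOpen(mxArrayToString(prhs[0]), "r");

   /* Each call returns a header-only mxArray for the next variable. */
   var_info = matGetNextVariableInfo(mf, &name);
   while (var_info != NULL){
       mexPrintf("%s \n",name);
       mxDestroyArray(var_info);   /* release the header before fetching the next one */
       var_info = matGetNextVariableInfo(mf, &name);
   }
   matClose(mf);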
}

Compile it and call it, e.g. with

mat_test('a.mat')

Is the performance still that bad? Or does it speed things up?

Subject: Quickly list variables in a MAT-file

From: Friedrich

Date: 6 Feb, 2012 13:32:10

Message: 3 of 10

I improved the code a bit. It now returns the names of the variables in the MAT-file to MATLAB as a cell array. In addition, the MAT-file processing should be faster now. Please note that there is no error checking in this C code, so be careful with it:

#include "mex.h"
#include "mat.h"

void mexFunction( int nlhs, mxArray *plhs[],
                  int nrhs, const mxArray *prhs[] )
{
  const char **var_names;
  mwSize dims[2];
  int n_vars;
  int i;
  
  MATFile *mf = matOpen(mxArrayToString(prhs[0]), "r");
    
  var_names = (const char **)matGetDir(mf, &n_vars);
  
  dims[0] = 1;
  dims[1] = n_vars;
  plhs[0] = mxCreateCellArray( (mwSize) 2, dims);
  
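  /* Copy each name into its own cell of the 1-by-n_vars output. */
  for(i = 0;i<n_vars;i++){
      mexPrintf("%s\n",var_names[i]);
      mxSetCell(plhs[0],i,mxCreateString(var_names[i]));
  }
  
  mxFree(var_names);   /* the directory returned by matGetDir is released with mxFree */
  matClose(mf);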
}


Subject: Quickly list variables in a MAT-file

From: Benjamin Kraus

Date: 6 Feb, 2012 16:45:11

Message: 4 of 10

Friedrich,

Thank you for your reply.

To answer your initial questions: the files are on a local SATA hard drive (not the OS drive), although the drive is nearly full (~880GB used out of 930GB), if that matters. I'm the only user. There is a backup script that runs automatically, but I believe it is scheduled to run at night, so I would be surprised if that is the problem. The computer is a little old (6 years). The script runs faster on my newer laptop, perhaps twice as fast, but that is still the same order of magnitude: better, but not great.

As it turns out, I actually broke down and wrote a MEX wrapper around 'matGetDir' on Saturday. My implementation is very similar to your own (there are only so many degrees of freedom). It is indeed much faster, so I'm using that solution.

For future reference, calling 'matGetNextVariableInfo' (your first implementation) runs about as fast as calling 'whos' (which is to say, slowly), while calling 'matGetDir' runs very quickly. For another test file, 'who' takes about 7.5s, while my MEX wrapper takes about 0.056s, a >100 times improvement.

I'll post my complete code as a separate post here (so it is easier to read). I'll upload it to the File Exchange if it seems people would be interested in it. I'm surprised there isn't a native MATLAB function for it already. If I had to guess, MATLAB's own 'who' function probably internally calls 'matGetNextVariableInfo'.

- Ben


Subject: Quickly list variables in a MAT-file

From: Benjamin Kraus

Date: 6 Feb, 2012 16:49:10

Message: 5 of 10

#include "string.h"
#include "mex.h"
#include "mat.h"

void mexFunction(
    int nlhs, mxArray *plhs[],
    int nrhs, const mxArray *prhs[])
{
    int fnamelen, ndir, i;
    char *fname;
    const char **dir;
    MATFile *pmat;
  
    /* Check that inputs and outputs are OK */
    if(nrhs != 1)
        mexErrMsgIdAndTxt("matwho:usage","Usage: c = whomat(filename)");
    else if(nlhs != 1)
        mexErrMsgIdAndTxt("matwho:usage","Usage: c = whomat(filename)");
    else if(!mxIsChar(prhs[0]))
        mexErrMsgIdAndTxt("matwho:usage","First argument must be a filename (string).");
    
    /* Copy the first input argument to the filename. */
    fnamelen = mxGetNumberOfElements(prhs[0])+1;
    fname = mxCalloc(fnamelen, sizeof(char));
    mxGetString(prhs[0],fname,fnamelen);
    
    /* Open file for reading */
    pmat = matOpen(fname, "r");
    if (pmat == NULL) {
        mexErrMsgIdAndTxt("matwho:fileerror","Error opening file: %s\n", fname);
        plhs[0] = mxCreateCellMatrix(0,0);
        return;
    }

    /* Read mat file directory. matGetDir returns NULL with ndir < 0 on
       failure, and NULL with ndir == 0 for a file with no variables. */
    dir = (const char **)matGetDir(pmat, &ndir);
    if (dir == NULL && ndir < 0) {
        matClose(pmat);
        mexErrMsgIdAndTxt("matwho:fileerror","Error reading directory of file: %s\n", fname);
    }
    plhs[0] = mxCreateCellMatrix(ndir,1);   /* 0-by-1 if the file is empty */
    for (i=0; i < ndir; i++)
        mxSetCell(plhs[0], i, mxCreateString(dir[i]));
    if (dir != NULL)
        mxFree(dir);
    
    /* Close file */
    if (matClose(pmat) != 0) {
        mexErrMsgIdAndTxt("matwho:fileerror","Error closing file: %s\n", fname);
    }
}

Subject: Quickly list variables in a MAT-file

From: James Tursa

Date: 6 Feb, 2012 18:38:10

Message: 6 of 10

"Benjamin Kraus" <bkraus@bu.edu> wrote in message <jgp0a6$klh$1@newscl01ah.mathworks.com>...
>
> /* Copy the first input argument to the filename. */
> fnamelen = mxGetNumberOfElements(prhs[0])+1;
> fname = mxCalloc(fnamelen, sizeof(char));
> mxGetString(prhs[0],fname,fnamelen);

FYI, there is an API function mxArrayToString that does all of the above at once. Also, you leak fname since you never free it downstream. (It gets garbage collected, so no *actual* leak, but it is still not the best practice IMO).

James Tursa

Subject: Quickly list variables in a MAT-file

From: Benjamin Kraus

Date: 7 Feb, 2012 00:33:10

Message: 7 of 10

"James Tursa" wrote in message <jgp6mi$e41$1@newscl01ah.mathworks.com>...
> "Benjamin Kraus" <bkraus@bu.edu> wrote in message <jgp0a6$klh$1@newscl01ah.mathworks.com>...
> >
> > /* Copy the first input argument to the filename. */
> > fnamelen = mxGetNumberOfElements(prhs[0])+1;
> > fname = mxCalloc(fnamelen, sizeof(char));
> > mxGetString(prhs[0],fname,fnamelen);
>
> FYI, there is an API function mxArrayToString that does all of the above at once. Also, you leak fname since you never free it downstream. (It gets garbage collected, so no *actual* leak, but it is still not the best practice IMO).
>
> James Tursa

James,

Thanks for your feedback. I'm not as familiar with MEX programming as I am with MATLAB in general (and my general C knowledge is very rusty), so I appreciate the input, especially regarding memory issues. Is mxFree(fname) sufficient to solve this "leak"? I assume this should be called just prior to any calls to mexErrMsg, and along with mxFree(dir) at the end of the function.

As you seem to be very knowledgeable of the issues, two more questions for you:
(1) The documentation states: "In MEX-files... the MATLAB memory management facility maintains a list of all memory allocated by mxCalloc, mxMalloc, and mxRealloc."
I've seen similar references to mxCalloc, mxMalloc, and mxRealloc in other places (including some of your posts I just found while trying to answer this question). What I've never seen explicitly stated is that memory allocated by other functions, such as "mxArrayToString" or "matGetDir", is included under this umbrella.

If I'm understanding correctly, "mxArrayToString" and "matGetDir" both use mxMalloc internally to allocate the memory they use, which is why it is safe to use mxFree in those cases. Additionally, I'm not using mxDestroyArray because neither "fname" nor "dir" is an mxArray object.

(2) When I make the call:
 mexErrMsgIdAndTxt("matwho:fileerror","Error opening file: %s\n", fname);

Is it possible to "properly" deallocate the memory used by fname? Am I right in assuming that putting "mxFree(fname)" as the next line would never actually be reached (just like the "return" that is already there, which I suppose I could probably remove)?

Thanks again,
- Ben

Subject: Quickly list variables in a MAT-file

From: TideMan

Date: 6 Feb, 2012 19:02:45

Message: 8 of 10


I have an old .m file (1999) written by us (who used to be a prolific CSSM contributor) called fdir whose header looks like this:
function [h,p,e]=fdir(fnam,fposi,fendi,flg,tflg,cflg)

% [struct_entries[,struct_params,...]] = fdir(file_name)
%
% to display structure and contents of
% <file_name>.mat
% withouth loading the file

Maybe you can find this routine in the file exchange?

Subject: Quickly list variables in a MAT-file

From: James Tursa

Date: 7 Feb, 2012 07:18:10

Message: 9 of 10

"Benjamin Kraus" <bkraus@bu.edu> wrote in message <jgprg6$mqc$1@newscl01ah.mathworks.com>...
>
> Thanks for your feedback. I'm not as familiar with MEX programming as I am with MATLAB in general (and my general C knowledge is very rusty), so I appreciate the input, especially regarding memory issues. Is mxFree(fname) sufficient to solve this "leak"? I assume this should be called just prior to any calls to mexErrMsg, and along with mxFree(dir) at the end of the function.

Yes. It will get garbage collected in any event so you don't risk a memory leak, but I usually like to clean things up myself unless it makes the code too messy. Once you call mexErrETC you exit the mex routine ... no code lines after that (within the same block) are executed.

> As you seem to be very knowledgeable of the issues, two more questions for you:
> (1) The documentation states: "In MEX-files... the MATLAB memory management facility maintains a list of all memory allocated by mxCalloc, mxMalloc, and mxRealloc."
> I've seen similar references to mxCalloc, mxMalloc, and mxRealloc in other places (including some of your posts I just found while trying to answer this question). What I've never seen explicitly stated is that memory allocated by other function, such as "mxArrayToString" or "matGetDir" are included under this umbrella.

mxArrayToString, matGetDir, etc all fall under this umbrella. Use mxFree to free such memory.

> If I'm understanding correctly, "mxArrayToString" and "matGetDir" both use mxMalloc internally to allocate the memory they use, which is why it is safe to use mxFree in those cases. Additionally, I'm not using mxDestroyArray because neither "fname" or "dir" are mxArray objects.

Correct. Use mxDestroyArray *only* on mxArray variables.

> (2) When I make the call:
> mexErrMsgIdAndTxt("matwho:fileerror","Error opening file: %s\n", fname);
>
> Is it possible to "properly" deallocate the memory used by fname? Am I right in assuming that putting "mxFree(fname)" as the next line would never actually be reached (just like the "return" that is already there, which I suppose I could probably remove)?

Yes, you are correct. Putting a mxFree(fname) after this line will not get executed, and the return never gets reached as well. In this case you either rely on the garbage collection to save you or you copy fname to some local variable of sufficient length to print out and then free fname prior to calling the mexErrETC routine. (Personally, I would just let the garbage collection handle it at this point and not worry about writing elaborate code to manually free it.)
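The "copy fname to some local variable, then free it before the error call" option described above could be sketched as follows. This is a minimal sketch, not code from the thread: the helper copy_for_message and the MSG_NAME_MAX limit are hypothetical names introduced for illustration, and the error identifiers are borrowed from Ben's post. The helper is plain C; the mexFunction part is only compiled when MATLAB_MEX_FILE is defined, which the mex build command normally does.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MSG_NAME_MAX 256   /* hypothetical cap on the file name echoed in the error */

/* Copy src into the fixed-size buffer dst (capacity n), truncating if needed.
   The result is always NUL-terminated. Returns the number of characters stored. */
static size_t copy_for_message(char *dst, size_t n, const char *src)
{
    size_t len;
    assert(dst != NULL && src != NULL && n > 0);
    len = strlen(src);
    if (len >= n)
        len = n - 1;
    memcpy(dst, src, len);
    dst[len] = '\0';
    return len;
}

#ifdef MATLAB_MEX_FILE   /* only built as part of a MEX file */
#include "mex.h"
#include "mat.h"

void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
{
    char local[MSG_NAME_MAX];   /* stack storage, needs no cleanup */
    char *fname;
    MATFile *pmat;

    if (nrhs != 1 || !mxIsChar(prhs[0]))
        mexErrMsgIdAndTxt("matwho:usage", "Usage: c = whomat(filename)");

    fname = mxArrayToString(prhs[0]);   /* API-allocated: release with mxFree */
    if (fname == NULL)
        mexErrMsgIdAndTxt("matwho:usage", "Could not read the filename.");

    pmat = matOpen(fname, "r");
    if (pmat == NULL) {
        copy_for_message(local, sizeof local, fname);
        mxFree(fname);   /* freed first, because the error call does not return */
        mexErrMsgIdAndTxt("matwho:fileerror", "Error opening file: %s\n", local);
    }

    mxFree(fname);       /* normal path: the name is no longer needed */
    matClose(pmat);
}
#endif
```

The cost of this approach is a fixed-size buffer and possible truncation of very long paths in the message, which is one reason simply relying on the automatic cleanup, as suggested above, is a reasonable choice.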

--------------------------------------------

OK, here is the skinny on the MATLAB Memory Manager (MMM) in mex routines, at least as I understand it. The MMM keeps track of everything that gets allocated via API calls in a mex routine for the purposes of garbage collection. Whenever the mex routine exits (either normally or via an error) all of the memory on this garbage collection list is automatically deallocated. The MMM knows which memory is an mxArray and which memory is non-mxArray. But there are some subtleties involved.

When you allocate a new mxArray via the mxCreateETC functions, the mxArray is tagged as temporary and the address of the mxArray gets put on this internal garbage collection list. The mxArray will typically contain pointers to other memory, e.g. the pr and pi pointers that point to the real and imaginary data for numeric variables. Those pointers are *not* separately put on this garbage collection list. The MMM depends on the fact that they are connected to the mxArray to get them garbage collected. When an mxArray on the garbage collection list gets destroyed (either via garbage collection or via a mxDestroyArray call), the mxArray structure itself and all of the memory behind the data pointers of the mxArray also gets freed at the same time and the address of the mxArray is removed from the garbage collection list. I.e., this is a *deep* free and everything attached to this mxArray will get freed. If the mxArray is a cell array, all individual cells will get destroyed. If the mxArray is a struct array, all individual fields will get destroyed. Etc.

When you use mxCreateETC to create an mxArray variable and then subsequently attach it to a cell or struct array via mxSetCell or mxSetField etc then the mxArray gets taken off of the garbage collection list and it gets marked as being a sub-element (to a cell or struct array). At this point, whether it gets garbage collected or not depends entirely on what happens to its parent. Technically, it has just become one of the data elements of the cell or struct array and would show up in the pr list for the parent.

When you call mexMakeArrayPersistent on an mxArray, the address of that mxArray gets removed from the garbage collection list. At that point you risk a memory leak unless you have specific code in place to destroy it (typically via a mexAtExit function). E.g., if you returned from the mex routine back to MATLAB without manually calling mxDestroyArray on it and without remembering its pointer inside the mex routine, then you would have a memory leak since it is still taking up memory and you have lost the only pointer to this memory. There would be no way to subsequently free the memory without restarting MATLAB.

When you allocate memory *other* than with the mxCreateETC API functions, such as mxMalloc, mxCalloc, mxArrayToString, etc, then the address of this memory gets put on the garbage collection list. If you call mexMakeMemoryPersistent on this address, then the address is removed from the garbage collection list. Again, you risk a memory leak at this point unless you have code in place that will free this memory via mxFree. Typically this would be accomplished via a mexAtExit function that you would write. Regardless of which non-mxArray API function allocated the memory, use mxFree to free it.
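The bookkeeping described in this paragraph can be pictured with a small toy model in plain C. This is only an illustration of the idea (a list of addresses that is emptied on exit, with entries dropped when memory is made persistent), not MATLAB's actual implementation. Every name here (TrackList, toy_malloc, toy_untrack, toy_free, toy_collect) is hypothetical, and the fixed capacity is an arbitrary simplification.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define MAX_TRACKED 64   /* arbitrary capacity for the toy list */

/* The toy "garbage collection list": addresses that will be freed on exit. */
typedef struct {
    void  *ptr[MAX_TRACKED];
    size_t count;
} TrackList;

/* Allocate n bytes and record the address on the list
   (stands in for mxMalloc, mxCalloc, mxArrayToString, ...). */
static void *toy_malloc(TrackList *t, size_t n)
{
    void *p;
    assert(t != NULL);
    if (t->count == MAX_TRACKED)
        return NULL;
    p = malloc(n);
    if (p != NULL)
        t->ptr[t->count++] = p;
    return p;
}

/* Take p off the list without freeing it; returns 1 if it was listed.
   Stands in for mexMakeMemoryPersistent: the caller now owns p. */
static int toy_untrack(TrackList *t, void *p)
{
    size_t i;
    for (i = 0; i < t->count; i++) {
        if (t->ptr[i] == p) {
            t->ptr[i] = t->ptr[--t->count];   /* fill the gap with the last entry */
            return 1;
        }
    }
    return 0;
}

/* Free p immediately and drop it from the list if present (stands in for mxFree). */
static void toy_free(TrackList *t, void *p)
{
    if (p != NULL) {
        toy_untrack(t, p);
        free(p);
    }
}

/* Free everything still listed; returns how many blocks were freed.
   Stands in for the cleanup that happens when the mex routine exits. */
static size_t toy_collect(TrackList *t)
{
    size_t freed = t->count;
    while (t->count > 0)
        free(t->ptr[--t->count]);
    return freed;
}
```

In this model, a block passed to toy_untrack survives toy_collect and is leaked unless something later calls toy_free on it, which mirrors the role a mexAtExit function plays for persistent memory.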

Rule of Thumb: Use mxDestroyArray to free mxArray variables, use mxFree to free all other API memory.

When you exit a mex routine normally, the plhs[*] array is examined and all mxArray variables on this list are removed from the garbage collection list before garbage collection takes place.

You can play games with this and get in trouble unless you understand what is going on. For example, suppose you use mxCreateDoubleMatrix to create an mxArray. Then you use mxGetPr to get the data pointer, and then use mxSetPr to set the pr pointer to NULL. I.e., you detach the data memory from the mxArray. This data memory is *not* on the garbage collection list since it was not originally put there by the mxCreateDoubleMatrix function. Remember, only the address of the mxArray itself gets put on the garbage collection list. The pr memory, once it gets detached from the mxArray, is essentially persistent memory at that point and you risk a memory leak unless you have code in place to explicitly free it.

Now suppose you create an empty mxArray via the mxCreateDoubleMatrix function (i.e., size 0 x 0). Then separately you allocate some data memory via mxMalloc. In this case the address of the mxArray variable *and* the address of the data memory are both separately put on the garbage collection list. Now suppose you attach the data memory to the mxArray via a mxSetPr call. The mxSetPr call does two things: It sets the pr pointer of the mxArray equal to this data memory address, *and* it removes the data memory address from the garbage collection list. At this point the disposition of the data memory will depend entirely on the parent and whether the parent gets garbage collected or not. Side Note: The mxSetPr call does *not* free up any pr data memory that is already attached to the mxArray, it simply overwrites the pr pointer value with a new value. So you again risk a memory leak unless you manually free this preexisting pr data memory first.

James Tursa

Subject: Quickly list variables in a MAT-file

From: James Tursa

Date: 12 Jun, 2013 18:57:08

Message: 10 of 10

"James Tursa" wrote in message <jgqj7i$782$1@newscl01ah.mathworks.com>...
>
> When you exit a mex routine normally, the plhs[*] array is examined and all mxArray variables on this list are removed from the garbage collection list before garbage collection takes place.

Correction: Actually the above statement is not true. Here is what really happens ...

When you exit a mex routine normally, the plhs[*] array is examined and shared data copies of mxArray variables on this list are actually returned (if stored on the MATLAB side). Then garbage collection takes place.

Example:

plhs[0] = mxCreateDoubleMatrix( 10, 10, mxREAL );
return;

In the above code, a temporary variable was created and the pointer to it was stored in plhs[0]. When the mex function returns, a shared data copy of this variable is made and returned to the caller. Then the variable in plhs[0] is destroyed. But since it is a shared data copy of what just got sent to the caller, the only thing that really happens in the destroy process is that plhs[0] gets removed from the linked list of shared data copies and then the mxArray header for plhs[0] gets free'd. The data portion of plhs[0] is not free'd (it is still attached to the variable that was sent to the caller).

James Tursa
