Code covered by the BSD License  

Highlights from
Struct array conversion

Struct array conversion

by

 

Convert in either direction between struct-array and structure of arrays

Examples of processing basic database from an xls spreadsheet

Examples of processing basic database from an xls spreadsheet

Contents

Requirements

Example spreadsheet from a brain imaging study

http://www.oasis-brains.org/pdf/oasis_longitudinal.xls

csv2struct and struct2struct from the MATLAB Central File Exchange:

http://www.mathworks.com/matlabcentral/fileexchange/26106-csv2struct http://www.mathworks.com/matlabcentral/fileexchange/authors/27434

Read spreadsheet in using csv2struct

oasis_ss = csv2struct('oasis_longitudinal.xls')
oasis_ss = 

    Subject_ID: {373x1 cell}
        MRI_ID: {373x1 cell}
         Group: {373x1 cell}
         Visit: {373x1 cell}
      MR_Delay: [373x1 double]
        M0x2FF: {373x1 cell}
          Hand: {373x1 cell}
           Age: [373x1 double]
          EDUC: [373x1 double]
           SES: {373x1 cell}
          MMSE: {373x1 cell}
           CDR: [373x1 double]
          eTIV: [373x1 double]
          nWBV: [373x1 double]
           ASF: [373x1 double]

This spreadsheet-like format of structure makes it easy to modify fields

For example, csv2struct needed to convert M/F to a valid variable name, which has produced M0x2FF; we will replace this with a new Gender field:

oasis_ss.Gender = oasis_ss.M0x2FF;
oasis_ss = rmfield(oasis_ss, 'M0x2FF')
oasis_ss = 

    Subject_ID: {373x1 cell}
        MRI_ID: {373x1 cell}
         Group: {373x1 cell}
         Visit: {373x1 cell}
      MR_Delay: [373x1 double]
          Hand: {373x1 cell}
           Age: [373x1 double]
          EDUC: [373x1 double]
           SES: {373x1 cell}
          MMSE: {373x1 cell}
           CDR: [373x1 double]
          eTIV: [373x1 double]
          nWBV: [373x1 double]
           ASF: [373x1 double]
        Gender: {373x1 cell}

Note that Gender is added last, we could use fieldnames and orderfields to put it correctly in the original location of M0x2FF, if that mattered, or one could even write a mvfield method like the rmfield one used above. Here, it's not worth the trouble!

The format does not make it easy to see an example database "record"

For example, it's not obvious how Gender is actually coded (e.g. are the entries 'male' and 'female' or 'M' and 'F' or 0 and 1, etc.) It's also tedious to sort the spreadsheet, because the sorting index vector must be applied to each variable in turn. We can convert to a more database-like format using struct2struct:

oasis_db = struct2struct(oasis_ss)
record1 = oasis_db(1)
oasis_db = 

373x1 struct array with fields:
    Subject_ID
    MRI_ID
    Group
    Visit
    MR_Delay
    Hand
    Age
    EDUC
    SES
    MMSE
    CDR
    eTIV
    nWBV
    ASF
    Gender


record1 = 

    Subject_ID: 'OAS2_0001'
        MRI_ID: 'OAS2_0001_MR1'
         Group: 'Nondemented'
         Visit: '1'
      MR_Delay: 0
          Hand: 'R'
           Age: 87
          EDUC: 14
           SES: 2
          MMSE: 27
           CDR: 0
          eTIV: 1.9866e+003
          nWBV: 0.6961
           ASF: 0.8834
        Gender: 'M'

Now let's say we want to study just the females over the age of 90

female = strcmp(oasis_ss.Gender, 'F');
Nfemale = sum(female)
age90plus = oasis_ss.Age > 90; % (note that fieldnames are case sensitive)
Nage90s = sum(age90plus)
Nboth = sum(female & age90plus)
Nfemale =

   213


Nage90s =

    16


Nboth =

    13

With the spreadsheet format, it was easy to get these filtering indicator variables, but it would be tedious to do the filtering itself (processing every variable in turn). Filtering the database is easy:

oasis_db = oasis_db(female & age90plus)
oasis_db = 

13x1 struct array with fields:
    Subject_ID
    MRI_ID
    Group
    Visit
    MR_Delay
    Hand
    Age
    EDUC
    SES
    MMSE
    CDR
    eTIV
    nWBV
    ASF
    Gender

Direct filtering and sorting with the database format is also possible

[ind ind] = sort([oasis_db.Age]); % note brackets (use braces for strings)
oasis_db = oasis_db(ind);
youngest = oasis_db(1)
oldest = oasis_db(end)
youngest = 

    Subject_ID: 'OAS2_0031'
        MRI_ID: 'OAS2_0031_MR3'
         Group: 'Converted'
         Visit: '3'
      MR_Delay: 1588
          Hand: 'R'
           Age: 91
          EDUC: 12
           SES: 3
          MMSE: 28
           CDR: 0.5000
          eTIV: 1.4632e+003
          nWBV: 0.6955
           ASF: 1.1994
        Gender: 'F'


oldest = 

    Subject_ID: 'OAS2_0087'
        MRI_ID: 'OAS2_0087_MR2'
         Group: 'Demented'
         Visit: '2'
      MR_Delay: 754
          Hand: 'R'
           Age: 98
          EDUC: 17
           SES: 1
          MMSE: 21
           CDR: 2
          eTIV: 1.5028e+003
          nWBV: 0.6596
           ASF: 1.1678
        Gender: 'F'

Either format can be used for simple computations

For example the mean total intracranial volume (in ml) for our subset:

subset_mean_eTIV_a = mean(oasis_ss.eTIV(female & age90plus))
subset_mean_eTIV_b = mean([oasis_db.eTIV])
subset_mean_eTIV_a =

  1.4542e+003


subset_mean_eTIV_b =

  1.4542e+003

An example to motivate MATLAB processing of spreadsheets/databases

This might all seem a little pointless here (why not use a spreadsheet application to compute means, etc?) but the key strength of this approach is integration with other MATLAB tools, e.g. for more advanced data-processing than possible in most spreadsheet software. Also, note that even things possible in spreadsheets (like plots) might be easier to program in MATLAB, so that e.g. reusing plotting code on multiple subsets of the data (males, females, left-handers, etc.) becomes easier.

for Gender = 'MF'
    selected = strcmp(oasis_ss.Gender, Gender);
    figure;
    plot(oasis_ss.Age(selected), oasis_ss.nWBV(selected), 'x')
    title(sprintf('Atrophy in %s', Gender))
    xlabel('Age (years)')
    ylabel('Normalised whole brain volume (ml)')
end

Further notes

I should admit that the data here is actually longitudinal -- there are a variable number of brain scans per subject taken over time; I have ignored this here... A more careful analysis would have filtered subjects (not individual records, which are scans here) for example using baseline age, but since this is just an illustration of using struct2struct it didn't seem too important.

There is currently no equivalent struct2csv to save the spreadsheet (or database) after processing; hopefully I or Jaroslaw Tuszynski (the author of csv2struct), or someone else (perhaps you!) will get create this at some point...

Copyright 2010 Ged Ridgway http://www.mathworks.com/matlabcentral/fileexchange/authors/27434

Contact us