xml2struct

version 1.8 (2.94 KB)

Convert an xml file into a MATLAB structure for easy access to the data.

456 Downloads

Comments and Ratings (86)

Paulo Fonte

Julian Hapke

Really useful script, but rather slow on large XML files; xmlread itself accounted for only 1/10th of the overall time.
My improvement idea: change
if (~isempty(regexprep(text.(textflag),'[\s]*','')))
to
if ~all(isspace(text.(textflag)))
for an overall speedup of a factor of 2 (in my test case at least).
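
The two whitespace checks in this suggestion can be compared side by side. This is a minimal sketch rather than code from xml2struct; the sample strings and variable names are made up, and the two tests are only shown to agree on these ASCII samples.

```matlab
% Compare the regexprep-based whitespace test with the isspace-based one.
% The sample strings below are illustrative, not taken from xml2struct.
samples = {'', '   ', sprintf('\n\t '), ' a ', 'text'};
agree = true(size(samples));
for k = 1:numel(samples)
    s = samples{k};
    hasTextRegexp  = ~isempty(regexprep(s, '[\s]*', ''));  % original test
    hasTextIsspace = ~all(isspace(s));                     % suggested test
    agree(k) = (hasTextRegexp == hasTextIsspace);
end
disp(all(agree));
```

The isspace version avoids building a new string for every node, which is a plausible reason for the reported speedup.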

Alexey R.

Thank you so so much!

Eike Ullrich

thanks a lot, great work

Tianlong ma

thank you

I had to make a few modifications to get my XML file to work. I will put them below, but as this is my first time using this file type, mileage may vary.

Line 95 in the version current as of 3/7/2017:
   
  children.(name) = text;

That line overwrote every child node that had data stored in it whenever the last node parsed was a comment (i.e. it contained only a string). Other nodes held numerical values as strings. Here is my fix:

if isfield(text,'Text')
     children.(name) = str2num(text.Text);
else
     children.(name).('Comment') = text.Comment;
end

Overall very helpful, and was exactly what I needed after I put the replacement lines in.

Thanks!
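
The numeric conversion in the fix above relies on str2num, which evaluates its input as MATLAB code. A more conservative variant, sketched here with str2double swapped in and a fallback to the raw string, might look like the following. The struct named node is a hypothetical stand-in for the parsed text struct, not a variable from xml2struct.

```matlab
% Convert a text node to a number when possible, otherwise keep the string.
% 'node' is a hypothetical stand-in for the parsed text struct.
node = struct('Text', '3.14');
value = str2double(node.Text);   % NaN if the text is not a single number
if isnan(value)
    value = node.Text;           % fall back to the raw string
end
```

Unlike str2num, str2double does not parse vectors such as '1 2 3', so this variant only suits nodes that hold a single scalar.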

Yifei Wang

That's Great!!! Thank you so much!!!

Hasenearl

ChrisDz

This is a very useful script! I used it to read AUTOSAR XML files. Reading AR-XML files, I found two challenges:
1. The long replacement texts for {'-'|':'|'.'} within XML tags cause MATLAB field names to become longer than 63 characters. I reduced them to {'_'|'c'|'d'}, which helped.
2. When XML comments (<!-- Comment -->) are used inside the XML file, the script fails. I fixed this issue; have a look:

Replace: Inside the function: parseChildNodes(...)
                    % CDz 2016-12-21 Commented Out Due to problems with
                    % XML-Comments
                    % if(~isempty(fieldnames(text)))
                    % children.(name){index} = text;
                    % end

                    % CDz 2016-12-21 Added to Handle XML - Comments
                    if(~isempty(text) && isstruct(text))
                        if isfield(text,'Text')
                            children.(name){index}.('Text') = text.Text;
                        elseif isfield(text,'Comment')
                            children.(name){index}.('Comment') = text.Comment;
                        end
                    end

and

                    % CDz 2016-12-21 Commented out due to problems with
                    % XML-Comments
                    % if(~isempty(text) && ~isempty(fieldnames(text)))
                    % children.(name) = text;
                    % end

                    % CDz 2016-12-21 Added to Handle XML - Comments
                    if(~isempty(text) && isstruct(text))
                        if isfield(text,'Text')
                            children.(name).('Text') = text.Text;
                        elseif isfield(text,'Comment')
                            children.(name).('Comment') = text.Comment;
                        end
                    end

@Wouter Falkena: If you are interested, I can provide a full copy of this file so that you can update the script.
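
The field-name length issue from point 1 of this comment can be illustrated with a short sketch. The tag name below is made up, and both replacement schemes are written out with strrep rather than copied from xml2struct; namelengthmax reports MATLAB's limit on identifier length.

```matlab
% Compare long and short replacement schemes for an XML tag name.
% The tag is made up; the short scheme follows the comment above.
tag = 'AR-PACKAGE.SHORT-NAME:REF';
longName  = strrep(strrep(strrep(tag, '-', '_dash_'), '.', '_dot_'), ':', '_colon_');
shortName = strrep(strrep(strrep(tag, '-', '_'), '.', 'd'), ':', 'c');
fprintf('%d vs %d characters (limit %d)\n', ...
    numel(longName), numel(shortName), namelengthmax);
```

Single-letter replacements keep names short but can collide: under this scheme the tags 'a.b' and 'adb' map to the same field name.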

Rody Oldenhuis

Well done

Simple and very useful. Very convenient.

I am a PhD student and need to use this function in my code to get the attributes of an XML file.

Mike Wehr

Works Great! Tried xml_toolbox but it's broken since 2014. This is a solid replacement.

Please, I need to edit an XML file.

Anas Imran

Joe Yeh

I have looked through the issues, implemented fixes, added some new features to the script, and uploaded it here: https://www.mathworks.com/matlabcentral/fileexchange/58700-xml2struct . Please also try my updated version and let me know if it works better now.

Anael

Neil's fix doesn't do the trick for me...

Anael

Doesn't work right out of the box. Stéphane's fix works great!
Also same issue as Sebastien regarding comments and headers.

Keith Hooks

I received a "java.lang.OutOfMemoryError: GC overhead limit exceeded" when trying to open a Kanji dictionary file - http://www.edrdg.org/kanjidic/kanjidic2.xml.gz

Anael

Good, but rather slow for large xml files.

Karan Gill

yoav

Daniel

Can you update the code? I am having the same problem.

RONAK KOSTI

Thanks for this very flexible script! TOP!

zepp

this is exactly what I was looking for.

Daniel

Thanks Wouter,

It's not exactly that, but I appreciate your response. The problem is solved now. I may upload the code if someone has the same problem.

Stéphane

Hi,

Found a bug : When there is text and children in the same node, the text overwrites the children.

Fix:
Replace
 
  if(~isempty(fieldnames(text)))
    children.(name){index} = text;
  end

by:

  if isstruct(text)
    for fld=fieldnames(text)'
      children.(name){index}.(fld{1}) = text.(fld{1});
    end
  end

And replace also:

  if(~isempty(text) && ~isempty(fieldnames(text)))
    children.(name) = text;
  end

by:

  if isstruct(text)
    for fld=fieldnames(text)'
      children.(name).(fld{1}) = text.(fld{1});
    end
  end

Thanks.
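
The idea behind this fix, copying fields one by one instead of assigning the whole struct, can be seen in isolation. This is a minimal sketch with made-up structs; child and text only mimic the variables in xml2struct.

```matlab
% Merge the fields of a text struct into a child struct without
% discarding what the child already holds. Both structs are made up.
child = struct('item', struct('Text', '1'));   % child with a nested element
text  = struct('Text', 'loose text');          % text found in the same node
for fld = fieldnames(text)'
    child.(fld{1}) = text.(fld{1});            % add or update one field
end
```

After the loop, child still has its item field and has gained a Text field, whereas a plain child = text assignment would have dropped item.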

superaga

Prabakaran R

Simple to use and works great !! Thanks for sharing the work.

moi

Ilya Belevich

Uri Cohen

A

It takes a long time to run for larger XMLs. Is there anywhere in the code I can add a waitbar to at least report progress to the user? I tried the two for loops, but they do not seem to be the bottleneck.

Benjamin Falk

Joerg

Kevin

Very simple to use and it works.

Alex

"Andrew Wilson: The fix from Neill Weiss in an earlier comment/review seems to solve this, so it would be great to see that incorporated into an update!" thanks Andrew Wilson

Andrew Wilson

Works great for the most part, but the issue of nodes being lost when comments are present at the same level of the hierarchy is quite frustrating. The fix from Neill Weiss in an earlier comment/review seems to solve this, so it would be great to see that incorporated into an update!

Adam Wyatt

Seems to work fine except as reported by Sebastien Roy on 09/10/14 - xml comments don't work (resulting in a loss of the other data)

Downloaded this file this evening to process some XML data. worked just fine.

Bernhard

Sorry, pasted the wrong line.
Here is line 154 that fixes the problem for me:

text.(textflag) = char(getTextContent(theNode));

Bernhard

Great stuff.

Regarding that "Undefined function 'toCharArray' for input arguments of type 'double'." Error:

For me it worked to change line 154 into
text.(textflag) = char(getData(theNode))';
as it has been in an earlier version of xml2struct (mentioned in the comments in the code in line 153)
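
The two conversions mentioned in these comments can be tried on a plain Java string. This is only a sketch under the assumption that the error arises when a Java call hands back an empty [] (a double in MATLAB) instead of a string object; the comments do not confirm the cause. The string 'value' is illustrative.

```matlab
% Two ways to turn a java.lang.String into a MATLAB char row vector.
js = java.lang.String('value');      % illustrative Java string
viaArray = js.toCharArray()';        % column char array, transposed to a row
viaChar  = char(js);                 % direct conversion
% char also accepts an empty [] input, on which toCharArray cannot be called.
emptyText = char([]);
```

Because char works on both Java strings and empty arrays, it does not depend on the value still being a Java object.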

Chris FUNG

Great time saver when compared to using xmlread directly. However, there is a bug with child nodes when a text is present. The child node content will be set to the text and all other content of the child will be lost. A comment, being processed as text, will cause the same issue. Attempting to read this xml will not provide the expected result:

<?xml version="1.0" encoding="UTF-8"?>

<root>
<!-- Should be a benign comment -->
<mystuff>Valuable data</mystuff>
</root>
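
To see what the parser is given for this file, the child nodes of the root element can be listed with xmlread alone. This is a minimal sketch that writes the example to a temporary file; it does not call xml2struct, and the variable names are made up.

```matlab
% Write the example XML to a temporary file and list the root's child nodes.
xmlLines = {'<?xml version="1.0" encoding="UTF-8"?>', ...
            '<root>', ...
            '<!-- Should be a benign comment -->', ...
            '<mystuff>Valuable data</mystuff>', ...
            '</root>'};
fname = [tempname '.xml'];
fid = fopen(fname, 'w');
fprintf(fid, '%s\n', xmlLines{:});
fclose(fid);

doc  = xmlread(fname);
kids = doc.getDocumentElement.getChildNodes;
names = cell(1, kids.getLength);
for k = 1:kids.getLength
    names{k} = char(kids.item(k-1).getNodeName);   % Java indices start at 0
end
delete(fname);
disp(names);
```

The comment shows up as a sibling node of mystuff, which is why code that treats comments like text can clobber neighbouring elements.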
  

Some of the attributes in the XML file had underscores at the beginning, which caused errors because such field names are not allowed. A simple strrep solved the problem.
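
The comment does not show the strrep call that was used. As an alternative sketch, matlab.lang.makeValidName (available from R2014a) can repair such names; the attribute name '_id' is made up for illustration.

```matlab
% Turn an attribute name with a leading underscore into a valid field name.
% '_id' is a made-up attribute name.
attrName = '_id';
if ~isvarname(attrName)
    attrName = matlab.lang.makeValidName(attrName);  % needs R2014a or later
end
s = struct();
s.(attrName) = '42';
```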

Great!

Fredrik

Rody Oldenhuis

Timo Dörsam

Mark Mikofski

Stop using XML and use the json.org/java [1] static XML.toJSONObject() method [2]; there's a precompiled jar file in my dropbox [3]. Or use James Newton-King's JSON.NET [4], which he has already precompiled and made available on CodePlex [5]: just download and unzip, then use the version for the .NET framework on your machine. Converting between XML and JSON is described in the documentation [6] and in this SO post [7]. See the MATLAB documentation for more information on using Java [8] or .NET [9] in MATLAB. It's super easy!
[1] http://json.org/java/
[2] http://json.org/javadoc/org/json/XML.html#toJSONObject(java.lang.String)
[3] https://dl.dropboxusercontent.com/u/19049582/JSON.jar
[4] http://james.newtonking.com/pages/json-net.aspx
[5] https://json.codeplex.com/
[6] http://james.newtonking.com/projects/json/help/index.html?topic=html/ConvertingJSONandXML.htm
[7] http://stackoverflow.com/a/814027/1020470
[8] http://www.mathworks.com/help/matlab/using-java-libraries-in-matlab.html
[9] http://www.mathworks.com/help/matlab/using-net-libraries-in-matlab.html

Fábio Nery

I've seen some other users report this issue but could not find how to fix this:

Undefined function 'toCharArray' for input arguments of type 'double'.

Any idea?

Regards

Varoujan

Works well.
Didn't fully test for empty field cases like some commenters but I got a nice structure out of my input file.

I am disappointed that similar functionality isn't built into MATLAB. xmlread and xmlwrite alone make it such a pain to access and/or update XML data.

Adam

Hi,
Thanks for the file, it works great.
But I also have the same problem as Erik with empty data fields. Does anyone know how to fix this?

Yu

Faster than xml_read, recommended!

Erik

Thanks for the file, however I'm having an issue with empty data fields.

I have a 100x50 XML data set which I can easily import into Excel. However, a few fields are empty. For example, at (5,35:40) the XML data is empty.

When I use xml2struct and then try to create a cell array in the same format (100x50), the data in row 5 between 40:50 shifts to the 35:45 position, leaving me with 5 empty spaces from 45:50, so the data is misaligned.

Any idea on how to deal with empty fields in order to maintain their position in the original file?

Thanks!
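
One possible way to keep empty fields in position is to check each element for a Text field while filling the cell array and to write a placeholder when it is missing. The sketch below assumes that empty XML elements come back as structs without a Text field, which is not confirmed in this thread; the variable row is a hypothetical stand-in for one row of parsed elements.

```matlab
% Keep empty elements in place by writing a placeholder for them.
% 'row' is a made-up stand-in for one row of parsed elements.
row  = {struct('Text', '1'), struct(), struct('Text', '3')};
vals = cell(size(row));
for k = 1:numel(row)
    if isfield(row{k}, 'Text')
        vals{k} = row{k}.Text;   % element with content
    else
        vals{k} = '';            % empty element keeps its slot
    end
end
```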

I was just wondering if someone could confirm that what I am doing is correct. When I want to convert XML into a MATLAB array, I type:
data=xml2struct('name of the file i want to convert');
Is that all?

We are encountering the same issue reported by Raoul Herzog: Undefined function or method 'toCharArray' for input arguments of type 'double'. Is there a fix for this?

Neill Weiss

For the comment bug, @Sirius3, I changed the following code block from:

            if (~strcmp(name,'#text') && ~strcmp(name,'#comment') && ~strcmp(name,'#cdata_dash_section'))
                %XML allows the same elements to be defined multiple times,
                %put each in a different cell
                if (isfield(children,name))
                    if (~iscell(children.(name)))
                        %put existsing element into cell format
                        children.(name) = {children.(name)};
                    end
                    index = length(children.(name))+1;
                    %add new element
                    children.(name){index} = childs;
                    if(~isempty(fieldnames(text)))
                        children.(name){index} = text;
                    end
                    if(~isempty(attr))
                        children.(name){index}.('Attributes') = attr;
                    end
                else
                    %add previously unknown (new) element to the structure
                    children.(name) = childs;
                    if(~isempty(text) && ~isempty(fieldnames(text)))
                        children.(name) = text;
                    end
                    if(~isempty(attr))
                        children.(name).('Attributes') = attr;
                    end
                end
            else

to

            if (~strcmp(name,'#text') && ~strcmp(name,'#comment') && ~strcmp(name,'#cdata_dash_section'))
                %XML allows the same elements to be defined multiple times,
                %put each in a different cell
                if (isfield(children,name))
                    if (~iscell(children.(name)))
                        %put existsing element into cell format
                        children.(name) = {children.(name)};
                    end
                    index = length(children.(name))+1;
                    %add new element
                    children.(name){index} = childs;
                    textFieldNames = fieldnames(text);
                    for t = 1:length(textFieldNames)
                        textFieldName = textFieldNames{t};
                        children.(name){index}.(textFieldName) = text.(textFieldName);
                    end
                    if(~isempty(attr))
                        children.(name){index}.('Attributes') = attr;
                    end
                else
                    %add previously unknown (new) element to the structure
                    children.(name) = childs;
                    if(~isempty(text) && ~isempty(fieldnames(text)))
                        textFieldNames = fieldnames(text);
                        numTextFieldNames = length( textFieldNames );
                        for i = 1:numTextFieldNames
                            thisFieldName = textFieldNames{i};
                            children.(name).(thisFieldName) = text.(thisFieldName);
                        end
                    end
                    if(~isempty(attr))
                        children.(name).('Attributes') = attr;
                    end
                end
            else

Now, the children.(name) properties are not blown away when a comment is parsed.

Sirius3

bug: child nodes get lost, when there are comments between them. (line 95)

Gledi

First of all thank for the excellent code.
I have a "small" problem with the cells. In your code, if there is MORE THAN ONE child, you create a cell; otherwise not. What should I change so that even if the node has ONLY ONE child, a cell (with one element) is created?
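
Without changing xml2struct itself, one option is to normalise the field after parsing. The following is a minimal sketch on a made-up struct s whose item field occurred only once; it is not the author's answer to this question.

```matlab
% Normalise a field so it is always a cell array, even with one element.
% 's' mimics a parsed result in which 'item' occurred only once.
s = struct('item', struct('Text', 'only one'));
if ~iscell(s.item)
    s.item = {s.item};           % wrap the single element in a cell
end
firstText = s.item{1}.Text;      % same access pattern as the multi-child case
```

With this wrapper, downstream code can always index with s.item{k} regardless of how many children the node had.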

Matthew

Worked very well for me. Thank you so much.

Raoul Herzog

There seems to be a bug in xml2struct :
I can provide you the corresponding xml file if needed.

??? Undefined function or method 'toCharArray' for input arguments of type 'double'.

Error in ==> xml2struct>parseAttributes at 174
            str = toCharArray(toString(item(theAttributes,count-1)))';

Error in ==> xml2struct>getNodeData at 141
    attr = parseAttributes(theNode);

Error in ==> xml2struct>parseChildNodes at 72
            [text,name,attr,childs,textflag] = getNodeData(theChild);

Error in ==> xml2struct>getNodeData at 147
    [childs,text,textflag] = parseChildNodes(theNode);

Error in ==> xml2struct>parseChildNodes at 72
            [text,name,attr,childs,textflag] = getNodeData(theChild);

Error in ==> xml2struct>getNodeData at 147
    [childs,text,textflag] = parseChildNodes(theNode);

Error in ==> xml2struct>parseChildNodes at 72
            [text,name,attr,childs,textflag] = getNodeData(theChild);

Error in ==> xml2struct>getNodeData at 147
    [childs,text,textflag] = parseChildNodes(theNode);

Error in ==> xml2struct>parseChildNodes at 72
            [text,name,attr,childs,textflag] = getNodeData(theChild);

Error in ==> xml2struct>getNodeData at 147
    [childs,text,textflag] = parseChildNodes(theNode);

Error in ==> xml2struct>parseChildNodes at 72
            [text,name,attr,childs,textflag] = getNodeData(theChild);

Error in ==> xml2struct>getNodeData at 147
    [childs,text,textflag] = parseChildNodes(theNode);

Error in ==> xml2struct>parseChildNodes at 72
            [text,name,attr,childs,textflag] = getNodeData(theChild);

Error in ==> xml2struct>getNodeData at 147
    [childs,text,textflag] = parseChildNodes(theNode);

Error in ==> xml2struct>parseChildNodes at 72
            [text,name,attr,childs,textflag] = getNodeData(theChild);

Error in ==> xml2struct>getNodeData at 147
    [childs,text,textflag] = parseChildNodes(theNode);

Error in ==> xml2struct>parseChildNodes at 72
            [text,name,attr,childs,textflag] = getNodeData(theChild);

Error in ==> xml2struct>getNodeData at 147
    [childs,text,textflag] = parseChildNodes(theNode);

Error in ==> xml2struct>parseChildNodes at 72
            [text,name,attr,childs,textflag] = getNodeData(theChild);

Error in ==> xml2struct>getNodeData at 147
    [childs,text,textflag] = parseChildNodes(theNode);

Error in ==> xml2struct>parseChildNodes at 72
            [text,name,attr,childs,textflag] = getNodeData(theChild);

Error in ==> xml2struct>getNodeData at 147
    [childs,text,textflag] = parseChildNodes(theNode);

Error in ==> xml2struct>parseChildNodes at 72
            [text,name,attr,childs,textflag] = getNodeData(theChild);

Error in ==> xml2struct at 57
    s = parseChildNodes(xDoc);

Xiaohu

Ivan Smirnov

One of the problems that I personally encountered is that xml2struct can't handle CDATA blocks.

It can be easily fixed, replace line 67 with:
            if (~strcmp(name,'#text') && ~strcmp(name,'#comment') && ~strcmp(name,'#cdata_dash_section'))
and line 94 with:
            elseif (strcmp(name,'#text') || strcmp(name, '#cdata_dash_section'))

Works great otherwise, thanks.
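
The name '#cdata_dash_section' in this fix looks odd at first. The DOM node name of a CDATA block is '#cdata-section'; assuming the script applies its '-' to '_dash_' replacement to node names before comparing them, which the fix implies but does not state, the name can be derived like this:

```matlab
% Derive the field-style name of a CDATA node from its DOM node name.
nodeName  = '#cdata-section';                     % DOM name for CDATA blocks
fieldName = strrep(nodeName, '-', '_dash_');      % assumed replacement rule
% Treat plain text and CDATA alike, as the second replacement line does.
isTextLike = any(strcmp(fieldName, {'#text', '#cdata_dash_section'}));
```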

ali

Excellent! I was pulling my hair out trying to read numbers from an XML file, and with this I did it in one minute.

Kevin Moerman

Works great for small files. I tested it for some larger files with >100000 entries and this takes around 178 seconds.

Kevin Moerman

Brad

Wouter Falkena

Thank you for this suggestion Mr. Wanner. I have updated the file and it is currently under review by the MATLAB Central. It will appear here shortly.

Adrian Wanner

Thanks for your work.
You might want to speed up the attribute parsing by about 40% by replacing lines 152-154 with the following:
str=theAttributes.item(count-1).toString.toCharArray()';
k=strfind(str,'=');
attr_name = regexprep(str(1:(k(1)-1)),'[-:.]','_');
attributes.(attr_name) =str((k(1)+2):(end-1));
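
The string handling in this suggestion can be followed on a plain MATLAB string. In this sketch the attribute text 'my-attr="42"' is made up and stands in for what the Java toString call is assumed to return for an attribute node.

```matlab
% Split an attribute string of the form name="value" at the first '='.
str = 'my-attr="42"';                                  % made-up attribute text
k = strfind(str, '=');                                 % positions of '='
attr_name = regexprep(str(1:(k(1)-1)), '[-:.]', '_');  % sanitise the name
value = str((k(1)+2):(end-1));                         % strip the two quotes
attributes = struct();
attributes.(attr_name) = value;
```

Using k(1) means only the first '=' splits the string, so values that themselves contain '=' are kept whole.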

Mark

Thanks, your auto field naming system worked great for me to work with data parsed out from XML files.

Bernard

Thanks a lot! I finally came across a tool that can extract info from a ISO19115/19139 xml file.

Joao Henriques

Simple and works pretty well! The structures are a bit verbose but they're supposed to be parsed by my program anyway; any attempts to collapse some of the nested structures would only slow down the code (some similar submissions do this but are much slower). Thanks!

Krishnan Suresh

Thanks v. much! I used it to read a Collada file (geometry file Google Sketch-up). Worked like a charm!

Wouter Falkena

You are correct. I have removed the '.xml' extension assumption, unless the file cannot be found. The updated file is currently under review by MATLAB Central and should appear here soon.

Mathieu

Warning: not all XML files have an '.xml' extension.

Joanne

Worked on the first try for loading an OSM data file.

TideMan

I was tearing my hair out trying to figure out how to automatically access one tiny piece of data in a .xml file until I found this routine.

Yanai

Updates

1.8

Small bugfix in the CDATA and Comment structure fields.

1.7

Speed improvement due to X. Mo and added support for cdata and comments.

1.5

The function now replaces element and attribute names containing - by _dash_, . by _dot_ and : by _colon_

1.4

Attribute parsing speed increased by 40%

1.3

Corrected the uploaded file

1.2

Removed the assumption that the filename should have a '.xml' extension

1.1

Decreased the processing time for large XML files

MATLAB Release
MATLAB 7.9 (R2009b)
