# MATLAB Spoken Here

## XML and MATLAB: Navigating a Tree

This week I’m posting the third part in my series on using XML. Since I’ve had a request to cover this topic, I’ve moved it up in the schedule. We’ll be back to the new MATLAB R2010b features next week.

Last time in my XML in MATLAB series I explained the steps needed to create an XML DOM structure and build up an XML tree. This week I answer the question: “Now that I have a tree, how can I extract data from it?” I’ll continue to use the AddressBook example from the last post. Remember, you can create a new tree, or read one into MATLAB using the xmlread function.

For your reference, here are the other parts in the series:

There are at least two ways to navigate the tree in MATLAB. Both of the ways I describe here once again take advantage of the Java environment that runs with MATLAB. The first way makes use of the structure of the tree and the relationships between the nodes; the second uses the XPath language to precisely pick out a node. Once again, here is the example tree:

<?xml version="1.0" encoding="utf-8"?>
<AddressBook>
    <Entry>
        <Name>Friendly J. Mathworker</Name>
        <PhoneNumber>(508) 647-7000</PhoneNumber>
    </Entry>
</AddressBook>
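
If you don't have the tree from the previous post in your workspace, here is a minimal sketch that rebuilds it in memory. It assumes the com.mathworks.xml.XMLUtils.createDocument helper mentioned in the comments below and the standard DOM createElement, createTextNode, and appendChild methods; the variable names are only illustrative.

```matlab
% Sketch: rebuild the AddressBook example tree in memory.
docNode = com.mathworks.xml.XMLUtils.createDocument('AddressBook');
addressBook = docNode.getDocumentElement;

% One <Entry> under the root
entry = docNode.createElement('Entry');
addressBook.appendChild(entry);

% <Name> and <PhoneNumber> each hold a single text node
nameNode = docNode.createElement('Name');
nameNode.appendChild(docNode.createTextNode('Friendly J. Mathworker'));
entry.appendChild(nameNode);

phoneNode = docNode.createElement('PhoneNumber');
phoneNode.appendChild(docNode.createTextNode('(508) 647-7000'));
entry.appendChild(phoneNode);

% With a single input, xmlwrite returns the serialized XML as text
disp(xmlwrite(docNode))
```

A tree built this way contains no whitespace-only text nodes, which is why the walking code in this post can treat the first child of Entry as the Name node.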


Let’s say I want to find Friendly’s phone number. To do this I’m going to start at the root node, “AddressBook.” From there I will walk down the tree to AddressBook/Entry/PhoneNumber and get the text of the PhoneNumber node.

% Get the "AddressBook" node
% (docNode is the Document object created in the previous post)
addressBook = docNode.getDocumentElement;
% Get all the "Entry" nodes
entries = addressBook.getChildNodes;
% Get the first "Entry"'s children
% Remember that Java arrays are zero-based
friendlyInfo = entries.item(0).getChildNodes;
% Iterate over the nodes to find the "PhoneNumber"
% once there are no more siblings, "node" will be empty
node = friendlyInfo.getFirstChild;
while ~isempty(node)
    if strcmpi(node.getNodeName, 'PhoneNumber')
        break;
    else
        node = node.getNextSibling;
    end
end
phoneNumber = node.getTextContent

phoneNumber =

(508) 647-7000



The getChildNodes() method returns a list of nodes, and there are several ways to navigate it. In the above example I used getFirstChild(), which returns the first child (in this case, the Name node). Then, using the getNextSibling() method, I can walk through the other child nodes until I find the one I’m looking for, in this case PhoneNumber. I used the getNodeName() method to get the name of the node as a string so I could compare it with “PhoneNumber.” If you’re looking at the methods of an element node, getNodeName() returns the same value as getTagName().

Once I have the desired node, I use the getTextContent() method to get the text inside the <PhoneNumber></PhoneNumber> tags. Note that if there are multiple PhoneNumber child nodes of the Entry, the loop stops after finding the first one.
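
One caveat when the tree comes from a pretty-printed file rather than from code: the parser usually keeps the indentation between tags as whitespace-only text nodes, so the first child of Entry may be a text node rather than Name. The sketch below is one way to handle that, not the only one. It parses a small XML string (standing in for a file) with the standard Java DocumentBuilder, skips anything that is not an element, and collects every PhoneNumber instead of stopping at the first. The variable names and the choice to parse from a string are my own assumptions for illustration.

```matlab
% Parse a pretty-printed copy of the example from a string
xmlText = sprintf(['<AddressBook>\n  <Entry>\n' ...
    '    <Name>Friendly J. Mathworker</Name>\n' ...
    '    <PhoneNumber>(508) 647-7000</PhoneNumber>\n' ...
    '  </Entry>\n</AddressBook>']);
builderFactory = javax.xml.parsers.DocumentBuilderFactory.newInstance();
builder = builderFactory.newDocumentBuilder;
docNode = builder.parse(org.xml.sax.InputSource(java.io.StringReader(xmlText)));

ELEMENT_NODE = 1;   % numeric value of org.w3c.dom.Node.ELEMENT_NODE
entry = docNode.getElementsByTagName('Entry').item(0);

% Walk the siblings, ignoring whitespace text nodes
phoneNumbers = {};
node = entry.getFirstChild;
while ~isempty(node)
    isElement = node.getNodeType == ELEMENT_NODE;
    if isElement && strcmp(char(node.getNodeName), 'PhoneNumber')
        % char() converts the java.lang.String to a MATLAB char array
        phoneNumbers{end+1} = char(node.getTextContent); %#ok<AGROW>
    end
    node = node.getNextSibling;
end
```

Checking getNodeType before comparing names also avoids calling element-only methods such as getTagName on a text node.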

Another way to iterate over the children is to use the item() method. Note that since this is a Java node list, the indices go from 0 to getLength-1.

for i = 0:friendlyInfo.getLength - 1
    if strcmpi(friendlyInfo.item(i).getTagName, 'PhoneNumber')
        phoneNumber = friendlyInfo.item(i).getTextContent
    end
end

phoneNumber =

(508) 647-7000



Instead of iterating to find the PhoneNumber node, we can use the getElementsByTagName method to find all the elements in the subtree that have a certain name. This returns a list of matching nodes, which we can iterate over, but since I know there’s only one PhoneNumber I just grab the 0th element:

phoneNumber = friendlyInfo.getElementsByTagName('PhoneNumber').item(0).getTextContent

phoneNumber =

(508) 647-7000
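
When more than one element matches, the returned list can be looped over with item(). The following is a small sketch under my own assumptions: it builds a two-entry address book in memory (the second entry is invented purely for illustration) and copies every phone number into a MATLAB cell array.

```matlab
% Build a small tree with two entries (second entry is hypothetical)
docNode = com.mathworks.xml.XMLUtils.createDocument('AddressBook');
root = docNode.getDocumentElement;
people = {'Friendly J. Mathworker',    '(508) 647-7000'; ...
          'Hypothetical Q. Colleague', '(555) 010-0000'};
for k = 1:size(people, 1)
    entry = docNode.createElement('Entry');
    nameNode = docNode.createElement('Name');
    nameNode.appendChild(docNode.createTextNode(people{k, 1}));
    phoneNode = docNode.createElement('PhoneNumber');
    phoneNode.appendChild(docNode.createTextNode(people{k, 2}));
    entry.appendChild(nameNode);
    entry.appendChild(phoneNode);
    root.appendChild(entry);
end

% Find every PhoneNumber anywhere below the document node
matches = docNode.getElementsByTagName('PhoneNumber');

% The list is zero-based; the cell array is one-based
numbers = cell(matches.getLength, 1);
for i = 0:matches.getLength - 1
    numbers{i + 1} = char(matches.item(i).getTextContent);
end
```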



### Using XPath
XPath is a language for finding nodes in an XML document, and it comes with Java. It works similarly to Java’s regular expression engine, in that you create a string that represents the nodes you want to match, compile that to an internal representation, and then evaluate it on your document. It’s an advanced step, and I can’t think of anything in regular MATLAB that works the same way. XPath expressions can start either from the top of the tree or anywhere within a document or document fragment. Node paths are represented like directory paths, in that “..” goes up a level, “.” is the same level, and nodes are separated by forward slashes, “/”. In our example, the first phone number of the first entry would be “AddressBook/Entry/PhoneNumber.” “//” represents anywhere in the document, so “//PhoneNumber” would also match the same nodes.

To use XPath, you first need to create an XPath object from the XPath factory. In the example below, I’ve first imported the xpath package to make it easier to type out all these various Java classes. Once you have an XPath object, you can then compile and evaluate the expression.

% get the xpath mechanism into the workspace
import javax.xml.xpath.*
factory = XPathFactory.newInstance;
xpath = factory.newXPath;

% compile and evaluate the XPath Expression
expression = xpath.compile('AddressBook/Entry/PhoneNumber');
phoneNumberNode = expression.evaluate(docNode, XPathConstants.NODE);
phoneNumber = phoneNumberNode.getTextContent

phoneNumber =

(508) 647-7000



In the above example, the evaluate() method of the compiled XPath expression takes the node to start from and an XPathConstants value. This constant tells the expression what type of result to return. In this case, we’ve asked for a NODE, and so we get back the matching node object. But if we change the constant to STRING, we get back the text of the matched node directly, as in the next example. You can also ask for NODESETs, NUMBERs, and BOOLEANs.

phoneNumber = expression.evaluate(docNode, XPathConstants.STRING)
phoneNumber =

(508) 647-7000
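
To illustrate two of the other result types, here is a hedged sketch that asks for a NODESET and a NUMBER. It builds its own two-entry tree so that it runs on its own; the second phone number is made up, and the double() and char() wrappers are there because I am not assuming how a given MATLAB release converts the returned Java values.

```matlab
import javax.xml.xpath.*

% Two-entry tree (second number is hypothetical)
docNode = com.mathworks.xml.XMLUtils.createDocument('AddressBook');
root = docNode.getDocumentElement;
phones = {'(508) 647-7000', '(555) 010-0000'};
for k = 1:numel(phones)
    entry = docNode.createElement('Entry');
    phoneNode = docNode.createElement('PhoneNumber');
    phoneNode.appendChild(docNode.createTextNode(phones{k}));
    entry.appendChild(phoneNode);
    root.appendChild(entry);
end

factory = XPathFactory.newInstance;
xpath = factory.newXPath;

% NODESET: a zero-based list of every matching node
expression = xpath.compile('//Entry/PhoneNumber');
phoneNodes = expression.evaluate(docNode, XPathConstants.NODESET);
for i = 0:phoneNodes.getLength - 1
    disp(char(phoneNodes.item(i).getTextContent))
end

% NUMBER: the result of a numeric XPath function such as count()
expression = xpath.compile('count(//Entry)');
entryCount = double(expression.evaluate(docNode, XPathConstants.NUMBER));
```

A NODESET result has no getTextContent of its own; you pick out individual nodes with item() first.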



XPath is a complicated topic and probably worthy of its own follow-up post. The language is rich enough to precisely pick out any node, entity, attribute, or other piece of data from an XML document, starting anywhere in the tree.
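
As a small taste of that richness, the sketch below selects by attribute. It assumes a variation of the example in which each Entry carries an ID attribute (that attribute, and the second phone number, are invented for illustration), then uses a predicate to pick one entry and the @ syntax to list the attribute values themselves.

```matlab
import javax.xml.xpath.*

% Entries tagged with a hypothetical ID attribute
docNode = com.mathworks.xml.XMLUtils.createDocument('AddressBook');
root = docNode.getDocumentElement;
phones = {'(508) 647-7000', '(555) 010-0000'};
for k = 1:numel(phones)
    entry = docNode.createElement('Entry');
    entry.setAttribute('ID', num2str(k));
    phoneNode = docNode.createElement('PhoneNumber');
    phoneNode.appendChild(docNode.createTextNode(phones{k}));
    entry.appendChild(phoneNode);
    root.appendChild(entry);
end

factory = XPathFactory.newInstance;
xpath = factory.newXPath;

% Predicate: the phone number of the entry whose ID is "2"
expression = xpath.compile('//Entry[@ID="2"]/PhoneNumber');
secondPhone = char(expression.evaluate(docNode, XPathConstants.STRING));

% Attribute axis: collect every ID value in the document
expression = xpath.compile('//Entry/@ID');
idNodes = expression.evaluate(docNode, XPathConstants.NODESET);
ids = cell(idNodes.getLength, 1);
for i = 0:idNodes.getLength - 1
    ids{i + 1} = char(idNodes.item(i).getNodeValue);
end
```

Using double quotes inside the XPath string avoids having to double up single quotes in the MATLAB string.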

This has been a meatier post than most for me, so please ask lots of follow-up questions or leave comments.


### 27 Responses to “XML and MATLAB: Navigating a Tree”

1. Cassandra Walker replied on :

Hi Michael,
Thanks for your information on getting text from XML files. However, being a beginner MATLAB user, I am unsure: once I obtain the text, how do I then export the information into a text or Excel file?
I look forward to hearing from you!
Cassandra

2. Cassandra Walker replied on :

Hi Michael,

Sorry to be a pain, I just worked it out by using char(), which converts it from a java.lang.String to a character array…

This has been bugging me and I’m so thankful to have worked it through!!

Thanks
Cassandra

3. Thomas replied on :

Hello,
thanks for this great help.
But I have a question concerning this topic. I have an XML file that contains nodes in the same level with the same name. Just the attribute ‘ID’ is different. This would be in your example a second node ‘entry’. The nodes have attributes ‘ID="1"’ and ‘ID="2"’ respectively ( ….). Is there a way to navigate through the XML by these attributes?

Thanks,
Thomas

4. Rich Quist replied on :

Thomas,
If you’re trying to retrieve an element, say “<PhoneNumber>” with a specific value in its ID attribute, say “work”, something like this might work:

% compile and evaluate the XPath Expression
expression = xpath.compile('//PhoneNumber[@ID=''work'']');
phoneNumberNode = expression.evaluate(docNode, XPathConstants.NODE);
phoneNumber = phoneNumberNode.getTextContent


There are a number of XML-related entries on the File Exchange, including one by Matthew Simoneau that shows an example using a NODESET in XPath, which you probably would use if you wanted to retrieve ALL of the nodes that had an “ID” attribute for processing: http://www.mathworks.com/matlabcentral/fileexchange/31382-using-xpath-from-matlab/content/html/xpath.html

Hope that helps.
Rich

5. MM replied on :

Let’s say that I am trying to get a specific number out of the xml file for a specific variable to be used in another function. I have been able to extract it as a text so that I can see the number, but not in a way that I can use it as an input in another function. Please let me know how I can accomplish this.

6. Mike replied on :

@MM,

Do you just need to convert from string to a numeric type? Take a look at the STR2NUM function.

E.g.

heightString = heightNode.getTextContent;
heightNum = str2num(char(heightString));

7. Stan replied on :

Nice article but what if I want to use xpath with an xml file that includes a namespace designation?

8. Dave replied on :

Epic fail.
xpath and Matlab – great for basic structures but completely fails when you introduce namespaces to the xml!

9. Nasser Hosseini replied on :

Hi Michael!

I wonder what ‘docNode’ is and how do you get it? It is not obvious from your example!

Regards
/Nasser Hosseini

10. Mike replied on :

@Dave,
I’m not sure what you mean. Can you provide an example? Namespaces should be addable to the nodes.

@Nasser,
Thanks for catching that oversight. I explained it in the previous part about creating nodes, but for reference here it is for the example:

docNode = com.mathworks.xml.XMLUtils.createDocument('AddressBook');

11. Charlie Hogg replied on :

Hi,

I am having problems with retrieving data from an xml file. I think I have followed the instructions here.

When I try the following code, the coordinate node returns an empty element. Why is the node not identified?

I’m a beginner so I don’t know if I’ve imported and set up the factory correctly, or if there is something else I don’t understand.

Thanks,
Charlie

documentNode = docNode.getDocumentElement
%%

import javax.xml.xpath.*
factory = XPathFactory.newInstance;
xpath = factory.newXPath;

% compile and evaluate the XPath Expression
expression = xpath.compile('Document/Placemark/LineString/coordinates');
coordinateNode = expression.evaluate(documentNode, XPathConstants.NODE)
data = coordinateNode.getTextContent

Here is also the xml description

Canale centreline.kml

Canale centreline

10.09984114068684,45.80687219300302,0 10.10001647099475,45.80695514950003,0

12. Charlie Hogg replied on :

That xml code didn’t come out right.

Let me try again

Canale centreline.kml

Canale centreline

#m_ylw-pushpin

1

10.09984114068684,45.80687219300302,0 10.10001647099475,45.80695514950003,0

13. Charlie Hogg replied on :

I couldn’t get to the bottom of how to use xpath unfortunately, so I read the xml data using the example shown in the xmlread documentation. This is not ideal, but at least I could get it to work. I couldn’t work out how to learn how the java language worked.

Suggestions would be very welcome.

14. Michael Katz replied on :

@Charlie,

It’s hard to say what is going on without understanding the XML. Unfortunately with our blog software you’d have to replace <’s and >’s with &lt; and &gt; to get them to show up here. One thing that you might try is replacing the expression with something like

 //Placemark/LineString/coordinates


since you’re applying the expression on the document node. I’m not sure what the expectation is with your document, or what its structure is. Also, you might want to try using XPathConstants.NODESET instead of XPathConstants.NODE to get the set of all the matching nodes.

15. Charlie Hogg replied on :

Thanks Michael.

I tried your suggestions. Using // made no difference: I still got the output

coordinateNode =
[]
??? Attempt to reference field of non-structure array.

I also used different nodes (for example //coordinates) but this gave no improvement.

I tried your second suggestion of using NODESET, and got the following response:

>> coordinateNode = expression.evaluate(docNode, XPathConstants.NODESET)
coordinateNode =
net.sf.saxon.dom.DOMNodeList@c3e000
>> data = coordinateNode.getTextContent
??? No appropriate method, property, or field getTextContent for class
net.sf.saxon.dom.DOMNodeList.

>> data = coordinateNode.getLength
data =
0

This is essentially the same response: the node coordinates are not being picked up by the function.

I am struggling to debug this because I don’t really know how the java classes and methods work. Presumably I need a single node to be able to use the getTextContent method.

Here are the essential parts of the XML tree with your suggested replacement. Hopefully this will help you see what is going on here.

To give some context, this data is produced by google earth when exporting a set of locations as a .kml file.

<?xml version="1.0" encoding="utf-8"?>

<Document>

<name>Canale centreline.kml</name>

<Placemark>

<name>Canale centreline</name>

<styleUrl>#m_ylw-pushpin</styleUrl>

<LineString>

<tessellate>1</tessellate>

<coordinates>
10.09984114068684,45.80687219300302,0 10.10001647099475,45.80695514950003,0 10.10009815060466,45.80700378862793,0 10.10014860229519,45.80703578631482,0 10.10022811504785,45.80709873377793,0 10.10031010039278,45.80713198033737,0 10.10039209567001,45.80716523014203,0 10.10043060939518,45.80721155298124,0 10.10059165048882,45.80731526414375,0 10.10067084675329,45.80738580284769,0 10.10077437654077,45.80742018705158,0 10.10087407286891,45.80749903339635,0 10.10094496915137,45.80753930965439,0 10.10105627273914,45.8076036518462,0 10.10113576682277,45.80766657410192,0 10.10119471893764,45.80772115465624,0 10.10126359962793,45.80778359269938,0 10.10136426386506,45.80784743536302,0 10.1014112702849,45.80791630897777,0 10.10148214037166,45.80795656414318,0 10.10154185964934,45.80800378862803,0 10.10161282555237,45.80804408470648,0 10.10169305938987,45.8080996247984,0 10.10176464185466,45.8081325076318,0 10.10183422692097,45.80818754113409,0 10.10190302533531,45.80824989951307,0 10.1019724151388,45.80830484378664,0 10.10207356450982,45.808361196677,0 10.10214093307033,45.80843827920521,0 10.10218914996636,45.80849228359421,0 10.10229299490651,45.80851911841853,0 10.10237499281356,45.8085523932947,0 10.10242320989466,45.80860639765308,0 10.10253629440558,45.80864846068262,0 10.10261559800962,45.80871125354677,0 10.10273070367636,45.80873117766292,0 10.10279152902868,45.80876351326417,0 10.10288478779878,45.80878987758641,0 10.1029859390649,45.80884622983381,0 10.10308911157812,45.80888044325135,0 10.10321278369886,45.80892297497288,0 10.10330469593155,45.80896409855487,0 10.10338602169821,45.80900475219077,0 10.1035831279132,45.80905795013989,0 10.10369554080581,45.8091073915239,0 10.10380930078051,45.80914207396828,0 10.1039243924581,45.80916198901878,0 10.10410953675727,45.80922946173238,0 10.1042655887779,45.80926599718044,0 10.10448377732788,45.80932009535048,0 10.10464581905896,45.80937169685852,0 10.10519473741717,45.80950956495738,0 10.10612207616,45.80976848013496,0 
10.10668667174301,45.80991931642902,0 10.10735617652176,45.81011904635547,0 10.10755079422222,45.81019061814287,0 10.10780348654802,45.81027639187093,0 10.10803597017285,45.81037553650121,0 10.10821951016236,45.81046710032802,0 10.10838297427577,45.8105858071117,0 10.10848814678001,45.81069003563514,0 10.10861187605076,45.8108356424888,0 10.10876378535779,45.81102960623957,0 10.10879264406763,45.81105054198222,0 10.10880175134389,45.81107809248236,0 10.10883039778439,45.81110587300541,0 10.10885887582951,45.81114051062271,0 10.10886831282799,45.81115434247666,0 10.10890672473159,45.8111822327723,0 10.10893519903836,45.8112168662463,0 10.1089540707507,45.81124452682191,0 10.10897294201804,45.8112721866723,0 10.10899181299001,45.81129984645426,0 10.10901068384033,45.81132750526101,0 10.10903931941305,45.81135527398825,0 10.1090678193733,45.81138299496511,0 10.10908672014586,45.81140374559718,0 10.10910543183352,45.8114313447435,0 10.1091438470374,45.81145922269521,0 10.10916306040502,45.8114869991964,0
</coordinates>

</LineString>

</Placemark>

</Document>

</kml>

16. Michael Katz replied on :

Ah, I get it now. This is because you have an xmlns in your docNode. The Java XPath implementation doesn’t know how to resolve the namespace on its own, so you need to supply a NamespaceContext.

Try the following. Save this code as KMLNamesspaceContext.java:

import java.util.*;
import javax.xml.*;
import javax.xml.namespace.NamespaceContext;

public class KMLNamesspaceContext implements NamespaceContext {

    public String getNamespaceURI(String prefix) {
        if (prefix == null) throw new NullPointerException("Null prefix");
        else if ("kml".equals(prefix)) return "http://www.opengis.net/kml/2.2";
        else if ("xml".equals(prefix)) return XMLConstants.XML_NS_URI;
        return XMLConstants.NULL_NS_URI;
    }

    public String getPrefix(String uri) {
        throw new UnsupportedOperationException();
    }

    public Iterator getPrefixes(String uri) {
        throw new UnsupportedOperationException();
    }

}


Then in MATLAB, compile this to a class:

!javac KMLNamesspaceContext.java
nc = KMLNamesspaceContext


Now use that with your expression:

factory = XPathFactory.newInstance;
xpath = factory.newXPath;
xpath.setNamespaceContext(nc);
expression = xpath.compile('//kml:Document')
expression.evaluate(docNode, XPathConstants.NODE)

17. Charlie Hogg replied on :

Hi Michael,

Thanks again for the help.

I still get the same error of

coordinateNode =
[]
??? Attempt to reference field of non-structure array.

Error in ==> xpath_setup at 34
data = coordinateNode.getTextContent .

I have implemented your suggestion as

!/opt/sunjava-native/jdk/bin/javac KMLNamesspaceContext.java
nc = KMLNamesspaceContext

documentNode = docNode.getDocumentElement

import javax.xml.xpath.*
factory = XPathFactory.newInstance;
xpath = factory.newXPath;
xpath.setNamespaceContext(nc);
expression = xpath.compile('//kml:Documents')
coordinateNode=expression.evaluate(docNode, XPathConstants.NODE)
data = coordinateNode.getTextContent

I have a work around based on http://www.mathworks.co.uk/help/techdoc/ref/xmlread.html

I use this example to turn the xml file into a struct from which I can extract the data I want. This is not ideal and xpath looks much smoother, if I could get it to work.

18. Michael Katz replied on :

@Charlie,

Sorry my example did not fully cover your need. You’ll probably need something like:

expression = xpath.compile('//kml:Placemark/kml:LineString/kml:coordinates');


But I’m not too sure. If you’re still having trouble, please contact Technical Support. They will be able to give you more assistance than I am able to do in the comments, here.

19. Charlie Hogg replied on :

Thanks Michael. Maybe this function was a bit of a leap for me at the moment.

Could you suggest somewhere else that I could find documentation and examples of how to use the xpath functions? The http://xerces.apache.org/xerces-j/apiDocs/index.html pages are pretty much unintelligible to me. For example, how could I find out what arguments expression.evaluate() requires? There is no help in the matlab documentation I can see.

20. Michael Katz replied on :

@Charlie,

Unfortunately there’s not, mostly because this is a third party java package that we make available. Even the documentation at http://www.oxygenxml.com/apidoc/saxon-8.7.1/index.html is not very helpful. There are other sources of xpath tutorials on the net, like http://w3schools.com/xpath/xpath_syntax.asp, but are not specific to the java implementation.

You might also want to try the File Exchange and MATLAB Answers to see if there is more specific advice available. This also sounds like a good idea for a follow-up post. Let me know how it goes.

21. Jarrod replied on :

@Charlie,

I’ve just added a submission to the file exchange to hopefully provide an easy way to access nodes via XPath. It should support namespaces as well, though I’ve never tried it with Google Earth output

http://www.mathworks.com/matlabcentral/fileexchange/34711-xmlnode

I hope you find it useful. Let me know what you think!

22. Nikhil replied on :

Hi Mike,
Thanks for the article. Really helped as I could easily create a new xml file. However, navigating through an existing xml file, I was wondering if we could edit the text content of a node, or the node name itself for that matter.

23. Michael Katz replied on :

@Nikhil,
You can edit the text content with the setTextContent method. To make minor edits, use getTextContent to copy the String, modify it, and then set it back. As for changing the node name, the DOM API does not specify a way to do this. You’ll have to create a new element with the desired name, move the first node’s children to the new element, and then replace the first node with the new one.
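
A minimal sketch of those steps, assuming a tree built in memory: the new element name “Telephone” and the edited number are hypothetical, and any attributes on the old element would need to be copied separately.

```matlab
% Small tree with one PhoneNumber to edit
docNode = com.mathworks.xml.XMLUtils.createDocument('AddressBook');
entry = docNode.createElement('Entry');
docNode.getDocumentElement.appendChild(entry);
oldNode = docNode.createElement('PhoneNumber');
oldNode.appendChild(docNode.createTextNode('(508) 647-7000'));
entry.appendChild(oldNode);

% Edit the text content in place (the new value is made up)
oldNode.setTextContent('(508) 647-7001');

% "Rename": create the new element, move the children, then swap
newNode = docNode.createElement('Telephone');
while oldNode.hasChildNodes
    % appendChild moves the child out of oldNode and into newNode
    newNode.appendChild(oldNode.getFirstChild);
end
oldNode.getParentNode.replaceChild(newNode, oldNode);
```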

24. Nikhil replied on :

Hey thanks a lot Mike! It worked out just well. I was initially editing with the tree structure, but XPath was so much smoother. Cheers.

25. Octavio replied on :

Hello, I have this:

How I can get data from this with xpath???

26. Octavio replied on :

Hello, I have this:

<TestStep comp="GELE" datatype="Number" group="[010] Seq_Short Test" limhi="0.002" limlo="-0.002" measid="010_272" measname="PinCheck ST_O_OUT_1" start="88079.574242300005" status="Passed" stepid="ID#:MOsSkDksp0KkG29Tfvr4MC" stepname="[010_272] PinCheck ST_O_OUT_1" steptype="NumericLimitTest" time="0.0034241" unit="ampere" value="-0.0000037545442"/>

How I can get data from this with xpath???

27. John Rogers replied on :

Thanks for the series Michael.
I have trouble duplicating your code in this piece.

clear; clc
xmlFileName = 'phoneBook.xml';
friendlyInfo = entries.item(0).getChildNodes;
node = friendlyInfo.getFirstChild;
while ~isempty(node)
    if strcmpi(node.getNodeName, 'PhoneNumber')
        break;
    else
        node = node.getNextSibling;
    end
end
phoneNumber = node.getTextContent

except for the first 3 lines, this is identical to your sample.
matlab reports this error on line 7:
??? No appropriate method, property, or field getFirstChild for class
org.apache.xerces.dom.CharacterDataImpl$1.

If I run through line 5 and evaluate the first part of line 6, I get:

>> friendlyInfo = entries.item(0)
friendlyInfo =
[#text: ]

but evaluating all of line 6 produces:

org.apache.xerces.dom.CharacterDataImpl$1@6231ed

What is going on?
Thanks,
John


These postings are the author's and don't necessarily represent the opinions of MathWorks.