
I am parsing a very large XML file (from jpylyzer, a JP2 properties extractor). This XML contains the properties of many JP2 images, each with the same elements, such as:

//results/jpylyzer/fileInfo/fileName
//results/jpylyzer/properties/jp2HeaderBox/imageHeaderBox/height
//results/jpylyzer/properties/jp2HeaderBox/imageHeaderBox/width
//results/jpylyzer/properties/jp2HeaderBox/imageHeaderBox/bPCDepth

To reduce processing time, I'm using this approach:

for (XPathExpression xPathExpression : listXPathExpression) {
    // each precompiled expression is evaluated against the whole document
    NodeList nodeList = (NodeList) xPathExpression.evaluate(document, XPathConstants.NODESET);
    // we use our list
}

It's very convenient and fast, but it only works if each expression finds exactly one value per image. Because some properties exist only for some images, some XPath expressions find nothing for those images.

The nodeList contains ONLY the values that were found, which is a problem: there is no way to match those values with the values from the other lists, because the lists have different sizes depending on how many properties were found.
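To make the mismatch concrete, here is a minimal Java sketch of the whole-document approach using only the JDK's `javax.xml.xpath` API. The two-image sample, the file names `a.jp2`/`b.jp2`, and the class and method names are hypothetical; only the paths come from the question. Evaluating each expression against the whole document yields two file names but only one height, and nothing in the height list records which image it belongs to.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class WholeDocumentLookup {

    // Hypothetical two-image sample: only the first image has a height.
    static final String SAMPLE =
            "<results>"
            + "<jpylyzer><fileInfo><fileName>a.jp2</fileName></fileInfo>"
            + "<properties><jp2HeaderBox><imageHeaderBox>"
            + "<height>45</height><width>66</width>"
            + "</imageHeaderBox></jp2HeaderBox></properties></jpylyzer>"
            + "<jpylyzer><fileInfo><fileName>b.jp2</fileName></fileInfo>"
            + "<properties><jp2HeaderBox><imageHeaderBox>"
            + "<width>32</width>"
            + "</imageHeaderBox></jp2HeaderBox></properties></jpylyzer>"
            + "</results>";

    // Evaluates an absolute expression against the whole document, as the
    // question's loop does, and returns the number of matched nodes.
    static int countMatches(String expression) {
        try {
            Document document = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(SAMPLE)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodeList = (NodeList) xpath.evaluate(expression, document, XPathConstants.NODESET);
            return nodeList.getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Prints 2 (file names) and then 1 (heights): the lists no longer line up.
        System.out.println(countMatches("//results/jpylyzer/fileInfo/fileName"));
        System.out.println(countMatches("//results/jpylyzer/properties/jp2HeaderBox/imageHeaderBox/height"));
    }
}
```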

Is there a way to fill in a "blank" placeholder when no value is found?

Barium Scoorge
  • Would be easy with XSLT. Not familiar with Java: can you set a context item other than the root / document element? Can you apply further path expressions to the nodeList an expression evaluates to? Can you iterate over a nodeList? – Mathias Müller Jan 12 '15 at 10:30
  • I can iterate over the NodeList. I guess I can evaluate too. – Barium Scoorge Jan 12 '15 at 11:01

1 Answer


What you want is not possible with a single XPath expression, not even with XPath 2.0. In such a case, you have to fall back on the host language that XPath is embedded in (here, Java).

As I'm not very familiar with Java, I cannot give you specific code, but I can explain what you have to do.

I assume an XML document similar to:

<results>
    <jpylyzer>
        <fileInfo>
            <fileName>Name of file</fileName>
        </fileInfo>
        <properties>
            <jp2HeaderBox>
                <imageHeaderBox>
                    <height>45</height>
                    <width>66</width>
                    <bPCDepth>386</bPCDepth>
                </imageHeaderBox>
                <imageHeaderBox>
                    <width>32</width>
                </imageHeaderBox>
            </jp2HeaderBox>
        </properties>
    </jpylyzer>
</results>

As a starting point, find an element that really is present in all XML documents, in all situations. For the sake of an example, let us assume imageHeaderBox is present everywhere, but its children height, width and bPCDepth are not necessarily there.

Find an XPath expression for the imageHeaderBox element:

/results/jpylyzer/properties/jp2HeaderBox/imageHeaderBox

Evaluate the expression and save the result to a NodeList. Next, process this list further. This only works if XPath expressions can be applied to the individual items in a NodeList, but you seem optimistic about that:

I can iterate over the NodeList. I guess I can evaluate too.

Iterate over the nodeList (the result of the imageHeaderBox expression) and apply another path expression to each item.
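A minimal Java sketch of these two steps, using only the JDK's `javax.xml.xpath` API. It assumes the sample document above (inlined as a string) and the path through `jp2HeaderBox`; the class name, method name and placeholder text are hypothetical. The relative expression (for example `height`) is compiled once and evaluated with each `imageHeaderBox` as the context node, and an empty result is replaced by the placeholder, so the output has exactly one entry per box.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class PerBoxLookup {

    // The sample document from this answer.
    static final String SAMPLE =
            "<results><jpylyzer><fileInfo><fileName>Name of file</fileName></fileInfo>"
            + "<properties><jp2HeaderBox>"
            + "<imageHeaderBox><height>45</height><width>66</width><bPCDepth>386</bPCDepth></imageHeaderBox>"
            + "<imageHeaderBox><width>32</width></imageHeaderBox>"
            + "</jp2HeaderBox></properties></jpylyzer></results>";

    // Returns one entry per imageHeaderBox: the text of the first node matched
    // by relativePath, or the placeholder when nothing matches.
    static List<String> valuesPerBox(String xml, String relativePath, String placeholder) {
        try {
            Document document = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            // Step 1: select the element that is always present.
            NodeList boxes = (NodeList) xpath.evaluate(
                    "/results/jpylyzer/properties/jp2HeaderBox/imageHeaderBox",
                    document, XPathConstants.NODESET);
            // Step 2: compile the relative expression once and evaluate it
            // with each box as the context node.
            XPathExpression child = xpath.compile(relativePath);
            List<String> values = new ArrayList<>();
            for (int i = 0; i < boxes.getLength(); i++) {
                Node box = boxes.item(i);
                NodeList found = (NodeList) child.evaluate(box, XPathConstants.NODESET);
                // An empty node-set means the property is missing for this box.
                values.add(found.getLength() > 0 ? found.item(0).getTextContent() : placeholder);
            }
            return values;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(valuesPerBox(SAMPLE, "height", "no height"));
        System.out.println(valuesPerBox(SAMPLE, "width", "no width"));
    }
}
```

With the sample, the first call should print `[45, no height]` and the second `[66, 32]`. Here the presence check happens in Java; it is the host-language counterpart of doing the check inside the XPath expression itself.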

XPath 2.0

In XPath 2.0, you can use an if/then/else expression that checks for the presence of a node. Assuming the imageHeaderBox element node as the context item:

if(height) then height else 'e.g. text saying there is no height'

XPath 1.0

With XPath 1.0, it's slightly more complicated:

concat(height, substring('e.g. text saying there is no height', 1 div not(height)))

See Dimitre Novatchev's answer here for an explanation. The technique is known as the Becker method, probably introduced here.
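The JDK's built-in `javax.xml.xpath` implementation supports XPath 1.0 only (XPath 2.0 needs a third-party processor such as Saxon), so the 1.0 expression is the one that can be used directly from plain Java. A minimal sketch, assuming the sample document above inlined as a string; the class and method names are hypothetical. When `height` exists, `not(height)` is false, `1 div 0` is Infinity and the `substring` is empty; when it is missing, `1 div 1` is 1 and the whole fallback text is returned.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class BeckerMethodLookup {

    // The sample document from this answer.
    static final String SAMPLE =
            "<results><jpylyzer><fileInfo><fileName>Name of file</fileName></fileInfo>"
            + "<properties><jp2HeaderBox>"
            + "<imageHeaderBox><height>45</height><width>66</width><bPCDepth>386</bPCDepth></imageHeaderBox>"
            + "<imageHeaderBox><width>32</width></imageHeaderBox>"
            + "</jp2HeaderBox></properties></jpylyzer></results>";

    // The XPath 1.0 expression from this answer.
    static final String HEIGHT_OR_TEXT =
            "concat(height, substring('e.g. text saying there is no height', 1 div not(height)))";

    // Evaluates the expression once per imageHeaderBox and collects the strings.
    static List<String> heightsOrText(String xml) {
        try {
            Document document = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList boxes = (NodeList) xpath.evaluate(
                    "/results/jpylyzer/properties/jp2HeaderBox/imageHeaderBox",
                    document, XPathConstants.NODESET);
            XPathExpression heightOrText = xpath.compile(HEIGHT_OR_TEXT);
            List<String> results = new ArrayList<>();
            for (int i = 0; i < boxes.getLength(); i++) {
                // evaluate(Object) returns the result converted to a string.
                results.add(heightOrText.evaluate(boxes.item(i)));
            }
            return results;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        heightsOrText(SAMPLE).forEach(System.out::println);
    }
}
```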

Finally, the result list should look similar to:

45
e.g. text saying there is no height
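Applied to the original problem, the same idea gives one row per image. A minimal sketch, assuming (from the paths in the question) that each image is its own `jpylyzer` element under `results`; the two-image sample, the property list, the class name and the `N/A` placeholder are hypothetical. It keeps the question's idea of precompiled `XPathExpression` objects, but evaluates them relative to each `jpylyzer` element instead of the whole document, so every row has the same length and column `p` always holds property `p`.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class PerImageTable {

    // Hypothetical two-image sample: b.jp2 has no height and no bPCDepth.
    static final String SAMPLE =
            "<results>"
            + "<jpylyzer><fileInfo><fileName>a.jp2</fileName></fileInfo>"
            + "<properties><jp2HeaderBox><imageHeaderBox>"
            + "<height>45</height><width>66</width><bPCDepth>8</bPCDepth>"
            + "</imageHeaderBox></jp2HeaderBox></properties></jpylyzer>"
            + "<jpylyzer><fileInfo><fileName>b.jp2</fileName></fileInfo>"
            + "<properties><jp2HeaderBox><imageHeaderBox>"
            + "<width>32</width>"
            + "</imageHeaderBox></jp2HeaderBox></properties></jpylyzer>"
            + "</results>";

    // Properties to extract, relative to each jpylyzer element.
    static final List<String> PROPERTIES = Arrays.asList(
            "fileInfo/fileName",
            "properties/jp2HeaderBox/imageHeaderBox/height",
            "properties/jp2HeaderBox/imageHeaderBox/width",
            "properties/jp2HeaderBox/imageHeaderBox/bPCDepth");

    // Builds one row per image with one cell per property, using the
    // placeholder for missing properties so all rows have the same length.
    static List<List<String>> table(String xml, String placeholder) {
        try {
            Document document = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            // Compile each property twice: once to test presence, once to read it.
            List<XPathExpression> exists = new ArrayList<>();
            List<XPathExpression> value = new ArrayList<>();
            for (String path : PROPERTIES) {
                exists.add(xpath.compile("boolean(" + path + ")"));
                value.add(xpath.compile(path));
            }
            NodeList images = (NodeList) xpath.evaluate("/results/jpylyzer", document, XPathConstants.NODESET);
            List<List<String>> rows = new ArrayList<>();
            for (int i = 0; i < images.getLength(); i++) {
                Node image = images.item(i);
                List<String> row = new ArrayList<>();
                for (int p = 0; p < PROPERTIES.size(); p++) {
                    boolean present = (Boolean) exists.get(p).evaluate(image, XPathConstants.BOOLEAN);
                    // evaluate(Object) returns the string value of the first matching node.
                    row.add(present ? value.get(p).evaluate(image) : placeholder);
                }
                rows.add(row);
            }
            return rows;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        for (List<String> row : table(SAMPLE, "N/A")) {
            System.out.println(String.join("\t", row));
        }
    }
}
```

Testing presence with `boolean(...)` rather than checking for an empty string keeps an element that exists but is empty distinct from a missing one. If an image had several `imageHeaderBox` elements, as in the sample above, the string value of the first match would be used.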
Mathias Müller