1

Given the following XML, I want to select every element between "First heading" and "Second heading", excluding the heading elements themselves. I am not sure which version of XSLT I can use (I'm modifying a stylesheet run by a proprietary app...)

<body>
  <h1 class="heading1">Some title</h1>
  <p class="bodytext">Some text.</p>

  <p class="sectiontitle">First heading</p>
  <p class="bodytext">Want that.</p>
  <div>
    <p class="bodytext">Want that too!</p>
  </div>

  <p class="sectiontitle">Second heading</p>
  <p class="bodytext">Some text</p>

  <p class="sectiontitle">Third heading</p>
  ...
</body>

Expected:

<p class="bodytext">Want that.</p>
<div>
  <p class="bodytext">Want that too!</p>
</div>

I know that <p class="sectiontitle">First heading</p>:

  • will always be of the sectiontitle class.
  • will always contain First heading.
  • does not have to be the first p of this class; its position is unknown.

I also know that I will stop once I find <p class="sectiontitle">Could be any title</p> (so based on class only).

I have seen other similar posts about this kind of problem, and I still can't crack my case...

What I have tried, amongst other things:

//*[(preceding-sibling::p/text()="First heading") and (not(following-sibling::p[@class="sectiontitle"]))]
Flag

3 Answers

2

You can use the following XPath expression (updated to avoid selecting the second sectiontitle element):

//p[@class='sectiontitle' and .='First heading']
 /following-sibling::*[
    preceding-sibling::p[@class='sectiontitle'][1] = 'First heading'
    and not(self::p/@class = 'sectiontitle')
 ]

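Basically, the XPath returns the following-sibling elements of the "First heading" element whose nearest preceding sibling of class 'sectiontitle' has the text "First heading", i.e. is that heading itself. The second condition excludes the next 'sectiontitle' element, which would otherwise match as well.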
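To try the expression above outside the proprietary app, here is a minimal XSLT 1.0 sketch that wraps it in a stylesheet. The `<result>` wrapper element is hypothetical (it only keeps the output well-formed), and the sketch assumes the input has no namespace, as in the sample from the question.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" omit-xml-declaration="yes" indent="yes"/>

  <!-- Start at the document root and copy only the elements
       that sit between the two headings. -->
  <xsl:template match="/">
    <!-- "result" is a hypothetical wrapper, not part of the question. -->
    <result>
      <xsl:copy-of select="
        //p[@class='sectiontitle' and .='First heading']
          /following-sibling::*[
            preceding-sibling::p[@class='sectiontitle'][1] = 'First heading'
            and not(self::p/@class = 'sectiontitle')
          ]"/>
    </result>
  </xsl:template>
</xsl:stylesheet>
```

Run against the sample document, this should copy the "Want that." paragraph and the div that follows it, and nothing after "Second heading".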

har07
  • Thank you for the (quick) answer. It is a major leap forward but still does not return what is expected. – Flag Apr 11 '16 at 12:27
  • After tinkering with your answer I also noticed that the first heading does not have to be p[@class='sectiontitle'][1] but must be identified by the 2 criteria I mentioned. This thing is tricky. – Flag Apr 11 '16 at 12:37
  • @Flag don't you think the expression `preceding-sibling::p[@class='sectiontitle'][1] = 'First heading'` covers both criteria? – har07 Apr 11 '16 at 12:38
  • @Flag you are welcome! As an aside, XSLT has features like `current()` and variables that you can use to avoid constructing overly complex or long XPath expressions. If only the question had been posted with more XSLT context in it :D – har07 Apr 11 '16 at 12:44
2

I think this is more straightforward, since you can specify the two headings between which you want the output:

//p[@class='sectiontitle' and text()='Second heading']/preceding-sibling::*[preceding-sibling::p[@class='sectiontitle'][1] = 'First heading']

For example, if you want the output between 'Second heading' and 'Third heading', just change 'Second heading' to 'Third heading' and 'First heading' to 'Second heading' in the above expression.
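Since the two heading texts are the only parts that change, they can be pulled out into stylesheet parameters. This is only a sketch in XSLT 1.0: the parameter names `start` and `end` and the `<result>` wrapper are hypothetical, and whether the host application lets you pass parameters is an assumption.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" omit-xml-declaration="yes" indent="yes"/>

  <!-- Hypothetical parameters holding the text of the two headings. -->
  <xsl:param name="start" select="'First heading'"/>
  <xsl:param name="end" select="'Second heading'"/>

  <xsl:template match="/">
    <!-- "result" is a hypothetical wrapper that keeps the output well-formed. -->
    <result>
      <!-- Preceding siblings of the end heading whose nearest
           preceding sectiontitle is the start heading. -->
      <xsl:copy-of select="
        //p[@class='sectiontitle' and text()=$end]
          /preceding-sibling::*[
            preceding-sibling::p[@class='sectiontitle'][1] = $start
          ]"/>
    </result>
  </xsl:template>
</xsl:stylesheet>
```

Note that this works cleanly when the two headings are consecutive sectiontitle elements. If another sectiontitle sits between them, that heading itself is selected while the content after it is not.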

SomeDude
  • That's pretty cool too, I might give it a go, and I'm sure other people will be interested as well. Thank you for your answer. – Flag Apr 11 '16 at 15:28
0

I discovered a great way to answer my own question using ids generated with generate-id().

Let's say you want to select the following siblings of the current tag (a sectiontitle in my example), until you find any element that has a title-like class, for instance paragraphtitle or sectiontitle:

 <xsl:variable name="thisgid" select="generate-id(.)" />
 <xsl:apply-templates select="following-sibling::*[not(@class='sectiontitle' or @class='paragraphtitle')]
                              [generate-id(preceding-sibling::p[@class='sectiontitle'][1]) = $thisgid]"/>

That has solved many problems in my case.
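For context, here is a sketch of how the two lines above might sit inside a complete XSLT 1.0 stylesheet. The `<sections>` and `<section>` wrapper elements and the `match="/body"` entry point are hypothetical, and `xsl:copy-of` is swapped in for `xsl:apply-templates` so that the sketch produces visible output without further templates.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" omit-xml-declaration="yes" indent="yes"/>

  <!-- Hypothetical entry point: visit every sectiontitle under body. -->
  <xsl:template match="/body">
    <sections>
      <xsl:apply-templates select="p[@class='sectiontitle']"/>
    </sections>
  </xsl:template>

  <!-- For each heading, remember its generated id, then copy the
       following siblings whose nearest preceding sectiontitle
       has that same id. -->
  <xsl:template match="p[@class='sectiontitle']">
    <xsl:variable name="thisgid" select="generate-id(.)"/>
    <section title="{.}">
      <xsl:copy-of select="following-sibling::*[not(@class='sectiontitle' or @class='paragraphtitle')]
                           [generate-id(preceding-sibling::p[@class='sectiontitle'][1]) = $thisgid]"/>
    </section>
  </xsl:template>
</xsl:stylesheet>
```

One thing to keep in mind: as written, the second predicate only looks at the nearest preceding sectiontitle, so a paragraphtitle element is itself skipped but the elements after it are still grouped under the same sectiontitle.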

Flag