I am writing a Porter stemmer in XQuery, and as the first step I need to match consonant and vowel patterns. In the Perl example I'm using as a basis, the consonant pattern is (?:[^aiueoy]|(?:(?<=[aiueo])y)|\by) and the vowel pattern is (?:[aiueo]|(?:(?<![aiueo])y)). I need to expand these to also include the letter aesc (æ), and this is what I have so far for my XQuery regexes:
let $v := element {"vowels"} {matches($f,"(?:([^aiueoy])|(?:(?:[aiueo]\1)y))")}
let $c := element {"consonants"} {matches($f,"(?:([aiueo])|(?:(?<![aiueo]\1)y))")}
A sample of the type of XML I am working with is as follows:
<entry ref="173">
<headword>abǒve</headword>
<headword>abǒven</headword>
<variant>abufe</variant>
<variant>abufen</variant>
<variant>abuue</variant>
<variant>abuuen</variant>
<variant>abowve</variant>
<variant>obove</variant>
<variant>oboven</variant>
<variant>obufe</variant>
<variant>obufen</variant>
<variant>abof</variant>
<variant>obof</variant>
<variant>aboyf</variant>
<variant>aboun</variant>
<variant>aboune</variant>
<variant>abown</variant>
<variant>abowne</variant>
<variant>aboon</variant>
<variant>oboun</variant>
<variant>oboune</variant>
<variant>abow</variant>
<variant>aboʒe</variant>
<part_of_speech> adv. </part_of_speech>
</entry>
Running this in Saxon, however, I get the following error: Query failed with dynamic error: Syntax error at char 17 in regular expression: No expression before quantifier
I'm pretty sure my issue is that I'm not building the positive lookbehind properly, having changed it from <= to \1, but I'm not sure how I would build that part in a way that works with XQuery. Any suggestions would be much appreciated.
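In case it helps frame answers, here is one direction I could imagine, as a rough sketch rather than a known-good solution. XPath regular expressions have no lookbehind and no \b, so instead of a single pattern the sketch classifies each letter by looking at the letter before it. The function name local:cv-pattern is made up for illustration, treating æ as a vowel is my assumption, and a word-initial y is treated as a consonant (the two Perl patterns overlap in that case). Letters outside the vowel class, such as ǒ or ʒ, simply fall through to consonant here.

```xquery
(: Hypothetical helper, not taken from the Perl original.
   Maps a word to a string of C (consonant) and V (vowel) flags
   without using lookbehind or \b.
   Assumption: æ is added to the vowel class. :)
declare function local:cv-pattern($word as xs:string) as xs:string {
  (: Split the lower-cased word into one-character strings :)
  let $chars :=
    for $cp in string-to-codepoints(lower-case($word))
    return codepoints-to-string($cp)
  return string-join(
    for $ch at $i in $chars
    (: Previous letter, or the empty string at the start of the word :)
    let $prev := if ($i gt 1) then $chars[$i - 1] else ""
    return
      if ($ch eq "y") then (
        (: y counts as a consonant at the start of a word or after a vowel,
           and as a vowel otherwise :)
        if ($prev eq "" or matches($prev, "^[aeiouæ]$")) then "C" else "V"
      )
      else if (matches($ch, "^[aeiouæ]$")) then "V"
      else "C",
    "")
};

(: "aboyf" gives "VCVCC": the y follows the vowel o, so it is a consonant :)
local:cv-pattern("aboyf")
```

The resulting C/V string could then be tested with ordinary patterns such as matches($cv, "^C*(V+C+)+V*$"), which needs no lookaround at all.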