In a Python script using lxml, I use the following XPath expression to find elements with certain text content that do not have a certain value for a particular attribute:
xpath('//el[text()="something" or text()="something else" or text()="this other thing" and @attrib!="A"]')
I also tried:
xpath('//el[text()="something" or text()="something else" or text()="this other thing" and not(@attrib="A")]')
This is part of a loop like this:
for element in root.xpath('//el[text()="something" or text()="something else" or text()="this other thing" and not(@attrib="A")]'):
    print(element.get('attrib'))
In the results I get lots of 'A' values, which is not supposed to happen: I explicitly included 'not(@attrib="A")' as one of the conditions. I don't understand what I'm doing wrong.
========= addition ============
for el in root_element.xpath('//tok[text()="altra" or text()="altres" or text()="altr" and not(@lemma="altre")]'):
    wrong_lemma = el.get('lemma')
Here is part of a document containing an element that should not be matched but IS matched. I get 'altre' as the value of the variable 'wrong_lemma' in the output.
<tok id="w-1264" ord="5" lemma="altre" xpos="DI0CP0">altres</tok>
<tok id="w-1265" ord="6" lemma="insigne" xpos="AQ0CP00">insignes</tok>
<tok id="w-1266" ord="7" lemma="cavaller" xpos="NCMP000">cavallers</tok>
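A minimal sketch that reproduces the behaviour with just this fragment. The `<s>` wrapper element is an assumption added only so the fragment parses as a single document; everything else is copied from the snippet and the query above.

```python
from lxml import etree

# Hypothetical <s> wrapper so the three <tok> elements form one document.
xml = (
    '<s>'
    '<tok id="w-1264" ord="5" lemma="altre" xpos="DI0CP0">altres</tok> '
    '<tok id="w-1265" ord="6" lemma="insigne" xpos="AQ0CP00">insignes</tok> '
    '<tok id="w-1266" ord="7" lemma="cavaller" xpos="NCMP000">cavallers</tok>'
    '</s>'
)
root_element = etree.fromstring(xml)

# Same predicate as in the loop above, unchanged.
query = '//tok[text()="altra" or text()="altres" or text()="altr" and not(@lemma="altre")]'

# Collect (id, lemma) for every element the query returns.
matched = [(el.get('id'), el.get('lemma')) for el in root_element.xpath(query)]
print(matched)  # [('w-1264', 'altre')]
```

So the unwanted element is returned even in this tiny document, which suggests the problem is in the expression itself and not in the rest of the file.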
The following do not work either:
for el in root_element.xpath('//tok[text()="altra" or text()="altres" or text()="altr" and @lemma!="altre"]'):
    wrong_lemma = el.get('lemma')
for el in root_element.xpath('//tok[text()="altra" or text()="altres" or text()="altr" and not(contains(@lemma!="altre"))]'):
    wrong_lemma = el.get('lemma')
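One thing that may be relevant: in XPath 1.0, `and` binds more tightly than `or`, so the attribute test in the queries above attaches only to the last text comparison. The sketch below compares the original predicate with a variant where the `or` conditions are grouped in parentheses. The two-token document is hypothetical test data (the second token and its lemma are made up for illustration), not taken from the real file.

```python
from lxml import etree

# Hypothetical test data: one token that should be excluded (lemma="altre")
# and one made-up token that should be kept (lemma="altra").
xml = (
    '<s>'
    '<tok id="w-1264" lemma="altre">altres</tok>'
    '<tok id="x-1" lemma="altra">altra</tok>'
    '</s>'
)
root_element = etree.fromstring(xml)

# Original form: parsed as  a or b or (c and not(...)).
ungrouped = '//tok[text()="altra" or text()="altres" or text()="altr" and not(@lemma="altre")]'

# Grouped form: parsed as  (a or b or c) and not(...).
grouped = '//tok[(text()="altra" or text()="altres" or text()="altr") and not(@lemma="altre")]'

ungrouped_lemmas = [el.get('lemma') for el in root_element.xpath(ungrouped)]
grouped_lemmas = [el.get('lemma') for el in root_element.xpath(grouped)]

print(ungrouped_lemmas)  # ['altre', 'altra']
print(grouped_lemmas)    # ['altra']
```

With the parentheses, the lemma test applies to all three text alternatives, and the 'altre' element no longer appears in the results for this test data.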