Python 3.2.5 x64 ElementTree

I have data that I need to format using Python. Essentially, I have a file with elements and subelements, and I need to delete the child elements of some of these elements. I have checked previous questions but couldn't put together a solution. The best attempt I have so far removes only every second child element.

Sample data:

<Leg1:MOR oCount="7" xmlns:Leg1="http://what.not">
    <Leg1:Order>
        <Leg1:CTemp id="FO">
            <Leg1:Group bNum="001" cCount="4">
                <Leg1:Dog ndate="112" pdate="111"/>
                <Leg1:Dog ndate="122" pdate="121"/>
                <Leg1:Dog ndate="132" pdate="131"/>
                <Leg1:Dog ndate="142" pdate="141"/>
            </Leg1:Group>
            <Leg1:Group bNum="002" cCount="4">
                <Leg1:Dog ndate="112" pdate="111"/>
                <Leg1:Dog ndate="122" pdate="121"/>
                <Leg1:Dog ndate="132" pdate="131"/>
                <Leg1:Dog ndate="142" pdate="141"/>
            </Leg1:Group>
        </Leg1:CTemp>
        <Leg1:CTemp id="GO">
            <Leg1:Group bNum="001" cCount="4">
                <Leg1:Dog ndate="112" pdate="111"/>
                <Leg1:Dog ndate="122" pdate="121"/>
                <Leg1:Dog ndate="132" pdate="131"/>
                <Leg1:Dog ndate="142" pdate="141"/>
            </Leg1:Group>
            <Leg1:Group bNum="002" cCount="4">
                <Leg1:Dog ndate="112" pdate="111"/>
                <Leg1:Dog ndate="122" pdate="121"/>
                <Leg1:Dog ndate="132" pdate="131"/>
                <Leg1:Dog ndate="142" pdate="141"/>
            </Leg1:Group>
        </Leg1:CTemp>
    </Leg1:Order>
</Leg1:MOR>

What I need the output to look like:

<Leg1:MOR oCount="7" xmlns:Leg1="http://what.not">
    <Leg1:Order>
        <Leg1:CTemp id="FO">
            <Leg1:Group bNum="001" cCount="10"/>
            <Leg1:Group bNum="002" cCount="10"/>
        </Leg1:CTemp>
        <Leg1:CTemp id="GO">
            <Leg1:Group bNum="001" cCount="10"/>
            <Leg1:Group bNum="002" cCount="10"/>
        </Leg1:CTemp>
    </Leg1:Order>
</Leg1:MOR>

I haven't written any code in a while, and my attempt is useless. I can parse the file and write it back out, but I cannot get the processing right.

import xml.etree.cElementTree as ET
tree = ET.parse("input.xml")
root = tree.getroot()
for x in root.findall('./Order/CTemp/Group'):
    root.remove(x)
tree.write("output.xml")

How do I get it to remove the Dog elements below the CTemp elements?

Max Spencer
LCGA

1 Answer

If you can use lxml, try this:

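import lxml.etree

tree = lxml.etree.parse("leg.xml")
# Map the Leg1 prefix to its namespace URI so XPath can resolve it.
for dog in tree.xpath("//Leg1:Dog",
                      namespaces={"Leg1": "http://what.not"}):
    parent = dog.getparent()
    parent.remove(dog)
    # Clear leftover whitespace so the empty Group is written as self-closing.
    parent.text = None
tree.write("leg.out.xml")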

Now leg.out.xml looks like this:

<?xml version="1.0"?>
<Leg1:MOR xmlns:Leg1="http://what.not" oCount="7">
  <Leg1:Order>
    <Leg1:CTemp id="FO">
      <Leg1:Group bNum="001" cCount="4"/>
      <Leg1:Group bNum="002" cCount="4"/>
    </Leg1:CTemp>
    <Leg1:CTemp id="GO">
      <Leg1:Group bNum="001" cCount="4"/>
      <Leg1:Group bNum="002" cCount="4"/>
    </Leg1:CTemp>
  </Leg1:Order>
</Leg1:MOR>
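
If lxml is not an option, roughly the same thing can be sketched with the standard library's xml.etree.ElementTree, which is what the question asks about. This is only a sketch: the helper name strip_dogs and the shortened SAMPLE document are made up for illustration. Two points matter. The paths need the namespace mapping, otherwise nothing matches, and each Dog has to be removed from its own parent Group, not from the root. Looping over a Group directly while removing from it shifts the remaining children, which is the usual cause of only every second child being removed, so the loop below goes over the separate list returned by findall().

```python
import xml.etree.ElementTree as ET

URI = "http://what.not"
NS = {"Leg1": URI}  # prefix mapping taken from the sample data

# Keep the "Leg1" prefix on output instead of an auto-generated one.
ET.register_namespace("Leg1", URI)


def strip_dogs(root):
    """Remove all Leg1:Dog children of every Leg1:Group; return the count removed."""
    removed = 0
    for group in root.findall(".//Leg1:Group", NS):
        # findall() returns a new list, so removing from `group` inside
        # this loop is safe. Looping over `group` itself would skip
        # every second child.
        for dog in group.findall("Leg1:Dog", NS):
            group.remove(dog)
            removed += 1
        group.text = None  # drop leftover whitespace inside the empty Group
    return removed


# Shortened, hypothetical stand-in for the sample data in the question.
SAMPLE = """<Leg1:MOR oCount="7" xmlns:Leg1="http://what.not">
    <Leg1:Order>
        <Leg1:CTemp id="FO">
            <Leg1:Group bNum="001" cCount="4">
                <Leg1:Dog ndate="112" pdate="111"/>
                <Leg1:Dog ndate="122" pdate="121"/>
            </Leg1:Group>
            <Leg1:Group bNum="002" cCount="4">
                <Leg1:Dog ndate="112" pdate="111"/>
                <Leg1:Dog ndate="122" pdate="121"/>
            </Leg1:Group>
        </Leg1:CTemp>
    </Leg1:Order>
</Leg1:MOR>"""

root = ET.fromstring(SAMPLE)
removed = strip_dogs(root)
print(removed, "Dog elements removed")
print(ET.tostring(root, encoding="unicode"))
```

For files, ET.parse("input.xml") followed by strip_dogs(tree.getroot()) and tree.write("output.xml") should work the same way.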
  • Great, thank you! One step closer. Now can you think of any way to concatenate the Group element from: `` to ``? – LCGA May 13 '15 at 10:08
  • @LCGA I've improved my answer. –  May 13 '15 at 10:30
  • Awesome! Thank you so much. I hate to admit it but I was stuck on this for a full day yesterday. – LCGA May 13 '15 at 10:41
  • A small side note: your parsing of the XML file produces this error: `lxml.etree.XMLSyntaxError: Start tag expected, '<' not found, line 1, column 1`. This is a problem with larger files. I changed `tree = lxml.etree.parse(open("leg.xml"))` to `tree = lxml.etree.parse("leg.xml")`. – LCGA May 13 '15 at 11:21
  • So if I want to remove the Leg1: prefix from all the elements how would I go about doing that? – LCGA May 14 '15 at 06:03
  • Please ask a new question. –  May 14 '15 at 06:13
  • http://stackoverflow.com/questions/30232031/changing-an-elements-tag-in-xml-using-lxml-in-python @Tichodroma – LCGA May 14 '15 at 07:51
  • I will take a look later. –  May 14 '15 at 07:54
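
The before and after values in the follow-up comment about the Group element did not survive in this copy. Assuming the goal was the cCount="10" shown in the question's desired output, an attribute can be rewritten with set(). The following is a minimal standard-library sketch under that assumption; the helper name set_group_count and the SAMPLE document are hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {"Leg1": "http://what.not"}  # prefix mapping taken from the sample data


def set_group_count(root, value):
    """Set cCount on every Leg1:Group to `value`; return how many were changed."""
    groups = root.findall(".//Leg1:Group", NS)
    for group in groups:
        group.set("cCount", str(value))  # attribute values must be strings
    return len(groups)


# Hypothetical stand-in for the already stripped document.
SAMPLE = """<Leg1:MOR oCount="7" xmlns:Leg1="http://what.not">
    <Leg1:Order>
        <Leg1:CTemp id="FO">
            <Leg1:Group bNum="001" cCount="4"/>
            <Leg1:Group bNum="002" cCount="4"/>
        </Leg1:CTemp>
    </Leg1:Order>
</Leg1:MOR>"""

root = ET.fromstring(SAMPLE)
changed = set_group_count(root, 10)  # 10 is the value from the desired output
print(changed, "Group elements updated")
```

lxml elements offer the same set() and get() methods, so the loop body should carry over unchanged to the lxml version of the script.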