
I'm trying to write a script to edit an Excel-encoded XML file that contains characters normally considered invalid in XML (specifically 
 and a couple of other control characters like that). These characters belong there and must stay exactly as they are. Preferably using Python's minidom, how can I get the XML parser to ignore these invalid characters?

If it can't be done in minidom, could I get a link to a decent tutorial that shows how to use whatever library you suggest to traverse the XML file and change some data values? I've looked at lxml, but I can't find in its documentation a simple way to go through the file and make changes.

Luke Woodward
Jacob Zimmerman
  • ` ` is not an invalid character, though other control characters are. Just remember that your file is a text file but not an XML file, so you can use any text processing tools on it, but you can't use XML tools. – Michael Kay Nov 08 '21 at 22:25
  • @MichaelKay Well, minidom begs to differ about whether it's an invalid character, but there are also other control characters used, so the point is moot. Yes, I COULD just use text processing, but this is a giant xml document and I would VERY much like to keep my code simpler. – Jacob Zimmerman Nov 09 '21 at 14:48
  • Well, if it's got invalid characters then it's NOT an XML document. – Michael Kay Nov 09 '21 at 16:32
  • Sorry if it seems unhelpful, but I'm trying to change your mindset. If you persist in thinking of it as an XML document then you will be frustrated that XML tools can't handle it. If you start to think of it as a non-XML document, then you will naturally start looking for tools that can handle non-XML documents. – Michael Kay Nov 10 '21 at 00:04
  • @MichaelKay Except I already know of xml tools that CAN deal with it. Knowing this, I was asking for help in THAT realm because it's 20x easier to use an already-built tool than to roll my own and get it wrong. – Jacob Zimmerman Nov 11 '21 at 02:37
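
One way to combine the two positions in the comments (treat the file as text at the edges, use minidom in the middle) is sketched below. It assumes the problem characters appear as numeric character references such as `&#11;`, and that the file never uses the private-use code points U+E000 to U+E01F itself; neither assumption is confirmed by the question. The helper names `protect` and `restore` are made up for this illustration. Each reference to a code point below 0x20 is swapped for a placeholder character that the parser accepts, the document is edited with minidom as usual, and the references are put back after serialization.

```python
import re
from xml.dom import minidom

# Matches numeric character references, hex (&#xB;) or decimal (&#11;).
_CHAR_REF = re.compile(r"&#(?:x([0-9A-Fa-f]+)|([0-9]+));")
# Assumption: the document itself never uses U+E000..U+E01F.
_PUA_BASE = 0xE000
_PLACEHOLDER = re.compile("[\ue000-\ue01f]")


def protect(text):
    """Swap references to code points below 0x20 for placeholder characters."""
    def swap(match):
        hex_digits, dec_digits = match.groups()
        cp = int(hex_digits, 16) if hex_digits else int(dec_digits)
        # Leave every other reference (e.g. &#65;) untouched.
        return chr(_PUA_BASE + cp) if cp < 0x20 else match.group(0)
    return _CHAR_REF.sub(swap, text)


def restore(text):
    """Turn placeholders back into decimal character references."""
    return _PLACEHOLDER.sub(
        lambda m: "&#%d;" % (ord(m.group(0)) - _PUA_BASE), text
    )


# Hypothetical sample data, not taken from the real file.
source = '<Row><Cell note="a&#11;b">old&#10;value</Cell></Row>'

doc = minidom.parseString(protect(source))
cell = doc.getElementsByTagName("Cell")[0]
# Edit the text node; the placeholder characters ride along untouched.
cell.firstChild.data = cell.firstChild.data.replace("old", "new")

result = restore(doc.documentElement.toxml())
print(result)
```

Two limitations of this sketch: references originally written in hex come back in decimal form, and literal (unescaped) control bytes in the file are not handled and would need a similar substitution of their own.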

0 Answers