How do you read an XML file with extra text (i.e. namespaces) after the root tag using Python?

Question

The XML (actually TCX) files produced by the routing site I use (plotaroute.com) have a very long root tag line, and I can't get Python to read them. (My temporary workaround is to manually delete all the extra text after the root tag before running my program.)

If there's any text after "<TrainingCenterDatabase" in line 2 of the TCX file, the program won't print anything.

from dotenv import load_dotenv
import os

from xml.etree import ElementTree

from pathlib import Path

load_dotenv()
data_folder = Path(os.getenv('DATA_FOLDER'))
TCX_full = data_folder / "Poudre_Up024.tcx"
dom = ElementTree.parse(TCX_full)
trackpoints = dom.findall('Courses/Course/Track/Trackpoint')

i=0
for t in trackpoints:
    lat = float(t.find('Position/LatitudeDegrees').text)
    long = float(t.find('Position/LongitudeDegrees').text)
    alt = float(t.find('AltitudeMeters').text)
    dist_meters = float(t.find('DistanceMeters').text)
    print('%5d %10.6f %10.6f %6.1f %8.2f' % (i, lat, long, alt, dist_meters))
    i += 1

Here's a snippet of my original TCX file.

<?xml version="1.0" encoding="UTF-8" standalone="no" ?>
<TrainingCenterDatabase xmlns="http://www.garmin.com/xmlschemas/TrainingCenterDatabase/v2" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.garmin.com/xmlschemas/TrainingCenterDatabase/v2 http://www.garmin.com/xmlschemas/TrainingCenterDatabasev2.xsd">
<Folders>
<Courses>
<CourseFolder Name="Courses">
<CourseNameRef>
<Id>Poudre Up</Id>
</CourseNameRef>
</CourseFolder>
</Courses>
</Folders>
<Courses>
<Course>
<Name>Poudre Up</Name>
<Lap>
<TotalTimeSeconds>9714.643764117841</TotalTimeSeconds>
<DistanceMeters>50200.7974473868</DistanceMeters>
<BeginPosition>
<LatitudeDegrees>40.663732</LatitudeDegrees>
<LongitudeDegrees>-105.189232</LongitudeDegrees>
</BeginPosition>
<EndPosition>
<LatitudeDegrees>40.699534</LatitudeDegrees>
<LongitudeDegrees>-105.580839</LongitudeDegrees>
</EndPosition>
<Intensity>Active</Intensity>
<Cadence>0</Cadence>
</Lap>
<Track>
<Trackpoint>
<Time>2022-10-06T00:00:00Z</Time>
<Position>
<LatitudeDegrees>40.663732</LatitudeDegrees>
<LongitudeDegrees>-105.189232</LongitudeDegrees>
</Position>
<AltitudeMeters>1599</AltitudeMeters>
<DistanceMeters>0</DistanceMeters>
<SensorState>Absent</SensorState>
</Trackpoint>

MANY <Trackpoint></Trackpoint>

</Track>

SEVERAL <CoursePoint></CoursePoint>

</Course>
</Courses>
</TrainingCenterDatabase>

@miriamka Using the {*} wildcard technique in [https://stackoverflow.com/a/62117710/407651](https://stackoverflow.com/a/62117710/407651), I was able to read the original file successfully. So far, I haven't been able to succeed with register_namespace. — virtualdynamo, Oct 26 '22 at 01:55
What do you mean by "succeed with register_namespace"? `register_namespace` only affects serialization. See https://stackoverflow.com/a/58627058/407651 — mzjn, Oct 26 '22 at 08:03
@mzjn I was hoping to use **register_namespace** "to handle things in a more procedural fashion" per https://medium.datadriveninvestor.com/getting-started-using-pythons-elementtree-to-navigate-xml-files-dc9bc720eaa6 — virtualdynamo, Oct 26 '22 at 12:38
That article is misleading IMHO, because it implies that `register_namespace` affects searching/querying of an XML document, which is false. — mzjn, Oct 26 '22 at 14:51
@mzjn Well that makes me feel better because I was getting nowhere. — virtualdynamo, Oct 27 '22 at 02:09
Please edit the question to limit it to a specific problem with enough detail to identify an adequate answer. — Community, Nov 02 '22 at 13:43
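
The point made in the comments, that `register_namespace` only affects serialization and not searching, can be checked with a small experiment. This is a minimal sketch: the one-line `XML` string is a cut-down stand-in for the real TCX file, and the prefixes `tcx` and `t` are arbitrary names chosen for illustration.

```python
from xml.etree import ElementTree as ET

# Namespace URI taken from the xmlns attribute of the question's root tag.
NS = "http://www.garmin.com/xmlschemas/TrainingCenterDatabase/v2"

# Cut-down stand-in for the real file (assumption: same default namespace).
XML = (
    f'<TrainingCenterDatabase xmlns="{NS}">'
    "<Courses><Course><Name>Poudre Up</Name></Course></Courses>"
    "</TrainingCenterDatabase>"
)

root = ET.fromstring(XML)

# register_namespace only chooses the prefix used when XML is written out.
ET.register_namespace("tcx", NS)

# A path without namespace information still finds nothing.
plain_hits = root.findall("Courses/Course")

# A prefix map passed to findall is what makes the search work.
mapped_hits = root.findall("t:Courses/t:Course", {"t": NS})

# The registered prefix only shows up in the serialized output.
serialized = ET.tostring(root, encoding="unicode")

print(len(plain_hits), len(mapped_hits))
print(serialized[:40])
```

The prefix used in a search path (`t` here) is independent of the one registered for output (`tcx`); only the namespace URI has to match.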

score 0 · Answer 1 · answered Oct 26 '22 at 12:43

Here's a snippet of the quick-and-dirty solution: add {*} to each tag name in the findall/find paths.

# The {*} wildcard (Python 3.8+) matches a tag name in any namespace, or in none.
trackpoints = dom.findall('{*}Courses/{*}Course/{*}Track/{*}Trackpoint')

# enumerate() supplies the running index, so no manual counter is needed.
for i, t in enumerate(trackpoints):
    lat = float(t.find('{*}Position/{*}LatitudeDegrees').text)
    long = float(t.find('{*}Position/{*}LongitudeDegrees').text)
    alt = float(t.find('{*}AltitudeMeters').text)
    dist_meters = float(t.find('{*}DistanceMeters').text)
    print('%5d %10.6f %10.6f %6.1f %8.2f' % (i, lat, long, alt, dist_meters))
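
A possible alternative to the wildcard is to pass an explicit prefix-to-URI mapping to find/findall, which restricts matches to the Garmin namespace. The following is a minimal sketch, not a tested drop-in for the real file: the `SAMPLE` string is a shortened copy of the snippet in the question, and the `tcx` prefix and the `read_trackpoints` helper are names made up for illustration.

```python
from xml.etree import ElementTree

# Namespace URI copied from the xmlns attribute in the question's root tag.
# The "tcx" prefix is an arbitrary label; it does not appear in the file.
NS = {"tcx": "http://www.garmin.com/xmlschemas/TrainingCenterDatabase/v2"}

# Shortened stand-in for the TCX file shown in the question.
SAMPLE = """
<TrainingCenterDatabase xmlns="http://www.garmin.com/xmlschemas/TrainingCenterDatabase/v2">
  <Courses>
    <Course>
      <Name>Poudre Up</Name>
      <Track>
        <Trackpoint>
          <Position>
            <LatitudeDegrees>40.663732</LatitudeDegrees>
            <LongitudeDegrees>-105.189232</LongitudeDegrees>
          </Position>
          <AltitudeMeters>1599</AltitudeMeters>
          <DistanceMeters>0</DistanceMeters>
        </Trackpoint>
      </Track>
    </Course>
  </Courses>
</TrainingCenterDatabase>
"""


def read_trackpoints(root):
    """Return (lat, lon, alt, dist) tuples for every Trackpoint under root."""
    rows = []
    path = "tcx:Courses/tcx:Course/tcx:Track/tcx:Trackpoint"
    for tp in root.findall(path, NS):
        lat = float(tp.findtext("tcx:Position/tcx:LatitudeDegrees", namespaces=NS))
        lon = float(tp.findtext("tcx:Position/tcx:LongitudeDegrees", namespaces=NS))
        alt = float(tp.findtext("tcx:AltitudeMeters", namespaces=NS))
        dist = float(tp.findtext("tcx:DistanceMeters", namespaces=NS))
        rows.append((lat, lon, alt, dist))
    return rows


root = ElementTree.fromstring(SAMPLE)
rows = read_trackpoints(root)
for i, (lat, lon, alt, dist) in enumerate(rows):
    print("%5d %10.6f %10.6f %6.1f %8.2f" % (i, lat, lon, alt, dist))
```

For a file on disk, `ElementTree.parse(path).getroot()` should give a root element that can be passed to the same helper.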
