I have a large XML file that I can't parse completely in R because I run out of memory. I only need to extract a few specific columns. I found these similar questions:
How to read large (~20 GB) xml file in R?
Storing specific XML node values with R's xmlEventParse
I can't get the suggested solutions to work with my data, though: the code runs, but no data is returned. I tried adapting them to my XML, but it still does not work, possibly because of my limited knowledge of XML. Below is an example of my XML data, where cl, clssc, clp, clpssc, and primclp are the columns. How can I extract only cl and clssc without parsing the whole document first?
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<abc:abc xmlns:abc="http://abc/abc" xsi:schemaLocation="http://abc/abc lala_20Q2.xsd" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<chcp>
<cl>2000000</cl>
<clssc>10934</clssc>
<clp>200000</clp>
<clpssc>10934</clpssc>
<primclp>Y</primclp>
</chcp>
<chcp>
<cl>2000000</cl>
<clssc>10934</clssc>
<clp>200000</clp>
<clpssc>10934</clpssc>
<primclp>Y</primclp>
</chcp>
<chcp>
<cl>2000000</cl>
<clssc>10934</clssc>
<clp>2000000</clp>
<clpssc>10934</clpssc>
<primclp>Y</primclp>
</chcp>
</abc:abc>