3

Here is my XML:

<Root>  
<FirstChild id="1" att="a">
    <SecondChild id="11" att="aa">
        <ThirdChild>123</ThirdChild>
        <ThirdChild>456</ThirdChild>
        <ThirdChild>789</ThirdChild>
    </SecondChild>
    <SecondChild id="12" att="ab">12</SecondChild>
    <SecondChild id="13" att="ac">13</SecondChild>
</FirstChild>  
<FirstChild id="2" att="b">2</FirstChild>  
<FirstChild id="3" att="c">3</FirstChild>
</Root>

This XML document is very large, possibly 1 GB or more. For better query performance, I want to read it step by step. In the first step, I want to read only the FirstChild elements and their attributes, like this:

<FirstChild id="1" att="a"></FirstChild>  
<FirstChild id="2" att="b">2</FirstChild>  
<FirstChild id="3" att="c">3</FirstChild>

After that, I may want to get the SecondChild elements by the id of their parent, and so on:

<SecondChild id="11" att="aa"></SecondChild>
<SecondChild id="12" att="ab">12</SecondChild>
<SecondChild id="13" att="ac">13</SecondChild>

How can I do it?

Note: XDoc.Descendants() and XDoc.Elements() return the matching elements together with all of their child elements!

Uwe Keim
Saleh Bagheri
  • I think `XDoc.Descendants("FirstChild")` will do what you want, if you know the names beforehand. – Quantic May 01 '17 at 20:17
  • yes! but this function loads all child elements! – Saleh Bagheri May 01 '17 at 20:19
  • How are you loading the file? I assume you are using `XDocument` and not `XmlDocument`. If you called `XDocument.Load()` then [the whole file is in memory *already*](http://stackoverflow.com/questions/42732728/does-xdocument-load-loads-all-data-into-memory/42732752), so it shouldn't matter if you use `Descendants()` vs. `Descendants("FirstChild")`. – Quantic May 01 '17 at 20:32
  • I'm loading the file using XDocument! what way do you prefer? @Quantic – Saleh Bagheri May 01 '17 at 20:36
  • Here's an [SO post](http://stackoverflow.com/a/5838688/5884037) which might be of great help. – blaze_125 May 01 '17 at 20:36
  • Thanks @blaze_125, I should try that. – Saleh Bagheri May 01 '17 at 20:40
  • In today's world, I don't think a 1 GB document is too large to process in memory, and re-reading the document from top to bottom multiple times is not as efficient as reading it once and processing everything as you encounter it. On the filesystem, an XML file is just a text file; it is read and processed sequentially. The `XmlReader` class will help you do this, but it will still be most efficient if you make one pass. – NetMage May 01 '17 at 21:35
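
The single-pass `XmlReader` approach mentioned in the last comment might look roughly like the sketch below. It is only an illustration under some assumptions: the class and method names (`StreamingQuery`, `FirstChildren`) are made up for this example, the document is assumed to use no XML namespaces, and the root is assumed to contain only elements (no loose text). It streams the input and calls `Skip()` on each FirstChild, so the child elements are never materialized. Text content is dropped as well, so `<FirstChild id="2" att="b">2</FirstChild>` comes back without its "2".

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Linq;

public static class StreamingQuery
{
    // Streams the input and yields one attribute-only copy of each element
    // found directly under the root, skipping that element's whole subtree.
    public static IEnumerable<XElement> FirstChildren(TextReader input)
    {
        using (XmlReader reader = XmlReader.Create(input))
        {
            reader.MoveToContent();     // position on the root element
            reader.ReadStartElement();  // step inside the root

            // MoveToContent skips whitespace and comments between elements.
            while (reader.MoveToContent() == XmlNodeType.Element)
            {
                var copy = new XElement(reader.LocalName);

                // Copy the attributes only (assumes no namespaced attributes).
                while (reader.MoveToNextAttribute())
                    copy.SetAttributeValue(reader.LocalName, reader.Value);

                reader.MoveToElement(); // back from the attributes to the element
                reader.Skip();          // jump past the subtree without building it
                yield return copy;
            }
        }
    }
}
```

For a file on disk, something like `StreamingQuery.FirstChildren(File.OpenText(path))` could be passed in. Because the method is lazy, the file is read only as far as the caller enumerates the result.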

3 Answers

1

Provided that you have memory available to hold the file, I suggest treating each search step as an item in the outer collection of a PLINQ pipeline.

I would start with an XName collection for the node collections that you want to retrieve. By nesting queries within XElement constructors, you can return new instances of your target nodes, with only name and attribute information.

With a .Where(...) statement or two, you could also filter the attributes being kept, allow for some child nodes to be retained, etc.

using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

namespace LinqToXmlExample
{
    public class Program
    {
        public static void Main(string[] args)
        {
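            // Load the whole document into memory (assumes it fits).
            XElement root = XElement.Load("[your file path here]");

            // XML names are case-sensitive, so these must match the document exactly.
            XName[] names = new XName[] { "FirstChild", "SecondChild", "ThirdChild" };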

            IEnumerable<XElement> elements =
                names.AsParallel()
                     .Select(
                         name =>
                             new XElement(
                                 $"result_{name}",
                                 root.Descendants(name)
                                     .AsParallel()
                                     .Select(
                                        x => new XElement(name, x.Attributes()))))
                     .ToArray();
        }
    }
}
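
The `.Where(...)` filtering mentioned above could be sketched as follows for the "SecondChild elements by parent id" step from the question. This is only an illustration, not part of the pipeline above: the names `ShallowQuery` and `SecondChildrenOf` are hypothetical, and it assumes the document is already loaded into `root`. Each result keeps its name and attributes; the text value is kept only for elements that have no child elements.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class ShallowQuery
{
    // Returns shallow copies of the <SecondChild> elements whose parent
    // <FirstChild> has the given id. Child elements are not copied.
    public static List<XElement> SecondChildrenOf(XElement root, string parentId)
    {
        return root.Elements("FirstChild")
                   .Where(parent => (string)parent.Attribute("id") == parentId)
                   .Elements("SecondChild")
                   .Select(child => child.HasElements
                       // Has nested elements: keep name and attributes only.
                       ? new XElement(child.Name, child.Attributes())
                       // Leaf element: keep its text value as well.
                       : new XElement(child.Name, child.Attributes(), child.Value))
                   .ToList();
    }
}
```

With the XML from the question, `ShallowQuery.SecondChildrenOf(root, "1")` would return three elements, the first without its ThirdChild children.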
Austin Drenski
1

I suggest creating a new element and copying the attributes.

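// Get <FirstChild id="1" att="a">...</FirstChild> by looping, XPath, or any other method.
// Example lookup; `root` is assumed to be the already-loaded root XElement (needs System.Linq).
var sourceElement = root.Elements("FirstChild").First(e => (string)e.Attribute("id") == "1");

// Create an empty element with the same name and copy only the attributes.
var element = new XElement(sourceElement.Name);
foreach (var attribute in sourceElement.Attributes())
{
    element.Add(new XAttribute(attribute.Name, attribute.Value));
}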
LosManos
0

In VB you could do this to get a list of FirstChild elements:

    'Dim yourpath As String = "your path here"
    Dim xe As XElement
    'to load from a file
    'xe = XElement.Load(yourpath)

    'for testing
    xe = <Root>
             <FirstChild id="1" att="a">
                 <SecondChild id="11" att="aa">
                     <ThirdChild>123</ThirdChild>
                     <ThirdChild>456</ThirdChild>
                     <ThirdChild>789</ThirdChild>
                 </SecondChild>
                 <SecondChild id="12" att="ab">12</SecondChild>
                 <SecondChild id="13" att="ac">13</SecondChild>
             </FirstChild>
             <FirstChild id="2" att="b">2</FirstChild>
             <FirstChild id="3" att="c">3</FirstChild>
         </Root>

    Dim ie As IEnumerable(Of XElement)
    ie = xe...<FirstChild>.Select(Function(el)
                                      'create a copy
                                      Dim foo As New XElement(el)
                                      foo.RemoveNodes()
                                      Return foo
                                  End Function)
dbasnett