2

Let me demonstrate my issue with an example. I have a data structure that looks like the one provided below:

[
[['A', 'B'], '1', '1...'],
[['A', 'B'], '2', '2...'],
[['A', 'C'], '3', '3...'],
[['A', 'C'], '4', '4...'],
[['A', 'A'], '5', '5...'],
[['A', 'D'], '6', '6...'],
[['C', 'A'], '7', '7...'],
[['D', 'C', 'B'], '8', "8..."],
[['D', 'A', 'B'], '9', "9..."],
[['D', 'A', 'A', 'Y'], '10', "10..."],
[['D', 'A', 'A', 'X'], '11', "11..."]
]

Each element starts with a list (the number of elements in this list is unknown), followed by two element (the last two elements are not important, they are just text). I want to create a new structure from this list. I want to combine elements based on their first element, like this:

'A'
    'A'
        '5', '5...'
    'B'
        '1', '1...'
        '2', '2...'
    'C'
        '3', '3...'
        '4', '4...'
    'D'
        '6', '6...'
'C'
    'A'
        '7', '7...'
'D'
    'A'
        'A'
            'X'
                '11', '11...'
            'Y'
                '10', '10...'
'D'
    'A'
        'B'
            '9', '9...'
    'C'
        'B'
            '8', '8...'

So the first element(in the first list) acts as the first level, the second element as the second level and so on.

I hope you can see what I am trying to do! I was hoping to use sort() and then use itemgetter and groupby but I then saw the number of elements are different in each list. How should I solve this problem?

I need to create this new structure in order to create a proper XML of the original input.

Martijn Pieters
  • 1,048,767
  • 296
  • 4,058
  • 3,343
theAlse
  • 5,577
  • 11
  • 68
  • 110

1 Answers1

3

You could use nested dictionaries to turn the list of lists into a tree-like structure:

result = {}
for row in data:
    path, item = row[0], row[1:]
    d = result
    for p in path[:-1]:
        d = d.setdefault(p, {})
    d.setdefault(path[-1], []).append(item)
print result

result will then look like this (indentation added for readability):

{'A': {'A': [['5', '5...']], 'C': [['3', '3...'], ['4', '4...']], 
       'B': [['1', '1...'], ['2', '2...']], 'D': [['6', '6...']]}, 
 'C': {'A': [['7', '7...']]}, 
 'D': {'A': {'A': {'Y': [['10', '10...']], 'X': [['11', '11...']]}, 
             'B': [['9', '9...']]}, 
       'C': {'B': [['8', '8...']]}}}

If you want to sort it, you can either sort the data a priori and use collection.OrderedDict instead of plain-old {}, or use a method that recursively loops the sub-dictionaries in sorted order, like this:

def print_sorted(d, depth=0):
    if isinstance(d, dict):
        for k in sorted(d):
            print "  " * depth, k
            print_sorted(d[k], depth+1)
    else:
        print "  " * depth, d
print_sorted(result)

Output is just like in your question. You just need to add the XML-writing stuff.

tobias_k
  • 81,265
  • 12
  • 120
  • 179
  • I don't know how to traverse this in such a way so I can create proper XML tags. any ideas? If I use recursion I lose track off once it has come the most inner element. – theAlse Jun 18 '14 at 13:19
  • Using your method I implemented the whole thing and all of sudden I realized I am missing lots of element. Dict updates a duplicate element. So all the element with the same first element will be removed. As you can see in your own output, the element ['A', 'B'], '1', '1...' is nowhere to be found in your printout. – theAlse Jun 18 '14 at 14:15
  • It has been replaced by ['A', 'B'], '2', '2...']. I need to add these to the same list each as a tuple or something – theAlse Jun 18 '14 at 14:21
  • @theAlse Right, did not see that. Fixed. – tobias_k Jun 18 '14 at 14:33
  • I used your modified version and it is working fine :) thanks. – theAlse Jun 19 '14 at 05:37