0

Im trying to make a graph edge list from a dictionary in python, with the following code:

graph= []
for key, value in dic_test.items():
    for x in range (0,len(value)):
        if (x+1) < len(value):
            for y in range (1,len(value)):
                if y != x and y>x:
                    graph.append([value[x],value[y]])

This gets what I want, for example if I get this test dictionary:

dic_test= {1: ['A', 'E','F','G'], 2: ['B', 'D','X'], 3: ['C',"Y"],4:[],5:['f','h']}

I get the following output:

[['A', 'E'],
 ['A', 'F'],
 ['A', 'G'],
 ['E', 'F'],
 ['E', 'G'],
 ['F', 'G'],
 ['B', 'D'],
 ['B', 'X'],
 ['D', 'X'],
 ['C', 'Y'],
 ['f', 'h']]

Problem is when I run a big dictionary it runs until the kernel crashes, any ideas I could make this code more efficient?

Duarfo
  • 13
  • 5
  • How big a dictionary are you talking about? Are there a large number of keys, are the values long, or both? The number of unique pairs of elements in a list is `O(n^2)` where `n` is the size of the list. – Garrett Gutierrez Mar 08 '18 at 15:41

4 Answers4

0

You can use itertools.combinations()
(see how to get all combinations of a list's elements):

import itertools
dic_test= {1: ['A', 'E','F','G'], 2: ['B', 'D','X'], 3: ['C',"Y"],4:[],5:['f','h']}

_combinations = []
for _value in dic_test.values():
    _combinations.extend(list(itertools.combinations(_value, 2)))

print(_combinations)

[('A', 'E'),
 ('A', 'F'),
 ('A', 'G'),
 ('E', 'F'),
 ('E', 'G'),
 ('F', 'G'),
 ('B', 'D'),
 ('B', 'X'),
 ('D', 'X'),
 ('C', 'Y'),
 ('f', 'h')]

Since you have imported itertools, use itertools.chain() it is possible to do the following one liner:

list(itertools.chain(*[list(itertools.combinations(_value, 2)) for _value in dic_test.values()]))

Note

1. Performances issues:
- with list.extend() : 7.23 µs per loop
- with itertools.chain() : 8.15 µs per loop

2. Huge, very huge, very very very huge dictionnary:
As the operation you do on each key is independant one to another you can possibly parallelize your task (multiprocessing documentation if you require it)

David Leon
  • 1,017
  • 8
  • 25
0

Itertools might help you here, as each edge is just a 2 item combination of the vertexes in each sublist:

import itertools

output = []
for links in dic_test.values():
    output += map(list, itertools.combinations(links, 2))
Simon Brahan
  • 2,016
  • 1
  • 14
  • 22
0
for val in dic_test.values():
     a=itertools.combinations(val,2)
     for c in a:
         print(c)

gives the output

('A', 'E')
('A', 'F')
('A', 'G')
('E', 'F')
('E', 'G')
('F', 'G')
('B', 'D')
('B', 'X')
('D', 'X')
('C', 'Y')
('f', 'h')
Alper Fırat Kaya
  • 2,079
  • 1
  • 18
  • 18
0

In addition to the suggestions offering itertools, it's worth noting that you have sets of disjoint clicks. That means you don't need to do all of these computations at once and certainly don't need to store all the results in memory at the same time. It also means that you could parallelize.

Seth Rothschild
  • 384
  • 1
  • 14