I currently have a list of dictionaries shown below:

temp_indices_=[{0: {12:11,11:12}}, {0: {14:13,13:14}}, {0: {16:15,15:16}}, {0: {20:19,19:20}},{0: {24: 23, 23: 24, 22: 24}, 1: {24: 22, 23: 22, 22: 23}},{0: {28: 27, 27: 28, 26: 28}, 1: {28: 26, 27: 26, 26: 27}}]

To convert the list into a dataframe, the following code is called:

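  import pandas as pd

  temp_indices = pd.DataFrame()

  for ind in range(len(temp_indices_)):
      # Append the key/value pairs of the inner dict stored under key 0
      temp_indices = pd.concat([temp_indices, pd.DataFrame(temp_indices_[ind][0].items())], axis=0)
  # Name the two columns produced from the (key, value) pairs
  temp_indices = temp_indices.rename(columns={0: 'ind', 1: 'label_ind'})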

Example output for temp_indices is shown below; it should concatenate all the dictionaries into one DataFrame:

   ind  label_ind
0   12  11
1   11  12
0   14  13
1   13  14
0   16  15
1   15  16
0   20  19
1   19  20
0   24  23
1   23  24
2   22  24
0   28  27
1   27  28
2   26  28
0   28  26
1   27  26
2   26  27

To improve speed, I have tried pd.Series(temp_indices_).explode().reset_index() as well as pd.DataFrame(map(lambda i: pd.DataFrame(i[0].items()), temp_indices_)), but I cannot drill down to the innermost dictionary to convert it to a DataFrame.
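
One likely cost in the loop above is that pd.concat runs once per list element and copies the accumulated frame each time. A minimal sketch, assuming only the inner dictionary under key 0 is wanted (as in temp_indices_[ind][0] above), collects the pairs first and builds the DataFrame once. The shortened temp_indices_ here is sample data for illustration only.

```python
import pandas as pd

# Sample data in the same shape as temp_indices_ above (shortened for illustration).
temp_indices_ = [
    {0: {12: 11, 11: 12}},
    {0: {24: 23, 23: 24, 22: 24}, 1: {24: 22, 23: 22, 22: 23}},
]

# Gather every (key, value) pair from the inner dict under key 0,
# mirroring temp_indices_[ind][0] in the loop above.
pairs = [pair for outer in temp_indices_ for pair in outer[0].items()]

# Build the DataFrame once instead of concatenating inside the loop.
temp_indices = pd.DataFrame(pairs, columns=["ind", "label_ind"])
print(temp_indices)
```

Unlike the loop, this gives a single running index rather than one that restarts for every dictionary.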

Sade

2 Answers

1

Use list comprehension for speedup:

  • The list comprehension uses three loops: the first iterates over the list of dictionaries, the second over the values of each dictionary, and the third over the (key, value) pairs of each inner dictionary together with a running index.
  • Then build a DataFrame from the resulting list.
  • The 'label' column contains (key, value) tuples, so split it into two columns using df['label'].tolist().
  • Finally, drop the 'label' column.

import pandas as pd

# One row per pair: (position within the inner dict, (key, value))
data = [(ind,list(value.items())[ind]) for i in temp_indices_ for value in i.values() for ind in range(len(value))]
df = pd.DataFrame(data, columns =["Index","label"])
# Split the (key, value) tuples into two separate columns
df[['ind', 'label_ind']] = pd.DataFrame(df['label'].tolist(), index=df.index)
df.drop(['label'], axis=1, inplace=True)
print(df)

        Index  ind  label_ind
    0       0   12         11
    1       1   11         12
    2       0   14         13
    3       1   13         14
    4       0   16         15
    5       1   15         16
    6       0   20         19
    7       1   19         20
    8       0   24         23
    9       1   23         24
    10      2   22         24
    11      0   24         22
    12      1   23         22
    13      2   22         23
    14      0   28         27
    15      1   27         28
    16      2   26         28
    17      0   28         26
    18      1   27         26
    19      2   26         27
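
The same rows can probably be built without the intermediate 'label' column. The sketch below assumes the Index column should hold the position of each pair within its inner dictionary, as in the output above. It uses enumerate in place of the repeated list(value.items())[ind] lookup, and the shortened temp_indices_ is sample data for illustration only.

```python
import pandas as pd

# Sample data in the same shape as temp_indices_ (shortened for illustration).
temp_indices_ = [
    {0: {12: 11, 11: 12}},
    {0: {24: 23, 23: 24, 22: 24}, 1: {24: 22, 23: 22, 22: 23}},
]

rows = [
    (pos, key, val)
    for outer in temp_indices_                       # each dict in the list
    for inner in outer.values()                      # each inner dict (keys 0, 1, ...)
    for pos, (key, val) in enumerate(inner.items())  # position plus the pair itself
]

# Three-element tuples map directly onto the three output columns.
df = pd.DataFrame(rows, columns=["Index", "ind", "label_ind"])
print(df)
```

Because each inner dictionary is walked once, this avoids rebuilding list(value.items()) for every row.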

Hamza usman ghani
  • this code is not readable at all and making such a massive list comprehension is not Pythonic – gold_cy May 04 '21 at 12:23
  • not really, list comprehensions are not meant to be used in such a way, it makes it unapproachable to beginners and is generally not readable – gold_cy May 04 '21 at 12:37
  • List comprehensions are more readable and faster. Also the line wrote above too :) – Hamza usman ghani May 04 '21 at 12:39
  • sure list comprehensions are readable but not when you make it three levels deep – gold_cy May 04 '21 at 12:40
  • Time completed where temp_indices has 250299 samples: 2.683337099995697 secs – Sade May 04 '21 at 13:55
  • The most important aspect is the improvement in speed. I have another function that I was working on yesterday to improve speed https://stackoverflow.com/questions/67348247/improving-the-speed-when-calculating-permutation-on-multiple-elements-in-list-of . Currently at 2083.1114619 secs, but if you can top my speed that will be great :) – Sade May 04 '21 at 14:08
0

This just sounds like a problem that can be solved through recursion with the final output being used to create a DataFrame.

import pandas as pd

def unpacker(data, parent_idx=None):
    """Flatten a list of nested dicts into (parent_key, key, value) tuples."""
    final = []

    if isinstance(data, list):
        # Top level: recurse into every dict-valued entry of each row
        for row in data:
            for k, v in row.items():
                if isinstance(v, dict):
                    final.extend(unpacker(v, parent_idx=k))
    else:
        # Innermost dict: emit one tuple per key/value pair
        for k1, v1 in data.items():
            final.append((parent_idx, k1, v1))

    return final

l = unpacker(temp_indices_)
df = pd.DataFrame(l, columns=["Index", "Ind", "Label_Ind"])
print(df)

    Index  Ind  Label_Ind
0       0   12         11
1       0   11         12
2       0   14         13
3       0   13         14
4       0   16         15
5       0   15         16
6       0   20         19
7       0   19         20
8       0   24         23
9       0   23         24
10      0   22         24
11      1   24         22
12      1   23         22
13      1   22         23
14      0   28         27
15      0   27         28
16      0   26         28
17      1   28         26
18      1   27         26
19      1   26         27
gold_cy
  • Would the for loops not have a major impact on the speed? I currently have (1311612, 60) samples in my dataset. – Sade May 04 '21 at 12:14
  • yes, but there is no pure `pandas` solution. solving unpacking nested dictionaries through recursion is a standard approach – gold_cy May 04 '21 at 12:25
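
One way to check how much the loops cost is to time the unpacking on synthetic data. The sketch below is only an illustration: make_data and flatten are hypothetical helpers rather than code from this thread, flatten assumes the nesting is always list -> dict -> dict, and the timing will vary by machine.

```python
import timeit

import pandas as pd


def make_data(n):
    """Hypothetical helper: n list items shaped like temp_indices_."""
    return [{0: {3 * i: 3 * i + 1, 3 * i + 1: 3 * i}} for i in range(n)]


def flatten(data):
    """Collect (outer_key, inner_key, value) rows with plain loops."""
    rows = []
    for outer in data:
        for outer_key, inner in outer.items():
            for inner_key, value in inner.items():
                rows.append((outer_key, inner_key, value))
    return rows


data = make_data(50_000)

# Average over a few runs; each run flattens the data and builds the DataFrame.
runs = 3
seconds = timeit.timeit(
    lambda: pd.DataFrame(flatten(data), columns=["Index", "Ind", "Label_Ind"]),
    number=runs,
) / runs
print(f"{len(data)} items: {seconds:.3f} s per run")
```

Changing the argument to make_data gives a rough picture of how the run time grows with the number of list items.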