Fastest way to convert a list of dictionaries (each having multiple sub-dictionaries) into a single dataframe

Question

I currently have a list of dictionaries shown below:

temp_indices_=[{0: {12:11,11:12}}, {0: {14:13,13:14}}, {0: {16:15,15:16}}, {0: {20:19,19:20}},{0: {24: 23, 23: 24, 22: 24}, 1: {24: 22, 23: 22, 22: 23}},{0: {28: 27, 27: 28, 26: 28}, 1: {28: 26, 27: 26, 26: 27}}]

To convert the list into a dataframe, the following code is called:

  import pandas as pd

  temp_indices = pd.DataFrame()

  for ind in range(len(temp_indices_)):
      # Only sub-dictionary 0 of each entry is read here
      temp_indices = pd.concat([temp_indices, pd.DataFrame(temp_indices_[ind][0].items())], axis=0)
  temp_indices = temp_indices.rename(columns={0: 'ind', 1: 'label_ind'})

An example of the desired output for temp_indices is shown below; all dictionaries should be concatenated into one dataframe:

   ind  label_ind
0   12         11
1   11         12
0   14         13
1   13         14
0   16         15
1   15         16
0   20         19
1   19         20
0   24         23
1   23         24
2   22         24
0   28         27
1   27         28
2   26         28
0   28         26
1   27         26
2   26         27

To improve speed, I have tried pd.Series(temp_indices_).explode().reset_index() as well as pd.DataFrame(map(lambda i: pd.DataFrame(i[0].items()), temp_indices_)), but I cannot drill down to the innermost dictionaries to convert them to a dataframe.

Let me update it quickly so that you can recreate the list of dictionaries – Sade, May 04 '21 at 11:42

Hamza usman ghani · Accepted Answer · 2021-05-04T12:29:08.940

1

Use a list comprehension for a speedup:

The list comprehension uses three loops: the first iterates over the list of dictionaries, the second accesses the values (the sub-dictionaries) of each dictionary, and the third accesses each key-value pair together with an increasing index.
Then build a dataframe from the resulting list.
Since the column named 'label' contains tuples of values, split it using df['label'].tolist().
Finally, delete the column named 'label'.

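import pandas as pd

# Pair each (key, value) item with its position inside its sub-dictionary
data = [(ind,list(value.items())[ind]) for i in temp_indices_ for value in i.values() for ind in range(len(value))]
df = pd.DataFrame(data, columns =["Index","label"])
# Split the (key, value) tuples in 'label' into two separate columns
df[['ind', 'label_ind']] = pd.DataFrame(df['label'].tolist(), index=df.index)
df.drop(['label'], axis=1, inplace=True)
print(df)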

        Index  ind  label_ind
    0       0   12         11
    1       1   11         12
    2       0   14         13
    3       1   13         14
    4       0   16         15
    5       1   15         16
    6       0   20         19
    7       1   19         20
    8       0   24         23
    9       1   23         24
    10      2   22         24
    11      0   24         22
    12      1   23         22
    13      2   22         23
    14      0   28         27
    15      1   27         28
    16      2   26         28
    17      0   28         26
    18      1   27         26
    19      2   26         27
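
The three loops described above can also be written with enumerate, which avoids rebuilding list(value.items()) for every index and skips the intermediate 'label' column. This is only a minimal sketch of the same idea, not the answerer's code; the names rows, outer, inner and pos are made up for illustration, and the list is copied from the question so the snippet runs on its own.

```python
import pandas as pd

# Sample input copied from the question
temp_indices_ = [
    {0: {12: 11, 11: 12}}, {0: {14: 13, 13: 14}},
    {0: {16: 15, 15: 16}}, {0: {20: 19, 19: 20}},
    {0: {24: 23, 23: 24, 22: 24}, 1: {24: 22, 23: 22, 22: 23}},
    {0: {28: 27, 27: 28, 26: 28}, 1: {28: 26, 27: 26, 26: 27}},
]

# One tuple per item: (position inside the sub-dictionary, key, value).
# enumerate supplies the running position, so no list indexing is needed.
rows = [
    (pos, key, val)
    for outer in temp_indices_            # loop 1: each dictionary in the list
    for inner in outer.values()           # loop 2: each sub-dictionary
    for pos, (key, val) in enumerate(inner.items())  # loop 3: each item
]

df = pd.DataFrame(rows, columns=["Index", "ind", "label_ind"])
print(df)
```

This should produce the same three columns as the output above, built in a single DataFrame call.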

edited May 04 '21 at 12:29

answered May 04 '21 at 12:21


this code is not readable at all and making such a massive list comprehension is not Pythonic – gold_cy May 04 '21 at 12:23
not really, list comprehensions are not meant to be used in such a way, it makes it unapproachable to beginners and is generally not readable – gold_cy May 04 '21 at 12:37
List comprehensions are more readable and faster. Also the line wrote above too :) – Hamza usman ghani May 04 '21 at 12:39
sure list comprehensions are readable but not when you make it three levels deep – gold_cy May 04 '21 at 12:40
Time completed where temp_indices has 250299 samples: 2.683337099995697 secs – Sade May 04 '21 at 13:55
The most important aspect is the improvement in speed. I have another function that I was working on yesterday to improve speed: https://stackoverflow.com/questions/67348247/improving-the-speed-when-calculating-permutation-on-multiple-elements-in-list-of . Currently at 2083.1114619 secs, but if you can top my speed that will be great :) – Sade May 04 '21 at 14:08

score 0 · Answer 2 · answered May 04 '21 at 12:11

This just sounds like a problem that can be solved through recursion with the final output being used to create a DataFrame.

import pandas as pd

def unpacker(data, parent_idx=None):
    final = []
    
    if isinstance(data, list):
        for row in data:
            for k, v in row.items():
                if isinstance(v, dict):
                    unpacked = unpacker(v, parent_idx=k)
                    for row1 in unpacked:
                        final.append(row1)
    else:
        for k1, v1 in data.items():
            final.append((parent_idx, k1, v1))
    
    return final

l = unpacker(temp_indices_)
df = pd.DataFrame(l, columns=["Index", "Ind", "Label_Ind"])
print(df)

    Index  Ind  Label_Ind
0       0   12         11
1       0   11         12
2       0   14         13
3       0   13         14
4       0   16         15
5       0   15         16
6       0   20         19
7       0   19         20
8       0   24         23
9       0   23         24
10      0   22         24
11      1   24         22
12      1   23         22
13      1   22         23
14      0   28         27
15      0   27         28
16      0   26         28
17      1   28         26
18      1   27         26
19      1   26         27

Would the for loops not have a major impact on the speed? I currently have (1311612, 60) samples in my dataset. – Sade, May 04 '21 at 12:14
Yes, but there is no pure `pandas` solution. Unpacking nested dictionaries through recursion is a standard approach. – gold_cy, May 04 '21 at 12:25
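
Whether the Python-level loops matter at this scale can be checked by timing them. The sketch below is only an illustration under stated assumptions: the input is synthetic (a two-entry sample repeated, with the repeat count chosen arbitrarily), flat_rows is a hypothetical non-recursive helper that assumes exactly two levels of nesting, and the timings will differ from machine to machine.

```python
import time
import pandas as pd

# Synthetic input: two entries in the question's shape, repeated (size is arbitrary)
base = [
    {0: {12: 11, 11: 12}},
    {0: {24: 23, 23: 24, 22: 24}, 1: {24: 22, 23: 22, 22: 23}},
]
data = base * 50_000

def unpacker(data, parent_idx=None):
    """Recursive approach from the answer above."""
    final = []
    if isinstance(data, list):
        for row in data:
            for k, v in row.items():
                if isinstance(v, dict):
                    final.extend(unpacker(v, parent_idx=k))
    else:
        for k1, v1 in data.items():
            final.append((parent_idx, k1, v1))
    return final

def flat_rows(data):
    """Hypothetical non-recursive variant; assumes exactly two nesting levels."""
    return [(idx, k, v)
            for row in data
            for idx, sub in row.items()
            for k, v in sub.items()]

results = {}
for name, fn in [("recursive", unpacker), ("flat", flat_rows)]:
    start = time.perf_counter()
    results[name] = fn(data)
    elapsed = time.perf_counter() - start
    print(f"{name}: {len(results[name])} rows in {elapsed:.3f} s")

# Both approaches should yield identical rows; build the DataFrame once at the end
df = pd.DataFrame(results["flat"], columns=["Index", "Ind", "Label_Ind"])
```

In both variants the DataFrame is constructed once from a plain list of tuples, rather than grown with pd.concat inside the loop.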
