I am making a multi-plot using a Pandas dataframe and matplotlib. However, when I alter the dataframe and remove one of the items I get an error:
ValueError: cannot reindex from a duplicate axis
My initial code is the following and it plots great, but I have an extra group (plot) that I don't need:
branchGroups = allData['BranchGroupings'].unique()
fig2 = plt.figure(figsize=(15, 15))
for i, branchGroups in enumerate(branchGroups):
    ax = plt.subplot(3, 3, i + 1)
    idx = allData['BranchGroupings'] == branchGroups
    kmf.fit(T[idx], C[idx], label=branchGroups)
    kmf.plot(ax=ax, legend=False)
    plt.title(branchGroups)
    plt.xlabel('Timeline in Months')
    plt.xlim(0, 150)
fig2.tight_layout()
fig2.suptitle('Cumulative Hazard Function of Employee Groups', size=16)
fig2.subplots_adjust(top=0.88, hspace=.4)
plt.show()
In branchGroups there are 7 items when I print them out:
['BranchMgr', 'Banker', 'Service', 'MDOandRSM', 'SBRMandBBRM','FC', 'DE']
The code above makes all seven plots nicely (one plot for each of the groups), but I don't need the 'DE' grouping.
So I dropped the 'DE' rows with the following:
#remove the DE from the data set
noDE = allData[allData.BranchGroupings != 'DE']
This drops 'DE' from the categories and reduces the number of rows. Calling head() on the result looks fine; it is a new data frame.
Then I modified the plot to show the 6 remaining groups from the reduced data frame noDE. I used the same code with a few name changes (fig3 rather than fig2, and idxx rather than idx, to prevent overwriting); otherwise the only difference is the reference to the new data frame noDE:
Groups = noDE['BranchGroupings'].unique()  # new data frame noDE
fig3 = plt.figure(figsize=(15, 15))
for i, Groups in enumerate(Groups):
    ax = plt.subplot(3, 2, i + 1)
    idxx = noDE['BranchGroupings'] == Groups  # new idxx rather than idx
    kmf.fit(T[idxx], C[idxx], label=Groups)
    kmf.plot(ax=ax, legend=False)
    plt.title(Groups)
    plt.xlabel('Timeline in Months')
    plt.xlim(0, 150)
    if i == 0:
        plt.ylabel('Frac Employed After $n$ Months')
    if i == 3:
        plt.ylabel('Frac Employed After $n$ Months')
fig3.tight_layout()
fig3.suptitle('Survivability of Branch Employees', size=16)
fig3.subplots_adjust(top=0.88, hspace=.4)
plt.show()
Except, I get the error mentioned above
cannot reindex from a duplicate axis
and the traceback shows that it is associated with the line below:
kmf.fit(T[idxx], C[idxx], label=Groups)
Most likely due to the re-assignment line above it:
idxx = noDE['BranchGroupings'] == Groups
Do I need to reset/drop or do something to the new data frame noDE to reset this?
Update - this has been solved; I am not sure how 'pythonic' it is, but it works:
Okay, after more research on this, it seems that a slice of a dataframe inherits the index of the original dataframe, duplicate labels included. I found this out from another post here.
Initially, performing the following:
noDE.index.is_unique
returns False
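To illustrate why the check returns False, here is a minimal sketch with a made-up frame (the data and the duplicated labels are assumptions, not the real allData). Filtering rows with a boolean condition keeps the original index labels, so any duplicates already present in the parent frame carry over to the slice:

```python
import pandas as pd

# Hypothetical toy frame; the repeated labels mimic what a concat or
# merge can leave behind in a frame like allData.
allData = pd.DataFrame(
    {
        "BranchGroupings": ["Banker", "DE", "Service", "Banker"],
        "Duration": [12, 30, 7, 45],
    },
    index=[0, 1, 0, 1],
)

# Same filter as in the question: the rows are dropped, the labels are kept.
noDE = allData[allData["BranchGroupings"] != "DE"]

print(noDE.index.tolist())    # labels carried over from allData
print(noDE.index.is_unique)   # False here, because label 0 appears twice

# The labels that occur more than once in the slice.
dupes = noDE.index[noDE.index.duplicated(keep=False)].tolist()
print(dupes)
```

Index.duplicated(keep=False) marks every occurrence of a repeated label, which makes it easy to see which rows are affected.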
To make a clean slice, I used the following steps:
#create the slice using the .copy
noDE = allData[['ProdCat', 'Duration', 'Observed', 'BranchGroupings']].copy()
#remove the DE from the data set
noDE = noDE.loc[noDE['BranchGroupings'] != 'DE'] #use .loc for cleaner slice
#reset the index so that it is unique
noDE['index'] = np.arange(len(noDE))
noDE = noDE.set_index('index')
Now performing noDE.index.is_unique
returns True
and the error is gone.
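A possibly more idiomatic variant of the two index lines above is reset_index(drop=True), which renumbers the rows 0..n-1 and discards the old labels. The sketch below uses made-up data and assumes that T and C are the 'Duration' and 'Observed' columns (the post does not show how they are defined). Taking them from the same filtered frame as the mask means all three share one index, so no realignment is needed:

```python
import pandas as pd

# Hypothetical stand-in for allData; only the column names come from the post.
allData = pd.DataFrame(
    {
        "Duration": [12, 30, 7, 45, 22, 9],
        "Observed": [1, 0, 1, 1, 0, 1],
        "BranchGroupings": ["Banker", "DE", "Service", "Banker", "DE", "FC"],
    },
    index=[0, 1, 2, 0, 1, 2],  # duplicate labels, as in the failing case
)

# Filter out 'DE' and renumber the remaining rows in one step.
noDE = allData.loc[allData["BranchGroupings"] != "DE"].reset_index(drop=True)

# Assumption: T and C are the duration and event columns of the same frame.
T = noDE["Duration"]
C = noDE["Observed"]

group_sizes = {}
for group in noDE["BranchGroupings"].unique():
    mask = noDE["BranchGroupings"] == group  # shares its index with T and C
    group_sizes[group] = len(T[mask])
    # kmf.fit(T[mask], C[mask], label=group) would go here

print(noDE.index.is_unique)
print(group_sizes)
```

Using a separate loop variable (group) also leaves the array of group names intact after the loop, which the original loop overwrites.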