0

I want to create a column that displays a count of how many times 2 parameters in a column are met in each row. For example, in the below image, the "n" column is what I need to display. It is a number that is duplicated based on if the row meets each parameter. I want it to give a count as the "n" which displays the number of times that the unit/month are the same.

Not sure how to tackle this without using group_by... any thoughts!?

example_data

Here is a script to build out the dataframe:

lst = [['unit_1', 1], ['unit_1', 1], 
       ['unit_1', 1], ['unit_1', 1],
       ['unit_1', 1], ['unit_2', 1], 
       ['unit_2', 1], ['unit_2', 1], 
       ['unit_2', 1], ['unit_4', 2], 
       ['unit_8', 4], ['unit_5', 2]] 

df = pd.DataFrame(lst, columns =['unit', 'month'])

1 Answers1

1
  • Use groupby & join
  • The data has been changed to reflect the condition where there are multiple months for a particular unit. In this case, unit_1 has month 1 and 2.
import pandas as pd

# data
df = pd.DataFrame({'unit': ['unit_1', 'unit_1', 'unit_1', 'unit_1', 'unit_1', 'unit_2', 'unit_2', 'unit_2', 'unit_2', 'unit_4', 'unit_5', 'unit_6'],
                   'month': [1, 1, 1, 2, 2, 1, 1, 1, 1, 2, 2, 4]})   
   unit  month
 unit_1      1
 unit_1      1
 unit_1      1
 unit_1      2
 unit_1      2
 unit_2      1
 unit_2      1
 unit_2      1
 unit_2      1
 unit_4      2
 unit_5      2
 unit_6      4

t = pd.DataFrame(df.groupby(['unit', 'month'])['month'].count()).rename(columns={'month': 'n'})

                n
  unit  month   
unit_1      1   3
            2   2
unit_2      1   4
unit_4      2   1
unit_5      2   1
unit_6      4   1


# now join them
df.set_index(['unit', 'month']).join(t, on=['unit', 'month']).reset_index()


   unit  month  n
 unit_1      1  3
 unit_1      1  3
 unit_1      1  3
 unit_1      2  2
 unit_1      2  2
 unit_2      1  4
 unit_2      1  4
 unit_2      1  4
 unit_2      1  4
 unit_4      2  1
 unit_5      2  1
 unit_6      4  1
Trenton McKinney
  • 56,955
  • 33
  • 144
  • 158