I'm fairly new to Python and pandas, so forgive me if this is a somewhat basic question. I am reading in some data from a CSV file, and I want to tally the values 'M', 'F' and NaN in the column 'gender'. This is the code I have so far:

    import pandas as pd
    import numpy as np

    df = pd.read_csv("....csv")
    count = pd.value_counts(df['gender'],dropna=False)

This outputs:

    M      22
    F       3
    NaN     1

However, I don't just want to see these values as a tally; I want them assigned to variables, i.e. to have

    male = pd.value_counts(df['gender'],'M',dropna=False)

or something similar, giving male = 22 (and likewise for female and NaN). However, I can't find an obvious way to do this using pandas. Any advice? Many thanks in advance!

Ben
  • And how do I call count? It is not part of pandas as far as I can see, and I get "name 'count' is not defined" if I don't include a library? – Ben Dec 15 '18 at 21:04
  • well... you get your counts like `count = df['gender'].value_counts(dropna=False)`... then you get a series whose index is the key and the value is the count... you can then access individual values by `male = count['M']` for instance... – Jon Clements Dec 15 '18 at 21:05
  • So what you already had but pointing out you can use `[]` syntax to access the values... – Jon Clements Dec 15 '18 at 21:08
  • Ah got it! Thanks a lot for your help. – Ben Dec 15 '18 at 21:09
  • Any idea how to get the NaN count in the same way? I get `File "/usr...python2.7/dist-packages/pandas/core/indexes/base.py", line 3132, in get_value raise e1 KeyError: 'NaN'` when I try the same with `count['NaN']` or `count[' ']`. – Ben Dec 15 '18 at 21:16
  • access it using `np.nan`, eg: `count[np.nan]`... – Jon Clements Dec 15 '18 at 21:18
  • `TypeError: cannot do label indexing on with these indexers [nan] of`. Have you seen this one before? – Ben Dec 15 '18 at 21:26
  • Umm... is `type(count)` a `pandas.core.series.Series` ? – Jon Clements Dec 15 '18 at 21:29
  • Yes – Ben Dec 15 '18 at 21:42
  • Hmm having googled a bit more (https://stackoverflow.com/questions/26266362/how-to-count-the-nan-values-in-a-column-in-pandas-dataframe) your suggestion seems to match suggestions elsewhere, so I don't quite know – Ben Dec 15 '18 at 22:12
  • what pandas version are you using? – Jon Clements Dec 15 '18 at 22:14
  • pandas 0.23.4 so I think I'm up to date? – Ben Dec 15 '18 at 22:22
  • I also have a related question if you have time: say that instead of just M, F and NaN we had potentially hundreds of options (e.g. favourite food), and I wanted to pull out the 10 most common foods, with the name of each food and the number of times it was chosen. Is that possible using pandas, or are there more suitable tools out there? – Ben Dec 15 '18 at 23:18
  • `value_counts` returns the series in descending order of frequency... so you can just do `count.head(10)` to get the top 10... – Jon Clements Dec 15 '18 at 23:20
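
The approach worked out in the comments above can be sketched roughly as follows. The data here is a small made-up `gender` column standing in for the original CSV, whose contents are not shown. Because looking up the NaN label directly raised an error in the thread, this sketch counts missing values with `isna().sum()` instead, which does not rely on label indexing.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for the CSV column
df = pd.DataFrame({"gender": ["M"] * 22 + ["F"] * 3 + [np.nan]})

# Tally every distinct value, keeping NaN as its own entry
count = df["gender"].value_counts(dropna=False)

# The result is a Series indexed by value, so counts can be read by label
male = int(count["M"])
female = int(count["F"])

# Count missing values directly instead of looking up a NaN label
missing = int(df["gender"].isna().sum())

# value_counts sorts by descending frequency, so head(n) gives the n most common
top_two = count.head(2)

print(male, female, missing)
```

The same `head(n)` call should cover the "10 most common foods" case: the index of the returned Series holds the names and the values hold the counts.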

1 Answer

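In this example, we filter the DataFrame to the rows where `gender == "male"` and count them. Note that `count()` on a DataFrame returns a Series of non-null counts, one per column: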

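    import pandas as pd
    import random

    # Build a sample DataFrame of 100 randomly chosen genders
    df = pd.DataFrame({'gender': [random.choice(['male', 'female']) for _ in range(100)]})

    # Keep only the "male" rows, then count non-null entries per column
    count_men = df[df["gender"] == "male"].count()
    count_men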

And if you just want the integer, you can take the zeroth value by position:

    count_men = df[df["gender"] == "male"].count().iloc[0]
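
As a possible alternative to the filter-then-count approach above, summing the boolean mask yields a plain integer without any positional indexing. This is only a sketch on made-up data; the column values and variable names are assumptions for illustration, not taken from the question's CSV.

```python
import pandas as pd

# Hypothetical sample data; the values are made up for illustration
df = pd.DataFrame({"gender": ["male", "female", "male", "male", None]})

# Comparing yields a boolean Series; True counts as 1 when summed
count_men = int((df["gender"] == "male").sum())

# Missing entries compare as False, so count them separately with isna()
count_missing = int(df["gender"].isna().sum())

print(count_men, count_missing)
```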
Charles Landau