
I'm exploring a database built like this:

[Image: excerpt from the database I'm working on]

So it's basically a collection of YouTube comments that I have started to analyse. I've managed to add a column counting the number of words per comment, as well as another one for n-grams (which I intend to explore later). I've also managed to get a list of the 10 most frequent words for the whole period, but I've been unable to get the word frequency by month: for each month, I would like to get a list of the 10 most frequent words.

Thanks for your help!

1 Answer


You can try this:

import pandas as pd
from collections import Counter

Option-1:

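df = df.set_index(df['at'])  # 'at' must be a datetime column
# Group rows by calendar month (in pandas 2.2+, use freq="ME"; "M" is deprecated)
for u, v in df.groupby(pd.Grouper(freq="M")):
    # Flatten the per-comment word lists into a single list
    words = sum(v['text'].str.split(' ').values.tolist(), [])
    c = Counter(words)
    print(c.most_common(10))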

Option-2:

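df = df.set_index(df['at'])  # 'at' must be a datetime column
# Group rows by calendar month (in pandas 2.2+, use freq="ME"; "M" is deprecated)
for u, v in df.groupby(pd.Grouper(freq="M")):
    # Flatten the per-comment word lists into a single list
    words = sum(v['text'].str.split(' ').values.tolist(), [])
    top_words = pd.Series(words).value_counts()[:10]
    print(top_words.index.tolist())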
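
If you would rather keep the results than print them, something along these lines might work. It is only a sketch: the function name `top_words_by_month`, its parameters, and the sample data are made up for illustration, and it assumes the `at` column already has a datetime dtype. It also groups with `.dt.to_period("M")` instead of `pd.Grouper`, and lowercases and splits on any whitespace, which differs slightly from the two options above.

```python
from collections import Counter

import pandas as pd


def top_words_by_month(df, date_col="at", text_col="text", n=10):
    """Return {month: [(word, count), ...]} with the n most frequent words per month."""
    result = {}
    # Group by calendar month using the date column directly (the index is left alone)
    for month, group in df.groupby(df[date_col].dt.to_period("M")):
        counts = Counter()
        for comment in group[text_col].dropna():
            # Lowercase and split on any whitespace; punctuation is left untouched
            counts.update(comment.lower().split())
        result[str(month)] = counts.most_common(n)
    return result


# Tiny made-up sample, only to show the shape of the output
sample = pd.DataFrame({
    "at": pd.to_datetime(["2018-01-05", "2018-01-20", "2018-02-03"]),
    "text": ["great video great sound", "great content", "nice nice video"],
})
top = top_words_by_month(sample, n=2)
print(top)
```

Updating a `Counter` comment by comment also avoids `sum(list_of_lists, [])`, which can get slow when a month has many comments.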
Mohamed Thasin ah
  • 1
    This is great! Both are working very well! Thanks a lot! Just so I can understand it better, what do u and v stand for in your loop? I don't really understand how it works... – Pauline Ziserman Nov 03 '18 at 18:54
  • 1
    @PaulineZiserman - `pd.Grouper(freq="M")` groups your DataFrame by month, i.e., each iteration of the loop handles one month's data. `v` contains the filtered DataFrame for that month, and `u` contains the name of the group (the month's label). For more details, see https://stackoverflow.com/questions/27405483/how-to-loop-over-grouped-pandas-dataframe – Mohamed Thasin ah Nov 03 '18 at 18:59
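
To see what the loop variables hold, a small made-up example may help. This is a sketch on invented data, and it groups with `to_period("M")` on the index rather than `pd.Grouper`, so `u` is a monthly `Period` here instead of a month-end timestamp; the (label, sub-DataFrame) pairing is the same idea.

```python
import pandas as pd

# Made-up data for illustration only
df = pd.DataFrame(
    {"text": ["a b", "c", "d e f"]},
    index=pd.to_datetime(["2018-01-05", "2018-01-20", "2018-02-03"]),
)

seen = []
for u, v in df.groupby(df.index.to_period("M")):
    # u is the group label (the month); v is the sub-DataFrame with that month's rows
    print(u, len(v))
    seen.append((str(u), len(v)))
```

Each pass through the loop gets one month: the label first, then only the rows that fall in that month.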