My question is simple: I have a DataFrame, and I group it by a column and get the size of each group like this:

df.groupby('column').size()

Now the problem is that I only want the groups whose size is greater than X. Can I do this with a lambda function or something similar? I have already tried this:

df.groupby('column').size() > X

but it only prints True and False values instead of the sizes themselves.

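The result of groupby('column').size() is a regular pandas Series indexed by the group keys, so the comparison you tried gives a boolean mask. Use that mask to filter the Series as usual: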

 >>> import pandas as pd
 >>> df = pd.DataFrame({'a': ['a', 'b', 'a', 'a', 'b', 'c', 'd']})
 >>> after = df.groupby('a').size()
 >>> after
 a
 a    3
 b    2
 c    1
 d    1
 dtype: int64
 >>> after[after > 2]
 a
 a    3
 dtype: int64
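Since the question asks about a lambda, here is a minimal sketch of two lambda-based variants, using the same example frame. The threshold name X and the variable names sizes, large and rows are only illustrative. The first variant passes a callable to .loc so the filter can be chained without an intermediate variable; the second uses GroupBy.filter, which returns the original rows belonging to the large groups rather than the sizes.

```python
import pandas as pd

df = pd.DataFrame({'a': ['a', 'b', 'a', 'a', 'b', 'c', 'd']})
X = 2  # illustrative threshold, stands in for the X in the question

# Series of group sizes, indexed by the values of column 'a'
sizes = df.groupby('a').size()

# Variant 1: .loc accepts a callable that receives the Series and
# returns a boolean mask, so the filter can be written inline.
large = sizes.loc[lambda s: s > X]

# Variant 2: keep the original rows of every group with more than X members.
rows = df.groupby('a').filter(lambda g: len(g) > X)
```

With this data, large holds only the group 'a', and rows holds the three rows of df where column 'a' equals 'a'.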
        
