Pandas Groupby and Sum Only One Column

So I have a dataframe, df, that looks like the following:

     A   B             C
1  foo  12    California
2  foo  22    California
3  bar   8  Rhode Island
4  bar  32  Rhode Island
5  baz  15          Ohio
6  baz  26          Ohio
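To reproduce the examples, here is a minimal sketch that rebuilds the sample data, assuming the values shown above (B as integers, index starting at 1 as in the table). It names the frame df, which is the name the code below uses.

```python
import pandas as pd

# Rebuild the sample frame from the question; the index starts at 1 to match it.
df = pd.DataFrame(
    {
        "A": ["foo", "foo", "bar", "bar", "baz", "baz"],
        "B": [12, 22, 8, 32, 15, 26],
        "C": ["California", "California", "Rhode Island",
              "Rhode Island", "Ohio", "Ohio"],
    },
    index=range(1, 7),
)
print(df)
```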

I want to group by column A and then sum column B while keeping the value in column C. Something like this:

     A   B             C
1  foo  34    California
2  bar  40  Rhode Island
3  baz  41          Ohio

The issue is, when I say

df.groupby('A').sum()

column C gets removed, returning

      B
A
bar  40
baz  41
foo  34
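Whether C disappears depends on the pandas version: older releases drop non-numeric columns from a groupby sum, while in pandas 2.0 and later `numeric_only` defaults to False, so C is no longer dropped (with plain Python strings the sum concatenates them). A minimal sketch, assuming the sample df above and a pandas version whose GroupBy.sum accepts `numeric_only` (recent releases do), that requests the numeric-only behavior explicitly:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "A": ["foo", "foo", "bar", "bar", "baz", "baz"],
        "B": [12, 22, 8, 32, 15, 26],
        "C": ["California", "California", "Rhode Island",
              "Rhode Island", "Ohio", "Ohio"],
    },
    index=range(1, 7),
)

# numeric_only=True limits the sum to numeric columns, so C is left out
# explicitly instead of depending on the version's default.
out = df.groupby("A").sum(numeric_only=True)
print(out)
```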

How can I get around this and keep column C when I group and sum?

3 Answers

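One way to do this is to include C in your groupby (groupby accepts a list of column names). This works because each value of A maps to exactly one value of C, so adding C to the keys doesn't split any group.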

Give this a try:

df.groupby(['A','C'])['B'].sum()
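The sketch below uses hypothetical data (the Nevada row is made up) in which foo appears with two different values of C, to show what grouping on ['A', 'C'] returns when C is not constant within an A group:

```python
import pandas as pd

# Hypothetical data: "foo" is paired with two different values of C.
df = pd.DataFrame(
    {
        "A": ["foo", "foo", "bar", "bar"],
        "B": [12, 22, 8, 32],
        "C": ["California", "Nevada", "Rhode Island", "Rhode Island"],
    }
)

# The result is a Series indexed by (A, C) pairs.
out = df.groupby(["A", "C"])["B"].sum()
print(out)
# foo is split across two rows, one for each C value it appears with.
```

If that happens and you still want one row per A, the nth-value and 'first' approaches in the other answers pick a single C per group.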

One other thing to note: if you need to keep working with the result as a regular DataFrame after the aggregation, you can use the as_index=False option, which returns A and C as ordinary columns instead of as the index. This one gave me problems when I was first working with pandas. Example:

df.groupby(['A','C'], as_index=False)['B'].sum()
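A minimal sketch comparing the two calls on the sample data: without as_index=False the result is a Series with an (A, C) MultiIndex, and with it the result is a DataFrame whose A and C are plain columns. Calling reset_index() on the Series yields the same column layout.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "A": ["foo", "foo", "bar", "bar", "baz", "baz"],
        "B": [12, 22, 8, 32, 15, 26],
        "C": ["California", "California", "Rhode Island",
              "Rhode Island", "Ohio", "Ohio"],
    }
)

series_out = df.groupby(["A", "C"])["B"].sum()
frame_out = df.groupby(["A", "C"], as_index=False)["B"].sum()

print(type(series_out).__name__)  # Series
print(type(frame_out).__name__)   # DataFrame
print(frame_out)                  # columns A, C, B; groups sorted by A
```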

If you don't care what's in column C and just want the nth value from each group, you could do this:

n = 0  # position within each group of the C value to keep (0 = first row)
df.groupby('A').agg({'B': 'sum', 'C': lambda x: x.iloc[n]})
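On the sample data every row of an A group has the same C, so any valid n gives the same answer. A sketch with hypothetical data where C differs within a group makes the choice of n visible; n = 1 is an arbitrary example, and x.iloc[n] raises an IndexError for any group with n or fewer rows.

```python
import pandas as pd

# Hypothetical data: C differs within each group.
df = pd.DataFrame(
    {
        "A": ["foo", "foo", "bar", "bar"],
        "B": [12, 22, 8, 32],
        "C": ["California", "Nevada", "Rhode Island", "Maine"],
    }
)

n = 1  # position within each group of the C value to keep (0 = first, -1 = last)
out = df.groupby("A").agg({"B": "sum", "C": lambda x: x.iloc[n]})
print(out)
# foo keeps "Nevada" (its second row), bar keeps "Maine".
```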

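Another option is to use groupby.agg with 'sum' on column B and 'first' on column C. Here sort=False keeps the groups in their order of first appearance, and as_index=False keeps A as a regular column: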

out = df.groupby('A', as_index=False, sort=False).agg({'B':'sum', 'C':'first'})

Output:

     A   B             C
0  foo  34    California
1  bar  40  Rhode Island
2  baz  41          Ohio
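The same result can also be written with named aggregation in recent pandas versions (keyword arguments of the form output_name=(column, function)), which lets you choose the output column names. A sketch, assuming the sample df:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "A": ["foo", "foo", "bar", "bar", "baz", "baz"],
        "B": [12, 22, 8, 32, 15, 26],
        "C": ["California", "California", "Rhode Island",
              "Rhode Island", "Ohio", "Ohio"],
    }
)

# Named aggregation: each keyword is an output column,
# each value is a (source column, aggregation function) pair.
out = df.groupby("A", as_index=False, sort=False).agg(
    B=("B", "sum"),
    C=("C", "first"),
)
print(out)
```

Note that 'first' returns the first non-null value in each group, so a missing C in a group's first row would be skipped rather than returned.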

Velvet Star Monitor

