Context

For each team, I want the rows of the dataframe that contain its three best players (highest points).

In my head, this is a combination of DataFrame.nlargest() and DataFrame.groupby(), but I don't think that is supported. My ideal solution is:

  • performed directly on df without having to create other dataframes
  • readable, and
  • reasonably performant (the real df has 7M rows and 5 columns)

Input

from io import StringIO

import pandas as pd

# Wrap the literal JSON in StringIO: passing a literal JSON string to read_json is deprecated since pandas 2.1
df = pd.read_json(StringIO('{"team":{"0":"A","1":"A","2":"A","3":"A","4":"A","5":"B","6":"B","7":"B","8":"B","9":"B","10":"C","11":"C","12":"C","13":"C","14":"C"},"player":{"0":"Alice","1":"Becky","2":"Carmen","3":"Donna","4":"Elizabeth","5":"Fran","6":"Greta","7":"Heather","8":"Iris","9":"Jackie","10":"Kelly","11":"Lucy","12":"Molly","13":"Nina","14":"Ophelia"},"points":{"0":15,"1":11,"2":13,"3":8,"4":10,"5":28,"6":29,"7":18,"8":25,"9":9,"10":12,"11":23,"12":18,"13":10,"14":15}}'))
| team | player    | points |
|------|-----------|--------|
| A    | Alice     | 15     |
| A    | Becky     | 11     |
| A    | Carmen    | 13     |
| A    | Donna     | 8      |
| A    | Elizabeth | 10     |
| B    | Fran      | 28     |
| B    | Greta     | 29     |
| B    | Heather   | 18     |
| B    | Iris      | 25     |
| B    | Jackie    | 9      |
| C    | Kelly     | 12     |
| C    | Lucy      | 23     |
| C    | Molly     | 18     |
| C    | Nina      | 10     |
| C    | Ophelia   | 15     |

Desired output

from io import StringIO

df_output = pd.read_json(StringIO('{"team":{"0":"A","1":"A","2":"A","3":"B","4":"B","5":"B","6":"C","7":"C","8":"C"},"player":{"0":"Alice","1":"Becky","2":"Carmen","3":"Fran","4":"Greta","5":"Iris","6":"Lucy","7":"Molly","8":"Ophelia"},"points":{"0":15,"1":11,"2":13,"3":28,"4":29,"5":25,"6":23,"7":18,"8":15}}'))
df_output
| team | player  | points |
|------|---------|--------|
| A    | Alice   | 15     |
| A    | Becky   | 11     |
| A    | Carmen  | 13     |
| B    | Fran    | 28     |
| B    | Greta   | 29     |
| B    | Iris    | 25     |
| C    | Lucy    | 23     |
| C    | Molly   | 18     |
| C    | Ophelia | 15     |
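The answers in this thread return the same nine rows but in different orders (some grouped by team, some sorted by points), so a candidate is easiest to check against the desired output while ignoring row order. The sketch below rebuilds both frames with the DataFrame constructor instead of read_json; `same_rows` is a hypothetical helper, not a pandas function, and it assumes (team, player) pairs are unique.

```python
import pandas as pd

# Sample input from the question, built directly instead of via read_json
df = pd.DataFrame({
    "team": ["A"] * 5 + ["B"] * 5 + ["C"] * 5,
    "player": ["Alice", "Becky", "Carmen", "Donna", "Elizabeth",
               "Fran", "Greta", "Heather", "Iris", "Jackie",
               "Kelly", "Lucy", "Molly", "Nina", "Ophelia"],
    "points": [15, 11, 13, 8, 10, 28, 29, 18, 25, 9, 12, 23, 18, 10, 15],
})

# Desired output from the question
df_output = pd.DataFrame({
    "team": ["A", "A", "A", "B", "B", "B", "C", "C", "C"],
    "player": ["Alice", "Becky", "Carmen", "Fran", "Greta", "Iris",
               "Lucy", "Molly", "Ophelia"],
    "points": [15, 11, 13, 28, 29, 25, 23, 18, 15],
})


def same_rows(result: pd.DataFrame, expected: pd.DataFrame) -> bool:
    """Hypothetical helper: compare two frames as sets of rows,
    ignoring row order and index labels (assumes unique (team, player) pairs)."""
    key = ["team", "player"]
    a = result.sort_values(key).reset_index(drop=True)
    b = expected.sort_values(key).reset_index(drop=True)
    return a.equals(b)
```

For example, `same_rows(df.sort_values('points').groupby('team').tail(3), df_output)` should hold even though that answer lists the rows in ascending order of points.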
Anders Swanson Jun 2, 2020 at 20:14

4 answers

Best answer

You can use the df.groupby(...).rank method:

df[df.groupby('team')['points'].rank(ascending=False) <= 3]

   team   player  points
0     A    Alice      15
1     A    Becky      11
2     A   Carmen      13
5     B     Fran      28
6     B    Greta      29
8     B     Iris      25
11    C     Lucy      23
12    C    Molly      18
14    C  Ophelia      15
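One caveat with the rank approach: GroupBy.rank defaults to method="average", so players tied across the cut-off share a fractional rank and can all be dropped. The sketch below uses a small hypothetical frame (not the question's data) where two players in team A tie for third place, and compares three `method` values of pandas' `rank`.

```python
import pandas as pd

# Hypothetical data: Carmen and Donna are tied for third place in team A
df = pd.DataFrame({
    "team": ["A", "A", "A", "A", "B", "B"],
    "player": ["Alice", "Becky", "Carmen", "Donna", "Fran", "Greta"],
    "points": [15, 13, 10, 10, 28, 29],
})

points_by_team = df.groupby("team")["points"]

# Default method="average": the tied pair both get rank 3.5, so neither is kept
avg = df[points_by_team.rank(ascending=False) <= 3]

# method="first": ties are broken by order of appearance, so at most 3 rows per team
first = df[points_by_team.rank(method="first", ascending=False) <= 3]

# method="min": tied players share the better rank (3), so both are kept
tied = df[points_by_team.rank(method="min", ascending=False) <= 3]
```

Which variant is right depends on whether "top three" should mean exactly three rows per team (method="first") or everyone who reaches a top-three score (method="min").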
Mayank Porwal Jun 2, 2020 at 17:23

Something like this might work:

df.loc[df.groupby(['team'])['points'].nlargest(3).reset_index().drop(['team','points'], axis=1)['level_1'].values]
   team   player  points
0     A    Alice      15
2     A   Carmen      13
1     A    Becky      11
6     B    Greta      29
5     B     Fran      28
8     B     Iris      25
11    C     Lucy      23
12    C    Molly      18
14    C  Ophelia      15
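The reset_index/drop step above only serves to pull the original row labels out of the result of SeriesGroupBy.nlargest, whose index has the team as its first level and the original row label as its second. A slightly shorter sketch reads that second level directly with `Index.get_level_values`; it assumes df's index labels are unique, since `.loc` would otherwise return every row sharing a label.

```python
import pandas as pd

df = pd.DataFrame({
    "team": ["A"] * 5 + ["B"] * 5 + ["C"] * 5,
    "player": ["Alice", "Becky", "Carmen", "Donna", "Elizabeth",
               "Fran", "Greta", "Heather", "Iris", "Jackie",
               "Kelly", "Lucy", "Molly", "Nina", "Ophelia"],
    "points": [15, 11, 13, 8, 10, 28, 29, 18, 25, 9, 12, 23, 18, 10, 15],
})

# Series indexed by (team, original row label), largest points first within each team
top = df.groupby("team")["points"].nlargest(3)

# Level 1 holds the original row labels, so select those rows of df directly
result = df.loc[top.index.get_level_values(1)]
```

This returns the same rows in the same order as the answer above (0, 2, 1, 6, 5, 8, 11, 12, 14).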
Sajan Jun 2, 2020 at 17:30

Another approach is sort_values combined with groupby().tail/head:

df.sort_values('points').groupby('team').tail(3)

Output:

   team   player  points
1     A    Becky      11
2     A   Carmen      13
0     A    Alice      15
14    C  Ophelia      15
12    C    Molly      18
11    C     Lucy      23
8     B     Iris      25
5     B     Fran      28
6     B    Greta      29

Or

df.sort_values('points', ascending=False).groupby('team').head(3)

Output:

   team   player  points
6     B    Greta      29
5     B     Fran      28
8     B     Iris      25
11    C     Lucy      23
12    C    Molly      18
0     A    Alice      15
14    C  Ophelia      15
2     A   Carmen      13
1     A    Becky      11
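Both variants above interleave the teams because the frame is sorted by points alone. As a sketch of one alternative (same idea, different sort keys), sorting by team first and then by points in descending order keeps each team's rows together, and `sort_index()` afterwards restores the original row order, which matches the desired output in the question.

```python
import pandas as pd

df = pd.DataFrame({
    "team": ["A"] * 5 + ["B"] * 5 + ["C"] * 5,
    "player": ["Alice", "Becky", "Carmen", "Donna", "Elizabeth",
               "Fran", "Greta", "Heather", "Iris", "Jackie",
               "Kelly", "Lucy", "Molly", "Nina", "Ophelia"],
    "points": [15, 11, 13, 8, 10, 28, 29, 18, 25, 9, 12, 23, 18, 10, 15],
})

# Team ascending, points descending, then keep the first 3 rows of each team
result = (
    df.sort_values(["team", "points"], ascending=[True, False])
      .groupby("team")
      .head(3)
)

# Optional: put the selected rows back in their original order
in_original_order = result.sort_index()
```

Because groupby().head keeps rows in their current order, `result` lists each team's players from highest to lowest score.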
Quang Hoang Jun 2, 2020 at 17:33

You can use df.groupby together with DataFrame.nlargest:

df.groupby('team').apply(lambda x:x.nlargest(3,'points')).reset_index(drop=True)

  team   player  points
0    A    Alice      15
1    A   Carmen      13
2    A    Becky      11
3    B    Greta      29
4    B     Fran      28
5    B     Iris      25
6    C     Lucy      23
7    C    Molly      18
8    C  Ophelia      15
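Since the question mentions a 7M-row frame, it may help to time the approaches from this thread on data of a similar size. The sketch below is a hypothetical harness: `make_frame`, `timed` and the `top3_*` wrappers are illustrative names, and the synthetic data (integer team ids, points drawn from 0 to 99) is an assumption, not the asker's real data. The rank variant uses method="first" so it returns at most three rows per team like the others. No timings are reproduced here; they will depend on the pandas version and especially on how many teams there are.

```python
import time

import numpy as np
import pandas as pd


def make_frame(n_rows: int, n_teams: int, seed: int = 0) -> pd.DataFrame:
    """Synthetic frame shaped like the question's data (sizes are assumptions)."""
    rng = np.random.default_rng(seed)
    return pd.DataFrame({
        "team": rng.integers(0, n_teams, n_rows),
        "player": np.arange(n_rows),
        "points": rng.integers(0, 100, n_rows),
    })


def top3_rank(df: pd.DataFrame) -> pd.DataFrame:
    # Boolean mask from per-team ranks; "first" breaks ties by order of appearance
    return df[df.groupby("team")["points"].rank(method="first", ascending=False) <= 3]


def top3_sort_head(df: pd.DataFrame) -> pd.DataFrame:
    # One global sort, then the first 3 rows of each team
    return df.sort_values("points", ascending=False).groupby("team").head(3)


def top3_nlargest(df: pd.DataFrame) -> pd.DataFrame:
    # Per-team nlargest, then select the original rows by label
    return df.loc[df.groupby("team")["points"].nlargest(3).index.get_level_values(1)]


def timed(fn, df: pd.DataFrame):
    """Return fn(df) and the wall-clock seconds it took."""
    start = time.perf_counter()
    out = fn(df)
    return out, time.perf_counter() - start
```

A usage example at the question's scale would be `timed(top3_rank, make_frame(7_000_000, 100_000))`, with the team count being a guess to adjust to the real data. With ties, the methods may keep different players, but the top-three point values per team should agree.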
Ch3steR Jun 2, 2020 at 17:26