I want to get the value of another column based on a value in a given column, in the same row.

Example:

For business id = '123', I want to retrieve the business_name

Df:

biz_id  biz_name
123      chew
456      bite
123      chew

Code:

df['biz_name'].loc[df['biz_id'] == 123]

This returns:

chew
chew

How can I get a single 'chew' value as a string?

1
jxn Jan 17, 2017 at 09:58

2 answers

Best answer

You can use iloc or iat to select the first value of the Series:

print (df.loc[df['biz_id'] == 123, 'biz_name'].iloc[0])
chew

Or:

print (df.loc[df['biz_id'] == 123, 'biz_name'].iat[0])
chew
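
For anyone who wants to run this end to end, here is a minimal sketch of the two accessors above. The DataFrame is rebuilt from the sample in the question; the variable names (names, first_by_iloc, first_by_iat) are only illustrative.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "biz_id": [123, 456, 123],
    "biz_name": ["chew", "bite", "chew"],
})

# The boolean mask picks the matching rows, 'biz_name' picks the column
names = df.loc[df["biz_id"] == 123, "biz_name"]

first_by_iloc = names.iloc[0]  # position-based indexer, also accepts slices
first_by_iat = names.iat[0]    # position-based, scalar access only

print(first_by_iloc, type(first_by_iloc).__name__)
```

Both return a plain Python string here, not a one-element Series.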

With query:

print (df.query('biz_id == 123')['biz_name'].iloc[0])
chew
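
If the id is held in a variable rather than hard-coded, query can reference it with the @ prefix. A small sketch, assuming the same sample data; name_for and target_id are hypothetical names used only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "biz_id": [123, 456, 123],
    "biz_name": ["chew", "bite", "chew"],
})


def name_for(frame, target_id):
    """Return the first biz_name whose biz_id equals target_id."""
    # '@target_id' refers to the local Python variable inside the query string
    return frame.query("biz_id == @target_id")["biz_name"].iloc[0]


print(name_for(df, 456))
```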

Or select the first value from a list or NumPy array:

print (df.loc[df['biz_id'] == 123, 'biz_name'].tolist()[0])
chew

print (df.loc[df['biz_id'] == 123, 'biz_name'].values[0])
chew
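
Note that every variant above indexes position 0, so each raises an IndexError when no row matches. A hedged sketch of one way to guard against that; first_match and its default parameter are hypothetical helpers, not part of pandas.

```python
import pandas as pd

df = pd.DataFrame({
    "biz_id": [123, 456, 123],
    "biz_name": ["chew", "bite", "chew"],
})


def first_match(frame, key_col, key, value_col, default=None):
    """Return the first value_col entry where key_col == key, else default."""
    matches = frame.loc[frame[key_col] == key, value_col]
    if matches.empty:
        # No matching row: avoid the IndexError that [0] would raise
        return default
    return matches.iloc[0]


print(first_match(df, "biz_id", 123, "biz_name"))
print(first_match(df, "biz_id", 999, "biz_name"))
```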

Timings:

In [18]: %timeit (df.loc[df['biz_id'] == 123, 'biz_name'].iloc[0])
1000 loops, best of 3: 399 µs per loop

In [19]: %timeit (df.loc[df['biz_id'] == 123, 'biz_name'].iat[0])
The slowest run took 4.16 times longer than the fastest. This could mean that an intermediate result is being cached.
1000 loops, best of 3: 391 µs per loop

In [20]: %timeit (df.query('biz_id == 123')['biz_name'].iloc[0])
The slowest run took 4.39 times longer than the fastest. This could mean that an intermediate result is being cached.
1000 loops, best of 3: 1.75 ms per loop

In [21]: %timeit (df.loc[df['biz_id'] == 123, 'biz_name'].tolist()[0])
The slowest run took 4.18 times longer than the fastest. This could mean that an intermediate result is being cached.
1000 loops, best of 3: 384 µs per loop

In [22]: %timeit (df.loc[df['biz_id'] == 123, 'biz_name'].values[0])
The slowest run took 5.32 times longer than the fastest. This could mean that an intermediate result is being cached.
1000 loops, best of 3: 370 µs per loop

In [23]: %timeit (df.loc[df.biz_id.eq(123).idxmax(), 'biz_name'])
1000 loops, best of 3: 517 µs per loop
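
To reproduce the comparison outside IPython, something like the following sketch with the standard timeit module should work. The repeat counts are arbitrary choices, and the numbers will differ by machine and pandas version, so no output is shown.

```python
import timeit

import pandas as pd

df = pd.DataFrame({
    "biz_id": [123, 456, 123],
    "biz_name": ["chew", "bite", "chew"],
})

# One callable per approach timed above
candidates = {
    "iloc": lambda: df.loc[df["biz_id"] == 123, "biz_name"].iloc[0],
    "iat": lambda: df.loc[df["biz_id"] == 123, "biz_name"].iat[0],
    "query": lambda: df.query("biz_id == 123")["biz_name"].iloc[0],
    "tolist": lambda: df.loc[df["biz_id"] == 123, "biz_name"].tolist()[0],
    "values": lambda: df.loc[df["biz_id"] == 123, "biz_name"].values[0],
    "idxmax": lambda: df.loc[df["biz_id"].eq(123).idxmax(), "biz_name"],
}

calls = 100  # calls per repeat (arbitrary)
timings = {}
for label, func in candidates.items():
    # Best of 3 repeats, converted to microseconds per call
    best = min(timeit.repeat(func, number=calls, repeat=3))
    timings[label] = best / calls * 1e6

for label, usec in sorted(timings.items(), key=lambda kv: kv[1]):
    print(f"{label:>7}: {usec:8.1f} µs per call")
```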
1
jezrael Jan 17, 2017 at 07:15

Use idxmax to get the index label of the first maximum value. On a boolean mask the maximum is True, so this is the label of the first matching row:

df.loc[df.biz_id.eq(123).idxmax(), 'biz_name']

'chew'
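
One caveat worth checking: when no row matches, the mask is all False and idxmax still returns the first index label, so the lookup silently returns the first row's value. A minimal sketch of a guard using any(); first_name_via_idxmax is a hypothetical helper name.

```python
import pandas as pd

df = pd.DataFrame({
    "biz_id": [123, 456, 123],
    "biz_name": ["chew", "bite", "chew"],
})


def first_name_via_idxmax(frame, biz_id):
    """Return the first matching biz_name, or None when nothing matches."""
    mask = frame["biz_id"].eq(biz_id)
    if not mask.any():
        # All False: idxmax would point at the first row, which is not a match
        return None
    return frame.loc[mask.idxmax(), "biz_name"]


# Unguarded lookup for an id that does not exist
unguarded = df.loc[df["biz_id"].eq(999).idxmax(), "biz_name"]

print(unguarded)                       # first row's name, although 999 is absent
print(first_name_via_idxmax(df, 999))  # None
print(first_name_via_idxmax(df, 456))
```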
2
piRSquared Jan 17, 2017 at 07:05