I'm using RandomizedSearchCV and KNeighborsClassifier to try to predict loan default.

Using RandomizedSearchCV seems great in theory, but when I test it, the best_estimator_ it finds is one that predicts the same label for every sample.

The data is split 75% PAIDOFF, 25% default, so I get 75% accuracy, but the model is just predicting PAIDOFF for everything.
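To make the arithmetic concrete, here is a minimal sketch in plain Python. The label lists are hypothetical stand-ins that mirror the 75/25 split (using the COLLECTION label from the y_test output for the minority class), not the real data, and the helper functions are written out by hand for illustration only.

```python
from collections import Counter

# Hypothetical labels mirroring the 75% / 25% split (not the real data).
y_true = ["PAIDOFF"] * 75 + ["COLLECTION"] * 25
# A degenerate model that always predicts the majority class.
y_pred = ["PAIDOFF"] * len(y_true)


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall_per_class(y_true, y_pred):
    """For each true class, the fraction of its samples predicted correctly."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: hits[label] / totals[label] for label in totals}


def balanced_accuracy(y_true, y_pred):
    """Unweighted mean of the per-class recalls."""
    recalls = recall_per_class(y_true, y_pred)
    return sum(recalls.values()) / len(recalls)


print(accuracy(y_true, y_pred))           # 0.75
print(recall_per_class(y_true, y_pred))   # {'PAIDOFF': 1.0, 'COLLECTION': 0.0}
print(balanced_accuracy(y_true, y_pred))  # 0.5
```

Plain accuracy rewards the constant predictor with 0.75, while the per-class recall shows that it never identifies a single minority-class sample.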

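import numpy as np
from sklearn.model_selection import RandomizedSearchCV
from sklearn.neighbors import KNeighborsClassifier

# Candidate hyperparameter values (X_train and y_train are assumed to be defined already)
n_neighbors = [int(x) for x in np.linspace(start=1, stop=len(X_train) / 3, num=5)]
weights = ['uniform', 'distance']
algorithm = ['auto', 'ball_tree', 'kd_tree', 'brute']
leaf_size = [int(x) for x in np.linspace(10, 100, num=5)]
p = [1, 2]

random_grid = {'n_neighbors': n_neighbors,
               'weights': weights,
               'algorithm': algorithm,
               'leaf_size': leaf_size,
               'p': p}

# Randomized search over 25 sampled combinations with 3-fold cross-validation
knn_clf = KNeighborsClassifier()
knn_random = RandomizedSearchCV(estimator=knn_clf, param_distributions=random_grid, n_iter=25, cv=3, verbose=1)
knn_random.fit(X_train, y_train)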

Is there anything I can do to combat this? Is there a parameter I can pass to prevent it from happening, or is there something I should change in my data?
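One direction that may be worth trying, offered as a sketch rather than a confirmed fix: when no `scoring` is given, RandomizedSearchCV ranks candidates with the estimator's default score, which for a classifier is plain accuracy, so a majority-class predictor can win. The example below passes `scoring="balanced_accuracy"`, uses stratified folds, and scales the features before KNN. The synthetic data from `make_classification`, the names `X_demo`, `y_demo` and `pipe`, and the particular parameter ranges are all assumptions for illustration, not the original loan data or grid.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import RandomizedSearchCV, StratifiedKFold
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the loan data (assumption): roughly 75% / 25% classes.
X_demo, y_demo = make_classification(
    n_samples=300,
    n_features=6,
    n_informative=4,
    weights=[0.75, 0.25],
    random_state=0,
)

# Scale features first, since KNN distances are sensitive to feature ranges.
pipe = Pipeline([("scale", StandardScaler()), ("knn", KNeighborsClassifier())])

# Parameters of the pipeline step "knn" are addressed with the "knn__" prefix.
param_distributions = {
    "knn__n_neighbors": list(range(1, 32, 2)),
    "knn__weights": ["uniform", "distance"],
    "knn__p": [1, 2],
}

search = RandomizedSearchCV(
    estimator=pipe,
    param_distributions=param_distributions,
    n_iter=10,
    scoring="balanced_accuracy",  # mean per-class recall instead of accuracy
    cv=StratifiedKFold(n_splits=3, shuffle=True, random_state=0),
    random_state=0,
)
search.fit(X_demo, y_demo)

print(search.best_params_)
print(round(search.best_score_, 3))
```

Under balanced accuracy, a model that predicts only the majority class scores 0.5 on a two-class problem, so it no longer looks better than candidates that recover some of the minority class. Whether this actually helps on the real data is untested here.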

y_test:

38        PAIDOFF
189       PAIDOFF
140       PAIDOFF
286    COLLECTION
142       PAIDOFF
101       PAIDOFF
187       PAIDOFF
139       PAIDOFF
149       PAIDOFF
11        PAIDOFF
269    COLLECTION
231       PAIDOFF
258       PAIDOFF
84        PAIDOFF
242       PAIDOFF
344    COLLECTION
104       PAIDOFF
214       PAIDOFF
109       PAIDOFF
76        PAIDOFF
41        PAIDOFF
262    COLLECTION
125       PAIDOFF
107       PAIDOFF
27        PAIDOFF
14        PAIDOFF
92        PAIDOFF
194       PAIDOFF
113       PAIDOFF
333    COLLECTION
          ...    
320    COLLECTION
15        PAIDOFF
72        PAIDOFF
122       PAIDOFF
243       PAIDOFF
184       PAIDOFF
294    COLLECTION
280    COLLECTION
218       PAIDOFF
197       PAIDOFF
133       PAIDOFF
143       PAIDOFF
179       PAIDOFF
249       PAIDOFF
80        PAIDOFF
331    COLLECTION
137       PAIDOFF
103       PAIDOFF
120       PAIDOFF
248       PAIDOFF
5         PAIDOFF
236       PAIDOFF
219       PAIDOFF
322    COLLECTION
283    COLLECTION
135       PAIDOFF
124       PAIDOFF
293    COLLECTION
166       PAIDOFF
85        PAIDOFF

prediction:

array(['PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF',
       'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF',
       'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF',
       'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF',
       'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF',
       'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF',
       'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF',
       'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF',
       'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF',
       'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF',
       'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF',
       'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF',
       'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF',
       'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF',
       'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF',
       'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF',
       'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF', 'PAIDOFF',
       'PAIDOFF', 'PAIDOFF'], dtype=object)
Lewis Morris, Nov 3, 2019 at 19:17