Project log


Crashed server 13 Oct 2015 (est040)

I ran the following script and crashed the server:

...
my_data = load_files('/data_bck/private/est040/NAK_top11/', encoding='utf-8', decode_error='ignore')  # noqa

text_clf = Pipeline([('vect', CountVectorizer()),
                     ('tfidf', TfidfTransformer()),
                     ('clf', MultinomialNB()),
                     ])
text_clf = text_clf.fit(my_data.data, my_data.target)

parameters = {'vect__ngram_range': [(1, 1), (1, 2)],
              'tfidf__use_idf': (True, False),
              'clf__alpha': (1e-2, 1e-3),
              }

# grid search over the text_clf pipeline (CountVectorizer, TfidfTransformer, MultinomialNB)
gs_clf = GridSearchCV(text_clf, parameters, n_jobs=-1)  # this might be the problem: -1 means use all cores
# use many cores if we have them
gs_clf = gs_clf.fit(my_data.data, my_data.target)

best_parameters, score, _ = max(gs_clf.grid_scores_, key=lambda x: x[1])
for param_name in sorted(parameters.keys()):
    print("%s: %r" % (param_name, best_parameters[param_name]))

print(score)

This led to an unresponsive server that required a reboot.


We need some way to make sure we can stop a berserk process in the future.
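
One possible safeguard, sketched below, is to have the script limit itself before it starts the grid search. The sketch assumes a Linux/Unix server and that the crash came from the job using every core and too much memory, which the log does not confirm. The helper names safe_n_jobs and cap_address_space and the 8 GiB figure are made up for illustration. Only the standard library modules os and resource are used.

```python
import os
import resource  # standard library, Unix only


def safe_n_jobs(reserve=2, total=None):
    """Return a worker count that leaves `reserve` cores free (never below 1)."""
    if total is None:
        total = os.cpu_count() or 1
    return max(1, total - reserve)


def cap_address_space(max_bytes):
    """Lower this process's soft address-space limit to at most max_bytes.

    Processes started afterwards inherit the limit. Once it is reached,
    allocations fail (Python raises MemoryError) instead of the machine
    being driven into swap.
    """
    _soft, hard = resource.getrlimit(resource.RLIMIT_AS)
    if hard != resource.RLIM_INFINITY:
        max_bytes = min(max_bytes, hard)  # the soft limit may not exceed the hard limit
    resource.setrlimit(resource.RLIMIT_AS, (max_bytes, hard))
    return max_bytes


# Hypothetical use at the top of the script that crashed the server:
#   cap_address_space(8 * 1024 ** 3)   # assumed 8 GiB budget, adjust to the machine
#   gs_clf = GridSearchCV(text_clf, parameters, n_jobs=safe_n_jobs())
```

The limit applies to each process separately, so with several workers the total can still exceed the figure given; the budget should be chosen with the worker count in mind. Starting long jobs under nice, so that a login shell stays responsive enough to kill them, would be a complementary measure.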