Project log

From imdb
Revision as of 12:47, 14 October 2015 by Est040

Crashed server 13 oct 2015 (est040)

I ran the following script and crashed the server:

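# Imports the script needs (scikit-learn >= 0.18 for sklearn.model_selection)
from sklearn.datasets import load_files
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.model_selection import GridSearchCV
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# One document per file; each subdirectory name becomes a class label
my_data = load_files('/data_bck/private/est040/NAK_top11/', encoding='utf-8', decode_error='ignore')  # noqa

# Word counts -> tf-idf weighting -> multinomial naive Bayes
text_clf = Pipeline([('vect', CountVectorizer()),
                     ('tfidf', TfidfTransformer()),
                     ('clf', MultinomialNB())])
text_clf = text_clf.fit(my_data.data, my_data.target)

# 2 x 2 x 2 = 8 parameter combinations, each one cross-validated
parameters = {'vect__ngram_range': [(1, 1), (1, 2)],
              'tfidf__use_idf': (True, False),
              'clf__alpha': (1e-2, 1e-3)}

# Grid search over text_clf, a MultinomialNB pipeline.
# n_jobs=-1 means "use all cores"; this might be the problem, since every
# worker process fits its own copy of the pipeline on its own copy of the data.
gs_clf = GridSearchCV(text_clf, parameters, n_jobs=-1)
gs_clf = gs_clf.fit(my_data.data, my_data.target)

# grid_scores_ was removed in scikit-learn 0.20; best_params_ holds the winner
best_parameters, score = gs_clf.best_params_, gs_clf.best_score_
for param_name in sorted(parameters.keys()):
    print("%s: %r" % (param_name, best_parameters[param_name]))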
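To see how much work n_jobs=-1 spreads across the cores, one can count the fits the grid search performs. A minimal sketch, assuming scikit-learn's ParameterGrid and 5-fold cross-validation (the default since scikit-learn 0.22; the 2015 versions used 3 folds), plus the one final refit on all data that GridSearchCV does by default (refit=True):

```python
from sklearn.model_selection import ParameterGrid

# Same grid as in the script above
parameters = {'vect__ngram_range': [(1, 1), (1, 2)],
              'tfidf__use_idf': (True, False),
              'clf__alpha': (1e-2, 1e-3)}

n_candidates = len(ParameterGrid(parameters))  # 2 * 2 * 2 = 8
n_folds = 5  # assumption: default cv in scikit-learn >= 0.22 (3 in older versions)
n_fits = n_candidates * n_folds + 1  # +1 for the final refit on the full data
print(n_candidates, n_fits)
```

With these assumptions this prints `8 41`: 40 cross-validation fits that joblib can run in parallel, each fitting a full CountVectorizer/TfidfTransformer/MultinomialNB pipeline, and the (1, 2)-gram candidates build much larger vocabularies than the unigram ones.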


This led to an unresponsive server that required a reboot.

We need some way to make sure we can stop a berserk process in the future.
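One possible approach (not something tried here yet) is to cap the script's memory and leave some cores free, so a runaway job fails with MemoryError instead of dragging the whole server into swap. A minimal sketch for Linux/Unix using the standard-library resource module's RLIMIT_AS; the helper names cap_memory and safe_n_jobs and the default of two reserved cores are hypothetical choices for illustration:

```python
import os
import resource


def cap_memory(max_bytes):
    """Limit this process's virtual address space (Linux/Unix only).

    Allocations beyond the limit raise MemoryError in Python. Processes
    started afterwards (e.g. joblib workers) inherit the limit, but each
    worker gets its own allowance, so the total can reach max_bytes x workers.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_AS)
    if hard != resource.RLIM_INFINITY:
        max_bytes = min(max_bytes, hard)  # the soft limit cannot exceed the hard one
    resource.setrlimit(resource.RLIMIT_AS, (max_bytes, hard))


def safe_n_jobs(reserve=2):
    """Use all cores except `reserve`, so the machine stays responsive."""
    return max(1, (os.cpu_count() or 1) - reserve)
```

In the script above this would mean calling, for example, `cap_memory(8 * 1024**3)` (a hypothetical 8 GB budget) before `gs_clf.fit(...)` and passing `n_jobs=safe_n_jobs()` instead of `n_jobs=-1`. Running the script under `ulimit -v` in the shell gives a similar cap without changing the code.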