multiprocessing_example   A

Complexity

Total Complexity 3

Size/Duplication

Total Lines 114
Duplicated Lines 32.46 %

Importance

Changes 0
Metric Value
wmc 3
eloc 74
dl 37
loc 114
rs 10
c 0
b 0
f 0

3 Functions

Rating   Name   Duplication   Size   Complexity  
A model_gbc() 13 13 1
A model_rfc() 12 12 1
A model_etc() 12 12 1

Duplicated Code

Duplicate code is one of the most pungent code smells. A rule that is often used is to re-structure code once it is duplicated in three or more places.

Common duplication problems, and corresponding solutions are:

"""
Hyperactive can perform optimizations of multiple different objective functions
in parallel. This can be done via multiprocessing, joblib or a custom wrapper-function.
The processes won't communicate with each other.

You can add as many searches to the optimization run (.add_search(...)) and
run each of those searches n-times (n_jobs).

In the example below we are performing 4 searches in parallel:
    - model_etc one time
    - model_rfc one time
    - model_gbc two times

"""
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.ensemble import ExtraTreesClassifier
from xgboost import XGBClassifier
from sklearn.datasets import load_breast_cancer
from hyperactive import Hyperactive

data = load_breast_cancer()
X, y = data.data, data.target


def model_etc(opt):
    etc = ExtraTreesClassifier(
        n_estimators=opt["n_estimators"],
        criterion=opt["criterion"],
        max_features=opt["max_features"],
        min_samples_split=opt["min_samples_split"],
        min_samples_leaf=opt["min_samples_leaf"],
        bootstrap=opt["bootstrap"],
    )
    scores = cross_val_score(etc, X, y, cv=3)

    return scores.mean()


def model_rfc(opt):
    rfc = RandomForestClassifier(
        n_estimators=opt["n_estimators"],
        criterion=opt["criterion"],
        max_features=opt["max_features"],
        min_samples_split=opt["min_samples_split"],
        min_samples_leaf=opt["min_samples_leaf"],
        bootstrap=opt["bootstrap"],
    )
    scores = cross_val_score(rfc, X, y, cv=3)

    return scores.mean()


def model_gbc(opt):
    gbc = GradientBoostingClassifier(
        n_estimators=opt["n_estimators"],
        learning_rate=opt["learning_rate"],
        max_depth=opt["max_depth"],
        min_samples_split=opt["min_samples_split"],
        min_samples_leaf=opt["min_samples_leaf"],
        subsample=opt["subsample"],
        max_features=opt["max_features"],
    )
    scores = cross_val_score(gbc, X, y, cv=3)

    return scores.mean()


search_space_etc = {
    "n_estimators": list(range(10, 200, 10)),
    "criterion": ["gini", "entropy"],
    "max_features": list(np.arange(0.05, 1.01, 0.05)),
    "min_samples_split": list(range(2, 21)),
    "min_samples_leaf": list(range(1, 21)),
    "bootstrap": [True, False],
}


search_space_rfc = {
    "n_estimators": list(range(10, 200, 10)),
    "criterion": ["gini", "entropy"],
    "max_features": list(np.arange(0.05, 1.01, 0.05)),
    "min_samples_split": list(range(2, 21)),
    "min_samples_leaf": list(range(1, 21)),
    "bootstrap": [True, False],
}


search_space_gbc = {
    "n_estimators": list(range(10, 200, 10)),
    "learning_rate": [1e-3, 1e-2, 1e-1, 0.5, 1.0],
    "max_depth": list(range(1, 11)),
    "min_samples_split": list(range(2, 21)),
    "min_samples_leaf": list(range(1, 21)),
    "subsample": list(np.arange(0.05, 1.01, 0.05)),
    "max_features": list(np.arange(0.05, 1.01, 0.05)),
}


hyper = Hyperactive()
hyper.add_search(model_etc, search_space_etc, n_iter=50)
hyper.add_search(model_rfc, search_space_rfc, n_iter=50)
hyper.add_search(model_gbc, search_space_gbc, n_iter=50, n_jobs=2)
hyper.run(max_time=5)

search_data_etc = hyper.search_data(model_etc)
search_data_rfc = hyper.search_data(model_rfc)
search_data_gbc = hyper.search_data(model_gbc)

print("\n ExtraTreesClassifier search data \n", search_data_etc)
print("\n GradientBoostingClassifier search data \n", search_data_gbc)
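
One possible way to remove the duplication reported for model_etc(), model_rfc() and model_gbc() is to move the shared "build estimator, cross-validate, return mean score" steps into a single helper. The sketch below is only an illustration, not the project's own fix: the helper _mean_cv_score and the constants FOREST_PARAMS and GBC_PARAMS are hypothetical names that do not appear in the original file. It assumes only that opt supports opt["name"] lookups, as in the original functions. The three objective functions stay at module level under their original names, so they can still be passed to add_search() and search_data() as before.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import (
    ExtraTreesClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
)
from sklearn.model_selection import cross_val_score

data = load_breast_cancer()
X, y = data.data, data.target

# Hypothetical constants: the hyperparameter names each objective reads from `opt`.
FOREST_PARAMS = (
    "n_estimators",
    "criterion",
    "max_features",
    "min_samples_split",
    "min_samples_leaf",
    "bootstrap",
)
GBC_PARAMS = (
    "n_estimators",
    "learning_rate",
    "max_depth",
    "min_samples_split",
    "min_samples_leaf",
    "subsample",
    "max_features",
)


def _mean_cv_score(estimator_cls, opt, param_names, cv=3):
    """Hypothetical helper: build the estimator from `opt` and cross-validate it."""
    # Only opt["name"] lookups are used, mirroring the original functions.
    params = {name: opt[name] for name in param_names}
    scores = cross_val_score(estimator_cls(**params), X, y, cv=cv)
    return scores.mean()


def model_etc(opt):
    return _mean_cv_score(ExtraTreesClassifier, opt, FOREST_PARAMS)


def model_rfc(opt):
    return _mean_cv_score(RandomForestClassifier, opt, FOREST_PARAMS)


def model_gbc(opt):
    return _mean_cv_score(GradientBoostingClassifier, opt, GBC_PARAMS)
```

Thin module-level wrappers are used here rather than closures or lambdas because module-level functions can be pickled, which matters when objective functions are handed to worker processes. The identical search_space_etc and search_space_rfc dictionaries could be deduplicated in the same spirit, for example by defining the dictionary once and copying it.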