Passed
Push — master (c241e4...b050e9) by Simon, created 01:57

nsga_ii_sampler_example.main() (rated A)

Complexity
Conditions: 3

Size
Total Lines: 127
Code Lines: 32

Duplication
Lines: 127
Ratio: 100 %

Importance
Changes: 0

Metric  Value
eloc    32     (effective lines of code)
dl      127    (duplicated lines)
loc     127    (lines of code)
rs      9.112
c       0
b       0
f       0
cc      3      (cyclomatic complexity)
nop     0      (number of parameters)
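
These are standard static metrics. As a hedged sketch, comparable loc/cc-style numbers can be reproduced locally with the radon package (an assumption; radon is not necessarily what produced this report):

    from radon.complexity import cc_visit
    from radon.raw import analyze

    source = open("nsga_ii_sampler_example.py").read()
    print(analyze(source))              # raw metrics: loc, lloc, sloc, comments, blank
    for block in cc_visit(source):      # cyclomatic complexity per function/class
        print(block.name, block.complexity)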

How to fix: Long Method

Small methods make your code easier to understand, particularly when combined with a good name. Besides, if your method is small, finding a good name is usually much easier.

For example, if you find yourself adding comments to a method's body, that is usually a good sign that you should extract the commented part into a new method and use the comment as a starting point for naming the new method.

Commonly applied refactorings include Extract Method.
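
For illustration, a minimal Extract Method sketch (all names here are invented for the example):

    # Before: a comment labels a block inside a longer method
    def report(orders):
        total = sum(order.price for order in orders)
        # print the report banner
        print("*" * 40)
        print("Order report")
        print("*" * 40)
        print(f"Total: {total}")

    # After: the commented block becomes its own well-named method
    def print_banner():
        print("*" * 40)
        print("Order report")
        print("*" * 40)

    def report(orders):
        total = sum(order.price for order in orders)
        print_banner()
        print(f"Total: {total}")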

"""
NSGAIISampler Example - Multi-objective Optimization with NSGA-II

NSGA-II (Non-dominated Sorting Genetic Algorithm II) is designed for
multi-objective optimization problems where you want to optimize multiple
conflicting objectives simultaneously. It finds a Pareto front of solutions.

Characteristics:
- Multi-objective evolutionary algorithm
- Finds Pareto-optimal solutions (non-dominated set)
- Balances multiple conflicting objectives
- Population-based search with selection pressure
- Elitist approach preserving best solutions
- Crowding distance for diversity preservation

Note: For demonstration, we create a multi-objective problem from
a single-objective one by optimizing both performance and model complexity.
"""

import numpy as np
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

from hyperactive.opt.optuna import NSGAIISampler


class MultiObjectiveExperiment:
    """Multi-objective experiment: maximize accuracy, minimize complexity."""

    def __init__(self, X, y):
        self.X = X
        self.y = y

    def __call__(self, **params):
        # Create model with parameters
        model = RandomForestClassifier(random_state=42, **params)

        # Objective 1: Maximize accuracy (we return it negated for minimization)
        scores = cross_val_score(model, self.X, self.y, cv=3)
        accuracy = np.mean(scores)

        # Objective 2: Minimize model complexity (number of parameters)
        # For Random Forest: roughly n_estimators × max_depth × n_features
        complexity = (
            params["n_estimators"] * params.get("max_depth", 10) * self.X.shape[1]
        )

        # NSGA-II minimizes objectives, so we return both as minimization
        # Note: This is a simplified multi-objective setup for demonstration
        return [-accuracy, complexity / 10000]  # Scale complexity for better balance
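
# Illustrative usage (a hedged sketch, not exercised by the optimizer run):
# on the digits data (64 features), n_estimators=50 and max_depth=5 give
# complexity = 50 * 5 * 64 = 16000, i.e. 1.6 after the /10000 scaling:
#
#   X, y = load_digits(return_X_y=True)
#   experiment = MultiObjectiveExperiment(X, y)
#   experiment(n_estimators=50, max_depth=5, min_samples_split=2,
#              min_samples_leaf=1, max_features="sqrt")
#   # -> [-cross_val_accuracy, 1.6]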

def nsga_ii_theory():
    """Explain NSGA-II algorithm theory."""
    # NSGA-II Algorithm (Multi-objective Optimization):
    #
    # 1. Core Concepts:
    #    - Pareto Dominance: Solution A dominates B if A is no worse in all
    #      objectives and strictly better in at least one
    #    - Pareto Front: Set of non-dominated solutions
    #    - Trade-offs: Improving one objective may worsen another
    #
    # 2. NSGA-II Process:
    #    - Initialize population randomly
    #    - For each generation:
    #      a) Fast non-dominated sorting (rank solutions by dominance)
    #      b) Crowding distance calculation (preserve diversity)
    #      c) Selection based on rank and crowding distance
    #      d) Crossover and mutation to create offspring
    #
    # 3. Selection Criteria:
    #    - Primary: Non-domination rank (prefer better fronts)
    #    - Secondary: Crowding distance (prefer diverse solutions)
    #    - Elitist: Best solutions always survive
    #
    # 4. Output:
    #    - Set of Pareto-optimal solutions
    #    - User chooses final solution based on preferences
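

# Illustrative helper (a hedged sketch; the name is ours, not part of the
# example's API): the Pareto-dominance test that fast non-dominated sorting
# builds on, for objective vectors that are being minimized.
def dominates(a, b):
    """Return True if a dominates b: no worse in every objective and
    strictly better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))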


def main():
    # === NSGAIISampler Example ===
    # Multi-objective Optimization with NSGA-II

    nsga_ii_theory()

    # Load dataset
    X, y = load_digits(return_X_y=True)
    print(f"Dataset: Handwritten digits ({X.shape[0]} samples, {X.shape[1]} features)")

    # Create multi-objective experiment
    experiment = MultiObjectiveExperiment(X, y)

    # Multi-objective Problem:
    #   Objective 1: Maximize classification accuracy
    #   Objective 2: Minimize model complexity
    #   → Trade-off between performance and simplicity

    # Define search space
    param_space = {
        "n_estimators": (10, 200),  # Number of trees
        "max_depth": (1, 20),  # Tree depth (complexity)
        "min_samples_split": (2, 20),  # Minimum samples to split
        "min_samples_leaf": (1, 10),  # Minimum samples per leaf
        "max_features": ["sqrt", "log2", None],  # Feature sampling
    }

    # Search space overview (commented out):
    # for param, space in param_space.items():
    #     print(f"  {param}: {space}")

    # Configure NSGAIISampler
    optimizer = NSGAIISampler(
        param_space=param_space,
        n_trials=50,  # Total evaluations; the population evolves over generations
        random_state=42,
        experiment=experiment,
        population_size=20,  # Population size for the genetic algorithm
        mutation_prob=0.1,  # Mutation probability
        crossover_prob=0.9,  # Crossover probability
    )

    # NSGAIISampler configuration notes: parameters are documented inline above;
    # selection uses non-dominated sorting plus crowding distance.

    # Note: This example demonstrates the interface.
    # In practice, NSGA-II returns multiple Pareto-optimal solutions.
    # For single-objective problems, consider TPE or GP samplers instead.

    # Run the optimization
    try:
        best_params = optimizer.run()

        # Results
        print("\n=== Results ===")
        print(f"Best parameters: {best_params}")
        print(f"Best score: {optimizer.best_score_:.4f}")
        print()

        # NSGA-II typically returns multiple solutions along the Pareto front:
        #   - High accuracy, high complexity models
        #   - Medium accuracy, medium complexity models
        #   - Lower accuracy, low complexity models
        #   - User selects based on preferences/constraints

    except Exception as e:
        print(f"Multi-objective optimization example: {e}")
        print("Note: This demonstrates the interface for multi-objective problems.")
        return None, None

    # NSGA-II Evolution Process:
    #
    # Generation 1: Random initialization
    #   - Diverse population across parameter space
    #   - Wide range of accuracy/complexity trade-offs
    #
    # Generations 2-N: Evolutionary improvement
    #   - Non-dominated sorting identifies the best fronts
    #   - Crowding distance maintains solution diversity
    #   - Crossover combines good solutions
    #   - Mutation explores new parameter regions
    #
    # Final Population: Pareto front approximation
    #   - Multiple non-dominated solutions
    #   - Represents optimal trade-offs
    #   - User chooses based on domain requirements

    # Key Advantages:
    #   - Handles multiple conflicting objectives naturally
    #   - Finds a diverse set of optimal trade-offs
    #   - No need to specify objective weights a priori
    #   - Provides insight into objective relationships
    #   - Robust to objective scaling differences

    # Best Use Cases:
    #   - True multi-objective problems (accuracy vs speed, cost vs quality)
    #   - When trade-offs between objectives are important
    #   - Robustness analysis with multiple criteria
    #   - When a single-objective formulation is unclear

    # Limitations:
    #   - More complex than single-objective methods
    #   - Requires more evaluations (population-based)
    #   - May be overkill for single-objective problems
    #   - Final solution selection is still required

    # When to Use NSGA-II vs Single-objective Methods:
    # Use NSGA-II when:
    #   - Multiple objectives genuinely conflict
    #   - Trade-off analysis is valuable
    #   - Objective weights are unknown
    #
    # Use TPE/GP when:
    #   - There is a single clear objective
    #   - The computational budget is limited
    #   - Faster convergence is needed

    # The except clause above returns early, so best_params is always bound here.
    return best_params, optimizer.best_score_


if __name__ == "__main__":
    best_params, best_score = main()
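
For comparison, a minimal sketch of the same two-objective setup written directly against Optuna's NSGAIISampler (assuming the optuna, scikit-learn, and numpy packages; hyperactive is not used here). In a genuine multi-objective study, study.best_trials holds the Pareto front rather than a single best result:

    import numpy as np
    import optuna
    from sklearn.datasets import load_digits
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_val_score

    X, y = load_digits(return_X_y=True)

    def objective(trial):
        params = {
            "n_estimators": trial.suggest_int("n_estimators", 10, 200),
            "max_depth": trial.suggest_int("max_depth", 1, 20),
        }
        model = RandomForestClassifier(random_state=42, **params)
        accuracy = np.mean(cross_val_score(model, X, y, cv=3))
        complexity = params["n_estimators"] * params["max_depth"] * X.shape[1]
        return -accuracy, complexity / 10000  # both objectives are minimized

    sampler = optuna.samplers.NSGAIISampler(
        population_size=20, mutation_prob=0.1, crossover_prob=0.9, seed=42
    )
    study = optuna.create_study(directions=["minimize", "minimize"], sampler=sampler)
    study.optimize(objective, n_trials=50)

    # study.best_trials is the Pareto front; pick one, e.g. the most accurate:
    best = min(study.best_trials, key=lambda t: t.values[0])
    print(best.params, best.values)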