| Metric | Value |
| --- | --- |
| Conditions | 1 |
| Total Lines | 107 |
| Code Lines | 26 |
| Lines | 0 |
| Ratio | 0 % |
| Changes | 0 |
Small methods make your code easier to understand, especially when combined with a good name. Conversely, if your method is small, finding a good name is usually much easier.
For example, if you find yourself adding comments to a method's body, that is usually a sign that the commented part should be extracted into a new method, with the comment serving as a starting point for the new method's name.
Commonly applied refactorings include Extract Method: move a coherent block of statements into its own, well-named method (a brief sketch follows). If many parameters or temporary variables are present, refactorings such as Replace Temp with Query or Introduce Parameter Object can reduce them before extracting.
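To make this concrete, here is a minimal, hypothetical Extract Method sketch. The names (process_order, calculate_total_price_including_tax, the order fields) are invented for illustration and are not taken from the code reported below; the point is that the commented block becomes a new method and the comment turns into its name.

```python
# Before: a long method where a comment labels one logical step.
def process_order(order):
    # ... validation, logging, etc. ...

    # calculate the total price including tax
    total = sum(item.price * item.quantity for item in order.items)
    total += total * order.tax_rate

    # ... shipping, persistence, etc. ...
    return total


# After: the commented step is extracted; the comment becomes the method name.
def process_order(order):
    # ... validation, logging, etc. ...
    total = calculate_total_price_including_tax(order)
    # ... shipping, persistence, etc. ...
    return total


def calculate_total_price_including_tax(order):
    total = sum(item.price * item.quantity for item in order.items)
    return total + total * order.tax_rate
```

The method reported below shows the same pattern at a larger scale: each of its commented sections (load dataset, create experiment, define search space, configure the sampler, report results) is a natural extraction candidate.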
1 | """ |
||
52 | def main(): |
||
53 | # === GPSampler Example === |
||
54 | # Gaussian Process Bayesian Optimization |
||
55 | |||
56 | gaussian_process_theory() |
||
57 | |||
58 | # Load dataset - classification problem |
||
59 | X, y = load_breast_cancer(return_X_y=True) |
||
60 | print( |
||
61 | f"Dataset: Breast cancer classification ({X.shape[0]} samples, {X.shape[1]} features)" |
||
62 | ) |
||
63 | |||
64 | # Create experiment |
||
65 | estimator = SVC(random_state=42) |
||
66 | experiment = SklearnCvExperiment(estimator=estimator, X=X, y=y, cv=5) |
||
67 | |||
68 | # Define search space - mixed parameter types |
||
69 | param_space = { |
||
70 | "C": (0.01, 100), # Continuous - regularization |
||
71 | "gamma": (1e-6, 1e2), # Continuous - RBF parameter |
||
72 | "kernel": ["rbf", "poly", "sigmoid"], # Categorical |
||
73 | "degree": (2, 5), # Integer - polynomial degree |
||
74 | "coef0": (0.0, 1.0), # Continuous - kernel coefficient |
||
75 | } |
||
76 | |||
77 | # Search Space (Mixed parameter types): |
||
78 | # for param, space in param_space.items(): |
||
79 | # print(f" {param}: {space}") |
||
80 | |||
81 | # Configure GPSampler |
||
82 | optimizer = GPSampler( |
||
83 | param_space=param_space, |
||
84 | n_trials=25, # Fewer trials - GP is sample efficient |
||
85 | random_state=42, |
||
86 | experiment=experiment, |
||
87 | n_startup_trials=8, # Random initialization before GP modeling |
||
88 | deterministic_objective=False, # Set True if objective is noise-free |
||
89 | ) |
||
90 | |||
91 | # GPSampler Configuration: |
||
92 | # n_trials: configured above |
||
93 | # n_startup_trials: random initialization |
||
94 | # deterministic_objective: configures noise handling |
||
95 | # Acquisition function: Expected Improvement (default) |
||
96 | |||
97 | # Run optimization |
||
98 | # Running GP-based optimization... |
||
99 | best_params = optimizer.run() |
||
100 | |||
101 | # Results |
||
102 | print("\n=== Results ===") |
||
103 | print(f"Best parameters: {best_params}") |
||
104 | print(f"Best score: {optimizer.best_score_:.4f}") |
||
105 | print() |
||
106 | |||
107 | # GP Optimization Phases: |
||
108 | # |
||
109 | # Phase 1 (Trials 1-8): Random Exploration |
||
110 | # Random sampling for initial GP training data |
||
111 | # Builds diverse set of observations |
||
112 | # No model assumptions yet |
||
113 | |||
114 | # Phase 2 (Trials 9-25): GP-guided Search |
||
115 | # GP model learns from observed data |
||
116 | # Acquisition function balances: |
||
117 | # - Exploitation: areas with high predicted performance |
||
118 | # - Exploration: areas with high uncertainty |
||
119 | # Sequential decision making with uncertainty |
||
120 | |||
121 | # GP Model Characteristics: |
||
122 | # Handles mixed parameter types (continuous, discrete, categorical) |
||
123 | # Provides uncertainty estimates for all predictions |
||
124 | # Automatically balances exploration vs exploitation |
||
125 | # Sample efficient - good for expensive evaluations |
||
126 | # Can incorporate prior knowledge through mean/kernel functions |
||
127 | |||
128 | # Acquisition Function Behavior: |
||
129 | # High mean + low variance → exploitation |
||
130 | # Low mean + high variance → exploration |
||
131 | # Balanced trade-off prevents premature convergence |
||
132 | # Adapts exploration strategy based on observed data |
||
133 | |||
134 | # Best Use Cases: |
||
135 | # Expensive objective function evaluations |
||
136 | # Small to medium parameter spaces (< 20 dimensions) |
||
137 | # When uncertainty quantification is valuable |
||
138 | # Mixed parameter types (continuous + categorical) |
||
139 | # Noisy objective functions (with appropriate kernel) |
||
140 | |||
141 | # Limitations: |
||
142 | # Computational cost grows with number of observations |
||
143 | # Hyperparameter tuning for GP kernel |
||
144 | # May struggle in very high dimensions |
||
145 | # Assumes some smoothness in objective function |
||
146 | |||
147 | # Comparison with TPESampler: |
||
148 | # GPSampler advantages: |
||
149 | # + Principled uncertainty quantification |
||
150 | # + Better for expensive evaluations |
||
151 | # + Can handle constraints naturally |
||
152 | # |
||
153 | # TPESampler advantages: |
||
154 | # + Faster computation |
||
155 | # + Better scalability to high dimensions |
||
156 | # + More robust hyperparameter defaults |
||
157 | |||
158 | return best_params, optimizer.best_score_ |
||
159 | |||
163 |
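Read against the advice above, the reported main() already marks its seams with comments. The following is a hedged sketch of one possible decomposition, assuming the same GPSampler / SklearnCvExperiment API used in the listing; the helper names (load_dataset, build_experiment, build_search_space, build_optimizer, report_results) are invented for illustration, not part of the reported code.

```python
def load_dataset():
    # Load dataset - classification problem
    X, y = load_breast_cancer(return_X_y=True)
    print(
        f"Dataset: Breast cancer classification ({X.shape[0]} samples, {X.shape[1]} features)"
    )
    return X, y


def build_experiment(X, y):
    # Create experiment: 5-fold cross-validated SVC
    estimator = SVC(random_state=42)
    return SklearnCvExperiment(estimator=estimator, X=X, y=y, cv=5)


def build_search_space():
    # Define search space - mixed parameter types
    return {
        "C": (0.01, 100),
        "gamma": (1e-6, 1e2),
        "kernel": ["rbf", "poly", "sigmoid"],
        "degree": (2, 5),
        "coef0": (0.0, 1.0),
    }


def build_optimizer(param_space, experiment):
    # Configure GPSampler with random startup trials before GP modeling
    return GPSampler(
        param_space=param_space,
        n_trials=25,
        random_state=42,
        experiment=experiment,
        n_startup_trials=8,
        deterministic_objective=False,
    )


def report_results(best_params, best_score):
    print("\n=== Results ===")
    print(f"Best parameters: {best_params}")
    print(f"Best score: {best_score:.4f}")


def main():
    gaussian_process_theory()
    X, y = load_dataset()
    experiment = build_experiment(X, y)
    param_space = build_search_space()
    optimizer = build_optimizer(param_space, experiment)
    best_params = optimizer.run()
    report_results(best_params, optimizer.best_score_)
    return best_params, optimizer.best_score_
```

The long explanatory comment blocks (optimization phases, model characteristics, sampler comparison) could then live in the module docstring or in documentation rather than inside the method body.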
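The listing's comments also describe the acquisition behaviour ("high mean + low variance → exploitation, low mean + high variance → exploration") and name Expected Improvement as the default criterion. As a minimal sketch of that formula for a maximization problem under a Gaussian posterior, independent of whatever GPSampler implements internally:

```python
import numpy as np
from scipy.stats import norm


def expected_improvement(mu, sigma, best_so_far):
    """EI for maximization: E[max(f(x) - f*, 0)] under a Gaussian posterior."""
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    improvement = mu - best_so_far
    with np.errstate(divide="ignore", invalid="ignore"):
        z = improvement / sigma
        ei = improvement * norm.cdf(z) + sigma * norm.pdf(z)
    # With zero predictive variance there is no uncertainty left to exploit.
    return np.where(sigma > 0, ei, np.maximum(improvement, 0.0))


# High mean + low variance -> exploitation; low mean + high variance -> exploration.
print(expected_improvement(mu=[0.95, 0.80], sigma=[0.01, 0.20], best_so_far=0.90))
```

In this small example, the uncertain point scores a non-trivial EI even though its mean is below the best observed value, which is exactly the exploration behaviour the comments describe.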