| Metric      | Value |
| ----------- | ----- |
| Conditions  | 2     |
| Total Lines | 116   |
| Code Lines  | 26    |
| Lines       | 0     |
| Ratio       | 0 %   |
| Changes     | 0     |
Small methods make your code easier to understand, especially when combined with a good name. Moreover, if a method is small, finding a good name for it is usually much easier.
For example, if you find yourself adding comments to a method's body, that is usually a sign that the commented part should be extracted into a new method, with the comment serving as the starting point for the new method's name; a short sketch of this pattern follows the list below.
Commonly applied refactorings include:

- Extract Method

If many parameters or temporary variables are present, also consider:

- Replace Temp with Query
- Introduce Parameter Object
- Preserve Whole Object
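As referenced above, here is a minimal sketch of the comment-to-method-name pattern. The order-shipping code and the names `ship`, `send_to_warehouse`, and `validate_shipping_address` are invented for illustration and unrelated to the finding below:

```python
# Before: a comment labels one step inside a longer method
def ship(order):
    # validate the shipping address
    if not order.address or not order.address.zip_code:
        raise ValueError("incomplete shipping address")
    send_to_warehouse(order)


# After: the commented block is extracted, and the comment becomes
# the starting point for the new method's name
def ship(order):
    validate_shipping_address(order)
    send_to_warehouse(order)


def validate_shipping_address(order):
    if not order.address or not order.address.zip_code:
        raise ValueError("incomplete shipping address")
```

The method flagged by this finding follows.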
1 | """ |
||
70 | def main(): |
||
71 | # === GridSampler Example === |
||
72 | # Exhaustive Grid Search |
||
73 | |||
74 | grid_search_theory() |
||
75 | demonstrate_curse_of_dimensionality() |
||
76 | |||
77 | # Load dataset - simple classification |
||
78 | X, y = load_iris(return_X_y=True) |
||
79 | print(f"Dataset: Iris classification ({X.shape[0]} samples, {X.shape[1]} features)") |
||
80 | |||
81 | # Create experiment |
||
82 | estimator = KNeighborsClassifier() |
||
83 | experiment = SklearnCvExperiment(estimator=estimator, X=X, y=y, cv=5) |
||
84 | |||
85 | # Define search space - DISCRETE values only for grid search |
||
86 | param_space = { |
||
87 | "n_neighbors": [1, 3, 5, 7, 11, 15, 21], # 7 values |
||
88 | "weights": ["uniform", "distance"], # 2 values |
||
89 | "metric": ["euclidean", "manhattan", "minkowski"], # 3 values |
||
90 | "p": [1, 2], # Only relevant for minkowski metric # 2 values |
||
91 | } |
||
92 | |||
93 | # Total combinations: 7 × 2 × 3 × 2 = 84 combinations |
||
94 | total_combinations = 1 |
||
95 | for param, values in param_space.items(): |
||
96 | total_combinations *= len(values) |
||
97 | |||
98 | # Search Space (Discrete grids only): |
||
99 | # for param, values in param_space.items(): |
||
100 | # print(f" {param}: {values} ({len(values)} values)") |
||
101 | # Total combinations: calculated above |
||
102 | |||
103 | # Configure GridSampler |
||
104 | optimizer = GridSampler( |
||
105 | param_space=param_space, |
||
106 | n_trials=total_combinations, # Will evaluate all combinations |
||
107 | random_state=42, # For deterministic ordering |
||
108 | experiment=experiment, |
||
109 | ) |
||
110 | |||
111 | # GridSampler Configuration: |
||
112 | # n_trials: matches total combinations |
||
113 | # search_space: automatically derived from param_space |
||
114 | # Systematic evaluation of every combination |
||
115 | |||
116 | # Run optimization |
||
117 | # Running exhaustive grid search... |
||
118 | best_params = optimizer.run() |
||
119 | |||
120 | # Results |
||
121 | print("\n=== Results ===") |
||
122 | print(f"Best parameters: {best_params}") |
||
123 | print(f"Best score: {optimizer.best_score_:.4f}") |
||
124 | print() |
||
125 | |||
126 | # Grid Search Characteristics: |
||
127 | # |
||
128 | # Exhaustive Coverage: |
||
129 | # - Evaluated all parameter combinations |
||
130 | # - Guaranteed to find best configuration within grid |
||
131 | # - No risk of missing good regions |
||
132 | |||
133 | # Reproducibility: |
||
134 | # - Same grid → same results every time |
||
135 | # - Deterministic evaluation order |
||
136 | # - No randomness or hyperparameters |
||
137 | |||
138 | # Interpretability: |
||
139 | # - Easy to understand methodology |
||
140 | # - Clear relationship between grid density and accuracy |
||
141 | # - Results easily visualized and analyzed |
||
142 | |||
143 | # Grid Design Considerations: |
||
144 | # |
||
145 | # Parameter Value Selection: |
||
146 | # Include reasonable ranges for each parameter |
||
147 | # Use domain knowledge to choose meaningful values |
||
148 | # Consider logarithmic spacing for scale-sensitive parameters |
||
149 | # Start coarse, then refine around promising regions |
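    #   For example (illustrative): a scale-sensitive value such as a
    #   regularization strength is better gridded as np.logspace(-3, 3, 7)
    #   → [0.001, 0.01, 0.1, 1, 10, 100, 1000] than as
    #   np.linspace(0.001, 1000, 7), which puts six of its seven points above 150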
||
    #
    # Computational Budget:
    # - Balance grid density with available compute
    # - Consider parallel evaluation to speed up the search
    # - Use coarse grids for initial exploration
    #
    # Best Use Cases:
    # - Small parameter spaces (< 6 parameters)
    # - Discrete/categorical parameters
    # - When exhaustive evaluation is feasible
    # - Baseline comparison for other methods
    # - When interpretability is crucial
    # - Parallel computing environments
    #
    # Limitations:
    # - Exponential scaling with parameter count
    # - May miss optimal values between grid points
    # - Inefficient for continuous parameters
    # - No adaptive learning or focusing
    # - Can waste evaluations in clearly bad regions
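    #   For example, at 5 values per parameter a grid has 5**3 = 125
    #   combinations for 3 parameters, 15,625 for 6, and 9,765,625 for 10 -
    #   the growth that demonstrate_curse_of_dimensionality() illustrates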
||
    #
    # Grid Search vs Other Methods:
    #
    # vs Random Search:
    # + Systematic coverage guarantee
    # + Reproducible results
    # - Exponential scaling
    # - Less efficient in high dimensions
    #
    # vs Bayesian Optimization:
    # + No assumptions about the objective function
    # + Guaranteed to find the grid optimum
    # - Much less sample efficient
    # - No learning from previous evaluations

    return best_params, optimizer.best_score_
```
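Applying the Extract Method advice from the top of this finding, the comment-labeled steps of `main()` could become small, named helpers. This is an illustrative sketch only: the helper names are invented here, and `SklearnCvExperiment`, `GridSampler`, `grid_search_theory`, and `demonstrate_curse_of_dimensionality` are assumed to be available as in the original module:

```python
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

# SklearnCvExperiment, GridSampler, grid_search_theory, and
# demonstrate_curse_of_dimensionality come from the original module
# (their definitions/imports are in the elided lines 2-69).


def load_dataset():
    # "Load dataset - simple classification" becomes the method name
    X, y = load_iris(return_X_y=True)
    print(f"Dataset: Iris classification ({X.shape[0]} samples, {X.shape[1]} features)")
    return X, y


def build_param_space():
    # "Define search space" - discrete values only, as grid search requires
    return {
        "n_neighbors": [1, 3, 5, 7, 11, 15, 21],
        "weights": ["uniform", "distance"],
        "metric": ["euclidean", "manhattan", "minkowski"],
        "p": [1, 2],
    }


def count_combinations(param_space):
    # "Total combinations" - the product of the per-parameter grid sizes
    total = 1
    for values in param_space.values():
        total *= len(values)
    return total


def main():
    grid_search_theory()
    demonstrate_curse_of_dimensionality()

    X, y = load_dataset()
    experiment = SklearnCvExperiment(
        estimator=KNeighborsClassifier(), X=X, y=y, cv=5
    )

    param_space = build_param_space()
    optimizer = GridSampler(
        param_space=param_space,
        n_trials=count_combinations(param_space),  # evaluate every combination
        random_state=42,
        experiment=experiment,
    )
    best_params = optimizer.run()

    print("\n=== Results ===")
    print(f"Best parameters: {best_params}")
    print(f"Best score: {optimizer.best_score_:.4f}")
    return best_params, optimizer.best_score_
```

Each helper now fits on a screen, and the comments that labeled the steps have become method names.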