"""
GridSampler Example - Exhaustive Grid Search

The GridSampler performs exhaustive search over a discretized parameter grid.
It systematically evaluates every combination of specified parameter values,
ensuring complete coverage but potentially requiring many evaluations.

Characteristics:
- Exhaustive search over predefined parameter grids
- Systematic and reproducible exploration
- Guarantees finding the best combination within the grid
- No learning or adaptation
- Best for small, discrete parameter spaces
- Interpretable and deterministic results
"""

import numpy as np
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

from hyperactive.experiment.integrations import SklearnCvExperiment
from hyperactive.opt.optuna import GridSampler


def grid_search_theory():
    """Explain grid search methodology."""
    # Grid Search Methodology:
    #
    # 1. Parameter Discretization:
    #    - Each continuous parameter divided into discrete levels
    #    - Categorical parameters use all specified values
    #    - Creates n₁ × n₂ × ... × nₖ total combinations
    #
    # 2. Systematic Evaluation:
    #    - Every combination evaluated exactly once
    #    - No randomness or learning involved
    #    - Order of evaluation is deterministic
    #
    # 3. Optimality Guarantees:
    #    - Finds global optimum within the discrete grid
    #    - Quality depends on grid resolution
    #    - May miss optimal values between grid points
    #
    # 4. Computational Complexity:
    #    - Exponential growth with number of parameters
    #    - Curse of dimensionality for many parameters
    #    - Embarrassingly parallel
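
    # Tiny worked illustration (toy grid with arbitrary values, chosen only for this
    # example): the grid size is simply the product of the per-parameter level counts.
    from itertools import product

    toy_grid = {"a": [1, 2, 3], "b": ["x", "y"], "c": [0.1, 0.2]}  # 3 × 2 × 2 grid
    n_combinations = len(list(product(*toy_grid.values())))  # 3 × 2 × 2 = 12
    print(f"  Example: a 3 × 2 × 2 toy grid has {n_combinations} combinations")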


def demonstrate_curse_of_dimensionality():
    """Show how grid search scales with dimensions."""
    print("Grid search scaling (curse of dimensionality):")

    scenarios = [
        (2, 5, "2 parameters × 5 values each"),
        (3, 5, "3 parameters × 5 values each"),
        (4, 5, "4 parameters × 5 values each"),
        (5, 10, "5 parameters × 10 values each"),
        (10, 3, "10 parameters × 3 values each"),
    ]

    for n_params, n_values, description in scenarios:
        total_combinations = n_values ** n_params
        print(f"  {description}: {total_combinations:,} combinations")

    print("  → Grid search works best with small parameter spaces!")


def main():
    # === GridSampler Example ===
    # Exhaustive Grid Search

    grid_search_theory()
    demonstrate_curse_of_dimensionality()

    # Load dataset - simple classification
    X, y = load_iris(return_X_y=True)
    print(f"Dataset: Iris classification ({X.shape[0]} samples, {X.shape[1]} features)")

    # Create experiment
    estimator = KNeighborsClassifier()
    experiment = SklearnCvExperiment(estimator=estimator, X=X, y=y, cv=5)

    # Define search space - DISCRETE values only for grid search
    param_space = {
        "n_neighbors": [1, 3, 5, 7, 11, 15, 21],  # 7 values
        "weights": ["uniform", "distance"],  # 2 values
        "metric": ["euclidean", "manhattan", "minkowski"],  # 3 values
        "p": [1, 2],  # 2 values; only used by the minkowski metric
    }

    # Total combinations: 7 × 2 × 3 × 2 = 84 combinations
    total_combinations = 1
    for param, values in param_space.items():
        total_combinations *= len(values)

    # Report the search space (discrete grids only)
    print("Search space (discrete grids only):")
    for param, values in param_space.items():
        print(f"  {param}: {values} ({len(values)} values)")
    print(f"Total combinations: {total_combinations}")

    # Configure GridSampler
    optimizer = GridSampler(
        param_space=param_space,
        n_trials=total_combinations,  # Will evaluate all combinations
        random_state=42,  # For deterministic ordering
        experiment=experiment,
    )

    # GridSampler Configuration:
    # n_trials: matches total combinations
    # search_space: automatically derived from param_space
    # Systematic evaluation of every combination

    # Run optimization
    # Running exhaustive grid search...
    best_params = optimizer.run()

    # Results
    print("\n=== Results ===")
    print(f"Best parameters: {best_params}")
    print(f"Best score: {optimizer.best_score_:.4f}")
    print()

    # Grid Search Characteristics:
    #
    # Exhaustive Coverage:
    # - Evaluated all parameter combinations
    # - Guaranteed to find best configuration within grid
    # - No risk of missing good regions

    # Reproducibility:
    # - Same grid → same results every time
    # - Deterministic evaluation order
    # - No randomness or hyperparameters

    # Interpretability:
    # - Easy to understand methodology
    # - Clear relationship between grid density and accuracy
    # - Results easily visualized and analyzed

    # Grid Design Considerations:
    #
    # Parameter Value Selection:
    # Include reasonable ranges for each parameter
    # Use domain knowledge to choose meaningful values
    # Consider logarithmic spacing for scale-sensitive parameters
    # Start coarse, then refine around promising regions
    # (see refine_grid_sketch below for a minimal illustration)
    #
    # Computational Budget:
    # Balance grid density with available compute
    # Consider parallel evaluation to speed up
    # (see parallel_grid_sketch below for a minimal illustration)
    # Use coarse grids for initial exploration
    #
    # Best Use Cases:
    # Small parameter spaces (< 6 parameters)
    # Discrete/categorical parameters
    # When exhaustive evaluation is feasible
    # Baseline comparison for other methods
    # When interpretability is crucial
    # Parallel computing environments
    #
    # Limitations:
    # Exponential scaling with parameter count
    # May miss optimal values between grid points
    # Inefficient for continuous parameters
    # No adaptive learning or focusing
    # Can waste evaluations in clearly bad regions
    #
    # Grid Search vs Other Methods:
    #
    # vs Random Search:
    # + Systematic coverage guarantee
    # + Reproducible results
    # - Exponential scaling
    # - Less efficient in high dimensions
    #
    # vs Bayesian Optimization:
    # + No assumptions about objective function
    # + Guaranteed to find grid optimum
    # - Much less sample efficient
    # - No learning from previous evaluations

    return best_params, optimizer.best_score_
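

# Minimal sketch of the "start coarse, then refine" advice in main(). Illustrative only
# and not part of the hyperactive API; the unit step size and the focus on n_neighbors
# are assumptions made for this example.
def refine_grid_sketch(best_n_neighbors):
    """Build a denser n_neighbors grid around the best value from a coarse pass."""
    # The coarse pass used [1, 3, 5, 7, 11, 15, 21]; refine with unit steps around the winner.
    lower = max(1, best_n_neighbors - 2)
    return {"n_neighbors": list(range(lower, best_n_neighbors + 3))}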
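

# Minimal sketch of why grid search is "embarrassingly parallel": every grid point can be
# scored independently. This uses plain joblib + scikit-learn as an illustration; it is
# not hyperactive's own parallelism mechanism, and the tiny KNN grid here is an assumption
# made for this example.
def parallel_grid_sketch(X, y):
    """Score a small KNN grid in parallel and return the best (params, score) pair."""
    from itertools import product

    from joblib import Parallel, delayed

    grid = {"n_neighbors": [1, 3, 5], "weights": ["uniform", "distance"]}
    points = [dict(zip(grid, combo)) for combo in product(*grid.values())]

    def mean_cv_score(params):
        return cross_val_score(KNeighborsClassifier(**params), X, y, cv=5).mean()

    # Each grid point is scored in its own worker; no evaluation depends on another.
    scores = Parallel(n_jobs=-1)(delayed(mean_cv_score)(p) for p in points)
    best_index = int(np.argmax(scores))
    return points[best_index], scores[best_index]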


if __name__ == "__main__":
    best_params, best_score = main()