gp_sampler_example - Code Metrics - Inspection of "[ENH] `optuna` optimizer interface (#155)" - SimonBlanke/Hyperactive - Measure and Improve Code Quality continuously with Scrutinizer

Passed

Push — master ( c241e4...b050e9 )

by Simon

created 2025-08-16 16:30 UTC

gp_sampler_example A

Size/Duplication

Total Lines	163
Duplicated Lines	0 %


Metric	Value
wmc	2
eloc	37
dl	0
loc	163
rs	10
c	0
b	0
f	0

2 Functions

Rating	Name	Duplication	Size	Complexity
A	main()	0	107	1
A	gaussian_process_theory()	0	2	1

"""
GPSampler Example - Gaussian Process Bayesian Optimization

The GPSampler uses Gaussian Processes to model the objective function and
select promising parameter configurations. It's particularly effective for
expensive function evaluations and provides uncertainty estimates.

Characteristics:
- Bayesian optimization with Gaussian Process surrogate model
- Balances exploration (high uncertainty) and exploitation (high mean)
- Works well with mixed parameter types
- Provides uncertainty quantification
- Efficient for expensive objective functions
- Can handle constraints and noisy observations
"""

from sklearn.datasets import load_breast_cancer
from sklearn.svm import SVC

from hyperactive.experiment.integrations import SklearnCvExperiment
from hyperactive.opt.optuna import GPSampler


def gaussian_process_theory():
    """Summarize GP-based Bayesian optimization (comments only; no side effects)."""
    # Gaussian Process Bayesian Optimization:
    #
    # 1. Surrogate Model:
    #    - GP posterior at each point x: f(x) | data ~ N(μ(x), σ²(x))
    #    - μ(x): predicted mean (expected objective value)
    #    - σ²(x): predicted variance (uncertainty estimate)
    #
    # 2. Acquisition Function:
    #    - Balances exploration vs exploitation
    #    - Common choices: Expected Improvement (EI), Upper Confidence Bound (UCB)
    #    - Selects next point to evaluate: x_next = argmax acquisition(x)
    #
    # 3. Iterative Process:
    #    - Fit GP to observed data (x_i, f(x_i))
    #    - Optimize acquisition function to find x_next
    #    - Evaluate f(x_next)
    #    - Update dataset and repeat
    #
    # 4. Key Advantages:
    #    - Uncertainty-aware: explores uncertain regions
    #    - Sample efficient: good for expensive evaluations
    #    - Principled: grounded in Bayesian inference


def main():
    """Tune an SVC on the breast cancer dataset with GPSampler and return the best result."""
    # === GPSampler Example ===
    # Gaussian Process Bayesian Optimization

    gaussian_process_theory()

    # Load dataset - classification problem
    X, y = load_breast_cancer(return_X_y=True)
    print(
        f"Dataset: Breast cancer classification ({X.shape[0]} samples, {X.shape[1]} features)"
    )

    # Create experiment
    estimator = SVC(random_state=42)
    experiment = SklearnCvExperiment(estimator=estimator, X=X, y=y, cv=5)

    # Define search space - mixed parameter types
    param_space = {
        "C": (0.01, 100),  # Continuous - inverse regularization strength
        "gamma": (1e-6, 1e2),  # Continuous - kernel coefficient (rbf, poly, sigmoid)
        "kernel": ["rbf", "poly", "sigmoid"],  # Categorical
        "degree": (2, 5),  # Integer - polynomial degree (used by "poly" only)
        "coef0": (0.0, 1.0),  # Continuous - independent kernel term (poly, sigmoid)
    }

    # Search Space (Mixed parameter types):
    # for param, space in param_space.items():
    #   print(f"  {param}: {space}")

    # Configure GPSampler
    optimizer = GPSampler(
        param_space=param_space,
        n_trials=25,  # Fewer trials - GP is sample efficient
        random_state=42,
        experiment=experiment,
        n_startup_trials=8,  # Random initialization before GP modeling
        deterministic_objective=False,  # Set True if objective is noise-free
    )

    # GPSampler configuration:
    #   n_trials: total number of evaluations (25)
    #   n_startup_trials: random trials run before the GP is fitted (8)
    #   deterministic_objective: False lets the sampler model observation noise
    #   Acquisition function: an Expected Improvement variant, chosen by the sampler

    # Run the GP-based optimization
    best_params = optimizer.run()

    # Results
    print("\n=== Results ===")
    print(f"Best parameters: {best_params}")
    print(f"Best score: {optimizer.best_score_:.4f}")
    print()

    # GP Optimization Phases:
    #
    # Phase 1 (Trials 1-8): Random Exploration
    #  Random sampling for initial GP training data
    #  Builds diverse set of observations
    #  No model assumptions yet

    # Phase 2 (Trials 9-25): GP-guided Search
    #  GP model learns from observed data
    #  Acquisition function balances:
    #   - Exploitation: areas with high predicted performance
    #   - Exploration: areas with high uncertainty
    #  Sequential decision making with uncertainty

    # GP Model Characteristics:
    #  Handles mixed parameter types (continuous, discrete, categorical)
    #  Provides uncertainty estimates for all predictions
    #  Automatically balances exploration vs exploitation
    #  Sample efficient - good for expensive evaluations
    #  In general, a GP can encode prior knowledge through its mean/kernel functions

    # Acquisition Function Behavior:
    #  High mean + low variance → exploitation
    #  Low mean + high variance → exploration
    #  Balanced trade-off prevents premature convergence
    #  Adapts exploration strategy based on observed data

    # Best Use Cases:
    #  Expensive objective function evaluations
    #  Small to medium parameter spaces (< 20 dimensions)
    #  When uncertainty quantification is valuable
    #  Mixed parameter types (continuous + categorical)
    #  Noisy objective functions (with appropriate kernel)

    # Limitations:
    #  Computational cost grows with number of observations
    #  GP kernel hyperparameters must be fitted, which adds overhead per trial
    #  May struggle in very high dimensions
    #  Assumes some smoothness in objective function

    # Comparison with TPESampler:
    # GPSampler advantages:
    #   + Principled uncertainty quantification
    #   + Better for expensive evaluations
    #   + Can handle constraints naturally
    #
    # TPESampler advantages:
    #   + Faster computation
    #   + Better scalability to high dimensions
    #   + More robust hyperparameter defaults

    return best_params, optimizer.best_score_


if __name__ == "__main__":
    best_params, best_score = main()
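
The iterative process described in gaussian_process_theory() (fit a GP, maximize an acquisition function, evaluate, repeat) can be sketched with the standard library alone. This is an illustrative toy under stated assumptions, not the implementation used by Optuna or Hyperactive: the objective `toy_objective`, the fixed RBF length scale, the hand-rolled Cholesky solver, and the grid search over the acquisition function are all hypothetical simplifications chosen to keep the example short.

```python
import math
import random


def rbf(a, b, length=0.2):
    """Squared-exponential kernel between two scalars (fixed length scale, an assumption)."""
    return math.exp(-0.5 * ((a - b) / length) ** 2)


def cholesky(A):
    """Lower-triangular Cholesky factor of a small symmetric positive-definite matrix."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            # Clamp the diagonal to guard against round-off in near-singular matrices.
            L[i][j] = math.sqrt(max(s, 1e-12)) if i == j else s / L[j][j]
    return L


def solve_lower(L, b):
    """Forward substitution: solve L z = b."""
    z = []
    for i in range(len(b)):
        z.append((b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
    return z


def solve_upper_t(L, z):
    """Back substitution: solve L^T x = z."""
    n = len(z)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def gp_posterior(xs, ys, x_new, noise=1e-6):
    """Posterior mean μ(x_new) and standard deviation σ(x_new) of a zero-mean GP."""
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    L = cholesky(K)
    alpha = solve_upper_t(L, solve_lower(L, ys))
    k_star = [rbf(a, x_new) for a in xs]
    mu = sum(k * a for k, a in zip(k_star, alpha))
    v = solve_lower(L, k_star)
    var = max(rbf(x_new, x_new) - sum(t * t for t in v), 1e-12)
    return mu, math.sqrt(var)


def expected_improvement(mu, sigma, best):
    """Expected Improvement for maximization: E[max(f - best, 0)] under N(mu, sigma^2)."""
    z = (mu - best) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)
    cdf = 0.5 * (1 + math.erf(z / math.sqrt(2)))
    return (mu - best) * cdf + sigma * pdf


def toy_objective(x):
    """Hypothetical stand-in for an expensive score; its maximum is at x = 0.3."""
    return -(x - 0.3) ** 2


def bayes_opt(n_startup=4, n_trials=12, seed=0):
    """Run random startup trials, then GP-guided trials; return the best (x, y) seen."""
    rng = random.Random(seed)
    xs = [rng.random() for _ in range(n_startup)]  # Phase 1: random exploration
    ys = [toy_objective(x) for x in xs]
    grid = [i / 200 for i in range(201)]  # candidate points in [0, 1]
    for _ in range(n_trials - n_startup):  # Phase 2: GP-guided search
        best = max(ys)
        scores = [expected_improvement(*gp_posterior(xs, ys, g), best) for g in grid]
        x_next = grid[scores.index(max(scores))]  # x_next = argmax acquisition(x)
        xs.append(x_next)
        ys.append(toy_objective(x_next))
    i_best = ys.index(max(ys))
    return xs[i_best], ys[i_best]


if __name__ == "__main__":
    x_best, y_best = bayes_opt()
    print(f"best x: {x_best:.3f}, best y: {y_best:.5f}")
```

The two loops mirror the phases noted in main(): a handful of random startup trials, followed by trials chosen where Expected Improvement is largest. A production sampler would also fit the kernel hyperparameters and optimize the acquisition function with a proper optimizer rather than a fixed grid.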


