"""
QMCSampler Example - Quasi-Monte Carlo Sampling

The QMCSampler uses Quasi-Monte Carlo sequences (like Sobol or Halton)
to generate low-discrepancy samples. These sequences provide better
coverage of the parameter space compared to purely random sampling.

Characteristics:
- Low-discrepancy sequences for uniform space filling
- Better convergence than random sampling
- Deterministic sequence generation
- Excellent space coverage properties
- No learning from previous evaluations
- Good baseline for comparison with adaptive methods

QMC sequences are particularly effective for:
- Integration and sampling problems
- Initial design of experiments
- Baseline optimization comparisons
"""

import numpy as np
from sklearn.datasets import load_wine
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

from hyperactive.experiment.integrations import SklearnCvExperiment
from hyperactive.opt.optuna import QMCSampler


def qmc_theory():
    """Explain Quasi-Monte Carlo theory."""
    # Quasi-Monte Carlo (QMC) Theory:
    #
    # 1. Low-Discrepancy Sequences:
    #    - Deterministic sequences that fill space uniformly
    #    - Better distribution than random sampling
    #    - Minimize gaps and clusters in parameter space
    #    - Convergence rate O(log(N)^d / N) vs O(1/√N) for random sampling
    #      (see compare_discrepancy below)
    #
    # 2. Common QMC Sequences:
    #    - Sobol: Based on binary representations
    #    - Halton: Based on prime number bases
    #    - Faure: Generalization of Halton sequences
    #    - Each has different strengths for different dimensions
    #
    # 3. Space-Filling Properties:
    #    - Stratification: Even coverage of parameter regions
    #    - Low discrepancy: Uniform distribution approximation
    #    - Correlation breaking: Reduces clustering
    #
    # 4. Advantages over Random Sampling:
    #    - Better convergence for integration
    #    - More uniform exploration
    #    - Reproducible sequences
    #    - No unlucky clustering


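# A minimal, self-contained sketch of the convergence claim above, assuming
# scipy is available (it is not otherwise required by this example, and the
# names below are illustrative, not part of the hyperactive API). Lower
# discrepancy means more uniform space filling; call this directly to see
# the gap between Sobol and pseudo-random points.
def compare_discrepancy(n_points=128, dim=2, seed=42):
    """Compare the discrepancy of Sobol vs. pseudo-random samples."""
    from scipy.stats import qmc  # optional dependency, illustration only

    sobol_points = qmc.Sobol(d=dim, scramble=True, seed=seed).random(n_points)
    random_points = np.random.default_rng(seed).random((n_points, dim))

    # scipy's centered L2-discrepancy: smaller values = more uniform coverage
    print(f"Sobol discrepancy:  {qmc.discrepancy(sobol_points):.6f}")
    print(f"Random discrepancy: {qmc.discrepancy(random_points):.6f}")

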
|
def demonstrate_space_filling():
    """Demonstrate space-filling properties conceptually."""
    # Space-Filling Comparison:
    #
    # Random Sampling:
    #   Can have clusters and gaps
    #   Uneven coverage, especially with few samples
    #   Variance in coverage quality
    #   Some regions may be under-explored
    #
    # Quasi-Monte Carlo (QMC):
    #   Systematic space filling
    #   Even coverage with fewer samples
    #   Consistent coverage quality
    #   All regions explored proportionally
    #
    # Grid Sampling:
    #   Perfect regular coverage
    #   Exponential scaling with dimensions
    #   May miss optimal points between grid lines
    #
    # → QMC provides a balanced approach between random and grid sampling
    #   (see compare_cell_coverage below)


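# A companion sketch to the comparison above (same scipy assumption, names
# again illustrative): count how many cells of a 4x4 grid over [0, 1)^2 each
# sampler occupies with only 16 points. Scrambled Sobol points tend to occupy
# more cells than pseudo-random points do.
def compare_cell_coverage(n_points=16, grid=4, seed=7):
    """Count occupied grid cells for Sobol vs. pseudo-random 2D samples."""
    from scipy.stats import qmc  # optional dependency, illustration only

    samples = {
        "sobol": qmc.Sobol(d=2, scramble=True, seed=seed).random(n_points),
        "random": np.random.default_rng(seed).random((n_points, 2)),
    }
    for name, points in samples.items():
        # map each point to a grid cell; the set keeps only distinct cells
        cells = {(int(x * grid), int(y * grid)) for x, y in points}
        print(f"{name}: {len(cells)} of {grid * grid} cells occupied")

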
|
def main():
    # === QMCSampler Example ===
    # Quasi-Monte Carlo Low-Discrepancy Sampling

    qmc_theory()
    demonstrate_space_filling()

    # Load dataset
    X, y = load_wine(return_X_y=True)
    print(f"Dataset: Wine classification ({X.shape[0]} samples, {X.shape[1]} features)")

    # Create experiment
    estimator = LogisticRegression(random_state=42, max_iter=1000)
    experiment = SklearnCvExperiment(estimator=estimator, X=X, y=y, cv=5)

    # Define search space; the sketch below shows the unit-cube mechanics.
    # Note: not every sampled combination is valid for LogisticRegression
    # (e.g. solver="liblinear" does not support penalty="elasticnet", and
    # l1_ratio is only used with the elastic-net penalty); invalid draws may
    # raise errors or be penalized, depending on how the experiment handles
    # failures.
    param_space = {
        "C": (0.001, 100),  # Regularization strength
        "l1_ratio": (0.0, 1.0),  # Elastic net mixing parameter
        "solver": ["liblinear", "saga"],  # Solver algorithm
        "penalty": ["l1", "l2", "elasticnet"],  # Regularization type
    }

    # Search Space:
    # C: (0.001, 100) - Regularization strength
    # l1_ratio: (0.0, 1.0) - Elastic net mixing parameter
    # solver: ['liblinear', 'saga'] - Solver algorithm
    # penalty: ['l1', 'l2', 'elasticnet'] - Regularization type

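    # Sketch of the mechanics behind mixed spaces (scipy assumed; illustration
    # only, hyperactive/optuna handle this internally): QMC points are drawn
    # in the unit hypercube and rescaled to each parameter's bounds, and
    # categorical choices are typically obtained by discretizing a coordinate.
    from scipy.stats import qmc  # optional dependency, illustration only

    unit_points = qmc.Sobol(d=2, scramble=True, seed=42).random(4)
    # column 0 → C range, column 1 → l1_ratio range (a log scale is often
    # preferred for C in practice; linear matches the tuple above)
    scaled = qmc.scale(unit_points, l_bounds=[0.001, 0.0], u_bounds=[100.0, 1.0])
    print(f"Example (C, l1_ratio) QMC draws:\n{scaled}")
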
|
    # Configure QMCSampler
    optimizer = QMCSampler(
        param_space=param_space,
        n_trials=32,  # Power of 2 often works well for QMC
        random_state=42,
        experiment=experiment,
        qmc_type="sobol",  # Sobol or Halton sequences
        scramble=True,  # Randomized QMC (Owen scrambling)
    )

    # QMCSampler Configuration:
    # n_trials: 32 (power of 2 for better QMC properties)
    # qmc_type: 'sobol' sequence
    # scramble: True (randomized QMC)
    # Deterministic low-discrepancy sampling

    # QMC Sequence Types:
    # Sobol: Excellent for moderate dimensions, binary-based
    # Halton: Good for low dimensions, prime-based
    # Scrambling: Adds randomization while preserving uniformity (sketch below)

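    # Sketch of the scrambling point above (scipy assumed, illustration only):
    # independently scrambled replicates of the same Sobol construction differ
    # randomly, which enables Monte Carlo style error estimates while each
    # replicate keeps its low-discrepancy structure.
    from scipy.stats import qmc  # re-imported so the sketch stands alone

    replicate_means = [
        qmc.Sobol(d=2, scramble=True, seed=s).random(32).mean() for s in (1, 2, 3)
    ]
    print(f"Means of scrambled Sobol replicates: {np.round(replicate_means, 4)}")
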
|
    # Run optimization
    # Running QMC sampling optimization...
    best_params = optimizer.run()

    # Results
    print("\n=== Results ===")
    print(f"Best parameters: {best_params}")
    print(f"Best score: {optimizer.best_score_:.4f}")
    print()

    # QMC behavior analysis:
    #
    # QMC Sampling Analysis:
    #
    # Sequence Properties:
    #   Deterministic generation (reproducible with the same seed)
    #   Low discrepancy (uniform distribution approximation)
    #   Space-filling (systematic coverage of parameter space)
    #   Stratification (even coverage of all regions)
    #   No clustering or large gaps
    #
    # Sobol Sequence Characteristics:
    #   Binary-based construction
    #   Good equidistribution properties
    #   Effective for dimensions up to ~40
    #   Popular choice for QMC sampling
    #
    # Scrambling Benefits (when enabled):
    #   Breaks regularity patterns
    #   Provides Monte Carlo error estimates
    #   Maintains low-discrepancy properties
    #   Reduces correlation artifacts
    #
    # QMC vs Other Sampling Methods:
    #
    # vs Pure Random Sampling:
    #   + Better space coverage with fewer samples
    #   + More consistent performance
    #   + Faster convergence for integration-like problems
    #   - No true randomness (a drawback when genuine randomness is required)
    #
    # vs Grid Search:
    #   + Works well in higher dimensions
    #   + No exponential scaling
    #   + Covers continuous spaces naturally
    #   - No systematic guarantee of finding the grid optimum
    #
    # vs Adaptive Methods (TPE, GP):
    #   + No assumptions about the objective function
    #   + Embarrassingly parallel
    #   + Consistent performance regardless of function type
    #   - No learning from previous evaluations
    #   - May waste evaluations in clearly suboptimal regions
    #
    # Best Use Cases:
    #   Design of experiments (DoE)
    #   Initial exploration phase
    #   Baseline for comparing adaptive methods
    #   Integration and sampling problems
    #   When function evaluations are parallelizable
    #   Robustness testing across the parameter space
    #
    # Implementation Considerations:
    #   Use powers of 2 for n_trials with Sobol sequences (see sketch below)
    #   Consider scrambling for better statistical properties
    #   Choose the sequence type based on dimensionality:
    #     - Sobol: Good general choice
    #     - Halton: Better for low dimensions (< 6)
    #   QMC works best with parameters transformed to a uniform scale
    #
    # Practical Recommendations:
    # 1. Use QMC for initial exploration (first 20-50 evaluations)
    # 2. Switch to adaptive methods (TPE/GP) for focused search
    # 3. Use for sensitivity analysis across the full parameter space
    # 4. Good choice when unsure about objective function properties
    # 5. Ideal for parallel evaluation scenarios

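    # Sketch of the powers-of-2 recommendation above (scipy assumed, sketch
    # only): Sobol points come in balanced blocks of 2**m, and random_base2
    # draws exactly one such block, which is why n_trials=32 (= 2**5) suits
    # this example.
    from scipy.stats import qmc  # re-imported so the sketch stands alone

    balanced_block = qmc.Sobol(d=2, scramble=True, seed=42).random_base2(m=5)
    print(f"Sobol base-2 block shape: {balanced_block.shape}")  # (32, 2)
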
|
    return best_params, optimizer.best_score_


if __name__ == "__main__":
    best_params, best_score = main()