Passed
Push — master ( c241e4...b050e9 )
by Simon
01:57
created

qmc_sampler_example   A

Complexity

Total Complexity 3

Size/Duplication

Total Lines 215
Duplicated Lines 0 %

Importance

Changes 0
Metric  Value
wmc     3
eloc    37
dl      0
loc     215
rs      10
c       0
b       0
f       0

3 Functions

Rating  Name                         Duplication  Size  Complexity
A       demonstrate_space_filling()  0            2     1
A       main()                       0            128   1
A       qmc_theory()                 0            2     1
"""
QMCSampler Example - Quasi-Monte Carlo Sampling

The QMCSampler uses Quasi-Monte Carlo sequences (like Sobol or Halton)
to generate low-discrepancy samples. These sequences provide better
coverage of the parameter space compared to purely random sampling.

Characteristics:
- Low-discrepancy sequences for uniform space filling
- Better convergence than random sampling
- Deterministic sequence generation
- Excellent space coverage properties
- No learning from previous evaluations
- Good baseline for comparison with adaptive methods

QMC sequences are particularly effective for:
- Integration and sampling problems
- Initial design of experiments
- Baseline optimization comparisons
"""

import numpy as np
from sklearn.datasets import load_wine
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

from hyperactive.experiment.integrations import SklearnCvExperiment
from hyperactive.opt.optuna import QMCSampler


def qmc_theory():
    """Explain Quasi-Monte Carlo theory."""
    # Quasi-Monte Carlo (QMC) Theory:
    #
    # 1. Low-Discrepancy Sequences:
    #    - Deterministic sequences that fill space uniformly
    #    - Better distribution than random sampling
    #    - Minimize gaps and clusters in parameter space
    #    - Convergence rate O(log^d(N)/N) vs O(1/√N) for random
    #
    # 2. Common QMC Sequences:
    #    - Sobol: Based on binary representations
    #    - Halton: Based on prime number bases
    #    - Faure: Generalization of Halton sequences
    #    - Each has different strengths for different dimensions
    #
    # 3. Space-Filling Properties:
    #    - Stratification: Even coverage of parameter regions
    #    - Low discrepancy: Uniform distribution approximation
    #    - Correlation breaking: Reduces clustering
    #
    # 4. Advantages over Random Sampling:
    #    - Better convergence for integration
    #    - More uniform exploration
    #    - Reproducible sequences
    #    - No unlucky clustering
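

# The comment block in qmc_theory() stays abstract. As a concrete,
# illustrative companion (not part of the hyperactive API), the helper below
# generates Sobol and Halton points directly with scipy.stats.qmc. It assumes
# SciPy is available; the function name and structure are our own sketch and
# it is never called by main().
def generate_qmc_points_sketch(n_dims=2, m=5, seed=42):
    """Generate 2**m Sobol and Halton points in the unit hypercube."""
    from scipy.stats import qmc

    # Sobol sequences are binary-based; random_base2(m) draws exactly 2**m
    # points, which preserves the sequence's balance properties (the same
    # reason n_trials=32, a power of 2, is used in main()).
    sobol = qmc.Sobol(d=n_dims, scramble=True, seed=seed)
    sobol_points = sobol.random_base2(m=m)  # shape (2**m, n_dims)

    # Halton sequences are prime-based and behave well in low dimensions.
    halton = qmc.Halton(d=n_dims, scramble=True, seed=seed)
    halton_points = halton.random(n=2**m)

    return sobol_points, halton_points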


def demonstrate_space_filling():
    """Demonstrate space-filling properties conceptually."""
    # Space-Filling Comparison:
    #
    # Random Sampling:
    #    Can have clusters and gaps
    #    Uneven coverage, especially with few samples
    #    Variance in coverage quality
    #    Some regions may be under-explored
    #
    # Quasi-Monte Carlo (QMC):
    #    Systematic space filling
    #    Even coverage with fewer samples
    #    Consistent coverage quality
    #    All regions explored proportionally
    #
    # Grid Sampling:
    #    Perfect regular coverage
    #    Exponential scaling with dimensions
    #    May miss optimal points between grid lines
    #
    # → QMC provides a balanced approach between random and grid sampling
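

# To make the comparison in demonstrate_space_filling() concrete, this
# illustrative helper (again assuming SciPy; not part of the hyperactive API
# and not called by main()) computes the centered discrepancy of i.i.d.
# random points versus scrambled Sobol points. Lower discrepancy means a
# more uniform covering of the space.
def compare_discrepancy_sketch(n_points=32, n_dims=2, seed=42):
    """Return (random_discrepancy, sobol_discrepancy) for n_points samples."""
    from scipy.stats import qmc

    rng = np.random.default_rng(seed)
    random_points = rng.random((n_points, n_dims))
    sobol_points = qmc.Sobol(d=n_dims, scramble=True, seed=seed).random(n_points)

    # qmc.discrepancy measures deviation from the uniform distribution;
    # the Sobol set typically scores noticeably lower than the random set.
    return qmc.discrepancy(random_points), qmc.discrepancy(sobol_points)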


def main():
    # === QMCSampler Example ===
    # Quasi-Monte Carlo Low-Discrepancy Sampling

    qmc_theory()
    demonstrate_space_filling()

    # Load dataset
    X, y = load_wine(return_X_y=True)
    print(f"Dataset: Wine classification ({X.shape[0]} samples, {X.shape[1]} features)")

    # Create experiment
    estimator = LogisticRegression(random_state=42, max_iter=1000)
    experiment = SklearnCvExperiment(estimator=estimator, X=X, y=y, cv=5)

    # Define search space
    param_space = {
        "C": (0.001, 100),  # Regularization strength
        "l1_ratio": (0.0, 1.0),  # Elastic net mixing parameter
        "solver": ["liblinear", "saga"],  # Solver algorithm
        "penalty": ["l1", "l2", "elasticnet"],  # Regularization type
    }

    # Search Space:
    # C: (0.001, 100) - Regularization strength
    # l1_ratio: (0.0, 1.0) - Elastic net mixing parameter
    # solver: ['liblinear', 'saga'] - Solver algorithm
    # penalty: ['l1', 'l2', 'elasticnet'] - Regularization type

    # Configure QMCSampler
    optimizer = QMCSampler(
        param_space=param_space,
        n_trials=32,  # Power of 2 often works well for QMC
        random_state=42,
        experiment=experiment,
        qmc_type="sobol",  # Sobol or Halton sequences
        scramble=True,  # Randomized QMC (Owen scrambling)
    )

    # QMCSampler Configuration:
    # n_trials: 32 (power of 2 for better QMC properties)
    # qmc_type: 'sobol' sequence
    # scramble: True (randomized QMC)
    # Deterministic low-discrepancy sampling

    # QMC Sequence Types:
    #  Sobol: Excellent for moderate dimensions, binary-based
    #  Halton: Good for low dimensions, prime-based
    #  Scrambling: Adds randomization while preserving uniformity

    # Run optimization
    # Running QMC sampling optimization...
    best_params = optimizer.run()

    # Results
    print("\n=== Results ===")
    print(f"Best parameters: {best_params}")
    print(f"Best score: {optimizer.best_score_:.4f}")
    print()

    # QMC behavior analysis:
    #
    # QMC Sampling Analysis:
    #
    # Sequence Properties:
    #  Deterministic generation (reproducible with same seed)
    #  Low-discrepancy (uniform distribution approximation)
    #  Space-filling (systematic coverage of parameter space)
    #  Stratification (even coverage of all regions)
    #  No clustering or large gaps
    #
    # Sobol Sequence Characteristics:
    #  Binary-based construction
    #  Good equidistribution properties
    #  Effective for dimensions up to ~40
    #  Popular choice for QMC sampling
    #
    # Scrambling Benefits (when enabled):
    #  Breaks regularity patterns
    #  Provides Monte Carlo error estimates
    #   (see rqmc_error_estimate_sketch() after main() for an illustration)
    #  Maintains low-discrepancy properties
    #  Reduces correlation artifacts
    #
    # QMC vs Other Sampling Methods:
    #
    # vs Pure Random Sampling:
    #   + Better space coverage with fewer samples
    #   + More consistent performance
    #   + Faster convergence for integration-like problems
    #   - No true randomness (if needed for some applications)
    #
    # vs Grid Search:
    #   + Works well in higher dimensions
    #   + No exponential scaling
    #   + Covers continuous spaces naturally
    #   - No systematic guarantee of finding the grid optimum
    #
    # vs Adaptive Methods (TPE, GP):
    #   + No assumptions about objective function
    #   + Embarrassingly parallel
    #   + Consistent performance regardless of function type
    #   - No learning from previous evaluations
    #   - May waste evaluations in clearly suboptimal regions
    #
    # Best Use Cases:
    #  Design of experiments (DoE)
    #  Initial exploration phase
    #  Baseline for comparing adaptive methods
    #  Integration and sampling problems
    #  When function evaluations are parallelizable
    #  Robustness testing across parameter space
    #
    # Implementation Considerations:
    #  Use powers of 2 for n_trials with Sobol sequences
    #  Consider scrambling for better statistical properties
    #  Choose sequence type based on dimensionality:
    #   - Sobol: Good general choice
    #   - Halton: Better for low dimensions (< 6)
    #  QMC works best when parameters are mapped to a uniform scale
    #
    # Practical Recommendations:
    # 1. Use QMC for initial exploration (first 20-50 evaluations)
    # 2. Switch to adaptive methods (TPE/GP) for focused search
    #    (see two_phase_qmc_then_tpe_sketch() after main() for a hand-off sketch)
    # 3. Use for sensitivity analysis across the full parameter space
    # 4. Good choice when unsure about objective function properties
    # 5. Ideal for parallel evaluation scenarios

    return best_params, optimizer.best_score_
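

# The "Scrambling Benefits" notes above mention Monte Carlo error estimates.
# The sketch below illustrates that idea with replicated scrambled Sobol
# sequences on a toy integrand (mean 2/3 over the unit square). It assumes
# SciPy, uses names of our own invention, and is never called by main().
def rqmc_error_estimate_sketch(n_replicates=8, m=5, seed=42):
    """Estimate a toy integral and its error via replicated scrambled Sobol."""
    from scipy.stats import qmc

    def integrand(points):
        # f(x, y) = x**2 + y**2 integrated over [0, 1)^2 equals 2/3.
        return (points ** 2).sum(axis=1)

    estimates = []
    for r in range(n_replicates):
        # Each replicate uses an independent scrambling of the same sequence.
        points = qmc.Sobol(d=2, scramble=True, seed=seed + r).random_base2(m=m)
        estimates.append(integrand(points).mean())

    estimates = np.asarray(estimates)
    # The spread across independent scramblings yields a Monte Carlo style
    # error estimate, which a single unscrambled QMC run cannot provide.
    return estimates.mean(), estimates.std(ddof=1) / np.sqrt(n_replicates)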
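

# Practical recommendations 1-2 above suggest broad QMC exploration followed
# by an adaptive sampler. This sketch shows that hand-off using Optuna's own
# samplers on a toy objective (Optuna is already required by the
# hyperactive.opt.optuna import). It is illustrative only, separate from this
# example's pipeline, and never called.
def two_phase_qmc_then_tpe_sketch(n_qmc_trials=16, n_tpe_trials=16, seed=42):
    """Run QMC exploration first, then continue the same study with TPE."""
    import optuna

    def toy_objective(trial):
        # A simple bowl-shaped function standing in for a real experiment.
        x = trial.suggest_float("x", -5.0, 5.0)
        y = trial.suggest_float("y", -5.0, 5.0)
        return x**2 + y**2

    # Phase 1: low-discrepancy exploration across the whole space.
    study = optuna.create_study(
        direction="minimize",
        sampler=optuna.samplers.QMCSampler(qmc_type="sobol", scramble=True, seed=seed),
    )
    study.optimize(toy_objective, n_trials=n_qmc_trials)

    # Phase 2: switch the same study to TPE so it can learn from the QMC
    # trials already recorded and focus on the promising regions.
    study.sampler = optuna.samplers.TPESampler(seed=seed)
    study.optimize(toy_objective, n_trials=n_tpe_trials)

    return study.best_params, study.best_value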


if __name__ == "__main__":
    best_params, best_score = main()
215