Completed
Branch master (f50597)
by Wouter
52s
created

TargetContrastivePessimisticClassifier (rating: B)

Complexity

Total Complexity 50

Size/Duplication

Total Lines 505
Duplicated Lines 0 %

Test Coverage

Coverage 0%

Importance

Changes 3
Bugs 0 Features 0
Metric                          Value
c    (changes)                  3
b    (bugs)                     0
f    (features)                 0
dl   (duplicated lines)         0
loc  (lines of code)            505
ccs                             0
cts                             171
cp   (coverage)                 0
rs                              8.6206
wmc  (weighted methods/class)   50

14 Methods

Rating  Name                         Duplication  Size  Complexity
B       fit()                        0            33    3
A       __init__()                   0            53    3
B       predict()                    0            24    4
B       learning_rate_t()            0            27    5
F       tcpr_da()                    0            99    9
A       remove_intercept()           0            12    1
B       project_simplex()            0            30    1
A       error_rate()                 0            3     1
A       add_intercept()              0            22    2
C       neg_log_likelihood()         0            56    9
A       get_params()                 0            10    2
B       risk()                       0            25    2
A       combine_class_covariances()  0            23    3
B       discriminant_parameters()    0            68    5

How to fix: Complexity

Complex Class

Complex classes like TargetContrastivePessimisticClassifier often do a lot of different things. To break such a class down, we need to identify a cohesive component within it. A common approach to finding such a component is to look for fields and methods that share the same prefixes or suffixes.

Once you have determined which fields and methods belong together, you can apply the Extract Class refactoring (a sketch of what this could look like follows below). If the component makes sense as a sub-class, Extract Subclass is also a candidate, and is often the faster option.
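
For illustration only, here is a hedged sketch of what an Extract Class step could look like for this class: the learning-rate decay and simplex projection used by the optimizer are pulled into a separate component. The name SimplexGradientHelper and the trimmed method bodies are hypothetical and not part of the reviewed code.

import numpy as np


class SimplexGradientHelper(object):
    """Hypothetical extracted component holding the optimization utilities."""

    def __init__(self, learning_rate=1.0, rate_decay='linear', max_iter=500):
        self.learning_rate = learning_rate
        self.rate_decay = rate_decay
        self.max_iter = max_iter

    def learning_rate_t(self, t):
        # Only the linear decay is shown here; see the full method in the listing.
        return (self.max_iter - t) / (self.learning_rate * self.max_iter)

    def project_simplex(self, v, z=1.0):
        # Sorting-based projection onto the simplex, as in the listing below.
        mu = np.sort(v, axis=0)[::-1]
        C = np.cumsum(mu) - z
        j = np.arange(v.shape[0]) + 1
        rho = j[mu - C / j > 0][-1]
        theta = C[mu - C / j > 0][-1] / float(rho)
        return np.maximum(v - theta, 0)

The classifier would then hold an instance of this helper and delegate to it, which shrinks TargetContrastivePessimisticClassifier and lowers its weighted method count.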

#!/usr/bin/env python
# -*- coding: utf-8 -*-

import numpy as np
import numpy.linalg as al
from scipy.stats import multivariate_normal as mvn
import sklearn as sk
from sklearn.svm import LinearSVC
from sklearn.linear_model import LogisticRegression, LinearRegression
from sklearn.model_selection import cross_val_predict
from os.path import basename

from .util import one_hot, regularize_matrix


class TargetContrastivePessimisticClassifier(object):
    """
    Classifiers based on Target Contrastive Pessimistic Risk minimization.

    Methods contain models, risk functions, parameter estimation, etc.
    """

    def __init__(self, loss='lda', l2=1.0, max_iter=500, tolerance=1e-12,
                 learning_rate=1.0, rate_decay='linear', verbosity=0):
        """
        Select a particular type of TCPR classifier.

        INPUT   (1) str 'loss': loss function for TCP Risk, options: 'lda',
                    'linear discriminant analysis', 'qda',
                    'quadratic discriminant analysis' (def: 'lda')
                (2) float 'l2': l2-regularization parameter value (def: 1.0)
                (3) int 'max_iter': maximum number of iterations for
                    optimization (def: 500)
                (4) float 'tolerance': convergence criterion on the TCP
                    parameters (def: 1e-12)
                (5) float 'learning_rate': parameter for size of update of
                    gradient (def: 1.0)
                (6) str 'rate_decay': type of learning rate decay, options:
                    'linear', 'quadratic', 'geometric', 'exponential'
                    (def: 'linear')
                (7) int 'verbosity': amount of progress reporting (def: 0)
        """
        # Classifier options
        self.loss = loss
        self.l2 = l2

        # Optimization parameters
        self.max_iter = max_iter
        self.tolerance = tolerance
        self.learning_rate = learning_rate
        self.rate_decay = rate_decay
        self.verbosity = verbosity

        if self.loss in ['linear discriminant analysis', 'lda']:

            # Set to short name
            self.loss = 'lda'

        elif self.loss in ['quadratic discriminant analysis', 'qda']:

            # Set to short name
            self.loss = 'qda'

        else:
            # Other loss functions are not implemented
            raise ValueError('Model not implemented.')

        # Initialize classifier and classes parameters
        self.parameters = []
        self.classes = []

        # Whether model has been trained
        self.is_trained = False

        # Dimensionality of training data
        self.train_data_dim = 0

    def add_intercept(self, X):
        """Add 1's to data as last features."""
        # Data shape
        N, D = X.shape

        # Check if there's not already an intercept column
        if np.any(np.sum(X, axis=0) == N):

            # Report
            print('Intercept is not the last feature. Swapping..')

            # Find which column contains the intercept
            intercept_index = np.argwhere(np.sum(X, axis=0) == N)

            # Remove the existing intercept column
            X = X[:, np.setdiff1d(np.arange(D), intercept_index)]

        # Add intercept as last column
        X = np.hstack((X, np.ones((N, 1))))

        # Return augmented data and its new dimensionality
        return X, X.shape[1]

    def remove_intercept(self, X):
        """Remove 1's from data as last features."""
        # Data shape
        N, D = X.shape

        # Find which column contains the intercept
        intercept_index = np.argwhere(np.sum(X, axis=0) == N)

        # Remove the intercept column
        X = X[:, np.setdiff1d(np.arange(D), intercept_index)]

        return X, D-1

    def project_simplex(self, v, z=1.0):
        """
        Project vector onto simplex using sorting.

        Reference: "Efficient Projections onto the L1-Ball for Learning in
        High Dimensions" (Duchi, Shalev-Shwartz, Singer, Chandra, 2008).

        INPUT   (1) array 'v': vector to be projected (n dimensions)
                (2) float 'z': constant (def: 1.0)
        OUTPUT  (1) array 'w': projected vector (n dimensions)
        """
        # Number of dimensions
        n = v.shape[0]

        # Sort vector in descending order
        mu = np.sort(v, axis=0)[::-1]

        # Find rho
        C = np.cumsum(mu) - z
        j = np.arange(n) + 1
        rho = j[mu - C/j > 0][-1]

        # Define theta
        theta = C[mu - C/j > 0][-1] / float(rho)

        # Subtract theta from original vector and cap at 0
        w = np.maximum(v - theta, 0)

        # Return projected vector
        return w

    def learning_rate_t(self, t):
        """Compute current learning rate after decay."""
        # Select rate decay
        if self.rate_decay == 'linear':

            # Linear dropoff between t=0 and t=T
            alpha = (self.max_iter - t)/(self.learning_rate*self.max_iter)

        elif self.rate_decay == 'quadratic':

            # Quadratic dropoff between t=0 and t=T
            alpha = ((self.max_iter - t)/(self.learning_rate*self.max_iter))**2

        elif self.rate_decay == 'geometric':

            # Drop rate off inversely to time (guard against division by zero at t=0)
            alpha = 1 / (self.learning_rate * max(t, 1))

        elif self.rate_decay == 'exponential':

            # Exponential dropoff
            alpha = np.exp(-self.learning_rate * t)

        else:
            raise ValueError('Rate decay type unknown.')

        return alpha

    def risk(self, Z, theta, q):
        """
        Compute target contrastive pessimistic risk.

        INPUT   (1) array 'Z': target samples (M samples by D features)
                (2) tuple 'theta': classifier parameters (pi, mu, Si)
                (3) array 'q': soft labels (M samples by K classes)
        OUTPUT  (1) float: risk
        """
        # Number of classes
        K = q.shape[1]

        # Compute negative log-likelihood
        L = self.neg_log_likelihood(Z, theta)

        # Weight loss by soft labels
        for k in range(K):
            L[:, k] *= q[:, k]

        # Sum over weighted losses
        L = np.sum(L, axis=1)

        # Risk is average loss
        return np.mean(L, axis=0)

    def neg_log_likelihood(self, X, theta):
        """
        Compute negative log-likelihood under Gaussian distributions.

        INPUT   (1) array 'X': data (N samples by D features)
                (2) tuple 'theta': containing class proportions 'pi', class
                    means 'mu', and class-covariances 'Si'
        OUTPUT  (1) array 'L': loss (N samples by K classes)
        """
        # Unpack parameters
        pi, mu, Si = theta

        # Check if parameter sets match
        assert pi.shape[1] == mu.shape[0]
        assert mu.shape[1] == Si.shape[0]
        assert Si.shape[0] == Si.shape[1]

        # Number of classes
        K = pi.shape[1]

        # Data shape
        N, D = X.shape

        # Preallocate loss array
        L = np.zeros((N, K))

        for k in range(K):

            # Check for linear or quadratic
            if self.loss == 'lda':

                try:
                    # Probability under k-th Gaussian with shared covariance
                    probs = mvn.pdf(X, mu[k, :], Si)

                except al.LinAlgError as err:
                    print('Covariance matrix is singular. Add regularization.')
                    raise err

            elif self.loss == 'qda':

                try:
                    # Probability under k-th Gaussian with own covariance
                    probs = mvn.pdf(X, mu[k, :], Si[:, :, k])

                except al.LinAlgError as err:
                    print('Covariance matrix is singular. Add regularization.')
                    raise err

            else:
                raise ValueError('Loss unknown.')

            # Negative log-likelihood
            L[:, k] = -np.log(pi[0, k]) - np.log(probs)

        return L

    def discriminant_parameters(self, X, Y):
        """
        Estimate parameters of Gaussian distribution for discriminant analysis.

        INPUT   (1) array 'X': data array (N samples by D features)
                (2) array 'Y': label array (N samples by K classes)
        OUTPUT  (1) array 'pi': class proportions (1 by K classes)
                (2) array 'mu': class means (K classes by D features)
                (3) array 'Si': class covariances (D features by D features by
                    K classes)
        """
        # Check labels
        K = Y.shape[1]
        assert K > 1

        # Data shape
        N, D = X.shape

        # Preallocate parameter arrays
        pi = np.zeros((1, K))
        mu = np.zeros((K, D))
        Si = np.zeros((D, D, K))

        # For each class
        for k in range(K):

            # Number of samples for current class
            Nk = np.sum(Y[:, k], axis=0)

            # Check for no samples assigned to certain class
            if Nk == 0:

                # Proportion of samples for current class
                pi[0, k] = 0

                # Mean of current class
                mu[k, :] = np.zeros((1, D))

                # Covariance of current class
                Si[:, :, k] = np.eye(D, D)

            else:

                # Proportion of samples for current class
                pi[0, k] = Nk / N

                # Mean of current class
                mu[k, :] = np.dot(Y[:, k].T, X) / Nk

                # Subtract mean from data
                X_ = X - mu[k, :]

                # Diagonalize current label vector
                dYk = np.diag(Y[:, k])

                # Covariance of current class
                Si[:, :, k] = np.dot(np.dot(X_.T, dYk), X_) / Nk

                # Regularization
                Si[:, :, k] = regularize_matrix(Si[:, :, k], a=self.l2)

        # Check for linear or quadratic discriminant analysis
        if self.loss == 'lda':

            # In LDA, the class-covariance matrices are combined
            Si = self.combine_class_covariances(Si, pi)

        return pi, mu, Si

    def combine_class_covariances(self, Si, pi):
        """
        Linear combination of class covariance matrices.

        INPUT   (1) array 'Si': Covariance matrix (D features by D features by
                    K classes)
                (2) array 'pi': class proportions (1 by K classes)
        OUTPUT  (1) array 'Si': Combined covariance matrix (D by D)
        """
        # Number of classes
        K = Si.shape[2]

        # Check if pi contains K classes
        assert pi.shape[1] == K

        # For each class
        for k in range(K):

            # Weight each class-covariance
            Si[:, :, k] = Si[:, :, k] * pi[0, k]

        # Sum over weighted class-covariances
        return np.sum(Si, axis=2)

    def tcpr_da(self, X, y, Z):
        """
        Target Contrastive Pessimistic Risk - discriminant analysis.

        INPUT   (1) array 'X': source data (N samples by D features)
                (2) array 'y': source labels (N samples by 1)
                (3) array 'Z': target data (M samples by D features)
        OUTPUT  (1) tuple 'theta': estimated classifier parameters
                    (pi, mu, Si)
        """
        # Data shapes
        N, DX = X.shape
        M, DZ = Z.shape

        # Assert equivalent dimensionalities
        assert DX == DZ

        # Remove intercept columns if present
        X, DX = self.remove_intercept(X)
        Z, DZ = self.remove_intercept(Z)

        # Label properties
        classes = np.unique(y)
        K = len(classes)

        # Check for at least 2 classes
        assert K > 1

        # Map labels to one-hot-encoding
        Y = one_hot(y)

        # Estimate parameters of source model
        theta_ref = self.discriminant_parameters(X, Y)

        # Loss is negative log-likelihood under reference parameters
        L_ref = self.neg_log_likelihood(Z, theta_ref)

        # Initialize target posterior
        q = np.ones((M, K)) / K

        print('Starting TCP optimization')

        TCPRt = np.inf
        for t in range(self.max_iter):

            # Maximization phase

            # Estimate parameters using TCP risk
            theta_tcp = self.discriminant_parameters(Z, q)

            # Minimization phase

            # Compute loss under new parameters
            L_tcp = self.neg_log_likelihood(Z, theta_tcp)

            # Gradient is difference in losses
            Dq = L_tcp - L_ref

            # Update learning rate
            alpha = self.learning_rate_t(t)

            # Steepest descent step
            q -= alpha*Dq

            # Project back onto simplex
            for m in range(M):
                q[m, :] = self.project_simplex(q[m, :])

            # Monitor progress

            # Risks of current parameters
            R_tcp = self.risk(Z, theta_tcp, q)
            R_ref = self.risk(Z, theta_ref, q)

            # Assert no numerical problems
            assert not np.isnan(R_tcp)
            assert not np.isnan(R_ref)

            # Current TCP risk
            TCPRt_ = R_tcp - R_ref

            # Change in risk difference for this iteration
            dR = al.norm(TCPRt - TCPRt_)

            # Check for change smaller than tolerance
            if (dR < self.tolerance):
                print('Broke at iteration '+str(t)+', TCP Risk = '+str(TCPRt_))
                break

            # Report progress
            if (t % 100) == 1:
                print('Iteration ' + str(t) + '/' + str(self.max_iter) +
                      ', TCP Risk = ' + str(TCPRt_))

            # Update
            TCPRt = TCPRt_

        # Return estimated parameters
        return theta_tcp

    def fit(self, X, y, Z):
        """
        Fit/train a target contrastive pessimistic classifier.

        INPUT   (1) array 'X': source data (N samples by D features)
                (2) array 'y': source labels (N samples by 1)
                (3) array 'Z': target data (M samples by D features)
        OUTPUT  None
        """
        # Data shapes
        N, DX = X.shape
        M, DZ = Z.shape

        # Assert equivalent dimensionalities
        assert DX == DZ

        if self.loss in ['lda', 'qda']:

            # Discriminant analysis model for TCPR
            self.parameters = self.tcpr_da(X, y, Z)

        else:
            # Other loss functions are not implemented
            raise ValueError('Loss function unknown.')

        # Extract and store classes
        self.classes = np.unique(y)

        # Mark classifier as trained
        self.is_trained = True

        # Store training data dimensionality
        self.train_data_dim = DX

    def predict(self, Z_):
        """
        Make predictions on new dataset.

        INPUT   (1) array 'Z_': new data set (M samples by D features)
        OUTPUT  (1) array 'preds': label predictions (M samples by 1)
        """
        # Data shape
        M, D = Z_.shape

        # If classifier is trained, check for same dimensionality
        if self.is_trained:
            assert D == self.train_data_dim

        if self.loss in ['lda', 'qda']:

            # Compute negative log-likelihood under each class distribution
            probs = self.neg_log_likelihood(Z_, self.parameters)

            # Smallest negative log-likelihood is the most probable class
            preds = np.argmin(probs, axis=1)

        # Return predictions array
        return preds

    def get_params(self):
        """Return classifier parameters."""
        # Check if classifier is trained
        if self.is_trained:
            return self.parameters

        else:
            # Throw soft error
            print('Classifier is not trained yet.')
            return []

    def error_rate(self, preds, u_):
        """Compute classification error rate."""
        return np.mean(preds != u_, axis=0)
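
For orientation, a minimal usage sketch of the class on synthetic data. This is not part of the reviewed file; the import path is an assumption and should be adjusted to wherever this module lives (it also relies on the package-local util module for one_hot and regularize_matrix).

import numpy as np

from libtlda.tcpr import TargetContrastivePessimisticClassifier  # assumed path

# Synthetic source domain: two Gaussian classes in 2D
X = np.vstack((np.random.randn(50, 2), np.random.randn(50, 2) + 2))
y = np.concatenate((np.zeros(50, dtype=int), np.ones(50, dtype=int)))

# Synthetic target domain: the same classes, slightly shifted
Z = np.vstack((np.random.randn(50, 2) + 0.5, np.random.randn(50, 2) + 2.5))

# Target labels, used here only to evaluate the error rate
u = np.concatenate((np.zeros(50, dtype=int), np.ones(50, dtype=int)))

# Train with the LDA loss and some l2 regularization, then predict on target
clf = TargetContrastivePessimisticClassifier(loss='lda', l2=0.1)
clf.fit(X, y, Z)
preds = clf.predict(Z)

print('Target error rate:', clf.error_rate(preds, u))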