Completed
Branch master (f50597)
by Wouter
52s
created

TargetContrastivePessimisticClassifier (rating: B)

Complexity

Total Complexity 50

Size/Duplication

Total Lines 505
Duplicated Lines 0 %

Test Coverage

Coverage 0%

Importance

Changes 3
Bugs 0 Features 0
Metric                          Value
c    (changes)                  3
b    (bugs)                     0
f    (features)                 0
dl   (duplicated lines)         0
loc  (lines of code)            505
ccs                             0
cts                             171
cp   (coverage)                 0
rs                              8.6206
wmc  (weighted methods/class)   50

14 Methods

Rating  Name                         Duplication  Size  Complexity
B       fit()                        0            33    3
A       __init__()                   0            53    3
B       predict()                    0            24    4
B       learning_rate_t()            0            27    5
F       tcpr_da()                    0            99    9
A       remove_intercept()           0            12    1
B       project_simplex()            0            30    1
A       error_rate()                 0            3     1
A       add_intercept()              0            22    2
C       neg_log_likelihood()         0            56    9
A       get_params()                 0            10    2
B       risk()                       0            25    2
A       combine_class_covariances()  0            23    3
B       discriminant_parameters()    0            68    5

How to fix: Complexity

Complex Class

Complex classes like TargetContrastivePessimisticClassifier often do a lot of different things. To break such a class down, we need to identify a cohesive component within it. A common approach to finding such a component is to look for fields and methods that share the same prefixes or suffixes.

Once you have determined which fields and methods belong together, you can apply the Extract Class refactoring (a sketch of what this could look like follows below). If the component makes sense as a sub-class, Extract Subclass is also a candidate, and is often the faster option.
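
For illustration only, here is a hedged sketch of what an Extract Class step could look like for this class: the learning-rate decay and simplex projection used by the optimizer are pulled into a separate component. The name SimplexGradientHelper and the trimmed method bodies are hypothetical and not part of the reviewed code.

import numpy as np


class SimplexGradientHelper(object):
    """Hypothetical extracted component holding the optimization utilities."""

    def __init__(self, learning_rate=1.0, rate_decay='linear', max_iter=500):
        self.learning_rate = learning_rate
        self.rate_decay = rate_decay
        self.max_iter = max_iter

    def learning_rate_t(self, t):
        # Only the linear decay is shown here; see the full method in the listing.
        return (self.max_iter - t) / (self.learning_rate * self.max_iter)

    def project_simplex(self, v, z=1.0):
        # Sorting-based projection onto the simplex, as in the listing below.
        mu = np.sort(v, axis=0)[::-1]
        C = np.cumsum(mu) - z
        j = np.arange(v.shape[0]) + 1
        rho = j[mu - C / j > 0][-1]
        theta = C[mu - C / j > 0][-1] / float(rho)
        return np.maximum(v - theta, 0)

The classifier would then hold an instance of this helper and delegate to it, which shrinks TargetContrastivePessimisticClassifier and lowers its weighted method count.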

#!/usr/bin/env python
# -*- coding: utf-8 -*-

import numpy as np
import numpy.linalg as al
from scipy.stats import multivariate_normal as mvn
import sklearn as sk
from sklearn.svm import LinearSVC
from sklearn.linear_model import LogisticRegression, LinearRegression
from sklearn.model_selection import cross_val_predict
from os.path import basename

from .util import one_hot, regularize_matrix


class TargetContrastivePessimisticClassifier(object):
    """
    Classifiers based on Target Contrastive Pessimistic Risk minimization.

    Methods contain models, risk functions, parameter estimation, etc.
    """

    def __init__(self, loss='lda', l2=1.0, max_iter=500, tolerance=1e-12,
                 learning_rate=1.0, rate_decay='linear', verbosity=0):
        """
        Select a particular type of TCPR classifier.

        INPUT   (1) str 'loss': loss function for TCP Risk, options: 'lda',
                    'linear discriminant analysis', 'qda',
                    'quadratic discriminant analysis' (def: 'lda')
                (2) float 'l2': l2-regularization parameter value (def: 1.0)
                (3) int 'max_iter': maximum number of iterations for
                    optimization (def: 500)
                (4) float 'tolerance': convergence criterion on the TCP
                    parameters (def: 1e-12)
                (5) float 'learning_rate': parameter for size of update of
                    gradient (def: 1.0)
                (6) str 'rate_decay': type of learning rate decay, options:
                    'linear', 'quadratic', 'geometric', 'exponential'
                    (def: 'linear')
                (7) int 'verbosity': amount of progress reporting (def: 0)
        """
        # Classifier options
        self.loss = loss
        self.l2 = l2

        # Optimization parameters
        self.max_iter = max_iter
        self.tolerance = tolerance
        self.learning_rate = learning_rate
        self.rate_decay = rate_decay
        self.verbosity = verbosity

        if self.loss in ['linear discriminant analysis', 'lda']:

            # Set to short name
            self.loss = 'lda'

        elif self.loss in ['quadratic discriminant analysis', 'qda']:

            # Set to short name
            self.loss = 'qda'

        else:
            # Other loss functions are not implemented
            raise ValueError('Model not implemented.')

        # Initialize classifier and classes parameters
        self.parameters = []
        self.classes = []

        # Whether model has been trained
        self.is_trained = False

        # Dimensionality of training data
        self.train_data_dim = 0

    def add_intercept(self, X):
        """Add 1's to data as last features."""
        # Data shape
        N, D = X.shape

        # Check if there's not already an intercept column
        if np.any(np.sum(X, axis=0) == N):

            # Report
            print('Intercept is not the last feature. Swapping..')

            # Find which column contains the intercept
            intercept_index = np.argwhere(np.sum(X, axis=0) == N)

            # Remove the existing intercept column
            X = X[:, np.setdiff1d(np.arange(D), intercept_index)]

        # Add intercept as last column
        X = np.hstack((X, np.ones((N, 1))))

        # Return augmented data and its new dimensionality
        return X, X.shape[1]

    def remove_intercept(self, X):
        """Remove 1's from data as last features."""
        # Data shape
        N, D = X.shape

        # Find which column contains the intercept
        intercept_index = np.argwhere(np.sum(X, axis=0) == N)

        # Remove the intercept column
        X = X[:, np.setdiff1d(np.arange(D), intercept_index)]

        return X, D-1

    def project_simplex(self, v, z=1.0):
        """
        Project vector onto simplex using sorting.

        Reference: "Efficient Projections onto the L1-Ball for Learning in
        High Dimensions" (Duchi, Shalev-Shwartz, Singer, Chandra, 2008).

        INPUT   (1) array 'v': vector to be projected (n dimensions)
                (2) float 'z': constant (def: 1.0)
        OUTPUT  (1) array 'w': projected vector (n dimensions)
        """
        # Number of dimensions
        n = v.shape[0]

        # Sort vector in descending order
        mu = np.sort(v, axis=0)[::-1]

        # Find rho
        C = np.cumsum(mu) - z
        j = np.arange(n) + 1
        rho = j[mu - C/j > 0][-1]

        # Define theta
        theta = C[mu - C/j > 0][-1] / float(rho)

        # Subtract theta from original vector and cap at 0
        w = np.maximum(v - theta, 0)

        # Return projected vector
        return w

    def learning_rate_t(self, t):
        """Compute current learning rate after decay."""
        # Select rate decay
        if self.rate_decay == 'linear':

            # Linear dropoff between t=0 and t=T
            alpha = (self.max_iter - t)/(self.learning_rate*self.max_iter)

        elif self.rate_decay == 'quadratic':

            # Quadratic dropoff between t=0 and t=T
            alpha = ((self.max_iter - t)/(self.learning_rate*self.max_iter))**2

        elif self.rate_decay == 'geometric':

            # Drop rate off inversely to time (guard against division by zero at t=0)
            alpha = 1 / (self.learning_rate * max(t, 1))

        elif self.rate_decay == 'exponential':

            # Exponential dropoff
            alpha = np.exp(-self.learning_rate * t)

        else:
            raise ValueError('Rate decay type unknown.')

        return alpha

    def risk(self, Z, theta, q):
        """
        Compute target contrastive pessimistic risk.

        INPUT   (1) array 'Z': target samples (M samples by D features)
                (2) tuple 'theta': classifier parameters (pi, mu, Si)
                (3) array 'q': soft labels (M samples by K classes)
        OUTPUT  (1) float: risk
        """
        # Number of classes
        K = q.shape[1]

        # Compute negative log-likelihood
        L = self.neg_log_likelihood(Z, theta)

        # Weight loss by soft labels
        for k in range(K):
            L[:, k] *= q[:, k]

        # Sum over weighted losses
        L = np.sum(L, axis=1)

        # Risk is average loss
        return np.mean(L, axis=0)

    def neg_log_likelihood(self, X, theta):
        """
        Compute negative log-likelihood under Gaussian distributions.

        INPUT   (1) array 'X': data (N samples by D features)
                (2) tuple 'theta': containing class proportions 'pi', class
                    means 'mu', and class-covariances 'Si'
        OUTPUT  (1) array 'L': loss (N samples by K classes)
        """
        # Unpack parameters
        pi, mu, Si = theta

        # Check if parameter sets match
        assert pi.shape[1] == mu.shape[0]
        assert mu.shape[1] == Si.shape[0]
        assert Si.shape[0] == Si.shape[1]

        # Number of classes
        K = pi.shape[1]

        # Data shape
        N, D = X.shape

        # Preallocate loss array
        L = np.zeros((N, K))

        for k in range(K):

            # Check for linear or quadratic
            if self.loss == 'lda':

                try:
                    # Probability under k-th Gaussian with shared covariance
                    probs = mvn.pdf(X, mu[k, :], Si)

                except al.LinAlgError as err:
                    print('Covariance matrix is singular. Add regularization.')
                    raise err

            elif self.loss == 'qda':

                try:
                    # Probability under k-th Gaussian with own covariance
                    probs = mvn.pdf(X, mu[k, :], Si[:, :, k])

                except al.LinAlgError as err:
                    print('Covariance matrix is singular. Add regularization.')
                    raise err

            else:
                raise ValueError('Loss unknown.')

            # Negative log-likelihood
            L[:, k] = -np.log(pi[0, k]) - np.log(probs)

        return L

    def discriminant_parameters(self, X, Y):
        """
        Estimate parameters of Gaussian distribution for discriminant analysis.

        INPUT   (1) array 'X': data array (N samples by D features)
                (2) array 'Y': label array (N samples by K classes)
        OUTPUT  (1) array 'pi': class proportions (1 by K classes)
                (2) array 'mu': class means (K classes by D features)
                (3) array 'Si': class covariances (D features by D features by
                    K classes)
        """
        # Check labels
        K = Y.shape[1]
        assert K > 1

        # Data shape
        N, D = X.shape

        # Preallocate parameter arrays
        pi = np.zeros((1, K))
        mu = np.zeros((K, D))
        Si = np.zeros((D, D, K))

        # For each class
        for k in range(K):

            # Number of samples for current class
            Nk = np.sum(Y[:, k], axis=0)

            # Check for no samples assigned to certain class
            if Nk == 0:

                # Proportion of samples for current class
                pi[0, k] = 0

                # Mean of current class
                mu[k, :] = np.zeros((1, D))

                # Covariance of current class
                Si[:, :, k] = np.eye(D, D)

            else:

                # Proportion of samples for current class
                pi[0, k] = Nk / N

                # Mean of current class
                mu[k, :] = np.dot(Y[:, k].T, X) / Nk

                # Subtract mean from data
                X_ = X - mu[k, :]

                # Diagonalize current label vector
                dYk = np.diag(Y[:, k])

                # Covariance of current class
                Si[:, :, k] = np.dot(np.dot(X_.T, dYk), X_) / Nk

                # Regularization
                Si[:, :, k] = regularize_matrix(Si[:, :, k], a=self.l2)

        # Check for linear or quadratic discriminant analysis
        if self.loss == 'lda':

            # In LDA, the class-covariance matrices are combined
            Si = self.combine_class_covariances(Si, pi)

        return pi, mu, Si

    def combine_class_covariances(self, Si, pi):
        """
        Linear combination of class covariance matrices.

        INPUT   (1) array 'Si': Covariance matrix (D features by D features by
                    K classes)
                (2) array 'pi': class proportions (1 by K classes)
        OUTPUT  (1) array 'Si': Combined covariance matrix (D by D)
        """
        # Number of classes
        K = Si.shape[2]

        # Check if pi contains K classes
        assert pi.shape[1] == K

        # For each class
        for k in range(K):

            # Weight each class-covariance
            Si[:, :, k] = Si[:, :, k] * pi[0, k]

        # Sum over weighted class-covariances
        return np.sum(Si, axis=2)

    def tcpr_da(self, X, y, Z):
        """
        Target Contrastive Pessimistic Risk - discriminant analysis.

        INPUT   (1) array 'X': source data (N samples by D features)
                (2) array 'y': source labels (N samples by 1)
                (3) array 'Z': target data (M samples by D features)
        OUTPUT  (1) tuple 'theta': estimated classifier parameters
                    (pi, mu, Si)
        """
        # Data shapes
        N, DX = X.shape
        M, DZ = Z.shape

        # Assert equivalent dimensionalities
        assert DX == DZ

        # Remove intercept columns if present
        X, DX = self.remove_intercept(X)
        Z, DZ = self.remove_intercept(Z)

        # Label properties
        classes = np.unique(y)
        K = len(classes)

        # Check for at least 2 classes
        assert K > 1

        # Map labels to one-hot-encoding
        Y = one_hot(y)

        # Estimate parameters of source model
        theta_ref = self.discriminant_parameters(X, Y)

        # Loss is negative log-likelihood under reference parameters
        L_ref = self.neg_log_likelihood(Z, theta_ref)

        # Initialize target posterior
        q = np.ones((M, K)) / K

        print('Starting TCP optimization')

        TCPRt = np.inf
        for t in range(self.max_iter):

            # Maximization phase

            # Estimate parameters using TCP risk
            theta_tcp = self.discriminant_parameters(Z, q)

            # Minimization phase

            # Compute loss under new parameters
            L_tcp = self.neg_log_likelihood(Z, theta_tcp)

            # Gradient is difference in losses
            Dq = L_tcp - L_ref

            # Update learning rate
            alpha = self.learning_rate_t(t)

            # Steepest descent step
            q -= alpha*Dq

            # Project back onto simplex
            for m in range(M):
                q[m, :] = self.project_simplex(q[m, :])

            # Monitor progress

            # Risks of current parameters
            R_tcp = self.risk(Z, theta_tcp, q)
            R_ref = self.risk(Z, theta_ref, q)

            # Assert no numerical problems
            assert not np.isnan(R_tcp)
            assert not np.isnan(R_ref)

            # Current TCP risk
            TCPRt_ = R_tcp - R_ref

            # Change in risk difference for this iteration
            dR = al.norm(TCPRt - TCPRt_)

            # Check for change smaller than tolerance
            if (dR < self.tolerance):
                print('Broke at iteration '+str(t)+', TCP Risk = '+str(TCPRt_))
                break

            # Report progress
            if (t % 100) == 1:
                print('Iteration ' + str(t) + '/' + str(self.max_iter) +
                      ', TCP Risk = ' + str(TCPRt_))

            # Update
            TCPRt = TCPRt_

        # Return estimated parameters
        return theta_tcp

    def fit(self, X, y, Z):
        """
        Fit/train a target contrastive pessimistic classifier.

        INPUT   (1) array 'X': source data (N samples by D features)
                (2) array 'y': source labels (N samples by 1)
                (3) array 'Z': target data (M samples by D features)
        OUTPUT  None
        """
        # Data shapes
        N, DX = X.shape
        M, DZ = Z.shape

        # Assert equivalent dimensionalities
        assert DX == DZ

        if self.loss in ['lda', 'qda']:

            # Discriminant analysis model for TCPR
            self.parameters = self.tcpr_da(X, y, Z)

        else:
            # Other loss functions are not implemented
            raise ValueError('Loss function unknown.')

        # Extract and store classes
        self.classes = np.unique(y)

        # Mark classifier as trained
        self.is_trained = True

        # Store training data dimensionality
        self.train_data_dim = DX

    def predict(self, Z_):
        """
        Make predictions on new dataset.

        INPUT   (1) array 'Z_': new data set (M samples by D features)
        OUTPUT  (1) array 'preds': label predictions (M samples by 1)
        """
        # Data shape
        M, D = Z_.shape

        # If classifier is trained, check for same dimensionality
        if self.is_trained:
            assert D == self.train_data_dim

        if self.loss in ['lda', 'qda']:

            # Compute negative log-likelihood under each class distribution
            probs = self.neg_log_likelihood(Z_, self.parameters)

            # Smallest negative log-likelihood is the most probable class
            preds = np.argmin(probs, axis=1)

        # Return predictions array
        return preds

    def get_params(self):
        """Return classifier parameters."""
        # Check if classifier is trained
        if self.is_trained:
            return self.parameters

        else:
            # Throw soft error
            print('Classifier is not trained yet.')
            return []

    def error_rate(self, preds, u_):
        """Compute classification error rate."""
        return np.mean(preds != u_, axis=0)
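
For orientation, a minimal usage sketch of the class on synthetic data. This is not part of the reviewed file; the import path is an assumption and should be adjusted to wherever this module lives (it also relies on the package-local util module for one_hot and regularize_matrix).

import numpy as np

from libtlda.tcpr import TargetContrastivePessimisticClassifier  # assumed path

# Synthetic source domain: two Gaussian classes in 2D
X = np.vstack((np.random.randn(50, 2), np.random.randn(50, 2) + 2))
y = np.concatenate((np.zeros(50, dtype=int), np.ones(50, dtype=int)))

# Synthetic target domain: the same classes, slightly shifted
Z = np.vstack((np.random.randn(50, 2) + 0.5, np.random.randn(50, 2) + 2.5))

# Target labels, used here only to evaluate the error rate
u = np.concatenate((np.zeros(50, dtype=int), np.ones(50, dtype=int)))

# Train with the LDA loss and some l2 regularization, then predict on target
clf = TargetContrastivePessimisticClassifier(loss='lda', l2=0.1)
clf.fit(X, y, Z)
preds = clf.predict(Z)

print('Target error rate:', clf.error_rate(preds, u))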