Completed
Push to master (ed5c34...292ca9) by Dafne van, 07:01, created

kNN_accuracy(): rating B

Complexity: 1 condition
Size: 35 lines in total
Duplication: 0 lines (ratio 0 %)
Code coverage: 7 tests, CRAP score 1
Importance: 1 change, 0 bugs, 0 features

Metric  Value
cc      1
c       1
b       0
f       0
dl      0
loc     35
ccs     7
cts     7
cp      1
crap    1
rs      8.8571
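The CRAP score reported above combines cyclomatic complexity with test coverage. Below is a minimal sketch of the calculation. It assumes the commonly cited formula CRAP = comp^2 * (1 - cov)^3 + comp, and it assumes `cp` is coverage expressed as a fraction (1 = 100 %). The helper name `crap_score` is hypothetical.

```python
def crap_score(complexity: int, coverage: float) -> float:
    """Hypothetical helper: CRAP = comp^2 * (1 - cov)^3 + comp.

    coverage is assumed to be a fraction in [0, 1], not a percentage.
    """
    if not 0.0 <= coverage <= 1.0:
        raise ValueError("coverage must be a fraction between 0 and 1")
    return complexity ** 2 * (1.0 - coverage) ** 3 + complexity


# kNN_accuracy(): complexity 1, fully covered (cp = 1)
print(crap_score(1, 1.0))  # 1.0
# An untested function with complexity 5 would score 5**2 * 1 + 5
print(crap_score(5, 0.0))  # 30.0
```

With full coverage the cubic term vanishes, so the score falls back to the complexity itself. This is consistent with the value 1 shown for kNN_accuracy().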
"""
Summary:
Function generate_models from modelgen.py generates and compiles models.
Function train_models_on_samples trains those models.
Function plotTrainingProcess plots the training process.
Function find_best_architecture is a wrapper function that combines
these steps.
Function kNN_accuracy computes a k-nearest-neighbors baseline accuracy.
Example function calls are in 'EvaluateDifferentModels.ipynb'.
"""
import warnings

import numpy as np
from matplotlib import pyplot as plt
from sklearn import metrics, neighbors

from . import modelgen


def train_models_on_samples(X_train, y_train, X_val, y_val, models,
                            nr_epochs=5, subset_size=100, verbose=True):
    """
    Given a list of compiled models, this function trains
    them all on a subset of the train data. If the given subset size is
    larger than the number of training samples, the complete data set is used.

    Parameters
    ----------
    X_train : numpy array of shape (num_samples, num_timesteps, num_channels)
        The input dataset for training
    y_train : numpy array of shape (num_samples, num_classes)
        The output classes for the train data, in binary format
    X_val : numpy array of shape (num_samples_val, num_timesteps, num_channels)
        The input dataset for validation
    y_val : numpy array of shape (num_samples_val, num_classes)
        The output classes for the validation data, in binary format
    models : list of (model, params, model_type) tuples
        List of compiled Keras models to train, with their hyperparameters
        and model types
    nr_epochs : int, optional
        Number of epochs to use for training each model
    subset_size : int, optional
        Number of samples from the training set to use for training
        these models
    verbose : bool, optional
        Flag for displaying verbose output

    Returns
    -------
    histories : list of Keras History objects
        Train histories for all models
    val_accuracies : list of floats
        Validation accuracies of the models (final epoch)
    val_losses : list of floats
        Validation losses of the models (final epoch)
    """
    # Slicing never raises an IndexError: if subset_size exceeds the number
    # of samples, the full training set is returned.
    X_train_sub = X_train[:subset_size, :, :]
    y_train_sub = y_train[:subset_size, :]

    histories = []
    val_accuracies = []
    val_losses = []
    for model, _, _ in models:
        history = model.fit(X_train_sub, y_train_sub,
                            epochs=nr_epochs, batch_size=20,
                            # validation always uses the full validation set
                            validation_data=(X_val, y_val),
                            verbose=verbose)
        histories.append(history)
        val_accuracies.append(history.history['val_acc'][-1])
        val_losses.append(history.history['val_loss'][-1])

    return histories, val_accuracies, val_losses
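The subset selection in train_models_on_samples relies on NumPy slicing. NumPy clips an out-of-range stop index instead of raising an error. The small sketch below uses hypothetical random data (30 samples, 10 timesteps, 3 channels, 2 one-hot classes) to show both cases:

```python
import numpy as np

rng = np.random.default_rng(42)
# Hypothetical training data: 30 samples, 10 timesteps, 3 channels
X_train = rng.random((30, 10, 3))
# Hypothetical one-hot labels for 2 classes
y_train = np.eye(2, dtype=int)[rng.integers(0, 2, size=30)]

shapes = {}
for subset_size in (10, 100):
    X_train_sub = X_train[:subset_size, :, :]
    y_train_sub = y_train[:subset_size, :]
    shapes[subset_size] = (X_train_sub.shape, y_train_sub.shape)
    print(subset_size, X_train_sub.shape, y_train_sub.shape)
# 10 (10, 10, 3) (10, 2)
# 100 (30, 10, 3) (30, 2)
```

The number of samples actually used for training is therefore min(subset_size, num_samples).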


def plotTrainingProcess(history, name='Model', ax=None):
    """
    This function plots the loss and accuracy on the train and validation set,
    for each epoch in the history of one model.

    Parameters
    ----------
    history : keras History object for one model
        The history object of the training process corresponding to one model
    name : str, optional
        Title of the plot
    ax : matplotlib Axes, optional
        Axes to draw on; a new figure and axes are created if None
    """
    if ax is None:
        _, ax = plt.subplots()
    ax2 = ax.twinx()  # second y-axis for accuracy, sharing the epoch x-axis
    epochs = range(len(history.history['val_loss']))
    val_loss, = ax.plot(epochs, history.history['val_loss'], 'g--',
                        label='validation loss')
    train_loss, = ax.plot(epochs, history.history['loss'], 'g-',
                          label='train loss')
    val_acc, = ax2.plot(epochs, history.history['val_acc'], 'b--',
                        label='validation accuracy')
    train_acc, = ax2.plot(epochs, history.history['acc'], 'b-',
                          label='train accuracy')
    ax.set_xlabel('epoch')
    ax.set_ylabel('loss', color='g')
    ax2.set_ylabel('accuracy', color='b')
    # Draw legend and title on the given axes rather than on pyplot's
    # "current" axes, which may belong to a different figure.
    ax2.legend(handles=[val_loss, train_loss, val_acc, train_acc],
               loc=2, bbox_to_anchor=(1.1, 1))
    ax.set_title(name)
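plotTrainingProcess accepts an optional `ax`, so several models can be drawn into one figure for direct comparison. The comment in find_best_architecture says this is ultimately wanted. The sketch below reproduces the same twin-axis layout for two models side by side. It makes the following assumptions:

- A synthetic `history` dict stands in for a real Keras History. Both the dict and its values are hypothetical.
- The non-interactive Agg backend is used, so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; no window is opened
import matplotlib.pyplot as plt

num_epochs = 5
epochs = range(num_epochs)
# Synthetic curves standing in for Keras' History.history dict
history = {
    'loss': [1.0 / (e + 1) for e in epochs],
    'val_loss': [1.2 / (e + 1) for e in epochs],
    'acc': [0.5 + 0.1 * e for e in epochs],
    'val_acc': [0.45 + 0.1 * e for e in epochs],
}

fig, axes = plt.subplots(1, 2, figsize=(12, 4))
for ax, name in zip(axes, ['model 0', 'model 1']):
    ax2 = ax.twinx()  # accuracy gets its own y-axis on the right
    handles = [
        ax.plot(epochs, history['val_loss'], 'g--',
                label='validation loss')[0],
        ax.plot(epochs, history['loss'], 'g-', label='train loss')[0],
        ax2.plot(epochs, history['val_acc'], 'b--',
                 label='validation accuracy')[0],
        ax2.plot(epochs, history['acc'], 'b-', label='train accuracy')[0],
    ]
    ax.set_xlabel('epoch')
    ax.set_ylabel('loss', color='g')
    ax2.set_ylabel('accuracy', color='b')
    # Legend inside each panel so it does not overlap the neighbouring one
    ax2.legend(handles=handles, loc='upper left')
    ax.set_title(name)
```

The legend is placed inside each panel here. An outside anchor such as `bbox_to_anchor=(1.1, 1)` would spill into the adjacent subplot.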


def find_best_architecture(X_train, y_train, X_val, y_val, verbose=True,
                           number_of_models=5, nr_epochs=5, subset_size=100,
                           **kwargs):
    """
    Tries out a number of models on a subsample of the data,
    and outputs the best found architecture and hyperparameters.

    Parameters
    ----------
    X_train : numpy array of shape (num_samples, num_timesteps, num_channels)
        The input dataset for training
    y_train : numpy array of shape (num_samples, num_classes)
        The output classes for the train data, in binary format
    X_val : numpy array of shape (num_samples_val, num_timesteps, num_channels)
        The input dataset for validation
    y_val : numpy array of shape (num_samples_val, num_classes)
        The output classes for the validation data, in binary format
    verbose : bool, optional
        Flag for displaying verbose output
    number_of_models : int, optional
        The number of models to generate and test
    nr_epochs : int, optional
        The number of epochs that each model is trained
    subset_size : int, optional
        The size of the subset of the data that is used for finding
        the optimal architecture
    **kwargs : key-value parameters
        Parameters for generating the models
        (see docstring for modelgen.generate_models)

    Returns
    -------
    best_model : Keras model
        Best performing model, already trained on a small sample data set.
    best_params : dict
        Dictionary containing the hyperparameters for the best model
    best_model_type : str
        Type of the best model
    knn_acc : float
        Accuracy for kNN prediction on validation set
    """
    models = modelgen.generate_models(X_train.shape, y_train.shape[1],
                                      number_of_models=number_of_models,
                                      **kwargs)
    histories, val_accuracies, _ = train_models_on_samples(
        X_train, y_train, X_val, y_val, models,
        nr_epochs=nr_epochs, subset_size=subset_size, verbose=verbose)
    best_model_index = np.argmax(val_accuracies)
    best_model, best_params, best_model_type = models[best_model_index]
    knn_acc = kNN_accuracy(
        X_train[:subset_size, :, :], y_train[:subset_size, :], X_val, y_val)
    if verbose:
        # For now this makes one plot per model; ultimately we may want all
        # models in one plot to allow for direct comparison.
        for (_, params, _), history in zip(models, histories):
            plotTrainingProcess(history, name=str(params))
        print('Best model: model ', best_model_index)
        print('Model type: ', best_model_type)
        print('Hyperparameters: ', best_params)
        print('Accuracy on validation set: ', val_accuracies[best_model_index])
        print('Accuracy of kNN on validation set: ', knn_acc)

    if val_accuracies[best_model_index] < knn_acc:
        warnings.warn('Best model not better than kNN: ' +
                      str(val_accuracies[best_model_index]) + ' vs ' +
                      str(knn_acc))
    return best_model, best_params, best_model_type, knn_acc
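The selection step at the end of find_best_architecture does two things. It picks the model with the highest final validation accuracy, and it warns when the kNN baseline scores higher. The sketch below isolates just that logic. It uses hypothetical accuracies, placeholder strings instead of Keras models, and placeholder model-type names:

```python
import warnings

import numpy as np

# Hypothetical (model, params, model_type) tuples; strings replace real models
models = [
    ('model_a', {'learning_rate': 0.01}, 'CNN'),
    ('model_b', {'learning_rate': 0.001}, 'DeepConvLSTM'),
    ('model_c', {'learning_rate': 0.1}, 'CNN'),
]
val_accuracies = [0.62, 0.81, 0.74]  # hypothetical final-epoch accuracies
knn_acc = 0.85                        # hypothetical kNN baseline accuracy

best_model_index = int(np.argmax(val_accuracies))  # first index on ties
best_model, best_params, best_model_type = models[best_model_index]

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always')  # make sure the warning is recorded
    if val_accuracies[best_model_index] < knn_acc:
        warnings.warn('Best model not better than kNN: {} vs {}'.format(
            val_accuracies[best_model_index], knn_acc))

print(best_model_index, best_model_type, len(caught))  # 1 DeepConvLSTM 1
```

np.argmax returns the first maximum, so when two models tie on validation accuracy, the earlier one in the list is chosen.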


def kNN_accuracy(X_train, y_train, X_val, y_val, k=1):
    """
    Performs k-nearest neighbors classification and returns the accuracy
    score on the validation set.

    Parameters
    ----------
    X_train : numpy array
        Train set of shape (num_samples, num_timesteps, num_channels)
    y_train : numpy array
        Class labels for train set
    X_val : numpy array
        Validation set of shape (num_samples_val, num_timesteps, num_channels)
    y_val : numpy array
        Class labels for validation set
    k : int, optional
        Number of neighbors to use for classifying

    Returns
    -------
    accuracy : float
        Accuracy score on the validation set
    """
    # Flatten each sample's (num_timesteps, num_channels) window into a single
    # feature vector, since KNeighborsClassifier expects 2D input.
    clf = neighbors.KNeighborsClassifier(n_neighbors=k)
    clf.fit(X_train.reshape(X_train.shape[0], -1), y_train)
    val_predict = clf.predict(X_val.reshape(X_val.shape[0], -1))
    # accuracy_score expects (y_true, y_pred)
    return metrics.accuracy_score(y_val, val_predict)
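kNN_accuracy treats every time step of every channel as a separate feature. The self-contained sketch below runs the same steps (flatten, fit a 1-nearest-neighbor classifier, score) on hypothetical, well-separated synthetic data with one-hot labels. Two scikit-learn behaviours matter here. KNeighborsClassifier accepts 2D one-hot targets as multi-output. accuracy_score then counts a sample as correct only when the whole one-hot row matches.

```python
import numpy as np
from sklearn import metrics, neighbors

rng = np.random.default_rng(0)


def make_split(n_per_class):
    # Hypothetical data: class 0 hovers around 0.0, class 1 around 1.0
    x0 = rng.normal(0.0, 0.05, size=(n_per_class, 10, 3))
    x1 = rng.normal(1.0, 0.05, size=(n_per_class, 10, 3))
    X = np.concatenate([x0, x1])
    labels = np.array([0] * n_per_class + [1] * n_per_class)
    y = np.eye(2, dtype=int)[labels]  # one-hot, as in the docstrings above
    return X, y


X_train, y_train = make_split(20)
X_val, y_val = make_split(5)

clf = neighbors.KNeighborsClassifier(n_neighbors=1)
clf.fit(X_train.reshape(X_train.shape[0], -1), y_train)
val_predict = clf.predict(X_val.reshape(X_val.shape[0], -1))
acc = metrics.accuracy_score(y_val, val_predict)
print(val_predict.shape, acc)  # (10, 2) 1.0
```

The Euclidean distance compares two windows point by point and does not exploit temporal structure. That is why it works as a simple baseline for the deep models.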