Completed
Push — master ( 81cbcf...c2b817 )
by Christiaan
06:36
created

plotTrainingProcess()   B

Complexity
  Conditions: 2

Size
  Total Lines: 33

Duplication
  Lines: 0
  Ratio: 0 %

Code Coverage
  Tests: 1
  CRAP Score: 5.2029

Importance
  Changes: 2
  Bugs: 0
  Features: 1

Metric Value
cc 2
c 2
b 0
f 1
dl 0
loc 33
ccs 1
cts 14
cp 0.0714
crap 5.2029
rs 8.8571
"""
 Summary:
 Function generate_models from modelgen.py generates and compiles models
 Function train_models_on_samples trains those models
 Function plotTrainingProcess plots the training process
 Function find_best_architecture is a wrapper function that combines
 these steps
 Example function calls in 'EvaluateDifferentModels.ipynb'
"""
import numpy as np
from matplotlib import pyplot as plt
from . import modelgen
from sklearn import neighbors, metrics
import warnings
import json
import os


# Pylint, Coding Style (Naming): the names X_train and X_val do not conform
# to the argument naming conventions ([a-z_][a-z0-9_]{1,30}$).
# This check looks for invalid names for a range of different identifiers.
# You can set regular expressions to which the identifiers must conform if
# the defaults do not match your requirements. If your project includes a
# Pylint configuration file, the settings contained in that file take
# precedence.
def train_models_on_samples(X_train, y_train, X_val, y_val, models,
                            nr_epochs=5, subset_size=100, verbose=True,
                            outputfile=None):
    """
    Given a list of compiled models, this function trains
    them all on a subset of the train data. If the given size of the subset is
    larger than the size of the data, the complete data set is used.

    Parameters
    ----------
    X_train : numpy array of shape (num_samples, num_timesteps, num_channels)
        The input dataset for training
    y_train : numpy array of shape (num_samples, num_classes)
        The output classes for the train data, in binary format
    X_val : numpy array of shape (num_samples_val, num_timesteps, num_channels)
        The input dataset for validation
    y_val : numpy array of shape (num_samples_val, num_classes)
        The output classes for the validation data, in binary format
    models : list of (model, params, model_type) tuples
        List of keras models to train
    nr_epochs : int, optional
        nr of epochs to use for training one model
    subset_size : int, optional
        The number of samples used from the complete train set
    verbose : bool, optional
        flag for displaying verbose output
    outputfile : str, optional
        File location to store the model results

    Returns
    ----------
    histories : list of Keras History objects
        train histories for all models
    val_accuracies : list of floats
        validation accuracies of the models
    val_losses : list of floats
        validation losses of the models
    """
    # if subset_size is larger than the number of samples in X_train,
    # slicing still works and returns all samples
    # Pylint, Coding Style: exactly one space required after comma
    # (two occurrences in the X_train slice, one in the y_train slice).
    # Pylint, Coding Style (Naming): the name X_train_sub does not conform to
    # the variable naming conventions ([a-z_][a-z0-9_]{1,30}$).
    X_train_sub = X_train[:subset_size,:,:]
    y_train_sub = y_train[:subset_size,:]

    histories = []
    val_accuracies = []
    val_losses = []
    for i, (model, params, model_types) in enumerate(models):
        if verbose:
            print('Training model %d'%i, model_types)
        history = model.fit(X_train_sub, y_train_sub,
                            nb_epoch=nr_epochs, batch_size=20,
                            # see comment on subset_size
                            validation_data=(X_val, y_val),
                            verbose=verbose)
        histories.append(history)
        val_accuracies.append(history.history['val_acc'][-1])
        val_losses.append(history.history['val_loss'][-1])
        if outputfile is not None:
            storetrainhist2json(params, model_types,
                                history.history, outputfile)
    return histories, val_accuracies, val_losses


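The subset selection in train_models_on_samples relies on slice clipping: a slice whose stop exceeds the length returns everything that is available instead of raising an error. The following is a minimal sketch with plain Python lists (NumPy arrays clip the same way along each axis). The helper name `take_subset` and the sample values are hypothetical and not part of the reviewed module.

```python
def take_subset(samples, subset_size):
    """Return the first subset_size samples, or all of them if fewer exist."""
    return samples[:subset_size]


# Three made-up samples, purely for illustration.
samples = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]

small = take_subset(samples, 2)      # keeps the first two samples
clipped = take_subset(samples, 100)  # stop is clipped to len(samples)

print(len(small), len(clipped))
```

Because the clipping is silent, a caller who needs to know whether the full data set was used would have to compare `subset_size` with the number of samples explicitly.
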
def storetrainhist2json(params, model_type, history, outputfile):
    """
    This function stores the model parameters, the loss and accuracy history
    of one model in a JSON file. It appends the model information to the
    existing models in the file.

    Parameters
    ----------
    params : dict
        parameters for one model
    model_type : str
        type of the model
    history : dict
        training history from one model
    outputfile : str
        path where the json file needs to be stored
    """
    jsondata = params.copy()
    for k in jsondata.keys():
        if isinstance(jsondata[k], np.ndarray):
            jsondata[k] = jsondata[k].tolist()
    jsondata['train_acc'] = history['acc']
    jsondata['train_loss'] = history['loss']
    jsondata['val_acc'] = history['val_acc']
    jsondata['val_loss'] = history['val_loss']
    jsondata['modeltype'] = model_type
    if os.path.isfile(outputfile):
        with open(outputfile, 'r') as outfile:
            previousdata = json.load(outfile)
    else:
        previousdata = []
    previousdata.append(jsondata)
    # Pylint, Coding Style: no space allowed around keyword argument
    # assignment (sort_keys = True, indent = 4).
    with open(outputfile, 'w') as outfile:
        json.dump(previousdata, outfile, sort_keys = True,
                  indent = 4, ensure_ascii=False)


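The read, append, rewrite pattern used by storetrainhist2json can be sketched in isolation with only the standard library. The helper name `append_record`, the file name and the record contents below are assumptions made for illustration; they are not taken from the reviewed module.

```python
import json
import os
import tempfile


def append_record(path, record):
    """Append one dict to the JSON list stored at path, creating it if needed."""
    if os.path.isfile(path):
        with open(path, 'r') as infile:
            records = json.load(infile)
    else:
        records = []
    records.append(record)
    # The whole list is rewritten on every call.
    with open(path, 'w') as outfile:
        json.dump(records, outfile, sort_keys=True, indent=4)
    return records


# Hypothetical output location and made-up records.
path = os.path.join(tempfile.mkdtemp(), 'results.json')
append_record(path, {'modeltype': 'model_a', 'val_acc': [0.5, 0.6]})
stored = append_record(path, {'modeltype': 'model_b', 'val_acc': [0.4, 0.7]})

print(len(stored))
```

Since the file is read and rewritten in full on each call, the cost grows with the number of stored models; for the handful of models compared here that is unlikely to matter.
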
# Pylint, Coding Style (Naming): the name plotTrainingProcess does not conform
# to the function naming conventions ([a-z_][a-z0-9_]{2,30}$).
def plotTrainingProcess(history, name='Model', ax=None):
    """
    This function plots the loss and accuracy on the train and validation set,
    for each epoch in the history of one model.

    Parameters
    ----------
    history : keras History object
        The history object of the training process corresponding to one model
    name : str
        Name of the model, to display in the title
    ax : Axis, optional
        Specific axis to plot on

    """
    if ax is None:
        fig, ax = plt.subplots()
    ax2 = ax.twinx()
    # Pylint, Coding Style (Naming): the name LN does not conform to the
    # variable naming conventions ([a-z_][a-z0-9_]{1,30}$).
    LN = len(history.history['val_loss'])
    val_loss, = ax.plot(range(LN), history.history['val_loss'], 'g--',
                        label='validation loss')
    train_loss, = ax.plot(range(LN), history.history['loss'], 'g-',
                          label='train loss')
    val_acc, = ax2.plot(range(LN), history.history['val_acc'], 'b--',
                        label='validation accuracy')
    train_acc, = ax2.plot(range(LN), history.history['acc'], 'b-',
                          label='train accuracy')
    ax.set_xlabel('epoch')
    ax.set_ylabel('loss', color='g')
    ax2.set_ylabel('accuracy', color='b')
    plt.legend(handles=[val_loss, train_loss, val_acc, train_acc],
               loc=2, bbox_to_anchor=(1.1, 1))
    plt.title(name)


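The naming checks reported in this review quote two regular expressions. The sketch below tests a few names against them, assuming the pattern is applied from the first character of the name (as `re.match` does). The snake_case alternatives `plot_training_process` and `x_train` are hypothetical renamings used only to show a conforming name; the review does not propose them.

```python
import re

# Patterns quoted in the Pylint messages of this review.
FUNCTION_RGX = re.compile(r'[a-z_][a-z0-9_]{2,30}$')
ARGUMENT_RGX = re.compile(r'[a-z_][a-z0-9_]{1,30}$')


def conforms(name, pattern):
    """True if name matches pattern from its first character onward."""
    return pattern.match(name) is not None


checks = {
    'plotTrainingProcess': conforms('plotTrainingProcess', FUNCTION_RGX),
    'plot_training_process': conforms('plot_training_process', FUNCTION_RGX),
    'X_train': conforms('X_train', ARGUMENT_RGX),
    'x_train': conforms('x_train', ARGUMENT_RGX),
}

for name, ok in sorted(checks.items()):
    print(name, ok)
```

Names such as X_train follow a common machine-learning convention for matrices, so adjusting the regular expression in a Pylint configuration file may be a more natural fix than renaming.
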
# Pylint, Coding Style (Naming): the names X_train and X_val do not conform
# to the argument naming conventions ([a-z_][a-z0-9_]{1,30}$).
def find_best_architecture(X_train, y_train, X_val, y_val, verbose=True,
                           number_of_models=5, nr_epochs=5, subset_size=100,
                           outputpath=None, **kwargs
                           ):
    """
    Tries out a number of models on a subsample of the data,
    and outputs the best found architecture and hyperparameters.

    Parameters
    ----------
    X_train : numpy array of shape (num_samples, num_timesteps, num_channels)
        The input dataset for training
    y_train : numpy array of shape (num_samples, num_classes)
        The output classes for the train data, in binary format
    X_val : numpy array of shape (num_samples_val, num_timesteps, num_channels)
        The input dataset for validation
    y_val : numpy array of shape (num_samples_val, num_classes)
        The output classes for the validation data, in binary format
    verbose : bool, optional
        flag for displaying verbose output
    number_of_models : int
        The number of models to generate and test
    nr_epochs : int
        The number of epochs that each model is trained
    subset_size : int
        The size of the subset of the data that is used for finding
        the optimal architecture
    outputpath : str, optional
        File location to store the model results
    **kwargs: key-value parameters
        parameters for generating the models
        (see docstring for modelgen.generate_models)

    Returns
    ----------
    best_model : Keras model
        Best performing model, already trained on a small sample data set.
    best_params : dict
        Dictionary containing the hyperparameters for the best model
    best_model_type : str
        Type of the best model
    knn_acc : float
        accuracy for kNN prediction on validation set
    """
    models = modelgen.generate_models(X_train.shape, y_train.shape[1],
                                      number_of_models=number_of_models,
                                      **kwargs)
    # Pylint, Coding Style: the subset_size, verbose and outputfile argument
    # lines below are too long as per the coding-style (92/79, 84/79, 90/79).
    histories, val_accuracies, val_losses = train_models_on_samples(X_train,
                                                                    y_train,
                                                                    X_val,
                                                                    y_val,
                                                                    models,
                                                                    nr_epochs,
                                                                    subset_size=subset_size,
                                                                    verbose=verbose,
                                                                    outputfile=outputpath)
    best_model_index = np.argmax(val_accuracies)
    best_model, best_params, best_model_type = models[best_model_index]
    # Pylint, Coding Style: exactly one space required after comma
    # (three occurrences in the slices below).
    knn_acc = kNN_accuracy(
        X_train[:subset_size,:,:], y_train[:subset_size,:], X_val, y_val)
    if verbose:
        for i in range(len(models)):  # now one plot per model, ultimately we
            # may want all models in one plot to allow for direct comparison
            name = str(models[i][1])
            plotTrainingProcess(histories[i], name)
        print('Best model: model ', best_model_index)
        print('Model type: ', best_model_type)
        print('Hyperparameters: ', best_params)
        print('Accuracy on validation set: ', val_accuracies[best_model_index])
        print('Accuracy of kNN on validation set', knn_acc)

    if val_accuracies[best_model_index] < knn_acc:
        warnings.warn('Best model not better than kNN: ' +
                      str(val_accuracies[best_model_index]) + ' vs  ' +
                      str(knn_acc)
                      )
    return best_model, best_params, best_model_type, knn_acc


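The selection step of find_best_architecture (take the model with the highest validation accuracy, then warn if it does not beat the kNN baseline) can be sketched without Keras or NumPy. The helper name `pick_best` and the accuracy values are made up for illustration; `max` with a key returns the first maximal index, which is also what `np.argmax` does on ties.

```python
import warnings


def pick_best(val_accuracies, baseline_acc):
    """Return the index of the highest accuracy; warn if it is below baseline."""
    best_index = max(range(len(val_accuracies)),
                     key=val_accuracies.__getitem__)
    if val_accuracies[best_index] < baseline_acc:
        warnings.warn('Best model not better than baseline: %s vs %s'
                      % (val_accuracies[best_index], baseline_acc))
    return best_index


# Record warnings so the example can show that one was raised.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always')
    best = pick_best([0.61, 0.74, 0.58], baseline_acc=0.80)

print(best, len(caught))
```

The warning does not stop execution: the best neural model is still returned, and it is left to the caller to decide whether a simple baseline is the better choice.
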
# Pylint, Coding Style (Naming): the name kNN_accuracy does not conform to the
# function naming conventions ([a-z_][a-z0-9_]{2,30}$); the names X_train and
# X_val do not conform to the argument naming conventions
# ([a-z_][a-z0-9_]{1,30}$).
def kNN_accuracy(X_train, y_train, X_val, y_val, k=1):
    """
    Performs k-Nearest Neighbors and returns the accuracy score.

    Parameters
    ----------
    X_train : numpy array
        Train set of shape (num_samples, num_timesteps, num_channels)
    y_train : numpy array
        Class labels for train set
    X_val : numpy array
        Validation set of shape (num_samples, num_timesteps, num_channels)
    y_val : numpy array
        Class labels for validation set
    k : int
        number of neighbors to use for classifying

    Returns
    -------
    accuracy: float
        accuracy score on the validation set
    """
    num_samples, num_timesteps, num_channels = X_train.shape
    clf = neighbors.KNeighborsClassifier(k)
    clf.fit(
        X_train.reshape(
            num_samples,
            num_timesteps *
            num_channels),
        y_train)
    num_samples, num_timesteps, num_channels = X_val.shape
    val_predict = clf.predict(
        X_val.reshape(num_samples,
                      num_timesteps * num_channels))
    return metrics.accuracy_score(val_predict, y_val)
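
kNN_accuracy flattens every (num_timesteps, num_channels) sample into one vector before classifying. The sketch below shows that idea for k=1 in pure Python, as a stand-in for the scikit-learn classifier used above and not a replacement for it. The function names, the toy data and the use of integer class labels (instead of the binary format described in the docstrings) are assumptions made to keep the example small.

```python
def flatten(sample):
    """Flatten one (num_timesteps, num_channels) sample into a flat list."""
    return [value for timestep in sample for value in timestep]


def nearest_neighbor_accuracy(train_x, train_y, val_x, val_y):
    """Classify each validation sample by its closest training sample."""
    train_flat = [flatten(sample) for sample in train_x]
    correct = 0
    for sample, label in zip(val_x, val_y):
        flat = flatten(sample)
        # Squared Euclidean distance to every flattened training sample.
        distances = [sum((a - b) ** 2 for a, b in zip(flat, candidate))
                     for candidate in train_flat]
        predicted = train_y[distances.index(min(distances))]
        if predicted == label:
            correct += 1
    return correct / float(len(val_y))


# Toy data: 2 timesteps x 2 channels per sample, made up for illustration.
train_x = [[[0.0, 0.0], [0.0, 0.1]],
           [[1.0, 1.0], [0.9, 1.0]]]
train_y = [0, 1]
val_x = [[[0.1, 0.0], [0.0, 0.0]],
         [[1.0, 0.8], [1.0, 1.0]],
         [[0.9, 0.9], [0.2, 0.1]]]
val_y = [0, 1, 0]

accuracy = nearest_neighbor_accuracy(train_x, train_y, val_x, val_y)
print(accuracy)
```

Flattening discards the time structure of the signal, which is why this kind of classifier serves only as a baseline that the trained networks are expected to beat.
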