Completed
Push — master ( a29c6d...c3cf5a )
by Dafne van
11:26
created

storetrainhist2json()   B

Complexity

Conditions 4

Size

Total Lines 29

Duplication

Lines 0
Ratio 0 %

Code Coverage

Tests 14
CRAP Score 4.0312

Importance

Changes 0
Metric  Value
cc      4
dl      0
loc     29
ccs     14
cts     16
cp      0.875
crap    4.0312
rs      8.5806
c       0
b       0
f       0
'''
 Summary:
 Function generate_models from modelgen.py generates and compiles models
 Function train_models_on_samples trains those models
 Function plotTrainingProcess plots the training process
 Function find_best_architecture is a wrapper function that combines
 these steps
 Example function calls in 'EvaluateDifferentModels.ipynb'
'''
import numpy as np
from matplotlib import pyplot as plt
from . import modelgen
from sklearn import neighbors, metrics
import warnings
import json
import os

def train_models_on_samples(X_train, y_train, X_val, y_val, models,
Coding Style (Naming): The name X_train does not conform to the argument naming conventions ([a-z_][a-z0-9_]{1,30}$).

This check looks for invalid names for a range of different identifiers.

You can set regular expressions to which the identifiers must conform if the defaults do not match your requirements.

If your project includes a Pylint configuration file, the settings contained in that file take precedence.

To find out more about Pylint, please refer to their site.

Coding Style (Naming): The name X_val does not conform to the argument naming conventions ([a-z_][a-z0-9_]{1,30}$).
                            nr_epochs=5, subset_size=100, verbose=True,
                            outputfile=None):
    """
    Given a list of compiled models, this function trains
    them all on a subset of the train data. If the given size of the subset is
    larger than the size of the data, the complete data set is used.

    Parameters
    ----------
    X_train : numpy array of shape (num_samples, num_timesteps, num_channels)
        The input dataset for training
    y_train : numpy array of shape (num_samples, num_classes)
        The output classes for the train data, in binary format
    X_val : numpy array of shape (num_samples_val, num_timesteps, num_channels)
        The input dataset for validation
    y_val : numpy array of shape (num_samples_val, num_classes)
        The output classes for the validation data, in binary format
    models : list of (model, params, model_type) tuples
        List of keras models to train
    nr_epochs : int, optional
        nr of epochs to use for training one model
    subset_size : int, optional
        number of samples to use from the training set for training these models
Coding Style: This line is too long as per the coding-style (80/79).

This check looks for lines that are too long. You can specify the maximum line length.
    verbose : bool, optional
        flag for displaying verbose output
    outputfile : str, optional
        path of the JSON file in which the train histories are stored

    Returns
    -------
    histories : list of Keras History objects
        train histories for all models
    val_accuracies : list of floats
        validation accuracies of the models
    val_losses : list of floats
        validation losses of the models
    """
    # if subset_size is larger than the number of samples in X_train,
    # the slice simply returns all samples
    X_train_sub = X_train[:subset_size, :, :]
Coding Style (Naming): The name X_train_sub does not conform to the variable naming conventions ([a-z_][a-z0-9_]{1,30}$).
    y_train_sub = y_train[:subset_size, :]

    histories = []
    val_accuracies = []
    val_losses = []
    for model, params, model_types in models:
        history = model.fit(X_train_sub, y_train_sub,
                            nb_epoch=nr_epochs, batch_size=20,
                            # see the docstring entry for subset_size
                            validation_data=(X_val, y_val),
                            verbose=verbose)
        histories.append(history)
        val_accuracies.append(history.history['val_acc'][-1])
        val_losses.append(history.history['val_loss'][-1])
        if outputfile is not None:
            storetrainhist2json(params, model_types, history.history, outputfile)
Coding Style: This line is too long as per the coding-style (81/79).
    return histories, val_accuracies, val_losses
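
The subset selection in train_models_on_samples depends on NumPy clamping slice bounds that run past the end of an array. The sketch below illustrates only that behaviour; the array shapes are made up for the example and are not taken from the project.

```python
import numpy as np

# Hypothetical toy data: 30 samples, 10 timesteps, 3 channels, 4 classes.
X_train = np.zeros((30, 10, 3))
y_train = np.zeros((30, 4))

# subset_size smaller than the data: only the first 20 samples are kept.
small = X_train[:20, :, :]

# subset_size larger than the data: the slice bound is clamped, so all 30
# samples are returned and no IndexError is raised.
clamped_X = X_train[:100, :, :]
clamped_y = y_train[:100, :]

print(small.shape, clamped_X.shape, clamped_y.shape)
```

This is why the function needs no explicit check on subset_size before slicing.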


def storetrainhist2json(params, model_type, history, outputfile):
    """
    This function stores the model parameters, the loss and accuracy history
    of one model in a JSON file. It appends the model information to the existing models in the file.
Coding Style: This line is too long as per the coding-style (101/79).

    Parameters
    ----------
    params : dictionary with parameters for one model
    model_type : str with the type of one model
    history : dictionary with training history from one model
    outputfile : str of path where the json file needs to be stored

    """
    jsondata = params.copy()
    jsondata['filters'] = jsondata['filters'].tolist()
    jsondata['train_acc'] = history['acc']
    jsondata['train_loss'] = history['loss']
    jsondata['val_acc'] = history['val_acc']
    jsondata['val_loss'] = history['val_loss']
    jsondata['modeltype'] = model_type
    if os.path.isfile(outputfile):
        with open(outputfile, 'r') as outfile:
            previousdata = json.load(outfile)
    else:
        previousdata = []
    previousdata.append(jsondata)
    with open(outputfile, 'w') as outfile:
            json.dump(previousdata, outfile, sort_keys = True, indent = 4,ensure_ascii=False)
Coding Style: This line is too long as per the coding-style (93/79).

Coding Style: The indentation here looks off. 8 spaces were expected, but 12 were found.

Coding Style: No space allowed around keyword argument assignment
json.dump(previousdata, outfile, sort_keys = True, indent = 4,ensure_ascii=False)
                                           ^

Coding Style: No space allowed around keyword argument assignment
json.dump(previousdata, outfile, sort_keys = True, indent = 4,ensure_ascii=False)
                                                          ^

Coding Style: Exactly one space required after comma
json.dump(previousdata, outfile, sort_keys = True, indent = 4,ensure_ascii=False)
                                                             ^
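
storetrainhist2json follows a read, append, rewrite pattern: load the list already in the file if there is one, append the new record, and write the whole list back. The sketch below shows that pattern in isolation, using only the standard library. The helper name append_record and the record contents are hypothetical and chosen for illustration.

```python
import json
import os
import tempfile


def append_record(record, path):
    """Append one dict to the JSON list stored at path (hypothetical helper)."""
    if os.path.isfile(path):
        # The file exists: load the list of records written so far.
        with open(path, 'r') as infile:
            previous = json.load(infile)
    else:
        # First call: start with an empty list.
        previous = []
    previous.append(record)
    # Rewrite the whole file with the extended list.
    with open(path, 'w') as outfile:
        json.dump(previous, outfile, sort_keys=True, indent=4)
    return previous


# Made-up records, written to a temporary directory.
path = os.path.join(tempfile.mkdtemp(), 'histories.json')
append_record({'modeltype': 'model_a', 'val_acc': [0.5, 0.6]}, path)
stored = append_record({'modeltype': 'model_b', 'val_acc': [0.4, 0.7]}, path)
print(len(stored))
```

Because the file is rewritten in full on every call, the cost grows with the number of stored models. The json module cannot serialise NumPy arrays, which is why the function converts params['filters'] with tolist() before dumping.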


def plotTrainingProcess(history, name='Model', ax=None):
Coding Style (Naming): The name plotTrainingProcess does not conform to the function naming conventions ([a-z_][a-z0-9_]{2,30}$).
    """
    This function plots the loss and accuracy on the train and validation set,
    for each epoch in the history of one model.

    Parameters
    ----------
    history : keras History object for one model
        The history object of the training process corresponding to one model
    name : str, optional
        Title of the plot
    ax : matplotlib axes, optional
        Axes to plot on; a new figure is created if None

    """
    if ax is None:
        fig, ax = plt.subplots()
    ax2 = ax.twinx()
    LN = len(history.history['val_loss'])
Coding Style (Naming): The name LN does not conform to the variable naming conventions ([a-z_][a-z0-9_]{1,30}$).
    val_loss, = ax.plot(range(LN), history.history['val_loss'], 'g--',
                        label='validation loss')
    train_loss, = ax.plot(range(LN), history.history['loss'], 'g-',
                          label='train loss')
    val_acc, = ax2.plot(range(LN), history.history['val_acc'], 'b--',
                        label='validation accuracy')
    train_acc, = ax2.plot(range(LN), history.history['acc'], 'b-',
                          label='train accuracy')
    ax.set_xlabel('epoch')
    ax.set_ylabel('loss', color='g')
    ax2.set_ylabel('accuracy', color='b')
    plt.legend(handles=[val_loss, train_loss, val_acc, train_acc],
               loc=2, bbox_to_anchor=(1.1, 1))
    plt.title(name)


def find_best_architecture(X_train, y_train, X_val, y_val, verbose=True,
Coding Style (Naming): The name X_train does not conform to the argument naming conventions ([a-z_][a-z0-9_]{1,30}$).

Coding Style (Naming): The name X_val does not conform to the argument naming conventions ([a-z_][a-z0-9_]{1,30}$).
                           number_of_models=5, nr_epochs=5, subset_size=100,
                           outputpath=None, **kwargs
                           ):
    """
    Tries out a number of models on a subsample of the data,
    and outputs the best found architecture and hyperparameters.

    Parameters
    ----------
    X_train : numpy array of shape (num_samples, num_timesteps, num_channels)
        The input dataset for training
    y_train : numpy array of shape (num_samples, num_classes)
        The output classes for the train data, in binary format
    X_val : numpy array of shape (num_samples_val, num_timesteps, num_channels)
        The input dataset for validation
    y_val : numpy array of shape (num_samples_val, num_classes)
        The output classes for the validation data, in binary format
    verbose : bool, optional
        flag for displaying verbose output
    number_of_models : int
        The number of models to generate and test
    nr_epochs : int
        The number of epochs that each model is trained
    subset_size : int
        The size of the subset of the data that is used for finding the optimal architecture
Coding Style: This line is too long as per the coding-style (92/79).
    outputpath : str, optional
        path of the JSON file in which the train histories are stored
    **kwargs: key-value parameters
        parameters for generating the models (see docstring for modelgen.generate_models)
Coding Style: This line is too long as per the coding-style (89/79).

    Returns
    -------
    best_model : Keras model
        Best performing model, already trained on a small sample data set.
    best_params : dict
        Dictionary containing the hyperparameters for the best model
    best_model_type : str
        Type of the best model
    knn_acc : float
        accuracy for kNN prediction on validation set
    """
    models = modelgen.generate_models(X_train.shape, y_train.shape[1],
                                      number_of_models=number_of_models,
                                      **kwargs)
    histories, val_accuracies, val_losses = train_models_on_samples(X_train,
                                                                    y_train,
                                                                    X_val,
                                                                    y_val,
                                                                    models,
                                                                    nr_epochs,
                                                                    subset_size=subset_size,
Coding Style: This line is too long as per the coding-style (92/79).
                                                                    verbose=verbose,
Coding Style: This line is too long as per the coding-style (84/79).
                                                                    outputfile=outputpath)
Coding Style: This line is too long as per the coding-style (90/79).
    best_model_index = np.argmax(val_accuracies)
    best_model, best_params, best_model_type = models[best_model_index]
    knn_acc = kNN_accuracy(
        X_train[:subset_size, :, :], y_train[:subset_size, :], X_val, y_val)
    if verbose:
        for i in range(len(models)):  # <= now one plot per model, ultimately we
Coding Style: This line is too long as per the coding-style (80/79).
            # may want all models in one plot to allow for direct comparison
            name = str(models[i][1])
            plotTrainingProcess(histories[i], name)
        print('Best model: model ', best_model_index)
        print('Model type: ', best_model_type)
        print('Hyperparameters: ', best_params)
        print('Accuracy on validation set: ', val_accuracies[best_model_index])
        print('Accuracy of kNN on validation set', knn_acc)

    if val_accuracies[best_model_index] < knn_acc:
        warnings.warn('Best model not better than kNN: ' +
                      str(val_accuracies[best_model_index]) + ' vs  ' +
                      str(knn_acc)
                      )
    return best_model, best_params, best_model_type, knn_acc
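
find_best_architecture picks the model with the highest final validation accuracy and warns when that model does not beat the kNN baseline. The sketch below reproduces only that selection step with made-up accuracy values. It uses a plain-Python stand-in for np.argmax so that it has no dependencies.

```python
import warnings

# Made-up validation accuracies for three candidate models and a kNN baseline.
val_accuracies = [0.62, 0.71, 0.68]
knn_acc = 0.75

# Index of the highest validation accuracy (stand-in for np.argmax).
best_model_index = max(range(len(val_accuracies)),
                       key=val_accuracies.__getitem__)

# Record warnings instead of printing them, so the result can be inspected.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always')
    if val_accuracies[best_model_index] < knn_acc:
        warnings.warn('Best model not better than kNN: ' +
                      str(val_accuracies[best_model_index]) + ' vs  ' +
                      str(knn_acc))

print(best_model_index, len(caught))
```

With these values the second model is selected, and one warning is recorded because 0.71 is below the baseline of 0.75.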


def kNN_accuracy(X_train, y_train, X_val, y_val, k=1):
Coding Style (Naming): The name kNN_accuracy does not conform to the function naming conventions ([a-z_][a-z0-9_]{2,30}$).

Coding Style (Naming): The name X_train does not conform to the argument naming conventions ([a-z_][a-z0-9_]{1,30}$).

Coding Style (Naming): The name X_val does not conform to the argument naming conventions ([a-z_][a-z0-9_]{1,30}$).
    """
    Performs k-Nearest Neighbors and returns the accuracy score.

    Parameters
    ----------
    X_train : numpy array
        Train set of shape (num_samples, num_timesteps, num_channels)
    y_train : numpy array
        Class labels for train set
    X_val : numpy array
        Validation set of shape (num_samples, num_timesteps, num_channels)
    y_val : numpy array
        Class labels for validation set
    k : int
        number of neighbors to use for classifying

    Returns
    -------
    accuracy: float
        accuracy score on the validation set
    """
    num_samples, num_timesteps, num_channels = X_train.shape
    clf = neighbors.KNeighborsClassifier(k)
    clf.fit(
        X_train.reshape(
            num_samples,
            num_timesteps *
            num_channels),
        y_train)
    num_samples, num_timesteps, num_channels = X_val.shape
    val_predict = clf.predict(
        X_val.reshape(num_samples,
                      num_timesteps * num_channels))
    # accuracy_score expects the true labels first, then the predictions
    return metrics.accuracy_score(y_val, val_predict)
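
kNN_accuracy flattens each (num_timesteps, num_channels) series into a single feature vector before fitting the classifier. The sketch below illustrates that flattening on made-up data. To stay dependency-light it uses a small NumPy 1-nearest-neighbour lookup as a stand-in for scikit-learn's KNeighborsClassifier, so it is not the code path used by the function itself.

```python
import numpy as np

# Made-up data: 4 training and 2 validation series, 3 timesteps, 2 channels.
X_train = np.array([[[0., 0.]] * 3, [[0., 1.]] * 3,
                    [[5., 5.]] * 3, [[5., 6.]] * 3])
y_train = np.array([[1, 0], [1, 0], [0, 1], [0, 1]])  # one-hot labels
X_val = np.array([[[0., 0.2]] * 3, [[5., 5.2]] * 3])
y_val = np.array([[1, 0], [0, 1]])

# Flatten (num_samples, num_timesteps, num_channels) to (num_samples, features).
flat_train = X_train.reshape(X_train.shape[0], -1)
flat_val = X_val.reshape(X_val.shape[0], -1)

# 1-nearest neighbour by Euclidean distance (NumPy stand-in, not scikit-learn).
dists = np.linalg.norm(flat_val[:, None, :] - flat_train[None, :, :], axis=2)
val_predict = y_train[dists.argmin(axis=1)]

# A prediction counts as correct only if the whole one-hot row matches.
accuracy = np.mean(np.all(val_predict == y_val, axis=1))
print(flat_train.shape, accuracy)
```

Flattening discards the distinction between the time and channel axes, so the baseline compares series point by point.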