Completed
Push — master ( 5ff7a9...7751f9 )
by Dafne van
09:39
created

storetrainhist2json()   B

Complexity

Conditions 6

Size

Total Lines 31

Duplication

Lines 0
Ratio 0 %

Code Coverage

Tests 16
CRAP Score 6.0493

Importance

Changes 0
Metric  Value
cc      6
dl      0
loc     31
ccs     16
cts     18
cp      0.8889
crap    6.0493
rs      7.5384
c       0
b       0
f       0
"""
 Summary:
 Function generate_models from modelgen.py generates and compiles models
 Function train_models_on_samples trains those models
 Function plotTrainingProcess plots the training process
 Function find_best_architecture is a wrapper function that combines
 these steps
 Example function calls in 'EvaluateDifferentModels.ipynb'
"""
import numpy as np
from matplotlib import pyplot as plt
from . import modelgen
from sklearn import neighbors, metrics
import warnings
import json
import os

# Pylint naming notes: X_train, X_val and X_train_sub do not conform to the
# default naming conventions ([a-z_][a-z0-9_]{1,30}$). Settings in a project
# Pylint configuration file take precedence over these defaults.
def train_models_on_samples(X_train, y_train, X_val, y_val, models,
                            nr_epochs=5, subset_size=100, verbose=True,
                            outputfile=None):
    """
    Given a list of compiled models, this function trains
    them all on a subset of the train data. If the given size of the subset is
    larger than the size of the data, the complete data set is used.

    Parameters
    ----------
    X_train : numpy array of shape (num_samples, num_timesteps, num_channels)
        The input dataset for training
    y_train : numpy array of shape (num_samples, num_classes)
        The output classes for the train data, in binary format
    X_val : numpy array of shape (num_samples_val, num_timesteps, num_channels)
        The input dataset for validation
    y_val : numpy array of shape (num_samples_val, num_classes)
        The output classes for the validation data, in binary format
    models : list of (model, params, model_type) tuples
        List of keras models to train
    nr_epochs : int, optional
        nr of epochs to use for training one model
    subset_size : int, optional
        number of samples to use from the training set for training these
        models
    verbose : bool, optional
        flag for displaying verbose output
    outputfile : str, optional
        path of the JSON file in which the training history of each model
        is stored; nothing is stored if None

    Returns
    -------
    histories : list of Keras History objects
        train histories for all models
    val_accuracies : list of floats
        validation accuracies of the models
    val_losses : list of floats
        validation losses of the models
    """
    # Slicing past the end is safe: if subset_size exceeds the number of
    # samples, the complete arrays are used.
    X_train_sub = X_train[:subset_size, :, :]
    y_train_sub = y_train[:subset_size, :]

    histories = []
    val_accuracies = []
    val_losses = []
    for i, (model, params, model_types) in enumerate(models):
        if verbose:
            print('Training model %d' % i, model_types)
        history = model.fit(X_train_sub, y_train_sub,
                            nb_epoch=nr_epochs, batch_size=20,
                            # see subset_size in the docstring
                            validation_data=(X_val, y_val),
                            verbose=verbose)
        histories.append(history)
        val_accuracies.append(history.history['val_acc'][-1])
        val_losses.append(history.history['val_loss'][-1])
        if outputfile is not None:
            storetrainhist2json(params, model_types, history.history,
                                outputfile)
    return histories, val_accuracies, val_losses


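The subset selection in train_models_on_samples relies on NumPy slicing, which clips at the end of the array instead of raising an error. The following is a minimal sketch of that behaviour only; the helper name take_subset and the toy arrays are hypothetical and are not part of the reviewed module.

```python
import numpy as np


def take_subset(X, y, subset_size):
    """Return the first subset_size samples, or all samples if fewer exist.

    Hypothetical helper that mirrors the slicing used in
    train_models_on_samples; it is not part of the reviewed module.
    """
    return X[:subset_size, :, :], y[:subset_size, :]


# Toy data (assumed shapes): 30 samples, 4 timesteps, 2 channels, 3 classes.
X_demo = np.zeros((30, 4, 2))
y_demo = np.zeros((30, 3))

# A subset smaller than the data keeps only the first 10 samples.
X_small, y_small = take_subset(X_demo, y_demo, 10)
# A subset larger than the data silently falls back to all 30 samples.
X_all, y_all = take_subset(X_demo, y_demo, 100)

print(X_small.shape, X_all.shape)
```

Because the slice takes the first samples in order, the subset is only representative if the training data has been shuffled beforehand.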
def storetrainhist2json(params, model_type, history, outputfile):
    """
    This function stores the model parameters, the loss and accuracy history
    of one model in a JSON file. It appends the model information to the
    existing models in the file.

    Parameters
    ----------
    params : dictionary with parameters for one model
    model_type : str with the type of one model
    history : dictionary with training history from one model
    outputfile : str of path where the json file needs to be stored

    """
    jsondata = params.copy()
    # numpy arrays are not JSON serializable, so convert them to lists
    for k in jsondata.keys():
        if isinstance(jsondata[k], np.ndarray):
            jsondata[k] = jsondata[k].tolist()
    jsondata['train_acc'] = history['acc']
    jsondata['train_loss'] = history['loss']
    jsondata['val_acc'] = history['val_acc']
    jsondata['val_loss'] = history['val_loss']
    jsondata['modeltype'] = model_type
    # load the models stored so far, if the file already exists
    if os.path.isfile(outputfile):
        with open(outputfile, 'r') as outfile:
            previousdata = json.load(outfile)
    else:
        previousdata = []
    previousdata.append(jsondata)
    with open(outputfile, 'w') as outfile:
        json.dump(previousdata, outfile, sort_keys=True, indent=4,
                  ensure_ascii=False)


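The read, append, write pattern used by storetrainhist2json can be tried in isolation with the standard library. This is a minimal sketch under assumptions: the helper append_json_record, the file name and the record contents are hypothetical placeholders, not values taken from the project.

```python
import json
import os
import tempfile


def append_json_record(record, path):
    """Append one dict to the JSON list stored at path.

    Hypothetical helper: the file is created on the first call and
    rewritten completely on every later call.
    """
    if os.path.isfile(path):
        with open(path, 'r') as infile:
            records = json.load(infile)
    else:
        records = []
    records.append(record)
    with open(path, 'w') as outfile:
        json.dump(records, outfile, sort_keys=True, indent=4)
    return records


# Write two placeholder records to a file in a temporary directory.
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, 'models.json')
append_json_record({'modeltype': 'type_a', 'val_acc': [0.5, 0.6]}, demo_path)
stored = append_json_record({'modeltype': 'type_b', 'val_acc': [0.4]},
                            demo_path)

# Read the file back to confirm that both records are present.
with open(demo_path, 'r') as infile:
    reloaded = json.load(infile)
print(len(reloaded))
```

Since the whole file is read and rewritten for every model, this pattern suits a modest number of models; it is not safe for several processes writing to the same file at once.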
# Pylint naming note: plotTrainingProcess does not conform to the function
# naming conventions ([a-z_][a-z0-9_]{2,30}$).
def plotTrainingProcess(history, name='Model', ax=None):
    """
    This function plots the loss and accuracy on the train and validation set,
    for each epoch in the history of one model.

    Parameters
    ----------
    history : keras History object for one model
        The history object of the training process corresponding to one model
    name : str, optional
        Title of the plot
    ax : matplotlib axes, optional
        Axes to plot the loss on; a new figure is created if None

    """
    if ax is None:
        fig, ax = plt.subplots()
    # second y-axis, so that loss and accuracy each get their own scale
    ax2 = ax.twinx()
    num_epochs = len(history.history['val_loss'])
    val_loss, = ax.plot(range(num_epochs), history.history['val_loss'],
                        'g--', label='validation loss')
    train_loss, = ax.plot(range(num_epochs), history.history['loss'], 'g-',
                          label='train loss')
    val_acc, = ax2.plot(range(num_epochs), history.history['val_acc'],
                        'b--', label='validation accuracy')
    train_acc, = ax2.plot(range(num_epochs), history.history['acc'], 'b-',
                          label='train accuracy')
    ax.set_xlabel('epoch')
    ax.set_ylabel('loss', color='g')
    ax2.set_ylabel('accuracy', color='b')
    plt.legend(handles=[val_loss, train_loss, val_acc, train_acc],
               loc=2, bbox_to_anchor=(1.1, 1))
    plt.title(name)


def find_best_architecture(X_train, y_train, X_val, y_val, verbose=True,
                           number_of_models=5, nr_epochs=5, subset_size=100,
                           outputpath=None, **kwargs
                           ):
    """
    Tries out a number of models on a subsample of the data,
    and outputs the best found architecture and hyperparameters.

    Parameters
    ----------
    X_train : numpy array of shape (num_samples, num_timesteps, num_channels)
        The input dataset for training
    y_train : numpy array of shape (num_samples, num_classes)
        The output classes for the train data, in binary format
    X_val : numpy array of shape (num_samples_val, num_timesteps, num_channels)
        The input dataset for validation
    y_val : numpy array of shape (num_samples_val, num_classes)
        The output classes for the validation data, in binary format
    verbose : bool, optional
        flag for displaying verbose output
    number_of_models : int
        The number of models to generate and test
    nr_epochs : int
        The number of epochs that each model is trained
    subset_size : int
        The size of the subset of the data that is used for finding the
        optimal architecture
    outputpath : str, optional
        path of the JSON file in which the training history of each model
        is stored; nothing is stored if None
    **kwargs: key-value parameters
        parameters for generating the models
        (see docstring for modelgen.generate_models)

    Returns
    -------
    best_model : Keras model
        Best performing model, already trained on a small sample data set.
    best_params : dict
        Dictionary containing the hyperparameters for the best model
    best_model_type : str
        Type of the best model
    knn_acc : float
        accuracy for kNN prediction on validation set
    """
    models = modelgen.generate_models(X_train.shape, y_train.shape[1],
                                      number_of_models=number_of_models,
                                      **kwargs)
    histories, val_accuracies, val_losses = train_models_on_samples(
        X_train, y_train, X_val, y_val, models, nr_epochs,
        subset_size=subset_size, verbose=verbose, outputfile=outputpath)
    best_model_index = np.argmax(val_accuracies)
    best_model, best_params, best_model_type = models[best_model_index]
    # kNN on the same training subset serves as a baseline
    knn_acc = kNN_accuracy(
        X_train[:subset_size, :, :], y_train[:subset_size, :], X_val, y_val)
    if verbose:
        # Now one plot per model; ultimately we may want all models in one
        # plot to allow for direct comparison.
        for i in range(len(models)):
            name = str(models[i][1])
            plotTrainingProcess(histories[i], name)
        print('Best model: model ', best_model_index)
        print('Model type: ', best_model_type)
        print('Hyperparameters: ', best_params)
        print('Accuracy on validation set: ', val_accuracies[best_model_index])
        print('Accuracy of kNN on validation set', knn_acc)

    if val_accuracies[best_model_index] < knn_acc:
        warnings.warn('Best model not better than kNN: ' +
                      str(val_accuracies[best_model_index]) + ' vs ' +
                      str(knn_acc)
                      )
    return best_model, best_params, best_model_type, knn_acc


# Pylint naming note: kNN_accuracy does not conform to the function naming
# conventions ([a-z_][a-z0-9_]{2,30}$).
def kNN_accuracy(X_train, y_train, X_val, y_val, k=1):
    """
    Performs k-Nearest Neighbors and returns the accuracy score.

    Parameters
    ----------
    X_train : numpy array
        Train set of shape (num_samples, num_timesteps, num_channels)
    y_train : numpy array
        Class labels for train set
    X_val : numpy array
        Validation set of shape (num_samples, num_timesteps, num_channels)
    y_val : numpy array
        Class labels for validation set
    k : int
        number of neighbors to use for classifying

    Returns
    -------
    accuracy: float
        accuracy score on the validation set
    """
    # flatten each sample to one feature vector of length
    # num_timesteps * num_channels
    num_samples, num_timesteps, num_channels = X_train.shape
    clf = neighbors.KNeighborsClassifier(k)
    clf.fit(
        X_train.reshape(num_samples, num_timesteps * num_channels),
        y_train)
    num_samples, num_timesteps, num_channels = X_val.shape
    val_predict = clf.predict(
        X_val.reshape(num_samples, num_timesteps * num_channels))
    return metrics.accuracy_score(y_val, val_predict)
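The kNN baseline treats every time series as one flat feature vector. The sketch below illustrates that idea with a NumPy-only 1-nearest-neighbour classifier using Euclidean distance; it is a stand-in written for illustration, not scikit-learn's KNeighborsClassifier, and the function name one_nn_accuracy and the toy data are hypothetical.

```python
import numpy as np


def one_nn_accuracy(X_train, y_train, X_val, y_val):
    """Accuracy of a 1-nearest-neighbour classifier on flattened series.

    Hypothetical NumPy-only illustration; labels are one-hot rows.
    """
    # Flatten (num_samples, num_timesteps, num_channels) to 2D.
    train_flat = X_train.reshape(X_train.shape[0], -1)
    val_flat = X_val.reshape(X_val.shape[0], -1)
    # Squared Euclidean distances, shape (num_val, num_train).
    diffs = val_flat[:, np.newaxis, :] - train_flat[np.newaxis, :, :]
    dists = (diffs ** 2).sum(axis=2)
    # Each validation sample takes the label of its closest train sample.
    predicted = y_train[dists.argmin(axis=1)]
    # A prediction is correct only if the whole one-hot row matches.
    return float(np.mean(np.all(predicted == y_val, axis=1)))


# Toy data: 2 train and 3 validation samples, 2 timesteps, 1 channel.
X_tr = np.array([[[0.0], [0.0]], [[1.0], [1.0]]])
y_tr = np.array([[1, 0], [0, 1]])
X_va = np.array([[[0.1], [0.0]], [[0.9], [1.2]], [[0.2], [0.1]]])
y_va = np.array([[1, 0], [0, 1], [0, 1]])

acc = one_nn_accuracy(X_tr, y_tr, X_va, y_va)
print(acc)
```

The third validation sample lies closest to the first train sample but carries the other label, so two of the three predictions are correct.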