Completed
Push — master ( 7751f9...ca34ad )
by Dafne van
07:15
created

train_models_on_samples()   B

Complexity

Conditions 4

Size

Total Lines 61

Duplication

Lines 0
Ratio 0 %

Code Coverage

Tests 14
CRAP Score 4.0312

Importance

Changes 2
Bugs 0 Features 1
Metric Value
cc 4
dl 0
loc 61
ccs 14
cts 16
cp 0.875
crap 4.0312
rs 8.9392
c 2
b 0
f 1

How to fix: Long Method

Long Method

Small methods make your code easier to understand, in particular if combined with a good name. Besides, if your method is small, finding a good name is usually much easier.

For example, if you find yourself adding comments to a method's body, this is usually a good sign to extract the commented part to a new method, and use the comment as a starting point when coming up with a good name for this new method.
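As a minimal sketch of this advice, consider a function whose steps are only labelled by comments, and the same logic after each commented step is extracted into a small named function. All names here (summarize_scores_long, drop_missing, mean_or_zero, summarize_scores) are hypothetical and are not taken from the code under review.

```python
# Before: one function whose steps are only labelled by comments.
def summarize_scores_long(scores):
    # drop missing values
    cleaned = [s for s in scores if s is not None]
    # compute the mean, guarding against an empty list
    if not cleaned:
        return 0.0
    return sum(cleaned) / len(cleaned)


# After: each comment became the name of a small extracted function.
def drop_missing(scores):
    """Return the scores that are not None."""
    return [s for s in scores if s is not None]


def mean_or_zero(values):
    """Return the arithmetic mean, or 0.0 for an empty sequence."""
    if not values:
        return 0.0
    return sum(values) / len(values)


def summarize_scores(scores):
    """Same behaviour as summarize_scores_long, built from small steps."""
    return mean_or_zero(drop_missing(scores))
```

The comments from the long version survive as function names and docstrings, so the top-level function reads as a one-line summary of what it does.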

Commonly applied refactorings include:

"""
 Summary:
 Function generate_models from modelgen.py generates and compiles models
 Function train_models_on_samples trains those models
 Function plotTrainingProcess plots the training process
 Function find_best_architecture is a wrapper function that combines
 these steps
 Example function calls in 'EvaluateDifferentModels.ipynb'
"""
import numpy as np
from matplotlib import pyplot as plt
from . import modelgen
from sklearn import neighbors, metrics
import warnings
import json
import os


def train_models_on_samples(X_train, y_train, X_val, y_val, models,
Coding Style (Naming): The name X_train does not conform to the argument naming conventions ([a-z_][a-z0-9_]{1,30}$).

This check looks for invalid names for a range of different identifiers.

You can set regular expressions to which the identifiers must conform if the defaults do not match your requirements.

If your project includes a Pylint configuration file, the settings contained in that file take precedence.

To find out more about Pylint, please refer to their site.

Coding Style (Naming): The name X_val does not conform to the argument naming conventions ([a-z_][a-z0-9_]{1,30}$).
                            nr_epochs=5, subset_size=100, verbose=True,
                            outputfile=None):
    """
    Given a list of compiled models, this function trains
    them all on a subset of the train data. If the given size of the subset is
    larger than the size of the data, the complete data set is used.
    Parameters
    ----------
    X_train : numpy array of shape (num_samples, num_timesteps, num_channels)
        The input dataset for training
    y_train : numpy array of shape (num_samples, num_classes)
        The output classes for the train data, in binary format
    X_val : numpy array of shape (num_samples_val, num_timesteps, num_channels)
        The input dataset for validation
    y_val : numpy array of shape (num_samples_val, num_classes)
        The output classes for the validation data, in binary format
    models : list of model, params, modeltypes
        List of keras models to train
    nr_epochs : int, optional
        nr of epochs to use for training one model
    subset_size :
        The number of samples used from the complete train set
    subsize_set : int, optional
        number of samples to use from the training set for training the models
    verbose : bool, optional
        flag for displaying verbose output
    outputfile : str, optional
        File location to store the model results

    Returns
    ----------
    histories : list of Keras History objects
        train histories for all models
    val_accuracies : list of floats
        validation accuracies of the models
    val_losses : list of floats
        validation losses of the models
    """
    # if subset_size is smaller than X_train, this will work fine
    X_train_sub = X_train[:subset_size,:,:]
Coding Style: Exactly one space required after comma (reported twice for this line).
Coding Style (Naming): The name X_train_sub does not conform to the variable naming conventions ([a-z_][a-z0-9_]{1,30}$).
    y_train_sub = y_train[:subset_size,:]
Coding Style: Exactly one space required after comma.

    histories = []
    val_accuracies = []
    val_losses = []
    for i, (model, params, model_types) in enumerate(models):
        if verbose:
            print('Training model %d'%i, model_types)
        history = model.fit(X_train_sub, y_train_sub,
                            nb_epoch=nr_epochs, batch_size=20,
                            # see comment on subsize_set
                            validation_data=(X_val, y_val),
                            verbose=verbose)
        histories.append(history)
        val_accuracies.append(history.history['val_acc'][-1])
        val_losses.append(history.history['val_loss'][-1])
        if outputfile is not None:
            storetrainhist2json(params, model_types,
                                history.history, outputfile)
    return histories, val_accuracies, val_losses
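The subsetting at the start of train_models_on_samples relies on slice bounds being clamped to the length of the sequence. The sketch below illustrates that behaviour with plain Python lists as a stand-in for the NumPy arrays (basic NumPy slicing clamps in the same way). The helper name take_subset and the sample values are hypothetical.

```python
def take_subset(samples, subset_size):
    """Return at most subset_size leading samples.

    Slice bounds past the end are clamped, so no IndexError is raised
    when subset_size exceeds len(samples).
    """
    return samples[:subset_size]


data = ["s0", "s1", "s2"]
small = take_subset(data, 2)    # fewer samples than available
full = take_subset(data, 100)   # more requested than available
```

Here `small` holds the first two samples, while `full` holds all three because the requested size exceeds the data.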


def storetrainhist2json(params, model_type, history, outputfile):
    """
    This function stores the model parameters, the loss and accuracy history
    of one model in a JSON file. It appends the model information to the
    existing models in the file.

    Parameters
    ----------
    params : dictionary with parameters for one model
    model_type : Keras model object for one model
    history : dictionary with training history from one model
    outputfile : str of path where the json file needs to be stored

    """
    jsondata = params.copy()
    for k in jsondata.keys():
        if isinstance(jsondata[k], np.ndarray):
            jsondata[k] = jsondata[k].tolist()
    jsondata['train_acc'] = history['acc']
    jsondata['train_loss'] = history['loss']
    jsondata['val_acc'] = history['val_acc']
    jsondata['val_loss'] = history['val_loss']
    jsondata['modeltype'] = model_type
    jsondata['modeltype'] = model_type
    if os.path.isfile(outputfile):
        with open(outputfile, 'r') as outfile:
            previousdata = json.load(outfile)
    else:
        previousdata = []
    previousdata.append(jsondata)
    with open(outputfile, 'w') as outfile:
            json.dump(previousdata, outfile, sort_keys = True,
Coding Style: The indentation here looks off. 8 spaces were expected, but 12 were found.
Coding Style: No space allowed around keyword argument assignment.
                      indent = 4, ensure_ascii=False)
Coding Style: No space allowed around keyword argument assignment.
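The read, append, rewrite pattern that storetrainhist2json uses can be sketched in isolation. This is a simplified illustration, not the project's code: the helper name append_record, the file name, and the record contents are all hypothetical.

```python
import json
import os
import tempfile


def append_record(path, record):
    """Append one dict to a JSON file that holds a list of dicts."""
    if os.path.isfile(path):
        with open(path, 'r') as infile:
            records = json.load(infile)
    else:
        records = []  # first call: start a new list
    records.append(record)
    with open(path, 'w') as outfile:
        json.dump(records, outfile, sort_keys=True, indent=4)
    return records


# Usage with a throwaway file; the values are invented for illustration.
tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, 'history.json')
append_record(path, {'modeltype': 'A', 'val_acc': [0.5, 0.6]})
stored = append_record(path, {'modeltype': 'B', 'val_acc': [0.4]})
```

Each call re-reads and rewrites the whole file, which is simple and adequate for a handful of models but would scale poorly to very large histories.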


def plotTrainingProcess(history, name='Model', ax=None):
Coding Style (Naming): The name plotTrainingProcess does not conform to the function naming conventions ([a-z_][a-z0-9_]{2,30}$).
    """
    This function plots the loss and accuracy on the train and validation set,
    for each epoch in the history of one model.

    Parameters
    ----------
    history : keras History object for one model
        The history object of the training process corresponding to one model

    """
    if ax is None:
        fig, ax = plt.subplots()
    ax2 = ax.twinx()
    LN = len(history.history['val_loss'])
Coding Style (Naming): The name LN does not conform to the variable naming conventions ([a-z_][a-z0-9_]{1,30}$).
    val_loss, = ax.plot(range(LN), history.history['val_loss'], 'g--',
                        label='validation loss')
    train_loss, = ax.plot(range(LN), history.history['loss'], 'g-',
                          label='train loss')
    val_acc, = ax2.plot(range(LN), history.history['val_acc'], 'b--',
                        label='validation accuracy')
    train_acc, = ax2.plot(range(LN), history.history['acc'], 'b-',
                          label='train accuracy')
    ax.set_xlabel('epoch')
    ax.set_ylabel('loss', color='g')
    ax2.set_ylabel('accuracy', color='b')
    plt.legend(handles=[val_loss, train_loss, val_acc, train_acc],
               loc=2, bbox_to_anchor=(1.1, 1))
    plt.title(name)


def find_best_architecture(X_train, y_train, X_val, y_val, verbose=True,
Coding Style (Naming): The name X_train does not conform to the argument naming conventions ([a-z_][a-z0-9_]{1,30}$).
Coding Style (Naming): The name X_val does not conform to the argument naming conventions ([a-z_][a-z0-9_]{1,30}$).
                           number_of_models=5, nr_epochs=5, subset_size=100,
                           outputpath=None, **kwargs
                           ):
    """
    Tries out a number of models on a subsample of the data,
    and outputs the best found architecture and hyperparameters.

    Parameters
    ----------
    X_train : numpy array of shape (num_samples, num_timesteps, num_channels)
        The input dataset for training
    y_train : numpy array of shape (num_samples, num_classes)
        The output classes for the train data, in binary format
    X_val : numpy array of shape (num_samples_val, num_timesteps, num_channels)
        The input dataset for validation
    y_val : numpy array of shape (num_samples_val, num_classes)
        The output classes for the validation data, in binary format
    verbose : bool, optional
        flag for displaying verbose output
    number_of_models : int
        The number of models to generate and test
    nr_epochs : int
        The number of epochs that each model is trained
    subset_size : int
        The size of the subset of the data that is used for finding
        the optimal architecture
    **kwargs: key-value parameters
        parameters for generating the models
        (see docstring for modelgen.generate_models)

    Returns
    ----------
    best_model : Keras model
        Best performing model, already trained on a small sample data set.
    best_params : dict
        Dictionary containing the hyperparameters for the best model
    best_model_type : str
        Type of the best model
    knn_acc : float
        accuracy for kNN prediction on validation set
    """
    models = modelgen.generate_models(X_train.shape, y_train.shape[1],
                                      number_of_models=number_of_models,
                                      **kwargs)
    histories, val_accuracies, val_losses = train_models_on_samples(X_train,
                                                                    y_train,
                                                                    X_val,
                                                                    y_val,
                                                                    models,
                                                                    nr_epochs,
                                                                    subset_size=subset_size,
Coding Style: This line is too long as per the coding-style (92/79).

This check looks for lines that are too long. You can specify the maximum line length.

                                                                    verbose=verbose,
Coding Style: This line is too long as per the coding-style (84/79).
                                                                    outputfile=outputpath)
Coding Style: This line is too long as per the coding-style (90/79).
    best_model_index = np.argmax(val_accuracies)
    best_model, best_params, best_model_type = models[best_model_index]
    knn_acc = kNN_accuracy(
        X_train[:subset_size,:,:], y_train[:subset_size,:], X_val, y_val)
Coding Style: Exactly one space required after comma (reported three times for this line).
    if verbose:
        for i in range(len(models)):  # now one plot per model, ultimately we
            # may want all models in one plot to allow for direct comparison
            name = str(models[i][1])
            plotTrainingProcess(histories[i], name)
        print('Best model: model ', best_model_index)
        print('Model type: ', best_model_type)
        print('Hyperparameters: ', best_params)
        print('Accuracy on validation set: ', val_accuracies[best_model_index])
        print('Accuracy of kNN on validation set', knn_acc)

    if val_accuracies[best_model_index] < knn_acc:
        warnings.warn('Best model not better than kNN: ' +
                      str(val_accuracies[best_model_index]) + ' vs  ' +
                      str(knn_acc)
                      )
    return best_model, best_params, best_model_type, knn_acc
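The selection step at the end of find_best_architecture (pick the highest validation accuracy, then warn if a baseline beats it) can be sketched without Keras or NumPy. The helper name pick_best, the parameter baseline_acc, and the accuracy values below are hypothetical and only illustrate the idea.

```python
import warnings


def pick_best(val_accuracies, baseline_acc):
    """Return the index of the highest accuracy.

    A warning is issued when even the best accuracy is below the baseline.
    """
    best_index = max(range(len(val_accuracies)),
                     key=val_accuracies.__getitem__)
    if val_accuracies[best_index] < baseline_acc:
        warnings.warn('Best model not better than baseline: %s vs %s'
                      % (val_accuracies[best_index], baseline_acc))
    return best_index


# Record warnings instead of printing them, so they can be inspected.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always')
    best = pick_best([0.61, 0.74, 0.70], baseline_acc=0.80)
```

Like np.argmax, this returns the first index when several models tie for the highest accuracy.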


def kNN_accuracy(X_train, y_train, X_val, y_val, k=1):
Coding Style (Naming): The name kNN_accuracy does not conform to the function naming conventions ([a-z_][a-z0-9_]{2,30}$).
Coding Style (Naming): The name X_train does not conform to the argument naming conventions ([a-z_][a-z0-9_]{1,30}$).
Coding Style (Naming): The name X_val does not conform to the argument naming conventions ([a-z_][a-z0-9_]{1,30}$).
    """
    Performs k-Nearest Neighbors and returns the accuracy score.

    Parameters
    ----------
    X_train : numpy array
        Train set of shape (num_samples, num_timesteps, num_channels)
    y_train : numpy array
        Class labels for train set
    X_val : numpy array
        Validation set of shape (num_samples, num_timesteps, num_channels)
    y_val : numpy array
        Class labels for validation set
    k : int
        number of neighbors to use for classifying

    Returns
    -------
    accuracy: float
        accuracy score on the validation set
    """
    num_samples, num_timesteps, num_channels = X_train.shape
    clf = neighbors.KNeighborsClassifier(k)
    clf.fit(
        X_train.reshape(
            num_samples,
            num_timesteps *
            num_channels),
        y_train)
    num_samples, num_timesteps, num_channels = X_val.shape
    val_predict = clf.predict(
        X_val.reshape(num_samples,
                      num_timesteps * num_channels))
    return metrics.accuracy_score(val_predict, y_val)
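The naming messages in this listing can be reproduced with the regular expressions quoted in them. The sketch below assumes that each pattern is applied from the start of the name; the helper conforms and the lower-case alternative names are hypothetical.

```python
import re

# Patterns copied from the messages in the listing.
ARGUMENT_RGX = re.compile(r'[a-z_][a-z0-9_]{1,30}$')
FUNCTION_RGX = re.compile(r'[a-z_][a-z0-9_]{2,30}$')


def conforms(name, pattern):
    """Return True if the whole name matches the naming pattern."""
    return pattern.match(name) is not None


results = {
    'X_train': conforms('X_train', ARGUMENT_RGX),
    'x_train': conforms('x_train', ARGUMENT_RGX),
    'LN': conforms('LN', ARGUMENT_RGX),
    'kNN_accuracy': conforms('kNN_accuracy', FUNCTION_RGX),
    'knn_accuracy': conforms('knn_accuracy', FUNCTION_RGX),
    'plotTrainingProcess': conforms('plotTrainingProcess', FUNCTION_RGX),
}
```

Only the all-lower-case names pass. Because capitalised names such as X_train follow a common convention for data matrices, a project may prefer to relax the check in its Pylint configuration (for instance through options such as `good-names` or `argument-rgx`) instead of renaming the arguments.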