| Metric | Value |
| --- | --- |
| Conditions | 23 |
| Total Lines | 74 |
| Covered Lines | 0 |
| Coverage Ratio | 0 % |
| Tests | 1 |
| CRAP Score | 520.2811 |
| Changes | 6 |
| Bugs | 0 |
| Features | 0 |
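For reference, the CRAP (Change Risk Anti-Patterns) score is conventionally computed from cyclomatic complexity and test-coverage ratio. The sketch below uses the standard published formula; which exact complexity and coverage figures the tool feeds in is an assumption, and the reported 520.28 suggests its inputs differ slightly from a flat 23 conditions at exactly 0 % coverage:

```python
def crap_score(complexity, coverage):
    """Standard CRAP formula: comp^2 * (1 - cov)^3 + comp.

    coverage is a ratio in [0, 1]. High complexity combined with low
    coverage makes the score explode; full coverage reduces it to the
    bare complexity.
    """
    return complexity ** 2 * (1.0 - coverage) ** 3 + complexity
```

With complexity 23 and no coverage this gives 552; with full coverage it drops back to 23, which is why adding tests is the fastest way to tame the score.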
Small methods make your code easier to understand, particularly when combined with a good name. Conveniently, the smaller the method, the easier it usually is to name.
For example, if you find yourself adding comments to a method's body, that is usually a sign that you should extract the commented part into a new method, using the comment as a starting point for its name.
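As a minimal illustration (all names here are hypothetical, not from the codebase below), a commented block can become a method whose name is taken from the comment:

```python
def preprocess(samples):
    # Before extraction, this body carried an inline comment such as
    # "# clamp scores into [0, 1]" above a few raw lines of arithmetic.
    return [clamp_to_unit_interval(s) for s in samples]


def clamp_to_unit_interval(score):
    # The extracted method: the old comment became its name, so the
    # caller now reads like a sentence and needs no comment at all.
    return min(1.0, max(0.0, score))
```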
Commonly applied refactorings include Extract Method and, when many parameters or temporary variables get in the way, Replace Method with Method Object.
Complex routines like fetch_and_preprocess() often do many different things at once. To break one down, we need to identify a cohesive component within it. A common way to find such a component is to look for fields or variables that share the same prefix or suffix.
Once you have determined which fields belong together, you can apply the Extract Class refactoring. If the component makes sense as a subclass, Extract Subclass is also a candidate, and is often the faster option.
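The prefix heuristic can be sketched mechanically; `group_by_prefix` below is a hypothetical helper, not part of any tool, but applied to the sensor columns in the listing below it cleanly separates the `hand_`, `ankle_` and `chest_` channel groups:

```python
from collections import defaultdict


def group_by_prefix(names, sep='_', parts=1):
    # Bucket identifiers by their leading name component(s); a large
    # bucket hints at a cohesive component worth extracting.
    groups = defaultdict(list)
    for name in names:
        prefix = sep.join(name.split(sep)[:parts])
        groups[prefix].append(name)
    return dict(groups)
```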
```python
import os
from os import listdir

import numpy as np
import pandas as pd

# fetch_data, addheader, split_activities, transform_y, sliding_window and
# numpify_and_store are helper functions defined elsewhere in this module.


def fetch_and_preprocess(directory_to_extract_to, columns_to_use=None):
    """
    High-level function to fetch and preprocess the PAMAP2 dataset.

    directory_to_extract_to: the directory where the data will be stored
    columns_to_use: the columns to use
    """
    if columns_to_use is None:
        columns_to_use = ['hand_acc_16g_x', 'hand_acc_16g_y', 'hand_acc_16g_z',
                          'ankle_acc_16g_x', 'ankle_acc_16g_y', 'ankle_acc_16g_z',
                          'chest_acc_16g_x', 'chest_acc_16g_y', 'chest_acc_16g_z']
    targetdir = fetch_data(directory_to_extract_to)
    outdatapath = targetdir + '/PAMAP2_Dataset' + '/slidingwindow512cleaned/'
    if not os.path.exists(outdatapath):
        os.makedirs(outdatapath)
    # Check for 'X_train.npy' (capital X) so the guard matches the name that
    # numpify_and_store actually writes below.
    if os.path.isfile(outdatapath + 'X_train.npy'):
        print('Data previously pre-processed and np-files saved to ' +
              outdatapath)
    else:
        datadir = targetdir + '/PAMAP2_Dataset/Protocol'
        filenames = listdir(datadir)
        print('Start pre-processing all ' + str(len(filenames)) + ' files...')
        # Load the files and put them in a list of pandas dataframes:
        datasets = [pd.read_csv(datadir + '/' + fn, header=None, sep=' ')
                    for fn in filenames]
        datasets = addheader(datasets)  # add headers to the datasets
        # Interpolate dataset to get same sample rate between channels
        datasets_filled = [d.interpolate() for d in datasets]
        # Create mapping for class labels
        ysetall = [set(np.array(data.activityID)) - set([0])
                   for data in datasets_filled]
        classlabels = list(set.union(*[set(y) for y in ysetall]))
        nr_classes = len(classlabels)
        mapclasses = {classlabels[i]: i for i in range(len(classlabels))}
        # Create input (x) and output (y) sets
        xall = [np.array(data[columns_to_use]) for data in datasets_filled]
        yall = [np.array(data.activityID) for data in datasets_filled]
        xylists = [split_activities(y, x) for x, y in zip(xall, yall)]
        Xlists, ylists = zip(*xylists)
        ybinarylists = [transform_y(y, mapclasses, nr_classes) for y in ylists]
        # Split in train, val and test
        train_range = slice(0, 6)
        val_range = 6
        test_range = slice(7, len(datasets_filled))
        x_trainlist = [X for Xlist in Xlists[train_range] for X in Xlist]
        x_vallist = [X for X in Xlists[val_range]]
        x_testlist = [X for Xlist in Xlists[test_range] for X in Xlist]
        y_trainlist = [y for ylist in ybinarylists[train_range] for y in ylist]
        y_vallist = [y for y in ybinarylists[val_range]]
        y_testlist = [y for ylist in ybinarylists[test_range] for y in ylist]
        # Take sliding-window frames; the target is the label of the last
        # time step. Data is sampled at 100 Hz.
        frame_length = int(5.12 * 100)
        step = 1 * 100
        x_train, y_train = [], []
        x_val, y_val = [], []
        x_test, y_test = [], []
        sliding_window(frame_length, step, x_train, y_train, y_trainlist,
                       x_trainlist)
        sliding_window(frame_length, step, x_val, y_val, y_vallist, x_vallist)
        sliding_window(frame_length, step, x_test, y_test, y_testlist,
                       x_testlist)
        numpify_and_store(x_train, y_train, 'X_train', 'y_train',
                          outdatapath, shuffle=True)
        numpify_and_store(x_val, y_val, 'X_val', 'y_val', outdatapath,
                          shuffle=False)
        numpify_and_store(x_test, y_test, 'X_test', 'y_test', outdatapath,
                          shuffle=False)
        print('Processed data successfully stored in ' + outdatapath)
    return outdatapath
```
This check looks for invalid names across a range of identifier kinds.
If the defaults do not match your requirements, you can configure the regular expressions that identifiers must conform to.
If your project includes a Pylint configuration file, the settings in that file take precedence.
To find out more about Pylint, refer to the Pylint website.
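A sketch of what such a configuration file might contain; the option names (`function-rgx`, `variable-rgx`, `good-names` under `[BASIC]`) are real Pylint options, but the regexes are only examples, not the defaults:

```ini
; Hypothetical .pylintrc fragment
[BASIC]
; Require snake_case function and variable names
function-rgx=[a-z_][a-z0-9_]{2,40}$
variable-rgx=[a-z_][a-z0-9_]{0,30}$
; Short names exempted from the rules above
good-names=i,j,k,df,_
```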