| Metric      | Value    |
|-------------|----------|
| Conditions  | 23       |
| Total Lines | 74       |
| Lines       | 0        |
| Ratio       | 0 %      |
| Tests       | 1        |
| CRAP Score  | 520.2811 |
| Changes     | 6        |
| Bugs        | 0        |
| Features    | 0        |
Small methods make your code easier to understand, particularly when combined with a good name. Besides, if your method is small, finding a good name is usually much easier.
For example, if you find yourself adding comments to a method's body, that is usually a sign that the commented part should be extracted into a new method; the comment is then a good starting point for naming the new method.
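As a small sketch of that advice (the function names here are hypothetical, not from the code under review), a commented block can be lifted into a method whose name replaces the comment:

```python
# Before: a comment labels a block inside a longer function.
def process(records):
    # filter out incomplete records
    clean = [r for r in records if r.get("value") is not None]
    return sum(r["value"] for r in clean)

# After: the comment becomes the method name, so it cannot go stale.
def filter_incomplete_records(records):
    """Keep only records that actually carry a value."""
    return [r for r in records if r.get("value") is not None]

def process_refactored(records):
    clean = filter_incomplete_records(records)
    return sum(r["value"] for r in clean)
```

Both versions behave identically; the refactored one simply gives the filtering step a name that shows up in stack traces and call sites.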
Commonly applied refactorings include:
If many parameters/temporary variables are present:
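One commonly suggested remedy for long parameter lists is Introduce Parameter Object. A minimal sketch, with a hypothetical config class (not part of the reviewed code):

```python
from dataclasses import dataclass, field

@dataclass
class PreprocessConfig:
    """Bundles related parameters instead of passing them one by one."""
    directory: str
    columns: list = field(default_factory=list)
    frame_length: int = 512   # illustrative default
    step: int = 100           # illustrative default

# One object travels through the call chain instead of four arguments.
cfg = PreprocessConfig(directory="/tmp/data", columns=["hand_acc_16g_x"])
```

Functions then take a single `cfg` argument, and adding a new setting no longer changes every signature along the way.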
Complex code units like fetch_and_preprocess() often do a lot of different things. To break such a unit down, we need to identify a cohesive component within it. A common approach is to look for fields, variables, or methods that share the same prefixes or suffixes (here, for instance, the x_train/y_train, x_val/y_val, and x_test/y_test groups).
Once you have determined the fields that belong together, you can apply the Extract Class refactoring. If the component makes sense as a sub-class, Extract Subclass is also a candidate, and is often faster.
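A minimal sketch of Extract Class for this case (class and field names are hypothetical): the six parallel locals sharing the train/val/test prefixes suggest one cohesive "split" component:

```python
class DatasetSplit:
    """Groups the x/y pair that previously lived as two parallel locals."""

    def __init__(self):
        self.x = []  # feature frames
        self.y = []  # matching labels

    def size(self):
        return len(self.x)

# Instead of x_train, y_train, x_val, y_val, x_test, y_test:
train, val, test = DatasetSplit(), DatasetSplit(), DatasetSplit()
train.x.append([0.1, 0.2])
train.y.append(1)
```

The x/y invariant (same length, same order) now lives in one place and can be checked in one method, rather than being implicit across six variables.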
```python
def fetch_and_preprocess(directory_to_extract_to, columns_to_use=None):
    """
    High level function to fetch_and_preprocess the PAMAP2 dataset
    directory_to_extract_to: the directory where the data will be stored
    columns_to_use: the columns to use
    """
    if columns_to_use is None:
        columns_to_use = ['hand_acc_16g_x', 'hand_acc_16g_y', 'hand_acc_16g_z',
                          'ankle_acc_16g_x', 'ankle_acc_16g_y', 'ankle_acc_16g_z',
                          'chest_acc_16g_x', 'chest_acc_16g_y', 'chest_acc_16g_z']
    targetdir = fetch_data(directory_to_extract_to)
    outdatapath = targetdir + '/PAMAP2_Dataset' + '/slidingwindow512cleaned/'
    if not os.path.exists(outdatapath):
        os.makedirs(outdatapath)
    if os.path.isfile(outdatapath + 'x_train.npy'):
        print('Data previously pre-processed and np-files saved to ' +
              outdatapath)
    else:
        datadir = targetdir + '/PAMAP2_Dataset/Protocol'
        filenames = listdir(datadir)
        print('Start pre-processing all ' + str(len(filenames)) + ' files...')
        # load the files and put them in a list of pandas dataframes:
        datasets = [pd.read_csv(datadir + '/' + fn, header=None, sep=' ')
                    for fn in filenames]
        datasets = addheader(datasets)  # add headers to the datasets
        # Interpolate dataset to get same sample rate between channels
        datasets_filled = [d.interpolate() for d in datasets]
        # Create mapping for class labels
        ysetall = [set(np.array(data.activityID)) - set([0])
                   for data in datasets_filled]
        classlabels = list(set.union(*[set(y) for y in ysetall]))
        nr_classes = len(classlabels)
        mapclasses = {classlabels[i]: i for i in range(len(classlabels))}
        # Create input (x) and output (y) sets
        xall = [np.array(data[columns_to_use]) for data in datasets_filled]
        yall = [np.array(data.activityID) for data in datasets_filled]
        xylists = [split_activities(y, x) for x, y in zip(xall, yall)]
        Xlists, ylists = zip(*xylists)
        ybinarylists = [transform_y(y, mapclasses, nr_classes) for y in ylists]
        # Split in train, test and val
        train_range = slice(0, 6)
        val_range = 6
        test_range = slice(7, len(datasets_filled))
        x_trainlist = [X for Xlist in Xlists[train_range] for X in Xlist]
        x_vallist = [X for X in Xlists[val_range]]
        x_testlist = [X for Xlist in Xlists[test_range] for X in Xlist]
        y_trainlist = [y for ylist in ybinarylists[train_range] for y in ylist]
        y_vallist = [y for y in ybinarylists[val_range]]
        y_testlist = [y for ylist in ybinarylists[test_range] for y in ylist]
        # Take sliding-window frames. Target is label of last time step
        # Data is 100 Hz
        frame_length = int(5.12 * 100)
        step = 1 * 100
        x_train = []
        y_train = []
        x_val = []
        y_val = []
        x_test = []
        y_test = []
        sliding_window(frame_length, step, x_train, y_train, y_trainlist,
                       x_trainlist)
        sliding_window(frame_length, step, x_val, y_val, y_vallist, x_vallist)
        sliding_window(frame_length, step, x_test, y_test,
                       y_testlist, x_testlist)
        numpify_and_store(x_train, y_train, 'X_train', 'y_train',
                          outdatapath, shuffle=True)
        numpify_and_store(x_val, y_val, 'X_val', 'y_val', outdatapath,
                          shuffle=False)
        numpify_and_store(x_test, y_test, 'X_test', 'y_test', outdatapath,
                          shuffle=False)
        print('Processed data successfully stored in ' + outdatapath)
    return outdatapath
```
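Applying the advice above, parts of this function fall out naturally as small, independently testable helpers. The names below are hypothetical and the helpers are a sketch, not the project's actual refactoring:

```python
def build_class_mapping(label_sets):
    """Map each distinct activity label to a compact integer index."""
    labels = sorted(set.union(*[set(s) for s in label_sets]))
    return {label: i for i, label in enumerate(labels)}

def split_ranges(n_datasets, n_train=6, val_index=6):
    """Return the train/val/test ranges as named values instead of
    three anonymous locals buried mid-function."""
    return slice(0, n_train), val_index, slice(val_index + 1, n_datasets)

# Usage: the label-mapping step from the listing, isolated.
mapping = build_class_mapping([{1, 2}, {2, 5}])
```

Each helper can now get its own unit test, which is exactly what a CRAP score of 520 with a single test is flagging.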
This check looks for invalid names for a range of different identifiers.
You can set regular expressions to which the identifiers must conform if the defaults do not match your requirements.
If your project includes a Pylint configuration file, the settings contained in that file take precedence.
To find out more about Pylint, please refer to their site.
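For example, if your project prefers longer snake_case names than Pylint's defaults allow, a `.pylintrc` along these lines overrides the naming regexes (the patterns shown are illustrative):

```ini
[BASIC]
# Allow snake_case function names up to 40 characters.
function-rgx=[a-z_][a-z0-9_]{2,40}$
# Whitelist short identifiers such as loop counters and 'fn'.
good-names=i,j,k,x,y,fn
```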