Conditions | 12 |
Total Lines | 99 |
Code Lines | 37 |
Lines | 0 |
Ratio | 0 % |
Tests | 32 |
CRAP Score | 12 |
Changes | 0 |
Small methods make your code easier to understand, in particular if combined with a good name. Besides, if your method is small, finding a good name is usually much easier.
For example, if you find yourself adding comments to a method's body, this is usually a good sign to extract the commented part to a new method, and use the comment as a starting point when coming up with a good name for this new method.
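As a minimal sketch of that idea, the comment "Require a max_length of at least 6 and not more than 64" from the listing further down can be turned into a small helper. The name clamp_max_length and its lower/upper parameters are hypothetical and not part of abydos.

```python
def clamp_max_length(max_length, lower=6, upper=64):
    """Clamp max_length to [lower, upper]; -1 selects the upper bound.

    Hypothetical helper: the name and the lower/upper parameters are
    assumptions made for this illustration.
    """
    if max_length == -1:
        return upper
    return min(max(lower, max_length), upper)


# The call site now reads like the comment it replaced.
max_length = clamp_max_length(3)
```

With the helper in place, the comment is no longer needed because the method name carries the same information.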
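Commonly applied refactorings include Extract Method.
If many parameters or temporary variables are present, Replace Temp with Query, Introduce Parameter Object, and Preserve Whole Object are also worth considering.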
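One way to handle parameters that always travel together is the Introduce Parameter Object refactoring. The sketch below assumes a hypothetical EncodeOptions class that bundles the max_length and zero_pad arguments seen in the listing; neither the class nor the trim_code function exists in abydos.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EncodeOptions:
    """Hypothetical parameter object bundling the output-shaping options."""

    max_length: int = 6
    zero_pad: bool = True


def trim_code(code, options):
    """Trim a code to options.max_length, padding with zeros if requested."""
    if options.zero_pad:
        return (code + '0' * options.max_length)[: options.max_length]
    return code[: options.max_length]
```

Helpers that need both values then take a single argument, which keeps their signatures short as further options are added.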
Complex units like abydos.phonetic._daitch_mokotoff.DaitchMokotoff.encode() often do a lot of different things. To break such a unit down, we need to identify a cohesive component within its class. A common approach to finding such a component is to look for fields and methods that share the same prefix or suffix.
Once you have determined the fields that belong together, you can apply the Extract Class refactoring. If the component makes sense as a sub-class, Extract Subclass is also a candidate, and is often faster.
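In the listing, the fields _dms_table and _dms_order share the _dms_ prefix, which makes them a candidate for Extract Class. The following is only a sketch: the SubstringRules class, its match method, and the assumption that candidates are tried longest first are all hypothetical, and the table passed in during use would be placeholder data rather than the real Daitch-Mokotoff table.

```python
class SubstringRules:
    """Hypothetical class extracted from fields sharing the _dms_ prefix."""

    def __init__(self, table):
        # Maps a substring to its (initial, pre-vocalic, elsewhere) codes.
        self._table = dict(table)
        # For each first letter, the candidate substrings, longest first
        # (an assumption about how the lookup order is built).
        self._order = {}
        for sstr in sorted(self._table, key=len, reverse=True):
            self._order.setdefault(sstr[0], []).append(sstr)

    def match(self, word, pos):
        """Return (substring, codes) for the first rule matching at pos."""
        for sstr in self._order.get(word[pos], ()):
            if word.startswith(sstr, pos):
                return sstr, self._table[sstr]
        return None
```

The encoder would then hold one SubstringRules instance and ask it for the next match, instead of indexing two related dictionaries directly.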
    # -*- coding: utf-8 -*-

    def encode(self, word, max_length=6, zero_pad=True):
        """Return the Daitch-Mokotoff Soundex code for a word.

        Parameters
        ----------
        word : str
            The word to transform
        max_length : int
            The length of the code returned (defaults to 6; must be between 6
            and 64)
        zero_pad : bool
            Pad the end of the return value with 0s to achieve a max_length
            string

        Returns
        -------
        set of str
            The Daitch-Mokotoff Soundex value(s)

        Examples
        --------
        >>> pe = DaitchMokotoff()
        >>> sorted(pe.encode('Christopher'))
        ['494379', '594379']
        >>> pe.encode('Niall')
        {'680000'}
        >>> pe.encode('Smith')
        {'463000'}
        >>> pe.encode('Schmidt')
        {'463000'}

        >>> sorted(pe.encode('The quick brown fox', max_length=20,
        ...        zero_pad=False))
        ['35457976754', '3557976754']

        """
        dms = ['']  # initialize empty code list

        # Require a max_length of at least 6 and not more than 64
        if max_length != -1:
            max_length = min(max(6, max_length), 64)
        else:
            max_length = 64

        # uppercase, normalize, decompose, and filter non-A-Z
        word = unicode_normalize('NFKD', text_type(word.upper()))
        word = word.replace('ß', 'SS')
        word = ''.join(c for c in word if c in self._uc_set)

        # Nothing to convert, return base case
        if not word:
            if zero_pad:
                return {'0' * max_length}
            return {'0'}

        pos = 0
        while pos < len(word):
            # Iterate through _dms_order, which specifies the possible
            # substrings for which codes exist in the Daitch-Mokotoff coding
            for sstr in self._dms_order[word[pos]]:  # pragma: no branch
                if word[pos:].startswith(sstr):
                    # Having determined a valid substring start, retrieve the
                    # code
                    dm_val = self._dms_table[sstr]

                    # Having retrieved the code (triple), determine the correct
                    # positional variant (first, pre-vocalic, elsewhere)
                    if pos == 0:
                        dm_val = dm_val[0]
                    elif (
                        pos + len(sstr) < len(word)
                        and word[pos + len(sstr)] in self._uc_v_set
                    ):
                        dm_val = dm_val[1]
                    else:
                        dm_val = dm_val[2]

                    # Build the code strings
                    if isinstance(dm_val, tuple):
                        dms = [_ + text_type(dm_val[0]) for _ in dms] + [
                            _ + text_type(dm_val[1]) for _ in dms
                        ]
                    else:
                        dms = [_ + text_type(dm_val) for _ in dms]
                    pos += len(sstr)
                    break

        # Filter out double letters and _ placeholders
        dms = (
            ''.join(c for c in self._delete_consecutive_repeats(_) if c != '_')
            for _ in dms
        )

        # Trim codes and return set
        if zero_pad:
            dms = ((_ + ('0' * max_length))[:max_length] for _ in dms)
        else:
            dms = (_[:max_length] for _ in dms)
        return set(dms)
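The two commented steps inside the loop of encode() ("determine the correct positional variant" and "Build the code strings") are natural Extract Method candidates. The sketch below shows one possible shape; the function names select_variant and extend_codes and the vowels parameter (standing in for self._uc_v_set) are hypothetical, and str replaces text_type on the assumption of Python 3.

```python
def select_variant(codes, word, pos, sstr, vowels):
    """Pick the initial, pre-vocalic, or elsewhere code from a triple."""
    if pos == 0:
        return codes[0]
    end = pos + len(sstr)
    if end < len(word) and word[end] in vowels:
        return codes[1]
    return codes[2]


def extend_codes(dms, dm_val):
    """Append dm_val to each partial code, branching if dm_val is a tuple."""
    if isinstance(dm_val, tuple):
        return [code + str(dm_val[0]) for code in dms] + [
            code + str(dm_val[1]) for code in dms
        ]
    return [code + str(dm_val) for code in dms]
```

With these helpers, the body of the inner if block would shrink to a table lookup, two calls, and the position update, which would likely reduce the method's condition count.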