| Metric | Value |
| --- | --- |
| Conditions | 15 |
| Total Lines | 80 |
| Code Lines | 38 |
| Lines | 0 |
| Ratio | 0 % |
| Tests | 34 |
| CRAP Score | 15 |
| Changes | 0 |
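For reference, the CRAP (Change Risk Anti-Patterns) score combines cyclomatic complexity with test coverage. A minimal sketch of the standard formula, assuming the 15 conditions above are the complexity and that the 34 tests give full coverage of the method (an assumption; the report does not state coverage directly):

```python
# Standard C.R.A.P. formula: comp(m)**2 * (1 - cov(m))**3 + comp(m)
complexity = 15   # "Conditions" in the table above
coverage = 1.0    # assumed: fully covered by the 34 tests
crap = complexity ** 2 * (1 - coverage) ** 3 + complexity
print(crap)       # 15.0 -- consistent with the reported CRAP Score of 15
```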
Small methods make your code easier to understand, particularly when combined with a good name. Moreover, if a method is small, finding a good name for it is usually much easier.
For example, if you find yourself adding comments inside a method's body, that is usually a sign that the commented part should be extracted into a new method, with the comment serving as a starting point for naming it (see the sketch after the lists below).
Commonly applied refactorings include:

- Extract Method

If many parameters or temporary variables are present:

- Replace Temp with Query
- Introduce Parameter Object
- Preserve Whole Object
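A minimal before/after sketch of Extract Method on a made-up example (the order/discount names here are hypothetical and not taken from abydos):

```python
from collections import namedtuple

Item = namedtuple('Item', 'price qty')
Order = namedtuple('Order', 'items')


# Before: the comment hints at a concept buried inside the method.
def order_total_before(order):
    total = sum(item.price * item.qty for item in order.items)
    # apply a bulk discount for orders of more than 100 units
    units = sum(item.qty for item in order.items)
    if units > 100:
        total *= 0.9
    return total


# After Extract Method: the comment becomes the name of a new function.
def order_total(order):
    total = sum(item.price * item.qty for item in order.items)
    return apply_bulk_discount(order, total)


def apply_bulk_discount(order, total):
    units = sum(item.qty for item in order.items)
    return total * 0.9 if units > 100 else total


order = Order(items=[Item(price=2.0, qty=80), Item(price=1.0, qty=40)])
print(order_total_before(order), order_total(order))  # 180.0 180.0
```

The extracted helper now carries in its name the information that was previously buried in the comment.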
Complex methods like abydos.distance._relaxed_hamming.RelaxedHamming.dist_abs() often do several different things. To break such a method down, we need to identify a cohesive component within its class. A common way to find such a component is to look for fields and methods that share the same prefixes or suffixes.
Once you have determined which fields belong together, you can apply the Extract Class refactoring (sketched below). If the component makes sense as a subclass, Extract Subclass is also a candidate, and is often quicker to apply.
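A minimal, hypothetical sketch of Extract Class (none of these names come from abydos); the fields sharing the `shipping_` prefix are pulled out into their own class:

```python
# Before: the shipping_* fields form a cohesive group inside one class.
class OrderBefore:
    def __init__(self, items, shipping_carrier, shipping_cost, shipping_eta):
        self.items = items
        self.shipping_carrier = shipping_carrier
        self.shipping_cost = shipping_cost
        self.shipping_eta = shipping_eta


# After Extract Class: the group becomes its own class.
class Shipping:
    def __init__(self, carrier, cost, eta):
        self.carrier = carrier
        self.cost = cost
        self.eta = eta


class Order:
    def __init__(self, items, shipping):
        self.items = items
        self.shipping = shipping  # the extracted component


order = Order(items=['book'], shipping=Shipping('DHL', 4.99, '2 days'))
print(order.shipping.cost)  # 4.99
```

Extract Subclass is the alternative when the grouped behaviour only applies to some instances of the original class.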
```python
# -*- coding: utf-8 -*-

def dist_abs(self, src, tar):
    """Return the discounted Hamming distance between two strings.

    Parameters
    ----------
    src : str
        Source string for comparison
    tar : str
        Target string for comparison

    Returns
    -------
    float
        Relaxed Hamming distance

    Examples
    --------
    >>> cmp = RelaxedHamming()
    >>> cmp.dist_abs('cat', 'hat')
    1.0
    >>> cmp.dist_abs('Niall', 'Neil')
    1.4
    >>> cmp.dist_abs('aluminum', 'Catalan')
    6.4
    >>> cmp.dist_abs('ATCG', 'TAGC')
    0.8


    .. versionadded:: 0.4.1

    """
    if src == tar:
        return 0

    if len(src) != len(tar):
        replacement_char = 1
        while chr(replacement_char) in src or chr(replacement_char) in tar:
            replacement_char += 1
        replacement_char = chr(replacement_char)
        if len(src) < len(tar):
            src += replacement_char * (len(tar) - len(src))
        else:
            tar += replacement_char * (len(src) - len(tar))

    if self.params['tokenizer']:
        src = self.params['tokenizer'].tokenize(src).get_list()
        tar = self.params['tokenizer'].tokenize(tar).get_list()

    score = 0
    for pos in range(len(src)):
        if src[pos] == tar[pos : pos + 1][0]:
            continue

        try:
            diff = (
                tar[pos + 1 : pos + self._maxdist + 1].index(src[pos]) + 1
            )
        except ValueError:
            diff = 0
        try:
            found = (
                tar[max(0, pos - self._maxdist) : pos][::-1].index(
                    src[pos]
                )
                + 1
            )
        except ValueError:
            found = 0

        if found and diff:
            diff = min(diff, found)
        elif found:
            diff = found

        if diff:
            score += min(1.0, self._discount * diff)
        else:
            score += 1.0

    return score
```
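To clarify what the scoring loop above computes, here is a stripped-down, hypothetical re-implementation for plain strings (no tokenizer). The window and discount values (`maxdist=2`, `discount=0.2`) are assumptions chosen to be consistent with the doctest outputs above, not necessarily the class defaults:

```python
def relaxed_hamming_sketch(src, tar, maxdist=2, discount=0.2):
    """Simplified sketch: a mismatched position costs 1.0 unless the source
    character occurs within `maxdist` positions of the same index in the
    target, in which case it costs `discount` per position of displacement
    (capped at 1.0)."""
    # Pad the shorter string with a character absent from both strings,
    # mirroring the padding step in dist_abs() above.
    if len(src) != len(tar):
        code = 1
        while chr(code) in src or chr(code) in tar:
            code += 1
        pad = chr(code)
        if len(src) < len(tar):
            src += pad * (len(tar) - len(src))
        else:
            tar += pad * (len(src) - len(tar))

    score = 0.0
    for pos, ch in enumerate(src):
        if ch == tar[pos]:
            continue
        # Nearest occurrence of ch ahead of and behind the current position.
        ahead = tar[pos + 1 : pos + maxdist + 1].find(ch)
        behind = tar[max(0, pos - maxdist) : pos][::-1].find(ch)
        offsets = [idx + 1 for idx in (ahead, behind) if idx != -1]
        score += min(1.0, discount * min(offsets)) if offsets else 1.0
    return score


print(relaxed_hamming_sketch('Niall', 'Neil'))  # 1.4, matching the doctest
print(relaxed_hamming_sketch('ATCG', 'TAGC'))   # 0.8
```

In other words, each mismatch costs a full point unless the source character appears nearby in the target, in which case the cost is a small multiple of how far away it is.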