| Metric | Value |
| --- | --- |
| Conditions | 33 |
| Total Lines | 100 |
| Code Lines | 51 |
| Lines | 0 |
| Ratio | 0 % |
| Tests | 51 |
| CRAP Score | 33 |
| Changes | 0 |
Small methods make your code easier to understand, particularly when combined with a good name. A small method is also usually much easier to name well.
For example, if you find yourself adding comments inside a method's body, that is usually a sign that the commented part should be extracted into a new method. The comment is a good starting point for the new method's name.
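As a minimal sketch of that advice, the snippet below shows a commented block turned into a named helper. All names here (`count_matches_inline`, `count_matches`, `normalize`) are hypothetical and are not part of abydos.

```python
def count_matches_inline(src, tar):
    """Before: a comment is needed to explain the first two lines."""
    # normalize both inputs
    src = src.strip().lower()
    tar = tar.strip().lower()
    return sum(1 for s, t in zip(src, tar) if s == t)


def normalize(text):
    """Extracted helper; its name is taken from the old comment."""
    return text.strip().lower()


def count_matches(src, tar):
    """After: the helper's name documents the step, so no comment is needed."""
    src, tar = normalize(src), normalize(tar)
    return sum(1 for s, t in zip(src, tar) if s == t)
```

Both versions behave identically; only the second reads without the help of a comment.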
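Commonly applied refactorings include Extract Method.
If many parameters or temporary variables are present, Replace Temp with Query, Introduce Parameter Object, and Preserve Whole Object are also worth considering.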
Complex units such as abydos.distance._guth.Guth.sim() often do a lot of different things. To break the surrounding class down, we need to identify a cohesive component within it. A common way to find such a component is to look for fields or methods that share the same prefix or suffix.
Once you have determined the fields that belong together, you can apply the Extract Class refactoring. If the component makes sense as a sub-class, Extract Subclass is also a candidate, and is often faster.
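A minimal sketch of Extract Class, assuming a hypothetical `Report` class whose `author_name` and `author_email` fields shared the `author_` prefix. Neither class exists in abydos; they only illustrate the shape of the refactoring.

```python
from dataclasses import dataclass


@dataclass
class Author:
    """Extracted class: holds what used to be the author_* fields."""

    name: str
    email: str

    def display(self) -> str:
        return f"{self.name} <{self.email}>"


@dataclass
class Report:
    """The original class now delegates author concerns to Author."""

    title: str
    author: Author

    def header(self) -> str:
        return f"{self.title} by {self.author.display()}"
```

For instance, `Report("Guth", Author("A", "a@example.org")).header()` yields `"Guth by A <a@example.org>"`.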
    # -*- coding: utf-8 -*-

    def sim(self, src, tar):
        """Return the relative Guth similarity of two strings.

        This deviates from the algorithm described in :cite:`Guth:1976` in that
        more distant matches are penalized, so that less similar terms score
        lower than more similar terms.

        If no match is found for a particular token in the source string, this
        does not result in an automatic 0.0 score. Rather, the score is further
        penalized towards 0.0.

        Parameters
        ----------
        src : str
            Source string for comparison
        tar : str
            Target string for comparison

        Returns
        -------
        float
            Relative Guth matching score

        Examples
        --------
        >>> cmp = Guth()
        >>> cmp.sim('cat', 'hat')
        0.8666666666666667
        >>> cmp.sim('Niall', 'Neil')
        0.8800000000000001
        >>> cmp.sim('aluminum', 'Catalan')
        0.4
        >>> cmp.sim('ATCG', 'TAGC')
        0.8


        .. versionadded:: 0.4.1

        """
        if src == tar:
            return 1.0
        if not src or not tar:
            return 0.0

        if self.params['tokenizer']:
            src = self.params['tokenizer'].tokenize(src).get_list()
            tar = self.params['tokenizer'].tokenize(tar).get_list()

        score = 0
        for pos in range(len(src)):
            s = self._token_at(src, pos)
            t = self._token_at(tar, pos)
            if s and t and s == t:
                score += 1.0
                continue

            t = self._token_at(tar, pos + 1)
            if s and t and s == t:
                score += 0.8
                continue

            t = self._token_at(tar, pos + 2)
            if s and t and s == t:
                score += 0.6
                continue

            t = self._token_at(tar, pos - 1)
            if s and t and s == t:
                score += 0.8
                continue

            s = self._token_at(src, pos - 1)
            t = self._token_at(tar, pos)
            if s and t and s == t:
                score += 0.8
                continue

            s = self._token_at(src, pos + 1)
            if s and t and s == t:
                score += 0.8
                continue

            s = self._token_at(src, pos + 2)
            if s and t and s == t:
                score += 0.6
                continue

            s = self._token_at(src, pos + 1)
            t = self._token_at(tar, pos + 1)
            if s and t and s == t:
                score += 0.6
                continue

            s = self._token_at(src, pos + 2)
            t = self._token_at(tar, pos + 2)
            if s and t and s == t:
                score += 0.2
                continue

        return score / len(src)
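
The nine branches in the loop of `sim()` differ only in the offsets they look up and the weight they add, which is where most of the 33 conditions come from. The sketch below shows one possible way to reduce that: keep the offsets and weights in a table and extract the per-position check into its own function. It is a standalone illustration, not the library's implementation. `token_at` is an assumption about what `Guth._token_at` does (return `None` for an out-of-range position), since that helper is not shown here, and the tokenizer branch is omitted. The names `OFFSET_WEIGHTS`, `match_weight`, and `guth_sim` are hypothetical.

```python
# (source offset, target offset, weight), in the order sim() tests them.
OFFSET_WEIGHTS = (
    (0, 0, 1.0),
    (0, 1, 0.8),
    (0, 2, 0.6),
    (0, -1, 0.8),
    (-1, 0, 0.8),
    (1, 0, 0.8),
    (2, 0, 0.6),
    (1, 1, 0.6),
    (2, 2, 0.2),
)


def token_at(tokens, pos):
    """Return tokens[pos], or None if pos is out of range (assumed behavior)."""
    if 0 <= pos < len(tokens):
        return tokens[pos]
    return None


def match_weight(src, tar, pos):
    """Return the weight of the first matching offset pair, else 0.0."""
    for src_off, tar_off, weight in OFFSET_WEIGHTS:
        s = token_at(src, pos + src_off)
        t = token_at(tar, pos + tar_off)
        if s and t and s == t:
            return weight
    return 0.0


def guth_sim(src, tar):
    """Table-driven variant of the loop in sim(), without tokenizer support."""
    if src == tar:
        return 1.0
    if not src or not tar:
        return 0.0
    total = sum(match_weight(src, tar, pos) for pos in range(len(src)))
    return total / len(src)
```

With plain strings as input, this sketch agrees with the docstring examples of `sim()` up to floating-point rounding, for example `guth_sim('ATCG', 'TAGC')` is approximately 0.8.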