| Conditions | 10 |
| Total Lines | 74 |
| Code Lines | 25 |
| Lines | 0 |
| Ratio | 0 % |
| Tests | 20 |
| CRAP Score | 10 |
| Changes | 0 |

Small methods make your code easier to understand, particularly when they are combined with a good name. Finding a good name is also usually much easier when the method is small.
For example, if you find yourself adding comments to a method's body, that is usually a good sign that the commented part should be extracted into a new method, with the comment serving as a starting point for the new method's name.
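
As a rough illustration of that advice, the sketch below shows a function whose body is split up by comments, followed by the same logic with each commented part extracted into its own function. All names here (`normalize_scores`, `drop_missing`, `scale_by_max`) are hypothetical and are not taken from abydos.

```python
def normalize_scores_before(scores):
    # drop missing values
    cleaned = [s for s in scores if s is not None]
    # scale by the largest value
    top = max(cleaned)
    return [s / top for s in cleaned]


def drop_missing(scores):
    """Return the scores with None entries removed."""
    return [s for s in scores if s is not None]


def scale_by_max(scores):
    """Divide every score by the largest score."""
    top = max(scores)
    return [s / top for s in scores]


def normalize_scores(scores):
    # Each former comment has become the name of an extracted function.
    return scale_by_max(drop_missing(scores))
```

The comments in the first version read almost word for word like the function names in the second, which is the point of using them as a naming aid.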
Commonly applied refactorings include:
If many parameters/temporary variables are present:
Complex methods like abydos.tokenizer._qgrams.QGrams.__init__() often do a lot of different things. To break the surrounding class down, we need to identify a cohesive component within it. A common way to find such a component is to look for fields or methods that share the same prefixes or suffixes.
Once you have determined the fields that belong together, you can apply the Extract Class refactoring. If the component makes sense as a sub-class, Extract Subclass is also a candidate, and is often faster.
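
A minimal sketch of Extract Class, assuming a hypothetical tokenizer whose fields `pad_start` and `pad_stop` share the `pad_` prefix. The classes `TokenizerBefore`, `Padding`, and `Tokenizer` are invented for illustration and do not exist in abydos.

```python
from dataclasses import dataclass


class TokenizerBefore:
    """Before: the pad_* fields and method hint at a hidden component."""

    def __init__(self, text, pad_start, pad_stop):
        self.text = text
        self.pad_start = pad_start
        self.pad_stop = pad_stop

    def pad_text(self, width):
        return self.pad_start * width + self.text + self.pad_stop * width


@dataclass(frozen=True)
class Padding:
    """After: the fields that shared the pad_ prefix, extracted."""

    start: str
    stop: str

    def apply(self, text, width):
        return self.start * width + text + self.stop * width


class Tokenizer:
    def __init__(self, text, padding):
        self.text = text
        self.padding = padding

    def padded(self, width):
        # Delegate to the extracted component.
        return self.padding.apply(self.text, width)
```

Both versions produce the same padded string; the second keeps the padding rules in one small class that can be tested on its own.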

    # -*- coding: utf-8 -*-
    # [...]
    def __init__(self, term, qval=2, start_stop='$#', skip=0):
        """Initialize QGrams.

        Parameters
        ----------
        term : str
            A string to extract q-grams from
        qval : int or Iterable
            The q-gram length (defaults to 2), can be an integer, range object,
            or list
        start_stop : str
            A string of length >= 0 indicating start & stop symbols.
            If the string is '', q-grams will be calculated without start &
            stop symbols appended to each end.
            Otherwise, the first character of start_stop will pad the
            beginning of the string and the last character of start_stop
            will pad the end of the string before q-grams are calculated.
            (In the case that start_stop is only 1 character long, the same
            symbol will be used for both.)
        skip : int or Iterable
            The number of characters to skip, can be an integer, range object,
            or list

        Examples
        --------
        >>> qg = QGrams('AATTATAT')
        >>> qg
        QGrams({'AT': 3, 'TA': 2, '$A': 1, 'AA': 1, 'TT': 1, 'T#': 1})

        >>> qg = QGrams('AATTATAT', qval=1, start_stop='')
        >>> qg
        QGrams({'A': 4, 'T': 4})

        >>> qg = QGrams('AATTATAT', qval=3, start_stop='')
        >>> qg
        QGrams({'TAT': 2, 'AAT': 1, 'ATT': 1, 'TTA': 1, 'ATA': 1})

        """
        # Save the term itself
        self._term = term
        self._term_ss = term
        self._ordered_list = []

        if not isinstance(qval, Iterable):
            qval = (qval,)
        if not isinstance(skip, Iterable):
            skip = (skip,)

        for qval_i in qval:
            for skip_i in skip:
                if len(self._term) < qval_i or qval_i < 1:
                    continue

                if start_stop and qval_i > 1:
                    term = (
                        start_stop[0] * (qval_i - 1)
                        + self._term
                        + start_stop[-1] * (qval_i - 1)
                    )
                else:
                    term = self._term

                # Having appended start & stop symbols (or not), save the
                # result, but only for the longest valid qval_i
                if len(term) > len(self._term_ss):
                    self._term_ss = term

                skip_i += 1
                self._ordered_list += [
                    term[i : i + (qval_i * skip_i) : skip_i]
                    for i in range(len(term) - (qval_i - 1))
                ]

        super(QGrams, self).__init__(self._ordered_list)
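
One possible way to apply the small-methods advice to the constructor above is sketched here as standalone functions. This is an illustration only, not the abydos implementation: the helper names (`as_tuple`, `pad_term`, `skip_grams`, `qgram_list`) are hypothetical, and the sketch leaves out the `_term_ss` bookkeeping to stay short.

```python
from collections import Counter
from collections.abc import Iterable


def as_tuple(value):
    """Wrap a non-iterable value in a 1-tuple; pass iterables through."""
    return value if isinstance(value, Iterable) else (value,)


def pad_term(term, qval, start_stop):
    """Pad term with qval - 1 start and stop symbols, if any are given."""
    if start_stop and qval > 1:
        return start_stop[0] * (qval - 1) + term + start_stop[-1] * (qval - 1)
    return term


def skip_grams(term, qval, skip):
    """Slice out q-grams of length qval, skipping `skip` characters."""
    step = skip + 1
    return [
        term[i : i + qval * step : step]
        for i in range(len(term) - (qval - 1))
    ]


def qgram_list(term, qval=2, start_stop='$#', skip=0):
    """Collect q-grams for every valid combination of qval and skip."""
    grams = []
    for qval_i in as_tuple(qval):
        for skip_i in as_tuple(skip):
            if len(term) < qval_i or qval_i < 1:
                continue
            grams += skip_grams(pad_term(term, qval_i, start_stop), qval_i, skip_i)
    return grams


counts = Counter(qgram_list('AATTATAT'))
```

With the default arguments, `counts` holds the same q-gram counts as the first docstring example in the listing. Each helper now covers one of the things the original method did inline: argument normalization, padding, and slicing.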