Conditions | 13 |
Total Lines | 118 |
Code Lines | 37 |
Lines | 0 |
Ratio | 0 % |
Tests | 30 |
CRAP Score | 13 |
Changes | 0 |
Small methods make your code easier to understand, in particular if combined with a good name. Besides, if your method is small, finding a good name is usually much easier.
For example, if you find yourself adding comments to a method's body, this is usually a good sign that the commented part should be extracted into a new method, with the comment serving as a starting point for the new method's name.
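As a minimal sketch of that advice (the function names and the price data are hypothetical, chosen only for illustration), the commented steps of a small function can each become a method named after its comment:

```python
def summarize_before(prices):
    # keep only valid (non-negative) prices
    valid = [p for p in prices if p >= 0]
    # compute the average, or 0.0 if nothing is left
    return sum(valid) / len(valid) if valid else 0.0


def keep_valid_prices(prices):
    """Extracted step; the name comes from the first comment above."""
    return [p for p in prices if p >= 0]


def average(values):
    """Extracted step; the name comes from the second comment above."""
    return sum(values) / len(values) if values else 0.0


def summarize_after(prices):
    # The body now reads like the comments it replaced.
    return average(keep_valid_prices(prices))
```

Both versions return the same result; the second one simply moves the explanation from comments into names.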
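Commonly applied refactorings include Extract Method.
If many parameters/temporary variables are present, Replace Temp with Query, Introduce Parameter Object, and Preserve Whole Object are also worth considering.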
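One way to reduce a long parameter list is to group related parameters into a single object. The sketch below assumes Python 3.7+ for `dataclasses`; `SoundexOptions` and `describe` are hypothetical names, not part of abydos, and the defaults simply mirror the signature of `encode()`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SoundexOptions:
    """Hypothetical parameter object; not part of abydos."""

    max_length: int = 4
    var: str = 'American'
    reverse: bool = False
    zero_pad: bool = True


def describe(word, options=SoundexOptions()):
    """Stand-in for a method that would otherwise take five parameters."""
    return '{}: var={}, max_length={}'.format(
        word, options.var, options.max_length
    )
```

Callers then pass one value, for example `describe('Smith', SoundexOptions(var='special'))`, and new options can be added without changing every call site.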
Complex units like abydos.phonetic._Soundex.Soundex.encode() often do a lot of different things. To break down the class that contains such a method, we need to identify a cohesive component within that class. A common approach to finding such a component is to look for fields/methods that share the same prefixes or suffixes.
Once you have determined the fields that belong together, you can apply the Extract Class refactoring. If the component makes sense as a sub-class, Extract Subclass is also a candidate, and is often faster to apply.
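A minimal sketch of Extract Class, assuming a hypothetical customer record whose `address_`-prefixed fields form the cohesive component (none of these names come from abydos):

```python
class CustomerBefore:
    """Before: the address_* fields hint at a hidden component."""

    def __init__(self, name, address_street, address_city):
        self.name = name
        self.address_street = address_street
        self.address_city = address_city

    def address_label(self):
        return '{}, {}'.format(self.address_street, self.address_city)


class Address:
    """After: the fields that shared a prefix live in their own class."""

    def __init__(self, street, city):
        self.street = street
        self.city = city

    def label(self):
        return '{}, {}'.format(self.street, self.city)


class Customer:
    """After: the customer delegates address concerns to Address."""

    def __init__(self, name, address):
        self.name = name
        self.address = address
```

The shared prefix disappears once the fields move: `address_street` becomes `address.street`, and the label logic travels with the data it uses.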
    # -*- coding: utf-8 -*-
    # ...

    def encode(
        self, word, max_length=4, var='American', reverse=False, zero_pad=True
    ):
        """Return the Soundex code for a word.

        Args:
            word (str): The word to transform
            max_length (int): The length of the code returned (defaults to 4)
            var (str): The variant of the algorithm to employ (defaults to
                ``American``):
                - ``American`` follows the American Soundex algorithm, as
                  described at :cite:`US:2007` and in :cite:`Knuth:1998`;
                  this is also called Miracode
                - ``special`` follows the rules from the 1880-1910 US
                  Census retrospective re-analysis, in which h & w are not
                  treated as blocking consonants but as vowels. Cf.
                  :cite:`Repici:2013`.
                - ``Census`` follows the rules laid out in GIL 55
                  :cite:`US:1997` by the US Census, including coding
                  prefixed and unprefixed versions of some names
            reverse (bool): Reverse the word before computing the selected
                Soundex (defaults to False); This results in "Reverse Soundex",
                which is useful for blocking in cases where the initial
                elements may be in error.
            zero_pad (bool): Pad the end of the return value with 0s to achieve
                a max_length string

        Returns:
            str: The Soundex value

        Examples:
            >>> pe = Soundex()
            >>> pe.encode("Christopher")
            'C623'
            >>> pe.encode("Niall")
            'N400'
            >>> pe.encode('Smith')
            'S530'
            >>> pe.encode('Schmidt')
            'S530'

            >>> pe.encode('Christopher', max_length=-1)
            'C623160000000000000000000000000000000000000000000000000000000000'
            >>> pe.encode('Christopher', max_length=-1, zero_pad=False)
            'C62316'

            >>> pe.encode('Christopher', reverse=True)
            'R132'

            >>> pe.encode('Ashcroft')
            'A261'
            >>> pe.encode('Asicroft')
            'A226'
            >>> pe.encode('Ashcroft', var='special')
            'A226'
            >>> pe.encode('Asicroft', var='special')
            'A226'

        """
        # Require a max_length of at least 4 and not more than 64
        if max_length != -1:
            max_length = min(max(4, max_length), 64)
        else:
            max_length = 64

        # uppercase, normalize, decompose, and filter non-A-Z out
        word = unicode_normalize('NFKD', text_type(word.upper()))
        word = word.replace('ß', 'SS')

        if var == 'Census':
            # TODO: Should these prefixes be supplemented? (VANDE, DELA, VON)
            if word[:3] in {'VAN', 'CON'} and len(word) > 4:
                return (
                    soundex(word, max_length, 'American', reverse, zero_pad),
                    soundex(
                        word[3:], max_length, 'American', reverse, zero_pad
                    ),
                )
            if word[:2] in {'DE', 'DI', 'LA', 'LE'} and len(word) > 3:
                return (
                    soundex(word, max_length, 'American', reverse, zero_pad),
                    soundex(
                        word[2:], max_length, 'American', reverse, zero_pad
                    ),
                )
            # Otherwise, proceed as usual (var='American' mode, ostensibly)

        word = ''.join(c for c in word if c in self._uc_set)

        # Nothing to convert, return base case
        if not word:
            if zero_pad:
                return '0' * max_length
            return '0'

        # Reverse word if computing Reverse Soundex
        if reverse:
            word = word[::-1]

        # apply the Soundex algorithm
        sdx = word.translate(self._trans)

        if var == 'special':
            sdx = sdx.replace('9', '0')  # special rule for 1880-1910 census
        else:
            sdx = sdx.replace('9', '')  # rule 1
        sdx = self._delete_consecutive_repeats(sdx)  # rule 3

        if word[0] in 'HW':
            sdx = word[0] + sdx
        else:
            sdx = word[0] + sdx[1:]
        sdx = sdx.replace('0', '')  # rule 1

        if zero_pad:
            sdx += '0' * max_length  # rule 4

        return sdx[:max_length]
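The commented steps in the listing above are natural candidates for Extract Method. The following is a sketch only: `clamp_max_length` and `empty_code` are hypothetical helper names that do not exist in abydos, and they are written as plain functions so the example runs on its own. Their logic is copied from the "Require a max_length" and "Nothing to convert" steps of `encode()`.

```python
def clamp_max_length(max_length):
    """Require a max_length of at least 4 and not more than 64.

    A value of -1 is treated as a request for the maximum, 64.
    """
    if max_length != -1:
        return min(max(4, max_length), 64)
    return 64


def empty_code(max_length, zero_pad):
    """Return the base-case code for a word with nothing to convert."""
    if zero_pad:
        return '0' * max_length
    return '0'
```

With helpers like these, the corresponding parts of `encode()` would shrink to single calls such as `max_length = clamp_max_length(max_length)`, which removes several branches from the method body.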
This check looks for invalid names for a range of different identifiers.
You can set regular expressions to which the identifiers must conform if the defaults do not match your requirements.
If your project includes a Pylint configuration file, the settings contained in that file take precedence.
To find out more about Pylint, please refer to their site.
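To illustrate how a regular-expression naming rule behaves, here is a small sketch using Python's `re` module. The pattern is an assumption modeled on a common snake_case rule and is not necessarily the default of the Pylint version in use; in a Pylint configuration, such patterns are set through options such as `variable-rgx`. The helper `invalid_names` is hypothetical.

```python
import re

# Assumed pattern for illustration: a lowercase letter or underscore,
# followed by 2 to 30 lowercase letters, digits, or underscores.
SNAKE_CASE = re.compile(r'^[a-z_][a-z0-9_]{2,30}$')


def invalid_names(names, pattern=SNAKE_CASE):
    """Return the names that do not conform to the given pattern."""
    return [name for name in names if not pattern.match(name)]
```

Under this assumed pattern, a short name such as `pe` is reported because it has fewer than three characters, while `max_length` and `zero_pad` pass.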