Conditions | 13 |
Total Lines | 125 |
Code Lines | 37 |
Lines | 0 |
Ratio | 0 % |
Tests | 30 |
CRAP Score | 13 |
Changes | 0 |
Small methods make your code easier to understand, especially when they have a good name. A small method is also usually much easier to name well.
For example, if you find yourself adding comments to a method's body, that is usually a sign that the commented part should be extracted into a new method. The comment is a good starting point for naming that method.
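As a minimal sketch of that advice, the example below turns a commented step into a named helper. The functions `encode_before`, `encode_after`, and `clamp_max_length` are hypothetical names chosen for illustration and are not part of abydos; only the clamping rule itself (at least 4, at most 64, with -1 meaning the maximum) is taken from the listing on this page.

```python
def encode_before(word, max_length=4):
    """Original shape: a comment marks a step hidden inside the body."""
    # Require a max_length of at least 4 and not more than 64
    if max_length != -1:
        max_length = min(max(4, max_length), 64)
    else:
        max_length = 64
    return word.upper()[:max_length]


def clamp_max_length(max_length):
    """Limit max_length to the range 4..64; -1 selects the maximum, 64."""
    if max_length == -1:
        return 64
    return min(max(4, max_length), 64)


def encode_after(word, max_length=4):
    """Same behavior; the comment has become the helper's name."""
    return word.upper()[:clamp_max_length(max_length)]
```

Both versions return the same result for any input; the second simply reads as a sequence of named steps.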
Commonly applied refactorings include:
If many parameters/temporary variables are present:
Complex classes, such as the one containing abydos.phonetic._soundex.Soundex.encode(), often do many different things. To break such a class down, identify a cohesive component within it. A common way to find such a component is to look for fields or methods that share a prefix or suffix.
Once you have determined the fields that belong together, you can apply the Extract Class refactoring. If the component makes sense as a sub-class, Extract Subclass is also a candidate, and is often faster.
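The following sketch illustrates Extract Class on a made-up example. `EncoderBefore`, `EncoderAfter`, and `Padding` are hypothetical and do not exist in abydos; the assumption is simply that members sharing the `pad_` prefix form one cohesive component that can live in its own class.

```python
from dataclasses import dataclass


class EncoderBefore:
    """Hypothetical class whose pad_* members form a hidden component."""

    def __init__(self, pad_char='0', pad_length=4):
        self.pad_char = pad_char
        self.pad_length = pad_length

    def pad_apply(self, code):
        # Pad on the right, then cut to the requested length.
        return (code + self.pad_char * self.pad_length)[:self.pad_length]


@dataclass(frozen=True)
class Padding:
    """The extracted component: everything that shared the pad_ prefix."""

    char: str = '0'
    length: int = 4

    def apply(self, code):
        # Same rule as EncoderBefore.pad_apply, now in one place.
        return (code + self.char * self.length)[:self.length]


class EncoderAfter:
    """The slimmed-down class delegates padding to the new component."""

    def __init__(self, padding=None):
        self.padding = Padding() if padding is None else padding

    def finish(self, code):
        return self.padding.apply(code)
```

After the extraction, padding can be tested and reused on its own, and the remaining class has fewer fields to reason about.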
# -*- coding: utf-8 -*-

    def encode(
        self, word, max_length=4, var='American', reverse=False, zero_pad=True
    ):
        """Return the Soundex code for a word.

        Parameters
        ----------
        word : str
            The word to transform
        max_length : int
            The length of the code returned (defaults to 4)
        var : str
            The variant of the algorithm to employ (defaults to ``American``):

                - ``American`` follows the American Soundex algorithm, as
                  described at :cite:`US:2007` and in :cite:`Knuth:1998`; this
                  is also called Miracode
                - ``special`` follows the rules from the 1880-1910 US Census
                  retrospective re-analysis, in which h & w are not treated as
                  blocking consonants but as vowels. Cf. :cite:`Repici:2013`.
                - ``Census`` follows the rules laid out in GIL 55
                  :cite:`US:1997` by the US Census, including coding prefixed
                  and unprefixed versions of some names

        reverse : bool
            Reverse the word before computing the selected Soundex (defaults to
            False); This results in "Reverse Soundex", which is useful for
            blocking in cases where the initial elements may be in error.
        zero_pad : bool
            Pad the end of the return value with 0s to achieve a max_length
            string

        Returns
        -------
        str
            The Soundex value

        Examples
        --------
        >>> pe = Soundex()
        >>> pe.encode("Christopher")
        'C623'
        >>> pe.encode("Niall")
        'N400'
        >>> pe.encode('Smith')
        'S530'
        >>> pe.encode('Schmidt')
        'S530'

        >>> pe.encode('Christopher', max_length=-1)
        'C623160000000000000000000000000000000000000000000000000000000000'
        >>> pe.encode('Christopher', max_length=-1, zero_pad=False)
        'C62316'

        >>> pe.encode('Christopher', reverse=True)
        'R132'

        >>> pe.encode('Ashcroft')
        'A261'
        >>> pe.encode('Asicroft')
        'A226'
        >>> pe.encode('Ashcroft', var='special')
        'A226'
        >>> pe.encode('Asicroft', var='special')
        'A226'

        """
        # Require a max_length of at least 4 and not more than 64
        if max_length != -1:
            max_length = min(max(4, max_length), 64)
        else:
            max_length = 64

        # uppercase, normalize, decompose, and filter non-A-Z out
        word = unicode_normalize('NFKD', text_type(word.upper()))
        word = word.replace('ß', 'SS')

        if var == 'Census':
            if word[:3] in {'VAN', 'CON'} and len(word) > 4:
                return (
                    soundex(word, max_length, 'American', reverse, zero_pad),
                    soundex(
                        word[3:], max_length, 'American', reverse, zero_pad
                    ),
                )
            if word[:2] in {'DE', 'DI', 'LA', 'LE'} and len(word) > 3:
                return (
                    soundex(word, max_length, 'American', reverse, zero_pad),
                    soundex(
                        word[2:], max_length, 'American', reverse, zero_pad
                    ),
                )
            # Otherwise, proceed as usual (var='American' mode, ostensibly)

        word = ''.join(c for c in word if c in self._uc_set)

        # Nothing to convert, return base case
        if not word:
            if zero_pad:
                return '0' * max_length
            return '0'

        # Reverse word if computing Reverse Soundex
        if reverse:
            word = word[::-1]

        # apply the Soundex algorithm
        sdx = word.translate(self._trans)

        if var == 'special':
            sdx = sdx.replace('9', '0')  # special rule for 1880-1910 census
        else:
            sdx = sdx.replace('9', '')  # rule 1
        sdx = self._delete_consecutive_repeats(sdx)  # rule 3

        if word[0] in 'HW':
            sdx = word[0] + sdx
        else:
            sdx = word[0] + sdx[1:]
        sdx = sdx.replace('0', '')  # rule 1

        if zero_pad:
            sdx += '0' * max_length  # rule 4

        return sdx[:max_length]
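The listing relies on helpers that are not shown on this page (`self._trans`, `self._uc_set`, `self._delete_consecutive_repeats`). The standalone sketch below approximates the ``American`` branch, including `reverse`, so the docstring examples can be tried without the library. It is not the abydos implementation: the `_CODES` table is an assumption (the usual Soundex digits, with H and W mapped to the placeholder 9 and vowels and Y to 0), and every function name here is hypothetical. The ``special`` and ``Census`` variants are left out.

```python
import unicodedata

# Assumed digit table: vowels and Y -> 0, H and W -> 9, other letters -> 1-6.
_CODES = dict(
    zip(map(ord, 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'), '01230129022455012623019202')
)


def normalize(word):
    """Uppercase, decompose accents, and keep only the letters A-Z."""
    word = unicodedata.normalize('NFKD', word.upper())
    return ''.join(c for c in word if 'A' <= c <= 'Z')


def delete_consecutive_repeats(code):
    """Collapse runs of the same character into a single character."""
    return ''.join(
        c for i, c in enumerate(code) if i == 0 or c != code[i - 1]
    )


def soundex_sketch(word, max_length=4, reverse=False, zero_pad=True):
    """Approximate the American branch of the listing above."""
    # Clamp the length to 4..64; -1 selects the maximum of 64.
    max_length = 64 if max_length == -1 else min(max(4, max_length), 64)

    word = normalize(word)
    if not word:
        return '0' * max_length if zero_pad else '0'
    if reverse:
        word = word[::-1]

    sdx = word.translate(_CODES).replace('9', '')  # drop H and W
    sdx = delete_consecutive_repeats(sdx)

    # Keep the first letter itself. An initial H or W has no digit left in
    # sdx (its 9 was removed), so nothing is sliced off in that case.
    if word[0] in 'HW':
        sdx = word[0] + sdx
    else:
        sdx = word[0] + sdx[1:]
    sdx = sdx.replace('0', '')  # drop vowels

    if zero_pad:
        sdx += '0' * max_length
    return sdx[:max_length]
```

Under these assumptions, `soundex_sketch('Christopher')` gives 'C623' and `soundex_sketch('Ashcroft')` gives 'A261', matching the docstring examples. Splitting the body into `normalize` and `delete_consecutive_repeats` also shows how the long method could be broken into smaller named steps.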