Conditions | 17 |
Total Lines | 94 |
Code Lines | 45 |
Lines | 0 |
Ratio | 0 % |
Tests | 39 |
CRAP Score | 17 |
Changes | 0 |
Small methods make your code easier to understand, particularly when they have good names. A small method is also usually much easier to name well.
For example, if you find yourself adding comments to a method's body, this is usually a good sign that the commented part should be extracted into a new method. The comment is then a starting point for the new method's name.
Commonly applied refactorings include: Extract Method
If many parameters/temporary variables are present: Replace Temp with Query, Introduce Parameter Object, Preserve Whole Object
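
As a minimal sketch of extracting a commented part into a new method, consider two commented steps in the `stem()` listing on this page: `# 2. Change second of doubled characters to *` and `# Expand doubled`. Each can become a small function whose name comes from its comment. The names `mask_doubled` and `expand_doubled` are hypothetical and not part of abydos, and the empty-string guards are additions for this sketch.

```python
def mask_doubled(word):
    """Replace the second of two identical adjacent characters with '*'."""
    if not word:  # guard added for this sketch; stem() returns early instead
        return word
    masked = word[0]
    for i in range(1, len(word)):
        # Compare against the already-masked text, as the original loop does.
        masked += '*' if masked[i - 1] == word[i] else word[i]
    return masked


def expand_doubled(word):
    """Undo mask_doubled: turn each '*' back into the preceding character."""
    if not word:
        return word
    return word[0] + ''.join(
        word[i - 1] if word[i] == '*' else word[i]
        for i in range(1, len(word))
    )


print(mask_doubled('kaffee'))                  # kaf*e*
print(expand_doubled(mask_doubled('kaffee')))  # kaffee
```

With helpers like these, the two loops in `stem()` would each shrink to a single call, and the explanatory comments would no longer be needed.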
Complex code such as abydos.stemmer._caumanns.Caumanns.stem() often does many different things. To break it down, we need to identify a cohesive component within the enclosing class. A common way to find such a component is to look for fields or methods that share a prefix or suffix.
Once you have determined the fields that belong together, you can apply the Extract Class refactoring. If the component makes sense as a subclass, Extract Subclass is also a candidate, and is often faster to apply.
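
A hedged sketch of Extract Class for this method: the six forward substitutions (`sch` to `$` and so on) and the six reverse substitutions in `stem()` operate on the same mapping, which makes them one candidate for a cohesive component. Here the signal is shared data rather than a shared name prefix. The class name `PlaceholderCodec` and its methods are hypothetical and are not part of abydos.

```python
class PlaceholderCodec:
    """Swap multi-letter sequences for one-character placeholders and back."""

    # (sequence, placeholder) pairs, in the order stem() applies them.
    _PAIRS = (
        ('sch', '$'),
        ('ch', '§'),
        ('ei', '%'),
        ('ie', '&'),
        ('ig', '#'),
        ('st', '!'),
    )

    def encode(self, word):
        """Replace each sequence with its placeholder."""
        for sequence, placeholder in self._PAIRS:
            word = word.replace(sequence, placeholder)
        return word

    def decode(self, word):
        """Replace each placeholder with its original sequence."""
        for sequence, placeholder in self._PAIRS:
            word = word.replace(placeholder, sequence)
        return word


codec = PlaceholderCodec()
encoded = codec.encode('schreiben')
print(encoded)                # $r%ben
print(codec.decode(encoded))  # schreiben
```

Keeping the mapping in one place means a new pair only has to be added once, instead of in two separate runs of `replace` calls. The round trip assumes the input does not already contain any of the placeholder characters.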
# -*- coding: utf-8 -*-
# [lines 2-52 omitted]

    def stem(self, word):
        """Return Caumanns German stem.

        Parameters
        ----------
        word : str
            The word to stem

        Returns
        -------
        str
            Word stem

        Examples
        --------
        >>> stmr = Caumanns()
        >>> stmr.stem('lesen')
        'les'
        >>> stmr.stem('graues')
        'grau'
        >>> stmr.stem('buchstabieren')
        'buchstabier'

        """
        if not word:
            return ''

        upper_initial = word[0].isupper()
        word = normalize('NFC', text_type(word.lower()))

        # # Part 2: Substitution
        # 1. Change umlauts to corresponding vowels & ß to ss
        word = word.translate(self._umlauts)
        word = word.replace('ß', 'ss')

        # 2. Change second of doubled characters to *
        new_word = word[0]
        for i in range(1, len(word)):
            if new_word[i - 1] == word[i]:
                new_word += '*'
            else:
                new_word += word[i]
        word = new_word

        # 3. Replace sch, ch, ei, ie with $, §, %, &
        word = word.replace('sch', '$')
        word = word.replace('ch', '§')
        word = word.replace('ei', '%')
        word = word.replace('ie', '&')
        word = word.replace('ig', '#')
        word = word.replace('st', '!')

        # # Part 1: Recursive Context-Free Stripping
        # 1. Remove the following 7 suffixes recursively
        while len(word) > 3:
            if (len(word) > 4 and word[-2:] in {'em', 'er'}) or (
                len(word) > 5 and word[-2:] == 'nd'
            ):
                word = word[:-2]
            elif (word[-1] in {'e', 's', 'n'}) or (
                not upper_initial and word[-1] in {'t', '!'}
            ):
                word = word[:-1]
            else:
                break

        # Additional optimizations:
        if len(word) > 5 and word[-5:] == 'erin*':
            word = word[:-1]
        if word[-1] == 'z':
            word = word[:-1] + 'x'

        # Reverse substitutions:
        word = word.replace('$', 'sch')
        word = word.replace('§', 'ch')
        word = word.replace('%', 'ei')
        word = word.replace('&', 'ie')
        word = word.replace('#', 'ig')
        word = word.replace('!', 'st')

        # Expand doubled
        word = ''.join(
            [word[0]]
            + [
                word[i - 1] if word[i] == '*' else word[i]
                for i in range(1, len(word))
            ]
        )

        # Finally, convert gege to ge
        if len(word) > 4:
            word = word.replace('gege', 'ge', 1)

        return word
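
The suffix-stripping `while` loop accounts for a good share of the conditions counted in the report above, so it is a natural piece to isolate and test on its own. The sketch below lifts that loop, unchanged in logic, into a standalone function. The name `strip_suffixes` and its signature are assumptions for illustration, not part of abydos. The function expects the word in its substituted form; the sample words contain none of the substituted sequences, so they can be passed in directly.

```python
def strip_suffixes(word, upper_initial):
    """Repeatedly strip the suffixes handled by stem()'s while loop."""
    while len(word) > 3:
        if (len(word) > 4 and word[-2:] in {'em', 'er'}) or (
            len(word) > 5 and word[-2:] == 'nd'
        ):
            word = word[:-2]
        elif word[-1] in {'e', 's', 'n'} or (
            # 't' and '!' (the placeholder for 'st') are stripped only
            # when the original word was not capitalized.
            not upper_initial and word[-1] in {'t', '!'}
        ):
            word = word[:-1]
        else:
            break
    return word


print(strip_suffixes('lesen', upper_initial=False))   # les
print(strip_suffixes('graues', upper_initial=False))  # grau
```

Passing `upper_initial` as a parameter makes the dependency on capitalization explicit, so each branch of the loop can be exercised by a direct call.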