Completed
Pull Request — master (#138)
by Chris
14:20
created

abydos.stemmer._snowball.Porter.stem()   F

Complexity

Conditions 116

Size

Total Lines 223
Code Lines 176

Duplication

Lines 0
Ratio 0 %

Code Coverage

Tests 172
CRAP Score 116

Importance

Changes 0
Metric Value
eloc 176
dl 0
loc 223
ccs 172
cts 172
cp 1
rs 0
c 0
b 0
f 0
cc 116
nop 3
crap 116
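
The CRAP score reported above can be reproduced from the complexity and coverage figures. The sketch below assumes that `cc` is the cyclomatic complexity and `cp` the covered fraction of the method (the table does not define its abbreviations), and uses the commonly published formula CRAP = cc^2 * (1 - coverage)^3 + cc. The function name `crap_score` is illustrative only and is not part of the review tool.

```python
def crap_score(complexity, coverage):
    """Return the CRAP score for a method.

    complexity: cyclomatic complexity (assumed to be the `cc` metric)
    coverage:   covered fraction in [0, 1] (assumed to be the `cp` metric)
    """
    if not 0.0 <= coverage <= 1.0:
        raise ValueError('coverage must be a fraction between 0 and 1')
    return complexity ** 2 * (1.0 - coverage) ** 3 + complexity


# With full coverage the score collapses to the complexity itself.
print(crap_score(116, 1.0))  # 116.0
# With no coverage the quadratic term dominates.
print(crap_score(116, 0.0))  # 13572.0
```

Because the method is fully covered, the score of 116 comes entirely from the complexity term, so only restructuring (not more tests) can lower it.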

How to fix: Long Method, Complexity

Long Method

Small methods make your code easier to understand, in particular if combined with a good name. Besides, if your method is small, finding a good name is usually much easier.

For example, if you find yourself adding comments to a method's body, this is usually a good sign to extract the commented part to a new method, and use the comment as a starting point when coming up with a good name for this new method.

Commonly applied refactorings include Extract Method.
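
As a minimal sketch of extracting a commented part into its own method, the block marked `# Step 1a` in the listing could become a small function whose name comes from that comment. The name `step_1a` and the free-function form are assumptions made for illustration; the reviewed code keeps this logic inline in `Porter.stem()`.

```python
def step_1a(word):
    """Porter step 1a: SSES -> SS, IES -> I, SS -> SS, S -> (nothing)."""
    if word[-1:] != 's':
        return word
    if word[-4:] == 'sses' or word[-3:] == 'ies':
        return word[:-2]
    if word[-2:] == 'ss':
        return word
    return word[:-1]


print(step_1a('caresses'))  # caress
print(step_1a('ponies'))    # poni
print(step_1a('caress'))    # caress
print(step_1a('cats'))      # cat
```

Each step extracted this way removes its branches from `stem()` and can be tested on its own.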

Complexity

Complex units like abydos.stemmer._snowball.Porter.stem() often do a lot of different things. To break one down, we need to identify a cohesive component within the surrounding class. A common approach to finding such a component is to look for fields/methods that share the same prefixes or suffixes.

Once you have determined the fields that belong together, you can apply the Extract Class refactoring. If the component makes sense as a sub-class, Extract Subclass is also a candidate, and is often faster.
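
One way to picture Extract Class here, as a sketch only: the repeated "suffix, m-degree check, replacement" pattern of Step 3 in the listing can be moved into a small rule class and driven from a table. `SuffixRule`, `STEP3_RULES`, `step_3` and the module-level `m_degree` are hypothetical names introduced for this example; they are not part of Abydos, and the rule table simply restates the Step 3 branches of the listing.

```python
VOWELS = set('aeiouy')


def m_degree(term):
    """Count vowel-to-consonant transitions, mirroring _m_degree in the listing."""
    mdeg = 0
    last_was_vowel = False
    for letter in term:
        if letter in VOWELS:
            last_was_vowel = True
        else:
            if last_was_vowel:
                mdeg += 1
            last_was_vowel = False
    return mdeg


class SuffixRule(object):
    """Replace `suffix` with `replacement` if the remaining stem has m > min_m."""

    def __init__(self, suffix, replacement, min_m=0):
        self.suffix = suffix
        self.replacement = replacement
        self.min_m = min_m

    def matches(self, word):
        return word.endswith(self.suffix)

    def apply(self, word):
        stem = word[:len(word) - len(self.suffix)]
        if m_degree(stem) > self.min_m:
            return stem + self.replacement
        return word


# Same suffixes and outcomes as the Step 3 branches in the listing.
STEP3_RULES = (
    SuffixRule('icate', 'ic'),
    SuffixRule('ative', ''),
    SuffixRule('alize', 'al'),
    SuffixRule('iciti', 'ic'),
    SuffixRule('ical', 'ic'),
    SuffixRule('ful', ''),
    SuffixRule('ness', ''),
)


def step_3(word):
    """Apply the first matching rule only, as the if/elif chain does."""
    for rule in STEP3_RULES:
        if rule.matches(word):
            return rule.apply(word)
    return word


print(step_3('triplicate'))  # triplic
print(step_3('hopeful'))     # hope
print(step_3('goodness'))    # good
```

With the rules held as data, each step contributes one loop instead of a dozen branches, which is where most of the 116 conditions come from.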

1
# -*- coding: utf-8 -*-
coding-style: Too many lines in module (1544/1000)
2
3
# Copyright 2014-2018 by Christopher C. Little.
4
# This file is part of Abydos.
5
#
6
# Abydos is free software: you can redistribute it and/or modify
7
# it under the terms of the GNU General Public License as published by
8
# the Free Software Foundation, either version 3 of the License, or
9
# (at your option) any later version.
10
#
11
# Abydos is distributed in the hope that it will be useful,
12
# but WITHOUT ANY WARRANTY; without even the implied warranty of
13
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
14
# GNU General Public License for more details.
15
#
16
# You should have received a copy of the GNU General Public License
17
# along with Abydos. If not, see <http://www.gnu.org/licenses/>.
18
19 1
"""abydos.stemmer._snowball.
20
21
The stemmer._snowball module defines the stemmers:
22
23
    - Porter
24
    - Porter2 (Snowball English)
25
    - Snowball German
26
    - Snowball Dutch
27
    - Snowball Norwegian
28
    - Snowball Swedish
29
    - Snowball Danish
30
"""
31
32 1
from __future__ import unicode_literals
33
34 1
from unicodedata import normalize
35
36 1
from six import text_type
37 1
from six.moves import range
38
39 1
from ._stemmer import Stemmer
40
41 1
__all__ = [
42
    'Porter',
43
    'Porter2',
44
    'SnowballDanish',
45
    'SnowballDutch',
46
    'SnowballGerman',
47
    'SnowballNorwegian',
48
    'SnowballSwedish',
49
    'porter',
50
    'porter2',
51
    'sb_danish',
52
    'sb_dutch',
53
    'sb_german',
54
    'sb_norwegian',
55
    'sb_swedish',
56
]
57
58
59 1
class Porter(Stemmer):
Unused Code: The variable __class__ seems to be unused.
60
    """Porter stemmer.
61
62
    The Porter stemmer is described in :cite:`Porter:1980`.
63
    """
64
65 1
    _vowels = {'a', 'e', 'i', 'o', 'u', 'y'}
66
67 1
    def _m_degree(self, term):
68
        """Return Porter helper function _m_degree value.
69
70
        m-degree is equal to the number of V to C transitions
71
72
        :param str term: the word for which to calculate the m-degree
73
        :returns: the m-degree as defined in the Porter stemmer definition
74
        :rtype: int
75
        """
76 1
        mdeg = 0
77 1
        last_was_vowel = False
78 1
        for letter in term:
79 1
            if letter in self._vowels:
80 1
                last_was_vowel = True
81
            else:
82 1
                if last_was_vowel:
83 1
                    mdeg += 1
84 1
                last_was_vowel = False
85 1
        return mdeg
86
87 1
    def _has_vowel(self, term):
88
        """Return Porter helper function _has_vowel value.
89
90
        :param str term: the word to scan for vowels
91
        :returns: true iff a vowel exists in the term (as defined in the Porter
92
            stemmer definition)
93
        :rtype: bool
94
        """
95 1
        for letter in term:
96 1
            if letter in self._vowels:
97 1
                return True
98 1
        return False
99
100 1
    def _ends_in_doubled_cons(self, term):
101
        """Return Porter helper function _ends_in_doubled_cons value.
102
103
        :param str term: the word to check for a final doubled consonant
105
        :returns: true iff the stem ends in a doubled consonant (as defined in
106
            the Porter stemmer definition)
107
        :rtype: bool
108
        """
109 1
        return (
110
            len(term) > 1
111
            and term[-1] not in self._vowels
112
            and term[-2] == term[-1]
113
        )
114
115 1
    def _ends_in_cvc(self, term):
116
        """Return Porter helper function _ends_in_cvc value.
117
118
        :param str term: the word to scan for cvc
119
        :returns: true iff the stem ends in cvc (as defined in the Porter
120
            stemmer definition)
121
        :rtype: bool
122
        """
123 1
        return len(term) > 2 and (
124
            term[-1] not in self._vowels
125
            and term[-2] in self._vowels
126
            and term[-3] not in self._vowels
127
            and term[-1] not in tuple('wxY')
128
        )
129
130 1
    def stem(self, word, early_english=False):
Bug: Parameters differ from overridden 'stem' method
131
        """Return Porter stem.
132
133
        :param str word: the word to calculate the stem of
134
        :param bool early_english: set to True in order to remove -eth & -est
135
            (2nd & 3rd person singular verbal agreement suffixes)
136
        :returns: word stem
137
        :rtype: str
138
139
        >>> stmr = Porter()
140
        >>> stmr.stem('reading')
141
        'read'
142
        >>> stmr.stem('suspension')
143
        'suspens'
144
        >>> stmr.stem('elusiveness')
145
        'elus'
146
147
        >>> stmr.stem('eateth', early_english=True)
148
        'eat'
149
        """
150
        # lowercase, normalize, and compose
151 1
        word = normalize('NFC', text_type(word.lower()))
152
153
        # Return the word unchanged if it is shorter than 3 characters
154 1
        if len(word) < 3:
155 1
            return word
156
157
        # Re-map consonantal y to Y (Y will be C, y will be V)
158 1
        if word[0] == 'y':
159 1
            word = 'Y' + word[1:]
160 1
        for i in range(1, len(word)):
161 1
            if word[i] == 'y' and word[i - 1] in self._vowels:
162 1
                word = word[:i] + 'Y' + word[i + 1 :]
163
164
        # Step 1a
165 1
        if word[-1] == 's':
166 1
            if word[-4:] == 'sses':
167 1
                word = word[:-2]
168 1
            elif word[-3:] == 'ies':
169 1
                word = word[:-2]
170 1
            elif word[-2:] == 'ss':
171 1
                pass
172
            else:
173 1
                word = word[:-1]
174
175
        # Step 1b
176 1
        step1b_flag = False
177 1
        if word[-3:] == 'eed':
178 1
            if self._m_degree(word[:-3]) > 0:
179 1
                word = word[:-1]
180 1
        elif word[-2:] == 'ed':
181 1
            if self._has_vowel(word[:-2]):
182 1
                word = word[:-2]
183 1
                step1b_flag = True
184 1
        elif word[-3:] == 'ing':
185 1
            if self._has_vowel(word[:-3]):
186 1
                word = word[:-3]
187 1
                step1b_flag = True
188 1
        elif early_english:
189 1
            if word[-3:] == 'est':
190 1
                if self._has_vowel(word[:-3]):
191 1
                    word = word[:-3]
192 1
                    step1b_flag = True
193 1
            elif word[-3:] == 'eth':
194 1
                if self._has_vowel(word[:-3]):
195 1
                    word = word[:-3]
196 1
                    step1b_flag = True
197
198 1
        if step1b_flag:
199 1
            if word[-2:] in {'at', 'bl', 'iz'}:
200 1
                word += 'e'
201 1
            elif self._ends_in_doubled_cons(word) and word[-1] not in {
202
                'l',
Coding Style: Wrong hanging indentation before block (add 4 spaces).
203
                's',
Coding Style: Wrong hanging indentation before block (add 4 spaces).
204
                'z',
Coding Style: Wrong hanging indentation before block (add 4 spaces).
205
            }:
206 1
                word = word[:-1]
207 1
            elif self._m_degree(word) == 1 and self._ends_in_cvc(word):
208 1
                word += 'e'
209
210
        # Step 1c
211 1
        if word[-1] in {'Y', 'y'} and self._has_vowel(word[:-1]):
212 1
            word = word[:-1] + 'i'
213
214
        # Step 2
215 1
        if len(word) > 1:
216 1
            if word[-2] == 'a':
217 1
                if word[-7:] == 'ational':
218 1
                    if self._m_degree(word[:-7]) > 0:
219 1
                        word = word[:-5] + 'e'
220 1
                elif word[-6:] == 'tional':
221 1
                    if self._m_degree(word[:-6]) > 0:
222 1
                        word = word[:-2]
223 1
            elif word[-2] == 'c':
224 1
                if word[-4:] in {'enci', 'anci'}:
225 1
                    if self._m_degree(word[:-4]) > 0:
226 1
                        word = word[:-1] + 'e'
227 1
            elif word[-2] == 'e':
228 1
                if word[-4:] == 'izer':
229 1
                    if self._m_degree(word[:-4]) > 0:
230 1
                        word = word[:-1]
231 1
            elif word[-2] == 'g':
232 1
                if word[-4:] == 'logi':
233 1
                    if self._m_degree(word[:-4]) > 0:
234 1
                        word = word[:-1]
235 1
            elif word[-2] == 'l':
236 1
                if word[-3:] == 'bli':
237 1
                    if self._m_degree(word[:-3]) > 0:
238 1
                        word = word[:-1] + 'e'
239 1
                elif word[-4:] == 'alli':
240 1
                    if self._m_degree(word[:-4]) > 0:
241 1
                        word = word[:-2]
242 1
                elif word[-5:] == 'entli':
243 1
                    if self._m_degree(word[:-5]) > 0:
244 1
                        word = word[:-2]
245 1
                elif word[-3:] == 'eli':
246 1
                    if self._m_degree(word[:-3]) > 0:
247 1
                        word = word[:-2]
248 1
                elif word[-5:] == 'ousli':
249 1
                    if self._m_degree(word[:-5]) > 0:
250 1
                        word = word[:-2]
251 1
            elif word[-2] == 'o':
252 1
                if word[-7:] == 'ization':
253 1
                    if self._m_degree(word[:-7]) > 0:
254 1
                        word = word[:-5] + 'e'
255 1
                elif word[-5:] == 'ation':
256 1
                    if self._m_degree(word[:-5]) > 0:
257 1
                        word = word[:-3] + 'e'
258 1
                elif word[-4:] == 'ator':
259 1
                    if self._m_degree(word[:-4]) > 0:
260 1
                        word = word[:-2] + 'e'
261 1
            elif word[-2] == 's':
262 1
                if word[-5:] == 'alism':
263 1
                    if self._m_degree(word[:-5]) > 0:
264 1
                        word = word[:-3]
265 1
                elif word[-7:] in {'iveness', 'fulness', 'ousness'}:
266 1
                    if self._m_degree(word[:-7]) > 0:
267 1
                        word = word[:-4]
268 1
            elif word[-2] == 't':
269 1
                if word[-5:] == 'aliti':
270 1
                    if self._m_degree(word[:-5]) > 0:
271 1
                        word = word[:-3]
272 1
                elif word[-5:] == 'iviti':
273 1
                    if self._m_degree(word[:-5]) > 0:
274 1
                        word = word[:-3] + 'e'
275 1
                elif word[-6:] == 'biliti':
276 1
                    if self._m_degree(word[:-6]) > 0:
277 1
                        word = word[:-5] + 'le'
278
279
        # Step 3
280 1
        if word[-5:] == 'icate':
281 1
            if self._m_degree(word[:-5]) > 0:
282 1
                word = word[:-3]
283 1
        elif word[-5:] == 'ative':
284 1
            if self._m_degree(word[:-5]) > 0:
285 1
                word = word[:-5]
286 1
        elif word[-5:] in {'alize', 'iciti'}:
287 1
            if self._m_degree(word[:-5]) > 0:
288 1
                word = word[:-3]
289 1
        elif word[-4:] == 'ical':
290 1
            if self._m_degree(word[:-4]) > 0:
291 1
                word = word[:-2]
292 1
        elif word[-3:] == 'ful':
293 1
            if self._m_degree(word[:-3]) > 0:
294 1
                word = word[:-3]
295 1
        elif word[-4:] == 'ness':
296 1
            if self._m_degree(word[:-4]) > 0:
297 1
                word = word[:-4]
298
299
        # Step 4
300 1
        if word[-2:] == 'al':
301 1
            if self._m_degree(word[:-2]) > 1:
302 1
                word = word[:-2]
303 1
        elif word[-4:] in {'ance', 'ence'}:
304 1
            if self._m_degree(word[:-4]) > 1:
305 1
                word = word[:-4]
306 1
        elif word[-2:] in {'er', 'ic'}:
307 1
            if self._m_degree(word[:-2]) > 1:
308 1
                word = word[:-2]
309 1
        elif word[-4:] in {'able', 'ible'}:
310 1
            if self._m_degree(word[:-4]) > 1:
311 1
                word = word[:-4]
312 1
        elif word[-3:] == 'ant':
313 1
            if self._m_degree(word[:-3]) > 1:
314 1
                word = word[:-3]
315 1
        elif word[-5:] == 'ement':
316 1
            if self._m_degree(word[:-5]) > 1:
317 1
                word = word[:-5]
318 1
        elif word[-4:] == 'ment':
319 1
            if self._m_degree(word[:-4]) > 1:
320 1
                word = word[:-4]
321 1
        elif word[-3:] == 'ent':
322 1
            if self._m_degree(word[:-3]) > 1:
323 1
                word = word[:-3]
324 1
        elif word[-4:] in {'sion', 'tion'}:
325 1
            if self._m_degree(word[:-3]) > 1:
326 1
                word = word[:-3]
327 1
        elif word[-2:] == 'ou':
328 1
            if self._m_degree(word[:-2]) > 1:
329 1
                word = word[:-2]
330 1
        elif word[-3:] in {'ism', 'ate', 'iti', 'ous', 'ive', 'ize'}:
331 1
            if self._m_degree(word[:-3]) > 1:
332 1
                word = word[:-3]
333
334
        # Step 5a
335 1
        if word[-1] == 'e':
336 1
            if self._m_degree(word[:-1]) > 1:
337 1
                word = word[:-1]
338 1
            elif self._m_degree(word[:-1]) == 1 and not self._ends_in_cvc(
339
                word[:-1]
Coding Style: Wrong hanging indentation before block (add 4 spaces).
340
            ):
341 1
                word = word[:-1]
342
343
        # Step 5b
344 1
        if word[-2:] == 'll' and self._m_degree(word) > 1:
345 1
            word = word[:-1]
346
347
        # Change 'Y' back to 'y' if it survived stemming
348 1
        for i in range(len(word)):
unused-code: Consider using enumerate instead of iterating with range and len
349 1
            if word[i] == 'Y':
350 1
                word = word[:i] + 'y' + word[i + 1 :]
351
352 1
        return word
353
354
355 1
def porter(word, early_english=False):
356
    """Return Porter stem.
357
358
    This is a wrapper for :py:meth:`Porter.stem`.
359
360
    :param str word: the word to calculate the stem of
361
    :param bool early_english: set to True in order to remove -eth & -est
362
        (2nd & 3rd person singular verbal agreement suffixes)
363
    :returns: word stem
364
    :rtype: str
365
366
    >>> porter('reading')
367
    'read'
368
    >>> porter('suspension')
369
    'suspens'
370
    >>> porter('elusiveness')
371
    'elus'
372
373
    >>> porter('eateth', early_english=True)
374
    'eat'
375
    """
376 1
    return Porter().stem(word, early_english)
377
378
379 1
class Snowball(Stemmer):
Unused Code: The variable __class__ seems to be unused.
380
    """Snowball stemmer base class."""
381
382 1
    _vowels = set('aeiouy')
383 1
    _codanonvowels = set('\'bcdfghjklmnpqrstvz')
384
385 1
    def _sb_r1(self, term, r1_prefixes=None):
386
        """Return the R1 region, as defined in the Porter2 specification."""
387 1
        vowel_found = False
388 1
        if hasattr(r1_prefixes, '__iter__'):
389 1
            for prefix in r1_prefixes:
390 1
                if term[: len(prefix)] == prefix:
391 1
                    return len(prefix)
392
393 1
        for i in range(len(term)):
unused-code: Consider using enumerate instead of iterating with range and len
394 1
            if not vowel_found and term[i] in self._vowels:
395 1
                vowel_found = True
396 1
            elif vowel_found and term[i] not in self._vowels:
397 1
                return i + 1
398 1
        return len(term)
399
400 1
    def _sb_r2(self, term, r1_prefixes=None):
401
        """Return the R2 region, as defined in the Porter2 specification."""
402 1
        r1_start = self._sb_r1(term, r1_prefixes)
403 1
        return r1_start + self._sb_r1(term[r1_start:])
404
405 1
    def _sb_ends_in_short_syllable(self, term):
406
        """Return True iff term ends in a short syllable.
407
408
        (...according to the Porter2 specification.)
409
410
        NB: This is akin to the CVC test from the Porter stemmer. The
411
        description is unfortunately poor/ambiguous.
412
        """
413 1
        if not term:
414 1
            return False
415 1
        if len(term) == 2:
416 1
            if term[-2] in self._vowels and term[-1] not in self._vowels:
417 1
                return True
418 1
        elif len(term) >= 3:
419 1
            if (
420
                term[-3] not in self._vowels
Coding Style: Wrong hanging indentation before block (add 4 spaces).
421
                and term[-2] in self._vowels
Coding Style: Wrong hanging indentation before block (add 4 spaces).
422
                and term[-1] in self._codanonvowels
Coding Style: Wrong hanging indentation before block (add 4 spaces).
423
            ):
424 1
                return True
425 1
        return False
426
427 1
    def _sb_short_word(self, term, r1_prefixes=None):
428
        """Return True iff term is a short word.
429
430
        (...according to the Porter2 specification.)
431
        """
432 1
        if self._sb_r1(term, r1_prefixes) == len(
433
            term
Coding Style: Wrong hanging indentation before block (add 4 spaces).
434
        ) and self._sb_ends_in_short_syllable(term):
435 1
            return True
436 1
        return False
437
438 1
    def _sb_has_vowel(self, term):
439
        """Return Porter helper function _sb_has_vowel value.
440
441
        :param str term: the word to scan for vowels
442
        :returns: true iff a vowel exists in the term (as defined in the Porter
443
            stemmer definition)
444
        :rtype: bool
445
        """
446 1
        for letter in term:
447 1
            if letter in self._vowels:
448 1
                return True
449 1
        return False
450
451
452 1
class Porter2(Snowball):
Unused Code: The variable __class__ seems to be unused.
453
    """Porter2 (Snowball English) stemmer.
454
455
    The Porter2 (Snowball English) stemmer is defined in :cite:`Porter:2002`.
456
    """
457
458 1
    _doubles = {'bb', 'dd', 'ff', 'gg', 'mm', 'nn', 'pp', 'rr', 'tt'}
459 1
    _li = {'c', 'd', 'e', 'g', 'h', 'k', 'm', 'n', 'r', 't'}
460
461
    # R1 prefixes should be in order from longest to shortest to prevent
462
    # masking
463 1
    _r1_prefixes = ('commun', 'gener', 'arsen')
464 1
    _exception1dict = {  # special changes:
465
        'skis': 'ski',
466
        'skies': 'sky',
467
        'dying': 'die',
468
        'lying': 'lie',
469
        'tying': 'tie',
470
        # special -LY cases:
471
        'idly': 'idl',
472
        'gently': 'gentl',
473
        'ugly': 'ugli',
474
        'early': 'earli',
475
        'only': 'onli',
476
        'singly': 'singl',
477
    }
478 1
    _exception1set = {
479
        'sky',
480
        'news',
481
        'howe',
482
        'atlas',
483
        'cosmos',
484
        'bias',
485
        'andes',
486
    }
487 1
    _exception2set = {
488
        'inning',
489
        'outing',
490
        'canning',
491
        'herring',
492
        'earring',
493
        'proceed',
494
        'exceed',
495
        'succeed',
496
    }
497
498 1
    def stem(self, word, early_english=False):
Bug: Parameters differ from overridden 'stem' method
best-practice: Too many return statements (7/6)
499
        """Return the Porter2 (Snowball English) stem.
500
501
        :param str word: the word to calculate the stem of
502
        :param bool early_english: set to True in order to remove -eth & -est
503
            (2nd & 3rd person singular verbal agreement suffixes)
504
        :returns: word stem
505
        :rtype: str
506
507
        >>> stmr = Porter2()
508
        >>> stmr.stem('reading')
509
        'read'
510
        >>> stmr.stem('suspension')
511
        'suspens'
512
        >>> stmr.stem('elusiveness')
513
        'elus'
514
515
        >>> stmr.stem('eateth', early_english=True)
516
        'eat'
517
        """
518
        # lowercase, normalize, and compose
519 1
        word = normalize('NFC', text_type(word.lower()))
520
        # replace apostrophe-like characters with U+0027, per
521
        # http://snowball.tartarus.org/texts/apostrophe.html
522 1
        word = word.replace('’', '\'')
523 1
        word = word.replace('’', '\'')
524
525
        # Exceptions 1
526 1
        if word in self._exception1dict:
527 1
            return self._exception1dict[word]
528 1
        elif word in self._exception1set:
529 1
            return word
530
531
        # Return word if stem is shorter than 3
532 1
        if len(word) < 3:
533 1
            return word
534
535
        # Remove initial ', if present.
536 1
        while word and word[0] == '\'':
537 1
            word = word[1:]
538
            # Return word if stem is shorter than 2
539 1
            if len(word) < 2:
540 1
                return word
541
542
        # Re-map consonantal y to Y (Y will be C, y will be V)
543 1
        if word[0] == 'y':
544 1
            word = 'Y' + word[1:]
545 1
        for i in range(1, len(word)):
546 1
            if word[i] == 'y' and word[i - 1] in self._vowels:
547 1
                word = word[:i] + 'Y' + word[i + 1 :]
548
549 1
        r1_start = self._sb_r1(word, self._r1_prefixes)
550 1
        r2_start = self._sb_r2(word, self._r1_prefixes)
551
552
        # Step 0
553 1
        if word[-3:] == '\'s\'':
554 1
            word = word[:-3]
555 1
        elif word[-2:] == '\'s':
556 1
            word = word[:-2]
557 1
        elif word[-1:] == '\'':
558 1
            word = word[:-1]
559
        # Return the word unchanged if it is shorter than 3 characters
560 1
        if len(word) < 3:
561 1
            return word
562
563
        # Step 1a
564 1
        if word[-4:] == 'sses':
565 1
            word = word[:-2]
566 1
        elif word[-3:] in {'ied', 'ies'}:
567 1
            if len(word) > 4:
568 1
                word = word[:-2]
569
            else:
570 1
                word = word[:-1]
571 1
        elif word[-2:] in {'us', 'ss'}:
572 1
            pass
573 1
        elif word[-1] == 's':
574 1
            if self._sb_has_vowel(word[:-2]):
575 1
                word = word[:-1]
576
577
        # Exceptions 2
578 1
        if word in self._exception2set:
579 1
            return word
580
581
        # Step 1b
582 1
        step1b_flag = False
583 1
        if word[-5:] == 'eedly':
584 1
            if len(word[r1_start:]) >= 5:
585 1
                word = word[:-3]
586 1
        elif word[-5:] == 'ingly':
587 1
            if self._sb_has_vowel(word[:-5]):
588 1
                word = word[:-5]
589 1
                step1b_flag = True
590 1
        elif word[-4:] == 'edly':
591 1
            if self._sb_has_vowel(word[:-4]):
592 1
                word = word[:-4]
593 1
                step1b_flag = True
594 1
        elif word[-3:] == 'eed':
595 1
            if len(word[r1_start:]) >= 3:
596 1
                word = word[:-1]
597 1
        elif word[-3:] == 'ing':
598 1
            if self._sb_has_vowel(word[:-3]):
599 1
                word = word[:-3]
600 1
                step1b_flag = True
601 1
        elif word[-2:] == 'ed':
602 1
            if self._sb_has_vowel(word[:-2]):
603 1
                word = word[:-2]
604 1
                step1b_flag = True
605 1
        elif early_english:
606 1
            if word[-3:] == 'est':
607 1
                if self._sb_has_vowel(word[:-3]):
608 1
                    word = word[:-3]
609 1
                    step1b_flag = True
610 1
            elif word[-3:] == 'eth':
611 1
                if self._sb_has_vowel(word[:-3]):
612 1
                    word = word[:-3]
613 1
                    step1b_flag = True
614
615 1
        if step1b_flag:
616 1
            if word[-2:] in {'at', 'bl', 'iz'}:
617 1
                word += 'e'
618 1
            elif word[-2:] in self._doubles:
619 1
                word = word[:-1]
620 1
            elif self._sb_short_word(word, self._r1_prefixes):
621 1
                word += 'e'
622
623
        # Step 1c
624 1
        if (
625
            len(word) > 2
Coding Style: Wrong hanging indentation before block (add 4 spaces).
626
            and word[-1] in {'Y', 'y'}
Coding Style: Wrong hanging indentation before block (add 4 spaces).
627
            and word[-2] not in self._vowels
Coding Style: Wrong hanging indentation before block (add 4 spaces).
628
        ):
629 1
            word = word[:-1] + 'i'
630
631
        # Step 2
632 1
        if word[-2] == 'a':
633 1
            if word[-7:] == 'ational':
634 1
                if len(word[r1_start:]) >= 7:
635 1
                    word = word[:-5] + 'e'
636 1
            elif word[-6:] == 'tional':
637 1
                if len(word[r1_start:]) >= 6:
638 1
                    word = word[:-2]
639 1
        elif word[-2] == 'c':
640 1
            if word[-4:] in {'enci', 'anci'}:
641 1
                if len(word[r1_start:]) >= 4:
642 1
                    word = word[:-1] + 'e'
643 1
        elif word[-2] == 'e':
644 1
            if word[-4:] == 'izer':
645 1
                if len(word[r1_start:]) >= 4:
646 1
                    word = word[:-1]
647 1
        elif word[-2] == 'g':
648 1
            if word[-3:] == 'ogi':
649 1
                if (
650
                    r1_start >= 1
Coding Style: Wrong hanging indentation before block (add 4 spaces).
651
                    and len(word[r1_start:]) >= 3
Coding Style: Wrong hanging indentation before block (add 4 spaces).
652
                    and word[-4] == 'l'
Coding Style: Wrong hanging indentation before block (add 4 spaces).
653
                ):
654 1
                    word = word[:-1]
655 1
        elif word[-2] == 'l':
656 1
            if word[-6:] == 'lessli':
657 1
                if len(word[r1_start:]) >= 6:
658 1
                    word = word[:-2]
659 1
            elif word[-5:] in {'entli', 'fulli', 'ousli'}:
660 1
                if len(word[r1_start:]) >= 5:
661 1
                    word = word[:-2]
662 1
            elif word[-4:] == 'abli':
663 1
                if len(word[r1_start:]) >= 4:
664 1
                    word = word[:-1] + 'e'
665 1
            elif word[-4:] == 'alli':
666 1
                if len(word[r1_start:]) >= 4:
667 1
                    word = word[:-2]
668 1
            elif word[-3:] == 'bli':
669 1
                if len(word[r1_start:]) >= 3:
670 1
                    word = word[:-1] + 'e'
671 1
            elif word[-2:] == 'li':
672 1
                if (
673
                    r1_start >= 1
Coding Style: Wrong hanging indentation before block (add 4 spaces).
674
                    and len(word[r1_start:]) >= 2
Coding Style: Wrong hanging indentation before block (add 4 spaces).
675
                    and word[-3] in self._li
Coding Style: Wrong hanging indentation before block (add 4 spaces).
676
                ):
677 1
                    word = word[:-2]
678 1
        elif word[-2] == 'o':
679 1
            if word[-7:] == 'ization':
680 1
                if len(word[r1_start:]) >= 7:
681 1
                    word = word[:-5] + 'e'
682 1
            elif word[-5:] == 'ation':
683 1
                if len(word[r1_start:]) >= 5:
684 1
                    word = word[:-3] + 'e'
685 1
            elif word[-4:] == 'ator':
686 1
                if len(word[r1_start:]) >= 4:
687 1
                    word = word[:-2] + 'e'
688 1
        elif word[-2] == 's':
689 1
            if word[-7:] in {'fulness', 'ousness', 'iveness'}:
690 1
                if len(word[r1_start:]) >= 7:
691 1
                    word = word[:-4]
692 1
            elif word[-5:] == 'alism':
693 1
                if len(word[r1_start:]) >= 5:
694 1
                    word = word[:-3]
695 1
        elif word[-2] == 't':
696 1
            if word[-6:] == 'biliti':
697 1
                if len(word[r1_start:]) >= 6:
698 1
                    word = word[:-5] + 'le'
699 1
            elif word[-5:] == 'aliti':
700 1
                if len(word[r1_start:]) >= 5:
701 1
                    word = word[:-3]
702 1
            elif word[-5:] == 'iviti':
703 1
                if len(word[r1_start:]) >= 5:
704 1
                    word = word[:-3] + 'e'
705
706
        # Step 3
707 1
        if word[-7:] == 'ational':
708 1
            if len(word[r1_start:]) >= 7:
709 1
                word = word[:-5] + 'e'
710 1
        elif word[-6:] == 'tional':
711 1
            if len(word[r1_start:]) >= 6:
712 1
                word = word[:-2]
713 1
        elif word[-5:] in {'alize', 'icate', 'iciti'}:
714 1
            if len(word[r1_start:]) >= 5:
715 1
                word = word[:-3]
716 1
        elif word[-5:] == 'ative':
717 1
            if len(word[r2_start:]) >= 5:
718 1
                word = word[:-5]
719 1
        elif word[-4:] == 'ical':
720 1
            if len(word[r1_start:]) >= 4:
721 1
                word = word[:-2]
722 1
        elif word[-4:] == 'ness':
723 1
            if len(word[r1_start:]) >= 4:
724 1
                word = word[:-4]
725 1
        elif word[-3:] == 'ful':
726 1
            if len(word[r1_start:]) >= 3:
727 1
                word = word[:-3]
728
729
        # Step 4
730 1
        for suffix in (
731
            'ement',
Coding Style: Wrong hanging indentation before block (add 4 spaces).
732
            'ance',
0 ignored issues
show
Coding Style introduced by
Wrong hanging indentation before block (add 4 spaces).
Loading history...
733
            'ence',
0 ignored issues
show
Coding Style introduced by
Wrong hanging indentation before block (add 4 spaces).
Loading history...
734
            'able',
0 ignored issues
show
Coding Style introduced by
Wrong hanging indentation before block (add 4 spaces).
Loading history...
735
            'ible',
0 ignored issues
show
Coding Style introduced by
Wrong hanging indentation before block (add 4 spaces).
Loading history...
736
            'ment',
0 ignored issues
show
Coding Style introduced by
Wrong hanging indentation before block (add 4 spaces).
Loading history...
737
            'ant',
0 ignored issues
show
Coding Style introduced by
Wrong hanging indentation before block (add 4 spaces).
Loading history...
738
            'ent',
0 ignored issues
show
Coding Style introduced by
Wrong hanging indentation before block (add 4 spaces).
Loading history...
739
            'ism',
0 ignored issues
show
Coding Style introduced by
Wrong hanging indentation before block (add 4 spaces).
Loading history...
740
            'ate',
0 ignored issues
show
Coding Style introduced by
Wrong hanging indentation before block (add 4 spaces).
Loading history...
741
            'iti',
0 ignored issues
show
Coding Style introduced by
Wrong hanging indentation before block (add 4 spaces).
Loading history...
742
            'ous',
0 ignored issues
show
Coding Style introduced by
Wrong hanging indentation before block (add 4 spaces).
Loading history...
743
            'ive',
0 ignored issues
show
Coding Style introduced by
Wrong hanging indentation before block (add 4 spaces).
Loading history...
744
            'ize',
0 ignored issues
show
Coding Style introduced by
Wrong hanging indentation before block (add 4 spaces).
Loading history...
745
            'al',
0 ignored issues
show
Coding Style introduced by
Wrong hanging indentation before block (add 4 spaces).
Loading history...
746
            'er',
0 ignored issues
show
Coding Style introduced by
Wrong hanging indentation before block (add 4 spaces).
Loading history...
747
            'ic',
0 ignored issues
show
Coding Style introduced by
Wrong hanging indentation before block (add 4 spaces).
Loading history...
748
        ):
749 1
            if word[-len(suffix) :] == suffix:
750 1
                if len(word[r2_start:]) >= len(suffix):
751 1
                    word = word[: -len(suffix)]
752 1
                break
753
        else:
754 1
            if word[-3:] == 'ion':
755 1
                if (
756
                    len(word[r2_start:]) >= 3
0 ignored issues
show
Coding Style introduced by
Wrong hanging indentation before block (add 4 spaces).
Loading history...
757
                    and len(word) >= 4
0 ignored issues
show
Coding Style introduced by
Wrong hanging indentation before block (add 4 spaces).
Loading history...
758
                    and word[-4] in tuple('st')
0 ignored issues
show
Coding Style introduced by
Wrong hanging indentation before block (add 4 spaces).
Loading history...
759
                ):
760 1
                    word = word[:-3]
761
762
        # Step 5
        if word[-1] == 'e':
            if len(word[r2_start:]) >= 1 or (
                len(word[r1_start:]) >= 1
                and not self._sb_ends_in_short_syllable(word[:-1])
            ):
                word = word[:-1]
        elif word[-1] == 'l':
            if len(word[r2_start:]) >= 1 and word[-2] == 'l':
                word = word[:-1]

        # Change 'Y' back to 'y' if it survived stemming
        for i in range(0, len(word)):
            if word[i] == 'Y':
                word = word[:i] + 'y' + word[i + 1 :]

        return word
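Every suffix rule in the method above is gated on `r1_start` or `r2_start`, whose computation lies outside this excerpt. As a minimal sketch, assuming the standard Snowball definition (R1 is the region after the first non-vowel that follows a vowel; R2 is the same rule applied inside R1), the offsets could be derived as shown below. The names `region_start`, `in_region`, and `VOWELS` are hypothetical illustrations, not the library's `_sb_r1`/`_sb_r2`, and the English stemmer's special-case prefixes are ignored.

```python
VOWELS = set('aeiouy')  # assumed vowel set, for illustration only


def region_start(word, vowels=VOWELS, begin=0):
    """Return the index just past the first vowel + non-vowel pair.

    The search only considers letters at or after ``begin``.
    """
    for i in range(begin + 1, len(word)):
        if word[i] not in vowels and word[i - 1] in vowels:
            return i + 1
    # No such pair: the region is empty and starts at the end of the word
    return len(word)


def in_region(word, start, suffix):
    """Mirror the listing's ``len(word[start:]) >= len(suffix)`` test."""
    return word.endswith(suffix) and len(word[start:]) >= len(suffix)


r1 = region_start('beautiful')
r2 = region_start('beautiful', begin=r1)
print('beautiful'[r1:], 'beautiful'[r2:])  # prints "iful ul"
```

With these offsets, `in_region('beautiful', r1, 'ful')` is true while `in_region('beautiful', r2, 'ful')` is false, which is why a rule gated on R2 would leave the word alone.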


def porter2(word, early_english=False):
    """Return the Porter2 (Snowball English) stem.

    This is a wrapper for :py:meth:`Porter2.stem`.

    :param str word: the word to calculate the stem of
    :param bool early_english: set to True in order to remove -est & -eth
        (2nd & 3rd person singular verbal agreement suffixes)
    :returns: word stem
    :rtype: str

    >>> porter2('reading')
    'read'
    >>> porter2('suspension')
    'suspens'
    >>> porter2('elusiveness')
    'elus'

    >>> porter2('eateth', early_english=True)
    'eat'
    """
    return Porter2().stem(word, early_english)


# Review note (unused code): the variable __class__ seems to be unused.
class SnowballGerman(Snowball):
    """Snowball German stemmer.

    The Snowball German stemmer is defined at:
    http://snowball.tartarus.org/algorithms/german/stemmer.html
    """

    _vowels = {'a', 'e', 'i', 'o', 'u', 'y', 'ä', 'ö', 'ü'}
    _s_endings = {'b', 'd', 'f', 'g', 'h', 'k', 'l', 'm', 'n', 'r', 't'}
    _st_endings = {'b', 'd', 'f', 'g', 'h', 'k', 'l', 'm', 'n', 't'}

    # Review note (bug): parameters differ from overridden 'stem' method.
    def stem(self, word, alternate_vowels=False):
        """Return Snowball German stem.

        :param str word: the word to calculate the stem of
        :param bool alternate_vowels: composes ae as ä, oe as ö, and ue as ü
            before running the algorithm
        :returns: word stem
        :rtype: str

        >>> stmr = SnowballGerman()
        >>> stmr.stem('lesen')
        'les'
        >>> stmr.stem('graues')
        'grau'
        >>> stmr.stem('buchstabieren')
        'buchstabi'
        """
        # lowercase, normalize, and compose
        word = normalize('NFC', word.lower())
        word = word.replace('ß', 'ss')

        if len(word) > 2:
            for i in range(2, len(word)):
                if word[i] in self._vowels and word[i - 2] in self._vowels:
                    if word[i - 1] == 'u':
                        word = word[: i - 1] + 'U' + word[i:]
                    elif word[i - 1] == 'y':
                        word = word[: i - 1] + 'Y' + word[i:]

        if alternate_vowels:
            word = word.replace('ae', 'ä')
            word = word.replace('oe', 'ö')
            word = word.replace('que', 'Q')
            word = word.replace('ue', 'ü')
            word = word.replace('Q', 'que')

        r1_start = max(3, self._sb_r1(word))
        r2_start = self._sb_r2(word)

        # Step 1
        niss_flag = False
        if word[-3:] == 'ern':
            if len(word[r1_start:]) >= 3:
                word = word[:-3]
        elif word[-2:] == 'em':
            if len(word[r1_start:]) >= 2:
                word = word[:-2]
        elif word[-2:] == 'er':
            if len(word[r1_start:]) >= 2:
                word = word[:-2]
        elif word[-2:] == 'en':
            if len(word[r1_start:]) >= 2:
                word = word[:-2]
                niss_flag = True
        elif word[-2:] == 'es':
            if len(word[r1_start:]) >= 2:
                word = word[:-2]
                niss_flag = True
        elif word[-1:] == 'e':
            if len(word[r1_start:]) >= 1:
                word = word[:-1]
                niss_flag = True
        elif word[-1:] == 's':
            if (
                len(word[r1_start:]) >= 1
                and len(word) >= 2
                and word[-2] in self._s_endings
            ):
                word = word[:-1]

        if niss_flag and word[-4:] == 'niss':
            word = word[:-1]

        # Step 2
        if word[-3:] == 'est':
            if len(word[r1_start:]) >= 3:
                word = word[:-3]
        elif word[-2:] == 'en':
            if len(word[r1_start:]) >= 2:
                word = word[:-2]
        elif word[-2:] == 'er':
            if len(word[r1_start:]) >= 2:
                word = word[:-2]
        elif word[-2:] == 'st':
            if (
                len(word[r1_start:]) >= 2
                and len(word) >= 6
                and word[-3] in self._st_endings
            ):
                word = word[:-2]

        # Step 3
        if word[-4:] == 'isch':
            if len(word[r2_start:]) >= 4 and word[-5] != 'e':
                word = word[:-4]
        elif word[-4:] in {'lich', 'heit'}:
            if len(word[r2_start:]) >= 4:
                word = word[:-4]
                if word[-2:] in {'er', 'en'} and len(word[r1_start:]) >= 2:
                    word = word[:-2]
        elif word[-4:] == 'keit':
            if len(word[r2_start:]) >= 4:
                word = word[:-4]
                if word[-4:] == 'lich' and len(word[r2_start:]) >= 4:
                    word = word[:-4]
                elif word[-2:] == 'ig' and len(word[r2_start:]) >= 2:
                    word = word[:-2]
        elif word[-3:] in {'end', 'ung'}:
            if len(word[r2_start:]) >= 3:
                word = word[:-3]
                if (
                    word[-2:] == 'ig'
                    and len(word[r2_start:]) >= 2
                    and word[-3] != 'e'
                ):
                    word = word[:-2]
        elif word[-2:] in {'ig', 'ik'}:
            if len(word[r2_start:]) >= 2 and word[-3] != 'e':
                word = word[:-2]

        # Change 'Y' and 'U' back to lowercase if they survived stemming
        for i in range(0, len(word)):
            if word[i] == 'Y':
                word = word[:i] + 'y' + word[i + 1 :]
            elif word[i] == 'U':
                word = word[:i] + 'u' + word[i + 1 :]

        # Remove umlauts
        # Review note (comprehensibility): the variable _ does not seem to be
        # defined.
        _umlauts = dict(zip((ord(_) for _ in 'äöü'), 'aou'))
        word = word.translate(_umlauts)

        return word
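The two clean-up passes at the end of the method above (restoring the protected 'U'/'Y' markers, then stripping umlauts) are both character-for-character substitutions. A minimal sketch of doing both with one translation table follows; it assumes Python 3, where `str.maketrans` is available, and the names `CLEANUP` and `cleanup` are hypothetical rather than part of the reviewed code.

```python
# One table for both passes: marker letters back to lowercase,
# umlauted vowels down to their base vowels.
CLEANUP = str.maketrans('UYäöü', 'uyaou')


def cleanup(word):
    """Restore marker letters and strip umlauts in a single pass."""
    return word.translate(CLEANUP)


print(cleanup('baUer'))  # prints "bauer"
print(cleanup('müd'))    # prints "mud"
```

Building the table once at module level also avoids rebuilding the mapping on every call, which the listing's per-call `dict(zip(...))` does.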


def sb_german(word, alternate_vowels=False):
    """Return Snowball German stem.

    This is a wrapper for :py:meth:`SnowballGerman.stem`.

    :param str word: the word to calculate the stem of
    :param bool alternate_vowels: composes ae as ä, oe as ö, and ue as ü before
        running the algorithm
    :returns: word stem
    :rtype: str

    >>> sb_german('lesen')
    'les'
    >>> sb_german('graues')
    'grau'
    >>> sb_german('buchstabieren')
    'buchstabi'
    """
    return SnowballGerman().stem(word, alternate_vowels)
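The `alternate_vowels` option rewrites the digraphs ae, oe, and ue as umlauts, but must not touch the "ue" inside "que". The stemmer handles this by parking "que" behind a placeholder first. The sketch below isolates that step; `compose_umlauts` is a hypothetical helper name, and it assumes the input has already been lowercased so that the placeholder 'Q' cannot collide with real text.

```python
def compose_umlauts(word):
    """Rewrite ae/oe/ue digraphs as umlauts, leaving 'que' untouched."""
    word = word.replace('ae', 'ä')
    word = word.replace('oe', 'ö')
    # Hide 'que' behind a placeholder that cannot occur in lowercase input
    word = word.replace('que', 'Q')
    word = word.replace('ue', 'ü')
    # Put the protected sequence back
    return word.replace('Q', 'que')


print(compose_umlauts('mueller'))  # prints "müller"
print(compose_umlauts('quelle'))   # prints "quelle"
```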


# Review note (unused code): the variable __class__ seems to be unused.
class SnowballDutch(Snowball):
    """Snowball Dutch stemmer.

    The Snowball Dutch stemmer is defined at:
    http://snowball.tartarus.org/algorithms/dutch/stemmer.html
    """

    _vowels = {'a', 'e', 'i', 'o', 'u', 'y', 'è'}
    _not_s_endings = {'a', 'e', 'i', 'j', 'o', 'u', 'y', 'è'}
    # Review note (comprehensibility): the variable _ does not seem to be
    # defined.
    _accented = dict(zip((ord(_) for _ in 'äëïöüáéíóú'), 'aeiouaeiou'))

    # Review note (coding style): this method does not access any attributes
    # of the class, so it could be written as a function or static method.
    def _undouble(self, word):
        """Undouble endings -kk, -dd, and -tt."""
        if (
            len(word) > 1
            and word[-1] == word[-2]
            and word[-1] in {'d', 'k', 't'}
        ):
            return word[:-1]
        return word

    # Review note (bug): parameters differ from overridden 'stem' method.
    def stem(self, word):
        """Return Snowball Dutch stem.

        :param str word: the word to calculate the stem of
        :returns: word stem
        :rtype: str

        >>> stmr = SnowballDutch()
        >>> stmr.stem('lezen')
        'lez'
        >>> stmr.stem('opschorting')
        'opschort'
        >>> stmr.stem('ongrijpbaarheid')
        'ongrijp'
        """
        # lowercase, normalize (compose), and filter umlauts & acutes out
        word = normalize('NFC', text_type(word.lower()))
        word = word.translate(self._accented)

        # Review note (unused code): consider using enumerate instead of
        # iterating with range and len.
        for i in range(len(word)):
            if i == 0 and word[0] == 'y':
                word = 'Y' + word[1:]
            elif word[i] == 'y' and word[i - 1] in self._vowels:
                word = word[:i] + 'Y' + word[i + 1 :]
            elif (
                word[i] == 'i'
                and i > 0  # keep word[i - 1] from wrapping to the last letter
                and word[i - 1] in self._vowels
                and i + 1 < len(word)
                and word[i + 1] in self._vowels
            ):
                word = word[:i] + 'I' + word[i + 1 :]

        r1_start = max(3, self._sb_r1(word))
        r2_start = self._sb_r2(word)

        # Step 1
        if word[-5:] == 'heden':
            if len(word[r1_start:]) >= 5:
                word = word[:-3] + 'id'
        elif word[-3:] == 'ene':
            if len(word[r1_start:]) >= 3 and (
                word[-4] not in self._vowels and word[-6:-3] != 'gem'
            ):
                word = self._undouble(word[:-3])
        elif word[-2:] == 'en':
            if len(word[r1_start:]) >= 2 and (
                word[-3] not in self._vowels and word[-5:-2] != 'gem'
            ):
                word = self._undouble(word[:-2])
        elif word[-2:] == 'se':
            if (
                len(word[r1_start:]) >= 2
                and word[-3] not in self._not_s_endings
            ):
                word = word[:-2]
        elif word[-1:] == 's':
            if (
                len(word[r1_start:]) >= 1
                and word[-2] not in self._not_s_endings
            ):
                word = word[:-1]

        # Step 2
        e_removed = False
        if word[-1:] == 'e':
            if len(word[r1_start:]) >= 1 and word[-2] not in self._vowels:
                word = self._undouble(word[:-1])
                e_removed = True

        # Step 3a
        if word[-4:] == 'heid':
            if len(word[r2_start:]) >= 4 and word[-5] != 'c':
                word = word[:-4]
                if word[-2:] == 'en':
                    if len(word[r1_start:]) >= 2 and (
                        word[-3] not in self._vowels and word[-5:-2] != 'gem'
                    ):
                        word = self._undouble(word[:-2])

        # Step 3b
        if word[-4:] == 'lijk':
            if len(word[r2_start:]) >= 4:
                word = word[:-4]
                # Repeat step 2
                if word[-1:] == 'e':
                    if (
                        len(word[r1_start:]) >= 1
                        and word[-2] not in self._vowels
                    ):
                        word = self._undouble(word[:-1])
        elif word[-4:] == 'baar':
            if len(word[r2_start:]) >= 4:
                word = word[:-4]
        elif word[-3:] in ('end', 'ing'):
            if len(word[r2_start:]) >= 3:
                word = word[:-3]
                if (
                    word[-2:] == 'ig'
                    and len(word[r2_start:]) >= 2
                    and word[-3] != 'e'
                ):
                    word = word[:-2]
                else:
                    word = self._undouble(word)
        elif word[-3:] == 'bar':
            if len(word[r2_start:]) >= 3 and e_removed:
                word = word[:-3]
        elif word[-2:] == 'ig':
            if len(word[r2_start:]) >= 2 and word[-3] != 'e':
                word = word[:-2]

        # Step 4
        # Review note (best practice): too many boolean expressions in if
        # statement (6/5).
        if (
            len(word) >= 4
            and word[-3] == word[-2]
            and word[-2] in {'a', 'e', 'o', 'u'}
            and word[-4] not in self._vowels
            and word[-1] not in self._vowels
            and word[-1] != 'I'
        ):
            word = word[:-2] + word[-1]

        # Change 'Y' and 'I' back to lowercase if they survived stemming
        for i in range(0, len(word)):
            if word[i] == 'Y':
                word = word[:i] + 'y' + word[i + 1 :]
            elif word[i] == 'I':
                word = word[:i] + 'i' + word[i + 1 :]

        return word
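Step 4 of the method above collapses a doubled vowel (aa, ee, oo, uu) when it sits between two non-vowels at the end of the word, so that for example "brood" becomes "brod". The standalone sketch below reproduces that one condition so it can be tried in isolation; `undouble_vowel` and `DUTCH_VOWELS` are hypothetical names, and the vowel set is copied from the class attribute shown in the listing.

```python
DUTCH_VOWELS = set('aeiouyè')  # same letters as the listing's _vowels


def undouble_vowel(word, vowels=DUTCH_VOWELS):
    """Collapse a final consonant + doubled vowel + consonant pattern."""
    if (
        len(word) >= 4
        and word[-3] == word[-2]
        and word[-2] in {'a', 'e', 'o', 'u'}
        and word[-4] not in vowels
        and word[-1] not in vowels
        and word[-1] != 'I'  # 'I' is a protected vowel marker
    ):
        return word[:-2] + word[-1]
    return word


print(undouble_vowel('brood'))  # prints "brod"
print(undouble_vowel('zee'))    # prints "zee" (too short to qualify)
```

Pulling the six-part condition into a named helper like this is one way to address the "too many boolean expressions" note without changing behaviour.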


def sb_dutch(word):
    """Return Snowball Dutch stem.

    This is a wrapper for :py:meth:`SnowballDutch.stem`.

    :param str word: the word to calculate the stem of
    :returns: word stem
    :rtype: str

    >>> sb_dutch('lezen')
    'lez'
    >>> sb_dutch('opschorting')
    'opschort'
    >>> sb_dutch('ongrijpbaarheid')
    'ongrijp'
    """
    return SnowballDutch().stem(word)
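The review flags `SnowballDutch._undouble` as a method that never touches instance state. One possible response, sketched here under the assumption that no subclass needs to override it, is a plain module-level function; the name `undouble` is hypothetical and the logic is copied from the listing.

```python
def undouble(word):
    """Undouble the endings -kk, -dd, and -tt."""
    if (
        len(word) > 1
        and word[-1] == word[-2]
        and word[-1] in {'d', 'k', 't'}
    ):
        return word[:-1]
    return word


print(undouble('bedd'))  # prints "bed"
print(undouble('ball'))  # prints "ball" (only d, k, t are undoubled)
```

Decorating the existing method with `@staticmethod` would be the smaller change if call sites of the form `self._undouble(...)` should stay as they are.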


# Review note (unused code): the variable __class__ seems to be unused.
class SnowballNorwegian(Snowball):
    """Snowball Norwegian stemmer.

    The Snowball Norwegian stemmer is defined at:
    http://snowball.tartarus.org/algorithms/norwegian/stemmer.html
    """

    _vowels = {'a', 'e', 'i', 'o', 'u', 'y', 'å', 'æ', 'ø'}
    _s_endings = {
        'b',
        'c',
        'd',
        'f',
        'g',
        'h',
        'j',
        'l',
        'm',
        'n',
        'o',
        'p',
        'r',
        't',
        'v',
        'y',
        'z',
    }

    # Review note (bug): parameters differ from overridden 'stem' method.
    def stem(self, word):
        """Return Snowball Norwegian stem.

        :param str word: the word to calculate the stem of
        :returns: word stem
        :rtype: str

        >>> stmr = SnowballNorwegian()
        >>> stmr.stem('lese')
        'les'
        >>> stmr.stem('suspensjon')
        'suspensjon'
        >>> stmr.stem('sikkerhet')
        'sikker'
        """
        # lowercase, normalize, and compose
        word = normalize('NFC', text_type(word.lower()))

        r1_start = min(max(3, self._sb_r1(word)), len(word))

        # Step 1
        _r1 = word[r1_start:]
        if _r1[-7:] == 'hetenes':
            word = word[:-7]
        elif _r1[-6:] in {'hetene', 'hetens'}:
            word = word[:-6]
        elif _r1[-5:] in {'heten', 'heter', 'endes'}:
            word = word[:-5]
        elif _r1[-4:] in {'ande', 'ende', 'edes', 'enes', 'erte'}:
            if word[-4:] == 'erte':
                word = word[:-2]
            else:
                word = word[:-4]
        elif _r1[-3:] in {
            'ede',
            'ane',
            'ene',
            'ens',
            'ers',
            'ets',
            'het',
            'ast',
            'ert',
        }:
            if word[-3:] == 'ert':
                word = word[:-1]
            else:
                word = word[:-3]
        elif _r1[-2:] in {'en', 'ar', 'er', 'as', 'es', 'et'}:
            word = word[:-2]
        elif _r1[-1:] in {'a', 'e'}:
            word = word[:-1]
        elif _r1[-1:] == 's':
            if (len(word) > 1 and word[-2] in self._s_endings) or (
                len(word) > 2
                and word[-2] == 'k'
                and word[-3] not in self._vowels
            ):
                word = word[:-1]

        # Step 2
        if word[r1_start:][-2:] in {'dt', 'vt'}:
            word = word[:-1]

        # Step 3
        _r1 = word[r1_start:]
        if _r1[-7:] == 'hetslov':
            word = word[:-7]
        elif _r1[-4:] in {'eleg', 'elig', 'elov', 'slov'}:
            word = word[:-4]
        elif _r1[-3:] in {'leg', 'eig', 'lig', 'els', 'lov'}:
            word = word[:-3]
        elif _r1[-2:] == 'ig':
            word = word[:-2]

        return word
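Steps 1 and 3 of the method above share one idea: test the R1 region against suffix groups from longest to shortest and delete the first match from the whole word. The sketch below captures that pattern for suffixes that are deleted unconditionally; it does not model the special cases ('erte', 'ert', and the conditional 's'). The function name `strip_longest_suffix` is a hypothetical illustration, not part of the reviewed module.

```python
def strip_longest_suffix(word, r1_start, suffixes):
    """Remove the longest suffix that lies wholly inside R1, if any."""
    r1 = word[r1_start:]
    # Longest candidates first, mirroring the order of the elif chain
    for suffix in sorted(suffixes, key=len, reverse=True):
        if r1.endswith(suffix):
            return word[: len(word) - len(suffix)]
    return word


print(strip_longest_suffix('sikkerhet', 3, {'het', 'et', 'e'}))
# prints "sikker"
```

Because the match is made against `word[r1_start:]` rather than the whole word, a suffix that starts before R1 is never removed: `strip_longest_suffix('hetene', 3, {'hetene', 'ene'})` yields `'het'`, not the empty string.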


def sb_norwegian(word):
    """Return Snowball Norwegian stem.

    This is a wrapper for :py:meth:`SnowballNorwegian.stem`.

    :param str word: the word to calculate the stem of
    :returns: word stem
    :rtype: str

    >>> sb_norwegian('lese')
    'les'
    >>> sb_norwegian('suspensjon')
    'suspensjon'
    >>> sb_norwegian('sikkerhet')
    'sikker'
    """
    return SnowballNorwegian().stem(word)
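The Norwegian stemmer above computes `r1_start` as `min(max(3, ...), len(word))`. The `max` reflects the Snowball rule that the region before R1 must hold at least three letters, and the `min` keeps the offset inside words shorter than that. A tiny sketch of the clamp on its own, with the hypothetical name `clamp_r1`:

```python
def clamp_r1(r1_start, word_length):
    """Keep at least 3 letters before R1 without running past the word."""
    return min(max(3, r1_start), word_length)


print(clamp_r1(2, 8))  # prints 3: R1 is pushed right to leave 3 letters
print(clamp_r1(5, 8))  # prints 5: already far enough right
print(clamp_r1(2, 2))  # prints 2: a two-letter word has an empty R1
```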


# Review note (unused code): the variable __class__ seems to be unused.
class SnowballSwedish(Snowball):
    """Snowball Swedish stemmer.

    The Snowball Swedish stemmer is defined at:
    http://snowball.tartarus.org/algorithms/swedish/stemmer.html
    """

    _vowels = {'a', 'e', 'i', 'o', 'u', 'y', 'ä', 'å', 'ö'}
    _s_endings = {
        'b',
        'c',
        'd',
        'f',
        'g',
        'h',
        'j',
        'k',
        'l',
        'm',
        'n',
        'o',
        'p',
        'r',
        't',
        'v',
        'y',
    }

    # Review note (bug): parameters differ from overridden 'stem' method.
    def stem(self, word):
        """Return Snowball Swedish stem.

        :param str word: the word to calculate the stem of
        :returns: word stem
        :rtype: str

        >>> stmr = SnowballSwedish()
        >>> stmr.stem('undervisa')
        'undervis'
        >>> stmr.stem('suspension')
        'suspension'
        >>> stmr.stem('visshet')
        'viss'
        """
1312
        # lowercase, normalize, and compose
1313 1
        word = normalize('NFC', text_type(word.lower()))
1314
1315 1
        r1_start = min(max(3, self._sb_r1(word)), len(word))
1316
1317
        # Step 1
1318 1
        _r1 = word[r1_start:]
        # Review note (duplication): this code seems to be duplicated in
        # your project.
        if _r1[-7:] == 'heterna':
            word = word[:-7]
        elif _r1[-6:] == 'hetens':
            word = word[:-6]
        elif _r1[-5:] in {
            # Review note (coding style): wrong hanging indentation before
            # block on each member of this set (add 4 spaces).
            'anden',
            'heten',
            'heter',
            'arnas',
            'ernas',
            'ornas',
            'andes',
            'arens',
            'andet',
        }:
            word = word[:-5]
        elif _r1[-4:] in {
            # Review note (coding style): wrong hanging indentation before
            # block on each member of this set (add 4 spaces).
            'arna',
            'erna',
            'orna',
            'ande',
            'arne',
            'aste',
            'aren',
            'ades',
            'erns',
        }:
            word = word[:-4]
        elif _r1[-3:] in {'ade', 'are', 'ern', 'ens', 'het', 'ast'}:
            word = word[:-3]
        elif _r1[-2:] in {'ad', 'en', 'ar', 'er', 'or', 'as', 'es', 'at'}:
            word = word[:-2]
        elif _r1[-1:] in {'a', 'e'}:
            word = word[:-1]
        elif _r1[-1:] == 's':
            if len(word) > 1 and word[-2] in self._s_endings:
                word = word[:-1]

        # Step 2
        if word[r1_start:][-2:] in {'dd', 'gd', 'nn', 'dt', 'gt', 'kt', 'tt'}:
            word = word[:-1]

        # Step 3
        _r1 = word[r1_start:]
        if _r1[-5:] == 'fullt':
            word = word[:-1]
        elif _r1[-4:] == 'löst':
            word = word[:-1]
        elif _r1[-3:] in {'lig', 'els'}:
            word = word[:-3]
        elif _r1[-2:] == 'ig':
            word = word[:-2]

        return word


def sb_swedish(word):
    """Return Snowball Swedish stem.

    This is a wrapper for :py:meth:`SnowballSwedish.stem`.

    :param str word: the word to calculate the stem of
    :returns: word stem
    :rtype: str

    >>> sb_swedish('undervisa')
    'undervis'
    >>> sb_swedish('suspension')
    'suspension'
    >>> sb_swedish('visshet')
    'viss'
    """
    return SnowballSwedish().stem(word)
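
Both stemmers in this listing start from `r1_start = min(max(3, self._sb_r1(word)), len(word))`, but `_sb_r1` itself is defined elsewhere in the module and is not shown here. The following is only a rough sketch of what that expression appears to compute, assuming the standard Snowball definition of R1 (the region after the first non-vowel that follows a vowel) and the Swedish vowel set from the Snowball specification. The names `r1_start` (as a function) and `SWEDISH_VOWELS` are hypothetical and are not part of Abydos.

```python
# Hypothetical stand-alone sketch; not the Abydos implementation.
# Assumption: vowel set taken from the Snowball Swedish specification.
SWEDISH_VOWELS = frozenset('aeiouyäåö')


def r1_start(word, vowels=SWEDISH_VOWELS, min_prefix=3):
    """Return the index at which the R1 region of `word` begins.

    R1 starts after the first non-vowel that follows a vowel. The
    Scandinavian stemmers then push the boundary right so that at least
    `min_prefix` letters precede it, clamped to the word length.
    """
    seen_vowel = False
    start = len(word)  # null region at the end if no boundary is found
    for i, ch in enumerate(word):
        if ch in vowels:
            seen_vowel = True
        elif seen_vowel:
            start = i + 1
            break
    return min(max(min_prefix, start), len(word))
```

For `'undervisa'` the raw boundary falls at index 2 (after `un`), so the three-letter minimum raises it to 3, while words shorter than three letters get an empty R1.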


# Review note (unused code): the variable __class__ seems to be unused.
class SnowballDanish(Snowball):
    """Snowball Danish stemmer.

    The Snowball Danish stemmer is defined at:
    http://snowball.tartarus.org/algorithms/danish/stemmer.html
    """

    _vowels = {'a', 'e', 'i', 'o', 'u', 'y', 'å', 'æ', 'ø'}
    _s_endings = {
        'a',
        'b',
        'c',
        'd',
        'f',
        'g',
        'h',
        'j',
        'k',
        'l',
        'm',
        'n',
        'o',
        'p',
        'r',
        't',
        'v',
        'y',
        'z',
        'å',
    }

    # Review note (bug): parameters differ from overridden 'stem' method.
    def stem(self, word):
        """Return Snowball Danish stem.

        :param str word: the word to calculate the stem of
        :returns: word stem
        :rtype: str

        >>> stmr = SnowballDanish()
        >>> stmr.stem('underviser')
        'undervis'
        >>> stmr.stem('suspension')
        'suspension'
        >>> stmr.stem('sikkerhed')
        'sikker'
        """
        # lowercase, normalize, and compose
        word = normalize('NFC', text_type(word.lower()))

        r1_start = min(max(3, self._sb_r1(word)), len(word))

        # Step 1
        _r1 = word[r1_start:]
        # Review note (duplication): this code seems to be duplicated in
        # your project.
        if _r1[-7:] == 'erendes':
            word = word[:-7]
        elif _r1[-6:] in {'erende', 'hedens'}:
            word = word[:-6]
        elif _r1[-5:] in {
            # Review note (coding style): wrong hanging indentation before
            # block on each member of this set (add 4 spaces).
            'ethed',
            'erede',
            'heden',
            'heder',
            'endes',
            'ernes',
            'erens',
            'erets',
        }:
            word = word[:-5]
        elif _r1[-4:] in {
            # Review note (coding style): wrong hanging indentation before
            # block on each member of this set (add 4 spaces).
            'ered',
            'ende',
            'erne',
            'eren',
            'erer',
            'heds',
            'enes',
            'eres',
            'eret',
        }:
            word = word[:-4]
        elif _r1[-3:] in {'hed', 'ene', 'ere', 'ens', 'ers', 'ets'}:
            word = word[:-3]
        elif _r1[-2:] in {'en', 'er', 'es', 'et'}:
            word = word[:-2]
        elif _r1[-1:] == 'e':
            word = word[:-1]
        elif _r1[-1:] == 's':
            if len(word) > 1 and word[-2] in self._s_endings:
                word = word[:-1]

        # Step 2
        if word[r1_start:][-2:] in {'gd', 'dt', 'gt', 'kt'}:
            word = word[:-1]

        # Step 3
        if word[-4:] == 'igst':
            word = word[:-2]

        _r1 = word[r1_start:]
        repeat_step2 = False
        if _r1[-4:] == 'elig':
            word = word[:-4]
            repeat_step2 = True
        elif _r1[-4:] == 'løst':
            word = word[:-1]
        elif _r1[-3:] in {'lig', 'els'}:
            word = word[:-3]
            repeat_step2 = True
        elif _r1[-2:] == 'ig':
            word = word[:-2]
            repeat_step2 = True

        if repeat_step2:
            if word[r1_start:][-2:] in {'gd', 'dt', 'gt', 'kt'}:
                word = word[:-1]

        # Step 4
        # Review note (coding style): wrong hanging indentation before block
        # on each line of the condition below (add 4 spaces).
        if (
            len(word[r1_start:]) >= 1
            and len(word) >= 2
            and word[-1] == word[-2]
            and word[-1] not in self._vowels
        ):
            word = word[:-1]

        return word


def sb_danish(word):
    """Return Snowball Danish stem.

    This is a wrapper for :py:meth:`SnowballDanish.stem`.

    :param str word: the word to calculate the stem of
    :returns: word stem
    :rtype: str

    >>> sb_danish('underviser')
    'undervis'
    >>> sb_danish('suspension')
    'suspension'
    >>> sb_danish('sikkerhed')
    'sikker'
    """
    return SnowballDanish().stem(word)


if __name__ == '__main__':
    import doctest

    doctest.testmod()
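
The Step 1 suffix ladders in `SnowballSwedish.stem` and `SnowballDanish.stem` are the spans this review flags as duplicated, and they account for much of the branch count. One possible way to factor them, sketched here under assumptions rather than proposed as the project's design, is a helper that receives the suffix set as data. The names `strip_longest_suffix` and `DANISH_STEP1_SUFFIXES` are hypothetical and not part of Abydos. The suffixes are copied from the Danish Step 1 code in this listing, and the trailing `s` rule is left out because it carries an extra condition on the preceding letter.

```python
# Hypothetical refactoring sketch; names below are not part of Abydos.
# Suffixes copied from Danish Step 1 in the listing (without the 's' rule).
DANISH_STEP1_SUFFIXES = frozenset({
    'erendes', 'erende', 'hedens', 'ethed', 'erede', 'heden', 'heder',
    'endes', 'ernes', 'erens', 'erets', 'ered', 'ende', 'erne', 'eren',
    'erer', 'heds', 'enes', 'eres', 'eret', 'hed', 'ene', 'ere', 'ens',
    'ers', 'ets', 'en', 'er', 'es', 'et', 'e',
})


def strip_longest_suffix(word, r1_start, suffixes):
    """Remove the longest suffix from `suffixes` that lies entirely in R1.

    `r1_start` is the index where R1 begins. The word is returned
    unchanged when no suffix matches inside R1.
    """
    r1 = word[r1_start:]
    # Try longer suffixes first, mirroring the order of the if/elif chain.
    for suffix in sorted(suffixes, key=len, reverse=True):
        if r1.endswith(suffix):
            return word[: -len(suffix)]
    return word
```

With `r1_start` set to 3, this helper reduces `'sikkerhed'` to `'sikker'` and `'underviser'` to `'undervis'`, which matches the doctests for those words because the later steps leave them unchanged. Sorting on every call keeps the sketch short; a real refactoring would likely pre-sort the suffixes once per class.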