Completed
Pull Request — master (#141)
by Chris
13:03
created

abydos.stemmer._snowball.Porter2.stem()   F

Complexity

Conditions 127

Size

Total Lines 285
Code Lines 218

Duplication

Lines 0
Ratio 0 %

Code Coverage

Tests 187
CRAP Score 127

Importance

Changes 0
Metric  Value
eloc    218
dl      0
loc     285
ccs     187
cts     187
cp      1
rs      0
c       0
b       0
f       0
cc      127
nop     3
crap    127
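The CRAP score in the table can be reproduced by hand. The sketch below assumes the commonly cited CRAP definition, CRAP = cc² × (1 − coverage)³ + cc, with coverage as a fraction, so cp = 1 means fully covered. Whether this tool computes the score exactly this way is an assumption, and `crap_score` is a hypothetical helper name.

```python
def crap_score(complexity, coverage):
    """Return the CRAP score of a method (hypothetical helper).

    Assumes the common definition complexity**2 * (1 - coverage)**3 + complexity,
    with coverage given as a fraction between 0 and 1.
    """
    if not 0.0 <= coverage <= 1.0:
        raise ValueError('coverage must be a fraction between 0 and 1')
    return complexity ** 2 * (1.0 - coverage) ** 3 + complexity


# Values taken from the metrics table: cc = 127, cp = 1 (fully covered).
print(crap_score(127, 1.0))  # 127.0
# The same method with no test coverage at all:
print(crap_score(127, 0.0))  # 16256.0
```

With full coverage the cubic term vanishes and the score equals the cyclomatic complexity, so for this method only reducing the 127 conditions would lower it further.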

How to fix

Long Method

Small methods make code easier to understand, particularly when combined with a good name. And when a method is small, finding a good name for it is usually much easier.

For example, if you find yourself adding comments to a method's body, that is usually a sign you should extract the commented part into a new method and use the comment as the starting point for the new method's name.

Commonly applied refactorings include:

- Extract Method
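As an illustration of Extract Method on this code, the sketch below pulls Porter2's "# Step 2" block out of stem() into its own function and replaces the if/elif cascade with a suffix table. The names `STEP2_SUFFIXES` and `step2` are hypothetical, `r1_start` is assumed to be the R1 start index that stem() already computes with `_sb_r1`, and the special-case 'ogi' and 'li' rules of the original are deliberately left out for brevity.

```python
# Hypothetical refactoring sketch: Porter2 Step 2 as a named helper driven
# by a suffix table. The special 'ogi' and 'li' rules are omitted here.
STEP2_SUFFIXES = {
    'ational': 'ate', 'tional': 'tion', 'enci': 'ence', 'anci': 'ance',
    'izer': 'ize', 'lessli': 'less', 'entli': 'ent', 'fulli': 'ful',
    'ousli': 'ous', 'abli': 'able', 'alli': 'al', 'bli': 'ble',
    'ization': 'ize', 'ation': 'ate', 'ator': 'ate', 'fulness': 'ful',
    'ousness': 'ous', 'iveness': 'ive', 'alism': 'al', 'biliti': 'ble',
    'aliti': 'al', 'iviti': 'ive',
}
# Longest suffixes first, so e.g. 'ational' wins over 'tional'.
_STEP2_ORDER = sorted(STEP2_SUFFIXES, key=len, reverse=True)


def step2(word, r1_start):
    """Replace the longest Step 2 suffix if it lies entirely in R1."""
    for suffix in _STEP2_ORDER:
        if word.endswith(suffix):
            if len(word[r1_start:]) >= len(suffix):
                return word[: -len(suffix)] + STEP2_SUFFIXES[suffix]
            return word  # longest match is not in R1: leave the word alone
    return word


# R1 starts at index 3 for both words below.
print(step2('relational', 3))  # relate
print(step2('rational', 3))  # rational
```

Each "# Step" comment in stem() marks a similar candidate, and the comment text suggests the method name.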

Complexity

Complex code like abydos.stemmer._snowball.Porter2.stem() often does a lot of different things. To break it down, identify a cohesive component within the enclosing class. A common way to find such a component is to look for fields or methods that share the same prefix or suffix.

Once you have determined the fields that belong together, you can apply the Extract Class refactoring. If the component makes sense as a subclass, Extract Subclass is also a candidate and is often faster.
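In Porter2 the fields `_exception1dict`, `_exception1set` and `_exception2set` share the `_exception` prefix, which makes them a natural Extract Class candidate. The sketch below groups them behind two query methods. The class name `Porter2Exceptions` and the method names `before_stemming` and `after_step1a` are hypothetical; the table contents are copied from the listing.

```python
class Porter2Exceptions(object):
    """Hypothetical class extracted from the _exception* fields of Porter2."""

    # Copied from Porter2._exception1dict
    _special = {
        'skis': 'ski', 'skies': 'sky', 'dying': 'die', 'lying': 'lie',
        'tying': 'tie', 'idly': 'idl', 'gently': 'gentl', 'ugly': 'ugli',
        'early': 'earli', 'only': 'onli', 'singly': 'singl',
    }
    # Copied from Porter2._exception1set
    _invariant = {'sky', 'news', 'howe', 'atlas', 'cosmos', 'bias', 'andes'}
    # Copied from Porter2._exception2set
    _after_step1a = {
        'inning', 'outing', 'canning', 'herring', 'earring',
        'proceed', 'exceed', 'succeed',
    }

    def before_stemming(self, word):
        """Return the fixed stem for words handled before any step, else None."""
        if word in self._special:
            return self._special[word]
        if word in self._invariant:
            return word
        return None

    def after_step1a(self, word):
        """Return True if stemming should stop after Step 1a."""
        return word in self._after_step1a


exceptions = Porter2Exceptions()
print(exceptions.before_stemming('skies'))  # sky
print(exceptions.before_stemming('reading'))  # None
print(exceptions.after_step1a('succeed'))  # True
```

stem() would then call `before_stemming()` where "# Exceptions 1" is and `after_step1a()` where "# Exceptions 2" is, removing two branches and two return statements from the method itself.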

# -*- coding: utf-8 -*-
# coding-style: Too many lines in module (1655/1000)

# Copyright 2014-2018 by Christopher C. Little.
# This file is part of Abydos.
#
# Abydos is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# Abydos is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with Abydos. If not, see <http://www.gnu.org/licenses/>.

"""abydos.stemmer._snowball.

The stemmer._snowball module defines the stemmers:

    - Porter
    - Porter2 (Snowball English)
    - Snowball German
    - Snowball Dutch
    - Snowball Norwegian
    - Snowball Swedish
    - Snowball Danish
"""

from __future__ import unicode_literals

from unicodedata import normalize

from six import text_type
from six.moves import range

from ._stemmer import Stemmer

__all__ = [
    'Porter',
    'Porter2',
    'SnowballDanish',
    'SnowballDutch',
    'SnowballGerman',
    'SnowballNorwegian',
    'SnowballSwedish',
    'porter',
    'porter2',
    'sb_danish',
    'sb_dutch',
    'sb_german',
    'sb_norwegian',
    'sb_swedish',
]


class Porter(Stemmer):
    # Unused Code: The variable __class__ seems to be unused.
    """Porter stemmer.

    The Porter stemmer is described in :cite:`Porter:1980`.
    """

    _vowels = {'a', 'e', 'i', 'o', 'u', 'y'}

    def _m_degree(self, term):
        """Return Porter helper function _m_degree value.

        m-degree is equal to the number of V to C transitions.

        Args:
            term (str): The word for which to calculate the m-degree

        Returns:
            int: The m-degree as defined in the Porter stemmer definition

        """
        mdeg = 0
        last_was_vowel = False
        for letter in term:
            if letter in self._vowels:
                last_was_vowel = True
            else:
                if last_was_vowel:
                    mdeg += 1
                last_was_vowel = False
        return mdeg

    def _has_vowel(self, term):
        """Return Porter helper function _has_vowel value.

        Args:
            term (str): The word to scan for vowels

        Returns:
            bool: True iff a vowel exists in the term (as defined in the Porter
                stemmer definition)

        """
        for letter in term:
            if letter in self._vowels:
                return True
        return False

    def _ends_in_doubled_cons(self, term):
        """Return Porter helper function _ends_in_doubled_cons value.

        Args:
            term (str): The word to check for a final doubled consonant

        Returns:
            bool: True iff the stem ends in a doubled consonant (as defined in
                the Porter stemmer definition)

        """
        return (
            len(term) > 1
            and term[-1] not in self._vowels
            and term[-2] == term[-1]
        )

    def _ends_in_cvc(self, term):
        """Return Porter helper function _ends_in_cvc value.

        Args:
            term (str): The word to scan for cvc

        Returns:
            bool: True iff the stem ends in cvc (as defined in the Porter
                stemmer definition)

        """
        return len(term) > 2 and (
            term[-1] not in self._vowels
            and term[-2] in self._vowels
            and term[-3] not in self._vowels
            and term[-1] not in tuple('wxY')
        )

    def stem(self, word, early_english=False):
        # Bug: Parameters differ from overridden 'stem' method
        """Return Porter stem.

        Args:
            word (str): The word to stem
            early_english (bool): Set to True in order to remove -eth & -est
                (2nd & 3rd person singular verbal agreement suffixes)

        Returns:
            str: Word stem

        Examples:
            >>> stmr = Porter()
            >>> stmr.stem('reading')
            'read'
            >>> stmr.stem('suspension')
            'suspens'
            >>> stmr.stem('elusiveness')
            'elus'

            >>> stmr.stem('eateth', early_english=True)
            'eat'

        """
        # lowercase, normalize, and compose
        word = normalize('NFC', text_type(word.lower()))

        # Return word if it is shorter than 3 characters
        if len(word) < 3:
            return word

        # Re-map consonantal y to Y (Y will be C, y will be V)
        if word[0] == 'y':
            word = 'Y' + word[1:]
        for i in range(1, len(word)):
            if word[i] == 'y' and word[i - 1] in self._vowels:
                word = word[:i] + 'Y' + word[i + 1 :]

        # Step 1a
        if word[-1] == 's':
            if word[-4:] == 'sses':
                word = word[:-2]
            elif word[-3:] == 'ies':
                word = word[:-2]
            elif word[-2:] == 'ss':
                pass
            else:
                word = word[:-1]

        # Step 1b
        step1b_flag = False
        if word[-3:] == 'eed':
            if self._m_degree(word[:-3]) > 0:
                word = word[:-1]
        elif word[-2:] == 'ed':
            if self._has_vowel(word[:-2]):
                word = word[:-2]
                step1b_flag = True
        elif word[-3:] == 'ing':
            if self._has_vowel(word[:-3]):
                word = word[:-3]
                step1b_flag = True
        elif early_english:
            if word[-3:] == 'est':
                if self._has_vowel(word[:-3]):
                    word = word[:-3]
                    step1b_flag = True
            elif word[-3:] == 'eth':
                if self._has_vowel(word[:-3]):
                    word = word[:-3]
                    step1b_flag = True

        if step1b_flag:
            if word[-2:] in {'at', 'bl', 'iz'}:
                word += 'e'
            elif self._ends_in_doubled_cons(word) and word[-1] not in {
                # Coding Style: Wrong hanging indentation before block (add 4 spaces).
                'l',
                's',
                'z',
            }:
                word = word[:-1]
            elif self._m_degree(word) == 1 and self._ends_in_cvc(word):
                word += 'e'

        # Step 1c
        if word[-1] in {'Y', 'y'} and self._has_vowel(word[:-1]):
            word = word[:-1] + 'i'

        # Step 2
        if len(word) > 1:
            if word[-2] == 'a':
                if word[-7:] == 'ational':
                    if self._m_degree(word[:-7]) > 0:
                        word = word[:-5] + 'e'
                elif word[-6:] == 'tional':
                    if self._m_degree(word[:-6]) > 0:
                        word = word[:-2]
            elif word[-2] == 'c':
                if word[-4:] in {'enci', 'anci'}:
                    if self._m_degree(word[:-4]) > 0:
                        word = word[:-1] + 'e'
            elif word[-2] == 'e':
                if word[-4:] == 'izer':
                    if self._m_degree(word[:-4]) > 0:
                        word = word[:-1]
            elif word[-2] == 'g':
                if word[-4:] == 'logi':
                    if self._m_degree(word[:-4]) > 0:
                        word = word[:-1]
            elif word[-2] == 'l':
                if word[-3:] == 'bli':
                    if self._m_degree(word[:-3]) > 0:
                        word = word[:-1] + 'e'
                elif word[-4:] == 'alli':
                    if self._m_degree(word[:-4]) > 0:
                        word = word[:-2]
                elif word[-5:] == 'entli':
                    if self._m_degree(word[:-5]) > 0:
                        word = word[:-2]
                elif word[-3:] == 'eli':
                    if self._m_degree(word[:-3]) > 0:
                        word = word[:-2]
                elif word[-5:] == 'ousli':
                    if self._m_degree(word[:-5]) > 0:
                        word = word[:-2]
            elif word[-2] == 'o':
                if word[-7:] == 'ization':
                    if self._m_degree(word[:-7]) > 0:
                        word = word[:-5] + 'e'
                elif word[-5:] == 'ation':
                    if self._m_degree(word[:-5]) > 0:
                        word = word[:-3] + 'e'
                elif word[-4:] == 'ator':
                    if self._m_degree(word[:-4]) > 0:
                        word = word[:-2] + 'e'
            elif word[-2] == 's':
                if word[-5:] == 'alism':
                    if self._m_degree(word[:-5]) > 0:
                        word = word[:-3]
                elif word[-7:] in {'iveness', 'fulness', 'ousness'}:
                    if self._m_degree(word[:-7]) > 0:
                        word = word[:-4]
            elif word[-2] == 't':
                if word[-5:] == 'aliti':
                    if self._m_degree(word[:-5]) > 0:
                        word = word[:-3]
                elif word[-5:] == 'iviti':
                    if self._m_degree(word[:-5]) > 0:
                        word = word[:-3] + 'e'
                elif word[-6:] == 'biliti':
                    if self._m_degree(word[:-6]) > 0:
                        word = word[:-5] + 'le'

        # Step 3
        if word[-5:] == 'icate':
            if self._m_degree(word[:-5]) > 0:
                word = word[:-3]
        elif word[-5:] == 'ative':
            if self._m_degree(word[:-5]) > 0:
                word = word[:-5]
        elif word[-5:] in {'alize', 'iciti'}:
            if self._m_degree(word[:-5]) > 0:
                word = word[:-3]
        elif word[-4:] == 'ical':
            if self._m_degree(word[:-4]) > 0:
                word = word[:-2]
        elif word[-3:] == 'ful':
            if self._m_degree(word[:-3]) > 0:
                word = word[:-3]
        elif word[-4:] == 'ness':
            if self._m_degree(word[:-4]) > 0:
                word = word[:-4]

        # Step 4
        if word[-2:] == 'al':
            if self._m_degree(word[:-2]) > 1:
                word = word[:-2]
        elif word[-4:] in {'ance', 'ence'}:
            if self._m_degree(word[:-4]) > 1:
                word = word[:-4]
        elif word[-2:] in {'er', 'ic'}:
            if self._m_degree(word[:-2]) > 1:
                word = word[:-2]
        elif word[-4:] in {'able', 'ible'}:
            if self._m_degree(word[:-4]) > 1:
                word = word[:-4]
        elif word[-3:] == 'ant':
            if self._m_degree(word[:-3]) > 1:
                word = word[:-3]
        elif word[-5:] == 'ement':
            if self._m_degree(word[:-5]) > 1:
                word = word[:-5]
        elif word[-4:] == 'ment':
            if self._m_degree(word[:-4]) > 1:
                word = word[:-4]
        elif word[-3:] == 'ent':
            if self._m_degree(word[:-3]) > 1:
                word = word[:-3]
        elif word[-4:] in {'sion', 'tion'}:
            if self._m_degree(word[:-3]) > 1:
                word = word[:-3]
        elif word[-2:] == 'ou':
            if self._m_degree(word[:-2]) > 1:
                word = word[:-2]
        elif word[-3:] in {'ism', 'ate', 'iti', 'ous', 'ive', 'ize'}:
            if self._m_degree(word[:-3]) > 1:
                word = word[:-3]

        # Step 5a
        if word[-1] == 'e':
            if self._m_degree(word[:-1]) > 1:
                word = word[:-1]
            elif self._m_degree(word[:-1]) == 1 and not self._ends_in_cvc(
                # Coding Style: Wrong hanging indentation before block (add 4 spaces).
                word[:-1]
            ):
                word = word[:-1]

        # Step 5b
        if word[-2:] == 'll' and self._m_degree(word) > 1:
            word = word[:-1]

        # Change 'Y' back to 'y' if it survived stemming
        for i in range(len(word)):
            # unused-code: Consider using enumerate instead of iterating with range and len
            if word[i] == 'Y':
                word = word[:i] + 'y' + word[i + 1 :]

        return word


def porter(word, early_english=False):
    """Return Porter stem.

    This is a wrapper for :py:meth:`Porter.stem`.

    Args:
        word (str): The word to stem
        early_english (bool): Set to True in order to remove -eth & -est
            (2nd & 3rd person singular verbal agreement suffixes)

    Returns:
        str: Word stem

    Examples:
        >>> porter('reading')
        'read'
        >>> porter('suspension')
        'suspens'
        >>> porter('elusiveness')
        'elus'

        >>> porter('eateth', early_english=True)
        'eat'

    """
    return Porter().stem(word, early_english)


class Snowball(Stemmer):
    # Unused Code: The variable __class__ seems to be unused.
    """Snowball stemmer base class."""

    _vowels = set('aeiouy')
    _codanonvowels = set('\'bcdfghjklmnpqrstvz')

    def _sb_r1(self, term, r1_prefixes=None):
        """Return the R1 region, as defined in the Porter2 specification.

        Args:
            term (str): The term to examine
            r1_prefixes (set): Prefixes to consider

        Returns:
            int: Start index of the R1 region

        """
        vowel_found = False
        if hasattr(r1_prefixes, '__iter__'):
            for prefix in r1_prefixes:
                if term[: len(prefix)] == prefix:
                    return len(prefix)

        for i in range(len(term)):
            # unused-code: Consider using enumerate instead of iterating with range and len
            if not vowel_found and term[i] in self._vowels:
                vowel_found = True
            elif vowel_found and term[i] not in self._vowels:
                return i + 1
        return len(term)

    def _sb_r2(self, term, r1_prefixes=None):
        """Return the R2 region, as defined in the Porter2 specification.

        Args:
            term (str): The term to examine
            r1_prefixes (set): Prefixes to consider

        Returns:
            int: Start index of the R2 region

        """
        r1_start = self._sb_r1(term, r1_prefixes)
        return r1_start + self._sb_r1(term[r1_start:])

    def _sb_ends_in_short_syllable(self, term):
        """Return True iff term ends in a short syllable.

        (...according to the Porter2 specification.)

        NB: This is akin to the CVC test from the Porter stemmer. The
        description is unfortunately poor/ambiguous.

        Args:
            term (str): The term to examine

        Returns:
            bool: True iff term ends in a short syllable

        """
        if not term:
            return False
        if len(term) == 2:
            if term[-2] in self._vowels and term[-1] not in self._vowels:
                return True
        elif len(term) >= 3:
            if (
                # Coding Style: Wrong hanging indentation before block (add 4 spaces).
                term[-3] not in self._vowels
                and term[-2] in self._vowels
                and term[-1] in self._codanonvowels
            ):
                return True
        return False

    def _sb_short_word(self, term, r1_prefixes=None):
        """Return True iff term is a short word.

        (...according to the Porter2 specification.)

        Args:
            term (str): The term to examine
            r1_prefixes (set): Prefixes to consider

        Returns:
            bool: True iff term is a short word

        """
        if self._sb_r1(term, r1_prefixes) == len(
            # Coding Style: Wrong hanging indentation before block (add 4 spaces).
            term
        ) and self._sb_ends_in_short_syllable(term):
            return True
        return False

    def _sb_has_vowel(self, term):
        """Return Porter helper function _sb_has_vowel value.

        Args:
            term (str): The term to examine

        Returns:
            bool: True iff a vowel exists in the term (as defined in the Porter
                stemmer definition)

        """
        for letter in term:
            if letter in self._vowels:
                return True
        return False


class Porter2(Snowball):
    # Unused Code: The variable __class__ seems to be unused.
    """Porter2 (Snowball English) stemmer.

    The Porter2 (Snowball English) stemmer is defined in :cite:`Porter:2002`.
    """

    _doubles = {'bb', 'dd', 'ff', 'gg', 'mm', 'nn', 'pp', 'rr', 'tt'}
    _li = {'c', 'd', 'e', 'g', 'h', 'k', 'm', 'n', 'r', 't'}

    # R1 prefixes should be in order from longest to shortest to prevent
    # masking
    _r1_prefixes = ('commun', 'gener', 'arsen')
    _exception1dict = {  # special changes:
        'skis': 'ski',
        'skies': 'sky',
        'dying': 'die',
        'lying': 'lie',
        'tying': 'tie',
        # special -LY cases:
        'idly': 'idl',
        'gently': 'gentl',
        'ugly': 'ugli',
        'early': 'earli',
        'only': 'onli',
        'singly': 'singl',
    }
    _exception1set = {
        'sky',
        'news',
        'howe',
        'atlas',
        'cosmos',
        'bias',
        'andes',
    }
    _exception2set = {
        'inning',
        'outing',
        'canning',
        'herring',
        'earring',
        'proceed',
        'exceed',
        'succeed',
    }

    def stem(self, word, early_english=False):
        # Bug: Parameters differ from overridden 'stem' method
        # best-practice: Too many return statements (7/6)
        """Return the Porter2 (Snowball English) stem.

        Args:
            word (str): The word to stem
            early_english (bool): Set to True in order to remove -eth & -est
                (2nd & 3rd person singular verbal agreement suffixes)

        Returns:
            str: Word stem

        Examples:
            >>> stmr = Porter2()
            >>> stmr.stem('reading')
            'read'
            >>> stmr.stem('suspension')
            'suspens'
            >>> stmr.stem('elusiveness')
            'elus'

            >>> stmr.stem('eateth', early_english=True)
            'eat'

        """
        # lowercase, normalize, and compose
        word = normalize('NFC', text_type(word.lower()))
        # replace apostrophe-like characters with U+0027, per
        # http://snowball.tartarus.org/texts/apostrophe.html
        word = word.replace('’', '\'')

        # Exceptions 1
        if word in self._exception1dict:
            return self._exception1dict[word]
        elif word in self._exception1set:
            return word

        # Return word if stem is shorter than 3
        if len(word) < 3:
            return word

        # Remove initial ', if present.
        while word and word[0] == '\'':
            word = word[1:]
            # Return word if stem is shorter than 2
            if len(word) < 2:
                return word

        # Re-map consonantal y to Y (Y will be C, y will be V)
        if word[0] == 'y':
            word = 'Y' + word[1:]
        for i in range(1, len(word)):
            if word[i] == 'y' and word[i - 1] in self._vowels:
                word = word[:i] + 'Y' + word[i + 1 :]

        r1_start = self._sb_r1(word, self._r1_prefixes)
        r2_start = self._sb_r2(word, self._r1_prefixes)

        # Step 0
        if word[-3:] == '\'s\'':
            word = word[:-3]
        elif word[-2:] == '\'s':
            word = word[:-2]
        elif word[-1:] == '\'':
            word = word[:-1]
        # Return word if it is shorter than 3 characters
        if len(word) < 3:
            return word

        # Step 1a
        if word[-4:] == 'sses':
            word = word[:-2]
        elif word[-3:] in {'ied', 'ies'}:
            if len(word) > 4:
                word = word[:-2]
            else:
                word = word[:-1]
        elif word[-2:] in {'us', 'ss'}:
            pass
        elif word[-1] == 's':
            if self._sb_has_vowel(word[:-2]):
                word = word[:-1]

        # Exceptions 2
        if word in self._exception2set:
            return word

        # Step 1b
        step1b_flag = False
        if word[-5:] == 'eedly':
            if len(word[r1_start:]) >= 5:
                word = word[:-3]
        elif word[-5:] == 'ingly':
            if self._sb_has_vowel(word[:-5]):
                word = word[:-5]
                step1b_flag = True
        elif word[-4:] == 'edly':
            if self._sb_has_vowel(word[:-4]):
                word = word[:-4]
                step1b_flag = True
        elif word[-3:] == 'eed':
            if len(word[r1_start:]) >= 3:
                word = word[:-1]
        elif word[-3:] == 'ing':
            if self._sb_has_vowel(word[:-3]):
                word = word[:-3]
                step1b_flag = True
        elif word[-2:] == 'ed':
            if self._sb_has_vowel(word[:-2]):
                word = word[:-2]
                step1b_flag = True
        elif early_english:
            if word[-3:] == 'est':
                if self._sb_has_vowel(word[:-3]):
                    word = word[:-3]
                    step1b_flag = True
            elif word[-3:] == 'eth':
                if self._sb_has_vowel(word[:-3]):
                    word = word[:-3]
                    step1b_flag = True

        if step1b_flag:
            if word[-2:] in {'at', 'bl', 'iz'}:
                word += 'e'
            elif word[-2:] in self._doubles:
                word = word[:-1]
            elif self._sb_short_word(word, self._r1_prefixes):
                word += 'e'

        # Step 1c
        if (
            # Coding Style: Wrong hanging indentation before block (add 4 spaces).
            len(word) > 2
            and word[-1] in {'Y', 'y'}
            and word[-2] not in self._vowels
        ):
            word = word[:-1] + 'i'

        # Step 2
        # word[-2:-1] (a slice) is used instead of word[-2] so that a
        # one-letter stem left by Step 1b (e.g. 'aed' -> 'a') cannot raise
        # IndexError; for longer words both expressions are equal.
        if word[-2:-1] == 'a':
            if word[-7:] == 'ational':
                if len(word[r1_start:]) >= 7:
                    word = word[:-5] + 'e'
            elif word[-6:] == 'tional':
                if len(word[r1_start:]) >= 6:
                    word = word[:-2]
        elif word[-2:-1] == 'c':
            if word[-4:] in {'enci', 'anci'}:
                if len(word[r1_start:]) >= 4:
                    word = word[:-1] + 'e'
        elif word[-2:-1] == 'e':
            if word[-4:] == 'izer':
                if len(word[r1_start:]) >= 4:
                    word = word[:-1]
        elif word[-2:-1] == 'g':
            if word[-3:] == 'ogi':
                if (
                    # Coding Style: Wrong hanging indentation before block (add 4 spaces).
                    r1_start >= 1
                    and len(word[r1_start:]) >= 3
                    and word[-4] == 'l'
                ):
                    word = word[:-1]
        elif word[-2:-1] == 'l':
            if word[-6:] == 'lessli':
                if len(word[r1_start:]) >= 6:
                    word = word[:-2]
            elif word[-5:] in {'entli', 'fulli', 'ousli'}:
                if len(word[r1_start:]) >= 5:
                    word = word[:-2]
            elif word[-4:] == 'abli':
                if len(word[r1_start:]) >= 4:
                    word = word[:-1] + 'e'
            elif word[-4:] == 'alli':
                if len(word[r1_start:]) >= 4:
                    word = word[:-2]
            elif word[-3:] == 'bli':
                if len(word[r1_start:]) >= 3:
                    word = word[:-1] + 'e'
            elif word[-2:] == 'li':
                if (
                    # Coding Style: Wrong hanging indentation before block (add 4 spaces).
                    r1_start >= 1
                    and len(word[r1_start:]) >= 2
                    and word[-3] in self._li
                ):
                    word = word[:-2]
        elif word[-2:-1] == 'o':
            if word[-7:] == 'ization':
                if len(word[r1_start:]) >= 7:
                    word = word[:-5] + 'e'
            elif word[-5:] == 'ation':
                if len(word[r1_start:]) >= 5:
                    word = word[:-3] + 'e'
            elif word[-4:] == 'ator':
                if len(word[r1_start:]) >= 4:
                    word = word[:-2] + 'e'
        elif word[-2:-1] == 's':
            if word[-7:] in {'fulness', 'ousness', 'iveness'}:
                if len(word[r1_start:]) >= 7:
                    word = word[:-4]
            elif word[-5:] == 'alism':
                if len(word[r1_start:]) >= 5:
                    word = word[:-3]
        elif word[-2:-1] == 't':
            if word[-6:] == 'biliti':
                if len(word[r1_start:]) >= 6:
                    word = word[:-5] + 'le'
            elif word[-5:] == 'aliti':
                if len(word[r1_start:]) >= 5:
                    word = word[:-3]
            elif word[-5:] == 'iviti':
                if len(word[r1_start:]) >= 5:
                    word = word[:-3] + 'e'

765
        # Step 3
        if word[-7:] == 'ational':
            if len(word[r1_start:]) >= 7:
                word = word[:-5] + 'e'
        elif word[-6:] == 'tional':
            if len(word[r1_start:]) >= 6:
                word = word[:-2]
        elif word[-5:] in {'alize', 'icate', 'iciti'}:
            if len(word[r1_start:]) >= 5:
                word = word[:-3]
        elif word[-5:] == 'ative':
            if len(word[r2_start:]) >= 5:
                word = word[:-5]
        elif word[-4:] == 'ical':
            if len(word[r1_start:]) >= 4:
                word = word[:-2]
        elif word[-4:] == 'ness':
            if len(word[r1_start:]) >= 4:
                word = word[:-4]
        elif word[-3:] == 'ful':
            if len(word[r1_start:]) >= 3:
                word = word[:-3]

        # Step 4
        for suffix in (
            'ement',
            'ance',
            'ence',
            'able',
            'ible',
            'ment',
            'ant',
            'ent',
            'ism',
            'ate',
            'iti',
            'ous',
            'ive',
            'ize',
            'al',
            'er',
            'ic',
        ):
            if word[-len(suffix) :] == suffix:
                if len(word[r2_start:]) >= len(suffix):
                    word = word[: -len(suffix)]
                break
        else:
            if word[-3:] == 'ion':
                if (
                    len(word[r2_start:]) >= 3
                    and len(word) >= 4
                    and word[-4] in tuple('st')
                ):
                    word = word[:-3]

        # Step 5
        if word[-1] == 'e':
            if len(word[r2_start:]) >= 1 or (
                len(word[r1_start:]) >= 1
                and not self._sb_ends_in_short_syllable(word[:-1])
            ):
                word = word[:-1]
        elif word[-1] == 'l':
            if len(word[r2_start:]) >= 1 and word[-2] == 'l':
                word = word[:-1]

        # Change 'Y' back to 'y' if it survived stemming
        for i in range(0, len(word)):
            if word[i] == 'Y':
                word = word[:i] + 'y' + word[i + 1 :]

        return word
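

# Illustrative sketch, not part of the Abydos API: the stemmers in this
# module rely on the R1/R2 regions returned by the _sb_r1 and _sb_r2
# helpers, which are not shown in this section. Following the definitions
# on snowball.tartarus.org, R1 is the region after the first non-vowel that
# follows a vowel (or the empty region at the end of the word), and R2 is
# the same construction applied again inside R1. The name
# _sketch_region_start is hypothetical, and the sketch deliberately omits
# Porter2's special R1 prefixes (gener, commun, arsen).
def _sketch_region_start(word, vowels, start=0):
    """Return where R1 (start=0) or R2 (start=R1 index) begins."""
    # Hypothetical helper: find a vowel at i - 1 followed by a non-vowel at
    # i, with the vowel lying inside the region that begins at `start`.
    for i in range(start + 1, len(word)):
        if word[i] not in vowels and word[i - 1] in vowels:
            return i + 1
    return len(word)


# Example with the English vowel set (y counts as a vowel here):
#   r1 = _sketch_region_start('beautiful', set('aeiouy'))      # 5, 'iful'
#   r2 = _sketch_region_start('beautiful', set('aeiouy'), r1)  # 7, 'ul'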


def porter2(word, early_english=False):
    """Return the Porter2 (Snowball English) stem.

    This is a wrapper for :py:meth:`Porter2.stem`.

    Args:
        word (str): The word to stem
        early_english (bool): Set to True in order to remove -est & -eth (2nd &
            3rd person singular verbal agreement suffixes)

    Returns:
        str: Word stem

    Examples:
        >>> porter2('reading')
        'read'
        >>> porter2('suspension')
        'suspens'
        >>> porter2('elusiveness')
        'elus'

        >>> porter2('eateth', early_english=True)
        'eat'

    """
    return Porter2().stem(word, early_english)
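

# Illustrative sketch, not part of the Abydos API: Porter2.stem is long and
# branch-heavy. One way to shorten it is to drive a step from a table. The
# hypothetical _SKETCH_STEP3 table and _sketch_porter2_step3 function mirror
# the Step 3 branch of Porter2.stem: the first (longest) matching suffix
# wins, and it is replaced only if it lies inside the required region;
# otherwise the word is left unchanged and no shorter suffix is tried.
_SKETCH_STEP3 = (
    # (suffix, replacement, region the suffix must lie in)
    ('ational', 'ate', 'r1'),
    ('tional', 'tion', 'r1'),
    ('alize', 'al', 'r1'),
    ('icate', 'ic', 'r1'),
    ('iciti', 'ic', 'r1'),
    ('ative', '', 'r2'),
    ('ical', 'ic', 'r1'),
    ('ness', '', 'r1'),
    ('ful', '', 'r1'),
)


def _sketch_porter2_step3(word, r1_start, r2_start):
    """Apply Porter2 Step 3 from a suffix table (hypothetical helper)."""
    for suffix, replacement, region in _SKETCH_STEP3:
        if word.endswith(suffix):
            start = r1_start if region == 'r1' else r2_start
            if len(word[start:]) >= len(suffix):
                word = word[: -len(suffix)] + replacement
            # Only the longest matching suffix is considered.
            break
    return word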


class SnowballGerman(Snowball):
    """Snowball German stemmer.

    The Snowball German stemmer is defined at:
    http://snowball.tartarus.org/algorithms/german/stemmer.html
    """

    _vowels = {'a', 'e', 'i', 'o', 'u', 'y', 'ä', 'ö', 'ü'}
    _s_endings = {'b', 'd', 'f', 'g', 'h', 'k', 'l', 'm', 'n', 'r', 't'}
    _st_endings = {'b', 'd', 'f', 'g', 'h', 'k', 'l', 'm', 'n', 't'}

    def stem(self, word, alternate_vowels=False):
        """Return Snowball German stem.

        Args:
            word (str): The word to stem
            alternate_vowels (bool): composes ae as ä, oe as ö, and ue as ü
                (except after q) before running the algorithm

        Returns:
            str: Word stem

        Examples:
            >>> stmr = SnowballGerman()
            >>> stmr.stem('lesen')
            'les'
            >>> stmr.stem('graues')
            'grau'
            >>> stmr.stem('buchstabieren')
            'buchstabi'

        """
        # lowercase, normalize, and compose
        word = normalize('NFC', word.lower())
        word = word.replace('ß', 'ss')

        if len(word) > 2:
            for i in range(2, len(word)):
                if word[i] in self._vowels and word[i - 2] in self._vowels:
                    if word[i - 1] == 'u':
                        word = word[: i - 1] + 'U' + word[i:]
                    elif word[i - 1] == 'y':
                        word = word[: i - 1] + 'Y' + word[i:]

        if alternate_vowels:
            word = word.replace('ae', 'ä')
            word = word.replace('oe', 'ö')
            word = word.replace('que', 'Q')
            word = word.replace('ue', 'ü')
            word = word.replace('Q', 'que')

        r1_start = max(3, self._sb_r1(word))
        r2_start = self._sb_r2(word)

        # Step 1
        niss_flag = False
        if word[-3:] == 'ern':
            if len(word[r1_start:]) >= 3:
                word = word[:-3]
        elif word[-2:] == 'em':
            if len(word[r1_start:]) >= 2:
                word = word[:-2]
        elif word[-2:] == 'er':
            if len(word[r1_start:]) >= 2:
                word = word[:-2]
        elif word[-2:] == 'en':
            if len(word[r1_start:]) >= 2:
                word = word[:-2]
                niss_flag = True
        elif word[-2:] == 'es':
            if len(word[r1_start:]) >= 2:
                word = word[:-2]
                niss_flag = True
        elif word[-1:] == 'e':
            if len(word[r1_start:]) >= 1:
                word = word[:-1]
                niss_flag = True
        elif word[-1:] == 's':
            if (
                len(word[r1_start:]) >= 1
                and len(word) >= 2
                and word[-2] in self._s_endings
            ):
                word = word[:-1]

        if niss_flag and word[-4:] == 'niss':
            word = word[:-1]

        # Step 2
        if word[-3:] == 'est':
            if len(word[r1_start:]) >= 3:
                word = word[:-3]
        elif word[-2:] == 'en':
            if len(word[r1_start:]) >= 2:
                word = word[:-2]
        elif word[-2:] == 'er':
            if len(word[r1_start:]) >= 2:
                word = word[:-2]
        elif word[-2:] == 'st':
            if (
                len(word[r1_start:]) >= 2
                and len(word) >= 6
                and word[-3] in self._st_endings
            ):
                word = word[:-2]

        # Step 3
        if word[-4:] == 'isch':
            if len(word[r2_start:]) >= 4 and word[-5] != 'e':
                word = word[:-4]
        elif word[-4:] in {'lich', 'heit'}:
            if len(word[r2_start:]) >= 4:
                word = word[:-4]
                if word[-2:] in {'er', 'en'} and len(word[r1_start:]) >= 2:
                    word = word[:-2]
        elif word[-4:] == 'keit':
            if len(word[r2_start:]) >= 4:
                word = word[:-4]
                if word[-4:] == 'lich' and len(word[r2_start:]) >= 4:
                    word = word[:-4]
                elif word[-2:] == 'ig' and len(word[r2_start:]) >= 2:
                    word = word[:-2]
        elif word[-3:] in {'end', 'ung'}:
            if len(word[r2_start:]) >= 3:
                word = word[:-3]
                if (
                    word[-2:] == 'ig'
                    and len(word[r2_start:]) >= 2
                    and word[-3] != 'e'
                ):
                    word = word[:-2]
        elif word[-2:] in {'ig', 'ik'}:
            if len(word[r2_start:]) >= 2 and word[-3] != 'e':
                word = word[:-2]

        # Change 'Y' and 'U' back to lowercase if they survived stemming
        for i in range(0, len(word)):
            if word[i] == 'Y':
                word = word[:i] + 'y' + word[i + 1 :]
            elif word[i] == 'U':
                word = word[:i] + 'u' + word[i + 1 :]

        # Remove umlauts
        _umlauts = dict(zip((ord(_) for _ in 'äöü'), 'aou'))
        word = word.translate(_umlauts)

        return word
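

# Illustrative sketch, not part of the Abydos API: the German Step 1 branch
# of SnowballGerman.stem, restated as a table. Group (a) ern, em, er and
# group (b) en, es, e are deleted when they lie in R1; deleting a group (b)
# ending also turns a preceding -niss into -nis. Group (c) -s is deleted
# only in R1 and after a valid s-ending. The names _SKETCH_DE_STEP1,
# _SKETCH_DE_S_ENDINGS and _sketch_german_step1 are hypothetical, and
# r1_start is assumed to be already adjusted to at least 3, as
# SnowballGerman.stem does.
_SKETCH_DE_STEP1 = (
    ('ern', 'a'),
    ('em', 'a'),
    ('er', 'a'),
    ('en', 'b'),
    ('es', 'b'),
    ('e', 'b'),
    ('s', 'c'),
)
_SKETCH_DE_S_ENDINGS = set('bdfghklmnrt')


def _sketch_german_step1(word, r1_start):
    """Apply Snowball German Step 1 (hypothetical helper)."""
    niss_flag = False
    for suffix, group in _SKETCH_DE_STEP1:
        if word.endswith(suffix):
            if len(word[r1_start:]) >= len(suffix):
                if group != 'c':
                    word = word[: -len(suffix)]
                    niss_flag = group == 'b'
                elif len(word) >= 2 and word[-2] in _SKETCH_DE_S_ENDINGS:
                    word = word[:-1]
            # As in the original if/elif chain, only the first match counts.
            break
    if niss_flag and word.endswith('niss'):
        word = word[:-1]
    return word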


def sb_german(word, alternate_vowels=False):
    """Return Snowball German stem.

    This is a wrapper for :py:meth:`SnowballGerman.stem`.

    Args:
        word (str): The word to stem
        alternate_vowels (bool): composes ae as ä, oe as ö, and ue as ü
            (except after q) before running the algorithm

    Returns:
        str: Word stem

    Examples:
        >>> sb_german('lesen')
        'les'
        >>> sb_german('graues')
        'grau'
        >>> sb_german('buchstabieren')
        'buchstabi'

    """
    return SnowballGerman().stem(word, alternate_vowels)
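

# Illustrative sketch, not part of the Abydos API: SnowballGerman.stem marks
# u and y as consonants (U, Y) when they sit between two vowels, rebuilding
# the string with slices on every change. The hypothetical helper below does
# the same marking on a list of characters, which avoids copying the string
# on each replacement. Like the original loop, it sees earlier markings when
# testing later positions, so the results are identical.
_SKETCH_DE_VOWELS = set('aeiouyäöü')


def _sketch_mark_german_consonants(word):
    """Upper-case u and y between vowels (hypothetical helper)."""
    chars = list(word)
    for i in range(2, len(chars)):
        if (
            chars[i] in _SKETCH_DE_VOWELS
            and chars[i - 2] in _SKETCH_DE_VOWELS
        ):
            if chars[i - 1] == 'u':
                chars[i - 1] = 'U'
            elif chars[i - 1] == 'y':
                chars[i - 1] = 'Y'
    return ''.join(chars)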


class SnowballDutch(Snowball):
    """Snowball Dutch stemmer.

    The Snowball Dutch stemmer is defined at:
    http://snowball.tartarus.org/algorithms/dutch/stemmer.html
    """

    _vowels = {'a', 'e', 'i', 'o', 'u', 'y', 'è'}
    _not_s_endings = {'a', 'e', 'i', 'j', 'o', 'u', 'y', 'è'}
    _accented = dict(zip((ord(_) for _ in 'äëïöüáéíóú'), 'aeiouaeiou'))

    def _undouble(self, word):
        """Undouble endings -kk, -dd, and -tt.

        Args:
            word (str): The word to undouble

        Returns:
            str: The word with doubled endings undoubled

        """
        if (
            len(word) > 1
            and word[-1] == word[-2]
            and word[-1] in {'d', 'k', 't'}
        ):
            return word[:-1]
        return word

    def stem(self, word):
        """Return Snowball Dutch stem.

        Args:
            word (str): The word to stem

        Returns:
            str: Word stem

        Examples:
            >>> stmr = SnowballDutch()
            >>> stmr.stem('lezen')
            'lez'
            >>> stmr.stem('opschorting')
            'opschort'
            >>> stmr.stem('ongrijpbaarheid')
            'ongrijp'

        """
        # lowercase, compose (NFC), and strip umlauts & acute accents
        word = normalize('NFC', text_type(word.lower()))
        word = word.translate(self._accented)

        for i in range(len(word)):
            if i == 0 and word[0] == 'y':
                word = 'Y' + word[1:]
            elif word[i] == 'y' and word[i - 1] in self._vowels:
                word = word[:i] + 'Y' + word[i + 1 :]
            elif (
                i > 0  # an initial i is never between vowels
                and word[i] == 'i'
                and word[i - 1] in self._vowels
                and i + 1 < len(word)
                and word[i + 1] in self._vowels
            ):
                word = word[:i] + 'I' + word[i + 1 :]

        r1_start = max(3, self._sb_r1(word))
        r2_start = self._sb_r2(word)

        # Step 1
        if word[-5:] == 'heden':
            if len(word[r1_start:]) >= 5:
                word = word[:-3] + 'id'
        elif word[-3:] == 'ene':
            if len(word[r1_start:]) >= 3 and (
                word[-4] not in self._vowels and word[-6:-3] != 'gem'
            ):
                word = self._undouble(word[:-3])
        elif word[-2:] == 'en':
            if len(word[r1_start:]) >= 2 and (
                word[-3] not in self._vowels and word[-5:-2] != 'gem'
            ):
                word = self._undouble(word[:-2])
        elif word[-2:] == 'se':
            if (
                len(word[r1_start:]) >= 2
                and word[-3] not in self._not_s_endings
            ):
                word = word[:-2]
        elif word[-1:] == 's':
            if (
                len(word[r1_start:]) >= 1
                and word[-2] not in self._not_s_endings
            ):
                word = word[:-1]

        # Step 2
        e_removed = False
        if word[-1:] == 'e':
            if len(word[r1_start:]) >= 1 and word[-2] not in self._vowels:
                word = self._undouble(word[:-1])
                e_removed = True

        # Step 3a
        if word[-4:] == 'heid':
            if len(word[r2_start:]) >= 4 and word[-5] != 'c':
                word = word[:-4]
                if word[-2:] == 'en':
                    if len(word[r1_start:]) >= 2 and (
                        word[-3] not in self._vowels and word[-5:-2] != 'gem'
                    ):
                        word = self._undouble(word[:-2])

        # Step 3b
        if word[-4:] == 'lijk':
            if len(word[r2_start:]) >= 4:
                word = word[:-4]
                # Repeat step 2
                if word[-1:] == 'e':
                    if (
                        len(word[r1_start:]) >= 1
                        and word[-2] not in self._vowels
                    ):
                        word = self._undouble(word[:-1])
        elif word[-4:] == 'baar':
            if len(word[r2_start:]) >= 4:
                word = word[:-4]
        elif word[-3:] in ('end', 'ing'):
            if len(word[r2_start:]) >= 3:
                word = word[:-3]
                if (
                    word[-2:] == 'ig'
                    and len(word[r2_start:]) >= 2
                    and word[-3] != 'e'
                ):
                    word = word[:-2]
                else:
                    word = self._undouble(word)
        elif word[-3:] == 'bar':
            if len(word[r2_start:]) >= 3 and e_removed:
                word = word[:-3]
        elif word[-2:] == 'ig':
            if len(word[r2_start:]) >= 2 and word[-3] != 'e':
                word = word[:-2]

        # Step 4
        if (
            len(word) >= 4
            and word[-3] == word[-2]
            and word[-2] in {'a', 'e', 'o', 'u'}
            and word[-4] not in self._vowels
            and word[-1] not in self._vowels
            and word[-1] != 'I'
        ):
            word = word[:-2] + word[-1]

        # Change 'Y' and 'I' back to lowercase if they survived stemming
        for i in range(0, len(word)):
            if word[i] == 'Y':
                word = word[:i] + 'y' + word[i + 1 :]
            elif word[i] == 'I':
                word = word[:i] + 'i' + word[i + 1 :]

        return word
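

# Illustrative sketch, not part of the Abydos API: Dutch Step 4 as a
# standalone function. If the word ends in a non-vowel, then a doubled a, e,
# o or u, then a non-vowel other than the marked I, one of the doubled
# vowels is removed (kaas -> kas). Like SnowballDutch, the sketch treats the
# marked Y and I as non-vowels. _SKETCH_NL_VOWELS and _sketch_dutch_step4
# are hypothetical names.
_SKETCH_NL_VOWELS = set('aeiouyè')


def _sketch_dutch_step4(word):
    """Undouble a doubled vowel before a final consonant (hypothetical)."""
    if (
        len(word) >= 4
        and word[-3] == word[-2]
        and word[-2] in 'aeou'
        and word[-4] not in _SKETCH_NL_VOWELS
        and word[-1] not in _SKETCH_NL_VOWELS
        and word[-1] != 'I'
    ):
        return word[:-2] + word[-1]
    return word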


def sb_dutch(word):
    """Return Snowball Dutch stem.

    This is a wrapper for :py:meth:`SnowballDutch.stem`.

    Args:
        word (str): The word to stem

    Returns:
        str: Word stem

    Examples:
        >>> sb_dutch('lezen')
        'lez'
        >>> sb_dutch('opschorting')
        'opschort'
        >>> sb_dutch('ongrijpbaarheid')
        'ongrijp'

    """
    return SnowballDutch().stem(word)
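

# Illustrative sketch, not part of the Abydos API: the -en branch of Dutch
# Step 1. The ending is deleted only if it lies in R1 and follows a valid
# en-ending (a non-vowel that is not the end of gem). The remaining stem is
# then undoubled as SnowballDutch._undouble does (-kk, -dd and -tt lose one
# letter). _SKETCH_NL_VOWELS and _sketch_dutch_en_ending are hypothetical
# names, and r1_start is assumed to be computed as in SnowballDutch.stem.
_SKETCH_NL_VOWELS = set('aeiouyè')


def _sketch_dutch_en_ending(word, r1_start):
    """Remove a valid -en ending and undouble (hypothetical helper)."""
    if (
        word.endswith('en')
        and len(word) >= 3
        and len(word[r1_start:]) >= 2
        and word[-3] not in _SKETCH_NL_VOWELS
        and word[-5:-2] != 'gem'
    ):
        word = word[:-2]
        # Undouble -kk, -dd, -tt left behind by the deletion.
        if len(word) > 1 and word[-1] == word[-2] and word[-1] in 'dkt':
            word = word[:-1]
    return word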


class SnowballNorwegian(Snowball):
    """Snowball Norwegian stemmer.

    The Snowball Norwegian stemmer is defined at:
    http://snowball.tartarus.org/algorithms/norwegian/stemmer.html
    """

    _vowels = {'a', 'e', 'i', 'o', 'u', 'y', 'å', 'æ', 'ø'}
    _s_endings = {
        'b',
        'c',
        'd',
        'f',
        'g',
        'h',
        'j',
        'l',
        'm',
        'n',
        'o',
        'p',
        'r',
        't',
        'v',
        'y',
        'z',
    }

    def stem(self, word):
        """Return Snowball Norwegian stem.

        Args:
            word (str): The word to stem

        Returns:
            str: Word stem

        Examples:
            >>> stmr = SnowballNorwegian()
            >>> stmr.stem('lese')
            'les'
            >>> stmr.stem('suspensjon')
            'suspensjon'
            >>> stmr.stem('sikkerhet')
            'sikker'

        """
        # lowercase, normalize, and compose
        word = normalize('NFC', text_type(word.lower()))

        r1_start = min(max(3, self._sb_r1(word)), len(word))

        # Step 1
        _r1 = word[r1_start:]
        if _r1[-7:] == 'hetenes':
            word = word[:-7]
        elif _r1[-6:] in {'hetene', 'hetens'}:
            word = word[:-6]
        elif _r1[-5:] in {'heten', 'heter', 'endes'}:
            word = word[:-5]
        elif _r1[-4:] in {'ande', 'ende', 'edes', 'enes', 'erte'}:
            if word[-4:] == 'erte':
                word = word[:-2]
            else:
                word = word[:-4]
        elif _r1[-3:] in {
            'ede',
            'ane',
            'ene',
            'ens',
            'ers',
            'ets',
            'het',
            'ast',
            'ert',
        }:
            if word[-3:] == 'ert':
                word = word[:-1]
            else:
                word = word[:-3]
        elif _r1[-2:] in {'en', 'ar', 'er', 'as', 'es', 'et'}:
            word = word[:-2]
        elif _r1[-1:] in {'a', 'e'}:
            word = word[:-1]
        elif _r1[-1:] == 's':
            if (len(word) > 1 and word[-2] in self._s_endings) or (
                len(word) > 2
                and word[-2] == 'k'
                and word[-3] not in self._vowels
            ):
                word = word[:-1]

        # Step 2
        if word[r1_start:][-2:] in {'dt', 'vt'}:
            word = word[:-1]

        # Step 3
        _r1 = word[r1_start:]
        if _r1[-7:] == 'hetslov':
            word = word[:-7]
        elif _r1[-4:] in {'eleg', 'elig', 'elov', 'slov'}:
            word = word[:-4]
        elif _r1[-3:] in {'leg', 'eig', 'lig', 'els', 'lov'}:
            word = word[:-3]
        elif _r1[-2:] == 'ig':
            word = word[:-2]

        return word
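

# Illustrative sketch, not part of the Abydos API: the final branch of
# Norwegian Step 1 in SnowballNorwegian.stem, reached only when none of the
# longer suffixes matched. A final s in R1 is deleted after a valid
# s-ending (the letters in SnowballNorwegian._s_endings), or after a k that
# is not itself preceded by a vowel. _SKETCH_NO_VOWELS,
# _SKETCH_NO_S_ENDINGS and _sketch_norwegian_s_rule are hypothetical names.
_SKETCH_NO_VOWELS = set('aeiouyåæø')
_SKETCH_NO_S_ENDINGS = set('bcdfghjlmnoprtvyz')


def _sketch_norwegian_s_rule(word, r1_start):
    """Delete a final -s under the Norwegian Step 1 rule (hypothetical)."""
    if not word[r1_start:].endswith('s'):
        return word
    if len(word) > 1 and word[-2] in _SKETCH_NO_S_ENDINGS:
        return word[:-1]
    if (
        len(word) > 2
        and word[-2] == 'k'
        and word[-3] not in _SKETCH_NO_VOWELS
    ):
        return word[:-1]
    return word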


def sb_norwegian(word):
    """Return Snowball Norwegian stem.

    This is a wrapper for :py:meth:`SnowballNorwegian.stem`.

    Args:
        word (str): The word to stem

    Returns:
        str: Word stem

    Examples:
        >>> sb_norwegian('lese')
        'les'
        >>> sb_norwegian('suspensjon')
        'suspensjon'
        >>> sb_norwegian('sikkerhet')
        'sikker'

    """
    return SnowballNorwegian().stem(word)
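

# Illustrative sketch, not part of the Abydos API: Norwegian Step 3 as a
# longest-match search over suffixes that must lie entirely in R1, which is
# what the if/elif chain in SnowballNorwegian.stem does. The tuple is
# ordered longest first so that, for example, -elig wins over -lig. The
# names _SKETCH_NO_STEP3 and _sketch_norwegian_step3 are hypothetical.
_SKETCH_NO_STEP3 = (
    'hetslov',
    'eleg', 'elig', 'elov', 'slov',
    'leg', 'eig', 'lig', 'els', 'lov',
    'ig',
)


def _sketch_norwegian_step3(word, r1_start):
    """Delete the longest Step 3 suffix found in R1 (hypothetical)."""
    r1 = word[r1_start:]
    for suffix in _SKETCH_NO_STEP3:
        if r1.endswith(suffix):
            return word[: -len(suffix)]
    return word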
1362
1363
1364 1
class SnowballSwedish(Snowball):
0 ignored issues
show
Unused Code introduced by
The variable __class__ seems to be unused.
Loading history...
1365
    """Snowball Swedish stemmer.
1366
1367
    The Snowball Swedish stemmer is defined at:
1368
    http://snowball.tartarus.org/algorithms/swedish/stemmer.html
1369
    """
1370
1371 1
    _vowels = {'a', 'e', 'i', 'o', 'u', 'y', 'ä', 'å', 'ö'}
1372 1
    _s_endings = {
1373
        'b',
1374
        'c',
1375
        'd',
1376
        'f',
1377
        'g',
1378
        'h',
1379
        'j',
1380
        'k',
1381
        'l',
1382
        'm',
1383
        'n',
1384
        'o',
1385
        'p',
1386
        'r',
1387
        't',
1388
        'v',
1389
        'y',
1390
    }

    def stem(self, word):
        """Return Snowball Swedish stem.

        Args:
            word (str): The word to stem

        Returns:
            str: Word stem

        Examples:
            >>> stmr = SnowballSwedish()
            >>> stmr.stem('undervisa')
            'undervis'
            >>> stmr.stem('suspension')
            'suspension'
            >>> stmr.stem('visshet')
            'viss'

        """
        # lowercase, normalize, and compose
        word = normalize('NFC', text_type(word.lower()))

        r1_start = min(max(3, self._sb_r1(word)), len(word))

        # Step 1
        _r1 = word[r1_start:]
        if _r1[-7:] == 'heterna':
            word = word[:-7]
        elif _r1[-6:] == 'hetens':
            word = word[:-6]
        elif _r1[-5:] in {
            'anden',
            'heten',
            'heter',
            'arnas',
            'ernas',
            'ornas',
            'andes',
            'arens',
            'andet',
        }:
            word = word[:-5]
        elif _r1[-4:] in {
            'arna',
            'erna',
            'orna',
            'ande',
            'arne',
            'aste',
            'aren',
            'ades',
            'erns',
        }:
            word = word[:-4]
        elif _r1[-3:] in {'ade', 'are', 'ern', 'ens', 'het', 'ast'}:
            word = word[:-3]
        elif _r1[-2:] in {'ad', 'en', 'ar', 'er', 'or', 'as', 'es', 'at'}:
            word = word[:-2]
        elif _r1[-1:] in {'a', 'e'}:
            word = word[:-1]
        elif _r1[-1:] == 's':
            if len(word) > 1 and word[-2] in self._s_endings:
                word = word[:-1]

        # Step 2: if R1 ends in dd/gd/nn/dt/gt/kt/tt, delete the last letter
        if word[r1_start:][-2:] in {'dd', 'gd', 'nn', 'dt', 'gt', 'kt', 'tt'}:
            word = word[:-1]

        # Step 3: fullt -> full, löst -> lös; delete lig, els, or ig in R1
        _r1 = word[r1_start:]
        if _r1[-5:] == 'fullt':
            word = word[:-1]
        elif _r1[-4:] == 'löst':
            word = word[:-1]
        elif _r1[-3:] in {'lig', 'els'}:
            word = word[:-3]
        elif _r1[-2:] == 'ig':
            word = word[:-2]

        return word


def sb_swedish(word):
    """Return Snowball Swedish stem.

    This is a wrapper for :py:meth:`SnowballSwedish.stem`.

    Args:
        word (str): The word to stem

    Returns:
        str: Word stem

    Examples:
        >>> sb_swedish('undervisa')
        'undervis'
        >>> sb_swedish('suspension')
        'suspension'
        >>> sb_swedish('visshet')
        'viss'

    """
    return SnowballSwedish().stem(word)
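

# Illustrative sketch only, not part of the abydos API: a table-driven
# restatement of Step 1 of SnowballSwedish.stem() above. The names
# _SV_VOWELS, _SV_S_ENDINGS, _SV_STEP1_SUFFIXES, _sv_r1_start, and
# _sv_step1_sketch are hypothetical. Sorting the suffixes longest-first makes
# the first match the longest one, which is what the length-ordered elif
# chain in stem() achieves. The R1 helper assumes the standard Snowball
# definition (R1 starts after the first non-vowel that follows a vowel),
# adjusted so that at least three letters precede R1.
_SV_VOWELS = {'a', 'e', 'i', 'o', 'u', 'y', 'ä', 'å', 'ö'}
_SV_S_ENDINGS = {
    'b', 'c', 'd', 'f', 'g', 'h', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'r',
    't', 'v', 'y',
}
_SV_STEP1_SUFFIXES = sorted(
    {
        'heterna', 'hetens',
        'anden', 'heten', 'heter', 'arnas', 'ernas', 'ornas', 'andes',
        'arens', 'andet',
        'arna', 'erna', 'orna', 'ande', 'arne', 'aste', 'aren', 'ades',
        'erns',
        'ade', 'are', 'ern', 'ens', 'het', 'ast',
        'ad', 'en', 'ar', 'er', 'or', 'as', 'es', 'at',
        'a', 'e',
    },
    key=len,
    reverse=True,
)


def _sv_r1_start(word):
    """Return the Swedish R1 start index (hypothetical helper)."""
    start = len(word)
    for i in range(1, len(word)):
        # R1 begins after the first non-vowel that follows a vowel
        if word[i - 1] in _SV_VOWELS and word[i] not in _SV_VOWELS:
            start = i + 1
            break
    # Swedish adjustment: at least 3 letters before R1, capped at len(word)
    return min(max(3, start), len(word))


def _sv_step1_sketch(word):
    """Apply only Swedish Step 1 to a lowercased word (hypothetical helper)."""
    r1 = word[_sv_r1_start(word):]
    for suffix in _SV_STEP1_SUFFIXES:
        if r1.endswith(suffix):
            # Longest listed suffix found in R1: delete it
            return word[: -len(suffix)]
    # Otherwise delete a final 's' in R1 if preceded by a valid s-ending
    if r1.endswith('s') and len(word) > 1 and word[-2] in _SV_S_ENDINGS:
        return word[:-1]
    return word


# Example: _sv_step1_sketch('visshet') returns 'viss', because R1 is 'shet'
# and 'het' is the longest listed suffix it ends with.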


class SnowballDanish(Snowball):
    """Snowball Danish stemmer.

    The Snowball Danish stemmer is defined at:
    http://snowball.tartarus.org/algorithms/danish/stemmer.html
    """

    _vowels = {'a', 'e', 'i', 'o', 'u', 'y', 'å', 'æ', 'ø'}
    _s_endings = {
        'a',
        'b',
        'c',
        'd',
        'f',
        'g',
        'h',
        'j',
        'k',
        'l',
        'm',
        'n',
        'o',
        'p',
        'r',
        't',
        'v',
        'y',
        'z',
        'å',
    }

    def stem(self, word):
        """Return Snowball Danish stem.

        Args:
            word (str): The word to stem

        Returns:
            str: Word stem

        Examples:
            >>> stmr = SnowballDanish()
            >>> stmr.stem('underviser')
            'undervis'
            >>> stmr.stem('suspension')
            'suspension'
            >>> stmr.stem('sikkerhed')
            'sikker'

        """
        # lowercase, normalize, and compose
        word = normalize('NFC', text_type(word.lower()))

        r1_start = min(max(3, self._sb_r1(word)), len(word))

        # Step 1
        _r1 = word[r1_start:]
        if _r1[-7:] == 'erendes':
            word = word[:-7]
        elif _r1[-6:] in {'erende', 'hedens'}:
            word = word[:-6]
        elif _r1[-5:] in {
            'ethed',
            'erede',
            'heden',
            'heder',
            'endes',
            'ernes',
            'erens',
            'erets',
        }:
            word = word[:-5]
        elif _r1[-4:] in {
            'ered',
            'ende',
            'erne',
            'eren',
            'erer',
            'heds',
            'enes',
            'eres',
            'eret',
        }:
            word = word[:-4]
        elif _r1[-3:] in {'hed', 'ene', 'ere', 'ens', 'ers', 'ets'}:
            word = word[:-3]
        elif _r1[-2:] in {'en', 'er', 'es', 'et'}:
            word = word[:-2]
        elif _r1[-1:] == 'e':
            word = word[:-1]
        elif _r1[-1:] == 's':
            if len(word) > 1 and word[-2] in self._s_endings:
                word = word[:-1]

        # Step 2: if R1 ends in gd/dt/gt/kt, delete the last letter
        if word[r1_start:][-2:] in {'gd', 'dt', 'gt', 'kt'}:
            word = word[:-1]

        # Step 3: a final 'igst' becomes 'ig' (whole word, not just R1)
        if word[-4:] == 'igst':
            word = word[:-2]

        # Then delete elig/lig/els/ig in R1 and repeat Step 2; løst -> løs
        _r1 = word[r1_start:]
        repeat_step2 = False
        if _r1[-4:] == 'elig':
            word = word[:-4]
            repeat_step2 = True
        elif _r1[-4:] == 'løst':
            word = word[:-1]
        elif _r1[-3:] in {'lig', 'els'}:
            word = word[:-3]
            repeat_step2 = True
        elif _r1[-2:] == 'ig':
            word = word[:-2]
            repeat_step2 = True

        if repeat_step2:
            if word[r1_start:][-2:] in {'gd', 'dt', 'gt', 'kt'}:
                word = word[:-1]

        # Step 4: undouble a final consonant pair whose last letter is in R1
        if (
            len(word[r1_start:]) >= 1
            and len(word) >= 2
            and word[-1] == word[-2]
            and word[-1] not in self._vowels
        ):
            word = word[:-1]

        return word


def sb_danish(word):
    """Return Snowball Danish stem.

    This is a wrapper for :py:meth:`SnowballDanish.stem`.

    Args:
        word (str): The word to stem

    Returns:
        str: Word stem

    Examples:
        >>> sb_danish('underviser')
        'undervis'
        >>> sb_danish('suspension')
        'suspension'
        >>> sb_danish('sikkerhed')
        'sikker'

    """
    return SnowballDanish().stem(word)
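

# Illustrative sketch only, not part of the abydos API: Steps 2-4 of
# SnowballDanish.stem() above, split into small hypothetical helpers
# (_da_step2, _da_steps_2_to_4) so that the "repeat Step 2" branch of Step 3
# becomes a single function call. It assumes r1_start is the R1 index
# computed once before Step 1, as stem() does, and that the input word is
# already the output of Step 1.
_DA_VOWELS = {'a', 'e', 'i', 'o', 'u', 'y', 'å', 'æ', 'ø'}


def _da_step2(word, r1_start):
    """Drop the last letter of a final gd/dt/gt/kt in R1 (hypothetical)."""
    if word[r1_start:][-2:] in {'gd', 'dt', 'gt', 'kt'}:
        return word[:-1]
    return word


def _da_steps_2_to_4(word, r1_start):
    """Apply Danish Steps 2-4 to a Step 1 result (hypothetical helper)."""
    word = _da_step2(word, r1_start)

    # Step 3: 'igst' -> 'ig', checked on the whole word, not just R1
    if word.endswith('igst'):
        word = word[:-2]
    r1 = word[r1_start:]
    if r1.endswith('elig'):
        word = _da_step2(word[:-4], r1_start)
    elif r1.endswith('løst'):
        word = word[:-1]  # 'løst' -> 'løs'; Step 2 is not repeated here
    elif r1.endswith(('lig', 'els')):
        word = _da_step2(word[:-3], r1_start)
    elif r1.endswith('ig'):
        word = _da_step2(word[:-2], r1_start)

    # Step 4: undouble a final consonant pair whose last letter is in R1
    if (
        len(word) > r1_start
        and len(word) >= 2
        and word[-1] == word[-2]
        and word[-1] not in _DA_VOWELS
    ):
        word = word[:-1]
    return word


# Example: _da_steps_2_to_4('hjertelig', 4) returns 'hjert' ('elig' is
# deleted from R1 = 'telig', and the repeated Step 2 finds nothing to drop).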


if __name__ == '__main__':
    import doctest

    doctest.testmod()