Completed
Push — master ( f43547...71985b )
by Chris
12:00 queued 10s
created

abydos.stemmer._snowball.sb_dutch()   F

Complexity

Conditions 66

Size

Total Lines 137
Code Lines 93

Duplication

Lines 0
Ratio 0 %

Code Coverage

Tests 74
CRAP Score 66

Importance

Changes 0
Metric Value
eloc 93
dl 0
loc 137
ccs 74
cts 74
cp 1
rs 0
c 0
b 0
f 0
cc 66
nop 1
crap 66

How to fix: Long Method, Complexity

Long Method

Small methods make your code easier to understand, in particular if combined with a good name. Besides, if your method is small, finding a good name is usually much easier.

For example, if you find yourself adding comments to a method's body, this is usually a good sign to extract the commented part to a new method, and use the comment as a starting point when coming up with a good name for this new method.
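As a minimal sketch of that advice, the block below lifts two commented steps out of a longer function and turns each comment into a function name. All names here (`ACCENT_MAP`, `clean_long`, `fold_accents`, `undouble_ending`, `clean`) are hypothetical illustrations and are not taken from Abydos.

```python
# Hypothetical accent table, for illustration only.
ACCENT_MAP = {"ä": "a", "ë": "e", "ï": "i", "ö": "o", "ü": "u"}


def clean_long(word):
    """Long-method style: every step is introduced by a comment."""
    # lowercase and fold accented vowels
    word = word.lower()
    word = "".join(ACCENT_MAP.get(ch, ch) for ch in word)
    # undouble a trailing kk, dd or tt
    if word[-2:] in ("kk", "dd", "tt"):
        word = word[:-1]
    return word


def fold_accents(word):
    """Lowercase the word and replace accented vowels with plain ones."""
    return "".join(ACCENT_MAP.get(ch, ch) for ch in word.lower())


def undouble_ending(word):
    """Drop one letter of a trailing kk, dd or tt."""
    return word[:-1] if word[-2:] in ("kk", "dd", "tt") else word


def clean(word):
    """Same behavior as clean_long, expressed as named steps."""
    return undouble_ending(fold_accents(word))
```

Both versions return the same result; the second one only moves each commented step into a helper whose name replaces the comment.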

The refactoring most commonly applied here is Extract Method.

Complexity

Complex units like abydos.stemmer._snowball.sb_dutch() often do a lot of different things. To break such a unit down, we need to identify a cohesive component within it. A common approach to finding such a component is to look for fields or methods that share the same prefix or suffix.

Once you have determined the fields that belong together, you can apply the Extract Class refactoring. If the component makes sense as a sub-class, Extract Subclass is also a candidate, and is often faster.
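For a single function with a high condition count, a complementary option is to move a chain of similar branches into data. The sketch below is an illustration under assumptions: `SUFFIX_RULES` and `apply_first_rule` are hypothetical names, and the rules are not the Abydos or Snowball Dutch rule set.

```python
# Hypothetical (suffix, replacement) pairs, longest suffix first so that
# a longer match wins over a shorter one.
SUFFIX_RULES = (
    ("heden", "heid"),
    ("ene", ""),
    ("en", ""),
    ("se", ""),
    ("s", ""),
)


def apply_first_rule(word, rules=SUFFIX_RULES):
    """Apply the first matching rule; one loop replaces an if/elif chain."""
    for suffix, replacement in rules:
        if word.endswith(suffix):
            return word[: len(word) - len(suffix)] + replacement
    return word


print(apply_first_rule("mogelijkheden"))  # mogelijkheid
print(apply_first_rule("katten"))         # katt
print(apply_first_rule("boek"))           # boek (no rule matches)
```

Adding a rule then means adding a table row, which leaves the number of conditions in the function unchanged.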

# -*- coding: utf-8 -*-

# Copyright 2014-2018 by Christopher C. Little.
# This file is part of Abydos.
#
# Abydos is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# Abydos is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with Abydos. If not, see <http://www.gnu.org/licenses/>.

"""abydos.stemmer._snowball.

Snowball Stemmer base class
"""

from __future__ import (
    absolute_import,
    division,
    print_function,
    unicode_literals,
)

from six.moves import range

from ._stemmer import _Stemmer


class _Snowball(_Stemmer):
    # Review issue (Unused Code): The variable __class__ seems to be unused.
    """Snowball stemmer base class."""

    _vowels = set('aeiouy')
    _codanonvowels = set('\'bcdfghjklmnpqrstvz')

    def _sb_r1(self, term, r1_prefixes=None):
        """Return the start index of the R1 region (Porter2 specification).

        Parameters
        ----------
        term : str
            The term to examine
        r1_prefixes : set
            Prefixes to consider

        Returns
        -------
        int
            Start index of the R1 region (len(term) if R1 is empty)

        """
        vowel_found = False
        if hasattr(r1_prefixes, '__iter__'):
            for prefix in r1_prefixes:
                if term[: len(prefix)] == prefix:
                    return len(prefix)

        for i in range(len(term)):
            # Review issue (unused-code): Consider using enumerate instead of
            # iterating with range and len
            if not vowel_found and term[i] in self._vowels:
                vowel_found = True
            elif vowel_found and term[i] not in self._vowels:
                return i + 1
        return len(term)

    def _sb_r2(self, term, r1_prefixes=None):
        """Return the start index of the R2 region (Porter2 specification).

        Parameters
        ----------
        term : str
            The term to examine
        r1_prefixes : set
            Prefixes to consider

        Returns
        -------
        int
            Start index of the R2 region (len(term) if R2 is empty)

        """
        r1_start = self._sb_r1(term, r1_prefixes)
        return r1_start + self._sb_r1(term[r1_start:])

    def _sb_ends_in_short_syllable(self, term):
        """Return True iff term ends in a short syllable.

        (...according to the Porter2 specification.)

        NB: This is akin to the CVC test from the Porter stemmer. The
        description is unfortunately poor/ambiguous.

        Parameters
        ----------
        term : str
            The term to examine

        Returns
        -------
        bool
            True iff term ends in a short syllable

        """
        if not term:
            return False
        if len(term) == 2:
            if term[-2] in self._vowels and term[-1] not in self._vowels:
                return True
        elif len(term) >= 3:
            # Review issue (Coding Style), reported on each of the three
            # condition lines below: Wrong hanging indentation before block
            # (add 4 spaces).
            if (
                term[-3] not in self._vowels
                and term[-2] in self._vowels
                and term[-1] in self._codanonvowels
            ):
                return True
        return False

    def _sb_short_word(self, term, r1_prefixes=None):
        """Return True iff term is a short word.

        (...according to the Porter2 specification.)

        Parameters
        ----------
        term : str
            The term to examine
        r1_prefixes : set
            Prefixes to consider

        Returns
        -------
        bool
            True iff term is a short word

        """
        # Review issue (Coding Style), reported on the `term` line below:
        # Wrong hanging indentation before block (add 4 spaces).
        if self._sb_r1(term, r1_prefixes) == len(
            term
        ) and self._sb_ends_in_short_syllable(term):
            return True
        return False

    def _sb_has_vowel(self, term):
        """Return True iff the term contains a vowel.

        Parameters
        ----------
        term : str
            The term to examine

        Returns
        -------
        bool
            True iff a vowel exists in the term (as defined in the Porter
            stemmer definition)

        """
        for letter in term:
            if letter in self._vowels:
                return True
        return False


if __name__ == '__main__':
    import doctest

    doctest.testmod()

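The region helpers in the listing above can be tried outside the class with a standalone sketch. It assumes the same vowel and coda sets as `_Snowball`. The module-level names (`VOWELS`, `CODA_NONVOWELS`, `r1_start`, `r2_start`, `ends_in_short_syllable`) are hypothetical and not part of Abydos. The loop uses `enumerate`, as the review note on `_sb_r1` suggests.

```python
VOWELS = set("aeiouy")
CODA_NONVOWELS = set("'bcdfghjklmnpqrstvz")


def r1_start(term, r1_prefixes=None):
    """Return the index at which R1 starts (len(term) if R1 is empty)."""
    for prefix in r1_prefixes or ():
        if term.startswith(prefix):
            return len(prefix)
    vowel_found = False
    for i, letter in enumerate(term):
        if letter in VOWELS:
            vowel_found = True
        elif vowel_found:
            # First non-vowel after a vowel: R1 starts right after it.
            return i + 1
    return len(term)


def r2_start(term, r1_prefixes=None):
    """Return the index at which R2 starts: the R1 rule applied inside R1."""
    start = r1_start(term, r1_prefixes)
    return start + r1_start(term[start:])


def ends_in_short_syllable(term):
    """Return True iff term ends in a short syllable (same test as above)."""
    if len(term) == 2:
        return term[0] in VOWELS and term[1] not in VOWELS
    if len(term) >= 3:
        return (
            term[-3] not in VOWELS
            and term[-2] in VOWELS
            and term[-1] in CODA_NONVOWELS
        )
    return False


word = "beautiful"
print(word[r1_start(word):])           # iful
print(word[r2_start(word):])           # ul
print(ends_in_short_syllable("trap"))  # True
```

For "beautiful" the helpers return 5 and 7, so the values are offsets into the term rather than region lengths ("iful" has length 4).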