SpanishMetaphone.encode()   F

Complexity: Conditions 29
Size: Total Lines 187, Code Lines 101
Duplication: Lines 0, Ratio 0 %
Code Coverage: Tests 81, CRAP Score 29
Importance: Changes 0

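Metric  Value  Meaning
eloc    101    code lines
dl      0      duplicated lines
loc     187    total lines
ccs     81
cts     81
cp      1      code coverage (1 = 100 %)
rs      0
c       0
b       0
f       0
cc      29     cyclomatic complexity (conditions)
nop     2      number of parameters
crap    29     CRAP score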
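The crap value combines cc and cp. A commonly used form of the CRAP (Change Risk Anti-Patterns) formula is CRAP = cc^2 * (1 - cp)^3 + cc, with cp as a fraction between 0 and 1. Because encode() is fully covered (cp = 1), the first term vanishes and the score equals the cyclomatic complexity, 29. More tests therefore cannot lower it; only reducing cc can. A minimal Python sketch of the calculation (the function name crap_score is illustrative):

```python
def crap_score(complexity: int, coverage: float) -> float:
    """Return the CRAP score of a method.

    complexity: cyclomatic complexity (the cc metric)
    coverage: covered fraction of the method, 0.0 to 1.0 (the cp metric)
    """
    if not 0.0 <= coverage <= 1.0:
        raise ValueError('coverage must be between 0.0 and 1.0')
    # The uncovered fraction is cubed, so the complexity-squared penalty
    # shrinks quickly with coverage and disappears at full coverage.
    return complexity ** 2 * (1.0 - coverage) ** 3 + complexity


print(crap_score(29, 1.0))  # 29.0, the value reported for encode()
print(crap_score(29, 0.0))  # 870.0 if the method had no test coverage
```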

How to fix

Long Method

Small methods make your code easier to understand, in particular if combined with a good name. Besides, if your method is small, finding a good name is usually much easier.

For example, if you find yourself adding comments to a method's body, this is usually a good sign to extract the commented part to a new method, and use the comment as a starting point when coming up with a good name for this new method.

The most commonly applied refactoring is Extract Method.
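As a sketch of extracting a method from encode(): its 'C' branch already carries one comment per case, and those comments suggest both the boundaries and the names. The standalone function encode_c below and its (code, next position) return value are hypothetical choices for illustration, not part of Abydos. Inside encode(), the whole branch would shrink to two statements: code, pos = encode_c(word, pos) and meta_key += code.

```python
from typing import Tuple


def encode_c(word: str, pos: int) -> Tuple[str, int]:
    """Encode the 'C' at word[pos]; return (code, next position).

    Hypothetical helper extracted from SpanishMetaphone.encode(). word is
    assumed to be already upper-cased and normalized, as encode() does.
    """
    following = word[pos + 1 : pos + 2]  # '' if C is the last letter
    if following == 'C':
        # special case 'acción', 'reacción', etc.
        return 'X', pos + 2
    if following in {'E', 'I'}:
        # special case 'cesar', 'cien', 'cid', 'conciencia'
        return 'Z', pos + 2
    # base case
    return 'K', pos + 1


print(encode_c('ACCION', 1))  # ('X', 3)
print(encode_c('CIEN', 0))  # ('Z', 2)
print(encode_c('CASA', 0))  # ('K', 1)
```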

Complexity

Complex code like abydos.phonetic._spanish_metaphone.SpanishMetaphone.encode() often does a lot of different things. To break it down, we need to identify a cohesive component within it. A common approach to finding such a component is to look for fields or methods that share the same prefixes or suffixes.

Once you have determined the fields that belong together, you can apply the Extract Class refactoring. If the component makes sense as a sub-class, Extract Subclass is also a candidate, and is often faster.
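In encode(), one cohesive component stands out: the preprocessing that upper-cases and NFC-normalizes the word and then applies two runs of word.replace() calls before any letter is encoded. A minimal sketch of moving it into its own class follows. The class name SpanishNormalizer and the table layout are hypothetical; the replacement order is copied from encode() because it matters (for example, 'GÜ' must become 'W' before the remaining 'Ü' becomes 'U').

```python
from typing import Sequence, Tuple
from unicodedata import normalize as unicode_normalize


class SpanishNormalizer:
    """Hypothetical class holding the preprocessing step of encode().

    The tables mirror the order of the replace() calls in
    SpanishMetaphone.encode(); changing the order can change the output.
    """

    # applied only in the modified version, before SIMPLE_REPLACEMENTS
    MODIFIED_REPLACEMENTS: Sequence[Tuple[str, str]] = (
        ('MB', 'NB'),
        ('MP', 'NP'),
        ('BS', 'S'),
    )
    SIMPLE_REPLACEMENTS: Sequence[Tuple[str, str]] = (
        ('Á', 'A'), ('CH', 'X'), ('Ç', 'S'), ('É', 'E'), ('Í', 'I'),
        ('Ó', 'O'), ('Ú', 'U'), ('Ñ', 'NY'), ('GÜ', 'W'), ('Ü', 'U'),
        ('B', 'V'), ('LL', 'Y'),
    )

    def __init__(self, modified: bool = False) -> None:
        self.modified = modified

    def normalize(self, word: str) -> str:
        """Upper-case and NFC-normalize word, then apply the tables."""
        word = unicode_normalize('NFC', word.upper())
        if self.modified:
            for old, new in self.MODIFIED_REPLACEMENTS:
                word = word.replace(old, new)
            # the modified version drops the P of an initial 'PS'
            if word[:2] == 'PS':
                word = word[1:]
        for old, new in self.SIMPLE_REPLACEMENTS:
            word = word.replace(old, new)
        return word


print(SpanishNormalizer().normalize('Nicolás'))  # NICOLAS
print(SpanishNormalizer().normalize('Muñoz'))  # MUNYOZ
print(SpanishNormalizer(modified=True).normalize('psicología'))  # SICOLOGIA
```

encode() would then call normalize() once and keep only the letter-by-letter loop.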

# Copyright 2018-2020 by Christopher C. Little.
# This file is part of Abydos.
#
# Abydos is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# Abydos is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with Abydos. If not, see <http://www.gnu.org/licenses/>.

"""abydos.phonetic._spanish_metaphone.

Spanish Metaphone
"""

from unicodedata import normalize as unicode_normalize

from ._phonetic import _Phonetic

__all__ = ['SpanishMetaphone']


class SpanishMetaphone(_Phonetic):
    """Spanish Metaphone.

    This is a quick rewrite of the Spanish Metaphone Algorithm, as presented at
    https://github.com/amsqr/Spanish-Metaphone and discussed in
    :cite:`Mosquera:2012`.

    Modified version based on :cite:`delPilarAngeles:2016`.

    .. versionadded:: 0.3.6
    """

    def __init__(self, max_length: int = 6, modified: bool = False) -> None:
        """Initialize SpanishMetaphone instance.

        Parameters
        ----------
        max_length : int
            The length of the code returned (defaults to 6)
        modified : bool
            Set to True to use del Pilar Angeles & Bailón-Miguel's modified
            version of the algorithm

        .. versionadded:: 0.4.0

        """
        self._max_length = max_length
        self._modified = modified

    def encode(self, word: str) -> str:
        """Return the Spanish Metaphone of a word.

        Parameters
        ----------
        word : str
            The word to transform

        Returns
        -------
        str
            The Spanish Metaphone code

        Examples
        --------
        >>> pe = SpanishMetaphone()
        >>> pe.encode('Perez')
        'PRZ'
        >>> pe.encode('Martinez')
        'MRTNZ'
        >>> pe.encode('Gutierrez')
        'GTRRZ'
        >>> pe.encode('Santiago')
        'SNTG'
        >>> pe.encode('Nicolás')
        'NKLS'

        .. versionadded:: 0.3.0
        .. versionchanged:: 0.3.6
            Encapsulated in class

        """

        def _is_vowel(pos: int) -> bool:
            """Return True if the character at word[pos] is a vowel.

            Parameters
            ----------
            pos : int
                Position to check for a vowel

            Returns
            -------
            bool
                True if word[pos] is a vowel

            .. versionadded:: 0.3.0

            """
            return pos < len(word) and word[pos] in {'A', 'E', 'I', 'O', 'U'}

        word = unicode_normalize('NFC', word.upper())

        meta_key = ''
        pos = 0

        # do some replacements for the modified version
        if self._modified:
            word = word.replace('MB', 'NB')
            word = word.replace('MP', 'NP')
            word = word.replace('BS', 'S')
            if word[:2] == 'PS':
                word = word[1:]

        # simple replacements
        word = word.replace('Á', 'A')
        word = word.replace('CH', 'X')
        word = word.replace('Ç', 'S')
        word = word.replace('É', 'E')
        word = word.replace('Í', 'I')
        word = word.replace('Ó', 'O')
        word = word.replace('Ú', 'U')
        word = word.replace('Ñ', 'NY')
        word = word.replace('GÜ', 'W')
        word = word.replace('Ü', 'U')
        word = word.replace('B', 'V')
        word = word.replace('LL', 'Y')

        while len(meta_key) < self._max_length:
            if pos >= len(word):
                break

            # get the next character
            current_char = word[pos]

            # if a vowel is in pos 0, add it to the key
            if _is_vowel(pos) and pos == 0:
                meta_key += current_char
                pos += 1
            # otherwise, do consonant rules
            else:
                # simple consonants (unmutated)
                if current_char in {
                    'D',
                    'F',
                    'J',
                    'K',
                    'M',
                    'N',
                    'P',
                    'T',
                    'V',
                    'L',
                    'Y',
                }:
                    meta_key += current_char
                    # skip doubled consonants
                    if word[pos + 1 : pos + 2] == current_char:
                        pos += 2
                    else:
                        pos += 1
                else:
                    if current_char == 'C':
                        # special case 'acción', 'reacción', etc.
                        if word[pos + 1 : pos + 2] == 'C':
                            meta_key += 'X'
                            pos += 2
                        # special case 'cesar', 'cien', 'cid', 'conciencia'
                        elif word[pos + 1 : pos + 2] in {'E', 'I'}:
                            meta_key += 'Z'
                            pos += 2
                        # base case
                        else:
                            meta_key += 'K'
                            pos += 1
                    elif current_char == 'G':
                        # special case 'gente', 'ecología', etc.
                        if word[pos + 1 : pos + 2] in {'E', 'I'}:
                            meta_key += 'J'
                            pos += 2
                        # base case
                        else:
                            meta_key += 'G'
                            pos += 1
                    elif current_char == 'H':
                        # since the letter 'H' is silent in Spanish,
                        # set the meta key to the vowel after the letter 'H'
                        if _is_vowel(pos + 1):
                            meta_key += word[pos + 1]
                            pos += 2
                        else:
                            meta_key += 'H'
                            pos += 1
                    elif current_char == 'Q':
                        if word[pos + 1 : pos + 2] == 'U':
                            pos += 2
                        else:
                            pos += 1
                        meta_key += 'K'
                    elif current_char == 'W':
                        meta_key += 'U'
                        pos += 1
                    elif current_char == 'R':
                        meta_key += 'R'
                        pos += 1
                    elif current_char == 'S':
                        if not _is_vowel(pos + 1) and pos == 0:
                            meta_key += 'ES'
                            pos += 1
                        else:
                            meta_key += 'S'
                            pos += 1
                    elif current_char == 'Z':
                        meta_key += 'Z'
                        pos += 1
                    elif current_char == 'X':
                        if (
                            len(word) > 1
                            and pos == 0
                            and not _is_vowel(pos + 1)
                        ):
                            meta_key += 'EX'
                            pos += 1
                        else:
                            meta_key += 'X'
                            pos += 1
                    else:
                        pos += 1

        # Final change from S to Z in modified version
        if self._modified:
            meta_key = meta_key.replace('S', 'Z')

        return meta_key


if __name__ == '__main__':
    import doctest

    doctest.testmod()
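Much of encode()'s branching is the if/elif chain that picks a rule for each consonant, so a further option is to replace that chain with a lookup table from letter to rule function. The sketch below is a hypothetical standalone rewrite of the unmodified algorithm only (modified=False). spanish_metaphone, RULES, SIMPLE, REPLACEMENTS and the rule functions are illustrative names, not part of Abydos. It is written to reproduce the docstring examples, but it has not been checked against Abydos's own test suite.

```python
from typing import Callable, Dict, Tuple
from unicodedata import normalize as unicode_normalize

# Hypothetical names throughout; none of these exist in Abydos.
VOWELS = frozenset('AEIOU')
# Letters copied to the key unchanged; a doubled letter is consumed once.
SIMPLE = frozenset('DFJKMNPTVLY')
# The unmodified encode()'s replacements, in the same order.
REPLACEMENTS = (
    ('Á', 'A'), ('CH', 'X'), ('Ç', 'S'), ('É', 'E'), ('Í', 'I'), ('Ó', 'O'),
    ('Ú', 'U'), ('Ñ', 'NY'), ('GÜ', 'W'), ('Ü', 'U'), ('B', 'V'), ('LL', 'Y'),
)

Rule = Callable[[str, int], Tuple[str, int]]


def _next(word: str, pos: int) -> str:
    """Return the character after word[pos], or '' at the end of word."""
    return word[pos + 1 : pos + 2]


def _c(word: str, pos: int) -> Tuple[str, int]:
    if _next(word, pos) == 'C':  # 'acción'
        return 'X', pos + 2
    if _next(word, pos) in {'E', 'I'}:  # 'cesar', 'cien'
        return 'Z', pos + 2
    return 'K', pos + 1


def _g(word: str, pos: int) -> Tuple[str, int]:
    if _next(word, pos) in {'E', 'I'}:  # 'gente'
        return 'J', pos + 2
    return 'G', pos + 1


def _h(word: str, pos: int) -> Tuple[str, int]:
    # H is silent: emit the following vowel instead, if there is one
    following = _next(word, pos)
    if following in VOWELS:
        return following, pos + 2
    return 'H', pos + 1


def _q(word: str, pos: int) -> Tuple[str, int]:
    # QU is consumed as a unit
    return 'K', pos + (2 if _next(word, pos) == 'U' else 1)


def _s(word: str, pos: int) -> Tuple[str, int]:
    if pos == 0 and _next(word, pos) not in VOWELS:
        return 'ES', pos + 1
    return 'S', pos + 1


def _x(word: str, pos: int) -> Tuple[str, int]:
    if pos == 0 and len(word) > 1 and _next(word, pos) not in VOWELS:
        return 'EX', pos + 1
    return 'X', pos + 1


RULES: Dict[str, Rule] = {
    'C': _c, 'G': _g, 'H': _h, 'Q': _q, 'S': _s, 'X': _x,
    'R': lambda word, pos: ('R', pos + 1),
    'W': lambda word, pos: ('U', pos + 1),
    'Z': lambda word, pos: ('Z', pos + 1),
}


def spanish_metaphone(word: str, max_length: int = 6) -> str:
    """Table-driven sketch of the unmodified Spanish Metaphone loop."""
    word = unicode_normalize('NFC', word.upper())
    for old, new in REPLACEMENTS:
        word = word.replace(old, new)
    key, pos = '', 0
    while len(key) < max_length and pos < len(word):
        char = word[pos]
        if pos == 0 and char in VOWELS:
            key += char
            pos += 1
        elif char in SIMPLE:
            key += char
            pos += 2 if _next(word, pos) == char else 1
        elif char in RULES:
            code, pos = RULES[char](word, pos)
            key += code
        else:
            pos += 1  # vowels after position 0 and other characters
    return key


# Expected, per the docstring examples: PRZ MRTNZ GTRRZ SNTG NKLS
print(*(spanish_metaphone(n) for n in
        ('Perez', 'Martinez', 'Gutierrez', 'Santiago', 'Nicolás')))
```

Each rule function is small enough to test on its own, and the main loop keeps only four branches. The trade-off is one extra level of indirection when reading the rule for a single letter.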