Completed
Branch master (78a222)
by Chris
14:36
created

abydos.phonetic._es.spanish_metaphone()   F

Complexity

Conditions 29

Size

Total Lines 161
Code Lines 98

Duplication

Lines 0
Ratio 0 %

Code Coverage

Tests 87
CRAP Score 29

Importance

Changes 0
Metric Value
eloc 98
dl 0
loc 161
ccs 87
cts 87
cp 1
rs 0
c 0
b 0
f 0
cc 29
nop 3
crap 29
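
The crap value above is consistent with the widely used CRAP formula, complexity² × (1 − coverage)³ + complexity. The sketch below assumes that cc is the cyclomatic complexity and that cp is the coverage expressed as a fraction (1 meaning fully covered); the report itself does not spell this out, and crap_score is a hypothetical helper name.

```python
def crap_score(complexity, coverage):
    """Return the CRAP score for a unit of code.

    complexity: cyclomatic complexity (assumed to be the report's "cc")
    coverage:   covered fraction in [0, 1] (assumed to be the report's "cp")
    """
    return complexity ** 2 * (1.0 - coverage) ** 3 + complexity


# With full coverage the score collapses to the complexity itself.
print(crap_score(29, 1.0))  # 29.0
# With no coverage at all the squared term dominates.
print(crap_score(29, 0.0))  # 870.0
```

Under these assumptions, full test coverage cannot push the score below the complexity, so only splitting the function up would lower it further.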

How to fix

Long Method

Small methods make your code easier to understand, in particular if combined with a good name. Besides, if your method is small, finding a good name is usually much easier.

For example, if you find yourself adding comments to a method's body, this is usually a good sign that the commented part should be extracted into a new method, with the comment serving as a starting point for the new method's name.

Commonly applied refactorings include Extract Method.

Complexity

Complex code units like abydos.phonetic._es.spanish_metaphone() often do a lot of different things. To break such a unit down, we need to identify a cohesive component within it. For a class, a common approach to finding such a component is to look for fields or methods that share the same prefixes or suffixes.

Once you have determined the fields that belong together, you can apply the Extract Class refactoring. If the component makes sense as a subclass, Extract Subclass is also a candidate, and is often quicker to apply.
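
Since spanish_metaphone() is a function rather than a class, Extract Class does not apply to it directly. A different technique that can lower the condition count is to move each letter's branch into a small rule function and select it through a dictionary. The sketch below does this for the C, G and Q branches of the listing only; _rule_c, _rule_g, _rule_q, _RULES and encode_at are hypothetical names, and each rule returns the code to append together with the number of characters to advance.

```python
def _rule_c(word, pos):
    """C: 'CC' gives X, C before E/I gives Z, otherwise K."""
    following = word[pos + 1 : pos + 2]
    if following == 'C':
        return 'X', 2
    if following in {'E', 'I'}:
        return 'Z', 2
    return 'K', 1


def _rule_g(word, pos):
    """G: before E/I gives J, otherwise G."""
    if word[pos + 1 : pos + 2] in {'E', 'I'}:
        return 'J', 2
    return 'G', 1


def _rule_q(word, pos):
    """Q: always K; a following U is skipped."""
    return 'K', (2 if word[pos + 1 : pos + 2] == 'U' else 1)


# Dispatch table: one entry per letter instead of one elif per letter.
_RULES = {'C': _rule_c, 'G': _rule_g, 'Q': _rule_q}


def encode_at(word, pos):
    """Return (code, advance) for the character at word[pos]."""
    rule = _RULES.get(word[pos])
    if rule is None:
        # letters without a rule contribute nothing and are skipped
        return '', 1
    return rule(word, pos)


print(encode_at('ACCION', 1))  # ('X', 2)
print(encode_at('GENTE', 0))   # ('J', 2)
```

Each rule can then be tested on its own, and the main loop shrinks to a lookup plus two updates (meta_key and pos).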

# -*- coding: utf-8 -*-

# Copyright 2018 by Christopher C. Little.
# This file is part of Abydos.
#
# Abydos is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# Abydos is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with Abydos. If not, see <http://www.gnu.org/licenses/>.

"""abydos.phonetic._es.

The phonetic._es module implements phonetic algorithms intended for Spanish,
including:

    - Phonetic Spanish
    - Spanish Metaphone
"""

from __future__ import unicode_literals

from unicodedata import normalize as unicode_normalize

from six import text_type

__all__ = ['phonetic_spanish', 'spanish_metaphone']

def phonetic_spanish(word, max_length=-1):
    """Return the PhoneticSpanish coding of word.

    This follows the coding described in :cite:`Amon:2012` and
    :cite:`delPilarAngeles:2015`.

    :param str word: the word to transform
    :param int max_length: the length of the code returned (defaults to
        unlimited)
    :returns: the PhoneticSpanish code
    :rtype: str

    >>> phonetic_spanish('Perez')
    '094'
    >>> phonetic_spanish('Martinez')
    '69364'
    >>> phonetic_spanish('Gutierrez')
    '83994'
    >>> phonetic_spanish('Santiago')
    '4638'
    >>> phonetic_spanish('Nicolás')
    '6454'
    """
    # Review note (Comprehensibility, Best Practice) on the zip() line below:
    # "The variable _ does not seem to be defined."
    _es_soundex_translation = dict(
        zip((ord(_) for _ in 'BCDFGHJKLMNPQRSTVXYZ'), '14328287566079431454')
    )

    # uppercase, normalize, and decompose, filter to A-Z minus vowels & W
    word = unicode_normalize('NFKD', text_type(word.upper()))
    word = ''.join(
        c
        for c in word
        if c
        in {
            'B',
            'C',
            'D',
            'F',
            'G',
            'H',
            'J',
            'K',
            'L',
            'M',
            'N',
            'P',
            'Q',
            'R',
            'S',
            'T',
            'V',
            'X',
            'Y',
            'Z',
        }
    )

    # merge repeated Ls & Rs
    word = word.replace('LL', 'L')
    # NOTE: replacing 'R' with 'R' is a no-op; 'RR' was probably intended,
    # but the 'Gutierrez' doctest above ('83994') reflects this behavior.
    word = word.replace('R', 'R')

    # apply the Soundex algorithm
    sdx = word.translate(_es_soundex_translation)

    if max_length > 0:
        sdx = (sdx + ('0' * max_length))[:max_length]

    return sdx


def spanish_metaphone(word, max_length=6, modified=False):
    """Return the Spanish Metaphone of a word.

    This is a quick rewrite of the Spanish Metaphone Algorithm, as presented at
    https://github.com/amsqr/Spanish-Metaphone and discussed in
    :cite:`Mosquera:2012`.

    Modified version based on :cite:`delPilarAngeles:2016`.

    :param str word: the word to transform
    :param int max_length: the length of the code returned (defaults to 6)
    :param bool modified: Set to True to use del Pilar Angeles &
        Bailón-Miguel's modified version of the algorithm
    :returns: the Spanish Metaphone code
    :rtype: str

    >>> spanish_metaphone('Perez')
    'PRZ'
    >>> spanish_metaphone('Martinez')
    'MRTNZ'
    >>> spanish_metaphone('Gutierrez')
    'GTRRZ'
    >>> spanish_metaphone('Santiago')
    'SNTG'
    >>> spanish_metaphone('Nicolás')
    'NKLS'
    """

    def _is_vowel(pos):
        """Return True if the character at word[pos] is a vowel."""
        return pos < len(word) and word[pos] in {'A', 'E', 'I', 'O', 'U'}

    word = unicode_normalize('NFC', text_type(word.upper()))

    meta_key = ''
    pos = 0

    # do some replacements for the modified version
    if modified:
        word = word.replace('MB', 'NB')
        word = word.replace('MP', 'NP')
        word = word.replace('BS', 'S')
        if word[:2] == 'PS':
            word = word[1:]

    # simple replacements
    word = word.replace('Á', 'A')
    word = word.replace('CH', 'X')
    word = word.replace('Ç', 'S')
    word = word.replace('É', 'E')
    word = word.replace('Í', 'I')
    word = word.replace('Ó', 'O')
    word = word.replace('Ú', 'U')
    word = word.replace('Ñ', 'NY')
    word = word.replace('GÜ', 'W')
    word = word.replace('Ü', 'U')
    word = word.replace('B', 'V')
    word = word.replace('LL', 'Y')

    while len(meta_key) < max_length:
        if pos >= len(word):
            break

        # get the next character
        current_char = word[pos]

        # if a vowel in pos 0, add to key
        if _is_vowel(pos) and pos == 0:
            meta_key += current_char
            pos += 1
        # otherwise, do consonant rules
        else:
            # simple consonants (unmutated)
            # Review note (Coding Style), reported for every element of the
            # set below: "Wrong hanging indentation before block (add 4
            # spaces)."
            if current_char in {
                'D',
                'F',
                'J',
                'K',
                'M',
                'N',
                'P',
                'T',
                'V',
                'L',
                'Y',
            }:
                meta_key += current_char
                # skip doubled consonants
                if word[pos + 1 : pos + 2] == current_char:
                    pos += 2
                else:
                    pos += 1
            else:
                if current_char == 'C':
                    # special case 'acción', 'reacción', etc.
                    if word[pos + 1 : pos + 2] == 'C':
                        meta_key += 'X'
                        pos += 2
                    # special case 'cesar', 'cien', 'cid', 'conciencia'
                    elif word[pos + 1 : pos + 2] in {'E', 'I'}:
                        meta_key += 'Z'
                        pos += 2
                    # base case
                    else:
                        meta_key += 'K'
                        pos += 1
                elif current_char == 'G':
                    # special case 'gente', 'ecologia', etc.
                    if word[pos + 1 : pos + 2] in {'E', 'I'}:
                        meta_key += 'J'
                        pos += 2
                    # base case
                    else:
                        meta_key += 'G'
                        pos += 1
                elif current_char == 'H':
                    # since the letter 'H' is silent in Spanish,
                    # set the meta key to the vowel after the letter 'H'
                    if _is_vowel(pos + 1):
                        meta_key += word[pos + 1]
                        pos += 2
                    else:
                        meta_key += 'H'
                        pos += 1
                elif current_char == 'Q':
                    if word[pos + 1 : pos + 2] == 'U':
                        pos += 2
                    else:
                        pos += 1
                    meta_key += 'K'
                elif current_char == 'W':
                    meta_key += 'U'
                    pos += 1
                elif current_char == 'R':
                    meta_key += 'R'
                    pos += 1
                elif current_char == 'S':
                    if not _is_vowel(pos + 1) and pos == 0:
                        meta_key += 'ES'
                        pos += 1
                    else:
                        meta_key += 'S'
                        pos += 1
                elif current_char == 'Z':
                    meta_key += 'Z'
                    pos += 1
                elif current_char == 'X':
                    if len(word) > 1 and pos == 0 and not _is_vowel(pos + 1):
                        meta_key += 'EX'
                        pos += 1
                    else:
                        meta_key += 'X'
                        pos += 1
                else:
                    pos += 1

    # Final change from S to Z in modified version
    if modified:
        meta_key = meta_key.replace('S', 'Z')

    return meta_key


if __name__ == '__main__':
    import doctest

    doctest.testmod()