abydos.phonetic._es - Code Metrics - Inspection of "78a222a9f7d8976f6744d263e3d6d01a2a991c27" - chrislit/abydos - Measure and Improve Code Quality continuously with Scrutinizer


Branch — master (78a222)

by Chris

created 2018-10-26 11:30 UTC

abydos.phonetic._es (rating A)


Size/Duplication

Total Lines	274
Duplicated Lines	0 %

Test Coverage

Coverage	100%

Metric	Value
wmc	31
eloc	141
dl	0
loc	274
ccs	102
cts	102
cp	1
rs	9.92
c	0
b	0
f	0

2 Functions

Rating	Name	Duplication	Size	Complexity
F	spanish_metaphone()	0	161	29
A	phonetic_spanish()	0	68	2
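The `phonetic_spanish()` function listed above reduces to a Soundex-style digit substitution over the consonants of the word. The sketch below restates that pipeline in plain Python 3 (using `str.maketrans` instead of `six`) so the steps can be tried in isolation. It is a simplification, not the Abydos implementation: `phonetic_spanish_sketch` is a hypothetical name, and the LL/RR merging step is deliberately omitted, so results may differ from the real function for words containing those letter pairs.

```python
import unicodedata

# Consonants kept by the coding and the digit each maps to, copied from
# the translation table used by phonetic_spanish().
CONSONANTS = 'BCDFGHJKLMNPQRSTVXYZ'
DIGITS = '14328287566079431454'
TABLE = str.maketrans(CONSONANTS, DIGITS)


def phonetic_spanish_sketch(word, max_length=-1):
    """Hypothetical simplified PhoneticSpanish (no LL/RR merging)."""
    # Uppercase and NFKD-decompose, so 'Á' becomes 'A' + combining accent.
    word = unicodedata.normalize('NFKD', word.upper())
    # Drop vowels, W, combining accents and anything else not in the table.
    word = ''.join(c for c in word if c in CONSONANTS)
    code = word.translate(TABLE)
    if max_length > 0:
        # Right-pad with zeros, then truncate to exactly max_length digits.
        code = (code + '0' * max_length)[:max_length]
    return code


print(phonetic_spanish_sketch('Nicolás'))   # 6454
print(phonetic_spanish_sketch('Perez', 5))  # 09400
```

Passing a positive `max_length` zero-pads short codes, which keeps fixed-width keys comparable when they are used for blocking or indexing.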

# -*- coding: utf-8 -*-

# Copyright 2018 by Christopher C. Little.
# This file is part of Abydos.
#
# Abydos is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# Abydos is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with Abydos. If not, see <http://www.gnu.org/licenses/>.

"""abydos.phonetic._es.

The phonetic._es module implements phonetic algorithms intended for Spanish,
including:

    - Phonetic Spanish
    - Spanish Metaphone
"""

from __future__ import unicode_literals

from unicodedata import normalize as unicode_normalize

from six import text_type

__all__ = ['phonetic_spanish', 'spanish_metaphone']


def phonetic_spanish(word, max_length=-1):
    """Return the PhoneticSpanish coding of word.

    This follows the coding described in :cite:`Amon:2012` and
    :cite:`delPilarAngeles:2015`.

    :param str word: the word to transform
    :param int max_length: the length of the code returned (defaults to
        unlimited)
    :returns: the PhoneticSpanish code
    :rtype: str

    >>> phonetic_spanish('Perez')
    '094'
    >>> phonetic_spanish('Martinez')
    '69364'
    >>> phonetic_spanish('Gutierrez')
    '8394'
    >>> phonetic_spanish('Santiago')
    '4638'
    >>> phonetic_spanish('Nicolás')
    '6454'
    """
    _es_soundex_translation = dict(
        zip(
            (ord(char) for char in 'BCDFGHJKLMNPQRSTVXYZ'),
            '14328287566079431454',
        )
    )

    # uppercase, normalize, and decompose, filter to A-Z minus vowels & W
    word = unicode_normalize('NFKD', text_type(word.upper()))
    word = ''.join(
        c
        for c in word
        if c in {'B', 'C', 'D', 'F', 'G', 'H', 'J', 'K', 'L', 'M',
                 'N', 'P', 'Q', 'R', 'S', 'T', 'V', 'X', 'Y', 'Z'}
    )

    # merge repeated Ls & Rs
    word = word.replace('LL', 'L')
    word = word.replace('RR', 'R')

    # apply the Soundex algorithm
    sdx = word.translate(_es_soundex_translation)

    if max_length > 0:
        sdx = (sdx + ('0' * max_length))[:max_length]

    return sdx


def spanish_metaphone(word, max_length=6, modified=False):
    """Return the Spanish Metaphone of a word.

    This is a quick rewrite of the Spanish Metaphone Algorithm, as presented at
    https://github.com/amsqr/Spanish-Metaphone and discussed in
    :cite:`Mosquera:2012`.

    Modified version based on :cite:`delPilarAngeles:2016`.

    :param str word: the word to transform
    :param int max_length: the length of the code returned (defaults to 6)
    :param bool modified: Set to True to use del Pilar Angeles &
        Bailón-Miguel's modified version of the algorithm
    :returns: the Spanish Metaphone code
    :rtype: str

    >>> spanish_metaphone('Perez')
    'PRZ'
    >>> spanish_metaphone('Martinez')
    'MRTNZ'
    >>> spanish_metaphone('Gutierrez')
    'GTRRZ'
    >>> spanish_metaphone('Santiago')
    'SNTG'
    >>> spanish_metaphone('Nicolás')
    'NKLS'
    """

    def _is_vowel(pos):
        """Return True if the character at word[pos] is a vowel."""
        return pos < len(word) and word[pos] in {'A', 'E', 'I', 'O', 'U'}

    word = unicode_normalize('NFC', text_type(word.upper()))

    meta_key = ''
    pos = 0

    # do some replacements for the modified version
    if modified:
        word = word.replace('MB', 'NB')
        word = word.replace('MP', 'NP')
        word = word.replace('BS', 'S')
        if word[:2] == 'PS':
            word = word[1:]

    # simple replacements
    word = word.replace('Á', 'A')
    word = word.replace('CH', 'X')
    word = word.replace('Ç', 'S')
    word = word.replace('É', 'E')
    word = word.replace('Í', 'I')
    word = word.replace('Ó', 'O')
    word = word.replace('Ú', 'U')
    word = word.replace('Ñ', 'NY')
    word = word.replace('GÜ', 'W')
    word = word.replace('Ü', 'U')
    word = word.replace('B', 'V')
    word = word.replace('LL', 'Y')

    while len(meta_key) < max_length:
        if pos >= len(word):
            break

        # get the next character
        current_char = word[pos]

        # if a vowel in pos 0, add to key
        if _is_vowel(pos) and pos == 0:
            meta_key += current_char
            pos += 1
        # otherwise, do consonant rules
        else:
            # simple consonants (unmutated)
            if current_char in {'D', 'F', 'J', 'K', 'M', 'N',
                                'P', 'T', 'V', 'L', 'Y'}:
                meta_key += current_char
                # skip doubled consonants
                if word[pos + 1 : pos + 2] == current_char:
                    pos += 2
                else:
                    pos += 1
            else:
                if current_char == 'C':
                    # special case 'acción', 'reacción', etc.
                    if word[pos + 1 : pos + 2] == 'C':
                        meta_key += 'X'
                        pos += 2
                    # special case 'cesar', 'cien', 'cid', 'conciencia'
                    elif word[pos + 1 : pos + 2] in {'E', 'I'}:
                        meta_key += 'Z'
                        pos += 2
                    # base case
                    else:
                        meta_key += 'K'
                        pos += 1
                elif current_char == 'G':
                    # special case 'gente', 'ecología', etc.
                    if word[pos + 1 : pos + 2] in {'E', 'I'}:
                        meta_key += 'J'
                        pos += 2
                    # base case
                    else:
                        meta_key += 'G'
                        pos += 1
                elif current_char == 'H':
                    # since the letter 'H' is silent in Spanish,
                    # set the meta key to the vowel after the letter 'H'
                    if _is_vowel(pos + 1):
                        meta_key += word[pos + 1]
                        pos += 2
                    else:
                        meta_key += 'H'
                        pos += 1
                elif current_char == 'Q':
                    if word[pos + 1 : pos + 2] == 'U':
                        pos += 2
                    else:
                        pos += 1
                    meta_key += 'K'
                elif current_char == 'W':
                    meta_key += 'U'
                    pos += 1
                elif current_char == 'R':
                    meta_key += 'R'
                    pos += 1
                elif current_char == 'S':
                    if not _is_vowel(pos + 1) and pos == 0:
                        meta_key += 'ES'
                        pos += 1
                    else:
                        meta_key += 'S'
                        pos += 1
                elif current_char == 'Z':
                    meta_key += 'Z'
                    pos += 1
                elif current_char == 'X':
                    if len(word) > 1 and pos == 0 and not _is_vowel(pos + 1):
                        meta_key += 'EX'
                        pos += 1
                    else:
                        meta_key += 'X'
                        pos += 1
                else:
                    pos += 1

    # Final change from S to Z in modified version
    if modified:
        meta_key = meta_key.replace('S', 'Z')

    return meta_key


if __name__ == '__main__':
    import doctest

    doctest.testmod()
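The context-sensitive consonant rules in `spanish_metaphone()` are easiest to see in isolation. The sketch below is a hypothetical helper, not part of Abydos, that applies only the C, G and QU rules to a word that is assumed to be already uppercased and accent-free (as `spanish_metaphone()` arranges through its replacement step). Every other letter, vowels included, is copied through unchanged, so the output shows just the substitutions.

```python
def soft_c_g_sketch(word):
    """Hypothetical helper: apply only the C, G and QU rules of
    spanish_metaphone() to an uppercase, accent-free word."""
    out = []
    pos = 0
    while pos < len(word):
        char = word[pos]
        nxt = word[pos + 1:pos + 2]  # '' when char is the last letter
        if char == 'C' and nxt == 'C':            # 'acción' -> X
            out.append('X')
            pos += 2
        elif char == 'C' and nxt in ('E', 'I'):   # soft C -> Z, skip vowel
            out.append('Z')
            pos += 2
        elif char == 'C':                         # hard C -> K
            out.append('K')
            pos += 1
        elif char == 'G' and nxt in ('E', 'I'):   # soft G -> J, skip vowel
            out.append('J')
            pos += 2
        elif char == 'Q':                         # QU (or bare Q) -> K
            out.append('K')
            pos += 2 if nxt == 'U' else 1
        else:                                     # hard G and all else as-is
            out.append(char)
            pos += 1
    return ''.join(out)


print(soft_c_g_sketch('ACCION'))  # AXION
print(soft_c_g_sketch('GENTE'))   # JNTE
print(soft_c_g_sketch('QUESO'))   # KESO
```

As in the full function, a soft C or G consumes the following E or I along with it; since non-initial vowels never reach the final key, skipping them early does not change the result.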

