abydos.phonetic._spanish_metaphone.SpanishMetaphone.encode() - Code Metrics - Inspection of "0.3.6" - chrislit/abydos - Measure and Improve Code Quality continuously with Scrutinizer

Completed

Pull Request — master (#141)

by Chris

created 2018-11-10 03:25 UTC

SpanishMetaphone.encode() F

↳ Parent: abydos.phonetic._spanish_metaphone

Complexity

Conditions

Size

Total Lines	172
Code Lines	101

Duplication

Lines	0
Ratio	0 %

Code Coverage

Tests	87
CRAP Score	29

Importance

Changes

Metric	Value
cc	29
eloc	101
nop	4
dl	0
loc	172
ccs	87
cts	87
cp	1
crap	29
rs	0
c	0
b	0
f	0
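
The CRAP score reported above matches the commonly cited formula CRAP = cc^2 * (1 - coverage)^3 + cc. The sketch below is a quick check of that arithmetic. It assumes `cc` is the cyclomatic complexity and `cp` is coverage expressed as a fraction; the page does not define these abbreviations, so both readings are assumptions, and `crap_score` is a hypothetical helper name.

```python
def crap_score(complexity, coverage):
    """Change Risk Anti-Patterns score.

    complexity: cyclomatic complexity of the method (assumed to be `cc`).
    coverage: test coverage as a fraction in [0, 1] (assumed to be `cp`).
    """
    return complexity ** 2 * (1 - coverage) ** 3 + complexity


# With full coverage the penalty term vanishes and the score
# equals the complexity itself.
print(crap_score(29, 1.0))
```

Under these assumptions, full coverage leaves only the complexity term, which is why the score equals `cc` here; lowering it further would require reducing the complexity of `encode()` itself.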

How to fix: Long Method, Complexity

Long Method

Small methods make your code easier to understand, in particular if combined with a good name. Besides, if your method is small, finding a good name is usually much easier.

For example, if you find yourself adding comments to a method's body, this is usually a good sign that the commented part should be extracted into a new method; the comment is a good starting point for naming it.

Commonly applied refactorings include:

- If many parameters/temporary variables are present:
  - Replace temporary variables with Query
  - Introduce parameter object; often combined with preserve whole object
  - If the above two are insufficient: Replace method with method object
- If you have long conditionals: Decompose Conditional
- Otherwise: Extract method

Complexity

Complex units like abydos.phonetic._spanish_metaphone.SpanishMetaphone.encode() often do a lot of different things. To break such a unit down, we need to identify a cohesive component within it. A common approach to finding such a component is to look for fields/methods that share the same prefixes or suffixes.

Once you have determined the fields that belong together, you can apply the Extract Class refactoring. If the component makes sense as a sub-class, Extract Subclass is also a candidate, and is often faster.

# -*- coding: utf-8 -*-

# Copyright 2018 by Christopher C. Little.
# This file is part of Abydos.
#
# Abydos is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# Abydos is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with Abydos. If not, see <http://www.gnu.org/licenses/>.

"""abydos.phonetic._spanish_metaphone.

Spanish Metaphone
"""

from __future__ import (
    absolute_import,
    division,
    print_function,
    unicode_literals,
)

from unicodedata import normalize as unicode_normalize

from six import text_type

from ._phonetic import Phonetic

__all__ = ['SpanishMetaphone', 'spanish_metaphone']


class SpanishMetaphone(Phonetic):
    """Spanish Metaphone.

    This is a quick rewrite of the Spanish Metaphone Algorithm, as presented at
    https://github.com/amsqr/Spanish-Metaphone and discussed in
    :cite:`Mosquera:2012`.

    Modified version based on :cite:`delPilarAngeles:2016`.
    """

    def encode(self, word, max_length=6, modified=False):
        """Return the Spanish Metaphone of a word.

        Args:
            word (str): The word to transform
            max_length (int): The length of the code returned (defaults to 6)
            modified (bool): Set to True to use del Pilar Angeles &
                Bailón-Miguel's modified version of the algorithm

        Returns:
            str: The Spanish Metaphone code

        Examples:
            >>> pe = SpanishMetaphone()
            >>> pe.encode('Perez')
            'PRZ'
            >>> pe.encode('Martinez')
            'MRTNZ'
            >>> pe.encode('Gutierrez')
            'GTRRZ'
            >>> pe.encode('Santiago')
            'SNTG'
            >>> pe.encode('Nicolás')
            'NKLS'

        """

        def _is_vowel(pos):
            """Return True if the character at word[pos] is a vowel.

            Args:
                pos (int): Position to check for a vowel

            Returns:
                bool: True if word[pos] is a vowel

            """
            return pos < len(word) and word[pos] in {'A', 'E', 'I', 'O', 'U'}

        word = unicode_normalize('NFC', text_type(word.upper()))

        meta_key = ''
        pos = 0

        # do some replacements for the modified version
        if modified:
            word = word.replace('MB', 'NB')
            word = word.replace('MP', 'NP')
            word = word.replace('BS', 'S')
            if word[:2] == 'PS':
                word = word[1:]

        # simple replacements
        word = word.replace('Á', 'A')
        word = word.replace('CH', 'X')
        word = word.replace('Ç', 'S')
        word = word.replace('É', 'E')
        word = word.replace('Í', 'I')
        word = word.replace('Ó', 'O')
        word = word.replace('Ú', 'U')
        word = word.replace('Ñ', 'NY')
        word = word.replace('GÜ', 'W')
        word = word.replace('Ü', 'U')
        word = word.replace('B', 'V')
        word = word.replace('LL', 'Y')

        while len(meta_key) < max_length:
            if pos >= len(word):
                break

            # get the next character
            current_char = word[pos]

            # if a vowel in pos 0, add to key
            if _is_vowel(pos) and pos == 0:
                meta_key += current_char
                pos += 1
            # otherwise, do consonant rules
            else:
                # simple consonants (unmutated)
                if current_char in {
                    'D',
                    'F',
                    'J',
                    'K',
                    'M',
                    'N',
                    'P',
                    'T',
                    'V',
                    'L',
                    'Y',
                }:
                    meta_key += current_char
                    # skip doubled consonants
                    if word[pos + 1 : pos + 2] == current_char:
                        pos += 2
                    else:
                        pos += 1
                else:
                    if current_char == 'C':
                        # special case 'acción', 'reacción',etc.
                        if word[pos + 1 : pos + 2] == 'C':
                            meta_key += 'X'
                            pos += 2
                        # special case 'cesar', 'cien', 'cid', 'conciencia'
                        elif word[pos + 1 : pos + 2] in {'E', 'I'}:
                            meta_key += 'Z'
                            pos += 2
                        # base case
                        else:
                            meta_key += 'K'
                            pos += 1
                    elif current_char == 'G':
                        # special case 'gente', 'ecologia',etc
                        if word[pos + 1 : pos + 2] in {'E', 'I'}:
                            meta_key += 'J'
                            pos += 2
                        # base case
                        else:
                            meta_key += 'G'
                            pos += 1
                    elif current_char == 'H':
                        # since the letter 'H' is silent in Spanish,
                        # set the meta key to the vowel after the letter 'H'
                        if _is_vowel(pos + 1):
                            meta_key += word[pos + 1]
                            pos += 2
                        else:
                            meta_key += 'H'
                            pos += 1
                    elif current_char == 'Q':
                        if word[pos + 1 : pos + 2] == 'U':
                            pos += 2
                        else:
                            pos += 1
                        meta_key += 'K'
                    elif current_char == 'W':
                        meta_key += 'U'
                        pos += 1
                    elif current_char == 'R':
                        meta_key += 'R'
                        pos += 1
                    elif current_char == 'S':
                        if not _is_vowel(pos + 1) and pos == 0:
                            meta_key += 'ES'
                            pos += 1
                        else:
                            meta_key += 'S'
                            pos += 1
                    elif current_char == 'Z':
                        meta_key += 'Z'
                        pos += 1
                    elif current_char == 'X':
                        if (
                            len(word) > 1
                            and pos == 0
                            and not _is_vowel(pos + 1)
                        ):
                            meta_key += 'EX'
                            pos += 1
                        else:
                            meta_key += 'X'
                            pos += 1
                    else:
                        pos += 1

        # Final change from S to Z in modified version
        if modified:
            meta_key = meta_key.replace('S', 'Z')

        return meta_key


def spanish_metaphone(word, max_length=6, modified=False):
    """Return the Spanish Metaphone of a word.

    This is a wrapper for :py:meth:`SpanishMetaphone.encode`.

    Args:
        word (str): The word to transform
        max_length (int): The length of the code returned (defaults to 6)
        modified (bool): Set to True to use del Pilar Angeles &
            Bailón-Miguel's modified version of the algorithm

    Returns:
        str: The Spanish Metaphone code

    Examples:
        >>> spanish_metaphone('Perez')
        'PRZ'
        >>> spanish_metaphone('Martinez')
        'MRTNZ'
        >>> spanish_metaphone('Gutierrez')
        'GTRRZ'
        >>> spanish_metaphone('Santiago')
        'SNTG'
        >>> spanish_metaphone('Nicolás')
        'NKLS'

    """
    return SpanishMetaphone().encode(word, max_length, modified)


if __name__ == '__main__':
    import doctest

    doctest.testmod()


Issues reported for this file (line numbers are those of the source file):

Line	Code	Category	Introduced	Message
40	class SpanishMetaphone(Phonetic):	Unused Code	2018-11-04 08:02 UTC	The variable `__class__` seems to be unused.
50	def encode(self, word, max_length=6, modified=False):	Bug	2018-11-04 08:02 UTC	Parameters differ from overridden 'encode' method
131-141	'D', ... 'Y', (each element of the simple-consonant set)	Coding Style	2018-10-24 06:00 UTC	Wrong hanging indentation before block (add 4 spaces). Reported once per line.
205-207	len(word) > 1 / and pos == 0 / and not _is_vowel(pos + 1)	Coding Style	2018-11-04 08:02 UTC	Wrong hanging indentation before block (add 4 spaces). Reported once per line.
