abydos.phonetic._spanish_metaphone.SpanishMetaphone.encode() - Code Metrics - chrislit/abydos

SpanishMetaphone.encode() F
last analyzed 2020-12-31 20:10 UTC

↳ Parent: abydos.phonetic._spanish_metaphone

Size

Total Lines	187
Code Lines	101

Duplication

Lines	0
Ratio	0 %

Code Coverage

Tests	81
CRAP Score	29

Metric	Value
eloc	101
dl	0
loc	187
ccs	81
cts	81
cp	1
rs	0
c	0
b	0
f	0
cc	29
nop	2
crap	29
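
The `crap` value in the table is consistent with the commonly cited CRAP formula, complexity² × (1 − coverage)³ + complexity. The page does not define its metric abbreviations, so the sketch below assumes that `cc` is cyclomatic complexity and `cp` is the covered fraction of statements; `crap_score` is a hypothetical helper, not part of Scrutinizer or Abydos.

```python
def crap_score(complexity: int, coverage: float) -> float:
    """Return the CRAP score for a method.

    Assumes the usual definition: cc^2 * (1 - cov)^3 + cc,
    with coverage given as a fraction between 0.0 and 1.0.
    """
    if not 0.0 <= coverage <= 1.0:
        raise ValueError('coverage must be a fraction between 0 and 1')
    return complexity ** 2 * (1.0 - coverage) ** 3 + complexity


# Values from the table above: cc = 29, cp = 1 (assumed to mean full coverage).
print(crap_score(29, 1.0))  # 29.0

# The same method with no test coverage at all.
print(crap_score(29, 0.0))  # 870.0
```

With full coverage the penalty term vanishes, so the score equals the complexity itself. Under this formula, lowering the score further would require reducing the complexity of `encode()`.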

How to fix Long Method Complexity

Long Method

Small methods make your code easier to understand, in particular if combined with a good name. Besides, if your method is small, finding a good name is usually much easier.

For example, if you find yourself adding comments to a method's body, this is usually a good sign to extract the commented part to a new method, and use the comment as a starting point when coming up with a good name for this new method.

Commonly applied refactorings include:

- If many parameters/temporary variables are present:
  - Replace Temp with Query
  - Introduce Parameter Object; often combined with Preserve Whole Object
  - If the above two are insufficient: Replace Method with Method Object
- If you have long conditionals: Decompose Conditional
- Otherwise: Extract Method
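
As a rough illustration of Decompose Conditional and Extract Method applied to this method, the sketch below pulls three of the consonant branches of `encode()` (`C`, `G`, and `Q`) into small functions that return the code to append and the number of characters consumed. All names here (`Step`, `_encode_c`, `_encode_g`, `_encode_q`, `HANDLERS`, `encode_step`) are hypothetical and are not part of Abydos. The remaining branches are deliberately left out.

```python
from typing import Callable, Dict, Tuple

# Each handler returns (code_to_append, characters_consumed).
Step = Tuple[str, int]


def _encode_c(word: str, pos: int) -> Step:
    """Handle 'C': 'CC' -> X, 'CE'/'CI' -> Z, otherwise K."""
    nxt = word[pos + 1 : pos + 2]
    if nxt == 'C':
        return 'X', 2
    if nxt in {'E', 'I'}:
        return 'Z', 2
    return 'K', 1


def _encode_g(word: str, pos: int) -> Step:
    """Handle 'G': 'GE'/'GI' -> J, otherwise G."""
    if word[pos + 1 : pos + 2] in {'E', 'I'}:
        return 'J', 2
    return 'G', 1


def _encode_q(word: str, pos: int) -> Step:
    """Handle 'Q': always K, skipping a following 'U'."""
    if word[pos + 1 : pos + 2] == 'U':
        return 'K', 2
    return 'K', 1


# Dispatch table replacing part of the long if/elif chain.
HANDLERS: Dict[str, Callable[[str, int], Step]] = {
    'C': _encode_c,
    'G': _encode_g,
    'Q': _encode_q,
}


def encode_step(word: str, pos: int) -> Step:
    """Encode the character at word[pos] using the sketch's handlers."""
    handler = HANDLERS.get(word[pos])
    if handler is None:
        # Characters this sketch does not cover: emit nothing, advance by one.
        return '', 1
    return handler(word, pos)


print(encode_step('ACCION', 1))  # ('X', 2)
print(encode_step('GENTE', 0))   # ('J', 2)
print(encode_step('QUE', 0))     # ('K', 2)
```

Each extracted function carries the intent that the original code records in a comment, and the main loop would shrink to a lookup plus `meta_key += code; pos += step`.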

Complexity

Complex methods like abydos.phonetic._spanish_metaphone.SpanishMetaphone.encode() often do a lot of different things. To break such a method (or its enclosing class) down, we need to identify a cohesive component within it. A common approach to finding such a component is to look for fields/methods that share the same prefixes or suffixes.

Once you have determined the fields that belong together, you can apply the Extract Class refactoring. If the component makes sense as a subclass, Extract Subclass is also a candidate, and is often faster.

# Copyright 2018-2020 by Christopher C. Little.
# This file is part of Abydos.
#
# Abydos is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# Abydos is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with Abydos. If not, see <http://www.gnu.org/licenses/>.

"""abydos.phonetic._spanish_metaphone.

Spanish Metaphone
"""

from unicodedata import normalize as unicode_normalize

from ._phonetic import _Phonetic

__all__ = ['SpanishMetaphone']


class SpanishMetaphone(_Phonetic):
    """Spanish Metaphone.

    This is a quick rewrite of the Spanish Metaphone Algorithm, as presented at
    https://github.com/amsqr/Spanish-Metaphone and discussed in
    :cite:`Mosquera:2012`.

    Modified version based on :cite:`delPilarAngeles:2016`.


    .. versionadded:: 0.3.6
    """

    def __init__(self, max_length: int = 6, modified: bool = False) -> None:
        """Initialize SpanishMetaphone instance.

        Parameters
        ----------
        max_length : int
            The maximum length of the code returned (defaults to 6)
        modified : bool
            Set to True to use del Pilar Angeles & Bailón-Miguel's modified
            version of the algorithm


        .. versionadded:: 0.4.0

        """
        self._max_length = max_length
        self._modified = modified

    def encode(self, word: str) -> str:
        """Return the Spanish Metaphone of a word.

        Parameters
        ----------
        word : str
            The word to transform


        Returns
        -------
        str
            The Spanish Metaphone code

        Examples
        --------
        >>> pe = SpanishMetaphone()
        >>> pe.encode('Perez')
        'PRZ'
        >>> pe.encode('Martinez')
        'MRTNZ'
        >>> pe.encode('Gutierrez')
        'GTRRZ'
        >>> pe.encode('Santiago')
        'SNTG'
        >>> pe.encode('Nicolás')
        'NKLS'


        .. versionadded:: 0.3.0
        .. versionchanged:: 0.3.6
            Encapsulated in class


        """

        def _is_vowel(pos: int) -> bool:
            """Return True if the character at word[pos] is a vowel.

            Parameters
            ----------
            pos : int
                Position to check for a vowel

            Returns
            -------
            bool
                True if word[pos] is a vowel

            .. versionadded:: 0.3.0

            """
            return pos < len(word) and word[pos] in {'A', 'E', 'I', 'O', 'U'}

        word = unicode_normalize('NFC', word.upper())

        meta_key = ''
        pos = 0

        # do some replacements for the modified version
        if self._modified:
            word = word.replace('MB', 'NB')
            word = word.replace('MP', 'NP')
            word = word.replace('BS', 'S')
            if word[:2] == 'PS':
                word = word[1:]

        # simple replacements
        word = word.replace('Á', 'A')
        word = word.replace('CH', 'X')
        word = word.replace('Ç', 'S')
        word = word.replace('É', 'E')
        word = word.replace('Í', 'I')
        word = word.replace('Ó', 'O')
        word = word.replace('Ú', 'U')
        word = word.replace('Ñ', 'NY')
        word = word.replace('GÜ', 'W')
        word = word.replace('Ü', 'U')
        word = word.replace('B', 'V')
        word = word.replace('LL', 'Y')

        while len(meta_key) < self._max_length:
            if pos >= len(word):
                break

            # get the next character
            current_char = word[pos]

            # if a vowel in pos 0, add to key
            if _is_vowel(pos) and pos == 0:
                meta_key += current_char
                pos += 1
            # otherwise, do consonant rules
            else:
                # simple consonants (unmutated)
                if current_char in {
                    'D',
                    'F',
                    'J',
                    'K',
                    'M',
                    'N',
                    'P',
                    'T',
                    'V',
                    'L',
                    'Y',
                }:
                    meta_key += current_char
                    # skip doubled consonants
                    if word[pos + 1 : pos + 2] == current_char:
                        pos += 2
                    else:
                        pos += 1
                else:
                    if current_char == 'C':
                        # special case 'acción', 'reacción', etc.
                        if word[pos + 1 : pos + 2] == 'C':
                            meta_key += 'X'
                            pos += 2
                        # special case 'cesar', 'cien', 'cid', 'conciencia'
                        elif word[pos + 1 : pos + 2] in {'E', 'I'}:
                            meta_key += 'Z'
                            pos += 2
                        # base case
                        else:
                            meta_key += 'K'
                            pos += 1
                    elif current_char == 'G':
                        # special case 'gente', 'ecologia', etc.
                        if word[pos + 1 : pos + 2] in {'E', 'I'}:
                            meta_key += 'J'
                            pos += 2
                        # base case
                        else:
                            meta_key += 'G'
                            pos += 1
                    elif current_char == 'H':
                        # since the letter 'H' is silent in Spanish,
                        # set the meta key to the vowel after the letter 'H'
                        if _is_vowel(pos + 1):
                            meta_key += word[pos + 1]
                            pos += 2
                        else:
                            meta_key += 'H'
                            pos += 1
                    elif current_char == 'Q':
                        if word[pos + 1 : pos + 2] == 'U':
                            pos += 2
                        else:
                            pos += 1
                        meta_key += 'K'
                    elif current_char == 'W':
                        meta_key += 'U'
                        pos += 1
                    elif current_char == 'R':
                        meta_key += 'R'
                        pos += 1
                    elif current_char == 'S':
                        if not _is_vowel(pos + 1) and pos == 0:
                            meta_key += 'ES'
                            pos += 1
                        else:
                            meta_key += 'S'
                            pos += 1
                    elif current_char == 'Z':
                        meta_key += 'Z'
                        pos += 1
                    elif current_char == 'X':
                        if (
                            len(word) > 1
                            and pos == 0
                            and not _is_vowel(pos + 1)
                        ):
                            meta_key += 'EX'
                            pos += 1
                        else:
                            meta_key += 'X'
                            pos += 1
                    else:
                        pos += 1

        # Final change from S to Z in modified version
        if self._modified:
            meta_key = meta_key.replace('S', 'Z')

        return meta_key


if __name__ == '__main__':
    import doctest

    doctest.testmod()
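
The "simple replacements" step in the listing above is one candidate for extraction. A minimal sketch, assuming the same replacement order as the original (for example, `GÜ` must be handled before `Ü`): the names `_SIMPLE_REPLACEMENTS` and `apply_simple_replacements` are hypothetical and do not exist in Abydos, and the modified-variant replacements are not covered.

```python
from unicodedata import normalize as unicode_normalize

# Ordered (old, new) pairs, copied from the replacement lines in encode().
# Order matters: 'GÜ' must be replaced before 'Ü'.
_SIMPLE_REPLACEMENTS = (
    ('Á', 'A'),
    ('CH', 'X'),
    ('Ç', 'S'),
    ('É', 'E'),
    ('Í', 'I'),
    ('Ó', 'O'),
    ('Ú', 'U'),
    ('Ñ', 'NY'),
    ('GÜ', 'W'),
    ('Ü', 'U'),
    ('B', 'V'),
    ('LL', 'Y'),
)


def apply_simple_replacements(word: str) -> str:
    """Uppercase, NFC-normalize, and apply the simple replacements in order."""
    word = unicode_normalize('NFC', word.upper())
    for old, new in _SIMPLE_REPLACEMENTS:
        word = word.replace(old, new)
    return word


print(apply_simple_replacements('llave'))  # YAVE
print(apply_simple_replacements('chico'))  # XICO
```

Moving the pairs into data removes twelve statements from `encode()` and makes the ordering constraint explicit in one place.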


