Completed
Push — master ( f43547...71985b )
by Chris
12:00 queued 10s
created

abydos.phonetic._metaphone   F

Complexity

Total Complexity 81

Size/Duplication

Total Lines 294
Duplicated Lines 0 %

Test Coverage

Coverage 100%

Importance

Changes 0

Metric   Value
eloc   145
dl   0
loc   294
ccs   109
cts   109
cp   1
rs   2
c   0
b   0
f   0
wmc   81

1 Function

Rating   Name   Duplication   Size   Complexity  
A metaphone() 0 31 1

1 Method

Rating   Name   Duplication   Size   Complexity  
F Metaphone.encode() 0 205 80

Complexity

Complex classes like abydos.phonetic._metaphone often do many different things. To break such a class down, identify a cohesive component within it. A common way to find such a component is to look for fields or methods that share the same prefix or suffix.

Once you have determined the fields that belong together, you can apply the Extract Class refactoring. If the component makes sense as a sub-class, Extract Subclass is also a candidate, and is often faster.
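
As a rough illustration of Extract Class applied to this module, the per-letter branches of Metaphone.encode() could move into a helper class with one small method per letter. The sketch below is hypothetical: ConsonantRules, apply, the _rule_* methods and encode_subset are names invented here, and only a handful of the simpler letter rules from the listing are reproduced (no doubled-letter skipping, no max_length, no initial-letter handling), so it is not a replacement for the real encoder.

```python
class ConsonantRules:
    """Hypothetical extracted class: one small method per letter."""

    # Letters that the listing copies to the output unchanged
    _passthrough = {'F', 'J', 'L', 'M', 'N', 'R'}

    def apply(self, ename, i):
        """Return the code fragment for ename[i] ('' if no rule applies)."""
        letter = ename[i]
        if letter in self._passthrough:
            return letter
        # Dispatch by name instead of a long if/elif chain
        handler = getattr(self, '_rule_' + letter, None)
        return handler(ename, i) if handler else ''

    def _rule_K(self, ename, i):
        # K is silent directly after C
        return '' if i > 0 and ename[i - 1] == 'C' else 'K'

    def _rule_P(self, ename, i):
        # PH is encoded as F
        return 'F' if ename[i + 1 : i + 2] == 'H' else 'P'

    def _rule_Q(self, ename, i):
        return 'K'

    def _rule_V(self, ename, i):
        return 'F'

    def _rule_X(self, ename, i):
        return 'KS'

    def _rule_Z(self, ename, i):
        return 'S'


def encode_subset(word):
    """Encode using only the subset of rules above (illustration only)."""
    rules = ConsonantRules()
    ename = word.upper()
    return ''.join(rules.apply(ename, i) for i in range(len(ename)))
```

In a split along these lines, each rule method carries only its own one or two branches, and the driving loop no longer grows with every letter that is added.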

# -*- coding: utf-8 -*-

# Copyright 2014-2018 by Christopher C. Little.
# This file is part of Abydos.
#
# Abydos is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# Abydos is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with Abydos. If not, see <http://www.gnu.org/licenses/>.

"""abydos.phonetic._metaphone.

Metaphone
"""

from __future__ import (
    absolute_import,
    division,
    print_function,
    unicode_literals,
)

from six.moves import range

from ._phonetic import _Phonetic

__all__ = ['Metaphone', 'metaphone']


# Scrutinizer issue (Unused Code): The variable __class__ seems to be unused.
class Metaphone(_Phonetic):
    """Metaphone.

    Based on Lawrence Philips' Pick BASIC code from 1990 :cite:`Philips:1990`,
    as described in :cite:`Philips:1990b`.
    This incorporates some corrections to the above code, particularly
    some of those suggested by Michael Kuhn in :cite:`Kuhn:1995`.
    """

    # Front vowels
    _frontv = {'E', 'I', 'Y'}
    # Variable-sound consonants: those modified by a following "h"
    _varson = {'C', 'G', 'P', 'S', 'T'}

    # Scrutinizer issue (Bug): Parameters differ from overridden 'encode'
    # method.
    def encode(self, word, max_length=-1):
        """Return the Metaphone code for a word.

        Based on Lawrence Philips' Pick BASIC code from 1990
        :cite:`Philips:1990`, as described in :cite:`Philips:1990b`.
        This incorporates some corrections to the above code, particularly
        some of those suggested by Michael Kuhn in :cite:`Kuhn:1995`.

        Parameters
        ----------
        word : str
            The word to transform
        max_length : int
            The maximum length of the returned Metaphone code (defaults to 64,
            but in Philips' original implementation this was 4)

        Returns
        -------
        str
            The Metaphone value

        Examples
        --------
        >>> pe = Metaphone()
        >>> pe.encode('Christopher')
        'KRSTFR'
        >>> pe.encode('Niall')
        'NL'
        >>> pe.encode('Smith')
        'SM0'
        >>> pe.encode('Schmidt')
        'SKMTT'

        """
        # Require a max_length of at least 4
        if max_length != -1:
            max_length = max(4, max_length)
        else:
            max_length = 64

        # Delete non-alphanumeric characters and make all caps
        ename = ''.join(c for c in word.upper() if c.isalnum())
        ename = ename.replace('ß', 'SS')

        # Return early on empty input, then simplify initial letter pairs
        if not ename:
            return ''
        if ename[0:2] in {'PN', 'AE', 'KN', 'GN', 'WR'}:
            ename = ename[1:]
        elif ename[0] == 'X':
            ename = 'S' + ename[1:]
        elif ename[0:2] == 'WH':
            ename = 'W' + ename[2:]

        # Convert to metaphone
        elen = len(ename) - 1
        metaph = ''
        # Scrutinizer issue (unused-code): Consider using enumerate instead of
        # iterating with range and len.
        for i in range(len(ename)):
            if len(metaph) >= max_length:
                break
            # Scrutinizer issue (Coding Style): Wrong hanging indentation
            # before block (add 4 spaces).
            if (
                ename[i] not in {'G', 'T'}
                and i > 0
                and ename[i - 1] == ename[i]
            ):
                continue

            if ename[i] in self._uc_v_set and i == 0:
                metaph = ename[i]

            elif ename[i] == 'B':
                if i != elen or ename[i - 1] != 'M':
                    metaph += ename[i]

            elif ename[i] == 'C':
                if not (
                    i > 0
                    and ename[i - 1] == 'S'
                    and ename[i + 1 : i + 2] in self._frontv
                ):
                    if ename[i + 1 : i + 3] == 'IA':
                        metaph += 'X'
                    elif ename[i + 1 : i + 2] in self._frontv:
                        metaph += 'S'
                    elif i > 0 and ename[i - 1 : i + 2] == 'SCH':
                        metaph += 'K'
                    elif ename[i + 1 : i + 2] == 'H':
                        if (
                            i == 0
                            and i + 1 < elen
                            and ename[i + 2 : i + 3] not in self._uc_v_set
                        ):
                            metaph += 'K'
                        else:
                            metaph += 'X'
                    else:
                        metaph += 'K'

            elif ename[i] == 'D':
                if (
                    ename[i + 1 : i + 2] == 'G'
                    and ename[i + 2 : i + 3] in self._frontv
                ):
                    metaph += 'J'
                else:
                    metaph += 'T'

            elif ename[i] == 'G':
                if ename[i + 1 : i + 2] == 'H' and not (
                    i + 1 == elen or ename[i + 2 : i + 3] not in self._uc_v_set
                ):
                    continue
                elif i > 0 and (
                    (i + 1 == elen and ename[i + 1] == 'N')
                    or (i + 3 == elen and ename[i + 1 : i + 4] == 'NED')
                ):
                    continue
                elif (
                    i - 1 > 0
                    and i + 1 <= elen
                    and ename[i - 1] == 'D'
                    and ename[i + 1] in self._frontv
                ):
                    continue
                elif ename[i + 1 : i + 2] == 'G':
                    continue
                elif ename[i + 1 : i + 2] in self._frontv:
                    if i == 0 or ename[i - 1] != 'G':
                        metaph += 'J'
                    else:
                        metaph += 'K'
                else:
                    metaph += 'K'

            elif ename[i] == 'H':
                if (
                    i > 0
                    and ename[i - 1] in self._uc_v_set
                    and ename[i + 1 : i + 2] not in self._uc_v_set
                ):
                    continue
                elif i > 0 and ename[i - 1] in self._varson:
                    continue
                else:
                    metaph += 'H'

            elif ename[i] in {'F', 'J', 'L', 'M', 'N', 'R'}:
                metaph += ename[i]

            elif ename[i] == 'K':
                if i > 0 and ename[i - 1] == 'C':
                    continue
                else:
                    metaph += 'K'

            elif ename[i] == 'P':
                if ename[i + 1 : i + 2] == 'H':
                    metaph += 'F'
                else:
                    metaph += 'P'

            elif ename[i] == 'Q':
                metaph += 'K'

            elif ename[i] == 'S':
                if (
                    i > 0
                    and i + 2 <= elen
                    and ename[i + 1] == 'I'
                    and ename[i + 2] in 'OA'
                ):
                    metaph += 'X'
                elif ename[i + 1 : i + 2] == 'H':
                    metaph += 'X'
                else:
                    metaph += 'S'

            elif ename[i] == 'T':
                if (
                    i > 0
                    and i + 2 <= elen
                    and ename[i + 1] == 'I'
                    and ename[i + 2] in {'A', 'O'}
                ):
                    metaph += 'X'
                elif ename[i + 1 : i + 2] == 'H':
                    metaph += '0'
                elif ename[i + 1 : i + 3] != 'CH':
                    if ename[i - 1 : i] != 'T':
                        metaph += 'T'

            elif ename[i] == 'V':
                metaph += 'F'

            elif ename[i] in 'WY':
                if ename[i + 1 : i + 2] in self._uc_v_set:
                    metaph += ename[i]

            elif ename[i] == 'X':
                metaph += 'KS'

            elif ename[i] == 'Z':
                metaph += 'S'

        return metaph


def metaphone(word, max_length=-1):
    """Return the Metaphone code for a word.

    This is a wrapper for :py:meth:`Metaphone.encode`.

    Parameters
    ----------
    word : str
        The word to transform
    max_length : int
        The maximum length of the returned Metaphone code (defaults to 64, but
        in Philips' original implementation this was 4)

    Returns
    -------
    str
        The Metaphone value

    Examples
    --------
    >>> metaphone('Christopher')
    'KRSTFR'
    >>> metaphone('Niall')
    'NL'
    >>> metaphone('Smith')
    'SM0'
    >>> metaphone('Schmidt')
    'SKMTT'

    """
    return Metaphone().encode(word, max_length)


if __name__ == '__main__':
    import doctest

    doctest.testmod()
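
The max_length handling described in the docstrings can be isolated as a small sketch. The helper name normalize_max_length is hypothetical; the listing performs the same logic inline at the top of Metaphone.encode(), where -1 selects the default of 64 and any other value is raised to at least 4.

```python
def normalize_max_length(max_length=-1):
    """Mirror the max_length normalization done inline in the listing.

    Hypothetical helper for illustration; not part of the module.
    """
    if max_length != -1:
        # Any explicit value is raised to a minimum of 4
        return max(4, max_length)
    # The sentinel -1 means "use the default of 64"
    return 64
```

Under this logic a request such as max_length=2 still allows codes of up to four characters, which matches the comment "Require a max_length of at least 4" in the listing.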