Completed
Pull Request — master (#138)
by Chris
14:20
created

abydos.phonetic._metaphone.Metaphone.encode()   F

Complexity

Conditions 80

Size

Total Lines 196
Code Lines 128

Duplication

Lines 0
Ratio 0 %

Code Coverage

Tests 99
CRAP Score 80

Importance

Changes 0
Metric Value
eloc 128
dl 0
loc 196
ccs 99
cts 99
cp 1
rs 0
c 0
b 0
f 0
cc 80
nop 3
crap 80
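
The CRAP score of 80 reported above is consistent with the commonly cited Change Risk Anti-Patterns formula, CRAP = cc^2 * (1 - coverage)^3 + cc. Whether this tool computes it exactly that way is an assumption. The minimal sketch below (the function name `crap_score` is illustrative) plugs in the reported values, cc = 80 and cp = 1.

```python
def crap_score(complexity, coverage):
    """Change Risk Anti-Patterns score for one method.

    complexity: cyclomatic complexity (the "cc" metric above)
    coverage:   fraction of the method covered by tests, 0.0 to 1.0 ("cp")
    """
    return complexity ** 2 * (1.0 - coverage) ** 3 + complexity


# Values reported for Metaphone.encode(): cc = 80, cp = 1 (fully covered)
print(crap_score(80, 1.0))  # 80.0
# The same method with no tests at all would score far worse
print(crap_score(80, 0.0))  # 6480.0
```

With full coverage the score bottoms out at the complexity itself, so under this formula only reducing the number of conditions can lower it further.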

How to fix

Long Method

Small methods make your code easier to understand, in particular if combined with a good name. Besides, if your method is small, finding a good name is usually much easier.

For example, if you find yourself adding comments to a method's body, this is usually a good sign to extract the commented part to a new method, and use the comment as a starting point when coming up with a good name for this new method.

Commonly applied refactorings include Extract Method.

Complexity

Complex classes often do a lot of different things, and the same holds for a long method such as abydos.phonetic._metaphone.Metaphone.encode(). To break such a unit down, we need to identify a cohesive component within it. A common approach to finding such a component is to look for fields/methods that share the same prefixes or suffixes.

Once you have determined the fields that belong together, you can apply the Extract Class refactoring. If the component makes sense as a sub-class, Extract Subclass is also a candidate, and is often faster.
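
One way this could look for the method under review is sketched below: a few of the simpler per-letter branches of Metaphone.encode() are moved into a separate class whose methods share the `_rule_` prefix. The class name `ConsonantRules` and its `apply` method are hypothetical and not part of Abydos; only the D, P, Q, V, X and Z rules are shown.

```python
class ConsonantRules:
    """Hypothetical class extracted from encode(): one method per letter."""

    _frontv = {'E', 'I', 'Y'}

    def apply(self, ename, i):
        """Return the code for ename[i], or None if no rule is defined here."""
        rule = getattr(self, '_rule_' + ename[i].lower(), None)
        return rule(ename, i) if rule else None

    def _rule_d(self, ename, i):
        # 'DG' followed by a front vowel sounds like 'J' (as in 'edge')
        if ename[i + 1 : i + 2] == 'G' and ename[i + 2 : i + 3] in self._frontv:
            return 'J'
        return 'T'

    def _rule_p(self, ename, i):
        # 'PH' sounds like 'F'
        return 'F' if ename[i + 1 : i + 2] == 'H' else 'P'

    def _rule_q(self, ename, i):
        return 'K'

    def _rule_v(self, ename, i):
        return 'F'

    def _rule_x(self, ename, i):
        return 'KS'

    def _rule_z(self, ename, i):
        return 'S'


rules = ConsonantRules()
print(rules.apply('EDGE', 1))   # J
print(rules.apply('PHONE', 0))  # F
```

The main loop would then shrink to a lookup plus the handful of rules that need to skip characters, which is where most of the 80 conditions come from.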

# -*- coding: utf-8 -*-
[coding-style] Too many lines in module (1151/1000)

# Copyright 2014-2018 by Christopher C. Little.
# This file is part of Abydos.
#
# Abydos is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# Abydos is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with Abydos. If not, see <http://www.gnu.org/licenses/>.

"""abydos.phonetic._metaphone.

The phonetic._metaphone module implements Metaphone and Double Metaphone.
"""

from __future__ import unicode_literals

from six.moves import range

from ._phonetic import Phonetic

__all__ = ['DoubleMetaphone', 'Metaphone', 'double_metaphone', 'metaphone']


class Metaphone(Phonetic):
[Unused Code] The variable __class__ seems to be unused.
    """Metaphone.

    Based on Lawrence Philips' Pick BASIC code from 1990 :cite:`Philips:1990`,
    as described in :cite:`Philips:1990b`.
    This incorporates some corrections to the above code, particularly
    some of those suggested by Michael Kuhn in :cite:`Kuhn:1995`.
    """

    # Front vowels, which soften a preceding C or G
    _frontv = {'E', 'I', 'Y'}
    # As in variable sound--those modified by adding an "h"
    _varson = {'C', 'G', 'P', 'S', 'T'}

    def encode(self, word, max_length=-1):
[Bug] Parameters differ from overridden 'encode' method
        """Return the Metaphone code for a word.

        Based on Lawrence Philips' Pick BASIC code from 1990
        :cite:`Philips:1990`, as described in :cite:`Philips:1990b`.
        This incorporates some corrections to the above code, particularly
        some of those suggested by Michael Kuhn in :cite:`Kuhn:1995`.

        :param str word: the word to transform
        :param int max_length: the maximum length of the returned Metaphone
            code (defaults to 64, but in Philips' original implementation this
            was 4)
        :returns: the Metaphone value
        :rtype: str

        >>> pe = Metaphone()
        >>> pe.encode('Christopher')
        'KRSTFR'
        >>> pe.encode('Niall')
        'NL'
        >>> pe.encode('Smith')
        'SM0'
        >>> pe.encode('Schmidt')
        'SKMTT'
        """
        # Require a max_length of at least 4
        if max_length != -1:
            max_length = max(4, max_length)
        else:
            max_length = 64

        # Delete non-alphanumeric characters and make all caps
        ename = ''.join(c for c in word.upper() if c.isalnum())
        ename = ename.replace('ß', 'SS')

        # Handle the empty string and simplify special initial letters
        if not ename:
            return ''
        if ename[0:2] in {'PN', 'AE', 'KN', 'GN', 'WR'}:
            ename = ename[1:]
        elif ename[0] == 'X':
            ename = 'S' + ename[1:]
        elif ename[0:2] == 'WH':
            ename = 'W' + ename[2:]

        # Convert to metaphone
        elen = len(ename) - 1
        metaph = ''
        for i in range(len(ename)):
[unused-code] Consider using enumerate instead of iterating with range and len
            if len(metaph) >= max_length:
                break
            if (
                ename[i] not in {'G', 'T'}
[Coding Style] Wrong hanging indentation before block (add 4 spaces).
                and i > 0
[Coding Style] Wrong hanging indentation before block (add 4 spaces).
                and ename[i - 1] == ename[i]
[Coding Style] Wrong hanging indentation before block (add 4 spaces).
            ):
                continue

            if ename[i] in self._uc_v_set and i == 0:
                metaph = ename[i]

            elif ename[i] == 'B':
                if i != elen or ename[i - 1] != 'M':
                    metaph += ename[i]

            elif ename[i] == 'C':
                if not (
                    i > 0
[Coding Style] Wrong hanging indentation before block (add 4 spaces).
                    and ename[i - 1] == 'S'
[Coding Style] Wrong hanging indentation before block (add 4 spaces).
                    and ename[i + 1 : i + 2] in self._frontv
[Coding Style] Wrong hanging indentation before block (add 4 spaces).
                ):
                    if ename[i + 1 : i + 3] == 'IA':
                        metaph += 'X'
                    elif ename[i + 1 : i + 2] in self._frontv:
                        metaph += 'S'
                    elif i > 0 and ename[i - 1 : i + 2] == 'SCH':
                        metaph += 'K'
                    elif ename[i + 1 : i + 2] == 'H':
                        if (
                            i == 0
[Coding Style] Wrong hanging indentation before block (add 4 spaces).
                            and i + 1 < elen
[Coding Style] Wrong hanging indentation before block (add 4 spaces).
                            and ename[i + 2 : i + 3] not in self._uc_v_set
[Coding Style] Wrong hanging indentation before block (add 4 spaces).
                        ):
                            metaph += 'K'
                        else:
                            metaph += 'X'
                    else:
                        metaph += 'K'

            elif ename[i] == 'D':
                if (
                    ename[i + 1 : i + 2] == 'G'
[Coding Style] Wrong hanging indentation before block (add 4 spaces).
                    and ename[i + 2 : i + 3] in self._frontv
[Coding Style] Wrong hanging indentation before block (add 4 spaces).
                ):
                    metaph += 'J'
                else:
                    metaph += 'T'

            elif ename[i] == 'G':
                if ename[i + 1 : i + 2] == 'H' and not (
                    i + 1 == elen or ename[i + 2 : i + 3] not in self._uc_v_set
[Coding Style] Wrong hanging indentation before block (add 4 spaces).
                ):
                    continue
                elif i > 0 and (
                    (i + 1 == elen and ename[i + 1] == 'N')
[Coding Style] Wrong hanging indentation before block (add 4 spaces).
                    or (i + 3 == elen and ename[i + 1 : i + 4] == 'NED')
[Coding Style] Wrong hanging indentation before block (add 4 spaces).
                ):
                    continue
                elif (
                    i - 1 > 0
[Coding Style] Wrong hanging indentation before block (add 4 spaces).
                    and i + 1 <= elen
[Coding Style] Wrong hanging indentation before block (add 4 spaces).
                    and ename[i - 1] == 'D'
[Coding Style] Wrong hanging indentation before block (add 4 spaces).
                    and ename[i + 1] in self._frontv
[Coding Style] Wrong hanging indentation before block (add 4 spaces).
                ):
                    continue
                elif ename[i + 1 : i + 2] == 'G':
                    continue
                elif ename[i + 1 : i + 2] in self._frontv:
                    if i == 0 or ename[i - 1] != 'G':
                        metaph += 'J'
                    else:
                        metaph += 'K'
                else:
                    metaph += 'K'

            elif ename[i] == 'H':
                if (
                    i > 0
[Coding Style] Wrong hanging indentation before block (add 4 spaces).
                    and ename[i - 1] in self._uc_v_set
[Coding Style] Wrong hanging indentation before block (add 4 spaces).
                    and ename[i + 1 : i + 2] not in self._uc_v_set
[Coding Style] Wrong hanging indentation before block (add 4 spaces).
                ):
                    continue
                elif i > 0 and ename[i - 1] in self._varson:
                    continue
                else:
                    metaph += 'H'

            elif ename[i] in {'F', 'J', 'L', 'M', 'N', 'R'}:
                metaph += ename[i]

            elif ename[i] == 'K':
                if i > 0 and ename[i - 1] == 'C':
                    continue
                else:
                    metaph += 'K'

            elif ename[i] == 'P':
                if ename[i + 1 : i + 2] == 'H':
                    metaph += 'F'
                else:
                    metaph += 'P'

            elif ename[i] == 'Q':
                metaph += 'K'

            elif ename[i] == 'S':
                if (
                    i > 0
[Coding Style] Wrong hanging indentation before block (add 4 spaces).
                    and i + 2 <= elen
[Coding Style] Wrong hanging indentation before block (add 4 spaces).
                    and ename[i + 1] == 'I'
[Coding Style] Wrong hanging indentation before block (add 4 spaces).
                    and ename[i + 2] in 'OA'
[Coding Style] Wrong hanging indentation before block (add 4 spaces).
                ):
                    metaph += 'X'
                elif ename[i + 1 : i + 2] == 'H':
                    metaph += 'X'
                else:
                    metaph += 'S'

            elif ename[i] == 'T':
                if (
                    i > 0
[Coding Style] Wrong hanging indentation before block (add 4 spaces).
                    and i + 2 <= elen
[Coding Style] Wrong hanging indentation before block (add 4 spaces).
                    and ename[i + 1] == 'I'
[Coding Style] Wrong hanging indentation before block (add 4 spaces).
                    and ename[i + 2] in {'A', 'O'}
[Coding Style] Wrong hanging indentation before block (add 4 spaces).
                ):
                    metaph += 'X'
                elif ename[i + 1 : i + 2] == 'H':
                    metaph += '0'
                elif ename[i + 1 : i + 3] != 'CH':
                    if ename[i - 1 : i] != 'T':
                        metaph += 'T'

            elif ename[i] == 'V':
                metaph += 'F'

            elif ename[i] in 'WY':
                if ename[i + 1 : i + 2] in self._uc_v_set:
                    metaph += ename[i]

            elif ename[i] == 'X':
                metaph += 'KS'

            elif ename[i] == 'Z':
                metaph += 'S'

        return metaph


def metaphone(word, max_length=-1):
    """Return the Metaphone code for a word.

    This is a wrapper for :py:meth:`Metaphone.encode`.

    :param str word: the word to transform
    :param int max_length: the maximum length of the returned Metaphone code
        (defaults to 64, but in Philips' original implementation this was 4)
    :returns: the Metaphone value
    :rtype: str

    >>> metaphone('Christopher')
    'KRSTFR'
    >>> metaphone('Niall')
    'NL'
    >>> metaphone('Smith')
    'SM0'
    >>> metaphone('Schmidt')
    'SKMTT'
    """
    return Metaphone().encode(word, max_length)


class DoubleMetaphone(Phonetic):
[Unused Code] The variable __class__ seems to be unused.
    """Double Metaphone.

    Based on Lawrence Philips' (Visual) C++ code from 1999
    :cite:`Philips:2000`.
    """

    def encode(self, word, max_length=-1):
[Bug] Parameters differ from overridden 'encode' method
        """Return the Double Metaphone code for a word.

        :param word: the word to transform
        :param max_length: the maximum length of the returned Double Metaphone
            codes (defaults to 64, but in Philips' original implementation this
            was 4)
        :returns: the Double Metaphone value(s)
        :rtype: tuple

        >>> pe = DoubleMetaphone()
        >>> pe.encode('Christopher')
        ('KRSTFR', '')
        >>> pe.encode('Niall')
        ('NL', '')
        >>> pe.encode('Smith')
        ('SM0', 'XMT')
        >>> pe.encode('Schmidt')
        ('XMT', 'SMT')
        """
        # Require a max_length of at least 4
        if max_length != -1:
            max_length = max(4, max_length)
        else:
            max_length = 64

        primary = ''
        secondary = ''

        def _slavo_germanic():
            """Return True if the word appears to be Slavic or Germanic."""
            if 'W' in word or 'K' in word or 'CZ' in word:
                return True
            return False

        def _metaph_add(pri, sec=''):
            """Return a new metaphone tuple with the supplied elements."""
            newpri = primary
            newsec = secondary
            if pri:
                newpri += pri
            if sec:
                if sec != ' ':
                    newsec += sec
            else:
                newsec += pri
            return newpri, newsec

        def _is_vowel(pos):
            """Return True if the character at word[pos] is a vowel."""
            if pos >= 0 and word[pos] in {'A', 'E', 'I', 'O', 'U', 'Y'}:
                return True
            return False

        def _get_at(pos):
            """Return the character at word[pos]."""
            return word[pos]

        def _string_at(pos, slen, substrings):
            """Return True if word[pos:pos+slen] is in substrings."""
            if pos < 0:
                return False
            return word[pos : pos + slen] in substrings

        current = 0
        length = len(word)
        if length < 1:
            return '', ''
        last = length - 1

        word = word.upper()
        word = word.replace('ß', 'SS')

        # Pad the original string so that we can index beyond the edge of the
        # world
        word += '     '

        # Skip these when at start of word
        if word[0:2] in {'GN', 'KN', 'PN', 'WR', 'PS'}:
            current += 1

        # Initial 'X' is pronounced 'Z' e.g. 'Xavier'
        if _get_at(0) == 'X':
            primary, secondary = _metaph_add('S')  # 'Z' maps to 'S'
            current += 1

        # Main loop
        while True:
[unused-code] Too many nested blocks (6/5)
            if current >= length:
                break

            if _get_at(current) in {'A', 'E', 'I', 'O', 'U', 'Y'}:
                if current == 0:
                    # All init vowels now map to 'A'
                    primary, secondary = _metaph_add('A')
                current += 1
                continue

            elif _get_at(current) == 'B':
                # "-mb", e.g., "dumb", already skipped over...
                primary, secondary = _metaph_add('P')
                if _get_at(current + 1) == 'B':
                    current += 2
                else:
                    current += 1
                continue

            elif _get_at(current) == 'Ç':
                primary, secondary = _metaph_add('S')
                current += 1
                continue

            elif _get_at(current) == 'C':
                # Various Germanic
                if (
                    current > 1
[Coding Style] Wrong hanging indentation before block (add 4 spaces).
[best-practice] Too many boolean expressions in if statement (6/5)
                    and not _is_vowel(current - 2)
[Coding Style] Wrong hanging indentation before block (add 4 spaces).
                    and _string_at((current - 1), 3, {'ACH'})
[Coding Style] Wrong hanging indentation before block (add 4 spaces).
                    and (
[Coding Style] Wrong hanging indentation before block (add 4 spaces).
                        (_get_at(current + 2) != 'I')
                        and (
                            (_get_at(current + 2) != 'E')
                            or _string_at(
                                (current - 2), 6, {'BACHER', 'MACHER'}
                            )
                        )
                    )
                ):
                    primary, secondary = _metaph_add('K')
                    current += 2
                    continue

                # Special case 'caesar'
                elif current == 0 and _string_at(current, 6, {'CAESAR'}):
                    primary, secondary = _metaph_add('S')
                    current += 2
                    continue

                # Italian 'chianti'
                elif _string_at(current, 4, {'CHIA'}):
                    primary, secondary = _metaph_add('K')
                    current += 2
                    continue

                elif _string_at(current, 2, {'CH'}):
                    # Find 'Michael'
                    if current > 0 and _string_at(current, 4, {'CHAE'}):
                        primary, secondary = _metaph_add('K', 'X')
                        current += 2
                        continue

                    # Greek roots e.g. 'chemistry', 'chorus'
                    elif (
                        current == 0
[Coding Style] Wrong hanging indentation before block (add 4 spaces).
                        and (
[Coding Style] Wrong hanging indentation before block (add 4 spaces).
                            _string_at((current + 1), 5, {'HARAC', 'HARIS'})
                            or _string_at(
                                (current + 1), 3, {'HOR', 'HYM', 'HIA', 'HEM'}
                            )
                        )
                        and not _string_at(0, 5, {'CHORE'})
[Coding Style] Wrong hanging indentation before block (add 4 spaces).
                    ):
                        primary, secondary = _metaph_add('K')
                        current += 2
                        continue

                    # Germanic, Greek, or otherwise 'ch' for 'kh' sound
                    elif (
                        (
[Coding Style] Wrong hanging indentation before block (add 4 spaces).
[best-practice] Too many boolean expressions in if statement (7/5)
                            _string_at(0, 4, {'VAN ', 'VON '})
                            or _string_at(0, 3, {'SCH'})
                        )
                        or
[Coding Style] Wrong hanging indentation before block (add 4 spaces).
                        # 'architect' but not 'arch', 'orchestra', 'orchid'
[Coding Style] Wrong hanging indentation before block (add 4 spaces).
                        _string_at(
[Coding Style] Wrong hanging indentation before block (add 4 spaces).
                            (current - 2), 6, {'ORCHES', 'ARCHIT', 'ORCHID'}
                        )
                        or _string_at((current + 2), 1, {'T', 'S'})
[Coding Style] Wrong hanging indentation before block (add 4 spaces).
                        or (
[Coding Style] Wrong hanging indentation before block (add 4 spaces).
                            (
                                _string_at(
                                    (current - 1), 1, {'A', 'O', 'U', 'E'}
                                )
                                or (current == 0)
                            )
                            and
                            # e.g., 'wachtler', 'wechsler', but not 'tichner'
                            _string_at(
                                (current + 2),
                                1,
                                {
                                    'L',
                                    'R',
                                    'N',
                                    'M',
                                    'B',
                                    'H',
                                    'F',
                                    'V',
                                    'W',
                                    ' ',
                                },
                            )
                        )
                    ):
                        primary, secondary = _metaph_add('K')

                    else:
                        if current > 0:
                            if _string_at(0, 2, {'MC'}):
                                # e.g., "McHugh"
                                primary, secondary = _metaph_add('K')
                            else:
                                primary, secondary = _metaph_add('X', 'K')
                        else:
                            primary, secondary = _metaph_add('X')

                    current += 2
                    continue

                # e.g., 'czerny'
                elif _string_at(current, 2, {'CZ'}) and not _string_at(
                    (current - 2), 4, {'WICZ'}
[Coding Style] Wrong hanging indentation before block (add 4 spaces).
                ):
                    primary, secondary = _metaph_add('S', 'X')
                    current += 2
                    continue

                # e.g., 'focaccia'
                elif _string_at((current + 1), 3, {'CIA'}):
                    primary, secondary = _metaph_add('X')
                    current += 3

                # double 'C', but not if e.g. 'McClellan'
                elif _string_at(current, 2, {'CC'}) and not (
                    (current == 1) and (_get_at(0) == 'M')
[Coding Style] Wrong hanging indentation before block (add 4 spaces).
                ):
                    # 'bellocchio' but not 'bacchus'
                    if _string_at(
                        (current + 2), 1, {'I', 'E', 'H'}
[Coding Style] Wrong hanging indentation before block (add 4 spaces).
                    ) and not _string_at((current + 2), 2, ['HU']):
                        # 'accident', 'accede', 'succeed'
                        if (
                            (current == 1) and _get_at(current - 1) == 'A'
[Coding Style] Wrong hanging indentation before block (add 4 spaces).
                        ) or _string_at((current - 1), 5, {'UCCEE', 'UCCES'}):
                            primary, secondary = _metaph_add('KS')
                        # 'bacci', 'bertucci', other Italian
                        else:
                            primary, secondary = _metaph_add('X')
                        current += 3
                        continue
                    else:  # Pierce's rule
                        primary, secondary = _metaph_add('K')
                        current += 2
                        continue

                elif _string_at(current, 2, {'CK', 'CG', 'CQ'}):
                    primary, secondary = _metaph_add('K')
                    current += 2
                    continue

                elif _string_at(current, 2, {'CI', 'CE', 'CY'}):
                    # Italian vs. English
                    if _string_at(current, 3, {'CIO', 'CIE', 'CIA'}):
                        primary, secondary = _metaph_add('S', 'X')
                    else:
                        primary, secondary = _metaph_add('S')
                    current += 2
                    continue

                # else
                else:
                    primary, secondary = _metaph_add('K')

                    # name sent in 'mac caffrey', 'mac gregor'
                    if _string_at((current + 1), 2, {' C', ' Q', ' G'}):
                        current += 3
                    elif _string_at(
                        (current + 1), 1, {'C', 'K', 'Q'}
[Coding Style] Wrong hanging indentation before block (add 4 spaces).
                    ) and not _string_at((current + 1), 2, {'CE', 'CI'}):
                        current += 2
                    else:
                        current += 1
                    continue

            elif _get_at(current) == 'D':
                if _string_at(current, 2, {'DG'}):
                    if _string_at((current + 2), 1, {'I', 'E', 'Y'}):
                        # e.g. 'edge'
                        primary, secondary = _metaph_add('J')
                        current += 3
                        continue
                    else:
                        # e.g. 'edgar'
                        primary, secondary = _metaph_add('TK')
                        current += 2
                        continue

                elif _string_at(current, 2, {'DT', 'DD'}):
                    primary, secondary = _metaph_add('T')
                    current += 2
                    continue

                # else
                else:
                    primary, secondary = _metaph_add('T')
                    current += 1
                    continue

            elif _get_at(current) == 'F':
                if _get_at(current + 1) == 'F':
                    current += 2
                else:
                    current += 1
                primary, secondary = _metaph_add('F')
                continue

590 1
            elif _get_at(current) == 'G':
591 1
                if _get_at(current + 1) == 'H':
592 1
                    if (current > 0) and not _is_vowel(current - 1):
593 1
                        primary, secondary = _metaph_add('K')
594 1
                        current += 2
595 1
                        continue
596
597
                    # 'ghislane', ghiradelli
598 1
                    elif current == 0:
599 1
                        if _get_at(current + 2) == 'I':
600 1
                            primary, secondary = _metaph_add('J')
601
                        else:
602 1
                            primary, secondary = _metaph_add('K')
603 1
                        current += 2
604 1
                        continue
605
606
                    # Parker's rule (with some further refinements) -
607
                    # e.g., 'hugh'
608 1
                    elif (
609
                        (
Coding Style: Wrong hanging indentation before block (add 4 spaces).
Best practice: Too many boolean expressions in if statement (6/5)
610
                            (current > 1)
611
                            and _string_at((current - 2), 1, {'B', 'H', 'D'})
612
                        )
613
                        or
614
                        # e.g., 'bough'
615
                        (
616
                            (current > 2)
617
                            and _string_at((current - 3), 1, {'B', 'H', 'D'})
618
                        )
619
                        or
620
                        # e.g., 'broughton'
621
                        (
622
                            (current > 3)
623
                            and _string_at((current - 4), 1, {'B', 'H'})
624
                        )
625
                    ):
626 1
                        current += 2
627 1
                        continue
628
                    else:
629
                        # e.g. 'laugh', 'McLaughlin', 'cough',
630
                        #      'gough', 'rough', 'tough'
631 1
                        if (
632
                            (current > 2)
633
                            and (_get_at(current - 1) == 'U')
634
                            and (
635
                                _string_at(
636
                                    (current - 3), 1, {'C', 'G', 'L', 'R', 'T'}
637
                                )
638
                            )
639
                        ):
640 1
                            primary, secondary = _metaph_add('F')
641 1
                        elif (current > 0) and _get_at(current - 1) != 'I':
642 1
                            primary, secondary = _metaph_add('K')
643 1
                        current += 2
644 1
                        continue
645
646 1
                elif _get_at(current + 1) == 'N':
647 1
                    if (
648
                        (current == 1)
649
                        and _is_vowel(0)
650
                        and not _slavo_germanic()
651
                    ):
652 1
                        primary, secondary = _metaph_add('KN', 'N')
653
                    # not e.g. 'cagney'
654 1
                    elif (
655
                        not _string_at((current + 2), 2, {'EY'})
656
                        and (_get_at(current + 1) != 'Y')
657
                        and not _slavo_germanic()
658
                    ):
659 1
                        primary, secondary = _metaph_add('N', 'KN')
660
                    else:
661 1
                        primary, secondary = _metaph_add('KN')
662 1
                    current += 2
663 1
                    continue
664
665
                # 'tagliaro'
666 1
                elif (
667
                    _string_at((current + 1), 2, {'LI'})
668
                    and not _slavo_germanic()
669
                ):
670 1
                    primary, secondary = _metaph_add('KL', 'L')
671 1
                    current += 2
672 1
                    continue
673
674
                # -ges-, -gep-, -gel-, -gie- at beginning
675 1
                elif (current == 0) and (
676
                    (_get_at(current + 1) == 'Y')
677
                    or _string_at(
678
                        (current + 1),
679
                        2,
680
                        {
681
                            'ES',
682
                            'EP',
683
                            'EB',
684
                            'EL',
685
                            'EY',
686
                            'IB',
687
                            'IL',
688
                            'IN',
689
                            'IE',
690
                            'EI',
691
                            'ER',
692
                        },
693
                    )
694
                ):
695 1
                    primary, secondary = _metaph_add('K', 'J')
696 1
                    current += 2
697 1
                    continue
698
699
                # -ger-, -gy-
700 1
                elif (
701
                    (
702
                        _string_at((current + 1), 2, {'ER'})
703
                        or (_get_at(current + 1) == 'Y')
704
                    )
705
                    and not _string_at(0, 6, {'DANGER', 'RANGER', 'MANGER'})
706
                    and not _string_at((current - 1), 1, {'E', 'I'})
707
                    and not _string_at((current - 1), 3, {'RGY', 'OGY'})
708
                ):
709 1
                    primary, secondary = _metaph_add('K', 'J')
710 1
                    current += 2
711 1
                    continue
712
713
                # Italian, e.g. 'biaggi'
714 1
                elif _string_at(
715
                    (current + 1), 1, {'E', 'I', 'Y'}
716
                ) or _string_at((current - 1), 4, {'AGGI', 'OGGI'}):
717
                    # obviously Germanic
718 1
                    if (
719
                        _string_at(0, 4, {'VAN ', 'VON '})
720
                        or _string_at(0, 3, {'SCH'})
721
                    ) or _string_at((current + 1), 2, {'ET'}):
722 1
                        primary, secondary = _metaph_add('K')
723 1
                    elif _string_at((current + 1), 4, {'IER '}):
724 1
                        primary, secondary = _metaph_add('J')
725
                    else:
726 1
                        primary, secondary = _metaph_add('J', 'K')
727 1
                    current += 2
728 1
                    continue
729
730
                else:
731 1
                    if _get_at(current + 1) == 'G':
732 1
                        current += 2
733
                    else:
734 1
                        current += 1
735 1
                    primary, secondary = _metaph_add('K')
736 1
                    continue
737
738 1
            elif _get_at(current) == 'H':
739
                # only keep if first and before a vowel, or between two vowels
740 1
                if ((current == 0) or _is_vowel(current - 1)) and _is_vowel(
741
                    current + 1
742
                ):
743 1
                    primary, secondary = _metaph_add('H')
744 1
                    current += 2
745
                else:  # also takes care of 'HH'
746 1
                    current += 1
747 1
                continue
748
749 1
            elif _get_at(current) == 'J':
750
                # obviously Spanish, e.g. 'jose', 'san jacinto'
751 1
                if _string_at(current, 4, {'JOSE'}) or _string_at(
752
                    0, 4, {'SAN '}
753
                ):
754 1
                    if (
755
                        (current == 0) and (_get_at(current + 4) == ' ')
756
                    ) or _string_at(0, 4, {'SAN '}):
757 1
                        primary, secondary = _metaph_add('H')
758
                    else:
759 1
                        primary, secondary = _metaph_add('J', 'H')
760 1
                    current += 1
761 1
                    continue
762
763 1
                elif (current == 0) and not _string_at(current, 4, {'JOSE'}):
764
                    # Yankelovich/Jankelowicz
765 1
                    primary, secondary = _metaph_add('J', 'A')
766
                # Spanish pron. of e.g. 'bajador'
767 1
                elif (
768
                    _is_vowel(current - 1)
769
                    and not _slavo_germanic()
770
                    and (
771
                        (_get_at(current + 1) == 'A')
772
                        or (_get_at(current + 1) == 'O')
773
                    )
774
                ):
775 1
                    primary, secondary = _metaph_add('J', 'H')
776 1
                elif current == last:
777 1
                    primary, secondary = _metaph_add('J', ' ')
778 1
                elif not _string_at(
779
                    (current + 1), 1, {'L', 'T', 'K', 'S', 'N', 'M', 'B', 'Z'}
780
                ) and not _string_at((current - 1), 1, {'S', 'K', 'L'}):
781 1
                    primary, secondary = _metaph_add('J')
782
783 1
                if _get_at(current + 1) == 'J':  # it could happen!
784 1
                    current += 2
785
                else:
786 1
                    current += 1
787 1
                continue
788
789 1
            elif _get_at(current) == 'K':
790 1
                if _get_at(current + 1) == 'K':
791 1
                    current += 2
792
                else:
793 1
                    current += 1
794 1
                primary, secondary = _metaph_add('K')
795 1
                continue
796
797 1
            elif _get_at(current) == 'L':
798 1
                if _get_at(current + 1) == 'L':
799
                    # Spanish e.g. 'cabrillo', 'gallegos'
800 1
                    if (
801
                        (current == (length - 3))
802
                        and _string_at(
803
                            (current - 1), 4, {'ILLO', 'ILLA', 'ALLE'}
804
                        )
805
                    ) or (
806
                        (
807
                            _string_at((last - 1), 2, {'AS', 'OS'})
808
                            or _string_at(last, 1, {'A', 'O'})
809
                        )
810
                        and _string_at((current - 1), 4, {'ALLE'})
811
                    ):
812 1
                        primary, secondary = _metaph_add('L', ' ')
813 1
                        current += 2
814 1
                        continue
815 1
                    current += 2
816
                else:
817 1
                    current += 1
818 1
                primary, secondary = _metaph_add('L')
819 1
                continue
820
821 1
            elif _get_at(current) == 'M':
822 1
                if (
823
                    (
824
                        _string_at((current - 1), 3, {'UMB'})
825
                        and (
826
                            ((current + 1) == last)
827
                            or _string_at((current + 2), 2, {'ER'})
828
                        )
829
                    )
830
                    or
831
                    # 'dumb', 'thumb'
832
                    (_get_at(current + 1) == 'M')
833
                ):
834 1
                    current += 2
835
                else:
836 1
                    current += 1
837 1
                primary, secondary = _metaph_add('M')
838 1
                continue
839
840 1
            elif _get_at(current) == 'N':
841 1
                if _get_at(current + 1) == 'N':
842 1
                    current += 2
843
                else:
844 1
                    current += 1
845 1
                primary, secondary = _metaph_add('N')
846 1
                continue
847
848 1
            elif _get_at(current) == 'Ñ':
849 1
                current += 1
850 1
                primary, secondary = _metaph_add('N')
851 1
                continue
852
853 1
            elif _get_at(current) == 'P':
854 1
                if _get_at(current + 1) == 'H':
855 1
                    primary, secondary = _metaph_add('F')
856 1
                    current += 2
857 1
                    continue
858
859
                # also account for "campbell", "raspberry"
860 1
                elif _string_at((current + 1), 1, {'P', 'B'}):
861 1
                    current += 2
862
                else:
863 1
                    current += 1
864 1
                primary, secondary = _metaph_add('P')
865 1
                continue
866
867 1
            elif _get_at(current) == 'Q':
868 1
                if _get_at(current + 1) == 'Q':
869 1
                    current += 2
870
                else:
871 1
                    current += 1
872 1
                primary, secondary = _metaph_add('K')
873 1
                continue
874
875 1
            elif _get_at(current) == 'R':
876
                # French, e.g. 'rogier', but exclude 'hochmeier'
877 1
                if (
878
                    (current == last)
879
                    and not _slavo_germanic()
880
                    and _string_at((current - 2), 2, {'IE'})
881
                    and not _string_at((current - 4), 2, {'ME', 'MA'})
882
                ):
883 1
                    primary, secondary = _metaph_add('', 'R')
884
                else:
885 1
                    primary, secondary = _metaph_add('R')
886
887 1
                if _get_at(current + 1) == 'R':
888 1
                    current += 2
889
                else:
890 1
                    current += 1
891 1
                continue
892
893 1
            elif _get_at(current) == 'S':
894
                # special cases 'island', 'isle', 'carlisle', 'carlysle'
895 1
                if _string_at((current - 1), 3, {'ISL', 'YSL'}):
896 1
                    current += 1
897 1
                    continue
898
899
                # special case 'sugar-'
900 1
                elif (current == 0) and _string_at(current, 5, {'SUGAR'}):
901 1
                    primary, secondary = _metaph_add('X', 'S')
902 1
                    current += 1
903 1
                    continue
904
905 1
                elif _string_at(current, 2, {'SH'}):
906
                    # Germanic
907 1
                    if _string_at(
908
                        (current + 1), 4, {'HEIM', 'HOEK', 'HOLM', 'HOLZ'}
909
                    ):
910 1
                        primary, secondary = _metaph_add('S')
911
                    else:
912 1
                        primary, secondary = _metaph_add('X')
913 1
                    current += 2
914 1
                    continue
915
916
                # Italian & Armenian
917 1
                elif _string_at(current, 3, {'SIO', 'SIA'}) or _string_at(
918
                    current, 4, {'SIAN'}
919
                ):
920 1
                    if not _slavo_germanic():
921 1
                        primary, secondary = _metaph_add('S', 'X')
922
                    else:
923 1
                        primary, secondary = _metaph_add('S')
924 1
                    current += 3
925 1
                    continue
926
927
                # German & anglicisations, e.g. 'smith' match 'schmidt',
928
                #                               'snider' match 'schneider'
929
                # also, -sz- in Slavic languages, although in Hungarian it is
930
                #       pronounced 's'
931 1
                elif (
932
                    (current == 0)
933
                    and _string_at((current + 1), 1, {'M', 'N', 'L', 'W'})
934
                ) or _string_at((current + 1), 1, {'Z'}):
935 1
                    primary, secondary = _metaph_add('S', 'X')
936 1
                    if _string_at((current + 1), 1, {'Z'}):
937 1
                        current += 2
938
                    else:
939 1
                        current += 1
940 1
                    continue
941
942 1
                elif _string_at(current, 2, {'SC'}):
943
                    # Schlesinger's rule
944 1
                    if _get_at(current + 2) == 'H':
945
                        # Dutch origin, e.g. 'school', 'schooner'
946 1
                        if _string_at(
947
                            (current + 3),
948
                            2,
949
                            {'OO', 'ER', 'EN', 'UY', 'ED', 'EM'},
950
                        ):
951
                            # 'schermerhorn', 'schenker'
952 1
                            if _string_at((current + 3), 2, {'ER', 'EN'}):
953 1
                                primary, secondary = _metaph_add('X', 'SK')
954
                            else:
955 1
                                primary, secondary = _metaph_add('SK')
956 1
                            current += 3
957 1
                            continue
958
                        else:
959 1
                            if (
960
                                (current == 0)
961
                                and not _is_vowel(3)
962
                                and (_get_at(3) != 'W')
963
                            ):
964 1
                                primary, secondary = _metaph_add('X', 'S')
965
                            else:
966 1
                                primary, secondary = _metaph_add('X')
967 1
                            current += 3
968 1
                            continue
969
970 1
                    elif _string_at((current + 2), 1, {'I', 'E', 'Y'}):
971 1
                        primary, secondary = _metaph_add('S')
972 1
                        current += 3
973 1
                        continue
974
975
                    # else
976
                    else:
977 1
                        primary, secondary = _metaph_add('SK')
978 1
                        current += 3
979 1
                        continue
980
981
                else:
982
                    # French, e.g. 'resnais', 'artois'
983 1
                    if (current == last) and _string_at(
984
                        (current - 2), 2, {'AI', 'OI'}
985
                    ):
986 1
                        primary, secondary = _metaph_add('', 'S')
987
                    else:
988 1
                        primary, secondary = _metaph_add('S')
989
990 1
                    if _string_at((current + 1), 1, {'S', 'Z'}):
991 1
                        current += 2
992
                    else:
993 1
                        current += 1
994 1
                    continue
995
996 1
            elif _get_at(current) == 'T':
997 1
                if _string_at(current, 4, {'TION'}):
998 1
                    primary, secondary = _metaph_add('X')
999 1
                    current += 3
1000 1
                    continue
1001
1002 1
                elif _string_at(current, 3, {'TIA', 'TCH'}):
1003 1
                    primary, secondary = _metaph_add('X')
1004 1
                    current += 3
1005 1
                    continue
1006
1007 1
                elif _string_at(current, 2, {'TH'}) or _string_at(
1008
                    current, 3, {'TTH'}
1009
                ):
1010
                    # special cases 'thomas', 'thames', or Germanic
1011 1
                    if (
1012
                        _string_at((current + 2), 2, {'OM', 'AM'})
1013
                        or _string_at(0, 4, {'VAN ', 'VON '})
1014
                        or _string_at(0, 3, {'SCH'})
1015
                    ):
1016 1
                        primary, secondary = _metaph_add('T')
1017
                    else:
1018 1
                        primary, secondary = _metaph_add('0', 'T')
1019 1
                    current += 2
1020 1
                    continue
1021
1022 1
                elif _string_at((current + 1), 1, {'T', 'D'}):
1023 1
                    current += 2
1024
                else:
1025 1
                    current += 1
1026 1
                primary, secondary = _metaph_add('T')
1027 1
                continue
1028
1029 1
            elif _get_at(current) == 'V':
1030 1
                if _get_at(current + 1) == 'V':
1031 1
                    current += 2
1032
                else:
1033 1
                    current += 1
1034 1
                primary, secondary = _metaph_add('F')
1035 1
                continue
1036
1037 1
            elif _get_at(current) == 'W':
1038
                # can also be in middle of word
1039 1
                if _string_at(current, 2, {'WR'}):
1040 1
                    primary, secondary = _metaph_add('R')
1041 1
                    current += 2
1042 1
                    continue
1043 1
                elif (current == 0) and (
1044
                    _is_vowel(current + 1) or _string_at(current, 2, {'WH'})
1045
                ):
1046
                    # Wasserman should match Vasserman
1047 1
                    if _is_vowel(current + 1):
1048 1
                        primary, secondary = _metaph_add('A', 'F')
1049
                    else:
1050
                        # need Uomo to match Womo
1051 1
                        primary, secondary = _metaph_add('A')
1052
1053
                # Arnow should match Arnoff
1054 1
                if (
1055
                    ((current == last) and _is_vowel(current - 1))
1056
                    or _string_at(
1057
                        (current - 1), 5, {'EWSKI', 'EWSKY', 'OWSKI', 'OWSKY'}
1058
                    )
1059
                    or _string_at(0, 3, {'SCH'})
1060
                ):
1061 1
                    primary, secondary = _metaph_add('', 'F')
1062 1
                    current += 1
1063 1
                    continue
1064
                # Polish e.g. 'filipowicz'
1065 1
                elif _string_at(current, 4, {'WICZ', 'WITZ'}):
1066 1
                    primary, secondary = _metaph_add('TS', 'FX')
1067 1
                    current += 4
1068 1
                    continue
1069
                # else skip it
1070
                else:
1071 1
                    current += 1
1072 1
                    continue
1073
1074 1
            elif _get_at(current) == 'X':
1075
                # French e.g. breaux
1076 1
                if not (
1077
                    (current == last)
1078
                    and (
1079
                        _string_at((current - 3), 3, {'IAU', 'EAU'})
1080
                        or _string_at((current - 2), 2, {'AU', 'OU'})
1081
                    )
1082
                ):
1083 1
                    primary, secondary = _metaph_add('KS')
1084
1085 1
                if _string_at((current + 1), 1, {'C', 'X'}):
1086 1
                    current += 2
1087
                else:
1088 1
                    current += 1
1089 1
                continue
1090
1091 1
            elif _get_at(current) == 'Z':
1092
                # Chinese Pinyin e.g. 'zhao'
1093 1
                if _get_at(current + 1) == 'H':
1094 1
                    primary, secondary = _metaph_add('J')
1095 1
                    current += 2
1096 1
                    continue
1097 1
                elif _string_at((current + 1), 2, {'ZO', 'ZI', 'ZA'}) or (
1098
                    _slavo_germanic()
1099
                    and ((current > 0) and _get_at(current - 1) != 'T')
1100
                ):
1101 1
                    primary, secondary = _metaph_add('S', 'TS')
1102
                else:
1103 1
                    primary, secondary = _metaph_add('S')
1104
1105 1
                if _get_at(current + 1) == 'Z':
1106 1
                    current += 2
1107
                else:
1108 1
                    current += 1
1109 1
                continue
1110
1111
            else:
1112 1
                current += 1
1113
1114 1
        if max_length > 0:
1115 1
            primary = primary[:max_length]
1116 1
            secondary = secondary[:max_length]
1117 1
        if primary == secondary:
1118 1
            secondary = ''
1119
1120 1
        return primary, secondary
1121
1122
1123 1
def double_metaphone(word, max_length=-1):
1124
    """Return the Double Metaphone code for a word.
1125
1126
    Based on Lawrence Philips' (Visual) C++ code from 1999
1127
    :cite:`Philips:2000`.
1128
1129
    :param word: the word to transform
1130
    :param max_length: the maximum length of the returned Double Metaphone
1131
        codes (defaults to -1, meaning unlimited; in Philips' original
1132
        implementation this was 4)
1133
    :returns: the Double Metaphone value(s)
1134
    :rtype: tuple
1135
1136
    >>> double_metaphone('Christopher')
1137
    ('KRSTFR', '')
1138
    >>> double_metaphone('Niall')
1139
    ('NL', '')
1140
    >>> double_metaphone('Smith')
1141
    ('SM0', 'XMT')
1142
    >>> double_metaphone('Schmidt')
1143
    ('XMT', 'SMT')
1144
    """
1145 1
    return DoubleMetaphone().encode(word, max_length)
1146
1147
1148
if __name__ == '__main__':
1149
    import doctest
1150
1151
    doctest.testmod()
1152
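One possible way to reduce the number of conditions in `encode()`: the 'F', 'K', 'N', 'Q' and 'V' branches in the listing all share one shape (emit a fixed code, advance by 2 if the letter is doubled, otherwise by 1). The sketch below collapses them into a lookup table. The names `_SIMPLE_CODES` and `encode_simple_letter` are hypothetical, and the word is assumed to be upper-cased already, as the comparisons in the listing imply.

```python
# Letters whose branch only checks for a doubled letter, mapped to their code.
_SIMPLE_CODES = {'F': 'F', 'K': 'K', 'N': 'N', 'Q': 'K', 'V': 'F'}


def encode_simple_letter(word, current):
    """Return (code, next_index) for a simple letter, or None otherwise."""
    letter = word[current]
    code = _SIMPLE_CODES.get(letter)
    if code is None:
        return None
    # Skip a doubled letter (e.g. 'FF') in one step, as the listed branches do.
    # Slicing avoids an IndexError when current is the last index.
    doubled = word[current + 1:current + 2] == letter
    return code, current + (2 if doubled else 1)


print(encode_simple_letter('OFFER', 1))
print(encode_simple_letter('QUIT', 0))
```

Letters with context-dependent rules ('G', 'S', 'W' and so on) would still need their own handlers. The table only removes the repetitive branches.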