Completed: Pull Request for master (#141), created by Chris at 13:03

DoubleMetaphone.encode()   F

Complexity

Conditions 219

Size

Total Lines 892
Code Lines 549

Duplication

Lines 0
Ratio 0 %

Code Coverage

Tests 407
CRAP Score 219

Importance

Changes 0
Metric  Value
eloc    549
dl      0
loc     892
ccs     407
cts     407
cp      1
rs      0
c       0
b       0
f       0
cc      219
nop     3
crap    219

How to fix: Long Method, Complexity

Long Method

Small methods make your code easier to understand, in particular if combined with a good name. Besides, if your method is small, finding a good name is usually much easier.

For example, if you find yourself adding comments to a method's body, this is usually a good sign to extract the commented part to a new method, and use the comment as a starting point when coming up with a good name for this new method.

Commonly applied refactorings include Extract Method.
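
As a minimal sketch of Extract Method, consider the length check that both encode() methods in the listing further down open with, under the comment "Require a max_length of at least 4". The helper name `normalize_max_length` and its `minimum` and `default` parameters are hypothetical and not part of Abydos; the existing comment simply serves as the starting point for the name:

```python
def normalize_max_length(max_length, minimum=4, default=64):
    """Return the effective limit for the length of a phonetic code.

    Hypothetical helper: -1 means "no explicit limit" and maps to
    ``default``; any other value is raised to at least ``minimum``.
    """
    if max_length == -1:
        return default
    return max(minimum, max_length)


# The three cases the original inline branch distinguishes.
print(normalize_max_length(-1))
print(normalize_max_length(2))
print(normalize_max_length(10))
```

Each encode() could then begin with `max_length = normalize_max_length(max_length)`, which would remove the duplicated branch from both methods.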

Complexity

Complex methods like abydos.phonetic._metaphone.DoubleMetaphone.encode() often do a lot of different things. To break such a method down, we need to identify a cohesive component within it. A common approach to find such a component is to look for variables, nested helpers, or branches that share the same prefixes or suffixes, or that handle the same kind of input.

Once you have determined which pieces belong together, you can apply the Extract Method refactoring, or Extract Class if the pieces share state. If the component makes sense as a subclass, Extract Subclass is also a candidate, and is often faster.
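
One way to carve a cohesive component out of a long if/elif chain is a dispatch table that maps each letter to a small handler. The sketch below is a toy under stated assumptions: `TinyEncoder`, `_handlers`, `_encode_b`, and `_encode_f` are hypothetical names, and only the 'B' and 'F' rules (taken from the listing further down) are covered, so it is not a Double Metaphone implementation:

```python
class TinyEncoder(object):
    """Toy encoder showing per-letter dispatch (hypothetical, incomplete)."""

    def __init__(self):
        # Map each character to the method that knows how to encode it.
        self._handlers = {
            'B': self._encode_b,
            'F': self._encode_f,
        }

    def _encode_b(self, word, pos):
        # 'B' maps to 'P'; a doubled 'BB' is consumed as a single sound.
        step = 2 if word[pos + 1 : pos + 2] == 'B' else 1
        return 'P', step

    def _encode_f(self, word, pos):
        # 'F' maps to itself; a doubled 'FF' is consumed as a single sound.
        step = 2 if word[pos + 1 : pos + 2] == 'F' else 1
        return 'F', step

    def encode(self, word):
        word = word.upper()
        code = ''
        pos = 0
        while pos < len(word):
            handler = self._handlers.get(word[pos])
            if handler is None:
                # Characters without a handler are skipped in this sketch.
                pos += 1
                continue
            sound, step = handler(word, pos)
            code += sound
            pos += step
        return code


print(TinyEncoder().encode('buff'))
```

With this shape, each letter's rules live in a method that can be named, tested, and measured on its own, and the main loop no longer grows with every new rule.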

# -*- coding: utf-8 -*-
# [coding-style] Too many lines in module (1208/1000)

# Copyright 2014-2018 by Christopher C. Little.
# This file is part of Abydos.
#
# Abydos is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# Abydos is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with Abydos. If not, see <http://www.gnu.org/licenses/>.

"""abydos.phonetic._metaphone.

The phonetic._metaphone module implements Metaphone and Double Metaphone.
"""

from __future__ import unicode_literals

from six.moves import range

from ._phonetic import Phonetic

__all__ = ['DoubleMetaphone', 'Metaphone', 'double_metaphone', 'metaphone']


# [Unused Code] The variable __class__ seems to be unused.
class Metaphone(Phonetic):
    """Metaphone.

    Based on Lawrence Philips' Pick BASIC code from 1990 :cite:`Philips:1990`,
    as described in :cite:`Philips:1990b`.
    This incorporates some corrections to the above code, particularly
    some of those suggested by Michael Kuhn in :cite:`Kuhn:1995`.
    """

    # Front vowels, which soften a preceding 'C' or 'G'
    _frontv = {'E', 'I', 'Y'}
    # Variable-sound consonants: those modified by adding an "h"
    _varson = {'C', 'G', 'P', 'S', 'T'}

    # [Bug] Parameters differ from overridden 'encode' method
    def encode(self, word, max_length=-1):
        """Return the Metaphone code for a word.

        Based on Lawrence Philips' Pick BASIC code from 1990
        :cite:`Philips:1990`, as described in :cite:`Philips:1990b`.
        This incorporates some corrections to the above code, particularly
        some of those suggested by Michael Kuhn in :cite:`Kuhn:1995`.

        Args:
            word (str): The word to transform
            max_length (int): The maximum length of the returned Metaphone
                code (the default, -1, selects 64; other values are raised
                to at least 4, the length in Philips' original
                implementation)

        Returns:
            str: The Metaphone value

        Examples:
            >>> pe = Metaphone()
            >>> pe.encode('Christopher')
            'KRSTFR'
            >>> pe.encode('Niall')
            'NL'
            >>> pe.encode('Smith')
            'SM0'
            >>> pe.encode('Schmidt')
            'SKMTT'

        """
        # Require a max_length of at least 4
        if max_length != -1:
            max_length = max(4, max_length)
        else:
            max_length = 64

        # Delete non-alphanumeric characters and make all caps
        ename = ''.join(c for c in word.upper() if c.isalnum())
        ename = ename.replace('ß', 'SS')

        # Simplify special letter combinations at the start of the word
        if not ename:
            return ''
        if ename[0:2] in {'PN', 'AE', 'KN', 'GN', 'WR'}:
            ename = ename[1:]
        elif ename[0] == 'X':
            ename = 'S' + ename[1:]
        elif ename[0:2] == 'WH':
            ename = 'W' + ename[2:]

        # Convert to metaphone
        elen = len(ename) - 1
        metaph = ''
        # [unused-code] Consider using enumerate instead of iterating with
        # range and len
        for i in range(len(ename)):
            if len(metaph) >= max_length:
                break
            # [Coding Style] Wrong hanging indentation before block (add 4 spaces).
            if (
                ename[i] not in {'G', 'T'}
                and i > 0
                and ename[i - 1] == ename[i]
            ):
                continue

            if ename[i] in self._uc_v_set and i == 0:
                metaph = ename[i]

            elif ename[i] == 'B':
                if i != elen or ename[i - 1] != 'M':
                    metaph += ename[i]

            elif ename[i] == 'C':
                # [Coding Style] Wrong hanging indentation before block (add 4 spaces).
                if not (
                    i > 0
                    and ename[i - 1] == 'S'
                    and ename[i + 1 : i + 2] in self._frontv
                ):
                    if ename[i + 1 : i + 3] == 'IA':
                        metaph += 'X'
                    elif ename[i + 1 : i + 2] in self._frontv:
                        metaph += 'S'
                    elif i > 0 and ename[i - 1 : i + 2] == 'SCH':
                        metaph += 'K'
                    elif ename[i + 1 : i + 2] == 'H':
                        # [Coding Style] Wrong hanging indentation before block (add 4 spaces).
                        if (
                            i == 0
                            and i + 1 < elen
                            and ename[i + 2 : i + 3] not in self._uc_v_set
                        ):
                            metaph += 'K'
                        else:
                            metaph += 'X'
                    else:
                        metaph += 'K'

            elif ename[i] == 'D':
                # [Coding Style] Wrong hanging indentation before block (add 4 spaces).
                if (
                    ename[i + 1 : i + 2] == 'G'
                    and ename[i + 2 : i + 3] in self._frontv
                ):
                    metaph += 'J'
                else:
                    metaph += 'T'

            elif ename[i] == 'G':
                # [Coding Style] Wrong hanging indentation before block (add 4 spaces).
                if ename[i + 1 : i + 2] == 'H' and not (
                    i + 1 == elen or ename[i + 2 : i + 3] not in self._uc_v_set
                ):
                    continue
                # [Coding Style] Wrong hanging indentation before block (add 4 spaces).
                elif i > 0 and (
                    (i + 1 == elen and ename[i + 1] == 'N')
                    or (i + 3 == elen and ename[i + 1 : i + 4] == 'NED')
                ):
                    continue
                # [Coding Style] Wrong hanging indentation before block (add 4 spaces).
                elif (
                    i - 1 > 0
                    and i + 1 <= elen
                    and ename[i - 1] == 'D'
                    and ename[i + 1] in self._frontv
                ):
                    continue
                elif ename[i + 1 : i + 2] == 'G':
                    continue
                elif ename[i + 1 : i + 2] in self._frontv:
                    if i == 0 or ename[i - 1] != 'G':
                        metaph += 'J'
                    else:
                        metaph += 'K'
                else:
                    metaph += 'K'

            elif ename[i] == 'H':
                # [Coding Style] Wrong hanging indentation before block (add 4 spaces).
                if (
                    i > 0
                    and ename[i - 1] in self._uc_v_set
                    and ename[i + 1 : i + 2] not in self._uc_v_set
                ):
                    continue
                elif i > 0 and ename[i - 1] in self._varson:
                    continue
                else:
                    metaph += 'H'

            elif ename[i] in {'F', 'J', 'L', 'M', 'N', 'R'}:
                metaph += ename[i]

            elif ename[i] == 'K':
                if i > 0 and ename[i - 1] == 'C':
                    continue
                else:
                    metaph += 'K'

            elif ename[i] == 'P':
                if ename[i + 1 : i + 2] == 'H':
                    metaph += 'F'
                else:
                    metaph += 'P'

            elif ename[i] == 'Q':
                metaph += 'K'

            elif ename[i] == 'S':
                # [Coding Style] Wrong hanging indentation before block (add 4 spaces).
                if (
                    i > 0
                    and i + 2 <= elen
                    and ename[i + 1] == 'I'
                    and ename[i + 2] in {'A', 'O'}
                ):
                    metaph += 'X'
                elif ename[i + 1 : i + 2] == 'H':
                    metaph += 'X'
                else:
                    metaph += 'S'

            elif ename[i] == 'T':
                # [Coding Style] Wrong hanging indentation before block (add 4 spaces).
                if (
                    i > 0
                    and i + 2 <= elen
                    and ename[i + 1] == 'I'
                    and ename[i + 2] in {'A', 'O'}
                ):
                    metaph += 'X'
                elif ename[i + 1 : i + 2] == 'H':
                    metaph += '0'
                elif ename[i + 1 : i + 3] != 'CH':
                    if ename[i - 1 : i] != 'T':
                        metaph += 'T'

            elif ename[i] == 'V':
                metaph += 'F'

            elif ename[i] in {'W', 'Y'}:
                if ename[i + 1 : i + 2] in self._uc_v_set:
                    metaph += ename[i]

            elif ename[i] == 'X':
                metaph += 'KS'

            elif ename[i] == 'Z':
                metaph += 'S'

        return metaph


def metaphone(word, max_length=-1):
    """Return the Metaphone code for a word.

    This is a wrapper for :py:meth:`Metaphone.encode`.

    Args:
        word (str): The word to transform
        max_length (int): The maximum length of the returned Metaphone
            code (the default, -1, selects 64; other values are raised
            to at least 4, the length in Philips' original implementation)

    Returns:
        str: The Metaphone value

    Examples:
        >>> metaphone('Christopher')
        'KRSTFR'
        >>> metaphone('Niall')
        'NL'
        >>> metaphone('Smith')
        'SM0'
        >>> metaphone('Schmidt')
        'SKMTT'

    """
    return Metaphone().encode(word, max_length)


# [Unused Code] The variable __class__ seems to be unused.
class DoubleMetaphone(Phonetic):
    """Double Metaphone.

    Based on Lawrence Philips' (Visual) C++ code from 1999
    :cite:`Philips:2000`.
    """

    # [Bug] Parameters differ from overridden 'encode' method
    def encode(self, word, max_length=-1):
        """Return the Double Metaphone code for a word.

        Args:
            word (str): The word to transform
            max_length (int): The maximum length of the returned Double
                Metaphone codes (the default, -1, selects 64; other values
                are raised to at least 4, the length in Philips' original
                implementation)

        Returns:
            tuple: The Double Metaphone value(s)

        Examples:
            >>> pe = DoubleMetaphone()
            >>> pe.encode('Christopher')
            ('KRSTFR', '')
            >>> pe.encode('Niall')
            ('NL', '')
            >>> pe.encode('Smith')
            ('SM0', 'XMT')
            >>> pe.encode('Schmidt')
            ('XMT', 'SMT')

        """
        # Require a max_length of at least 4
        if max_length != -1:
            max_length = max(4, max_length)
        else:
            max_length = 64

        primary = ''
        secondary = ''

        def _slavo_germanic():
            """Return True if the word appears to be Slavic or Germanic.

            Returns:
                bool: True if the word appears to be Slavic or Germanic

            """
            return 'W' in word or 'K' in word or 'CZ' in word

        def _metaph_add(pri, sec=''):
            """Return a new metaphone tuple with the supplied elements.

            Args:
                pri (str): The primary element
                sec (str): The secondary element

            Returns:
                tuple: A new metaphone tuple with the supplied elements

            """
            newpri = primary
            newsec = secondary
            if pri:
                newpri += pri
            if sec:
                # A single space means "add nothing to the secondary code"
                if sec != ' ':
                    newsec += sec
            else:
                # No secondary given: the primary element is used for both
                newsec += pri
            return newpri, newsec

        def _is_vowel(pos):
            """Return True if the character at word[pos] is a vowel.

            Args:
                pos (int): Position in the word

            Returns:
                bool: True if the character is a vowel

            """
            return pos >= 0 and word[pos] in {'A', 'E', 'I', 'O', 'U', 'Y'}

        def _get_at(pos):
            """Return the character at word[pos].

            Args:
                pos (int): Position in the word

            Returns:
                str: Character at word[pos]

            """
            return word[pos]

        def _string_at(pos, slen, substrings):
            """Return True if word[pos:pos+slen] is in substrings.

            Args:
                pos (int): Position in the word
                slen (int): Substring length
                substrings (set): Substrings to search

            Returns:
                bool: True if word[pos:pos+slen] is in substrings

            """
            if pos < 0:
                return False
            return word[pos : pos + slen] in substrings

        current = 0
        length = len(word)
        if length < 1:
            return '', ''
        last = length - 1

        word = word.upper()
        word = word.replace('ß', 'SS')

        # Pad the original string so that we can index beyond the edge of the
        # world
        word += '     '

        # Skip these when at start of word
        if word[0:2] in {'GN', 'KN', 'PN', 'WR', 'PS'}:
            current += 1

        # Initial 'X' is pronounced 'Z', e.g. 'Xavier'
        if _get_at(0) == 'X':
            primary, secondary = _metaph_add('S')  # 'Z' maps to 'S'
            current += 1

        # Main loop
        # [unused-code] Too many nested blocks (6/5)
        while True:
            if current >= length:
                break

            if _get_at(current) in {'A', 'E', 'I', 'O', 'U', 'Y'}:
                if current == 0:
                    # All initial vowels now map to 'A'
                    primary, secondary = _metaph_add('A')
                current += 1
                continue

            elif _get_at(current) == 'B':
                # "-mb", e.g., "dumb", already skipped over...
                primary, secondary = _metaph_add('P')
                if _get_at(current + 1) == 'B':
                    current += 2
                else:
                    current += 1
                continue

            elif _get_at(current) == 'Ç':
                primary, secondary = _metaph_add('S')
                current += 1
                continue

            elif _get_at(current) == 'C':
                # Various Germanic
                # [Coding Style] Wrong hanging indentation before block (add 4 spaces).
                # [best-practice] Too many boolean expressions in if statement (6/5)
                if (
                    current > 1
                    and not _is_vowel(current - 2)
                    and _string_at((current - 1), 3, {'ACH'})
                    and (
                        (_get_at(current + 2) != 'I')
                        and (
                            (_get_at(current + 2) != 'E')
                            or _string_at(
                                (current - 2), 6, {'BACHER', 'MACHER'}
                            )
                        )
                    )
                ):
                    primary, secondary = _metaph_add('K')
                    current += 2
                    continue

                # Special case 'caesar'
                elif current == 0 and _string_at(current, 6, {'CAESAR'}):
                    primary, secondary = _metaph_add('S')
                    current += 2
                    continue

                # Italian 'chianti'
                elif _string_at(current, 4, {'CHIA'}):
                    primary, secondary = _metaph_add('K')
                    current += 2
                    continue

                elif _string_at(current, 2, {'CH'}):
                    # Find 'Michael'
                    if current > 0 and _string_at(current, 4, {'CHAE'}):
                        primary, secondary = _metaph_add('K', 'X')
                        current += 2
                        continue

                    # Greek roots, e.g. 'chemistry', 'chorus'
                    # [Coding Style] Wrong hanging indentation before block (add 4 spaces).
                    elif (
                        current == 0
                        and (
                            _string_at((current + 1), 5, {'HARAC', 'HARIS'})
                            or _string_at(
                                (current + 1), 3, {'HOR', 'HYM', 'HIA', 'HEM'}
                            )
                        )
                        and not _string_at(0, 5, {'CHORE'})
                    ):
                        primary, secondary = _metaph_add('K')
                        current += 2
                        continue

                    # Germanic, Greek, or otherwise 'ch' for 'kh' sound
                    # [Coding Style] Wrong hanging indentation before block (add 4 spaces).
                    # [best-practice] Too many boolean expressions in if statement (7/5)
                    elif (
                        (
                            _string_at(0, 4, {'VAN ', 'VON '})
                            or _string_at(0, 3, {'SCH'})
                        )
                        or
                        # 'architect' but not 'arch', 'orchestra', 'orchid'
                        _string_at(
                            (current - 2), 6, {'ORCHES', 'ARCHIT', 'ORCHID'}
                        )
                        or _string_at((current + 2), 1, {'T', 'S'})
                        or (
                            (
                                _string_at(
                                    (current - 1), 1, {'A', 'O', 'U', 'E'}
                                )
                                or (current == 0)
                            )
                            and
                            # e.g., 'wachtler', 'wechsler', but not 'tichner'
                            _string_at(
                                (current + 2),
                                1,
                                {
                                    'L',
                                    'R',
                                    'N',
                                    'M',
                                    'B',
                                    'H',
                                    'F',
                                    'V',
                                    'W',
                                    ' ',
                                },
                            )
                        )
                    ):
                        primary, secondary = _metaph_add('K')

                    else:
                        if current > 0:
                            if _string_at(0, 2, {'MC'}):
                                # e.g., "McHugh"
                                primary, secondary = _metaph_add('K')
                            else:
                                primary, secondary = _metaph_add('X', 'K')
                        else:
                            primary, secondary = _metaph_add('X')

                    current += 2
                    continue

                # e.g., 'czerny'
                # [Coding Style] Wrong hanging indentation before block (add 4 spaces).
                elif _string_at(current, 2, {'CZ'}) and not _string_at(
                    (current - 2), 4, {'WICZ'}
                ):
                    primary, secondary = _metaph_add('S', 'X')
                    current += 2
                    continue

                # e.g., 'focaccia'
                elif _string_at((current + 1), 3, {'CIA'}):
                    primary, secondary = _metaph_add('X')
                    current += 3

                # double 'C', but not if e.g. 'McClellan'
                # [Coding Style] Wrong hanging indentation before block (add 4 spaces).
                elif _string_at(current, 2, {'CC'}) and not (
                    (current == 1) and (_get_at(0) == 'M')
                ):
                    # 'bellocchio' but not 'bacchus'
                    # [Coding Style] Wrong hanging indentation before block (add 4 spaces).
                    if _string_at(
                        (current + 2), 1, {'I', 'E', 'H'}
                    ) and not _string_at((current + 2), 2, {'HU'}):
                        # 'accident', 'accede', 'succeed'
                        # [Coding Style] Wrong hanging indentation before block (add 4 spaces).
                        if (
                            (current == 1) and _get_at(current - 1) == 'A'
                        ) or _string_at((current - 1), 5, {'UCCEE', 'UCCES'}):
                            primary, secondary = _metaph_add('KS')
                        # 'bacci', 'bertucci', other Italian
                        else:
                            primary, secondary = _metaph_add('X')
                        current += 3
                        continue
                    else:  # Pierce's rule
                        primary, secondary = _metaph_add('K')
                        current += 2
                        continue

                elif _string_at(current, 2, {'CK', 'CG', 'CQ'}):
                    primary, secondary = _metaph_add('K')
                    current += 2
                    continue

                elif _string_at(current, 2, {'CI', 'CE', 'CY'}):
                    # Italian vs. English
                    if _string_at(current, 3, {'CIO', 'CIE', 'CIA'}):
                        primary, secondary = _metaph_add('S', 'X')
                    else:
                        primary, secondary = _metaph_add('S')
                    current += 2
                    continue

                # else
                else:
                    primary, secondary = _metaph_add('K')

                    # name sent in 'mac caffrey', 'mac gregor'
                    if _string_at((current + 1), 2, {' C', ' Q', ' G'}):
                        current += 3
                    # [Coding Style] Wrong hanging indentation before block (add 4 spaces).
                    elif _string_at(
                        (current + 1), 1, {'C', 'K', 'Q'}
                    ) and not _string_at((current + 1), 2, {'CE', 'CI'}):
                        current += 2
                    else:
                        current += 1
                    continue

            elif _get_at(current) == 'D':
                if _string_at(current, 2, {'DG'}):
                    if _string_at((current + 2), 1, {'I', 'E', 'Y'}):
                        # e.g. 'edge'
                        primary, secondary = _metaph_add('J')
                        current += 3
                        continue
                    else:
                        # e.g. 'edgar'
                        primary, secondary = _metaph_add('TK')
                        current += 2
                        continue

                elif _string_at(current, 2, {'DT', 'DD'}):
                    primary, secondary = _metaph_add('T')
                    current += 2
                    continue

                # else
                else:
                    primary, secondary = _metaph_add('T')
                    current += 1
                    continue

            elif _get_at(current) == 'F':
                if _get_at(current + 1) == 'F':
                    current += 2
                else:
                    current += 1
                primary, secondary = _metaph_add('F')
                continue

            elif _get_at(current) == 'G':
                if _get_at(current + 1) == 'H':
                    if (current > 0) and not _is_vowel(current - 1):
                        primary, secondary = _metaph_add('K')
                        current += 2
                        continue

                    # 'ghislane', 'ghiradelli'
                    elif current == 0:
                        if _get_at(current + 2) == 'I':
                            primary, secondary = _metaph_add('J')
                        else:
                            primary, secondary = _metaph_add('K')
                        current += 2
                        continue

                    # Parker's rule (with some further refinements) -
                    # e.g., 'hugh'
                    # Review note (best practice): too many boolean
                    # expressions in if statement (6/5).
                    elif (
                        (
                            (current > 1)
                            and _string_at((current - 2), 1, {'B', 'H', 'D'})
                        )
                        or
                        # e.g., 'bough'
                        (
                            (current > 2)
                            and _string_at((current - 3), 1, {'B', 'H', 'D'})
                        )
                        or
                        # e.g., 'broughton'
                        (
                            (current > 3)
                            and _string_at((current - 4), 1, {'B', 'H'})
                        )
                    ):
                        current += 2
                        continue
                    else:
                        # e.g. 'laugh', 'McLaughlin', 'cough',
                        #      'gough', 'rough', 'tough'
                        if (
                            (current > 2)
                            and (_get_at(current - 1) == 'U')
                            and (
                                _string_at(
                                    (current - 3), 1, {'C', 'G', 'L', 'R', 'T'}
                                )
                            )
                        ):
                            primary, secondary = _metaph_add('F')
                        elif (current > 0) and _get_at(current - 1) != 'I':
                            primary, secondary = _metaph_add('K')
                        current += 2
                        continue

                elif _get_at(current + 1) == 'N':
                    if (
                        (current == 1)
                        and _is_vowel(0)
                        and not _slavo_germanic()
                    ):
                        primary, secondary = _metaph_add('KN', 'N')
                    # not e.g. 'cagney'
                    elif (
                        not _string_at((current + 2), 2, {'EY'})
                        and (_get_at(current + 1) != 'Y')
                        and not _slavo_germanic()
                    ):
                        primary, secondary = _metaph_add('N', 'KN')
                    else:
                        primary, secondary = _metaph_add('KN')
                    current += 2
                    continue

                # 'tagliaro'
                elif (
                    _string_at((current + 1), 2, {'LI'})
                    and not _slavo_germanic()
                ):
                    primary, secondary = _metaph_add('KL', 'L')
                    current += 2
                    continue

                # -ges-, -gep-, -gel-, -gie- at beginning
                elif (current == 0) and (
                    (_get_at(current + 1) == 'Y')
                    or _string_at(
                        (current + 1),
                        2,
                        {
                            'ES',
                            'EP',
                            'EB',
                            'EL',
                            'EY',
                            'IB',
                            'IL',
                            'IN',
                            'IE',
                            'EI',
                            'ER',
                        },
                    )
                ):
                    primary, secondary = _metaph_add('K', 'J')
                    current += 2
                    continue

                # -ger-, -gy-
                elif (
                    (
                        _string_at((current + 1), 2, {'ER'})
                        or (_get_at(current + 1) == 'Y')
                    )
                    and not _string_at(0, 6, {'DANGER', 'RANGER', 'MANGER'})
                    and not _string_at((current - 1), 1, {'E', 'I'})
                    and not _string_at((current - 1), 3, {'RGY', 'OGY'})
                ):
                    primary, secondary = _metaph_add('K', 'J')
                    current += 2
                    continue

                # Italian, e.g. 'biaggi'
                elif _string_at(
                    (current + 1), 1, {'E', 'I', 'Y'}
                ) or _string_at((current - 1), 4, {'AGGI', 'OGGI'}):
                    # obvious Germanic
                    if (
                        _string_at(0, 4, {'VAN ', 'VON '})
                        or _string_at(0, 3, {'SCH'})
                    ) or _string_at((current + 1), 2, {'ET'}):
                        primary, secondary = _metaph_add('K')
                    elif _string_at((current + 1), 4, {'IER '}):
                        primary, secondary = _metaph_add('J')
                    else:
                        primary, secondary = _metaph_add('J', 'K')
                    current += 2
                    continue

                else:
                    if _get_at(current + 1) == 'G':
                        current += 2
                    else:
                        current += 1
                    primary, secondary = _metaph_add('K')
                    continue

            elif _get_at(current) == 'H':
                # keep only if first and before a vowel, or between two vowels
                if ((current == 0) or _is_vowel(current - 1)) and _is_vowel(
                    current + 1
                ):
                    primary, secondary = _metaph_add('H')
                    current += 2
                else:  # also takes care of 'HH'
                    current += 1
                continue

            elif _get_at(current) == 'J':
                # obvious Spanish, 'jose', 'san jacinto'
                if _string_at(current, 4, ['JOSE']) or _string_at(
                    0, 4, {'SAN '}
                ):
                    if (
                        (current == 0) and (_get_at(current + 4) == ' ')
                    ) or _string_at(0, 4, ['SAN ']):
                        primary, secondary = _metaph_add('H')
                    else:
                        primary, secondary = _metaph_add('J', 'H')
                    current += 1
                    continue

                elif (current == 0) and not _string_at(current, 4, {'JOSE'}):
                    # Yankelovich/Jankelowicz
                    primary, secondary = _metaph_add('J', 'A')
                # Spanish pron. of e.g. 'bajador'
                elif (
                    _is_vowel(current - 1)
                    and not _slavo_germanic()
                    and (
                        (_get_at(current + 1) == 'A')
                        or (_get_at(current + 1) == 'O')
                    )
                ):
                    primary, secondary = _metaph_add('J', 'H')
                elif current == last:
                    primary, secondary = _metaph_add('J', ' ')
                elif not _string_at(
                    (current + 1), 1, {'L', 'T', 'K', 'S', 'N', 'M', 'B', 'Z'}
                ) and not _string_at((current - 1), 1, {'S', 'K', 'L'}):
                    primary, secondary = _metaph_add('J')

                if _get_at(current + 1) == 'J':  # it could happen!
                    current += 2
                else:
                    current += 1
                continue

            elif _get_at(current) == 'K':
                if _get_at(current + 1) == 'K':
                    current += 2
                else:
                    current += 1
                primary, secondary = _metaph_add('K')
                continue

            elif _get_at(current) == 'L':
                if _get_at(current + 1) == 'L':
                    # Spanish e.g. 'cabrillo', 'gallegos'
                    if (
                        (current == (length - 3))
                        and _string_at(
                            (current - 1), 4, {'ILLO', 'ILLA', 'ALLE'}
                        )
                    ) or (
                        (
                            _string_at((last - 1), 2, {'AS', 'OS'})
                            or _string_at(last, 1, {'A', 'O'})
                        )
                        and _string_at((current - 1), 4, {'ALLE'})
                    ):
                        primary, secondary = _metaph_add('L', ' ')
                        current += 2
                        continue
                    current += 2
                else:
                    current += 1
                primary, secondary = _metaph_add('L')
                continue

            elif _get_at(current) == 'M':
                if (
                    # 'dumb', 'thumb'
                    (
                        _string_at((current - 1), 3, {'UMB'})
                        and (
                            ((current + 1) == last)
                            or _string_at((current + 2), 2, {'ER'})
                        )
                    )
                    or (_get_at(current + 1) == 'M')
                ):
                    current += 2
                else:
                    current += 1
                primary, secondary = _metaph_add('M')
                continue

            elif _get_at(current) == 'N':
                if _get_at(current + 1) == 'N':
                    current += 2
                else:
                    current += 1
                primary, secondary = _metaph_add('N')
                continue

            elif _get_at(current) == 'Ñ':
                current += 1
                primary, secondary = _metaph_add('N')
                continue

            elif _get_at(current) == 'P':
                if _get_at(current + 1) == 'H':
                    primary, secondary = _metaph_add('F')
                    current += 2
                    continue

                # also account for "campbell", "raspberry"
                elif _string_at((current + 1), 1, {'P', 'B'}):
                    current += 2
                else:
                    current += 1
                primary, secondary = _metaph_add('P')
                continue

            elif _get_at(current) == 'Q':
                if _get_at(current + 1) == 'Q':
                    current += 2
                else:
                    current += 1
                primary, secondary = _metaph_add('K')
                continue

            elif _get_at(current) == 'R':
                # french e.g. 'rogier', but exclude 'hochmeier'
                if (
                    (current == last)
                    and not _slavo_germanic()
                    and _string_at((current - 2), 2, {'IE'})
                    and not _string_at((current - 4), 2, {'ME', 'MA'})
                ):
                    primary, secondary = _metaph_add('', 'R')
                else:
                    primary, secondary = _metaph_add('R')

                if _get_at(current + 1) == 'R':
                    current += 2
                else:
                    current += 1
                continue

            elif _get_at(current) == 'S':
                # special cases 'island', 'isle', 'carlisle', 'carlysle'
                if _string_at((current - 1), 3, {'ISL', 'YSL'}):
                    current += 1
                    continue

                # special case 'sugar-'
                elif (current == 0) and _string_at(current, 5, {'SUGAR'}):
                    primary, secondary = _metaph_add('X', 'S')
                    current += 1
                    continue

                elif _string_at(current, 2, {'SH'}):
                    # Germanic
                    if _string_at(
                        (current + 1), 4, {'HEIM', 'HOEK', 'HOLM', 'HOLZ'}
                    ):
                        primary, secondary = _metaph_add('S')
                    else:
                        primary, secondary = _metaph_add('X')
                    current += 2
                    continue

                # Italian & Armenian
                elif _string_at(current, 3, {'SIO', 'SIA'}) or _string_at(
                    current, 4, {'SIAN'}
                ):
                    if not _slavo_germanic():
                        primary, secondary = _metaph_add('S', 'X')
                    else:
                        primary, secondary = _metaph_add('S')
                    current += 3
                    continue

                # German & anglicisations, e.g. 'smith' match 'schmidt',
                #                               'snider' match 'schneider'
                # also, -sz- in Slavic language although in Hungarian it is
                #       pronounced 's'
                elif (
                    (current == 0)
                    and _string_at((current + 1), 1, {'M', 'N', 'L', 'W'})
                ) or _string_at((current + 1), 1, {'Z'}):
                    primary, secondary = _metaph_add('S', 'X')
                    if _string_at((current + 1), 1, {'Z'}):
                        current += 2
                    else:
                        current += 1
                    continue

                elif _string_at(current, 2, {'SC'}):
                    # Schlesinger's rule
                    if _get_at(current + 2) == 'H':
                        # dutch origin, e.g. 'school', 'schooner'
                        if _string_at(
                            (current + 3),
                            2,
                            {'OO', 'ER', 'EN', 'UY', 'ED', 'EM'},
                        ):
                            # 'schermerhorn', 'schenker'
                            if _string_at((current + 3), 2, {'ER', 'EN'}):
                                primary, secondary = _metaph_add('X', 'SK')
                            else:
                                primary, secondary = _metaph_add('SK')
                            current += 3
                            continue
                        else:
                            if (
                                (current == 0)
                                and not _is_vowel(3)
                                and (_get_at(3) != 'W')
                            ):
                                primary, secondary = _metaph_add('X', 'S')
                            else:
                                primary, secondary = _metaph_add('X')
                            current += 3
                            continue

                    elif _string_at((current + 2), 1, {'I', 'E', 'Y'}):
                        primary, secondary = _metaph_add('S')
                        current += 3
                        continue

                    # else
                    else:
                        primary, secondary = _metaph_add('SK')
                        current += 3
                        continue

                else:
                    # french e.g. 'resnais', 'artois'
                    if (current == last) and _string_at(
                        (current - 2), 2, {'AI', 'OI'}
                    ):
                        primary, secondary = _metaph_add('', 'S')
                    else:
                        primary, secondary = _metaph_add('S')

                    if _string_at((current + 1), 1, {'S', 'Z'}):
                        current += 2
                    else:
                        current += 1
                    continue

            elif _get_at(current) == 'T':
                if _string_at(current, 4, {'TION'}):
                    primary, secondary = _metaph_add('X')
                    current += 3
                    continue

                elif _string_at(current, 3, {'TIA', 'TCH'}):
                    primary, secondary = _metaph_add('X')
                    current += 3
                    continue

                elif _string_at(current, 2, {'TH'}) or _string_at(
                    current, 3, {'TTH'}
                ):
                    # special case 'thomas', 'thames' or germanic
                    if (
                        _string_at((current + 2), 2, {'OM', 'AM'})
                        or _string_at(0, 4, {'VAN ', 'VON '})
                        or _string_at(0, 3, {'SCH'})
                    ):
                        primary, secondary = _metaph_add('T')
                    else:
                        primary, secondary = _metaph_add('0', 'T')
                    current += 2
                    continue

                elif _string_at((current + 1), 1, {'T', 'D'}):
                    current += 2
                else:
                    current += 1
                primary, secondary = _metaph_add('T')
                continue

            elif _get_at(current) == 'V':
                if _get_at(current + 1) == 'V':
                    current += 2
                else:
                    current += 1
                primary, secondary = _metaph_add('F')
                continue

            elif _get_at(current) == 'W':
                # can also be in middle of word
                if _string_at(current, 2, {'WR'}):
                    primary, secondary = _metaph_add('R')
                    current += 2
                    continue
                elif (current == 0) and (
                    _is_vowel(current + 1) or _string_at(current, 2, {'WH'})
                ):
                    # Wasserman should match Vasserman
                    if _is_vowel(current + 1):
                        primary, secondary = _metaph_add('A', 'F')
                    else:
                        # need Uomo to match Womo
                        primary, secondary = _metaph_add('A')

                # Arnow should match Arnoff
                if (
                    ((current == last) and _is_vowel(current - 1))
                    or _string_at(
                        (current - 1), 5, {'EWSKI', 'EWSKY', 'OWSKI', 'OWSKY'}
                    )
                    or _string_at(0, 3, ['SCH'])
                ):
                    primary, secondary = _metaph_add('', 'F')
                    current += 1
                    continue
                # Polish e.g. 'filipowicz'
                elif _string_at(current, 4, {'WICZ', 'WITZ'}):
                    primary, secondary = _metaph_add('TS', 'FX')
                    current += 4
                    continue
                # else skip it
                else:
                    current += 1
                    continue

            elif _get_at(current) == 'X':
                # French e.g. breaux
                if not (
                    (current == last)
                    and (
                        _string_at((current - 3), 3, {'IAU', 'EAU'})
                        or _string_at((current - 2), 2, {'AU', 'OU'})
                    )
                ):
                    primary, secondary = _metaph_add('KS')

                if _string_at((current + 1), 1, {'C', 'X'}):
                    current += 2
                else:
                    current += 1
                continue

            elif _get_at(current) == 'Z':
                # Chinese Pinyin e.g. 'zhao'
                if _get_at(current + 1) == 'H':
                    primary, secondary = _metaph_add('J')
                    current += 2
                    continue
                elif _string_at((current + 1), 2, {'ZO', 'ZI', 'ZA'}) or (
                    _slavo_germanic()
                    and ((current > 0) and _get_at(current - 1) != 'T')
                ):
                    primary, secondary = _metaph_add('S', 'TS')
                else:
                    primary, secondary = _metaph_add('S')

                if _get_at(current + 1) == 'Z':
                    current += 2
                else:
                    current += 1
                continue

            else:
                current += 1

        if max_length > 0:
            primary = primary[:max_length]
            secondary = secondary[:max_length]
        if primary == secondary:
            secondary = ''

        return primary, secondary


def double_metaphone(word, max_length=-1):
    """Return the Double Metaphone code for a word.

    Based on Lawrence Philips' (Visual) C++ code from 1999
    :cite:`Philips:2000`.

    Args:
        word (str): The word to transform
        max_length (int): The maximum length of the returned Double
            Metaphone codes (defaults to -1, which leaves the codes
            untruncated; Philips' original implementation used 4)

    Returns:
        tuple: The Double Metaphone value(s)

    Examples:
        >>> double_metaphone('Christopher')
        ('KRSTFR', '')
        >>> double_metaphone('Niall')
        ('NL', '')
        >>> double_metaphone('Smith')
        ('SM0', 'XMT')
        >>> double_metaphone('Schmidt')
        ('XMT', 'SMT')

    """
    return DoubleMetaphone().encode(word, max_length)


if __name__ == '__main__':
    import doctest

    doctest.testmod()
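
One possible way to act on the Long Method advice is to move the repetitive single-letter branches out of encode() and into a lookup table. The sketch below is not Abydos code: SIMPLE_LETTERS and encode_simple_letter are hypothetical names, the input is assumed to be upper-cased already, and only the F, K, N, Q and V branches from the listing are covered, since each of them emits one fixed code and skips a doubled letter.

```python
# Hypothetical helper, not part of Abydos: a table-driven version of the
# F, K, N, Q and V branches shown in the listing.
SIMPLE_LETTERS = {
    'F': 'F',
    'K': 'K',
    'N': 'N',
    'Q': 'K',
    'V': 'F',
}


def encode_simple_letter(word, current):
    """Return (code, next_position) for a simple letter, or None.

    Mirrors the listing: emit the mapped code, then advance by 2 if the
    letter is doubled and by 1 otherwise. `word` is assumed upper-case.
    """
    letter = word[current] if current < len(word) else ''
    if letter not in SIMPLE_LETTERS:
        # Not one of the simple letters; the caller falls back to the
        # letter-specific rules.
        return None
    following = word[current + 1] if current + 1 < len(word) else ''
    step = 2 if following == letter else 1
    return SIMPLE_LETTERS[letter], current + step
```

Inside the main loop, a caller could try this helper first and fall through to the remaining elif chain when it returns None. That would remove five branches from encode() without changing the codes they produce.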