Completed
Pull Request — master (#141)
by Chris
13:24
created

DoubleMetaphone.encode()   F

Complexity

Conditions 219

Size

Total Lines 918
Code Lines 549

Duplication

Lines 0
Ratio 0 %

Code Coverage

Tests 407
CRAP Score 219

Importance

Changes 0
Metric Value
cc 219
eloc 549
nop 3
dl 0
loc 918
ccs 407
cts 407
cp 1
crap 219
rs 0
c 0
b 0
f 0
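
The CRAP score above equals the condition count (219) because coverage is complete (cp 1). A minimal sketch of why, assuming the tool applies the commonly cited Change Risk Anti-Patterns formula (complexity squared times the cube of the uncovered fraction, plus complexity); the function name `crap_score` is hypothetical and not part of the report:

```python
def crap_score(complexity, coverage):
    """Return comp**2 * (1 - cov)**3 + comp.

    `coverage` is a fraction in [0, 1]; this formula is an assumption
    about how the report derives its "crap" metric.
    """
    if not 0.0 <= coverage <= 1.0:
        raise ValueError('coverage must be between 0 and 1')
    return complexity ** 2 * (1.0 - coverage) ** 3 + complexity


# With full coverage the cubic term vanishes and only complexity remains.
print(crap_score(219, 1.0))
```

Under this formula, tests cannot push the score below the complexity itself, so only splitting the method up would lower it.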

How to fix: Long Method, Complexity

Long Method

Small methods make your code easier to understand, in particular if combined with a good name. Besides, if your method is small, finding a good name is usually much easier.

For example, if you find yourself adding comments to a method's body, this is usually a good sign to extract the commented part to a new method, and use the comment as a starting point when coming up with a good name for this new method.

Commonly applied refactorings include:

Complexity

Complex units like abydos.phonetic._double_metaphone.DoubleMetaphone.encode() often do a lot of different things. To break such a unit down, we need to identify a cohesive component within it. A common approach to finding such a component is to look for fields/methods that share the same prefixes or suffixes.

Once you have determined the fields that belong together, you can apply the Extract Class refactoring. If the component makes sense as a sub-class, Extract Subclass is also a candidate, and is often faster.
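
In this listing, one candidate component is the group of nested helpers that all read the same padded `word` (`_get_at`, `_is_vowel`, `_string_at`, `_slavo_germanic`). A minimal sketch of Extract Class for them, assuming a hypothetical class name `PaddedWord`; it is an illustration only and does not reproduce every detail of `encode()` (for example, the explicit 'ß' replacement is omitted):

```python
class PaddedWord(object):
    """Hypothetical helper that owns the upper-cased, space-padded word."""

    _VOWELS = frozenset('AEIOUY')

    def __init__(self, word, pad=5):
        upper = word.upper()
        self.length = len(upper)
        # Pad so that lookups slightly past the end return spaces.
        self.text = upper + ' ' * pad

    def get_at(self, pos):
        """Return the character at the given position."""
        return self.text[pos]

    def is_vowel(self, pos):
        """Return True if the character at pos is a vowel."""
        return pos >= 0 and self.text[pos] in self._VOWELS

    def string_at(self, pos, slen, substrings):
        """Return True if text[pos:pos+slen] is one of substrings."""
        if pos < 0:
            return False
        return self.text[pos : pos + slen] in substrings

    def slavo_germanic(self):
        """Return True if the word appears to be Slavic or Germanic."""
        return any(s in self.text for s in ('W', 'K', 'CZ'))


w = PaddedWord('Czerny')
print(w.string_at(0, 2, {'CZ'}), w.slavo_germanic())
```

With the lookups moved out, the per-letter branches could in turn become separate methods that take such an object, which is where most of the 219 conditions live.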

# -*- coding: utf-8 -*-

# Copyright 2014-2018 by Christopher C. Little.
# This file is part of Abydos.
#
# Abydos is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# Abydos is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with Abydos. If not, see <http://www.gnu.org/licenses/>.

"""abydos.phonetic._double_metaphone.

Double Metaphone
"""

from __future__ import (
    absolute_import,
    division,
    print_function,
    unicode_literals,
)

from ._phonetic import Phonetic

__all__ = ['DoubleMetaphone', 'double_metaphone']


class DoubleMetaphone(Phonetic):
Unused Code: The variable __class__ seems to be unused.
    """Double Metaphone.

    Based on Lawrence Philips' (Visual) C++ code from 1999
    :cite:`Philips:2000`.
    """

    def encode(self, word, max_length=-1):
Bug: Parameters differ from overridden 'encode' method
        """Return the Double Metaphone code for a word.

        Parameters
        ----------
        word : str
            The word to transform
        max_length : int
            The maximum length of the returned Double Metaphone codes (defaults
            to 64, but in Philips' original implementation this was 4)

        Returns
        -------
        tuple
            The Double Metaphone value(s)

        Examples
        --------
        >>> pe = DoubleMetaphone()
        >>> pe.encode('Christopher')
        ('KRSTFR', '')
        >>> pe.encode('Niall')
        ('NL', '')
        >>> pe.encode('Smith')
        ('SM0', 'XMT')
        >>> pe.encode('Schmidt')
        ('XMT', 'SMT')

        """
        # Require a max_length of at least 4
        if max_length != -1:
            max_length = max(4, max_length)
        else:
            max_length = 64

        primary = ''
        secondary = ''

        def _slavo_germanic():
            """Return True if the word appears to be Slavic or Germanic.

            Returns
            -------
            bool
                True if the word appears to be Slavic or Germanic

            """
            if 'W' in word or 'K' in word or 'CZ' in word:
                return True
            return False

        def _metaph_add(pri, sec=''):
            """Return a new metaphone tuple with the supplied elements.

            Parameters
            ----------
            pri : str
                The primary element
            sec : str
                The secondary element

            Returns
            -------
            tuple
                A new metaphone tuple with the supplied elements

            """
            newpri = primary
            newsec = secondary
            if pri:
                newpri += pri
            if sec:
                if sec != ' ':
                    newsec += sec
            else:
                newsec += pri
            return newpri, newsec

        def _is_vowel(pos):
            """Return True if the character at word[pos] is a vowel.

            Parameters
            ----------
            pos : int
                Position in the word

            Returns
            -------
            bool
                True if the character is a vowel

            """
            if pos >= 0 and word[pos] in {'A', 'E', 'I', 'O', 'U', 'Y'}:
                return True
            return False

        def _get_at(pos):
            """Return the character at word[pos].

            Parameters
            ----------
            pos : int
                Position in the word

            Returns
            -------
            str
                Character at word[pos]

            """
            return word[pos]

        def _string_at(pos, slen, substrings):
            """Return True if word[pos:pos+slen] is in substrings.

            Parameters
            ----------
            pos : int
                Position in the word
            slen : int
                Substring length
            substrings : set
                Substrings to search

            Returns
            -------
            bool
                True if word[pos:pos+slen] is in substrings

            """
            if pos < 0:
                return False
            return word[pos : pos + slen] in substrings

        current = 0
        length = len(word)
        if length < 1:
            return '', ''
        last = length - 1

        word = word.upper()
        word = word.replace('ß', 'SS')

        # Pad the original string so that we can index beyond the edge of the
        # world
        word += '     '

        # Skip these when at start of word
        if word[0:2] in {'GN', 'KN', 'PN', 'WR', 'PS'}:
            current += 1

        # Initial 'X' is pronounced 'Z' e.g. 'Xavier'
        if _get_at(0) == 'X':
            primary, secondary = _metaph_add('S')  # 'Z' maps to 'S'
            current += 1

        # Main loop
        while True:
unused-code: Too many nested blocks (6/5)
            if current >= length:
                break

            if _get_at(current) in {'A', 'E', 'I', 'O', 'U', 'Y'}:
                if current == 0:
                    # All init vowels now map to 'A'
                    primary, secondary = _metaph_add('A')
                current += 1
                continue

            elif _get_at(current) == 'B':
                # "-mb", e.g., "dumb", already skipped over...
                primary, secondary = _metaph_add('P')
                if _get_at(current + 1) == 'B':
                    current += 2
                else:
                    current += 1
                continue

            elif _get_at(current) == 'Ç':
                primary, secondary = _metaph_add('S')
                current += 1
                continue

            elif _get_at(current) == 'C':
                # Various Germanic
                if (
                    current > 1
Coding Style: Wrong hanging indentation before block (add 4 spaces).
best-practice: Too many boolean expressions in if statement (6/5)
                    and not _is_vowel(current - 2)
Coding Style: Wrong hanging indentation before block (add 4 spaces).
                    and _string_at((current - 1), 3, {'ACH'})
Coding Style: Wrong hanging indentation before block (add 4 spaces).
                    and (
Coding Style: Wrong hanging indentation before block (add 4 spaces).
                        (_get_at(current + 2) != 'I')
                        and (
                            (_get_at(current + 2) != 'E')
                            or _string_at(
                                (current - 2), 6, {'BACHER', 'MACHER'}
                            )
                        )
                    )
                ):
                    primary, secondary = _metaph_add('K')
                    current += 2
                    continue

                # Special case 'caesar'
                elif current == 0 and _string_at(current, 6, {'CAESAR'}):
                    primary, secondary = _metaph_add('S')
                    current += 2
                    continue

                # Italian 'chianti'
                elif _string_at(current, 4, {'CHIA'}):
                    primary, secondary = _metaph_add('K')
                    current += 2
                    continue

                elif _string_at(current, 2, {'CH'}):
                    # Find 'Michael'
                    if current > 0 and _string_at(current, 4, {'CHAE'}):
                        primary, secondary = _metaph_add('K', 'X')
                        current += 2
                        continue

                    # Greek roots e.g. 'chemistry', 'chorus'
                    elif (
                        current == 0
Coding Style: Wrong hanging indentation before block (add 4 spaces).
                        and (
Coding Style: Wrong hanging indentation before block (add 4 spaces).
                            _string_at((current + 1), 5, {'HARAC', 'HARIS'})
                            or _string_at(
                                (current + 1), 3, {'HOR', 'HYM', 'HIA', 'HEM'}
                            )
                        )
                        and not _string_at(0, 5, {'CHORE'})
Coding Style: Wrong hanging indentation before block (add 4 spaces).
                    ):
                        primary, secondary = _metaph_add('K')
                        current += 2
                        continue

                    # Germanic, Greek, or otherwise 'ch' for 'kh' sound
                    elif (
                        (
Coding Style: Wrong hanging indentation before block (add 4 spaces).
best-practice: Too many boolean expressions in if statement (7/5)
                            _string_at(0, 4, {'VAN ', 'VON '})
                            or _string_at(0, 3, {'SCH'})
                        )
                        or
Coding Style: Wrong hanging indentation before block (add 4 spaces).
                        # 'architect' but not 'arch', 'orchestra', 'orchid'
Coding Style: Wrong hanging indentation before block (add 4 spaces).
                        _string_at(
Coding Style: Wrong hanging indentation before block (add 4 spaces).
                            (current - 2), 6, {'ORCHES', 'ARCHIT', 'ORCHID'}
                        )
                        or _string_at((current + 2), 1, {'T', 'S'})
Coding Style: Wrong hanging indentation before block (add 4 spaces).
                        or (
Coding Style: Wrong hanging indentation before block (add 4 spaces).
                            (
                                _string_at(
                                    (current - 1), 1, {'A', 'O', 'U', 'E'}
                                )
                                or (current == 0)
                            )
                            and
                            # e.g., 'wachtler', 'wechsler', but not 'tichner'
                            _string_at(
                                (current + 2),
                                1,
                                {
                                    'L',
                                    'R',
                                    'N',
                                    'M',
                                    'B',
                                    'H',
                                    'F',
                                    'V',
                                    'W',
                                    ' ',
                                },
                            )
                        )
                    ):
                        primary, secondary = _metaph_add('K')

                    else:
                        if current > 0:
                            if _string_at(0, 2, {'MC'}):
                                # e.g., "McHugh"
                                primary, secondary = _metaph_add('K')
                            else:
                                primary, secondary = _metaph_add('X', 'K')
                        else:
                            primary, secondary = _metaph_add('X')

                    current += 2
                    continue

                # e.g., 'czerny'
                elif _string_at(current, 2, {'CZ'}) and not _string_at(
                    (current - 2), 4, {'WICZ'}
Coding Style: Wrong hanging indentation before block (add 4 spaces).
                ):
                    primary, secondary = _metaph_add('S', 'X')
                    current += 2
                    continue

                # e.g., 'focaccia'
                elif _string_at((current + 1), 3, {'CIA'}):
                    primary, secondary = _metaph_add('X')
                    current += 3

                # double 'C', but not if e.g. 'McClellan'
                elif _string_at(current, 2, {'CC'}) and not (
                    (current == 1) and (_get_at(0) == 'M')
Coding Style: Wrong hanging indentation before block (add 4 spaces).
                ):
                    # 'bellocchio' but not 'bacchus'
                    if _string_at(
                        (current + 2), 1, {'I', 'E', 'H'}
Coding Style: Wrong hanging indentation before block (add 4 spaces).
                    ) and not _string_at((current + 2), 2, {'HU'}):
                        # 'accident', 'accede' 'succeed'
                        if (
                            (current == 1) and _get_at(current - 1) == 'A'
Coding Style: Wrong hanging indentation before block (add 4 spaces).
                        ) or _string_at((current - 1), 5, {'UCCEE', 'UCCES'}):
                            primary, secondary = _metaph_add('KS')
                        # 'bacci', 'bertucci', other italian
                        else:
                            primary, secondary = _metaph_add('X')
                        current += 3
                        continue
                    else:  # Pierce's rule
                        primary, secondary = _metaph_add('K')
                        current += 2
                        continue

                elif _string_at(current, 2, {'CK', 'CG', 'CQ'}):
                    primary, secondary = _metaph_add('K')
                    current += 2
                    continue

                elif _string_at(current, 2, {'CI', 'CE', 'CY'}):
                    # Italian vs. English
                    if _string_at(current, 3, {'CIO', 'CIE', 'CIA'}):
                        primary, secondary = _metaph_add('S', 'X')
                    else:
                        primary, secondary = _metaph_add('S')
                    current += 2
                    continue

                # else
                else:
                    primary, secondary = _metaph_add('K')

                    # name sent in 'mac caffrey', 'mac gregor'
                    if _string_at((current + 1), 2, {' C', ' Q', ' G'}):
                        current += 3
                    elif _string_at(
                        (current + 1), 1, {'C', 'K', 'Q'}
Coding Style: Wrong hanging indentation before block (add 4 spaces).
                    ) and not _string_at((current + 1), 2, {'CE', 'CI'}):
                        current += 2
                    else:
                        current += 1
                    continue

            elif _get_at(current) == 'D':
                if _string_at(current, 2, {'DG'}):
                    if _string_at((current + 2), 1, {'I', 'E', 'Y'}):
                        # e.g. 'edge'
                        primary, secondary = _metaph_add('J')
                        current += 3
                        continue
                    else:
                        # e.g. 'edgar'
                        primary, secondary = _metaph_add('TK')
                        current += 2
                        continue

                elif _string_at(current, 2, {'DT', 'DD'}):
                    primary, secondary = _metaph_add('T')
                    current += 2
                    continue

                # else
                else:
                    primary, secondary = _metaph_add('T')
                    current += 1
                    continue

            elif _get_at(current) == 'F':
                if _get_at(current + 1) == 'F':
                    current += 2
                else:
                    current += 1
                primary, secondary = _metaph_add('F')
                continue

            elif _get_at(current) == 'G':
                if _get_at(current + 1) == 'H':
                    if (current > 0) and not _is_vowel(current - 1):
                        primary, secondary = _metaph_add('K')
                        current += 2
                        continue

                    # 'ghislane', 'ghiradelli'
                    elif current == 0:
                        if _get_at(current + 2) == 'I':
                            primary, secondary = _metaph_add('J')
                        else:
                            primary, secondary = _metaph_add('K')
                        current += 2
                        continue

                    # Parker's rule (with some further refinements) -
                    # e.g., 'hugh'
                    elif (
                        (
Coding Style: Wrong hanging indentation before block (add 4 spaces).
best-practice: Too many boolean expressions in if statement (6/5)
                            (current > 1)
                            and _string_at((current - 2), 1, {'B', 'H', 'D'})
                        )
                        or
Coding Style: Wrong hanging indentation before block (add 4 spaces).
                        # e.g., 'bough'
Coding Style: Wrong hanging indentation before block (add 4 spaces).
                        (
Coding Style: Wrong hanging indentation before block (add 4 spaces).
                            (current > 2)
                            and _string_at((current - 3), 1, {'B', 'H', 'D'})
                        )
                        or
Coding Style: Wrong hanging indentation before block (add 4 spaces).
                        # e.g., 'broughton'
Coding Style: Wrong hanging indentation before block (add 4 spaces).
                        (
Coding Style: Wrong hanging indentation before block (add 4 spaces).
                            (current > 3)
                            and _string_at((current - 4), 1, {'B', 'H'})
                        )
                    ):
                        current += 2
                        continue
                    else:
                        # e.g. 'laugh', 'McLaughlin', 'cough',
                        #      'gough', 'rough', 'tough'
                        if (
                            (current > 2)
Coding Style: Wrong hanging indentation before block (add 4 spaces).
                            and (_get_at(current - 1) == 'U')
Coding Style: Wrong hanging indentation before block (add 4 spaces).
                            and (
Coding Style: Wrong hanging indentation before block (add 4 spaces).
                                _string_at(
                                    (current - 3), 1, {'C', 'G', 'L', 'R', 'T'}
                                )
                            )
                        ):
                            primary, secondary = _metaph_add('F')
                        elif (current > 0) and _get_at(current - 1) != 'I':
                            primary, secondary = _metaph_add('K')
                        current += 2
                        continue

                elif _get_at(current + 1) == 'N':
                    if (
                        (current == 1)
Coding Style: Wrong hanging indentation before block (add 4 spaces).
                        and _is_vowel(0)
Coding Style: Wrong hanging indentation before block (add 4 spaces).
                        and not _slavo_germanic()
Coding Style: Wrong hanging indentation before block (add 4 spaces).
                    ):
                        primary, secondary = _metaph_add('KN', 'N')
                    # not e.g. 'cagney'
                    elif (
                        not _string_at((current + 2), 2, {'EY'})
Coding Style: Wrong hanging indentation before block (add 4 spaces).
                        and (_get_at(current + 1) != 'Y')
Coding Style: Wrong hanging indentation before block (add 4 spaces).
                        and not _slavo_germanic()
Coding Style: Wrong hanging indentation before block (add 4 spaces).
                    ):
                        primary, secondary = _metaph_add('N', 'KN')
                    else:
                        primary, secondary = _metaph_add('KN')
                    current += 2
                    continue

                # 'tagliaro'
                elif (
                    _string_at((current + 1), 2, {'LI'})
Coding Style: Wrong hanging indentation before block (add 4 spaces).
                    and not _slavo_germanic()
Coding Style: Wrong hanging indentation before block (add 4 spaces).
                ):
                    primary, secondary = _metaph_add('KL', 'L')
                    current += 2
                    continue

                # -ges-, -gep-, -gel-, -gie- at beginning
                elif (current == 0) and (
                    (_get_at(current + 1) == 'Y')
Coding Style: Wrong hanging indentation before block (add 4 spaces).
                    or _string_at(
Coding Style: Wrong hanging indentation before block (add 4 spaces).
                        (current + 1),
                        2,
                        {
                            'ES',
                            'EP',
                            'EB',
                            'EL',
                            'EY',
                            'IB',
                            'IL',
                            'IN',
                            'IE',
                            'EI',
                            'ER',
                        },
                    )
                ):
                    primary, secondary = _metaph_add('K', 'J')
                    current += 2
                    continue

                #  -ger-,  -gy-
                elif (
                    (
Coding Style: Wrong hanging indentation before block (add 4 spaces).
                        _string_at((current + 1), 2, {'ER'})
                        or (_get_at(current + 1) == 'Y')
                    )
                    and not _string_at(0, 6, {'DANGER', 'RANGER', 'MANGER'})
Coding Style: Wrong hanging indentation before block (add 4 spaces).
                    and not _string_at((current - 1), 1, {'E', 'I'})
Coding Style: Wrong hanging indentation before block (add 4 spaces).
                    and not _string_at((current - 1), 3, {'RGY', 'OGY'})
Coding Style: Wrong hanging indentation before block (add 4 spaces).
                ):
                    primary, secondary = _metaph_add('K', 'J')
                    current += 2
                    continue

                #  italian e.g., 'biaggi'
                elif _string_at(
                    (current + 1), 1, {'E', 'I', 'Y'}
Coding Style: Wrong hanging indentation before block (add 4 spaces).
                ) or _string_at((current - 1), 4, {'AGGI', 'OGGI'}):
                    # obvious germanic
                    if (
                        _string_at(0, 4, {'VAN ', 'VON '})
Coding Style: Wrong hanging indentation before block (add 4 spaces).
                        or _string_at(0, 3, {'SCH'})
Coding Style: Wrong hanging indentation before block (add 4 spaces).
                    ) or _string_at((current + 1), 2, {'ET'}):
                        primary, secondary = _metaph_add('K')
                    elif _string_at((current + 1), 4, {'IER '}):
                        primary, secondary = _metaph_add('J')
                    else:
                        primary, secondary = _metaph_add('J', 'K')
                    current += 2
                    continue

                else:
                    if _get_at(current + 1) == 'G':
                        current += 2
                    else:
                        current += 1
                    primary, secondary = _metaph_add('K')
                    continue

            elif _get_at(current) == 'H':
                # only keep if first & before vowel or btw. 2 vowels
                if ((current == 0) or _is_vowel(current - 1)) and _is_vowel(
                    current + 1
Coding Style: Wrong hanging indentation before block (add 4 spaces).
                ):
                    primary, secondary = _metaph_add('H')
                    current += 2
                else:  # also takes care of 'HH'
                    current += 1
                continue

            elif _get_at(current) == 'J':
                # obvious spanish, 'jose', 'san jacinto'
                if _string_at(current, 4, {'JOSE'}) or _string_at(
                    0, 4, {'SAN '}
Coding Style: Wrong hanging indentation before block (add 4 spaces).
                ):
                    if (
                        (current == 0) and (_get_at(current + 4) == ' ')
Coding Style: Wrong hanging indentation before block (add 4 spaces).
                    ) or _string_at(0, 4, {'SAN '}):
                        primary, secondary = _metaph_add('H')
                    else:
                        primary, secondary = _metaph_add('J', 'H')
                    current += 1
                    continue

                elif (current == 0) and not _string_at(current, 4, {'JOSE'}):
                    # Yankelovich/Jankelowicz
                    primary, secondary = _metaph_add('J', 'A')
                # Spanish pron. of e.g. 'bajador'
                elif (
                    _is_vowel(current - 1)
Coding Style: Wrong hanging indentation before block (add 4 spaces).
                    and not _slavo_germanic()
Coding Style: Wrong hanging indentation before block (add 4 spaces).
                    and (
Coding Style: Wrong hanging indentation before block (add 4 spaces).
                        (_get_at(current + 1) == 'A')
                        or (_get_at(current + 1) == 'O')
                    )
                ):
                    primary, secondary = _metaph_add('J', 'H')
                elif current == last:
                    primary, secondary = _metaph_add('J', ' ')
                elif not _string_at(
                    (current + 1), 1, {'L', 'T', 'K', 'S', 'N', 'M', 'B', 'Z'}
Coding Style: Wrong hanging indentation before block (add 4 spaces).
                ) and not _string_at((current - 1), 1, {'S', 'K', 'L'}):
                    primary, secondary = _metaph_add('J')

                if _get_at(current + 1) == 'J':  # it could happen!
                    current += 2
                else:
                    current += 1
                continue

            elif _get_at(current) == 'K':
                if _get_at(current + 1) == 'K':
                    current += 2
                else:
                    current += 1
                primary, secondary = _metaph_add('K')
                continue

            elif _get_at(current) == 'L':
                if _get_at(current + 1) == 'L':
                    # Spanish e.g. 'cabrillo', 'gallegos'
                    if (
                        (current == (length - 3))
                        and _string_at(
                            (current - 1), 4, {'ILLO', 'ILLA', 'ALLE'}
                        )
                    ) or (
                        (
                            _string_at((last - 1), 2, {'AS', 'OS'})
                            or _string_at(last, 1, {'A', 'O'})
                        )
                        and _string_at((current - 1), 4, {'ALLE'})
                    ):
                        primary, secondary = _metaph_add('L', ' ')
                        current += 2
                        continue
                    current += 2
                else:
                    current += 1
                primary, secondary = _metaph_add('L')
                continue

            elif _get_at(current) == 'M':
                if (
                    (
                        _string_at((current - 1), 3, {'UMB'})
                        and (
                            ((current + 1) == last)
                            or _string_at((current + 2), 2, {'ER'})
                        )
                    )
                    or
                    # 'dumb', 'thumb'
                    (_get_at(current + 1) == 'M')
                ):
                    current += 2
                else:
                    current += 1
                primary, secondary = _metaph_add('M')
                continue

            elif _get_at(current) == 'N':
                if _get_at(current + 1) == 'N':
                    current += 2
                else:
                    current += 1
                primary, secondary = _metaph_add('N')
                continue

            elif _get_at(current) == 'Ñ':
                current += 1
                primary, secondary = _metaph_add('N')
                continue

            elif _get_at(current) == 'P':
                if _get_at(current + 1) == 'H':
                    primary, secondary = _metaph_add('F')
                    current += 2
                    continue

                # also account for "campbell", "raspberry"
                elif _string_at((current + 1), 1, {'P', 'B'}):
                    current += 2
                else:
                    current += 1
                primary, secondary = _metaph_add('P')
                continue

            elif _get_at(current) == 'Q':
708 1
                if _get_at(current + 1) == 'Q':
709 1
                    current += 2
710
                else:
711 1
                    current += 1
712 1
                primary, secondary = _metaph_add('K')
713 1
                continue
714
            elif _get_at(current) == 'R':
                # french e.g. 'rogier', but exclude 'hochmeier'
                if (
                    (current == last)
                    and not _slavo_germanic()
                    and _string_at((current - 2), 2, {'IE'})
                    and not _string_at((current - 4), 2, {'ME', 'MA'})
                ):
                    primary, secondary = _metaph_add('', 'R')
                else:
                    primary, secondary = _metaph_add('R')

                if _get_at(current + 1) == 'R':
                    current += 2
                else:
                    current += 1
                continue

            elif _get_at(current) == 'S':
                # special cases 'island', 'isle', 'carlisle', 'carlysle'
                if _string_at((current - 1), 3, {'ISL', 'YSL'}):
                    current += 1
                    continue

                # special case 'sugar-'
                elif (current == 0) and _string_at(current, 5, {'SUGAR'}):
                    primary, secondary = _metaph_add('X', 'S')
                    current += 1
                    continue

                elif _string_at(current, 2, {'SH'}):
                    # Germanic
                    if _string_at(
                        (current + 1), 4, {'HEIM', 'HOEK', 'HOLM', 'HOLZ'}
                    ):
                        primary, secondary = _metaph_add('S')
                    else:
                        primary, secondary = _metaph_add('X')
                    current += 2
                    continue

                # Italian & Armenian
                elif _string_at(current, 3, {'SIO', 'SIA'}) or _string_at(
                    current, 4, {'SIAN'}
                ):
                    if not _slavo_germanic():
                        primary, secondary = _metaph_add('S', 'X')
                    else:
                        primary, secondary = _metaph_add('S')
                    current += 3
                    continue

                # German & anglicisations, e.g. 'smith' match 'schmidt',
                #                               'snider' match 'schneider'
                # also, -sz- in Slavic language although in Hungarian it is
                #       pronounced 's'
                elif (
                    (current == 0)
                    and _string_at((current + 1), 1, {'M', 'N', 'L', 'W'})
                ) or _string_at((current + 1), 1, {'Z'}):
                    primary, secondary = _metaph_add('S', 'X')
                    if _string_at((current + 1), 1, {'Z'}):
                        current += 2
                    else:
                        current += 1
                    continue

                elif _string_at(current, 2, {'SC'}):
                    # Schlesinger's rule
                    if _get_at(current + 2) == 'H':
                        # dutch origin, e.g. 'school', 'schooner'
                        if _string_at(
                            (current + 3),
                            2,
                            {'OO', 'ER', 'EN', 'UY', 'ED', 'EM'},
                        ):
                            # 'schermerhorn', 'schenker'
                            if _string_at((current + 3), 2, {'ER', 'EN'}):
                                primary, secondary = _metaph_add('X', 'SK')
                            else:
                                primary, secondary = _metaph_add('SK')
                            current += 3
                            continue
                        else:
                            if (
                                (current == 0)
                                and not _is_vowel(3)
                                and (_get_at(3) != 'W')
                            ):
                                primary, secondary = _metaph_add('X', 'S')
                            else:
                                primary, secondary = _metaph_add('X')
                            current += 3
                            continue

                    elif _string_at((current + 2), 1, {'I', 'E', 'Y'}):
                        primary, secondary = _metaph_add('S')
                        current += 3
                        continue

                    # else
                    else:
                        primary, secondary = _metaph_add('SK')
                        current += 3
                        continue

                else:
                    # french e.g. 'resnais', 'artois'
                    if (current == last) and _string_at(
                        (current - 2), 2, {'AI', 'OI'}
                    ):
                        primary, secondary = _metaph_add('', 'S')
                    else:
                        primary, secondary = _metaph_add('S')

                    if _string_at((current + 1), 1, {'S', 'Z'}):
                        current += 2
                    else:
                        current += 1
                    continue

            elif _get_at(current) == 'T':
                if _string_at(current, 4, {'TION'}):
                    primary, secondary = _metaph_add('X')
                    current += 3
                    continue

                elif _string_at(current, 3, {'TIA', 'TCH'}):
                    primary, secondary = _metaph_add('X')
                    current += 3
                    continue

                elif _string_at(current, 2, {'TH'}) or _string_at(
                    current, 3, {'TTH'}
                ):
                    # special case 'thomas', 'thames' or germanic
                    if (
                        _string_at((current + 2), 2, {'OM', 'AM'})
                        or _string_at(0, 4, {'VAN ', 'VON '})
                        or _string_at(0, 3, {'SCH'})
                    ):
                        primary, secondary = _metaph_add('T')
                    else:
                        primary, secondary = _metaph_add('0', 'T')
                    current += 2
                    continue

                elif _string_at((current + 1), 1, {'T', 'D'}):
                    current += 2
                else:
                    current += 1
                primary, secondary = _metaph_add('T')
                continue

            elif _get_at(current) == 'V':
                if _get_at(current + 1) == 'V':
                    current += 2
                else:
                    current += 1
                primary, secondary = _metaph_add('F')
                continue

            elif _get_at(current) == 'W':
                # can also be in middle of word
                if _string_at(current, 2, {'WR'}):
                    primary, secondary = _metaph_add('R')
                    current += 2
                    continue
                elif (current == 0) and (
                    _is_vowel(current + 1) or _string_at(current, 2, {'WH'})
                ):
                    # Wasserman should match Vasserman
                    if _is_vowel(current + 1):
                        primary, secondary = _metaph_add('A', 'F')
                    else:
                        # need Uomo to match Womo
                        primary, secondary = _metaph_add('A')

                # Arnow should match Arnoff
                if (
                    ((current == last) and _is_vowel(current - 1))
                    or _string_at(
                        (current - 1), 5, {'EWSKI', 'EWSKY', 'OWSKI', 'OWSKY'}
                    )
                    or _string_at(0, 3, {'SCH'})
                ):
                    primary, secondary = _metaph_add('', 'F')
                    current += 1
                    continue
                # Polish e.g. 'filipowicz'
                elif _string_at(current, 4, {'WICZ', 'WITZ'}):
                    primary, secondary = _metaph_add('TS', 'FX')
                    current += 4
                    continue
                # else skip it
                else:
                    current += 1
                    continue

            elif _get_at(current) == 'X':
                # French e.g. breaux
                if not (
                    (current == last)
                    and (
                        _string_at((current - 3), 3, {'IAU', 'EAU'})
                        or _string_at((current - 2), 2, {'AU', 'OU'})
                    )
                ):
                    primary, secondary = _metaph_add('KS')

                if _string_at((current + 1), 1, {'C', 'X'}):
                    current += 2
                else:
                    current += 1
                continue

            elif _get_at(current) == 'Z':
                # Chinese Pinyin e.g. 'zhao'
                if _get_at(current + 1) == 'H':
                    primary, secondary = _metaph_add('J')
                    current += 2
                    continue
                elif _string_at((current + 1), 2, {'ZO', 'ZI', 'ZA'}) or (
                    _slavo_germanic()
                    and ((current > 0) and _get_at(current - 1) != 'T')
                ):
                    primary, secondary = _metaph_add('S', 'TS')
                else:
                    primary, secondary = _metaph_add('S')

                if _get_at(current + 1) == 'Z':
                    current += 2
                else:
                    current += 1
                continue

            else:
                current += 1

        if max_length > 0:
            primary = primary[:max_length]
            secondary = secondary[:max_length]
        if primary == secondary:
            secondary = ''

        return primary, secondary


def double_metaphone(word, max_length=-1):
    """Return the Double Metaphone code for a word.

    This is a wrapper for :py:meth:`DoubleMetaphone.encode`.

    Parameters
    ----------
    word : str
        The word to transform
    max_length : int
        The maximum length of the returned Double Metaphone codes (defaults to
        64, but in Philips' original implementation this was 4)

    Returns
    -------
    tuple
        The Double Metaphone value(s)

    Examples
    --------
    >>> double_metaphone('Christopher')
    ('KRSTFR', '')
    >>> double_metaphone('Niall')
    ('NL', '')
    >>> double_metaphone('Smith')
    ('SM0', 'XMT')
    >>> double_metaphone('Schmidt')
    ('XMT', 'SMT')

    """
    return DoubleMetaphone().encode(word, max_length)


if __name__ == '__main__':
    import doctest

    doctest.testmod()
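
Several branches in the listing above (K, N, Q and V) share one shape: emit a fixed code, then advance two positions if the letter is doubled and one otherwise. As a minimal sketch of how such branches might be extracted from `encode()` to reduce its length, the following uses a lookup table. The names `SIMPLE_CODES` and `encode_simple` are hypothetical and not part of Abydos, and the sketch assumes that reading past the end of the word yields an empty string.

```python
# Hypothetical table (not part of Abydos): letter -> code emitted for it.
# The values mirror the K, N, Q and V branches of the listing.
SIMPLE_CODES = {'K': 'K', 'N': 'N', 'Q': 'K', 'V': 'F'}


def encode_simple(word, current):
    """Return (code, next_position) for a simple letter, else None.

    Assumption: a position past the end of ``word`` reads as ''.
    """
    letter = word[current]
    code = SIMPLE_CODES.get(letter)
    if code is None:
        # Not one of the simple letters; the caller handles it elsewhere.
        return None
    following = word[current + 1] if current + 1 < len(word) else ''
    # A doubled letter is consumed together with its twin.
    step = 2 if following == letter else 1
    return code, current + step
```

A caller could try `encode_simple` first and fall back to the letter-specific branches when it returns `None`. Letters with extra rules, such as P (which also checks for a following H or B), would stay in their own branches.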