Completed
Pull Request — master (#138)
by Chris
14:20
created

abydos.phonetic._dolby.Dolby.encode()   F

Complexity

Conditions 32

Size

Total Lines 188
Code Lines 94

Duplication

Lines 0
Ratio 0 %

Code Coverage

Tests 81
CRAP Score 32

Importance

Changes 0
Metric Value
eloc 94
dl 0
loc 188
ccs 81
cts 81
cp 1
rs 0
c 0
b 0
f 0
cc 32
nop 5
crap 32

How to fix

Long Method

Small methods make your code easier to understand, particularly when they have good names. A small method is also usually much easier to name well.

For example, if you find yourself adding comments to a method's body, this is usually a good sign to extract the commented part to a new method, and use the comment as a starting point when coming up with a good name for this new method.

The refactoring most commonly applied here is Extract Method.
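As a minimal sketch of Extract Method applied to this listing, the block commented "Rule 1 (FL2)" in encode() could move into its own function, with the comment serving as the starting point for the name. The name normalize_mc_prefix is hypothetical and not part of Abydos; the body is transcribed from the rule as it appears in the listing.

```python
def normalize_mc_prefix(word):
    """Apply Rule 1 (FL2): rewrite a leading MCG/MAG/MAC or MC as MK.

    Hypothetical helper name; the logic mirrors the Rule 1 block in encode().
    The word is assumed to be uppercased and filtered to A-Z already.
    """
    if word[:3] in {'MCG', 'MAG', 'MAC'}:
        return 'MK' + word[3:]
    if word[:2] == 'MC':
        return 'MK' + word[2:]
    return word
```

Inside encode(), the four-line conditional would then shrink to a single call such as word = normalize_mc_prefix(word), which removes two branches from the method's condition count.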

Complexity

Complex methods like abydos.phonetic._dolby.Dolby.encode() often do a lot of different things. To break such a method down, we need to identify a cohesive component within it, or within the class that contains it. A common approach to finding such a component is to look for fields or methods that share the same prefixes or suffixes.

Once you have determined the fields that belong together, you can apply the Extract Class refactoring. If the component makes sense as a sub-class, Extract Subclass is also a candidate, and is often faster.
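One way to picture Extract Class for this listing, offered as a sketch rather than as the project's design: the fixed-length rules (FL13 to FL16) at the end of encode() depend only on code, max_length, keep_vowels and vowel_char, so they form a cohesive component. The class name FixedLengthTrimmer and its method names are hypothetical; the logic is transcribed from the listing and assumes max_length is greater than 0, as in the original guard.

```python
class FixedLengthTrimmer(object):
    """Hypothetical helper holding rules FL13-FL16 from the listing.

    Assumes max_length > 0 (the original code only runs these rules then).
    """

    def __init__(self, max_length, keep_vowels=False, vowel_char='*'):
        self.max_length = max_length
        self.keep_vowels = keep_vowels
        self.vowel_char = vowel_char

    def _keep_first_markers(self, code, keep):
        """Return code with only its first `keep` vowel markers retained."""
        out = ''
        for char in code:
            if char == self.vowel_char:
                if keep:
                    out += char
                    keep -= 1
            else:
                out += char
        return out

    def trim(self, code):
        """Return code trimmed and space-padded to exactly max_length."""
        # Rule FL13: drop a final S from an over-long code
        if len(code) > self.max_length and code[-1:] == 'S':
            code = code[:-1]
        if self.keep_vowels:
            code = code[: self.max_length]
        else:
            # Rule FL14
            code = code[: self.max_length + 2]
            # Rule FL15: remove vowel markers from the right, then truncate
            while len(code) > self.max_length:
                vowels = len(code) - self.max_length
                excess = vowels - 1
                code = self._keep_first_markers(code, vowels)
                code = code[: self.max_length + excess]
        # Rule FL16: pad with spaces
        return code + ' ' * (self.max_length - len(code))
```

With this split, the deepest nesting sits in a small helper instead of in encode(), which is one possible way to address a nesting-depth finding; whether a separate class or a private method is the better home is a design choice the listing does not settle.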

# -*- coding: utf-8 -*-

# Copyright 2018 by Christopher C. Little.
# This file is part of Abydos.
#
# Abydos is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# Abydos is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with Abydos. If not, see <http://www.gnu.org/licenses/>.

"""abydos.phonetic._dolby.

The phonetic._dolby module implements the Dolby Code algorithm.
"""

from __future__ import unicode_literals

from unicodedata import normalize as unicode_normalize

from six import text_type

from ._phonetic import Phonetic

__all__ = ['Dolby', 'dolby']


# Reported issue (Unused Code): The variable __class__ seems to be unused.
class Dolby(Phonetic):
    """Dolby Code.

    This follows "A Spelling Equivalent Abbreviation Algorithm For Personal
    Names" from :cite:`Dolby:1970` and :cite:`Cunningham:1969`.
    """

    # Reported issue (Bug): Parameters differ from overridden 'encode' method
    def encode(self, word, max_length=-1, keep_vowels=False, vowel_char='*'):
        r"""Return the Dolby Code of a name.

        :param word: the word to encode
        :param max_length: maximum length of the returned Dolby code -- this
            also activates the fixed-length code mode if it is greater than 0
        :param keep_vowels: if True, retains all vowel markers
        :param vowel_char: the vowel marker character (default to \*)
        :returns: the Dolby Code
        :rtype: str

        >>> pe = Dolby()
        >>> pe.encode('Hansen')
        'H*NSN'
        >>> pe.encode('Larsen')
        'L*RSN'
        >>> pe.encode('Aagaard')
        '*GR'
        >>> pe.encode('Braaten')
        'BR*DN'
        >>> pe.encode('Sandvik')
        'S*NVK'
        >>> pe.encode('Hansen', max_length=6)
        'H*NS*N'
        >>> pe.encode('Larsen', max_length=6)
        'L*RS*N'
        >>> pe.encode('Aagaard', max_length=6)
        '*G*R  '
        >>> pe.encode('Braaten', max_length=6)
        'BR*D*N'
        >>> pe.encode('Sandvik', max_length=6)
        'S*NF*K'

        >>> pe.encode('Smith')
        'SM*D'
        >>> pe.encode('Waters')
        'W*DRS'
        >>> pe.encode('James')
        'J*MS'
        >>> pe.encode('Schmidt')
        'SM*D'
        >>> pe.encode('Ashcroft')
        '*SKRFD'
        >>> pe.encode('Smith', max_length=6)
        'SM*D  '
        >>> pe.encode('Waters', max_length=6)
        'W*D*RS'
        >>> pe.encode('James', max_length=6)
        'J*M*S '
        >>> pe.encode('Schmidt', max_length=6)
        'SM*D  '
        >>> pe.encode('Ashcroft', max_length=6)
        '*SKRFD'
        """
        # uppercase, normalize, decompose, and filter non-A-Z out
        word = unicode_normalize('NFKD', text_type(word.upper()))
        word = word.replace('ß', 'SS')
        word = ''.join(c for c in word if c in self._uc_set)

        # Rule 1 (FL2)
        if word[:3] in {'MCG', 'MAG', 'MAC'}:
            word = 'MK' + word[3:]
        elif word[:2] == 'MC':
            word = 'MK' + word[2:]

        # Rule 2 (FL3)
        pos = len(word) - 2
        while pos > -1:
            # Reported issue (Coding Style), raised once for each of the ten
            # set elements below: Wrong hanging indentation before block
            # (add 4 spaces).
            if word[pos : pos + 2] in {
                'DT',
                'LD',
                'ND',
                'NT',
                'RC',
                'RD',
                'RT',
                'SC',
                'SK',
                'ST',
            }:
                word = word[: pos + 1] + word[pos + 2 :]
                pos += 1
            pos -= 1

        # Rule 3 (FL4)
        # Although the rule indicates "after the first letter", the test cases
        # make it clear that these apply to the first letter also.
        word = word.replace('X', 'KS')
        word = word.replace('CE', 'SE')
        word = word.replace('CI', 'SI')
        word = word.replace('CY', 'SI')

        # not in the rule set, but they seem to have intended it
        word = word.replace('TCH', 'CH')

        pos = word.find('CH', 1)
        while pos != -1:
            if word[pos - 1 : pos] not in self._uc_vy_set:
                word = word[:pos] + 'S' + word[pos + 1 :]
            pos = word.find('CH', pos + 1)

        word = word.replace('C', 'K')
        word = word.replace('Z', 'S')

        word = word.replace('WR', 'R')
        word = word.replace('DG', 'G')
        word = word.replace('QU', 'K')
        word = word.replace('T', 'D')
        word = word.replace('PH', 'F')

        # Rule 4 (FL5)
        # Although the rule indicates "after the first letter", the test cases
        # make it clear that these apply to the first letter also.
        pos = word.find('K', 0)
        while pos != -1:
            # Reported issue (Coding Style), raised once for each of the three
            # set elements below: Wrong hanging indentation before block
            # (add 4 spaces).
            if pos > 1 and word[pos - 1 : pos] not in self._uc_vy_set | {
                'L',
                'N',
                'R',
            }:
                word = word[: pos - 1] + word[pos:]
                pos -= 1
            pos = word.find('K', pos + 1)

        # Rule FL6
        if max_length > 0 and word[-1:] == 'E':
            word = word[:-1]

        # Rule 5 (FL7)
        word = self._delete_consecutive_repeats(word)

        # Rule 6 (FL8)
        if word[:2] == 'PF':
            word = word[1:]
        if word[-2:] == 'PF':
            word = word[:-1]
        elif word[-2:] == 'GH':
            if word[-3:-2] in self._uc_vy_set:
                word = word[:-2] + 'F'
            else:
                word = word[:-2] + 'G'
        word = word.replace('GH', '')

        # Rule FL9
        if max_length > 0:
            word = word.replace('V', 'F')

        # Rules 7-9 (FL10-FL12)
        first = 1 + (1 if max_length > 0 else 0)
        code = ''
        for pos, char in enumerate(word):
            if char in self._uc_vy_set:
                if first or keep_vowels:
                    code += vowel_char
                    first -= 1
            elif pos > 0 and char in {'W', 'H'}:
                continue
            else:
                code += char

        # Reported issue (unused-code): Too many nested blocks (6/5)
        if max_length > 0:
            # Rule FL13
            if len(code) > max_length and code[-1:] == 'S':
                code = code[:-1]
            if keep_vowels:
                code = code[:max_length]
            else:
                # Rule FL14
                code = code[: max_length + 2]
                # Rule FL15
                while len(code) > max_length:
                    vowels = len(code) - max_length
                    excess = vowels - 1
                    word = code
                    code = ''
                    for char in word:
                        if char == vowel_char:
                            if vowels:
                                code += char
                                vowels -= 1
                        else:
                            code += char
                    code = code[: max_length + excess]

            # Rule FL16
            code += ' ' * (max_length - len(code))

        return code


def dolby(word, max_length=-1, keep_vowels=False, vowel_char='*'):
    r"""Return the Dolby Code of a name.

    This follows "A Spelling Equivalent Abbreviation Algorithm For Personal
    Names" from :cite:`Dolby:1970` and :cite:`Cunningham:1969`.

    :param word: the word to encode
    :param max_length: maximum length of the returned Dolby code -- this also
        activates the fixed-length code mode if it is greater than 0
    :param keep_vowels: if True, retains all vowel markers
    :param vowel_char: the vowel marker character (default to \*)
    :returns: the Dolby Code
    :rtype: str

    >>> dolby('Hansen')
    'H*NSN'
    >>> dolby('Larsen')
    'L*RSN'
    >>> dolby('Aagaard')
    '*GR'
    >>> dolby('Braaten')
    'BR*DN'
    >>> dolby('Sandvik')
    'S*NVK'
    >>> dolby('Hansen', max_length=6)
    'H*NS*N'
    >>> dolby('Larsen', max_length=6)
    'L*RS*N'
    >>> dolby('Aagaard', max_length=6)
    '*G*R  '
    >>> dolby('Braaten', max_length=6)
    'BR*D*N'
    >>> dolby('Sandvik', max_length=6)
    'S*NF*K'

    >>> dolby('Smith')
    'SM*D'
    >>> dolby('Waters')
    'W*DRS'
    >>> dolby('James')
    'J*MS'
    >>> dolby('Schmidt')
    'SM*D'
    >>> dolby('Ashcroft')
    '*SKRFD'
    >>> dolby('Smith', max_length=6)
    'SM*D  '
    >>> dolby('Waters', max_length=6)
    'W*D*RS'
    >>> dolby('James', max_length=6)
    'J*M*S '
    >>> dolby('Schmidt', max_length=6)
    'SM*D  '
    >>> dolby('Ashcroft', max_length=6)
    '*SKRFD'
    """
    return Dolby().encode(word, max_length, keep_vowels, vowel_char)


if __name__ == '__main__':
    import doctest

    doctest.testmod()
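
The "Parameters differ from overridden 'encode' method" finding is raised because Dolby.encode() accepts options that the overridden method apparently does not. One possible way to align the two, sketched here under the assumption that the base class declares encode(self, word), is to move the options into the constructor. The classes PhoneticBase and DolbyLike below are hypothetical stand-ins, not the Abydos classes, and the encoding rules are replaced by a stub so that only the signature shape is illustrated.

```python
class PhoneticBase(object):
    """Stand-in base class; assumed to declare encode(self, word)."""

    def encode(self, word):
        raise NotImplementedError


class DolbyLike(PhoneticBase):
    """Stand-in subclass: options live on the instance, not in encode()."""

    def __init__(self, max_length=-1, keep_vowels=False, vowel_char='*'):
        self._max_length = max_length
        self._keep_vowels = keep_vowels
        self._vowel_char = vowel_char

    def encode(self, word):
        # Same parameters as PhoneticBase.encode, so the override matches.
        # Stub only: the real rules are omitted. The word is uppercased and,
        # in fixed-length mode, cut and space-padded to max_length so that
        # the effect of a stored option is visible.
        code = word.upper()
        if self._max_length > 0:
            code = code[: self._max_length]
            code += ' ' * (self._max_length - len(code))
        return code
```

A caller would then write DolbyLike(max_length=6).encode('Smith') instead of passing max_length on every call. The trade-off is that one instance is bound to one set of options.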