Completed
Push to master (f43547...71985b) by Chris
12:00, queued 10s

abydos.phonetic._koelner.Koelner.encode()  (rating: F)

Complexity

Conditions 23

Size

Total Lines 141
Code Lines 58

Duplication

Lines 45
Ratio 31.91 %

Code Coverage

Tests 57
CRAP Score 23

Importance

Changes 0
Metric Value
cc 23
eloc 58
nop 2
dl 45
loc 141
ccs 57
cts 57
cp 1
crap 23
rs 0
c 0
b 0
f 0
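The CRAP score and the duplication ratio above can be derived from the other metrics. This is a minimal sketch. It assumes Scrutinizer uses the usual CRAP definition: complexity squared, times the cube of the uncovered fraction, plus complexity. It also assumes `cp` is a coverage fraction between 0 and 1. The helper names `crap_score` and `duplication_ratio` are illustrative and are not part of any Scrutinizer API.

```python
def crap_score(complexity: int, coverage: float) -> float:
    """CRAP = complexity^2 * (1 - coverage)^3 + complexity, coverage in [0, 1]."""
    return complexity ** 2 * (1.0 - coverage) ** 3 + complexity


def duplication_ratio(duplicated_lines: int, total_lines: int) -> float:
    """Percentage of the method's lines that are flagged as duplicated."""
    return 100.0 * duplicated_lines / total_lines


# Metrics reported above: cc = 23, cp = 1, dl = 45, loc = 141
print(crap_score(23, 1.0))                   # 23.0
print(crap_score(23, 0.0))                   # 552.0 (hypothetical: no tests)
print(round(duplication_ratio(45, 141), 2))  # 31.91
```

With full coverage the cubic term vanishes, so the CRAP score equals the cyclomatic complexity (23). An untested method of the same complexity would score 552.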

How to fix

Long Method

Small methods make your code easier to understand, in particular if combined with a good name. Besides, if your method is small, finding a good name is usually much easier.

For example, if you find yourself adding comments to a method's body, this is usually a good sign to extract the commented part to a new method, and use the comment as a starting point when coming up with a good name for this new method.
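In encode(), each multi-condition letter branch is a natural target for Extract Method. This sketch moves the 'C' rules from the listing below into a hypothetical module-level helper, `code_for_c()`. The `after()` and `before()` functions are standalone copies of the nested `_after()` and `_before()` helpers, not Abydos API.

```python
def after(word: str, pos: int, letters: str) -> bool:
    """True if word[pos] is preceded by one of the given letters."""
    return pos > 0 and word[pos - 1] in letters


def before(word: str, pos: int, letters: str) -> bool:
    """True if word[pos] is followed by one of the given letters."""
    return pos + 1 < len(word) and word[pos + 1] in letters


def code_for_c(word: str, pos: int) -> str:
    """Return the Kölner Phonetik digit for the 'C' at word[pos].

    Same rules as the 'C' branch of Koelner.encode():
    - after S or Z: 8
    - word-initial: 4 before A, H, K, L, O, Q, R, U, X, otherwise 8
    - elsewhere: 4 before A, H, K, O, Q, U, X, otherwise 8
    """
    if after(word, pos, 'SZ'):
        return '8'
    if pos == 0:
        return '4' if before(word, pos, 'AHKLOQRUX') else '8'
    return '4' if before(word, pos, 'AHKOQUX') else '8'


print(code_for_c('CHRISTOPHER', 0))  # 4
print(code_for_c('SCHMIDT', 1))      # 8
print(code_for_c('MARCEL', 3))       # 8
```

Inside the loop, the whole 'C' branch would then shrink to `elif word[i] == 'C': sdx += code_for_c(word, i)`. That moves several conditions out of encode() itself.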

Commonly applied refactorings include:

Complexity

Complex code units like abydos.phonetic._koelner.Koelner.encode() often do a lot of different things. To break such a unit down, identify a cohesive component within it. A common approach is to look for fields or methods that share the same prefixes or suffixes.

Once you have determined which fields belong together, you can apply the Extract Class refactoring. If the component makes sense as a subclass, Extract Subclass is also a candidate and is often faster.
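Inside encode(), the nested helpers `_after()` and `_before()` share the same `(word, pos, letters)` signature. That makes them a candidate cohesive component in the sense described above. This is a sketch of Extract Class under that reading. `LetterContext` is a hypothetical class and is not part of Abydos.

```python
class LetterContext:
    """Hypothetical helper bundling a word with its neighbour checks.

    Extracted from the nested _after()/_before() helpers of
    Koelner.encode(), which both take the same `word` argument.
    """

    def __init__(self, word: str) -> None:
        self.word = word

    def after(self, pos: int, letters: str) -> bool:
        """True if the letter at pos is preceded by one of `letters`."""
        return pos > 0 and self.word[pos - 1] in letters

    def before(self, pos: int, letters: str) -> bool:
        """True if the letter at pos is followed by one of `letters`."""
        return pos + 1 < len(self.word) and self.word[pos + 1] in letters


ctx = LetterContext('SCHMIDT')
print(ctx.after(1, 'SZ'))    # True: 'C' follows 'S'
print(ctx.before(5, 'CSZ'))  # False: 'D' is followed by 'T'
```

Binding `word` once removes a repeated argument from every call site. The per-letter rules could then read `ctx.before(i, 'H')` instead of `_before(word, i, {'H'})`.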

# -*- coding: utf-8 -*-

# Copyright 2014-2018 by Christopher C. Little.
# This file is part of Abydos.
#
# Abydos is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# Abydos is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with Abydos. If not, see <http://www.gnu.org/licenses/>.

"""abydos.phonetic._koelner.

Kölner Phonetik
"""

from __future__ import (
    absolute_import,
    division,
    print_function,
    unicode_literals,
)

from unicodedata import normalize as unicode_normalize

from six import text_type
from six.moves import range

from ._phonetic import _Phonetic

__all__ = [
    'Koelner',
    'koelner_phonetik',
    'koelner_phonetik_alpha',
    'koelner_phonetik_num_to_alpha',
]


# Scrutinizer (Unused Code): The variable __class__ seems to be unused.
class Koelner(_Phonetic):
    """Kölner Phonetik.

    Based on the algorithm defined by :cite:`Postel:1969`.
    """

    _uc_v_set = set('AEIOUJY')

    # Scrutinizer (Comprehensibility Best Practice): The variable _ does not
    # seem to be defined.
    _num_trans = dict(zip((ord(_) for _ in '012345678'), 'APTFKLNRS'))
    _num_set = set('012345678')

    def encode(self, word):
        """Return the Kölner Phonetik (numeric output) code for a word.

        While the output code is numeric, it is still a str because 0s can lead
        the code.

        Parameters
        ----------
        word : str
            The word to transform

        Returns
        -------
        str
            The Kölner Phonetik value as a numeric string

        Examples
        --------
        >>> pe = Koelner()
        >>> pe.encode('Christopher')
        '478237'
        >>> pe.encode('Niall')
        '65'
        >>> pe.encode('Smith')
        '862'
        >>> pe.encode('Schmidt')
        '862'
        >>> pe.encode('Müller')
        '657'
        >>> pe.encode('Zimmermann')
        '86766'

        """

        def _after(word, pos, letters):
            """Return True if word[pos] follows one of the supplied letters.

            Parameters
            ----------
            word : str
                The word to check
            pos : int
                Position within word to check
            letters : str
                Letters to confirm precede word[pos]

            Returns
            -------
            bool
                True if word[pos] follows a value in letters

            """
            return pos > 0 and word[pos - 1] in letters

        def _before(word, pos, letters):
            """Return True if word[pos] precedes one of the supplied letters.

            Parameters
            ----------
            word : str
                The word to check
            pos : int
                Position within word to check
            letters : str
                Letters to confirm follow word[pos]

            Returns
            -------
            bool
                True if word[pos] precedes a value in letters

            """
            return pos + 1 < len(word) and word[pos + 1] in letters

        sdx = ''

        word = unicode_normalize('NFKD', text_type(word.upper()))
        word = word.replace('ß', 'SS')

        word = word.replace('Ä', 'AE')
        word = word.replace('Ö', 'OE')
        word = word.replace('Ü', 'UE')
        word = ''.join(c for c in word if c in self._uc_set)

        # Nothing to convert, return base case
        if not word:
            return sdx

        # Scrutinizer (unused-code): Consider using enumerate instead of
        # iterating with range and len.
        for i in range(len(word)):
            # Scrutinizer (Duplication): This code seems to be duplicated in
            # your project.
            if word[i] in self._uc_v_set:
                sdx += '0'
            elif word[i] == 'B':
                sdx += '1'
            elif word[i] == 'P':
                if _before(word, i, {'H'}):
                    sdx += '3'
                else:
                    sdx += '1'
            elif word[i] in {'D', 'T'}:
                if _before(word, i, {'C', 'S', 'Z'}):
                    sdx += '8'
                else:
                    sdx += '2'
            elif word[i] in {'F', 'V', 'W'}:
                sdx += '3'
            elif word[i] in {'G', 'K', 'Q'}:
                sdx += '4'
            elif word[i] == 'C':
                if _after(word, i, {'S', 'Z'}):
                    sdx += '8'
                elif i == 0:
                    # Scrutinizer (Coding Style): Wrong hanging indentation
                    # before block (add 4 spaces).
                    if _before(
                        word, i, {'A', 'H', 'K', 'L', 'O', 'Q', 'R', 'U', 'X'}
                    ):
                        sdx += '4'
                    else:
                        sdx += '8'
                elif _before(word, i, {'A', 'H', 'K', 'O', 'Q', 'U', 'X'}):
                    sdx += '4'
                else:
                    sdx += '8'
            elif word[i] == 'X':
                if _after(word, i, {'C', 'K', 'Q'}):
                    sdx += '8'
                else:
                    sdx += '48'
            elif word[i] == 'L':
                sdx += '5'
            elif word[i] in {'M', 'N'}:
                sdx += '6'
            elif word[i] == 'R':
                sdx += '7'
            elif word[i] in {'S', 'Z'}:
                sdx += '8'

        sdx = self._delete_consecutive_repeats(sdx)

        if sdx:
            sdx = sdx[:1] + sdx[1:].replace('0', '')

        return sdx

    def _to_alpha(self, num):
        """Convert a Kölner Phonetik code from numeric to alphabetic.

        Parameters
        ----------
        num : str or int
            A numeric Kölner Phonetik representation

        Returns
        -------
        str
            An alphabetic representation of the same word

        Examples
        --------
        >>> pe = Koelner()
        >>> pe._to_alpha('862')
        'SNT'
        >>> pe._to_alpha('657')
        'NLR'
        >>> pe._to_alpha('86766')
        'SNRNN'

        """
        num = ''.join(c for c in text_type(num) if c in self._num_set)
        return num.translate(self._num_trans)

    def encode_alpha(self, word):
        """Return the Kölner Phonetik (alphabetic output) code for a word.

        Parameters
        ----------
        word : str
            The word to transform

        Returns
        -------
        str
            The Kölner Phonetik value as an alphabetic string

        Examples
        --------
        >>> pe = Koelner()
        >>> pe.encode_alpha('Smith')
        'SNT'
        >>> pe.encode_alpha('Schmidt')
        'SNT'
        >>> pe.encode_alpha('Müller')
        'NLR'
        >>> pe.encode_alpha('Zimmermann')
        'SNRNN'

        """
        return self._to_alpha(self.encode(word))


def koelner_phonetik(word):
    """Return the Kölner Phonetik (numeric output) code for a word.

    This is a wrapper for :py:meth:`Koelner.encode`.

    Parameters
    ----------
    word : str
        The word to transform

    Returns
    -------
    str
        The Kölner Phonetik value as a numeric string

    Examples
    --------
    >>> koelner_phonetik('Christopher')
    '478237'
    >>> koelner_phonetik('Niall')
    '65'
    >>> koelner_phonetik('Smith')
    '862'
    >>> koelner_phonetik('Schmidt')
    '862'
    >>> koelner_phonetik('Müller')
    '657'
    >>> koelner_phonetik('Zimmermann')
    '86766'

    """
    return Koelner().encode(word)


def koelner_phonetik_num_to_alpha(num):
    """Convert a Kölner Phonetik code from numeric to alphabetic.

    This is a wrapper for :py:meth:`Koelner._to_alpha`.

    Parameters
    ----------
    num : str or int
        A numeric Kölner Phonetik representation

    Returns
    -------
    str
        An alphabetic representation of the same word

    Examples
    --------
    >>> koelner_phonetik_num_to_alpha('862')
    'SNT'
    >>> koelner_phonetik_num_to_alpha('657')
    'NLR'
    >>> koelner_phonetik_num_to_alpha('86766')
    'SNRNN'

    """
    # Scrutinizer (Coding Style Best Practice): It seems like _to_alpha was
    # declared protected and should not be accessed from this context. A
    # leading underscore conventionally marks a member as protected, so it
    # should only be accessed from the same class or a subclass.
    return Koelner()._to_alpha(num)


def koelner_phonetik_alpha(word):
    """Return the Kölner Phonetik (alphabetic output) code for a word.

    This is a wrapper for :py:meth:`Koelner.encode_alpha`.

    Parameters
    ----------
    word : str
        The word to transform

    Returns
    -------
    str
        The Kölner Phonetik value as an alphabetic string

    Examples
    --------
    >>> koelner_phonetik_alpha('Smith')
    'SNT'
    >>> koelner_phonetik_alpha('Schmidt')
    'SNT'
    >>> koelner_phonetik_alpha('Müller')
    'NLR'
    >>> koelner_phonetik_alpha('Zimmermann')
    'SNRNN'

    """
    return Koelner().encode_alpha(word)


if __name__ == '__main__':
    import doctest

    doctest.testmod()
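The last steps of encode() turn the per-letter digits into the final code in two passes:

- `self._delete_consecutive_repeats()` collapses repeated digits. It is inherited from `_Phonetic` and is not shown in this listing.
- Every 0 after the first position is dropped. This is why a leading 0 can survive and the result stays a str.

Below is a minimal sketch of that stage. It assumes `_delete_consecutive_repeats()` collapses runs of identical characters, which the sketch does with `itertools.groupby`. It also includes a Python 3-only `str.maketrans` variant of `_num_trans` for the numeric-to-alphabetic step. `finalize()`, `to_alpha()` and `NUM_TO_ALPHA` are hypothetical names.

```python
from itertools import groupby

# Digit-to-letter table equivalent to Koelner._num_trans (Python 3 only).
NUM_TO_ALPHA = str.maketrans('012345678', 'APTFKLNRS')


def finalize(raw: str) -> str:
    """Collapse runs of repeated digits, then drop every '0' but a leading one."""
    collapsed = ''.join(digit for digit, _ in groupby(raw))
    if not collapsed:
        return collapsed
    return collapsed[0] + collapsed[1:].replace('0', '')


def to_alpha(code: str) -> str:
    """Keep only the digits 0-8, then map each one to its letter."""
    digits = ''.join(c for c in code if c in '012345678')
    return digits.translate(NUM_TO_ALPHA)


# Raw per-letter digits for 'NIALL': N=6, I=0, A=0, L=5, L=5
print(finalize('60055'))                 # 65
# Raw digits for 'ANNA': the leading 0 is kept
print(finalize('0660'))                  # 06
# Raw digits for 'ZIMMERMANN', then converted to letters
print(to_alpha(finalize('8066076066')))  # SNRNN
```

The outputs agree with the doctests for `Niall` ('65') and `Zimmermann` ('86766', or 'SNRNN' in alphabetic form).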