Completed
Pull Request — master (#138)
by Chris
14:20
created

abydos.distance._synoname   F

Complexity

Total Complexity 157

Size/Duplication

Total Lines 692
Duplicated Lines 4.77 %

Test Coverage

Coverage 99.23%

Importance

Changes 0
Metric Value
wmc 157
eloc 421
dl 33
loc 692
ccs 257
cts 259
cp 0.9923
rs 2
c 0
b 0
f 0

4 Methods

Rating   Name   Duplication   Size   Complexity  
A Synoname._synoname_strip_punct() 0 15 3
A Synoname.dist() 0 32 1
F Synoname._synoname_word_approximation() 0 233 73
F Synoname.dist_abs() 33 273 79

1 Function

Rating   Name   Duplication   Size   Complexity  
A synoname() 0 43 1

How to fix

Duplicated Code

Duplicate code is one of the most pungent code smells. A rule that is often used is to re-structure code once it is duplicated in three or more places.
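
As a minimal sketch of that kind of restructuring, the omission, substitution and transposition tests flagged as duplicated in dist_abs() all ask the same question: is one name part identical while the other is exactly one edit away? The helper name one_edit_match and the stand-in hamming distance below are hypothetical and not part of Abydos; the real code would pass in levenshtein with the cost tuple of each test, and the extra roman_conflict check of the omission test would stay at the call site.

```python
def one_edit_match(src_ln, tar_ln, src_fn, tar_fn, edit_distance):
    """Return True if one name part is equal and the other is one edit away.

    edit_distance is any callable taking two strings and returning a number.
    It is a stand-in (an assumption of this sketch) for the restricted-cost
    distance that each individual test would supply.
    """
    if src_fn == tar_fn and edit_distance(src_ln, tar_ln) == 1:
        return True
    return src_ln == tar_ln and edit_distance(src_fn, tar_fn) == 1


def hamming(a, b):
    """Toy distance: count differing positions; unequal lengths never match."""
    if len(a) != len(b):
        return float('inf')
    return sum(x != y for x, y in zip(a, b))


print(one_edit_match('smith', 'smyth', 'john', 'john', hamming))  # True
print(one_edit_match('smith', 'jones', 'john', 'john', hamming))  # False
```

With a helper of this shape, each of the three tests would shrink to a single call that differs only in the distance it passes in.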

Common duplication problems, and corresponding solutions are:

Complexity

Tip: Before tackling complexity, make sure that you eliminate any duplication first. This can often reduce the size of classes significantly.

Complex classes like abydos.distance._synoname often do a lot of different things. To break such a class down, we need to identify a cohesive component within that class. A common approach to finding such a component is to look for fields or methods that share the same prefix or suffix.

Once you have determined the fields that belong together, you can apply the Extract Class refactoring. If the component makes sense as a sub-class, Extract Subclass is also a candidate, and is often faster.
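
One possible reading of this advice for the class below, offered only as a sketch: the match-type tables and the conversion of test names into a bitmask belong together and could live in a small class of their own. The class name MatchTypes and its attribute names are hypothetical and do not exist in Abydos; the table contents are copied from the listing.

```python
class MatchTypes(object):
    """Hypothetical holder for the Synoname match-type tables."""

    names = (
        '',
        'exact',
        'omission',
        'substitution',
        'transposition',
        'punctuation',
        'initials',
        'extension',
        'inclusion',
        'no_first',
        'word_approx',
        'confusions',
        'char_approx',
        'no_match',
    )
    # Position of each match type in the tuple, e.g. 'omission' -> 2.
    type_index = {name: n for n, name in enumerate(names)}
    # One bit per test; '' and 'no_match' are outcomes rather than tests.
    test_bit = {name: 2 ** n for n, name in enumerate(names[1:-1])}

    @classmethod
    def to_mask(cls, tests):
        """Turn an int or an iterable of test names into a bitmask."""
        if isinstance(tests, int):
            return tests
        mask = 0
        for name in tests:
            # Unknown names are ignored; known names set their bit once.
            mask |= cls.test_bit.get(name, 0)
        return mask


print(MatchTypes.to_mask(['exact', 'omission']))  # 3
print(MatchTypes.to_mask(2 ** 12 - 1))  # 4095
```

The twelve test bits add up to 2 ** 12 - 1, which is the default value of the tests argument in dist_abs().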

# -*- coding: utf-8 -*-

# Copyright 2018 by Christopher C. Little.
# This file is part of Abydos.
#
# Abydos is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# Abydos is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with Abydos. If not, see <http://www.gnu.org/licenses/>.

"""abydos.distance.synoname.

The distance.synoname module implements Synoname.
"""

from __future__ import division, unicode_literals

try:
    # Iterable was removed from the collections namespace in Python 3.10
    from collections.abc import Iterable
except ImportError:  # Python 2 has no collections.abc
    from collections import Iterable

from ._distance import Distance
from ._levenshtein import levenshtein
from ._sequence import sim_ratcliff_obershelp

# noinspection PyProtectedMember
from ..fingerprint._synoname import SynonameToolcode

__all__ = ['Synoname', 'synoname']


class Synoname(Distance):
    # ^ Unused Code: The variable __class__ seems to be unused.
    """Synoname.

    Cf. :cite:`Getty:1991,Gross:1991`
    """

    stc = SynonameToolcode()

    test_dict = {
        val: 2 ** n
        for n, val in enumerate(
            (
                'exact',
                'omission',
                'substitution',
                'transposition',
                'punctuation',
                'initials',
                'extension',
                'inclusion',
                'no_first',
                'word_approx',
                'confusions',
                'char_approx',
            )
        )
    }
    match_name = (
        '',
        'exact',
        'omission',
        'substitution',
        'transposition',
        'punctuation',
        'initials',
        'extension',
        'inclusion',
        'no_first',
        'word_approx',
        'confusions',
        'char_approx',
        'no_match',
    )
    match_type_dict = {val: n for n, val in enumerate(match_name)}

    def _synoname_strip_punct(self, word):
        # ^ Coding Style: This method could be written as a function/class
        # method. If a method does not access any attributes of the class, it
        # could also be implemented as a function or static method. This can
        # help improve readability. For example,
        #
        #     class Foo:
        #         def some_method(self, x, y):
        #             return x + y
        #
        # could be written as
        #
        #     class Foo:
        #         @classmethod
        #         def some_method(cls, x, y):
        #             return x + y
        """Return a word with punctuation stripped out.

        :param word: a word to strip punctuation from
        :returns: The word stripped of punctuation

        >>> pe = Synoname()
        >>> pe._synoname_strip_punct('AB;CD EF-GH$IJ')
        'ABCD EFGHIJ'
        """
        stripped = ''
        for char in word:
            if char not in set(',-./:;"&\'()!{|}?$%*+<=>[\\]^_`~'):
                stripped += char
        return stripped.strip()

    def _synoname_word_approximation(
        # ^ best-practice: Too many arguments (6/5)
        # ^ Comprehensibility: This function exceeds the maximum number of variables (32/15).
        # ^ best-practice: Too many return statements (10/6)
        self, src_ln, tar_ln, src_fn='', tar_fn='', features=None
        # ^ Coding Style: Wrong hanging indentation before block (add 4 spaces).
    ):
        """Return the Synoname word approximation score for two names.

        :param str src_ln: last name of the source
        :param str tar_ln: last name of the target
        :param str src_fn: first name of the source (optional)
        :param str tar_fn: first name of the target (optional)
        :param features: a dict containing special features calculated using
            fingerprint.SynonameToolcode (optional)
        :returns: The word approximation score
        :rtype: float

        >>> pe = Synoname()
        >>> pe._synoname_word_approximation('Smith Waterman', 'Waterman',
        ... 'Tom Joe Bob', 'Tom Joe')
        0.6
        """
        if features is None:
            features = {}
        if 'src_specials' not in features:
            features['src_specials'] = []
        if 'tar_specials' not in features:
            features['tar_specials'] = []

        src_len_specials = len(features['src_specials'])
        tar_len_specials = len(features['tar_specials'])

        # 1
        if ('gen_conflict' in features and features['gen_conflict']) or (
            'roman_conflict' in features and features['roman_conflict']
            # ^ Coding Style: Wrong hanging indentation before block (add 4 spaces).
        ):
            return 0

        # 3 & 7
        full_tar1 = ' '.join((tar_ln, tar_fn)).replace('-', ' ').strip()
        for s_pos, s_type in features['tar_specials']:
            if s_type == 'a':
                full_tar1 = full_tar1[
                    : -(1 + len(self.stc.synoname_special_table[s_pos][1]))
                ]
            elif s_type == 'b':
                loc = (
                    full_tar1.find(
                        ' ' + self.stc.synoname_special_table[s_pos][1] + ' '
                    )
                    + 1
                )
                full_tar1 = (
                    full_tar1[:loc]
                    + full_tar1[
                        loc + len(self.stc.synoname_special_table[s_pos][1]) :
                    ]
                )
            elif s_type == 'c':
                full_tar1 = full_tar1[
                    1 + len(self.stc.synoname_special_table[s_pos][1]) :
                ]

        full_src1 = ' '.join((src_ln, src_fn)).replace('-', ' ').strip()
        for s_pos, s_type in features['src_specials']:
            if s_type == 'a':
                full_src1 = full_src1[
                    : -(1 + len(self.stc.synoname_special_table[s_pos][1]))
                ]
            elif s_type == 'b':
                loc = (
                    full_src1.find(
                        ' ' + self.stc.synoname_special_table[s_pos][1] + ' '
                    )
                    + 1
                )
                full_src1 = (
                    full_src1[:loc]
                    + full_src1[
                        loc + len(self.stc.synoname_special_table[s_pos][1]) :
                    ]
                )
            elif s_type == 'c':
                full_src1 = full_src1[
                    1 + len(self.stc.synoname_special_table[s_pos][1]) :
                ]

        full_tar2 = full_tar1
        for s_pos, s_type in features['tar_specials']:
            if s_type == 'd':
                full_tar2 = full_tar2[
                    len(self.stc.synoname_special_table[s_pos][1]) :
                ]
            elif (
                s_type == 'X'
                # ^ Coding Style: Wrong hanging indentation before block (add 4 spaces).
                and self.stc.synoname_special_table[s_pos][1] in full_tar2
                # ^ Coding Style: Wrong hanging indentation before block (add 4 spaces).
            ):
                loc = full_tar2.find(
                    ' ' + self.stc.synoname_special_table[s_pos][1]
                )
                full_tar2 = (
                    full_tar2[:loc]
                    + full_tar2[
                        loc + len(self.stc.synoname_special_table[s_pos][1]) :
                    ]
                )

        full_src2 = full_src1
        for s_pos, s_type in features['src_specials']:
            if s_type == 'd':
                full_src2 = full_src2[
                    len(self.stc.synoname_special_table[s_pos][1]) :
                ]
            elif (
                s_type == 'X'
                # ^ Coding Style: Wrong hanging indentation before block (add 4 spaces).
                and self.stc.synoname_special_table[s_pos][1] in full_src2
                # ^ Coding Style: Wrong hanging indentation before block (add 4 spaces).
            ):
                loc = full_src2.find(
                    ' ' + self.stc.synoname_special_table[s_pos][1]
                )
                full_src2 = (
                    full_src2[:loc]
                    + full_src2[
                        loc + len(self.stc.synoname_special_table[s_pos][1]) :
                    ]
                )

        full_tar1 = self._synoname_strip_punct(full_tar1)
        tar1_words = full_tar1.split()
        tar1_num_words = len(tar1_words)

        full_src1 = self._synoname_strip_punct(full_src1)
        src1_words = full_src1.split()
        src1_num_words = len(src1_words)

        full_tar2 = self._synoname_strip_punct(full_tar2)
        tar2_words = full_tar2.split()
        tar2_num_words = len(tar2_words)

        full_src2 = self._synoname_strip_punct(full_src2)
        src2_words = full_src2.split()
        src2_num_words = len(src2_words)

        # 2
        if (
            src1_num_words < 2
            # ^ Coding Style: Wrong hanging indentation before block (add 4 spaces).
            and src_len_specials == 0
            # ^ Coding Style: Wrong hanging indentation before block (add 4 spaces).
            and src2_num_words < 2
            # ^ Coding Style: Wrong hanging indentation before block (add 4 spaces).
            and tar_len_specials == 0
            # ^ Coding Style: Wrong hanging indentation before block (add 4 spaces).
        ):
            return 0

        # 4
        if (
            tar1_num_words == 1
            # ^ Coding Style: Wrong hanging indentation before block (add 4 spaces).
            and src1_num_words == 1
            # ^ Coding Style: Wrong hanging indentation before block (add 4 spaces).
            and tar1_words[0] == src1_words[0]
            # ^ Coding Style: Wrong hanging indentation before block (add 4 spaces).
        ):
            return 1
        if tar1_num_words < 2 and tar_len_specials == 0:
            return 0

        # 5
        last_found = False
        for word in tar1_words:
            if src_ln.endswith(word) or word + ' ' in src_ln:
                last_found = True

        if not last_found:
            for word in src1_words:
                if tar_ln.endswith(word) or word + ' ' in tar_ln:
                    last_found = True

        # 6
        matches = 0
        if last_found:
            for i, s_word in enumerate(src1_words):
                for j, t_word in enumerate(tar1_words):
                    if s_word == t_word:
                        src1_words[i] = '@'
                        tar1_words[j] = '@'
                        matches += 1
        w_ratio = matches / max(tar1_num_words, src1_num_words)
        if matches > 1 or (
            # ^ best-practice: Too many boolean expressions in if statement (6/5)
            matches == 1
            # ^ Coding Style: Wrong hanging indentation before block (add 4 spaces).
            and src1_num_words == 1
            # ^ Coding Style: Wrong hanging indentation before block (add 4 spaces).
            and tar1_num_words == 1
            # ^ Coding Style: Wrong hanging indentation before block (add 4 spaces).
            and (tar_len_specials > 0 or src_len_specials > 0)
            # ^ Coding Style: Wrong hanging indentation before block (add 4 spaces).
        ):
            return w_ratio

        # 8
        if (
            tar2_num_words == 1
            # ^ Coding Style: Wrong hanging indentation before block (add 4 spaces).
            and src2_num_words == 1
            # ^ Coding Style: Wrong hanging indentation before block (add 4 spaces).
            and tar2_words[0] == src2_words[0]
            # ^ Coding Style: Wrong hanging indentation before block (add 4 spaces).
        ):
            return 1
        # I see no way that the following can be True if the equivalent in
        # #4 was False.
        if tar2_num_words < 2 and tar_len_specials == 0:  # pragma: no cover
            return 0

        # 9
        last_found = False
        for word in tar2_words:
            if src_ln.endswith(word) or word + ' ' in src_ln:
                last_found = True

        if not last_found:
            for word in src2_words:
                if tar_ln.endswith(word) or word + ' ' in tar_ln:
                    last_found = True

        if not last_found:
            return 0

        # 10
        matches = 0
        if last_found:
            for i, s_word in enumerate(src2_words):
                for j, t_word in enumerate(tar2_words):
                    if s_word == t_word:
                        src2_words[i] = '@'
                        tar2_words[j] = '@'
                        matches += 1
        w_ratio = matches / max(tar2_num_words, src2_num_words)
        if matches > 1 or (
            # ^ best-practice: Too many boolean expressions in if statement (6/5)
            matches == 1
            # ^ Coding Style: Wrong hanging indentation before block (add 4 spaces).
            and src2_num_words == 1
            # ^ Coding Style: Wrong hanging indentation before block (add 4 spaces).
            and tar2_num_words == 1
            # ^ Coding Style: Wrong hanging indentation before block (add 4 spaces).
            and (tar_len_specials > 0 or src_len_specials > 0)
            # ^ Coding Style: Wrong hanging indentation before block (add 4 spaces).
        ):
            return w_ratio

        return 0

    def dist_abs(
        # ^ best-practice: Too many arguments (7/5)
        # ^ Comprehensibility: This function exceeds the maximum number of variables (44/15).
        # ^ Bug: Parameters differ from overridden 'dist_abs' method
        # ^ best-practice: Too many return statements (18/6)
        self,
        # ^ Coding Style: Wrong hanging indentation before block (add 4 spaces).
        src,
        # ^ Coding Style: Wrong hanging indentation before block (add 4 spaces).
        tar,
        # ^ Coding Style: Wrong hanging indentation before block (add 4 spaces).
        word_approx_min=0.3,
        # ^ Coding Style: Wrong hanging indentation before block (add 4 spaces).
        char_approx_min=0.73,
        # ^ Coding Style: Wrong hanging indentation before block (add 4 spaces).
        tests=2 ** 12 - 1,
        # ^ Coding Style: Wrong hanging indentation before block (add 4 spaces).
        ret_name=False,
        # ^ Coding Style: Wrong hanging indentation before block (add 4 spaces).
    ):
        """Return the Synoname similarity type of two words.

        :param str src: source string for comparison
        :param str tar: target string for comparison
        :param float word_approx_min: the minimum word approximation value to
            signal a 'word_approx' match
        :param float char_approx_min: the minimum character approximation value
            to signal a 'char_approx' match
        :param int or Iterable tests: either an integer indicating tests to
            perform or a list of test names to perform (defaults to performing
            all tests)
        :param bool ret_name: if True, returns the match name rather than its
            integer equivalent
        :returns: Synoname value
        :rtype: int (or str if ret_name is True)

        >>> synoname(('Breghel', 'Pieter', ''), ('Brueghel', 'Pieter', ''))
        2
        >>> synoname(('Breghel', 'Pieter', ''), ('Brueghel', 'Pieter', ''),
        ... ret_name=True)
        'omission'
        >>> synoname(('Dore', 'Gustave', ''),
        ... ('Dore', 'Paul Gustave Louis Christophe', ''),
        ... ret_name=True)
        'inclusion'
        >>> synoname(('Pereira', 'I. R.', ''), ('Pereira', 'I. Smith', ''),
        ... ret_name=True)
        'word_approx'
        """
        if isinstance(tests, Iterable):
            new_tests = 0
            for term in tests:
                if term in self.test_dict:
                    # OR the bit in, so that a repeated test name cannot
                    # carry over into the bit of a different test
                    new_tests |= self.test_dict[term]
            tests = new_tests

        if isinstance(src, tuple):
            src_ln, src_fn, src_qual = src
        elif '#' in src:
            src_ln, src_fn, src_qual = src.split('#')[-3:]
        else:
            src_ln, src_fn, src_qual = src, '', ''

        if isinstance(tar, tuple):
            tar_ln, tar_fn, tar_qual = tar
        elif '#' in tar:
            tar_ln, tar_fn, tar_qual = tar.split('#')[-3:]
        else:
            tar_ln, tar_fn, tar_qual = tar, '', ''

        def _split_special(spec):
            spec_list = []
            while spec:
                spec_list.append((int(spec[:3]), spec[3:4]))
                spec = spec[4:]
            return spec_list

        def _fmt_retval(val):
            if ret_name:
                return self.match_name[val]
            return val

        # 1. Preprocessing

        # Lowercasing
        src_fn = src_fn.strip().lower()
        src_ln = src_ln.strip().lower()
        src_qual = src_qual.strip().lower()

        tar_fn = tar_fn.strip().lower()
        tar_ln = tar_ln.strip().lower()
        tar_qual = tar_qual.strip().lower()

        # Create toolcodes
        src_ln, src_fn, src_tc = self.stc.fingerprint(src_ln, src_fn, src_qual)
        tar_ln, tar_fn, tar_tc = self.stc.fingerprint(tar_ln, tar_fn, tar_qual)

        src_generation = int(src_tc[2])
        src_romancode = int(src_tc[3:6])
        src_len_fn = int(src_tc[6:8])
        src_tc = src_tc.split('$')
        src_specials = _split_special(src_tc[1])

        tar_generation = int(tar_tc[2])
        tar_romancode = int(tar_tc[3:6])
        tar_len_fn = int(tar_tc[6:8])
        tar_tc = tar_tc.split('$')
        tar_specials = _split_special(tar_tc[1])

        gen_conflict = (src_generation != tar_generation) and bool(
            src_generation or tar_generation
        )
        roman_conflict = (src_romancode != tar_romancode) and bool(
            src_romancode or tar_romancode
        )

        ln_equal = src_ln == tar_ln
        fn_equal = src_fn == tar_fn

        # approx_c
        def _approx_c():
            if gen_conflict or roman_conflict:
                return False, 0

            full_src = ' '.join((src_ln, src_fn))
            if full_src.startswith('master '):
                full_src = full_src[len('master ') :]
                for intro in [
                    'of the ',
                    # ^ Coding Style: Wrong hanging indentation before block (add 4 spaces).
                    'of ',
                    # ^ Coding Style: Wrong hanging indentation before block (add 4 spaces).
                    'known as the ',
                    # ^ Coding Style: Wrong hanging indentation before block (add 4 spaces).
                    'with the ',
                    # ^ Coding Style: Wrong hanging indentation before block (add 4 spaces).
                    'with ',
                    # ^ Coding Style: Wrong hanging indentation before block (add 4 spaces).
                ]:
                    if full_src.startswith(intro):
                        full_src = full_src[len(intro) :]

            full_tar = ' '.join((tar_ln, tar_fn))
            if full_tar.startswith('master '):
                full_tar = full_tar[len('master ') :]
                for intro in [
                    'of the ',
                    # ^ Coding Style: Wrong hanging indentation before block (add 4 spaces).
                    'of ',
                    # ^ Coding Style: Wrong hanging indentation before block (add 4 spaces).
                    'known as the ',
                    # ^ Coding Style: Wrong hanging indentation before block (add 4 spaces).
                    'with the ',
                    # ^ Coding Style: Wrong hanging indentation before block (add 4 spaces).
                    'with ',
                    # ^ Coding Style: Wrong hanging indentation before block (add 4 spaces).
                ]:
                    if full_tar.startswith(intro):
                        full_tar = full_tar[len(intro) :]

            loc_ratio = sim_ratcliff_obershelp(full_src, full_tar)
            return loc_ratio >= char_approx_min, loc_ratio

        approx_c_result, ca_ratio = _approx_c()
        # ^ Unused Code: The variable approx_c_result seems to be unused.

        if tests & self.test_dict['exact'] and fn_equal and ln_equal:
            return _fmt_retval(self.match_type_dict['exact'])
        if tests & self.test_dict['omission']:
            if (
                # ^ Duplication: This code seems to be duplicated in your project.
                fn_equal
                # ^ Coding Style: Wrong hanging indentation before block (add 4 spaces).
                and levenshtein(src_ln, tar_ln, cost=(1, 1, 99, 99)) == 1
                # ^ Coding Style: Wrong hanging indentation before block (add 4 spaces).
            ):
                if not roman_conflict:
                    return _fmt_retval(self.match_type_dict['omission'])
            elif (
                ln_equal
                # ^ Coding Style: Wrong hanging indentation before block (add 4 spaces).
                and levenshtein(src_fn, tar_fn, cost=(1, 1, 99, 99)) == 1
                # ^ Coding Style: Wrong hanging indentation before block (add 4 spaces).
            ):
                return _fmt_retval(self.match_type_dict['omission'])
        if tests & self.test_dict['substitution']:
            # ^ Duplication: This code seems to be duplicated in your project.
            if (
                fn_equal
                # ^ Coding Style: Wrong hanging indentation before block (add 4 spaces).
                and levenshtein(src_ln, tar_ln, cost=(99, 99, 1, 99)) == 1
                # ^ Coding Style: Wrong hanging indentation before block (add 4 spaces).
            ):
                return _fmt_retval(self.match_type_dict['substitution'])
            elif (
                ln_equal
                # ^ Coding Style: Wrong hanging indentation before block (add 4 spaces).
                and levenshtein(src_fn, tar_fn, cost=(99, 99, 1, 99)) == 1
                # ^ Coding Style: Wrong hanging indentation before block (add 4 spaces).
            ):
                return _fmt_retval(self.match_type_dict['substitution'])
        if tests & self.test_dict['transposition']:
            # ^ Duplication: This code seems to be duplicated in your project.
            if fn_equal and (
                levenshtein(src_ln, tar_ln, mode='osa', cost=(99, 99, 99, 1))
                # ^ Coding Style: Wrong hanging indentation before block (add 4 spaces).
                == 1
                # ^ Coding Style: Wrong hanging indentation before block (add 4 spaces).
            ):
                return _fmt_retval(self.match_type_dict['transposition'])
            elif ln_equal and (
                levenshtein(src_fn, tar_fn, mode='osa', cost=(99, 99, 99, 1))
                # ^ Coding Style: Wrong hanging indentation before block (add 4 spaces).
                == 1
                # ^ Coding Style: Wrong hanging indentation before block (add 4 spaces).
            ):
                return _fmt_retval(self.match_type_dict['transposition'])
        if tests & self.test_dict['punctuation']:
            np_src_fn = self._synoname_strip_punct(src_fn)
            np_tar_fn = self._synoname_strip_punct(tar_fn)
            np_src_ln = self._synoname_strip_punct(src_ln)
            np_tar_ln = self._synoname_strip_punct(tar_ln)

            if (np_src_fn == np_tar_fn) and (np_src_ln == np_tar_ln):
                return _fmt_retval(self.match_type_dict['punctuation'])

            np_src_fn = self._synoname_strip_punct(src_fn.replace('-', ' '))
            np_tar_fn = self._synoname_strip_punct(tar_fn.replace('-', ' '))
            np_src_ln = self._synoname_strip_punct(src_ln.replace('-', ' '))
            np_tar_ln = self._synoname_strip_punct(tar_ln.replace('-', ' '))

            if (np_src_fn == np_tar_fn) and (np_src_ln == np_tar_ln):
                return _fmt_retval(self.match_type_dict['punctuation'])

        if tests & self.test_dict['initials'] and ln_equal:
            if src_fn and tar_fn:
                src_initials = self._synoname_strip_punct(src_fn).split()
                tar_initials = self._synoname_strip_punct(tar_fn).split()
                initials = bool(
                    (len(src_initials) == len(''.join(src_initials)))
                    or (len(tar_initials) == len(''.join(tar_initials)))
                )
                if initials:
                    src_initials = ''.join(_[0] for _ in src_initials)
                    tar_initials = ''.join(_[0] for _ in tar_initials)
                    if src_initials == tar_initials:
                        return _fmt_retval(self.match_type_dict['initials'])
                    initial_diff = abs(len(src_initials) - len(tar_initials))
                    if initial_diff and (
                        (
                            # ^ Coding Style: Wrong hanging indentation before block (add 4 spaces).
                            initial_diff
                            == levenshtein(
                                src_initials,
                                tar_initials,
                                cost=(1, 99, 99, 99),
                            )
                        )
                        or (
                            # ^ Coding Style: Wrong hanging indentation before block (add 4 spaces).
                            initial_diff
                            == levenshtein(
                                tar_initials,
                                src_initials,
                                cost=(1, 99, 99, 99),
                            )
                        )
                    ):
                        return _fmt_retval(self.match_type_dict['initials'])
        if tests & self.test_dict['extension']:
            # Review note (Coding Style): Wrong hanging indentation before
            # block (add 4 spaces); reported on the continuation lines of
            # both conditions below.
            if src_ln[1] == tar_ln[1] and (
                src_ln.startswith(tar_ln) or tar_ln.startswith(src_ln)
            ):
                # Review note (best-practice): Too many boolean expressions
                # in if statement (7/5).
                if (
                    (not src_len_fn and not tar_len_fn)
                    or (tar_fn and src_fn.startswith(tar_fn))
                    or (src_fn and tar_fn.startswith(src_fn))
                ) and not roman_conflict:
                    return _fmt_retval(self.match_type_dict['extension'])
        if tests & self.test_dict['inclusion'] and ln_equal:
            if (src_fn and src_fn in tar_fn) or (tar_fn and tar_fn in src_fn):
                return _fmt_retval(self.match_type_dict['inclusion'])
        if tests & self.test_dict['no_first'] and ln_equal:
            if src_fn == '' or tar_fn == '':
                return _fmt_retval(self.match_type_dict['no_first'])
        if tests & self.test_dict['word_approx']:
            ratio = self._synoname_word_approximation(
                src_ln,
                tar_ln,
                src_fn,
                tar_fn,
                {
                    'gen_conflict': gen_conflict,
                    'roman_conflict': roman_conflict,
                    'src_specials': src_specials,
                    'tar_specials': tar_specials,
                },
            )
            if ratio == 1 and tests & self.test_dict['confusions']:
                # Review note (Coding Style): Wrong hanging indentation
                # before block (add 4 spaces); reported on both operand
                # lines of the comparison below.
                if (
                    ' '.join((src_fn, src_ln)).strip()
                    == ' '.join((tar_fn, tar_ln)).strip()
                ):
                    return _fmt_retval(self.match_type_dict['confusions'])
            if ratio >= word_approx_min:
                return _fmt_retval(self.match_type_dict['word_approx'])
        if tests & self.test_dict['char_approx']:
            if ca_ratio >= char_approx_min:
                return _fmt_retval(self.match_type_dict['char_approx'])
        return _fmt_retval(self.match_type_dict['no_match'])

    # Review note (best-practice): Too many arguments (7/5).
    # Review note (Bug): Parameters differ from overridden 'dist' method.
    # Review note (Coding Style): Wrong hanging indentation before block
    # (add 4 spaces); reported on every parameter line of the signature.
    def dist(
        self,
        src,
        tar,
        word_approx_min=0.3,
        char_approx_min=0.73,
        tests=2 ** 12 - 1,
        ret_name=False,
    ):
        """Return the normalized Synoname distance between two words.

        :param str src: source string for comparison
        :param str tar: target string for comparison
        :param float word_approx_min: the minimum word approximation value to
            signal a 'word_approx' match
        :param float char_approx_min: the minimum character approximation value
            to signal a 'char_approx' match
        :param int or Iterable tests: either an integer indicating tests to
            perform or a list of test names to perform (defaults to performing
            all tests)
        :param bool ret_name: if True, requests the match name rather than its
            integer equivalent; leave this False here, because a match name
            (str) cannot be divided by 14
        :returns: Synoname value divided by 14
        :rtype: float
        """
        return (
            synoname(
                src, tar, word_approx_min, char_approx_min, tests, ret_name
            )
            / 14
        )


# Review note (best-practice): Too many arguments (6/5).
# Review note (Coding Style): Wrong hanging indentation before block (add 4
# spaces); reported on every parameter line of the signature.
def synoname(
    src,
    tar,
    word_approx_min=0.3,
    char_approx_min=0.73,
    tests=2 ** 12 - 1,
    ret_name=False,
):
    """Return the Synoname similarity type of two words.

    This is a wrapper for :py:meth:`Synoname.dist_abs`.

    :param str src: source string for comparison
    :param str tar: target string for comparison
    :param float word_approx_min: the minimum word approximation value to
        signal a 'word_approx' match
    :param float char_approx_min: the minimum character approximation value to
        signal a 'char_approx' match
    :param int or Iterable tests: either an integer indicating tests to
        perform or a list of test names to perform (defaults to performing all
        tests)
    :param bool ret_name: if True, returns the match name rather than its
        integer equivalent
    :returns: Synoname value
    :rtype: int (or str if ret_name is True)

    >>> synoname(('Breghel', 'Pieter', ''), ('Brueghel', 'Pieter', ''))
    2
    >>> synoname(('Breghel', 'Pieter', ''), ('Brueghel', 'Pieter', ''),
    ... ret_name=True)
    'omission'
    >>> synoname(('Dore', 'Gustave', ''),
    ... ('Dore', 'Paul Gustave Louis Christophe', ''),
    ... ret_name=True)
    'inclusion'
    >>> synoname(('Pereira', 'I. R.', ''), ('Pereira', 'I. Smith', ''),
    ... ret_name=True)
    'word_approx'
    """
    return Synoname().dist_abs(
        src, tar, word_approx_min, char_approx_min, tests, ret_name
    )


if __name__ == '__main__':
    import doctest

    doctest.testmod()
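The initials test in `dist_abs` compares `initial_diff` with `levenshtein(..., cost=(1, 99, 99, 99))`. Assuming the cost tuple is ordered (insert, delete, substitute, transpose), which this listing does not itself confirm, the comparison appears to ask whether one string of initials can be turned into the other by insertions alone, that is, whether the shorter string is a subsequence of the longer one. The sketch below illustrates that reading in isolation. `weighted_edit_distance`, `is_subsequence`, and `initials_compatible` are hypothetical names introduced for illustration and are not part of Abydos; transpositions are left out.

```python
def weighted_edit_distance(src, tar, ins_cost=1, del_cost=99, sub_cost=99):
    """Edit distance from src to tar with a separate cost per operation.

    Hypothetical stand-in for levenshtein(src, tar, cost=(1, 99, 99, 99)),
    assuming the tuple order (insert, delete, substitute, transpose).
    """
    # prev[j] is the cost of turning the empty prefix of src into tar[:j].
    prev = [j * ins_cost for j in range(len(tar) + 1)]
    for i, s_ch in enumerate(src, start=1):
        cur = [i * del_cost]
        for j, t_ch in enumerate(tar, start=1):
            cur.append(
                min(
                    prev[j] + del_cost,  # drop s_ch
                    cur[j - 1] + ins_cost,  # insert t_ch
                    prev[j - 1] + (0 if s_ch == t_ch else sub_cost),
                )
            )
        prev = cur
    return prev[-1]


def is_subsequence(short, long):
    """Return True if short can be read off long in order, with gaps."""
    remaining = iter(long)
    return all(ch in remaining for ch in short)


def initials_compatible(src_initials, tar_initials):
    """Mirror the shape of the check in the initials test (illustrative)."""
    diff = abs(len(src_initials) - len(tar_initials))
    return bool(diff) and (
        diff == weighted_edit_distance(src_initials, tar_initials)
        or diff == weighted_edit_distance(tar_initials, src_initials)
    )
```

With these costs, any delete or substitution adds at least 99, so for initials strings shorter than 99 characters the distance can equal the length difference only when every edit is an insertion. For example, `initials_compatible('JS', 'JRS')` holds because `'JS'` is a subsequence of `'JRS'`, while `initials_compatible('JS', 'JRT')` does not.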