Completed: Pull Request, master (#141), by Chris, created 16:23

abydos.distance._synoname.Synoname.dist_abs() (rating: F)

Complexity

Conditions 79

Size

Total Lines 282
Code Lines 192

Duplication

Lines 33
Ratio 11.7 %

Code Coverage

Tests 132
CRAP Score 79

Importance

Changes 0
Metric  Value
cc      79
eloc    192
nop     7
dl      33
loc     282
ccs     132
cts     132
cp      1
crap    79
rs      0
c       0
b       0
f       0
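The crap value in the table can be checked against cc and cp. This is a minimal sketch, assuming the commonly published CRAP definition (complexity squared times the cube of the uncovered fraction, plus complexity). It also assumes that cc is the method's cyclomatic complexity and that cp is its covered fraction, where 1 means that all statements (ccs of cts, 132 of 132) are covered. `crap_score` is a hypothetical name:

```python
def crap_score(complexity, coverage):
    """Return the CRAP score of a method.

    complexity -- cyclomatic complexity (cc in the table)
    coverage   -- covered fraction of the method, from 0.0 to 1.0 (cp)
    """
    return complexity ** 2 * (1 - coverage) ** 3 + complexity


print(crap_score(79, 1.0))  # 79.0
print(crap_score(79, 0.5))  # 859.125
```

At full coverage the first term vanishes, which is why crap equals cc (79) here. At 50 % coverage, the same complexity would give 859.125.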

How to fix

Long Method

Small methods make your code easier to understand, in particular if combined with a good name. Besides, if your method is small, finding a good name is usually much easier.

For example, if you find yourself adding comments to a method's body, this is usually a good sign to extract the commented part to a new method, and use the comment as a starting point when coming up with a good name for this new method.
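Two spots in dist_abs() illustrate the idea:

* The nested _approx_c() function (under the comment `# approx_c`) repeats the same "master of the ..." stripping loop for both the source and the target.
* The omission, substitution and transposition tests, which are flagged as duplicated, differ only in the distance they compute.

This is a minimal Python sketch that extracts both into small, named functions. `strip_master_intro`, `MASTER_INTROS`, `one_edit_part` and `substitution_only` are hypothetical names. `substitution_only` only stands in for abydos's `levenshtein(..., cost=(99, 99, 1, 99))` so that the example runs on its own:

```python
# Phrases that dist_abs() strips after a leading 'master ', in its order.
MASTER_INTROS = ('of the ', 'of ', 'known as the ', 'with the ', 'with ')


def strip_master_intro(full_name):
    """Remove a leading 'master ' and any introductory phrases after it."""
    if full_name.startswith('master '):
        full_name = full_name[len('master '):]
        # No break: as in the original loop, later phrases are still
        # checked after an earlier one has been stripped.
        for intro in MASTER_INTROS:
            if full_name.startswith(intro):
                full_name = full_name[len(intro):]
    return full_name


def one_edit_part(src_ln, tar_ln, src_fn, tar_fn, distance):
    """Report which name part is one edit away while the other is equal.

    Returns 'ln' if the first names are equal and the last names are one
    edit apart, 'fn' for the reverse case, and None otherwise. The
    last-name case is checked first, as in dist_abs().
    """
    if src_fn == tar_fn and distance(src_ln, tar_ln) == 1:
        return 'ln'
    if src_ln == tar_ln and distance(src_fn, tar_fn) == 1:
        return 'fn'
    return None


def substitution_only(a, b):
    """Hypothetical stand-in for a substitution-only edit distance."""
    if len(a) != len(b):
        return 99  # mimics the prohibitive insertion/deletion cost
    return sum(x != y for x, y in zip(a, b))


print(strip_master_intro('master of the playing cards'))  # playing cards
print(one_edit_part('smith', 'smyth', 'john', 'john', substitution_only))  # ln
print(one_edit_part('smith', 'smith', 'jon', 'jan', substitution_only))  # fn
```

With such a helper, the omission test could return 'omission' in two cases: when the helper reports 'fn', or when it reports 'ln' and there is no roman_conflict. The two cases are mutually exclusive, because equal last names are zero edits apart. This means the helper preserves the current if/elif behavior.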

Commonly applied refactorings include Extract Method.

Complexity

Complex code like abydos.distance._synoname.Synoname.dist_abs() often does many different things. To break it down, identify a cohesive component within it. A common approach is to look for fields, methods, or variables that share the same prefixes or suffixes. In dist_abs(), for example, the src_* and tar_* variables form two parallel groups.

Once you have determined the fields that belong together, you can apply the Extract Class refactoring. If the component makes sense as a sub-class, Extract Subclass is also a candidate, and is often faster.
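In dist_abs(), the variables src_generation, src_romancode, src_len_fn and src_specials (and their tar_* twins) share a prefix and are all decoded from one toolcode string. This makes them a candidate for Extract Class.

This is a minimal Python 3 sketch. It assumes only the slicing that dist_abs() performs:

* generation at index 2
* roman-numeral code at 3:6
* first-name length at 6:8
* special-word codes after the first '$', in 4-character groups

`ParsedToolcode` and `conflicts_with` are hypothetical names. The sample toolcodes are made up to exercise those slices and were not produced by SynonameToolcode:

```python
from typing import List, NamedTuple, Tuple


class ParsedToolcode(NamedTuple):
    """Fields that dist_abs() decodes from a Synoname toolcode string."""

    generation: int
    romancode: int
    len_fn: int
    specials: List[Tuple[int, str]]

    @classmethod
    def from_toolcode(cls, toolcode):
        # Special-word codes follow the first '$' as a 3-digit position
        # plus a 1-character type, exactly as _split_special() reads them.
        spec = toolcode.split('$')[1]
        specials = []
        while spec:
            specials.append((int(spec[:3]), spec[3:4]))
            spec = spec[4:]
        return cls(
            generation=int(toolcode[2]),
            romancode=int(toolcode[3:6]),
            len_fn=int(toolcode[6:8]),
            specials=specials,
        )

    def conflicts_with(self, other):
        """Return (gen_conflict, roman_conflict) as dist_abs() computes them."""
        gen_conflict = (self.generation != other.generation) and bool(
            self.generation or other.generation
        )
        roman_conflict = (self.romancode != other.romancode) and bool(
            self.romancode or other.romancode
        )
        return gen_conflict, roman_conflict


src = ParsedToolcode.from_toolcode('00200007$011b050a')
tar = ParsedToolcode.from_toolcode('00000007$')
print(src)
print(src.conflicts_with(tar))
```

dist_abs() could then call ParsedToolcode.from_toolcode() once for each of src_tc and tar_tc, instead of making ten separate assignments. It could also take gen_conflict and roman_conflict from conflicts_with().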

# -*- coding: utf-8 -*-

# Copyright 2018 by Christopher C. Little.
# This file is part of Abydos.
#
# Abydos is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# Abydos is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with Abydos. If not, see <http://www.gnu.org/licenses/>.

"""abydos.distance._synoname.

Synoname.
"""

from __future__ import (
    absolute_import,
    division,
    print_function,
    unicode_literals,
)

try:
    from collections.abc import Iterable
except ImportError:  # Python 2
    from collections import Iterable

from ._distance import _Distance
from ._levenshtein import levenshtein
from ._ratcliff_obershelp import sim_ratcliff_obershelp

# noinspection PyProtectedMember
from ..fingerprint._synoname import SynonameToolcode

__all__ = ['Synoname', 'synoname']


# [Unused Code] The variable __class__ seems to be unused.
class Synoname(_Distance):
    """Synoname.

    Cf. :cite:`Getty:1991,Gross:1991`
    """

    stc = SynonameToolcode()

    test_dict = {
        val: 2 ** n
        for n, val in enumerate(
            (
                'exact',
                'omission',
                'substitution',
                'transposition',
                'punctuation',
                'initials',
                'extension',
                'inclusion',
                'no_first',
                'word_approx',
                'confusions',
                'char_approx',
            )
        )
    }
    match_name = (
        '',
        'exact',
        'omission',
        'substitution',
        'transposition',
        'punctuation',
        'initials',
        'extension',
        'inclusion',
        'no_first',
        'word_approx',
        'confusions',
        'char_approx',
        'no_match',
    )
    match_type_dict = {val: n for n, val in enumerate(match_name)}

    # [Coding Style] This method could be written as a function/class method.
    #
    # If a method does not access any attributes of the class, it could also
    # be implemented as a function or static method. This can help improve
    # readability. For example
    #
    #     class Foo:
    #         def some_method(self, x, y):
    #             return x + y
    #
    # could be written as
    #
    #     class Foo:
    #         @classmethod
    #         def some_method(cls, x, y):
    #             return x + y
    def _synoname_strip_punct(self, word):
        """Return a word with punctuation stripped out.

        Parameters
        ----------
        word : str
            A word to strip punctuation from

        Returns
        -------
        str
            The word stripped of punctuation

        Examples
        --------
        >>> pe = Synoname()
        >>> pe._synoname_strip_punct('AB;CD EF-GH$IJ')
        'ABCD EFGHIJ'

        """
        stripped = ''
        for char in word:
            if char not in set(',-./:;"&\'()!{|}?$%*+<=>[\\]^_`~'):
                stripped += char
        return stripped.strip()

    # [best-practice] Too many arguments (6/5)
    # [Comprehensibility] This function exceeds the maximum number of
    # variables (32/15).
    # [best-practice] Too many return statements (10/6)
    def _synoname_word_approximation(
        # [Coding Style] Wrong hanging indentation before block (add 4 spaces).
        self, src_ln, tar_ln, src_fn='', tar_fn='', features=None
    ):
        """Return the Synoname word approximation score for two names.

        Parameters
        ----------
        src_ln : str
            Last name of the source
        tar_ln : str
            Last name of the target
        src_fn : str
            First name of the source (optional)
        tar_fn : str
            First name of the target (optional)
        features : dict
            A dict containing special features calculated using
            :py:class:`fingerprint.SynonameToolcode` (optional)

        Returns
        -------
        float
            The word approximation score

        Examples
        --------
        >>> pe = Synoname()
        >>> pe._synoname_word_approximation('Smith Waterman', 'Waterman',
        ... 'Tom Joe Bob', 'Tom Joe')
        0.6

        """
        if features is None:
            features = {}
        if 'src_specials' not in features:
            features['src_specials'] = []
        if 'tar_specials' not in features:
            features['tar_specials'] = []

        src_len_specials = len(features['src_specials'])
        tar_len_specials = len(features['tar_specials'])

        # 1
        if ('gen_conflict' in features and features['gen_conflict']) or (
            # [Coding Style] Wrong hanging indentation before block (add 4 spaces).
            'roman_conflict' in features and features['roman_conflict']
        ):
            return 0

        # 3 & 7
        full_tar1 = ' '.join((tar_ln, tar_fn)).replace('-', ' ').strip()
        for s_pos, s_type in features['tar_specials']:
            if s_type == 'a':
                full_tar1 = full_tar1[
                    : -(1 + len(self.stc.synoname_special_table[s_pos][1]))
                ]
            elif s_type == 'b':
                loc = (
                    full_tar1.find(
                        ' ' + self.stc.synoname_special_table[s_pos][1] + ' '
                    )
                    + 1
                )
                full_tar1 = (
                    full_tar1[:loc]
                    + full_tar1[
                        loc + len(self.stc.synoname_special_table[s_pos][1]) :
                    ]
                )
            elif s_type == 'c':
                full_tar1 = full_tar1[
                    1 + len(self.stc.synoname_special_table[s_pos][1]) :
                ]

        full_src1 = ' '.join((src_ln, src_fn)).replace('-', ' ').strip()
        for s_pos, s_type in features['src_specials']:
            if s_type == 'a':
                full_src1 = full_src1[
                    : -(1 + len(self.stc.synoname_special_table[s_pos][1]))
                ]
            elif s_type == 'b':
                loc = (
                    full_src1.find(
                        ' ' + self.stc.synoname_special_table[s_pos][1] + ' '
                    )
                    + 1
                )
                full_src1 = (
                    full_src1[:loc]
                    + full_src1[
                        loc + len(self.stc.synoname_special_table[s_pos][1]) :
                    ]
                )
            elif s_type == 'c':
                full_src1 = full_src1[
                    1 + len(self.stc.synoname_special_table[s_pos][1]) :
                ]

        full_tar2 = full_tar1
        for s_pos, s_type in features['tar_specials']:
            if s_type == 'd':
                full_tar2 = full_tar2[
                    len(self.stc.synoname_special_table[s_pos][1]) :
                ]
            elif (
                # [Coding Style] Wrong hanging indentation before block (add 4 spaces).
                s_type == 'X'
                # [Coding Style] Wrong hanging indentation before block (add 4 spaces).
                and self.stc.synoname_special_table[s_pos][1] in full_tar2
            ):
                loc = full_tar2.find(
                    ' ' + self.stc.synoname_special_table[s_pos][1]
                )
                full_tar2 = (
                    full_tar2[:loc]
                    + full_tar2[
                        loc + len(self.stc.synoname_special_table[s_pos][1]) :
                    ]
                )

        full_src2 = full_src1
        for s_pos, s_type in features['src_specials']:
            if s_type == 'd':
                full_src2 = full_src2[
                    len(self.stc.synoname_special_table[s_pos][1]) :
                ]
            elif (
                # [Coding Style] Wrong hanging indentation before block (add 4 spaces).
                s_type == 'X'
                # [Coding Style] Wrong hanging indentation before block (add 4 spaces).
                and self.stc.synoname_special_table[s_pos][1] in full_src2
            ):
                loc = full_src2.find(
                    ' ' + self.stc.synoname_special_table[s_pos][1]
                )
                full_src2 = (
                    full_src2[:loc]
                    + full_src2[
                        loc + len(self.stc.synoname_special_table[s_pos][1]) :
                    ]
                )

        full_tar1 = self._synoname_strip_punct(full_tar1)
        tar1_words = full_tar1.split()
        tar1_num_words = len(tar1_words)

        full_src1 = self._synoname_strip_punct(full_src1)
        src1_words = full_src1.split()
        src1_num_words = len(src1_words)

        full_tar2 = self._synoname_strip_punct(full_tar2)
        tar2_words = full_tar2.split()
        tar2_num_words = len(tar2_words)

        full_src2 = self._synoname_strip_punct(full_src2)
        src2_words = full_src2.split()
        src2_num_words = len(src2_words)

        # 2
        if (
            # [Coding Style] Wrong hanging indentation before block (add 4 spaces).
            src1_num_words < 2
            # [Coding Style] Wrong hanging indentation before block (add 4 spaces).
            and src_len_specials == 0
            # [Coding Style] Wrong hanging indentation before block (add 4 spaces).
            and src2_num_words < 2
            # [Coding Style] Wrong hanging indentation before block (add 4 spaces).
            and tar_len_specials == 0
        ):
            return 0

        # 4
        if (
            # [Coding Style] Wrong hanging indentation before block (add 4 spaces).
            tar1_num_words == 1
            # [Coding Style] Wrong hanging indentation before block (add 4 spaces).
            and src1_num_words == 1
            # [Coding Style] Wrong hanging indentation before block (add 4 spaces).
            and tar1_words[0] == src1_words[0]
        ):
            return 1
        if tar1_num_words < 2 and tar_len_specials == 0:
            return 0

        # 5
        last_found = False
        for word in tar1_words:
            if src_ln.endswith(word) or word + ' ' in src_ln:
                last_found = True

        if not last_found:
            for word in src1_words:
                if tar_ln.endswith(word) or word + ' ' in tar_ln:
                    last_found = True

        # 6
        matches = 0
        if last_found:
            for i, s_word in enumerate(src1_words):
                for j, t_word in enumerate(tar1_words):
                    if s_word == t_word:
                        src1_words[i] = '@'
                        tar1_words[j] = '@'
                        matches += 1
        w_ratio = matches / max(tar1_num_words, src1_num_words)
        # [best-practice] Too many boolean expressions in if statement (6/5)
        if matches > 1 or (
            # [Coding Style] Wrong hanging indentation before block (add 4 spaces).
            matches == 1
            # [Coding Style] Wrong hanging indentation before block (add 4 spaces).
            and src1_num_words == 1
            # [Coding Style] Wrong hanging indentation before block (add 4 spaces).
            and tar1_num_words == 1
            # [Coding Style] Wrong hanging indentation before block (add 4 spaces).
            and (tar_len_specials > 0 or src_len_specials > 0)
        ):
            return w_ratio

        # 8
        if (
            # [Coding Style] Wrong hanging indentation before block (add 4 spaces).
            tar2_num_words == 1
            # [Coding Style] Wrong hanging indentation before block (add 4 spaces).
            and src2_num_words == 1
            # [Coding Style] Wrong hanging indentation before block (add 4 spaces).
            and tar2_words[0] == src2_words[0]
        ):
            return 1
        # I see no way that the following can be True if the equivalent in
        # #4 was False.
        if tar2_num_words < 2 and tar_len_specials == 0:  # pragma: no cover
            return 0

        # 9
        last_found = False
        for word in tar2_words:
            if src_ln.endswith(word) or word + ' ' in src_ln:
                last_found = True

        if not last_found:
            for word in src2_words:
                if tar_ln.endswith(word) or word + ' ' in tar_ln:
                    last_found = True

        if not last_found:
            return 0

        # 10
        matches = 0
        if last_found:
            for i, s_word in enumerate(src2_words):
                for j, t_word in enumerate(tar2_words):
                    if s_word == t_word:
                        src2_words[i] = '@'
                        tar2_words[j] = '@'
                        matches += 1
        w_ratio = matches / max(tar2_num_words, src2_num_words)
        # [best-practice] Too many boolean expressions in if statement (6/5)
        if matches > 1 or (
            # [Coding Style] Wrong hanging indentation before block (add 4 spaces).
            matches == 1
            # [Coding Style] Wrong hanging indentation before block (add 4 spaces).
            and src2_num_words == 1
            # [Coding Style] Wrong hanging indentation before block (add 4 spaces).
            and tar2_num_words == 1
            # [Coding Style] Wrong hanging indentation before block (add 4 spaces).
            and (tar_len_specials > 0 or src_len_specials > 0)
        ):
            return w_ratio

        return 0

    # [best-practice] Too many arguments (7/5)
    # [Comprehensibility] This function exceeds the maximum number of
    # variables (44/15).
    # [Bug] Parameters differ from overridden 'dist_abs' method
    # [best-practice] Too many return statements (18/6)
    def dist_abs(
        # [Coding Style] Wrong hanging indentation before block (add 4 spaces).
        self,
        # [Coding Style] Wrong hanging indentation before block (add 4 spaces).
        src,
        # [Coding Style] Wrong hanging indentation before block (add 4 spaces).
        tar,
        # [Coding Style] Wrong hanging indentation before block (add 4 spaces).
        word_approx_min=0.3,
        # [Coding Style] Wrong hanging indentation before block (add 4 spaces).
        char_approx_min=0.73,
        # [Coding Style] Wrong hanging indentation before block (add 4 spaces).
        tests=2 ** 12 - 1,
        # [Coding Style] Wrong hanging indentation before block (add 4 spaces).
        ret_name=False,
    ):
        """Return the Synoname similarity type of two names.

        Parameters
        ----------
        src : str or tuple
            Source name for comparison: a (last name, first name, qualifier)
            tuple, a '#'-delimited string whose last three fields are those
            parts, or a plain last name
        tar : str or tuple
            Target name for comparison, in the same forms as src
        word_approx_min : float
            The minimum word approximation value to signal a 'word_approx'
            match
        char_approx_min : float
            The minimum character approximation value to signal a 'char_approx'
            match
        tests : int or Iterable
            Either an integer bit mask of the tests to perform (the sum of
            their test_dict values) or an iterable of test names to perform
            (defaults to performing all tests)
        ret_name : bool
            If True, returns the match name rather than its integer equivalent

        Returns
        -------
        int (or str if ret_name is True)
            Synoname value

        Examples
        --------
        >>> synoname(('Breghel', 'Pieter', ''), ('Brueghel', 'Pieter', ''))
        2
        >>> synoname(('Breghel', 'Pieter', ''), ('Brueghel', 'Pieter', ''),
        ... ret_name=True)
        'omission'
        >>> synoname(('Dore', 'Gustave', ''),
        ... ('Dore', 'Paul Gustave Louis Christophe', ''), ret_name=True)
        'inclusion'
        >>> synoname(('Pereira', 'I. R.', ''), ('Pereira', 'I. Smith', ''),
        ... ret_name=True)
        'word_approx'

        """
        if isinstance(tests, Iterable):
            new_tests = 0
            for term in tests:
                if term in self.test_dict:
                    new_tests += self.test_dict[term]
            tests = new_tests

        if isinstance(src, tuple):
            src_ln, src_fn, src_qual = src
        elif '#' in src:
            src_ln, src_fn, src_qual = src.split('#')[-3:]
        else:
            src_ln, src_fn, src_qual = src, '', ''

        if isinstance(tar, tuple):
            tar_ln, tar_fn, tar_qual = tar
        elif '#' in tar:
            tar_ln, tar_fn, tar_qual = tar.split('#')[-3:]
        else:
            tar_ln, tar_fn, tar_qual = tar, '', ''

        def _split_special(spec):
            spec_list = []
            while spec:
                spec_list.append((int(spec[:3]), spec[3:4]))
                spec = spec[4:]
            return spec_list

        def _fmt_retval(val):
            if ret_name:
                return self.match_name[val]
            return val

        # 1. Preprocessing

        # Lowercasing
        src_fn = src_fn.strip().lower()
        src_ln = src_ln.strip().lower()
        src_qual = src_qual.strip().lower()

        tar_fn = tar_fn.strip().lower()
        tar_ln = tar_ln.strip().lower()
        tar_qual = tar_qual.strip().lower()

        # Create toolcodes
        src_ln, src_fn, src_tc = self.stc.fingerprint(src_ln, src_fn, src_qual)
        tar_ln, tar_fn, tar_tc = self.stc.fingerprint(tar_ln, tar_fn, tar_qual)

        src_generation = int(src_tc[2])
        src_romancode = int(src_tc[3:6])
        src_len_fn = int(src_tc[6:8])
        src_tc = src_tc.split('$')
        src_specials = _split_special(src_tc[1])

        tar_generation = int(tar_tc[2])
        tar_romancode = int(tar_tc[3:6])
        tar_len_fn = int(tar_tc[6:8])
        tar_tc = tar_tc.split('$')
        tar_specials = _split_special(tar_tc[1])

        gen_conflict = (src_generation != tar_generation) and bool(
            src_generation or tar_generation
        )
        roman_conflict = (src_romancode != tar_romancode) and bool(
            src_romancode or tar_romancode
        )

        ln_equal = src_ln == tar_ln
        fn_equal = src_fn == tar_fn

        # approx_c
        def _approx_c():
            if gen_conflict or roman_conflict:
                return False, 0

            full_src = ' '.join((src_ln, src_fn))
            if full_src.startswith('master '):
                full_src = full_src[len('master ') :]
                for intro in [
                    # [Coding Style] Wrong hanging indentation before block (add 4 spaces).
                    'of the ',
                    # [Coding Style] Wrong hanging indentation before block (add 4 spaces).
                    'of ',
                    # [Coding Style] Wrong hanging indentation before block (add 4 spaces).
                    'known as the ',
                    # [Coding Style] Wrong hanging indentation before block (add 4 spaces).
                    'with the ',
                    # [Coding Style] Wrong hanging indentation before block (add 4 spaces).
                    'with ',
                ]:
                    if full_src.startswith(intro):
                        full_src = full_src[len(intro) :]

            full_tar = ' '.join((tar_ln, tar_fn))
            if full_tar.startswith('master '):
                full_tar = full_tar[len('master ') :]
                for intro in [
                    # [Coding Style] Wrong hanging indentation before block (add 4 spaces).
                    'of the ',
                    # [Coding Style] Wrong hanging indentation before block (add 4 spaces).
                    'of ',
                    # [Coding Style] Wrong hanging indentation before block (add 4 spaces).
                    'known as the ',
                    # [Coding Style] Wrong hanging indentation before block (add 4 spaces).
                    'with the ',
                    # [Coding Style] Wrong hanging indentation before block (add 4 spaces).
                    'with ',
                ]:
                    if full_tar.startswith(intro):
                        full_tar = full_tar[len(intro) :]

            loc_ratio = sim_ratcliff_obershelp(full_src, full_tar)
            return loc_ratio >= char_approx_min, loc_ratio

        # [Unused Code] The variable approx_c_result seems to be unused.
        approx_c_result, ca_ratio = _approx_c()

        if tests & self.test_dict['exact'] and fn_equal and ln_equal:
            return _fmt_retval(self.match_type_dict['exact'])
        if tests & self.test_dict['omission']:
519 1 View Code Duplication
            if (
0 ignored issues
show
Duplication introduced by
This code seems to be duplicated in your project.
Loading history...
520
                fn_equal
0 ignored issues
show
Coding Style introduced by
Wrong hanging indentation before block (add 4 spaces).
Loading history...
521
                and levenshtein(src_ln, tar_ln, cost=(1, 1, 99, 99)) == 1
0 ignored issues
show
Coding Style introduced by
Wrong hanging indentation before block (add 4 spaces).
Loading history...
522
            ):
523 1
                if not roman_conflict:
524 1
                    return _fmt_retval(self.match_type_dict['omission'])
525 1
            elif (
526
                ln_equal
0 ignored issues
show
Coding Style introduced by
Wrong hanging indentation before block (add 4 spaces).
Loading history...
527
                and levenshtein(src_fn, tar_fn, cost=(1, 1, 99, 99)) == 1
0 ignored issues
show
Coding Style introduced by
Wrong hanging indentation before block (add 4 spaces).
Loading history...
528
            ):
529 1
                return _fmt_retval(self.match_type_dict['omission'])
530 1 View Code Duplication
        if tests & self.test_dict['substitution']:
0 ignored issues
show
Duplication introduced by
This code seems to be duplicated in your project.
Loading history...
531 1
            if (
532
                fn_equal
0 ignored issues
show
Coding Style introduced by
Wrong hanging indentation before block (add 4 spaces).
Loading history...
533
                and levenshtein(src_ln, tar_ln, cost=(99, 99, 1, 99)) == 1
0 ignored issues
show
Coding Style introduced by
Wrong hanging indentation before block (add 4 spaces).
Loading history...
534
            ):
535 1
                return _fmt_retval(self.match_type_dict['substitution'])
536 1
            elif (
537
                ln_equal
0 ignored issues
show
Coding Style introduced by
Wrong hanging indentation before block (add 4 spaces).
Loading history...
538
                and levenshtein(src_fn, tar_fn, cost=(99, 99, 1, 99)) == 1
0 ignored issues
show
Coding Style introduced by
Wrong hanging indentation before block (add 4 spaces).
Loading history...
539
            ):
540 1
                return _fmt_retval(self.match_type_dict['substitution'])
541 1 View Code Duplication
        if tests & self.test_dict['transposition']:
0 ignored issues
show
Duplication introduced by
This code seems to be duplicated in your project.
Loading history...
542 1
            if fn_equal and (
543
                levenshtein(src_ln, tar_ln, mode='osa', cost=(99, 99, 99, 1))
0 ignored issues
show
Coding Style introduced by
Wrong hanging indentation before block (add 4 spaces).
Loading history...
544
                == 1
0 ignored issues
show
Coding Style introduced by
Wrong hanging indentation before block (add 4 spaces).
Loading history...
545
            ):
546 1
                return _fmt_retval(self.match_type_dict['transposition'])
547 1
            elif ln_equal and (
548
                levenshtein(src_fn, tar_fn, mode='osa', cost=(99, 99, 99, 1))
0 ignored issues
show
Coding Style introduced by
Wrong hanging indentation before block (add 4 spaces).
Loading history...
549
                == 1
0 ignored issues
show
Coding Style introduced by
Wrong hanging indentation before block (add 4 spaces).
Loading history...
550
            ):
551 1
                return _fmt_retval(self.match_type_dict['transposition'])
552 1
        if tests & self.test_dict['punctuation']:
553 1
            np_src_fn = self._synoname_strip_punct(src_fn)
554 1
            np_tar_fn = self._synoname_strip_punct(tar_fn)
555 1
            np_src_ln = self._synoname_strip_punct(src_ln)
556 1
            np_tar_ln = self._synoname_strip_punct(tar_ln)
557
558 1
            if (np_src_fn == np_tar_fn) and (np_src_ln == np_tar_ln):
559 1
                return _fmt_retval(self.match_type_dict['punctuation'])
560
561 1
            np_src_fn = self._synoname_strip_punct(src_fn.replace('-', ' '))
562 1
            np_tar_fn = self._synoname_strip_punct(tar_fn.replace('-', ' '))
563 1
            np_src_ln = self._synoname_strip_punct(src_ln.replace('-', ' '))
564 1
            np_tar_ln = self._synoname_strip_punct(tar_ln.replace('-', ' '))
565
566 1
            if (np_src_fn == np_tar_fn) and (np_src_ln == np_tar_ln):
567 1
                return _fmt_retval(self.match_type_dict['punctuation'])
568
569 1
        if tests & self.test_dict['initials'] and ln_equal:
            if src_fn and tar_fn:
                src_initials = self._synoname_strip_punct(src_fn).split()
                tar_initials = self._synoname_strip_punct(tar_fn).split()
                initials = bool(
                    (len(src_initials) == len(''.join(src_initials)))
                    or (len(tar_initials) == len(''.join(tar_initials)))
                )
                if initials:
                    src_initials = ''.join(_[0] for _ in src_initials)
                    tar_initials = ''.join(_[0] for _ in tar_initials)
                    if src_initials == tar_initials:
                        return _fmt_retval(self.match_type_dict['initials'])
                    initial_diff = abs(len(src_initials) - len(tar_initials))
                    # With cost=(1, 99, 99, 99) only insertions are cheap, so
                    # one of these distances equals initial_diff exactly when
                    # the shorter initials string is a subsequence of the
                    # longer one.
                    if initial_diff and (
                        (
                            initial_diff
                            == levenshtein(
                                src_initials,
                                tar_initials,
                                cost=(1, 99, 99, 99),
                            )
                        )
                        or (
                            initial_diff
                            == levenshtein(
                                tar_initials,
                                src_initials,
                                cost=(1, 99, 99, 99),
                            )
                        )
                    ):
                        return _fmt_retval(self.match_type_dict['initials'])
        if tests & self.test_dict['extension']:
            if src_ln[1] == tar_ln[1] and (
                src_ln.startswith(tar_ln) or tar_ln.startswith(src_ln)
            ):
                if (
                    (not src_len_fn and not tar_len_fn)
                    or (tar_fn and src_fn.startswith(tar_fn))
                    or (src_fn and tar_fn.startswith(src_fn))
                ) and not roman_conflict:
                    return _fmt_retval(self.match_type_dict['extension'])
        if tests & self.test_dict['inclusion'] and ln_equal:
            if (src_fn and src_fn in tar_fn) or (tar_fn and tar_fn in src_fn):
                return _fmt_retval(self.match_type_dict['inclusion'])
        if tests & self.test_dict['no_first'] and ln_equal:
            if src_fn == '' or tar_fn == '':
                return _fmt_retval(self.match_type_dict['no_first'])
        if tests & self.test_dict['word_approx']:
            ratio = self._synoname_word_approximation(
                src_ln,
                tar_ln,
                src_fn,
                tar_fn,
                {
                    'gen_conflict': gen_conflict,
                    'roman_conflict': roman_conflict,
                    'src_specials': src_specials,
                    'tar_specials': tar_specials,
                },
            )
            if ratio == 1 and tests & self.test_dict['confusions']:
                if (
                    ' '.join((src_fn, src_ln)).strip()
                    == ' '.join((tar_fn, tar_ln)).strip()
                ):
                    return _fmt_retval(self.match_type_dict['confusions'])
            if ratio >= word_approx_min:
                return _fmt_retval(self.match_type_dict['word_approx'])
        if tests & self.test_dict['char_approx']:
            if ca_ratio >= char_approx_min:
                return _fmt_retval(self.match_type_dict['char_approx'])
        return _fmt_retval(self.match_type_dict['no_match'])

    def dist(
        self,
        src,
        tar,
        word_approx_min=0.3,
        char_approx_min=0.73,
        tests=2 ** 12 - 1,
    ):
        """Return the normalized Synoname distance between two words.

        Parameters
        ----------
        src : str
            Source string for comparison
        tar : str
            Target string for comparison
        word_approx_min : float
            The minimum word approximation value to signal a 'word_approx'
            match
        char_approx_min : float
            The minimum character approximation value to signal a 'char_approx'
            match
        tests : int or Iterable
            Either an integer indicating tests to perform or a list of test
            names to perform (defaults to performing all tests)

        Returns
        -------
        float
            Normalized Synoname distance

        """
        return (
            synoname(src, tar, word_approx_min, char_approx_min, tests, False)
            / 14
        )


def synoname(
    src,
    tar,
    word_approx_min=0.3,
    char_approx_min=0.73,
    tests=2 ** 12 - 1,
    ret_name=False,
):
    """Return the Synoname similarity type of two words.

    This is a wrapper for :py:meth:`Synoname.dist_abs`.

    Parameters
    ----------
    src : str
        Source string for comparison
    tar : str
        Target string for comparison
    word_approx_min : float
        The minimum word approximation value to signal a 'word_approx' match
    char_approx_min : float
        The minimum character approximation value to signal a 'char_approx'
        match
    tests : int or Iterable
        Either an integer indicating tests to perform or a list of test names
        to perform (defaults to performing all tests)
    ret_name : bool
        If True, returns the match name rather than its integer equivalent

    Returns
    -------
    int (or str if ret_name is True)
        Synoname value

    Examples
    --------
    >>> synoname(('Breghel', 'Pieter', ''), ('Brueghel', 'Pieter', ''))
    2
    >>> synoname(('Breghel', 'Pieter', ''), ('Brueghel', 'Pieter', ''),
    ... ret_name=True)
    'omission'
    >>> synoname(('Dore', 'Gustave', ''),
    ... ('Dore', 'Paul Gustave Louis Christophe', ''), ret_name=True)
    'inclusion'
    >>> synoname(('Pereira', 'I. R.', ''), ('Pereira', 'I. Smith', ''),
    ... ret_name=True)
    'word_approx'

    """
    return Synoname().dist_abs(
        src, tar, word_approx_min, char_approx_min, tests, ret_name
    )


if __name__ == '__main__':
    import doctest

    doctest.testmod()
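As a reading aid for the initials test in `Synoname.dist_abs` above, the following is a stand-alone sketch of the same logic: after reducing both first names to initials, it accepts the pair if the initials are equal or if the shorter string is a subsequence of the longer one. The original code detects the subsequence case by comparing the length difference against a Levenshtein distance with `cost=(1, 99, 99, 99)`. This sketch is not part of Abydos. `strip_punct`, `weighted_levenshtein`, and `initials_match` are hypothetical helpers. `string.punctuation` only approximates the character set that `_synoname_strip_punct` removes, and the distance omits the transposition cost that the original tuple also sets to 99.

```python
import string

# Assumption: string.punctuation approximates the set of characters
# that Synoname's _synoname_strip_punct removes.
_PUNCT = set(string.punctuation)


def strip_punct(word):
    """Drop punctuation characters (hypothetical helper)."""
    return ''.join(c for c in word if c not in _PUNCT).strip()


def weighted_levenshtein(src, tar, ins_cost=1, del_cost=1, sub_cost=1):
    """Weighted Levenshtein distance from src to tar (no transpositions)."""
    rows, cols = len(src) + 1, len(tar) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        d[i][0] = i * del_cost
    for j in range(1, cols):
        d[0][j] = j * ins_cost
    for i in range(1, rows):
        for j in range(1, cols):
            d[i][j] = min(
                d[i - 1][j] + del_cost,  # delete src[i - 1]
                d[i][j - 1] + ins_cost,  # insert tar[j - 1]
                d[i - 1][j - 1]
                + (0 if src[i - 1] == tar[j - 1] else sub_cost),
            )
    return d[-1][-1]


def initials_match(src_fn, tar_fn):
    """Simplified, hypothetical version of the Synoname 'initials' test."""
    src_words = strip_punct(src_fn).split()
    tar_words = strip_punct(tar_fn).split()
    if not (src_words and tar_words):
        return False
    # At least one name must consist solely of one-letter words.
    if not (
        len(src_words) == len(''.join(src_words))
        or len(tar_words) == len(''.join(tar_words))
    ):
        return False
    src_init = ''.join(w[0] for w in src_words)
    tar_init = ''.join(w[0] for w in tar_words)
    if src_init == tar_init:
        return True
    diff = abs(len(src_init) - len(tar_init))
    # Only insertions are cheap, so a distance equal to diff means the
    # shorter initials string is a subsequence of the longer one.
    return bool(diff) and (
        diff == weighted_levenshtein(src_init, tar_init, 1, 99, 99)
        or diff == weighted_levenshtein(tar_init, src_init, 1, 99, 99)
    )


print(initials_match('J. R. R.', 'J. R.'))  # True
print(initials_match('I. R.', 'I. S.'))  # False
```

Computing the distance in both argument orders makes the test symmetric, so it does not matter which name carries the extra initial. A direct subsequence scan would be cheaper. The weighted-distance form is used here only because it mirrors the original code.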