Completed
Pull Request — master (#141)
by Chris
11:42
created

abydos.distance._Synoname.Synoname.dist_abs()   F

Complexity

Conditions 79

Size

Total Lines 275
Code Lines 192

Duplication

Lines 34
Ratio 12.36 %

Code Coverage

Tests 132
CRAP Score 79

Importance

Changes 0
Metric Value
cc 79
eloc 192
nop 7
dl 34
loc 275
ccs 132
cts 132
cp 1
crap 79
rs 0
c 0
b 0
f 0

How to fix   Long Method    Complexity   

Long Method

Small methods make your code easier to understand, in particular if combined with a good name. Besides, if your method is small, finding a good name is usually much easier.

For example, if you find yourself adding comments to a method's body, this is usually a good sign to extract the commented part to a new method, and use the comment as a starting point when coming up with a good name for this new method.
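As a minimal sketch of that advice (the function names here are hypothetical and not taken from the reviewed file), a commented block inside a longer function can become a small helper whose name is derived from the comment:

```python
def normalize_name(raw):
    """Lowercase a name and collapse runs of whitespace."""
    return ' '.join(raw.lower().split())


def names_match(src, tar):
    # Before extraction, this body would have held two copies of a block
    # commented "normalize the name"; the comment became the helper's name.
    return normalize_name(src) == normalize_name(tar)
```

The caller now reads as a single comparison, and the helper can be tested on its own.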

A commonly applied refactoring is Extract Method.

Complexity

Complex classes like abydos.distance._Synoname.Synoname.dist_abs() often do a lot of different things. To break such a class down, we need to identify a cohesive component within that class. A common approach to find such a component is to look for fields/methods that share the same prefixes or suffixes.

Once you have determined the fields that belong together, you can apply the Extract Class refactoring. If the component makes sense as a sub-class, Extract Subclass is also a candidate, and is often faster.

# -*- coding: utf-8 -*-
Coding Style (Naming): The name _Synoname does not conform to the module naming conventions ((([a-z_][a-z0-9_]*)|([A-Z][a-zA-Z0-9]+))$).

This check looks for invalid names for a range of different identifiers.

You can set regular expressions to which the identifiers must conform if the defaults do not match your requirements.

If your project includes a Pylint configuration file, the settings contained in that file take precedence.

To find out more about Pylint, please refer to their site.

# Copyright 2018 by Christopher C. Little.
# This file is part of Abydos.
#
# Abydos is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# Abydos is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with Abydos. If not, see <http://www.gnu.org/licenses/>.

"""abydos.distance._Synoname.

Synoname.
"""

from __future__ import (
    absolute_import,
    division,
    print_function,
    unicode_literals,
)

# collections.Iterable was removed in Python 3.10; prefer collections.abc.
try:
    from collections.abc import Iterable
except ImportError:  # Python 2
    from collections import Iterable

from ._Distance import _Distance
from ._Levenshtein import levenshtein
from ._RatcliffObershelp import sim_ratcliff_obershelp

# noinspection PyProtectedMember
from ..fingerprint._Synoname import SynonameToolcode

__all__ = ['Synoname', 'synoname']


class Synoname(_Distance):
Unused Code: The variable __class__ seems to be unused.
    """Synoname.

    Cf. :cite:`Getty:1991,Gross:1991`
    """

    stc = SynonameToolcode()

    test_dict = {
        val: 2 ** n
        for n, val in enumerate(
            (
                'exact',
                'omission',
                'substitution',
                'transposition',
                'punctuation',
                'initials',
                'extension',
                'inclusion',
                'no_first',
                'word_approx',
                'confusions',
                'char_approx',
            )
        )
    }
    match_name = (
        '',
        'exact',
        'omission',
        'substitution',
        'transposition',
        'punctuation',
        'initials',
        'extension',
        'inclusion',
        'no_first',
        'word_approx',
        'confusions',
        'char_approx',
        'no_match',
    )
    match_type_dict = {val: n for n, val in enumerate(match_name)}

    def _synoname_strip_punct(self, word):
Coding Style: This method could be written as a function/class method.

If a method does not access any attributes of the class, it could also be implemented as a function or static method. This can help improve readability. For example

class Foo:
    def some_method(self, x, y):
        return x + y

could be written as

class Foo:
    @classmethod
    def some_method(cls, x, y):
        return x + y

        """Return a word with punctuation stripped out.

        Args:
            word (str): a word to strip punctuation from

        Returns:
            str: The word stripped of punctuation

        Examples:
            >>> pe = Synoname()
            >>> pe._synoname_strip_punct('AB;CD EF-GH$IJ')
            'ABCD EFGHIJ'

        """
        stripped = ''
        for char in word:
            if char not in set(',-./:;"&\'()!{|}?$%*+<=>[\\]^_`~'):
                stripped += char
        return stripped.strip()
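One way to act on the "could be written as a function" note above is sketched here, under the assumption that a module-level helper is acceptable. `strip_punct` and `_PUNCT_TABLE` are hypothetical names; the character list is copied from the listing. It uses `str.translate` instead of rebuilding a set for every character:

```python
_PUNCT = ',-./:;"&\'()!{|}?$%*+<=>[\\]^_`~'
# Map each punctuation code point to None so str.translate deletes it.
_PUNCT_TABLE = {ord(char): None for char in _PUNCT}


def strip_punct(word):
    """Return word with the listed punctuation removed and both ends trimmed."""
    return word.translate(_PUNCT_TABLE).strip()
```

The translation table is built once at import time, so each call is a single pass over the string.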

    def _synoname_word_approximation(
best-practice: Too many arguments (6/5)
Comprehensibility: This function exceeds the maximum number of variables (32/15).
best-practice: Too many return statements (10/6)
        self, src_ln, tar_ln, src_fn='', tar_fn='', features=None
Coding Style: Wrong hanging indentation before block (add 4 spaces).
    ):
        """Return the Synoname word approximation score for two names.

        Args:
            src_ln (str): Last name of the source
            tar_ln (str): Last name of the target
            src_fn (str): First name of the source (optional)
            tar_fn (str): First name of the target (optional)
            features (dict): A dict containing special features calculated
                using :py:class:`fingerprint.SynonameToolcode` (optional)

        Returns:
            float: The word approximation score

        Examples:
            >>> pe = Synoname()
            >>> pe._synoname_word_approximation('Smith Waterman', 'Waterman',
            ... 'Tom Joe Bob', 'Tom Joe')
            0.6

        """
        if features is None:
            features = {}
        if 'src_specials' not in features:
            features['src_specials'] = []
        if 'tar_specials' not in features:
            features['tar_specials'] = []

        src_len_specials = len(features['src_specials'])
        tar_len_specials = len(features['tar_specials'])

        # 1
        if ('gen_conflict' in features and features['gen_conflict']) or (
            'roman_conflict' in features and features['roman_conflict']
Coding Style: Wrong hanging indentation before block (add 4 spaces).
        ):
            return 0

        # 3 & 7
        full_tar1 = ' '.join((tar_ln, tar_fn)).replace('-', ' ').strip()
        for s_pos, s_type in features['tar_specials']:
            if s_type == 'a':
                full_tar1 = full_tar1[
                    : -(1 + len(self.stc.synoname_special_table[s_pos][1]))
                ]
            elif s_type == 'b':
                loc = (
                    full_tar1.find(
                        ' ' + self.stc.synoname_special_table[s_pos][1] + ' '
                    )
                    + 1
                )
                full_tar1 = (
                    full_tar1[:loc]
                    + full_tar1[
                        loc + len(self.stc.synoname_special_table[s_pos][1]) :
                    ]
                )
            elif s_type == 'c':
                full_tar1 = full_tar1[
                    1 + len(self.stc.synoname_special_table[s_pos][1]) :
                ]

        full_src1 = ' '.join((src_ln, src_fn)).replace('-', ' ').strip()
        for s_pos, s_type in features['src_specials']:
            if s_type == 'a':
                full_src1 = full_src1[
                    : -(1 + len(self.stc.synoname_special_table[s_pos][1]))
                ]
            elif s_type == 'b':
                loc = (
                    full_src1.find(
                        ' ' + self.stc.synoname_special_table[s_pos][1] + ' '
                    )
                    + 1
                )
                full_src1 = (
                    full_src1[:loc]
                    + full_src1[
                        loc + len(self.stc.synoname_special_table[s_pos][1]) :
                    ]
                )
            elif s_type == 'c':
                full_src1 = full_src1[
                    1 + len(self.stc.synoname_special_table[s_pos][1]) :
                ]
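The two loops above differ only in which name they operate on, so they are a natural candidate for Extract Method. The following is a sketch only: `strip_positional_specials` is a hypothetical helper, and `special_table` stands in for `self.stc.synoname_special_table`, assumed to hold the term text at index 1 of each row as the listing implies.

```python
def strip_positional_specials(full_name, specials, special_table):
    """Remove type 'a', 'b' and 'c' special terms from full_name.

    specials is a list of (table_index, type) pairs; special_table is a
    hypothetical stand-in for the toolcode table used in the listing.
    """
    for s_pos, s_type in specials:
        term = special_table[s_pos][1]
        if s_type == 'a':
            # Drop the term plus one separator character from the end.
            full_name = full_name[: -(1 + len(term))]
        elif s_type == 'b':
            # Cut the term out of the middle; both surrounding spaces stay.
            loc = full_name.find(' ' + term + ' ') + 1
            full_name = full_name[:loc] + full_name[loc + len(term):]
        elif s_type == 'c':
            # Drop the term plus one separator character from the start.
            full_name = full_name[1 + len(term):]
    return full_name
```

With such a helper, each of the two loops would reduce to a single call, one for the target name and one for the source name.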

        full_tar2 = full_tar1
        for s_pos, s_type in features['tar_specials']:
            if s_type == 'd':
                full_tar2 = full_tar2[
                    len(self.stc.synoname_special_table[s_pos][1]) :
                ]
            elif (
                s_type == 'X'
Coding Style: Wrong hanging indentation before block (add 4 spaces).
                and self.stc.synoname_special_table[s_pos][1] in full_tar2
Coding Style: Wrong hanging indentation before block (add 4 spaces).
            ):
                loc = full_tar2.find(
                    ' ' + self.stc.synoname_special_table[s_pos][1]
                )
                full_tar2 = (
                    full_tar2[:loc]
                    + full_tar2[
                        loc + len(self.stc.synoname_special_table[s_pos][1]) :
                    ]
                )

        full_src2 = full_src1
        for s_pos, s_type in features['src_specials']:
            if s_type == 'd':
                full_src2 = full_src2[
                    len(self.stc.synoname_special_table[s_pos][1]) :
                ]
            elif (
                s_type == 'X'
Coding Style: Wrong hanging indentation before block (add 4 spaces).
                and self.stc.synoname_special_table[s_pos][1] in full_src2
Coding Style: Wrong hanging indentation before block (add 4 spaces).
            ):
                loc = full_src2.find(
                    ' ' + self.stc.synoname_special_table[s_pos][1]
                )
                full_src2 = (
                    full_src2[:loc]
                    + full_src2[
                        loc + len(self.stc.synoname_special_table[s_pos][1]) :
                    ]
                )

        full_tar1 = self._synoname_strip_punct(full_tar1)
        tar1_words = full_tar1.split()
        tar1_num_words = len(tar1_words)

        full_src1 = self._synoname_strip_punct(full_src1)
        src1_words = full_src1.split()
        src1_num_words = len(src1_words)

        full_tar2 = self._synoname_strip_punct(full_tar2)
        tar2_words = full_tar2.split()
        tar2_num_words = len(tar2_words)

        full_src2 = self._synoname_strip_punct(full_src2)
        src2_words = full_src2.split()
        src2_num_words = len(src2_words)

        # 2
        if (
            src1_num_words < 2
Coding Style: Wrong hanging indentation before block (add 4 spaces).
            and src_len_specials == 0
Coding Style: Wrong hanging indentation before block (add 4 spaces).
            and src2_num_words < 2
Coding Style: Wrong hanging indentation before block (add 4 spaces).
            and tar_len_specials == 0
Coding Style: Wrong hanging indentation before block (add 4 spaces).
        ):
            return 0

        # 4
        if (
            tar1_num_words == 1
Coding Style: Wrong hanging indentation before block (add 4 spaces).
            and src1_num_words == 1
Coding Style: Wrong hanging indentation before block (add 4 spaces).
            and tar1_words[0] == src1_words[0]
Coding Style: Wrong hanging indentation before block (add 4 spaces).
        ):
            return 1
        if tar1_num_words < 2 and tar_len_specials == 0:
            return 0

        # 5
        last_found = False
        for word in tar1_words:
            if src_ln.endswith(word) or word + ' ' in src_ln:
                last_found = True

        if not last_found:
            for word in src1_words:
                if tar_ln.endswith(word) or word + ' ' in tar_ln:
                    last_found = True

        # 6
        matches = 0
        if last_found:
            for i, s_word in enumerate(src1_words):
                for j, t_word in enumerate(tar1_words):
                    if s_word == t_word:
                        src1_words[i] = '@'
                        tar1_words[j] = '@'
                        matches += 1
        w_ratio = matches / max(tar1_num_words, src1_num_words)
        if matches > 1 or (
best-practice: Too many boolean expressions in if statement (6/5)
            matches == 1
Coding Style: Wrong hanging indentation before block (add 4 spaces).
            and src1_num_words == 1
Coding Style: Wrong hanging indentation before block (add 4 spaces).
            and tar1_num_words == 1
Coding Style: Wrong hanging indentation before block (add 4 spaces).
            and (tar_len_specials > 0 or src_len_specials > 0)
Coding Style: Wrong hanging indentation before block (add 4 spaces).
        ):
            return w_ratio
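Steps 5 and 6 recur almost verbatim as steps 9 and 10 on the second pair of word lists, which accounts for part of the reported complexity. A hedged sketch of how both pairs could share two helpers follows. The function names are hypothetical, and unlike the listing the counting helper works on copies instead of overwriting the caller's lists:

```python
def last_name_word_found(src_ln, tar_ln, src_words, tar_words):
    """Return True if a word of one name occurs in the other's last name."""
    for word in tar_words:
        if src_ln.endswith(word) or word + ' ' in src_ln:
            return True
    for word in src_words:
        if tar_ln.endswith(word) or word + ' ' in tar_ln:
            return True
    return False


def count_word_matches(src_words, tar_words):
    """Count equal (source, target) word pairs, masking matched slots."""
    src_words, tar_words = list(src_words), list(tar_words)
    matches = 0
    for i, s_word in enumerate(src_words):
        for j, t_word in enumerate(tar_words):
            if s_word == t_word:
                # Same masking as the listing: matched slots become '@'.
                src_words[i] = '@'
                tar_words[j] = '@'
                matches += 1
    return matches
```

For the docstring example earlier in the listing, the word lists have five and three entries with three matches, which is consistent with the documented score of 0.6.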

        # 8
        if (
            tar2_num_words == 1
Coding Style: Wrong hanging indentation before block (add 4 spaces).
            and src2_num_words == 1
Coding Style: Wrong hanging indentation before block (add 4 spaces).
            and tar2_words[0] == src2_words[0]
Coding Style: Wrong hanging indentation before block (add 4 spaces).
        ):
            return 1
        # I see no way that the following can be True if the equivalent in
        # #4 was False.
        if tar2_num_words < 2 and tar_len_specials == 0:  # pragma: no cover
            return 0

        # 9
        last_found = False
        for word in tar2_words:
            if src_ln.endswith(word) or word + ' ' in src_ln:
                last_found = True

        if not last_found:
            for word in src2_words:
                if tar_ln.endswith(word) or word + ' ' in tar_ln:
                    last_found = True

        if not last_found:
            return 0

        # 10
        matches = 0
        if last_found:
            for i, s_word in enumerate(src2_words):
                for j, t_word in enumerate(tar2_words):
                    if s_word == t_word:
                        src2_words[i] = '@'
                        tar2_words[j] = '@'
                        matches += 1
        w_ratio = matches / max(tar2_num_words, src2_num_words)
        if matches > 1 or (
best-practice: Too many boolean expressions in if statement (6/5)
            matches == 1
Coding Style: Wrong hanging indentation before block (add 4 spaces).
            and src2_num_words == 1
Coding Style: Wrong hanging indentation before block (add 4 spaces).
            and tar2_num_words == 1
Coding Style: Wrong hanging indentation before block (add 4 spaces).
            and (tar_len_specials > 0 or src_len_specials > 0)
Coding Style: Wrong hanging indentation before block (add 4 spaces).
        ):
            return w_ratio

        return 0

    def dist_abs(
Bug: Parameters differ from overridden 'dist_abs' method
best-practice: Too many arguments (7/5)
Comprehensibility: This function exceeds the maximum number of variables (44/15).
best-practice: Too many return statements (18/6)
        self,
Coding Style: Wrong hanging indentation before block (add 4 spaces).
        src,
Coding Style: Wrong hanging indentation before block (add 4 spaces).
        tar,
Coding Style: Wrong hanging indentation before block (add 4 spaces).
        word_approx_min=0.3,
Coding Style: Wrong hanging indentation before block (add 4 spaces).
        char_approx_min=0.73,
Coding Style: Wrong hanging indentation before block (add 4 spaces).
        tests=2 ** 12 - 1,
Coding Style: Wrong hanging indentation before block (add 4 spaces).
        ret_name=False,
Coding Style: Wrong hanging indentation before block (add 4 spaces).
    ):
        """Return the Synoname similarity type of two words.

        Args:
            src (str): Source string for comparison
            tar (str): Target string for comparison
            word_approx_min (float): the minimum word approximation value to
                signal a 'word_approx' match
            char_approx_min (float): the minimum character approximation value
                to signal a 'char_approx' match
            tests (int or Iterable): either an integer indicating tests to
                perform or a list of test names to perform (defaults to
                performing all tests)
            ret_name (bool): If True, returns the match name rather than its
                integer equivalent

        Returns:
            int (or str if ret_name is True): Synoname value

        Examples:
            >>> synoname(('Breghel', 'Pieter', ''), ('Brueghel', 'Pieter', ''))
            2
            >>> synoname(('Breghel', 'Pieter', ''), ('Brueghel', 'Pieter', ''),
            ... ret_name=True)
            'omission'
            >>> synoname(('Dore', 'Gustave', ''),
            ... ('Dore', 'Paul Gustave Louis Christophe', ''),
            ... ret_name=True)
            'inclusion'
            >>> synoname(('Pereira', 'I. R.', ''), ('Pereira', 'I. Smith', ''),
            ... ret_name=True)
            'word_approx'

        """
        if isinstance(tests, Iterable):
            new_tests = 0
            for term in tests:
                if term in self.test_dict:
                    # OR the bit in so a repeated name cannot carry into
                    # the next test's bit.
                    new_tests |= self.test_dict[term]
            tests = new_tests

        if isinstance(src, tuple):
            src_ln, src_fn, src_qual = src
        elif '#' in src:
            src_ln, src_fn, src_qual = src.split('#')[-3:]
        else:
            src_ln, src_fn, src_qual = src, '', ''

        if isinstance(tar, tuple):
            tar_ln, tar_fn, tar_qual = tar
        elif '#' in tar:
            tar_ln, tar_fn, tar_qual = tar.split('#')[-3:]
        else:
            tar_ln, tar_fn, tar_qual = tar, '', ''
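The source and target names are unpacked by two identical branches. A possible extraction is sketched below; `split_name` is a hypothetical helper that keeps the listing's behaviour, including the fact that a string with only one `#` still fails to unpack into three parts:

```python
def split_name(name):
    """Return (last, first, qualifier) from a tuple or '#'-delimited string."""
    if isinstance(name, tuple):
        last, first, qual = name
    elif '#' in name:
        # Only the last three '#'-separated fields are used.
        last, first, qual = name.split('#')[-3:]
    else:
        last, first, qual = name, '', ''
    return last, first, qual
```

Calling it once for `src` and once for `tar` would remove six branches from `dist_abs`.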

        def _split_special(spec):
            spec_list = []
            while spec:
                spec_list.append((int(spec[:3]), spec[3:4]))
                spec = spec[4:]
            return spec_list
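As read from the slicing above, `_split_special` consumes the specials string in four-character entries: a three-digit table index followed by a one-character type. A module-level equivalent (`split_special` is a hypothetical name) could look like this:

```python
def split_special(spec):
    """Split a specials string into (table_index, type) pairs.

    Assumes each entry is a three-digit index plus a one-character type.
    """
    return [
        (int(spec[i:i + 3]), spec[i + 3:i + 4])
        for i in range(0, len(spec), 4)
    ]
```

Moving the helper out of `dist_abs` would also make it testable without constructing a toolcode.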

        def _fmt_retval(val):
            if ret_name:
                return self.match_name[val]
            return val

        # 1. Preprocessing

        # Lowercasing
        src_fn = src_fn.strip().lower()
        src_ln = src_ln.strip().lower()
        src_qual = src_qual.strip().lower()

        tar_fn = tar_fn.strip().lower()
        tar_ln = tar_ln.strip().lower()
        tar_qual = tar_qual.strip().lower()

        # Create toolcodes
        src_ln, src_fn, src_tc = self.stc.fingerprint(src_ln, src_fn, src_qual)
        tar_ln, tar_fn, tar_tc = self.stc.fingerprint(tar_ln, tar_fn, tar_qual)

        src_generation = int(src_tc[2])
        src_romancode = int(src_tc[3:6])
        src_len_fn = int(src_tc[6:8])
        src_tc = src_tc.split('$')
        src_specials = _split_special(src_tc[1])

        tar_generation = int(tar_tc[2])
        tar_romancode = int(tar_tc[3:6])
        tar_len_fn = int(tar_tc[6:8])
        tar_tc = tar_tc.split('$')
        tar_specials = _split_special(tar_tc[1])

        gen_conflict = (src_generation != tar_generation) and bool(
            src_generation or tar_generation
        )
        roman_conflict = (src_romancode != tar_romancode) and bool(
            src_romancode or tar_romancode
        )
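Both conflict flags follow the same pattern: the codes differ and at least one is nonzero. Since the codes are integers here, two unequal values cannot both be zero, so the `bool(...)` term appears to be redundant. The sketch below (with a hypothetical `conflict` helper) checks that claim exhaustively over a small range:

```python
from itertools import product


def conflict(src_code, tar_code):
    """Mirror the listing: codes differ and at least one is nonzero."""
    return (src_code != tar_code) and bool(src_code or tar_code)


# For integer codes the second clause never changes the outcome.
assert all(
    conflict(a, b) == (a != b) for a, b in product(range(10), repeat=2)
)
```

If that holds for every code the toolcode can produce, each flag could be reduced to a plain inequality.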

        ln_equal = src_ln == tar_ln
        fn_equal = src_fn == tar_fn

        # approx_c
        def _approx_c():
            if gen_conflict or roman_conflict:
                return False, 0

            full_src = ' '.join((src_ln, src_fn))
            if full_src.startswith('master '):
                full_src = full_src[len('master ') :]
                for intro in [
                    'of the ',
Coding Style: Wrong hanging indentation before block (add 4 spaces).
                    'of ',
Coding Style: Wrong hanging indentation before block (add 4 spaces).
                    'known as the ',
Coding Style: Wrong hanging indentation before block (add 4 spaces).
                    'with the ',
Coding Style: Wrong hanging indentation before block (add 4 spaces).
                    'with ',
Coding Style: Wrong hanging indentation before block (add 4 spaces).
                ]:
                    if full_src.startswith(intro):
                        full_src = full_src[len(intro) :]

            full_tar = ' '.join((tar_ln, tar_fn))
            if full_tar.startswith('master '):
                full_tar = full_tar[len('master ') :]
                for intro in [
                    'of the ',
Coding Style: Wrong hanging indentation before block (add 4 spaces).
                    'of ',
Coding Style: Wrong hanging indentation before block (add 4 spaces).
                    'known as the ',
Coding Style: Wrong hanging indentation before block (add 4 spaces).
                    'with the ',
Coding Style: Wrong hanging indentation before block (add 4 spaces).
                    'with ',
Coding Style: Wrong hanging indentation before block (add 4 spaces).
                ]:
                    if full_tar.startswith(intro):
                        full_tar = full_tar[len(intro) :]

            loc_ratio = sim_ratcliff_obershelp(full_src, full_tar)
            return loc_ratio >= char_approx_min, loc_ratio
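Inside `_approx_c`, the prefix handling is written out once for the source and once for the target. A sketch of a shared helper follows. `strip_master_prefix` and `_INTROS` are hypothetical names; the phrase list and the no-`break` loop are copied from the listing, so several phrases can be stripped in sequence:

```python
_INTROS = ('of the ', 'of ', 'known as the ', 'with the ', 'with ')


def strip_master_prefix(full_name):
    """Drop a leading 'master ' and then any matching introductory phrases."""
    if full_name.startswith('master '):
        full_name = full_name[len('master '):]
        for intro in _INTROS:
            # No break: later phrases are still tried after a match.
            if full_name.startswith(intro):
                full_name = full_name[len(intro):]
    return full_name
```

`_approx_c` would then pass both joined names through the helper before computing the Ratcliff-Obershelp similarity.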

        approx_c_result, ca_ratio = _approx_c()
Unused Code: The variable approx_c_result seems to be unused.

        if tests & self.test_dict['exact'] and fn_equal and ln_equal:
            return _fmt_retval(self.match_type_dict['exact'])
        if tests & self.test_dict['omission']:
Duplication: This code seems to be duplicated in your project.
            if (
                fn_equal
Coding Style: Wrong hanging indentation before block (add 4 spaces).
                and levenshtein(src_ln, tar_ln, cost=(1, 1, 99, 99)) == 1
Coding Style: Wrong hanging indentation before block (add 4 spaces).
            ):
                if not roman_conflict:
                    return _fmt_retval(self.match_type_dict['omission'])
            elif (
                ln_equal
Coding Style: Wrong hanging indentation before block (add 4 spaces).
                and levenshtein(src_fn, tar_fn, cost=(1, 1, 99, 99)) == 1
Coding Style: Wrong hanging indentation before block (add 4 spaces).
            ):
                return _fmt_retval(self.match_type_dict['omission'])
        if tests & self.test_dict['substitution']:
Duplication: This code seems to be duplicated in your project.
            if (
                fn_equal
Coding Style: Wrong hanging indentation before block (add 4 spaces).
                and levenshtein(src_ln, tar_ln, cost=(99, 99, 1, 99)) == 1
Coding Style: Wrong hanging indentation before block (add 4 spaces).
            ):
                return _fmt_retval(self.match_type_dict['substitution'])
            elif (
                ln_equal
Coding Style: Wrong hanging indentation before block (add 4 spaces).
                and levenshtein(src_fn, tar_fn, cost=(99, 99, 1, 99)) == 1
Coding Style: Wrong hanging indentation before block (add 4 spaces).
            ):
                return _fmt_retval(self.match_type_dict['substitution'])
        if tests & self.test_dict['transposition']:
Duplication: This code seems to be duplicated in your project.
            if fn_equal and (
                levenshtein(src_ln, tar_ln, mode='osa', cost=(99, 99, 99, 1))
Coding Style: Wrong hanging indentation before block (add 4 spaces).
                == 1
Coding Style: Wrong hanging indentation before block (add 4 spaces).
            ):
                return _fmt_retval(self.match_type_dict['transposition'])
            elif ln_equal and (
                levenshtein(src_fn, tar_fn, mode='osa', cost=(99, 99, 99, 1))
Coding Style: Wrong hanging indentation before block (add 4 spaces).
                == 1
Coding Style: Wrong hanging indentation before block (add 4 spaces).
            ):
                return _fmt_retval(self.match_type_dict['transposition'])
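The three blocks flagged as duplicated share one shape: one name part is equal and the other is at distance 1 under a test-specific cost setting. A hedged sketch of a shared predicate follows. `differs_by_one_edit` is a hypothetical helper, and `distance` is an injected callable standing in for the `levenshtein` function imported in the listing, so the sketch makes no claim about that function beyond the keyword arguments visible above. The extra `roman_conflict` guard of the omission test is deliberately left to the caller.

```python
def differs_by_one_edit(src, tar, distance, **kwargs):
    """Return True if one name part is equal and the other is one edit away.

    src and tar are (last_name, first_name) pairs; distance is any callable
    taking two strings plus keyword arguments and returning a number.
    """
    (src_ln, src_fn), (tar_ln, tar_fn) = src, tar
    if src_fn == tar_fn and distance(src_ln, tar_ln, **kwargs) == 1:
        return True
    return src_ln == tar_ln and distance(src_fn, tar_fn, **kwargs) == 1
```

Each test would then differ only in the keyword arguments it passes, for example `cost=(99, 99, 1, 99)` for substitution or `mode='osa', cost=(99, 99, 99, 1)` for transposition.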
        if tests & self.test_dict['punctuation']:
            np_src_fn = self._synoname_strip_punct(src_fn)
            np_tar_fn = self._synoname_strip_punct(tar_fn)
            np_src_ln = self._synoname_strip_punct(src_ln)
            np_tar_ln = self._synoname_strip_punct(tar_ln)

            if (np_src_fn == np_tar_fn) and (np_src_ln == np_tar_ln):
                return _fmt_retval(self.match_type_dict['punctuation'])

            np_src_fn = self._synoname_strip_punct(src_fn.replace('-', ' '))
            np_tar_fn = self._synoname_strip_punct(tar_fn.replace('-', ' '))
            np_src_ln = self._synoname_strip_punct(src_ln.replace('-', ' '))
            np_tar_ln = self._synoname_strip_punct(tar_ln.replace('-', ' '))

            if (np_src_fn == np_tar_fn) and (np_src_ln == np_tar_ln):
                return _fmt_retval(self.match_type_dict['punctuation'])

        if tests & self.test_dict['initials'] and ln_equal:
            if src_fn and tar_fn:
                src_initials = self._synoname_strip_punct(src_fn).split()
                tar_initials = self._synoname_strip_punct(tar_fn).split()
                initials = bool(
                    (len(src_initials) == len(''.join(src_initials)))
                    or (len(tar_initials) == len(''.join(tar_initials)))
                )
                if initials:
                    src_initials = ''.join(_[0] for _ in src_initials)
                    tar_initials = ''.join(_[0] for _ in tar_initials)
                    if src_initials == tar_initials:
                        return _fmt_retval(self.match_type_dict['initials'])
                    initial_diff = abs(len(src_initials) - len(tar_initials))
                    if initial_diff and (
                        (
Coding Style: Wrong hanging indentation before block (add 4 spaces).
                            initial_diff
                            == levenshtein(
                                src_initials,
                                tar_initials,
                                cost=(1, 99, 99, 99),
                            )
                        )
                        or (
Coding Style: Wrong hanging indentation before block (add 4 spaces).
                            initial_diff
                            == levenshtein(
                                tar_initials,
                                src_initials,
                                cost=(1, 99, 99, 99),
                            )
                        )
                    ):
                        return _fmt_retval(self.match_type_dict['initials'])
        if tests & self.test_dict['extension']:
            if src_ln[1] == tar_ln[1] and (
                src_ln.startswith(tar_ln) or tar_ln.startswith(src_ln)
Coding Style: Wrong hanging indentation before block (add 4 spaces).
            ):
                if (
Best practice: Too many boolean expressions in if statement (7/5).
                    (not src_len_fn and not tar_len_fn)
Coding Style: Wrong hanging indentation before block (add 4 spaces).
                    or (tar_fn and src_fn.startswith(tar_fn))
Coding Style: Wrong hanging indentation before block (add 4 spaces).
                    or (src_fn and tar_fn.startswith(src_fn))
Coding Style: Wrong hanging indentation before block (add 4 spaces).
                ) and not roman_conflict:
                    return _fmt_retval(self.match_type_dict['extension'])
        if tests & self.test_dict['inclusion'] and ln_equal:
            if (src_fn and src_fn in tar_fn) or (tar_fn and tar_fn in src_fn):
                return _fmt_retval(self.match_type_dict['inclusion'])
        if tests & self.test_dict['no_first'] and ln_equal:
            if src_fn == '' or tar_fn == '':
                return _fmt_retval(self.match_type_dict['no_first'])
        if tests & self.test_dict['word_approx']:
            ratio = self._synoname_word_approximation(
                src_ln,
                tar_ln,
                src_fn,
                tar_fn,
                {
                    'gen_conflict': gen_conflict,
                    'roman_conflict': roman_conflict,
                    'src_specials': src_specials,
                    'tar_specials': tar_specials,
                },
            )
            if ratio == 1 and tests & self.test_dict['confusions']:
                if (
                    ' '.join((src_fn, src_ln)).strip()
Coding Style: Wrong hanging indentation before block (add 4 spaces).
                    == ' '.join((tar_fn, tar_ln)).strip()
Coding Style: Wrong hanging indentation before block (add 4 spaces).
                ):
                    return _fmt_retval(self.match_type_dict['confusions'])
            if ratio >= word_approx_min:
                return _fmt_retval(self.match_type_dict['word_approx'])
        if tests & self.test_dict['char_approx']:
            if ca_ratio >= char_approx_min:
                return _fmt_retval(self.match_type_dict['char_approx'])
        return _fmt_retval(self.match_type_dict['no_match'])
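
Review note on the 'initials' test in dist_abs: the two levenshtein calls with cost=(1, 99, 99, 99) appear to ask whether one string of initials can be turned into the other by insertions alone. Assuming the cost tuple is ordered (insert, delete, substitute, transpose), which this listing does not confirm, that condition is equivalent to a subsequence check for any realistic name length. The sketch below illustrates the idea; `is_subsequence` and `initials_match` are hypothetical helpers, not part of abydos.

```python
def is_subsequence(shorter, longer):
    """Return True if `shorter` can be obtained by deleting characters from `longer`."""
    remaining = iter(longer)
    # `ch in remaining` advances the iterator, so matches must occur in order.
    return all(ch in remaining for ch in shorter)


def initials_match(src_initials, tar_initials):
    """Mimic the 'initials' decision: equal, or one side only adds initials."""
    if src_initials == tar_initials:
        return True
    if len(src_initials) == len(tar_initials):
        # Same length but different: no insertion-only edit exists.
        return False
    shorter, longer = sorted((src_initials, tar_initials), key=len)
    return is_subsequence(shorter, longer)


if __name__ == '__main__':
    print(initials_match('JR', 'JFR'))
    print(initials_match('JR', 'RJ'))
```

A check of this shape avoids two edit-distance computations and makes the intent of the 99-cost penalties explicit, but it is only a drop-in replacement if the cost ordering assumed above is correct.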

    def dist(
Bug: Parameters differ from overridden 'dist' method.
Best practice: Too many arguments (6/5).
        self,
Coding Style: Wrong hanging indentation before block (add 4 spaces).
        src,
Coding Style: Wrong hanging indentation before block (add 4 spaces).
        tar,
Coding Style: Wrong hanging indentation before block (add 4 spaces).
        word_approx_min=0.3,
Coding Style: Wrong hanging indentation before block (add 4 spaces).
        char_approx_min=0.73,
Coding Style: Wrong hanging indentation before block (add 4 spaces).
        tests=2 ** 12 - 1,
Coding Style: Wrong hanging indentation before block (add 4 spaces).
    ):
        """Return the normalized Synoname distance between two words.

        Args:
            src (str): Source string for comparison
            tar (str): Target string for comparison
            word_approx_min (float): the minimum word approximation value to
                signal a 'word_approx' match
            char_approx_min (float): the minimum character approximation value
                to signal a 'char_approx' match
            tests (int or Iterable): either an integer indicating tests to
                perform or a list of test names to perform (defaults to
                performing all tests)

        Returns:
            float: Normalized Synoname distance

        """
        return (
            synoname(src, tar, word_approx_min, char_approx_min, tests, False)
            / 14
        )


def synoname(
Best practice: Too many arguments (6/5).
    src,
Coding Style: Wrong hanging indentation before block (add 4 spaces).
    tar,
Coding Style: Wrong hanging indentation before block (add 4 spaces).
    word_approx_min=0.3,
Coding Style: Wrong hanging indentation before block (add 4 spaces).
    char_approx_min=0.73,
Coding Style: Wrong hanging indentation before block (add 4 spaces).
    tests=2 ** 12 - 1,
Coding Style: Wrong hanging indentation before block (add 4 spaces).
    ret_name=False,
Coding Style: Wrong hanging indentation before block (add 4 spaces).
):
    """Return the Synoname similarity type of two words.

    This is a wrapper for :py:meth:`Synoname.dist_abs`.

    Args:
        src (str): Source string for comparison
        tar (str): Target string for comparison
        word_approx_min (float): the minimum word approximation value to signal
            a 'word_approx' match
        char_approx_min (float): the minimum character approximation value to
            signal a 'char_approx' match
        tests (int or Iterable): either an integer indicating tests to perform
            or a list of test names to perform (defaults to performing all
            tests)
        ret_name (bool): If True, returns the match name rather than its
            integer equivalent

    Returns:
        int (or str if ret_name is True): Synoname value

    Examples:
        >>> synoname(('Breghel', 'Pieter', ''), ('Brueghel', 'Pieter', ''))
        2
        >>> synoname(('Breghel', 'Pieter', ''), ('Brueghel', 'Pieter', ''),
        ... ret_name=True)
        'omission'
        >>> synoname(('Dore', 'Gustave', ''),
        ... ('Dore', 'Paul Gustave Louis Christophe', ''),
        ... ret_name=True)
        'inclusion'
        >>> synoname(('Pereira', 'I. R.', ''), ('Pereira', 'I. Smith', ''),
        ... ret_name=True)
        'word_approx'

    """
    return Synoname().dist_abs(
        src, tar, word_approx_min, char_approx_min, tests, ret_name
    )


if __name__ == '__main__':
    import doctest

    doctest.testmod()
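
Review note on the Long Method and Complexity findings for dist_abs: the tail of the method is a chain of "if this test is enabled and its condition holds, return its match type" blocks, which suggests the Extract Method refactoring named in the report. The sketch below shows the general shape with two simplified checks. Every name in it (`strip_punct`, `check_punctuation`, `check_no_first`, `CHECKS`, `classify`) and the flag values are hypothetical stand-ins, not abydos code, and the real checks carry more state than these do.

```python
import string

_PUNCT_TABLE = str.maketrans('', '', string.punctuation)


def strip_punct(word):
    """Simplified stand-in for a punctuation-stripping helper."""
    return word.translate(_PUNCT_TABLE)


def check_punctuation(src_fn, src_ln, tar_fn, tar_ln):
    """Names match once punctuation is removed."""
    return (strip_punct(src_fn) == strip_punct(tar_fn)
            and strip_punct(src_ln) == strip_punct(tar_ln))


def check_no_first(src_fn, src_ln, tar_fn, tar_ln):
    """Last names match and at least one first name is missing."""
    return src_ln == tar_ln and (src_fn == '' or tar_fn == '')


# Ordered (flag, label, predicate) triples; earlier entries take priority.
CHECKS = (
    (1, 'punctuation', check_punctuation),
    (2, 'no_first', check_no_first),
)


def classify(names, tests, checks=CHECKS):
    """Return the label of the first enabled check that passes."""
    for flag, label, check in checks:
        if tests & flag and check(*names):
            return label
    return 'no_match'


if __name__ == '__main__':
    print(classify(('J.', 'Smith', 'J', 'Smith'), tests=3))
    print(classify(('', 'Smith', 'John', 'Smith'), tests=3))
```

With this layout each predicate can be tested on its own, and the dispatching loop contributes a fixed, small amount of branching no matter how many checks are registered.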